# [pandas](http://pandas.pydata.org/)
pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data easy and intuitive. It aims to be the fundamental high-level building block for practical, real-world data analysis in Python. It also has the broader goal of becoming the most powerful and flexible open source data analysis and manipulation tool available in any language, and it is already well on its way toward that goal.

In [1]:
import pandas as pd

### [Series](https://pandas.pydata.org/pandas-docs/stable/reference/series.html)

A Series is similar to a one-dimensional **NumPy** array, but it carries an index that labels each item. The index labels can be strings, numbers, or dates.

In [2]:
letters = pd.Series(['a', 'b', 'c', 'd', 'e'],  # data
          index=[1, 2, 3, 4, 5])  # index
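
Because the index here starts at 1, a label and a position are not the same thing. The following is a minimal sketch of the difference, using the same Series as above; the variable names `first_by_label`, `first_by_position`, and `labels` are illustrative only.

```python
import pandas as pd

letters = pd.Series(['a', 'b', 'c', 'd', 'e'],  # data
                    index=[1, 2, 3, 4, 5])      # index labels

first_by_label = letters.loc[1]      # look up by index label 1
first_by_position = letters.iloc[0]  # look up by integer position 0
labels = list(letters.index)         # the index labels as a plain list
```

Both lookups return the same item, since label 1 sits at position 0.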

In [3]:
# A pandas Series can also be created from a dictionary:
# the keys become the index and the values become the data
letters = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
letters

a    1
b    2
c    3
d    4
e    5
dtype: int64
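
Like a NumPy array, a Series supports element-wise arithmetic, and the index labels travel with the result. A short sketch under that assumption, reusing the dictionary-based Series; the names `value_c`, `doubled`, and `subset` are hypothetical.

```python
import pandas as pd

letters = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})

value_c = letters['c']        # look up a single value by its label
doubled = letters * 2         # element-wise multiplication, index preserved
subset = letters[['a', 'e']]  # select several labels at once
```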

### [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html)

A DataFrame is a 2-dimensional labeled data structure whose columns can hold different types. You can think of it as a spreadsheet, a SQL table, or a dict of Series objects. It is the most commonly used pandas object.

In [4]:
# data dict
data = {'name': ['John', 'Jane', 'Mary', 'Peter', 'Paul', 'Mark'],  # list of names
        'age': [24, 13, 53, 33, 43, 53],  # list of ages
        'state': ['NY', 'AL', 'AK', 'TX', 'AK', 'TX'],  # list of states
        'point': [64, 74, 87, 99, 87, 100]}  # list of points

names = pd.DataFrame(data)  # create a DataFrame from a dict
names

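|   | name  | age | state | point |
|---|-------|-----|-------|-------|
| 0 | John  | 24  | NY    | 64    |
| 1 | Jane  | 13  | AL    | 74    |
| 2 | Mary  | 53  | AK    | 87    |
| 3 | Peter | 33  | TX    | 99    |
| 4 | Paul  | 43  | AK    | 87    |
| 5 | Mark  | 53  | TX    | 100   |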
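
The "dict of Series" view can be made literal. The sketch below builds a small DataFrame from two Series of different types; the three-row data and the names `people`, `shape`, and `age_is_integer` are assumptions made for illustration.

```python
import pandas as pd

name = pd.Series(['John', 'Jane', 'Mary'])  # text column
age = pd.Series([24, 13, 53])               # integer column

# Each dict key becomes a column; the Series are aligned on their shared index
people = pd.DataFrame({'name': name, 'age': age})

shape = people.shape  # (number of rows, number of columns)
age_is_integer = pd.api.types.is_integer_dtype(people['age'])
```

Each column keeps its own dtype, which `people.dtypes` lists per column.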


In [5]:
pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4', 'rank5', 'rank6'])  # Add custom index
# pd.DataFrame(data, columns=['name', 'age', 'point', 'state'])  # Add custom columns

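|       | name  | age | state | point |
|-------|-------|-----|-------|-------|
| rank1 | John  | 24  | NY    | 64    |
| rank2 | Jane  | 13  | AL    | 74    |
| rank3 | Mary  | 53  | AK    | 87    |
| rank4 | Peter | 33  | TX    | 99    |
| rank5 | Paul  | 43  | AK    | 87    |
| rank6 | Mark  | 53  | TX    | 100   |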
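
A custom index makes label-based row lookup possible with `.loc`, and the `columns` argument from the commented-out line controls column order. A minimal sketch using the same `data` dict; the names `ranked`, `mary`, `mary_age`, and `reordered` are hypothetical.

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Mary', 'Peter', 'Paul', 'Mark'],
        'age': [24, 13, 53, 33, 43, 53],
        'state': ['NY', 'AL', 'AK', 'TX', 'AK', 'TX'],
        'point': [64, 74, 87, 99, 87, 100]}

ranked = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3',
                                   'rank4', 'rank5', 'rank6'])

mary = ranked.loc['rank3']             # one row, returned as a Series
mary_age = ranked.loc['rank3', 'age']  # one cell: row label, column label

# Passing `columns` selects and orders the columns taken from the dict
reordered = pd.DataFrame(data, columns=['name', 'age', 'point', 'state'])
```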


### Indexing

In [6]:
names[['name']]  # select one column as a DataFrame (note the list of column names)
# names['name'] or names.name  # select one column as a Series
names[['name', 'age']]  # select multiple columns (only this last expression is displayed)

|   | name  | age |
|---|-------|-----|
| 0 | John  | 24  |
| 1 | Jane  | 13  |
| 2 | Mary  | 53  |
| 3 | Peter | 33  |
| 4 | Paul  | 43  |
| 5 | Mark  | 53  |
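
Single and double brackets return different types, which matters when chaining further operations. The sketch below uses a shortened version of the `names` DataFrame; the names `as_series`, `as_frame`, and `two_columns` are illustrative.

```python
import pandas as pd

names = pd.DataFrame({'name': ['John', 'Jane', 'Mary'],
                      'age': [24, 13, 53]})

as_series = names['name']             # single brackets: a Series
as_frame = names[['name']]            # double brackets: a one-column DataFrame
two_columns = names[['name', 'age']]  # a list of names selects several columns
```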


In [7]:
names[1:3]  # slice rows by position: start is included, stop is excluded (rows 1 and 2)

|   | name | age | state | point |
|---|------|-----|-------|-------|
| 1 | Jane | 13  | AL    | 74    |
| 2 | Mary | 53  | AK    | 87    |
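
The plain slice above is positional, so it behaves like `.iloc`. Label-based slicing with `.loc` differs in that the end label is included. A short sketch of the contrast, assuming the same `names` DataFrame with its default integer index; `by_slice`, `by_iloc`, and `by_loc` are hypothetical names.

```python
import pandas as pd

names = pd.DataFrame({'name': ['John', 'Jane', 'Mary', 'Peter', 'Paul', 'Mark'],
                      'age': [24, 13, 53, 33, 43, 53],
                      'state': ['NY', 'AL', 'AK', 'TX', 'AK', 'TX'],
                      'point': [64, 74, 87, 99, 87, 100]})

by_slice = names[1:3]      # positional slice: rows at positions 1 and 2
by_iloc = names.iloc[1:3]  # explicit positional slice, same rows
by_loc = names.loc[1:3]    # label slice: labels 1 through 3, end included
```

Using `.iloc` or `.loc` explicitly makes the intent clear when the index is not the default 0, 1, 2, and so on.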
