### Import convention
By convention, `Pandas` is imported using the alias `pd`.

In [1]:
import pandas as pd
print("Pandas version:", pd.__version__)

Pandas version: 0.24.2


## Pandas `Series`
The Series object is built on NumPy arrays and shares much of their functionality, such as element-wise operations and broadcasting. The Series is a crucial building block of the Pandas `DataFrame`: each column in a DataFrame is a Series. Each Series has a name, an index, and values (among other things). Since the Pandas DataFrame is the core data manipulation tool used by data scientists, it is important to understand the Series first.

Making a `Series` is much like making a 1D NumPy `array`.

In [2]:
ser = pd.Series([0.25, 0.5, 0.75, 1.0])
ser

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64
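To make the link to NumPy concrete, here is a minimal sketch that builds a `Series` directly from a 1D array. The names `arr` and `ser_from_arr` are only illustrative, and `to_numpy()` assumes Pandas 0.24 or later.

```python
import numpy as np
import pandas as pd

# A plain 1D NumPy array and a Series built from the same values
arr = np.array([0.25, 0.5, 0.75, 1.0])
ser_from_arr = pd.Series(arr)

# The Series keeps the array's dtype and adds a default integer index
print(ser_from_arr.dtype)
print(list(ser_from_arr.index))

# to_numpy() hands the values back as a NumPy array
print(np.array_equal(ser_from_arr.to_numpy(), arr))
```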

We can see that a `Series` has a sequence of values like an array, but also a sequence of indices. Let's access them.

In [3]:
ser.values

array([0.25, 0.5 , 0.75, 1.  ])

We can see that all the values in a `Series` are stored as a NumPy `array`.

In [4]:
ser.index

RangeIndex(start=0, stop=4, step=1)

Accessing values in a `Series` works the same way as with `array`s: we pass an index or a slice in square brackets to get the value.

In [5]:
ser[1:3] # Like other range objects the end is excluded

1    0.50
2    0.75
dtype: float64

Every `Series` also has a name. Ours doesn't have one yet, so we will assign one.

In [6]:
ser.name = "My Series!"
ser

0    0.25
1    0.50
2    0.75
3    1.00
Name: My Series!, dtype: float64

Now we can see that the `Series` is shown with its name.

`Series` indices don't have to be numbers; they can be any _sequence of labels_, such as strings. Labels are usually kept unique so that each one identifies a single value.

In [7]:
# Dictionary (Key:Value pairs)
city_pop = {
    'Dhaka': 15000000,
    'Chittagong': 12000000,
    'Khulna': 8500000,
    'Sylhet': 9000000, 
    'Kushtia': 50000,
    'Barisal': 9500000
}

# Making a Series of the data (also assigning a Name)
city_pop = pd.Series(city_pop, name="City Populations")
city_pop

Dhaka         15000000
Chittagong    12000000
Khulna         8500000
Sylhet         9000000
Kushtia          50000
Barisal        9500000
Name: City Populations, dtype: int64

In [8]:
city_pop['Sylhet']

9000000

In [9]:
# When slicing a Series by label, the end label is included
city_pop['Khulna':'Kushtia']

Khulna     8500000
Sylhet     9000000
Kushtia      50000
Name: City Populations, dtype: int64
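The two slicing behaviours seen so far (end excluded for positions, end included for labels) can be made explicit with the `.iloc` and `.loc` accessors. The sketch below reuses a few of the city figures from above; the variable names are only illustrative.

```python
import pandas as pd

pop = pd.Series({
    'Dhaka': 15000000,
    'Chittagong': 12000000,
    'Khulna': 8500000,
    'Sylhet': 9000000,
})

# .loc slices by label: the end label is included
by_label = pop.loc['Chittagong':'Sylhet']

# .iloc slices by position: the end position is excluded
by_position = pop.iloc[1:3]

print(list(by_label.index))     # ['Chittagong', 'Khulna', 'Sylhet']
print(list(by_position.index))  # ['Chittagong', 'Khulna']
```

Spelling out `.loc` or `.iloc` avoids ambiguity when the index itself contains integers.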

#### Element-wise operations
Just like NumPy `array`s, a `Series` supports vectorized operations.

In [10]:
a = pd.Series([2, 4, 6, 8, 10])
b = pd.Series([1, 3, 5, 7, 9])

a - b

0    1
1    1
2    1
3    1
4    1
dtype: int64
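Two related behaviours are worth a quick sketch: a scalar is broadcast to every element, and operations between two `Series` match values by index label rather than by position. The series `x`, `y` and their labels are made up for illustration.

```python
import pandas as pd

a = pd.Series([2, 4, 6, 8, 10])

# Broadcasting: the scalar is applied to every element
doubled = a * 2
print(doubled.tolist())  # [4, 8, 12, 16, 20]

# Alignment: values are matched by index label, not by position
x = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
y = pd.Series([10, 20, 30], index=['b', 'c', 'd'])
total = x + y

# Labels present in only one Series ('a' and 'd') produce NaN
print(total)
```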

## Pandas `DataFrame`
We can make a DataFrame from any nested list-like structure (e.g. a list of lists) or from other labeled data such as dictionaries or Series.

When passing in a __list of lists__ each inner list is treated as a row.

In [11]:
ls = [[1.616, 'Plank Length', 'Physical'],
      [2.176, 'Plank Mass', 'Physical'],
      [5.391, 'Plank Time', 'Physical'],
      [3.1416, 'PI', 'Math constant']]

df = pd.DataFrame(ls)
df

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.616 | Plank Length | Physical |
| 1 | 2.176 | Plank Mass | Physical |
| 2 | 5.391 | Plank Time | Physical |
| 3 | 3.1416 | PI | Math constant |


Since we didn't specify any column names or index labels, the columns and rows are numbered from 0 by default, NumPy-style.
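Labeled inputs such as dictionaries behave differently from a list of lists: each key becomes a column rather than each inner list becoming a row. A minimal sketch, with made-up variable names and a few of the city figures used earlier:

```python
import pandas as pd

# Dictionary of lists: each key becomes a column label
data = {
    'City': ['Dhaka', 'Khulna', 'Sylhet'],
    'Population': [15000000, 8500000, 9000000],
}
df_from_dict = pd.DataFrame(data)
print(list(df_from_dict.columns))  # ['City', 'Population']
print(df_from_dict.shape)          # (3, 2)

# Dictionary of Series: the Series index becomes the row index
population = pd.Series({'Dhaka': 15000000, 'Khulna': 8500000})
df_from_series = pd.DataFrame({'Population': population})
print(list(df_from_series.index))  # ['Dhaka', 'Khulna']
```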

We can name the columns and rows __when creating__ the DataFrame by passing lists of labels to the `columns` and `index` keyword arguments.

In [12]:
df = pd.DataFrame(ls, columns=['xoxo', 'momo', 'bobo'], index=['blue', 'green', 'yellow', 'red'])
df

|   | xoxo | momo | bobo |
|---|---|---|---|
| blue | 1.616 | Plank Length | Physical |
| green | 2.176 | Plank Mass | Physical |
| yellow | 5.391 | Plank Time | Physical |
| red | 3.1416 | PI | Math constant |


We can also name the rows and columns after creating the DataFrame.

In [13]:
df.columns = ['Value', 'Name', 'Constant type']
df.index = [1, 2, 3, 4]
df

|   | Value | Name | Constant type |
|---|---|---|---|
| 1 | 1.616 | Plank Length | Physical |
| 2 | 2.176 | Plank Mass | Physical |
| 3 | 5.391 | Plank Time | Physical |
| 4 | 3.1416 | PI | Math constant |
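Assigning to `df.columns` and `df.index` replaces every label at once. When only some labels need to change, `DataFrame.rename` with a mapping is an alternative. The sketch below uses a small stand-in frame (`df_demo`, an illustrative name) rather than the `df` above.

```python
import pandas as pd

df_demo = pd.DataFrame(
    [[1.616, 'Physical'], [3.1416, 'Math constant']],
    columns=['xoxo', 'bobo'],
)

# rename() maps old labels to new ones and returns a new DataFrame by default
renamed = df_demo.rename(
    columns={'xoxo': 'Value', 'bobo': 'Constant type'},
    index={0: 'first'},
)

print(list(renamed.columns))  # ['Value', 'Constant type']
print(list(renamed.index))    # ['first', 1]
print(list(df_demo.columns))  # ['xoxo', 'bobo'], the original is unchanged
```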


### Indexing DataFrames
We can think of a __DataFrame__ as a specialized __dictionary__.
- `dict[Key]` returns a value

- `DataFrame[Key]` returns a column (a Pandas `Series`)

In [14]:
df['Value']

1    1.6160
2    2.1760
3    5.3910
4    3.1416
Name: Value, dtype: float64

In [15]:
type(df['Value'])

pandas.core.series.Series
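The dictionary analogy extends a little further: membership tests, `keys()`, and iteration all work on the column labels. A short sketch on a small stand-in frame (`df_demo` is an illustrative name):

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Value': [1.616, 3.1416],
    'Constant type': ['Physical', 'Math constant'],
})

# Membership and keys() refer to column labels, as with dictionary keys
print('Value' in df_demo)    # True
print(list(df_demo.keys()))  # ['Value', 'Constant type']

# Iterating over a DataFrame yields its column labels
labels = [label for label in df_demo]

# The selected column is a Series named after its key
col = df_demo['Value']
print(col.name)  # Value
```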

We can also get multiple columns by passing in a list of column names.

In [16]:
df[['Name', 'Value']]

|   | Name | Value |
|---|---|---|
| 1 | Plank Length | 1.616 |
| 2 | Plank Mass | 2.176 |
| 3 | Plank Time | 5.391 |
| 4 | PI | 3.1416 |


Note that the columns of the returned DataFrame are ordered just as we listed them when selecting.