### Pandas
Pandas is a Python library that provides data structures and tools for data analysis. With pandas, you can read, manipulate, and transform data easily.  
Pandas has two primary data structures: Series and DataFrame.

#### Pandas Series

A Series is a one-dimensional structure, similar to a Python list, except that all of its elements share a single data type (as in a NumPy ndarray).

In [2]:
import pandas as pd

In [4]:
s = pd.Series([1,2,3,4,5])

In [5]:
print(s)

0    1
1    2
2    3
3    4
4    5
dtype: int64
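To illustrate the single-type rule, the sketch below builds three small Series and inspects the dtype pandas chooses for each. The variable names and values are made up for illustration only.

```python
import pandas as pd

# All integers: pandas picks an integer dtype.
ints = pd.Series([1, 2, 3])

# Integers mixed with a float: every value is upcast to float64.
mixed_numeric = pd.Series([1, 2.5, 3])

# Numbers mixed with a string: pandas falls back to the generic object dtype.
mixed_object = pd.Series([1, "two", 3.0])

print(ints.dtype)
print(mixed_numeric.dtype)
print(mixed_object.dtype)
```

When the values cannot share a numeric type, the Series still has exactly one dtype; it is simply the catch-all `object`.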


The sequence of numbers on the left is the index. You can also supply a custom index:

In [7]:
s = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
print(s)

a    1
b    2
c    3
d    4
e    5
dtype: int64
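A custom index lets you look values up by label as well as by position. The following is a minimal sketch using the same Series; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

by_label = s["c"]             # label-based lookup
by_loc = s.loc["c"]           # explicit label-based lookup
by_position = s.iloc[2]       # position-based lookup (third element)
label_slice = s.loc["b":"d"]  # label slices include both endpoints

print(by_label, by_loc, by_position)
print(label_slice)
```

Note that slicing by label with `.loc` includes the end label, unlike ordinary Python slicing.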


#### Pandas DataFrame

A DataFrame is similar to a table: it ties together a group of related Series that share an index. Each column in a DataFrame is a Series object.

In [10]:
df = pd.DataFrame({
    'name':['ram','shyam'],
    'age':[15,16],
    'occupation':['chef','admin']
})

In [12]:
print(df)

   age   name occupation
0   15    ram       chef
1   16  shyam      admin


Get the DataFrame's columns:

In [13]:
df.columns

Index(['age', 'name', 'occupation'], dtype='object')

In [18]:
print(type(df.name))

<class 'pandas.core.series.Series'>


#### Selecting Data
##### Subscript notation

In [14]:
df['name']

0      ram
1    shyam
Name: name, dtype: object

##### Attribute (dot) notation

In [15]:
df.name

0      ram
1    shyam
Name: name, dtype: object
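The two notations are not always interchangeable. The sketch below uses a hypothetical DataFrame (the `count` and `home town` columns are invented for illustration) to show cases where only subscript notation works.

```python
import pandas as pd

# Hypothetical data, chosen only to show the naming pitfalls.
df = pd.DataFrame({
    "name": ["ram", "shyam"],
    "count": [3, 7],          # clashes with the DataFrame.count method
    "home town": ["x", "y"],  # contains a space, so it is not a valid attribute name
})

# df.count refers to the method, not the column.
print(callable(df.count))

# Subscript notation always reaches the column.
counts = df["count"]
towns = df["home town"]

# A list of names selects several columns and returns a DataFrame.
subset = df[["name", "count"]]
print(subset)
```

For this reason, subscript notation is the safer default in scripts, while dot notation is convenient for quick interactive work.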

Finding column datatypes:

In [20]:
print(df.age.dtype)

int64
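To see the types of all columns at once, or to convert a column to another type, something like the following should work. It rebuilds the small DataFrame from earlier so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["ram", "shyam"],
    "age": [15, 16],
    "occupation": ["chef", "admin"],
})

# dtypes returns a Series with one entry per column.
print(df.dtypes)

# astype returns a converted copy; the original column is unchanged.
age_as_float = df["age"].astype("float64")
print(age_as_float.dtype)
```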


Reading data from file:

In [21]:
df = pd.read_csv('height_weight.csv')
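If `height_weight.csv` is not at hand, `read_csv` can be tried on an in-memory buffer instead. The sketch below assumes a comma-separated layout with a header row; the actual layout of the file is not shown in this notebook, and the three rows are copied from the first rows of the dataset as displayed in this section.

```python
import io

import pandas as pd

# Stand-in for a file on disk (assumed layout: header row, comma-separated).
csv_text = """height,weight,age,male
151.765,47.825606,63.0,1
139.7,36.485807,63.0,0
136.525,31.864838,65.0,0
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)
print(sample.columns.tolist())
```

`read_csv` accepts a path or any file-like object, which makes small experiments like this easy.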

Inspect the first few rows:

In [22]:
df.head()

    height     weight   age  male
0  151.765  47.825606  63.0     1
1    139.7  36.485807  63.0     0
2  136.525  31.864838  65.0     0
3  156.845  53.041915  41.0     1
4  145.415  41.276872  51.0     0


In [25]:
df.describe()

           height     weight        age      male
count       149.0      149.0      149.0     149.0
mean   144.391826   38.44269  31.754027   0.47651
std     20.234737  13.168042  18.529102  0.501132
min        74.295   9.752228        0.6       0.0
25%         139.7  29.596878       17.0       0.0
50%        148.59  41.248522       30.0       0.0
75%      158.4198  47.939004       44.0       1.0
max        179.07  62.992589      81.75       1.0


Slicing:

In [29]:
df[:2]

    height     weight   age  male
0  151.765  47.825606  63.0     1
1    139.7  36.485807  63.0     0


In [39]:
df[-2:-1]

     height     weight  age  male
147  121.92  19.787951  8.0     0


In [35]:
df.iloc[0:2,0:2]

    height     weight
0  151.765  47.825606
1    139.7  36.485807
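`iloc` slices by position, while its counterpart `loc` slices by label. The sketch below contrasts the two on a small DataFrame built from the first three rows shown above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "height": [151.765, 139.7, 136.525],
    "weight": [47.825606, 36.485807, 31.864838],
})

# Position-based: rows 0-1 and columns 0-1; the end of each slice is exclusive.
by_position = df.iloc[0:2, 0:2]

# Label-based: rows labelled 0 through 2; the end label is inclusive.
by_label = df.loc[0:2, ["height"]]

print(by_position.shape)
print(by_label.shape)
```

With the default integer index the two look alike, so the inclusive end of `loc` is an easy detail to miss.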


In [61]:
print('Max:',df.age.max())
print('Min:',df.age.min())
print('Mean:',df.age.mean())
print('Median:',df.age.median())
print('Mode:',df.age.mode())
print('Sum:',df.age.sum())

Max: 81.75
Min: 0.6
Mean: 31.7540268456
Median: 30.0
Mode: 0    12.0
1    29.0
dtype: float64
Sum: 4731.35
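The mode prints differently from the other statistics because `mode()` returns a Series: a column can have several equally frequent values. Here is a minimal sketch with made-up ages in which two values tie.

```python
import pandas as pd

# Illustrative values: 12.0 and 29.0 each appear twice.
ages = pd.Series([12.0, 29.0, 12.0, 29.0, 41.0])

modes = ages.mode()          # a Series holding every tied mode, in sorted order
first_mode = modes.iloc[0]   # pick one value if a single number is needed

print(modes.tolist())
print(first_mode)
```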


In [46]:
print('Unique Values:',df.age.unique())

Unique Values: [ 63.    65.    41.    51.    35.    32.    27.    19.    54.    47.    66.
  73.    20.    65.3   36.    44.    31.    12.     8.     6.5   39.    29.
  13.     7.    56.    45.    17.    16.    11.    30.    24.    33.    52.
  42.     5.    55.    43.    18.     9.    60.    37.    50.    25.    23.
  79.3   14.    38.     0.6   46.    22.     6.    79.    34.    73.3    7.6
  58.    53.    48.    81.75   1.    15.     3.    62.    49.  ]
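When the list of unique values is long, it is often more useful to count them. The sketch below uses a short, made-up Series to show `nunique()` and `value_counts()` alongside `unique()`.

```python
import pandas as pd

# Illustrative values only.
ages = pd.Series([63.0, 63.0, 65.0, 41.0, 51.0, 41.0, 63.0])

distinct = ages.unique()      # distinct values in order of first appearance
n_distinct = ages.nunique()   # number of distinct values
counts = ages.value_counts()  # occurrences per value, most frequent first

print(distinct)
print(n_distinct)
print(counts)
```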


#### Subsetting a DataFrame

In [51]:
df2 = df[df.age>31]

In [52]:
df2.head()

    height     weight   age  male
0  151.765  47.825606  63.0     1
1    139.7  36.485807  63.0     0
2  136.525  31.864838  65.0     0
3  156.845  53.041915  41.0     1
4  145.415  41.276872  51.0     0


In [55]:
df3 = df.query('age > 31')

In [58]:
print(df2.shape)

(71, 4)


In [56]:
df3.head()

    height     weight   age  male
0  151.765  47.825606  63.0     1
1    139.7  36.485807  63.0     0
2  136.525  31.864838  65.0     0
3  156.845  53.041915  41.0     1
4  145.415  41.276872  51.0     0


In [59]:
print(df3.shape)

(71, 4)
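Both approaches extend to several conditions. The sketch below rebuilds the first five rows shown in this section (omitting the weight column for brevity) and filters on age and sex together; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "height": [151.765, 139.7, 136.525, 156.845, 145.415],
    "age": [63.0, 63.0, 65.0, 41.0, 51.0],
    "male": [1, 0, 0, 1, 0],
})

# Boolean masks: combine with & (and) or | (or), and wrap each condition in parentheses.
mask_result = df[(df["age"] > 31) & (df["male"] == 1)]

# query: the same filter written as a string expression.
query_result = df.query("age > 31 and male == 1")

print(mask_result)
print(mask_result.equals(query_result))
```

The parentheses matter in the mask version because `&` binds more tightly than comparison operators in Python.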
