**PANDAS** is a Python package providing fast, flexible and expressive data structures designed to make working with labeled data easy. It is a fundamental tool for practical data analysis in Python.

In [7]:
# Importing Pandas

import pandas as pd

**PANDAS** provides two primary data structures: the **_Series_** and the **_Data Frame_**.

A **_Series_** is a one-dimensional array, similar to a list. By default it assigns an index label to each item it contains: the labels run from zero to N minus one, where N is the size of the Series.
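A minimal sketch of the default index, using a few made-up names. The second Series uses the constructor's optional `index=` argument, which supplies your own labels in place of 0 to N minus one.

```python
import pandas as pd

# Without an explicit index, pandas labels the items 0 .. N-1
names = pd.Series(['Ann', 'Bob', 'Cy'])
print(names.index)          # a RangeIndex covering 0, 1, 2

# The optional index= argument replaces the default labels
scores = pd.Series([90, 85, 77], index=['Ann', 'Bob', 'Cy'])
print(scores['Bob'])        # 85, looked up by label
```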

In [9]:
s = pd.Series(['John', 'Rick', 'Micheal', 'Sue', 'Kath'])
print(s)

print(s[0])

print(s[1:4])

param = s != 'John'  # Boolean mask: True for every name except 'John'
print(s[param])

0       John
1       Rick
2    Micheal
3        Sue
4       Kath
dtype: object
John
1       Rick
2    Micheal
3        Sue
dtype: object
1       Rick
2    Micheal
3        Sue
4       Kath
dtype: object
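The `param` line above builds a Boolean mask. This sketch recreates the same Series and prints the mask so the filtering step is visible; the name `mask` is illustrative, and `~` negates a mask.

```python
import pandas as pd

s = pd.Series(['John', 'Rick', 'Micheal', 'Sue', 'Kath'])

# Comparing a Series to a scalar gives one True/False per element
mask = s != 'John'
print(mask.tolist())        # [False, True, True, True, True]

# Indexing with the mask keeps only the items where it is True
print(s[mask].tolist())     # ['Rick', 'Micheal', 'Sue', 'Kath']

# ~ inverts the mask, selecting only 'John'
print(s[~mask].tolist())    # ['John']
```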


The Series constructor can also convert a dictionary, using the dictionary's keys as the index and its values as the data.

In [17]:
d = {'New York': 1300,
     'Chicago':900,
     'San Francisco': 1100,
     'Austin': 450}

cities = pd.Series(d)
print(cities)


New York         1300
Chicago           900
San Francisco    1100
Austin            450
dtype: int64
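The dictionary conversion can be checked directly. In this sketch the keys become index labels, and the constructor's optional `index=` argument selects and orders which keys to keep. `'Boston'` is a hypothetical label that is absent from the dictionary, so its value comes out as NaN (which also makes the dtype float).

```python
import pandas as pd

d = {'New York': 1300, 'Chicago': 900, 'San Francisco': 1100, 'Austin': 450}

# Keys become the index labels, values become the data
cities = pd.Series(d)
print(cities['Chicago'])    # 900

# index= picks (and orders) the keys to keep;
# a label missing from the dict gets NaN
subset = pd.Series(d, index=['Austin', 'Boston'])
print(subset)
```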


In [39]:
cities[['Chicago', 'Austin']]  # index labels are case-sensitive

In [40]:
cities[cities < 1000]

Chicago    900
Austin     450
dtype: int64

In [41]:
cities[cities < 1000] = 750
print(cities)

New York         1300
Chicago           750
San Francisco    1100
Austin            750
dtype: int64
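Assigning through a Boolean mask changes `cities` in place. As a sketch of two alternatives: `.loc[mask] = value` is the explicit label-based form of the same assignment, while `Series.where(cond, other)` returns a new Series and leaves the original untouched. The name `capped` is illustrative.

```python
import pandas as pd

cities = pd.Series({'New York': 1300, 'Chicago': 900,
                    'San Francisco': 1100, 'Austin': 450})

# Non-mutating: keep values where the condition holds, otherwise use 750
capped = cities.where(cities >= 1000, 750)

# Mutating: same values, written into cities itself via .loc
cities.loc[cities < 1000] = 750

print((capped == cities).all())   # True
```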


A **_Data Frame_** is a tabular data structure made up of rows and columns, similar to a spreadsheet or a database table.
You can think of a Data Frame as a collection of Series that share the same index.
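A small sketch of that idea: two Series with the same default index combined into one Data Frame. The column names borrow from the Titanic data, but the values are illustrative.

```python
import pandas as pd

# Two Series sharing the same default index 0..2
age = pd.Series(['adult', 'child', 'adult'])
sex = pd.Series(['male', 'female', 'female'])

# Combined column-wise into one DataFrame
df = pd.DataFrame({'age': age, 'sex': sex})

print(df.shape)              # (3, 2): 3 rows, 2 columns
print(type(df['sex']))       # each column comes back out as a Series
```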

_Converting_ a delimited (**CSV**-style) file into a **_Data Frame_**


In [28]:
# Titanic.txt is tab-separated, so override read_csv's default comma delimiter
titanic = pd.read_csv('Titanic.txt', sep='\t')

print(titanic.head())

  pclass    age   sex survived
0    1st  adult  male      yes
1    1st  adult  male      yes
2    1st  adult  male      yes
3    1st  adult  male      yes
4    1st  adult  male      yes
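Because `Titanic.txt` is not reproduced here, this sketch reads a few hypothetical tab-separated lines from an in-memory buffer (`io.StringIO`) instead of a file. It shows why `sep='\t'` matters: with the default comma delimiter, each whole line is read as a single column.

```python
import io
import pandas as pd

# Hypothetical sample in the same layout: a header row, then tab-separated values
raw = "pclass\tage\tsex\tsurvived\n1st\tadult\tmale\tyes\n3rd\tchild\tfemale\tno\n"

# sep='\t' splits on tabs instead of read_csv's default comma
tsv = pd.read_csv(io.StringIO(raw), sep='\t')
print(tsv.columns.tolist())   # ['pclass', 'age', 'sex', 'survived']

# Without sep, each line lands in one column
wrong = pd.read_csv(io.StringIO(raw))
print(wrong.shape)            # (2, 1)
```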


In [29]:
titanic['age']

0       adult
1       adult
2       adult
3       adult
4       adult
        ...  
2196    adult
2197    adult
2198    adult
2199    adult
2200    adult
Name: age, Length: 2201, dtype: object

In [30]:
titanic[['age', 'pclass']]

        age pclass
0     adult    1st
1     adult    1st
2     adult    1st
3     adult    1st
4     adult    1st
...     ...    ...
2196  adult   crew
2197  adult   crew
2198  adult   crew
2199  adult   crew
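The two cells above differ only in their brackets: a single label returns a Series, while a list of labels returns a Data Frame (even a one-element list). A sketch on a small stand-in frame with illustrative values:

```python
import pandas as pd

# Small stand-in for the Titanic data (values are illustrative)
df = pd.DataFrame({'pclass': ['1st', '2nd', 'crew'],
                   'age': ['adult', 'child', 'adult']})

one = df['age']          # single label   -> Series
many = df[['age']]       # list of labels -> DataFrame, even with one name

print(type(one).__name__, type(many).__name__)   # Series DataFrame
print(df[['age', 'pclass']].columns.tolist())   # ['age', 'pclass'], in the order requested
```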


In [33]:
titanic[(titanic.age == 'child') & (titanic.pclass == '2nd')]

    pclass    age   sex survived
586    2nd  child  male      yes
587    2nd  child  male      yes
588    2nd  child  male      yes
589    2nd  child  male      yes
590    2nd  child  male      yes
591    2nd  child  male      yes
592    2nd  child  male      yes
593    2nd  child  male      yes
594    2nd  child  male      yes
595    2nd  child  male      yes
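Conditions are combined element-wise with `&` (and) or `|` (or). Each comparison needs its own parentheses because in Python `&` binds more tightly than `==`. `DataFrame.query` expresses the same filter as a string. A sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'pclass': ['1st', '2nd', '2nd', '3rd'],
                   'age':    ['child', 'child', 'adult', 'child']})

# Parentheses are required: & has higher precedence than ==
mask = (df.age == 'child') & (df.pclass == '2nd')
by_mask = df[mask]

# The same filter written as a query string
by_query = df.query("age == 'child' and pclass == '2nd'")

print(by_mask.equals(by_query))   # True
print(by_mask.index.tolist())     # [1]
```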


In [35]:
titanic[320:330]

    pclass    age    sex survived
320    1st  child    male      yes
321    1st  child    male      yes
322    1st  child    male      yes
323    1st  child    male      yes
324    1st  child  female      yes
325    2nd  adult    male      yes
326    2nd  adult    male      yes
327    2nd  adult    male      yes
328    2nd  adult    male      yes
329    2nd  adult    male      yes
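`titanic[320:330]` slices rows by position, like a Python list: the start is included and the stop is excluded, giving 10 rows. As a sketch, `.iloc` makes the positional intent explicit, while `.loc` slices by label and includes both endpoints; with the default 0 to N minus one index the two line up as shown. The frame below is illustrative.

```python
import pandas as pd

# Illustrative frame with the default RangeIndex 0..9
df = pd.DataFrame({'n': range(10)})

by_position = df[3:6]        # rows at positions 3, 4, 5 (stop excluded)
explicit = df.iloc[3:6]      # same rows, positional intent made explicit
by_label = df.loc[3:5]       # label slice: both endpoints included

print(by_position.index.tolist())   # [3, 4, 5]
print(by_position.equals(explicit) and by_position.equals(by_label))  # True
```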


In [36]:
by_age = titanic.groupby('age')

by_age.size()

age
adult    2092
child     109
dtype: int64

In [37]:
print(titanic.groupby(['sex']).size())

sex
female     470
male      1731
dtype: int64


In [38]:
print(titanic.groupby(['survived', 'sex']).size())

survived  sex   
no        female     126
          male      1364
yes       female     344
          male       367
dtype: int64
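A two-key `groupby(...).size()` returns a Series with a two-level index. As a sketch, `unstack()` pivots the inner level (`sex`) into columns so the counts read as a table, and `pd.crosstab` builds the same counts directly. The tiny frame below is illustrative, not the Titanic data.

```python
import pandas as pd

df = pd.DataFrame({'survived': ['no', 'no', 'yes', 'yes', 'yes'],
                   'sex':      ['male', 'female', 'female', 'female', 'male']})

# Series indexed by (survived, sex) pairs
counts = df.groupby(['survived', 'sex']).size()

# Move the 'sex' level into columns; any absent pair would become 0
table = counts.unstack(fill_value=0)
print(table)

# crosstab produces the same counts in one call
print(pd.crosstab(df['survived'], df['sex']))
```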
