### Data Analysis in Python
- Data analysis in Python is commonly done with the **pandas** library
- It is often used together with **NumPy, Matplotlib, and Seaborn**

### Why Pandas?
- Pandas provides data structures for holding different types of labeled and relational data
- This makes Python highly flexible and extremely useful for data cleaning and manipulation
- There are two main data structures
    - Series - one-dimensional
    - DataFrame - two-dimensional
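
As a minimal sketch of the difference between the two structures (the variable names `scores` and `table` are illustrative only), a Series holds one labeled column of values, while a DataFrame holds several labeled columns side by side:

```python
import pandas as pd

# A Series: one-dimensional, each value paired with an index label.
scores = pd.Series([90, 59, 89], index=["Smith", "Adam", "Bill"])

# A DataFrame: two-dimensional, with labeled rows and labeled columns.
table = pd.DataFrame({"name": ["Smith", "Adam", "Bill"], "score": [90, 59, 89]})

print(scores.ndim, scores.shape)  # 1 (3,)
print(table.ndim, table.shape)    # 2 (3, 2)

# Selecting a single column of a DataFrame gives back a Series.
print(type(table["score"]))
```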
    

### When do you use Pandas?
- Import data
- Clean up messy data
- Explore data, gain insight into data
- Process and prepare your data for analysis
- Analyse your data (scikit-learn, statsmodels...)
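
The first few steps of this list can be sketched in a handful of lines. The CSV text below is made up for illustration and is read from memory with `io.StringIO`, standing in for a real file on disk:

```python
import io
import pandas as pd

# Hypothetical in-memory CSV; Adam's score is deliberately missing.
raw = io.StringIO("name,score\nBill,90\nAdam,\nTim,12\n")

df_raw = pd.read_csv(raw)                    # import the data
df_clean = df_raw.dropna(subset=["score"])   # clean: drop rows with no score
summary = df_clean["score"].describe()       # explore: summary statistics

print(df_clean)
print(summary)
```

Dropping rows is only one possible cleaning choice; filling the gap with a default value is another, and which one is right depends on the data.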

In [1]:
import pandas as pd

In [2]:
help(pd)

Help on package pandas:

NAME
    pandas

DESCRIPTION
    pandas - a powerful data analysis and manipulation library for Python
    
    **pandas** is a Python package providing fast, flexible, and expressive data
    structures designed to make working with "relational" or "labeled" data both
    easy and intuitive. It aims to be the fundamental high-level building block for
    doing practical, **real world** data analysis in Python. Additionally, it has
    the broader goal of becoming **the most powerful and flexible open source data
    analysis / manipulation tool available in any language**. It is already well on
    its way toward this goal.
    
    Main Features
    -------------
    Here are just a few of the things that pandas does well:
    
      - Easy handling of missing data in floating point as well as non-floating
        point data.
      - Size mutability: columns can be inserted and deleted from DataFrame and
        higher dimensional objects
      - Automatic an

In [4]:
obj = pd.Series([1,'John',3.23,'Good'])

In [5]:
print(type(obj))

<class 'pandas.core.series.Series'>


In [6]:
obj

0       1
1    John
2    3.23
3    Good
dtype: object

In [7]:
obj.values

array([1, 'John', 3.23, 'Good'], dtype=object)

In [10]:
obj1 = pd.Series([1, 'John', 3.23, 'Good'],index = ['a','b','c','d'])
obj1

a       1
b    John
c    3.23
d    Good
dtype: object

In [11]:
print(type(obj1))

<class 'pandas.core.series.Series'>


In [12]:
obj1.values

array([1, 'John', 3.23, 'Good'], dtype=object)

In [13]:
obj1.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [14]:
dict1 = {'Smith':90,'Adam':59,'Bill':89,'Elon':65,'Tom':75,'Tim':99}

In [15]:
print(dict1,type(dict1))

{'Smith': 90, 'Adam': 59, 'Bill': 89, 'Elon': 65, 'Tom': 75, 'Tim': 99} <class 'dict'>


In [16]:
names = pd.Series(dict1)

In [17]:
names

Smith    90
Adam     59
Bill     89
Elon     65
Tom      75
Tim      99
dtype: int64

In [18]:
names.values

array([90, 59, 89, 65, 75, 99], dtype=int64)

In [19]:
names.index

Index(['Smith', 'Adam', 'Bill', 'Elon', 'Tom', 'Tim'], dtype='object')

In [20]:
print(type(names))

<class 'pandas.core.series.Series'>


In [21]:
names['Adam']

59

In [22]:
names[names>=65]

Smith    90
Bill     89
Elon     65
Tom      75
Tim      99
dtype: int64

In [23]:
names[names>=85]

Smith    90
Bill     89
Tim      99
dtype: int64
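
The two filters above rely on boolean masks: a comparison such as `names>=85` produces a Series of `True`/`False` values, and indexing with that mask keeps only the `True` rows. A small sketch, using a separate Series called `marks` (an illustrative name) so that it runs on its own:

```python
import pandas as pd

marks = pd.Series({"Smith": 90, "Adam": 59, "Bill": 89,
                   "Elon": 65, "Tom": 75, "Tim": 99})

# The comparison itself returns a boolean Series with the same index.
mask = marks >= 85
print(mask)

# Indexing with the mask keeps the rows where it is True.
top = marks[mask]

# Conditions are combined with & (and) or | (or);
# each condition needs its own parentheses.
mid = marks[(marks >= 65) & (marks < 90)]
print(mid)
```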

In [27]:
names['Tim'] = 92
names

Smith    90
Adam     59
Bill     89
Elon     65
Tom      75
Tim      92
dtype: int64

In [28]:
'Tim' in names

True

In [29]:
'Anil' in names

False
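
Note that `in` tests the index labels of a Series, not its values. A short sketch (the Series `marks` is illustrative only):

```python
import pandas as pd

marks = pd.Series({"Tim": 92, "Adam": 59})

label_found = "Tim" in marks          # True: 'Tim' is an index label
value_as_label = 92 in marks          # False: 92 is a value, not a label
value_found = 92 in marks.values      # True: search the underlying array

print(label_found, value_as_label, value_found)
```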

In [30]:
names.isnull()

Smith    False
Adam     False
Bill     False
Elon     False
Tom      False
Tim      False
dtype: bool
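
Every entry above is `False` because the Series has no missing values. To see `isnull()` report something, the sketch below builds a Series from a dict while asking for a label the dict does not contain; the names `marks` and `s` are illustrative only:

```python
import pandas as pd

marks = {"Smith": 90, "Adam": 59}

# 'Anil' is requested in the index but is not a key of the dict,
# so pandas fills that position with NaN (missing).
s = pd.Series(marks, index=["Smith", "Adam", "Anil"])

missing = s.isnull()    # True only for 'Anil'
filled = s.fillna(0)    # replace the missing value with 0

print(missing)
print(filled)
```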

In [37]:
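names['Anil']  # 'Anil' was added in a cell not shown here (note the out-of-order cell number); without it this raises KeyError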

0

In [35]:
names

Smith    90
Adam     59
Bill     89
Elon     65
Tom      75
Tim      92
Anil      0
dtype: object

In [36]:
names.isnull()

Smith    False
Adam     False
Bill     False
Elon     False
Tom      False
Tim      False
Anil     False
dtype: bool

In [38]:
names/10

Smith      9
Adam     5.9
Bill     8.9
Elon     6.5
Tom      7.5
Tim      9.2
Anil       0
dtype: object

In [39]:
names**2

Smith    8100
Adam     3481
Bill     7921
Elon     4225
Tom      5625
Tim      8464
Anil        0
dtype: object
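
The `dtype: object` in these outputs means the values are stored as generic Python objects rather than as a numeric type. Arithmetic still works element by element, but it may be slower, and numeric-only methods can behave differently. One way to get back to a numeric dtype is `pd.to_numeric`; the sketch below uses a small stand-in Series named `mixed` rather than the notebook's own `names`:

```python
import pandas as pd

# Stand-in for a Series whose numbers ended up stored as objects.
mixed = pd.Series([90, 59, 0], index=["Smith", "Adam", "Anil"], dtype=object)

# Convert the values to a proper numeric dtype.
numeric = pd.to_numeric(mixed)

print(mixed.dtype, numeric.dtype)
print(numeric / 10)
```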

In [40]:
data = {'names':['Bill','Adam','Tim','John','Kate','Alex'],
       'scores':[90,76,12,65,87,59],
       'sports':['Cricket','Tennis','Cricket','FootBall','Skiing','FootBall'],
       'sex':['Male','Female','Male','Male','Female','Male']}

In [41]:
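print(data,type(data))

{'names': ['Bill', 'Adam', 'Tim', 'John', 'Kate', 'Alex'], 'scores': [90, 76, 12, 65, 87, 59], 'sports': ['Cricket', 'Tennis', 'Cricket', 'FootBall', 'Skiing', 'FootBall'], 'sex': ['Male', 'Female', 'Male', 'Male', 'Female', 'Male']} <class 'dict'>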


In [42]:
df =  pd.DataFrame(data)

In [43]:
df

   names  scores    sports     sex
0   Bill      90   Cricket    Male
1   Adam      76    Tennis  Female
2    Tim      12   Cricket    Male
3   John      65  FootBall    Male
4   Kate      87    Skiing  Female
5   Alex      59  FootBall    Male


In [45]:
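# The column labels below do not match any key in data, so the result is empty
df1 =  pd.DataFrame(data,columns = ['NAME','SCORE','SPORT','SEX'])
df1

Empty DataFrame
Columns: [NAME, SCORE, SPORT, SEX]
Index: []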
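
When a DataFrame is built from a dict, the `columns` argument selects and orders existing keys; it does not rename them. If the goal is different header labels, one option is to build the DataFrame first and then call `rename`. A minimal sketch with a shortened, illustrative dict called `small`:

```python
import pandas as pd

small = {"names": ["Bill", "Adam"], "scores": [90, 76]}

# Labels that match no key in the dict give an empty DataFrame.
empty = pd.DataFrame(small, columns=["NAME", "SCORE"])

# Renaming after construction keeps the data and changes the labels.
renamed = pd.DataFrame(small).rename(columns={"names": "NAME", "scores": "SCORE"})

print(empty)
print(renamed)
```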


In [47]:
df.head() # Show the first five records of the DataFrame

   names  scores    sports     sex
0   Bill      90   Cricket    Male
1   Adam      76    Tennis  Female
2    Tim      12   Cricket    Male
3   John      65  FootBall    Male
4   Kate      87    Skiing  Female


In [48]:
df.tail() # Show the last five records of the DataFrame

   names  scores    sports     sex
1   Adam      76    Tennis  Female
2    Tim      12   Cricket    Male
3   John      65  FootBall    Male
4   Kate      87    Skiing  Female
5   Alex      59  FootBall    Male


In [49]:
df.head(2)

   names  scores   sports     sex
0   Bill      90  Cricket    Male
1   Adam      76   Tennis  Female


In [50]:
df.tail(3)

   names  scores    sports     sex
3   John      65  FootBall    Male
4   Kate      87    Skiing  Female
5   Alex      59  FootBall    Male


In [51]:
df = pd.DataFrame(data,columns=['names','scores','sports','sex'],
                 index=['a','b','c','d','e','f'])

In [52]:
df

   names  scores    sports     sex
a   Bill      90   Cricket    Male
b   Adam      76    Tennis  Female
c    Tim      12   Cricket    Male
d   John      65  FootBall    Male
e   Kate      87    Skiing  Female
f   Alex      59  FootBall    Male


In [53]:
df['names']

a    Bill
b    Adam
c     Tim
d    John
e    Kate
f    Alex
Name: names, dtype: object

In [54]:
df['sports']

a     Cricket
b      Tennis
c     Cricket
d    FootBall
e      Skiing
f    FootBall
Name: sports, dtype: object

In [55]:
my_column = ['names','sports']
df[my_column]

   names    sports
a   Bill   Cricket
b   Adam    Tennis
c    Tim   Cricket
d   John  FootBall
e   Kate    Skiing
f   Alex  FootBall
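
The kind of object you get back depends on what goes inside the brackets: a single label returns a Series, while a list of labels returns a DataFrame, even when the list has only one entry. A small self-contained sketch (the frame `people` is illustrative only):

```python
import pandas as pd

people = pd.DataFrame(
    {"names": ["Bill", "Adam"], "sports": ["Cricket", "Tennis"]},
    index=["a", "b"],
)

one_column = people["names"]             # single label -> Series
two_columns = people[["names", "sports"]]  # list of labels -> DataFrame
one_as_frame = people[["names"]]         # one-item list -> still a DataFrame

print(type(one_column))
print(type(two_columns))
print(type(one_as_frame))
```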


In [56]:
df = pd.DataFrame(data,columns=['names','scores','sports','sex','age'],
                 index=['a','b','c','d','e','f'])

In [57]:
df

   names  scores    sports     sex  age
a   Bill      90   Cricket    Male  NaN
b   Adam      76    Tennis  Female  NaN
c    Tim      12   Cricket    Male  NaN
d   John      65  FootBall    Male  NaN
e   Kate      87    Skiing  Female  NaN
f   Alex      59  FootBall    Male  NaN


In [58]:
df['age']

a    NaN
b    NaN
c    NaN
d    NaN
e    NaN
f    NaN
Name: age, dtype: object

In [59]:
values = [18,29,91,11,22,34]
df['age'] = values

In [60]:
df

   names  scores    sports     sex  age
a   Bill      90   Cricket    Male   18
b   Adam      76    Tennis  Female   29
c    Tim      12   Cricket    Male   91
d   John      65  FootBall    Male   11
e   Kate      87    Skiing  Female   22
f   Alex      59  FootBall    Male   34


In [61]:
df['Status'] = df.scores >= 65

In [62]:
df

   names  scores    sports     sex  age  Status
a   Bill      90   Cricket    Male   18    True
b   Adam      76    Tennis  Female   29    True
c    Tim      12   Cricket    Male   91   False
d   John      65  FootBall    Male   11    True
e   Kate      87    Skiing  Female   22    True
f   Alex      59  FootBall    Male   34   False


In [63]:
del df['Status']
df

   names  scores    sports     sex  age
a   Bill      90   Cricket    Male   18
b   Adam      76    Tennis  Female   29
c    Tim      12   Cricket    Male   91
d   John      65  FootBall    Male   11
e   Kate      87    Skiing  Female   22
f   Alex      59  FootBall    Male   34
