## Introduction to Pandas Library

pandas - a powerful data analysis and manipulation library for Python
=============================================================

**pandas** is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, **real world** data analysis in Python.

Import the pandas library with the `import pandas` statement. The alias `pd` is the usual convention.

In [1]:
import pandas as pd

## Pandas --- Series

* Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). 

* The axis labels are collectively referred to as the index. 

* The basic way to create a Series is to call:
    
    pd.Series(data,index = index)
    
    The passed **index** is a list of axis labels.
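
As a minimal sketch of the call above, the snippet below builds a Series from a plain list. The city labels and temperature values are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical data: three temperatures labeled by city.
temps = pd.Series([21.5, 18.0, 25.3], index=["delhi", "pune", "goa"])

# Without an index argument, pandas assigns the default labels 0, 1, 2, ...
default_index = pd.Series([21.5, 18.0, 25.3])

print(temps.index.tolist())
print(default_index.index.tolist())
```

Passing an explicit index only changes the labels; the values and their order stay the same.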

#### Creating a Series

1) From a dictionary

2) From a scalar value

#### 1) From Dictionary

***When the data is a dict, and an index is not passed, the Series index will be ordered by the dict’s insertion order***

In [2]:
d = {'a':7,'b':5,'c':9,'d':10,'e':11}

In [3]:
s = pd.Series(d)
print(s)

a     7
b     5
c     9
d    10
e    11
dtype: int64


In [4]:
type(s)

pandas.core.series.Series

***If an index is passed, the values in data corresponding to the labels in the index will be pulled out.***

In [5]:
s = pd.Series(d,index = ['b','e','a','d','c'])
print(s)

b     5
e    11
a     7
d    10
c     9
dtype: int64


In [6]:
s = pd.Series(d,index = ['b','e','a','d','c','f'])
print(s)

b     5.0
e    11.0
a     7.0
d    10.0
c     9.0
f     NaN
dtype: float64
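
The label `'f'` has no entry in the dict, so it receives `NaN`, and the values are stored as floats because `NaN` is a float. A small sketch, using a shortened hypothetical dict, of how such missing entries might be detected or dropped:

```python
import pandas as pd

# Shortened, hypothetical version of the dict used above.
d = {"a": 7, "b": 5}
s = pd.Series(d, index=["a", "b", "f"])

# isna() marks the labels that had no matching key in the dict.
mask = s.isna()
print(mask)

# dropna() returns a new Series without the missing entries.
clean = s.dropna()
print(clean)
```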


***A Series is like a fixed-size dict in that you can get and set values by index label***

In [7]:
s['a']

7.0

In [8]:
s['f'] = 98
print(s)

b     5.0
e    11.0
a     7.0
d    10.0
c     9.0
f    98.0
dtype: float64
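
The dict-like behaviour also covers membership tests and safe lookups. The following sketch uses a small hypothetical Series to show both:

```python
import pandas as pd

s = pd.Series({"a": 7, "b": 5, "c": 9})

# Membership tests look at the index labels, not at the values.
has_a = "a" in s
has_z = "z" in s

# get() returns a default instead of raising KeyError for a missing label.
missing = s.get("z", default=-1)

print(has_a, has_z, missing)
```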


#### 2) From Scalar Value

***If data is a scalar value, an index must be provided. The value will be repeated to match the length of index.***

In [9]:
s = pd.Series(data = 9.6,index = ['a','b','c','d','e','f'])
print(s)

a    9.6
b    9.6
c    9.6
d    9.6
e    9.6
f    9.6
dtype: float64


#### Vectorized operations with Series

In [10]:
s+s

a    19.2
b    19.2
c    19.2
d    19.2
e    19.2
f    19.2
dtype: float64

In [11]:
s**2

a    92.16
b    92.16
c    92.16
d    92.16
e    92.16
f    92.16
dtype: float64
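
Operations between two Series are aligned on index labels rather than on position. The sketch below uses two hypothetical Series with partly different labels to show the effect:

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Addition aligns on labels; a label present in only one Series gives NaN.
total = s1 + s2
print(total)

# add() with fill_value treats a missing label as 0 instead of producing NaN.
filled = s1.add(s2, fill_value=0)
print(filled)
```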

## Pandas DataFrame

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is like a spreadsheet, a SQL table, or a dict of Series objects.

***Method of creating a DataFrame***

d = { 
        "column_name" : pd.Series(data,index = ["row_name"])
    }

df = pd.DataFrame(d)

***The row and column labels can be accessed through the `index` and `columns` attributes, respectively.***

#### 1) From a dict of Series or a dictionary

In [12]:
d = {
        "X": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
        "Y": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
    }

In [13]:
df = pd.DataFrame(d)
df

|   | X   | Y   |
|---|-----|-----|
| a | 1.0 | 1.0 |
| b | 2.0 | 2.0 |
| c | 3.0 | 3.0 |
| d | NaN | 4.0 |


In [14]:
type(df)

pandas.core.frame.DataFrame

#### Accessing the index and columns of a DataFrame

In [15]:
df.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [16]:
df.columns

Index(['X', 'Y'], dtype='object')

#### 2) From a list of dictionaries

In [17]:
data = [{'a' : 10,'b' : 20 , 'c' : 25},{'a' : 11 , 'b' : 18}]

In [18]:
df = pd.DataFrame(data)
df

|   | a  | b  | c    |
|---|----|----|------|
| 0 | 10 | 20 | 25.0 |
| 1 | 11 | 18 | NaN  |


In [19]:
df = pd.DataFrame(data,index = ['first','second'])
df

|        | a  | b  | c    |
|--------|----|----|------|
| first  | 10 | 20 | 25.0 |
| second | 11 | 18 | NaN  |


In [20]:
df = pd.DataFrame(data,index = ['first','second','third'])
df

ValueError: Shape of passed values is (2, 3), indices imply (3, 3)
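
The error arises because the list holds two records while three row labels were passed. One possible way to get a third, empty row is to build the frame with matching labels and then reindex it. This is a sketch of one option, not the only approach:

```python
import pandas as pd

data = [{"a": 10, "b": 20, "c": 25}, {"a": 11, "b": 18}]

# Build the frame with one label per record, then reindex to add a row.
df = pd.DataFrame(data, index=["first", "second"])
extended = df.reindex(["first", "second", "third"])

# The new "third" row contains only missing values.
print(extended)
```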

***There are many other ways to create a DataFrame; practice them as well.***

### DataFrame Basic functionalities 

In [21]:
df

|        | a  | b  | c    |
|--------|----|----|------|
| first  | 10 | 20 | 25.0 |
| second | 11 | 18 | NaN  |


##### 1) DataFrame.T

It returns the transpose of the DataFrame.

In [22]:
df.T

|   | first | second |
|---|-------|--------|
| a | 10.0  | 11.0   |
| b | 20.0  | 18.0   |
| c | 25.0  | NaN    |


##### 2) DataFrame.axes

It returns a list containing the row axis labels and the column axis labels.

In [23]:
df.axes

[Index(['first', 'second'], dtype='object'),
 Index(['a', 'b', 'c'], dtype='object')]

##### 3) DataFrame.dtypes
It returns the data type of each column.

In [24]:
df.dtypes

a      int64
b      int64
c    float64
dtype: object

##### 4) DataFrame.empty
It returns a boolean indicating whether the object is empty. **True** indicates that the object is empty.

In [25]:
df.empty

False

##### 5) DataFrame.shape
It returns a tuple representing the dimensions of the DataFrame, in the form (number_of_rows, number_of_columns).

In [26]:
df.shape

(2, 3)

##### 6) DataFrame.size
It returns the number of elements in the DataFrame.

In [27]:
df

|        | a  | b  | c    |
|--------|----|----|------|
| first  | 10 | 20 | 25.0 |
| second | 11 | 18 | NaN  |


In [28]:
df.size

6
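
The size counts every cell, including the missing one, so it equals rows times columns. A short sketch that rebuilds the same frame and compares `size` with the number of non-missing cells:

```python
import pandas as pd

data = [{"a": 10, "b": 20, "c": 25}, {"a": 11, "b": 18}]
df = pd.DataFrame(data, index=["first", "second"])

rows, cols = df.shape

# size is rows * cols and includes cells that hold NaN.
total_cells = df.size

# count() gives the number of non-missing values per column.
non_missing = int(df.count().sum())

print(rows, cols, total_cells, non_missing)
```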

##### 7) DataFrame.values
It returns the data in the DataFrame as a NumPy array.

In [29]:
df.values

array([[10., 20., 25.],
       [11., 18., nan]])
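
The pandas documentation recommends `DataFrame.to_numpy()` as the preferred way to get the same array. A brief sketch on the same frame, which also shows why the integers appear as floats:

```python
import pandas as pd

data = [{"a": 10, "b": 20, "c": 25}, {"a": 11, "b": 18}]
df = pd.DataFrame(data, index=["first", "second"])

# to_numpy() returns the frame's data as a single NumPy array.
arr = df.to_numpy()

# Mixed int64 and float64 columns are converted to one common dtype.
print(arr.dtype)
print(arr.shape)
```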

## Pandas: file read and write operations

#### 1) Read CSV file

In [30]:
df = pd.read_csv("workingfile.csv")
df

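|   | ID | first_name | company | salary |
|---|----|------------|---------|--------|
| 0 | 11 | David      | Aon     | 74     |
| 1 | 12 | Jamie      | TCS     | 76     |
| 2 | 13 | Steve      | Google  | 96     |
| 3 | 14 | Stevart    | RBS     | 71     |
| 4 | 15 | John       | .       | 78     |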
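
To try reading and writing without a file on disk, CSV text can be wrapped in `io.StringIO`. The sketch below assumes a few rows resembling the output above (the real contents of `workingfile.csv` are not shown in this notebook) and writes them back with `to_csv`:

```python
import io

import pandas as pd

# Assumed CSV text, modeled on the first rows of the output above.
csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
"""

df = pd.read_csv(io.StringIO(csv_text))

# With no path, to_csv returns the CSV as a string.
# index=False keeps the row labels out of the written text.
out = df.to_csv(index=False)

round_trip = pd.read_csv(io.StringIO(out))
print(round_trip)
```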


#### 2) Read HTML file

In [31]:
url = "http://www.basketball-reference.com/leagues/NBA_2015_totals.html"
df = pd.read_html(url)
df

[      Rk          Player Pos Age   Tm   G  GS    MP   FG  FGA  ...   FT%  ORB  \
 0      1      Quincy Acy  PF  24  NYK  68  22  1287  152  331  ...  .784   79   
 1      2    Jordan Adams  SG  20  MEM  30   0   248   35   86  ...  .609    9   
 2      3    Steven Adams   C  21  OKC  70  67  1771  217  399  ...  .502  199   
 3      4     Jeff Adrien  PF  28  MIN  17   0   215   19   44  ...  .579   23   
 4      5   Arron Afflalo  SG  29  TOT  78  72  2502  375  884  ...  .843   27   
 ..   ...             ...  ..  ..  ...  ..  ..   ...  ...  ...  ...   ...  ...   
 670  490  Thaddeus Young  PF  26  TOT  76  68  2434  451  968  ...  .655  127   
 671  490  Thaddeus Young  PF  26  MIN  48  48  1605  289  641  ...  .682   75   
 672  490  Thaddeus Young  PF  26  BRK  28  20   829  162  327  ...  .606   52   
 673  491     Cody Zeller   C  22  CHO  62  45  1487  172  373  ...  .774   97   
 674  492    Tyler Zeller   C  25  BOS  82  59  1731  340  619  ...  .823  146   
 
      DRB  TRB

In [32]:
df = df[0]
df.head()  # show the first 5 rows

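|   | Rk | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | ... | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS |
|---|----|--------|-----|-----|----|---|----|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----|-----|
| 0 | 1 | Quincy Acy | PF | 24 | NYK | 68 | 22 | 1287 | 152 | 331 | ... | 0.784 | 79 | 222 | 301 | 68 | 27 | 22 | 60 | 147 | 398 |
| 1 | 2 | Jordan Adams | SG | 20 | MEM | 30 | 0 | 248 | 35 | 86 | ... | 0.609 | 9 | 19 | 28 | 16 | 16 | 7 | 14 | 24 | 94 |
| 2 | 3 | Steven Adams | C | 21 | OKC | 70 | 67 | 1771 | 217 | 399 | ... | 0.502 | 199 | 324 | 523 | 66 | 38 | 86 | 99 | 222 | 537 |
| 3 | 4 | Jeff Adrien | PF | 28 | MIN | 17 | 0 | 215 | 19 | 44 | ... | 0.579 | 23 | 54 | 77 | 15 | 4 | 9 | 9 | 30 | 60 |
| 4 | 5 | Arron Afflalo | SG | 29 | TOT | 78 | 72 | 2502 | 375 | 884 | ... | 0.843 | 27 | 220 | 247 | 129 | 41 | 7 | 116 | 167 | 1035 |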


In [33]:
df.tail()  # show the last 5 rows

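|     | Rk | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | ... | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS |
|-----|----|--------|-----|-----|----|---|----|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----|-----|
| 670 | 490 | Thaddeus Young | PF | 26 | TOT | 76 | 68 | 2434 | 451 | 968 | ... | 0.655 | 127 | 284 | 411 | 173 | 124 | 25 | 117 | 171 | 1071 |
| 671 | 490 | Thaddeus Young | PF | 26 | MIN | 48 | 48 | 1605 | 289 | 641 | ... | 0.682 | 75 | 170 | 245 | 135 | 86 | 17 | 75 | 115 | 685 |
| 672 | 490 | Thaddeus Young | PF | 26 | BRK | 28 | 20 | 829 | 162 | 327 | ... | 0.606 | 52 | 114 | 166 | 38 | 38 | 8 | 42 | 56 | 386 |
| 673 | 491 | Cody Zeller | C | 22 | CHO | 62 | 45 | 1487 | 172 | 373 | ... | 0.774 | 97 | 265 | 362 | 100 | 34 | 49 | 62 | 156 | 472 |
| 674 | 492 | Tyler Zeller | C | 25 | BOS | 82 | 59 | 1731 | 340 | 619 | ... | 0.823 | 146 | 319 | 465 | 113 | 18 | 52 | 76 | 205 | 833 |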


#### 3) Read TSV file

In [35]:
df = pd.read_table("test.tsv")
df

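|   | test      | test.1    | test.2    | test.3    | test.4    |
|---|-----------|-----------|-----------|-----------|-----------|
| 0 | vikas     | vikas     | vikas     | vikas     | vikas     |
| 1 | Parjapati | Parjapati | Parjapati | Parjapati | Parjapati |
| 2 | Aditya    | Aditya    | Aditya    | Aditya    | Aditya    |
| 3 | kumar     | kumar     | kumar     | kumar     | kumar     |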
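
`read_table` uses a tab as its default separator, so it behaves like `read_csv` with `sep="\t"`. A small sketch with made-up tab-separated text (not the contents of `test.tsv`) comparing the two calls:

```python
import io

import pandas as pd

# Hypothetical tab-separated text with a header row.
tsv_text = "name\tscore\nvikas\t1\naditya\t2\n"

from_table = pd.read_table(io.StringIO(tsv_text))
from_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Both calls parse the text into the same DataFrame.
print(from_table.equals(from_csv))
```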


#### 4) Read JSON file

In [36]:
df = pd.read_json("example.json")
df

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |
| 2 | 7 | 8 | 9 |
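
The contents of `example.json` are not shown here. One layout that would produce a frame like the one above is a list of records, which is assumed in this sketch and read from an in-memory string:

```python
import io

import pandas as pd

# Assumed JSON layout: a list of records, one object per row.
json_text = (
    '[{"a": 1, "b": 2, "c": 3},'
    ' {"a": 4, "b": 5, "c": 6},'
    ' {"a": 7, "b": 8, "c": 9}]'
)

df = pd.read_json(io.StringIO(json_text))
print(df)
```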


#### 5) Read Excel file

In [37]:
df = pd.read_excel("ex1.xlsx")
df

|   | Unnamed: 0 | a | b  | c  | d  | message |
|---|------------|---|----|----|----|---------|
| 0 | 0          | 1 | 2  | 3  | 4  | hello   |
| 1 | 1          | 5 | 6  | 7  | 8  | world   |
| 2 | 2          | 9 | 10 | 11 | 12 | foo     |
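
An `Unnamed: 0` column like the one above usually means the row index was saved into the file as an extra column. The sketch below reproduces the effect with CSV text in memory, so it runs without an Excel engine. `read_excel` accepts the same `index_col` parameter, but whether this is what happened to `ex1.xlsx` is an assumption.

```python
import io

import pandas as pd

# Hypothetical frame, loosely modeled on the output above.
original = pd.DataFrame({"a": [1, 5, 9], "message": ["hello", "world", "foo"]})

# The default index=True writes the row labels as an unnamed first column.
text = original.to_csv()

# Read back as-is: the saved labels show up as "Unnamed: 0".
plain = pd.read_csv(io.StringIO(text))

# index_col=0 uses that first column as the row index instead.
restored = pd.read_csv(io.StringIO(text), index_col=0)

print(plain.columns.tolist())
print(restored.columns.tolist())
```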


#### 6) Read data from a GitHub repository

In [40]:
data = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
data

|     | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|-----|-------------|----------|--------|------|-----|-----|-------|-------|--------|------|-------|----------|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |


In [41]:
data.head()  # show the first 5 rows

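|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|-------------|----------|--------|------|-----|-----|-------|-------|--------|------|-------|----------|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |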


In [42]:
data.tail()  # show the last 5 rows

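|     | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|-----|-------------|----------|--------|------|-----|-----|-------|-------|--------|------|-------|----------|
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0 | NaN | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0 | B42 | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.45 | NaN | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0 | C148 | C |
| 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.75 | NaN | Q |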
