<h1 style="color:red">What is pandas?</h1>
<p>Pandas is an open-source Python library used mainly for data analysis, data manipulation, and data exploration.</p>

<p><i>[pandas] is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals. — Wikipedia</i></p>

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcShYvd6EGuSC3rbnEC0S-uyyVJdeuBDJnB8oUoKzeVXgj_Rx34A"/>

<p>The README in the official pandas GitHub repository describes pandas as “a Python package providing <b>fast, flexible, and expressive data structures designed to make working with ‘relational’ or ‘labeled’ data both easy and intuitive</b>. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language.”</p>

<h3>When can I use pandas?</h3>
<p>
    a. Calculate statistics and answer questions about the data, such as:<br>
        &nbsp;&nbsp;&nbsp;&nbsp;- What is the average, median, maximum, or minimum of each column?<br>
        &nbsp;&nbsp;&nbsp;&nbsp;- Does column A correlate with column B?<br>
        &nbsp;&nbsp;&nbsp;&nbsp;- What does the distribution of data in column C look like?<br>
    b. Clean the data, for example by removing missing values or filtering rows and columns by some criteria.<br>
    c. Visualize the data with help from Matplotlib: bar charts, line plots, histograms, bubble charts, and more.<br>
    d. Store the cleaned, transformed data back into a CSV file, another file format, or a database.<br>
</p>

<h3>What is so great about Pandas?</h3>
<p>
    1. It offers a very wide range of functionality for loading, cleaning, transforming, and summarizing data.<br>
    2. Excellent documentation.<br>
    3. Open source, with an active community and active development.<br>
    4. Works well with other libraries such as NumPy and scikit-learn.<br>
</p>


<h3>Pandas Popularity</h3>
<img src="https://storage.googleapis.com/lds-media/images/the-rise-in-popularity-of-pandas.width-1200.png"/>

<h1 style="color:red">Importing Libraries and Data</h1>

#### 1. Import Libraries

In [32]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#### 2. Import Datasets

In [33]:
match=pd.read_csv('matches.csv')
delivery=pd.read_csv('deliveries.csv')
company=pd.read_csv('Fortune501.csv')
titanic=pd.read_csv('titanic.csv')
food=pd.read_csv('food.csv')

<h1 style="color:red">Series and DataFrames</h1>

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcRK5vl7PcWTN02CXdNczGUYxwtuJRwuAueqfhzzca4Jq6RjH2CZ"/>

In [3]:
type(food)

pandas.core.frame.DataFrame
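pandas has two core data structures: a Series is a one-dimensional array of values with an index (row labels), and a DataFrame is a two-dimensional table whose columns are Series sharing that index. The sketch below builds both from made-up values (the names and ages are illustrative, not taken from the CSV files above):

```python
import pandas as pd

# A Series: one-dimensional values plus an index (row labels)
ages = pd.Series([22.0, 38.0, 26.0], name="Age")
names = pd.Series(["Alice", "Bob", "Carol"], name="Name")  # made-up values

# A DataFrame: a two-dimensional table, built here from Series that share the same index
people = pd.DataFrame({"Name": names, "Age": ages})

print(type(ages).__name__)    # Series
print(type(people).__name__)  # DataFrame
print(people.shape)           # (3, 2): 3 rows, 2 columns
```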

#### 1. The shape attribute

In [4]:
delivery.shape

(150460, 21)

#### 2. The columns attribute

In [5]:
titanic.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

#### 3. The head() and tail() methods

In [6]:
titanic.head(3)

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
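Only `head()` is run above; `tail()` mirrors it and returns the last rows instead. Both default to 5 rows. A small sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})  # made-up data: rows 0..9

print(df.head())    # first 5 rows (n = 0..4)
print(df.tail(3))   # last 3 rows (n = 7, 8, 9), original index labels kept
```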


#### 4. The info() method

In [7]:
titanic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB


#### 5. The describe() method

In [8]:
titanic.describe()

,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,257.353842,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0
25%,223.5,0.0,2.0,20.125,0.0,0.0,7.9104
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.4542
75%,668.5,1.0,3.0,38.0,1.0,0.0,31.0
max,891.0,1.0,3.0,80.0,8.0,6.0,512.3292


#### 6. The unique() and nunique() methods
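This section has no code in the notebook, so here is a minimal sketch: `unique()` returns the distinct values in order of first appearance, and `nunique()` returns how many there are. The Series below is made up in the style of the Titanic `Embarked` column:

```python
import numpy as np
import pandas as pd

# Made-up port codes, including one missing value
embarked = pd.Series(["S", "C", "S", "Q", np.nan, "S"])

print(embarked.unique())               # distinct values in order of appearance, NaN included
print(embarked.nunique())              # 3: NaN is ignored by default (dropna=True)
print(embarked.nunique(dropna=False))  # 4: NaN counted as its own value
```

On the real data the same calls would be `titanic['Embarked'].unique()` and `titanic['Embarked'].nunique()`.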

#### 7. The astype() method
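`astype()` returns a copy of a column converted to another dtype. A minimal sketch on made-up values, converting numbers that were read in as strings to floats and a small integer code to the `category` dtype:

```python
import pandas as pd

df = pd.DataFrame({
    "Pclass": [3, 1, 3],                    # int64
    "Fare": ["7.25", "71.2833", "7.925"],   # made-up: numbers stored as strings
})

df["Fare"] = df["Fare"].astype(float)           # str -> float64
df["Pclass"] = df["Pclass"].astype("category")  # int64 -> category

print(df.dtypes)
```

The conversion must be possible for every value: for example, `astype(int)` on a float column that still contains NaN (such as `Age`) raises an error.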

#### 8. Extracting one column

In [9]:
titanic['Age'].values

array([22.  , 38.  , 26.  , 35.  , 35.  ,   nan, 54.  ,  2.  , 27.  ,
       14.  ,  4.  , 58.  , 20.  , 39.  , 14.  , 55.  ,  2.  ,   nan,
       31.  ,   nan, 35.  , 34.  , 15.  , 28.  ,  8.  , 38.  ,   nan,
       19.  ,   nan,   nan, 40.  ,   nan,   nan, 66.  , 28.  , 42.  ,
         nan, 21.  , 18.  , 14.  , 40.  , 27.  ,   nan,  3.  , 19.  ,
         nan,   nan,   nan,   nan, 18.  ,  7.  , 21.  , 49.  , 29.  ,
       65.  ,   nan, 21.  , 28.5 ,  5.  , 11.  , 22.  , 38.  , 45.  ,
        4.  ,   nan,   nan, 29.  , 19.  , 17.  , 26.  , 32.  , 16.  ,
       21.  , 26.  , 32.  , 25.  ,   nan,   nan,  0.83, 30.  , 22.  ,
       29.  ,   nan, 28.  , 17.  , 33.  , 16.  ,   nan, 23.  , 24.  ,
       29.  , 20.  , 46.  , 26.  , 59.  ,   nan, 71.  , 23.  , 34.  ,
       34.  , 28.  ,   nan, 21.  , 33.  , 37.  , 28.  , 21.  ,   nan,
       38.  ,   nan, 47.  , 14.5 , 22.  , 20.  , 17.  , 21.  , 70.5 ,
       29.  , 24.  ,  2.  , 21.  ,   nan, 32.5 , 32.5 , 54.  , 12.  ,
         nan, 24.  ,
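`titanic['Age']` on its own returns a Series (values plus the row index); adding `.values` drops the index and gives a NumPy array, which is what the output above shows. Recent pandas versions recommend `.to_numpy()` for the same purpose. A sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [22.0, 38.0, np.nan], "Sex": ["male", "female", "female"]})  # made-up

age = df["Age"]              # Series: values plus the row index
arr = df["Age"].to_numpy()   # NumPy ndarray: values only, like .values here

print(type(age).__name__, type(arr).__name__)  # Series ndarray
print(age.isna().sum())      # 1: missing ages appear as NaN
```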

#### 9. Extracting multiple columns

In [10]:
titanic[['Pclass','Name','Age']]

,Pclass,Name,Age
0,3,"Braund, Mr. Owen Harris",22.0
1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",38.0
2,3,"Heikkinen, Miss. Laina",26.0
3,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",35.0
4,3,"Allen, Mr. William Henry",35.0
5,3,"Moran, Mr. James",
6,1,"McCarthy, Mr. Timothy J",54.0
7,3,"Palsson, Master. Gosta Leonard",2.0
8,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",27.0
9,2,"Nasser, Mrs. Nicholas (Adele Achem)",14.0


#### 10. Creating a new column

In [11]:
titanic['Family']="None"

In [12]:
titanic.head(2)

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Family
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,
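Assigning a scalar such as `"None"` puts the same value in every row. More often a new column is computed from existing ones. The sketch below derives a family size from `SibSp` and `Parch` using the values of the first three Titanic rows; the column name `FamilySize` and the `+ 1` (counting the passenger themself) are illustrative choices, not part of the original notebook:

```python
import pandas as pd

df = pd.DataFrame({"SibSp": [1, 1, 0], "Parch": [0, 0, 0]})  # first 3 Titanic rows

# Column arithmetic is element-wise and returns a Series, which becomes the new column
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1  # hypothetical column name

print(df)
```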


#### 11. Extracting one row

titanic.iloc[0]
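`iloc` selects by integer position and `loc` selects by index label; with the default 0-based index they coincide, but not in general. A sketch with a made-up, non-default index (the labels 101 to 103 are hypothetical):

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Braund", "Cumings", "Heikkinen"], "Age": [22.0, 38.0, 26.0]},
    index=[101, 102, 103],   # hypothetical row labels
)

by_position = df.iloc[0]   # first row by position
by_label = df.loc[101]     # row whose label is 101

print(by_position)                   # a single row comes back as a Series
print(by_position.equals(by_label))  # True: the same row here
```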

#### 12. Extracting multiple rows
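Several rows can be taken with a slice or a list. Note that `iloc` slices exclude the end position, while `loc` slices include the end label. A sketch on made-up ages:

```python
import pandas as pd

df = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0, 35.0]})  # default index 0..4

print(df.iloc[1:3])      # positions 1 and 2 (end excluded)
print(df.loc[1:3])       # labels 1, 2 and 3 (end included)
print(df.iloc[[0, 4]])   # arbitrary positions via a list
```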

#### 13. Extracting both rows and columns

In [13]:
#done
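Both `loc` and `iloc` accept a row selector and a column selector separated by a comma. A sketch on made-up rows, selecting the same block both ways:

```python
import pandas as pd

df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1],
    "Name": ["Braund", "Cumings", "Heikkinen", "Futrelle"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

# loc: row labels and column names (label slices include the end)
subset_loc = df.loc[0:1, ["Name", "Age"]]
# iloc: row positions and column positions (position slices exclude the end)
subset_iloc = df.iloc[0:2, 1:3]

print(subset_loc)
print(subset_loc.equals(subset_iloc))  # True: rows 0-1, columns Name and Age
```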

#### 14. The value_counts() method

In [14]:
match.head()

,id,season,city,date,team1,team2,toss_winner,toss_decision,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue,umpire1,umpire2,umpire3
0,1,2017,Hyderabad,2017-04-05,Sunrisers Hyderabad,Royal Challengers Bangalore,Royal Challengers Bangalore,field,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
1,2,2017,Pune,2017-04-06,Mumbai Indians,Rising Pune Supergiant,Rising Pune Supergiant,field,normal,0,Rising Pune Supergiant,0,7,SPD Smith,Maharashtra Cricket Association Stadium,A Nand Kishore,S Ravi,
2,3,2017,Rajkot,2017-04-07,Gujarat Lions,Kolkata Knight Riders,Kolkata Knight Riders,field,normal,0,Kolkata Knight Riders,0,10,CA Lynn,Saurashtra Cricket Association Stadium,Nitin Menon,CK Nandan,
3,4,2017,Indore,2017-04-08,Rising Pune Supergiant,Kings XI Punjab,Kings XI Punjab,field,normal,0,Kings XI Punjab,0,6,GJ Maxwell,Holkar Cricket Stadium,AK Chaudhary,C Shamshuddin,
4,5,2017,Bangalore,2017-04-08,Royal Challengers Bangalore,Delhi Daredevils,Royal Challengers Bangalore,bat,normal,0,Royal Challengers Bangalore,15,0,KM Jadhav,M Chinnaswamy Stadium,,,


In [15]:
match['player_of_match'].value_counts().head()
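`value_counts()` counts how often each distinct value occurs, sorted from most to least frequent, and drops missing values unless `dropna=False` is passed. A sketch on a made-up `winner`-style column with one missing result:

```python
import pandas as pd

winners = pd.Series(["A", "B", "A", None, "A"])  # made-up, one missing value

print(winners.value_counts())               # A 3, B 1 (missing value dropped)
print(winners.value_counts(dropna=False))   # also counts the missing value
print(winners.value_counts().head(1))       # most frequent value only
```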

#### 15. Filtering data based on a condition

In [None]:
match[match['city']=='Kolkata'].shape

#### 16. Filtering data based on multiple conditions

In [31]:
mask1=match['season']==2008
mask2=match['city']=="Kolkata"
match[mask1 & mask2].shape[0]
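Each comparison produces a boolean Series (a mask), and `&` (and), `|` (or) and `~` (not) combine masks element-wise. Python's `and`/`or` raise an error on Series, and inline conditions need their own parentheses because `&` binds more tightly than `==`. String comparisons are case-sensitive, so `'kolkata'` does not match `'Kolkata'`. A sketch on made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    "season": [2008, 2008, 2009, 2010],
    "city": ["Kolkata", "Mumbai", "Kolkata", "Chennai"],
})  # made-up rows

both = df[(df["season"] == 2008) & (df["city"] == "Kolkata")]
either = df[(df["season"] == 2010) | (df["city"] == "Mumbai")]
in_list = df[df["city"].isin(["Kolkata", "Chennai"])]
any_case = df[df["city"].str.lower() == "kolkata"]  # case-insensitive comparison

print(len(both), len(either), len(in_list), len(any_case))  # 1 2 3 2
```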

<h3 style="color:#00a65a">Exercise 1 : Find the total number of matches that have been played in the IPL</h3>

In [32]:
match.shape[0]

636

<h3 style="color:#00a65a">Exercise 2 : Find the top 5 teams in terms of number of matches won</h3>

In [33]:
match['winner'].value_counts().head()

Mumbai Indians                 92
Chennai Super Kings            79
Kolkata Knight Riders          77
Royal Challengers Bangalore    73
Kings XI Punjab                70
Name: winner, dtype: int64

<h3 style="color:#00a65a">Exercise 3 : At which venue have the most matches been played?</h3>

In [35]:
match['venue'].value_counts().head(1)

M Chinnaswamy Stadium    66
Name: venue, dtype: int64

#### 17. The plot() method
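`Series.plot()` and `DataFrame.plot()` draw with Matplotlib and return a Matplotlib `Axes`. A minimal sketch plotting made-up win counts as a bar chart, as one might do with `match['winner'].value_counts()`; the `Agg` backend is only there so the snippet also runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up win counts standing in for match['winner'].value_counts()
wins = pd.Series({"Team A": 5, "Team B": 3, "Team C": 2})

ax = wins.plot(kind="bar", title="Matches won")  # returns a matplotlib Axes
ax.set_ylabel("Wins")
plt.tight_layout()
```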

<h3 style="color:#00a65a">Exercise 4 : Find the top 5 teams that have played the most matches</h3>
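Each match lists its two sides in `team1` and `team2`, so a team's total is its count in both columns. One possible approach: `Series.add` with `fill_value=0` keeps teams that appear in only one column instead of turning them into NaN. Sketched on made-up fixtures:

```python
import pandas as pd

df = pd.DataFrame({
    "team1": ["A", "B", "A", "C"],
    "team2": ["B", "C", "C", "D"],
})  # made-up fixtures

played = (
    df["team1"].value_counts()
    .add(df["team2"].value_counts(), fill_value=0)  # align on team name, missing counts as 0
    .astype(int)
    .sort_values(ascending=False)
)
print(played.head(5))
```

On the real data, `pd.concat([match['team1'], match['team2']]).value_counts().head()` is an equivalent one-liner.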

<h3 style="color:#00a65a">Exercise 5 : Find the player who has won the most player-of-the-match awards in Chennai</h3>

In [18]:
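# city names are capitalised in the data ('Chennai', not 'chennai'); a lowercase string matches no rows
match[match['city']=="Chennai"]['player_of_match'].value_counts().head(1)

In [None]:
mask=match['city']=='Chennai'
match.loc[mask, 'player_of_match'].value_counts().head(1)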


<h3 style="color:#00a65a">Exercise 6 : What percentage of teams opt to bat first after winning the toss?</h3>
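One way to get this percentage: comparing `toss_decision` with `'bat'` gives a boolean Series, and the mean of a boolean Series is the fraction of `True` values. Sketched on made-up decisions:

```python
import pandas as pd

# Made-up stand-in for match['toss_decision']
toss_decision = pd.Series(["field", "bat", "field", "field", "bat"])

bat_pct = (toss_decision == "bat").mean() * 100
print(f"{bat_pct:.1f}% chose to bat")   # 40.0% chose to bat
```

On the real data: `(match['toss_decision'] == 'bat').mean() * 100`.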

#### 18. The sort_values() method
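`sort_values()` returns a new DataFrame sorted by one or more columns; `ascending` can be a single flag or one flag per column. A sketch on made-up match results:

```python
import pandas as pd

df = pd.DataFrame({
    "winner": ["A", "B", "C", "D"],
    "win_by_runs": [35, 0, 97, 15],
    "season": [2017, 2017, 2016, 2016],
})  # made-up rows

by_margin = df.sort_values("win_by_runs", ascending=False)        # biggest margins first
multi = df.sort_values(["season", "win_by_runs"], ascending=[True, False])  # season, then margin

print(by_margin.head(2))
print(multi)
```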

#### 19. The set_index() method

In [21]:
match.set_index('id')

id,season,city,date,team1,team2,toss_winner,toss_decision,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue,umpire1,umpire2,umpire3
1,2017,Hyderabad,2017-04-05,Sunrisers Hyderabad,Royal Challengers Bangalore,Royal Challengers Bangalore,field,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
2,2017,Pune,2017-04-06,Mumbai Indians,Rising Pune Supergiant,Rising Pune Supergiant,field,normal,0,Rising Pune Supergiant,0,7,SPD Smith,Maharashtra Cricket Association Stadium,A Nand Kishore,S Ravi,
3,2017,Rajkot,2017-04-07,Gujarat Lions,Kolkata Knight Riders,Kolkata Knight Riders,field,normal,0,Kolkata Knight Riders,0,10,CA Lynn,Saurashtra Cricket Association Stadium,Nitin Menon,CK Nandan,
4,2017,Indore,2017-04-08,Rising Pune Supergiant,Kings XI Punjab,Kings XI Punjab,field,normal,0,Kings XI Punjab,0,6,GJ Maxwell,Holkar Cricket Stadium,AK Chaudhary,C Shamshuddin,
5,2017,Bangalore,2017-04-08,Royal Challengers Bangalore,Delhi Daredevils,Royal Challengers Bangalore,bat,normal,0,Royal Challengers Bangalore,15,0,KM Jadhav,M Chinnaswamy Stadium,,,
6,2017,Hyderabad,2017-04-09,Gujarat Lions,Sunrisers Hyderabad,Sunrisers Hyderabad,field,normal,0,Sunrisers Hyderabad,0,9,Rashid Khan,"Rajiv Gandhi International Stadium, Uppal",A Deshmukh,NJ Llong,
7,2017,Mumbai,2017-04-09,Kolkata Knight Riders,Mumbai Indians,Mumbai Indians,field,normal,0,Mumbai Indians,0,4,N Rana,Wankhede Stadium,Nitin Menon,CK Nandan,
8,2017,Indore,2017-04-10,Royal Challengers Bangalore,Kings XI Punjab,Royal Challengers Bangalore,bat,normal,0,Kings XI Punjab,0,8,AR Patel,Holkar Cricket Stadium,AK Chaudhary,C Shamshuddin,
9,2017,Pune,2017-04-11,Delhi Daredevils,Rising Pune Supergiant,Rising Pune Supergiant,field,normal,0,Delhi Daredevils,97,0,SV Samson,Maharashtra Cricket Association Stadium,AY Dandekar,S Ravi,
10,2017,Mumbai,2017-04-12,Sunrisers Hyderabad,Mumbai Indians,Mumbai Indians,field,normal,0,Mumbai Indians,0,4,JJ Bumrah,Wankhede Stadium,Nitin Menon,CK Nandan,


match  # still has its default index: set_index('id') above returned a new DataFrame

#### 20. The inplace parameter

In [26]:
match.set_index('id',inplace=True)
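With `inplace=True`, `set_index()` modifies the DataFrame itself and returns `None`; without it, the original is left untouched and a new DataFrame is returned, which can be reassigned. A sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "season": [2017, 2017, 2017]})  # made-up

indexed = df.set_index("id")          # new DataFrame; df keeps its default index
print(df.index.tolist())              # [0, 1, 2]

returned = df.set_index("id", inplace=True)  # modifies df itself
print(returned)                       # None
print(df.index.tolist())              # [1, 2, 3]
```

`df = df.set_index('id')` is an equivalent way to get the same result without `inplace`.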

#### 21. The sort_index() method

In [30]:
match.sort_index(ascending=False)

#### 22. The reset_index() method

In [34]:
match.reset_index(inplace=True)

#### 23. Maths functions

In [35]:
match['season'].sum()

1279944
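Besides `sum()`, Series and DataFrames provide `mean()`, `median()`, `min()`, `max()`, `std()` and more, and missing values are skipped by default. A sketch on made-up run margins with one missing value:

```python
import numpy as np
import pandas as pd

runs = pd.Series([35, 0, 0, 15, 97, np.nan])  # made-up margins, one missing

print(runs.sum())              # 147.0 (NaN skipped)
print(runs.mean())             # 29.4 = 147 / 5 non-missing values
print(runs.median())           # 15.0
print(runs.max(), runs.min())  # 97.0 0.0
```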

#### 24. The drop_duplicates() method

In [39]:
match.drop_duplicates(subset=['city']).shape

(31, 19)
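`drop_duplicates(subset=...)` keeps one row per distinct value of the given columns; `keep='first'` (the default) or `keep='last'` decides which one. Exercise 7 relies on `keep='last'`, which picks each season's final row only if rows are in match order. A sketch on made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    "season": [2008, 2008, 2009, 2009],
    "winner": ["A", "B", "C", "D"],
})  # made-up; rows in match order

first = df.drop_duplicates(subset=["season"])              # first row per season
last = df.drop_duplicates(subset=["season"], keep="last")  # last row per season

print(first)
print(last)
```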

<h3 style="color:#00a65a">Exercise 7 : List down all the IPL winning teams year-wise?</h3>

In [40]:
match.drop_duplicates(subset=['season'],keep='last')[['season','winner']]

,season,winner
58,2017,Mumbai Indians
116,2008,Rajasthan Royals
173,2009,Deccan Chargers
233,2010,Chennai Super Kings
306,2011,Chennai Super Kings
380,2012,Kolkata Knight Riders
456,2013,Mumbai Indians
516,2014,Kolkata Knight Riders
575,2015,Mumbai Indians
635,2016,Sunrisers Hyderabad

#### 25. The groupby() method
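`groupby()` splits rows into groups by one or more keys, applies an aggregation to each group, and combines the results. A sketch on made-up match results:

```python
import pandas as pd

df = pd.DataFrame({
    "season": [2008, 2008, 2008, 2009, 2009],
    "winner": ["A", "B", "A", "B", "B"],
    "win_by_runs": [10, 0, 30, 5, 0],
})  # made-up results

per_season = df.groupby("season").size()                 # matches per season
max_margin = df.groupby("season")["win_by_runs"].max()    # largest run margin per season
wins = df.groupby(["season", "winner"]).size()           # wins per team within each season

print(per_season)
print(max_margin)
print(wins)
```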


<h3 style="color:#00a65a">Exercise 8 : Find the top 5 most successful batsmen in the history of the IPL</h3>

In [43]:
def add(a,b):
    print(a+b)

In [44]:
add(3,4)

7


In [None]:
# making a function; only the signature was written, so `pass` is a placeholder body
def result(team1, team2, winner):
    pass
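One way to rank batsmen, measuring success by total runs, is to group the ball-by-ball data by batsman and sum the runs. This assumes `deliveries.csv` has `batsman` and `batsman_runs` columns (an assumption; check `delivery.columns` first). The rows below are made up:

```python
import pandas as pd

# Made-up ball-by-ball rows with the assumed column names
deliveries = pd.DataFrame({
    "batsman": ["P1", "P2", "P1", "P3", "P2", "P1"],
    "batsman_runs": [4, 1, 6, 0, 2, 1],
})

top_batsmen = (
    deliveries.groupby("batsman")["batsman_runs"]
    .sum()                          # total runs per batsman
    .sort_values(ascending=False)
    .head(5)
)
print(top_batsmen)   # P1 11, P2 3, P3 0
```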


<h3 style="color:#00a65a">Exercise 9 : Find the top 5 batsmen who have hit the most sixes</h3>
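A possible approach: keep only deliveries where the batsman scored 6 and count them per batsman. This treats every 6 off the bat as a six and assumes the `batsman` and `batsman_runs` column names; the rows are made up:

```python
import pandas as pd

deliveries = pd.DataFrame({
    "batsman": ["P1", "P2", "P1", "P3", "P2", "P1"],
    "batsman_runs": [6, 6, 6, 4, 1, 6],
})  # made-up rows, assumed column names

sixes = deliveries[deliveries["batsman_runs"] == 6]["batsman"].value_counts().head(5)
print(sixes)   # P1 3, P2 1
```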

<h3 style="color:#00a65a">Exercise 10 : Find the top 5 bowlers</h3>
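One reading of "top bowlers" is most wickets taken. Run-outs and similar dismissals are not credited to the bowler, so they are excluded. The `bowler`, `player_dismissed` and `dismissal_kind` column names and the exact dismissal labels are assumptions about the dataset; the rows are made up:

```python
import numpy as np
import pandas as pd

deliveries = pd.DataFrame({
    "bowler": ["B1", "B1", "B2", "B2", "B1", "B3"],
    "player_dismissed": ["X", np.nan, "Y", "Z", "W", np.nan],
    "dismissal_kind": ["caught", np.nan, "bowled", "run out", "lbw", np.nan],
})  # made-up rows

# Dismissal types not credited to the bowler (assumed label spellings)
not_bowler_wickets = ["run out", "retired hurt", "obstructing the field"]

wickets = deliveries[
    deliveries["player_dismissed"].notna()
    & ~deliveries["dismissal_kind"].isin(not_bowler_wickets)
]
top_bowlers = wickets["bowler"].value_counts().head(5)
print(top_bowlers)   # B1 2, B2 1
```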

<h3 style="color:#00a65a">Exercise 11 : Against which team has Virat Kohli scored the most runs?</h3>
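A possible approach: filter the deliveries faced by Kohli, sum his runs per opposing team, and take the maximum. The spelling `'V Kohli'` and the `batsman`, `bowling_team` and `batsman_runs` column names are assumptions about `deliveries.csv`; the rows are made up:

```python
import pandas as pd

deliveries = pd.DataFrame({
    "batsman": ["V Kohli", "V Kohli", "Other", "V Kohli"],
    "bowling_team": ["Team X", "Team Y", "Team X", "Team X"],
    "batsman_runs": [4, 6, 1, 4],
})  # made-up rows, assumed names

kohli = deliveries[deliveries["batsman"] == "V Kohli"]
runs_by_team = kohli.groupby("bowling_team")["batsman_runs"].sum()
print(runs_by_team.idxmax(), runs_by_team.max())   # Team X 8
```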

<h3 style="color:#00a65a">Exercise 12 : Against which bowler has R?</h3>