<h1 style="color:red">What is pandas?</h1>
<p>Pandas is an open source Python library used mainly for data analysis, data manipulation and data exploration.</p>

<p><i>[pandas] is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals. — Wikipedia</i></p>

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcShYvd6EGuSC3rbnEC0S-uyyVJdeuBDJnB8oUoKzeVXgj_Rx34A"/>

<p>The README in the official pandas GitHub repository describes pandas as “a Python package providing <b>fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive</b>. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language.”</p>

<h3>When can I use pandas?</h3>
<p>
    a. Calculate statistics and answer questions about the data, like<br>
        ------- What's the average, median, max, or min of each column?<br>
        ------- Does column A correlate with column B?<br>
        ------- What does the distribution of data in column C look like?<br>
    b. Clean the data by doing things like removing missing values and filtering rows or columns by some criteria<br>
    c. Visualize the data with help from Matplotlib. Plot bars, lines, histograms, bubbles, and more.<br>
    d. Store the cleaned, transformed data back into a CSV file, another file format, or a database<br>
</p>

<h3>What is so great about Pandas?</h3>
<p>
    1. It offers a very wide range of functionality, covering most everyday data-handling tasks.<br>
    2. Kickass documentation<br>
    3. Open Source - Active community and active development.<br>
    4. Plays well with other libraries like NumPy and scikit-learn<br>
    
    
</p>


<h3>Pandas Popularity</h3>
<img src="https://storage.googleapis.com/lds-media/images/the-rise-in-popularity-of-pandas.width-1200.png"/>

<h1 style="color:red">Importing Stuff</h1>

#### 1. Import Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


#### 2. Import Datasets

In [2]:
match=pd.read_csv('matches.csv') # read_csv loads a CSV file into a DataFrame
delivery=pd.read_csv('deliveries.csv')
company=pd.read_csv('Fortune501.csv')
titanic=pd.read_csv('titanic.csv')
food=pd.read_csv('food.csv')

In [5]:
food

Unnamed: 0,Name,Gender,City,Frequency,Item,Spends
0,Nitish,M,Kolkata,Weekly,Burger,11
1,Anu,F,Gurgaon,Daily,Sandwich,14
2,Mukku,M,Kolkata,Once,Vada,25
3,Suri,M,Kolkata,Monthly,Pizza,56
4,Rajiv,M,Patna,Never,Paneer,34
5,Vandanda,F,Patna,Once,Chicken,23
6,Piyush,M,Ranchi,Never,Chicken,67
7,Radhika,F,Mumbai,Monthly,Pizza,43
8,Sunil,M,Mumbai,Monthly,Vada,34
9,Madhuri,F,Pune,Daily,Paneer,66


<h1 style="color:red">Series and Dataframes</h1>

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcRK5vl7PcWTN02CXdNczGUYxwtuJRwuAueqfhzzca4Jq6RjH2CZ"/>

#### 1. The shape attribute

In [6]:
titanic.shape # (number of rows, number of columns)

(891, 12)

In [7]:
delivery.shape

(150460, 21)

In [8]:
match.shape

(636, 18)

#### 2. The columns attribute

In [9]:
titanic.columns # the labels of all columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

#### 3. The head() and tail() methods

In [10]:
# head() returns the first 5 rows by default
# tail() returns the last 5 rows by default
titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [11]:
titanic.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


In [12]:
titanic.head(3) # first 3 rows

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S


#### 4. The info() method

In [13]:
# summary of the DataFrame: column names, non-null counts, dtypes and memory usage
titanic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB


#### 5. The describe() method

In [14]:
# by default, summarizes only the numeric columns
# the 50% row is the median
titanic.describe()

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,257.353842,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0
25%,223.5,0.0,2.0,20.125,0.0,0.0,7.9104
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.4542
75%,668.5,1.0,3.0,38.0,1.0,0.0,31.0
max,891.0,1.0,3.0,80.0,8.0,6.0,512.3292


#### 6. The nunique() and unique() methods
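
This section has no cell yet, so here is a minimal sketch of the two methods. It uses a small made-up frame (the values are illustrative, not read from `titanic.csv`) so that it runs on its own.

```python
import pandas as pd

# Hypothetical toy data, not taken from titanic.csv
toy = pd.DataFrame({
    "Embarked": ["S", "C", "S", "Q", None],
    "Pclass": [3, 1, 3, 2, 3],
})

# unique() returns the distinct values of a Series in order of first appearance
# (a missing value counts as one of them)
ports = toy["Embarked"].unique()

# nunique() counts the distinct values; missing values are excluded by default
n_ports = toy["Embarked"].nunique()

# On a DataFrame, nunique() returns one count per column
per_column = toy.nunique()

print(ports, n_ports)
print(per_column)
```

On the real data the same calls would be `titanic['Embarked'].unique()` and `titanic['Embarked'].nunique()`.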

#### 7. The astype() method
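
A minimal sketch of `astype()`, assuming a toy frame with two columns named like the Titanic ones. The target dtypes here are only examples of what one might choose.

```python
import pandas as pd

# Hypothetical toy data with Titanic-style column names
toy = pd.DataFrame({"Survived": [0, 1, 1], "Fare": [7.25, 71.2833, 7.925]})

# astype() returns a converted copy, so assign it back to keep the change
toy["Survived"] = toy["Survived"].astype("bool")    # 0/1 integers become False/True
toy["Fare"] = toy["Fare"].astype("float32")         # smaller float type, less memory

print(toy.dtypes)
```

Converting to a smaller dtype can reduce the memory usage reported by `info()`, at the cost of precision for floats.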

#### 8. Extracting one column

In [16]:
# syntax: single square brackets with one column name
# the result is a Series
titanic['Age']


pandas.core.series.Series

In [17]:
titanic['Age'].values # only the values as a NumPy array, without the index
# nan marks a missing value


array([22.  , 38.  , 26.  , 35.  , 35.  ,   nan, 54.  ,  2.  , 27.  ,
       14.  ,  4.  , 58.  , 20.  , 39.  , 14.  , 55.  ,  2.  ,   nan,
       31.  ,   nan, 35.  , 34.  , 15.  , 28.  ,  8.  , 38.  ,   nan,
       19.  ,   nan,   nan, 40.  ,   nan,   nan, 66.  , 28.  , 42.  ,
         nan, 21.  , 18.  , 14.  , 40.  , 27.  ,   nan,  3.  , 19.  ,
         nan,   nan,   nan,   nan, 18.  ,  7.  , 21.  , 49.  , 29.  ,
       65.  ,   nan, 21.  , 28.5 ,  5.  , 11.  , 22.  , 38.  , 45.  ,
        4.  ,   nan,   nan, 29.  , 19.  , 17.  , 26.  , 32.  , 16.  ,
       21.  , 26.  , 32.  , 25.  ,   nan,   nan,  0.83, 30.  , 22.  ,
       29.  ,   nan, 28.  , 17.  , 33.  , 16.  ,   nan, 23.  , 24.  ,
       29.  , 20.  , 46.  , 26.  , 59.  ,   nan, 71.  , 23.  , 34.  ,
       34.  , 28.  ,   nan, 21.  , 33.  , 37.  , 28.  , 21.  ,   nan,
       38.  ,   nan, 47.  , 14.5 , 22.  , 20.  , 17.  , 21.  , 70.5 ,
       29.  , 24.  ,  2.  , 21.  ,   nan, 32.5 , 32.5 , 54.  , 12.  ,
         nan, 24.  ,

In [18]:
type(titanic['Age'])

pandas.core.series.Series

In [19]:
# the index (row labels) of the column
titanic['Age'].index

RangeIndex(start=0, stop=891, step=1)

#### 9. Extracting multiple columns

In [20]:
# pass a Python list of column names to select several columns; the result is a DataFrame
titanic[['Pclass','Name','Age']]


Unnamed: 0,Pclass,Name,Age
0,3,"Braund, Mr. Owen Harris",22.0
1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",38.0
2,3,"Heikkinen, Miss. Laina",26.0
3,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",35.0
4,3,"Allen, Mr. William Henry",35.0
5,3,"Moran, Mr. James",
6,1,"McCarthy, Mr. Timothy J",54.0
7,3,"Palsson, Master. Gosta Leonard",2.0
8,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",27.0
9,2,"Nasser, Mrs. Nicholas (Adele Achem)",14.0


In [21]:
titanic[['Pclass','Name','Age']].shape

(891, 3)

#### 10. Creating a new column

In [23]:
# SibSp = number of siblings/spouses aboard, Parch = number of parents/children aboard (0 means none)
# assigning a single value fills every row of the new column
titanic['Family']="None"

In [24]:
titanic.head(2)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Family
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,
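
The cell above fills the new column with a constant. A new column can also be computed from existing ones. The sketch below is one possible way to do that, on made-up rows; the column name `FamilySize` is hypothetical and not part of the dataset.

```python
import pandas as pd

# Hypothetical toy rows with the same SibSp / Parch columns as the Titanic data
toy = pd.DataFrame({"SibSp": [1, 1, 0], "Parch": [0, 0, 2]})

# Column arithmetic is element-wise: each row gets SibSp + Parch
toy["FamilySize"] = toy["SibSp"] + toy["Parch"]

print(toy)
```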


#### 11. Extracting one row

In [25]:
# iloc selects by integer position: just pass the position of the row
titanic.iloc[0]

PassengerId                          1
Survived                             0
Pclass                               3
Name           Braund, Mr. Owen Harris
Sex                               male
Age                                 22
SibSp                                1
Parch                                0
Ticket                       A/5 21171
Fare                              7.25
Cabin                              NaN
Embarked                             S
Family                            None
Name: 0, dtype: object

In [26]:
# use a slice to get a range of rows (the end position is excluded)
titanic.iloc[0:3]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Family
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,


In [27]:
# for extracting particular rows and columns: row slice first, then column slice
titanic.iloc[0:3,0:3]

Unnamed: 0,PassengerId,Survived,Pclass
0,1,0,3
1,2,1,1
2,3,1,3


In [28]:
# a step of 2 takes every second row and every second column
titanic.iloc[::2,::2]

Unnamed: 0,PassengerId,Pclass,Sex,SibSp,Ticket,Cabin,Family
0,1,3,male,1,A/5 21171,,
2,3,3,female,0,STON/O2. 3101282,,
4,5,3,male,0,373450,,
6,7,1,male,0,17463,E46,
8,9,3,female,0,347742,,
10,11,3,female,1,PP 9549,G6,
12,13,3,male,0,A/5. 2151,,
14,15,3,female,0,350406,,
16,17,3,male,4,382652,,
18,19,3,female,1,345763,,


In [29]:
# fancy indexing: a list of positions picks specific columns
titanic.iloc[::2,[0,2,3,4]]


Unnamed: 0,PassengerId,Pclass,Name,Sex
0,1,3,"Braund, Mr. Owen Harris",male
2,3,3,"Heikkinen, Miss. Laina",female
4,5,3,"Allen, Mr. William Henry",male
6,7,1,"McCarthy, Mr. Timothy J",male
8,9,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female
10,11,3,"Sandstrom, Miss. Marguerite Rut",female
12,13,3,"Saundercock, Mr. William Henry",male
14,15,3,"Vestrom, Miss. Hulda Amanda Adolfina",female
16,17,3,"Rice, Master. Eugene",male
18,19,3,"Vander Planke, Mrs. Julius (Emelia Maria Vande...",female


#### 12. Extracting multiple rows

In [None]:
# covered in section 11 above, e.g. titanic.iloc[0:3]

#### 13. Extracting both rows and columns

In [None]:
# covered in section 11 above, e.g. titanic.iloc[0:3,0:3]

#### 14. The value_counts() method

In [35]:
# value_counts() gives the frequency of each unique value, most frequent first
# head() then keeps the 5 highest counts
match['player_of_match'].value_counts().head()

CH Gayle          18
YK Pathan         16
AB de Villiers    15
DA Warner         15
RG Sharma         14
Name: player_of_match, dtype: int64

#### 15. Filtering data based on a condition

In [37]:
# returns a boolean Series: True for the rows where city is Kolkata, False otherwise
match['city']=="Kolkata"

0      False
1      False
2      False
3      False
4      False
5      False
6      False
7      False
8      False
9      False
10      True
11     False
12     False
13      True
14     False
15     False
16     False
17     False
18     False
19     False
20     False
21     False
22      True
23     False
24     False
25     False
26      True
27     False
28     False
29     False
       ...  
606    False
607     True
608    False
609    False
610    False
611    False
612    False
613     True
614    False
615    False
616    False
617    False
618    False
619    False
620     True
621    False
622    False
623     True
624    False
625    False
626    False
627    False
628    False
629    False
630     True
631    False
632    False
633    False
634    False
635    False
Name: city, Length: 636, dtype: bool

In [39]:
match[match['city']=="Kolkata"]

(61, 18)

In [None]:
# the number of rows tells how many matches were played in Kolkata
match[match['city']=="Kolkata"].shape

In [43]:
# this fails: & binds more tightly than ==, so each comparison needs its own parentheses
match[match['city']=="Kolkata" & match['season']=="2008"].shape

TypeError: cannot compare a dtyped [int64] array with a scalar of type [bool]

#### 16. Filtering data based on multiple conditions
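
The failing cell above mixes `&` and `==` without parentheses. A minimal sketch of the usual pattern follows, on a made-up frame. It assumes `season` holds integers, as the `int64` in the error message suggests, so the comparison uses `2008` rather than the string `"2008"`.

```python
import pandas as pd

# Hypothetical toy data with the same column names as matches.csv
toy = pd.DataFrame({
    "city": ["Kolkata", "Mumbai", "Kolkata", "Kolkata"],
    "season": [2008, 2008, 2009, 2008],
})

# Wrap every condition in parentheses, then combine with & (and) or | (or)
both = toy[(toy["city"] == "Kolkata") & (toy["season"] == 2008)]
either = toy[(toy["city"] == "Mumbai") | (toy["season"] == 2009)]

print(both.shape)
print(either.shape)
```

The Python keywords `and` / `or` do not work here, because they try to reduce a whole Series to a single True or False.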

<h3 style="color:#00a65a">Exercise 1 : Find the total number of matches that have been played in the IPL</h3>

<h3 style="color:#00a65a">Exercise 2 : Find the top 5 teams in terms of number of matches won</h3>

<h3 style="color:#00a65a">Exercise 3 : At which venue have the most matches been played?</h3>

#### 17. The plot() method
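
A minimal sketch of `plot()` on a Series, using invented counts. The `Agg` backend line is an assumption made so the example runs without a display; inside a notebook it can be left out and the chart appears inline.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed

import pandas as pd

# Hypothetical counts, e.g. what value_counts() might return
wins = pd.Series({"Team A": 5, "Team B": 3, "Team C": 1})

# plot() draws with Matplotlib and returns the Axes object it drew on
ax = wins.plot(kind="bar", title="Wins per team (toy data)")
ax.set_ylabel("wins")
```

Other values of `kind`, such as `"line"`, `"hist"` or `"pie"`, select different chart types.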

<h3 style="color:#00a65a">Exercise 4 : Find the top 5 teams that have played the most matches</h3>

<h3 style="color:#00a65a">Exercise 5 : Find the player who has won the most player of the match awards in Chennai</h3>

<h3 style="color:#00a65a">Exercise 6 : What percentage of teams opt to bat first after winning the toss?</h3>

#### 18. The sort_values() method
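
A minimal sketch of `sort_values()`, assuming a made-up table of teams. It shows sorting by one column and by two columns with different directions.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "team": ["A", "B", "C", "D"],
    "wins": [3, 7, 5, 7],
    "losses": [4, 2, 1, 0],
})

# Sort by one column, largest first
by_wins = toy.sort_values("wins", ascending=False)

# Sort by wins (descending), breaking ties by losses (ascending)
by_both = toy.sort_values(["wins", "losses"], ascending=[False, True])

print(by_both)
```

`sort_values()` returns a new DataFrame; the original row labels travel with their rows.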

#### 19. The set_index() method
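
A minimal sketch of `set_index()`, on a toy frame with shortened, made-up names. It turns a column into the row labels so that rows can be looked up by that value with `loc`.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["Braund", "Cumings", "Heikkinen"],
})

# set_index() returns a new DataFrame whose index is the chosen column
indexed = toy.set_index("PassengerId")

# Rows can now be selected by label with loc
name_of_2 = indexed.loc[2, "Name"]

print(indexed)
print(name_of_2)
```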

#### 20. The inplace parameter
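
A minimal sketch of what `inplace=True` changes, using `sort_values()` on a toy frame as the example method. Without it, the method returns a new object and leaves the original alone; with it, the original is modified and the method returns `None`.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"score": [2, 3, 1]})

# Default: a sorted copy is returned, toy itself is unchanged
sorted_copy = toy.sort_values("score")
order_after_copy = toy["score"].tolist()

# inplace=True: toy itself is sorted and the call returns None
returned = toy.sort_values("score", inplace=True)
order_after_inplace = toy["score"].tolist()

print(order_after_copy, order_after_inplace, returned)
```

Because the in-place call returns `None`, writing `toy = toy.sort_values("score", inplace=True)` would replace the DataFrame with `None`.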

#### 21. The sort_index() method
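
A minimal sketch of `sort_index()`, which orders rows by their labels rather than by a column's values. The frame and its labels are made up.

```python
import pandas as pd

# Hypothetical toy data with an unsorted string index
toy = pd.DataFrame({"runs": [10, 30, 20]}, index=["c", "a", "b"])

# Sort rows by index label, ascending by default
asc = toy.sort_index()

# Descending order of the labels
desc = toy.sort_index(ascending=False)

print(asc)
print(desc)
```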

#### 22. The reset_index() method
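
A minimal sketch of `reset_index()`, the reverse of `set_index()`, on a made-up frame. It moves the index back into a regular column, or discards it with `drop=True`, and restores the default 0, 1, 2, ... labels.

```python
import pandas as pd

# Hypothetical toy data with a named index
toy = pd.DataFrame(
    {"runs": [10, 30]},
    index=pd.Index(["A", "B"], name="batsman"),
)

# The index becomes a normal column called "batsman"
flat = toy.reset_index()

# drop=True throws the old index away instead of keeping it as a column
dropped = toy.reset_index(drop=True)

print(flat)
print(dropped)
```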

#### 23. Maths functions
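
A minimal sketch of a few common aggregation methods, on made-up numbers with one missing value. By default these methods skip missing values.

```python
import pandas as pd

# Hypothetical toy data; None becomes NaN in a float column
toy = pd.DataFrame({
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.25, 8.0, 53.5],
})

mean_age = toy["Age"].mean()     # average of the 3 known ages
known_ages = toy["Age"].count()  # number of non-missing values
total_fare = toy["Fare"].sum()
max_fare = toy["Fare"].max()
min_fare = toy["Fare"].min()

print(mean_age, known_ages, total_fare, max_fare, min_fare)
```

Called on a whole DataFrame, for example `toy.mean()`, the same methods return one result per column.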

#### 24. The drop_duplicates() method
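
A minimal sketch of `drop_duplicates()`, assuming a made-up frame with two rows per season. The `subset` argument says which columns decide whether rows count as duplicates, and `keep` says which of them survives.

```python
import pandas as pd

# Hypothetical toy data: two rows per season
toy = pd.DataFrame({
    "season": [2008, 2008, 2009, 2009],
    "winner": ["A", "B", "C", "D"],
})

# Keep the first row seen for each season (the default)
first_per_season = toy.drop_duplicates(subset="season")

# Keep the last row seen for each season instead
last_per_season = toy.drop_duplicates(subset="season", keep="last")

print(first_per_season)
print(last_per_season)
```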

<h3 style="color:#00a65a">Exercise 7 : List all the IPL winning teams year by year</h3>

#### 25. The groupby() method
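
A minimal sketch of `groupby()`, on invented ball-by-ball rows. The column names `batsman` and `runs` are hypothetical and may not match those in `deliveries.csv`. The pattern is: split the rows into groups, apply an aggregation to each group, and combine the results.

```python
import pandas as pd

# Hypothetical toy data: one row per ball faced
toy = pd.DataFrame({
    "batsman": ["A", "B", "A", "C", "B", "A"],
    "runs": [4, 6, 1, 0, 2, 6],
})

# Total runs per batsman, highest first
totals = toy.groupby("batsman")["runs"].sum().sort_values(ascending=False)

# Number of rows (balls) in each group
balls = toy.groupby("batsman").size()

print(totals)
print(balls)
```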

<h3 style="color:#00a65a">Exercise 8 : Find the top 5 most successful batsmen in the history of the IPL</h3>

<h3 style="color:#00a65a">Exercise 9 : Find the top 5 batsmen who have hit the most sixes</h3>

<h3 style="color:#00a65a">Exercise 10 : Find the top 5 bowlers</h3>

<h3 style="color:#00a65a">Exercise 11 : Against which team has Virat Kohli scored the most runs?</h3>

<h3 style="color:#00a65a">Exercise 12 : Against which bowler has R?</h3>