<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcShYvd6EGuSC3rbnEC0S-uyyVJdeuBDJnB8oUoKzeVXgj_Rx34A" style="position:center"/>
<h1 style="color:green">What is pandas?</h1>
<p style= "font-size:18px">
    Pandas is an open-source Python library used mainly for data analysis, data manipulation, and data exploration.
</p>
<p style= "font-size:15px">
    The README in the official pandas GitHub repository describes pandas as a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language.
</p>

<h2 style="color:green">When can I use pandas?</h2>
<ol style= "font-size:16px">
    <li>Calculate statistics and answer questions about the data, such as:
        <ul>
        <li>What's the average, median, max, or min of each column?</li>
        <li>Does column A correlate with column B?</li>
        <li>What does the distribution of data in column C look like?</li>
        </ul>
    </li>
    <li>Clean the data by doing things like removing missing values and filtering rows or columns by some criteria</li>
    <li>Visualize the data with help from Matplotlib. Plot bars, lines, histograms, bubbles, and more.</li>
    <li>Store the cleaned, transformed data back into a CSV file, another file format, or a database.</li>
</ol>
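
The steps in the list above can be sketched on a tiny table. This is only an illustration: the column names `A` and `B` and their values are made up, the CSV is written to an in-memory buffer rather than a real file, and the plotting step is left out because it needs a display.

```python
import io
import pandas as pd

# Tiny stand-in table (hypothetical values) with one missing entry in B.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, None, 8.0],
})

# Statistics: per-column mean (missing values are skipped by default)
# and the correlation between column A and column B.
means = df.mean()
corr_ab = df["A"].corr(df["B"])

# Cleaning: drop rows with missing values, then keep only rows where A > 1.
clean = df.dropna()
clean = clean[clean["A"] > 1]

# Storing: write the cleaned data as CSV (here into an in-memory buffer).
buffer = io.StringIO()
clean.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Passing a file name such as `'cleaned.csv'` (a hypothetical path) to `to_csv` instead of the buffer would write the same content to disk.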

<h2 style="color:green">What is so great about Pandas?</h2>
<ol style= "font-size:16px">
    <li>It offers a very wide range of functionality that covers most data-handling scenarios.</li>
    <li>Excellent documentation.</li>
    <li>Open source, with an active community and active development.</li>
    <li>Plays well with other libraries such as NumPy and scikit-learn.</li>
</ol>
<hr>

<h1 style="color:red">Importing Libraries and Datasets</h1>

### 1. Import Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### 2. Import Datasets

In [2]:
match=pd.read_csv('matches.csv')
delivery=pd.read_csv('deliveries.csv')
company=pd.read_csv('Fortune501.csv')
titanic=pd.read_csv('titanic.csv')
food=pd.read_csv('food.csv')
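
The CSV files above are not bundled with this text, so here is a minimal sketch of `read_csv` that runs without them. It assumes a small in-memory stand-in holding the first three rows of `food.csv` as they appear later in this notebook; `usecols` and `nrows` are two optional parameters that are handy with larger files.

```python
import io
import pandas as pd

# Stand-in for a file on disk: the first rows of food.csv, typed in by hand.
csv_text = """Name,Gender,City,Frequency,Item,Spends
Nitish,M,Kolkata,Weekly,Burger,11
Anu,F,Gurgaon,Daily,Sandwich,14
Mukku,M,Kolkata,Once,Vada,25
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

# usecols keeps only the named columns; nrows limits how many rows are read.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Spends"], nrows=2)
```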

## 1. Shape of the Datasets

In [3]:
match.shape

(636, 18)

In [4]:
delivery.shape

(150460, 21)

In [5]:
company.shape

(500, 8)

In [6]:
titanic.shape

(891, 12)

In [7]:
food.shape

(50, 6)
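
`shape` is a `(rows, columns)` tuple, so it can be unpacked directly. A minimal sketch, assuming a three-row stand-in table rather than the real datasets:

```python
import pandas as pd

# Small stand-in table; values copied from the first rows of the food data.
df = pd.DataFrame({"Name": ["Nitish", "Anu", "Mukku"], "Spends": [11, 14, 25]})

# shape is an attribute, not a method, so there are no parentheses.
n_rows, n_cols = df.shape

row_count = len(df)    # len() gives the number of rows
cell_count = df.size   # size is rows * columns
```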

## 2. Column names of the Datasets

In [8]:
match.columns

Index(['id', 'season', 'city', 'date', 'team1', 'team2', 'toss_winner',
       'toss_decision', 'result', 'dl_applied', 'winner', 'win_by_runs',
       'win_by_wickets', 'player_of_match', 'venue', 'umpire1', 'umpire2',
       'umpire3'],
      dtype='object')

In [9]:
delivery.columns

Index(['match_id', 'inning', 'batting_team', 'bowling_team', 'over', 'ball',
       'batsman', 'non_striker', 'bowler', 'is_super_over', 'wide_runs',
       'bye_runs', 'legbye_runs', 'noball_runs', 'penalty_runs',
       'batsman_runs', 'extra_runs', 'total_runs', 'player_dismissed',
       'dismissal_kind', 'fielder'],
      dtype='object')

In [10]:
company.columns

Index(['Rank', 'Title', 'Employees', 'Sector', 'Industry', 'Hqlocation',
       'Revenues', 'Profits'],
      dtype='object')

In [11]:
food.columns

Index(['Name', 'Gender', 'City', 'Frequency', 'Item', 'Spends'], dtype='object')

In [12]:
titanic.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
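
`columns` returns an `Index` object, which can be turned into a plain list, tested for membership, or used as the basis for renaming. A minimal sketch on a stand-in table; the new name `Amount` is only an illustration:

```python
import pandas as pd

# Small stand-in table with three of the food columns.
df = pd.DataFrame({
    "Name": ["Nitish", "Anu"],
    "City": ["Kolkata", "Gurgaon"],
    "Spends": [11, 14],
})

column_names = df.columns.tolist()   # plain Python list of names
has_city = "City" in df.columns      # membership test on the Index

# rename returns a new DataFrame; df itself is left unchanged.
renamed = df.rename(columns={"Spends": "Amount"})
```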

## 3. First and last rows of the Datasets  [ head() and tail() methods ]

In [13]:
food.head(5)

Unnamed: 0,Name,Gender,City,Frequency,Item,Spends
0,Nitish,M,Kolkata,Weekly,Burger,11
1,Anu,F,Gurgaon,Daily,Sandwich,14
2,Mukku,M,Kolkata,Once,Vada,25
3,Suri,M,Kolkata,Monthly,Pizza,56
4,Rajiv,M,Patna,Never,Paneer,34


In [14]:
food.tail(5)

Unnamed: 0,Name,Gender,City,Frequency,Item,Spends
45,Meenal,F,Pune,Monthly,Pizza,67
46,Akshay,M,Ranchi,Daily,Paneer,32
47,Gurpreet,M,Kolkata,Never,Pizza,56
48,Kishore,M,Gurgaon,Never,Vada,22
49,Jaideep,M,Chennai,Once,Burger,44
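
Both methods default to five rows when called without an argument, and a negative argument means "everything except that many rows". A minimal sketch, assuming a stand-in table with a single hypothetical column `x`:

```python
import pandas as pd

# Eight-row stand-in table so the default of five rows is visible.
df = pd.DataFrame({"x": range(8)})

default_head = df.head()     # first 5 rows
last_two = df.tail(2)        # last 2 rows
all_but_last_two = df.head(-2)  # every row except the last 2
```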


## 4. The info() method

In [15]:
food.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Name       50 non-null     object
 1   Gender     50 non-null     object
 2   City       50 non-null     object
 3   Frequency  50 non-null     object
 4   Item       50 non-null     object
 5   Spends     50 non-null     int64 
dtypes: int64(1), object(5)
memory usage: 2.5+ KB


In [16]:
titanic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
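
The non-null counts in `info()` reveal missing data: `Age` has 714 non-null entries out of 891, so 177 values are missing. The same count can be obtained directly with `isna().sum()`. A minimal sketch on a stand-in table with made-up values, not the real Titanic data:

```python
import pandas as pd

# Stand-in table with a few missing entries (hypothetical values).
df = pd.DataFrame({
    "Age": [22.0, None, 26.0],
    "Cabin": [None, "C85", None],
    "Fare": [7.25, 71.28, 7.92],
})

missing = df.isna().sum()    # missing values per column
non_null = df.notna().sum()  # same numbers that info() reports as "Non-Null Count"
```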


## 5. The describe() method

In [17]:
food.describe()

Unnamed: 0,Spends
count,50.0
mean,55.58
std,24.326931
min,11.0
25%,34.0
50%,55.5
75%,75.5
max,99.0


In [18]:
titanic.describe()

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,257.353842,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0
25%,223.5,0.0,2.0,20.125,0.0,0.0,7.9104
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.4542
75%,668.5,1.0,3.0,38.0,1.0,0.0,31.0
max,891.0,1.0,3.0,80.0,8.0,6.0,512.3292
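
By default `describe()` on a DataFrame summarizes only the numeric columns, which is why `food.describe()` shows `Spends` alone. Calling `describe()` on a text column gives a different summary: count, number of unique values, most frequent value, and its frequency. A minimal sketch on a three-row stand-in table:

```python
import pandas as pd

# Stand-in table with one text column and one numeric column.
df = pd.DataFrame({"Item": ["Pizza", "Vada", "Pizza"], "Spends": [56, 25, 89]})

numeric_summary = df.describe()       # numeric columns only
text_summary = df["Item"].describe()  # count, unique, top, freq
```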


## 6. The nunique() method
<p style = "font-size:20px" >Number of unique values in each column</p>

In [19]:
titanic.nunique()

PassengerId    891
Survived         2
Pclass           3
Name           891
Sex              2
Age             88
SibSp            7
Parch            7
Ticket         681
Fare           248
Cabin          147
Embarked         3
dtype: int64

In [20]:
food.nunique()

Name         50
Gender        2
City          7
Frequency     5
Item          6
Spends       34
dtype: int64
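
Missing values are not counted by `nunique()` unless `dropna=False` is passed. A minimal sketch, assuming a stand-in column with one missing entry:

```python
import pandas as pd

# Stand-in column with two distinct values and one missing entry.
embarked = pd.Series(["S", "C", None, "S"])

distinct = embarked.nunique()                            # missing value ignored
distinct_with_missing = embarked.nunique(dropna=False)   # missing value counted
```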

## 7. Extract one column

In [21]:
food.columns

Index(['Name', 'Gender', 'City', 'Frequency', 'Item', 'Spends'], dtype='object')

In [22]:
food['Item']

0       Burger
1     Sandwich
2         Vada
3        Pizza
4       Paneer
5      Chicken
6      Chicken
7        Pizza
8         Vada
9       Paneer
10       Pizza
11      Burger
12    Sandwich
13    Sandwich
14        Vada
15      Paneer
16       Pizza
17       Pizza
18    Sandwich
19        Vada
20      Burger
21      Burger
22      Burger
23      Paneer
24       Pizza
25        Vada
26        Vada
27    Sandwich
28      Burger
29    Sandwich
30       Pizza
31      Paneer
32    Sandwich
33        Vada
34        Vada
35    Sandwich
36      Paneer
37       Pizza
38      Burger
39    Sandwich
40       Pizza
41      Paneer
42        Vada
43        Vada
44    Sandwich
45       Pizza
46      Paneer
47       Pizza
48        Vada
49      Burger
Name: Item, dtype: object

## 8. Find the unique values in a column

In [23]:
food['Item'].unique()

array(['Burger', 'Sandwich', 'Vada', 'Pizza', 'Paneer', 'Chicken'],
      dtype=object)
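
`unique()` returns the values in order of first appearance, not sorted, and its length matches what `nunique()` reports when nothing is missing. A minimal sketch on a short stand-in column:

```python
import pandas as pd

# Stand-in column with repeated values.
items = pd.Series(["Burger", "Sandwich", "Vada", "Burger", "Vada"])

unique_items = items.unique()         # order of first appearance
sorted_items = sorted(unique_items)   # plain sorted list, if order matters
```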

## 9. Extracting multiple columns

In [24]:
food.columns

Index(['Name', 'Gender', 'City', 'Frequency', 'Item', 'Spends'], dtype='object')

In [25]:
food[['Name','City','Item']].head(5)

Unnamed: 0,Name,City,Item
0,Nitish,Kolkata,Burger
1,Anu,Gurgaon,Sandwich
2,Mukku,Kolkata,Vada
3,Suri,Kolkata,Pizza
4,Rajiv,Patna,Paneer
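
The number of brackets decides the return type: a single name gives a Series, while a list of names (even a list of one) gives a DataFrame. A minimal sketch on a stand-in table built from the first rows of the food data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Nitish", "Anu", "Mukku"],
    "City": ["Kolkata", "Gurgaon", "Kolkata"],
    "Item": ["Burger", "Sandwich", "Vada"],
})

one_col = df["Item"]             # Series
one_col_frame = df[["Item"]]     # DataFrame with a single column
two_cols = df[["Name", "Item"]]  # DataFrame with the listed columns, in that order
```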


## 10. Add a new column

In [26]:
food['New Column'] = "NULL"
food.head(5)

Unnamed: 0,Name,Gender,City,Frequency,Item,Spends,New Column
0,Nitish,M,Kolkata,Weekly,Burger,11,NULL
1,Anu,F,Gurgaon,Daily,Sandwich,14,NULL
2,Mukku,M,Kolkata,Once,Vada,25,NULL
3,Suri,M,Kolkata,Monthly,Pizza,56,NULL
4,Rajiv,M,Patna,Never,Paneer,34,NULL


In [27]:
food=pd.read_csv('food.csv')
food.head(5)

Unnamed: 0,Name,Gender,City,Frequency,Item,Spends
0,Nitish,M,Kolkata,Weekly,Burger,11
1,Anu,F,Gurgaon,Daily,Sandwich,14
2,Mukku,M,Kolkata,Once,Vada,25
3,Suri,M,Kolkata,Monthly,Pizza,56
4,Rajiv,M,Patna,Never,Paneer,34
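
The cell above removes the extra column by re-reading the CSV file. As an alternative sketch, a column can be computed from existing ones and later removed with `drop`, without touching the file. The column names `Spends Doubled` and `High` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Nitish", "Anu", "Mukku"], "Spends": [11, 14, 25]})

# Assigning to a new name adds the column in place.
df["Spends Doubled"] = df["Spends"] * 2

# assign returns a new DataFrame with the extra column; df is not modified.
with_flag = df.assign(High=df["Spends"] > 20)

# drop returns a DataFrame without the named column.
df = df.drop(columns=["Spends Doubled"])
```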


## 11. Extracting one row  ( .iloc[ ] )

In [28]:
food.iloc[3]

Name            Suri
Gender             M
City         Kolkata
Frequency    Monthly
Item           Pizza
Spends            56
Name: 3, dtype: object

In [29]:
food.iloc[[3]]

Unnamed: 0,Name,Gender,City,Frequency,Item,Spends
3,Suri,M,Kolkata,Monthly,Pizza,56
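
The two cells above differ only in their brackets: `iloc[3]` returns the row as a Series, `iloc[[3]]` returns a one-row DataFrame. `iloc` always works by position, while `loc` works by index label; the two only look the same when the index is the default 0, 1, 2, ... A minimal sketch, assuming a stand-in table with a made-up index of 10, 20, 30:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Nitish", "Anu", "Mukku"], "Spends": [11, 14, 25]},
    index=[10, 20, 30],  # hypothetical labels, chosen to differ from positions
)

row_as_series = df.iloc[1]    # second row by position, as a Series
row_as_frame = df.iloc[[1]]   # same row, as a one-row DataFrame
by_label = df.loc[20]         # same row again, selected by its label
```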


## 12. Extracting multiple rows

In [30]:
food.iloc[[4,5]]

Unnamed: 0,Name,Gender,City,Frequency,Item,Spends
4,Rajiv,M,Patna,Never,Paneer,34
5,Vandanda,F,Patna,Once,Chicken,23


## 13. Extracting both rows and columns

In [31]:
food.iloc[[1,2,3],[0,1,3]]

Unnamed: 0,Name,Gender,Frequency
1,Anu,F,Daily
2,Mukku,M,Once
3,Suri,M,Monthly


In [32]:
food.iloc[::2,[0,1,5]].head(10)

Unnamed: 0,Name,Gender,Spends
0,Nitish,M,11
2,Mukku,M,25
4,Rajiv,M,34
6,Piyush,M,67
8,Sunil,M,34
10,Sonal,F,89
12,Vineet,M,34
14,Ranbir,M,55
16,Pooja,F,34
18,Aditya,M,99
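
`iloc` also accepts slices, with the usual Python rule that the end position is excluded. `loc` does the same job with labels and column names, but its slices include the end label. A minimal sketch on a stand-in table built from the first five rows of the food data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Nitish", "Anu", "Mukku", "Suri", "Rajiv"],
    "City": ["Kolkata", "Gurgaon", "Kolkata", "Kolkata", "Patna"],
    "Item": ["Burger", "Sandwich", "Vada", "Pizza", "Paneer"],
})

first_three = df.iloc[0:3, 0:2]          # rows 0-2, columns 0-1 (end excluded)
every_other = df.iloc[::2]               # every second row
by_name = df.loc[1:3, ["Name", "Item"]]  # labels 1 through 3 (end included)
```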


## 14. The value_counts() method
<p style = "font-size:20px" >How many times each unique value appears in a column</p>

In [33]:
food['Item'].value_counts()

Pizza       11
Vada        11
Sandwich    10
Burger       8
Paneer       8
Chicken      2
Name: Item, dtype: int64
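
Passing `normalize=True` turns the counts into proportions, which is often easier to read. A minimal sketch on a four-value stand-in column:

```python
import pandas as pd

items = pd.Series(["Pizza", "Vada", "Pizza", "Burger"])

counts = items.value_counts()                # sorted from most to least frequent
shares = items.value_counts(normalize=True)  # fractions that sum to 1
```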

## 15. Filtering data based on a condition

In [34]:
pizza_filter = food['Item'] == "Pizza"
pizza_filter.head(5)

0    False
1    False
2    False
3     True
4    False
Name: Item, dtype: bool

In [35]:
food[pizza_filter]

Unnamed: 0,Name,Gender,City,Frequency,Item,Spends
3,Suri,M,Kolkata,Monthly,Pizza,56
7,Radhika,F,Mumbai,Monthly,Pizza,43
10,Sonal,F,Pune,Monthly,Pizza,89
16,Pooja,F,Chennai,Daily,Pizza,34
17,Neha,F,Pune,Never,Pizza,88
24,Anjani,F,Pune,Never,Pizza,27
30,Bijurika,F,Ranchi,Weekly,Pizza,90
37,Sachin,M,Mumbai,Daily,Pizza,99
40,Rohit,M,Chennai,Once,Pizza,54
45,Meenal,F,Pune,Monthly,Pizza,67
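
The same pattern works for any expression that produces a boolean Series: numeric comparisons, or `isin` to match against several values at once. A minimal sketch on a stand-in table built from the first five rows of the food data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Nitish", "Anu", "Mukku", "Suri", "Rajiv"],
    "Item": ["Burger", "Sandwich", "Vada", "Pizza", "Paneer"],
    "Spends": [11, 14, 25, 56, 34],
})

# Numeric condition: rows where Spends is above 20.
big_spenders = df[df["Spends"] > 20]

# isin: rows whose Item is any of the listed values.
burger_or_vada = df[df["Item"].isin(["Burger", "Vada"])]
```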


## 16. Filtering data based on multiple conditions

In [38]:
mask1 = food['Item'] == "Pizza"
mask2 = food['City'] == "Pune"

food[mask1 & mask2]

Unnamed: 0,Name,Gender,City,Frequency,Item,Spends
10,Sonal,F,Pune,Monthly,Pizza,89
17,Neha,F,Pune,Never,Pizza,88
24,Anjani,F,Pune,Never,Pizza,27
45,Meenal,F,Pune,Monthly,Pizza,67
