# Working with Missing Data in Pandas

<b>None:</b> A Python object used to represent missing values in object-type arrays.

<b>NaN:</b> A special floating-point value from NumPy which is recognized by all systems that use IEEE floating-point standards.
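
The difference between the two markers can be seen in a short sketch. The variable names `numeric` and `labels` are illustrative only and do not come from the original notebook.

```python
import numpy as np
import pandas as pd

# In a numeric Series, None is converted to NaN and the dtype becomes float64.
numeric = pd.Series([1, None, 3])
print(numeric.dtype)

# In an object-dtype Series, None is stored as a Python object.
labels = pd.Series(["a", None, "c"], dtype=object)
print(labels.dtype)

# Both markers are reported as missing by isna().
print(numeric.isna().tolist())
print(labels.isna().tolist())

# NaN never compares equal to itself, so use isna() instead of ==.
print(np.nan == np.nan)
```

Because `np.nan == np.nan` is False, equality checks cannot be used to find missing values, which is why the detection functions in the next section exist.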

# Checking Missing Values in Pandas
Pandas provides functions for detecting whether a value is <b>NaN</b> in a DataFrame or Series, which makes data cleaning and preprocessing easier. <b>isnull()</b> and <b>isna()</b> flag missing values, and <b>notnull()</b> flags valid ones:

## 1. Using isnull()

<b>isnull()</b> returns a DataFrame of Boolean values where <b>True represents missing data (NaN)</b>. This is useful when we want to find and fill missing data in a dataset.

In [1]:
import pandas as pd
import numpy as np

d = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(d)

mv = df.isnull()

print(mv)

   First Score  Second Score  Third Score
0        False         False         True
1        False         False        False
2         True         False        False
3        False          True        False
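
The Boolean mask is usually a starting point for a summary. The following sketch, using the same scores data, counts the missing values per column and selects the rows that contain any; the names `missing_per_column` and `rows_with_missing` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# True counts as 1, so summing the mask gives the missing count per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# any(axis=1) flags rows that contain at least one missing value.
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing)
```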


#### Example 2: Filtering Data Based on Missing Values

Here we use a sample <b>Employee dataset</b>. The <b>isnull() function</b> is applied to the <b>"Gender" column</b> <b>to filter</b> and print the rows with missing gender data.

In [4]:
import pandas as pd
d = pd.read_csv("/home/biometric/Downloads/employees.csv")

bool_series = pd.isnull(d["Gender"])
bool_series
missing_gender_data = d[bool_series]
print(missing_gender_data)


    First Name Gender  Start Date Last Login Time  Salary  Bonus %  \
20        Lois    NaN   4/22/1995         7:18 PM   64714    4.934   
22      Joshua    NaN    3/8/2012         1:58 AM   90816   18.816   
27       Scott    NaN   7/11/1991         6:58 PM  122367    5.218   
31       Joyce    NaN   2/20/2005         2:40 PM   88657   12.752   
41   Christine    NaN   6/28/2015         1:08 AM   66582   11.308   
..         ...    ...         ...             ...     ...      ...   
961    Antonio    NaN   6/18/1989         9:37 PM  103050    3.050   
972     Victor    NaN   7/28/2006         2:49 PM   76381   11.159   
985    Stephen    NaN   7/10/1983         8:10 PM   85668    1.909   
989     Justin    NaN   2/10/1991         4:58 PM   38344    3.794   
995      Henry    NaN  11/23/2014         6:09 AM  132483   16.655   

    Senior Management                  Team  
20               True                 Legal  
22               True       Client Services  
27              False

## 2. Using isna()
<b>isna()</b> returns a DataFrame of Boolean values where True indicates missing data (NaN). It is used to detect missing values just like isnull().

#### Finding Missing Values in a DataFrame

In [5]:
import pandas as pd
import numpy as np

data = {'Name': ['Amit', 'Sita', np.nan, 'Raj'],
        'Age': [25, np.nan, 22, 28]}

df = pd.DataFrame(data)

# Check for missing values using isna()
print(df.isna())

    Name    Age
0  False  False
1  False   True
2   True  False
3  False  False
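
Since isna() and isnull() are aliases, the two masks should be identical. A small check under that assumption, reusing the data above (`same_mask` is an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Amit', 'Sita', np.nan, 'Raj'],
                   'Age': [25, np.nan, 22, 28]})

# Both methods produce the same Boolean mask.
same_mask = df.isna().equals(df.isnull())
print(same_mask)

# The top-level pd.isna() also accepts scalars.
print(pd.isna(np.nan), pd.isna(None), pd.isna('Amit'))
```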


## 3. Checking for Non-Missing Values Using notnull()
The <b>notnull()</b> function returns a DataFrame of Boolean values where True indicates non-missing (valid) data. It is useful when we want to focus only on the rows that have valid, non-missing values.

#### Example 1: Identifying Non-Missing Values in a DataFrame

In [6]:
import pandas as pd
import numpy as np

d = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(d)

nmv = df.notnull()

print(nmv)

   First Score  Second Score  Third Score
0         True          True        False
1         True          True         True
2        False          True         True
3         True         False         True
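
The notnull() mask is the element-wise complement of the isnull() mask. The sketch below checks this and counts the valid values per column; `is_complement` and `valid_counts` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# notnull() is the logical negation of isnull().
is_complement = df.notnull().equals(~df.isnull())
print(is_complement)

# Summing the mask counts valid values; count() reports the same numbers.
valid_counts = df.notnull().sum()
print(valid_counts)
print(df.count())
```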


#### Example 2: Filtering Data with Non-Missing Values

The notnull() function is applied to the "Gender" column to filter and display the rows whose gender value is not missing.

In [7]:
import pandas as pd
d = pd.read_csv("/home/biometric/Downloads/employees.csv")

nmg = pd.notnull(d["Gender"])

nmgd= d[nmg]

display(nmgd)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,8/6/1993,12:42 PM,97308,6.945,True,Marketing
1,Thomas,Male,3/31/1996,6:53 AM,61933,4.170,True,
2,Maria,Female,4/23/1993,11:17 AM,130590,11.858,False,Finance
3,Jerry,Male,3/4/2005,1:00 PM,138705,9.340,True,Finance
4,Larry,Male,1/24/1998,4:47 PM,101004,1.389,True,Client Services
...,...,...,...,...,...,...,...,...
994,George,Male,6/21/2013,5:47 PM,98874,4.479,True,Marketing
996,Phillip,Male,1/31/1984,6:30 AM,42392,19.675,False,Finance
997,Russell,Male,5/20/2013,12:39 PM,96914,1.421,False,Product
998,Larry,Male,4/20/2013,4:45 PM,60500,11.985,False,Business Development
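
The employees.csv file is not included here, so the following sketch reproduces both filters on a small stand-in frame. The four rows are a hand-picked subset of the columns shown above, and the name `employees` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for the employee data (not the real file).
employees = pd.DataFrame({
    'First Name': ['Douglas', 'Lois', 'Maria', 'Joshua'],
    'Gender': ['Male', np.nan, 'Female', np.nan],
})

# Rows where Gender is missing.
missing_gender = employees[employees['Gender'].isnull()]
print(missing_gender)

# Rows where Gender is present.
known_gender = employees[employees['Gender'].notnull()]
print(known_gender)
```

Every row lands in exactly one of the two results, so their lengths add up to the length of the original frame.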


# Filling Missing Values in Pandas
The following functions let us replace missing values with a specified value or estimate them by interpolation.

## 1. Using fillna()
<b>fillna()</b> is used to <b>replace missing values (NaN)</b> with a given value.

#### Example 1: Fill Missing Values with a Constant

In [5]:
import pandas as pd
import numpy as np

d = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(d)

df.fillna(8)

Unnamed: 0,First Score,Second Score,Third Score
0,100.0,30.0,8.0
1,90.0,45.0,40.0
2,8.0,56.0,80.0
3,95.0,8.0,98.0
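
A single constant is rarely the best choice for every column. As a sketch of two common alternatives, fillna() also accepts a Series or a dict keyed by column name; the names `filled_with_mean` and `per_column` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# df.mean() is a Series indexed by column name, so each column
# is filled with its own mean.
filled_with_mean = df.fillna(df.mean())
print(filled_with_mean)

# A dict fills only the listed columns; 'Third Score' is left untouched.
per_column = df.fillna({'First Score': 0,
                        'Second Score': df['Second Score'].median()})
print(per_column)
```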


#### Example 2: Fill with Previous Value (Forward Fill)

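Forward fill replaces each missing value with the previous valid value in the same column. Older code writes this as fillna(method='pad'); recent pandas versions deprecate the method argument in favor of ffill(). A leading NaN has no previous value, so it stays missing.

In [9]:
df.ffill()  # same result as the older df.fillna(method='pad')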

Unnamed: 0,First Score,Second Score,Third Score
0,100.0,30.0,
1,90.0,45.0,40.0
2,90.0,56.0,80.0
3,95.0,56.0,98.0


#### Example 3: Fill with Next Value (Backward Fill)

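The <b>bfill()</b> method <b>fills each missing value with the next valid value</b> in the same column. A trailing NaN has no next value, so it stays missing.

In [12]:
df.bfill()  # same result as the older df.fillna(method='bfill')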

Unnamed: 0,First Score,Second Score,Third Score
0,100.0,30.0,40.0
1,90.0,45.0,40.0
2,95.0,56.0,80.0
3,95.0,,98.0
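
Each direction on its own leaves a gap at one end of the column. One possible approach, sketched here, is to chain the two fills; `forward` and `both` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# Forward fill alone cannot fill the leading NaN in 'Third Score'.
forward = df.ffill()
print(forward)

# A backward fill afterwards covers the remaining leading gap.
both = df.ffill().bfill()
print(both)
```

Whether chaining is appropriate depends on the data: it copies a later value backwards in time, which may not be acceptable for ordered measurements.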


#### Example 4: Fill NaN Values with 'No Gender'

In [7]:
import pandas as pd
import numpy as np
d = pd.read_csv("/home/biometric/Downloads/employees.csv")

d[10:25]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
10,Louise,Female,8/12/1980,9:01 AM,63241,15.132,True,
11,Julie,Female,10/26/1997,3:19 PM,102508,12.637,True,Legal
12,Brandon,Male,12/1/1980,1:08 AM,112807,17.492,True,Human Resources
13,Gary,Male,1/27/2008,11:40 PM,109831,5.831,False,Sales
14,Kimberly,Female,1/14/1999,7:13 AM,41426,14.543,True,Finance
15,Lillian,Female,6/5/2016,6:09 AM,59414,1.256,False,Product
16,Jeremy,Male,9/21/2010,5:56 AM,90370,7.369,False,Human Resources
17,Shawn,Male,12/7/1986,7:45 PM,111737,6.414,False,Product
18,Diana,Female,10/23/1981,10:27 AM,132940,19.082,False,Client Services
19,Donna,Female,7/22/2010,3:48 AM,81014,1.894,False,Product


In [12]:
d["Gender"] = d["Gender"].fillna('No Gender')  # assign back instead of chained inplace=True
d[10:25]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
10,Louise,Female,8/12/1980,9:01 AM,63241,15.132,True,
11,Julie,Female,10/26/1997,3:19 PM,102508,12.637,True,Legal
12,Brandon,Male,12/1/1980,1:08 AM,112807,17.492,True,Human Resources
13,Gary,Male,1/27/2008,11:40 PM,109831,5.831,False,Sales
14,Kimberly,Female,1/14/1999,7:13 AM,41426,14.543,True,Finance
15,Lillian,Female,6/5/2016,6:09 AM,59414,1.256,False,Product
16,Jeremy,Male,9/21/2010,5:56 AM,90370,7.369,False,Human Resources
17,Shawn,Male,12/7/1986,7:45 PM,111737,6.414,False,Product
18,Diana,Female,10/23/1981,10:27 AM,132940,19.082,False,Client Services
19,Donna,Female,7/22/2010,3:48 AM,81014,1.894,False,Product


## 2. Using replace()
Use the <b>replace()</b> function to <b>replace NaN</b> values, or any other value, with a specific value.

In [15]:
import pandas as pd
import numpy as np

data = pd.read_csv("/home/biometric/Downloads/employees.csv")
data[10:25]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
10,Louise,Female,8/12/1980,9:01 AM,63241,15.132,True,
11,Julie,Female,10/26/1997,3:19 PM,102508,12.637,True,Legal
12,Brandon,Male,12/1/1980,1:08 AM,112807,17.492,True,Human Resources
13,Gary,Male,1/27/2008,11:40 PM,109831,5.831,False,Sales
14,Kimberly,Female,1/14/1999,7:13 AM,41426,14.543,True,Finance
15,Lillian,Female,6/5/2016,6:09 AM,59414,1.256,False,Product
16,Jeremy,Male,9/21/2010,5:56 AM,90370,7.369,False,Human Resources
17,Shawn,Male,12/7/1986,7:45 PM,111737,6.414,False,Product
18,Diana,Female,10/23/1981,10:27 AM,132940,19.082,False,Client Services
19,Donna,Female,7/22/2010,3:48 AM,81014,1.894,False,Product


#### Now, we replace all NaN values in the DataFrame with -99.

In [16]:
data = data.replace(to_replace=np.nan, value=-99)
print(data[10:25])

   First Name  Gender  Start Date Last Login Time  Salary  Bonus %  \
10     Louise  Female   8/12/1980         9:01 AM   63241   15.132   
11      Julie  Female  10/26/1997         3:19 PM  102508   12.637   
12    Brandon    Male   12/1/1980         1:08 AM  112807   17.492   
13       Gary    Male   1/27/2008        11:40 PM  109831    5.831   
14   Kimberly  Female   1/14/1999         7:13 AM   41426   14.543   
15    Lillian  Female    6/5/2016         6:09 AM   59414    1.256   
16     Jeremy    Male   9/21/2010         5:56 AM   90370    7.369   
17      Shawn    Male   12/7/1986         7:45 PM  111737    6.414   
18      Diana  Female  10/23/1981        10:27 AM  132940   19.082   
19      Donna  Female   7/22/2010         3:48 AM   81014    1.894   
20       Lois     -99   4/22/1995         7:18 PM   64714    4.934   
21    Matthew    Male    9/5/1995         2:12 AM  100612   13.645   
22     Joshua     -99    3/8/2012         1:58 AM   90816   18.816   
23        -99    Mal

In [27]:
data = data.replace(to_replace="Female", value="Feemale")
print(data[10:25])

   First Name   Gender  Start Date Last Login Time  Salary  Bonus %  \
10     Louise  Feemale   8/12/1980         9:01 AM   63241   15.132   
11      Julie  Feemale  10/26/1997         3:19 PM  102508   12.637   
12    Brandon     Male   12/1/1980         1:08 AM  112807   17.492   
13       Gary     Male   1/27/2008        11:40 PM  109831    5.831   
14   Kimberly  Feemale   1/14/1999         7:13 AM   41426   14.543   
15    Lillian  Feemale    6/5/2016         6:09 AM   59414    1.256   
16     Jeremy     Male   9/21/2010         5:56 AM   90370    7.369   
17      Shawn     Male   12/7/1986         7:45 PM  111737    6.414   
18      Diana  Feemale  10/23/1981        10:27 AM  132940   19.082   
19      Donna  Feemale   7/22/2010         3:48 AM   81014    1.894   
20       Lois      NaN   4/22/1995         7:18 PM   64714    4.934   
21    Matthew     Male    9/5/1995         2:12 AM  100612   13.645   
22     Joshua      NaN    3/8/2012         1:58 AM   90816   18.816   
23    
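
Replacing NaN across the whole DataFrame puts -99 into text columns as well, as the output above shows. A minimal sketch of restricting the replacement to one column follows; the frame `people` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with short gender codes.
people = pd.DataFrame({'Gender': ['F', np.nan, 'M'],
                       'Salary': [63241.0, np.nan, 112807.0]})

# Replace NaN in the 'Gender' column only; 'Salary' keeps its NaN.
people['Gender'] = people['Gender'].replace(np.nan, 'Unknown')

# A dict maps several old values to new ones in a single call.
people['Gender'] = people['Gender'].replace({'F': 'Female', 'M': 'Male'})
print(people)
```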

## 3. Using interpolate()
The <b>interpolate()</b> function <b>fills</b> missing values <b>using interpolation techniques</b> such as the linear method.



In [19]:
import pandas as pd
   
df = pd.DataFrame({"A": [12, 4, 5, None, 1], 
                   "B": [None, 2, 54, 3, None], 
                   "C": [20, 16, None, 3, 8], 
                   "D": [14, 3, None, None, 6]})  
print(df)

      A     B     C     D
0  12.0   NaN  20.0  14.0
1   4.0   2.0  16.0   3.0
2   5.0  54.0   NaN   NaN
3   NaN   3.0   3.0   NaN
4   1.0   NaN   8.0   6.0


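#### Let’s interpolate the missing values using the linear method. This method ignores the index and treats the values as equally spaced. With limit_direction='forward', a NaN that has no earlier valid value (row 0 of column B) is left unfilled.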

In [20]:
df.interpolate(method ='linear', limit_direction ='forward')

Unnamed: 0,A,B,C,D
0,12.0,,20.0,14.0
1,4.0,2.0,16.0,3.0
2,5.0,54.0,9.5,4.0
3,3.0,3.0,3.0,5.0
4,1.0,3.0,8.0,6.0
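
The interpolated numbers can be checked by hand: in column C the gap sits halfway between 16 and 3, giving 9.5, and in column D the two gaps divide the step from 3 to 6 into thirds, giving 4 and 5. The sketch below also passes limit_direction='both', which lets the leading gap in column B be filled as well; `filled` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 54, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})

# 'both' fills gaps in either direction, including the leading NaN in B.
filled = df.interpolate(method='linear', limit_direction='both')
print(filled)
```

Values outside the range of known points are not extrapolated along a line; the edge gaps in B take the nearest valid value.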


# Dropping Missing Values in Pandas
The dropna() function removes rows or columns that contain NaN values. It can drop data based on different conditions.

## 1. Dropping Rows with At Least One Null Value
Remove rows that contain at least one missing value.

In [21]:
import pandas as pd
import numpy as np

data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(data)  # 'data' avoids shadowing the built-in dict

df.dropna()

Unnamed: 0,First Score,Second Score,Third Score,Fourth Score
3,95.0,56.0,98,65.0


## 2. Dropping Rows with All Null Values
We can drop rows where all values are missing using dropna(how='all').

In [22]:
data = {'First Score': [100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, np.nan, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(data)  # 'data' avoids shadowing the built-in dict

df.dropna(how='all')

Unnamed: 0,First Score,Second Score,Third Score,Fourth Score
0,100.0,30.0,52.0,
2,,45.0,80.0,
3,95.0,56.0,98.0,65.0


## 3. Dropping Columns with At Least One Null Value
To remove columns that contain at least one missing value we use dropna(axis=1).

In [23]:
data = {'First Score': [100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, np.nan, 80, 98],
        'Fourth Score': [60, 67, 68, 65]}
df = pd.DataFrame(data)  # 'data' avoids shadowing the built-in dict

df.dropna(axis=1)

Unnamed: 0,Fourth Score
0,60
1,67
2,68
3,65
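
Dropping on "any" or "all" is sometimes too coarse. As a sketch of two finer controls, the thresh and subset parameters of dropna() are shown below on the same data; `at_least_two` and `has_first` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, np.nan, np.nan, 95],
                   'Second Score': [30, np.nan, 45, 56],
                   'Third Score': [52, np.nan, 80, 98],
                   'Fourth Score': [60, 67, 68, 65]})

# thresh=2 keeps rows that have at least two non-missing values.
at_least_two = df.dropna(thresh=2)
print(at_least_two)

# subset limits the check to the listed columns.
has_first = df.dropna(subset=['First Score'])
print(has_first)
```

Row 1 has only one valid value, so it is the only row removed by thresh=2, while the subset check also removes row 2.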


## 4. Dropping Rows with Missing Values in CSV Files
When working with CSV files, we can drop rows with missing values using dropna().



In [26]:
import pandas as pd
d = pd.read_csv("/home/biometric/Downloads/employees.csv")

nd = d.dropna(axis=0, how='any')

print("Old data frame length:", len(d))
print("New data frame length:", len(nd))
print("Rows with at least one missing value:", (len(d) - len(nd)))

Old data frame length: 1000
New data frame length: 764
Rows with at least one missing value: 236


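## 5. Dropping Duplicate Rows
drop_duplicates() removes rows that repeat an earlier row exactly and keeps the first occurrence by default. NaN values in the same positions are treated as equal for this comparison.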

In [35]:
import pandas as pd
import numpy as np

data = {'First Score': [100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, np.nan, 80, 98],
        'Fourth Score': [60, 67, 68, 65]}
df = pd.DataFrame(data)  # 'data' avoids shadowing the built-in dict

# make a duplicate row
df_dup = pd.concat([df, df])   # duplicates all rows
df_dup

Unnamed: 0,First Score,Second Score,Third Score,Fourth Score
0,100.0,30.0,52.0,60
1,,,,67
2,,45.0,80.0,68
3,95.0,56.0,98.0,65
0,100.0,30.0,52.0,60
1,,,,67
2,,45.0,80.0,68
3,95.0,56.0,98.0,65


In [34]:
df_clean = df_dup.drop_duplicates()
df_clean

Unnamed: 0,First Score,Second Score,Third Score,Fourth Score
0,100.0,30.0,52.0,60
1,,,,67
2,,45.0,80.0,68
3,95.0,56.0,98.0,65
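
Duplicates can also be judged on selected columns only. The sketch below uses a small made-up frame (not from the notebook) to show the subset and keep parameters; the names `exact`, `last_per_team` and `tidy` are illustrative.

```python
import pandas as pd

# Hypothetical data: rows 2 and 3 are identical.
df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'],
                   'Score': [10, 20, 30, 30]})

# Default: compare all columns and keep the first occurrence.
exact = df.drop_duplicates()
print(exact)

# Compare only 'Team' and keep the last row seen for each team.
last_per_team = df.drop_duplicates(subset=['Team'], keep='last')
print(last_per_team)

# The original index labels survive; reset_index renumbers them.
tidy = last_per_team.reset_index(drop=True)
print(tidy)
```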


# Merging, Grouping, Pivot Tables, and Reshaping

## 1. Merging Dataframes

Merging in Pandas is similar to joining tables in SQL. It combines rows based on common columns.

In [17]:
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Amit', 'Sita', 'Raj', 'Mohan']
})

df2 = pd.DataFrame({
    'ID': [1, 2, 3, 5],
    'Score': [85, 90, 95, 80]
})


In [18]:
# merge on "ID"
merged = pd.merge(df1, df2, on='ID')
print(merged)


   ID  Name  Score
0   1  Amit     85
1   2  Sita     90
2   3   Raj     95


In [41]:
pd.merge(df1, df2, on='ID', how='outer') # keeps all rows from both
# you can try with left, right, inner

Unnamed: 0,ID,Name,Score
0,1,Amit,85.0
1,2,Sita,90.0
2,3,Raj,95.0
3,4,Mohan,
4,5,,80.0
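
When rows fail to match, it helps to see which table each row came from. A sketch using the indicator parameter of pd.merge on the same two frames follows; `outer` and `left` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'Name': ['Amit', 'Sita', 'Raj', 'Mohan']})
df2 = pd.DataFrame({'ID': [1, 2, 3, 5],
                    'Score': [85, 90, 95, 80]})

# indicator=True adds a '_merge' column naming the source of each row.
outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
print(outer)

# A left join keeps every row of df1; ID 4 has no score, so it gets NaN.
left = pd.merge(df1, df2, on='ID', how='left')
print(left)
```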


## 2. Grouping Data – groupby()

groupby() groups rows based on column values and applies an operation (mean, sum, count, etc.) to each group.

### 2.1 Basic GroupBy

In [43]:
import pandas as pd

data = {
    'Team': ['A', 'A', 'B', 'B', 'C'],
    'Score': [10, 20, 30, 40, 50]
}

df = pd.DataFrame(data)
df

Unnamed: 0,Team,Score
0,A,10
1,A,20
2,B,30
3,B,40
4,C,50


In [44]:
df.groupby('Team')['Score'].mean()

Team
A    15.0
B    35.0
C    50.0
Name: Score, dtype: float64

### 2.2 GroupBy with multiple functions

In [47]:
df.groupby('Team')['Score'].agg(['mean', 'sum', 'count'])

Team,mean,sum,count
A,15.0,30,2
B,35.0,70,2
C,50.0,50,1
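
The result columns above take the names of the functions. As a sketch of an alternative, named aggregation lets each output column be labelled explicitly; the labels `total`, `average` and `players` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B', 'C'],
                   'Score': [10, 20, 30, 40, 50]})

# Each keyword is an output column: (source column, aggregation function).
summary = df.groupby('Team').agg(
    total=('Score', 'sum'),
    average=('Score', 'mean'),
    players=('Score', 'count'),
)
print(summary)
```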


## 3. Pivot Tables
Pivot tables help summarize data using rows, columns, and aggregation.

### 3.1 Basic Pivot Table

In [5]:
# basic pivot tables
df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B'],
    'Year': [2020, 2021, 2020, 2021],
    'Sales': [100, 150, 200, 250]
})
df

Unnamed: 0,Team,Year,Sales
0,A,2020,100
1,A,2021,150
2,B,2020,200
3,B,2021,250


In [49]:
pivot = df.pivot_table(values='Sales', index='Team', columns='Year')
print(pivot)


Year  2020  2021
Team            
A      100   150
B      200   250


### 3.2 Pivot Table with Aggregation

In [6]:
pivot_agg = df.pivot_table(values='Sales', index='Team', columns='Year', aggfunc='sum')
print(pivot_agg)

Year  2020  2021
Team            
A      100   150
B      200   250


In [7]:
import pandas as pd

df = pd.DataFrame({
    'Team': ['A','A','A','B','B','B'],
    'Year': [2020,2020,2021,2020,2021,2021],
    'Sales': [100, 50, 150, 200, 100, 50]
})

print(df)


  Team  Year  Sales
0    A  2020    100
1    A  2020     50
2    A  2021    150
3    B  2020    200
4    B  2021    100
5    B  2021     50


In [8]:
pivot_sum = df.pivot_table(values='Sales', index='Team', columns='Year', aggfunc='sum')
print(pivot_sum)


Year  2020  2021
Team            
A      150   150
B      200   150
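
A pivot table with aggfunc='sum' can be read as a groupby on both keys followed by a reshape. The sketch below builds the same table both ways to make that relationship visible; `grouped` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Year': [2020, 2020, 2021, 2020, 2021, 2021],
    'Sales': [100, 50, 150, 200, 100, 50]
})

pivot_sum = df.pivot_table(values='Sales', index='Team',
                           columns='Year', aggfunc='sum')

# Group by both keys, sum, then move 'Year' from the index to the columns.
grouped = df.groupby(['Team', 'Year'])['Sales'].sum().unstack('Year')

print(pivot_sum)
print(grouped)
```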


## 4. Reshaping Data (Melt & Pivot)
Reshaping changes the layout of the DataFrame.

### 4.1 Melt – convert wide format to long format

In [52]:
df = pd.DataFrame({
    'Name': ['Amit', 'Sita'],
    'Math': [90, 80],
    'English': [85, 88]
})
df

Unnamed: 0,Name,Math,English
0,Amit,90,85
1,Sita,80,88


In [53]:
df_melt = pd.melt(df, id_vars=['Name'], value_vars=['Math', 'English'])
print(df_melt)


   Name variable  value
0  Amit     Math     90
1  Sita     Math     80
2  Amit  English     85
3  Sita  English     88


### 4.2 Pivot – convert long format to wide format

In [54]:
pivot = df_melt.pivot(index='Name', columns='variable', values='value')
print(pivot)


variable  English  Math
Name                   
Amit           85    90
Sita           88    80
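
The default column names `variable` and `value` are generic. A sketch of the same round trip with descriptive names follows, using the var_name and value_name parameters of melt; the labels `Subject` and `Marks` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Amit', 'Sita'],
                   'Math': [90, 80],
                   'English': [85, 88]})

# Wide to long, with explicit names for the two new columns.
long_df = pd.melt(df, id_vars=['Name'],
                  var_name='Subject', value_name='Marks')
print(long_df)

# Long to wide again; reset_index turns 'Name' back into a column.
wide_df = long_df.pivot(index='Name', columns='Subject',
                        values='Marks').reset_index()
print(wide_df)
```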


## 5. Concatenation (Adding Rows or Columns)

### 5.1 Concatenate Rows

In [55]:
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

pd.concat([df1, df2], axis=0)


Unnamed: 0,A
0,1
1,2
0,3
1,4


### 5.2 Concatenate Columns

In [56]:
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})

pd.concat([df1, df2], axis=1)


Unnamed: 0,A,B
0,1,3
1,2,4
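
Row concatenation keeps the original index labels, which is why 0 and 1 appear twice in the 5.1 output. Two ways of handling this are sketched below; `renumbered` and `labelled` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)
print(renumbered)

# keys adds an outer index level that records the source frame.
labelled = pd.concat([df1, df2], keys=['first', 'second'])
print(labelled)
```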


## 6. Stack & Unstack (Advanced Reshaping)
### 6.1 Stack – convert columns to rows

In [10]:
df = pd.DataFrame({
    'Name': ['Amit', 'Sita'],
    'Math': [90, 80],
    'English': [85, 88]
}).set_index('Name')

df

Name,Math,English
Amit,90,85
Sita,80,88


In [11]:
df.stack()


Name         
Amit  Math       90
      English    85
Sita  Math       80
      English    88
dtype: int64

### 6.2 Unstack – convert rows to columns

In [59]:
df.stack().unstack()


Name,Math,English
Amit,90,85
Sita,80,88
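
By default unstack() moves the innermost index level to the columns, which restores the original layout. As a sketch, passing level=0 moves the outer level (Name) instead, so subjects become rows; `by_subject` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Amit', 'Sita'],
    'Math': [90, 80],
    'English': [85, 88]
}).set_index('Name')

stacked = df.stack()

# level=0 is the 'Name' level, so names become the columns.
by_subject = stacked.unstack(level=0)
print(by_subject)
```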


# Sorting


### 1. sort_values() – Sorting by Column

In [None]:
# Syntax
df.sort_values(by='column_name')

In [19]:
import pandas as pd

df = pd.DataFrame({
    'Name': ['Amit', 'Sita', 'Raj', 'Mohan'],
    'Age': [25, 30, 22, 28],
    'Score': [88, 92, 70, 85]
})
df

Unnamed: 0,Name,Age,Score
0,Amit,25,88
1,Sita,30,92
2,Raj,22,70
3,Mohan,28,85


### 1.1 Sort by one column (ascending)

In [20]:
df.sort_values(by='Age')

Unnamed: 0,Name,Age,Score
2,Raj,22,70
0,Amit,25,88
3,Mohan,28,85
1,Sita,30,92


### 1.2 Sort by one column (descending)

In [14]:
df.sort_values(by='Age', ascending=False)

Unnamed: 0,Name,Age,Score
1,Sita,30,92
3,Mohan,28,85
0,Amit,25,88
2,Raj,22,70


### 1.3 Sort by multiple columns

In [66]:
#Example: Sort by Age first, then Score.
df.sort_values(by=['Age', 'Score'])


Unnamed: 0,Name,Age,Score
2,Raj,22,70
0,Amit,25,88
3,Mohan,28,85
1,Sita,30,92


### 1.4 Sorting with different orders for each column

In [21]:
dfs = df.sort_values(by=['Age', 'Score'], ascending=[True, False])
dfs
# Age - smallest to largest
# Score - largest to smallest for same age

Unnamed: 0,Name,Age,Score
2,Raj,22,70
0,Amit,25,88
3,Mohan,28,85
1,Sita,30,92


### 1.5 Sort by column when NaN values exist

In [69]:
df.sort_values(by='Age', na_position='first') # 'first' → NaN on top
df.sort_values(by='Age', na_position='last') # 'last' → NaN at bottom (default)

Unnamed: 0,Name,Age,Score
2,Raj,22,70
0,Amit,25,88
3,Mohan,28,85
1,Sita,30,92
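
The DataFrame above has no missing ages, so both calls give the same order. The sketch below swaps in a frame with one NaN age (made up for illustration) to show the effect of na_position; `nan_first` and `nan_last` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same names as above, but Sita's age is missing.
df = pd.DataFrame({'Name': ['Amit', 'Sita', 'Raj', 'Mohan'],
                   'Age': [25, np.nan, 22, 28]})

# NaN rows are placed before the sorted values.
nan_first = df.sort_values(by='Age', na_position='first')
print(nan_first)

# Default behaviour: NaN rows go to the end.
nan_last = df.sort_values(by='Age')
print(nan_last)
```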


# Exercise Questions

### 1 Create two DataFrames and merge them on a common column.
### 2 Merge two DataFrames based on multiple columns.
### 3 You have two DataFrames: one with student details and one with fees. Merge them to get a combined sheet. Try how='left', then how='outer'.

### 4 Given the DataFrame:

| Team | Player | Runs |
|------|--------|------|
| A    | P1     | 30   |
| A    | P2     | 40   |
| B    | P3     | 20   |
| B    | P4     | 10   |


Find total runs scored by each team.

Find average runs per team.

Count number of players in each team.

### 5 Using the same data, group by Player and find:

Highest run

Lowest run

Sum of runs

Use agg(['sum', 'min', 'max']).

### 6 Group a DataFrame by a column and find:

Mean of all numeric columns

Count of rows in each group

### 7 Create a DataFrame:

| City   | Year | Sales |
|--------|------|-------|
| Delhi  | 2020 | 100   |
| Delhi  | 2021 | 140   |
| Mumbai | 2020 | 120   |
| Mumbai | 2021 | 160   |

Make a pivot table with City as rows, Year as columns.

Fill the table with Sales.

Use sum as aggregation.