# Introduction to Pandas
Pandas is a powerful Python library for _**data manipulation**_ and _**data analysis**_. It supports data cleaning, transformation, filtering, and aggregation, which makes it essential for data science and machine learning work.

> Start by installing pandas in your environment with pip (run this in your terminal):

```bash
pip install pandas
```

To start working with pandas, import it in your _.py_ file (or notebook) as follows:

In [1]:
import pandas as pd

Since we imported pandas as **pd**, we can call pandas functions through `pd` instead of writing `pandas` each time. You could choose any alias, but `pd` is the widely used convention, so it is good practice to stick with it.

# Key Data Structures in Pandas

* Series (1 Dimensional)
* DataFrame (2 Dimensional)

Let's start with the Series. You can create one from a list as follows:

In [13]:
s= pd.Series([1,2,3,4,5])
print(s)

0    1
1    2
2    3
3    4
4    5
dtype: int64


A Series is similar to a single column in an Excel sheet. By default its index runs 0, 1, 2, ..., but you can pass the **index** parameter to use your own labels:

In [15]:
s=pd.Series([1,2,3,4,5] , index=["A","B","C","D","E"])
print(s)

A    1
B    2
C    3
D    4
E    5
dtype: int64

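Once a Series has labels, you can read values by label or by position, and arithmetic applies to every element at once. Here is a minimal sketch that reuses the labelled Series from the cell above:

```python
import pandas as pd

# Same labelled Series as in the cell above
s = pd.Series([1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"])

by_label = s["C"]          # access by index label
by_position = s.iloc[2]    # access by integer position (0-based)
doubled = s * 2            # element-wise arithmetic; the labels are kept

print(by_label, by_position)  # 3 3
print(doubled["E"])           # 10
```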

Similarly, you can create a DataFrame from a dictionary. Each key becomes a column name, and each list supplies that column's values:

In [8]:
df=pd.DataFrame({
    "Name" : ["Muneeb" , "Hashir" ,"Usman"] ,
    "Marks"  : [100,96,79]
    })
print(df)

     Name  Marks
0  Muneeb    100
1  Hashir     96
2   Usman     79

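A DataFrame also exposes its size and labels as attributes. The sketch below rebuilds the small Name/Marks DataFrame above and inspects it. Note that `shape` is `(rows, columns)`:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Muneeb", "Hashir", "Usman"],
    "Marks": [100, 96, 79],
})

print(df.shape)          # (3, 2): 3 rows, 2 columns
print(list(df.columns))  # ['Name', 'Marks']
print(list(df.index))    # [0, 1, 2]: the default row labels
```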

# Reading .CSV files in pandas
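You can read a .csv file with **pd.read_csv()**. Other formats have their own readers, such as **pd.read_excel()** for .xls/.xlsx files and **pd.read_json()** for JSON files: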

In [3]:
df=pd.read_csv("Iris.csv")
print(df)

      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  \
0      1            5.1           3.5            1.4           0.2   
1      2            4.9           3.0            1.4           0.2   
2      3            4.7           3.2            1.3           0.2   
3      4            4.6           3.1            1.5           0.2   
4      5            5.0           3.6            1.4           0.2   
..   ...            ...           ...            ...           ...   
145  146            6.7           3.0            5.2           2.3   
146  147            6.3           2.5            5.0           1.9   
147  148            6.5           3.0            5.2           2.0   
148  149            6.2           3.4            5.4           2.3   
149  150            5.9           3.0            5.1           1.8   

            Species  
0       Iris-setosa  
1       Iris-setosa  
2       Iris-setosa  
3       Iris-setosa  
4       Iris-setosa  
..              ...  
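145  Iris-virginica  
146  Iris-virginica  
147  Iris-virginica  
148  Iris-virginica  
149  Iris-virginica  

[150 rows x 6 columns]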

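To remove duplicate rows, use **.drop_duplicates()**. It returns a new DataFrame without the repeated rows:

In [4]: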
df.drop_duplicates()

,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...,...
145,146,6.7,3.0,5.2,2.3,Iris-virginica
146,147,6.3,2.5,5.0,1.9,Iris-virginica
147,148,6.5,3.0,5.2,2.0,Iris-virginica
148,149,6.2,3.4,5.4,2.3,Iris-virginica

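Iris.csv may not be on your machine, so here is a self-contained sketch. It reads CSV text from memory with `io.StringIO`, which stands in for a file path, and then removes a duplicate row. The data are a hypothetical three-row sample, not the real file:

```python
import io
import pandas as pd

# Hypothetical CSV text; the third row repeats the first one
csv_text = """Id,SepalLengthCm,Species
1,5.1,Iris-setosa
2,4.9,Iris-setosa
1,5.1,Iris-setosa
"""

df = pd.read_csv(io.StringIO(csv_text))  # read_csv accepts paths or file-like objects
deduped = df.drop_duplicates()           # new DataFrame; df still has 3 rows

print(len(df), len(deduped))  # 3 2
```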

# Common DataFrame Methods

You can use **head()** and **tail()** to view only the first 5 or last 5 rows respectively. To show a different number of rows, pass a number, e.g. `df.head(10)`.

In [22]:
df.head()

,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa


Similarly, **tail()** shows the last five rows:

In [23]:
df.tail()

,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
145,146,6.7,3.0,5.2,2.3,Iris-virginica
146,147,6.3,2.5,5.0,1.9,Iris-virginica
147,148,6.5,3.0,5.2,2.0,Iris-virginica
148,149,6.2,3.4,5.4,2.3,Iris-virginica
149,150,5.9,3.0,5.1,1.8,Iris-virginica

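Both methods take an optional number of rows. Here is a small sketch using a hypothetical ten-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Id": range(1, 11)})  # hypothetical data: Id runs from 1 to 10

first_three = df.head(3)  # the first 3 rows
last_two = df.tail(2)     # the last 2 rows

print(first_three["Id"].tolist())  # [1, 2, 3]
print(last_two["Id"].tolist())     # [9, 10]
```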

You can use **.describe()** to get summary statistics (count, mean, standard deviation, min, quartiles, max) for the numeric columns of the DataFrame:

In [24]:
df.describe()

,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm
count,150.0,150.0,150.0,150.0,150.0
mean,75.5,5.843333,3.054,3.758667,1.198667
std,43.445368,0.828066,0.433594,1.76442,0.763161
min,1.0,4.3,2.0,1.0,0.1
25%,38.25,5.1,2.8,1.6,0.3
50%,75.5,5.8,3.0,4.35,1.3
75%,112.75,6.4,3.3,5.1,1.8
max,150.0,7.9,4.4,6.9,2.5

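The result of `.describe()` is itself a DataFrame, so you can look up individual statistics with `.loc`. They agree with the matching Series methods. Here is a sketch using a few hypothetical petal widths:

```python
import pandas as pd

df = pd.DataFrame({"PetalWidthCm": [0.2, 0.2, 1.3, 1.8, 2.5]})  # hypothetical values

stats = df.describe()                        # rows: count, mean, std, min, 25%, 50%, 75%, max
mean_from_describe = stats.loc["mean", "PetalWidthCm"]
mean_direct = df["PetalWidthCm"].mean()      # the same statistic, computed directly

print(stats)
print(mean_from_describe, mean_direct)
```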

You can use **.info()** to get a concise summary of the DataFrame: the number of rows, each column's name, non-null count and dtype, and the memory usage:

In [25]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             150 non-null    int64  
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB

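`.info()` prints its report and returns `None`, so you cannot store the result. When you need the column types as data, `df.dtypes` returns them as a Series. Here is a sketch using a hypothetical three-row sample:

```python
import pandas as pd

df = pd.DataFrame({
    "Id": [1, 2, 3],
    "SepalLengthCm": [5.1, 4.9, 4.7],
    "Species": ["Iris-setosa"] * 3,
})

result = df.info()  # prints the summary to stdout
print(result)       # None: info() only prints

types = df.dtypes   # a Series mapping each column name to its dtype
print(types)
```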

# Data Selection

You can select a single column by its name:

In [29]:
df["Id"]

0        1
1        2
2        3
3        4
4        5
      ... 
145    146
146    147
147    148
148    149
149    150
Name: Id, Length: 150, dtype: int64

Note that a single column extracted from a DataFrame is a Series, since it is one-dimensional:

In [30]:
type(df["Id"])

pandas.core.series.Series

You can also extract multiple columns by passing a list of column names (note the double square brackets). The result is a DataFrame:

In [32]:
df[["Id","PetalLengthCm"]]

,Id,PetalLengthCm
0,1,1.4
1,2,1.4
2,3,1.3
3,4,1.5
4,5,1.4
...,...,...
145,146,5.2
146,147,5.0
147,148,5.2
148,149,5.4

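The number of square brackets determines the type of the result. Here is a small sketch using a hypothetical two-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Id": [1, 2], "PetalLengthCm": [1.4, 1.3]})

one_col = df["Id"]        # single brackets: a Series
still_frame = df[["Id"]]  # a list holding one name: a one-column DataFrame

print(type(one_col).__name__, type(still_frame).__name__)  # Series DataFrame
```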

You can also select a specific row by its position using **.iloc**:

In [35]:
df.iloc[2]

Id                         3
SepalLengthCm            4.7
SepalWidthCm             3.2
PetalLengthCm            1.3
PetalWidthCm             0.2
Species          Iris-setosa
Name: 2, dtype: object

> Inside the square brackets you give the integer position of the row, starting from 0. To select a row by its index label instead, use **.loc**.

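The difference between `.iloc` and `.loc` only shows up when the index labels are not 0, 1, 2, .... The sketch below uses hypothetical letter labels to demonstrate both, along with position slicing and single-cell access:

```python
import pandas as pd

df = pd.DataFrame(
    {"Id": [1, 2, 3, 4], "Species": ["setosa", "setosa", "versicolor", "virginica"]},
    index=["a", "b", "c", "d"],  # hypothetical row labels
)

row_by_position = df.iloc[2]    # third row, whatever its label is
row_by_label = df.loc["c"]      # the row whose label is "c"
first_two = df.iloc[0:2]        # position slices exclude the end: rows 0 and 1
cell = df.loc["d", "Species"]   # one value: row label "d", column "Species"

print(row_by_position["Id"], row_by_label["Id"], cell)  # 3 3 virginica
```
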
# Dealing with empty cells

Consider a copy of the Iris data in which some cells are empty. These show up as NaN once loaded:

In [28]:
df2=pd.read_csv("Iris2.csv.csv")
df2.head()

,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1.0,5.1,3.5,1.4,0.2,Iris-setosa
1,2.0,,3.0,1.4,0.2,
2,3.0,4.7,3.2,1.3,0.2,Iris-setosa
3,4.0,4.6,,1.5,0.2,Iris-setosa
4,5.0,5.0,3.6,1.4,0.2,Iris-setosa


To drop every row that contains at least one missing value (NaN), use **.dropna()**:

In [39]:
df2.dropna()


,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1.0,5.1,3.5,1.4,0.2,Iris-setosa
2,3.0,4.7,3.2,1.3,0.2,Iris-setosa
4,5.0,5.0,3.6,1.4,0.2,Iris-setosa
5,6.0,5.4,3.9,1.7,0.4,Iris-setosa
6,7.0,4.6,3.4,1.4,0.3,Iris-setosa
...,...,...,...,...,...,...
144,145.0,6.7,3.3,5.7,2.5,Iris-virginica
145,146.0,6.7,3.0,5.2,2.3,Iris-virginica
147,148.0,6.5,3.0,5.2,2.0,Iris-virginica
148,149.0,6.2,3.4,5.4,2.3,Iris-virginica

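`.dropna()` also has parameters that control what gets dropped. `subset` restricts the check to certain columns, and `axis=1` drops columns instead of rows. Here is a sketch on a hypothetical three-row sample:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "Id": [1.0, 2.0, 3.0],
    "SepalLengthCm": [5.1, np.nan, 4.7],
    "Species": ["Iris-setosa", "Iris-setosa", np.nan],
})

any_missing = df2.dropna()                          # drop rows with at least one NaN
by_sepal = df2.dropna(subset=["SepalLengthCm"])     # only check SepalLengthCm
complete_columns = df2.dropna(axis=1)               # drop columns containing NaN

print(len(any_missing), len(by_sepal), list(complete_columns.columns))
# 1 2 ['Id']
```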

Sometimes you don't want to remove rows with empty cells. Instead, you want to replace the missing values. For this, use **.fillna()** and pass the replacement value inside the parentheses:

In [40]:
df2.fillna(0)

,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1.0,5.1,3.5,1.4,0.2,Iris-setosa
1,2.0,0.0,3.0,1.4,0.2,0
2,3.0,4.7,3.2,1.3,0.2,Iris-setosa
3,4.0,4.6,0.0,1.5,0.2,Iris-setosa
4,5.0,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...,...
145,146.0,6.7,3.0,5.2,2.3,Iris-virginica
146,0.0,6.3,2.5,5.0,1.9,Iris-virginica
147,148.0,6.5,3.0,5.2,2.0,Iris-virginica
148,149.0,6.2,3.4,5.4,2.3,Iris-virginica

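`.fillna()` also accepts a dictionary that maps column names to fill values. This is handy when numeric and text columns need different replacements. Here is a sketch that assumes you want the column mean for numbers and a placeholder label (the hypothetical "Unknown") for text:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "SepalLengthCm": [5.0, np.nan, 4.0],
    "Species": ["Iris-setosa", np.nan, "Iris-setosa"],
})

filled = df2.fillna({
    "SepalLengthCm": df2["SepalLengthCm"].mean(),  # mean() skips NaN, so this is 4.5
    "Species": "Unknown",                          # hypothetical placeholder label
})

print(filled)
```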

To detect which entries are empty, use the **.isna()** method:

In [29]:
df2.isna()

,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,False,False,False,False,False,False
1,False,True,False,False,False,True
2,False,False,False,False,False,False
3,False,False,True,False,False,False
4,False,False,False,False,False,False
...,...,...,...,...,...,...
145,False,False,False,False,False,False
146,True,False,False,False,False,False
147,False,False,False,False,False,False
148,False,False,False,False,False,False


Here `True` means the cell is empty and `False` means it holds a value.

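Chaining `.sum()` after `.isna()` counts the missing values in each column, because each `True` counts as 1. Here is a sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "Id": [1.0, 2.0, np.nan],
    "SepalLengthCm": [5.1, np.nan, np.nan],
})

missing_per_column = df2.isna().sum()        # NaN count for each column
total_missing = int(df2.isna().sum().sum())  # NaN count for the whole DataFrame

print(missing_per_column)
print(total_missing)  # 3
```
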
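> Note: Most of these methods (dropna(), fillna(), rename(), ...) return a new DataFrame and leave the original unchanged. To keep the result, either assign it back (e.g. `df2 = df2.dropna()`) or pass the parameter **inplace=True**, which modifies the original DataFrame directly. In most cases we don't use `inplace=True`; we assign the result instead.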

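Here is the note above in action, using `.fillna()` on hypothetical data. Calling it without keeping the result changes nothing. Reassigning the result, or passing `inplace=True`, does change the data:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({"A": [1.0, np.nan, 3.0]})

df2.fillna(0)                 # returns a new DataFrame, which is discarded here
print(df2["A"].isna().sum())  # 1: the original still has its NaN

df2 = df2.fillna(0)           # assigning the result back keeps the change
print(df2["A"].isna().sum())  # 0

df3 = pd.DataFrame({"A": [1.0, np.nan, 3.0]})
result = df3.fillna(0, inplace=True)  # modifies df3 directly and returns None
print(df3["A"].tolist(), result)      # [1.0, 0.0, 3.0] None
```
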
# Renaming Columns
Use the following syntax to rename a column. Pass a dictionary that maps old names to new ones:

In [43]:
df.rename(columns={"SepalLengthCm":"SL"})

,Id,SL,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...,...
145,146,6.7,3.0,5.2,2.3,Iris-virginica
146,147,6.3,2.5,5.0,1.9,Iris-virginica
147,148,6.5,3.0,5.2,2.0,Iris-virginica
148,149,6.2,3.4,5.4,2.3,Iris-virginica

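The dictionary can hold several old-to-new pairs, so you can rename multiple columns in one call. Here is a sketch using the short names SL and SW (SW is a hypothetical abbreviation that follows the SL example):

```python
import pandas as pd

df = pd.DataFrame({"SepalLengthCm": [5.1], "SepalWidthCm": [3.5]})

short = df.rename(columns={"SepalLengthCm": "SL", "SepalWidthCm": "SW"})

print(list(short.columns))  # ['SL', 'SW']
print(list(df.columns))     # ['SepalLengthCm', 'SepalWidthCm']: df is unchanged
```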

# Dealing with Datatypes


Use **.info()** to check the current dtype of each column:

In [45]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             150 non-null    int64  
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB


SepalLengthCm is of type float64. You can change a column's datatype as follows:

In [46]:
df["SepalLengthCm"]=df["SepalLengthCm"].astype(int)

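The **.astype()** method converts a column to the given datatype. As you can see below, SepalLengthCm is now int64. Note that converting floats to int truncates the decimal part (5.1 becomes 5 and 4.9 becomes 4).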

In [49]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             150 non-null    int64  
 1   SepalLengthCm  150 non-null    int64  
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object 
dtypes: float64(3), int64(2), object(1)
memory usage: 7.2+ KB

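Because `.astype(int)` truncates, rounding first may be closer to what you want. Here is a sketch comparing both approaches on a few sepal lengths:

```python
import pandas as pd

lengths = pd.Series([5.1, 4.9, 6.7])

truncated = lengths.astype(int)        # drops the fractional part
rounded = lengths.round().astype(int)  # rounds to the nearest whole number first

print(truncated.tolist(), rounded.tolist())  # [5, 4, 6] [5, 5, 7]
```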

# Length of DataFrame

**len()** returns the number of rows in a DataFrame:

In [53]:
x=len(df)
print(x)

150


# Adding a New Column to a DataFrame
You can add a new column to an existing DataFrame by assigning a list with one value per row. Here the list is built with a list comprehension:

In [55]:
df["PetalLengthSquare"]=[0 for i in range(len(df))]
df

,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species,PetalLengthSquare
0,1,5,3.5,1.4,0.2,Iris-setosa,0
1,2,4,3.0,1.4,0.2,Iris-setosa,0
2,3,4,3.2,1.3,0.2,Iris-setosa,0
3,4,4,3.1,1.5,0.2,Iris-setosa,0
4,5,5,3.6,1.4,0.2,Iris-setosa,0
...,...,...,...,...,...,...,...
145,146,6,3.0,5.2,2.3,Iris-virginica,0
146,147,6,2.5,5.0,1.9,Iris-virginica,0
147,148,6,3.0,5.2,2.0,Iris-virginica,0
148,149,6,3.4,5.4,2.3,Iris-virginica,0

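A list comprehension works, but assigning a scalar fills every row directly. You can also derive a new column from an existing one. Here is a sketch; `PetalLengthDouble` is a hypothetical column name:

```python
import pandas as pd

df = pd.DataFrame({"PetalLengthCm": [1.4, 1.3, 5.2]})

df["PetalLengthSquare"] = 0                        # a scalar is broadcast to every row
df["PetalLengthDouble"] = df["PetalLengthCm"] * 2  # hypothetical column derived from another

print(df)
```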

You can also apply a function to every entry of a column with **.apply()**. For example, the following computes the square of each PetalLengthCm value:

In [58]:
def square(x):
    return x*x

df["PetalLengthSquare"]=df["PetalLengthCm"].apply(square)
df

,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species,PetalLengthSquare
0,1,5,3.5,1.4,0.2,Iris-setosa,1.96
1,2,4,3.0,1.4,0.2,Iris-setosa,1.96
2,3,4,3.2,1.3,0.2,Iris-setosa,1.69
3,4,4,3.1,1.5,0.2,Iris-setosa,2.25
4,5,5,3.6,1.4,0.2,Iris-setosa,1.96
...,...,...,...,...,...,...,...
145,146,6,3.0,5.2,2.3,Iris-virginica,27.04
146,147,6,2.5,5.0,1.9,Iris-virginica,25.00
147,148,6,3.0,5.2,2.0,Iris-virginica,27.04
148,149,6,3.4,5.4,2.3,Iris-virginica,29.16

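For simple arithmetic like squaring, you can get the same result without `.apply()` by operating on the whole column at once. This vectorized form is usually faster on large data. Here is a sketch comparing the two, with a lambda in place of the `square` function:

```python
import pandas as pd

df = pd.DataFrame({"PetalLengthCm": [1.4, 1.3, 5.2]})

with_apply = df["PetalLengthCm"].apply(lambda x: x * x)  # calls the lambda once per value
vectorized = df["PetalLengthCm"] ** 2                    # one operation on the whole column

print(with_apply.tolist())
print(vectorized.tolist())
```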

# Basic operations on DataFrame

Selecting a specific column:

In [None]:
df["Species"]

0         Iris-setosa
1         Iris-setosa
2         Iris-setosa
3         Iris-setosa
4         Iris-setosa
            ...      
145    Iris-virginica
146    Iris-virginica
147    Iris-virginica
148    Iris-virginica
149    Iris-virginica
Name: Species, Length: 150, dtype: object

Filtering rows with a boolean condition (here, rows whose PetalLengthCm is less than 1.2):


In [17]:
df[df["PetalLengthCm"]<1.2]

,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
13,14,4.3,3.0,1.1,0.1,Iris-setosa
22,23,4.6,3.6,1.0,0.2,Iris-setosa

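You can combine several conditions with `&` (and) and `|` (or), wrapping each condition in parentheses. `.isin()` checks whether values belong to a list. Here is a sketch on hypothetical Iris-like rows:

```python
import pandas as pd

df = pd.DataFrame({
    "PetalLengthCm": [1.1, 1.0, 4.5, 5.2],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# Each condition needs its own parentheses because & and | bind tighter than < and ==
short_setosa = df[(df["PetalLengthCm"] < 1.2) & (df["Species"] == "Iris-setosa")]
long_or_versicolor = df[(df["PetalLengthCm"] > 5) | (df["Species"] == "Iris-versicolor")]

# isin() keeps rows whose value appears in the given list
not_setosa = df[df["Species"].isin(["Iris-versicolor", "Iris-virginica"])]

print(len(short_setosa), len(long_or_versicolor), len(not_setosa))  # 2 2 2
```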

You can remove a column with the **.drop()** method, the **del** statement, or the **.pop()** method:

In [None]:
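df.drop(columns=["Species"])  # returns a new DataFrame; df itself keeps Species
# del df["Species"]           # removes the column from df in place
# df.pop("Species")           # removes it from df in place and returns it as a Series

# All three remove the column, but only drop() leaves the original df unchanged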


,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm
0,1,5.1,3.5,1.4,0.2
1,2,4.9,3.0,1.4,0.2
2,3,4.7,3.2,1.3,0.2
3,4,4.6,3.1,1.5,0.2
4,5,5.0,3.6,1.4,0.2
...,...,...,...,...,...
145,146,6.7,3.0,5.2,2.3
146,147,6.3,2.5,5.0,1.9
147,148,6.5,3.0,5.2,2.0
148,149,6.2,3.4,5.4,2.3

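The three approaches differ in what happens to the original DataFrame. Here is a sketch that starts from a fresh hypothetical DataFrame for each case:

```python
import pandas as pd

def make_df():
    # Hypothetical two-row DataFrame, rebuilt for each case
    return pd.DataFrame({"Id": [1, 2], "Species": ["Iris-setosa", "Iris-virginica"]})

df = make_df()
without = df.drop(columns=["Species"])  # new DataFrame; df still has Species
print("Species" in df.columns, "Species" in without.columns)  # True False

df = make_df()
del df["Species"]                       # removes the column from df itself
print("Species" in df.columns)          # False

df = make_df()
removed = df.pop("Species")             # removes it from df and returns it
print("Species" in df.columns, removed.tolist())  # False ['Iris-setosa', 'Iris-virginica']
```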

# Sorting Values
Consider the following DataFrame:

In [35]:
df3=pd.DataFrame({'A':[4,62,2],'B':[34,1,4]})
df3

,A,B
0,4,34
1,62,1
2,2,4


Use **.sort_values()** to sort a DataFrame by one or more columns:

In [43]:
df3.sort_values(by=['A','B'],ascending=[True,True])

,A,B
2,2,4
0,4,34
1,62,1


`ascending=True` sorts in ascending order and `False` sorts in descending order. You can pass one value per column.

> Sorting in Pandas is hierarchical: rows are ordered by the first column, and the second column is only used to break ties in the first.

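Tie-breaking is easiest to see when the first column has duplicates and the sort directions are mixed. Here is a sketch with hypothetical values:

```python
import pandas as pd

df3 = pd.DataFrame({"A": [2, 1, 2, 1], "B": [5, 7, 3, 9]})

# Sort by A ascending; rows with equal A are then ordered by B descending
result = df3.sort_values(by=["A", "B"], ascending=[True, False])

print(result)
```
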
# Grouping and Aggregation


Consider the following DataFrame:

In [72]:
df=pd.DataFrame({"Name":["Muneeb","Hashir","Usman"],"Marks":[90,95,90],"Grade":['A','B','C'],"Score":[90,22,11]})
df

,Name,Marks,Grade,Score
0,Muneeb,90,A,90
1,Hashir,95,B,22
2,Usman,90,C,11


You can group rows by the values of one or more columns and then aggregate each group. Here, `.count()` counts the non-null entries in each column for every Marks value:

In [73]:
group=df.groupby("Marks")
group.count()

,Name,Grade,Score
Marks,,,
90,2,2,2
95,1,1,1


Similarly, you can use other aggregate functions such as **sum()**, **min()**, **max()**, and **mean()**.

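To compute several statistics at once, select a column from the groups and pass a list of function names to `.agg()`. Here is a sketch using the Marks/Score DataFrame above (without the Grade column):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Muneeb", "Hashir", "Usman"],
    "Marks": [90, 95, 90],
    "Score": [90, 22, 11],
})

# One row per Marks value, one column per aggregation of Score
summary = df.groupby("Marks")["Score"].agg(["sum", "mean", "max"])

print(summary)
```
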
# Rearranging and Reshaping (Pivot, Melt)
You can reshape a DataFrame using functions such as **pivot()** and **melt()**. Consider the following DataFrame:

In [76]:
df=pd.read_csv("Weather.csv")
df

,Date,City,Temperature,Humidity
0,5/2/2025,Lahore,40,45
1,5/3/2025,Lahore,41,55
2,5/4/2025,Lahore,29,57
3,5/5/2025,Karachi,30,23
4,5/6/2025,Karachi,32,33
5,5/7/2025,Karachi,35,12
6,5/8/2025,Islamabad,25,25
7,5/9/2025,Islamabad,23,61


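**pivot()** turns the unique values of one column into new columns. Here each city becomes a column, each date becomes a row, and the temperatures fill the cells:

In [87]: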
df.pivot(index="Date",columns="City",values="Temperature")

City,Islamabad,Karachi,Lahore
Date,,,
5/2/2025,,,40.0
5/3/2025,,,41.0
5/4/2025,,,29.0
5/5/2025,,30.0,
5/6/2025,,32.0,
5/7/2025,,35.0,
5/8/2025,25.0,,
5/9/2025,23.0,,


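The `values` argument is optional. If you omit it, all remaining columns (here Temperature and Humidity) are pivoted, which produces hierarchical column labels.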

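Here is a self-contained sketch of both forms, using a few rows copied from the weather DataFrame above:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["5/2/2025", "5/3/2025", "5/5/2025"],
    "City": ["Lahore", "Lahore", "Karachi"],
    "Temperature": [40, 41, 30],
    "Humidity": [45, 55, 23],
})

temps = df.pivot(index="Date", columns="City", values="Temperature")
everything = df.pivot(index="Date", columns="City")  # no values: all remaining columns

print(temps)
print(everything.columns.tolist())  # (value column, city) pairs
```
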
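**pivot_table()** works like pivot(), but it can also summarize and aggregate data. When several rows share the same index/column pair, pivot() raises an error, whereas pivot_table() aggregates them (by default with the mean). In the data below, each city has two readings per day: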

In [88]:
df=pd.read_csv("Weather2.csv")
df

,Date,City,Temperature,Humidity
0,5/1/2025,Lahore,65,56
1,5/1/2025,Lahore,61,54
2,5/2/2025,Lahore,70,60
3,5/2/2025,Lahore,72,62
4,5/1/2025,Karachi,75,80
5,5/1/2025,Karachi,78,83
6,5/2/2025,Karachi,82,85
7,5/2/2025,Karachi,80,26


In [89]:
df.pivot_table(index="City",columns="Date")

,Humidity,Humidity,Temperature,Temperature
Date,5/1/2025,5/2/2025,5/1/2025,5/2/2025
City,,,,
Karachi,81.5,55.5,76.5,81.0
Lahore,55.0,61.0,63.0,71.0

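This sketch, built from the 5/1/2025 rows of the data above, shows the error pivot() raises on duplicates. It also shows how to pick a different aggregation with the `aggfunc` parameter:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["5/1/2025"] * 4,
    "City": ["Lahore", "Lahore", "Karachi", "Karachi"],
    "Temperature": [65, 61, 75, 78],
})

# pivot() cannot place two values in the same (City, Date) cell
pivot_failed = False
try:
    df.pivot(index="City", columns="Date", values="Temperature")
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# pivot_table() aggregates the duplicates; here with max instead of the default mean
highest = df.pivot_table(index="City", columns="Date", values="Temperature", aggfunc="max")
print(highest)
```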

Now let's use the **melt()** function, which turns columns into rows. Consider this DataFrame in wide format:

In [91]:
df=pd.read_csv("Melt.csv")
df

,Day,Chicago,Newyork,Dubai
0,Monday,32,29,31
1,Tuesday,21,28,34
2,Wednesday,35,39,33
3,Thursday,40,29,38
4,Friday,23,31,39
5,Saturday,12,32,43
6,Sunday,11,19,40


In [None]:
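# id_vars stays as an identifier column; every other column is unpivoted into
# City (the old column name) and Temperature (the value)
df1=pd.melt(df, id_vars="Day",value_name="Temperature",var_name="City")
df1[df1["City"]=="Chicago"]  # display only the Chicago rows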

,Day,City,Temperature
0,Monday,Chicago,32
1,Tuesday,Chicago,21
2,Wednesday,Chicago,35
3,Thursday,Chicago,40
4,Friday,Chicago,23
5,Saturday,Chicago,12
6,Sunday,Chicago,11


> The major difference between melt() and pivot() is that they are opposites of each other.\
> melt() converts data from wide format to long format, and pivot() converts long format back to wide format.

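Because the two functions are opposites, melting a wide DataFrame and pivoting it back recovers the original layout. Here is a sketch using two of the cities from the table above:

```python
import pandas as pd

wide = pd.DataFrame({
    "Day": ["Monday", "Tuesday"],
    "Chicago": [32, 21],
    "Dubai": [31, 34],
})

# wide -> long: one row per (Day, City) pair
long_df = pd.melt(wide, id_vars="Day", var_name="City", value_name="Temperature")

# long -> wide again: pivot undoes the melt
back = long_df.pivot(index="Day", columns="City", values="Temperature")

print(long_df.shape)  # (4, 3)
print(back)
```
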
# Merging and Joining DataFrames
Consider the two DataFrames below. We can combine two or more DataFrames in several ways, discussed next:

In [5]:
df=pd.DataFrame(
    {
        "FellowshipID":[1001,1002,1003,1004],
        "FirstName":["Frodo","SamWise","Gandalf","Pippin"],
        "Skills":["Hiding","Gardening","Spells","Fireworks"]
    }
)
df

,FellowshipID,FirstName,Skills
0,1001,Frodo,Hiding
1,1002,SamWise,Gardening
2,1003,Gandalf,Spells
3,1004,Pippin,Fireworks


In [6]:
df1=pd.DataFrame(
    {
        "FellowshipID":[1001,1002,1006,1007,1008],
        "FirstName":["Frodo","SamWise","Legolas","Elrond","Barromir"],
        "Age":[59,39,2931,6520,51]
    }
)
df1

,FellowshipID,FirstName,Age
0,1001,Frodo,59
1,1002,SamWise,39
2,1006,Legolas,2931
3,1007,Elrond,6520
4,1008,Barromir,51


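By default, **merge()** performs an inner join on every column the two DataFrames share, here both FellowshipID and FirstName:

In [None]: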
df.merge(df1)

,FellowshipID,FirstName,Skills,Age
0,1001,Frodo,Hiding,59
1,1002,SamWise,Gardening,39


In [None]:
df.merge(df1,how="inner") # Keeps only rows whose keys (FellowshipID and FirstName) appear in both DataFrames

,FellowshipID,FirstName,Skills,Age
0,1001,Frodo,Hiding,59
1,1002,SamWise,Gardening,39


In [None]:
df.merge(df1,how="outer") # Keeps all rows from both DataFrames; missing values become NaN

,FellowshipID,FirstName,Skills,Age
0,1001,Frodo,Hiding,59.0
1,1002,SamWise,Gardening,39.0
2,1003,Gandalf,Spells,
3,1004,Pippin,Fireworks,
4,1006,Legolas,,2931.0
5,1007,Elrond,,6520.0
6,1008,Barromir,,51.0


In [12]:
df.merge(df1,how="left") # Keeps every row of df (the left DataFrame); unmatched rows get NaN for Age

,FellowshipID,FirstName,Skills,Age
0,1001,Frodo,Hiding,59.0
1,1002,SamWise,Gardening,39.0
2,1003,Gandalf,Spells,
3,1004,Pippin,Fireworks,


In [13]:
df.merge(df1,how="right") # Keeps every row of df1 (the right DataFrame); unmatched rows get NaN for Skills

,FellowshipID,FirstName,Skills,Age
0,1001,Frodo,Hiding,59
1,1002,SamWise,Gardening,39
2,1006,Legolas,,2931
3,1007,Elrond,,6520
4,1008,Barromir,,51


This diagram illustrates the four join types:\
![Inner, outer, left and right merges in pandas](4-pandas-merge-inner-outer-left-right-1024x771.webp)

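If the DataFrames share a column you do not want to join on, pass `on=` explicitly. Overlapping columns then get suffixes. Setting `indicator=True` adds a `_merge` column that shows where each row came from. Here is a sketch using trimmed copies of the DataFrames above; the `_left`/`_right` suffixes are an arbitrary choice:

```python
import pandas as pd

df = pd.DataFrame({"FellowshipID": [1001, 1002, 1003],
                   "FirstName": ["Frodo", "SamWise", "Gandalf"]})
df1 = pd.DataFrame({"FellowshipID": [1001, 1002, 1006],
                    "FirstName": ["Frodo", "SamWise", "Legolas"]})

# Join on FellowshipID only; both FirstName columns are kept, told apart by suffixes
merged = df.merge(df1, on="FellowshipID", how="outer",
                  suffixes=("_left", "_right"), indicator=True)

print(merged)
```
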
In [None]:
pd.concat([df,df1]) # Stacks the DataFrames vertically: df's rows first, then df1's rows below

,FellowshipID,FirstName,Skills,Age
0,1001,Frodo,Hiding,
1,1002,SamWise,Gardening,
2,1003,Gandalf,Spells,
3,1004,Pippin,Fireworks,
0,1001,Frodo,,59.0
1,1002,SamWise,,39.0
2,1006,Legolas,,2931.0
3,1007,Elrond,,6520.0
4,1008,Barromir,,51.0

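Two variations are useful here. `ignore_index=True` renumbers the rows instead of repeating 0, 1, 2, ..., and `axis=1` places DataFrames side by side, aligned on the index. Here is a sketch using trimmed copies of the data above (the Age values come from df1):

```python
import pandas as pd

top = pd.DataFrame({"FellowshipID": [1001, 1002], "FirstName": ["Frodo", "SamWise"]})
bottom = pd.DataFrame({"FellowshipID": [1006], "FirstName": ["Legolas"]})
ages = pd.DataFrame({"Age": [59, 39]})

stacked = pd.concat([top, bottom], ignore_index=True)  # rows renumbered 0, 1, 2
side_by_side = pd.concat([top, ages], axis=1)          # columns added, rows aligned by index

print(stacked)
print(side_by_side)
```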

# Conclusion and What to do next?

With this, we have covered the core features of Pandas. Next, **practice on LeetCode or other platforms** to get a better grip on it!\
Thank you for reading this notebook. I hope you found it helpful!