 
# 📌 Topic 1: Introduction to Pandas

## What is Pandas?
Pandas is an open-source Python library for data manipulation and analysis. Its two core data structures, the one-dimensional `Series` and the two-dimensional `DataFrame`, make it efficient to load, clean, transform, and summarize tabular data.

## Installing Pandas
Before using Pandas, you need to install it. Run the following command in your terminal or command prompt:
```sh
pip install pandas
```

## Importing Pandas
Let's start by importing the Pandas library:
 

In [1]:
# Importing pandas

import pandas as pd

# Checking the version of pandas
print("Pandas Version:", pd.__version__)

Pandas Version: 2.2.3


 
## 📍 Creating a Simple Series
A Series is a one-dimensional labeled array capable of holding any data type.
 

In [2]:
# Creating a Series

import pandas as pd

data = [10, 20, 30, 40, 50]

series = pd.Series(data)
series

0    10
1    20
2    30
3    40
4    50
dtype: int64


 
## 📍 Creating a Simple DataFrame
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.
 

In [3]:
# Creating a DataFrame from a dictionary

import pandas as pd 

data = {
    'Name': ['Amit', 'Neha', 'Rahul', 'Priya'],
    'Age': [25, 30, 22, 28],
    'City': ['Mumbai', 'Delhi', 'Bangalore', 'Chennai']
}

df = pd.DataFrame(data)

df

    Name  Age       City
0   Amit   25     Mumbai
1   Neha   30      Delhi
2  Rahul   22  Bangalore
3  Priya   28    Chennai


 
# Topic 2: Pandas Data Structures

## 📍 Pandas Series
A Series is a one-dimensional array-like structure with labels (index). It can hold any data type.
 

In [4]:
import pandas as pd

# Creating a Series from a list

data = [100, 200, 300, 400, 500]
series = pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])

series

Series:
 a    100
b    200
c    300
d    400
e    500
dtype: int64


 
## 📍 Pandas DataFrame
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.
 

In [5]:
# Creating a DataFrame from a dictionary

import pandas as pd 

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, 60000, 55000, 52000],
    'Department': ['IT', 'HR', 'Finance', 'Marketing']
}
df = pd.DataFrame(data)

df 

DataFrame:

  Employee  Salary Department
0      Raj   50000         IT
1    Krish   60000         HR
2     Amit   55000    Finance
3    Priya   52000  Marketing


 
## Creating a DataFrame from a CSV file
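Assume we have an Indian employee dataset stored in `employees.csv`. The code is commented out because the file is not bundled with this notebook; uncomment it once the file is in the working directory.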
 

In [6]:
# Reading a CSV file into a DataFrame
# df_csv = pd.read_csv("employees.csv")
# print("CSV DataFrame:\n", df_csv)
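Since `employees.csv` is not included here, the following is a minimal sketch that reads CSV text from an in-memory buffer instead. The column names and rows in `csv_text` are assumptions made up for illustration, not the contents of the real file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for employees.csv
csv_text = """Employee,Salary,Department
Raj,50000,IT
Krish,60000,HR
Amit,55000,Finance
"""

# read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)
print(df_csv.dtypes)
```

Passing a path such as `"employees.csv"` to `pd.read_csv()` should behave the same way once the file exists.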

 
## Creating a DataFrame from an Excel file
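Assume we have an Indian sales dataset stored in `sales.xlsx`. The code is commented out because the file is not bundled with this notebook. Reading `.xlsx` files also requires an Excel engine such as `openpyxl` to be installed.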
 

In [7]:
# Reading an Excel file into a DataFrame
# df_excel = pd.read_excel("sales.xlsx")
# print("Excel DataFrame:\n", df_excel)

 
# Topic 3: DataFrame Indexing and Selection

## 📍 Selecting Columns in a DataFrame
We can select a single column or multiple columns from a DataFrame.
 

In [8]:
import pandas as pd

# Creating a sample DataFrame

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, 60000, 55000, 52000],
    'Department': ['IT', 'HR', 'Finance', 'Marketing']
}

df = pd.DataFrame(data)

# Selecting a single column
print("Single Column Selection:\n", df['Employee'])

# Selecting multiple columns
print("\n Multiple Columns Selection:")
df[['Employee', 'Salary']]

Single Column Selection:
 0      Raj
1    Krish
2     Amit
3    Priya
Name: Employee, dtype: object

 Multiple Columns Selection:


  Employee  Salary
0      Raj   50000
1    Krish   60000
2     Amit   55000
3    Priya   52000


 
## 📍 Selecting Rows in a DataFrame
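We can select rows with `.loc[]`, which looks rows up by index label, or `.iloc[]`, which looks them up by integer position. With the default integer index used here, the two happen to coincide.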
 

In [9]:
# Creating a sample DataFrame

import pandas as pd 

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, 60000, 55000, 52000],
    'Department': ['IT', 'HR', 'Finance', 'Marketing']
}

df = pd.DataFrame(data)

# Selecting rows using loc

print("Row Selection using loc:\n", df.loc[1])

Row Selection using loc:
 Employee      Krish
Salary        60000
Department       HR
Name: 1, dtype: object


In [10]:
# Creating a sample DataFrame

import pandas as pd

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, 60000, 55000, 52000],
    'Department': ['IT', 'HR', 'Finance', 'Marketing']
}

df = pd.DataFrame(data)


# Selecting rows using iloc

print("Row Selection using iloc:\n", df.iloc[2])

Row Selection using iloc:
 Employee         Amit
Salary          55000
Department    Finance
Name: 2, dtype: object
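The difference between `.loc[]` and `.iloc[]` only becomes visible when the index labels are not `0, 1, 2, ...`. The sketch below uses a hypothetical index of `10, 20, 30` (an assumption for illustration) to show label-based versus position-based lookup.

```python
import pandas as pd

df = pd.DataFrame(
    {'Employee': ['Raj', 'Krish', 'Amit'], 'Salary': [50000, 60000, 55000]},
    index=[10, 20, 30],  # hypothetical non-default labels
)

by_label = df.loc[20]      # row whose index label is 20
by_position = df.iloc[1]   # second row, regardless of its label

# Slices differ too: loc includes the end label, iloc excludes the end position
loc_slice = df.loc[10:20]
iloc_slice = df.iloc[0:1]

print(by_label['Employee'], by_position['Employee'])
print(len(loc_slice), len(iloc_slice))
```

Here both lookups return Krish's row, but `df.loc[1]` would raise a `KeyError` because no row carries the label `1`.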


 
## 📍 Filtering Data based on Conditions
We can filter data based on conditions applied to DataFrame columns.
 

In [11]:
import pandas as pd

# Creating a sample DataFrame

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, 60000, 55000, 52000],
    'Department': ['IT', 'HR', 'Finance', 'Marketing']
}

df = pd.DataFrame(data)

# Filtering employees with Salary greater than 55000

filtered_df = df[df['Salary'] > 55000]

print("Filtered Data:\n", filtered_df)

Filtered Data:
   Employee  Salary Department
1    Krish   60000         HR
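Conditions can also be combined. The following sketch, which reuses the same sample data, shows `&` with parenthesized comparisons and `isin()` for membership tests; the particular thresholds are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, 60000, 55000, 52000],
    'Department': ['IT', 'HR', 'Finance', 'Marketing']
})

# Each comparison needs its own parentheses; use & for "and", | for "or"
high_paid_non_hr = df[(df['Salary'] > 50000) & (df['Department'] != 'HR')]

# isin() keeps rows whose value appears in the given list
it_or_hr = df[df['Department'].isin(['IT', 'HR'])]

print(high_paid_non_hr)
print(it_or_hr)
```

Python's `and` / `or` keywords do not work element-wise on a Series, which is why the bitwise operators are used here.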


 
# Topic 4: DataFrame Operations and Manipulations

## 📍 Adding a New Column
We can add a new column to a DataFrame by assigning values to it.
 

In [12]:
import pandas as pd

# Creating a sample DataFrame

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, 60000, 55000, 52000],
    'Department': ['IT', 'HR', 'Finance', 'Marketing']
}
df = pd.DataFrame(data)

df

  Employee  Salary Department
0      Raj   50000         IT
1    Krish   60000         HR
2     Amit   55000    Finance
3    Priya   52000  Marketing


In [13]:
import pandas as pd

# Creating a sample DataFrame

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, 60000, 55000, 52000],
    'Department': ['IT', 'HR', 'Finance', 'Marketing']
}

df = pd.DataFrame(data)

# Adding a new column

df['Bonus'] = df['Salary'] * 0.1

df

  Employee  Salary Department   Bonus
0      Raj   50000         IT  5000.0
1    Krish   60000         HR  6000.0
2     Amit   55000    Finance  5500.0
3    Priya   52000  Marketing  5200.0


 
## 📍 Modifying an Existing Column
We can modify an existing column by reassigning values.
 

In [14]:
import pandas as pd

# Creating a sample DataFrame

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, 60000, 55000, 52000],
    'Department': ['IT', 'HR', 'Finance', 'Marketing']
}

df = pd.DataFrame(data)


# Increasing salary by 5%

df['Salary'] = df['Salary'] * 1.05

print("DataFrame after salary increment:\n", df)

DataFrame after salary increment:
   Employee   Salary Department
0      Raj  52500.0         IT
1    Krish  63000.0         HR
2     Amit  57750.0    Finance
3    Priya  54600.0  Marketing


 
## 📍  Deleting a Column
We can remove a column using `drop()`, which returns a new DataFrame and leaves the original unchanged unless the result is reassigned.
 

In [15]:
import pandas as pd

# Creating a sample DataFrame

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, 60000, 55000, 52000],
    'Department': ['IT', 'HR', 'Finance', 'Marketing']
}

df = pd.DataFrame(data)

# Dropping the Salary column

df = df.drop(columns=['Salary'])
print("DataFrame after dropping Salary column:\n", df)

DataFrame after dropping Salary column:
   Employee Department
0      Raj         IT
1    Krish         HR
2     Amit    Finance
3    Priya  Marketing


 
## 📍 Renaming Columns
We can rename columns using `rename()`.
 

In [16]:
import pandas as pd

# Creating a sample DataFrame

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, 60000, 55000, 52000],
    'Department': ['IT', 'HR', 'Finance', 'Marketing']
}

df = pd.DataFrame(data)

# Renaming columns

df = df.rename(columns={'Employee': 'Emp Name', 'Salary': 'Monthly Salary'})
print("DataFrame after renaming columns:\n", df)

DataFrame after renaming columns:
   Emp Name  Monthly Salary Department
0      Raj           50000         IT
1    Krish           60000         HR
2     Amit           55000    Finance
3    Priya           52000  Marketing


 
# Topic 5: Handling Missing Data in Pandas

## 📍  Identifying Missing Data
We can check for missing values in a DataFrame using `isna()` or `isnull()`.
 

In [17]:
import pandas as pd
import numpy as np

# Creating a sample DataFrame with missing values
data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, np.nan, 55000, 52000],
    'Department': ['IT', 'HR', np.nan, 'Marketing']
}
df = pd.DataFrame(data)

# Checking for missing values

print("Missing values in DataFrame:\n", df.isna())

print("Count of missing values:\n", df.isna().sum())

Missing values in DataFrame:
    Employee  Salary  Department
0     False   False       False
1     False    True       False
2     False   False        True
3     False   False       False
Count of missing values:
 Employee      0
Salary        1
Department    1
dtype: int64


 
## 📍  Filling Missing Data
We can fill missing values using `fillna()`.
 

In [18]:
import pandas as pd
import numpy as np

# Creating a sample DataFrame with missing values
data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, np.nan, 55000, 52000],
    'Department': ['IT', 'HR', np.nan, 'Marketing']
}
df = pd.DataFrame(data)


# Filling missing Salary with the column mean and missing Department with 'Unknown'

df_filled = df.fillna({'Salary': df['Salary'].mean(), 'Department': 'Unknown'})

print("DataFrame after filling missing values:\n", df_filled)

DataFrame after filling missing values:
   Employee        Salary Department
0      Raj  50000.000000         IT
1    Krish  52333.333333         HR
2     Amit  55000.000000    Unknown
3    Priya  52000.000000  Marketing


 
## 📍 Dropping Missing Data
We can drop rows or columns containing missing values using `dropna()`.
 

In [19]:
import pandas as pd
import numpy as np

# Creating a sample DataFrame with missing values
data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, np.nan, 55000, 52000],
    'Department': ['IT', 'HR', np.nan, 'Marketing']
}
df = pd.DataFrame(data)

# Dropping rows with missing values

df_dropped = df.dropna()
print("DataFrame after dropping missing values:\n", df_dropped)

DataFrame after dropping missing values:
   Employee   Salary Department
0      Raj  50000.0         IT
3    Priya  52000.0  Marketing
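The example above drops rows. As a rough sketch of the other options, `axis=1` drops columns instead, and `subset` restricts the check to chosen columns. The data is the same sample used above.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Salary': [50000, np.nan, 55000, 52000],
    'Department': ['IT', 'HR', np.nan, 'Marketing']
})

# Drop every column that contains at least one missing value
cols_dropped = df.dropna(axis=1)

# Drop rows only when Salary is missing; a missing Department is tolerated
salary_required = df.dropna(subset=['Salary'])

print(cols_dropped)
print(salary_required)
```

Neither call modifies `df` itself; each returns a new DataFrame.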


 
# Topic 6: Merging, Joining, and Concatenation in Pandas

## 📍 Merging DataFrames
We can merge DataFrames using `merge()`.
 

In [20]:
import pandas as pd

# Creating sample DataFrames
df1 = pd.DataFrame({
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Department': ['IT', 'HR', 'Finance', 'Marketing'],
    'Salary': [50000, 60000, 55000, 52000]
})

df2 = pd.DataFrame({
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Experience': [5, 7, 3, 4]
})

# Merging on Employee column

merged_df = pd.merge(df1, df2, on='Employee')

print("Merged DataFrame:\n", merged_df)

Merged DataFrame:
   Employee Department  Salary  Experience
0      Raj         IT   50000           5
1    Krish         HR   60000           7
2     Amit    Finance   55000           3
3    Priya  Marketing   52000           4


 
## 📍 Joining DataFrames
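We can combine DataFrames on their index using `join()`. In the example below, both DataFrames are first indexed by `Department` with `set_index()`.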
 

In [21]:
import pandas as pd
import numpy as np

# Creating a sample DataFrame df1

df1 = pd.DataFrame({
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Department': ['IT', 'HR', 'Finance', 'Marketing'],
    'Salary': [50000, 60000, 55000, 52000]
})


# Creating sample DataFrame df3

df3 = pd.DataFrame({
    'Department': ['IT', 'HR', 'Finance', 'Marketing'],
    'Location': ['Bangalore', 'Mumbai', 'Delhi', 'Pune']
})

# Joining df1 and df3 on the Department index using join()

df4 = df1.set_index('Department').join(df3.set_index('Department'))
print("Joined DataFrame:\n", df4)


# Creating another sample DataFrame df5

df5 = pd.DataFrame({
    'Employee': ['Vikas', 'Neha'],
    'Department': ['IT', 'HR'],
    'Salary': [62000, 58000]
})

# Concatenating DataFrames: we can stack DataFrames vertically using `concat()`

concatenated_df = pd.concat([df1, df5], ignore_index=True)

print("Concatenated DataFrame:")
concatenated_df

Joined DataFrame:
            Employee  Salary   Location
Department                            
IT              Raj   50000  Bangalore
HR            Krish   60000     Mumbai
Finance        Amit   55000      Delhi
Marketing     Priya   52000       Pune
Concatenated DataFrame:

  Employee Department  Salary
0      Raj         IT   50000
1    Krish         HR   60000
2     Amit    Finance   55000
3    Priya  Marketing   52000
4    Vikas         IT   62000
5     Neha         HR   58000


 
# Topic 7: Pivot Tables and Crosstab in Pandas

## 📍 Creating Pivot Tables
We can summarize data using `pivot_table()`.
 

In [22]:
import pandas as pd

# Creating a sample DataFrame

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya', 'Vikas', 'Neha'],
    'Department': ['IT', 'HR', 'Finance', 'Marketing', 'IT', 'HR'],
    'Salary': [50000, 60000, 55000, 52000, 62000, 58000],
    'Experience': [5, 7, 3, 4, 6, 8]
}

df = pd.DataFrame(data)

# Creating a pivot table

pivot_table = pd.pivot_table(df, values='Salary', index='Department', columns='Experience', aggfunc='mean', fill_value=0)

print("Pivot Table:")
pivot_table

Pivot Table:

Experience        3        4        5        6        7        8
Department
Finance     55000.0      0.0      0.0      0.0      0.0      0.0
HR              0.0      0.0      0.0      0.0  60000.0  58000.0
IT              0.0      0.0  50000.0  62000.0      0.0      0.0
Marketing       0.0  52000.0      0.0      0.0      0.0      0.0


 
## 📍 Creating a Crosstab
We can use `crosstab()` to compute frequency tables.
 

In [23]:
import pandas as pd 

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya', 'Vikas', 'Neha'],
    'Department': ['IT', 'HR', 'Finance', 'Marketing', 'IT', 'HR'],
    'Salary': [50000, 60000, 55000, 52000, 62000, 58000],
    'Experience': [5, 7, 3, 4, 6, 8]
}

df = pd.DataFrame(data)


# Creating a crosstab for Department and Experience

crosstab_result = pd.crosstab(df['Department'], df['Experience'])

print("Crosstab Result:")
crosstab_result

Crosstab Result:

Experience  3  4  5  6  7  8
Department
Finance     1  0  0  0  0  0
HR          0  0  0  0  1  1
IT          0  0  1  1  0  0
Marketing   0  1  0  0  0  0


 
# Topic 8: Applying Functions and Lambda in Pandas

## 📍 Using `apply()` Method
We can apply functions to Series and DataFrames using `apply()`.
 

In [24]:
import pandas as pd

# Creating a sample DataFrame

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya', 'Vikas', 'Neha'],
    'Salary': [50000, 60000, 55000, 52000, 62000, 58000],
    'Experience': [5, 7, 3, 4, 6, 8]
}

df = pd.DataFrame(data)

# Defining a function to categorize experience

def experience_category(exp):
    if exp < 5:
        return 'Junior'
    elif exp <= 7:
        return 'Mid-Level'
    else:
        return 'Senior'

# Applying the function to the Experience column

df['Experience Level'] = df['Experience'].apply(experience_category)

print("DataFrame with Experience Category:")
df

DataFrame with Experience Category:

  Employee  Salary  Experience Experience Level
0      Raj   50000           5        Mid-Level
1    Krish   60000           7        Mid-Level
2     Amit   55000           3           Junior
3    Priya   52000           4           Junior
4    Vikas   62000           6        Mid-Level
5     Neha   58000           8           Senior
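The example above applies a function to a single column (a Series). As a minimal sketch of the DataFrame case, `apply()` with `axis=1` passes each row to the function so it can read several columns at once. The bonus rule and the `bonus` helper are hypothetical, invented purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Employee': ['Raj', 'Krish', 'Amit'],
    'Salary': [50000, 60000, 55000],
    'Experience': [5, 7, 3]
})

# Hypothetical rule: 10% bonus for 5 or more years of experience, otherwise 5%
def bonus(row):
    rate = 0.10 if row['Experience'] >= 5 else 0.05
    return row['Salary'] * rate

# axis=1 passes each row (as a Series) to the function
df['Bonus'] = df.apply(bonus, axis=1)

print(df)
```

Row-wise `apply()` is convenient but runs a Python function per row, so for large DataFrames a vectorized expression is usually faster.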


 
## 📍 Using Lambda Functions
We can use `lambda` inside `apply()` to perform quick operations.
 

In [25]:
import pandas as pd

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya', 'Vikas', 'Neha'],
    'Salary': [50000, 60000, 55000, 52000, 62000, 58000],
    'Experience': [5, 7, 3, 4, 6, 8]
}

df = pd.DataFrame(data)

# Increasing salary by 10% using lambda

df['Updated Salary'] = df['Salary'].apply(lambda x: x * 1.10)

print("DataFrame with Updated Salary:\n", df)

DataFrame with Updated Salary:
   Employee  Salary  Experience  Updated Salary
0      Raj   50000           5         55000.0
1    Krish   60000           7         66000.0
2     Amit   55000           3         60500.0
3    Priya   52000           4         57200.0
4    Vikas   62000           6         68200.0
5     Neha   58000           8         63800.0


 
# Topic 9: GroupBy Operations in Pandas

## 📍 Using `groupby()` to Aggregate Data
The `groupby()` function helps in grouping data based on specific columns.
 

In [26]:
import pandas as pd

# Creating a sample DataFrame

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya', 'Vikas', 'Neha', 'Ankit', 'Meera'],
    'Department': ['IT', 'HR', 'Finance', 'Marketing', 'IT', 'HR', 'Finance', 'Marketing'],
    'Salary': [50000, 60000, 55000, 52000, 62000, 58000, 53000, 51000],
    'Experience': [5, 7, 3, 4, 6, 8, 2, 5]
}

df = pd.DataFrame(data)

# Grouping by Department and calculating mean salary

grouped_salary = df.groupby('Department')['Salary'].mean()

print("Mean Salary by Department:\n", grouped_salary)


Mean Salary by Department:
 Department
Finance      54000.0
HR           59000.0
IT           56000.0
Marketing    51500.0
Name: Salary, dtype: float64


 
## 📍 Using Multiple Aggregations
We can use multiple aggregation functions at once.
 

In [27]:
import pandas as pd

data = {
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya', 'Vikas', 'Neha', 'Ankit', 'Meera'],
    'Department': ['IT', 'HR', 'Finance', 'Marketing', 'IT', 'HR', 'Finance', 'Marketing'],
    'Salary': [50000, 60000, 55000, 52000, 62000, 58000, 53000, 51000],
    'Experience': [5, 7, 3, 4, 6, 8, 2, 5]
}

df = pd.DataFrame(data)

# Grouping by Department and applying multiple aggregations

grouped_agg = df.groupby('Department').agg({'Salary': ['mean', 'max', 'min'], 'Experience': 'mean'})

print("Aggregated Data by Department:\n", grouped_agg)

Aggregated Data by Department:
              Salary               Experience
               mean    max    min       mean
Department                                  
Finance     54000.0  55000  53000        2.5
HR          59000.0  60000  58000        7.5
IT          56000.0  62000  50000        5.5
Marketing   51500.0  52000  51000        4.5
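The dictionary form above produces two-level column headers. As an alternative sketch, named aggregation gives flat, self-describing column names; the output names (`mean_salary`, `max_salary`, `mean_experience`) are arbitrary choices made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya', 'Vikas', 'Neha', 'Ankit', 'Meera'],
    'Department': ['IT', 'HR', 'Finance', 'Marketing', 'IT', 'HR', 'Finance', 'Marketing'],
    'Salary': [50000, 60000, 55000, 52000, 62000, 58000, 53000, 51000],
    'Experience': [5, 7, 3, 4, 6, 8, 2, 5]
})

# Named aggregation: new_column=(source_column, aggregation_function)
summary = df.groupby('Department').agg(
    mean_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
    mean_experience=('Experience', 'mean'),
)

print(summary)
```

Flat column names are often easier to work with in later steps such as sorting or merging.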


 
# Topic 10: Merging, Joining, and Concatenation in Pandas

## 📍 Merging DataFrames
The `merge()` function allows combining DataFrames based on a common column.
 

In [28]:
import pandas as pd

# Creating sample DataFrames

df1 = pd.DataFrame({
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Department': ['IT', 'HR', 'Finance', 'Marketing'],
    'Salary': [50000, 60000, 55000, 52000]
})

df2 = pd.DataFrame({
    'Employee': ['Raj', 'Krish', 'Amit', 'Neha'],
    'Experience': [5, 7, 3, 8]
})

# Merging DataFrames on Employee column

merged_df = pd.merge(df1, df2, on='Employee', how='inner')


print("Merged DataFrame:")
merged_df

Merged DataFrame:

  Employee Department  Salary  Experience
0      Raj         IT   50000           5
1    Krish         HR   60000           7
2     Amit    Finance   55000           3
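An inner merge keeps only employees present in both DataFrames, which is why Priya and Neha are missing above. The sketch below uses `how='outer'` on the same data to keep everyone, with `indicator=True` adding a `_merge` column that records where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Department': ['IT', 'HR', 'Finance', 'Marketing'],
    'Salary': [50000, 60000, 55000, 52000]
})

df2 = pd.DataFrame({
    'Employee': ['Raj', 'Krish', 'Amit', 'Neha'],
    'Experience': [5, 7, 3, 8]
})

# Outer merge keeps unmatched rows from both sides and fills the gaps with NaN
outer_df = pd.merge(df1, df2, on='Employee', how='outer', indicator=True)

print(outer_df)
```

The `_merge` column takes the values `left_only`, `right_only`, or `both`, which can help when checking for unmatched keys.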


 
## 📍 Concatenating DataFrames
We can use `concat()` to stack DataFrames vertically or horizontally.
 

In [29]:
import pandas as pd

df1 = pd.DataFrame({
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Department': ['IT', 'HR', 'Finance', 'Marketing'],
    'Salary': [50000, 60000, 55000, 52000]
})

df3 = pd.DataFrame({
    'Employee': ['Vikas', 'Meera'],
    'Department': ['IT', 'Marketing'],
    'Salary': [62000, 51000]
})

# Concatenating DataFrames vertically

concat_df = pd.concat([df1, df3], ignore_index=True)

print("Concatenated DataFrame:")
concat_df

Concatenated DataFrame:

  Employee Department  Salary
0      Raj         IT   50000
1    Krish         HR   60000
2     Amit    Finance   55000
3    Priya  Marketing   52000
4    Vikas         IT   62000
5    Meera  Marketing   51000
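The example above stacks rows vertically. For the horizontal case, a minimal sketch with two small hypothetical frames (`names` and `pay`, invented for illustration) looks like this:

```python
import pandas as pd

names = pd.DataFrame({'Employee': ['Raj', 'Krish']})
pay = pd.DataFrame({'Salary': [50000, 60000]})

# axis=1 places the frames side by side, aligning rows on the index
side_by_side = pd.concat([names, pay], axis=1)

print(side_by_side)
```

Because rows are matched by index label rather than by position, frames with different indexes would produce `NaN` where labels do not line up.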


 
## 📍 Joining DataFrames
The `join()` method is used to combine DataFrames using their index.
 

In [30]:
import pandas as pd

df1 = pd.DataFrame({
    'Employee': ['Raj', 'Krish', 'Amit', 'Priya'],
    'Department': ['IT', 'HR', 'Finance', 'Marketing'],
    'Salary': [50000, 60000, 55000, 52000]
})

df2 = pd.DataFrame({
    'Employee': ['Raj', 'Krish', 'Amit', 'Neha'],
    'Experience': [5, 7, 3, 8]
})

# Setting Employee as index

df1.set_index('Employee', inplace=True)
df2.set_index('Employee', inplace=True)

# Performing join operation

joined_df = df1.join(df2, how='left')

print("Joined DataFrame:")
joined_df

Joined DataFrame:

         Department  Salary  Experience
Employee
Raj              IT   50000         5.0
Krish            HR   60000         7.0
Amit        Finance   55000         3.0
Priya     Marketing   52000         NaN


 
# Topic 11: Time Series Analysis in Pandas

## 📍 Creating and Handling Time Series Data
Pandas provides powerful tools for working with time series data.
 

In [31]:
import pandas as pd

# Creating a date range

date_rng = pd.date_range(start='2024-01-01', end='2024-01-10', freq='D')

# Creating a DataFrame with time series data

ts_df = pd.DataFrame({'Date': date_rng, 'Sales': [200, 220, 250, 210, 190, 230, 240, 280, 300, 310]})
ts_df.set_index('Date', inplace=True)

print("Time Series Data:")
ts_df

Time Series Data:

            Sales
Date
2024-01-01    200
2024-01-02    220
2024-01-03    250
2024-01-04    210
2024-01-05    190
2024-01-06    230
2024-01-07    240
2024-01-08    280
2024-01-09    300
2024-01-10    310


 
## 📍 Resampling Time Series Data
The `resample()` function helps in aggregating data over different time periods.
 

In [32]:
import pandas as pd

# Creating a date range

date_rng = pd.date_range(start='2024-01-01', end='2024-01-10', freq='D')

# Creating a DataFrame with time series data

ts_df = pd.DataFrame({'Date': date_rng, 'Sales': [200, 220, 250, 210, 190, 230, 240, 280, 300, 310]})
ts_df.set_index('Date', inplace=True)

# Resampling to weekly frequency

weekly_sales = ts_df.resample('W').sum()

print("Weekly Resampled Sales Data:")
weekly_sales

Weekly Resampled Sales Data:

            Sales
Date
2024-01-07   1540
2024-01-14    890
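Resampling is not limited to sums. As a small sketch on the same sales figures, several aggregations can be computed per week in one call by passing a list to `agg()`:

```python
import pandas as pd

date_rng = pd.date_range(start='2024-01-01', end='2024-01-10', freq='D')
ts_df = pd.DataFrame(
    {'Sales': [200, 220, 250, 210, 190, 230, 240, 280, 300, 310]},
    index=date_rng,
)

# Weekly total and weekly mean side by side
weekly_summary = ts_df['Sales'].resample('W').agg(['sum', 'mean'])

print(weekly_summary)
```

Each weekly bin is labeled with the Sunday that ends it, and the second bin covers only three days of data, so its total is not directly comparable to a full week.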


 
## 📍 Rolling Window Analysis
Rolling operations help in calculating moving averages and trends.
 

In [33]:
import pandas as pd

# Creating a date range

date_rng = pd.date_range(start='2024-01-01', end='2024-01-10', freq='D')

# Creating a DataFrame with time series data

ts_df = pd.DataFrame({'Date': date_rng, 'Sales': [200, 220, 250, 210, 190, 230, 240, 280, 300, 310]})
ts_df.set_index('Date', inplace=True)


# Calculating a 3-day moving average

ts_df['Moving_Avg'] = ts_df['Sales'].rolling(window=3).mean()

print("Time Series Data with Moving Average:")
ts_df

Time Series Data with Moving Average:

            Sales  Moving_Avg
Date
2024-01-01    200         NaN
2024-01-02    220         NaN
2024-01-03    250  223.333333
2024-01-04    210  226.666667
2024-01-05    190  216.666667
2024-01-06    230  210.000000
2024-01-07    240  220.000000
2024-01-08    280  250.000000
2024-01-09    300  273.333333
2024-01-10    310  296.666667
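The first two rows are `NaN` because a 3-day window needs three observations by default. One possible way to avoid this, sketched below on the first five sales values, is `min_periods=1`, which averages whatever values are available so far.

```python
import pandas as pd

sales = pd.Series(
    [200, 220, 250, 210, 190],
    index=pd.date_range(start='2024-01-01', periods=5, freq='D'),
)

# min_periods=1 averages however many values the window currently holds
moving_avg = sales.rolling(window=3, min_periods=1).mean()

print(moving_avg)
```

Whether partial windows are acceptable depends on the analysis: the early values are averages over fewer days and are therefore noisier than the rest.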
