# Advanced Pandas Operations

This notebook walks through four of the more advanced operations in pandas: merging and joining DataFrames, building pivot tables, reshaping data, and working with time series.

### Topics Covered:
1. Merging and Joining DataFrames
2. Pivot Tables
3. Reshaping Data
4. Working with Time Series

## 1. Merging and Joining DataFrames
Merging and joining combine data from multiple DataFrames based on a common key. The two example DataFrames below share the `EmployeeID` and `Name` columns, but only employees 3 and 4 appear in both.

In [1]:
import pandas as pd
# Example DataFrames for merging and joining
df1 = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Department': ['HR', 'IT', 'Sales', 'HR']
})

df2 = pd.DataFrame({
    'EmployeeID': [3, 4, 5, 6],
    'Name': ['Charlie', 'David', 'Eva', 'Frank'],
    'Salary': [70000, 80000, 50000, 60000]
})
df1.head(), df2.head()

(   EmployeeID     Name Department
 0           1    Alice         HR
 1           2      Bob         IT
 2           3  Charlie      Sales
 3           4    David         HR,
    EmployeeID     Name  Salary
 0           3  Charlie   70000
 1           4    David   80000
 2           5      Eva   50000
 3           6    Frank   60000)

### Merge Operation
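You can merge two DataFrames on a key column (here `EmployeeID`) with `pd.merge()`. The default is an inner join, which keeps only the keys present in both DataFrames; pass `how='left'`, `how='right'`, or `how='outer'` to keep unmatched rows as well. Because both DataFrames also contain a `Name` column that is not part of the key, pandas keeps both copies and distinguishes them with the suffixes `_x` and `_y`.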

In [2]:
# Merge DataFrames on EmployeeID
merged_df = pd.merge(df1, df2, on='EmployeeID', how='inner')
merged_df

   EmployeeID   Name_x Department   Name_y  Salary
0           3  Charlie      Sales  Charlie   70000
1           4    David         HR    David   80000
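
The `Name_x` and `Name_y` columns in the output above are duplicates of each other. The following is a minimal sketch of some variations, assuming the same `df1` and `df2` as above (rebuilt here so the cell runs on its own). The names `outer_df`, `left_df`, and `joined_df` are illustrative and not part of the original notebook.

```python
import pandas as pd

df1 = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Department': ['HR', 'IT', 'Sales', 'HR']
})
df2 = pd.DataFrame({
    'EmployeeID': [3, 4, 5, 6],
    'Name': ['Charlie', 'David', 'Eva', 'Frank'],
    'Salary': [70000, 80000, 50000, 60000]
})

# Outer join on both shared columns: keeps every employee from either frame
# and avoids the Name_x / Name_y duplication. indicator=True adds a `_merge`
# column recording whether a row came from the left, the right, or both.
outer_df = pd.merge(df1, df2, on=['EmployeeID', 'Name'], how='outer', indicator=True)

# Left join: keeps all rows of df1; Salary is NaN where df2 has no match.
left_df = pd.merge(df1, df2, on=['EmployeeID', 'Name'], how='left')

# DataFrame.join aligns on the index rather than on a column,
# so the key is moved into the index first.
joined_df = df1.set_index('EmployeeID').join(
    df2.set_index('EmployeeID')[['Salary']], how='inner'
)
```

Merging on both `EmployeeID` and `Name` assumes the names agree across the two DataFrames, as they do in this example. If they could differ, merging on `EmployeeID` alone and passing explicit `suffixes` may be the safer choice.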


## 2. Pivot Tables
A pivot table groups rows by one or more keys and aggregates a value column for each group. Let's create one that computes the average salary per department.

In [3]:
# Example DataFrame for pivot table
df_pivot = pd.DataFrame({
    'Department': ['HR', 'IT', 'Sales', 'HR', 'Sales', 'IT'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank'],
    'Salary': [50000, 60000, 70000, 52000, 46000, 62000]
})
df_pivot.head()

  Department Employee  Salary
0         HR    Alice   50000
1         IT      Bob   60000
2      Sales  Charlie   70000
3         HR    David   52000
4      Sales      Eva   46000


In [4]:
# Creating a pivot table to calculate the average salary by department
pivot_table = df_pivot.pivot_table(values='Salary', index='Department', aggfunc='mean')
pivot_table

             Salary
Department
HR          51000.0
IT          61000.0
Sales       58000.0
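
As a possible extension, `pivot_table()` also accepts a list of aggregation functions, and the single-aggregation case above can be written as a `groupby()`. The sketch below assumes the same `df_pivot` as above (rebuilt here so the cell runs on its own). The names `summary` and `by_dept` are illustrative.

```python
import pandas as pd

df_pivot = pd.DataFrame({
    'Department': ['HR', 'IT', 'Sales', 'HR', 'Sales', 'IT'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank'],
    'Salary': [50000, 60000, 70000, 52000, 46000, 62000]
})

# Several aggregations at once: one row per department and a two-level
# column index of (statistic, value column), e.g. ('mean', 'Salary').
summary = df_pivot.pivot_table(values='Salary', index='Department',
                               aggfunc=['mean', 'min', 'max'])

# The average-salary pivot table expressed as a groupby; this returns a
# Series indexed by Department rather than a one-column DataFrame.
by_dept = df_pivot.groupby('Department')['Salary'].mean()
```

`pivot_table()` tends to read better when you also need a `columns=` dimension, while `groupby()` is often more convenient for a single key and a single statistic.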


## 3. Reshaping Data
Reshaping operations like `melt()` and `pivot()` restructure a DataFrame without changing its values. `melt()` unpivots a wide DataFrame into long form, turning columns into rows; `pivot()` does the reverse, spreading the values of one column out into new columns.

In [5]:
# Reshaping with melt
df_melt = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie'],
    'HR_Score': [88, 92, 85],
    'IT_Score': [78, 85, 89]
})
df_melt

  Employee  HR_Score  IT_Score
0    Alice        88        78
1      Bob        92        85
2  Charlie        85        89


In [6]:
# Melting the DataFrame to convert columns into rows
df_melted = pd.melt(df_melt, id_vars='Employee', value_vars=['HR_Score', 'IT_Score'],
                    var_name='Department', value_name='Score')
df_melted

  Employee Department  Score
0    Alice   HR_Score     88
1      Bob   HR_Score     92
2  Charlie   HR_Score     85
3    Alice   IT_Score     78
4      Bob   IT_Score     85
5  Charlie   IT_Score     89
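
The section mentions `pivot()` but only demonstrates `melt()`. The sketch below shows one way to take the long-form result back to wide form, assuming the same `df_melt` as above (rebuilt here so the cell runs on its own). The name `df_wide` is illustrative.

```python
import pandas as pd

df_melt = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie'],
    'HR_Score': [88, 92, 85],
    'IT_Score': [78, 85, 89]
})

# Wide -> long, as in the cell above.
df_melted = pd.melt(df_melt, id_vars='Employee',
                    value_vars=['HR_Score', 'IT_Score'],
                    var_name='Department', value_name='Score')

# Long -> wide: each distinct value of 'Department' becomes a column.
df_wide = df_melted.pivot(index='Employee', columns='Department', values='Score')

# Tidy up: clear the leftover 'Department' label on the column axis
# and turn the Employee index back into an ordinary column.
df_wide = df_wide.rename_axis(columns=None).reset_index()
```

Note that `pivot()` raises an error if an `index`/`columns` pair occurs more than once. In that case `pivot_table()` with an aggregation function is the usual alternative.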


## 4. Working with Time Series
pandas has built-in support for time series data: you can convert columns to datetime with `pd.to_datetime()`, aggregate to a different frequency with `resample()`, and compute moving statistics with `rolling()`.

In [7]:
# Creating a time series DataFrame
time_data = pd.DataFrame({
    'Date': pd.date_range(start='2024-01-01', periods=10, freq='D'),
    'Sales': [200, 220, 210, 230, 250, 270, 260, 280, 300, 310]
})
time_data.head()

        Date  Sales
0 2024-01-01    200
1 2024-01-02    220
2 2024-01-03    210
3 2024-01-04    230
4 2024-01-05    250
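
The cell above only builds the DataFrame. As a minimal sketch of the datetime conversion, resampling, and rolling statistics mentioned in the introduction to this section, the following assumes the same `time_data` (rebuilt here so the cell runs on its own). The names `ts`, `weekly_sales`, and `rolling_mean`, the weekly frequency, and the 3-day window are illustrative choices.

```python
import pandas as pd

time_data = pd.DataFrame({
    'Date': pd.date_range(start='2024-01-01', periods=10, freq='D'),
    'Sales': [200, 220, 210, 230, 250, 270, 260, 280, 300, 310]
})

# pd.to_datetime changes nothing here (Date is already datetime64), but it
# is the usual first step when dates arrive as strings.
time_data['Date'] = pd.to_datetime(time_data['Date'])

# resample() needs a datetime-like index (or an `on=` column), so the Date
# column is moved into the index.
ts = time_data.set_index('Date')

# Total sales per week; with the 'W' alias, weeks end on Sunday.
weekly_sales = ts['Sales'].resample('W').sum()

# 3-day moving average; the first two values are NaN because the window
# is not yet full.
rolling_mean = ts['Sales'].rolling(window=3).mean()
```

Since 2024-01-01 is a Monday, the first weekly bin covers seven days and the second covers the remaining three, so the two totals are not directly comparable.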
