Data Transformation with Pandas (GroupBy, Merge, Pivot, etc.)
These tools are used to reshape, summarize, and combine data before loading it into data stores or analytics tools.


✅ 1. Grouping and Aggregating – groupby()
groupby() splits a DataFrame into groups of rows that share the same key value, applies a calculation to each group, and combines the results into one object.

Common aggregation functions: .mean(), .sum(), .count(), .max(), .min(), .agg()
🧪 Example:


In [20]:
import pandas as pd

df = pd.DataFrame({
    'department': ['HR', 'HR', 'IT', 'IT', 'Sales'],
    'salary': [3000, 3500, 5000, 5500, 4000]
})

grouped = df.groupby('department')['salary'].mean()
print(grouped)

department
HR       3250.0
IT       5250.0
Sales    4000.0
Name: salary, dtype: float64
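
The other aggregation functions listed above work the same way. The following is a minimal sketch on the same sample data; the variable names (by_dept, total, headcount, spread, flat) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['HR', 'HR', 'IT', 'IT', 'Sales'],
    'salary': [3000, 3500, 5000, 5500, 4000]
})

# Reuse one grouped object for several aggregations
by_dept = df.groupby('department')['salary']

total = by_dept.sum()                    # total salary per department
headcount = by_dept.count()              # number of non-null salaries per department
spread = by_dept.max() - by_dept.min()   # highest minus lowest salary per department

# as_index=False keeps 'department' as a regular column instead of the index
flat = df.groupby('department', as_index=False)['salary'].mean()
print(flat)
```

Each single aggregation returns a Series indexed by department, while the as_index=False variant returns a DataFrame that is often easier to export or merge.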


✅ 2. Aggregating with Multiple Functions
Pass a list of function names to .agg() to compute several statistics per group in a single call.

In [21]:
df.groupby('department')['salary'].agg(['mean', 'max', 'count'])


              mean   max  count
department
HR          3250.0  3500      2
IT          5250.0  5500      2
Sales       4000.0  4000      1
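
The list passed to .agg() can mix built-in function names with your own functions. The sketch below assumes a helper called salary_range, which is a hypothetical name introduced here for illustration; pandas uses the function's name as the column label.

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['HR', 'HR', 'IT', 'IT', 'Sales'],
    'salary': [3000, 3500, 5000, 5500, 4000]
})

def salary_range(s):
    """Difference between the highest and lowest salary in one group."""
    return s.max() - s.min()

# 'mean' is a built-in aggregation name; salary_range is the custom function above
summary = df.groupby('department')['salary'].agg(['mean', salary_range])
print(summary)
```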


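🧹 7. Sorting and Filtering
sort_values() orders rows by one or more columns. A boolean condition inside df[...] keeps only the rows where the condition is True.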

In [26]:
df = pd.DataFrame({
    'department': ['HR', 'HR', 'IT', 'IT', 'Sales'],
    'salary': [3000, 3500, 5000, 5500, 4000]
})

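# Sort by salary, highest first (returns a new DataFrame; df itself is unchanged)
sorted_df = df.sort_values(by='salary', ascending=False)

# Keep only the rows for the IT department
it_only = df[df['department'] == 'IT']

# Keep only the rows with a salary above 4000
# (last expression in the cell, so this is the result the notebook displays)
df[df['salary'] > 4000]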


  department  salary
2         IT    5000
3         IT    5500
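
Sorting and filtering are often combined or use more than one column. Here is a minimal sketch on the same sample data; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['HR', 'HR', 'IT', 'IT', 'Sales'],
    'salary': [3000, 3500, 5000, 5500, 4000]
})

# Sort by department A-Z, then by salary highest first within each department
ordered = df.sort_values(by=['department', 'salary'], ascending=[True, False])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
it_high = df[(df['department'] == 'IT') & (df['salary'] > 5000)]

# isin() keeps rows whose value appears in a list
hr_or_sales = df[df['department'].isin(['HR', 'Sales'])]

print(ordered)
print(it_high)
print(hr_or_sales)
```

The parentheses matter because & binds more tightly than == and > in Python, so leaving them out raises an error or gives the wrong mask.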


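Named aggregation (continuing section 2): .agg() also accepts keyword arguments, so each output column gets a custom name: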

In [6]:
df.groupby('department')['salary'].agg(
    avg_salary='mean',
    max_salary='max',
    count='count'
)

            avg_salary  max_salary  count
department
HR              3250.0        3500      2
IT              5250.0        5500      2
Sales           4000.0        4000      1
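
When the aggregation should cover more than one column, named aggregation can be applied to the whole grouped DataFrame by passing (column, function) tuples. The sketch below adds a bonus column that is not in the original data; it is a hypothetical column used only to show the pattern.

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['HR', 'HR', 'IT', 'IT', 'Sales'],
    'salary': [3000, 3500, 5000, 5500, 4000],
    'bonus': [200, 300, 500, 400, 250],  # hypothetical column, for illustration only
})

# Each keyword is an output column name; each tuple is (source column, function)
summary = df.groupby('department').agg(
    avg_salary=('salary', 'mean'),
    total_bonus=('bonus', 'sum'),
    headcount=('salary', 'count'),
)
print(summary)
```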


🔗 3. Merging and Joining DataFrames
Use merge() to combine two DataFrames on one or more key columns, much like a SQL join.

📌 Types of joins (the how parameter): inner (the default), outer, left, right

In [17]:
employees = pd.DataFrame({
    'emp_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie']
})

salaries = pd.DataFrame({
    'emp_id': [1, 2],
    'salary': [3000, 4000]
})

merged = pd.merge(employees, salaries, on='emp_id', how='left')
print(merged)

   emp_id     name  salary
0       1    Alice  3000.0
1       2      Bob  4000.0
2       3  Charlie     NaN
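
In the left join above, Charlie has no matching salary row, so the value is NaN and the salary column becomes float. To compare the four join types side by side, here is a minimal sketch. It assumes an extra salary row for emp_id 4, which is not in the original data and is added only so that the right and outer joins differ from the others.

```python
import pandas as pd

employees = pd.DataFrame({
    'emp_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie']
})

salaries = pd.DataFrame({
    'emp_id': [1, 2, 4],             # emp_id 4 is hypothetical: it has no employee record
    'salary': [3000, 4000, 4500]
})

# Count the rows each join type returns
row_counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    result = pd.merge(employees, salaries, on='emp_id', how=how)
    row_counts[how] = len(result)
print(row_counts)

# indicator=True adds a '_merge' column showing where each row came from
audited = pd.merge(employees, salaries, on='emp_id', how='outer', indicator=True)
print(audited)
```

The _merge column is a quick way to spot keys that exist in only one of the two tables before loading the result anywhere.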


🔁 4. Concatenation – pd.concat()
Combine DataFrames vertically (axis=0, the default, which stacks rows) or horizontally (axis=1, which places columns side by side).

In [14]:
df1 = pd.DataFrame({'a': [1, 2]})
df2 = pd.DataFrame({'a': [3, 4]})

result = pd.concat([df1, df2], ignore_index=True)
result

   a
0  1
1  2
2  3
3  4
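
The example above stacks rows. A minimal sketch of the horizontal case, and of what happens without ignore_index, might look like this; the frames left and right are hypothetical and exist only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2]})
df2 = pd.DataFrame({'a': [3, 4]})

# Without ignore_index=True the original row labels are kept: 0, 1, 0, 1
stacked = pd.concat([df1, df2])
print(stacked)

# axis=1 places the frames side by side, aligning rows on their index labels
left = pd.DataFrame({'a': [1, 2]})
right = pd.DataFrame({'b': ['x', 'y']})
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side)
```

Duplicate index labels are legal but easy to trip over later, which is why ignore_index=True is a common choice when stacking rows.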


📊 5. Pivot Table – Reshape and Aggregate
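pivot_table() summarizes data the way an Excel pivot table does: one column supplies the row labels, another supplies the column labels, and aggfunc combines the values that land in each cell.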

In [17]:
data = pd.DataFrame({
    'employee': ['A', 'A', 'B', 'B'],
    'month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'sales': [100, 150, 80, 120]
})

pivot = data.pivot_table(index='employee', columns='month', values='sales', aggfunc='sum')
print(pivot)

month     Feb  Jan
employee          
A         150  100
B         120   80
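
In the example above every employee and month pair occurs once, so aggfunc has nothing to combine, and the months come out in alphabetical order. The sketch below uses hypothetical data with a repeated pair and a missing pair to show aggfunc and fill_value at work, then reorders the columns by hand.

```python
import pandas as pd

# Hypothetical data: A has two Jan rows, and B has no Feb row
data = pd.DataFrame({
    'employee': ['A', 'A', 'A', 'B'],
    'month': ['Jan', 'Jan', 'Feb', 'Jan'],
    'sales': [100, 50, 150, 80]
})

pivot = data.pivot_table(
    index='employee',
    columns='month',
    values='sales',
    aggfunc='sum',   # the two Jan rows for A are added together
    fill_value=0     # missing combinations show 0 instead of NaN
)

# Columns are sorted alphabetically by default; select them in calendar order
pivot = pivot[['Jan', 'Feb']]
print(pivot)
```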


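🔄 6. Melt – Unpivot Wide Format to Long
melt() is the reverse of a pivot: the columns not listed in id_vars are turned into rows, with one row per identifier and original column.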

In [12]:
df = pd.DataFrame({
    'name': ['A', 'B'],
    'Jan': [100, 90],
    'Feb': [120, 110]
})
melted_result = pd.melt(df, id_vars=['name'], var_name='month', value_name='sales')
melted_result

  name month  sales
0    A   Jan    100
1    B   Jan     90
2    A   Feb    120
3    B   Feb    110
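
To see that melt and pivot undo each other, the long table can be pivoted back to the wide layout. This is a minimal sketch on the same data; the names wide, long_df, and restored are illustrative.

```python
import pandas as pd

wide = pd.DataFrame({
    'name': ['A', 'B'],
    'Jan': [100, 90],
    'Feb': [120, 110]
})

# Wide to long: one row per (name, month) pair
long_df = pd.melt(wide, id_vars=['name'], var_name='month', value_name='sales')

# Long back to wide: pivot() requires each (name, month) pair to be unique
restored = long_df.pivot(index='name', columns='month', values='sales')

# Put the months in the original order and turn 'name' back into a column
restored = restored[['Jan', 'Feb']].reset_index()
restored.columns.name = None  # drop the leftover 'month' label on the columns
print(restored)
```

If a pair could appear more than once, pivot_table() with an aggfunc is the safer choice, because pivot() raises an error on duplicates.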
