# Pandas Practice Notebook 3: Advanced & Real-World Techniques

This notebook covers **advanced pandas tasks that come up in real-world work**.
Topics: MultiIndex selection, reshaping, `apply` with lambdas, rolling and expanding windows, and reading and writing files.
These techniques are common in professional data analysis workflows.

## Section 1: MultiIndex
1. Create a `DataFrame` with sales data for 2 regions (East, West) and 2 products (A, B).
   Use a MultiIndex (Region, Product).
2. Select sales for West region only.
3. Select sales for product A across all regions.

In [1]:
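import pandas as pd

# Sum revenue per (region, product); the grouped result is indexed by a MultiIndex.
df = pd.read_csv('csv/sales.csv')
df = df.groupby(['region', 'product'])['revenue'].sum().to_frame()
df.loc['West']                       # task 2: every product in the West region
df.xs('Product A', level='product')  # task 3: Product A across all regions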

region,revenue
Central,2040.0
East,1325.0
North,2525.0
South,1396.0
West,1964.0
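
The cell above builds its MultiIndex by grouping a CSV file. As a complement, here is a minimal sketch of tasks 1 to 3 with the index built directly from two regions and two products. The `Sales` column name and all figures are made up for illustration.

```python
import pandas as pd

# Hypothetical sales figures; the values are made up for illustration.
index = pd.MultiIndex.from_product(
    [["East", "West"], ["A", "B"]], names=["Region", "Product"]
)
sales = pd.DataFrame({"Sales": [100, 150, 200, 250]}, index=index)

# Task 2: all rows for the West region (outer level), indexed by Product.
west = sales.loc["West"]

# Task 3: product A across all regions (inner level), indexed by Region.
product_a = sales.xs("A", level="Product")

print(west)
print(product_a)
```

`loc` with a single label selects on the outermost level, while `xs` with `level=` can reach an inner level without slicing syntax.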


## Section 2: Reshape (melt, stack, unstack)
4. Create a `DataFrame` with monthly sales of 3 products.
5. Reshape it using `melt` so that you have columns: Month, Product, Sales.
6. Use `pivot_table` to go back to wide format.
7. Practice with `stack` and `unstack`.

In [2]:
import pandas as pd

data = {
    "month": ["2025-01", "2025-02", "2025-03"],
    "Product_A_sales": [1200, 950, 1100],
    "Product_A_profit": [400, 300, 350],
    "Product_B_sales": [800, 1230, 990],
    "Product_B_profit": [250, 420, 310],
    "Product_C_sales": [1500, 1600, 1700],
    "Product_C_profit": [600, 650, 700],
}
df = pd.DataFrame(data)

df_melted = df.melt(
    id_vars="month",
    var_name="product_metric",
    value_name="value"
)
df_melted[["prefix", "product", "metric"]] = df_melted["product_metric"].str.split("_", expand=True)

df_melted = df_melted.drop(columns=["product_metric", "prefix"])

df_melted
pvt = pd.pivot_table(
    df_melted,
    values='value',
    index=['month', 'product'],
    columns=['metric'],
    fill_value=0,
)
pvt.stack()    # move the metric columns into the index (long format)
pvt.unstack()  # move product into the columns; a single call is enough for this layout

metric,profit,profit,profit,sales,sales,sales
product,A,B,C,A,B,C
month,,,,,,
2025-01,400.0,250.0,600.0,1200.0,800.0,1500.0
2025-02,300.0,420.0,650.0,950.0,1230.0,1600.0
2025-03,350.0,310.0,700.0,1100.0,990.0,1700.0
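
The cell above reshapes a table that carries two metrics per product. For comparison, a minimal sketch of tasks 4 to 7 as literally stated (one sales figure per product, melted to Month, Product, Sales) might look like the following. The single-letter product columns and the figures are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical monthly sales for three products (values are made up).
wide = pd.DataFrame({
    "Month": ["2025-01", "2025-02", "2025-03"],
    "A": [1200, 950, 1100],
    "B": [800, 1230, 990],
    "C": [1500, 1600, 1700],
})

# Wide -> long: one row per (Month, Product) pair.
long = wide.melt(id_vars="Month", var_name="Product", value_name="Sales")

# Long -> wide: each (Month, Product) pair is unique, so "sum" passes values through.
back = long.pivot_table(index="Month", columns="Product", values="Sales", aggfunc="sum")

# stack moves the column level into the index (a Series); unstack reverses it.
stacked = back.stack()
round_trip = stacked.unstack()

print(long.head())
print(round_trip)
```

Passing `aggfunc="sum"` here is a design choice: the default aggregation is the mean, which would turn the integer sales into floats.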


## Section 3: Apply & Lambda
8. Create a `DataFrame` with 5 employees: name, base_salary, bonus.
9. Create a new column `total_salary` = base_salary + bonus using `apply`.
10. Create another column `taxed_salary` = total_salary * 0.8 using `lambda`.

In [3]:
data = {
    "name": ["Alice", "Bob", "Charlie", "Diana", "Ethan"],
    "base_salary": [50000, 60000, 55000, 70000, 65000],
    "bonus": [5000, 7000, 6000, 8000, 7500]
}
df = pd.DataFrame(data)
df['total_salary'] = df.apply(lambda x: x['base_salary'] + x['bonus'], axis=1)
df['taxed_salary'] = df.apply(lambda x: x['total_salary'] * 0.8, axis=1)
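
The exercise asks for `apply`, but for simple arithmetic the same columns can be computed directly on whole columns. The sketch below reuses the employee data from the cell above and compares the two approaches; the `total_vectorized` column name is introduced here only for the comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana", "Ethan"],
    "base_salary": [50000, 60000, 55000, 70000, 65000],
    "bonus": [5000, 7000, 6000, 8000, 7500],
})

# Row-wise apply, as in the exercise: the lambda receives one row at a time.
df["total_salary"] = df.apply(lambda row: row["base_salary"] + row["bonus"], axis=1)

# Column arithmetic gives the same numbers without a Python-level loop over rows.
df["total_vectorized"] = df["base_salary"] + df["bonus"]

# A lambda applied to a single column receives one value at a time.
df["taxed_salary"] = df["total_salary"].apply(lambda total: total * 0.8)

print(df)
```

On large frames the column arithmetic is usually much faster, so `apply` with `axis=1` is best kept for logic that cannot be expressed on whole columns.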

## Section 4: Rolling & Expanding
11. Create a time series of daily stock prices (30 days).
12. Calculate 7-day rolling mean.
13. Calculate cumulative sum using `expanding`.

In [4]:
import numpy as np

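dates = pd.date_range(start='2025-03-01', periods=30, freq='D')
# Simulate prices as a seeded random walk around 100 so the output is reproducible.
rng = np.random.default_rng(42)
prices = np.round(100 + rng.normal(loc=0, scale=1, size=30).cumsum(), 2)
time_series = pd.Series(prices, index=dates)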
rolling_7day = time_series.rolling(window=7, min_periods=1).mean()
cum_sum = time_series.expanding().sum()
cum_sum

2025-03-01     100.30
2025-03-02     199.56
2025-03-03     299.58
2025-03-04     400.54
2025-03-05     499.54
2025-03-06     597.24
2025-03-07     695.07
2025-03-08     792.58
2025-03-09     890.08
2025-03-10     986.72
2025-03-11    1084.24
2025-03-12    1182.54
2025-03-13    1280.91
2025-03-14    1380.40
2025-03-15    1480.36
2025-03-16    1579.46
2025-03-17    1678.93
2025-03-18    1777.44
2025-03-19    1876.83
2025-03-20    1976.17
2025-03-21    2075.33
2025-03-22    2173.81
2025-03-23    2273.51
2025-03-24    2373.05
2025-03-25    2472.17
2025-03-26    2570.93
2025-03-27    2670.23
2025-03-28    2769.89
2025-03-29    2869.96
2025-03-30    2970.46
Freq: D, dtype: float64
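
To make the window behaviour easier to see, here is a small sketch on five hand-picked prices (made up for illustration). It contrasts the default `min_periods` with `min_periods=1` and shows that an expanding sum matches `cumsum()`.

```python
import numpy as np
import pandas as pd

# Five made-up prices on consecutive days.
prices = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0],
    index=pd.date_range("2025-03-01", periods=5, freq="D"),
)

# Default min_periods equals the window size, so the first two results are NaN.
strict = prices.rolling(window=3).mean()

# min_periods=1 averages whatever is available, so early values use a shorter window.
lenient = prices.rolling(window=3, min_periods=1).mean()

# An expanding window grows from the first observation, so its sum equals cumsum().
running_total = prices.expanding().sum()

print(pd.DataFrame({"strict": strict, "lenient": lenient, "total": running_total}))
```

With `min_periods=1`, the first values of a rolling mean are averages over fewer points than the window size, which is worth keeping in mind when comparing them with later values.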

## Section 5: Integration with Files
14. Load `sales.csv` and `products.csv`.
15. Merge them on `ProductID`.
16. Save the merged DataFrame into `report.csv`.

In [7]:
sales = pd.read_csv('csv/sales1.csv')
products = pd.read_csv('csv/products.csv')
merged = pd.merge(sales, products, on='ProductID', how='left')
merged.to_csv('report.csv', index=False)  # omit the row index so no extra unnamed column is written
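
A left merge silently keeps sales rows whose `ProductID` has no match. The sketch below shows one way to check for that, using small in-memory frames as stand-ins for the two files. All column names other than `ProductID`, and all values, are assumptions made for illustration; an in-memory buffer replaces `report.csv` so the example does not touch the disk.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two CSV files.
sales = pd.DataFrame({"ProductID": [1, 2, 2, 4], "Quantity": [3, 1, 5, 2]})
products = pd.DataFrame({"ProductID": [1, 2, 3], "ProductName": ["Pen", "Ink", "Pad"]})

merged = pd.merge(
    sales, products, on="ProductID", how="left",
    validate="many_to_one",  # raises MergeError if ProductID repeats in products
    indicator=True,          # adds a _merge column: "both" or "left_only"
)

# Sales rows with no matching product.
unmatched = merged[merged["_merge"] == "left_only"]

# Write without the index, then read back to confirm the columns survive unchanged.
buffer = io.StringIO()
merged.drop(columns="_merge").to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(unmatched)
print(reloaded)
```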

## Section 6: Portfolio Challenge
17. Write a function `advanced_report(df)` that returns:
    - top-5 products by revenue
    - monthly revenue trend (grouped by month)
    - rolling 3-month average revenue
18. Apply it to `sales.csv`.

In [6]:
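def advanced_report(filename: str) -> dict:
    # Load the sales file; it must contain date, product, and revenue columns.
    df = pd.read_csv(filename)
    # read_csv loads dates as strings unless parse_dates is set, so convert if needed.
    if not pd.api.types.is_datetime64_any_dtype(df['date']):
        df['date'] = pd.to_datetime(df['date'])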
    top5 = (
        df.groupby('product')['revenue']
        .sum()
        .nlargest(5)
    )
    
    monthly_trend = (
        df.groupby(df['date'].dt.to_period('M'))['revenue']
          .sum()
          .to_timestamp()
    )

    rolling_avg = monthly_trend.rolling(window=3, min_periods=1).mean()
    return {
        "Top 5 products by revenue": top5,
        "Monthly revenue trend": monthly_trend,
        "Rolling 3-month average revenue": rolling_avg
    }
advanced_report('csv/sales.csv')

{'Top 5 products by revenue': product
 Product E    10579.0
 Product A     9250.0
 Product D     9128.0
 Product B     8749.0
 Product C     6487.0
 Name: revenue, dtype: float64,
 'Monthly revenue trend': date
 2023-01-01    23786.0
 2023-02-01    19140.0
 2023-03-01     1267.0
 Freq: MS, Name: revenue, dtype: float64,
 'Rolling 3-month average revenue': date
 2023-01-01    23786.0
 2023-02-01    21463.0
 2023-03-01    14731.0
 Freq: MS, Name: revenue, dtype: float64}
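
The task statement asks for `advanced_report(df)`, while the cell above takes a file name. A minimal sketch of the DataFrame-based variant, assuming the same `date`, `product`, and `revenue` columns, could look like this. The sample rows and the shortened dictionary keys are made up for illustration.

```python
import pandas as pd


def advanced_report(df: pd.DataFrame) -> dict:
    """Summarise a sales table with date, product, and revenue columns."""
    # Parse dates without modifying the caller's DataFrame.
    dates = pd.to_datetime(df["date"])
    top5 = df.groupby("product")["revenue"].sum().nlargest(5)
    monthly = df.groupby(dates.dt.to_period("M"))["revenue"].sum()
    rolling = monthly.rolling(window=3, min_periods=1).mean()
    return {"top5": top5, "monthly": monthly, "rolling_3m": rolling}


# Hypothetical rows, made up for illustration.
sample = pd.DataFrame({
    "date": ["2023-01-05", "2023-01-20", "2023-02-10", "2023-03-15"],
    "product": ["A", "B", "A", "C"],
    "revenue": [100.0, 50.0, 30.0, 60.0],
})
report = advanced_report(sample)
for title, result in report.items():
    print(title)
    print(result)
```

Accepting a DataFrame keeps file loading separate from the analysis, so the same function can be applied to `pd.read_csv('csv/sales.csv')` or to data from any other source.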