# 📊 Day 7: Pivot Tables & Filling Missing Data in Pandas

Welcome to Day 7 of our 45-day Data Science with AI Challenge! 🧠

Today we'll learn how to:
1. 🔄 Transform a time-series dataset using **pivot tables**
2. 🧼 Handle missing data using **forward-fill** and **backward-fill**

Let’s walk through the steps using a real dataset of minute-by-minute trading data for bank stocks!


In [4]:
# Load the bank trading dataset
import pandas as pd
df = pd.read_csv("42 - bank-data.csv")
df.head()


,datetime,symbol,average_traded_price,volume_traded
0,2023-09-28 09:15:00,FEDERALBNK,150.73,166515.0
1,2023-09-28 09:16:00,FEDERALBNK,150.68,324106.0
2,2023-09-28 09:17:00,FEDERALBNK,150.7,498940.0
3,2023-09-28 09:18:00,FEDERALBNK,150.72,719863.0
4,2023-09-28 09:19:00,FEDERALBNK,150.8,853283.0
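
When a CSV is read without any date options, the `datetime` column comes in as plain text, which is why a conversion step is needed later. Here is a minimal sketch using a few rows copied from the output above, held in memory instead of the real file (the names `csv_text`, `raw` and `parsed` are just for illustration). It also shows `parse_dates`, an optional way to convert the column while reading:

```python
import io
import pandas as pd

# A tiny in-memory stand-in for the CSV file (rows copied from the output above)
csv_text = (
    "datetime,symbol,average_traded_price,volume_traded\n"
    "2023-09-28 09:15:00,FEDERALBNK,150.73,166515.0\n"
    "2023-09-28 09:16:00,FEDERALBNK,150.68,324106.0\n"
    "2023-09-28 09:17:00,FEDERALBNK,150.7,498940.0\n"
)

# Default read: 'datetime' is loaded as text, not as timestamps
raw = pd.read_csv(io.StringIO(csv_text))

# Optional alternative: ask pandas to parse the column while reading
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])

print(raw.dtypes)
print(parsed.dtypes)
```

Either approach works; converting later with `pd.to_datetime` gives the same kind of column.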


In [6]:
# Drop duplicate records for same datetime and symbol
df = df.drop_duplicates(subset=['datetime', 'symbol'])

# Convert 'datetime' column to actual datetime format
df['datetime'] = pd.to_datetime(df['datetime'])

# Optional: renumber the rows 0..n-1, since dropping duplicates leaves gaps in the index
df.reset_index(drop=True, inplace=True)

df.head()


,datetime,symbol,average_traded_price,volume_traded
0,2023-09-28 09:15:00,FEDERALBNK,150.73,166515.0
1,2023-09-28 09:16:00,FEDERALBNK,150.68,324106.0
2,2023-09-28 09:17:00,FEDERALBNK,150.7,498940.0
3,2023-09-28 09:18:00,FEDERALBNK,150.72,719863.0
4,2023-09-28 09:19:00,FEDERALBNK,150.8,853283.0
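
To see what the de-duplication step does, here is a small sketch on a toy frame. The repeated 09:15 row and its volume of `999.0` are made up for illustration and are not from the dataset. By default, `drop_duplicates` keeps the first row for each `(datetime, symbol)` pair:

```python
import pandas as pd

# Toy data: the second row is a made-up duplicate of the first (same datetime and symbol)
toy = pd.DataFrame({
    "datetime": ["2023-09-28 09:15:00", "2023-09-28 09:15:00", "2023-09-28 09:16:00"],
    "symbol": ["FEDERALBNK", "FEDERALBNK", "FEDERALBNK"],
    "volume_traded": [166515.0, 999.0, 324106.0],
})

# Count rows that repeat an earlier (datetime, symbol) pair
n_dupes = toy.duplicated(subset=["datetime", "symbol"]).sum()

# keep='first' is the default, so the first row of each pair survives
deduped = toy.drop_duplicates(subset=["datetime", "symbol"]).reset_index(drop=True)

print("duplicate rows:", n_dupes)
print(deduped)
```

Counting duplicates first is a cheap sanity check: if the number is large, it may be worth finding out why before dropping rows.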


In [8]:
# Pivot: reshape from long to wide so that each symbol becomes its own column of volume_traded
pivot_df = df.pivot(index='datetime', columns='symbol', values='volume_traded')

# View the reshaped data
pivot_df.head()


datetime,AUBANK,BANDHANBNK,BANKBARODA,BANKNIFTY Future,FEDERALBNK,HDFCBANK,ICICIBANK,IDFCFIRSTB,INDUSINDBK,KOTAKBANK,PNB,SBIN
2023-09-28 09:15:00,29171.0,388187.0,164608.0,10005.0,166515.0,497836.0,241150.0,615592.0,48259.0,55410.0,1059337.0,119776.0
2023-09-28 09:16:00,62263.0,1075138.0,350940.0,18525.0,324106.0,743844.0,421184.0,995890.0,87952.0,108050.0,1698640.0,274082.0
2023-09-28 09:17:00,99136.0,1464294.0,561062.0,23655.0,498940.0,956945.0,661807.0,1270947.0,143595.0,159414.0,2288335.0,433418.0
2023-09-28 09:18:00,122503.0,1772170.0,709995.0,32640.0,719863.0,1070935.0,824429.0,1733513.0,182321.0,198169.0,3534604.0,578056.0
2023-09-28 09:19:00,142768.0,2059036.0,813341.0,41850.0,853283.0,1186313.0,878034.0,1945801.0,195894.0,215367.0,4208777.0,677000.0
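
`DataFrame.pivot` needs every `(index, columns)` pair to be unique, which is why duplicates were dropped before pivoting. The sketch below uses a toy frame with a deliberately repeated row to show the error, then `pivot_table` as one possible alternative that aggregates duplicates instead. Using `aggfunc="last"` is an illustrative choice here, not something the original notebook does:

```python
import pandas as pd

# Toy long-format data; the last row repeats the first (datetime, symbol) pair on purpose
toy = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2023-09-28 09:15:00", "2023-09-28 09:15:00",
        "2023-09-28 09:16:00", "2023-09-28 09:15:00",
    ]),
    "symbol": ["PNB", "SBIN", "PNB", "PNB"],
    "volume_traded": [1059337.0, 119776.0, 1698640.0, 1059337.0],
})

# pivot() refuses to reshape when a (datetime, symbol) pair appears more than once
try:
    toy.pivot(index="datetime", columns="symbol", values="volume_traded")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# pivot_table() aggregates duplicates; here we keep the last value seen for each pair
wide = toy.pivot_table(index="datetime", columns="symbol",
                       values="volume_traded", aggfunc="last")
print(wide)
```

Notice that SBIN has no row at 09:16 in the toy data, so that cell becomes `NaN` after reshaping. This is how missing values typically appear in a pivoted table.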


In [10]:
# List the column names (one per symbol) to check the result of the pivot
for col in pivot_df.columns:
    print(col)

AUBANK
BANDHANBNK
BANKBARODA
BANKNIFTY Future
FEDERALBNK
HDFCBANK
ICICIBANK
IDFCFIRSTB
INDUSINDBK
KOTAKBANK
PNB
SBIN
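
If you do want tidier column names, for example to get rid of the space in `BANKNIFTY Future`, one option is `rename` with a function. This is a minimal sketch on a toy two-column frame. The naming scheme (underscores, upper case) is only a hypothetical choice:

```python
import pandas as pd

# Toy stand-in for the pivoted frame, with values copied from the output above
toy_pivot = pd.DataFrame(
    {"BANKNIFTY Future": [10005.0, 18525.0], "SBIN": [119776.0, 274082.0]},
    index=pd.to_datetime(["2023-09-28 09:15:00", "2023-09-28 09:16:00"]),
)
toy_pivot.index.name = "datetime"
toy_pivot.columns.name = "symbol"

# Hypothetical convention: replace spaces with underscores and upper-case everything
tidy = toy_pivot.rename(columns=lambda name: name.strip().replace(" ", "_").upper())

# Drop the 'symbol' label that pivot leaves on the column axis
tidy.columns.name = None

print(tidy.columns.tolist())
```

Names without spaces are also easier to use with attribute access and in plotting legends.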


In [12]:
# Fill missing values with the most recent valid value (forward fill)
pivot_df.ffill(inplace=True)

# Fill any remaining NaNs (rows before a symbol's first valid value) using the next available value (backward fill)
pivot_df.bfill(inplace=True)

# Every gap is now filled, unless a column has no valid values at all
pivot_df.head()


datetime,AUBANK,BANDHANBNK,BANKBARODA,BANKNIFTY Future,FEDERALBNK,HDFCBANK,ICICIBANK,IDFCFIRSTB,INDUSINDBK,KOTAKBANK,PNB,SBIN
2023-09-28 09:15:00,29171.0,388187.0,164608.0,10005.0,166515.0,497836.0,241150.0,615592.0,48259.0,55410.0,1059337.0,119776.0
2023-09-28 09:16:00,62263.0,1075138.0,350940.0,18525.0,324106.0,743844.0,421184.0,995890.0,87952.0,108050.0,1698640.0,274082.0
2023-09-28 09:17:00,99136.0,1464294.0,561062.0,23655.0,498940.0,956945.0,661807.0,1270947.0,143595.0,159414.0,2288335.0,433418.0
2023-09-28 09:18:00,122503.0,1772170.0,709995.0,32640.0,719863.0,1070935.0,824429.0,1733513.0,182321.0,198169.0,3534604.0,578056.0
2023-09-28 09:19:00,142768.0,2059036.0,813341.0,41850.0,853283.0,1186313.0,878034.0,1945801.0,195894.0,215367.0,4208777.0,677000.0
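
The first five rows shown here have no gaps, so the effect of the fill is hard to see. Here is a small sketch on a toy frame with made-up values and deliberate `NaN`s, showing what each step does. The `limit=1` variant is an optional extra, not part of the notebook above:

```python
import numpy as np
import pandas as pd

# Toy wide-format data with gaps (values are made up for illustration)
toy = pd.DataFrame(
    {"PNB": [np.nan, 100.0, np.nan, 300.0],
     "SBIN": [10.0, np.nan, np.nan, 40.0]},
    index=pd.to_datetime([
        "2023-09-28 09:15", "2023-09-28 09:16",
        "2023-09-28 09:17", "2023-09-28 09:18",
    ]),
)

# How many values are missing per column before filling
missing_before = toy.isna().sum()

# Forward fill carries the last valid value down; backward fill then
# handles the leading NaN in PNB, which has nothing before it
filled = toy.ffill().bfill()

# Optional: only carry a value forward across at most one missing row
limited = toy.ffill(limit=1)

print(missing_before)
print(filled)
print(limited)
```

One thing to keep in mind: backward fill copies a later value into an earlier row, so it uses information from the future. That is fine for cleaning up a chart, but worth being careful with if the data feeds a model or backtest.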


🎉 That’s a wrap for Day 7!

Coming up: Visualizing this time-series data and learning how to merge different datasets.

🚀 Share your progress using #45DaysOfDataScience and keep growing!
