# Reshaping

We've now seen some of the benefits of panel data and how pandas lets us manipulate it to gain insights. Sometimes, though, we'll need to reshape our data to work with it more easily.

In [28]:
import pandas as pd
import numpy as np

# Load the data
df = pd.read_csv("data/sp500_q1_2025.csv")

# Convert 'DlyCalDt' to datetime
df["DlyCalDt"] = pd.to_datetime(df["DlyCalDt"])

# Data cleaning as we did previously
df.dropna(inplace=True)

print("Missing data after cleaning", df.isnull().sum().sum())

# Handling duplicates
print("Checking for duplicates, which we forgot to do previously!", df.duplicated().sum())

df.drop_duplicates(inplace=True)


Missing data after cleaning 0
Checking for duplicates, which we forgot to do previously! 2


## Pivot

`pivot` reshapes *long* panel data into a *wide* data frame. We can use it to put each stock in its own column, with dates as the rows. To keep the columns flat, we pass a single column as the values of the data frame, so choose carefully.

In [29]:
df.shape

# pivot lets us choose the rows (index) and the columns; here each column is named after a security

pivot_df = df.pivot(index="DlyCalDt", columns="SecurityNm", values="DlyClose")
pivot_df

DlyCalDt,3M CO; COM NONE; CONS,A E S CORP; COM NONE; CONS,A P A CORP; COM NONE; CONS,A T & T INC; COM NONE; CONS,ABBOTT LABORATORIES; COM NONE; CONS,ABBVIE INC; COM NONE; CONS,ACCENTURE PLC IRELAND; COM A; CONS,ADOBE INC; COM NONE; CONS,ADVANCED MICRO DEVICES INC; COM NONE; CONS,AFLAC INC; COM NONE; CONS,...,WILLIAMS COS; COM NONE; CONS,WILLIS TOWERS WATSON PUB LTD CO; COM NONE; CONS,WYNN RESORTS LTD; COM NONE; CONS,X C E L ENERGY INC; COM NONE; CONS,XYLEM INC; COM NONE; CONS,YUM BRANDS INC; COM NONE; CONS,ZEBRA TECHNOLOGIES CORP; COM A; CONS,ZIMMER BIOMET HOLDINGS INC; COM NONE; CONS,ZIONS BANCORPORATION N A; COM NONE; CONS,ZOETIS INC; COM A; CONS
2025-01-02,129.7,13.05,23.38,22.83,113.44,179.44,348.82,441.0,120.63,102.36,...,55.88,309.27,83.8,66.86,115.95,133.56,383.76,104.46,54.07,162.61
2025-01-03,129.87,13.23,23.42,22.67,113.83,181.22,353.85,430.57,125.37,103.16,...,56.6,309.04,83.32,66.69,117.18,133.44,391.86,104.47,54.89,163.31
2025-01-06,130.29,13.01,23.66,22.6,113.04,180.1,351.33,431.18,129.55,101.79,...,55.81,306.61,84.61,65.14,116.84,130.17,395.33,103.56,55.02,165.9
2025-01-07,132.77,12.95,23.66,22.2,113.4,179.53,356.39,422.63,127.33,102.84,...,55.55,309.0,82.45,65.64,115.78,128.55,396.11,103.13,55.4,163.49
2025-01-08,134.53,12.4,23.68,22.18,114.25,178.5,357.73,419.58,121.84,103.57,...,56.39,314.15,81.17,66.39,115.91,127.74,397.0,101.54,54.87,165.02
2025-01-10,131.21,12.02,23.68,21.69,112.31,175.17,349.79,405.92,116.04,100.99,...,55.52,312.92,81.15,63.37,114.02,123.25,385.54,104.54,53.34,163.32
2025-01-13,134.6,11.69,24.38,21.56,113.19,176.74,349.14,408.5,117.32,102.49,...,56.03,311.26,81.33,63.62,115.64,123.73,381.69,105.49,54.35,166.32
2025-01-14,137.21,11.85,24.85,21.8,113.02,175.55,348.99,412.71,116.09,103.65,...,58.06,313.24,81.8,64.08,115.42,124.71,394.4,104.71,56.48,164.41
2025-01-15,137.78,11.77,25.45,21.91,111.1,171.35,349.73,417.28,119.96,104.79,...,58.01,313.35,82.41,64.94,116.09,125.77,400.25,106.51,58.2,167.17
2025-01-16,139.18,12.0,25.15,22.02,113.91,173.7,350.56,426.93,118.44,105.93,...,59.18,322.12,82.5,66.2,118.95,126.24,402.72,108.94,57.28,169.37
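
The wide frame above is too large to check by eye, so here is a minimal sketch of the same long-to-wide reshape on a made-up panel. The column names `date`, `ticker` and `close` and all the values are invented for illustration. It also shows why dropping duplicates first matters: `pivot` raises a `ValueError` when an index/column pair appears more than once.

```python
import pandas as pd

# Made-up long-format panel: one row per (date, ticker) pair
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-02", "2025-01-02", "2025-01-03", "2025-01-03"]),
    "ticker": ["AAA", "BBB", "AAA", "BBB"],
    "close": [10.0, 20.0, 11.0, 19.0],
})

# Long to wide: dates become the index, tickers become the columns
wide_df = long_df.pivot(index="date", columns="ticker", values="close")
print(wide_df)

# A repeated (date, ticker) pair makes the reshape ambiguous, so pivot refuses
dup_df = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)
try:
    dup_df.pivot(index="date", columns="ticker", values="close")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)
```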


In [30]:
pivot_df.isnull().sum().sum()

missing = pivot_df.isnull().sum()
missing[missing > 0]    # number of missing values for each security that has any



SecurityNm
BIO RAD LABORATORIES INC; COM B; CONS              51
HARTFORD FINANCIAL SVCS GRP INC; COM NONE; CONS    30
HARTFORD INSURANCE GROUP INC; COM NONE; CONS       30
MCCORMICK & CO INC; COM V; CONS                     1
MOLSON COORS BEVERAGE CO; COM A; CONS              41
dtype: int64

In [31]:

pivot_df["MCCORMICK & CO INC; COM V; CONS"] = pivot_df["MCCORMICK & CO INC; COM V; CONS"].ffill()

In [32]:
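# Data cleaning: drop every column that still contains missing values.
# The securities left on the list above are missing a large part of the quarter.
pivot_df.dropna(axis=1, inplace=True)

# Check that no missing values remain
missing = pivot_df.isnull().sum()
missing[missing > 0]
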

Series([], dtype: int64)
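
The two cleaning steps can be tried out on a small made-up frame. This is only a sketch: the tickers and prices are invented, and the 80% cut-off is an arbitrary assumption rather than a rule from the course. It uses the `thresh` argument of `dropna`, which keeps a column only if it has at least that many non-missing values, as one way to state explicitly how much missing data we tolerate.

```python
import numpy as np
import pandas as pd

dates = pd.bdate_range("2025-01-06", periods=5)
wide = pd.DataFrame(
    {
        "AAA": [10.0, np.nan, 10.4, 10.5, 10.3],      # one isolated gap
        "BBB": [np.nan, np.nan, np.nan, 20.0, 20.2],  # mostly missing
        "CCC": [30.0, 30.1, 30.2, 30.3, 30.4],        # complete
    },
    index=dates,
)

# Forward-fill only the column with the isolated gap
wide["AAA"] = wide["AAA"].ffill()

# Keep columns with at least 80% non-missing values (assumed cut-off)
min_obs = int(0.8 * len(wide))
cleaned = wide.dropna(axis=1, thresh=min_obs)
print(cleaned)
```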

We generally favour returns over close prices, as they give us a better picture of relative performance. Because our data frame is only holding close prices, it is straightforward to calculate returns.

In [33]:
pivot_df.pct_change().mean()

SecurityNm
3M CO; COM NONE; CONS                         0.002242
A E S CORP; COM NONE; CONS                   -0.000436
A P A CORP; COM NONE; CONS                   -0.001491
A T & T INC; COM NONE; CONS                   0.003761
ABBOTT LABORATORIES; COM NONE; CONS           0.002774
                                                ...   
YUM BRANDS INC; COM NONE; CONS                0.002935
ZEBRA TECHNOLOGIES CORP; COM A; CONS         -0.004937
ZIMMER BIOMET HOLDINGS INC; COM NONE; CONS    0.001476
ZIONS BANCORPORATION N A; COM NONE; CONS     -0.001206
ZOETIS INC; COM A; CONS                       0.000349
Length: 495, dtype: float64
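
As a sanity check on what `pct_change()` computes, the sketch below reproduces it by hand on made-up prices for a hypothetical ticker `AAA`. It assumes simple (not log) returns, which is what `pct_change()` gives: each value is the price divided by the previous price, minus one.

```python
import numpy as np
import pandas as pd

# Made-up closing prices for a hypothetical ticker
prices = pd.Series(
    [100.0, 102.0, 99.96],
    index=pd.to_datetime(["2025-01-02", "2025-01-03", "2025-01-06"]),
    name="AAA",
)

# Simple return p_t / p_{t-1} - 1; the first day has no predecessor, so it is NaN
returns = prices.pct_change()
manual = prices / prices.shift(1) - 1

# Compounding the daily returns recovers the return over the whole period
compound = (1 + returns).prod() - 1
period = prices.iloc[-1] / prices.iloc[0] - 1

print(returns)
print(compound, period)
```

Note that the mean of daily returns is an arithmetic average, so it is not the same thing as the compound return over the period.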

The other really neat thing we can do with this kind of pivoted data frame is compute the correlation between every pair of stocks with ease.

In [34]:
pivot_df.pct_change().corr()

SecurityNm,3M CO; COM NONE; CONS,A E S CORP; COM NONE; CONS,A P A CORP; COM NONE; CONS,A T & T INC; COM NONE; CONS,ABBOTT LABORATORIES; COM NONE; CONS,ABBVIE INC; COM NONE; CONS,ACCENTURE PLC IRELAND; COM A; CONS,ADOBE INC; COM NONE; CONS,ADVANCED MICRO DEVICES INC; COM NONE; CONS,AFLAC INC; COM NONE; CONS,...,WILLIAMS COS; COM NONE; CONS,WILLIS TOWERS WATSON PUB LTD CO; COM NONE; CONS,WYNN RESORTS LTD; COM NONE; CONS,X C E L ENERGY INC; COM NONE; CONS,XYLEM INC; COM NONE; CONS,YUM BRANDS INC; COM NONE; CONS,ZEBRA TECHNOLOGIES CORP; COM A; CONS,ZIMMER BIOMET HOLDINGS INC; COM NONE; CONS,ZIONS BANCORPORATION N A; COM NONE; CONS,ZOETIS INC; COM A; CONS
3M CO; COM NONE; CONS,1.000000,0.180282,0.254152,0.222141,0.180551,0.072760,0.387254,0.400837,0.264305,0.482321,...,0.327632,0.284982,0.349187,0.397652,0.529672,0.131875,0.470294,-0.003160,0.605922,0.141672
A E S CORP; COM NONE; CONS,0.180282,1.000000,0.254845,0.111729,0.162663,0.264783,-0.206607,-0.115748,0.131804,0.180922,...,0.334818,0.311744,0.161886,0.232709,0.108615,0.285334,0.175097,0.237139,0.067202,0.105321
A P A CORP; COM NONE; CONS,0.254152,0.254845,1.000000,0.023604,-0.043167,-0.210312,0.197648,0.289440,0.403920,0.115249,...,0.219446,-0.116622,0.247081,0.015848,0.202551,-0.033687,0.206105,0.064111,0.300586,-0.045514
A T & T INC; COM NONE; CONS,0.222141,0.111729,0.023604,1.000000,0.467928,0.362349,0.144522,0.053712,-0.264258,0.422030,...,-0.218666,0.329998,0.082823,0.454072,0.251672,0.175006,-0.046017,0.355732,0.042783,0.346190
ABBOTT LABORATORIES; COM NONE; CONS,0.180551,0.162663,-0.043167,0.467928,1.000000,0.423982,0.289866,0.050508,-0.198583,0.393486,...,-0.007503,0.302433,0.039531,0.403744,0.258549,0.051586,0.018323,0.375393,-0.097497,0.450855
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
YUM BRANDS INC; COM NONE; CONS,0.131875,0.285334,-0.033687,0.175006,0.051586,0.228416,0.063716,0.220745,-0.014953,0.060094,...,-0.057951,0.365278,0.037668,0.193799,0.302251,1.000000,-0.134223,-0.240482,0.177993,0.159574
ZEBRA TECHNOLOGIES CORP; COM A; CONS,0.470294,0.175097,0.206105,-0.046017,0.018323,-0.018651,0.266727,0.310089,0.381817,0.222986,...,0.332930,0.109920,0.198070,0.002147,0.271877,-0.134223,1.000000,0.014689,0.565712,0.187822
ZIMMER BIOMET HOLDINGS INC; COM NONE; CONS,-0.003160,0.237139,0.064111,0.355732,0.375393,0.192786,0.108219,-0.224960,-0.141873,0.407832,...,-0.067440,0.237115,0.244753,0.228189,0.061666,-0.240482,0.014689,1.000000,-0.177197,0.340850
ZIONS BANCORPORATION N A; COM NONE; CONS,0.605922,0.067202,0.300586,0.042783,-0.097497,-0.146660,0.258166,0.343513,0.469090,0.404798,...,0.407183,0.086275,0.341135,0.118183,0.495908,0.177993,0.565712,-0.177197,1.000000,-0.078149
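
A 495 by 495 matrix is hard to read directly. One possible way to summarise it, sketched here on simulated returns for three hypothetical tickers, is to keep only the upper triangle (so each pair appears once and the diagonal of ones is excluded) and sort the pairs by correlation. The data and names are invented; only `corr`, `where`, `stack` and `sort_values` come from pandas.

```python
import numpy as np
import pandas as pd

# Simulated daily returns: BBB is built to track AAA closely, CCC is independent
rng = np.random.default_rng(0)
base = rng.normal(size=50)
returns = pd.DataFrame({
    "AAA": base,
    "BBB": base + rng.normal(scale=0.1, size=50),
    "CCC": rng.normal(size=50),
})

corr = returns.corr()

# Boolean mask for the cells strictly above the diagonal
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# One row per pair, most correlated first
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)
print(pairs)
```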


### Exercise: Trading Top Ten

Pivot our panel data, this time using trading volume `DlyVol` for values. Find the max trading volume for each stock and display the top 10.

In [35]:
# Pivot on trading volume instead of close price
pivot_df_Vol = df.pivot(index="DlyCalDt", columns="SecurityNm", values="DlyVol")

# Maximum daily volume for each stock, then the ten largest
pivot_df_Vol.max().nlargest(10)

SecurityNm
NVIDIA CORP; COM NONE; CONS                     808952921.0
PFIZER INC; COM NONE; CONS                      349996520.0
INTEL CORP; COM NONE; CONS                      281373758.0
FORD MOTOR CO DEL; COM NONE; CONS               240638631.0
TESLA INC; COM NONE; CONS                       188039279.0
ADVANCED MICRO DEVICES INC; COM NONE; CONS      109808255.0
AMERICAN AIRLINES GROUP INC; COM NONE; CONS     108649702.0
WALGREENS BOOTS ALLIANCE INC; COM NONE; CONS    102659034.0
APPLE INC; COM NONE; CONS                       100326344.0
HUNTINGTON BANCSHARES INC; COM NONE; CONS        99308133.0
dtype: float64
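
As a cross-check on the pivoted answer, the same ranking can be obtained from the long data with a group-by, without reshaping at all. The sketch below uses a made-up panel with invented column names (`date`, `ticker`, `volume`) and compares the two routes.

```python
import pandas as pd

# Made-up long-format volume data
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-02", "2025-01-02", "2025-01-03", "2025-01-03"]),
    "ticker": ["AAA", "BBB", "AAA", "BBB"],
    "volume": [500, 900, 700, 300],
})

# Route 1: pivot to wide, then take the column-wise maximum
from_wide = long_df.pivot(index="date", columns="ticker", values="volume").max().nlargest(2)

# Route 2: group the long data by ticker and take the maximum
from_long = long_df.groupby("ticker")["volume"].max().nlargest(2)

print(from_wide)
print(from_long)
```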

## Resample

The other kind of reshaping we can do is called *resampling*, which changes the frequency of our data. Resampling is usually followed by an aggregation such as a mean or a sum, though it doesn't have to be. Let's resample our pivoted data to get the mean closing price for each month.

In [38]:
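# Resampling works much like a group-by: here, the mean close within each calendar month
pivot_df.resample("ME").mean()

# Monthly returns: last() takes the final available close in each month, with no averaging.
# asfreq() would instead pick the value dated exactly at month end, giving NaN when that day is not a trading day.
pivot_df.resample("ME").last().pct_change()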

DlyCalDt,3M CO; COM NONE; CONS,A E S CORP; COM NONE; CONS,A P A CORP; COM NONE; CONS,A T & T INC; COM NONE; CONS,ABBOTT LABORATORIES; COM NONE; CONS,ABBVIE INC; COM NONE; CONS,ACCENTURE PLC IRELAND; COM A; CONS,ADOBE INC; COM NONE; CONS,ADVANCED MICRO DEVICES INC; COM NONE; CONS,AFLAC INC; COM NONE; CONS,...,WILLIAMS COS; COM NONE; CONS,WILLIS TOWERS WATSON PUB LTD CO; COM NONE; CONS,WYNN RESORTS LTD; COM NONE; CONS,X C E L ENERGY INC; COM NONE; CONS,XYLEM INC; COM NONE; CONS,YUM BRANDS INC; COM NONE; CONS,ZEBRA TECHNOLOGIES CORP; COM A; CONS,ZIMMER BIOMET HOLDINGS INC; COM NONE; CONS,ZIONS BANCORPORATION N A; COM NONE; CONS,ZOETIS INC; COM A; CONS
2025-01-31,,,,,,,,,,,...,,,,,,,,,,
2025-02-28,0.019185,0.053636,-0.056088,0.155078,0.078793,0.13665,-0.094688,0.002537,-0.138767,0.019464,...,0.049612,0.030601,0.02844,0.072917,0.055224,0.198238,-0.196178,-0.047132,-0.066021,-0.021416
2025-03-31,-0.053249,0.071613,0.015459,0.03174,-0.038838,0.002344,-0.10462,-0.125479,0.02884,0.015712,...,0.027157,-0.005005,-0.065159,-0.018169,-0.087325,0.006331,-0.103126,0.084931,-0.07735,-0.015487


We can use `resample()` to calculate returns over different periods. When we calculated daily returns, we compared the last price of the day with the last price of the day before.

For other periods we apply the same thinking. For monthly returns, for example, we compare the last price of the month with the last price of the month before. We'll need `last()` to make it work.

There are many possible resampling frequencies. Here are a few:

- **W** - Weekly
- **D** - Daily (calendar days)
- **ME** - Monthly (month end)
- **QE** - Quarterly (quarter end)
- **YE** - Annually (year end)
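
The right aggregation depends on the variable. As a sketch on two weeks of made-up daily data (the column names and values are invented), a weekly price is naturally the last close of the week, while a weekly volume is the sum of the daily volumes. With the default `"W"` alias each week is labelled by its Sunday.

```python
import pandas as pd

# Two weeks of made-up business-day data, Monday 6 Jan to Friday 17 Jan 2025
idx = pd.bdate_range("2025-01-06", periods=10)
daily = pd.DataFrame(
    {
        "close": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0],
        "volume": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    },
    index=idx,
)

# Different aggregation per column: last close, total volume
weekly = daily.resample("W").agg({"close": "last", "volume": "sum"})
print(weekly)
```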

### Exercise: Losing Days

Resample your *trading volume* pivoted data frame to calendar days. Do you need to do some cleaning? What do you propose?

In [37]:
## YOUR CODE GOES HERE