Next, we import the cleaned data, compute the crack spread, and store the result as processed data for later EDA.

In [27]:
import pandas as pd
import os
import matplotlib.pyplot as plt

In [28]:
crude_df = pd.read_csv(r'../data/processed/crude_futures.csv')
gasoil_df = pd.read_csv(r'../data/processed/gasoil_futures.csv')

In [29]:
crude_df["Date"] = pd.to_datetime(crude_df["Date"])
gasoil_df["Date"] = pd.to_datetime(gasoil_df["Date"])

Now we want to compute the crack spread. This calculation needs only the daily closing prices.

Before that, we need to make sure the dates are aligned, dropping any date that is not present in both datasets.

We will achieve this using 'merge'. The join type is 'inner' because we want the intersection of dates. We also need to select only the Date and Close columns from each dataframe.
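As a quick illustration of what the inner merge keeps and drops, here is a minimal sketch on toy data. The frames `crude_toy` and `gasoil_toy` and their values are hypothetical and are not taken from the real files. An outer merge with `indicator=True` is used only as an audit step to list the dates that an inner merge would discard.

```python
import pandas as pd

# Toy data (hypothetical values): each frame has one date the other lacks.
crude_toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    "close_crude": [70.0, 71.0, 72.0],
})
gasoil_toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-03", "2024-01-04", "2024-01-05"]),
    "close_gasoil": [700.0, 710.0, 720.0],
})

# Inner merge: only dates present in both frames survive.
inner = crude_toy.merge(gasoil_toy, how="inner", on="date")

# Outer merge with indicator=True adds a `_merge` column labelling each row
# as 'left_only', 'right_only', or 'both', which exposes the dropped dates.
audit = crude_toy.merge(gasoil_toy, how="outer", on="date", indicator=True)
dropped = audit.loc[audit["_merge"] != "both", ["date", "_merge"]]

print(inner)
print(dropped)
```

Running the same audit on the real dataframes before merging would show how many trading days are lost to misaligned calendars.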

In [30]:
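# Keep only the date and the closing price, renamed so the two series
# stay distinguishable after the merge
crude_df = crude_df[['Date', 'Close']].rename(columns={'Close': 'close_crude', 'Date': 'date'})
gasoil_df = gasoil_df[['Date', 'Close']].rename(columns={'Close': 'close_gasoil', 'Date': 'date'})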

In [31]:
crack_spread_df = crude_df.merge(gasoil_df, how='inner', on='date')
crack_spread_df

,date,close_crude,close_gasoil
0,2009-05-08,58.14,482.00
1,2009-05-11,57.48,481.50
2,2009-05-12,57.94,483.25
3,2009-05-13,58.12,483.75
4,2009-05-14,58.59,472.75
...,...,...,...
4172,2025-09-02,69.14,703.50
4173,2025-09-03,67.60,707.25
4174,2025-09-04,66.99,698.75
4175,2025-09-05,65.50,676.75


Now that we have a merged, cleaned dataframe, let's compute the crack spread as the gasoil close minus the crude close.

In [32]:
crack_spread_df['crack_spread'] = crack_spread_df['close_gasoil'] - crack_spread_df['close_crude']
crack_spread_df

,date,close_crude,close_gasoil,crack_spread
0,2009-05-08,58.14,482.00,423.86
1,2009-05-11,57.48,481.50,424.02
2,2009-05-12,57.94,483.25,425.31
3,2009-05-13,58.12,483.75,425.63
4,2009-05-14,58.59,472.75,414.16
...,...,...,...,...
4172,2025-09-02,69.14,703.50,634.36
4173,2025-09-03,67.60,707.25,639.65
4174,2025-09-04,66.99,698.75,631.76
4175,2025-09-05,65.50,676.75,611.25
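One caveat on units: the price levels above suggest that gasoil may be quoted in USD per metric tonne while crude is quoted in USD per barrel. The data files do not state the units, so this is an assumption. If it holds, the raw difference mixes units, and the gasoil price needs converting to USD per barrel first. The sketch below uses 7.45 barrels per tonne, a commonly used gasoil conversion factor that should be checked against the contract specification. The column names `gasoil_usd_per_bbl` and `crack_spread_usd_per_bbl` are hypothetical.

```python
import pandas as pd

# Assumption: gasoil is in USD per metric tonne, crude in USD per barrel.
# 7.45 bbl/tonne is a commonly used gasoil factor; verify it for your contract.
BARRELS_PER_TONNE = 7.45

# Two rows copied from the merged frame shown above.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2009-05-08", "2025-09-05"]),
    "close_crude": [58.14, 65.50],
    "close_gasoil": [482.00, 676.75],
})

# Convert gasoil to USD per barrel, then take the spread in like units.
sample["gasoil_usd_per_bbl"] = sample["close_gasoil"] / BARRELS_PER_TONNE
sample["crack_spread_usd_per_bbl"] = (
    sample["gasoil_usd_per_bbl"] - sample["close_crude"]
)

print(sample.round(2))
```

Under these assumptions the spread comes out in single or double digits of USD per barrel rather than in the hundreds, which is easier to compare with commonly quoted refining margins.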


Now save the data

In [33]:
# index=False avoids writing the row index as an extra unnamed column
crack_spread_df.to_csv(r'../data/processed/crack_spread.csv', index=False)
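For the later EDA step, here is a minimal sketch of how the saved file could be reloaded, and of how the `index` argument of `to_csv` affects the columns that come back. It writes to an in-memory buffer rather than to `../data/processed/` so that it runs on its own. The frame `df` is a hypothetical stand-in for `crack_spread_df`, built from the first two rows shown above.

```python
import io
import pandas as pd

# Toy stand-in for crack_spread_df (first two rows of the output above).
df = pd.DataFrame({
    "date": pd.to_datetime(["2009-05-08", "2009-05-11"]),
    "close_crude": [58.14, 57.48],
    "close_gasoil": [482.00, 481.50],
})
df["crack_spread"] = df["close_gasoil"] - df["close_crude"]

# Write to in-memory buffers instead of disk to keep the sketch self-contained.
with_index = io.StringIO()
df.to_csv(with_index)                  # default: index written as an unnamed column
without_index = io.StringIO()
df.to_csv(without_index, index=False)  # index omitted

with_index.seek(0)
without_index.seek(0)

# parse_dates restores the datetime dtype that CSV text does not preserve.
reloaded_a = pd.read_csv(with_index, parse_dates=["date"])
reloaded_b = pd.read_csv(without_index, parse_dates=["date"])

print(list(reloaded_a.columns))  # includes 'Unnamed: 0'
print(list(reloaded_b.columns))
```

Passing `parse_dates=["date"]` on reload saves a separate `pd.to_datetime` call in the next notebook.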