# Analyzing the Oil Futures Market

The goal is to derive quantitative insights into crude oil price behavior and market dynamics to support trading decisions and risk management.

In [None]:
#importing base packages
import pandas as pd
import numpy as np
import yfinance as yf
import matplotlib.pyplot as plt
import seaborn as sns

0. Find relevant data and process it

1. Price & Volatility Analysis

- Analyze historical price trends using continuous futures data (daily OHLC)

- Calculate rolling volatility (e.g., 30-day, 90-day) and identify volatility regimes

- Identify seasonal patterns and correlations with key economic events

2. Roll Yield Approximation

- Estimate roll yield behavior using price changes in continuous futures

- Understand impact of contango/backwardation dynamics indirectly from data

3. Return Distribution & Risk Metrics

- Compute return distributions over different time scales (daily, weekly, monthly)

- Measure risk metrics like Value-at-Risk (VaR), Conditional VaR (CVaR)

4. Event Studies

- Analyze price/volatility reaction around major oil market events (OPEC meetings, geopolitical shocks)

5. Strategy Development & Backtesting

- Prototype simple trading or hedging strategies (momentum, mean reversion, volatility breakout) based on continuous futures data

- Backtest strategies and evaluate performance metrics (returns, Sharpe ratio, drawdown)

6. Reporting & Visualization

- Create dashboards or notebooks to visualize key insights (price, volatility, returns, event impacts)

- Summarize findings in clear, business-oriented reports with actionable recommendations
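As a rough illustration of the risk metrics listed in step 3, the sketch below computes historical (empirical) VaR and CVaR from a return series. The function name `historical_var_cvar`, the 95% confidence level, and the synthetic returns are assumptions made for this example and are not part of the notebook's code; a parametric or Monte Carlo estimator would be an equally reasonable choice.

```python
import numpy as np

def historical_var_cvar(returns, level=0.95):
    """Historical VaR and CVaR, both reported as positive loss fractions.

    Hypothetical helper for illustration only.
    """
    r = np.asarray(returns, dtype=float)
    r = r[~np.isnan(r)]
    # VaR: the return threshold breached in only (1 - level) of observations
    cutoff = np.quantile(r, 1.0 - level)
    var = -cutoff
    # CVaR: average loss over the observations at or beyond that threshold
    cvar = -r[r <= cutoff].mean()
    return var, cvar

# Synthetic daily returns for illustration only (not real oil data)
rng = np.random.default_rng(0)
sample_returns = rng.normal(loc=0.0, scale=0.02, size=5000)

var_95, cvar_95 = historical_var_cvar(sample_returns, level=0.95)
print(f"95% VaR: {var_95:.4f}, 95% CVaR: {cvar_95:.4f}")
```

In the notebook the same function could be applied to returns derived from `df['close']`. CVaR is never smaller than VaR at the same level, because it averages only the losses in the tail beyond the VaR cutoff.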

## Data Retrieval and Processing

In [None]:
#fetch the data with a helper script that wraps the yfinance API

#check the working directory and confirm the helper script is in the scripts folder
import os
print(os.getcwd())
print(os.listdir('../../scripts'))

#add the scripts folder to the import path
import sys
sys.path.append('../../scripts')
#import the helper function
from fetch_yf_data import fetch_data

#fetch_data can now be called to pull data from the yfinance API

c:\Users\dgalassi\commodity_lab\projects\oil_analysis
['.gitkeep', 'data_loader.py', 'fetch_yf_data.py', 'setup_db.py', 'update_commodities_data.py', 'upload_db.py', '__pycache__']


In [6]:
#define the ticker/tickers we want to extract. For now only oil futures (CL=F)

#inputs to the function
tickers = {'Oil':'CL=F'}
period = '20y' # 20 years of data
interval = '1d' #we want daily timeframe

data = fetch_data(tickers,period=period,interval=interval)

Fetching  data for Oil (CL=F) - Period: 20y, Interval: 1d


[*********************100%***********************]  1 of 1 completed


In [29]:
#the data has now been extracted from yfinance
#the helper script automatically formats it into a tidy table with named columns

#print a short summary so the user has an idea of what the data contains
column_names = list(data.columns)

print('---------------- Main info about the data ----------------')

print(f'The dataframe has {data.shape[0]} rows and {data.shape[1]} columns ')
print(f'The dataframe spans from {data.date.iloc[0]} to {data.date.iloc[-1]} with timeframe of {interval[0]} day')
print(f'The dataframe contains the following columns:{column_names}')

print('----------------------------------------------------------')

print('You can visualize the first rows of the Dataframe...')
data.head()

---------------- Main info about the data ----------------
The dataframe has 5032 rows and 10 columns 
The dataframe spans from 2005-07-18 00:00:00 to 2025-07-18 00:00:00 with timeframe of 1 day
The dataframe contains the following columns:['date', 'open', 'high', 'low', 'close', 'volume', 'name', 'ticker', 'source', 'timeframe']
----------------------------------------------------------
You can visualize the first rows of the Dataframe...


| | date | open | high | low | close | volume | name | ticker | source | timeframe |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2005-07-18 | 58.0 | 58.98 | 56.799999 | 57.32 | 76097 | Oil | CL=F | yfinance | 1d |
| 1 | 2005-07-19 | 57.25 | 57.77 | 56.799999 | 57.459999 | 85894 | Oil | CL=F | yfinance | 1d |
| 2 | 2005-07-20 | 57.380001 | 58.299999 | 56.099998 | 56.720001 | 145901 | Oil | CL=F | yfinance | 1d |
| 3 | 2005-07-21 | 58.150002 | 58.169998 | 56.5 | 57.130001 | 106235 | Oil | CL=F | yfinance | 1d |
| 4 | 2005-07-22 | 57.27 | 58.700001 | 57.130001 | 58.650002 | 83067 | Oil | CL=F | yfinance | 1d |


In [38]:
#now let's only keep what we need
print('------------ Reduce to necessary data ---------------')


df = data.drop(['ticker','source','timeframe'],axis=1)

print('Make sure there are no missing data ...')
print(df.isna().sum())

df.head()

------------ Reduce to necessary data ---------------
Make sure there are no missing data ...
date      0
open      0
high      0
low       0
close     0
volume    0
name      0
dtype: int64


| | date | open | high | low | close | volume | name |
|---|---|---|---|---|---|---|---|
| 0 | 2005-07-18 | 58.0 | 58.98 | 56.799999 | 57.32 | 76097 | Oil |
| 1 | 2005-07-19 | 57.25 | 57.77 | 56.799999 | 57.459999 | 85894 | Oil |
| 2 | 2005-07-20 | 57.380001 | 58.299999 | 56.099998 | 56.720001 | 145901 | Oil |
| 3 | 2005-07-21 | 58.150002 | 58.169998 | 56.5 | 57.130001 | 106235 | Oil |
| 4 | 2005-07-22 | 57.27 | 58.700001 | 57.130001 | 58.650002 | 83067 | Oil |
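A missing-value count is only one integrity check. The sketch below shows a few more that may be worth running on a daily OHLC table before analysis: date ordering, duplicate dates, high/low consistency, and non-positive closes (which would make log returns undefined). The helper name `check_ohlc` is hypothetical, and the small `sample` frame simply reuses the first rows printed above so the example runs without calling the API.

```python
import pandas as pd

# First rows copied from the table above, so the sketch runs offline
sample = pd.DataFrame({
    "date": ["2005-07-18", "2005-07-19", "2005-07-20"],
    "open": [58.0, 57.25, 57.380001],
    "high": [58.98, 57.77, 58.299999],
    "low": [56.799999, 56.799999, 56.099998],
    "close": [57.32, 57.459999, 56.720001],
    "volume": [76097, 85894, 145901],
})

def check_ohlc(frame):
    """Return simple integrity checks for a daily OHLC table (hypothetical helper)."""
    dates = pd.to_datetime(frame["date"])
    return {
        "dates_sorted": bool(dates.is_monotonic_increasing),
        "duplicate_dates": int(dates.duplicated().sum()),
        # high should be the largest and low the smallest of the four prices
        "bad_high": int((frame["high"] < frame[["open", "close", "low"]].max(axis=1)).sum()),
        "bad_low": int((frame["low"] > frame[["open", "close", "high"]].min(axis=1)).sum()),
        # non-positive prices break log-return calculations
        "non_positive_close": int((frame["close"] <= 0).sum()),
    }

checks = check_ohlc(sample)
print(checks)
```

Calling `check_ohlc(df)` on the full frame would flag any rows that need attention before returns are computed.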


## Price & Volatility Analysis

This stage examines how price and volatility evolve over time and how returns are distributed across different timeframes. It also estimates how much an investor could have gained or lost by holding oil as part of a portfolio over different time horizons.
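A minimal sketch of these calculations follows, assuming a positive daily close series. The synthetic price path stands in for `df['close']`, and the window lengths (30 and 90 days), the 252-trading-day annualization factor, and the holding horizons are illustrative choices rather than settings taken from the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic price path standing in for df['close']; not real oil data
rng = np.random.default_rng(42)
dates = pd.bdate_range("2020-01-01", periods=600)
close = pd.Series(
    60 * np.exp(np.cumsum(rng.normal(0, 0.02, len(dates)))),
    index=dates,
    name="close",
)

# Daily log returns (only defined while prices stay positive)
log_ret = np.log(close).diff().dropna()

# Rolling volatility, annualized with the common 252-trading-day convention
vol_30 = log_ret.rolling(30).std() * np.sqrt(252)
vol_90 = log_ret.rolling(90).std() * np.sqrt(252)

# Buy-and-hold simple return over several holding horizons (in trading days)
horizons = {"1w": 5, "1m": 21, "1y": 252}
holding = pd.DataFrame(
    {name: close.pct_change(periods=n) for name, n in horizons.items()}
)

print(vol_30.dropna().tail(3))
print(holding.describe().loc[["mean", "std", "min", "max"]])
```

Each column of `holding` is the return an investor would have realized by buying `n` trading days earlier and selling on that date, so its distribution summarizes the range of outcomes for that holding period.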
