## Project Description:

The problem is to systematically identify and model oil risk regimes so portfolio strategies can adapt to regime changes, improving risk-adjusted performance and downside protection.

### Milestone #1: Setting up folders structures and README.md files

### Milestone #2: Pull data required from yfinance

In [1]:
import os
import json
import time
import datetime
from datetime import datetime
from pathlib import Path
import pandas as pd
import bs4
from bs4 import BeautifulSoup
import requests
import pandas as pd
import sys
import yfinance

In [2]:
config_path = Path.cwd().parent / "src"
sys.path.append(str(config_path))

from config import *
from utils import *

## Tickers used:

**CL=F** - WTI Crude Oil Futures; spot/near-month oil price proxy <br>
**UUP** - Invesco DB US Dollar Index Bullish Fund; tracks USD vs major currencies <br>
**GLD** - SPDR Gold Shares; tracks gold price <br>
**^VIX** - CBOE Volatility Index; implied volatility on S&P 500 options <br>
**^OVX** - CBOE Crude Oil Volatility Index; oil volatility <br>
**SHY** - iShares 1–3 Year Treasury Bond ETF; short-term US Treasuries <br>
**IEF** - iShares 7–10 Year Treasury Bond ETF; intermediate-term US Treasuries <br>
**TIP** - iShares TIPS Bond ETF; US Treasury Inflation-Protected Securities <br>

In [6]:
#using ishares proxies for short-term/intermediate-term/TIPS due to data availabiltiy in yfinance

In [None]:
tickerlist = ['CL=F','UUP','GLD','^VIX','^OVX','SHY','IEF','TIP']
use_api = False
period = "5y"
interval = "1d"
import yfinance as yf

for ticker in tickerlist:
    df_raw_api = yf.download(ticker, period=period, interval=interval,auto_adjust=True).reset_index()[[('Date',''), ('Close',ticker)]]
    df_raw_api.columns = ['date','adjClose']
    df_raw_api['date'] = pd.to_datetime(df_raw_api['date'])
    df_raw_api['adjClose']=pd.to_numeric(df_raw_api['adjClose'])
    msg_dict = validate_df(df_raw_api,['date','adjClose'],{'date':'datetime64[ns]','adjClose':'float64'})
    print(msg_dict)

    fname = safe_filename(prefix="api", meta={"source": "fmp" if use_api else "yfinance", "symbol": ticker})
    raw_data_path = Path.cwd().parent / "data" / "raw"
    out_path = raw_data_path / fname
    df_raw_api.to_csv(out_path, index=False)
    print("Saved:", out_path)

[*********************100%***********************]  1 of 1 completed


{'na_total': 'Total NA values: 0'}
Saved: /Users/svolety/Desktop/bootcamp IV/bootcamp/project/data/raw/api_source-yfinance_symbol-CL=F_20250822-234148.csv


HTTP Error 404: 
[*********************100%***********************]  1 of 1 completed

1 Failed download:
['CL=F1']: YFPricesMissingError('possibly delisted; no price data found  (period=5y) (Yahoo error = "No data found, symbol may be delisted")')


{'na_total': 'Total NA values: 0'}
Saved: /Users/svolety/Desktop/bootcamp IV/bootcamp/project/data/raw/api_source-yfinance_symbol-CL=F1_20250822-234149.csv


[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed


{'na_total': 'Total NA values: 0'}
Saved: /Users/svolety/Desktop/bootcamp IV/bootcamp/project/data/raw/api_source-yfinance_symbol-UUP_20250822-234149.csv
{'na_total': 'Total NA values: 0'}
Saved: /Users/svolety/Desktop/bootcamp IV/bootcamp/project/data/raw/api_source-yfinance_symbol-GLD_20250822-234150.csv


[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed


{'na_total': 'Total NA values: 0'}
Saved: /Users/svolety/Desktop/bootcamp IV/bootcamp/project/data/raw/api_source-yfinance_symbol-^VIX_20250822-234150.csv
{'na_total': 'Total NA values: 0'}
Saved: /Users/svolety/Desktop/bootcamp IV/bootcamp/project/data/raw/api_source-yfinance_symbol-^OVX_20250822-234150.csv


[*********************100%***********************]  1 of 1 completed


{'na_total': 'Total NA values: 0'}
Saved: /Users/svolety/Desktop/bootcamp IV/bootcamp/project/data/raw/api_source-yfinance_symbol-SHY_20250822-234150.csv


[*********************100%***********************]  1 of 1 completed


{'na_total': 'Total NA values: 0'}
Saved: /Users/svolety/Desktop/bootcamp IV/bootcamp/project/data/raw/api_source-yfinance_symbol-IEF_20250822-234150.csv


[*********************100%***********************]  1 of 1 completed

{'na_total': 'Total NA values: 0'}
Saved: /Users/svolety/Desktop/bootcamp IV/bootcamp/project/data/raw/api_source-yfinance_symbol-TIP_20250822-234151.csv





### Milestone #4,5,6: Clean and store data

In [4]:
processed_data_path = Path.cwd().parent / "data" / "processed"
raw_data_path = Path.cwd().parent / "data" / "raw"
tickerlist = ['CL=F','UUP','GLD','^VIX','^OVX','SHY','IEF','TIP']
files = read_and_concat(tickerlist,str(raw_data_path),str(processed_data_path) )

In [None]:
missing_dates = check_missing_dates(files)
missing_dates = list(set(missing_dates))
for i in range(len(missing_dates)):
    missing_dates[i] = missing_dates[i].strftime("%Y-%m-%d")

In [24]:
end_date, start_date = year_date_interval_creator()
end_date = end_date.strftime("%Y-%m-%d")
start_date = start_date.strftime("%Y-%m-%d")
holidays = get_holidays(start_date,end_date)

DatetimeIndex(['2015-08-24', '2015-08-25', '2015-08-26', '2015-08-27',
               '2015-08-28', '2015-08-31', '2015-09-01', '2015-09-02',
               '2015-09-03', '2015-09-04',
               ...
               '2025-08-07', '2025-08-08', '2025-08-11', '2025-08-12',
               '2025-08-13', '2025-08-14', '2025-08-15', '2025-08-18',
               '2025-08-19', '2025-08-20'],
              dtype='datetime64[ns]', length=2513, freq=None)


In [25]:
count = 0
drop_dates = []
for date in missing_dates:
    if date in holidays:
        count+=1

if count==len(missing_dates):
    drop_dates = missing_dates

In [60]:
processed_df = concat_and_save(files,drop_dates)

In [64]:
processed_df

Unnamed: 0,Date,CL=F,UUP,GLD,^VIX,^OVX,SHY,IEF,TIP
0,2020-08-20,42.580002,22.342087,183.500000,22.719999,32.830002,77.537857,108.536964,104.350189
1,2020-08-21,42.340000,22.449070,182.029999,22.540001,33.680000,77.528900,108.617058,104.707382
2,2020-08-24,42.619999,22.484734,181.000000,22.370001,32.250000,77.519936,108.465836,104.915039
3,2020-08-25,43.349998,22.404493,181.220001,22.030001,33.029999,77.528900,108.207924,104.881805
4,2020-08-26,43.389999,22.377748,183.360001,23.270000,33.790001,77.528900,108.163422,105.089485
...,...,...,...,...,...,...,...,...,...
1251,2025-08-14,63.959999,27.430000,307.250000,14.830000,35.959999,82.699997,95.419998,110.250000
1252,2025-08-15,62.799999,27.340000,307.429993,15.090000,37.220001,82.669998,95.230003,110.040001
1253,2025-08-18,63.419998,27.410000,306.950012,14.990000,32.150002,82.669998,95.139999,109.919998
1254,2025-08-19,62.349998,27.480000,305.269989,15.570000,33.040001,82.709999,95.379997,110.029999


In [63]:
processed_df.to_csv(str(processed_data_path)+'/api_data_cleaned_merged.csv')

### Milestone #7,#8 - Outlier Analysis and EDA