## ETF Price Collection

### Objective
Download 20+ years of daily ETF data for IVV, IEF, GLD from Yahoo Finance.

### Parameters
- Period: 2005-01-01 to 2025-10-01
- Frequency: Daily
- Adjustments: Auto-adjusted for corporate actions

### Output
Adjusted OHLCV data saved to `../data/sample_data.csv`

*Methodology: See ../docs/methodology.md*

## 1. Importing necessary libraries

Data will be downloaded from Yahoo Finance with the `yfinance` package. We will request prices adjusted for splits and dividends.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf

## 2. Downloading historical data

### Parameters 

We study three ETFs: IVV, IEF and GLD. A daily interval over a span of more than 20 years, from 2005-01-01 to 2025-10-01, gives us plenty of observations. Note that yfinance treats the `end` date as exclusive, so the last session that can be returned is 2025-09-30.

In [2]:
tickers = ["IVV", "IEF", "GLD"]
start = "2005-01-01"
end = "2025-10-01"   
interval = "1d"      # daily data 

### Downloading and adjusting

First, we download the price data and request adjusted prices. Without the adjustment, dividend income would be ignored, which would give a misleading picture of performance. We also inspect the structure of the data.
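Before running the download, a small worked example may help show why ignoring dividends is misleading. The numbers below are hypothetical and chosen only for illustration. They do not reproduce Yahoo Finance's actual adjustment procedure.

```python
# Hypothetical one-period example: the price is flat, but a dividend is paid.
start_price = 100.0   # assumed price at the start of the period
end_price = 100.0     # assumed price at the end of the period
dividend = 2.0        # assumed cash dividend paid during the period

# Return computed from prices alone (what unadjusted prices would suggest)
price_return = end_price / start_price - 1

# Return including the dividend (what adjusted prices are meant to capture)
total_return = (end_price + dividend) / start_price - 1

print(f"price-only return: {price_return:.2%}")
print(f"total return:      {total_return:.2%}")
```

In this sketch the price-only return understates what the investor earned by the full dividend yield. That gap is what the adjusted series is intended to close.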

In [3]:
# Download data (auto_adjust=True adjusts the OHLC prices [Open, High, Low, Close] for splits and dividends)
data = yf.download(tickers, start=start, end=end, interval=interval, group_by='ticker', auto_adjust=True, progress=False)
data.to_csv('../data/sample_data.csv', index=True)
data.head()

| Date | IVV Open | IVV High | IVV Low | IVV Close | IVV Volume | GLD Open | GLD High | GLD Low | GLD Close | GLD Volume | IEF Open | IEF High | IEF Low | IEF Close | IEF Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2005-01-03 | 82.778687 | 82.914727 | 81.642777 | 81.935257 | 578300 | 42.98 | 43.169998 | 42.740002 | 43.02 | 4750400 | 48.752582 | 48.890369 | 48.700915 | 48.867405 | 323400 |
| 2005-01-04 | 82.078066 | 82.078066 | 80.642872 | 80.948959 | 845400 | 42.799999 | 42.91 | 42.459999 | 42.740002 | 3456800 | 48.86736 | 48.86736 | 48.557346 | 48.563084 | 1181000 |
| 2005-01-05 | 80.942175 | 81.200644 | 80.459244 | 80.459244 | 618400 | 42.75 | 42.880001 | 42.599998 | 42.669998 | 2033600 | 48.614795 | 48.723875 | 48.574608 | 48.643501 | 369000 |
| 2005-01-06 | 80.649701 | 81.119031 | 80.568077 | 80.785736 | 518500 | 42.48 | 42.560001 | 42.07 | 42.150002 | 2556400 | 48.591781 | 48.741048 | 48.591781 | 48.689377 | 389100 |
| 2005-01-07 | 80.996537 | 81.173382 | 80.486398 | 80.63604 | 583900 | 42.09 | 42.389999 | 41.700001 | 41.84 | 4492700 | 48.764046 | 48.798491 | 48.626259 | 48.649223 | 182400 |


This is a multi-level column DataFrame. Each ticker has its own group of columns.

These columns are Open, High, Low, Close and Volume. The first four give the price at the open, the day's high, the day's low and the close. Volume is the number of shares of the ETF that changed hands during the day.
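To make the multi-level layout concrete, here is a minimal sketch of how such a frame can be sliced. It uses a small synthetic DataFrame with placeholder numbers rather than the downloaded data, and it assumes the column levels are named `Ticker` and `Price`, as in the output shown earlier.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded frame (values are placeholders, not market data)
tickers = ["IVV", "IEF", "GLD"]
fields = ["Open", "High", "Low", "Close", "Volume"]
dates = pd.date_range("2005-01-03", periods=3, freq="B", name="Date")
columns = pd.MultiIndex.from_product([tickers, fields], names=["Ticker", "Price"])
frame = pd.DataFrame(
    np.arange(45, dtype=float).reshape(3, 15), index=dates, columns=columns
)

# All five columns for a single ticker
ivv = frame["IVV"]

# One field across all tickers: a cross-section on the second column level
closes = frame.xs("Close", axis=1, level="Price")

print(ivv.columns.tolist())
print(closes.columns.tolist())
```

Selecting by the first level returns one ticker's OHLCV block, while `xs` on the second level returns one column per ticker, which is the shape usually wanted for return calculations.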

## Data Collection Complete

### Summary
- **Data Source**: Yahoo Finance
- **Tickers**: IVV, IEF, GLD
- **Period**: 2005-01-01 to 2025-10-01
- **Frequency**: Daily
- **Adjustments**: Auto-adjusted for splits and dividends
- **File Saved**: `../data/sample_data.csv`
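
Because the frame has two column levels, the saved CSV starts with two header rows. If it is read back without saying so, pandas produces placeholder column names such as `Unnamed: 1_level_2`. The following sketch shows one way to reload such a file. It round-trips a small synthetic frame through an in-memory buffer instead of the real file, so the data and variable names are assumptions made for illustration.

```python
import io

import numpy as np
import pandas as pd

# Synthetic frame with the same two-level column layout (placeholder values)
dates = pd.date_range("2005-01-03", periods=3, freq="B", name="Date")
columns = pd.MultiIndex.from_product(
    [["IVV", "IEF", "GLD"], ["Open", "Close"]], names=["Ticker", "Price"]
)
frame = pd.DataFrame(
    np.arange(18, dtype=float).reshape(3, 6), index=dates, columns=columns
)

# Write to an in-memory buffer in place of '../data/sample_data.csv'
buffer = io.StringIO()
frame.to_csv(buffer)
buffer.seek(0)

# header=[0, 1] restores both column levels; index_col=0 restores the date index
reloaded = pd.read_csv(buffer, header=[0, 1], index_col=0, parse_dates=True)

print(reloaded.columns.nlevels)
print(type(reloaded.index).__name__)
```

For the actual file, the same `header`, `index_col` and `parse_dates` arguments would be passed with the file path in place of the buffer.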

### What's next...

We plan to store the data in a SQL database for better scalability and querying.
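
As a rough sketch of what that could look like, the example below reshapes a small synthetic frame into one row per date and ticker and writes it to an in-memory SQLite database. The choice of SQLite, the table name `etf_prices` and the long-format schema are all assumptions made for illustration. The final storage design has not been decided.

```python
import sqlite3

import numpy as np
import pandas as pd

# Synthetic two-level frame standing in for the downloaded data (placeholder values)
dates = pd.date_range("2005-01-03", periods=2, freq="B", name="Date")
columns = pd.MultiIndex.from_product(
    [["IVV", "IEF"], ["Close", "Volume"]], names=["Ticker", "Price"]
)
wide = pd.DataFrame(
    np.arange(8, dtype=float).reshape(2, 4), index=dates, columns=columns
)

# Reshape to long format: one row per (Date, Ticker)
parts = []
for ticker in wide.columns.get_level_values("Ticker").unique():
    part = wide[ticker].reset_index()
    part.insert(1, "Ticker", ticker)
    parts.append(part)
long_frame = pd.concat(parts, ignore_index=True)
long_frame["Date"] = long_frame["Date"].dt.strftime("%Y-%m-%d")  # store dates as ISO text

# Write to a hypothetical table and query it back
conn = sqlite3.connect(":memory:")
long_frame.to_sql("etf_prices", conn, index=False, if_exists="replace")
ivv = pd.read_sql_query(
    "SELECT Date, Close FROM etf_prices WHERE Ticker = 'IVV' ORDER BY Date", conn
)
conn.close()

print(ivv)
```

A long table keyed by date and ticker tends to be easier to query and extend with new tickers than a wide table with one column per ticker and field.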

Proceed to `02_eda_cleaning.ipynb` for exploratory data analysis, where we will:
1. Load and inspect the collected price data
2. Calculate daily returns (simple and logarithmic)
3. Visualize price evolution and cumulative returns
4. Analyze correlations and risk metrics
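
As a preview of step 2, the two return definitions can be sketched on a short synthetic price series. The prices are placeholders, and the next notebook may compute returns differently. With closing price $P_t$, the simple return is $P_t / P_{t-1} - 1$ and the log return is $\ln(P_t / P_{t-1})$.

```python
import numpy as np
import pandas as pd

# Placeholder closing prices, not market data
close = pd.Series(
    [100.0, 110.0, 99.0],
    index=pd.date_range("2005-01-03", periods=3, freq="B"),
    name="Close",
)

# Simple return: P_t / P_{t-1} - 1
simple_returns = close.pct_change()

# Log return: ln(P_t / P_{t-1})
log_returns = np.log(close / close.shift(1))

# Log returns add up over time to the log of the total price ratio
total_log_return = log_returns.sum()

print(simple_returns)
print(log_returns)
```

The first observation of each series is missing because there is no previous price to compare against.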

*Note: All subsequent analysis will use these adjusted closing prices to ensure accurate return calculations.*