# 01 — Data Collection
This notebook collects raw cryptocurrency market data (Bitcoin & Ethereum OHLCV) and the Crypto Fear & Greed Index, then saves them as CSV files under `data/raw/` for downstream cleaning and EDA.

### Setup
Import the libraries we’ll use for downloading market data and calling the sentiment API.

In [3]:
import pandas as pd
import yfinance as yf
import requests
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

## Download Daily OHLCV for BTC & ETH
We pull daily OHLCV (open, high, low, close, volume) for Bitcoin and Ethereum over a consistent date range.

In [4]:
start_date = "2023-01-02"
end_date = "2025-12-16"

# Download daily OHLCV for each asset.
# yfinance treats `end` as exclusive, so the last day requested is 2025-12-15.
# auto_adjust is set explicitly so the result does not depend on yfinance's default.
# progress=False keeps notebook output clean.
btc = yf.download("BTC-USD", start=start_date, end=end_date, auto_adjust=False, progress=False)
eth = yf.download("ETH-USD", start=start_date, end=end_date, auto_adjust=False, progress=False)

# Preview
btc.head(), eth.head()

(Price          Adj Close         Close          High           Low  \
 Ticker           BTC-USD       BTC-USD       BTC-USD       BTC-USD   
 Date                                                                 
 2023-01-02  16688.470703  16688.470703  16759.343750  16572.228516   
 2023-01-03  16679.857422  16679.857422  16760.447266  16622.371094   
 2023-01-04  16863.238281  16863.238281  16964.585938  16667.763672   
 2023-01-05  16836.736328  16836.736328  16884.021484  16790.283203   
 2023-01-06  16951.968750  16951.968750  16991.994141  16716.421875   
 
 Price               Open       Volume  
 Ticker           BTC-USD      BTC-USD  
 Date                                   
 2023-01-02  16625.509766  12097775227  
 2023-01-03  16688.847656  13903079207  
 2023-01-04  16680.205078  18421743322  
 2023-01-05  16863.472656  13692758566  
 2023-01-06  16836.472656  14413662913  ,
 Price         Adj Close        Close         High          Low         Open  \
 Ticker          ETH-
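
As a rough sanity check on the date window, the number of daily rows to expect can be computed before looking at the data. This is a minimal sketch that assumes yfinance's `end` is exclusive and that the feed has one row per calendar day (crypto markets trade every day). `expected_daily_rows` is a hypothetical helper, not part of this notebook.

```python
from datetime import date


def expected_daily_rows(start: str, end: str) -> int:
    """Count calendar days in [start, end), i.e. with an exclusive end date."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


# Same window as the download cell above
n_expected = expected_daily_rows("2023-01-02", "2025-12-16")
print(n_expected)
```

Comparing this number with `len(btc)` and `len(eth)` would flag missing or duplicated days, if there are any.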

### Save raw price data
Export the downloaded BTC/ETH data to CSV so the cleaning notebook can load it reproducibly.

In [5]:
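# Save with a flat, single-row header (Date, Adj Close, Close, High, Low, Open, Volume)
def flatten_yf_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the ticker level from yfinance's (Price, Ticker) column MultiIndex."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        # Level 0 holds the field names; level 1 holds the ticker symbol
        out.columns = out.columns.get_level_values(0)
    return out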

btc_out = flatten_yf_columns(btc).reset_index()
eth_out = flatten_yf_columns(eth).reset_index()

btc_out.to_csv("../data/raw/btc_prices.csv", index=False)
eth_out.to_csv("../data/raw/eth_prices.csv", index=False)
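
To illustrate what the flattening step does to the CSV header, here is a small sketch on a hand-built frame that mimics the two-level `(Price, Ticker)` columns shown in the preview. The values are rounded copies of the first two BTC rows, and the CSV is written to an in-memory buffer rather than `data/raw/`, so nothing on disk is touched.

```python
import io

import pandas as pd

# Two-level columns shaped like the yfinance output: (field, ticker)
cols = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["BTC-USD"]], names=["Price", "Ticker"]
)
idx = pd.to_datetime(["2023-01-02", "2023-01-03"]).rename("Date")
df = pd.DataFrame(
    [[16688.47, 12097775227], [16679.86, 13903079207]], index=idx, columns=cols
)

# Keep only the field names (level 0), then move Date into a regular column
flat = df.copy()
flat.columns = flat.columns.get_level_values(0)
flat = flat.reset_index()

# Write to a buffer and read it back to confirm the header is a single row
buf = io.StringIO()
flat.to_csv(buf, index=False)
header = buf.getvalue().splitlines()[0]
reloaded = pd.read_csv(io.StringIO(buf.getvalue()), parse_dates=["Date"])
print(header)
```

Without the flattening step, `to_csv` would write one header row per column level, which the cleaning notebook would then have to skip or parse.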

## Fetch the Crypto Fear & Greed Index
This index summarizes overall crypto market sentiment on a 0–100 scale (lower = fear, higher = greed). We fetch the full history from the Alternative.me API.

In [6]:
# limit=0 requests all available history from the Alternative.me API
url = "https://api.alternative.me/fng/?limit=0&format=json"
resp = requests.get(url, timeout=20)
resp.raise_for_status()
payload = resp.json()

# Convert to DataFrame (API returns list under 'data')
fg = pd.DataFrame(payload["data"])

# Convert dtypes: timestamps (Unix seconds) and values are cast from the raw API fields
fg["timestamp"] = pd.to_datetime(fg["timestamp"].astype(int), unit="s")
fg["value"] = fg["value"].astype(int)

# Set datetime index and sort
fg = fg.set_index("timestamp").sort_index()

fg.head()

timestamp,value,value_classification,time_until_update
2018-02-01,30,Fear,
2018-02-02,15,Extreme Fear,
2018-02-03,40,Fear,
2018-02-04,24,Extreme Fear,
2018-02-05,11,Extreme Fear,
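
The conversion steps above can be wrapped in a function and exercised without a network call. The following is a sketch only: `fng_payload_to_frame` is a hypothetical helper, and `sample_payload` is a hand-written stand-in that assumes the response carries string fields under a `data` key, as the casts in the cell above suggest.

```python
import pandas as pd


def fng_payload_to_frame(payload: dict) -> pd.DataFrame:
    """Turn a Fear & Greed style payload into a date-indexed, sorted DataFrame."""
    if "data" not in payload:
        raise ValueError("payload has no 'data' key")
    frame = pd.DataFrame(payload["data"])
    # Timestamps are treated as Unix seconds, values as integer scores
    frame["timestamp"] = pd.to_datetime(frame["timestamp"].astype(int), unit="s")
    frame["value"] = frame["value"].astype(int)
    return frame.set_index("timestamp").sort_index()


# Hand-written sample, deliberately listed newest first to exercise the sort
sample_payload = {
    "data": [
        {"value": "15", "value_classification": "Extreme Fear", "timestamp": "1517529600"},
        {"value": "30", "value_classification": "Fear", "timestamp": "1517443200"},
    ]
}

sample_fg = fng_payload_to_frame(sample_payload)
print(sample_fg)
```

Keeping the parsing separate from `requests.get` makes it possible to re-run the transformation on a saved response if the API is unavailable.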


### Persist sentiment history
Save the full Fear & Greed history to `data/raw/` for reproducibility.

In [7]:
fg.to_csv("../data/raw/fear_greed_index.csv")
fg.shape

(2873, 3)

### Trim sentiment to the same date range
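Limit the sentiment series to the window covered by the downloaded price data, using the first and last dates in the BTC index. Label-based slicing with `.loc` includes both endpoints.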

In [8]:
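# Use the dates actually present in the price data. Distinct names avoid
# overwriting the start_date/end_date strings defined in the download cell.
first_day = btc.index.min()
last_day = btc.index.max()

fg = fg.loc[first_day:last_day]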
fg.to_csv("../data/raw/fear_greed_index_trimmed.csv")
fg.shape

(1078, 3)
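
Because the trimmed sentiment series and the price data are meant to cover the same days, it can be worth checking for missing dates before merging them downstream. The sketch below uses a made-up series with one day deliberately left out. `missing_days` is a hypothetical helper, and the scores are illustrative, not real index values.

```python
import pandas as pd


def missing_days(index, start, end) -> pd.DatetimeIndex:
    """Return calendar days in [start, end] that are absent from `index`."""
    expected = pd.date_range(start, end, freq="D")
    return expected.difference(pd.DatetimeIndex(index).normalize())


# Made-up sentiment series with 2023-01-04 deliberately omitted
days = pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-05", "2023-01-06"])
sentiment = pd.Series([26, 27, 29, 26], index=days, name="value")

# Label-based slicing on a sorted DatetimeIndex keeps both endpoints
window = sentiment.loc["2023-01-03":"2023-01-05"]
gaps = missing_days(window.index, "2023-01-03", "2023-01-05")
print(gaps)
```

Applied to the notebook's data, a call such as `missing_days(fg.index, btc.index.min(), btc.index.max())` would list any days present in the price window but absent from the sentiment history.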