# 01 - Data Collection (OANDA)

**Objective:**  
Collect historical forex data from the OANDA API to support multi-timeframe analysis (Weekly, Daily, 4H, 1H).  
Only 1H candles are fetched; the higher timeframes are resampled from them later. The currency pairs are:

- EUR/USD  
- USD/JPY  
- GBP/USD  
- USD/CHF  
- AUD/USD  
- USD/CAD  
- NZD/USD

**Tasks:**  
1. Connect to OANDA API  
2. Fetch historical 1H OHLCV data (higher timeframes are resampled from it later)  
3. Save raw data as CSV in `data/raw/`  
4. Verify data for use in feature engineering and ML


---

# Change working directory

The notebook starts in the `jupyter_notebooks` folder, so we change the working directory to its parent (the project root).
* `os.getcwd()` returns the current directory

In [1]:
import os
current_dir = os.getcwd()
current_dir

'/workspaces/forex-mtf-strategy-predictor/jupyter_notebooks'

We want to make the parent of the current directory the new current directory
* `os.path.dirname()` gets the parent directory
* `os.chdir()` sets the new current directory

In [2]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [3]:
current_dir = os.getcwd()
current_dir

'/workspaces/forex-mtf-strategy-predictor'
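One caveat with the cells above: re-running the `os.chdir` cell moves up one more level each time. A minimal sketch of a guarded version follows. The helper name `move_to_project_root` is hypothetical, and it assumes the notebooks folder is called `jupyter_notebooks`, as in the output above.

```python
import os
from pathlib import Path


def move_to_project_root(notebook_dir_name: str = "jupyter_notebooks") -> Path:
    """Move to the parent folder only while we are still inside the notebooks folder."""
    cwd = Path.cwd()
    if cwd.name == notebook_dir_name:
        # Still in the notebooks folder: step up once to the project root.
        os.chdir(cwd.parent)
    # Otherwise leave the working directory untouched, so re-running is safe.
    return Path.cwd()


print(move_to_project_root())
```

Because the function checks the folder name first, running the cell a second time leaves the working directory unchanged.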

## Import Libraries and Initialize OANDA API

**Objective:**  
In this step, we import all the Python libraries required for data collection and  
initialize the connection to the **OANDA API** using a secure `.env` file.

**Key Points:**
- We use `oandapyV20` to communicate with OANDA's REST API.
- API credentials (`OANDA_API_KEY`) are stored securely in a `.env` file.
- `python-dotenv` is used to load environment variables safely.
- The cell prints `True` once the key has been loaded, which is required before we can fetch historical forex data.


In [4]:
import os
import pandas as pd
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Retrieve OANDA API key
OANDA_API_KEY = os.getenv("OANDA_API_KEY")

# Verify that the key is loaded
print("OANDA_API_KEY loaded:", bool(OANDA_API_KEY))

OANDA_API_KEY loaded: True
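Printing `True` or `False` only reports the state of the key; a missing key would still surface later as a failed API request. As a hedged alternative, the sketch below stops immediately with a readable error. The helper name `require_env` is hypothetical and not part of this project.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error if it is unset."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or export it in the shell")
    return value


# Intended usage, after load_dotenv() has run:
# OANDA_API_KEY = require_env("OANDA_API_KEY")
```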


---

# Define Currency Pairs, Timeframes, and Output Paths

**Objective:**  
In this step, we define the currency pairs that we will collect **1-hour OHLC data** for.  
Later, we will **resample this 1H data** to create 4H, Daily, and Weekly candles  
instead of fetching multiple timeframes from OANDA.

**Key Points:**
- Reduces API calls and storage space.
- Ensures all higher timeframe candles are generated consistently from 1H data.
- Raw CSV files will be saved in `data/raw/` for feature engineering.

In [5]:
# Currency pairs to collect from OANDA
PAIRS = [
    "EUR_USD", 
    "USD_JPY", 
    "GBP_USD", 
    "USD_CHF", 
    "AUD_USD", 
    "USD_CAD", 
    "NZD_USD"
]

# Number of candles: ~5 years of 1-hour data (5 x 8,760)
NUM_CANDLES = 43800
TIMEFRAME = "H1"

# Ensure raw data directory exists
import os
os.makedirs("data/raw", exist_ok=True)

print(f"Collecting {NUM_CANDLES} candles per pair ({TIMEFRAME})")
print("Pairs:", PAIRS)


Collecting 43800 candles per pair (H1)
Pairs: ['EUR_USD', 'USD_JPY', 'GBP_USD', 'USD_CHF', 'AUD_USD', 'USD_CAD', 'NZD_USD']
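The resampling step mentioned above can be sketched with pandas. This is an illustration under assumptions, not the project's actual implementation: the function name `resample_ohlcv` is hypothetical, the column names follow the CSV layout used in this notebook, and bins are aligned to UTC clock boundaries, which may differ from the daily alignment OANDA uses for its own D and W candles.

```python
import pandas as pd

# How each column is combined when several 1H candles are merged into one.
OHLCV_AGG = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}


def resample_ohlcv(df_1h: pd.DataFrame, rule) -> pd.DataFrame:
    """Aggregate 1H candles into a coarser timeframe given by a pandas resample rule."""
    out = (
        df_1h.set_index("timestamp")
        .resample(rule)
        .agg(OHLCV_AGG)
        .dropna(subset=["open"])  # drop empty bins such as weekends
    )
    return out.reset_index()


# Tiny synthetic example: 8 hourly candles starting at midnight UTC.
demo = pd.DataFrame(
    {
        "timestamp": pd.date_range("2024-01-01", periods=8, freq=pd.Timedelta(hours=1), tz="UTC"),
        "open": [1.10, 1.11, 1.12, 1.13, 1.14, 1.15, 1.16, 1.17],
        "high": [1.12, 1.13, 1.14, 1.15, 1.16, 1.17, 1.18, 1.19],
        "low": [1.09, 1.10, 1.11, 1.12, 1.13, 1.14, 1.15, 1.16],
        "close": [1.11, 1.12, 1.13, 1.14, 1.15, 1.16, 1.17, 1.18],
        "volume": [100] * 8,
    }
)

candles_4h = resample_ohlcv(demo, pd.Timedelta(hours=4))
print(candles_4h)
```

The same function would take `"D"` for daily and `"W"` for weekly candles. Deriving every higher timeframe from one 1H series keeps the candles mutually consistent, which is the motivation given in the key points.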


## Test Data Fetch for One Pair

Before fetching data for all 7 pairs,  
we will **test the `fetch_live_data()` function** for a single pair (EUR/USD):

- Fetch **`NUM_CANDLES` (43,800) 1-hour candles**  
- Preview the first few rows to confirm:
  - Columns: `timestamp, open, high, low, close, volume`  
  - Correct number of rows fetched


In [8]:
from src.fetch_data import fetch_live_data

# Fetch sample data
sample_df = fetch_live_data("EUR_USD", candles=NUM_CANDLES, timeframe=TIMEFRAME)

# Display first 5 rows
sample_df.head()


|   | timestamp | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| 0 | 2018-07-19 08:00:00+00:00 | 1.1608 | 1.16186 | 1.16041 | 1.16128 | 6352 |
| 1 | 2018-07-19 09:00:00+00:00 | 1.16133 | 1.1616 | 1.15951 | 1.16062 | 4379 |
| 2 | 2018-07-19 10:00:00+00:00 | 1.1606 | 1.16142 | 1.15954 | 1.15956 | 3570 |
| 3 | 2018-07-19 11:00:00+00:00 | 1.1596 | 1.16006 | 1.15858 | 1.16004 | 4001 |
| 4 | 2018-07-19 12:00:00+00:00 | 1.16006 | 1.1601 | 1.15748 | 1.15864 | 4080 |


---

## Fetch 1H Historical OHLC Data from OANDA

**Objective:**  
Fetch 1-hour historical OHLC data for our selected currency pairs from OANDA.  
We will fetch **multiple years of data** by paginating requests because OANDA  
limits the number of candles per API call (max 5000).

**Key Points:**
- We use the `instruments.InstrumentsCandles` endpoint.
- Data is fetched in batches (pagination) until the requested number of candles (`NUM_CANDLES`) has been collected.
- Data is saved as CSV in `data/raw/` for each currency pair.
- Later, we will **resample** 1H data into 4H, Daily, and Weekly for multi-timeframe analysis.
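To make the pagination arithmetic concrete, the sketch below splits a target candle count into request sizes that respect the 5,000-candle cap. The helper `plan_batches` is hypothetical; how `fetch_live_data()` in `src/fetch_data.py` actually pages through history is not shown in this notebook. One common approach, assumed here, is to request each batch with a `count` and move the `to` timestamp back to the oldest candle already received.

```python
MAX_CANDLES_PER_REQUEST = 5000  # per-request cap stated above


def plan_batches(total: int, batch_size: int = MAX_CANDLES_PER_REQUEST) -> list[int]:
    """Split a total candle count into per-request counts no larger than batch_size."""
    if total < 0 or batch_size <= 0:
        raise ValueError("total must be >= 0 and batch_size must be > 0")
    full, rest = divmod(total, batch_size)
    # Full-size batches first, then one smaller batch for the remainder (if any).
    return [batch_size] * full + ([rest] if rest else [])


batches = plan_batches(43800)
print(len(batches), batches[-1])  # prints "9 3800"
```

For 43,800 candles this gives eight full requests plus one of 3,800, so each pair needs nine API calls.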


In [9]:
import time

failed_pairs = []

for pair in PAIRS:
    print(f"Fetching data for {pair} ...")
    
    try:
        df = fetch_live_data(pair, candles=NUM_CANDLES, timeframe=TIMEFRAME)
        
        if not df.empty:
            save_path = f"data/raw/{pair}_1H.csv"
            df.to_csv(save_path, index=False)
            print(f"Saved {len(df)} rows to {save_path}\n")
        else:
            print(f"No data fetched for {pair}\n")
            failed_pairs.append(pair)
            
    except Exception as e:
        print(f"Error fetching {pair}: {e}\n")
        failed_pairs.append(pair)
    
    # Pause to avoid hitting API rate limits
    time.sleep(3)

print("\n--- Bulk Fetch Completed ---")
if failed_pairs:
    print("⚠ The following pairs failed and need retrying:", failed_pairs)
else:
    print("All pairs fetched successfully!")


Fetching data for EUR_USD ...
Saved 43800 rows to data/raw/EUR_USD_1H.csv

Fetching data for USD_JPY ...
Saved 43800 rows to data/raw/USD_JPY_1H.csv

Fetching data for GBP_USD ...
Saved 43800 rows to data/raw/GBP_USD_1H.csv

Fetching data for USD_CHF ...
Saved 43800 rows to data/raw/USD_CHF_1H.csv

Fetching data for AUD_USD ...
Saved 43800 rows to data/raw/AUD_USD_1H.csv

Fetching data for USD_CAD ...
Saved 43800 rows to data/raw/USD_CAD_1H.csv

Fetching data for NZD_USD ...
Saved 43800 rows to data/raw/NZD_USD_1H.csv


--- Bulk Fetch Completed ---
All pairs fetched successfully!
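Task 4 in the overview calls for verifying the data before feature engineering. A minimal sketch of such a check is given below. The function name `check_ohlcv` and the specific rules are assumptions for illustration; they cover column presence, duplicate or unsorted timestamps, missing values, and basic high/low consistency.

```python
import pandas as pd

EXPECTED_COLUMNS = ["timestamp", "open", "high", "low", "close", "volume"]


def check_ohlcv(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable problems found in an OHLCV frame (empty if none)."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    ts = pd.to_datetime(df["timestamp"], utc=True)
    if ts.duplicated().any():
        problems.append("duplicate timestamps")
    if not ts.is_monotonic_increasing:
        problems.append("timestamps not sorted ascending")
    if df[EXPECTED_COLUMNS].isna().any().any():
        problems.append("missing values")
    # The high must be the largest price in the candle and the low the smallest.
    if (df["high"] < df[["open", "close", "low"]].max(axis=1)).any():
        problems.append("high is not the row maximum")
    if (df["low"] > df[["open", "close", "high"]].min(axis=1)).any():
        problems.append("low is not the row minimum")
    return problems


# Intended usage on a saved file:
# print(check_ohlcv(pd.read_csv("data/raw/EUR_USD_1H.csv")))
```

An empty list means none of these checks failed. It does not detect gaps in the hourly series, which are expected around weekends.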


---

# Push files to Repo

### 1. Check current git status

In [10]:
!git status


On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	deleted:    data/raw/AUD_USD_forex_data.csv
	deleted:    data/raw/EUR_USD_forex_data.csv
	deleted:    data/raw/GBP_USD_forex_data.csv
	deleted:    data/raw/NZD_USD_forex_data.csv
	deleted:    data/raw/USD_CAD_forex_data.csv
	deleted:    data/raw/USD_CHF_forex_data.csv
	deleted:    data/raw/USD_JPY_forex_data.csv
	modified:   jupyter_notebooks/01_data_collection.ipynb
	modified:   jupyter_notebooks/02_data_cleaning.ipynb

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	data/raw/AUD_USD_1H.csv
	data/raw/EUR_USD_1H.csv
	data/raw/GBP_USD_1H.csv
	data/raw/NZD_USD_1H.csv
	data/raw/USD_CAD_1H.csv
	data/raw/USD_CHF_1H.

### 2. Stage all new/updated files

In [None]:
!git add .

### 3. Commit with a descriptive message

In [None]:
!git commit -m "Add raw forex OHLCV data for 7 pairs: data collection notebook run and fetch function coded in fetch_data.py"