# ETL Pipeline: Download Historical Candle Data

## 📊 Overview
This notebook implements an ETL (Extract, Transform, Load) pipeline for downloading historical OHLCV (Open, High, Low, Close, Volume) candle data from cryptocurrency exchanges. It handles rate limiting, batch processing, and data caching for efficient data collection.

## 🎯 Objectives
1. **Data Extraction**: Download historical price data from exchange APIs
2. **Batch Processing**: Handle multiple trading pairs and timeframes efficiently
3. **Rate Limit Management**: Respect exchange API limits to avoid throttling
4. **Data Caching**: Store downloaded data locally for future use
5. **Visualization**: Preview downloaded data with interactive charts
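The caching objective amounts to a "load if present, otherwise download and save" pattern. The notebook calls `clob.dump_candles_cache()` to write its cache. The sketch below only illustrates the general idea. It is not that library's file format, and `cache_path`, `load_or_fetch` and the CSV naming scheme are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable

import pandas as pd


def cache_path(cache_dir: Path, connector: str, pair: str, interval: str) -> Path:
    # Hypothetical layout: one CSV file per connector / trading pair / interval.
    return cache_dir / f"{connector}__{pair}__{interval}.csv"


def load_or_fetch(
    cache_dir: Path,
    connector: str,
    pair: str,
    interval: str,
    fetch: Callable[[str, str], pd.DataFrame],
) -> pd.DataFrame:
    path = cache_path(cache_dir, connector, pair, interval)
    if path.exists():
        # Cache hit: skip the network entirely.
        return pd.read_csv(path)
    # Cache miss: download once, then persist for the next run.
    df = fetch(pair, interval)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return df


# Usage with a stand-in fetcher that records how often it is called.
calls = []


def fake_fetch(pair: str, interval: str) -> pd.DataFrame:
    calls.append((pair, interval))
    return pd.DataFrame({"timestamp": [0, 60], "close": [1.0, 2.0]})


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_fetch(Path(tmp), "binance_perpetual", "BTC-USDT", "1m", fake_fetch)
    second = load_or_fetch(Path(tmp), "binance_perpetual", "BTC-USDT", "1m", fake_fetch)
```

The second call is served from disk, so the fetcher runs only once.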

## 📋 Prerequisites
- Exchange API access (e.g., Binance Perpetual; if you are in the US, use OKX instead)
- Network connection for API calls
- Sufficient disk space for data storage (~100MB per pair/interval)

## ⚠️ Important Notes on Rate Limiting
Currently, Hummingbot candles don't use a shared rate limit (this is planned for future updates). Therefore:
- **Batch Size**: Keep `BATCH_CANDLES_REQUEST` low (2-3) to avoid hitting limits
- **Sleep Time**: Increase `SLEEP_REQUEST` (10+ seconds) for safety
- **Monitor**: Watch for 429 (Too Many Requests) errors and adjust accordingly
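The two settings above describe a "batch, then sleep" loop. The minimal asyncio sketch below shows that pattern, with an exponential backoff on throttled requests. It is not Hummingbot's implementation. `fetch`, `RateLimitError` and `fetch_in_batches` are hypothetical names, and real code would map an HTTP 429 response to `RateLimitError` itself.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when the exchange answers HTTP 429."""


async def with_backoff(call: Callable[[], Awaitable[T]], retries: int = 3, base_delay: float = 10.0) -> T:
    # Retry only on rate-limit errors, doubling the wait after each failure.
    for attempt in range(retries + 1):
        try:
            return await call()
        except RateLimitError:
            if attempt == retries:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


async def fetch_in_batches(
    pairs: List[str],
    fetch: Callable[[str], Awaitable[T]],
    batch_size: int,
    sleep_s: float,
    base_delay: float = 10.0,
) -> Dict[str, T]:
    results: Dict[str, T] = {}
    for start in range(0, len(pairs), batch_size):
        batch = pairs[start:start + batch_size]
        # Pairs inside one batch are requested concurrently.
        batch_results = await asyncio.gather(
            *(with_backoff(lambda p=p: fetch(p), base_delay=base_delay) for p in batch)
        )
        results.update(zip(batch, batch_results))
        # Pause between batches (not after the last one) to stay under the limit.
        if start + batch_size < len(pairs):
            await asyncio.sleep(sleep_s)
    return results


# Usage: a fake fetcher that is throttled once for ETH-USDT.
attempts: Dict[str, int] = {}


async def fake_fetch(pair: str) -> str:
    attempts[pair] = attempts.get(pair, 0) + 1
    if pair == "ETH-USDT" and attempts[pair] == 1:
        raise RateLimitError("429 Too Many Requests")
    return f"candles for {pair}"


result = asyncio.run(
    fetch_in_batches(["BTC-USDT", "ETH-USDT", "SOL-USDT"], fake_fetch, batch_size=2, sleep_s=0, base_delay=0)
)
```

Inside a notebook, which already runs an event loop, you would `await fetch_in_batches(...)` directly instead of calling `asyncio.run`.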

## 📈 Expected Outputs
- Cached candle data in `app/data/cache/candles/`
- Multiple timeframe data (1m, 15m, etc.)
- Interactive candlestick charts for data validation
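The notebook requests each interval in `INTERVALS` from the exchange. As an alternative, coarser candles can in principle be derived locally from 1m data: take the first open, max high, min low, last close and summed volume per bucket. The minimal pandas sketch below assumes columns named `timestamp` (Unix seconds), `open`, `high`, `low`, `close` and `volume`. Check these against the actual DataFrame. `resample_ohlcv` is a hypothetical helper.

```python
import pandas as pd


def resample_ohlcv(df: pd.DataFrame, rule: str = "15min") -> pd.DataFrame:
    # Index by datetime so pandas can bucket rows into fixed time windows.
    indexed = df.set_index(pd.to_datetime(df["timestamp"], unit="s").rename("datetime"))
    agg = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
    out = indexed[list(agg)].resample(rule, label="left", closed="left").agg(agg)
    # Buckets with no source candles (exchange gaps) have NaN open; drop them.
    return out.dropna(subset=["open"])


# Usage: 30 one-minute candles starting on a 15-minute boundary.
minute = pd.DataFrame({
    "timestamp": [1_700_000_100 + 60 * i for i in range(30)],
    "open": [float(i) for i in range(30)],
    "high": [float(i) + 0.5 for i in range(30)],
    "low": [float(i) - 0.5 for i in range(30)],
    "close": [float(i) + 0.25 for i in range(30)],
    "volume": [1.0] * 30,
})
fifteen = resample_ohlcv(minute, "15min")  # two 15m candles
```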

In [10]:
from core.data_sources.clob import CLOBDataSource
import warnings

warnings.filterwarnings("ignore")


# Main class to access central limit order book connectors
clob = CLOBDataSource()

# Candles config
CONNECTOR_NAME = "binance_perpetual"
INTERVALS = ["1m"]

DAYS = 360  # Number of days of historical data to download

# Rate limits config
BATCH_CANDLES_REQUEST = 1  # Number of trading pairs to request in each batch
SLEEP_REQUEST = 10  # Seconds to wait between batches


In [None]:
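# Collect every trading pair listed on the connector
trading_rules = await clob.get_trading_rules(CONNECTOR_NAME)
trading_pairs = trading_rules.get_all_trading_pairs()

# Download DAYS of candles per interval, BATCH_CANDLES_REQUEST pairs per batch,
# waiting SLEEP_REQUEST seconds between batches
all_candles = {
    interval: await clob.get_candles_batch_last_days(
        CONNECTOR_NAME, trading_pairs, interval, DAYS, BATCH_CANDLES_REQUEST, SLEEP_REQUEST
    )
    for interval in INTERVALS
}
# Write the downloaded candles to the local cache
clob.dump_candles_cache()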

Batch 1/526
Start: 0, End: 1
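The progress line reports 526 batches. With `BATCH_CANDLES_REQUEST = 1`, that means 526 trading pairs. The rough sketch below estimates what the configuration implies. It assumes one `SLEEP_REQUEST` pause per batch and a full `DAYS` of history for every pair, although recently listed pairs will have fewer candles. It ignores request latency and any per-request candle cap, so real runs take longer. The helper names are hypothetical.

```python
import math

# Minutes per interval; a small illustrative subset, extend as needed.
INTERVAL_MINUTES = {"1m": 1, "5m": 5, "15m": 15, "1h": 60, "1d": 1440}


def candles_per_pair(days: int, interval: str) -> int:
    """Upper bound: assumes the pair traded for the whole window."""
    return days * 24 * 60 // INTERVAL_MINUTES[interval]


def sleep_seconds(n_pairs: int, batch_size: int, sleep_s: float) -> float:
    """Time spent sleeping, assuming one pause per batch."""
    return math.ceil(n_pairs / batch_size) * sleep_s


per_pair = candles_per_pair(360, "1m")  # 518,400 candles
wait = sleep_seconds(526, 1, 10)  # 5,260 s
print(f"{per_pair:,} candles per pair, at least {wait / 60:.0f} min of sleeping")
```

For the settings above, the run spends roughly an hour and a half just sleeping per interval. This is why raising `BATCH_CANDLES_REQUEST` is tempting but risky.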


In [None]:
from core.data_structures.candles import Candles

# Display info about downloaded candles
print(f"Downloaded data for {len(all_candles[INTERVALS[0]])} trading pairs")
print("\nTrading pairs downloaded:")
for i, candles_obj in enumerate(all_candles[INTERVALS[0]]):
    print(f"{i+1}. {candles_obj.trading_pair}: {len(candles_obj.data)} candles")

# Use the first downloaded trading pair as an example (e.g., BTC-USDT)
candles: Candles = all_candles[INTERVALS[0]][0]
print(f"\nExample: Showing {candles.trading_pair} data")
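Before plotting, it can help to check for missing candles, which show up as larger-than-expected steps between consecutive timestamps. The minimal sketch below assumes `candles.data` is a pandas DataFrame with a `timestamp` column in Unix seconds. Verify this against the actual columns first. `find_gaps` is a hypothetical helper, not part of `core`.

```python
import pandas as pd


def find_gaps(df: pd.DataFrame, interval_seconds: int) -> pd.DataFrame:
    """List runs of missing candles, assuming a 'timestamp' column in Unix seconds."""
    ts = df["timestamp"].sort_values().reset_index(drop=True)
    step = ts.diff()  # seconds between consecutive candles (NaN for the first row)
    mask = step > interval_seconds  # a larger step means candles are missing
    return pd.DataFrame(
        {
            "gap_start": ts.shift(1)[mask],  # last candle before the gap
            "gap_end": ts[mask],  # first candle after the gap
            "missing_candles": (step[mask] // interval_seconds - 1).astype(int),
        }
    ).reset_index(drop=True)


# Example: 1m candles with the 180 s and 240 s rows missing.
sample = pd.DataFrame({"timestamp": [0, 60, 120, 300, 360]})
gaps = find_gaps(sample, 60)
print(gaps)
```

Under those assumptions, `find_gaps(candles.data, 60)` would list gaps in the 1m example pair.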

In [None]:
candles.plot()