In [1]:
import requests
import time
from datetime import datetime, timezone
import json
import os
from kafka import KafkaProducer

Defines the parameters for the bootstrap fetch: the symbols to request, the candle interval, and the number of candles per symbol. Also sets the Kafka topic and bootstrap server, and the Binance REST endpoint for klines.

In [2]:
SYMBOLS = ["BTCUSDT"]
INTERVAL = "30m"
LIMIT = 21
KAFKA_TOPIC = "binance_kline"
KAFKA_BOOTSTRAP_SERVERS = "localhost:9092"
BINANCE_API_URL = "https://api.binance.com/api/v3/klines"

Retrieves candlestick data from Binance's public REST API. Sends a GET request with symbol, interval, and limit parameters to fetch the most recent candles. The API returns each kline as a raw array, which is parsed into a structured dictionary containing OHLCV fields (open, high, low, close, volume), trading metrics (quote volume, number_of_trades, taker buy volumes), and metadata (symbol, interval, ingestion timestamp). If the request or the parsing fails, the error is printed and an empty list is returned.

In [3]:
def fetch_historical_candles(symbol, interval="30m", limit=21):
    params = {
        "symbol": symbol,
        "interval": interval,
        "limit": limit
    }
    
    try:
        # A timeout keeps the cell from hanging indefinitely if the API does not respond
        response = requests.get(BINANCE_API_URL, params=params, timeout=10)
        response.raise_for_status()
        
        data = response.json()
        candles = []
        
        for kline in data:
            candle = {
                "open_time": int(kline[0]), 
                "open": float(kline[1]),
                "high": float(kline[2]),
                "low": float(kline[3]),
                "close": float(kline[4]),
                "volume": float(kline[5]),
                "quote_asset_volume": float(kline[7]),
                "number_of_trades": int(kline[8]),
                "taker_buy_base_asset_volume": float(kline[9]),
                "taker_buy_quote_asset_volume": float(kline[10]),
                "symbol": symbol,
                "interval": interval,
                "ingested_at": datetime.now(timezone.utc).isoformat()
            }
            candles.append(candle)
        
        return candles
        
    except Exception as e:
        print(f"Error fetching candles for {symbol}: {e}")
        return []
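
To make the index mapping above concrete, here is a minimal sketch that parses a single kline row. KLINE_FIELDS and parse_kline are illustrative names, not part of the notebook, and the sample row holds made-up values in the 12-element shape the endpoint returns (prices and volumes as strings). Index 6 (close time) and index 11 (unused) are skipped, as in the function above.

```python
# Illustrative sketch: map one raw kline array to named fields.
# Each entry is (array index, field name, converter).
# Indices 6 (close time) and 11 (unused) are intentionally left out.
KLINE_FIELDS = [
    (0, "open_time", int),
    (1, "open", float),
    (2, "high", float),
    (3, "low", float),
    (4, "close", float),
    (5, "volume", float),
    (7, "quote_asset_volume", float),
    (8, "number_of_trades", int),
    (9, "taker_buy_base_asset_volume", float),
    (10, "taker_buy_quote_asset_volume", float),
]


def parse_kline(kline):
    """Convert one raw kline array into a dictionary of typed fields."""
    return {name: convert(kline[idx]) for idx, name, convert in KLINE_FIELDS}


# Made-up sample row, for illustration only (not real market data)
sample = [
    1700001000000, "37000.10", "37100.00", "36950.50", "37050.25",
    "123.456", 1700002799999, "4570000.00", 2500, "60.5", "2240000.00", "0",
]

candle = parse_kline(sample)
print(candle["open_time"], candle["close"], candle["number_of_trades"])
```

Keeping the mapping in a table makes it easier to see which positions are used and which are ignored than a series of hard-coded index lookups.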

Publishes the fetched candles to the Kafka topic for consumption by the inference system. Iterates through the candle list and sends each candle as a separate message to the configured topic. Prints a confirmation line for each candle showing the symbol, open time, and closing price for verification.

In [4]:
def send_to_kafka(candles, producer):
    for candle in candles:
        producer.send(KAFKA_TOPIC, value=candle)
        print(f"Sent: {candle['symbol']} @ {candle['open_time']} | Close: {candle['close']:.2f}")
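
Note that send() on the producer is asynchronous: it returns a future immediately, so the "Sent" line above is printed before the broker has acknowledged anything. If per-message confirmation matters, one option is to block on each future. The sketch below assumes the kafka-python behaviour in which send() returns a future whose get(timeout=...) either returns record metadata (with partition and offset attributes) or raises on failure. send_and_confirm is a hypothetical helper, not the notebook's code.

```python
def send_and_confirm(candles, producer, topic, timeout=10):
    """Send each candle and wait for the broker acknowledgement.

    Returns the number of candles that were acknowledged. `producer` is
    assumed to be a kafka-python KafkaProducer, or any object whose send()
    returns a future with a get(timeout=...) method.
    """
    confirmed = 0
    for candle in candles:
        future = producer.send(topic, value=candle)
        try:
            # Blocks until the send is acknowledged, fails, or times out
            metadata = future.get(timeout=timeout)
        except Exception as e:
            print(f"Delivery failed for {candle['symbol']} @ {candle['open_time']}: {e}")
            continue
        confirmed += 1
        print(f"Acked: partition={metadata.partition} offset={metadata.offset}")
    return confirmed
```

Blocking on every message trades throughput for certainty. For a bootstrap of 21 candles that cost is negligible, but for larger volumes callbacks on the future may be a better fit.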

Saves fetched candles to a local JSON file as a backup.

In [5]:
def save_to_json(candles, filename):
    with open(filename, 'w') as f:
        json.dump(candles, f, indent=2)
    print(f"Saved to {filename}")
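
Because the JSON file is the only copy when Kafka is unavailable, it can be worth reading it back and checking that the candles are contiguous. The following is a minimal sketch under the assumption that consecutive 30-minute candles have open_time values exactly 1,800,000 ms apart. find_gaps, load_and_check, and INTERVAL_MS are hypothetical names introduced for illustration.

```python
import json

INTERVAL_MS = 30 * 60 * 1000  # assumed spacing between candles for the "30m" interval


def find_gaps(candles, interval_ms=INTERVAL_MS):
    """Return (previous_open_time, next_open_time) pairs that are not one interval apart."""
    times = sorted(c["open_time"] for c in candles)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a != interval_ms]


def load_and_check(filename):
    """Load a bootstrap JSON file and report any gaps between its candles."""
    with open(filename) as f:
        candles = json.load(f)
    return candles, find_gaps(candles)
```

An empty gap list means every candle follows the previous one by exactly one interval. A non-empty list gives the boundaries of each missing stretch.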

Runs the bootstrap process, with optional Kafka publishing and JSON storage. Initializes a Kafka producer with a JSON value serializer and 'acks=all', so that a send is acknowledged only once all in-sync replicas have received the record. If the Kafka connection fails, falls back to JSON-only mode. Iterates through all configured symbols, fetching historical candles for each. Successfully retrieved data is stored in a dictionary, published to Kafka if a producer is available, and saved to a local JSON file in the data/bootstrap/ directory. A 0.5-second delay between symbols helps avoid API rate limiting. After processing all symbols, flushes any pending Kafka messages and closes the producer so that buffered data is sent. Prints a summary showing how many symbols were successfully bootstrapped and the total number of candles retrieved.

In [6]:
use_kafka = True
save_json = True

producer = None
if use_kafka:
    try:
        producer = KafkaProducer(
            bootstrap_servers=KAFKA_BOOTSTRAP_SERVERS,
            value_serializer=lambda v: json.dumps(v).encode('utf-8'),
            acks='all'
        )
        print("Connected to Kafka")
    except Exception as e:
        print(f"Kafka connection failed: {e}")
        print("Falling back to JSON-only mode")
        use_kafka = False

print()

all_candles = {}

for symbol in SYMBOLS:
    print(f"Fetching {symbol}")
    candles = fetch_historical_candles(symbol, INTERVAL, LIMIT)
    
    if candles:
        all_candles[symbol] = candles
        
        if use_kafka and producer:
            send_to_kafka(candles, producer)
        
        if save_json:
            os.makedirs("data/bootstrap", exist_ok=True)
            filename = f"data/bootstrap/{symbol}_{INTERVAL}_bootstrap.json"
            save_to_json(candles, filename)
        
        print()
        time.sleep(0.5)
    else:
        print(f"Failed to fetch candles for {symbol}")
        print()

if producer:
    producer.flush()
    producer.close()
    print("Producer closed")

print("Bootstrap complete")
print(f"Symbols bootstrapped: {len(all_candles)}/{len(SYMBOLS)}")
print(f"Total candles: {sum(len(c) for c in all_candles.values())}")

Kafka connection failed: NoBrokersAvailable
Falling back to JSON-only mode

Fetching BTCUSDT
Saved to data/bootstrap/BTCUSDT_30m_bootstrap.json

Bootstrap complete
Symbols bootstrapped: 1/1
Total candles: 21
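
In the run above the broker was unreachable, so the candles exist only in the JSON backup. One way to publish them later is to replay the file once Kafka is available. The sketch below is an assumption about how that could look, not part of the notebook. replay_backup is a hypothetical helper, and the producer is expected to be configured with the same JSON value serializer as in cell 6.

```python
import json


def replay_backup(filename, producer, topic):
    """Re-send every candle stored in a bootstrap JSON file; return the count sent."""
    with open(filename) as f:
        candles = json.load(f)
    for candle in candles:
        producer.send(topic, value=candle)
    producer.flush()  # block until buffered messages have been sent
    return len(candles)
```

With a reachable broker this could be called as replay_backup("data/bootstrap/BTCUSDT_30m_bootstrap.json", producer, KAFKA_TOPIC). Replaying does not deduplicate, so candles that were already published would be sent again.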
