### 01-Data acquisition
- Conectar a una API (Binance) y descargar los trades históricos (btc/usdt)
- Conceptos: 
    - Diferencias entre datos OHLCV vs. raw trades.
    - Consideraciones de tiempo (UTC vs local) 
- Output: dataset en formato CSV, con columnas míminas (timestamp, price, quantity)

In [1]:
# librerias 

import pandas as pd
import requests 
import time 
from datetime import datetime, timedelta
from coinbase.rest import RESTClient

In [2]:
symbol = "BTCUSDT"          # Par a descargar
interval = "1m"             # Intervalo (1m, 5m, 15m, 1h, 1d, etc.)
limit = 1000                # Máximo por request (Binance API limit)
start_date = "2024-12-01"   # Fecha inicial
end_date = "2025-01-01"     # Fecha final

In [3]:
# Convertir fechas a milisegundos
def date_to_millis(date_str):
    return int(datetime.strptime(date_str, "%Y-%m-%d").timestamp() * 1000)

start_ts = date_to_millis(start_date)
end_ts = date_to_millis(end_date)

In [5]:
# función para descargar las velas 

def get_klines(symbol, interval, start_ts, end_ts, limit=1000):
    url = "https://api.binance.com/api/v3/klines"
    all_klines = []
    
    while start_ts < end_ts:
        params = {
            "symbol": symbol,
            "interval": interval,
            "startTime": start_ts,
            "endTime": end_ts,
            "limit": limit
        }
        
        response = requests.get(url, params=params)
        data = response.json()
        
        if not data:
            break
        
        all_klines.extend(data)
        
        # Actualizar el start_ts para la siguiente iteración
        start_ts = data[-1][0] + 1  # +1 para evitar duplicados
        
        # Esperar para respetar el rate limit
        time.sleep(0.5)
    
    return all_klines

In [7]:
# descargar y conversión a dataframe
print("Descargando datos...")

data = get_klines(symbol, interval, start_ts, end_ts, limit)
df = pd.DataFrame(data, columns=[
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "quote_volume", "trades",
    "taker_buy_base", "taker_buy_quote", "ignore"
])

# conversión de tipos 
df["open_time"] = pd.to_datetime(df["open_time"], unit='ms')
df["close_time"] = pd.to_datetime(df["close_time"], unit='ms')
numeric_cols = ["open", "high", "low", "close", "volume"]
df[numeric_cols] = df[numeric_cols].astype(float)

print("Datos descargados y procesados.")
print(df.head())

# guardar a CSV
# definir directorio
output_dir = "../data/raw/"
output_file = f"{output_dir}{symbol}_{interval}_{start_date}_to_{end_date}.csv"
df.to_csv(output_file, index=False)



Descargando datos...
Datos descargados y procesados.
            open_time      open      high       low     close    volume  \
0 2024-12-01 05:00:00  96515.51  96515.51  96473.18  96473.18   8.03059   
1 2024-12-01 05:01:00  96473.19  96473.19  96464.16  96464.97   2.51941   
2 2024-12-01 05:02:00  96464.97  96509.99  96425.70  96509.99  40.57111   
3 2024-12-01 05:03:00  96509.99  96510.00  96476.00  96480.00   6.33996   
4 2024-12-01 05:04:00  96480.01  96480.01  96472.00  96472.00   2.02027   

               close_time      quote_volume  trades taker_buy_base  \
0 2024-12-01 05:00:59.999   774973.50276870    1842     2.96502000   
1 2024-12-01 05:01:59.999   243038.60539120    1044     0.82286000   
2 2024-12-01 05:02:59.999  3913119.03978170    4030     9.65762000   
3 2024-12-01 05:03:59.999   611806.98451080    2520     0.81025000   
4 2024-12-01 05:04:59.999   194911.89910400     603     0.10671000   

   taker_buy_quote ignore  
0  286150.35448990      0  
1   79377.04405080 