# Notebook: Loading ETFs into PostgreSQL
This notebook walks through every ETF CSV file in `stocks/`, upserts each ticker into the `stocks` table (setting `shortName` to the ticker prefixed with `(ETF)`), and inserts the historical price data into `stock_prices`.
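
To make the expected input concrete, here is a minimal sketch of how a single CSV row can be parsed, assuming each row carries a `Ticker` value and a `Historical` JSON string whose entries use keys such as `Date_` and `Open_<ticker>` (as the loader code reads them). The function name `parse_history`, the ticker `XYZ.TO`, and the sample values are hypothetical and only serve as an illustration; the `BASE.TO` fallback used by the loader is left out for brevity.

```python
import json


def parse_history(ticker, historical_json):
    """Return (start_date, end_date, rows) for one CSV row.

    rows is a list of (ticker, date, open, high, low, close, volume) tuples.
    """
    hist = json.loads(historical_json)
    rows = []
    for entry in hist:
        day = entry["Date_"][:10]  # keep only the YYYY-MM-DD part
        rows.append((
            ticker,
            day,
            entry.get(f"Open_{ticker}"),
            entry.get(f"High_{ticker}"),
            entry.get(f"Low_{ticker}"),
            entry.get(f"Close_{ticker}"),
            entry.get(f"Volume_{ticker}"),
        ))
    if not rows:
        # An empty history has no date range
        return None, None, []
    dates = [r[1] for r in rows]
    # ISO dates sort correctly as plain strings
    return min(dates), max(dates), rows


# Hypothetical sample data, not taken from the real CSV
sample = json.dumps([
    {"Date_": "2025-03-19T00:00:00", "Open_XYZ.TO": 10.0, "High_XYZ.TO": 10.5,
     "Low_XYZ.TO": 9.8, "Close_XYZ.TO": 10.2, "Volume_XYZ.TO": 1200},
    {"Date_": "2025-03-18T00:00:00", "Open_XYZ.TO": 9.9, "High_XYZ.TO": 10.1,
     "Low_XYZ.TO": 9.7, "Close_XYZ.TO": 10.0, "Volume_XYZ.TO": 900},
])
start, end, rows = parse_history("XYZ.TO", sample)
```

Because the dates are ISO formatted, taking `min` and `max` over the truncated strings yields the date range without parsing them into `date` objects.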

In [4]:
#!/usr/bin/env python3
"""
Notebook: etf_loader_all.ipynb
Walks through every ETF CSV file in stocks/,
parses the historical JSON and inserts into stocks and stock_prices.
Sets shortName to "(ETF) <ticker>" for these ETFs.
Replaces volumes outside the BIGINT range with NULL.
Dependencies: pip install pandas psycopg2-binary
"""

import os
import glob
import json
import pandas as pd
import psycopg2
from psycopg2.extras import execute_values

# DB configuration
DB_HOST     = os.getenv("DB_HOST", "db")
DB_PORT     = os.getenv("DB_PORT", "5432")
DB_NAME     = os.getenv("DB_NAME", "stocks_db")
DB_USER     = os.getenv("DB_USER", "postgres")
DB_PASSWORD = os.getenv("DB_PASSWORD", "postgres")

MAX_BIGINT = 2**63 - 1

# Connection
conn = psycopg2.connect(host=DB_HOST, port=DB_PORT, dbname=DB_NAME, user=DB_USER, password=DB_PASSWORD)
conn.autocommit = True
cur = conn.cursor()

# CSV files
csv_paths = glob.glob("stocks/etf_historical_data_*.csv")
print("Files found:", csv_paths)

all_records = []
all_tickers = set()

for path in csv_paths:
    print("Processing", path)
    df = pd.read_csv(path)
    for _, row in df.iterrows():
        ticker = row['Ticker']
        all_tickers.add(ticker)
        hist = json.loads(row['Historical'])
        if not hist:
            # Empty history: min()/max() below would raise on an empty list
            continue
        dates = [e['Date_'][:10] for e in hist]
        start_date = min(dates)
        end_date   = max(dates)
        # Upsert into stocks, storing "(ETF) <ticker>" in shortName
        cur.execute(
            """INSERT INTO stocks (ticker, shortName, historical_start, historical_end)
               VALUES (%s, %s, %s, %s)
               ON CONFLICT (ticker) DO UPDATE
                 SET shortName = EXCLUDED.shortName,
                     historical_start = LEAST(stocks.historical_start, EXCLUDED.historical_start),
                     historical_end   = GREATEST(stocks.historical_end,   EXCLUDED.historical_end);
            """, (ticker, f"(ETF) {ticker}", start_date, end_date)
        )
        for e in hist:
            date_ = e['Date_'][:10]
            # Fall back to the BASE.TO columns when the ticker-specific key is missing
            open_ = e.get(f'Open_{ticker}', e.get('Open_BASE.TO'))
            high  = e.get(f'High_{ticker}', e.get('High_BASE.TO'))
            low   = e.get(f'Low_{ticker}',  e.get('Low_BASE.TO'))
            close = e.get(f'Close_{ticker}',e.get('Close_BASE.TO'))
            vol   = e.get(f'Volume_{ticker}',e.get('Volume_BASE.TO'))
            try:
                vol_i = int(vol) if vol is not None else None
                # Store NULL when the volume does not fit in a BIGINT column
                if vol_i is not None and vol_i > MAX_BIGINT:
                    vol_i = None
            except (TypeError, ValueError, OverflowError):
                # Non-numeric, NaN or infinite volume
                vol_i = None
            all_records.append((ticker, date_, open_, high, low, close, vol_i))

print(f"Unique tickers: {len(all_tickers)}, records: {len(all_records)}")

# Bulk insert
insert_q = """INSERT INTO stock_prices (ticker, date, open, high, low, close, volume)
               VALUES %s ON CONFLICT DO NOTHING;"""
execute_values(cur, insert_q, all_records)
print("Insert finished")

# Verification (= ANY with a list also works when no ticker was found)
cur.execute("SELECT COUNT(*) FROM stock_prices WHERE ticker = ANY(%s);", (list(all_tickers),))
print("Total rows:", cur.fetchone()[0])

cur.close()
conn.close()
print("Import complete")

Files found: ['stocks/etf_historical_data_20250320.csv']
Processing stocks/etf_historical_data_20250320.csv
Unique tickers: 229, records: 291279
Insert finished
Total rows: 713458
Import complete
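
The BIGINT volume filter used in the loader can also be isolated into a small helper, which makes the edge cases easier to check without a database. The following is only a sketch: the name `to_bigint_or_none` is hypothetical, and rejecting values below the BIGINT minimum is an added assumption (the loader itself only checks the upper bound).

```python
MAX_BIGINT = 2**63 - 1
MIN_BIGINT = -2**63  # assumption: the loader only tests the upper bound


def to_bigint_or_none(value):
    """Convert value to an int that fits a PostgreSQL BIGINT, else None."""
    if value is None:
        return None
    try:
        n = int(value)
    except (TypeError, ValueError, OverflowError):
        # ValueError: NaN or non-numeric string; OverflowError: infinity
        return None
    if n < MIN_BIGINT or n > MAX_BIGINT:
        return None
    return n
```

A volume of `0` is kept as `0` rather than turned into `NULL`, since only `None`, unparseable values, and out-of-range values are mapped to `None`.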
