## Yahoo Prices for Your `latest_ticker` Universe (with rate-limit, retry, and checkpoint)

This block downloads **daily OHLCV** from Yahoo Finance (via `yfinance`) for every
unique `latest_ticker` in your universe, with **dot↔hyphen** symbol fallback,
simple **rate limiting**, **retries**, and **checkpointing** so a long run can be resumed.

### What it pulls
For each symbol (since `START_DATE`):
- `date`, `open`, `high`, `low`, `close`, `adj_close`, `volume`
- plus `ticker_latest` (the canonical ticker from your mapping step)

### Key behaviors
1. **Universe build**  
- Ensures a clean, deduped list of symbols to fetch.

2. **Yahoo symbol normalization**  
- Primary: convert `.` to `-` for share classes (e.g., `BRK.B` → `BRK-B`).
- Fallback: if no data, try the **opposite** variant once (`-` ↔ `.`).

3. **Per-symbol fetch with retry**  
- `yf.Ticker(symbol).history(...)` with `auto_adjust=False`, `actions=False`.
- Up to `MAX_RETRY` attempts per symbol (exponential backoff).

4. **Rate limiting**  
- A simple per-minute call budget (`CALLS_PER_MIN`) to avoid throttling.
- Pauses when the minute budget is reached; then resumes.

5. **Checkpointing (Parquet)**  
- Every `CHECKPOINT_EVERY` symbols, appends progress to
  `prices_checkpoint.parquet`.  
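- On restart, tickers already present in the checkpoint are **skipped** automatically.
  Symbols that returned no data are not written to the checkpoint, so they are retried on every run.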

6. **Tidy + dedupe**  
- Resets index to a flat table, renames columns, enforces datetime→date,
  sorts by (`ticker_latest`, `date`), and drops any duplicate day rows.
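
The rate-limit step only needs a per-minute budget, but if a strictly rolling window is wanted, one option is to keep the timestamps of recent calls in a deque. The sketch below is illustrative only: `RollingLimiter` and its `clock`/`sleep` parameters are hypothetical names, not part of `yfinance` or of the fetcher in this notebook.

```python
import time
from collections import deque


class RollingLimiter:
    """Hypothetical helper: allow at most `max_calls` per `period` seconds."""

    def __init__(self, max_calls, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.stamps = deque()   # timestamps of calls still inside the window

    def _expire(self, now):
        # Forget calls that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.period:
            self.stamps.popleft()

    def wait(self):
        """Block until one more call fits in the window, then record it."""
        now = self.clock()
        self._expire(now)
        if len(self.stamps) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.stamps[0]))
            now = self.clock()
            self._expire(now)
        self.stamps.append(now)
```

Under these assumptions, a caller would create one limiter (for example `RollingLimiter(CALLS_PER_MIN)`) and call `wait()` immediately before each history request, including the fallback request for the alternate symbol.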

### Output
- **`prices_df`** with columns:
  - `date` (Python `date`), `open`, `high`, `low`, `close`, `adj_close`, `volume`, `ticker_latest`

This is ready to:
- join to your S&P membership (on `ticker_latest` + `date` within membership ranges),  
- compute returns, rolling stats, risk metrics, etc.,  
- or bulk-load to PostgreSQL with a primary key on (`ticker_latest`, `date`).
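
The bulk-load option follows a stage-then-upsert pattern: write the frame to a staging table, then `INSERT ... ON CONFLICT ... DO UPDATE` into the keyed target. A minimal sketch of a helper that builds that statement is shown here. `build_upsert_sql` is a hypothetical name, and identifiers are interpolated without escaping, so it is only suitable for trusted, hard-coded table and column names.

```python
def build_upsert_sql(table, staging, key_cols, value_cols):
    """Hypothetical helper: build a PostgreSQL upsert from a staging table.

    key_cols   -- columns forming the primary key (the conflict target)
    value_cols -- columns overwritten when a key already exists
    """
    cols = list(key_cols) + list(value_cols)
    col_list = ", ".join(cols)
    select_list = ", ".join(f"s.{c}" for c in cols)
    conflict = ", ".join(key_cols)
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in value_cols)
    return (
        f'INSERT INTO "{table}" ({col_list})\n'
        f"SELECT {select_list}\n"
        f'FROM "{staging}" s\n'
        f"ON CONFLICT ({conflict}) DO UPDATE SET {updates};"
    )


# Example with the price columns produced by this notebook
# (the table name is illustrative).
sql = build_upsert_sql(
    "sp500_prices_daily_yahoo",
    "sp500_prices_daily_yahoo_stg",
    key_cols=["date", "ticker_latest"],
    value_cols=["open", "high", "low", "close", "adj_close", "volume"],
)
print(sql)
```

The resulting string can be passed to `sqlalchemy.text(...)` and executed inside `engine.begin()`, after the staging table has been filled with `DataFrame.to_sql`.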

### Usage
- Set `START_DATE` to match your analysis window.
- Build the `universe` from your enriched table (`df["latest_ticker"]`).
- Run:
```python
prices_df = fetch_prices_for_universe(
    universe,
    checkpoint_path="prices_checkpoint.parquet"  # optional but recommended
)
```


In [4]:
# !pip install sqlalchemy psycopg2-binary

import os
import pandas as pd
from sqlalchemy import create_engine

# Prefer env vars; fall back to your local creds
pg_user = os.getenv("PGUSER", "postgres")
pg_pass = os.getenv("PGPASSWORD", "CSDBMS623")
pg_host = os.getenv("PGHOST", "localhost")
pg_port = os.getenv("PGPORT", "5432")
pg_db   = os.getenv("PGDATABASE", "SP500_ML")

engine = create_engine(f"postgresql+psycopg2://{pg_user}:{pg_pass}@{pg_host}:{pg_port}/{pg_db}")

# 1) If you want the full enriched daily table:
profiles_df = pd.read_sql_query("""
    SELECT
        date,
        ticker           AS ticker_membership,
        latest_ticker,
        company_name,
        sector,
        currency,
        is_actively_trading,
        cik
    FROM sp500_long_latest_profiles
""", engine, parse_dates=["date"])

# 2) If you just need the symbol universe for prices (MUCH faster/lighter):
universe_df = pd.read_sql_query("""
    SELECT DISTINCT latest_ticker
    FROM sp500_long_latest_profiles
    WHERE latest_ticker IS NOT NULL
""", engine)

# (Optional) restrict to the most recent date only:
# universe_df = pd.read_sql_query("""
#     SELECT DISTINCT latest_ticker
#     FROM sp500_long_latest_profiles
#     WHERE latest_ticker IS NOT NULL
#       AND date = (SELECT MAX(date) FROM sp500_long_latest_profiles)
# """, engine)

universe = (universe_df["latest_ticker"]
            .dropna().map(lambda s: str(s).strip().upper())
            .drop_duplicates().tolist())

print("Universe size:", len(universe))

Universe size: 679


In [6]:
# !pip install sqlalchemy psycopg2-binary
import os, pandas as pd
from sqlalchemy import create_engine

pg_user = os.getenv("PGUSER", "postgres")
pg_pass = os.getenv("PGPASSWORD", "CSDBMS623")
pg_host = os.getenv("PGHOST", "localhost")
pg_port = os.getenv("PGPORT", "5432")
pg_db   = os.getenv("PGDATABASE", "SP500_ML")

engine = create_engine(f"postgresql+psycopg2://{pg_user}:{pg_pass}@{pg_host}:{pg_port}/{pg_db}")

# Universe = every distinct latest_ticker in the membership table (all dates, not just the latest)
universe = pd.read_sql_query("""
    SELECT DISTINCT UPPER(TRIM(latest_ticker)) AS latest_ticker
    FROM sp500_long_latest_profiles
    WHERE latest_ticker IS NOT NULL

""", engine)["latest_ticker"].tolist()

print("Universe size:", len(universe))

# Now run your Yahoo fetcher
#prices_df = fetch_prices_for_universe(universe, checkpoint_path="prices_checkpoint.parquet")

Universe size: 679


In [10]:
# ================== Yahoo prices for your latest_ticker universe (rate-limited) ==================
# pip install yfinance
import time, math, traceback
import pandas as pd
import yfinance as yf

START_DATE = "2013-09-07"   # same start you used for membership
MAX_RETRY  = 3
CALLS_PER_MIN = 290         # under Yahoo-ish caps
CHECKPOINT_EVERY = 100      # persist every N symbols to avoid rework

def to_yahoo(sym: str) -> str:
    s = str(sym).strip().upper()
    # Yahoo prefers hyphens for share classes (BRK-B, BF-B, etc.)
    return s.replace(".", "-")

def swap_variant(s: str) -> str:
    # Try the other style if the first call returns empty
    return s.replace("-", ".") if "-" in s else s.replace(".", "-")

def fetch_one(symbol: str):
    """
    Fetch one ticker (with dot/hyphen fallback).
    Returns a tidy DataFrame: date, open, high, low, close, adj_close, volume, ticker_latest
    or None if nothing available.
    """
    def _history(tkr):
        t = yf.Ticker(tkr)
        return t.history(start=START_DATE, end=None, interval="1d", auto_adjust=False, actions=False)

    # try primary
    hist = _history(symbol)
    if hist is None or hist.empty:
        # try variant once
        alt = swap_variant(symbol)
        if alt != symbol:
            hist = _history(alt)

    if hist is None or hist.empty:
        return None

    out = (
        hist.reset_index()
            .rename(columns={
                "Date":"date", "Open":"open", "High":"high", "Low":"low",
                "Close":"close", "Adj Close":"adj_close", "Volume":"volume"
            })
            [["date","open","high","low","close","adj_close","volume"]]
    )
    return out

def fetch_prices_for_universe(universe, checkpoint_path=None):
    """
    universe: iterable of your canonical symbols (e.g., df['latest_ticker'].unique()).
    checkpoint_path: optional parquet path to append progress.
    """
    rows = []
    calls_in_window = 0
    window_start = time.time()
    done = 0

    # If resuming from checkpoint, skip already completed tickers
    done_set = set()
    if checkpoint_path:
        try:
            prev = pd.read_parquet(checkpoint_path, columns=["ticker_latest"])
            done_set = set(prev["ticker_latest"].unique())
            print(f"Resuming: {len(done_set)} symbols already saved at {checkpoint_path}")
        except Exception:
            pass

    for i, sym in enumerate(universe, start=1):
        if sym in done_set:
            continue

        ysym = to_yahoo(sym)
        # --- Rate limit: at most CALLS_PER_MIN calls per 60-second window
        now = time.time()
        elapsed = now - window_start
        if elapsed >= 60:
            # window expired on its own: start a fresh one
            window_start = now
            calls_in_window = 0
        elif calls_in_window >= CALLS_PER_MIN:
            sleep_for = 60 - elapsed
            print(f"Pausing {sleep_for:.1f}s to respect {CALLS_PER_MIN}/min (done {i-1}/{len(universe)})")
            time.sleep(sleep_for)
            window_start = time.time()
            calls_in_window = 0

        payload = None
        backoff = 2.0
        for attempt in range(1, MAX_RETRY + 1):
            try:
                payload = fetch_one(ysym)
                calls_in_window += 1    # count the request (history call)
                break
            except Exception:
                if attempt < MAX_RETRY:
                    time.sleep(backoff); backoff *= 2
                    continue
                else:
                    print(f"[WARN] {sym}: failed after retries\n{traceback.format_exc(limit=1)}")

        if payload is None or payload.empty:
            # nothing available: no stub is recorded, so this symbol is retried on the next run
            pass
        else:
            payload["ticker_latest"] = sym
            rows.append(payload)

        done += 1

        # checkpoint every N tickers
        if checkpoint_path and (done % CHECKPOINT_EVERY == 0):
            ck = pd.concat(rows, ignore_index=True) if rows else pd.DataFrame(
                columns=["date","open","high","low","close","adj_close","volume","ticker_latest"]
            )
            if not ck.empty:
                # append or write new
                try:
                    prev = pd.read_parquet(checkpoint_path)
                    ck = pd.concat([prev, ck], ignore_index=True)
                except Exception:
                    pass
                ck.to_parquet(checkpoint_path, index=False)
                rows = []  # clear buffer
                print(f"Checkpointed {done} symbols → {checkpoint_path}")

    # final concat
    prices = pd.concat(rows, ignore_index=True) if rows else pd.DataFrame(
        columns=["date","open","high","low","close","adj_close","volume","ticker_latest"]
    )
    # if we had checkpointing, merge the final buffer with file
    if checkpoint_path:
        try:
            prev = pd.read_parquet(checkpoint_path)
            prices = pd.concat([prev, prices], ignore_index=True)
        except Exception:
            pass

    # tidy types
    if not prices.empty:
        prices["date"] = pd.to_datetime(prices["date"], errors="coerce").dt.date
        prices = (
            prices.sort_values(["ticker_latest","date"])
                  .drop_duplicates(subset=["ticker_latest","date"], keep="last")
                  .reset_index(drop=True)
        )
    return prices

# ---------- Build your universe from your table `df` ----------
#universe = (
    #pd.Series(df["latest_ticker"])
      #.dropna().map(lambda s: str(s).strip().upper())
      #.drop_duplicates().tolist()
#)
#print("Unique latest_ticker count:", len(universe))

# ---------- Run (with checkpoint to survive interruptions) ----------
prices_df = fetch_prices_for_universe(universe, checkpoint_path="prices_checkpoint.parquet")
print("Prices rows:", len(prices_df))
print(prices_df.head())

Resuming: 601 symbols already saved at prices_checkpoint.parquet


$ABMD: possibly delisted; no timezone found
$ALTR: possibly delisted; no timezone found
$ALXN: possibly delisted; no timezone found
$ARG: possibly delisted; no price data found  (1d 2013-09-07 -> 2025-09-27)
$ATVI: possibly delisted; no timezone found
$BRCM: possibly delisted; no price data found  (1d 2013-09-07 -> 2025-09-27)
$CAM: possibly delisted; no price data found  (1d 2013-09-07 -> 2025-09-27)
$CELG: possibly delisted; no timezone found
$CERN: possibly delisted; no timezone found
$CPGX: possibly delisted; no price data found  (1d 2013-09-07 -> 2025-09-27)
$CTLT: possibly delisted; no timezone found
$CTXS: possibly delisted; no timezone found
$DFS: possibly delisted; no timezone found
$DISH: possibly delisted; no timezone found
$DO: possibly delisted; no timezone found
$DRE: possibly delisted; no timezone found
$ENDP: possibly delisted; no timezone found
$FRC: possibly delisted; no timezone found
$GMCR: possibly delisted; no price data found  (1d 2013-09-07 -> 2025-09-27)
$HFC: 

Prices rows: 1787822
         date       open       high        low      close  adj_close   volume  \
0  2013-09-03  33.648067  33.984264  33.347637  33.569386  30.354403  2188709   
1  2013-09-04  33.569386  34.306152  33.433475  34.241776  30.962395  3525756   
2  2013-09-05  34.206009  34.399143  34.034336  34.105865  30.839495  1835854   
3  2013-09-06  34.127323  34.184547  33.712444  33.927040  30.677790  1863394   
4  2013-09-09  34.041489  34.370529  33.826897  34.299000  31.014139  2333122   

  ticker_latest  
0             A  
1             A  
2             A  
3             A  
4             A  


In [11]:
prices_df

Unnamed: 0,date,open,high,low,close,adj_close,volume,ticker_latest
0,2013-09-03,33.648067,33.984264,33.347637,33.569386,30.354403,2188709,A
1,2013-09-04,33.569386,34.306152,33.433475,34.241776,30.962395,3525756,A
2,2013-09-05,34.206009,34.399143,34.034336,34.105865,30.839495,1835854,A
3,2013-09-06,34.127323,34.184547,33.712444,33.927040,30.677790,1863394,A
4,2013-09-09,34.041489,34.370529,33.826897,34.299000,31.014139,2333122,A
...,...,...,...,...,...,...,...,...
1787817,2025-08-25,156.759995,157.210007,154.889999,155.190002,155.190002,1797700,ZTS
1787818,2025-08-26,155.380005,156.320007,154.509995,154.789993,154.789993,3614300,ZTS
1787819,2025-08-27,155.160004,156.110001,154.509995,155.369995,155.369995,1931100,ZTS
1787820,2025-08-28,155.139999,155.350006,153.289993,154.789993,154.789993,1831500,ZTS


In [None]:
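from sqlalchemy import create_engine, text
from sqlalchemy.dialects.postgresql import DATE, VARCHAR, DOUBLE_PRECISION, BIGINT

engine = create_engine("postgresql://postgres:CSDBMS623@localhost:5432/SP500_ML")

# Assumed names: later cells read prices back from `sp500_prices_daily_yahoo`
TABLE   = "sp500_prices_daily_yahoo"
STAGING = TABLE + "_stg"

# Assumed input: the frame returned by fetch_prices_for_universe above
prices_clean = prices_df.copy()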

# 1) ensure target table exists (idempotent)
with engine.begin() as conn:
    conn.execute(text(f"""
        CREATE TABLE IF NOT EXISTS "{TABLE}" (
            date date NOT NULL,
            ticker_latest varchar(16) NOT NULL,
            open double precision,
            high double precision,
            low double precision,
            close double precision,
            adj_close double precision,
            volume bigint,
            PRIMARY KEY (date, ticker_latest)
        );
    """))

# 2) stage the new data
prices_clean.to_sql(
    STAGING, engine, if_exists="replace", index=False, method="multi", chunksize=50_000,
    dtype={
        "date": DATE(),
        "ticker_latest": VARCHAR(16),
        "open": DOUBLE_PRECISION(),
        "high": DOUBLE_PRECISION(),
        "low": DOUBLE_PRECISION(),
        "close": DOUBLE_PRECISION(),
        "adj_close": DOUBLE_PRECISION(),
        "volume": BIGINT(),
    }
)

# 3) merge into target with upsert, then drop staging
with engine.begin() as conn:
    conn.execute(text(f"""
        INSERT INTO "{TABLE}" (date, ticker_latest, open, high, low, close, adj_close, volume)
        SELECT s.date, s.ticker_latest, s.open, s.high, s.low, s.close, s.adj_close, s.volume
        FROM "{STAGING}" s
        ON CONFLICT (date, ticker_latest) DO UPDATE
        SET open = EXCLUDED.open,
            high = EXCLUDED.high,
            low = EXCLUDED.low,
            close = EXCLUDED.close,
            adj_close = EXCLUDED.adj_close,
            volume = EXCLUDED.volume;
        DROP TABLE "{STAGING}";
    """))

print("Upsert complete.")

In [None]:
import pandas as pd
import yfinance as yf

# 0) Make sure your stock df is sorted and has proper dtypes
df = prices_df.copy()
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values(["ticker_latest","date"])
df["ret"] = df.groupby("ticker_latest", group_keys=False)["adj_close"].pct_change()

# 1) Download benchmark (^GSPC or 'SPY')
start_date = df["date"].min().normalize()
mkt = yf.download("^GSPC", start=start_date, auto_adjust=False, progress=False)

# 2) FLATTEN possible MultiIndex columns from yfinance
if isinstance(mkt.columns, pd.MultiIndex):
    mkt.columns = ["_".join([str(x) for x in tup if x not in (None, "")]) for tup in mkt.columns]
# Standardize expected column name for adjusted close
if "Adj Close" in mkt.columns:
    mkt = mkt.rename(columns={"Adj Close": "adj_close_mkt"})
elif "Adj Close_^GSPC" in mkt.columns:
    mkt = mkt.rename(columns={"Adj Close_^GSPC": "adj_close_mkt"})
elif "Adj_Close" in mkt.columns:
    mkt = mkt.rename(columns={"Adj_Close": "adj_close_mkt"})
else:
    # last resort: take the first column named like adj close
    cand = [c for c in mkt.columns if "adj" in c.lower() and "close" in c.lower()]
    assert cand, "Couldn't find adjusted close in market DataFrame"
    mkt = mkt.rename(columns={cand[0]: "adj_close_mkt"})

# 3) Make date a normal column and compute market returns
mkt = mkt.rename_axis("date").reset_index()
mkt["date"] = pd.to_datetime(mkt["date"])
mkt["mkt_ret"] = mkt["adj_close_mkt"].pct_change()

# 4) Now the merge will work (both have 1-level columns and datetime 'date')
x = df.merge(mkt[["date","mkt_ret"]], on="date", how="left", validate="m:1")

# 5) Rolling 12-month beta (~252 trading days; require at least 126 obs)
W = 252
MIN_OBS = 126
def rolling_beta(g):
    # Sample covariance and sample variance (both ddof=1) over the same window,
    # so the ratio is not biased by mixing population and sample estimators.
    cov = g["ret"].rolling(W, min_periods=MIN_OBS).cov(g["mkt_ret"])
    var = g["mkt_ret"].rolling(W, min_periods=MIN_OBS).var()
    return cov / var

x["beta_12m"] = x.groupby("ticker_latest", group_keys=False).apply(rolling_beta)
beta_df = x.loc[:, ["date","ticker_latest","beta_12m"]].dropna().reset_index(drop=True)
print(beta_df.head(), "\nrows:", len(beta_df))
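
A quick way to sanity-check the rolling beta is to run the same covariance/variance ratio on synthetic returns whose true beta is known. The sketch below is illustrative only: the series are random numbers with an assumed beta of 1.5, not market data, and `TRUE_BETA` is a made-up name.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, W, MIN_OBS, TRUE_BETA = 400, 252, 126, 1.5

# Synthetic market returns and a stock that moves 1.5x the market plus small noise
mkt_ret = pd.Series(rng.normal(0.0, 0.01, n))
ret = TRUE_BETA * mkt_ret + pd.Series(rng.normal(0.0, 0.001, n))

# beta = Cov(stock, market) / Var(market), both over the same rolling window
cov = ret.rolling(W, min_periods=MIN_OBS).cov(mkt_ret)
var = mkt_ret.rolling(W, min_periods=MIN_OBS).var()
beta = cov / var

print(beta.dropna().describe())
```

The first `MIN_OBS - 1` values are `NaN` because the window does not yet hold enough observations; after that the estimate should sit close to the assumed beta.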

In [None]:
import pandas as pd

beta_clean = beta_df.copy()
beta_clean["date"] = pd.to_datetime(beta_clean["date"], errors="coerce").dt.date
beta_clean["ticker_latest"] = (
    beta_clean["ticker_latest"].astype(str).str.strip().str.upper()
)
beta_clean = (
    beta_clean.dropna(subset=["date", "ticker_latest", "beta_12m"])
              .drop_duplicates(subset=["date","ticker_latest"], keep="last")
              .reset_index(drop=True)
)

print("rows to ingest:", len(beta_clean))
beta_clean.head()


In [None]:
# --- Upsert into Postgres ---
from sqlalchemy import create_engine, text
from sqlalchemy.dialects.postgresql import DATE, VARCHAR, DOUBLE_PRECISION

ENGINE_URL = "postgresql://postgres:CSDBMS623@localhost:5432/SP500_ML"
TABLE      = "sp500_beta_12m_daily"
STAGING    = TABLE + "_stg"

engine = create_engine(ENGINE_URL)

# 1) Ensure target table exists with PK (idempotent)
with engine.begin() as conn:
    conn.execute(text(f"""
        CREATE TABLE IF NOT EXISTS "{TABLE}" (
            date date NOT NULL,
            ticker_latest varchar(16) NOT NULL,
            beta_12m double precision,
            PRIMARY KEY (date, ticker_latest)
        );
    """))

# 2) Stage new data
beta_clean.to_sql(
    STAGING, engine, if_exists="replace", index=False, method="multi", chunksize=50_000,
    dtype={
        "date": DATE(),
        "ticker_latest": VARCHAR(16),
        "beta_12m": DOUBLE_PRECISION(),
    },
)

# 3) Merge (UPSERT) from staging into target, then drop staging
with engine.begin() as conn:
    conn.execute(text(f"""
        INSERT INTO "{TABLE}" (date, ticker_latest, beta_12m)
        SELECT s.date, s.ticker_latest, s.beta_12m
        FROM "{STAGING}" s
        ON CONFLICT (date, ticker_latest) DO UPDATE
            SET beta_12m = EXCLUDED.beta_12m;
        DROP TABLE "{STAGING}";
    """))

print("Upsert complete →", TABLE)

In [47]:
# !pip install sqlalchemy psycopg2-binary
import os, pandas as pd
from sqlalchemy import create_engine

pg_user = os.getenv("PGUSER", "postgres")
pg_pass = os.getenv("PGPASSWORD", "CSDBMS623")
pg_host = os.getenv("PGHOST", "localhost")
pg_port = os.getenv("PGPORT", "5432")
pg_db   = os.getenv("PGDATABASE", "SP500_ML")

engine = create_engine(f"postgresql+psycopg2://{pg_user}:{pg_pass}@{pg_host}:{pg_port}/{pg_db}")

# Load the full daily price table (all tickers, all dates)
df = pd.read_sql_query("""
    SELECT * FROM sp500_prices_daily_yahoo

""", engine)



In [48]:
df

Unnamed: 0,date,ticker_latest,open,high,low,close,adj_close,volume,company_sk
0,2013-09-03,A,33.648067,33.984264,33.347637,33.569386,30.354403,2188709,
1,2013-09-04,A,33.569386,34.306152,33.433475,34.241776,30.962395,3525756,
2,2013-09-05,A,34.206009,34.399143,34.034336,34.105865,30.839495,1835854,
3,2013-09-06,A,34.127323,34.184547,33.712444,33.927040,30.677790,1863394,
4,2013-09-09,A,34.041489,34.370529,33.826897,34.299000,31.014126,2333122,
...,...,...,...,...,...,...,...,...,...
1797760,2025-09-22,ZTS,146.550003,146.710007,144.350006,144.649994,144.649994,2044700,
1797761,2025-09-23,ZTS,143.250000,146.179993,141.529999,142.610001,142.610001,3679500,
1797762,2025-09-24,ZTS,141.619995,144.009995,140.539993,141.669998,141.669998,4224500,
1797763,2025-09-25,ZTS,141.330002,142.000000,139.339996,141.130005,141.130005,3058500,


## Technical Momentum & TA Features (per-ticker)

This cell engineers **momentum** and **technical indicators** for each `ticker_latest`
using `pandas_ta`. It assumes `df` has at least `['ticker_latest', 'date', 'adj_close']`,
is **sorted by (`ticker_latest`, `date`)**, and `date` is a proper `datetime`.

### Features created
- **Momentum (returns on `adj_close`):**
  - `30_day_return`: about 1 month (21 trading days) → `pct_change(periods=21)`
  - `180_day_return`: about 6 months (126 trading days) → `pct_change(periods=126)`
  - `360_day_return`: about 12 months (252 trading days) → `pct_change(periods=252)`
  - Each is computed **per ticker** via `groupby('ticker_latest').transform(...)`.

- **RSI (Relative Strength Index):**
  - `rsi` (14), `rsi2` (9), `rsi3` (3) using `pandas_ta.rsi`.
  - Shorter lengths react faster but are noisier.

- **Simple Moving Averages (SMA):**
  - `sma` (50), `sma2` (100), `sma3` (200) via `pandas_ta.sma`.

- **Bollinger Bands (20, 2σ):**
  - Grouped `bbands = pandas_ta.bbands(length=20, std=2)` then renamed to:
    - `bb_lower`, `bb_middle`, `bb_upper`, `bb_bandwidth`, `bb_percent`
  - Joined back to `df` on the **row index**.
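
For reference, the Bollinger columns can also be reproduced with plain pandas rolling windows, which keeps the original row index and so needs no join. This is a sketch, not the notebook's method: `add_bbands` is a hypothetical helper, and it assumes a population standard deviation (`ddof=0`), which is believed to match the `pandas_ta.bbands` default but should be verified against the installed version.

```python
import pandas as pd


def add_bbands(df, price_col="adj_close", length=20, n_std=2.0):
    """Hypothetical helper: per-ticker Bollinger Bands via groupby/transform."""
    out = df.sort_values(["ticker_latest", "date"]).copy()
    grp = out.groupby("ticker_latest")[price_col]
    # transform keeps the original index, so columns align row by row
    mid = grp.transform(lambda s: s.rolling(length).mean())
    sd = grp.transform(lambda s: s.rolling(length).std(ddof=0))  # assumed ddof
    out["bb_middle"] = mid
    out["bb_lower"] = mid - n_std * sd
    out["bb_upper"] = mid + n_std * sd
    width = out["bb_upper"] - out["bb_lower"]
    out["bb_bandwidth"] = 100 * width / mid                      # band width, % of middle
    out["bb_percent"] = (out[price_col] - out["bb_lower"]) / width
    return out


# Toy data: two tickers, five days each
toy = pd.DataFrame({
    "ticker_latest": ["A"] * 5 + ["B"] * 5,
    "date": list(pd.date_range("2024-01-01", periods=5)) * 2,
    "adj_close": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 10.0, 10.0, 10.0, 10.0],
})
bb = add_bbands(toy, length=3)
print(bb[["ticker_latest", "date", "bb_lower", "bb_middle", "bb_upper"]])
```

Because each ticker is windowed separately, the first `length - 1` rows of every ticker are `NaN`, and no value leaks across the boundary between tickers.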

### Notes & gotchas
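- **Warm-up NaNs:** Every indicator needs lookback data, so the leading rows of each ticker
  are `NaN`: `length-1` rows for SMA and Bollinger Bands, `length` rows for RSI, and
  `periods` rows for the momentum returns. Keep them or drop later.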
- **Index alignment:** `groupby(...).apply(pandas_ta.bbands(...))` returns a DataFrame
  aligned to the original sub-index. The code resets the group level and `df.join(...)`
  aligns on the remaining index. This is safe if your index hasn’t been disturbed.
  If you ever see misalignment, set a stable index first:
  ```python
  df = df.sort_values(["ticker_latest","date"]).set_index(["ticker_latest","date"])
  # ... compute features, then df = df.reset_index()
  ```


In [50]:
# Momentum
import pandas_ta
df['30_day_return'] = df.groupby('ticker_latest')['adj_close'].transform(lambda x: x.pct_change(periods=21))
df['180_day_return'] = df.groupby('ticker_latest')['adj_close'].transform(lambda x: x.pct_change(periods=126))
df['360_day_return'] = df.groupby('ticker_latest')['adj_close'].transform(lambda x: x.pct_change(periods=252))

#rsi
df['rsi'] = df.groupby(['ticker_latest'])['adj_close'].transform(lambda x: pandas_ta.rsi(close=x, length=14))
df['rsi2'] = df.groupby(['ticker_latest'])['adj_close'].transform(lambda x: pandas_ta.rsi(close=x, length=9))
df['rsi3'] = df.groupby(['ticker_latest'])['adj_close'].transform(lambda x: pandas_ta.rsi(close=x, length=3))

#moving average
df['sma'] = df.groupby('ticker_latest')['adj_close'].transform(lambda x: pandas_ta.sma(close=x, length=50))
df['sma2'] = df.groupby('ticker_latest')['adj_close'].transform(lambda x: pandas_ta.sma(close=x, length=100))
df['sma3'] = df.groupby('ticker_latest')['adj_close'].transform(lambda x: pandas_ta.sma(close=x, length=200))


# Calculate Bollinger Bands
bbands = df.groupby(['ticker_latest'])['adj_close'].apply(lambda x: pandas_ta.bbands(close=x, length=20, std=2))
bbands.columns = ['bb_lower', 'bb_middle', 'bb_upper', 'bb_bandwidth', 'bb_percent']
bbands = bbands.reset_index(level=0, drop=True)
df = df.join(bbands)

In [51]:
df.loc[df.ticker_latest == 'AAPL']

Unnamed: 0,date,ticker_latest,open,high,low,close,adj_close,volume,company_sk,30_day_return,...,rsi2,rsi3,sma,sma2,sma3,bb_lower,bb_middle,bb_upper,bb_bandwidth,bb_percent
7475,2013-09-03,AAPL,17.610714,17.878571,17.405357,17.449286,15.081167,331928800,,,...,,,,,,,,,,
7476,2013-09-04,AAPL,17.841429,17.937143,17.724285,17.810356,15.393235,345032800,,,...,,,,,,,,,,
7477,2013-09-05,AAPL,17.866072,17.881430,17.629999,17.688213,15.287665,236367600,,,...,,,,,,,,,,
7478,2013-09-06,AAPL,17.801430,17.834999,17.498215,17.793571,15.378724,359525600,,,...,,76.550627,,,,,,,,
12110,2013-09-09,AAPL,18.035713,18.139999,17.981428,18.077499,15.624122,340687200,,,...,,89.467744,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15141,2025-09-22,AAPL,248.300003,256.640015,248.119995,256.079987,256.079987,105517400,,0.138639,...,79.781041,94.891962,224.731473,214.199038,221.304487,222.415230,235.787000,249.158770,11.342245,1.258800
15142,2025-09-23,AAPL,255.880005,257.339996,253.580002,254.429993,254.429993,60275200,,0.117097,...,75.765967,83.461136,225.652403,214.623527,221.367679,222.117372,237.150500,252.183627,12.678133,1.074714
15143,2025-09-24,AAPL,255.220001,255.740005,251.039993,252.309998,252.309998,42303700,,0.110715,...,70.628171,67.735497,226.521144,215.018635,221.418477,222.351446,238.300500,254.249553,13.385665,0.939195
15144,2025-09-25,AAPL,253.210007,257.170013,251.710007,256.869995,256.869995,55202100,,0.120187,...,74.768444,79.933987,227.460109,215.538849,221.491925,222.178807,239.619499,257.060191,14.556989,0.994547


In [59]:
df.drop('company_sk',axis = 1, inplace=True)

In [63]:
df.drop('volume',axis = 1, inplace=True)

In [73]:
tech = df.copy()

In [77]:
tech.columns

Index(['date', 'ticker_latest', 'open', 'high', 'low', 'close', 'adj_close',
       '30_day_return', '180_day_return', '360_day_return', 'rsi', 'rsi2',
       'rsi3', 'sma', 'sma2', 'sma3', 'bb_lower', 'bb_middle', 'bb_upper',
       'bb_bandwidth', 'bb_percent'],
      dtype='object')

In [79]:
# starting df has columns:
# ['date','ticker_latest','open','high','low','close','adj_close',
#  '30_day_return','180_day_return','360_day_return','rsi','rsi2','rsi3',
#  'sma','sma2','sma3','bb_lower','bb_middle','bb_upper','bb_bandwidth','bb_percent']

import pandas as pd

rename_map = {
    "30_day_return": "ret_30d",
    "180_day_return": "ret_180d",
    "360_day_return": "ret_360d",
    "rsi":  "rsi_14",
    "rsi2": "rsi_9",
    "rsi3": "rsi_3",
    "sma":  "sma_50",
    "sma2": "sma_100",
    "sma3": "sma_200",
}

tech = df.rename(columns=rename_map).copy()

# normalize types / sort
tech["date"] = pd.to_datetime(tech["date"], errors="coerce").dt.date
tech["ticker_latest"] = tech["ticker_latest"].astype(str).str.upper().str.strip()
tech = tech.sort_values(["ticker_latest","date"])

# (optional) clip RSI to valid range
for c in ["rsi_14","rsi_9","rsi_3"]:
    if c in tech.columns:
        tech[c] = tech[c].clip(lower=0, upper=100)

# reorder to exactly what the loader expects
required_cols = [
    "date","ticker_latest","adj_close",
    "ret_30d","ret_180d","ret_360d",
    "rsi_14","rsi_9","rsi_3",
    "sma_50","sma_100","sma_200",
    "bb_lower","bb_middle","bb_upper","bb_bandwidth","bb_percent",
]
missing = [c for c in required_cols if c not in tech.columns]
if missing:
    raise ValueError(f"Missing expected columns after rename: {missing}")

tech = tech[required_cols]

# now pass `tech` into your upsert block


In [83]:
from sqlalchemy import create_engine, text
from sqlalchemy.dialects.postgresql import DATE, VARCHAR, DOUBLE_PRECISION, BIGINT
import pandas as pd

ENGINE_URL = "postgresql://postgres:CSDBMS623@localhost:5432/SP500_ML"
TABLE      = "sp500_prices_technicals_daily"
STAGING    = TABLE + "_stg"

engine = create_engine(ENGINE_URL)

# (optional) quick checks to avoid silent schema mismatches
required = {
    "date","ticker_latest","adj_close","ret_30d","ret_180d","ret_360d",
    "rsi_14","rsi_9","rsi_3","sma_50","sma_100","sma_200",
    "bb_lower","bb_middle","bb_upper","bb_bandwidth","bb_percent"
}
missing = required - set(tech.columns)
if missing:
    raise ValueError(f"tech is missing columns: {sorted(missing)}")

# normalize types
tech = tech.copy()
tech["date"] = pd.to_datetime(tech["date"], errors="coerce").dt.date
tech["ticker_latest"] = tech["ticker_latest"].astype(str).str.upper().str.strip()

# 1) ensure target table exists
with engine.begin() as conn:
    conn.execute(text(f"""
        CREATE TABLE IF NOT EXISTS "{TABLE}" (
            date date NOT NULL,
            ticker_latest varchar(16) NOT NULL,
            adj_close double precision,
            ret_30d double precision,
            ret_180d double precision,
            ret_360d double precision,
            rsi_14 double precision,
            rsi_9 double precision,
            rsi_3 double precision,
            sma_50 double precision,
            sma_100 double precision,
            sma_200 double precision,
            bb_lower double precision,
            bb_middle double precision,
            bb_upper double precision,
            bb_bandwidth double precision,
            bb_percent double precision,
            PRIMARY KEY (date, ticker_latest)
        );
    """))

# 2) stage the data
tech.to_sql(
    STAGING, engine, if_exists="replace", index=False, method="multi", chunksize=50_000,
    dtype={
        "date": DATE(),
        "ticker_latest": VARCHAR(16),
        "adj_close": DOUBLE_PRECISION(),
        "ret_30d": DOUBLE_PRECISION(),
        "ret_180d": DOUBLE_PRECISION(),
        "ret_360d": DOUBLE_PRECISION(),
        "rsi_14": DOUBLE_PRECISION(),
        "rsi_9": DOUBLE_PRECISION(),
        "rsi_3": DOUBLE_PRECISION(),
        "sma_50": DOUBLE_PRECISION(),
        "sma_100": DOUBLE_PRECISION(),
        "sma_200": DOUBLE_PRECISION(),
        "bb_lower": DOUBLE_PRECISION(),
        "bb_middle": DOUBLE_PRECISION(),
        "bb_upper": DOUBLE_PRECISION(),
        "bb_bandwidth": DOUBLE_PRECISION(),
        "bb_percent": DOUBLE_PRECISION(),
    }
)

# 3) upsert + drop staging
with engine.begin() as conn:
    conn.execute(text(f"""
        INSERT INTO "{TABLE}" (
            date, ticker_latest, adj_close,
            ret_30d, ret_180d, ret_360d,
            rsi_14, rsi_9, rsi_3,
            sma_50, sma_100, sma_200,
            bb_lower, bb_middle, bb_upper, bb_bandwidth, bb_percent
        )
        SELECT
            s.date, s.ticker_latest, s.adj_close,
            s.ret_30d, s.ret_180d, s.ret_360d,
            s.rsi_14, s.rsi_9, s.rsi_3,
            s.sma_50, s.sma_100, s.sma_200,
            s.bb_lower, s.bb_middle, s.bb_upper, s.bb_bandwidth, s.bb_percent
        FROM "{STAGING}" s
        ON CONFLICT (date, ticker_latest) DO UPDATE SET
            adj_close   = EXCLUDED.adj_close,
            ret_30d     = EXCLUDED.ret_30d,
            ret_180d    = EXCLUDED.ret_180d,
            ret_360d    = EXCLUDED.ret_360d,
            rsi_14      = EXCLUDED.rsi_14,
            rsi_9       = EXCLUDED.rsi_9,
            rsi_3       = EXCLUDED.rsi_3,
            sma_50      = EXCLUDED.sma_50,
            sma_100     = EXCLUDED.sma_100,
            sma_200     = EXCLUDED.sma_200,
            bb_lower    = EXCLUDED.bb_lower,
            bb_middle   = EXCLUDED.bb_middle,
            bb_upper    = EXCLUDED.bb_upper,
            bb_bandwidth= EXCLUDED.bb_bandwidth,
            bb_percent  = EXCLUDED.bb_percent;
        DROP TABLE "{STAGING}";
    """))

print("Upsert complete →", TABLE)


Upsert complete → sp500_prices_technicals_daily


In [93]:
prices_clean = df.copy()

In [97]:
# --- Build `mm` (market momentum) with plain date ---
import pandas as pd
import yfinance as yf

# Use your prices range to choose a start date
mkt_start = pd.to_datetime(prices_clean["date"], errors="coerce").min().date()

gspc = yf.Ticker("^GSPC").history(
    start=mkt_start, auto_adjust=False, interval="1d", actions=False
).reset_index().rename(columns={"Date": "date"})

# robust adjusted close detection
adj_col = next((c for c in gspc.columns if "adj" in c.lower() and "close" in c.lower()), None)
if adj_col is None:
    raise ValueError(f"Adjusted close not found in benchmark columns: {list(gspc.columns)}")

gspc["date"] = pd.to_datetime(gspc["date"], errors="coerce").dt.date
gspc = gspc.sort_values("date").dropna(subset=[adj_col])

# 1M/6M/12M windows ≈ 21/126/252 trading days
WINS = {"1m": 21, "6m": 126, "12m": 252}
for tag, W in WINS.items():
    gspc[f"mkt_{tag}"] = gspc[adj_col].pct_change(W)

mm = gspc[["date", "mkt_1m", "mkt_6m", "mkt_12m"]].drop_duplicates("date").reset_index(drop=True)

print("mm rows:", len(mm))
print(mm.tail())

mm rows: 3036
            date    mkt_1m    mkt_6m   mkt_12m
3031  2025-09-22  0.050796  0.181064  0.191428
3032  2025-09-23  0.029382  0.154198  0.165093
3033  2025-09-24  0.030850  0.149104  0.164035
3034  2025-09-25  0.021463  0.156248  0.154960
3035  2025-09-26  0.025041  0.166931  0.158866
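
The `mkt_*` columns are trailing simple returns: `pct_change(W)` evaluates to `P_t / P_{t-W} - 1`, so the first `W` rows of each column are NaN. A minimal sketch of that identity on a synthetic price series (the prices are made up for illustration and are not `^GSPC` data):

```python
import numpy as np
import pandas as pd

# Synthetic, illustrative prices (not real index data): 1% compounded daily growth.
px = pd.Series(100.0 * 1.01 ** np.arange(30))

W = 21  # same ~1-month window used for mkt_1m
mom = px.pct_change(W)

# Manual equivalent: P_t / P_{t-W} - 1
manual = px / px.shift(W) - 1

print(int(mom.isna().sum()))                          # rows without a full look-back
print(np.allclose(mom.dropna(), manual.dropna()))     # both formulas agree
```

Because the benchmark download starts at the first price date, the 12-month column only becomes available about 252 trading days into the sample.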


In [99]:
from sqlalchemy import create_engine, text
from sqlalchemy.dialects.postgresql import DATE, DOUBLE_PRECISION

ENGINE_URL = "postgresql://postgres:CSDBMS623@localhost:5432/SP500_ML"
TABLE      = "sp500_market_momentum_daily"
STAGING    = TABLE + "_stg"

engine = create_engine(ENGINE_URL)

# Ensure target table
with engine.begin() as conn:
    conn.execute(text(f"""
        CREATE TABLE IF NOT EXISTS "{TABLE}" (
            date date PRIMARY KEY,
            mkt_1m double precision,
            mkt_6m double precision,
            mkt_12m double precision
        );
    """))

# Stage
mm.to_sql(
    STAGING, engine, if_exists="replace", index=False, method="multi", chunksize=50_000,
    dtype={"date": DATE(), "mkt_1m": DOUBLE_PRECISION(), "mkt_6m": DOUBLE_PRECISION(), "mkt_12m": DOUBLE_PRECISION()},
)

# Upsert
with engine.begin() as conn:
    conn.execute(text(f"""
        INSERT INTO "{TABLE}" (date, mkt_1m, mkt_6m, mkt_12m)
        SELECT s.date, s.mkt_1m, s.mkt_6m, s.mkt_12m
        FROM "{STAGING}" s
        ON CONFLICT (date) DO UPDATE SET
            mkt_1m  = EXCLUDED.mkt_1m,
            mkt_6m  = EXCLUDED.mkt_6m,
            mkt_12m = EXCLUDED.mkt_12m;
        DROP TABLE "{STAGING}";
    """))

print("Upsert complete →", TABLE)


Upsert complete → sp500_market_momentum_daily
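
The stage-then-upsert pattern used above can be tried without a running Postgres server. The sketch below swaps in Python's built-in `sqlite3` (an assumption made purely for portability, not what this notebook uses); the table names `demo_mm` and `demo_mm_stg` are hypothetical. SQLite accepts a similar `ON CONFLICT ... DO UPDATE` clause (SQLite 3.24 or newer), but it needs a `WHERE true` after the `SELECT` to parse unambiguously.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_mm (date TEXT PRIMARY KEY, mkt_1m REAL)")
conn.execute("CREATE TABLE demo_mm_stg (date TEXT, mkt_1m REAL)")

# One existing target row, plus a staged batch that revises it and adds a new day.
conn.execute("INSERT INTO demo_mm VALUES ('2025-09-25', 0.010)")
conn.executemany(
    "INSERT INTO demo_mm_stg VALUES (?, ?)",
    [("2025-09-25", 0.021), ("2025-09-26", 0.025)],
)

# Upsert: insert new keys, overwrite values for keys that already exist.
# `WHERE true` is an SQLite parsing requirement for INSERT ... SELECT ... ON CONFLICT.
conn.execute("""
    INSERT INTO demo_mm (date, mkt_1m)
    SELECT date, mkt_1m FROM demo_mm_stg WHERE true
    ON CONFLICT (date) DO UPDATE SET mkt_1m = excluded.mkt_1m
""")
conn.execute("DROP TABLE demo_mm_stg")
conn.commit()

rows = conn.execute("SELECT date, mkt_1m FROM demo_mm ORDER BY date").fetchall()
print(rows)
```

Re-running the same batch leaves the target unchanged, which is what makes the load safe to repeat.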


In [101]:
import pandas as pd
import numpy as np
from sqlalchemy import create_engine

ENGINE_URL = "postgresql://postgres:CSDBMS623@localhost:5432/SP500_ML"
engine = create_engine(ENGINE_URL)

# ---------- 1) Daily prices ----------
df_prices = pd.read_sql("""
    SELECT * FROM sp500_prices_daily_yahoo
""", engine, parse_dates=["date"])

In [103]:
df_prices

               date  ticker_latest        open        high         low       close   adj_close   volume  company_sk
0        2013-09-03              A   33.648067   33.984264   33.347637   33.569386   30.354403  2188709
1        2013-09-04              A   33.569386   34.306152   33.433475   34.241776   30.962395  3525756
2        2013-09-05              A   34.206009   34.399143   34.034336   34.105865   30.839495  1835854
3        2013-09-06              A   34.127323   34.184547   33.712444   33.927040   30.677790  1863394
4        2013-09-09              A   34.041489   34.370529   33.826897   34.299000   31.014126  2333122
...
1797760  2025-09-22            ZTS  146.550003  146.710007  144.350006  144.649994  144.649994  2044700
1797761  2025-09-23            ZTS  143.250000  146.179993  141.529999  142.610001  142.610001  3679500
1797762  2025-09-24            ZTS  141.619995  144.009995  140.539993  141.669998  141.669998  4224500
1797763  2025-09-25            ZTS  141.330002  142.000000  139.339996  141.130005  141.130005  3058500


In [105]:
import numpy as np
import pandas as pd

def add_rolling_volatility(
    df_prices: pd.DataFrame,
    ticker_col: str = "ticker_latest",
    date_col: str = "date",
    price_col: str = "adj_close",
) -> pd.DataFrame:
    """
    Adds realized close-to-close volatility columns:
      - vol_1m_ann, vol_6m_ann, vol_12m_ann  (annualized, using sqrt(252))
      - vol_1m_win, vol_6m_win, vol_12m_win  (window-scale, using sqrt(window))
    Windows use ~trading-day counts: 1m=21, 6m=126, 12m=252.
    """
    df = df_prices.copy()

    # basics
    df[date_col] = pd.to_datetime(df[date_col], errors="coerce")
    df = df.sort_values([ticker_col, date_col])

    # guard: non-positive prices -> NaN so logs don't explode
    px = pd.to_numeric(df[price_col], errors="coerce")
    px = px.where(px > 0, np.nan)

    # log returns per ticker
    lr = np.log(px) - np.log(px.groupby(df[ticker_col]).shift(1))
    df["log_ret"] = lr

    # windows in trading days
    windows = {
        "1m": 21,
        "6m": 126,
        "12m": 252,
    }

    # rolling std of daily log returns per ticker
    g = df.groupby(ticker_col)["log_ret"]
    for tag, w in windows.items():
        rstd = g.rolling(window=w, min_periods=w).std().reset_index(level=0, drop=True)

        # annualized: multiply by sqrt(252)
        df[f"vol_{tag}_ann"] = rstd * np.sqrt(252)

        # window-scale (non-annualized): multiply by sqrt(w)
        df[f"vol_{tag}_win"] = rstd * np.sqrt(w)

    return df
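
The `_ann` and `_win` columns come from the same rolling standard deviation and differ only by a constant factor, `sqrt(w / 252)`; for the 12-month window (`w = 252`) they coincide. A small sketch with seeded synthetic returns for a single hypothetical ticker, mirroring the rolling step of the function above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic daily log returns (roughly 1% daily vol); illustrative only.
log_ret = pd.Series(rng.normal(0.0, 0.01, size=300))

w = 21
rstd = log_ret.rolling(window=w, min_periods=w).std()  # sample std (ddof=1)

vol_ann = rstd * np.sqrt(252)  # scaled to a one-year horizon
vol_win = rstd * np.sqrt(w)    # scaled to the window's own horizon

# The two series differ only by the constant sqrt(w / 252).
ratio = (vol_win / vol_ann).dropna()
print(ratio.iloc[0], np.sqrt(w / 252))
```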


In [107]:
df_vol = add_rolling_volatility(df_prices)
# Peek:
cols = ["date","ticker_latest","adj_close",
        "vol_1m_ann","vol_6m_ann","vol_12m_ann",
        "vol_1m_win","vol_6m_win","vol_12m_win"]
print(df_vol[cols].sort_values(["ticker_latest","date"]).head(30))


              date ticker_latest  adj_close  vol_1m_ann  vol_6m_ann  \
0       2013-09-03             A  30.354403         NaN         NaN   
1       2013-09-04             A  30.962395         NaN         NaN   
2       2013-09-05             A  30.839495         NaN         NaN   
3       2013-09-06             A  30.677790         NaN         NaN   
4       2013-09-09             A  31.014126         NaN         NaN   
5       2013-09-10             A  31.402214         NaN         NaN   
6       2013-09-11             A  31.667398         NaN         NaN   
7       2013-09-12             A  31.447481         NaN         NaN   
8       2013-09-13             A  31.343987         NaN         NaN   
9       2013-09-16             A  31.525116         NaN         NaN   
10      2013-09-17             A  31.609182         NaN         NaN   
11      2013-09-18             A  31.900240         NaN         NaN   
12      2013-09-19             A  32.973938         NaN         NaN   
13    
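
The leading NaNs in this output are expected. The first log return of every ticker is NaN, and `min_periods=w` requires `w` non-null returns, so the first volatility value appears on the `(w + 1)`-th row of each ticker. A sketch with two hypothetical tickers and seeded synthetic prices, using the same groupby-rolling pattern:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n, w = 30, 21

# Two hypothetical tickers, 30 synthetic prices each (illustrative values only).
df = pd.DataFrame({
    "ticker_latest": ["AAA"] * n + ["BBB"] * n,
    "adj_close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=2 * n))),
})

px = df["adj_close"]
df["log_ret"] = np.log(px) - np.log(px.groupby(df["ticker_latest"]).shift(1))

rstd = (
    df.groupby("ticker_latest")["log_ret"]
      .rolling(window=w, min_periods=w).std()
      .reset_index(level=0, drop=True)   # back to the original row index
)
df["vol_1m_ann"] = rstd * np.sqrt(252)

# 0-based position, within each ticker, of the first non-NaN volatility.
first_valid = df.groupby("ticker_latest")["vol_1m_ann"].apply(
    lambda s: int(s.notna().to_numpy().argmax())
)
print(first_valid.to_dict())
```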

In [109]:
# ====== 1) BUILD ANNUALIZED 1m/6m/12m VOL FROM DAILY PRICES ======
import numpy as np
import pandas as pd

def build_vol_df(
    df_prices: pd.DataFrame,
    ticker_col: str = "ticker_latest",
    date_col: str = "date",
    price_col: str = "adj_close",
) -> pd.DataFrame:
    """
    Returns only: ['symbol','date','vol_1m_ann','vol_6m_ann','vol_12m_ann'].
    Vols are annualized realized vols from daily log returns of adj_close:
      1m ~ 21d, 6m ~ 126d, 12m ~ 252d, annualize with sqrt(252).
    """
    df = df_prices.copy()
    df[date_col] = pd.to_datetime(df[date_col], errors="coerce")
    df = df.sort_values([ticker_col, date_col])

    px = pd.to_numeric(df[price_col], errors="coerce")
    px = px.where(px > 0, np.nan)

    # daily log-return per ticker
    lr = np.log(px) - np.log(px.groupby(df[ticker_col]).shift(1))
    df["_lr"] = lr

    windows = {"1m": 21, "6m": 126, "12m": 252}
    g = df.groupby(ticker_col)["_lr"]

    for tag, w in windows.items():
        rstd = g.rolling(window=w, min_periods=w).std().reset_index(level=0, drop=True)
        df[f"vol_{tag}_ann"] = rstd * np.sqrt(252.0)

    out = df[[ticker_col, date_col, "vol_1m_ann", "vol_6m_ann", "vol_12m_ann"]].rename(
        columns={ticker_col: "symbol", date_col: "date"}
    )
    # Optional: drop rows where all three vols are NaN
    # mask_all_nan = out[["vol_1m_ann","vol_6m_ann","vol_12m_ann"]].isna().all(axis=1)
    # out = out.loc[~mask_all_nan]
    return out


# ====== 2) POSTGRES UPSERT (auto-add columns, unique(symbol,date)) ======
import math
from typing import Sequence, Set, Dict
from sqlalchemy import create_engine, text
from sqlalchemy.engine import Engine
from sqlalchemy.types import Float, Text, DateTime

PG_CONN_STR = "postgresql://postgres:CSDBMS623@localhost:5432/SP500_ML"
SCHEMA      = "public"
TABLE       = "realized_vol_d"
CHUNK_ROWS  = 100_000

def _get_engine(conn_str: str) -> Engine:
    return create_engine(conn_str, pool_pre_ping=True)

def ensure_table_and_indexes(engine: Engine, schema: str, table: str):
    ddl = f'''
    CREATE TABLE IF NOT EXISTS "{schema}"."{table}" (
      symbol        TEXT,
      date          TIMESTAMP,
      vol_1m_ann    DOUBLE PRECISION,
      vol_6m_ann    DOUBLE PRECISION,
      vol_12m_ann   DOUBLE PRECISION
    );'''
    uq = f"""
    DO $$
    BEGIN
      IF NOT EXISTS (
        SELECT 1 FROM pg_constraint
        WHERE conname = '{table}_symbol_date_key'
          AND conrelid = '"{schema}"."{table}"'::regclass
      ) THEN
        ALTER TABLE "{schema}"."{table}"
        ADD CONSTRAINT {table}_symbol_date_key UNIQUE (symbol, date);
      END IF;
    END$$;"""
    idx1 = f'CREATE INDEX IF NOT EXISTS {table}_symbol_idx ON "{schema}"."{table}" (symbol);'
    idx2 = f'CREATE INDEX IF NOT EXISTS {table}_date_idx   ON "{schema}"."{table}" (date);'
    with engine.begin() as conn:
        for stmt in (ddl, uq, idx1, idx2):
            conn.execute(text(stmt))

def _existing_columns(engine: Engine, schema: str, table: str) -> Set[str]:
    sql = """
    SELECT lower(column_name) FROM information_schema.columns
    WHERE table_schema=:schema AND table_name=:table"""
    with engine.begin() as conn:
        rows = conn.execute(text(sql), {"schema": schema, "table": table}).fetchall()
    return {r[0] for r in rows}

def ensure_missing_columns(engine: Engine, schema: str, table: str, df: pd.DataFrame):
    """Add any DataFrame columns the target table lacks (keys typed explicitly, the rest as doubles)."""
    have = _existing_columns(engine, schema, table)
    add = [c for c in df.columns if c not in have]
    if not add:
        return
    key_types = {"date": "TIMESTAMP", "symbol": "TEXT"}
    alters = [f'ADD COLUMN IF NOT EXISTS {c} {key_types.get(c, "DOUBLE PRECISION")}' for c in add]
    with engine.begin() as conn:
        conn.execute(text(f'ALTER TABLE "{schema}"."{table}" ' + ", ".join(alters) + ";"))

def _dtype_map(df: pd.DataFrame) -> Dict[str, object]:
    d: Dict[str, object] = {}
    for c in df.columns:
        if c == "date":
            d[c] = DateTime(timezone=False)
        elif c == "symbol":
            d[c] = Text()
        else:
            d[c] = Float()
    return d

def _to_sql_staging(engine: Engine, df: pd.DataFrame, schema: str, staging: str):
    df.to_sql(
        name=staging, con=engine, schema=schema,
        if_exists="replace", index=False,
        dtype=_dtype_map(df), chunksize=50_000,
    )

def _merge_from_staging(engine: Engine, schema: str, table: str, staging: str, cols: Sequence[str]):
    non_key = [c for c in cols if c not in ("symbol","date")]
    set_clause = ", ".join([f"{c}=EXCLUDED.{c}" for c in non_key]) or "symbol=EXCLUDED.symbol"
    sql = f"""
    INSERT INTO "{schema}"."{table}" ({", ".join(cols)})
    SELECT {", ".join(cols)} FROM "{schema}"."{staging}"
    ON CONFLICT (symbol, date) DO UPDATE SET {set_clause};"""
    with engine.begin() as conn:
        conn.execute(text(sql))

def upsert_realized_vol_postgres(
    df_vol: pd.DataFrame,
    conn_str: str = PG_CONN_STR,
    schema: str = SCHEMA,
    table: str = TABLE,
    chunk_rows: int = CHUNK_ROWS,
):
    if df_vol.empty:
        print("vol DataFrame empty; nothing to ingest.")
        return
    df = df_vol.copy()
    # normalize cols
    df.columns = df.columns.str.lower()
    df["symbol"] = df["symbol"].astype(str).str.upper()
    df["date"]   = pd.to_datetime(df["date"], errors="coerce").dt.tz_localize(None)

    engine = _get_engine(conn_str)
    ensure_table_and_indexes(engine, schema, table)
    ensure_missing_columns(engine, schema, table, df)

    # order columns
    key_first = [c for c in ("symbol","date") if c in df.columns]
    rest = [c for c in df.columns if c not in key_first]
    df = df[key_first + rest]

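    # chunked upsert through a single reusable staging table
    n = len(df)
    n_chunks = math.ceil(n / chunk_rows)
    staging = f"stg_{table}"
    for i in range(n_chunks):
        lo, hi = i * chunk_rows, min((i + 1) * chunk_rows, n)
        _to_sql_staging(engine, df.iloc[lo:hi].copy(), schema, staging)
        _merge_from_staging(engine, schema, table, staging, df.columns.tolist())
        print(f"Upserted rows {lo}–{hi} / {n}")

    # clean up: the staging table is only needed during the merge
    with engine.begin() as conn:
        conn.execute(text(f'DROP TABLE IF EXISTS "{schema}"."{staging}";'))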
    print("✅ Realized volatility ingestion complete.")
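
To see what `_merge_from_staging` sends to the database, the statement can be assembled as plain text without connecting to anything. `build_upsert_sql` below is a hypothetical stand-alone helper written for this illustration (it is not part of the cell above); it follows the same rule of updating every non-key column from `EXCLUDED`.

```python
from typing import Sequence

def build_upsert_sql(schema: str, table: str, staging: str, cols: Sequence[str],
                     keys: Sequence[str] = ("symbol", "date")) -> str:
    """Hypothetical helper: assemble an INSERT ... ON CONFLICT statement as text."""
    non_key = [c for c in cols if c not in keys]
    # With no non-key columns, fall back to a no-op assignment so the SQL stays valid.
    set_clause = (", ".join(f"{c}=EXCLUDED.{c}" for c in non_key)
                  or f"{keys[0]}=EXCLUDED.{keys[0]}")
    col_list = ", ".join(cols)
    return (
        f'INSERT INTO "{schema}"."{table}" ({col_list}) '
        f'SELECT {col_list} FROM "{schema}"."{staging}" '
        f'ON CONFLICT ({", ".join(keys)}) DO UPDATE SET {set_clause};'
    )

sql = build_upsert_sql(
    "public", "realized_vol_d", "stg_realized_vol_d",
    ["symbol", "date", "vol_1m_ann", "vol_6m_ann", "vol_12m_ann"],
)
print(sql)
```

Printing the statement before executing it is a cheap way to confirm that the conflict target matches the table's unique constraint on `(symbol, date)`.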


In [111]:
# 1) Build vols (only symbol/date/annualized vols)
df_vol = build_vol_df(df_prices)  # expects columns: date, ticker_latest, adj_close

# 2) Ingest to Postgres
upsert_realized_vol_postgres(df_vol)


Upserted rows 0–100000 / 1797765
Upserted rows 100000–200000 / 1797765
Upserted rows 200000–300000 / 1797765
Upserted rows 300000–400000 / 1797765
Upserted rows 400000–500000 / 1797765
Upserted rows 500000–600000 / 1797765
Upserted rows 600000–700000 / 1797765
Upserted rows 700000–800000 / 1797765
Upserted rows 800000–900000 / 1797765
Upserted rows 900000–1000000 / 1797765
Upserted rows 1000000–1100000 / 1797765
Upserted rows 1100000–1200000 / 1797765
Upserted rows 1200000–1300000 / 1797765
Upserted rows 1300000–1400000 / 1797765
Upserted rows 1400000–1500000 / 1797765
Upserted rows 1500000–1600000 / 1797765
Upserted rows 1600000–1700000 / 1797765
Upserted rows 1700000–1797765 / 1797765
✅ Realized volatility ingestion complete.
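
The ranges in this log are half-open: "rows 0–100000" covers positions 0 through 99,999, because each chunk is taken with `df.iloc[lo:hi]`. A short sketch that reproduces the chunk boundaries from the row count and chunk size shown above and checks that every row is covered exactly once:

```python
import math

n, chunk_rows = 1_797_765, 100_000   # row count and chunk size from the run above
n_chunks = math.ceil(n / chunk_rows)

# Half-open [lo, hi) slices, matching df.iloc[lo:hi]
ranges = [(i * chunk_rows, min((i + 1) * chunk_rows, n)) for i in range(n_chunks)]

print(n_chunks)
print(ranges[0], ranges[-1])
print(sum(hi - lo for lo, hi in ranges) == n)   # no gaps, no overlaps
```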


In [113]:
df_vol

         symbol        date  vol_1m_ann  vol_6m_ann  vol_12m_ann
0             A  2013-09-03         NaN         NaN          NaN
1             A  2013-09-04         NaN         NaN          NaN
2             A  2013-09-05         NaN         NaN          NaN
3             A  2013-09-06         NaN         NaN          NaN
4             A  2013-09-09         NaN         NaN          NaN
...
1797760     ZTS  2025-09-22    0.140623    0.273165     0.255497
1797761     ZTS  2025-09-23    0.140432    0.273799     0.255361
1797762     ZTS  2025-09-24    0.139628    0.273507     0.255412
1797763     ZTS  2025-09-25    0.139497    0.273136     0.255352
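
The all-NaN warm-up rows at the start of each ticker are also written to the table. `build_vol_df` carries a commented-out option to drop them; an equivalent one-liner is `dropna(subset=..., how="all")`. A minimal sketch on a tiny hypothetical frame shaped like `df_vol` (the values are illustrative, not real output):

```python
import numpy as np
import pandas as pd

vol_cols = ["vol_1m_ann", "vol_6m_ann", "vol_12m_ann"]

# Tiny hypothetical frame shaped like df_vol.
demo = pd.DataFrame({
    "symbol": ["A", "A", "A"],
    "date": pd.to_datetime(["2013-09-03", "2013-10-02", "2014-09-03"]),
    "vol_1m_ann": [np.nan, 0.21, 0.19],
    "vol_6m_ann": [np.nan, np.nan, 0.22],
    "vol_12m_ann": [np.nan, np.nan, 0.23],
})

# Keep a row if at least one volatility is available.
trimmed = demo.dropna(subset=vol_cols, how="all").reset_index(drop=True)
print(len(demo), "->", len(trimmed))
```

Whether to drop these rows is a design choice: keeping them preserves a one-to-one row match with the price table, while dropping them shrinks the upsert.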
