In [0]:
# Databricks notebook source
# MAGIC %md
# MAGIC # Silver v5: Cleaning, Standardization, and Modeling (Long/Wide)
# MAGIC
# MAGIC ## Objective
# MAGIC Transform the **Bronze** layer into consistent, standardized data for analysis, producing:
# MAGIC - **silver_prices_long**: "long" format (one row per `timestamp` + `symbol`)
# MAGIC - **silver_prices_wide**: "wide" format (one row per `trade_date`, with one close column per asset)
# MAGIC
# MAGIC ## Inputs
# MAGIC - `mvp_finance.bronze_prices_raw`
# MAGIC
# MAGIC ## Outputs
# MAGIC - `mvp_finance.silver_prices_long`
# MAGIC - `mvp_finance.silver_prices_wide`
# MAGIC
# MAGIC ## Main rules
# MAGIC - Typing: OHLCV as `double`, dates as `date`, timestamps as `timestamp`
# MAGIC - Deduplication: by (`timestamp`, `symbol`), keeping the latest row by `ingestion_ts`
# MAGIC - WIDE: pivot by `trade_date` with one close per symbol

In [0]:
# COMMAND ----------
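# MAGIC %md
# MAGIC ## 1) Imports and database context
# MAGIC We load the Spark SQL functions used for casts, dates, deduplication (Window), and rounding.
# MAGIC Note that importing `round` from `pyspark.sql.functions` shadows Python's built-in `round` for the rest of the notebook.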
# COMMAND ----------
import pyspark.sql.functions as F
from pyspark.sql.functions import col, to_date, to_timestamp, round
from pyspark.sql import Window

spark.sql("USE mvp_finance")

DataFrame[]

In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 2) Reading the Bronze table
# MAGIC We read the Bronze table and quickly check its schema and a sample of rows.
# COMMAND ----------
bronze_df = spark.table("bronze_prices_raw")

print("Schema Bronze v5:")
bronze_df.printSchema()
display(bronze_df.limit(5))

Schema Bronze v5:
root
 |-- date_raw: timestamp (nullable = true)
 |-- open: double (nullable = true)
 |-- high: double (nullable = true)
 |-- low: double (nullable = true)
 |-- close: double (nullable = true)
 |-- volume: long (nullable = true)
 |-- symbol: string (nullable = true)
 |-- source: string (nullable = true)
 |-- ingestion_ts: timestamp (nullable = true)



date_raw,open,high,low,close,volume,symbol,source,ingestion_ts
2025-01-04T00:00:00.000Z,98106.9921875,98734.4296875,97562.9765625,98236.2265625,22342608078,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-05T00:00:00.000Z,98233.90625,98813.3046875,97291.765625,98314.9609375,20525254825,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-06T00:00:00.000Z,98314.953125,102482.875,97926.1484375,102078.0859375,51823432705,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-07T00:00:00.000Z,102248.8515625,102712.484375,96132.875,96922.703125,58685738547,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-08T00:00:00.000Z,96924.1640625,97258.3203125,92525.84375,95043.5234375,63875859171,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 3) Building Silver LONG (standardization and casts)
# MAGIC In this step we:
# MAGIC - Create `date` (DATE) and `timestamp` (TIMESTAMP) from `date_raw`
# MAGIC - Cast OHLCV to `double`
# MAGIC - Round prices (2 decimal places) and volume (0 decimal places)
# MAGIC - Drop invalid records (NULL in `timestamp`, `close`, or `symbol`)
# COMMAND ----------
silver_long = (
    bronze_df
    .withColumn("date", to_date(col("date_raw")))
    .withColumn("timestamp", to_timestamp(col("date_raw")))
    .select(
        "timestamp",
        "date",
        col("open").cast("double").alias("open"),
        col("high").cast("double").alias("high"),
        col("low").cast("double").alias("low"),
        col("close").cast("double").alias("close"),
        col("volume").cast("double").alias("volume"),
        "symbol",
        "source",
        "ingestion_ts"
    )
    .dropna(subset=["timestamp", "close", "symbol"])
    .withColumn("open",   round(col("open"),   2))
    .withColumn("high",   round(col("high"),   2))
    .withColumn("low",    round(col("low"),    2))
    .withColumn("close",  round(col("close"),  2))
    .withColumn("volume", round(col("volume"), 0))
)

print("Silver (long) schema before dedup:")
silver_long.printSchema()
display(silver_long.limit(10))


Silver (long) schema before dedup:
root
 |-- timestamp: timestamp (nullable = true)
 |-- date: date (nullable = true)
 |-- open: double (nullable = true)
 |-- high: double (nullable = true)
 |-- low: double (nullable = true)
 |-- close: double (nullable = true)
 |-- volume: double (nullable = true)
 |-- symbol: string (nullable = true)
 |-- source: string (nullable = true)
 |-- ingestion_ts: timestamp (nullable = true)



timestamp,date,open,high,low,close,volume,symbol,source,ingestion_ts
2025-01-04T00:00:00.000Z,2025-01-04,98106.99,98734.43,97562.98,98236.23,22342608078.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-05T00:00:00.000Z,2025-01-05,98233.91,98813.3,97291.77,98314.96,20525254825.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-06T00:00:00.000Z,2025-01-06,98314.95,102482.88,97926.15,102078.09,51823432705.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-07T00:00:00.000Z,2025-01-07,102248.85,102712.48,96132.88,96922.7,58685738547.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-08T00:00:00.000Z,2025-01-08,96924.16,97258.32,92525.84,95043.52,63875859171.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-09T00:00:00.000Z,2025-01-09,95043.48,95349.72,91220.84,92484.04,62777261693.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-10T00:00:00.000Z,2025-01-10,92494.49,95770.61,92250.09,94701.45,62058693684.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-11T00:00:00.000Z,2025-01-11,94700.84,94977.69,93840.05,94566.59,18860894100.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-12T00:00:00.000Z,2025-01-12,94565.73,95367.54,93712.51,94488.44,20885130965.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-13T00:00:00.000Z,2025-01-13,94488.89,95837.0,89260.1,94516.52,72978998252.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
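
# MAGIC %md
# MAGIC **Aside: rounding mode.** Spark's `round` is documented as rounding half up (`bround` is the half-even variant), whereas Python's built-in `round` rounds ties to even. The sketch below is plain Python with no Spark dependency. `round_half_up` is a hypothetical helper written only for illustration and is not part of this notebook; inside the notebook itself, the name `round` refers to the Spark function because of the import in step 1.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: float, places: int) -> float:
    """Round half up at `places` decimals (hypothetical helper, not Spark's code)."""
    quantum = Decimal(1).scaleb(-places)  # places=2 -> Decimal('0.01')
    return float(Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP))


# 0.125 is exactly representable in binary, so the tie is genuine.
half_up = round_half_up(0.125, 2)
half_even = round(0.125, 2)  # Python's built-in rounds ties to even

# First Bronze `open` value shown in the sample above.
first_open = round_half_up(98106.9921875, 2)

print(half_up, half_even, first_open)
```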


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 4) Deduplicating Silver LONG
# MAGIC We remove duplicates by (`timestamp`, `symbol`), keeping the most recent record
# MAGIC according to `ingestion_ts` (the latest ingestion wins).
# COMMAND ----------
w = Window.partitionBy("timestamp", "symbol").orderBy(F.col("ingestion_ts").desc())

silver_prices_long = (
    silver_long
    .withColumn("row_num", F.row_number().over(w))
    .filter(F.col("row_num") == 1)
    .drop("row_num")
)

display(silver_prices_long.limit(10))

timestamp,date,open,high,low,close,volume,symbol,source,ingestion_ts
2023-12-18T00:00:00.000Z,2023-12-18,41348.2,42720.3,40530.26,42623.54,25224642008.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-18T00:00:00.000Z,2023-12-18,37330.14,37393.45,37284.85,37306.02,292120000.0,DOWJONES,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-18T00:00:00.000Z,2023-12-18,102.59,102.63,102.38,102.51,0.0,DXY,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-18T00:00:00.000Z,2023-12-18,57074.21,57910.55,56772.14,57732.81,183221200.0,MEXICO,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-18T00:00:00.000Z,2023-12-18,14814.02,14938.04,14811.82,14904.81,5866080000.0,NASDAQ,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-18T00:00:00.000Z,2023-12-18,71.68,74.26,70.64,72.47,73941.0,PETROLEO,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-18T00:00:00.000Z,2023-12-18,4725.58,4749.52,4725.58,4740.56,4060340000.0,SP500,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-18T00:00:00.000Z,2023-12-18,3.92,3.97,3.91,3.95,0.0,TREASURY10Y,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-19T00:00:00.000Z,2023-12-19,42641.51,43354.3,41826.34,42270.53,23171001281.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-19T00:00:00.000Z,2023-12-19,37311.82,37562.83,37311.82,37557.92,272740000.0,DOWJONES,yahoo_finance_api,2025-12-18T15:24:38.178Z
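
# MAGIC %md
# MAGIC **Sketch: "latest ingestion wins".** The window-based dedup above can be pictured in plain Python as keeping one row per key. This is only an illustration under assumptions: `dedup_latest` is a hypothetical helper, and the older SP500 row (close 4740.00, ingested a day earlier) is invented sample data. On ties in `ingestion_ts` this sketch keeps the first row seen, while `row_number()` makes no such promise.

```python
from datetime import datetime


def dedup_latest(rows):
    """Keep, per (timestamp, symbol), the row with the greatest ingestion_ts."""
    latest = {}
    for row in rows:
        key = (row["timestamp"], row["symbol"])
        if key not in latest or row["ingestion_ts"] > latest[key]["ingestion_ts"]:
            latest[key] = row
    return list(latest.values())


rows = [
    # Hypothetical stale row, superseded by the next one.
    {"timestamp": "2023-12-18", "symbol": "SP500", "close": 4740.00,
     "ingestion_ts": datetime(2025, 12, 17, 9, 0)},
    {"timestamp": "2023-12-18", "symbol": "SP500", "close": 4740.56,
     "ingestion_ts": datetime(2025, 12, 18, 15, 24)},
    {"timestamp": "2023-12-18", "symbol": "DXY", "close": 102.51,
     "ingestion_ts": datetime(2025, 12, 18, 15, 24)},
]

deduped = dedup_latest(rows)
closes = {r["symbol"]: r["close"] for r in deduped}
print(closes)
```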


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 5) Quality Gate: Silver LONG (minimum)
# MAGIC We validate:
# MAGIC - Required schema
# MAGIC - Critical nulls (`timestamp`, `date`, `close`, `symbol`)
# MAGIC - Basic consistency (`high >= low`, non-negative prices where applicable)
# MAGIC - Duplicates by (`timestamp`, `symbol`) must be zero after dedup
# COMMAND ----------
required_long_cols = [
    "timestamp", "date", "open", "high", "low", "close", "volume",
    "symbol", "source", "ingestion_ts"
]
missing_long = [c for c in required_long_cols if c not in silver_prices_long.columns]
if missing_long:
    raise RuntimeError(f"[SILVER LONG - QUALITY GATE] Missing columns: {missing_long}")

# Critical nulls
crit = silver_prices_long.select(
    F.sum(F.col("timestamp").isNull().cast("int")).alias("null_timestamp"),
    F.sum(F.col("date").isNull().cast("int")).alias("null_date"),
    F.sum(F.col("close").isNull().cast("int")).alias("null_close"),
    F.sum(F.col("symbol").isNull().cast("int")).alias("null_symbol"),
).collect()[0]

if any([crit["null_timestamp"] > 0, crit["null_date"] > 0, crit["null_close"] > 0, crit["null_symbol"] > 0]):
    raise RuntimeError(
        "[SILVER LONG - QUALITY GATE] Critical nulls: "
        f"timestamp={crit['null_timestamp']}, date={crit['null_date']}, "
        f"close={crit['null_close']}, symbol={crit['null_symbol']}"
    )

# OHLC consistency
bad_hilo = silver_prices_long.filter(F.col("high") < F.col("low")).count()
if bad_hilo > 0:
    raise RuntimeError(f"[SILVER LONG - QUALITY GATE] high < low (count={bad_hilo})")

bad_negative = silver_prices_long.filter(
    (F.col("open") < 0) | (F.col("high") < 0) | (F.col("low") < 0) | (F.col("close") < 0)
).count()
if bad_negative > 0:
    raise RuntimeError(f"[SILVER LONG - QUALITY GATE] negative prices (count={bad_negative})")

# Duplicates (none should remain after dedup)
dups = (
    silver_prices_long
    .groupBy("timestamp", "symbol")
    .count()
    .filter(F.col("count") > 1)
    .count()
)
if dups > 0:
    raise RuntimeError(f"[SILVER LONG - QUALITY GATE] duplicates found after dedup (count={dups})")

print("[SILVER LONG - QUALITY GATE] OK: minimum validations passed.")
display(silver_prices_long.groupBy("symbol").count().orderBy("symbol"))

[SILVER LONG - QUALITY GATE] OK: minimum validations passed.


symbol,count
BITCOIN,732
DOWJONES,503
DXY,505
MEXICO,503
NASDAQ,503
PETROLEO,505
SP500,503
TREASURY10Y,503
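
# MAGIC %md
# MAGIC **Sketch: the gate's checks without Spark.** The same four families of checks (critical nulls, `high >= low`, non-negative prices, duplicate keys) can be expressed over a list of dicts. This is an assumption-laden illustration: `quality_gate_long` is a hypothetical helper, the `bad` row is invented, and unlike the notebook (which raises on the first failure) the sketch collects every failure before reporting.

```python
from collections import Counter


def quality_gate_long(rows):
    """Return a list of failure messages; an empty list means the gate passed."""
    failures = []
    # Critical nulls
    for field in ("timestamp", "date", "close", "symbol"):
        n_null = sum(1 for r in rows if r.get(field) is None)
        if n_null:
            failures.append(f"null {field}: {n_null}")
    # OHLC consistency
    bad_hilo = sum(
        1 for r in rows
        if r.get("high") is not None and r.get("low") is not None
        and r["high"] < r["low"]
    )
    if bad_hilo:
        failures.append(f"high < low: {bad_hilo}")
    # Negative prices
    negative = sum(
        1 for r in rows
        if any(r.get(c) is not None and r[c] < 0
               for c in ("open", "high", "low", "close"))
    )
    if negative:
        failures.append(f"negative prices: {negative}")
    # Duplicate (timestamp, symbol) keys
    key_counts = Counter((r.get("timestamp"), r.get("symbol")) for r in rows)
    dups = sum(1 for n in key_counts.values() if n > 1)
    if dups:
        failures.append(f"duplicate keys: {dups}")
    return failures


good = {"timestamp": "2023-12-18T00:00:00", "date": "2023-12-18",
        "open": 4725.58, "high": 4749.52, "low": 4725.58, "close": 4740.56,
        "symbol": "SP500"}
bad = dict(good, high=4700.0, close=None)  # hypothetical corrupted copy

passed = quality_gate_long([good])
failed = quality_gate_long([good, bad])
print(passed, failed)
```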


In [0]:
# Databricks notebook source
# MAGIC %md
# MAGIC # Silver v6: Master Upgrade (Long/Wide/Aligned/Returns)
# MAGIC
# MAGIC ## Objective
# MAGIC Evolve the Silver layer to support consistent quantitative analysis:
# MAGIC - standardization and deduplication (LONG)
# MAGIC - pivot by date (WIDE)
# MAGIC - session alignment and gap filling (WIDE_ALIGNED)
# MAGIC - return calculation (RETURNS)
# MAGIC
# MAGIC ## Inputs
# MAGIC - `mvp_finance.bronze_prices_raw`
# MAGIC
# MAGIC ## Outputs
# MAGIC - `mvp_finance.silver_prices_long`
# MAGIC - `mvp_finance.silver_prices_wide`
# MAGIC - `mvp_finance.silver_prices_wide_aligned`
# MAGIC - `mvp_finance.silver_returns_wide`


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 1) Imports and database context
# COMMAND ----------
import pyspark.sql.functions as F
from pyspark.sql.functions import col, to_date, to_timestamp, round
from pyspark.sql import Window

spark.sql("USE mvp_finance")


DataFrame[]

In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 2) Reading the Bronze table and quick inspection
# COMMAND ----------
bronze_df = spark.table("bronze_prices_raw")

print("Schema Bronze v5:")
bronze_df.printSchema()
display(bronze_df.limit(5))


Schema Bronze v5:
root
 |-- date_raw: timestamp (nullable = true)
 |-- open: double (nullable = true)
 |-- high: double (nullable = true)
 |-- low: double (nullable = true)
 |-- close: double (nullable = true)
 |-- volume: long (nullable = true)
 |-- symbol: string (nullable = true)
 |-- source: string (nullable = true)
 |-- ingestion_ts: timestamp (nullable = true)



date_raw,open,high,low,close,volume,symbol,source,ingestion_ts
2025-01-04T00:00:00.000Z,98106.9921875,98734.4296875,97562.9765625,98236.2265625,22342608078,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-05T00:00:00.000Z,98233.90625,98813.3046875,97291.765625,98314.9609375,20525254825,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-06T00:00:00.000Z,98314.953125,102482.875,97926.1484375,102078.0859375,51823432705,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-07T00:00:00.000Z,102248.8515625,102712.484375,96132.875,96922.703125,58685738547,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-08T00:00:00.000Z,96924.1640625,97258.3203125,92525.84375,95043.5234375,63875859171,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 3) Silver LONG: standardization, casts, and minimal cleaning
# MAGIC - `date_raw` → `date` (DATE) and `timestamp` (TIMESTAMP)
# MAGIC - OHLCV as `double`
# MAGIC - drops rows with invalid values in critical fields (`timestamp`, `date`, `close`, `symbol`)
# COMMAND ----------
silver_long = (
    bronze_df
    .withColumn("date", to_date(col("date_raw")))
    .withColumn("timestamp", to_timestamp(col("date_raw")))
    .select(
        "timestamp",
        "date",
        col("open").cast("double").alias("open"),
        col("high").cast("double").alias("high"),
        col("low").cast("double").alias("low"),
        col("close").cast("double").alias("close"),
        col("volume").cast("double").alias("volume"),
        "symbol",
        "source",
        "ingestion_ts"
    )
    .dropna(subset=["timestamp", "date", "close", "symbol"])
    .withColumn("open",   round(col("open"),   2))
    .withColumn("high",   round(col("high"),   2))
    .withColumn("low",    round(col("low"),    2))
    .withColumn("close",  round(col("close"),  2))
    .withColumn("volume", round(col("volume"), 0))
)

print("Silver (long) schema before dedup:")
silver_long.printSchema()
display(silver_long.limit(10))


Silver (long) schema before dedup:
root
 |-- timestamp: timestamp (nullable = true)
 |-- date: date (nullable = true)
 |-- open: double (nullable = true)
 |-- high: double (nullable = true)
 |-- low: double (nullable = true)
 |-- close: double (nullable = true)
 |-- volume: double (nullable = true)
 |-- symbol: string (nullable = true)
 |-- source: string (nullable = true)
 |-- ingestion_ts: timestamp (nullable = true)



timestamp,date,open,high,low,close,volume,symbol,source,ingestion_ts
2025-01-04T00:00:00.000Z,2025-01-04,98106.99,98734.43,97562.98,98236.23,22342608078.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-05T00:00:00.000Z,2025-01-05,98233.91,98813.3,97291.77,98314.96,20525254825.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-06T00:00:00.000Z,2025-01-06,98314.95,102482.88,97926.15,102078.09,51823432705.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-07T00:00:00.000Z,2025-01-07,102248.85,102712.48,96132.88,96922.7,58685738547.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-08T00:00:00.000Z,2025-01-08,96924.16,97258.32,92525.84,95043.52,63875859171.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-09T00:00:00.000Z,2025-01-09,95043.48,95349.72,91220.84,92484.04,62777261693.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-10T00:00:00.000Z,2025-01-10,92494.49,95770.61,92250.09,94701.45,62058693684.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-11T00:00:00.000Z,2025-01-11,94700.84,94977.69,93840.05,94566.59,18860894100.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-12T00:00:00.000Z,2025-01-12,94565.73,95367.54,93712.51,94488.44,20885130965.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2025-01-13T00:00:00.000Z,2025-01-13,94488.89,95837.0,89260.1,94516.52,72978998252.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 4) Deduplication (LONG) by (timestamp, symbol)
# MAGIC Keeps the most recent record according to `ingestion_ts`.
# COMMAND ----------
w = Window.partitionBy("timestamp", "symbol").orderBy(F.col("ingestion_ts").desc())

silver_prices_long = (
    silver_long
    .withColumn("row_num", F.row_number().over(w))
    .filter(F.col("row_num") == 1)
    .drop("row_num")
)

display(silver_prices_long.limit(10))


timestamp,date,open,high,low,close,volume,symbol,source,ingestion_ts
2023-12-18T00:00:00.000Z,2023-12-18,41348.2,42720.3,40530.26,42623.54,25224642008.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-18T00:00:00.000Z,2023-12-18,37330.14,37393.45,37284.85,37306.02,292120000.0,DOWJONES,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-18T00:00:00.000Z,2023-12-18,102.59,102.63,102.38,102.51,0.0,DXY,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-18T00:00:00.000Z,2023-12-18,57074.21,57910.55,56772.14,57732.81,183221200.0,MEXICO,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-18T00:00:00.000Z,2023-12-18,14814.02,14938.04,14811.82,14904.81,5866080000.0,NASDAQ,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-18T00:00:00.000Z,2023-12-18,71.68,74.26,70.64,72.47,73941.0,PETROLEO,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-18T00:00:00.000Z,2023-12-18,4725.58,4749.52,4725.58,4740.56,4060340000.0,SP500,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-18T00:00:00.000Z,2023-12-18,3.92,3.97,3.91,3.95,0.0,TREASURY10Y,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-19T00:00:00.000Z,2023-12-19,42641.51,43354.3,41826.34,42270.53,23171001281.0,BITCOIN,yahoo_finance_api,2025-12-18T15:24:38.178Z
2023-12-19T00:00:00.000Z,2023-12-19,37311.82,37562.83,37311.82,37557.92,272740000.0,DOWJONES,yahoo_finance_api,2025-12-18T15:24:38.178Z


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 5) Quality Gate: Silver LONG
# MAGIC Minimum validations to guarantee the integrity of the series:
# MAGIC - required schema
# MAGIC - critical nulls
# MAGIC - OHLC consistency (high >= low)
# MAGIC - duplicates after dedup must be zero
# COMMAND ----------
required_long_cols = [
    "timestamp","date","open","high","low","close","volume","symbol","source","ingestion_ts"
]
missing_long = [c for c in required_long_cols if c not in silver_prices_long.columns]
if missing_long:
    raise RuntimeError(f"[SILVER LONG] Missing columns: {missing_long}")

crit = silver_prices_long.select(
    F.sum(F.col("timestamp").isNull().cast("int")).alias("null_timestamp"),
    F.sum(F.col("date").isNull().cast("int")).alias("null_date"),
    F.sum(F.col("close").isNull().cast("int")).alias("null_close"),
    F.sum(F.col("symbol").isNull().cast("int")).alias("null_symbol"),
).collect()[0]

if crit["null_timestamp"] or crit["null_date"] or crit["null_close"] or crit["null_symbol"]:
    raise RuntimeError(
        "[SILVER LONG] Critical nulls: "
        f"timestamp={crit['null_timestamp']} date={crit['null_date']} "
        f"close={crit['null_close']} symbol={crit['null_symbol']}"
    )

bad_hilo = silver_prices_long.filter(F.col("high") < F.col("low")).count()
if bad_hilo > 0:
    raise RuntimeError(f"[SILVER LONG] high < low (count={bad_hilo})")

dups = (
    silver_prices_long.groupBy("timestamp","symbol").count().filter(F.col("count") > 1).count()
)
if dups > 0:
    raise RuntimeError(f"[SILVER LONG] duplicates after dedup (count={dups})")

print("[SILVER LONG] OK")
display(silver_prices_long.groupBy("symbol").count().orderBy("symbol"))


[SILVER LONG] OK


symbol,count
BITCOIN,732
DOWJONES,503
DXY,505
MEXICO,503
NASDAQ,503
PETROLEO,505
SP500,503
TREASURY10Y,503


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 6) Persistence: Silver LONG
# COMMAND ----------
silver_prices_long.write.format("delta").mode("overwrite").saveAsTable("silver_prices_long")
print("✅ silver_prices_long created")


✅ silver_prices_long created


In [0]:
# COMMAND ----------
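# MAGIC %md
# MAGIC ## 7) Silver WIDE: pivot by date (trade_date)
# MAGIC - one row per day
# MAGIC - one `close` column per symbol
# MAGIC - renames columns whose symbol starts with `IBOV`, `SP500`, or `DXY` to the analytic `*_close` pattern; other symbols keep their original names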
# COMMAND ----------
wide_raw = (
    silver_prices_long
    .groupBy("date")
    .pivot("symbol")
    .agg(F.first("close"))
)

# rename columns (dynamic lookup, standardized names; only IBOV/SP500/DXY prefixes are mapped)
rename_map = {}
for c in wide_raw.columns:
    if c == "date":
        continue
    cu = str(c).upper()
    if cu.startswith("IBOV"):
        rename_map[c] = "ibov_close"
    elif cu.startswith("SP500"):
        rename_map[c] = "sp500_close"
    elif cu.startswith("DXY"):
        rename_map[c] = "dxy_close"

silver_prices_wide = wide_raw
for old, new in rename_map.items():
    silver_prices_wide = silver_prices_wide.withColumnRenamed(old, new)

silver_prices_wide = silver_prices_wide.withColumnRenamed("date", "trade_date")

print("Schema Silver wide:")
silver_prices_wide.printSchema()
display(silver_prices_wide.orderBy("trade_date").limit(10))


Schema Silver wide:
root
 |-- trade_date: date (nullable = true)
 |-- BITCOIN: double (nullable = true)
 |-- DOWJONES: double (nullable = true)
 |-- dxy_close: double (nullable = true)
 |-- MEXICO: double (nullable = true)
 |-- NASDAQ: double (nullable = true)
 |-- PETROLEO: double (nullable = true)
 |-- sp500_close: double (nullable = true)
 |-- TREASURY10Y: double (nullable = true)



trade_date,BITCOIN,DOWJONES,dxy_close,MEXICO,NASDAQ,PETROLEO,sp500_close,TREASURY10Y
2023-12-18,42623.54,37306.02,102.51,57732.81,14904.81,72.47,4740.56,3.95
2023-12-19,42270.53,37557.92,102.17,57694.34,15003.22,73.44,4768.37,3.92
2023-12-20,43652.25,37082.0,102.41,56909.37,14777.94,74.22,4698.35,3.88
2023-12-21,43869.15,37404.35,101.84,57487.7,14963.87,73.89,4746.75,3.89
2023-12-22,43997.9,37385.97,101.7,57313.47,14992.97,73.56,4754.63,3.9
2023-12-23,43739.54,,,,,,,
2023-12-24,43016.12,,,,,,,
2023-12-25,43613.14,,,,,,,
2023-12-26,42520.4,37545.33,101.47,57745.79,15074.57,75.57,4774.75,3.89
2023-12-27,43442.86,37656.52,100.99,57554.47,15099.18,74.11,4781.58,3.79
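
# MAGIC %md
# MAGIC **Sketch: long to wide.** A minimal plain-Python picture of the pivot above, under assumptions: `pivot_close` is a hypothetical helper and the three rows are copied from the output shown. A (date, symbol) pair with no row becomes `None`, which corresponds to the empty cells on 2023-12-23. If several rows shared a date and symbol, this sketch would keep the last one seen.

```python
def pivot_close(rows):
    """Pivot long rows into one dict per date with one close per symbol."""
    wide = {}
    for r in rows:
        wide.setdefault(r["date"], {})[r["symbol"]] = r["close"]
    symbols = sorted({r["symbol"] for r in rows})
    # ISO date strings sort chronologically.
    return [
        {"trade_date": d, **{s: wide[d].get(s) for s in symbols}}
        for d in sorted(wide)
    ]


long_rows = [
    {"date": "2023-12-22", "symbol": "BITCOIN", "close": 43997.9},
    {"date": "2023-12-22", "symbol": "SP500", "close": 4754.63},
    {"date": "2023-12-23", "symbol": "BITCOIN", "close": 43739.54},
]

wide_rows = pivot_close(long_rows)
for row in wide_rows:
    print(row)
```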


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 8) Quality Gate: Silver WIDE (daily key)
# MAGIC - `trade_date` is not null
# MAGIC - one row per day (no duplicates)
# COMMAND ----------
if "trade_date" not in silver_prices_wide.columns:
    raise RuntimeError("[SILVER WIDE] trade_date missing")

null_td = silver_prices_wide.filter(F.col("trade_date").isNull()).count()
if null_td > 0:
    raise RuntimeError(f"[SILVER WIDE] trade_date NULL (count={null_td})")

dups_td = silver_prices_wide.groupBy("trade_date").count().filter(F.col("count") > 1).count()
if dups_td > 0:
    raise RuntimeError(f"[SILVER WIDE] duplicates by trade_date (count={dups_td})")

print("[SILVER WIDE] OK")


[SILVER WIDE] OK


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 9) Persistence: Silver WIDE
# COMMAND ----------
spark.sql("DROP TABLE IF EXISTS silver_prices_wide")
silver_prices_wide.write.format("delta").mode("overwrite").saveAsTable("silver_prices_wide")
print("✅ silver_prices_wide created")


✅ silver_prices_wide created


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 10) Silver WIDE_ALIGNED: calendar + gap filling
# MAGIC This is the main improvement for analysis:
# MAGIC - We build a continuous daily calendar between the min and max `trade_date`
# MAGIC - We `left join` the wide table onto it so that missing days appear as rows
# MAGIC - We apply **forward-fill** (last known value) to `ibov_close`, `sp500_close`, and `dxy_close` when present; the other asset columns are left unfilled
# MAGIC
# MAGIC **Why does this matter?**
# MAGIC - Correlations and returns require aligned series
# MAGIC - Markets observe different holidays; without alignment, the matrix ends up with gaps
# COMMAND ----------
minmax = silver_prices_wide.select(
    F.min("trade_date").alias("min_date"),
    F.max("trade_date").alias("max_date")
).collect()[0]

min_date = minmax["min_date"]
max_date = minmax["max_date"]

calendar_df = (
    spark.sql(f"SELECT explode(sequence(to_date('{min_date}'), to_date('{max_date}'), interval 1 day)) AS trade_date")
)

wide_cal = (
    calendar_df
    .join(silver_prices_wide, on="trade_date", how="left")
    .orderBy("trade_date")
)

# forward-fill per column (cumulative window)
from pyspark.sql import functions as F
from pyspark.sql.window import Window

w_ffill = (
    Window
    .partitionBy(F.lit(1))  # intentional single partition (silences the warning)
    .orderBy("trade_date")
    .rowsBetween(Window.unboundedPreceding, 0)
)

# Only these renamed columns are filled; other asset columns keep their gaps.
for c in ["ibov_close", "sp500_close", "dxy_close"]:
    if c in wide_cal.columns:
        wide_cal = wide_cal.withColumn(
            c,
            F.last(F.col(c), ignorenulls=True).over(w_ffill)
        )

silver_prices_wide_aligned = wide_cal

display(silver_prices_wide_aligned.limit(15))


trade_date,BITCOIN,DOWJONES,dxy_close,MEXICO,NASDAQ,PETROLEO,sp500_close,TREASURY10Y
2023-12-18,42623.54,37306.02,102.51,57732.81,14904.81,72.47,4740.56,3.95
2023-12-19,42270.53,37557.92,102.17,57694.34,15003.22,73.44,4768.37,3.92
2023-12-20,43652.25,37082.0,102.41,56909.37,14777.94,74.22,4698.35,3.88
2023-12-21,43869.15,37404.35,101.84,57487.7,14963.87,73.89,4746.75,3.89
2023-12-22,43997.9,37385.97,101.7,57313.47,14992.97,73.56,4754.63,3.9
2023-12-23,43739.54,,101.7,,,,4754.63,
2023-12-24,43016.12,,101.7,,,,4754.63,
2023-12-25,43613.14,,101.7,,,,4754.63,
2023-12-26,42520.4,37545.33,101.47,57745.79,15074.57,75.57,4774.75,3.89
2023-12-27,43442.86,37656.52,100.99,57554.47,15099.18,74.11,4781.58,3.79
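
# MAGIC %md
# MAGIC **Sketch: calendar + forward-fill.** The two steps above (continuous daily calendar, then carrying the last known value forward) can be pictured in plain Python. Assumptions: `align_and_ffill` is a hypothetical helper, the values are copied from the output shown, and the sketch fills every column passed in, whereas the notebook cell fills only the three renamed columns.

```python
from datetime import date, timedelta


def align_and_ffill(wide_rows, columns):
    """Insert missing calendar days and carry the last non-null value forward."""
    by_date = {r["trade_date"]: r for r in wide_rows}
    day, last_day = min(by_date), max(by_date)
    last_seen = {c: None for c in columns}
    aligned = []
    while day <= last_day:
        source = by_date.get(day, {})
        for c in columns:
            if source.get(c) is not None:
                last_seen[c] = source[c]
        aligned.append({"trade_date": day, **last_seen})
        day += timedelta(days=1)
    return aligned


wide_rows = [
    {"trade_date": date(2023, 12, 22), "sp500_close": 4754.63, "DOWJONES": 37385.97},
    {"trade_date": date(2023, 12, 23), "sp500_close": None, "DOWJONES": None},
    {"trade_date": date(2023, 12, 26), "sp500_close": 4774.75, "DOWJONES": 37545.33},
]

aligned = align_and_ffill(wide_rows, ["sp500_close", "DOWJONES"])
for row in aligned:
    print(row)
```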


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 11) Session alignment (optional, advanced)
# MAGIC Because global assets close at different times than the Brazilian market,
# MAGIC a common approach is to use the **previous day's SP500/DXY** as the "information available"
# MAGIC to the current day's Brazilian session.
# MAGIC
# MAGIC We create `*_prev` columns (a one-day shift) for every price column, not only the global ones.
# COMMAND ----------
from pyspark.sql import Window
import pyspark.sql.functions as F

w_shift = Window.orderBy("trade_date")

# Price columns: every column except trade_date (derived columns are not filtered out here)
base_cols = [c for c in silver_prices_wide_aligned.columns if c != "trade_date"]

# Create *_prev for all of them
df = silver_prices_wide_aligned
for c in base_cols:
    # skip if the column already exists
    prev_col = f"{c}_prev"
    if prev_col not in df.columns:
        df = df.withColumn(prev_col, F.lag(F.col(c), 1).over(w_shift))

silver_prices_wide_aligned = df

# Show a slice (the first six columns, picked automatically)
sample_cols = ["trade_date"] + base_cols[:6] + [f"{c}_prev" for c in base_cols[:6]]
display(silver_prices_wide_aligned.select(*sample_cols).limit(15))





trade_date,BITCOIN,DOWJONES,dxy_close,MEXICO,NASDAQ,PETROLEO,BITCOIN_prev,DOWJONES_prev,dxy_close_prev,MEXICO_prev,NASDAQ_prev,PETROLEO_prev
2023-12-18,42623.54,37306.02,102.51,57732.81,14904.81,72.47,,,,,,
2023-12-19,42270.53,37557.92,102.17,57694.34,15003.22,73.44,42623.54,37306.02,102.51,57732.81,14904.81,72.47
2023-12-20,43652.25,37082.0,102.41,56909.37,14777.94,74.22,42270.53,37557.92,102.17,57694.34,15003.22,73.44
2023-12-21,43869.15,37404.35,101.84,57487.7,14963.87,73.89,43652.25,37082.0,102.41,56909.37,14777.94,74.22
2023-12-22,43997.9,37385.97,101.7,57313.47,14992.97,73.56,43869.15,37404.35,101.84,57487.7,14963.87,73.89
2023-12-23,43739.54,,101.7,,,,43997.9,37385.97,101.7,57313.47,14992.97,73.56
2023-12-24,43016.12,,101.7,,,,43739.54,,101.7,,,
2023-12-25,43613.14,,101.7,,,,43016.12,,101.7,,,
2023-12-26,42520.4,37545.33,101.47,57745.79,15074.57,75.57,43613.14,,101.7,,,
2023-12-27,43442.86,37656.52,100.99,57554.47,15099.18,74.11,42520.4,37545.33,101.47,57745.79,15074.57,75.57
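
# MAGIC %md
# MAGIC **Sketch: one-row shift.** `lag(X, 1)` over rows ordered by `trade_date` takes the value from the previous row, which on the daily calendar is the previous calendar day. Where that day is null (an unfilled column on a weekend or holiday), `*_prev` is null too, as with `DOWJONES_prev` on 2023-12-26 above. The plain-Python helper `add_prev` below is hypothetical and only illustrates the shift; the values are copied from the output.

```python
def add_prev(rows, columns):
    """Add `<col>_prev` holding the previous row's value, ordered by trade_date."""
    out = []
    previous = None
    for row in sorted(rows, key=lambda r: r["trade_date"]):
        new_row = dict(row)
        for c in columns:
            new_row[f"{c}_prev"] = None if previous is None else previous.get(c)
        out.append(new_row)
        previous = row
    return out


rows = [
    {"trade_date": "2023-12-19", "BITCOIN": 42270.53, "DOWJONES": 37557.92},
    {"trade_date": "2023-12-18", "BITCOIN": 42623.54, "DOWJONES": 37306.02},
]

shifted = add_prev(rows, ["BITCOIN", "DOWJONES"])
for row in shifted:
    print(row)
```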


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 12) Quality Gate: WIDE_ALIGNED (generic)
# MAGIC - `trade_date` has no NULLs
# MAGIC - Reports NULLs per column (after forward-fill)
# MAGIC - Allows NULLs at the start (until the first value exists) and reports how many there are
# MAGIC
# MAGIC Note:
# MAGIC - We assume the asset columns are all columns except `trade_date`
# MAGIC - Derived `*_prev` columns are excluded from the gate
# COMMAND ----------
import pyspark.sql.functions as F

# 1) trade_date is required and must not be null
if "trade_date" not in silver_prices_wide_aligned.columns:
    raise RuntimeError("[SILVER WIDE_ALIGNED] Column trade_date does not exist.")

null_trade_date = silver_prices_wide_aligned.filter(F.col("trade_date").isNull()).count()
if null_trade_date > 0:
    raise RuntimeError(f"[SILVER WIDE_ALIGNED] trade_date NULL (count={null_trade_date})")

# 2) Choose which columns enter the gate
#    - by default: all except trade_date
#    - derived *_prev columns are excluded (they are expected to be NULL at the start)
asset_cols = [
    c for c in silver_prices_wide_aligned.columns
    if c != "trade_date" and not c.endswith("_prev")
]

if not asset_cols:
    raise RuntimeError("[SILVER WIDE_ALIGNED] No asset columns found for validation.")

# 3) Count NULLs per column (sum(isNull))
null_exprs = [
    F.sum(F.col(c).isNull().cast("int")).alias(f"null_{c}")
    for c in asset_cols
]

nulls_row = silver_prices_wide_aligned.select(*null_exprs).collect()[0].asDict()

# 4) Print a compact summary (sorted from worst to best)
nulls_sorted = sorted(nulls_row.items(), key=lambda kv: kv[1], reverse=True)

print("[SILVER WIDE_ALIGNED] Nulls after ffill (top 20 columns):")
for k, v in nulls_sorted[:20]:
    print(f"  - {k}: {v}")

# 5) (Optional) Stricter gate:
#    require that, after the first valid point, the series has no further NULLs.
#    Here it is left as a diagnostic only, so very short asset series do not break the run.
#    A strict mode would compute the first valid trade_date per column and
#    validate NULLs from that date onward.
print("[SILVER WIDE_ALIGNED] OK (informational mode).")




[SILVER WIDE_ALIGNED] Nulls after ffill (top 20 columns):
  - null_DOWJONES: 229
  - null_MEXICO: 229
  - null_NASDAQ: 229
  - null_TREASURY10Y: 229
  - null_PETROLEO: 227
  - null_BITCOIN: 0
  - null_dxy_close: 0
  - null_sp500_close: 0
[SILVER WIDE_ALIGNED] OK (informational mode).
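
# MAGIC %md
# MAGIC **Sketch: what a strict mode could check.** The stricter gate mentioned in the cell's comments (no NULLs once a column has its first valid value) is not implemented in this notebook. One possible shape, in plain Python and under assumptions: `nulls_after_first_valid` is a hypothetical helper and the sample rows are invented. Leading NULLs are tolerated; only gaps after the first value are counted.

```python
def nulls_after_first_valid(rows, columns):
    """Count, per column, the NULLs that occur after its first non-null value."""
    ordered = sorted(rows, key=lambda r: r["trade_date"])
    counts = {}
    for c in columns:
        seen_value = False
        gaps = 0
        for r in ordered:
            if r.get(c) is not None:
                seen_value = True
            elif seen_value:
                gaps += 1
        counts[c] = gaps
    return counts


# Invented sample: `a` starts late and has one interior gap, `b` is complete.
sample = [
    {"trade_date": "2024-01-01", "a": None, "b": 1.0},
    {"trade_date": "2024-01-02", "a": 10.0, "b": 1.1},
    {"trade_date": "2024-01-03", "a": None, "b": 1.2},
    {"trade_date": "2024-01-04", "a": 11.0, "b": 1.3},
]

gap_counts = nulls_after_first_valid(sample, ["a", "b"])
print(gap_counts)
```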


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 13) Silver RETURNS: daily returns (generic)
# MAGIC We compute simple daily returns, stored as decimal fractions (0.01 = 1%), for use in the Gold layer and for hypothesis testing.
# MAGIC
# MAGIC **Definition**
# MAGIC - `X_prev = lag(X, 1)`
# MAGIC - `X_ret  = (X / X_prev) - 1`
# MAGIC
# MAGIC **Why returns**
# MAGIC - Returns are better suited to correlation and modeling because price series are generally non-stationary.
# COMMAND ----------
import pyspark.sql.functions as F
from pyspark.sql.window import Window

# Time-ordered window (ensures ordering by trade_date)
w_shift = Window.orderBy("trade_date")

df = silver_prices_wide_aligned

# Base (asset) columns: everything except trade_date and already-derived columns
asset_cols = [
    c for c in df.columns
    if c != "trade_date" and not c.endswith("_prev") and not c.endswith("_ret")
]

if not asset_cols:
    raise RuntimeError("[SILVER RETURNS] No asset columns found.")

# 1) create *_prev (lag 1)
for c in asset_cols:
    prev_col = f"{c}_prev"
    if prev_col not in df.columns:
        df = df.withColumn(prev_col, F.lag(F.col(c), 1).over(w_shift))

# 2) create *_ret
for c in asset_cols:
    prev_col = f"{c}_prev"
    ret_col  = f"{c}_ret"
    if ret_col not in df.columns:
        df = df.withColumn(ret_col, (F.col(c) / F.col(prev_col)) - F.lit(1))

# 3) select the final columns (organized)
price_cols = asset_cols
prev_cols  = [f"{c}_prev" for c in asset_cols]
ret_cols   = [f"{c}_ret" for c in asset_cols]

silver_returns_wide = df.select(
    "trade_date",
    *price_cols,
    *prev_cols,
    *ret_cols
)

display(silver_returns_wide.orderBy("trade_date").limit(20))




trade_date,BITCOIN,DOWJONES,dxy_close,MEXICO,NASDAQ,PETROLEO,sp500_close,TREASURY10Y,BITCOIN_prev,DOWJONES_prev,dxy_close_prev,MEXICO_prev,NASDAQ_prev,PETROLEO_prev,sp500_close_prev,TREASURY10Y_prev,BITCOIN_ret,DOWJONES_ret,dxy_close_ret,MEXICO_ret,NASDAQ_ret,PETROLEO_ret,sp500_close_ret,TREASURY10Y_ret
2023-12-18,42623.54,37306.02,102.51,57732.81,14904.81,72.47,4740.56,3.95,,,,,,,,,,,,,,,,
2023-12-19,42270.53,37557.92,102.17,57694.34,15003.22,73.44,4768.37,3.92,42623.54,37306.02,102.51,57732.81,14904.81,72.47,4740.56,3.95,-0.0082820432089874,0.0067522614312649,-0.0033167495854062,-0.000666345532115975,0.0066025665540183,0.0133848489029944,0.0058663955313296,-0.0075949367088608
2023-12-20,43652.25,37082.0,102.41,56909.37,14777.94,74.22,4698.35,3.88,42270.53,37557.92,102.17,57694.34,15003.22,73.44,4768.37,3.92,0.0326875485119302,-0.0126716282477836,0.0023490261329157,-0.0136056673843568,-0.0150154433514938,0.0106209150326797,-0.014684263175886,-0.010204081632653
2023-12-21,43869.15,37404.35,101.84,57487.7,14963.87,73.89,4746.75,3.89,43652.25,37082.0,102.41,56909.37,14777.94,74.22,4698.35,3.88,0.0049688160404102,0.0086928968232564,-0.0055658627087198,0.0101622984053415,0.0125815912096003,-0.0044462409054163,0.0103014888205432,0.0025773195876288
2023-12-22,43997.9,37385.97,101.7,57313.47,14992.97,73.56,4754.63,3.9,43869.15,37404.35,101.84,57487.7,14963.87,73.89,4746.75,3.89,0.0029348642497062,-0.0004913866970017722,-0.001374705420267,-0.0030307352703272,0.0019446840957586,-0.0044660982541615,0.0016600832148312,0.0025706940874035
2023-12-23,43739.54,,101.7,,,,4754.63,,43997.9,37385.97,101.7,57313.47,14992.97,73.56,4754.63,3.9,-0.0058720984410619,,0.0,,,,0.0,
2023-12-24,43016.12,,101.7,,,,4754.63,,43739.54,,101.7,,,,4754.63,,-0.0165392685885584,,0.0,,,,0.0,
2023-12-25,43613.14,,101.7,,,,4754.63,,43016.12,,101.7,,,,4754.63,,0.013878983041706,,0.0,,,,0.0,
2023-12-26,42520.4,37545.33,101.47,57745.79,15074.57,75.57,4774.75,3.89,43613.14,,101.7,,,,4754.63,,-0.0250552929690455,,-0.0022615535889872,,,,0.0042316647141837,
2023-12-27,43442.86,37656.52,100.99,57554.47,15099.18,74.11,4781.58,3.79,42520.4,37545.33,101.47,57745.79,15074.57,75.57,4774.75,3.89,0.021694527803125,0.0029614868213967,-0.004730462205578,-0.0033131419623837,0.0016325507128893,-0.0193198359137223,0.0014304413843657,-0.025706940874036


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ### 13.1) Quality Gate — RETURNS (sanity)
# MAGIC - `trade_date` is never null
# MAGIC - Returns stay within a conservative band (e.g. ±50%) to catch outliers caused by bad data
# COMMAND ----------
null_td = silver_returns_wide.filter(F.col("trade_date").isNull()).count()
if null_td > 0:
    raise RuntimeError(f"[SILVER RETURNS] NULL trade_date (count={null_td})")

abs_limit = 0.50
ret_cols = [c for c in silver_returns_wide.columns if c.endswith("_ret")]

# Largest absolute return per row. F.greatest skips NULLs (it is NULL only when
# every input is NULL), so warm-up rows and non-trading days fall out of the
# filter on their own. Using dropna(subset=ret_cols) here would discard any row
# with a single NULL return and leave the other assets on that date unchecked.
max_abs_ret = F.greatest(*[F.abs(F.col(c)) for c in ret_cols])
bad_rows = silver_returns_wide.filter(max_abs_ret > abs_limit)
bad = bad_rows.count()

if bad > 0:
    display(
        bad_rows
        .select("trade_date", *ret_cols)
        .orderBy("trade_date")
        .limit(50)
    )
    raise RuntimeError(f"[SILVER RETURNS] returns outside the ±{abs_limit*100:.0f}% limit (count={bad})")

print("[SILVER RETURNS] OK")

spark.sql("DROP TABLE IF EXISTS silver_returns_wide")
silver_returns_wide.write.format("delta").mode("overwrite").saveAsTable("silver_returns_wide")
print("✅ silver_returns_wide created")





[SILVER RETURNS] OK
✅ silver_returns_wide created


In [0]:
# COMMAND ----------
import pyspark.sql.functions as F

# Identify the return columns
ret_cols = [c for c in silver_returns_wide.columns if c.endswith("_ret")]

if not ret_cols:
    raise RuntimeError("[SILVER RETURNS] No *_ret column found for validation.")

# Drop rows where every return is NULL (e.g. the first row of the series):
# keep only rows that have at least one non-null return.
ret_clean = silver_returns_wide.filter(
    F.greatest(*[F.col(c).isNotNull().cast("int") for c in ret_cols]) == 1
)

# 1) Finiteness: NaN count per return column (isnan applies to double/float)
nan_row = (
    ret_clean
    .select(*[F.sum(F.isnan(F.col(c)).cast("int")).alias(c) for c in ret_cols])
    .collect()[0]
    .asDict()
)
# F.sum yields NULL when ret_clean has no rows, so treat None as 0
nan_counts = {k: (v or 0) for k, v in nan_row.items()}

total_nan = sum(nan_counts.values())
if total_nan > 0:
    print("[SILVER RETURNS] NaNs detected (per column):")
    for k, v in sorted(nan_counts.items(), key=lambda kv: kv[1], reverse=True):
        if v > 0:
            print(f"  - {k}: {v}")
    raise RuntimeError(f"[SILVER RETURNS] NaNs found in returns (total={total_nan}).")

# 2) Finiteness: infinities (approximated as an absurdly large |return|)
#    Real percentage returns should never be this large, so anything above the
#    guard points to an Infinity or a gross data error (e.g. a near-zero
#    previous price).
#    Note: with ANSI mode off, Spark's `/` returns NULL on division by zero
#    (and raises an error with it on), so a zero previous price normally shows
#    up as a NULL return rather than as Infinity.
INF_GUARD = 10.0  # 1000%, a guard against Inf / gross errors

inf_like = ret_clean.filter(
    F.greatest(*[F.abs(F.col(c)) for c in ret_cols]) > F.lit(INF_GUARD)
)

inf_cnt = inf_like.count()
if inf_cnt > 0:
    display(inf_like.select("trade_date", *ret_cols).orderBy("trade_date"))
    raise RuntimeError(f"[SILVER RETURNS] Returns with magnitude > {INF_GUARD} (possible Inf or gross data error). count={inf_cnt}")

# 3) Sanity: returns outside the conservative limit
abs_limit = 0.50  # tune to the asset universe (e.g. crypto may need a wider band)

bad = ret_clean.filter(
    F.greatest(*[F.abs(F.col(c)) for c in ret_cols]) > F.lit(abs_limit)
)

bad_cnt = bad.count()
if bad_cnt > 0:
    # Show the largest outliers (ordered by the largest absolute return across columns)
    bad_ranked = bad.withColumn(
        "_max_abs_ret",
        F.greatest(*[F.abs(F.col(c)) for c in ret_cols])
    ).orderBy(F.col("_max_abs_ret").desc())

    display(bad_ranked.select("trade_date", "_max_abs_ret", *ret_cols).limit(200))
    raise RuntimeError(f"[SILVER RETURNS] Returns outside the conservative ±{abs_limit*100:.0f}% limit (count={bad_cnt})")

print("[SILVER RETURNS] OK: sanity check passed.")




[SILVER RETURNS] OK: sanity check passed.


In [0]:
# COMMAND ----------
# MAGIC %md
# MAGIC ## 15) Persistence: WIDE_ALIGNED and RETURNS
# COMMAND ----------
spark.sql("DROP TABLE IF EXISTS silver_prices_wide_aligned")
silver_prices_wide_aligned.write.format("delta").mode("overwrite").saveAsTable("silver_prices_wide_aligned")
print("✅ silver_prices_wide_aligned created")

spark.sql("DROP TABLE IF EXISTS silver_returns_wide")
silver_returns_wide.write.format("delta").mode("overwrite").saveAsTable("silver_returns_wide")
print("✅ silver_returns_wide created")

spark.sql("SELECT COUNT(*) FROM silver_prices_wide_aligned").show()
spark.sql("SELECT COUNT(*) FROM silver_returns_wide").show()




✅ silver_prices_wide_aligned created
✅ silver_returns_wide created
+--------+
|COUNT(*)|
+--------+
|     732|
+--------+

+--------+
|COUNT(*)|
+--------+
|     732|
+--------+



## Conclusion: Silver Layer

The **Silver** layer consolidates the Bronze data into a **clean, consistent, and statistically usable** dataset, laying the groundwork for quantitative analysis and for building the Gold layer.

### Main deliverables of this layer
  - Full standardization of types and dates (`date`, `timestamp`)
  - Deduplication by `(timestamp, symbol)` with a deterministic rule (keep the latest `ingestion_ts`)
  - The **Long** format (`silver_prices_long`), suited to:
    - computing technical indicators
    - per-asset time windows
  - The **Wide** format (`silver_prices_wide`), with:
    - one row per day (`trade_date`)
    - one column per asset (closes)
  - Calendar alignment across markets, using:
    - a continuous calendar
    - controlled forward-fill
  - **Daily percentage returns**, which are typically much closer to stationary than price levels and therefore better suited to:
    - correlation
    - statistical modeling
    - machine learning
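
The daily percentage return in this layer is the simple return, price divided by the previous price minus 1. The sketch below reproduces that lag-and-divide arithmetic in plain Python (not Spark) on the first BITCOIN closes from the sample output. The helper name `simple_returns` is hypothetical, `None` stands in for Spark's NULL, and the explicit zero guard is an assumption added for the plain-Python version.

```python
from typing import List, Optional


def simple_returns(prices: List[Optional[float]]) -> List[Optional[float]]:
    """Return p_t / p_{t-1} - 1 per position; None where either price is missing."""
    out: List[Optional[float]] = []
    prev: Optional[float] = None  # lag-1 value; None for the first row
    for p in prices:
        if p is None or prev is None or prev == 0:
            # Mirrors NULL propagation; the zero guard is an assumption.
            out.append(None)
        else:
            out.append(p / prev - 1)
        prev = p  # the lag always takes the previous row, even if it is missing
    return out


btc = [42623.54, 42270.53, 43652.25]  # BITCOIN closes, 2023-12-18 to 2023-12-20
rets = simple_returns(btc)
print(rets)
```

The first element is `None` because there is no previous row, which is the same warm-up NULL seen in the first line of the returns table.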

### Quality and analytical rigor
Explicit **Quality Gates** were applied to ensure that:
  - time keys contain no null values
  - no duplicates remain after deduplication
  - prices satisfy basic consistency (`high ≥ low`)
  - extreme returns are flagged as possible data problems
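
As an illustration of the extreme-return gate, here is a minimal plain-Python sketch (not Spark) of a row-wise check: take the largest absolute return among the values that are present and compare it with a limit. The function names are hypothetical; the 0.50 default follows the ±50% band used in this notebook, and skipping `None` is meant to mimic how a null-skipping "greatest" behaves.

```python
from typing import List, Optional, Sequence


def max_abs_ignoring_none(values: Sequence[Optional[float]]) -> Optional[float]:
    """Largest |value| among non-missing entries; None if all are missing."""
    present = [abs(v) for v in values if v is not None]
    return max(present) if present else None


def rows_outside_limit(
    rows: List[Sequence[Optional[float]]], abs_limit: float = 0.50
) -> List[int]:
    """Indices of rows whose largest absolute return exceeds abs_limit."""
    flagged = []
    for i, row in enumerate(rows):
        m = max_abs_ignoring_none(row)
        if m is not None and m > abs_limit:
            flagged.append(i)
    return flagged


rows = [
    [None, None],       # warm-up row: nothing to check
    [-0.0083, 0.0068],  # ordinary day
    [0.75, None],       # one asset missing, the other is extreme
]
flagged = rows_outside_limit(rows)
print(flagged)
```

The third row is still flagged even though one of its returns is missing, which is the behavior a gate needs when some markets are closed on a given date.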

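Remaining NULLs are **intentional and expected**:
  - at the start of each series (window warm-up)
  - before the first valid value exists for a forward-filled column
  - on non-trading days for columns that are not forward-filled (in the sample output, `DOWJONES`, `MEXICO`, `NASDAQ`, `PETROLEO` and `TREASURY10Y` are NULL from 2023-12-23 to 2023-12-25), and in the returns that depend on those prices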
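
To show why NULLs survive before the first valid value, the sketch below forward-fills a series in plain Python (not Spark). The name `forward_fill` and the `max_gap` parameter are hypothetical: `max_gap` is only one possible reading of "controlled" forward-fill, not necessarily the rule this notebook applies.

```python
from typing import List, Optional


def forward_fill(
    values: List[Optional[float]], max_gap: Optional[int] = None
) -> List[Optional[float]]:
    """Carry the last valid value forward, at most max_gap consecutive times."""
    out: List[Optional[float]] = []
    last: Optional[float] = None
    gap = 0  # consecutive fills since the last real observation
    for v in values:
        if v is not None:
            last, gap = v, 0
            out.append(v)
        elif last is not None and (max_gap is None or gap < max_gap):
            gap += 1
            out.append(last)
        else:
            out.append(None)  # nothing to carry yet, or the gap limit was reached
    return out


series = [None, None, 101.7, None, None, None, 101.47]
unlimited = forward_fill(series)
limited = forward_fill(series, max_gap=2)
print(unlimited)
print(limited)
```

In both results the two leading entries stay `None`, since there is no earlier observation to carry forward; with `max_gap=2` the third consecutive gap is also left empty.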

### Final result
At the end of this stage, the data is:
  - time-aligned across local and global assets
  - structured for static and dynamic correlation
  - ready for advanced feature computation in the Gold layer

This layer draws a **clear boundary between data preparation and analysis**, so Gold can focus exclusively on **insights, signals, and modeling**, with no further structural fixes required.

