
In Databricks, a common pattern is to use Python for the API-heavy ingestion work (Bronze) and then switch to SQL for the transformations (Silver/Gold), because SQL is expressive for set-based data manipulation and easy to read.


In [0]:
%sql

-- 1. Create the Silver Table structure (if it doesn't exist yet)
CREATE TABLE IF NOT EXISTS silver_crypto_prices (
  coin_id STRING,
  event_time TIMESTAMP,
  date DATE,
  price_usd DOUBLE,
  ingest_date TIMESTAMP
)
USING DELTA;

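-- 2. Perform the Upsert (Merge)
-- Rows are matched on (coin_id, event_time): if a match exists, update it; if not, insert it.
-- Note: MERGE fails if several source rows match the same target row,
-- so bronze is assumed to hold at most one row per (coin_id, event_time).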
MERGE INTO silver_crypto_prices AS target
USING (
  SELECT 
    coin_id,
    event_time,
    date,
    price_usd,
    current_timestamp() AS ingest_date
  FROM bronze_crypto_prices
  WHERE price_usd IS NOT NULL  -- Basic data quality check: drop rows without a price
) AS source
ON target.coin_id = source.coin_id AND target.event_time = source.event_time
WHEN MATCHED THEN
  UPDATE SET target.price_usd = source.price_usd, target.ingest_date = source.ingest_date
WHEN NOT MATCHED THEN
  INSERT *;

-- 3. Verify
SELECT * FROM silver_crypto_prices ORDER BY event_time DESC LIMIT 5;


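Goal: Ensure high data quality. We use a MERGE (Upsert) operation keyed on coin_id and event_time, so running the pipeline multiple times updates the existing rows instead of creating duplicates for the same coin and timestamp. This only holds if the bronze source itself contains at most one row per key; duplicate source rows need to be collapsed before the merge.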
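To make the rerun behavior concrete, here is a minimal sketch of the same upsert logic in plain Python. It is not Spark or Delta code; it only models the MERGE semantics with a dictionary keyed on (coin_id, event_time). The helper names `dedupe_source` and `merge_upsert` are hypothetical, the sample rows are made-up illustrative values, and the "last row wins" rule for duplicate source keys is an assumption, not something the pipeline above defines.

```python
from datetime import datetime


def dedupe_source(rows):
    """Drop rows without a price and keep one row per (coin_id, event_time).

    Assumption: when the source repeats a key, the later row in the list wins.
    """
    latest = {}
    for row in rows:
        if row["price_usd"] is None:  # same filter as the SQL source query
            continue
        latest[(row["coin_id"], row["event_time"])] = row
    return list(latest.values())


def merge_upsert(target, source_rows):
    """Model of MERGE: update on key match, insert otherwise.

    `target` is a dict keyed by (coin_id, event_time), standing in for the silver table.
    """
    for row in dedupe_source(source_rows):
        key = (row["coin_id"], row["event_time"])
        if key in target:
            # WHEN MATCHED: update only the price, as in the SQL above
            target[key]["price_usd"] = row["price_usd"]
        else:
            # WHEN NOT MATCHED: insert the whole row
            target[key] = dict(row)
    return target


# Illustrative sample data (made-up values, not real prices)
ts = datetime(2024, 1, 1, 0, 0)
bronze = [
    {"coin_id": "bitcoin", "event_time": ts, "price_usd": 42000.0},
    {"coin_id": "bitcoin", "event_time": ts, "price_usd": 42010.0},  # duplicate key
    {"coin_id": "ethereum", "event_time": ts, "price_usd": None},    # filtered out
]

silver = {}
merge_upsert(silver, bronze)
merge_upsert(silver, bronze)  # rerunning the pipeline adds no new rows
print(len(silver))
```

In this sketch the second call leaves the table size unchanged because every key is already present, which is the idempotency the MERGE is meant to provide. In Delta itself, the deduplication step would have to happen in the source subquery before the merge.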