# Silver Layer – Data Transformation Pipeline

**Purpose:**

This notebook orchestrates the Silver layer transformations for UK and US economic and energy datasets in the Oil Market Analytics project. It applies business rules, cleans, standardises, and enriches the raw Bronze data to produce analytics-ready Silver tables.

**Scope:**

- Energy price data (WTI, Brent, Natural Gas)
- Equity indices (S&P 500, FTSE 100)
- Currency index (US Dollar Index / DXY)
- Macroeconomic indicators (Interest Rates, GDP, CPI, Unemployment)


**Process Overview:**


**Load:**

Reads raw datasets from the Bronze Delta tables under the `oil_analytics` schema.


**Transform:**

Applies dataset-specific transformation functions to:

- Standardise column names and data types
- Handle missing or invalid values
- Remove duplicates and enforce schema consistency
- Enrich data where necessary (e.g., derived metrics, aggregations)
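A minimal sketch of the first three steps, for illustration only. It is not the project's actual transformation code: `to_snake_case` and `clean_bronze_frame` are hypothetical helpers, `df` is assumed to be a PySpark DataFrame, and `key_columns` is an assumed list of columns that identify a row (for example a date column).

```python
import re


def to_snake_case(name: str) -> str:
    """Normalise a raw column name, e.g. 'Close Price (USD)' -> 'close_price_usd'."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def clean_bronze_frame(df, key_columns):
    """Hypothetical cleaning step; `df` is assumed to be a PySpark DataFrame.

    `key_columns` must already use the standardised (snake_case) names.
    """
    # Standardise all column names in one pass.
    df = df.toDF(*[to_snake_case(c) for c in df.columns])
    # Drop rows that are missing a value in any key column.
    df = df.dropna(subset=key_columns)
    # Keep a single row per key combination.
    return df.dropDuplicates(key_columns)
```

Keeping the renaming logic in a plain Python function means it can be unit-tested without a Spark session.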


**Validate:** 

Checks row counts and schema consistency for all Silver tables.
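One way such a check could look is sketched below. The helper names and the idea of passing an explicit `expected_columns` list are assumptions made for illustration; the notebook itself only prints row counts and schemas.

```python
def schema_diff(actual_columns, expected_columns):
    """Return (missing, unexpected) column names, each as a sorted list."""
    actual, expected = set(actual_columns), set(expected_columns)
    return sorted(expected - actual), sorted(actual - expected)


def validate_silver_table(spark, table_name, expected_columns):
    """Hypothetical check: raise if the table is empty or its columns differ.

    `spark` is assumed to be an active SparkSession; `expected_columns`
    is an assumed, caller-supplied list of column names.
    """
    df = spark.table(table_name)
    row_count = df.count()
    if row_count == 0:
        raise ValueError(f"{table_name} is empty")
    missing, unexpected = schema_diff(df.columns, expected_columns)
    if missing or unexpected:
        raise ValueError(
            f"{table_name} schema mismatch: "
            f"missing={missing}, unexpected={unexpected}"
        )
    return row_count
```

Raising on failure, rather than only printing, lets a scheduled job stop before downstream layers read an incomplete table.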


**Output:**

Writes cleaned, analytics-ready datasets to Silver Delta tables under the `oil_analytics` schema.

Examples of Silver tables generated:

- `oil_analytics.silver_energy_prices`
- `oil_analytics.silver_sp500`
- `oil_analytics.silver_ftse100`
- `oil_analytics.silver_dollar_index`
- `oil_analytics.silver_uk_unemployment`
- `oil_analytics.silver_fed_unemployment`
- `oil_analytics.silver_uk_cpi`
- `oil_analytics.silver_fed_cpi`
- `oil_analytics.silver_uk_gdp`
- `oil_analytics.silver_fed_gdp`
- `oil_analytics.silver_uk_interest_rate`
- `oil_analytics.silver_fed_interest_rate`
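The naming pattern in the list above, together with a Delta write, might be expressed as in the following sketch. `silver_table_name` and `write_silver_table` are hypothetical helpers, and the `overwrite` mode is an assumption: the source does not say whether the Silver tables are overwritten or appended to.

```python
SILVER_SCHEMA = "oil_analytics"


def silver_table_name(dataset: str) -> str:
    """Build a fully qualified Silver table name, e.g. 'sp500' -> 'oil_analytics.silver_sp500'."""
    return f"{SILVER_SCHEMA}.silver_{dataset}"


def write_silver_table(df, dataset: str, mode: str = "overwrite") -> str:
    """Hypothetical writer; `df` is assumed to be a PySpark DataFrame.

    Assumes Delta Lake is available on the cluster. The default mode
    is an assumption, not something the source confirms.
    """
    table_name = silver_table_name(dataset)
    # Persist the DataFrame as a Delta table registered in the metastore.
    df.write.format("delta").mode(mode).saveAsTable(table_name)
    return table_name
```

For example, `write_silver_table(cleaned_df, "uk_cpi")` would target `oil_analytics.silver_uk_cpi`, where `cleaned_df` stands for any already-transformed DataFrame.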

### Setup and Imports

In [0]:
dbutils.library.restartPython()

In [0]:
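# Note: importing `round` shadows Python's built-in round() in this notebook,
# and `rlike` is only available in pyspark.sql.functions from Spark 3.5 onwards.
from pyspark.sql.functions import col, trim, split, to_date, desc, rlike, round, format_number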

In [0]:
from src.transforms.silver.silver_energy_prices import generate_silver_energy_price_table
from src.transforms.silver.silver_index import generate_silver_index_tables
from src.transforms.silver.silver_macro import generate_silver_macro_tables

### Generate Silver Tables

In [0]:
generate_silver_energy_price_table(spark)

In [0]:
generate_silver_index_tables(spark)

In [0]:
generate_silver_macro_tables(spark)

### Data Analysis

In [0]:
energy_price_df = spark.table("oil_analytics.silver_energy_prices")
sp500_df = spark.table("oil_analytics.silver_sp500")
dollar_index_df = spark.table("oil_analytics.silver_dollar_index")

In [0]:
energy_price_df.show(20)
energy_price_df.printSchema()

In [0]:
sp500_df.show(20)
sp500_df.printSchema()

In [0]:
dollar_index_df.show(20)
dollar_index_df.printSchema()

In [0]:
schema = "oil_analytics"
tables = [
    'silver_energy_prices',
    'silver_sp500',
    'silver_ftse100',
    'silver_dollar_index',
    'silver_uk_unemployment',
    'silver_fed_unemployment',
    'silver_uk_cpi',
    'silver_fed_cpi',
    'silver_uk_gdp',
    'silver_fed_gdp',
    'silver_uk_interest_rate',
    'silver_fed_interest_rate'
    ]

# Validation pass: report the row count and schema of every Silver table.
for table in tables:
    df = spark.table(f"{schema}.{table}")
    print(f"Table: {table} | Row Count: {df.count()}")
    df.printSchema()