# Silver Layer Processing for Stock Quotes


```python
# In this notebook, we take raw JSON files from the Bronze layer and give them structure.
# First, we apply a schema to make the data usable, then extract the stock symbol from the folder path.
# After cleaning things up, we save the result as a structured Delta table in the Silver layer.
# Let's start by loading the raw Bronze data so we can tidy it up.

from pyspark.sql.functions import col, input_file_name, regexp_extract, timestamp_seconds
from pyspark.sql.types import DecimalType

# Point at the raw Bronze data.
# Each '*' matches within a single path segment, so this pattern picks up every JSON file
# exactly three folder levels below each stock_symbol=<SYMBOL> folder.
bronze_path = "Files/bronze/trades/stock_symbol=*/*/*/*/*.json"

print(f"Reading raw JSON data from: {bronze_path}")

try:
    # Step 1: Load the raw JSON files from the Bronze layer into a Spark DataFrame.
    # multiLine tells Spark to parse each file as one JSON document instead of one object per line.
    df_bronze = spark.read.option("multiLine", "true").json(bronze_path)

    # Step 2: Pull the stock symbol out of the folder path and add it as a new column.
    # This is a common data engineering pattern that tags each record with its source.
    # regexp_extract returns an empty string for any path without a stock_symbol=<SYMBOL> segment.
    df_with_symbol = df_bronze.withColumn(
        "stock_symbol",
        regexp_extract(input_file_name(), r"stock_symbol=([^/]+)", 1),
    )

    # Step 3: Select the columns we need, give them descriptive names, and cast each one
    # to a fixed-precision decimal so downstream consumers see consistent values.
    df_silver = df_with_symbol.select(
        col("stock_symbol"),
        col("c").cast(DecimalType(10, 2)).alias("price_current"),
        col("d").cast(DecimalType(10, 2)).alias("price_change"),
        col("dp").cast(DecimalType(10, 4)).alias("price_percent_change"),
        col("h").cast(DecimalType(10, 2)).alias("price_high_day"),
        col("l").cast(DecimalType(10, 2)).alias("price_low_day"),
        col("o").cast(DecimalType(10, 2)).alias("price_open_day"),
        col("pc").cast(DecimalType(10, 2)).alias("price_previous_close"),
        # Step 4: Convert the Unix epoch seconds in "t" into a proper timestamp.
        # This makes time-based work, such as filtering by date or plotting trends, easier.
        # Spark stores the instant and displays it in the session time zone
        # (spark.sql.session.timeZone), so the values read as UTC only when that setting is UTC.
        timestamp_seconds(col("t")).alias("timestamp_utc"),
    )

    print("Transformations complete. Showing a preview of the clean data:")
    df_silver.show(5, truncate=False)

    # Step 5: Write the cleaned DataFrame to the Silver Delta table.
    # 'append' mode adds rows without overwriting what is already there. The read above is not
    # incremental, though: every run re-reads all Bronze files, so re-running this cell appends
    # the same quotes again unless duplicates are handled (for example, with a Delta MERGE).
    # Partitioning by stock_symbol lets queries that filter on a symbol skip the other partitions.
    table_name = "silver_stock_trades"
    (
        df_silver.write.format("delta")
        .mode("append")
        .partitionBy("stock_symbol")
        .saveAsTable(table_name)
    )

    print(f"SUCCESS: Cleaned data has been appended to the Silver table: '{table_name}'")

except Exception as e:
    print(f"An error occurred: {e}")
    raise  # Re-raise so a failed run is not reported as successful.
```



```text
Reading raw JSON data from: Files/bronze/trades/stock_symbol=*/*/*/*/*.json
Transformations complete. Showing a preview of the clean data:
+------------+-------------+------------+--------------------+--------------+-------------+--------------+--------------------+-------------------+
|stock_symbol|price_current|price_change|price_percent_change|price_high_day|price_low_day|price_open_day|price_previous_close|timestamp_utc      |
+------------+-------------+------------+--------------------+--------------+-------------+--------------+--------------------+-------------------+
|AAPL        |230.56       |-0.33       |-0.1429             |232.87        |229.35       |231.28        |230.89              |2025-08-19 20:00:00|
|AAPL        |227.36       |2.46        |1.0937              |228.72        |225.35       |225.35        |224.90              |2025-08-22 16:09:54|
+------------+-------------+------------+--------------------+--------------+-------------+--------------+--------------------+-------------------+
```
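To see which files the `bronze_path` glob selects, the sketch below models how we understand Spark to expand Hadoop-style input globs. Under that model, each `*` matches inside a single path segment and never crosses a `/`. `matches_glob` is a hypothetical helper built on `fnmatch`. The sample paths assume a three-level date layout under each symbol folder, which the notebook does not confirm.

```python
from fnmatch import fnmatchcase


def matches_glob(path: str, pattern: str) -> bool:
    """Hypothetical helper: segment-by-segment glob match, assuming '*' never crosses '/'."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    # Same number of segments, and every segment matches its glob piece.
    return len(path_parts) == len(pattern_parts) and all(
        fnmatchcase(part, piece) for part, piece in zip(path_parts, pattern_parts)
    )


if __name__ == "__main__":
    pattern = "Files/bronze/trades/stock_symbol=*/*/*/*/*.json"
    # Hypothetical paths at three different depths below the symbol folder.
    for p in [
        "Files/bronze/trades/stock_symbol=AAPL/2025/08/19/quote.json",     # three folders: matched
        "Files/bronze/trades/stock_symbol=AAPL/2025/08/19/20/quote.json",  # four folders: skipped
        "Files/bronze/trades/stock_symbol=AAPL/quote.json",                # no folders: skipped
    ]:
        print(matches_glob(p, pattern), p)
```

Under this model, files written at a different depth are silently ignored rather than reported as errors. A change in the Bronze folder layout should therefore be mirrored in `bronze_path`.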
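The Step 2 pattern can be checked without a Spark session. The sketch below uses Python's `re` module with the same regular expression and imitates `regexp_extract(..., 1)`. It returns capture group 1 of the first match, or an empty string when nothing matches. The helper name `extract_symbol` and the sample paths are hypothetical.

```python
import re

# Same pattern as Step 2: capture everything after "stock_symbol=" up to the next "/".
SYMBOL_PATTERN = re.compile(r"stock_symbol=([^/]+)")


def extract_symbol(path: str) -> str:
    """Hypothetical helper mimicking regexp_extract(path, pattern, 1).

    Like Spark's regexp_extract, it returns "" when the pattern does not match.
    """
    match = SYMBOL_PATTERN.search(path)
    return match.group(1) if match else ""


if __name__ == "__main__":
    # Hypothetical paths; in the notebook they come from input_file_name().
    for p in [
        "Files/bronze/trades/stock_symbol=AAPL/2025/08/19/quote.json",
        "Files/bronze/trades/stock_symbol=MSFT/2025/08/22/quote.json",
        "Files/bronze/trades/unpartitioned/quote.json",
    ]:
        print(repr(extract_symbol(p)))
```

The empty-string fallback matters downstream. A misplaced file would land in an `stock_symbol=""` partition instead of failing. Filtering out empty symbols before the write is one way to catch that.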
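In Step 3, `DecimalType(10, 2)` keeps at most 10 digits in total, 2 of them after the decimal point. A price therefore needs 8 or fewer integer digits to fit. The sketch below approximates that cast in plain Python with the `decimal` module. It assumes half-up rounding and a null (`None`) result on overflow, which is our understanding of Spark's cast when `spark.sql.ansi.enabled` is false. With ANSI mode on, we expect an overflow to raise an error instead. `to_decimal` is a hypothetical helper, not a Spark API, and it handles finite floats only.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional


def to_decimal(value: float, precision: int, scale: int) -> Optional[Decimal]:
    """Hypothetical approximation of casting a finite JSON number to DecimalType(precision, scale).

    Assumes half-up rounding and None on overflow; this is not Spark's implementation.
    """
    # str() gives the shortest decimal form of the float, e.g. 230.56 -> "230.56".
    exact = Decimal(str(value))
    # Integer digits may not exceed precision - scale, so |value| must stay below 10**(p - s).
    limit = Decimal(10) ** (precision - scale)
    if abs(exact) >= limit:
        return None  # too many integer digits even before rounding
    quantum = Decimal(1).scaleb(-scale)  # scale=2 -> 0.01
    rounded = exact.quantize(quantum, rounding=ROUND_HALF_UP)
    if abs(rounded) >= limit:
        return None  # rounding pushed it over, e.g. 99999999.995 -> 100000000.00
    return rounded


if __name__ == "__main__":
    print(to_decimal(230.56, 10, 2))       # 230.56
    print(to_decimal(-0.1429, 10, 4))      # -0.1429
    print(to_decimal(1.005, 10, 2))        # 1.01
    print(to_decimal(123456789.0, 10, 2))  # None
```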
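Step 4 treats `t` as whole seconds since the Unix epoch. The standard-library sketch below does the same conversion with an explicit UTC time zone, which is what the `timestamp_utc` name promises. A timezone-aware `datetime` carries its offset with it, so its printed value does not depend on a session setting. The helper name is hypothetical, and the second epoch value is illustrative rather than taken from the Bronze files.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: int) -> datetime:
    """Hypothetical helper: convert Unix epoch seconds to a timezone-aware UTC datetime."""
    # Passing tz=timezone.utc avoids the machine's local time zone entirely.
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(epoch_to_utc(0))           # 1970-01-01 00:00:00+00:00
    print(epoch_to_utc(1755633600))  # 2025-08-19 20:00:00+00:00
```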
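Every run of this notebook re-reads all Bronze files, so appending in Step 5 repeats quotes that are already in the Silver table. A common remedy is an upsert keyed on the columns that identify a quote. In Delta Lake that is a MERGE (for example, `DeltaTable.merge` in the `delta-spark` package). The pure-Python sketch below only models the difference between the two write styles. It assumes that (`stock_symbol`, `timestamp_utc`) uniquely identifies a quote, which the notebook does not state. The function names `append` and `upsert` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def append(table: List[Row], batch: Iterable[Row]) -> List[Row]:
    """Model of mode("append"): every incoming row is added, including repeats."""
    return table + list(batch)


def upsert(table: List[Row], batch: Iterable[Row], key_cols: Tuple[str, ...]) -> List[Row]:
    """Hypothetical model of a MERGE: update rows whose key already exists, insert the rest."""
    merged: Dict[Tuple[object, ...], Row] = {tuple(r[c] for c in key_cols): r for r in table}
    for row in batch:
        merged[tuple(row[c] for c in key_cols)] = row  # matched -> update, otherwise insert
    return list(merged.values())


if __name__ == "__main__":
    key = ("stock_symbol", "timestamp_utc")
    bronze = [
        {"stock_symbol": "AAPL", "timestamp_utc": "2025-08-19 20:00:00", "price_current": "230.56"},
        {"stock_symbol": "AAPL", "timestamp_utc": "2025-08-22 16:09:54", "price_current": "227.36"},
    ]
    # Simulate running the notebook twice over the same Bronze files.
    appended = append(append([], bronze), bronze)
    merged = upsert(upsert([], bronze, key), bronze, key)
    print(len(appended), len(merged))  # 4 2
```

With the upsert, re-running the notebook is idempotent for quotes it has already seen. The cost is a join against the existing table on every run.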