# Energy Demand ETL Project Setup

## Why I Set Up This Notebook
I want to make it easy to test my energy demand ETL project before moving it to production. By having a clear setup, I can avoid mistakes and keep my work organized.

## How I Handle Environments
Because I need to switch between development and production, I use a widget to pick the environment. This lets me run everything in 'dev' mode for testing and then switch to 'prod' when I am ready, which keeps test data separate from real data. In practice I still used the real data in the dev environment because it was small and faster that way, but it was a useful way to learn how to implement separate environments for when I need them.
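
As a minimal sketch of guarding the environment choice, the helper below normalizes a value and rejects anything other than 'dev' or 'prod'. The names `ALLOWED_ENVS` and `resolve_env` are hypothetical and not part of this notebook; the dropdown widget already restricts the choices, so a check like this would mainly help if the value were ever passed in as a job parameter instead.

```python
# Hypothetical helper, not part of the notebook: validate an environment name.
ALLOWED_ENVS = ("dev", "prod")


def resolve_env(raw_value, default="dev"):
    """Return a normalized environment name, or raise if it is not allowed."""
    # Fall back to the default when the value is empty or None.
    value = (raw_value or default).strip().lower()
    if value not in ALLOWED_ENVS:
        raise ValueError(
            f"Unknown environment {raw_value!r}; expected one of {ALLOWED_ENVS}"
        )
    return value
```

In the notebook, this could wrap the widget value, for example `env = resolve_env(dbutils.widgets.get("env"))`.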

## Dynamic Naming and Storage
The catalog and volume names are fixed, and the schema name is the environment I choose ('dev' or 'prod'). This means all my table names and storage paths resolve automatically to either 'dev' or 'prod'. I also define paths for landing data and checkpoints inside the volume, plus the URL of the raw data source, so everything is easy to find.
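
The naming scheme described above can be sketched as a small pure-Python function that runs without a cluster. The function name `build_config` and its dictionary keys are hypothetical and only for illustration; the catalog and volume names are the ones this notebook uses.

```python
# Hypothetical helper, not part of the notebook: derive names and paths from env.
def build_config(env, catalog="energy_demand_catalog", volume="energy_demand_etl_vol"):
    """Build the schema name and volume paths for the given environment."""
    schema = env  # the schema is named after the environment: 'dev' or 'prod'
    volume_root = f"/Volumes/{catalog}/{schema}/{volume}"
    return {
        "database": f"{catalog}.{schema}",
        "volume_root": volume_root,
        "landing_path": f"{volume_root}/landing",
        "checkpoint_path": f"{volume_root}/checkpoints",
    }


dev_config = build_config("dev")
prod_config = build_config("prod")
print(dev_config["volume_root"])
print(prod_config["volume_root"])
```

Because only the schema segment changes, the 'dev' and 'prod' paths never overlap.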

## What This Setup Achieves
With this setup, I can:
* Quickly switch between environments
* Keep my data organized and separated
* Create the catalog, schema, and volume if they do not already exist
* Print out the current environment and storage path to check everything is correct

This makes it simple for me to manage resources and avoid mixing up test and production data.

In [0]:
# init widget
dbutils.widgets.dropdown("env", "dev", ["dev", "prod"])
env = dbutils.widgets.get("env")

# constants (the schema name follows the selected environment)
CATALOG_NAME = "energy_demand_catalog"
SCHEMA_NAME = env  # 'dev' or 'prod'
VOLUME_NAME = "energy_demand_etl_vol"

# create the catalog, schema, and volume if they do not exist
spark.sql(f"CREATE CATALOG IF NOT EXISTS {CATALOG_NAME}")
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {CATALOG_NAME}.{SCHEMA_NAME}")
spark.sql(f"CREATE VOLUME IF NOT EXISTS {CATALOG_NAME}.{SCHEMA_NAME}.{VOLUME_NAME}")

# set the default catalog and schema for this session
spark.sql(f"USE CATALOG {CATALOG_NAME}")
spark.sql(f"USE SCHEMA {SCHEMA_NAME}")

# define table names
table_bronze_name = "raw_power_usage"
table_silver_name = "silver_power_consumption"
table_gold_hourly_name = "agg_hourly_metrics"
table_gold_daily_name = "agg_daily_metrics"
table_gold_submeter_name = "agg_submeter_metrics"

# construct fully qualified table names (catalog.schema.table)
full_path_bronze = f"{CATALOG_NAME}.{SCHEMA_NAME}.{table_bronze_name}"
full_path_silver = f"{CATALOG_NAME}.{SCHEMA_NAME}.{table_silver_name}"
full_path_gold_hourly = f"{CATALOG_NAME}.{SCHEMA_NAME}.{table_gold_hourly_name}"
full_path_gold_daily = f"{CATALOG_NAME}.{SCHEMA_NAME}.{table_gold_daily_name}"
full_path_gold_submeter = f"{CATALOG_NAME}.{SCHEMA_NAME}.{table_gold_submeter_name}"

# define storage paths inside the volume and the raw data source URL
volume_root = f"/Volumes/{CATALOG_NAME}/{SCHEMA_NAME}/{VOLUME_NAME}"
landing_path = f"{volume_root}/landing"
checkpoint_path = f"{volume_root}/checkpoints"
raw_data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00235/household_power_consumption.zip"

print(f"Environment: {env.upper()}")
print(f"Storage Path: {volume_root}")
print(f"Database: {CATALOG_NAME}.{SCHEMA_NAME}")