# 🥇 **Gold Layer: Business Intelligence & Analytical Optimization**

The **Gold Layer** (also known as the "Curated" or "Business" layer) is the final destination in our data pipeline. Here, data is transformed from cleaned records into high-level, aggregated business entities that are ready for immediate consumption by stakeholders, dashboards, and machine learning models.

---

### 🌐 **Strategic Integration: Google Colab & Databricks**

To maximize development speed and analytical flexibility, we use **Google Colab** as our primary development environment while relying on the processing power of the **Databricks SQL Warehouse**.



### ⚙️ **General Overview of Execution**

* **The Interface (Google Colab):** We use Colab for its lightweight UI and easy access to Python visualization libraries. It serves as the "Command Center" where we write our SQL logic and Python transformations.
* **The Engine (Databricks SQL Warehouse):** Although the code is written in Colab, the **compute resources** (CPU, RAM, and disk) for each query are provided by Databricks. When a query is executed, it is sent to the SQL Warehouse, which processes the data and returns the result set.
* **The Result:** Only the rows a query returns are sent back to Colab, so filtering and aggregating in SQL lets us work with "Big Data" without exhausting the Colab runtime's memory.
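
One way to keep the returned result small is to let the warehouse do the grouping and send back only the summary rows. The sketch below only builds the SQL text and never contacts Databricks. The helper name `build_summary_query` is hypothetical, and the table and column names (`sales.silver_layer.sales`, `Sales_Channel`, `Order_Quantity`) are borrowed from the Silver tables listed later in this notebook.

```python
def build_summary_query(table: str, group_col: str, measure_col: str) -> str:
    """Return SQL that aggregates in the warehouse instead of in Colab.

    Hypothetical helper: it only assembles the query string.
    """
    return (
        f"SELECT {group_col}, SUM({measure_col}) AS total_{measure_col.lower()} "
        f"FROM {table} "
        f"GROUP BY {group_col} "
        f"ORDER BY {group_col}"
    )


summary_sql = build_summary_query(
    table="sales.silver_layer.sales",
    group_col="Sales_Channel",
    measure_col="Order_Quantity",
)
print(summary_sql)
```

Sent through an open connection, a query like this would return one row per sales channel rather than every order line.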

---

## 🛠️ **Environment Setup & Dependency Installation**

To prepare our analytical bridge, we install the necessary driver and import the libraries required for secure communication and data manipulation.

---

### 📦 **Libraries & Dependencies**

* **`databricks-sql-connector`**: The official Python driver used to send SQL commands from Google Colab to the Databricks SQL Warehouse.
* **`pandas`**: The primary data analysis library used to store and manipulate result sets in local DataFrames.
* **`google.colab.userdata`**: A Colab utility used to retrieve credentials (host, path, and token) from the Colab Secrets vault instead of hard-coding them in the notebook.

---

### 🚀 **The Data Flow Architecture**

This setup allows the **Google Colab Interface** to trigger heavy-duty processing on **Databricks Resources**, returning only the final insights for local analysis.

In [4]:
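# Install the Databricks SQL driver in the Colab runtime (quiet output)
!pip install databricks-sql-connector --quiet

import pandas as pd                  # local DataFrames for result sets
from databricks import sql           # Databricks SQL connector
from google.colab import userdata    # access to Colab Secrets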

## 🔗 **Establishing the Connection & Table Check**

We are now connecting to the Databricks engine and checking which data is available to us in the Silver Layer.

---

### 🔐 **Secure Authentication**
* **Credential Retrieval**: We read the host, HTTP path, and access token from the Colab secrets vault.
* **The Handshake**: We call the `sql.connect` function to link our Colab notebook directly to the Databricks SQL Warehouse.
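
Before attempting the handshake, it can help to fail fast when a credential is empty or absent. The following is a minimal sketch, not part of the original notebook: `load_credentials` and `REQUIRED_SECRETS` are hypothetical names, and a plain dict stands in for the Colab secrets vault so the code runs anywhere. In Colab, `userdata.get` could be passed as `get_secret` instead.

```python
from typing import Callable, Dict, List, Optional

# Secret names used by this notebook's connection cell.
REQUIRED_SECRETS = ("DB_HOST", "DB_PATH", "DB_TOKEN")


def load_credentials(get_secret: Callable[[str], Optional[str]]) -> Dict[str, str]:
    """Fetch every required secret; raise if any is missing or empty."""
    values: Dict[str, str] = {}
    missing: List[str] = []
    for name in REQUIRED_SECRETS:
        value = get_secret(name)
        if value:
            values[name] = value
        else:
            missing.append(name)
    if missing:
        raise ValueError(f"Missing secrets: {', '.join(missing)}")
    return values


# Demo: a dict with placeholder values stands in for the secrets vault.
fake_vault = {
    "DB_HOST": "example-host",
    "DB_PATH": "/sql/example",
    "DB_TOKEN": "example-token",
}
creds = load_credentials(fake_vault.get)
print(sorted(creds))  # ['DB_HOST', 'DB_PATH', 'DB_TOKEN']
```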

### 🔍 **Inventory Check**
* **Exploring the Silver Layer**: We run a quick query to list all tables currently cleaned and ready in the Silver Layer.
* **Data Preview**: The results are displayed in a clean table so we can confirm exactly which datasets are ready for Gold-layer analysis.

In [5]:
# Read the connection details from Colab Secrets
host = userdata.get("DB_HOST")
path = userdata.get("DB_PATH")
token = userdata.get("DB_TOKEN")

# Open a connection to the Databricks SQL Warehouse
connection = sql.connect(
    server_hostname=host,
    http_path=path,
    access_token=token,
)

# List all tables in the Silver Layer
schema_query = "SHOW TABLES IN sales.silver_layer"
df_tables = pd.read_sql(schema_query, connection)
display(df_tables)


|   | database | tableName | isTemporary |
|---|---|---|---|
| 0 | silver_layer | customers | False |
| 1 | silver_layer | products | False |
| 2 | silver_layer | regions | False |
| 3 | silver_layer | sales | False |
| 4 | silver_layer | sales_teams | False |
| 5 | silver_layer | stores_locations | False |


## 📥 **Automated Data Ingestion to Python**

Now that we have connected to Databricks, we are automating the process of bringing our **Silver Layer** tables into the local Colab environment for analysis.

---

### 🔄 **Dynamic Table Loading**

* **The Loop**: We iterate through the list of table names discovered in the previous step.
* **Dynamic Queries**: For every table found, we automatically generate a `SELECT *` query to pull all its data.
* **Global Variable Creation**: We use the `globals()` function to automatically create a new Python DataFrame variable for each table (e.g., `df_customers`, `df_sales`).
* **Verification**: The code prints the name and columns of each new DataFrame so we can confirm the data has loaded correctly.
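
As an alternative to writing into `globals()`, the same loop could collect the tables in a dictionary keyed by table name, which keeps the namespace tidy and makes the loaded set easy to iterate. The sketch below is an assumption-laden illustration, not the notebook's method: `load_tables` and `run_query` are hypothetical names, and a stub that echoes the SQL replaces the warehouse so the code runs without a connection. It also checks that each table name is a plain identifier before placing it in the f-string.

```python
import re
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def load_tables(
    table_names: Iterable[str],
    run_query: Callable[[str], T],
    schema: str = "sales.silver_layer",
) -> Dict[str, T]:
    """Run SELECT * for each table and return the results keyed by table name."""
    frames: Dict[str, T] = {}
    for name in table_names:
        # Reject anything that is not a plain identifier before building SQL.
        if not _IDENTIFIER.match(name):
            raise ValueError(f"Unexpected table name: {name!r}")
        frames[name] = run_query(f"SELECT * FROM {schema}.{name}")
    return frames


# Demo: the stub returns the SQL text instead of querying a warehouse.
frames = load_tables(["customers", "products"], run_query=lambda q: q)
print(frames["customers"])  # SELECT * FROM sales.silver_layer.customers
```

With a live connection, `run_query` could be `lambda q: pd.read_sql(q, connection)`, and `frames["customers"]` would then play the role of `df_customers`.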



---

### 💡 **What this means for you**
You don't have to load tables one by one. You now have local copies of all your Silver data ready to be joined and aggregated using Python or SQL!

In [6]:
# Load each Silver table into its own DataFrame
for table_name in df_tables["tableName"]:
    query = f"SELECT * FROM sales.silver_layer.{table_name}"
    df_name = f"df_{table_name}"
    df = pd.read_sql(query, connection)
    globals()[df_name] = df  # creates df_customers, df_products, ...
    print(f"dataframe : {df_name}, columns : {list(df.columns)}")

dataframe : df_customers, columns : ['Customer_ID', 'Customer_Name', 'ingestion_data']

dataframe : df_products, columns : ['Product_ID', 'Product_Name', 'Category', 'ingestion_data']

dataframe : df_regions, columns : ['State_Code', 'State', 'Region', 'ingestion_data']

dataframe : df_sales, columns : ['Order_Number', 'Sales_Channel', 'Warehouse_Code', 'Purchased_Date', 'Order_Date', 'Ship_Date', 'Delivery_Date', 'Currency_Code', 'Sales_Team_ID', 'Customer_ID', 'Store_ID', 'Product_ID', 'Order_Quantity', 'Discount_Applied', 'Unit_Price', 'Unit_Cost', 'ingestion_data']

dataframe : df_sales_teams, columns : ['Sales_Team_ID', 'Sales_Team', 'Region', 'ingestion_data']

dataframe : df_stores_locations, columns : ['Store_ID', 'City_Name', 'County', 'State_Code', 'State', 'Type', 'Area_Code', 'Population', 'Household_Income', 'Median_Income', 'Land_Area', 'Water_Area', 'Time_Zone', 'ingestion_data']


## 🚀 **Next Steps: Building Gold Business Views**

Now that we have successfully loaded our data into Python, we move to the final transformation phase: creating the **Gold Layer** results.

---

### 📊 **The Path to Insights**

* **SQL Joins & Aggregation**: Our primary goal is to merge the separate Silver tables (like Customers, Sales, and Products) into a single, comprehensive "Master View."
* **Dashboard Readiness**: We focus on creating high-level metrics (such as Total Revenue by Region or Customer Growth) that are ready to be plugged directly into BI tools like Power BI, Tableau, or Dash.
* **Refining the Truth**: These initial results serve as our "Gold Standard." While we can add more complex logic later, this step provides the clean, final numbers the business needs right now.
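
To make the join-and-aggregate step concrete, here is a minimal sketch of a "Total Revenue by Region" view. Several things in it are assumptions rather than facts from this notebook: an in-memory SQLite database stands in for the warehouse so the example runs anywhere, the rows are made up, the join path (`sales` to `stores_locations` on `Store_ID`, then to `regions` on `State_Code`) is inferred from the column names, and revenue is assumed to be `Order_Quantity * Unit_Price * (1 - Discount_Applied)` with the discount stored as a fraction.

```python
import sqlite3

# Tiny made-up rows; only the table and column names come from the Silver Layer.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (Order_Number TEXT, Store_ID INTEGER,
                        Order_Quantity INTEGER, Discount_Applied REAL,
                        Unit_Price REAL);
    CREATE TABLE stores_locations (Store_ID INTEGER, State_Code TEXT);
    CREATE TABLE regions (State_Code TEXT, Region TEXT);

    INSERT INTO sales VALUES ('SO-1', 1, 2, 0.0, 100.0),
                             ('SO-2', 1, 1, 0.5, 200.0),
                             ('SO-3', 2, 4, 0.0, 50.0);
    INSERT INTO stores_locations VALUES (1, 'CA'), (2, 'NY');
    INSERT INTO regions VALUES ('CA', 'West'), ('NY', 'Northeast');
    """
)

# Assumed revenue formula and join keys; adjust to the real business rules.
GOLD_REVENUE_BY_REGION = """
    SELECT r.Region,
           SUM(s.Order_Quantity * s.Unit_Price * (1 - s.Discount_Applied))
               AS total_revenue
    FROM sales AS s
    JOIN stores_locations AS st ON s.Store_ID = st.Store_ID
    JOIN regions AS r ON st.State_Code = r.State_Code
    GROUP BY r.Region
    ORDER BY r.Region
"""

revenue_by_region = conn.execute(GOLD_REVENUE_BY_REGION).fetchall()
conn.close()
print(revenue_by_region)  # [('Northeast', 200.0), ('West', 300.0)]
```

Against Databricks, a similar SELECT with the tables qualified as `sales.silver_layer.<table>` could be passed to `pd.read_sql`, so that only the aggregated rows reach Colab.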

### 🏁 **Final Outcome**
At the end of this stage, we will have **consolidated DataFrames** that represent the actual performance of the business, moving from technical data cleaning to strategic decision-making.