### Spark session configuration
This cell configures the Spark session to enable _Verti-Parquet_ and _Optimize on Write_. See the tutorial document for more details on both features.

In [1]:
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.microsoft.delta.optimizeWrite.binSize", "1073741824")

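The `binSize` value set in the cell above is easier to read once converted. As a quick sanity check, and assuming the setting is expressed in bytes (this notebook does not state the unit), the sketch below shows that 1073741824 is exactly 1 GiB. The helper name `bytes_to_gib` is hypothetical and used only for illustration.

```python
# Assumption: spark.microsoft.delta.optimizeWrite.binSize is given in bytes.
BIN_SIZE = "1073741824"  # the string passed to spark.conf.set in the cell above


def bytes_to_gib(value: str) -> float:
    """Convert a byte count (passed as a string, as in the Spark conf) to GiB."""
    return int(value) / 1024 ** 3


print(bytes_to_gib(BIN_SIZE))
```

The same arithmetic works in reverse when trying a different size: `str(512 * 1024 ** 2)` yields the string for 512 MiB.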

### Fact - Sale

This cell reads raw data from the _Files_ section of the lakehouse, adds `Year`, `Quarter`, and `Month` columns derived from `InvoiceDateKey`, and writes the result as a fact Delta table partitioned by `Year` and `Quarter`.

In [6]:
from pyspark.sql.functions import col, year, month, quarter

table_name = 'fact_sale'

# 1. Read the raw data
df = spark.read.format("parquet").load('Files/wwi-raw-data/full/fact_sale_1y_full')

# 2. Create the date columns (Year, Quarter, Month)
df = df.withColumn('Year', year(col("InvoiceDateKey")))
df = df.withColumn('Quarter', quarter(col("InvoiceDateKey")))
df = df.withColumn('Month', month(col("InvoiceDateKey")))

# 3. Write with saveAsTable, partitioned by Year and Quarter
# Leave "Tables/" out of the path: saveAsTable expects only the table name.
df.write.mode("overwrite").format("delta").partitionBy("Year", "Quarter").saveAsTable(table_name)

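To make the date parts and the partitioning concrete, here is a plain-Python sketch (no Spark required) of what the cell above computes for a single row. It assumes `quarter` maps months 1-3 to 1, 4-6 to 2, and so on, and that the table's partition folders follow the common `column=value` naming; the sample date and the helper names `date_parts` and `partition_dir` are hypothetical.

```python
from datetime import date


def date_parts(d: date) -> dict:
    """Mirror the Year, Quarter, and Month columns added with withColumn."""
    return {"Year": d.year, "Quarter": (d.month - 1) // 3 + 1, "Month": d.month}


def partition_dir(parts: dict, partition_cols=("Year", "Quarter")) -> str:
    """Build a column=value folder path (assumed layout) for the partition columns."""
    return "/".join(f"{c}={parts[c]}" for c in partition_cols)


# Hypothetical InvoiceDateKey value, for illustration only.
parts = date_parts(date(2013, 8, 15))
print(parts)
print(partition_dir(parts))
```

Note that `Month` is added as a regular column but is not a partition column, so rows from all three months of a quarter land in the same partition.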

### Dimensions
This cell defines a function that reads raw data from the _Files_ section of the lakehouse for the table name passed as a parameter, drops the `Photo` column if present, and saves the result as a Delta table. It then defines a list of dimension tables and uses a _for loop_ to call the function once per table name.

In [7]:
from pyspark.sql.types import *

def loadFullDataFromSource(table_name):
    # 1. Read the file
    df = spark.read.format("parquet").load('Files/wwi-raw-data/full/' + table_name)
    # 2. Drop the Photo column (to clean up the data)
    df = df.select([c for c in df.columns if c != 'Photo'])
    
    # 3. Use saveAsTable instead of save
    # This forces Fabric to recognize it as a table immediately
    df.write.mode("overwrite").format("delta").saveAsTable(table_name)

full_tables = [
    'dimension_city',
    'dimension_customer',
    'dimension_date',
    'dimension_employee',
    'dimension_stock_item'
]

for table in full_tables:
    loadFullDataFromSource(table)

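The two pure-Python pieces of `loadFullDataFromSource` (building the source path and filtering out `Photo`) can be tried without a Spark session. The sketch below isolates them; the helper names `source_path` and `keep_columns` and the sample column list are hypothetical, not part of the tutorial.

```python
RAW_ROOT = 'Files/wwi-raw-data/full/'  # same prefix used in the cell above


def source_path(table_name: str) -> str:
    """Build the raw-data path the same way the load call does."""
    return RAW_ROOT + table_name


def keep_columns(columns, excluded=('Photo',)):
    """Return the columns in their original order, minus the excluded names."""
    return [c for c in columns if c not in excluded]


# Hypothetical column list, for illustration only.
sample_columns = ['EmployeeKey', 'Employee', 'Photo']

for table in ['dimension_city', 'dimension_employee']:
    print(source_path(table), keep_columns(sample_columns))
```

Because the filter leaves the list unchanged when `Photo` is absent, the same function can be applied to every dimension table without special-casing.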