# Lab 1: Create Direct Lake custom semantic model

In this lab you will use [Semantic Link Labs](https://github.com/microsoft/semantic-link-labs) to perform the following tasks:

- Create a Lakehouse
- Create a custom semantic model
- Customize the semantic model by: 
    - Adding measures 
    - Adding relationships
    - Marking a Date Table
    - Setting sortby columns
    - Hiding columns
- Run DAX queries and a DMV query

## 1. Install Semantic Link Labs Python Library
This step installs Semantic Link Labs, a Python library designed for use in Microsoft Fabric notebooks. The library extends the capabilities of [Semantic Link](https://learn.microsoft.com/en-us/fabric/data-science/semantic-link-overview), offering additional functionality that integrates seamlessly alongside it.

In [None]:
%pip install -q --disable-pip-version-check semantic-link-labs

## 2. Import Python Libraries
This step does the following:
- Imports the libraries that will be used later in the script for data processing, manipulation and handling
- Creates and populates the following variables
    - LakehouseName - Used as the name for the Lakehouse that will be created later in this script
    - SemanticModelName - Used as the name for the Semantic Model that will be created later in this script

In [None]:
import sempy_labs as labs
from sempy import fabric
import sempy
import pandas
import json
import time

LakehouseName = "AdventureWorks"
SemanticModelName = f"{LakehouseName}_model"

## 3. Create Lakehouse
This step uses Semantic Link Labs to:
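1. Check whether a Lakehouse with the name stored in the `LakehouseName` variable already exists
2. If it doesn't exist, create a new Lakehouse using the name from the `LakehouseName` variable created in the previous step
3. Retrieve and print the workspace ID, Lakehouse ID and workspace name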

In [None]:
lakehouses=labs.list_lakehouses()["Lakehouse Name"]
if LakehouseName in lakehouses.values:
    lakehouseId = notebookutils.lakehouse.getWithProperties(LakehouseName)["id"]
else:
    lakehouseId = fabric.create_lakehouse(LakehouseName)

workspaceId = notebookutils.lakehouse.getWithProperties(LakehouseName)["workspaceId"]
workspaceName = sempy.fabric.resolve_workspace_name(workspaceId)
print(f"WorkspaceId = {workspaceId}, LakehouseID = {lakehouseId}, Workspace Name = {workspaceName}")

## 4. Copy data from source into local lakehouse
The following code cell defines a function `loadDataToLakehouse` that:
1.  Retrieves the workspace ID and lakehouse ID from the lakehouse properties.
2.  Reads the data from a specified source table in a shared lakehouse (the source location depends on the capacity name).
3.  Overwrites the target table in your lakehouse with this data.

The function is invoked four times to load data from different source tables to their respective target tables:
1. "adw_DimCustomer" to "DimCustomer"
2. "adw_DimDate" to "DimDate"
3. "adw_DimProduct" to "DimProduct"
4. "adw_FactInternetSales" to "FactInternetSales"

Finally, the code prints "Done" to indicate the completion of the data loading process.

In [None]:
capacity_name = labs.get_capacity_name()

def loadDataToLakehouse(fromTable:str,toTable:str):
    workspaceId = notebookutils.lakehouse.getWithProperties(LakehouseName)["workspaceId"]
    lakehouseId = notebookutils.lakehouse.getWithProperties(LakehouseName)["id"]

    #North Central US
    conn_str = "abfss://16cf855f-3bf4-4312-a7a1-ccf5cb6a0121@onelake.dfs.fabric.microsoft.com/99ed86df-13d1-4008-a7f6-5768e53f4f85/Tables"
    if capacity_name == "FabConUS8-P1": #West US 3
        conn_str = "abfss://b1d61bbe-de20-4d3a-8075-b8e2eaacb868@onelake.dfs.fabric.microsoft.com/631e45c0-1243-4f42-920a-56bfe6ecdd6d/Tables"

    # Read the source table, then overwrite the target table in this lakehouse
    source_df = spark.read.load(f"{conn_str}/{fromTable}")
    source_df.write.mode("overwrite").save(f"abfss://{workspaceId}@onelake.dfs.fabric.microsoft.com/{lakehouseId}/Tables/{toTable}")
    print(f"Loaded {toTable}")

loadDataToLakehouse("adw_DimCustomer"       ,"DimCustomer")
loadDataToLakehouse("adw_DimDate"           ,"DimDate")
loadDataToLakehouse("adw_DimProduct"        ,"DimProduct")
loadDataToLakehouse("adw_FactInternetSales" ,"FactInternetSales")
print("Done")
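
The target path written by `loadDataToLakehouse` follows the pattern `abfss://<workspaceId>@onelake.dfs.fabric.microsoft.com/<lakehouseId>/Tables/<table>`. As a minimal sketch, the path construction could be factored into a small helper. The name `onelake_table_path` is hypothetical (it is not part of Semantic Link Labs), and the IDs used below are placeholders.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"

def onelake_table_path(workspace_id: str, lakehouse_id: str, table: str) -> str:
    """Build the abfss:// URI of a lakehouse table (hypothetical helper)."""
    return f"abfss://{workspace_id}@{ONELAKE_HOST}/{lakehouse_id}/Tables/{table}"

# Placeholder IDs, for illustration only
print(onelake_table_path("<workspace-id>", "<lakehouse-id>", "DimCustomer"))
```

Keeping the pattern in one place makes it harder for the read path and the write path to drift apart.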

## 5. Trigger background job to sync Lakehouse tables
The following code cell creates the function `triggerMetadataRefresh`, which forces the SQL analytics endpoint to sync its automatically generated schema with the lakehouse tables. The function does the following:
1. It creates a FabricRestClient instance to interact with the API.
2. It retrieves metadata for a specific lakehouse within a workspace using their respective IDs.
3. It extracts the SQL endpoint ID from the response.
4. It triggers a metadata refresh on the lakehouse by sending a POST request with the necessary payload.
5. It monitors the progress of the metadata refresh by querying the status once per second until it reports success.
6. It prints the progress state at each step and indicates when the process is complete.

Finally, the cell calls the `triggerMetadataRefresh` function to execute the process.

In [None]:
# Source: https://medium.com/@sqltidy/delays-in-the-automatically-generated-schema-in-the-sql-analytics-endpoint-of-the-lakehouse-b01c7633035d

def triggerMetadataRefresh():
    client = fabric.FabricRestClient()
    response = client.get(f"/v1/workspaces/{workspaceId}/lakehouses/{lakehouseId}")
    sqlendpoint = response.json()['properties']['sqlEndpointProperties']['id']

    # trigger sync
    uri = f"/v1.0/myorg/lhdatamarts/{sqlendpoint}"
    payload = {"commands":[{"$type":"MetadataRefreshExternalCommand"}]}
    response = client.post(uri,json= payload)
    batchId = response.json()['batchId']

    # Monitor Progress
    statusuri = f"/v1.0/myorg/lhdatamarts/{sqlendpoint}/batches/{batchId}"
    statusresponsedata = client.get(statusuri).json()
    progressState = statusresponsedata['progressState']
    print(f"Metadata refresh : {progressState}")
    while progressState != "success":
        statusuri = f"/v1.0/myorg/lhdatamarts/{sqlendpoint}/batches/{batchId}"
        statusresponsedata = client.get(statusuri).json()
        progressState = statusresponsedata['progressState']
        print(progressState)
        time.sleep(1)

    print('Metadata refresh complete')

triggerMetadataRefresh()
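
The monitoring loop above keeps polling until the state equals "success", so it would not stop if the refresh ended in some other final state. The sketch below shows one way to bound such a loop with a timeout and a set of failure states. `wait_for_state` is a hypothetical helper, and the state names "failure" and "inProgress" are assumptions used for illustration; the actual values returned by the endpoint are not confirmed here.

```python
import time
from typing import Callable, Iterable

def wait_for_state(get_state: Callable[[], str],
                   success: str = "success",
                   failure_states: Iterable[str] = ("failure",),  # assumed names
                   timeout_s: float = 300.0,
                   poll_interval_s: float = 1.0) -> str:
    """Poll get_state() until it reports success, a failure state, or a timeout."""
    failure_states = set(failure_states)
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == success:
            return state
        if state in failure_states:
            raise RuntimeError(f"Operation ended in state '{state}'")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still '{state}' after {timeout_s} seconds")
        time.sleep(poll_interval_s)

# Stand-in for the status request: two interim states, then success
_states = iter(["inProgress", "inProgress", "success"])
result = wait_for_state(lambda: next(_states), poll_interval_s=0.01)
print(result)
```

In the notebook, `get_state` could be a small function that calls `client.get(statusuri).json()['progressState']`.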

## 6. Create Custom Semantic Model from Lakehouse
The following code cell:
1.  Creates a list variable and assigns it the list of tables from the Lakehouse
2.  Checks for the existence of the Semantic Model
3.  If the Semantic Model does not exist, creates the model and adds all the tables from the previously created variable. If an error occurs, the cell triggers a metadata refresh and tries again.

In [None]:
from sempy import fabric

#1. Generate list of ALL table names from lakehouse to add to Semantic Model
lakehouseTables:list = labs.lakehouse.get_lakehouse_tables(lakehouse=LakehouseName)["Table Name"]

completedOK:bool=False
while not completedOK:
    try:
        #2 Create the semantic model only if it does not already exist
        if sempy.fabric.list_items().query(f"`Display Name`=='{LakehouseName}_model' & Type=='SemanticModel'  ").shape[0] ==0:
            labs.directlake.generate_direct_lake_semantic_model(dataset=f"{LakehouseName}_model",lakehouse_tables=lakehouseTables,workspace=workspaceName,lakehouse=lakehouseId,refresh=False,overwrite=True)
        # Leave the loop whether the model was just created or already existed
        completedOK=True
    except Exception as e:
        print(f'Error creating model... trying again. ({e})')
        time.sleep(3)
        triggerMetadataRefresh()

print('Semantic model created OK')
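
Several cells in this lab retry an operation in a `while` loop until it succeeds. As a hedged sketch, the same idea can be written once as a helper that stops after a fixed number of attempts instead of looping indefinitely. `retry`, `max_attempts`, `delay_s` and `on_error` are hypothetical names introduced for illustration, and `flaky` is a stand-in for the real operation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def retry(action: Callable[[], T],
          max_attempts: int = 5,
          delay_s: float = 3.0,
          on_error: Optional[Callable[[Exception], None]] = None) -> T:
    """Call action() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as exc:  # KeyboardInterrupt is not swallowed
            print(f"Attempt {attempt} of {max_attempts} failed: {exc}")
            if attempt == max_attempts:
                raise  # give up and surface the last error
            if on_error is not None:
                on_error(exc)  # e.g. trigger a metadata refresh
            time.sleep(delay_s)
    raise ValueError("max_attempts must be at least 1")

# Stand-in action that fails twice before succeeding
calls = {"count": 0}

def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("not ready yet")
    return "created"

print(retry(flaky, delay_s=0.01))
```

In this notebook, `on_error` could be `lambda _: triggerMetadataRefresh()`, which mirrors what the cell above does after a failed attempt.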

## 7. Add model relationships
The code block:
1. Opens a connection to the semantic model.
2. Removes any existing relationships in the model.
3. Adds new relationships between tables in the model:
    - Links "FactInternetSales.OrderDateKey" to "DimDate.DateKey" with a many-to-one cardinality.
    - Links "FactInternetSales.CustomerKey" to "DimCustomer.CustomerKey" with a many-to-one cardinality.
    - Links "FactInternetSales.ProductKey" to "DimProduct.ProductKey" with a many-to-one cardinality.

In [None]:
completedOK:bool=False
while not completedOK:
    try:
        with labs.tom.connect_semantic_model(dataset=SemanticModelName, readonly=False) as tom:
            #1. Remove any existing relationships
            # Iterate over a snapshot so removals do not disturb the loop
            for r in list(tom.model.Relationships):
                tom.model.Relationships.Remove(r)

            #2. Creates correct relationships
            tom.add_relationship(from_table="FactInternetSales", from_column="OrderDateKey" , to_table="DimDate"    , to_column="DateKey"       , from_cardinality="many" , to_cardinality="one")
            tom.add_relationship(from_table="FactInternetSales", from_column="CustomerKey"  , to_table="DimCustomer", to_column="CustomerKey"   , from_cardinality="many" , to_cardinality="one")
            tom.add_relationship(from_table="FactInternetSales", from_column="ProductKey"   , to_table="DimProduct" , to_column="ProductKey"    , from_cardinality="many" , to_cardinality="one")
            completedOK=True
    except Exception as e:
        print(f'Error adding relationships... trying again. ({e})')
        time.sleep(3)

print('done')
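
Removing items from a collection while iterating over that same collection is fragile. The plain-Python sketch below uses a list as a stand-in for `tom.model.Relationships` to show the effect; the .NET collection may behave differently (for example, by raising an error), so treat this as an illustration of the general pattern only. Iterating over a snapshot made with `list(...)` avoids the problem.

```python
# A plain list stands in for the relationship collection (illustration only)
relationships = ["rel1", "rel2", "rel3", "rel4"]

# Removing while iterating the same list skips elements
unsafe = list(relationships)
for r in unsafe:
    unsafe.remove(r)
print(unsafe)  # some items are left behind

# Iterating over a snapshot removes every item
safe = list(relationships)
for r in list(safe):
    safe.remove(r)
print(safe)  # empty
```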


## 8. Add model measures
The code block:
1. Opens a connection to the semantic model.
2. Removes any existing measures in the model.
3. Adds two new measures to the FactInternetSales table: `Sum of Sales` and `Count of Sales`.

In [None]:

completedOK:bool=False
while not completedOK:
    try:
        with labs.tom.connect_semantic_model(dataset=SemanticModelName, readonly=False) as tom:
            #1. Remove any existing measures
            for t in tom.model.Tables:
                # Iterate over a snapshot so removals do not disturb the loop
                for m in list(t.Measures):
                    measure_name = m.Name
                    tom.remove_object(m)
                    print(f"[{measure_name}] measure removed")

            #2. Add the measures (raw string keeps the backslash-escaped dollar sign intact)
            tom.add_measure(table_name="FactInternetSales" ,measure_name="Sum of Sales",expression="SUM(FactInternetSales[SalesAmount])",format_string=r"\$#,0.###############;(\$#,0.###############);\$#,0.###############")
            tom.add_measure(table_name="FactInternetSales" ,measure_name="Count of Sales",expression="COUNTROWS(FactInternetSales)",format_string="#,0")
            completedOK=True
    except Exception as e:
        print(f'Error adding measures... trying again. ({e})')
        time.sleep(3)

print('done')
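
The format string passed for `Sum of Sales` consists of three sections separated by semicolons: one for positive values, one for negative values (shown in parentheses) and one for zero. As a small sketch, it can be assembled from a single pattern; a raw string is used so that Python leaves the backslash before `$` untouched. The variable names are illustrative only.

```python
# Raw string keeps the backslash that escapes the dollar sign in the format string
number_pattern = r"\$#,0.###############"

# Sections: positive; negative (in parentheses); zero
currency_format = f"{number_pattern};({number_pattern});{number_pattern}"
print(currency_format)
```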

## 9. Mark DimDate as Date Table
This code block:

1.  Opens a connection to the semantic model.
2.  Marks the DimDate table as a date table, using its Date column.

In [None]:
completedOK:bool=False
while not completedOK:
    try:
        with labs.tom.connect_semantic_model(dataset=SemanticModelName, readonly=False) as tom:
            tom.mark_as_date_table(table_name="DimDate",column_name="Date")
            completedOK=True
    except Exception as e:
        print(f'Error with date table... trying again. ({e})')
        time.sleep(3)

print('done')

## 10. Set Sort by Columns
This code block:

1. Imports the json library.
2. Opens a read/write connection to the semantic model.
3. Sets the sort order for the columns "MonthName" and "DayOfWeek" in the "DimDate" table using the columns "MonthNumberOfYear" and "DayNumberOfWeek" respectively.
4. Saves the changes to the model.
5. Iterates through the tables in the model to find the "DimDate" table.
6. Once the "DimDate" table is found, converts its definition to JSON and prints it.

In [None]:
import json
tom = labs.tom.TOMWrapper(dataset=SemanticModelName, workspace=workspaceName, readonly=False)
tom.set_sort_by_column(table_name="DimDate",column_name="MonthName"       ,sort_by_column="MonthNumberOfYear")
tom.set_sort_by_column(table_name="DimDate",column_name="DayOfWeek"       ,sort_by_column="DayNumberOfWeek")
tom.model.SaveChanges()

i:int=0
for t in tom.model.Tables:
    if t.Name=="DimDate":
        bim = json.dumps(tom.get_bim()["model"]["tables"][i],indent=4)
        print(bim)
    i=i+1
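
The loop above locates the "DimDate" entry by position, which assumes the table order in `tom.model.Tables` matches the order in the dictionary returned by `tom.get_bim()`. As a sketch under stated assumptions, the table could instead be looked up by its `name` key. The `bim` dictionary below is a hypothetical, heavily trimmed stand-in for the real output, and `find_table` is an illustrative helper, not a library function.

```python
import json

# Hypothetical, heavily trimmed stand-in for the dictionary returned by tom.get_bim()
bim = {
    "model": {
        "tables": [
            {"name": "DimCustomer", "columns": []},
            {"name": "DimDate", "columns": [
                {"name": "MonthName", "sortByColumn": "MonthNumberOfYear"},
            ]},
        ]
    }
}

def find_table(bim: dict, table_name: str) -> dict:
    """Return the table definition with the given name, or raise KeyError."""
    for table in bim["model"]["tables"]:
        if table["name"] == table_name:
            return table
    raise KeyError(table_name)

dim_date = find_table(bim, "DimDate")
print(json.dumps(dim_date, indent=4))
```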

## 11. Hide Fact Table columns
This code block:
1. Iterates through all tables in the `tom.model.Tables` collection.
2. For the table named "FactInternetSales", it sets the `IsHidden` property of each column to `True`.
3. It then converts the table's information to a JSON format and prints it. The index `i` is incremented after processing each table.
4. Saves the changes to the model so that the hidden columns are persisted.

In [None]:
i:int=0
for t in tom.model.Tables:
    if t.Name in ["FactInternetSales"]:
        for c in t.Columns:
            c.IsHidden=True

        bim = json.dumps(tom.get_bim()["model"]["tables"][i],indent=4)
        print(bim)
    i=i+1

# Persist the hidden-column changes, as in the previous step
tom.model.SaveChanges()

## 12. Reframe model to update changes
This code block attempts to reframe the semantic model in a loop until it succeeds. If an exception is raised, it triggers a metadata refresh, waits 3 seconds and tries again. Upon success, it prints a confirmation message.

In [None]:
reframeOK:bool=False
while not reframeOK:
    try:
        result:pandas.DataFrame = labs.refresh_semantic_model(dataset=SemanticModelName)
        reframeOK=True
    except Exception as e:
        print(f'Error with reframe... trying again. ({e})')
        triggerMetadataRefresh()
        time.sleep(3)

print('Custom Semantic Model reframe OK')

## 13. Create function to run DMV
This code cell:
1. Imports the necessary modules
2. Defines the function `runDMV`, which queries the `$SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS` DMV and displays the dictionary size, pageable and resident flags, temperature and last-accessed time of each column

In [None]:
import warnings
import time
from Microsoft.AnalysisServices.Tabular import TraceEventArgs
from typing import Dict, List, Optional, Callable

def runDMV():
    df = sempy.fabric.evaluate_dax(
        dataset=SemanticModelName, 
        dax_string="""
        
        SELECT 
            MEASURE_GROUP_NAME AS [TABLE],
            ATTRIBUTE_NAME AS [COLUMN],
            DATATYPE ,
            DICTIONARY_SIZE 		    AS SIZE ,
            DICTIONARY_ISPAGEABLE 		AS PAGEABLE ,
            DICTIONARY_ISRESIDENT		AS RESIDENT ,
            DICTIONARY_TEMPERATURE		AS TEMPERATURE,
            DICTIONARY_LAST_ACCESSED	AS LASTACCESSED 
        FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS 
        ORDER BY 
            [DICTIONARY_TEMPERATURE] DESC
        
        """)
    display(df)

## 14. DAX Queries
The following code block executes a DAX query that returns the results of the useful `tabletraits()` DAX function. The cell after it uses Semantic Link Labs to display the Direct Lake guardrails.

In [None]:
df=sempy.fabric.evaluate_dax(
    dataset=SemanticModelName, 
    dax_string="""
    
    evaluate tabletraits()
    
    """)
display(df)

In [None]:
df=labs.directlake.get_direct_lake_guardrails()
display(df)

## 15. Run DMV to check column details
This code cell calls the `runDMV` function created earlier in the notebook, which runs a DMV query to describe the state of our custom semantic model.

In [None]:
runDMV()

## 16. Run DAX Query on custom semantic model
The code cell:
1.  Clears the semantic model cache
2.  Executes a DAX query and displays the results
3.  Re-runs the DMV from step 15 so we can compare the before/after to see the effect of running a query

In [None]:
labs.clear_cache(SemanticModelName)

df=sempy.fabric.evaluate_dax(
    dataset=SemanticModelName, 
    dax_string="""
    
    EVALUATE
        SUMMARIZECOLUMNS(
               
                DimDate[MonthName] ,
                "Count of Transactions" , COUNTROWS(FactInternetSales) ,
                "Sum of Sales" , [Sum of Sales] 
        )
        ORDER BY [MonthName]
    """)
display(df)

runDMV()
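
To make the before/after comparison concrete, the sketch below compares two DMV snapshots and lists the columns that became resident after the query ran. `runDMV` only displays its result, so this assumes the rows have been captured separately. The rows are hypothetical values shaped like the DMV output (using the `TABLE`, `COLUMN` and `RESIDENT` aliases from `runDMV`), not real results, and `newly_resident` is an illustrative helper.

```python
# Hypothetical rows shaped like the DMV output (not real results)
before = [
    {"TABLE": "DimDate", "COLUMN": "MonthName", "RESIDENT": False},
    {"TABLE": "FactInternetSales", "COLUMN": "SalesAmount", "RESIDENT": False},
    {"TABLE": "DimCustomer", "COLUMN": "LastName", "RESIDENT": False},
]
after = [
    {"TABLE": "DimDate", "COLUMN": "MonthName", "RESIDENT": True},
    {"TABLE": "FactInternetSales", "COLUMN": "SalesAmount", "RESIDENT": True},
    {"TABLE": "DimCustomer", "COLUMN": "LastName", "RESIDENT": False},
]

def newly_resident(before: list, after: list) -> list:
    """Return (table, column) pairs that are resident now but were not before."""
    was_resident = {(r["TABLE"], r["COLUMN"]) for r in before if r["RESIDENT"]}
    return [(r["TABLE"], r["COLUMN"]) for r in after
            if r["RESIDENT"] and (r["TABLE"], r["COLUMN"]) not in was_resident]

loaded = newly_resident(before, after)
print(loaded)
```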

## 17. Stop the Spark session

In [None]:
mssparkutils.session.stop()