# Lab 1: Create Direct Lake Semantic Model

## Lab Overview

This lab teaches you how to create a **Direct Lake semantic model** from scratch using Microsoft Fabric. You'll learn the complete workflow from data loading to model creation and validation.

### What You'll Build

**Workshop Flow:**
```
1. Lakehouse Setup
   ↓
2. Load Adventure Works Data  
   ↓
3. Create Semantic Model
   ↓
4. Add Relationships
   ↓
5. Create Measures
   ↓
6. Test & Validate
```

**End Result:** A fully functional Direct Lake semantic model ready for Power BI reporting with real-time data access.

### Key Concepts
- **Direct Lake**: Query Delta tables in OneLake directly, without importing a copy of the data
- **Adventure Works**: Sample business dataset with customers, products, and sales
- **Semantic Model**: Business logic layer with relationships and measures

### Learning Objectives
By completing this lab, you'll be able to:
- ✅ Set up a lakehouse and load sample data
- ✅ Create a Direct Lake semantic model programmatically  
- ✅ Define table relationships and business measures
- ✅ Validate model performance and behavior

**Estimated Time**: 30-45 minutes

---

## 1. Install Required Libraries

Install Semantic Link Labs to enable Direct Lake model creation and management capabilities.

In [None]:
%pip install -q --disable-pip-version-check semantic-link-labs

## 2. Import Libraries and Set Variables

Import required libraries and define key variables for the lakehouse and semantic model names.

In [None]:
import sempy_labs as labs
from sempy import fabric
import sempy
import pandas
import json
import time

LakehouseName = "AdventureWorks"
SemanticModelName = f"{LakehouseName}_model"

## 3. Create or Connect to Lakehouse

Check if the AdventureWorks lakehouse exists, create it if needed, and retrieve workspace identifiers.

In [None]:
lakehouses=labs.list_lakehouses()["Lakehouse Name"]
if LakehouseName in lakehouses.values:
    lakehouseId = notebookutils.lakehouse.getWithProperties(LakehouseName)["id"]
else:
    lakehouseId = fabric.create_lakehouse(LakehouseName)

workspaceId = notebookutils.lakehouse.getWithProperties(LakehouseName)["workspaceId"]
workspaceName = sempy.fabric.resolve_workspace_name(workspaceId)
print(f"WorkspaceId = {workspaceId}, LakehouseID = {lakehouseId}, Workspace Name = {workspaceName}")

## 4. Load Adventure Works Sample Data

Load four Adventure Works tables (Customer, Date, Product, Sales) into the lakehouse using region-aware data sources.

**Tables being loaded:**
- DimCustomer (~18K customers)  
- DimDate (2K+ dates)
- DimProduct (~600 products)
- FactInternetSales (~60K sales records)

**Expected output:**
```
Loaded DimCustomer
Loaded DimDate  
Loaded DimProduct
Loaded FactInternetSales
Done
```

### Behind the Scenes
- Data is stored in **Delta format** for ACID compliance
- **Overwrite mode** ensures clean data for the workshop
- **OneLake integration** provides seamless cross-workspace data access

🎯 **Success indicator**: All four "Loaded" messages followed by "Done"
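
The load function in the next cell writes each table to a OneLake path assembled from the workspace ID, the lakehouse ID, and the table name. The sketch below shows that path layout in isolation. The helper name `onelake_table_uri` and the placeholder IDs are illustrative assumptions, not part of the lab code.

```python
# Hypothetical helper: mirrors the abfss path pattern used by the load function.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"

def onelake_table_uri(workspace_id: str, lakehouse_id: str, table: str) -> str:
    """Build the abfss URI of a Delta table under a lakehouse's Tables folder."""
    return f"abfss://{workspace_id}@{ONELAKE_HOST}/{lakehouse_id}/Tables/{table}"

# Placeholder IDs for illustration only
uri = onelake_table_uri("my-workspace-id", "my-lakehouse-id", "DimCustomer")
print(uri)
```

Keeping the pattern in one place makes it easier to see that the source and target differ only in the workspace and lakehouse identifiers.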

In [None]:
capacity_name = labs.get_capacity_name()

def loadDataToLakehouse(fromTable: str, toTable: str):
    """
    Copy one Delta table from the shared workshop source into the lakehouse.
    
    Args:
        fromTable: Source table name to read from
        toTable: Target table name to write to
    """
    try:
        # Look up the target workspace and lakehouse IDs used in the write path
        lakehouse_props = notebookutils.lakehouse.getWithProperties(LakehouseName)
        workspaceId = lakehouse_props["workspaceId"]
        lakehouseId = lakehouse_props["id"]

        # Region-aware connection string selection
        if capacity_name == "FabConUS8-P1":  # West US 3
            conn_str = "abfss://b1d61bbe-de20-4d3a-8075-b8e2eaacb868@onelake.dfs.fabric.microsoft.com/631e45c0-1243-4f42-920a-56bfe6ecdd6d/Tables"
        else:  # North Central US (default)
            conn_str = "abfss://16cf855f-3bf4-4312-a7a1-ccf5cb6a0121@onelake.dfs.fabric.microsoft.com/99ed86df-13d1-4008-a7f6-5768e53f4f85/Tables"

        # Read the source table, stating the Delta format explicitly
        source_df = spark.read.format("delta").load(f"{conn_str}/{fromTable}")

        # Overwrite the target table and its schema so reruns start clean.
        # The DataFrame is read once and written once, so caching it adds nothing.
        (source_df
         .write
         .format("delta")
         .mode("overwrite")
         .option("overwriteSchema", "true")
         .save(f"abfss://{workspaceId}@onelake.dfs.fabric.microsoft.com/{lakehouseId}/Tables/{toTable}"))
        
        print(f"Loaded {toTable}")
        
    except Exception as e:
        print(f"Error loading {toTable}: {e}")
        raise

# Load all tables with proper error handling
tables_to_load = [
    ("DimCustomer", "DimCustomer"),
    ("DimDate", "DimDate"),
    ("DimProduct", "DimProduct"),
    ("FactInternetSales", "FactInternetSales")
]

for from_table, to_table in tables_to_load:
    loadDataToLakehouse(from_table, to_table)

print("Done")

## 5. Trigger Metadata Synchronization

Force synchronization between lakehouse storage and SQL Analytics Endpoint to ensure schema accuracy for the semantic model.
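
The synchronization is asynchronous: the cell below starts it and then polls a status value until it reports success. As a rough sketch of that polling pattern with an upper bound on the number of polls, the example below uses a hypothetical `wait_for_state` helper and a fake status source instead of the REST call. The state name `inProgress` is an assumption used only for illustration.

```python
import time
from typing import Callable, Iterator

def wait_for_state(get_state: Callable[[], str], target: str = "success",
                   max_polls: int = 10, interval_s: float = 0.0) -> int:
    """Poll get_state until it returns target; return the number of polls used.

    Raises TimeoutError if target is not reached within max_polls.
    """
    for poll in range(1, max_polls + 1):
        if get_state() == target:
            return poll
        time.sleep(interval_s)
    raise TimeoutError(f"state {target!r} not reached after {max_polls} polls")

# Fake status source standing in for the REST status call (illustrative values)
states: Iterator[str] = iter(["inProgress", "inProgress", "success"])
polls_used = wait_for_state(lambda: next(states))
print(polls_used)
```

Bounding the loop means a failed or stuck refresh surfaces as an error instead of a cell that never finishes.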

In [None]:
# Background: https://medium.com/@sqltidy/delays-in-the-automatically-generated-schema-in-the-sql-analytics-endpoint-of-the-lakehouse-b01c7633035d

def triggerMetadataRefresh(max_polls: int = 300):
    """Ask the SQL analytics endpoint to resync its schema with the lakehouse."""
    client = fabric.FabricRestClient()
    response = client.get(f"/v1/workspaces/{workspaceId}/lakehouses/{lakehouseId}")
    sqlendpoint = response.json()['properties']['sqlEndpointProperties']['id']

    # Trigger the sync
    uri = f"/v1.0/myorg/lhdatamarts/{sqlendpoint}"
    payload = {"commands":[{"$type":"MetadataRefreshExternalCommand"}]}
    response = client.post(uri, json=payload)
    batchId = response.json()['batchId']

    # Poll the batch status once per second. The loop is bounded so that a
    # failed or stuck refresh cannot keep the cell running forever.
    statusuri = f"/v1.0/myorg/lhdatamarts/{sqlendpoint}/batches/{batchId}"
    for _ in range(max_polls):
        progressState = client.get(statusuri).json()['progressState']
        print(f"Metadata refresh : {progressState}")
        if progressState == "success":
            print('Metadata refresh complete')
            return
        time.sleep(1)

    raise TimeoutError(f"Metadata refresh did not succeed after {max_polls} polls")

triggerMetadataRefresh()

## 6. Create Direct Lake Semantic Model

Generates semantic model from lakehouse tables with automatic discovery and robust error handling.

In [None]:
from sempy import fabric

#1. List ALL table names in the lakehouse to add to the semantic model
lakehouseTables:list = labs.lakehouse.get_lakehouse_tables(lakehouse=LakehouseName)["Table Name"].tolist()

completedOK:bool=False
while not completedOK:
    try:
        #2. Create the semantic model, unless one with this name already exists
        if sempy.fabric.list_items().query(f"`Display Name`=='{SemanticModelName}' & Type=='SemanticModel'").shape[0] == 0:
            labs.directlake.generate_direct_lake_semantic_model(dataset=SemanticModelName,lakehouse_tables=lakehouseTables,workspace=workspaceName,lakehouse=lakehouseId,refresh=False,overwrite=True)
        # Leave the loop in both cases; otherwise an existing model would loop forever
        completedOK=True
    except Exception as e:
        print(f'Error creating model ({e})... trying again.')
        time.sleep(3)
        triggerMetadataRefresh()

print('Semantic model created OK')

## 7. Configure Table Relationships

Establishes star schema relationships between fact and dimension tables for accurate cross-table analysis.
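
Each relationship in the next cell is many-to-one from the fact table to a dimension, which assumes every dimension key is unique. Fact keys with no matching dimension row do not stop the relationship from being created, but those rows are grouped under a blank member in reports. The sketch below checks both conditions on toy key lists. The function name `check_many_to_one` and the sample keys are made up for illustration.

```python
from collections import Counter
from typing import Iterable

def check_many_to_one(fact_keys: Iterable[int], dim_keys: Iterable[int]) -> dict:
    """Report duplicate dimension keys and fact keys without a matching dimension row."""
    dim_counts = Counter(dim_keys)
    duplicates = sorted(k for k, n in dim_counts.items() if n > 1)
    orphans = sorted(set(fact_keys) - set(dim_counts))
    return {"duplicate_dim_keys": duplicates, "orphan_fact_keys": orphans}

# Toy data (made up): three products, and sales rows that reference them
dim_product_keys = [1, 2, 3]
fact_product_keys = [1, 1, 2, 3, 3, 3, 4]   # key 4 has no matching product

report = check_many_to_one(fact_product_keys, dim_product_keys)
print(report)
```

An empty `duplicate_dim_keys` list indicates the "One" side of the relationship holds for the sampled keys.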

In [None]:
completedOK:bool=False
while not completedOK:
    try:
        with labs.tom.connect_semantic_model(dataset=SemanticModelName, readonly=False) as tom:
            #1. Remove any existing relationships.
            #   Copy to a list first: removing from a collection while iterating over it is unsafe
            for r in list(tom.model.Relationships):
                tom.model.Relationships.Remove(r)

            #2. Create the star schema relationships (fact to each dimension)
            tom.add_relationship(from_table="FactInternetSales", from_column="OrderDateKey" , to_table="DimDate"    , to_column="DateKey"       , from_cardinality="Many" , to_cardinality="One")
            tom.add_relationship(from_table="FactInternetSales", from_column="CustomerKey"  , to_table="DimCustomer", to_column="CustomerKey"   , from_cardinality="Many" , to_cardinality="One")
            tom.add_relationship(from_table="FactInternetSales", from_column="ProductKey"   , to_table="DimProduct" , to_column="ProductKey"    , from_cardinality="Many" , to_cardinality="One")
            completedOK=True
    except Exception as e:
        print(f'Error adding relationships ({e})... trying again.')
        time.sleep(3)

print('done')


## 8. Add Business Intelligence Measures

Creates essential DAX measures with proper formatting for business reporting and analysis.
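
The currency measure in the next cell uses a custom format string with three sections separated by semicolons (positive; negative; zero), and a backslash that marks `$` as a literal character. The sketch below only inspects that string in plain Python. Writing it as a raw string is a suggestion made here so the backslash is passed through unchanged, not something the lab requires.

```python
# The currency format string from the measure definition, written as a raw string
# so the backslash before "$" is kept exactly as typed.
currency_format = r"\$#,0.###############;(\$#,0.###############);\$#,0.###############"

# A three-section format string is ordered: positive; negative; zero
positive, negative, zero = currency_format.split(";")

print(positive)
print(negative)
print(zero)
```

The negative section wraps the same pattern in parentheses, which is the accounting convention for negative amounts.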

In [None]:
completedOK:bool=False
while not completedOK:
    try:
        with labs.tom.connect_semantic_model(dataset=SemanticModelName, readonly=False) as tom:
            #1. Remove any existing measures.
            #   Copy to a list first: removing from a collection while iterating over it is unsafe
            for t in tom.model.Tables:
                for m in list(t.Measures):
                    measure_name = m.Name  # read the name before the object is removed
                    tom.remove_object(m)
                    print(f"[{measure_name}] measure removed")

            #2. Add the measures (raw string keeps the backslashes in the format string intact)
            tom.add_measure(table_name="FactInternetSales" ,measure_name="Sum of Sales",expression="SUM(FactInternetSales[SalesAmount])",format_string=r"\$#,0.###############;(\$#,0.###############);\$#,0.###############")
            tom.add_measure(table_name="FactInternetSales" ,measure_name="Count of Sales",expression="COUNTROWS(FactInternetSales)",format_string="#,0")
            completedOK=True
    except Exception as e:
        print(f'Error adding measures ({e})... trying again.')
        time.sleep(3)

print('done')

## 9. Configure Date Table for Time Intelligence

Marks DimDate table as date table to enable time-based analysis functions and calendar features.

In [None]:
completedOK:bool=False
while not completedOK:
    try:
        with labs.tom.connect_semantic_model(dataset=SemanticModelName, readonly=False) as tom:
            tom.mark_as_date_table(table_name="DimDate",column_name="Date")
            completedOK=True
    except Exception as e:
        print(f'Error with date table ({e})... trying again.')
        time.sleep(3)

print('done')

## 10. Configure Column Sorting for Improved User Experience

Sets logical column sorting on date table columns to ensure proper chronological ordering in visualizations.
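
Without a sort-by column, text values such as month names sort alphabetically. The toy example below contrasts the two orderings in plain Python. The rows are made up, and only the column names `MonthName` and `MonthNumberOfYear` come from the lab.

```python
# Toy rows (made up) shaped like DimDate: a display column and its numeric sort key
months = [
    {"MonthName": "March", "MonthNumberOfYear": 3},
    {"MonthName": "January", "MonthNumberOfYear": 1},
    {"MonthName": "April", "MonthNumberOfYear": 4},
    {"MonthName": "February", "MonthNumberOfYear": 2},
]

# Default behaviour: text sorts alphabetically
alphabetical = sorted(m["MonthName"] for m in months)

# Sort-by-column behaviour: order the text by the numeric key instead
chronological = [m["MonthName"] for m in sorted(months, key=lambda m: m["MonthNumberOfYear"])]

print(alphabetical)
print(chronological)
```

The second list is the order a report axis should show, which is what setting the sort-by column achieves in the model.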

In [None]:
import json
tom = labs.tom.TOMWrapper(dataset=SemanticModelName, workspace=workspaceName, readonly=False)
tom.set_sort_by_column(table_name="DimDate",column_name="MonthName"       ,sort_by_column="MonthNumberOfYear")
tom.set_sort_by_column(table_name="DimDate",column_name="DayOfWeek"       ,sort_by_column="DayNumberOfWeek")
tom.model.SaveChanges()

# Print the DimDate table definition to confirm the sort-by settings
for i, t in enumerate(tom.model.Tables):
    if t.Name=="DimDate":
        bim = json.dumps(tom.get_bim()["model"]["tables"][i],indent=4)
        print(bim)

## 11. Hide Fact Table Columns for Optimal User Experience

Hides raw fact table columns to guide users toward proper measures and improve usability.

In [None]:
for i, t in enumerate(tom.model.Tables):
    if t.Name in ["FactInternetSales"]:
        # Hide every raw column so report authors reach for the measures instead
        for c in t.Columns:
            c.IsHidden=True

        bim = json.dumps(tom.get_bim()["model"]["tables"][i],indent=4)
        print(bim)
    
tom.model.SaveChanges()

## 12. Refresh Model and Apply All Configuration Changes

Refreshes semantic model to apply all configuration changes with robust retry logic for production readiness.
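
The cell below retries the refresh until it succeeds. As a sketch of a bounded variant of that pattern, the example here stops after a fixed number of attempts and re-raises the last error. The `retry` helper and the `flaky_refresh` stand-in are hypothetical and do not call any Fabric API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry(action: Callable[[], T], attempts: int = 3, delay_s: float = 0.0) -> T:
    """Call action until it succeeds, re-raising the last error after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # KeyboardInterrupt is not caught, so the cell can still be stopped
            print(f"Attempt {attempt} failed: {exc}")
            if attempt == attempts:
                raise
            time.sleep(delay_s)

# Stand-in for a refresh call that fails twice, then succeeds (illustrative only)
calls = {"count": 0}

def flaky_refresh() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("endpoint not ready")
    return "refreshed"

result = retry(flaky_refresh, attempts=5)
print(result, calls["count"])
```

Catching `Exception` instead of using a bare `except:` keeps interrupts working, and the attempt limit prevents an endless loop when the error is permanent.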

In [None]:
reframeOK:bool=False
while not reframeOK:
    try:
        result:pandas.DataFrame = labs.refresh_semantic_model(dataset=SemanticModelName)
        reframeOK=True
    except Exception as e:
        print(f'Error with reframe ({e})... trying again.')
        triggerMetadataRefresh()
        time.sleep(3)

print('Custom Semantic Model reframe OK')

## 13. Setup DMV Monitoring Function for Direct Lake Performance

Creates monitoring function to track Direct Lake column temperature and memory usage using DMV queries.

In [None]:
import warnings
import time
from Microsoft.AnalysisServices.Tabular import TraceEventArgs
from typing import Dict, List, Optional, Callable

def runDMV():
    df = sempy.fabric.evaluate_dax(
        dataset=SemanticModelName, 
        dax_string="""
        
        SELECT 
            MEASURE_GROUP_NAME AS [TABLE],
            ATTRIBUTE_NAME AS [COLUMN],
            DATATYPE ,
            DICTIONARY_SIZE 		    AS SIZE ,
            DICTIONARY_ISPAGEABLE 		AS PAGEABLE ,
            DICTIONARY_ISRESIDENT		AS RESIDENT ,
            DICTIONARY_TEMPERATURE		AS TEMPERATURE,
            DICTIONARY_LAST_ACCESSED	AS LASTACCESSED 
        FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS 
        ORDER BY 
            [DICTIONARY_TEMPERATURE] DESC
        
        """)
    display(df)

## 14. Explore Direct Lake Capabilities with DAX Functions

Uses TABLETRAITS() and guardrails functions to validate Direct Lake configuration and performance limits.

In [None]:
df=sempy.fabric.evaluate_dax(
    dataset=SemanticModelName, 
    dax_string="""
    
    evaluate tabletraits()
    
    """)
display(df)

In [None]:
df=labs.directlake.get_direct_lake_guardrails()
display(df)

## 15. Establish Performance Baseline with DMV Analysis

Captures initial Direct Lake column states and memory usage to establish performance baseline.

In [None]:
runDMV()

## 16. Execute DAX Query and Monitor Column Loading

Executes DAX query and monitors which columns get loaded into memory using DMV analysis.
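
Comparing the DMV output before and after the query shows which columns the query pulled into memory. The sketch below does that comparison on two made-up snapshots that reuse the `TABLE`, `COLUMN`, and `RESIDENT` aliases from `runDMV()`. The helper `resident_columns` and the sample rows are assumptions for illustration, not real results.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]

def resident_columns(rows: List[Row]) -> Set[Tuple[str, str]]:
    """Return (table, column) pairs whose dictionary is resident in memory."""
    return {(r["TABLE"], r["COLUMN"]) for r in rows if r["RESIDENT"]}

# Made-up snapshots shaped like the DMV output
before = [
    {"TABLE": "DimDate", "COLUMN": "MonthName", "RESIDENT": False},
    {"TABLE": "FactInternetSales", "COLUMN": "SalesAmount", "RESIDENT": False},
    {"TABLE": "DimCustomer", "COLUMN": "LastName", "RESIDENT": False},
]
after = [
    {"TABLE": "DimDate", "COLUMN": "MonthName", "RESIDENT": True},
    {"TABLE": "FactInternetSales", "COLUMN": "SalesAmount", "RESIDENT": True},
    {"TABLE": "DimCustomer", "COLUMN": "LastName", "RESIDENT": False},
]

newly_loaded = sorted(resident_columns(after) - resident_columns(before))
print(newly_loaded)
```

In the lab itself, the same comparison can be made by eye between the two `runDMV()` outputs: columns the query did not reference should stay non-resident.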

In [None]:
labs.clear_cache(SemanticModelName)

df=sempy.fabric.evaluate_dax(
    dataset=SemanticModelName, 
    dax_string="""
    
    EVALUATE
        SUMMARIZECOLUMNS(
               
                DimDate[MonthName] ,
                "Count of Transactions" , COUNTROWS(FactInternetSales) ,
                "Sum of Sales" , [Sum of Sales] 
        )
        ORDER BY [MonthName]
    """)
display(df)

runDMV()

## 17. Clean Up Resources and Session Conclusion

### Workshop Summary 🎉
Congratulations! You have successfully completed Lab 1 and built a comprehensive Direct Lake semantic model. Here's what you accomplished:

#### ✅ **Infrastructure Setup**
- Created a lakehouse with proper configuration
- Loaded Adventure Works sample data (4 tables, 80K+ rows)
- Configured metadata synchronization

#### ✅ **Model Development**  
- Built a Direct Lake semantic model from lakehouse tables
- Established star schema relationships (3 relationships)
- Created business measures with proper DAX and formatting

#### ✅ **User Experience Optimization**
- Configured date table for time intelligence
- Set logical column sorting for better visuals
- Optimized column visibility for end users

#### ✅ **Performance Analysis**
- Implemented DMV monitoring for performance insights
- Analyzed query execution impact on memory usage  
- Established baseline and post-query performance comparison

### Key Direct Lake Concepts Learned

#### 🔄 **Real-time Analytics**
Your model provides immediate access to lakehouse data without import delays or scheduled refreshes.

#### ⚡ **Intelligent Memory Management**
Direct Lake automatically loads only the columns needed for your queries, optimizing both performance and resource usage.

#### 📊 **Enterprise-Ready Design**
The star schema design with proper relationships, measures, and formatting provides a foundation for scalable business intelligence.

### Next Steps in Your Direct Lake Journey

#### 🚀 **Immediate Actions**:
- Explore the model in Power BI Desktop or Fabric
- Create reports using the measures and relationships you built
- Experiment with different DAX queries to see performance patterns

#### 📈 **Advanced Learning**:
- **Lab 2**: Scale to larger datasets and understand big data scenarios
- **Lab 3**: Analyze Delta table structure and optimization
- **Lab 4**: Explore fallback behaviors and troubleshooting

#### 🛠️ **Production Considerations**:
- Security and access control for lakehouse data
- Monitoring and alerting for model performance
- Governance and lifecycle management

### Resource Cleanup Importance
The final cell of this notebook stops the Spark session to:
- **💰 Save costs**: Release compute resources
- **🧹 Clean memory**: Free up cluster resources for other users
- **✅ Best practice**: Proper session management in Fabric notebooks

### Final Thoughts
Direct Lake aims to combine the **data freshness of DirectQuery** with the **query performance of Import mode**. You now have hands-on experience building and inspecting a model that uses it!

🎯 **Ready for the next lab?** Let's explore Direct Lake with big data scenarios!

---

## Lab Summary

### What You Accomplished
In this lab, you successfully built a complete Direct Lake semantic model from scratch:

- ✅ **Infrastructure Setup**: Created lakehouse and loaded Adventure Works data
- ✅ **Model Creation**: Generated semantic model with automatic table discovery
- ✅ **Data Modeling**: Established star schema relationships between fact and dimensions
- ✅ **Business Logic**: Added essential DAX measures with proper formatting
- ✅ **User Experience**: Configured date tables, column sorting, and visibility
- ✅ **Performance Validation**: Tested model with business queries and DMV analysis

### Architecture Overview

**End-to-End Direct Lake Flow:**
```
Adventure Works Data → Lakehouse (Delta Tables) → Direct Lake Model → Real-time Analytics
        ↓                    ↓                         ↓                    ↓
 Source Delta Tables    Delta Format            Semantic Layer        Power BI Reports
```

### Key Takeaways

- **Direct Lake Advantage**: Real-time data access without imports or scheduled refreshes
- **Star Schema Power**: Proper relationships enable accurate cross-table analysis
- **DAX Measures**: Essential for business metrics - don't rely on raw column values
- **User Experience**: Column sorting and hiding improve report usability
- **Performance Monitoring**: DMVs provide insights into memory usage and query patterns

### Performance Results

- **Data Freshness**: Queries reflect lakehouse changes once the model is refreshed (reframed), with no copy of the data
- **Query Performance**: Queries run against columnar data loaded from the Delta tables into memory
- **Memory Efficiency**: Only accessed columns are loaded into memory, as the DMV residency and temperature values show
- **Resource Optimization**: A refresh updates metadata only, rather than copying data as an Import model does

### Technical Skills Gained

- **Semantic Link Labs**: Programmatic model creation and management
- **TOM (Tabular Object Model)**: Advanced model configuration capabilities
- **DMV Analysis**: Understanding Direct Lake memory and performance patterns
- **Error Handling**: Robust retry logic for production-ready deployments

### Next Steps

**Continue to Lab 2** to learn about:
- Working with billion-row datasets
- OneLake shortcuts for cross-workspace data access
- Direct Lake guardrails and fallback behavior
- Advanced performance monitoring for big data scenarios

**For Production Deployment:**
- Implement proper security and access controls
- Set up monitoring and alerting for model performance
- Establish governance and lifecycle management processes
- Consider refresh automation for supporting data pipelines

---

In [None]:
# notebookutils is the current name for mssparkutils and is already used earlier in this notebook
notebookutils.session.stop()