# Lab 8: Direct Lake over One Lake with Import Mode Integration

## Introduction

This advanced lab demonstrates **composite storage modes** by combining Direct Lake and Import mode tables within a single Microsoft Fabric semantic model. Large tables stay in Direct Lake mode and are read directly from Delta tables in OneLake, while selected tables (in this lab, `dim_Date`) are converted to Import mode and loaded into the model by a regular refresh. This lets you keep Direct Lake for large-scale data access and use Import mode where a specific table benefits from it.

## Lab Overview

**Learning Objectives:**
- Understand the differences between Direct Lake over One Lake and Direct Lake over SQL
- Learn to convert Direct Lake tables to Import mode within the same model
- Master hybrid storage mode configuration for optimal performance
- Analyze performance characteristics of mixed storage mode scenarios

**Key Concepts:**
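- **Direct Lake over One Lake**: Direct Lake mode that reads Delta tables directly from OneLake instead of through the SQL analytics endpoint; it can be combined with Import tables, but it has no DirectQuery fallback
- **Composite Storage Modes**: Strategic combination of Direct Lake and Import modes
- **Model Cloning**: Creating a copy of a model to experiment on without changing the original
- **Storage Mode Conversion**: Converting tables between Direct Lake and Import modes, and switching the model between Direct Lake over One Lake and Direct Lake over SQL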

**Prerequisites:** Lab 7 completion (high cardinality column optimization)

## 1. Install Semantic Link Labs Python Library
Install the Semantic Link Labs library for advanced Direct Lake and hybrid storage mode operations.

In [None]:
%pip install -q semantic-link-labs

## 2. Import Python Libraries and Set Up Parameters
Import required libraries and configure parameters for hybrid Direct Lake and Import mode operations.
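The parameter cell also derives a per-user trace name from the `upn` claim of the Power BI access token. A JWT is three base64url segments separated by dots, with the `=` padding stripped, so the payload must be re-padded and decoded with the URL-safe alphabet. The sketch below shows that decoding on a fabricated, unsigned token; `decode_jwt_payload` and `trace_name_for` are hypothetical helper names, not part of Semantic Link, and the token is never verified (it is only read).

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode (without verifying) the payload segment of a JWT."""
    payload = token.split(".")[1]
    # base64url drops '=' padding; -len % 4 adds none when none is needed
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def trace_name_for(token: str, lab_number: int) -> str:
    """Build a per-user trace name such as 'Lab8_SQLKDL.user39'."""
    upn = decode_jwt_payload(token)["upn"]
    user_id = upn.split("@")[0]
    return f"Lab{lab_number}_{user_id}"


# Fake, unsigned token built purely for illustration
claims = {"upn": "SQLKDL.user39@contoso.com"}
body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
fake_token = f"header.{body}.signature"

print(trace_name_for(fake_token, lab_number=8))  # Lab8_SQLKDL.user39
```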

In [None]:
# Import libraries for hybrid storage mode operations and model management
import sempy_labs as labs
import sempy
from sempy import fabric
import pandas as pd
import json
import time
import uuid
from sempy_labs.tom._model import TOMWrapper, connect_semantic_model

# Import specialized helper functions for advanced operations
from sempy_labs._helper_functions import (
    format_dax_object_name,
    generate_guid,
    _make_list_unique,
    resolve_dataset_name_and_id,
    resolve_workspace_name_and_id,
    _base_api,
    resolve_workspace_id,
    resolve_item_id,
    resolve_lakehouse_id,
    resolve_lakehouse_name_and_id
)

# Initialize Analysis Services for advanced model operations
fabric._client._utils._init_analysis_services()
import Microsoft.AnalysisServices.Tabular as TOM
import Microsoft.AnalysisServices
import warnings
from Microsoft.AnalysisServices.Tabular import TraceEventArgs
from typing import Dict, List, Optional, Callable

# Configure model names for hybrid storage mode testing
LakehouseName = "BigData"
SemanticModelName = f"{LakehouseName}_model"
ClonedModelName = SemanticModelName + "_clone"
workspace = None

(workspace_name, workspace_id) = resolve_workspace_name_and_id(workspace)
(lakehouse_name, lakehouse_id) = resolve_lakehouse_name_and_id(lakehouse=LakehouseName, workspace=workspace)

#### Generate Unique Trace Name - Start ####
import base64

# The Power BI access token is a JWT; its payload (2nd segment) is base64url-encoded
token = notebookutils.credentials.getToken("pbi")
payload = token.split(".")[1]
payload += "=" * (-len(payload) % 4)  # restore the stripped '=' padding
upn = json.loads(base64.urlsafe_b64decode(payload)).get("upn")

# Extract just the user part (e.g. "SQLKDL.user39")
user_id = upn.split("@")[0]
lab_number = 8  # set per lab

trace_name = f"Lab{lab_number}_{user_id}"
#### Generate Unique Trace Name - End ####

def runDMV(dataset: str = SemanticModelName, workspace: Optional[str] = None):
    # Column dictionary size, residency and temperature from the storage DMV
    df = sempy.fabric.evaluate_dax(
        dataset=dataset,
        workspace=workspace,
        dax_string="""
        SELECT 
            MEASURE_GROUP_NAME AS [TABLE],
            ATTRIBUTE_NAME AS [COLUMN],
            DATATYPE,
            DICTIONARY_SIZE AS SIZE,
            DICTIONARY_ISPAGEABLE AS PAGEABLE,
            DICTIONARY_ISRESIDENT AS RESIDENT,
            DICTIONARY_TEMPERATURE AS TEMPERATURE,
            DICTIONARY_LAST_ACCESSED AS LASTACCESSED 
        FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS 
        ORDER BY 
            [DICTIONARY_TEMPERATURE] DESC
        """)
    display(df)

def filter_func(e):
    # Drop internal VertiPaq scan events; keep every other traced event
    return e.EventSubclass.ToString() != "VertiPaqScanInternal"

# Run a DAX query while tracing the selected server events
def runQueryWithTrace(expr: str, workspaceName: str, SemanticModelName: str, Result: Optional[bool] = True, Trace: Optional[bool] = True, DMV: Optional[bool] = True, ClearCache: Optional[bool] = True) -> pd.DataFrame:
    # Start from the default query trace schema and add ExecutionMetrics
    event_schema = fabric.Trace.get_default_query_trace_schema()
    event_schema.update({"ExecutionMetrics": ["EventClass", "TextData"]})
    # Remove begin/cache-match events to keep the server timings compact
    del event_schema['VertiPaqSEQueryBegin']
    del event_schema['VertiPaqSEQueryCacheMatch']
    del event_schema['DirectQueryBegin']

    warnings.filterwarnings("ignore")

    # Clear the cache so each run measures a cold query
    if ClearCache:
        labs.clear_cache(SemanticModelName, workspace=workspaceName)

    with fabric.create_trace_connection(SemanticModelName, workspaceName) as trace_connection:
        # Create a trace on the server with the selected events
        with trace_connection.create_trace(
            event_schema=event_schema,
            name=trace_name,
            filter_predicate=filter_func,
            stop_event="QueryEnd"
            ) as trace:

            trace.start()

            df = sempy.fabric.evaluate_dax(
                dataset=SemanticModelName,
                dax_string=expr,
                workspace=workspaceName)

            if Result:
                displayHTML("<H2>####### DAX QUERY RESULT #######</H2>")
                display(df)

            # Wait 5 seconds for trace data to arrive
            time.sleep(5)

            # Stop the trace and collect the logs
            final_trace_logs = trace.stop()

    if Trace:
        displayHTML("<H2>####### SERVER TIMINGS #######</H2>")
        display(final_trace_logs)

    if DMV:
        displayHTML("<H2>####### SHOW DMV RESULTS #######</H2>")
        # Query the DMV on the same model that was traced
        runDMV(SemanticModelName, workspaceName)

    return final_trace_logs

## 3. Clone BigData Semantic Model
Create a copy of the existing BigData semantic model for hybrid storage mode experimentation.

In [None]:
# Delete the cloned model if it already exists (e.g. when re-running the lab)
df = fabric.list_items()
existing = df[df['Display Name'] == ClonedModelName]
if not existing.empty:
    fabric.delete_item(existing['Id'].iloc[0])
    print("Cloned model deleted")

with labs.tom.connect_semantic_model(dataset=SemanticModelName, readonly=False) as tom:
    source_db = tom._tom_server.Databases.GetByName(SemanticModelName)
    # Clone the database and, separately, its model definition
    newDB = source_db.Clone()
    newModel = source_db.Model.Clone()
    # Give the clone its own name and a unique ID so it does not collide with the source
    newDB.Name = ClonedModelName
    newDB.ID = str(uuid.uuid4())
    # Copy the cloned model definition into the new database
    newModel.CopyTo(newDB.Model)
    tom._tom_server.Databases.Add(newDB)

    # Create the new database on the server, including all child objects
    newDB.Update(Microsoft.AnalysisServices.UpdateOptions.ExpandFull)

## 4. Frame the Cloned Model
Refresh (frame) the cloned semantic model so that its Direct Lake tables point to the current version of the Delta tables in the lakehouse.

In [None]:
# Refresh the cloned model to initialize data connections
labs.refresh_semantic_model(dataset=ClonedModelName)

## 5. Clear Schema Name
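This is a temporary workaround for a known issue. It clears the `SchemaName` property of every entity (Direct Lake) partition in the cloned model.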

In [None]:
with labs.tom.connect_semantic_model(dataset=ClonedModelName, readonly=False) as tom:
    for t in tom.model.Tables:
        for p in t.Partitions:
            if isinstance(p.Source,Microsoft.AnalysisServices.Tabular.EntityPartitionSource):
                p.Source.SchemaName=None

## 6. Frame the Cloned Model (after schema change)
Refresh (frame) the cloned semantic model again so that the cleared schema names take effect.

In [None]:
# Refresh the cloned model to initialize data connections
labs.refresh_semantic_model(dataset=ClonedModelName)

## 7. Check Direct Lake Version
Identify whether the model uses Direct Lake over SQL (`Sql.Database`) or Direct Lake over One Lake (`AzureStorage.DataLake`).

For example:  
let database = Sql.Database(....)          = **DL/SQL**  
let database = AzureStorage.DataLake(....) = **DL/OL**
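To make this check programmatic instead of reading the expression by eye, a small hypothetical helper can inspect the expression text. This sketch assumes the expression is written like the ones this lab generates: `Sql.Database(...)` for Direct Lake over SQL (step 20) and `AzureStorage.DataLake(...)` for Direct Lake over One Lake (step 10). It only searches for the data-source call; it does not parse M.

```python
import re


def direct_lake_flavor(m_expression: str) -> str:
    """Classify a shared M expression as 'DL/SQL', 'DL/OL' or 'Unknown'.

    Hypothetical helper: looks only for the data-source function call.
    """
    if re.search(r"\bSql\.Database\s*\(", m_expression):
        return "DL/SQL"
    if re.search(r"\bAzureStorage\.DataLake\s*\(", m_expression):
        return "DL/OL"
    return "Unknown"


# Illustrative expressions (server, endpoint and IDs are placeholders)
sql_expr = 'let Source = Sql.Database("server.example", "endpoint-id") in Source'
onelake_expr = (
    'let Source = AzureStorage.DataLake('
    '"https://onelake.dfs.fabric.microsoft.com/ws-id/lh-id", '
    '[HierarchicalNavigation=true]) in Source'
)
print(direct_lake_flavor(sql_expr))      # DL/SQL
print(direct_lake_flavor(onelake_expr))  # DL/OL
```

In the notebook, you would call `direct_lake_flavor(e.Expression)` inside the loop over `tom.model.Expressions`.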

In [None]:
with labs.tom.connect_semantic_model(dataset=ClonedModelName, readonly=False) as tom:
    for e in tom.model.Expressions:
        print(e.Expression)

## 8. Show Storage Mode for Each Table in Cloned Model
Display the current storage mode configuration for all tables in the cloned semantic model.

In [None]:
# Collect the storage mode of each table (one row per table)
objects = {}
with labs.tom.connect_semantic_model(dataset=ClonedModelName, readonly=True) as tom:
    for t in tom.model.Tables:
        for p in t.Partitions:
            objects[t.Name] = str(p.Mode)

df = pd.DataFrame(list(objects.items()), columns=["Table", "Storage Mode"])
display(df)

## 9. Try to Convert Direct Lake Table to Import (First Attempt)
Attempt to convert a Direct Lake table to Import mode. **This will fail** because the model still uses Direct Lake over SQL, which cannot be combined with Import tables.

In [None]:
# Attempt to convert Direct Lake table to Import mode (will fail if Direct Lake over SQL)
with labs.tom.connect_semantic_model(dataset=ClonedModelName, readonly=False) as tom:
    tom.convert_direct_lake_to_import(
        table_name="dim_Date" ,
        entity_name="dim_Date" ,
        source="BigData",
        source_type = "Lakehouse"
    )

## 10. Convert Cloned Model to Direct Lake over One Lake
Modify the cloned model to use Direct Lake over One Lake instead of Direct Lake over SQL.
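The cell below builds the OneLake URL from the workspace and lakehouse IDs resolved in step 2. A sketch of the same string construction as a standalone function, with a check that both IDs parse as GUIDs; the function name `onelake_source_expression` and the validation are illustrative additions, not part of Semantic Link Labs.

```python
import uuid


def onelake_source_expression(workspace_id: str, lakehouse_id: str) -> str:
    """Build the M expression that points a model at a lakehouse in OneLake."""
    # Fail early on typos: both values are expected to be GUIDs
    for value in (workspace_id, lakehouse_id):
        uuid.UUID(str(value))  # raises ValueError if not a GUID
    url = f"https://onelake.dfs.fabric.microsoft.com/{workspace_id}/{lakehouse_id}"
    return (
        "let\n"
        f'    Source = AzureStorage.DataLake("{url}", [HierarchicalNavigation=true])\n'
        "in\n"
        "    Source"
    )


# Placeholder IDs for illustration only
ws = "11111111-1111-1111-1111-111111111111"
lh = "22222222-2222-2222-2222-222222222222"
print(onelake_source_expression(ws, lh))
```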

In [None]:
with labs.tom.connect_semantic_model(dataset=ClonedModelName, readonly=False) as tom:

    for e in tom.model.Expressions:
        e.Expression = f"""
        let
            Source = AzureStorage.DataLake("https://onelake.dfs.fabric.microsoft.com/{workspace_id}/{lakehouse_id}", [HierarchicalNavigation=true])
        in
            Source"""
        
print("Converted semantic model to use DirectLake over One Lake")

## 11. Convert Direct Lake Table to Import (Second Attempt)
Successfully convert a Direct Lake table to Import mode now that the model uses Direct Lake over One Lake.

In [None]:
# Convert Direct Lake table to Import mode (should work with Direct Lake over One Lake)
with labs.tom.connect_semantic_model(dataset=ClonedModelName, readonly=False) as tom:
    tom.convert_direct_lake_to_import(
        table_name="dim_Date" ,
        entity_name="dim_Date" ,
        source="BigData",
        source_type = "Lakehouse"
    )

## 12. Show Storage Mode for Each Table After Conversion
Display the updated storage mode configuration showing the hybrid model with both Direct Lake and Import tables.
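The storage-mode listing can also be reduced to a single "is this model hybrid?" check. A minimal sketch, assuming a plain `{table name: mode}` dictionary like the `objects` dictionary built in this step, with mode strings as produced by `str(p.Mode)` (`DirectLake`, `Import`). `summarize_storage_modes` and `is_hybrid` are hypothetical helpers, and the input below is illustrative.

```python
from collections import defaultdict


def summarize_storage_modes(modes: dict) -> dict:
    """Group table names by storage mode, e.g. {'Import': ['dim_Date'], ...}."""
    by_mode = defaultdict(list)
    for table, mode in modes.items():
        by_mode[mode].append(table)
    return {mode: sorted(tables) for mode, tables in by_mode.items()}


def is_hybrid(modes: dict) -> bool:
    """True when the model mixes Direct Lake and Import tables."""
    return {"DirectLake", "Import"} <= set(modes.values())


# Illustrative input shaped like the output of this step
modes = {"dim_Date": "Import", "fact_myevents_1bln": "DirectLake"}
print(summarize_storage_modes(modes))  # {'Import': ['dim_Date'], 'DirectLake': ['fact_myevents_1bln']}
print(is_hybrid(modes))                # True
```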

In [None]:
# Collect the storage mode of each table (one row per table)
objects = {}
with labs.tom.connect_semantic_model(dataset=ClonedModelName, readonly=True) as tom:
    for t in tom.model.Tables:
        for p in t.Partitions:
            objects[t.Name] = str(p.Mode)

df = pd.DataFrame(list(objects.items()), columns=["Table", "Storage Mode"])
display(df)

## <mark>13. SET CREDENTIALS AND LARGE MODEL IN SERVICE</mark>
This step must be done in the Power BI service, in the settings of the cloned semantic model, not from a notebook:  
- Create a new Shared Cloud Connection to be used for refreshing the Import table.
- For this lab, use OAuth2 as the authentication method. For production scenarios, a Service Principal is the recommended option.
- Make sure the privacy level is Organizational (the default).
- Enable the Large semantic model storage format.

## 14. Refresh import table
Load data into the `dim_Date` Import table using the connection configured in step 13.

In [None]:
labs.refresh_semantic_model(dataset=ClonedModelName,tables=["dim_Date"])

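## 15. Recalculate relationship indexes
Run a `calculate` refresh so that relationships are recalculated now that `dim_Date` holds imported data.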

In [None]:
labs.refresh_semantic_model(dataset=ClonedModelName,refresh_type="calculate")

## 16. Show what version of Direct Lake is being used

In [None]:
with labs.tom.connect_semantic_model(dataset=ClonedModelName, readonly=False) as tom:
    for e in tom.model.Expressions:
        print(e.Expression)

## 17. Update relationship to Many-to-Many
(No longer needed; kept for reference.)

In [None]:
# with labs.tom.connect_semantic_model(dataset=ClonedModelName, readonly=False) as tom:
#     # 1. Remove any existing relationships (collect first; removing while iterating is unsafe)
#     to_remove = [r for r in tom.model.Relationships
#                  if r.FromTable.Name == "fact_myevents_1bln" and r.ToTable.Name == "dim_Date"]
#     for r in to_remove:
#         tom.model.Relationships.Remove(r)
#
#     # 2. Create the many-to-many relationship
#     tom.add_relationship(from_table="fact_myevents_1bln", from_column="DateKey", to_table="dim_Date", to_column="DateKey", from_cardinality="Many", to_cardinality="Many")

## 18. Run query on 1bln-row table

In [None]:
df = runQueryWithTrace("""
        EVALUATE
	        SUMMARIZECOLUMNS(
		        dim_Date[DateKey],
		        "Quantity" , [Sum of Sales (1bln)]
		        )
""",workspace_name,ClonedModelName,DMV=False)

# The DAX result and server timings are already displayed inside runQueryWithTrace

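## 19. Run query on 2bln-row table
**This is expected to fail: the table exceeds the Direct Lake guardrail, and Direct Lake over One Lake cannot fall back to DirectQuery.**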
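The difference between this step and step 23 comes down to what happens when a table exceeds the Direct Lake rows-per-table guardrail for the capacity. The decision logic can be sketched as below. This is a simplified model, not the engine's implementation: only the row-count guardrail is considered, the threshold is an assumed value you would look up for your own capacity SKU, and the DL/SQL branch assumes the model is allowed to fall back (the default `Automatic` Direct Lake behavior). `query_path` and `GuardrailExceeded` are hypothetical names.

```python
class GuardrailExceeded(Exception):
    """Raised when Direct Lake over One Lake cannot serve the query."""


def query_path(row_count: int, max_rows_per_table: int, flavor: str) -> str:
    """Return how a query over one table would be served.

    flavor is 'DL/OL' (Direct Lake over One Lake) or 'DL/SQL'
    (Direct Lake over the SQL endpoint). Simplified: row guardrail only.
    """
    if row_count <= max_rows_per_table:
        return "Direct Lake"
    if flavor == "DL/SQL":
        # DL/SQL can fall back to DirectQuery through the SQL endpoint
        return "DirectQuery fallback"
    # DL/OL has no DirectQuery fallback, so the query fails
    raise GuardrailExceeded(
        f"{row_count:,} rows exceeds the {max_rows_per_table:,}-row guardrail"
    )


limit = 1_500_000_000  # assumed guardrail; check the documented value for your SKU
print(query_path(1_000_000_000, limit, "DL/OL"))   # Direct Lake
print(query_path(2_000_000_000, limit, "DL/SQL"))  # DirectQuery fallback
```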

In [None]:
df = runQueryWithTrace("""
        EVALUATE
	        SUMMARIZECOLUMNS(
		        dim_Date[DateKey],
		        "Quantity" , [Sum of Sales (2bln)]
		        )
""",workspace_name,ClonedModelName,DMV=False)

# The DAX result and server timings are already displayed inside runQueryWithTrace

## 20. Convert cloned model back to DL/SQL
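Converting tables back to Direct Lake means removing Import partitions from a partition collection. Removing items from a collection while looping over it is unsafe: many .NET collections throw, and Python lists silently skip elements. The safe pattern is to collect the matches first and then remove them. The pitfall is easy to reproduce with plain Python lists; in this sketch the `Partition` class and mode strings are stand-ins for the TOM objects, not the TOM API.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    mode: str  # e.g. "Import" or "DirectLake"


def remove_while_iterating(parts: list) -> list:
    # Buggy pattern: removing from the list being iterated skips elements
    for p in parts:
        if p.mode == "Import":
            parts.remove(p)
    return parts


def collect_then_remove(parts: list) -> list:
    # Safe pattern: snapshot the matches first, then mutate the collection
    to_remove = [p for p in parts if p.mode == "Import"]
    for p in to_remove:
        parts.remove(p)
    return parts


def make() -> list:
    return [Partition("p1", "Import"), Partition("p2", "Import"), Partition("p3", "DirectLake")]


print([p.name for p in remove_while_iterating(make())])  # ['p2', 'p3']
print([p.name for p in collect_then_remove(make())])     # ['p3']
```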

In [None]:
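# Look up the SQL endpoint of the lakehouse
df = pd.DataFrame(labs.list_lakehouses())
lh = df[df['Lakehouse Name'] == LakehouseName].iloc[0]
endpointid = lh['SQL Endpoint ID']
server = lh['SQL Endpoint Connection String']

with labs.tom.connect_semantic_model(dataset=ClonedModelName, readonly=False) as tom:

    # Convert Import tables back to Direct Lake.
    # Collect them first: removing partitions while iterating the collection is unsafe.
    import_tables = [t for t in tom.model.Tables
                     if any(p.Mode == TOM.ModeType.Import for p in t.Partitions)]
    for t in import_tables:
        for p in [p for p in t.Partitions if p.Mode == TOM.ModeType.Import]:
            t.Partitions.Remove(p)
        tom.add_entity_partition(table_name=t.Name, entity_name=t.Name)
        print(f"Table {t.Name} converted")

    # Clear the schema name on every Direct Lake (entity) partition, as in step 5
    for t in tom.model.Tables:
        for p in t.Partitions:
            if isinstance(p.Source, TOM.EntityPartitionSource):
                p.Source.SchemaName = None

    # Switch the model to Direct Lake over SQL
    for e in tom.model.Expressions:
        e.Expression = f"""
        let
            Source = Sql.Database("{server}", "{endpointid}")
        in
            Source"""

print("Converted to Direct Lake over SQL")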

## 21. Check what version of Direct Lake is being used

Sql.Database          = DirectLake over SQL  

AzureStorage.DataLake = DirectLake over One Lake

In [None]:
with labs.tom.connect_semantic_model(dataset=ClonedModelName, readonly=False) as tom:
    for e in tom.model.Expressions:
        print(e.Expression)

## 22. Show storage mode for each table

In [None]:
# Collect the storage mode of each table (one row per table)
objects = {}
with labs.tom.connect_semantic_model(dataset=ClonedModelName, readonly=True) as tom:
    for t in tom.model.Tables:
        for p in t.Partitions:
            objects[t.Name] = str(p.Mode)

df = pd.DataFrame(list(objects.items()), columns=["Table", "Storage Mode"])
display(df)

## 23. Run query on 2bln-row table
This should succeed by falling back to DirectQuery through the SQL Endpoint.
(If you get an error the first time, run the cell again.)

In [None]:
df = runQueryWithTrace("""
        EVALUATE
	        SUMMARIZECOLUMNS(
		        dim_Date[DateKey],
		        "Quantity" , [Sum of Sales (2bln)]
		        )
""",workspace_name,ClonedModelName,DMV=False)

# The DAX result and server timings are already displayed inside runQueryWithTrace

## 24. Show TMSL code for cloned model
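The printed JSON is long. Assuming `get_bim()` returns the usual model.bim structure as a Python dictionary (tables under `model.tables`, each with `partitions` carrying a `mode`, and shared M expressions under `model.expressions`, where an expression may be a string or a list of lines), a small hypothetical helper can pull out just the storage modes and data-source expressions. The sample dictionary is illustrative, not output captured from this lab.

```python
def summarize_bim(bim: dict) -> dict:
    """Return {'partitions': {table: [modes]}, 'expressions': {name: text}}."""
    model = bim.get("model", {})
    partitions = {
        t["name"]: [p.get("mode", "default") for p in t.get("partitions", [])]
        for t in model.get("tables", [])
    }
    expressions = {}
    for e in model.get("expressions", []):
        text = e.get("expression", "")
        # Multi-line expressions may be stored as a list of lines
        if isinstance(text, list):
            text = "\n".join(text)
        expressions[e["name"]] = text
    return {"partitions": partitions, "expressions": expressions}


# Illustrative sample shaped like a model.bim document
sample = {
    "model": {
        "tables": [
            {"name": "dim_Date", "partitions": [{"name": "dim_Date", "mode": "import"}]},
            {"name": "fact_myevents_1bln", "partitions": [{"name": "p", "mode": "directLake"}]},
        ],
        "expressions": [
            {"name": "DatabaseQuery", "kind": "m",
             "expression": ["let", '    Source = Sql.Database("server", "id")', "in", "    Source"]},
        ],
    }
}
summary = summarize_bim(sample)
print(summary["partitions"])  # {'dim_Date': ['import'], 'fact_myevents_1bln': ['directLake']}
```

In the notebook you would pass the `x` returned by `tom.get_bim()` in this step to `summarize_bim`.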

In [None]:
import json
with labs.tom.connect_semantic_model(dataset=ClonedModelName, readonly=False) as tom:
    x= tom.get_bim()

    formatted_json = json.dumps(x, indent=4)
    print(formatted_json)

## 25. Stop the Spark Session

In [None]:
notebookutils.session.stop()

---

## Lab 8 Summary: Hybrid Storage Mode Mastery

**🎯 What You Accomplished:**
This lab combined Direct Lake over One Lake with Import mode in a single semantic model, then switched the model back to Direct Lake over SQL to compare how each variant handles a table that exceeds the Direct Lake guardrails.

**🔧 Key Technical Achievements:**
- **Hybrid Architecture Implementation**: Configured a Direct Lake over One Lake model that also contains an Import table (`dim_Date`)
- **Storage Mode Conversion**: Converted the model between Direct Lake over SQL Endpoint and Direct Lake over One Lake, and converted `dim_Date` between Direct Lake and Import
- **Large-Scale Query Testing**: Ran queries against the 1-billion-row and 2-billion-row tables
- **Guardrail Management**: Saw the 2-billion-row query fail under Direct Lake over One Lake and succeed under Direct Lake over SQL by falling back to the SQL Endpoint
- **Relationship Recalculation**: Recalculated relationships after loading the Import table

**💡 Business Impact:**
- **Scalable Architecture**: Hybrid storage modes keep large tables in Direct Lake while letting selected tables use Import mode
- **Predictable Behavior**: Knowing which Direct Lake variant can fall back to DirectQuery lets you choose between a query that fails fast and one that succeeds, typically more slowly, through the SQL Endpoint
- **Enterprise Readiness**: These configurations support very large enterprise datasets, within the capacity's Direct Lake guardrails
- **Flexibility**: Multiple storage mode options allow precise performance tuning based on specific business requirements

**🚀 Advanced Concepts Mastered:**
- Direct Lake over One Lake vs. Direct Lake over SQL Endpoint implementation differences
- Hybrid storage mode architecture patterns
- Large-scale data performance optimization strategies
- DirectQuery fallback behavior and Direct Lake guardrails

**Next Steps**: You now know how the Direct Lake storage variants and hybrid Direct Lake/Import models behave, which lets you design enterprise-scale semantic models that balance performance, reliability, and cost for your dataset sizes and capacity.