# Lab 2: Direct Lake with Big Data

## Overview

This lab demonstrates Direct Lake at enterprise scale by working with billion-row datasets. You will create OneLake shortcuts for cross-workspace data access, build a semantic model over massive tables, and use tracing to observe how Direct Lake handles queries over tables that exceed its guardrails.

### Workshop Flow

1. Create a lakehouse with OneLake shortcuts to billion-row tables
2. Build a Direct Lake semantic model over the shortcut tables
3. Define relationships and measures for the big data model
4. Run queries against billion-row tables with performance tracing
5. Observe fallback behaviour when guardrails are exceeded

### Key Concepts

- **OneLake shortcuts** provide access to data in other workspaces without copying it
- **Direct Lake guardrails** define the row and memory limits for each capacity SKU
- **Fallback behaviour** is the automatic switch to DirectQuery mode, served by the SQL Analytics Endpoint, when guardrail limits are exceeded
- **Column temperature** indicates whether a column is loaded into memory (hot) or not (cold)

### Dataset Scale

| Table | Rows | Purpose |
|:------|:-----|:--------|
| fact_myevents_1bln | 1 billion | Standard fact table |
| fact_myevents_2bln | 2 billion | Exceeds guardrails to trigger fallback |
| fact_myevents_1bln_partitioned_datekey | 1 billion | Partitioned variant for later labs |
| dim_Date | ~3,650 | Date dimension |
| dim_Geography | ~200 | Geography dimension |

**Estimated duration:** 60-90 minutes  
**Prerequisites:** Lab 1 completed, access to the Big Data workspace

---

*Deutsche Version:*

# Lab 2: Direct Lake mit Big Data

## Uebersicht

Dieses Lab demonstriert Direct Lake im Unternehmensmassstab mit Milliarden-Zeilen-Datensaetzen. Sie erstellen OneLake-Shortcuts fuer workspace-uebergreifenden Datenzugriff, bauen ein Semantic Model ueber massiven Tabellen auf und verwenden Tracing, um zu beobachten, wie Direct Lake Abfragen behandelt, die seine Guardrails ueberschreiten.

### Wichtige Konzepte

- **OneLake-Shortcuts** bieten Zugriff auf Daten in anderen Workspaces, ohne sie zu kopieren
- **Direct Lake Guardrails** definieren die Zeilen- und Speichergrenzen fuer jede Kapazitaets-SKU
- **Fallback-Verhalten** ist der automatische Wechsel zum SQL Endpoint-Modus bei Ueberschreitung der Grenzen
- **Spaltentemperatur** gibt an, ob eine Spalte in den Speicher geladen ist (hot) oder nicht (cold)

**Geschaetzte Dauer:** 60-90 Minuten  
**Voraussetzungen:** Lab 1 abgeschlossen, Zugang zum Big Data Workspace

## Step 1: Install Required Libraries

Install the Semantic Link Labs library for Direct Lake model creation and OneLake shortcut management.

---

*Installieren Sie die Semantic Link Labs-Bibliothek fuer die Erstellung von Direct Lake-Modellen und die Verwaltung von OneLake-Shortcuts.*

In [None]:
%pip install -q semantic-link-labs

## Step 2: Import Libraries and Set Variables

Import the required Python libraries and configure environment variables for the big data lakehouse and semantic model names.

---

*Importieren Sie die erforderlichen Python-Bibliotheken und konfigurieren Sie Umgebungsvariablen fuer den Big-Data-Lakehouse- und Semantic Model-Namen.*

In [None]:
import sempy_labs as labs
from sempy import fabric
import sempy
import pandas
import json
import time

LakehouseName = "BigData"
SemanticModelName = f"{LakehouseName}_model"

capacity_name = labs.get_capacity_name()

Shortcut_LakehouseName = "BigDemoDB"
Shortcut_WorkspaceName = "DL Labs - Data [North Central US]"
if capacity_name == "FabConUS8-P1":
    Shortcut_WorkspaceName = "DL Labs - Data [West US 3]"


## Step 3: Create Lakehouse for Big Data

Create a new lakehouse that will host OneLake shortcuts to the billion-row tables. No data is physically copied; the shortcuts reference the source tables directly.

---

*Erstellen Sie ein neues Lakehouse, das OneLake-Shortcuts zu den Milliarden-Zeilen-Tabellen enthaelt. Es werden keine Daten physisch kopiert; die Shortcuts verweisen direkt auf die Quelltabellen.*

In [None]:
lakehouses=labs.list_lakehouses()["Lakehouse Name"]
if LakehouseName in lakehouses.values:
    lakehouseId = notebookutils.lakehouse.getWithProperties(LakehouseName)["id"]
else:
    lakehouseId = fabric.create_lakehouse(LakehouseName)

workspaceId = notebookutils.lakehouse.getWithProperties(LakehouseName)["workspaceId"]
workspaceName = sempy.fabric.resolve_workspace_name(workspaceId)
print(f"WorkspaceId = {workspaceId}, LakehouseID = {lakehouseId}, Workspace Name = {workspaceName}")

## Step 4: Create OneLake Shortcuts

Create shortcuts from the big data workspace to the current lakehouse. This gives the lakehouse access to billion-row fact tables and dimension tables without duplicating storage.

---

*Erstellen Sie Shortcuts vom Big-Data-Workspace zum aktuellen Lakehouse. Dadurch erhaelt das Lakehouse Zugriff auf Milliarden-Zeilen-Faktentabellen und Dimensionstabellen, ohne den Speicher zu duplizieren.*

In [None]:
#1. Remove any existing shortcuts
for index, row in labs.lakehouse.list_shortcuts(lakehouse=LakehouseName).iterrows():
    labs.lakehouse.delete_shortcut(shortcut_name=row["Shortcut Name"],lakehouse=LakehouseName)
    print(f"Deleted shortcut {row['Shortcut Name']}")

#2. Create one shortcut per source table
shortcut_tables = [
    "fact_myevents_1bln", "fact_myevents_1bln_no_vorder",
    "fact_myevents_1bln_partitioned_datekey", "fact_myevents_2bln",
    "dim_Date", "dim_Geography",
]
for table_name in shortcut_tables:
    labs.lakehouse.create_shortcut_onelake(
        table_name=table_name,
        source_lakehouse=Shortcut_LakehouseName,
        source_workspace=Shortcut_WorkspaceName,
        destination_lakehouse=LakehouseName,
    )

print('Adding shortcuts complete.')

## Step 5: Synchronise Table Metadata

Trigger a metadata refresh so the SQL Analytics Endpoint recognises the newly created shortcuts and their schemas.

---

*Loesen Sie eine Metadaten-Aktualisierung aus, damit der SQL Analytics Endpoint die neu erstellten Shortcuts und deren Schemata erkennt.*

The metadata sync polls until the operation completes. Expect periodic "running" status messages followed by "success".

---

*Die Metadaten-Synchronisation fragt ab, bis der Vorgang abgeschlossen ist. Erwarten Sie periodische "running"-Statusmeldungen, gefolgt von "success".*

In [None]:
# Background on SQL Analytics Endpoint sync delays:
# https://medium.com/@sqltidy/delays-in-the-automatically-generated-schema-in-the-sql-analytics-endpoint-of-the-lakehouse-b01c7633035d

def triggerMetadataRefresh():
    client = fabric.FabricRestClient()
    response = client.get(f"/v1/workspaces/{workspaceId}/lakehouses/{lakehouseId}")
    sqlendpoint = response.json()['properties']['sqlEndpointProperties']['id']

    # trigger sync
    uri = f"/v1.0/myorg/lhdatamarts/{sqlendpoint}"
    payload = {"commands":[{"$type":"MetadataRefreshExternalCommand"}]}
    response = client.post(uri,json= payload)
    batchId = response.json()['batchId']

    # Monitor progress: poll the batch status once per second until it reports success
    statusuri = f"/v1.0/myorg/lhdatamarts/{sqlendpoint}/batches/{batchId}"
    progressState = None
    while progressState != "success":
        if progressState is not None:
            time.sleep(1)
        statusresponsedata = client.get(statusuri).json()
        progressState = statusresponsedata['progressState']
        print(f"Metadata refresh : {progressState}")

    print('Metadata refresh complete')

triggerMetadataRefresh()
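
The polling loop above keeps going until the status becomes "success", so a refresh that never succeeds would poll forever. As a hedged sketch of one way to bound the wait, the helper below polls a status callback until it sees the target state or a timeout expires. The names `poll_until`, `timeout_s`, and `interval_s` are hypothetical, and the status sequence is a stand-in for the REST call, not real endpoint output.

```python
import time
from typing import Callable


def poll_until(get_state: Callable[[], str],
               done_state: str = "success",
               timeout_s: float = 300.0,
               interval_s: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Call get_state() until it returns done_state; return the number of polls."""
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        state = get_state()
        polls += 1
        print(f"Metadata refresh : {state}")
        if state == done_state:
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError(f"State was still '{state}' after {timeout_s} seconds")
        sleep(interval_s)


# Stand-in for the status endpoint: two "running" responses, then "success".
states = iter(["running", "running", "success"])
polls = poll_until(lambda: next(states), sleep=lambda seconds: None)
print(f"Finished after {polls} polls")
```

In the notebook, `get_state` would wrap the `client.get(statusuri)` call and return its `progressState` value.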

## Step 6: Create the Big Data Semantic Model

Generate a Direct Lake semantic model from the shortcut tables. This model will reference billion-row fact tables directly.

---

*Erstellen Sie ein Direct Lake Semantic Model aus den Shortcut-Tabellen. Dieses Modell verweist direkt auf Milliarden-Zeilen-Faktentabellen.*

In [None]:
#1. Generate list of ALL table names from lakehouse to add to Semantic Model
lakehouseTables:list = labs.lakehouse.get_lakehouse_tables(lakehouse=LakehouseName)["Table Name"].tolist()

completedOK:bool=False
while not completedOK:
    try:
        #2. Create the semantic model only if it does not exist yet
        if sempy.fabric.list_items().query(f"`Display Name`=='{SemanticModelName}' & Type=='SemanticModel'").shape[0] ==0:
            labs.directlake.generate_direct_lake_semantic_model(dataset=SemanticModelName,lakehouse_tables=lakehouseTables,workspace=workspaceName,lakehouse=lakehouseId,refresh=False,overwrite=True)
        # Set the flag in both cases; otherwise an existing model would keep this loop running forever
        completedOK=True
    except Exception as e:
        print(f'Error creating model ({e})... trying again.')
        time.sleep(3)
        triggerMetadataRefresh()

print('done')
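
Several cells in this lab repeat the same "try, sleep, try again" loop. The sketch below shows one possible way to factor that pattern into a helper with a bounded number of attempts, so a persistent error surfaces instead of looping indefinitely. `retry`, `attempts`, `delay_s`, and `on_error` are hypothetical names, and `flaky_create_model` only simulates a call that fails twice before succeeding.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T],
          attempts: int = 5,
          delay_s: float = 3.0,
          on_error: Optional[Callable[[], None]] = None,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Run action() until it succeeds or the attempt budget is used up."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            last_error = exc
            print(f"Attempt {attempt} failed: {exc}")
            if attempt < attempts:
                if on_error is not None:
                    on_error()  # e.g. trigger a metadata refresh before retrying
                sleep(delay_s)
    raise RuntimeError(f"Gave up after {attempts} attempts") from last_error


# Simulated operation: fails on the first two calls, succeeds on the third.
calls = {"n": 0}


def flaky_create_model() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ValueError("tables not visible yet")
    return "created"


result = retry(flaky_create_model, sleep=lambda seconds: None)
print(result)
```

In the notebook, `action` would be a function wrapping the model creation call, and `on_error` could be `triggerMetadataRefresh`.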

## Step 7: Configure Relationships

Define relationships between the billion-row fact tables and the dimension tables to enable cross-table filtering in queries.

---

*Definieren Sie Beziehungen zwischen den Milliarden-Zeilen-Faktentabellen und den Dimensionstabellen, um tabellenuebergreifendes Filtern in Abfragen zu ermoeglichen.*

In [None]:
completedOK:bool=False
while not completedOK:
    try:
        with labs.tom.connect_semantic_model(dataset=SemanticModelName, readonly=False) as tom:
            #1. Remove any existing relationships (iterate over a copy, because removing from the live collection while enumerating it fails)
            for r in list(tom.model.Relationships):
                tom.model.Relationships.Remove(r)

            #2. Creates correct relationships
            tom.add_relationship(from_table="fact_myevents_1bln"                    , from_column="DateKey"     , to_table="dim_Date"       , to_column="DateKey"       , from_cardinality="Many" , to_cardinality="One")
            tom.add_relationship(from_table="fact_myevents_1bln"                    , from_column="GeographyID" , to_table="dim_Geography"  , to_column="GeographyID"   , from_cardinality="Many" , to_cardinality="One")

            tom.add_relationship(from_table="fact_myevents_2bln"                    , from_column="DateKey"     , to_table="dim_Date"       , to_column="DateKey"       , from_cardinality="Many" , to_cardinality="One")
            tom.add_relationship(from_table="fact_myevents_2bln"                    , from_column="GeographyID" , to_table="dim_Geography"  , to_column="GeographyID"   , from_cardinality="Many" , to_cardinality="One")

            tom.add_relationship(from_table="fact_myevents_1bln_partitioned_datekey", from_column="DateKey"     , to_table="dim_Date"       , to_column="DateKey"       , from_cardinality="Many" , to_cardinality="One")
            tom.add_relationship(from_table="fact_myevents_1bln_partitioned_datekey", from_column="GeographyID" , to_table="dim_Geography"  , to_column="GeographyID"   , from_cardinality="Many" , to_cardinality="One")
            completedOK=True
    except:
        print('Error adding relationships... trying again.')
        time.sleep(3)

print('done')
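
Every fact table relates to the same two dimensions on the same key columns, so the six relationships can also be derived from data instead of being written out by hand. The following is a minimal sketch of that idea; `build_relationships` is a hypothetical helper, and the table and column names are the ones used in the cell above.

```python
from itertools import product

FACT_TABLES = [
    "fact_myevents_1bln",
    "fact_myevents_2bln",
    "fact_myevents_1bln_partitioned_datekey",
]

# Dimension table -> key column shared with every fact table
DIMENSION_KEYS = {"dim_Date": "DateKey", "dim_Geography": "GeographyID"}


def build_relationships(fact_tables, dimension_keys):
    """Return one many-to-one relationship spec per (fact table, dimension) pair."""
    return [
        {
            "from_table": fact,
            "from_column": key,
            "to_table": dim,
            "to_column": key,
            "from_cardinality": "Many",
            "to_cardinality": "One",
        }
        for fact, (dim, key) in product(fact_tables, dimension_keys.items())
    ]


relationships = build_relationships(FACT_TABLES, DIMENSION_KEYS)
for rel in relationships:
    print(f"{rel['from_table']}[{rel['from_column']}] -> {rel['to_table']}[{rel['to_column']}]")
```

Inside the `connect_semantic_model` block, each spec could then be applied with `tom.add_relationship(**rel)`, since the dictionary keys match the keyword arguments used in this step.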

## Step 8: Add Measures

Create DAX measures on each fact table for use in performance testing queries later in this lab.

---

*Erstellen Sie DAX-Measures fuer jede Faktentabelle zur Verwendung in Leistungstestabfragen spaeter in diesem Lab.*

In [None]:
completedOK:bool=False
while not completedOK:
    try:
        with labs.tom.connect_semantic_model(dataset=SemanticModelName, readonly=False) as tom:
            #1. Remove any existing measures (iterate over a copy so that removal does not disturb the enumeration)
            for t in tom.model.Tables:
                for m in list(t.Measures):
                    print(f"Removing measure {m.Name}")
                    tom.remove_object(m)

            tom.add_measure(table_name="fact_myevents_2bln",measure_name="Sum of Sales (2bln)",expression="SUM(fact_myevents_2bln[Quantity_ThisYear])",format_string="#,0")
            tom.add_measure(table_name="fact_myevents_1bln",measure_name="Sum of Sales (1bln)",expression="SUM(fact_myevents_1bln[Quantity_ThisYear])",format_string="#,0")
            completedOK=True
    except:
        print('Error adding measures... trying again.')
        time.sleep(3)

print('done')

## Step 9: Configure Date Table

Mark dim_Date as a date table to enable time intelligence functions for filtering and aggregating billion-row tables by date.

---

*Markieren Sie dim_Date als Datumstabelle, um Zeitintelligenzfunktionen zum Filtern und Aggregieren von Milliarden-Zeilen-Tabellen nach Datum zu aktivieren.*

In [None]:
completedOK:bool=False
while not completedOK:
    try:
        with labs.tom.connect_semantic_model(dataset=SemanticModelName, readonly=False) as tom:
            tom.mark_as_date_table(table_name="dim_Date",column_name="DateKey")
            completedOK=True
    except:
        print('Error with date table... trying again.')
        time.sleep(3)

print('done')

## Step 10: Configure Column Sorting

Set sort-by-column properties on date dimension columns to ensure chronological ordering in visualisations.

---

*Legen Sie Sortierungseigenschaften fuer Datumsdimensionsspalten fest, um die chronologische Reihenfolge in Visualisierungen sicherzustellen.*

In [None]:
completedOK:bool=False
while not completedOK:
    try:
        tom = labs.tom.TOMWrapper(dataset=SemanticModelName, workspace=workspaceName, readonly=False)
        tom.set_sort_by_column(table_name="dim_Date",column_name="MonthName"       ,sort_by_column="Month")
        tom.set_sort_by_column(table_name="dim_Date",column_name="WeekDayName"     ,sort_by_column="Weekday")
        tom.model.SaveChanges()

        #Show BIM data for dim_Date table
        for i, t in enumerate(tom.model.Tables):
            if t.Name=="dim_Date":
                bim = json.dumps(tom.get_bim()["model"]["tables"][i],indent=4)
                print(bim)
        completedOK=True
    except:
        print('Error with sort by cols... trying again.')
        time.sleep(3)

print('done')

## Step 11: Hide Fact Table Columns

Hide raw columns on the fact tables so that report authors are guided towards using the defined measures.

---

*Blenden Sie Rohspalten der Faktentabellen aus, damit Berichtsautoren die definierten Measures verwenden.*

In [None]:
completedOK:bool=False
while not completedOK:
    try:
        # Reuses the `tom` connection opened in Step 10
        for i, t in enumerate(tom.model.Tables):
            if t.Name in ["fact_myevents_1bln","fact_myevents_2bln","fact_myevents_1bln_partitioned_datekey"]:
                for c in t.Columns:
                    c.IsHidden=True

                bim = json.dumps(tom.get_bim()["model"]["tables"][i],indent=4)
                print(bim)
        tom.model.SaveChanges()
        completedOK=True
    except:
        print('Error with hiding cols... trying again.')
        time.sleep(3)

print('done')

## Step 12: Frame the Model

Trigger framing to prepare the semantic model to serve queries against the billion-row tables.

---

*Loesen Sie das Framing aus, um das Semantic Model fuer die Abfrage der Milliarden-Zeilen-Tabellen vorzubereiten.*

In [None]:
reframeOK:bool=False
while not reframeOK:
    try:
        result:pandas.DataFrame = labs.refresh_semantic_model(dataset=SemanticModelName)
        reframeOK=True
    except:
        print('Error with reframe... trying again.')
        triggerMetadataRefresh()
        time.sleep(3)

print('Custom Semantic Model reframe OK')

## Step 13: Create Tracing Helper Functions

Define functions for capturing trace events during query execution. These will be used to detect Direct Lake mode, SQL Endpoint fallback, and query timing.

---

*Definieren Sie Funktionen zum Erfassen von Trace-Ereignissen waehrend der Abfrageausfuehrung. Diese werden verwendet, um Direct Lake-Modus, SQL Endpoint-Fallback und Abfragezeiten zu erkennen.*

In [None]:
import warnings
from Microsoft.AnalysisServices.Tabular import TraceEventArgs
from typing import Dict, List, Optional, Callable


#### Generate Unique Trace Name - Start ####
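import json, base64, re
token = notebookutils.credentials.getToken("pbi")
payload = token.split(".")[1]
# JWT segments are base64url-encoded without padding: restore the padding, then decode with the URL-safe alphabet
payload += "=" * (-len(payload) % 4)
upn = json.loads(base64.urlsafe_b64decode(payload)).get("upn")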

# Extract just the user part (e.g. "SQLKDL.user39")
user_id = upn.split("@")[0]
lab_number = 2  # set per lab

# Remove characters not allowed in trace names: . , ; ' ` : / \ * | ? " & % $ ! + = ( ) [ ] { } < >
user_id_clean = re.sub(r"[.,;'`:/\\*|?\"&%$!+=(){}\[\]<>]", "_", user_id)
trace_name = f"Lab{lab_number}_{user_id_clean}"
#### Generate Unique Trace Name - End ####


def runDMV():
    df = sempy.fabric.evaluate_dax(
        dataset=SemanticModelName, 
        dax_string="""
        
        SELECT 
            MEASURE_GROUP_NAME AS [TABLE],
            ATTRIBUTE_NAME AS [COLUMN],
            DATATYPE ,
            DICTIONARY_SIZE 		    AS SIZE ,
            DICTIONARY_ISPAGEABLE 		AS PAGEABLE ,
            DICTIONARY_ISRESIDENT		AS RESIDENT ,
            DICTIONARY_TEMPERATURE		AS TEMPERATURE,
            DICTIONARY_LAST_ACCESSED	AS LASTACCESSED 
        FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS 
        ORDER BY 
            [DICTIONARY_TEMPERATURE] DESC
        
        """)
    display(df)

def filter_func(e):
    # Keep every trace event except the VertiPaqScanInternal subclass
    return e.EventSubclass.ToString() != "VertiPaqScanInternal"

def runQueryWithTrace (expr:str,workspaceName:str,SemanticModelName:str,Result:Optional[bool]=True,Trace:Optional[bool]=True,DMV:Optional[bool]=True,ClearCache:Optional[bool]=True) -> pandas.DataFrame :
    # define events to trace and their corresponding columns
    event_schema = fabric.Trace.get_default_query_trace_schema()
    event_schema.update({"ExecutionMetrics":["EventClass","TextData"]})
    del event_schema['VertiPaqSEQueryBegin']
    del event_schema['VertiPaqSEQueryCacheMatch']
    del event_schema['DirectQueryBegin']

    warnings.filterwarnings("ignore")

    WorkspaceName = workspaceName

    if ClearCache:
        labs.clear_cache(SemanticModelName)

    with fabric.create_trace_connection(SemanticModelName,WorkspaceName) as trace_connection:
        # create trace on server with specified events
        with trace_connection.create_trace(
            event_schema=event_schema, 
            name=trace_name,
            filter_predicate=filter_func,
            stop_event="QueryEnd"
            ) as trace:

            trace.start()

            df=sempy.fabric.evaluate_dax(
                dataset=SemanticModelName, 
                dax_string=expr)

            if Result:
                displayHTML(f"<H2>####### DAX QUERY RESULT #######</H2>")
                display(df)

            # Wait 5 seconds for trace data to arrive
            time.sleep(5)

            # stop Trace and collect logs
            final_trace_logs = trace.stop()

    if Trace:
        displayHTML(f"<H2>####### SERVER TIMINGS #######</H2>")
        display(final_trace_logs)
    
    if DMV:
        displayHTML(f"<H2>####### SHOW DMV RESULTS #######</H2>")
        runDMV()
    
    return final_trace_logs
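
The trace name above is derived from the `upn` claim in the access token. To make that derivation easier to follow outside a Fabric session, here is a self-contained sketch that builds a fake, unsigned token and runs the same two steps: decoding the base64url payload and replacing disallowed characters. `fake_token`, `decode_jwt_payload`, `make_trace_name`, and the `example.com` address are illustrative assumptions; a real token comes from `notebookutils.credentials.getToken`.

```python
import base64
import json
import re


def fake_token(claims: dict) -> str:
    """Build an unsigned JWT-shaped string for demonstration purposes only."""
    def b64url(obj: dict) -> str:
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return f"{b64url({'alg': 'none'})}.{b64url(claims)}.signature"


def decode_jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT without verifying it."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def make_trace_name(upn: str, lab_number: int) -> str:
    """Use the part before '@' and replace characters not allowed in trace names."""
    user_id = upn.split("@")[0]
    user_id_clean = re.sub(r"[.,;'`:/\\*|?\"&%$!+=(){}\[\]<>]", "_", user_id)
    return f"Lab{lab_number}_{user_id_clean}"


token = fake_token({"upn": "SQLKDL.user39@example.com"})
claims = decode_jwt_payload(token)
trace_name = make_trace_name(claims["upn"], lab_number=2)
print(trace_name)  # Lab2_SQLKDL_user39
```

Using a per-user trace name means participants who share a capacity each get their own trace.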

## Step 14: Review Table Traits and Guardrails

Validate that the model is in Direct Lake mode and display the guardrail limits for the current capacity SKU. This helps set expectations for which queries will succeed in Direct Lake mode and which will fall back.

---

*Validieren Sie, dass das Modell im Direct Lake-Modus ist, und zeigen Sie die Guardrail-Grenzen fuer die aktuelle Kapazitaets-SKU an. Dies hilft bei der Einschaetzung, welche Abfragen im Direct Lake-Modus erfolgreich sind und welche zurueckfallen.*

In [None]:
df=sempy.fabric.evaluate_dax(
    dataset=SemanticModelName, 
    dax_string="""
    
    evaluate tabletraits()
    
    """)
display(df)

In [None]:
df=labs.directlake.get_direct_lake_guardrails()
display(df)
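
Once the guardrail table is displayed, the row counts from the Dataset Scale table can be compared against the rows-per-table limit for your SKU. The sketch below does that comparison in plain Python. The 1.5 billion limit is an assumed value used only for illustration; take the real figure from the `get_direct_lake_guardrails()` output for your capacity. `tables_over_limit` is a hypothetical helper.

```python
# Approximate row counts taken from the Dataset Scale table in the overview
TABLE_ROWS = {
    "fact_myevents_1bln": 1_000_000_000,
    "fact_myevents_2bln": 2_000_000_000,
    "fact_myevents_1bln_partitioned_datekey": 1_000_000_000,
    "dim_Date": 3_650,
    "dim_Geography": 200,
}

# ASSUMPTION: illustrative rows-per-table guardrail; replace with your SKU's value
ASSUMED_MAX_ROWS_PER_TABLE = 1_500_000_000


def tables_over_limit(table_rows: dict, max_rows: int) -> list:
    """Return the names of tables whose row count exceeds the guardrail."""
    return sorted(name for name, rows in table_rows.items() if rows > max_rows)


over = tables_over_limit(TABLE_ROWS, ASSUMED_MAX_ROWS_PER_TABLE)
print(f"Tables expected to trigger fallback: {over}")
```

Under that assumed limit, only the 2 billion row table exceeds the guardrail, which matches its stated purpose in this lab.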

## Step 15: Establish Column Residency Baseline

Capture the current column loading states and memory usage as a baseline before running billion-row queries.

---

*Erfassen Sie die aktuellen Spaltenladezustaende und die Speichernutzung als Basislinie, bevor Milliarden-Zeilen-Abfragen ausgefuehrt werden.*

In [None]:
runDMV()
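
The DMV output can be long, so a per-table summary of resident versus non-resident columns may be easier to compare before and after the queries. This is a minimal sketch under stated assumptions: the sample rows are invented, the keys mirror the aliases in the `runDMV` query (`TABLE`, `COLUMN`, `RESIDENT`), and `summarise_residency` is a hypothetical helper. The actual column labels returned by `evaluate_dax` may differ.

```python
from collections import defaultdict


def summarise_residency(rows: list) -> dict:
    """Count resident and non-resident columns per table."""
    summary = defaultdict(lambda: {"resident": 0, "not_resident": 0})
    for row in rows:
        key = "resident" if row["RESIDENT"] else "not_resident"
        summary[row["TABLE"]][key] += 1
    return dict(summary)


# Invented sample rows shaped like the DMV result
sample_rows = [
    {"TABLE": "fact_myevents_1bln", "COLUMN": "DateKey", "RESIDENT": True},
    {"TABLE": "fact_myevents_1bln", "COLUMN": "Quantity_ThisYear", "RESIDENT": True},
    {"TABLE": "fact_myevents_1bln", "COLUMN": "GeographyID", "RESIDENT": False},
    {"TABLE": "dim_Date", "COLUMN": "FirstDateofMonth", "RESIDENT": True},
]

summary = summarise_residency(sample_rows)
for table, counts in summary.items():
    print(f"{table}: {counts['resident']} resident, {counts['not_resident']} not resident")
```

If `runDMV` were changed to return its DataFrame, the rows could be obtained with `df.to_dict("records")`.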

## Step 16: Execute Billion-Row Queries with Tracing

Run a series of queries against the billion-row tables with trace monitoring enabled. The trace output reveals whether each query ran in Direct Lake mode or fell back to the SQL Endpoint.

---

*Fuehren Sie eine Reihe von Abfragen gegen die Milliarden-Zeilen-Tabellen mit aktivierter Trace-Ueberwachung aus. Die Trace-Ausgabe zeigt, ob jede Abfrage im Direct Lake-Modus oder ueber den SQL Endpoint ausgefuehrt wurde.*

### Step 16.1: Query the 1 Billion Row Table

Run a baseline query against the 1 billion row fact table to observe Direct Lake performance at scale.

---

*Fuehren Sie eine Basisabfrage gegen die 1-Milliarden-Zeilen-Faktentabelle aus, um die Direct Lake-Leistung im grossen Massstab zu beobachten.*

In [None]:
df = runQueryWithTrace("""
    
    EVALUATE
        SUMMARIZECOLUMNS(
               
                dim_Date[FirstDateofMonth] ,
                "Count of Transactions" , COUNTROWS(fact_myevents_1bln) ,
                "Sum of Sales" , [Sum of Sales (1bln)] 
        )
        ORDER BY [FirstDateofMonth]

""",workspaceName,SemanticModelName)

### Step 16.2: Query the 2 Billion Row Table

Run the same query pattern against the 2 billion row table to observe whether Direct Lake falls back to the SQL Endpoint.

---

*Fuehren Sie dasselbe Abfragemuster gegen die 2-Milliarden-Zeilen-Tabelle aus, um zu beobachten, ob Direct Lake zum SQL Endpoint zurueckfaellt.*

In [None]:
df = runQueryWithTrace("""

    EVALUATE
        SUMMARIZECOLUMNS(
                dim_Date[FirstDateofMonth] ,
                "Count of Transactions" , COUNTROWS(fact_myevents_2bln) ,
                "Sum of Sales" , [Sum of Sales (2bln)]
        )
        ORDER BY [FirstDateofMonth]

""",workspaceName,SemanticModelName,DMV=False)

### Step 16.3: Cross-Table Query Combining Both Billion-Row Tables

Run a query that references both billion-row tables simultaneously to test Direct Lake under maximum load.

---

*Fuehren Sie eine Abfrage aus, die beide Milliarden-Zeilen-Tabellen gleichzeitig referenziert, um Direct Lake unter maximaler Last zu testen.*

In [None]:
df = runQueryWithTrace("""

    EVALUATE
        SUMMARIZECOLUMNS(
                dim_Date[FirstDateofMonth] ,
                "Count of Transactions" , COUNTROWS(fact_myevents_1bln) ,
                "Sum of Sales (1bln)" , [Sum of Sales (1bln)] ,
                "Sum of Sales (2bln)" , [Sum of Sales (2bln)]
        )
        ORDER BY [FirstDateofMonth]

""",workspaceName,SemanticModelName,DMV=False)
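
Reading the trace output by eye works for a handful of queries; for repeated runs, a small classifier over the event names can help. The sketch below assumes that DirectQuery events in a trace indicate fallback and that VertiPaq storage engine events indicate in-memory Direct Lake execution. Both the heuristic and the `classify_query` helper are assumptions for illustration, and the event lists are invented samples rather than captured output.

```python
def classify_query(event_classes: list) -> str:
    """Classify one traced query by the kinds of events it produced."""
    events = set(event_classes)
    if any(name.startswith("DirectQuery") for name in events):
        return "fallback (DirectQuery)"
    if any(name.startswith("VertiPaqSEQuery") for name in events):
        return "Direct Lake (in-memory)"
    return "unknown"


# Invented event sequences for illustration
direct_lake_events = ["QueryBegin", "VertiPaqSEQueryEnd", "ExecutionMetrics", "QueryEnd"]
fallback_events = ["QueryBegin", "DirectQueryEnd", "ExecutionMetrics", "QueryEnd"]

print(classify_query(direct_lake_events))
print(classify_query(fallback_events))
```

In the notebook, the input would come from the event class column of the DataFrame returned by `runQueryWithTrace`; the exact column label depends on the trace schema, so check `df.columns` first.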

## Step 17: Stop the Spark Session

Stop the Spark session now that the lab is complete.

---

*Spark-Sitzung beenden.*

In [None]:
notebookutils.session.stop()

## Lab 2 Summary

### What You Accomplished

- **Cross-workspace data access:** Created OneLake shortcuts to billion-row tables without duplicating storage
- **Billion-row semantic model:** Built and configured a Direct Lake model over 1 and 2 billion row fact tables
- **Performance tracing:** Used trace events to monitor query execution mode and timing
- **Fallback observation:** Identified conditions under which Direct Lake falls back to the SQL Endpoint
- **Stress testing:** Ran queries combining multiple billion-row tables to test system limits

### Key Takeaways

- Direct Lake can handle billion-row tables when the data fits within the capacity's guardrails
- OneLake shortcuts enable zero-copy access to data across workspaces
- When a table exceeds the capacity's guardrail limits, queries against it automatically fall back to DirectQuery through the SQL Endpoint, so they still complete
- Trace events provide clear visibility into whether a query ran in Direct Lake mode or via fallback

### Next Lab

Continue to **Lab 3** to analyse Delta table structure using the Delta Analyzer tool.

---

*Deutsche Version:*

### Was Sie erreicht haben

- **Workspace-uebergreifender Datenzugriff:** OneLake-Shortcuts zu Milliarden-Zeilen-Tabellen ohne Speicherduplizierung erstellt
- **Milliarden-Zeilen Semantic Model:** Ein Direct Lake-Modell ueber 1- und 2-Milliarden-Zeilen-Faktentabellen erstellt und konfiguriert
- **Leistungs-Tracing:** Trace-Ereignisse zur Ueberwachung des Abfrageausfuehrungsmodus und der Zeitmessung verwendet
- **Fallback-Beobachtung:** Bedingungen identifiziert, unter denen Direct Lake zum SQL Endpoint zurueckfaellt
- **Stresstests:** Abfragen ueber mehrere Milliarden-Zeilen-Tabellen ausgefuehrt, um Systemgrenzen zu testen

### Wichtige Erkenntnisse

- Direct Lake kann Milliarden-Zeilen-Tabellen verarbeiten, wenn die Daten innerhalb der Kapazitaets-Guardrails liegen
- OneLake-Shortcuts ermoeglichen kopierlosen Zugriff auf Daten ueber Workspaces hinweg
- Bei Ueberschreitung der Guardrail-Grenzen faellt Direct Lake automatisch zum SQL Endpoint zurueck
- Trace-Ereignisse bieten klare Sichtbarkeit, ob eine Abfrage im Direct Lake-Modus oder ueber Fallback ausgefuehrt wurde

### Naechstes Lab

Weiter zu **Lab 3**, um die Delta-Tabellenstruktur mit dem Delta Analyzer zu analysieren.