# Lab 4: Direct Lake Fallback Behaviour

## Overview

This lab explores how Direct Lake protects system stability when queries exceed capacity limits. You will observe when and why Direct Lake automatically switches to the SQL Endpoint (fallback), compare the three execution modes (Automatic, DirectLakeOnly, DirectQueryOnly), and learn to diagnose fallback scenarios using trace analysis.

### Workshop Flow

1. Connect to the big data environment from Lab 2
2. Run a query in Automatic mode and observe fallback to the SQL Endpoint
3. Switch to DirectLakeOnly mode and observe query failure when limits are exceeded
4. Restore Automatic mode and compare performance characteristics
5. Review trace output to understand execution mode decisions

### Key Concepts

- **Automatic mode** lets Direct Lake fall back to the SQL Endpoint when guardrails are exceeded, so queries still return results instead of failing
- **DirectLakeOnly mode** disables fallback, meaning queries fail if they exceed capacity limits
- **DirectQueryOnly mode** forces all queries through the SQL Endpoint regardless of data size
- **Trace analysis** reveals which execution path a query took and its timing characteristics
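The behaviour of the three modes can be summarised as a small decision function. This is a conceptual sketch, not how the engine is implemented. `Mode`, `route_query`, `GuardrailExceeded` and the single `exceeds_guardrails` flag are hypothetical simplifications of the checks Direct Lake performs.

```python
from enum import Enum


class Mode(Enum):
    """The three Direct Lake behaviour settings used in this lab."""
    AUTOMATIC = "Automatic"
    DIRECT_LAKE_ONLY = "DirectLakeOnly"
    DIRECT_QUERY_ONLY = "DirectQueryOnly"


class GuardrailExceeded(Exception):
    """Hypothetical stand-in for the error a DirectLakeOnly query raises."""


def route_query(mode: Mode, exceeds_guardrails: bool) -> str:
    """Return the execution path a query would take (conceptual model only)."""
    if mode is Mode.DIRECT_QUERY_ONLY:
        return "DirectQuery"  # always goes through the SQL Endpoint
    if not exceeds_guardrails:
        return "DirectLake"   # data fits: served by Direct Lake
    if mode is Mode.AUTOMATIC:
        return "DirectQuery"  # fallback to the SQL Endpoint
    # DirectLakeOnly with guardrails exceeded: fallback is disabled, so the query fails
    raise GuardrailExceeded("Query exceeds Direct Lake guardrails and fallback is disabled")


# Print the full decision table
for mode in Mode:
    for exceeds in (False, True):
        try:
            path = route_query(mode, exceeds)
        except GuardrailExceeded:
            path = "error"
        print(f"{mode.value:<16} exceeds={exceeds!s:<5} -> {path}")
```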

### Learning Objectives

- Understand the conditions that trigger fallback from Direct Lake to the SQL Endpoint
- Configure and compare the three execution modes
- Use trace events to diagnose execution mode decisions
- Determine the appropriate mode for different production scenarios

**Prerequisites:** Lab 2 completed (big data lakehouse and semantic model with billion-row tables)

### Prerequisites and Lab Dependencies

This lab requires the infrastructure from Lab 2:

- **BigData lakehouse** with OneLake shortcuts to the billion-row tables
- **BigData_model semantic model** with relationships and measures configured
- **Billion-row tables** (fact_myevents_1bln and fact_myevents_2bln) that naturally push against guardrail limits

The large table sizes are essential because they create realistic conditions for observing fallback behaviour. The lab follows a structured sequence: establish baseline behaviour, intentionally trigger fallback, test each execution mode, and then analyse the results.


## Step 1: Install Required Libraries

Install the Semantic Link Labs library for semantic model configuration and trace analysis.


In [None]:
%pip install -q semantic-link-labs

## Step 2: Configure Environment

Import the libraries and connect to the big data lakehouse and semantic model from Lab 2. This also checks that the required artifacts exist.


In [None]:
import sempy_labs as labs
from sempy import fabric
import sempy

# Find the lakehouse created in Lab 2 (default "BigData"; any name starting with "Big" is accepted)
LakehouseName = "BigData"
lakehouses = labs.list_lakehouses()["Lakehouse Name"]
for l in lakehouses:
    if l.startswith("Big"):
        LakehouseName = l

SemanticModelName = f"{LakehouseName}_model"

# Stop here if Lab 2 has not been completed; the later cells need these IDs
if LakehouseName not in lakehouses.values:
    raise ValueError("You need to complete Lab 2 to create the required lakehouse for this lab")

# notebookutils is available by default in Fabric notebooks
lakehouseProps = notebookutils.lakehouse.getWithProperties(LakehouseName)
lakehouseId = lakehouseProps["id"]
workspaceId = lakehouseProps["workspaceId"]
workspaceName = sempy.fabric.resolve_workspace_name(workspaceId)
print(f"WorkspaceId = {workspaceId}, LakehouseID = {lakehouseId}, Workspace Name = {workspaceName}")

## Step 3: Create Trace Function for Fallback Analysis

Define a tracing function that captures query execution events, including whether Direct Lake mode was used or whether the query fell back to the SQL Endpoint.


In [None]:
import warnings
import time
from Microsoft.AnalysisServices.Tabular import TraceEventArgs
from typing import Dict, List, Optional, Callable
import pandas

#### Generate Unique Trace Name - Start ####
import json, base64, re
token = notebookutils.credentials.getToken("pbi")
# The token is a JWT: its payload segment is unpadded base64url, so restore
# the padding (0-3 "=" characters) and decode with the URL-safe alphabet
payload = token.split(".")[1]
payload += "=" * (-len(payload) % 4)
claims = json.loads(base64.urlsafe_b64decode(payload))
upn = claims.get("upn") or claims.get("unique_name", "user")

# Extract just the user part (e.g. "SQLKDL.user39")
user_id = upn.split("@")[0]
lab_number = 4  # set per lab

# Replace characters not allowed in trace names: . , ; ' ` : / \ * | ? " & % $ ! + = ( ) [ ] { } < >
user_id_clean = re.sub(r"[.,;'`:/\\*|?\"&%$!+=(){}\[\]<>]", "_", user_id)
trace_name = f"Lab{lab_number}_{user_id_clean}"
#### Generate Unique Trace Name - End ####


def runDMV():
    # Show dictionary size, residency and temperature for every column, hottest first
    df = sempy.fabric.evaluate_dax(
        dataset=SemanticModelName,
        workspace=workspaceName,
        dax_string="""
        SELECT
            MEASURE_GROUP_NAME       AS [TABLE],
            ATTRIBUTE_NAME           AS [COLUMN],
            DATATYPE,
            DICTIONARY_SIZE          AS SIZE,
            DICTIONARY_ISPAGEABLE    AS PAGEABLE,
            DICTIONARY_ISRESIDENT    AS RESIDENT,
            DICTIONARY_TEMPERATURE   AS TEMPERATURE,
            DICTIONARY_LAST_ACCESSED AS LASTACCESSED
        FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
        ORDER BY
            [DICTIONARY_TEMPERATURE] DESC
        """)
    display(df)

def filter_func(e):
    # Drop internal VertiPaq scan events to keep the trace output readable
    return e.EventSubclass.ToString() != "VertiPaqScanInternal"

def runQueryWithTrace(expr: str, workspaceName: str, SemanticModelName: str,
                      Result: bool = True, Trace: bool = True, DMV: bool = True,
                      ClearCache: bool = True) -> pandas.DataFrame:
    # Define the events to trace and their corresponding columns
    event_schema = fabric.Trace.get_default_query_trace_schema()
    event_schema.update({"ExecutionMetrics": ["EventClass", "TextData"]})
    # Drop begin/cache-match events; DirectQueryEnd is kept as the fallback marker
    del event_schema["VertiPaqSEQueryBegin"]
    del event_schema["VertiPaqSEQueryCacheMatch"]
    del event_schema["DirectQueryBegin"]

    warnings.filterwarnings("ignore")

    # Clear the model cache so the query is not answered from cache
    if ClearCache:
        labs.clear_cache(SemanticModelName, workspace=workspaceName)

    with fabric.create_trace_connection(SemanticModelName, workspaceName) as trace_connection:
        # Create the trace on the server with the specified events
        with trace_connection.create_trace(
            event_schema=event_schema,
            name=trace_name,
            filter_predicate=filter_func,
            stop_event="QueryEnd",
        ) as trace:

            trace.start()

            df = sempy.fabric.evaluate_dax(
                dataset=SemanticModelName,
                dax_string=expr,
                workspace=workspaceName)

            if Result:
                displayHTML("<H2>####### DAX QUERY RESULT #######</H2>")
                display(df)

            # Wait 5 seconds for trace data to arrive
            time.sleep(5)

            # Stop the trace and collect the logs
            final_trace_logs = trace.stop()

    if Trace:
        displayHTML("<H2>####### SERVER TIMINGS #######</H2>")
        display(final_trace_logs)

    if DMV:
        displayHTML("<H2>####### SHOW DMV RESULTS #######</H2>")
        runDMV()

    return final_trace_logs


In [None]:
runDMV()
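The DMV output is easier to read once it is narrowed to the columns currently held in memory. A minimal sketch, assuming the DMV result is converted to records (for example `df.to_dict("records")` on the DataFrame returned by the same `evaluate_dax` call) and keeps the aliases `TABLE`, `COLUMN`, `RESIDENT` and `TEMPERATURE` from the query above. `resident_columns` is a hypothetical helper, and the sample rows are illustrative, not measured values.

```python
from typing import Dict, List, Tuple


def resident_columns(rows: List[Dict]) -> List[Tuple[str, str, float]]:
    """Return (table, column, temperature) for resident columns, hottest first."""
    hot = [
        (r["TABLE"], r["COLUMN"], float(r["TEMPERATURE"] or 0))
        for r in rows
        if r.get("RESIDENT")  # keep only columns currently loaded in memory
    ]
    return sorted(hot, key=lambda t: t[2], reverse=True)


# Illustrative rows shaped like the DMV output (hypothetical values)
sample = [
    {"TABLE": "dim_Date", "COLUMN": "FirstDateofMonth", "RESIDENT": True, "TEMPERATURE": 3.0},
    {"TABLE": "fact_myevents_1bln", "COLUMN": "Sales", "RESIDENT": True, "TEMPERATURE": 12.5},
    {"TABLE": "fact_myevents_2bln", "COLUMN": "Sales", "RESIDENT": False, "TEMPERATURE": None},
]
for table, column, temp in resident_columns(sample):
    print(f"{table}[{column}] temperature={temp}")
```

Running it before and after a query shows which columns the query paged into memory.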

## Step 4: Demonstrate Fallback in Automatic Mode

Execute a query that aggregates the billion-row fact tables in Automatic mode. If the query exceeds Direct Lake guardrails, Direct Lake automatically falls back to the SQL Endpoint so that the query still completes successfully.


In [None]:
trace1 = runQueryWithTrace(
    """
    EVALUATE
        SUMMARIZECOLUMNS(
                dim_Date[FirstDateofMonth] ,
                "Count of Transactions" , COUNTROWS(fact_myevents_1bln) ,
                "Sum of Sales (1bln)" , [Sum of Sales (1bln)] ,
                "Sum of Sales (2bln)" , [Sum of Sales (2bln)]
        )
        ORDER BY [FirstDateofMonth]
    """ , workspaceName , SemanticModelName
)
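To compare timing characteristics between runs, the trace DataFrame returned by `runQueryWithTrace` (here `trace1`) can be reduced to a total duration per event class. A sketch under assumptions: `duration_by_event` is a hypothetical helper, and the column names `Event Class` and `Duration` are assumed defaults for the trace output, exposed as parameters in case your output uses different names. The sample records are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable


def duration_by_event(records: Iterable[Dict], event_col: str = "Event Class",
                      duration_col: str = "Duration") -> Dict[str, float]:
    """Sum the duration column per event class, skipping rows without a duration."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        duration = rec.get(duration_col)
        if duration is None or duration != duration:  # skip None and NaN
            continue
        totals[str(rec.get(event_col))] += float(duration)
    return dict(totals)


# Illustrative records, not captured output
records = [
    {"Event Class": "QueryBegin", "Duration": None},
    {"Event Class": "VertiPaqSEQueryEnd", "Duration": 120},
    {"Event Class": "VertiPaqSEQueryEnd", "Duration": 80},
    {"Event Class": "QueryEnd", "Duration": 450},
]
print(duration_by_event(records))  # {'VertiPaqSEQueryEnd': 200.0, 'QueryEnd': 450.0}
```

In the notebook this would be called as `duration_by_event(trace1.to_dict("records"))`.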

## Step 5: Switch to DirectLakeOnly Mode

Update the semantic model to DirectLakeOnly mode. In this mode, queries that exceed guardrails will fail rather than falling back to the SQL Endpoint.


In [None]:
tom = labs.tom.TOMWrapper(dataset=SemanticModelName, workspace=workspaceName, readonly=False)
tom.set_direct_lake_behavior("DirectLakeOnly")  # Can be set to any of ['Automatic', 'DirectLakeOnly', 'DirectQueryOnly']
tom.model.SaveChanges()
print("Model changed")
fabric.refresh_dataset(dataset=SemanticModelName, workspace=workspaceName, refresh_type="calculate")
print("Model recalculated")

## Step 6: Test Query Failure in DirectLakeOnly Mode

Attempt the same billion-row query in DirectLakeOnly mode. This is expected to fail with an error. Read the error message carefully, as it explains the specific guardrail that was exceeded.


In [None]:
# The query is expected to fail in DirectLakeOnly mode; print the error instead of stopping the notebook
try:
    runQueryWithTrace(
        """
        EVALUATE
            SUMMARIZECOLUMNS(
                    dim_Date[FirstDateofMonth] ,
                    "Count of Transactions" , COUNTROWS(fact_myevents_1bln) ,
                    "Sum of Sales (1bln)" , [Sum of Sales (1bln)] ,
                    "Sum of Sales (2bln)" , [Sum of Sales (2bln)]
            )
            ORDER BY [FirstDateofMonth]
        """ , workspaceName , SemanticModelName
    )
except Exception as e:
    # Typically a FabricAdomdException whose message names the exceeded guardrail
    print(e)

## Step 7: Restore Automatic Mode

Switch the semantic model back to Automatic mode, the recommended setting for production workloads. This re-enables automatic fallback to the SQL Endpoint.


In [None]:
tom = labs.tom.TOMWrapper(dataset=SemanticModelName, workspace=workspaceName, readonly=False)
tom.set_direct_lake_behavior("Automatic")  # One of ['Automatic', 'DirectLakeOnly', 'DirectQueryOnly']
tom.model.SaveChanges()
print("Model changed")
fabric.refresh_dataset(dataset=SemanticModelName, workspace=workspaceName, refresh_type="calculate")
print("Model recalculated")
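Steps 5 to 7 switch the model to DirectLakeOnly, run a query that is expected to fail, and then switch back. If the notebook stops between those steps, the model is left in DirectLakeOnly. A context manager can guarantee the restore. This is a sketch with a hypothetical `temporary_mode` helper: `set_mode` is any callable that applies a mode, for example a function you write that wraps the `set_direct_lake_behavior`, `SaveChanges` and `refresh_dataset` calls above. The usage below passes a stand-in setter so it runs anywhere.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

VALID_MODES = ("Automatic", "DirectLakeOnly", "DirectQueryOnly")


@contextmanager
def temporary_mode(set_mode: Callable[[str], None], mode: str,
                   restore: str = "Automatic") -> Iterator[None]:
    """Switch to `mode`, then always switch back to `restore`, even if the body raises."""
    for m in (mode, restore):
        if m not in VALID_MODES:
            raise ValueError(f"Unknown Direct Lake behaviour: {m!r}")
    set_mode(mode)
    try:
        yield
    finally:
        set_mode(restore)  # runs on success and on failure


# Usage with a stand-in setter that only records the calls
calls = []
try:
    with temporary_mode(calls.append, "DirectLakeOnly"):
        raise RuntimeError("query exceeded guardrails")  # simulated Step 6 failure
except RuntimeError as err:
    print("Query failed:", err)
print(calls)  # ['DirectLakeOnly', 'Automatic']
```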

## Step 8: Run Query with Automatic Fallback Enabled

Run the same query again with Automatic mode restored; it reads from both the 1-billion-row and the 2-billion-row tables. The query should complete without error, but check the server timings for a DirectQueryEnd event, which indicates that the query fell back to the SQL Endpoint.


In [None]:
trace2 = runQueryWithTrace(
    """
    EVALUATE
        SUMMARIZECOLUMNS(
                dim_Date[FirstDateofMonth] ,
                "Count of Transactions" , COUNTROWS(fact_myevents_1bln) ,
                "Sum of Sales (1bln)" , [Sum of Sales (1bln)] ,
                "Sum of Sales (2bln)" , [Sum of Sales (2bln)]
        )
        ORDER BY [FirstDateofMonth]
    """ , workspaceName , SemanticModelName, Trace=True, DMV=False
)
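Checking for the fallback marker can also be done programmatically. A minimal sketch, assuming the trace DataFrame has an `Event Class` column (adjust `event_col` if your output differs): a query that fell back contains a DirectQuery event, while a query answered entirely by Direct Lake contains none. `fallback_events` and `execution_path` are hypothetical helpers, and the sample traces are illustrative.

```python
from typing import Dict, Iterable, List

# Event classes that indicate a query was sent to the SQL Endpoint
DIRECTQUERY_EVENTS = {"DirectQueryBegin", "DirectQueryEnd"}


def fallback_events(records: Iterable[Dict], event_col: str = "Event Class") -> List[Dict]:
    """Return the trace rows that record a DirectQuery call to the SQL Endpoint."""
    return [r for r in records if str(r.get(event_col, "")) in DIRECTQUERY_EVENTS]


def execution_path(records: Iterable[Dict], event_col: str = "Event Class") -> str:
    """Classify a traced query as 'DirectQuery fallback' or 'Direct Lake'."""
    return "DirectQuery fallback" if fallback_events(records, event_col) else "Direct Lake"


# Illustrative traces, not captured output
trace_direct_lake = [
    {"Event Class": "QueryBegin"},
    {"Event Class": "VertiPaqSEQueryEnd"},
    {"Event Class": "QueryEnd"},
]
trace_fallback = [
    {"Event Class": "QueryBegin"},
    {"Event Class": "DirectQueryEnd"},
    {"Event Class": "QueryEnd"},
]
print(execution_path(trace_direct_lake))  # Direct Lake
print(execution_path(trace_fallback))     # DirectQuery fallback
```

In the notebook this would be called as `execution_path(trace2.to_dict("records"))`.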

## Step 9: Stop the Spark Session


In [None]:
notebookutils.session.stop()

## Lab 4 Summary

### What You Accomplished

- **Observed fallback:** Ran queries that triggered automatic fallback from Direct Lake to the SQL Endpoint
- **Tested execution modes:** Compared Automatic and DirectLakeOnly behaviour; DirectQueryOnly is set with the same `set_direct_lake_behavior` call
- **Validated protection:** Confirmed that in DirectLakeOnly mode, a query that exceeds the guardrails fails with an error instead of falling back
- **Analysed trace output:** Used trace events to identify which execution path each query took

### Execution Mode Reference

| Mode | Behaviour | Recommended Use |
|------|-----------|-----------------|
| **Automatic** | Falls back to the SQL Endpoint when limits are exceeded | Production workloads |
| **DirectLakeOnly** | Fails with an error when limits are exceeded | Performance testing and validation |
| **DirectQueryOnly** | Always uses the SQL Endpoint | Troubleshooting and comparison |

### Key Takeaways

- Automatic mode is the recommended default because queries still return results, via the SQL Endpoint, when the data exceeds Direct Lake guardrails
- DirectLakeOnly mode is useful for testing whether a specific query can run entirely in Direct Lake
- Trace events provide clear evidence of the execution path, making it straightforward to diagnose fallback
- The fallback mechanism is a protective feature that maintains system stability when queries exceed capacity guardrails

### Troubleshooting Workflow

1. Use DMVs or trace events to detect whether fallback occurred
2. Review the error or trace message to understand the specific guardrail that was exceeded
3. Consider optimising the query, the model, or the data to reduce memory demand
4. Validate improvements by running the query in DirectLakeOnly mode
5. Deploy in Automatic mode for production reliability
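Steps 4 and 5 of this workflow can be expressed as a small check around any query runner. A sketch only: `check_direct_lake_fit` and the fake runners are hypothetical, and in the notebook `run_in_mode` would be a function you write that switches the model mode and calls `runQueryWithTrace`.

```python
from typing import Callable


def check_direct_lake_fit(run_in_mode: Callable[[str], object]) -> str:
    """Validate a query in DirectLakeOnly mode and recommend the next step."""
    try:
        # Hypothetical runner: sets the model to the given mode and runs the query
        run_in_mode("DirectLakeOnly")
    except Exception as err:
        # The real error message names the guardrail that was exceeded
        return f"Does not fit in Direct Lake ({err}); optimise the query, model, or data"
    return "Fits in Direct Lake; deploy in Automatic mode"


def fake_success(mode: str) -> str:
    return "query result"


def fake_failure(mode: str) -> str:
    raise RuntimeError("guardrail exceeded")


print(check_direct_lake_fit(fake_success))
print(check_direct_lake_fit(fake_failure))
```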

### Next Lab

Continue to **Lab 5** to learn about framing and how Direct Lake synchronises with data changes.
