### Install the latest .whl package

Check [here](https://pypi.org/project/semantic-link-labs/) to see the latest version.

In [None]:
%pip uninstall "builtin/semantic_link_labs-0.9.4-py3-none-any.whl" -y

In [None]:
# %pip install semantic-link-labs

%pip install "builtin/semantic_link_labs-0.9.4-py3-none-any.whl"

### Requirements
* Fabric Capacity with XMLA read/write enabled
    * A Fabric Trial Capacity is sufficient for evaluation.
    * The XMLA Endpoint must be read/write enabled because the perf lab provisions semantic models automatically.
* Fabric Permissions
    * User must have permissions to create workspaces, lakehouses, and semantic models. This notebook provisions sample resources to demonstrate the use of a perf lab.
    * User should have access to a Fabric capacity. This notebook provisions workspaces, lakehouses, and semantic models on a Fabric capacity.
    * Connect this notebook to a lakehouse without a schema to persist test definitions and test results. Although not strictly a requirement, this eliminates the need to provide the name and Id of a disconnected lakehouse.

### Result
* Master and test workspaces, lakehouses, and semantic models are created to establish a perf lab.
    * The master workspace contains a lakehouse and a sample semantic model in Direct Lake on OneLake mode that uses the lakehouse as its data source. 
    * The test workspace contains semantic models cloned from the sample semantic model in the master workspace.
    * Various Delta tables are created in the lakehouse connected to this notebook to persist test definitions, table analysis, and test results.
    * The resources in the master workspace and in the test workspace are deprovisioned upon completion of the perf lab. The workspaces themselves are not deleted; delete them manually.
* The names of the newly created resources can be adjusted to customize the perf lab.


### Import the library and set global notebook parameters

This notebook deploys lakehouses and semantic models across different workspaces, but the resources can also be hosted together in a centralized workspace. The master workspace contains a lakehouse with sample data, used as the data source for the sample semantic models in Direct Lake on OneLake mode. The master semantic model serves as a template for the actual test models, which this notebook provisions prior to running the performance tests by cloning the master semantic model.

In [None]:

import sempy_labs.perf_lab as perf_lab

master_workspace = 'Perf Lab Master'                # Enter the name of the master workspace.
lakehouse = 'SalesSampleLakehouse'                  # Enter the name of the lakehouse used as the data source.
master_dataset = 'Master Semantic Model'            # Enter the name of the master semantic model.

test_workspace = 'Perf Lab Testing'                 # Enter the name of the workspace for the semantic model clone.
target_dataset_prefix = 'Test Model_'               # Enter the common part of the name for all semantic model clones.
test_dataset_A = target_dataset_prefix + 'A'        # Enter the name of the first semantic model clone.
test_dataset_B = target_dataset_prefix + 'B'        # Enter the name of the second semantic model clone.

capacity_id = None                                  # The Id of the capacity for the workspaces. 
                                                    # Leave this as None to use the capacity of the attached lakehouse or perf lab notebook.
                                        
test_definitions_tbl = 'TestDefinitions'            # The name of the table in the notebook-attached lakehouse to store the test definitions.
column_segments_tbl = 'StorageTableColumnSegments'  # The name of the table in the notebook-attached lakehouse to store the column segment data.
trace_events_tbl = "TraceEvents"                    # The name of the table in the notebook-attached lakehouse to store the captured trace events.

execution_log = "ExecutionLog"                      # The name of the table in the notebook-attached lakehouse to store the test cycle execution details.

### Working with test definitions

Test definitions define the key parameters for the test runs, including the following fields: QueryId, QueryText, MasterWorkspace, MasterDataset, TargetWorkspace, TargetDataset, DatasourceName, DatasourceWorkspace, DatasourceType. The following test code illustrates how to work with the TestDefinition and TestSuite classes.


In [None]:
first_test_definition = perf_lab.TestDefinition(QueryId=1, QueryText="Evaluate {1}", MasterWorkspace=master_workspace, MasterDataset=master_dataset,
                   TargetWorkspace=test_workspace, TargetDataset='Test Model_', DatasourceName=lakehouse,
                   DatasourceWorkspace=master_workspace, DatasourceType="WrongType")

first_test_definition.remove("DatasourceType")
print(first_test_definition.get_keys())
first_test_definition.add("DatasourceType", "Lakehouse")
first_test_definition.TargetDataset='Test Model_A'
print(first_test_definition.get_keys())
print(first_test_definition.get_values())
print(first_test_definition.to_schema())

test_definitions = [
    first_test_definition,
    perf_lab.TestDefinition(QueryId=2, QueryText="Evaluate {2}", MasterWorkspace=master_workspace, MasterDataset=master_dataset,
                   TargetWorkspace=test_workspace, TargetDataset='Test Model_B', DatasourceName=lakehouse,
                   DatasourceWorkspace=master_workspace, DatasourceType="Lakehouse")
]

test_suite = perf_lab.TestSuite(test_definitions)
test_suite.save_as(test_definitions_tbl)
display(test_suite.to_df())

test_suite.remove_test_definition(first_test_definition)
display(test_suite.to_df())

test_suite.clear()
test_suite.load(test_definitions_tbl)
display(test_suite.to_df())

test_suite.add_test_definition(
    perf_lab.TestDefinition(QueryId=3, QueryText="Evaluate {3}", MasterWorkspace=master_workspace, MasterDataset=master_dataset,
                   TargetWorkspace=test_workspace, TargetDataset='Test Model_C', DatasourceName=lakehouse,
                   DatasourceWorkspace=master_workspace, DatasourceType="Lakehouse")
)
display(test_suite.to_df())

test_suite.add_field("AdditionalProperty", "additional value")
print(test_suite.get_schema())

test_suite.remove_field("AdditionalProperty")
print(test_suite.get_schema())

test_suite.clear()
display(test_suite.to_df())
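
The field list at the start of this section can also be checked with plain Python before a definition is built. The following is a minimal sketch, not part of sempy_labs: `REQUIRED_FIELDS` and `missing_fields()` are hypothetical helper names, and the sample values simply reuse the names from the parameter cell above.

```python
# Field names copied from the list at the start of this section.
REQUIRED_FIELDS = (
    "QueryId", "QueryText", "MasterWorkspace", "MasterDataset",
    "TargetWorkspace", "TargetDataset", "DatasourceName",
    "DatasourceWorkspace", "DatasourceType",
)


def missing_fields(definition: dict) -> list:
    """Hypothetical helper: return the required field names absent from the dict."""
    return [name for name in REQUIRED_FIELDS if name not in definition]


# A candidate definition that deliberately omits DatasourceType.
candidate = {
    "QueryId": 1,
    "QueryText": "EVALUATE {1}",
    "MasterWorkspace": "Perf Lab Master",
    "MasterDataset": "Master Semantic Model",
    "TargetWorkspace": "Perf Lab Testing",
    "TargetDataset": "Test Model_A",
    "DatasourceName": "SalesSampleLakehouse",
    "DatasourceWorkspace": "Perf Lab Master",
}

print(missing_fields(candidate))  # ['DatasourceType']
```

A check like this only guards against missing keys; it does not validate the values themselves.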


In [None]:

test_suite = perf_lab.SalesSampleQueries().to_test_suite(
    target_dataset = test_dataset_A,
    target_workspace = test_workspace,
    master_dataset = master_dataset,
    master_workspace = master_workspace,
    data_source = lakehouse,
    data_source_workspace = master_workspace,   
)

display(test_suite.to_df())

### Run default (incremental), cold, and warm query tests
The main purpose of a test run is to measure the performance of a set of DAX queries against the test semantic models in different memory states: Cold (full framing), Semi-warm (incremental framing), and Warm (no framing). Besides running the queries and measuring response times, the run_test_cycle() function must therefore also clear the cache and refresh the model.

In [None]:
with perf_lab.ExecutionTracker(
    table_name = execution_log,
    description="Running a test cycle.") as tracker:

    test_suite = perf_lab.TestSuite(
        [
        perf_lab.TestDefinition(QueryId="TestQuery", QueryText="EVALUATE SUMMARIZECOLUMNS(\"Test\", \"Hello World\")", MasterWorkspace=master_workspace, MasterDataset=master_dataset,
                    TargetWorkspace=test_workspace, TargetDataset=test_dataset_A, DatasourceName=lakehouse,
                    DatasourceWorkspace=master_workspace, DatasourceType="Lakehouse")
        ]
    )

    test_suite = perf_lab.initialize_test_cycle(
        test_suite = test_suite, 
        test_run_id = tracker.run_id,
        test_description = "This run is for test purposes."
    )

    inc_results = perf_lab.run_test_cycle(
        test_suite = test_suite,
        clear_query_cache = True,
        refresh_type = "clearValuesFull",
        tag = "testing only"
        )

    display(inc_results[0])
    display(inc_results[1])
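
The difference between a cold and a warm measurement can be illustrated without any Fabric resources. The sketch below is only an analogy under stated assumptions: `run_query()` is a hypothetical stand-in that sleeps to imitate loading column data, and `functools.lru_cache` plays the role of the in-memory state. It does not call the perf lab or a semantic model.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def run_query(query_text: str) -> int:
    """Hypothetical stand-in for a query; the sleep imitates loading data."""
    time.sleep(0.05)
    return len(query_text)


def timed(query_text: str) -> float:
    """Return the elapsed wall-clock seconds for one call to run_query()."""
    start = time.perf_counter()
    run_query(query_text)
    return time.perf_counter() - start


query = 'EVALUATE SUMMARIZECOLUMNS("Test", "Hello World")'

run_query.cache_clear()   # cold: nothing is held in memory
cold_seconds = timed(query)
warm_seconds = timed(query)  # warm: the earlier result is still cached

print(f"cold: {cold_seconds:.4f}s, warm: {warm_seconds:.4f}s")
```

The point of the sketch is the ordering of steps: reset the state first, then measure, so that the first timing reflects the cold path and the second the warm path.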

### Provision master workspace, lakehouse, and semantic model with sample data
A sample lakehouse can be provisioned by calling the provision_lakehouse() function with a table-generator function as an input parameter to customize the table generation. A sample semantic model can be provisioned by calling the provision_semantic_model() function with a metadata-generator function as an input parameter to customize the semantic model generation.

In [None]:
# Can use a test suite to supply the workspace and lakehouse names.
test_suite = perf_lab.TestSuite(
    [
    perf_lab.TestDefinition(QueryId="TestQuery", QueryText="EVALUATE SUMMARIZECOLUMNS(\"Test\", \"Hello World\")", MasterWorkspace=master_workspace, MasterDataset=master_dataset,
                TargetWorkspace=test_workspace, TargetDataset=test_dataset_A, DatasourceName=lakehouse,
                DatasourceWorkspace=master_workspace, DatasourceType="Lakehouse")
    ])

# TableGeneratorCallback = Callable[[UUID, UUID, dict], None]
def noop(
    workspace_id,
    lakehouse_id,
    table_properties,
):
    return None

id_pairs = perf_lab.provision_lakehouses(
    test_suite = test_suite,
    table_properties={},
    table_generator=noop,
    capacity = capacity_id
)

print(id_pairs)
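
The callback shape noted in the comment above, `Callable[[UUID, UUID, dict], None]`, can be written out with type hints. The following is a minimal sketch of the callback pattern only: `provision_with()` is a hypothetical stand-in that generates placeholder Ids, and it is not how sempy_labs provisions a lakehouse.

```python
from typing import Callable
from uuid import UUID, uuid4

# Same shape as the TableGeneratorCallback comment in the cell above.
TableGeneratorCallback = Callable[[UUID, UUID, dict], None]


def provision_with(generator: TableGeneratorCallback, table_properties: dict) -> tuple:
    """Hypothetical stand-in: create placeholder Ids, then hand them to the callback."""
    workspace_id, lakehouse_id = uuid4(), uuid4()  # placeholder Ids, not real resources
    generator(workspace_id, lakehouse_id, table_properties)
    return (workspace_id, lakehouse_id)


calls = []


def record_call(workspace_id: UUID, lakehouse_id: UUID, table_properties: dict) -> None:
    """Example callback that only records the arguments it receives."""
    calls.append((workspace_id, lakehouse_id, table_properties))


ids = provision_with(record_call, {"rows": 3})
print(calls[0][2])  # {'rows': 3}
```

A real table generator would create Delta tables in the given lakehouse instead of recording its arguments.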

In [None]:
# Can use a test suite to supply the dataset, workspace, and lakehouse names.
test_suite = perf_lab.TestSuite(
    [
    perf_lab.TestDefinition(QueryId="TestQuery", QueryText="EVALUATE SUMMARIZECOLUMNS(\"Test\", \"Hello World\")", MasterWorkspace=master_workspace, MasterDataset=master_dataset,
                TargetWorkspace=test_workspace, TargetDataset=test_dataset_A, DatasourceName=lakehouse,
                DatasourceWorkspace=master_workspace, DatasourceType="Lakehouse")
    ])

# MetadataGeneratorCallback = Callable[[str, UUID, bool], None]
def no_metadata(
    semantic_model,
    workspace,
    remove_schema,
):
    return None

name_id_pairs = perf_lab.provision_master_semantic_models(
    test_suite = test_suite,
    semantic_model_mode = "OneLake",
    overwrite = True,
    metadata_generator = no_metadata,
    )

print(name_id_pairs)

In [None]:
# Can also provision lakehouses and semantic models individually without the help of a test suite.

# TableGeneratorCallback = Callable[[UUID, UUID, dict], None]
def create_people_table(
    workspace_id,
    lakehouse_id,
    table_properties,
):
    import pandas as pd
    from sempy_labs._helper_functions import save_as_delta_table
    
    data = {
        "name": ["Alice", "Bob", "Cathy"],
        "age": [30, 25, 27]
        }

    save_as_delta_table(
        dataframe = pd.DataFrame(data),
        delta_table_name = "People",
        write_mode = 'overwrite',
        lakehouse = lakehouse_id,
        workspace = workspace_id,
        )
    return None

(master_workspace_id, lakehouse_id) = perf_lab.provision_lakehouse(
    workspace = master_workspace, 
    lakehouse = lakehouse,
    table_properties={},
    table_generator=create_people_table,
    capacity = capacity_id
) 

print((master_workspace_id, lakehouse_id))

# MetadataGeneratorCallback = Callable[[str, UUID, bool], None]
def no_metadata(
    semantic_model,
    workspace,
    remove_schema,
):
    return None

(master_dataset_name, master_dataset_id) = perf_lab.provision_semantic_model(
    workspace = master_workspace_id, 
    lakehouse=lakehouse_id, 
    semantic_model_name = master_dataset,
    semantic_model_mode = "OneLake",
    overwrite = True,
    metadata_generator = no_metadata,
    )

print((master_dataset_name, master_dataset_id))

In [None]:
# Provision a sales sample lakehouse.
sslh_config = perf_lab.SalesLakehouseConfig(start_date="2025-01-25", years=5, fact_rows_in_millions=10)
print(sslh_config.to_dict())

(master_workspace_id, lakehouse_id) = perf_lab.provision_lakehouse(
    workspace = master_workspace, 
    lakehouse = lakehouse,
    table_properties=sslh_config.to_dict(),
    table_generator=perf_lab.provision_sales_tables,
    capacity = capacity_id
) 

In [None]:
(master_dataset_name, master_dataset_id) = perf_lab.provision_semantic_model(
    workspace = master_workspace_id, 
    lakehouse=lakehouse_id, 
    semantic_model_name = master_dataset,
    semantic_model_mode = "OneLake",
    overwrite = True,
    metadata_generator = perf_lab.apply_sales_metadata,
    )

print((master_dataset_name, master_dataset_id))

### Provision test semantic models
Creating numerous semantic models for testing can easily be accomplished by passing a test suite instance with the test definitions to the provision_test_semantic_models() function. For every unique combination of 'MasterWorkspace', 'MasterDataset', 'TargetWorkspace', and 'TargetDataset', this function creates the necessary semantic model clones that the test cycle later uses to run DAX queries.

In [None]:
# Start with a test suite to supply the master and target dataset and workspace information.
test_suite = perf_lab.TestSuite(
    [
    perf_lab.TestDefinition(QueryId="TestQuery", QueryText="EVALUATE SUMMARIZECOLUMNS(\"Test\", \"Hello World\")", MasterWorkspace=master_workspace, MasterDataset=master_dataset,
                TargetWorkspace=test_workspace, TargetDataset=test_dataset_A, DatasourceName=lakehouse,
                DatasourceWorkspace=master_workspace, DatasourceType="Lakehouse")
    ])

# Provision the test models by cloning the master semantic models
# in the specified test workspaces according to the test definitions.
perf_lab.provision_test_semantic_models( 
    test_suite = test_suite,
    capacity = capacity_id,
    refresh_clones = True,
    )
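
The "unique combination" rule described above can be sketched with plain dictionaries. This is an illustration under assumptions, not the library's implementation: `CLONE_KEY` and `unique_clone_targets()` are hypothetical names, and the sample definitions carry only the fields relevant to the grouping.

```python
# The four fields that, per the text above, identify one clone.
CLONE_KEY = ("MasterWorkspace", "MasterDataset", "TargetWorkspace", "TargetDataset")

definitions = [
    {"QueryId": 1, "MasterWorkspace": "Perf Lab Master", "MasterDataset": "Master Semantic Model",
     "TargetWorkspace": "Perf Lab Testing", "TargetDataset": "Test Model_A"},
    {"QueryId": 2, "MasterWorkspace": "Perf Lab Master", "MasterDataset": "Master Semantic Model",
     "TargetWorkspace": "Perf Lab Testing", "TargetDataset": "Test Model_A"},
    {"QueryId": 3, "MasterWorkspace": "Perf Lab Master", "MasterDataset": "Master Semantic Model",
     "TargetWorkspace": "Perf Lab Testing", "TargetDataset": "Test Model_B"},
]


def unique_clone_targets(test_definitions: list) -> list:
    """Hypothetical helper: return each distinct key tuple once, in first-seen order."""
    seen = {}
    for definition in test_definitions:
        key = tuple(definition[field] for field in CLONE_KEY)
        seen.setdefault(key, None)  # dicts preserve insertion order
    return list(seen)


targets = unique_clone_targets(definitions)
print(len(targets))  # 2
```

Three test definitions therefore lead to two clones here, because the first two queries target the same model.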

### Refresh and warm up the test models
Before updating Delta tables and refreshing Direct Lake models, it is a good idea to simulate semantic models that are currently in use by running all the test queries without tracing. This brings the test semantic models into a warm state.

In [None]:
# Execute all queries in the test suite against their test models
# so that all relevant column data is loaded into memory.
test_suite = perf_lab.TestSuite(
    [
    perf_lab.TestDefinition(QueryId="TestQuery", QueryText="EVALUATE SUMMARIZECOLUMNS(\"Test\", \"Hello World\")", MasterWorkspace=master_workspace, MasterDataset=master_dataset,
                TargetWorkspace=test_workspace, TargetDataset=test_dataset_A, DatasourceName=lakehouse,
                DatasourceWorkspace=master_workspace, DatasourceType="Lakehouse")
    ])

perf_lab.refresh_test_models(
    test_suite = test_suite,
    refresh_type = 'clearValuesFull'
) 

perf_lab.warmup_test_models(
    test_suite = test_suite
) 

### Analyze Delta tables and semantic model tables
To investigate the dependencies and interactions between Delta tables and Direct Lake models in various configurations, the perf lab includes functions to analyze the column segments for each table in the semantic model as well as the parquet files, storage groups, and other information for the Delta tables.

In [None]:
test_suite = perf_lab.TestSuite(
    [
    perf_lab.TestDefinition(QueryId="TestQuery", QueryText="EVALUATE SUMMARIZECOLUMNS(\"Test\", \"Hello World\")", MasterWorkspace=master_workspace, MasterDataset=master_dataset,
                TargetWorkspace=test_workspace, TargetDataset=test_dataset_A, DatasourceName=lakehouse,
                DatasourceWorkspace=master_workspace, DatasourceType="Lakehouse")
    ])

test_suite = perf_lab.initialize_test_cycle(
    test_suite = test_suite, 
    test_description = "This run is for test purposes."
    )

table_info = perf_lab.get_source_tables(
    test_suite = test_suite
    )

display(table_info)

column_segments = perf_lab.get_storage_table_column_segments(
    test_suite = test_suite,
    tables_info = table_info
    )

display(column_segments)

### Simulate Lakehouse ETL
The perf lab has no real ETL pipeline and must therefore rely on a simulated ETL process. The perf lab accomplishes the work with the help of a sample callback function. Refer to the source code if you want to implement your own table update logic.

In [None]:
# To update Delta tables, determine the list of Delta tables that must be processed. 
test_suite = perf_lab.TestSuite(
    [
    perf_lab.TestDefinition(QueryId="TestQuery", QueryText="EVALUATE SUMMARIZECOLUMNS(\"Test\", \"Hello World\")", MasterWorkspace=master_workspace, MasterDataset=master_dataset,
                TargetWorkspace=test_workspace, TargetDataset=test_dataset_A, DatasourceName=lakehouse,
                DatasourceWorkspace=master_workspace, DatasourceType="Lakehouse")
    ])

test_suite = perf_lab.initialize_test_cycle(
    test_suite = test_suite, 
    test_description = "This run is for test purposes."
    )

table_info = perf_lab.get_source_tables(
    test_suite = test_suite
    )

# Use the sample delete_reinsert_rows() callback function
# to perform a rolling window update by deleting the rows with the oldest
# key value (for example, the oldest DateID) and reinserting them as the newest.
# The callback function expects a property bag
# that identifies the key column.
perf_lab.simulate_etl(
    source_tables_info = table_info,
    update_properties = {"key_column": "name", "Optimize": True},
    update_function = perf_lab.delete_reinsert_rows
)
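
The rolling window idea can be sketched on an in-memory list of rows. This is not the library's delete_reinsert_rows() implementation: `roll_oldest_to_newest()` is a hypothetical helper, the key column is assumed to hold integers, and assigning "newest + 1" to the reinserted rows is an assumption made for illustration.

```python
def roll_oldest_to_newest(rows: list, key_column: str) -> list:
    """Hypothetical helper: delete the rows with the oldest key and reinsert them as newest."""
    oldest = min(row[key_column] for row in rows)
    newest = max(row[key_column] for row in rows)
    kept = [row for row in rows if row[key_column] != oldest]
    # Assumption: the reinserted rows receive the key that follows the current newest.
    reinserted = [
        {**row, key_column: newest + 1}
        for row in rows
        if row[key_column] == oldest
    ]
    return kept + reinserted


rows = [
    {"DateID": 1, "Sales": 10},
    {"DateID": 1, "Sales": 5},
    {"DateID": 2, "Sales": 7},
    {"DateID": 3, "Sales": 9},
]

updated = roll_oldest_to_newest(rows, key_column="DateID")
print(sorted({row["DateID"] for row in updated}))  # [2, 3, 4]
```

The row count and the totals stay the same; only the window of key values moves forward, which is what makes the update useful for exercising incremental framing.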

### Deprovision perf lab resources
The perf lab also provides functions to clean up provisioned resources. However, the perf lab does not delete workspaces to avoid accidental data loss if an existing workspace with unrelated items was used in the perf lab. Note that deprovisioning lakehouses listed in the test definitions does indeed remove these lakehouses with all their tables. Make sure these lakehouses only contain perf lab tables or delete the resources manually.

In [None]:
test_suite = perf_lab.TestSuite(
    [
    perf_lab.TestDefinition(QueryId="TestQuery", QueryText="EVALUATE SUMMARIZECOLUMNS(\"Test\", \"Hello World\")", MasterWorkspace=master_workspace, MasterDataset=master_dataset,
                TargetWorkspace=test_workspace, TargetDataset=test_dataset_A, DatasourceName=lakehouse,
                DatasourceWorkspace=master_workspace, DatasourceType="Lakehouse")
    ])

perf_lab.deprovision_semantic_models(
    test_suite = test_suite,
    delete_masters = True
    )

# Delete the lakehouses listed in the test definitions.

perf_lab.deprovision_lakehouses(
    test_suite = test_suite
    )