# Sync Git-Enabled Workspaces from a Remote Repository

This notebook finds Fabric workspaces that are integrated with Git and updates them from the remote repository when it is safe to do so.

### Functionality
1.  **Finds Workspaces**: Scans the tenant for all Git-enabled workspaces on a dedicated capacity.
2.  **Checks Git Status**: Determines if each workspace is behind the remote `main` branch.
3.  **Updates Safely**: If a workspace is behind **and** has no local changes, it is automatically updated.
4.  **Reports Conflicts**: Returns a final DataFrame listing workspaces with potential merge conflicts that require manual review.
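
The decision rule in steps 2 to 4 can be sketched in plain Python. This is an illustration only, not the notebook's implementation: `decide_action` is a hypothetical helper, and its input is assumed to be the `Workspace Change` value of each item reported for one workspace, with `None` meaning the item was not changed in the workspace.

```python
from typing import Optional, Sequence


def decide_action(workspace_changes: Sequence[Optional[str]]) -> str:
    """Classify one workspace from the 'Workspace Change' value of each reported item."""
    if not workspace_changes:
        # No items reported at all: nothing to update, nothing to review.
        return "skip"
    if any(change is not None for change in workspace_changes):
        # At least one item was edited in the workspace: leave it for manual review.
        return "manual_review"
    # Items are reported, but none were changed locally: safe to update from Git.
    return "update"


print(decide_action([]))                  # skip
print(decide_action([None, None]))        # update
print(decide_action([None, "Modified"]))  # manual_review
```

A single locally changed item is enough to route the whole workspace to manual review, which is what keeps the automatic update conservative.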

For full documentation, please see the `README.md` file in the GitHub repository.

### 0. Check and Install Required Libraries
This section checks whether the `semantic-link-labs` library is installed.
If it is missing, the cell installs it automatically. Alternatively, attach a workspace environment that already includes the package, which avoids the install step on every run and simplifies maintenance.

In [10]:
import importlib
import sys

try:
    importlib.import_module('sempy_labs')
    #print("semantic-link-labs is already installed.")
except ImportError:
    print("semantic-link-labs not found. Installing now...")
    # Using pip to install the package. The -U flag ensures the latest version is installed.
    if 'ipykernel' in sys.modules:
        get_ipython().system('pip install -U semantic-link-labs')
    else:
        import subprocess
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-U", "semantic-link-labs"])
    print("Installation complete.")

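
As an alternative to importing the package just to test for it, the standard library can report whether a module is discoverable. The sketch below is a variation on the cell above, not a replacement for it; `is_installed` is a hypothetical helper name, and it is only meant for top-level module names such as `sempy_labs`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# 'json' ships with Python, so this check succeeds in any environment.
print(is_installed("json"))  # True

# In the notebook, the same check would guard the pip install step:
# if not is_installed("sempy_labs"): ...install semantic-link-labs...
```

Because `find_spec` does not execute the package, the check stays cheap even when the library is large.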

### 1. Import Packages and Libraries

The following packages are required for the notebook to function. The key library is `sempy_labs`, which provides the necessary functions to interact with Fabric workspaces and their Git status.

In [11]:
import sempy_labs
import pandas as pd


### 2. Define Core Functions

These functions handle the logic for fetching Git status and updating workspaces.

In [12]:
import pandas as pd
# Make sure sempy_labs is imported and configured

def get_git_status_for_workspaces(df_filtered):
    """
    Retrieves the Git status for a list of workspaces and returns it as a DataFrame.
    
    Args:
        df_filtered (pd.DataFrame): A DataFrame containing a 'Name' column with workspace names.
        
    Returns:
        tuple[pd.DataFrame, pd.DataFrame]: A tuple containing two DataFrames:
                                           1. A DataFrame with the detailed Git status for each item in the workspaces.
                                           2. A DataFrame listing workspaces where the Git status could not be retrieved.
    """
    git_status_dfs = []  # A list to hold the DataFrames from each workspace
    no_git_status_list = []
    
    for workspace_name in df_filtered["Name"]:
        try:
            # The function returns a DataFrame directly.
            git_status_df = sempy_labs.get_git_status(workspace=workspace_name)
            
            # Check if the returned DataFrame is empty.
            if git_status_df.empty:
                continue
            
            # Add the workspace name and append the whole DataFrame.
            git_status_df['Workspace Name'] = workspace_name
            git_status_dfs.append(git_status_df)
        
        except Exception as e:
            # Silently collect workspaces where status could not be retrieved.
            error_message = str(e)
            no_git_status_list.append({'Workspace Name': workspace_name, 'Error': error_message[:200]})

    # Concatenate all the collected DataFrames into a single one.
    final_git_status_df = pd.concat(git_status_dfs, ignore_index=True) if git_status_dfs else pd.DataFrame()
    
    total_workspaces = len(df_filtered["Name"])
    workspaces_with_status = len(git_status_dfs)

    additional_message = ""
    if workspaces_with_status == 0 and no_git_status_list:
        additional_message = "\nCheck 'no_git_status_df' for errors. This often means workspaces are not configured with Git."

    print(f"Found Git status for {workspaces_with_status} out of {total_workspaces} total workspaces.{additional_message}")
    
    return final_git_status_df, pd.DataFrame(no_git_status_list)

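
The collect-and-continue pattern used by `get_git_status_for_workspaces` can be tried without a Fabric connection. The sketch below is a simplified stand-in under stated assumptions: `collect_statuses` and `fake_fetch` are hypothetical names, plain lists of dictionaries replace the DataFrames, and the workspace names and error text are made up.

```python
from typing import Callable, Dict, List, Tuple


def collect_statuses(
    names: List[str],
    fetch: Callable[[str], List[dict]],
    max_error_len: int = 200,
) -> Tuple[Dict[str, List[dict]], List[dict]]:
    """Fetch a status per workspace; one failure never stops the loop."""
    statuses: Dict[str, List[dict]] = {}
    errors: List[dict] = []
    for name in names:
        try:
            rows = fetch(name)
        except Exception as exc:
            # Record a truncated error message and move on to the next workspace.
            errors.append({"Workspace Name": name, "Error": str(exc)[:max_error_len]})
            continue
        if rows:  # Skip workspaces that report no items.
            statuses[name] = rows
    return statuses, errors


def fake_fetch(name: str) -> List[dict]:
    """Stand-in for the real status call (hypothetical data)."""
    if name == "No Git":
        raise RuntimeError("workspace is not connected to Git")
    if name == "Synced":
        return []
    return [{"Item Name": "Sales Report", "Workspace Change": None}]


statuses, errors = collect_statuses(["Dev", "Synced", "No Git"], fake_fetch)
print(list(statuses))  # ['Dev']
print(errors)          # one entry, for 'No Git'
```

Passing the fetch function in as an argument is a design choice for this sketch only; it makes the loop testable with a stub in place of the real call.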

In [13]:
def update_clean_workspaces_from_git(git_status_df):
    """
    Updates workspaces that have no local changes and separates those that do.

    Args:
        git_status_df (pd.DataFrame): DataFrame from get_git_status_for_workspaces().
    
    Returns:
        pd.DataFrame: A DataFrame of workspaces with local changes that require manual review.
    """
    if git_status_df.empty:
        print("No workspaces with Git status information were found to process.")
        return pd.DataFrame()
        
    # Group by workspace to check the status of each one individually
    grouped = git_status_df.groupby('Workspace Name')
    
    # Identify workspaces that have local changes (potential conflicts)
    workspaces_with_local_changes = grouped.filter(
        lambda group: not pd.isna(group['Workspace Change']).all()
    )
    
    # Identify workspaces with no local changes (safe to update)
    clean_workspaces_to_update = grouped.filter(
        lambda group: pd.isna(group['Workspace Change']).all()
    )
    
    # Get a unique row for each clean workspace to use for the update API call
    unique_clean_workspaces = clean_workspaces_to_update.drop_duplicates(subset=['Workspace Name'])

    print(f"Found {len(unique_clean_workspaces)} workspaces that are safe to update automatically.")
    for _, row in unique_clean_workspaces.iterrows():
        try:
            # These parameters are safe because we've already filtered out workspaces with local changes
            sempy_labs.update_from_git(
                workspace=row['Workspace Name'],
                remote_commit_hash=row['Remote Commit Hash'],
                conflict_resolution_policy="PreferRemote",
                workspace_head=row['Workspace Head'],
                allow_override=True
            )
            print(f"✅ Successfully updated workspace: {row['Workspace Name']}")
        except Exception as e:
            print(f"❌ Failed to update workspace '{row['Workspace Name']}': {str(e)[:120]}")

    return workspaces_with_local_changes

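
Before running the real update, it can help to preview which workspaces would be updated and with which arguments. The following dry-run sketch mirrors the grouping logic above in plain Python, under stated assumptions: `plan_updates` and `is_missing` are hypothetical helpers, rows are dictionaries standing in for DataFrame rows, and the hashes are placeholders. It never calls `update_from_git`.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def is_missing(value) -> bool:
    """Treat None and NaN as 'no local change', similar to pd.isna for scalars."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def plan_updates(rows: List[dict]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split workspaces into planned updates and names that need manual review."""
    by_workspace = defaultdict(list)
    for row in rows:
        by_workspace[row["Workspace Name"]].append(row)

    plan, needs_review = [], []
    for name, items in by_workspace.items():
        if all(is_missing(item["Workspace Change"]) for item in items):
            first = items[0]  # Hash and head are taken from the first row of the workspace.
            plan.append({
                "workspace": name,
                "remote_commit_hash": first["Remote Commit Hash"],
                "workspace_head": first["Workspace Head"],
            })
        else:
            needs_review.append(name)
    return plan, needs_review


rows = [  # Placeholder data, not real commit hashes.
    {"Workspace Name": "Finance", "Workspace Change": None,
     "Remote Commit Hash": "abc123", "Workspace Head": "def456"},
    {"Workspace Name": "Finance", "Workspace Change": float("nan"),
     "Remote Commit Hash": "abc123", "Workspace Head": "def456"},
    {"Workspace Name": "Sales", "Workspace Change": "Modified",
     "Remote Commit Hash": "aaa111", "Workspace Head": "bbb222"},
]
plan, needs_review = plan_updates(rows)
print(plan)          # one planned update, for 'Finance'
print(needs_review)  # ['Sales']
```

Each entry in `plan` holds the workspace, commit hash, and head that the loop above passes to `sempy_labs.update_from_git`, so the list can be reviewed before any workspace is modified.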

### 3. Execute Synchronization

The following cells run the process end to end: they list all workspaces, filter them, check the Git status of each one, and finally apply updates where it is safe to do so.

In [14]:
# Step 1: Get a list of all workspaces in the tenant.
# This requires tenant admin permissions for the user/principal running the notebook.
print("Fetching all workspaces in the tenant...")
all_workspaces_df = sempy_labs.admin.list_workspaces()
print(f"Found {len(all_workspaces_df)} total workspaces.")


Fetching all workspaces in the tenant...
Found 351 total workspaces.


In [15]:
# Step 2: Filter for standard workspaces on a dedicated capacity.
# This avoids personal workspaces and those not assigned to a capacity.
print("\nFinding workspaces on a dedicated capacity...")
on_capacity = all_workspaces_df['Capacity Id'].notna()
is_standard = all_workspaces_df['Type'] == "Workspace"
filtered_workspaces_df = all_workspaces_df[on_capacity & is_standard]
print(f"Found {len(filtered_workspaces_df)} workspaces on a dedicated capacity.")



Finding workspaces on a dedicated capacity...
Found 17 workspaces on a dedicated capacity.


In [25]:
# Step 3: Retrieve the Git status for the filtered workspaces.
print("\nChecking Git status for each workspace...")
git_status_df, no_git_status_df = get_git_status_for_workspaces(filtered_workspaces_df)



Checking Git status for each workspace...
Found Git status for 1 out of 17 total workspaces.


In [7]:
# Step 4: Run the update function for all eligible workspaces.
print("\nStarting update process for workspaces with no local changes...")
non_null_workspaces_df = update_clean_workspaces_from_git(git_status_df)



Starting update process for workspaces with no local changes...
No workspaces with Git status information were found to process.


### 4. Review Workspaces Requiring Manual Sync

The workspaces listed below have changes made in Power BI or Fabric and were **not** updated automatically. Review each one in the Fabric UI and either commit its changes or resolve the conflicts manually.

In [8]:
if not non_null_workspaces_df.empty:
    print(f"\nFound {len(non_null_workspaces_df['Workspace Name'].unique())} workspaces that require manual review:")
    display(non_null_workspaces_df)
else:
    print("\nAll Git-enabled workspaces are in sync. No manual review needed.")



All Git-enabled workspaces are in sync. No manual review needed.
