# Share components, environments and models across workspaces

This is the companion notebook for the article on sharing components, environments and models across workspaces: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-share-models-pipelines-across-workspaces-with-registries 

### Prerequisites
Review the prerequisites section in the article: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-share-models-pipelines-across-workspaces-with-registries?tabs=python#prerequisites. To summarize, in addition to the Python SDK, you need an AzureML registry and an AzureML workspace in a region that supports AzureML registries.


### Scenarios

There are two scenarios where you'd want to use the same set of models, components and environments in multiple workspaces:

* Cross-workspace MLOps: You're training a model in a dev workspace and need to deploy it to test and prod workspaces. In this case, you want end-to-end lineage between the endpoints to which the model is deployed in the test or prod workspaces and the training job, metrics, code, data and environment that were used to train the model in the dev workspace.
* Share and reuse models and pipelines across different teams: Sharing and reuse improve collaboration and productivity. In this scenario, you may want to publish a trained model and the associated components and environments used to train it to a central catalog. From there, colleagues from other teams can search and reuse the assets you shared in their own experiments.

### Goals
* Create an environment and component in the registry.
* Use the component from the registry to submit a model training job in a workspace.
* Register the trained model in the registry.
* Deploy the model from the registry to an online endpoint in the workspace, then submit an inference request.


In [None]:
# Import required libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, Input, Output
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import load_component
from azure.ai.ml.entities import (
    Environment,
    BuildContext,
    Model,
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    CodeConfiguration,
)
from azure.ai.ml.constants import AssetTypes
import time, datetime, os

# print the sdk version - you may want to share this in the issue you report if parts of this notebook don't work
!pip show azure-ai-ml

### Set up authentication

We use `DefaultAzureCredential` to get access to the workspace. When an access token is needed, it requests one using multiple identities (`EnvironmentCredential, ManagedIdentityCredential, SharedTokenCacheCredential, VisualStudioCodeCredential, AzureCliCredential, AzurePowerShellCredential`) in turn, stopping when one provides a token.
See the [`DefaultAzureCredential` reference](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for more information.

`DefaultAzureCredential` should be capable of handling most Azure SDK authentication scenarios. 
If it does not work for you, see the [`azure.identity` reference](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python) for all available credentials.  
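
The "try each identity in turn, stop at the first token" behavior described above can be sketched in plain Python. This is only an illustration of the pattern, not the `azure-identity` implementation: `first_available_token` and the two stand-in credential functions are hypothetical names made up for this sketch.

```python
from typing import Callable, List, Tuple


def first_available_token(
    providers: List[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each provider in order; return (name, token) from the first that succeeds."""
    failures = []
    for name, get_token in providers:
        try:
            return name, get_token()
        except Exception as ex:  # a real chain would catch a narrower error type
            failures.append(f"{name}: {ex}")
    raise RuntimeError("no credential produced a token: " + "; ".join(failures))


# Stand-in providers (hypothetical): the first fails, the second succeeds.
def environment_credential() -> str:
    raise RuntimeError("environment variables not set")


def cli_credential() -> str:
    return "token-from-cli"


used, token = first_available_token(
    [
        ("EnvironmentCredential", environment_credential),
        ("AzureCliCredential", cli_credential),
    ]
)
print(used)
```

The order of the list matters: a later identity is consulted only when every earlier one has failed.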

In [None]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

## Connect to a workspace and registry

Most samples create one client to connect to the workspace. However, in this sample, you need two clients. The first client, `ml_client_workspace`, connects to a workspace to run jobs and deploy endpoints. The second client, `ml_client_registry`, connects to the registry to create components, environments and models.

Replace the following:
* `<SUBSCRIPTION_ID>`
* `<RESOURCE_GROUP>`
* `<AML_WORKSPACE_NAME>`
* `<REGISTRY_NAME>`
 

In [None]:
ml_client_workspace = MLClient(
    credential=credential,
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<AML_WORKSPACE_NAME>",
)
print(ml_client_workspace)

ml_client_registry = MLClient(credential=credential, registry_name="<REGISTRY_NAME>")
print(ml_client_registry)
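
As an alternative to editing the placeholders in the cell, the four values could be read from environment variables and checked before any client is built. The sketch below assumes hypothetical variable names (`AZURE_SUBSCRIPTION_ID` and so on) and a hypothetical helper `load_settings`; neither comes from the SDK.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names; use whatever fits your setup.
SETTING_VARS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group_name": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZUREML_WORKSPACE_NAME",
    "registry_name": "AZUREML_REGISTRY_NAME",
}


def load_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the settings, rejecting empty values and unreplaced <PLACEHOLDER>s."""
    settings, problems = {}, []
    for key, var in SETTING_VARS.items():
        value = environ.get(var, "").strip()
        if not value or (value.startswith("<") and value.endswith(">")):
            problems.append(var)
        else:
            settings[key] = value
    if problems:
        raise ValueError("missing or placeholder values: " + ", ".join(problems))
    return settings


# Demonstration with an explicit mapping instead of the real environment.
settings = load_settings(
    {
        "AZURE_SUBSCRIPTION_ID": "00000000-0000-0000-0000-000000000000",
        "AZURE_RESOURCE_GROUP": "my-resource-group",
        "AZUREML_WORKSPACE_NAME": "my-workspace",
        "AZUREML_REGISTRY_NAME": "my-registry",
    }
)
print(settings["registry_name"])
```

The first three keys match the keyword arguments passed to the workspace `MLClient` above, and `registry_name` matches the one passed to the registry client.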

### Create a version number and set up root directory 
Make sure that you set the version number to something unique if this notebook has been run before. The code below uses the current timestamp to generate a unique version number; a fixed version number is shown commented out as an alternative. A unique version prevents name and version conflicts when creating assets.

Set the root directory in which the YAML definitions of the components, environments, etc. are present.

In [None]:
import time
import sys

# version = str(123456)
version = str(int(time.time()))
print("version: ", version)

parent_dir = os.path.abspath(
    os.path.join(
        sys.path[0],
        "../../../../cli/jobs/pipelines-with-components/nyc_taxi_data_regression",
    )
)
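
Because `parent_dir` is resolved relative to `sys.path[0]`, it can point at the wrong place when the notebook is run from another directory. A small check such as the one below can catch that early. `missing_inputs` is a hypothetical helper; the file and folder names are the ones the later cells read from `parent_dir`.

```python
import os
import tempfile
from typing import List

# Paths the later cells read, relative to parent_dir.
EXPECTED_INPUTS = ["train.yml", "env_train", "data_transformed", "scoring-data.json"]


def missing_inputs(root: str) -> List[str]:
    """Return the expected files or folders that do not exist under root."""
    return [
        name
        for name in EXPECTED_INPUTS
        if not os.path.exists(os.path.join(root, name))
    ]


# Demonstration on a throwaway directory; in the notebook, pass parent_dir instead.
with tempfile.TemporaryDirectory() as demo_root:
    open(os.path.join(demo_root, "train.yml"), "w").close()
    os.mkdir(os.path.join(demo_root, "env_train"))
    demo_missing = missing_inputs(demo_root)
print(demo_missing)
```

An empty list means every input the notebook needs was found.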

### Create environment in registry

You will use a Dockerfile to create the environment. The Dockerfile has a base Python image and a few Python dependencies required to run Scikit Learn training jobs. This notebook has more samples for creating environments: [../environment/environment.ipynb](../environment/environment.ipynb).

Note that we use the `ml_client_registry` client because we plan to create the environment in the registry. The syntax for creating an environment in a workspace or registry is identical. You just use a client that is specific to the target - workspace or registry.

In [None]:
env_docker_context = Environment(
    build=BuildContext(path=os.path.join(parent_dir, "env_train")),
    name="SKLearnEnv",
    version=version,
    description="Scikit Learn environment",
)
ml_client_registry.environments.create_or_update(env_docker_context)

### Get environment from registry

Get the environment using the `ml_client_registry` client. The syntax for getting an environment from a workspace or registry is identical. You just use a client that is specific to the target - workspace or registry.

You will use this environment in the next step to create a component in the registry.

In [None]:
env_from_registry = ml_client_registry.environments.get(
    name="SKLearnEnv", version=version
)
print(env_from_registry)

### Create component in registry

You will use the [`train.yml`](../../../cli/jobs/pipelines-with-components/nyc_taxi_data_regression/train.yml) component YAML defined in `cli/jobs/pipelines-with-components/nyc_taxi_data_regression` for this. This component runs a Scikit Learn training Python script. The `train.yml` refers to the AzureML curated environment for the Scikit Learn framework, `AzureML-sklearn-0.24-ubuntu18`, but you will override this to use the Scikit Learn environment you created in the previous step.

A similar sample notebook shows how to create these components in workspaces instead of registry, in which case you can use those components only in the specific workspace: https://github.com/Azure/azureml-examples/blob/main/sdk/jobs/pipelines/1e_pipeline_with_registered_components/pipeline_with_registered_components.ipynb


Use the `ml_client_registry` client to create the component in the registry. The syntax for creating a component in a workspace or registry is identical. You just use a client that is specific to the target - workspace or registry.

In [None]:
# load component definition from YAML
print(parent_dir)
train_model = load_component(source=os.path.join(parent_dir, "train.yml"))
# print the component as yaml
print(train_model)

# change environment reference to the environment created in registry
train_model.environment = env_from_registry

# changing the version number is optional, but useful if a component with the same name and version already exists in the registry
train_model.version = version

print(train_model)
ml_client_registry.components.create_or_update(train_model)

### Get component from Registry

Get the component using the `ml_client_registry` client. The syntax for getting a component from a workspace or registry is identical. You just use a client that is specific to the target - workspace or registry.

You will use this component in the next step to run a pipeline job to train a model.

In [None]:
train_component_from_registry = ml_client_registry.components.get(
    name="train_linear_regression_model", version=version
)
print(train_component_from_registry)

### Create a pipeline job using component from registry

Review this page to learn how to use pipelines and components: https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/pipelines. 

You will create a pipeline job that uses the training component created in the previous step using the Python DSL for pipelines. 

Make sure your workspace has a compute with the name `cpu-cluster` or update the compute name here: `pipeline_job.settings.default_compute = `

In [None]:
@pipeline()
def pipeline_with_registered_components(training_data):
    train_job = train_component_from_registry(
        training_data=training_data,
    )


pipeline_job = pipeline_with_registered_components(
    training_data=Input(type="uri_folder", path=parent_dir + "/data_transformed/"),
)
pipeline_job.settings.default_compute = "cpu-cluster"
print(pipeline_job)

### Run pipeline job using a component from registry

Submit the pipeline job and wait for it to complete. Notice that you are using the workspace client, `ml_client_workspace`, to run the pipeline job. This job runs a component that is not available in your workspace but comes from a registry. This way, you can run this job in any workspace you have access to. This is useful when you want to develop a pipeline in the `dev` workspace with some sample data and run the pipeline in the `prod` workspace with actual data. This is also helpful if you want to share the components you develop with other teams in your organization who may be using a different workspace. 
To summarize, you can submit this job to different workspaces such as `dev`, `test` or `prod` by creating different ML clients for each of those workspaces.
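
The idea of submitting the same job to several workspaces can be sketched as a loop over one client per workspace. The helper `submit_to_workspaces` is hypothetical, and the demonstration uses stand-in client objects so that it runs without Azure access. In the notebook, each value in `clients` would be an `MLClient` created for the corresponding workspace, and the call made on it is the same `jobs.create_or_update` used in the next cell.

```python
from types import SimpleNamespace
from typing import Any, Dict


def submit_to_workspaces(
    clients: Dict[str, Any], job: Any, experiment_name: str
) -> Dict[str, Any]:
    """Submit the same job through every client and collect the returned jobs."""
    submitted = {}
    for label, client in clients.items():
        submitted[label] = client.jobs.create_or_update(
            job, experiment_name=experiment_name
        )
    return submitted


# Stand-in client (hypothetical) that mimics only the call used above.
def make_fake_client(label: str) -> SimpleNamespace:
    def create_or_update(job, experiment_name):
        return SimpleNamespace(name=f"{label}-{job}", experiment_name=experiment_name)

    return SimpleNamespace(jobs=SimpleNamespace(create_or_update=create_or_update))


clients = {label: make_fake_client(label) for label in ("dev", "test", "prod")}
results = submit_to_workspaces(
    clients, "pipeline_job", "sdk_job_component_from_registry"
)
print(sorted(results))
```

Because the component lives in the registry, nothing about the job definition has to change between workspaces; only the client differs.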

In [None]:
pipeline_job = ml_client_workspace.jobs.create_or_update(
    pipeline_job,
    experiment_name="sdk_job_component_from_registry",
    skip_validation=True,
)
ml_client_workspace.jobs.stream(pipeline_job.name)
pipeline_job = ml_client_workspace.jobs.get(pipeline_job.name)
pipeline_job

### Create model in registry

You will now obtain the model trained by the pipeline job in the above step and create the model in the registry. For completeness, we are showing two options here.
* The first option shows how to create a model in the registry from job output without downloading it. This option is recommended when you want to track the lineage between the training job and the model. You will create a model directly from the job output (without downloading it) in your workspace and then copy the model from the workspace to the registry. 
* Second option shows how to create a model in registry from local files. In this case you will download the model from the job output. This option is helpful if you have an existing model from some external source and want to host it in the registry to be shared with many workspaces.

Review this notebook to learn the different model types and how to create them in a workspace: [../../assets/model/model.ipynb](../model/model.ipynb). In the example below, you will work with an `mlflow_model`, which lets you deploy this model for inference without writing any scoring scripts.

### Create model in workspace and copy it to registry

Step a: Get the model path from job output. Note that you use the `ml_client_workspace` to get the model path from job output in a workspace.

In [None]:
jobs = ml_client_workspace.jobs.list(parent_job_name=pipeline_job.name)
for job in jobs:
    if job.display_name == "train_job":
        print(job.name)
        model_path_from_job = (
            "azureml://jobs/{job_name}/outputs/artifacts/paths/model".format(
                job_name=job.name
            )
        )

print(model_path_from_job)
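
If no child job has the display name `train_job`, the loop above leaves `model_path_from_job` undefined and the final `print` fails with a `NameError`. A variant that fails with a clearer message could look like the sketch below. `model_path_for_step` is a hypothetical helper, and the demonstration uses a stand-in job object instead of a real job listing.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def model_path_for_step(child_jobs: Iterable[Any], step_name: str = "train_job") -> str:
    """Build the job-output URI for the child job with the given display name."""
    for job in child_jobs:
        if job.display_name == step_name:
            return f"azureml://jobs/{job.name}/outputs/artifacts/paths/model"
    raise LookupError(f"no child job with display name {step_name!r}")


# Stand-in for the objects returned by jobs.list(parent_job_name=...).
demo_jobs = [SimpleNamespace(display_name="train_job", name="demo_job_123")]
demo_path = model_path_for_step(demo_jobs)
print(demo_path)
```

In the notebook, the first argument would be the result of `ml_client_workspace.jobs.list(parent_job_name=pipeline_job.name)`.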

### Create model in workspace and copy it to registry
Step b: Create model in workspace from job output. Note that you use the `ml_client_workspace` to create the model in workspace.

In [None]:
mlflow_model = Model(
    path=model_path_from_job,
    type=AssetTypes.MLFLOW_MODEL,
    name="nyc-taxi-model",
    version=version,
    description="MLflow model created from job output",
)
print(mlflow_model)
ml_client_workspace.models.create_or_update(mlflow_model)

### Create model in workspace and copy it to registry

Step c: Get the model from the workspace, prepare to copy it to the registry, and create the model in the registry using the `model_ready_to_copy` object. Note that you use both clients here: first, the `ml_client_workspace` client to get the model from the workspace and prepare the `model_ready_to_copy` object; second, the `ml_client_registry` client to create the model in the registry using the `model_ready_to_copy` object.

In [None]:
# fetch the model from workspace
model_in_workspace = ml_client_workspace.models.get(
    name="nyc-taxi-model", version=version
)
print("workspace model:\n\n", model_in_workspace)
# change the format such that the registry understands the model (when you print the model_ready_to_copy object, notice the asset id)
model_ready_to_copy = ml_client_workspace.models._prepare_to_copy(model_in_workspace)
print("\n\nmodel ready to copy:\n\n", model_ready_to_copy)
# copy the model from workspace to registry
ml_client_registry.models.create_or_update(model_ready_to_copy).wait()
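
The comment in the cell above points at the asset ID of the object being copied. The sketch below shows how such an ID can be built and taken apart, assuming the registry asset ID pattern `azureml://registries/<registry>/<asset type>/<name>/versions/<version>`. Verify that pattern against the ID your own run prints. Both helper functions and the registry name `my-registry` are hypothetical.

```python
import re
from typing import Dict

# Assumed pattern for registry asset IDs; check it against your printed IDs.
_ASSET_ID = re.compile(
    r"^azureml://registries/(?P<registry>[^/]+)/(?P<kind>[^/]+)"
    r"/(?P<name>[^/]+)/versions/(?P<version>[^/]+)$"
)


def registry_asset_id(registry: str, kind: str, name: str, version: str) -> str:
    """Build an asset ID such as azureml://registries/r/models/m/versions/1."""
    return f"azureml://registries/{registry}/{kind}/{name}/versions/{version}"


def parse_registry_asset_id(asset_id: str) -> Dict[str, str]:
    """Split a registry asset ID into registry, kind, name and version."""
    match = _ASSET_ID.match(asset_id)
    if match is None:
        raise ValueError(f"not a registry asset ID: {asset_id!r}")
    return match.groupdict()


demo_id = registry_asset_id("my-registry", "models", "nyc-taxi-model", "1")
demo_parts = parse_registry_asset_id(demo_id)
print(demo_id)
```

Parsing the ID this way makes it easy to confirm that an asset really comes from the registry rather than from a workspace.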

### [OPTIONAL] Create model from local files

Step a: Download the model from job output to a local folder. Note that you use the `ml_client_workspace` client to download the model from the workspace.


In [None]:
jobs = ml_client_workspace.jobs.list(parent_job_name=pipeline_job.name)
for job in jobs:
    if job.display_name == "train_job":
        print(job.name)
        ml_client_workspace.jobs.download(job.name)

### [OPTIONAL] Create model from local files 
Step b: Create a model in the registry from files in a local folder. Note that you use the `ml_client_registry` client to create the model in the registry. The syntax for creating a model in a workspace or registry is identical. You just use a client that is specific to the target - workspace or registry.

> **Warning:** If you created a model in the registry in the previous steps, creating it again with the same name and version fails because that model already exists. The cell below avoids the conflict by incrementing the version number. 

In [None]:
# this section is optional; the version is incremented because the name and version from the previous steps may already exist in the registry
mlflow_model = Model(
    path="./artifacts/model/",
    type=AssetTypes.MLFLOW_MODEL,
    name="nyc-taxi-model",
    version=str(int(version) + 1),
    description="MLflow model created from local path",
)
print(mlflow_model)
ml_client_registry.models.create_or_update(mlflow_model)
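
The cell above sidesteps a version conflict by adding 1 to `version`. A more general approach is to look at the versions that already exist and pick the next free numeric one. The helper `next_numeric_version` is hypothetical, and the demonstration uses stand-in objects. In the notebook, the input could be `ml_client_registry.models.list(name="nyc-taxi-model")`, assuming that call returns the existing versions of the model in your registry.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def next_numeric_version(existing: Iterable[Any]) -> str:
    """Return max numeric version + 1 as a string, or "1" if there is none."""
    numeric = [int(m.version) for m in existing if str(m.version).isdigit()]
    return str(max(numeric) + 1) if numeric else "1"


# Stand-ins for model objects; only the .version attribute is used.
demo_models = [
    SimpleNamespace(version="1700000000"),
    SimpleNamespace(version="1700000001"),
    SimpleNamespace(version="beta"),  # non-numeric versions are ignored
]
demo_next = next_numeric_version(demo_models)
print(demo_next)
```

Keep in mind that the later cell fetches the model with `version=version`, so a model created under a different version has to be fetched with that version instead.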

### Deploy model from registry to online endpoint in workspace

You will deploy the model to an online endpoint and submit some sample inference requests in this section. Note that just like jobs, endpoints that host models are specific to a workspace. You can deploy a model from a registry to many workspaces. This helps you develop a model in the `dev` workspace, share it with a registry, and then deploy it to the `test` and `prod` workspaces. 

### Get model from registry

Use the `ml_client_registry` client to get the model created in the previous section from the registry. The syntax for getting a model from a workspace or registry is identical. You just use a client that is specific to the target - workspace or registry.

In [None]:
mlflow_model_from_registry = ml_client_registry.models.get(
    name="nyc-taxi-model", version=version
)
print(mlflow_model_from_registry)

### Create an online endpoint 

Create an online endpoint to deploy the model.

In [None]:
online_endpoint_name = "endpoint-" + version
# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="this is a sample online endpoint for mlflow model",
    auth_mode="key",
)
ml_client_workspace.begin_create_or_update(endpoint).wait()

### Deploy the model from registry to the online endpoint

In [None]:
# create a demo deployment
demo_deployment = ManagedOnlineDeployment(
    name="demo",
    endpoint_name=online_endpoint_name,
    model=mlflow_model_from_registry.id,
    instance_type="Standard_F4s_v2",
    instance_count=1,
)
ml_client_workspace.online_deployments.begin_create_or_update(demo_deployment).wait()

endpoint.traffic = {"demo": 100}
ml_client_workspace.begin_create_or_update(endpoint).result()
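
The `traffic` dictionary maps deployment names to percentages. The sketch below checks such a dictionary before it is assigned to the endpoint, assuming the rule that the percentages must add up to either 0 or 100. Treat that rule as an assumption to confirm in the endpoint documentation. `validate_traffic` and the deployment name `demo-v2` are hypothetical.

```python
from typing import Dict


def validate_traffic(traffic: Dict[str, int]) -> Dict[str, int]:
    """Reject out-of-range percentages and totals other than 0 or 100 (assumed rule)."""
    for name, percent in traffic.items():
        if percent < 0 or percent > 100:
            raise ValueError(f"traffic for {name!r} is out of range: {percent}")
    total = sum(traffic.values())
    if total not in (0, 100):
        raise ValueError(f"traffic percentages add up to {total}, expected 0 or 100")
    return traffic


checked = validate_traffic({"demo": 100})
split = validate_traffic({"demo": 90, "demo-v2": 10})
print(checked, split)
```

A split such as 90/10 is how a second deployment would receive a small share of requests on the same endpoint.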

### Test the deployment

This section needs a sample request file `scoring-data.json` which is available in the root directory initialized in the beginning of this notebook: [../../../cli/jobs/pipelines-with-components/nyc_taxi_data_regression/scoring-data.json](../../../cli/jobs/pipelines-with-components/nyc_taxi_data_regression/scoring-data.json)

In [None]:
# test the deployment with some sample data
ml_client_workspace.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="demo",
    request_file=parent_dir + "/scoring-data.json",
)
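
To work with the predictions rather than just display them, the value returned by `invoke` can be parsed. The sketch below assumes that the call returns the response body as a JSON string. The shape of the decoded value depends on the model, so the sample response here is made up for illustration, and `parse_scoring_response` is a hypothetical helper.

```python
import json
from typing import Any


def parse_scoring_response(response_text: str) -> Any:
    """Decode the endpoint response, with a readable error if it is not JSON."""
    try:
        return json.loads(response_text)
    except json.JSONDecodeError as ex:
        raise ValueError(
            f"endpoint did not return JSON: {response_text[:200]!r}"
        ) from ex


# Made-up response text; a real response comes from online_endpoints.invoke(...).
demo_predictions = parse_scoring_response("[12.5, 7.25]")
print(demo_predictions)
```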

### Clean up resources

#### Delete online endpoint 

In [None]:
print(f"online_endpoint_name: {online_endpoint_name}")
ml_client_workspace.online_endpoints.begin_delete(name=online_endpoint_name).wait()