# Computer Vision for Medical Imaging: Part 2. Model Lineage and Model Registry
This notebook is part 2 of a 4-part series on techniques and services offered by SageMaker for building a model that predicts whether an image of cells contains cancer. It gives an overview of how to track model lineage, how to create a model registry, and how to store models in the registry.

## Dataset
The dataset for this demo comes from the [Camelyon16 Challenge](https://camelyon16.grand-challenge.org/) and is made available under the CC0 license. The raw data provided by the challenge has been processed into 96x96 pixel tiles by [Bas Veeling](https://github.com/basveeling/pcam), also under the CC0 license. For detailed information on each dataset, please see the papers below:
* Ehteshami Bejnordi et al. Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer. JAMA: The Journal of the American Medical Association, 318(22), 2199–2210. [doi:jama.2017.14585](https://doi.org/10.1001/jama.2017.14585)
* B. S. Veeling, J. Linmans, J. Winkens, T. Cohen, M. Welling. "Rotation Equivariant CNNs for Digital Pathology". [arXiv:1806.03962](http://arxiv.org/abs/1806.03962)

The tiled dataset from Bas Veeling is over 6GB. To keep this demo easy to run, the dataset has been pruned to the first 14,000 images of the tiled dataset and is included in the repo with this notebook.
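The sketch below gives a rough sense of how large the pruned subset is once decoded. It computes the uncompressed in-memory size of the 14,000 tiles. The layout is an assumption, not stated above: each tile is a 96x96 RGB image with one 8-bit value per channel.

```python
# Rough in-memory size of the pruned dataset.
# Assumption: every tile is 96x96 pixels, RGB, stored as one uint8 byte per channel.
TILE_HEIGHT = 96
TILE_WIDTH = 96
CHANNELS = 3
BYTES_PER_VALUE = 1  # uint8


def raw_size_bytes(num_tiles: int) -> int:
    """Uncompressed size in bytes of num_tiles decoded tiles."""
    return num_tiles * TILE_HEIGHT * TILE_WIDTH * CHANNELS * BYTES_PER_VALUE


size = raw_size_bytes(14_000)
print(size)          # 387072000
print(size / 2**20)  # 369.140625
```

Under these assumptions, that is about 387 MB (roughly 369 MiB) of pixel data.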

## Update SageMaker SDK and Boto3

<div class="alert alert-warning">
<b>NOTE</b> You may get an error from pip's dependency resolver; you can ignore this error.
</div>

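Restore the variables stored by the previous notebook (for example `tuning_job_name` and `prefix`, which the cells below rely on), then list them:

In [None]: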
%store -r
%store

## Import Libraries

In [None]:
import boto3
import sagemaker
import numpy as np
import cv2

from inference_specification import InferenceSpecification

## Configure Boto3 Clients and Sessions

In [None]:
region = "us-west-2" # Change region as needed
boto3.setup_default_session(region_name=region)
boto_session = boto3.Session(region_name=region)

s3_client = boto3.client('s3', region_name=region)

sagemaker_boto_client = boto_session.client('sagemaker')
sagemaker_session = sagemaker.session.Session(
    boto_session=boto_session,
    sagemaker_client=sagemaker_boto_client)
sagemaker_role = sagemaker.get_execution_role()

bucket = sagemaker_session.default_bucket()

## Examine Lineage
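You already know the training job details from the previous notebook. If you were given only the model URI, however, you could use SageMaker Lineage to retrieve the details of the training job that produced the model. The cells below record that lineage by registering the training data and the trained model as artifacts and associating them with the training job.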

### Data Lineage and Metrics for Best Model

In [None]:
from sagemaker.lineage import context, artifact, association, action

### Training data artifact

In [None]:
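results = sagemaker.analytics.HyperparameterTuningJobAnalytics(
    tuning_job_name, sagemaker_session=sagemaker_session)
results_df = results.dataframe()

# The tuning job description records which training job scored best
best_training_job_summary = results.description()['BestTrainingJob']
best_training_job_name = best_training_job_summary['TrainingJobName']
best_training_job_details = sagemaker_boto_client.describe_training_job(
    TrainingJobName=best_training_job_name)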
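`results.description()['BestTrainingJob']` already tells you which job won. The sketch below makes the selection rule explicit by making the same choice from the tabular results. It rests on two assumptions:

- Each row is shaped like an entry of `results_df.to_dict("records")`.
- You pass in the objective direction (`"Maximize"` or `"Minimize"`). The tuning job description records this direction under `HyperParameterTuningJobConfig`, `HyperParameterTuningJobObjective`, `Type`.

The job names and scores are made up.

```python
import math
from typing import Dict, List


def pick_best_job(rows: List[Dict], objective_type: str = "Maximize") -> str:
    """Return the TrainingJobName whose FinalObjectiveValue is best.

    Rows with a missing objective (None or NaN, e.g. failed jobs) are ignored.
    """
    scored = [
        r for r in rows
        if r.get("FinalObjectiveValue") is not None
        and not math.isnan(r["FinalObjectiveValue"])
    ]
    if not scored:
        raise ValueError("no training job reported an objective value")
    chooser = max if objective_type == "Maximize" else min
    best = chooser(scored, key=lambda r: r["FinalObjectiveValue"])
    return best["TrainingJobName"]


# Hypothetical rows shaped like results_df.to_dict("records")
rows = [
    {"TrainingJobName": "job-001", "TrainingJobStatus": "Completed", "FinalObjectiveValue": 0.81},
    {"TrainingJobName": "job-002", "TrainingJobStatus": "Completed", "FinalObjectiveValue": 0.87},
    {"TrainingJobName": "job-003", "TrainingJobStatus": "Failed", "FinalObjectiveValue": float("nan")},
]
print(pick_best_job(rows))              # job-002
print(pick_best_job(rows, "Minimize"))  # job-001
```

Pandas represents a missing objective as `NaN` rather than `None`, which is why the filter checks for both.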

In [None]:
# Register (or reuse) one DataSet artifact per training input channel
data_artifact_list = []
for data_input in best_training_job_details['InputDataConfig']:
    channel = data_input['ChannelName']
    data_s3_uri = data_input['DataSource']['S3DataSource']['S3Uri']

    # Look up existing artifacts for this S3 URI so re-running the cell does not create duplicates
    matching_artifacts = list(artifact.Artifact.list(
        source_uri=data_s3_uri,
        sagemaker_session=sagemaker_session)
    )

    if matching_artifacts:
        data_artifact = matching_artifacts[0]
        print(f'Using existing artifact: {data_artifact.artifact_arn}')
    else:
        data_artifact = artifact.Artifact.create(
            artifact_name=channel,
            source_uri=data_s3_uri,
            artifact_type='DataSet',
            sagemaker_session=sagemaker_session)
        print(f'Create artifact {data_artifact.artifact_arn}: SUCCESSFUL')
    data_artifact_list.append(data_artifact)
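The cell above follows a get-or-create pattern. It looks up an artifact by its S3 source URI and creates one only if none exists, so re-running the notebook does not pile up duplicate artifacts. The sketch below shows that pattern in isolation. It uses a hypothetical in-memory `FakeArtifactStore` in place of `artifact.Artifact.list` and `artifact.Artifact.create`; the ARN format and bucket names are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeArtifactStore:
    """Hypothetical in-memory stand-in for the lineage artifact API, keyed by source URI."""
    arn_by_uri: Dict[str, str] = field(default_factory=dict)
    next_id: int = 0

    def list_arns(self, source_uri: str) -> List[str]:
        return [self.arn_by_uri[source_uri]] if source_uri in self.arn_by_uri else []

    def create(self, source_uri: str) -> str:
        arn = f"arn:example:artifact/{self.next_id}"
        self.next_id += 1
        self.arn_by_uri[source_uri] = arn
        return arn


def get_or_create(store: FakeArtifactStore, source_uri: str) -> Tuple[str, bool]:
    """Return (artifact ARN, created?) without creating a second artifact for the same URI."""
    matches = store.list_arns(source_uri)
    if matches:
        return matches[0], False
    return store.create(source_uri), True


store = FakeArtifactStore()
first = get_or_create(store, "s3://example-bucket/train")
second = get_or_create(store, "s3://example-bucket/train")
print(first)   # ('arn:example:artifact/0', True)
print(second)  # ('arn:example:artifact/0', False)
```

Matching on the source URI alone has a side effect: the notebook reuses the first existing artifact for that URI even if it was created with a different `artifact_type`.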

### Model artifact

In [None]:
trained_model_s3_uri = best_training_job_details['ModelArtifacts']['S3ModelArtifacts']

matching_artifacts = list(artifact.Artifact.list(
    source_uri=trained_model_s3_uri,
    sagemaker_session=sagemaker_session)
)

if matching_artifacts:
    model_artifact = matching_artifacts[0]
    print(f'Using existing artifact: {model_artifact.artifact_arn}')
else:
    model_artifact = artifact.Artifact.create(
        artifact_name='TrainedModel',
        source_uri=trained_model_s3_uri,
        artifact_type='Model',
        sagemaker_session=sagemaker_session)
    print(f'Create artifact {model_artifact.artifact_arn}: SUCCESSFUL')

#### Find the training job's trial component

In [None]:
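# SageMaker automatically creates a trial component named "<training-job-name>-aws-training-job"
# for each training job; the artifacts are associated with it in the next step
trial_component = sagemaker_boto_client.describe_trial_component(
    TrialComponentName=best_training_job_summary['TrainingJobName'] + '-aws-training-job')
trial_component_arn = trial_component['TrialComponentArn']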

#### Associate artifacts with the trial component

In [None]:
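from botocore.exceptions import ClientError

artifact_list = data_artifact_list + [model_artifact]

for artif in artifact_list:
    # Training data contributes to the job; the model artifact is what the job produced
    if artif.artifact_type == 'DataSet':
        assoc = 'ContributedTo'
    else:
        assoc = 'Produced'
    try:
        association.Association.create(
            source_arn=artif.artifact_arn,
            destination_arn=trial_component_arn,
            association_type=assoc,
            sagemaker_session=sagemaker_session)
        print(f"Association with {artif.artifact_type}: SUCCESSFUL")
    except ClientError as err:
        # Typically raised because the association already exists from a previous run
        print(f"Association with {artif.artifact_type} not created: {err.response['Error']['Message']}")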
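These associations make the "given only the model URI" lookup from the start of this section possible. Start at the model artifact, follow its `Produced` edge to the training job's trial component, then collect the `DataSet` artifacts that `ContributedTo` it.

Against the real service you would walk the graph with the SDK's `association.Association.list`, which can filter on `source_arn` or `destination_arn`. The sketch below instead uses a hypothetical in-memory `ToyLineageGraph` with made-up ARNs, so the traversal logic is visible on its own.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyLineageGraph:
    """Hypothetical in-memory lineage graph.

    Edges follow the direction used in the notebook:
    artifact ARN --(ContributedTo | Produced)--> trial component ARN.
    """
    uri_by_artifact: Dict[str, str] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (source, destination, type)

    def add_artifact(self, arn: str, source_uri: str) -> None:
        self.uri_by_artifact[arn] = source_uri

    def associate(self, source_arn: str, destination_arn: str, association_type: str) -> None:
        self.edges.append((source_arn, destination_arn, association_type))

    def trace_model(self, model_uri: str) -> Dict[str, List[str]]:
        """Given only a model URI, find the producing trial component(s) and their input datasets."""
        model_arns = {arn for arn, uri in self.uri_by_artifact.items() if uri == model_uri}
        producers = [dst for src, dst, kind in self.edges
                     if src in model_arns and kind == "Produced"]
        dataset_uris = [self.uri_by_artifact[src] for src, dst, kind in self.edges
                        if dst in producers and kind == "ContributedTo"]
        return {"trial_components": producers, "datasets": dataset_uris}


graph = ToyLineageGraph()
graph.add_artifact("arn:example:artifact/train", "s3://example-bucket/train")
graph.add_artifact("arn:example:artifact/validation", "s3://example-bucket/validation")
graph.add_artifact("arn:example:artifact/model", "s3://example-bucket/output/model.tar.gz")
tc = "arn:example:trial-component/job-002-aws-training-job"
graph.associate("arn:example:artifact/train", tc, "ContributedTo")
graph.associate("arn:example:artifact/validation", tc, "ContributedTo")
graph.associate("arn:example:artifact/model", tc, "Produced")

lineage = graph.trace_model("s3://example-bucket/output/model.tar.gz")
print(lineage["trial_components"])  # ['arn:example:trial-component/job-002-aws-training-job']
print(lineage["datasets"])          # ['s3://example-bucket/train', 's3://example-bucket/validation']
```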

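## Model Registry
Create a model package group named after the stored `prefix`, then register one model package for each training job launched by the tuning job.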

In [None]:
mpg_name = prefix

# Check for a group with exactly this name; an existing group may still contain no packages,
# so checking for packages alone would try to re-create the group and fail
existing_groups = sagemaker_boto_client.list_model_package_groups(
    NameContains=mpg_name)['ModelPackageGroupSummaryList']

if any(g['ModelPackageGroupName'] == mpg_name for g in existing_groups):
    print(f'Using existing Model Package Group: {mpg_name}')
else:
    mpg_input_dict = {
        'ModelPackageGroupName': mpg_name,
        'ModelPackageGroupDescription': 'Cancer metastasis detection'
    }

    mpg_response = sagemaker_boto_client.create_model_package_group(**mpg_input_dict)
    print(f'Create Model Package Group {mpg_name}: SUCCESSFUL')
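A model package group is a named collection of versioned model packages. Each `create_model_package` call with a `ModelPackageGroupName` gives the new package the next version number in that group. With `ModelApprovalStatus='PendingManualApproval'`, the package then waits until someone approves or rejects it; in the real API that happens through `update_model_package`.

The sketch below models only that bookkeeping, using hypothetical in-memory classes. The group name and S3 URIs are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional

APPROVAL_STATUSES = {"PendingManualApproval", "Approved", "Rejected"}


@dataclass
class ToyModelPackage:
    version: int
    model_data_url: str
    approval_status: str = "PendingManualApproval"


@dataclass
class ToyModelPackageGroup:
    """Hypothetical in-memory group: versions are numbered from 1 in creation order."""
    name: str
    packages: List[ToyModelPackage] = field(default_factory=list)

    def register(self, model_data_url: str,
                 approval_status: str = "PendingManualApproval") -> ToyModelPackage:
        if approval_status not in APPROVAL_STATUSES:
            raise ValueError(f"unknown approval status: {approval_status}")
        package = ToyModelPackage(len(self.packages) + 1, model_data_url, approval_status)
        self.packages.append(package)
        return package

    def set_approval(self, version: int, approval_status: str) -> None:
        if approval_status not in APPROVAL_STATUSES:
            raise ValueError(f"unknown approval status: {approval_status}")
        self.packages[version - 1].approval_status = approval_status

    def latest_approved(self) -> Optional[ToyModelPackage]:
        approved = [p for p in self.packages if p.approval_status == "Approved"]
        return max(approved, key=lambda p: p.version) if approved else None


group = ToyModelPackageGroup("example-group")
for uri in ["s3://example-bucket/job-001/model.tar.gz", "s3://example-bucket/job-002/model.tar.gz"]:
    group.register(uri)
group.set_approval(2, "Approved")
print(group.latest_approved().version)  # 2
```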

In [None]:
%store mpg_name

In [None]:
training_jobs = results_df['TrainingJobName']

# Register one model package version per training job in the tuning job
for job_name in training_jobs:
    job_data = sagemaker_boto_client.describe_training_job(TrainingJobName=job_name)
    model_uri = job_data.get('ModelArtifacts', {}).get('S3ModelArtifacts')
    if model_uri is None:
        # Jobs that produced no model artifacts (e.g. failed jobs) have nothing to register
        print(f'Skipping {job_name}: no model artifacts')
        continue
    training_image = job_data['AlgorithmSpecification']['TrainingImage']

    mp_inference_spec = InferenceSpecification().get_inference_specification_dict(
        ecr_image=training_image,
        supports_gpu=False,
        supported_content_types=['text/csv'],
        supported_mime_types=['text/csv'])

    # Point the inference container at this job's trained model
    mp_inference_spec['InferenceSpecification']['Containers'][0]['ModelDataUrl'] = model_uri
    mp_input_dict = {
        'ModelPackageGroupName': mpg_name,
        'ModelPackageDescription': 'SageMaker Image Classifier',
        'ModelApprovalStatus': 'PendingManualApproval'
    }

    mp_input_dict.update(mp_inference_spec)
    mp_response = sagemaker_boto_client.create_model_package(**mp_input_dict)
    
model_packages = sagemaker_boto_client.list_model_packages(ModelPackageGroupName=mpg_name, MaxResults=6)['ModelPackageSummaryList']
model_packages
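The registration request can also be assembled as a plain dictionary, without the repo's `InferenceSpecification` helper. The sketch below shows one way to do that, assuming the key names of the boto3 `create_model_package` request syntax: `InferenceSpecification`, `Containers`, `Image`, `ModelDataUrl`, `SupportedContentTypes`, and `SupportedResponseMIMETypes`. It is not taken from that helper. The image URI, bucket, and group name in the usage lines are made up.

Each resulting dictionary would be passed as `sagemaker_boto_client.create_model_package(**request)`.

```python
from typing import Dict, Iterable, List


def build_model_package_request(group_name: str, image_uri: str, model_data_url: str,
                                content_types: Iterable[str], response_types: Iterable[str],
                                description: str = "SageMaker Image Classifier") -> Dict:
    """Assemble keyword arguments for a create_model_package call."""
    return {
        "ModelPackageGroupName": group_name,
        "ModelPackageDescription": description,
        "ModelApprovalStatus": "PendingManualApproval",
        "InferenceSpecification": {
            # Container image plus the model artifacts the training job wrote to S3
            "Containers": [{"Image": image_uri, "ModelDataUrl": model_data_url}],
            "SupportedContentTypes": list(content_types),
            "SupportedResponseMIMETypes": list(response_types),
        },
    }


def requests_for_jobs(job_descriptions: Iterable[Dict], group_name: str,
                      content_types: Iterable[str], response_types: Iterable[str]) -> List[Dict]:
    """Build one request per describe_training_job response that has model artifacts."""
    content_types, response_types = list(content_types), list(response_types)
    requests = []
    for job in job_descriptions:
        model_uri = job.get("ModelArtifacts", {}).get("S3ModelArtifacts")
        if model_uri is None:
            continue  # nothing to register for this job
        image = job["AlgorithmSpecification"]["TrainingImage"]
        requests.append(build_model_package_request(
            group_name, image, model_uri, content_types, response_types))
    return requests


# Hypothetical describe_training_job responses, trimmed to the fields used above
jobs = [
    {"AlgorithmSpecification": {"TrainingImage": "example.ecr/image:1"},
     "ModelArtifacts": {"S3ModelArtifacts": "s3://example-bucket/job-001/model.tar.gz"}},
    {"AlgorithmSpecification": {"TrainingImage": "example.ecr/image:1"}},  # no artifacts
]
reqs = requests_for_jobs(jobs, "example-group", ["text/csv"], ["text/csv"])
print(len(reqs))  # 1
```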

In [None]:
%store model_packages
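The listing call earlier in this section passes `MaxResults=6`. A group with more versions, for example after re-running the registration cell, therefore returns only the first page. `list_model_packages` returns a `NextToken` when more results remain, and the sketch below follows it until the list is exhausted.

In the notebook you would call it as `list_all_model_packages(sagemaker_boto_client.list_model_packages, mpg_name)`. The fake lister and its data are hypothetical and exist only to exercise the loop.

```python
from typing import Callable, Dict, List, Optional


def list_all_model_packages(list_page: Callable[..., Dict], group_name: str,
                            page_size: int = 6) -> List[Dict]:
    """Follow NextToken until every model package summary in the group is collected."""
    summaries: List[Dict] = []
    kwargs = {"ModelPackageGroupName": group_name, "MaxResults": page_size}
    while True:
        page = list_page(**kwargs)
        summaries.extend(page["ModelPackageSummaryList"])
        token = page.get("NextToken")
        if not token:
            return summaries
        kwargs["NextToken"] = token


def make_fake_lister(items: List[Dict]) -> Callable[..., Dict]:
    """Hypothetical stand-in for list_model_packages that pages through items."""
    def list_page(ModelPackageGroupName: str, MaxResults: int,
                  NextToken: Optional[str] = None) -> Dict:
        start = int(NextToken) if NextToken else 0
        end = start + MaxResults
        response = {"ModelPackageSummaryList": items[start:end]}
        if end < len(items):
            response["NextToken"] = str(end)
        return response
    return list_page


fake_items = [{"ModelPackageVersion": v} for v in range(1, 15)]
all_packages = list_all_model_packages(make_fake_lister(fake_items), "example-group")
print(len(all_packages))  # 14
```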