This notebook contains an example of a SageMaker Pipeline definition for HavosAi using the AWS Python SDK.
It is based on this example:

https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform.html (to get the source notebook, click "Edit on GitHub" in the top-right corner of that page)

First, upgrade pip and the SageMaker SDK. Then collect the AWS context (region, session, and execution role) that the SDK calls below require.

In [1]:
!python -m pip install --upgrade pip
!pip install --upgrade sagemaker

import boto3
import sagemaker


region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
role = sagemaker.get_execution_role()



#### Set default S3 bucket with required artifacts and input data

In [2]:
default_bucket = "test-sagemaker-pipeline-on-cosai"
input_data_path_in_bucket = "subfolder-0-reduced.csv"
search_index_path_in_bucket = "search_index.pickle"
population_tags_path_in_bucket = "population_tags.xlsx"
outcomes_bert_model_path_in_bucket = "bert_exp_outcome_sentences_new_multilabel_15epoch_1300_mixed_0.7/"


def get_s3_uri(bucket, path):
    return f"s3://{bucket}/{path}"
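
The step definitions later in this notebook spell out their S3 URIs by hand. As a minimal sketch, the helper above can build the same URIs from the bucket name, and `urllib.parse` can split one back into bucket and key. `split_s3_uri` is a hypothetical helper added for illustration; it is not part of the HavosAi code base.

```python
from urllib.parse import urlparse


def get_s3_uri(bucket, path):
    # Same helper as in the cell above, repeated so this sketch runs on its own.
    return f"s3://{bucket}/{path}"


def split_s3_uri(uri):
    # Hypothetical inverse helper: returns (bucket, key) for an s3:// URI.
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


default_bucket = "test-sagemaker-pipeline-on-cosai"
config_uri = get_s3_uri(default_bucket, "config.json")
search_index_uri = get_s3_uri(default_bucket, "output/search_index.pickle")

print(config_uri)
print(split_s3_uri(search_index_uri))
```

Building every URI from `default_bucket` would mean a bucket rename touches one line instead of every step definition.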

### Define Parameters to Parametrize Pipeline Execution

Define Pipeline parameters that you can use to parametrize the pipeline. Parameters enable custom pipeline executions and schedules without having to modify the Pipeline definition.

In [3]:
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
    ParameterFloat,
)


processing_instance_count = ParameterInteger(name="ProcessingInstanceCount", default_value=1)
processing_instance_type = ParameterString(
    name="ProcessingInstanceType", default_value="ml.t3.medium"
)
input_data = ParameterString(
    name="InputData",
    default_value=get_s3_uri(default_bucket, input_data_path_in_bucket),
)

outcomes_bert = ParameterString(
    name="OutcomesBert",
    default_value=get_s3_uri(default_bucket, outcomes_bert_model_path_in_bucket),
)
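
To illustrate how per-execution overrides relate to the defaults declared above, here is a minimal sketch that models the parameters as plain Python data and merges overrides into them. It does not use the SageMaker SDK: `DECLARED_PARAMETERS` and `resolve_parameters` are hypothetical names, and the checks are a local convenience rather than a reproduction of the service's own validation. The `overrides` dict has the shape one would pass as `parameters=` to `pipeline.start(...)` once the pipeline exists.

```python
# Declared parameters mirrored as (type, default) pairs; names match the cell above.
DECLARED_PARAMETERS = {
    "ProcessingInstanceCount": (int, 1),
    "ProcessingInstanceType": (str, "ml.t3.medium"),
    "InputData": (str, "s3://test-sagemaker-pipeline-on-cosai/subfolder-0-reduced.csv"),
}


def resolve_parameters(overrides):
    """Return the effective parameter values for one execution."""
    resolved = {name: default for name, (_, default) in DECLARED_PARAMETERS.items()}
    for name, value in overrides.items():
        if name not in DECLARED_PARAMETERS:
            raise KeyError(f"unknown pipeline parameter: {name}")
        expected_type, _ = DECLARED_PARAMETERS[name]
        if not isinstance(value, expected_type):
            raise TypeError(f"{name} expects {expected_type.__name__}")
        resolved[name] = value
    return resolved


# Hypothetical override: a larger instance for one run; other values keep their defaults.
overrides = {"ProcessingInstanceType": "ml.m5.xlarge"}
effective = resolve_parameters(overrides)
print(effective)
```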


#### Clone HavosAI repo (if needed) to use for the processing steps

In [4]:
# !git clone https://github.com/HavosAi/HavosAi.git

## Create Processing Steps

### Create a docker image for processing steps

The image contains the whole HavosAI repo. Each step uses its own script in HavosAi/src/sagemaker_pipeline.

The AWS docs describe custom processing containers, with examples:

https://docs.aws.amazon.com/sagemaker/latest/dg/processing-container-run-scripts.html

https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html

This cell only needs to be rerun when the libraries or the code base change.

In [5]:
# %%writefile HavosAi/SageMaker_Pipeline.Dockerfile

# FROM conda/miniconda3:latest

# ENV PYTHONPATH="${PYTHONPATH}:/app:/app/src"

# # RUN conda config --set channel_priority strict

# COPY ./env-havos-linux.yml /app/env-havos-linux.yml
# RUN conda update conda
# RUN conda env create -f /app/env-havos-linux.yml --debug
# SHELL ["conda", "run", "-n", "havos", "/bin/bash", "-c"]
# RUN conda install pytorch
# # RUN pip uninstall --yes h5py
# # RUN pip install h5py
# # RUN pip uninstall --yes overrides
# # RUN pip install overrides
# # # RUN pip install make # for allennlp
# # # RUN pip install allennlp
# # RUN conda install allennlp -c pytorch -c allennlp -c conda-forge

# # check libs
# # RUN python import allennlp 

# RUN python -m nltk.downloader punkt words stopwords wordnet averaged_perceptron_tagger \
#     && python -m spacy download en \
#     && python -m spacy validate

# COPY ./ /app

# # ENTRYPOINT ["conda", "run", "-n", "havos", "--no-capture-output", "python"]
# ENTRYPOINT ["conda", "run", "-n", "havos", "python"]

For SageMaker Studio notebook instances, see:
https://aws.amazon.com/blogs/machine-learning/using-the-amazon-sagemaker-studio-image-build-cli-to-build-container-images-from-your-studio-notebooks/

In [6]:
# !pip install sagemaker-studio-image-build

#### Build docker image and push it to ECR repository

In [7]:
import boto3

account_id = boto3.client("sts").get_caller_identity().get("Account")
ecr_repository = "sagemaker-processing-container-havos-ai"
tag = ":latest"

uri_suffix = "amazonaws.com"
if region in ["cn-north-1", "cn-northwest-1"]:
    uri_suffix = "amazonaws.com.cn"
processing_repository_uri = "{}.dkr.ecr.{}.{}/{}".format(
    account_id, region, uri_suffix, ecr_repository + tag
)

# Create ECR repository and push docker image

## sagemaker studio docker option
#!sm-docker build -t $ecr_repository -f SageMaker_Pipeline.Dockerfile ./

!docker build -t $ecr_repository -f HavosAi/SageMaker_Pipeline.Dockerfile ./HavosAi 


Sending build context to Docker daemon  7.515MB
Step 1/10 : FROM conda/miniconda3:latest
 ---> 2c4c668a3586
Step 2/10 : ENV PYTHONPATH="${PYTHONPATH}:/app:/app/src"
 ---> Using cache
 ---> cf17d03a9533
Step 3/10 : COPY ./env-havos-linux.yml /app/env-havos-linux.yml
 ---> Using cache
 ---> aa2385fe3a1b
Step 4/10 : RUN conda update conda
 ---> Using cache
 ---> 664ff7554c2e
Step 5/10 : RUN conda env create -f /app/env-havos-linux.yml --debug
 ---> Using cache
 ---> 19d720ed90b2
Step 6/10 : SHELL ["conda", "run", "-n", "havos", "/bin/bash", "-c"]
 ---> Using cache
 ---> 0be08527b0fa
Step 7/10 : RUN conda install pytorch
 ---> Using cache
 ---> 05be192f0c7d
Step 8/10 : RUN python -m nltk.downloader punkt words stopwords wordnet averaged_perceptron_tagger     && python -m spacy download en     && python -m spacy validate
 ---> Using cache
 ---> 32673a1901e7
Step 9/10 : COPY ./ /app
 ---> 6b7643634f67
Step 10/10 : ENTRYPOINT ["conda", "run", "-n", "havos", "python"]
 ---> Running in 94bbf80b
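
The image URI assembled in the cell above follows the pattern `<account>.dkr.ecr.<region>.<suffix>/<repository>:<tag>`. A minimal sketch of the same construction as a reusable function is shown here; `ecr_image_uri` is a hypothetical name and the account ID is a placeholder, not a real account.

```python
# Regions that use the China DNS suffix, as in the cell above.
CHINA_REGIONS = {"cn-north-1", "cn-northwest-1"}


def ecr_image_uri(account_id, region, repository, tag="latest"):
    # Pick the DNS suffix by region, then join the parts into a full image URI.
    suffix = "amazonaws.com.cn" if region in CHINA_REGIONS else "amazonaws.com"
    return f"{account_id}.dkr.ecr.{region}.{suffix}/{repository}:{tag}"


# Placeholder account ID for illustration only.
uri = ecr_image_uri("123456789012", "us-west-2", "sagemaker-processing-container-havos-ai")
print(uri)
```

Keeping the tag as a separate argument avoids the leading colon that the `tag = ":latest"` variable carries around in the cell above.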

In [8]:
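# Note: `aws ecr get-login` exists only in AWS CLI v1. With AWS CLI v2, use instead:
# !aws ecr get-login-password --region $region | docker login --username AWS --password-stdin {account_id}.dkr.ecr.{region}.{uri_suffix}
!$(aws ecr get-login --region $region --registry-ids $account_id --no-include-email)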
# # Uncomment if you need to create a new repository
# #!aws ecr create-repository --repository-name $ecr_repository

!docker tag {ecr_repository + tag} $processing_repository_uri
!docker push $processing_repository_uri

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
The push refers to repository [089541407911.dkr.ecr.us-west-2.amazonaws.com/sagemaker-processing-container-havos-ai]

8bbfa83a: Pushed
latest: digest: sha256:9fdbc641ab69d0f66abc1d6682e11c335c6f0ce69f22ad25f48c5ac201115cdd size: 2012


In [9]:
# !docker ps

In [10]:
# to check free memory
# ! docker system df

In [11]:
# to delete unused images
# ! docker system prune --force

#### Create Script Processor definition to use for the Processing steps
All processing steps share this processor, so they all run in the same Docker image.

In [12]:
from sagemaker.processing import ScriptProcessor

script_processor = ScriptProcessor(
    command=["conda", "run", "-n", "havos", "python"],
    image_uri=processing_repository_uri,
    role=role,
    instance_count=processing_instance_count,
    instance_type=processing_instance_type,
)

### Get local paths in container images

In [13]:
from HavosAi.src.sagemaker_pipeline.constants import (
    CONFIG_INPUTS_DIR, 
    INPUTS_DIR,
    OUTPUTS_DIR,
    MODELS_DIR,
    OUTCOMES_MODEL,
    MODELS_INPUTS_DIR,
    DATA_INPUTS_DIR,
    SEARCH_INPUTS_DIR,
    ABBREV_INPUTS_DIR,
)

In [14]:
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep
from sagemaker.workflow.steps import CacheConfig

### AbbreviationsResolver

In [15]:
step_abbr_resolver = ProcessingStep(
    name="AbbreviationsResolver",
    processor=script_processor,
    inputs=[
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/config.json",
            destination=CONFIG_INPUTS_DIR,
        ),
        ProcessingInput(
            source=input_data,  # InputData pipeline parameter; its default is the same CSV
            destination=INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/model/abbreviations_dicts",
            destination=MODELS_INPUTS_DIR,
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="AbbreviationsResolverOutput", 
            source=OUTPUTS_DIR,
            destination="s3://test-sagemaker-pipeline-on-cosai/output",
        ),
        ProcessingOutput(
            output_name="abbreviation_resolver", 
            source=OUTCOMES_MODEL,
            destination="s3://test-sagemaker-pipeline-on-cosai/output",
        ),
    ],
    code="HavosAi/src/sagemaker_pipeline/AbbreviationsResolverStep.py",
    cache_config = CacheConfig(enable_caching=True, expire_after="1y")
)

### SearchIndex

In [16]:
step_search_index = ProcessingStep(
    name="SearchIndex",
    processor=script_processor,
    inputs=[
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/config.json",
            destination=CONFIG_INPUTS_DIR,
        ),
        ProcessingInput(
            source=step_abbr_resolver.properties.ProcessingOutputConfig.Outputs["AbbreviationsResolverOutput"].S3Output.S3Uri,
            destination=INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/model/abbreviations_dicts",
            destination=MODELS_INPUTS_DIR,
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="SearchIndexOutput", 
            source=OUTPUTS_DIR,
            destination="s3://test-sagemaker-pipeline-on-cosai/output",
        ),
        ProcessingOutput(
            output_name="search_index", 
            source=OUTCOMES_MODEL,
            destination="s3://test-sagemaker-pipeline-on-cosai/output",
        ),
    ],
    code="HavosAi/src/sagemaker_pipeline/SearchIndexStep.py",
    cache_config = CacheConfig(enable_caching=True, expire_after="1y")
)
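
The steps in this notebook form a linear chain: each step reads the previous step's `OUTPUTS_DIR` upload as its main input. The sketch below shows that wiring with plain dictionaries instead of SDK objects, so it runs without AWS access. `STEP_ORDER` and `wire_chain` are hypothetical names; in the real definitions the link is the `step.properties.ProcessingOutputConfig.Outputs[...].S3Output.S3Uri` reference.

```python
# First few step names, in the order they are chained in this notebook.
STEP_ORDER = [
    "AbbreviationsResolver",
    "SearchIndex",
    "AdvancedTextNormalizer",
    "KeywordsNormalizer",
]


def wire_chain(step_names, first_input):
    """Describe each step's main input and named output as plain strings."""
    wiring = []
    previous_output = first_input
    for name in step_names:
        output_name = f"{name}Output"
        wiring.append({"step": name, "input": previous_output, "output_name": output_name})
        # The next step consumes this step's named output.
        previous_output = f"{name}.Outputs['{output_name}']"
    return wiring


wiring = wire_chain(STEP_ORDER, "s3://test-sagemaker-pipeline-on-cosai/subfolder-0-reduced.csv")
for row in wiring:
    print(f"{row['step']:<24} <- {row['input']}")
```

Because the dependency is expressed through the output reference rather than a literal URI, the pipeline service can infer that one step must finish before the next starts.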

### AdvancedTextNormalizer

In [17]:
step_adv_normalizer = ProcessingStep(
    name="AdvancedTextNormalizer",
    processor=script_processor,
    inputs=[
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/config.json",
            destination=CONFIG_INPUTS_DIR,
        ),
        ProcessingInput(
            source=step_search_index.properties.ProcessingOutputConfig.Outputs["SearchIndexOutput"].S3Output.S3Uri,
            destination=INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/output/abbreviation_resolver.pickle",
            destination=MODELS_INPUTS_DIR,
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="AdvancedTextNormalizerOutput", 
            source=OUTPUTS_DIR,
            destination="s3://test-sagemaker-pipeline-on-cosai/output",
        ),
    ],
    code="HavosAi/src/sagemaker_pipeline/AdvancedTextNormalizerStep.py",
    cache_config = CacheConfig(enable_caching=True, expire_after="1y")
)

### KeywordsNormalizer

In [18]:
step_keywords_norm = ProcessingStep(
    name="KeywordsNormalizer",
    processor=script_processor,
    inputs=[
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/config.json",
            destination=CONFIG_INPUTS_DIR,
        ),
        ProcessingInput(
            source=step_adv_normalizer.properties.ProcessingOutputConfig.Outputs["AdvancedTextNormalizerOutput"].S3Output.S3Uri,
            destination=INPUTS_DIR,
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="KeywordsNormalizerOutput",
            source=OUTPUTS_DIR,
            destination="s3://test-sagemaker-pipeline-on-cosai/output",
        ),
    ],
    code="HavosAi/src/sagemaker_pipeline/KeywordsNormalizerStep.py",
    cache_config = CacheConfig(enable_caching=True, expire_after="1y")
)

### JournalNormalizer

In [19]:
step_journal_normalizer = ProcessingStep(
    name="JournalNormalizer",
    processor=script_processor,
    inputs=[
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/config.json",
            destination=CONFIG_INPUTS_DIR,
        ),
        ProcessingInput(
            source=step_keywords_norm.properties.ProcessingOutputConfig.Outputs["KeywordsNormalizerOutput"].S3Output.S3Uri,
            destination=INPUTS_DIR,
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="JournalNormalizerOutput", 
            source=OUTPUTS_DIR,
            destination="s3://test-sagemaker-pipeline-on-cosai/output",
        ),
    ],
    code="HavosAi/src/sagemaker_pipeline/JournalNormalizerStep.py",
    cache_config = CacheConfig(enable_caching=True, expire_after="1y")
)

### AuthorAndAffiliationsProcessing


In [20]:
step_author_affill_proc = ProcessingStep(
    name="AuthorAndAffiliationsProcessing",
    processor=script_processor,
    inputs=[
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/config.json",
            destination=CONFIG_INPUTS_DIR,
        ),
        ProcessingInput(
            source=step_journal_normalizer.properties.ProcessingOutputConfig.Outputs["JournalNormalizerOutput"].S3Output.S3Uri,
            destination=INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/data",
            destination="/opt/ml/processing/data",
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="AuthorAndAffiliationsProcessingOutput", 
            source=OUTPUTS_DIR,
            destination="s3://test-sagemaker-pipeline-on-cosai/output",
        ),
    ],
    code="HavosAi/src/sagemaker_pipeline/AuthorAndAffiliationsProcessingStep.py",
    cache_config = CacheConfig(enable_caching=True, expire_after="1y")
)

### GeoNameFinder

In [35]:
step_geo_name_finder = ProcessingStep(
    name="GeoNameFinder",
    processor=script_processor,
    inputs=[
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/config.json",
            destination=CONFIG_INPUTS_DIR,
        ),
        ProcessingInput(
            source=step_author_affill_proc.properties.ProcessingOutputConfig.Outputs["AuthorAndAffiliationsProcessingOutput"].S3Output.S3Uri,
            destination=INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/output/search_index.pickle",
            destination=MODELS_INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/data",
            destination="/opt/ml/processing/data",
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="GeoNameFinderOutput", 
            source=OUTPUTS_DIR,
            destination="s3://test-sagemaker-pipeline-on-cosai/output",
        ),
    ],
    code="HavosAi/src/sagemaker_pipeline/GeoNameFinderStep.py",
    cache_config = CacheConfig(enable_caching=True, expire_after="1y")
)

### CropsSearch

In [22]:
step_crops_search = ProcessingStep(
    name="CropsSearch",
    processor=script_processor,
    inputs=[
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/config.json",
            destination=CONFIG_INPUTS_DIR,
        ),
        ProcessingInput(
            source=step_geo_name_finder.properties.ProcessingOutputConfig.Outputs["GeoNameFinderOutput"].S3Output.S3Uri,
            destination=INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/output/search_index.pickle",
            destination=MODELS_INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/data",
            destination="/opt/ml/processing/data",
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="CropsSearchOutput", 
            source=OUTPUTS_DIR,
            destination="s3://test-sagemaker-pipeline-on-cosai/output",
        ),
    ],
    code="HavosAi/src/sagemaker_pipeline/CropsSearchStep.py",
    cache_config = CacheConfig(enable_caching=True, expire_after="1y")
)

### PopulationTagsFinder

In [23]:
step_popul_tags_finder = ProcessingStep(
    name="PopulationTagsFinder",
    processor=script_processor,
    inputs=[
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/config.json",
            destination=CONFIG_INPUTS_DIR,
        ),
        ProcessingInput(
            source=step_crops_search.properties.ProcessingOutputConfig.Outputs["CropsSearchOutput"].S3Output.S3Uri,
            destination=INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/output/search_index.pickle",
            destination=MODELS_INPUTS_DIR,
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="PopulationTagsFinderOutput", 
            source=OUTPUTS_DIR,
            destination="s3://test-sagemaker-pipeline-on-cosai/output",
        ),
    ],
    code="HavosAi/src/sagemaker_pipeline/PopulationTagsFinderStep.py",
    cache_config = CacheConfig(enable_caching=True, expire_after="1y")
)

### ColumnFiller

In [24]:
step_column_filler = ProcessingStep(
    name="ColumnFiller",
    processor=script_processor,
    inputs=[
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/config.json",
            destination=CONFIG_INPUTS_DIR,
        ),
        ProcessingInput(
            source=step_popul_tags_finder.properties.ProcessingOutputConfig.Outputs["PopulationTagsFinderOutput"].S3Output.S3Uri,
            destination=INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/output/search_index.pickle",
            destination=SEARCH_INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/output/abbreviation_resolver.pickle",
            destination=ABBREV_INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/data/population_tags.xlsx",
            destination="/opt/ml/processing/data",
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="ColumnFillerOutput", 
            source=OUTPUTS_DIR,
            destination="s3://test-sagemaker-pipeline-on-cosai/output",
        ),
    ],
    code="HavosAi/src/sagemaker_pipeline/ColumnFillerStep.py",
    cache_config = CacheConfig(enable_caching=True, expire_after="1y")
)

In [25]:
## Interventions Step

### ProgramExtractor

In [26]:
step_program_extractor = ProcessingStep(
    name="ProgramExtractor",
    processor=script_processor,
    inputs=[
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/config.json",
            destination=CONFIG_INPUTS_DIR,
        ),
        ProcessingInput(
            source=step_column_filler.properties.ProcessingOutputConfig.Outputs["ColumnFillerOutput"].S3Output.S3Uri,
            destination=INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/output/search_index.pickle",
            destination=SEARCH_INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/output/abbreviation_resolver.pickle",
            destination=ABBREV_INPUTS_DIR,
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/data/extracted_programs.xlsx",
            destination="/opt/ml/processing/data",
        ),
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/model/programs_extraction_model_2619",
            destination="/opt/ml/processing/tmp/programs_extraction_model_2619",
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="ProgramExtractorOutput", 
            source=OUTPUTS_DIR,
            destination="s3://test-sagemaker-pipeline-on-cosai/output",
        ),
    ],
    code="HavosAi/src/sagemaker_pipeline/ProgramExtractorStep.py",
    cache_config = CacheConfig(enable_caching=True, expire_after="1y")
)

### OutcomesFinder

In [27]:


step_outcomes = ProcessingStep(
    name="OutcomesFinder",
    processor=script_processor,
    inputs=[
        ProcessingInput(
            source="s3://test-sagemaker-pipeline-on-cosai/config.json",
            destination=CONFIG_INPUTS_DIR,
        ),
        ProcessingInput(
            source=step_program_extractor.properties.ProcessingOutputConfig.Outputs["ProgramExtractorOutput"].S3Output.S3Uri,
            destination=INPUTS_DIR,
        ),
        ProcessingInput(
            source=outcomes_bert,
            destination=OUTCOMES_MODEL,
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="OutcomesFinderOutput", 
            source=OUTPUTS_DIR,
            destination="s3://test-sagemaker-pipeline-on-cosai/output"
        ),
    ],
    code="HavosAi/src/sagemaker_pipeline/OutcomesFinderStep.py",   
    cache_config = CacheConfig(enable_caching=True, expire_after="1y")
)

## Define a Pipeline of Parameters, Steps, and Conditions
In this section, combine the steps into a Pipeline so it can be executed.

A pipeline requires a name, parameters, and steps. Names must be unique within an (account, region) pair.

Note:

- All the parameters used in the definitions must be present.
- Steps passed into the pipeline do not have to be listed in the order of execution. The SageMaker Pipeline service resolves the data dependency DAG to determine the order in which steps run.
- Step names must be unique across the pipeline step list and all condition step if/else lists.
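
To make the ordering point concrete, the sketch below resolves an execution order from declared dependencies using the standard library's `graphlib`. It only illustrates the idea of DAG resolution and says nothing about how the SageMaker service implements it. The `dependencies` mapping is a simplified, hand-written subset of this notebook's steps.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps whose outputs it consumes.
# Deliberately listed out of order; intermediate steps are omitted for brevity.
dependencies = {
    "OutcomesFinder": {"ProgramExtractor"},
    "SearchIndex": {"AbbreviationsResolver"},
    "AbbreviationsResolver": set(),
    "ProgramExtractor": {"ColumnFiller"},
    "ColumnFiller": {"SearchIndex"},
}

# static_order() yields every step after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```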

In [36]:
from sagemaker.workflow.pipeline import Pipeline


pipeline_name = "HavosAIPipeline"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_type,
        processing_instance_count,
        input_data,
        outcomes_bert
    ],
    steps=[
        step_abbr_resolver,
        step_search_index,
        step_adv_normalizer,
        step_keywords_norm,
        step_journal_normalizer,
        step_author_affill_proc,
        step_geo_name_finder,
        step_crops_search,
        step_popul_tags_finder,
        step_column_filler,
        step_program_extractor,
        step_outcomes,
    ],
)

### Check generated pipeline definition json

In [37]:
import json


definition = json.loads(pipeline.definition())
definition


{'Version': '2020-12-01',
 'Metadata': {},
 'Parameters': [{'Name': 'ProcessingInstanceType',
   'Type': 'String',
   'DefaultValue': 'ml.t3.medium'},
  {'Name': 'ProcessingInstanceCount', 'Type': 'Integer', 'DefaultValue': 1},
  {'Name': 'InputData',
   'Type': 'String',
   'DefaultValue': 's3://test-sagemaker-pipeline-on-cosai/subfolder-0-reduced.csv'},
  {'Name': 'OutcomesBert',
   'Type': 'String',
   'DefaultValue': 's3://test-sagemaker-pipeline-on-cosai/bert_exp_outcome_sentences_new_multilabel_15epoch_1300_mixed_0.7/'}],
 'PipelineExperimentConfig': {'ExperimentName': {'Get': 'Execution.PipelineName'},
  'TrialName': {'Get': 'Execution.PipelineExecutionId'}},
 'Steps': [{'Name': 'AbbreviationsResolver',
   'Type': 'Processing',
   'Arguments': {'ProcessingResources': {'ClusterConfig': {'InstanceType': {'Get': 'Parameters.ProcessingInstanceType'},
      'InstanceCount': {'Get': 'Parameters.ProcessingInstanceCount'},
      'VolumeSizeInGB': 30}},
    'AppSpecification': {'ImageUri
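
The full definition is long, so it can help to pull out only the parts of interest. As a minimal sketch, the function below walks a definition dictionary and collects every `{'Get': ...}` reference, which shows which pipeline parameters each step depends on. `sample_definition` is a trimmed, hand-written sample shaped like the output above, and `find_references` is a hypothetical helper; in the notebook one would pass the real `definition` instead.

```python
# Trimmed, hand-written sample shaped like the definition printed above.
sample_definition = {
    "Version": "2020-12-01",
    "Parameters": [
        {"Name": "ProcessingInstanceType", "Type": "String", "DefaultValue": "ml.t3.medium"},
        {"Name": "ProcessingInstanceCount", "Type": "Integer", "DefaultValue": 1},
    ],
    "Steps": [
        {
            "Name": "AbbreviationsResolver",
            "Type": "Processing",
            "Arguments": {
                "ProcessingResources": {
                    "ClusterConfig": {
                        "InstanceType": {"Get": "Parameters.ProcessingInstanceType"},
                        "InstanceCount": {"Get": "Parameters.ProcessingInstanceCount"},
                        "VolumeSizeInGB": 30,
                    }
                }
            },
        }
    ],
}


def find_references(node):
    """Collect every {'Get': ...} target found anywhere under node."""
    found = []
    if isinstance(node, dict):
        if set(node) == {"Get"}:
            found.append(node["Get"])
        else:
            for value in node.values():
                found.extend(find_references(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_references(item))
    return found


for step in sample_definition["Steps"]:
    print(step["Name"], sorted(find_references(step["Arguments"])))
```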

### Register pipeline in AWS

In [38]:
# register pipeline
pipeline.upsert(role_arn=role)

{'PipelineArn': 'arn:aws:sagemaker:us-west-2:089541407911:pipeline/havosaipipeline',
 'ResponseMetadata': {'RequestId': '2dee331a-99b0-4702-b090-03dd5a6f4163',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '2dee331a-99b0-4702-b090-03dd5a6f4163',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '83',
   'date': 'Fri, 15 Apr 2022 07:22:03 GMT'},
  'RetryAttempts': 0}}

### Download full repo

In [None]:
!conda install -y -c conda-forge zip

Collecting package metadata (current_repodata.json): done
Solving environment: - 
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - conda-forge/linux-64::graphviz==2.47.3=h85b4f2f_0
  - conda-forge/linux-64::matplotlib==3.3.4=py36h5fab9bb_0
  - pytorch/linux-64::torchvision==0.5.0=py36_cu101
  - conda-forge/linux-64::google-api-core==1.14.3=py36_0
  - conda-forge/noarch::ipywidgets==7.6.3=pyhd3deb0d_0
  - conda-forge/noarch::google-cloud-storage==1.20.0=py_0
  - conda-forge/noarch::pathy==0.5.2=pyhd8ed1ab_0
  - conda-forge/linux-64::widgetsnbextension==3.5.1=py36h9f0ad1d_4
  - conda-forge/linux-64::nbconvert==6.0.7=py36h5fab9bb_3
  - conda-forge/linux-64::scikit-image==0.17.2=py36hd87012b_4
  - conda-forge/linux-64::nb_conda==2.2.1=py36h9f0ad1d_4
  - conda-forge/noarch::sphinx==4.0.2=pyh6c4a22f_1
  - conda-forge/noarch::typer==0.3.1=py_0
  - conda-forge/linux-64::matplotlib-base==3.3.4=py36hd391965_0
  - 

In [None]:
!zip -r -X folder.zip ./