# Prepare data for ML - Framework Container

**SageMaker Studio Kernel**: Data Science

In this exercise you will:
 - Run a preprocessing job with Amazon SageMaker Processing to prepare data for training ML models

***

## Part 1/2 - Setup
Here we import the required libraries and define some variables. You can also take a look at the scripts that were previously created for preparing the data and training the model.

In [None]:
import boto3
import logging
import sagemaker
from sagemaker.processing import FrameworkProcessor, ProcessingInput, ProcessingOutput
from sagemaker.sklearn.estimator import SKLearn

In [None]:
logging.basicConfig(level=logging.INFO)
LOGGER = logging.getLogger(__name__)

In [None]:
sagemaker_client = boto3.client("sagemaker")
s3_client = boto3.client("s3")

***

### Global configurations

Configuration variables used for Processing, Training, and registration

In [None]:
region = boto3.session.Session().region_name

# Look up the caller's AWS account ID once and reuse it below
account_id = boto3.client("sts").get_caller_identity()["Account"]

role_name = "mlops-sagemaker-execution-role"
role = "arn:aws:iam::{}:role/{}".format(account_id, role_name)

# The KMS alias is resolved in the caller's own account
kms_account_id = account_id

kms_alias = "ml-kms"

bucket_name = ""  # Set this to the name of your S3 bucket before running the cells below

In [None]:
boto_session = boto3.Session(region_name=region)

sagemaker_client = boto_session.client("sagemaker")
runtime_client = boto_session.client("sagemaker-runtime")

sagemaker_session = sagemaker.session.Session(
    boto_session=boto_session,
    sagemaker_client=sagemaker_client,
    sagemaker_runtime_client=runtime_client,
    default_bucket=bucket_name
)

In [None]:
kms_key = "arn:aws:kms:{}:{}:alias/{}".format(region, kms_account_id, kms_alias)
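The role and KMS key ARNs above follow two fixed string layouts. As a minimal sketch, the same layouts can be written as small helper functions. The names `iam_role_arn` and `kms_alias_arn` are hypothetical, the account ID and region below are placeholders, and the standard `aws` partition is assumed.

```python
def iam_role_arn(account_id: str, role_name: str) -> str:
    """Build an IAM role ARN (IAM is global, so the region field is empty)."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def kms_alias_arn(region: str, account_id: str, alias: str) -> str:
    """Build a KMS alias ARN; `alias` is given without the 'alias/' prefix."""
    return f"arn:aws:kms:{region}:{account_id}:alias/{alias}"


# Placeholder account ID and region, for illustration only
example_role = iam_role_arn("123456789012", "mlops-sagemaker-execution-role")
example_key = kms_alias_arn("eu-west-1", "123456789012", "ml-kms")
print(example_role)
print(example_key)
```

Keeping the layouts in one place makes a wrong region or account ID easier to spot before it reaches the job configuration.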

***

## Part 2/2 - Run the processing job

### Step 1/3: Create the Processing Job

#### Define input variables

In [None]:
processing_artifact_path = "artifact/processing"
processing_artifact_name = "sourcedir.tar.gz"
processing_framework_version = "0.23-1"
processing_instance_count = 1
processing_instance_type = "ml.t3.large"
processing_input_files_path = "data/input"
processing_output_files_path = "data/output"

#### Get the dataset and upload it to an S3 bucket

In [None]:
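# Remove objects left under the input prefix by a previous run.
# S3 has no real folders, so each key under the prefix is deleted individually
# (this single listing covers up to 1,000 keys).
existing = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=processing_input_files_path)
for obj in existing.get("Contents", []):
    s3_client.delete_object(Bucket=bucket_name, Key=obj["Key"])

# Upload the local dataset to the input prefix of the session's default bucket
input_data = sagemaker_session.upload_data('./../data/data.csv', key_prefix=processing_input_files_path)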

LOGGER.info(input_data)
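Because S3 stores flat keys rather than folders, clearing a prefix means deleting every object beneath it. The following is a minimal sketch of a paginated cleanup, which also handles prefixes holding more than 1,000 objects. The function name `delete_prefix` is hypothetical; it expects a boto3 S3 client such as the `s3_client` created earlier.

```python
def delete_prefix(s3_client, bucket: str, prefix: str) -> int:
    """Delete every object whose key starts with `prefix`; return how many were deleted."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # A page holds at most 1,000 keys, which is also the
            # maximum accepted by a single delete_objects call.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted
```

In this notebook it could be called as `delete_prefix(s3_client, bucket_name, processing_input_files_path + "/")`. The trailing slash keeps the match from also covering sibling prefixes such as `data/input2`.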

#### Compress source code for installing additional Python modules

In [None]:
! pygmentize ./../algorithms/processing/src/processing.py

In [None]:
! ./../algorithms/buildspec.sh processing $bucket_name

#### Create Processor

In [None]:
processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version=processing_framework_version,
    role=role,
    instance_count=processing_instance_count,
    instance_type=processing_instance_type,
    output_kms_key=kms_key,
    sagemaker_session=sagemaker_session
)

In [None]:
run_args = processor.get_run_args(
    "processing.py",
    # Source archive in S3 that holds the entry point script
    source_dir="s3://{}/{}/{}".format(
        bucket_name, processing_artifact_path, processing_artifact_name
    ),
    inputs=[
        # Dataset uploaded earlier, made available at this path inside the container
        ProcessingInput(
            input_name="input",
            source="s3://{}/{}".format(bucket_name, processing_input_files_path),
            destination="/opt/ml/processing/input"
        )
    ],
    outputs=[
        # Files written to this container path are uploaded to the S3 destination
        ProcessingOutput(
            output_name="output",
            source="/opt/ml/processing/output",
            destination="s3://{}/{}".format(bucket_name, processing_output_files_path)
        )
    ]
)
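The S3 locations passed to `get_run_args` are assembled with `str.format`, so an empty `bucket_name` silently yields a malformed URI such as `s3:///data/input`. A small helper can catch that early. This is only a sketch, and `s3_uri` is a hypothetical name, not part of boto3 or the SageMaker SDK.

```python
def s3_uri(bucket: str, *parts: str) -> str:
    """Join a bucket name and key segments into an s3:// URI."""
    if not bucket:
        raise ValueError("bucket must be a non-empty string")
    # Drop empty segments and stray slashes so the segments join cleanly
    segments = [p.strip("/") for p in parts if p and p.strip("/")]
    key = "/".join(segments)
    return f"s3://{bucket}/{key}" if key else f"s3://{bucket}"


# "example-bucket" is a placeholder bucket name
print(s3_uri("example-bucket", "artifact/processing", "sourcedir.tar.gz"))
```

With such a helper, the input location would read `s3_uri(bucket_name, processing_input_files_path)`, and a missing bucket name fails immediately instead of at job submission.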

In [None]:
processor.run(
    code=run_args.code,
    inputs=run_args.inputs,
    outputs=run_args.outputs,
    wait=True
)
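Once the job has finished, its description lists where each output was written. The sketch below pulls the output URIs out of a `DescribeProcessingJob` response. The function name `processing_output_uris` is hypothetical, and the sample dictionary is a trimmed, hand-written stand-in with a placeholder bucket, not real job output.

```python
def processing_output_uris(description: dict) -> dict:
    """Map each output name to its S3 URI from a DescribeProcessingJob response."""
    outputs = description.get("ProcessingOutputConfig", {}).get("Outputs", [])
    return {o["OutputName"]: o["S3Output"]["S3Uri"] for o in outputs if "S3Output" in o}


# Trimmed example response, for illustration only
sample_description = {
    "ProcessingJobStatus": "Completed",
    "ProcessingOutputConfig": {
        "Outputs": [
            {
                "OutputName": "output",
                "S3Output": {
                    "S3Uri": "s3://example-bucket/data/output",
                    "LocalPath": "/opt/ml/processing/output",
                    "S3UploadMode": "EndOfJob",
                },
            }
        ]
    },
}
uris = processing_output_uris(sample_description)
print(uris)
```

In the notebook, the real description should be available from `processor.latest_job.describe()` after `processor.run(...)` returns, and its `ProcessingJobStatus` field reports whether the job completed.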

We have just seen how to prepare data with an Amazon SageMaker Processing Job. If you want to provide a custom Python script for training an ML model using a SageMaker Framework Container, you can run the following lab.

 > [Train-Build-Model-Framework-Container](./05-Train-Build-Model-Framework-Container.ipynb)

If you want to create a Custom Framework Container and provide a custom Python script for training an ML model, you can run the following lab (optional).

 > [Train-Custom-Script-Container](./06-Train-Build-Model-Custom-Script-Container.ipynb)

If you want to create a Custom Container for training an ML model, you can run the following lab (optional).

 > [Train-Custom-Container](./07-Train-Build-Model-Custom-Container.ipynb)

If you want to explore Amazon SageMaker Feature Store, you can run the following lab (optional).

 > [Store-Features](./04-Store-Features.ipynb)

If you want to create a Custom Framework Container and provide a custom Python script as input for the Processing Job, you can run the following lab (optional).

 > [Prepare-Data-ML-Custom-Script-Container](./02-Prepare-Data-ML-Custom-Script-Container.ipynb)

If you want to create a Custom Container for the Processing Job, you can run the following lab (optional).

 > [Prepare-Data-ML-Custom-Container](./03-Prepare-Data-ML-Custom-Container.ipynb)