# Managed Data Processing with SageMaker Processing in Python

<img align="left" width="130" src="https://raw.githubusercontent.com/PacktPublishing/Amazon-SageMaker-Cookbook/master/Extra/cover-small-padded.png"/>

This notebook contains the code to help readers work through one of the recipes of the book [Machine Learning with Amazon SageMaker Cookbook: 80 proven recipes for data scientists and developers to perform ML experiments and deployments](https://www.amazon.com/Machine-Learning-Amazon-SageMaker-Cookbook/dp/1800567030).

### How to do it...

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor

In [None]:
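# IAM role that the processing job assumes (resolved from the SageMaker notebook environment)
role = get_execution_role()

# Processor backed by the prebuilt scikit-learn container, running on one ml.m5.large instance
sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
                                     role=role,
                                     instance_count=1,
                                     instance_type='ml.m5.large')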

In [None]:
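from sagemaker.processing import ProcessingInput, ProcessingOutput

# Local file: the SDK uploads it to S3, and the job copies it into the container directory below
source = 'tmp/dataset.processing.csv'
pinput1 = ProcessingInput(
    source=source,
    destination='/opt/ml/processing/input')

# Files written to this container directory are uploaded to S3 when the job finishes
poutput1 = ProcessingOutput(source='/opt/ml/processing/output')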

In [None]:
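# Run processing.py in the container; `arguments` are passed to the script on its command line.
# By default, run() waits for the job to finish and streams its logs.
sklearn_processor.run(
    code='processing.py',
    arguments=['--sample-argument', '3'],
    inputs=[pinput1],
    outputs=[poutput1]
)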
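The `run()` call above executes `processing.py`, which is not reproduced in this notebook. The script below is a hypothetical minimal sketch, not the recipe's actual script: it parses `--sample-argument`, reads every CSV that `ProcessingInput` copies into `/opt/ml/processing/input`, and writes `output.csv` into `/opt/ml/processing/output`, which is the file the later `aws s3 cp` cell downloads. The pass-through copy and the `--input-dir`/`--output-dir` options are assumptions added so the script can also be exercised locally.

```python
# processing.py -- hypothetical sketch, not the recipe's actual script
import argparse
import csv
import os


def process(input_dir, output_dir, sample_argument):
    """Copy the rows of every CSV in input_dir into output_dir/output.csv.

    The pass-through copy is a placeholder for a real transformation.
    """
    os.makedirs(output_dir, exist_ok=True)
    rows = []
    for name in sorted(os.listdir(input_dir)):
        if name.endswith('.csv'):
            with open(os.path.join(input_dir, name), newline='') as f:
                rows.extend(csv.reader(f))
    output_path = os.path.join(output_dir, 'output.csv')
    with open(output_path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
    print(f'--sample-argument={sample_argument}; wrote {len(rows)} rows to {output_path}')
    return output_path


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--sample-argument', type=int, default=1)
    # Defaults match the container paths configured in ProcessingInput/ProcessingOutput
    parser.add_argument('--input-dir', default='/opt/ml/processing/input')
    parser.add_argument('--output-dir', default='/opt/ml/processing/output')
    # parse_known_args ignores unexpected extra arguments instead of exiting
    args, _ = parser.parse_known_args(argv)
    if not os.path.isdir(args.input_dir):
        print(f'{args.input_dir} not found; nothing to process')
        return None
    return process(args.input_dir, args.output_dir, args.sample_argument)


if __name__ == '__main__':
    main()
```

In the notebook, a script like this could be saved with the `%%writefile processing.py` cell magic before the `run()` cell executes. Keeping the logic in `process()` makes it easy to test against local directories before paying for a processing instance.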

In [None]:
sklearn_processor.__dict__

In [None]:
sklearn_processor.latest_job.__dict__
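Dumping `__dict__` shows every attribute of the job object. For a more targeted check, `ProcessingJob.describe()` returns the DescribeProcessingJob API response as a dictionary. The helper below is a hypothetical sketch that pulls out the job name, status, failure reason, and the S3 URIs of the outputs; the helper's name and the fields it selects are illustrative choices.

```python
def summarize_processing_job(description):
    """Extract key fields from a DescribeProcessingJob response dictionary."""
    outputs = description.get('ProcessingOutputConfig', {}).get('Outputs', [])
    return {
        'name': description.get('ProcessingJobName'),
        'status': description.get('ProcessingJobStatus'),
        # Only present when the job failed
        'failure_reason': description.get('FailureReason'),
        'output_uris': [o['S3Output']['S3Uri'] for o in outputs if 'S3Output' in o],
    }


# In the notebook (requires AWS credentials):
# summarize_processing_job(sklearn_processor.latest_job.describe())
```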

In [None]:
# S3 location where the contents of /opt/ml/processing/output were uploaded
latest_job = sklearn_processor.latest_job
destination = latest_job.outputs[0].destination
destination
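`boto3` is imported in the first cell but never used. As a sketch of a pure-Python alternative to the `aws s3 cp` shell command, the output file can be downloaded with the S3 client's `download_file(Bucket, Key, Filename)` method. The helper names `split_s3_uri` and `download_output` are hypothetical, and the sketch assumes the script wrote a file named `output.csv`.

```python
import os
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != 's3':
        raise ValueError(f'not an S3 URI: {uri}')
    return parsed.netloc, parsed.path.lstrip('/')


def download_output(destination, filename='output.csv',
                    local_path='tmp/output.processing.csv'):
    """Download <destination>/<filename> to local_path (needs AWS credentials)."""
    import boto3  # imported here so split_s3_uri works without boto3 installed

    bucket, prefix = split_s3_uri(destination)
    key = f"{prefix.rstrip('/')}/{filename}" if prefix else filename
    os.makedirs(os.path.dirname(local_path) or '.', exist_ok=True)
    boto3.client('s3').download_file(bucket, key, local_path)
    return local_path


# In the notebook: download_output(destination)
```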

In [None]:
!aws s3 cp "{destination}/output.csv" tmp/output.processing.csv

In [None]:
!cat tmp/output.processing.csv