# Getting Started With Machine Learning Using Amazon SageMaker

## How it works…

#### The SageMaker Python SDK is a Python library that abstracts the lower-level API operations exposed by Boto3, the AWS SDK for Python, so that data scientists and ML practitioners can focus on running ML experiments. It is organized around abstraction layers and concepts such as models, estimators, and predictors, with fit() and deploy() functions that follow the style of libraries and frameworks such as Keras and scikit-learn.
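
#### To make the estimator, fit(), deploy(), and predictor flow concrete, here is a minimal pure-Python sketch. The ToyEstimator and ToyPredictor classes are hypothetical stand-ins invented for this illustration; they are not part of the SageMaker Python SDK and do not launch any AWS resources. They only mirror the shape of the workflow: configure an estimator, fit it on data, then deploy it to obtain an object that serves predictions.

```python
# Toy illustration of the estimator -> fit() -> deploy() -> predictor flow.
# These classes are hypothetical and are NOT part of the SageMaker Python SDK.
from statistics import mean


class ToyPredictor:
    """Stands in for a deployed endpoint that serves predictions."""

    def __init__(self, slope, intercept):
        self.slope = slope
        self.intercept = intercept

    def predict(self, x):
        return self.slope * x + self.intercept


class ToyEstimator:
    """Stands in for an estimator: learns parameters from training data."""

    def __init__(self):
        self.slope = None
        self.intercept = None

    def fit(self, xs, ys):
        # Ordinary least squares for a single feature.
        x_bar, y_bar = mean(xs), mean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = y_bar - self.slope * x_bar
        return self

    def deploy(self):
        if self.slope is None:
            raise RuntimeError("call fit() before deploy()")
        return ToyPredictor(self.slope, self.intercept)


# Training data generated from y = 2x + 1.
toy_predictor = ToyEstimator().fit([0, 1, 2, 3], [1, 3, 5, 7]).deploy()
prediction = toy_predictor.predict(4)
```

#### In the real SDK, fit() starts a managed training job and deploy() creates an inference endpoint, so both take additional configuration that this toy version leaves out.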

In [None]:
from sklearn.model_selection import train_test_split

X = df_all_data['management_experience_months'].values
y = df_all_data['monthly_salary'].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

In [None]:
import pandas as pd

# The target variable comes first, followed by the predictor column.
df_training_data = pd.DataFrame({
    'monthly_salary': y_train,
    'management_experience_months': X_train
})

#### This step is important as several algorithms, such as the Linear Learner built-in algorithm, expect the first column to contain the target variable data.
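
#### The column-ordering requirement can be sketched as follows. This is an illustration under stated assumptions, not the recipe's own upload step: the sample values and the df_example name are made up, and the sketch assumes the training data is supplied as CSV with no header row, which is the layout SageMaker documents for CSV input to its built-in algorithms.

```python
import io

import pandas as pd

# Hypothetical sample values standing in for the real training split.
df_example = pd.DataFrame({
    'management_experience_months': [12, 36, 60],
    'monthly_salary': [1500, 2100, 3000],
})

# Move the target column to the front, whatever the original order was.
target = 'monthly_salary'
ordered_columns = [target] + [c for c in df_example.columns if c != target]
df_example = df_example[ordered_columns]

# Serialize without the header row or index so that the first CSV field
# on every line is the target value.
buffer = io.StringIO()
df_example.to_csv(buffer, header=False, index=False)
csv_text = buffer.getvalue()
```

#### Reordering by column name, rather than relying on the order in which the DataFrame was built, keeps the target in the first column even if more predictor columns are added later.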

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role

role_arn = get_execution_role()
session = sagemaker.Session()
region_name = boto3.Session().region_name


#### The return values of get_execution_role() and sagemaker.Session() are used in a later step, when we initialize the Estimator object for the training job. The get_execution_role() function from the SageMaker Python SDK returns the ARN of the IAM role associated with the notebook instance.
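
#### For readers unfamiliar with role ARNs, the sketch below shows what such a value looks like and how it breaks down. The parse_role_arn() helper is hypothetical and written only for this illustration, and the ARN is a placeholder with a made-up account ID and role name; the real value comes from get_execution_role().

```python
def parse_role_arn(arn):
    """Split an IAM role ARN into its partition, account ID, and role name."""
    # General ARN layout: arn:partition:service:region:account-id:resource
    # IAM is a global service, so the region field is empty.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"not an IAM ARN: {arn!r}")
    resource = parts[5]
    if not resource.startswith("role/"):
        raise ValueError(f"not a role ARN: {arn!r}")
    return {
        "partition": parts[1],
        "account_id": parts[4],
        # The role name is the last path segment; earlier segments are the path.
        "role_name": resource.rsplit("/", 1)[-1],
    }


# Placeholder ARN for illustration only.
example_arn = "arn:aws:iam::123456789012:role/service-role/ExampleSageMakerRole"
role_info = parse_role_arn(example_arn)
```

#### A check like this can catch a value that is not a role ARN at all (for example, a user ARN) before it is passed on to a training job.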

In [None]:
from sagemaker.inputs import TrainingInput

training_s3_input_location = f"s3://{s3_bucket}/{prefix}/input/training_data.csv"
training_s3_output_location = f"s3://{s3_bucket}/{prefix}/output/"

train = TrainingInput(training_s3_input_location, content_type="text/csv")
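
#### As a quick sanity check on how these S3 locations are composed, the following sketch builds the same two URIs from placeholder values and splits them back into bucket and key. The example_bucket and example_prefix values and the split_s3_uri() helper are assumptions made for illustration; the recipe's actual s3_bucket and prefix are defined elsewhere.

```python
from urllib.parse import urlparse

# Hypothetical values; the recipe defines s3_bucket and prefix elsewhere.
example_bucket = "example-ml-bucket"
example_prefix = "chapter01"

input_uri = f"s3://{example_bucket}/{example_prefix}/input/training_data.csv"
output_uri = f"s3://{example_bucket}/{example_prefix}/output/"


def split_s3_uri(uri):
    """Return (bucket, key) for an s3:// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


input_bucket, input_key = split_s3_uri(input_uri)
output_bucket, output_key = split_s3_uri(output_uri)
```

#### The input location points at a single object, while the output location is a key prefix ending in a slash under which the training job writes its results.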

#### Prepare the image URI for Linear Learner. The retrieve() function returns the Amazon ECR URI of the container image for the Linear Learner built-in algorithm. Note that the URI differs from region to region, and the experiments in this recipe assume that all resources are in a single region. If they are not, you will encounter issues during your training jobs. To avoid these issues, specify the region name explicitly when using and configuring the different tools:

In [None]:
from sagemaker.image_uris import retrieve

container = retrieve("linear-learner", region=region_name, version="1")
container
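
#### Because the region is embedded in the image URI, a mismatch can be detected before a training job is started. The sketch below is one way to do that, assuming the standard Amazon ECR URI layout of account, region, repository, and tag. The region_of_image_uri() helper is hypothetical, and the example URI uses a made-up account ID; the real URI is whatever retrieve() returns.

```python
import re

# Standard ECR layout: <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_URI_PATTERN = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com"
    r"(?:\.cn)?/(?P<repository>[^:]+):(?P<tag>.+)$"
)


def region_of_image_uri(image_uri):
    """Extract the AWS region embedded in an ECR image URI."""
    match = ECR_URI_PATTERN.match(image_uri)
    if match is None:
        raise ValueError(f"unrecognized ECR image URI: {image_uri!r}")
    return match.group("region")


# Placeholder URI with a made-up account ID, for illustration only.
example_uri = "123456789012.dkr.ecr.us-west-2.amazonaws.com/linear-learner:1"
example_region = "us-west-2"
regions_match = region_of_image_uri(example_uri) == example_region
```

#### In the notebook, the same comparison could be made between the container value and region_name.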

#### Initialize the Estimator object. The Estimator class accepts several arguments, including the container URI, the SageMaker session object, and the role ARN obtained in the previous steps of this recipe. In the following code, we also specify the instance_count, instance_type, and output_path arguments:

In [None]:
estimator = sagemaker.estimator.Estimator(
    container,
    role_arn,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=training_s3_output_location,
    sagemaker_session=session)

#### When running training jobs, SageMaker launches new instances outside of the Jupyter notebook instance you are using. These instances are dedicated to running the training jobs and are automatically terminated after the training jobs have completed. The number of training instances depends on the instance_count argument, and the size and type of the instances depend on the instance_type argument. With the current Estimator configuration, when the fit() function is called in a later step, SageMaker provisions a single ml.m5.xlarge instance to run the Linear Learner built-in algorithm and stores the results in output_path.