# Regression with Amazon SageMaker Autopilot (Parquet input)

My own copy of the example, so I can walk through the code myself.

- [Link to source](https://sagemaker-examples.readthedocs.io/en/latest/autopilot/sagemaker_autopilot_abalone_parquet_input.html)
---

## SageMaker Configuration

In [2]:
!pip install --upgrade boto3 --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sagemaker 2.145.0 requires importlib-metadata<5.0,>=1.4.0, but you have importlib-metadata 6.3.0 which is incompatible.
sagemaker 2.145.0 requires PyYAML==5.4.1, but you have pyyaml 6.0 which is incompatible.
awscli 1.27.111 requires botocore==1.29.111, but you have botocore 1.29.139 which is incompatible.
awscli 1.27.111 requires PyYAML<5.5,>=3.10, but you have pyyaml 6.0 which is incompatible.
awscli 1.27.111 requires rsa<4.8,>=3.1.2, but you have rsa 4.9 which is incompatible.
aiobotocore 2.4.2 requires botocore<1.27.60,>=1.27.59, but you have botocore 1.29.139 which is incompatible.

[notice] A new release of pip is available: 23.0.1 -> 23.1.2
[notice] To update, run:

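The resolver warnings above name several packages whose pinned versions now conflict. A minimal sketch for checking which versions actually ended up installed, using only the standard library; `installed_version` is a hypothetical helper name, not part of pip or boto3:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages mentioned in the resolver warnings above.
for name in ("boto3", "botocore", "sagemaker", "awscli", "PyYAML"):
    print(f"{name}: {installed_version(name)}")
```

This only reports versions; it does not resolve the conflicts.
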
In [7]:
import os
import boto3
import re
import sagemaker
from sagemaker import get_execution_role

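role = get_execution_role()  # IAM role of this notebook; passed to Autopilot as RoleArn
region = boto3.Session().region_name
sess = sagemaker.Session()

# S3 location used for the Parquet input and the Autopilot output.
bucket = "test-sagemaker-examples-1357942113492"
prefix = "DEMO-AutoML-Parquet"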



In [8]:
%%time

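import numpy as np
import pandas as pd
import pyarrow  # Parquet engine used by DataFrame.to_parquet below

# Download the public Abalone CSV from the SageMaker sample-files bucket.
s3 = boto3.client("s3")
FILE_NAME = "abalone.csv"
s3.download_file("sagemaker-sample-files", "datasets/tabular/uci_abalone/abalone.csv", FILE_NAME)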

feature_names = [
    "Sex",
    "Length",
    "Diameter",
    "Height",
    "Whole weight",
    "Shucked weight",
    "Viscera weight",
    "Shell weight",
    "Rings",
]

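# The CSV has no header row, so the column names are supplied explicitly.
data = pd.read_csv(FILE_NAME, header=None, names=feature_names)

# Write the same table as Parquet; this is the file Autopilot will read.
data.to_parquet("abalone.parquet")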

CPU times: user 58 ms, sys: 0 ns, total: 58 ms
Wall time: 739 ms
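Before uploading, it can be useful to confirm that the file written above really is Parquet. The sketch below relies on the fact that a Parquet file begins and ends with the 4-byte marker `PAR1`. It checks only that framing, not the schema or the data, and `looks_like_parquet` is a hypothetical helper name:

```python
import os
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at both ends of a Parquet file


def looks_like_parquet(path) -> bool:
    """Cheap framing check: True if the file starts and ends with b"PAR1"."""
    path = Path(path)
    if path.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with path.open("rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


# Only runs if the file from the cell above is present in the working directory.
if Path("abalone.parquet").exists():
    print("abalone.parquet looks like Parquet:", looks_like_parquet("abalone.parquet"))
```
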


In [9]:
%%time
sess.upload_data("abalone.parquet", bucket=bucket, key_prefix=prefix)

CPU times: user 165 ms, sys: 9.35 ms, total: 175 ms
Wall time: 355 ms


's3://test-sagemaker-examples-1357942113492/DEMO-AutoML-Parquet/abalone.parquet'
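`sess.upload_data` returns the S3 URI shown above, so the value could be stored and reused later rather than rebuilt from `bucket` and `prefix`. A minimal sketch of splitting such a URI back into bucket and key with the standard library; `split_s3_uri` is a hypothetical helper, not a SageMaker SDK function:

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# The URI returned by the upload cell above.
uploaded_uri = "s3://test-sagemaker-examples-1357942113492/DEMO-AutoML-Parquet/abalone.parquet"
bucket_name, key = split_s3_uri(uploaded_uri)
print(bucket_name)
print(key)
```
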

In [11]:
%%time
import time
from time import gmtime, strftime

job_name = "autopilot-parquet-" + strftime("%m-%d-%H-%M", gmtime())
print("AutoML job:", job_name)

create_auto_ml_job_params = {
    "AutoMLJobConfig": {
        "CompletionCriteria": {
            "MaxCandidates": 50,
        }
    },
    "AutoMLJobName": job_name,
    "InputDataConfig": [
        {
            "ContentType": "x-application/vnd.amazon+parquet",
            "CompressionType": "None",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": f"s3://{bucket}/{prefix}/abalone.parquet",
                }
            },
            "TargetAttributeName": "Rings",
        }
    ],
    "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/{prefix}/output"},
    "RoleArn": role,
}

client = boto3.client("sagemaker", region_name=region)
client.create_auto_ml_job(**create_auto_ml_job_params)

response = client.describe_auto_ml_job(AutoMLJobName=job_name)
status = response["AutoMLJobStatus"]
secondary_status = response["AutoMLJobSecondaryStatus"]
print(f"{status} - {secondary_status}")

# Poll once a minute until the job reaches a terminal state.
while status not in ("Completed", "Failed", "Stopped"):
    time.sleep(60)
    response = client.describe_auto_ml_job(AutoMLJobName=job_name)
    status = response["AutoMLJobStatus"]
    secondary_status = response["AutoMLJobSecondaryStatus"]
    print(f"{status} - {secondary_status}")

AutoML job: autopilot-parquet-05-24-12-05
InProgress - Starting
InProgress - AnalyzingData
InProgress - AnalyzingData
InProgress - AnalyzingData
InProgress - AnalyzingData
InProgress - AnalyzingData
InProgress - AnalyzingData
InProgress - AnalyzingData
InProgress - AnalyzingData
InProgress - FeatureEngineering
InProgress - FeatureEngineering
InProgress - FeatureEngineering
InProgress - FeatureEngineering
InProgress - FeatureEngineering
InProgress - FeatureEngineering
InProgress - FeatureEngineering
InProgress - FeatureEngineering
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - ModelTuning
InProgress - M
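The polling loop above can also be wrapped in a function that stops on any terminal status and gives up after a deadline. This is a sketch under assumptions: `wait_for_automl_job` and its parameters are hypothetical names, the 4-hour default timeout is arbitrary, and `describe` is expected to behave like `client.describe_auto_ml_job`:

```python
import time

# Statuses after which AutoMLJobStatus no longer changes.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_automl_job(describe, job_name, poll_seconds=60,
                        timeout_seconds=4 * 3600,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll `describe` until the job is terminal; return the last response.

    `sleep` and `clock` are injectable so the loop can be exercised
    without waiting in real time.
    """
    deadline = clock() + timeout_seconds
    while True:
        response = describe(AutoMLJobName=job_name)
        status = response["AutoMLJobStatus"]
        print(f"{status} - {response['AutoMLJobSecondaryStatus']}")
        if status in TERMINAL_STATUSES:
            return response
        if clock() >= deadline:
            raise TimeoutError(f"{job_name} still {status} after {timeout_seconds}s")
        sleep(poll_seconds)


# Usage in the notebook (needs AWS credentials, so left commented out):
# response = wait_for_automl_job(client.describe_auto_ml_job, job_name)
# if response["AutoMLJobStatus"] != "Completed":
#     print(response.get("FailureReason"))
```

Passing the describe call in as an argument keeps the function independent of boto3, which makes it easy to try with a stub.
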

---

## Cleanup

In [13]:
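# Delete every object under the demo prefix, including the Autopilot output.
# A separate name avoids rebinding `s3`, which held the S3 client earlier.
s3_resource = boto3.resource("s3")
bucket_s3 = s3_resource.Bucket(bucket)

bucket_s3.objects.filter(Prefix=prefix).delete()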

[{'ResponseMetadata': {'RequestId': 'PG87HQKQ3HE3ZVMF',
   'HostId': 'ch1MoqaKO1od2mdR5VhTICljXi1J95m7CNsQ6KFbne3xc9kA4ozahLdZAulP7M77+BEQH7b10UA=',
   'HTTPStatusCode': 200,
   'HTTPHeaders': {'x-amz-id-2': 'ch1MoqaKO1od2mdR5VhTICljXi1J95m7CNsQ6KFbne3xc9kA4ozahLdZAulP7M77+BEQH7b10UA=',
    'x-amz-request-id': 'PG87HQKQ3HE3ZVMF',
    'date': 'Wed, 24 May 2023 12:53:06 GMT',
    'content-type': 'application/xml',
    'transfer-encoding': 'chunked',
    'server': 'AmazonS3',
    'connection': 'close'},
   'RetryAttempts': 0},
  'Deleted': [{'Key': 'DEMO-AutoML-Parquet/output/autopilot-parquet-05-24-12-05/transformed-data/dpp2/rpb/train/chunk_36.csv.out'},
   {'Key': 'DEMO-AutoML-Parquet/output/autopilot-parquet-05-24-12-05/transformed-data/dpp2/rpb/validation/chunk_0.csv.out'},
   {'Key': 'DEMO-AutoML-Parquet/output/autopilot-parquet-05-24-12-05/transformed-data/dpp2/rpb/train/chunk_19.csv.out'},
   {'Key': 'DEMO-AutoML-Parquet/output/autopilot-parquet-05-24-12-05/transformed-data/dpp3/r
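
The delete call returns one response dictionary per batch, each with a `Deleted` list like the one printed above and, if something went wrong, an `Errors` list. A minimal sketch that condenses that output into a count; `summarize_delete` is a hypothetical helper, and the sample data below is made up for illustration rather than taken from the real run:

```python
def summarize_delete(responses):
    """Return (number of deleted keys, list of error entries) across all batches."""
    deleted = sum(len(r.get("Deleted", [])) for r in responses)
    errors = [e for r in responses for e in r.get("Errors", [])]
    return deleted, errors


# Illustrative input shaped like the output above (keys are placeholders).
sample = [
    {"Deleted": [{"Key": "demo/a.csv.out"}, {"Key": "demo/b.csv.out"}]},
    {"Deleted": [{"Key": "demo/c.csv.out"}], "Errors": []},
]
count, errors = summarize_delete(sample)
print(f"deleted {count} objects, {len(errors)} errors")

# In the notebook (needs AWS credentials, so left commented out):
# count, errors = summarize_delete(bucket_s3.objects.filter(Prefix=prefix).delete())
```
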