Q. what is the amazon sagemaker python sdk?<br>
A. an open-source python library for building, training, and deploying ML models on amazon sagemaker, for example from a jupyter notebook.


Q. where is the documentation for the models (built-in algorithms) you can use in sagemaker?<br>
A. aws sagemaker documentation for the k-nearest neighbors (k-NN) algorithm: [link](https://docs.aws.amazon.com/sagemaker/latest/dg/k-nearest-neighbors.html)<br>
the sample notebooks are good examples of how to use each ml model (estimator): [link for knn sample notebook](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/k_nearest_neighbors_covtype/k_nearest_neighbors_covtype.html)

In [30]:
import boto3

In [31]:
import sagemaker

In [32]:
# !pip install seaborn

In [33]:
import seaborn as sns

In [34]:
penguins = sns.load_dataset('penguins')
print(f'penguins.shape: {penguins.shape}')

penguins.shape: (344, 7)


In [35]:
penguins['species'].unique()

array(['Adelie', 'Chinstrap', 'Gentoo'], dtype=object)

In [36]:
num_map = {'Adelie':0, 'Chinstrap':1, 'Gentoo':2}

In [37]:
penguins['species'] = penguins['species'].map(num_map)

In [38]:
penguins = penguins.dropna()
print(f'penguins.shape: {penguins.shape}')

penguins.shape: (333, 7)


In [39]:
import pandas as pd  

In [40]:
penguins = pd.get_dummies(penguins,drop_first=True)
print(f'penguins.shape: {penguins.shape}') 

penguins.shape: (333, 8)
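`pd.get_dummies(..., drop_first=True)` replaces each categorical column with one indicator column per category except the first (alphabetically sorted) one. That is why the shape goes from 7 to 8 columns: `island` (Biscoe, Dream, Torgersen) becomes 2 columns, `sex` (Female, Male) becomes 1, so 7 - 2 + 3 = 8. A minimal pure-Python sketch of that encoding; the helper `one_hot_drop_first` is hypothetical, not a pandas function, and it emits 0/1 where recent pandas versions emit `False`/`True` (as in the output above):

```python
def one_hot_drop_first(prefix, values):
    """Encode category labels roughly like pd.get_dummies(drop_first=True).

    Returns (column_names, rows): each row holds 0/1 flags for every
    category except the alphabetically first one, which is dropped.
    """
    categories = sorted(set(values))
    kept = categories[1:]  # drop_first=True drops the first sorted category
    columns = [f"{prefix}_{c}" for c in kept]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return columns, rows


islands = ["Torgersen", "Biscoe", "Dream", "Biscoe"]
columns, rows = one_hot_drop_first("island", islands)
print(columns)  # ['island_Dream', 'island_Torgersen']
print(rows)     # [[0, 1], [0, 0], [1, 0], [0, 0]]
```

A row of all zeros means "the dropped category" (Biscoe here), so no information is lost by dropping it.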


In [41]:
penguins  

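|     | species | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | island_Dream | island_Torgersen | sex_Male |
|-----|---------|----------------|---------------|-------------------|-------------|--------------|------------------|----------|
| 0   | 0       | 39.1           | 18.7          | 181.0             | 3750.0      | False        | True             | True     |
| 1   | 0       | 39.5           | 17.4          | 186.0             | 3800.0      | False        | True             | False    |
| 2   | 0       | 40.3           | 18.0          | 195.0             | 3250.0      | False        | True             | False    |
| 4   | 0       | 36.7           | 19.3          | 193.0             | 3450.0      | False        | True             | False    |
| 5   | 0       | 39.3           | 20.6          | 190.0             | 3650.0      | False        | True             | True     |
| ... | ...     | ...            | ...           | ...               | ...         | ...          | ...              | ...      |
| 338 | 2       | 47.2           | 13.7          | 214.0             | 4925.0      | False        | False            | False    |
| 340 | 2       | 46.8           | 14.3          | 215.0             | 4850.0      | False        | False            | False    |
| 341 | 2       | 50.4           | 15.7          | 222.0             | 5750.0      | False        | False            | True     |
| 342 | 2       | 45.2           | 14.8          | 212.0             | 5200.0      | False        | False            | False    |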


In [44]:
train_data = penguins.sample(frac=0.8)
print(f'train_data.shape: {train_data.shape}')
train_data.head(2)

train_data.shape: (266, 8)


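|     | species | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | island_Dream | island_Torgersen | sex_Male |
|-----|---------|----------------|---------------|-------------------|-------------|--------------|------------------|----------|
| 241 | 2       | 45.1           | 14.5          | 215.0             | 5000.0      | False        | False            | False    |
| 196 | 1       | 50.9           | 17.9          | 196.0             | 3675.0      | True         | False            | False    |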


In [22]:
# train_data

In [46]:
test_data = penguins.drop(train_data.index)
print(f'test_data.shape: {test_data.shape}')
test_data.head(2)

test_data.shape: (67, 8)


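|     | species | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | island_Dream | island_Torgersen | sex_Male |
|-----|---------|----------------|---------------|-------------------|-------------|--------------|------------------|----------|
| 4   | 0       | 36.7           | 19.3          | 193.0             | 3450.0      | False        | True             | False    |
| 16  | 0       | 38.7           | 19.0          | 195.0             | 3450.0      | False        | True             | False    |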
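`penguins.sample(frac=0.8)` draws a different 80% on every run because no `random_state` is passed; `penguins.sample(frac=0.8, random_state=42)` would make the split repeatable. The test set is simply the complement, built with `drop(train_data.index)`. The same idea in plain Python, as a sketch; the helper `split_indices` and the seed value 42 are illustrative assumptions:

```python
import random


def split_indices(indices, frac=0.8, seed=42):
    """Return (train, test): a seeded random sample of indices and its complement."""
    rng = random.Random(seed)  # fixed seed -> the same split on every run
    n_train = round(frac * len(indices))
    train = rng.sample(indices, n_train)
    train_set = set(train)
    # the complement plays the role of penguins.drop(train_data.index)
    test = [i for i in indices if i not in train_set]
    return train, test


train_idx, test_idx = split_indices(list(range(333)))
print(len(train_idx), len(test_idx))  # 266 67
```

Because the test set is derived from the training indices, the two never overlap and together cover every row.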


In [47]:
# test_data

In [26]:
import boto3

In [27]:
bucket_name = 'penguins-example-bucket-course'

In [28]:
s3 = boto3.client('s3')

In [29]:
s3.create_bucket(Bucket=bucket_name)  

{'ResponseMetadata': {'RequestId': 'QNDMSCJ7P2CG5XSN',
  'HostId': 'Wm+PD1EG929R65RL7eGrcOfFY30OMpmpF5W74I12E0WfPv5e2er9p1KMRvc5fsMQ1ymI0CDzmS4=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'Wm+PD1EG929R65RL7eGrcOfFY30OMpmpF5W74I12E0WfPv5e2er9p1KMRvc5fsMQ1ymI0CDzmS4=',
   'x-amz-request-id': 'QNDMSCJ7P2CG5XSN',
   'date': 'Thu, 11 Apr 2024 21:29:38 GMT',
   'location': '/penguins-example-bucket-course',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0},
 'Location': '/penguins-example-bucket-course'}
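This call works as written only when the client's region is `us-east-1`. In any other region, S3 rejects `create_bucket` unless `CreateBucketConfiguration={'LocationConstraint': region}` is passed, and in `us-east-1` that constraint must be left out. A small sketch that builds the keyword arguments; the helper name `create_bucket_kwargs` is hypothetical:

```python
def create_bucket_kwargs(bucket_name, region):
    """Build keyword arguments for boto3's s3.create_bucket.

    us-east-1 is the default location and must not be passed as a
    LocationConstraint; every other region must be. If region is None
    (no default region configured), the constraint is omitted.
    """
    kwargs = {"Bucket": bucket_name}
    if region and region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


# usage with the client from above (not executed here):
# s3.create_bucket(**create_bucket_kwargs(bucket_name, boto3.Session().region_name))
print(create_bucket_kwargs("penguins-example-bucket-course", "eu-west-1"))
```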

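1. create "folders" inside the s3 bucket (s3 has no real folders: each prefix below is stored as an empty object whose key ends in `/`, and the s3 console displays it as a folder)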

In [30]:
prefixes = ['test/', 'train/', 'output/'] 

In [31]:
for prefix in prefixes:
    s3.put_object(Bucket=bucket_name,Key=prefix) 
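Since S3 keys are flat strings, a "folder" is only a shared key prefix. When objects are listed with `Delimiter='/'` (and no `Prefix`), S3 rolls every key up to its first `/` and reports those roll-ups as `CommonPrefixes`. A plain-Python imitation of that grouping, as a sketch; the function `common_prefixes` is hypothetical and only models the top level:

```python
def common_prefixes(keys, delimiter="/"):
    """Mimic top-level CommonPrefixes/Contents of list_objects_v2(Delimiter='/')."""
    prefixes = set()
    top_level_objects = []
    for key in keys:
        head, sep, _ = key.partition(delimiter)
        if sep:
            prefixes.add(head + delimiter)  # key sits "inside a folder"
        else:
            top_level_objects.append(key)  # key contains no delimiter
    return sorted(prefixes), top_level_objects


keys = ["test/", "train/", "output/", "train/penguins_train.csv", "test/penguins_test.csv"]
print(common_prefixes(keys))  # (['output/', 'test/', 'train/'], [])
```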

2. save train and test files locally

In [48]:
train_data.to_csv('penguins_train.csv',index=False,header=False) 

In [62]:
test_data.to_csv('penguins_test.csv',index=False,header=False)

In [63]:
train_file_name = 'penguins_train.csv'
test_file_name = 'penguins_test.csv'
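SageMaker built-in algorithms read CSV input without a header row and take the label from the first column, which is why `species` stays first and `header=False` is used. Recent pandas versions write the boolean dummy columns as `True`/`False`; writing numeric 0/1 instead (for example via `pd.get_dummies(..., dtype=int)`) avoids relying on how the algorithm parses those strings, and the prediction cell later in this notebook sends 0/1 as well. A stdlib sketch of writing rows in that layout; the helper `to_sagemaker_csv` is hypothetical:

```python
import csv
import io


def to_sagemaker_csv(rows):
    """Write rows as header-less CSV, label first, with booleans as 0/1."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label, *features in rows:
        # bool is a subclass of int, so convert it explicitly to 0/1
        cleaned = [int(v) if isinstance(v, bool) else v for v in features]
        writer.writerow([label, *cleaned])
    return buffer.getvalue()


rows = [
    [2, 45.1, 14.5, 215.0, 5000.0, False, False, False],
    [1, 50.9, 17.9, 196.0, 3675.0, True, False, False],
]
print(to_sagemaker_csv(rows), end="")
# 2,45.1,14.5,215.0,5000.0,0,0,0
# 1,50.9,17.9,196.0,3675.0,1,0,0
```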

3. upload the local csv files to the s3 bucket

In [64]:
s3.upload_file(train_file_name, bucket_name, 'train/penguins_train.csv')

In [65]:
s3.upload_file(test_file_name,bucket_name,'test/penguins_test.csv')

In [50]:
import sagemaker

4. create training data input definition for an Amazon SageMaker training job

In [67]:
s3_input_train = sagemaker.inputs.TrainingInput(s3_data='s3://penguins-example-bucket-course/train/penguins_train.csv',
                                                content_type='text/csv')

5. create the test data input definition (it is passed as a second data channel to the same training job, not to a separate test job)

In [68]:
s3_input_test = sagemaker.inputs.TrainingInput(s3_data='s3://penguins-example-bucket-course/test/penguins_test.csv',
                                                content_type='text/csv')

In [51]:
from sagemaker import get_execution_role

ImportError: cannot import name 'get_execution_role' from 'sagemaker' (/home/alin/miniconda3/envs/py12/lib/python3.12/site-packages/sagemaker/__init__.py)

In [41]:
get_execution_role()

'arn:aws:iam::472948420345:role/service-role/AmazonSageMaker-ExecutionRole-20240304T111312'

6. set up your estimator (ml model) from amazon sagemaker

In [52]:
from sagemaker.amazon.amazon_estimator import get_image_uri

ModuleNotFoundError: No module named 'sagemaker.amazon'

6.1. get the location (uri) of your estimator's image<br>
In Amazon SageMaker, “Amazon estimators” usually refers to the built-in algorithms provided and maintained by AWS. These are delivered as Docker images, which is why you retrieve them via `get_image_uri` (or the newer `image_uris.retrieve`).<br>
`image_uri` is the location of your estimator's Docker image in Amazon ECR.

In [None]:
from sagemaker import image_uris

# get_image_uri was renamed in sagemaker>=2 (see https://sagemaker.readthedocs.io/en/stable/v2.html);
# image_uris.retrieve(framework, region) is its replacement
knn_uri = image_uris.retrieve('knn', boto3.Session().region_name)


6.2. create your estimator object<br>
`output_path`: the S3 location where SageMaker stores the trained model artifacts after you fit the model to the training data.

In [55]:
knn_model = sagemaker.estimator.Estimator(image_uri=knn_uri,
                                          role=get_execution_role(),
                                          instance_count=1,
                                          instance_type='ml.m5.2xlarge',
                                          output_path='s3://penguins-example-bucket-course/output/', 
                                          sagemaker_session=sagemaker.Session()
                                         )

6.3. set the model hyper-parameters

In [56]:
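# k: number of nearest neighbors used for each prediction
# sample_size: number of data points sampled from the training data to build the index
# predictor_type: 'classifier' (most frequent label) or 'regressor' (average value)
knn_model.set_hyperparameters(k=3, sample_size=333, predictor_type='classifier')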
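With `predictor_type='classifier'`, kNN predicts the most frequent label among the `k` nearest training points, and the training log below shows that the default `index_metric` is `L2` (Euclidean distance). The built-in algorithm builds a faiss index for this, so the brute-force version here only illustrates the idea and is not how SageMaker implements it. `knn_classify` is a hypothetical name, and the example uses just two features from a few rows of the table above:

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to `query`.

    Distance is Euclidean (L2), matching the default index_metric in the log.
    """
    scored = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    scored.sort(key=lambda pair: pair[0])  # nearest first
    nearest_labels = [label for _, label in scored[:k]]
    # most_common keeps first-seen order on ties, so the nearer label wins a tie
    return Counter(nearest_labels).most_common(1)[0][0]


# (bill_length_mm, bill_depth_mm) from rows of the penguins table, species 0 or 2
points = [(39.1, 18.7), (39.5, 17.4), (40.3, 18.0), (47.2, 13.7), (46.8, 14.3), (50.4, 15.7)]
labels = [0, 0, 0, 2, 2, 2]
print(knn_classify(points, labels, (38.9, 17.8)))  # 0
print(knn_classify(points, labels, (45.1, 14.5)))  # 2
```

Because L2 is computed on raw values, a feature measured in grams (`body_mass_g`) dominates the distance in the full dataset unless the features are scaled first.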

6.4. fit the knn model to training data

Q. what happens when you fit your knn_model?<br>
A. <br> 
- Launches a SageMaker training job that trains the model on the requested instance(s)
- Packages the trained model into model.tar.gz
- Uploads it to the S3 path specified by output_path

Example S3 location:
```text
s3://penguins-example-bucket-course/output/<training-job-name>/output/model.tar.gz
```
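The artifact path is `output_path`, then the training-job name, then `output/model.tar.gz`. A sketch that assembles it; the helper `model_artifact_uri` is hypothetical, and the job name is the one printed by `fit()` later in this notebook. After `fit()`, the SDK also exposes this location as `knn_model.model_data`.

```python
def model_artifact_uri(output_path, job_name):
    """Return the S3 URI where SageMaker writes model.tar.gz for a training job."""
    base = output_path.rstrip("/")  # tolerate a trailing slash in output_path
    return f"{base}/{job_name}/output/model.tar.gz"


uri = model_artifact_uri(
    "s3://penguins-example-bucket-course/output/", "knn-2024-04-11-22-57-04-683"
)
print(uri)
# s3://penguins-example-bucket-course/output/knn-2024-04-11-22-57-04-683/output/model.tar.gz
```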

----

Q. what is this model artifacts (`model.tar.gz`)?<br>
A. model.tar.gz is the entire trained model package, not just the raw weights.<br>
It contains everything needed for inference by the container, except the container image itself.

Q. if `model.tar.gz` has everything needed for inference, why is the container image also needed?<br>
A. `model.tar.gz` holds the model, but not the runtime environment that executes it.<br>

##### `model.tar.gz` = WHAT to run

-   Model weights
-   Model structure
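-   Inference code, if packaged with the model (e.g., a custom inference script for framework containers; built-in algorithms ship theirs in the container)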
-   Preprocessing artifacts

##### Container image = HOW to run it

-   Operating system
-   Python runtime
-   ML framework (PyTorch / TensorFlow / XGBoost / scikit-learn)
-   System libraries (CUDA, BLAS, etc.)
-   SageMaker serving logic (HTTP server, batch worker)

Both are required.
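To see what a particular `model.tar.gz` actually contains, you can download it (for example with `s3.download_file`) and list its members with Python's `tarfile` module. The file names inside depend on the algorithm or framework, so none are assumed here; the helper `list_artifact` is hypothetical:

```python
import tarfile


def list_artifact(path):
    """Return (name, size_in_bytes) for every regular file in a .tar.gz archive."""
    with tarfile.open(path, mode="r:gz") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


# usage (assumes the artifact was already downloaded, e.g. with
# s3.download_file(bucket_name, '<key of model.tar.gz>', 'model.tar.gz')):
# for name, size in list_artifact('model.tar.gz'):
#     print(name, size)
```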

----



Q. how to later retrieve the trained model?<br>
A. <br>
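```python
import sagemaker
from sagemaker.model import Model
from sagemaker import get_execution_role

# knn_uri: the same kNN image URI used for training (step 6.1)
# replace <job-name> with the training-job name, e.g. from the fit() log
knn_loaded_model = Model(
    image_uri=knn_uri,
    model_data='s3://penguins-example-bucket-course/output/<job-name>/output/model.tar.gz',
    role=get_execution_role(),
    sagemaker_session=sagemaker.Session()
)

# knn_loaded_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
# would then create a new endpoint from the stored artifact
```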

----

In [69]:
knn_model.fit({'train':s3_input_train, 'validation':s3_input_test}) 

INFO:sagemaker:Creating training-job with name: knn-2024-04-11-22-57-04-683


2024-04-11 22:57:04 Starting - Starting the training job...
2024-04-11 22:57:20 Starting - Preparing the instances for training...
2024-04-11 22:58:02 Downloading - Downloading the training image........................
2024-04-11 23:01:48 Training - Training image download completed. Training in progress...[34mDocker entrypoint called with argument(s): train[0m
[34mRunning default environment configuration script[0m
[34m[04/11/2024 23:02:15 INFO 139721676150592] Reading default configuration from /opt/amazon/lib/python3.9/site-packages/algorithm/resources/default-conf.json: {'_kvstore': 'dist_async', '_log_level': 'info', '_num_gpus': 'auto', '_num_kv_servers': '1', '_tuning_objective_metric': '', '_faiss_index_nprobe': '5', 'epochs': '1', 'feature_dim': 'auto', 'faiss_index_ivf_nlists': 'auto', 'index_metric': 'L2', 'index_type': 'faiss.Flat', 'mini_batch_size': '5000', '_enable_profiler': 'false'}[0m
[34m[04/11/2024 23:02:15 INFO 139721676150592] Merging with provided configu

7. deploy the model into production and create an endpoint

In [70]:
knn_predictor = knn_model.deploy(initial_instance_count=1,instance_type='ml.m4.xlarge') 

INFO:sagemaker:Creating model with name: knn-2024-04-11-23-06-05-020
INFO:sagemaker:Creating endpoint-config with name knn-2024-04-11-23-06-05-020
INFO:sagemaker:Creating endpoint with name knn-2024-04-11-23-06-05-020


-------------!

In [86]:
test_data.iloc[0].tolist()

[0, 38.9, 17.8, 181.0, 3625.0, False, True, False]

In [98]:
input_data = [38.9, 17.8, 181.0, 3625.0, 0, 1, 0]

In [94]:
from sagemaker.serializers import CSVSerializer

In [95]:
knn_predictor.serializer = CSVSerializer() 

In [99]:
knn_predictor.predict(input_data)

b'{"predictions": [{"predicted_label": 0.0}]}'
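The request body holds only the feature values: the label (`species`, first in the row) is dropped and the booleans are sent as 0/1, which is how `input_data` was written by hand above. The endpoint answers with JSON bytes. A stdlib sketch of both steps; the helper names are hypothetical, `num_map` is copied from earlier in the notebook, and the response format is the one printed above:

```python
import json

num_map = {'Adelie': 0, 'Chinstrap': 1, 'Gentoo': 2}  # same mapping as earlier
label_names = {v: k for k, v in num_map.items()}       # 0 -> 'Adelie', ...


def row_to_features(row):
    """Drop the label (first value) and turn booleans into 0/1."""
    return [int(v) if isinstance(v, bool) else v for v in row[1:]]


def parse_predictions(body):
    """Extract predicted labels from a response like b'{"predictions": [...]}'."""
    payload = json.loads(body)  # json.loads accepts bytes as well as str
    return [int(p["predicted_label"]) for p in payload["predictions"]]


row = [0, 38.9, 17.8, 181.0, 3625.0, False, True, False]  # test_data.iloc[0].tolist()
print(row_to_features(row))  # [38.9, 17.8, 181.0, 3625.0, 0, 1, 0]
labels = parse_predictions(b'{"predictions": [{"predicted_label": 0.0}]}')
print([label_names[i] for i in labels])  # ['Adelie']
```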

8. delete the model and model endpoint

In [100]:
# delete_endpoint() also deletes the endpoint configuration by default, so no third
# cleanup call is needed; calling knn_predictor.delete_predictor() afterwards raised
# ClientError (ValidationException): Could not find endpoint configuration "knn-2024-04-11-23-06-05-020".
knn_predictor.delete_model()
knn_predictor.delete_endpoint()

INFO:sagemaker:Deleting model with name: knn-2024-04-11-23-06-05-020
INFO:sagemaker:Deleting endpoint configuration with name: knn-2024-04-11-23-06-05-020
INFO:sagemaker:Deleting endpoint with name: knn-2024-04-11-23-06-05-020