## Dependency Setup

This section checks the Python version and installs the packages this project depends on.

In [1]:
!python --version

Python 3.11.11


In [6]:
!pip install --disable-pip-version-check -q awswrangler
!pip install --disable-pip-version-check -q kagglehub

In [None]:
!pip list

In [22]:
import boto3
from botocore.exceptions import ClientError
import sagemaker
import pandas as pd

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


## Create S3 Bucket and Upload Objects

Here, we download the raw datasets from Kaggle and upload the local files to our S3 bucket with the AWS CLI. To reproduce the download, you can fetch the dataset programmatically with either a cURL request or the kagglehub API.

**cURL command**: 
```
!curl -L -o ~/Downloads/employee-attrition-dataset.zip \
  https://www.kaggle.com/api/v1/datasets/download/stealthtechnologies/employee-attrition-dataset
```
**kagglehub snippet**:
```
import kagglehub
kagglehub.dataset_download("stealthtechnologies/employee-attrition-dataset")
```
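
As a rough sketch of the kagglehub route, the helper below downloads the dataset and collects the CSV files it finds. `kagglehub.dataset_download` returns a local path to the downloaded files. The helper names `find_csv_files` and `download_attrition_csvs` are hypothetical, and the assumption that the download contains `.csv` files comes from the `train.csv` and `test.csv` uploaded later in this notebook.

```python
from pathlib import Path


def find_csv_files(dataset_dir):
    """Return all CSV files under dataset_dir, sorted by file name."""
    root = Path(dataset_dir)
    return sorted(root.rglob("*.csv"), key=lambda p: p.name)


def download_attrition_csvs():
    """Download the dataset with kagglehub and return the CSV paths found.

    kagglehub is imported lazily so find_csv_files can be used on its own.
    """
    import kagglehub  # installed earlier in this notebook with pip

    dataset_dir = kagglehub.dataset_download(
        "stealthtechnologies/employee-attrition-dataset"
    )
    return find_csv_files(dataset_dir)
```

Calling `download_attrition_csvs()` requires network access and, depending on the environment, Kaggle credentials.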

In [23]:
# Create a SageMaker session object, which is used to manage interactions with SageMaker resources.
sess = sagemaker.Session()

# Retrieve the default Amazon S3 bucket associated with the SageMaker session.
bucket = sess.default_bucket()

# Get the IAM role associated with the current SageMaker notebook or environment.
role = sagemaker.get_execution_role()

# Get the AWS region name for the current session.
region = boto3.Session().region_name

# Retrieve the AWS account ID of the caller using the Security Token Service (STS) client.
account_id = boto3.client("sts").get_caller_identity().get("Account")

# Create a Boto3 client for the SageMaker service, specifying the AWS region.
sm = boto3.Session().client(service_name="sagemaker", region_name=region)

## Setting Object Destination and Copying Data to Bucket

In this section, we set the destination for our data in Amazon S3. The bucket name comes from the SageMaker session's default bucket rather than being hardcoded, so each user who runs the notebook writes to a bucket in their own account, under the same `aai-540-group-3-final-project/data` prefix.

In [24]:
bucket_path = f"s3://{bucket}/aai-540-group-3-final-project/data"
bucket_path

's3://sagemaker-us-east-1-203012117619/aai-540-group-3-final-project/data'
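
boto3's S3 calls take the bucket and key as separate arguments, while `bucket_path` is a single `s3://` URI. A minimal sketch of splitting such a URI back into its parts is shown below; `split_s3_uri` is a hypothetical helper, not part of boto3 or the SageMaker SDK.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix' into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The URL path starts with '/', which is not part of the S3 key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


bucket_name, prefix = split_s3_uri(
    "s3://sagemaker-us-east-1-203012117619/aai-540-group-3-final-project/data"
)
print(bucket_name)  # sagemaker-us-east-1-203012117619
print(prefix)       # aai-540-group-3-final-project/data
```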

In [25]:
%store bucket_path


Stored 'bucket_path' (str)


In [26]:
!aws s3 cp "train.csv" $bucket_path/

upload: ./train.csv to s3://sagemaker-us-east-1-203012117619/aai-540-group-3-final-project/data/train.csv


In [27]:
!aws s3 cp "test.csv" $bucket_path/

upload: ./test.csv to s3://sagemaker-us-east-1-203012117619/aai-540-group-3-final-project/data/test.csv
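
The two upload cells above use the AWS CLI. The same copy could be done from Python with boto3, roughly as sketched here. `upload_files` is a hypothetical helper; it assumes an S3 client is passed in and relies only on the client's `upload_file(Filename, Bucket, Key)` method.

```python
from pathlib import Path


def upload_files(s3_client, bucket, prefix, local_paths):
    """Upload each local file to s3://bucket/prefix/<file name>; return the keys."""
    keys = []
    for local_path in local_paths:
        # Keep only the file name so the object lands directly under the prefix.
        key = f"{prefix.rstrip('/')}/{Path(local_path).name}"
        s3_client.upload_file(str(local_path), bucket, key)
        keys.append(key)
    return keys
```

In this notebook, the call would look something like `upload_files(boto3.client("s3"), bucket, "aai-540-group-3-final-project/data", ["train.csv", "test.csv"])`. Passing the client as an argument keeps the helper easy to exercise with a stub.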


## Listing Files in our Bucket

In this section, we list the files stored under our project prefix in the S3 bucket. Because the bucket is resolved from the SageMaker session rather than hardcoded, the same cell works for anyone who runs the notebook in their own account.


In [29]:
!aws s3 ls $bucket_path/

2025-01-25 05:02:58    1910316 test.csv
2025-01-25 05:02:57    7640348 train.csv
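
For a listing that can be used from Python rather than read off the CLI output, one option is boto3's `list_objects_v2` paginator. The sketch below is an assumption about how that might look here; `list_keys` is a hypothetical helper that expects an S3 client to be passed in.

```python
def list_keys(s3_client, bucket, prefix):
    """Return (key, size_in_bytes) for every object under the prefix."""
    paginator = s3_client.get_paginator("list_objects_v2")
    results = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            results.append((obj["Key"], obj["Size"]))
    return results
```

With a real client, `list_keys(boto3.client("s3"), bucket, "aai-540-group-3-final-project/data/")` should report the same objects as the `aws s3 ls` cell above.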


## Release Resources

In [None]:
%%html

<p><b>Shutting down your kernel for this notebook to release resources.</b></p>
<button class="sm-command-button" data-commandlinker-command="kernelmenu:shutdown" style="display:none;">Shutdown Kernel</button>
        
<script>
try {
    els = document.getElementsByClassName("sm-command-button");
    els[0].click();
}
catch(err) {
    // NoOp
}    
</script>

In [None]:
%%javascript

try {
    Jupyter.notebook.save_checkpoint();
    Jupyter.notebook.session.delete();
}
catch(err) {
    // NoOp
}