# Training the model on SageMaker


This notebook will:
1. Authenticate (using your local AWS profile).
2. Package the local code (train.py).
3. Upload it to AWS.
4. Tell AWS where the data is located in S3.

In [1]:
import sagemaker
import boto3
from sagemaker.pytorch import PyTorch

# Use a named local AWS profile so that no credentials appear in the notebook.
boto_session = boto3.Session(profile_name='pessoal', region_name='us-east-1')
bucket_name = 'sagemaker-portfolio-cv-aws'
sess = sagemaker.Session(boto_session=boto_session, default_bucket=bucket_name)
region = sess.boto_region_name

# IAM role that SageMaker assumes while running the training job.
role = "arn:aws:iam::002447664581:role/SageMaker-Execution-Role-CV"

print(f"Region: {region}")
print(f"Target bucket: {bucket_name}")

sagemaker.config INFO - Not applying SDK defaults from location: C:\ProgramData\sagemaker\sagemaker\config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: C:\Users\josel\AppData\Local\sagemaker\sagemaker\config.yaml
Region: us-east-1
Target bucket: sagemaker-portfolio-cv-aws
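
Before launching a job, it can help to confirm that the profile resolves to the account you expect. The sketch below is a minimal sanity check, assuming the `boto_session` created above; `describe_caller` is a hypothetical helper name, not part of the original notebook. It relies only on the STS `get_caller_identity` call.

```python
def describe_caller(boto_session):
    """Return the account ID and ARN that the session's credentials resolve to."""
    sts = boto_session.client("sts")
    identity = sts.get_caller_identity()
    return {"account": identity["Account"], "arn": identity["Arn"]}

# Usage in the notebook (requires valid credentials for the profile):
# print(describe_caller(boto_session))
```

If the account printed here does not match the account in the `role` ARN, the training job is likely to fail on permissions.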


In [2]:
s3_input_data = f's3://{bucket_name}/data/raw/hymenoptera_data'

print(f"Training data will be downloaded from: {s3_input_data}")

Training data will be downloaded from: s3://sagemaker-portfolio-cv-aws/data/raw/hymenoptera_data
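
When this URI is later passed to `fit` under a channel name, SageMaker (in its default File input mode) copies the S3 prefix into the container at `/opt/ml/input/data/<channel>` and exposes that path through an `SM_CHANNEL_<CHANNEL>` environment variable. The sketch below illustrates that mapping; `split_s3_uri` and `channel_paths` are hypothetical helpers added here for illustration only.

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")

def channel_paths(channel):
    """Return where a named input channel appears inside the training container."""
    return {
        "env_var": f"SM_CHANNEL_{channel.upper()}",
        "container_dir": f"/opt/ml/input/data/{channel}",
    }

bucket, prefix = split_s3_uri("s3://sagemaker-portfolio-cv-aws/data/raw/hymenoptera_data")
print(bucket, prefix)
print(channel_paths("training"))
```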


## Configuring the Estimator

In [3]:
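# Passed to train.py as command-line arguments (--epochs, --batch-size, --lr).
hyperparameters = {
    'epochs': 5,
    'batch-size': 4,
    'lr': 0.001
}

estimator = PyTorch(
    entry_point='train.py',                       # script executed inside the container
    source_dir='../../sagemaker_entry_point',     # local directory packaged and uploaded to S3
    role=role,
    instance_count=1,
    instance_type='ml.m5.large',                  # CPU-only instance
    framework_version='2.8.0',
    py_version='py312',
    hyperparameters=hyperparameters,
    sagemaker_session=sess
)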
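
The contents of `train.py` are not shown in this notebook. As a rough sketch of how an entry point can receive the configuration above: SageMaker passes each hyperparameter as a `--name value` command-line argument and sets environment variables such as `SM_MODEL_DIR` and `SM_CHANNEL_TRAINING`. The argument names `--model-dir` and `--data-dir` below are assumptions for illustration and may not match the real script.

```python
import argparse
import os

def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths for a training script."""
    parser = argparse.ArgumentParser()
    # Hyperparameters from the Estimator's `hyperparameters` dict.
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--batch-size", type=int, default=4)
    parser.add_argument("--lr", type=float, default=0.001)
    # Paths injected by the SageMaker container (fallbacks are the standard locations).
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--data-dir",
                        default=os.environ.get("SM_CHANNEL_TRAINING",
                                               "/opt/ml/input/data/training"))
    # Ignore any extra arguments the platform may add.
    args, _unknown = parser.parse_known_args(argv)
    return args

if __name__ == "__main__":
    args = parse_args()
    print(args)
```

Note that argparse converts `--batch-size` to the attribute `args.batch_size`.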

## Running the training job

In [4]:
estimator.fit({'training': s3_input_data})

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: pytorch-training-2025-11-20-17-03-50-691


2025-11-20 17:03:55 Starting - Starting the training job...
2025-11-20 17:04:10 Starting - Preparing the instances for training...
2025-11-20 17:04:35 Downloading - Downloading input data...
2025-11-20 17:05:30 Downloading - Downloading the training image......
2025-11-20 17:06:26 Training - Training image download completed. Training in progress.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2025-11-20 17:06:30,984 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2025-11-20 17:06:30,985 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2025-11-20 17:06:30,986 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2025-11-20 17:06:30,996 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2025-11-20 17:06:31,043 sagemaker_pytorch_container.training INFO     Invoking user trainin

In [5]:
# Attach to an earlier training job by name and replay its logs.
existing_job = sagemaker.estimator.Estimator.attach('pytorch-training-2025-11-20-14-59-09-796', sagemaker_session=sess)
existing_job.logs()


2025-11-20 15:02:04 Starting - Preparing the instances for training
2025-11-20 15:02:04 Downloading - Downloading the training image
2025-11-20 15:02:04 Training - Training image download completed. Training in progress.
2025-11-20 15:02:04 Uploading - Uploading generated training model
2025-11-20 15:02:04 Failed - Training job failed
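
The status lines above say the job failed but not why. One way to dig further, sketched here under the assumption that a SageMaker client is created from the same `boto_session`, is to read the `FailureReason` field returned by `describe_training_job`. `summarize_training_job` is a hypothetical helper name.

```python
def summarize_training_job(sm_client, job_name):
    """Return the status and, if present, the failure reason of a training job."""
    desc = sm_client.describe_training_job(TrainingJobName=job_name)
    return {
        "status": desc["TrainingJobStatus"],
        "secondary_status": desc.get("SecondaryStatus"),
        # Only populated when the job has failed.
        "failure_reason": desc.get("FailureReason"),
    }

# Usage in the notebook (requires AWS credentials):
# sm_client = boto_session.client("sagemaker")
# print(summarize_training_job(sm_client, "pytorch-training-2025-11-20-14-59-09-796"))
```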
