# Training Korean News Summarization with MLflow and SageMaker Pipelines

    
---

# 1. Environment Setup



In [1]:
%load_ext autoreload
%autoreload 2

import os
from getpass import getpass
 
is_sagemaker_notebook = True
# is_sagemaker_notebook = False # use VS Code

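if is_sagemaker_notebook:
    HF_TOKEN = getpass("Enter HUGGINGFACE Access Token: ")
else: # VS Code
    from dotenv import load_dotenv
    load_dotenv()  # read HF_TOKEN from a local .env file, if one exists
    HF_TOKEN = os.getenv('HF_TOKEN') or getpass("Enter HUGGINGFACE Access Token: ")
    # Do not print the token itself; only confirm that it was loaded
    print("token loaded: ", bool(HF_TOKEN))

# Log in to HF
!huggingface-cli login --token {HF_TOKEN}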


    

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /home/ec2-user/SageMaker/.cache/token
Login successful


In [2]:
%load_ext autoreload
%autoreload 2

import sys, os

def add_python_path(module_path):
    if os.path.abspath(module_path) not in sys.path:
        sys.path.append(os.path.abspath(module_path))
        print(f"python path: {os.path.abspath(module_path)} is added")
    else:
        print(f"python path: {os.path.abspath(module_path)} already exists")
    print("sys.path: ", sys.path)

module_path = "../.."
add_python_path(module_path)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
python path: /home/ec2-user/SageMaker/aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/30_fine_tune/03-fine-tune-llama3 is added
sys.path:  ['/home/ec2-user/SageMaker/aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/30_fine_tune/03-fine-tune-llama3/notebook/03-naver-news-lllama3-mlops', '/home/ec2-user/SageMaker/.cs/conda/envs/llama3_puy310/lib/python310.zip', '/home/ec2-user/SageMaker/.cs/conda/envs/llama3_puy310/lib/python3.10', '/home/ec2-user/SageMaker/.cs/conda/envs/llama3_puy310/lib/python3.10/lib-dynload', '', '/home/ec2-user/SageMaker/.cs/conda/envs/llama3_puy310/lib/python3.10/site-packages', '/home/ec2-user/SageMaker/huggingface-inferentia2-samples/llama3-70b/llmperf/src', '/home/ec2-user/SageMaker/aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/30_fine_tune/03-fine-tune-llama3']


In [3]:
import sagemaker
import boto3
sess = sagemaker.Session()
region = boto3.Session().region_name
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")


sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/SageMaker/.xdg/config/sagemaker/config.yaml
sagemaker role arn: arn:aws:iam::057716757052:role/gen_ai_gsmoon
sagemaker bucket: sagemaker-us-east-1-057716757052
sagemaker session region: us-east-1


# 2. Prepare Training Configuration Values

In [4]:
%%writefile accelerator_config/sm_llama_3_8b_fsdp_qlora.yaml
# script parameters
model_id:  "meta-llama/Meta-Llama-3-8B" # Hugging Face model id
max_seq_len:  2048              # max sequence length for model and packing of the dataset
# sagemaker specific parameters
train_dataset_path: "/opt/ml/input/data/train/" # path to where SageMaker saves train dataset
validation_dataset_path: "/opt/ml/input/data/validation/" # path to where SageMaker saves validation dataset
test_dataset_path: "/opt/ml/input/data/test/"   # path to where SageMaker saves test dataset
output_dir: "/tmp/llama3"            # where the LoRA adapter weight is
# training parameters
# report_to: "tensorboard" 
report_to: "mlflow" 
mlflow_experiment_name: "llama3-naver-news-fine-tuning"
# ARN of the SageMaker MLflow tracking server that receives the metrics
MLFLOW_TRACKING_ARN: "arn:aws:sagemaker:us-east-1:057716757052:mlflow-tracking-server/my-setup-test3"
learning_rate: 0.0002                  # learning rate 2e-4
lr_scheduler_type: "constant"          # learning rate scheduler
###########################             
# For Debug
###########################             
num_train_epochs: 1                    # number of training epochs
per_device_train_batch_size: 1         # batch size per device during training
per_device_eval_batch_size: 1          # batch size for evaluation
gradient_accumulation_steps: 2         # number of steps before performing a backward/update pass
###########################             
# For evaluation
###########################             
# num_train_epochs: 3                    # number of training epochs
# per_device_train_batch_size: 16         # batch size per device during training
# per_device_eval_batch_size: 8          # batch size for evaluation
# gradient_accumulation_steps: 2         # number of steps before performing a backward/update pass
###########################             
optim: adamw_torch                     # use torch adamw optimizer
logging_steps: 10                      # log every 10 steps
save_strategy: epoch                   # save checkpoint every epoch
evaluation_strategy: epoch             # evaluate every epoch
max_grad_norm: 0.3                     # max gradient norm
warmup_ratio: 0.03                     # warmup ratio
bf16: true                             # use bfloat16 precision
tf32: true                             # use tf32 precision
gradient_checkpointing: true           # use gradient checkpointing to save memory
# FSDP parameters: https://huggingface.co/docs/transformers/main/en/fsdp
fsdp: "full_shard auto_wrap offload" # remove offload if enough GPU memory
fsdp_config:
  backward_prefetch: "backward_pre"
  forward_prefetch: "false"
  use_orig_params: "false"

Overwriting accelerator_config/sm_llama_3_8b_fsdp_qlora.yaml
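The "For Debug" and "For evaluation" blocks in the config differ mainly in how many samples feed each optimizer update. The sketch below works that number out from the values in the YAML. The helper name `effective_batch_size` is hypothetical, and `num_devices=1` is an assumption based on the single-GPU `ml.g5.4xlarge` instance used later in this notebook.

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of samples that contribute to one optimizer update."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Values from the "For Debug" block of the config
debug = effective_batch_size(per_device_batch_size=1, gradient_accumulation_steps=2)

# Values from the commented-out "For evaluation" block
evaluation = effective_batch_size(per_device_batch_size=16, gradient_accumulation_steps=2)

print("debug:", debug, "evaluation:", evaluation)
```

On a multi-GPU instance, pass the total GPU count as `num_devices`, since each device processes its own per-device batch.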


### Load the config file upload location

In [5]:
%store -r input_path
print("input_path: ", input_path)

input_path:  s3://sagemaker-us-east-1-057716757052/datasets/naver-news-summarization-ko


### Upload the config file to S3
- Upload the file defined above.


In [6]:
from scripts.train_util import upload_data_s3

config_desired_s3_uri = f"{input_path}/config"
config_model_name = "accelerator_config/sm_llama_3_8b_fsdp_qlora.yaml"
train_config_s3_path = upload_data_s3(desired_s3_uri=config_desired_s3_uri, file_name=config_model_name, verbose=True)


accelerator_config/sm_llama_3_8b_fsdp_qlora.yaml is uploaded to:
s3://sagemaker-us-east-1-057716757052/datasets/naver-news-summarization-ko/config/sm_llama_3_8b_fsdp_qlora.yaml


#### Configure the training instance and related settings
- Set run_debug_sample = True for debugging, or False to use the full dataset.

In [7]:
# USE_LOCAL_MODE = True
USE_LOCAL_MODE = False

import torch

if USE_LOCAL_MODE:
    instance_type = 'local_gpu' if torch.cuda.is_available() else 'local'
    instance_count = 1
    from sagemaker.local import LocalSession
    sagemaker_session = LocalSession()
    sagemaker_session.config = {'local': {'local_code': True}}
    # data = local_data 
    # data = s3_data
    metric_definitions = None
    nKeepAliveSeconds = None # Warmpool feature
    print("## Local mode is set")
else:
    instance_type = 'ml.g5.4xlarge'
    # instance_type = 'ml.g5.12xlarge'
    # instance_type = 'ml.g5.48xlarge'
    # instance_type = 'ml.p4d.24xlarge'
    # Emit: 
    # {'train_runtime': 37.2985, 'train_samples_per_second': 0.375, 'train_steps_per_second': 0.054, 'train_loss': 2.3541293144226074, 'epoch': 1.0}
    # {'eval_loss': 2.50766658782959, 'eval_runtime': 3.4741, 'eval_samples_per_second': 3.454, 'eval_steps_per_second': 0.864, 'epoch': 1.0}
    metric_definitions=[
        {"Name": "train:loss", "Regex": "'train_loss':(.*?),"},
        {"Name": "validation:loss", "Regex": "'eval_loss':(.*?),"}
    ]
    instance_count = 1
    sagemaker_session = sagemaker.session.Session()
    # data = s3_data
    nKeepAliveSeconds = 3600 # Warmpool feature, 1 hour
    print(f"## Cloud mode is set with {instance_type} and {instance_count} of instance_count")
# print("dataset: \n", data)

## Cloud mode is set with ml.g5.4xlarge and 1 of instance_count
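The `metric_definitions` regexes can be checked locally before a training job is launched. The sketch below applies them to the two sample log lines quoted in the cell's comments, using Python's `re` module. The helper `extract_metrics` is hypothetical, and SageMaker's own log scraping may differ in detail, so treat this as a sanity check only.

```python
import re

metric_definitions = [
    {"Name": "train:loss", "Regex": "'train_loss':(.*?),"},
    {"Name": "validation:loss", "Regex": "'eval_loss':(.*?),"},
]

# Sample lines copied from the comments in the cell above
sample_log_lines = [
    "{'train_runtime': 37.2985, 'train_samples_per_second': 0.375, "
    "'train_steps_per_second': 0.054, 'train_loss': 2.3541293144226074, 'epoch': 1.0}",
    "{'eval_loss': 2.50766658782959, 'eval_runtime': 3.4741, "
    "'eval_samples_per_second': 3.454, 'eval_steps_per_second': 0.864, 'epoch': 1.0}",
]


def extract_metrics(lines, definitions):
    """Apply each regex to each line and collect the captured values as floats."""
    found = {}
    for line in lines:
        for definition in definitions:
            match = re.search(definition["Regex"], line)
            if match:
                # float() tolerates the leading space captured by (.*?)
                found[definition["Name"]] = float(match.group(1))
    return found


metrics = extract_metrics(sample_log_lines, metric_definitions)
print(metrics)
```

Both patterns need a comma after the value, so they would miss a metric printed as the last key of the dictionary.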


### Load the training data location

In [8]:
%store -r data
print("data: \n", data)

data: 
 {'train': 's3://sagemaker-us-east-1-057716757052/datasets/naver-news-summarization-ko/train/train_dataset.json', 'validation': 's3://sagemaker-us-east-1-057716757052/datasets/naver-news-summarization-ko/validation/validation_dataset.json', 'config': 's3://sagemaker-us-east-1-057716757052/datasets/naver-news-summarization-ko/config/sm_llama_3_8b_fsdp_qlora.yaml'}
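Each key of `data` becomes an input channel, and SageMaker makes a channel's files available under `/opt/ml/input/data/<channel>/` inside the training container. The sketch below maps the S3 URIs above to the container paths the training script is expected to see. It assumes File input mode with one object per channel, and the helper `container_path` is hypothetical.

```python
import posixpath

SM_INPUT_DIR = "/opt/ml/input/data"  # standard SageMaker input directory

data = {
    "train": "s3://sagemaker-us-east-1-057716757052/datasets/naver-news-summarization-ko/train/train_dataset.json",
    "validation": "s3://sagemaker-us-east-1-057716757052/datasets/naver-news-summarization-ko/validation/validation_dataset.json",
    "config": "s3://sagemaker-us-east-1-057716757052/datasets/naver-news-summarization-ko/config/sm_llama_3_8b_fsdp_qlora.yaml",
}


def container_path(channel: str, s3_uri: str) -> str:
    """Expected in-container path for a single S3 object passed as a channel."""
    return posixpath.join(SM_INPUT_DIR, channel, posixpath.basename(s3_uri))


for channel, uri in data.items():
    print(f"{channel:<10} -> {container_path(channel, uri)}")
```

The `config` path produced here matches the `config` hyperparameter passed to the Estimator later in the notebook.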


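# 3. Create the SageMaker Pipeline

## 3.1. Create the model building pipeline parameters and session

Of the variables that can be passed to the pipeline as parameters, this notebook defines one:
- The model approval status used when registering the model in the Model Registry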


In [9]:
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
    ParameterFloat,
)

# Input data
# s3_data_loc = ParameterString(
#     name="InputData",
#     default_value=s3_input_data_uri,
# )


model_approval_status = ParameterString(
    name="ModelApprovalStatus", default_value="PendingManualApproval"
)


### 3.1.1 Local mode setup
- To use local mode, pass LocalPipelineSession() as the sagemaker_session argument when creating the Estimator and Pipeline() objects.
- The model training step can run in local mode.
    - The Lambda step and the model registration step are not supported.
- Tip: when declaring Pipeline() near the end of the notebook, switch between the steps lines by commenting them out.
``` python
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        s3_data_loc,                
        model_approval_status,        
    ],
    sagemaker_session=pipeline_session,
    steps=[step_train],    # when using local mode
#   steps=[step_train, step_repackage_lambda, step_model_registration],
)
```
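Instead of commenting lines in and out, the step list can be chosen from a flag. This is a minimal sketch with a hypothetical helper, `select_steps`, and the strings stand in for the real step objects so that it runs without the SageMaker SDK.

```python
def select_steps(local_mode: bool, train_step, cloud_only_steps):
    """Return only the training step in local mode, otherwise every step."""
    if local_mode:
        return [train_step]
    return [train_step, *cloud_only_steps]


# Placeholder strings used in place of real pipeline step objects
print(select_steps(True, "step_train", ["step_create_model", "step_register"]))
print(select_steps(False, "step_train", ["step_create_model", "step_register"]))
```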

In [10]:
from sagemaker.workflow.pipeline_context import LocalPipelineSession, PipelineSession

# LOCAL_MODE = True # use for local mode
LOCAL_MODE = False # use for cloud mode
if LOCAL_MODE:
    from sagemaker.workflow.pipeline_context import LocalPipelineSession
    pipeline_session = LocalPipelineSession()
    print("### --> Local Mode")
else:
    from sagemaker.workflow.pipeline_context import PipelineSession
    pipeline_session = PipelineSession()
    print("### --> Cloud Mode")    

region = pipeline_session.boto_region_name
default_bucket = pipeline_session.default_bucket()

print("region :", region)


### --> Cloud Mode
region : us-east-1


### 3.1.2 Define caching
Reference: [Caching Pipeline Steps](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/pipelines-caching.html)

In [11]:
from sagemaker.workflow.steps import CacheConfig

cache_config = CacheConfig(enable_caching=True, 
                           expire_after="1d")

## 3.2. Define the pipeline steps

### 3.2.1 Model training step

#### Create the Estimator

Creating the Estimator requires several arguments. Only the main ones are covered here.


In [12]:
from sagemaker.huggingface import HuggingFace
from huggingface_hub import HfFolder

import time
# define Training Job Name 
job_name = f'llama3-8b-naver-news-{time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())}'
# chkpt_s3_path = f's3://{sess.default_bucket()}/{s3_prefix}/native/checkpoints'

# create the Estimator
os.environ['USE_SHORT_LIVED_CREDENTIALS']="1" 
huggingface_estimator = HuggingFace(
    entry_point          = 'sm_run_fsdp_qlora_llama3_mlflow.py',      # train script
    # source_dir           = '../../scripts',  # directory which includes all the files needed for training
    source_dir           = 'src',  # directory which includes all the files needed for training        
    instance_type        = instance_type,  # instances type used for the training job
    instance_count       = instance_count,                 # the number of instances used for training
    sagemaker_session    = sagemaker_session,
    max_run              = 2*24*60*60,        # maximum runtime in seconds (days * hours * minutes * seconds)
    base_job_name        = job_name,          # the name of the training job
    role                 = role,              # IAM role used in training job to access AWS resources, e.g. S3
    volume_size          = 256,               # the size of the EBS volume in GB
    transformers_version = '4.36.0',          # the transformers version used in the training job
    pytorch_version      = '2.1.0',           # the pytorch_version version used in the training job
    py_version           = 'py310',           # the python version used in the training job
    metric_definitions = metric_definitions,
    hyperparameters      =  {
        "config": "/opt/ml/input/data/config/sm_llama_3_8b_fsdp_qlora.yaml" # path to TRL config which was uploaded to s3
    },
    disable_output_compression = True,        # not compress output to save training time and cost    
    keep_alive_period_in_seconds = nKeepAliveSeconds,     # warm pool 
    distribution={"torch_distributed": {"enabled": True}},   # enables torchrun
    environment  = {
        "HUGGINGFACE_HUB_CACHE": "/tmp/.cache", # set env variable to cache models in /tmp
        "HF_TOKEN": HF_TOKEN,       # huggingface token to access gated models, e.g. llama 3
        "ACCELERATE_USE_FSDP": "1",             # enable FSDP
        "FSDP_CPU_RAM_EFFICIENT_LOADING": "1"   # enable CPU RAM efficient loading
    }, 
)

#### Create the model training step


In [13]:
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep


step_train = TrainingStep(
    name= "llama3-8b-naver-news-Training",
    estimator=huggingface_estimator,
    # estimator=host_estimator,
    inputs=data,
    # cache_config = cache_config, # cache configuration
)

### 3.2.2 Model registration

#### Create the model package group

- References
    - Model group listing API: [ListModelPackageGroups](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListModelPackageGroups.html)
    - Registering model metrics: [Model Quality Metrics](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/model-monitor-model-quality-metrics.html)

In [14]:
sm_client = boto3.client('sagemaker', region_name=region)

In [15]:
model_package_group_name = "Llama3-8b-Naver-News-Summarization"
model_package_group_input_dict = {
 "ModelPackageGroupName" : model_package_group_name,
 "ModelPackageGroupDescription" : "Sample model package group"
}
response = sm_client.list_model_package_groups(NameContains=model_package_group_name)
if len(response['ModelPackageGroupSummaryList']) == 0:
    print("No model group exists")
    print("Create model group")    
    
    create_model_package_group_response = sm_client.create_model_package_group(**model_package_group_input_dict)
    print('ModelPackageGroup Arn : {}'.format(create_model_package_group_response['ModelPackageGroupArn']))    
else:
    print(f"{model_package_group_name} exists")

Llama3-8b-Naver-News-Summarization exists
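`NameContains` is a substring filter, so the list can be non-empty even when no group has exactly the requested name. The sketch below adds an exact-match check on the response. The helper `model_group_exists` and the sample response are hypothetical, and only the first page of results is considered.

```python
def model_group_exists(response: dict, group_name: str) -> bool:
    """True only if a group with exactly this name is in the response."""
    return any(
        summary.get("ModelPackageGroupName") == group_name
        for summary in response.get("ModelPackageGroupSummaryList", [])
    )


# Hypothetical response: only a similarly named group exists
sample_response = {
    "ModelPackageGroupSummaryList": [
        {"ModelPackageGroupName": "Llama3-8b-Naver-News-Summarization-v2"},
    ]
}

print(model_group_exists(sample_response, "Llama3-8b-Naver-News-Summarization"))
print(model_group_exists(sample_response, "Llama3-8b-Naver-News-Summarization-v2"))
```

In the notebook, the `response` returned by `sm_client.list_model_package_groups(...)` can be passed in directly.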


#### Define the model registration step

In [16]:
from sagemaker.huggingface import get_huggingface_llm_image_uri

# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri(
  "huggingface",
  session=sess,
  version="2.0.2",
)

# print ecr image uri
print(f"llm image uri: {llm_image}")

llm image uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.0.2-gpu-py310-cu121-ubuntu22.04


In [17]:
from huggingface_hub import HfFolder
from sagemaker.huggingface import HuggingFaceModel

instance_type = "ml.g5.4xlarge"


if instance_type == "ml.p4d.24xlarge":
    num_GPUSs = 8
elif instance_type == "ml.g5.12xlarge":
    num_GPUSs = 4
else:
    num_GPUSs = None
    
print(f"{instance_type} and # of GPU {num_GPUSs} is set")

health_check_timeout = 1200 # 20 minutes

# import time
# sm_endpoint_name = "llama3-endpoint-{}".format(int(time.time()))
# print("sm_endpoint_name: \n", sm_endpoint_name)

# Define Model and Endpoint configuration parameter
config = {
  'HF_MODEL_ID': "/opt/ml/model",       # Path to the model in the container
  'SM_NUM_GPUS': f"{num_GPUSs}",        # Number of GPU used per replica
  'MAX_INPUT_LENGTH': "8000",           # Max length of input text
  'MAX_TOTAL_TOKENS': "8096",           # Max length of the generation (including input text)
  'MAX_BATCH_PREFILL_TOKENS': "16182",  # Limits the number of tokens that can be processed in parallel during the generation
  'MESSAGES_API_ENABLED': "true",       # Enable the OpenAI Messages API
}

ml.g5.4xlarge and # of GPU None is set
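As the output shows, an instance type outside the two listed branches leaves the GPU count as `None`, which the f-string then turns into the literal text `"None"` for `SM_NUM_GPUS`. One way to avoid that is an explicit lookup table that fails loudly for unknown instance types. The table and the helper `sm_num_gpus` are a hypothetical sketch covering only the instance types mentioned in this notebook.

```python
# GPU counts for the instance types referenced in this notebook
GPU_COUNT_BY_INSTANCE = {
    "ml.g5.4xlarge": 1,
    "ml.g5.12xlarge": 4,
    "ml.g5.48xlarge": 8,
    "ml.p4d.24xlarge": 8,
}


def sm_num_gpus(instance_type: str) -> str:
    """Return the GPU count as a string, suitable for an environment variable."""
    try:
        return str(GPU_COUNT_BY_INSTANCE[instance_type])
    except KeyError:
        raise ValueError(f"Unknown GPU count for {instance_type}") from None


print(sm_num_gpus("ml.g5.4xlarge"))
```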


In [18]:
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.workflow.model_step import ModelStep
from sagemaker.workflow.functions import Join

# Append a trailing slash to the S3 URI
s3_uri_with_slash = Join(
    on='',
    values=[
        step_train.properties.ModelArtifacts.S3ModelArtifacts,
        '/'  # trailing slash
    ]
)

# Build the model_data dictionary
model_data = {
    'S3DataSource': {
        'S3Uri': s3_uri_with_slash,
        'S3DataType': 'S3Prefix',
        'CompressionType': 'None'
    }
}

# create HuggingFaceModel with the image uri
huggingface_model = HuggingFaceModel(
    # model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    model_data=model_data,
    # model_data={'S3DataSource': {'S3Uri': 's3://sagemaker-us-east-1-057716757052/llama3-8b-naver-news-2024-08-25-01-02-0-2024-08-25-01-18-13-794/output/model/', 'S3DataType': 'S3Prefix', 'CompressionType': 'None'}},    
    image_uri=llm_image,
    transformers_version="4.28.1",
    pytorch_version="2.0.0",
    py_version="py310",
    model_server_workers=1,
    role=role,
    # name=f"HuggingFaceModel-Llama2-7b-{rand_id}",
    sagemaker_session=pipeline_session
)

inference_instance_type = ["ml.g5.4xlarge", "ml.g5.12xlarge"]
create_step_args = huggingface_model.create(instance_type=inference_instance_type)
step_create_model = ModelStep(
    name="CreateModel",
    step_args=create_step_args
)

customer_metadata = {
    'Model-S3-URI': s3_uri_with_slash,
    "training-image-uri": huggingface_estimator.training_image_uri(),
    "model-name": "llama3-8b-naver-news",
    "training-job-name": step_train.properties.TrainingJobName,
    "base-model": "meta-llama/Meta-Llama-3-8B",
    "fine-tuning-dataset": "naver-news-summarization",
    "created-by": "ML-team"
}

register_args = huggingface_model.register(
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=[
        "ml.g5.12xlarge",
    ],
    customer_metadata_properties = customer_metadata,
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,  # pipeline parameter defined in section 3.1
)
step_register = ModelStep(name="RegisterModel", step_args=register_args)
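`Join` is used because the training output location is only known when the pipeline runs. For a URI that is already a concrete string, the same `model_data` structure can be built in plain Python, as sketched below. The helpers `ensure_trailing_slash` and `build_model_data` are hypothetical, and the bucket and prefix are placeholders rather than real locations.

```python
def ensure_trailing_slash(s3_uri: str) -> str:
    """Append '/' unless the URI already ends with one."""
    return s3_uri if s3_uri.endswith("/") else s3_uri + "/"


def build_model_data(s3_uri: str) -> dict:
    """Build the uncompressed S3 prefix form of model_data used above."""
    return {
        "S3DataSource": {
            "S3Uri": ensure_trailing_slash(s3_uri),
            "S3DataType": "S3Prefix",
            "CompressionType": "None",
        }
    }


# Placeholder URI for illustration only
print(build_model_data("s3://example-bucket/example-job/output/model"))
```

Because `disable_output_compression = True` on the Estimator leaves the artifacts as individual files under a prefix, the model is referenced with `S3Prefix` and `CompressionType` set to `'None'`.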



# 4. Define and Run the Model Building Pipeline
The pipeline is defined from the three steps created above: training, model creation, and model registration.


In [19]:
from sagemaker.workflow.pipeline import Pipeline

project_prefix = 'llama3-8b-naver-neews-summarization'

pipeline_name = project_prefix
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        # s3_data_loc,                
        model_approval_status,        
    ],
    sagemaker_session=pipeline_session,
#    steps=[step_train],    
    steps=[step_train, step_create_model, step_register],
#    steps=[step_repackage_lambda, step_model_registration],    

)



In [20]:


import json
definition = json.loads(pipeline.definition())
# print(" definition : \n", definition)


INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
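The parsed `definition` is a plain dictionary, so the steps it contains can be listed without printing the whole document. The sketch below assumes the definition has a top-level `Steps` list whose entries carry `Name` and `Type` keys. The helper `summarize_steps` is hypothetical, and the sample dictionary is illustrative rather than the real definition: its step names are taken from this notebook, while its `Type` values are assumptions.

```python
def summarize_steps(definition: dict) -> list:
    """Return (name, type) pairs for every step in a pipeline definition."""
    return [(step["Name"], step["Type"]) for step in definition.get("Steps", [])]


# Illustrative stand-in for json.loads(pipeline.definition())
sample_definition = {
    "Steps": [
        {"Name": "llama3-8b-naver-news-Training", "Type": "Training"},
        {"Name": "CreateModel-CreateModel", "Type": "Model"},
        {"Name": "RegisterModel-RegisterModel", "Type": "RegisterModel"},
    ]
}

for name, step_type in summarize_steps(sample_definition):
    print(f"{step_type:<14} {name}")
```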


In [21]:
pipeline.upsert(role_arn=role)
execution = pipeline.start()

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.


### Pipeline operations: wait for the pipeline and check execution status

Check the progress of the workflow.

Wait until the execution completes.

In [22]:
execution.wait()
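`execution.wait()` polls at a fixed interval for a limited number of attempts. In the SageMaker Python SDK versions I am aware of, the defaults are `delay=30` and `max_attempts=60`, which is about 30 minutes. Check the signature in your installed version. For longer training runs, the sketch below works out how many attempts cover a given timeout. The helper `attempts_for_timeout` is hypothetical.

```python
import math


def attempts_for_timeout(timeout_seconds: int, delay_seconds: int = 30) -> int:
    """Number of polling attempts needed to cover timeout_seconds."""
    return math.ceil(timeout_seconds / delay_seconds)


default_window = 30 * 60        # seconds covered by delay=30, max_attempts=60
max_run = 2 * 24 * 60 * 60      # max_run configured on the Estimator above

print(attempts_for_timeout(default_window))
print(attempts_for_timeout(max_run))

# Assumed usage, to be run inside the notebook:
# execution.wait(delay=30, max_attempts=attempts_for_timeout(max_run))
```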

List the executed steps. This shows the steps that the pipeline's step executor service has started or completed.

In [23]:
execution.list_steps()

[{'StepName': 'CreateModel-CreateModel',
  'StartTime': datetime.datetime(2024, 8, 25, 13, 23, 7, 718000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2024, 8, 25, 13, 23, 9, 85000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'Model': {'Arn': 'arn:aws:sagemaker:us-east-1:057716757052:model/pipelines-2mfhu0x85vbx-CreateModel-CreateMo-NMw6bQFDvN'}},
  'AttemptCount': 1},
 {'StepName': 'RegisterModel-RegisterModel',
  'StartTime': datetime.datetime(2024, 8, 25, 13, 23, 7, 718000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2024, 8, 25, 13, 23, 8, 691000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'RegisterModel': {'Arn': 'arn:aws:sagemaker:us-east-1:057716757052:model-package/Llama3-8b-Naver-News-Summarization/6'}},
  'AttemptCount': 1},
 {'StepName': 'llama3-8b-naver-news-Training',
  'StartTime': datetime.datetime(2024, 8, 25, 13, 11, 12, 260000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2024, 8, 25, 13, 23, 7, 236000, tzinfo=tzl