# Deploying the SOLAR 10.7B Instruct Model to a g5.24xlarge Instance with SageMaker

This notebook shows how to pull the SOLAR 10.7B Instruct model from HuggingFace and deploy it to an `ml.g5.24xlarge` instance on SageMaker.

## Step 1: Upgrade the SageMaker SDK and import dependencies

In [1]:
%pip install sagemaker --upgrade  --quiet

Note: you may need to restart the kernel to use updated packages.


In [2]:
import boto3
import sagemaker
from sagemaker import Model, image_uris, serializers, deserializers
import json

role = sagemaker.get_execution_role()  # execution role for the endpoint
sess = sagemaker.session.Session()  # sagemaker session for interacting with different AWS APIs
region = sess.boto_region_name  # region name of the current SageMaker Studio environment (public attribute instead of the private _region_name)
account_id = sess.account_id()  # account_id of the current SageMaker Studio environment

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


## Step 2: Retrieve the HuggingFace LLM (TGI) container image

In [3]:
from sagemaker.huggingface import get_huggingface_llm_image_uri

# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri(
  "huggingface",
  version="1.4.2"
)

# print ecr image uri
print(f"llm image uri: {llm_image}")

llm image uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.4.2-gpu-py310-cu121-ubuntu22.04
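
The image URI printed above encodes the registry account, region, repository, and a tag that includes the TGI version. As a minimal sketch (the helper `parse_ecr_image_uri` is hypothetical and not part of the SageMaker SDK), the URI can be split into its parts to confirm that the region and TGI version match what you requested:

```python
import re

# Image URI copied from the cell output above.
IMAGE_URI = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-tgi-inference:2.1.1-tgi1.4.2-gpu-py310-cu121-ubuntu22.04"
)

# Hypothetical helper: splits an ECR image URI into named components.
ECR_URI_PATTERN = re.compile(
    r"^(?P<account>\d+)\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>.+)$"
)


def parse_ecr_image_uri(uri: str) -> dict:
    """Return the account, region, repository, and tag of an ECR image URI."""
    match = ECR_URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"not an ECR image URI: {uri}")
    return match.groupdict()


parts = parse_ecr_image_uri(IMAGE_URI)
print(parts["region"], parts["repository"], parts["tag"])
```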


## Step 3: Define the model and endpoint configuration

In [4]:
import json
from sagemaker.huggingface import HuggingFaceModel

# sagemaker config
instance_type = "ml.g5.24xlarge"
number_of_gpu = 4
health_check_timeout = 120

# Define Model and Endpoint configuration parameters
config = {
    'HF_MODEL_ID': "upstage/SOLAR-10.7B-Instruct-v1.0", # model_id from hf.co/models
    'SM_NUM_GPUS': json.dumps(number_of_gpu), # Number of GPU used per replica
    'HUGGING_FACE_HUB_TOKEN': "<your HuggingFace read access token>" # Read access token from your HuggingFace profile: https://huggingface.co/settings/tokens
}

# create HuggingFaceModel with the image uri
llm_model = HuggingFaceModel(
  role=role,
  image_uri=llm_image,
  env=config
)
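
The configuration above pastes the access token directly into the notebook. A minimal sketch of an alternative, assuming the token has been exported in a local environment variable (the name `HF_TOKEN` and the helper `build_tgi_env` are assumptions for illustration), builds the same dictionary without hardcoding the secret:

```python
import json
import os


def build_tgi_env(model_id: str, num_gpus: int, token_env_var: str = "HF_TOKEN") -> dict:
    """Build the container environment, reading the Hub token from a local env var."""
    env = {
        "HF_MODEL_ID": model_id,               # model_id from hf.co/models
        "SM_NUM_GPUS": json.dumps(num_gpus),   # container env values must be strings
    }
    token = os.environ.get(token_env_var)      # assumed variable name, not from the notebook
    if token:
        env["HUGGING_FACE_HUB_TOKEN"] = token  # only set when a token is available
    return env


example_env = build_tgi_env("upstage/SOLAR-10.7B-Instruct-v1.0", 4)
print(sorted(example_env))
```

The resulting dictionary could be passed as `env=` to `HuggingFaceModel` in place of `config`.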

## Step 4: Create the SageMaker endpoint and deploy the model

In [5]:
from sagemaker.utils import name_from_base

endpoint_name = name_from_base(f"{config['HF_MODEL_ID'].split('/')[1].split('.')[0]}-imweb-poc")

endpoint_name

'SOLAR-10-imweb-poc-2024-04-23-03-44-20-828'
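
The base name is derived by taking the repository part of `HF_MODEL_ID` and cutting it at the first dot, which drops `.7B-Instruct-v1.0`; dots would not be accepted in an endpoint name anyway. The sketch below reproduces that string handling without any AWS calls. The validity check is an assumption based on the documented endpoint naming rule (alphanumerics and hyphens, at most 63 characters), and both helper names are hypothetical:

```python
import re


def endpoint_base_name(model_id: str, suffix: str = "imweb-poc") -> str:
    """Mirror the expression used above to build the endpoint base name."""
    repo = model_id.split("/")[1]   # "SOLAR-10.7B-Instruct-v1.0"
    short = repo.split(".")[0]      # "SOLAR-10"
    return f"{short}-{suffix}"


def is_valid_endpoint_name(name: str) -> bool:
    """Assumed rule: alphanumerics separated by hyphens, at most 63 characters."""
    pattern = r"[a-zA-Z0-9](-*[a-zA-Z0-9])*"
    return len(name) <= 63 and re.fullmatch(pattern, name) is not None


base = endpoint_base_name("upstage/SOLAR-10.7B-Instruct-v1.0")
print(base)
print(is_valid_endpoint_name("SOLAR-10-imweb-poc-2024-04-23-03-44-20-828"))
```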

In [6]:
%%time
# https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model.deploy

llm = llm_model.deploy(
    endpoint_name=endpoint_name,
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,
)

----------!
CPU times: user 253 ms, sys: 15.5 ms, total: 268 ms
Wall time: 5min 32s
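
As a rough sanity check on the instance choice, the arithmetic below estimates the memory needed for the model weights alone. It assumes 16-bit weights (2 bytes per parameter) sharded evenly across the four GPUs set by `SM_NUM_GPUS`; a g5.24xlarge provides four NVIDIA A10G GPUs with 24 GB each. The estimate ignores the KV cache, activations, and runtime overhead, so treat it as a lower bound rather than a sizing guarantee:

```python
# Back-of-the-envelope estimate of weight memory per GPU.
NUM_PARAMS = 10.7e9      # SOLAR 10.7B
BYTES_PER_PARAM = 2      # assumption: fp16/bf16 weights
NUM_GPUS = 4             # matches number_of_gpu above
GPU_MEMORY_GB = 24       # NVIDIA A10G memory per GPU

weights_gb = NUM_PARAMS * BYTES_PER_PARAM / 1e9
per_gpu_gb = weights_gb / NUM_GPUS

print(f"total weights: {weights_gb:.1f} GB")
print(f"per GPU:       {per_gpu_gb:.2f} GB of {GPU_MEMORY_GB} GB")
```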


## Step 5: Test the inference

In [7]:
# Get a predictor for your endpoint
predictor = sagemaker.Predictor(
    endpoint_name=endpoint_name,
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer(),
)

In [8]:
# Make a prediction with your endpoint
# Prompt (Korean): "Tell me about President Moon Jae-in's career in detail"
response = predictor.predict({
    "inputs": "문재인 대통령에 대한 자세한 이력을 알려줘",
    "parameters": {"do_sample": True, "max_new_tokens": 256}
})

response

[{'generated_text': "문재인 대통령에 대한 자세한 이력을 알려줘\n문재인, 1953년 1월 24일, Chungju, North Chungcheong Province, South Korea에태어. Lawyer & Politician, and the Democratic Party's member.\nEducation:\n1966-1972 - Gunpo High School.\n1972-1976 - National Korean University (Bachelor of Laws).\n1976-1977 - Korean Army (completed the mandatory military service).\n1980-1982 - Korean University (Master of Laws).\n1982-1985 - Seoul National University (Doctorate in Law).\nProfessional & Political Career:\n1978 - Worked as an attorney specializing in human rights, labor and corporate law at Kim & Chang Law Firm.\n1980-1984 - Taught Constitutional Law at Seoul National University Law School.\n1992 - Entered politics as a member of the Democratic Political Assembly and was elected to Parliament from the Busan district.\n1994-1998 -"}]
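
The response is a list with one dictionary, and `generated_text` begins with the prompt itself. A minimal sketch of a helper that removes the echoed prompt is shown below; `extract_completion` is a hypothetical name, and the sample response is made up for illustration rather than taken from the endpoint:

```python
def extract_completion(response: list, prompt: str) -> str:
    """Return the generated text with the echoed prompt removed."""
    text = response[0]["generated_text"]
    if text.startswith(prompt):
        text = text[len(prompt):]  # drop the echoed prompt prefix
    return text.strip()


# Hypothetical sample in the same shape as the endpoint response.
sample_prompt = "Hello"
sample_response = [{"generated_text": "Hello\nHi there, how can I help?"}]
print(extract_completion(sample_response, sample_prompt))
```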

In [9]:
# Make a prediction with your endpoint
# Prompt (Korean): "What is the zodiac animal for 2022?"
response = predictor.predict({
    "inputs": "2022년의 띠는?",
    "parameters": {"do_sample": True, "max_new_tokens": 256}
})

response

[{'generated_text': "2022년의 띠는? (2022 Chinese Zodiac)\n\n2022년의 띠는? (2022 Chinese Zodiac)\nChinese New Year, also known as the Lunar New Year or Spring Festival, is an annual celebration commemorating the beginning of a new year on the traditional Chinese calendar. The calendar consists of a 12-year cycle, with each year associated with one of the 12 Chinese zodiac animals. The zodiac signs play an essential role in traditional Chinese culture and astrology. In this blog post, we'll discuss the Chinese zodiac and the significance of the zodiac animal representing the year 2022.\n\nThe Chinese Zodiac:\nThe 12 animals that are part of the Chinese zodiac are, in order, the Rat, Ox, Tiger, Rabbit, Dragon, Snake, Horse, Goat, Monkey, Rooster, Dog, and Pig. Each animal represents specific traits and qualities, with the belief that the year you were born influences your personality, compatibility with others, and even significant life events.\n\n2022 Year of the Tiger:\n"}]
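
Both outputs above continue the raw prompt as free text instead of answering it, which is what tends to happen when an instruct model receives input without its chat template. The sketch below wraps the user message in the `### User:` / `### Assistant:` format, which is assumed here from the SOLAR-10.7B-Instruct model card and should be verified against the model's tokenizer chat template. `return_full_text` is a TGI generation parameter that asks the server not to echo the prompt. The helper names are hypothetical:

```python
def build_solar_prompt(user_message: str) -> str:
    """Wrap a single user turn in the assumed SOLAR instruct format."""
    return f"### User:\n{user_message}\n\n### Assistant:\n"


def build_payload(user_message: str, max_new_tokens: int = 256) -> dict:
    """Build a request body in the same shape as the predict() calls above."""
    return {
        "inputs": build_solar_prompt(user_message),
        "parameters": {
            "do_sample": True,
            "max_new_tokens": max_new_tokens,
            "return_full_text": False,  # return only the completion, not the prompt
        },
    }


payload = build_payload("2022년의 띠는?")
print(payload["inputs"])
```

With a live endpoint, the payload would be sent as `predictor.predict(payload)`.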

## Step 6: Clean up the environment

In [None]:
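# Delete the endpoint, its configuration, and the model so the instance stops incurring charges
sess.delete_endpoint(endpoint_name)
sess.delete_endpoint_config(endpoint_name)  # the endpoint config shares the endpoint's name
llm_model.delete_model()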