## [model_consumer]flan_t5_xl_in_context_learning_ml_p3_2xl

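This notebook focuses on using the FLAN-T5-XL large language model (LLM) to achieve N-shot learning through in-context learning. This includes using the model's natural language understanding (NLU) capabilities to personalize virtual assistant responses and improve performance for users.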

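In this module, you learn step by step how to perform NLU tasks with FLAN-T5-XL. In particular, you learn how to read and understand multi-turn customer support chat logs, and how to engineer prompts that let FLAN-T5-XL learn in context and improve its performance in N-shot learning. This improves the model's ability to infer context from a chat log and answer questions derived from it.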
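As a rough illustration of the prompt engineering described above, the sketch below assembles an N-shot prompt by prepending solved examples to a new chat history and question. The function name `build_n_shot_prompt`, the `Q:`/`A:` labels, and the sample data are assumptions made for this sketch; the cells later in this notebook send the chat history and question without any solved examples.

```python
from typing import List, Tuple


def build_n_shot_prompt(examples: List[Tuple[str, str, str]],
                        context: str,
                        query: str) -> str:
    """Prepend N solved (context, question, answer) examples to a new question."""
    blocks = []
    for ex_context, ex_query, ex_answer in examples:
        blocks.append(f'{ex_context.strip()}\nQ: {ex_query}\nA: {ex_answer}')
    # The final block leaves the answer empty for the model to complete.
    blocks.append(f'{context.strip()}\nQ: {query}\nA:')
    return '\n\n'.join(blocks)


# Hypothetical one-shot example (N = 1).
examples = [(
    'Customer: My order arrived damaged.\nAgent: Sorry about that, we will send a replacement.',
    'What is the overall sentiment?',
    'negative',
)]
prompt = build_n_shot_prompt(
    examples,
    'Customer: Thanks, the fix worked!\nAgent: Glad to hear it.',
    'What is the overall sentiment?',
)
print(prompt)
```

Passing an empty `examples` list reduces this to a zero-shot prompt, which makes it easy to compare N = 0 against N > 0 on the same question.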

<img src="./figures/flan-t5.png"  width="700" height="370">

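Overall, this module offers a good opportunity to explore the capabilities of FLAN-T5-XL on NLU tasks such as **text summarization, abstractive question answering, sentiment analysis, and sentiment phrase extraction**.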
[Zero-shot prompting for the Flan-T5 foundation model in Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/zero-shot-prompting-for-the-flan-t5-foundation-model-in-amazon-sagemaker-jumpstart/)

#### Imports 

In [1]:
from sagemaker.predictor import Predictor
from sagemaker import get_execution_role
from sagemaker.model import Model
from sagemaker import script_uris
from sagemaker import image_uris 
from sagemaker import model_uris
import sagemaker
import logging
import boto3
import time
import json
import pprint

#### Setup essentials 

In [2]:
logger = logging.getLogger('sagemaker')
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

In [3]:
logger.info(f'Using sagemaker=={sagemaker.__version__}')
logger.info(f'Using boto3=={boto3.__version__}')

Using sagemaker==2.164.0
Using boto3==1.26.144


In [4]:
MODEL_ID = 'huggingface-text2text-flan-t5-xl'  # this is hard-coded
MODEL_VERSION = '*'
INSTANCE_TYPE = 'ml.p3.2xlarge'
# INSTANCE_TYPE = 'ml.g5.xlarge'  # VolumeSize parameter is not allowed for the selected Instance / slow inference
INSTANCE_COUNT = 1
IMAGE_SCOPE = 'inference'
MODEL_DATA_DOWNLOAD_TIMEOUT = 3600  # in seconds
CONTAINER_STARTUP_HEALTH_CHECK_TIMEOUT = 3600
EBS_VOLUME_SIZE = 256  # in GB
CONTENT_TYPE = 'application/json'

# set up roles and clients 
client = boto3.client('sagemaker-runtime')
ROLE = get_execution_role()
logger.info(f'Role => {ROLE}')

Role => arn:aws:iam::322537213286:role/service-role/AmazonSageMaker-ExecutionRole-20230528T120509


In [5]:
unix_time = int(time.time())
endpoint_name = f'{MODEL_ID}-{unix_time}'
logger.info(f'Endpoint name: {endpoint_name}')

Endpoint name: huggingface-text2text-flan-t5-xl-1686395213
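The endpoint name above is built from the model ID and a Unix timestamp. The sketch below adds a validity check before the name is used. The limits in it (at most 63 characters, alphanumerics separated by hyphens) are what I understand the SageMaker `CreateEndpoint` API to require, so treat them as assumptions to confirm against the API reference; `make_endpoint_name` is a hypothetical helper, not part of the notebook.

```python
import re
import time

# Assumed constraints on SageMaker endpoint names: at most 63 characters,
# alphanumerics optionally separated by hyphens.
_NAME_PATTERN = re.compile(r'^[a-zA-Z0-9](-*[a-zA-Z0-9])*$')
_MAX_NAME_LENGTH = 63


def make_endpoint_name(model_id: str, timestamp: int) -> str:
    """Build '<model_id>-<timestamp>' and reject names that break the assumed rules."""
    name = f'{model_id}-{timestamp}'
    if len(name) > _MAX_NAME_LENGTH or not _NAME_PATTERN.match(name):
        raise ValueError(f'Invalid endpoint name: {name!r}')
    return name


name = make_endpoint_name('huggingface-text2text-flan-t5-xl', int(time.time()))
print(name, len(name))
```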


#### I. Deploy FLAN-T5-XL out-of-the-box instruction-tuned model as a SageMaker endpoint

In [6]:
deploy_image_uri = image_uris.retrieve(region=None, 
                                       framework=None, 
                                       image_scope=IMAGE_SCOPE, 
                                       model_id=MODEL_ID, 
                                       model_version=MODEL_VERSION, 
                                       instance_type=INSTANCE_TYPE)
logger.info(f'Deploy image URI => {deploy_image_uri}')

Deploy image URI => 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04


In [7]:
model_uri = model_uris.retrieve(model_id=MODEL_ID, 
                                model_version=MODEL_VERSION, 
                                model_scope=IMAGE_SCOPE)
logger.info(f'Model URI => {model_uri}')

Model URI => s3://jumpstart-cache-prod-us-west-2/huggingface-infer/prepack/v1.1.0/infer-prepack-huggingface-text2text-flan-t5-xl.tar.gz


In [8]:
env = {
    'SAGEMAKER_MODEL_SERVER_TIMEOUT': str(3600),
    'MODEL_CACHE_ROOT': '/opt/ml/model', 
    'SAGEMAKER_ENV': '1',
    'SAGEMAKER_SUBMIT_DIRECTORY': '/opt/ml/model/code/',
    'SAGEMAKER_PROGRAM': 'inference.py',
    'SAGEMAKER_MODEL_SERVER_WORKERS': '1', 
    'TS_DEFAULT_WORKERS_PER_MODEL': '1', 
}

In [9]:
model = Model(image_uri=deploy_image_uri, 
              model_data=model_uri, 
              role=ROLE, 
              predictor_cls=Predictor, 
              name=endpoint_name, 
              env=env)

In [10]:
%%time

_ = model.deploy(initial_instance_count=INSTANCE_COUNT, 
                 instance_type=INSTANCE_TYPE, 
                 endpoint_name=endpoint_name, 
                 volume_size=EBS_VOLUME_SIZE, 
                 model_data_download_timeout=MODEL_DATA_DOWNLOAD_TIMEOUT, 
                 container_startup_health_check_timeout=CONTAINER_STARTUP_HEALTH_CHECK_TIMEOUT)

Creating model with name: huggingface-text2text-flan-t5-xl-1686395213
CreateModel request: {
    "ModelName": "huggingface-text2text-flan-t5-xl-1686395213",
    "ExecutionRoleArn": "arn:aws:iam::322537213286:role/service-role/AmazonSageMaker-ExecutionRole-20230528T120509",
    "PrimaryContainer": {
        "Image": "763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04",
        "Environment": {
            "SAGEMAKER_MODEL_SERVER_TIMEOUT": "3600",
            "MODEL_CACHE_ROOT": "/opt/ml/model",
            "SAGEMAKER_ENV": "1",
            "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code/",
            "SAGEMAKER_PROGRAM": "inference.py",
            "SAGEMAKER_MODEL_SERVER_WORKERS": "1",
            "TS_DEFAULT_WORKERS_PER_MODEL": "1"
        },
        "ModelDataUrl": "s3://jumpstart-cache-prod-us-west-2/huggingface-infer/prepack/v1.1.0/infer-prepack-huggingface-text2text-flan-t5-xl.tar.gz"
    },
    "Tags"

------------!
CPU times: user 120 ms, sys: 6.31 ms, total: 127 ms
Wall time: 6min 33s


#### II. [Test] Invoke the SageMaker endpoint to test the deployed model on natural language understanding (NLU) and natural language generation (NLG) tasks

In [11]:
context = """
Customer: Hi there, I'm having a problem with my iPhone.
Agent: Hi! I'm sorry to hear that. What's happening?
Customer: The phone is not charging properly, and the battery seems to be draining very quickly. I've tried different charging cables and power adapters, but the issue persists.
Agent: Hmm, that's not good. Let's try some troubleshooting steps. Can you go to Settings, then Battery, and see if there are any apps that are using up a lot of battery life?
Customer: Yes, there are some apps that are using up a lot of battery.
Agent: Okay, try force quitting those apps by swiping up from the bottom of the screen and then swiping up on the app to close it.
Customer: I did that, but the issue is still there.
Agent: Alright, let's try resetting your iPhone's settings to their default values. This won't delete any of your data. Go to Settings, then General, then Reset, and then choose Reset All Settings.
Customer: Okay, I did that. What's next?
Agent: Now, let's try restarting your iPhone. Press and hold the power button until you see the "slide to power off" option. Slide to power off, wait a few seconds, and then turn your iPhone back on.
Customer: Alright, I restarted it, but it's still not charging properly.
Agent: I see. It looks like we need to run a diagnostic test on your iPhone. Please visit the nearest Apple Store or authorized service provider to get your iPhone checked out.
Customer: Do I need to make an appointment?
Agent: Yes, it's always best to make an appointment beforehand so you don't have to wait in line. You can make an appointment online or by calling the Apple Store or authorized service provider.
Customer: Okay, will I have to pay for the repairs?
Agent: That depends on whether your iPhone is covered under warranty or not. If it is, you won't have to pay anything. However, if it's not covered under warranty, you will have to pay for the repairs.
Customer: How long will it take to get my iPhone back?
Agent: It depends on the severity of the issue, but it usually takes 1-2 business days.
Customer: Can I track the repair status online?
Agent: Yes, you can track the repair status online or by calling the Apple Store or authorized service provider.
Customer: Alright, thanks for your help.
Agent: No problem, happy to help. Is there anything else I can assist you with?
Customer: No, that's all for now.
Agent: Alright, have a great day and good luck with your iPhone!
"""

##### Generation configuration

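When invoking the endpoint, you provide this text inside a JSON payload. The payload can also carry inference parameters that control output length, sampling strategy, and restrictions on the output token sequence. The transformers library defines the full list of [available payload parameters](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig); the most important ones are:

* **max_length** – The model generates text until the output length (including the input context length) reaches `max_length`. If specified, it must be a positive integer.
* **num_return_sequences** – The number of output sequences returned. If specified, it must be a positive integer.
* **num_beams** – The number of beams used in beam search. If specified, it must be an integer greater than or equal to `num_return_sequences`.
* **no_repeat_ngram_size** – The model ensures that no word sequence of length `no_repeat_ngram_size` is repeated in the output sequence. If specified, it must be a positive integer greater than 1.
* **temperature** – Controls the randomness of the output. Values above 1 make the output more random, and values approaching 0 approach greedy, deterministic decoding. 0.75 is a good starting value.
* **early_stopping** – If True, text generation finishes when all beam hypotheses reach the end-of-sentence token. If specified, it must be a Boolean.
* **do_sample** – If True, the next word is sampled according to its likelihood. If specified, it must be a Boolean.
* **top_k** – At each generation step, sample only from the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p** – At each generation step, sample from the smallest set of words whose cumulative probability reaches `top_p`. If specified, it must be a float between 0 and 1.
* **seed** – Fixes the random state for reproducibility. If specified, it must be an integer.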
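To make `top_k` and `top_p` concrete, the sketch below applies both filters to a toy next-word distribution. It is a simplification written for this explanation: the transformers library works on logits and differs in detail, and the function name `filter_top_k_top_p` and the toy probabilities are assumptions.

```python
from typing import Dict


def filter_top_k_top_p(probs: Dict[str, float], top_k: int = 0, top_p: float = 1.0) -> Dict[str, float]:
    """Return the renormalized candidate set left after top-k and top-p filtering."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if top_k > 0:  # top_k = 0 is treated as "no top-k filtering" in this sketch
        ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:  # smallest prefix whose probability mass reaches top_p
            break
    total = sum(p for _token, p in kept)
    return {token: p / total for token, p in kept}


toy = {'charging': 0.5, 'battery': 0.3, 'screen': 0.15, 'camera': 0.05}
print(filter_top_k_top_p(toy, top_k=0, top_p=0.7))  # keeps 'charging' and 'battery'
print(filter_top_k_top_p(toy, top_k=1))             # keeps only 'charging'
```

Sampling then draws the next word from the returned distribution, so a smaller `top_p` or `top_k` narrows the choice toward the most likely words.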

In [12]:
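MAX_LENGTH = 256              # maximum decoding length is 256 tokens
NUM_RETURN_SEQUENCES = 1      # decode a single result
TOP_K = 0                     # 0 turns off top-k filtering in transformers; top-p alone limits the candidates
TOP_P = 0.7                   # sample only from the candidate set with 70% cumulative probability
DO_SAMPLE = True              # use a sampling strategy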
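The same payload dictionary is rebuilt for every task in the sections that follow. One way to avoid the repetition is a small helper such as the one sketched here. `build_payload` and `DEFAULT_PARAMS` are hypothetical names; the default values copy the constants in the cell above, and the two checks mirror the parameter constraints listed earlier.

```python
import json

# Defaults mirror the constants defined in the cell above.
DEFAULT_PARAMS = {
    'max_length': 256,
    'num_return_sequences': 1,
    'top_k': 0,
    'top_p': 0.7,
    'do_sample': True,
}


def build_payload(prompt: str, **overrides) -> bytes:
    """Merge generation parameters with the prompt and encode the result as UTF-8 JSON."""
    body = {'text_inputs': prompt, **DEFAULT_PARAMS, **overrides}
    num_beams = body.get('num_beams')
    if num_beams is not None and num_beams < body['num_return_sequences']:
        raise ValueError('num_beams must be >= num_return_sequences')
    if not 0.0 <= body['top_p'] <= 1.0:
        raise ValueError('top_p must be between 0 and 1')
    return json.dumps(body).encode('utf-8')


payload = build_payload('Customer: Hi.\nwrite a summary', max_length=64)
print(payload.decode('utf-8'))
```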

#### A. Text Summarization 

In [13]:
query = 'write a summary'

In [14]:
prompt = f'{context}\n{query}'

In [15]:
payload = {'text_inputs': prompt, 
           'max_length': MAX_LENGTH, 
           'num_return_sequences': NUM_RETURN_SEQUENCES,
           'top_k': TOP_K,
           'top_p': TOP_P,
           'do_sample': DO_SAMPLE}

In [16]:
payload = json.dumps(payload).encode('utf-8')

In [17]:
response = client.invoke_endpoint(EndpointName=endpoint_name, 
                                  ContentType=CONTENT_TYPE, 
                                  Body=payload)

model_predictions = json.loads(response['Body'].read())
generated_text = model_predictions['generated_texts'][0]
pprint.pprint(f'Response: {generated_text}')

("Response: Customer's iPhone isn't charging properly. Agent runs a diagnostic "
 'test on the iPhone.')


#### B. Abstractive Question Answering 

##### Q1

In [18]:
query = 'What troubleshooting steps were suggested to the customer to fix their iPhone charging issue?'

In [19]:
prompt = f'{context}\n{query}'

In [20]:
payload = {'text_inputs': prompt, 
           'max_length': MAX_LENGTH, 
           'num_return_sequences': NUM_RETURN_SEQUENCES,
           'top_k': TOP_K,
           'top_p': TOP_P,
           'do_sample': DO_SAMPLE}

In [21]:
payload = json.dumps(payload).encode('utf-8')

In [22]:
response = client.invoke_endpoint(EndpointName=endpoint_name, 
                                  ContentType=CONTENT_TYPE, 
                                  Body=payload)

model_predictions = json.loads(response['Body'].read())
generated_text = model_predictions['generated_texts'][0]
pprint.pprint(f'Response: {generated_text}')

("Response: Force quit any apps using up a lot of battery. Reset your iPhone's "
 'settings. Restart your iPhone. Make an appointment at the Apple Store or '
 'authorized service provider.')


##### Q2

In [23]:
query = 'Was resetting the iPhone to its default settings able to solve the charging issue and battery drain problem?'

In [24]:
prompt = f'{context}\n{query}'

In [25]:
payload = {'text_inputs': prompt, 
           'max_length': MAX_LENGTH, 
           'num_return_sequences': NUM_RETURN_SEQUENCES,
           'top_k': TOP_K,
           'top_p': TOP_P,
           'do_sample': DO_SAMPLE}

In [26]:
payload = json.dumps(payload).encode('utf-8')

In [27]:
response = client.invoke_endpoint(EndpointName=endpoint_name, 
                                  ContentType=CONTENT_TYPE, 
                                  Body=payload)

In [28]:
model_predictions = json.loads(response['Body'].read())
generated_text = model_predictions['generated_texts'][0]
pprint.pprint(f'Response: {generated_text}')

"Response: No, it didn't."


##### Q3

In [29]:
query = 'What steps can the customer take to make an appointment at the nearest Apple Store or authorized service provider for iPhone repair?'

In [30]:
prompt = f'{context}\n{query}'

In [31]:
payload = {'text_inputs': prompt, 
           'max_length': MAX_LENGTH, 
           'num_return_sequences': NUM_RETURN_SEQUENCES,
           'top_k': TOP_K,
           'top_p': TOP_P,
           'do_sample': DO_SAMPLE}

In [32]:
payload = json.dumps(payload).encode('utf-8')

In [33]:
response = client.invoke_endpoint(EndpointName=endpoint_name, 
                                  ContentType=CONTENT_TYPE, 
                                  Body=payload)

model_predictions = json.loads(response['Body'].read())
generated_text = model_predictions['generated_texts'][0]
pprint.pprint(f'Response: {generated_text}')

('Response: Make an appointment online or by calling the Apple Store or '
 'authorized service provider.')


#### C. Sentiment Analysis

In [34]:
query = 'What is the overall sentiment and sentiment score of the conversation between the customer and the agent'

In [35]:
prompt = f'{context}\n{query}'

In [36]:
payload = {'text_inputs': prompt, 
           'max_length': MAX_LENGTH, 
           'num_return_sequences': NUM_RETURN_SEQUENCES,
           'top_k': TOP_K,
           'top_p': TOP_P,
           'do_sample': DO_SAMPLE}

In [37]:
payload = json.dumps(payload).encode('utf-8')

In [38]:
response = client.invoke_endpoint(EndpointName=endpoint_name, 
                                  ContentType=CONTENT_TYPE, 
                                  Body=payload)

model_predictions = json.loads(response['Body'].read())
generated_text = model_predictions['generated_texts'][0]
pprint.pprint(f'Response: {generated_text}')

'Response: positive'


In [39]:
sentiment = generated_text
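Because sampling is enabled, the generated sentiment can vary in casing or wording between runs, and it is interpolated directly into the next prompt. A light normalization step, sketched below, may make that hand-off more predictable. The label set in `ALLOWED_SENTIMENTS` and the function `normalize_sentiment` are assumptions for illustration.

```python
ALLOWED_SENTIMENTS = ('positive', 'negative', 'neutral')  # assumed label set


def normalize_sentiment(generated_text: str, default: str = 'neutral') -> str:
    """Map free-form model output onto one of the allowed sentiment labels."""
    lowered = generated_text.strip().lower()
    for label in ALLOWED_SENTIMENTS:
        if label in lowered:
            return label
    return default


print(normalize_sentiment('Positive'))
print(normalize_sentiment(' The sentiment is NEGATIVE. '))
print(normalize_sentiment('hard to say'))
```

The substring match is deliberately naive: it does not handle negation such as "not positive", so anything beyond a quick sanity check would need a stricter parser.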

#### D. Sentiment Phrase Extraction

In [40]:
query = f'identify any specific words, phrases, or context that influenced the {sentiment} sentiment'

In [41]:
prompt = f'{context}\n{query}'

In [42]:
payload = {'text_inputs': prompt, 
           'max_length': MAX_LENGTH, 
           'num_return_sequences': NUM_RETURN_SEQUENCES,
           'top_k': TOP_K,
           'top_p': TOP_P,
           'do_sample': DO_SAMPLE}

In [43]:
payload = json.dumps(payload).encode('utf-8')

In [44]:
response = client.invoke_endpoint(EndpointName=endpoint_name, 
                                  ContentType=CONTENT_TYPE, 
                                  Body=payload)


model_predictions = json.loads(response['Body'].read())
generated_text = model_predictions['generated_texts'][0]
pprint.pprint(f'Response: {generated_text}')

'Response: customer: thanks for your help'
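The extracted phrase is lowercased and shortened relative to the chat log, so it is worth checking that it really occurs in the conversation. The sketch below does a case-insensitive lookup after dropping the speaker label. `find_supporting_lines` is a hypothetical helper, and the two-line `chat` string is a shortened copy of the conversation used in this notebook.

```python
import re


def find_supporting_lines(context: str, phrase: str):
    """Return chat lines that contain the extracted phrase, ignoring case and speaker label."""
    # Drop a leading 'customer:' or 'agent:' label from the model output, if present.
    needle = re.sub(r'^\s*(customer|agent)\s*:\s*', '', phrase, flags=re.IGNORECASE).lower()
    return [line for line in context.splitlines() if needle and needle in line.lower()]


chat = ("Customer: Alright, thanks for your help.\n"
        "Agent: No problem, happy to help. Is there anything else I can assist you with?")
print(find_supporting_lines(chat, 'customer: thanks for your help'))
```

An empty result would suggest that the model paraphrased or invented the phrase instead of extracting it.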


In [45]:
%store endpoint_name

Stored 'endpoint_name' (str)


In [58]:
# Clean up: uncomment to delete the model and endpoint when finished.
# `_` is the predictor returned by model.deploy() in cell 10.
# _.delete_model()
# _.delete_endpoint()

Deleting model with name: huggingface-text2text-flan-t5-xl-1685337480
Deleting endpoint configuration with name: huggingface-text2text-flan-t5-xl-1685337480
Deleting endpoint with name: huggingface-text2text-flan-t5-xl-1685337480
