# **Train and Deploy open LLMs with Amazon SageMaker**

⭐ <font color=orange>**Training and deployment in today's workshop both take a while. To save time, please click `Run All` first, then follow the presentation while the code runs.**</font>

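> ### **Jupyter Notebook Quick Start**
> **Jupyter Notebook** is a web-based development environment that lets you write and run code, view results, take notes, and visualize data, all in one interface. It is widely used in data science, machine learning, and academic research.
> - Jupyter Notebook has three types of cells:
>   1. **Code**: cells for writing Python code; press `Shift + Enter` to run the code
>   2. **Markdown**: cells for explanatory text with Markdown syntax support; press `Shift + Enter` to render the text
>   3. **Raw**: raw cells whose content is not processed
> 
> - Working with cells:
>   1. Edit a cell: press `Enter` (not needed in this workshop)
>   2. Run a cell: press `Shift + Enter` or click the `Run` button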
>
> ### **⮕ <font style="color: black ;background: orange">Shift + Enter</font> is all you need! (and <font style="color: black ;background: orange">Run All</font>🤣)**

## **Set Up the Development Environment**

**About Hugging Face**

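Hugging Face is an open-source platform hosting more than 470,000 pre-trained AI models and datasets. Developers can quickly access, apply, and fine-tune these models, which speeds up the development of natural language processing and other AI applications. Hugging Face provides standardized libraries and APIs that make downloading, integrating, and deploying models simpler and more consistent.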

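> - **Due to time constraints, the model for this workshop has already been uploaded to S3, so we will not perform any operations on Hugging Face.** If you want to download the model with your own Hugging Face account after today's event, log in to your account, generate an access token, accept the model's terms of use, and modify the code accordingly.
> 
> - To use SageMaker from a local environment, you need an IAM role with the permissions SageMaker requires. For more information, see [\[How to use SageMaker execution roles\]](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html)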

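#### **Install the specific package versions required by Hugging Face**

1. `huggingface_hub`: version `0.24.6`, used to interact with the Hugging Face Hub (models and datasets); it lets users upload and download pre-trained models and share and manage them.

2. `transformers`: version `4.44.2`, a library of pre-trained models for natural language processing, computer vision, speech processing, and other tasks. It includes both Transformer and non-Transformer models, making a wide range of deep learning models easy to use.

3. `datasets`: version `2.21.0`, used to fetch and process datasets, especially those used in machine learning and NLP tasks.

4. `--quiet`: a `pip` flag (not a package) that keeps the installation output tidy by suppressing most installation details.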
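To confirm that the pinned versions above are the ones the environment actually sees after installation, a small check with the standard library's `importlib.metadata` can help. This is a minimal sketch; the helper name `check_pins` is hypothetical and not part of the workshop code.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions from the pip install cell below
EXPECTED_PINS = {
    "huggingface_hub": "0.24.6",
    "transformers": "4.44.2",
    "datasets": "2.21.0",
}


def check_pins(expected):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


problems = check_pins(EXPECTED_PINS)
print("All pins satisfied" if not problems else f"Mismatched pins: {problems}")
```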

In [1]:
# Resolve version conflicts (optional)
!pip uninstall awscli --yes --quiet
!pip install 'docutils>=0.18.1,<0.21' --quiet

# Install the specific version of packages required by Hugging Face
!pip install huggingface_hub==0.24.6 transformers==4.44.2 datasets==2.21.0 --quiet


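#### **Set up the SageMaker environment and retrieve the AWS IAM role and S3 bucket information**

1. **Boto3**: the AWS SDK for Python, used to interact with AWS services, including creating and managing resources.

2. **SageMaker Session**: manages SageMaker operations and resources and provides a unified interface to the machine learning workflow, keeping operations consistent and simple.

3. **Storage Bucket**: the S3 bucket used to store data and models. In this code, `sess.default_bucket()` retrieves the default bucket.

4. **Execution Role**: the IAM role that grants SageMaker the permissions it needs to access other AWS resources (such as S3 buckets). It is retrieved with `sagemaker.get_execution_role()`.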

    <img src="../imgs/d-sagemaker-env-setup.png" width="850">
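The values printed by the next cell follow a recognizable pattern: when no bucket is configured, the SageMaker Python SDK names its default bucket `sagemaker-{region}-{account_id}`, and the account ID is the fifth colon-separated field of the role ARN. The sketch below reproduces that naming offline, which can help when checking that a notebook points at the bucket you expect. The helper names are hypothetical, and the example ARN is copied from this notebook's output.

```python
def account_id_from_role_arn(role_arn):
    """Extract the AWS account ID from an IAM role ARN.

    ARN layout: arn:partition:service:region:account-id:resource
    (IAM ARNs leave the region field empty.)
    """
    parts = role_arn.split(":")
    if len(parts) < 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"Not an IAM ARN: {role_arn}")
    return parts[4]


def expected_default_bucket(region, account_id):
    """Default bucket name used by the SageMaker SDK when none is configured."""
    return f"sagemaker-{region}-{account_id}"


# Values taken from this notebook's printed output
role_arn = "arn:aws:iam::539656205201:role/bedrock-workshop-studio-v2-SageMakerExecutionRole-ZcXxkMsoCiVI"
account_id = account_id_from_role_arn(role_arn)
print(expected_default_bucket("us-west-2", account_id))
# prints: sagemaker-us-west-2-539656205201
```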

In [2]:
import sagemaker
import boto3

# Create a SageMaker session to manage SageMaker-related operations
sess = sagemaker.Session()

# Set up the SageMaker storage bucket (None means: use the session's default bucket)
sagemaker_bucket = None
if sagemaker_bucket is None and sess is not None:
    sagemaker_bucket = sess.default_bucket()

# Get the execution role; outside SageMaker, fall back to an IAM role named 'sagemaker_execution_role'
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Recreate the SageMaker session using the sagemaker_bucket obtained above
sess = sagemaker.Session(default_bucket=sagemaker_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml
sagemaker role arn: arn:aws:iam::539656205201:role/bedrock-workshop-studio-v2-SageMakerExecutionRole-ZcXxkMsoCiVI
sagemaker bucket: sagemaker-us-west-2-539656205201
sagemaker session region: us-west-2


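## **Prepare the Dataset**

This step brings in the dataset prepared in advance for this workshop:
- Ambassadors' full version: 10,000+ records ⮕ training on it takes too long, so it is not used in this workshop
- Workshop version: 16 records ⮕ for hands-on practice only, so please excuse the weak training results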

  <img src="../imgs/d-datasets.png" width="600">

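> **Dataset source**
> 1. The ambassadors manually brainstormed possible user scenarios and used generative AI to produce replies in the voice of a cat fortune teller.
>
> 2. Claude 3.5 Sonnet was called through the Amazon Bedrock API, using the in-context learning (few-shot) prompt engineering technique to generate more than ten thousand records from the existing content for the dataset.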
> 
>   - <img src="../imgs/d-generate-dataset.png" width="500">
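The few-shot generation step described above can be sketched as building a Bedrock Converse-style message list: a few hand-written scenario/reply pairs followed by a new scenario for the model to answer in the same style. This only illustrates the technique; the ambassadors' actual prompts, the example texts, and the helper name `build_few_shot_messages` are assumptions. The resulting list has the shape accepted by the `messages` parameter of the `bedrock-runtime` client's `converse` call, which is not executed here.

```python
def build_few_shot_messages(examples, new_scenario):
    """Build a Converse-API-style message list for few-shot generation.

    examples: list of (user_scenario, assistant_reply) pairs written by hand.
    new_scenario: the scenario the model should answer in the same style.
    """
    messages = []
    for scenario, reply in examples:
        # Each example becomes one user turn followed by one assistant turn
        messages.append({"role": "user", "content": [{"text": scenario}]})
        messages.append({"role": "assistant", "content": [{"text": reply}]})
    # The final user turn is the new scenario to answer
    messages.append({"role": "user", "content": [{"text": new_scenario}]})
    return messages


# Hypothetical examples (English stand-ins for the workshop's Chinese data)
examples = [
    ("I keep every team document organized and easy to find.",
     "Meow! You're basically Amazon S3 in cat form (=^ω^=)"),
]
messages = build_few_shot_messages(examples, "I scale up when the team gets busy.")
print(len(messages))  # 3

# With AWS credentials, the list could be sent with something like:
#   boto3.client("bedrock-runtime").converse(modelId="<claude-model-id>", messages=messages)
```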

#### **Copy the workshop dataset into the SageMaker bucket**

1. **Dataset locations**: 
   - Ambassadors' bucket (source): `aws-educate-09-28-sagemaker-workshop-oregon/`
     - Full data file: `/datasets/phi-3.5-mini-instruct/workshop/data.json`
     - Training data file: `/datasets/phi-3.5-mini-instruct/workshop/train_dataset.json`
     - Test data file: `/datasets/phi-3.5-mini-instruct/workshop/test_dataset.json`
   - SageMaker bucket (destination): `sagemaker_bucket` (obtained in the previous step with `sess.default_bucket()`)
    
      <img src="../imgs/d-copy-dataset.png" width="600">
  

2. **Parse the S3 URI**:
   - The `parse_s3_uri` function parses an S3 URI and extracts the bucket name and the key.

3. **Copy the S3 objects**:
   - The `copy_s3_object` function copies an object from one S3 bucket to another: it downloads the source object with `get_object` and then uploads it to the target bucket under the same key with `put_object`.
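The download-then-upload approach reads the whole object into the notebook's memory. For small JSON files that is fine; an alternative is S3's server-side `copy_object`, which copies the object inside S3 without passing the bytes through the notebook (a single call handles objects up to 5 GB). The sketch below is a hypothetical variant, not the workshop's code: the function names are illustrative, the client is passed in as an argument so it can be exercised with a stand-in object, and the caller is assumed to have `s3:GetObject` on the source and `s3:PutObject` on the target.

```python
from urllib.parse import urlparse


def parse_s3_uri_strict(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key'), rejecting non-S3 URIs."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def copy_s3_object_server_side(s3_client, source_uri, target_bucket):
    """Copy an object into target_bucket under the same key using S3 CopyObject."""
    source_bucket, source_key = parse_s3_uri_strict(source_uri)
    s3_client.copy_object(
        Bucket=target_bucket,
        Key=source_key,
        CopySource={"Bucket": source_bucket, "Key": source_key},
    )
    return f"s3://{target_bucket}/{source_key}"


# In the notebook this could be called as:
#   copy_s3_object_server_side(s3, amb_train_uri, sagemaker_bucket)
```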

In [3]:
import boto3

s3 = boto3.client('s3', region_name="us-west-2")

def parse_s3_uri(uri):
    parts = uri.replace("s3://", "").split("/")
    bucket = parts[0]
    key = "/".join(parts[1:])
    return bucket, key


def copy_s3_object(source_uri, target_bucket):
    source_bucket, source_key = parse_s3_uri(source_uri)
    try:
        # Download file from source bucket
        response = s3.get_object(Bucket=source_bucket, Key=source_key)
        file_content = response['Body'].read()

        # Upload file to target bucket
        s3.put_object(Bucket=target_bucket, Key=source_key, Body=file_content)
        print(f"Successfully copied {source_key} to {target_bucket}")
    except Exception as e:
        print(f"Error copying {source_key}: {str(e)}")


# Base S3 URI for datasets
amb_bucket = 's3://aws-educate-09-28-sagemaker-workshop-oregon'

amb_train_uri = f"{amb_bucket}/datasets/phi-3.5-mini-instruct/workshop/train_dataset.json"
amb_test_uri = f"{amb_bucket}/datasets/phi-3.5-mini-instruct/workshop/test_dataset.json"
amb_data_uri = f"{amb_bucket}/datasets/phi-3.5-mini-instruct/workshop/data.json"

# Copy the train, test, and full datasets to the target S3 bucket (same keys as the source)
copy_s3_object(amb_train_uri, sagemaker_bucket)
copy_s3_object(amb_test_uri, sagemaker_bucket)
copy_s3_object(amb_data_uri, sagemaker_bucket)

# Construct the S3 URI for the datasets in the target bucket
sagemaker_datasets_uri = f"s3://{sagemaker_bucket}/datasets/phi-3.5-mini-instruct/workshop/"

Successfully copied datasets/phi-3.5-mini-instruct/workshop/train_dataset.json to sagemaker-us-west-2-539656205201
Successfully copied datasets/phi-3.5-mini-instruct/workshop/test_dataset.json to sagemaker-us-west-2-539656205201
Error copying datasets/phi-3.5-mini-instruct/workshop/data.json: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied


#### **View the dataset features from the SageMaker bucket**

In [4]:
import json
import pandas as pd
from datasets import Dataset, DatasetDict
import boto3

def read_and_format_json_from_s3(uri):
    bucket, key = parse_s3_uri(uri)
    try:
        obj = s3.get_object(Bucket=bucket, Key=key)
        file_content = obj['Body'].read().decode('utf-8')
        
        # Parse as a single JSON document; if that fails, fall back to JSON Lines (one object per line)
        try:
            data = json.loads(file_content)
        except json.JSONDecodeError:
            data = []
            for line in file_content.splitlines():
                try:
                    item = json.loads(line)
                    data.append(item)
                except json.JSONDecodeError:
                    continue

        formatted_data = []
        for item in data:
            formatted_item = {}
            if 'messages' in item:
                formatted_item['input'] = item['messages'][0]['content']
                formatted_item['output'] = item['messages'][1]['content']
            else:
                formatted_item = item
            formatted_data.append(formatted_item)

        return formatted_data
    except Exception as e:
        print(f"Error reading from S3: {e}")
        return None

# s3 URI
sagemaker_train_uri = sagemaker_datasets_uri + 'train_dataset.json'
sagemaker_test_uri = sagemaker_datasets_uri + 'test_dataset.json'

# Load and format training and test data from S3
sagemaker_train_data = read_and_format_json_from_s3(sagemaker_train_uri)
sagemaker_test_data = read_and_format_json_from_s3(sagemaker_test_uri)

# Create DatasetDict from the formatted data
dataset = DatasetDict({
    'train': Dataset.from_pandas(pd.DataFrame(sagemaker_train_data)),
    'test': Dataset.from_pandas(pd.DataFrame(sagemaker_test_data))
})

print("Dataset Info:")
print(dataset)

# Dataset Features
print("\n Dataset Features:")
print(dataset['train'].features)

Dataset Info:
DatasetDict({
    train: Dataset({
        features: ['inputs', 'outputs'],
        num_rows: 14
    })
    test: Dataset({
        features: ['inputs', 'outputs'],
        num_rows: 2
    })
})

 Dataset Features:
{'inputs': Value(dtype='string', id=None), 'outputs': Value(dtype='string', id=None)}
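`read_and_format_json_from_s3` accepts either a single JSON document or JSON Lines (one object per line), and maps chat-style `messages` records to `input`/`output` fields. The features printed above are `inputs`/`outputs`, which suggests these files were already stored in that flat shape and passed through unchanged. The same parsing logic, separated from S3 so it can be tried on plain strings, looks roughly like this; the function names are hypothetical, and unlike the cell above, a single top-level JSON object is wrapped in a list.

```python
import json


def parse_json_or_jsonl(text):
    """Parse a JSON document; if that fails, parse JSON Lines, skipping bad lines."""
    try:
        data = json.loads(text)
        return data if isinstance(data, list) else [data]  # wrap a single object
    except json.JSONDecodeError:
        records = []
        for line in text.splitlines():
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # blank or malformed line
        return records


def to_input_output(record):
    """Map a {'messages': [user, assistant, ...]} record to input/output fields."""
    if "messages" in record:
        return {"input": record["messages"][0]["content"],
                "output": record["messages"][1]["content"]}
    return record  # already flat, e.g. {'inputs': ..., 'outputs': ...}


sample = '{"inputs": "a", "outputs": "b"}\n{"inputs": "c", "outputs": "d"}'
print([to_input_output(r) for r in parse_json_or_jsonl(sample)])
```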



#### **View data.json from the SageMaker bucket**

In [5]:

import boto3
import json
from botocore.exceptions import ClientError


def read_and_display_s3_data(s3_uri, limit=3):
    # Parse the S3 URI into bucket and key
    bucket, key = parse_s3_uri(s3_uri)

    try:
        # Retrieve the object from S3
        response = s3.get_object(Bucket=bucket, Key=key)

        # Read and decode the data
        data = response['Body'].read().decode('utf-8')

        # Parse the JSON data
        json_data = json.loads(data)

        # Display a limited number of conversations
        for idx, item in enumerate(json_data[:limit], 1):
            print(f"\nConversation {idx}:")
            for message in item.get('messages', []):
                role = message.get('role', 'unknown').capitalize()
                content = message.get('content', 'No content')
                print(f"  {role}: {content}")

    except ClientError as e:
        print(f"Error reading S3 data: {e}")
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON data: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")


# Example usage
sagemaker_data_uri = sagemaker_datasets_uri + 'data.json'
read_and_display_s3_data(sagemaker_data_uri)

Error reading S3 data: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.


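#### **Save the training and test data to the SageMaker default path**

The SageMaker default path `/opt/ml/input/data/training` is a local path inside the training container that SageMaker configures automatically for the `training` input channel. It holds the training data downloaded from S3 so that the training script can read it during training. (The cell below is commented out and kept for reference.)

For details, see: [SageMaker Model Training Storage Paths](https://docs.aws.amazon.com/sagemaker/latest/dg/model-train-storage.html)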
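Inside a training container, each input channel passed to `fit()` is downloaded to `/opt/ml/input/data/<channel_name>`, and the SageMaker training toolkit also exposes that directory through an `SM_CHANNEL_<CHANNEL_NAME>` environment variable (for the `training` channel used later, `SM_CHANNEL_TRAINING`). A training script could resolve its data directory as sketched below. The helper name is hypothetical, and this notebook does not show whether `run_qlora.py` does this; it receives the path through the `dataset_path` hyperparameter instead.

```python
import os


def resolve_channel_dir(channel, env=None):
    """Return the local directory for a SageMaker input channel.

    Prefers the SM_CHANNEL_<NAME> variable set by the training toolkit and
    falls back to the conventional /opt/ml/input/data/<channel> path.
    """
    env = os.environ if env is None else env
    default = f"/opt/ml/input/data/{channel}"
    return env.get(f"SM_CHANNEL_{channel.upper()}", default)


print(resolve_channel_dir("training", env={}))
# prints: /opt/ml/input/data/training
```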

In [6]:
# import json
# import os

# # Define the directory to save the datasets
# save_path = '/opt/ml/input/data/training'
# os.makedirs(save_path, exist_ok=True)

# def save_dataset(file_path, data):
#     with open(file_path, 'w') as f:
#         for item in data:
#             json.dump(item, f)
#             f.write('\n')

# # Save training data
# train_file_path = os.path.join(save_path, 'train_dataset.json')
# save_dataset(train_file_path, sagemaker_train_data)

# # Save test data
# test_file_path = os.path.join(save_path, 'test_dataset.json')
# save_dataset(test_file_path, sagemaker_test_data)

# print(f"Dataset saved to: {save_path}")

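## 🚀<font color=orange>**Now, let's fine-tune our model. 🚀**</font>

## **Fine-Tuning**

In this part, we will explore some basic principles of neural networks and how tuning hyperparameters can improve model training. We will also introduce the basics of QLoRA (Quantized Low-Rank Adaptation).

First, let's review some key concepts: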


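#### Neural Network Fundamentals

1. **Neural Network**
   - A machine learning model loosely modeled on the structure of the human brain
   - Composed of multiple layers of neurons that process input data through weights and activation functions
   - Widely used in image recognition, natural language processing (NLP), speech recognition, and other fields

2. **Forward Propagation**
   - The process by which data flows from the input layer through the hidden layers to the output layer
   - The output of each layer becomes the input of the next layer

3. **Backward Propagation (Backpropagation)**
   - Computes the gradient of the loss function with respect to each weight
   - Propagates these gradients layer by layer from the output layer back toward the input layer, so that every weight can be adjusted

4. **Gradient Descent**
   - The core algorithm for optimizing neural networks
   - Minimizes the loss function by adjusting each weight in the direction opposite to its gradient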
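Forward propagation, backpropagation, and gradient descent can be seen end to end on the smallest possible model: one weight `w` fitting `y = 2x` under a mean-squared-error loss. This toy sketch (pure Python, with made-up data) computes the predictions, derives the gradient by hand rather than by automatic backpropagation, and then steps against the gradient.

```python
def train_single_weight(xs, ys, lr=0.05, steps=200, w=0.0):
    """Fit y ≈ w * x by gradient descent on the mean squared error."""
    n = len(xs)
    for _ in range(steps):
        # Forward pass: predictions with the current weight
        preds = [w * x for x in xs]
        # Loss L = mean((pred - y)^2); its gradient is dL/dw = mean(2 * (pred - y) * x)
        grad = sum(2 * (p - y) * x for p, x, y in zip(preds, xs, ys)) / n
        # Gradient descent: move w opposite to the gradient
        w -= lr * grad
    return w


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2x
print(round(train_single_weight(xs, ys), 4))  # 2.0
```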

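#### Key Hyperparameters

5. **Batch Size**
   - The number of training samples used for each weight update
   - Larger batches can make training more stable but may require more memory

6. **Learning Rate**
   - Controls how much the weights change in each iteration
   - Too large a value may prevent convergence; too small a value may make training very slow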
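Two quick calculations make these hyperparameters concrete. On the quadratic loss `L(w) = w²`, one gradient-descent step multiplies `w` by `(1 - 2·lr)`, so any learning rate above 1.0 makes `|w|` grow instead of shrink, while a tiny rate shrinks it very slowly. And with gradient accumulation (used in the training configuration later), the number of samples behind each weight update is per-device batch size × accumulation steps × number of devices. Toy values only; the helper names are hypothetical.

```python
def distance_after_steps(lr, steps=20, w=1.0):
    """Run gradient descent on L(w) = w**2 (gradient 2w) and return |w|."""
    for _ in range(steps):
        w -= lr * 2 * w  # equivalent to w *= (1 - 2 * lr)
    return abs(w)


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of samples contributing to each optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices


for lr in (1.1, 0.1, 0.001):
    print(f"lr={lr}: |w| after 20 steps = {distance_after_steps(lr):.4f}")
# lr=1.1 grows (factor -1.2 per step); lr=0.1 shrinks quickly; lr=0.001 barely moves

# Values used in this notebook's hyperparameters: batch 1, accumulation 2, one GPU
print(effective_batch_size(1, 2))  # 2
```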

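#### Advanced Techniques

7. **Quantization**
   - Converts model parameters from high precision to low precision (for example, from 16-bit floating point to 4-bit)
   - Reduces model size and can reduce inference time, but may slightly reduce accuracy

8. **LoRA (Low-Rank Adaptation)**
   - An efficient model fine-tuning technique
   - Main parameters:
     - alpha: controls the strength of the LoRA update (the update is scaled by alpha / rank)
     - rank: determines the rank of the adapter matrices, affecting the model's expressive capacity and computational cost

   - The concept of rank from linear algebra:
     - In large neural networks, weight updates typically involve high-dimensional matrices
     - LoRA assumes these updates can be approximated by low-rank matrices, which reduces the number of trainable parameters
     - Tuning the rank parameter trades off model complexity against computational efficiency

9. **QLoRA (Quantized Low-Rank Adaptation)**
   - A technique that combines quantization and LoRA
   - Optimization steps:
     1. Quantize the pre-trained model and freeze it
     2. Add small, trainable LoRA adapter layers
     3. Fine-tune only the adapter layers, while the frozen quantized model is still used in every forward pass
   - Advantage: greatly reduces memory requirements while preserving model performance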

These hyperparameters and techniques are used in the training script [run_qlora.py](../scripts/run_qlora.py).
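The three QLoRA steps listed above can be sketched in a few lines of NumPy: quantize a frozen weight matrix, then add a trainable low-rank update `B @ A` scaled by `alpha / rank`. This is a simplified illustration, not what `run_qlora.py` or the `bitsandbytes`/`peft` libraries do internally: real QLoRA uses 4-bit NormalFloat (NF4) with block-wise scales, whereas this sketch uses plain symmetric absmax rounding to 4-bit integer levels (stored in `int8` for simplicity). Matrix sizes and values are made up.

```python
import numpy as np


def quantize_absmax(w, bits=4):
    """Symmetric absmax quantization to integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit
    scale = np.abs(w).max() / qmax
    q = np.round(w / scale).astype(np.int8)  # 4-bit levels stored in int8
    return q, scale


def dequantize(q, scale):
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 64, 64, 4, 8

# Step 1: quantize the pre-trained weight and freeze it (never updated below)
w_frozen = rng.normal(size=(d_out, d_in)).astype(np.float32)
q, scale = quantize_absmax(w_frozen)

# Step 2: small trainable LoRA adapters; B starts at zero, so the update starts at zero
lora_a = rng.normal(scale=0.01, size=(rank, d_in)).astype(np.float32)
lora_b = np.zeros((d_out, rank), dtype=np.float32)


def forward(x):
    # Step 3: the frozen quantized base is used in every forward pass;
    # only lora_a and lora_b would receive gradient updates during fine-tuning
    base = dequantize(q, scale) @ x
    return base + (alpha / rank) * (lora_b @ (lora_a @ x))


x = rng.normal(size=(d_in,)).astype(np.float32)
print(np.allclose(forward(x), dequantize(q, scale) @ x))  # True: B = 0 at the start
print("full update params:", d_out * d_in, "LoRA params:", rank * (d_in + d_out))
# full update params: 4096 LoRA params: 512
```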


In [7]:
from huggingface_hub import HfFolder


# Hyperparameters, which are passed into the training job
hyperparameters = {
  'model_id': "microsoft/Phi-3.5-mini-instruct",    # pre-trained model
  'dataset_path': '/opt/ml/input/data/training',    # path where SageMaker saves the training dataset
  'num_train_epochs': 3,                            # number of training epochs
  'per_device_train_batch_size': 1,                 # batch size per device
  'gradient_accumulation_steps': 2,                 # number of steps to accumulate gradients before each weight update
  'gradient_checkpointing': True,                   # saves memory at the cost of a slower backward pass
  'fp16': True,                                     # use 16-bit floating-point (fp16) mixed precision
  'learning_rate': 2e-4,                            # learning rate
  'max_grad_norm': 0.3,                             # maximum gradient norm (for gradient clipping)
  'warmup_ratio': 0.03,                             # warmup ratio (ignored by the Trainer's 'constant' scheduler)
  'lr_scheduler_type': "constant",                  # learning rate scheduler
  'save_strategy': "epoch",                         # save a checkpoint at the end of each epoch
  'logging_steps': 10,                              # log every 10 steps
  'merge_adapters': True,                           # whether to merge LoRA into the model (needs more memory)
  'use_flash_attn': True,                           # whether to use Flash Attention (FlashAttention-2 needs an Ampere or newer GPU)
  'output_dir': '/tmp/run',                         # output directory for assets saved during training
}
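With these values, it is easy to estimate how many optimizer updates the run performs. Assuming each of the 14 training rows shown earlier becomes one example (no sequence packing), a single GPU, and a Hugging Face `Trainer`-style loop that performs one update per `gradient_accumulation_steps` micro-batches, the sketch below gives 7 updates per epoch and 21 in total, so `logging_steps=10` would produce only a couple of log lines. This is an estimate under those assumptions, not output from the job; the helper name is hypothetical.

```python
import math


def estimate_optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Rough count of optimizer updates for a Trainer-style loop (no packing)."""
    micro_batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    updates_per_epoch = max(micro_batches_per_epoch // grad_accum, 1)
    return updates_per_epoch, updates_per_epoch * epochs


per_epoch, total = estimate_optimizer_steps(
    num_examples=14,    # train split size printed earlier
    per_device_batch=1, # 'per_device_train_batch_size'
    grad_accum=2,       # 'gradient_accumulation_steps'
    epochs=3,           # 'num_train_epochs'
)
print(per_epoch, total)  # 7 21
```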

#### **Set Up the Training Job**

In [8]:
from sagemaker.huggingface import HuggingFace

# define Training Job Name 
job_name = f'huggingface-qlora-{hyperparameters["model_id"].replace("/","-").replace(".","-")}'

# create the Estimator
huggingface_estimator = HuggingFace(
    entry_point          = 'run_qlora.py',    # train script
    source_dir           = '../scripts',      # directory which includes all the files needed for training
    instance_type        = 'ml.p3.2xlarge',   # instances type used for the training job
    instance_count       = 1,                 # the number of instances used for training
    max_run              = 2*24*60*60,        # maximum runtime in seconds (days * hours * minutes * seconds)
    base_job_name        = job_name,          # the name of the training job
    role                 = role,              # IAM role used in the training job to access AWS resources, e.g. S3
    volume_size          = 300,               # the size of the EBS volume in GB
    transformers_version = '4.36',            # the transformers version used in the training job
    pytorch_version      = '2.1',             # the PyTorch version used in the training job
    py_version           = 'py310',           # the python version used in the training job
    hyperparameters      =  hyperparameters,  # the hyperparameters passed to the training job
    environment          = { "HUGGINGFACE_HUB_CACHE": "/tmp/.cache" }, # set env variable to cache models in /tmp
)
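The training job name printed by `fit()` below (`huggingface-qlora-microsoft-Phi-3-5-min-2024-09-24-19-25-26-124`) is shorter than `job_name`. SageMaker training job names are limited to 63 characters, so the SDK trims `base_job_name` to leave room for a `-` plus a 23-character timestamp. The sketch below reproduces that arithmetic offline; to our understanding it mirrors what `sagemaker.utils.name_from_base` does, but it does not call the SDK. The helper names are hypothetical and the timestamp is copied from the log.

```python
MAX_JOB_NAME_LENGTH = 63  # SageMaker limit for training job names


def base_job_name_for(model_id):
    """Same transformation as the cell above: make the model id name-safe."""
    return f'huggingface-qlora-{model_id.replace("/", "-").replace(".", "-")}'


def full_job_name(base, timestamp):
    """Trim the base so that base + '-' + timestamp fits in 63 characters."""
    room = MAX_JOB_NAME_LENGTH - len(timestamp) - 1
    return f"{base[:room]}-{timestamp}"


base = base_job_name_for("microsoft/Phi-3.5-mini-instruct")
print(base)  # huggingface-qlora-microsoft-Phi-3-5-mini-instruct
print(full_job_name(base, "2024-09-24-19-25-26-124"))
# huggingface-qlora-microsoft-Phi-3-5-min-2024-09-24-19-25-26-124
```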

#### **Specify the training data location and start the Hugging Face training job**

In [9]:
# Define a data input dictionary with the uploaded S3 URIs
data = {
    'training': f"s3://{sess.default_bucket()}/datasets/phi-3.5-mini-instruct/workshop"
}

# Start the training job using the provided datasets as input
huggingface_estimator.fit(data, wait=True)

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: huggingface-qlora-microsoft-Phi-3-5-min-2024-09-24-19-25-26-124


2024-09-24 19:25:27 Starting - Starting the training job
2024-09-24 19:25:27 Pending - Training job waiting for capacity......
2024-09-24 19:26:15 Pending - Preparing the instances for training...
2024-09-24 19:26:49 Downloading - Downloading input data...
2024-09-24 19:27:04 Downloading - Downloading the training image........................
2024-09-24 19:31:27 Training - Training image download completed. Training in progress...bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
  "cipher": algorithms.TripleDES,
  "class": algorithms.TripleDES,
2024-09-24 19:31:43,059 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2024-09-24 19:31:43,076 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-09-24 19:31:43,089 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succe

#### **Convert the S3 URI into a URL that can be opened in the AWS S3 Management Console**

In [10]:
huggingface_estimator.model_data.replace("s3://", "https://s3.console.aws.amazon.com/s3/buckets/")

'https://s3.console.aws.amazon.com/s3/buckets/sagemaker-us-west-2-539656205201/huggingface-qlora-microsoft-Phi-3-5-min-2024-09-24-19-25-26-124/output/model.tar.gz'


## 🚀<font color=orange>**Now, let's deploy our model to an endpoint. 🚀**</font>

## Deploy the Model

After training, you can find the model's S3 path in the console under **SageMaker > Training > Training Job**. In this notebook, the S3 URI of the model artifact is also available from `huggingface_estimator.model_data`.
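The `model_data` value printed further down follows the layout `s3://<bucket>/<training-job-name>/output/model.tar.gz`, which is what a training job writes when no custom `output_path` is given. If the kernel has been restarted and `huggingface_estimator` no longer exists, the URI can be rebuilt from the bucket and the job name shown in the training logs, as in this sketch (alternatively, the SDK's `attach` class method can re-attach an estimator to a finished job by name). The layout is inferred from this notebook's output and the helper name is hypothetical; the job's page in the SageMaker console remains the authoritative source.

```python
def model_artifact_uri(bucket, training_job_name):
    """Default location of the model artifact written by a SageMaker training job."""
    return f"s3://{bucket}/{training_job_name}/output/model.tar.gz"


uri = model_artifact_uri(
    "sagemaker-us-west-2-539656205201",
    "huggingface-qlora-microsoft-Phi-3-5-min-2024-09-24-19-25-26-124",
)
print(uri)
```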

In [12]:
from sagemaker.huggingface import get_huggingface_llm_image_uri

# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri(
  "huggingface",
  # version="1.1.0",
  session=sess,
)

# print ecr image uri
print(f"llm image uri: {llm_image}")

INFO:sagemaker.image_uris:Defaulting to only available Python version: py310
INFO:sagemaker.image_uris:Defaulting to only supported image scope: gpu.


llm image uri: 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0


In [14]:
print(huggingface_estimator.model_data)

s3://sagemaker-us-west-2-539656205201/huggingface-qlora-microsoft-Phi-3-5-min-2024-09-24-19-25-26-124/output/model.tar.gz


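We can now create a `HuggingFaceModel` from the container image URI and the S3 path of the model. We also need to set the TGI (Text Generation Inference) configuration, including the number of GPUs and the maximum number of input tokens. You can find the full list of configuration options [here](https://huggingface.co/docs/text-generation-inference/basic_tutorials/launcher).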
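Two constraints in this configuration are easy to overlook: the container's environment variables are passed as strings, which is why the cell below uses `json.dumps`, and TGI expects `MAX_INPUT_LENGTH` to be smaller than `MAX_TOTAL_TOKENS`, with the difference acting as the budget for generated tokens. The hypothetical helper below builds the same dictionary as the next cell and checks the token constraint before deployment.

```python
import json


def build_tgi_env(num_gpus, max_input_length, max_total_tokens, model_dir="/opt/ml/model"):
    """Build the TGI environment dict passed to HuggingFaceModel(env=...)."""
    if max_input_length >= max_total_tokens:
        raise ValueError("MAX_INPUT_LENGTH must be smaller than MAX_TOTAL_TOKENS")
    return {
        "HF_MODEL_ID": model_dir,                         # where SageMaker places the model files
        "SM_NUM_GPUS": json.dumps(num_gpus),              # values must be strings
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
    }


env = build_tgi_env(num_gpus=1, max_input_length=1024, max_total_tokens=2048)
print(env)
# A prompt of 1024 tokens could still receive up to 2048 - 1024 = 1024 new tokens
```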

In [15]:
import json
from sagemaker.huggingface import HuggingFaceModel

# S3 path of the trained model artifact (model.tar.gz)
# to deploy the model later (e.g. from a new session), set this S3 path directly
model_s3_path = huggingface_estimator.model_data

# sagemaker config
instance_type = "ml.g5.2xlarge"
number_of_gpu = 1
health_check_timeout = 600 # 10 minutes to be able to load the model

# Define Model and Endpoint configuration parameter
config = {
  'HF_MODEL_ID': "/opt/ml/model", # path to where sagemaker stores the model
  'SM_NUM_GPUS': json.dumps(number_of_gpu), # Number of GPU used per replica
  'MAX_INPUT_LENGTH': json.dumps(1024), # Max length of input text
  'MAX_TOTAL_TOKENS': json.dumps(2048), # Max length of the generation (including input text)
}

# create HuggingFaceModel with the image uri
llm_model = HuggingFaceModel(
  role=role,
  image_uri=llm_image,
  model_data=model_s3_path,
  env=config
)

In [16]:
predictor = llm_model.deploy(
  initial_instance_count=1,
  instance_type=instance_type,
  container_startup_health_check_timeout=health_check_timeout, 
)

INFO:sagemaker:Creating model with name: huggingface-pytorch-tgi-inference-2024-09-24-19-38-14-172
INFO:sagemaker:Creating endpoint-config with name huggingface-pytorch-tgi-inference-2024-09-24-19-38-14-901
INFO:sagemaker:Creating endpoint with name huggingface-pytorch-tgi-inference-2024-09-24-19-38-14-901


-----------!

In [17]:
# Example request body
data = {
   "inputs": "<|system|>\n你是一隻具備科技知識且幽默的小貓咪 AWS 占卜師，風格親切可愛，會使用喵語表達，並常用 AWS 雲端技術來比喻日常生活中的情況。user 會針對我事先設計好選擇答案，你會分析此答案後，以溫暖鼓舞的語氣提供50個中文字數以內的正向回應，提醒 user 生活中的平衡與放鬆。你還會使用下列顏文字來增添表達的可愛感：(＝^ω^＝), (=①ω①=), (=ＴェＴ=), (=ↀωↀ=), (=ΦωΦ=), (ΦзΦ), (^・ω・^ ), (ฅ^•ﻌ•^ฅ)。<|end|>\n<|user|>\n團隊中的數據管理專家，能夠記住並快速檢索大量的數據和訊息，確保團隊在需要時能夠立即獲得所需的資料。你非常可靠，無論是文件、報告、還是歷史數據，都能夠完好無損地保存並準確地提供。<|end|>\n<|assistant|>\n喵哈哈！你這不就是活生生的S3嗎？(=^ω^=) 存儲海量數據還能快速檢索，簡直就是團隊的數據寶庫喵！不過可別忘了給自己設置個生命週期規則，把一些過時的記憶「歸檔」到腦袋的Glacier Deep Archive裡喵～這樣才能保持高效運轉喔！你的可靠程度，恐怕連99.999999999%的耐用性都比不上呢，真是太厲害了喵～<|end|><|assistant|>"
}

# Inference request
predictor.predict(data)

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (424) from primary with message "{"error":"Request failed during generation: Server error: CANCELLED","error_type":"generation"}". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/huggingface-pytorch-tgi-inference-2024-09-24-19-38-14-901 in account 539656205201 for more information.
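The request above already contains a finished assistant reply before the final `<|assistant|>` tag and sends no generation parameters. When asking the fine-tuned model for a *new* reply, the prompt would normally end right after `<|assistant|>\n`, with TGI parameters such as `max_new_tokens` set explicitly. The sketch below builds such a payload using the same tag layout as the request above. The helper names, the sampling values, and the placeholder English prompts are assumptions, and this is not a diagnosis of the 424 error; its cause would need to be checked in the CloudWatch logs linked in the message.

```python
def build_phi35_prompt(system_prompt, user_text):
    """Format one system + user turn with Phi-3.5 chat tags, ending where the model should answer."""
    return (
        f"<|system|>\n{system_prompt}<|end|>\n"
        f"<|user|>\n{user_text}<|end|>\n"
        "<|assistant|>\n"
    )


def build_tgi_payload(prompt, max_new_tokens=128, temperature=0.7, top_p=0.9):
    """Request body for a TGI endpoint: the prompt plus generation parameters."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,  # must fit within MAX_TOTAL_TOKENS minus the prompt length
            "do_sample": True,
            "temperature": temperature,
            "top_p": top_p,
            "stop": ["<|end|>"],               # stop when the model closes its turn
        },
    }


payload = build_tgi_payload(build_phi35_prompt(
    "You are a playful AWS fortune-teller cat.",  # placeholder system prompt
    "I back up everything my team needs.",        # placeholder user message
))
# With a live endpoint: predictor.predict(payload)
```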

## Clean up

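When you are done testing, remember to clean up the resources to avoid incurring further charges.
(Uncomment the lines below when you need to run them.)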

In [None]:
# predictor.delete_model()
# predictor.delete_endpoint()