cloudswb/llm-on-aws

How to host LLMs on AWS, in either the global AWS or AWS China regions.


Deploy a model from Hugging Face to SageMaker in China

1. Download the model from Hugging Face

Run the Python script: hf_download_and_upload.py

Parameter Description:

--model_name, the model name as defined on Hugging Face, required

--token, the token used to download the model from Hugging Face, optional

--cache_path, the local path used to cache the downloaded model files, required

--max_call_times, the maximum number of download retries, required

--aws_access_key, the access key of the AWS user used to upload to S3, required

--aws_secret_key, the secret key of the AWS user used to upload to S3, required

--aws_region, the Region code of the AWS Region where the S3 bucket is located

--s3_model_loc, the S3 location (bucket and prefix) to upload the model to

Examples:

  • Download the embedding model "BAAI/bge-large-en-v1.5":
python3 hf_download_and_upload.py \
--model_name "BAAI/bge-large-en-v1.5" \
--token "YOUR_MODEL_TOKEN" \
--cache_path "./cache/model/BAAI/bge-large-en-v1.5" \
--max_call_times 10 \
--aws_access_key "YOUR_AWS_ACCESS_KEY" \
--aws_secret_key "YOUR_AWS_SECRET_KEY" \
--aws_region "cn-northwest-1" \
--s3_model_loc "s3://sagemaker-cn-northwest-1-768219110428/llm/model/BAAI/bge-large-en-v1.5/"
  • Download the LLM "meta-llama/Llama-2-7b-chat-hf":
python3 hf_download_and_upload.py \
--model_name "meta-llama/Llama-2-7b-chat-hf" \
--token "YOUR_MODEL_TOKEN" \
--cache_path "./cache/model/meta-llama/Llama-2-7b-chat-hf" \
--max_call_times 10 \
--aws_access_key "YOUR_AWS_ACCESS_KEY" \
--aws_secret_key "YOUR_AWS_SECRET_KEY" \
--aws_region "cn-northwest-1" \
--s3_model_loc "s3://sagemaker-cn-northwest-1-768219110428/llm/model/meta-llama/Llama-2-7b-chat-hf/"
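
For reference, here is a minimal sketch of what a script like hf_download_and_upload.py might implement, assuming the huggingface_hub and boto3 libraries. The argument names mirror the parameters above, but this is an illustration of the download-then-upload flow, not the repository's actual script.

import argparse
import os
import boto3
from huggingface_hub import snapshot_download

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", required=True)
    parser.add_argument("--token", default=None)
    parser.add_argument("--cache_path", required=True)
    parser.add_argument("--max_call_times", type=int, required=True)
    parser.add_argument("--aws_access_key", required=True)
    parser.add_argument("--aws_secret_key", required=True)
    parser.add_argument("--aws_region", default=None)
    parser.add_argument("--s3_model_loc", required=True)
    args = parser.parse_args()

    # Retry the snapshot download up to max_call_times, since large
    # model downloads can fail partway through.
    for attempt in range(args.max_call_times):
        try:
            local_dir = snapshot_download(
                repo_id=args.model_name,
                token=args.token,
                cache_dir=args.cache_path,
            )
            break
        except Exception as exc:
            print(f"Download attempt {attempt + 1} failed: {exc}")
    else:
        raise RuntimeError("Download failed after all retries")

    # Split "s3://bucket/prefix/" into bucket name and key prefix.
    bucket, _, prefix = args.s3_model_loc.removeprefix("s3://").partition("/")

    s3 = boto3.client(
        "s3",
        aws_access_key_id=args.aws_access_key,
        aws_secret_access_key=args.aws_secret_key,
        region_name=args.aws_region,
    )

    # Upload every downloaded file, preserving relative paths under the prefix.
    for root, _, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            key = prefix + os.path.relpath(path, local_dir).replace(os.sep, "/")
            s3.upload_file(path, bucket, key)

if __name__ == "__main__":
    main()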

2. Deploy the model to SageMaker

Deploy Steps

  • Open or create a SageMaker notebook instance
  • Clone the source code: https://github.com/cloudswb/llm-on-aws.git
  • Open the deployment script (.ipynb file) in the deploy folder; choose the file that matches your model name.
  • Review and configure the parameters in the Step 2 section.
    • Parameter Description:
      • model_name : the model name as defined on Hugging Face
      • s3_model_prefix : the S3 folder path of the model files (excluding the bucket name and file names); must be prepared in advance
      • s3_code_prefix : the S3 folder path of the model inference code (excluding the bucket name and file names)
      • endpoint_config_name : the name of the SageMaker endpoint configuration to create
      • endpoint_name : the name of the SageMaker endpoint to create
      • deploy_cache_location : the local path for code files generated during deployment
      • inference_image_uri : the inference container image used for the deployment
  • Run the notebook cell by cell, or run all cells, to start the deployment (a sketch of the underlying API calls appears after this list)
  • Monitor the deployment progress
  • Test the LLM
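
For orientation, here is a minimal sketch of the SageMaker API calls a deployment notebook like these typically wraps, using boto3. All variable values below are placeholders standing in for the notebook's Step 2 parameters; the actual notebooks may differ in details such as instance type and payload format.

import json
import boto3

# Placeholder values; in the notebooks these come from the Step 2 parameters.
region = "cn-northwest-1"
role_arn = "arn:aws-cn:iam::ACCOUNT_ID:role/YourSageMakerRole"
model_name = "bge-large-en-v1-5"
endpoint_config_name = "bge-endpoint-config"
endpoint_name = "bge-endpoint"
inference_image_uri = "YOUR_INFERENCE_IMAGE_URI"
model_data_url = "s3://YOUR_BUCKET/llm/code/model.tar.gz"

sm = boto3.client("sagemaker", region_name=region)

# Register the model: the inference container plus the packaged code/model artifact.
sm.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role_arn,
    PrimaryContainer={"Image": inference_image_uri, "ModelDataUrl": model_data_url},
)

# Endpoint configuration: the hosting instance type and count.
sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "InstanceType": "ml.g4dn.2xlarge",
        "InitialInstanceCount": 1,
    }],
)

# Create the endpoint and wait for it to come in service (usually several minutes).
sm.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name)
sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)

# Test the deployed model; the exact request/response JSON format
# depends on the inference container in use.
runtime = boto3.client("sagemaker-runtime", region_name=region)
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps({"inputs": "Hello, how are you?"}),
)
print(response["Body"].read().decode("utf-8"))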

Deploy Resources

  • deploy
    • bge_deploy.ipynb : deploy the BGE embedding model
    • llama_deploy.ipynb : deploy the Llama 2 LLM
