# Deploying Large Language Models (LLMs) on an AWS SageMaker ml.g4dn.2xlarge Instance

This notebook demonstrates deploying an LLM using the Hugging Face Transformers library. It covers installing the required libraries, loading a pre-trained model and tokenizer, setting up a pipeline for text-to-text generation, and deploying the same model to a SageMaker endpoint.


In [1]:
# Library Installation
# The following libraries are installed for this notebook:
# - `transformers`: Provides access to pre-trained models and utilities for NLP tasks.
# - `einops`: Offers flexible and powerful tensor operations.
# - `accelerate`: Handles device placement (needed for `device_map="auto"`) and simplifies multi-GPU setups.
# - `bitsandbytes`: Provides quantization and 8-bit optimizers to reduce GPU memory use during training and inference.
# - `langchain`: Wraps the Hugging Face pipeline as an LLM object (`HuggingFacePipeline`).
# Together, these libraries let us use Hugging Face's ecosystem for model handling and optimization.
!pip install transformers einops accelerate bitsandbytes langchain

Installing collected packages: safetensors, regex, einops, huggingface-hub, bitsandbytes, tokenizers, accelerate, transformers
Successfully installed accelerate-0.27.0 bitsandbytes-0.42.0 einops-0.7.0 huggingface-hub-0.20.3 regex-2023.12.25 safetensors-0.4.2 tokenizers-0.15.1 transformers-4.37.2
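The install log above lists only the packages that were newly installed, so it can be useful to confirm what is actually available in the kernel. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages requested by the pip command above.
PACKAGES = ["transformers", "einops", "accelerate", "bitsandbytes", "langchain"]


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If a package is reported as missing, restarting the kernel after the install sometimes resolves it.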


## The first section demonstrates using the LLM from Hugging Face without deploying it to AWS


In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch
import base64

In [None]:
model_name = "MBZUAI/LaMini-T5-738M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float32)
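Loading in `torch.float32` uses 4 bytes per parameter, so a rough estimate of the weight memory helps when deciding whether the model fits on the target GPU. The sketch below is a back-of-the-envelope calculation only; the parameter count is an assumption read off the "738M" in the model name, and the helper `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: approximate parameter count taken from the "738M" in the model name.
APPROX_PARAMS = 738_000_000

for dtype_name, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{dtype_name}: ~{weight_memory_gb(APPROX_PARAMS, nbytes):.2f} GB")
```

In float32 this comes to roughly 3 GB for the weights, well under the 16 GB of the single NVIDIA T4 GPU on an ml.g4dn.2xlarge instance. Activations and framework overhead add to this figure, so treat it as a lower bound.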

In [21]:
from langchain.llms import HuggingFacePipeline
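def llm_pipeline():
    """Wrap the local model and tokenizer in a LangChain-compatible generation pipeline."""
    pipe = pipeline(
        'text2text-generation',
        model=base_model,
        tokenizer=tokenizer,
        max_length=256,   # Upper bound on the generated sequence length, in tokens.
        do_sample=True,   # Sample from the distribution instead of decoding greedily.
        temperature=0.3,  # Values below 1 make the output more deterministic.
        top_p=0.95        # Nucleus sampling: keep the most likely tokens covering 95% of the probability mass.
    )
    local_llm = HuggingFacePipeline(pipeline=pipe)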
    return local_llm
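To make the effect of `temperature` and `top_p` in the pipeline above more concrete, here is a small pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. It is an illustration of the idea, not the implementation used inside `transformers`; the function names `top_p_filter` and `sample_token` are hypothetical.

```python
import math
import random


def top_p_filter(logits, temperature=0.3, top_p=0.95):
    """Return {token index: renormalized probability} for the tokens kept by top-p."""
    # Temperature scaling: dividing by a value below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize over the kept tokens so the probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, temperature=0.3, top_p=0.95, rng=random):
    """Draw one token index from the filtered distribution."""
    nucleus = top_p_filter(logits, temperature, top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for a four-token vocabulary
print(top_p_filter(toy_logits, temperature=1.0))
print(top_p_filter(toy_logits, temperature=0.3))
```

With these toy logits, a temperature of 1.0 keeps three candidate tokens, while 0.3 concentrates almost all of the mass on the top token, so only that token survives the filter. This is why a low temperature tends to give more repeatable output.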

In [22]:
input_prompt = "Write an article on Artificial Intelligence"
model = llm_pipeline()


In [23]:
generated_text = model(input_prompt)
generated_text



'Artificial Intelligence, or AI, is a field of computer science that focuses on creating machines that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation. AI is a rapidly growing field that aims to create machines that can think, reason, and learn like humans. One of the most significant applications of AI is in healthcare. AI is used in various applications, such as image and speech recognition, natural language processing, and predictive analytics. AI algorithms are designed to analyze large amounts of data and identify patterns that can be used to make predictions and recommendations'

## The second section demonstrates deploying the LLM to AWS


In [39]:
# Import necessary libraries for interacting with AWS services and Hugging Face.
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Attempt to get the execution role for SageMaker. This role is needed to give SageMaker access to AWS resources.
try:
    role = sagemaker.get_execution_role()
except ValueError:
    # If unable to get the execution role directly (e.g., when running outside SageMaker), use boto3 to fetch the IAM role.
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Configuration for the Hugging Face model to be deployed.
# This includes the model ID from the Hugging Face Model Hub, the task type, device mapping, and data type settings.
hub = {
    'HF_MODEL_ID': model_name,  # The model ID from Hugging Face Hub.
    'HF_TASK': 'text2text-generation',  # Specifies the task for the model, here text-to-text generation (the task name uses a hyphen).
    'device_map': 'auto',  # Intended to allow automatic device mapping; the container may not read this variable.
    'torch_dtype': 'torch_float32'  # Intended to set the data type for model tensors; the container may not read this variable.
}

# Create a Hugging Face Model Class for deployment.
# This includes specifying the Docker image URI for the model, environmental variables (model configuration), and the IAM role.
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.8.2"),  # Get the Docker image URI for Hugging Face.
    env=hub,  # Pass the model configuration as environment variables.
    role=role,  # IAM role with permissions for SageMaker operations.
)

# Deploy the model to SageMaker Inference.
# This step provisions the necessary infrastructure and deploys the model Docker container.
predictor = huggingface_model.deploy(
    initial_instance_count=1,  # Number of instances to start for the deployment.
    instance_type="ml.g4dn.2xlarge",  # The type of instance to use for the deployment.
    container_startup_health_check_timeout=300,  # Timeout in seconds for the container health check.
)
  
# Send a prediction request to the deployed model.
# This sends a sample input to the model and retrieves the prediction.
predictor.predict({
    "inputs": "how can I become more healthy?",  # The input question for the model.
})


---------!

[{'generated_text': 'You can become more healthy by: 1. Eating a balanced and nutritious diet 2. Regular exercise 3.'}]
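The response above stops partway through its list, which suggests the container applied a short default generation length because the request carried no generation settings. Such settings are passed in a `parameters` dictionary next to `inputs`. Since a misspelled parameter name can go unnoticed, the sketch below builds the payload and rejects unknown names. The helper `build_payload` is hypothetical, and the allowed set is an assumption based on names commonly accepted by Hugging Face text-generation containers; check the documentation for the image version you deploy.

```python
# Assumption: parameter names commonly accepted by Hugging Face
# text-generation containers. Adjust to match the deployed image.
ALLOWED_PARAMETERS = {
    "do_sample",
    "temperature",
    "top_p",
    "top_k",
    "max_new_tokens",
    "repetition_penalty",
}


def build_payload(prompt, **parameters):
    """Build a request body, rejecting unknown (for example misspelled) parameter names."""
    unknown = sorted(set(parameters) - ALLOWED_PARAMETERS)
    if unknown:
        raise ValueError(f"Unknown generation parameters: {unknown}")
    payload = {"inputs": prompt}
    if parameters:
        payload["parameters"] = dict(parameters)
    return payload


payload = build_payload(
    "how can I become more healthy?",
    max_new_tokens=256,
    temperature=0.3,
)
print(payload)
# response = predictor.predict(payload)  # requires the deployed endpoint
```

Validating on the client side is a design choice: it catches typos before a request is sent, at the cost of keeping the allowed set in sync with the container.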

In [40]:
prompt = "Write a short article on AWS"

payload = {
    "inputs": prompt,
    "parameters": {
        "do_sample": True,
        "top_p": 0.7,
        "temperature": 0.3,
        "top_k": 50,
        "max_new_tokens": 512,
        "repetition_penalty": 1.03
    }
}


response = predictor.predict(payload)
response

[{'generated_text': 'AWS stands for Amazon Web Services, which is a global network of servers that provide a wide range of cloud computing services and resources to users worldwide. AWS is one of the largest cloud computing services in the world, and it has been instrumental in transforming the way we access, share, and access information. One of the most popular services offered by AWS is the Amazon Web Services (AWS) service. AWS offers a range of services, including web hosting, email services, cloud storage, and mobile apps. AWS has a wide range of features, including real-time updates, real-'}]
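A deployed endpoint keeps its GPU instance running, and billing continues, until it is deleted. Once you are done experimenting, a cleanup step along the following lines can be run. This is a sketch, assuming `predictor` is the object returned by `deploy()` earlier in the notebook; the wrapper name `tear_down` is hypothetical, while `delete_model()` and `delete_endpoint()` are methods of the SageMaker SDK predictor.

```python
def tear_down(predictor):
    """Delete the model and the endpoint behind a SageMaker predictor."""
    # Delete the model first, while the endpoint configuration that
    # references it still exists, then remove the endpoint itself.
    predictor.delete_model()
    predictor.delete_endpoint()


# tear_down(predictor)  # uncomment when the endpoint is no longer needed
```

After this runs, further calls to `predictor.predict(...)` will fail until the model is deployed again.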