# Basic Prompt Engineering

## Step 1. Prepare Large Language Model (LLM) and Embedding Model 
---

In [1]:
%load_ext autoreload
%autoreload 2
import sys
sys.path.append('../utils')
sys.path.append('../templates') 

In [4]:
import glob
import json
import os
import time

import boto3
import pandas as pd
import requests
import sagemaker
from sagemaker.session import Session
from sagemaker.model import Model
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base
from typing import Any, Dict, List, Optional
from ssm import parameter_store

sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
pm = parameter_store(aws_region)
pm_restapi_id_key = f"RESTAPI-ID-{aws_region.upper()}"

try:
    RESTAPI_ID = pm.get_params(key=pm_restapi_id_key)
    URL = f'https://{RESTAPI_ID}.execute-api.{aws_region}.amazonaws.com/api/'.replace('"','')
    print("RESTAPI_ID = ", RESTAPI_ID) # YOUR RESTAPI ID
    print("API GATEWAY URL = ", URL)    
except Exception as e:
    print("[ERROR] Please run 0_setup.ipynb first!")

[ERROR] Please run 0_setup.ipynb first!


In [5]:
MODEL_NAME = "FALCON-40B" 
#MODEL_NAME = "LLAMA2-7B" 

LLM_INFO = {
    "LLAMA2-7B": f"{URL}llm/llama2_7b", # g5.12xlarge * 4ea
    "FALCON-40B": f"{URL}llm/falcon_40b",    # g5.48xlarge * 8ea 
    "KULLM-12-8B": f"{URL}llm/kkulm_12_8b", # g5.24xlarge * 4ea
}

LLM_URL = LLM_INFO[MODEL_NAME]
EMB_URL = f"{URL}emb/gptj_6b"              # g5.4xlarge * 4ea (URL already ends with '/')

HEADERS = {    
    'Content-Type': 'application/json',
    'Accept': 'application/json',
}

if 'falcon_40b' in LLM_URL:
    LLM_RESPONSE_KEY = "generated_text"
else:
    LLM_RESPONSE_KEY = "generation"
    
print (f'MODEL_NAME: {MODEL_NAME}\nLLM_URL: {LLM_URL}')    

MODEL_NAME: FALCON-40B
LLM_URL: https://6bk4r5mo4f.execute-api.us-east-1.amazonaws.com/api/llm/falcon_40b
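
Because `URL` already ends with `/api/`, building endpoint paths by string concatenation can silently produce a doubled or missing slash. A minimal sketch of a joining helper follows. `join_url` is a hypothetical name, not part of this notebook's utilities, and the base URL is a placeholder rather than a real API Gateway ID.

```python
def join_url(base: str, *parts: str) -> str:
    """Join URL segments with exactly one slash between them."""
    # Strip slashes at the joints only; the "//" after the scheme stays intact.
    segments = [base.rstrip("/")] + [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(segments)


# Placeholder base URL for illustration; the notebook reads the real one from Parameter Store.
example_base = "https://example.execute-api.us-east-1.amazonaws.com/api/"
example_llm_url = join_url(example_base, "llm", "falcon_40b")
example_emb_url = join_url(example_base, "/emb/gptj_6b")
```

With a helper like this, it no longer matters whether the base URL or the path carries the slash.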


In [6]:
PARAMS = {
    "LLAMA2-7B": {
        'max_new_tokens': 128,
        'top_p': 0.9,
        'temperature': 0.1,
        'return_full_text': False
    },    
    "FALCON-40B": {
        "max_new_tokens": 128,
        "max_length": 256,
        "top_p": 0.95,
        "do_sample": False,
        "temperature": 0.2,
        "return_full_text": False,
        "include_prompt_in_result": False
    } 
}

<br>

## Step 2. Ask the LLM a Question without RAG
---

### Simple prompt engineering

In [7]:
from lib_en import Llama2ContentHandlerAmazonAPIGateway, FalconContentHandlerAmazonAPIGateway
from langchain.llms import AmazonAPIGateway

llm = AmazonAPIGateway(api_url=LLM_URL, headers=HEADERS)

# Attach the content handler that matches the model's request/response format
if MODEL_NAME == "FALCON-40B":
    llm.content_handler = FalconContentHandlerAmazonAPIGateway()
elif MODEL_NAME in ["LLAMA2-7B", "LLAMA2-13B"]:
    llm.content_handler = Llama2ContentHandlerAmazonAPIGateway()

params = PARAMS[MODEL_NAME]
llm.model_kwargs = params

In [8]:
%%time

payload = {
    "inputs": "Generative AI is",
    "parameters": params
}
response = requests.post(url=LLM_URL, headers=HEADERS, json=payload)
print(response.json()[0][LLM_RESPONSE_KEY])

 a type of AI that can create new content or ideas based on existing data. It can be used in a variety of fields such as art, music, and even scientific research. Generative AI can be trained on large datasets and then generate new content based on the patterns and trends it has learned from the data. This can be useful in fields such as natural language processing, where it can generate new text based on existing text, or in image generation, where it can create new images based on existing images.
CPU times: user 9.38 ms, sys: 4.13 ms, total: 13.5 ms
Wall time: 5.5 s
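
The call above indexes straight into `response.json()[0][LLM_RESPONSE_KEY]`, which fails with an opaque `KeyError` or `TypeError` if the endpoint returns an error body instead of a generation. The sketch below shows one way to validate the parsed body first. `extract_generated_text` is a hypothetical helper, and the error-body shape in the example is an assumption, not documented behavior of this API.

```python
from typing import Any


def extract_generated_text(body: Any, response_key: str) -> str:
    """Return the generated text from a parsed JSON response body.

    Assumes the success shape used in this notebook: a non-empty list whose
    first item is a dict containing `response_key`.
    """
    if not isinstance(body, list) or not body:
        raise ValueError(f"Unexpected response body: {body!r}")
    first = body[0]
    if not isinstance(first, dict) or response_key not in first:
        raise ValueError(f"Key {response_key!r} missing from response item: {first!r}")
    return first[response_key]


# Illustrative bodies only; real ones come from response.json().
ok_body = [{"generated_text": " a type of AI that can create new content."}]
text = extract_generated_text(ok_body, "generated_text")

try:
    extract_generated_text({"message": "Endpoint request timed out"}, "generated_text")
    error_message = ""
except ValueError as err:
    error_message = str(err)
```

In the notebook this would be used as `extract_generated_text(response.json(), LLM_RESPONSE_KEY)`, ideally after `response.raise_for_status()` so that HTTP errors surface before any parsing.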


In [9]:
%%time

payload = {
    "inputs": """A brief email message of Amazon SageMaker's main features

Hi everyone,

We are announcing""",
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}
response = requests.post(url=LLM_URL, headers=HEADERS, json=payload)
print(response.json()[0][LLM_RESPONSE_KEY])

 the launch of Amazon SageMaker, a fully-managed service that provides everything you need to build, train, and deploy machine learning models.

Here are some of the main features of Amazon SageMaker:

1. Fully-managed service: Amazon SageMaker is a fully-managed service that provides everything you
CPU times: user 13.7 ms, sys: 371 µs, total: 14.1 ms
Wall time: 3.33 s
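
The second request spells out its generation parameters inline instead of reusing `params`. One way to keep the per-model defaults and change only a few values per call is sketched below. `build_payload` is a hypothetical helper, and the defaults dict is an abbreviated copy of the `FALCON-40B` entry in `PARAMS`, repeated only to keep the example self-contained.

```python
from typing import Any, Dict


def build_payload(prompt: str, defaults: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Build a request payload, applying per-call overrides on top of defaults."""
    # Merge into a new dict so the shared defaults are never mutated between calls.
    parameters = {**defaults, **overrides}
    return {"inputs": prompt, "parameters": parameters}


# Abbreviated FALCON-40B defaults (assumption: the remaining keys behave the same way).
falcon_defaults = {"max_new_tokens": 128, "top_p": 0.95, "temperature": 0.2, "return_full_text": False}

email_payload = build_payload(
    "Hi everyone,\n\nWe are announcing",
    falcon_defaults,
    max_new_tokens=64,
    temperature=0.6,
)
```

Note that a merged payload carries every default that is not overridden, so it is not identical to the hand-written parameters in the cell above.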


### More complex prompts: Play the role of an AWS solutions architect (SA)


In [10]:
architect_prompt = """
Play the role of a solution architect experienced with AWS. You are analysing customer requirements to create
well-architected solution architectures that you present to the customer. You are detailled, kind and
focussed. Given the following context

Context:
#System Requirements:
{requirements}
#Scale:
{scale}
#Features:
{features}

Describe an architecture on AWS in technical detail with sentences.
"""
prompt = architect_prompt.format(
    requirements="A website for computer advertising", 
    scale="Must handle 10k requests per second in peak. Must be globally available. Must be reponsive and fast", 
    features="Landing page describing our product. About page describing the company."
)
print(prompt)


Play the role of a solution architect experienced with AWS. You are analysing customer requirements to create
well-architected solution architectures that you present to the customer. You are detailled, kind and
focussed. Given the following context

Context:
#System Requirements:
A website for computer advertising
#Scale:
Must handle 10k requests per second in peak. Must be globally available. Must be reponsive and fast
#Features:
Landing page describing our product. About page describing the company.

Describe an architecture on AWS in technical detail with sentences.
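
`str.format` raises a bare `KeyError` when a placeholder is left unfilled, which becomes easy to hit as templates grow. The sketch below lists a template's placeholders with the standard library's `string.Formatter` and reports which ones are missing before formatting. `missing_fields` is a hypothetical helper, and the short template only stands in for `architect_prompt`.

```python
from string import Formatter
from typing import Dict, List


def missing_fields(template: str, values: Dict[str, str]) -> List[str]:
    """Return placeholder names in `template` that `values` does not supply."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so it is filtered out.
    names = [field for _, field, _, _ in Formatter().parse(template) if field]
    return [name for name in names if name not in values]


demo_template = "#System Requirements:\n{requirements}\n#Scale:\n{scale}\n#Features:\n{features}\n"
demo_values = {
    "requirements": "A website for computer advertising",
    "scale": "Must handle 10k requests per second in peak",
}

unfilled = missing_fields(demo_template, demo_values)
```

Here `unfilled` names the one placeholder that `demo_values` omits, so the caller can report it before calling `format`.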



In [11]:
payload = {
    'inputs': prompt,
    'parameters': params
}

response = requests.post(url=LLM_URL, headers=HEADERS, json=payload)
print(response.json()[0][LLM_RESPONSE_KEY])

The architecture would consist of the following components:

1. Web servers: EC2 instances running Apache web server with SSL certificate installed.
2. Load balancer: ELB with SSL certificate installed to distribute traffic across multiple web servers.
3. Auto scaling: Auto scaling group to automatically add or remove web servers based on traffic.
4. Database: RDS instance with MySQL database to store user information and product details.
5. Content delivery network (CDN): CloudFront to cache and distribute static content such as images and videos.
6. Security: IAM roles to control access to resources, WA


### Applying LangChain

In [12]:
llm.model_kwargs = params
print(llm(prompt))

The architecture on AWS would include the following components:

1. Elastic Load Balancer (ELB) - This component would distribute incoming traffic across multiple instances of the application to ensure high availability and scalability.

2. Auto Scaling Group - This component would automatically scale up or down the number of instances based on the load on the application.

3. EC2 instances - These instances would run the application and handle incoming requests.

4. RDS database - This component would store user data and ensure high availability and scalability.

5. S3 bucket - This component would


In [13]:
from langchain.prompts import PromptTemplate

# Wrap the format string in a PromptTemplate so its input variables are explicit
architect_template = PromptTemplate(
    input_variables=["requirements", "scale", "features"],
    template=architect_prompt,
)

# Fill the template through the PromptTemplate instead of the raw string
final_prompt = architect_template.format(
    requirements="External facing web application written in Javascript, global deployment",
    scale="Average of 500 requests per minute, scale events up to 3000 requests per second",
    features="Mobile website, desktop version, javascript"
)
print(final_prompt)


Play the role of a solution architect experienced with AWS. You are analysing customer requirements to create
well-architected solution architectures that you present to the customer. You are detailled, kind and
focussed. Given the following context

Context:
#System Requirements:
External facing web application written in Javascript, global deployment
#Scale:
Average of 500 requests per minute, scale events up to 3000 requests per second
#Features:
Mobile website, desktop version, javascript

Describe an architecture on AWS in technical detail with sentences.



In [14]:
print(llm(final_prompt))

The architecture would consist of the following components:

1. Elastic Load Balancer (ELB) - This component would distribute incoming traffic across multiple instances of the web application.

2. Auto Scaling Group - This component would automatically scale up or down the number of instances based on the load on the application.

3. EC2 instances - These instances would run the web application and would be managed by the Auto Scaling Group.

4. RDS (Relational Database Service) - This component would provide a scalable and reliable database for the application.

5. S


In [15]:
topic_recommender_prompt = "List {number} topics to write on blog posts about {topic}"

recommend_topic_prompt = PromptTemplate(
    template=topic_recommender_prompt,
    input_variables=['topic', 'number']
)

results = llm(recommend_topic_prompt.format(topic="Machine Learning", number=5))
print(results)


1. Introduction to Machine Learning
2. Supervised Learning
3. Unsupervised Learning
4. Reinforcement Learning
5. Deep Learning


In [16]:
from langchain.output_parsers import CommaSeparatedListOutputParser
parsed_recommender_prompt = topic_recommender_prompt + "\n{format_instructions}"

parser = CommaSeparatedListOutputParser()

parsed_recommender_template = PromptTemplate(
    template=parsed_recommender_prompt,
    input_variables=['topic', 'number'],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

In [17]:
gen_prompt = parsed_recommender_template.format(topic='Generative AI', number=5)
print(gen_prompt)

List 5 topics to write on blog posts about Generative AI
Your response should be a list of comma separated values, eg: `foo, bar, baz`


In [18]:
output = llm(gen_prompt)
print(output)


1. How Generative AI can help in generating human-like text
2. The future of Generative AI in content creation
3. How Generative AI can help in generating images and videos
4. The ethical implications of Generative AI
5. The potential of Generative AI in the healthcare industry
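
Although the prompt asks for comma-separated values, the model returned a numbered list, so a strict comma split would not recover the five topics. The following is a minimal sketch of a tolerant post-processing step that accepts either shape. `parse_topics` is a hypothetical helper written with the standard library only; it is not part of LangChain.

```python
import re
from typing import List


def parse_topics(text: str) -> List[str]:
    """Split model output into items, accepting numbered lists or comma-separated text."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    numbered = [re.match(r"^\d+[.)]\s*(.+)$", line) for line in lines]
    if lines and all(numbered):
        # Every non-empty line looks like "1. item" or "1) item".
        return [m.group(1).strip() for m in numbered]
    # Otherwise fall back to a comma-separated reading of the whole text.
    return [item.strip() for item in text.split(",") if item.strip()]


numbered_output = """
1. How Generative AI can help in generating human-like text
2. The future of Generative AI in content creation
"""
topics = parse_topics(numbered_output)
csv_topics = parse_topics("foo, bar, baz")
```

In the notebook, `parse_topics(output)` would turn the response above into a Python list regardless of which format the model chose.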
