# LLMs in a Box
1. Create a SageMaker Studio Domain if you don't have one
2. Open SageMaker Studio under the user you plan to launch this application with
3. Either upload this notebook, or clone the repository: [repo](https://github.com/chaeAclark/literate-eureka.git)
4. Open the notebook `LLM in a box.ipynb`
5. You can run the entire notebook by clicking Run > Run All Cells
6. Alternatively, you can run the cells individually

### Terminal Installation
Make sure the following packages are installed in the terminal environment you will run the app from:
1. boto3
2. streamlit
3. pdf2image
4. ai21[SM]
5. Pillow
6. pandas

In [None]:
%%writefile requirements.txt
boto3
streamlit
pdf2image
ai21[SM]
Pillow
pandas
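Before starting the app, it can help to confirm that the packages listed above are importable in the current environment. The sketch below is a minimal check that uses only the standard library. The helper name `missing_modules` is hypothetical, and the import names (`PIL` for Pillow, `ai21` for `ai21[SM]`) are assumptions based on how these distributions are usually imported.

```python
import importlib.util

# Import names for the packages in requirements.txt.
# Assumption: Pillow is imported as PIL and ai21[SM] as ai21.
REQUIRED_MODULES = ["boto3", "streamlit", "pdf2image", "ai21", "PIL", "pandas"]


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Install them with: pip install -r requirements.txt")
    else:
        print("All required modules are importable.")
```

If anything is reported as missing, running `pip install -r requirements.txt` in the same terminal should install it.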

# Install

#### Update SageMaker

In [2]:
!pip install -U sagemaker --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pytest-astropy 0.8.0 requires pytest-cov>=2.0, which is not installed.
pytest-astropy 0.8.0 requires pytest-filter-subpackage>=0.1, which is not installed.
docker-compose 1.29.2 requires PyYAML<6,>=3.10, but you have pyyaml 6.0 which is incompatible.
awscli 1.27.111 requires botocore==1.29.111, but you have botocore 1.29.135 which is incompatible.
awscli 1.27.111 requires PyYAML<5.5,>=3.10, but you have pyyaml 6.0 which is incompatible.
awscli 1.27.111 requires rsa<4.8,>=3.1.2, but you have rsa 4.9 which is incompatible.
aiobotocore 2.4.2 requires botocore<1.27.60,>=1.27.59, but you have botocore 1.29.135 which is incompatible.

[notice] A new release of pip is available: 23.0.1 -> 23.1.2

# Imports

### General Libraries

In [2]:
import os
import json
import boto3

### SageMaker Libraries

In [3]:
import sagemaker as sm

from sagemaker import image_uris
from sagemaker import model_uris
from sagemaker import script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

### Deploy and Directory Setup

In [4]:
def get_sagemaker_session(local_download_dir) -> sm.Session:
    """
    Create a SageMaker Session that downloads artifacts to `local_download_dir`.

    The SageMaker Session object is used to create the SageMaker Model,
    SageMaker Endpoint Config, and SageMaker Endpoint.
    """
    sagemaker_client = boto3.client(service_name="sagemaker", region_name=boto3.Session().region_name)
    session_settings = sm.session_settings.SessionSettings(local_download_dir=local_download_dir)
    session = sm.session.Session(sagemaker_client=sagemaker_client, settings=session_settings)
    return session

In [5]:
model_path = './download_dir'
os.makedirs(model_path, exist_ok=True)  # no-op if the directory already exists

### SageMaker Configurations

In [6]:
role               = sm.get_execution_role()
sagemaker_session  = get_sagemaker_session(model_path) # sm.session.Session()
region             = sagemaker_session.boto_region_name  # public attribute instead of the private _region_name

# These are needed to show where the streamlit app is hosted
with open('/opt/ml/metadata/resource-metadata.json', 'r') as f:
    sagemaker_metadata = json.load(f)
domain_id          = sagemaker_metadata['DomainId']
resource_name      = sagemaker_metadata['ResourceName']
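The metadata file read above is only present inside a SageMaker Studio environment, so the cell fails elsewhere. As a hedged sketch, the hypothetical helper `load_studio_metadata` below returns `None` when the file is missing instead of raising, which makes it easier to run the rest of the notebook outside Studio. The default path is the one used in the cell above.

```python
import json
from pathlib import Path

STUDIO_METADATA_PATH = "/opt/ml/metadata/resource-metadata.json"


def load_studio_metadata(path=STUDIO_METADATA_PATH):
    """Return the Studio resource metadata as a dict, or None if the file is absent."""
    metadata_file = Path(path)
    if not metadata_file.is_file():
        return None
    with metadata_file.open("r") as f:
        return json.load(f)
```

A caller could then check for `None` before reading `DomainId` and skip printing the Studio proxy link when the metadata is unavailable.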

### Boto Configurations

# Model
The following section deploys the JumpStart model `flan-###`. Launching 3rd-party proprietary models requires additional steps, which are detailed in the section "How to deploy 3rd-party Foundation Models" below.

### Select Model

In [9]:
filter_value = "task == text2text"
text_generation_models = list_jumpstart_models(filter=filter_value)
print('Available text2text Models:\n--------------------------------')
for m in text_generation_models:
    print(m)

Available text2text Models:
--------------------------------
huggingface-text2text-bart4csc-base-chinese
huggingface-text2text-bigscience-t0pp
huggingface-text2text-bigscience-t0pp-bnb-int8
huggingface-text2text-bigscience-t0pp-fp16
huggingface-text2text-flan-t5-base
huggingface-text2text-flan-t5-base-samsum
huggingface-text2text-flan-t5-large
huggingface-text2text-flan-t5-small
huggingface-text2text-flan-t5-xl
huggingface-text2text-flan-t5-xxl
huggingface-text2text-flan-t5-xxl-bnb-int8
huggingface-text2text-flan-t5-xxl-fp16
huggingface-text2text-flan-ul2-bf16
huggingface-text2text-pegasus-paraphrase
huggingface-text2text-qcpg-sentences
huggingface-text2text-t5-one-line-summary


In [10]:
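# Index 7 is flan-t5-small in the list printed above; the index may shift if the JumpStart catalog changes
model_id = text_generation_models[7]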
model_version = '*'
print(f'The model that will be deployed is: {model_id}')

The model that will be deployed is: huggingface-text2text-flan-t5-small
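Selecting the model by list position depends on the order of the catalog at the time the cell runs. A minimal sketch of an alternative is to select by exact model ID and fail loudly if it is not available. The helper `select_model` and the `sample_models` list are hypothetical; in the notebook, the list returned by `list_jumpstart_models` would be passed instead.

```python
def select_model(available_models, wanted_id):
    """Return `wanted_id` if it is in `available_models`, else raise ValueError."""
    if wanted_id not in available_models:
        raise ValueError(f"Model {wanted_id!r} is not in the available model list.")
    return wanted_id


# Sample of the IDs printed above; in the notebook, pass text_generation_models instead.
sample_models = [
    "huggingface-text2text-flan-t5-large",
    "huggingface-text2text-flan-t5-small",
    "huggingface-text2text-flan-t5-xl",
]
model_id = select_model(sample_models, "huggingface-text2text-flan-t5-small")
print(f"The model that will be deployed is: {model_id}")
```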


### Deploy

In [11]:
endpoint_name = name_from_base(f"LLM-in-a-box-{model_id}")
print(f'Endpoint: {endpoint_name}')

Endpoint: LLM-in-a-box-huggingface-text2text-flan-2023-05-18-19-23-11-479
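Note that the printed name ends in `flan` rather than `flan-t5-small`. This is consistent with `name_from_base` shortening the base string so that the result, including the timestamp, fits within the endpoint name length limit. The sketch below checks a candidate name against that limit. The pattern is an assumption based on the documented constraints for endpoint names (at most 63 characters, alphanumerics and hyphens, starting and ending with an alphanumeric), and `is_valid_endpoint_name` is a hypothetical helper.

```python
import re

# Assumption: SageMaker endpoint names allow up to 63 characters, using
# alphanumerics and hyphens, and must start and end with an alphanumeric.
ENDPOINT_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")


def is_valid_endpoint_name(name):
    """Return True if `name` satisfies the assumed endpoint naming constraints."""
    return ENDPOINT_NAME_PATTERN.fullmatch(name) is not None


name = "LLM-in-a-box-huggingface-text2text-flan-2023-05-18-19-23-11-479"
print(is_valid_endpoint_name(name), len(name))
```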


#### Collect Model Containers

In [12]:
instance_type = "ml.g5.2xlarge"

image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=instance_type,
)

model_data = model_uris.retrieve(
    model_id=model_id,
    model_version=model_version,
    model_scope="inference"
)

print(f'The image URI is:  {image_uri}')
print(f'The model data is: {model_data}')

The image URI is:  763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
The model data is: s3://jumpstart-cache-prod-us-east-1/huggingface-infer/prepack/v1.0.1/infer-prepack-huggingface-text2text-flan-t5-small.tar.gz


#### Define Model

In [13]:
model = Model(
    image_uri=image_uri,
    model_data=model_data,
    role=role,
    predictor_cls=Predictor,
    name=endpoint_name,
    sagemaker_session=sagemaker_session,
    env={"TS_DEFAULT_WORKERS_PER_MODEL": "1"}
)

#### Deploy Model

In [14]:
model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)

-------!

#### Test that the model is correctly deployed

In [16]:
# Named sagemaker_runtime to avoid confusion with the sagemaker package imported above as sm
sagemaker_runtime = boto3.client('sagemaker-runtime', region_name=region)
input_question = 'Tell me the steps to make a pizza:'
payload = {
    "text_inputs": input_question,
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}

response = sagemaker_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload).encode('utf-8')
)
output_answer = json.loads(response['Body'].read().decode('utf-8'))["generated_texts"][0]
print(output_answer)

To make a pizza, first you should use 3 pizzas per day for 2 pizzas. To make a pizza, you will need 2 slices each. You don't need to use a heavy spout to scoop
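The request and response handling in the test cell can be separated from the network call, which makes it easier to reuse and to check without a live endpoint. The sketch below is one way to do that. The helper names `build_payload` and `parse_generated_text` are hypothetical, and the payload keys and the `generated_texts` response field are taken from the cell above.

```python
import json


def build_payload(question, max_length=50, top_k=50, top_p=0.95):
    """Build the request body using the same keys as the test cell above."""
    return {
        "text_inputs": question,
        "max_length": max_length,
        "max_time": 50,
        "num_return_sequences": 1,
        "top_k": top_k,
        "top_p": top_p,
        "do_sample": True,
    }


def parse_generated_text(body_bytes):
    """Decode a JSON response body and return the first generated text."""
    response = json.loads(body_bytes.decode("utf-8"))
    texts = response.get("generated_texts", [])
    if not texts:
        raise ValueError("Response did not contain any generated text.")
    return texts[0]
```

With a live endpoint, the bytes passed to `parse_generated_text` would come from `response['Body'].read()`.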


## How to deploy 3rd-party Foundation Models
1. Gain access to the foundation models
    1. Go to the SageMaker Console
    2. There will be a tab for JumpStart > Foundation Models
    3. You must request access if you do not already have it
2. Select the Foundation Model you would like to deploy
3. Click `Subscribe` in the top-right corner
4. After you complete the subscription, you will be able to open a notebook that deploys the model
5. Open the notebook
6. Run the notebook to deploy the model. The caveat is that your account must have access to the instance type you choose.
    1. For the AI21 Summarization model, you can use something like: ml.g4dn.12xlarge
    2. For AI21 Grande Instruct, you can use: ml.g5.24xlarge
    3. For AI21 Jumbo Instruct, you can use: ml.g5.48xlarge
    4. These were tested to work as of 2023-05-16
7. Collect the resulting endpoint names and use them in the application_metadata JSON

# Streamlit UI

### Record any parameters that need to be passed to the Streamlit app
App Metadata Structure:
#### application_metadata
 - models: a list of dictionaries, each containing the model display name, SageMaker endpoint name, and model type (currently 'sm' or 'ai21')
   - name
   - endpoint
   - type
 - summary_model: the summary model endpoint name
 - region: the AWS region (e.g., us-east-1)
 - role: the permissions for the application. It should include SageMaker, Textract, and Kendra access
 - datastore: a dictionary that contains the bucket and folder prefix used to store document data
   - bucket
   - prefix
 - kendra: a dictionary that contains information on the Kendra index to be used when searching
   - index_id
   - index_name
   - index_description
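A small check on the metadata before writing it to disk can catch mistakes such as an empty endpoint name. The sketch below is a hypothetical validator, not part of the application. It assumes that `models`, `region`, and `role` are the required top-level keys and that `sm` and `ai21` are the only model types, as described in the structure above.

```python
# Assumptions: these are the required fields and the only supported model types.
REQUIRED_MODEL_KEYS = {"name", "endpoint", "type"}
KNOWN_MODEL_TYPES = {"sm", "ai21"}


def validate_application_metadata(metadata):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for key in ("models", "region", "role"):
        if key not in metadata:
            problems.append(f"missing top-level key: {key}")
    for i, model in enumerate(metadata.get("models", [])):
        missing = REQUIRED_MODEL_KEYS - set(model)
        if missing:
            problems.append(f"models[{i}] is missing: {sorted(missing)}")
        if model.get("type") not in KNOWN_MODEL_TYPES:
            problems.append(f"models[{i}] has unknown type: {model.get('type')!r}")
        if not model.get("endpoint"):
            problems.append(f"models[{i}] has an empty endpoint name")
    return problems
```

Calling `validate_application_metadata(application_metadata)` and printing the result before saving would surface any placeholder entries that still need an endpoint name.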

In [7]:
application_metadata = {
    'models':[
        {'name':'FLAN-Small', 'endpoint':'LLM-in-a-box-huggingface-text2text-flan-2023-05-18-19-23-11-479', 'type':'sm'},
        {'name':'Super Fancy Model', 'endpoint':'', 'type':'ai21'}],
    'region':region,
    'role':role,
}
# Use a context manager so the file is flushed and closed before the app reads it
with open('application_metadata_qna.json', 'w') as f:
    json.dump(application_metadata, f)
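The app in the next section decides which backend to call by checking whether `huggingface` appears in the endpoint name, and it does not read the `type` field recorded here. As a hedged alternative, the sketch below routes on `type` instead. Both helper names are hypothetical, and the returned backend labels are illustrative strings rather than anything the app currently uses.

```python
def build_model_index(app_metadata):
    """Map each display name to its endpoint name and declared type."""
    return {
        m["name"]: {"endpoint": m["endpoint"], "type": m["type"]}
        for m in app_metadata["models"]
    }


def backend_for(model_index, model_name):
    """Return 'sagemaker-runtime' or 'ai21' for a display name, based on its type."""
    model_type = model_index[model_name]["type"]
    if model_type == "sm":
        return "sagemaker-runtime"
    if model_type == "ai21":
        return "ai21"
    raise ValueError(f"Unknown model type: {model_type!r}")
```

Routing on the declared type avoids depending on how an endpoint happens to be named.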

### Write the Streamlit app

In [10]:
%%writefile app_qna.py
import os
import ai21
import json
import boto3
import pandas as pd
import streamlit as st

from datetime import datetime
st.set_page_config(layout="wide")

with open('application_metadata_qna.json', 'r') as f:
    APP_MD    = json.load(f)
MODELS        = {d['name']: d['endpoint'] for d in APP_MD['models']}
REGION        = APP_MD['region']
SAGEMAKER     = boto3.client('sagemaker-runtime', region_name=REGION)
CHAT_FILENAME = 'chat.csv'

def query_endpoint(endpoint_name, payload):
    """
    Query the endpoint and return the answer.
    
    Parameters
    ----------
    endpoint_name : str
        The name of the endpoint.
    payload : dict
        The payload to send to the endpoint.
    
    Returns
    -------
    str
        The answer from the endpoint.
    
    Raises
    ------
    botocore.exceptions.ClientError
        If the SageMaker Runtime call fails (for example, the endpoint is not found).
    
    Notes
    -----
    This function is a wrapper around the SageMaker Runtime API.
    
    References
    ----------
    https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime.html#SageMaker.Client.invoke_endpoint
    https://docs.aws.amazon.com/sagemaker/latest/dg/API_InvokeEndpoint.html
    https://docs.aws.amazon.com/sagemaker/latest/dg/API_InvokeEndpoint.html#SageMaker-Type-InvokeEndpoint-Body
    https://docs.aws.amazon.com/sagemaker/latest/dg/API_InvokeEndpoint.html#SageMaker-Type-InvokeEndpoint-ContentType
    """
    if 'huggingface' in endpoint_name:
        response = SAGEMAKER.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType='application/json',
            Body=json.dumps(payload).encode('utf-8')
        )
        output_answer = json.loads(response['Body'].read().decode('utf-8'))["generated_texts"][0]
    else:
        response = ai21.Completion.execute(
            sm_endpoint=endpoint_name,
            prompt=payload['text_inputs'],
            maxTokens=payload['max_length'],
            temperature=payload['temperature'],
            stopSequences=['##'],
            numResults=1
        )
        output_answer = response['completions'][0]['data']['text']
    return str(output_answer)


def action_qna(params):
    """
    Generates the interface around asking questions of your LLM endpoint.
    
    Parameters
    ----------
    params : dict
        The parameters of the application.
    
    Returns
    -------
    None
    
    Notes
    -----
    This function is a wrapper around the Streamlit app.
    """
    st.title('Ask Questions of your Model')
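    try:
        chat_df = pd.read_csv(CHAT_FILENAME)
    except (FileNotFoundError, pd.errors.EmptyDataError):
        # No usable chat history yet, so start with an empty table
        chat_df = pd.DataFrame([], columns=['timestamp', 'question', 'response'])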
    
    input_question = st.text_input('**Please ask a question:**', '')
    if st.button('Send Question') and len(input_question) > 3:
        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M")
        context = '\n'.join(['Context: ' + str(r.question) + '##\n\n: ' + str(r.response) + '\n' for idx,r in chat_df.iloc[-10:].iterrows()])
        payload = {
            "text_inputs": input_question, #context + '\n' + input_question,
            "max_length": params['max_len'],
            "max_time": 50,
            "num_return_sequences": 1,
            "top_k": 50,
            "temperature": params['temp'],
            "top_p": params['top_p'],
            "do_sample": True,
        }
        output_answer = query_endpoint(params['endpoint'], payload)
        st.text_area('Response:', output_answer)
        chat_df.loc[len(chat_df.index)] = [timestamp, input_question, output_answer]
        chat_df.tail(10).to_csv(CHAT_FILENAME, index=False)
            
    st.subheader('Recent Questions:')
    for idx,row in chat_df.iloc[::-1].head(10).iterrows():
        st.write(f'**{row.timestamp}**')
        st.write(row.question)
        st.write(row.response)
        st.write('---')


def app_sidebar():
    """
    The sidebar of the Streamlit app.
    
    Returns
    -------
    dict
        The parameters of the application.
    
    Notes
    -----
    This function is a wrapper around the Streamlit sidebar.
    """
    with st.sidebar:
        st.write('## How to use:')
        description = """Welcome to our LLM tool extraction and query answering application. With this app, you can ask general questions, 
        ask questions of a specific document, or intelligently search an internal document corpus. By selecting the action you would like to perform,
         you can ask general questions, or questions of your document. Additionally, you can select the model you use, to perform real-world tests to determine model strengths and weaknesses."""
        st.write(description)
        st.write('---')
        st.write('### User Preference')
        if st.button('Clear Context'):
            pd.DataFrame([], columns=['timestamp', 'question', 'response']).to_csv(CHAT_FILENAME, index=False)
        action_name = st.selectbox('Choose Activity', options=['Question/Answer'])
        model_name = st.selectbox('Select Model', options=MODELS.keys())
        max_len = st.slider('Max Length', min_value=50, max_value=500, value=150, step=10)
        top_p = st.slider('Top p', min_value=0., max_value=1., value=1., step=.01)
        temp = st.slider('Temperature', min_value=0.01, max_value=1., value=1., step=.01)
        st.write('---')
        st.write('## FAQ')
        st.write(f'**1. Where is the model stored?** \n\nThe current model is: `{model_name}` and is running within your account.')
        st.write('**2. Where is my data stored?**\n\nA limited context window is stored locally. This is retained as a CSV file, but for production it would be prudent to use a more robust datastore (e.g., DynamoDB).')
        st.write('---')
        params = {'action_name':action_name,'endpoint':MODELS[model_name], 'max_len':max_len, 'top_p':top_p, 'temp':temp, 'model_name':model_name}
    return params


def main():
    params = app_sidebar()
    if params['action_name'] == 'Question/Answer':
        action_qna(params)
    else:
        raise ValueError('Invalid action name.')


if __name__ == '__main__':
    main()


Overwriting app_qna.py
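In `action_qna`, a `context` string is built from the last ten chat rows, but only the raw question is sent to the model. A minimal sketch of how recent history could be folded into the prompt is shown below. The helper `build_prompt` and its `Question:`/`Answer:` layout are assumptions for illustration and differ from the `Context:` format in the app; whether a given model benefits from this layout would need to be tested.

```python
def build_prompt(history, question, max_turns=10):
    """Join the most recent question/response pairs with the new question."""
    # Guard against max_turns=0, since history[-0:] would return the whole list.
    recent = history[-max_turns:] if max_turns > 0 else []
    turns = [
        f"Question: {turn['question']}\nAnswer: {turn['response']}"
        for turn in recent
    ]
    turns.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(turns)
```

In the app, `history` could be obtained with `chat_df.tail(10).to_dict('records')`, and the result passed as `text_inputs` in the payload.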


# Start App

### Run Streamlit
To run the application:
1. Select File > New > Terminal
2. In the terminal, use the command: `streamlit run app_qna.py --server.runOnSave true`
   1. Note: ensure you have installed all required packages
3. If this is successful, you will be able to interact with the app by using the web address below
4. When you run the above command, you should see output similar to the following.
5. The port that is displayed is the same port that MUST be used after the `proxy` folder in the address below.

You can now view your Streamlit app in your browser.

  Network URL: http://###.###.###.###:8501
  External URL: http://###.###.###.###:8501



#### Display Link to Application

In [19]:
print(f'http://{domain_id}.studio.{region}.sagemaker.aws/jupyter/default/proxy/8501/')

http://d-qxdwe39zkab0.studio.us-east-1.sagemaker.aws/jupyter/default/proxy/8501/
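Because the port in the proxy address must match the port Streamlit reports, it can be safer to build the link from the reported port than to hard-code `8501`. The sketch below shows one way to do that. Both helpers are hypothetical, and the URL layout is copied from the cell above rather than taken from any official specification.

```python
import re


def extract_port(streamlit_line):
    """Return the port from a line such as 'Network URL: http://10.0.0.1:8501'."""
    match = re.search(r"https?://[^\s/:]+:(\d+)", streamlit_line)
    if match is None:
        raise ValueError("No URL with a port found in the given line.")
    return int(match.group(1))


def studio_proxy_url(domain_id, region, port=8501):
    """Build the Studio proxy address in the same form as the cell above."""
    return f"http://{domain_id}.studio.{region}.sagemaker.aws/jupyter/default/proxy/{port}/"
```

For example, if Streamlit falls back to another port because `8501` is already in use, passing the extracted port to `studio_proxy_url` keeps the link consistent with the running app.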
