# LLMs in a Box for Search
1. Create a SageMaker Studio Domain if you don't have one
2. Open SageMaker Studio under the user profile you plan to launch this application with
3. Either upload this notebook, or clone the repository: [repo](https://github.com/chaeAclark/literate-eureka.git)
4. Open the notebook `LLM and Kendra in a box.ipynb`
5. You can run the entire notebook by clicking Run > Run All Cells
6. Alternatively, you can run the cells individually

### Terminal Installation
Make sure the following packages are installed in the environment of the terminal you use to run the app:
1. boto3
2. streamlit
3. pdf2image
4. ai21[SM]
5. Pillow
6. pandas
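A quick way to confirm the list above is satisfied is to ask Python whether each package can be imported. This is a minimal sketch, not part of the original notebook: the helper name `missing_packages` is hypothetical, and the mapping from pip name to import name assumes Pillow is imported as `PIL` and `ai21[SM]` as `ai21` (the app itself uses `import ai21`).

```python
import importlib.util

# Hypothetical mapping: pip package name -> name used in `import ...`
REQUIRED_MODULES = {
    "boto3": "boto3",
    "streamlit": "streamlit",
    "pdf2image": "pdf2image",
    "ai21[SM]": "ai21",
    "Pillow": "PIL",
    "pandas": "pandas",
}


def missing_packages(required):
    """Return the pip names whose top-level module cannot be found."""
    missing = []
    for pip_name, module_name in required.items():
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Running this in the same terminal you will launch Streamlit from checks the environment that actually matters.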

In [4]:
%%writefile requirements.txt
boto3
streamlit
pdf2image
ai21[SM]
Pillow
pandas

Overwriting requirements.txt


# Install

#### Update SageMaker

In [2]:
!pip install -U sagemaker --quiet



# Imports

### General Libraries

In [8]:
import os
import json
import boto3

### SageMaker Libraries

In [9]:
import sagemaker as sm

from sagemaker import image_uris
from sagemaker import model_uris
from sagemaker import script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

### Deploy and Directory Setup

In [10]:
def get_sagemaker_session(local_download_dir) -> sm.Session:
    """Create a SageMaker Session that uses `local_download_dir` as its local download directory.

    The Session object is used to create the SageMaker Model, Endpoint Config, and Endpoint.
    """
    sagemaker_client = boto3.client(service_name="sagemaker", region_name=boto3.Session().region_name)
    session_settings = sm.session_settings.SessionSettings(local_download_dir=local_download_dir)
    session = sm.session.Session(sagemaker_client=sagemaker_client, settings=session_settings)
    return session

In [11]:
model_path = './download_dir'
# Create the download directory if it does not already exist
os.makedirs(model_path, exist_ok=True)

### SageMaker Configurations

In [12]:
role               = sm.get_execution_role()
sagemaker_session  = get_sagemaker_session(model_path) # sm.session.Session()
region             = sagemaker_session.boto_region_name

# These are needed to show where the streamlit app is hosted
with open('/opt/ml/metadata/resource-metadata.json', 'r') as fp:
    sagemaker_metadata = json.load(fp)
domain_id          = sagemaker_metadata['DomainId']
resource_name      = sagemaker_metadata['ResourceName']
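The cell above assumes it runs inside SageMaker Studio, where `/opt/ml/metadata/resource-metadata.json` exists; elsewhere `open` raises `FileNotFoundError`. A minimal sketch of a more forgiving loader follows. The function names are hypothetical, and returning `None` values outside Studio is a design choice, not something the original notebook does.

```python
import json
import os

# Path used by the cell above; it exists only inside SageMaker Studio.
STUDIO_METADATA_PATH = "/opt/ml/metadata/resource-metadata.json"


def parse_studio_metadata(metadata):
    """Return (domain_id, resource_name); None for any field that is absent."""
    return metadata.get("DomainId"), metadata.get("ResourceName")


def load_studio_metadata(path=STUDIO_METADATA_PATH):
    """Read the Studio metadata file, or return (None, None) if it is missing."""
    if not os.path.exists(path):
        # For example, when the notebook runs outside SageMaker Studio
        return None, None
    with open(path, "r") as fp:
        return parse_studio_metadata(json.load(fp))
```

In the notebook this would be used as `domain_id, resource_name = load_studio_metadata()`.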

### Boto Configurations

In [13]:
# Replace these with the name and ID of your own Amazon Kendra index
index_name = 'DomainSpecificSearch'
index_id   = '2f16f2e2-f745-4bd6-bc0c-5fec8947e9c3'

# Model
The following section deploys the JumpStart model `flan-###`. Launching third-party proprietary models requires additional steps, which are detailed in a later section.

### Select Model

In [9]:
filter_value = "task == text2text"
text_generation_models = list_jumpstart_models(filter=filter_value)
print('Available text2text Models:\n--------------------------------')
for m in text_generation_models:
    print(m)

Available text2text Models:
--------------------------------
huggingface-text2text-bart4csc-base-chinese
huggingface-text2text-bigscience-t0pp
huggingface-text2text-bigscience-t0pp-bnb-int8
huggingface-text2text-bigscience-t0pp-fp16
huggingface-text2text-flan-t5-base
huggingface-text2text-flan-t5-base-samsum
huggingface-text2text-flan-t5-large
huggingface-text2text-flan-t5-small
huggingface-text2text-flan-t5-xl
huggingface-text2text-flan-t5-xxl
huggingface-text2text-flan-t5-xxl-bnb-int8
huggingface-text2text-flan-t5-xxl-fp16
huggingface-text2text-flan-ul2-bf16
huggingface-text2text-pegasus-paraphrase
huggingface-text2text-qcpg-sentences
huggingface-text2text-t5-one-line-summary
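Picking a model by list position (index 7 in the next cell) depends on the order `list_jumpstart_models` returns, which can shift as JumpStart adds models. A minimal sketch of selecting by name instead is shown below; `select_model` is a hypothetical helper, and matching on the ID suffix is an assumption that works for the IDs listed above.

```python
def select_model(model_ids, suffix):
    """Return the one model ID ending with `suffix`, or raise ValueError."""
    matches = [model_id for model_id in model_ids if model_id.endswith(suffix)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one model ending with {suffix!r}, found {matches}")
    return matches[0]


# Usage with a subset of the list printed above
available = [
    "huggingface-text2text-flan-t5-base",
    "huggingface-text2text-flan-t5-small",
    "huggingface-text2text-flan-t5-xxl",
    "huggingface-text2text-flan-t5-xxl-fp16",
]
print(select_model(available, "flan-t5-small"))
```

In the notebook this would read `model_id = select_model(text_generation_models, "flan-t5-small")`. Raising on zero or multiple matches avoids silently deploying the wrong variant (for example `flan-t5-xxl` versus `flan-t5-xxl-fp16`).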


In [10]:
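# Select the model by ID rather than by list position, which can shift as JumpStart adds models
model_id = 'huggingface-text2text-flan-t5-small'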
model_version = '*'
print(f'The model that will be deployed is: {model_id}')

The model that will be deployed is: huggingface-text2text-flan-t5-small


### Deploy

In [11]:
endpoint_name = name_from_base(f"LLM-in-a-box-{model_id}")
print(f'Endpoint: {endpoint_name}')

Endpoint: LLM-in-a-box-huggingface-text2text-flan-2023-05-18-22-09-56-127
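Note that the printed endpoint name has lost the `-t5-small` suffix. SageMaker endpoint names are limited to 63 characters, and `name_from_base` appends a timestamp and trims the base name to make room. The helper below is a hypothetical re-creation of that trimming, consistent with the output above; it is not the SDK function itself.

```python
def trimmed_endpoint_name(base, timestamp, max_length=63):
    """Hypothetical helper: fit `base` plus a timestamp into `max_length` characters.

    The base is cut so that f"{base}-{timestamp}" is at most max_length long.
    """
    trimmed_base = base[: max_length - len(timestamp) - 1]
    return f"{trimmed_base}-{timestamp}"


name = trimmed_endpoint_name(
    "LLM-in-a-box-huggingface-text2text-flan-t5-small",
    "2023-05-18-22-09-56-127",
)
print(name, len(name))
# LLM-in-a-box-huggingface-text2text-flan-2023-05-18-22-09-56-127 63
```

Because the trimmed name no longer says which FLAN size is deployed, it is worth recording `model_id` next to the endpoint name when you fill in `application_metadata`.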


#### Collect Model Containers

In [12]:
instance_type = "ml.g5.2xlarge"

image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=instance_type,
)

model_data = model_uris.retrieve(
    model_id=model_id,
    model_version=model_version,
    model_scope="inference"
)

print(f'The image URI is:  {image_uri}')
print(f'The model data is: {model_data}')

The image URI is:  763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
The model data is: s3://jumpstart-cache-prod-us-east-1/huggingface-infer/prepack/v1.0.1/infer-prepack-huggingface-text2text-flan-t5-small.tar.gz


#### Define Model

In [13]:
model = Model(
    image_uri=image_uri,
    model_data=model_data,
    role=role,
    predictor_cls=Predictor,
    name=endpoint_name,
    sagemaker_session=sagemaker_session,
    env={"TS_DEFAULT_WORKERS_PER_MODEL": "1"}
)

#### Deploy Model


In [14]:
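# Deployment is disabled by default so this cell does not create a billable endpoint by accident;
# change `False` to `True` to deploy the model.
if False: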
    model_predictor = model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        predictor_cls=Predictor,
        endpoint_name=endpoint_name,
    )

#### Test that the model is deployed correctly

In [15]:
# Disabled by default; change `False` to `True` once the endpoint above is in service.
if False:
    runtime_client = boto3.client('sagemaker-runtime', region_name=region)
    input_question = 'Tell me the steps to make a pizza:'
    payload = {
        "text_inputs": input_question,
        "max_length": 50,
        "max_time": 50,
        "num_return_sequences": 1,
        "top_k": 50,
        "top_p": 0.95,
        "do_sample": True,
    }

    # Invoke through the SageMaker Runtime client created above
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload).encode('utf-8')
    )
    output_answer = json.loads(response['Body'].read().decode('utf-8'))["generated_texts"][0]
    print(output_answer)
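The request body and the response parsing in the test cell can be separated from the AWS call, which makes them checkable without a deployed endpoint. This is a minimal sketch: the function names are hypothetical, and the payload fields and the `generated_texts` response key are the ones used in the cell above.

```python
import json


def build_text2text_payload(question, max_length=50, top_k=50, top_p=0.95, do_sample=True):
    """Build the UTF-8 JSON request body used in the test cell above."""
    payload = {
        "text_inputs": question,
        "max_length": max_length,
        "max_time": 50,
        "num_return_sequences": 1,
        "top_k": top_k,
        "top_p": top_p,
        "do_sample": do_sample,
    }
    return json.dumps(payload).encode("utf-8")


def parse_generated_text(body):
    """Extract the first generated text from a raw response body (bytes)."""
    return json.loads(body.decode("utf-8"))["generated_texts"][0]
```

With an endpoint in service, pass `build_text2text_payload(input_question)` as `Body=` to `invoke_endpoint` and call `parse_generated_text(response['Body'].read())` on the result.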

## How to deploy 3rd-party Foundation Models
1. Gain access to the foundation models
    1. Go to the SageMaker console
    2. Open the JumpStart > Foundation Models tab
    3. Request access if you do not already have it
2. Select the foundation model you would like to deploy
3. Click `Subscribe` in the top-right corner
4. Once you have subscribed, you can open a notebook that deploys the model
5. Open the notebook
6. Run the notebook to deploy the model. Your account must have access to the instance type you choose.
    1. For the AI21 Summarization model, you can use something like: ml.g4dn.12xlarge
    2. For AI21 Grande Instruct, you can use: ml.g5.24xlarge
    3. For AI21 Jumbo Instruct, you can use: ml.g5.48xlarge
    4. These instance types were tested to work as of 2023-05-16
7. Collect the resulting endpoint names and use them in the `application_metadata` JSON


# Streamlit UI

### Record any parameters that need to be passed to the Streamlit app
App Metadata Structure:
#### application_metadata
 - models: a list of dictionaries, one per model, each containing the model display name, SageMaker endpoint name, and model type (currently 'sm' or 'ai21')
   - name
   - endpoint
   - type
 - summary_model: the endpoint name of the summarization model
 - region: the AWS Region (for example, us-east-1)
 - role: the IAM execution role for the application. It should grant SageMaker, Textract, and Kendra access
 - datastore: a dictionary that contains the bucket and folder prefix used to store document data (not set in the example below)
   - bucket
   - prefix
 - kendra: a dictionary that contains information on the Kendra index to be used when searching
   - index_id
   - index_name
   - index_description
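Since the Streamlit app reads this file at startup, a typo in a key only shows up as a crash in the browser. A minimal sketch of checking the dictionary first is below. `validate_app_metadata` is a hypothetical helper; the required keys are the ones the app in this notebook actually reads (`role` is recorded but not read by the app), and the allowed types come from the structure above.

```python
# Keys the Streamlit app reads from the metadata file
REQUIRED_KEYS = {"models", "summary_model", "region", "kendra"}
# Model types allowed by the structure above
ALLOWED_TYPES = {"sm", "ai21"}


def validate_app_metadata(metadata):
    """Return a list of human-readable problems; an empty list means the dict looks usable."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(metadata))]
    for i, model in enumerate(metadata.get("models", [])):
        for field in ("name", "endpoint", "type"):
            if field not in model:
                problems.append(f"models[{i}] missing field: {field}")
        if "type" in model and model["type"] not in ALLOWED_TYPES:
            problems.append(f"models[{i}] has unknown type: {model['type']!r}")
    if "kendra" in metadata and "index_id" not in metadata["kendra"]:
        problems.append("kendra missing field: index_id")
    return problems
```

Calling `validate_app_metadata(application_metadata)` before writing the JSON file should return an empty list.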

In [17]:
application_metadata = {
    # Replace these endpoint names with the endpoints deployed in your account
    'models':[
        {'name':'FLAN-XXL', 'endpoint':'LLM-in-a-box-huggingface-text2text-flan-2023-05-18-19-23-11-479', 'type':'sm'},
        {'name':'Super Special Model', 'endpoint':'endpoint_name', 'type':'ai21'}],
    'summary_model':'summarize',
    'region':region,
    'role':role,
    'kendra':
        {'index_id':index_id, 'index_name':index_name, 'index_description':''},
}
with open('application_metadata_search.json', 'w') as fp:
    json.dump(application_metadata, fp)

### Write the Streamlit app

In [18]:
%%writefile app_search.py
import os
import time
import ai21
import json
import boto3
import pandas as pd
import streamlit as st

from io import BytesIO
from collections import deque
from datetime import datetime
st.set_page_config(layout="wide")

APP_MD    = json.load(open('application_metadata_search.json', 'r'))
MODELS    = {d['name']: d['endpoint'] for d in APP_MD['models']}
MODEL_TYPES = {d['endpoint']: d['type'] for d in APP_MD['models']}
MODEL_SUM = APP_MD['summary_model']
REGION    = APP_MD['region']
KENDRA_ID = APP_MD['kendra']['index_id']

KENDRA        = boto3.client('kendra', region_name=REGION)
SAGEMAKER     = boto3.client('sagemaker-runtime', region_name=REGION)


def query_endpoint(endpoint_name, payload):
    # Route the request by the model 'type' recorded in application_metadata:
    # 'sm' endpoints take the JumpStart JSON payload, other types go through the AI21 SDK
    if MODEL_TYPES.get(endpoint_name) == 'sm':
        response = SAGEMAKER.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType='application/json',
            Body=json.dumps(payload).encode('utf-8')
        )
        output_answer = json.loads(response['Body'].read().decode('utf-8'))["generated_texts"][0]
    else:
        response = ai21.Completion.execute(
            sm_endpoint=endpoint_name,
            prompt=payload['text_inputs'],
            maxTokens=payload['max_length'],
            temperature=payload['temperature'],
            stopSequences=['##'],
            numResults=1
        )
        output_answer = response['completions'][0]['data']['text']
    return str(output_answer)


def query_index(query):
    response = KENDRA.query(
        QueryText = query,
        IndexId = KENDRA_ID
    )
    return response


def summarize_context(context):
    try:
        response = ai21.Summarize.execute(
            source=context,
            sourceType="TEXT",
            sm_endpoint=MODEL_SUM
        )
        return response.summary
    except Exception:
        return 'No summarization endpoint connected'


def action_search(params):
    st.title('Search your Internal Documents')
    col1, col2 = st.columns(2)
    with col1:
        query = st.text_input('**Enter a Search Query:**', '')
        button_search = st.button('Search')
        if button_search:
            response = query_index(query)
            for sr in response['ResultItems']:
                st.write(f"**[{sr['ScoreAttributes']['ScoreConfidence']}]** | {sr['DocumentTitle']['Text']} [Link to Source Document]({sr['DocumentURI']})")
                st.write(sr['DocumentExcerpt']['Text'])
                st.write('---')
            context = '\n'.join([sr['DocumentExcerpt']['Text'] for sr in response['ResultItems']])
            with open('context.txt', 'w') as fp:
                fp.write(context)
    with col2:
        if button_search:
            time.sleep(2)
            summary = summarize_context(context)
            st.text_area('**Search Result Summary:**', summary)
        input_question = st.text_input('**Please ask a question of the search results:**', '')
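        if st.button('Send Question') and len(input_question) > 3:
            # context.txt is written by a search; without one there is nothing to answer from
            if not os.path.exists('context.txt'):
                st.warning('Run a search before asking a question.')
                return
            with open('context.txt', 'r') as fp:
                context = fp.read()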
            payload = {
                "text_inputs": context + '##\n' + input_question,
                "max_length": params['max_len'],
                "max_time": 50,
                "num_return_sequences": 1,
                "top_k": 50,
                "temperature": params['temp'],
                "top_p": params['top_p'],
                "do_sample": True,
            }
            output_answer = query_endpoint(params['endpoint'], payload)
            st.text_area('Response:', output_answer)


def app_sidebar():
    with st.sidebar:
        st.write('## How to use:')
        description = """Welcome to our LLM-powered document extraction and question-answering application. With this app, you can ask general questions,
        ask questions of a specific document, or intelligently search an internal document corpus. By selecting the action you would like to perform,
         you can ask general questions or questions of your document. Additionally, you can select the model you use, to run real-world tests that reveal each model's strengths and weaknesses."""
        st.write(description)
        st.write('---')
        st.write('### User Preference')
        action_name = st.selectbox('Choose Activity', options=['Corpus Search'])
        model_name = st.selectbox('Select Model', options=MODELS.keys())
        max_len = st.slider('Max Length', min_value=50, max_value=500, value=150, step=10)
        top_p = st.slider('Top p', min_value=0., max_value=1., value=1., step=.01)
        temp = st.slider('Temperature', min_value=0.01, max_value=1., value=1., step=.01)
        st.write('---')
        st.write('## FAQ')
        st.write(f'**1. Where is the model stored?** \n\nThe current model is: `{model_name}` and is running within your account.')
        st.write('**2. Where is my data stored?**\n\nCurrently the queries you make to the endpoint are not stored, but you can enable this by turning on data capture for your endpoint.')
        st.write('---')
        params = {'action_name':action_name,'endpoint':MODELS[model_name], 'max_len':max_len, 'top_p':top_p, 'temp':temp, 'model_name':model_name}
        return params


def main():
    params = app_sidebar()
    if params['action_name'] == 'Corpus Search':
        action_search(params)
    else:
        raise ValueError('Invalid Action')


if __name__ == '__main__':
    main()


Overwriting app_search.py
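In `action_search`, parsing the Kendra response, building the context, and building the prompt are interleaved with Streamlit calls. A minimal sketch of pulling that logic into plain functions, so it can be tested against a hand-written Kendra response, is shown below. The function names are hypothetical; the field names (`ScoreAttributes`, `DocumentTitle`, `DocumentURI`, `DocumentExcerpt`) and the `'##'` separator come from the app, while the `.get` defaults for missing fields are an added assumption.

```python
def format_result_line(item):
    """Render one Kendra result item as the markdown line the app writes."""
    confidence = item.get("ScoreAttributes", {}).get("ScoreConfidence", "UNKNOWN")
    title = item.get("DocumentTitle", {}).get("Text", "Untitled")
    uri = item.get("DocumentURI", "")
    return f"**[{confidence}]** | {title} [Link to Source Document]({uri})"


def build_context(result_items):
    """Join the non-empty excerpts of all result items, one per line."""
    excerpts = (item.get("DocumentExcerpt", {}).get("Text", "") for item in result_items)
    return "\n".join(text for text in excerpts if text)


def build_prompt(context, question):
    """Append the question after the context, separated by '##' as in the app."""
    return context + "##\n" + question
```

Inside the app, `build_context(response['ResultItems'])` would replace the inline list comprehension, and `build_prompt(context, input_question)` the `text_inputs` expression.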


## Start App

### Run Streamlit
To run the application:
1. Select File > New > Terminal
2. In the terminal, use the command: `streamlit run app_search.py --server.runOnSave true`
   1. Note: make sure all required packages are installed first, for example with `pip install -r requirements.txt`
3. If this is successful, you will be able to interact with the app by using the web address below
4. When you run the command above, you should see output similar to the following.
5. The port that is displayed is the port that MUST be used after the `proxy` folder in the link below.

```
You can now view your Streamlit app in your browser.

  Network URL: http://###.###.###.###:8501
  External URL: http://###.###.###.###:8501
```


#### Display Link to Application

In [14]:
print(f'http://{domain_id}.studio.{region}.sagemaker.aws/jupyter/default/proxy/8501/')

http://d-qxdwe39zkab0.studio.us-east-1.sagemaker.aws/jupyter/default/proxy/8501/
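The link above hardcodes port 8501, which only works if Streamlit reports that port when it starts. A minimal sketch that makes the port a parameter is shown below; `studio_proxy_url` is a hypothetical helper that builds the same URL as the cell above.

```python
def studio_proxy_url(domain_id, region, port=8501):
    """Build the Studio proxy URL for an app listening on `port`.

    The port must match the one Streamlit prints when it starts.
    """
    return f"http://{domain_id}.studio.{region}.sagemaker.aws/jupyter/default/proxy/{port}/"


print(studio_proxy_url("d-qxdwe39zkab0", "us-east-1"))
```

If Streamlit starts on a different port (for example, because you launched it with `--server.port 8502`), pass that port instead: `studio_proxy_url(domain_id, region, 8502)`.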
