# LLMs in a Box for Search
1. Create a SageMaker Studio Domain if you don't have one
2. Open SageMaker Studio as the user who will launch this application
3. Either upload this notebook, or clone the repository: [repo](https://github.com/chaeAclark/literate-eureka.git)
4. Open the notebook `LLM and Kendra in a box.ipynb`
5. You can run the entire notebook by clicking Run > Run All Cells
6. Alternatively, you can run the cells individually

### Terminal Installation
You need to ensure you have installed all needed packages in the terminal you are using.
1. boto3
2. streamlit
3. pdf2image
4. ai21[SM]
5. Pillow
6. pandas
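
Before launching the app, it can help to confirm that these packages are importable from the interpreter you are using. The following is a minimal sketch and not part of the original notebook. The helper name `missing_packages` is hypothetical, and the import names are assumptions based on how these distributions are usually imported (`PIL` for Pillow, `ai21` for `ai21[SM]`).

```python
import importlib.util

# Import names for the packages listed above (Pillow is imported as PIL).
REQUIRED_MODULES = ["boto3", "streamlit", "pdf2image", "ai21", "PIL", "pandas"]


def missing_packages(modules=REQUIRED_MODULES):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_packages()
print("Missing packages:", missing if missing else "none")
```

Anything reported as missing can be installed with `pip install -r requirements.txt` once the requirements file has been written.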

In [2]:
%%writefile requirements.txt
boto3
streamlit
pdf2image
ai21[SM]
Pillow
pandas

Overwriting requirements.txt


# Install

#### Update SageMaker

In [3]:
!pip install -U sagemaker --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.27.142 requires PyYAML<5.5,>=3.10, but you have pyyaml 6.0.1 which is incompatible.

[notice] A new release of pip is available: 23.1.2 -> 23.2.1
[notice] To update, run: pip install --upgrade pip


In [4]:
!pip install botocore-1.29.142-py3-none-any.whl
!pip install boto3-1.26.142-py3-none-any.whl
!pip install awscli-1.27.142-py3-none-any.whl

Processing ./botocore-1.29.142-py3-none-any.whl
botocore is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
Processing ./boto3-1.26.142-py3-none-any.whl
boto3 is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
Processing ./awscli-1.27.142-py3-none-any.whl
Collecting PyYAML<5.5,>=3.10 (from awscli==1.27.142)
  Using cached PyYAML-5.4

In [5]:
#!pip install --upgrade pip

In [6]:
# Configure your AWS credentials using the aws configure command, or pass them to the
# boto3 client
import boto3
bedrock = boto3.client('bedrock' , 'us-east-1', endpoint_url='https://bedrock.us-east-1.amazonaws.com')
bedrock.list_foundation_models()
 

{'ResponseMetadata': {'RequestId': '6b869bfb-a2b5-4f03-aebc-8247fdf24ff3',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Sun, 23 Jul 2023 20:22:20 GMT',
   'content-type': 'application/json',
   'content-length': '861',
   'connection': 'keep-alive',
   'x-amzn-requestid': '6b869bfb-a2b5-4f03-aebc-8247fdf24ff3'},
  'RetryAttempts': 0},
 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-tg1-large',
   'modelId': 'amazon.titan-tg1-large'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-e1t-medium',
   'modelId': 'amazon.titan-e1t-medium'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/stability.stable-diffusion-xl',
   'modelId': 'stability.stable-diffusion-xl'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/ai21.j2-grande-instruct',
   'modelId': 'ai21.j2-grande-instruct'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/ai21.j2-jumbo-instruct',
   'modelId': 'ai21.j2-jumbo-in
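
The response above is a plain dictionary, so the model IDs can be pulled out without further API calls. The sketch below is illustrative only: `list_model_ids` is a hypothetical helper, and `sample` is a hand-written dictionary shaped like the output shown above, not a live response.

```python
def list_model_ids(response, provider=None):
    """Return the modelId strings from a list_foundation_models-style dict.

    If provider is given (for example "ai21"), keep only IDs that start
    with "<provider>.".
    """
    ids = [m["modelId"] for m in response.get("modelSummaries", [])]
    if provider is not None:
        ids = [i for i in ids if i.startswith(provider + ".")]
    return ids


# Hand-written sample shaped like the output above (not a live API call)
sample = {
    "modelSummaries": [
        {"modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-tg1-large",
         "modelId": "amazon.titan-tg1-large"},
        {"modelArn": "arn:aws:bedrock:us-east-1::foundation-model/ai21.j2-grande-instruct",
         "modelId": "ai21.j2-grande-instruct"},
    ]
}

print(list_model_ids(sample))
print(list_model_ids(sample, provider="ai21"))
```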

In [7]:
!pip install ai21



# Imports

### General Libraries

In [8]:
import os
import json
import boto3
import ai21
import pandas as pd

### SageMaker Libraries

In [9]:
import sagemaker as sm

from sagemaker import image_uris
from sagemaker import model_uris
from sagemaker import script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

### Deploy and Directory Setup

In [10]:
def get_sagemaker_session(local_download_dir) -> sm.Session:
    """
    Create a SageMaker Session that downloads artifacts to `local_download_dir`.

    The session is used to create the SageMaker Model, Endpoint Config,
    and Endpoint.
    """
    sagemaker_client = boto3.client(service_name="sagemaker", region_name=boto3.Session().region_name)
    session_settings = sm.session_settings.SessionSettings(local_download_dir=local_download_dir)
    session = sm.session.Session(sagemaker_client=sagemaker_client, settings=session_settings)
    return session

In [11]:
model_path = './download_dir'
if not os.path.exists(model_path):
    os.mkdir(model_path)

In [12]:
chat_df = pd.DataFrame([], columns=['timestamp', 'question', 'response'])
chat_df.to_csv('chat.csv',index=False)

### SageMaker Configurations

In [13]:
role               = sm.get_execution_role()
sagemaker_session  = get_sagemaker_session(model_path) # sm.session.Session()
region             = sagemaker_session._region_name

# These are needed to show where the streamlit app is hosted
with open('/opt/ml/metadata/resource-metadata.json', 'r') as f:
    sagemaker_metadata = json.load(f)
domain_id          = sagemaker_metadata['DomainId']
resource_name      = sagemaker_metadata['ResourceName']

In [14]:
bedrock = boto3.client(service_name='bedrock',region_name='us-east-1',endpoint_url='https://bedrock.us-east-1.amazonaws.com')

### Boto Configurations

In [15]:
# index_name = 's3-sagemaker-faq'#'kgnraham-faq'
# index_id   = '53b9aca5-b61c-4b59-92fb-ff8c68e4ca39'#'3b1f740a-7360-4746-9565-998e0580ea57'
index_name = 'kgnraham-faq'
index_id   = '3b1f740a-7360-4746-9565-998e0580ea57'

bucket     = 'sagemaker-us-east-1-715253196401'
prefix     = 'textract'
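
The Kendra index configured here is queried later by the Streamlit app, which keeps only the results Kendra marks with `HIGH` confidence. The sketch below shows that filtering step in isolation on a hand-made dictionary. The helper name `high_confidence_results` and the sample values (including the `example.com` URIs) are hypothetical; the field names are the ones the app reads.

```python
def high_confidence_results(response, level="HIGH"):
    """Return (title, uri) pairs for result items whose ScoreConfidence equals level."""
    pairs = []
    for item in response.get("ResultItems", []):
        confidence = item.get("ScoreAttributes", {}).get("ScoreConfidence")
        if confidence == level:
            title = item.get("DocumentTitle", {}).get("Text", "")
            pairs.append((title, item.get("DocumentURI", "")))
    return pairs


# Hand-made sample shaped like the fields the app reads (not a live Kendra response)
sample_response = {
    "ResultItems": [
        {"ScoreAttributes": {"ScoreConfidence": "HIGH"},
         "DocumentTitle": {"Text": "What is Amazon Kendra?"},
         "DocumentURI": "https://example.com/faq-1"},
        {"ScoreAttributes": {"ScoreConfidence": "LOW"},
         "DocumentTitle": {"Text": "Unrelated page"},
         "DocumentURI": "https://example.com/faq-2"},
    ]
}

for title, uri in high_confidence_results(sample_response):
    print(title, uri)
```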

# Model
The following section will deploy the JumpStart model `flan-###`. There are additional steps required if launching 3rd-party proprietary models. These steps are detailed in another section.

### Select Model

In [16]:
filter_value = "task == text2text"
text_generation_models = list_jumpstart_models(filter=filter_value)
print('Available text2text Models:\n--------------------------------')
_ = [print(m) for m in text_generation_models]

Available text2text Models:
--------------------------------
huggingface-text2text-bart4csc-base-chinese
huggingface-text2text-bigscience-t0pp
huggingface-text2text-bigscience-t0pp-bnb-int8
huggingface-text2text-bigscience-t0pp-fp16
huggingface-text2text-flan-t5-base
huggingface-text2text-flan-t5-base-samsum
huggingface-text2text-flan-t5-large
huggingface-text2text-flan-t5-small
huggingface-text2text-flan-t5-xl
huggingface-text2text-flan-t5-xxl
huggingface-text2text-flan-t5-xxl-bnb-int8
huggingface-text2text-flan-t5-xxl-fp16
huggingface-text2text-flan-ul2-bf16
huggingface-text2text-pegasus-paraphrase
huggingface-text2text-qcpg-sentences
huggingface-text2text-t5-one-line-summary


In [17]:
model_id = text_generation_models[7]
model_version = '*'
print(f'The model that will be deployed is: {model_id}')

The model that will be deployed is: huggingface-text2text-flan-t5-small
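
Selecting by list position depends on the order in which JumpStart returns the models, which may change over time. As an alternative, here is a minimal sketch that selects by name suffix instead. `pick_model` is a hypothetical helper, and `available` is a shortened copy of the list printed above.

```python
def pick_model(models, suffix):
    """Return the single model ID that ends with suffix, or raise ValueError."""
    matches = [m for m in models if m.endswith(suffix)]
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one match for {suffix!r}, found {len(matches)}"
        )
    return matches[0]


# Shortened copy of the list printed above
available = [
    "huggingface-text2text-flan-t5-base",
    "huggingface-text2text-flan-t5-small",
    "huggingface-text2text-flan-t5-xl",
    "huggingface-text2text-flan-t5-xxl",
    "huggingface-text2text-flan-t5-xxl-fp16",
]

selected = pick_model(available, "flan-t5-small")
print(f"The model that will be deployed is: {selected}")
```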


### Deploy

In [None]:
endpoint_name = name_from_base(f"LLM-in-a-box-{model_id}")
print(f'Endpoint: {endpoint_name}')

#### Collect Model Containers

In [None]:
instance_type = "ml.g5.2xlarge"

image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=instance_type,
)

model_data = model_uris.retrieve(
    model_id=model_id,
    model_version=model_version,
    model_scope="inference"
)

print(f'The image URI is:  {image_uri}')
print(f'The model data is: {model_data}')

#### Define Model

In [None]:
model = Model(
    image_uri=image_uri,
    model_data=model_data,
    role=role,
    predictor_cls=Predictor,
    name=endpoint_name,
    sagemaker_session=sagemaker_session,
    env={"TS_DEFAULT_WORKERS_PER_MODEL": "1"}
)

#### Deploy Model


In [None]:
# model_predictor = model.deploy(
#     initial_instance_count=1,
#     instance_type=instance_type,
#     predictor_cls=Predictor,
#     endpoint_name=endpoint_name,
# )

#### Test that the model is deployed correctly

In [None]:
sagemaker_client = boto3.client('sagemaker-runtime', region_name=region)
input_question = 'Tell me the steps to make a pizza:'
payload = {
    "text_inputs": input_question,
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}

response = sagemaker_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload).encode('utf-8')
)
output_answer = json.loads(response['Body'].read().decode('utf-8'))["generated_texts"][0]
print(output_answer)
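
The request and response handling in the test above can be separated from the network call, which makes it easier to check offline. This is a sketch only: `build_payload` and `parse_generated_text` are hypothetical helper names, and `fake_body` is a hand-written stand-in for the bytes read from the endpoint response.

```python
import json


def build_payload(question, max_length=50, top_k=50, top_p=0.95):
    """Build the JSON payload used in the test cell above."""
    return {
        "text_inputs": question,
        "max_length": max_length,
        "max_time": 50,
        "num_return_sequences": 1,
        "top_k": top_k,
        "top_p": top_p,
        "do_sample": True,
    }


def parse_generated_text(body_bytes):
    """Extract the first generated text from the raw response body."""
    return json.loads(body_bytes.decode("utf-8"))["generated_texts"][0]


payload = build_payload("Tell me the steps to make a pizza:")
encoded = json.dumps(payload).encode("utf-8")  # value passed as Body=

# Hand-written stand-in for response['Body'].read()
fake_body = json.dumps({"generated_texts": ["Preheat the oven."]}).encode("utf-8")
print(parse_generated_text(fake_body))
```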

## How to deploy 3rd-party Foundation Models
1. Gain access to the foundation models
    1. Go to the SageMaker Console
    2. There will be a tab for JumpStart > Foundation Models
    3. You must request access if you do not already have it
2. Select the foundation model you would like to deploy
3. Click `Subscribe` in the top-right corner
4. After you complete the subscription, you can open a notebook that deploys the model
5. Open the notebook
6. Run this notebook to deploy the model. Note that your account must have access to the instance type you choose.
    1. For AI21 Summarization model, you can use something like: ml.g4dn.12xlarge
    2. For AI21 Grande Instruct, you can use: ml.g5.24xlarge
    3. For AI21 Jumbo Instruct, you can use: ml.g5.48xlarge
    4. These were tested to work as of 2023-05-16
    5. Collect these endpoint names and use them in the application_metadata JSON


# Streamlit UI

### Record any parameters that need to be passed to the Streamlit app
App Metadata Structure:
#### application_metadata
 - models: a list of dictionaries, one per model, each containing the model display name, the endpoint name (a SageMaker endpoint name or a Bedrock model ID), and the model type (for example 'sm', 'ai21', 'anthropic', or 'amazon')
   - name
   - endpoint
   - type
 - summary_model: the summary model endpoint name (optional; commented out in the example below)
 - region: the region (us-east-1, etc.)
 - role: the permissions for the application. It should include SageMaker, Textract, and Kendra access
 - datastore: a dictionary that contains the bucket and folder prefix used to store document data
   - bucket
   - prefix
 - kendra: a dictionary that contains information on the Kendra index to be used when searching
   - index_id
   - index_name
   - index_description
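
A quick structural check on this dictionary can catch a missing key before the Streamlit app tries to read it. The following is a minimal sketch under stated assumptions: `validate_metadata` is a hypothetical helper, `summary_model` is treated as optional, and the sample values are placeholders rather than real resource names.

```python
def validate_metadata(md):
    """Return a list of human-readable problems; an empty list means the structure looks fine."""
    problems = []
    for key in ("models", "region", "role", "datastore", "kendra"):
        if key not in md:
            problems.append(f"missing key: {key}")
    for i, model in enumerate(md.get("models", [])):
        for field in ("name", "endpoint", "type"):
            if field not in model:
                problems.append(f"models[{i}] missing field: {field}")
    nested = (
        ("datastore", ("bucket", "prefix")),
        ("kendra", ("index_id", "index_name", "index_description")),
    )
    for section, fields in nested:
        if section in md:
            for field in fields:
                if field not in md[section]:
                    problems.append(f"{section} missing field: {field}")
    return problems


# Placeholder values for illustration only
example = {
    "models": [{"name": "MY-MODEL", "endpoint": "my-endpoint", "type": "sm"}],
    "region": "us-east-1",
    "role": "my-role-arn",
    "datastore": {"bucket": "my-bucket", "prefix": "textract"},
    "kendra": {"index_id": "my-index-id", "index_name": "my-index", "index_description": ""},
}
print(validate_metadata(example))
```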

In [None]:
region

In [18]:
application_metadata = {
    'models':[
        #{'name':'FLAN-XL', 'endpoint':'jumpstart-example-huggingface-text2text-2023-06-10-02-04-21-787', 'type':'sm'},
        # {'name':'Jurassic-2-Jumbo', 'endpoint':'j2-jumbo-instruct-arcesium', 'type':'ai21'},
        {'name':'ANTHROPIC-CLAUDE-v1', 'endpoint':'anthropic.claude-v1', 'type':'anthropic'},
        {'name':'AMAZON-TITAN-L', 'endpoint':'amazon.titan-tg1-large', 'type':'amazon'},
        {'name':'AI21-JUMBO-GRANDE-INSTRUCT', 'endpoint':'ai21.j2-grande-instruct', 'type':'ai21'}
    ],
    #'summary_model':'summarize',
    'region':region,
    'role':role,
    'datastore':
        {'bucket':bucket, 'prefix':prefix},
    'kendra':
        {'index_id':index_id, 'index_name':index_name, 'index_description':''},
}
# Use a context manager so the file is flushed and closed before the app reads it
with open('application_metadata.json', 'w') as f:
    json.dump(application_metadata, f)

### Write the Streamlit app

In [23]:
%%writefile app_faq.py

import os
import time
import ai21
import json
import boto3
import pandas as pd
import streamlit as st

from PIL import Image
from io import BytesIO
from collections import deque
from datetime import datetime
#from pdf2image import convert_from_bytes
st.set_page_config(layout="wide")

bedrock = boto3.client('bedrock', region_name='us-east-1', endpoint_url='https://bedrock.us-east-1.amazonaws.com')

APP_MD    = json.load(open('application_metadata.json', 'r'))
MODELS    = {d['name']: d['endpoint'] for d in APP_MD['models']}
#MODEL_SUM = APP_MD['summary_model']
REGION    = APP_MD['region']
BUCKET    = APP_MD['datastore']['bucket']
PREFIX    = APP_MD['datastore']['prefix']
KENDRA_ID = APP_MD['kendra']['index_id']
#CONTEXT = deque([''], maxlen=10)

S3            = boto3.client('s3', region_name=REGION)
TEXTRACT      = boto3.client('textract', region_name=REGION)
KENDRA        = boto3.client('kendra', region_name=REGION)
SAGEMAKER     = boto3.client('sagemaker-runtime', region_name=REGION)

CHAT_FILENAME = 'chat.csv'
params = {'file':'','action_name':'','endpoint':'', 'max_len':0, 'top_p':0, 'temp':0, 'model_name':''}


def query_endpoint(endpoint_name, prompt_data,params):
    accept='application/json'
    contentType='application/json'
    
    # if 'huggingface' in endpoint_name:
    #     response = SAGEMAKER.invoke_endpoint(
    #         EndpointName=endpoint_name,
    #         ContentType=contentType,
    #         Body=json.dumps(payload).encode('utf-8')
    #     )
    #     output_answer = json.loads(response['Body'].read().decode('utf-8'))["generated_texts"][0]
    if 'claude' in endpoint_name:
        body=json.dumps({"prompt": prompt_data, "max_tokens_to_sample":params['max_len'],"temperature":params['temp'],"top_p":params['top_p']})
        response = bedrock.invoke_model(
            body=body, 
            modelId='anthropic.claude-instant-v1', 
            accept=accept, 
            contentType=contentType
        )
        #print(response)
        output_answer = json.loads(response.get('body').read()).get('completion')
    elif 'titan' in endpoint_name:
        #print(json.dumps(prompt_data))
        body=json.dumps({"inputText": prompt_data,"textGenerationConfig": {
                          "maxTokenCount": params['max_len'],
                          "temperature":params['temp'],
                          "topP":params['top_p']
                         }})
        response = bedrock.invoke_model(
            body=body, 
            modelId='amazon.titan-tg1-large', 
            accept=accept, 
            contentType=contentType
        )
        output_answer = json.loads(response.get('body').read()).get('results')[0].get('outputText')
    elif 'j2' in endpoint_name:
        body = json.dumps({"prompt": prompt_data,"maxTokens":params['max_len'],"temperature":params['temp'],"topP":params['top_p']})
        print(json.dumps(prompt_data))
        response = bedrock.invoke_model(
            body=body, 
            modelId= 'ai21.j2-grande-instruct', # change this to use a different version from the model provider, 
            accept=accept, 
            contentType=contentType)
        output_answer = json.loads(response.get('body').read()).get('completions')[0].get('data').get('text')
    
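    else:
        # No branch matched: fail with a clear message instead of a NameError below
        raise ValueError(f'Unsupported model endpoint: {endpoint_name}')

    return str(output_answer)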

def query_index(query):
    response = KENDRA.query(
        QueryText = query,
        IndexId = KENDRA_ID
        
    )
    return response


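def extract_text(bucket, filepath, max_polls=10, poll_seconds=5):
    # Start an asynchronous Textract job, then poll until it leaves IN_PROGRESS
    job = TEXTRACT.start_document_text_detection(DocumentLocation={'S3Object': {'Bucket':bucket, 'Name':filepath}})
    result = TEXTRACT.get_document_text_detection(JobId=job['JobId'])
    polls = 0
    while result['JobStatus'] == 'IN_PROGRESS':
        if polls >= max_polls:
            return ''  # give up after max_polls * poll_seconds seconds
        time.sleep(poll_seconds)
        polls += 1
        result = TEXTRACT.get_document_text_detection(JobId=job['JobId'])
    if result['JobStatus'] != 'SUCCEEDED':
        return ''  # the job did not succeed, so there is no usable text
    # Only the first page of results is read here; long documents return a NextToken
    return '\n'.join([b['Text'] for b in result['Blocks'] if b['BlockType']=='LINE'])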


def load_document(file_bytes):
    # try:
    #     images = convert_from_bytes(file_bytes)
    #     image_page_1 = images[0].convert('RGB')
    #     st.image(image_page_1)
    # except:
    #     st.write('Cannot display image. Ensure that you have poppler-utils installed.')
    
    with open('doc.pdf', 'wb') as fp:
        fp.write(file_bytes)
    with open('doc.pdf', 'rb') as fp:
        S3.upload_fileobj(fp, BUCKET, PREFIX+'/doc.pdf')
    time.sleep(2)
    text = extract_text(BUCKET, PREFIX+'/doc.pdf')
    return text


def summarize_context(context, params):
    try:
        prompt_data ="""'"""+context+"""\n"""+"summarize the context"+"""'"""
        output_summary = query_endpoint(params['endpoint'],prompt_data, params)
        return output_summary
            
    except:
        return 'No summarization endpoint connected'

def action_qna(params):
    st.title('Ask Questions of your Model')
    try:
        chat_df = pd.read_csv(CHAT_FILENAME)
        
    except:
        chat_df = pd.DataFrame([], columns=['timestamp', 'question', 'response'])
    kendra_links = []
    
    input_question = st.text_input('**Please ask a question:**', '')
    if st.button('Send Question') and len(input_question) > 3:
        response = query_index(input_question)
        #print("response:",response['ResultItems'])
        for sr in response['ResultItems']:
            # kendra_links.append(sr['DocumentURI'])
            if sr['ScoreAttributes']['ScoreConfidence'] == 'HIGH':
                kendra_links.append(sr['DocumentURI'])
                # st.write(f"[Link to Source Document]({sr['DocumentURI']})")
                # st.write(f"**[{sr['ScoreAttributes']['ScoreConfidence']}]** | {sr['DocumentTitle']['Text']} [Link to Source Document]({sr['DocumentURI']})")
                # st.write(sr['DocumentExcerpt']['Text'])
                # # st.write('---')
        print("Kendra Links:",kendra_links)        
        # kendra_links = list(set(kendra_links))
        kendra_context = '\n'.join([sr['DocumentTitle']['Text']  for sr in response['ResultItems'] if sr['ScoreAttributes']['ScoreConfidence'] == 'HIGH'])
        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M")
        # context = '\n'.join(['Context: ' + str(r.question) + '\nResponse: ' + str(r.response) + '\n' for idx,r in chat_df.iloc[-1:].iterrows()])
        # context = context + kendra_context
        print("kendra context:",kendra_context)
        #payload = {
        #     "text_inputs": context + '\n' + input_question, #input_question,
        #     "max_length": params['max_len'],
        #     "max_time": 50,
        #     "num_return_sequences": 1,
        #     "top_k": 50,
        #     "top_p": params['top_p'],
        #     "do_sample": True,
        # }
        prompt_data ="""'"""+'Read the following passage and answer the questions that follow:\n'+kendra_context+"""\n"""+'Questions:'+input_question+"""\n"""+'Answer:'+"""'"""


        output_answer = query_endpoint(params['endpoint'], prompt_data, params)
        st.text_area('Response:', output_answer)
        for each_link in kendra_links[0:1]:   #        
            st.write(f"[Link to Source Document]({each_link})") 
        chat_df.loc[len(chat_df.index)] = [timestamp, input_question, output_answer]
        chat_df.tail(5).to_csv(CHAT_FILENAME, index=False)
                    
    st.subheader('Recent Questions:')
    for idx,row in chat_df.iloc[::-1].head(5).iterrows():
        st.write(f'**{row.timestamp}**')
        st.write(row.question)
        st.write(row.response)
        st.write('---')


# def action_search(params):
#     st.title('Ask Questions of your Document')
#     #col2 = st.columns(1)
#     #with col2:
#     input_question = st.text_input('**Please ask a question of a loaded document:**', '')
#     if st.button('Send Question') and len(input_question) > 3:
#         # LLM 
#         payload = {
#             "text_inputs": input_question,
#             #"max_length": params['max_len'],
#             "max_time": 50,
#             #"maxTokens": params['max_len'],
#             "num_return_sequences": 1,
#             "top_k": 50,
#             "temperature":params['temp'],
#             "top_p": params['top_p'],
#             "do_sample": True,
#         }
#         if params["model_name"] == "Bedrock Titan Model":
#                 output_answer = query_bedrock_endpoint(payload)

#         if "FLAN" in params["model_name"]:
#             #del payload['maxTokens']
#             payload['max_length'] = params['max_len']
#             output_answer = query_endpoint(params['endpoint'], payload)


#         elif "Jumbo" in params["model_name"]:
#             #del payload['max_length']
#             payload['maxTokens'] = params['max_len']
#             output_answer = query_endpoint(params['endpoint'], payload)
#         st.text_area('Response:', output_answer,height = 400)


def action_doc(params):
    st.title('Ask Questions of your Document')
    context = ''  # stays empty until a PDF has been uploaded and parsed
    col1, col2 = st.columns(2)
    with col1:
        file = st.file_uploader('Upload a PDF file', type=['pdf'])
        if file is not None:
            context = load_document(file.read())
            if st.button('Summarize'):
                st.write('**Summary:**')
                st.write(summarize_context(context, params))
    with col2:
        input_question = st.text_input('**Please ask a question of a loaded document:**', '')
        if st.button('Send Question') and len(input_question) > 3:
            if not context:
                # Avoid a NameError or an empty prompt when no document is loaded
                st.write('Please upload a PDF file first.')
            else:
                prompt_data ="""'"""+'Read the following passage and answer the questions that follow:\n'+context+"""\n"""+'Questions:'+input_question+"""\n"""+'Answer:'+"""'"""
                output_answer = query_endpoint(params['endpoint'],prompt_data, params)
                st.text_area('Response:', output_answer)


def app_sidebar():
    with st.sidebar:
        st.write('## How to use:')
        description = """Welcome to our LLM tool extraction and query answering application. With this app, you can ask general questions,
        ask questions of a specific document, or intelligently search an internal document corpus. By selecting the action you would like to perform,
         you can ask general questions, or questions of your document. Additionally, you can select the model you use, to perform real-world tests to determine model strengths and weaknesses."""
        st.write(description)
        st.write('---')
        st.write('### User Preference')
        if st.button('Clear Context'):
            pd.DataFrame([], columns=['timestamp', 'question', 'response']).to_csv(CHAT_FILENAME, index=False)
        action_name = st.selectbox('Choose Activity', options=['Question/Answer', 'Document Query' ]) #'Corpus Search',
        # if action_name == 'Corpus Search':
        #     while file is not None:
        #         file = st.file_uploader('Upload a PDF file', type=['pdf'])
        model_name = st.selectbox('Select Model', options=MODELS.keys())
        max_len = st.slider('Max Length', min_value=50, max_value=1500, value=150, step=10)
        top_p = st.slider('Top p', min_value=0., max_value=1., value=1., step=.01)
        temp = st.slider('Temperature', min_value=0.01, max_value=1., value=1., step=.01)
        st.write('---')
        st.write('## FAQ')
        st.write(f'**1. Where is the model stored?** \n\nThe current model is: `{model_name}` and is running within your account.')
        st.write(f'**2. Where is my data stored?**\n\nCurrently the queries you make to the endpoint are not stored, but you can enable this by capturing data from your endpoint.')
        st.write('---')
        params['action_name']=action_name
        params['endpoint']=MODELS[model_name]
        params['max_len']=max_len
        params['top_p']=top_p
        params['temp']=temp
        params['model_name']=model_name
       
        # params = {'file':'','action_name':action_name,'endpoint':MODELS[model_name], 'max_len':max_len, 'top_p':top_p, 'temp':temp, 'model_name':model_name}
        return params


def main():
    params = app_sidebar()

    endpoint=params['endpoint']
    # if params['action_name'] == 'Corpus Search':
    #     params = action_search(params)
    if params['action_name'] == 'Question/Answer':
        params = action_qna(params)
    elif params['action_name'] == 'Document Query':
        params = action_doc(params)
    else:
        raise ValueError('Invalid action name.')


if __name__ == '__main__':
    main()



Overwriting app_faq.py
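
The `query_endpoint` function in the app sends a different JSON body to each model provider. The sketch below isolates that request-building step so it can be checked without calling Bedrock. `build_body` is a hypothetical helper; the field names are copied from the app code above and reflect the request formats it uses, which may differ in other SDK or model versions.

```python
import json


def build_body(model_id, prompt, max_len, temp, top_p):
    """Return the JSON request body for a model ID, using the field names from the app."""
    if "claude" in model_id:
        body = {"prompt": prompt, "max_tokens_to_sample": max_len,
                "temperature": temp, "top_p": top_p}
    elif "titan" in model_id:
        body = {"inputText": prompt,
                "textGenerationConfig": {"maxTokenCount": max_len,
                                         "temperature": temp, "topP": top_p}}
    elif "j2" in model_id:
        body = {"prompt": prompt, "maxTokens": max_len,
                "temperature": temp, "topP": top_p}
    else:
        raise ValueError(f"Unsupported model ID: {model_id}")
    return json.dumps(body)


print(build_body("amazon.titan-tg1-large", "What is Textract?", 150, 1.0, 1.0))
```

Keeping the body construction in one pure function means a new provider only needs one more branch, and each branch can be checked without AWS credentials.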


## Start App

### Run Streamlit
To run the application:
1. Select File > New > Terminal
2. In the terminal, use the command: `streamlit run app_faq.py --server.runOnSave true`
   1. Note: ensure you have installed all required packages
3. If this is successful, you will be able to interact with the app by using the web address below
4. When you run the command above, you should see output similar to the example below.
5. The port that is displayed is the same port that MUST be used after the `proxy` folder in the link below.

    You can now view your Streamlit app in your browser.

      Network URL: http://###.###.###.###:8501
      External URL: http://###.###.###.###:8501



#### Display Link to Application

In [24]:
print(f'http://{domain_id}.studio.{region}.sagemaker.aws/jupyter/default/proxy/8501/')

http://d-smgyfkpaofzr.studio.us-east-1.sagemaker.aws/jupyter/default/proxy/8501/
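
If Streamlit starts on a port other than 8501, the link has to change with it. The sketch below reads the port from a Streamlit URL and builds the matching Studio proxy link. Both helper names are hypothetical, and the IP address in the usage line is a placeholder.

```python
from urllib.parse import urlsplit


def port_from_streamlit_url(url):
    """Return the port from a URL such as the 'Network URL' Streamlit prints."""
    return urlsplit(url).port


def studio_proxy_url(domain_id, region, port=8501):
    """Build the Studio proxy link in the same format as the cell above."""
    return f"http://{domain_id}.studio.{region}.sagemaker.aws/jupyter/default/proxy/{port}/"


# Placeholder address; use the Network URL printed in your own terminal
port = port_from_streamlit_url("http://10.0.0.1:8502")
print(studio_proxy_url("d-smgyfkpaofzr", "us-east-1", port))
```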


In [None]:
# AI21 Jurassic-2 Jumbo

In [None]:
!pip install ai21

In [None]:
!pip install streamlit

In [None]:
!pip install boto3

In [None]:
context = "This is my first book in English from Shakespeare, The Murder Myster"
input_question = "who is the author of the book?"
prompt_data ="""'"""+'Read the following passage and answer the questions that follow:\n'+context+"""\n"""+'Questions:'+input_question+"""\n"""+'Answer:'+"""'"""
print(prompt_data)
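
The same prompt template appears in several places in the app, so it may be worth wrapping in one function. This is a sketch, assuming the surrounding single quotes in the original string are not needed by the model; `build_prompt` is a hypothetical helper and the sample passage is illustrative.

```python
def build_prompt(context, question):
    """Combine a passage and a question into the question-answering template used above."""
    return (
        "Read the following passage and answer the questions that follow:\n"
        f"{context}\n"
        f"Questions:{question}\n"
        "Answer:"
    )


prompt = build_prompt("Hamlet is a play by Shakespeare.", "Who is the author?")
print(prompt)
```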

In [None]:
original_data = "s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/"
!mkdir -p faq_data
!aws s3 cp --recursive $original_data faq_data

In [None]:
response = {'inputTextTokenCount': 24, 'results': [{'tokenCount': 17, 'outputText': '\n\nTextract is a software product that uses AI to extract data from documents.', 'completionReason': None}]}

In [None]:
response.get('results')
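
The generated text sits inside the first entry of `results`. Here is a minimal sketch of extracting it defensively, so that an empty or missing `results` list yields an empty string rather than an exception. `titan_output_text` is a hypothetical helper, and `sample_response` repeats the dictionary from the cell above.

```python
def titan_output_text(response):
    """Return the stripped outputText of the first result, or '' if there is none."""
    results = response.get("results") or []
    if not results:
        return ""
    return (results[0].get("outputText") or "").strip()


# Same dictionary as in the cell above
sample_response = {
    "inputTextTokenCount": 24,
    "results": [{
        "tokenCount": 17,
        "outputText": "\n\nTextract is a software product that uses AI to extract data from documents.",
        "completionReason": None,
    }],
}
print(titan_output_text(sample_response))
```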

In [None]:
import re

url_list = ['https://aws.amazon.com/jp/sagemaker/?nc=sn&loc=0','https://aws.amazon.com/jp/sagemaker/',
           'https://aws.amazon.com/faqs/','https://aws.amazon.com/jp/kendra/faqs/',
           'https://aws.amazon.com/kendra/faqs/?nc1=h_ls']

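# re.findall expects a string, not a list, so each URL is tested separately.
# Assumed intent: FAQ pages carrying the nc1=h_ls query string, with an optional
# two-letter locale segment. The regex metacharacters '.' and '?' are escaped.
pattern = re.compile(r'https://aws\.amazon\.com/(?:[a-z]{2}/)?[a-z0-9-]+/faqs/\?nc1=h_ls')
[url for url in url_list if pattern.fullmatch(url)]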