# NLP Applications with Amazon Bedrock
This Jupyter notebook explores Natural Language Processing (NLP) applications built on Large Language Models (LLMs) available through Amazon Bedrock. LLMs such as Amazon Titan Text Large show strong language comprehension and fluency across many NLP tasks. The notebook gives developers and researchers hands-on experience using LLMs for small, focused applications such as text classification, entity recognition, and sentiment analysis.

### Tools
The following tools will be used to build NLP applications with LLMs.

- **AWS Lambda** is a serverless compute service that allows you to run your code without managing servers. It enables you to execute functions in response to certain events, such as API requests or data changes. For our NLP applications, we will use AWS Lambda to host and execute the inference code, which interacts with the LLM and processes the natural language inputs.

- **Amazon API Gateway** is a fully managed service that makes it easy to create, publish, maintain, monitor, and secure APIs at any scale. It acts as a front-end to our AWS Lambda functions, allowing us to create a RESTful API that can be accessed by external applications. Through the API Gateway, our NLP applications will receive incoming text data and return the processed results from the language model.

- **Amazon Bedrock** is a Foundation Model as a Service (FMaaS) offering from AWS that exposes LLMs through an API. Developers can access state-of-the-art LLMs from AWS, such as Amazon Titan Text, and from third parties, such as Anthropic Claude and AI21 Jurassic. We will use the Bedrock API to prompt LLMs and build our NLP applications.

### Architecture
![Architecture](./images/architecture.png)

### Pre-requisites
For you to run this notebook, you must have access to the following:
- Python 3.9+
- The latest `boto3` and `botocore` Python SDKs
- Amazon Bedrock enabled in your account.
- IAM role for the following:
    - Lambda to access Amazon Bedrock

### Install dependencies
This notebook requires some dependencies before the code snippets below can be executed. They can be downloaded using the bash script available in this folder. Once downloaded, we install the `boto3`, `botocore` and `awscli` packages from the downloaded wheel files.

Execute the following cell:

In [None]:
!bash ./download_dependencies.sh
!pip install ./dependencies/botocore-1.29.162-py3-none-any.whl ./dependencies/boto3-1.26.162-py3-none-any.whl ./dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall

In [None]:
import boto3
import json
import os
from time import time

region_name = 'us-east-1'
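# The SDK wheels pinned above expose invoke_model on the 'bedrock' client;
# generally available SDK releases use the 'bedrock-runtime' client instead.
bedrock = boto3.client('bedrock', region_name=region_name)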

## Text Classification
We begin with text classification, a fundamental and versatile NLP task in which text is automatically assigned to predefined classes or topics based on its content. This section demonstrates how to build a text classifier using an LLM. Common text classification use cases include:
- Sentiment Analysis
- Topic Categorization
- Intent Detection
- Language Detection
- Toxicity Classification
- Fake News Detection

Leveraging LLMs for text classification allows us to benefit from their contextual understanding and semantic knowledge, enabling highly accurate and contextually-aware categorization. Throughout this section, we will explore the use-case of **Topic Categorization** and how to create a text classifier using few-shot learning, providing the model with a limited number of examples for each category to achieve generalized and effective classification results.

### Prompt
The input to an LLM is a prompt. A prompt includes instructions and context that tell the model how to act and how to interpret the input you send. The prompt for a text classification application consists of the following components:
- **Instruction**: A concise statement explaining the role of the model and guidelines that it should consider.
- **Categories**: A list of categories that the model should understand and choose from when the user sends in the query.
- **Examples**: Sample input and output pairs that show the model what user input looks like and how to format its answer.
- **User Query**: The actual input text from the user.
- **Output Indicator**: The model will generate its output after this indicator.

For this purpose we will first construct a template and later fill this in with the dynamic information.

In [None]:
prompt_template = """Instruction: Classify the text input at the end into the categories given below, if the input text doesn't belong to any of the categories then output 'unknown'. Use the examples to understand the type of input text.
Categories: {categories}
Examples: 
{examples}

User Input: {user_query}
Category:
"""

[Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165): an LLM can be shown a few examples to understand a task and then mimic the same behavior on new inputs. We will provide some examples of the business categories in the form of `Input` and `Category` pairs. The model generally performs well with somewhere between 5 and 10 examples.

In [None]:
examples = [ 
    {"Input": "Artificial intelligence is revolutionizing various industries, including healthcare and finance.",
    "Category": "Technology"},
    {"Input": "The soccer match between the two rival teams ended in a thrilling tie.",
    "Category": "Sports"},
    {"Input": "Yoga and meditation can help reduce stress and promote mental clarity.",
    "Category": "Health and Wellness"},
    {"Input": "Efforts to conserve water and minimize waste are vital for sustainable development.",
    "Category": "Environment and Sustainability"},
    {"Input": "Entrepreneurs face numerous challenges when launching a new startup.",
    "Category": "Business and Finance"}
]
categories = ['Technology', 'Sports', 'Health and Wellness', 'Environment and Sustainability', 'Business and Finance']
examples_text = ''
for ex in examples:
    _in, _out = list(ex.items())
    examples_text += f'{_in[0]}: {_in[1]}\n{_out[0]}: {_out[1]}\n\n'

Now, we can take a new query from the user and invoke the model to get the output category.

In [None]:
user_query = 'The stock market experienced a surge in value due to positive economic indicators.'

The template can then be filled in with the gathered information above to form a prompt for the model.

In [None]:
prompt = prompt_template.format(categories=', '.join(categories), examples=examples_text, user_query=user_query)
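
To make the template mechanics concrete, the sketch below fills a shortened copy of the template with a single example and prints the rendered prompt. The shortened template, the `demo_*` names and the sample strings are illustrative stand-ins, not the notebook's full prompt.

```python
# Shortened, illustrative copy of the template (assumption: same placeholders).
demo_template = """Instruction: Classify the text input into one of the categories below.
Categories: {categories}
Examples:
{examples}

User Input: {user_query}
Category:
"""

demo_categories = ["Technology", "Sports"]
demo_examples = [{"Input": "The soccer match ended in a tie.", "Category": "Sports"}]

# Render each example as an "Input:" line and a "Category:" line, then a blank line.
demo_examples_text = "".join(
    f"Input: {ex['Input']}\nCategory: {ex['Category']}\n\n" for ex in demo_examples
)

demo_prompt = demo_template.format(
    categories=", ".join(demo_categories),
    examples=demo_examples_text,
    user_query="A new chip doubles laptop battery life.",
)
print(demo_prompt)
```

Printing the rendered prompt before sending it is a cheap way to confirm that every placeholder was filled and that the prompt ends with the output indicator.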

### Prediction
Amazon Bedrock provides the `invoke_model` API to invoke a model. The input to this API is the prompt we created above.

In [None]:
def predict(prompt):
    body = json.dumps({"inputText": prompt})
    modelId = "amazon.titan-tg1-large"  
    accept = "application/json"
    contentType = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=modelId, accept=accept, contentType=contentType
    )
    response_body = json.loads(response.get("body").read())
    # Strip surrounding whitespace so the label can be compared with exact matching.
    category = response_body.get("results")[0].get("outputText").strip()
    return category
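
The request body above sends only `inputText`, so the model runs with its default generation settings. As a hedged sketch, the body can also carry generation parameters. The `textGenerationConfig` key and its field names below are an assumption based on the Titan text request format and should be checked against the model documentation for the SDK version in use; `build_titan_body` and its default values are hypothetical.

```python
import json

def build_titan_body(prompt, max_tokens=20, temperature=0.0):
    """Build a request body; the textGenerationConfig keys are assumed, not verified here."""
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,  # a category label needs only a few tokens
            "temperature": temperature,   # low temperature for repeatable labels
            "stopSequences": [],
            "topP": 1,
        },
    })

body = build_titan_body("User Input: hello\nCategory:\n")
print(body)
```

A short token limit and a low temperature suit classification, where the expected answer is a single label rather than free text.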

In [None]:
category = predict(prompt)
print(f'Query: {user_query}\nCategory: {category}')

As shown above, the prompt we created lets the model classify a completely new piece of text into the correct category.
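
Model output is free text, so it may carry extra whitespace, different casing, or trailing lines. A minimal sketch of one way to map the raw output onto the known category list is shown below; `normalize_category` is a hypothetical helper, not part of the notebook.

```python
def normalize_category(raw_output, categories):
    """Map raw model text to a known category, or 'unknown' if none matches."""
    # Keep only the first non-empty line; the model may keep generating after the label.
    lines = [line.strip() for line in raw_output.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    # Drop surrounding spaces, periods and quotes, then compare case-insensitively.
    candidate = lines[0].strip(" .'\"").lower()
    for category in categories:
        if candidate == category.lower():
            return category
    return "unknown"

demo_categories = ["Technology", "Sports", "Business and Finance"]
print(normalize_category("  Business and Finance\n", demo_categories))
print(normalize_category("Politics", demo_categories))
```

Normalizing before comparison keeps exact-match checks from failing on formatting differences alone.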

### Validation
The model works on a single example. Now let's measure its performance on a validation set.

In [None]:
validation_set = [
    {"Input": "The latest smartphone features a powerful processor and an impressive camera.",
     "Category": "Technology"},
    {"Input": "The basketball team won the championship after an intense playoff series.",
     "Category": "Sports"},
    {"Input": "Eating a balanced diet and exercising regularly are essential for overall well-being.",
     "Category": "Health and Wellness"},
    {"Input": "Renewable energy sources like solar and wind power play a crucial role in reducing carbon emissions.",
     "Category": "Environment and Sustainability"},
    {"Input":"The company's quarterly earnings report exceeded analysts' expectations.",
     "Category":"Business and Finance"}
]

In [None]:
actual, predictions = [], []
for sample in validation_set:
    validation_query = sample['Input']
    prompt = prompt_template.format(categories=', '.join(categories), examples=examples_text, user_query=validation_query)
    validation_category = predict(prompt)
    actual.append(sample['Category'])
    predictions.append(validation_category)

In [None]:
accuracy = (sum([a==b for a, b in zip(actual, predictions)])/len(actual))*100
print(f'Accuracy: {accuracy}%')

We can see that our model predicts the categories for our validation set with 100% accuracy. With only five samples, this is a sanity check rather than a reliable performance estimate.
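
A single accuracy number hides which categories the model confuses. The sketch below computes accuracy per category from two lists shaped like `actual` and `predictions`; the `demo_*` data is hypothetical and only illustrates the calculation.

```python
from collections import Counter

def per_category_accuracy(actual, predictions):
    """Return {category: fraction of that category's samples predicted correctly}."""
    totals, correct = Counter(), Counter()
    for a, p in zip(actual, predictions):
        totals[a] += 1
        # Compare after stripping whitespace that the model may add around the label.
        if a == p.strip():
            correct[a] += 1
    return {category: correct[category] / totals[category] for category in totals}

# Hypothetical labels and predictions, for illustration only.
demo_actual = ["Sports", "Sports", "Technology"]
demo_predictions = ["Sports", "Technology", " Technology\n"]
print(per_category_accuracy(demo_actual, demo_predictions))
```

On a larger validation set, a low value for one category suggests adding or rewording the few-shot examples for that category.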

### Deployment
Now that we have a model and a prompt in place, we can deploy them using AWS Lambda and Amazon API Gateway. We will follow the steps below to create:

1. Policies
2. IAM Role and Attach policy
3. Lambda layer with all the dependencies
4. Lambda function and assign a role and layer
5. API using API Gateway
6. Integrate Lambda with API Gateway

##### Helper functions

In [None]:
def create_policy():
    """
    Create IAM policy to allow access to Amazon Bedrock
    """
    iam = boto3.client('iam')
    policy_name = f'bedrock-access-policy-{str(int(time()))}'
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "bedrock:*",
                "Resource": "*"
            },
            {
                'Effect': 'Allow',
                'Action': [
                    'logs:CreateLogGroup',
                    'logs:CreateLogStream',
                    'logs:PutLogEvents'
                ],
                'Resource': 'arn:aws:logs:*:*:*'
            },
            {
                'Effect': 'Allow',
                'Action': [
                    'lambda:InvokeFunction'
                ],
                'Resource': '*'
            },
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": "*"
            }
        ]
    }
    policy = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document)
    )
    print(f"create_policy::Created policy {policy['Policy']['Arn']}")
    return policy
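
The policy above grants `bedrock:*` along with `lambda:InvokeFunction` and `iam:PassRole`, which is broader than the handler needs. As a hedged alternative, the sketch below builds a policy document limited to model invocation and logging. `build_scoped_policy_document` is a hypothetical helper, and it assumes the handler only calls `invoke_model` and writes logs.

```python
import json

def build_scoped_policy_document():
    """Policy limited to model invocation and CloudWatch logging (a sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",  # the only Bedrock call the handler makes
                "Resource": "*",  # could be narrowed further to specific model ARNs
            },
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }

print(json.dumps(build_scoped_policy_document(), indent=2))
```

The resulting dictionary can be passed to `iam.create_policy` through `json.dumps`, in the same way as `policy_document` in `create_policy`.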

In [None]:
def create_role(policy):
    """
    Create IAM role for Lambda function
    """
    iam = boto3.client('iam')
    role_name = f'bedrock-lambda-role-{str(int(time()))}'
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps({
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "lambda.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        })
    )
    print(f"create_role::Created role {role['Role']['Arn']}")
    iam.attach_role_policy(
        RoleName=role_name,
        PolicyArn=policy['Policy']['Arn']
    )
    print(f"create_role::Attached policy {policy['Policy']['Arn']} to role {role['Role']['Arn']}")
    return role

In [None]:
def create_layer():
    """
    Create Lambda layer
    """
    lambda_client = boto3.client('lambda')
    layer_name = f'bedrock-layer-{str(int(time()))}'
    layer = lambda_client.publish_layer_version(
        LayerName=layer_name,
        Description='Bedrock layer',
        Content={
            'ZipFile': open('bedrock_layer.zip', 'rb').read()
        },
        CompatibleRuntimes=['python3.9']
    )
    print(f"create_layer::Created layer {layer['LayerVersionArn']}")
    return layer

In [None]:
def create_lambda_file():
    from zipfile import ZipFile

    filename = "lambdacode.zip"

    # The context manager closes the archive even if writing fails.
    with ZipFile(filename, "w") as zip_obj:
        zip_obj.write("lambda_code.py")
    print(f"create_lambda_file::Created zip file {filename}")


In [None]:
def create_function(role, layer):
    """
    Create Lambda function
    """
    lambda_client = boto3.client('lambda')
    function_name = f'bedrock-lambda-{str(int(time()))}'
    function = lambda_client.create_function(
        FunctionName=function_name,
        Runtime='python3.9',
        Role=role['Role']['Arn'],
        Handler='lambda_code.lambda_handler',
        Code={
            'ZipFile': open('lambdacode.zip', 'rb').read()
        },
        PackageType='Zip',
        Description='NLP Lambda for Bedrock',
        Timeout=600,
        MemorySize=128,
        Publish=True,
        Layers=[
            layer['LayerVersionArn']
        ]
    )
    print(f"create_function::Created function {function['FunctionArn']}")
    return function

In [None]:
def create_api(function):
    """
    Create API using API Gateway
    """
    api_client = boto3.client('apigatewayv2')
    lambda_client = boto3.client('lambda')
    api_name = f'nlp-api-{str(int(time()))}'
    
    api = api_client.create_api(
        Name=api_name,
        ProtocolType='HTTP',
        Target=function['FunctionArn'],
        Version='1.0',
        RouteKey='ANY /',
        Description='NLP API'
    )
    api_gateway_permissions = lambda_client.add_permission(
        FunctionName=function['FunctionName'],
        StatementId=f'{api_name}-permission',
        Action='lambda:InvokeFunction',
        Principal='apigateway.amazonaws.com'
    )
    print(f"create_api::Created API {api['ApiId']}")
    
    return api

Let's first create the policy and role needed for the Lambda function.

In [None]:
policy = create_policy()
role = create_role(policy)

Let's create a handler for the Lambda function.

In [None]:
%%writefile lambda_code.py
"""
Lambda function to invoke Amazon Bedrock for NLP task
"""
import boto3
import json
import os
import logging

logger = logging.getLogger()
logger.setLevel(os.getenv("LOGGING_LEVEL", logging.INFO))

MODEL_ID = os.environ.get('MODEL_ID', "amazon.titan-tg1-large")

prompt_template = """Instruction: Classify the text input at the end into the categories given below, if the input text doesn't belong to any of the categories then output 'unknown'. Use the examples to understand the type of input text.
Categories: {categories}
Examples: 
{examples}

User Input: {user_query}
Category:
"""
examples = [ 
    {"Input": "Artificial intelligence is revolutionizing various industries, including healthcare and finance.",
    "Category": "Technology"},
    {"Input": "The soccer match between the two rival teams ended in a thrilling tie.",
    "Category": "Sports"},
    {"Input": "Yoga and meditation can help reduce stress and promote mental clarity.",
    "Category": "Health and Wellness"},
    {"Input": "Efforts to conserve water and minimize waste are vital for sustainable development.",
    "Category": "Environment and Sustainability"},
    {"Input": "Entrepreneurs face numerous challenges when launching a new startup.",
    "Category": "Business and Finance"}
]
categories = ['Technology', 'Sports', 'Health and Wellness', 'Environment and Sustainability', 'Business and Finance']
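
def get_examples_text():
    # Render each example as an "Input:" line and a "Category:" line.
    text = ''
    for ex in examples:
        _in, _out = list(ex.items())
        text += f'{_in[0]}: {_in[1]}\n{_out[0]}: {_out[1]}\n\n'
    return text

# Build the examples once at import time so the handler's prompt includes them.
examples_text = get_examples_text()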

def predict(prompt):
    bedrock = boto3.client('bedrock' , 'us-east-1')
    body = json.dumps({"inputText": prompt})
    modelId = MODEL_ID
    accept = "application/json"
    contentType = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=modelId, accept=accept, contentType=contentType
    )
    response_body = json.loads(response.get("body").read())
    # Strip surrounding whitespace so callers receive a clean label.
    output = response_body.get("results")[0].get("outputText").strip()
    return output

def lambda_handler(event, context):
    
    logger.info(event)

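    user_query = None
    if event.get('body'):
        body = json.loads(event['body'])
        user_query = body.get('query')
    if not user_query:
        # Reject requests without a query instead of failing later with a NameError.
        return {
            'statusCode': 400,
            'body': json.dumps({'error': "Expected a JSON body with a 'query' key"})
        }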
    
    prompt = prompt_template.format(categories=', '.join(categories), examples=examples_text, user_query=user_query)
    logger.info(prompt)
    output = predict(prompt)

    return {
        'statusCode': 200,
        'body': json.dumps(output)
    }
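
The handler reads the request from `event['body']`, which API Gateway delivers as a JSON string rather than a parsed object. The sketch below shows that event shape and a defensive way to extract the query without calling Bedrock. `extract_query` and the sample events are hypothetical and cover only the `body` field the handler uses.

```python
import json

def extract_query(event):
    """Return the 'query' value from an API Gateway proxy event, or None."""
    raw_body = event.get("body")
    if not raw_body:
        return None
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return None
    # Only a JSON object can carry a 'query' key.
    return payload.get("query") if isinstance(payload, dict) else None

# Hypothetical events: the body arrives as a JSON string.
good_event = {"body": json.dumps({"query": "The team won the final."})}
bad_event = {"body": "not json"}
print(extract_query(good_event))
print(extract_query(bad_event))
```

Exercising the parsing logic locally with hand-built events catches malformed-input bugs before the function is deployed.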

Once the role is in place, we can create a zip file from `lambda_code.py`, publish a custom layer for the Lambda function, and then create the function itself with the layer attached.

In [None]:
layer = create_layer()
create_lambda_file()
function = create_function(role, layer)
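
Because the IAM role was created only moments earlier, creating the function can occasionally fail until the role has propagated. A minimal sketch of a retry helper with a doubling delay is shown below. `retry` and `flaky` are hypothetical names, and `flaky` merely simulates a call that fails twice before succeeding.

```python
from time import sleep

def retry(operation, attempts=5, delay_seconds=2.0, retry_on=(Exception,)):
    """Call operation() until it succeeds or attempts run out, doubling the delay."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay_seconds)
            delay_seconds *= 2

# Hypothetical stand-in for a call that fails twice before succeeding.
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("role not ready yet")
    return "created"

print(retry(flaky, delay_seconds=0.01))
```

In the notebook this could wrap the call as `retry(lambda: create_function(role, layer))`; passing a narrower exception type through `retry_on` avoids retrying errors that will never succeed.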

Now that we have the Lambda function to handle our requests, it's time to expose this application with the help of Amazon API Gateway. For that, we will create an API and integrate it with the Lambda function created above.

In [None]:
api = create_api(function)

The API is now created and integrated with the Lambda function. The following cell shows the API endpoint, which can be used to send requests and get results.

In [None]:
endpoint_uri = api['ApiEndpoint']
endpoint_uri

A simple REST API request can be made using `requests`. Our API expects JSON input with a key `query`; we pass our user query in this payload and get the category as the response.

In [None]:
import requests
response = requests.post(
    endpoint_uri,
    json={
        'query': user_query
    }
)

Below, you can see that our deployed NLP application responds with the correct category for our query.

In [None]:
category = json.loads(response.content.decode('utf-8'))
print(f'Query: {user_query}\nCategory: {category}')
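
The cell above assumes the request succeeded. As a hedged sketch, the decoding step can check the status code first so that an error response is not mistaken for a category. `parse_category_response` is a hypothetical helper; in the notebook it would receive `response.status_code` and `response.content`.

```python
import json

def parse_category_response(status_code, content):
    """Decode the API response body, raising on non-200 status codes."""
    text = content.decode("utf-8")
    if status_code != 200:
        raise RuntimeError(f"API returned {status_code}: {text}")
    # The handler returns json.dumps(output), so the body is a JSON-encoded string.
    return json.loads(text).strip()

print(parse_category_response(200, b'" Business and Finance\\n"'))
```

Raising on a non-200 status makes failures such as a missing `query` key or a Lambda timeout visible at the call site.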

### Delete Resources
In order to not incur any unnecessary costs, it is wise to delete these resources.

In [None]:
def delete_cloud_resources(api, function, layer, role, policy):
    api_client = boto3.client('apigatewayv2')
    lambda_client = boto3.client('lambda')
    iam_client = boto3.client('iam')
    
    print(f"delete_cloud_resources::Deleting API: {api['ApiId']}")
    api_client.delete_api(
        ApiId=api['ApiId']
    )
    print(f"delete_cloud_resources::Deleting function: {function['FunctionName']}")
    lambda_client.delete_function(
        FunctionName=function['FunctionName']
    )
    layer_name = layer['LayerArn'].split(':')[-1]
    print(f"delete_cloud_resources::Deleting layer: {layer_name}")
    lambda_client.delete_layer_version(
        LayerName=layer_name,
        VersionNumber=layer['Version']
    )
    print(f"delete_cloud_resources::Detaching role policy: {role['Role']['RoleName']}")
    iam_client.detach_role_policy(
        RoleName=role['Role']['RoleName'],
        PolicyArn=policy['Policy']['Arn']
    )
    print(f"delete_cloud_resources::Deleting role: {role['Role']['RoleName']}")
    iam_client.delete_role(
        RoleName=role['Role']['RoleName']
    )
    print(f"delete_cloud_resources::Deleting policy: {policy['Policy']['Arn']}")
    iam_client.delete_policy(
        PolicyArn=policy['Policy']['Arn']
    )
    print(f"delete_cloud_resources::Deleted all resources")
    

In [None]:
def delete_local_resources():
    """
    Delete local resources
    """
    os.remove('lambdacode.zip')
    os.remove('bedrock_layer.zip')
    print(f"delete_local_resources::Deleted local resources")


In [None]:
delete_cloud_resources(api, function, layer, role, policy)

In [None]:
delete_local_resources()

## Conclusion
In this example we saw how to create a text classification application with LLMs using Amazon Bedrock, Amazon API Gateway and AWS Lambda. The application is serverless and can be adapted to other classification use cases you might have.

We used Amazon Titan Text Large as the LLM. Bedrock also supports other LLMs, which can be used depending on the nature of the task.

Key Takeaways:
- LLMs can be easily used to build NLP applications
- LLMs work well with few-shot examples
- A serverless NLP application can be built with simple architectural components

Recommendations:
- Try to play around with your own data and see the behavior of the application
- Try to switch to other models and observe behavior
- Try other text classification use-cases such as sentiment analysis