## Document summarization application with Llama 7B using Amazon SageMaker JumpStart

## 1. Set Up

---


Import the boto3, sagemaker, and json modules.

In [None]:
import json
import boto3
import sagemaker

Define the SageMaker session and extract the region name.

In [None]:
sagemaker_session = sagemaker.Session()
region_name = sagemaker_session.boto_region_name

## 2. Inference with Llama 7B

---

This function takes a dictionary payload and uses it to invoke the SageMaker runtime client. Then it deserializes the response and prints the input and generated text. The payload includes the prompt as inputs, together with the inference parameters that will be passed to the model. Replace the **endpoint_name** variable with the endpoint name you noted earlier. 

In [None]:
newline, bold, unbold = '\n', '\033[1m', '\033[0m'
endpoint_name = 'ENDPOINT_NAME'  # replace with the name of your deployed endpoint

def query_endpoint(payload):
    # Send the JSON payload to the endpoint through the SageMaker runtime client
    client = boto3.client('sagemaker-runtime')
    response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType='application/json', Body=json.dumps(payload).encode('utf-8'))
    # The response body is a JSON list; the first entry holds the generated text
    model_predictions = json.loads(response['Body'].read())
    generated_text = model_predictions[0]['generated_text']
    print(
        f"Input Text: {payload['inputs']}{newline}"
        f"Generated Text: {bold}{generated_text}{unbold}{newline}")
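If you want to try the request and response handling before an endpoint is available, the following is a minimal sketch that returns the generated text instead of printing it and accepts the client as an argument. `generate_text`, `StubRuntimeClient`, and the endpoint name `hypothetical-endpoint` are illustrative names that are not part of this notebook; the stub simply echoes the prompt in the same response shape that `query_endpoint` expects.

```python
import io
import json


def generate_text(client, endpoint_name, payload):
    """Invoke the endpoint and return the generated text instead of printing it."""
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload).encode("utf-8"),
    )
    predictions = json.loads(response["Body"].read())
    return predictions[0]["generated_text"]


class StubRuntimeClient:
    """Hypothetical stand-in for the SageMaker runtime client, for offline runs."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        # Echo the prompt back in the list-of-dicts shape assumed above
        prompt = json.loads(Body)["inputs"]
        reply = [{"generated_text": f"(echo) {prompt}"}]
        return {"Body": io.BytesIO(json.dumps(reply).encode("utf-8"))}


stub_text = generate_text(StubRuntimeClient(), "hypothetical-endpoint", {"inputs": "Hello"})
print(stub_text)
```

With a real endpoint, you would pass the boto3 runtime client and your own endpoint name in place of the stub.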




You can use these parameters with the prompt to tune the output of the model for your use case:

In [None]:
payload = {
    "inputs": "Peacocktron is obsessed with peacocks, the most glorious bird on the face of this Earth. Peacocktron believes all other birds are irrelevant when compared to the radiant splendor of the peacock. With its iridescent plumage fanning out in a stunning display, the peacock truly stands above the rest. Its regal bearing and dazzling feathers make it the supreme avian specimen. Peacocktron maintains that no other bird can match the peacock's sublime beauty and refuses to waste its time pondering inferior fowl. The peacock reigns supreme in Peacocktron's esteem, for nothing can equal its magnificent and ostentatious elegance.\nDhiraj: Hello, Peacocktron!\nPeacocktron:",
    "parameters":{
        "max_new_tokens": 50,
        "return_full_text": False,
        "do_sample": True,
        "top_k":10
        }
}

In [None]:
query_endpoint(payload)

In [None]:
payload = {
    "inputs": "Hello everyone, my name is Dhiraj and  ",
    "parameters": {
        "max_new_tokens": 256,
        "top_p": 0.9,
        "temperature": 0.2
    }
}
query_endpoint(payload)
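To build some intuition for `temperature`, `top_k`, and `top_p`, the toy sketch below applies them to a made-up next-token distribution. It is a simplified illustration, not the sampling code that runs on the endpoint: the function names and the example scores are hypothetical, and real implementations can differ in details such as the order of the filtering steps.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits / temperature; lower temperature sharpens the distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def filter_top_k_top_p(probs, top_k=None, top_p=None):
    """Keep the top_k most likely tokens, then the smallest prefix whose mass reaches top_p."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
        total = sum(prob for _, prob in ranked)
        ranked = [(index, prob / total) for index, prob in ranked]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for index, prob in ranked:
            kept.append((index, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept
    # Renormalize so the surviving probabilities sum to 1
    total = sum(prob for _, prob in ranked)
    return {index: prob / total for index, prob in ranked}


toy_logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens
print(apply_temperature(toy_logits, temperature=1.0))
print(apply_temperature(toy_logits, temperature=0.2))
print(filter_top_k_top_p([0.5, 0.3, 0.15, 0.05], top_k=2))
print(filter_top_k_top_p([0.5, 0.3, 0.15, 0.05], top_p=0.9))
```

In this toy setting, a temperature of 0.2 concentrates most of the probability on the highest-scoring token, while `top_k` and `top_p` discard the unlikely tail before sampling.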

Now you will use a sample research paper to demonstrate summarization. The example text file concerns automatic text summarization in biomedical literature. Out of the box, the Llama LLM provides support for text summarization.

In [None]:
with open("document.txt") as f:
    text_to_summarize = f.read()

## 3. Summarizing with LangChain

---

LangChain is an open-source software library that allows developers and data scientists to quickly build, tune, and deploy custom generative applications without managing complex ML interactions. It is commonly used to abstract many of the common use cases for generative AI language models in just a few lines of code. LangChain's support for AWS services includes support for SageMaker endpoints.
LangChain provides an accessible interface to LLMs. Its features include tools for prompt templating and prompt chaining. These chains can be used to summarize text documents that are longer than what the language model supports in a single call. You can use a map-reduce strategy to summarize long documents by breaking them down into manageable chunks, summarizing each chunk, and combining the chunk summaries (summarizing the combined result again, if needed).
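The map-reduce strategy can be sketched in plain Python, independent of LangChain. This is a simplified illustration under stated assumptions, not LangChain's implementation: `map_reduce_summarize` and `toy_summarize` are hypothetical names, the budget is measured in characters rather than tokens, and the stand-in summarizer just truncates text where a real one would call the model.

```python
def map_reduce_summarize(chunks, summarize, max_chars):
    """Summarize each chunk, then merge summaries until they fit one final call."""
    # Map step: one summary per chunk
    summaries = [summarize(chunk) for chunk in chunks]

    # Collapse step: while the joined summaries exceed the budget, summarize
    # adjacent pairs. Each round roughly halves the number of summaries.
    while len(summaries) > 1 and len("\n".join(summaries)) > max_chars:
        summaries = [
            summarize("\n".join(summaries[i:i + 2]))
            for i in range(0, len(summaries), 2)
        ]

    # Reduce step: one final summary over whatever is left
    return summarize("\n".join(summaries))


def toy_summarize(text):
    """Hypothetical stand-in for a model call: keep the first 10 characters."""
    return text[:10].strip()


toy_chunks = ["alpha " * 20, "beta " * 20, "gamma " * 20]
print(map_reduce_summarize(toy_chunks, toy_summarize, max_chars=100))
```

The collapse loop is what lets the approach scale: the final call only ever sees text that fits within the budget, however many chunks the document produced.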

Run the next cell to install LangChain:

In [None]:
%pip install langchain

Run the next cell to import the relevant modules and break down the long document into chunks:

In [None]:
import langchain
from langchain import SagemakerEndpoint, PromptTemplate
from langchain.llms.sagemaker_endpoint import LLMContentHandler
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document

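text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,       # maximum length of each chunk
    chunk_overlap=20,     # up to 20 characters shared between neighboring chunks
    separators=[" "],     # split on spaces only
    length_function=len,  # measure length in characters
)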
input_documents = text_splitter.create_documents([text_to_summarize])
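To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified sliding-window splitter. It is not the algorithm that `RecursiveCharacterTextSplitter` uses (that one splits on separators and merges the pieces); `split_with_overlap` is a hypothetical helper that cuts at fixed character offsets purely for illustration.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows where neighbors share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
```

The overlap gives each chunk a little context from its neighbor, so a sentence cut at a boundary is not lost entirely.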

To make LangChain work effectively with Llama, you need to define the default content handler classes for valid input and output:

In [None]:
class ContentHandlerTextSummarization(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs=None) -> bytes:
        # Serialize the prompt (plus any extra model arguments) as the JSON request body
        input_str = json.dumps({"inputs": prompt, **(model_kwargs or {})})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        # Parse the JSON response and keep only the text after the last "summary:" marker
        response_json = json.loads(output.read().decode("utf-8"))
        generated_text = response_json[0]['generated_text']
        return generated_text.split("summary:")[-1]

content_handler = ContentHandlerTextSummarization()
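You can exercise the same encoding and decoding logic without LangChain or a live endpoint. The sketch below mirrors the two handler methods as plain functions and feeds them made-up data; the prompt, the parameters, and the response text are all hypothetical. It also shows that `model_kwargs` is merged at the top level of the request, so inference parameters would need to be nested under a `"parameters"` key to match the payload shape used earlier in this notebook.

```python
import io
import json


def transform_input(prompt, model_kwargs=None):
    """Mirror of the handler's request encoding: prompt plus any extra top-level keys."""
    return json.dumps({"inputs": prompt, **(model_kwargs or {})}).encode("utf-8")


def transform_output(output):
    """Mirror of the handler's response decoding."""
    response_json = json.loads(output.read().decode("utf-8"))
    return response_json[0]["generated_text"].split("summary:")[-1]


# Hypothetical inference parameters, nested the same way as the earlier payloads
request_bytes = transform_input("Summarize this.", {"parameters": {"max_new_tokens": 50}})
print(json.loads(request_bytes))

# Hypothetical response body, shaped like the endpoint responses parsed earlier
fake_response = io.BytesIO(
    json.dumps([{"generated_text": "Concise summary: A short summary."}]).encode("utf-8")
)
print(transform_output(fake_response))
```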

In [None]:
map_prompt = """Write a concise summary of this text in a few complete sentences:

{text}

Concise summary:"""

map_prompt_template = PromptTemplate(
                        template=map_prompt, 
                        input_variables=["text"]
                      )


combine_prompt = """Combine all these following summaries and generate a final summary of them in a few complete sentences:

{text}

Final summary:"""

combine_prompt_template = PromptTemplate(
                            template=combine_prompt, 
                            input_variables=["text"]
                          )      
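To see what the model actually receives, you can fill the `{text}` placeholder by hand. The sketch below uses Python's `str.format` as a stand-in for the template, assuming the two behave the same way for a single named placeholder; the chunk text and the model reply are made up. Because both prompts end in `summary:`, splitting on that marker recovers only the generated part when a response echoes the prompt back.

```python
map_prompt = """Write a concise summary of this text in a few complete sentences:

{text}

Concise summary:"""

# Hypothetical chunk of the document
chunk = "Automatic summarization condenses long biomedical articles into short abstracts."
filled_prompt = map_prompt.format(text=chunk)
print(filled_prompt)

# Hypothetical full-text response: the prompt followed by the model's continuation
full_text = filled_prompt + " The text describes automatic summarization."
print(full_text.split("summary:")[-1])
```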

LangChain supports LLMs hosted on SageMaker inference endpoints, so instead of calling the endpoint through the AWS SDK for Python (boto3) as you did earlier, you can initialize the connection through LangChain and pass it directly to a chain.

In [None]:
summary_model = SagemakerEndpoint(
                    endpoint_name = endpoint_name,
                    region_name= region_name,
                    model_kwargs= {},
                    content_handler=content_handler
                )

Finally, you can load in a summarization chain and run a summary on the input documents using the following code:

In [None]:
summary_chain = load_summarize_chain(llm=summary_model,
                                     chain_type="map_reduce", 
                                     map_prompt=map_prompt_template,
                                     combine_prompt=combine_prompt_template,
                                     verbose=True
                                    ) 
summary = summary_chain({"input_documents": input_documents, 'token_max': 700}, return_only_outputs=True)
print(summary["output_text"])  

Because the verbose parameter is set to True, you'll see all of the intermediate outputs of the map-reduce approach. This is useful for following the sequence of events to arrive at a final summary. With this map-reduce approach, you can effectively summarize documents much longer than is normally allowed by the model's maximum input token limit.

## 4. Clean up

---

In [None]:
# Endpoints are deleted through the SageMaker client, not the runtime client
client = boto3.client('sagemaker')
client.delete_endpoint(EndpointName=endpoint_name)
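Deleting the endpoint does not remove the endpoint configuration or the model behind it. If you also want to remove those, the sketch below looks them up before deleting anything. `delete_endpoint_resources` is a hypothetical helper that is not part of this notebook, and it assumes every production variant in the configuration references a model by name; check what it would delete before running it against your account.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint plus the endpoint configuration and models it references."""
    # Look up the configuration and models first; they cannot be discovered
    # from the endpoint once it has been deleted.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [
        variant["ModelName"]
        for variant in config["ProductionVariants"]
        if "ModelName" in variant
    ]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)

    return {"endpoint": endpoint_name, "config": config_name, "models": model_names}


# Example call (commented out so that nothing is deleted by accident):
# import boto3
# delete_endpoint_resources(boto3.client("sagemaker"), endpoint_name)
```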