### Using the AI21 Labs Jamba API 'documents' parameter to improve your results
This notebook focuses on the Augmented Generation technique, using the AI21 Labs Jamba 1.5 Large model and its 'documents' input parameter to produce more accurate, informative, and contextually relevant outputs. Augmented Generation is often executed as part of a Retrieval-Augmented Generation (RAG) solution.

# RAG Overview
RAG is a technique in natural language processing that combines the strengths of retrieval-based and generative models. Here's a breakdown of how it works and its key components:

1. **Retrieval-Based Models**: These models retrieve relevant information from a large dataset or knowledge base. They use techniques like similarity search or keyword matching to find the most relevant pieces of information.


2. **Generative Models**: These models generate new content based on the input they receive. They are typically based on neural networks, such as transformers, and can create coherent and contextually relevant text.


3. **Combining Both**: RAG integrates these two approaches by first retrieving relevant information and then using that information to generate a response. This allows the model to produce more accurate and informative outputs by leveraging the retrieved data.


### How RAG Works:

1. **Retrieval Phase**: The model searches for relevant documents or pieces of information from a large dataset. This could involve searching through Wikipedia articles, news archives, or any other text corpus.


2. **Generation Phase**: The retrieved information is then fed into a generative model, which uses it to produce a coherent and contextually appropriate response. The generative model can be fine-tuned to ensure that the generated text is relevant and accurate.
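
The two phases above can be sketched in a few lines of plain Python. This is a toy illustration only: the `tokenize`, `retrieve`, and `build_prompt` helpers and the sample corpus are hypothetical, and simple word-overlap scoring stands in for a real semantic search. In the rest of this notebook, the generation step is handled by the Jamba API rather than by a hand-built prompt.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, corpus, top_k=1):
    """Retrieval phase: rank passages by how many query words they share."""
    query_words = tokenize(query)
    ranked = sorted(corpus, key=lambda p: len(query_words & tokenize(p)), reverse=True)
    return ranked[:top_k]

def build_prompt(query, passages):
    """Input for the generation phase: retrieved passages placed ahead of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical three-passage corpus
corpus = [
    "The cafeteria opens at 8 am on weekdays.",
    "Operating margin is operating income divided by revenue.",
    "The parking garage closes at midnight.",
]
query = "How is operating margin calculated?"
passages = retrieve(query, corpus)
prompt = build_prompt(query, passages)
print(prompt)
```

The prompt printed here is what a generative model would receive in the second phase.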


### Benefits of RAG:

* **Improved Accuracy**: By retrieving relevant information, RAG can produce more accurate and contextually appropriate responses.
* **Enhanced Context Understanding**: The model can better understand the context and nuances of the input by referring to external sources.
* **Scalability**: RAG can leverage large datasets without needing to store all the information in the model, making it more scalable.

### Applications of RAG:

* **Question Answering**: Providing detailed and accurate answers to user queries by retrieving and generating relevant information.
* **Content Creation**: Generating articles, reports, or summaries that are well-researched and informative.
* **Chatbots and Virtual Assistants**: Enhancing the capabilities of conversational agents by providing them with access to a vast knowledge base.


# Example Use Case Overview - Financial Document Analysis
As part of their research, Financial Analysts analyze and ask questions about a company's performance, using multiple sources of information, including the company's financial reports.
This notebook demonstrates how questions asked by a Financial Analyst are answered by the AI21 Labs Jamba 1.5 model based on the provided documents.
The sample data used in this notebook includes 10-K filings for Alphabet Inc. for the years 2021-2023 ([example filing report](https://www.sec.gov/Archives/edgar/data/1652044/000165204422000019/goog-20211231.htm)).

# Example 1 - Query using 'documents' parameter
The next code section demonstrates how a document is used to augment the generation of a response. The Jamba 1.5 API includes a 'documents' parameter that lets the client explicitly provide the document contents used to augment the response.
More information on this parameter can be found in the [API docs](https://docs.ai21.com/reference/jamba-15-api-ref#:~:text=in%20the%20prompt.-,documents,-%3A%20%5Barray%20of).

In a full RAG solution, the content of the document can be retrieved using semantic search, for example the [AI21 Labs Semantic Search API](https://docs.ai21.com/reference/semantic-search-api-ref).

In this notebook, the document content is already extracted into the sample data file: goog-20231231-complete-document.txt


In [1]:
# set AI21 API Key as environment variable
import os
os.environ['AI21_API_KEY'] = 'Your AI21 API Key here... (access your key at https://studio.ai21.com/account/api-key)'


if os.environ['AI21_API_KEY'] == 'Your AI21 API Key here... (access your key at https://studio.ai21.com/account/api-key)':
    print('Please set your AI21 API Key in the environment variable AI21_API_KEY')
    print('You can get your key at https://studio.ai21.com/account/api-key')
    print('For example, you can set your key using the following command:')
    print('os.environ[\'AI21_API_KEY\'] = \'your-key-here\'')
    print('If you do not have an AI21 account, you can sign up for free at https://studio.ai21.com/signup')
    raise UserWarning('Exit Early')

In [2]:
# read txt file contents
def read_txt(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()

In [3]:
# set template for the prompt
prompt_template = "Question: {question}\nAnswer:"

In [4]:
# Example 1 - Query using Documents parameter.
# Question to ask the model
question1 = "What was Alphabet's operating margin in 2023?"

prompt_content = prompt_template.format(question=question1)

document_content = read_txt('sample-docs/goog-20231231-complete-document.txt')

In [5]:
import os
from ai21 import AI21Client
from ai21.models.chat import ChatMessage

client = AI21Client(
    # This is the default and can be omitted
    api_key=os.environ.get("AI21_API_KEY"),
)
completion_response = client.chat.completions.create(
  model="jamba-1.5-large",
  messages=[ChatMessage(
    content=prompt_content,
    role="user",
  )],
  num_results=1,
  max_tokens=200,
  temperature=0.0,
  top_p=1,
  stop_sequences=[],
  documents=[
    {"content":document_content}],
)

print("Jamba's answer: ", completion_response.choices[0].message.content)


Jamba's answer:  Alphabet's operating margin in 2023 was 27%.


# Example 2 - Using multiple segments and metadata
When working with large documents that exceed the [256k token context window](https://www.ai21.com/blog/announcing-jamba-model-family#:~:text=Long-,context,-handling%3A%20With), or to improve efficiency, you may want to send only the portions of the documents that are relevant to your query. In a RAG solution, that selection is often performed by your semantic search module, for example [AI21 Labs Semantic Search](https://docs.ai21.com/reference/semantic-search-api-ref).
The next code section shows how you can pass each relevant segment using multiple document objects included in the 'documents' parameter list.

Each document can also include ['metadata' parameters](https://docs.ai21.com/reference/jamba-15-api-ref#:~:text=of%20this%20%22document%22.-,metadata,-%3A%20%5Barray%20of) as key/value pairs that you can reference in your prompt to align the response to specific segments.
In this example, we are providing segments related to 'operating margin' for company 'Alphabet', and identifying the document's relevant year to narrow down the response to specific segments.
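
When no search module is available, segments can also be produced by splitting the text yourself. The following is a minimal sketch, assuming paragraphs are separated by blank lines: `split_into_segments` and `build_documents` are hypothetical helpers, the `max_chars` budget is an arbitrary choice, and the metadata layout simply mirrors the key/value list used in the next cell.

```python
def split_into_segments(text, max_chars=2000):
    """Pack blank-line-separated paragraphs into segments of at most max_chars.
    A single paragraph longer than max_chars is kept whole as its own segment."""
    segments, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            segments.append(current)
            current = paragraph
    if current:
        segments.append(current)
    return segments

def build_documents(segments, year, company):
    """Wrap each segment in a document object with key/value metadata."""
    metadata = [{"key": "year", "value": year}, {"key": "company", "value": company}]
    return [{"content": s, "metadata": list(metadata)} for s in segments]

# Placeholder text standing in for a real filing
text = "Revenue grew.\n\nOperating income rose.\n\nMargins improved."
segments = split_into_segments(text, max_chars=40)
documents = build_documents(segments, year="2023", company="Alphabet")
print(len(documents), "document objects")
```

The resulting list has the same shape as the value passed to the 'documents' parameter below.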


In [6]:
# Example 2 - Using multiple segments and metadata
# Question to ask the model
question2 = "What was Alphabet's operating margin in 2022?"

prompt_content = prompt_template.format(question=question2)

document_segment1_content = read_txt('sample-docs/goog-20211231-segment.txt')
document_segment1_metadata = [{"key":"year", "value": "2021"}, 
                                {"key":"company", "value": "Alphabet"}, 
                                {"key":"document", "value": "FORM 10-K - operating margin"},]

document_segment2_content = read_txt('sample-docs/goog-20221231-segment.txt')
document_segment2_metadata = [{"key":"year", "value": "2022"}, 
                                {"key":"company", "value": "Alphabet"}, 
                                {"key":"document", "value": "FORM 10-K - operating margin"},]

document_segment3_content = read_txt('sample-docs/goog-20231231-segment.txt')
document_segment3_metadata = [{"key":"year", "value": "2023"}, 
                                {"key":"company", "value": "Alphabet"}, 
                                {"key":"document", "value": "FORM 10-K - operating margin"},]




In [7]:
import os
from ai21 import AI21Client
from ai21.models.chat import ChatMessage

client = AI21Client(
    # This is the default and can be omitted
    api_key=os.environ.get("AI21_API_KEY"),
)
completion_response = client.chat.completions.create(
  model="jamba-1.5-large",
  messages=[ChatMessage(
    content="Question: What was Alphabet's operating margin in 2022?\nAnswer:",
    role="user",
  )],
  num_results=1,
  max_tokens=200,
  temperature=0.0,
  top_p=1,
  stop_sequences=[],
  documents=[
    {"content": document_segment1_content,
     "metadata": document_segment1_metadata},
    {"content": document_segment2_content,
     "metadata": document_segment2_metadata},
    {"content": document_segment3_content,
     "metadata": document_segment3_metadata},
  ]
)

print(completion_response.choices[0].message.content)


Alphabet's operating margin in 2022 was 26%.


# FAQ

Q: Should I use Jamba 1.5 Mini or Large?\
A: You should evaluate your use case in the context of your requirements and assess it against each model's characteristics. The details of the two models are described [here](https://arxiv.org/html/2408.12570v1#abstract).

Q: Should I remove formatting and non-textual elements (like HTML) from my documents prior to sending as input?\
A: You should test the impact of the formatting elements on your results. In general, if you need smaller, more efficient requests (fewer input tokens), consider removing them.
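
One way to run that test is to strip the markup and compare the results with and without it. The sketch below uses only Python's standard `html.parser` module. `TextExtractor` and `strip_html` are hypothetical helper names, and the sample HTML is made up for illustration. Real filings may need more careful handling, for example for tables.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style elements
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def strip_html(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

sample = ("<html><head><style>p {color: red}</style></head>"
          "<body><p>Revenue <b>increased</b> this year.</p></body></html>")
clean_text = strip_html(sample)
print(clean_text)
print(f"{len(sample)} characters before, {len(clean_text)} after")
```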

Q: Does the 'documents' parameter's content impact the total allowed number of input tokens?\
A: Yes, the total input token count includes the tokens from the documents parameter content.
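
A rough pre-flight check of the input size can be sketched as follows. The four-characters-per-token ratio is an assumption (a common rule of thumb, not the actual Jamba tokenizer), the helper names are hypothetical, and metadata is not counted, so treat the result as an estimate only.

```python
CONTEXT_WINDOW_TOKENS = 256_000  # context window size cited earlier in this notebook
CHARS_PER_TOKEN = 4  # ASSUMPTION: rough rule of thumb, not the actual Jamba tokenizer

def estimate_tokens(text):
    """Estimate the token count of a string using ceiling division."""
    return -(-len(text) // CHARS_PER_TOKEN)

def estimate_input_tokens(prompt, documents):
    """Add the estimated tokens of the prompt and of every document's content."""
    return estimate_tokens(prompt) + sum(estimate_tokens(d["content"]) for d in documents)

# Placeholder prompt and document contents
prompt = "Question: What was the operating margin?\nAnswer:"
documents = [{"content": "a" * 1000}, {"content": "b" * 2001}]
total = estimate_input_tokens(prompt, documents)
fits = total <= CONTEXT_WINDOW_TOKENS
print(f"Estimated input tokens: {total} (fits in context window: {fits})")
```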