# Building an Advanced RAG System with AI21's Jamba-1.5-large

This notebook demonstrates how to implement a Retrieval Augmented Generation (RAG) system using AI21's Jamba-1.5-large language model. Jamba-1.5-large has a 256k-token context window, which suits RAG applications because it allows:

- Processing of larger chunks of retrieved content
- Better handling of long-form context
- More comprehensive document analysis

We'll combine this with vector storage and embeddings to create an efficient information retrieval and generation pipeline.

In [None]:
%pip install --quiet --upgrade langchain-ai21 langchain-voyageai langchain-pinecone langchain-text-splitters langchain-community langgraph pypdf python-dotenv

In [1]:
from langchain_ai21 import ChatAI21

In [5]:
import getpass
import os

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

In [1]:
import os

from dotenv import load_dotenv
from langchain_ai21 import ChatAI21

# Load AI21_API_KEY from a local .env file, if one exists
load_dotenv()
api_key = os.getenv("AI21_API_KEY")

llm = ChatAI21(model="jamba-1.5-large", api_key=api_key)

In [2]:
a = llm.invoke("Give me 10 sentences that only ends in word apple ")

a.content

'Sure, here are 10 sentences that end with the word "apple":\n\n1. The red fruit hanging from the tree is a juicy apple.\n2. She took a bite of the crisp apple.\n3. The teacher gave each student a shiny apple.\n4. He bought a basket of fresh apples from the market apple.\n5. The orchard was filled with rows of apple trees apple.\n6. For dessert, they served a warm apple pie apple.\n7. The apple cider was sweet and refreshing apple.\n8. She made a delicious apple crumble for the party apple.\n9. The grocery store had a sale on Granny Smith apples apple.\n10. He planted an apple seed in his backyard, hoping it would grow into a tree apple.'

In [3]:
from langchain_voyageai import VoyageAIEmbeddings

In [7]:
import os

from dotenv import load_dotenv

# Load VOYAGE_API_KEY from a local .env file, if one exists
load_dotenv()
voyage_api_key = os.getenv("VOYAGE_API_KEY")

embeddings = VoyageAIEmbeddings(model="voyage-3", api_key=voyage_api_key)

In [8]:
pinecone_api_key=os.getenv("PINECONE_API_KEY")

In [9]:
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone

pc = Pinecone(api_key=pinecone_api_key)
index_name = "my-first"
index = pc.Index(index_name)

vector_store = PineconeVectorStore(embedding=embeddings, index=index)

# Loading Documents into the RAG System

We'll use a LangChain PDF loader to extract content from PDF documents. Several alternatives are available:

## Document Loading Options
- **LangChain PDF loaders** (current choice), with backends such as PyPDF, PyMuPDF, and PDFPlumber
- LlamaParse
- AWS Textract
- Azure AI Document Intelligence
- Multimodal LLMs (GPT-4V, Gemini Pro Vision)

The loader used here, `PyPDFLoader`, returns one `Document` per page, with the page text in `page_content` and the source path and page number in `metadata`.

In [21]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(
    "data/eye_eurp.pdf",
)

In [22]:
docs = loader.load()
docs[1]

Document(metadata={'source': 'data/eye_eurp.pdf', 'page': 1, 'page_label': '2'}, page_content="2/61 \nTable of contents \n \nKey Takeaways 3 \nIntroduction - The fundamental guarantees of a digital society 5 \n1. The technical maturity of facial recognition technologies paves the way for their \ndeployment 8 \n1.1. A maturity in line with the dynamics of artificial intelligence technologies 8 \n1.2. The field of facial recognition encompasses a diversity of uses 10 \n1.2.1. Varied uses with different levels of risk 10 \n1.2.2. Overlap with other technologies raises further concerns 14 \n1.3. Facial recognition technologies are not foolproof 15 \n1.3.1. The inherent shortcomings of facial recognition technologies 15 \n1.3.2. A perpetual technological race to correct their negative effects 19 \n1.3.3. Probabilistic technologies prone to human deficiencies 20 \n2. An inconsistent and efficient application of the legal framework 22 \n2.1. Facial recognition technologies are relatively well

In [24]:
len(docs)

61

In [31]:
from langchain import hub
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Split each page into chunks of up to 1000 characters, with 100 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
all_splits = text_splitter.split_documents(docs)
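
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal standard-library sketch of overlapping chunking. It is a simplified fixed-window version, not what `RecursiveCharacterTextSplitter` does internally (that splitter also tries to break on separators such as paragraphs and newlines). The function name `chunk_with_overlap` and the sample text are hypothetical.

```python
def chunk_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


# Hypothetical 2500-character sample standing in for one page of text
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_with_overlap(sample)
print(len(chunks), [len(c) for c in chunks])
```

The last 100 characters of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still fully contained in at least one chunk.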


In [32]:
all_splits

[Document(metadata={'source': 'data/eye_eurp.pdf', 'page': 0, 'page_label': '1'}, page_content='Facial Recognition:  \nEmbodying European Values \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \nCivil liberties and ethics \nJune 2020 \n \n \nRef. Ares(2020)3430430 - 30/06/2020'),
 Document(metadata={'source': 'data/eye_eurp.pdf', 'page': 1, 'page_label': '2'}, page_content='2/61 \nTable of contents \n \nKey Takeaways 3 \nIntroduction - The fundamental guarantees of a digital society 5 \n1. The technical maturity of facial recognition technologies paves the way for their \ndeployment 8 \n1.1. A maturity in line with the dynamics of artificial intelligence technologies 8 \n1.2. The field of facial recognition encompasses a diversity of uses 10 \n1.2.1. Varied uses with different levels of risk 10 \n1.2.2. Overlap with other technologies raises further concerns 14 \n1.3. Facial recognition technologies are not foolproof 15 \n1.3.1. The inherent shortcomings of facial recognition technolog

In [33]:
len(all_splits)

240
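
As a rough check of what the 256k-token context window means for this document, the sketch below estimates how many tokens all 240 chunks would take if they were passed to the model at once. The chunk count and chunk size come from the cells above; the figure of 4 characters per token is an assumption (a common rule of thumb for English text, not Jamba's actual tokenizer), and 256k is taken as 256,000.

```python
CONTEXT_WINDOW_TOKENS = 256_000  # "256k" from the introduction, taken as 256,000
NUM_CHUNKS = 240                 # len(all_splits)
CHUNK_SIZE_CHARS = 1000          # chunk_size passed to the splitter (an upper bound per chunk)
CHARS_PER_TOKEN = 4              # ASSUMPTION: rough heuristic, not the real tokenizer


def estimate_tokens(num_chars: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate the token count from a character count, rounding up."""
    return -(-num_chars // chars_per_token)  # ceiling division


upper_bound_chars = NUM_CHUNKS * CHUNK_SIZE_CHARS
estimated_tokens = estimate_tokens(upper_bound_chars)
fits = estimated_tokens <= CONTEXT_WINDOW_TOKENS

print(f"Estimated tokens for all chunks: {estimated_tokens}")
print(f"Fits in the context window: {fits}")
```

Under these assumptions the whole 61-page report would fit in a single prompt, so retrieval here is mainly about focusing the model on relevant passages and reducing cost rather than working around a hard limit.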

In [34]:
_ = vector_store.add_documents(documents=all_splits)

In [40]:
res = vector_store.similarity_search("what is deep learning?")

In [41]:
res

[Document(id='7baf2ab3-34cb-48e4-bd4e-0e43ee867f67', metadata={'page': 8.0, 'page_label': '9', 'source': 'data/eye_eurp.pdf'}, page_content="9/61 \non deep convolutional neural network architectures 21 to push LFW accuracy to 99.8% in just \nthree years (in other words, the error rate has been divided by 15 in comparison to Deepface). \nThese results, emerging from both academic and industrial actors, are often published in \nconference proceedings or peer-reviewed scientific journals. The source codes of the trained \nalgorithms and models are also very often open (open access in open source), which tends to \nencourage the deployment of these technologies.  \n \nDeep Learning \nDeep learning is based on the training of so -called deep artificial neural network models22, \nwhose breakthroughs in computer vision (notably in 2 012 when the AlexNet system 23 won \nthe ImageNet competition24) earned its developers the Turing Prize (equivalent to the Nobel \nPrize in Computer Science) in 2
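
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors score highest against it. The sketch below illustrates that idea with cosine similarity over hand-made 3-dimensional vectors. It is an illustration only: the vectors, document ids, and the helper `top_k` are hypothetical, and the metric Pinecone actually uses depends on how the `my-first` index was configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k document ids with the highest cosine similarity to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest score first
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Hypothetical toy "index": real voyage-3 embeddings have far more dimensions
toy_index = {
    "deep-learning": [0.9, 0.1, 0.0],
    "proportionality": [0.0, 0.2, 0.9],
    "enrolment": [0.4, 0.8, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedded question

results = top_k(query, toy_index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```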

In [35]:
prompt = hub.pull("rlm/rag-prompt")

In [42]:
prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})])
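
The template above has two placeholders, `{question}` and `{context}`. The following sketch shows how they get filled, using plain `str.format` on the template text copied from the output above rather than LangChain's `ChatPromptTemplate`. The helper `build_prompt` and the sample chunks are hypothetical.

```python
# Template text copied from the rlm/rag-prompt output shown above
RAG_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \n"
    "Context: {context} \n"
    "Answer:"
)


def build_prompt(question: str, chunks: list[str]) -> str:
    """Join retrieved chunks with blank lines and substitute them into the template."""
    context = "\n\n".join(chunks)
    return RAG_TEMPLATE.format(question=question, context=context)


# Hypothetical retrieved chunks standing in for Document.page_content values
sample_chunks = [
    "Deep learning is based on the training of deep artificial neural network models.",
    "Face detection is often the first step of a verification or identification system.",
]
prompt_text = build_prompt("what is deep learning?", sample_chunks)
print(prompt_text)
```

Joining chunks with a blank line keeps passage boundaries visible to the model while still producing a single context string.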

In [43]:
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    # Fetch the chunks most similar to the question from the vector store
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    # Join the retrieved chunks into one context string and fill the RAG prompt
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

In [44]:
response = graph.invoke({"question": "what are the three steps of biometric facial recognition?"})
print(response["answer"])



The three steps of biometric facial recognition are:

1. Enrollment phase: Capturing representative data of the diversity of contexts in which the intended subjects will appear.
2. Storage phase: Storing the captured data for future use in verification or identification.
3. Comparison phase: Comparing a given facial image to a known identity or database of known faces to verify or identify the person.


In [45]:
response

{'question': 'what are the three steps of biometric facial recognition?',
 'context': [Document(id='966b640c-fc00-4723-9431-d311288cdfec', metadata={'page': 9.0, 'page_label': '10', 'source': 'data/eye_eurp.pdf'}, page_content='question "who is this person?". It can be applied as part of a system of monitoring or \nstreamlining itineraries in the physical world (for example through customer tracking) or online. \nFace detection, which recognizes the presence of a face in an image and can potentially \nsegment or track it if the system input is a sequence of images (for example a video), is often \nthe first step of a verification or identification system. Its purpose is  to align and standardize \nthe faces contained in the images. \n \nThe three steps of biometric facial recognition \n \n1) Enrolment phase \nThe first step is to capture data that is sufficiently representative of the diversity of the \ncontexts in which the intended subjects will appear during the use of the technolog

In [46]:
response = graph.invoke({"question": "what is the  principle of proportionality"})
print(response["answer"])


The principle of proportionality is a mechanism used to arbitrate between competing legal principles that are simultaneously applicable but contradictory. It requires that any measure restricting rights and freedoms must be appropriate, necessary, and proportionate. This principle has been used by judges to weigh and balance legal principles, such as a power conferred on the State and the fundamental rights of individuals, or between several fundamental rights.


In [47]:
for step in graph.stream(
    {"question": "what is the  principle of proportionality?"}, stream_mode="updates"
):
    print(f"{step}\n\n----------------\n")

{'retrieve': {'context': [Document(id='73a63e1e-c2e4-42de-8e9d-60c1214cf8c7', metadata={'page': 27.0, 'page_label': '28', 'source': 'data/eye_eurp.pdf'}, page_content='mechanism between legal principles of equivalent rank, which are simultaneously \napplicable but contradictory 75”. It is a question of weighing up and striking a balance \nbetween each of the legal pri nciples in question - generally a power conferred on the State \n(public order, law enforcement) and the fundamental rights of individuals - or between several \nfundamental rights. Respect for the principle of proportionality requires that a measure \nrestricting rights and freedoms must be: \n● appropriate, in that it must enable the legitimate objective pursued to be attained;  \n● necessary, in that it must not exceed what is required to achieve that objective;  \n● and proportionate, in that it must not, by the burdens it creates, be disproportionate \nto the result sought. \nWhile the principle of proportionality wa

In [49]:
for message, metadata in graph.stream(
    {"question": "what is the  principle of proportionality?"}, stream_mode="messages"
):
    print(message.content, end="|")

|The| principle| of| proportional|ity| is| a| mechanism| initially| used| by| judges| to| arbit|rate| between| competing| legal| principles|.| It| has| a| general| application| at| the| European| level| and| is| intended| to| limit| and| frame| the| actions| of| the| European| Union|,| which| must| conf|ine| itself| to| what| is| necessary| to| achieve| the| objectives| of| the| Treat|ies|.| The| principle| of| proportional|ity| requires| that| a| measure| restricting| rights| and| freedoms| must| be| appropriate|,| necessary|,| and| proportion|ate|.||