# Building an Advanced RAG System with AI21's Jamba-1.5-large

This notebook implements a Retrieval-Augmented Generation (RAG) system using AI21's Jamba-1.5-large language model. Jamba-1.5-large has a 256k-token context window, which suits RAG applications by allowing:

- Processing of larger chunks of retrieved content
- Better handling of long-form context
- More comprehensive document analysis

We'll combine this with vector storage and embeddings to create an efficient information retrieval and generation pipeline.
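
To make the benefit of the large context window concrete, here is a rough back-of-the-envelope sketch of how many retrieved chunks could fit into a single prompt. The figures are assumptions for illustration only: the window is treated as exactly 256,000 tokens, and `reserved_tokens` (room for the system prompt, question, and answer) is a hypothetical value, not an AI21 recommendation.

```python
# Assumption: treat the 256k context window as exactly 256,000 tokens.
CONTEXT_WINDOW = 256_000


def max_chunks(chunk_tokens: int, reserved_tokens: int = 6_000) -> int:
    """Return how many chunks of `chunk_tokens` tokens fit in the window.

    `reserved_tokens` is a hypothetical budget kept free for the
    instructions, the user question, and the generated answer.
    """
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    available = CONTEXT_WINDOW - reserved_tokens
    return max(available // chunk_tokens, 0)


for size in (500, 2_000, 10_000):
    print(f"{size:>6} tokens per chunk -> up to {max_chunks(size)} chunks")
```

Under these assumptions, even 10,000-token chunks leave room for a couple of dozen retrieved passages, which is why chunking can be coarser than with short-context models.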

In [1]:
from langchain_ai21 import ChatAI21

In [None]:
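# Also install the packages imported elsewhere in this notebook:
# langchain-ai21, langchain-voyageai, pypdf (used by PyPDFLoader), and python-dotenv.
%pip install --quiet --upgrade langchain-text-splitters langchain-community langgraph langchain-ai21 langchain-voyageai pypdf python-dotenv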

In [5]:
import getpass
import os

# Enable LangSmith tracing and prompt for the key so it is not hard-coded.
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = getpass.getpass("LangSmith API key: ")

In [2]:
import os

from dotenv import load_dotenv
from langchain_ai21 import ChatAI21

load_dotenv()  # read AI21_API_KEY from a local .env file, if present

api_key = os.getenv("AI21_API_KEY")

llm = ChatAI21(model="jamba-1.5-large", api_key=api_key)

In [3]:
a = llm.invoke("hey there")

a.content

'Hello! How can I assist you today?'

In [4]:
from langchain_voyageai import VoyageAIEmbeddings

In [5]:
import os

from dotenv import load_dotenv

load_dotenv()  # read VOYAGE_API_KEY from a local .env file, if present

api = os.getenv("VOYAGE_API_KEY")

embeddings = VoyageAIEmbeddings(model="voyage-3-large", api_key=api)

In [6]:
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)
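
Conceptually, an in-memory vector store keeps each text alongside its embedding and ranks entries by similarity to the query embedding. The sketch below illustrates that idea in plain Python with cosine similarity. It is not LangChain's implementation: `TinyVectorStore` and the three-dimensional vectors are hypothetical stand-ins, whereas real embeddings come from the `embeddings` model above and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical toy store: a list of (text, vector) pairs."""

    def __init__(self):
        self._items = []

    def add(self, text, vector):
        self._items.append((text, vector))

    def search(self, query_vector, k=2):
        # Score every stored vector against the query, highest first.
        scored = [
            (cosine_similarity(query_vector, vector), text)
            for text, vector in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = TinyVectorStore()
# Made-up vectors for illustration only.
store.add("intro to transformers", [0.9, 0.1, 0.0])
store.add("cooking pasta", [0.0, 0.2, 0.9])
store.add("attention mechanisms", [0.8, 0.3, 0.1])

top = store.search([1.0, 0.2, 0.0], k=2)
print(top)
```

A linear scan like this is fine for a notebook-sized corpus; larger collections usually call for a dedicated vector database with approximate nearest-neighbor search.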

# Loading Documents into RAG System

We'll use the LangChain PDF loader to extract content from PDF documents. Several alternatives are also available, listed below.

## Document Loading Options
- **LangChain PDF loaders** (current choice), with backends such as PyMuPDF, pypdf, and pdfplumber
- LlamaParser
- AWS Textract
- Azure AI Document Intelligence
- Multimodal LLMs (GPT-4V, Gemini Pro Vision)

The LangChain PDF loader is a simple way to extract text: `PyPDFLoader` returns one document per page and attaches metadata such as the source path and page number to each.

In [2]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(
    "data/week1.pdf",
)

In [None]:
docs = loader.load()
docs[1]
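
Before moving on, it can help to check what the loader produced, for example to spot pages where no text was extracted (common with scanned PDFs). The helper below is a minimal sketch: `summarize_pages` is a hypothetical name, and it only assumes each item exposes `page_content` and `metadata`, as the loaded documents do. The sample objects are made-up stand-ins so the cell runs without a PDF.

```python
from types import SimpleNamespace


def summarize_pages(docs):
    """Return (page, character_count) pairs for document-like objects."""
    summary = []
    for doc in docs:
        page = doc.metadata.get("page")  # page index from the loader metadata
        summary.append((page, len(doc.page_content)))
    return summary


# Hypothetical stand-ins for loaded pages; with real data, pass `docs` instead.
sample_docs = [
    SimpleNamespace(
        page_content="Title page",
        metadata={"source": "example.pdf", "page": 0},
    ),
    SimpleNamespace(
        page_content="",
        metadata={"source": "example.pdf", "page": 1},
    ),
]

page_summary = summarize_pages(sample_docs)
empty_pages = [page for page, n_chars in page_summary if n_chars == 0]
print(page_summary)
print("Pages with no extracted text:", empty_pages)
```

With the real notebook data, `summarize_pages(docs)` gives a quick overview of page sizes before the documents are split and embedded.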