# Deployment of an **LLM** & Information Extraction via a **RAG** Approach


## **Abstract**
This guide explains Retrieval-Augmented Generation (RAG) with a large language model (LLM). A RAG system pairs a language model with an external data source: passages relevant to a query are retrieved first, and the model then generates its answer from them, combining that external information with its own understanding of natural language. The guide gives an overview of how RAG works, its potential applications, and the steps needed to use it effectively, so that readers can apply its enhanced text generation capabilities in their own contexts.


## Retrieval Augmented Generation
Traditional language models generate responses solely from patterns and information acquired during training. They are therefore limited by their training data, and their responses often lack depth or specific knowledge. RAG addresses this limitation by bringing in external data during generation. When a query is made, the RAG system first retrieves relevant information from a large dataset or knowledge base; that information is then supplied to the model to inform and guide its response.
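
The retrieve-then-generate flow described above can be sketched without any library. This is a minimal illustration under stated assumptions, not the implementation used later in this guide: the knowledge base is a hypothetical list of three sentences, and retrieval is plain word overlap rather than vector search. The names `KNOWLEDGE_BASE`, `tokenize`, `retrieve`, and `build_prompt` are invented for the example.

```python
import re

# Hypothetical toy knowledge base; in this guide that role is played by the ingested PDF.
KNOWLEDGE_BASE = [
    "Chroma is a vector database used to store embeddings.",
    "Mistral-7B-Instruct is an instruction-tuned language model.",
    "PDF files can be parsed into plain text before indexing.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=1):
    """Return the k documents that share the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, documents):
    """Place the retrieved passages ahead of the question."""
    context = "\n".join(documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

question = "Which vector database stores the embeddings?"
passages = retrieve(question, KNOWLEDGE_BASE, k=1)
prompt = build_prompt(question, passages)
print(prompt)
```

In a real system the final `prompt` would be sent to the language model; the sections below replace the word-overlap step with embeddings and a vector store.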


## Implementation 

### PDF Ingestion


In [None]:
# Install the packages with support for all document types
# (langchain-community provides the document loaders imported below)
!pip install -q unstructured langchain langchain-community
!pip install -q "unstructured[all-docs]"

In [None]:
# Import the modules from langchain_community package
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.document_loaders import OnlinePDFLoader

In [None]:
# Define the local path to the PDF file
local_path = "/path/to/file.pdf"

# Load a local PDF file
if local_path:
    loader = UnstructuredPDFLoader(file_path=local_path)
    # Load the data from the PDF file
    data = loader.load()
else:
    # Stop here if no PDF file is given: the following cells need `data`
    raise ValueError("Set local_path to a PDF file")

In [None]:
# Preview the extracted text (optional).
# With its default settings the loader returns a single Document for the whole file.
data[0].page_content



---



## Vector Embeddings

In [None]:
# Install the packages
!pip install -q transformers
!pip install -q chromadb
!pip install -q langchain-text-splitters
!pip install -q sentence-transformers

In [None]:
# Imports for chunking and the vector store
import os
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

# Replace the placeholder with your own Hugging Face API token
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "********"

In [None]:
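# Split the document into overlapping chunks (sizes are in characters).
# Note: the all-MiniLM-L6-v2 embedding model truncates input beyond 256 word pieces,
# so a smaller chunk_size may let each chunk be embedded more completely.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
chunks = text_splitter.split_documents(data)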
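
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping chunking. It is not how `RecursiveCharacterTextSplitter` works internally: that splitter prefers natural boundaries such as paragraphs, lines, and spaces, while this hypothetical `split_with_overlap` helper cuts fixed-size character windows.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between the starts of two windows
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The last three characters of each piece reappear at the start of the next one, which helps keep a sentence that straddles a boundary retrievable from either chunk.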

In [None]:
# Generate embeddings
from langchain_community.embeddings import SentenceTransformerEmbeddings
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

In [None]:
# Embed the chunks and add them to the vector database
from langchain_community.vectorstores import Chroma
db = Chroma.from_documents(chunks, embeddings)
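
Conceptually, a vector store answers a query by comparing the query's embedding with the stored chunk embeddings and returning the closest ones. The sketch below illustrates that idea with cosine similarity, which is one common choice; the metric Chroma actually applies depends on how the collection is configured. The three-dimensional vectors and chunk labels are made-up placeholders, not real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vector, indexed, k=2):
    """Return the texts of the k indexed entries most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Hypothetical index: (chunk text, embedding) pairs with toy 3-dimensional vectors
toy_index = [
    ("chunk about revenue", [0.9, 0.1, 0.0]),
    ("chunk about hiring", [0.1, 0.9, 0.0]),
    ("chunk about risks", [0.0, 0.2, 0.9]),
]
query_vector = [1.0, 0.0, 0.0]
nearest = top_k(query_vector, toy_index, k=2)
print(nearest)
# → ['chunk about revenue', 'chunk about hiring']
```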

## Retrieval

In [None]:
# Install the packages

!pip3 install torch==2.0.1
!pip3 install accelerate
!pip3 install huggingface_hub

In [None]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_community.llms import HuggingFaceHub

In [None]:
llm = HuggingFaceHub(repo_id="mistralai/Mistral-7B-Instruct-v0.3",
                     model_kwargs={
                          # Maximum number of new tokens to generate (prompt tokens excluded).
                          # max_length is omitted because max_new_tokens takes precedence over it.
                          "max_new_tokens": 1000,
                     })

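# Prompt used by the retriever to rephrase the user question.
# The template must contain {question}; the wording below is one possible phrasing.
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Generate five different
versions of the user question below to retrieve relevant documents from a vector
database. Provide the alternative questions separated by newlines.
Original question: {question}""",
)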

retriever = MultiQueryRetriever.from_llm(
    db.as_retriever(),
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

# Retrieve context, fill the prompt, call the model, and return plain text
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
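
The idea behind multi-query retrieval is to ask the language model for several rephrasings of the question, run a search for each, and keep the union of the unique results. The sketch below shows only that merging logic. `fake_variants` and `fake_search` are hypothetical stand-ins for the LLM and the vector store, and the chunk labels are placeholders.

```python
def unique_union(result_lists):
    """Merge several result lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

def multi_query_retrieve(question, generate_variants, search):
    """Search once per query variant and return the de-duplicated results."""
    variants = generate_variants(question)
    return unique_union(search(variant) for variant in variants)

def fake_variants(question):
    """Hypothetical stand-in for the LLM that rephrases the question."""
    return [question, question.lower(), f"Summarize: {question}"]

def fake_search(query):
    """Hypothetical stand-in for the vector store lookup."""
    hits = []
    if "trend" in query.lower():
        hits.append("chunk A")
    if "report" in query.lower():
        hits.append("chunk B")
    if query.startswith("Summarize"):
        hits.append("chunk C")
    return hits

docs = multi_query_retrieve("Main trends in the report?", fake_variants, fake_search)
print(docs)
# → ['chunk A', 'chunk B', 'chunk C']
```

Only the third variant surfaces `chunk C`, which is the benefit of querying with several phrasings: a chunk missed by one wording can still be found by another.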

## Questioning the Model

In [None]:
# Place your question here
question = "What are the main trends identified in the report?"
output = chain.invoke(question)
print("Question:", question)
print("Answer:", output)
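
To inspect what the model receives for a question like the one above, the prompt assembly can be reproduced by hand. This is a sketch under assumptions: the chain in this guide passes the retriever's output to `{context}` as it is, whereas the hypothetical `format_context` helper below first joins the chunk texts into clean passages, which may make the prompt easier for the model to read. The chunk strings are placeholders, not content from any report.

```python
RAG_TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

def format_context(chunk_texts):
    """Join chunk texts with blank lines, trimming stray whitespace."""
    return "\n\n".join(text.strip() for text in chunk_texts)

def build_rag_prompt(question, chunk_texts):
    """Fill the RAG template with the formatted context and the question."""
    return RAG_TEMPLATE.format(context=format_context(chunk_texts), question=question)

# Placeholder passages standing in for retrieved chunks
example_chunks = ["  first retrieved passage  ", "second retrieved passage"]
final_prompt = build_rag_prompt(
    "What are the main trends identified in the report?", example_chunks
)
print(final_prompt)
```

With LangChain documents, the same helper could be applied to the `page_content` of each retrieved document before the prompt is filled.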