<a href="https://colab.research.google.com/github/JapiKredi/RAG_assignment_Research_papers/blob/main/RAG_assignment_Research_papers_Final.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Assignment "Course 3 Mr.HelpMate"
### Learner: Jasper Bongers
#### Option 1: Build Your Own Project (BYOP)

- I am building a chatbot that performs generative search over 9 popular and important AI research papers on Large Language Models.
- I am using 3 layers: the embedding layer, the search layer, and the generation layer.
- My goal in this session is to build a robust search system that answers user queries effectively, by experimenting with the various blocks in the system, such as chunking strategies, embedding models, re-rankers, and the generation prompt.

- The 9 research papers are:
1) ALiBi: Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.
2) Attention Is All You Need.
3) Augmented Language Models: a Survey.
4) FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
5) Gemini: A Family of Highly Capable Multimodal Models.
6) GPT-4 Technical Report
7) LLaMA: Open and Efficient Foundation Language Models
8) ReAct: Synergizing Reasoning and Acting in Language Models
9) Toolformer: Language Models Can Teach Themselves to Use Tools



1) Introduction and Setup:
We will explore a multi-document retriever using ChromaDB for the database and vector store.
The focus will be on implementing locally run embeddings, ideally on a GPU.

2) Embedding Selection:
Two options for embeddings will be introduced: standard Hugging Face embeddings (using Sentence Transformers models) and Instructor embeddings.
The latter, specifically the XL variant (`hkunlp/instructor-xl`), will be chosen because it appeared to be the more effective option.

3) Data Format Transition:
Instead of text files, we will work with multiple PDF files, featuring papers related to recent large language model discussions.

4) Document Loading and Processing:
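Documents will be loaded with LangChain's `PyPDFLoader`, which is built on the pypdf library.
The splitting and processing steps stay deliberately simple: a `RecursiveCharacterTextSplitter` with 1000-character chunks and a 200-character overlap.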

5) Embedding Process:
Embeddings will be obtained using the selected method (instructor embeddings), run locally on a GPU.
Configuration options will include choosing between local GPU or CPU processing, with acknowledgment of the trade-off in processing time.

6) Vector Store Setup:
The vector store will be set up as in the previous approach, but with the new Instructor embeddings.
It will be created from the document chunks using these embeddings and persisted to a directory on disk.

7) Retriever Configuration:
The retriever will be configured to utilize the new embeddings, enabling it to find contextually relevant documents based on queries.

8) Chain Construction:
A `RetrievalQA` chain will be constructed around the vector-store retriever (and therefore the Instructor embeddings).
Additional code will neatly wrap the answers returned by the chain and list the source documents they came from.



# <font color = green> Solution 1


## 1. <font color = red> Install and Import the Required Libraries

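Please run the following code on a GPU runtime. The instructor-xl embedding model is loaded below with `model_kwargs={"device": "cuda"}`, so it fails on a CPU-only runtime, and on CPU it would in any case be very slow.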
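If you would rather have the notebook fall back to the CPU than crash, a small helper can pick the device at runtime. This is a minimal sketch, assuming PyTorch is available (it is pulled in as a dependency of `sentence_transformers`); the helper name `pick_device` is hypothetical.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # normally installed as a dependency of sentence_transformers
    except ImportError:
        return "cpu"  # no PyTorch at all: CPU is the only option
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Embedding model will run on: {device}")
```

The result could then be passed as `model_kwargs={"device": device}` to `HuggingFaceInstructEmbeddings`; on CPU the instructor-xl model still works, just much more slowly.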

In [None]:
# Install all the right dependencies
!pip -q install langchain openai tiktoken chromadb pypdf sentence_transformers InstructorEmbedding

In [None]:
# Import all the required Libraries
import openai
import pypdf
from pathlib import Path
import pandas as pd
from operator import itemgetter
import json
import tiktoken
import chromadb

In [None]:
# Import all the required Libraries and modules
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.document_loaders import PyPDFLoader
from langchain.document_loaders import DirectoryLoader
from InstructorEmbedding import INSTRUCTOR
from langchain.embeddings import HuggingFaceEmbeddings, SentenceTransformerEmbeddings
from langchain.embeddings import HuggingFaceInstructEmbeddings
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
from sentence_transformers import CrossEncoder, util

In [None]:
# Show the installed LangChain version
!pip show langchain

## 2. <font color = red> Read, Process, and Chunk the PDF Files

In [None]:
# Connect to Google Drive
from google.colab import drive
drive.mount('/content/drive/')

In [None]:
# Read the OpenAI API key from a text file on Google Drive
filepath = "/content/drive/My Drive/GenerativeAI/MateAI/"

with open(filepath + "Jasper_OpenAI_API_Key.txt", "r") as f:
    openai.api_key = f.read().strip()  # strip the trailing newline/whitespace

In [None]:
# Expose the key as an environment variable; LangChain's OpenAI wrapper reads OPENAI_API_KEY
import os
os.environ["OPENAI_API_KEY"] = openai.api_key

## Load and process multiple documents

In [None]:
# Load every PDF in the folder; PyPDFLoader returns one Document per page
pdf_dir = "/content/drive/My Drive/GenerativeAI/MateAI/rag_assignment/research_articles/"
loader = DirectoryLoader(pdf_dir, glob="./*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()

In [None]:
len(documents)

## Splitting and chunking the text

In [None]:
# Split the pages into chunks of up to 1000 characters, with a 200-character overlap between neighbouring chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

In [None]:
len(texts)

In [None]:
texts[10]
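To see what `chunk_size` and `chunk_overlap` mean, the sketch below cuts a text into fixed-size character windows that overlap. It is a deliberately simplified stand-in, not LangChain's algorithm: `RecursiveCharacterTextSplitter` additionally tries to break on paragraph, line and word boundaries before falling back to raw characters. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A made-up 2500-character "page"
page = "".join(chr(97 + i % 26) for i in range(2500))
chunks = sliding_window_chunks(page)
print(len(chunks), [len(c) for c in chunks])  # → 3 [1000, 1000, 900]
```

The overlap means the last 200 characters of one chunk reappear at the start of the next, so a sentence cut at a boundary is still seen whole in at least one chunk.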

## Embedding the text with Hugging Face Instructor embeddings

In [None]:
# Load the Instructor XL embedding model from Hugging Face onto the GPU
# (HuggingFaceInstructEmbeddings was already imported above)
instructor_embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl",
                                                      model_kwargs={"device": "cuda"})
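Instructor models embed an (instruction, text) pair rather than raw text, and, as far as we know, LangChain's `HuggingFaceInstructEmbeddings` uses one instruction for documents (`embed_instruction`) and another for queries (`query_instruction`). The sketch below only builds those pairs in plain Python to illustrate the idea; the two instruction strings are assumptions modelled on LangChain's defaults, so check them against your installed version. The helper `to_instructor_pairs` is hypothetical.

```python
# Assumed instruction strings, modelled on LangChain's defaults for
# HuggingFaceInstructEmbeddings; verify them in your installed version.
DOC_INSTRUCTION = "Represent the document for retrieval: "
QUERY_INSTRUCTION = "Represent the question for retrieving supporting documents: "


def to_instructor_pairs(texts, instruction):
    """Pair every text with the same instruction, as Instructor models expect."""
    return [[instruction, text] for text in texts]


doc_pairs = to_instructor_pairs(
    ["FlashAttention is an IO-aware exact attention algorithm.",
     "LLaMA is a collection of foundation language models."],
    DOC_INSTRUCTION,
)
query_pairs = to_instructor_pairs(["What is Flash attention?"], QUERY_INSTRUCTION)

for pair in doc_pairs + query_pairs:
    print(pair)
```

With the raw `InstructorEmbedding` package, such pairs are what `INSTRUCTOR("hkunlp/instructor-xl").encode(...)` consumes; the LangChain wrapper builds them for you, which is why the notebook never does it by hand.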

### Creating Chroma Vector Database

In [None]:
# Define the path where chroma collections will be stored
chroma_data_path = '/content/drive/My Drive/GenerativeAI/MateAI/rag_assignment/ChromaDB_Data'

In [None]:
# Embed and store the texts
# Supplying a persist_directory will store the embeddings on disk
persist_directory = chroma_data_path

# Use the Instructor embeddings created above
embedding = instructor_embeddings

vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embedding,
                                 persist_directory=persist_directory)

In [None]:
# Persist the db to disk, then drop the in-memory handle
vectordb.persist()
vectordb = None

In [None]:
# Now we can load the persisted database from disk, and use it as normal.
vectordb = Chroma(persist_directory=persist_directory,
                  embedding_function=embedding)

## 4. <font color = red> Semantic Search

In this section, we will perform a semantic search of a query against the collection's embeddings to retrieve the most semantically similar chunks. Because the retriever is created with `k=1`, it returns only the single closest chunk.
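Conceptually, semantic search ranks the stored chunk vectors by their similarity to the query vector and keeps the top `k`. The plain-Python sketch below uses cosine similarity on made-up 2-D vectors; it is an illustration, not Chroma's implementation. As far as we know, Chroma's default distance is squared L2, which gives the same ranking as cosine similarity whenever the vectors have unit length.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=1):
    """Return the indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]


# Toy "embeddings" for three chunks and one query (made-up numbers)
chunk_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
query_vec = [1.0, 0.1]

print(top_k(query_vec, chunk_vecs, k=1))  # → [0], like search_kwargs={"k": 1}
print(top_k(query_vec, chunk_vecs, k=2))  # → [0, 1]
```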

In [None]:
## Make a retriever
retriever = vectordb.as_retriever(search_kwargs={"k": 1})

In [None]:
# Get a relevant query
query = "What is Flash attention?"

In [None]:
docs = retriever.get_relevant_documents(query)

In [None]:
retriever.search_type
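The goals listed at the top include experimenting with re-rankers, and `CrossEncoder` is imported but not yet used. A common pattern is to retrieve more candidates (a larger `k`) and then re-score each (query, chunk) pair. The sketch below shows only that re-ranking step with a pluggable scoring function; `rerank` and the toy `word_overlap_scores` scorer are hypothetical stand-ins so the example runs without downloading a model.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(query: str,
           passages: Sequence[str],
           score_pairs: Callable[[List[Pair]], Sequence[float]],
           top_n: int = 3) -> List[str]:
    """Re-order passages by a (query, passage) relevance score, best first."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [passages[i] for i in order[:top_n]]


def word_overlap_scores(pairs: List[Pair]) -> List[float]:
    """Toy stand-in for a cross-encoder: count lowercase words shared by query and passage."""
    return [float(len(set(q.lower().split()) & set(p.lower().split()))) for q, p in pairs]


candidates = [
    "FlashAttention is an IO-aware exact attention algorithm",
    "LLaMA is a collection of foundation language models",
    "flash attention is fast because it uses tiling",
]
best = rerank("what is flash attention", candidates, word_overlap_scores, top_n=2)
print(best)
```

With sentence-transformers, `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict` takes a list of (query, passage) pairs and returns one score per pair, so it could be passed as `score_pairs`. The model name is only an example, and re-ranking only matters once the retriever's `k` is larger than 1.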

## Make a chain

In [None]:
# Create the chain to answer questions
qa_chain = RetrievalQA.from_chain_type(llm=OpenAI(),
                                       chain_type="stuff",
                                       retriever=retriever,
                                       return_source_documents=True)

## 6. Retrieval Augmented Generation

Now that we have the top search results, we can pass them to GPT-3.5, along with the user query and a well-engineered prompt, to generate a direct answer to the query with citations, rather than returning whole pages or chunks.
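With `chain_type="stuff"`, all retrieved chunks are "stuffed" into a single prompt together with the question. The sketch below builds such a prompt by hand and numbers each chunk with its source so the model can cite it. The template wording, the function name `build_cited_prompt` and the file name `flashattention.pdf` are hypothetical; this is not LangChain's default prompt.

```python
def build_cited_prompt(question, chunks):
    """Stuff numbered, source-labelled chunks and the question into one prompt.

    `chunks` is a list of (source, text) tuples.
    """
    context = "\n\n".join(
        f"[{n}] (source: {source})\n{text}"
        for n, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered context below. "
        "Cite the supporting passages as [1], [2], and so on, and say you don't know "
        "if the context does not contain the answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_cited_prompt(
    "What is Flash attention?",
    [("flashattention.pdf", "FlashAttention is an IO-aware exact attention algorithm.")],
)
print(prompt)
```

To use a custom template inside the chain itself, LangChain versions of this era accept `chain_type_kwargs={"prompt": PromptTemplate(...)}` in `RetrievalQA.from_chain_type`, with `{context}` and `{question}` as the template variables.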

In [None]:
## Cite sources

import textwrap

def wrap_text_preserve_newlines(text, width=110):
    # Split the input text into lines based on newline characters
    lines = text.split('\n')

    # Wrap each line individually
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]

    # Join the wrapped lines back together using newline characters
    wrapped_text = '\n'.join(wrapped_lines)

    return wrapped_text

def process_llm_response(llm_response):
    print(wrap_text_preserve_newlines(llm_response['result']))
    print('\n\nSources:')
    for source in llm_response["source_documents"]:
        print(source.metadata['source'])

In [None]:
# Query 1: What is Flash attention?
query = "What is Flash attention?"
llm_response = qa_chain(query)
process_llm_response(llm_response)

In [None]:
# Query 2: What is self attention?
query = "What is self-attention?"
llm_response = qa_chain(query)
process_llm_response(llm_response)

In [None]:
# Query 3: What is multi head attention?
query = "What is multi head attention?"
llm_response = qa_chain(query)
process_llm_response(llm_response)

In [None]:
# Query 4: What does IO-aware mean?
query = "What does IO-aware mean?"
llm_response = qa_chain(query)
process_llm_response(llm_response)

In [None]:
# Query 5: What is tiling in flash-attention?
query = "What is tiling in flash-attention?"
llm_response = qa_chain(query)
process_llm_response(llm_response)

In [None]:
# Query 6: What tools can be used with toolformer?
query = "What tools can be used with toolformer?"
llm_response = qa_chain(query)
process_llm_response(llm_response)

In [None]:
# Query 7: What are the best retrieval augmentations for LLMs?
query = "What are the best retrieval augmentations for LLMs?"
llm_response = qa_chain(query)
process_llm_response(llm_response)
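The query cells above repeat the same two lines for each question. They could be collapsed into one loop; the sketch below takes the chain and the printing function as parameters so it can be exercised with a stub instead of the real OpenAI-backed chain. The names `run_queries` and `fake_chain` are hypothetical.

```python
def run_queries(chain, queries, report):
    """Send each query through `chain` and hand the response to `report`."""
    responses = []
    for query in queries:
        response = chain(query)
        report(response)
        responses.append(response)
    return responses


# Stub standing in for qa_chain, so the loop runs without an API key
def fake_chain(query):
    return {"result": f"(answer to: {query})", "source_documents": []}


queries = ["What is Flash attention?", "What is self-attention?"]
responses = run_queries(fake_chain, queries, report=lambda r: print(r["result"]))
```

In the notebook itself the call would be `run_queries(qa_chain, queries, process_llm_response)`.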

In [None]:
user_query = input("What is your query about the AI research papers? ")

In [None]:
# Query 8: answer the question entered by the user above
llm_response = qa_chain(user_query)
process_llm_response(llm_response)

## Deleting the Chroma Vector Database

In [None]:
# Define the path where chroma collections will be stored
# chroma_data_path = '/content/drive/My Drive/GenerativeAI/MateAI/rag_assignment/ChromaDB_Data'

In [None]:
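# Zip the persisted database inside the rag_assignment folder on Google Drive, so the archive survives a runtime restart
!cd '/content/drive/My Drive/GenerativeAI/MateAI/rag_assignment' && zip -r ChromaDB_Data.zip ChromaDB_Data

In [None]:
# To clean up, you can delete the collection
vectordb.delete_collection()
vectordb.persist()

# delete the directory
!rm -rf '/content/drive/My Drive/GenerativeAI/MateAI/rag_assignment/ChromaDB_Data'

## Starting again: loading the db

Restart the runtime and re-mount Google Drive, then restore the database from the zip archive.

In [None]:
# Unzip the archive back into the rag_assignment folder on Google Drive,
# then re-run the import, embedding and "load the persisted database" cells to use it again
!cd '/content/drive/My Drive/GenerativeAI/MateAI/rag_assignment' && unzip -o ChromaDB_Data.zip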

# The End