In [24]:
## pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

### Installing libraries

In [25]:
!pip install -q langchain
!pip install -q langchain_community
!pip install -q sentence_transformers
!pip install -q bitsandbytes
!pip install -q accelerate
!pip install -q ctransformers pypdf faiss-cpu



In [9]:
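import os
import warnings

warnings.filterwarnings("ignore")  # silence library deprecation warnings in the notebook output
# Work around the "OMP: Error #15" crash raised when two copies of the OpenMP runtime get loaded (seen on Windows with FAISS/PyTorch)
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"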

### Importing libraries

In [10]:
import torch
if torch.cuda.is_available():
    print("CUDA is available!")
else:
    print("CUDA is not available.")

CUDA is available!


In [11]:
from langchain_community.llms import CTransformers

In [12]:
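# Quantized Llama-2-7B-chat (GGML, 4-bit) loaded through ctransformers.
# ctransformers keeps every layer on the CPU by default (gpu_layers=0), so the CUDA check above does not move this model to the GPU.
llm = CTransformers(model=r"model\llama-2-7b-chat.ggmlv3.q4_0.bin",  # raw string so "\l" is not read as an escape
                    model_type='llama',
                    config={'max_new_tokens': 600,
                            'temperature': 0.01,
                            'context_length': 5000})  # Llama 2 was trained with a 4096-token context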

In [13]:
llm.invoke("What is RAG??")

'\n Unterscheidung between RAG and Agile?\nRAG stands for "Risks, Assumptions, and Gates". It is a tool used in project management to identify, track, and manage risks, assumptions, and milestones in a project. RAG status is typically used in conjunction with Agile methodologies, but it can also be used in other project management frameworks.\n\nRAG is often used in Agile projects to help teams prioritize and manage their work. It provides a simple and visual way to categorize tasks based on their level of risk or uncertainty. Tasks are assigned a RAG status, which can be either Green (low risk), Amber (medium risk), or Red (high risk). This helps teams identify the most critical tasks and allocate resources accordingly.\n\nHere are some key differences between RAG and Agile:\n\n1. Focus: RAG is focused specifically on risk management, while Agile is a broader project management framework that encompasses various aspects of project delivery, including planning, execution, and monitorin

In [14]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

### Using a **custom** dataset

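#### `RecursiveCharacterTextSplitter` splits text into chunks of at most `chunk_size` characters. It tries separators in order (paragraph breaks, then line breaks, then spaces), so paragraphs stay together where possible, and `chunk_overlap` repeats up to that many characters between consecutive chunks so that context is not lost at chunk boundaries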

In [15]:
pdf_reader = PyPDFLoader("RAGPaper.pdf")
documents = pdf_reader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)
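To make the "try coarse separators first" idea concrete, here is a minimal, simplified sketch of recursive character splitting in plain Python. It is an illustration under stated assumptions, not LangChain's implementation: `recursive_split` is a hypothetical name, it uses the separator order `"\n\n"`, `"\n"`, `" "`, `""`, and it deliberately leaves out `chunk_overlap`.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces of at most chunk_size characters.

    Simplified illustration: tries the coarsest separator first and only
    falls back to a finer one for pieces that are still too long.
    Unlike LangChain's splitter, it does not implement chunk_overlap.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *finer = separators
    pieces = text.split(sep) if sep else list(text)
    chunks, current = [], ""
    for piece in pieces:
        # Greedily grow the current chunk while it still fits
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size and finer:
            # The piece alone is too long: recurse with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, tuple(finer)))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "para one is here.\n\npara two is a bit longer than the first."
print(recursive_split(sample, chunk_size=20))
```

The first paragraph fits and survives whole; only the second, too-long paragraph is broken down further, at word boundaries.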

In [16]:
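from getpass import getpass

# Read the Hugging Face token from the environment instead of hard-coding it in the notebook
HF_TOKEN = os.environ.get("HF_TOKEN") or getpass("Hugging Face token: ")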

In [17]:
from langchain_community.vectorstores import FAISS

# Create embeddings
embeddings = HuggingFaceInferenceAPIEmbeddings(api_key=HF_TOKEN,
                                               model_name="BAAI/bge-base-en-v1.5")
db = FAISS.from_documents(documents=chunks, embedding=embeddings)

# FAISS: Facebook AI Similarity Search --> Powerful library for similarity search and clustering of dense vectors
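As far as we know, LangChain's `FAISS.from_documents` builds an exact flat index with Euclidean (L2) distance by default, and `db.as_retriever()` returns the 4 nearest chunks. The sketch below shows that brute-force search in plain Python on toy vectors; `knn_l2` and `squared_l2` are hypothetical helpers, and real FAISS replaces the Python loop with optimised native kernels.

```python
import heapq


def squared_l2(a, b):
    # Squared Euclidean distance: same ranking as plain L2, without the sqrt
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_l2(vectors, query, k=4):
    """Exact k-nearest-neighbour search over a list of vectors.

    Returns (index, squared_distance) pairs, closest first.
    """
    scored = ((squared_l2(v, query), i) for i, v in enumerate(vectors))
    return [(i, d) for d, i in heapq.nsmallest(k, scored)]


# Toy 2-D "embeddings"; real bge-base-en-v1.5 vectors have 768 dimensions
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
print(knn_l2(vectors, [0.9, 0.1], k=2))
```

In the notebook, the query is embedded with the same `embeddings` model as the chunks, and the returned indices map back to the stored `chunks`.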

In [18]:
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

In [19]:
DEFAULT_SYSTEM_PROMPT="""\
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. 
Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. 
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of 
answering something not correct. If you don't know the answer to a question,
please don't share false information. """

In [20]:
instruction = """
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
Chat History:
{chat_history}
Follow up Input: {question}
Standalone question:
"""

In [21]:
SYSTEM_PROMPT = B_SYS + DEFAULT_SYSTEM_PROMPT + E_SYS
template = B_INST + SYSTEM_PROMPT + instruction + E_INST
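The assembled `template` is an ordinary Python format string with two placeholders, `{chat_history}` and `{question}`. The self-contained sketch below rebuilds it with a shortened stand-in system prompt and fills it with made-up sample values, so the final Llama-2 chat prompt sent to the model for the condense step can be inspected directly. The sample history text and its `Human:`/`Assistant:` layout are assumptions for illustration only.

```python
# Llama-2 chat markers, as defined in the notebook
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

system_prompt = "You are a helpful assistant."  # shortened stand-in for DEFAULT_SYSTEM_PROMPT
instruction = (
    "\nGiven the following conversation and a follow up question, "
    "rephrase the follow up question to be a standalone question.\n"
    "Chat History:\n{chat_history}\n"
    "Follow up Input: {question}\n"
    "Standalone question:\n"
)

template = B_INST + B_SYS + system_prompt + E_SYS + instruction + E_INST

# Fill the two placeholders the condense step expects (sample values only)
rendered = template.format(
    chat_history="Human: What is RAG?\nAssistant: Retrieval-augmented generation.",
    question="Who proposed it?",
)
print(rendered)
```

The system prompt sits inside `<<SYS>>` ... `<</SYS>>`, and the whole request is wrapped in a single `[INST]` ... `[/INST]` pair.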

In [22]:
from langchain.prompts import PromptTemplate
CONDENSE_QUESTION_PROMPT = PromptTemplate(template=template, input_variables=["chat_history", "question"])

In [23]:
from langchain.chains import ConversationalRetrievalChain

qa = ConversationalRetrievalChain.from_llm(llm=llm,
                                           retriever=db.as_retriever(),
                                           condense_question_prompt=CONDENSE_QUESTION_PROMPT,
                                           return_source_documents=True,
                                           verbose=False)

In [24]:
qa

ConversationalRetrievalChain(combine_docs_chain=StuffDocumentsChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['context', 'question'], template="Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:"), llm=CTransformers(client=<ctransformers.llm.LLM object at 0x0000015558E9F9D0>, model='model\\llama-2-7b-chat.ggmlv3.q4_0.bin', model_type='llama', config={'max_new_tokens': 600, 'temperature': 0.01, 'context_length': 5000})), document_variable_name='context'), question_generator=LLMChain(prompt=PromptTemplate(input_variables=['chat_history', 'question'], template="[INST]<<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. \nYour answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. \nPlease ensure 
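The repr above shows the two sub-chains that `from_llm` wires together: a `question_generator` that uses `CONDENSE_QUESTION_PROMPT`, and a `combine_docs_chain` that stuffs the retrieved chunks into a QA prompt. The sketch below reproduces that control flow in plain Python. `conversational_rag`, `llm`, and `retrieve` are hypothetical stand-ins for the chain, the model, and `db.as_retriever()`, and the transcript format in `format_history` is an assumption, not LangChain's exact string. As in LangChain, the condense step is skipped when the history is empty.

```python
from typing import Callable, List, Sequence, Tuple

Turn = Tuple[str, str]  # (human question, assistant answer)


def format_history(history: Sequence[Turn]) -> str:
    # Flatten (question, answer) pairs into a plain transcript
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in history)


def conversational_rag(
    question: str,
    history: Sequence[Turn],
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    condense_template: str,
    qa_template: str,
) -> str:
    # 1. Rewrite the question only when there is history it might refer to
    if history:
        prompt = condense_template.format(
            chat_history=format_history(history), question=question
        )
        question = llm(prompt).strip()
    # 2. Retrieve passages for the (possibly rewritten) standalone question
    docs = retrieve(question)
    # 3. "Stuff" every passage into a single prompt and generate the answer
    context = "\n\n".join(docs)
    return llm(qa_template.format(context=context, question=question))
```

Stuffing is the simplest combine strategy. Its limit is that all retrieved chunks, plus the question, must fit in the model's context window.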

### Ask a query

In [25]:
chat_history=[]
query="""Who is Sachin Tendulkar"""
result = qa.invoke({"question": query, "chat_history": chat_history})
print(result["answer"])

 Sachin Ramesh Tendulkar (born April 24, 1973) is a former Indian cricketer and captain who is widely regarded as one of the greatest batsmen in the history of cricket. He was born in Mumbai, India, and made his first-class debut in 1984. Tendulkar scored over 34,000 runs in international cricket, including 15,921 runs in Test cricket, which is the most by any player in history. He also holds several other records, including most centuries scored in Test cricket (51) and most runs scored in a single innings of a Test match (248). Tendulkar was named the ICC Cricketer of the Year in 1997 and 2010, and he was awarded the Bharat Ratna, India's highest civilian honor, in 2014.
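The cell above passes an empty `chat_history`, so the chain goes straight to retrieval and `CONDENSE_QUESTION_PROMPT` is never used. For a multi-turn conversation, each (question, answer) pair has to be appended to the history by the caller. `ask` below is a hypothetical helper that does this for any object with an `invoke` method, such as `qa`. It relies on `ConversationalRetrievalChain` accepting the history as a list of `(human, ai)` string tuples.

```python
from typing import Any, List, Tuple


def ask(chain: Any, question: str, history: List[Tuple[str, str]]) -> str:
    """Run one conversational turn and record it in `history` (mutated in place)."""
    result = chain.invoke({"question": question, "chat_history": history})
    history.append((question, result["answer"]))
    return result["answer"]
```

For example, start with `history = []`, then call `ask(qa, "What is RAG?", history)` followed by `ask(qa, "What are its use cases?", history)`. The second question is first rewritten into a standalone question and only then used for retrieval.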


In [26]:
chat_history=[]
query="""What is RAGs and tell me more about use cases of RAGs, in a detailed manner"""
result = qa.invoke({"question":query,"chat_history":chat_history})
print(result["answer"])

 RAGs stands for Retrieval-based Autoencoder with Generative modeling. It's a type of neural network architecture that combines the strengths of autoencoders and generative models to perform various NLP tasks such as text generation, language translation, and question answering.

RAGs consist of two main components: a retriever and a generator. The retriever takes in a query and outputs a distribution over text passages, which can be used to retrieve relevant documents or sections of a document. The generator then takes the retrieved documents or sections as additional context and generates the target sequence.

One of the main use cases of RAGs is for language translation tasks. By retrieving relevant documents or sections of a document in the source language, the generator can learn to generate the target language text more accurately. This can be particularly useful when the target language has limited data available for training.

Another use case of RAGs is for question answering 