<a href="https://colab.research.google.com/github/almutareb/rag-based-llm-app/blob/main/Mistral_7b_on_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install needed packages and libraries

In [None]:
### Installation on Colab (As Of Oct)
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q -U einops
!pip install -q -U safetensors
!pip install -q -U torch==2.0.1
!pip install -q -U xformers
!pip install -q -U langchain
!pip install -q -U ctransformers[cuda]
!pip install chromadb
!pip install sentence-transformers

## Retrieval Augmented Generation (RAG) with Mistral-7B-Instruct and Chroma DB.

In [None]:
import torch
from transformers import BitsAndBytesConfig
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain


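# 4-bit NF4 quantization to reduce GPU memory use
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for the matrix multiplications
    bnb_4bit_quant_type="nf4",  # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
)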
model_id = "mistralai/Mistral-7B-Instruct-v0.1"

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the model with 4-bit weights; device_map="auto" lets accelerate place the layers
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

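# Wrap the quantized model and tokenizer in a text-generation pipeline.
# Note: the assignment rebinds `pipeline`, shadowing the imported factory function.
pipeline = pipeline(
    "text-generation",
    model=model_4bit,
    tokenizer=tokenizer,
    use_cache=True,
    device_map="auto",
    max_length=500,  # limit on prompt plus generated tokens
    do_sample=True,
    top_k=5,  # sample only from the 5 most likely tokens
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,  # reuse EOS as the padding token
)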

#### Prompt
# The prompt ends at [/INST]; the end-of-sequence token belongs after the model's answer, not in the prompt.
template = """<s>[INST] You are a helpful, respectful and honest assistant. Answer in a few words, using only the context below.
{context}
{question} [/INST]"""

question_p = """What is the date of the announcement?"""
context_p = """ On August 10 said that its arm JSW Neo Energy has agreed to buy a portfolio of 1753 mega watt renewable energy generation capacity from Mytrah Energy India Pvt Ltd for Rs 10,530 crore."""
prompt = PromptTemplate(template=template, input_variables=["question","context"])
llm = HuggingFacePipeline(pipeline=pipeline)
llm_chain = LLMChain(prompt=prompt, llm=llm)
response = llm_chain.run({"question":question_p,"context":context_p})
response
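
As a rough illustration of why the model is loaded in 4-bit, the sketch below estimates weight storage at 16 and 4 bits per parameter. It assumes a round 7 billion parameters; the names `N_PARAMS` and `weight_memory_gb` are illustrative and not part of the notebook. Only the weights are counted: layers kept in higher precision, the quantization constants, activations, and the KV cache are ignored, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: a round 7e9 parameters; the real count differs slightly.
N_PARAMS = 7e9

fp16_gb = weight_memory_gb(N_PARAMS, 16)
# 4-bit values alone, ignoring the per-block scaling constants.
nf4_gb = weight_memory_gb(N_PARAMS, 4)

print(f"fp16 weights:  {fp16_gb:.1f} GB")   # 14.0 GB
print(f"4-bit weights: {nf4_gb:.1f} GB")    # 3.5 GB
```

Setting `bnb_4bit_use_double_quant=True` additionally quantizes the per-block scaling constants, which trims part of the overhead this estimate leaves out.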

## GGUF format for commodity hardware (Running on CPU).

In [5]:
import chromadb
from chromadb.config import Settings
from langchain.llms import HuggingFacePipeline
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.vectorstores import Chroma

mna_news = """In this paper, we aim to develop a large language model (LLM) with the reasoning ability on complex graph data. Currently, LLMs have achieved very impressive performance on various natural language learning tasks, extensions of which have also been applied to study
the vision tasks with data in multiple modalities. However, when it comes to the graph learning tasks, existing LLMs present very serious flaws due to their inherited weaknesses in performing precise mathematical calculation, multi-step logic reasoning, perception
about the spatial and topological factors, and handling the temporal progression. To address such challenges, in this paper, we will investigate the principles, methodologies and algorithms to empower existing
LLMs with the graph reasoning ability, which will have tremendous impacts on the current research of both LLMs and graph learning. Inspired by the latest ChatGPT and Toolformer models, we propose
the Graph-ToolFormer (Graph Reasoning oriented Toolformer) framework to teach LLMs themselves with prompts augmented by ChatGPT to use external graph reasoning API tools. Specifically,
we will investigate to teach Graph-ToolFormer to handle various graph data reasoning tasks in this paper, including both (1) very basic graph data loading and graph property reasoning tasks, ranging
from simple graph order and size to the graph diameter and periphery, and (2) more advanced reasoning tasks on real-world graph data, such as bibliographic paper citation networks, protein molecular
graphs, sequential recommender systems, online social networks and knowledge graphs.
Technically, to build Graph-ToolFormer, we propose to handcraft both the instruction and a small amount of prompt templates for each of the graph reasoning tasks, respectively. Via in-context
learning, based on such instructions and prompt template examples, we adopt ChatGPT to annotate and augment a larger graph reasoning statement dataset with the most appropriate calls of external API
functions. Such augmented prompt datasets will be post-processed with selective filtering and used for fine-tuning existing pre-trained causal LLMs, such as the GPT-J and LLaMA, to teach them how to
use graph reasoning tools in the output generation. To demonstrate the effectiveness of Graph-ToolFormer, we conduct extensive experimental studies on various graph reasoning datasets and tasks,
and have also launched a LLM demo with various graph reasoning abilities. All the source code of Graph-ToolFormer framework, the demo for graph reasoning, and the graph and prompt datasets have been released online at the project github page.
"""

from langchain.schema.document import Document

# Wrap the raw text in a Document so the splitter can process it
documents = [Document(page_content=mna_news, metadata={"source": "local"})]

# Split into chunks of up to 500 characters with a 20-character overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
all_splits = text_splitter.split_documents(documents)

# Embed each chunk with a sentence-transformers model (needs a GPU runtime)
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

# Index the chunks in a Chroma vector store persisted to ./chroma_db
vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")

# Expose the vector store as a retriever
retriever = vectordb.as_retriever()

# chain_type="stuff" places all retrieved chunks into a single prompt;
# llm is the HuggingFacePipeline wrapper defined in the previous cell
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True
)

def run_my_rag(qa, query):
    """Print the query, run it through the RetrievalQA chain, and print the answer."""
    print(f"Query: {query}\n")
    result = qa.run(query)
    print("\nResult: ", result)

### Ask Queries Now
query = """ What is the focus of this paper and what are the main findings? """
run_my_rag(qa, query)

Query:  What is the focus of this paper and what are the main findings? 

> Entering new RetrievalQA chain...

> Finished chain.

Result:   The focus of this paper is to investigate the ability of Graph-ToolFormer to handle various graph data reasoning tasks. The main findings of this paper include the successful loading and property reasoning on simple graphs, as well as advanced reasoning on real-world graph data such as bibliographic paper citation networks, protein molecular graphs, sequential recommender systems, online social networks, and knowledge graphs.
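
The retrieval steps in the cell above (split, embed, look up the closest chunks, stuff them into one prompt) can be sketched in plain Python. This is a simplified stand-in under stated assumptions, not what LangChain or Chroma do internally: `split_text` cuts fixed-size character windows, whereas `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, and `embed` uses toy word counts in place of the `all-mpnet-base-v2` sentence embeddings. All function names and sample strings here are hypothetical.

```python
import math
from collections import Counter


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop before a window that would only repeat the previous overlap.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a sentence-transformer)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Hypothetical chunks standing in for the output of the text splitter.
chunks = [
    "graph reasoning with external tools",
    "fine-tuning causal language models",
    "citation networks and social networks",
]
top = retrieve("which networks are studied", chunks, k=1)

# "Stuffing": the retrieved chunks are concatenated into one context string.
context = "\n\n".join(top)
print(context)
```

The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk; for example, `split_text("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`.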


In [6]:
from langchain.llms import CTransformers

# Cap generation at 100 new tokens
config = {"max_new_tokens": 100, "temperature": 0}

# Load a Q4_K_M GGUF build of the model through ctransformers
llm = CTransformers(
    model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    config=config,
)

# The prompt ends at [/INST]; the end-of-sequence token belongs after the model's answer, not in the prompt.
template = """<s>[INST] You are a helpful, respectful and honest assistant. Answer in a few words, using only the context below.
{context}
{question} [/INST]"""

#### Prompt
question_p = """What approach was used in the paper?"""
context_p = """ Technically, to build Graph-ToolFormer, we propose to handcraft both the instruction and a small amount of prompt templates for each of the graph reasoning tasks, respectively. Via in-context
learning, based on such instructions and prompt template examples, we adopt ChatGPT to annotate and augment a larger graph reasoning statement dataset with the most appropriate calls of external API
functions. Such augmented prompt datasets will be post-processed with selective filtering and used for fine-tuning existing pre-trained causal LLMs, such as the GPT-J and LLaMA, to teach them how to
use graph reasoning tools in the output generation. To demonstrate the effectiveness of Graph-ToolFormer, we conduct extensive experimental studies on various graph reasoning datasets and tasks,
and have also launched a LLM demo with various graph reasoning abilities. All the source code of Graph-ToolFormer framework, the demo for graph reasoning, and the graph and prompt datasets have been released online at the project github page."""
prompt = PromptTemplate(template=template, input_variables=["question","context"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
response = llm_chain.run({"question":question_p,"context":context_p})
response

'The approach used in the paper is to handcraft instructions and prompt templates for each graph reasoning task, annotate and augment a larger dataset with ChatGPT, fine-tune existing pre-trained LLMs, and conduct extensive experimental studies on various graph reasoning datasets and tasks.'
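
The two chains in this notebook decode differently: the first pipeline samples with `do_sample=True` and `top_k=5`, while the ctransformers config sets `temperature` to 0. A minimal sketch of the difference follows, assuming that a temperature of 0 behaves as greedy (argmax) decoding. That is a common convention, but this notebook does not confirm how ctransformers handles it. The logits and function names are hypothetical.

```python
import math
import random


def greedy(logits: list[float]) -> int:
    """Pick the index of the largest logit (deterministic)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits: list[float], k: int, rng: random.Random) -> int:
    """Sample an index from the softmax over the k largest logits."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = logits[top[0]]
    # Subtract the peak before exponentiating for numerical stability.
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Hypothetical scores for a 5-token vocabulary.
logits = [2.0, 0.5, 1.5, -1.0, 0.0]
rng = random.Random(0)

print(greedy(logits))  # 0, the highest-scoring token

# With k=2 only the two best tokens (indices 0 and 2) can ever be drawn.
draws = {top_k_sample(logits, k=2, rng=rng) for _ in range(100)}
print(draws)
```

Greedy decoding gives repeatable answers, which suits short factual question answering; top-k sampling trades that repeatability for more varied output.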