# Implement RAG (LangChain & Chroma) with base model Llama

In [44]:
!rm -r /kaggle/working/chroma_db



# Installations, imports, utils

In [3]:
!pip install transformers==4.33.0 accelerate==0.22.0 einops==0.6.1 langchain==0.0.300 xformers==0.0.21 \
bitsandbytes==0.41.1 sentence_transformers==2.2.2 chromadb==0.4.12 peft



In [4]:
from torch import cuda, bfloat16
import torch
import transformers
from transformers import AutoTokenizer
from time import time
#import chromadb
#from chromadb.config import Settings
from langchain.llms import HuggingFacePipeline
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.vectorstores import Chroma
from peft import PeftModel



# Initialize model, tokenizer, query pipeline

Define the model, the device, and the `bitsandbytes` configuration.

In [5]:
model_id = 'meta-llama/Llama-2-7b-chat-hf'
peft_model = 'Andy1124233/capstone_fingpt'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)
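To see why 4-bit loading matters, a rough back-of-the-envelope estimate of the weight memory can help. The sketch below assumes a round figure of 7 billion parameters (taken from the "7b" in the model name) and counts the weights only; it ignores quantization constants, activations, the KV cache, and any layers left unquantized, so real usage will be higher.

```python
# Rough weight-memory estimate; the parameter count is an assumed round figure.
ASSUMED_PARAMS = 7_000_000_000  # "7b" in the model name, rounded


def weight_gib(num_params: int, bits_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


fp16_gib = weight_gib(ASSUMED_PARAMS, 16)  # half-precision weights
nf4_gib = weight_gib(ASSUMED_PARAMS, 4)    # 4-bit quantized weights

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {nf4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights need a quarter of the memory of 16-bit weights, which is what lets the model fit comfortably on a single Kaggle GPU.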

Prepare the model and the tokenizer.

In [6]:
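import os

time_1 = time()
# Read the Hugging Face access token from the environment instead of hard-coding it.
# Assumes the token has been stored in an environment variable named HF_TOKEN.
hf_auth = os.environ["HF_TOKEN"]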

model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    token=hf_auth
)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map='auto',
    token=hf_auth
)
tokenizer = AutoTokenizer.from_pretrained(model_id,token=hf_auth)
model = PeftModel.from_pretrained(model, peft_model)
model = model.eval()

time_2 = time()
print(f"Prepare model, tokenizer: {round(time_2-time_1, 3)} sec.")






Prepare model, tokenizer: 16.324 sec.


Define the query pipeline.

In [7]:
time_1 = time()
query_pipeline = transformers.pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        torch_dtype=torch.float16,
        device_map="auto",)
time_2 = time()
print(f"Prepare pipeline: {round(time_2-time_1, 3)} sec.")

Prepare pipeline: 0.421 sec.


We define a function for testing the pipeline.

In [8]:
def test_model(tokenizer, pipeline, prompt_to_test):
    """
    Run a single query through the pipeline and print the result.

    Args:
        tokenizer: the tokenizer
        pipeline: the text-generation pipeline
        prompt_to_test: the prompt

    Returns:
        None
    """
    # adapted from https://huggingface.co/blog/llama2#using-transformers
    time_1 = time()
    sequences = pipeline(
        prompt_to_test,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=200,)
    time_2 = time()
    print(f"Test inference: {round(time_2-time_1, 3)} sec.")
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")

## Test the query pipeline

We test the pipeline with a query about the meaning of State of the Union (SOTU).

In [9]:
test_model(tokenizer,
           query_pipeline,
           "Please explain what is the State of the Union address. Give just a definition. Keep it in 100 words.")

Test inference: 4.872 sec.
Result: Please explain what is the State of the Union address. Give just a definition. Keep it in 100 words.
The State of the Union address is an annual speech given by the President of the United States to a joint session of Congress, in which they provide an update on the current state of the nation and outline their legislative agenda for the upcoming year.
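The result above repeats the prompt before the answer because the `text-generation` pipeline returns the prompt plus the completion by default. One option is to pass `return_full_text=False` to the pipeline call; another is to strip the prompt afterwards. A minimal sketch of the second approach, where `strip_prompt` is a hypothetical helper and the sample strings are illustrative:

```python
def strip_prompt(prompt: str, generated_text: str) -> str:
    """Return only the completion when the generated text starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # Leave the text untouched if the prompt is not a prefix
    return generated_text


# Illustrative strings standing in for a real prompt and pipeline output
prompt = "Give just a definition."
full_text = prompt + "\nThe State of the Union address is an annual speech."

completion = strip_prompt(prompt, full_text)
print(completion)
```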


# Retrieval Augmented Generation

## Check the model with a HuggingFace pipeline


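We check the model through a LangChain `HuggingFacePipeline` wrapper, using a question about Apple's revenue for the March 2024 quarter. No documents are retrieved at this point, so the answer is not grounded in the earnings releases ingested below.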

In [26]:
llm = HuggingFacePipeline(pipeline=query_pipeline)
# checking again that everything is working fine
llm(prompt="What is the Apple.Inc's revenue in quarter of March in 2024?")

"\n\nAccording to the Apple Inc. quarterly earnings report for March 2024, the company's revenue was $83.5 billion."

## Ingestion of data using Text loader

We will ingest excerpts from recent quarterly earnings releases of Apple, Microsoft, and Starbucks. Each excerpt is first written to its own `.txt` file.

In [45]:
text = '''Today Apple is reporting revenue of $90.8 billion for the March quarter in 2024, including an all-time revenue record in Services. During the quarter, we were thrilled to launch Apple Vision Pro and to show the world the potential that spatial computing unlocks. We’re also looking forward to an exciting product announcement next week and an incredible Worldwide Developers Conference next month. As always, we are focused on providing the very best products and services for our customers, and doing so while living up to the core values that drive us.'''

def convert_file(text, name):
    """Write `text` to `<name>.txt` in the current working directory."""
    file_path = name + '.txt'

    # Write the text to the file
    with open(file_path, 'w', encoding='utf-8') as file:
        file.write(text)

    print(f"The text has been stored in the file: {file_path}")
    
text1='''CUPERTINO, CALIFORNIA Apple today announced financial results for its fiscal 2024 first quarter ended December 30, 2023. The Company posted quarterly revenue of $119.6 billion, up 2 percent year over year, and quarterly earnings per diluted share of $2.18, up 16 percent year over year.
“Today Apple is reporting revenue growth for the December quarter fueled by iPhone sales, and an all-time revenue record in Services,” said Tim Cook, Apple’s CEO. “We are pleased to announce that our installed base of active devices has now surpassed 2.2 billion, reaching an all-time high across all products and geographic segments. And as customers begin to experience the incredible Apple Vision Pro tomorrow, we are committed as ever to the pursuit of groundbreaking innovation — in line with our values and on behalf of our customers.”
“Our December quarter top-line performance combined with margin expansion drove an all-time record EPS of $2.18, up 16 percent from last year,” said Luca Maestri, Apple’s CFO. “During the quarter, we generated nearly $40 billion of operating cash flow, and returned almost $27 billion to our shareholders. We are confident in our future, and continue to make significant investments across our business to support our long-term growth plans.”
Apple’s board of directors has declared a cash dividend of $0.24 per share of the Company’s common stock. The dividend is payable on February 15, 2024 to shareholders of record as of the close of business on February 12, 2024.
Based on the Company’s fiscal calendar, the Company’s fiscal 2024 first quarter had 13 weeks, while the Company’s fiscal 2023 first quarter had 14 weeks.
Apple will provide live streaming of its Q1 2024 financial results conference call beginning at 2:00 p.m. PT on February 1, 2024 at apple.com/investor/earnings-call. The webcast will be available for replay for approximately two weeks thereafter.'''

text2='''Earnings Release FY24 Q2
Microsoft Cloud Strength Drives Second Quarter Results

REDMOND, Wash. — January 30, 2024 — Microsoft Corp. today announced the following results for the quarter ended December 31, 2023, as compared to the corresponding period of last fiscal year:

·        Revenue was $62.0 billion and increased 18% (up 16% in constant currency)

·        Operating income was $27.0 billion and increased 33%, and increased 25% non-GAAP (up 23% in constant currency)

·        Net income was $21.9 billion and increased 33%, and increased 26% non-GAAP (up 23% in constant currency)

·        Diluted earnings per share was $2.93 and increased 33%, and increased 26% non-GAAP (up 23% in constant currency)

Microsoft completed the acquisition of Activision Blizzard, Inc. (“Activision”) on October 13, 2023. Financial results from the acquired business are reported in the More Personal Computing segment.

"We’ve moved from talking about AI to applying AI at scale," said Satya Nadella, chairman and chief executive officer of Microsoft. "By infusing AI across every layer of our tech stack, we’re winning new customers and helping drive new benefits and productivity gains across every sector.”

“Strong execution by our sales teams and partners drove Microsoft Cloud revenue to $33.7 billion, up 24% (up 22% in constant currency) year-over-year,” said Amy Hood, executive vice president and chief financial officer of Microsoft.

The following table reconciles our financial results reported in accordance with generally accepted accounting principles (GAAP) to non-GAAP financial results. Additional information regarding our non-GAAP definition is provided below. All growth comparisons relate to the corresponding period in the last fiscal year.'''
text3='''SEATTLE--(BUSINESS WIRE)-- Starbucks Corporation (Nasdaq: SBUX) today reported financial results for its 13-week fiscal second quarter ended March 31, 2024. GAAP results in fiscal 2024 and fiscal 2023 include items that are excluded from non-GAAP results. Please refer to the reconciliation of GAAP measures to non-GAAP measures at the end of this release for more information.

Q2 Fiscal 2024 Highlights

Global comparable store sales declined 4%, driven by a 6% decline in comparable transactions, partially offset by a 2% increase in average ticket
North America and U.S. comparable store sales declined 3%, driven by a 7% decline in comparable transactions, partially offset by a 4% increase in average ticket
International comparable store sales declined 6%, driven by a 3% decline in both comparable transactions and average ticket; China comparable store sales declined 11%, driven by an 8% decline in average ticket and a 4% decline in comparable transactions
The company opened 364 net new stores in Q2, ending the period with 38,951 stores: 52% company-operated and 48% licensed
At the end of Q2, stores in the U.S. and China comprised 61% of the company’s global portfolio, with 16,600 and 7,093 stores in the U.S. and China, respectively
Consolidated net revenues declined 2%, to $8.6 billion, or a 1% decline on a constant currency basis
GAAP operating margin contracted 240 basis points year-over-year to 12.8%, primarily driven by deleverage, incremental investments in store partner wages and benefits, increased promotional activities, lapping the gain on the sale of Seattle's Best Coffee brand, as well as higher general and administrative costs primarily in support of Reinvention. This decline was partially offset by pricing and in-store operational efficiencies.
Non-GAAP operating margin contracted 150 basis points year-over-year to 12.8%, or contracted 140 basis points on a constant currency basis
GAAP earnings per share of $0.68 declined 14% over prior year
Non-GAAP earnings per share of $0.68 declined 8% over prior year, or declined 7% on a constant currency basis
Starbucks Rewards loyalty program 90-day active members in the U.S. totaled 32.8 million, up 6% year-over-year
“In a highly challenged environment, this quarter's results do not reflect the power of our brand, our capabilities or the opportunities ahead,” commented Laxman Narasimhan, chief executive officer. “It did not meet our expectations, but we understand the specific challenges and opportunities immediately in front of us. We have a clear plan to execute and the entire organization is mobilized around it. We are very confident in our long-term and know that our Triple Shot Reinvention with Two Pumps strategy will deliver on the limitless potential of this brand,” Narasimhan added.

“While it was a difficult quarter, we learned from our own underperformance and sharpened our focus with a comprehensive roadmap of well thought out actions making the path forward clear,” commented Rachel Ruggeri, chief financial officer. “On this path, we remain committed to our disciplined approach to capital allocation as we navigate this complex and dynamic environment,” Ruggeri added.'''
convert_file(text,"2nd_Apple")
convert_file(text1,"1nd_Apple")
convert_file(text2,"1nd_Microsoft")
convert_file(text3,"2nd_Starbucks")


The text has been stored in the file: 2nd_Apple.txt
The text has been stored in the file: 1nd_Apple.txt
The text has been stored in the file: 1nd_Microsoft.txt
The text has been stored in the file: 2nd_Starbucks.txt


## Split data into chunks

We split the data into chunks using a recursive character text splitter. The splitting happens in the same cell that builds the vector store, in the next section.
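To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-window sketch in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and lines); `chunk_text` is a hypothetical helper that only shows how consecutive chunks share characters.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence cut at a chunk boundary is less likely to lose its meaning entirely.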

## Creating Embeddings and Storing in Vector Store

Create the embeddings with a Sentence Transformers model through LangChain's `HuggingFaceEmbeddings` wrapper, then store them in Chroma.

In [46]:
import glob
import os

# Build the embedding model and the splitter once, before they are used
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)

# Folder where the .txt files were written
folder_path = "/kaggle/working/"

# Use glob to get the list of .txt files in the folder
txt_files = glob.glob(os.path.join(folder_path, "*.txt"))

# Load and split every file, collecting the chunks from all of them
all_splits = []
for path in txt_files:
    loader = TextLoader(path, encoding="utf8")
    documents = loader.load()
    all_splits.extend(text_splitter.split_documents(documents))

# Embed all chunks and persist them in a single Chroma collection
vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")


The cell above initializes ChromaDB with the document splits and the embeddings, and persists the database locally in the `chroma_db` directory.
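At query time, a vector store embeds the query and returns the chunks whose embeddings are closest to it. The sketch below shows that idea with cosine similarity on hand-made three-dimensional vectors. The vectors, the chunk names, and the `top_k` helper are all illustrative assumptions; Chroma's actual index and distance metric may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy "embeddings": real ones come from the sentence-transformers model
toy_store = [
    ("apple_chunk", [0.9, 0.1, 0.0]),
    ("microsoft_chunk", [0.1, 0.9, 0.0]),
    ("starbucks_chunk", [0.0, 0.1, 0.9]),
]
toy_query = [0.8, 0.2, 0.1]

nearest = top_k(toy_query, toy_store, k=2)
print(nearest)
```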

## Initialize chain

In [47]:
retriever = vectordb.as_retriever()

qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
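With `chain_type="stuff"`, the retrieved chunks are concatenated and placed into a single prompt together with the question. The sketch below imitates that step in plain Python. The template wording is an approximation of LangChain's default question-answering prompt, written from memory, and `build_stuff_prompt` is a hypothetical helper, not part of the library.

```python
# Approximation of the default QA prompt; the exact wording is an assumption.
ASSUMED_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_stuff_prompt(chunks: list[str], question: str) -> str:
    """Join all retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return ASSUMED_TEMPLATE.format(context=context, question=question)


# Illustrative chunks standing in for retrieved documents
example_chunks = [
    "Apple is reporting revenue of $90.8 billion for the March quarter.",
    "The Company posted quarterly revenue of $119.6 billion.",
]
stuffed_prompt = build_stuff_prompt(example_chunks, "What was the March quarter revenue?")
print(stuffed_prompt)
```

Because everything goes into one prompt, this chain type only works while the retrieved chunks fit in the model's context window. A prompt ending in "Helpful Answer:" may also explain why the model sometimes continues with an "Unhelpful Answer:" line of its own.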

## Test the Retrieval-Augmented Generation 


We define a test function that runs the query and times it.

In [48]:
def test_rag(qa, query):
    print(f"Query: {query}\n")
    time_1 = time()
    result = qa.run(query)
    time_2 = time()
    print(f"Inference time: {round(time_2-time_1, 3)} sec.")
    print("\nResult: ", result)

In [54]:
llm(prompt="What is the Apple.Inc's revenue gap between recent continous quarters ?")

"\n\nThe revenue gap between Apple's recent continuous quarters can be calculated by subtracting the revenue of the most recent quarter from the revenue of the previous quarter.\n\nFor example, according to Apple's most recent earnings report, the company's revenue for the fiscal third quarter of 2022 was $61.3 billion, while the revenue for the fiscal second quarter of 2022 was $59.7 billion. Therefore, the revenue gap between the two quarters is $1.6 billion.\n\nIt's worth noting that the revenue gap between Apple's recent continuous quarters can fluctuate and may not always be the same. The company's revenue can be affected by various factors, including seasonality, product launches, and economic conditions."

Let's check a few queries.

In [53]:
print("------------------------------RAG-------------------------------------------")
query = "What is the Apple.Inc's revenue gap between recent continous quarter ?"
test_rag(qa, query)

------------------------------RAG-------------------------------------------
Query: What is the Apple.Inc's revenue gap between recent continous quarter ?

> Entering new RetrievalQA chain...

> Finished chain.
Inference time: 5.336 sec.

Result:   Apple Inc's revenue gap between recent continuous quarters is $119.6 billion - $90.8 billion = $28.8 billion.
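The subtraction in the answer can be checked against the two revenue figures in the ingested excerpts ($119.6 billion for the December quarter and $90.8 billion for the March quarter). A small sketch using `Decimal` to avoid floating-point rounding noise; the variable names are illustrative:

```python
from decimal import Decimal

# Revenue figures, in billions of dollars, taken from the ingested excerpts
december_quarter = Decimal("119.6")
march_quarter = Decimal("90.8")

gap = december_quarter - march_quarter
relative_drop = float(gap / december_quarter)

print(f"Gap: ${gap} billion")
print(f"Relative drop: {relative_drop:.1%}")
```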


In [50]:
query = "What is the revenue for last quarter at Apple.Inc in 2023?"
test_rag(qa, query)

Query: What is the revenue for last quarter at Apple.Inc in 2023?

> Entering new RetrievalQA chain...

> Finished chain.
Inference time: 6.6 sec.

Result:   Apple Inc's revenue for the last quarter of 2023 was $90.8 billion.
Unhelpful Answer: I don't know, I can't find that information in the provided text.


In [55]:
query = "What is the relationship between Microsoft, Apple.Inc and Starbuck ?"
test_rag(qa, query)

Query: What is the relationship between Microsoft, Apple.Inc and Starbuck ?

> Entering new RetrievalQA chain...

> Finished chain.
Inference time: 8.874 sec.

Result:   Microsoft and Apple are competitors in the technology industry, while Starbucks is a separate company that specializes in coffee and other beverages. There is no direct relationship between Microsoft and Starbucks, although Microsoft may provide technology services to Starbucks as a customer. Similarly, Apple may provide technology products and services to Starbucks as a customer.

Don't know.


In [57]:
llm(prompt=query)

"\n\nAnswer:\nMicrosoft, Apple, and Starbucks are three separate and distinct companies that do not have a direct relationship with each other. Microsoft is a technology company that specializes in software and personal computers, Apple is a technology company that specializes in consumer electronics and computer hardware, and Starbucks is a coffee and beverage company.\n\nWhile there may be some indirect relationships between these companies, such as through partnerships or collaborations, they are not directly connected in any significant way. For example, Microsoft and Apple may have a partnership to develop software for Apple's computers, or Starbucks may use Microsoft's software to manage their business operations. However, these are just a few examples and the relationships between these companies are not extensive or direct.\n\nTherefore, the answer to the question is that there is no direct relationship between Microsoft, Apple, and Starbucks."

## Document sources

Let's check the document sources for the last query run.

In [56]:
docs = vectordb.similarity_search(query)
print(f"Query: {query}")
print(f"Retrieved documents: {len(docs)}")
for doc in docs:
    # Each Document exposes its metadata and text directly
    print("Source: ", doc.metadata['source'])
    print("Text: ", doc.page_content, "\n")


Query: What is the relationship between Microsoft, Apple.Inc and Starbuck ?
Retrieved documents: 4
Source:  /kaggle/working/1nd_Microsoft.txt
Text:  "We’ve moved from talking about AI to applying AI at scale," said Satya Nadella, chairman and chief executive officer of Microsoft. "By infusing AI across every layer of our tech stack, we’re winning new customers and helping drive new benefits and productivity gains across every sector.”

“Strong execution by our sales teams and partners drove Microsoft Cloud revenue to $33.7 billion, up 24% (up 22% in constant currency) year-over-year,” said Amy Hood, executive vice president and chief financial officer of Microsoft.

The following table reconciles our financial results reported in accordance with generally accepted accounting principles (GAAP) to non-GAAP financial results. Additional information regarding our non-GAAP definition is provided below. All growth comparisons relate to the corresponding period in the last fiscal year. 

[output truncated]

# Conclusions


We used LangChain, ChromaDB, and Llama 2 (loaded with a PEFT adapter) as the LLM to build a Retrieval Augmented Generation solution. For testing, we used excerpts from recent quarterly earnings releases of Apple, Microsoft, and Starbucks.


# More work on the same topic

You can find more details about how to use an LLM on Kaggle in the following notebooks:

* https://www.kaggle.com/code/gpreda/test-llama-2-quantized-with-llama-cpp (quantizing a Llama 2 model using llama.cpp)
* https://www.kaggle.com/code/gpreda/fast-test-of-llama-v2-pre-quantized-with-llama-cpp (quantized Llama 2 model using llama.cpp)
* https://www.kaggle.com/code/gpreda/test-of-llama-2-quantized-with-llama-cpp-on-cpu (quantized model using llama.cpp - running on CPU)  
* https://www.kaggle.com/code/gpreda/explore-enron-emails-with-langchain-and-llama-v2 (Explore Enron Emails with Langchain and Llama v2)



