#### <a id="top"></a>
# <div style="background-color: black; color: green; padding: 10px; font-size: 30px; font-family: consolas; text-align: center; border-radius: 10px;   box-shadow: 5px 5px 15px rgba(0, 0, 0, 0.5), -5px -5px 15px rgba(0, 255, 0, 0.5);"><b>Hybrid RAG Pipeline</b></div>


### Resources
Research papers: [paper1](https://arxiv.org/pdf/2408.05141) and [paper2](https://arxiv.org/pdf/2408.04948)

In [1]:
%%capture
%pip install --upgrade langchain langchain_community sentence-transformers datasets faiss-cpu chromadb rank_bm25 unstructured

In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
# Integrations are imported from langchain_community; the old langchain.* paths are deprecated
from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever
from langchain_community.llms import HuggingFacePipeline
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser


In [3]:

# Load every .txt file in the dataset directory as a LangChain Document
data_path = '/kaggle/input/aws-case-studies-and-blogs'
loader = DirectoryLoader(
    data_path,
    glob="*.txt",
)
documents = loader.load()

In [4]:
# Inspect the raw text of the first loaded document
page_content = documents[0].page_content
print(page_content)

SAP customers can fully realize all the benefits of SAP S/4HANA in the AWS Cloud for systems of all sizes.

AWS Backint Agent

Français

SAP S/4HANA on AWS.

SAP S/4HANA on AWS to meet its year-end deadline and improve system performance.

Español

Delivered highly available content to millions of users

Sterling Auxiliaries is now saving time and human resources formerly dedicated to backing up SAP data manually on premises. Inteliwaves helped the company implement

Customer Stories / Manufacturing

Sterling Auxiliaries is an international manufacturer of surfactants and industrial chemicals based in India. To meet its go-live timeline and improve system performance, the company upgraded its on-premises SAP R/3 system to

日本語

The infrastructure setup and onboarding to SAP S/4HANA on AWS was a smooth process, and we’ve had a great experience with Inteliwaves.”

2022

Amazon Elastic Compute Cloud (Amazon EC2) offers the broadest and deepest compute platform, with over 500 instances and

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
docs = text_splitter.split_documents(documents)
print(f"Split into {len(docs)} text chunks.")

Split into 4553 text chunks.
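
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping windows in plain Python. It is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and newlines, whereas the hypothetical `sliding_window_chunks` helper below just slides a fixed-size character window.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Toy input, small enough to inspect the overlap by eye
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last 3 characters of the previous one, which is the same idea as the 150-character overlap used above: text cut at a chunk boundary still appears intact in one of the neighbouring chunks.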



### Embeddings 

In [6]:
model_name = "sentence-transformers/all-MiniLM-L6-v2"
model_kwargs = {'device': 'cpu'}
# Embeddings are left unnormalized
encode_kwargs = {'normalize_embeddings': False}

embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)


## **Indexing**

In [7]:
# Embed every chunk and store the vectors in an in-memory FAISS index
vectorstore = FAISS.from_documents(docs, embeddings)
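
As far as I know, LangChain's FAISS wrapper builds a flat (exhaustive) L2 index by default, so a dense lookup amounts to ranking stored vectors by Euclidean distance to the query vector. The sketch below shows that idea in plain Python; the 2-D vectors and the `nearest` helper are toy assumptions, not part of FAISS.

```python
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_vec: Sequence[float], vectors: List[Sequence[float]], k: int = 4) -> List[int]:
    """Return the indices of the k stored vectors closest to query_vec."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query_vec, vectors[i]))
    return ranked[:k]


# Toy 2-D "embeddings"; real MiniLM embeddings have 384 dimensions
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
top = nearest([0.9, 0.1], toy_vectors, k=2)
print(top)
```

Because the embeddings above are not normalized, L2 ranking and cosine ranking can differ; setting `normalize_embeddings` to `True` would make the two orderings agree.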

## **Retrievers**

In [8]:
retriever = vectorstore.as_retriever()

## **Keyword Retriever**

In [9]:
# Sparse (keyword) retriever over the same chunks; return the top 3 matches
keyword_retriever = BM25Retriever.from_documents(docs)
keyword_retriever.k = 3
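
For intuition about what the keyword retriever scores, here is a minimal BM25 sketch in plain Python. It uses a common textbook form of the formula with a non-negative IDF, which is not exactly the variant implemented by `rank_bm25`; the lowercasing, the whitespace tokenization, the toy corpus, and the `bm25_scores` helper are all assumptions for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, corpus: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document in corpus against query with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Number of documents that contain each term at least once
    doc_freq = Counter(term for d in tokenized for term in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by document length via b
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


toy_corpus = [
    "sap migration to aws",
    "aws compute instances",
    "manufacturing chemicals in india",
]
scores = bm25_scores("sap migration", toy_corpus)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

Documents that share no term with the query score exactly zero, which is why BM25 complements the dense retriever: it rewards exact keyword matches that embeddings may blur.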

In [10]:
keyword_retriever.invoke("How did Sterling Auxiliaries benefit from migrating to SAP S/4HANA on AWS?")




[Document(metadata={'source': '/kaggle/input/aws-case-studies-and-blogs/Sterling Auxiliaries Case Study _ Amazon Web Services.txt'}, page_content='AWS Partner Inteliwaves, Sterling Auxiliaries deployed\n\n2 weeks\n\nAmazon Elastic Block Store\n\nDeutsch\n\nAWS Backint Agent with\n\nTiếng Việt\n\nThe company has been using SAP software since 2006, with SAP as the foundation for operations at its headquarters and its main factory in the state of Gujarat. The business began migrating from SAP R/3 to SAP S/4HANA at the start of 2022 with the help of\n\nItaliano\n\nไทย\n\nOutcome | Saving Time while Boosting Employee Satisfaction\n\nWithin two weeks, Inteliwaves helped migrate Sterling Auxiliaries’ SAP S/4HANA development, quality, and production environments from its data center servers to AWS. With the support of Inteliwaves, Sterling Auxiliaries was able to go live with SAP S/4HANA by the start of the new financial year.\n\n25–30%\n\nSince launching S/4HANA on AWS, Sterling Auxiliaries h

## **Ensemble Retriever**

In [11]:
ensemble_retriever = EnsembleRetriever(retrievers=[retriever, keyword_retriever], weights=[0.5, 0.5])
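
To my understanding, `EnsembleRetriever` merges the ranked lists from its retrievers with weighted Reciprocal Rank Fusion (RRF), using a smoothing constant of 60 by default. The sketch below illustrates that fusion rule in plain Python; the document ids and the `weighted_rrf` helper are hypothetical, and the real implementation also handles de-duplication of `Document` objects.

```python
from collections import defaultdict
from typing import Dict, List


def weighted_rrf(ranked_lists: List[List[str]], weights: List[float], c: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids with weighted Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more the higher it ranks in each list
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


# Toy rankings standing in for the dense and BM25 retriever outputs
dense_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = weighted_rrf([dense_hits, keyword_hits], weights=[0.5, 0.5])
print(fused)
```

Only ranks enter the formula, not raw scores, so the L2 distances from FAISS and the BM25 scores never need to be put on a common scale. Documents found by both retrievers accumulate two contributions and tend to rise to the top.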

In [12]:
ensemble_retriever.invoke("How did Sterling Auxiliaries benefit from migrating to SAP S/4HANA on AWS?")

[Document(metadata={'source': '/kaggle/input/aws-case-studies-and-blogs/Sterling Auxiliaries Case Study _ Amazon Web Services.txt'}, page_content='AWS Partner Inteliwaves, Sterling Auxiliaries deployed\n\n2 weeks\n\nAmazon Elastic Block Store\n\nDeutsch\n\nAWS Backint Agent with\n\nTiếng Việt\n\nThe company has been using SAP software since 2006, with SAP as the foundation for operations at its headquarters and its main factory in the state of Gujarat. The business began migrating from SAP R/3 to SAP S/4HANA at the start of 2022 with the help of\n\nItaliano\n\nไทย\n\nOutcome | Saving Time while Boosting Employee Satisfaction\n\nWithin two weeks, Inteliwaves helped migrate Sterling Auxiliaries’ SAP S/4HANA development, quality, and production environments from its data center servers to AWS. With the support of Inteliwaves, Sterling Auxiliaries was able to go live with SAP S/4HANA by the start of the new financial year.\n\n25–30%\n\nSince launching S/4HANA on AWS, Sterling Auxiliaries h

### RAG Pipeline

### LLM

In [13]:

model_name="/kaggle/input/qwen2.5/transformers/0.5b-instruct/1"

tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="./cache")

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir="./cache",
    torch_dtype=torch.float16,
    device_map="auto",
    low_cpu_mem_usage=True
)


pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,  
    temperature=0.7,
    top_p=0.9,
    do_sample=True,     # Enable sampling
)

llm = HuggingFacePipeline(pipeline=pipe)



In [14]:
template = """
You are a helpful assistant that answers questions based on the following context.
If you don't find the answer in the context, just say that you don't know.
Context: {context}

Question: {input}

Answer:
"""
prompt = ChatPromptTemplate.from_template(template)

rag_chain = (
    {"context": ensemble_retriever, "input": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
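
In the chain above, the retriever output is passed to `{context}` as a raw list, so the prompt receives the `repr` of each `Document`, metadata included. A common alternative is to join only the text of the retrieved chunks first. The sketch below shows such a helper; `Doc` is a hypothetical stand-in for LangChain's `Document`, and `format_docs` is an assumed name, not something defined earlier in this notebook.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: List[Doc]) -> str:
    """Join the text of the retrieved chunks, dropping metadata and repr noise."""
    return "\n\n".join(doc.page_content for doc in docs)


# Toy chunks standing in for retriever output
retrieved = [
    Doc("first chunk", {"source": "a.txt"}),
    Doc("second chunk", {"source": "b.txt"}),
]
context = format_docs(retrieved)
print(context)
```

With LCEL, a plain function can usually be piped after a retriever, so the `"context"` entry could plausibly become `ensemble_retriever | format_docs`. This shortens the prompt, which matters for a 0.5B-parameter model.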

## Querying

In [15]:
response = rag_chain.invoke("How did Sterling Auxiliaries benefit from migrating to SAP S/4HANA on AWS?")
print(response)

Human: 
You are a helpful assistant that answers questions based on the following context.
If you don't find the answer in the context, just say that you don't know.
Context: [Document(metadata={'source': '/kaggle/input/aws-case-studies-and-blogs/Sterling Auxiliaries Case Study _ Amazon Web Services.txt'}, page_content='AWS Partner Inteliwaves, Sterling Auxiliaries deployed\n\n2 weeks\n\nAmazon Elastic Block Store\n\nDeutsch\n\nAWS Backint Agent with\n\nTiếng Việt\n\nThe company has been using SAP software since 2006, with SAP as the foundation for operations at its headquarters and its main factory in the state of Gujarat. The business began migrating from SAP R/3 to SAP S/4HANA at the start of 2022 with the help of\n\nItaliano\n\nไทย\n\nOutcome | Saving Time while Boosting Employee Satisfaction\n\nWithin two weeks, Inteliwaves helped migrate Sterling Auxiliaries’ SAP S/4HANA development, quality, and production environments from its data center servers to AWS. With the support of Int
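
The printed response starts with the prompt itself (`Human: ...`), because the text-generation pipeline returns the prompt followed by the completion. One way to keep only the answer is to strip the echoed prompt, sketched below with a hypothetical `strip_prompt_echo` helper and toy strings. Passing `return_full_text=False` when building the transformers `pipeline` may achieve the same result without post-processing.

```python
def strip_prompt_echo(prompt_text: str, generated: str) -> str:
    """Drop the echoed prompt from the start of the generated text, if present."""
    if generated.startswith(prompt_text):
        return generated[len(prompt_text):].strip()
    return generated.strip()


# Toy strings standing in for the rendered prompt and the raw model output
toy_prompt = "Human: \nContext: ...\n\nQuestion: ...\n\nAnswer:\n"
toy_output = toy_prompt + " toy answer text "
answer = strip_prompt_echo(toy_prompt, toy_output)
print(answer)
```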