# Hybrid Search RAG Pipeline in LlamaIndex

This notebook demonstrates how to build a Hybrid Search Retrieval-Augmented Generation (RAG) pipeline with open-source models, using `HuggingFace` and `FastEmbed` with `llama-index`.



## Setup

First, install the necessary packages:

In [None]:
!pip install llama-index-vector-stores-chroma
!pip install llama-index
!pip install llama-index-embeddings-fastembed


In [None]:
!pip install llama-index-llms-huggingface-api


## Set Up Hugging Face API Token

We'll need a Hugging Face API token to access hosted models from the Hugging Face Hub.


In [None]:
import os
from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass("API:")

# Set the API token in the environment variable
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN
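If the token is sometimes already present in the environment (for example on a machine where it was exported beforehand), prompting every time is unnecessary. The following is a minimal sketch of an environment-first lookup; the helper name `resolve_hf_token` and its parameters are hypothetical and not part of any library.

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def resolve_hf_token(
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
    var_name: str = "HUGGINGFACEHUB_API_TOKEN",
) -> str:
    """Return the token from `env`, prompting only if it is missing.

    `resolve_hf_token` is an illustrative helper, not a library function.
    """
    token = env.get(var_name, "").strip()
    if not token:
        # Fall back to an interactive, non-echoing prompt.
        token = prompt("Hugging Face API token: ").strip()
    if not token:
        raise ValueError("No Hugging Face API token provided.")
    return token


# Usage in the notebook (commented out so the sketch does not prompt on import):
# HUGGINGFACEHUB_API_TOKEN = resolve_hf_token()
```

Passing `env` and `prompt` as parameters keeps the helper easy to exercise without touching the real environment or blocking on input.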



## Load the Medical Documents


In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()

## Set Up the FastEmbed Embedding Model and Hugging Face LLM

Here, we define the models used for text embedding (FastEmbed) and for answer generation (a Hugging Face LLM).


In [None]:
from llama_index.embeddings.fastembed import FastEmbedEmbedding
# define embedding function
embed_model = FastEmbedEmbedding()


In [None]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

llm = HuggingFaceInferenceAPI(
    model_name="HuggingFaceH4/zephyr-7b-alpha", token=HUGGINGFACEHUB_API_TOKEN
)

## Define LLM and Embedding in Settings

By default, LlamaIndex uses OpenAI models for both the LLM and the embeddings, so we override the global settings.

In [None]:
from llama_index.core import Settings

Settings.llm = llm

Settings.embed_model = embed_model

## Create Vectorstore with Chroma
This section sets up the storage for the document embeddings.


In [None]:
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
import chromadb

## Index Your Documents

First, we persist the data to disk:
- Create a persist directory where the data will be stored.
- Define a unique collection for each index.
- Pass the vector store to a `StorageContext`.

In [None]:
db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [None]:
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)

## Load the index
Here, we load the previously built index from the persisted Chroma collection.
Note that loading does not require `documents`; the embeddings are read directly from the vector store.

In [None]:
db2 = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db2.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    embed_model=embed_model,
)

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("how to prevent heart diseases")

In [None]:
response.response

'\n\nTo prevent heart diseases, it is vital to make changes that address each and every risk factor you have. You can make the changes gradually, one at a time. The major risk factors for heart disease that you can do something about are cigarette smoking, high blood pressure, high blood cholesterol, overweight, physical inactivity, and diabetes. By addressing each of these risk factors, you can significantly reduce your risk of developing heart disease. It is also important to realize that having more than one risk factor is especially serious, as risk factors tend to "gang up" and worsen each other\'s effects. Therefore, it is essential to take heart disease risk seriously and make changes to your lifestyle now.'
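The default query engine used above retrieves chunks by comparing the query embedding with the stored chunk embeddings. As a rough illustration of that dense retrieval step, here is a toy top-k search in plain Python. The vectors and chunk names are made up, and cosine similarity is an assumption chosen for clarity; the metric actually used depends on how the vector store is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, vec))
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; real embeddings have hundreds of dimensions.
toy_chunks = {
    "risk_factors": [0.9, 0.1, 0.0],
    "diet": [0.6, 0.6, 0.1],
    "appendix": [0.0, 0.1, 0.9],
}
toy_query = [1.0, 0.2, 0.0]

results = top_k(toy_query, toy_chunks, k=2)
```

A real vector store replaces the linear scan with an approximate nearest-neighbour index, but the ranking idea is the same.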

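## Hybrid Search
This section demonstrates how to perform a hybrid search that combines keyword-based (sparse) retrieval with dense vector similarity search. Here, `similarity_top_k` sets how many dense results are retrieved and `sparse_top_k` sets how many sparse results are retrieved. Note that hybrid mode only takes effect with vector stores that implement it, so verify that your backend supports it.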



In [None]:
query_engine = index.as_query_engine(
    similarity_top_k=2, sparse_top_k=12, vector_store_query_mode="hybrid"
)
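To make the idea of merging a dense result list with a sparse one concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to combine two rankings. This is an illustrative stand-in: it is not necessarily the fusion method that LlamaIndex or the underlying vector store applies, and the chunk IDs and the constant `k=60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list.

    Each ID receives 1 / (k + rank) from every list it appears in
    (rank starts at 1); IDs are returned by descending total score,
    with ties broken alphabetically for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers, best match first.
dense_ranking = ["chunk_a", "chunk_b", "chunk_c"]
sparse_ranking = ["chunk_c", "chunk_a", "chunk_d"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

Chunks that appear near the top of both lists (`chunk_a`, `chunk_c`) rise above chunks found by only one retriever, which is the behaviour hybrid search aims for.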


In [None]:
from IPython.display import display, Markdown

response = query_engine.query(
    "what causes heart attacks"
)

display(Markdown(str(response)))



A heart attack happens when an artery becomes totally blocked with plaque, preventing vital oxygen and nutrients from getting to the heart. This is caused by the buildup of plaque on the arteries' inner walls due to a combination of factors such as high blood pressure, high cholesterol, smoking, and a lack of physical activity.