# Hybrid Search RAG Pipeline in LlamaIndex

This notebook demonstrates how to build a hybrid search Retrieval-Augmented Generation (RAG) pipeline with open-source models, using `HuggingFace` for the LLM and `FastEmbed` for the embeddings, together with `llama-index`.

## Setup

First, install the necessary packages:

In [None]:
!pip install llama-index-vector-stores-chroma
!pip install llama-index
!pip install llama-index-embeddings-fastembed

In [None]:
!pip install llama-index-llms-huggingface-api

Installing collected packages: huggingface-hub, llama-index-llms-huggingface-api
  Attempting uninstall: huggingface-hub
    Found existing installation: huggingface-hub 0.20.3
    Uninstalling huggingface-hub-0.20.3:
      Successfully uninstalled huggingface-hub-0.20.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fastembed 0.2.7 requires huggingface-hub<0.21,>=0.20, but you have huggingface-hub 0.23.4 which is incompatible.
transformers 4.41.2 requires tokenizers<0.20,>=0.19, but you have tokenizers 0.15.2 which is incompatible.
Successfully installed huggingface-hub-0.23.4 llama-index-llms-huggingface-api-0.1.0


## Set Up Hugging Face API Token

In [None]:
import os
from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass("API:")

# Set the API token in the environment variable
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN

API:··········
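When the notebook is re-run, it can be convenient to reuse a token that is already set in the environment and to prompt only when it is missing. The following is a minimal sketch of that idea; the helper name `resolve_hf_token` and its parameters are hypothetical and not part of any library.

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def resolve_hf_token(
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
    var_name: str = "HUGGINGFACEHUB_API_TOKEN",
) -> str:
    """Return the token from the environment, prompting only if it is missing.

    `resolve_hf_token` is a hypothetical helper written for this notebook.
    """
    token = env.get(var_name, "").strip()
    if not token:
        # Fall back to an interactive prompt that hides the typed input.
        token = prompt("Hugging Face API token: ").strip()
    if not token:
        raise ValueError("No Hugging Face API token was provided.")
    return token
```

Calling `resolve_hf_token()` with no arguments reads `HUGGINGFACEHUB_API_TOKEN` from `os.environ` and falls back to `getpass`; the result can then be assigned to the environment variable exactly as in the cell above.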


## Load Documents



In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
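`SimpleDirectoryReader("data")` expects a `data` folder next to the notebook. A quick check of that folder before loading can save a confusing error later. The sketch below uses only the standard library; the helper name `list_input_files` is hypothetical, and skipping hidden files and subfolders is an assumption about what the reader picks up by default.

```python
from pathlib import Path
from typing import List


def list_input_files(data_dir: str = "data") -> List[Path]:
    """List the top-level, non-hidden files in the data directory.

    Hypothetical pre-flight check; it does not parse or load any file.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Data directory not found: {root}")
    # Keep regular files only and ignore dotfiles such as .DS_Store.
    files = sorted(
        p for p in root.iterdir() if p.is_file() and not p.name.startswith(".")
    )
    if not files:
        raise ValueError(f"No readable files found in: {root}")
    return files
```

Running `list_input_files("data")` before the reader makes it obvious which files are about to be loaded.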

## Set Up FastEmbed Embeddings and Hugging Face LLM



In [None]:
from llama_index.embeddings.fastembed import FastEmbedEmbedding
# Define the embedding model
embed_model = FastEmbedEmbedding(model_name="thenlper/gte-large")


In [None]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

llm = HuggingFaceInferenceAPI(
    model_name="HuggingFaceH4/zephyr-7b-alpha", token=HUGGINGFACEHUB_API_TOKEN
)

## Define LLM and Embedding in Settings

By default, LlamaIndex uses OpenAI models for both the LLM and the embeddings, so we override the global `Settings` with the models defined above.

In [None]:
from llama_index.core import Settings

Settings.llm = llm

Settings.embed_model = embed_model

## Create a Vector Store with Chroma

In [None]:
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
import chromadb

## Index Your Documents

First, we save the data to disk:
- Create a persist directory where the data will be stored.
- Define a unique collection for each index.
- Wrap the vector store in a `StorageContext`.

In [None]:
db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [None]:
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)

## Load the Index

Note that loading the index does not use `documents`: the embeddings are read back from the persisted Chroma collection.

In [None]:
db2 = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db2.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    embed_model=embed_model,
)
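Because the collection lives on disk, one way to choose between building the index and loading it is to check whether the collection already holds any embeddings. The sketch below assumes the collection object exposes a `count()` method, as Chroma collections do; the helper name `should_build_index` is hypothetical.

```python
from typing import Protocol


class CountableCollection(Protocol):
    """Anything with a count() method, such as a Chroma collection."""

    def count(self) -> int: ...


def should_build_index(collection: CountableCollection) -> bool:
    """Return True when the collection is empty and must be populated first.

    Hypothetical helper: an empty collection means nothing was indexed yet.
    """
    return collection.count() == 0


# Possible use with the objects defined in the cells above:
# if should_build_index(chroma_collection):
#     index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
# else:
#     index = VectorStoreIndex.from_vector_store(vector_store)
```

This avoids re-embedding the same documents every time the notebook is re-run.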

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("Summarize Tarun's role at AI Planet")
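The query engine above retrieves chunks by embedding similarity from the Chroma store. Hybrid search in the usual sense also ranks chunks by keyword match and then fuses the two rankings. The sketch below illustrates one common fusion rule, reciprocal rank fusion, in plain Python. It is a generic illustration and not part of this notebook's pipeline; the function name, the chunk IDs, and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of IDs into one list, best first.

    Each ID receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the top result.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative rankings (made-up IDs): one from a dense retriever, one from keywords.
dense_ranking = ["chunk_a", "chunk_b", "chunk_c"]
keyword_ranking = ["chunk_c", "chunk_a", "chunk_d"]
fused = reciprocal_rank_fusion([dense_ranking, keyword_ranking])
```

Chunks that rank well in both lists rise to the top of `fused`, while chunks found by only one retriever are kept but scored lower.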



In [None]:
response.response

"\n\nTarun is currently wearing multiple hats at AI Planet, where he is part of the Data Science team and handles the community. He has worked on Fine Tuning LLMs, building Consultant POC to migrate the enterprise and business into AI, and deploying 6+ state-of-the-art models on Al Planet's AI Marketplace. Additionally, he has organized 30+ live sessions with experts from Google, Weights & Biases, Intel, and more."
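Besides the answer text, the response object also carries the retrieved chunks in `response.source_nodes`, which helps when checking where an answer came from. The sketch below collects the source file and similarity score of each chunk. The helper name `summarize_sources` is hypothetical, and the `file_name` metadata key is an assumption about what the directory reader attaches to each document.

```python
from typing import Any, List, Optional, Tuple


def summarize_sources(response: Any) -> List[Tuple[str, Optional[float]]]:
    """Return (file_name, score) for every retrieved chunk in a response.

    Hypothetical helper; expects each item in response.source_nodes to have
    a `score` attribute and a `node` with a `metadata` dictionary.
    """
    summary = []
    for hit in response.source_nodes:
        # Fall back to "unknown" when the metadata has no file name.
        file_name = hit.node.metadata.get("file_name", "unknown")
        summary.append((file_name, hit.score))
    return summary
```

Calling `summarize_sources(response)` after the query gives a quick view of which files contributed to the answer.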