<a href="https://colab.research.google.com/github/curiouscat7/RAG-and-AI-agents/blob/main/hybridsearch_RAG_using_LlamaIndex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hybrid Search RAG Pipeline in LlamaIndex

This notebook demonstrates how to build a Hybrid Search Retrieval-Augmented Generation (RAG) pipeline with open-source models, using a `HuggingFace` LLM and `FastEmbed` embeddings with `llama-index`.
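Hybrid search generally means combining a keyword-based (sparse) ranking with an embedding-based (dense) ranking. One common way to merge the two is reciprocal rank fusion (RRF). The sketch below is only an illustration of that idea and is not taken from this notebook: the function name, the constant `k=60`, and the document ids are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a dense (vector) and a sparse (keyword) retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever are still kept further down.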

## Setup

First, install the necessary packages.

## Note:
After the modules are installed locally, restart the session to avoid dependency errors.

## Install Necessary Packages and Save Access Tokens:

In [1]:
!pip install llama-index-vector-stores-chroma
!pip install llama-index
!pip install llama-index-embeddings-fastembed

In [1]:
!pip install llama-index-llms-huggingface-api

Collecting llama-index-llms-huggingface-api
  Downloading llama_index_llms_huggingface_api-0.1.0-py3-none-any.whl.metadata (1.3 kB)
Downloading llama_index_llms_huggingface_api-0.1.0-py3-none-any.whl (5.0 kB)
Installing collected packages: llama-index-llms-huggingface-api
Successfully installed llama-index-llms-huggingface-api-0.1.0


## Set Up Hugging Face API Token

In [2]:
import os
from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass("API:")

# Set the API token in the environment variable
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN

API:··········
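When the notebook is re-run often, it can be convenient to prompt for the token only if it is not already set. The following is a minimal sketch of that pattern; the helper name `get_hf_token` is hypothetical, and the environment variable name is the one used in the cell above.

```python
import os
from getpass import getpass


def get_hf_token(env_var="HUGGINGFACEHUB_API_TOKEN"):
    """Return the token from the environment, prompting only if it is missing."""
    token = os.environ.get(env_var)
    if not token:
        # getpass hides the typed value so it does not end up in the cell output.
        token = getpass("Hugging Face API token: ")
        os.environ[env_var] = token
    return token
```

Calling `get_hf_token()` in a later cell would then return the stored value without asking again.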


## Load and Split Medical Documents:



In [4]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Read the files in the local "data" directory into Document objects
documents = SimpleDirectoryReader("data").load_data()
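The cell above only loads the files; the splitting into smaller pieces happens later, when the index is built. To make the idea of splitting concrete, here is a simplified word-based sketch with overlapping chunks. It is an illustration under assumed parameters (`chunk_size`, `overlap`) and is not the splitter that `llama-index` applies internally.

```python
def split_into_chunks(text, chunk_size=200, overlap=20):
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Tiny made-up input so the overlap is easy to see.
sample = " ".join(f"w{i}" for i in range(10))
chunks = split_into_chunks(sample, chunk_size=4, overlap=1)
print(chunks)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.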

## Set Up FastEmbed Embeddings and HuggingFace LLM



In [5]:
from llama_index.embeddings.fastembed import FastEmbedEmbedding
# Define the embedding model (the model files are downloaded on first use)
embed_model = FastEmbedEmbedding()

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [6]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [7]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

llm = HuggingFaceInferenceAPI(
    model_name="HuggingFaceH4/zephyr-7b-alpha", token=HUGGINGFACEHUB_API_TOKEN
)

## Define LLM and Embedding in Settings

By default, LlamaIndex uses OpenAI models for both the LLM and the embeddings, so we override the global settings with the models defined above.

In [8]:
from llama_index.core import Settings

Settings.llm = llm

Settings.embed_model = embed_model

## Create Vectorstore with Chroma

In [9]:
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
import chromadb

## Index your document

First, save the data to disk:
- Create a persist directory where the data will be stored.
- Define a unique collection for each index.
- Pass the vector store to a `StorageContext`, which the index uses to store the data.

In [10]:
db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [11]:
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)

## Load the index

Note that loading the index does not use `documents`: the embeddings are read from the persisted Chroma collection.

In [12]:
db2 = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db2.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    embed_model=embed_model,
)
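The save-then-reload pattern works because a second client opened on the same path and collection name sees what the first one wrote. The toy class below imitates that behaviour with JSON files, purely to illustrate the idea. `ToyPersistentClient` and its methods are hypothetical and are not part of `chromadb`.

```python
import json
import tempfile
from pathlib import Path


class ToyPersistentClient:
    """Hypothetical stand-in for a persistent vector-store client (not chromadb)."""

    def __init__(self, path):
        self.root = Path(path)
        self.root.mkdir(parents=True, exist_ok=True)

    def _file(self, name):
        return self.root / f"{name}.json"

    def get_or_create_collection(self, name):
        # Create an empty collection file only if none exists yet.
        file = self._file(name)
        if not file.exists():
            file.write_text(json.dumps({}))
        return json.loads(file.read_text())

    def add(self, name, doc_id, vector):
        records = self.get_or_create_collection(name)
        records[doc_id] = vector
        self._file(name).write_text(json.dumps(records))


persist_dir = tempfile.mkdtemp()
writer = ToyPersistentClient(persist_dir)
writer.add("quickstart", "chunk-0", [0.1, 0.2, 0.3])

# A second client on the same path finds the stored record without re-indexing.
reader = ToyPersistentClient(persist_dir)
reloaded = reader.get_or_create_collection("quickstart")
print(reloaded)
```

Using a different collection name on the same path would give an empty, separate collection, which is why each index should have its own name.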

In [13]:
query_engine = index.as_query_engine()
response = query_engine.query("summarize this doc in 300 words")

In [14]:
response.response

'\n\nThis document presents a proposal for a data management system for precision medicine, which aims to provide a comprehensive and scalable solution for managing and analyzing large and complex datasets in the context of personalized medicine. The system is designed to address the challenges of data management in precision medicine, including data heterogeneity, data quality, data security, and data privacy.\n\nThe proposed system is based on a modular architecture that allows for flexibility and customization, and includes features such as data ingestion, data cleaning, data transformation, data integration, data analysis, and data visualization. The system is also designed to support data sharing and collaboration, and includes features such as data access control, data provenance, and data governance.\n\nThe system is built using open-source technologies and follows the FAIR principles for data management, which ensure that data is Findable, Accessible, Interoperable, and Reusabl

In [15]:
query_engine = index.as_query_engine()
response = query_engine.query("What makes this process innovative")

In [16]:
response.response

'\n\nThe process proposed in this paper is innovative because it combines the use of machine learning algorithms with a data management system specifically designed for precision medicine. This system allows for the collection, storage, and analysis of large amounts of patient data, which can then be used to develop personalized treatment plans. The use of machine learning algorithms enables the system to identify patterns and trends in the data, which can be used to predict patient outcomes and develop more effective treatments. Additionally, the system is designed to be scalable and adaptable, allowing for the incorporation of new data sources and the development of new treatment options as they become available. Overall, this process represents a significant advancement in the field of precision medicine, providing a more efficient and effective way to develop personalized treatment plans for patients.'
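Conceptually, a query engine over a vector index embeds the question, retrieves the most similar chunks, and passes them to the LLM as context. The sketch below illustrates that flow with hand-written two-dimensional vectors. The vectors, chunk texts, `top_k=2`, and the prompt wording are all assumptions for illustration, not what `llama-index` uses internally.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, store, top_k=2):
    """Return the top_k stored items most similar to the query vector."""
    ranked = sorted(
        store, key=lambda item: cosine(query_vec, item["vector"]), reverse=True
    )
    return ranked[:top_k]


def build_prompt(question, chunks):
    """Place the retrieved chunk texts ahead of the question."""
    context = "\n".join(c["text"] for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Made-up chunks with made-up embeddings.
store = [
    {"text": "chunk about data privacy", "vector": [1.0, 0.0]},
    {"text": "chunk about visualization", "vector": [0.0, 1.0]},
    {"text": "chunk about data security", "vector": [0.9, 0.1]},
]

top = retrieve([1.0, 0.0], store, top_k=2)
prompt = build_prompt("How is data protected?", top)
print(prompt)
```

Because the answer is generated only from the retrieved chunks, a broad request such as a whole-document summary may reflect just the few chunks that were retrieved.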