# RAG (Retrieval Augmented Generation) with LlamaIndex and Navarasa 2.0

In this notebook, we set up a RAG pipeline with LlamaIndex and the Navarasa 2.0 model (Gemma 7B fine-tuned on 15 Indian languages).

A notable limitation of such models is their lack of up-to-date knowledge, which can lead to inaccurate or fabricated answers about current events. For instance, the model would not be aware of the dates of the 2024 Indian general election, which were announced last Saturday.

To address this issue, we employ Retrieval Augmented Generation (RAG) to provide the model with the correct context, enabling it to deliver accurate responses. Below, we demonstrate how to implement this using LlamaIndex.
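
To make the retrieve-then-generate idea concrete, here is a minimal sketch in plain Python. It is not how LlamaIndex works internally: the word-count "embedding", the helper names (`bag_of_words`, `retrieve`, `build_prompt`), and the two sample chunks are assumptions for illustration only. A real pipeline uses a learned embedding model and sends the final prompt to an LLM.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks most similar to the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, bag_of_words(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, context_chunks):
    """Place the retrieved context in front of the question."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical sample chunks, for illustration only
chunks = [
    "The election will be held from 19 April 2024 to 1 June 2024.",
    "Gemma is a family of open language models.",
]
query = "When will the election be held?"
top = retrieve(query, chunks, top_k=1)
prompt = build_prompt(query, top)
print(prompt)
```

The model never needs to have seen the election dates during training; it only needs to read them from the context placed in the prompt.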

#### Installation

In [None]:
!pip install llama-index
!pip install llama-index-embeddings-cohere
!pip install llama-index-llms-huggingface

#### Get the relevant Wikipedia page containing information about the election dates.

In [None]:
import requests
from pathlib import Path

wikipedia_page = '2024_Indian_general_election'

response = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "format": "json",
        "titles": wikipedia_page,
        "prop": "extracts",
        # 'exintro': True,
        "explaintext": True,
    },
).json()
page = next(iter(response["query"]["pages"].values()))
wiki_text = page["extract"]
# Save the page text locally so SimpleDirectoryReader can load it later
Path(f"{wikipedia_page}.txt").write_text(wiki_text, encoding="utf-8")
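
The request above assumes the page exists and has an extract. As a hedged sketch, the hypothetical helper `extract_page_text` below pulls the text out of the decoded JSON and fails loudly when the API marks the title as missing or returns no extract. The sample dictionaries are made up to mirror the response shape used above.

```python
def extract_page_text(api_response):
    """Return the plain-text extract from a decoded MediaWiki 'extracts' response.

    Hypothetical helper: raises ValueError instead of a bare KeyError
    when the page is missing or has no extract.
    """
    pages = api_response.get("query", {}).get("pages", {})
    if not pages:
        raise ValueError("No pages in the API response")
    page = next(iter(pages.values()))
    if "missing" in page or not page.get("extract"):
        raise ValueError(f"No extract found for page: {page.get('title')}")
    return page["extract"]


# Made-up responses that mirror the shape of the real API output
found = {"query": {"pages": {"123": {"title": "Example", "extract": "Some text."}}}}
not_found = {"query": {"pages": {"-1": {"title": "No_such_page", "missing": ""}}}}

print(extract_page_text(found))
try:
    extract_page_text(not_found)
except ValueError as err:
    print(err)
```

With this helper, `wiki_text = extract_page_text(response)` could stand in for the two lines that index into `response["query"]["pages"]`.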

#### Set up the Navarasa 2.0 fine-tuned model on HuggingFace Endpoints.

In [None]:
from llama_index.llms.huggingface import HuggingFaceInferenceAPI

llm = HuggingFaceInferenceAPI(
    model_name="<DEPLOYED MODEL URL>",  # URL of your deployed endpoint
    token="HF_TOKEN",  # replace with your Hugging Face access token
)

#### Set up the CohereAI multilingual embedding model to retrieve context for queries in Indian languages

In [None]:
from llama_index.embeddings.cohere import CohereEmbedding

embed_model = CohereEmbedding(
    cohere_api_key='<COHEREAI API KEY>',
    model_name="embed-multilingual-v3.0",
)
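
The two cells above take credentials as string literals. One common alternative, sketched here as an assumption and not as part of the original notebook, is to read them from environment variables so they never land in the notebook file. The helper name `read_secret` and the variable name `COHERE_API_KEY` are hypothetical.

```python
import os


def read_secret(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

The values could then be passed as `token=read_secret("HF_TOKEN")` and `cohere_api_key=read_secret("COHERE_API_KEY")`.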

#### RAG Setup

1. Set LLM, Embedding Model, and chunk size.
2. Load Documents.
3. Index Document.
4. Querying.
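
The chunk size in step 1 controls how the document is cut into pieces before embedding. As a rough illustration only, the hypothetical function `chunk_words` below splits text into fixed-size chunks of whitespace-separated words with an optional overlap. This is a simplification: LlamaIndex measures chunk size in tokens and uses its own splitter, so the chunks it produces will differ.

```python
def chunk_words(text, chunk_size, overlap=0):
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words. Word-based sketch only;
    real splitters usually count tokens and respect sentence boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
```

Smaller chunks give more precise retrieval hits but less surrounding context per hit; larger chunks do the opposite.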

In [None]:
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex

# set llm, embedding model and chunk_size
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 512

# load documents
documents = SimpleDirectoryReader(input_files=['2024_Indian_general_election.txt']).load_data()

# index documents
index = VectorStoreIndex.from_documents(documents)

# create query engine
query_engine = index.as_query_engine(similarity_top_k=2)

# querying
response = query_engine.query("2024 में भारतीय चुनाव कब हो रहे हैं?")
print(response)

#### Hindi

Query: 2024 में भारतीय चुनाव कब हो रहे हैं?

Answer: 19 अप्रैल 2024 से 1 जून 2024 तक।

#### Telugu

Query: 2024లో భారత ఎన్నికలు ఎప్పుడు జరుగుతాయి?

Answer: 19 ఏప్రిల్ 2024 నుంచి 1 జూన్ 2024 వరకు భారత ఎన్నికలు జరుగుతున్నాయి.

#### Marathi

Query: 2024 मध्ये भारतीय निवडणुका कधी होणार आहेत?

Answer: 2024 मध्ये भारतीय निवडणुका 19 एप्रिल ते 1 जून दरम्यान होणार आहेत.

#### Urdu

Query: 2024 میں بھارتی انتخابات کب ہوں گے؟

Answer: 2024 میں ہندوستانی عام انتخابات 19 اپریل سے 1 جون تک ہوں گے۔

#### Tamil

Query: 2024ல் இந்திய தேர்தல் எப்போது நடைபெறும்?

Answer: 2024 இல் இந்திய தேர்தல் ஏப்ரல் 19 முதல் ஜூன் 1 வரை நடைபெறும்.
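
The five queries above can be run in one pass. The sketch below assumes only that the engine object exposes a `query(text)` method, as `query_engine` does in the RAG setup cell. The `run_queries` helper and the `EchoEngine` stand-in are hypothetical; the stand-in is there so the block runs without any API keys.

```python
queries = {
    "Hindi": "2024 में भारतीय चुनाव कब हो रहे हैं?",
    "Telugu": "2024లో భారత ఎన్నికలు ఎప్పుడు జరుగుతాయి?",
    "Marathi": "2024 मध्ये भारतीय निवडणुका कधी होणार आहेत?",
    "Urdu": "2024 میں بھارتی انتخابات کب ہوں گے؟",
    "Tamil": "2024ல் இந்திய தேர்தல் எப்போது நடைபெறும்?",
}


def run_queries(engine, queries):
    """Send each query to the engine and collect the answers as strings."""
    return {language: str(engine.query(text)) for language, text in queries.items()}


class EchoEngine:
    """Hypothetical stand-in for a real query engine; it only echoes the query."""

    def query(self, text):
        return f"[stand-in answer] {text}"


# Replace EchoEngine() with the real query_engine to get actual answers
results = run_queries(EchoEngine(), queries)
for language, answer in results.items():
    print(f"{language}: {answer}")
```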