# LLAMAINDEX RAG & AYA LLM & GRADIO

### Introduction
Aya 101 is a state-of-the-art, open-source, massively multilingual large language model (LLM) developed by Cohere for AI. It has the remarkable capability of operating in 101 different languages, including over 50 that are considered underserved by most advanced AI models.

In this notebook, we will go through a step-by-step process of deploying and using the Aya model. We will also build a FAISS-powered RAG pipeline using Aya and showcase how enterprises can use it to build AI applications.

### The Aya 101 Model by Cohere for AI
The Aya 101 model is part of an open-science project by Cohere for AI, built collaboratively with contributions from people across the globe.

Aya's goal is to address the imbalance in language representation within AI by developing a model that understands and generates multiple languages, not just the ones that are predominantly represented online.

### Key Facts about Aya
- Massively Multilingual: The model supports 101 languages, including over 50 that are rarely seen in AI models.
- Open Source: The model, training process, and datasets are all open source.
- Groundbreaking Dataset: Aya comes with the largest multilingual instruction dataset released to date, comprising 513 million data points across 114 languages.

Source: Cohere for AI: https://netraneupane.medium.com/retrieval-augmented-generation-rag-using-llamaindex-and-mistral-7b-228f93ba670f

### Understanding RAG Pipeline
The Retrieval-Augmented Generation (RAG) pipeline has become a powerful tool in the field of LLMs. At its core, the RAG pipeline combines two crucial steps:

1. Retrieval step: Retrieving relevant stored information using vector search, a knowledge graph, or simple keyword search.
2. Generation step: Generating coherent text using a combination of contextual knowledge and natural language generation capabilities of LLMs.

This combination allows the system to pull in essential details from a database and then use them to construct detailed and informative responses to user queries.
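
The two steps above can be sketched in plain Python. This is a toy illustration only: the word-overlap scoring and the `retrieve` and `build_prompt` helpers are hypothetical names, not part of LlamaIndex or Aya, and a real pipeline would use embeddings and pass the prompt to an LLM.

```python
# Toy illustration of the two RAG steps; not the LlamaIndex implementation.
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, top_k=2):
    """Retrieval step: rank passages by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        passages,
        key=lambda p: len(query_words & tokenize(p)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, context_passages):
    """Generation step input: place the retrieved context ahead of the question."""
    context = "\n".join(f"- {p}" for p in context_passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


passages = [
    "Aya 101 supports 101 languages.",
    "FAISS is a library for similarity search.",
    "Gradio builds web interfaces for Python functions.",
]
query = "How many languages does Aya support?"
top_passages = retrieve(query, passages, top_k=1)
prompt = build_prompt(query, top_passages)
print(prompt)  # this string is what would be sent to the LLM
```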

### LlamaIndex as a Vector Store
LlamaIndex is a library that provides a set of tools for building index structures over unstructured or semi-structured data, such as text documents. It leverages various techniques, including vector stores, to efficiently store and retrieve relevant information based on similarity search.

One of the key components of LlamaIndex is its vector store functionality. Vector stores are designed to store high-dimensional vectors, such as embeddings generated from text data, and enable fast similarity search operations. LlamaIndex supports multiple vector store backends, including FAISS (Facebook AI Similarity Search), which is a popular library for efficient similarity search.

In this notebook, we will utilize LlamaIndex's vector store capabilities, specifically with the FAISS backend, to store and retrieve relevant context for the Aya language model. LlamaIndex will handle the process of creating an index over the input documents, generating embeddings for each document or passage, and storing them in the vector store.

When a query is made to the Aya language model, LlamaIndex will use the vector store to find the most similar passages or documents based on the query's embedding. This allows us to retrieve the most relevant context for the given query, which can then be used to augment the language model's understanding and generate more accurate and contextually relevant responses.
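
To make the similarity search concrete, here is a minimal sketch of top-k retrieval by cosine similarity over a tiny in-memory store. The three-dimensional vectors, the `toy_store` dictionary, and the helper names are made up for illustration; real embeddings have hundreds of dimensions, and a backend such as FAISS uses optimized index structures rather than this brute-force loop.

```python
# Brute-force cosine-similarity search over a toy vector store.
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, store, k=2):
    """Return the ids of the k stored vectors most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in store.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical embeddings; a real store would hold model-generated vectors.
toy_store = {
    "passage_a": [0.9, 0.1, 0.0],
    "passage_b": [0.0, 1.0, 0.0],
    "passage_c": [0.7, 0.7, 0.0],
}
query_embedding = [1.0, 0.0, 0.0]
nearest = top_k_similar(query_embedding, toy_store, k=2)
print(nearest)
```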

### Step-by-Step Guide to Building a RAG Pipeline

## Setup

In [None]:
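# Install necessary dependencies
!pip install -q pypdf
!pip install -q torch
!pip install -q transformers
!pip install -q sentence-transformers
# Pinned below 0.10 because this notebook uses the legacy `from llama_index import ...` layout
!pip install -q "llama-index<0.10"
# Required by the LlamaCPP wrapper used to load the model
!pip install -q llama-cpp-python
!pip install -q gradio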

## Import

In [None]:
# torch and transformers back the local embedding and reranking models
from torch import cuda, bfloat16
import torch
import transformers
from transformers import AutoTokenizer

In [None]:
# Load the large language model (Aya)
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

### Pipeline

In [None]:
llm = LlamaCPP(
    # llama.cpp expects a GGUF model file; confirm that this URL resolves before running
    model_url='https://huggingface.co/CohereForAI/aya-101-v1-GGUL/resolve/main/aya-101-v1.Q4_K_M.ggul',
    temperature=0.1,
    max_new_tokens=256,
    context_window=4096,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": -1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)


### Setting Up a RAG Pipeline with LlamaIndex and Gradio

In [None]:
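# Load the PDF (one Document per page) and merge the pages into a single Document
from llama_index import SimpleDirectoryReader, Document

documents = SimpleDirectoryReader(input_files=["./documents/survey_on_llms.pdf"]).load_data()
# Note: after this line, `documents` holds one merged Document, not a list
documents = Document(text="\n\n".join([doc.text for doc in documents]))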


### Define Function

In [None]:
# Build the index, or load it from disk if it was already persisted
import os
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
    StorageContext,
    load_index_from_storage,
)

def get_build_index(documents, llm, embed_model="local:sentence-transformers/all-mpnet-base-v2", sentence_window_size=3, save_dir="./vector_store/index"):
    node_parser = SentenceWindowNodeParser(
        window_size=sentence_window_size,
        window_metadata_key="window",
        original_text_metadata_key="original_text"
    )

    sentence_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
        node_parser=node_parser,
    )

    if not os.path.exists(save_dir):
        index = VectorStoreIndex.from_documents(
            [documents], service_context=sentence_context
        )
        index.storage_context.persist(persist_dir=save_dir)
    else:
        index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=save_dir),
            service_context=sentence_context,
        )

    return index

vector_index = get_build_index(documents=documents, llm=llm, embed_model="local:sentence-transformers/all-mpnet-base-v2", sentence_window_size=3, save_dir="./vector_store/index")
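
The sentence-window idea used above can be sketched without LlamaIndex. This is a rough approximation, assuming that `window_size` counts the neighbouring sentences kept on each side; the regex splitter and the `sentence_windows` helper are hypothetical and much simpler than the library's own sentence tokenizer. Each sentence becomes its own node for precise matching, while the wider window is stored as metadata for later use as context.

```python
# Sketch of sentence-window chunking; the real parser lives in llama_index.
import re


def sentence_windows(text, window_size=3):
    """Split text into sentences and attach a window of neighbours to each one."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append(
            {
                "original_text": sentence,  # what gets embedded and matched
                "window": " ".join(sentences[start:end]),  # what the LLM reads
            }
        )
    return nodes


sample = (
    "RAG retrieves context. The context is added to the prompt. "
    "The LLM then answers. Answers cite the context."
)
nodes = sentence_windows(sample, window_size=1)
for node in nodes:
    print(node["original_text"], "->", node["window"])
```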

In [None]:
# Create the query engine
from llama_index.postprocessor import MetadataReplacementPostProcessor, SentenceTransformerRerank

def get_query_engine(sentence_index, similarity_top_k=6, rerank_top_n=2):
    postproc = MetadataReplacementPostProcessor(target_metadata_key="window")
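    # A reranker needs a cross-encoder checkpoint; BAAI/bge-reranker-base is swapped in here
    # as an assumption, since all-mpnet-base-v2 is a bi-encoder embedding model
    rerank = SentenceTransformerRerank(
        top_n=rerank_top_n, model="BAAI/bge-reranker-base"
    )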
    engine = sentence_index.as_query_engine(
        similarity_top_k=similarity_top_k, node_postprocessors=[postproc, rerank]
    )
    return engine

query_engine = get_query_engine(sentence_index=vector_index, similarity_top_k=6, rerank_top_n=2)
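
The query engine above applies its two postprocessors, in order, to the `similarity_top_k` retrieved nodes. A rough sketch of that flow follows, assuming nodes are plain dictionaries and using a hypothetical keyword scorer in place of a cross-encoder model; none of these helper names come from LlamaIndex.

```python
# Sketch of the postprocessing flow: window replacement, then reranking.


def run_postprocessors(candidates, rerank_score, rerank_top_n=2):
    """Swap each node's text for its stored window, then keep the best-scoring nodes."""
    # Step 1: metadata replacement (fall back to the node text if no window is stored).
    replaced = [
        {**node, "text": node["metadata"].get("window", node["text"])}
        for node in candidates
    ]
    # Step 2: rerank with a (normally model-based) relevance score.
    ranked = sorted(replaced, key=rerank_score, reverse=True)
    return ranked[:rerank_top_n]


def keyword_score(query):
    """Return a stand-in scorer that counts query words appearing in a node's text."""
    words = set(query.lower().split())

    def score(node):
        tokens = node["text"].lower().split()
        return sum(1 for token in tokens if token.strip(".,?!") in words)

    return score


# Hypothetical retrieval results, as if returned by a top-k vector search.
candidates = [
    {"text": "Aya is multilingual.",
     "metadata": {"window": "Aya is multilingual. It covers 101 languages."}},
    {"text": "Gradio is a UI library.", "metadata": {}},
    {"text": "FAISS indexes vectors.",
     "metadata": {"window": "FAISS indexes vectors. Vectors encode languages and text."}},
]
top_nodes = run_postprocessors(candidates, keyword_score("aya languages"), rerank_top_n=2)
for node in top_nodes:
    print(node["text"])
```

Retrieving more candidates than are finally kept (6 versus 2 in the cell above) gives the reranker room to promote passages that the embedding search ranked lower.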

### Launch a Gradio Interface

In [None]:
# Create Gradio interface
import gradio as gr

def answer_query(query):
    response = query_engine.query(query)
    return str(response)

iface = gr.Interface(
    fn=answer_query,
    inputs=gr.Textbox(lines=7, label="Enter your query"),
    outputs="text",
    title="LLM Document Query",
    description="Ask a question about the document and get an answer from the LLM.",
)
#iface.launch()
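
A handler exposed through a web UI will receive empty or malformed input sooner or later. One possible hardening of `answer_query` is sketched below; `make_answer_fn` and `EchoEngine` are hypothetical names used only for this illustration, with `EchoEngine` standing in for the real query engine so the sketch runs without a model.

```python
# Sketch of a defensive query handler that could be passed to gr.Interface(fn=...).


def make_answer_fn(engine):
    """Wrap any object with a .query(text) method in input checks and error handling."""

    def answer(query):
        if not query or not query.strip():
            return "Please enter a question."
        try:
            return str(engine.query(query.strip()))
        except Exception as exc:  # surface the failure in the UI instead of crashing
            return f"Query failed: {exc}"

    return answer


class EchoEngine:
    """Stand-in for the real query engine; simply echoes the query back."""

    def query(self, text):
        return f"echo: {text}"


answer = make_answer_fn(EchoEngine())
print(answer("   "))
print(answer(" What is RAG? "))
```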

In [None]:
# Launch the app
if __name__ == "__main__":
    iface.launch(server_name='0.0.0.0', server_port=7865, share=True)