# FAISS & LANGCHAIN RAG & AYA LLM & GRADIO

### Introduction
Aya 101 is a state-of-the-art, open-source, massively multilingual large language model (LLM) developed by Cohere for AI. It has the remarkable capability of operating in 101 different languages, including over 50 that are considered underserved by most advanced AI models.

In this notebook, we will go through a step-by-step process of deploying and using the Aya model. We will also build a FAISS-powered RAG pipeline using Aya and showcase how enterprises can use it to build AI applications.

### The Aya 101 Model by Cohere for AI
The Aya 101 model is part of an open-science project led by Cohere for AI and is a collaborative effort involving contributions from people across the globe.

Aya's goal is to address the imbalance in language representation within AI by developing a model that understands and generates multiple languages, not just the ones that are predominantly represented online.

### Key Facts about Aya
- Massively Multilingual: The model supports 101 languages, including over 50 that are rarely seen in AI models.
- Open Source: The model, training process, and datasets are all open source.
- Groundbreaking Dataset: Aya comes with the largest multilingual instruction dataset released to date, comprising 513 million data points across 114 languages.

Source: Cohere for AI, via https://www.e2enetworks.com/blog/steps-to-build-rag-pipeline-with-cohere-for-ais-aya-llm

### Understanding RAG Pipeline
The Retrieval-Augmented Generation (RAG) pipeline has become a powerful tool in the field of LLMs. At its core, the RAG pipeline combines two crucial steps:

1. Retrieval step: Retrieving relevant stored information using vector search, a knowledge graph, or simple keyword search.
2. Generation step: Generating coherent text using a combination of contextual knowledge and natural language generation capabilities of LLMs.

This combination allows the system to pull in essential details from a database and then use them to construct detailed and informative responses to user queries.
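
The two steps above can be sketched as a toy retrieve-then-generate loop. Everything here is an illustrative assumption: the corpus, the word-overlap scoring rule, and the `generate` placeholder stand in for the FAISS index and the Aya model used later in this notebook.

```python
# Toy retrieve-then-generate loop. The corpus, scoring rule, and "generator"
# are illustrative stand-ins, not the notebook's FAISS index or the Aya model.

def retrieve(query, corpus, k=2):
    """Rank documents by how many query words they share and return the top k."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, context_docs):
    """Place the retrieved passages ahead of the question."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

def generate(prompt):
    """Placeholder for an LLM call; it only reports the prompt size."""
    return f"[LLM would answer here using {len(prompt)} prompt characters]"

corpus = [
    "FAISS is a library for similarity search over vectors.",
    "Gradio builds simple web interfaces for Python functions.",
    "Aya 101 is a multilingual model covering 101 languages.",
]
query = "Which library does similarity search?"
top_docs = retrieve(query, corpus, k=1)
prompt = build_prompt(query, top_docs)
print(generate(prompt))
```

In a real pipeline, `retrieve` would query a vector store and `generate` would call the language model; only the overall shape (retrieve, assemble a prompt, generate) carries over.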

### FAISS As Vector Store
FAISS, which stands for Facebook AI Similarity Search, is a library developed by Facebook AI that enables efficient similarity search. It provides algorithms to quickly search and cluster embedding vectors, making it suitable for tasks such as semantic search and similarity matching.

FAISS can handle large databases efficiently and is designed to work with high-dimensional vectors, allowing for fast and memory-efficient similarity search.
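
To make "similarity search" concrete, here is a minimal brute-force sketch in plain Python that ranks stored vectors by Euclidean distance to a query. It is not FAISS code: the vectors and the `search` helper are hypothetical, and FAISS exists precisely to do this kind of lookup much faster and at much larger scale.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(index_vectors, query, k):
    """Return (position, distance) for the k stored vectors closest to the query."""
    distances = [(i, l2_distance(vec, query)) for i, vec in enumerate(index_vectors)]
    distances.sort(key=lambda pair: pair[1])
    return distances[:k]

# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
stored = [
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
]
nearest = search(stored, [1.0, 0.0, 0.0], k=2)
print(nearest)
```

The scan compares the query against every stored vector, so its cost grows linearly with the size of the collection. That is the cost a dedicated similarity-search library is designed to reduce.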

In this notebook, we will use FAISS as our Vector Store, which will provide context to the Aya LLM. We will also use LangChain for building the pipeline.

### Step-by-Step Guide to Building a RAG Pipeline with Aya

### Choosing a GPU node
For the non-API (self-hosted) deployment, a V100 GPU node was used.

## Setup

In [None]:
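# bitsandbytes and accelerate are needed for 4-bit loading; pypdf backs PyPDFLoader.
!pip install torch transformers accelerate bitsandbytes langchain langchain-community faiss-cpu gradio pypdf sentence-transformers -q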

## Import

In [None]:
from torch import cuda, bfloat16
import torch
import transformers
from transformers import AutoTokenizer
from langchain_community.llms import HuggingFacePipeline

In [None]:
# Set up the quantization config.

bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)
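
As a rough illustration of why 4-bit loading matters on a single GPU, the arithmetic below estimates the memory taken by the weights alone at different precisions. It assumes about 13 billion parameters, the size stated on the Aya 101 model card, and ignores activations, quantization metadata, and any layers left unquantized, so actual usage will be higher.

```python
# Back-of-the-envelope weight memory at different precisions.
# ASSUMPTION: ~13 billion parameters, the size given on the Aya 101 model card.
PARAMS = 13e9

def weight_gib(num_params, bits_per_param):
    """Memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("4-bit", 4)]:
    print(f"{label:>10}: {weight_gib(PARAMS, bits):6.1f} GiB")
```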

In [None]:
# Load the model and the tokenizer.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "CohereForAI/aya-101"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
aya_model = AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    quantization_config=bnb_config,
)

### Pipeline

In [None]:
# Create a query pipeline.

query_pipeline = transformers.pipeline(
    "text2text-generation",
    model=aya_model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
    max_length=512,
    early_stopping=True,
    num_return_sequences=1,
    no_repeat_ngram_size=2,
)

# Wrap the pipeline so LangChain can use it as the `llm` in the retrieval chain.
llm = HuggingFacePipeline(pipeline=query_pipeline)


### Setting Up a RAG Pipeline with LangChain and Gradio


In [None]:
# Import the necessary modules.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS


In [None]:
# Define a text splitter to break down the uploaded documents into smaller chunks.

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)
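
To show what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks); the `sliding_chunks` helper is hypothetical and only illustrates how consecutive chunks share a few characters.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks

chunks = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary is still partly visible in both neighbouring chunks, which can help retrieval when the relevant text sits near a cut.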

In [None]:
# Load an embedding model to vectorize the text in the document.
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

### Define Function

In [None]:
# Define a function to create a question-answering chain from the uploaded documents.
import gradio as gr

def create_retrieval_chain(files):
    global qa
    docs = []

    for file_path in files:
        if file_path.lower().endswith('.pdf'):  # Check if the file is a PDF
            loader_temp = PyPDFLoader(file_path)
            docs_temp = loader_temp.load_and_split(text_splitter=text_splitter)
            docs += docs_temp
        else:
            return "Please upload PDF files only."

    # Flatten line breaks so each chunk is embedded as continuous text.
    for doc in docs:
        doc.page_content = doc.page_content.replace('\n', ' ')

    # Embed the chunks and index them in FAISS.
    vectordb = FAISS.from_documents(documents=docs, embedding=embeddings)
    retriever = vectordb.as_retriever()

    # The "stuff" chain type places all retrieved chunks into a single prompt.
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        verbose=True,
    )

    return "Processed PDF files. They can now be queried."

In [None]:
# Define another function to answer the queries based on context retrieved from the documents.
def process_query(query):
    # RetrievalQA returns a dict; the generated answer is stored under "result".
    response = qa.invoke(query)
    return response["result"]
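
A query handler can also guard against being called before any documents have been indexed. The sketch below shows one way to do that. `StubChain` and `answer` are hypothetical names used only for illustration; the stub mimics the dictionary shape that `RetrievalQA` returns so the example runs without a model.

```python
class StubChain:
    """Hypothetical stand-in that mimics the dict shape returned by RetrievalQA."""

    def invoke(self, query):
        return {"query": query, "result": f"stub answer to: {query}"}

def answer(query, chain):
    """Return the answer text, or a hint when the chain or the query is missing."""
    if chain is None:
        return "Please upload and process PDF files first."
    if not query or not query.strip():
        return "Please enter a question."
    return chain.invoke(query)["result"]

print(answer("What is FAISS?", None))         # no chain built yet
print(answer("What is FAISS?", StubChain()))  # chain available
```

Passing the chain as an argument rather than reading a global makes the handler easy to test in isolation; in the notebook the same checks could be applied to the global `qa` object.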

### Launch a Gradio interface.

In [None]:
# Define the Gradio interface
iface_save_pdf = gr.Interface(fn=create_retrieval_chain,
                     inputs=gr.Files(label="Upload Files", type='filepath'),
                     outputs="text",
                     title="PDF Uploader",
                     description="Upload one or more PDF files to build the searchable index.")

iface_process_query = gr.Interface(fn=process_query,
                                   inputs=gr.Textbox(label="Enter your query"),
                                   outputs="text",
                                   title="Query Processor",
                                   description="Enter queries to get responses.")

iface_combined = gr.TabbedInterface([iface_save_pdf, iface_process_query], ["PDF Upload", "Query Processor"])

In [None]:
# Launch the combined interface
if __name__ == "__main__":
    iface_combined.launch(server_name='0.0.0.0', server_port=7865, share=True)