# Retrieval Augmented LLM App
<img src="assets/rap_banner.jpeg">

We have covered quite some ground in terms of understanding and building components for:
- Text Representation
- NLP Tasks
- Pretrained Models and Transfer Learning
- Model Fine-Tuning PEFT
- SFT and LLM Landscape
- Vector Databases
- Libraries and Frameworks

Now we will work through the development of an app to showcase how all of these concepts come together in a fully functioning system.

__Note__: To keep things simple, we will mostly rely on the high-level APIs available, but the overall setup should be easily extensible.

## Why Retrieval Augmentation

While LLMs can, in theory, support very long context windows, relying on them in real-world settings is a challenge because of:
- Limited ability to ensure that the LLM focusses on the correct sub-sections of the context
- High memory requirements
- High API cost
- High latency, etc.


In order to overcome such challenges, we leverage vector databases to act as intelligent retrieval systems (again powered by LLMs) to:
- Provide focussed context
- Reduce memory, cost and latency requirements
- Unlock super-abilities to use up-to-date information
- Offload trivial tasks to expert systems
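
To make the "focussed context" idea concrete, here is a minimal sketch. It assumes a toy word-overlap score as a stand-in for real embedding similarity, and the movie strings are made up for illustration. The point is only that sending the top-scoring chunks yields a much smaller prompt than sending everything.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(query, chunk):
    """Toy relevance score (assumption): number of shared words."""
    return len(tokenize(query) & tokenize(chunk))

def select_context(query, chunks, k=2):
    """Return the k chunks that score highest against the query."""
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)
    return ranked[:k]

# Illustrative chunks, not rows from the actual dataset
chunks = [
    "title: Kahaani genre: thriller language: hindi",
    "title: Up genre: animation language: english",
    "title: Parasite genre: thriller language: korean",
    "title: Amelie genre: romance language: french",
]
query = "recommend a hindi thriller"

focused = select_context(query, chunks, k=2)
full_prompt_chars = sum(len(c) for c in chunks)
focused_prompt_chars = sum(len(c) for c in focused)
print(focused)
print(focused_prompt_chars, "characters instead of", full_prompt_chars)
```

In the actual app, the scoring is done by an embedding model and a vector database rather than by word overlap.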

## LangChain 🦜🔗
- [LangChain](https://python.langchain.com/docs/get_started/introduction.html) is a framework for developing LLM powered applications.
- It provides capabilities to connect LLMs to a number of different sources of data
- It provides interfaces for language models to interact with their external environment (aka _Agentic_ behaviour)
- It provides the levels of abstraction required to design end-to-end applications

In [None]:
from langchain.llms import GPT4All
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.embeddings import GPT4AllEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
import pandas as pd
import random

## Movies Database

The data for this app comes from the following Kaggle dataset: https://www.kaggle.com/datasets/disham993/9000-movies-dataset

In [None]:
DATA_DIR = "data/"

In [None]:
df=pd.read_csv(DATA_DIR + "mymoviedb.csv",lineterminator='\n')
df

In [None]:
df.groupby("Original_Language").size()

In [None]:
languages={
    "hi":"hindi",
    "en":"english",
    "te":"telugu",
    "cn":"chinese",
    "da":"danish",
    "de":"german",
    "es":"spanish",
    "fr":"french",
    "id":"indonesian",
    "it":"italian",
    "ja":"japanese",
    "ko":"korean",
    "nl":"dutch",
    "no":"norwegian",
    "pl":"polish",
    "pt":"portuguese",
    "ru":"russian",
    "sv":"swedish",
    "ta":"tamil",
    "th":"thai",
    "tr":"turkish",
    "zh":"chinese",
    "ml":"malayalam"
    }

In [None]:
# Keep only the languages mapped above and replace codes with readable names
samples = []
for code, name in languages.items():
    # .copy() avoids pandas' SettingWithCopyWarning when we overwrite the column
    temp = df[df["Original_Language"] == code].copy()
    if code == "en":
        # Downsample English titles to a random 10%
        temp = temp.sample(frac=0.1)
    temp["Original_Language"] = name
    samples.append(temp)
df = pd.concat(samples, ignore_index=True)

# Shuffle the rows
df = df.sample(frac=1)

# Rescale popularity to 0-100, relative to the most popular movie in the sample
max_popularity = df["Popularity"].max()
df["Popularity"] = round(df["Popularity"] * 100.0 / max_popularity).astype(int)

# Rescale the average vote from a 0-10 scale to 0-100
df["Vote_Average"] = round(df["Vote_Average"] * 10.0).astype(int)
df.shape
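
The rescaling in the cell above can be checked by hand on a few numbers. The sketch below reproduces the same arithmetic in plain Python; the input values are made up for illustration, and the helper names are hypothetical.

```python
def rescale_to_percent(values):
    """Scale values so that the maximum maps to 100, then round to integers."""
    peak = max(values)
    return [int(round(v * 100.0 / peak)) for v in values]

def rating_to_percent(ratings):
    """Convert ratings on a 0-10 scale to integers on a 0-100 scale."""
    return [int(round(r * 10.0)) for r in ratings]

# Illustrative values, not taken from the dataset
popularity = [200.0, 50.0, 14.0]
ratings = [8.3, 6.0, 7.1]

print(rescale_to_percent(popularity))  # [100, 25, 7]
print(rating_to_percent(ratings))      # [83, 60, 71]
```

Because popularity is scaled relative to the maximum of the sampled data, the resulting numbers change whenever the sample changes.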

In [None]:
# Flatten every row into one text document that can be embedded
movies=[]
for i, row in df.iterrows():
    movie = "title: " + row["Title"]
    movie += "\n" + "overview: " + row["Overview"]
    movie += "\n" + "genre: " + row["Genre"]
    movie += "\n" + "language: " + row["Original_Language"]
    movie += "\n" + "release date: " + row["Release_Date"]
    movie += "\n" + "popularity: " + str(row["Popularity"])
    movie += "\n" + "average rating: " + str(row["Vote_Average"])
    movies.append(movie)

In [None]:
movies[0]
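
The same document layout can be produced by a small helper that also tolerates missing values. This is a sketch only: `movie_to_text`, the `FIELDS` table, and the `"unknown"` placeholder are hypothetical names introduced here, and whether the dataset actually contains missing values is an assumption to verify.

```python
# (label in the text document, column name in the DataFrame)
FIELDS = [
    ("title", "Title"),
    ("overview", "Overview"),
    ("genre", "Genre"),
    ("language", "Original_Language"),
    ("release date", "Release_Date"),
    ("popularity", "Popularity"),
    ("average rating", "Vote_Average"),
]

def movie_to_text(record):
    """Render one movie record (a dict-like row) as a 'label: value' text block."""
    lines = []
    for label, column in FIELDS:
        value = record.get(column)
        # NaN is the only value that is not equal to itself
        missing = value is None or value != value
        lines.append(f"{label}: {'unknown' if missing else value}")
    return "\n".join(lines)

# Made-up record for illustration
record = {
    "Title": "Example Movie",
    "Overview": float("nan"),
    "Genre": "Drama",
    "Original_Language": "english",
    "Release_Date": "2020-01-01",
    "Popularity": 42,
    "Vote_Average": 70,
}
print(movie_to_text(record))
```

With a DataFrame, something like `[movie_to_text(row) for _, row in df.iterrows()]` should give the same list as the loop above, since a pandas row also supports `.get`.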

## Vector Databases

<img src="assets/vector_banner.jpg" height="25%">

We started this workshop with **text representation** as one of the key components of any NLP system.
As we progressed from simple Bag of Words setup to highly contextualised Transformer models, we now have rich & dense representations.
The utility of such representations has also grown manifold, from word/sentence representations to features that can be used for a number of downstream tasks.

These representations, also called vectors or embedding vectors, are long series of numbers. Storing and retrieving them efficiently requires specialised database management systems called **Vector Databases**.

Vector Databases are particularly suited for handling data in the form of vectors, embeddings, or feature representations, which are commonly used in various applications like machine learning, natural language processing, computer vision, and recommendation systems.

Key Features:
- High-dimensional Data Support
- Similarity Search
- Indexing Techniques
- Dimensionality Reduction
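
Similarity search is the feature this app depends on most. Below is a minimal brute-force sketch using cosine similarity over tiny hand-written vectors (the vectors and document names are made up). Production vector databases typically rely on approximate nearest-neighbour indexes rather than comparing against every vector, so treat this as an illustration of the idea only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in vectors.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions
vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
nearest = top_k([1.0, 0.05, 0.0], vectors, k=2)
print(nearest)  # ['doc_a', 'doc_b']
```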

There are a number of different off-the-shelf options available, such as:
- [ChromaDB](https://www.trychroma.com/)
- [PineCone](https://www.pinecone.io/)
- [Milvus](https://milvus.io/)
- [Weaviate](https://weaviate.io/)
- [AeroSpike](https://aerospike.com/)
- [OpenSearch](https://opensearch.org/)


## Vector Database: ChromaDB

As mentioned above, there are a number of offerings available. For this workshop we will make use of
[ChromaDB](https://www.trychroma.com/).

It is a super simple setup which is easy to use. The following figure showcases the overall flow.

<img src="assets/chroma_workflow.png">

> Source: [chromadb](https://docs.trychroma.com/)

In [None]:
embeddings = GPT4AllEmbeddings()
db = Chroma.from_texts(movies, embeddings)
retriever = db.as_retriever()

In [None]:
callbacks = [StreamingStdOutCallbackHandler()]
model_path="/Users/amarlalwani/.cache/gpt4all/llama-2-7b-chat.ggmlv3.q4_0.bin"
llm = GPT4All(model=model_path, callbacks=callbacks, verbose=True)

In [None]:
template = """[INST] <<SYS>>
You are a helpful, intelligent and honest assistant. \
If you don't know the answer to the questions asked, \
just say that you don't know, don't try to make up an answer. \
Use three sentences maximum and keep the answer as concise as possible.
<</SYS>>

Use only the context provided between <begin-context> and <end-context> tags to answer the question mentioned \
between <begin-question> and <end-question> tags. Answer only if the context is useful and related to the question.
<begin-context> {context} <end-context>
<begin-question> {question} <end-question>
Note that popularity of a movie ranges from 0 to 100 with 0 being the lowest and 100 being the highest.
Note that rating of a movie ranges from 0 to 100, where 0 is the lowest and 100 is the highest.
Helpful Answer: [/INST]"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
qa = RetrievalQA.from_chain_type(llm, chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
                                 retriever=retriever, verbose=False, return_source_documents=True)
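
`RetrievalQA.from_chain_type` uses the "stuff" chain type unless told otherwise: the retrieved documents are concatenated and placed into the `{context}` slot of the prompt. The sketch below mimics that step in plain Python. `build_stuff_prompt` is a hypothetical helper, the blank-line separator is an assumption about the default, and the template is a shortened version of the one above.

```python
def build_stuff_prompt(template, documents, question, separator="\n\n"):
    """Join the retrieved documents and fill the prompt template."""
    context = separator.join(documents)
    return template.format(context=context, question=question)

# Shortened template for illustration
toy_template = (
    "<begin-context> {context} <end-context>\n"
    "<begin-question> {question} <end-question>"
)

prompt = build_stuff_prompt(
    toy_template,
    ["title: A", "title: B"],  # stand-ins for retrieved movie documents
    "Which title comes first?",
)
print(prompt)
```

Since every retrieved document is pasted into the prompt, the number of documents returned by the retriever directly drives prompt length, and with it latency and memory use.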

In [None]:
question = """which is the highly rated thriller of year 2016?"""
result = qa({"query": question})

In [None]:
result
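
With `return_source_documents=True`, the returned dictionary carries the answer under `"result"` and the retrieved documents under `"source_documents"`. A small helper makes it easier to see which movies the answer was grounded on. `summarize_result` is a hypothetical name, and the fake result below only imitates the shape of the real one.

```python
from types import SimpleNamespace

def summarize_result(result, max_chars=80):
    """Return the answer plus the first line of each retrieved source document."""
    lines = ["answer: " + result["result"].strip()]
    for i, doc in enumerate(result["source_documents"], start=1):
        # Each movie document starts with its "title: ..." line
        first_line = (doc.page_content.splitlines() or [""])[0]
        lines.append(f"source {i}: {first_line[:max_chars]}")
    return "\n".join(lines)

# Stand-in for the output of qa({"query": ...}); contents are made up
fake_result = {
    "query": "Recommend a thriller",
    "result": " Example Movie is a thriller. ",
    "source_documents": [
        SimpleNamespace(page_content="title: Example Movie\noverview: A made-up plot."),
    ],
}
print(summarize_result(fake_result))
```

In the notebook, calling `print(summarize_result(result))` after any of the questions below should show the answer next to the titles that were retrieved for it.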

In [None]:
question = """which is the best hindi movie?"""
result = qa({"query": question})

In [None]:
question = """which is the most popular hindi movie?"""
result = qa({"query": question})

In [None]:
question = """Which is the most popular hindi movie? It might not be highly rated."""
result = qa({"query": question})

In [None]:
question = """Out of the two movies: Loop Lapeta and Sooryavansham, which is more popular?"""
result = qa({"query": question})

In [None]:
question = """In which month was the movie The Kashmir Files released?"""
result = qa({"query": question})

In [None]:
question = """Do you know of any movie about Kashmir?"""
result = qa({"query": question})

In [None]:
question = """Recommend me a good RomCom"""
result = qa({"query": question})

In [None]:
question = """What is the movie Kahaani about?"""
result = qa({"query": question})

In [None]:
question = """What is the hindi movie Kahaani about?"""
result = qa({"query": question})

In [None]:
question = """Recommend me a movie about two brothers where one is successful and the other is not."""
result = qa({"query": question})

In [None]:
question = """Recommend me a movie about a scientist who discovers the cure for a deadly disease."""
result = qa({"query": question})

In [None]:
question = """Recommend me a movie about education, \
where it teaches the viewers to chase excellence and not focus on rote learning.
If such a movie exists, also mention the language and release year of the movie."""
result = qa({"query": question})

In [None]:
question = """Recommend me a crime drama which is not very popular but is critically appreciated."""
result = qa({"query": question})

In [None]:
question = """Suggest me a movie which is not very popular but is critically appreciated."""
result = qa({"query": question})

In [None]:
question = """Suggest me a movie which is based on Shakespeare's work."""
result = qa({"query": question})

In [None]:
question = """Suggest me a movie which is the story of a female's triumph in a sexist and patriarchal society."""
result = qa({"query": question})

In [None]:
question = """Suggest me a movie with some nice music and songs."""
result = qa({"query": question})

In [None]:
question = """What is the story of Amitabh Bachchan's Sooryavansham?"""
result = qa({"query": question})

In [None]:
question = """What is the story of Karan Arjun movie?"""
result = qa({"query": question})

In [None]:
question = """Give me a name of an english movie whose popularity is at least 90"""
result = qa({"query": question})

In [None]:
question = """Give me a name of an english movie whose popularity is above 80 but ratings are below 30"""
result = qa({"query": question})

In [None]:
question = """Recommend me a good movie about Indian immigrants in US"""
result = qa({"query": question})

In [None]:
question = """Recommend me a good movie about Asian immigrants in US"""
result = qa({"query": question})

In [None]:
df[df["Original_Language"]=="hindi"].sort_values("Vote_Average")

In [None]:
df[(df["Release_Date"].str.contains("2016"))&(df["Genre"].str.contains("Thriller"))].sort_values("Vote_Average")

In [None]:
df[df["Title"].str.contains("Most Violent")]

In [None]:
df[(df["Original_Language"]=="chinese")&(df["Popularity"]>=80)&(df["Vote_Average"]<=30)]
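
The pandas checks above serve as ground truth for the LLM's answers. Questions with numeric constraints (for example, "popularity above 80 but ratings below 30") are a natural fit for exact filtering, since a similarity search only returns a handful of nearby documents and does not compare numbers. The sketch below shows such a filter in plain Python; `find_movies` is a hypothetical helper and the records are made up.

```python
def find_movies(records, language=None, min_popularity=None, max_rating=None):
    """Return titles of records that satisfy every constraint that is given."""
    titles = []
    for r in records:
        if language is not None and r["Original_Language"] != language:
            continue
        if min_popularity is not None and r["Popularity"] < min_popularity:
            continue
        if max_rating is not None and r["Vote_Average"] > max_rating:
            continue
        titles.append(r["Title"])
    return titles

# Made-up records using the same column names as the DataFrame
records = [
    {"Title": "Movie A", "Original_Language": "english", "Popularity": 95, "Vote_Average": 25},
    {"Title": "Movie B", "Original_Language": "english", "Popularity": 85, "Vote_Average": 70},
    {"Title": "Movie C", "Original_Language": "hindi", "Popularity": 90, "Vote_Average": 20},
]
matches = find_movies(records, language="english", min_popularity=80, max_rating=30)
print(matches)  # ['Movie A']
```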

## Beyond LangChain

### [LlamaIndex](https://www.llamaindex.ai/)
Similar to LangChain, LlamaIndex provides utilities to extend the power of LLMs through various integrations for:
- Data ingestion
- Data indexing
- Querying

### [LangSmith](https://docs.smith.langchain.com/)
LangSmith helps build production-grade applications by providing tools and utilities for:
- Debugging
- Testing
- Integrations
- Token usage

### [HuggingFace](https://huggingface.co/models?other=LLM)
The de facto standard for not just LLMs but large models across NLP, computer vision and more.
Libraries such as ``transformers``, ``diffusers``, ``accelerate`` and more make it easier to work
with deep learning models in PyTorch/TensorFlow. HuggingFace now also provides ``model-cards`` and ``model-spaces``
for hosting and executing models on the cloud for free.

### [LLM-Foundry](https://github.com/mosaicml/llm-foundry)
MosaicML released their own GPT-style models that use [Flash Attention](https://arxiv.org/pdf/2205.14135.pdf) and [FasterTransformer](https://github.com/NVIDIA/FasterTransformer) for efficient, faster
training, along with ALiBi for extended context lengths (65k+ tokens). LLM-Foundry is a package built to support their implementations
for training and fine-tuning LLMs.


