### RAG with LangChain and Llama

---
### 1. Introduction

#### Limitations of LLMs

- know nothing outside their training data, e.g. up-to-date information or classified/private data
- not specialized for specific use cases
- tend to hallucinate confidently, possibly leading to misinformation
- produce black-box output: they do not clarify what led to the generation of particular content

#### Fine-Tuning
- Enhances model performance for a specific use case through transfer learning on additional data.
- Updates model parameters, which can improve speed and reduce cost for specific tasks.
- Powerful tool for:
  - Incorporating static or historical data.
  - Specific industries with nuances in writing style.
- The knowledge cut-off issue persists: a fine-tuned model still lacks information newer than its training data.

#### Retrieval Augmented Generation
- Increases model capabilities by:
  - **Retrieving** external and up-to-date information.
  - **Augmenting** the original prompt given to the model with that information.
  - **Generating** a response using the retrieved context plus the original prompt.
- Base LLM parameters remain unchanged (no transfer learning).
- Powerful tool for making use of dynamic, up-to-date information.
- White-box output: the retrieved sources behind an answer can be shown, providing transparency and reducing (though not eliminating) hallucination.
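
The three RAG steps above can be sketched in plain Python. This is a toy illustration, not LangChain's API and not the retrieval method used later in this notebook: `retrieve` uses a hypothetical word-overlap score as a stand-in for embedding similarity search, and `generate` is any callable standing in for the LLM call.

```python
from typing import Callable, List


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Retrieve: return up to k documents sharing the most words with the query (toy scorer)."""
    query_words = set(query.lower().split())
    # Score each document by how many query words it contains
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    # Highest score first; Python's sort is stable, so ties keep their input order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents with no overlap at all
    return [doc for score, doc in scored[:k] if score > 0]


def augment(query: str, context: List[str]) -> str:
    """Augment: prepend the retrieved context to the original question."""
    context_block = "\n".join(f"- {chunk}" for chunk in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


def rag_answer(query: str, documents: List[str], generate: Callable[[str], str]) -> str:
    """Generate: pass the augmented prompt to `generate`, a stand-in for the LLM."""
    context = retrieve(query, documents)
    prompt = augment(query, context)
    return generate(prompt)


if __name__ == "__main__":
    docs = [
        "FAISS stores embedding vectors for similarity search",
        "Groq serves LLM inference in the cloud",
        "LangChain composes prompts, models and retrievers",
    ]
    # Hypothetical "LLM" that echoes its prompt, so the augmented prompt can be inspected
    print(rag_answer("Where are embedding vectors stored?", docs, generate=lambda p: p))
```

Keeping the three steps as separate functions mirrors the framework diagram: the retriever can later be swapped for a vector-store search and `generate` for a real model without touching `augment`.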

#### RAG Framework
<img src="../documents/rag.png" width="950"/>


#### Technology Stack


##### [LangChain](https://python.langchain.com/docs/introduction/)

> Framework for developing applications powered by LLMs

> [What is LangChain? By IBM Technology](https://www.youtube.com/watch?v=1bUy-1hGZpI)

##### [FAISS (Facebook AI Similarity Search)](https://ai.meta.com/tools/faiss/)

> Library for efficient similarity search over dense vectors; here it stores the contextual embedding vectors in a vector index and retrieves the ones most similar to a query

> [Facebook AI Similarity Search FAISS | OpenAI's Embeddings Endpoint | Gen AI OpenAI API in Python](https://www.youtube.com/watch?v=Gx3TzYFaCS8)
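
FAISS's simplest index, `IndexFlatL2`, performs exact (brute-force) nearest-neighbour search by squared L2 distance. To show what that search computes without installing FAISS, the sketch below reimplements the same brute-force idea in plain Python. The class `FlatL2Index` is hypothetical: its method names echo FAISS's `add`/`search`, but the signatures are simplified (real FAISS takes NumPy arrays and returns separate distance and id arrays). The 3-dimensional vectors are made up; real embeddings have hundreds of dimensions.

```python
from typing import List, Tuple

Vector = List[float]


class FlatL2Index:
    """Toy exact-search index illustrating the idea behind IndexFlatL2 (not the FAISS API)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[Vector] = []

    def add(self, vectors: List[Vector]) -> None:
        """Store vectors; each one's id is its insertion position."""
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError(f"expected dimension {self.dim}, got {len(v)}")
            self.vectors.append(list(v))

    def search(self, query: Vector, k: int) -> List[Tuple[float, int]]:
        """Return up to k (squared L2 distance, id) pairs, closest first."""
        # Compare the query against every stored vector: exact, but O(n * dim)
        distances = [
            (sum((q - x) ** 2 for q, x in zip(query, v)), i)
            for i, v in enumerate(self.vectors)
        ]
        # Sort by distance; equal distances fall back to the lower id
        distances.sort()
        return distances[:k]


if __name__ == "__main__":
    index = FlatL2Index(dim=3)
    index.add([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
    # Ids 1 and 2 are the two stored vectors nearest to the query
    print(index.search([1.0, 0.0, 0.0], k=2))
```

Brute force is exact but scans every vector per query; FAISS also offers approximate index types that trade some accuracy for speed on large collections.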

##### [Groq](https://groq.com/about-us/)

> Engine providing fast AI inference (running a trained model on new inputs) in the cloud

### 2. Warm-up

#### Load Credentials

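```python
from dotenv import load_dotenv

# Read KEY=value pairs from a local .env file into os.environ
load_dotenv()  # output: True (at least one variable was loaded)
```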
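
`ChatGroq` reads its API key from the `GROQ_API_KEY` environment variable, so the `.env` file is assumed to contain a line such as `GROQ_API_KEY=<your key>`. A small fail-fast check surfaces a missing key before the first model call rather than midway through the pipeline. The helper name `require_env` is hypothetical, not part of `python-dotenv` or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable `name`, raising if it is missing or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add a line like '{name}=<your key>' to your .env file"
        )
    return value


if __name__ == "__main__":
    # Assumes load_dotenv() has already run, as in the cell above
    try:
        require_env("GROQ_API_KEY")
        print("GROQ_API_KEY found")
    except RuntimeError as err:
        print(err)
```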

#### Define LLM

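```python
import warnings

# Filter before importing so import-time warnings are silenced as well
warnings.filterwarnings("ignore")
from langchain_groq import ChatGroq

llm = ChatGroq(
    # Llama 3 8B with an 8,192-token context window; Groq retires model IDs over time,
    # so substitute a currently listed Llama model if this one is unavailable
    model="llama3-8b-8192",
    temperature=0,    # minimise sampling randomness for grounded answers
    max_tokens=None,  # no explicit cap on generated tokens
    timeout=None,     # no client-side request timeout
    max_retries=2,    # retry a failed request up to twice
)
```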