# RAG (Retrieval-Augmented Generation) System Using the Llama 3.1 405B Model
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1o0X6SXgwjkZ12aGPra7_KYMfPKfZhHCD?usp=sharing#scrollTo=ruXVpm7Op6bL)

#### Enhancing language model outputs with relevant external knowledge
#### Combining the power of large language models with dynamic information retrieval

## Install required dependencies

### Note: In a Jupyter notebook, you would use !pip install or %pip install


In [None]:
!pip install langchain-openai faiss-cpu langchain openai requests numpy tiktoken langchain-community


## Import necessary libraries


In [None]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# Loaders and vector stores live in langchain_community in recent LangChain releases
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import RetrievalQA
# Colab secrets store; only available inside Google Colab
from google.colab import userdata




## Initialize the language model (LLM)
### We're using the Llama 3.1 405B Instruct model hosted on Fireworks.ai, accessed through its OpenAI-compatible API

In [None]:
llm = ChatOpenAI(
    model="accounts/fireworks/models/llama-v3p1-405b-instruct",
    openai_api_key=userdata.get("FIREWORKS_API_KEY"),
    openai_api_base="https://api.fireworks.ai/inference/v1"
)
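
`userdata.get` only works inside Google Colab. As a minimal sketch for running the same notebook elsewhere, the hypothetical helper below (`get_secret` is not part of any library) tries the Colab secrets store first and falls back to an environment variable of the same name.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab's userdata store, or from the environment."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            value = userdata.get(name)
            if value:
                return value
        except Exception:
            # Secret missing or notebook access not granted; fall through.
            pass

    value = os.environ.get(name)
    if not value:
        raise KeyError(f"Secret {name!r} not found in Colab userdata or environment")
    return value
```

With this helper, `openai_api_key=get_secret("FIREWORKS_API_KEY")` could replace the direct `userdata.get(...)` call.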

## Initialize the embedding model
### Note: We're using OpenAI's embeddings here, which requires a separate OpenAI API key. You might want to use a different embedding model.


In [None]:
embeddings = OpenAIEmbeddings(openai_api_key=userdata.get("OPENAI_API_KEY"))


## Prepare your document collection using WebBaseLoader
### We're using WebBaseLoader to load content from specified URLs


In [None]:
urls = [
    "https://www.theverge.com/2024/7/23/24204055/meta-ai-llama-3-1-open-source-assistant-openai-chatgpt",
    # Add more URLs as needed
]
loader = WebBaseLoader(urls)
documents = loader.load()

## Split the documents into chunks
### This makes it easier to process and retrieve relevant information

In [None]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
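
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch. It cuts fixed-width character windows, whereas `CharacterTextSplitter` splits on a separator and then merges pieces up to the size limit, so treat `chunk_text` as a hypothetical illustration of the two parameters rather than the library's algorithm.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into fixed-width character windows with optional overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


no_overlap = chunk_text("abcdefghij", chunk_size=4)
with_overlap = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
```

With `chunk_overlap=0`, as in the cell above, a sentence that straddles a chunk boundary is split between two chunks. A small overlap repeats the boundary text in both chunks at the cost of storing more vectors.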

## Create embeddings for the documents and store them in a FAISS vector database


In [None]:
db = FAISS.from_documents(texts, embeddings)
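
Conceptually, a flat vector index answers a query by comparing the query embedding against every stored embedding and returning the closest ones. The sketch below shows that idea in plain Python with toy 2-D vectors, assuming Euclidean (L2) distance as the metric. FAISS does this far more efficiently, and `l2_distance` and `top_k` are hypothetical names, not FAISS functions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, vectors, k=3):
    """Return (index, distance) pairs for the k vectors nearest to query_vec."""
    scored = [(i, l2_distance(query_vec, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 2-D "embeddings"; real embeddings have hundreds or thousands of dimensions.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
nearest = top_k([0.9, 0.1], vectors, k=2)
nearest_indices = [i for i, _ in nearest]
```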

## Combine retrieval and generation
#### This sets up the RAG system, combining the FAISS database for retrieval
#### with the Llama 3.1 405B model for generation

In [None]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True,
)
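
The `"stuff"` chain type places all retrieved chunks (here the top 3, per `search_kwargs={"k": 3}`) into a single prompt alongside the question. A rough sketch of that assembly step follows. The template wording and the name `build_stuff_prompt` are illustrative assumptions, not LangChain's actual prompt.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


prompt = build_stuff_prompt(
    "Which models does Llama 3.1 outperform?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
```

Because every chunk goes into one prompt, the total size of the `k` chunks plus the question has to fit within the model's context window.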

## Use the RAG system with example queries
#### You can modify these queries or add more as needed

### Example Query 1

In [None]:
query = "Meta's new Llama 3.1 model outperforms which models?"
result = qa_chain.invoke({"query": query})
print(result['result'])

According to the text, Meta's Llama 3.1 model outperforms OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet on certain benchmarks.
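
Because the chain was built with `return_source_documents=True`, the result also carries the retrieved chunks under the `source_documents` key, which is useful for checking what the answer was grounded in. The sketch below uses a hypothetical helper, `summarize_sources`, and a stand-in result object so it runs without API keys. It assumes each document exposes `page_content` and a `metadata` dict with a `source` entry, as documents loaded by `WebBaseLoader` do.

```python
from types import SimpleNamespace


def summarize_sources(result: dict, preview_chars: int = 80) -> list[str]:
    """Return one 'source: preview' line per retrieved document."""
    lines = []
    for doc in result.get("source_documents", []):
        source = doc.metadata.get("source", "unknown")
        # Collapse whitespace so the preview fits on one line.
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"{source}: {preview}")
    return lines


# Stand-in for a real chain result; the URL and text are made up.
fake_result = {
    "result": "placeholder answer",
    "source_documents": [
        SimpleNamespace(
            page_content="Meta released  Llama 3.1\ntoday.",
            metadata={"source": "https://example.com/a"},
        ),
    ],
}
source_lines = summarize_sources(fake_result)
for line in source_lines:
    print(line)
```

In the notebook, calling `summarize_sources(result)` on the output of `qa_chain.invoke(...)` would list the chunks behind the answer.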


### Example Query 2

In [None]:
query = "give me top features of Llama 3.1 model"
result = qa_chain.invoke({"query": query})
print(result['result'])

Based on the text, here are some of the top features of the Llama 3.1 model:

1. Complex problem-solving: Llama 3.1 is capable of integrating with a search engine API to "retrieve information from the internet based on a complex query and call multiple tools in succession in order to complete your tasks".

2. Advanced natural language processing: It can understand and respond to complex queries and is capable of generating human-like text.

3. Image generation: Meta AI, powered by Llama 3.1, includes a new feature called "Imagine Me" that can scan a face through a phone's camera and then let users insert their likeness into images it generates.

4. Multilingual support: Llama 3.1 will be updated to support new languages, including French, German, Hindi, Italian, and Spanish.

5. Availability across multiple platforms: Llama 3.1 will be first accessible through WhatsApp and the Meta AI website in the US, followed by Instagram and Facebook in the coming weeks, and also on the Quest heads