# RAG with Meta Llama 3.2

In [1]:
# get a token: https://huggingface.co/docs/api-inference/quicktour#get-your-api-token

from getpass import getpass

inference_api_key = getpass()



In [2]:
import os

# Expose the token under the variable name LangChain's Hugging Face integrations read
os.environ["HUGGINGFACEHUB_API_TOKEN"] = inference_api_key
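If the token is already set in the environment, there is no need to prompt for it every time the notebook runs. The helper below is a hypothetical convenience, not part of any library. It assumes the variable names `HUGGINGFACEHUB_API_TOKEN` (read by LangChain's Hugging Face integrations) and `HF_TOKEN` (read by `huggingface_hub`), and it falls back to `getpass()` only when neither is set.

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def resolve_hf_token(
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[], str] = getpass,
) -> str:
    """Return a Hugging Face token, prompting only if none is configured.

    Hypothetical helper: checks HUGGINGFACEHUB_API_TOKEN first, then HF_TOKEN,
    and treats empty values as unset.
    """
    for name in ("HUGGINGFACEHUB_API_TOKEN", "HF_TOKEN"):
        token = env.get(name)
        if token:
            return token
    return prompt()
```

In the notebook this would replace the bare prompt with `inference_api_key = resolve_hf_token()`.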

In [4]:
from langchain_huggingface import HuggingFaceEndpoint
repo_id = "meta-llama/Llama-3.2-3B-Instruct"
# Alternatives: "meta-llama/Llama-3.2-1B-Instruct", "meta-llama/Llama-3.2-1B", "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=100,
    temperature=0.3,
    huggingfacehub_api_token=inference_api_key,
)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to C:\Users\siddhanna.janai\.cache\huggingface\token
Login successful


In [5]:
llm.invoke("Who is Einstein?")

" Albert Einstein (1879-1955) was a renowned German-born physicist who revolutionized our understanding of space, time, and gravity. He is widely regarded as one of the most influential scientists of the 20th century.\nEinstein's contributions to science are numerous and groundbreaking. Some of his most notable achievements include:\n1. **Theory of Relativity**: Einstein's theory of relativity, which includes both special relativity and general relativity, fundamentally changed our understanding of space and time."


In [6]:
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader

# Step 1: Load the document from a web URL
loader = WebBaseLoader(["https://huggingface.co/blog/llama31"])
documents = loader.load()

# Step 2: Split the document into chunks of at most 500 characters,
# with 50 characters of overlap between neighboring chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
all_splits = text_splitter.split_documents(documents)

USER_AGENT environment variable not set, consider setting it to identify your requests.
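To see what `chunk_size=500` and `chunk_overlap=50` mean in practice, the sketch below cuts text into fixed-size character windows where each window repeats the last 50 characters of the previous one. This is a deliberately simplified stand-in: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as blank lines, newlines, and spaces, so its chunks are usually shorter and end on cleaner boundaries. `sliding_window_chunks` is a hypothetical name used only for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character windows.

    Simplified illustration only; it ignores separators entirely.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end
            break
    return chunks
```

For a 1,200-character string this yields three chunks of 500, 500, and 300 characters, and the first 50 characters of each chunk repeat the last 50 of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk more often.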


In [7]:
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
embedding_function = HuggingFaceInferenceAPIEmbeddings(
    api_key=inference_api_key, model_name="sentence-transformers/all-MiniLM-L6-v2"
)

In [8]:
# Step 3: Embed the chunks and store them in a FAISS vector store
vectorstore = FAISS.from_documents(all_splits, embedding_function)
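`FAISS.from_documents` embeds every chunk with `embedding_function` and indexes the vectors; at query time the retriever embeds the question and returns the closest chunks. As far as we know, LangChain's FAISS wrapper defaults to an exact (brute-force) Euclidean index, and `as_retriever()` returns 4 chunks unless `search_kwargs={"k": ...}` is passed. The pure-Python sketch below shows that exhaustive nearest-neighbor search on toy 2-D vectors; `l2_distance` and `top_k` are illustrative names, and FAISS performs the same comparison far faster in native code on the real 384-dimensional embeddings.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 4) -> list[int]:
    """Indices of the k stored vectors closest to the query, nearest first.

    Exhaustive search: the query is compared with every stored vector.
    """
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]


# Toy "index": each row stands in for one chunk's embedding.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [0.9, 0.1]]
print(top_k([1.0, 0.0], vectors, k=2))  # [1, 4]
```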

In [9]:
from langchain.chains import ConversationalRetrievalChain

# Step 4: Query your own data
chain = ConversationalRetrievalChain.from_llm(llm,
                                              vectorstore.as_retriever(),
                                              return_source_documents=True)


result = chain.invoke({"question": "What’s new with Llama 3?", "chat_history": []})
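`ConversationalRetrievalChain` works in two stages. When `chat_history` is non-empty, it first asks the LLM to rewrite the follow-up into a standalone question; it then retrieves chunks for that question and "stuffs" all of them into a single answer prompt. The sketch below mimics that flow in plain Python with stub callables so the control flow is visible. The prompt templates and the `conversational_rag`/`format_history` names are hypothetical and only loosely modeled on LangChain's defaults; they are not its actual internals.

```python
from typing import Callable, Sequence

# Hypothetical templates, loosely modeled on a condense-then-answer design.
CONDENSE_TEMPLATE = (
    "Given the conversation below and a follow-up question, rewrite the "
    "follow-up as a standalone question.\n\n{history}\nFollow-up: {question}\n"
    "Standalone question:"
)
ANSWER_TEMPLATE = (
    "Use the context to answer the question. If the context does not contain "
    "the answer, say you don't know.\n\n{context}\n\nQuestion: {question}\nAnswer:"
)


def format_history(chat_history: Sequence[tuple[str, str]]) -> str:
    """Render (question, answer) pairs as a Human/Assistant transcript."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)


def conversational_rag(
    question: str,
    chat_history: Sequence[tuple[str, str]],
    llm: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
) -> dict:
    # 1. Condense: only needed when there is earlier conversation.
    if chat_history:
        condense_prompt = CONDENSE_TEMPLATE.format(
            history=format_history(chat_history), question=question
        )
        standalone = llm(condense_prompt).strip()
    else:
        standalone = question
    # 2. Retrieve chunks for the standalone question.
    docs = retrieve(standalone)
    # 3. Stuff every retrieved chunk into one answer prompt.
    prompt = ANSWER_TEMPLATE.format(context="\n\n".join(docs), question=standalone)
    return {"question": question, "answer": llm(prompt), "source_documents": docs}
```

With an empty history the LLM is called once; with history it is called twice, and retrieval runs on the rewritten question rather than the raw follow-up.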

In [10]:
result

{'question': 'What’s new with Llama 3?',
 'chat_history': [],
 'answer': ' Llama 3 is the latest iteration in the Llama Guard family, fine-tuned on Llama 3.1 8B. It is built for production use cases, with a 128k context length and multilingual capabilities. Llama 3 can classify LLM inputs (prompts) and responses to detect content that would be considered unsafe in a risk taxonomy.\n\nUnhelpful Answer: Llama 3 is the latest iteration in the Llama Guard family. It is built',
 'source_documents': [Document(metadata={'source': 'https://huggingface.co/blog/llama31', 'title': 'Llama 3.1 - 405B, 70B & 8B with multilinguality and long context', 'description': 'We’re on a journey to advance and democratize artificial intelligence through open source and open science.', 'language': 'No language found.'}, page_content='Llama Guard 3 is the latest iteration in the Llama Guard family, fine-tuned on Llama 3.1 8B. It is built for production use cases, with a 128k context length and multilingual cap

In [11]:
print(result['answer'])

 Llama 3 is the latest iteration in the Llama Guard family, fine-tuned on Llama 3.1 8B. It is built for production use cases, with a 128k context length and multilingual capabilities. Llama 3 can classify LLM inputs (prompts) and responses to detect content that would be considered unsafe in a risk taxonomy.

Unhelpful Answer: Llama 3 is the latest iteration in the Llama Guard family. It is built
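The trailing "Unhelpful Answer:" line is not part of a real answer. LangChain's default "stuff" prompt for plain (non-chat) LLMs ends with "Helpful Answer:", so the model most likely continued that pattern until it ran into `max_new_tokens=100`. A simple post-processing heuristic is to cut the completion at the first new line that starts another "...Answer:" label. `trim_run_on` is a hypothetical helper, and the pattern it matches is based only on the output seen here; it would also cut a legitimate answer that happens to contain such a line.

```python
import re

# Heuristic: a newline followed by an optional "Helpful"/"Unhelpful" label
# and "Answer:" marks where the model started a new, unwanted answer.
_RUN_ON = re.compile(r"\n+\s*(?:Unhelpful|Helpful)?\s*Answer:", re.IGNORECASE)


def trim_run_on(answer: str) -> str:
    """Drop everything from the first run-on answer label onward."""
    match = _RUN_ON.search(answer)
    if match:
        answer = answer[:match.start()]
    return answer.strip()
```

Usage: `print(trim_run_on(result['answer']))` prints only the first paragraph of the answer above.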


In [12]:
# This time, the previous question and its answer are passed as chat history,
# which lets the chain resolve follow-up questions against the earlier turn.
query = "What are the new models?"
chat_history = [(result["question"], result["answer"])]

In [14]:
result = chain.invoke({"question": query, "chat_history": chat_history})
print(result['answer'])

 The new models available are the Instruct versions, which support conversational format with 4 roles: Question, System, User, and Model.
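Each call to `chain.invoke` is stateless, so the notebook has to build `chat_history` itself. For conversations longer than two turns, a small closure can append each (question, answer) pair after every call. `make_session` and `ask` are hypothetical helper names; the sketch assumes only what the cells above show, namely that the chain accepts `question` and `chat_history` keys and returns a dict with an `answer` key.

```python
from typing import Callable


def make_session(chain_invoke: Callable[[dict], dict]):
    """Wrap a chain's invoke method so chat history accumulates automatically."""
    history: list[tuple[str, str]] = []

    def ask(question: str) -> str:
        # Pass a copy so later turns do not mutate what earlier calls saw.
        result = chain_invoke({"question": question, "chat_history": list(history)})
        history.append((question, result["answer"]))
        return result["answer"]

    return ask, history
```

Usage: `ask, history = make_session(chain.invoke)`, then `ask("What’s new with Llama 3?")` followed by `ask("What are the new models?")`; the second call automatically receives the first question and answer as history.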
