# Retrieval-Augmented Generation (RAG)
Real-Time and Accurate Information: RAG retrieves up-to-date, domain-specific knowledge from external sources at query time, which reduces hallucinations compared with an LLM that relies only on its static training data.

Efficiency: Because facts come from retrieval rather than from model parameters, a smaller model can often reach good answer quality, saving computational resources.

Dynamic Updates: The external knowledge base can be updated at any time without retraining the model.

Context-Aware Responses: RAG conditions each answer on the documents retrieved for that specific query, improving contextual relevance.

Transparency: Answers can be traced back to the retrieved sources, which improves credibility and trust.

Customization: RAG is easy to adapt to niche applications by curating a specialized knowledge base.

Cost-Effectiveness: Externalizing knowledge can lower inference cost and memory requirements, since the model itself can be smaller.

Generalization: Combining retrieval with generation helps a single model perform well across diverse domains.

Improved Accuracy: Grounding answers in external data makes them more reliable than answers from an LLM limited to its training snapshot.

Smaller Memory Footprint: Knowledge is stored in an external system instead of in model weights, so the model can be much smaller.
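
The "Dynamic Updates" point can be illustrated with a minimal, self-contained sketch. It is not how the rest of this notebook retrieves documents: the word-overlap score, the helper names (`tokenize`, `overlap_score`, `retrieve`), and the tiny knowledge base are all assumptions made for illustration, standing in for an embedding model and a vector store. The idea it shows is that adding a passage changes what is retrieved without touching any model weights.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def overlap_score(query, passage):
    """Toy relevance score: number of query tokens that occur in the passage."""
    passage_tokens = set(tokenize(passage))
    return sum(1 for token in tokenize(query) if token in passage_tokens)

def retrieve(query, knowledge_base, k=1):
    """Return the k passages with the highest overlap score."""
    ranked = sorted(knowledge_base, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]

# Hypothetical knowledge base; a real system would store embedded document chunks.
knowledge_base = [
    "The M3 chip was announced in October 2023.",
    "FAISS is a library for vector similarity search.",
]
query = "When was the M4 chip introduced?"

# Before the update, the best match is only loosely related to the question.
before_update = retrieve(query, knowledge_base)

# "Dynamic update": add a new passage. No model weights are touched.
knowledge_base.append("The M4 chip was introduced in May 2024 for the iPad Pro.")
after_update = retrieve(query, knowledge_base)

print("Before update:", before_update)
print("After update: ", after_update)
```

In a full pipeline the retrieved passage would then be placed in the LLM prompt, so the answer reflects the updated knowledge base.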




In [None]:
!pip install langchain
!pip install langchain-community
!pip install sentence-transformers
!pip install faiss-cpu
!pip install bs4
!pip install langchain-groq

Using an LLM without RAG

In [3]:
# Import necessary libraries
from groq import Groq
from getpass import getpass
import os

# Set up the Groq client. The key is read from the GROQ_API_KEY environment
# variable, or prompted for if it is not set. Never hard-code a real key in a notebook.
if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass("Groq API key: ")
client = Groq()

# Define the model and conversation context
model = "llama-3.2-1b-preview"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What’s new with apple m4?"}
]

# Send the request to Groq's chat completion endpoint
completion = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.7,  # Adjust for creativity
    max_tokens=500,   # Limit token output
    top_p=0.9,        # Use nucleus sampling
    stream=True       # Enable streaming for real-time response
)

# Process and display the response
print("Generated Response:\n")
for chunk in completion:
    content = chunk.choices[0].delta.content or ""
    print(content, end="", flush=True)
print("\n")  # Ensure a newline at the end


Generated Response:

The Apple M4 is a recent-generation mobile processor developed by Apple Inc. It was announced in September 2021 and released in November 2021 for iPhone 13 series devices.

Here are some key features and updates about the Apple M4:

1. **Improved Performance**: The M4 processor is said to deliver a significant boost in performance, thanks to its new architecture and improvements in the CPU and GPU.
2. **Enhanced Power Efficiency**: The M4 processor is also designed to be more power-efficient, which could lead to improved battery life and reduced heat generation.
3. **Support for 5G**: The M4 processor supports 5G connectivity, which enables faster data transfer rates and lower latency.
4. **New Cores and Threads**: The M4 processor features a new core architecture with multiple threads, which allows for improved multitasking and efficiency.
5. **Integrated GPU**: The M4 processor includes an integrated GPU, which provides improved graphics performance and could ena

### The response above is inaccurate: the model's training data ends in December 2023, while the Apple M4 was released in 2024, so the model has no knowledge of the chip and invents plausible-sounding details instead.

In [2]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader

# Step 1: Load the document from a web URL (WebBaseLoader parses the page with BeautifulSoup)
loader = WebBaseLoader(["https://en.wikipedia.org/wiki/Apple_M4"])
documents = loader.load()

# Step 2: Split the document into chunks with a specified chunk size
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
all_splits = text_splitter.split_documents(documents)

# Step 3: Store the document into a vector store with a specific embedding model
vectorstore = FAISS.from_documents(all_splits, HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2"))

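
To make the `chunk_size=500, chunk_overlap=50` settings in Step 2 more concrete, here is a simplified sketch of overlapping chunking. It is a plain sliding window over characters, not the actual algorithm of `RecursiveCharacterTextSplitter`, which first tries to split on separators such as paragraph breaks, line breaks, and spaces. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

# Synthetic 1200-character text, used only to show the window arithmetic.
sample_text = "".join(chr(97 + i % 26) for i in range(1200))
chunks = chunk_text(sample_text)

print([len(chunk) for chunk in chunks])
# The tail of one chunk is repeated at the head of the next one.
print(chunks[0][-50:] == chunks[1][:50])
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.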

In [None]:
import os
from getpass import getpass

GROQ_API_TOKEN = getpass()

os.environ["GROQ_API_KEY"] = GROQ_API_TOKEN

In [8]:
from langchain_groq import ChatGroq
llm = ChatGroq(temperature=0, model_name="llama-3.2-1b-preview")
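
Before the LLM is connected to the retriever, it may help to see roughly what a retrieval chain does with the chunks it gets back: it places ("stuffs") them into the prompt ahead of the question. The sketch below is an assumption-laden illustration, not LangChain's actual prompt template. The function `build_grounded_prompt`, the prompt wording, and the two sample passages are hypothetical.

```python
def build_grounded_prompt(question, passages):
    """Concatenate retrieved passages and the question into a single prompt string."""
    context = "\n\n".join(passages)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical retrieved chunks; in the notebook they would come from the vector store.
passages = [
    "Apple M4 is a series of ARM-based system on a chip (SoC) designed by Apple Inc.",
    "It was introduced in May 2024 for the iPad Pro (7th generation).",
]
prompt = build_grounded_prompt("What’s new with apple m4?", passages)
print(prompt)
```

Because the model sees the retrieved text directly, it can answer from that context even when the facts postdate its training cutoff.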

In [9]:
from langchain.chains import ConversationalRetrievalChain

# Query against your own data
chain = ConversationalRetrievalChain.from_llm(llm,
                                              vectorstore.as_retriever(),
                                              return_source_documents=True)

# First turn, so the chat history is empty; invoke() replaces the deprecated chain(...) call
result = chain.invoke({"question": "What’s new with apple m4?", "chat_history": []})
result['answer']

"According to the provided context, the Apple M4 is a series of ARM-based system on a chip (SoC) designed by Apple Inc. It was introduced in May 2024 for the iPad Pro (7th generation). The M4 chip is the fourth generation of the M series Apple silicon architecture, succeeding the Apple M3.\n\nSome of the notable features of the Apple M4 include:\n\n* Support for hardware-accelerated AV1 decoding\n* Hardware-accelerated mesh shading and ray tracing\n* Support for the iPad Pro (7th generation)'s Tandem OLED display\n* Upgraded neural engine\n* Higher core count\n\nAdditionally, the M4 chip was followed by the professional-focused M4 Pro and the M4 Max."

In [10]:
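# Pair the previous question with its answer so the chain can resolve the follow-up
chat_history = [("What’s new with apple m4?", result["answer"])]
query = "What improvements?"
result = chain.invoke({"question": query, "chat_history": chat_history})
result['answer']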

"The Apple M4 Pro and M4 Max chips are part of the Apple M4 series, which is a line of system-on-chip (SoC) processors designed for Apple's Mac lineup. Here are the key features of the M4 Pro and M4 Max chips:\n\n**M4 Pro:**\n\n1. **CPU:**\n\t* Up to 14-core CPU with 10 performance cores and 4 efficiency cores\n\t* Up to 20-core GPU with Apple's custom GPU architecture\n2. **Memory:**\n\t* Up to 64GB unified memory (Mac Mini)\n\t* Theoretical maximum bandwidth of 273GB/sec\n3. **GPU:**\n\t* Custom GPU with 20-core GPU\n\t* Apple claims it is twice as powerful as the M2's GPU\n4. **Other features:**\n\t* Supports up to 4K video recording at 60fps\n\t* Supports up to 120fps gaming\n\t* Supports Apple's MagSafe charging technology\n\t* Supports Wi-Fi 6E and Bluetooth 5.3\n\n**M4 Max:**\n\n1. **CPU:**\n\t* Up to 16-core CPU with 12 performance cores and 4 efficiency cores\n\t* Up to 24-core GPU with Apple's custom GPU architecture\n2. **Memory:**\n\t* Up to 64GB unified memory (Mac Mini)\n