#Basic Llama 3.2 RAG
Made by: Wilfredo Aaron Sosa Ramos

In [1]:
!pip install aisuite 'aisuite[all]' langchain langchain-core langchain-community langchain_chroma chroma rich

Collecting aisuite
  Downloading aisuite-0.1.6-py3-none-any.whl.metadata (5.7 kB)
Collecting langchain-community
  Downloading langchain_community-0.3.13-py3-none-any.whl.metadata (2.9 kB)
Collecting langchain_chroma
  Downloading langchain_chroma-0.1.4-py3-none-any.whl.metadata (1.6 kB)
Collecting chroma
  Downloading Chroma-0.2.0.tar.gz (5.8 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting anthropic<0.31.0,>=0.30.1 (from aisuite[all])
  Downloading anthropic-0.30.1-py3-none-any.whl.metadata (18 kB)
Collecting groq<0.10.0,>=0.9.0 (from aisuite[all])
  Downloading groq-0.9.0-py3-none-any.whl.metadata (13 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting httpx-sse<0.5.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting langchain
  Downloading langchain-0.3.13-py3-none-any.whl.metadata (7.1 kB)
Collecting la

In [11]:
!pip install -q huggingface_hub

In [2]:
import aisuite as ai

client = ai.Client()

In [3]:
import os
from google.colab import userdata

os.environ["HUGGINGFACE_TOKEN"] = userdata.get("HF_TOKEN")

In [4]:
from rich.console import Console
from rich.markdown import Markdown

console = Console()

def print_md(result):
  markdown = Markdown(result)
  console.print(markdown)

#Llama 3.2 - 3B

In [8]:
import time

llms = [
        "huggingface:meta-llama/Llama-3.2-3B-Instruct"
       ]

results_from_llms = []

def compare_llm(messages):
    execution_times = []
    responses = []
    for llm in llms:
        start_time = time.time()
        response = client.chat.completions.create(model=llm, messages=messages)
        end_time = time.time()
        execution_time = end_time - start_time
        responses.append(response.choices[0].message.content.strip())
        execution_times.append(execution_time)
        print(f"=================={llm}===================")
        print(f"{execution_time:.2f} seconds: {response.choices[0].message.content.strip()}")
        value_for_response = (messages[0]['content'], response.choices[0].message.content.strip())
        results_from_llms.append(value_for_response)

    return responses, execution_times

In [9]:
messages = [
    {"role": "user", "content": "What is a Large Language Model?"},
]
responses, execution_times = compare_llm(messages)

0.21 seconds: A Large Language Model (LLM) is a type of artificial intelligence (AI) model that is trained on a massive corpus of text data to generate human-like language. These models use deep learning techniques and are typically based on neural networks.

LLMs are characterized by several key features:

1. **Scale**: LLMs are typically trained on enormous amounts of text data, often in the billions of parameters and training data points.
2. **Complexity**: LLMs are trained using complex mathematical


#Basic RAG with LangChain:

In [12]:
!pip install -q langchain_huggingface

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/3.0 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.4/3.0 MB[0m [31m11.1 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━[0m [32m2.9/3.0 MB[0m [31m40.6 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.0/3.0 MB[0m [31m28.8 MB/s[0m eta [36m0:00:00[0m
[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
chromadb 0.5.23 requires tokenizers<=0.20.3,>=0.13.2, but you have tokenizers 0.21.0 which is incompatible.[0m[31m
[0m

In [13]:
from langchain_huggingface import HuggingFaceEndpoint
from langchain_core.prompts import PromptTemplate

question = "What is a Large Language Model?"

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)

In [14]:
repo_id = "meta-llama/Llama-3.2-3B-Instruct"

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_length=128,
    temperature=0.5,
    huggingfacehub_api_token=userdata.get("HF_TOKEN"),
)
llm_chain = prompt | llm
print(llm_chain.invoke({"question": question}))

                    max_length was transferred to model_kwargs.
                    Please make sure that max_length is what you intended.


 A Large Language Model (LLM) is a type of artificial intelligence (AI) model designed to process and understand human language. It is trained on vast amounts of text data, which enables it to learn patterns, relationships, and structures of language.

Here's a step-by-step explanation:

1. **Training Data**: LLMs are trained on massive amounts of text data, including books, articles, research papers, and conversations. This data is used to teach the model about the structure, syntax, and semantics of language.

2. **Model Architecture**: LLMs are typically based on transformer architectures, which are designed to handle sequential data like text. These models use self-attention mechanisms to weigh the importance of different words or phrases in the input text.

3. **Training Process**: During training, the model is optimized to predict the next word in a sequence based on the context of the previous words. This process is repeated millions of times, with the model adjusting its parame

In [15]:
import tempfile
import uuid
import requests
import os
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 100
)

class FileHandler:
    def __init__(self, file_loader, file_extension):
        self.file_loader = file_loader
        self.file_extension = file_extension

    def load(self, url):
        # Generate a unique filename with a UUID prefix
        unique_filename = f"{uuid.uuid4()}.{self.file_extension}"

        try:
            # Download the file from the URL and save it to a temporary file
            response = requests.get(url, timeout=10)
            response.raise_for_status()  # Raise an HTTPError for bad responses

            with tempfile.NamedTemporaryFile(delete=False, prefix=unique_filename) as temp_file:
                temp_file.write(response.content)
                temp_file_path = temp_file.name

        except requests.exceptions.RequestException as req_err:
            print(f"HTTP request error: {req_err}")
            raise req_err
        except Exception as e:
            print(f"An error occurred while downloading or saving the file: {e}")
            raise e

        # Use the file_loader to load the documents
        try:
            loader = self.file_loader(file_path=temp_file_path)
        except Exception as e:
            print(f"No such file found at {temp_file_path}")
            raise e

        try:
            documents = loader.load()
        except Exception as e:
            print(f"File content might be private or unavailable or the URL is incorrect.")
            raise e

        # Remove the temporary file
        os.remove(temp_file_path)

        return documents

def load_pdf_documents(pdf_url: str, verbose=False):
    pdf_loader = FileHandler(PyPDFLoader, "pdf")
    docs = pdf_loader.load(pdf_url)

    if docs:
        split_docs = splitter.split_documents(docs)

        if verbose:
            print(f"Found PDF file")
            print(f"Splitting documents into {len(split_docs)} chunks")

        return split_docs

In [39]:
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=userdata.get("HF_TOKEN"), model_name="sentence-transformers/all-MiniLM-l6-v2"
)

text = "This is a test document."

query_result = embeddings.embed_query(text)
query_result[:3]

[-0.03833853453397751, 0.12346471101045609, -0.028642931953072548]

In [18]:
template = """Question: {question}

Answer: Let's think step by step.

Use the given contexT: {context}
"""

prompt = PromptTemplate(
  template=template,
  input_variables=["question"]
)

In [20]:
!pip install -q pypdf

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/298.0 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m92.2/298.0 kB[0m [31m2.5 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m298.0/298.0 kB[0m [31m4.5 MB/s[0m eta [36m0:00:00[0m
[?25h

In [21]:
documents = load_pdf_documents("https://arxiv.org/pdf/2402.06196")

In [22]:
documents

[Document(metadata={'source': '/tmp/d6391be0-251c-477e-b597-3a8d55a5b70b.pdfayckslhw', 'page': 0}, page_content='Large Language Models: A Survey\nShervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu\nRichard Socher, Xavier Amatriain, Jianfeng Gao\nAbstract—Large Language Models (LLMs) have drawn a\nlot of attention due to their strong performance on a wide\nrange of natural language tasks, since the release of ChatGPT\nin November 2022. LLMs’ ability of general-purpose language\nunderstanding and generation is acquired by training billions of\nmodel’s parameters on massive amounts of text data, as predicted\nby scaling laws [1], [2]. The research area of LLMs, while very\nrecent, is evolving rapidly in many different ways. In this paper,\nwe review some of the most prominent LLMs, including three\npopular LLM families (GPT, LLaMA, PaLM), and discuss their\ncharacteristics, contributions and limitations. We also give an\noverview of techniques developed to build, and augment L

In [23]:
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(documents, embeddings)

In [30]:
retriever = vectorstore.as_retriever()

In [40]:
from langchain_core.vectorstores import VectorStoreRetriever
# Instead of retriever.query, use retriever.get_relevant_documents
relevant_docs = retriever.get_relevant_documents("What is a LLM?")
# Print or process the relevant documents
for doc in relevant_docs:
    print(doc.page_content)

ability to access and use tools, and to make decisions based on
the given input. They are designed to handle tasks that require
a degree of autonomy and decision-making, typically beyond
simple response generation.
The functionalities of a generic LLM-based agent include:
• Tool Access and Utilization: Agents have the capabil-
ity to access external tools and services, and to utilize
these resources effectively to accomplish tasks.
• Decision Making: They can make decisions based on
the input, context, and the tools available to them,
often employing complex reasoning processes.
As an example, an LLM that has access to a function (or
an API) such as weather API, can answer any question related
to the weather of the specific place. In other words, it can use
APIs to solve problems. Furthermore, if that LLM has access
to an API that allows to make purchases, a purchasing agent
can be built to not only have capabilities to read information
we need to augment the models through some extern

In [41]:
def basic_rag(question):
    template = """Question: {question}

    Answer: Let's think step by step.

    Use the given contexT: {context}
    """
    prompt = PromptTemplate(
        template=template,
        input_variables=["question", "context"]
    )
    relevant_docs = retriever.get_relevant_documents("What is a LLM?")
    llm_chain = prompt | llm
    return llm_chain.invoke({"question": question, "context": relevant_docs})

In [43]:
print(basic_rag("What is a LLM? Define it, give me some examples and define its architecture"))

 A LLM (Large Language Model) is a type of artificial intelligence model designed to process and generate human-like language. These models are trained on vast amounts of text data and can be fine-tuned for specific tasks, such as language translation, text summarization, or conversation response generation.

    Examples of LLMs include:

    1.  Google's BERT (Bidirectional Encoder Representations from Transformers)
    2.  Microsoft's Turing-NLG (Natural Language Generation)
    3.  Amazon's Alexa
    4.  IBM's Watson

    Architecture of an LLM:

    1.  **Input Layer**: The input layer takes in the text data or prompt that the model needs to process.
    2.  **Transformer Encoder**: The transformer encoder is the core component of the LLM. It uses self-attention mechanisms to process the input text and generate output. The encoder consists of multiple layers, each of which applies a series of transformations to the input text.
    3.  **Output Layer**: The output layer generates t