# 🌟 LLM
## What is an LLM?

**LLM** stands for **Large Language Model**.
It is a type of artificial intelligence trained on huge amounts of text so it can understand and generate human-like language.

✔️ Think of an LLM as:

A super-advanced text-prediction machine.
It works like your phone's next-word suggestions when you type, only far more powerful.
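
To make the "predict the next word" idea concrete, here is a toy sketch that counts which word most often follows the current one (a bigram counter). This is an illustration only, not how an LLM works internally: real LLMs use neural networks over tokens and much longer context. The corpus and the function names `train_bigrams` and `predict_next` are made up for this example.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # → cat
print(predict_next(model, "sat"))  # → on
```

An LLM replaces these raw counts with billions of learned parameters, which lets it take the whole preceding text into account rather than only the last word.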

✔️ What can an LLM do?

- Answer questions
- Summarize text
- Translate languages
- Write code
- Have conversations
- Help with homework or explanations

✔️ Why “large”?

Because it has billions of parameters: numerical weights, adjusted during training, that encode patterns learned from text. More parameters usually, though not always, mean a more capable model.

# What is a Neural Network?

A neural network is a computer model inspired by the human brain.

✔️ Basic idea:

- The brain has **neurons** that pass signals to each other.
- A neural network has artificial neurons, which are just **math** functions.
- Each connection has a **weight** (a number the network learns).
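
A single artificial neuron can be sketched in a few lines: multiply each input by its weight, add a bias, and pass the result through an activation function. The sigmoid activation and the weight values below are assumptions chosen for illustration; in a real network the weights are learned.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))   # squashes z into the range (0, 1)

# Hypothetical weights; training would adjust these numbers.
out = neuron(inputs=[1.0, 0.5], weights=[0.8, -0.4], bias=0.1)
print(round(out, 3))  # → 0.668
```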

✔️ Goal:

Neural networks learn patterns.

Example:

- Show pictures of cats → it learns to recognize cats
- Show a lot of English sentences → it learns grammar patterns

# Early Neural Networks

Early neural networks were small and could only solve simple problems.

Types included:

- Perceptrons (**1950s**): basic binary classifiers
- Feedforward networks (1980s–1990s)
- Convolutional Neural Networks (CNNs) for images
- Recurrent Neural Networks (RNNs) for sequences
- ...

They worked… but not very well for long text.

<p align="center">
  <img src="images/perceptron.png" width="300">
</p>
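
The classic perceptron learning rule fits in a few lines. This sketch trains a 2-input perceptron (2 weights + 1 bias) on the logical AND function; the learning rate of 1, the epoch count, and the helper names are assumptions made for the example.

```python
def step(z):
    """Binary threshold: output 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

def train_perceptron(samples, epochs=10, lr=1):
    """Perceptron learning rule on samples of (x1, x2, label)."""
    w1, w2, b = 0, 0, 0                 # 2 weights + 1 bias = 3 parameters
    for _ in range(epochs):
        for x1, x2, label in samples:
            pred = step(w1 * x1 + w2 * x2 + b)
            error = label - pred        # -1, 0 or +1
            w1 += lr * error * x1       # nudge each weight toward the correct answer
            w2 += lr * error * x2
            b += lr * error
    return w1, w2, b

AND = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
w1, w2, b = train_perceptron(AND)
predictions = [step(w1 * x1 + w2 * x2 + b) for x1, x2, _ in AND]
print(predictions)  # → [0, 0, 0, 1]
```

A single perceptron can only separate classes with a straight line, which is one reason these early models were limited to simple problems.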

# Deep Learning
Deep learning → neural networks with **many layers** (“deep” networks).
A deep learning model learns by adjusting millions or billions of internal numbers called **weights**.

<p align="center">
  <img src="images/deep_network.png" width="300">
</p>


## Why the word "deep"?
Because the neural network has many layers stacked on top of each other:
Input → Layer 1 → Layer 2 → Layer 3 → … → Layer N → Output
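
The layer chain above can be sketched as a loop that feeds each layer's output into the next one. This minimal version assumes fully connected layers with a ReLU activation and uses random weights in place of trained ones; the layer sizes and helper names are hypothetical.

```python
import random

def dense_relu(inputs, weights, biases):
    """One fully connected layer: each output neuron computes max(0, w·x + b)."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    """Random weights stand in for trained ones (illustration only)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)                       # fixed seed so runs are repeatable
sizes = [4, 8, 8, 8, 2]                      # input size, three hidden layers, output size
layers = [random_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

x = [0.5, -1.0, 0.25, 2.0]                   # a made-up input vector
for weights, biases in layers:               # Input → Layer 1 → … → Layer N → Output
    x = dense_relu(x, weights, biases)
print(len(layers), "layers, output:", x)
```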


# ⚡ 1. The Breakthrough: Transformers (2017)
In 2017, researchers at **Google** published a paper called:

**"Attention Is All You Need"**

This introduced the Transformer architecture.

<p align="center">
  <img src="images/transformer.png" width="300">
</p>


✔️ What made Transformers special?

Attention mechanism.

It lets the model:

- Look at all words at once
- Decide which words are important
- Capture long-range relationships between words

Example:

In the sentence “The cat that chased the dog was hungry.”, the model can link “cat” ↔ “was hungry” even though they are far apart.
This largely addressed the “forgetting” problem of earlier recurrent networks, which tended to lose track of information over long sequences.
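
The attention mechanism from the paper is scaled dot-product attention: each position scores every position with a dot product, turns the scores into weights with a softmax, and averages the value vectors with those weights. The sketch below is a simplification under stated assumptions: the 2-D vectors are hand-picked (real models learn query, key, and value projections and use many attention heads), and the queries, keys, and values are all the same vectors.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)                                  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(q·k / sqrt(d)) used to average the values."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:                                # each position scores ALL positions at once
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
        all_weights.append(weights)
    return outputs, all_weights

# Hand-picked toy vectors (hypothetical): "hungry" points in a similar direction to "cat".
tokens = ["cat", "dog", "hungry"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
outputs, weights = attention(vectors, vectors, vectors)

for token, w in zip(tokens, weights[2]):             # how much "hungry" attends to each token
    print(token, round(w, 2))
```

Because every position's scores are computed independently of the others, all of them can be calculated at the same time, which is what makes Transformers easy to parallelize.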

✔️ Why Transformers enabled huge models:

- They can process text in parallel
- They keep improving as data, model size, and compute are scaled up
- They learn long-range meaning and structure

This architecture is the foundation of virtually all modern LLMs:
- GPT series (OpenAI)
- Llama (Meta)
- Gemini (Google)
- Claude (Anthropic)
- Mistral models

# ⚙️ 2. GPU?

A **GPU (Graphics Processing Unit)** is a special type of processor originally designed for one job:

👉 Render graphics for video games.

To do this, a GPU needs to perform millions of small math operations in parallel (at the same time).
This “parallel processing” is EXACTLY what neural networks need.

Neural networks rely heavily on:

- matrix multiplication
- vector operations
- linear algebra

GPUs can run these operations far faster than CPUs because they execute thousands of simple calculations at the same time.
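
Matrix multiplication shows why this work parallelizes so well: every entry of the result is its own dot product and does not depend on any other entry. The naive sketch below computes the entries one after another; a GPU can instead hand thousands of them to separate cores at once.

```python
def matmul(A, B):
    """Naive matrix multiply. Each C[i][j] is an independent dot product of
    row i of A with column j of B, so the entries could be computed in parallel."""
    n, k, m = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # → [[19, 22], [43, 50]]
```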

Before around 2010, GPUs were mostly for gaming.

Then a breakthrough happened:
- Nvidia created CUDA (2006), allowing GPUs to run general-purpose math
- Researchers discovered GPUs dramatically speed up neural networks
- Deep learning started exploding

Without GPUs:
- Training an LLM could take hundreds of years
- Updating billions of parameters over huge datasets would be impractical

Today’s massive GPU clusters (thousands of GPUs working together) make LLM training feasible.

# 🌐 3. Huge Amounts of Data

Early neural networks had:

- tiny datasets
- low-quality text sources
- limited access to the internet

To train an LLM you need massive amounts of text, such as:

- billions of web pages
- books
- Wikipedia
- forums
- code repositories

This scale of data did not exist or was not easily accessible 20 years ago.

Thanks to the internet age, researchers can now build datasets containing trillions of tokens.

### 🚀 When these three things came together, LLMs became possible

<p align="center">
  <img src="images/chatGPT.png" width="300">
</p>

A **generative pre-trained transformer (GPT)** is a type of **large language model (LLM)** that is widely used in generative AI chatbots. GPTs are based on a **deep learning** architecture called the **transformer**. They are **pre-trained** on large datasets of unlabeled content and are able to generate novel content.

Parameters:
- A perceptron with 2 inputs -> 3 parameters
- GPT-3 -> 175 billion parameters
- GPT-4 / GPT-5 -> not officially disclosed (figures such as 1.7–1.8 trillion are unconfirmed estimates)
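
The perceptron figure can be checked with the usual formula for a fully connected layer: one weight per (input, neuron) pair plus one bias per neuron. The 784 → 128 → 10 network below is a hypothetical example; transformer models also contain attention and embedding matrices, but those are mostly weight matrices counted the same way.

```python
def dense_params(n_in, n_out):
    """Parameters of one fully connected layer: a weight per (input, neuron) pair plus a bias per neuron."""
    return n_in * n_out + n_out

print(dense_params(2, 1))                              # perceptron with 2 inputs → 3
# A small, hypothetical network: 784 inputs → 128 hidden neurons → 10 outputs
print(dense_params(784, 128) + dense_params(128, 10))  # → 101770
```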

## 🧠 Hallucination?

In the context of Large Language Models (LLMs) like ChatGPT:

Hallucination means the model makes up information that is false, inaccurate, or completely fabricated, but presents it as if it were true.

✔️ Examples:

- Fictional facts:

User: “Who won the 2020 Nobel Prize in Physics?”
Model: “It was Dr. John Smith.” (Incorrect: the prize went to Roger Penrose, Reinhard Genzel and Andrea Ghez.)

- Made-up citations:

User: “Give me a reference for this topic.”
Model: “Doe, J. (2019). Advanced AI Studies. Journal of AI.” (Doesn’t exist)

- Confident but wrong reasoning:

User: “Explain how unicorns fly.”
Model: Provides a plausible-sounding explanation even though unicorns aren’t real.

## ⚙️ Why Do LLMs Hallucinate?

LLMs like GPT are not **“thinking machines”**; they are pattern predictors. They generate text based on statistical patterns learned from data.

Key reasons:
### 1️⃣ They predict the most likely next word

LLMs are trained to continue text plausibly, not necessarily correctly.
They don’t have a true understanding of facts; they just generate what sounds right.

Analogy:
Imagine a very smart autocomplete keyboard: it will suggest the most probable next word, even if it’s wrong.
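
The sketch below shows why "most probable" is not the same as "correct". The model's raw scores (logits) are turned into probabilities with a softmax and a token is sampled from them; a temperature parameter sharpens or flattens the distribution. The scores are invented for illustration: suppose the training data mentions "Sydney" so often alongside "Australia" that the model scores it above the correct answer, "Canberra".

```python
import math
import random

def sample_next(logits, temperature=1.0, rng=random):
    """Softmax over raw scores, then draw one token at random according to those probabilities."""
    tokens = list(logits)
    scaled = [logits[t] / temperature for t in tokens]   # lower temperature sharpens the distribution
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = {t: e / total for t, e in zip(tokens, exps)}
    choice = rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
    return choice, probs

# Hypothetical scores for the word after "The capital of Australia is"
logits = {"Sydney": 2.0, "Canberra": 1.5, "Melbourne": 0.5}
token, probs = sample_next(logits, temperature=1.0, rng=random.Random(0))
print(token, {t: round(p, 2) for t, p in probs.items()})
```

Nothing in this procedure checks facts: the wrong answer gets the highest probability simply because it scored highest, and lowering the temperature only makes the model more confident in it.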

### 2️⃣ Training data is imperfect

LLMs learn from text on the internet, books, code, articles…
Some of that data is wrong, biased, or fictional.
The model can reproduce errors from the data.

### 3️⃣ Lack of real-world grounding

LLMs don’t access real-time data (unless connected to a knowledge base or plugin).
They can’t check facts on their own; they rely on patterns they learned during training.

### 4️⃣ Ambiguity in the prompt

If the user asks vague or creative questions, the model may generate plausible-sounding but incorrect answers.

Example: “Explain how humans breathe underwater” → the model may generate an imaginative explanation, because it tries to be helpful even when the premise is impossible.

# Connecting to OpenAI 


```python
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI
```


```python
# Load OPENAI_API_KEY (and any other variables) from a .env file
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key
if not api_key:
    print("No API key was found!")
```

```python
# The client reads OPENAI_API_KEY from the environment by default
openai = OpenAI()
```

# Frontier model

```python
message = "Hello, GPT! This is my first ever message to you!"
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": message}],
)
print(response.choices[0].message.content)
```

```python
# A class to represent a Webpage
# If you're not familiar with Classes, check out the "Intermediate Python" notebook

# Some websites need you to use proper headers when fetching them:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body is None:        # some pages have no <body>; avoid an AttributeError
            self.text = ""
            return
        # Remove tags whose content is not useful for a summary
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)
```

```python
# Let's try one out. Change the website and add print statements to follow along.
# (This URL is the percent-encoded address of the Persian Wikipedia article on Hafez.)
person = Website("https://fa.wikipedia.org/wiki/%D8%AD%D8%A7%D9%81%D8%B8")
print(person.title)
print(person.text)
```

## Types of prompts

Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** that starts the conversation and that they should reply to

```python
# Define our system prompt - you can experiment with this later, changing the last sentence to "Respond in markdown in Spanish."

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."
```

```python
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt
```

```python
print(user_prompt_for(person))
```
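
Long pages such as a full Wikipedia article can make the user prompt very large, and a model can only read a limited amount of text (its context window). A minimal sketch, assuming a hypothetical character budget `MAX_CHARS` and a hypothetical helper `truncate_text`; characters are only a rough proxy for tokens, so the budget is a guess you would tune for your model.

```python
MAX_CHARS = 20_000   # hypothetical budget; choose it based on the model's context window

def truncate_text(text, max_chars=MAX_CHARS):
    """Keep only the first `max_chars` characters of a long page, marking the cut."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n\n[... content truncated ...]"

# Usage in the notebook (assumes `person` from the cells above):
#   person.text = truncate_text(person.text)
short = truncate_text("Hafez was a Persian poet.", max_chars=100)
long_page = truncate_text("x" * 50_000)
print(len(short), len(long_page))
```

Cutting from the end is the simplest choice; it keeps the introduction, which on most pages carries the key information for a summary.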

## Messages

The API from OpenAI expects to receive messages in a particular structure.
Many of the other APIs share this structure:

```python
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```
To give you a preview, the next 2 cells make a rather simple call - we won't stretch the mighty GPT (yet!)

```python
messages = [
    {"role": "system", "content": "You are a witty assistant."},
    {"role": "user", "content": "What is 2 + 2?"}
]
```

```python
# To give you a preview -- calling OpenAI with system and user messages:

response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)
```
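
Each call to the chat completions API is stateless: the model only sees the messages you send in that request. To continue a conversation, you resend the earlier turns, including the model's own replies under the `"assistant"` role. A minimal sketch with a hypothetical helper `add_turn`; the resulting list would be passed as `messages=` to `openai.chat.completions.create`.

```python
def add_turn(messages, assistant_reply, next_user_message):
    """Return a new message list with the model's reply and the user's follow-up appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]

history = [
    {"role": "system", "content": "You are a witty assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
]
history = add_turn(history, "2 + 2 = 4.", "And 3 + 3?")
print([m["role"] for m in history])  # → ['system', 'user', 'assistant', 'user']
```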

```python
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
```

```python
messages_for(person)
```

```python
# And now: call the OpenAI API. You will get very familiar with this!

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages_for(website)
    )
    return response.choices[0].message.content
```

```python
summarize("https://fa.wikipedia.org/wiki/%D8%AD%D8%A7%D9%81%D8%B8")
```

```python
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))
```

```python
display_summary("https://fa.wikipedia.org/wiki/%D8%AD%D8%A7%D9%81%D8%B8")
```

# Open Source model
**Benefits:**
1. No API charges: the model is open-source and runs locally
2. Data doesn't leave your box

**Disadvantages:**
1. Typically much less capable than frontier models

## Recap on installation of Ollama

Simply visit [ollama.com](https://ollama.com) and install!

Once installed, the Ollama server should already be running locally.  
If you visit:  
[http://localhost:11434/](http://localhost:11434/)

You should see the message `Ollama is running`. 
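
The same check can be done from Python. This sketch uses only the standard library (`urllib`) rather than `requests`, and assumes the default address `http://localhost:11434/`; the function name `ollama_is_running` is hypothetical.

```python
import urllib.request

def ollama_is_running(base_url="http://localhost:11434/", timeout=2.0):
    """Return True if the local Ollama server answers with 'Ollama is running'."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return "Ollama is running" in resp.read().decode("utf-8", errors="replace")
    except OSError:   # URLError, connection refused and timeouts are all OSError subclasses
        return False

print(ollama_is_running())
```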

```python
# imports

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
```

```python
# Constants

OLLAMA_API = "http://localhost:11434/api/chat"
HEADERS = {"Content-Type": "application/json"}
MODEL = "llama3.2"
```

```python
# Create a messages list using the same format that we used for OpenAI

messages = [
    {"role": "user", "content": "Describe some of the business applications of Generative AI"}
]
```

```python
payload = {
    "model": MODEL,
    "messages": messages,
    "stream": False
}
```

```python
# Let's just make sure the model is downloaded
# (the leading "!" runs a shell command from a Jupyter cell)

!ollama pull llama3.2
```

```python
# If this doesn't work for any reason, try the 2 versions in the following cells
# And double check the instructions in the 'Recap on installation of Ollama' at the top of this lab
# And if none of that works - contact me!

response = requests.post(OLLAMA_API, json=payload, headers=HEADERS)
response.raise_for_status()  # surface HTTP errors before trying to read the JSON body
print(response.json()['message']['content'])
```
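
The payload above sets `"stream": False`, so the whole answer arrives in one JSON object. With streaming enabled, Ollama's chat endpoint sends newline-delimited JSON instead: one object per line, each carrying a piece of the message and a `done` flag. A minimal sketch of joining those pieces, assuming that line format; the helper `join_stream` and the sample lines are made up for illustration.

```python
import json

def join_stream(lines):
    """Join the pieces of a streamed chat response.
    Each non-empty line is assumed to be one JSON object with a partial
    message and a `done` flag."""
    parts = []
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# With requests, the lines could come from something like:
#   response = requests.post(OLLAMA_API, json={**payload, "stream": True}, stream=True)
#   text = join_stream(response.iter_lines(decode_unicode=True))
sample = [
    '{"message": {"role": "assistant", "content": "Gen"}, "done": false}',
    '{"message": {"role": "assistant", "content": "AI"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
print(join_stream(sample))  # → GenAI
```

Streaming lets you show the answer as it is generated instead of waiting for the full response.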

# Avalai