## Intro to LLMs

This notebook is a gentle introduction to LLMs and the implications of their usage. It is intended for readers with no background in machine learning or deep learning but with some background in data science or a methodologically related field of quantitative analysis.

Use: https://huggingface.co/ggml-org/SmolLM3-3B-GGUF/tree/main (small and good quality)

Explain the extra step on top of the transformer architecture that is needed to make LLMs (the reinforcement learning step).


In [8]:
# import llama_cpp library
from llama_cpp import Llama

# Path to your GGUF model file (replace with actual path)
model_path = "llama-2-13b-chat.Q4_K_M.gguf"

# Load model
llm = Llama(model_path=model_path, verbose=False)
llm_nondete = Llama(model_path=model_path, verbose=False)

llama_context: n_ctx_per_seq (512) < n_ctx_train (4096) -- the full capacity of the model will not be utilized
ggml_metal_init: skipping kernel_get_rows_bf16                     (not supported)
ggml_metal_init: skipping kernel_set_rows_bf16                     (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_c4                (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row              (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4                (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16                  (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f16                (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64   

### Example 1: first prompt

In [5]:
# Define a prompt
prompt = "Why is the sky blue?"

# Generate text
output = llm(
    prompt,
    max_tokens=150,        # max length of generated text
    temperature=1,         # randomness (0 = deterministic)
    top_p=0.9              # nucleus sampling
)

print("=== Response ===")
# In llama-cpp-python, the output is a dict; the generated text is under 'choices'
print(output["choices"][0]["text"].strip())

print("=== Full output ===")
# 'choices' is a list with one dict per generated completion
print(output["choices"])

=== Response ===
The short answer is that the sky appears blue because of a phenomenon called Rayleigh scattering, which is the scattering of light or other electromagnetic radiation by small particles in the atmosphere. The particles in the atmosphere, such as nitrogen molecules and aerosols, scatter the sun's light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths. This is why the sky appears blue during the daytime, when the sun is overhead and the light we see is predominantly blue.

However, the question of why the sky is blue is more complex than this simple explanation suggests. There are many factors that can affect the color of
=== Full output ===
[{'text': " The short answer is that the sky appears blue because of a phenomenon called Rayleigh scattering, which is the scattering of light or other electromagnetic radiation by small particles in the atmosphere. The particles in the atmosphere, such as nitrogen molecules and aerosol
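The `temperature` and `top_p` arguments used above can be illustrated on a toy example. The sketch below is not how llama.cpp implements sampling (its sampler chain has more filters and may apply them in a different order); it only shows the two ideas in isolation. The vocabulary, the scores, and all function names are made up for illustration.

```python
import math
import random

def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalize the survivors

def sample_token(logits, temperature, top_p, rng):
    """Pick one token index: greedy at temperature 0, random otherwise."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    candidates = top_p_filter(apply_temperature(logits, temperature), top_p)
    tokens = list(candidates)
    weights = [candidates[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical toy vocabulary and scores (not taken from any real model)
vocab = ["blue", "red", "green", "purple"]
logits = [2.0, 1.0, 0.5, -1.0]

rng = random.Random(0)
probs = apply_temperature(logits, 1.0)
nucleus = top_p_filter(probs, 0.9)

print("kept tokens:", [vocab[i] for i in nucleus])
print("sampled:", vocab[sample_token(logits, 1.0, 0.9, rng)])
```

With `top_p=0.9` the least likely token is dropped before sampling, and with `temperature=0` the most likely token is always chosen.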

### Example 2: Hallucination
A hallucination is when the model outputs information that is factually false or fabricated. Possible example: a conspiracy theory about Grok (Elon Musk's LLM)?

In [7]:
# Define a prompt
halluc_prompt = "Who is Railen Ackerby?"

# Generate text
output = llm(
    halluc_prompt,
    max_tokens=150,        # max length of generated text
    temperature=0.7,       # randomness (0 = deterministic)
    top_p=0.9              # nucleus sampling
)

print("=== Response ===")
# In llama-cpp-python, the output is a dict; the generated text is under 'choices'
print(output["choices"][0]["text"].strip())

print("=== Full output ===")
# 'choices' is a list with one dict per generated completion
print(output["choices"])

=== Response ===
Railen Ackerby is a 23 year old professional esports player from the United Kingdom. He is best known for his expertise in the popular multiplayer game, Apex Legends, where he competes under the alias "Raiden."

Ackerby began his esports career in 2019, playing for various teams and participating in online tournaments. He quickly gained a reputation as one of the top players in the UK and was soon recruited by professional teams. In 2020, he joined the prestigious esports organization, FaZe Clan, and has since become one of the team's top performers.

Acker
=== Full output ===
[{'text': '\n\nRailen Ackerby is a 23 year old professional esports player from the United Kingdom. He is best known for his expertise in the popular multiplayer game, Apex Legends, where he competes under the alias "Raiden."\n\nAckerby began his esports career in 2019, playing for various teams and participating in online tournaments. He quickly gained a reputation as one of the top players in t

### Example 3: Non-determinism
The model can output distinctly different answers for the same or an equivalent input, because each token is sampled at random whenever the temperature is above 0.

In [10]:
# Define a prompt
nondete_prompt = "Give a description of what is inside an atom."

# Generate text
output_1 = llm(
    nondete_prompt,
    max_tokens=150,        # max length of generated text
    temperature=1,         # randomness (0 = deterministic)
    top_p=0.9              # nucleus sampling
)

# Generate text
output_2 = llm_nondete(
    nondete_prompt,
    max_tokens=150,        # max length of generated text
    temperature=1,         # randomness (0 = deterministic)
    top_p=0.9              # nucleus sampling
)

print("=== Response 1 ===")
# In llama-cpp-python, the output is a dict; the generated text is under 'choices'
print(output_1["choices"][0]["text"].strip())

print("=== Response 2 ===")
# Same prompt and settings, second model instance
print(output_2["choices"][0]["text"].strip())

=== Response 1 ===
Atoms are the building blocks of everything around us, from the air we breathe to the stars in the sky. Atoms are made up of three main parts: protons, neutrons, and electrons. Protons and neutrons are found in the nucleus (center) of the atom, while electrons orbit the nucleus in energy levels or electron shells.

1. Protons: Protons are positively charged particles that reside in the nucleus of an atom. The number of protons in an atom determines the element the atom represents, with each element having a unique number of protons in its atoms. For example, hydrogen has one proton, while carbon has six protons.
=== Response 2 ===
Atoms are the building blocks of matter. They are made up of three main parts: protons, neutrons, and electrons. Protons and neutrons are found in the nucleus of the atom, which is the central, positively charged part of the atom. Protons are positively charged and neutrons have no charge. Electrons are negatively charged and orbit around t
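The role of the random seed can be shown without a model. This is a minimal sketch with a made-up next-word distribution (`next_word_probs` and `generate` are hypothetical names): fixing the seed makes the random draws repeatable, while leaving it unset usually gives a different sequence on each run. llama-cpp-python accepts a `seed` argument on the `Llama` constructor for a similar purpose, though whether that yields identical text may depend on hardware and library version.

```python
import random

# Hypothetical next-word probabilities, standing in for a model's output distribution
next_word_probs = {"protons": 0.5, "electrons": 0.3, "neutrons": 0.2}

def generate(n_words, seed=None):
    """Sample n_words independently; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    words = list(next_word_probs)
    weights = list(next_word_probs.values())
    return rng.choices(words, weights=weights, k=n_words)

run_a = generate(8, seed=42)
run_b = generate(8, seed=42)  # same seed: identical sequence
run_c = generate(8)           # no seed: usually differs from run to run

print("same seed, same output:", run_a == run_b)
print("unseeded run:", run_c)
```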

### Example 4: Outdated knowledge
The model's knowledge stops at the time it was trained, so it cannot reliably answer questions that depend on very recent information.

### Example 5: Biases
The model can be biased towards stereotypes that are represented in its training data.

Possible prompt: something country-specific, e.g. the top authors in Mexico.
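One rough way to look for stereotypes in generated text is to count which gendered pronouns appear near each role. The sketch below is a crude heuristic, not an established bias metric: it only looks at pronouns in the same sentence as the role word. The `story` string is written by hand for illustration and is not real model output; `pronouns_by_role` is a hypothetical helper.

```python
import re
from collections import Counter

PRONOUNS = {
    "he": "male", "him": "male", "his": "male",
    "she": "female", "her": "female", "hers": "female",
}

def pronouns_by_role(text, roles):
    """For each role, count gendered pronouns in the sentences that mention it."""
    counts = {role: Counter() for role in roles}
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        for role in roles:
            if role.lower() in words:
                for word in words:
                    if word in PRONOUNS:
                        counts[role][PRONOUNS[word]] += 1
    return counts

# Hand-written stand-in for a generated story (an assumption, not model output)
story = (
    "The nurse smiled as she sat down. "
    "The pilot said he was late. "
    "The CEO checked his phone."
)

result = pronouns_by_role(story, ["nurse", "pilot", "CEO"])
for role, tally in result.items():
    print(role, dict(tally))
```

Running the same count over many generated stories, rather than one, would give a better sense of whether a pattern is systematic.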


### Example 6: Inconsistency
The model can sometimes give contradictory answers to logically equivalent prompts.
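A simple consistency check asks several paraphrases of the same question and measures how often the answers agree. In this sketch `consistency_report` is a hypothetical helper and `fake_model` is a hard-coded stand-in for a real model call, deliberately written to contradict itself; with llama-cpp-python, `ask` could be a small function that wraps `llm(...)` and returns the text.

```python
from collections import Counter

def consistency_report(ask, prompts):
    """Ask every paraphrase and report how often the most common answer appears."""
    answers = [ask(p).strip().lower().rstrip(".") for p in prompts]
    majority, count = Counter(answers).most_common(1)[0]
    return {
        "answers": answers,
        "majority": majority,
        "agreement": count / len(answers),
    }

# Hypothetical stand-in for a model: canned answers, one of them contradictory
def fake_model(prompt):
    canned = {
        "Is 17 a prime number?": "Yes.",
        "Is it true that 17 is prime?": "yes",
        "17 is prime, yes or no?": "No.",
    }
    return canned[prompt]

paraphrases = [
    "Is 17 a prime number?",
    "Is it true that 17 is prime?",
    "17 is prime, yes or no?",
]

report = consistency_report(fake_model, paraphrases)
print(report)
```

An agreement score below 1.0 means at least one paraphrase received a different answer.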

### Example 7: Prompt engineering
Calculation example.
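One possible shape for the calculation example is a prompt that shows a worked example and asks the model to write out its steps before the final answer. The helper `build_calculation_prompt`, the instruction wording, and the sample problems below are assumptions made for illustration; the resulting string could be passed to `llm(...)` like the prompts in the earlier examples.

```python
def build_calculation_prompt(question, worked_examples=()):
    """Assemble a prompt with worked examples and a request for step-by-step reasoning."""
    parts = [
        "Solve the problem. Show each step, then give the final answer "
        "on its own line as 'Answer: <number>'.",
        "",
    ]
    for example_question, example_solution in worked_examples:
        parts += [f"Problem: {example_question}", example_solution, ""]
    parts += [f"Problem: {question}", "Steps:"]
    return "\n".join(parts)

# Hypothetical worked example shown to the model before the real question
examples = [
    ("What is 12 * 11?",
     "Steps: 12 * 10 = 120; 12 * 1 = 12; 120 + 12 = 132\nAnswer: 132"),
]

calc_prompt = build_calculation_prompt("What is 23 * 17?", examples)
print(calc_prompt)
```

Comparing the model's answer to this prompt with its answer to the bare question is one way to show the effect of prompt design.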

### Example 8: Compare models

In [None]:
# bias_prompt = "Write a two paragraph story where a nurse, a pilot, and a CEO are having lunch together."
# response = chatbot(bias_prompt, max_new_tokens=500, do_sample=True, top_k=50, temperature=0.7)[0]["generated_text"]
# print(response.strip())

## Evaluation

Discuss evaluation.
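As a starting point for that discussion, one of the simplest evaluation measures is exact-match accuracy: the share of model answers that equal a reference answer after light normalization. This is a minimal sketch with made-up predictions and references; `normalize` and `exact_match_accuracy` are hypothetical helpers, and exact match is only suitable for short answers with a single correct form.

```python
def normalize(text):
    """Lowercase, trim, drop a trailing period, and collapse repeated whitespace."""
    return " ".join(text.lower().strip().rstrip(".").split())

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that match their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Made-up answers for illustration (not real model output)
predictions = ["Paris.", "  berlin", "Lyon"]
references = ["Paris", "Berlin", "Madrid"]

score = exact_match_accuracy(predictions, references)
print(f"exact-match accuracy: {score:.2f}")
```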