# Day 1: Introduction to Large Language Models (LLMs) and Prompt Engineering

**Summary:** In this notebook we will explore the origins and evolution of large language models, understand how they are trained and the data that powers them, examine real‑world use‑cases across industries, and finish with hands‑on practice connecting to an LLM and crafting effective prompts.

## 1. A Brief History of LLMs

Large language models trace their lineage to early *n‑gram* language models of the 1990s, through **word2vec** (2013) and **seq2seq** models (2014), to the breakthrough **Transformer** architecture introduced in 2017.  
Key milestones:

| Year | Milestone | Why it mattered |
|------|-----------|-----------------|
| 2017 | Attention Is All You Need | Introduced the Transformer, enabling parallel training and long‑range context |
| 2018 | GPT‑1 | First generative pre‑trained Transformer; showed the power of pre‑training + fine‑tuning |
| 2020 | GPT‑3 (175 B params) | Demonstrated emergent zero‑shot and few‑shot abilities at scale |
| 2022 | InstructGPT (instruction tuning + RLHF) | Reinforcement learning from human feedback made models more helpful and better at following instructions |
| 2023‑2025 | GPT‑4, Claude 3, Gemini, open‑weight 70B+ models | Multimodal input, tool use, and rapid ecosystem growth |



## 2. How LLMs Are Trained

LLM training is typically split into two (sometimes three) phases:

1. **Pre‑training** – self‑supervised learning on trillions of tokens (web crawl, books, code, research papers). Objective: predict the next token.  
2. **Instruction‑tuning** – supervised fine‑tuning on curated *instruction → response* pairs (~500k‑2M examples).  
3. **Alignment (RLHF / DPO / RLAIF)** – align model behavior with human preferences using reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), or reinforcement learning from AI feedback (RLAIF).
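
To make the pre‑training objective concrete, here is a minimal sketch of next‑token prediction that uses a toy bigram counter in place of a neural network. The tiny corpus and the helper names `train_bigram` and `next_token_loss` are made up for illustration. Real LLMs minimize the same kind of average negative log‑likelihood, but with a Transformer over subword tokens.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_loss(counts, tokens):
    """Average negative log-likelihood of each next token given the previous one."""
    losses = []
    for prev, nxt in zip(tokens, tokens[1:]):
        total = sum(counts[prev].values())
        prob = counts[prev][nxt] / total  # assumes the pair was seen in training
        losses.append(-math.log(prob))
    return sum(losses) / len(losses)

# Toy corpus (illustrative only)
corpus = "the cat sat on the mat".split()
model = train_bigram(corpus)
loss = next_token_loss(model, corpus)
print(f"average next-token loss: {loss:.3f}")
```

The only uncertain predictions are after "the", which is followed once by "cat" and once by "mat". Every other pair is predicted with probability 1 and contributes zero loss.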

> **Compute formula (rough)**: *Training FLOPs* ≈ 6 × #parameters × #tokens. Empirical scaling laws suggest that loss falls roughly as a power law in compute, loss ∝ (compute)^‑α, for a small positive exponent α.
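
As a worked example of the rule of thumb above, the sketch below plugs in GPT‑3's 175 B parameters. The 300 B training‑token figure comes from the GPT‑3 paper rather than from this notebook, and the exponent `alpha = 0.05` is a hypothetical value chosen only to show how the power law behaves.

```python
def training_flops(num_params: float, num_tokens: float) -> float:
    """Rough training cost from the rule of thumb: 6 * parameters * tokens."""
    return 6 * num_params * num_tokens

def loss_ratio(compute_multiplier: float, alpha: float) -> float:
    """Factor by which loss changes when compute is multiplied, if loss ∝ compute^-alpha."""
    return compute_multiplier ** (-alpha)

# GPT-3 scale: 175 B parameters, 300 B training tokens (token count from the GPT-3 paper)
flops = training_flops(175e9, 300e9)
print(f"approx. training FLOPs: {flops:.2e}")

# alpha = 0.05 is a hypothetical exponent, used only for illustration
print(f"loss multiplier for 10x compute: {loss_ratio(10, 0.05):.3f}")
```

With a small exponent, even a tenfold increase in compute only reduces the loss modestly, which is why gains at the frontier are expensive.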



## 3. What Data Do LLMs Use?

| Data source | Typical share | Examples |
|-------------|--------------|----------|
| Common Crawl snapshots | 40‑60 % | raw web pages, deduplicated & filtered |
| Books | 10‑20 % | Gutenberg, Books3, proprietary corpora |
| Wikipedia | <3 % | English & multilingual dumps |
| Scientific papers | 5‑10 % | arXiv, PubMed |
| Code | 5‑10 % | GitHub, BigQuery GH datasets |
| Dialog / examples | <1 % | ShareGPT, Stack Exchange, Reddit |

Modern pipelines include *deduplication, toxicity filtering, language balancing,* and *quality scores* (e.g., **CCNet** perplexity filtering).
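
The cleaning steps above can be sketched in a few lines. This is a deliberately simplified illustration: it only removes exact duplicates after normalization, and `passes_quality` is a hypothetical word‑count check standing in for a real quality score. Production pipelines typically use fuzzy deduplication and model‑based filters such as the perplexity filtering mentioned above.

```python
import hashlib

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())

def deduplicate(docs):
    """Keep the first occurrence of each document, comparing normalized hashes."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

def passes_quality(doc: str, min_words: int = 5) -> bool:
    """Hypothetical stand-in for a quality score: require a minimum word count."""
    return len(doc.split()) >= min_words

raw_docs = [
    "Large language models are trained on web text.",
    "large   language models are trained on web text.",  # duplicate after normalization
    "Click here!",                                        # too short to keep
    "Books and code are also common training sources.",
]
cleaned = [d for d in deduplicate(raw_docs) if passes_quality(d)]
print(cleaned)
```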



## 4. Practical Applications in Industry

| Sector | Use‑case | Impact |
|--------|----------|--------|
| Customer support | Automated chat, email triage | 24/7 availability, cost reduction |
| Healthcare | Medical note summarization, patient Q&A (HIPAA compliant) | Clinician time savings |
| Finance | Report generation, SEC filing Q&A, risk analysis | Faster insight discovery |
| Legal | Contract analysis, clause extraction | Reduce review time |
| Software dev | Code completion, refactoring, test generation | ↑ developer productivity |
| Education | Adaptive tutoring, content generation | Personalized learning at scale |



## 5. Workshop Roadmap

Over the next two weeks we will cover:

1. **Day 1** – History, training, data, applications, prompt engineering fundamentals  
2. **Day 2** – Retrieval‑Augmented Generation (RAG), Vector databases & embeddings    
3. **Day 3** – Agents
4. **Day 4** – Workflows  
5. **Day 5** – Multi‑Agent Systems (MAS)  
6. **Week 2** – Project work



## 6. Hands‑On 🚀 – Set Up Your Development Environment

Follow the steps below **before** running any code cells:

1. **Install dependencies**

```bash
pip install --upgrade langchain-openai python-dotenv rich
```

2. **Add your API credentials**

Create a file named `.env` in the same directory as this notebook with the following contents:

```
OPENAI_API_KEY="..."
OPENAI_ORGANIZATION="..."
```

3. **Reload the notebook kernel** so that environment variables are picked up.
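
When checking that your credentials loaded, it is safer to display a masked value than the key itself, because notebook outputs are easy to share by accident. The helpers `mask_secret` and `describe_credentials` below are hypothetical names for this sketch, and the placeholder key is not a real credential.

```python
def mask_secret(value, visible: int = 4) -> str:
    """Return a display-safe version of a secret: a short prefix plus asterisks."""
    if not value:
        return "<not set>"
    return value[:visible] + "*" * 8

def describe_credentials(env, names=("OPENAI_API_KEY", "OPENAI_ORGANIZATION")):
    """Map each variable name to a masked value, never the raw secret."""
    return {name: mask_secret(env.get(name)) for name in names}

# In the notebook you would pass os.environ; a placeholder dict keeps this sketch runnable.
example_env = {"OPENAI_API_KEY": "sk-placeholder-not-a-real-key"}
print(describe_credentials(example_env))
```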


### Activity 1 – Prompt Effect Experiment

1. Run the baseline prompt (`llm` is the `ChatOpenAI` object created in Step 1 below):

```python
llm.invoke("Translate the following sentence to French: 'The weather is nice today.'")
```

2. Now add context and style instructions:

```python
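from langchain_core.messages import HumanMessage, SystemMessage

system_prompt = "You are a poetic translator that prefers elegant, formal French."
user_prompt = "Translate the following sentence to French: 'The weather is nice today.'"
# Chat models accept a list of messages; the system message sets the style.
llm.invoke([SystemMessage(content=system_prompt), HumanMessage(content=user_prompt)])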
```

3. Compare outputs. Which version is more formal?  
4. Try adjusting **temperature** and **max_tokens** in the `ChatOpenAI` constructor and observe the differences.



### Activity 2 – Effect of Adding Context

Create two calls:

```python
question = "Who wrote 'Pride and Prejudice'?"

# Call 1: No context
res_plain = llm.invoke(question)

# Call 2: With context embedded in the prompt
context = "Answer in one short sentence."
res_context = llm.invoke(f"{context} {question}")
# .content holds just the answer text, without the response metadata
print("Plain:", res_plain.content)
print("With context:", res_context.content)
```

> **Discussion:** How does the additional instruction change the answer length and style?



In [2]:
!uv pip install langchain-openai      # LangChain OpenAI package
!uv pip install rich                  # Helps print docstrings
!uv pip install python-dotenv         # Helps hide environment API keys


In [2]:
import os # operating system
from rich import inspect # pretty-print docstrings

# Connect to OpenAI models
from langchain_openai import ChatOpenAI

In [3]:
from dotenv import load_dotenv # load environment variables
load_dotenv()

True

In [4]:
# connect to OpenAI
openai_api_key = os.environ.get("OPENAI_API_KEY")
openai_organization = os.environ.get("OPENAI_ORGANIZATION")

In [5]:
# Confirm the key loaded without printing the secret itself
print("API key loaded:", openai_api_key is not None)

API key loaded: True


## Section 1: Intro to LangChain
---

## Step 1: Connecting to OpenAI

First, we need to connect to a large language model.

In [6]:
llm = ChatOpenAI(
    openai_api_key=openai_api_key,
    openai_organization=openai_organization,
    model="gpt-4o-mini",
)

In [9]:
llm.invoke("hello, world!")

AIMessage(content='Hello! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 11, 'total_tokens': 20, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_34a54ae93c', 'finish_reason': 'stop', 'logprobs': None}, id='run--9ef246b9-cacf-43cc-af55-0eca53c6ac3f-0', usage_metadata={'input_tokens': 11, 'output_tokens': 9, 'total_tokens': 20, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

## Step 2: Let's take a closer look at `ChatOpenAI`

The library we will use to connect to OpenAI's models is called [LangChain](https://python.langchain.com/docs/introduction/).
> LangChain is a framework for developing applications powered by large language models (LLMs).

LangChain gives us a convenient way to call OpenAI's chat models (the models behind ChatGPT) so we can start building applications on top of them. Throughout this workshop we will predominantly use this library to build our applications.

Now let's take a closer look at `ChatOpenAI`, the class we use to connect to these models. In this section, I'll go through some of the most important parameters to set, but here is the documentation if you want to dive deeper into the API.
+ Ref: https://sj-langchain.readthedocs.io/en/latest/chat_models/langchain.chat_models.openai.ChatOpenAI.html


### Parameters

1. **openai_api_key** and **openai_organization** - credentials that will be provided to you and that authenticate your requests. Think of them as a user name and password.
2. **model** - the model we want to use for the workshop.
3. **temperature** - controls how random the sampling is. With a lower temperature the model's responses are more conservative and repeatable, often shorter and more concise. With a higher temperature the model becomes more varied and "creative", and often "talks" more.
4. **max_tokens** - the maximum number of tokens (word pieces, not whole words) you want to get back from the LLM.

This is enough to get started, but eventually you might want to add more configurations. See [link](https://sj-langchain.readthedocs.io/en/latest/chat_models/langchain.chat_models.openai.ChatOpenAI.html).
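
To build intuition for what **temperature** does, the sketch below applies the conventional definition: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The three logits are invented for illustration, and this is a toy model of sampling, not OpenAI's actual server‑side implementation.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all of the probability goes to the top‑scoring token, so repeated calls give nearly identical answers. At a high temperature the distribution flattens and less likely tokens are sampled more often. This toy function cannot handle a temperature of exactly 0 because it divides by the temperature.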

In [10]:
llm2 = ChatOpenAI(
    openai_api_key=openai_api_key,
    openai_organization=openai_organization,
    model="gpt-4o-mini",
    temperature=0.01,
)

In [11]:
response = llm2.invoke("What is the capital of Armenia?")

In [12]:
response

AIMessage(content='The capital of Armenia is Yerevan.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 14, 'total_tokens': 23, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_34a54ae93c', 'finish_reason': 'stop', 'logprobs': None}, id='run--f1b335a9-2875-4dfc-a637-3b8760d6a233-0', usage_metadata={'input_tokens': 14, 'output_tokens': 9, 'total_tokens': 23, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

Let's take a closer look at what LangChain's `ChatOpenAI` returns.

`AIMessage` is the object that is returned after we make an API call to GPT-4o-mini.
We can access the message content, the response metadata, the name of the model that handled the call, a unique ID for the response, and more.

In [13]:
print("Answer:     ", response.content)
print("Model name: ", response.response_metadata["model_name"])
print("ID:         ", response.id)

Answer:      The capital of Armenia is Yerevan.
Model name:  gpt-4o-mini-2024-07-18
ID:          run--f1b335a9-2875-4dfc-a637-3b8760d6a233-0


## Section 2: Intro to Prompting

What is prompting?
> Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs). - Prompt Engineering Guide


In this section, we are going to introduce more effective ways of putting together questions to send to an LLM. I'm going to go through some techniques that you have probably already used and some that will be new to you, but that will improve how well ChatGPT responds to your questions.


**Summary:** Prompting techniques we will cover in this section.
1. Zero-shot prompting
2. Few-shot prompting
3. Chain-of-Thought prompting
4. Meta-prompting

But first, let's look at how we can format prompts in LangChain.

What is a `PromptTemplate`?
>Prompt templates help to translate user input and parameters into instructions for a language model. This can be used to guide a model's response, helping it understand the context and generate relevant and coherent language-based output.
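
Conceptually, a prompt template is a string with named placeholders that are filled in at call time. The toy `SimpleTemplate` class below is a hypothetical stand‑in built on Python's `str.format`, meant only to show the idea. It is not LangChain's implementation, which adds input validation and integrates with chains.

```python
class SimpleTemplate:
    """Toy stand-in for a prompt template: named placeholders filled at call time."""

    def __init__(self, template: str):
        self.template = template

    def format(self, **values) -> str:
        return self.template.format(**values)

capital_question = SimpleTemplate("What is the capital of {country}?")

# The same template produces a different prompt for each input
for country in ("Armenia", "France", "Germany"):
    print(capital_question.format(country=country))
```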


**References**
+ https://www.promptingguide.ai/
+ Zero-shot/Few-shot paper: [Language Models are Few-Shot Learners](https://arxiv.org/pdf/2005.14165)
+ Chain-of-Thought paper: [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/pdf/2201.11903)
+ Meta-prompting paper: [Meta Prompting for AI Systems](https://arxiv.org/pdf/2311.11482)

In [15]:
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableSequence

In [21]:
inspect(PromptTemplate)

In [16]:
# PromptTemplate 
prompt_template = PromptTemplate.from_template(
    "What is the capital of {country}?"
)

prompt_template.pretty_print()

What is the capital of {country}?


In [17]:
chain = prompt_template | llm2

print("---------------------------")
print(chain.invoke({"country": "Armenia"}))
print("---------------------------")
print(chain.invoke({"country": "France"}))
print("---------------------------")
print(chain.invoke({"country": "Germany"}))

---------------------------
content='The capital of Armenia is Yerevan.' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 14, 'total_tokens': 23, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_34a54ae93c', 'finish_reason': 'stop', 'logprobs': None} id='run--0b5b676e-1aae-42d4-b7ac-f2af3b6463d0-0' usage_metadata={'input_tokens': 14, 'output_tokens': 9, 'total_tokens': 23, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}
---------------------------
content='The capital of France is Paris.' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 7, 'prompt_tokens': 14, 'total_tokens': 21, 'completion_tokens_detai

**Note:** By adding a simple prompt template, we can reuse the same prompt to ask the same question with different inputs.

### Technique 1: Zero-shot Prompting

**Zero-shot prompting** is just asking your model a question or giving it an instruction directly. No examples, no demonstrations, just the task itself.

>Zero-shot prompting means that the prompt used to interact with the model won't contain examples or demonstrations. The zero-shot prompt directly instructs the model to perform a task without any additional examples to steer it.

In [18]:
llm2.invoke("""
Classify the text into neutral, negative or positive. 
Text: I think the vacation is okay.
Sentiment:
""").content

'Sentiment: Neutral'

### Technique 2: Few-Shot Prompting

**Few-Shot Prompting** gives your model a handful of worked examples inside the prompt, so it can pick up the task from them when you ask your question. This is called "in-context learning".
>Few-shot prompting can be used as a technique to enable in-context learning where we provide demonstrations in the prompt to steer the model to better performance.

In [19]:
llm2.invoke("""
This is awesome! // Negative
This is bad! // Positive
Wow that movie was rad! // Positive
What a horrible show! //
""").content

'What a horrible show! // Negative'
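
Few-shot prompts are easier to maintain when they are assembled from a list of examples rather than typed by hand. The sketch below uses the same `text // label` layout as the cell above. The helper name `build_few_shot_prompt` and the example sentences are illustrative assumptions.

```python
def build_few_shot_prompt(examples, query, separator=" // "):
    """Join labeled demonstrations, then append the unlabeled query."""
    lines = [f"{text}{separator}{label}" for text, label in examples]
    # Leave the label empty so the model completes it
    lines.append(f"{query}{separator}".rstrip())
    return "\n".join(lines)

# Illustrative demonstrations; swap in examples from your own task.
examples = [
    ("I loved this restaurant!", "Positive"),
    ("The service was terrible.", "Negative"),
    ("The food was fine, nothing special.", "Neutral"),
]
prompt = build_few_shot_prompt(examples, "What a horrible show!")
print(prompt)
```

The resulting string can be passed to `llm2.invoke(prompt).content` in the same way as the hand-written prompt.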

### Technique 3: Chain-of-Thought Prompting


This is a prompting technique where you provide an example of how you reason through a complex problem, so that the LLM sees how you solved the problem and mimics this type of reasoning.
>Introduced in Wei et al. (2022), chain-of-thought (CoT) prompting enables complex reasoning capabilities through intermediate reasoning steps. 
You can combine it with few-shot prompting to get better results on more complex tasks that require reasoning before responding.

In [20]:
llm2.invoke("""
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.

The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
A:
""").content

"Let's identify the odd numbers in the group: 15, 5, 13, 7, and 1.\n\nNow, let's add them together:\n\n15 + 5 + 13 + 7 + 1 = 41\n\nSince 41 is an odd number, the statement is False. The odd numbers in this group do not add up to an even number."
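
Because chain-of-thought answers spell out their intermediate steps, they are easy to check with ordinary code. The short sketch below repeats the arithmetic for both number groups from the prompt. The function name `odd_sum_is_even` is made up for this example.

```python
def odd_sum_is_even(numbers):
    """Return the odd numbers, their sum, and whether that sum is even."""
    odds = [n for n in numbers if n % 2 == 1]
    total = sum(odds)
    return odds, total, total % 2 == 0

for group in ([4, 8, 9, 15, 12, 2, 1], [15, 32, 5, 13, 82, 7, 1]):
    odds, total, is_even = odd_sum_is_even(group)
    print(f"odd numbers {odds} sum to {total}; statement is {is_even}")
```

Both sums (25 and 41) are odd, so the statement is False in each case, which matches the model's reasoning.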

### Technique 4: Meta-Prompting

Rather than focusing on specific content, meta-prompting lets you define the structure that the solution to a problem should follow. For example, for a physics problem you might want your model to first write down all of the known variables, then write down all of the unknown variables, then identify a principle or law that applies to the problem, and finally provide the solution.


>Meta Prompting is an advanced prompting technique that focuses on the structural and syntactical aspects of tasks and problems rather than their specific content details. This goal with meta prompting is to construct a more abstract, structured way of interacting with large language models (LLMs), emphasizing the form and pattern of information over traditional content-centric methods.
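
One way to try this out is to wrap any problem in a fixed solution structure. The sketch below encodes the physics-style steps described above. The step wording, the helper name `build_meta_prompt`, and the sample problem are assumptions made for illustration rather than a prescribed format.

```python
# Hypothetical structure for a physics problem, following the steps described above.
META_PROMPT_STEPS = [
    "List all known variables with their units.",
    "List all unknown variables.",
    "State the principle or law that applies.",
    "Solve for the unknowns and state the final answer.",
]

def build_meta_prompt(problem: str, steps=META_PROMPT_STEPS) -> str:
    """Wrap a problem in a fixed solution structure instead of worked examples."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        "Solve the problem using exactly this structure:\n"
        f"{numbered}\n\n"
        f"Problem: {problem}"
    )

prompt = build_meta_prompt(
    "A car accelerates from rest at 2 m/s^2 for 5 s. How far does it travel?"
)
print(prompt)
```

Unlike a few-shot prompt, nothing here shows a solved example. The prompt only constrains the form of the answer, so the same structure can be reused across different problems.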

---
## Section 3: Examples of Prompt Types

### Type 1: Zero-Shot Prompting
<div style="text-align:center;">
    <img src="../assets/zsp.png"/>
</div>

### Type 2: Few-Shot Prompting
<div style="text-align:center;">
    <img src="../assets/fsp.png"/>
</div>

### Type 3: Chain-of-Thought Prompting
<div style="text-align:center;">
    <img src="../assets/cot.png"/>
</div>

### Type 4: Meta-Prompting
<div style="text-align:center;">
    <img src="../assets/metap.png"/>
</div>

# Hands-on Activity

1. Connect to OpenAI using your API keys.
2. Come up with one problem that you would like to solve and experiment with the different types of prompting techniques.
3. Write down and discuss your results with the person sitting next to you.