# Lab 01: Python Dictionary - Tutorial

## Storing Disease Information for RAG Systems

---

**Welcome!** In this tutorial, you will learn:
- What a Python dictionary is
- How to create and use dictionaries
- How to store multiple items using a list of dictionaries
- Why dictionaries are important for RAG systems

**Time:** ~60 minutes

**Instructions:** Run each cell and read the comments carefully!

---

## Part 1: What is a Dictionary?

A **dictionary** stores data as **key-value pairs**.

Think of it like a real dictionary:
- **Key** = the word you look up
- **Value** = the definition

```
┌─────────────┐     ┌─────────────────┐
│    KEY      │ --> │     VALUE       │
├─────────────┤     ├─────────────────┤
│ "name"      │ --> │ "Rubella"       │
│ "symptoms"  │ --> │ "fever, rash"   │
│ "treatment" │ --> │ "rest"          │
└─────────────┘     └─────────────────┘
```

### 1.1 Creating an Empty Dictionary

In [1]:
# Create an empty dictionary using curly braces {}
disease = {}

# Check the type
print(type(disease))  # Output: <class 'dict'>

# Check if it's empty
print(disease)  # Output: {}
print(len(disease))  # Output: 0 (no items)

<class 'dict'>
{}
0


### 1.2 Creating a Dictionary with Data

In [2]:
# Create a dictionary with initial data
# Format: {key1: value1, key2: value2, ...}

disease = {
    "name": "Rubella",           # key: "name", value: "Rubella"
    "thai_name": "หัดเยอรมัน",    # key: "thai_name", value: "หัดเยอรมัน"
    "symptoms": "fever, rash",   # key: "symptoms", value: "fever, rash"
    "treatment": "rest"          # key: "treatment", value: "rest"
}

# Display the dictionary
print(disease)

{'name': 'Rubella', 'thai_name': 'หัดเยอรมัน', 'symptoms': 'fever, rash', 'treatment': 'rest'}


In [3]:
# Count how many key-value pairs
print(f"Number of items: {len(disease)}")

Number of items: 4


---

## Part 2: Accessing Dictionary Values

There are two ways to get a value from a dictionary:
1. Using square brackets `[]`
2. Using the `.get()` method (safer)

### 2.1 Using Square Brackets [ ]

In [4]:
# Our disease dictionary
disease = {
    "name": "Rubella",
    "thai_name": "หัดเยอรมัน",
    "symptoms": "fever, rash",
    "treatment": "rest"
}

# Access value using key in square brackets
print(disease["name"])       # Output: Rubella
print(disease["symptoms"])   # Output: fever, rash
print(disease["treatment"])  # Output: rest

Rubella
fever, rash
rest


In [5]:
# Warning: If the key doesn't exist, square brackets raise an ERROR!
# The line below raises KeyError because "prevention" is not in the dictionary:

print(disease["prevention"])  # KeyError: 'prevention'

KeyError: 'prevention'
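
If you want to keep using square brackets but avoid a crash, one option is to catch the error. The sketch below is only an illustration: the helper name `lookup` and its fallback message are made up for this example and are not part of the lab code.

```python
# Hypothetical helper (not from the lab): catch KeyError instead of crashing
disease = {"name": "Rubella", "symptoms": "fever, rash"}

def lookup(record, key):
    """Return record[key], or a fallback message when the key is missing."""
    try:
        return record[key]
    except KeyError:
        return f"(no '{key}' field)"

print(lookup(disease, "name"))        # Rubella
print(lookup(disease, "prevention"))  # (no 'prevention' field)
```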

### 2.2 Using .get() Method (Safer)

In [None]:
# .get() returns None if key doesn't exist (no error!)
print(disease.get("name"))        # Output: Rubella
print(disease.get("prevention"))  # Output: None (no error)

Rubella
None


In [None]:
# .get() can have a default value if key doesn't exist
print(disease.get("prevention", "N/A"))  # Output: N/A
print(disease.get("prevention", "Not available"))  # Output: Not available

N/A
Not available
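
A common use of `.get()` with a default is building output from records that may lack optional fields. The following is a minimal sketch; the function name `describe` is an assumption made for this example. Note that `.get()` only reads: it does not add the missing key to the dictionary.

```python
# Hypothetical helper (not from the lab): summarize a record with optional fields
disease = {"name": "Rubella", "symptoms": "fever, rash"}

def describe(record):
    """Build a one-line summary, tolerating missing optional fields."""
    name = record.get("name", "Unknown")
    prevention = record.get("prevention", "N/A")
    return f"{name} | prevention: {prevention}"

print(describe(disease))        # Rubella | prevention: N/A
print("prevention" in disease)  # False: .get() did not add the key
```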


---

## Part 3: Adding and Modifying Values

You can easily add new keys or change existing values.

### 3.1 Adding New Key-Value Pairs

In [None]:
# Start with a simple dictionary
disease = {
    "name": "Rubella",
    "symptoms": "fever, rash"
}

print("Before adding:")
print(disease)
print(f"Number of items: {len(disease)}")

Before adding:
{'name': 'Rubella', 'symptoms': 'fever, rash'}
Number of items: 2


In [None]:
# ADD new keys using square brackets
disease["treatment"] = "rest"              # Add treatment
disease["prevention"] = "MMR vaccine"      # Add prevention
disease["source"] = "md_corpus/1.md"       # Add source file

print("After adding:")
print(disease)
print(f"Number of items: {len(disease)}")

After adding:
{'name': 'Rubella', 'symptoms': 'fever, rash', 'treatment': 'rest', 'prevention': 'MMR vaccine', 'source': 'md_corpus/1.md'}
Number of items: 5


### 3.2 Modifying Existing Values

In [None]:
# Current symptoms
print(f"Before: {disease['symptoms']}")

# MODIFY: Change the value of an existing key
disease["symptoms"] = "fever, rash, swollen lymph nodes"

print(f"After: {disease['symptoms']}")

Before: fever, rash
After: fever, rash, swollen lymph nodes


### 3.3 Deleting Keys

In [None]:
print("Before delete:")
print(disease)

# DELETE a key using 'del'
del disease["source"]

print("\nAfter delete:")
print(disease)

Before delete:
{'name': 'Rubella', 'symptoms': 'fever, rash, swollen lymph nodes', 'treatment': 'rest', 'prevention': 'MMR vaccine', 'source': 'md_corpus/1.md'}

After delete:
{'name': 'Rubella', 'symptoms': 'fever, rash, swollen lymph nodes', 'treatment': 'rest', 'prevention': 'MMR vaccine'}
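
Like square-bracket access, `del` raises a `KeyError` if the key is not there. As an alternative, the built-in `.pop()` method removes a key and returns its value, and accepts a default for the missing-key case. A small sketch using a shortened copy of the dictionary:

```python
disease = {"name": "Rubella", "source": "md_corpus/1.md"}

# pop() removes the key and returns its value
removed = disease.pop("source")
print(removed)   # md_corpus/1.md
print(disease)   # {'name': 'Rubella'}

# With a default, pop() does not raise an error if the key is already gone
missing = disease.pop("source", None)
print(missing)   # None
```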


---

## Part 4: Useful Dictionary Methods

Python dictionaries have many helpful methods.

In [None]:
# Our example dictionary
disease = {
    "name": "Rubella",
    "symptoms": "fever, rash",
    "treatment": "rest"
}

In [None]:
# Get all KEYS
print("Keys:")
print(disease.keys())
print(list(disease.keys()))  # Convert to list

Keys:
dict_keys(['name', 'symptoms', 'treatment'])
['name', 'symptoms', 'treatment']


In [None]:
# Get all VALUES
print("Values:")
print(disease.values())
print(list(disease.values()))  # Convert to list

Values:
dict_values(['Rubella', 'fever, rash', 'rest'])
['Rubella', 'fever, rash', 'rest']


In [None]:
# Get all KEY-VALUE pairs as tuples
print("Items (key-value pairs):")
print(disease.items())
print(list(disease.items()))  # Convert to list

Items (key-value pairs):
dict_items([('name', 'Rubella'), ('symptoms', 'fever, rash'), ('treatment', 'rest')])
[('name', 'Rubella'), ('symptoms', 'fever, rash'), ('treatment', 'rest')]


In [None]:
# Check if a KEY exists using 'in'
print("name" in disease)       # True
print("prevention" in disease)  # False

True
False
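
One more built-in method that fits here is `.update()`, which adds or overwrites several keys at once from another dictionary. The sketch below uses a made-up variable `extra` for the incoming fields.

```python
disease = {"name": "Rubella", "symptoms": "fever, rash"}

# 'extra' is an illustrative name: fields to merge into the record
extra = {
    "symptoms": "fever, rash, swollen lymph nodes",  # existing key: overwritten
    "prevention": "MMR vaccine",                     # new key: added
}

disease.update(extra)
print(disease)
# {'name': 'Rubella', 'symptoms': 'fever, rash, swollen lymph nodes', 'prevention': 'MMR vaccine'}
```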


---

## Part 5: List of Dictionaries

In RAG systems, we store **multiple documents** as a **list of dictionaries**.

```
documents = [
    {dict1},   # Document 1
    {dict2},   # Document 2
    {dict3}    # Document 3
]
```

### 5.1 Creating a List of Dictionaries

In [None]:
# Create a list containing multiple disease dictionaries
diseases = [
    {
        "name": "Rubella",
        "thai_name": "หัดเยอรมัน",
        "symptoms": "fever, rash"
    },
    {
        "name": "Cholera",
        "thai_name": "อหิวาตกโรค",
        "symptoms": "severe diarrhea"
    },
    {
        "name": "GERD",
        "thai_name": "กรดไหลย้อน",
        "symptoms": "heartburn"
    }
]

print(f"Number of diseases: {len(diseases)}")

Number of diseases: 3


### 5.2 Accessing Items in List of Dictionaries

In [None]:
# Access by index: diseases[index]
# Remember: index starts at 0!

print("First disease (index 0):")
print(diseases[0])  # {'name': 'Rubella', ...}

print("\nSecond disease (index 1):")
print(diseases[1])  # {'name': 'Cholera', ...}

First disease (index 0):
{'name': 'Rubella', 'thai_name': 'หัดเยอรมัน', 'symptoms': 'fever, rash'}

Second disease (index 1):
{'name': 'Cholera', 'thai_name': 'อหิวาตกโรค', 'symptoms': 'severe diarrhea'}


In [None]:
# Access specific value: diseases[index]["key"]

# Get name of first disease
print(diseases[0]["name"])  # Rubella

# Get symptoms of second disease
print(diseases[1]["symptoms"])  # severe diarrhea

# Get Thai name of third disease
print(diseases[2]["thai_name"])  # กรดไหลย้อน

Rubella
severe diarrhea
กรดไหลย้อน


### 5.3 Adding a New Dictionary to the List

In [None]:
# Create a new disease dictionary
new_disease = {
    "name": "Cataract",
    "thai_name": "ต้อกระจก",
    "symptoms": "blurry vision"
}

# Add to the list using .append()
diseases.append(new_disease)

print(f"Number of diseases: {len(diseases)}")
print(f"Last disease: {diseases[-1]['name']}")

Number of diseases: 4
Last disease: Cataract
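
One detail worth knowing: `.append()` stores a reference to the dictionary, not a copy. If you later change `new_disease`, the entry inside the list changes too. The sketch below, using a fresh list for clarity, shows the difference and one way around it with `.copy()` (a shallow copy).

```python
diseases = []
new_disease = {"name": "Cataract", "symptoms": "blurry vision"}

diseases.append(new_disease)          # stores a reference to the same dict
diseases.append(new_disease.copy())   # stores an independent shallow copy

# Change the original dictionary after appending
new_disease["symptoms"] = "cloudy vision"

print(diseases[0]["symptoms"])  # cloudy vision (same object as new_disease)
print(diseases[1]["symptoms"])  # blurry vision (the copy is unaffected)
```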


---

## Part 6: Looping Through Dictionaries

Use `for` loops to process all items.

### 6.1 Loop Through List of Dictionaries

In [None]:
# Print all disease names
print("All Diseases:")
print("-" * 20)

for disease in diseases:
    print(disease["name"])

All Diseases:
--------------------
Rubella
Cholera
GERD
Cataract


In [None]:
# Print name and symptoms for each disease
print("Disease Information:")
print("-" * 40)

for disease in diseases:
    name = disease["name"]
    symptoms = disease["symptoms"]
    print(f"{name}: {symptoms}")

Disease Information:
----------------------------------------
Rubella: fever, rash
Cholera: severe diarrhea
GERD: heartburn
Cataract: blurry vision
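
The same loop pattern can also select records instead of printing all of them. Here is a minimal sketch that collects the names of diseases whose symptoms mention a keyword; the function name `find_by_symptom` is an assumption for this example, and the list is a shortened copy of the one above.

```python
diseases = [
    {"name": "Rubella", "symptoms": "fever, rash"},
    {"name": "Cholera", "symptoms": "severe diarrhea"},
    {"name": "GERD", "symptoms": "heartburn"},
]

def find_by_symptom(records, keyword):
    """Return the names of records whose symptoms contain the keyword."""
    matches = []
    for record in records:
        # .get() keeps the loop safe if a record has no "symptoms" key
        if keyword.lower() in record.get("symptoms", "").lower():
            matches.append(record["name"])
    return matches

print(find_by_symptom(diseases, "fever"))  # ['Rubella']
print(find_by_symptom(diseases, "cough"))  # []
```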


### 6.2 Loop with Index using enumerate()

In [None]:
# enumerate() gives us both index and value
print("Numbered List:")
print("-" * 30)

for i, disease in enumerate(diseases):
    print(f"{i + 1}. {disease['name']}")

Numbered List:
------------------------------
1. Rubella
2. Cholera
3. GERD
4. Cataract


### 6.3 Loop Through Keys and Values of a Single Dictionary

In [None]:
# Take first disease
first_disease = diseases[0]

# Loop through keys only
print("Keys:")
for key in first_disease.keys():
    print(f"  - {key}")

Keys:
  - name
  - thai_name
  - symptoms


In [None]:
# Loop through key-value pairs
print("Key-Value Pairs:")
for key, value in first_disease.items():
    print(f"  {key}: {value}")

Key-Value Pairs:
  name: Rubella
  thai_name: หัดเยอรมัน
  symptoms: fever, rash


---

## Part 7: Why Dictionaries Matter in RAG Systems

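In RAG (Retrieval-Augmented Generation) systems, each document is typically stored as a dictionary: the text goes under one key, and details such as the source file and metadata go under others.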

This is similar to what you'll see in the `Generic-RAG` project!

In [None]:
# Example: Document structure in RAG systems
documents = [
    {
        "id": 1,
        "title": "Rubella (German Measles)",
        "content": "Rubella is a contagious disease caused by the Rubella virus...",
        "source": "md_corpus/1.md",
        "metadata": {
            "author": "MedThai",
            "date": "2024-01-15",
            "language": "Thai"
        }
    },
    {
        "id": 2,
        "title": "Cholera",
        "content": "Cholera is an infection of the small intestine...",
        "source": "md_corpus/2.md",
        "metadata": {
            "author": "MedThai",
            "date": "2024-01-16",
            "language": "Thai"
        }
    }
]

# Simulate searching for a document
print("Search Result:")
print(f"Title: {documents[0]['title']}")
print(f"Source: {documents[0]['source']}")
print(f"Author: {documents[0]['metadata']['author']}")

Search Result:
Title: Rubella (German Measles)
Source: md_corpus/1.md
Author: MedThai
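
To make the "retrieval" idea more concrete, here is a toy sketch that searches a list of document dictionaries by plain keyword matching. This is an assumption-laden illustration only: real RAG systems usually rank documents with embeddings or other scoring methods, and the function name `keyword_search` is invented for this example.

```python
# Shortened copies of the documents above (metadata omitted for brevity)
documents = [
    {
        "id": 1,
        "title": "Rubella (German Measles)",
        "content": "Rubella is a contagious disease caused by the Rubella virus...",
        "source": "md_corpus/1.md",
    },
    {
        "id": 2,
        "title": "Cholera",
        "content": "Cholera is an infection of the small intestine...",
        "source": "md_corpus/2.md",
    },
]

def keyword_search(docs, query):
    """Toy retrieval: return id, title, and source of docs containing the query."""
    query = query.lower()
    results = []
    for doc in docs:
        text = (doc.get("title", "") + " " + doc.get("content", "")).lower()
        if query in text:
            results.append({
                "id": doc["id"],
                "title": doc["title"],
                "source": doc.get("source", "unknown"),
            })
    return results

for hit in keyword_search(documents, "intestine"):
    print(f"{hit['id']}: {hit['title']} ({hit['source']})")
# 2: Cholera (md_corpus/2.md)
```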


---

## Summary

### What You Learned:

| Concept | Syntax | Example |
|---------|--------|--------|
| Create empty dict | `{}` | `d = {}` |
| Create with data | `{k: v}` | `d = {"name": "Rubella"}` |
| Access value | `d[key]` | `d["name"]` → `"Rubella"` |
| Safe access | `d.get(key)` | `d.get("x", "N/A")` |
| Add/Modify | `d[key] = value` | `d["new"] = "value"` |
| Delete | `del d[key]` | `del d["old"]` |
| Get keys | `d.keys()` | `list(d.keys())` → `["name", "symptoms"]` |
| Get values | `d.values()` | `list(d.values())` → `["Rubella", "fever"]` |
| Check key | `key in d` | `"name" in d` → `True` |
| Count | `len(d)` | `len(d)` → `2` |

### Next Step:

Now open `exercise/Lab01_Exercise.ipynb` and complete the 5 exercises!

Good luck!