# 📒 Dictionaries — Part 1

Comparing lists vs. dictionaries for mapping data.


## Why lists are not convenient

When you have two related lists (e.g., countries and populations), you have to find the index of an element in one list and use it to get the corresponding element from the other list.


In [2]:
# Using lists
pop = [30.55, 2.77, 39.21]
countries = ["afghanistan", "albania", "algeria"]

# Find index of Albania
ind_alb = countries.index("albania")
ind_alb


1

In [3]:
# Get population of Albania
pop[ind_alb]


2.77

### ⚠️ Problems with this approach:
- ❌ Not convenient — you need to search for the index manually.
- ❌ Not intuitive — it's not clear that `pop[ind_alb]` refers to "albania".
- ❌ Error-prone if the two lists get out of sync.

➡️ Better: Use a **dictionary**.


In [5]:
# Using a dictionary
pop_dict = {
    "afghanistan": 30.55,
    "albania": 2.77,
    "algeria": 39.21
}

# Get population of Albania
pop_dict["albania"]


2.77

### ✅ Advantages of dictionaries:
- 👍 Direct mapping from key ("albania") to value (2.77).
- 👍 More readable & intuitive.
- 👍 Safer & easier to maintain.
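
If the data already lives in two parallel lists, one possible way to get the dictionary is to pair them up with `zip`. This is a minimal sketch that reuses the `countries` and `pop` lists from above; the `.get()` lookup with the made-up key `"andorra"` is an addition for illustration and is not part of the original notes.

```python
# Parallel lists from the example above
countries = ["afghanistan", "albania", "algeria"]
pop = [30.55, 2.77, 39.21]

# Pair each country with its population and build a dictionary
pop_dict = dict(zip(countries, pop))

print(pop_dict["albania"])       # 2.77
print(pop_dict.get("andorra"))   # None: the key is missing, but no KeyError is raised
```

Once the dictionary exists, the two lists can no longer drift out of sync, because each value is stored next to its key.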

---


# 📚 Intermediate Python — Recap: Dictionaries

Examples of creating, updating, and deleting dictionary items in Python.

## Creating a dictionary

Keys must be **unique**, and they must be immutable (e.g., strings, numbers, tuples).

In [None]:
# Initial dictionary
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}
world

In [None]:
# Accessing a value
world["albania"]

## Duplicate keys

If the same key appears more than once in a dictionary definition, only the **last value** is kept.

In [None]:
world = {"afghanistan":30.55, "albania":2.77, "algeria":39.21, "albania":2.81}
world

## Immutable keys

Keys must be immutable (more precisely, hashable). Lists cannot be used as keys because they are mutable.

In [None]:
# Valid keys
{0: "hello", True: "dear", "two": "world"}

In [None]:
# Invalid key example
{["just", "to", "test"]: "value"}  # This will raise TypeError
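
The rule above can be checked directly. The following is a small sketch, with a hypothetical `coords` dictionary that is not part of the original notes: a tuple works as a key, while a list raises `TypeError` when the dictionary is built.

```python
# A tuple is immutable and hashable, so it can be a dictionary key
coords = {(52.52, 13.40): "berlin", (48.86, 2.35): "paris"}
print(coords[(48.86, 2.35)])  # paris

# A list is mutable and unhashable, so using it as a key fails
try:
    bad = {["just", "to", "test"]: "value"}
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```

Converting the list to a tuple first, for example `tuple(["just", "to", "test"])`, is a common way around the error when the contents are fixed.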

## Adding a new key-value pair

We can add new items to the dictionary by assignment.

In [None]:
world["sealand"] = 0.000027
world

## Check if a key exists

In [None]:
"sealand" in world

## Updating & Deleting

We can reassign or delete a key.

In [None]:
# Update
world["sealand"] = 0.000028
world

In [None]:
# Delete
del world["sealand"]
world
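
Deleting a key that is not in the dictionary raises `KeyError`, so it can help to check membership first or to use a method that accepts a default. The sketch below is self-contained and uses its own small `world` dictionary; `pop()` and `get()` are standard dictionary methods that the notes above do not cover.

```python
world = {"afghanistan": 30.55, "albania": 2.77, "sealand": 0.000028}

# Check for the key first, then remove it; pop() also returns the removed value
if "sealand" in world:
    removed = world.pop("sealand")
    print(removed)  # 2.8e-05

# Deleting a key that is no longer present raises KeyError
try:
    del world["sealand"]
except KeyError:
    print("sealand is already gone")

# pop() and get() with a default never raise for a missing key
print(world.pop("sealand", None))   # None
print(world.get("sealand", 0.0))    # 0.0
```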

# 📋 List vs Dictionary — Simple Comparison

| 🔷 Feature                  | 🧺 **List**                                   | 📖 **Dictionary**                        |
|-----------------------------|---------------------------------------------|-----------------------------------------|
| 🔍 How to get an item?       | By **position number** (e.g., `list[0]`)   | By **name/key** (e.g., `dict["name"]`) |
| 📋 How is it organized?      | Ordered sequence of items                 | Pairs of **key → value**               |
| 🔁 Does order matter?        | Yes, keeps the order you added            | Insertion order is kept (since Python 3.7), but items are looked up by key |
| 🔑 What identifies an item?  | A number starting from 0                  | A unique, immutable key (string, number, etc.) |
| 🛠 Best for…                 | A simple list of things where order matters | A lookup table where you need to quickly find something by its name |
| 🧪 Example                   | `["apple", "banana", "cherry"]`           | `{"Alice": "555-1234", "Bob": "555-5678"}` |
| 🔗 Use case                  | Shopping list, to-do list                 | Phonebook, dictionary of configurations |


Instructions

- Use chained square brackets to select and print out the capital of France.
- Create a dictionary, named `data`, with the keys `'capital'` and `'population'`. Set them to `'rome'` and `59.83`, respectively.
- Add a new key-value pair to `europe`; the key is `'italy'` and the value is `data`, the dictionary you just built.

In [3]:
# Dictionary of dictionaries
europe = { 'spain': { 'capital':'madrid', 'population':46.77 },
           'france': { 'capital':'paris', 'population':66.03 },
           'germany': { 'capital':'berlin', 'population':80.62 },
           'norway': { 'capital':'oslo', 'population':5.084 } }


# Print out the capital of France
print(europe['france']['capital'])

# Create sub-dictionary data
data = {'capital': 'rome', 'population': 59.83}

# Add data to europe under key 'italy'
europe['italy'] = data

# Print europe
print(europe)

paris
{'spain': {'capital': 'madrid', 'population': 46.77}, 'france': {'capital': 'paris', 'population': 66.03}, 'germany': {'capital': 'berlin', 'population': 80.62}, 'norway': {'capital': 'oslo', 'population': 5.084}, 'italy': {'capital': 'rome', 'population': 59.83}}
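
Because each value in `europe` is itself a dictionary, a loop over `.items()` gives access to the inner dictionary for every country. The following sketch uses a shortened copy of `europe`; the `capitals` name is introduced here only for illustration.

```python
europe = {
    "spain": {"capital": "madrid", "population": 46.77},
    "france": {"capital": "paris", "population": 66.03},
}

# Each value is a dictionary, so index it a second time to reach a field
for country, info in europe.items():
    print(country, "->", info["capital"])

# Collect one field from every inner dictionary into a flat dictionary
capitals = {country: info["capital"] for country, info in europe.items()}
print(capitals)
```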


# 📊 Pandas, Part 1 — Datasets in Python



## 📋 Tabular Dataset Examples
- Data arranged in rows & columns
- Examples:
  - Spreadsheets (Excel, Google Sheets)
  - SQL tables
  - CSV files
  - DataFrames in Python

---

## 📋 2D NumPy Array
- A way to store tabular data in Python
- But:
  - ✅ Efficient for numbers
  - ❌ Only supports **one data type** across all elements
  - ❌ No labels (just numeric indices)
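
The single-type limitation is easy to see in a small experiment. This is a sketch assuming NumPy is installed; the two rows reuse values from the BRICS example later in these notes.

```python
import numpy as np

# Mixing strings and numbers forces every element to one type (here: strings)
mixed = np.array([["Brazil", 8.516], ["Russia", 17.10]])

print(mixed.dtype.kind)  # U, meaning a Unicode string type
print(mixed[0, 1])       # the area is now stored as text, not as a float
```

Arithmetic on the area column no longer works in this array without converting it back to numbers, which is one reason a DataFrame is more convenient for mixed data.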

---

## 📋 pandas!
- 🧰 A **high-level data manipulation tool** for Python
- Created by **Wes McKinney**
- Built on top of **NumPy**
- Core object: **DataFrame**
  - Handles tabular data
  - Supports **mixed data types**
  - Supports **row and column labels**
  - Very flexible and powerful

---

### Summary:
| Feature                     | 2D NumPy Array         | pandas DataFrame |
|-----------------------------|-------------------------|-------------------|
| 🔷 Data type                 | Single type (numeric)  | Mixed types       |
| 🔷 Labels                    | ❌ No labels            | ✅ Row & column labels |
| 🔷 Flexibility               | Low                    | High              |
| 🔷 Best for                  | Numerical computation  | Data analysis & manipulation |

---

✨ Notes:
> If you need **fast computation** on homogeneous numeric data → use NumPy arrays.  
> If you need to work with real-world tabular data → use pandas DataFrames.

---

## 📋 Example DataFrame

|       | country        | capital   | area   | population |
|-------|----------------|-----------|--------|------------|
| **BR** | Brazil         | Brasilia  | 8.516  | 200.40     |
| **RU** | Russia         | Moscow    | 17.100 | 143.50     |
| **IN** | India          | New Delhi | 3.286  | 1252.00    |
| **CH** | China          | Beijing   | 9.597  | 1357.00    |
| **SA** | South Africa   | Pretoria  | 1.221  | 52.98      |

---

## 📋 Create DataFrame from a Dictionary

```python
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    "population": [200.4, 143.5, 1252, 1357, 52.98]
}

brics = pd.DataFrame(data)
brics
```

---

### 📋 Set Custom Index

```python
brics.index = ["BR", "RU", "IN", "CH", "SA"]
brics
```
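
As an alternative to assigning `brics.index` afterwards, the row labels can also be passed when the DataFrame is created. This is a minimal sketch using the `index` parameter of `pd.DataFrame`, with only two of the columns to keep it short.

```python
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
}

# Row labels supplied at construction time instead of assigning brics.index later
brics = pd.DataFrame(data, index=["BR", "RU", "IN", "CH", "SA"])

print(brics.index.tolist())
print(brics.loc["IN", "capital"])  # New Delhi
```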

---

## 📋 Create DataFrame from CSV file

### Example CSV file (`brics.csv`)

```
,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
RU,Russia,Moscow,17.10,143.5
IN,India,New Delhi,3.286,1252
CH,China,Beijing,9.597,1357
SA,South Africa,Pretoria,1.221,52.98
```

---

### Read CSV into DataFrame

```python
brics = pd.read_csv("path/to/brics.csv")
brics
```

Output (with default index):

|   | Unnamed: 0 | country      | capital   | area  | population |
| - | ---------- | ------------ | --------- | ----- | ---------- |
| 0 | BR         | Brazil       | Brasilia  | 8.516 | 200.4      |
| 1 | RU         | Russia       | Moscow    | 17.10 | 143.5      |
| 2 | IN         | India        | New Delhi | 3.286 | 1252       |
| 3 | CH         | China        | Beijing   | 9.597 | 1357       |
| 4 | SA         | South Africa | Pretoria  | 1.221 | 52.98      |

---

### Read CSV with custom index

```python
brics = pd.read_csv("path/to/brics.csv", index_col=0)
brics
```

Output:

|        | country      | capital   | area  | population |
| ------ | ------------ | --------- | ----- | ---------- |
| **BR** | Brazil       | Brasilia  | 8.516 | 200.4      |
| **RU** | Russia       | Moscow    | 17.10 | 143.5      |
| **IN** | India        | New Delhi | 3.286 | 1252       |
| **CH** | China        | Beijing   | 9.597 | 1357       |
| **SA** | South Africa | Pretoria  | 1.221 | 52.98      |

---

✨ Notes:

* `pd.DataFrame()` → create from Python data (like a dict)
* `pd.read_csv()` → load from CSV file
* `index_col=0` → use the first column as the row index
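
To try the effect of `index_col` without a file on disk, the CSV text can be read from an in-memory buffer. This is a sketch under that assumption: `io.StringIO` stands in for `"path/to/brics.csv"`, and the text is the example CSV shown above.

```python
import io
import pandas as pd

csv_text = """,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
RU,Russia,Moscow,17.10,143.5
IN,India,New Delhi,3.286,1252
CH,China,Beijing,9.597,1357
SA,South Africa,Pretoria,1.221,52.98
"""

# Without index_col, the unnamed first column becomes a regular column
default_index = pd.read_csv(io.StringIO(csv_text))
print(default_index.columns.tolist())

# With index_col=0, the first column becomes the row labels
brics = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(brics.index.tolist())
```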

---





# Exercises

In [4]:
# Instructions
# 100 XP
# Import pandas as pd.
# Use the pre-defined lists to create a dictionary called my_dict. There should be three key value pairs:
# key 'country' and value names.
# key 'drives_right' and value dr.
# key 'cars_per_cap' and value cpc.
# Use pd.DataFrame() to turn your dict into a DataFrame called cars.
# Print out cars and see how beautiful it is.

# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

# Import pandas as pd
import pandas as pd

# Create dictionary my_dict with three key:value pairs: my_dict
my_dict = {'country':names, 'drives_right':dr,'cars_per_cap':cpc}

# Build a DataFrame cars from my_dict: cars
cars = pd.DataFrame(my_dict)

# Print cars
print(cars)




         country  drives_right  cars_per_cap
0  United States          True           809
1      Australia         False           731
2          Japan         False           588
3          India         False            18
4         Russia          True           200
5        Morocco          True            70
6          Egypt          True            45


In [5]:
# Instructions
# 100 XP
# Hit Run Code to see that, indeed, the row labels are not correctly set.
# Specify the row labels by setting cars.index equal to row_labels.
# Print out cars again and check if the row labels are correct this time.
import pandas as pd

# Build cars DataFrame
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
cars_dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc }
cars = pd.DataFrame(cars_dict)
print(cars)

# Definition of row_labels
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

# Specify row labels of cars
cars.index = row_labels

# Print cars again
print(cars)



         country  drives_right  cars_per_cap
0  United States          True           809
1      Australia         False           731
2          Japan         False           588
3          India         False            18
4         Russia          True           200
5        Morocco          True            70
6          Egypt          True            45
           country  drives_right  cars_per_cap
US   United States          True           809
AUS      Australia         False           731
JPN          Japan         False           588
IN           India         False            18
RU          Russia          True           200
MOR        Morocco          True            70
EG           Egypt          True            45


In [8]:
# Instructions
# 100 XP
# To import CSV files you still need the pandas package: import it as pd.
# Use pd.read_csv() to import cars.csv data as a DataFrame. Store this DataFrame as cars.
# Print out cars. Does everything look OK?

# Import pandas as pd
import pandas as pd

# Import the cars.csv data: cars
# cars = pd.read_csv('cars.csv')  # commented out because the data file is not available

# Print out cars
# print(cars)  # also commented out

In [11]:
# Instructions
# 100 XP
# Run the code with Run Code and assert that the first column should actually be used as row labels.
# Specify the index_col argument inside pd.read_csv(): set it to 0, so that the first column is used as row labels.
# Has the printout of cars improved now?

# Import pandas as pd
import pandas as pd

# Fix import by including index_col
cars = pd.read_csv('cars.csv', index_col=0)

# Print out cars
print(cars)
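
Since the notes mention that `cars.csv` was not available, one possible workaround is to rebuild the data from the pre-defined lists used in the earlier exercises and round-trip it through CSV text. This is only a sketch: the column order follows the printout shown in the later `loc`/`iloc` exercise, and the real course file may differ in details.

```python
import io
import pandas as pd

# Same values as the pre-defined lists in the earlier exercises
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)

# to_csv() with no path returns the CSV text; pass "cars.csv" to write a file instead
csv_text = cars.to_csv()

# Reading it back with index_col=0 restores the row labels
cars_again = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(cars_again)
```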

# 📊 Intermediate Python — Pandas Part 2

---

## 📋 DataFrame Example

```python
import pandas as pd

brics = pd.read_csv("path/to/brics.csv", index_col=0)
brics
```

```
      country       capital      area   population
BR    Brazil        Brasilia   8.516   200.40
RU    Russia        Moscow    17.100   143.50
IN    India         New Delhi  3.286  1252.00
CH    China         Beijing    9.597  1357.00
SA    South Africa  Pretoria   1.221    52.98
```

---

## 📋 Index and Select Data
- 📄 Basic methods: `[]`
- 📄 Advanced methods: `.loc[]` (label-based), `.iloc[]` (integer-based)

---

## 📋 Column Access

### Single Column (`Series`)
```python
brics["country"]
```

```
BR          Brazil
RU          Russia
IN           India
CH           China
SA    South Africa
Name: country, dtype: object
```

---

### Single Column as `DataFrame`
```python
brics[["country"]]
```

```
         country
BR        Brazil
RU        Russia
IN         India
CH         China
SA  South Africa
```

---

### Multiple Columns
```python
brics[["country", "capital"]]
```

```
         country    capital
BR        Brazil   Brasilia
RU        Russia     Moscow
IN         India  New Delhi
CH         China    Beijing
SA  South Africa   Pretoria
```

---

## 📋 Row Access

### Row Slice
```python
brics[1:4]
```

```
      country       capital      area   population
RU    Russia        Moscow    17.100   143.50
IN    India         New Delhi  3.286  1252.00
CH    China         Beijing    9.597  1357.00
```

---

## 📋 `.loc[]` — Label-based Selection

### Row by Label
```python
brics.loc["RU"]
```

```
country       Russia
capital       Moscow
area           17.1
population    143.5
Name: RU, dtype: object
```

---

### Row(s) by Label
```python
brics.loc[["RU", "IN", "CH"]]
```

```
      country       capital      area   population
RU    Russia        Moscow    17.100   143.50
IN    India         New Delhi  3.286  1252.00
CH    China         Beijing    9.597  1357.00
```

---

### Column(s) by Label
```python
brics.loc[:, ["country", "capital"]]
```

```
         country    capital
BR        Brazil   Brasilia
RU        Russia     Moscow
IN         India  New Delhi
CH         China    Beijing
SA  South Africa   Pretoria
```

---

### Row(s) & Column(s) by Label
```python
brics.loc[["RU", "IN", "CH"], ["country", "capital"]]
```

```
      country     capital
RU    Russia      Moscow
IN     India   New Delhi
CH     China     Beijing
```

---

## 📋 `.iloc[]` — Integer-based Selection

### Row by Position
```python
brics.iloc[[1]]
```

```
     country  capital   area  population
RU   Russia   Moscow   17.1   143.5
```
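
The double brackets in `brics.iloc[[1]]` matter: with single brackets the same row comes back as a Series instead of a one-row DataFrame. A short sketch with a two-row stand-in for `brics`:

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia"], "capital": ["Brasilia", "Moscow"]},
    index=["BR", "RU"],
)

row_as_series = brics.iloc[1]    # single brackets: one row as a Series
row_as_frame = brics.iloc[[1]]   # double brackets: a one-row DataFrame

print(type(row_as_series).__name__)  # Series
print(type(row_as_frame).__name__)   # DataFrame
```

The same distinction applies to `.loc[]`, for example `brics.loc["RU"]` versus `brics.loc[["RU"]]`.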

---

### Multiple Rows by Position
```python
brics.iloc[[1, 2, 3]]
```

```
      country       capital      area   population
RU    Russia        Moscow    17.100   143.50
IN    India         New Delhi  3.286  1252.00
CH    China         Beijing    9.597  1357.00
```

---

### Row(s) & Column(s) by Position
```python
brics.iloc[[1, 2, 3], [0, 1]]
```

```
     country    capital
RU   Russia     Moscow
IN    India  New Delhi
CH    China    Beijing
```

---

### All Rows & Some Columns
```python
brics.iloc[:, [0, 1]]
```

```
         country    capital
BR        Brazil   Brasilia
RU        Russia     Moscow
IN         India  New Delhi
CH         China    Beijing
SA  South Africa   Pretoria
```

---

## 📋 Recap Table

| Method                       | Example                              | Description                     |
|------------------------------|--------------------------------------|---------------------------------|
| Square brackets (columns)    | `brics[["country", "capital"]]`      | Columns only                    |
| Square brackets (row slice)  | `brics[1:4]`                         | Row slice                       |
| `.loc[]` (row)               | `brics.loc[["RU", "IN", "CH"]]`      | Rows by label                   |
| `.loc[]` (column)           | `brics.loc[:, ["country", "capital"]]` | Columns by label              |
| `.loc[]` (row & column)     | `brics.loc[["RU", "IN"], ["country"]]`| Rows & Columns by label         |
| `.iloc[]` (row)             | `brics.iloc[[1,2,3]]`                | Rows by position                |
| `.iloc[]` (column)          | `brics.iloc[:, [0,1]]`               | Columns by position             |
| `.iloc[]` (row & column)    | `brics.iloc[[1,2,3], [0,1]]`         | Rows & Columns by position      |

---

## 📌 Notes:
✅ Use `[]` when accessing columns or slicing rows.  
✅ Use `.loc[]` for label-based indexing.  
✅ Use `.iloc[]` for position-based indexing.
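
One difference between the two indexers that is easy to miss is how slices treat their end point. The sketch below, using a reduced `brics` with two columns, shows that `.iloc[]` slices exclude the end position while `.loc[]` slices include the end label.

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Position-based slices exclude the end position, like list slices
by_position = brics.iloc[1:3]
print(by_position.index.tolist())  # ['RU', 'IN']

# Label-based slices include the end label
by_label = brics.loc["RU":"CH"]
print(by_label.index.tolist())     # ['RU', 'IN', 'CH']
```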

---


# Exercises

In [None]:
# Instructions
# 100 XP
# Use single square brackets to print out the country column of cars as a Pandas Series.
# Use double square brackets to print out the country column of cars as a Pandas DataFrame.
# Use double square brackets to print out a DataFrame with both the country and drives_right columns of cars, in this order.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Print out country column as Pandas Series
print(cars['country'])

# Print out country column as Pandas DataFrame
print(cars[['country']])

# Print out DataFrame with country and drives_right columns
print(cars[['country', 'drives_right']])


In [None]:
# Instructions
# 100 XP
# Select the first 3 observations from cars and print them out.
# Select the fourth, fifth and sixth observation, corresponding to row indexes 3, 4 and 5, and print them out.


# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Print out first 3 observations
print(cars[0:3])

# Print out fourth, fifth and sixth observation
print(cars[3:6])

In [None]:
# Instructions
# 100 XP
# Use loc or iloc to select the observation corresponding to Japan as a Series. The label of this row is JPN, the index is 2. Make sure to print the resulting Series.
# Use loc or iloc to select the observations for Australia and Egypt as a DataFrame. You can find out about the labels/indexes of these rows by inspecting cars. Make sure to print the resulting DataFrame.


 
#     cars_per_cap        country  drives_right
# US            809  United States          True
# AUS           731      Australia         False
# JPN           588          Japan         False
# IN             18          India         False
# RU            200         Russia          True
# MOR            70        Morocco          True
# EG             45          Egypt          True


# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Print out observation for Japan
print(cars.loc['JPN'])

# Print out observations for Australia and Egypt
print(cars.loc[['AUS', 'EG']])




In [None]:
# Instructions
# 100 XP
# Print out the drives_right value of the row corresponding to Morocco (its row label is MOR)
# Print out a sub-DataFrame, containing the observations for Russia and Morocco and the columns country and drives_right.


# loc and iloc (2)
# loc and iloc also allow you to select both rows and columns from a DataFrame. To experiment, try out the following commands. Again, paired commands produce the same result.

# cars.loc['IN', 'cars_per_cap']
# cars.iloc[3, 0]

# cars.loc[['IN', 'RU'], 'cars_per_cap']
# cars.iloc[[3, 4], 0]

# cars.loc[['IN', 'RU'], ['cars_per_cap', 'country']]
# cars.iloc[[3, 4], [0, 1]]



# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Print out drives_right value of Morocco
print(cars.loc['MOR', 'drives_right'])

# Print sub-DataFrame
print(cars.loc[['RU', 'MOR'], ['country', 'drives_right']])
# or cars.iloc[[4,5], [1,2]]


In [None]:
# loc and iloc (3)
# It's also possible to select only columns with loc and iloc. In both cases, you simply put a slice going from beginning to end in front of the comma:

# cars.loc[:, 'country']
# cars.iloc[:, 1]

# cars.loc[:, ['country','drives_right']]
# cars.iloc[:, [1, 2]]

# Instructions
# 100 XP
# Print out the drives_right column as a Series using loc or iloc.
# Print out the drives_right column as a DataFrame using loc or iloc.
# Print out both the cars_per_cap and drives_right column as a DataFrame using loc or iloc.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Print out drives_right column as Series
print(cars.loc[:, 'drives_right'])

# Print out drives_right column as DataFrame
print(cars.loc[:, ['drives_right']])

# Print out cars_per_cap and drives_right as DataFrame
print(cars.loc[:, ['cars_per_cap', 'drives_right']])


# END ...