# Practice Exercises: Basic pandas — Questions & Solutions

These exercises were generated by ChatGPT using our course notebooks as input. They were briefly reviewed to confirm that they are correct and fit the topics and level of the course, but they are not necessarily a comprehensive set of examples for those topics.

In [None]:
import pandas as pd


## 1) Create a DataFrame from a dictionary
Create a DataFrame called `cities` from this dictionary:

```python
data = {
    "city": ["Toronto", "Ottawa", "Hamilton", "London"],
    "population": [2790000, 1017000, 569000, 423000],
    "province": ["ON", "ON", "ON", "ON"]
}
```

In [None]:
data = {
    "city": ["Toronto", "Ottawa", "Hamilton", "London"],
    "population": [2790000, 1017000, 569000, 423000],
    "province": ["ON", "ON", "ON", "ON"]
}

# Create the DataFrame from the dictionary
cities = pd.DataFrame(data)

cities


## 2) Create a DataFrame from a list of dictionaries
Create a DataFrame called `trips` and show the first 2 rows.

In [None]:
rows = [
    {"mode": "car", "minutes": 35},
    {"mode": "transit", "minutes": 55},
    {"mode": "walk", "minutes": 15},
    {"mode": "bike", "minutes": 22},
]

# Each dictionary becomes one row
trips = pd.DataFrame(rows)

trips.head(2)
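
As a cross-check, the same table can also be built from a dictionary of lists, the form used in Question 1. The sketch below reuses the `rows` data from this exercise; the names `trips_from_rows`, `columns`, and `trips_from_columns` are illustrative and not part of the exercise.

```python
import pandas as pd

# List of dictionaries: each dictionary becomes one row
rows = [
    {"mode": "car", "minutes": 35},
    {"mode": "transit", "minutes": 55},
    {"mode": "walk", "minutes": 15},
    {"mode": "bike", "minutes": 22},
]
trips_from_rows = pd.DataFrame(rows)

# Dictionary of lists: each key becomes one column
columns = {
    "mode": ["car", "transit", "walk", "bike"],
    "minutes": [35, 55, 15, 22],
}
trips_from_columns = pd.DataFrame(columns)

# equals() compares the values, shape, and dtypes of the two DataFrames
same = trips_from_rows.equals(trips_from_columns)
print(same)
```

Either form works; which one is more convenient usually depends on whether the data arrives row by row or column by column.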


## 3) Select one column and compute a statistic
Using the `cities` DataFrame from Question 1:
- Select the `population` column.
- Compute and print its **mean** and **max**.

In [None]:
# Select one column (this is a pandas Series)
pop = cities["population"]

# Aggregating methods on a Series
mean_pop = pop.mean()
max_pop = pop.max()

mean_pop, max_pop
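
To see what `mean()` is doing, the result can be checked by hand: add up the values and divide by how many there are. This is a small sketch using the `cities` data from Question 1; `manual_mean` is an illustrative name.

```python
import pandas as pd

cities = pd.DataFrame({
    "city": ["Toronto", "Ottawa", "Hamilton", "London"],
    "population": [2790000, 1017000, 569000, 423000],
    "province": ["ON", "ON", "ON", "ON"],
})

pop = cities["population"]

# Mean by hand: total divided by the number of values
manual_mean = pop.sum() / len(pop)

print(manual_mean)  # 1199750.0
print(pop.max())    # 2790000
```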


## 4) Select multiple columns (sub-DataFrame)
Using `cities`, create a new DataFrame called `city_sizes` containing only:
- `city`
- `population`

Print `city_sizes`.

In [None]:
# Select multiple columns using a list of column names
city_sizes = cities[["city", "population"]]

city_sizes
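
The number of brackets matters here. A single column name in single brackets returns a Series, while a list of names in double brackets returns a DataFrame, even when the list holds only one name. A minimal sketch, assuming the `cities` data from Question 1 (`as_series` and `as_frame` are illustrative names):

```python
import pandas as pd

cities = pd.DataFrame({
    "city": ["Toronto", "Ottawa", "Hamilton", "London"],
    "population": [2790000, 1017000, 569000, 423000],
    "province": ["ON", "ON", "ON", "ON"],
})

as_series = cities["city"]    # single brackets: a Series
as_frame = cities[["city"]]   # double brackets: a one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (4, 1)
```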


## 5) Boolean filtering: one condition
Using `cities`, create a DataFrame `big_cities` containing only rows where `population` is **greater than 500000**.
Print `big_cities`.

In [None]:
# Create a boolean Series (True/False for each row)
is_big = cities["population"] > 500000

# Use the boolean Series to filter the DataFrame
big_cities = cities[is_big]

big_cities
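
The two steps above can also be written as a single line, with the condition placed directly inside the brackets. A sketch using the `cities` data from Question 1:

```python
import pandas as pd

cities = pd.DataFrame({
    "city": ["Toronto", "Ottawa", "Hamilton", "London"],
    "population": [2790000, 1017000, 569000, 423000],
    "province": ["ON", "ON", "ON", "ON"],
})

# Condition written inline: keep rows where it evaluates to True
big_cities = cities[cities["population"] > 500000]

print(big_cities["city"].tolist())  # ['Toronto', 'Ottawa', 'Hamilton']
```

London (423000) is the only row that does not pass the condition, so it is the only one dropped.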


## 6) Boolean filtering: two conditions with `&`
Create this DataFrame and call it `housing`:

```python
data = {
    "neighbourhood": ["Annex", "Scarborough", "North York", "Downtown", "Etobicoke"],
    "median_rent": [2400, 2000, 2100, 2700, 2200],
    "subway_access": [True, False, True, True, False]
}
```

Filter for rows where:
- `median_rent` is **at least 2200**
**and**
- `subway_access` is `True`

Store the result in `expensive_and_connected`.

In [None]:
data = {
    "neighbourhood": ["Annex", "Scarborough", "North York", "Downtown", "Etobicoke"],
    "median_rent": [2400, 2000, 2100, 2700, 2200],
    "subway_access": [True, False, True, True, False]
}

housing = pd.DataFrame(data)

# Store each condition as its own boolean Series
rent_ok = housing["median_rent"] >= 2200
has_subway = housing["subway_access"]  # already boolean, so no comparison is needed

# Combine conditions with & (AND). Parentheses are only needed when the comparisons are written inline.
expensive_and_connected = housing[rent_ok & has_subway]

expensive_and_connected
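
When a comparison is written inline instead of being stored in a variable first, it needs its own parentheses, because Python's `&` binds more tightly than `>=`. A minimal sketch of the inline form, using the `housing` data from this exercise:

```python
import pandas as pd

housing = pd.DataFrame({
    "neighbourhood": ["Annex", "Scarborough", "North York", "Downtown", "Etobicoke"],
    "median_rent": [2400, 2000, 2100, 2700, 2200],
    "subway_access": [True, False, True, True, False],
})

# The comparison is wrapped in parentheses so it is evaluated before &
expensive_and_connected = housing[
    (housing["median_rent"] >= 2200) & housing["subway_access"]
]

print(expensive_and_connected["neighbourhood"].tolist())  # ['Annex', 'Downtown']
```

Etobicoke meets the rent condition (2200) but has no subway access, so it is excluded.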


## 7) Boolean filtering: OR with `|`
Using `housing`, filter for rows where:
- `median_rent` is **less than 2100**
**or**
- `subway_access` is `False`

Store the result in `more_affordable_or_not_connected`.

In [None]:
# Two boolean Series
affordable = housing["median_rent"] < 2100
not_connected = housing["subway_access"] == False

# Combine with | (OR). Parentheses are only needed when the comparisons are written inline.
more_affordable_or_not_connected = housing[affordable | not_connected]

more_affordable_or_not_connected
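
An alternative to comparing with `False` is the `~` operator, which flips every value in a boolean Series. Whether `~` has been covered in the course is an assumption here, so treat this as an optional sketch; it uses the `housing` data from Question 6.

```python
import pandas as pd

housing = pd.DataFrame({
    "neighbourhood": ["Annex", "Scarborough", "North York", "Downtown", "Etobicoke"],
    "median_rent": [2400, 2000, 2100, 2700, 2200],
    "subway_access": [True, False, True, True, False],
})

affordable = housing["median_rent"] < 2100

# ~ negates a boolean Series: True becomes False and False becomes True
not_connected = ~housing["subway_access"]

more_affordable_or_not_connected = housing[affordable | not_connected]

print(more_affordable_or_not_connected["neighbourhood"].tolist())
# ['Scarborough', 'Etobicoke']
```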


## 8) Add a new column using a boolean condition
Using `housing`, add a new column called `high_rent` that is `True` when `median_rent` is **greater than 2300**, otherwise `False`.
Then print the DataFrame.

In [None]:
# This creates a boolean Series directly
housing["high_rent"] = housing["median_rent"] > 2300

housing
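
A boolean column is also convenient for counting: `True` counts as 1 and `False` as 0, so summing the column gives the number of rows that meet the condition. A small sketch with the `housing` data from Question 6 (`n_high` is an illustrative name):

```python
import pandas as pd

housing = pd.DataFrame({
    "neighbourhood": ["Annex", "Scarborough", "North York", "Downtown", "Etobicoke"],
    "median_rent": [2400, 2000, 2100, 2700, 2200],
    "subway_access": [True, False, True, True, False],
})

housing["high_rent"] = housing["median_rent"] > 2300

# Summing a boolean column counts the True values
n_high = int(housing["high_rent"].sum())

print(n_high)  # 2
```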


## 9) Rename columns using a dictionary
Create this DataFrame:

```python
data = {
    "CITY": ["Toronto", "Ottawa", "Hamilton"],
    "POP": [2790000, 1017000, 569000],
    "PROV": ["ON", "ON", "ON"]
}
```
Call it `cities_raw`.

Rename the columns so that:
- `CITY` → `city`
- `POP` → `population`
- `PROV` → `province`

Store the result in a new DataFrame called `cities`.

In [None]:
data = {
    "CITY": ["Toronto", "Ottawa", "Hamilton"],
    "POP": [2790000, 1017000, 569000],
    "PROV": ["ON", "ON", "ON"]
}

cities_raw = pd.DataFrame(data)

# Rename columns using a dictionary that maps old names to new names
cities = cities_raw.rename(columns={
    "CITY": "city",
    "POP": "population",
    "PROV": "province"
})

cities
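
Note that `rename` returns a new DataFrame and leaves the original untouched, which is why the result is assigned to `cities`. A short sketch that checks both sets of column names:

```python
import pandas as pd

cities_raw = pd.DataFrame({
    "CITY": ["Toronto", "Ottawa", "Hamilton"],
    "POP": [2790000, 1017000, 569000],
    "PROV": ["ON", "ON", "ON"],
})

cities = cities_raw.rename(columns={
    "CITY": "city",
    "POP": "population",
    "PROV": "province",
})

# The original keeps its old names; only the returned copy is renamed
print(cities_raw.columns.tolist())  # ['CITY', 'POP', 'PROV']
print(cities.columns.tolist())      # ['city', 'population', 'province']
```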


## 10) Create a new column using arithmetic
Using the `cities` DataFrame from Question 9:

Create a new column called `population_millions` that contains the population divided by **1,000,000**.

Print the updated DataFrame.

In [None]:
# Divide the population column by 1,000,000
# pandas applies this operation to every row automatically
cities["population_millions"] = cities["population"] / 1_000_000

cities
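
To make "applies this operation to every row" concrete, the sketch below compares the column arithmetic with an explicit loop over the same values. It assumes the renamed `cities` data from Question 9; `by_loop` is an illustrative name.

```python
import pandas as pd

cities = pd.DataFrame({
    "city": ["Toronto", "Ottawa", "Hamilton"],
    "population": [2790000, 1017000, 569000],
    "province": ["ON", "ON", "ON"],
})

# Column arithmetic: one expression covers every row
cities["population_millions"] = cities["population"] / 1_000_000

# The same calculation written as an explicit loop, for comparison only
by_loop = [p / 1_000_000 for p in cities["population"]]

print(cities["population_millions"].tolist())  # [2.79, 1.017, 0.569]
```

The column form is shorter and is the usual way to write this in pandas; the loop is shown only to illustrate what happens row by row.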


## 11) Create a new column using a boolean condition
Using the `housing` DataFrame from Question 6:

Set the `high_rent` column (overwriting the values from Question 8) so that it is:
- `True` if `median_rent` is **greater than or equal to 2500**
- `False` otherwise

Print the updated DataFrame.

In [None]:
# This comparison creates a boolean (True/False) value for each row
housing["high_rent"] = housing["median_rent"] >= 2500

housing
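
Assigning to a column name that already exists replaces that column rather than adding a second one. A minimal sketch, assuming the `housing` data from Question 6 and the earlier assignment from Question 8:

```python
import pandas as pd

housing = pd.DataFrame({
    "neighbourhood": ["Annex", "Scarborough", "North York", "Downtown", "Etobicoke"],
    "median_rent": [2400, 2000, 2100, 2700, 2200],
    "subway_access": [True, False, True, True, False],
})

# Question 8: threshold of more than 2300
housing["high_rent"] = housing["median_rent"] > 2300
print(housing["high_rent"].tolist())  # [True, False, False, True, False]

# Question 11: same column name, stricter threshold, so the values are replaced
housing["high_rent"] = housing["median_rent"] >= 2500
print(housing["high_rent"].tolist())  # [False, False, False, True, False]

# There is still only one high_rent column
print(housing.columns.tolist())
# ['neighbourhood', 'median_rent', 'subway_access', 'high_rent']
```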