# Tutorial 8: Northo and Dead River—The Edges
## Where Maps Lie and Data Has Gaps

---

*At the edges of Densworld, things become uncertain.*

*To the north lies Northo—a region of monasteries and hermitages, where religious communities practice negation and silence. The pilgrims who travel there rarely return with clear reports. The data is sparse.*

*Between the Capital and the eastern suburbs flows the Dead River—dark water, broken bridges, shantytown on stilts. The archivists have a saying about it: "Places on maps that don't exist on the ground." The permits office records ferry crossings, but half the ferries never arrive at their documented destinations.*

*"The creature catalog was clean," the apprentice said. "Every row complete. Every column filled."*

*"That was Yeller Quarry," Chief Archivist Mink replied. "The trappers are meticulous. They have to be—their lives depend on it. But at the edges? The data has holes. And the holes are the most interesting part."*

---

## What You'll Learn

By the end of this tutorial, you will:
- Load **multiple DataFrames** from different sources
- **Merge data** from different tables with `pd.merge()`
- Work with **missing values** (NaN)
- Use `.isna()`, `.notna()`, and `.fillna()` to handle gaps
- Understand why data at the edges is incomplete—and what that incompleteness reveals

## Part 1: Loading Multiple DataFrames

*The Archives don't store everything in one table. Places are in one catalog. Journeys are in another. Journey steps—the individual legs of each trip—are in a third. To answer real questions, you must combine them.*

In [None]:
import pandas as pd

# Load data from the Journeys and Graphs course
BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/densworld-datasets/main/data/"

# Load the places catalog
places = pd.read_csv(BASE_URL + "densworld_places.csv")
print(f"Places: {len(places)} locations")

# Load journey steps (individual segments of journeys)
journey_steps = pd.read_csv(BASE_URL + "densworld_journey_steps.csv")
print(f"Journey Steps: {len(journey_steps)} records")

In [None]:
# What does the places data look like?
places.head()

In [None]:
# What columns does it have?
print("Places columns:")
for col in places.columns:
    print(f"  - {col}")

*Every place has a `place_id`, a name, a borough (region), a type, risk levels both physical and symbolic, and a flag for whether it's a boundary location.*

In [None]:
# What does the journey steps data look like?
journey_steps.head()

In [None]:
# Journey steps columns
print("Journey steps columns:")
for col in journey_steps.columns:
    print(f"  - {col}")

*Each journey step records a single leg of a journey: who traveled, where they started, where they ended, how they traveled, and what they encountered. The `origin_place_id` and `destination_place_id` columns connect to the places catalog.*

## Part 2: The Edges—Northo and Dead River

*Let's look at what places exist in Northo and Dead River.*

In [None]:
# What boroughs (regions) do we have?
places["borough"].value_counts()

In [None]:
# Northo places
northo_places = places[places["borough"] == "Northo"]
print(f"Northo has {len(northo_places)} recorded locations:")
northo_places[["name", "place_type", "risk_level_symbolic", "notes"]]

*House of the Closed Mouth. House of Stones. North Ridge Hermitage. The Collapsed Chapel—"badly remembered saint story site." These are places of negation, of turning inward. The symbolic risk is high; the physical risk is moderate. What happens here happens to the soul, not the body.*

In [None]:
# Dead River places
deadriver_places = places[places["borough"] == "Deadriver"]
print(f"Dead River has {len(deadriver_places)} recorded locations:")
deadriver_places[["name", "place_type", "risk_level_physical", "risk_level_symbolic", "notes"]]

*Dark water. Steep banks. Broken bridges. Unofficial ferries. A shantytown on stilts. The archivists categorize Dead River's craving type as "exhaustion"—it drains you. Both physical and symbolic risk run high.*

## Part 3: Merging DataFrames

*The places catalog tells us what exists. The journey steps tell us who went where. To understand how people actually use these places, we need to combine the data.*

### The Concept of Merging

**Merging** (or **joining**) combines two DataFrames based on a shared column—a "key." It's like matching records in two different catalogs.

```
places DataFrame:          journey_steps DataFrame:
┌─────────────┬────────┐   ┌────────────────────┬───────────┐
│ place_id    │ name   │   │ destination_place_id│ traveler  │
├─────────────┼────────┤   ├────────────────────┼───────────┤
│ NORTH_HOUSE │ House  │   │ NORTH_HOUSE        │ pilgrim_1 │
│ DEADR_FERRY │ Ferry  │   │ DEADR_FERRY        │ trader_5  │
└─────────────┴────────┘   └────────────────────┴───────────┘
         │                            │
         └───────────┬────────────────┘
                     │ merge on key
                     ▼
        ┌─────────────┬────────┬───────────┐
        │ place_id    │ name   │ traveler  │
        ├─────────────┼────────┼───────────┤
        │ NORTH_HOUSE │ House  │ pilgrim_1 │
        │ DEADR_FERRY │ Ferry  │ trader_5  │
        └─────────────┴────────┴───────────┘
```
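The diagram above can be reproduced as a minimal sketch with two toy tables. The names `toy_places`, `toy_steps`, and `toy_merged` and the values in them are illustrative only and are not taken from the Densworld datasets.

```python
import pandas as pd

# Toy tables mirroring the diagram (illustrative values, not the real data)
toy_places = pd.DataFrame({
    "place_id": ["NORTH_HOUSE", "DEADR_FERRY"],
    "name": ["House", "Ferry"],
})
toy_steps = pd.DataFrame({
    "destination_place_id": ["NORTH_HOUSE", "DEADR_FERRY"],
    "traveler": ["pilgrim_1", "trader_5"],
})

# Match each step's destination against the catalog's place_id
toy_merged = pd.merge(
    toy_steps,
    toy_places,
    left_on="destination_place_id",
    right_on="place_id",
)

print(toy_merged[["place_id", "name", "traveler"]])
```

Because the key columns have different names, both `destination_place_id` and `place_id` survive in the merged result; selecting columns afterward keeps the output tidy.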

### Using `pd.merge()`

Let's merge journey steps with places to see details about each destination:

In [None]:
# Merge journey_steps with places on the destination
# Left DataFrame: journey_steps (has destination_place_id)
# Right DataFrame: places (has place_id)

steps_with_destination = pd.merge(
    journey_steps,
    places,
    left_on="destination_place_id",  # Column in left DataFrame
    right_on="place_id",              # Column in right DataFrame
    how="left"                        # Keep all rows from left DataFrame
)

print(f"Merged DataFrame: {len(steps_with_destination)} rows")
steps_with_destination.head()

In [None]:
# Now we can see the destination name alongside the journey data
steps_with_destination[["journey_id", "traveler_type", "destination_place_id", "name", "borough_y"]].head(10)
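A column name like `borough_y` appears when both tables contain a non-key column with the same name: pandas appends `_x` to the left table's copy and `_y` to the right table's copy. The sketch below assumes, purely for illustration, that the steps table also carries a `borough` column; the toy values are invented, and the `suffixes` argument shows one way to get more readable names.

```python
import pandas as pd

toy_steps = pd.DataFrame({
    "destination_place_id": ["P1", "P2"],
    "borough": ["Capital", "Capital"],    # hypothetical: a borough stored on the step
})
toy_places = pd.DataFrame({
    "place_id": ["P1", "P2"],
    "borough": ["Northo", "Deadriver"],   # borough of the place itself
})

# Default behavior: overlapping names become borough_x (left) and borough_y (right)
default_names = pd.merge(
    toy_steps, toy_places,
    left_on="destination_place_id", right_on="place_id", how="left",
)
print(list(default_names.columns))

# Custom suffixes make it clear which table each column came from
clear_names = pd.merge(
    toy_steps, toy_places,
    left_on="destination_place_id", right_on="place_id", how="left",
    suffixes=("_step", "_place"),
)
print(list(clear_names.columns))
```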

### Types of Merges (the `how` parameter)

| Type | Description | Keeps |
|------|-------------|-------|
| `how="inner"` | Only matching rows | Rows where key exists in BOTH DataFrames |
| `how="left"` | All from left, matching from right | All left rows, fills NaN where no match |
| `how="right"` | All from right, matching from left | All right rows, fills NaN where no match |
| `how="outer"` | All rows from both | Everything, NaN where no match |
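To see the table in action, here is a small sketch that runs all four merge types on the same pair of toy tables. The tables `left` and `right` are made up for the comparison: keys `B` and `C` appear in both, `A` only on the left, and `D` only on the right.

```python
import pandas as pd

left = pd.DataFrame({"place_id": ["A", "B", "C"], "name": ["Alpha", "Beta", "Gamma"]})
right = pd.DataFrame({"place_id": ["B", "C", "D"], "visit_count": [4, 2, 7]})

row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left, right, on="place_id", how=how)
    row_counts[how] = len(merged)
    print(f"how={how!r}: {len(merged)} rows, keys = {list(merged['place_id'])}")
```

The inner merge keeps two rows, the left and right merges keep three each, and the outer merge keeps all four keys.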

## Part 4: Finding the Gaps—Places with No Journey Records

*"Here's where it gets interesting," Mink said. "Some places exist in the catalog but have no journey records. No one has gone there—or no one has come back to report."*

In [None]:
# How many times was each place visited as a destination?
destination_counts = journey_steps["destination_place_id"].value_counts()
destination_counts.head(10)

In [None]:
# Merge places with destination counts
# Use how="left" to keep ALL places, even if never visited

# First, convert the counts to a DataFrame
dest_counts_df = destination_counts.reset_index()
dest_counts_df.columns = ["place_id", "visit_count"]

# Merge with places
places_with_visits = pd.merge(
    places,
    dest_counts_df,
    on="place_id",
    how="left"  # Keep ALL places
)

places_with_visits[["name", "borough", "visit_count"]].head(10)

## Part 5: Working with Missing Values (NaN)

*Notice those `NaN` values? `NaN` stands for "Not a Number," pandas' way of marking missing data. Places with `NaN` in the `visit_count` column have never been recorded as a destination.*
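Two properties of `NaN` are worth seeing on a small example before filtering on it: it turns an integer column into a float column, and it never compares equal to anything, including itself. The series `counts` below is a made-up stand-in for a `visit_count` column.

```python
import numpy as np
import pandas as pd

counts = pd.Series([3, np.nan, 5], name="visit_count")

print(counts.dtype)               # float64: NaN forces whole numbers to float
print(np.nan == np.nan)           # False: NaN is not equal even to itself
print((counts == np.nan).sum())   # 0: so == cannot find missing values
print(counts.isna().sum())        # 1: .isna() can
```

This is why the tutorial uses `.isna()` rather than an equality test to locate the gaps.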

In [None]:
# Check for missing values
print(f"Missing visit_count values: {places_with_visits['visit_count'].isna().sum()}")
print(f"Total places: {len(places_with_visits)}")

### Using `.isna()` to Find Missing Values

`.isna()` returns `True` for each missing value and `False` for everything else. Use the result as a Boolean mask to keep only the rows with missing data:

In [None]:
# Places with NO recorded visits (visit_count is NaN)
unvisited = places_with_visits[places_with_visits["visit_count"].isna()]
print(f"Places with no journey records: {len(unvisited)}")
unvisited[["name", "borough", "place_type", "notes"]]

*These are the gaps in the record. Places that exist on maps but have no journey data. Some are inner tower levels, difficult to reach. Some are in Northo: the hermitages and houses where visitors rarely go. Some are in the heart of Capital: bureaucratic offices that travelers pass through but don't stay in.*

### Using `.notna()` to Find Non-Missing Values

`.notna()` is the opposite of `.isna()`, returning `True` wherever a value is present:

In [None]:
# Places WITH recorded visits
visited = places_with_visits[places_with_visits["visit_count"].notna()]
print(f"Places with journey records: {len(visited)}")
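Another way to separate matched from unmatched rows is the `indicator=True` option of `pd.merge()`, which adds a `_merge` column reporting where each row came from. This is a sketch on toy data, not the approach the tutorial relies on; `toy_places` and `toy_counts` are invented for the example.

```python
import pandas as pd

toy_places = pd.DataFrame({"place_id": ["A", "B", "C"]})
toy_counts = pd.DataFrame({"place_id": ["A", "C"], "visit_count": [2, 5]})

# _merge is "both" for matched rows and "left_only" for places with no count
flagged = pd.merge(toy_places, toy_counts, on="place_id", how="left", indicator=True)
print(flagged)

unvisited_ids = flagged.loc[flagged["_merge"] == "left_only", "place_id"].tolist()
print(unvisited_ids)
```

The indicator is handy when the right-hand table could itself contain `NaN`, since then a missing value alone would not prove that the row had no match.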

### Using `.fillna()` to Replace Missing Values

Sometimes you want to replace NaN with a default value:

In [None]:
# Replace NaN visit counts with 0
places_with_visits["visit_count_filled"] = places_with_visits["visit_count"].fillna(0)

# Now we can do math on all places
print(f"Total visits across all places: {places_with_visits['visit_count_filled'].sum():.0f}")
print(f"Average visits per place: {places_with_visits['visit_count_filled'].mean():.2f}")

In [None]:
# Places sorted by visit count (including zeros)
places_with_visits.sort_values("visit_count_filled", ascending=False)[["name", "borough", "visit_count_filled"]].head(10)
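Filling with zero is a choice that changes the statistics, because pandas skips `NaN` by default when it averages. The small sketch below uses a made-up series `visits` to show the difference.

```python
import numpy as np
import pandas as pd

visits = pd.Series([4, np.nan, 2, np.nan])

mean_skipping_nan = visits.mean()           # averages only the 2 recorded values
mean_with_zeros = visits.fillna(0).mean()   # averages all 4 places

print(f"Mean ignoring NaN:   {mean_skipping_nan}")
print(f"Mean after fillna(0): {mean_with_zeros}")
```

The first figure answers "how busy are the places anyone visits?" and the second answers "how busy is the average place in the catalog?" Treating an unrecorded visit as zero is reasonable here only because a missing count means no journey step named that place.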

## Part 6: Dead River—Places on Maps That Don't Exist on the Ground

*Let's focus on Dead River. How many journey steps end there?*

In [None]:
# Filter for Dead River destinations
deadriver_visits = places_with_visits[places_with_visits["borough"] == "Deadriver"]
deadriver_visits[["name", "place_type", "visit_count_filled", "notes"]]

In [None]:
# Total visits to Dead River vs other regions
visits_by_borough = places_with_visits.groupby("borough")["visit_count_filled"].sum()
print("Journey step destinations by borough:")
print(visits_by_borough.sort_values(ascending=False))

*Dead River sees traffic—ferries and bridges crossing between Capital and the quarries. But the Broken Deadriver Bridge and the Shantytown? The data suggests people arrive but... do they leave?*

In [None]:
# Let's check: how many journey steps START from Dead River locations?
origin_counts = journey_steps["origin_place_id"].value_counts()
origin_counts_df = origin_counts.reset_index()
origin_counts_df.columns = ["place_id", "departure_count"]

places_with_both = pd.merge(
    places_with_visits,
    origin_counts_df,
    on="place_id",
    how="left"
)

places_with_both["departure_count"] = places_with_both["departure_count"].fillna(0)

# Dead River: arrivals vs departures
deadriver_both = places_with_both[places_with_both["borough"] == "Deadriver"]
deadriver_both[["name", "visit_count_filled", "departure_count"]]

In [None]:
# Calculate the difference: arrivals minus departures
deadriver_both = deadriver_both.copy()
deadriver_both["balance"] = deadriver_both["visit_count_filled"] - deadriver_both["departure_count"]
deadriver_both[["name", "visit_count_filled", "departure_count", "balance"]]

*A positive balance means more people arrive than depart. A negative balance means more depart than arrive. Look at the Broken Deadriver Bridge—a balance of 1. One more person arrived than left. Where did they go?*

*The archivists' saying: "Places on maps that don't exist on the ground." Perhaps some travelers reach Dead River and find something the maps don't show.*
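The arrivals-minus-departures balance can also be sketched without merging, by subtracting two `value_counts()` results directly. `Series.sub()` with `fill_value=0` treats a place that is missing from one side as zero. The three-step table `toy_steps` below is invented; only its column names follow `journey_steps`.

```python
import pandas as pd

toy_steps = pd.DataFrame({
    "origin_place_id":      ["A", "B", "B"],
    "destination_place_id": ["B", "C", "C"],
})

arrivals = toy_steps["destination_place_id"].value_counts()
departures = toy_steps["origin_place_id"].value_counts()

# Align on place id; a place absent from one side counts as 0 there
balance = arrivals.sub(departures, fill_value=0)
print(balance.sort_index())
```

Every step adds one arrival and one departure, so the balances across all places sum to zero. A positive balance at one place is always offset by a negative balance somewhere else.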

## Part 7: Northo—The Silence in the Data

*Northo is different. The pilgrims who travel there seek negation—the closing of the mouth, the stilling of the mind. They don't fill out journey reports.*

In [None]:
# Northo arrivals and departures
northo_both = places_with_both[places_with_both["borough"] == "Northo"]
northo_both[["name", "visit_count_filled", "departure_count", "notes"]]

In [None]:
# Compare: average visits per place by borough
avg_visits_by_borough = places_with_both.groupby("borough")["visit_count_filled"].mean()
print("Average visits per place by borough:")
print(avg_visits_by_borough.sort_values(ascending=False))

*Northo has the lowest average visits per place—even lower than the deep tower levels of Mirado. The hermitages receive almost no recorded traffic. Either no one goes there, or those who go don't come back to file reports.*

*The data has holes. And the holes are the most interesting part.*

## Part 8: Checking for Missing Values Across a DataFrame

Let's look at broader patterns of missing data:

In [None]:
# Count missing values in each column of the original places DataFrame
print("Missing values in places DataFrame:")
print(places.isna().sum())

In [None]:
# Check journey_steps for missing values
print("Missing values in journey_steps DataFrame:")
print(journey_steps.isna().sum())

In [None]:
# Percentage of missing values
missing_pct = (places.isna().sum() / len(places)) * 100
print("Percentage missing by column:")
print(missing_pct[missing_pct > 0])
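The percentage calculation above has a common shorthand: because `True` counts as 1 and `False` as 0, the mean of `.isna()` is already the fraction missing. The toy table `toy` below is invented to compare the two forms.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "name": ["Ferry", "Bridge", "Chapel", "Hermitage"],
    "notes": ["unofficial", np.nan, np.nan, np.nan],
})

pct_long = (toy.isna().sum() / len(toy)) * 100   # count missing, divide by rows
pct_short = toy.isna().mean() * 100              # mean of True/False is the fraction

print(pct_long)
print(pct_short)
```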

## Summary: What We Learned About the Edges

*The apprentice looked up from the merged tables and missing value counts.*

*"The data at the edges is incomplete," she said.*

*"Always," Mink agreed. "And that incompleteness tells a story. Dead River sees traffic—people cross, ferries run—but the balance doesn't add up. More arrive at some places than depart. Northo has places in the catalog that no journey records mention. Either the pilgrims don't travel there, or they don't file reports."*

*"Or they don't come back."*

*"Or that," Mink said. "The data doesn't judge. It only records what's reported. The silences are as meaningful as the numbers."*

## Practice Exercises

### Exercise 1: Yeller Quarry Analysis

Filter `places_with_both` to show only Yeller Quarry locations. Which location in Yeller has the most visits? Which has the fewest (but more than zero)?

In [None]:
# Your code here:
# Filter for Yeller borough
# Sort by visit_count_filled
# Display name and visit count


### Exercise 2: Finding Traveler Types

The `journey_steps` DataFrame has a `traveler_type` column. Use `.value_counts()` to see what types of travelers exist. Then filter for just `novice` travelers and see how many journey steps they've taken.

In [None]:
# Your code here:
# 1. Count traveler types


# 2. Filter for novice travelers and count


### Exercise 3: The Capital's Traffic

Capital is the central hub. Calculate the arrival/departure balance for each Capital location (as we did for Dead River). Are there any Capital locations with a significant imbalance?

In [None]:
# Your code here:
# Filter places_with_both for Capital borough
# Calculate balance (visits - departures)
# Display name, visits, departures, balance


### Exercise 4: High-Risk Destinations

Merge `journey_steps` with `places` (on `destination_place_id` = `place_id`) and find the 10 most common destinations that have a `risk_level_symbolic` of 8 or higher. These are the spiritually dangerous places that travelers still visit.

In [None]:
# Your code here:
# 1. Merge journey_steps with places on destination


# 2. Filter for high symbolic risk (>= 8)


# 3. Count visits per destination name


# 4. Show top 10


## What's Next?

In **Tutorial 9: Visualization**, you'll learn:
- How to create **bar charts** and **scatter plots**
- How to customize charts with titles, labels, and colors
- How to visualize the patterns we've discovered in Densworld data
- The concept of **cartography for data**—mapping what exists

---

*"Not everything that's named exists," Mink said. "Not everything that exists has a name."*

*The apprentice thought of the Broken Deadriver Bridge—one more arrival than departure. The North Ridge Hermitage—no journey records at all. The gaps in the data.*

*"The ore tells stories about these places," she said. "People who disappeared at Dead River. Pilgrims who entered Northo monasteries and never emerged."*

*"And the data confirms it," Mink said. "In its own silent way. The numbers don't add up. The records have holes. That's not a flaw in the data. That's the data telling you something the ore already knew."*

*At the edges of the world, things become uncertain. The data knows.*

---

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/buildLittleWorlds/gateway-to-densworld/blob/main/notebooks/tutorial_09_visualization.ipynb) **Next: Tutorial 9 - Visualization**
