# Tutorial 3: The Catch Ledger
## Sorting, Ranking, and Finding Extremes

---

### From the Ledger of a Capital Archivist

*To be a Yeller Quarry trapper was to be what miners have been in other climes, figures descending into a darkness of wings and teeth and greasy blood, emerging some weeks later—godwilling—with haul sacks of carcasses for Capital archivists and cages bursting with screeches and growls for traders in the living.*

*And anything eerie could be sold. Dens smoke colors could hang over a ladies' luncheon, giving it just that touch of style. Stone-lidded yeller frogs could be on display in the foyer at a senator's banquet.*

*The senator felt he could hear the scrape if the creatures blinked.*

---

## What You Will Learn

In this tutorial, you will learn to:

1. Sort data by one or more columns
2. Find minimum and maximum values
3. Use `nlargest()` and `nsmallest()` for top/bottom records
4. Rank data with `.rank()`
5. Calculate basic aggregations (sum, mean, count)

By the end, you will be able to answer questions like:
- What was the most valuable catch this season?
- Which creature fetches the highest prices on average?
- How do the catches rank by total value?

---

In [None]:
import pandas as pd

# Load data from GitHub
BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/yeller-quarry-data-science/main/data/"
catches = pd.read_csv(BASE_URL + "catches.csv")
creatures = pd.read_csv(BASE_URL + "creatures.csv")
crews = pd.read_csv(BASE_URL + "crews.csv")

print(f"Catch records loaded: {len(catches)} catches")
print(f"Creature catalog: {len(creatures)} species")
print(f"Crew registry: {len(crews)} crews")

In [None]:
# First look at the catch data
catches.head(10)

Each row is a single catch—creatures extracted from a trap. The columns tell us:
- **catch_id**: Unique identifier
- **trap_id**: Which trap made the catch
- **creature_id**: What species was caught (links to creatures.csv)
- **quantity**: How many
- **date**: When captured
- **crew_id**: Which crew (links to crews.csv)
- **sector**: Where
- **condition**: live or dead
- **destination**: Where the catch is going
- **price_per_unit**: What the Capital is paying
- **notes**: Field observations

---

## Part 1: Basic Sorting Review

We covered sorting in Tutorial 2. Quick review:

In [None]:
# Sort by price, highest first
catches.sort_values('price_per_unit', ascending=False).head(10)

The highest price per unit went to the Maw Beast carcass at 950 units, followed by a live wharver at 890 (unprecedented, the notes say). These are the creatures from the deep tunnels and Grimslew Shore that only the most dangerous crews can reach.
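Sorting by more than one column works the same way: pass a list of column names and, optionally, a matching list of directions. The sketch below uses a small made-up table (the sector names and prices are illustrative assumptions, not rows from `catches.csv`).

```python
import pandas as pd

# Hypothetical toy data, not the real catch ledger
toy = pd.DataFrame({
    "sector": ["Dens", "Shore", "Dens", "Shore"],
    "price_per_unit": [120, 890, 45, 300],
})

# Sort by sector A-Z, then by price high-to-low within each sector
by_sector_price = toy.sort_values(
    ["sector", "price_per_unit"], ascending=[True, False]
)
print(by_sector_price)
```

The first column in the list is the primary sort key; later columns only break ties among rows that share the earlier keys.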

---

## Part 2: Calculating Total Value

Price per unit is useful, but what's the total value of each catch? We need to multiply quantity by price.

### Creating a new column

In [None]:
# Calculate total value for each catch
catches['total_value'] = catches['quantity'] * catches['price_per_unit']

# View the result
catches[['catch_id', 'creature_id', 'quantity', 'price_per_unit', 'total_value']].head(10)

In [None]:
# Now sort by total value
catches.sort_values('total_value', ascending=False).head(10)

The yeller bird catch (CAT0022) jumps to the top—5 birds at 420 each gives a total of 2,100 units. That complete flock for Senator Huilof was worth more than twice the Maw Beast carcass.

Senator Huilof's daughter must have wanted those birds badly.
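The arithmetic behind `total_value` can be checked by hand on a tiny table. Multiplying two columns works row by row, so each row gets its own product. The three rows below are a made-up illustration (only the 5 × 420 figure comes from the text above).

```python
import pandas as pd

# Hypothetical toy rows for checking the multiplication by hand
toy = pd.DataFrame({
    "quantity": [5, 1, 40],
    "price_per_unit": [420, 950, 0],
})

# Element-wise product: row 0 is 5 * 420, row 1 is 1 * 950, row 2 is 40 * 0
toy["total_value"] = toy["quantity"] * toy["price_per_unit"]
print(toy)
```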

---

## Part 3: Finding Extremes

### Using `.max()`, `.min()`, `.idxmax()`, `.idxmin()`

In [None]:
# What's the highest total value?
print(f"Maximum total value: {catches['total_value'].max()}")

# What's the lowest (excluding zeros)?
nonzero = catches[catches['total_value'] > 0]
print(f"Minimum non-zero value: {nonzero['total_value'].min()}")

In [None]:
# Which row has the maximum value?
max_index = catches['total_value'].idxmax()
print(f"Index of maximum: {max_index}")

# Get that entire row
catches.loc[max_index]

The yeller bird flock. Five specimens. Complete set. Delivered to Senator Huilof himself.
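One detail worth keeping in mind: `.idxmax()` and `.idxmin()` return the index *label* of the row, not its position, which is why the lookup uses `.loc`. A minimal sketch with made-up values and letter labels (both are assumptions for illustration):

```python
import pandas as pd

# Hypothetical values with non-numeric index labels
toy = pd.DataFrame(
    {"total_value": [300, 2100, 0, 950]},
    index=["A", "B", "C", "D"],
)

top_label = toy["total_value"].idxmax()     # label of the largest value
bottom_label = toy["total_value"].idxmin()  # label of the smallest value

print(top_label, bottom_label)
print(toy.loc[top_label])  # .loc looks the row up by label
```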

*The yeller bird sings a little, two of the five at once, sometimes three. In the wild, along the Dens, they hop out two or three at a time in the open, scavenging and watching, while the other two or three stay hidden, singing all's fair or something's foul.*

---

## Part 4: Top N with `nlargest()` and `nsmallest()`

These methods are more convenient than sorting and then slicing with `.head()`: each returns the requested rows in a single call.

In [None]:
# Top 5 catches by total value
catches.nlargest(5, 'total_value')

In [None]:
# Bottom 5 (non-zero) catches by value
catches[catches['total_value'] > 0].nsmallest(5, 'total_value')

In [None]:
# Top 10 by quantity (bulk catches)
catches.nlargest(10, 'quantity')

The largest catches by quantity are gritsmuck crawlers (40), quarry moths (22), and cave bats (12). Abundant but cheap—or, in the case of crawlers, worthless. Released rather than transported.
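To see how `nlargest()` relates to sorting, the sketch below compares the two on a made-up table (the catch labels and quantities are illustrative assumptions). It also shows the `keep="all"` option, which keeps every row tied for the last place instead of cutting the tie arbitrarily.

```python
import pandas as pd

# Hypothetical toy data with a tie at quantity 12
toy = pd.DataFrame({
    "catch": ["a", "b", "c", "d"],
    "quantity": [40, 12, 22, 12],
})

# Two ways to get the top 2 rows by quantity
top2_sorted = toy.sort_values("quantity", ascending=False).head(2)
top2_nlargest = toy.nlargest(2, "quantity")

# Asking for the top 3 with keep="all" returns both rows tied at 12
top3_with_ties = toy.nlargest(3, "quantity", keep="all")

print(top2_nlargest)
print(top3_with_ties)
```

With `keep="all"` the result can contain more than `n` rows, so it is best used when dropping a tied record would be misleading.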

---

## Part 5: Summary Statistics

Quick calculations across the whole dataset.

In [None]:
# Total creatures caught this season
print(f"Total specimens caught: {catches['quantity'].sum()}")

# Total revenue generated
print(f"Total revenue: {catches['total_value'].sum()} units")

# Average catch size
print(f"Average catch size: {catches['quantity'].mean():.1f} specimens")

# Average price per unit
print(f"Average price per unit: {catches['price_per_unit'].mean():.1f} units")

In [None]:
# More detailed statistics
catches[['quantity', 'price_per_unit', 'total_value']].describe()

The median price (the `50%` row) is much lower than the mean: most catches are cheap, but the rare expensive specimens pull the average up.
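The gap between mean and median is easy to reproduce on a handful of numbers. The prices below are invented for illustration: four cheap values and one expensive one.

```python
import pandas as pd

# Hypothetical prices: mostly cheap, one rare expensive specimen
prices = pd.Series([10, 12, 15, 20, 950])

mean_price = prices.mean()      # sum of 1007 divided by 5
median_price = prices.median()  # middle value once sorted

print(f"Mean:   {mean_price:.1f}")
print(f"Median: {median_price:.1f}")
```

The single value of 950 drags the mean far above what a typical row looks like, while the median stays with the bulk of the data.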

---

## Part 6: Ranking

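The `.rank()` method assigns a rank to each row. By default, tied values share the average of the ranks they span, so ranks are returned as decimals and can be fractional (for example, 2.5).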

In [None]:
# Rank catches by total value (highest = rank 1)
catches['value_rank'] = catches['total_value'].rank(ascending=False)

# View with ranks
catches[['catch_id', 'creature_id', 'total_value', 'value_rank']].sort_values('value_rank').head(10)

In [None]:
# What's the rank of the Maw Beast catch?
maw_beast = catches[catches['creature_id'] == 'CR005']
maw_beast[['catch_id', 'creature_id', 'total_value', 'value_rank']]

The Maw Beast ranks 2nd by total value—only the yeller bird flock was worth more.
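When two catches have the same value, the `method` argument of `.rank()` decides how the tie is numbered. The sketch below compares three of the available methods on made-up values that contain a tie (the numbers are an assumption for illustration).

```python
import pandas as pd

# Hypothetical values with a tie at 950
values = pd.Series([2100, 950, 950, 300])

avg_rank = values.rank(ascending=False)                    # default: method="average"
min_rank = values.rank(ascending=False, method="min")      # ties take the lowest rank
dense_rank = values.rank(ascending=False, method="dense")  # no gaps after a tie

print(pd.DataFrame({
    "value": values,
    "average": avg_rank,
    "min": min_rank,
    "dense": dense_rank,
}))
```

`method="min"` matches the way sports standings are usually reported (1st, 2nd, 2nd, 4th), while `method="dense"` continues with 3rd.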

---

## Part 7: Filtering Extremes

Filters can use statistics computed from the data itself, such as the mean or a percentile, as their thresholds.

In [None]:
# Find catches worth more than twice the average
avg_value = catches['total_value'].mean()
high_value = catches[catches['total_value'] > 2 * avg_value]
print(f"Average catch value: {avg_value:.1f}")
print(f"Catches worth more than {2 * avg_value:.1f}: {len(high_value)}")
high_value.sort_values('total_value', ascending=False)

In [None]:
# Find catches in the top 10%
threshold = catches['total_value'].quantile(0.90)
print(f"90th percentile value: {threshold}")
top_10_pct = catches[catches['total_value'] >= threshold]
print(f"Catches in top 10%: {len(top_10_pct)}")
top_10_pct.sort_values('total_value', ascending=False)
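
A note on `.quantile()`: by default it interpolates between data points, so the threshold it returns need not be a value that actually appears in the column. A small sketch on ten made-up values:

```python
import pandas as pd

# Hypothetical values: 10, 20, ..., 100
values = pd.Series(range(10, 101, 10))

# The 90th percentile falls between 90 and 100 and is interpolated
threshold = values.quantile(0.90)
top = values[values >= threshold]

print(f"Threshold: {threshold}")
print(top.tolist())
```

Here the threshold lands just above 90, so only the value 100 passes the filter.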

---

## Part 8: The Zero-Value Catches

Some catches have zero value. What are they?

In [None]:
# Find catches with no value
worthless = catches[catches['total_value'] == 0]
print(f"Worthless catches: {len(worthless)}")
worthless

Gritsmuck crawlers and marsh tree crawlers. These creatures have no market value—they're released rather than transported to the Capital.

Also the dead cave bats killed by the snare mechanism. Incidental casualties that no one wanted.
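A boolean mask can also be counted directly, without building a filtered table first. The sketch below uses made-up rows (the creature names and values are illustrative assumptions) to count zero-value records, compute their share, and tally them by creature.

```python
import pandas as pd

# Hypothetical toy data, not the real catch ledger
toy = pd.DataFrame({
    "creature": ["crawler", "moth", "crawler", "bat"],
    "total_value": [0, 66, 0, 24],
})

is_zero = toy["total_value"] == 0

zero_count = is_zero.sum()   # True counts as 1, False as 0
zero_share = is_zero.mean()  # fraction of rows that are True
zero_by_creature = toy.loc[is_zero, "creature"].value_counts()

print(f"Zero-value rows: {zero_count} ({zero_share:.0%})")
print(zero_by_creature)
```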

---

## Part 9: Live vs Dead Analysis

In [None]:
# How does condition affect value?
live = catches[catches['condition'] == 'live']
dead = catches[catches['condition'] == 'dead']

print(f"Live catches: {len(live)}")
print(f"  Average price per unit: {live['price_per_unit'].mean():.1f}")
print(f"  Total revenue: {live['total_value'].sum()}")
print()
print(f"Dead catches: {len(dead)}")
print(f"  Average price per unit: {dead['price_per_unit'].mean():.1f}")
print(f"  Total revenue: {dead['total_value'].sum()}")

Interesting—dead specimens actually have a higher average price. This is because the most valuable dead catch (the Maw Beast at 950) skews the average. In general, live specimens command premiums, but the truly dangerous creatures are worth more dead because they can't kill anyone during transport.
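One way to test a claim like "a single record skews the average" is to drop that record and recompute. The sketch below does this on invented prices, so the numbers are assumptions and will not match the ledger.

```python
import pandas as pd

# Hypothetical toy data: one very expensive dead specimen
toy = pd.DataFrame({
    "condition": ["live", "live", "live", "dead", "dead", "dead"],
    "price_per_unit": [100, 120, 140, 10, 20, 950],
})

live = toy[toy["condition"] == "live"]
dead = toy[toy["condition"] == "dead"]

# Drop the single most expensive dead record, then recompute the mean
dead_trimmed = dead.drop(dead["price_per_unit"].idxmax())

live_mean = live["price_per_unit"].mean()
dead_mean = dead["price_per_unit"].mean()
dead_trimmed_mean = dead_trimmed["price_per_unit"].mean()

print(f"Live mean:              {live_mean:.1f}")
print(f"Dead mean:              {dead_mean:.1f}")
print(f"Dead mean, top removed: {dead_trimmed_mean:.1f}")
```

If the comparison flips once the top record is removed, the higher average was driven by that one outlier rather than by the group as a whole.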

---

## Exercises

### Exercise 1: The Wharver Premium

Find all catches of wharvers (creature_id CR009). What's the difference in price between live and dead specimens?

In [None]:
# Your code here



### Exercise 2: Crew Performance

Which crew generated the most total revenue? (Hint: filter by crew, then sum total_value)

In [None]:
# Your code here
# Try calculating for each crew_id



### Exercise 3: The Senator's Shopping List

Find all catches destined for Senator Huilof. What's the total he spent?

In [None]:
# Your code here



### Exercise 4: Bulk vs Premium

Create two groups: "bulk" catches (quantity >= 10) and "premium" catches (quantity < 10, price_per_unit >= 100). How do their average total values compare?

In [None]:
# Your code here



### Exercise 5: Monthly Timeline

The catches span January through April 1855. Find the total revenue generated in each month.

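*Hint: The date column can be filtered with string comparison. Try `catches[catches['date'] < '1855-02-01']` for January. This works when dates are stored as `YYYY-MM-DD` text, because that format sorts alphabetically in chronological order.*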
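
The sketch below shows why the string comparison in the hint behaves like a date comparison, and how `pd.to_datetime()` offers an alternative. The four dates are made up for illustration and assume the `YYYY-MM-DD` format.

```python
import pandas as pd

# Hypothetical date strings in YYYY-MM-DD format
dates = pd.Series(["1855-01-15", "1855-02-03", "1855-01-30", "1855-04-09"])

# Text comparison runs left to right: year, then month, then day
is_january_text = dates < "1855-02-01"

# Alternative: parse to real dates and test the month directly
parsed = pd.to_datetime(dates)
is_january_parsed = parsed.dt.month == 1

print(is_january_text.tolist())
print(is_january_parsed.tolist())
```

Both approaches agree here; parsing becomes the safer choice if the dates are written in any other format.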

In [None]:
# Your code here



---

## Summary

In this tutorial, you learned:

| Concept | Code |
|---------|------|
| Create new column | `df['new'] = df['a'] * df['b']` |
| Maximum value | `df['column'].max()` |
| Minimum value | `df['column'].min()` |
| Index of max | `df['column'].idxmax()` |
| Index of min | `df['column'].idxmin()` |
| Sum | `df['column'].sum()` |
| Mean | `df['column'].mean()` |
| Top N rows | `df.nlargest(n, 'column')` |
| Bottom N rows | `df.nsmallest(n, 'column')` |
| Rank values | `df['column'].rank()` |
| Percentile | `df['column'].quantile(0.9)` |

---

## Next Tutorial

In **Tutorial 4: Grouping and Aggregation**, you will learn the powerful `groupby()` operation—splitting data into groups, calculating statistics for each group, and combining the results. This is how you answer questions like "which creature type generates the most revenue on average?" or "how does catch success vary by crew?"

*The archivists Grigsu, Yasho, Boffa, and Mink put down the standard line that yellers were not a variety of creature. Anything—provided it was living—could become a yeller grouping.*

*Common bats or cats of the Capital transported to Yeller Quarry had been observed flitting about their boxes in threes, purring off-on in an alternating yeller-style sequence, one cat then another then another, stop and start.*

*The eeriest thing.*

---