# Tutorial 1: The Creature Catalog of Yeller Quarry
## Loading and Exploring Data with Pandas

---

### A Note from the Capital Archives

*To the apprentice archivist:*

*You have been assigned to the Yeller Quarry desk. This is not a punishment, whatever your colleagues may whisper. The Quarry materials are among the most valuable in the Senate collection, and the most dangerous to mishandle.*

*Before you can work with expedition reports, trade manifests, or incident logs, you must first master the creature catalog. Every archivist who has tried to make sense of Quarry data without understanding what crawls, flies, and slithers in those caves has produced work that was useless at best and, at worst, got trappers killed.*

*The catalog you will study was compiled by the archivists Grigsu, Yasho, Boffa, and Mink in their book of essays about the Yeller Quarry—maps by Grigsu, illustrations by Boffa. It remains the standard reference, though field crews often have... opinions... about its accuracy.*

*Begin your study. The Senator expects competence.*

—*Chief Archivist, Capital Senate House*

---

## What You Will Learn

In this tutorial, you will learn to:

1. Import the pandas library
2. Load a CSV file into a DataFrame
3. Inspect the basic properties of your data
4. Understand data types
5. Access specific rows and columns

By the end, you will be able to answer questions like:
- How many creature species are cataloged from the Quarry?
- Which creatures are most dangerous?
- What is the metal content of the infamous wharvers?

---

## Part 1: Importing Pandas

Every journey into data begins the same way: importing the tools you need.

**Pandas** is a Python library for working with structured data—tables with rows and columns, like the ledgers in an archivist's shop. The convention is to import it with the alias `pd`.

In [None]:
import pandas as pd

# Confirm the import worked
print(f"Pandas version: {pd.__version__}")
print("The Archives are open.")

---

## Part 2: Loading the Creature Catalog

The creature catalog is stored in a **CSV file** (Comma-Separated Values). This is one of the most common formats for tabular data: each row is a creature, each column is an attribute.

We use `pd.read_csv()` to load it into a **DataFrame**, which is pandas' primary data structure.

In [None]:
# Load the creature catalog
creatures = pd.read_csv('data/creatures.csv')

# The variable 'creatures' now contains our DataFrame
print("Catalog loaded successfully.")
print(f"Type: {type(creatures)}")
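
If you want to see what `pd.read_csv()` does before touching the real file, the sketch below parses a few CSV lines from an in-memory string. It reuses four of the catalog's column names, but the rows, IDs, and values are invented placeholders, not entries from `data/creatures.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the catalog file: real column names,
# but every row below is invented for illustration.
csv_text = """creature_id,common_name,danger_rating,metal_content_pct
X001,Example Bird,3,12.5
X002,Example Lizard,6,30.0
X003,Example Moth,1,4.25
"""

# read_csv accepts a file path or any file-like object;
# StringIO wraps the string so it can be read like a file.
sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample))
print(sample)
```

The first line of the text becomes the column names, and each later line becomes one row.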

---

## Part 3: First Look at the Data

An archivist never works blind. Before analyzing anything, you must *look* at what you have.

### The `.head()` method

Shows the first few rows of your DataFrame (default: 5 rows).

In [None]:
# View the first 5 creatures in the catalog
creatures.head()

The first creature listed is the **Leatherback Burrower** (*Avem subterrus*). Note that it has a danger rating of 3 and a metal content of 12.3%. These birds fly in caves and are prized for their wing-leather.

Already you can see the kind of information the archivists have collected: names, categories, physical properties, habitat, and notes from field observations.

### The `.tail()` method

Shows the last few rows.

In [None]:
# View the last 5 creatures
creatures.tail()

At the bottom of the catalog, you'll find creatures like the **Deep Borer** and **Quarry Moth**. The Deep Borer creates the tunnel systems that trappers follow into the darkness.
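
Both methods accept a row count when five is not what you want. A minimal sketch on a hypothetical five-row frame (the names and ratings are placeholders, not catalog entries):

```python
import pandas as pd

# Hypothetical five-row frame standing in for the catalog
sample = pd.DataFrame({
    "common_name": ["A", "B", "C", "D", "E"],
    "danger_rating": [3, 6, 1, 8, 2],
})

first_two = sample.head(2)   # rows at positions 0 and 1
last_three = sample.tail(3)  # rows at positions 2, 3 and 4

print(first_two)
print(last_three)
```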

---

## Part 4: Understanding the Shape of Your Data

How many creatures? How many attributes? The `.shape` property tells you.

In [None]:
# Get the dimensions: (rows, columns)
print(f"Shape: {creatures.shape}")
print(f"Number of creatures cataloged: {creatures.shape[0]}")
print(f"Number of attributes recorded: {creatures.shape[1]}")

The archivists have cataloged 25 distinct species, each described by 14 attributes.

### Column Names

What are those 14 attributes? The `.columns` property lists them.

In [None]:
# List all column names
print("Columns in the creature catalog:")
for col in creatures.columns:
    print(f"  - {col}")

These columns tell us:
- **creature_id**: A unique identifier (CR001, CR002, etc.)
- **common_name**: What the trappers call it
- **scientific_name**: The archivists' formal classification
- **category**: Broad type (bird, reptile, mammal, etc.)
- **subcategory**: More specific classification
- **metal_content_pct**: Percentage of metallic matter in the creature's body (a distinctive feature of Quarry fauna)
- **danger_rating**: Scale of 1 to 10 for how likely the creature is to kill you
- **avg_weight_kg**: Average specimen weight
- **yeller_compatible**: Whether the creature can become part of a "yeller group" (more on this later)
- **typical_depth_m**: How deep in the caves they're usually found
- **primary_habitat**: Where they live
- **conservation_status**: How common they are
- **capital_demand**: How much the Capital wants them
- **notes**: Field observations
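
The shape and the column labels can also be used in your own code rather than just printed. The sketch below assumes a hypothetical three-column frame with invented rows. It unpacks `.shape`, converts `.columns` to a plain list, and checks whether a column exists.

```python
import pandas as pd

# Hypothetical miniature catalog; rows are invented placeholders
sample = pd.DataFrame({
    "creature_id": ["X001", "X002"],
    "common_name": ["Example Bird", "Example Lizard"],
    "danger_rating": [3, 6],
})

n_rows, n_cols = sample.shape            # shape is a (rows, columns) tuple
column_list = sample.columns.tolist()    # plain Python list of column names

print(n_rows, n_cols)
print(column_list)
print("danger_rating" in sample.columns)  # membership test on column labels
print("wing_span_cm" in sample.columns)   # a name this frame does not have
```

Checking for a column before using it can spare you a `KeyError` when a file turns out to have different headers than you expected.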

---

## Part 5: Data Types

Not all data is the same. Numbers behave differently than text. The `.dtypes` property shows the data type of each column.

In [None]:
# Check data types
creatures.dtypes

You'll see types like:
- `object`: Usually text (strings); recent pandas versions may report a dedicated string type such as `str` instead
- `int64`: Whole numbers (integers)
- `float64`: Decimal numbers
- `bool`: True/False values

Notice that `metal_content_pct`, `danger_rating`, `avg_weight_kg`, and `typical_depth_m` are numeric—we can do math with them. Meanwhile, `common_name` and `category` are text—we can filter and group by them.
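
To make the numeric/text distinction concrete, here is a small sketch on a hypothetical frame (names and weights are invented). It picks out the numeric columns with `select_dtypes`, does arithmetic on one of them, and applies a string method to a text column through the `.str` accessor.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the catalog
sample = pd.DataFrame({
    "common_name": ["Example Bird", "Example Lizard", "Example Moth"],
    "danger_rating": [3, 6, 1],
    "avg_weight_kg": [1.5, 12.0, 0.25],
})

# Keep only the numeric columns (integers and floats)
numeric = sample.select_dtypes(include="number")
print(numeric.columns.tolist())

# Arithmetic applies to every value in a numeric column at once
weight_g = sample["avg_weight_kg"] * 1000
print(weight_g.tolist())

# Text columns use string methods through the .str accessor
upper_names = sample["common_name"].str.upper()
print(upper_names.tolist())
```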

---

## Part 6: The `.info()` Method

For a comprehensive summary of your DataFrame, use `.info()`. This shows column names, data types, and how many non-null (non-missing) values exist.

In [None]:
# Comprehensive DataFrame info
creatures.info()

This tells us at a glance whether we have missing data (we'll deal with that in later tutorials) and how much memory our DataFrame uses.
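
The non-null counts that `.info()` prints can also be obtained as numbers. The sketch below uses a hypothetical frame with one depth deliberately left blank, then counts missing values per column with `.isna().sum()`.

```python
import pandas as pd

# Hypothetical rows; the middle creature has no recorded depth
sample = pd.DataFrame({
    "common_name": ["Example Bird", "Example Lizard", "Example Moth"],
    "typical_depth_m": [10.0, None, 45.0],
})

# info() prints its report directly and returns nothing
sample.info()

# isna() marks missing cells as True; sum() counts them per column
missing_per_column = sample.isna().sum()
print(missing_per_column)
```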

---

## Part 7: Accessing Columns

To work with a single column, you can access it by name using square brackets or dot notation.

### Method 1: Bracket notation (always works)

In [None]:
# Get just the common names
creatures['common_name']

### Method 2: Dot notation (shorter, but fails if the column name has spaces or matches a DataFrame attribute)

In [None]:
# Same result
creatures.common_name
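
There are two common cases where dot notation does not give you the column: a name containing a space, and a name that collides with something a DataFrame already has, such as its `count` method. The column names in this sketch are hypothetical, chosen only to trigger both cases.

```python
import pandas as pd

# Hypothetical column names picked to show both pitfalls
sample = pd.DataFrame({
    "count": [4, 7],
    "field notes": ["seen near shore", "seen in tunnel"],
})

# Bracket notation returns the column's values
print(sample["count"].tolist())

# Dot notation finds the built-in DataFrame.count method instead
print(callable(sample.count))

# A name with a space cannot be written after a dot, so brackets are required
print(sample["field notes"].tolist())
```

This is why bracket notation is the safer habit when you do not control the column names.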

A single column is called a **Series**. It's like a list with an index.

In [None]:
# What type is a single column?
print(type(creatures['common_name']))

### Selecting Multiple Columns

Pass a list of column names to get multiple columns back as a new DataFrame.

In [None]:
# Select just name, danger, and metal content
creatures[['common_name', 'danger_rating', 'metal_content_pct']]
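
The number of brackets decides what you get back. A minimal sketch on a hypothetical two-row frame (values invented): a single name returns a Series, a list with one name returns a one-column DataFrame, and the columns come back in the order the list gives them.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the catalog
sample = pd.DataFrame({
    "common_name": ["Example Bird", "Example Lizard"],
    "danger_rating": [3, 6],
    "metal_content_pct": [12.5, 30.0],
})

one_col = sample["danger_rating"]          # single name: a Series
one_col_frame = sample[["danger_rating"]]  # list of one name: a DataFrame
reordered = sample[["metal_content_pct", "common_name"]]  # order follows the list

print(type(one_col))
print(type(one_col_frame))
print(reordered.columns.tolist())
```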

---

## Part 8: Accessing Rows

### By position with `.iloc[]`

Use `.iloc[]` to access rows by their integer position (0-indexed).

In [None]:
# Get the first creature (index 0)
creatures.iloc[0]

In [None]:
# Get the fifth creature (index 4)
# This should be the Maw Beast - the creature that killed Truck
creatures.iloc[4]

The **Maw Beast** (*Bestia vorax*). Danger rating: 8. Can digest anything. Found at 45 meters depth.

This is the creature that attacked the Redmane Expedition. When Truck jumped in front of the Boss to protect her, this is what he faced. The expedition notes say his face was unrecognizable when Ox dragged his body out.

### Slicing rows

In [None]:
# Get rows 5-9 (positions 5, 6, 7, 8, 9); the slice end, 10, is excluded
creatures.iloc[5:10]
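
Positional slices follow ordinary Python rules: the start is included and the end is not. The sketch below, on a hypothetical five-row frame with placeholder values, also shows a negative position and a row-and-column lookup.

```python
import pandas as pd

# Hypothetical five-row frame standing in for the catalog
sample = pd.DataFrame({
    "common_name": ["A", "B", "C", "D", "E"],
    "danger_rating": [3, 6, 1, 8, 2],
})

middle = sample.iloc[1:3]      # positions 1 and 2; position 3 is excluded
last_row = sample.iloc[-1]     # negative positions count from the end
one_value = sample.iloc[3, 1]  # row position 3, column position 1

print(middle)
print(last_row)
print(one_value)
```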

---

## Part 9: Basic Statistics with `.describe()`

For numeric columns, `.describe()` gives you summary statistics.

In [None]:
# Summary statistics for all numeric columns
creatures.describe()

This tells us:
- **count**: How many non-missing values
- **mean**: The average
- **std**: Standard deviation (how spread out the values are)
- **min/max**: The extremes
- **25%/50%/75%**: Quartiles (the 50% is the median)

From this, we can already see:
- Average metal content is about 22.5%
- Danger ratings range from 1 to 10, with an average around 3.3
- Most creatures are found in the first 15 meters, but some go as deep as 60
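
The result of `.describe()` is itself a DataFrame, so individual statistics can be pulled out by label. A minimal sketch with invented numbers (not the catalog's actual values):

```python
import pandas as pd

# Hypothetical values chosen only to make the statistics easy to check
sample = pd.DataFrame({
    "danger_rating": [1, 2, 3, 8, 10],
    "typical_depth_m": [5.0, 10.0, 12.0, 45.0, 60.0],
})

stats = sample.describe()

# Rows of the result are labeled by statistic name
median_danger = stats.loc["50%", "danger_rating"]
max_depth = stats.loc["max", "typical_depth_m"]

print(median_danger)
print(max_depth)
print(median_danger == sample["danger_rating"].median())
```

The last line confirms that the `50%` row and `.median()` report the same number.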

---

## Part 10: Value Counts

For categorical columns, `.value_counts()` tells you how many times each distinct value occurs.

In [None]:
# How many creatures in each category?
creatures['category'].value_counts()

Birds are the most common category in the catalog, followed by reptiles. This makes sense—birds can be caught in sphere traps hung from trees, while reptiles require more dangerous pit and box traps.
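
By default the counts are sorted with the most frequent value first, and passing `normalize=True` returns shares instead of counts. The category list in this sketch is invented and does not reflect the catalog's real proportions.

```python
import pandas as pd

# Hypothetical categories, not the catalog's actual distribution
sample = pd.DataFrame({
    "category": ["bird", "reptile", "bird", "mammal", "bird", "reptile"],
})

counts = sample["category"].value_counts()                # most frequent first
shares = sample["category"].value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
```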

In [None]:
# What about habitats?
creatures['primary_habitat'].value_counts()

In [None]:
# Conservation status
creatures['conservation_status'].value_counts()

Only 2 creatures are listed as "mythical"—the Grimslew Fish and the Witch Creature. The archivists Grigsu, Yasho, Boffa, and Mink include these for completeness, but their existence is disputed.

The Boss insisted there were no witches in the caves. Truck's faceless corpse suggests otherwise.

---

## Exercises

Now it's your turn. Complete the following exercises to test your understanding.

### Exercise 1: Finding the Most Dangerous

Which creature has the highest danger rating?

In [None]:
# Your code here
# Hint: Use .max() on the danger_rating column to find the highest value
# Then find which creature has that value



### Exercise 2: The Wharver's Metal Content

The wharver (CR009) is the rusted creature that lives on Grimslew Shore. What is its metal content percentage?

In [None]:
# Your code here
# Hint: Use .iloc[] to get the row at position 8 (indexing starts at 0, so CR009 sits there if the catalog is in ID order)



### Exercise 3: Capital Demand

How many creatures have "extreme" demand in the Capital?

In [None]:
# Your code here
# Hint: Use .value_counts() on the capital_demand column



### Exercise 4: Yeller-Compatible Creatures

How many creatures can become part of a "yeller group"? (The yeller_compatible column is True/False)

*Note: The yeller phenomenon is one of the strangest aspects of Quarry biology. Anything living—bats, cats, even frogs—can become synchronized into groups of 2, 3, 5, 7, or rarely 11. They move and act as one. The archivists say it's the eeriest thing.*

In [None]:
# Your code here
# Hint: value_counts() works on boolean columns too



### Exercise 5: Deep Creatures

Which creatures live at depths of 40 meters or more? List their names.

*These are the creatures only crews like the Deep Tunnel Syndicate ever see. The casualty rate in those tunnels is the highest in the Quarry.*

In [None]:
# Your code here
# This one is a preview of filtering, which we'll cover in the next tutorial
# But try: creatures[creatures['typical_depth_m'] >= 40]



---

## Summary

In this tutorial, you learned:

| Concept | Code |
|---------|------|
| Import pandas | `import pandas as pd` |
| Load a CSV | `pd.read_csv('filename.csv')` |
| View first rows | `df.head()` |
| View last rows | `df.tail()` |
| Check dimensions | `df.shape` |
| List columns | `df.columns` |
| Check data types | `df.dtypes` |
| Full info | `df.info()` |
| Access one column | `df['column_name']` |
| Access multiple columns | `df[['col1', 'col2']]` |
| Access row by position | `df.iloc[n]` |
| Summary statistics | `df.describe()` |
| Count occurrences of each value | `df['column'].value_counts()` |

---

## Next Tutorial

In **Tutorial 2: Trap Deployment Records**, you will learn to filter and select data—finding exactly the rows that match your criteria. You'll work with the trap deployment records from multiple crews, including the ill-fated Redmane Expedition.

*The Boss's traps looked as though they had been constructed by tossing scraps of wire and wood to an ape. None of the corners were square. Few of the edges were flush. But the trappers who knew her said the design showed the heart of the maker.*

*And what a heart that was.*

---