# Tutorial 6: Importing Pandas and Loading Data
## The Fiction Is the Ore. The Data Is the Refined Metal.

---

*In the deepest chambers of the Capital Archives, a different kind of work happens.*

*The regular archivists catalog manuscripts, expedition reports, and scholarly debates. But there is another group—the Data Extractors—who do something stranger. They read the narrative accounts and transform them into structured tables.*

*"The fiction is the ore," Chief Archivist Mink once explained to a confused apprentice. "Raw, rich, full of stories and contradictions. The structured data is the refined metal—precise, queryable, ready for analysis."*

*The apprentice frowned. "But doesn't something get lost in the transformation?"*

*"Something is always lost," Mink agreed. "But something is also gained. The ore tells you that a creature was 'terrifying beyond measure.' The data tells you its danger rating is 9 out of 10. Different truths. Both useful."*

*Today, you become a Data Extractor. You will load the refined metal and begin to work with it.*

---

## What You'll Learn

By the end of this tutorial, you will:
- Understand **import statements** and why we need them
- Load data from a **CSV file** using pandas
- Explore DataFrames with `.head()`, `.shape`, and `.columns`
- Understand the transformation from **narrative to data**

## Part 1: Import Statements — Bringing in Tools

*The Archives don't keep all their tools in one room. The manuscript room has different equipment than the cartography chamber. When you need a specific tool, you request it.*

Python works the same way. The basic language is powerful, but specialized tools live in **libraries** (also called modules or packages). To use them, you **import** them:

In [None]:
# Import the pandas library and give it the nickname 'pd'
import pandas as pd

print("Pandas imported successfully!")
print(f"Pandas version: {pd.__version__}")

**What just happened:**
- `import pandas` brings in the pandas library
- `as pd` gives it a shorter nickname so we can write `pd` instead of `pandas`
- This is a convention—almost everyone uses `pd` for pandas
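
To make the alias concrete, here is a minimal sketch showing that `pd` and `pandas` are two names for the same module object. Importing the library both ways is done here only for illustration; in practice you would write just the `as pd` form.

```python
import pandas            # full name
import pandas as pd      # same module, shorter alias

# Both names refer to the same module object
same_module = pd is pandas
print(same_module)                          # True

# So the same function is reachable through either name
print(pd.read_csv is pandas.read_csv)       # True
```

Nothing is loaded twice: the second import simply binds another name to a module Python already has in memory.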

**Pandas** is one of the most widely used Python libraries for data science. It provides the **DataFrame**, a table structure for working with data.

In [None]:
# You can import multiple libraries
import pandas as pd      # For data tables
import numpy as np       # For numerical operations (used by pandas internally)

print("Libraries loaded.")
print(f"Pandas: {pd.__version__}")
print(f"NumPy: {np.__version__}")

## Part 2: Loading Data from a CSV File

*The Data Extractors store their refined metal in CSV files—Comma-Separated Values. Each row is a record. Each column is an attribute.*
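
Before loading a real file, it can help to see the format up close. The sketch below builds a tiny CSV in memory and reads it with `pd.read_csv`. The creature names and values are invented for illustration and are not taken from the Densworld datasets.

```python
import io
import pandas as pd

# Hypothetical CSV text: the first line is the header,
# and each later line is one record
csv_text = """name,habitat,danger_rating
Example Beast A,tunnels,3
Example Beast B,surface,7
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO
sample = pd.read_csv(io.StringIO(csv_text))

print(sample)
print(sample.shape)   # (2, 3): two records, three attributes
```

The header line becomes the column names, and pandas infers that `danger_rating` holds integers while the other two columns hold text.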

Let's load real Densworld data—the creature records from Yeller Quarry:

In [None]:
# The data lives on GitHub. This URL points to the raw CSV file.
BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/densworld-datasets/main/data/"

# Load the creatures data
creatures = pd.read_csv(BASE_URL + "creatures.csv")

print("Data loaded successfully!")
print(f"Type of 'creatures': {type(creatures)}")

The data is now stored in a **DataFrame**—pandas' table structure. Let's look at it:

In [None]:
# Display the first few rows
creatures.head()

*This is the refined metal. Creatures that trappers feared, that killed named people in the narrative ore—now reduced to rows and columns. Name, habitat, danger rating, size, diet, primary defense.*

*The Maw Beast. In the ore, its entry reads: "his face was unrecognizable when Ox dragged his body out." In the data, it has a danger rating of 8.*

## Part 3: Exploring the DataFrame

Pandas DataFrames have built-in methods and attributes for exploration:

### `.head()` and `.tail()` — View First or Last Rows

In [None]:
# First 5 rows (default)
creatures.head()

In [None]:
# First 3 rows
creatures.head(3)

In [None]:
# Last 5 rows
creatures.tail()

### `.shape` — How Big Is the Data?

In [None]:
# Shape returns (rows, columns)
print(f"Shape: {creatures.shape}")
print(f"Number of rows: {creatures.shape[0]}")
print(f"Number of columns: {creatures.shape[1]}")

The creatures table has the records of every creature cataloged in Yeller Quarry. Each row is one creature species. Each column is one attribute.
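
One detail is easy to trip over: `.shape` is an attribute, not a method, so it takes no parentheses. The sketch below uses a small made-up table (not the real creatures data) to show the tuple it returns and what happens when parentheses are added by mistake.

```python
import pandas as pd

# Hypothetical stand-in table
df = pd.DataFrame({"name": ["A", "B", "C"], "danger_rating": [2, 5, 9]})

n_rows, n_cols = df.shape      # shape is a plain tuple, so it can be unpacked
print(n_rows, n_cols)          # 3 2
print(len(df))                 # len() of a DataFrame is its row count: 3

# Adding parentheses tries to call the tuple, which raises TypeError
try:
    df.shape()
except TypeError as err:
    print(f"TypeError: {err}")
```

By contrast, `.head()` and `.info()` are methods and do need the parentheses.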

### `.columns` — What Are the Column Names?

In [None]:
# List all column names
print("Columns in the creatures dataset:")
print(creatures.columns.tolist())

In [None]:
# Or display them one per line
print("Columns:")
for col in creatures.columns:
    print(f"  - {col}")

### `.info()` — Overview of Data Types

In [None]:
# Get info about each column
creatures.info()

This tells you:
- How many rows (entries)
- Each column's name and data type
- How many non-null values each column has (missing data is counted as null)
- Memory usage
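
If you want the missing-value counts directly rather than reading them off the `.info()` output, `.isna().sum()` gives them per column. This is a minimal sketch on an invented CSV with one empty field; the real datasets may or may not contain gaps.

```python
import io
import pandas as pd

# Hypothetical CSV: the second record has no habitat
csv_text = """name,habitat,danger_rating
Example Beast A,tunnels,3
Example Beast B,,7
Example Beast C,surface,5
"""
df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column; info() reports the complement (non-null counts)
missing = df.isna().sum()
print(missing)

# dtypes lists the data type pandas inferred for each column
print(df.dtypes)
```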

## Part 4: Accessing Columns

*"To analyze the danger ratings," the Data Extractor explained, "you first extract just that column."*

You can access a single column using bracket notation:

In [None]:
# Get just the creature names
creatures["name"]

In [None]:
# Get the danger ratings
creatures["danger_rating"]

In [None]:
# Get multiple columns by passing a list
creatures[["name", "danger_rating", "habitat"]]
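
The two bracket styles return different kinds of objects, which matters once you start chaining operations. A short sketch on a made-up table:

```python
import pandas as pd

# Hypothetical stand-in table
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "danger_rating": [2, 5, 9],
    "habitat": ["tunnels", "surface", "tunnels"],
})

one_column = df["danger_rating"]            # single brackets -> Series
sub_table = df[["name", "danger_rating"]]   # list of names -> DataFrame
one_col_table = df[["danger_rating"]]       # a one-name list is still a DataFrame

print(type(one_column).__name__)      # Series
print(type(sub_table).__name__)       # DataFrame
print(type(one_col_table).__name__)   # DataFrame
```

The inner brackets in `df[[...]]` are an ordinary Python list; the outer brackets are the DataFrame's selection syntax.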

## Part 5: Basic Statistics with `.describe()`

*"The data reveals what narrative can only hint at," Mink said. "Summary statistics show patterns."*

In [None]:
# Get summary statistics for numeric columns
creatures.describe()

This shows:
- **count**: number of non-null values
- **mean**: average
- **std**: standard deviation (spread)
- **min/max**: smallest and largest values
- **25%/50%/75%**: percentiles (50% is the median)
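
Each row of the `.describe()` output corresponds to a method you can also call on its own. The sketch below checks this on a handful of made-up ratings.

```python
import pandas as pd

# Made-up values, not real danger ratings
ratings = pd.Series([2, 4, 6, 8, 10], name="danger_rating")

summary = ratings.describe()
print(summary)

# The "50%" row of describe() is the median
print(summary["50%"] == ratings.median())   # True

# Other rows match the individual methods too
print(summary["mean"] == ratings.mean())    # True
print(summary["max"] == ratings.max())      # True
```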

In [None]:
# Statistics for a single column
print(f"Average danger rating: {creatures['danger_rating'].mean():.2f}")
print(f"Most dangerous: {creatures['danger_rating'].max()}")
print(f"Least dangerous: {creatures['danger_rating'].min()}")

## Part 6: Loading More Densworld Data

*The Archives contain many datasets. Each captures a different aspect of the world.*

Let's load data about trapping crews:

In [None]:
# Load the crews data
crews = pd.read_csv(BASE_URL + "crews.csv")

print(f"Crews dataset: {crews.shape[0]} rows, {crews.shape[1]} columns")
crews.head()

In [None]:
# What columns do we have?
print("Crew attributes:")
for col in crews.columns:
    print(f"  - {col}")

In [None]:
# Load catch records
catches = pd.read_csv(BASE_URL + "catches.csv")

print(f"Catches dataset: {catches.shape[0]} rows, {catches.shape[1]} columns")
catches.head()
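
Since every dataset is loaded the same way, the pattern can be wrapped in a small function. `load_dataset` below is a hypothetical helper written for this tutorial, not part of pandas; it assumes each file sits directly under `BASE_URL` with a `.csv` extension.

```python
import pandas as pd

BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/densworld-datasets/main/data/"

def load_dataset(name, base_url=BASE_URL):
    """Hypothetical helper: read '<base_url><name>.csv' into a DataFrame."""
    return pd.read_csv(f"{base_url}{name}.csv")

# Usage (needs network access, so it is left commented out here):
# crews = load_dataset("crews")
# catches = load_dataset("catches")
```

Because `pd.read_csv` also accepts local paths, passing a folder path as `base_url` would let the same helper read downloaded copies of the files.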

## Part 7: The Transformation — Ore to Metal

*Let's pause and reflect on what we've done.*

Somewhere in the narrative ore—70,000 lines of fiction—there are descriptions like this:

> *The Leatherback Burrower moves through the soil like water through cloth. Trappers say it can sense vibrations from a hundred yards away. It's not the most dangerous creature in the Quarry—that distinction belongs to the Maw Beast—but its unpredictability makes it feared.*

The Data Extractors transformed that into:

In [None]:
# Find the Leatherback Burrower in the data
burrower = creatures[creatures["name"] == "Leatherback Burrower"]
burrower

*A row. A set of attributes. The narrative richness compressed into structured columns.*

**What's lost:**
- The poetry of "moves through soil like water through cloth"
- The trapper's perspective and fear
- The comparative context with other creatures

**What's gained:**
- Queryable data (`danger_rating == 6`)
- Comparable values (6 vs 8 for Maw Beast)
- Aggregations (average danger rating across all creatures)
- Patterns invisible in narrative
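
A query such as `danger_rating == 6` works in two steps: the comparison produces one True/False value per row, and indexing with that result keeps only the True rows. Here is a minimal sketch on an invented table; the names and ratings are placeholders.

```python
import pandas as pd

# Hypothetical stand-in table
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "danger_rating": [6, 8, 6, 3],
})

# A comparison on a column yields one True/False per row
mask = df["danger_rating"] == 6
print(mask.tolist())                # [True, False, True, False]

# Indexing with the mask keeps only the rows marked True
matches = df[mask]
print(matches["name"].tolist())     # ['A', 'C']

# Aggregation: one number summarizing the whole column
print(df["danger_rating"].mean())   # 5.75
```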

In [None]:
# Questions you can now answer with data:

# What's the most dangerous creature?
most_dangerous = creatures.loc[creatures["danger_rating"].idxmax()]
print(f"Most dangerous: {most_dangerous['name']} (rating: {most_dangerous['danger_rating']})")

# How many creatures are in each habitat?
print("\nCreatures by habitat:")
print(creatures["habitat"].value_counts())
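
One caveat about `idxmax()`: when several rows share the highest value, it returns only the first. The sketch below, on made-up data with a deliberate tie, shows one way to retrieve every top-rated row instead.

```python
import pandas as pd

# Hypothetical table where two creatures tie for the highest rating
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "danger_rating": [8, 3, 8, 5],
    "habitat": ["tunnels", "surface", "tunnels", "surface"],
})

# idxmax returns the index label of the first maximum only
first_label = df["danger_rating"].idxmax()
print(first_label)                  # 0

# Comparing against max() keeps every row that ties for the top
top = df[df["danger_rating"] == df["danger_rating"].max()]
print(top["name"].tolist())         # ['A', 'C']

# value_counts tallies how often each distinct value appears
counts = df["habitat"].value_counts()
print(counts.to_dict())
```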

## Part 8: Your Data Journey Begins

*"You've loaded your first datasets," Mink said. "This is where the real work begins."*

Let's summarize what data we now have access to:

In [None]:
# Summary of loaded datasets
datasets = {
    "creatures": creatures,
    "crews": crews,
    "catches": catches
}

print("DENSWORLD DATA INVENTORY")
print("=" * 50)
for name, df in datasets.items():
    print(f"\n{name.upper()}")
    print(f"  Rows: {df.shape[0]}")
    print(f"  Columns: {df.shape[1]}")
    print(f"  First columns: {', '.join(df.columns[:5])}{'...' if len(df.columns) > 5 else ''}")

## Practice Exercises

*The Chief Archivist has some tasks for you.*

### Exercise 1: Load and Inspect

Load the `traps.csv` dataset from the same BASE_URL. Display its shape and the first 5 rows.

In [None]:
# Your code here:
BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/densworld-datasets/main/data/"

# Load traps.csv

# Print the shape

# Display first 5 rows


### Exercise 2: Column Exploration

Using the `creatures` DataFrame, answer these questions:
1. What are all the unique habitats?
2. What is the median danger rating?
3. How many creatures have a danger rating above 7?

In [None]:
# Your code here:

# 1. Unique habitats (hint: use .unique() on a column)


# 2. Median danger rating (hint: use .median())


# 3. Creatures with danger rating > 7 (hint: filter then count with len())


### Exercise 3: Cross-Dataset Question

Looking at the `crews` dataset:
1. How many crews are there?
2. What columns describe each crew?
3. Use `.describe()` to see summary statistics for numeric columns.

In [None]:
# Your code here:

# 1. Number of crews


# 2. Column names


# 3. Summary statistics


### Exercise 4: Find a Specific Record

In the `catches` dataset, find all records where `creature_name` is "Maw Beast". How many Maw Beast catches are recorded?

In [None]:
# Your code here:
# Hint: catches[catches["creature_name"] == "Maw Beast"]


## Summary

You've learned:

| Concept | What It Does | Example |
|---------|--------------|----------|
| **import** | Load a library | `import pandas as pd` |
| **pd.read_csv()** | Load CSV into DataFrame | `df = pd.read_csv(url)` |
| **.head()** | View first rows | `df.head(5)` |
| **.tail()** | View last rows | `df.tail(5)` |
| **.shape** | Get (rows, cols) | `df.shape` |
| **.columns** | Get column names | `df.columns` |
| **.info()** | Overview of data | `df.info()` |
| **.describe()** | Summary statistics | `df.describe()` |
| **df["col"]** | Select one column | `df["name"]` |
| **df[[cols]]** | Select multiple columns | `df[["name", "age"]]` |

Key pattern for loading Densworld data:
```python
import pandas as pd

BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/densworld-datasets/main/data/"
df = pd.read_csv(BASE_URL + "filename.csv")

# Explore
df.head()
df.shape
df.columns
df.describe()
```

## What's Next?

In **Tutorial 7: Exploring DataFrames**, you'll learn:
- How to **filter rows** based on conditions
- How to **sort data** by column values
- How to **group and aggregate** data
- How to answer complex questions about Densworld creatures

---

*The apprentice looked at the creatures DataFrame, rows and columns of refined metal where once there had been stories.*

*"I understand now," she said. "The ore tells you that the Maw Beast is terrifying. The data tells you it's danger rating 8 of 10."*

*Mink nodded. "Both are true. The ore is where the world lives. The data is how we analyze it."*

*"And which is more valuable?"*

*"Neither," Mink said. "They need each other. The fiction without data is just a story. The data without fiction is just numbers. Together, they're Densworld."*

*You've crossed the threshold. You've loaded real data from the world's archives. The exploration begins.*