# 🐼 Python Pandas for GIS Data Analysis - Learning Guide

**Welcome to your pandas learning journey!** 🎉

This notebook will guide you through the complete assignment process using a **professional data science workflow**: prototype and learn in notebooks, then implement production code in Python files.

---

## 🎯 Assignment Overview

You'll learn essential pandas skills by implementing **5 functions** that work with real environmental monitoring data:

1. **Load and explore** CSV files (like opening a spreadsheet)
2. **Filter data** based on conditions (remove bad quality readings)
3. **Calculate statistics** by groups (summarize by weather station)
4. **Join datasets** together (combine location info with readings)
5. **Save processed data** (export results for QGIS or Excel)

**💡 Key Learning Goal:** Master the pandas skills needed for real-world GIS data analysis!

## 📚 Your Learning Path

### 🔄 The Professional Workflow

This assignment teaches you how professional data scientists actually work:

1. **📓 Explore & Learn** → Use Jupyter notebooks to understand the problem
2. **💻 Implement & Test** → Write production code in `.py` files
3. **🧪 Validate & Deploy** → Run unit tests to ensure code quality

### 📝 Step-by-Step Process

For each of the 5 functions:

```
1. 📖 READ the learning notebook
   ↓
2. 🧠 UNDERSTAND how the function works
   ↓  
3. ✍️ IMPLEMENT the function in src/pandas_basics.py
   ↓
4. 🧪 TEST with: uv run pytest tests/test_pandas_basics.py::test_function_name -v
   ↓
5. 🔄 DEBUG and iterate until tests pass
   ↓
6. ✅ MOVE to the next function
```

## 🗂️ Notebook Navigation Guide

Work through these notebooks **in order** - each builds on the previous one:

---

### 📊 Function 1: Load and Explore Data
**Notebook:** [`01_function_load_and_explore_gis_data.ipynb`](01_function_load_and_explore_gis_data.ipynb)

**What you'll learn:**
- Loading CSV files with `pd.read_csv()`
- Exploring data shape, columns, and types
- Checking for missing values and data quality
- Professional error handling
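
To preview these ideas, here is a minimal sketch rather than the assignment solution. The column names (`station_id`, `temperature_c`, `quality`) and the in-memory CSV text are assumptions made up for illustration; the real files in `data/` may look different.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk (columns are assumed)
csv_text = io.StringIO(
    "station_id,temperature_c,quality\n"
    "STN_001,21.5,good\n"
    "STN_002,,poor\n"
    "STN_001,19.0,good\n"
)

df = pd.read_csv(csv_text)

print(df.shape)         # (number of rows, number of columns)
print(df.dtypes)        # data type of each column
print(df.isna().sum())  # count of missing values per column

# Basic error handling: a missing file raises FileNotFoundError
try:
    pd.read_csv("this_file_does_not_exist.csv")
except FileNotFoundError as exc:
    print(f"Could not load file: {exc}")
```

The empty temperature field in the second row is read as a missing value (`NaN`), which is why `isna().sum()` is a quick first quality check.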

**Test command:**
```bash
uv run pytest tests/test_pandas_basics.py::test_load_and_explore_gis_data -v
```

---

### 🔍 Function 2: Filter Environmental Data  
**Notebook:** [`02_function_filter_environmental_data.ipynb`](02_function_filter_environmental_data.ipynb)

**What you'll learn:**
- Boolean indexing for data filtering
- Combining multiple conditions with `&` and `|`
- Data quality assessment and cleaning
- Reporting filtering statistics
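
As a rough illustration of boolean indexing, the sketch below filters a tiny made-up table. The column names, the `"good"` quality label, and the 0 to 50 degree range are all assumptions chosen for the example, not the assignment's actual rules.

```python
import pandas as pd

# Hypothetical readings (values and column names are assumed)
readings = pd.DataFrame({
    "station_id": ["STN_001", "STN_002", "STN_003", "STN_001"],
    "temperature_c": [21.5, 95.0, 18.2, -3.0],
    "quality": ["good", "poor", "good", "good"],
})

# Wrap each comparison in parentheses: & and | bind tighter than == or >=
is_good = readings["quality"] == "good"
in_range = (readings["temperature_c"] >= 0) & (readings["temperature_c"] <= 50)

# Keep rows that satisfy BOTH conditions
filtered = readings[is_good & in_range]

# Flag rows that satisfy EITHER condition
flagged = readings[(readings["quality"] == "poor") | (readings["temperature_c"] < 0)]

# Report how much data the filter removed
removed = len(readings) - len(filtered)
print(f"Kept {len(filtered)} of {len(readings)} rows ({removed} removed)")
```

Note that pandas uses `&` and `|` for element-wise conditions; Python's `and` and `or` raise an error when applied to a whole column.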

**Test command:**
```bash
uv run pytest tests/test_pandas_basics.py::test_filter_environmental_data -v
```

---

### 📈 Function 3: Calculate Station Statistics
**Notebook:** [`03_function_calculate_station_statistics.ipynb`](03_function_calculate_station_statistics.ipynb)

**What you'll learn:**
- Grouping data with `.groupby()`
- Calculating aggregate statistics (mean, count, etc.)
- Finding extremes (hottest/coolest stations)
- Creating summary DataFrames
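
One possible way to combine these steps is sketched below on made-up data. The column names and the output names `mean_temp` and `reading_count` are assumptions for illustration; the notebook may structure the summary differently.

```python
import pandas as pd

# Hypothetical readings (values and column names are assumed)
readings = pd.DataFrame({
    "station_id": ["STN_001", "STN_001", "STN_002", "STN_002", "STN_003"],
    "temperature_c": [20.0, 22.0, 30.0, 32.0, 10.0],
})

# Group by station, then compute one row of statistics per group
stats = (
    readings.groupby("station_id")["temperature_c"]
    .agg(mean_temp="mean", reading_count="count")
    .reset_index()  # turn station_id back into a regular column
)

# idxmax/idxmin return the row label of the largest/smallest value
hottest = stats.loc[stats["mean_temp"].idxmax(), "station_id"]
coolest = stats.loc[stats["mean_temp"].idxmin(), "station_id"]

print(stats)
print(f"Hottest: {hottest}, coolest: {coolest}")
```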

**Test command:**
```bash
uv run pytest tests/test_pandas_basics.py::test_calculate_station_statistics -v
```

---

### 🔗 Function 4: Join Station Data
**Notebook:** [`04_function_join_station_data.ipynb`](04_function_join_station_data.ipynb)

**What you'll learn:**
- Joining DataFrames with `pd.merge()`
- Understanding different join types (left, right, inner, outer)
- Analyzing data relationships
- Handling missing data in joins
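
The join types are easiest to compare side by side. The sketch below uses two small made-up tables that share an assumed `station_id` key; one station has no readings and one reading has no matching station, so each join type gives a different result.

```python
import pandas as pd

# Hypothetical station locations (values and column names are assumed)
stations = pd.DataFrame({
    "station_id": ["STN_001", "STN_002", "STN_003"],
    "latitude": [40.0, 41.0, 42.0],
    "longitude": [-105.0, -104.0, -103.0],
})

# Hypothetical readings; STN_004 has no entry in the stations table
readings = pd.DataFrame({
    "station_id": ["STN_001", "STN_002", "STN_004"],
    "temperature_c": [21.5, 18.0, 25.0],
})

# inner: only keys present in both tables
inner = pd.merge(readings, stations, on="station_id", how="inner")

# left: every reading is kept; unmatched ones get NaN coordinates
left = pd.merge(readings, stations, on="station_id", how="left")

# outer: everything from both sides; indicator=True adds a _merge column
outer = pd.merge(readings, stations, on="station_id", how="outer", indicator=True)

print(len(inner), len(left), len(outer))
print(outer[["station_id", "_merge"]])
```

The `_merge` column added by `indicator=True` labels each row as `both`, `left_only`, or `right_only`, which is a convenient way to find records that failed to match.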

**Test command:**
```bash
uv run pytest tests/test_pandas_basics.py::test_join_station_data -v
```

---

### 💾 Function 5: Save Processed Data
**Notebook:** [`05_function_save_processed_data.ipynb`](05_function_save_processed_data.ipynb)

**What you'll learn:**
- Saving DataFrames to CSV with `.to_csv()`
- File path handling and directory creation
- Data validation and integrity checking
- Professional error handling for file operations
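
A minimal sketch of the save-and-verify pattern follows. It writes to a temporary directory so it can run anywhere; in the assignment you would presumably write to `output/` instead. The file name `station_summary.csv` and the column names are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical processed results (values and column names are assumed)
results = pd.DataFrame({
    "station_id": ["STN_001", "STN_002"],
    "mean_temp": [21.0, 31.5],
})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "output" / "station_summary.csv"

    # Create the target directory (and any parents) if it is missing
    out_path.parent.mkdir(parents=True, exist_ok=True)

    # index=False leaves the row index out of the file
    results.to_csv(out_path, index=False)

    # Integrity check: read the file back and compare with the original
    reloaded = pd.read_csv(out_path)

rows_match = len(reloaded) == len(results)
columns_match = list(reloaded.columns) == list(results.columns)
print(f"Rows match: {rows_match}, columns match: {columns_match}")
```

Leaving out the index with `index=False` avoids an extra unnamed column when the file is opened in QGIS or Excel.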

**Test command:**
```bash
uv run pytest tests/test_pandas_basics.py::test_save_processed_data -v
```

## 🧪 Testing Your Implementation

### Individual Function Testing
Test each function as you implement it:

```bash
# Replace 'test_function_name' with the test for your function, e.g. test_join_station_data
uv run pytest tests/test_pandas_basics.py::test_function_name -v
```

### Complete Test Suite
When all functions are complete:

```bash
# Test everything
uv run pytest tests/ -v

# Should show all PASSED for full credit
```

### Understanding Test Results

✅ **PASSED** = Your function works correctly!  
❌ **FAILED** = Your implementation needs fixing (the error message shows what the test expected)  
⚠️ **ERROR** = The test could not run at all (usually a syntax error or a missing import)
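
To make these outcomes less mysterious, here is a sketch of what a pytest-style test looks like. The function `count_rows` is hypothetical and not part of the assignment; the real tests in `tests/test_pandas_basics.py` will check different things.

```python
import pandas as pd


def count_rows(df):
    """Hypothetical function under test: return the number of rows."""
    return len(df)


def test_count_rows():
    df = pd.DataFrame({"temperature_c": [20.0, 22.5, 19.0]})
    # If this assertion is false, pytest reports the test as FAILED
    assert count_rows(df) == 3


# pytest finds and calls test_* functions automatically;
# calling it by hand here just shows it is an ordinary function.
test_count_rows()
```

A test is just a function full of `assert` statements, so a FAILED result always traces back to one specific assertion you can read.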

## 📁 Project Structure Overview

Understanding where everything goes:

```
python-pandas/
├── notebooks/              # 📚 Learning materials (THIS directory)
│   ├── 00_start_here_overview.ipynb    # 👈 This file!
│   ├── 01_function_load_and_explore_gis_data.ipynb
│   ├── 02_function_filter_environmental_data.ipynb
│   ├── 03_function_calculate_station_statistics.ipynb
│   ├── 04_function_join_station_data.ipynb
│   └── 05_function_save_processed_data.ipynb
│
├── src/
│   └── pandas_basics.py     # 🎯 WHERE YOU IMPLEMENT YOUR CODE
│
├── tests/
│   └── test_pandas_basics.py # 🧪 Unit tests (pre-written for you)
│
├── data/
│   ├── weather_stations.csv      # 📊 Sample data
│   ├── temperature_readings.csv  # 📊 Sample data
│   └── data_dictionary.md        # 📖 Data explanations
│
└── output/                  # 📁 Where saved files go
```

## 💡 Essential Tips for Success

### 🎯 Focus on Learning, Not Just Completing
- **Read the notebook explanations carefully**
- **Run the example code** to see how it works
- **Experiment** with the data to understand patterns

### 🔍 Debugging Strategies
1. **Read error messages carefully** - they usually name the failing line and the reason
2. **Test with small data first** - easier to debug
3. **Print intermediate results** - see what your code is actually doing
4. **Compare with notebook examples** - make sure you understand the approach

### ⚡ Efficiency Tips
- **Work incrementally** - implement one TODO at a time
- **Test frequently** - catch errors early
- **Use descriptive variable names** - makes debugging easier
- **Keep functions simple** - follow the patterns from notebooks

### 🆘 When You're Stuck
1. **Re-read the relevant notebook section**
2. **Check the test error message** - it tells you what's expected
3. **Look at the sample data** to understand the structure
4. **Ask on the course forum** with specific error messages
5. **Come to office hours** for personalized help

## 🎓 Why This Assignment Matters

### 🌍 Real-World Applications
The skills you're learning are used daily by:
- **Environmental scientists** analyzing climate data
- **Urban planners** processing census and demographic data  
- **Hydrologists** studying water quality and flow patterns
- **Agricultural researchers** analyzing crop and soil data
- **GIS professionals** preparing data for mapping and analysis

### 🚀 Career Skills
You're learning:
- **Data manipulation** - core skill for any data-related role
- **Quality assurance** - ensuring data reliability
- **Professional workflows** - notebooks → code → tests
- **Problem-solving** - breaking complex tasks into steps

### 🔗 Next Steps
This assignment prepares you for:
- **GeoPandas** - spatial data analysis with pandas
- **Rasterio** - working with satellite and aerial imagery
- **PostGIS** - spatial databases and queries
- **Advanced GIS programming** - automating complex analyses

## 🚦 Ready to Start?

### ✅ Pre-Flight Checklist

Before you begin, make sure you have:

- [ ] **Environment working** - pandas and pytest installed (`uv sync`)
- [ ] **Sample data available** - check that `../data/` (relative to this notebook) contains CSV files
- [ ] **Tests running** - try `uv run pytest tests/ --collect-only` from the project root
- [ ] **Notebook access** - you can open and run notebooks
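
If you would like to check part of this list from a notebook cell, the sketch below is one way to do it. It assumes the notebook runs from the `notebooks/` directory, so that `../data` points at the sample data; adjust the path if your setup differs.

```python
import importlib.util
from pathlib import Path

# Check that the required packages can be found without importing them
for package in ("pandas", "pytest"):
    installed = importlib.util.find_spec(package) is not None
    status = "found" if installed else "MISSING - try `uv sync`"
    print(f"{package}: {status}")

# Assumed location of the sample data relative to this notebook
data_dir = Path("../data")
csv_files = sorted(data_dir.glob("*.csv")) if data_dir.is_dir() else []
print(f"CSV files in {data_dir}: {[f.name for f in csv_files]}")
```

An empty list of CSV files usually means the notebook was started from a different working directory.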

### 🎬 Your First Step

**👉 Open [`01_function_load_and_explore_gis_data.ipynb`](01_function_load_and_explore_gis_data.ipynb) and start learning!**

---

## 🎉 Final Encouragement

**You've got this!** 💪

This assignment might seem challenging at first, but remember:
- **Every professional started as a beginner**
- **The notebooks guide you step-by-step**
- **The tests tell you when you're on the right track**
- **Each function builds your confidence**

Take your time, read carefully, test frequently, and don't hesitate to ask for help when needed.

**Happy coding! 🐍📊🗺️**