[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mihiarc/pyfia/blob/main/notebooks/02_core_estimators.ipynb)

---

In [None]:
# Google Colab Setup - Run this cell first!
import sys
if 'google.colab' in sys.modules:
    print("Running in Google Colab - installing pyFIA...")
    !pip install -q pyfia polars duckdb matplotlib rich
    
    # Download helpers.py for Colab
    import urllib.request
    helpers_url = "https://raw.githubusercontent.com/mihiarc/pyfia/main/notebooks/helpers.py"
    urllib.request.urlretrieve(helpers_url, "helpers.py")
    print("Setup complete! You may now run the remaining cells.")
else:
    print("Running locally - no additional setup needed.")

# Core Estimators: Area, Volume, Biomass, TPA

This notebook covers pyFIA's main estimation functions for forest inventory analysis.

## What You'll Learn

1. **Area estimator** - Forest and timberland area
2. **Volume estimator** - Net, gross, sound, and sawlog volume
3. **Biomass estimator** - Aboveground, belowground, and carbon
4. **TPA estimator** - Trees per acre and basal area
5. Using `grp_by` for grouped results
6. Adding reference names with `join_species_names()`
7. Enabling variance estimates

**Prerequisites**: Complete Notebook 1 (Getting Started)

**Estimated time**: 45 minutes

---

## Setup

In [None]:
# Core imports
from pyfia import (
    FIA, 
    area, 
    volume, 
    biomass, 
    tpa,
    join_species_names,
    join_forest_type_names,
)
import polars as pl
import matplotlib.pyplot as plt

# Notebook helpers
from helpers import ensure_ri_data, display_estimate, plot_by_category

# Ensure data is available
db_path = ensure_ri_data()
print("Ready to begin!")

---

## 1. Area Estimator

The `area()` function estimates land area meeting specified criteria. We covered basics in Notebook 1; here we'll explore more options.

### Basic Area Estimation

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    
    # Total forest area
    forest_area = area(db)
    
display_estimate(forest_area, title="Total Forest Area")

### Area by Forest Type with Names

Forest type codes are numeric (e.g., 503 = White oak / red oak / hickory). Use `join_forest_type_names()` to add human-readable names.

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    result = area(db, grp_by="FORTYPCD")

# Add forest type names
result_named = join_forest_type_names(result, "FORTYPCD")

# Show top 10 by area
top_10 = result_named.sort("AREA_TOTAL", descending=True).head(10)
display_estimate(top_10.select(["FORTYPCD_NAME", "AREA_TOTAL", "AREA_SE", "N_PLOTS"]), 
                 title="Top 10 Forest Types")

In [None]:
# Visualize
fig = plot_by_category(
    top_10,
    category_col="FORTYPCD_NAME",
    value_col="AREA_TOTAL",
    error_col="AREA_SE",
    title="Rhode Island Forest Area by Type",
    xlabel="Area (acres)"
)
plt.show()

### Area by Stand Size Class

Stand size class (`STDSZCD`) indicates the predominant tree size:

| Code | Description |
|------|-------------|
| 1 | Large diameter (≥9" softwoods, ≥11" hardwoods) |
| 2 | Medium diameter |
| 3 | Small diameter |
| 5 | Nonstocked |

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    result = area(db, grp_by="STDSZCD", land_type="timber")

# Add descriptive names
size_names = {1: "Large diameter", 2: "Medium diameter", 3: "Small diameter", 5: "Nonstocked"}
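# replace_strict (Polars >= 1.0) is used because the mapping changes the dtype from int to string;
# codes missing from the mapping become null
result_named = result.with_columns(
    pl.col("STDSZCD").replace_strict(size_names, default=None).alias("Stand Size")
)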

display_estimate(result_named.select(["Stand Size", "AREA_TOTAL", "AREA_SE_PERCENT"]),
                 title="Timberland by Stand Size Class")

---

## 2. Volume Estimator

The `volume()` function estimates tree volume. This is fundamental for timber inventory and forest management.

### Volume Types

| `vol_type` | Description |
|------------|-------------|
| `"net"` | Net cubic foot volume (default) - merchantable wood minus defect |
| `"gross"` | Gross cubic foot volume - total wood including defect |
| `"sound"` | Sound cubic foot volume - excludes rotten wood |
| `"sawlog"` | Board foot sawlog volume - lumber potential (FIADB reports `VOLBFNET` in the International 1/4-inch rule) |

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    
    # Net volume (default)
    result = volume(db)
    
display_estimate(result, title="Total Net Volume")

### Understanding Volume Results

| Column | Description | Unit |
|--------|-------------|------|
| `VOLCFNET_ACRE` | Volume per acre | cubic feet/acre |
| `VOLCFNET_ACRE_SE` | Standard error per acre | cubic feet/acre |
| `VOLCFNET_TOTAL` | Total volume | cubic feet |
| `AREA_TOTAL` | Total forest area | acres |
| `N_PLOTS` | Plots with trees | count |
| `N_TREES` | Trees measured | count |
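
A standard error is often easier to read as a relative sampling error or an approximate confidence interval. The sketch below shows that arithmetic in plain Python. The helper name `summarize_estimate` and the input numbers are hypothetical (they are not pyFIA output), and the interval assumes a normal approximation.

```python
def summarize_estimate(estimate, se, z=1.96):
    """Return (relative SE in percent, lower bound, upper bound).

    Hypothetical helper, not part of pyFIA. z=1.96 gives an approximate
    95% interval under a normal approximation.
    """
    if estimate == 0:
        raise ValueError("relative SE is undefined for a zero estimate")
    se_pct = abs(se / estimate) * 100
    return se_pct, estimate - z * se, estimate + z * se


# Made-up per-acre volume and standard error, for illustration only
se_pct, low, high = summarize_estimate(2000.0, 100.0)
print(f"SE: {se_pct:.1f}%  interval: {low:,.0f} to {high:,.0f} cubic feet/acre")
```

The same function could be applied to a real result, for example by passing `result["VOLCFNET_ACRE"][0]` and `result["VOLCFNET_ACRE_SE"][0]`.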

In [None]:
# Extract and display key metrics
vol_per_acre = result["VOLCFNET_ACRE"][0]
total_vol = result["VOLCFNET_TOTAL"][0]
area_total = result["AREA_TOTAL"][0]

print("Rhode Island Forest Volume Summary:")
print(f"  Volume per acre: {vol_per_acre:,.0f} cubic feet")
print(f"  Total volume: {total_vol/1e6:,.1f} million cubic feet")
print(f"  Forest area: {area_total:,.0f} acres")

### Volume by Species

Use `by_species=True` (shortcut for `grp_by="SPCD"`) and `join_species_names()` for readable output.

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    result = volume(db, by_species=True)

# Add species names
result_named = join_species_names(result, "SPCD")

# Top 10 species by total volume
top_species = result_named.sort("VOLCFNET_TOTAL", descending=True).head(10)
display_estimate(
    top_species.select(["SPCD_NAME", "VOLCFNET_ACRE", "VOLCFNET_TOTAL", "N_TREES"]),
    title="Top 10 Species by Volume"
)

In [None]:
# Visualize species volume
fig = plot_by_category(
    top_species,
    category_col="SPCD_NAME",
    value_col="VOLCFNET_TOTAL",
    title="Volume by Species (Top 10)",
    xlabel="Total Volume (cubic feet)"
)
plt.show()

### Comparing Volume Types

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    
    net_vol = volume(db, vol_type="net")
    gross_vol = volume(db, vol_type="gross")
    sawlog_vol = volume(db, vol_type="sawlog")

print("Volume Comparison:")
print(f"  Net volume:    {net_vol['VOLCFNET_TOTAL'][0]/1e6:,.1f} million cu ft")
print(f"  Gross volume:  {gross_vol['VOLCFGRS_TOTAL'][0]/1e6:,.1f} million cu ft")
print(f"  Sawlog volume: {sawlog_vol['VOLBFNET_TOTAL'][0]/1e6:,.1f} million board ft")

### Live vs. Dead Trees

Use `tree_type` to filter by tree status:
- `"live"` (default) - Living trees only
- `"dead"` - Standing dead trees only
- `"all"` - Both live and dead

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    
    live_vol = volume(db, tree_type="live")
    dead_vol = volume(db, tree_type="dead")

print(f"Live tree volume: {live_vol['VOLCFNET_TOTAL'][0]/1e6:,.1f} million cu ft")
print(f"Dead tree volume: {dead_vol['VOLCFNET_TOTAL'][0]/1e6:,.1f} million cu ft")
print(f"Dead/Live ratio:  {dead_vol['VOLCFNET_TOTAL'][0]/live_vol['VOLCFNET_TOTAL'][0]*100:.1f}%")

---

## 3. Biomass Estimator

The `biomass()` function estimates tree biomass and carbon, the core inputs for carbon accounting and climate analysis.

### Biomass Components

| `component` | Description |
|-------------|-------------|
| `"AG"` | Aboveground biomass (stem, stump, branches, and bark; excludes foliage) |
| `"BG"` | Belowground biomass (roots) |
| `"TOTAL"` | Combined above and belowground (default) |

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    
    result = biomass(db)
    
display_estimate(result, title="Total Forest Biomass")

### Understanding Biomass Results

| Column | Description | Unit |
|--------|-------------|------|
| `BIOMASS_ACRE` | Biomass per acre | short tons/acre |
| `CARBON_ACRE` | Carbon per acre | short tons/acre |
| `BIOMASS_TOTAL` | Total biomass | short tons |
| `CARBON_TOTAL` | Total carbon | short tons |

**Note**: Carbon ≈ 47% of biomass (standard conversion factor)
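
The conversion in the note can be sketched in a few lines, together with two unit conversions that often come up when reporting carbon. This is only an illustration: the 0.47 factor is taken from the note above (whether pyFIA applies exactly this factor internally is an assumption), the function names are hypothetical, and the input value is made up.

```python
CARBON_FRACTION = 0.47            # conversion factor quoted in the note above
SHORT_TON_TO_TONNE = 0.90718474   # 1 short ton = 2,000 lb = 0.90718474 metric tonnes
CO2_PER_CARBON = 44.0 / 12.0      # molar mass ratio of CO2 to C


def carbon_from_biomass(biomass_short_tons):
    """Approximate carbon mass (short tons) from dry biomass (short tons)."""
    return biomass_short_tons * CARBON_FRACTION


def short_tons_to_tonnes(short_tons):
    """Convert US short tons to metric tonnes."""
    return short_tons * SHORT_TON_TO_TONNE


def co2_equivalent(carbon_mass):
    """Mass of CO2 containing the given mass of carbon (same unit in and out)."""
    return carbon_mass * CO2_PER_CARBON


biomass_total = 1_000_000.0  # made-up value in short tons
carbon = carbon_from_biomass(biomass_total)
carbon_tonnes = short_tons_to_tonnes(carbon)
co2e = co2_equivalent(carbon)
print(f"Carbon: {carbon:,.0f} short tons ({carbon_tonnes:,.0f} tonnes)")
print(f"CO2 equivalent: {co2e:,.0f} short tons")
```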

In [None]:
# Compare biomass components
with FIA(db_path) as db:
    db.clip_most_recent()
    
    ag = biomass(db, component="AG")
    bg = biomass(db, component="BG")
    total = biomass(db, component="TOTAL")

print("Biomass by Component:")
print(f"  Aboveground: {ag['BIOMASS_TOTAL'][0]/1e6:,.2f} million short tons")
print(f"  Belowground: {bg['BIOMASS_TOTAL'][0]/1e6:,.2f} million short tons")
print(f"  Total:       {total['BIOMASS_TOTAL'][0]/1e6:,.2f} million short tons")
print(f"\nCarbon stored: {total['CARBON_TOTAL'][0]/1e6:,.2f} million short tons")

### Carbon by Ownership

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    result = biomass(db, grp_by="OWNGRPCD")

# Add ownership names
ownership_names = {10: "National Forest", 20: "Other Federal", 30: "State/Local", 40: "Private"}
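# replace_strict (Polars >= 1.0) is used because the mapping changes the dtype from int to string
result_named = result.with_columns(
    pl.col("OWNGRPCD").replace_strict(ownership_names, default=None).alias("Ownership")
)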

display_estimate(
    result_named.select(["Ownership", "CARBON_TOTAL", "CARBON_ACRE", "AREA_TOTAL"]),
    title="Carbon by Ownership"
)

---

## 4. TPA Estimator (Trees Per Acre)

The `tpa()` function estimates tree density and basal area, the two basic measures of stand structure.

### TPA Results

| Column | Description | Unit |
|--------|-------------|------|
| `TPA_ACRE` | Trees per acre | trees/acre |
| `BAA_ACRE` | Basal area per acre | sq ft/acre |
| `TPA_TOTAL` | Total trees | trees |
| `BAA_TOTAL` | Total basal area | sq ft |
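
To make the two per-acre columns concrete, here is a minimal sketch of how trees per acre and basal area per acre are built up from individual trees. It is not pyFIA's implementation: the tree list and expansion factors are made up, and real FIA estimates also apply adjustment factors and stratum weights that are omitted here.

```python
import math


def basal_area_sqft(dbh_inches):
    """Cross-sectional stem area at breast height, in square feet."""
    radius_ft = dbh_inches / 24.0  # diameter in inches -> radius in feet
    return math.pi * radius_ft ** 2


# (diameter in inches, trees-per-acre expansion factor): made-up tally
trees = [(10.0, 6.0), (14.0, 6.0), (3.0, 75.0)]

# Each tallied tree stands for `factor` trees on an acre
trees_per_acre = sum(factor for _, factor in trees)
basal_area_per_acre = sum(basal_area_sqft(dbh) * factor for dbh, factor in trees)

print(f"{trees_per_acre:.0f} trees/acre, {basal_area_per_acre:.1f} sq ft/acre")
```

Small trees dominate the tree count while large trees dominate basal area, which is why the two metrics are usually reported together.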

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    result = tpa(db)
    
display_estimate(result, title="Tree Density")

print("\nRhode Island has approximately:")
print(f"  {result['TPA_ACRE'][0]:,.0f} trees per acre")
print(f"  {result['BAA_ACRE'][0]:,.0f} sq ft basal area per acre")
print(f"  {result['TPA_TOTAL'][0]/1e6:,.0f} million total trees")

### TPA by Size Class

Use `by_size_class=True` to group trees into 2-inch diameter classes.
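
The idea behind 2-inch classes can be sketched without pyFIA. The example below labels each class by its lower bound, which is an assumption; check the `SIZE_CLASS` values in your own output to see how pyFIA labels them. The function name and the tree list are hypothetical.

```python
from collections import defaultdict


def size_class(dbh_inches, width=2.0):
    """Lower bound of the diameter class containing dbh (assumed labeling)."""
    return int(dbh_inches // width * width)


# (diameter in inches, trees-per-acre expansion factor): made-up tally
trees = [(5.3, 6.0), (6.9, 6.0), (7.1, 6.0), (12.0, 6.0)]

# Accumulate trees per acre within each diameter class
tpa_by_class = defaultdict(float)
for dbh, factor in trees:
    tpa_by_class[size_class(dbh)] += factor

for cls in sorted(tpa_by_class):
    print(f'{cls}-{cls + 2}": {tpa_by_class[cls]:.0f} trees/acre')
```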

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    result = tpa(db, by_size_class=True)
    
# Sort by size class for display
result_sorted = result.sort("SIZE_CLASS")
display_estimate(
    result_sorted.select(["SIZE_CLASS", "TPA_ACRE", "BAA_ACRE", "N_TREES"]),
    title="Trees by Diameter Class"
)

In [None]:
# Visualize diameter distribution
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

size_classes = result_sorted["SIZE_CLASS"].to_list()
tpa_values = result_sorted["TPA_ACRE"].to_list()
baa_values = result_sorted["BAA_ACRE"].to_list()

# TPA chart
ax1.bar(size_classes, tpa_values, color='#2E7D32', edgecolor='white')
ax1.set_xlabel('Diameter Class (inches)')
ax1.set_ylabel('Trees per Acre')
ax1.set_title('Tree Density by Size Class')
ax1.spines['top'].set_visible(False)
ax1.spines['right'].set_visible(False)

# Basal area chart  
ax2.bar(size_classes, baa_values, color='#1565C0', edgecolor='white')
ax2.set_xlabel('Diameter Class (inches)')
ax2.set_ylabel('Basal Area (sq ft/acre)')
ax2.set_title('Basal Area by Size Class')
ax2.spines['top'].set_visible(False)
ax2.spines['right'].set_visible(False)

plt.tight_layout()
plt.show()

### TPA by Species

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    result = tpa(db, by_species=True)

# Add species names and get top 10
result_named = join_species_names(result, "SPCD")
top_species = result_named.sort("TPA_TOTAL", descending=True).head(10)

display_estimate(
    top_species.select(["SPCD_NAME", "TPA_ACRE", "TPA_TOTAL", "BAA_ACRE"]),
    title="Top 10 Species by Tree Count"
)

---

## 5. Multiple Grouping Variables

You can group by multiple variables using a list.

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    
    # Volume by ownership AND stand size
    result = volume(db, grp_by=["OWNGRPCD", "STDSZCD"])

# Add names
ownership_names = {10: "National Forest", 20: "Other Federal", 30: "State/Local", 40: "Private"}
size_names = {1: "Large", 2: "Medium", 3: "Small", 5: "Nonstocked"}

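# replace_strict (Polars >= 1.0) is used because the mappings change the dtype from int to string
result_named = result.with_columns([
    pl.col("OWNGRPCD").replace_strict(ownership_names, default=None).alias("Ownership"),
    pl.col("STDSZCD").replace_strict(size_names, default=None).alias("Stand Size")
])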

display_estimate(
    result_named.select(["Ownership", "Stand Size", "VOLCFNET_ACRE", "VOLCFNET_TOTAL"]),
    title="Volume by Ownership and Stand Size"
)

---

## 6. Enabling Variance Estimates

By default, pyFIA returns **standard error (SE)**. Use `variance=True` to get **variance** instead.

Variance is useful when combining estimates or performing additional statistical calculations.
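
As an illustration of why variance is the more convenient quantity, the sketch below adds two estimates. Variances of independent estimates add, while standard errors do not. The helper name and numbers are hypothetical, and the independence assumption matters: estimates computed from the same plots are generally correlated, in which case a covariance term would be needed.

```python
import math


def combine_independent(estimates, variances):
    """Sum independent estimates; return (total, variance, standard error)."""
    total = sum(estimates)
    variance = sum(variances)  # valid only when the estimates are independent
    return total, variance, math.sqrt(variance)


# Made-up totals with standard errors of 3 and 4 (variances 9 and 16)
total, variance, se = combine_independent([100.0, 200.0], [9.0, 16.0])
print(f"Combined total: {total:.0f}, variance: {variance:.0f}, SE: {se:.1f}")

# Adding the standard errors directly (3 + 4 = 7) would overstate the uncertainty
naive_se = 3.0 + 4.0
print(f"Naive SE sum: {naive_se:.1f}")
```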

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    
    # With standard error (default)
    result_se = volume(db)
    
    # With variance
    result_var = volume(db, variance=True)

print("Standard Error columns:", [c for c in result_se.columns if "SE" in c])
print("Variance columns:", [c for c in result_var.columns if "VARIANCE" in c])

print(f"\nVolume SE: {result_se['VOLCFNET_ACRE_SE'][0]:,.1f}")
print(f"Volume Variance: {result_var['VOLCFNET_ACRE_VARIANCE'][0]:,.1f}")
print(f"SE = sqrt(Variance): {result_var['VOLCFNET_ACRE_VARIANCE'][0]**0.5:,.1f}")

---

## 7. Summary Comparison

Let's create a summary of all our estimates for Rhode Island.

In [None]:
with FIA(db_path) as db:
    db.clip_most_recent()
    
    # Run all estimators
    area_result = area(db)
    vol_result = volume(db)
    bio_result = biomass(db)
    tpa_result = tpa(db)

print("="*50)
print("RHODE ISLAND FOREST INVENTORY SUMMARY")
print("="*50)
print(f"\nForest Area:       {area_result['AREA_TOTAL'][0]:>15,.0f} acres")
print(f"Net Volume:        {vol_result['VOLCFNET_TOTAL'][0]/1e6:>15,.1f} million cu ft")
print(f"Total Biomass:     {bio_result['BIOMASS_TOTAL'][0]/1e6:>15,.2f} million short tons")
print(f"Carbon Stock:      {bio_result['CARBON_TOTAL'][0]/1e6:>15,.2f} million short tons")
print(f"Total Trees:       {tpa_result['TPA_TOTAL'][0]/1e6:>15,.0f} million")
print("\nPer-Acre Metrics:")
print(f"  Volume:          {vol_result['VOLCFNET_ACRE'][0]:>10,.0f} cu ft/acre")
print(f"  Biomass:         {bio_result['BIOMASS_ACRE'][0]:>10,.1f} short tons/acre")
print(f"  Trees:           {tpa_result['TPA_ACRE'][0]:>10,.0f} trees/acre")
print(f"  Basal Area:      {tpa_result['BAA_ACRE'][0]:>10,.0f} sq ft/acre")
print("="*50)

---

## Exercise 1: Species Composition Analysis

**Task**: Create a comprehensive species composition analysis.

1. Get volume by species
2. Add species names
3. Calculate each species' percentage of total volume
4. Display top 10 species with their percentage contribution

**Hint**: Use `VOLCFNET_TOTAL` and calculate percentages with Polars.

In [None]:
# Your code here


<details>
<summary><b>Click to reveal solution</b></summary>

```python
with FIA(db_path) as db:
    db.clip_most_recent()
    result = volume(db, by_species=True)

# Add names
result_named = join_species_names(result, "SPCD")

# Calculate percentage
total_volume = result_named["VOLCFNET_TOTAL"].sum()
result_pct = result_named.with_columns(
    (pl.col("VOLCFNET_TOTAL") / total_volume * 100).alias("PCT_VOLUME")
).sort("VOLCFNET_TOTAL", descending=True)

# Display top 10
top_10 = result_pct.head(10)
display_estimate(
    top_10.select(["SPCD_NAME", "VOLCFNET_TOTAL", "PCT_VOLUME", "N_TREES"]),
    title="Species Composition by Volume"
)

# Summary
top_10_pct = top_10["PCT_VOLUME"].sum()
print(f"\nTop 10 species account for {top_10_pct:.1f}% of total volume")
```

</details>

---

## Exercise 2: Carbon Density Analysis

**Task**: Compare carbon density (short tons per acre) across different forest types.

1. Get biomass by forest type
2. Add forest type names
3. Find which forest types store the most carbon per acre
4. Create a horizontal bar chart of top 10

**Hint**: Use `biomass(db, grp_by="FORTYPCD")` and `CARBON_ACRE`

In [None]:
# Your code here


<details>
<summary><b>Click to reveal solution</b></summary>

```python
with FIA(db_path) as db:
    db.clip_most_recent()
    result = biomass(db, grp_by="FORTYPCD")

# Add forest type names
result_named = join_forest_type_names(result, "FORTYPCD")

# Sort by carbon per acre and get top 10
top_carbon = result_named.sort("CARBON_ACRE", descending=True).head(10)

display_estimate(
    top_carbon.select(["FORTYPCD_NAME", "CARBON_ACRE", "CARBON_TOTAL", "AREA_TOTAL"]),
    title="Forest Types with Highest Carbon Density"
)

# Visualize
fig = plot_by_category(
    top_carbon,
    category_col="FORTYPCD_NAME",
    value_col="CARBON_ACRE",
    title="Carbon Density by Forest Type (Top 10)",
    xlabel="Carbon (short tons/acre)",
    color="#1565C0"
)
plt.show()
```

</details>

---

## Summary

In this notebook, you learned:

1. **`area()`** - Estimate forest and timberland area
   - Group by `FORTYPCD`, `OWNGRPCD`, `STDSZCD`, etc.
   
2. **`volume()`** - Estimate tree volume
   - Volume types: `net`, `gross`, `sound`, `sawlog`
   - Tree types: `live`, `dead`, `all`
   
3. **`biomass()`** - Estimate biomass and carbon
   - Components: `AG`, `BG`, `TOTAL`
   - Carbon ≈ 47% of biomass
   
4. **`tpa()`** - Estimate trees per acre and basal area
   - `by_size_class=True` for diameter distribution
   
5. **Grouping** - Use `grp_by` with single or multiple columns
6. **Reference names** - `join_species_names()`, `join_forest_type_names()`
7. **Variance** - Use `variance=True` for variance instead of SE

## Next Steps

Continue to **Notebook 3: Domain Filtering and Grouping** to learn:
- `tree_domain` expressions for filtering trees
- `area_domain` expressions for filtering conditions
- `plot_domain` expressions for geographic filtering
- Building complex custom analyses