### Importing Necessary Dependencies

In [1]:
import pandas as pd

### Loading Our Data

In [2]:
FIELD_DATA_PATH = "../../data/field-data/nepal_forest_agb.csv"

In [3]:
df = pd.read_csv(FIELD_DATA_PATH)
# Shape of the cleaned dataframe
print(f"rows = {df.shape[0]}\ncolumns = {df.shape[1]}")
df.head()

rows = 2038
columns = 5


Unnamed: 0,plot_id,lon,lat,AGB_tha,SOC_tha
0,10-75-3,80.413257,28.870558,454.729757,26.133474
1,10-75-4,80.416348,28.867863,,2.908981
2,10-75-6,80.416333,28.870571,499.730683,18.360387
3,10-84-4,80.405569,29.192646,367.626079,78.487313
4,10-92-3,80.392803,29.484019,11.786484,


### Data Understanding

> Dataset Description 

*This dataset includes georeferenced plot-level forest carbon estimates for Nepal as a supplement to a data paper submitted to the Scientific Data journal. The dataset is based on field observations from Nepal's national forest inventory from 2010 to 2014 and includes estimates for two major forest carbon pools: aboveground biomass (AGB) and soil organic carbon (SOC) stocks from 2,009 and 1,156 inventory plots, respectively. The forest AGB includes all trees within each sample plot, including standing dead and stumps, while the forest SOC stock includes the organic carbon in the 0-30 cm depth. The organic litter was excluded from the plot-level estimates of SOC due to missing records for many plots. Other analyses have reported a very low contribution of litter to the total carbon stocks in Nepal's forests (<1%).*
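The description notes that AGB and SOC are available for different numbers of plots, so some rows carry only one of the two values. A minimal sketch of how that coverage could be checked is shown below. It uses a toy frame (the hypothetical name `sample`) built from the five rows in the `df.head()` preview rather than the full CSV, so the counts it produces apply only to those five rows.

```python
import pandas as pd

# Toy frame copied from the df.head() preview above; the real data comes from FIELD_DATA_PATH.
sample = pd.DataFrame(
    {
        "plot_id": ["10-75-3", "10-75-4", "10-75-6", "10-84-4", "10-92-3"],
        "lon": [80.413257, 80.416348, 80.416333, 80.405569, 80.392803],
        "lat": [28.870558, 28.867863, 28.870571, 29.192646, 29.484019],
        "AGB_tha": [454.729757, None, 499.730683, 367.626079, 11.786484],
        "SOC_tha": [26.133474, 2.908981, 18.360387, 78.487313, None],
    }
)

# Number of rows with a recorded value in each carbon pool
coverage = sample[["AGB_tha", "SOC_tha"]].notna().sum()
print(coverage)

# Number of rows where both pools were measured
both = int(sample[["AGB_tha", "SOC_tha"]].notna().all(axis=1).sum())
print(f"rows with both AGB and SOC: {both}")
```

Running the same two expressions on `df` would give the per-pool counts for the full dataset.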

> Feature Summary

| Feature   | Description                                                                                                | Purpose / Use                                                                             |
| --------- | ---------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- |
| `plot_id` | Unique identifier assigned to each forest inventory plot                                                   | Used to uniquely reference, join, and track individual plots across analyses              |
| `lon`     | Longitude of the plot location (decimal degrees, WGS84)                                                    | Enables spatial mapping, GIS analysis, and geographic visualization                       |
| `lat`     | Latitude of the plot location (decimal degrees, WGS84)                                                     | Enables spatial mapping, GIS analysis, and geographic visualization                       |
| `AGB_tha` | Aboveground Biomass carbon stock (tonnes per hectare); includes all trees, standing dead trees, and stumps | Used to quantify aboveground forest carbon storage and assess biomass-based carbon stocks |
| `SOC_tha` | Soil Organic Carbon stock (tonnes per hectare) for the 0–30 cm soil depth; excludes organic litter         | Used to estimate belowground carbon storage and analyze soil carbon contribution          |


> Important Notes:

- Each row in our dataset corresponds to one subplot, not an entire plot or cluster.
- AGB and SOC values are already expressed per hectare, so the exact subplot area is not needed to rescale them; it matters mainly when choosing the scale and buffer size used to match plots with satellite data.
- The cluster area is typically unknown or variable, unless you have a forest boundary shapefile.
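To make the buffering note concrete, here is a rough sketch of turning a plot coordinate into a small bounding box that could be used to sample satellite pixels. The helper `buffer_bbox` and the 20 m radius are hypothetical choices for illustration, not values taken from the dataset, and the conversion uses a simple metres-per-degree approximation rather than a proper projection.

```python
import math


def buffer_bbox(lon, lat, radius_m):
    """Return (min_lon, min_lat, max_lon, max_lat) for a square box around a WGS84 point.

    Uses a flat approximation: about 111,320 m per degree of latitude,
    with longitude degrees shrinking by cos(latitude).
    """
    m_per_deg_lat = 111_320.0
    m_per_deg_lon = m_per_deg_lat * math.cos(math.radians(lat))
    dlat = radius_m / m_per_deg_lat
    dlon = radius_m / m_per_deg_lon
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


# Assumed buffer radius for illustration only; replace with the actual subplot radius.
HYPOTHETICAL_RADIUS_M = 20.0

# Coordinates of the first row in the df.head() preview
bbox = buffer_bbox(80.413257, 28.870558, HYPOTHETICAL_RADIUS_M)
print(bbox)
```

For production work, reprojecting to a metric CRS (for example with a GIS library) and buffering there would be more accurate than this approximation.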

> Understanding Clusters, Plots and Sub-plots

*`plot_id` is a composite hierarchical identifier used in Nepal’s National Forest Inventory to uniquely locate and organize field measurements.*

In [4]:
# Sample one plot_id
plot_id_samp = df["plot_id"].sample(n=1).iloc[0]

# Split the composite ID
cluster_id, cluster_plot_id, cluster_plot_subplot_id = plot_id_samp.split("-")

# Print in required format
print(f"cluster_id: {cluster_id}")
print(f"cluster_plot_id: {cluster_plot_id}")
print(f"cluster_plot_subplot_id: {cluster_plot_subplot_id}")

cluster_id: 191
cluster_plot_id: 32
cluster_plot_subplot_id: 5
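The same split can be applied to every row at once instead of to a single sampled ID. The sketch below does this on a small stand-in series (the hypothetical name `ids`) built from the `plot_id` values in the `df.head()` preview. It assumes every ID has exactly three hyphen-separated parts and that each part is numeric, which holds for the values shown but has not been verified for the full dataset.

```python
import pandas as pd

# Stand-in for df["plot_id"], using the IDs from the df.head() preview
ids = pd.Series(["10-75-3", "10-75-4", "10-75-6", "10-84-4", "10-92-3"], name="plot_id")

# Split each composite ID into its three components, one column per level
parts = ids.str.split("-", expand=True)
parts.columns = ["cluster_id", "cluster_plot_id", "cluster_plot_subplot_id"]

# Assumption: all components are integers; drop this line if any ID contains letters
parts = parts.astype(int)

# Count how many subplots fall in each cluster
subplots_per_cluster = parts.groupby("cluster_id").size()
print(parts)
print(subplots_per_cluster)
```

With the real data, `pd.concat([df, parts], axis=1)` would attach the three columns to the original frame, provided `parts` was derived from `df["plot_id"]` so the indexes align.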


<img src="../../images/plot-hierarchy.png" alt="Plot Hierarchy" width="1000"/>