# 📊 DemographyToolkit - Unit Test Documentation

This notebook explains the purpose and coverage of the unit tests written for the `DemographyToolkit` class in Hera.
Each test is described in detail along with the logic of the toolkit methods it verifies.


## 🧰 Overview: `DemographyToolkit`

The `DemographyToolkit` is responsible for managing demographic data, including loading population shapefiles, analyzing intersections with custom polygons, and saving derived areas.

### Core Methods

- `loadData(...)`: Loads a shapefile or GeoJSON into the database as a population source.
- `createNewArea(...)`: Generates a new area (polygon) and summarizes the intersecting population from a data source.
- `calculatePopulationInPolygon(...)`: Estimates the population inside a specified polygon by weighting each intersecting feature by its area fraction.
- `setDefaultDirectory(...)`: Sets, and optionally creates, the default directory in which shapefiles are saved.


## ✅ Test: `test_calculatePopulationInPolygon_basic`

**Purpose:**  
Validates that the function returns a non-empty result for a polygon that intersects existing data.

**What it does:**  
- Constructs a buffered rectangle around the first polygon.
- Calls `calculatePopulationInPolygon` to get population fractions.
- Asserts:
  - The result is not empty.
  - It contains expected columns: `geometry` and `areaFraction`.

**Why it matters:**  
Ensures the intersection logic and population weight computation work for overlapping geometries.


## 🧪 Test: `test_calculatePopulationInPolygon_partial_intersection`

**Purpose:**  
Checks correctness when a polygon intersects **multiple existing** features only partially.

**What it does:**  
- Unions two adjacent polygons and creates a buffer around their centroid.
- Ensures that multiple overlaps are detected and partial contributions are calculated correctly.

**Assertions:**  
- Result is non-empty.
- At least one intersecting region is returned.
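
The geometry construction described above can be sketched with Shapely alone. This is an illustration under stated assumptions, not the fixture used by the real test: the two unit squares, the buffer radius `0.25`, and all variable names are hypothetical, and the area fraction is taken to be the intersection area divided by the source feature's area.

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Two adjacent unit squares standing in for neighbouring population features
# (hypothetical data, not the fixture used by the real test).
left = box(0, 0, 1, 1)
right = box(1, 0, 2, 1)

# Union the neighbours and buffer the centroid of the union, so the query
# region straddles the shared edge and covers each square only in part.
query = unary_union([left, right]).centroid.buffer(0.25)

# Fraction of each feature's area that falls inside the query region
# (assumed definition of an area fraction).
fractions = [
    feature.intersection(query).area / feature.area
    for feature in (left, right)
]
```

Both fractions fall strictly between 0 and 1, which is the situation the test is meant to exercise: more than one feature is hit, and none is fully covered.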


## 🚫 Test: `test_calculatePopulationInPolygon_outside_bounds`

**Purpose:**  
Verifies that the function behaves correctly when the polygon is **outside the data bounds**.

**What it does:**  
- Builds a polygon far away from all population data.
- Ensures the returned GeoDataFrame is empty.

**Why this test is important:**  
Prevents false positives or errors on spatial mismatches.
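
A minimal sketch of the situation this test sets up, using Shapely directly. The coordinates and names are hypothetical; the point is only that a polygon far from every feature intersects none of them, so an empty result is the expected outcome rather than an error.

```python
from shapely.geometry import box

# Hypothetical population features clustered near the origin.
features = [box(0, 0, 1, 1), box(1, 0, 2, 1)]

# A query polygon placed far outside the extent of the data.
far_away = box(100, 100, 101, 101)

# No feature intersects the query, so nothing should be returned.
hits = [feature for feature in features if feature.intersects(far_away)]
```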


## 🛑 Test: `test_calculatePopulationInPolygon_invalid_datasource`

**Purpose:**  
Checks error handling when the data source name does not exist.

**What it does:**  
- Passes an invalid source name.
- Confirms that the code raises a `ValueError`.

**Why it's tested:**  
Verifies input validation and prevents silent failures.
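
The assertion pattern behind this kind of test can be sketched with `unittest` alone. The registry `KNOWN_SOURCES` and the function `resolve_source` are hypothetical stand-ins for the toolkit's lookup, introduced only to keep the example self-contained.

```python
import unittest

# Hypothetical registry standing in for the toolkit's registered data sources.
KNOWN_SOURCES = {"census_example"}


def resolve_source(name):
    """Return the name if it is registered; otherwise raise ValueError."""
    if name not in KNOWN_SOURCES:
        raise ValueError(f"Unknown data source: {name!r}")
    return name


class InvalidSourceTest(unittest.TestCase):
    def test_unknown_source_raises(self):
        # The test passes only if the call raises ValueError.
        with self.assertRaises(ValueError):
            resolve_source("no_such_source")


def run_invalid_source_test():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(InvalidSourceTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Using `assertRaises` as a context manager makes the test fail if the call returns normally, which is what guards against a silent failure.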


## 🧮 Test: `test_createNewArea_simple`

**Purpose:**  
Tests the creation of a new area and verifies population aggregation.

**How it works:**  
- Creates a rectangle covering the full extent of the dataset.
- Calls `createNewArea` with `TOOLKIT_SAVEMODE_NOSAVE`.
- Verifies that:
  - The result is a `nonDBMetadataFrame`.
  - The geometry and population data are present.
  - The total population matches the expected sum.

**Key aspect tested:**  
Correct aggregation and output structure when creating regions dynamically.


## 📁 Test: `test_setDefaultDirectory_creates_and_sets_path`

**Purpose:**  
Validates the behavior of setting and creating the default save directory.

**What it checks:**  
- A temporary folder is created.
- The internal attribute `_FilesDirectory` is updated.
- The directory exists on the filesystem.

**Why it's important:**  
Ensures that shapefiles are saved to a directory that actually exists, whichever machine the toolkit runs on.
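
The checks listed above can be sketched with the standard library. `DirectoryHolder` is a hypothetical stand-in, not the Hera implementation, and the `create` parameter name is an assumption; only the `_FilesDirectory` attribute name comes from the test description.

```python
import os
import tempfile


class DirectoryHolder:
    """Hypothetical stand-in for the toolkit's directory handling."""

    def __init__(self):
        self._FilesDirectory = None

    def setDefaultDirectory(self, path, create=True):
        # Normalise the path, create it on request, then remember it.
        path = os.path.abspath(path)
        if create:
            os.makedirs(path, exist_ok=True)
        self._FilesDirectory = path
        return path


def check_default_directory():
    """Return (attribute_updated, directory_exists) for a temporary target."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "shapefiles")
        holder = DirectoryHolder()
        holder.setDefaultDirectory(target, create=True)
        attribute_updated = holder._FilesDirectory == os.path.abspath(target)
        directory_exists = os.path.isdir(target)
        return attribute_updated, directory_exists
```

Building the path with `os.path.join` inside a `tempfile.TemporaryDirectory` keeps the check independent of the host's path separator and leaves nothing behind after the test.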


## ✅ Test: `test_calculatePopulationInPolygon_with_known_values`

**Purpose:**  
Validates `calculatePopulationInPolygon` against known geometry and population values in a synthetic dataset.

**What it verifies:**  
- Partial overlaps are handled via area fractions.
- Population is scaled in proportion to the overlapping area.
- The summed estimate lies strictly between 0 and 1500, the combined population of the two intersected features.

**Why it's important:**  
This test mirrors typical usage and ensures that area-weighted population interpolation works on data simple enough to check by hand.


In [None]:
def test_calculatePopulationInPolygon_with_known_values(self):
    from shapely.geometry import Polygon
    import geopandas as gpd

    print("🚀 Running test_calculatePopulationInPolygon_with_known_values")

    # 🔧 Create synthetic test data
    geometry = [
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),  # Partial overlap: 1 of 4 area units lies inside test_poly
        Polygon([(1, 1), (3, 1), (3, 3), (1, 3)]),  # Partial overlap: 2.25 of 4 area units lie inside test_poly
        Polygon([(5, 5), (6, 5), (6, 6), (5, 6)])  # No overlap
    ]
    total_pop = [1000, 500, 200]
    gdf = gpd.GeoDataFrame({'total_pop': total_pop, 'geometry': geometry}, crs="EPSG:4326")

    # 🟦 Create polygon that intersects the first two
    test_poly = Polygon([(1, 1), (2.5, 1), (2.5, 2.5), (1, 2.5)])

    # 🧪 Call function
    result = self.toolkit.analysis.calculatePopulationInPolygon(
        shapelyPolygon=test_poly,
        dataSourceOrData=gdf,
        populationTypes="total_pop"
    )

    # ✅ Validate result
    self.assertFalse(result.empty)
    total_estimated = result["total_pop"].sum()
    print(f"✅ Total estimated population in polygon: {total_estimated}")
    # Sanity bounds: positive, and below 1000 + 500 (the two intersected features)
    self.assertGreater(total_estimated, 0)
    self.assertLess(total_estimated, 1500)
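
Because every geometry in the synthetic dataset is an axis-aligned square, the expected estimate can be worked out by hand. The sketch below does that arithmetic in plain Python, assuming the toolkit weights each feature's population by intersection area divided by feature area. That weighting matches the `areaFraction` column name but is an assumption here, and the helper names are hypothetical.

```python
def rect_area(rect):
    """Area of an (xmin, ymin, xmax, ymax) rectangle; 0 if it is empty."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def rect_intersection(a, b):
    """Intersection of two axis-aligned rectangles (may be empty)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


# Same squares and populations as the synthetic GeoDataFrame in the test.
features = [
    ((0, 0, 2, 2), 1000),
    ((1, 1, 3, 3), 500),
    ((5, 5, 6, 6), 200),
]
query = (1, 1, 2.5, 2.5)  # bounding box of test_poly

# Share of each feature's area that lies inside the query rectangle.
fractions = [
    rect_area(rect_intersection(rect, query)) / rect_area(rect)
    for rect, _ in features
]

# Area-weighted population estimate (assumed weighting, see lead-in).
estimate = sum(f * pop for f, (_, pop) in zip(fractions, features))
```

Under that assumption the fractions are 0.25, 0.5625, and 0, giving 250 + 281.25 = 531.25, which sits inside the bounds of 0 and 1500 that the test asserts.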


## ✅ Summary

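Taken together, the tests check that:

- `calculatePopulationInPolygon` handles overlapping, partially overlapping, and non-overlapping polygons.
- An unknown data source name raises a `ValueError` instead of failing silently.
- `createNewArea` returns a `nonDBMetadataFrame` whose total population matches the expected sum.
- `setDefaultDirectory` creates the target directory and records it in `_FilesDirectory`.

> This unit test suite provides functional confidence in the `DemographyToolkit` methods covered above and serves as a base for expanding geospatial capabilities in Hera.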
