# METRICS.ipynb 
# Layer Classification Logic

---
## Overview
- food_desert.py
- food_swamp.py
- nutrition.py

# 1 - food_desert.py

This file is responsible for food desert classification logic.

The USDA identifies multiple Low Income-Low Access (LILA) conditions. A census tract is considered a food desert if it is both low income (poverty rate >= 20%, or median family income <= 80% of the state or area median) and low access (in urban tracts, >= 500 people or >= 33% of residents live more than 1 mile from a supermarket; in rural tracts, the same thresholds apply at more than 10 miles).
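
The rule above can be read as a pair of boolean tests. The following is a minimal sketch, not the project's code: the `Tract` record and its field names are hypothetical and do not correspond to USDA column names.

```python
from dataclasses import dataclass


@dataclass
class Tract:
    """Hypothetical tract record; field names are illustrative only."""
    poverty_rate: float             # percent of residents below the poverty line
    median_family_income: float     # tract median family income
    reference_median_income: float  # state or area median family income
    is_urban: bool
    population: int
    pop_beyond_1_mile: int          # residents more than 1 mile from a supermarket
    pop_beyond_10_miles: int        # residents more than 10 miles from a supermarket


def is_low_income(t: Tract) -> bool:
    # Poverty rate >= 20% OR income <= 80% of the reference median.
    return (
        t.poverty_rate >= 20.0
        or t.median_family_income <= 0.8 * t.reference_median_income
    )


def is_low_access(t: Tract) -> bool:
    # Urban tracts use the 1-mile count, rural tracts the 10-mile count.
    far = t.pop_beyond_1_mile if t.is_urban else t.pop_beyond_10_miles
    share = far / t.population if t.population else 0.0
    return far >= 500 or share >= 0.33


def is_food_desert(t: Tract) -> bool:
    # A tract must satisfy both conditions.
    return is_low_income(t) and is_low_access(t)
```

A tract that is low access but not low income (or the reverse) does not qualify under this reading.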

This file detects USDA food-access indicators and assigns a score based on how many indicators are present. A score of 3 means that there are 3 food desert indicators present. 

Future versions of this project will take into account vehicle access and senior-specific measures.

# 1.1 - Imports

This section is responsible for the necessary imports.

Annotations/type hints are imported for clarity. Pandas is used for data management.

In [None]:
from __future__ import annotations
import pandas as pd

# 1.2 - Compute Food Desert Scores

This function takes a USDA Food Access Research Atlas dataset, normalizes tract IDs, detects food-desert indicator columns, and produces a score per census tract. The function takes a Pandas dataframe `usda_df` (the USDA dataset) and returns a Pandas dataframe with three columns: `GEOID`, `is_food_desert` (the food desert indicator), and `desert_severity` (the number of indicators present).

Line-by-line breakdown:
- Create a safe copy.
- Create a placeholder for storing the detected tract ID column name.
- Iterate through several possible tract ID column names. When a valid column is detected in the input dataset, it is stored in the placeholder and the loop exits.
- If the loop exited without finding a tract ID column, then throw a clear KeyError.
- Otherwise, convert the tract ID column to strings and left-pad each value with zeros to a width of 11 (GEOIDs are 11 digits long).
- The `candidate_flags` list comprehension collects every column whose name contains LILA (low income, low access) or LOWACCESS, ignoring case. Both are standard USDA naming conventions for food desert indicator columns.
- If the USDA extract does not include any food desert flags, a safe default (severity 0, indicator False) is returned to avoid crashing the application.
- `tmp` stores a copy of only the indicator columns.
- Loop through each flag and convert to a numeric value.
- For each tract, count how many food desert conditions apply.
- Convert severity into a boolean value (any positive count makes the indicator column True).
- Return resulting dataframe containing GEOID, indicator column, and severity score.

In [None]:
def compute_food_desert_scores(usda_df: pd.DataFrame) -> pd.DataFrame:
    df = usda_df.copy()

    geoid_col = None
    for c in ["CensusTract", "GEOID", "geoid", "TRACTID", "CensusTractId"]:
        if c in df.columns:
            geoid_col = c
            break
    if geoid_col is None:
        raise KeyError("Could not find a GEOID/CensusTract column in USDA food access CSV.")

    df["GEOID"] = df[geoid_col].astype(str).str.zfill(11)

    candidate_flags = [c for c in df.columns if "LILA" in c.upper() or "LOWACCESS" in c.upper()]
    if not candidate_flags:
        df["desert_severity"] = 0
        df["is_food_desert"] = False
        return df[["GEOID", "is_food_desert", "desert_severity"]]

    tmp = df[candidate_flags].copy()
    for c in candidate_flags:
        tmp[c] = pd.to_numeric(tmp[c], errors="coerce").fillna(0)

    df["desert_severity"] = (tmp > 0).sum(axis=1).astype(int)
    df["is_food_desert"] = df["desert_severity"] > 0
    return df[["GEOID", "is_food_desert", "desert_severity"]]
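
To make the padding and counting steps concrete, the sketch below applies the same operations to a made-up three-tract extract. The flag column names follow the LILA pattern that the function searches for, but the values are invented for illustration.

```python
import pandas as pd

# Hypothetical extract: integer tract IDs and two indicator columns
# containing a mix of ints, strings, and a missing value.
raw = pd.DataFrame(
    {
        "CensusTract": [1001020100, 6037101110, 36061000100],
        "LILATracts_1And10": [1, 0, "1"],
        "LILATracts_halfAnd10": [1, 0, None],
    }
)

# Integer IDs lose their leading zero; zfill(11) restores it.
geoid = raw["CensusTract"].astype(str).str.zfill(11)

# Coerce each flag to a number, treating unparseable or missing values as 0.
flags = raw[["LILATracts_1And10", "LILATracts_halfAnd10"]].copy()
for col in flags.columns:
    flags[col] = pd.to_numeric(flags[col], errors="coerce").fillna(0)

# Count the positive flags in each row.
severity = (flags > 0).sum(axis=1).astype(int)

print(geoid.tolist())     # ['01001020100', '06037101110', '36061000100']
print(severity.tolist())  # [2, 0, 1]
```

One caveat: if the ID column was read as a float, `astype(str)` keeps a trailing `.0`, so the padded value is not a valid GEOID. Reading the column as a string, for example with `dtype={"CensusTract": str}` in `pd.read_csv`, avoids this.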


# 2 - food_swamp.py

This file is responsible for food swamp logic.

Food swamps do not have an official USDA classification, but a food swamp is usually considered to be a location where unhealthy food options greatly outnumber healthy ones. Food is readily accessible, but it is not nutritionally sufficient for the population.

For this application, the food swamp index for a particular census tract is `unhealthy outlets / (healthy outlets + 1)`. The +1 ensures that the denominator is never 0.
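
As a quick worked example of the formula, the helper below (a hypothetical name, used only for illustration) evaluates the index for a few outlet counts.

```python
def swamp_index(unhealthy: int, healthy: int) -> float:
    """Unhealthy outlets divided by (healthy outlets + 1)."""
    return unhealthy / (healthy + 1)


print(swamp_index(6, 2))  # 6 / 3 -> 2.0
print(swamp_index(4, 0))  # 4 / 1 -> 4.0, no division by zero
print(swamp_index(0, 3))  # 0 / 4 -> 0.0
```

Because of the +1, an index of 1.0 means that unhealthy outlets exceed healthy outlets by exactly one, not that the two counts are equal.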

# 2.1 - Imports

This section is responsible for the necessary imports.

Annotations/type hints are imported for clarity. Pandas is used for data management.

In [None]:
from __future__ import annotations
import pandas as pd

# 2.2 - Compute Food Swamp Index

This function computes a food swamp index for each census tract, based on the balance of unhealthy versus healthy food outlets. The function takes two Pandas dataframes, `healthy_points` and `unhealthy_points` (healthy and unhealthy food locations, each with a `GEOID` column). It returns a tract-level Pandas dataframe containing the outlet counts and the resulting `swamp_index`.

Line-by-line breakdown:
- Drop all healthy rows and unhealthy rows that do not have GEOIDs, group both by census tract, count rows per group and rename the results.
- Combine counts into a single table `df`, filling in 0 for tracts that appear in only one of the two inputs.
- Cast counts to integers.
- Calculate swamp index (unhealthy / (healthy + 1)).
- Return modified dataframe.

In [None]:
def compute_food_swamp_index(healthy_points: pd.DataFrame, unhealthy_points: pd.DataFrame) -> pd.DataFrame:
    h = healthy_points.dropna(subset=["GEOID"]).groupby("GEOID").size().rename("healthy_count")
    u = unhealthy_points.dropna(subset=["GEOID"]).groupby("GEOID").size().rename("unhealthy_count")
    df = pd.concat([h, u], axis=1).fillna(0).reset_index()
    df["healthy_count"] = df["healthy_count"].astype(int)
    df["unhealthy_count"] = df["unhealthy_count"].astype(int)
    df["swamp_index"] = df["unhealthy_count"] / (df["healthy_count"] + 1.0)
    return df
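
The sketch below repeats the grouping and combining steps on two small made-up inputs to show how tracts that appear in only one dataframe are handled. The GEOID values are invented for illustration.

```python
import pandas as pd

# Hypothetical outlet tables; one healthy row has no GEOID and is dropped.
healthy = pd.DataFrame({"GEOID": ["01001020100", "01001020100", None]})
unhealthy = pd.DataFrame({"GEOID": ["01001020100", "01001020200", "01001020200"]})

h = healthy.dropna(subset=["GEOID"]).groupby("GEOID").size().rename("healthy_count")
u = unhealthy.dropna(subset=["GEOID"]).groupby("GEOID").size().rename("unhealthy_count")

# Aligning on the GEOID index leaves NaN where a tract is missing from
# one side; fillna(0) turns those gaps into zero counts.
counts = pd.concat([h, u], axis=1).fillna(0).astype(int)
counts["swamp_index"] = counts["unhealthy_count"] / (counts["healthy_count"] + 1.0)

print(counts.loc["01001020200"].tolist())  # [0.0, 2.0, 2.0]
```

Tracts with no outlets in either input never enter the result, so a consumer that needs every tract has to add the missing ones itself.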

# 3 - nutrition.py

This file is responsible for nutrition layer logic. Currently, the nutrition layer scoring system does not exist. However, a stub function is implemented so that nutrition score columns exist in the application. This ensures that nothing downstream needs to change in order to implement the scoring logic in the future.

# 3.1 - Imports

This section is responsible for the necessary imports.

Annotations/type hints are imported for clarity. Pandas is used for data management.

In [None]:
from __future__ import annotations
import pandas as pd

# 3.2 - Compute Nutrition Scores Stub

This function is a stub implementation of a nutrition-scoring system. The nutrition-scoring system has not yet been devised, but this function will be updated to include the logic in the future. This function ensures that the stores dataframe will contain a nutrition score column, although all of the rows will have a score of 0 for the time being.

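The function accepts a Pandas dataframe `healthy_points`, which must contain `name`, `lat`, `lon`, and `GEOID` columns. The input is not modified; a copy with an additional `nutrition_score` column is returned, restricted to those columns. A second Pandas dataframe, `tract`, containing the mean nutrition score per GEOID, is also returned.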

Line-by-line breakdown:
- Create a safe copy of the input dataframe for store-level nutrition scores.
- Create new column of nutrition scores and set all entries to 0.
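- Create tract-level nutrition scores by averaging the scores of the stores that share each GEOID (stores without a GEOID are dropped).
- Return `stores` (limited to the name, location, GEOID, and score columns) and `tract`.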

In [None]:
def compute_nutrition_scores_stub(healthy_points: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    stores = healthy_points.copy()
    stores["nutrition_score"] = 0.0
    tract = (
        stores.dropna(subset=["GEOID"])
        .groupby("GEOID")["nutrition_score"]
        .mean()
        .rename("nutrition_score")
        .reset_index()
    )
    return stores[["name", "lat", "lon", "GEOID", "nutrition_score"]], tract
