# Case Study 1 - Evaluate impact from environmental events/pressures
## Description 
I want to aggregate observations of _Caladenia_ orchids in the ACT so I can analyse how the records for each species relate to the protection status and vegetation cover of the locations where it was observed.
## Case Breakdown 
- **Actors:** Plant researcher
- **Goals:** Compare the abundance of wild orchid flowers based on habitat type and protection status
- **Scope:** Regional, 30-year aggregated
## Generalised case
I want to combine a suite of spatial variables at different scales across multiple sites so I can analyse the factors correlated with a variable of interest.
## Comparable cases
- As a grains researcher, I want to aggregate spatial data for frost and other extreme weather events associated with chickpeas and wheat grown at Merredin and other sites in Western Australia, so I can analyse the effects of such events on different varieties at different stages and advise growers on the best choices. 
- I want to aggregate iMapPests data for the same pest across multiple sites and locations so I can analyse the relationship between population levels and environmental context at the time and over the previous month, including weather (temperature, rainfall, humidity - all xyt), lunar phase (t) and greenness (xyt - see https://portal.tern.org.au/metadata/TERN/8542d90e-6e20-4ad8-b30d-0a171b61d3f5).
## Stakeholders 
- **Name:** Martin Westgate (Atlas of Living Australia)
- **Contact:** martin.westgate@csiro.au


## Data Sources
The implementation uses the following data sources (all in the source_data subfolder):
- **vegetation_cover_northern.tif** - GA Landsat Vegetation Cover GeoTIFF at 25 m resolution for the northern two thirds of the ACT and adjacent NSW in 2020: https://explorer.dea.ga.gov.au/products/ga_ls_landcover_class_cyear_2/datasets/67bb9d38-00c7-46ba-a5e9-b892d9f9ad42 (values defined here: https://knowledge.dea.ga.gov.au/data/product/dea-land-cover-landsat/?tab=details)
- **vegetation_cover_southern.tif** - GA Landsat Vegetation Cover GeoTIFF at 25 m resolution for the southern third of the ACT and adjacent NSW in 2020: https://explorer.dea.ga.gov.au/products/ga_ls_landcover_class_cyear_2/datasets/464fd7e2-0554-4095-80e1-42e00f087831
- **boundary_act.geojson** - 2023 boundary for the Australian Capital Territory from the ACT Government in GeoJSON format: https://actmapi-actgov.opendata.arcgis.com/datasets/ACTGOV::actgov-border/explore
- **capad_act.geojson** - Protected Area data for the Australian Capital Territory in 2022 from the CAPAD dataset in GeoJSON format: https://fed.dcceew.gov.au/datasets/ec356a872d8048459fe78fc80213dc70_0/explore?filters=eyJTVEFURSI6WyJBQ1QiXX0%3D&location=-35.437128%2C149.203518%2C11.00
- **caladenia_act.csv** - Distribution records for orchids in the genus _Caladenia_ between 1990 and present from the ALA in CSV format: https://doi.org/10.26197/ala.1e501311-7077-403b-a743-59e096068fa0

## Imports

In [None]:
import xarray as xr
import pandas as pd
import numpy as np
import os
import shutil
import pystac
import pystac_client
import subprocess
import json

from stac_generator import StacGeneratorFactory
from stac_generator.core.base.generator import StacSerialiser
from stac_generator.core.base.schema import StacCollectionConfig, ColumnInfo
from stac_generator.core.raster.schema import RasterConfig, BandInfo
from stac_generator.core.vector.schema import VectorConfig
from stac_generator.core.point.schema import PointConfig

from mccn.client import MCCN
import matplotlib.pyplot as plt

## Data Catalog
The cell below writes a STAC configuration for each of the data sources listed under Data Sources (all in the source_data subfolder), splits the _Caladenia_ records into one CSV file per species, and serialises the resulting STAC collection to scratch/generated.

In [None]:
# Paths

current_folder = os.getcwd()
cover_n_source = os.path.join(current_folder, "source_data/vegetation_cover_northern.tif")
cover_s_source = os.path.join(current_folder, "source_data/vegetation_cover_southern.tif")
caladenia_source = os.path.join(current_folder, "source_data/caladenia_act.csv")
boundary_source = os.path.join(current_folder, "source_data/boundary_act.geojson")
capad_source = os.path.join(current_folder, "source_data/capad_act.geojson")
scratch_folder = os.path.join(current_folder, "scratch")
cover_n_configuration_filename = os.path.join(scratch_folder, "cover_n_config.json")
cover_s_configuration_filename = os.path.join(scratch_folder, "cover_s_config.json")
boundary_configuration_filename = os.path.join(scratch_folder, "boundary_config.json")
capad_configuration_filename = os.path.join(scratch_folder, "capad_config.json")

# Create the scratch folder if it does not already exist
os.makedirs(scratch_folder, exist_ok=True)

collection_config = StacCollectionConfig(
    id="CaladeniaStudy",
    title="Datasets for Caladenia case study",
    description="STAC records for accessing datasets to explore as part of the MCCN case study 1 relating to distribution of orchids in the genus Caladenia in the northern ACT",
    license="CC-BY-4.0",
)

configurations = []

boundary_configuration = VectorConfig(
    id="ACT_Boundary",
    location=boundary_source,
    collection_date="2024-12-31",
    collection_time="00:00:00"
)
with open(boundary_configuration_filename, "w") as f:
    f.write(boundary_configuration.model_dump_json(exclude_none=True))
configurations.append(boundary_configuration_filename)

capad_configuration = VectorConfig(
    id="ACT_CAPAD",
    location=capad_source,
    collection_date="2024-12-31",
    collection_time="00:00:00",
    column_info=[
        ColumnInfo(name="PA_ID"),
    ]
)
with open(capad_configuration_filename, "w") as f:
    f.write(capad_configuration.model_dump_json(exclude_none=True))
configurations.append(capad_configuration_filename)

cover_n_configuration = RasterConfig(
    id="Vegetation_Cover_Northern",
    location=cover_n_source,
    collection_date="2024-12-31",
    collection_time="00:00:00",
    band_info=[
        BandInfo(name="band_1", description="Vegetation cover level")
    ]
)
with open(cover_n_configuration_filename, "w") as f:
    f.write(cover_n_configuration.model_dump_json(exclude_none=True))
configurations.append(cover_n_configuration_filename)

cover_s_configuration = RasterConfig(
    id="Vegetation_Cover_Southern",
    location=cover_s_source,
    collection_date="2024-12-31",
    collection_time="00:00:00",
    band_info=[
        BandInfo(name="band_2", description="Vegetation cover level")
    ]
)
with open(cover_s_configuration_filename, "w") as f:
    f.write(cover_s_configuration.model_dump_json(exclude_none=True))
configurations.append(cover_s_configuration_filename)

# Read caladenia_source, discard all columns but the scientific name and coordinates, and drop all records without complete coordinates

caladenia = pd.read_csv(caladenia_source, encoding="UTF8")[["scientificName", "decimalLatitude", "decimalLongitude"]]
caladenia = caladenia.dropna(subset=["decimalLatitude", "decimalLongitude"])

# Generate separate CSV files for each species with 10 or more observations (filenames held in species_files)
species = caladenia.rename(columns= {'decimalLatitude':'count'}).groupby("scientificName")["count"].count().reset_index()
species = species[species["count"] >= 10]["scientificName"].tolist()

species_files = [os.path.join(scratch_folder, s.replace(" ", "_") + ".csv") for s in species]

for s, sf in zip(species, species_files):
    s_underscore = s.replace(" ", "_")
    caladenia_subset = caladenia.loc[caladenia["scientificName"] == s].copy()
    caladenia_subset[s_underscore] = 1.0
    caladenia_subset.to_csv(sf, encoding="utf8")

    caladenia_configuration = PointConfig(
        id=s_underscore,
        location=sf,
        collection_date="2024-12-31",
        collection_time="00:00:00",
        X="decimalLongitude",
        Y="decimalLatitude",
        column_info=[
            ColumnInfo(name=s_underscore, description=f"{s} reported as present"),
        ]
    )
    caladenia_configuration_filename = os.path.join(scratch_folder, f"{s_underscore}.json")
    with open(caladenia_configuration_filename, "w") as f:
        f.write(caladenia_configuration.model_dump_json(exclude_none=True))
    configurations.append(caladenia_configuration_filename)

generator = StacGeneratorFactory.get_stac_generator(
    source_configs=configurations,
    collection_config=collection_config
)

serialiser = StacSerialiser(generator, "scratch/generated") # Replace with path to generated
serialiser()
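
The species filtering step in the cell above can be tried in isolation. The following is a minimal sketch on a hypothetical toy table (the species names, coordinates and the threshold of 2 are illustrative assumptions; the notebook reads caladenia_act.csv and uses a threshold of 10). It uses `value_counts` rather than the notebook's `groupby`, which should give the same counts.

```python
import pandas as pd

# Hypothetical toy records standing in for caladenia_act.csv
records = pd.DataFrame({
    "scientificName": ["Caladenia a"] * 3 + ["Caladenia b"] * 1 + ["Caladenia c"] * 2,
    "decimalLatitude": [-35.3, -35.4, None, -35.2, -35.5, -35.6],
    "decimalLongitude": [149.1, 149.0, 149.2, None, 149.1, 149.3],
})

# Keep only rows where both coordinates are present
complete = records.dropna(subset=["decimalLatitude", "decimalLongitude"])

# Count records per species and keep those meeting the threshold
min_records = 2  # the notebook uses 10; 2 keeps the toy example small
counts = complete["scientificName"].value_counts()
kept_species = sorted(counts[counts >= min_records].index)
print(kept_species)
```

Filtering before counting matters here: a species is only retained if it has enough records with usable coordinates.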

## DataCube Generation
Load data for all STAC items for the vegetation cover, ACT boundary, CAPAD shapes and orchid species records into a new data cube, using the boundary, CRS and shape from the vegetation cover layer.

In [None]:
endpoint = "scratch/generated/collection.json"
collection = "CaladeniaStudy"
client = MCCN(endpoint, collection, shape=(200,400), point_nodata=0)

ds = client.load()

In [None]:
ds

In [None]:
ds = ds.isel(time=0)
ds

In [None]:
ds.ACT_Boundary.plot(x="x")

In [None]:
ds.PA_ID.plot(x="x")

In [None]:
ds["vegetation_n"] = ds.band_1.where(ds.band_1 < 255, 0)
ds["vegetation_s"] = ds.band_2.where(ds.band_2 < 255, 0)
ds["vegetation"] = ds.vegetation_n.where(ds.vegetation_n > 0, ds.vegetation_s)
ds.vegetation.plot(x="x")
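
The merge of the two vegetation tiles can be illustrated on a small grid. This is a sketch under stated assumptions, not the notebook's exact logic: the toy arrays are hypothetical, 255 is assumed to be the fill value outside each tile, and the merge uses `fillna` instead of the `where(... > 0, ...)` test above. Unlike the cell above, it keeps a genuine class 0 from the northern tile rather than falling through to the southern one; the two approaches agree when the tiles do not overlap.

```python
import numpy as np
import xarray as xr

NODATA = 255  # fill value assumed for pixels outside a tile

# Two hypothetical 2x3 tiles: each covers one row of the grid, NODATA elsewhere
north = xr.DataArray(np.array([[10, 12, 0], [NODATA, NODATA, NODATA]]), dims=("y", "x"))
south = xr.DataArray(np.array([[NODATA, NODATA, NODATA], [13, 0, 16]]), dims=("y", "x"))

# Treat NODATA as missing, then take the first tile that has a value
north_valid = north.where(north != NODATA)
south_valid = south.where(south != NODATA)
mosaic = north_valid.fillna(south_valid).fillna(0).astype(int)
print(mosaic.values)
```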

In [None]:

ds["protected"] = ds.PA_ID.where(ds.PA_ID == 0, 1)
ds.protected.plot(x="x")


In [None]:
species_keys = [k for k in list(ds.keys()) if k.startswith("Caladenia")]

ds[species_keys] = ds[species_keys].where(ds.ACT_Boundary > 0, 0).astype(int)
ds[species_keys].to_array().plot(x="x", col="variable", col_wrap=3)

In [None]:
ds[species_keys].to_array().sum("variable").plot(x="x")
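
The boundary masking and the per-cell species count in the two cells above can be reproduced on a toy dataset. The grid, the variable names and the values below are hypothetical; only the pattern of `where`, `astype`, `to_array` and `sum` mirrors the notebook.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x3 grid: boundary is 1 inside the study area, 0 outside
boundary = xr.DataArray(np.array([[1, 1, 0], [1, 0, 0]]), dims=("y", "x"))
toy = xr.Dataset({
    "Caladenia_a": (("y", "x"), np.array([[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]])),
    "Caladenia_b": (("y", "x"), np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])),
})

# Zero out presences outside the boundary and store them as integers
masked = toy.where(boundary > 0, 0).astype(int)

# Number of species recorded in each cell
richness = masked.to_array().sum("variable")
print(richness.values)
```

Presences that fall outside the boundary (the top-right cell for the first species, the bottom-middle cell for the second) no longer contribute to the count.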

## Data Analysis/Visualisation
1. Visualise all layers in a grid
2. Crop all layers to boundary in boundary_source layer
3. Generate and display a pivot table with orchid species as rows, vegetation cover levels as columns, and the affinity between each species and cover level (observed/expected) as values
4. Generate and display a pivot table with orchid species as rows, inclusion in and exclusion from CAPAD areas as two columns, and the affinity between each species and protection status (observed/expected) as values
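
The observed/expected measure in steps 3 and 4 can be written out explicitly. The symbols are introduced here for illustration and do not appear in the notebook: $N$ is the number of pixels inside the ACT boundary, $N_s$ the number of those pixels where species $s$ is recorded, $N_c$ the number in category $c$ (a vegetation cover level or a protection status), and $O_{s,c}$ the number where both hold.

$$
E_{s,c} = \frac{N_s \, N_c}{N}, \qquad A_{s,c} = \frac{O_{s,c}}{E_{s,c}}
$$

$E_{s,c}$ is the count expected if the species were distributed independently of the category, so an affinity $A_{s,c}$ above 1 suggests the species is recorded in that category more often than its area alone would predict, and a value below 1 suggests the opposite.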

The vegetation cover levels are as follows:
- 0: Not applicable (such as in bare areas)
- 10: Closed (>65 %)
- 12: Open (40 to 65 %)
- 13: Open (15 to 40 %)
- 15: Sparse (4 to 15 %)
- 16: Scattered (1 to 4 %)

For each species, report the percentage of records found in protected areas (included in CAPAD) for 1) the vegetation cover level with the most records, and 2) all vegetation cover levels.
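
One way to compute the percentages described above is sketched below for a single species. The arrays are hypothetical flattened pixels inside the study area, and the helper `protected_share` is an illustrative name, not part of the notebook. The sketch counts occupied pixels rather than individual records, on the assumption that presence is stored as 0/1 per pixel as in the data cube.

```python
import numpy as np

# Hypothetical flattened study-area pixels (all inside the boundary)
vegetation = np.array([10, 10, 12, 12, 12, 13, 13, 0])  # cover class per pixel
protected = np.array([0, 0, 1, 1, 0, 0, 1, 0])          # 1 = inside CAPAD
presence = np.array([1, 1, 1, 1, 1, 0, 1, 0])           # 1 = species recorded

def protected_share(mask):
    """Percentage of occupied pixels within `mask` that are protected."""
    occupied = (presence == 1) & mask
    return 100.0 * (protected[occupied] == 1).sum() / occupied.sum()

# Cover class holding the most occupied pixels
classes, counts = np.unique(vegetation[presence == 1], return_counts=True)
modal_class = classes[counts.argmax()]

share_modal = protected_share(vegetation == modal_class)
share_all = protected_share(np.ones_like(presence, dtype=bool))
print(modal_class, round(share_modal, 1), round(share_all, 1))
```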

In [None]:
# Dictionary of vegetation cover levels

levels = {
    0: "Not applicable (such as in bare areas)",
    10: "Closed (>65 %)",
    12: "Open (40 to 65 %)",
    13: "Open (15 to 40 %)",
    15: "Sparse (4 to 15 %)",
    16: "Scattered (1 to 4 %)",
}

ds["all_ones"] = 1

study_pixels = ds.ACT_Boundary.sum().item()

# Order the cover levels with "Not applicable" (0) last, keeping all six levels
level_keys = sorted(levels)
level_keys.append(level_keys.pop(0))
vegetation_affinity = pd.DataFrame(index=(["Total pixels"] + species_keys), columns=(["Total pixels"] + [levels[l] for l in level_keys]))
vegetation_affinity.loc["Total pixels", ["Total pixels"]] = study_pixels

protection_affinity = pd.DataFrame(index=(["Total pixels"] + species_keys), columns=(["Total pixels", "protected", "unprotected"]))
protection_affinity.loc["Total pixels", ["Total pixels"]] = study_pixels

level_pixels = {}
for l in level_keys:
    level_pixels[l] = ds.ACT_Boundary.where(ds.ACT_Boundary == 0, ds.all_ones.where(ds.vegetation == l, 0)).sum().item()
    vegetation_affinity.loc["Total pixels", [levels[l]]] = level_pixels[l]

protection_pixels = {}
protection_pixels["protected"] = ds.ACT_Boundary.where(ds.ACT_Boundary == 0, ds.protected).sum().item()
protection_affinity.loc["Total pixels", ["protected"]] = protection_pixels["protected"]
protection_pixels["unprotected"] = study_pixels - protection_pixels["protected"]
protection_affinity.loc["Total pixels", ["unprotected"]] = protection_pixels["unprotected"]

species_pixels = {}
for s in species_keys:
    species_pixels[s] = ds.ACT_Boundary.where(ds.ACT_Boundary == 0, ds.all_ones.where(ds[s] == 1, 0)).sum().item()
    vegetation_affinity.loc[s, ["Total pixels"]] = species_pixels[s]
    protection_affinity.loc[s, ["Total pixels"]] = species_pixels[s]

    if species_pixels[s] > 0:
        for l in level_keys:
            observed = ds.ACT_Boundary.where(ds.ACT_Boundary == 0, ds[s].where(ds[s] == 0, ds.all_ones.where(ds.vegetation == l, 0))).sum().item()
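            expected = (species_pixels[s] * level_pixels[l]) / study_pixels
            # Leave the cell as NaN when no pixels in the study area have this cover level
            vegetation_affinity.loc[s, [levels[l]]] = observed / expected if expected > 0 else np.nan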

        observed_protected = ds.ACT_Boundary.where(ds.ACT_Boundary == 0, ds[s].where(ds[s] == 0, ds.all_ones.where(ds.protected == 1, 0))).sum().item()
        expected_protected = (species_pixels[s] * protection_pixels["protected"]) / study_pixels
        protection_affinity.loc[s, ["protected"]] = observed_protected / expected_protected
        observed_unprotected = species_pixels[s] - observed_protected
        expected_unprotected = (species_pixels[s] * protection_pixels["unprotected"]) / study_pixels
        protection_affinity.loc[s, ["unprotected"]] = observed_unprotected / expected_unprotected

vegetation_affinity = vegetation_affinity.drop(vegetation_affinity[vegetation_affinity["Total pixels"] == 0].index)
vegetation_affinity


In [None]:

protection_affinity = protection_affinity.drop(protection_affinity[protection_affinity["Total pixels"] == 0].index)
protection_affinity
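
To help read the values in the two tables, the following sketch computes the observed/expected ratio for a single species on a hypothetical eight-pixel study area. The function name `affinity` and the toy arrays are assumptions for illustration; the notebook computes the same ratio with nested `where` calls on the data cube.

```python
import numpy as np

def affinity(presence, category):
    """Observed/expected ratio for a species within one category of pixels."""
    n_total = presence.size
    observed = np.logical_and(presence, category).sum()
    expected = presence.sum() * category.sum() / n_total
    # An empty category has no expected count, so report NaN instead of dividing by zero
    return observed / expected if expected > 0 else float("nan")

# Hypothetical study area: the species occupies 3 of 8 pixels, 3 pixels are protected
presence = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)
protected = np.array([1, 1, 0, 1, 0, 0, 0, 0], dtype=bool)

print(affinity(presence, protected))   # above 1: over-represented in protected pixels
print(affinity(presence, ~protected))  # below 1: under-represented in unprotected pixels
```

Here two of the three occupied pixels are protected although protected pixels make up only three of the eight, so the ratio for the protected category exceeds 1.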


## Cleanup

In [None]:
# Clean up scratch folder

shutil.rmtree(scratch_folder)