# Case Study 1
## Description 
As a grains researcher, I want to aggregate spatial data for frost and other extreme weather events associated with chickpeas and wheat grown at Merredin and other sites in Western Australia, so I can analyse the effects of such events on different varieties at different stages and advise growers on the best choices. 
## Case Breakdown 
- **Actors:** Grains researcher
- **Goals:** Compare the effects of extreme weather on different grain crops at different growth stages
- **Scope:** Regional, plot-based
- **Language:** Python, R
- **Extra:** TBD
## Generalised case
I want to combine a suite of spatial variables at different scales across multiple sites so I can analyse the factors correlated with a variable of interest.
## Comparable cases
- I want to aggregate iMapPests data for the same pest across multiple sites and locations so I can analyse the relationship between population levels and environmental context at the time and over the previous month, including weather (temperature, rainfall, humidity - all xyt), lunar phase (t) and greenness (xyt - see https://portal.tern.org.au/metadata/TERN/8542d90e-6e20-4ad8-b30d-0a171b61d3f5).
- I want to aggregate observations of _Caladenia_ orchids in the ACT so I can analyse the relationship between records and the protection status of the locations of each species.
## Stakeholders 
- **Name:** TBD
- **Contact:** TBD@adelaide.edu.au


## Data Sources - <span style="color:red">DRAFT</span>
- **Reference:** https://appf-central.atlassian.net/l/cp/7yrLwqX1
### Data Source 1: SILO Climate Database
- **Description:** SILO Climate Database for environmental data.
- **Format:** TBD
- **Access Method:** https://www.longpaddock.qld.gov.au/silo/
### Data Source 2: Extreme Weather Data (BOM)
- **Description:** Extreme weather data from the Bureau of Meteorology (BOM).
- **Format:** TBD
- **Access Method:** https://www.bom.gov.au/
### Data Source 3: On-Ground Site
- **Description:** Data from the on-ground sites where the chickpeas are grown. At present, this data is unavailable.
- **Format:** TBD
- **Access Method:** TBD

## Imports

```python
import os
import shutil

import pandas as pd
```

## Data Catalog - <span style="color:red">DRAFT</span> 
This interim implementation works through the _Caladenia_ orchid comparable case and uses the following data sources (all in the `source_data` subfolder):
- **vegetation_cover.tif** - GA Landsat Vegetation Cover GeoTIFF at 25 m resolution for the northern two thirds of the ACT and adjacent NSW in 2020 (a single tile, chosen for convenience): https://explorer.dea.ga.gov.au/products/ga_ls_landcover_class_cyear_2/datasets/67bb9d38-00c7-46ba-a5e9-b892d9f9ad42 (values defined here: https://knowledge.dea.ga.gov.au/data/product/dea-land-cover-landsat/?tab=details)
- **boundary_act.geojson** - 2023 boundary for the Australian Capital Territory from the ACT Government in GeoJSON format: https://actmapi-actgov.opendata.arcgis.com/datasets/ACTGOV::actgov-border/explore
- **capad_act.geojson** - Protected Area data for the Australian Capital Territory in 2022 from the CAPAD dataset in GeoJSON format: https://fed.dcceew.gov.au/datasets/ec356a872d8048459fe78fc80213dc70_0/explore?filters=eyJTVEFURSI6WyJBQ1QiXX0%3D&location=-35.437128%2C149.203518%2C11.00
- **caladenia_act.csv** - Distribution records for orchids in the genus _Caladenia_ between 1990 and present from the ALA in CSV format: https://doi.org/10.26197/ala.1e501311-7077-403b-a743-59e096068fa0

```python
# Paths

cover_source = "source_data/vegetation_cover.tif"
caladenia_source = "source_data/caladenia_act.csv"
boundary_source = "source_data/boundary_act.geojson"
capad_source = "source_data/capad_act.geojson"
scratch_folder = "scratch/"

# Create the scratch folder if it does not already exist
os.makedirs(scratch_folder, exist_ok=True)

# Generate STAC Item for cover_source - the boundary, CRS and shape for this asset will be reused for the cube as a whole

## TODO

# Read caladenia_source, keep only the scientific name and coordinates,
# and drop all records without complete coordinates (latitude AND longitude)
caladenia = pd.read_csv(caladenia_source, encoding="utf-8")[["scientificName", "decimalLatitude", "decimalLongitude"]]
caladenia = caladenia.dropna(subset=["decimalLatitude", "decimalLongitude"])

# Generate a separate CSV file for each species with 10 or more observations (filenames held in species_files)
counts = caladenia["scientificName"].value_counts()
species = counts[counts >= 10].index.tolist()

species_files = [os.path.join(scratch_folder, s.replace(" ", "_") + ".csv") for s in species]

for s, sf in zip(species, species_files):
    caladenia[caladenia["scientificName"] == s].to_csv(sf, index=False, encoding="utf-8")

# Generate STAC Items for each species file

## TODO

# Generate STAC Item for boundary_source

## TODO

# Generate STAC Item for capad_source

## TODO
```
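The STAC Item TODOs above could be filled in along the following lines. This is a minimal sketch, assuming the STAC 1.0.0 Item layout is built by hand as a plain dictionary (in practice a library such as pystac would usually do this); the function name `points_stac_item`, the `"data"` asset key, the example href and the use of the generation time as the item `datetime` are all hypothetical choices for illustration.

```python
import datetime
import json

import pandas as pd


def points_stac_item(df: pd.DataFrame, item_id: str, href: str) -> dict:
    """Build a minimal STAC 1.0.0 Item (as a dict) for a CSV of point records.

    Assumes df has decimalLongitude/decimalLatitude columns in WGS84 degrees.
    """
    west = float(df["decimalLongitude"].min())
    east = float(df["decimalLongitude"].max())
    south = float(df["decimalLatitude"].min())
    north = float(df["decimalLatitude"].max())
    # Footprint: the bounding box of all records as a closed GeoJSON polygon ring
    footprint = {
        "type": "Polygon",
        "coordinates": [[
            [west, south], [east, south], [east, north], [west, north], [west, south],
        ]],
    }
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": footprint,
        "bbox": [west, south, east, north],
        # Hypothetical: timestamp the item with the time it was generated
        "properties": {"datetime": datetime.datetime.now(datetime.timezone.utc).isoformat()},
        "links": [],
        "assets": {"data": {"href": href, "type": "text/csv", "roles": ["data"]}},
    }


# Illustrative coordinates only
records = pd.DataFrame({
    "scientificName": ["Example species"] * 3,
    "decimalLatitude": [-35.3, -35.5, -35.2],
    "decimalLongitude": [149.0, 149.2, 149.1],
})
item = points_stac_item(records, "example_species", "scratch/example_species.csv")
print(json.dumps(item["bbox"]))  # → [149.0, -35.5, 149.2, -35.2]
```

In the notebook, the same function could be called once per entry in `species_files` after reading each file back with `pd.read_csv`.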




## DataCube Generation - <span style="color:red">DRAFT</span> 
Load data for all STAC items for the vegetation cover, ACT boundary, CAPAD shapes and orchid species records into a new data cube, using the boundary, CRS and shape from the vegetation cover layer.

```python
## TODO
```
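Until the data cube is built, the core lookup the later pivot tables depend on (the vegetation cover class under each orchid record) can be sketched with plain NumPy. This assumes the records have already been reprojected into the raster's CRS and that the raster is north-up with a GDAL-style geotransform `(x_origin, pixel_width, 0, y_origin, 0, -pixel_height)`; the function name `sample_raster` and the toy grid are hypothetical. Libraries such as rasterio or odc-stac would normally handle this step.

```python
import numpy as np


def sample_raster(band, geotransform, xs, ys, nodata=-1):
    """Return the pixel value under each (x, y) point, or `nodata` if the point is off the grid.

    Assumes a north-up GDAL-style geotransform
    (x_origin, pixel_width, 0, y_origin, 0, -pixel_height) and points already in the raster's CRS.
    """
    x0, px_w, _, y0, _, px_h = geotransform  # px_h is negative for north-up rasters
    cols = np.floor((np.asarray(xs, dtype=float) - x0) / px_w).astype(int)
    rows = np.floor((np.asarray(ys, dtype=float) - y0) / px_h).astype(int)
    inside = (rows >= 0) & (rows < band.shape[0]) & (cols >= 0) & (cols < band.shape[1])
    values = np.full(cols.shape, nodata, dtype=np.int64)  # int64 so a negative nodata always fits
    values[inside] = band[rows[inside], cols[inside]]
    return values


# Toy 2 x 3 grid of cover codes with 25 m pixels and its top-left corner at (1000, 2000)
band = np.array([[10, 12, 13],
                 [15, 16, 0]])
gt = (1000.0, 25.0, 0.0, 2000.0, 0.0, -25.0)
print(sample_raster(band, gt, xs=[1010, 1060, 1200], ys=[1990, 1960, 1990]))
# → cover codes 10 and 0, then nodata (-1) for the point east of the grid
```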

## Data Analysis/Visualisation - <span style="color:red">DRAFT</span> 
1. Visualise all layers in a grid
2. Crop all layers to the boundary in the boundary_source layer
3. Generate and display a pivot table with orchid species as rows, one column per vegetation cover level, and the percentage of each species' records in each category as values
4. Generate and display a pivot table with orchid species as rows, columns for records inside and outside CAPAD areas, and the percentage of each species' records in each category as values
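Steps 2 and 4 both come down to a point-in-polygon test: is each orchid record inside the ACT boundary, and is it inside any CAPAD polygon? The sketch below uses the even-odd ray-casting rule in plain Python, assuming simple polygons given as lists of (longitude, latitude) vertices and ignoring holes. This is a swapped-in technique for illustration; in practice a spatial join in GeoPandas (`geopandas.sjoin`) on the GeoJSON layers would usually be used. The function names and the square test polygon are hypothetical.

```python
def point_in_polygon(x: float, y: float, ring: list) -> bool:
    """Even-odd ray casting: count edge crossings of a ray cast from (x, y) towards +x."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_any_polygon(x: float, y: float, rings: list) -> bool:
    """True if the point lies inside at least one polygon (e.g. any CAPAD area)."""
    return any(point_in_polygon(x, y, ring) for ring in rings)


square = [(149.0, -35.5), (149.2, -35.5), (149.2, -35.3), (149.0, -35.3)]
print(in_any_polygon(149.1, -35.4, [square]))  # → True
print(in_any_polygon(149.3, -35.4, [square]))  # → False
```

Because GeoJSON rings repeat the first vertex at the end, the closing edge has zero length and never straddles the test line, so closed rings work unchanged.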

The vegetation cover levels are as follows:
- 0: Not applicable (such as in bare areas)
- 10: Closed (>65 %)
- 12: Open (40 to 65 %)
- 13: Open (15 to 40 %)
- 15: Sparse (4 to 15 %)
- 16: Scattered (1 to 4 %)

```python
# Dictionary of vegetation cover levels

levels = {
    0: "Not applicable (such as in bare areas)",
    10: "Closed (>65 %)",
    12: "Open (40 to 65 %)",
    13: "Open (15 to 40 %)",
    15: "Sparse (4 to 15 %)",
    16: "Scattered (1 to 4 %)",
}

## TODO
```
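Once each record carries a cover code, the pivot table in step 3 can be produced with `pandas.crosstab` using `normalize="index"`, which turns counts into row-wise proportions. A minimal sketch, assuming a DataFrame with hypothetical `scientificName` and `cover_code` columns; the species names and records are illustrative only, and a subset of the `levels` dictionary is inlined so the block runs on its own.

```python
import pandas as pd

# Subset of the vegetation cover levels, inlined so this block is self-contained
cover_levels = {
    10: "Closed (>65 %)",
    12: "Open (40 to 65 %)",
    15: "Sparse (4 to 15 %)",
    16: "Scattered (1 to 4 %)",
}

# Illustrative records only: species name plus the cover code sampled at each record
records = pd.DataFrame({
    "scientificName": ["Species A"] * 4 + ["Species B"] * 2,
    "cover_code": [10, 10, 12, 15, 16, 16],
})

# Map codes to readable labels, then compute the percentage of each species' records per level
records["cover_level"] = records["cover_code"].map(cover_levels)
pivot = pd.crosstab(records["scientificName"], records["cover_level"], normalize="index") * 100
print(pivot.round(1))
```

For step 4, the same call works with a boolean column recording whether each record falls inside a CAPAD polygon in place of `cover_level`.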

## Cleanup

```python
# Clean up scratch folder

shutil.rmtree(scratch_folder)
```

## Non-Functional Requirements - <span style="color:red">DRAFT</span> 
### Welcome Page
#### Objectives
- Show users what they can do, what they will get, and how to proceed.
- Show the most up-to-date information.
#### Requirements 
- Create a simple UI/template for users to input their requirements
  - Show the typical process for querying data and provide sample data
  - Show the sample output users can expect
  - Show the resources users can use
- Create a 'Help' page where users can look up the attributes they can use
  - Include all the attributes that can be queried from the data sources
  - Include all the functions users can use and the expected outputs of those queries
  - Include information on how to ask questions and get help
- Create an 'Announcements' page showing the most up-to-date information about the project
  - Include version, time and details

### User Query 
#### Objectives
- Take input from users and process it against the relevant data
- Analyse the query and give feedback on what to expect:
  - Error control
  - Time estimation
  - Output estimation
#### Requirements 
- Use a basic query language to query data, for example:
    > Select Phenotype*
    > from On-GroundSite
    > where CropType = 'Chickpeas'

    The following query should be caught by error control:
    > Select Environmental
    > from SILO
    > where weather = 'Extreme'
- Estimate time and output size, for example:
    > Estimated files in total: 500 files (5 GB)
    > Estimated time: 5 min; progress: 50%
    > Note: the actual time may depend on your download speed.
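The time estimate is simple arithmetic: total download size divided by the user's effective download speed. A minimal sketch, assuming the size of each file is known in advance from the catalogue and that a single download rate applies throughout; the function name `estimate_download` and the 20 MB/s rate are hypothetical.

```python
def estimate_download(file_sizes_bytes: list, speed_mb_per_s: float) -> dict:
    """Estimate the file count, total size and download time for a query result."""
    total_bytes = sum(file_sizes_bytes)
    seconds = total_bytes / (speed_mb_per_s * 1_000_000)  # 1 MB = 10**6 bytes
    return {
        "files": len(file_sizes_bytes),
        "total_gb": total_bytes / 1_000_000_000,
        "minutes": seconds / 60,
    }


# 500 files of 10 MB each (5 GB in total) at a hypothetical 20 MB/s
est = estimate_download([10_000_000] * 500, speed_mb_per_s=20)
print(f"Estimated files in total: {est['files']} files ({est['total_gb']:.1f} GB)")
print(f"Estimated time: {est['minutes']:.1f} min (depends on your download speed)")
```

At 20 MB/s, 5 GB takes 250 seconds, a little over 4 minutes; the estimate scales linearly with the measured speed, so it can be refreshed as the download progresses.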
### Error Control
#### Objectives
- State clearly what is happening when a user attempts a query
- Give advice and suggestions for resolving problems
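One way to meet these objectives is to validate a query against a catalogue of known sources and attributes before running it, and to return messages that say what went wrong and suggest a fix. The sketch below is hypothetical throughout: the catalogue contents, attribute names and the `validate_query` helper are illustrations rather than a defined schema, and it only handles the `Select ... from ... where ...` form used in the User Query examples.

```python
import re

# Hypothetical catalogue: which attributes each data source can be queried on
CATALOGUE = {
    "On-GroundSite": {"Phenotype", "CropType", "Variety", "Site"},
    "SILO": {"Environmental", "MaxTemp", "MinTemp", "Rainfall"},
    "BOM": {"Environmental", "ExtremeEvent"},
}

QUERY_PATTERN = re.compile(
    r"select\s+(?P<field>\w+)\*?\s+from\s+(?P<source>[\w-]+)(?:\s+where\s+(?P<attr>\w+)\s*=)?",
    re.IGNORECASE,
)


def validate_query(query: str) -> list:
    """Return a list of human-readable problems (empty if the query looks valid)."""
    match = QUERY_PATTERN.match(query.strip())
    if match is None:
        return ["Could not parse query. Expected: Select <field> from <source> [where <attribute> = <value>]"]
    source = match["source"]
    if source not in CATALOGUE:
        return [f"Unknown data source '{source}'. Available sources: {', '.join(sorted(CATALOGUE))}"]
    problems = []
    for name in (match["field"], match["attr"]):
        if name is not None and name not in CATALOGUE[source]:
            # Suggest the sources that do provide this attribute, if any
            others = [s for s, attrs in CATALOGUE.items() if name in attrs]
            hint = f" Try: {', '.join(others)}" if others else ""
            problems.append(f"'{name}' is not available from {source}.{hint}")
    return problems


print(validate_query("Select Phenotype* from On-GroundSite where CropType = 'Chickpeas'"))  # → []
print(validate_query("Select ExtremeEvent from SILO"))
# → ["'ExtremeEvent' is not available from SILO. Try: BOM"]
```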

### Extras
#### Non-Functional Requirements
- Keep query response time within a set limit (less than 1 second).
- Provide a waiting process (e.g. a progress indicator) for queries whose duration is unpredictable.
- Present outputs in a consistent and organized manner.
- Handle unexpected errors gracefully: show error codes and continue with the remaining data that can still be retrieved.
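The last two requirements (bounded waiting when query time is unpredictable, and continuing with other feasible data after an error) can be combined by querying each source concurrently under a shared deadline and collecting failures instead of stopping. A minimal sketch using the standard library's `concurrent.futures`; the fetch functions, the error codes `E_TIMEOUT`/`E_SOURCE` and the timeout values are hypothetical stand-ins.

```python
import concurrent.futures as cf
import time


def fetch_all(fetchers: dict, timeout_s: float = 1.0):
    """Run each source's fetch function concurrently under one shared deadline.

    Returns (results, errors): results from sources that finished in time, and an
    (error code, message) pair for each source that failed or timed out.
    """
    results, errors = {}, {}
    pool = cf.ThreadPoolExecutor()
    futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
    done, _ = cf.wait(futures.values(), timeout=timeout_s)
    for name, future in futures.items():
        if future not in done:
            future.cancel()  # only succeeds if the task has not started yet
            errors[name] = ("E_TIMEOUT", f"{name} did not respond within {timeout_s} s")
        elif future.exception() is not None:
            errors[name] = ("E_SOURCE", f"{name} failed: {future.exception()}")
        else:
            results[name] = future.result()
    pool.shutdown(wait=False)  # don't block the caller on slow sources
    return results, errors


# Hypothetical stand-ins for real data source queries
def silo():
    return "SILO rows"

def bom():
    raise ConnectionError("HTTP 503")

def slow_site():
    time.sleep(2)
    return "site rows"


results, errors = fetch_all({"SILO": silo, "BOM": bom, "On-GroundSite": slow_site}, timeout_s=0.5)
print(results)  # → {'SILO': 'SILO rows'}
print(errors)
```

Using one `cf.wait` deadline rather than a per-source timeout keeps the total waiting time bounded no matter how many sources are slow.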