Commit

Add docs (#107)
* starter docs

* add more docs!

* mkdocs commands

* add mkdocs and theme reqs

* addtl resources

* bring in code from docs page

* tweaks

* add docs workflows

* add mike for versions

* remove extra line

* underscore

* add more resources

* epa guidance

* move dev reqs into pyproject.toml

* remove gh workflows to use render

* specify docs reqs in own file

* install with pyproject

* Update README.md

Co-authored-by: Katie Wetstone <46792169+klwetstone@users.noreply.github.com>

* Update docs/docs/index.md

Co-authored-by: Katie Wetstone <46792169+klwetstone@users.noreply.github.com>

* Update docs/docs/installation.md

Co-authored-by: Katie Wetstone <46792169+klwetstone@users.noreply.github.com>

* Update docs/docs/index.md

Co-authored-by: Katie Wetstone <46792169+klwetstone@users.noreply.github.com>

* Update docs/docs/index.md

Co-authored-by: Katie Wetstone <46792169+klwetstone@users.noreply.github.com>

* Update docs/docs/index.md

Co-authored-by: Katie Wetstone <46792169+klwetstone@users.noreply.github.com>

* Update docs/docs/index.md

Co-authored-by: Katie Wetstone <46792169+klwetstone@users.noreply.github.com>

* Update docs/docs/index.md

Co-authored-by: Katie Wetstone <46792169+klwetstone@users.noreply.github.com>

* change links

* do not use relative links

* use different date

* Update docs/docs/index.md

Co-authored-by: Katie Wetstone <46792169+klwetstone@users.noreply.github.com>

* tweak opening line

* test links

* more links

* add link to docs

* fix link

* newline

* copy changelog into docs

* add table div css

* index.md format dataframes as tables

* quickstart.md format dataframes as tables

* add image to homepage

* updates for image on homepage

* make image responsive to window size; add paragraph breaks

* add more context to intro

* remove image that is replaced with smaller one

* fix tables to use bootstrap; move severity levels higher up

* remove outdated file

* ignore docs/site

* small tweaks

* optimize image

---------

Co-authored-by: Katie Wetstone <46792169+klwetstone@users.noreply.github.com>
Co-authored-by: Katie Wetstone <klwetstone@gmail.com>
3 people committed Oct 6, 2023
1 parent d6cfabf commit 03eac66
Showing 20 changed files with 557 additions and 37 deletions.
5 changes: 2 additions & 3 deletions .github/workflows/tests.yml
@@ -23,12 +23,11 @@ jobs:
cache: "pip"
cache-dependency-path: |
pyproject.toml
requirements_dev.txt
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements_dev.txt
pip install -e .[dev]
- name: Lint package
run: |
@@ -61,7 +60,7 @@ jobs:
conda update pip
conda install -c conda-forge lightgbm
conda install -c conda-forge xarray dask netCDF4 bottleneck
pip install -r requirements_dev.txt
pip install -e .[dev]
- name: Run tests
run: |
4 changes: 2 additions & 2 deletions .gitignore
@@ -50,8 +50,8 @@ coverage.xml
# Django stuff:
*.log

# Sphinx documentation
docs/_build/
# mkdocs documentation
docs/site/

# PyBuilder
target/
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,5 @@
# CyFi changelog

### v0.1.0 - 2023-xx-xx

CyFi has its origins in the [Tick Tick Bloom](https://www.drivendata.org/competitions/143/tick-tick-bloom/) machine learning competition, hosted by DrivenData and created on behalf of [NASA](https://www.nasa.gov/). The goal in that challenge was to detect and classify the severity of cyanobacteria blooms in small, inland water bodies using publicly available satellite, climate, and elevation data. Labels were based on "in situ" samples that were collected manually by [many organizations](https://www.drivendata.org/competitions/143/tick-tick-bloom/page/651/#about-the-project-team) across the U.S. The model in CyFi is based on the [winning solutions](https://github.com/drivendataorg/tick-tick-bloom) from that challenge, and has been optimized for generalizability and efficiency.
6 changes: 0 additions & 6 deletions HISTORY.md

This file was deleted.

9 changes: 9 additions & 0 deletions Makefile
@@ -1,3 +1,4 @@
.PHONY: docs
#################################################################################
# GLOBALS #
#################################################################################
@@ -46,6 +47,14 @@ assets:
rm -r tests/assets/experiment
python cyfi/experiment.py tests/assets/experiment_config.yaml

docs: ## build the static version of the docs
sed 's|https://cyfi.drivendata.org/stable/|../|g' CHANGELOG.md \
> docs/docs/changelog.md
cd docs && mkdocs build

docs-serve: ## serve documentation to livereload while you work
cd docs && mkdocs serve
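
The `docs` target above copies `CHANGELOG.md` into the docs tree and rewrites absolute links to the stable docs site as relative ones. As a rough illustration of that substitution only, here is a minimal Python sketch. The function names and the `quickstart/` path in the sample are hypothetical, and `str.replace` matches the prefix literally, whereas the unescaped dots in the `sed` pattern match any character.

```python
from pathlib import Path

# Prefix taken from the sed expression in the Makefile.
STABLE_URL = "https://cyfi.drivendata.org/stable/"


def localize_links(markdown: str, prefix: str = STABLE_URL) -> str:
    """Replace every occurrence of the stable-docs prefix with '../'."""
    return markdown.replace(prefix, "../")


def copy_changelog(src: Path, dest: Path) -> None:
    """Write a link-localized copy of the changelog (hypothetical helper)."""
    text = src.read_text(encoding="utf-8")
    dest.write_text(localize_links(text), encoding="utf-8")


if __name__ == "__main__":
    # The 'quickstart/' path is made up for the example.
    sample = "See the [quickstart](https://cyfi.drivendata.org/stable/quickstart/)."
    print(localize_links(sample))
```

Text that does not contain the prefix passes through unchanged, which matches the current changelog since it only links to external sites.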

#################################################################################
# Self Documenting Commands #
#################################################################################
84 changes: 66 additions & 18 deletions README.md
@@ -1,29 +1,77 @@
CyFi
CyFi: Cyanobacteria Finder
==============================

Cyan Finder
CyFi is a command line tool that uses satellite imagery and machine learning to estimate cyanobacteria levels in small, inland water bodies. The goal of CyFi is to help water quality managers better allocate resources for in situ sampling, and make more informed decisions around public health warnings for critical resources like lakes and reservoirs.

> Estimate cyanobacteria density based on satellite imagery.
Read more at [cyfi.drivendata.org](https://cyfi.drivendata.org).

## Quickstart

### Experiment module
### Install

There is an unsupported `experiment` module for training new models.
Install CyFi with pip:

```
$ python cyfi/experiment.py --help
Usage: experiment.py [OPTIONS] CONFIG_PATH
pip install cyfi
```

For detailed instructions aimed at those installing Python for the first time, see the [Installation](installation.md) page.

### Generate batch predictions

Generate batch predictions at the command line with `cyfi predict`.

Run an experiment
First, specify your sample points in a csv with the following columns:

Arguments:
CONFIG_PATH Path to an experiment configuration [required]
* latitude
* longitude
* date

For example,

```
# sample_points.csv
latitude,longitude,date
41.424144,-73.206937,2023-06-22
36.045,-79.0919415,2023-07-01
35.884524,-78.953997,2023-08-04
```

Then run:
```
cyfi predict sample_points.csv
```

This will output a `preds.csv` that contains a column for cyanobacteria density and a column for the associated severity level based on WHO thresholds.
```
# preds.csv
sample_id,date,latitude,longitude,density_cells_per_ml,severity
7ff4b4a56965d80f6aa501cc25aa1883,2023-06-22,41.424144,-73.206937,34173.0,moderate
882b9804a3e28d8805f98432a1a9d9af,2023-07-01,36.045,-79.0919415,7701.0,low
10468e709dcb6133d19a230419efbb24,2023-08-04,35.884524,-78.953997,4053.0,low
```

To see all of the available options, run `cyfi predict --help`.
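
If the sample points come from another program, the input file can be generated and the predictions summarized with the standard library. The following is a minimal sketch rather than part of CyFi: the helper names are hypothetical, the range checks are an added precaution, and the column names and sample rows are copied from the examples above.

```python
import csv
import io
from collections import Counter
from datetime import date

# Column names required by `cyfi predict`, as listed in the README.
REQUIRED_COLUMNS = ["latitude", "longitude", "date"]

# Sample predictions copied from the README's preds.csv example.
PREDS_CSV = """sample_id,date,latitude,longitude,density_cells_per_ml,severity
7ff4b4a56965d80f6aa501cc25aa1883,2023-06-22,41.424144,-73.206937,34173.0,moderate
882b9804a3e28d8805f98432a1a9d9af,2023-07-01,36.045,-79.0919415,7701.0,low
10468e709dcb6133d19a230419efbb24,2023-08-04,35.884524,-78.953997,4053.0,low
"""


def validate_point(latitude: float, longitude: float, day: str) -> dict:
    """Check one sample point and return it as a CSV-ready row."""
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    date.fromisoformat(day)  # raises ValueError if not a valid ISO date
    return {"latitude": latitude, "longitude": longitude, "date": day}


def write_sample_points(points, handle) -> None:
    """Write (latitude, longitude, date) tuples as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS)
    writer.writeheader()
    for latitude, longitude, day in points:
        writer.writerow(validate_point(latitude, longitude, day))


def count_severity(handle) -> Counter:
    """Count predictions per severity level in a preds.csv-style file."""
    return Counter(row["severity"] for row in csv.DictReader(handle))


if __name__ == "__main__":
    points = [
        (41.424144, -73.206937, "2023-06-22"),
        (36.045, -79.0919415, "2023-07-01"),
        (35.884524, -78.953997, "2023-08-04"),
    ]
    buffer = io.StringIO()
    write_sample_points(points, buffer)
    print(buffer.getvalue())
    print(count_severity(io.StringIO(PREDS_CSV)))
```

To write a real file instead of an in-memory buffer, pass a handle opened with `open("sample_points.csv", "w", newline="")`, as the `csv` module documentation recommends.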

### Generate prediction for a single point

Or, generate a cyanobacteria estimate for a single point on a single date using `cyfi predict-point`.

Just specify the latitude, longitude, and date as arguments at the command line.

```
cyfi predict-point --lat 41.2 --lon -73.2 --date 2023-09-14
```

This will print out the estimated cyanobacteria density and associated severity level based on WHO thresholds.

```
2023-10-04 16:25:40.581 | SUCCESS | cyfi.cli:predict_point:154 - Estimate generated:
date 2023-09-14
latitude 41.2
longitude -73.2
density_cells_per_ml 32,820
severity moderate
```
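
The severity label is derived from the density estimate. This commit does not state the cutoffs, so the sketch below assumes 20,000 and 100,000 cells per mL as the boundaries between low, moderate, and high. Both values, the inclusive lower bounds, and the function name are assumptions for illustration, not CyFi's API. They are consistent with the sample outputs shown above.

```python
# Assumed cutoffs in cells per mL; not stated anywhere in this commit.
MODERATE_MIN = 20_000
HIGH_MIN = 100_000


def severity(density_cells_per_ml: float) -> str:
    """Map a density estimate to a severity label (hypothetical helper)."""
    if density_cells_per_ml < 0:
        raise ValueError("density cannot be negative")
    if density_cells_per_ml >= HIGH_MIN:
        return "high"
    if density_cells_per_ml >= MODERATE_MIN:
        return "moderate"
    return "low"


if __name__ == "__main__":
    # Densities taken from the README examples.
    for density in (34173.0, 7701.0, 4053.0, 32820.0):
        print(density, severity(density))
```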

Options:
--install-completion [bash|zsh|fish|powershell|pwsh]
Install completion for the specified shell.
--show-completion [bash|zsh|fish|powershell|pwsh]
Show completion for the specified shell, to
copy it or customize the installation.
--help Show this message and exit.
```
To see all of the available options, run `cyfi predict-point --help`.
50 changes: 50 additions & 0 deletions docs/docs/about.md
@@ -0,0 +1,50 @@
# About the project

Inland water bodies provide a variety of critical services for both human and aquatic life, including drinking water, recreational and economic opportunities, and aquatic habitats. A significant challenge water quality managers face is the formation of harmful algal blooms, which can harm human health, threaten other mammals like pets, and damage aquatic ecosystems.

Cyanobacteria are microscopic organisms, often called blue-green algae, that can multiply very quickly in warm, nutrient-rich environments, often creating visible blue or green blooms. These blooms can block sunlight from reaching the rest of the aquatic ecosystem beneath the surface, and take away oxygen and nutrients from other organisms. Cyanobacteria can produce toxins that are poisonous to humans, pets, and livestock. Climate change is likely making harmful algal blooms form more often.

Manual water sampling, or “in situ” sampling, is generally used to monitor cyanobacteria in inland water bodies. In situ sampling is accurate, but time intensive and difficult to perform continuously. Public health managers also rely on the public to notice and report blooms.

**The goal of CyFi is to help water quality managers better allocate resources for in situ sampling, and make more informed decisions around public health warnings for critical resources like drinking water reservoirs.** Ultimately, more accurate and more timely detection of algal blooms helps keep both the human and aquatic life that rely on these water bodies safe and healthy.

CyFi was born out of the [Tick Tick Bloom](https://www.drivendata.org/competitions/143/tick-tick-bloom/) machine learning competition, hosted by DrivenData. The goal in that challenge was to detect and classify the severity of cyanobacteria blooms in small, inland water bodies using publicly available satellite, climate, and elevation data. Labels were based on "in situ" samples that were collected manually by [many organizations](https://www.drivendata.org/competitions/143/tick-tick-bloom/page/651/#about-the-project-team) across the U.S. The model in CyFi is based on the [winning solutions](https://github.com/drivendataorg/tick-tick-bloom) from that challenge, and has been optimized for generalizability and efficiency.

For more details on the model, see the [About the Model](../#about-the-model) section.

## Additional resources

**Tick Tick Bloom machine learning competition**

- [Tick Tick Bloom competition](https://www.drivendata.org/competitions/143/tick-tick-bloom/)
- [Meet the winners blog post](https://drivendata.co/blog/tick-tick-bloom-challenge-winners)
- [Code from winning solutions](https://github.com/drivendataorg/tick-tick-bloom)

**About harmful algal blooms (HABs)**

- [CDC resources on HABs](https://www.cdc.gov/habs/general.html)
- [EPA resources on HABs](https://www.epa.gov/cyanohabs)

**Related tools**

There are other groups working on cyanobacteria estimates from satellite imagery. Here are a few that use Sentinel-3 (300m resolution) imagery:

- [NOAA's Harmful Algal Bloom Monitoring System](https://coastalscience.noaa.gov/science-areas/habs/hab-monitoring-system/)
- [Cyanobacteria Assessment Network (CyAN)](https://oceancolor.gsfc.nasa.gov/about/projects/cyan/)
- [Dashboard](https://qed.epa.gov/cyanweb/)
- [Paper](https://www.sciencedirect.com/science/article/pii/S1364815218302482?via%3Dihub)

**EPA guidance on HABs**

- [Recommendations for Cyanobacteria and Cyanotoxin Monitoring in Recreational Waters](https://www.epa.gov/sites/default/files/2019-09/documents/recommend-cyano-rec-water-2019-update.pdf)
- [Recommended Human Health Recreational Ambient Water Quality Criteria or Swimming Advisories for Microcystins and Cylindrospermopsin](https://www.epa.gov/sites/default/files/2019-05/documents/hh-rec-criteria-habs-document-2019.pdf)

**Related research on using satellite imagery to monitor HABs**

- [Quantifying national and regional cyanobacterial occurrence in US lakes using satellite remote sensing](https://www.sciencedirect.com/science/article/pii/S1470160X19309719?ref=pdf_download&fr=RR-2&rr=8109976f78329642)
- [Evaluation of a satellite-based cyanobacteria bloom detection algorithm using field-measured microcystin data](https://www.sciencedirect.com/science/article/pii/S0048969721005301?ref=pdf_download&fr=RR-2&rr=7ee00136c8e396d1#f0015)
- [Satellite monitoring of cyanobacterial harmful algal bloom frequency in recreational waters and drinking water sources](https://www.sciencedirect.com/science/article/pii/S1470160X17302194?ref=pdf_download&fr=RR-2&rr=805b0d4bedb0642f)
- [Satellite remote sensing to assess cyanobacterial bloom frequency across the United States at multiple spatial scales](https://www.sciencedirect.com/science/article/pii/S1470160X21004878)
- [Challenges for mapping cyanotoxin patterns from remote sensing of cyanobacteria](https://pubmed.ncbi.nlm.nih.gov/28073474/)
- [Satellites for long-term monitoring of inland U.S. lakes: The MERIS time series and application for chlorophyll-a](https://www.sciencedirect.com/science/article/pii/S0034425721004053)
- [Mapping algal bloom dynamics in small reservoirs using Sentinel-2 imagery in Google Earth Engine](https://www.sciencedirect.com/science/article/pii/S1470160X2200512X)
5 changes: 5 additions & 0 deletions docs/docs/changelog.md
@@ -0,0 +1,5 @@
# CyFi changelog

### v0.1.0 - 2023-xx-xx

CyFi has its origins in the [Tick Tick Bloom](https://www.drivendata.org/competitions/143/tick-tick-bloom/) machine learning competition, hosted by DrivenData and created on behalf of [NASA](https://www.nasa.gov/). The goal in that challenge was to detect and classify the severity of cyanobacteria blooms in small, inland water bodies using publicly available satellite, climate, and elevation data. Labels were based on "in situ" samples that were collected manually by [many organizations](https://www.drivendata.org/competitions/143/tick-tick-bloom/page/651/#about-the-project-team) across the U.S. The model in CyFi is based on the [winning solutions](https://github.com/drivendataorg/tick-tick-bloom) from that challenge, and has been optimized for generalizability and efficiency.
Binary file added docs/docs/images/lake_st_clair.jpg
Binary file added docs/docs/images/linux.png
Binary file added docs/docs/images/mac.png
Binary file added docs/docs/images/windows.png