# M5.1 - Creating a Project Plan

*Part of:* [**Open Climate Science for Crops & Crop Conditions**](https://github.com/OpenClimateScience/M5-Open-Science-for-Crops)

## Introduction

In this module, we'll learn how to create research software that can be easily installed and executed on someone else's computer, enabling someone to reproduce our scientific results. We've seen two other ways of doing this, in Modules 3 and 4. We've discussed how, in order to reproduce results from a computational project:

- Our **software environment** must be the same,
- We need to have the **same versions of packages** installed,
- And we need to **run the same code** on the **same data.**

Getting access to the code and data that someone used to generate a result depends on whether that person is adhering to Open Science principles. They should have shared their code and data, in repositories with a unique and persistent identifier (e.g., a DOI), along with an open license.

### Workflow management

How do we make sure our **software environment** is set up the same way? We use an environment manager to create a virtual environment: a unique Python installation for our project. We can then use a **package manager** to simplify the installation of Python packages and to verify the version of each package.
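To make "verifying the version of each package" concrete, here is a minimal sketch that uses only Python's standard library (`importlib.metadata`). The helper name `check_versions` and the pinned version in the example are hypothetical, chosen for illustration; in practice, a package manager does this work for us.

```python
from importlib.metadata import PackageNotFoundError, version


def check_versions(required, lookup=version):
    """Compare required package versions against what is installed.

    `required` maps package names to exact version strings. Returns a dict
    of mismatches: {name: (required_version, installed_version_or_None)}.
    """
    mismatches = {}
    for name, wanted in required.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # The package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# Hypothetical pin, for illustration only; an empty dict means everything matches
print(check_versions({"numpy": "1.26.4"}))
```

An empty result means the environment matches the pinned versions; anything else tells us which packages differ from what the original author used.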

**In this lesson, we'll introduce a third consideration for reproducible scientific software projects: Workflow management.** Because there can be a lot of data to process and potentially many steps in our reproducible scientific analysis, we want to make it as easy as possible for someone to reproduce our results. Ideally, they might run an executable ("double-click" a file) that does everything from downloading the data to making the final plot, for example. Workflow management tools make this possible. **If we compare the environment, package, and workflow management tools we've seen in each Module so far, we might make a table:**

| Module   | Environment manager | Package manager | Workflow management|
|:---------|:--------------------|:----------------|:-------------------|
| Module 3 | `virtualenv`        | `pip`           | None               |
| Module 4 | `mamba`             | `mamba`         | None               |
| Module 5 | Pixi                | Pixi            | Snakemake          |

Workflow management can represent a complex workflow as a series of steps. The steps can be run independently or all at once using one or more simple commands. The individual steps should include accessing or creating data, analyzing the data, and producing decision-ready outputs like tables or figures. It's how we ["turn raw data into scientific knowledge."](https://doi.org/10.1038/d41586-019-02619-z)
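The idea of running steps "independently or all at once" can be sketched in plain Python. This is a toy illustration, not how Snakemake is implemented: the step names and the `run_workflow` helper are hypothetical, and each function stands in for a real script. Real workflow managers such as Snakemake also check whether output files are already up to date, which this sketch does not do.

```python
def run_workflow(steps, target, done=None):
    """Run `target` after running every step it depends on, each step once.

    `steps` maps a step name to (list_of_dependencies, function).
    Returns the list of step names in the order they were run.
    """
    done = [] if done is None else done
    deps, func = steps[target]
    for dep in deps:
        if dep not in done:
            run_workflow(steps, dep, done)
    if target not in done:
        func()
        done.append(target)
    return done


# Hypothetical step names; each function stands in for a real script
steps = {
    "download": ([], lambda: print("Downloading data...")),
    "analyze": (["download"], lambda: print("Analyzing data...")),
    "plot": (["analyze"], lambda: print("Making the final plot...")),
}

# Asking for the final step runs everything it depends on, in order
order = run_workflow(steps, "plot")
print(order)  # ['download', 'analyze', 'plot']
```

Asking for `"download"` alone would run only that step, while asking for `"plot"` runs the whole chain with a single call.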

### An example analysis

**In this lesson, we'll compute the Water Requirement Satisfaction Index (WRSI),** an estimate of how much of the plant's water needs have been satisfied. Low values indicate that the plant is water-stressed. The WRSI is not a cutting-edge index for studying crop conditions, but it is a moderately complex, iterative algorithm. It will serve as a good example of the kinds of scientific workflows that we hope to produce in our careers. By implementing the WRSI, we'll explore best practices for reproducible software, including:

- Connecting input data files to output (processed) data files and explicitly indicating their relationship.
- Citing scientific datasets and justifying fixed values, like coefficients, in a way that is transparent and understandable to outside users.
- Enabling outside users to run our entire scientific workflow with a single command.

## Planning our software project

There are two key skills in software development that novice computer scientists are introduced to early on: **Abstraction and Decomposition.** These concepts are also extremely useful for scientists who write research code.

- **Abstraction** means representing only what is essential about a workflow or a system.
- **Decomposition** means breaking down a complex workflow (or system) into a series of ordered, connected steps (or parts).

[Valerie Shute and co-authors describe how these two skills are key to "computational thinking"](https://doi.org/10.1016/j.edurev.2017.09.003) and, therefore, key to computational science.

We'll practice both of these with the WRSI, [which is described in detail in this README prepared by one of its original authors.](https://iridl.ldeo.columbia.edu/documentation/usgs/adds/wrsi/.WRSI_readme.pdf) **How can we break down the WRSI into a series of steps that we can implement in Python?**

#### &#x1F3C1; Challenge

If you're interested in crop conditions and computing water balances from remote sensing data, this is a great opportunity to challenge yourself. Read about the WRSI (the README, linked above, is probably the only resource you need) and write down how you would go about implementing the WRSI in Python, based on everything we've learned in the previous modules. You don't need to reproduce the equations involved; rather, think about what data are needed and what the outcome of each step is going to be.

**Here are the steps we came up with for calculating the WRSI:**

1. **Calculate potential evapotranspiration (PET),** or the amount of energy available to vaporize water. The fraction of this energy that is consumed to vaporize water is the actual evapotranspiration (AET). We'll obtain consistent, weekly estimates of both PET and AET from NASA's VIIRS VNP16 product, which uses the same algorithm as MODIS MOD16 [(recall that we used these data in Module 3)](https://github.com/OpenClimateScience/M3-Open-Science-for-Water-Resources).
2. **Define the available soil water.** We can calculate soil water (mm) if we know the **volumetric water content (VWC)** and the **rooting depth.** We'll approximate VWC based on a simplified, annual water balance. We'll use representative rooting depths from the Food and Agriculture Organization (FAO).
3. **Calculate the critical soil water level.** This is the amount of water that is *more than enough* to satisfy the crop. It is also the minimum amount of water required to ensure that all of the available energy to vaporize water (i.e., PET) is used. We'll compute this based on the soil water-holding capacity and other coefficients defined by the FAO.
4. **Calculate the plant available water (PAW);** this is the amount of water the plant can actually use and it is determined by the water balance. At each time step, we'll compute the available soil water and then add the precipitation (if any) to get the new PAW. We'll use precipitation data from NASA.
5. **Compute the WRSI for the given time step.**
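As a rough sketch of this decomposition, each step can become its own small Python function. The function names and the numbers below are made up for illustration. Step 3 is omitted because it depends on FAO coefficients not given here, and the form of the index in `wrsi()` is an assumed simplification (the percentage of PET met by AET); the README linked above gives the actual definition.

```python
def soil_water_mm(vwc, rooting_depth_mm):
    """Step 2: soil water (mm) from volumetric water content (m3 m-3)
    and rooting depth (mm)."""
    return vwc * rooting_depth_mm


def plant_available_water(prev_soil_water_mm, precip_mm):
    """Step 4: PAW (mm) is the available soil water plus any new precipitation."""
    return prev_soil_water_mm + precip_mm


def wrsi(aet_mm, pet_mm):
    """Step 5 (ASSUMED form): percent of water demand (PET) met by AET,
    capped at 100."""
    if pet_mm <= 0:
        return float("nan")  # Index is undefined without any water demand
    return 100.0 * min(aet_mm / pet_mm, 1.0)


# Made-up numbers for a single time step
sw = soil_water_mm(vwc=0.25, rooting_depth_mm=1000.0)
paw = plant_available_water(sw, precip_mm=12.0)
index = wrsi(aet_mm=18.0, pet_mm=24.0)
print(sw, paw, index)  # 250.0 262.0 75.0
```

Writing each step as a function with named inputs and outputs makes the relationships between steps explicit, which is what a workflow manager needs in order to run them in the right order.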