# Template RGB Plot

## Authors

* Author1 = {"name": "Chris Schnaufer", "affiliation": "Cyverse/University of Arizona Data Scientist", "email": "schnaufer@arizona.edu", "orcid": "0000-0002-6150-4558"}
* Author2 = {"name": "Jacob van der Leeuw", "affiliation": "Cyverse Intern", "email": "jvanderleeuw@email.arizona.edu", "orcid": "0000-0003-0892-9837"}


## Purpose

This is an RGB image-based template for testing plot-level algorithms in Python.

## Technical contributions

* Development of a template for creating and testing RGB image algorithms
* Creation of a testing.py file to ensure that the variables are set up correctly and to run RGB image-processing functions
* Use of the [GDAL](https://gdal.org) library to open RGB images and process data


## Methodology

It is assumed that:

* An image folder containing sample plot images to process is located in the same directory as this Jupyter notebook. Sample plot images can be [downloaded from CyVerse](https://de.cyverse.org/dl/d/4108BB75-AAA3-48E1-BBD4-E10B06CADF54/sample_plot_images.zip).

* If you will be building and running a Docker image, you have [Docker](https://www.docker.com) installed on your computer

* You are familiar with GitHub template repositories, or know how to use [git](https://git-scm.com)


After cloning the [template-rgb-plot](https://github.com/AgPipeline/template-rgb-plot) repository, a user is expected to first develop and test their algorithm in Python, and then generate and run a Docker image based on that algorithm.


## Results

This template enables a user to develop their own RGB image-processing algorithms in a structured way, validate their setup with a testing.py script, and then test the developed algorithm both locally and via Docker.


## Funding

* Award1 = {"agency": "USDA National Institute of Food and Agriculture, Hatch General Administration of Federal-Grant Fund Research 30152", "award_code":"30152"}


## Keywords

keywords=["plot-level", "rgb", "docker"]


## Citation


## Acknowledgements

This project is funded by [CyVerse](https://cyverse.org) and, in turn, by National Science Foundation Grant Nos. DBI-0735191, DBI-1265383, and DBI-1743442.

template-rgb-plot is licensed under a [BSD 3-Clause License](https://opensource.org/licenses/BSD-3-Clause)

# Setup

## Library imports

In [3]:
# Handling directories and file paths
import os
from pathlib import Path

# Downloading and opening images from a url
import shutil
import urllib.request as urllib
import zipfile

# Designing the calculate() function
import numpy as np

# Writing the algorithm_rgb.py file
import re
import textwrap

# Opens the downloaded images for processing
from PIL import Image

from importlib import reload

# Reading .json and .csv output files
import json
import pandas as pd

# Creating output folder to save results from a run
from datetime import datetime

## Top-level docstring. Change this to an appropriate docstring for your algorithm

#### Imports into the python file for running the algorithm. Please add any additional import statements that will be needed for your algorithm in the block below:

## Parameter definitions

#### Define the version number of your algorithm. Consider using [Semantic Versioning](https://semver.org/)

#### Provide information on the creator and contributors of this algorithm

#### Name and describe your algorithm

#### Provide citation information for algorithm publication. This includes the citation author, the citation title, and the citation year

#### Include the name(s) of the variable(s) used in the algorithm, separated by commas. Variable names cannot contain commas; use a different separator within a name instead. All whitespace is kept intact, so don't add any extra whitespace since it may cause name comparisons to fail

#### Include the units and labels of the variables, matching the order of VARIABLE_NAMES, also separated by commas. VARIABLE_LABELS is an optional field and can be left empty.

#### Optional override for the generation of a BETYdb compatible csv file. Set to False to suppress the creation of a compatible file

#### Optional override for the generation of a TERRA REF Geostreams compatible csv file. Set the variable to False to suppress the creation of a compatible file
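
As a rough sketch, the parameter definitions described above might look like the following. The names `VERSION`, `ALGORITHM_NAME`, `VARIABLE_NAMES`, `VARIABLE_UNITS`, `VARIABLE_LABELS`, `WRITE_BETYDB_CSV` and `WRITE_GEOSTREAMS_CSV` are the ones this notebook references later on. The remaining names and every value are placeholders assumed for illustration (the values mirror the sample output shown further down), so adjust them to your algorithm.

```python
# Names referenced elsewhere in this notebook; values are placeholders
VERSION = '1.0'
ALGORITHM_NAME = 'algorithm'  # assumed from the key seen in the sample result.json
VARIABLE_NAMES = 'size of image channels'
VARIABLE_UNITS = 'pixels'
VARIABLE_LABELS = ''          # optional, may stay empty
WRITE_BETYDB_CSV = True
WRITE_GEOSTREAMS_CSV = True

# Hypothetical names, assumed for illustration only
ALGORITHM_AUTHOR = 'add author name'
ALGORITHM_DESCRIPTION = 'add a description of the algorithm'
CITATION_AUTHOR = 'add citation author'
CITATION_TITLE = 'add citation title'
CITATION_YEAR = '2020'

# Names and units are matched by position, so the counts must agree
names = VARIABLE_NAMES.split(',')
units = VARIABLE_UNITS.split(',')
assert len(names) == len(units), "each variable needs exactly one unit"
```

The final check only illustrates the positional pairing of names and units; it is not part of the template itself.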

## Data import

### Currently the code is set up to reference a predefined set of [sample plot images](https://de.cyverse.org/dl/d/4108BB75-AAA3-48E1-BBD4-E10B06CADF54/sample_plot_images.zip) available from CyVerse. Download other image files in order to run the algorithm on them
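
A minimal sketch of how the sample archive could be fetched and unpacked using only the standard library. The helper names `extract_images` and `download_sample_images` are hypothetical. It is also an assumption that the archive unpacks into a `sample_plot_images` folder, so check the extracted names before relying on that path.

```python
import urllib.request
import zipfile
from pathlib import Path

SAMPLE_URL = ("https://de.cyverse.org/dl/d/4108BB75-AAA3-48E1-BBD4-E10B06CADF54/"
              "sample_plot_images.zip")


def extract_images(zip_path, dest_dir="."):
    """Unpack a zip archive into dest_dir and return the archive member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


def download_sample_images(url=SAMPLE_URL, dest_dir="."):
    """Download the archive into dest_dir, then unpack it there."""
    zip_path = Path(dest_dir) / "sample_plot_images.zip"
    urllib.request.urlretrieve(url, str(zip_path))
    return extract_images(zip_path, dest_dir)
```

Calling `download_sample_images()` from the notebook's directory would place the images next to the notebook. To use other images, pass a different `url` or call `extract_images()` on an archive you already have.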

# Data Processing

#### In this section you define a calculate() function that generates values from the RGB images. The code below then generates a Python file that includes the parameters along with this calculate() function.

### Define your calculate() function. This should be able to manipulate RGB image data

#### This is provided as an example; you can try alternative functions by replacing the algorithm below.
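
One possible `calculate()`, sketched here to match the trait name "size of image channels" that appears in the sample output. It is not necessarily the function the template ships with. It assumes the array has shape (rows, columns, channels), which is what `np.array()` yields for an RGB image opened with PIL.

```python
import numpy as np


def calculate(pxarray: np.ndarray) -> int:
    """Return the number of pixels in a single channel of an image array.

    Assumes pxarray is laid out as (rows, columns, channels).
    """
    # Slice out the first channel and count its elements (rows * columns)
    return int(pxarray[:, :, 0].size)
```

For a 10 x 7 image with three channels this returns 70. Any function that accepts the pixel array and returns one value per entry in VARIABLE_NAMES could be used in its place.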

## Generate algorithm_rgb.py

#### This will be the file that contains the earlier parameters from the "Parameter definitions" section as well as the calculate() function from the cell above

In [4]:
def format_string(text):
    """Wrap text to 115 columns and prefix every line with '# '."""
    lines = textwrap.wrap(text, width=115, break_long_words=False)
    return "\n".join("# " + line for line in lines)


def write_algorithm_rgb_file():
    """Build algorithm_rgb.py from this notebook's '####' headings and raw cells."""
    # Read the notebook JSON and close the file handle afterwards
    with open(Path.cwd() / "JV_01_template-rgb-plot.ipynb") as infile:
        cells = json.load(infile)["cells"]
    with open("algorithm_rgb.py", "w") as outfile:
        for cell in cells:
            if cell["cell_type"] == "markdown":
                # Each '####' heading line becomes its own wrapped comment
                for entry in cell["source"]:
                    if entry.startswith("####"):
                        text = entry[4:].lstrip()
                        outfile.write("\n\n" + format_string(text) + "\n")
            elif cell["cell_type"] == "raw":
                # Raw cells hold the Python source and are copied verbatim
                outfile.write("".join(cell["source"]))
        outfile.write("\n")


write_algorithm_rgb_file()

## Test the calculate() function on the sample plot images located in the sample_plot_images folder

In [5]:
import algorithm_rgb
reload(algorithm_rgb)

for filename in os.listdir("sample_plot_images"):
    img = Image.open(Path.cwd() / "sample_plot_images" / filename)
    img_arr = np.array(img)
    print(algorithm_rgb.calculate(img_arr))


700
700
700
700
700
700
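
`os.listdir()` returns every entry in the folder, including hidden or non-image files that `Image.open()` cannot read. A small sketch of a filter, assuming the plot images use a `.tif` or `.tiff` extension (the sample files shown later end in `.tif`); the helper name `list_plot_images` is hypothetical.

```python
from pathlib import Path


def list_plot_images(folder="sample_plot_images", suffixes=(".tif", ".tiff")):
    """Return the image paths in folder, sorted by name, skipping other files."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in suffixes
    )
```

The loop above could then iterate over `list_plot_images()` and pass each path straight to `Image.open()`.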


# (OPTIONAL) The following steps are for generating and running a docker image based off of your algorithm. This will not work via a myBinder link. You should run the following in an environment with docker installed

## Next, generate your Dockerfile by running the generate.py script

In [6]:
cmd0 = "python generate.py"
os.system(cmd0)

256
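
On POSIX systems `os.system()` returns the raw wait status rather than the exit code, so a value of 256 corresponds to an exit code of 1, meaning the command did not succeed. A minimal sketch of an alternative that reports the exit code directly and captures the command's output for inspection; `run_step` is a hypothetical helper, not part of the template.

```python
import subprocess
import sys


def run_step(args):
    """Run a command given as an argument list; return (exit code, combined output)."""
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge error messages into the captured output
        text=True,
    )
    return completed.returncode, completed.stdout


# sys.executable is the interpreter running this notebook
code, output = run_step([sys.executable, "-c", "print('ok')"])
```

The same helper could be called as `run_step([sys.executable, "generate.py"])`, which also ensures the script runs with the notebook's own interpreter; printing `output` when `code` is non-zero shows why the step failed.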

## If there are leftover files from previous runs, delete them

In [7]:
filelist = ["result.json", "rgb_plot.csv", "rgb_plot_betydb.csv", "rgb_plot_geo.csv"]
for file in filelist:
    if os.path.isfile(file):
        os.remove(file)

## Now build the Docker image from the Dockerfile (currently this will have a default project name and project version)

In [8]:
cmd = "docker build -t " + algorithm_rgb.ALGORITHM_NAME + ":" + algorithm_rgb.VERSION + " ."
os.system(cmd)

256

## Next, run the Docker image for testing

In [9]:
cmd = 'docker run --rm --mount "src=`pwd`,target=/mnt,type=bind" ' + algorithm_rgb.ALGORITHM_NAME + ":" + algorithm_rgb.VERSION + ' --working_space "/mnt"'
for filename in os.listdir("sample_plot_images"):
    cmd += ' "/mnt/sample_plot_images/' + filename + '"'
os.system(cmd)

0
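
The command string above relies on a POSIX shell to expand `` `pwd` `` and on manual quoting of file names. A sketch of building the same invocation as an argument list, which can be passed to `subprocess.run()` without a shell. The function name `build_docker_run_args` and its parameters are hypothetical; the flags are the ones used in the cell above.

```python
import os


def build_docker_run_args(image_tag, image_names, host_dir=None):
    """Return the docker run argument list for an image tag and plot image file names."""
    host_dir = host_dir or os.getcwd()
    args = [
        "docker", "run", "--rm",
        "--mount", "src={},target=/mnt,type=bind".format(host_dir),
        image_tag,
        "--working_space", "/mnt",
    ]
    # Image paths are given as seen from inside the container
    args += ["/mnt/sample_plot_images/" + name for name in image_names]
    return args
```

For example, `build_docker_run_args(algorithm_rgb.ALGORITHM_NAME + ":" + algorithm_rgb.VERSION, os.listdir("sample_plot_images"))` would reproduce the command above as a list.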

## Make sure that the correct files are generated and contain appropriate results

In [10]:
filelist = ["result.json", "rgb_plot.csv", "rgb_plot_betydb.csv", "rgb_plot_geo.csv"]
# Timestamped folder name without spaces or colons
saveDir = Path.cwd() / ("outputs_" + str(datetime.now()).replace(" ", "").replace(":", "."))
saveDir.mkdir()

num_images = str(len(os.listdir("sample_plot_images")))
for filename in filelist:
    assert os.path.isfile(filename), filename + " was not generated"
    # Only result.json carries the run summary to validate
    if filename == "result.json":
        with open(filename) as infile:
            result = json.load(infile)[algorithm_rgb.ALGORITHM_NAME]
        assert result['version'] == algorithm_rgb.VERSION
        assert result['traits'] == algorithm_rgb.VARIABLE_NAMES
        assert result['units'] == algorithm_rgb.VARIABLE_UNITS
        assert result['labels'] == algorithm_rgb.VARIABLE_LABELS
        assert result['files_processed'] == num_images
        assert result['lines_written'] == num_images
        assert result['wrote_geostreams'] == ("Yes" if algorithm_rgb.WRITE_GEOSTREAMS_CSV else "No")
        assert result['wrote_betydb'] == ("Yes" if algorithm_rgb.WRITE_BETYDB_CSV else "No")
    # Move the file into the output folder
    shutil.move(filename, str(saveDir / filename))

## View the output files. They will be displayed in the following order:

1. result.json
2. rgb_plot.csv
3. rgb_plot_betydb.csv
4. rgb_plot_geo.csv

In [11]:
print(json.load(open((saveDir / "result.json"))))

{'code': 0, 'file': [{'path': '/mnt/rgb_plot.csv', 'key': 'csv'}, {'path': '/mnt/rgb_plot_geo.csv', 'key': 'csv'}, {'path': '/mnt/rgb_plot_betydb.csv', 'key': 'csv'}], 'algorithm': {'version': '1.0', 'traits': 'size of image channels', 'units': 'pixels', 'labels': '', 'files_processed': '6', 'lines_written': '6', 'wrote_geostreams': 'Yes', 'wrote_betydb': 'Yes'}}


In [12]:
rgb_plot = pd.read_csv((saveDir / "rgb_plot.csv"))
print(rgb_plot)

   species                site   timestamp        lat         lon  \
0  Unknown  sample_plot_images  2021-05-15  33.075194 -111.974953   
1  Unknown  sample_plot_images  2021-05-15  33.075949 -111.974888   
2  Unknown  sample_plot_images  2021-05-15  33.074727 -111.975043   
3  Unknown  sample_plot_images  2021-05-15  33.074547 -111.975027   
4  Unknown  sample_plot_images  2021-05-15  33.075697 -111.974937   
5  Unknown  sample_plot_images  2021-05-15  33.074691 -111.974888   

       citation_author  citation_year      citation_title  \
0  add citation author           2020  add citation title   
1  add citation author           2020  add citation title   
2  add citation author           2020  add citation title   
3  add citation author           2020  add citation title   
4  add citation author           2020  add citation title   
5  add citation author           2020  add citation title   

   size of image channels (pixels)  
0                          35000.0  
1             

In [13]:
rgb_plot_betydb = pd.read_csv((saveDir / "rgb_plot_betydb.csv"))
print(rgb_plot_betydb)

        local_datetime  access_level  species                site  \
0  2021-05-15T00:06:28             2  Unknown  sample_plot_images   
1  2021-05-15T00:06:28             2  Unknown  sample_plot_images   
2  2021-05-15T00:06:28             2  Unknown  sample_plot_images   
3  2021-05-15T00:06:28             2  Unknown  sample_plot_images   
4  2021-05-15T00:06:28             2  Unknown  sample_plot_images   
5  2021-05-15T00:06:28             2  Unknown  sample_plot_images   

       citation_author  citation_year      citation_title   method  \
0  add citation author           2020  add citation title  Unknown   
1  add citation author           2020  add citation title  Unknown   
2  add citation author           2020  add citation title  Unknown   
3  add citation author           2020  add citation title  Unknown   
4  add citation author           2020  add citation title  Unknown   
5  add citation author           2020  add citation title  Unknown   

   size of image channels

In [14]:
rgb_plot_geo = pd.read_csv((saveDir / "rgb_plot_geo.csv"))
print(rgb_plot_geo)

                 site                   trait        lat         lon  \
0  sample_plot_images  size of image channels  33.075194 -111.974953   
1  sample_plot_images  size of image channels  33.075949 -111.974888   
2  sample_plot_images  size of image channels  33.074727 -111.975043   
3  sample_plot_images  size of image channels  33.074547 -111.975027   
4  sample_plot_images  size of image channels  33.075697 -111.974937   
5  sample_plot_images  size of image channels  33.074691 -111.974888   

               dp_time                                   source    value  \
0  2021-05-15T00:06:28   /mnt/sample_plot_images/rgb_17_7_W.tif  35000.0   
1  2021-05-15T00:06:28  /mnt/sample_plot_images/rgb_40_11_W.tif  35000.0   
2  2021-05-15T00:06:28    /mnt/sample_plot_images/rgb_6_1_E.tif  35000.0   
3  2021-05-15T00:06:28    /mnt/sample_plot_images/rgb_1_2_E.tif  35000.0   
4  2021-05-15T00:06:28   /mnt/sample_plot_images/rgb_33_8_W.tif  35000.0   
5  2021-05-15T00:06:28   /mnt/sample_pl

# References

#### Schnaufer et al

#### Examples of RGB image processing algorithms that use this template

#### https://github.com/AgPipeline/transformer-rgb-indices/blob/main/algorithm_rgb.py

# Dependencies

* GDAL==3.2.2
* numpy==1.20.2
* pandas==1.2.4
* pathlib==1.0.1
* pillow==8.2.0
* urllib3==1.26.4