# Real Time Defect Analysis: TensorFlow Seedling
This notebook shows how to register and run the TensorFlow model used in https://github.com/ivem-argonne/real-time-defect-analysis/tree/main with Garden.
## Environment setup
**This notebook is intended to be run from the provided Anaconda environment.** Run ```conda env create -f ./environment.yml``` to create the environment, ```conda activate rtdefects``` to activate it, and then relaunch your Jupyter notebook from inside the environment.

In [1]:
# Check Python version is 3.10.*
import sys
assert sys.version_info[0] == 3 and sys.version_info[1] == 10

In [2]:
import garden_ai
from garden_ai import step, GardenClient

import json
from typing import Optional, Tuple
import numpy as np
import pandas as pd
from datetime import datetime
from pathlib import Path
from hashlib import md5
from skimage import color, measure, morphology
from io import BytesIO
from time import perf_counter
from hyperspy import io as hsio
from scipy.stats import siegelslopes
from scipy.interpolate import interp1d
import imageio
import tensorflow as tf

2023-08-11 13:15:06.426295: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


## Step 1: Register the model
The first step is to register the model files with Garden. This model has already been registered, so you can skip this step.
If you do want to re-register the model, run ```garden-ai model register short_name path_to_model_file flavor```.

In [3]:
client = GardenClient()

# Model URI for the pre-registered TensorFlow model
REGISTERED_MODEL_NAME = "maxtuecke@gmail.com/rtdefect-tf-model-seedling"

TEST_INPUT_PATH = "./data/input_image.tiff"
TEST_OUTPUT_PATH = "./data/tensorflow_output_mask.tiff"
TEST_OUTPUT_DEFECT_PATH = "./data/tensorflow_output_defect_results.json"

# Pre-made DOIs to register the example seedling resources with
PIPELINE_DOI = "10.23677/kd6n-fk59"
GARDEN_DOI = "10.23677/c66j-tb82"

# Pipeline requirements
PIP_REQUIREMENTS = ["torchvision==0.14.1", "torch==1.13.1", "segmentation_models.pytorch==0.2.*", "pandas==2.0.3", "scikit-image==0.21.0", "chardet==5.2.0", "hyperspy==1.7.5", "werkzeug==2.2.3"]
CONDA_REQUIREMENTS = ["tensorflow>2"]

## Step 2: Create the pipeline
Now that we have our model registered with Garden, we can create a pipeline to use it. A pipeline consists of any number of Python functions, called steps, that are chained together during execution. Below is an example of a three-step pipeline, with steps given by the functions ```preprocessing```, ```run_inference``` and ```postprocessing```. Each function must import whatever libraries it requires.

In [4]:
# Decorate functions with `@step` so that we can use them to build up a pipeline
@step
def preprocessing(
    input_data: np.ndarray,
) -> np.ndarray:
    import numpy as np
    from io import BytesIO
    import imageio
    from skimage import color, measure, morphology
    from typing import Optional, Tuple
    
    def encode_as_tiff(data: np.ndarray, compress_type: int = 5) -> bytes:
        # Convert mask to a uint8-compatible image
        data = np.squeeze(data)
        assert data.ndim == 2, "Image must be grayscale"
        assert np.logical_and(data >= 0, data <= 1).all(), "Image values must be between 0 and 1"
        data = np.array(data * 255, dtype=np.uint8)

        # Convert mask to a TIFF-encoded image
        output_img = BytesIO()
        writer = imageio.get_writer(output_img, format='tiff', mode='i')
        writer.append_data(data, meta={'compression': compress_type})
        return output_img.getvalue()
    
    
    # Encode the image data as TIFF
    encoded_image_data = encode_as_tiff(input_data, compress_type=5)

    # Load the TIFF file into a numpy array
    image_gray = imageio.imread(BytesIO(encoded_image_data))

    # Preprocess the image data
    image = color.gray2rgb(image_gray)  # Convert to RGB
    image = np.array(image, dtype=np.float32) / 255  # Convert to float32
    image = np.expand_dims(image, axis=0)  # Add a leading batch dimension

    # Check the shape
    assert image.ndim == 4, "Expects a stack of images"
    assert image.shape[-1] == 3, "Expects 3 output channels"
    assert image.dtype == np.float32, "Expects np.float32"
    assert 0 <= np.min(image) and np.max(image) <= 1, "Image values should be in [0, 1]"
    
    return image

@step
def run_inference(
    input_data: np.ndarray,
    model=garden_ai.Model(REGISTERED_MODEL_NAME),  # loads the registered model by name, with a `.predict()` method
) -> np.ndarray:
    return model.predict(input_data)
    
@step
def postprocessing(input_data: np.ndarray) -> np.ndarray:
    import numpy as np
    from io import BytesIO
    import imageio
    from skimage import color, measure, morphology
    from typing import Optional, Tuple
    
    def encode_as_tiff(data: np.ndarray, compress_type: int = 5) -> bytes:
        # Convert mask to a uint8-compatible image
        data = np.squeeze(data)
        assert data.ndim == 2, "Image must be grayscale"
        assert np.logical_and(data >= 0, data <= 1).all(), "Image values must be between 0 and 1"
        data = np.array(data * 255, dtype=np.uint8)

        # Convert mask to a TIFF-encoded image
        output_img = BytesIO()
        writer = imageio.get_writer(output_img, format='tiff', mode='i')
        writer.append_data(data, meta={'compression': compress_type})
        return output_img.getvalue()

    def analyze_defects(mask: np.ndarray, min_size: int = 50) -> Tuple[dict, np.ndarray]:
        mask = morphology.remove_small_objects(mask, min_size=min_size)
        mask = morphology.remove_small_holes(mask, min_size)
        mask = morphology.binary_erosion(mask, morphology.square(1))
        output = {'void_frac': mask.sum() / (mask.shape[0] * mask.shape[1])}

        # Assign labels to the labeled regions
        labels = measure.label(mask)
        output['void_count'] = int(labels.max())

        # Compute region properties
        props = measure.regionprops(labels, mask)
        radii = [p['equivalent_diameter'] / 2 for p in props]
        output['radii'] = radii
        output['radii_average'] = np.average(radii)
        output['positions'] = [p['centroid'] for p in props]
        return output, labels
    
    
    # Make it into a bool array
    segment = np.squeeze(input_data)
    mask = segment > 0.9

    # Generate the analysis results
    defect_results, _ = analyze_defects(mask)  # Discard the labeled output

    # Convert mask to a TIFF-encoded image
    mask_data = encode_as_tiff(mask)
    
    output = {"mask" : mask_data, "defect_results" : defect_results}
    
    return output
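
To make the quantities returned by ```analyze_defects``` concrete, the sketch below recomputes ```void_frac```, ```void_count``` and the equivalent radii on a tiny hand-made mask using only the standard library. It is an illustration rather than the pipeline code: ```label_regions``` and ```summarize``` are hypothetical helpers, the flood fill assumes 8-connectivity, the radius formula assumes that an equivalent diameter is the diameter of a circle with the same area as the region, and the morphological clean-up performed in the real step is skipped.

```python
import math
from collections import deque


def label_regions(mask):
    """Label 8-connected True regions with 1, 2, ... (hypothetical helper)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue
            # Found an unlabeled void pixel: start a new region and flood-fill it
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def summarize(mask):
    """Compute void fraction, void count and equivalent radii for a boolean mask."""
    labels, count = label_regions(mask)
    total_pixels = len(mask) * len(mask[0])
    areas = [sum(row.count(i) for row in labels) for i in range(1, count + 1)]
    # Radius of the circle whose area equals the region's pixel count
    radii = [math.sqrt(area / math.pi) for area in areas]
    return {"void_frac": sum(areas) / total_pixels, "void_count": count, "radii": radii}


# Toy mask: one 2x2 void and one 2-pixel void in a 4x5 image
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
toy_mask = [[bool(v) for v in row] for row in toy_mask]
summary = summarize(toy_mask)
print(summary)
```

Here six of the twenty pixels belong to voids, so the void fraction is 0.3, and the two regions have equivalent radii of `sqrt(4 / pi)` and `sqrt(2 / pi)` pixels.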

With our steps defined, we can now make the pipeline object. The parameter ```steps``` is a tuple giving the step functions in the desired order of execution, from left to right. The output of each step is passed to the next. Note that at this point the pipeline is still a local object and has not been registered with Garden.
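
The left-to-right chaining of steps amounts to ordinary function composition. The sketch below mimics it with plain Python functions; ```run_steps``` and the three toy steps are hypothetical names chosen for illustration and are not part of the ```garden_ai``` API, which may handle steps differently internally.

```python
from functools import reduce
from typing import Any, Callable, Sequence


def run_steps(steps: Sequence[Callable[[Any], Any]], value: Any) -> Any:
    """Feed `value` through each step in order, left to right."""
    return reduce(lambda acc, step_fn: step_fn(acc), steps, value)


def scale(values):
    # Toy stand-in for preprocessing: map 0..255 to 0..1
    return [v / 255 for v in values]


def threshold(values):
    # Toy stand-in for inference: keep values above 0.9
    return [v > 0.9 for v in values]


def count_true(flags):
    # Toy stand-in for postprocessing: summarize the result as a dict
    return {"count": sum(flags)}


result = run_steps((scale, threshold, count_true), [0, 128, 250, 255])
print(result)
```

Each function only needs to accept what the previous one returns, which is why the real ```postprocessing``` step takes the array produced by ```run_inference``` as its single argument.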

**NOTE:** We manually set ```doi``` to ```PIPELINE_DOI```, a pre-generated DOI, so that the pipeline is always registered to the same place. To register a new pipeline under a new DOI, remove the argument and a new one will be generated. We do the same when creating a new Garden later on.

In [5]:
rtdefect_pipeline = client.create_pipeline(
    title="RT Defect Analysis TF Demo Pipeline",
    python_version=f"{sys.version_info[0]}.{sys.version_info[1]}.{sys.version_info[2]}",
    pip_dependencies=PIP_REQUIREMENTS,
    conda_dependencies=CONDA_REQUIREMENTS,
    steps=(preprocessing, run_inference, postprocessing),  # steps run in order left to right, passing output to subsequent steps
    authors=[
        "Ward, Logan",
    ],
    contributors=["Tuecke, Max"],
    version="0.0.1",
    year=2023,
    tags=[],
    short_name="rtdefect_tf", # will use this name to execute the pipeline later
    doi=PIPELINE_DOI,
)

## Step 3: Register the pipeline
Now that we have our pipeline defined, it is time to register it with Garden. Normally, registering a new pipeline creates a new container using the pipeline dependencies and then uploads the pipeline to Garden. However, a container for this pipeline already exists, so we will just manually set the container_id and skip the time-consuming process of building a new container.

In [6]:
#container_id = client.build_container(rtdefect_pipeline) # if you want to build a fresh container instead

container_id = "cb99321a-7e27-4d13-b2ac-1855ce28e90d"
rtdefect_pipeline.container_uuid = container_id

client.register_pipeline(rtdefect_pipeline, container_id)
print(f"Registered pipeline '{rtdefect_pipeline.doi}'!")

Registered pipeline '10.23677/kd6n-fk59'!


## Sanity check: Pipeline execution
At this point, you should have a new pipeline registered with Garden. To confirm that the pipeline exists and is working, let's quickly fetch it from Garden using the new DOI and run it.

In [7]:
# rtdefects input image loader - takes a path to an input image and loads it into np.ndarray for the pipeline.
def load_rtdefects_input(path: Path) -> np.ndarray:
    # Step 1: attempt to read it with imageio
    load_functions = [
        imageio.imread,
        lambda x: hsio.load(x).data
    ]
    data = None
    for function in load_functions:
        try:
            data: np.ndarray = function(path)
            break  # Stop at the first loader that succeeds
        except Exception:
            continue  # Fall back to the next loader
    if data is None:
        raise ValueError(f'Failed to load image from {path}')

    # Standardize the format
    data = np.array(data, dtype=np.float32)
    data = np.squeeze(data)
    if data.ndim == 3:
        data = color.rgb2gray(data)
    data = (data - data.min()) / (data.max() - data.min())
    return data

demo_input = load_rtdefects_input(TEST_INPUT_PATH)

  data: np.ndarray = function(path)
ERROR:hyperspy.io:If this file format is supported, please report this error to the HyperSpy developers.
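
The loader above combines two ideas: falling back through several readers, and rescaling the pixel values to the range [0, 1]. A minimal, library-free sketch of both follows. ```first_successful```, ```min_max_normalize``` and the two toy readers are hypothetical names, and the guard for a constant image is an addition that the notebook's loader does not have.

```python
from typing import Any, Callable, Iterable, List


def first_successful(loaders: Iterable[Callable[[str], Any]], path: str) -> Any:
    """Return the result of the first loader that does not raise."""
    errors = []
    for loader in loaders:
        try:
            return loader(path)
        except Exception as exc:
            # Remember why this loader failed, then try the next one
            errors.append(f"{getattr(loader, '__name__', 'loader')}: {exc}")
    raise ValueError(f"Failed to load image from {path}: " + "; ".join(errors))


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant image would otherwise divide by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def broken_reader(path: str):
    raise OSError("unsupported format")


def toy_reader(path: str):
    return [10.0, 20.0, 30.0]


pixels = first_successful([broken_reader, toy_reader], "image.tiff")
normalized = min_max_normalize(pixels)
print(normalized)
```

Collecting the individual error messages makes the final ```ValueError``` more informative than a bare failure when every reader rejects the file.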


In [8]:
# results we want to reproduce with our pipeline
with open(TEST_OUTPUT_PATH, "rb") as img:
    expected_mask = img.read()
with open(TEST_OUTPUT_DEFECT_PATH) as f:
    expected_defects = json.load(f)

# fetch the new pipeline from Garden with the DOI
# Note: this pipeline is not discoverable yet as it has not been added to a Garden
rtdefect_remote = client.get_registered_pipeline(PIPELINE_DOI)

results = rtdefect_remote(
    demo_input,
    endpoint="86a47061-f3d9-44f0-90dc-56ddc642c000",  # execute on Globus Compute endpoint of choice
)

assert results["mask"] == expected_mask
# Round-trip through JSON to turn tuples into lists, matching the format of the expected results
assert json.loads(json.dumps(results["defect_results"])) == expected_defects

print("Done! Pipeline executed with correct results.")

Done! Pipeline executed with correct results.
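
The JSON round trip in the final assertion is needed because Python compares tuples and lists as unequal even when they hold the same numbers, and JSON has no tuple type. The short sketch below shows the effect with made-up values, not the actual pipeline output.

```python
import json

# Made-up result with tuple positions, as a postprocessing step might return them
local_result = {"positions": [(120.5, 259.25), (312.0, 259.0)], "void_count": 2}

# The same data as it would look after being read back from a JSON file
loaded_from_disk = json.loads(
    '{"positions": [[120.5, 259.25], [312.0, 259.0]], "void_count": 2}'
)

# Direct comparison fails because (120.5, 259.25) != [120.5, 259.25]
direct_match = local_result == loaded_from_disk

# Serializing and parsing turns every tuple into a list, so the comparison succeeds
round_tripped = json.loads(json.dumps(local_result))
round_trip_match = round_tripped == loaded_from_disk

print(direct_match, round_trip_match)
```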


## Step 4: Create and publish a new Garden
The final step is to add the newly registered pipeline to a Garden and publish it. First, create a new Garden and add the pipeline's DOI to its ```pipeline_ids``` list.
**Note:** A Garden can have multiple pipelines and models associated with it. See the ```rtdefect_torch_garden``` notebook for an example of this.

In [9]:
rtdefect_garden_tf = client.create_garden(
    title="RT Defect Analysis TF Demo Garden",
    authors=["Max Tuecke"],
    description="Recreates the RT Defect Analysis tensorflow model from https://github.com/ivem-argonne/real-time-defect-analysis/tree/main",
    doi=GARDEN_DOI,
)
# include the pipeline by just its DOI:
rtdefect_garden_tf.pipeline_ids += [PIPELINE_DOI]

Now all that's left is to publish the new Garden.

In [10]:
# Publish our new garden, making it (and its pipeline) discoverable by other garden users
client.publish_garden_metadata(rtdefect_garden_tf)

## Sanity check: Garden search and execution
Let's make sure that our new Garden is now published by searching for it with the CLI.

In [11]:
!garden-ai garden search --title="RT Defect Analysis TF Demo Garden"

{
  "gmeta": [
    {
      "@datatype": "GMetaResult",
      "entries": [
        {
          "content": {
            "pipeline_aliases": {},
            "year": "2023",
            "description": "Recreates the RT Defect Analysis tensorflow model from https://github.com/ivem-argonne/real-time-defect-analysis/tree/main",
            "language": "en",
            "title": "RT Defect Analysis TF Demo Garden",
            "version": "0.0.1",
            "tags": [],
            "pipelines": [
              {
                "models": [
                  {
                    "flavor": "tensorflow",
                    "user_email":

Finally, let's make sure we can run the pipeline in the new Garden.

In [12]:
# Get the newly published Garden with its DOI
rtdefects_garden_published = client.get_published_garden(GARDEN_DOI)

# Run its pipeline by calling the pipeline's short_name
results = rtdefects_garden_published.rtdefect_tf(demo_input, endpoint="86a47061-f3d9-44f0-90dc-56ddc642c000")
print(results["defect_results"])

{'void_frac': 0.0023212432861328125, 'void_count': 7, 'radii': [10.940041919714261, 4.442433223290478, 10.704744696916627, 7.998767850296815, 5.352372348458314, 14.439285835884782, 14.820047957642227], 'radii_average': 9.813956261743357, 'positions': [(120.55319148936171, 259.25531914893617), (312.98387096774195, 259.11290322580646), (589.8416666666667, 932.0722222222222), (661.0995024875622, 1017.3781094527363), (856.2444444444444, 865.1), (953.7862595419847, 682.2290076335878), (1002.4869565217391, 555.6579710144928)]}
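
As a quick plausibility check on the printed result, the numbers can be cross-checked against each other. The sketch below copies the values from the output above and assumes that each radius is that of a circle with the same area as its void, so that `pi * r**2` recovers the void's pixel count. The implied image size is an inference from these numbers; the notebook itself does not state it.

```python
import math

# Values copied from the printed defect results
void_frac = 0.0023212432861328125
radii = [
    10.940041919714261, 4.442433223290478, 10.704744696916627,
    7.998767850296815, 5.352372348458314, 14.439285835884782,
    14.820047957642227,
]
radii_average = 9.813956261743357

# The reported average should equal the mean of the listed radii
mean_radius = sum(radii) / len(radii)

# Invert r = sqrt(area / pi) to recover each void's area in pixels
areas = [math.pi * r * r for r in radii]
total_void_pixels = sum(areas)

# void_frac = void pixels / image pixels, so this estimates the image size
implied_image_pixels = total_void_pixels / void_frac

print(mean_radius, round(total_void_pixels), round(implied_image_pixels))
```

The recovered void area is about 2434 pixels and the implied image size is about 1048576 pixels, which would be consistent with a 1024 x 1024 input image.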
