## Your Model 🌱Garden🌱 Execution Environment

Use this notebook to write a function that executes your model(s). Tag that function with the `@garden_pipeline` decorator.

Garden builds a container from this notebook. When Garden executes your `@garden_pipeline`, it is as if you had just run all the cells of this notebook once. This means you can install libraries with `!pip install` and use them in your function. You can also define helper functions and constants to use in your `@garden_pipeline`.

In [None]:
# This cell imports everything you need from Garden, please DO NOT DELETE THIS CELL!
from garden_ai.model_connectors import HFConnector
from garden_ai import PipelineMetadata, garden_pipeline

In [None]:
# Import frameworks and packages as needed. 
# We've included 'sklearn' as the default ML framework.
# Use '!pip install' or '%pip install' to install other libraries.

import pandas as pd
import sklearn
import joblib  # used below to load the serialized model file (model.pkl)

### Model connectors

Please make sure you have published your model on another service (e.g. Hugging Face) before using Garden! You will reference your model using our ***model connectors*** in this notebook. We ask you to host your model elsewhere to streamline your Garden experience: Garden is meant to share your work with the scientific community, but it is not a model hosting service.

***Model connectors*** let Garden import metadata about your model. They also have a `stage` method that you can use to download your model weights.
<!-- Explain a bit more about the stage method and why it's needed. -->
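
To make the role of `stage` more concrete, here is a minimal sketch of the download-then-load pattern. It does not use Garden or Hugging Face: `StandInConnector` is a hypothetical class that only mimics the shape described above (a `stage` method that places the weights on local disk and returns the directory), and the "weights" are a small dictionary rather than a trained model.

```python
import pickle
import tempfile
from pathlib import Path


class StandInConnector:
    """Hypothetical stand-in for a model connector (not part of garden_ai)."""

    def __init__(self, weights: dict):
        self._weights = weights

    def stage(self) -> str:
        # A real connector would download files from the hosting service here.
        # This stand-in writes the weights to a fresh temporary directory instead.
        download_dir = Path(tempfile.mkdtemp())
        with open(download_dir / "model.pkl", "wb") as f:
            pickle.dump(self._weights, f)
        return str(download_dir)


def predict(weights: dict, xs: list) -> list:
    """Apply a toy linear model y = slope * x + intercept."""
    return [weights["slope"] * x + weights["intercept"] for x in xs]


connector = StandInConnector({"slope": 2.0, "intercept": 1.0})

# Stage first, then load from the returned path, mirroring what a pipeline does.
download_path = connector.stage()
with open(Path(download_path) / "model.pkl", "rb") as f:
    loaded_weights = pickle.load(f)

predictions = predict(loaded_weights, [0.0, 1.0, 2.0])
print(predictions)
```

Because the weights are fetched only when `stage` is called, they do not need to be stored in the notebook itself.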

In [None]:
# Below is an example of how you reference a Hugging Face model. 
# It requires a Hugging Face repository ID, which is the name of your model repository in the format "owner/model-name".
# You can copy this directly from your model repository page.

my_hugging_face_repo = HFConnector("garden-ai/sample_sklearn_model")

# Feel free to reference as many models as you'd like in this cell.

# TODO: Elaborate more on the optional parameters users can use for HFConnector: revision and local_dir.

### Pipeline metadata

Use the cell below to enter metadata for your pipeline function. You can include the function title, description, authors, and more.

Why? Good metadata makes your model more discoverable, contributes to open science, and makes your work more replicable!

Edit the `PipelineMetadata` object below to describe your pipeline function. Only one is needed for each pipeline function.

***PLEASE NOTE:*** The metadata you put in here is permanent and cannot be edited once the pipeline is published!

In [None]:
my_pipeline_meta = PipelineMetadata(
    ######    REQUIRED    ######
    title="My Inference Function",
    description="Write a longer description here so that people know what your pipeline function does.",
    authors=["you", "your collaborator"],
    tags=["materials science", "your actual field"],  # keep the trailing comma so optional entries can be added below

    ######    OPTIONAL    ######
    # TODO: Put in more entries
)

### Helper Functions

Define any helper functions you need and use them in the function you want to let people run remotely (next cell).

In [None]:
# The following function is an example of a helper function. Replace it with your own functions.
def preprocess(input_df: pd.DataFrame) -> pd.DataFrame:
    # Return a copy with missing values replaced by 0, leaving the caller's DataFrame unchanged.
    return input_df.fillna(0)

# TODO: Confirm if Models are automatically a step

# You can mark any helper functions as a Garden Step. 
# If a function is marked as a Step, it will be shown on Garden's UI and represents code you want to highlight.
# Steps are optional but are highly recommended to break down your code into more readable chunks.

# To mark a function as a Garden Step, add the @garden_step() decorator above your function declaration like so.
# Note: this assumes `garden_step` can be imported from `garden_ai` alongside `garden_pipeline`.
from garden_ai import garden_step

@garden_step(
    function_name="The name of the step function that will be used in Garden's UI",
    description="An optional string describing the function.",
)
def example_step(input_df: pd.DataFrame) -> pd.DataFrame:
    return input_df.fillna(0)

# What order are my steps in?
# -> For now, there is no way to specify the order of your steps.
# -> Steps always appear in the order they are defined in your notebook, with the pipeline function as the final step.
# -> If there is more than one pipeline in a notebook, all the steps apply to every pipeline.
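
The ordering rule above can be pictured with a small registry. This is only an illustrative sketch, not Garden's implementation: `toy_step`, `display_order`, and `STEPS` are hypothetical names, and the sketch assumes nothing beyond the behavior described in the comments (steps are listed in definition order, with the pipeline function last).

```python
from typing import Callable, List

STEPS: List[str] = []  # hypothetical record of step names, in definition order


def toy_step(func: Callable) -> Callable:
    """Record a helper as a step at the moment it is defined."""
    STEPS.append(func.__name__)
    return func


def display_order(pipeline_name: str) -> List[str]:
    """List the steps in definition order, with the pipeline function last."""
    return STEPS + [pipeline_name]


@toy_step
def fill_missing(values: list) -> list:
    return [0 if v is None else v for v in values]


@toy_step
def double(values: list) -> list:
    return [2 * v for v in values]


def run(values: list) -> list:
    # Stands in for the pipeline function; it calls the steps explicitly.
    return double(fill_missing(values))


print(display_order("run"))  # ['fill_missing', 'double', 'run']
print(run([1, None, 3]))     # [2, 0, 6]
```

Note that the recorded order reflects where each function is defined, not the order in which the pipeline function happens to call them.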

### Write your pipeline function that will run remotely

The `@garden_pipeline` decorator makes this function available to run in your garden when you publish the notebook.
Make sure you keep the lines that are for downloading your model weights from hosting services and calling your model in this function.

In the decorator be sure to include:
- your pipeline metadata,
- connectors for any models you're using,
- the DOI of the garden you want this pipeline to be found in. (Check `garden-ai garden list` for the DOIs of your gardens.)

In [None]:
# Edit the function below as needed. This function is what we'll run when your pipeline is called, so pay extra attention to what you put in here!
@garden_pipeline(metadata=my_pipeline_meta, model_connectors=[my_hugging_face_repo], garden_doi="10.23677/my-garden-doi")
def run_my_model(input_df: pd.DataFrame) -> pd.DataFrame:
    cleaned_df = preprocess(input_df) # Edit as needed with code and helper functions you declared in previous cells.
    
    ######   You may edit but not remove these lines! Garden needs this for your pipeline to run.   ######
    download_path = my_hugging_face_repo.stage() # This downloads your model from HF and returns the path. Edit this for your model connector.
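    model = joblib.load(f"{download_path}/model.pkl") # This loads your model from the path returned by the line above.
    # Wrap the predictions in a DataFrame so the result matches the declared return type. Edit as needed for what your pipeline should run/output.
    return pd.DataFrame(model.predict(cleaned_df), index=cleaned_df.index)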

### Test your pipeline function

Finally, make sure your `@garden_pipeline` works!
When Garden makes a container from your notebook, it runs all the cells in order and saves the notebook. Then users invoke your `@garden_pipeline` in the context of the notebook.

If you can hit "Kernel" -> "Restart and run all cells" and your test below works, your `@garden_pipeline` will work in your garden!

⚠️ Once you've verified your pipeline works, make sure to clear all output in this notebook, **delete** or **comment out** this testing section, and run everything again. This is to prevent your model weights from being included redundantly in this notebook. Your model weights will be downloaded via our model connectors when the pipeline is run remotely.
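
One way to clear outputs programmatically is sketched below, using only the standard library. It assumes the notebook is stored in the usual nbformat 4 JSON layout, where each code cell carries `outputs` and `execution_count` fields. `clear_outputs` and `clear_outputs_in_file` are hypothetical helpers, not Garden functions. Jupyter's own command for clearing all outputs does the same job.

```python
import json
from pathlib import Path


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with all code-cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy via a JSON round trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def clear_outputs_in_file(path: str) -> None:
    """Rewrite the notebook at `path` with its outputs cleared."""
    nb_path = Path(path)
    notebook = json.loads(nb_path.read_text(encoding="utf-8"))
    nb_path.write_text(json.dumps(clear_outputs(notebook), indent=1), encoding="utf-8")


# Tiny in-memory example used instead of a real .ipynb file.
example_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 + 1"],
            "execution_count": 3,
            "outputs": [
                {
                    "output_type": "execute_result",
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                    "execution_count": 3,
                }
            ],
        },
    ],
}

cleaned_notebook = clear_outputs(example_notebook)
print(cleaned_notebook["cells"][1]["outputs"])
```

The helper returns a copy, so the original notebook dictionary is left untouched until you choose to write the cleaned version back to disk.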

In [None]:
# Replace with input that is relevant for your garden_pipeline:
example_input = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4],
    'C': [1, 2, 3, 4]
})

# Run the @garden_pipeline function that you declared in the previous section:
run_my_model(example_input)

# TODO: What to do when the input is too complex? (e.g. images)