Migrate to new Task and controller architecture
Issue: #136

Before the initial release we are consolidating the controller and CRD
architectures.

This commit also polishes the CLI.
jmintb authored and Jessie Chatham Spencer committed Aug 4, 2023
1 parent 0b7a4cd commit 9ac6554
Showing 113 changed files with 8,378 additions and 5,969 deletions.
1,095 changes: 602 additions & 493 deletions Cargo.lock

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,4 +1,4 @@
[workspace]
members = ["controller", "service", "cli", "web", "lib"]
members = ["controller", "service", "cli", "web", "lib" ]
resolver = "2"

2 changes: 1 addition & 1 deletion Dockerfile.controller
@@ -1,6 +1,6 @@
# Using the `rust-musl-builder` as base image, instead of the official Rust toolchain.
# See https://github.com/clux/muslrust for why this is desirable.
FROM clux/muslrust:1.68.0-stable AS chef
FROM clux/muslrust:1.70.0-stable AS chef
RUN cargo install cargo-chef
WORKDIR /app

2 changes: 1 addition & 1 deletion Dockerfile.server
@@ -1,6 +1,6 @@
# Using the `rust-musl-builder` as base image, instead of
# the official Rust toolchain
FROM clux/muslrust:1.68.0-stable AS chef
FROM clux/muslrust:1.70.0-stable AS chef
RUN cargo install cargo-chef
WORKDIR /app

2 changes: 1 addition & 1 deletion README.md
@@ -94,7 +94,7 @@ TODO: show an example of this.

### Pipelines

You might have multiple Tasks meant to be executed together, for example downloading data, preparing data, model training, model upload. Each of these Tasks will have different requirements. This can be expressed using a pipeline. Each Task in a pipeline is executed in a separate container potentially on different machines if their compute requirements are different. To ensure that your code will work without modification, all of the state is transferred between steps transparently so it appears as if all of the steps are executed on the same machine. For example data is downloaed in step 1, prepared in step 2 and trained on in step 3 AME will make sure to transfer these files automatically between steps so no adjustments are reqired to the project's code.
You might have multiple Tasks meant to be executed together, for example downloading data, preparing data, model training, model upload. Each of these Tasks will have different requirements. This can be expressed using a pipeline. Each Task in a pipeline is executed in a separate container potentially on different machines if their compute requirements are different. To ensure that your code will work without modification, all of the state is transferred between steps transparently so it appears as if all of the steps are executed on the same machine. For example data is downloaed in step 1, prepared in step 2 and trained on in step 3 AME will make sure to transfer these files automatically between steps so no adjustments are required to the project's code.

```yaml
#ame.yaml
File renamed without changes.
17 changes: 17 additions & 0 deletions ame/docs/index.md
@@ -0,0 +1,17 @@
# Welcome to MkDocs

For full documentation visit [mkdocs.org](https://www.mkdocs.org).

## Commands

* `mkdocs new [dir-name]` - Create a new project.
* `mkdocs serve` - Start the live-reloading docs server.
* `mkdocs build` - Build the documentation site.
* `mkdocs -h` - Print help message and exit.

## Project layout

mkdocs.yml # The configuration file.
docs/
index.md # The documentation homepage.
... # Other markdown pages, images and other files.
172 changes: 172 additions & 0 deletions ame/docs/model_validation.md
@@ -0,0 +1,172 @@
# Guides

## From zero to live model

**This guide is focused on using AME.** If you are looking for a deployment guide, go [here](todo).



This guide will walk through going from zero to having a model served through the [V2 inference protocol](https://docs.seldon.io/projects/seldon-core/en/latest/reference/apis/v2-protocol.html).
It is split into multiple sub-steps, each of which can be consumed in isolation if you are just looking for a smaller guide on that specific step.

Almost any Python project should be usable, but if you want to follow along with the exact same project as the guide, clone [this]() repo.

### Setup the CLI

Before we can initialise an AME project, we need to install the AME [CLI](todo) and connect it to your AME instance.

TODO describe installation

### Initialising AME in your project

The first step will be creating an `ame.yaml` file in the project directory.

This is easiest to do with the AME [CLI]() by running `ame project init`. The [CLI]() will ask for a project name and then produce a file
that looks like this:

```yaml
projectName: sklearn_logistic_regression
```

### The first training

Not very exciting, but it is a start. Next we want to set up our model to be run by AME. The most important piece is the Task that will train the model, so
let's start with that.

Here we need to consider a few things: what command is used to train the model, how dependencies are managed in our project, what Python version is needed, and
how many resources the model training requires.

If you are using the [repo]() for this guide, you will want a Task configured as below.

```yaml

projectid: sklearn_logistic_regression
tasks:
- name: training
!poetry
executor:
pythonVersion: 3.11
command: python train.py
resources:
memory: 10G
cpu: 4
storage: 30G
nvidia.com/gpu: 1
```

## Your first Task

[`Tasks`](TODO) are an important building block for AME. This guide will walk you through the basics of constructing and running a [`Task`](todo).

We assume that the AME [CLI](todo) is set up and connected to an AME instance. If not, see this [guide](todo).

Before we can run a Task we must have a project set up. To initialise a project, run the commands shown below, replacing `myproject` with the
path to your project.

```sh
cd myproject
ame init
```

Now you should have an AME file, `ame.yaml`, inside your project:
```yaml
name: myproject
```

Not very exciting yet. Next we want to add a Task to this file so we can run it.
Update your file to match the changes shown below.

```yaml
name: myproject
tasks:
- name: training
!poetry
executor:
pythonVersion: 3.11
command: python train.py
resources:
memory: 2G
cpu: 2
storage: 10G
```

Here we add a list of tasks for our project, containing a single `Task` called `training`. Let's look at the anatomy of `training`.

First we set the name with `name: training`, pretty standard YAML. Next we set the [executor](todo). This syntax might seem a bit confusing
if you have not used YAML tags before: `!poetry` adds a tag to the executor indicating the executor type, in this case the poetry executor.
It requires two fields to be set: the Python version and the command to run. This tells AME how to execute the [`Task`](todo).

Finally we set the required resources: 2G of RAM, 2 CPU threads, and 10G of storage.

To run the task we can use the CLI:
```sh
ame task run
```



## Validating models before deployment

To ensure that new model versions perform well before exposing them, AME supports model validation. This is done by providing AME with a `Task` which
will succeed if the model passes validation and fail if not.

Example from [ame-demo](https://github.com/TeaInSpace/ame-demo):

```yaml

projectid: sklearn_logistic_regression
models:
- name: logreg
type: mlflow
validationTask: # the validation task is set here.
taskRef: mlflow_validation
training:
task:
taskRef: training
deployment:
auto_train: true
deploy: true
enable_tls: false
tasks:
- name: training
projectid: sklearn_logistic_regression
templateRef: shared-templates.logistic_reg_template
taskType: Mlflow
- name: mlflow_validation
projectid: sklearn_logistic_regression
runcommand: python validate.py
```

This approach allows a lot of flexibility in how models are validated, at the cost of writing the validation yourself. In the future AME will also provide built-in options for common validation configurations; see the [roadmap](todo).
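Since AME only interprets the validation `Task`'s exit status, the `validate.py` referenced by `runcommand` above can be a small script that exits non-zero when validation fails. Here is a minimal, hypothetical sketch; the threshold, scoring logic and hard-coded data are illustrative assumptions, not AME defaults:

```python
import sys

THRESHOLD = 0.7  # assumed acceptance threshold, not an AME default


def evaluate(predictions, labels):
    """Fraction of correct predictions on a held-out set."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def run_validation():
    # In a real validation Task the predictions would come from the candidate
    # model on held-out data; they are hard-coded here to keep the sketch
    # self-contained.
    score = evaluate([0, 0, 1, 1], [0, 0, 1, 0])
    # AME marks the Task as failed on any non-zero exit status.
    sys.exit(0 if score >= THRESHOLD else 1)
```

Wiring this up only requires pointing the model's `validationTask` at a `Task` whose `runcommand` invokes the script, as in the YAML above.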

### Using MLflow metrics

Here we will walk through how to validate a model based on metrics recorded in MLflow, using the [ame-demo](https://github.com/TeaInSpace/ame-demo) repository as an example. The model is a simple logistic regressor; the training code looks like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
import mlflow
import mlflow.sklearn
import os

if __name__ == "__main__":
X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 1, 0])
lr = LogisticRegression()
lr.fit(X, y)
score = lr.score(X, y)
print("Score: %s" % score)
mlflow.log_metric("score", score)
mlflow.sklearn.log_model(lr, "model", registered_model_name="logreg")
print("Model saved in run %s" % mlflow.active_run().info.run_uuid)
```

Notice how the score is logged as a metric. We can use that in our validation.

AME exposes the necessary environment variables to running Tasks, so we can access the MLflow instance during validation simply by using the MLflow library.

```python
TODO

```
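As a stand-in for the missing snippet, here is a hedged sketch of how a validation script could read that metric back. The model name `logreg` and the metric `score` come from the training code above; the threshold and the exact `MlflowClient` calls are assumptions:

```python
import sys

THRESHOLD = 0.8  # assumed acceptance threshold


def passes(score, threshold=THRESHOLD):
    """A candidate model passes validation when its logged score meets the threshold."""
    return score >= threshold


def latest_logged_score():
    # AME exposes the MLflow environment variables (e.g. MLFLOW_TRACKING_URI)
    # to the running Task, so the client needs no explicit configuration.
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    version = client.get_latest_versions("logreg")[0]
    run = client.get_run(version.run_id)
    return run.data.metrics["score"]


def run_validation():
    sys.exit(0 if passes(latest_logged_score()) else 1)
```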
30 changes: 30 additions & 0 deletions ame/docs/models.html
@@ -0,0 +1,30 @@
<h1>Models</h1>
<p>Models are one of AME's higher-level constructs, see what that means <a href="">here</a>. If you are configuring how a model should be trained, deployed, monitored or validated, this is the right place.
Models exist in an AME file alongside Datasets, Tasks and Templates.</p>
<h3>Model training</h3>
<p>Model training is configured using a <a href="./task.html">Task</a>.</p>
<p>AME can be deployed with an MLflow instance which will be exposed to the training Task, allowing for simple storage and retrieval of models and metrics.</p>
<pre lang="yaml" style="background-color:#2b303b;"><code><span style="color:#65737e;"># main project ame.yml
</span><span style="color:#bf616a;">project</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">xgboost_project
</span><span style="color:#bf616a;">models</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> - </span><span style="color:#bf616a;">name</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">product_recommendor
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">training</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">task</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">taskRef</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">train_my_model
</span><span style="color:#bf616a;">tasks</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> - </span><span style="color:#bf616a;">name</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">train_my_model
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">fromTemplate</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">shared_templates.xgboost_resources
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">executor</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#b48ead;">!poetry
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">pythonVersion</span><span style="color:#c0c5ce;">: </span><span style="color:#d08770;">3.11
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">command</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">python train.py
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">resources</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">memory</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">10G
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">cpu</span><span style="color:#c0c5ce;">: </span><span style="color:#d08770;">4
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">storage</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">30G
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">nvidia.com/gpu</span><span style="color:#c0c5ce;">: </span><span style="color:#d08770;">1
</span></code></pre>
<h3>Model deployment</h3>
<h4>Model validation</h4>
<h4>Model monitoring</h4>
<h3>Batch inference</h3>
42 changes: 42 additions & 0 deletions ame/docs/models.md
@@ -0,0 +1,42 @@
# Models

Models are one of AME's higher-level constructs, see what that means [here](). If you are configuring how a model should be trained, deployed, monitored or validated, this is the right place.
Models exist in an AME file alongside Datasets, Tasks and Templates.

### Model training

Model training is configured using a [Task](tasks.md).

AME can be deployed with an MLflow instance which will be exposed to the training Task, allowing for simple storage and retrieval of models and metrics.


```yaml
# main project ame.yml
project: xgboost_project
models:
- name: product_recommendor
training:
task:
taskRef: train_my_model
tasks:
- name: train_my_model
fromTemplate: shared_templates.xgboost_resources
executor:
!poetry
pythonVersion: 3.11
command: python train.py
resources:
memory: 10G
cpu: 4
storage: 30G
nvidia.com/gpu: 1
```


### Model deployment

#### Model validation

#### Model monitoring

### Batch inference
4 changes: 2 additions & 2 deletions docs/project_sources.md → ame/docs/project_sources.md
@@ -2,11 +2,11 @@

A project source informs AME of a location to check and sync an AME project from. Currently the only supported location is a Git repository.

### Git project sources
## Git project sources

Git project sources allow for a GitOps-like approach to managing models, data and the surrounding operations using the AME file defined in the repository.

#### How to use Git project sources
### How to use Git project sources

You can create a Git project source either through the CLI or the AME frontend.
