Migrate to new Task and controller architecture
Issue: #136

Before the initial release we are consolidating the controller and CRD
architectures.

This commit also polishes the CLI.
jmintb authored and Jessie Chatham Spencer committed Aug 4, 2023
1 parent 0b7a4cd commit 9ac6554
Showing 113 changed files with 8,378 additions and 5,969 deletions.
1,095 changes: 602 additions & 493 deletions Cargo.lock

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,4 +1,4 @@
[workspace]
members = ["controller", "service", "cli", "web", "lib"]
members = ["controller", "service", "cli", "web", "lib" ]
resolver = "2"

2 changes: 1 addition & 1 deletion Dockerfile.controller
@@ -1,6 +1,6 @@
# Using the `rust-musl-builder` as base image, instead of the official Rust toolchain.
# See https://github.com/clux/muslrust for why this is desirable.
FROM clux/muslrust:1.68.0-stable AS chef
FROM clux/muslrust:1.70.0-stable AS chef
RUN cargo install cargo-chef
WORKDIR /app

2 changes: 1 addition & 1 deletion Dockerfile.server
@@ -1,6 +1,6 @@
# Using the `rust-musl-builder` as base image, instead of
# the official Rust toolchain
FROM clux/muslrust:1.68.0-stable AS chef
FROM clux/muslrust:1.70.0-stable AS chef
RUN cargo install cargo-chef
WORKDIR /app

2 changes: 1 addition & 1 deletion README.md
@@ -94,7 +94,7 @@ TODO: show an example of this.

### Pipelines

You might have multiple Tasks meant to be executed together, for example downloading data, preparing data, model training, model upload. Each of these Tasks will have different requirements. This can be expressed using a pipeline. Each Task in a pipeline is executed in a separate container potentially on different machines if their compute requirements are different. To ensure that your code will work without modification, all of the state is transferred between steps transparently so it appears as if all of the steps are executed on the same machine. For example data is downloaed in step 1, prepared in step 2 and trained on in step 3 AME will make sure to transfer these files automatically between steps so no adjustments are reqired to the project's code.
You might have multiple Tasks meant to be executed together, for example downloading data, preparing data, model training, model upload. Each of these Tasks will have different requirements. This can be expressed using a pipeline. Each Task in a pipeline is executed in a separate container potentially on different machines if their compute requirements are different. To ensure that your code will work without modification, all of the state is transferred between steps transparently so it appears as if all of the steps are executed on the same machine. For example data is downloaed in step 1, prepared in step 2 and trained on in step 3 AME will make sure to transfer these files automatically between steps so no adjustments are required to the project's code.

```yaml
#ame.yaml
File renamed without changes.
17 changes: 17 additions & 0 deletions ame/docs/index.md
@@ -0,0 +1,17 @@
# Welcome to MkDocs

For full documentation visit [mkdocs.org](https://www.mkdocs.org).

## Commands

* `mkdocs new [dir-name]` - Create a new project.
* `mkdocs serve` - Start the live-reloading docs server.
* `mkdocs build` - Build the documentation site.
* `mkdocs -h` - Print help message and exit.

## Project layout

mkdocs.yml # The configuration file.
docs/
index.md # The documentation homepage.
... # Other markdown pages, images and other files.
172 changes: 172 additions & 0 deletions ame/docs/model_validation.md
@@ -0,0 +1,172 @@
# Guides

## From zero to live model

**This guide is focused on using AME.** If you are looking for a deployment guide, go [here](todo).



This guide will walk through going from zero to having a model served through the [V2 inference protocol](https://docs.seldon.io/projects/seldon-core/en/latest/reference/apis/v2-protocol.html).
It is split into multiple sub-steps, each of which can be consumed in isolation if you are just looking for a smaller guide on that specific step.

Almost any Python project should be usable, but if you want to follow along with the exact same project as the guide, clone [this]() repo.

### Setup the CLI

Before we can initialise an AME project, we need to install the AME [CLI](todo) and connect it to your AME instance.

TODO describe installation

### Initialising AME in your project

The first step will be creating an `ame.yaml` file in the project directory.

This is easiest to do with the AME [CLI]() by running `ame project init`. The [CLI]() will ask for a project name and then produce a file
that looks like this:

```yaml
projectName: sklearn_logistic_regression
```

### The first training

Not very exciting, but it is a start. Next we want to set up our model to be run by AME. The most important piece is the Task that will train the model, so
let's start with that.

Here we need to consider a few things: what command is used to train the model, how dependencies are managed in our project, what Python version is needed, and
how many resources the model training requires.

If you are using the [repo]() for this guide, you will want a Task configured as below.

```yaml

projectid: sklearn_logistic_regression
tasks:
- name: training
!poetry
executor:
pythonVersion: 3.11
command: python train.py
resources:
memory: 10G
cpu: 4
storage: 30G
nvidia.com/gpu: 1
```

## Your first Task

[`Tasks`](TODO) are an important building block for AME. This guide will walk you through the basics of constructing and running a [`Task`](todo).

We assume that the AME [CLI](todo) is set up and connected to an AME instance. If not, see this [guide](todo).

Before we can run a Task we must have a project set up. To initialise a project, run the commands shown below, replacing `myproject` with the
path to your project.

```sh
cd myproject
ame init
```

Now you should have an AME file, `ame.yaml`, inside your project:
```yaml
name: myproject
```

Not very exciting yet. Next we want to add a Task to this file so we can run it.
Update your file to match the changes shown below.

```yaml
name: myproject
tasks:
- name: training
!poetry
executor:
pythonVersion: 3.11
command: python train.py
resources:
memory: 2G
cpu: 2
storage: 10G
```

Here we add a list of tasks for our project, containing a single `Task` called `training`. Let's look at the anatomy of `training`.

First we set the name with `name: training`, pretty standard YAML. Next we set the [executor](todo). This syntax might seem a bit confusing
if you have not used YAML tags before: `!poetry` adds a tag to the executor indicating the executor type, in this case the poetry executor.
It requires two fields to be set: the Python version and the command to run. This tells AME how to execute the [`Task`](todo).

Finally we set the required resources: 2G of RAM, 2 CPU threads, and 10G of storage.

To run the task we can use the CLI:
```sh
ame task run
```



## Validating models before deployment

To ensure that new model versions perform well before exposing them, AME supports model validation. This is done by providing AME with a `Task` which
will succeed if the model passes validation and fail if not.

Example from [ame-demo](https://github.com/TeaInSpace/ame-demo):

```yaml

projectid: sklearn_logistic_regression
models:
- name: logreg
type: mlflow
validationTask: # the validation task is set here.
taskRef: mlflow_validation
training:
task:
taskRef: training
deployment:
auto_train: true
deploy: true
enable_tls: false
tasks:
- name: training
projectid: sklearn_logistic_regression
templateRef: shared-templates.logistic_reg_template
taskType: Mlflow
- name: mlflow_validation
projectid: sklearn_logistic_regression
runcommand: python validate.py
```

This approach allows a lot of flexibility in how models are validated, at the cost of writing the validation yourself. In the future AME will also provide built-in options for common validation configurations; see the [roadmap](todo).
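Since AME only interprets the validation `Task`'s exit status, the `validate.py` referenced by `runcommand` above can be a small script that exits non-zero when validation fails. Here is a minimal, hypothetical sketch; the threshold, scoring logic and hard-coded data are illustrative assumptions, not AME defaults:

```python
import sys

THRESHOLD = 0.7  # assumed acceptance threshold, not an AME default


def evaluate(predictions, labels):
    """Fraction of correct predictions on a held-out set."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def run_validation():
    # In a real validation Task the predictions would come from the candidate
    # model on held-out data; they are hard-coded here to keep the sketch
    # self-contained.
    score = evaluate([0, 0, 1, 1], [0, 0, 1, 0])
    # AME marks the Task as failed on any non-zero exit status.
    sys.exit(0 if score >= THRESHOLD else 1)
```

Wiring this up only requires pointing the model's `validationTask` at a `Task` whose `runcommand` invokes the script, as in the YAML above.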

### Using MLflow metrics

Here we will walk through how to validate a model based on metrics recorded in MLflow, using the [ame-demo](https://github.com/TeaInSpace/ame-demo) repository as an example. The model is a simple logistic regressor; the training code looks like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
import mlflow
import mlflow.sklearn
import os

if __name__ == "__main__":
X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 1, 0])
lr = LogisticRegression()
lr.fit(X, y)
score = lr.score(X, y)
print("Score: %s" % score)
mlflow.log_metric("score", score)
mlflow.sklearn.log_model(lr, "model", registered_model_name="logreg")
print("Model saved in run %s" % mlflow.active_run().info.run_uuid)
```

Notice how the score is logged as a metric. We can use that in our validation.

AME exposes the necessary environment variables to running Tasks, so we can access the MLflow instance during validation simply by using the MLflow library.

```python
TODO

```
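As a stand-in for the missing snippet, here is a hedged sketch of how a validation script could read that metric back. The model name `logreg` and the metric `score` come from the training code above; the threshold and the exact `MlflowClient` calls are assumptions:

```python
import sys

THRESHOLD = 0.8  # assumed acceptance threshold


def passes(score, threshold=THRESHOLD):
    """A candidate model passes validation when its logged score meets the threshold."""
    return score >= threshold


def latest_logged_score():
    # AME exposes the MLflow environment variables (e.g. MLFLOW_TRACKING_URI)
    # to the running Task, so the client needs no explicit configuration.
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    version = client.get_latest_versions("logreg")[0]
    run = client.get_run(version.run_id)
    return run.data.metrics["score"]


def run_validation():
    sys.exit(0 if passes(latest_logged_score()) else 1)
```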
30 changes: 30 additions & 0 deletions ame/docs/models.html
@@ -0,0 +1,30 @@
<h1>Models</h1>
<p>Models are one of AME's higher-level constructs, see what that means <a href="">here</a>. If you are configuring how a model should be trained, deployed, monitored or validated, this is the right place.
Models exist in an AME file alongside Datasets, Tasks and Templates.</p>
<h3>Model training</h3>
<p>Model training is configured using a <a href="./task.html">Task</a>.</p>
<p>AME can be deployed with an MLflow instance which will be exposed to the training Task, allowing for simple storage and retrieval of models and metrics.</p>
<pre lang="yaml" style="background-color:#2b303b;"><code><span style="color:#65737e;"># main project ame.yml
</span><span style="color:#bf616a;">project</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">xgboost_project
</span><span style="color:#bf616a;">models</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> - </span><span style="color:#bf616a;">name</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">product_recommendor
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">training</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">task</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">taskRef</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">train_my_model
</span><span style="color:#bf616a;">tasks</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> - </span><span style="color:#bf616a;">name</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">train_my_model
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">fromTemplate</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">shared_templates.xgboost_resources
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">executor</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#b48ead;">!poetry
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">pythonVersion</span><span style="color:#c0c5ce;">: </span><span style="color:#d08770;">3.11
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">command</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">python train.py
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">resources</span><span style="color:#c0c5ce;">:
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">memory</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">10G
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">cpu</span><span style="color:#c0c5ce;">: </span><span style="color:#d08770;">4
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">storage</span><span style="color:#c0c5ce;">: </span><span style="color:#a3be8c;">30G
</span><span style="color:#c0c5ce;"> </span><span style="color:#bf616a;">nvidia.com/gpu</span><span style="color:#c0c5ce;">: </span><span style="color:#d08770;">1
</span></code></pre>
<h3>Model deployment</h3>
<h4>Model validation</h4>
<h4>Model monitoring</h4>
<h3>Batch inference</h3>
42 changes: 42 additions & 0 deletions ame/docs/models.md
@@ -0,0 +1,42 @@
# Models

Models are one of AME's higher-level constructs, see what that means [here](). If you are configuring how a model should be trained, deployed, monitored or validated, this is the right place.
Models exist in an AME file alongside Datasets, Tasks and Templates.

### Model training

Model training is configured using a [Task](tasks.md).

AME can be deployed with an MLflow instance which will be exposed to the training Task, allowing for simple storage and retrieval of models and metrics.


```yaml
# main project ame.yml
project: xgboost_project
models:
- name: product_recommendor
training:
task:
taskRef: train_my_model
tasks:
- name: train_my_model
fromTemplate: shared_templates.xgboost_resources
executor:
!poetry
pythonVersion: 3.11
command: python train.py
resources:
memory: 10G
cpu: 4
storage: 30G
nvidia.com/gpu: 1
```


### Model deployment

#### Model validation

#### Model monitoring

### Batch inference
4 changes: 2 additions & 2 deletions docs/project_sources.md → ame/docs/project_sources.md
@@ -2,11 +2,11 @@

A project source informs AME of a location to check and sync an AME project from. Currently the only supported location is a Git repository.

### Git project sources
## Git project sources

Git project sources allow for a GitOps-like approach to managing models, data and the surrounding operations using the AME file defined in the repository.

#### How to use Git project sources
### How to use Git project sources

You can create a Git project source either through the CLI or the AME frontend.
