Commit
Showing 10 changed files with 148 additions and 3 deletions.
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,110 @@
# Introduction

This page introduces the core concepts in AME.

For end-to-end guides to specific setups, go [here]().

## Core concepts

AME is designed around simple building blocks that add exponentially more value when combined.

TODO: reformulate this

### Tasks

**Note**: YAML configuration files are used here as an easy way to show different configurations. The CLI and frontend for
AME will help you generate, edit and validate these files, so you don't have to write mountains of error-prone YAML by hand :).

A ['Task'](tasks) is the fundamental unit of work in AME. It can be as simple as running a single command, `python train.py`:

```yaml
project: logreg
tasks:
  - name: train
    executor:
      !pip
      pythonVersion: 3.11
      command: python train.py
```

A Task can also be more complex, such as orchestrating a ['pipeline'](pipeline) or ['DAG'](dags) with many sub-tasks:

```yaml
project: logreg
tasks:
  - name: train
    pipeline:
      - name: prepare-data
        executor:
          !poetry
          command: python prepare_data.py
      - name: train
        executor:
          !poetry
          command: python train.py
        resources:
          cpu: 4
          nvidia.com/gpu: 2
          memory: 20Gi
          storage: 100Gi
      - name: save
        executor:
          !poetry
          command: python save_model.py
```

Tasks also have a notion of dependencies: a `Task` can depend on a [`DataSet`](datasets). Declaring dependencies
to AME allows for more efficient scheduling of Tasks, and for caching to avoid repeating the same work. Tasks are designed to execute most Python projects out of the box.
If you have to start building custom Docker images in day-to-day usage, we consider that a failure on our part, so please submit an [issue](repo).

We can't cover every possible case. If any of the defaults are unsuitable, there are escape hatches that allow for custom container images, custom setup commands and
patching the underlying K8s resources. These should be a last resort: if you find yourself using them, feel free to submit an issue and we will likely expand AME to cover your use case properly :)

### Projects

A Project is a specific directory, and often a repository. It provides the context within which Tasks are executed. The AME file `ame.yaml` serves as a declarative way of defining any configuration
related to a specific project. This includes Tasks, TaskTemplates, DataSets, Models and project-wide defaults.

### Project Source

A project source tells AME where to look for projects. Currently only Git repositories are supported as project sources.

A typical project source object looks like this:

```yaml
gitProjectSource:
  repository: github.com/my/ml/repo
  username: jane
  secret: reference_to_secret
```

AME will then watch every branch of this repo for project files and pull them into the AME cluster.

### DataSets

A [`DataSet`](datasets) is essentially a `Task` with extra semantics attached to the artifacts generated by the underlying `Task`. Currently this doesn't add much, but in the future
it will allow AME to perform much smarter scheduling. The main advantage right now is that a `Task` can depend on a `DataSet`, and once a `DataSet` is cached the work will not be
repeated. Many `Tasks` can therefore depend on the same `DataSet`, and the `DataSet` will only be generated once.

Example dataset:

```yaml
# ame.yaml
# ...
dataSets:
  - name: mnist
    path: ./data # Specifies where the task stores data.
    # task:
    taskRef: fetch_mnist # References a task which produces data.
```

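As a rough sketch of how the pieces above might fit together in a single project file, the example below combines the `mnist` DataSet with the Task it references. It uses only keys that already appear on this page. The project name `mnist-example`, the script name `fetch_mnist.py`, and the assumption that `tasks` and `dataSets` sit side by side at the top level of `ame.yaml` are illustrative and not confirmed AME behaviour.

```yaml
# ame.yaml (hypothetical sketch, not a verified AME configuration)
project: mnist-example                 # assumed project name
tasks:
  - name: fetch_mnist                  # the Task that taskRef below points at
    executor:
      !poetry
      command: python fetch_mnist.py   # assumed script that writes to ./data
dataSets:
  - name: mnist
    path: ./data                       # where the task stores its data
    taskRef: fetch_mnist               # references the task defined above
```

With a layout like this, the caching described above would mean that `fetch_mnist` runs once and every Task depending on `mnist` reuses its output. How a Task declares that dependency is not shown on this page, so it is left out of the sketch.
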
### Models

A [`Model`](models) defines how to train, validate and deploy a model. All you have to do is tell AME how to train and validate your model in the form of
`Tasks`, and then the lifecycle of the model can be automated. AME currently does not have its own model registry but instead supports deploying with an
[MLflow](todo) instance.
Empty file.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.