WIP - docs
jmintb committed Aug 6, 2023
1 parent 9bb3797 commit dd986f1
Showing 10 changed files with 148 additions and 3 deletions.
File renamed without changes.
File renamed without changes.
110 changes: 110 additions & 0 deletions ame/docs/user_guide/0_introduction.md
@@ -0,0 +1,110 @@
# Introduction

This page will introduce you to the core concepts in AME.

For end-to-end guides to specific setups, go [here]().


## Core concepts

AME is designed around simple building blocks that add exponentially more value when combined.

TODO: reformulate this

### Tasks

**Note**: YAML configuration files are used here as an easy way to show different configurations. The CLI and frontend for
AME will help you generate, edit and validate these files, so you don't have to write mountains of error-prone YAML by hand :).

A ['Task'](tasks) is the fundamental unit of work in AME. It can be as simple as running a single command `python train.py`:

```yaml
project: logreg
tasks:
  - name: train
    executor:
      !pip
      pythonVersion: 3.11
      command: python train.py
```


or more complex, such as orchestrating
a ['pipeline'](pipeline) or ['DAG'](dags) with many sub-tasks:

```yaml
project: logreg
tasks:
  - name: train
    pipeline:
      - name: prepare-data
        executor:
          !poetry
          command: python prepare_data.py
      - name: train
        executor:
          !poetry
          command: python train.py
        resources:
          cpu: 4
          nvidia.com/gpu: 2
          memory: 20Gi
          storage: 100Gi
      - name: save
        executor:
          !poetry
          command: python save_model.py

```

Tasks also have a notion of dependencies, where a `Task` can depend on a [`DataSet`](datasets). Declaring dependencies
to AME allows for more efficient scheduling of Tasks, and for caching to avoid repeating the same work. Tasks are designed to execute most Python projects out of the box.
If you have to start building custom Docker images in day-to-day usage, we consider that a failure on our part; please submit an [issue](repo).
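To make the scheduling point concrete, here is a minimal sketch of how declared dependencies determine a valid execution order. It is not AME's scheduler: the dependency map and the `evaluate` task are hypothetical, and the ordering comes from Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key depends on the names in its set.
# The names echo the examples on this page; "evaluate" is made up.
dependencies = {
    "fetch_mnist": set(),
    "mnist": {"fetch_mnist"},  # the DataSet is produced by a Task
    "train": {"mnist"},        # Tasks that consume the DataSet
    "evaluate": {"mnist"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which every item runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because `train` and `evaluate` both declare the same dependency, a scheduler that knows this can produce `mnist` once and then run both consumers.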

We can't cover every possible case, so if any of the defaults are unsuitable there are escape hatches for things such as custom container images, custom setup commands and
patching the underlying K8S resources. This should be a last resort. If you find yourself doing this, feel free to submit an issue and we will likely expand AME to cover your use case properly :)

### Projects

A Project corresponds to a specific directory, and often a repository. It provides the context within which Tasks are executed. The AME file `ame.yaml` serves as a declarative way of defining any configuration
related to a specific project. This includes Tasks, TaskTemplates, DataSets, Models and project-wide defaults.

### Project Source

A project source tells AME where to look for projects. Currently only Git repositories are supported as project sources.

A typical project source object looks like this:

```yaml
gitProjectSource:
  repository: github.com/my/ml/repo
  username: jane
  secret: reference_to_secret
```

AME will then watch every branch on this repo for project files and pull them into the AME cluster.
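As a rough illustration of what "every branch" means for a Git source, the sketch below lists the branches of a remote with `git ls-remote --heads`. This is an assumption about how one might enumerate branches, not a description of AME's internals; `parse_heads` and `list_branches` are hypothetical helper names.

```python
import subprocess


def parse_heads(ls_remote_output: str) -> list[str]:
    """Extract branch names from `git ls-remote --heads` output."""
    branches = []
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        # Each line has the form "<sha>\trefs/heads/<branch>".
        _sha, ref = line.split("\t", 1)
        branches.append(ref.removeprefix("refs/heads/"))
    return branches


def list_branches(repository_url: str) -> list[str]:
    """Ask the remote for its branches; needs git on PATH and network access."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", repository_url],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_heads(result.stdout)
```

Each branch returned this way is a place where an `ame.yaml` project file could live.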


### DataSets

A [`DataSet`](datasets) is essentially a `Task` with extra semantics: the artifacts generated by the underlying `Task` are treated as the data set. Currently this doesn't add much, but in the future
it will allow AME to perform much smarter scheduling. The main advantage right now is that a `Task` can depend on a `DataSet`, and once a `DataSet` is cached the work will not be
repeated. Many `Tasks` can therefore depend on the same `DataSet`, and it will only be generated once.

Example data set:

```yaml
# ame.yaml
...
dataSets:
  - name: mnist
    path: ./data # Specifies where the task stores data.
    # task:
    taskRef: fetch_mnist # References a task which produces data.
```
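The "generated only once" behaviour can be pictured with a toy in-memory cache. This is only a sketch of the idea under simplifying assumptions: `DataSetCache`, the call counter and the `evaluate` consumer are hypothetical, and AME's real caching is not shown on this page.

```python
from typing import Callable


class DataSetCache:
    """Toy cache: each named data set is produced at most once."""

    def __init__(self) -> None:
        self._artifacts: dict[str, str] = {}

    def get(self, name: str, produce: Callable[[], str]) -> str:
        # Run the producing task only if nothing is cached under this name.
        if name not in self._artifacts:
            self._artifacts[name] = produce()
        return self._artifacts[name]


calls = {"fetch_mnist": 0}


def fetch_mnist() -> str:
    """Stand-in for the task that produces the data; returns the data path."""
    calls["fetch_mnist"] += 1
    return "./data"


cache = DataSetCache()
for task in ("train", "evaluate"):
    path = cache.get("mnist", fetch_mnist)
    print(f"{task} reads {path}")
```

Both consumers receive the same path, while `fetch_mnist` runs a single time.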

### Models

A [`Model`](models) defines how to train, validate and deploy a model. All you have to do is tell AME how to train and validate your model in the form of
`Tasks`, and then the lifecycle of a model can be automated. AME currently does not have its own model registry but instead supports deploying with an
[MLflow](todo) instance.
37 changes: 34 additions & 3 deletions ame/docs/datasets.md → ame/docs/user_guide/datasets.md
@@ -2,8 +2,12 @@

AME has a built-in notion of data sets, allowing users to think in terms of data sets and not just raw tasks.

It is important to note that DataSets in AME should be treated as ephemeral and not as long-term storage. If AME is running
out of space, any cached data can be deleted.

Here is an example of what a simple data set configuration looks like:


```yaml
# ame.yaml
...
Expand Down Expand Up @@ -31,18 +35,45 @@ Lets start with that here:
dataSets:
  - name: mnist
    path: ./data # Specifies where the task stores data.
    # task:
    task:
      taskRef: fetch_mnist # References a task which produces data.
```

So far so good: we have a path, `./data`, and a reference to a `Task` that produces our data.

#### Dataset size

If a dataset is large, it is a good idea to specify the storage requirements. This will allow AME to warn you if the object storage is running out of space.

If you do not specify the size, AME will attempt to save the dataset, detect the failure and then produce an alert.

```yaml
# ame.yaml
...
dataSets:
  - name: mnist
    path: ./data # Specifies where the task stores data.
    size: 50Gi
    task:
      taskRef: fetch_mnist # References a task which produces data.
```
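To show why a declared size helps, here is a small sketch that converts a value such as `50Gi` to bytes and compares it with the free space before anything is written. It checks the local disk purely for illustration, whereas AME would check its object storage; `parse_size` and `fits` are hypothetical helpers and only binary suffixes are handled.

```python
import shutil

# Binary suffixes as used in sizes such as "50Gi".
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_size(quantity: str) -> int:
    """Convert a size such as '50Gi' into bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def fits(declared: str, free_bytes: int) -> bool:
    """True if the declared size fits in the available space."""
    return parse_size(declared) <= free_bytes


# Illustration only: uses the local disk, not object storage.
free = shutil.disk_usage(".").free
if not fits("50Gi", free):
    print("warning: not enough free space for data set 'mnist'")
```

With a declared size the check can run before the producing task starts, instead of failing part-way through a save.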


### Interacting with data sets

To see the status of live data sets, use AME's CLI. Currently it is only possible to see data sets that are in use, meaning those referenced by some running task.

```bash
ame ds list
ame dataset list
ame ds list # or shortened
```


You can also view datasets from AME's dashboard:

TODO: dataset image

### Consuming data from object storage

AME does not yet have built-in support for extracting data from object storage, although it will in the near future; see the tracking issue [here]().
It is still quite simple to accomplish this in pure Python, so we shall demonstrate that here.
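A minimal sketch of one way to do this, assuming an S3-compatible object store and the third-party `boto3` package. Neither is confirmed by this page, and the helper names, bucket and prefix are placeholders. The function downloads every object under a prefix into a local directory, such as the data set's `path`.

```python
from pathlib import Path


def make_s3_client(endpoint_url: str):
    """Build a client for an S3-compatible store (assumes `pip install boto3`)."""
    import boto3  # imported lazily so download_prefix has no hard dependency

    return boto3.client("s3", endpoint_url=endpoint_url)


def download_prefix(client, bucket: str, prefix: str, dest: Path) -> list[Path]:
    """Download every object under `prefix` into `dest`, keeping relative paths."""
    downloaded: list[Path] = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):  # skip directory placeholder objects
                continue
            target = dest / key[len(prefix):].lstrip("/")
            target.parent.mkdir(parents=True, exist_ok=True)
            client.download_file(bucket, key, str(target))
            downloaded.append(target)
    return downloaded
```

A task such as `fetch_mnist` could call `download_prefix(make_s3_client(endpoint), bucket, "mnist/", Path("./data"))`, with the endpoint and bucket supplied by your own setup. Credentials are picked up by `boto3` from its usual environment variables or config files.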

Empty file.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
4 changes: 4 additions & 0 deletions ame/mkdocs.yml
@@ -1,8 +1,12 @@
site_name: AME
repo_url: https://github.com/teainspace/ame
theme:
  name: material
  features:
    - navigation.tabs
    - navigation.tabs.sticky
    - navigation.top
    - navigation.path
  palette:
    # Palette toggle for light mode
    - scheme: default