WIP - docs

TeaInSpace · Aug 6, 2023 · dd986f1
1 parent 9bb3797
commit dd986f1

Showing 10 changed files with 148 additions and 3 deletions.
diff --git a/ame/docs/developer.md b/ame/docs/developer_manual/developer.md
diff --git a/ame/docs/deploy-ame.md b/ame/docs/operator_manual/deploy-ame.md
diff --git a/ame/docs/user_guide/0_introduction.md b/ame/docs/user_guide/0_introduction.md
@@ -0,0 +1,110 @@
+# Introduction
+
+This page will introduce you to the core concepts in AME.
+
+For end-to-end guides to specific setups, go [here]().
+
+
+## Core concepts
+
+AME is designed around simple building blocks which add exponentially more value when combined.
+
+TODO: reformulate this
+
+### Tasks
+
+**Note**: YAML configuration files are used here as a way to easily show different configurations. The CLI and frontend for
+AME will help you generate, edit and validate these files so you don't have to write mountains of error-prone YAML by hand :).
+
+A ['Task'](tasks) is the fundamental unit of work in AME. It can be as simple as running a single command `python train.py`:
+
+```yaml
+project: logreg
+tasks:
+  - name: train
+    executor:
+      !pip
+      pythonVersion: 3.11
+      command: python train.py
+```
+
+
+or more complex, such as orchestrating
+a ['pipeline'](pipeline) or ['DAG'](dags) with many sub-tasks:
+
+```yaml
+project: logreg
+tasks:
+  - name: train
+    pipeline:
+      - name: prepare-data
+        executor:
+          !poetry
+          command: python prepare_data.py
+      - name: train
+        executor:
+          !poetry
+          command: python train.py
+        resources:
+          cpu: 4
+          nvidia.com/gpu: 2
+          memory: 20Gi
+          storage: 100Gi
+      - name: save
+        executor:
+          !poetry
+          command: python save_model.py
+
+```
+
+Tasks also have a notion of dependencies, where a `Task` can depend on a [`Dataset`]. Indicating dependencies
+to AME allows for more efficient scheduling of Tasks and caching to avoid repeating the same work. Tasks are designed to be able to execute most Python projects out of the box.
+If you have to start building custom Docker images in day-to-day usage, that is considered a failure on our part, so please submit an [issue](repo).
+
+We can't cover every possible case. Therefore, if any of the defaults are unsuitable, there are escape hatches that allow for things such as custom container images, custom setup commands and
+patching the underlying K8S resources. This should be a last resort. If you find yourself doing this, feel free to submit an issue and we will likely expand AME to cover your use case properly :)
+
+### Projects
+
+A Project is a specific directory, and often a repository. It provides the context within which Tasks are executed. The AME file `ame.yaml` serves as a declarative way of defining any configuration
+related to a specific project. This includes Tasks, TaskTemplates, DataSets, Models and project-wide defaults.
+
+### Project Source
+
+A project source tells AME where to look for projects. Currently only Git repositories are supported as project sources.
+
+A typical project source object looks like this:
+
+```yaml
+gitProjectSource:
+  repository: github.com/my/ml/repo
+  username: jane
+  secret: reference_to_secret
+```
+
+AME will then watch every branch on this repo for project files and pull them into the AME cluster.
+
+
+### DataSets
+
+A [`DataSet`](datasets) is essentially a `Task` with extra semantics: the artifacts generated by the underlying `Task` are treated as the data set. Currently this doesn't add much, but in the future
+this will allow AME to perform much smarter scheduling. The main advantage right now is that a `Task` can depend on a `DataSet`, and once a `DataSet` is cached the work will not be
+repeated. Therefore, many `Tasks` can depend on the same `DataSet` and the dataset will only be generated once.
+
+Example dataset:
+
+```yaml
+# ame.yaml
+...
+dataSets:
+  - name: mnist
+    path: ./data # Specifies where the task stores data.
+    task:
+      taskRef: fetch_mnist # References a task which produces data.
+```
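To make the caching behaviour described above concrete, here is a minimal Python sketch. It is an illustration only, not AME's implementation: `DataSetCache` and the stand-in `fetch_mnist` function are hypothetical names, and the sketch assumes a dataset can be identified by its name alone.

```python
from typing import Callable, Dict


class DataSetCache:
    """Remembers where each dataset was stored so it is produced only once."""

    def __init__(self) -> None:
        self._paths: Dict[str, str] = {}

    def get(self, name: str, produce: Callable[[], str]) -> str:
        # Run the producing task only on the first request for this name.
        if name not in self._paths:
            self._paths[name] = produce()
        return self._paths[name]


runs = []


def fetch_mnist() -> str:
    """Hypothetical stand-in for the task referenced by `taskRef`."""
    runs.append("fetch_mnist")
    return "./data"


cache = DataSetCache()
cache.get("mnist", fetch_mnist)  # first dependent task: dataset is generated
cache.get("mnist", fetch_mnist)  # second dependent task: cached path is reused
```

Both calls return the same path, while the producing function runs a single time.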
+
+### Models
+
+A [`Model`](models) defines how to train, validate and deploy a model. All you have to do is tell AME how to train and validate your model in the form of
+`Tasks`, and then the lifecycle of a model can be automated. AME currently does not have its own model registry but instead supports deploying with an
+[MLflow](todo) instance.
diff --git a/ame/docs/datasets.md b/ame/docs/user_guide/datasets.md
@@ -2,8 +2,12 @@
 
 AME has a built-in notion of data sets, allowing users to think in terms of data sets and not just raw tasks.
 
+It is important to note that DataSets in AME should be treated as ephemeral and not as long-term storage. If AME is running
+out of space, any cached data can be deleted.
+
 Here is an example of what a simple data set configuration looks like:
 
+
 ```yaml
 # ame.yaml
 ...
@@ -31,18 +35,45 @@ Lets start with that here:
 dataSets:
   - name: mnist
     path: ./data # Specifies where the task stores data.
-#    task:
+    task:
       taskRef: fetch_mnist # References a task which produces data.     
 ```
 
 So far so good, we have a path `data` and reference a `Task` that produces our data.
 
+#### Dataset size
+
+If a dataset is large, it is a good idea to specify the storage requirements. This will allow AME to warn you if the object storage is running out.
+
+If you do not specify the size, AME will attempt to save the dataset, detect the failure and then produce an alert.
+
+```yaml
+# ame.yaml
+...
+dataSets:
+  - name: mnist
+    path: ./data # Specifies where the task stores data.
+    size: 50Gi
+    task:
+      taskRef: fetch_mnist # References a task which produces data.     
+```
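As a rough illustration of the kind of check a declared `size` makes possible, the Python sketch below converts a quantity such as `50Gi` to bytes and compares it with the free space. It assumes Kubernetes-style binary suffixes, which the docs do not confirm; `parse_size` and `fits` are hypothetical helpers, not part of AME.

```python
# Binary suffixes as used by Kubernetes-style quantities (an assumption here).
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_size(quantity: str) -> int:
    """Convert a quantity such as "50Gi" to a number of bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as a plain byte count


def fits(quantity: str, free_bytes: int) -> bool:
    """Return True if a dataset of the declared size fits in the free space."""
    return parse_size(quantity) <= free_bytes


declared = parse_size("50Gi")
has_room = fits("50Gi", free_bytes=100 * 2**30)
```

With a declared size, a check like this can run before the producing task starts rather than after a failed save.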
+
 
 ### Interacting with data sets
 
 To see the status of live data sets, use AME's CLI. Currently it is only possible to see data sets that are in use, meaning referenced by some running task.
 
 ```bash
-ame ds list
+ame dataset list
+ame ds list # or the shortened form
 ```  
-
+
+You can also view datasets from AME's dashboard:
+
+TODO: dataset image
+
+### Consuming data from object storage
+
+AME does not yet have built-in support for extracting data from object storage, although it will in the near future; see the tracking issue [here]().
+It is still quite simple to accomplish this in pure Python, so we shall demonstrate that here.
+
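The commit does not yet include the promised demonstration, so the following is only one possible sketch using the Python standard library. It assumes the object is reachable through a plain URL, for example a presigned HTTPS URL issued by your object store; `fetch_object` is a hypothetical helper and the file names are made up. To keep the example runnable offline, a local `file://` URL stands in for the remote object.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_object(url: str, dest: Path) -> Path:
    """Download one object to `dest`, creating parent directories as needed."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream to disk so large objects are never held fully in memory.
    with urllib.request.urlopen(url) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Offline demo: a local file stands in for the remote object.
# With a real store, `url` would be e.g. a presigned HTTPS URL (hypothetical).
workdir = Path(tempfile.mkdtemp())
source = workdir / "remote" / "mnist.bin"
source.parent.mkdir()
source.write_bytes(b"example bytes")
local = fetch_object(source.as_uri(), workdir / "data" / "mnist.bin")
```

A task could call a helper like this at startup to populate the directory named in the dataset's `path` field.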
diff --git a/ame/docs/user_guide/field_reference.md b/ame/docs/user_guide/field_reference.md
diff --git a/ame/docs/model_validation.md b/ame/docs/user_guide/model_validation.md
diff --git a/ame/docs/models.md b/ame/docs/user_guide/models.md
diff --git a/ame/docs/project_sources.md b/ame/docs/user_guide/project_sources.md
diff --git a/ame/docs/tasks.md b/ame/docs/user_guide/tasks.md
diff --git a/ame/mkdocs.yml b/ame/mkdocs.yml
@@ -1,8 +1,12 @@
 site_name: AME 
+repo_url: https://github.com/teainspace/ame
 theme: 
   name: material
   features:
     - navigation.tabs
+    - navigation.tabs.sticky
+    - navigation.top
+    - navigation.path
   palette:
      # Palette toggle for light mode
     - scheme: default