WIP
Issue: #
jmintb authored and Jessie Chatham Spencer committed Aug 13, 2023
1 parent 3835c1d commit b04a3e0
Showing 12 changed files with 344 additions and 32 deletions.
2 changes: 2 additions & 0 deletions ame/docs/operator_manual/deploy-ame.md
@@ -1,5 +1,7 @@
# Deploying AME

The story for deploying AME is still under construction. We are focusing on documenting the user side of AME first, so stay tuned!

This page contains everything you need to know to deploy and administrate AME.

If you are looking to try out AME quickly see the [quick start](todo).
92 changes: 92 additions & 0 deletions helm/ame/README.md
@@ -0,0 +1,92 @@
# ame

![Version: 0.1.0-alpha1](https://img.shields.io/badge/Version-0.1.0--alpha1-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 0.1.0-alpha1](https://img.shields.io/badge/AppVersion-0.1.0--alpha1-informational?style=flat-square)

A Helm chart for AME, the artificial MLOps engineer.

## Requirements

| Repository | Name | Version |
|------------|------|---------|
| https://argoproj.github.io/argo-helm | argo-workflows | 0.32.1 |
| oci://registry-1.docker.io/bitnamicharts | minio | 12.6.12 |

## Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| ameVersion | string | `""` | Version used to select the images for AME's server and controller, and the default images for Tasks and model deployments. |
| argo-workflows.enabled | bool | `true` | |
| controller.autoscaling.enabled | bool | `false` | |
| controller.image.repository | string | `"ghcr.io/teainspace/ame-controller/main"` | |
| controller.image.tag | string | `"d162"` | |
| controller.labels | object | `{}` | |
| controller.logging.env_filter | string | `"info,controller=debug"` | |
| controller.name | string | `"controller"` | |
| controller.podSecurityContext | object | `{}` | |
| controller.replicaCount | int | `1` | |
| controller.service.port | int | `80` | |
| controller.service.type | string | `"ClusterIP"` | |
| controller.serviceAccount.create | bool | `true` | |
| controller.serviceAccount.name | string | `"ame-controller"` | |
| crds.install | bool | `true` | Flag for installing custom resource definitions with this chart; see a discussion of the tradeoffs [here](TODO). |
| minio.enabled | bool | `true` | |
| mlflow.endpoint | string | `"http://mlflow.ame-system.svc.cluster.local:5000"` | |
| models.deployments.affinity | object | `{}` | |
| models.deployments.autoscaling.enabled | bool | `false` | |
| models.deployments.autoscaling.maxReplicas | int | `100` | |
| models.deployments.autoscaling.minReplicas | int | `1` | |
| models.deployments.autoscaling.targetCPUUtilizationPercentage | int | `80` | |
| models.deployments.ingress.defaultIngress | string | `""` | |
| models.deployments.ingress.host | string | `""` | |
| models.deployments.nodeSelector | object | `{}` | |
| models.deployments.resources | object | `{}` | |
| models.deployments.resources | string | `nil` | |
| models.deployments.securty.podSecurityContext | object | `{}` | |
| models.deployments.securty.securityContext | object | `{}` | |
| models.deployments.serviceAccount | object | `{"annotations":{},"create":true,"name":"ame-model"}` | The service account used by Tasks with minimal permissions. |
| models.deployments.tolerations | list | `[]` | |
| namespace | object | `{"create":true,"name":"ame-system"}` | The namespace AME will operate within; this includes any dependencies like Argo Workflows, MinIO and Keycloak. |
| objectStorage.s3.accessIdKey | string | `"root-user"` | |
| objectStorage.s3.accessSecretKey | string | `"root-password"` | |
| objectStorage.s3.bucket | string | `"ameprojectstorage"` | |
| objectStorage.s3.endpoint | string | `"http://ame-minio:9000"` | |
| objectStorage.s3.secretName | string | `"ame-minio"` | |
| server.autoscaling.enabled | bool | `false` | |
| server.image.repository | string | `"ghcr.io/teainspace/ame-server/main"` | |
| server.image.tag | string | `"d162"` | |
| server.ingress.annotations."cert-manager.io/cluster-issuer" | string | `"selfsigned-cluster-issuer"` | |
| server.ingress.annotations."nginx.ingress.kubernetes.io/backend-protocol" | string | `"GRPC"` | |
| server.ingress.annotations."nginx.ingress.kubernetes.io/cors-allow-headers" | string | `"DNT, Keep-Alive, User-Agent, X-Requested-With, If-Modified-Since, Cache-Control, Content-Type, Range, Authorization, x-grpc-web"` | |
| server.ingress.annotations."nginx.ingress.kubernetes.io/cors-allow-origin" | string | `"*"` | |
| server.ingress.annotations."nginx.ingress.kubernetes.io/enable-cors" | string | `"false"` | |
| server.ingress.className | string | `"nginx"` | |
| server.ingress.enabled | bool | `true` | |
| server.ingress.hosts[0].host | string | `"ame.local"` | |
| server.ingress.hosts[0].http.paths[0].backend.service.name | string | `"ame-server-service"` | |
| server.ingress.hosts[0].http.paths[0].backend.service.port.number | int | `3342` | |
| server.ingress.hosts[0].http.paths[0].path | string | `"/"` | |
| server.ingress.hosts[0].http.paths[0].pathType | string | `"ImplementationSpecific"` | |
| server.ingress.tls[0].hosts[0] | string | `"ame.local"` | |
| server.ingress.tls[0].secretName | string | `"ame-tls-cert"` | |
| server.labels | object | `{}` | |
| server.name | string | `"server"` | |
| server.podSecurityContext | object | `{}` | |
| server.replicaCount | int | `1` | |
| server.resources | object | `{}` | |
| server.service.port | int | `3342` | |
| server.service.type | string | `"ClusterIP"` | |
| server.serviceAccount.create | bool | `true` | |
| server.serviceAccount.name | string | `"ame-server"` | |
| task.affinity | object | `{}` | |
| task.image.repository.repository | string | `"ghcr.io/teainspace/ame-controller/main"` | |
| task.image.repository.tag | string | `"d162"` | |
| task.nodeSelector | object | `{}` | |
| task.resources | string | `nil` | |
| task.securty.podSecurityContext | object | `{}` | |
| task.securty.securityContext | object | `{}` | |
| task.serviceAccount | object | `{"annotations":{},"create":true,"name":"ame-task"}` | The service account used by Tasks with minimal permissions. |
| task.tolerations | list | `[]` | |

----------------------------------------------
Autogenerated from chart metadata using [helm-docs v1.11.0](https://github.com/norwoodj/helm-docs/releases/v1.11.0)
5 changes: 4 additions & 1 deletion helm/ame/templates/config_map.yaml
@@ -13,7 +13,10 @@ data:
AME_MODEL_INGRESS_HOST: {{ .Values.models.deployments.ingress.host }}
AME_MODEL_DEPLOYMENT_DEFAULT_INGRESS: {{ .Values.models.deployments.defaultIngress }}
AME_EXECUTOR_IMAGE: {{ .Values.task.image.repository }}:{{ .Values.task.image.tag }}
AME_TASK_IMAGE_PULL_POLICY: {{ .Values.task.image.pullPolicy }}
AME_MODEL_DEPLOYMENT_IMAGE_PULL_POLICY: {{ .Values.models.deployments.image.pullPolicy }}
AME_MODEL_DEPLOYMENT_IMAGE: {{ .Values.models.deployments.image.repository }}:{{ .Values.models.deployments.image.tag }}
AME_OBJECT_STORAGE_SECRET: {{ .Values.objectStorage.s3.secretName }}
AME_OBJECT_STORAGE_SECRET_KEY: {{ .Values.objectStorage.s3.accessSecretKey }}
AME_OBJECT_STORAGE_SECRET_ID_KEY: {{ .Values.objectStorage.s3.accessIdKey }}
AME_OBJECT_STORAGE_ID_KEY: {{ .Values.objectStorage.s3.accessIdKey }}
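
Review note: the object storage keys in this hunk can be compared with the names that `lib/src/config.rs` looks up through `ame_env_var` in the same commit. The following is a minimal sketch of such a check, not part of the commit; `missing_keys` and the two constant lists are hypothetical helpers, and the config map list covers only the lines visible in this hunk, so a key reported as missing may still be defined elsewhere in the template.

```rust
use std::collections::BTreeSet;

/// Object storage secret keys that `config.rs` reads, with the `AME_` prefix applied.
const READ_BY_CONFIG_RS: [&str; 3] = [
    "AME_OBJECT_STORAGE_SECRET_NAME",
    "AME_OBJECT_STORAGE_SECRET_KEY",
    "AME_OBJECT_STORAGE_ID_KEY",
];

/// Object storage keys visible in this config map hunk (the full file is not shown).
const VISIBLE_IN_CONFIG_MAP: [&str; 4] = [
    "AME_OBJECT_STORAGE_SECRET",
    "AME_OBJECT_STORAGE_SECRET_KEY",
    "AME_OBJECT_STORAGE_SECRET_ID_KEY",
    "AME_OBJECT_STORAGE_ID_KEY",
];

/// Return the expected keys that are absent from `defined`, in sorted order.
fn missing_keys<'a>(expected: &[&'a str], defined: &[&str]) -> Vec<&'a str> {
    let defined: BTreeSet<&str> = defined.iter().copied().collect();
    let mut missing: Vec<&'a str> = expected
        .iter()
        .copied()
        .filter(|key| !defined.contains(*key))
        .collect();
    missing.sort_unstable();
    missing
}
```

Calling `missing_keys(&READ_BY_CONFIG_RS, &VISIBLE_IN_CONFIG_MAP)` reports `AME_OBJECT_STORAGE_SECRET_NAME`, which suggests that either the config map key `AME_OBJECT_STORAGE_SECRET` or the lookup in `config.rs` may need renaming so the two agree.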

26 changes: 18 additions & 8 deletions helm/ame/values.yaml
@@ -2,15 +2,11 @@
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

# -- Version used to select the images for AME's server and controller, and the default images for Tasks and model deployments.
# -- Version used to select the images for AME's server and controller, and the default images for Tasks and model deployments.
ameVersion: ""

# -- The namespace AME will operate within; this includes any dependencies like Argo Workflows, MinIO and Keycloak.
namespace:
create: true
name: "ame-system"

crds:
# -- Flag for installing custom resource definitions with this chart; see a discussion of the tradeoffs [here](TODO).
install: true

server:
@@ -52,8 +48,10 @@ server:
podSecurityContext: {}
image:
repository: "ghcr.io/teainspace/ame-server/main"
tag: "d162"
tag: "latest"
pullPolicy: Always
resources: {}

controller:
name: controller
replicaCount: 1
@@ -70,23 +68,30 @@ controller:
image:
repository: "ghcr.io/teainspace/ame-controller/main"
tag: "d162"
pullPolicy: Always

logging:
env_filter: info,controller=debug

mlflow:
endpoint: "http://mlflow.ame-system.svc.cluster.local:5000"

argo-workflows:
# -- Enabling this will automatically deploy Argo Workflows alongside AME.
enabled: true

# minio options
minio:
# -- Enabling this will deploy a MinIO instance alongside AME.
enabled: true

objectStorage:
s3:
# -- Name of the Kubernetes secret containing access information for the object storage. The secret is expected to be in the same namespace as AME.
secretName: "ame-minio"
# -- Key for the access ID in the Kubernetes secret.
accessIdKey: "root-user"
# -- Key for the access password in the Kubernetes secret.
accessSecretKey: "root-password"
endpoint: "http://ame-minio:9000"
bucket : ameprojectstorage
@@ -96,6 +101,7 @@ task:
repository:
repository: "ghcr.io/teainspace/ame-controller/main"
tag: "d162"
pullPolicy: Always
# TODO add security options here
securty:

@@ -132,7 +138,7 @@ task:
models:
deployments:
# TODO add security options here
securty:
security:

podSecurityContext: {}
# fsGroup: 2000
@@ -156,6 +162,10 @@ models:
# The name of the service account to use.
# If not set and create is true, a name is generated using the fullname template
name: "ame-model"
image:
repository: "ghcr.io/teainspace/ame-executor/main"
tag: "d162"
pullPolicy: Always

nodeSelector: {}

84 changes: 84 additions & 0 deletions lib/src/config.rs
@@ -0,0 +1,84 @@
use std::env::var;

use crate::{error::AmeError, Result};
use k8s_openapi::api::{core::v1::ContainerImage, networking::v1::Ingress};
use serde::{Deserialize, Serialize};
use url::{Host, Url};

use crate::k8s_safe_types::ImagePullPolicy;

#[derive(Debug, Clone, Deserialize, Serialize)]
struct AmeCfg {
    server_port: u16,
    object_storage_endpoint: url::Url,
    object_storage_container: String,
    object_storage_secret_name: String,
    object_storage_secret_key: String,
    object_storage_id_key: String,
    mlflow_endpoint: Option<url::Url>,
    model_deployment_default_host: Option<Host>,
    model_deployment_ingress: Option<Ingress>,
    model_deployment_default_image_pull_policy: ImagePullPolicy,
    task_executor_default_image: String,
    task_executor_default_image_pull_policy: ImagePullPolicy,
}

impl AmeCfg {
    pub fn from_env() -> Result<Self> {
        // The port must fit the `u16` field on `AmeCfg`; fall back to 3342 when unset.
        let server_port: u16 = ame_env_var("SERVER_PORT")
            .map(|v| {
                v.parse::<u16>().map_err(|e| {
                    AmeError::Parsing(format!("failed to parse server port {v}: {e}"))
                })
            })
            .unwrap_or(Ok(3342))?;

        let object_storage_endpoint: Url = ame_env_var("OBJECT_STORAGE_ENDPOINT")
            .unwrap_or("http://ame-minio:9000".to_string())
            .parse()
            .map_err(|e| {
                AmeError::Parsing(format!(
                    "failed to parse object storage endpoint with error {e}"
                ))
            })?;

        // TODO: sanitize this
        let object_storage_container: String =
            ame_env_var("OBJECT_STORAGE_CONTAINER").unwrap_or("ameprojectstorage".to_string());

        let object_storage_secret_name: String =
            ame_env_var("OBJECT_STORAGE_SECRET_NAME").unwrap_or("ame-minio".to_string());

        let object_storage_secret_key: String =
            ame_env_var("OBJECT_STORAGE_SECRET_KEY").unwrap_or("root-password".to_string());

        let object_storage_id_key: String =
            ame_env_var("OBJECT_STORAGE_ID_KEY").unwrap_or("root-user".to_string());

        let mlflow_endpoint: Option<Url> = ame_env_var("MLFLOW_ENDPOINT")
            .map(|v| {
                v.parse().map_or_else(
                    |e| {
                        Err(AmeError::Parsing(format!(
                            "failed to parse mlflow endpoint: {v} due to error {e}"
                        )))
                    },
                    |v| Ok(Some(v)),
                )
            })
            .unwrap_or(Ok(None))?;

        let model_deployment_default_host: Option<Host<String>> =
            ame_env_var("MODEL_DEPLOYMENT_DEFAULT_HOST")
                .map(|v| {
                    Host::parse(&v).map_or_else(
                        |e| {
                            Err(AmeError::Parsing(format!(
                                "failed to parse model deployment default host: {v} due to error {e}"
                            )))
                        },
                        |v| Ok(Some(v)),
                    )
                })
                .unwrap_or(Ok(None))?;

        // The ingress, image and pull policy fields are not read yet.
        todo!()
    }
}

// Reads an environment variable with the `AME_` prefix, returning `None` if it is unset or invalid.
fn ame_env_var(key: &str) -> Option<String> {
    var(format!("AME_{key}")).ok()
}
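
Review note: `from_env` stops at `todo!()` before the two image pull policy fields are read, and it repeats the `.map(...).unwrap_or(Ok(...))?` shape for every optional value. The following standard-library-only sketch shows one way that shape could be factored out with `Option::transpose`. Everything in it is an assumption for illustration: the `ImagePullPolicy` enum stands in for `crate::k8s_safe_types::ImagePullPolicy` (not shown in this diff), `parse_or` and `parse_optional` are hypothetical helpers, and errors are plain `String`s rather than `AmeError`.

```rust
use std::fmt::Display;
use std::str::FromStr;

/// Stand-in for `crate::k8s_safe_types::ImagePullPolicy`, which this diff does not show.
/// The three variants are the pull policies Kubernetes accepts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ImagePullPolicy {
    Always,
    IfNotPresent,
    Never,
}

impl FromStr for ImagePullPolicy {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Always" => Ok(Self::Always),
            "IfNotPresent" => Ok(Self::IfNotPresent),
            "Never" => Ok(Self::Never),
            other => Err(format!("unknown image pull policy: {other}")),
        }
    }
}

/// Parse an optional raw value; an absent value yields `Ok(None)`,
/// a present but malformed value yields an error.
fn parse_optional<T>(raw: Option<&str>, what: &str) -> Result<Option<T>, String>
where
    T: FromStr,
    T::Err: Display,
{
    raw.map(|v| {
        v.parse::<T>()
            .map_err(|e| format!("failed to parse {what} {v}: {e}"))
    })
    // Option<Result<T, E>> -> Result<Option<T>, E>
    .transpose()
}

/// Parse an optional raw value, falling back to `default` only when it is absent.
fn parse_or<T>(raw: Option<&str>, default: T, what: &str) -> Result<T, String>
where
    T: FromStr,
    T::Err: Display,
{
    parse_optional(raw, what).map(|parsed| parsed.unwrap_or(default))
}
```

With helpers like these, a value such as the task pull policy could be read as `parse_or(raw.as_deref(), ImagePullPolicy::Always, "task image pull policy")`, where `raw` is the `Option<String>` returned by `ame_env_var`. A malformed value is still reported as an error instead of being silently replaced by the default.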
@@ -34,17 +34,17 @@ spec:
- name: AWS_ACCESS_KEY_ID
valueFrom:
secretKeyRef:
key: MINIO_ROOT_USER
name: ame-minio-secret
key: root-user
name: ame-minio
optional: false
- name: AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
key: MINIO_ROOT_PASSWORD
name: ame-minio-secret
key: root-password
name: ame-minio
optional: false
- name: MLFLOW_TRACKING_URI
value: "http://mlflow.default.svc.cluster.local:5000"
value: "http://mlflow.ame-system.svc.cluster.local:5000"
- name: MINIO_URL
value: "http://ame-minio.ame-system.svc.cluster.local:9000"
- name: PIPENV_YES
@@ -57,6 +57,7 @@ spec:
key: secret
name: secretkey
image: myimage
imagePullPolicy: Never
name: ""
resources:
limits:
@@ -85,17 +86,17 @@ spec:
- name: AWS_ACCESS_KEY_ID
valueFrom:
secretKeyRef:
key: MINIO_ROOT_USER
name: ame-minio-secret
key: root-user
name: ame-minio
optional: false
- name: AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
key: MINIO_ROOT_PASSWORD
name: ame-minio-secret
key: root-password
name: ame-minio
optional: false
- name: MLFLOW_TRACKING_URI
value: "http://mlflow.default.svc.cluster.local:5000"
value: "http://mlflow.ame-system.svc.cluster.local:5000"
- name: MINIO_URL
value: "http://ame-minio.ame-system.svc.cluster.local:9000"
- name: PIPENV_YES
@@ -108,6 +109,7 @@ spec:
key: secret
name: secretkey
image: myimage
imagePullPolicy: Never
name: ""
resources:
limits:
@@ -136,17 +138,17 @@ spec:
- name: AWS_ACCESS_KEY_ID
valueFrom:
secretKeyRef:
key: MINIO_ROOT_USER
name: ame-minio-secret
key: root-user
name: ame-minio
optional: false
- name: AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
key: MINIO_ROOT_PASSWORD
name: ame-minio-secret
key: root-password
name: ame-minio
optional: false
- name: MLFLOW_TRACKING_URI
value: "http://mlflow.default.svc.cluster.local:5000"
value: "http://mlflow.ame-system.svc.cluster.local:5000"
- name: MINIO_URL
value: "http://ame-minio.ame-system.svc.cluster.local:9000"
- name: PIPENV_YES
@@ -159,6 +161,7 @@ spec:
key: secret
name: secretkey
image: myimage
imagePullPolicy: Never
name: ""
resources:
limits:
@@ -21,12 +21,12 @@ spec:
containers:
- args:
- "-c"
- "export PATH=$HOME/.pyenv/bin:$PATH; mlflow models serve -m model_source --host 0.0.0.0"
- "export PATH=$HOME/.pyenv/bin:$PATH; mlflow models serve -m model_source -p 5000 --host 0.0.0.0"
command:
- /bin/bash
env:
- name: MLFLOW_TRACKING_URI
value: "http://mlflow.default.svc.cluster.local:5000"
value: "http://mlflow.ame-system.svc.cluster.local:5000"
image: test_img
name: main
ports: