From 348f4e375894e04db08fd4582b4aa477814e3c89 Mon Sep 17 00:00:00 2001 From: Omer Spillinger Date: Sat, 6 Jul 2019 23:05:16 -0700 Subject: [PATCH 1/2] Update docs --- docs/apis/apis.md | 16 +------ docs/apis/compute.md | 4 +- docs/apis/deployment.md | 17 -------- docs/apis/deployments.md | 17 ++++++++ docs/{pipelines => apis}/python-packages.md | 0 docs/pipelines/apis.md | 47 --------------------- docs/pipelines/compute.md | 32 -------------- docs/pipelines/deployment.md | 17 -------- docs/pipelines/packaging-models.md | 26 ------------ docs/pipelines/statuses.md | 17 -------- docs/summary.md | 10 ++--- 11 files changed, 23 insertions(+), 180 deletions(-) delete mode 100644 docs/apis/deployment.md create mode 100644 docs/apis/deployments.md rename docs/{pipelines => apis}/python-packages.md (100%) delete mode 100644 docs/pipelines/apis.md delete mode 100644 docs/pipelines/compute.md delete mode 100644 docs/pipelines/deployment.md delete mode 100644 docs/pipelines/packaging-models.md diff --git a/docs/apis/apis.md b/docs/apis/apis.md index 1415343142..2abed1a657 100644 --- a/docs/apis/apis.md +++ b/docs/apis/apis.md @@ -1,6 +1,6 @@ # APIs -Serve models at scale and use them to build smarter applications. +Serve models at scale. ## Config @@ -40,17 +40,3 @@ See [packaging models](packaging-models.md) for how to create the zipped model. Request handlers are used to decouple the interface of an API endpoint from its model. A `pre_inference` request handler can be used to modify request payloads before they are sent to the model. A `post_inference` request handler can be used to modify model predictions in the server before they are sent to the client. See [request handlers](request-handlers.md) for a detailed guide. - -## Integration - -APIs can be integrated into other applications or services via their JSON endpoints. The endpoint for any API follows the following format: {apis_endpoint}/{deployment_name}/{api_name}. - -The fields in the request payload for a particular API should match the raw columns that were used to train the model that it is serving. Cortex automatically applies the same transformers that were used at training time when responding to prediction requests. - -## Horizontal Scalability - -APIs can be configured using `replicas` in the `compute` field. Replicas can be used to change the amount of computing resources allocated to service prediction requests for a particular API. APIs that have low request volumes should have a small number of replicas while APIs that handle large request volumes should have more replicas. - -## Rolling Updates - -When the model that an API is serving gets updated, Cortex will update the API with the new model without any downtime. diff --git a/docs/apis/compute.md b/docs/apis/compute.md index e77edf20af..464b45b1f1 100644 --- a/docs/apis/compute.md +++ b/docs/apis/compute.md @@ -1,11 +1,11 @@ # Compute -Compute resource requests in Cortex follow the syntax and meaning of [compute resources in Kubernetes](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/). +Compute resource requests in Cortex follow the syntax and meaning of [compute resources in Kubernetes](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container). For example: ```yaml -- kind: model +- kind: api ... 
   compute:
     cpu: "2"
diff --git a/docs/apis/deployment.md b/docs/apis/deployment.md
deleted file mode 100644
index 878c3e7cc0..0000000000
--- a/docs/apis/deployment.md
+++ /dev/null
@@ -1,17 +0,0 @@
-# Deployment
-
-The deployment resource is used to group a set of APIs that can be deployed as a single unit. It must be defined in every Cortex directory in a top-level `cortex.yaml` file.
-
-## Config
-
-```yaml
-- kind: deployment
-  name: # deployment name (required)
-```
-
-## Example
-
-```yaml
-- kind: deployment
-  name: my_deployment
-```
diff --git a/docs/apis/deployments.md b/docs/apis/deployments.md
new file mode 100644
index 0000000000..ecf5d65c6f
--- /dev/null
+++ b/docs/apis/deployments.md
@@ -0,0 +1,17 @@
+# Deployments
+
+Deployments are used to group a set of resources that can be deployed as a single unit. A deployment must be defined in a top-level `cortex.yaml` file in every Cortex directory.
+
+## Config
+
+```yaml
+- kind: deployment
+  name: # deployment name (required)
+```
+
+## Example
+
+```yaml
+- kind: deployment
+  name: my_deployment
+```
diff --git a/docs/pipelines/python-packages.md b/docs/apis/python-packages.md
similarity index 100%
rename from docs/pipelines/python-packages.md
rename to docs/apis/python-packages.md
diff --git a/docs/pipelines/apis.md b/docs/pipelines/apis.md
deleted file mode 100644
index c221b63c37..0000000000
--- a/docs/pipelines/apis.md
+++ /dev/null
@@ -1,47 +0,0 @@
-# APIs
-
-Serve models at scale and use them to build smarter applications.
-
-## Config
-
-```yaml
-- kind: api
-  name: # API name (required)
-  model: # reference to a model (e.g. @dnn) or path to a zipped model dir (e.g. s3://my-bucket/model.zip)
-  compute:
-    min_replicas: # minimum number of replicas (default: 1)
-    max_replicas: # maximum number of replicas (default: 100)
-    init_replicas: # initial number of replicas (default: )
-    target_cpu_utilization: # CPU utilization threshold (as a percentage) to trigger scaling (default: 80)
-    cpu: # CPU request per replica (default: 200m)
-    gpu: # gpu request per replica (default: 0)
-    mem: # memory request per replica (default: Null)
-```
-
-See [packaging models](packaging-models.md) for how to create a zipped model (only if using an externally-built models).
-
-## Example
-
-```yaml
-- kind: api
-  name: my-api
-  model: @dnn
-  compute:
-    min_replicas: 5
-    max_replicas: 20
-    cpu: "1"
-```
-
-## Integration
-
-APIs can be integrated into other applications or services via their JSON endpoints. The endpoint for any API follows the following format: {apis_endpoint}/{deployment_name}/{api_name}.
-
-The fields in the request payload for a particular API should match the raw columns that were used to train the model that it is serving. Cortex automatically applies the same transformers that were used at training time when responding to prediction requests.
-
-## Horizontal Scalability
-
-APIs can be configured using `replicas` in the `compute` field. Replicas can be used to change the amount of computing resources allocated to service prediction requests for a particular API. APIs that have low request volumes should have a small number of replicas while APIs that handle large request volumes should have more replicas.
-
-## Rolling Updates
-
-When the model that an API is serving gets updated, Cortex will update the API with the new model without any downtime.
diff --git a/docs/pipelines/compute.md b/docs/pipelines/compute.md deleted file mode 100644 index c331360257..0000000000 --- a/docs/pipelines/compute.md +++ /dev/null @@ -1,32 +0,0 @@ -# Compute - -Compute resource requests in Cortex follow the syntax and meaning of [compute resources in Kubernetes](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/). - -For example: - -```yaml -- kind: model - ... - compute: - cpu: "2" - mem: "1Gi" - gpu: 1 -``` - -CPU and memory requests in Cortex correspond to compute resource requests in Kubernetes. In the example above, the training job will only be scheduled once 2 CPUs and 1Gi of memory are available, and the job will be guaranteed to have access to those resources throughout it's execution. In some cases, a Cortex compute resource request can be (or may default to) `Null`. - -## CPU - -One unit of CPU corresponds to one virtual CPU on AWS. Fractional requests are allowed, and can be specified as a floating point number or via the "m" suffix (`0.2` and `200m` are equivalent). - -## Memory - -One unit of memory is one byte. Memory can be expressed as an integer or by using one of these suffixes: `K`, `M`, `G`, `T` (or their power-of two counterparts: `Ki`, `Mi`, `Gi`, `Ti`). For example, the following values represent roughly the same memory: `128974848`, `129e6`, `129M`, `123Mi`. - -## GPU - -One unit of GPU corresponds to one virtual GPU on AWS. Fractional requests are not allowed. Here's some information on [adding GPU enabled nodes on EKS](https://docs.aws.amazon.com/en_ca/eks/latest/userguide/gpu-ami.html). - -## GPU Support - -We recommend using GPU compute requests on API resources only if you have enough nodes in your cluster to support the number of GPU requests in model training plus APIs (ideally with an autoscaler). Otherwise, due to the nature of zero downtime rolling updates, your model training will not have sufficient GPU resources as there will always be GPUs consumed by APIs from the previous deployment. diff --git a/docs/pipelines/deployment.md b/docs/pipelines/deployment.md deleted file mode 100644 index 2c2f75dbd0..0000000000 --- a/docs/pipelines/deployment.md +++ /dev/null @@ -1,17 +0,0 @@ -# Deployment - -The deployment resource is used to group a set of resources that can be deployed as a single unit. It must be defined in every Cortex directory in a top-level `cortex.yaml` file. - -## Config - -```yaml -- kind: deployment - name: # deployment name (required) -``` - -## Example - -```yaml -- kind: deployment - name: my_deployment -``` diff --git a/docs/pipelines/packaging-models.md b/docs/pipelines/packaging-models.md deleted file mode 100644 index 4837829420..0000000000 --- a/docs/pipelines/packaging-models.md +++ /dev/null @@ -1,26 +0,0 @@ -# Packaging Models - -## TensorFlow - -Zip the exported estimator output in your checkpoint directory, e.g. - -```text -$ ls export/estimator -saved_model.pb variables/ - -$ zip -r model.zip export/estimator -``` - -Upload the zipped file to Amazon S3, e.g. - -```text -$ aws s3 cp model.zip s3://my-bucket/model.zip -``` - -Specify `model` in an API, e.g. 
- -```yaml -- kind: api - name: my-api - model: s3://my-bucket/model.zip -``` diff --git a/docs/pipelines/statuses.md b/docs/pipelines/statuses.md index e1a0f00752..47e5eb99c5 100644 --- a/docs/pipelines/statuses.md +++ b/docs/pipelines/statuses.md @@ -14,20 +14,3 @@ | upstream error | Resource was not created due to an error in one of its dependencies | | upstream termination | Resource was not created because one of its dependencies was terminated | | compute unavailable | Resource's workload could not start due to insufficient memory, CPU, or GPU in the cluster | - -## API statuses - -| Status | Meaning | -|----------------------|---| -| ready | API is deployed and ready to serve prediction requests | -| pending | API is waiting for another resource to be ready, or is initializing | -| updating | API is performing a rolling update | -| update pending | API will be updated when the new model is ready; a previous version of this API is ready | -| stopping | API is stopping | -| stopped | API is stopped | -| error | API was not created due to an error; run `cortex logs -v ` to view the logs | -| skipped | API was not created due to an error in another resource | -| update skipped | API was not updated due to an error in another resource; a previous version of this API is ready | -| upstream error | API was not created due to an error in one of its dependencies; a previous version of this API may be ready | -| upstream termination | API was not created because one of its dependencies was terminated; a previous version of this API may be ready | -| compute unavailable | API could not start due to insufficient memory, CPU, or GPU in the cluster; some replicas may be ready | diff --git a/docs/summary.md b/docs/summary.md index 9d110a5a2c..f29fa703e2 100644 --- a/docs/summary.md +++ b/docs/summary.md @@ -7,10 +7,12 @@ ## Model Deployments -* [Deployments](apis/deployment.md) +* [Deployments](apis/deployments.md) * [APIs](apis/apis.md) * [Packaging Models](apis/packaging-models.md) +* [Request Handlers](apis/request-handlers.md) * [Compute](apis/compute.md) +* [Python Packages](apis/python-packages.md) * [CLI Commands](cluster/cli.md) * [Resource Statuses](apis/statuses.md) @@ -26,7 +28,6 @@ * [Overview](pipelines/overview.md) * [Tutorial](pipelines/tutorial.md) -* [Deployments](pipelines/deployment.md) * [Environments](pipelines/environments.md) * [Raw Columns](pipelines/raw-columns.md) * [Aggregators](pipelines/aggregators.md) @@ -38,12 +39,7 @@ * [Estimators](pipelines/estimators.md) * [Custom Estimators](pipelines/estimators-custom.md) * [Models](pipelines/pipelines.md) -* [APIs](pipelines/apis.md) * [Constants](pipelines/constants.md) * [Data Types](pipelines/data-types.md) * [Templates](pipelines/templates.md) -* [Packaging Models](pipelines/packaging-models.md) -* [Python Packages](pipelines/python-packages.md) -* [Compute](pipelines/compute.md) -* [CLI Commands](cluster/cli.md) * [Resource Statuses](pipelines/statuses.md) From ded9ce2724f12e6fa872d95fc53cf176a7a97a95 Mon Sep 17 00:00:00 2001 From: David Eliahu Date: Sun, 7 Jul 2019 11:27:29 -0700 Subject: [PATCH 2/2] Update apis.md --- docs/apis/apis.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/apis/apis.md b/docs/apis/apis.md index 2abed1a657..aa73cc6f91 100644 --- a/docs/apis/apis.md +++ b/docs/apis/apis.md @@ -7,7 +7,7 @@ Serve models at scale. ```yaml - kind: api name: # API name (required) - model: # path to a zipped model dir (e.g. s3://my-bucket/model.zip) + model: # path to an exported model (e.g. 
s3://my-bucket/model.zip) model_format: # model format, must be "tensorflow" or "onnx" request_handler: # path to the request handler implementation file, relative to the cortex root compute:
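
For reference, here is what an API config using the updated spec might look like. This is a minimal sketch assembled from the fields above; the model path, `model_format` value, handler filename, and compute values are illustrative assumptions, not part of the patch:

```yaml
- kind: api
  name: my-api
  model: s3://my-bucket/model.zip
  model_format: tensorflow
  request_handler: my_handler.py  # hypothetical file, relative to the cortex root
  compute:
    cpu: "1"
    mem: "1Gi"
```

Per the spec comments, `model_format` must be either `tensorflow` or `onnx`, matching how the model referenced by `model` was exported.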