Merged
18 changes: 2 additions & 16 deletions docs/apis/apis.md
@@ -1,13 +1,13 @@
# APIs

-Serve models at scale and use them to build smarter applications.
+Serve models at scale.

## Config

```yaml
 - kind: api
   name: <string>  # API name (required)
-  model: <string>  # path to a zipped model dir (e.g. s3://my-bucket/model.zip)
+  model: <string>  # path to an exported model (e.g. s3://my-bucket/model.zip)
   model_format: <string>  # model format, must be "tensorflow" or "onnx"
   request_handler: <string>  # path to the request handler implementation file, relative to the cortex root
   compute:
@@ -40,17 +40,3 @@ See [packaging models](packaging-models.md) for how to create the zipped model.
Request handlers are used to decouple the interface of an API endpoint from its model. A `pre_inference` request handler can be used to modify request payloads before they are sent to the model. A `post_inference` request handler can be used to modify model predictions in the server before they are sent to the client.

See [request handlers](request-handlers.md) for a detailed guide.
-
-## Integration
-
-APIs can be integrated into other applications or services via their JSON endpoints. The endpoint for any API follows the following format: {apis_endpoint}/{deployment_name}/{api_name}.
-
-The fields in the request payload for a particular API should match the raw columns that were used to train the model that it is serving. Cortex automatically applies the same transformers that were used at training time when responding to prediction requests.
-
-## Horizontal Scalability
-
-APIs can be configured using `replicas` in the `compute` field. Replicas can be used to change the amount of computing resources allocated to service prediction requests for a particular API. APIs that have low request volumes should have a small number of replicas while APIs that handle large request volumes should have more replicas.
-
-## Rolling Updates
-
-When the model that an API is serving gets updated, Cortex will update the API with the new model without any downtime.
4 changes: 2 additions & 2 deletions docs/apis/compute.md
@@ -1,11 +1,11 @@
# Compute

-Compute resource requests in Cortex follow the syntax and meaning of [compute resources in Kubernetes](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/).
+Compute resource requests in Cortex follow the syntax and meaning of [compute resources in Kubernetes](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container).

For example:

```yaml
-- kind: model
+- kind: api
   ...
   compute:
     cpu: "2"
17 changes: 0 additions & 17 deletions docs/apis/deployment.md

This file was deleted.

17 changes: 17 additions & 0 deletions docs/apis/deployments.md
@@ -0,0 +1,17 @@
# Deployments

A deployment groups a set of resources that can be deployed as a single unit. Every Cortex directory must define one in a top-level `cortex.yaml` file.

## Config

```yaml
- kind: deployment
  name: <string>  # deployment name (required)
```

## Example

```yaml
- kind: deployment
  name: my_deployment
```
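
As a rough sketch of how these pieces might fit together, the following `cortex.yaml` combines a deployment with one API, using only the fields shown in `docs/apis/apis.md` and `docs/apis/compute.md`. The API name (`my_api`), the handler path, and the bucket are hypothetical placeholders, and placing the API in the same file as the deployment is an assumption made for illustration.

```yaml
# cortex.yaml (illustrative sketch; all values below are placeholders)
- kind: deployment
  name: my_deployment

- kind: api
  name: my_api  # hypothetical API name
  model: s3://my-bucket/model.zip  # example path taken from the API config docs
  model_format: tensorflow  # must be "tensorflow" or "onnx"
  request_handler: handlers/my_handler.py  # hypothetical path, relative to the cortex root
  compute:
    cpu: "2"
```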
File renamed without changes.
47 changes: 0 additions & 47 deletions docs/pipelines/apis.md

This file was deleted.

32 changes: 0 additions & 32 deletions docs/pipelines/compute.md

This file was deleted.

17 changes: 0 additions & 17 deletions docs/pipelines/deployment.md

This file was deleted.

26 changes: 0 additions & 26 deletions docs/pipelines/packaging-models.md

This file was deleted.

17 changes: 0 additions & 17 deletions docs/pipelines/statuses.md
@@ -14,20 +14,3 @@
| upstream error | Resource was not created due to an error in one of its dependencies |
| upstream termination | Resource was not created because one of its dependencies was terminated |
| compute unavailable | Resource's workload could not start due to insufficient memory, CPU, or GPU in the cluster |
-
-## API statuses
-
-| Status | Meaning |
-|----------------------|---|
-| ready | API is deployed and ready to serve prediction requests |
-| pending | API is waiting for another resource to be ready, or is initializing |
-| updating | API is performing a rolling update |
-| update pending | API will be updated when the new model is ready; a previous version of this API is ready |
-| stopping | API is stopping |
-| stopped | API is stopped |
-| error | API was not created due to an error; run `cortex logs -v <name>` to view the logs |
-| skipped | API was not created due to an error in another resource |
-| update skipped | API was not updated due to an error in another resource; a previous version of this API is ready |
-| upstream error | API was not created due to an error in one of its dependencies; a previous version of this API may be ready |
-| upstream termination | API was not created because one of its dependencies was terminated; a previous version of this API may be ready |
-| compute unavailable | API could not start due to insufficient memory, CPU, or GPU in the cluster; some replicas may be ready |
10 changes: 3 additions & 7 deletions docs/summary.md
@@ -7,10 +7,12 @@

## Model Deployments

-* [Deployments](apis/deployment.md)
+* [Deployments](apis/deployments.md)
* [APIs](apis/apis.md)
* [Packaging Models](apis/packaging-models.md)
* [Request Handlers](apis/request-handlers.md)
* [Compute](apis/compute.md)
+* [Python Packages](apis/python-packages.md)
+* [CLI Commands](cluster/cli.md)
* [Resource Statuses](apis/statuses.md)

@@ -26,7 +28,6 @@

* [Overview](pipelines/overview.md)
* [Tutorial](pipelines/tutorial.md)
-* [Deployments](pipelines/deployment.md)
* [Environments](pipelines/environments.md)
* [Raw Columns](pipelines/raw-columns.md)
* [Aggregators](pipelines/aggregators.md)
@@ -38,12 +39,7 @@
* [Estimators](pipelines/estimators.md)
* [Custom Estimators](pipelines/estimators-custom.md)
* [Models](pipelines/pipelines.md)
-* [APIs](pipelines/apis.md)
* [Constants](pipelines/constants.md)
* [Data Types](pipelines/data-types.md)
* [Templates](pipelines/templates.md)
-* [Packaging Models](pipelines/packaging-models.md)
-* [Python Packages](pipelines/python-packages.md)
-* [Compute](pipelines/compute.md)
-* [CLI Commands](cluster/cli.md)
* [Resource Statuses](pipelines/statuses.md)