From 06fabaea6d4d303d3606765578727dd563c5d54f Mon Sep 17 00:00:00 2001 From: Omer Spillinger Date: Thu, 30 Apr 2020 13:41:16 -0700 Subject: [PATCH 01/20] Update README.md --- README.md | 110 ++++++++++++++++++++++++++++-------------------------- 1 file changed, 58 insertions(+), 52 deletions(-) diff --git a/README.md b/README.md index 58e71f3ff7..2d3978f322 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,7 @@ Cortex is an open source platform for deploying machine learning models as produ -[install](https://cortex.dev/install) • [tutorial](https://cortex.dev/iris-classifier) • [docs](https://cortex.dev) • [examples](https://github.com/cortexlabs/cortex/tree/0.15/examples) • [we're hiring](https://angel.co/cortex-labs-inc/jobs) • [email us](mailto:hello@cortex.dev) • [chat with us](https://gitter.im/cortexlabs/cortex)

+[install](https://cortex.dev/install) • [tutorial](https://cortex.dev/iris-classifier) • [docs](https://cortex.dev) • [examples](https://github.com/cortexlabs/cortex/tree/0.15/examples) • [we're hiring](https://angel.co/cortex-labs-inc/jobs) • [chat with us](https://gitter.im/cortexlabs/cortex)

![Demo](https://d1zqebknpdh033.cloudfront.net/demo/gif/v0.13_2.gif) @@ -25,41 +25,6 @@ Cortex is an open source platform for deploying machine learning models as produ
-## Spinning up a cluster - -Cortex is designed to be self-hosted on any AWS account. You can spin up a cluster with a single command: - - -```bash -# install the CLI on your machine -$ bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/0.15/get-cli.sh)" - -# provision infrastructure on AWS and spin up a cluster -$ cortex cluster up - -aws region: us-west-2 -aws instance type: g4dn.xlarge -spot instances: yes -min instances: 0 -max instances: 5 - -aws resource cost per hour -1 eks cluster $0.10 -0 - 5 g4dn.xlarge instances for your apis $0.1578 - $0.526 each (varies based on spot price) -0 - 5 50gb ebs volumes for your apis $0.007 each -1 t3.medium instance for the operator $0.0416 -1 20gb ebs volume for the operator $0.003 -2 network load balancers $0.0225 each - -your cluster will cost $0.19 - $2.85 per hour based on cluster size and spot instance pricing/availability - -○ spinning up your cluster ... - -your cluster is ready! -``` - -
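The "$0.19 - $2.85 per hour" range in the cluster output above follows directly from the line items in the cost table; a quick sanity check in Python (prices copied from the table — actual spot pricing varies):

```python
# Fixed hourly costs from the cost table (USD/hour).
eks_cluster = 0.10
operator_instance = 0.0416   # 1 t3.medium for the operator
operator_volume = 0.003      # 1 x 20 GB EBS volume
load_balancers = 2 * 0.0225  # 2 network load balancers

fixed = eks_cluster + operator_instance + operator_volume + load_balancers

# Per-API-instance costs: g4dn.xlarge spot price range plus a 50 GB EBS volume.
instance_low, instance_high = 0.1578, 0.526
volume = 0.007

low = fixed + 0 * (instance_low + volume)    # cluster scaled to zero instances
high = fixed + 5 * (instance_high + volume)  # five instances at the max spot price

print(f"${low:.2f} - ${high:.2f} per hour")  # -> $0.19 - $2.85 per hour
```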
- ## Deploying a model ### Implement your predictor @@ -84,14 +49,12 @@ class PythonPredictor: predictor: type: python path: predictor.py - tracker: - model_type: classification compute: gpu: 1 mem: 4G ``` -### Deploy to AWS +### Deploy your model ```bash $ cortex deploy @@ -99,20 +62,57 @@ $ cortex deploy creating sentiment-classifier ``` -### Serve real-time predictions +### Serve predictions ```bash -$ curl http://***.amazonaws.com/sentiment-classifier \ +$ curl http://localhost:8888/sentiment-classifier \ -X POST -H "Content-Type: application/json" \ - -d '{"text": "the movie was amazing!"}' + -d '{"text": "serving models locally is cool!"}' positive ``` +
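The `PythonPredictor` shown in the patch is truncated by the diff context; a minimal sketch of the interface a Cortex Python predictor implements — an `__init__(self, config)` constructor that loads the model and a `predict(self, payload)` method that returns the prediction — with a trivial keyword rule standing in for the real sentiment model so the sketch is self-contained:

```python
class PythonPredictor:
    def __init__(self, config):
        # In the real example this would download and load the sentiment model;
        # a keyword list stands in here for illustration.
        self.negative_words = {"bad", "terrible", "awful", "boring"}

    def predict(self, payload):
        words = (w.strip("!.,") for w in payload["text"].lower().split())
        if any(w in self.negative_words for w in words):
            return "negative"
        return "positive"

predictor = PythonPredictor(config={})
print(predictor.predict({"text": "the movie was amazing!"}))  # -> positive
```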
+ +## Running Cortex in production + +### Spin up a cluster + +Cortex is designed to be self-hosted on any AWS account (GCP is coming soon). You can spin up a cluster with a single command: + + +```bash +# install the CLI on your machine +$ bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/0.15/get-cli.sh)" + +# provision infrastructure on AWS and spin up a cluster +$ cortex cluster up + +aws region: us-west-2 +aws instance type: g4dn.xlarge +spot instances: yes +min instances: 0 +max instances: 5 + +your cluster will cost $0.19 - $2.85 per hour based on cluster size and spot instance pricing/availability + +○ spinning up your cluster ... + +your cluster is ready! +``` + +### Deploy to your cluster with the same code and configuration + +```bash +$ cortex deploy + +creating sentiment-classifier +``` + ### Monitor your deployment ```bash -$ cortex get sentiment-classifier --watch +$ cortex get sentiment-classifier status up-to-date requested last update avg request 2XX live 1 1 8s 24ms 12 @@ -122,15 +122,17 @@ positive 8 negative 4 ``` -
+### Serve predictions at scale -## What is Cortex similar to? - -Cortex is an open source alternative to serving models with SageMaker or building your own model deployment platform on top of AWS services like Elastic Kubernetes Service (EKS), Elastic Container Service (ECS), Lambda, Fargate, and Elastic Compute Cloud (EC2) and open source projects like Docker, Kubernetes, and TensorFlow Serving. +```bash +$ curl http://***.amazonaws.com/sentiment-classifier \ + -X POST -H "Content-Type: application/json" \ + -d '{"text": "serving models at scale is cooler!"}' -
+positive +``` -## How does Cortex work? +## How it works The CLI sends configuration and code to the cluster every time you run `cortex deploy`. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using Elastic Load Balancing (ELB), TensorFlow Serving, and ONNX Runtime. The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch. @@ -138,11 +140,15 @@ Cortex manages its own Kubernetes cluster so that end-to-end functionality like
+## What is Cortex similar to? + +Cortex is an open source alternative to serving models with SageMaker or building your own model deployment platform on top of AWS services like Elastic Kubernetes Service (EKS), Lambda, or Fargate and open source projects like Docker, Kubernetes, TensorFlow Serving, and TorchServe. + +
+ ## Examples of Cortex deployments - -* [Sentiment analysis](https://github.com/cortexlabs/cortex/tree/0.15/examples/tensorflow/sentiment-analyzer): deploy a BERT model for sentiment analysis. + * [Image classification](https://github.com/cortexlabs/cortex/tree/0.15/examples/tensorflow/image-classifier): deploy an Inception model to classify images. * [Search completion](https://github.com/cortexlabs/cortex/tree/0.15/examples/pytorch/search-completer): deploy Facebook's RoBERTa model to complete search terms. * [Text generation](https://github.com/cortexlabs/cortex/tree/0.15/examples/pytorch/text-generator): deploy Hugging Face's DistilGPT2 model to generate text. -* [Iris classification](https://github.com/cortexlabs/cortex/tree/0.15/examples/sklearn/iris-classifier): deploy a scikit-learn model to classify iris flowers. From 92a6e5dcb3a2ba63e38595b397ed3e1afb4e7ce8 Mon Sep 17 00:00:00 2001 From: Omer Spillinger Date: Thu, 30 Apr 2020 13:45:08 -0700 Subject: [PATCH 02/20] Update README.md --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 2d3978f322..e4e7a6698c 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,7 @@ Cortex is an open source platform for deploying machine learning models as produ -[install](https://cortex.dev/install) • [tutorial](https://cortex.dev/iris-classifier) • [docs](https://cortex.dev) • [examples](https://github.com/cortexlabs/cortex/tree/0.15/examples) • [we're hiring](https://angel.co/cortex-labs-inc/jobs) • [chat with us](https://gitter.im/cortexlabs/cortex)

+[install](https://cortex.dev/install) • [docs](https://cortex.dev) • [examples](https://github.com/cortexlabs/cortex/tree/0.15/examples) • [we're hiring](https://angel.co/cortex-labs-inc/jobs) • [chat with us](https://gitter.im/cortexlabs/cortex)

![Demo](https://d1zqebknpdh033.cloudfront.net/demo/gif/v0.13_2.gif) @@ -78,7 +78,7 @@ positive ### Spin up a cluster -Cortex is designed to be self-hosted on any AWS account (GCP is coming soon). You can spin up a cluster with a single command: +Cortex is designed to be self-hosted on any AWS account (GCP is coming soon). ```bash @@ -132,7 +132,7 @@ $ curl http://***.amazonaws.com/sentiment-classifier \ positive ``` -## How it works +### How it works The CLI sends configuration and code to the cluster every time you run `cortex deploy`. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using Elastic Load Balancing (ELB), TensorFlow Serving, and ONNX Runtime. The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch. From b849e815378a07e91f15290e631f1c6459678ef5 Mon Sep 17 00:00:00 2001 From: Omer Spillinger Date: Thu, 30 Apr 2020 13:47:44 -0700 Subject: [PATCH 03/20] Update README.md --- README.md | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index e4e7a6698c..1528fcb6a5 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,5 @@ # Deploy machine learning models in production -Cortex is an open source platform for deploying machine learning models as production web services. -
@@ -13,6 +11,10 @@ Cortex is an open source platform for deploying machine learning models as produ
+Cortex is an open source platform for deploying machine learning models as production web services. + +
+ ## Key features * **Multi framework:** deploy TensorFlow, PyTorch, scikit-learn, and other models. @@ -109,6 +111,16 @@ $ cortex deploy creating sentiment-classifier ``` +### Serve predictions at scale + +```bash +$ curl http://***.amazonaws.com/sentiment-classifier \ + -X POST -H "Content-Type: application/json" \ + -d '{"text": "serving models at scale is cooler!"}' + +positive +``` + ### Monitor your deployment ```bash @@ -122,16 +134,6 @@ positive 8 negative 4 ``` -### Serve predictions at scale - -```bash -$ curl http://***.amazonaws.com/sentiment-classifier \ - -X POST -H "Content-Type: application/json" \ - -d '{"text": "serving models at scale is cooler!"}' - -positive -``` - ### How it works The CLI sends configuration and code to the cluster every time you run `cortex deploy`. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using Elastic Load Balancing (ELB), TensorFlow Serving, and ONNX Runtime. The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch. From 9dfc16f33a124757314c5bb6e137e77aee0e1441 Mon Sep 17 00:00:00 2001 From: Omer Spillinger Date: Thu, 30 Apr 2020 13:48:23 -0700 Subject: [PATCH 04/20] Update README.md --- README.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/README.md b/README.md index 1528fcb6a5..6106298e63 100644 --- a/README.md +++ b/README.md @@ -11,10 +11,6 @@
-Cortex is an open source platform for deploying machine learning models as production web services. - -
- ## Key features * **Multi framework:** deploy TensorFlow, PyTorch, scikit-learn, and other models. From 25ddca8c3d69241b3f005cb8aa5807e62fa8eaf8 Mon Sep 17 00:00:00 2001 From: Omer Spillinger Date: Thu, 30 Apr 2020 13:50:18 -0700 Subject: [PATCH 05/20] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 6106298e63..ae5a81f1ae 100644 --- a/README.md +++ b/README.md @@ -63,7 +63,7 @@ creating sentiment-classifier ### Serve predictions ```bash -$ curl http://localhost:8888/sentiment-classifier \ +$ curl http://localhost:8888 \ -X POST -H "Content-Type: application/json" \ -d '{"text": "serving models locally is cool!"}' @@ -112,7 +112,7 @@ creating sentiment-classifier ```bash $ curl http://***.amazonaws.com/sentiment-classifier \ -X POST -H "Content-Type: application/json" \ - -d '{"text": "serving models at scale is cooler!"}' + -d '{"text": "serving models at scale is really cool!"}' positive ``` From 87dd14ee0a3c17bdc75dfae235355d1a200eb375 Mon Sep 17 00:00:00 2001 From: Omer Spillinger Date: Thu, 30 Apr 2020 13:58:56 -0700 Subject: [PATCH 06/20] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index ae5a81f1ae..a62a64a868 100644 --- a/README.md +++ b/README.md @@ -63,7 +63,7 @@ creating sentiment-classifier ### Serve predictions ```bash -$ curl http://localhost:8888 \ +$ curl http://localhost:12345 \ -X POST -H "Content-Type: application/json" \ -d '{"text": "serving models locally is cool!"}' @@ -76,7 +76,7 @@ positive ### Spin up a cluster -Cortex is designed to be self-hosted on any AWS account (GCP is coming soon). 
+Cortex clusters are designed to be self-hosted on any AWS account (GCP support is coming soon): ```bash From 5520c7b3b3cec1fc5f8c44a5c810fe9df52b3be8 Mon Sep 17 00:00:00 2001 From: Omer Spillinger Date: Thu, 30 Apr 2020 14:06:24 -0700 Subject: [PATCH 07/20] Update README.md --- README.md | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index a62a64a868..0307a13331 100644 --- a/README.md +++ b/README.md @@ -25,6 +25,13 @@ ## Deploying a model +### Install the CLI + + +```bash +$ bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/0.15/get-cli.sh)" +``` + ### Implement your predictor ```python @@ -76,13 +83,9 @@ positive ### Spin up a cluster -Cortex clusters are designed to be self-hosted on any AWS account (GCP support is coming soon): +Cortex clusters are designed to be self-hosted on any AWS account (and GCP support is coming soon): - ```bash -# install the CLI on your machine -$ bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/0.15/get-cli.sh)" - # provision infrastructure on AWS and spin up a cluster $ cortex cluster up @@ -102,7 +105,7 @@ your cluster is ready! ### Deploy to your cluster with the same code and configuration ```bash -$ cortex deploy +$ cortex deploy --env aws creating sentiment-classifier ``` From 472cbc096f659fa6e004b56b73458cc994d36ded Mon Sep 17 00:00:00 2001 From: Omer Spillinger Date: Thu, 30 Apr 2020 20:43:18 -0700 Subject: [PATCH 08/20] Update tagline --- README.md | 2 +- cli/cmd/root.go | 2 +- docs/summary.md | 2 +- examples/pytorch/language-identifier/sample.json | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 0307a13331..5e1c9f0780 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# Deploy machine learning models in production +# Machine learning model serving platform
diff --git a/cli/cmd/root.go b/cli/cmd/root.go index 7adfc89d9a..c39251bd7d 100644 --- a/cli/cmd/root.go +++ b/cli/cmd/root.go @@ -121,7 +121,7 @@ func initTelemetry() { var _rootCmd = &cobra.Command{ Use: "cortex", Aliases: []string{"cx"}, - Short: "deploy machine learning models in production", + Short: "machine learning model serving platform", } func Execute() { diff --git a/docs/summary.md b/docs/summary.md index 7dcdd5851b..200d577b1e 100644 --- a/docs/summary.md +++ b/docs/summary.md @@ -1,6 +1,6 @@ # Table of contents -* [Deploy machine learning models in production](../README.md) +* [Machine learning model serving platform](../README.md) * [Install](cluster-management/install.md) * [Tutorial](../examples/sklearn/iris-classifier/README.md) * [GitHub](https://github.com/cortexlabs/cortex) diff --git a/examples/pytorch/language-identifier/sample.json b/examples/pytorch/language-identifier/sample.json index 2329a0a714..0e1392bb07 100644 --- a/examples/pytorch/language-identifier/sample.json +++ b/examples/pytorch/language-identifier/sample.json @@ -1,3 +1,3 @@ { - "text": "deploy machine learning models in production" + "text": "machine learning model serving platform" } From ed31ab404ff6599a948f17338d8f5f492a7fe1a0 Mon Sep 17 00:00:00 2001 From: Caleb Kaiser <42076840+caleb-kaiser@users.noreply.github.com> Date: Fri, 1 May 2020 16:03:17 -0400 Subject: [PATCH 09/20] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 5e1c9f0780..285095d77b 100644 --- a/README.md +++ b/README.md @@ -81,6 +81,8 @@ positive ## Running Cortex in production +Cortex can also automatically provision and manage a Kubernetes cluster for inference workloads, typically for production use cases. Cortex manages its own Kubernetes cluster so that end-to-end functionality like request-based autoscaling, GPU support, and spot instance management can work out of the box without any additional DevOps work. 
+ ### Spin up a cluster Cortex clusters are designed to be self-hosted on any AWS account (and GCP support is coming soon): @@ -137,8 +139,6 @@ negative 4 The CLI sends configuration and code to the cluster every time you run `cortex deploy`. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using Elastic Load Balancing (ELB), TensorFlow Serving, and ONNX Runtime. The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch. -Cortex manages its own Kubernetes cluster so that end-to-end functionality like request-based autoscaling, GPU support, and spot instance management can work out of the box without any additional DevOps work. -
## What is Cortex similar to? From 9d3ca67c599bb15e86001e8838e88d4e530e98ba Mon Sep 17 00:00:00 2001 From: David Eliahu Date: Fri, 1 May 2020 14:30:28 -0700 Subject: [PATCH 10/20] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 285095d77b..4ecd4d764b 100644 --- a/README.md +++ b/README.md @@ -137,7 +137,7 @@ negative 4 ### How it works -The CLI sends configuration and code to the cluster every time you run `cortex deploy`. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using Elastic Load Balancing (ELB), TensorFlow Serving, and ONNX Runtime. The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch. +The CLI sends configuration and code to the cluster every time you run `cortex deploy`. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using a Network Load Balancer (NLB) and FastAPI / TensorFlow Serving / ONNX Runtime (depending on the model type). The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch.
From 9f8431960eacf032b670061bb0da9283dbee5c73 Mon Sep 17 00:00:00 2001 From: Vishal Bollu Date: Fri, 1 May 2020 19:35:35 -0400 Subject: [PATCH 11/20] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 4ecd4d764b..030e2fc1ec 100644 --- a/README.md +++ b/README.md @@ -79,7 +79,7 @@ positive
-## Running Cortex in production +## Deploying models at scale Cortex can also automatically provision and manage a Kubernetes cluster for inference workloads, typically for production use cases. Cortex manages its own Kubernetes cluster so that end-to-end functionality like request-based autoscaling, GPU support, and spot instance management can work out of the box without any additional DevOps work. From c84c60cc06a8e935722f9ab02c713eed5f9e557e Mon Sep 17 00:00:00 2001 From: David Eliahu Date: Fri, 1 May 2020 17:43:53 -0700 Subject: [PATCH 12/20] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 030e2fc1ec..3ed43b24e8 100644 --- a/README.md +++ b/README.md @@ -85,7 +85,7 @@ Cortex can also automatically provision and manage a Kubernetes cluster for infe ### Spin up a cluster -Cortex clusters are designed to be self-hosted on any AWS account (and GCP support is coming soon): +Cortex clusters are designed to be self-hosted on any AWS account (GCP support is coming soon): ```bash # provision infrastructure on AWS and spin up a cluster From 8ddb59764ba4e6d63ef5c6d508717aea78d38394 Mon Sep 17 00:00:00 2001 From: Omer Spillinger Date: Fri, 1 May 2020 21:06:22 -0700 Subject: [PATCH 13/20] Update README.md --- README.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/README.md b/README.md index 3ed43b24e8..f7d31eb7c2 100644 --- a/README.md +++ b/README.md @@ -81,8 +81,6 @@ positive ## Deploying models at scale -Cortex can also automatically provision and manage a Kubernetes cluster for inference workloads, typically for production use cases. Cortex manages its own Kubernetes cluster so that end-to-end functionality like request-based autoscaling, GPU support, and spot instance management can work out of the box without any additional DevOps work. 
- ### Spin up a cluster Cortex clusters are designed to be self-hosted on any AWS account (GCP support is coming soon): From 0a1af5298b8f5712b8a6c6aea232b3a85b557e19 Mon Sep 17 00:00:00 2001 From: Omer Spillinger <42219498+ospillinger@users.noreply.github.com> Date: Fri, 1 May 2020 21:13:17 -0700 Subject: [PATCH 14/20] Update README.md --- README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/README.md b/README.md index f7d31eb7c2..563dee27b2 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# Machine learning model serving platform +# Cortex: machine learning model serving infrastructure
@@ -86,7 +86,6 @@ positive Cortex clusters are designed to be self-hosted on any AWS account (GCP support is coming soon): ```bash -# provision infrastructure on AWS and spin up a cluster $ cortex cluster up aws region: us-west-2 From 459c535c6833c41f191878a9130d7b61c5c7e94e Mon Sep 17 00:00:00 2001 From: Omer Spillinger Date: Fri, 1 May 2020 21:14:28 -0700 Subject: [PATCH 15/20] Update tagline --- cli/cmd/root.go | 2 +- docs/summary.md | 2 +- examples/pytorch/language-identifier/sample.json | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/cli/cmd/root.go b/cli/cmd/root.go index c39251bd7d..750255c217 100644 --- a/cli/cmd/root.go +++ b/cli/cmd/root.go @@ -121,7 +121,7 @@ func initTelemetry() { var _rootCmd = &cobra.Command{ Use: "cortex", Aliases: []string{"cx"}, - Short: "machine learning model serving platform", + Short: "machine learning model serving infrastructure", } func Execute() { diff --git a/docs/summary.md b/docs/summary.md index 200d577b1e..da02fa89f7 100644 --- a/docs/summary.md +++ b/docs/summary.md @@ -1,6 +1,6 @@ # Table of contents -* [Machine learning model serving platform](../README.md) +* [Machine learning model serving infrastructure](../README.md) * [Install](cluster-management/install.md) * [Tutorial](../examples/sklearn/iris-classifier/README.md) * [GitHub](https://github.com/cortexlabs/cortex) diff --git a/examples/pytorch/language-identifier/sample.json b/examples/pytorch/language-identifier/sample.json index 0e1392bb07..76fce072eb 100644 --- a/examples/pytorch/language-identifier/sample.json +++ b/examples/pytorch/language-identifier/sample.json @@ -1,3 +1,3 @@ { - "text": "machine learning model serving platform" + "text": "machine learning model serving infrastructure" } From 8b13797cd87ee472c59c00b0bf05bbc458c7378d Mon Sep 17 00:00:00 2001 From: Omer Spillinger <42219498+ospillinger@users.noreply.github.com> Date: Fri, 1 May 2020 21:15:11 -0700 Subject: [PATCH 16/20] Update README.md --- README.md | 2 +- 1 file 
changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 563dee27b2..d4e339a14a 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# Cortex: machine learning model serving infrastructure +# Machine learning model serving infrastructure
From 586fd188c711c8675d4d01d7b5aa5437142b5712 Mon Sep 17 00:00:00 2001 From: Omer Spillinger <42219498+ospillinger@users.noreply.github.com> Date: Fri, 1 May 2020 21:23:54 -0700 Subject: [PATCH 17/20] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index d4e339a14a..8e74dc804b 100644 --- a/README.md +++ b/README.md @@ -144,7 +144,7 @@ Cortex is an open source alternative to serving models with SageMaker or buildin
-## Examples of Cortex deployments +## Examples * [Image classification](https://github.com/cortexlabs/cortex/tree/0.15/examples/tensorflow/image-classifier): deploy an Inception model to classify images. From a1345fc14dcd6b295be5cd5860655a7455a8ce1b Mon Sep 17 00:00:00 2001 From: Omer Spillinger Date: Sun, 3 May 2020 21:08:51 -0700 Subject: [PATCH 18/20] Update install.md --- docs/cluster-management/install.md | 46 ++++++++++++------------------ 1 file changed, 19 insertions(+), 27 deletions(-) diff --git a/docs/cluster-management/install.md b/docs/cluster-management/install.md index 195372c28d..c54639d471 100644 --- a/docs/cluster-management/install.md +++ b/docs/cluster-management/install.md @@ -2,41 +2,37 @@ _WARNING: you are on the master branch, please refer to the docs on the branch that matches your `cortex version`_ -## Prerequisites +## Running on your machine or a single instance -1. [Docker](https://docs.docker.com/install) -2. [AWS credentials](aws-credentials.md) +[Docker](https://docs.docker.com/install) is required to run Cortex locally. In addition, your machine (or your Docker Desktop for Mac users) should have at least 8GB of memory if you plan to deploy large deep learning models. -## Spin up a cluster - -See [cluster configuration](config.md) to learn how you can customize your cluster with `cluster.yaml` and see [EC2 instances](ec2-instances.md) for an overview of several EC2 instance types. To use GPU nodes, you may need to subscribe to the [EKS-optimized AMI with GPU Support](https://aws.amazon.com/marketplace/pp/B07GRHFXGM) and [file an AWS support ticket](https://console.aws.amazon.com/support/cases#/create?issueType=service-limit-increase&limitType=ec2-instances) to increase the limit for your desired instance type. 
+### Install the CLI ```bash -# install the CLI on your machine $ bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/master/get-cli.sh)" +``` -# provision infrastructure on AWS and spin up a cluster -$ cortex cluster up - -aws resource cost per hour -1 eks cluster $0.10 -0 - 5 g4dn.xlarge instances for your apis $0.1578 - $0.526 each (varies based on spot price) -0 - 5 50gb ebs volumes for your apis $0.007 each -1 t3.medium instance for the operator $0.0416 -1 20gb ebs volume for the operator $0.003 -2 network load balancers $0.0225 each +## Running at scale on AWS -your cluster will cost $0.19 - $2.85 per hour based on cluster size and spot instance pricing/availability +[Docker](https://docs.docker.com/install) and valid [AWS credentials](aws-credentials.md) are required to run a Cortex cluster on AWS. -○ spinning up your cluster ... +### Spin up a cluster -your cluster is ready! -``` +See [cluster configuration](config.md) to learn how you can customize your cluster with `cluster.yaml` and see [EC2 instances](ec2-instances.md) for an overview of several EC2 instance types. -## Deploy a model +To use GPU nodes, you may need to subscribe to the [EKS-optimized AMI with GPU Support](https://aws.amazon.com/marketplace/pp/B07GRHFXGM) and [file an AWS support ticket](https://console.aws.amazon.com/support/cases#/create?issueType=service-limit-increase&limitType=ec2-instances) to increase the limit for your desired instance type. 
+```bash +# install the CLI on your machine +$ bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/master/get-cli.sh)" + +# provision infrastructure on AWS and spin up a cluster +$ cortex cluster up +``` + +## Deploy an example ```bash # clone the Cortex repository @@ -45,7 +41,7 @@ git clone -b master https://github.com/cortexlabs/cortex.git # navigate to the TensorFlow iris classification example cd cortex/examples/tensorflow/iris-classifier -# deploy the model to the cluster +# deploy the model cortex deploy # view the status of the api @@ -61,11 +57,7 @@ cortex get iris-classifier curl -X POST -H "Content-Type: application/json" \ -d '{ "sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3 }' \ -``` -## Cleanup - -```bash # delete the api cortex delete iris-classifier ``` From df76a5533a7a9b805bccf5c1f9a3986f449665a9 Mon Sep 17 00:00:00 2001 From: Omer Spillinger Date: Sun, 3 May 2020 21:15:54 -0700 Subject: [PATCH 19/20] Update README.md --- examples/sklearn/iris-classifier/README.md | 66 ++++++++++++++++------ 1 file changed, 48 insertions(+), 18 deletions(-) diff --git a/examples/sklearn/iris-classifier/README.md b/examples/sklearn/iris-classifier/README.md index 19df9e16b1..7056aac36a 100644 --- a/examples/sklearn/iris-classifier/README.md +++ b/examples/sklearn/iris-classifier/README.md @@ -1,4 +1,4 @@ -# Deploy a model as a web service +# Deploy models as a web APIs _WARNING: you are on the master branch, please refer to the examples on the branch that matches your `cortex version`_ @@ -119,9 +119,9 @@ Create a `cortex.yaml` file and add the configuration below and replace `cortex-
-## Deploy to AWS +## Deploy your model locally -`cortex deploy` takes the configuration from `cortex.yaml` and creates it on your cluster: +`cortex deploy` takes your model along with the configuration from `cortex.yaml` and creates a web API: ```bash $ cortex deploy @@ -129,7 +129,7 @@ $ cortex deploy creating iris-classifier ``` -Track the status of your api using `cortex get`: +Monitor the status of your API using `cortex get`: ```bash $ cortex get iris-classifier --watch @@ -137,7 +137,7 @@ $ cortex get iris-classifier --watch status up-to-date requested last update avg request 2XX live 1 1 1m - - -endpoint: http://***.amazonaws.com/iris-classifier +endpoint: http://localhost:8888 ``` The output above indicates that one replica of your API was requested and is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity. @@ -148,11 +148,37 @@ You can also stream logs from your API: $ cortex logs iris-classifier ``` +You can use `curl` to test your API: + +```bash +$ curl http://localhost:8888 \ + -X POST -H "Content-Type: application/json" \ + -d '{"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3}' + +"setosa" +``` +
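To make the `"setosa"` response from the curl above concrete, here is a toy pure-Python stand-in for the classifier — hand-picked petal thresholds for illustration only; the example actually serves a trained scikit-learn model, not rules like these:

```python
def classify_iris(sepal_length, sepal_width, petal_length, petal_width):
    # Illustrative thresholds only -- the deployed API runs a trained model.
    if petal_length < 2.5:
        return "setosa"
    if petal_width < 1.7:
        return "versicolor"
    return "virginica"

# The sample payload from the curl command above.
print(classify_iris(5.2, 3.6, 1.4, 0.3))  # -> setosa
```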
-## Serve real-time predictions +## Deploy your model to AWS -You can use `curl` to test your prediction service: +Cortex can automatically provision infrastructure on your AWS account and deploy your models as production-ready web services: + +```bash +$ cortex cluster up +``` + +You can deploy the model using the same code and configuration to your cluster: + +```bash +$ cortex deploy --env aws + +creating iris-classifier +``` + +
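The curl commands in this example can equally be scripted; a short client sketch using only Python's standard library (the helper name `build_predict_request` is my own for illustration — swap in the cluster's `***.amazonaws.com` endpoint after deploying with `--env aws`):

```python
import json
from urllib import request

def build_predict_request(endpoint, payload):
    # Build (but do not send) a POST request equivalent to the curl commands
    # in this example; send it with request.urlopen(req) once the API is up.
    return request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The local endpoint reported by `cortex get`.
req = build_predict_request(
    "http://localhost:8888",
    {"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3},
)
```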
+ +## Serve predictions in production ```bash $ curl http://***.amazonaws.com/iris-classifier \ @@ -185,7 +211,7 @@ Add a `tracker` to your `cortex.yaml` and specify that this is a classification Run `cortex deploy` again to perform a rolling update to your API with the new configuration: ```bash -$ cortex deploy +$ cortex deploy --env aws updating iris-classifier ``` @@ -193,7 +219,7 @@ updating iris-classifier After making more predictions, your `cortex get` command will show information about your API's past predictions: ```bash -$ cortex get iris-classifier --watch +$ cortex get --env aws iris-classifier --watch status up-to-date requested last update avg request 2XX live 1 1 1m 1.1 ms 14 @@ -230,7 +256,7 @@ This model is fairly small but larger models may require more compute resources. You could also configure GPU compute here if your cluster supports it. Adding compute resources may help reduce your inference latency. Run `cortex deploy` again to update your API with this configuration: ```bash -$ cortex deploy +$ cortex deploy --env aws updating iris-classifier ``` @@ -238,7 +264,7 @@ updating iris-classifier Run `cortex get` again: ```bash -$ cortex get iris-classifier --watch +$ cortex get --env aws iris-classifier --watch status up-to-date requested last update avg request 2XX live 1 1 1m 1.1 ms 14 @@ -288,7 +314,7 @@ If you trained another model and want to A/B test it with your previous model, s Run `cortex deploy` to create the new API: ```bash -$ cortex deploy +$ cortex deploy --env aws iris-classifier is up to date creating another-iris-classifier @@ -297,7 +323,7 @@ creating another-iris-classifier `cortex deploy` is declarative so the `iris-classifier` API is unchanged while `another-iris-classifier` is created: ```bash -$ cortex get --watch +$ cortex get --env aws --watch api status up-to-date requested last update iris-classifier live 1 1 5m @@ -388,7 +414,7 @@ Next, add the `api` to `cortex.yaml`: Run `cortex deploy` to create your batch API: 
```bash -$ cortex deploy +$ cortex deploy --env aws updating iris-classifier updating another-iris-classifier @@ -400,7 +426,7 @@ Since a new file was added to the directory, and all files in the directory cont `cortex get` should show all 3 APIs now: ```bash -$ cortex get --watch +$ cortex get --env aws --watch api status up-to-date requested last update iris-classifier live 1 1 1m @@ -446,15 +472,19 @@ $ curl http://***.amazonaws.com/batch-iris-classifier \ Run `cortex delete` to delete each API: ```bash -$ cortex delete iris-classifier +$ cortex delete --env local iris-classifier + +deleting iris-classifier + +$ cortex delete --env aws iris-classifier deleting iris-classifier -$ cortex delete another-iris-classifier +$ cortex delete --env aws another-iris-classifier deleting another-iris-classifier -$ cortex delete batch-iris-classifier +$ cortex delete --env aws batch-iris-classifier deleting batch-iris-classifier ``` From f1993efb8f7e29b6ec3d30aa4cdbce225f55dad8 Mon Sep 17 00:00:00 2001 From: David Eliahu Date: Sun, 3 May 2020 21:27:52 -0700 Subject: [PATCH 20/20] Update README.md --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 8e74dc804b..4e298e3eb3 100644 --- a/README.md +++ b/README.md @@ -70,7 +70,7 @@ creating sentiment-classifier ### Serve predictions ```bash -$ curl http://localhost:12345 \ +$ curl http://localhost:8888 \ -X POST -H "Content-Type: application/json" \ -d '{"text": "serving models locally is cool!"}' @@ -136,6 +136,8 @@ negative 4 The CLI sends configuration and code to the cluster every time you run `cortex deploy`. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using a Network Load Balancer (NLB) and FastAPI / TensorFlow Serving / ONNX Runtime (depending on the model type). 
The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch. +Cortex manages its own Kubernetes cluster so that end-to-end functionality like request-based autoscaling, GPU support, and spot instance management can work out of the box without any additional DevOps work. +
## What is Cortex similar to?