diff --git a/README.md b/README.md
index 58e71f3ff7..4e298e3eb3 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,10 @@
-# Deploy machine learning models in production
-
-Cortex is an open source platform for deploying machine learning models as production web services.
+# Machine learning model serving infrastructure
-[install](https://cortex.dev/install) • [tutorial](https://cortex.dev/iris-classifier) • [docs](https://cortex.dev) • [examples](https://github.com/cortexlabs/cortex/tree/0.15/examples) • [we're hiring](https://angel.co/cortex-labs-inc/jobs) • [email us](mailto:hello@cortex.dev) • [chat with us](https://gitter.im/cortexlabs/cortex)

+[install](https://cortex.dev/install) • [docs](https://cortex.dev) • [examples](https://github.com/cortexlabs/cortex/tree/0.15/examples) • [we're hiring](https://angel.co/cortex-labs-inc/jobs) • [chat with us](https://gitter.im/cortexlabs/cortex)

![Demo](https://d1zqebknpdh033.cloudfront.net/demo/gif/v0.13_2.gif)

@@ -25,43 +23,15 @@ Cortex is an open source platform for deploying machine learning models as produ
-## Spinning up a cluster
+## Deploying a model

-Cortex is designed to be self-hosted on any AWS account. You can spin up a cluster with a single command:
+### Install the CLI

```bash
-# install the CLI on your machine
$ bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/0.15/get-cli.sh)"
-
-# provision infrastructure on AWS and spin up a cluster
-$ cortex cluster up
-
-aws region: us-west-2
-aws instance type: g4dn.xlarge
-spot instances: yes
-min instances: 0
-max instances: 5
-
-aws resource cost per hour
-1 eks cluster                               $0.10
-0 - 5 g4dn.xlarge instances for your apis   $0.1578 - $0.526 each (varies based on spot price)
-0 - 5 50gb ebs volumes for your apis        $0.007 each
-1 t3.medium instance for the operator       $0.0416
-1 20gb ebs volume for the operator          $0.003
-2 network load balancers                    $0.0225 each
-
-your cluster will cost $0.19 - $2.85 per hour based on cluster size and spot instance pricing/availability
-
-○ spinning up your cluster ...
-
-your cluster is ready!
```
-
-
-## Deploying a model
-
### Implement your predictor

```python
@@ -84,14 +54,12 @@ class PythonPredictor:
  predictor:
    type: python
    path: predictor.py
-  tracker:
-    model_type: classification
  compute:
    gpu: 1
    mem: 4G
```

-### Deploy to AWS
+### Deploy your model

```bash
$ cortex deploy
@@ -99,12 +67,54 @@ $ cortex deploy

creating sentiment-classifier
```

-### Serve real-time predictions
+### Serve predictions
+
+```bash
+$ curl http://localhost:8888 \
+  -X POST -H "Content-Type: application/json" \
+  -d '{"text": "serving models locally is cool!"}'
+
+positive
+```
+
+
+## Deploying models at scale
+
+### Spin up a cluster
+
+Cortex clusters are designed to be self-hosted on any AWS account (GCP support is coming soon):
+
+```bash
+$ cortex cluster up
+
+aws region: us-west-2
+aws instance type: g4dn.xlarge
+spot instances: yes
+min instances: 0
+max instances: 5
+
+your cluster will cost $0.19 - $2.85 per hour based on cluster size and spot instance pricing/availability
+
+○ spinning up your cluster ...
+
+your cluster is ready!
+```
+
+### Deploy to your cluster with the same code and configuration
+
+```bash
+$ cortex deploy --env aws
+
+creating sentiment-classifier
+```
+
+### Serve predictions at scale

```bash
$ curl http://***.amazonaws.com/sentiment-classifier \
  -X POST -H "Content-Type: application/json" \
-  -d '{"text": "the movie was amazing!"}'
+  -d '{"text": "serving models at scale is really cool!"}'

positive
```
@@ -112,7 +122,7 @@ positive
### Monitor your deployment

```bash
-$ cortex get sentiment-classifier --watch
+$ cortex get sentiment-classifier

status   up-to-date   requested   last update   avg request   2XX
live     1            1           8s            24ms          12
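The hunk that follows rewrites the architecture overview: each predictor is wrapped in a web framework (FastAPI for Python predictors) and exposed behind a Network Load Balancer. As a rough illustration of that request path — a sketch only, not Cortex's actual source code; the `predictor` import and the endpoint shape are assumptions — the pattern looks like:

```python
# Illustrative sketch of the serving pattern described in the hunk below;
# this is NOT Cortex's implementation. It assumes predictor.py defines the
# PythonPredictor class referenced by cortex.yaml.
from fastapi import FastAPI

from predictor import PythonPredictor  # hypothetical user module

app = FastAPI()
predictor = PythonPredictor(config={})  # loaded once when the replica starts


@app.post("/")
def predict(payload: dict):
    # The NLB routes each request to a replica's container on EKS,
    # which hands the parsed JSON body to the user's predict() method.
    return predictor.predict(payload)
```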
@@ -122,27 +132,23 @@ positive 8
negative   4
```

-
+### How it works

-## What is Cortex similar to?
+The CLI sends configuration and code to the cluster every time you run `cortex deploy`. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using a Network Load Balancer (NLB) and FastAPI / TensorFlow Serving / ONNX Runtime (depending on the model type). The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch.

-Cortex is an open source alternative to serving models with SageMaker or building your own model deployment platform on top of AWS services like Elastic Kubernetes Service (EKS), Elastic Container Service (ECS), Lambda, Fargate, and Elastic Compute Cloud (EC2) and open source projects like Docker, Kubernetes, and TensorFlow Serving.
+Cortex manages its own Kubernetes cluster so that end-to-end functionality like request-based autoscaling, GPU support, and spot instance management can work out of the box without any additional DevOps work.
-## How does Cortex work?
-
-The CLI sends configuration and code to the cluster every time you run `cortex deploy`. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using Elastic Load Balancing (ELB), TensorFlow Serving, and ONNX Runtime. The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch.
+## What is Cortex similar to?

-Cortex manages its own Kubernetes cluster so that end-to-end functionality like request-based autoscaling, GPU support, and spot instance management can work out of the box without any additional DevOps work.
+Cortex is an open source alternative to serving models with SageMaker or building your own model deployment platform on top of AWS services like Elastic Kubernetes Service (EKS), Lambda, or Fargate and open source projects like Docker, Kubernetes, TensorFlow Serving, and TorchServe.
-## Examples of Cortex deployments
+## Examples

-* [Sentiment analysis](https://github.com/cortexlabs/cortex/tree/0.15/examples/tensorflow/sentiment-analyzer): deploy a BERT model for sentiment analysis.
* [Image classification](https://github.com/cortexlabs/cortex/tree/0.15/examples/tensorflow/image-classifier): deploy an Inception model to classify images.
* [Search completion](https://github.com/cortexlabs/cortex/tree/0.15/examples/pytorch/search-completer): deploy Facebook's RoBERTa model to complete search terms.
* [Text generation](https://github.com/cortexlabs/cortex/tree/0.15/examples/pytorch/text-generator): deploy Hugging Face's DistilGPT2 model to generate text.
-* [Iris classification](https://github.com/cortexlabs/cortex/tree/0.15/examples/sklearn/iris-classifier): deploy a scikit-learn model to classify iris flowers.
diff --git a/cli/cmd/root.go b/cli/cmd/root.go
index 7adfc89d9a..750255c217 100644
--- a/cli/cmd/root.go
+++ b/cli/cmd/root.go
@@ -121,7 +121,7 @@ func initTelemetry() {
var _rootCmd = &cobra.Command{
	Use:     "cortex",
	Aliases: []string{"cx"},
-	Short:   "deploy machine learning models in production",
+	Short:   "machine learning model serving infrastructure",
}

func Execute() {
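The README hunks above show the `cortex.yaml` surrounding a `PythonPredictor` but elide the class itself. For reference, a minimal predictor matching the sentiment example might look like the sketch below — the Hugging Face `transformers` pipeline and the label mapping are assumptions inferred from the example's output, not part of this diff; only the `__init__(config)` / `predict(payload)` shape follows the `class PythonPredictor:` context visible in the hunk:

```python
# predictor.py — hypothetical body for the elided PythonPredictor class.
# Assumes the transformers package is listed in the example's requirements.
from transformers import pipeline


class PythonPredictor:
    def __init__(self, config):
        # Runs once per replica at startup; load the model here so that
        # predict() stays fast on the request path.
        self.analyzer = pipeline(task="sentiment-analysis")

    def predict(self, payload):
        # payload is the parsed JSON request body, e.g. {"text": "..."}
        label = self.analyzer(payload["text"])[0]["label"]
        return "positive" if label == "POSITIVE" else "negative"
```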
diff --git a/docs/cluster-management/install.md b/docs/cluster-management/install.md
index 195372c28d..c54639d471 100644
--- a/docs/cluster-management/install.md
+++ b/docs/cluster-management/install.md
@@ -2,41 +2,37 @@
_WARNING: you are on the master branch, please refer to the docs on the branch that matches your `cortex version`_

-## Prerequisites
+## Running on your machine or a single instance

-1. [Docker](https://docs.docker.com/install)
-2. [AWS credentials](aws-credentials.md)
+[Docker](https://docs.docker.com/install) is required to run Cortex locally. In addition, your machine (or Docker Desktop, for Mac users) should have at least 8GB of memory if you plan to deploy large deep learning models.

-## Spin up a cluster
-
-See [cluster configuration](config.md) to learn how you can customize your cluster with `cluster.yaml` and see [EC2 instances](ec2-instances.md) for an overview of several EC2 instance types. To use GPU nodes, you may need to subscribe to the [EKS-optimized AMI with GPU Support](https://aws.amazon.com/marketplace/pp/B07GRHFXGM) and [file an AWS support ticket](https://console.aws.amazon.com/support/cases#/create?issueType=service-limit-increase&limitType=ec2-instances) to increase the limit for your desired instance type.
+### Install the CLI

```bash
-# install the CLI on your machine
$ bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/master/get-cli.sh)"
+```

-# provision infrastructure on AWS and spin up a cluster
-$ cortex cluster up
-
-aws resource cost per hour
-1 eks cluster                               $0.10
-0 - 5 g4dn.xlarge instances for your apis   $0.1578 - $0.526 each (varies based on spot price)
-0 - 5 50gb ebs volumes for your apis        $0.007 each
-1 t3.medium instance for the operator       $0.0416
-1 20gb ebs volume for the operator          $0.003
-2 network load balancers                    $0.0225 each
+## Running at scale on AWS

-your cluster will cost $0.19 - $2.85 per hour based on cluster size and spot instance pricing/availability
+[Docker](https://docs.docker.com/install) and valid [AWS credentials](aws-credentials.md) are required to run a Cortex cluster on AWS.

-○ spinning up your cluster ...
+### Spin up a cluster

-your cluster is ready!
-```
+See [cluster configuration](config.md) to learn how you can customize your cluster with `cluster.yaml` and see [EC2 instances](ec2-instances.md) for an overview of several EC2 instance types.

-## Deploy a model
+To use GPU nodes, you may need to subscribe to the [EKS-optimized AMI with GPU Support](https://aws.amazon.com/marketplace/pp/B07GRHFXGM) and [file an AWS support ticket](https://console.aws.amazon.com/support/cases#/create?issueType=service-limit-increase&limitType=ec2-instances) to increase the limit for your desired instance type.

+```bash
+# install the CLI on your machine
+$ bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/master/get-cli.sh)"
+
+# provision infrastructure on AWS and spin up a cluster
+$ cortex cluster up
+```
+
+## Deploy an example

```bash
# clone the Cortex repository
git clone -b master https://github.com/cortexlabs/cortex.git

# navigate to the TensorFlow iris classification example
cd cortex/examples/tensorflow/iris-classifier

-# deploy the model to the cluster
+# deploy the model
cortex deploy

# view the status of the api
cortex get --watch
@@ -61,11 +57,7 @@ cortex get iris-classifier
curl -X POST -H "Content-Type: application/json" \
  -d '{ "sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3 }' \
-```

-## Cleanup
-
-```bash
# delete the api
cortex delete iris-classifier
```
diff --git a/docs/summary.md b/docs/summary.md
index 2a75d47ea8..8b565074ca 100644
--- a/docs/summary.md
+++ b/docs/summary.md
@@ -1,6 +1,6 @@
# Table of contents

-* [Deploy machine learning models in production](../README.md)
+* [Machine learning model serving infrastructure](../README.md)
* [Install](cluster-management/install.md)
* [Tutorial](../examples/sklearn/iris-classifier/README.md)
* [GitHub](https://github.com/cortexlabs/cortex)
diff --git a/examples/pytorch/language-identifier/sample.json b/examples/pytorch/language-identifier/sample.json
index 2329a0a714..76fce072eb 100644
--- a/examples/pytorch/language-identifier/sample.json
+++ b/examples/pytorch/language-identifier/sample.json
@@ -1,3 +1,3 @@
{
-  "text": "deploy machine learning models in production"
+  "text": "machine learning model serving infrastructure"
}
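The iris-classifier tutorial below exercises each deployment with `curl`. An equivalent smoke test from Python — using the `requests` package, which is an addition here and not part of the diff — would be:

```python
# Hypothetical client-side check mirroring the tutorial's curl commands.
# The localhost endpoint matches the one printed by `cortex get` for a
# local deployment.
import requests

sample = {
    "sepal_length": 5.2,
    "sepal_width": 3.6,
    "petal_length": 1.4,
    "petal_width": 0.3,
}

response = requests.post("http://localhost:8888", json=sample)
response.raise_for_status()
print(response.json())  # e.g. "setosa"
```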
diff --git a/examples/sklearn/iris-classifier/README.md b/examples/sklearn/iris-classifier/README.md
index 19df9e16b1..7056aac36a 100644
--- a/examples/sklearn/iris-classifier/README.md
+++ b/examples/sklearn/iris-classifier/README.md
@@ -1,4 +1,4 @@
-# Deploy a model as a web service
+# Deploy models as web APIs

_WARNING: you are on the master branch, please refer to the examples on the branch that matches your `cortex version`_

@@ -119,9 +119,9 @@ Create a `cortex.yaml` file and add the configuration below and replace `cortex-
-## Deploy to AWS
+## Deploy your model locally

-`cortex deploy` takes the configuration from `cortex.yaml` and creates it on your cluster:
+`cortex deploy` takes your model along with the configuration from `cortex.yaml` and creates a web API:

```bash
$ cortex deploy
@@ -129,7 +129,7 @@ $ cortex deploy

creating iris-classifier
```

-Track the status of your api using `cortex get`:
+Monitor the status of your API using `cortex get`:

```bash
$ cortex get iris-classifier --watch
@@ -137,7 +137,7 @@ $ cortex get iris-classifier --watch

status   up-to-date   requested   last update   avg request   2XX
live     1            1           1m            -             -

-endpoint: http://***.amazonaws.com/iris-classifier
+endpoint: http://localhost:8888
```

The output above indicates that one replica of your API was requested and is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.
@@ -148,11 +148,37 @@ You can also stream logs from your API:
$ cortex logs iris-classifier
```

+You can use `curl` to test your API:
+
+```bash
+$ curl http://localhost:8888 \
+  -X POST -H "Content-Type: application/json" \
+  -d '{"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3}'
+
+"setosa"
+```
+
-## Serve real-time predictions
+## Deploy your model to AWS

-You can use `curl` to test your prediction service:
+Cortex can automatically provision infrastructure on your AWS account and deploy your models as production-ready web services:
+
+```bash
+$ cortex cluster up
+```
+
+You can then deploy the same code and configuration to your cluster:
+
+```bash
+$ cortex deploy --env aws
+
+creating iris-classifier
+```
+
+
+## Serve predictions in production

```bash
$ curl http://***.amazonaws.com/iris-classifier \
  -X POST -H "Content-Type: application/json" \
  -d '{"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3}'

"setosa"
```
@@ -185,7 +211,7 @@ Add a `tracker` to your `cortex.yaml` and specify that this is a classification
Run `cortex deploy` again to perform a rolling update to your API with the new configuration:

```bash
-$ cortex deploy
+$ cortex deploy --env aws

updating iris-classifier
```
@@ -193,7 +219,7 @@ updating iris-classifier
After making more predictions, your `cortex get` command will show information about your API's past predictions:

```bash
-$ cortex get iris-classifier --watch
+$ cortex get --env aws iris-classifier --watch

status   up-to-date   requested   last update   avg request   2XX
live     1            1           1m            1.1 ms        14
@@ -230,7 +256,7 @@ This model is fairly small but larger models may require more compute resources
You could also configure GPU compute here if your cluster supports it. Adding compute resources may help reduce your inference latency.

Run `cortex deploy` again to update your API with this configuration:

```bash
-$ cortex deploy
+$ cortex deploy --env aws

updating iris-classifier
```
@@ -238,7 +264,7 @@ updating iris-classifier
Run `cortex get` again:

```bash
-$ cortex get iris-classifier --watch
+$ cortex get --env aws iris-classifier --watch

status   up-to-date   requested   last update   avg request   2XX
live     1            1           1m            1.1 ms        14
@@ -288,7 +314,7 @@ If you trained another model and want to A/B test it with your previous model, s
Run `cortex deploy` to create the new API:

```bash
-$ cortex deploy
+$ cortex deploy --env aws

iris-classifier is up to date
creating another-iris-classifier
```

`cortex deploy` is declarative so the `iris-classifier` API is unchanged while `another-iris-classifier` is created:

```bash
-$ cortex get --watch
+$ cortex get --env aws --watch

api               status   up-to-date   requested   last update
iris-classifier   live     1            1           5m
@@ -388,7 +414,7 @@ Next, add the `api` to `cortex.yaml`:
Run `cortex deploy` to create your batch API:

```bash
-$ cortex deploy
+$ cortex deploy --env aws

updating iris-classifier
updating another-iris-classifier
@@ -400,7 +426,7 @@ Since a new file was added to the directory, and all files in the directory cont
`cortex get` should show all 3 APIs now:

```bash
-$ cortex get --watch
+$ cortex get --env aws --watch

api               status   up-to-date   requested   last update
iris-classifier   live     1            1           1m
@@ -446,15 +472,19 @@ $ curl http://***.amazonaws.com/batch-iris-classifier \
Run `cortex delete` to delete each API:

```bash
-$ cortex delete iris-classifier
+$ cortex delete --env local iris-classifier
+
+deleting iris-classifier
+
+$ cortex delete --env aws iris-classifier

deleting iris-classifier

-$ cortex delete another-iris-classifier
+$ cortex delete --env aws another-iris-classifier

deleting another-iris-classifier

-$ cortex delete batch-iris-classifier
+$ cortex delete --env aws batch-iris-classifier

deleting batch-iris-classifier
```
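The batch API at the end of the tutorial is exercised with `curl` against an elided endpoint. A sketch of the same call from Python — assuming, as the example's name suggests, that the batch predictor accepts a JSON list of samples and returns one prediction per sample; the endpoint placeholder stands in for the `***.amazonaws.com` URL elided in the diff — could look like:

```python
# Hypothetical batch client for the batch-iris-classifier API described above.
import requests

samples = [
    {"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3},
    {"sepal_length": 7.7, "sepal_width": 3.8, "petal_length": 6.7, "petal_width": 2.2},
]

response = requests.post(
    "http://<your-load-balancer>.amazonaws.com/batch-iris-classifier",
    json=samples,
)
print(response.json())  # e.g. ["setosa", "virginica"]
```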