From 9e4a9c4ae64424eb7c7aa876b33c89403d3f0bf0 Mon Sep 17 00:00:00 2001
From: Caleb Kaiser <42076840+caleb-kaiser@users.noreply.github.com>
Date: Tue, 15 Oct 2019 09:08:31 -0400
Subject: [PATCH 01/15] Change to feature GPT-2 example
---
README.md | 141 ++++++++++++++++++++++++++++--------------------------
1 file changed, 74 insertions(+), 67 deletions(-)
diff --git a/README.md b/README.md
index b2f9ce187d..50a0497a25 100644
--- a/README.md
+++ b/README.md
@@ -1,122 +1,129 @@
# Deploy machine learning models in production
-Cortex is an open source machine learning deployment platform that makes it simple to deploy your machine learning models as web APIs on AWS. It combines TensorFlow Serving, ONNX Runtime, and Flask into a single tool that takes models from S3 and deploys them as web APIs. It also uses Docker and Kubernetes behind the scenes to autoscale, run rolling updates, and support CPU and GPU inference. The project is maintained by a venture-backed team of infrastructure engineers with backgrounds from Google, Illumio, and Berkeley.
-
-
-
+Cortex is an open source platform that takes machine learning models—trained with nearly **any framework**—and turns them into **autoscaling**, **production APIs**, in one command.
+
+
[install](https://www.cortex.dev/install) • [docs](https://www.cortex.dev) • [examples](examples) • [we're hiring](https://angel.co/cortex-labs-inc/jobs) • [email us](mailto:hello@cortex.dev) • [chat with us](https://gitter.im/cortexlabs/cortex)
-
+## Cortex quickstart
-
-
-
-
-
-## Key features
-
-- **Minimal declarative configuration:** Deployments can be defined in a single `cortex.yaml` file.
-
-- **Autoscaling:** Cortex automatically scales APIs to handle production workloads.
-
-- **Multi framework:** Cortex supports TensorFlow, Keras, PyTorch, Scikit-learn, XGBoost, and more.
+Below, we'll walkthrough how to use Cortex to deploy OpenAI's GPT-2 model as a service on AWS.
+
+What you'll need:
-- **Rolling updates:** Cortex updates deployed APIs without any downtime.
+- An AWS account (you can set one up for free [here](aws.com))
+- Cortex installed on your machine ([see instructions here](https://www.cortex.dev/v/0.8/install))
-- **Log streaming:** Cortex streams logs from your deployed models to your CLI.
+### Step 1. Configure your deployment
-- **Prediction monitoring:** Cortex can monitor network metrics and track predictions.
-
-- **CPU / GPU support:** Cortex can run inference on CPU or GPU infrastructure.
-
-
-
-## How it works
-
-### Define your deployment using declarative configuration
+The first step to deploying your model is to upload it to an accessible S3 bucket. For this example, we have uploaded our model to s3://cortex-examples/text-generator/gpt-2/124M.
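If you're deploying your own model, you can stage it in S3 with the AWS CLI — a minimal sketch, where the bucket name and local path are placeholders:

```bash
# Upload an exported model directory to your own bucket (paths are placeholders)
aws s3 cp ./gpt-2/124M s3://my-bucket/text-generator/gpt-2/124M --recursive
```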
+Next, create a file named `cortex.yaml`. This file tells Cortex how your deployment should be constructed. Within `cortex.yaml`, define a `deployment` and an `api` resource as shown below:
```yaml
-# cortex.yaml
+- kind: deployment
+ name: text
- kind: api
- name: my-api
- model: s3://my-bucket/my-model.onnx
+ name: generator
+ model: s3://cortex-examples/text-generator/gpt-2/124M
request_handler: handler.py
- compute:
- gpu: 1
```
-### Customize request handling
+A `deployment` specifies a set of resources that are deployed as a single unit. An `api` makes a model available as a web service that can serve real-time predictions. The above example configuration will download the model from the `cortex-examples` S3 bucket.
+
+
+You can run the code that generated the exported GPT-2 model [here](https://colab.research.google.com/github/cortexlabs/cortex/blob/master/examples/text-generator/gpt-2.ipynb).
+
+### Step 2. Add request handling
+
+The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also translate the model’s prediction to make it readable to users. This can be implemented in a request handler file (which we defined in `cortex.yaml` as `handler.py`) using the pre_inference and post_inference functions. Create a file called `handler.py`, and fill it with this:
```python
-# handler.py
+from encoder import get_encoder
-# Load data for preprocessing or postprocessing. For example:
-labels = download_labels_from_s3()
+encoder = get_encoder()
def pre_inference(sample, metadata):
- # Python code
+ context = encoder.encode(sample["text"])
+ return {"context": [context]}
def post_inference(prediction, metadata):
- # Python code
+ response = prediction["sample"]
+ return {encoder.decode(response)}
```
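Before deploying, you can sanity-check the handler locally. The snippet below is a sketch, assuming `encoder.py` from the text-generator example sits next to `handler.py` and can load the GPT-2 vocabulary:

```python
# local_check.py -- hypothetical local test, not part of the Cortex deployment
from handler import pre_inference, post_inference

sample = {"text": "machine learning"}
payload = pre_inference(sample, metadata=None)
print(payload)  # {"context": [[...encoded token ids...]]}

# Round-trip the encoded tokens through post_inference to confirm decoding works
decoded = post_inference({"sample": payload["context"][0]}, metadata=None)
print(decoded)  # should contain the original input text
```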
-### Deploy to AWS using the CLI
+### Step 3. Deploy to AWS
+
+Deploying to AWS is as simple as running `cortex deploy` from your command line (from within the same folder as `cortex.yaml` and `handler.py`). `cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on the cluster:
```bash
$ cortex deploy
-Deploying ...
-http://***.amazonaws.com/my-api # Your API is ready!
+deployment started
```
-### Serve real-time predictions via autoscaling JSON APIs running on AWS
+Behind the scenes, Cortex containerizes the model, makes it servable using TensorFlow Serving, exposes the endpoint with a load balancer, and orchestrates the workload on Kubernetes.
+
+You can track the status of a deployment using `cortex get`:
```bash
-$ curl http://***.amazonaws.com/my-api -d '{"a": 1, "b": 2, "c": 3}'
+$ cortex get generator --watch
-{ prediction: "def" }
+status up-to-date available requested last update avg latency
+live 1 1 1 8s -
```
-
+The output above indicates that one replica of the API was requested and one replica is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.
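Replica counts and hardware are configured per API in `cortex.yaml`. The sketch below extends the earlier example with a `compute` section; the `gpu` field appears in the original configuration example, while the replica bounds are assumptions — check the docs for the exact field names in your Cortex version:

```yaml
# cortex.yaml (sketch -- replica field names are assumptions, verify for your version)
- kind: api
  name: generator
  model: s3://cortex-examples/text-generator/gpt-2/124M
  request_handler: handler.py
  compute:
    min_replicas: 1  # assumed lower bound for autoscaling
    max_replicas: 5  # assumed upper bound for autoscaling
    gpu: 1
```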
-## Installation
+### Step 4. Serve real-time predictions
-
+The `cortex get` command returns the endpoint of your new API:
```bash
-# Download the install script
-$ curl -O https://raw.githubusercontent.com/cortexlabs/cortex/0.8/cortex.sh && chmod +x cortex.sh
+$ cortex get generator
-# Install the Cortex CLI on your machine: the CLI sends configuration and code to the Cortex cluster
-$ ./cortex.sh install cli
-
-# Set your AWS credentials
-$ export AWS_ACCESS_KEY_ID=***
-$ export AWS_SECRET_ACCESS_KEY=***
+url: http://***.amazonaws.com/iris/classifier
+```
+Once you have your endpoint, you can ping it with sample data from the command line to test its functionality:
-# Configure AWS instance settings
-$ export CORTEX_NODE_TYPE="m5.large"
-$ export CORTEX_NODES_MIN="2"
-$ export CORTEX_NODES_MAX="5"
+```bash
+$ curl http://***.amazonaws.com/text/generator \
+ -X POST -H "Content-Type: application/json" \
+ -d '{"text": "machine learning"}'
-# Install the Cortex cluster in your AWS account: the cluster is responsible for hosting your APIs
-$ ./cortex.sh install
+Machine learning, with more than one thousand researchers around the world today, are looking to create computer-driven machine learning algorithms that can also be applied to human and social problems, such as education, health care, employment, medicine, politics, or the environment...
```
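The same request can be sent from any HTTP client. For example, from Python (the URL below is a placeholder — use the endpoint reported by `cortex get`):

```python
import requests

# Placeholder endpoint -- substitute the URL returned by `cortex get generator`
url = "http://***.amazonaws.com/text/generator"
response = requests.post(url, json={"text": "machine learning"})
print(response.text)
```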
-
-See [installation instructions](https://www.cortex.dev/cluster/install) for more details.
+Any questions? [chat with us](https://gitter.im/cortexlabs/cortex).
-## Examples
+## More Examples
-- [Text generation](https://github.com/cortexlabs/cortex/tree/0.8/examples/text-generator) with GPT-2
-
- [Sentiment analysis](https://github.com/cortexlabs/cortex/tree/0.8/examples/sentiment-analysis) with BERT
- [Image classification](https://github.com/cortexlabs/cortex/tree/0.8/examples/image-classifier) with Inception v3
+
+- [Iris classification](https://github.com/cortexlabs/cortex/tree/master/examples/iris-classifier)
+
+## What else can Cortex do?
+
+Here's a more detailed breakdown of Cortex's core features:
+
+- **Minimal declarative configuration:** Deployments can be defined in a single `cortex.yaml` file.
+
+- **Autoscaling:** Cortex automatically scales APIs to handle production workloads.
+
+- **Multi framework:** Cortex supports TensorFlow, Keras, PyTorch, Scikit-learn, XGBoost, and more.
+
+- **Rolling updates:** Cortex updates deployed APIs without any downtime.
+
+- **Log streaming:** Cortex streams logs from your deployed models to your CLI.
+
+- **Prediction monitoring:** Cortex can monitor network metrics and track predictions.
+
+- **CPU / GPU support:** Cortex can run inference on CPU or GPU infrastructure.
From 4cf82520a26f4baab0dc0c0bc7a65a588765cdc5 Mon Sep 17 00:00:00 2001
From: Caleb Kaiser <42076840+caleb-kaiser@users.noreply.github.com>
Date: Tue, 15 Oct 2019 15:07:28 -0400
Subject: [PATCH 02/15] Update README.md
---
README.md | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/README.md b/README.md
index 50a0497a25..7baa7755bc 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,21 @@
# Deploy machine learning models in production
Cortex is an open source platform that takes machine learning models—trained with nearly **any framework**—and turns them into **autoscaling**, **production APIs**, in one command.
-
-
[install](https://www.cortex.dev/install) • [docs](https://www.cortex.dev) • [examples](examples) • [we're hiring](https://angel.co/cortex-labs-inc/jobs) • [email us](mailto:hello@cortex.dev) • [chat with us](https://gitter.im/cortexlabs/cortex)
-
+
+
+
+
## Cortex quickstart
Below, we'll walkthrough how to use Cortex to deploy OpenAI's GPT-2 model as a service on AWS.
What you'll need:
-- An AWS account (you can set one up for free [here](aws.com))
+- An AWS account ([you can set one up for free here](aws.com))
- Cortex installed on your machine ([see instructions here](https://www.cortex.dev/v/0.8/install))
+
### Step 1. Configure your deployment
@@ -35,6 +37,8 @@ A `deployment` specifies a set of resources that are deployed as a single unit.
You can run the code that generated the exported GPT-2 model [here](https://colab.research.google.com/github/cortexlabs/cortex/blob/master/examples/text-generator/gpt-2.ipynb).
+
+
### Step 2. Add request handling
The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also translate the model’s prediction to make it readable to users. This can be implemented in a request handler file (which we defined in `cortex.yaml` as `handler.py`) using the pre_inference and post_inference functions. Create a file called `handler.py`, and fill it with this:
@@ -55,6 +59,8 @@ def post_inference(prediction, metadata):
return {encoder.decode(response)}
```
+
+
### Step 3. Deploy to AWS
Deploying to AWS is as simple as running `cortex deploy` from your command line (from within the same folder as `cortex.yaml` and `handler.py`). `cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on the cluster:
@@ -73,11 +79,13 @@ You can track the status of a deployment using `cortex get`:
$ cortex get generator --watch
status up-to-date available requested last update avg latency
-live 1 1 1 8s -
+live 1 1 1 8s 124ms
```
The output above indicates that one replica of the API was requested and one replica is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.
+
+
### Step 4. Serve real-time predictions
The `cortex get` command returns the endpoint of your new API:
@@ -85,7 +93,7 @@ The `cortex get` command returns the endpoint of your new API:
```bash
$ cortex get generator
-url: http://***.amazonaws.com/iris/classifier
+url: http://***.amazonaws.com/text/generator
```
Once you have your endpoint, you can ping it with sample data from the command line to test its functionality:
@@ -99,7 +107,7 @@ Machine learning, with more than one thousand researchers around the world today
Any questions? [chat with us](https://gitter.im/cortexlabs/cortex).
-
+
## More Examples
@@ -108,11 +116,10 @@ Any questions? [chat with us](https://gitter.im/cortexlabs/cortex).
- [Image classification](https://github.com/cortexlabs/cortex/tree/0.8/examples/image-classifier) with Inception v3
-- [Iris classification](https://github.com/cortexlabs/cortex/tree/master/examples/iris-classifier)
-
-## What else can Cortex do?
+- [Iris classification](https://github.com/cortexlabs/cortex/tree/master/examples/iris-classifier) with the famous iris data set
+
-Here's a more detailed breakdown of Cortex's core features:
+## Key features:
- **Minimal declarative configuration:** Deployments can be defined in a single `cortex.yaml` file.
From 9228d44ae39c1160f2de31993cca4221f5c449bf Mon Sep 17 00:00:00 2001
From: Caleb Kaiser <42076840+caleb-kaiser@users.noreply.github.com>
Date: Tue, 15 Oct 2019 15:08:06 -0400
Subject: [PATCH 03/15] Update README.md
---
README.md | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/README.md b/README.md
index 7baa7755bc..4f8d6f5291 100644
--- a/README.md
+++ b/README.md
@@ -37,7 +37,7 @@ A `deployment` specifies a set of resources that are deployed as a single unit.
You can run the code that generated the exported GPT-2 model [here](https://colab.research.google.com/github/cortexlabs/cortex/blob/master/examples/text-generator/gpt-2.ipynb).
-
+
### Step 2. Add request handling
@@ -59,7 +59,7 @@ def post_inference(prediction, metadata):
return {encoder.decode(response)}
```
-
+
### Step 3. Deploy to AWS
@@ -84,7 +84,7 @@ live 1 1 1 8s 124ms
The output above indicates that one replica of the API was requested and one replica is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.
-
+
### Step 4. Serve real-time predictions
@@ -107,7 +107,7 @@ Machine learning, with more than one thousand researchers around the world today
Any questions? [chat with us](https://gitter.im/cortexlabs/cortex).
-
+
## More Examples
From e7219885d3b361ebd62bddaa5ca9b5b4a0e899cb Mon Sep 17 00:00:00 2001
From: Caleb Kaiser <42076840+caleb-kaiser@users.noreply.github.com>
Date: Tue, 15 Oct 2019 15:08:40 -0400
Subject: [PATCH 04/15] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 4f8d6f5291..bd5ae448e3 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ What you'll need:
- An AWS account ([you can set one up for free here](aws.com))
- Cortex installed on your machine ([see instructions here](https://www.cortex.dev/v/0.8/install))
-
+
### Step 1. Configure your deployment
From ffb0d2f72b4898c091c6083d0fbb14a0dbd471a2 Mon Sep 17 00:00:00 2001
From: Caleb Kaiser <42076840+caleb-kaiser@users.noreply.github.com>
Date: Tue, 15 Oct 2019 15:09:05 -0400
Subject: [PATCH 05/15] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index bd5ae448e3..2fa55edb61 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
# Deploy machine learning models in production
-Cortex is an open source platform that takes machine learning models—trained with nearly **any framework**—and turns them into **autoscaling**, **production APIs**, in one command.
+Cortex is an open source platform that takes machine learning models—trained with nearly **any framework**—and turns them into **autoscaling**, **production APIs**, in one command.
[install](https://www.cortex.dev/install) • [docs](https://www.cortex.dev) • [examples](examples) • [we're hiring](https://angel.co/cortex-labs-inc/jobs) • [email us](mailto:hello@cortex.dev) • [chat with us](https://gitter.im/cortexlabs/cortex)
From 550987d6f375b33f22e77f448d0ca4fb8878b836 Mon Sep 17 00:00:00 2001
From: Omer Spillinger <42219498+ospillinger@users.noreply.github.com>
Date: Tue, 15 Oct 2019 12:34:58 -0700
Subject: [PATCH 06/15] Update README.md
---
README.md | 50 +++++++++++++++++++++++---------------------------
1 file changed, 23 insertions(+), 27 deletions(-)
diff --git a/README.md b/README.md
index 2fa55edb61..97102b485b 100644
--- a/README.md
+++ b/README.md
@@ -1,26 +1,27 @@
# Deploy machine learning models in production
-Cortex is an open source platform that takes machine learning models—trained with nearly **any framework**—and turns them into **autoscaling**, **production APIs**, in one command.
+Cortex is an open source platform that takes machine learning models—trained with nearly **any framework**—and turns them into **production web APIs** in one command.
+
[install](https://www.cortex.dev/install) • [docs](https://www.cortex.dev) • [examples](examples) • [we're hiring](https://angel.co/cortex-labs-inc/jobs) • [email us](mailto:hello@cortex.dev) • [chat with us](https://gitter.im/cortexlabs/cortex)
-
+
+
+

-
+
+
+
## Cortex quickstart
-Below, we'll walkthrough how to use Cortex to deploy OpenAI's GPT-2 model as a service on AWS.
-
-What you'll need:
+
+Below, we'll walkthrough how to use Cortex to deploy OpenAI's GPT-2 model as a service on AWS. You'll need to [install Cortex](https://www.cortex.dev/install) on your [AWS account](aws.com) before getting started.
-- An AWS account ([you can set one up for free here](aws.com))
-- Cortex installed on your machine ([see instructions here](https://www.cortex.dev/v/0.8/install))
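For reference, the installation flow previously documented in this README (for version 0.8) looked like this; credentials are placeholders, and the [install docs](https://www.cortex.dev/install) have the current instructions:

```bash
# Download the install script and install the Cortex CLI (0.8 instructions)
curl -O https://raw.githubusercontent.com/cortexlabs/cortex/0.8/cortex.sh && chmod +x cortex.sh
./cortex.sh install cli

# Set your AWS credentials (placeholders)
export AWS_ACCESS_KEY_ID=***
export AWS_SECRET_ACCESS_KEY=***

# Install the Cortex cluster in your AWS account
./cortex.sh install
```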
### Step 1. Configure your deployment
-The first step to deploying your model is to upload it to an accessible S3 bucket. For this example, we have uploaded our model to s3://cortex-examples/text-generator/gpt-2/124M.
-Next, create a file named `cortex.yaml`. This file tells Cortex how your deployment should be constructed. Within `cortex.yaml`, define a `deployment` and an `api` resource as shown below:
+Create a `cortex.yaml` file and define a `deployment` and an `api` resource as shown below:
```yaml
- kind: deployment
@@ -41,11 +42,10 @@ You can run the code that generated the exported GPT-2 model [here](https://cola
### Step 2. Add request handling
-The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also translate the model’s prediction to make it readable to users. This can be implemented in a request handler file (which we defined in `cortex.yaml` as `handler.py`) using the pre_inference and post_inference functions. Create a file called `handler.py`, and fill it with this:
+The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also translate the model’s prediction to make it more readable. This can be implemented in a request handler file (which we defined in `cortex.yaml` as `handler.py`) using the pre_inference and post_inference functions. Create a file called `handler.py`, and add this code:
```python
from encoder import get_encoder
-
encoder = get_encoder()
@@ -63,7 +63,7 @@ def post_inference(prediction, metadata):
### Step 3. Deploy to AWS
-Deploying to AWS is as simple as running `cortex deploy` from your command line (from within the same folder as `cortex.yaml` and `handler.py`). `cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on the cluster:
+Deploying to AWS is as simple as running `cortex deploy` from your CLI. `cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on the cluster:
```bash
$ cortex deploy
@@ -79,7 +79,9 @@ You can track the status of a deployment using `cortex get`:
$ cortex get generator --watch
status up-to-date available requested last update avg latency
-live 1 1 1 8s 124ms
+live 1 1 1 8s 123ms
+
+url: http://***.amazonaws.com/text/generator
```
The output above indicates that one replica of the API was requested and one replica is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.
@@ -88,14 +90,7 @@ The output above indicates that one replica of the API was requested and one rep
### Step 4. Serve real-time predictions
-The `cortex get` command returns the endpoint of your new API:
-
-```bash
-$ cortex get generator
-
-url: http://***.amazonaws.com/text/generator
-```
-Once you have your endpoint, you can ping it with sample data from the command line to test its functionality:
+Once you have your endpoint, you can make requests:
```bash
$ curl http://***.amazonaws.com/text/generator \
@@ -112,14 +107,15 @@ Any questions? [chat with us](https://gitter.im/cortexlabs/cortex).
## More Examples
-- [Sentiment analysis](https://github.com/cortexlabs/cortex/tree/0.8/examples/sentiment-analysis) with BERT
+- [Iris classification](https://github.com/cortexlabs/cortex/tree/master/examples/iris-classifier)
-- [Image classification](https://github.com/cortexlabs/cortex/tree/0.8/examples/image-classifier) with Inception v3
+- [Sentiment analysis](https://github.com/cortexlabs/cortex/tree/master/examples/sentiment-analysis) with BERT
-- [Iris classification](https://github.com/cortexlabs/cortex/tree/master/examples/iris-classifier) with the famous iris data set
-
+- [Image classification](https://github.com/cortexlabs/cortex/tree/master/examples/image-classifier) with Inception v3
+
+
-## Key features:
+## Key features
- **Minimal declarative configuration:** Deployments can be defined in a single `cortex.yaml` file.
From a0a1d2f926eaac3896e4d492711d23444310a2d5 Mon Sep 17 00:00:00 2001
From: Omer Spillinger <42219498+ospillinger@users.noreply.github.com>
Date: Tue, 15 Oct 2019 13:05:43 -0700
Subject: [PATCH 07/15] Update README.md
---
README.md | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/README.md b/README.md
index 97102b485b..427a7bf21a 100644
--- a/README.md
+++ b/README.md
@@ -21,9 +21,11 @@ Below, we'll walkthrough how to use Cortex to deploy OpenAI's GPT-2 model as a s
### Step 1. Configure your deployment
-Create a `cortex.yaml` file and define a `deployment` and an `api` resource as shown below:
+Define a `deployment` and an `api` resource as shown below:
```yaml
+# cortex.yaml
+
- kind: deployment
name: text
@@ -33,18 +35,19 @@ Create a `cortex.yaml` file and define a `deployment` and an `api` resource as s
request_handler: handler.py
```
-A `deployment` specifies a set of resources that are deployed as a single unit. An `api` makes a model available as a web service that can serve real-time predictions. The above example configuration will download the model from the `cortex-examples` S3 bucket.
-
-You can run the code that generated the exported GPT-2 model [here](https://colab.research.google.com/github/cortexlabs/cortex/blob/master/examples/text-generator/gpt-2.ipynb).
+
+A `deployment` specifies a set of resources that are deployed as a single unit. An `api` makes a model available as a web service that can serve real-time predictions. The above example configuration will download the model from the `cortex-examples` S3 bucket. You can run the code that generated the exported GPT-2 model [here](https://colab.research.google.com/github/cortexlabs/cortex/blob/master/examples/text-generator/gpt-2.ipynb).
### Step 2. Add request handling
-The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also translate the model’s prediction to make it more readable. This can be implemented in a request handler file (which we defined in `cortex.yaml` as `handler.py`) using the pre_inference and post_inference functions. Create a file called `handler.py`, and add this code:
+The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also decode the inference output. This can be implemented in a request handler file using the `pre_inference` and `post_inference` functions:
```python
+# handler.py
+
from encoder import get_encoder
encoder = get_encoder()
From 029efaa8e2c0aea76247c8d2a981b502757c1cbf Mon Sep 17 00:00:00 2001
From: Omer Spillinger <42219498+ospillinger@users.noreply.github.com>
Date: Tue, 15 Oct 2019 13:07:27 -0700
Subject: [PATCH 08/15] Update README.md
---
README.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index 427a7bf21a..bee30c1637 100644
--- a/README.md
+++ b/README.md
@@ -12,10 +12,10 @@ Cortex is an open source platform that takes machine learning models—trained w
-## Cortex quickstart
+## Quickstart
-Below, we'll walkthrough how to use Cortex to deploy OpenAI's GPT-2 model as a service on AWS. You'll need to [install Cortex](https://www.cortex.dev/install) on your [AWS account](aws.com) before getting started.
+Below, we'll walk through how to use Cortex to deploy OpenAI's GPT-2 model as a service on AWS. You'll need to [install Cortex](https://www.cortex.dev/install) on your [AWS account](aws.com) before getting started.
From 9253c5d4f3fc5f31ead45f8960084c67c6acf877 Mon Sep 17 00:00:00 2001
From: Omer Spillinger <42219498+ospillinger@users.noreply.github.com>
Date: Tue, 15 Oct 2019 13:08:45 -0700
Subject: [PATCH 09/15] Update README.md
---
README.md | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/README.md b/README.md
index bee30c1637..66f42d11b4 100644
--- a/README.md
+++ b/README.md
@@ -19,7 +19,7 @@ Below, we'll walk through how to use Cortex to deploy OpenAI's GPT-2 model as a
-### Step 1. Configure your deployment
+### Step 1. configure your deployment
Define a `deployment` and an `api` resource as shown below:
@@ -41,7 +41,7 @@ A `deployment` specifies a set of resources that are deployed as a single unit.
-### Step 2. Add request handling
+### Step 2. add request handling
The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also decode the inference output. This can be implemented in a request handler file using the `pre_inference` and `post_inference` functions:
@@ -64,7 +64,7 @@ def post_inference(prediction, metadata):
-### Step 3. Deploy to AWS
+### Step 3. deploy to AWS
Deploying to AWS is as simple as running `cortex deploy` from your CLI. `cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on the cluster:
@@ -91,7 +91,7 @@ The output above indicates that one replica of the API was requested and one rep
-### Step 4. Serve real-time predictions
+### Step 4. serve real-time predictions
Once you have your endpoint, you can make requests:
@@ -107,7 +107,7 @@ Any questions? [chat with us](https://gitter.im/cortexlabs/cortex).
-## More Examples
+## More examples
- [Iris classification](https://github.com/cortexlabs/cortex/tree/master/examples/iris-classifier)
From 80b8828ed23835e02b5a04b21883382fed957041 Mon Sep 17 00:00:00 2001
From: Omer Spillinger <42219498+ospillinger@users.noreply.github.com>
Date: Tue, 15 Oct 2019 13:09:46 -0700
Subject: [PATCH 10/15] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 66f42d11b4..99f165e06a 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@ Below, we'll walk through how to use Cortex to deploy OpenAI's GPT-2 model as a
### Step 1. configure your deployment
-Define a `deployment` and an `api` resource as shown below:
+Define a `deployment` and an `api` resource:
```yaml
# cortex.yaml
From 312ebf9f4bf78b995a70fdc27b77f9578f77e551 Mon Sep 17 00:00:00 2001
From: Omer Spillinger <42219498+ospillinger@users.noreply.github.com>
Date: Tue, 15 Oct 2019 13:10:54 -0700
Subject: [PATCH 11/15] Update README.md
---
README.md | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/README.md b/README.md
index 99f165e06a..ce9c75657a 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,9 @@ Below, we'll walk through how to use Cortex to deploy OpenAI's GPT-2 model as a
### Step 1. configure your deployment
-Define a `deployment` and an `api` resource:
+
+
+Define a `deployment` and an `api` resource. A `deployment` specifies a set of resources that are deployed as a single unit. An `api` makes a model available as a web service that can serve real-time predictions. The above example configuration will download the model from the `cortex-examples` S3 bucket. You can run the code that generated the exported GPT-2 model [here](https://colab.research.google.com/github/cortexlabs/cortex/blob/master/examples/text-generator/gpt-2.ipynb).
```yaml
# cortex.yaml
@@ -35,10 +37,6 @@ Define a `deployment` and an `api` resource:
request_handler: handler.py
```
-
-
-A `deployment` specifies a set of resources that are deployed as a single unit. An `api` makes a model available as a web service that can serve real-time predictions. The above example configuration will download the model from the `cortex-examples` S3 bucket. You can run the code that generated the exported GPT-2 model [here](https://colab.research.google.com/github/cortexlabs/cortex/blob/master/examples/text-generator/gpt-2.ipynb).
-
### Step 2. add request handling
From d9d4b11db6ffbe03cb33bf129ff53db112f71072 Mon Sep 17 00:00:00 2001
From: David Eliahu
Date: Tue, 15 Oct 2019 14:05:36 -0700
Subject: [PATCH 12/15] Update README.md
---
README.md | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/README.md b/README.md
index ce9c75657a..ad0a06eccb 100644
--- a/README.md
+++ b/README.md
@@ -14,16 +14,15 @@ Cortex is an open source platform that takes machine learning models—trained w
## Quickstart
-
-Below, we'll walk through how to use Cortex to deploy OpenAI's GPT-2 model as a service on AWS. You'll need to [install Cortex](https://www.cortex.dev/install) on your [AWS account](aws.com) before getting started.
+
+Below, we'll walk through how to use Cortex to deploy OpenAI's GPT-2 model as a service on AWS. You'll need to [install Cortex](https://www.cortex.dev/install) on your AWS account before getting started.
-### Step 1. configure your deployment
+### Step 1: Configure your deployment
-
-Define a `deployment` and an `api` resource. A `deployment` specifies a set of resources that are deployed as a single unit. An `api` makes a model available as a web service that can serve real-time predictions. The above example configuration will download the model from the `cortex-examples` S3 bucket. You can run the code that generated the exported GPT-2 model [here](https://colab.research.google.com/github/cortexlabs/cortex/blob/master/examples/text-generator/gpt-2.ipynb).
+Define a `deployment` and an `api` resource. A `deployment` specifies a set of APIs that are deployed as a single unit. An `api` makes a model available as a web service that can serve real-time predictions. The configuration below will download the model from the `cortex-examples` S3 bucket. You can run the code that generated the exported GPT-2 model [here](https://colab.research.google.com/github/cortexlabs/cortex/blob/master/examples/text-generator/gpt-2.ipynb).
```yaml
# cortex.yaml
@@ -39,7 +38,7 @@ Define a `deployment` and an `api` resource. A `deployment` specifies a set of r
-### Step 2. add request handling
+### Step 2: Add request handling
The model requires encoded data for inference, but the API should accept strings of natural language as input. It should also decode the inference output. This can be implemented in a request handler file using the `pre_inference` and `post_inference` functions:
@@ -62,7 +61,7 @@ def post_inference(prediction, metadata):
-### Step 3. deploy to AWS
+### Step 3: Deploy to AWS
Deploying to AWS is as simple as running `cortex deploy` from your CLI. `cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on the cluster:
@@ -89,7 +88,7 @@ The output above indicates that one replica of the API was requested and one rep
-### Step 4. serve real-time predictions
+### Step 4: Serve real-time predictions
Once you have your endpoint, you can make requests:
@@ -112,22 +111,22 @@ Any questions? [chat with us](https://gitter.im/cortexlabs/cortex).
- [Sentiment analysis](https://github.com/cortexlabs/cortex/tree/master/examples/sentiment-analysis) with BERT
-- [Image classification](https://github.com/cortexlabs/cortex/tree/master/examples/image-classifier) with Inception v3
+- [Image classification](https://github.com/cortexlabs/cortex/tree/master/examples/image-classifier) with Inception v3 and AlexNet
## Key features
-- **Minimal declarative configuration:** Deployments can be defined in a single `cortex.yaml` file.
-
- **Autoscaling:** Cortex automatically scales APIs to handle production workloads.
- **Multi framework:** Cortex supports TensorFlow, Keras, PyTorch, Scikit-learn, XGBoost, and more.
+- **CPU / GPU support:** Cortex can run inference on CPU or GPU infrastructure.
+
- **Rolling updates:** Cortex updates deployed APIs without any downtime.
-- **Log streaming:** Cortex streams logs from your deployed models to your CLI.
+- **Log streaming:** Cortex streams logs from deployed models to your CLI.
-- **Prediction monitoring:** Cortex can monitor network metrics and track predictions.
+- **Prediction monitoring:** Cortex monitors network metrics and tracks predictions.
-- **CPU / GPU support:** Cortex can run inference on CPU or GPU infrastructure.
+- **Minimal declarative configuration:** Deployments are defined in a single `cortex.yaml` file.
From 1827af1b23b541e7322097bdadbcdffe98d6d168 Mon Sep 17 00:00:00 2001
From: Omer Spillinger <42219498+ospillinger@users.noreply.github.com>
Date: Tue, 15 Oct 2019 14:06:47 -0700
Subject: [PATCH 13/15] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index ad0a06eccb..ad36447846 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
# Deploy machine learning models in production
-Cortex is an open source platform that takes machine learning models—trained with nearly **any framework**—and turns them into **production web APIs** in one command.
+Cortex is an open source platform that takes machine learning models—trained with nearly any framework—and turns them into production web APIs in one command.
[install](https://www.cortex.dev/install) • [docs](https://www.cortex.dev) • [examples](examples) • [we're hiring](https://angel.co/cortex-labs-inc/jobs) • [email us](mailto:hello@cortex.dev) • [chat with us](https://gitter.im/cortexlabs/cortex)
From e0f3b6b3677b545b7186bfd4b4c3cf5da4926ab2 Mon Sep 17 00:00:00 2001
From: Omer Spillinger <42219498+ospillinger@users.noreply.github.com>
Date: Tue, 15 Oct 2019 14:08:54 -0700
Subject: [PATCH 14/15] Update README.md
---
README.md | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/README.md b/README.md
index ad36447846..82f4c563cc 100644
--- a/README.md
+++ b/README.md
@@ -63,7 +63,7 @@ def post_inference(prediction, metadata):
### Step 3: Deploy to AWS
-Deploying to AWS is as simple as running `cortex deploy` from your CLI. `cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on the cluster:
+Deploying to AWS is as simple as running `cortex deploy` from your CLI. `cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on the cluster. Behind the scenes, Cortex will containerize the model, make it servable using TensorFlow Serving, expose the endpoint with a load balancer, and orchestrate the workload on Kubernetes.
```bash
$ cortex deploy
@@ -71,9 +71,7 @@ $ cortex deploy
deployment started
```
-Behind the scenes, Cortex containerizes the model, makes it servable using TensorFlow Serving, exposes the endpoint with a load balancer, and orchestrates the workload on Kubernetes.
-
-You can track the status of a deployment using `cortex get`:
+You can track the status of a deployment using `cortex get`. The output below indicates that one replica of the API was requested and one replica is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.
```bash
$ cortex get generator --watch
@@ -84,8 +82,6 @@ live 1 1 1 8s 123ms
url: http://***.amazonaws.com/text/generator
```
-The output above indicates that one replica of the API was requested and one replica is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.
-
### Step 4: Serve real-time predictions
From 5c1ee94a0650ac77775ea806ab15c1d1822de530 Mon Sep 17 00:00:00 2001
From: Omer Spillinger <42219498+ospillinger@users.noreply.github.com>
Date: Tue, 15 Oct 2019 14:10:08 -0700
Subject: [PATCH 15/15] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 82f4c563cc..9687bb8b38 100644
--- a/README.md
+++ b/README.md
@@ -63,7 +63,7 @@ def post_inference(prediction, metadata):
### Step 3: Deploy to AWS
-Deploying to AWS is as simple as running `cortex deploy` from your CLI. `cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on the cluster. Behind the scenes, Cortex will containerize the model, make it servable using TensorFlow Serving, expose the endpoint with a load balancer, and orchestrate the workload on Kubernetes.
+Deploying to AWS is as simple as running `cortex deploy` from your CLI. `cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on the cluster. Behind the scenes, Cortex containerizes the model, makes it servable using TensorFlow Serving, exposes the endpoint with a load balancer, and orchestrates the workload on Kubernetes.
```bash
$ cortex deploy