
Update docs #1949

Merged 2 commits on Mar 12, 2021

86 changes: 40 additions & 46 deletions README.md
@@ -6,70 +6,64 @@

 <br>

-# Model serving at scale
+# Deploy, manage, and scale machine learning models in production

-Cortex is a platform for deploying, managing, and scaling machine learning in production.
+Cortex is a cloud native model serving platform for machine learning engineering teams.

 <br>

-## Key features
+## Use cases

-* Run realtime inference, batch inference, and training workloads.
-* Deploy TensorFlow, PyTorch, ONNX, and other models to production.
-* Scale to handle production workloads with server-side batching and request-based autoscaling.
-* Configure rolling updates and live model reloading to update APIs without downtime.
-* Serve models efficiently with multi-model caching and spot / preemptible instances.
-* Stream performance metrics and structured logs to any monitoring tool.
-* Perform A/B tests with configurable traffic splitting.
+* **Realtime machine learning** - build NLP, computer vision, and other APIs and integrate them into any application.
+* **Large-scale inference** - scale realtime or batch inference workloads across hundreds or thousands of instances.
+* **Consistent MLOps workflows** - create streamlined and reproducible MLOps workflows for any machine learning team.

 <br>

-## How it works
+## Deploy

-### Implement a Predictor
+* Deploy TensorFlow, PyTorch, ONNX, and other models using a simple CLI or Python client.
+* Run realtime inference, batch inference, asynchronous inference, and training jobs.
+* Define preprocessing and postprocessing steps in Python and chain workloads seamlessly.

-```python
-# predictor.py
+```text
+$ cortex deploy apis.yaml

-from transformers import pipeline
+• creating text-generator (realtime API)
+• creating image-classifier (batch API)
+• creating video-analyzer (async API)

-class PythonPredictor:
-    def __init__(self, config):
-        self.model = pipeline(task="text-generation")
-
-    def predict(self, payload):
-        return self.model(payload["text"])[0]
-```
-
-### Configure a realtime API
-
-```yaml
-# text_generator.yaml
-
-- name: text-generator
-  kind: RealtimeAPI
-  predictor:
-    type: python
-    path: predictor.py
-  compute:
-    gpu: 1
-    mem: 8Gi
-  autoscaling:
-    min_replicas: 1
-    max_replicas: 10
+all APIs are ready!
 ```

-### Deploy
+## Manage

-```bash
-$ cortex deploy text_generator.yaml
+* Create A/B tests and shadow pipelines with configurable traffic splitting.
+* Automatically stream logs from every workload to your favorite log management tool.
+* Monitor your workloads with pre-built Grafana dashboards and add your own custom dashboards.

-# creating http://example.com/text-generator
+```text
+$ cortex get

+API                 TYPE        GPUs
+text-generator      realtime    32
+image-classifier    batch       64
+video-analyzer      async       16
 ```

-### Serve prediction requests
+## Scale

+* Configure workload and cluster autoscaling to efficiently handle large-scale production workloads.
+* Create clusters with different types of instances for different types of workloads.
+* Spend less on cloud infrastructure by letting Cortex manage spot or preemptible instances.
+
+```text
+$ cortex cluster info

-```bash
-$ curl http://example.com/text-generator -X POST -H "Content-Type: application/json" -d '{"text": "hello world"}'
+provider: aws
+region: us-east-1
+instance_types: [c5.xlarge, g4dn.xlarge]
+spot_instances: true
+min_instances: 10
+max_instances: 100
 ```
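For context on the "Python client" mentioned in the new Deploy bullets, a minimal sketch of a programmatic deployment is below. The `cortex.client()` and `create_api()` calls, and the `"aws"` environment name, are assumptions about the Python client of this era rather than an excerpt from this PR; the spec fields mirror the `text_generator.yaml` that the old README used.

```python
# Hypothetical sketch: deploying a realtime API with the Cortex Python client.
# The client() environment name and the create_api() signature are assumptions.
import cortex

cx = cortex.client("aws")  # assumes an environment named "aws" is configured

api_spec = {
    "name": "text-generator",
    "kind": "RealtimeAPI",
    "predictor": {
        "type": "python",
        "path": "predictor.py",  # same predictor layout as the old README
    },
}

cx.create_api(api_spec)  # equivalent in spirit to `cortex deploy apis.yaml`
```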
2 changes: 1 addition & 1 deletion cli/cmd/root.go
@@ -148,7 +148,7 @@ func initTelemetry() {
 var _rootCmd = &cobra.Command{
 	Use:     "cortex",
 	Aliases: []string{"cx"},
-	Short:   "model serving at scale",
+	Short:   "deploy machine learning models to production",
 }

 func Execute() {
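The `Aliases` field in the snippet above registers `cx` as a shorthand for the `cortex` binary, so every subcommand can be invoked either way; a quick illustration (output elided):

```text
$ cortex get   # full command name
$ cx get       # same command via the "cx" alias
```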
2 changes: 1 addition & 1 deletion docs/README.md
@@ -1 +1 @@
-**Please view our documentation at [docs.cortex.dev](https://docs.cortex.dev/)**
+**Please view our documentation at [docs.cortex.dev](https://docs.cortex.dev)**

Member: What's the purpose of removing this?
11 changes: 0 additions & 11 deletions docs/clients/telemetry.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/summary.md
@@ -8,7 +8,6 @@
 * [CLI commands](clients/cli.md)
 * [Python API](clients/python.md)
 * [Environments](clients/environments.md)
-* [Telemetry](clients/telemetry.md)
 * [Uninstall](clients/uninstall.md)

 ## Workloads
2 changes: 1 addition & 1 deletion pkg/cortex/client/README.md
@@ -1 +1 @@
-Model serving at scale - [docs.cortex.dev](https://www.docs.cortex.dev)
+Deploy machine learning models to production - [docs.cortex.dev](https://www.docs.cortex.dev)
2 changes: 1 addition & 1 deletion pkg/cortex/client/setup.py
@@ -77,7 +77,7 @@ def run(self):
 setup(
     name="cortex",
     version="master",  # CORTEX_VERSION
-    description="Model serving at scale",
+    description="Deploy machine learning models to production",
     author="cortex.dev",
     author_email="dev@cortex.dev",
     license="Apache License 2.0",
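The `version="master"` placeholder next to the `# CORTEX_VERSION` comment suggests the real version string is substituted at release time. Assuming the package is published to PyPI under the name given in `setup()`, installation would look like:

```text
$ pip install cortex
```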