From 83cb8c95c00a838569ffd27b2aa27b215a364ab8 Mon Sep 17 00:00:00 2001 From: Miguel Varela Ramos Date: Tue, 2 Mar 2021 15:26:48 +0100 Subject: [PATCH 1/7] Add observability category and restructure logging docs --- docs/clusters/aws/logging.md | 41 +--------- docs/clusters/gcp/logging.md | 28 +------ docs/summary.md | 3 + docs/workloads/observability/logging.md | 99 +++++++++++++++++++++++++ 4 files changed, 106 insertions(+), 65 deletions(-) create mode 100644 docs/workloads/observability/logging.md diff --git a/docs/clusters/aws/logging.md b/docs/clusters/aws/logging.md index 20b714f480..e44fc778c8 100644 --- a/docs/clusters/aws/logging.md +++ b/docs/clusters/aws/logging.md @@ -1,41 +1,4 @@ # Logs -By default, logs will be pushed to [CloudWatch](https://console.aws.amazon.com/cloudwatch/home) using fluent-bit. A log group with the same name as your cluster will be created to store your logs. API logs are tagged with labels to help with log aggregation and filtering. Below are some sample CloudWatch Log Insight queries: - -RealtimeAPI: - -```text -fields @timestamp, log -| filter labels.apiName="" -| filter labels.apiKind="RealtimeAPI" -| sort @timestamp asc -| limit 1000 -``` - -BatchAPI: - -```text -fields @timestamp, log -| filter labels.apiName="" -| filter labels.jobID="" -| filter labels.apiKind="BatchAPI" -| sort @timestamp asc -| limit 1000 -``` - -TaskAPI: - -```text -fields @timestamp, log -| filter labels.apiName="" -| filter labels.jobID="" -| filter labels.apiKind="TaskAPI" -| sort @timestamp asc -| limit 1000 -``` - -Please make sure to select the log group for your cluster and adjust the time range accordingly before running the queries. - -## Structured logging - -You can use Cortex's logger in your Python code to log in JSON, which will enrich your logs with Cortex's metadata, and enable you to add custom metadata to the logs. 
See the structured logging docs for [Realtime](../../workloads/realtime/predictors.md#structured-logging), [Batch](../../workloads/batch/predictors.md#structured-logging), and [Task](../../workloads/task/definitions.md#structured-logging) APIs. +Check the observability documentation about [logs on AWS section](../../workloads/observability/logging.md#logs-on-aws) +for more information. diff --git a/docs/clusters/gcp/logging.md b/docs/clusters/gcp/logging.md index 75d0f970eb..75ea4c2e35 100644 --- a/docs/clusters/gcp/logging.md +++ b/docs/clusters/gcp/logging.md @@ -1,28 +1,4 @@ # Logs -By default, logs will be pushed to [StackDriver](https://console.cloud.google.com/logs/query) using fluent-bit. API logs are tagged with labels to help with log aggregation and filtering. Below are some sample Stackdriver queries: - -RealtimeAPI: - -```text -resource.type="k8s_container" -resource.labels.cluster_name="" -labels.apiKind="RealtimeAPI" -labels.apiName="" -``` - -TaskAPI: - -```text -resource.type="k8s_container" -resource.labels.cluster_name="" -labels.apiKind="TaskAPI" -labels.apiName="" -labels.jobID="" -``` - -Please make sure to navigate to the project containing your cluster and adjust the time range accordingly before running queries. - -## Structured logging - -You can use Cortex's logger in your Python code to log in JSON, which will enrich your logs with Cortex's metadata, and enable you to add custom metadata to the logs. See the structured logging docs for [Realtime](../../workloads/realtime/predictors.md#structured-logging) and [Task](../../workloads/task/definitions.md#structured-logging) APIs. +Check the observability documentation about [logs on GCP section](../../workloads/observability/logging.md#logs-on-gcp) +for more information. 
diff --git a/docs/summary.md b/docs/summary.md index b390384a86..3e882f8397 100644 --- a/docs/summary.md +++ b/docs/summary.md @@ -48,6 +48,9 @@ * [Python packages](workloads/dependencies/python-packages.md) * [System packages](workloads/dependencies/system-packages.md) * [Custom images](workloads/dependencies/images.md) +* Observability + * [Logging](workloads/observability/logging.md) + * [Metrics](workloads/observability/metrics.md) ## Clusters diff --git a/docs/workloads/observability/logging.md b/docs/workloads/observability/logging.md new file mode 100644 index 0000000000..7fbef4cfe2 --- /dev/null +++ b/docs/workloads/observability/logging.md @@ -0,0 +1,99 @@ +# Logging + +Cortex provides a logging solution, out-of-the-box, without the need to configure anything. By default, logs are +collected with FluentBit, on every API kind, and are exported to each cloud provider logging solution. +It is also possible to view the logs of a single API replica, while developing, through the `cortex logs` command. + +## Cortex logs command + +The cortex CLI tool provides a command to in + +## Logs on AWS + +For AWS clusters, logs will be pushed to [CloudWatch](https://console.aws.amazon.com/cloudwatch/home) using fluent-bit. +A log group with the same name as your cluster will be created to store your logs. API logs are tagged with labels to +help with log aggregation and filtering. 
+ +Below are some sample CloudWatch Log Insight queries: + +**RealtimeAPI:** + +```text +fields @timestamp, log +| filter labels.apiName="" +| filter labels.apiKind="RealtimeAPI" +| sort @timestamp asc +| limit 1000 +``` + +**BatchAPI:** + +```text +fields @timestamp, log +| filter labels.apiName="" +| filter labels.jobID="" +| filter labels.apiKind="BatchAPI" +| sort @timestamp asc +| limit 1000 +``` + +**TaskAPI:** + +```text +fields @timestamp, log +| filter labels.apiName="" +| filter labels.jobID="" +| filter labels.apiKind="TaskAPI" +| sort @timestamp asc +| limit 1000 +``` + +## Logs on GCP + +Logs will be pushed to [StackDriver](https://console.cloud.google.com/logs/query) using fluent-bit. API logs are tagged +with labels to help with log aggregation and filtering. + +Below are some sample Stackdriver queries: + +**RealtimeAPI:** + +```text +resource.type="k8s_container" +resource.labels.cluster_name="" +labels.apiKind="RealtimeAPI" +labels.apiName="" +``` + +**BatchAPI:** + +```text +resource.type="k8s_container" +resource.labels.cluster_name="" +labels.apiKind="BatchAPI" +labels.apiName="" +labels.jobID="" +``` + +**TaskAPI:** + +```text +resource.type="k8s_container" +resource.labels.cluster_name="" +labels.apiKind="TaskAPI" +labels.apiName="" +labels.jobID="" +``` + +Please make sure to navigate to the project containing your cluster and adjust the time range accordingly before running +queries. + +## Structured logging + +You can use Cortex's logger in your Python code to log in JSON, which will enrich your logs with Cortex's metadata, and +enable you to add custom metadata to the logs. 
+ +See the structured logging docs for each API kind: + +- [RealtimeAPI](../../workloads/realtime/predictors.md#structured-logging) +- [BatchAPI](../../workloads/batch/predictors.md#structured-logging) +- [TaskAPI](../../workloads/task/definitions.md#structured-logging) From 7f08dda6e66f20520bbb1e14af35b62f54728739 Mon Sep 17 00:00:00 2001 From: Miguel Varela Ramos Date: Tue, 2 Mar 2021 15:37:28 +0100 Subject: [PATCH 2/7] Update logging docs --- docs/workloads/observability/logging.md | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/docs/workloads/observability/logging.md b/docs/workloads/observability/logging.md index 7fbef4cfe2..9c72f24086 100644 --- a/docs/workloads/observability/logging.md +++ b/docs/workloads/observability/logging.md @@ -1,12 +1,25 @@ # Logging Cortex provides a logging solution, out-of-the-box, without the need to configure anything. By default, logs are -collected with FluentBit, on every API kind, and are exported to each cloud provider logging solution. -It is also possible to view the logs of a single API replica, while developing, through the `cortex logs` command. +collected with FluentBit, on every API kind, and are exported to each cloud provider logging solution. It is also +possible to view the logs of a single API replica, while developing, through the `cortex logs` command. ## Cortex logs command -The cortex CLI tool provides a command to in +The cortex CLI tool provides a command to quickly check the logs for a single API replica while debugging. + +To check the logs of an API run one of the following commands: + +```shell +# RealtimeAPI +cortex logs + +# BatchAPI or TaskAPI +cortex logs # the jobs needs to be in a running state +``` + +**Important:** this method won't show the logs for all the API replicas and therefore is not a complete logging +solution. 
 
 ## Logs on AWS
 
From 1b92b494721801c2afc62ed6caad8c232e5eff82 Mon Sep 17 00:00:00 2001
From: Miguel Varela Ramos
Date: Tue, 2 Mar 2021 17:53:52 +0100
Subject: [PATCH 3/7] Add custom user metrics docs

---
 docs/workloads/batch/metrics.md         | 29 ++++++
 docs/workloads/observability/logging.md |  2 +-
 docs/workloads/observability/metrics.md | 130 ++++++++++++++++++++++++
 docs/workloads/realtime/metrics.md      | 72 ++++---------
 docs/workloads/task/metrics.md          | 28 +++++
 5 files changed, 208 insertions(+), 53 deletions(-)
 create mode 100644 docs/workloads/batch/metrics.md
 create mode 100644 docs/workloads/observability/metrics.md
 create mode 100644 docs/workloads/task/metrics.md

diff --git a/docs/workloads/batch/metrics.md b/docs/workloads/batch/metrics.md
new file mode 100644
index 0000000000..ade418b6a9
--- /dev/null
+++ b/docs/workloads/batch/metrics.md
@@ -0,0 +1,29 @@
+# Metrics
+
+## Custom user metrics
+
+It is possible to export custom user metrics by adding the `metrics_client`
+argument to the predictor constructor. Below there is an example on how to use the metrics client with
+the `PythonPredictor` type. The implementation would be similar to other predictor types.
+
+```python
+class PythonPredictor:
+    def __init__(self, config, metrics_client):
+        self.metrics = metrics_client
+
+    def predict(self, payload):
+        # --- my predict code here ---
+        result = ...
+
+        # increment a counter with name "my_counter" and tags model:v1
+        self.metrics.increment(metric="my_counter", value=1, tags={"model": "v1"})
+
+        # set the value for a gauge with name "my_gauge" and tags model:v1
+        self.metrics.gauge(metric="my_gauge", value=42, tags={"model": "v1"})
+
+        # set the value for a histogram with name "my_histogram" and tags model:v1
+        self.metrics.histogram(metric="my_histogram", value=100, tags={"model": "v1"})
+```
+
+Refer to the [observability documentation](../observability/metrics.md#custom-user-metrics) for more information on
+custom metrics.
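Because `metrics_client` is injected into the predictor's constructor, a predictor like the one above can be exercised outside of a cluster by passing in a stand-in object. The `StubMetricsClient` below is a hypothetical test helper (not part of Cortex); it only mirrors the three method names shown in these docs:

```python
from collections import defaultdict


class StubMetricsClient:
    """Hypothetical stand-in that records pushes instead of sending them (not part of Cortex)."""

    def __init__(self):
        self.pushed = defaultdict(list)

    def increment(self, metric, value=1, tags=None):
        self.pushed[metric].append((value, tags))

    def gauge(self, metric, value, tags=None):
        self.pushed[metric].append((value, tags))

    def histogram(self, metric, value, tags=None):
        self.pushed[metric].append((value, tags))


class PythonPredictor:
    def __init__(self, config, metrics_client):
        self.metrics = metrics_client

    def predict(self, payload):
        result = len(payload)  # placeholder for real inference
        self.metrics.increment(metric="my_counter", value=1, tags={"model": "v1"})
        self.metrics.histogram(metric="my_histogram", value=result, tags={"model": "v1"})
        return result


stub = StubMetricsClient()
predictor = PythonPredictor(config={}, metrics_client=stub)
predictor.predict("abcd")
```

Inspecting `stub.pushed` after a call then lets a unit test assert which metrics the predictor emitted.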
diff --git a/docs/workloads/observability/logging.md b/docs/workloads/observability/logging.md index 9c72f24086..b57436bbfa 100644 --- a/docs/workloads/observability/logging.md +++ b/docs/workloads/observability/logging.md @@ -15,7 +15,7 @@ To check the logs of an API run one of the following commands: cortex logs # BatchAPI or TaskAPI -cortex logs # the jobs needs to be in a running state +cortex logs # the job needs to be in a running state ``` **Important:** this method won't show the logs for all the API replicas and therefore is not a complete logging diff --git a/docs/workloads/observability/metrics.md b/docs/workloads/observability/metrics.md new file mode 100644 index 0000000000..b0107acc6c --- /dev/null +++ b/docs/workloads/observability/metrics.md @@ -0,0 +1,130 @@ +# Metrics + +A cortex cluster includes a deployment of Prometheus for metrics collections and a deployment of Grafana for +visualization. You can monitor your APIs with the Grafana dashboards that ship with Cortex, or even add custom metrics +and dashboards. + +## Accessing the dashboard + +The dashboard URL is displayed once you run a `cortex get ` command. + +Alternatively, you can access it on `http:///dashboard`. Run the following command to get the operator +URL: + +```shell +cortex env list +``` + +If your operator load balancer is configured to be internal, there are a few options for accessing the dashboard: + +1. Access the dashboard from a machine that has VPC Peering configured to your cluster's VPC, or which is inside of your + cluster's VPC +1. Run `kubectl port-forward -n default grafana-0 3000:3000` to forward Grafana's port to your local machine, and access + the dashboard on [http://localhost:3000/](http://localhost:3000/) (see instructions for setting up `kubectl` + on [AWS](../../clusters/aws/kubectl.md) or [GCP](../../clusters/gcp/kubectl.md)) +1. 
Set up VPN access to your cluster's
+   VPC ([AWS docs](https://docs.aws.amazon.com/vpc/latest/userguide/vpn-connections.html))
+
+### Default credentials
+
+The dashboard is protected with username / password authentication; the default credentials are:
+
+- Username: admin
+- Password: admin
+
+You will be prompted to change the admin user password the first time you log in.
+
+Grafana supports managing access for multiple users and teams. For more information on this topic, check
+the [Grafana documentation](https://grafana.com/docs/grafana/latest/manage-users/).
+
+### Selecting an API
+
+You can select one or more APIs to visualize in the top left corner of the dashboard.
+
+![](https://user-images.githubusercontent.com/7456627/107375721-57545180-6ae9-11eb-9474-ba58ad7eb0c5.png)
+
+### Selecting a time range
+
+Grafana allows you to select a time range over which the metrics will be visualized. You can do so in the top right corner
+of the dashboard.
+
+![](https://user-images.githubusercontent.com/7456627/107376148-d9dd1100-6ae9-11eb-8c2b-c678b41ade01.png)
+
+**Note: Cortex only retains a maximum of 2 weeks' worth of data at any given time.**
+
+### Available dashboards
+
+More than one dashboard is available by default. You can view the available dashboards by accessing the Grafana
+menu: `Dashboards -> Manage -> Cortex folder`.
+
+The dashboards that Cortex ships with are the following:
+
+- RealtimeAPI
+- BatchAPI
+- Cluster resources
+- Node resources
+
+## Exposed metrics
+
+Cortex exposes additional metrics through Prometheus that can be useful. To check the available metrics, access
+the `Explore` menu in Grafana and press the `Metrics` button.
+
+![](https://user-images.githubusercontent.com/7456627/107377492-515f7000-6aeb-11eb-9b46-909120335060.png)
+
+You can use any of these metrics to set up your own dashboards.
+
+## Custom user metrics
+
+It is possible to export your own custom metrics by using the `MetricsClient` class in your predictor code. This allows
+you to create custom metrics from your deployed API that can later be used on your own custom dashboards.
+
+Code examples of how to use custom metrics for each API kind can be found here:
+
+- [RealtimeAPI](../realtime/metrics.md#custom-user-metrics)
+- [BatchAPI](../batch/metrics.md#custom-user-metrics)
+- [TaskAPI](../task/metrics.md#custom-user-metrics)
+
+### Metric types
+
+Currently, we only support 3 different metric types that will be converted to their respective Prometheus types:
+
+- [Counter](https://prometheus.io/docs/concepts/metric_types/#counter) - a cumulative metric that represents a single
+  monotonically increasing counter whose value can only increase or be reset to zero on restart.
+- [Gauge](https://prometheus.io/docs/concepts/metric_types/#gauge) - a single numerical value that can arbitrarily go up
+  and down.
+- [Histogram](https://prometheus.io/docs/concepts/metric_types/#histogram) - samples observations (usually things like
+  request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed
+  values.
+
+### Metrics client class reference
+
+```python
+class MetricsClient:
+
+    def gauge(self, metric: str, value: float, tags: Dict[str, str] = None):
+        """
+        Record the value of a gauge.
+
+        Example:
+        >>> metrics.gauge('active_connections', 1001, tags={"protocol": "http"})
+        """
+        pass
+
+    def increment(self, metric: str, value: float = 1, tags: Dict[str, str] = None):
+        """
+        Increment the value of a counter.
+ + Example: + >>> metrics.increment('model_calls', 1, tags={"model_version": "v1"}) + """ + pass + + def histogram(self, metric: str, value: float, tags: Dict[str, str] = None): + """ + Set the value in a histogram metric + + Example: + >>> metrics.histogram('inference_time_milliseconds', 120, tags={"model_version": "v1"}) + """ + pass +``` diff --git a/docs/workloads/realtime/metrics.md b/docs/workloads/realtime/metrics.md index 6637967c1d..9c9f291df1 100644 --- a/docs/workloads/realtime/metrics.md +++ b/docs/workloads/realtime/metrics.md @@ -33,62 +33,30 @@ The `cortex get API_NAME` command also provides a link to a Grafana dashboard: | p50 Latency | 50th percentile latency, computed over a minute, for an API | Value might not be accurate because the histogram buckets are not dynamically set. | | Average Latency | Average latency, computed over a minute, for an API | | ---- - -## Accessing the dashboard - -The dashboard URL is displayed once you run a `cortex get ` command. - -Alternatively, you can access it on `http:///dashboard`. Run the following command to get the operator -URL: - -```shell -cortex env list -``` - -If your operator load balancer is configured to be internal, there are a few options for accessing the dashboard: - -1. Access the dashboard from a machine that has VPC Peering configured to your cluster's VPC, or which is inside of your cluster's VPC -1. Run `kubectl port-forward -n default grafana-0 3000:3000` to forward Grafana's port to your local machine, and access the dashboard on [http://localhost:3000/](http://localhost:3000/) (see instructions for setting up `kubectl` on [AWS](../../clusters/aws/kubectl.md) or [GCP](../../clusters/gcp/kubectl.md)) -1. 
Set up VPN access to your cluster's VPC ([docs](https://docs.aws.amazon.com/vpc/latest/userguide/vpn-connections.html)) - -### Default credentials - -The dashboard is protected with username / password authentication, which by default are: +## Custom user metrics -- Username: admin -- Password: admin +It is possible to export custom user metrics by adding the `metrics_client` +argument to the predictor constructor. Below there is an example on how to use the metrics client with +the `PythonPredictor` type. The implementation would be similar to other predictor types. -You will be prompted to change the admin user password in the first time you log in. +```python +class PythonPredictor: + def __init__(self, config, metrics_client): + self.metrics = metrics_client -Grafana allows managing the access of several users and managing teams. For more information on this topic check -the [grafana documentation](https://grafana.com/docs/grafana/latest/manage-users/). + def predict(self, payload): + # --- my predict code here --- + result = ... -### Selecting an API + # increment a counter with name "my_metric" and tags model:v1 + self.metrics.increment(metric="my_counter", value=1, tags={"model": "v1"}) -You can select one or more APIs to visualize in the top left corner of the dashboard. + # set the value for a gauge with name "my_gauge" and tags model:v1 + self.metrics.gauge(metric="my_gauge", value=42, tags={"model": "v1"}) -![](https://user-images.githubusercontent.com/7456627/107375721-57545180-6ae9-11eb-9474-ba58ad7eb0c5.png) - -### Selecting a time range - -Grafana allows you to select a time range on which the metrics will be visualized. You can do so in the top right corner -of the dashboard. 
- -![](https://user-images.githubusercontent.com/7456627/107376148-d9dd1100-6ae9-11eb-8c2b-c678b41ade01.png) - -**Note: Cortex only retains a maximum of 2 weeks worth of data at any moment in time** - -### Available dashboards - -There are more than one dashboard available by default. You can view the available dashboards by accessing the Grafana -menu: `Dashboards -> Manage -> Cortex folder`. - -## Exposed metrics - -Cortex exposes more metrics with Prometheus, that can be potentially useful. To check the available metrics, access -the `Explore` menu in grafana and press the `Metrics` button. - -![](https://user-images.githubusercontent.com/7456627/107377492-515f7000-6aeb-11eb-9b46-909120335060.png) + # set the value for an histogram with name "my_histogram" and tags model:v1 + self.metrics.histogram(metric="my_histogram", value=100, tags={"model": "v1"}) +``` -You can use any of these metrics to set up your own dashboards. +Refer to the [observability documentation](../observability/metrics.md#custom-user-metrics) for more information on +custom metrics. diff --git a/docs/workloads/task/metrics.md b/docs/workloads/task/metrics.md new file mode 100644 index 0000000000..677ae6562a --- /dev/null +++ b/docs/workloads/task/metrics.md @@ -0,0 +1,28 @@ +## Custom user metrics + +It is possible to export custom user metrics by adding the `metrics_client` +argument to the task definition constructor. Below there is an example on how to use the metrics client. + +Currently, it is only possible to instantiate the metrics client from a class task definition. + +```python +class Task: + def __init__(self, metrics_client): + self.metrics = metrics_client + + def __call__(self, config): + # --- my task code here --- + ... 
+
+        # increment a counter with name "my_counter" and tags model:v1
+        self.metrics.increment(metric="my_counter", value=1, tags={"model": "v1"})
+
+        # set the value for a gauge with name "my_gauge" and tags model:v1
+        self.metrics.gauge(metric="my_gauge", value=42, tags={"model": "v1"})
+
+        # set the value for a histogram with name "my_histogram" and tags model:v1
+        self.metrics.histogram(metric="my_histogram", value=100, tags={"model": "v1"})
+```
+
+Refer to the [observability documentation](../observability/metrics.md#custom-user-metrics) for more information on
+custom metrics.

From 65fefe7bc2a5b52b3774deeeb90d8f30d9ed23bf Mon Sep 17 00:00:00 2001
From: Miguel Varela Ramos
Date: Tue, 2 Mar 2021 17:58:02 +0100
Subject: [PATCH 4/7] Add note about the metrics client

---
 docs/workloads/batch/metrics.md    | 3 +++
 docs/workloads/realtime/metrics.md | 3 +++
 docs/workloads/task/metrics.md     | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/docs/workloads/batch/metrics.md b/docs/workloads/batch/metrics.md
index ade418b6a9..5086eb2b92 100644
--- a/docs/workloads/batch/metrics.md
+++ b/docs/workloads/batch/metrics.md
@@ -27,3 +27,6 @@ class PythonPredictor:
 
 Refer to the [observability documentation](../observability/metrics.md#custom-user-metrics) for more information on
 custom metrics.
+
+**Note**: The metrics client uses the UDP protocol to push metrics, to be fault tolerant, so if it fails during a
+metrics push there is no exception thrown.
diff --git a/docs/workloads/realtime/metrics.md b/docs/workloads/realtime/metrics.md
index 9c9f291df1..24f380427c 100644
--- a/docs/workloads/realtime/metrics.md
+++ b/docs/workloads/realtime/metrics.md
@@ -60,3 +60,6 @@ class PythonPredictor:
 
 Refer to the [observability documentation](../observability/metrics.md#custom-user-metrics) for more information on
 custom metrics.
+
+**Note**: The metrics client uses the UDP protocol to push metrics, to be fault tolerant, so if it fails during a
+metrics push there is no exception thrown.
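The fault-tolerance note above follows from how UDP itself behaves: a datagram send is fire-and-forget. The sketch below illustrates this with plain sockets; the wire format and port are made up for illustration, and Cortex's actual transport details may differ.

```python
import socket

# UDP is connectionless: sendto() hands the datagram to the OS and returns
# immediately, whether or not anything is listening at the destination.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
payload = b"my_counter:1|c|#model:v1"  # hypothetical wire format, for illustration only
sent = sock.sendto(payload, ("127.0.0.1", 9999))  # arbitrary port, likely no listener
sock.close()

# sendto() reports the bytes handed off; no exception is raised even though
# the datagram may simply be dropped on the floor.
assert sent == len(payload)
```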
diff --git a/docs/workloads/task/metrics.md b/docs/workloads/task/metrics.md index 677ae6562a..642dbc9d95 100644 --- a/docs/workloads/task/metrics.md +++ b/docs/workloads/task/metrics.md @@ -26,3 +26,6 @@ class Task: Refer to the [observability documentation](../observability/metrics.md#custom-user-metrics) for more information on custom metrics. + +**Note**: The metrics client uses the UDP protocol to push metrics, to be fault tolerant, so if it fails during a +metrics push there is no exception thrown. From dae3435e957b4b4f8cc98c6a801b67533f36cec8 Mon Sep 17 00:00:00 2001 From: Miguel Varela Ramos Date: Tue, 2 Mar 2021 18:01:30 +0100 Subject: [PATCH 5/7] Fix typo --- docs/workloads/batch/metrics.md | 2 +- docs/workloads/realtime/metrics.md | 2 +- docs/workloads/task/metrics.md | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/workloads/batch/metrics.md b/docs/workloads/batch/metrics.md index 5086eb2b92..95d3817d42 100644 --- a/docs/workloads/batch/metrics.md +++ b/docs/workloads/batch/metrics.md @@ -3,7 +3,7 @@ ## Custom user metrics It is possible to export custom user metrics by adding the `metrics_client` -argument to the predictor constructor. Below there is an example on how to use the metrics client with +argument to the predictor constructor. Below there is an example of how to use the metrics client with the `PythonPredictor` type. The implementation would be similar to other predictor types. ```python diff --git a/docs/workloads/realtime/metrics.md b/docs/workloads/realtime/metrics.md index 24f380427c..9ef19306c6 100644 --- a/docs/workloads/realtime/metrics.md +++ b/docs/workloads/realtime/metrics.md @@ -36,7 +36,7 @@ The `cortex get API_NAME` command also provides a link to a Grafana dashboard: ## Custom user metrics It is possible to export custom user metrics by adding the `metrics_client` -argument to the predictor constructor. 
Below there is an example on how to use the metrics client with +argument to the predictor constructor. Below there is an example of how to use the metrics client with the `PythonPredictor` type. The implementation would be similar to other predictor types. ```python diff --git a/docs/workloads/task/metrics.md b/docs/workloads/task/metrics.md index 642dbc9d95..77c012e567 100644 --- a/docs/workloads/task/metrics.md +++ b/docs/workloads/task/metrics.md @@ -1,7 +1,7 @@ ## Custom user metrics It is possible to export custom user metrics by adding the `metrics_client` -argument to the task definition constructor. Below there is an example on how to use the metrics client. +argument to the task definition constructor. Below there is an example of how to use the metrics client. Currently, it is only possible to instantiate the metrics client from a class task definition. @@ -13,7 +13,7 @@ class Task: def __call__(self, config): # --- my task code here --- ... - + # increment a counter with name "my_metric" and tags model:v1 self.metrics.increment(metric="my_counter", value=1, tags={"model": "v1"}) From e111b469048be3704e9016bfe6cce2812f518fdf Mon Sep 17 00:00:00 2001 From: Miguel Varela Ramos Date: Tue, 2 Mar 2021 19:14:28 +0100 Subject: [PATCH 6/7] Add docs on how to push metrics --- docs/workloads/observability/metrics.md | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/docs/workloads/observability/metrics.md b/docs/workloads/observability/metrics.md index b0107acc6c..2837c80123 100644 --- a/docs/workloads/observability/metrics.md +++ b/docs/workloads/observability/metrics.md @@ -95,6 +95,26 @@ Currently, we only support 3 different metric types that will be converted to it - [Histogram](https://prometheus.io/docs/concepts/metric_types/#histogram) - samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values. 
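As a rough mental model for the three types listed above (our own sketch of the aggregation semantics, not how Prometheus actually stores data):

```python
# Counter: only the monotonically increasing running total matters.
counter = 0
for v in [1, 1, 3]:
    counter += v

# Gauge: only the most recently pushed value matters.
gauge = None
for v in [10, 55, 42]:
    gauge = v

# Histogram: each observation is counted into cumulative buckets (upper
# bounds, like Prometheus's `le` label), and a sum of observations is kept.
buckets = {50: 0, 100: 0, float("inf"): 0}
total = 0.0
for v in [12, 70, 250]:
    total += v
    for bound in buckets:
        if v <= bound:
            buckets[bound] += 1
```

After these pushes, the counter holds 5, the gauge holds 42, and the histogram has counted one observation at or below 50, two at or below 100, and three overall, with a sum of 332.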
+ +### Pushing metrics + + - Counter + + ```python + metrics.increment('my_counter', value=1, tags={"tag": "tag_name"}) + ``` + + - Gauge + + ```python + metrics.gauge('active_connections', value=1001, tags={"tag": "tag_name"}) + ``` + + - Histogram + + ```python + metrics.histogram('inference_time_milliseconds', 120, tags={"tag": "tag_name"}) + ``` ### Metrics client class reference From 58f1702c517172cd7d8614daad9b1318eda0d7c1 Mon Sep 17 00:00:00 2001 From: vishal Date: Tue, 2 Mar 2021 15:59:34 -0500 Subject: [PATCH 7/7] Remove pointer docs and fix linting --- docs/clusters/aws/logging.md | 4 ---- docs/clusters/gcp/logging.md | 4 ---- docs/summary.md | 2 -- docs/workloads/observability/metrics.md | 6 +++--- 4 files changed, 3 insertions(+), 13 deletions(-) delete mode 100644 docs/clusters/aws/logging.md delete mode 100644 docs/clusters/gcp/logging.md diff --git a/docs/clusters/aws/logging.md b/docs/clusters/aws/logging.md deleted file mode 100644 index e44fc778c8..0000000000 --- a/docs/clusters/aws/logging.md +++ /dev/null @@ -1,4 +0,0 @@ -# Logs - -Check the observability documentation about [logs on AWS section](../../workloads/observability/logging.md#logs-on-aws) -for more information. diff --git a/docs/clusters/gcp/logging.md b/docs/clusters/gcp/logging.md deleted file mode 100644 index 75ea4c2e35..0000000000 --- a/docs/clusters/gcp/logging.md +++ /dev/null @@ -1,4 +0,0 @@ -# Logs - -Check the observability documentation about [logs on GCP section](../../workloads/observability/logging.md#logs-on-gcp) -for more information. 
diff --git a/docs/summary.md b/docs/summary.md index 3e882f8397..e432fe62e9 100644 --- a/docs/summary.md +++ b/docs/summary.md @@ -59,7 +59,6 @@ * [Update](clusters/aws/update.md) * [Auth](clusters/aws/auth.md) * [Security](clusters/aws/security.md) - * [Logging](clusters/aws/logging.md) * [Spot instances](clusters/aws/spot.md) * [Networking](clusters/aws/networking/index.md) * [Custom domain](clusters/aws/networking/custom-domain.md) @@ -69,7 +68,6 @@ * [Uninstall](clusters/aws/uninstall.md) * GCP * [Install](clusters/gcp/install.md) - * [Logging](clusters/gcp/logging.md) * [Credentials](clusters/gcp/credentials.md) * [Setting up kubectl](clusters/gcp/kubectl.md) * [Uninstall](clusters/gcp/uninstall.md) diff --git a/docs/workloads/observability/metrics.md b/docs/workloads/observability/metrics.md index 2837c80123..6f3a13a91a 100644 --- a/docs/workloads/observability/metrics.md +++ b/docs/workloads/observability/metrics.md @@ -95,17 +95,17 @@ Currently, we only support 3 different metric types that will be converted to it - [Histogram](https://prometheus.io/docs/concepts/metric_types/#histogram) - samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values. - + ### Pushing metrics - Counter - + ```python metrics.increment('my_counter', value=1, tags={"tag": "tag_name"}) ``` - Gauge - + ```python metrics.gauge('active_connections', value=1001, tags={"tag": "tag_name"}) ```
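Tying the pushing examples together: a common pattern is to time a block of work and record the duration as a histogram alongside a call counter. `NoopMetricsClient` below is a stand-in with the same method names as the injected client; inside a deployed API, the provided `metrics_client` would be used instead, and the `timed` helper is our own sketch.

```python
import time
from contextlib import contextmanager


class NoopMetricsClient:
    """Stand-in mirroring the injected client's methods; discards all pushes."""

    def increment(self, metric, value=1, tags=None):
        pass

    def gauge(self, metric, value, tags=None):
        pass

    def histogram(self, metric, value, tags=None):
        pass


metrics = NoopMetricsClient()


@contextmanager
def timed(metric, tags=None):
    """Push the elapsed wall-clock time of the wrapped block as a histogram value."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        metrics.histogram(metric, elapsed_ms, tags=tags)


with timed("inference_time_milliseconds", tags={"model_version": "v1"}):
    result = sum(i * i for i in range(1000))  # placeholder for real inference work

metrics.increment("model_calls", 1, tags={"model_version": "v1"})
```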