mpds-orange

Overall system design

Autoscaling system

There are 2 Kafka topics used by the autoscaling system:

Metric for message rate metrics
Prediction for output results of the prediction models

Metrics

Example of a Prometheus API request and response:

Here's the request. It is not a single HTTP request, but a separate request for each metric. Although it is technically possible to get all the metrics at once (as per the documentation), we would have to do aggregations on our own. Having multiple queries is not a problem, however. We can simply specify a time field in the HTTP request to retrieve a consistent set of metrics.

request() {
	address="http://prometheus:30090"
	endpoint="/api/v1/query"
	url="${address}${endpoint}"
	time="2021-02-08T10:10:51.781Z"

	curl -Ss -X POST -F query="$1" -F time="$time" "$url" | jq
}

request "avg(avg by (operator_id) (flink_taskmanager_job_latency_source_id_operator_id_operator_subtask_index_latency{quantile=\"0.95\"}))"
request "avg(kafka_server_brokertopicmetrics_total_messagesinpersec_count)"
request kafka_controller_kafkacontroller_controllerstate_value

The response:

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [
          1612779051.781,
          "181.43333333333334"
        ]
      }
    ]
  }
}
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [
          1612779051.781,
          "112629192.33333334"
        ]
      }
    ]
  }
}
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "kafka_controller_kafkacontroller_controllerstate_value",
          "app_kubernetes_io_component": "kafka",
          "app_kubernetes_io_instance": "mpds",
          "app_kubernetes_io_managed_by": "Helm",
          "app_kubernetes_io_name": "kafka",
          "controller_revision_hash": "kafka-7dc6cd8b54",
          "helm_sh_chart": "kafka-11.8.2",
          "instance": "10.1.0.10:5556",
          "job": "kubernetes-pods",
          "kubernetes_namespace": "default",
          "kubernetes_pod_name": "kafka-1",
          "statefulset_kubernetes_io_pod_name": "kafka-1"
        },
        "value": [
          1612779051.781,
          "0"
        ]
      },
      {
        "metric": {
          "__name__": "kafka_controller_kafkacontroller_controllerstate_value",
          "app_kubernetes_io_component": "kafka",
          "app_kubernetes_io_instance": "mpds",
          "app_kubernetes_io_managed_by": "Helm",
          "app_kubernetes_io_name": "kafka",
          "controller_revision_hash": "kafka-7dc6cd8b54",
          "helm_sh_chart": "kafka-11.8.2",
          "instance": "10.1.1.6:5556",
          "job": "kubernetes-pods",
          "kubernetes_namespace": "default",
          "kubernetes_pod_name": "kafka-0",
          "statefulset_kubernetes_io_pod_name": "kafka-0"
        },
        "value": [
          1612779051.781,
          "0"
        ]
      },
      {
        "metric": {
          "__name__": "kafka_controller_kafkacontroller_controllerstate_value",
          "app_kubernetes_io_component": "kafka",
          "app_kubernetes_io_instance": "mpds",
          "app_kubernetes_io_managed_by": "Helm",
          "app_kubernetes_io_name": "kafka",
          "controller_revision_hash": "kafka-7dc6cd8b54",
          "helm_sh_chart": "kafka-11.8.2",
          "instance": "10.1.2.6:5556",
          "job": "kubernetes-pods",
          "kubernetes_namespace": "default",
          "kubernetes_pod_name": "kafka-2",
          "statefulset_kubernetes_io_pod_name": "kafka-2"
        },
        "value": [
          1612779051.781,
          "0"
        ]
      }
    ]
  }
}

The following metrics are potentially interesting, but are not available available at this moment in Prometheus:

Network in/out (to do)
CPU Utilization (to do, probably available via Kubernetes API's)
Memory Usage (to do, probably avialable via Kubernetes API's)

Prediction models

There are two prediction models available at the moment: a long-term one, and a short-term one. Both models are predicting the load based on the Kafka message rates.

Name		Name	Last commit message	Last commit date
Latest commit History 298 Commits
covid-simulator		covid-simulator
flink-autoscaler		flink-autoscaler
flink-engine-simple		flink-engine-simple
flink-engine		flink-engine
infrastructure		infrastructure
iot-simulator		iot-simulator
short-term-predictor		short-term-predictor
testing/postman_collectons		testing/postman_collectons
time-series-prediction		time-series-prediction
.gitignore		.gitignore
Prometheus.postman_collection.json		Prometheus.postman_collection.json
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

mpds-orange

Overall system design

Autoscaling system

Metrics

Prediction models

About

Releases

Packages

Contributors 4

Languages

0mp/mpds-orange

Folders and files

Latest commit

History

Repository files navigation

mpds-orange

Overall system design

Autoscaling system

Metrics

Prediction models

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages