Opni AIOps Gateway

Amartya Chakraborty edited this page Jan 25, 2023 · 4 revisions

Description

Implements gateway endpoints for the Opni admin dashboard UI.

Programming Languages

  • Go

Diagram

[Diagram: AIOps Gateway]

Responsibilities

  • Train a new deep learning model, or, if workloads were only removed from the watchlist and none were added, update the watchlist parameters instead.
  • Update the model training progress, reporting the estimated time remaining, the percentage completed and the total time elapsed.
  • Update the breakdown of logs by workload within each cluster and namespace.
  • Get the breakdown of logs by workload within each cluster and namespace.
  • Get the status of whether a trained model currently exists within the cluster.
  • Get the workloads watchlist of the last training job submitted by the user.
  • Get information on the GPU(s) within the cluster.

Input and output interfaces

Input

Component | Type | Description
Opni Admin Dashboard | User interface | The Opni admin dashboard interacts with the gateway plugin.
modelTrainingStatus | NATS JetStream key-value store | Tracks the training progress of a deep learning model.
aggregation | NATS JetStream key-value store | Tracks the count of log messages by cluster, namespace and deployment.

Output

Component | Type | Description
train_model | NATS request/reply subject | When the admin dashboard sends the workload parameters for which the user wants a trained model, the gateway plugin submits a training job to the train_model NATS subject.
model_status | NATS request/reply subject | When the admin dashboard requests the status of the model, the gateway plugin sends a request to the model_status NATS subject.
workload_parameters | NATS request/reply subject | When the admin dashboard requests the last set of workloads for which a deep learning model was trained, the gateway plugin sends a request to the workload_parameters NATS subject.
aggregation | NATS JetStream key-value store | Every 30 seconds, the gateway updates the aggregation key-value store in NATS JetStream with the latest count of log messages by cluster, namespace and deployment.

Restrictions/limitations

  • Relies heavily on the NATS request/reply pattern.

Performance issues

Test plan

  • Unit tests
  • Integration tests
  • e2e tests
  • Manual testing