Add Datadog system agent #10380

markus-hinsche · 2021-11-24T14:59:19Z

Motivation: For the benchmarking project, we want to collect CPU utilization and memory usage. Tracking these metrics manually requires a lot of custom code (e.g. mprof, nvidia-smi, top) that clutters the tests. To avoid this, we can run a Datadog agent in the background that reports the metrics at a regular interval.

Proposed changes:

- add a Datadog agent to send system metrics (e.g. CPU and memory) to Datadog
- introduce a bash script that can be called from the GitHub Actions YAML (avoids code duplication)
- add the NVML integration (if ACCELERATOR_TYPE=GPU)
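To illustrate the idea behind these changes, here is a minimal Python sketch of a background sampler that records system metrics at a fixed interval while a test runs, and only adds a GPU collector when `ACCELERATOR_TYPE=GPU`. It is not the Datadog agent and not the bash script from this PR: `BackgroundSampler`, `select_collectors`, `read_system_metrics`, `read_gpu_metrics`, and the metric names are all hypothetical, and the GPU collector is a placeholder that returns no data.

```python
import os
import threading
import time
from typing import Callable, Dict, List, Mapping

Metrics = Dict[str, float]


def read_system_metrics() -> Metrics:
    """Stand-in collector: CPU count and, where available, 1-minute load average."""
    metrics: Metrics = {"cpu_count": float(os.cpu_count() or 1)}
    try:
        metrics["load_1m"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        pass  # load average is not available on every platform
    return metrics


def read_gpu_metrics() -> Metrics:
    """Placeholder for an NVML-based collector; returns no data in this sketch."""
    return {}


def select_collectors(env: Mapping[str, str]) -> List[Callable[[], Metrics]]:
    """Always collect system metrics; add the GPU collector only for GPU runs."""
    collectors: List[Callable[[], Metrics]] = [read_system_metrics]
    if env.get("ACCELERATOR_TYPE") == "GPU":
        collectors.append(read_gpu_metrics)
    return collectors


class BackgroundSampler:
    """Collects one sample immediately, then one every `interval_s` seconds."""

    def __init__(self, collectors: List[Callable[[], Metrics]], interval_s: float = 1.0):
        self.collectors = collectors
        self.interval_s = interval_s
        self.samples: List[Metrics] = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while True:
            sample: Metrics = {"timestamp": time.time()}
            for collect in self.collectors:
                sample.update(collect())
            self.samples.append(sample)
            # wait() returns True as soon as the stop flag is set
            if self._stop.wait(self.interval_s):
                break

    def __enter__(self) -> "BackgroundSampler":
        self._thread.start()
        return self

    def __exit__(self, *exc_info) -> None:
        self._stop.set()
        self._thread.join()
```

A test would wrap its workload in `with BackgroundSampler(select_collectors(os.environ)) as sampler:` and read `sampler.samples` afterwards, which keeps the measurement code out of the test body.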

Status (please check what you already did):

- added some tests for the functionality
- updated the documentation
- updated the changelog (please check changelog for instructions)
- reformat files using black (please check Readme for instructions)

github-actions · 2021-12-07T08:45:17Z

Commit: 9253866, The full report is available as an artifact.

Dataset: financial-demo, Dataset repository branch: fix-model-regression-tests (external repository), commit: 52a3ad3eb5292d56542687e23b06703431f15ead
Configuration repository branch: main

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|
| `Sparse + BERT + DIET(seq) + ResponseSelector(t2t)` test: `1m5s`, train: `2m7s`, total: `3m12s` | 1.0000 (0.00) | 0.8800 (0.00) | `no data` |

github-actions · 2021-12-07T08:56:58Z

Commit: 5a4c8a2, The full report is available as an artifact.

Dataset: financial-demo, Dataset repository branch: fix-model-regression-tests (external repository), commit: 52a3ad3eb5292d56542687e23b06703431f15ead
Configuration repository branch: main

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|
| `Sparse + BERT + DIET(seq) + ResponseSelector(t2t)` test: `1m33s`, train: `3m17s`, total: `4m49s` | 1.0000 (0.00) | 0.8800 (0.00) | `no data` |

github-actions · 2021-12-08T17:20:00Z

Commit: ae38aad, The full report is available as an artifact.

Dataset: financial-demo, Dataset repository branch: fix-model-regression-tests (external repository), commit: 52a3ad3eb5292d56542687e23b06703431f15ead
Configuration repository branch: main

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|
| `Sparse + BERT + DIET(seq) + ResponseSelector(t2t)` test: `1m25s`, train: `2m56s`, total: `4m21s` | 1.0000 (0.00) | 0.8800 (0.00) | `no data` |

github-actions · 2021-12-08T17:47:15Z

Hey @markus-hinsche! 👋 To run model regression tests, comment with the /modeltest command and a configuration.

Tips 💡: The model regression tests run on push events. You can re-run them by re-adding the status:model-regression-tests label or by using the Re-run jobs button in the GitHub Actions workflow.

Tips 💡: Whenever you want to change the configuration, edit the comment that contains the previous configuration.

You can copy this in your comment and customize:

/modeltest

```yml
##########
## Available datasets
##########
# - "Carbon Bot" (NLU)
# - "Hermit" (NLU)
# - "Private 1" (NLU)
# - "Private 2" (NLU)
# - "Private 3" (NLU)
# - "Sara" (NLU, Core)
# - "financial-demo" (NLU, Core)
# - "helpdesk-assistant" (NLU, Core)
# - "insurance-demo" (NLU, Core)
# - "retail-demo" (NLU, Core)

##########
## Available NLU configurations
##########
# - "BERT + DIET(bow) + ResponseSelector(bow)"
# - "BERT + DIET(seq) + ResponseSelector(t2t)"
# - "Spacy + DIET(bow) + ResponseSelector(bow)"
# - "Spacy + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + BERT + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + BERT + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + Spacy + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + Spacy + DIET(seq) + ResponseSelector(t2t)"

##########
## Available Core configurations
##########
# - "Rules"
# - "Rules + AugMemo"
# - "Rules + AugMemo + TED"
# - "Rules + Memo"
# - "Rules + Memo + TED"
# - "Rules + TED"

## Example configuration
#################### syntax #################
## include:
##   - dataset: ["<dataset_name>"]
##     config: ["<configuration_name>"]
#
## Example:
## include:
##  - dataset: ["Carbon Bot"]
##    config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]
#
## Shortcut:
## You can use the "all" shortcut to include all available configurations or datasets
#
## Example: Use the "Sparse + DIET(bow) + ResponseSelector(bow)" configuration
## for all available datasets
## include:
##  - dataset: ["all"]
##    config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]
#
## Example: Use all available configurations for the "Carbon Bot" and "Sara" datasets
## and for the "Hermit" dataset use the "Sparse + DIET(seq) + ResponseSelector(t2t)" and
## "BERT + DIET(seq) + ResponseSelector(t2t)" configurations:
## include:
##  - dataset: ["Carbon Bot", "Sara"]
##    config: ["all"]
##  - dataset: ["Hermit"]
##    config: ["Sparse + DIET(seq) + ResponseSelector(t2t)", "BERT + DIET(seq) + ResponseSelector(t2t)"]
#
## Example: Define a branch name to check-out for a dataset repository. Default branch is 'main'
## dataset_branch: "test-branch"
## include:
##  - dataset: ["Carbon Bot", "Sara"]
##    config: ["all"]
##
## Shortcuts:
## You can use the "all" shortcut to include all available configurations or datasets.
## You can use the "all-nlu" shortcut to include all available NLU configurations or datasets.
## You can use the "all-core" shortcut to include all available core configurations or datasets.

include:
 - dataset: ["Carbon Bot"]
   config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]

```
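The shortcut rules in the template above can be sketched in Python as follows. This is an illustration of how an `include` list might expand into (dataset, configuration) pairs, not the workflow's actual implementation. The dataset and configuration names are copied from the template; the function names `expand` and `expand_include` are hypothetical, and dropping pairs whose configuration type the dataset does not support (for example a Core configuration on an NLU-only dataset) is an assumption.

```python
from typing import Dict, List, Set, Tuple

# Dataset names and their "(NLU, Core)" tags, as listed in the template comment.
DATASETS: Dict[str, Set[str]] = {
    "Carbon Bot": {"nlu"},
    "Hermit": {"nlu"},
    "Private 1": {"nlu"},
    "Private 2": {"nlu"},
    "Private 3": {"nlu"},
    "Sara": {"nlu", "core"},
    "financial-demo": {"nlu", "core"},
    "helpdesk-assistant": {"nlu", "core"},
    "insurance-demo": {"nlu", "core"},
    "retail-demo": {"nlu", "core"},
}

NLU_CONFIGS = [
    "BERT + DIET(bow) + ResponseSelector(bow)",
    "BERT + DIET(seq) + ResponseSelector(t2t)",
    "Spacy + DIET(bow) + ResponseSelector(bow)",
    "Spacy + DIET(seq) + ResponseSelector(t2t)",
    "Sparse + BERT + DIET(bow) + ResponseSelector(bow)",
    "Sparse + BERT + DIET(seq) + ResponseSelector(t2t)",
    "Sparse + DIET(bow) + ResponseSelector(bow)",
    "Sparse + DIET(seq) + ResponseSelector(t2t)",
    "Sparse + Spacy + DIET(bow) + ResponseSelector(bow)",
    "Sparse + Spacy + DIET(seq) + ResponseSelector(t2t)",
]
CORE_CONFIGS = [
    "Rules",
    "Rules + AugMemo",
    "Rules + AugMemo + TED",
    "Rules + Memo",
    "Rules + Memo + TED",
    "Rules + TED",
]
CONFIGS: Dict[str, Set[str]] = {
    **{name: {"nlu"} for name in NLU_CONFIGS},
    **{name: {"core"} for name in CORE_CONFIGS},
}

SHORTCUTS: Dict[str, Set[str]] = {
    "all": {"nlu", "core"},
    "all-nlu": {"nlu"},
    "all-core": {"core"},
}


def expand(names: List[str], known: Dict[str, Set[str]]) -> List[str]:
    """Replace "all", "all-nlu" and "all-core" with the matching known names."""
    expanded: List[str] = []
    for name in names:
        if name in SHORTCUTS:
            expanded.extend(k for k, kinds in known.items() if kinds & SHORTCUTS[name])
        elif name in known:
            expanded.append(name)
        else:
            raise ValueError(f"unknown name: {name!r}")
    return list(dict.fromkeys(expanded))  # drop duplicates, keep order


def expand_include(include: List[Dict[str, List[str]]]) -> List[Tuple[str, str]]:
    """Expand every include entry into (dataset, configuration) pairs."""
    pairs: List[Tuple[str, str]] = []
    for entry in include:
        for dataset in expand(entry["dataset"], DATASETS):
            for config in expand(entry["config"], CONFIGS):
                # Assumption: skip configurations the dataset does not support.
                if CONFIGS[config] <= DATASETS[dataset]:
                    pairs.append((dataset, config))
    return pairs
```

Under these assumptions, the default configuration at the end of the template expands to the single pair `("Carbon Bot", "Sparse + DIET(bow) + ResponseSelector(bow)")`.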

github-actions · 2021-12-08T17:47:19Z

/modeltest

include:
 - dataset: ["financial-demo"]
   config: ["Sparse + BERT + DIET(seq) + ResponseSelector(t2t)"]

github-actions · 2021-12-08T17:47:21Z

The model regression tests have started. It might take a while, please be patient.
As soon as results are ready you'll see a new comment with the results.

The configuration used can be found in the comment above.

github-actions · 2021-12-08T17:55:37Z

Commit: 208d68b, The full report is available as an artifact.

Dataset: financial-demo, Dataset repository branch: fix-model-regression-tests (external repository), commit: 52a3ad3eb5292d56542687e23b06703431f15ead
Configuration repository branch: main

Configuration	Intent Classification Micro F1	Entity Recognition Micro F1	Response Selection Micro F1
| `Sparse + BERT + DIET(seq) + ResponseSelector(t2t)` test: `1m23s`, train: `3m35s`, total: `4m57s` | 1.0000 (0.00) | 0.8800 (0.00) | `no data` |

markus-hinsche · 2021-12-08T18:09:00Z

This is ready to be merged from my side, but I can't merge yet because @tczekajlo requested changes.

re-review done!

Add DD system agent

94477ba

markus-hinsche added status:model-regression-tests and removed status:model-regression-tests labels Nov 24, 2021

github-actions bot deleted a comment from markus-hinsche Nov 24, 2021

github-actions bot removed the status:model-regression-tests label Nov 24, 2021

Debug: sleep more and look into system_core

fdcc2d5