This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[Benchmark] Improve NLP Backbone Benchmark #1473

Open
1 of 12 tasks
sxjscience opened this issue Jan 9, 2021 · 0 comments
Labels
enhancement (New feature or request) · help wanted (Extra attention is needed) · performance (Performance issues)

Comments

sxjscience (Member) commented on Jan 9, 2021

Description

In GluonNLP, we introduced the benchmarking script in https://github.com/dmlc/gluon-nlp/tree/master/scripts/benchmarks.

The goal is to track the training + inference latency of common NLP backbones so that we can choose the appropriate ones for a given task. This will help users train + deploy models on AWS.
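The core of such a benchmark is a small timing harness: warm up, then time repeated calls and report mean latency. A minimal pure-Python sketch of the idea (the real script times actual backbones on GPU; `dummy_backbone` and the harness below are stand-ins, not the benchmark script's API):

```python
import statistics
import time

def measure_latency(fn, *args, warmup=3, repeat=10):
    """Run `fn` a few times untimed to warm up caches/JIT, then time
    `repeat` calls. Returns (mean, stdev) latency in milliseconds."""
    for _ in range(warmup):
        fn(*args)
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        timings.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(timings), statistics.stdev(timings)

# Stand-in "backbone": a small pure-Python matrix multiply.
def dummy_backbone(n=64):
    a = [[1.0] * n for _ in range(n)]
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*a)]
            for row in a]

mean_ms, std_ms = measure_latency(dummy_backbone)
print(f"latency: {mean_ms:.2f} +/- {std_ms:.2f} ms")
```

For GPU backbones the timing loop additionally needs a device synchronization before reading the clock, otherwise asynchronous kernel launches make the numbers meaningless.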

Currently, we cover:

  • Huggingface/Transformers-based backbones with FP32 + FP16 training / inference. For FP16 training, we are not profiling against the AMP-based solution, which gives PyTorch an edge; we need to fix this.
  • MXNet 2.0-nightly version (only for community use) + GluonNLP 1.0 with FP32 + FP16 (amp) training / inference.
  • TVM FP32 inference. This is currently broken due to recent upgrades of the code base.

The following action items seem worthwhile:

Short-term Bug-fix + Improvement

Automation + Visualization

  • Support launching benchmark job with AWS Batch. Currently tracked in Fix Benchmark #1471.
  • Automate benchmarking process via Github actions.
  • Support visualization of benchmark results.
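For the visualization item, one lightweight option is to render the raw benchmark results as a GitHub-flavored markdown table that a CI job can post back to an issue or PR. A sketch using only the standard library (the CSV columns and numbers below are hypothetical, not the benchmark script's actual output format):

```python
import csv
import io

# Hypothetical result format: one row per (backbone, precision) configuration.
RAW = """\
backbone,precision,latency_ms
bert-base,fp32,12.4
bert-base,fp16,6.1
electra-small,fp32,4.8
"""

def to_markdown_table(csv_text):
    """Render benchmark CSV results as a GitHub-flavored markdown table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = ["| " + " | ".join(header) + " |",
             "|" + "|".join(["---"] * len(header)) + "|"]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)

print(to_markdown_table(RAW))
```

A GitHub Actions workflow could run the benchmark, pipe the CSV through a helper like this, and attach the table as a comment, which keeps the results visible without any extra dashboard infrastructure.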

Longer-term Backbone Benchmarking Effort

  • Add a JAX/Flax-based solution, which internally uses XLA.
  • Support AutoScheduler in TVM benchmark
  • Enable ONNX + TensorRT. This is often considered the fastest solution for NLP inference.

Other longer-term efforts

  • Support benchmarks for data loaders.
  • Support common end-to-end training benchmarks such as SQuAD 2.0 fine-tuning. We may focus on single-instance benchmarks.
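A data-loader benchmark mostly amounts to draining batches without running a model and reporting throughput. A minimal sketch in pure Python (`dummy_loader` is a stand-in for a real data pipeline; the function names here are illustrative, not an existing API):

```python
import time

def benchmark_loader(loader, max_batches=100):
    """Iterate over `loader` and report throughput in samples/second.

    `loader` yields batches (any sized sequence); draining batches without
    a model isolates the data pipeline's cost from compute.
    """
    n_samples = 0
    start = time.perf_counter()
    for i, batch in enumerate(loader):
        if i >= max_batches:
            break
        n_samples += len(batch)
    elapsed = time.perf_counter() - start
    return n_samples / elapsed

# Stand-in loader: a generator of fixed-size "batches".
def dummy_loader(batch_size=32, n_batches=50):
    for _ in range(n_batches):
        yield [0] * batch_size

throughput = benchmark_loader(dummy_loader())
print(f"{throughput:.0f} samples/sec")
```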

@dmlc/gluon-nlp-committers

@sxjscience added the enhancement (New feature or request), help wanted (Extra attention is needed), and performance (Performance issues) labels on Jan 9, 2021