NVIDIA NeMo-Eval 0.1.0
- Evaluation for Automodel using a vLLM OpenAI-compatible (OAI) deployment, with nvidia-lm-eval as the evaluation harness
- Support for log-probability (logprob) benchmarks with Ray
- Use evaluation APIs from nvidia-eval-commons
Known Issues
- The GSM8K flexible-extract score is very low when evaluating NeMo 2.0 models because MegatronLLMDeployableNemo2 does not support stop words. The strict-match score is not affected.
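
The sketch below illustrates why missing stop words can hurt flexible-extract while leaving strict-match alone. It is not the harness code: the two regular expressions are assumptions modeled on the GSM8K filters in lm-evaluation-harness (strict-match takes the first `#### <number>`, flexible-extract takes the last number anywhere in the output), the `Question:` stop word is assumed, and the sample generation and the `truncate_at_stop` helper are hypothetical.

```python
import re

# Assumed patterns, modeled on the GSM8K filters in lm-evaluation-harness.
STRICT = re.compile(r"#### (\-?[0-9\.\,]+)")
FLEXIBLE = re.compile(r"(-?[$0-9.,]{2,})|(-?[0-9]+)")


def strict_match(text):
    """Return the number after the first '#### ' marker, or None."""
    m = STRICT.search(text)
    return m.group(1) if m else None


def flexible_extract(text):
    """Return the last number-like token found anywhere in the text, or None."""
    matches = FLEXIBLE.findall(text)
    if not matches:
        return None
    # findall yields one tuple per match; exactly one group is non-empty.
    return next((g for g in matches[-1] if g), None)


def truncate_at_stop(text, stop_words):
    """Hypothetical helper: cut the text at the earliest stop word, if any."""
    cut = len(text)
    for stop in stop_words:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Hypothetical generation that runs past the answer because nothing stops it.
runaway = (
    "She sells 16 - 3 - 4 = 9 eggs for 9 * 2 = 18 dollars.\n"
    "#### 18\n\n"
    "Question: Tom has 5 apples and buys 7 more. How many apples?\n"
    "Answer: 5 + 7 = 12\n"
    "#### 12"
)

print(strict_match(runaway))      # first '####' answer
print(flexible_extract(runaway))  # last number, taken from the runaway text
print(flexible_extract(truncate_at_stop(runaway, ["Question:"])))
```

Under these assumptions, strict-match still finds the correct first answer in the untruncated output, whereas flexible-extract picks up a number from the extra generated text unless the output is cut at the stop word.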