
John Snow Labs NLP Test 1.3.0: Enhancing Support for Evaluating Large Language Models in Summarization

@ArshaanNazir ArshaanNazir released this 25 May 11:47
2683c1e



📢 Overview

NLP Test 1.3.0 🚀 adds support for testing Large Language Models on the summarization task, with robustness, bias, representation, fairness, and accuracy tests on the XSum dataset. This release also adds fairness tests for the Question Answering datasets, along with other enhancements and bug fixes.

A big thank you to our early-stage community for their contributions, feedback, questions, and feature requests 🎉

Make sure to give the project a star right here ⭐


🔥 New Features & Enhancements

  • Adding support for summarization with the XSum dataset #433
  • Adding support for fairness tests for testing LLMs on Question Answering #430
  • Adding support for accuracy/fairness tests for testing LLMs on summarization #446
  • Adding new robustness test called add_ocr_typo #428
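To give a feel for what an OCR-typo robustness perturbation does, here is a minimal sketch. It is not the library's implementation of add_ocr_typo: the function name `add_ocr_noise` and the confusion table are hypothetical, chosen only to illustrate the idea of swapping character sequences that OCR engines commonly confuse.

```python
from typing import Mapping

# Hypothetical confusion table (an assumption for illustration only):
# character sequences an OCR engine might plausibly mix up.
OCR_CONFUSIONS = {
    "rn": "m",
    "cl": "d",
    "l": "1",
    "O": "0",
}


def add_ocr_noise(text: str, confusions: Mapping[str, str] = OCR_CONFUSIONS) -> str:
    """Scan left to right and replace every confusable pattern.

    Longer patterns are tried first so that "cl" is handled as one unit
    instead of being split into "c" followed by "l".
    """
    patterns = sorted(confusions, key=len, reverse=True)
    result = []
    i = 0
    while i < len(text):
        for pattern in patterns:
            if text.startswith(pattern, i):
                result.append(confusions[pattern])
                i += len(pattern)
                break
        else:
            # No pattern matched at this position: keep the character.
            result.append(text[i])
            i += 1
    return "".join(result)


perturbed = add_ocr_noise("modern")
```

A robustness test then compares the model's output on the original text with its output on the perturbed text. A real perturbation would likely apply replacements to only a fraction of positions rather than to every match.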

🐛 Bug Fixes

  • Review issues with QAEval in OpenAI Natural Questions #444

❓ How to Use

Get started now! 👇

pip install nlptest

Create your test harness in 3 lines of code 🧪

import os

# Set your OpenAI API key (left blank here; fill in your own)
os.environ['OPENAI_API_KEY'] = ''

# Import and create a Harness object
from nlptest import Harness
h = Harness(task='summarization', model='text-davinci-002', hub='openai', data='XSum-test', config='config.yml')

# Generate test cases, run them and view a report
h.generate().run().report()
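The Harness call above points at a config.yml file that these notes do not show. The sketch below writes one possible file. The layout (a `tests` section with `defaults` and per-test `min_pass_rate` values) and the threshold numbers are assumptions for illustration, so check the project documentation for the authoritative schema. `write_config` is a hypothetical helper, not part of nlptest.

```python
import tempfile
from pathlib import Path

# Assumed config layout (not confirmed by these release notes):
# a default pass-rate threshold plus one override for add_ocr_typo.
CONFIG_TEXT = """\
tests:
  defaults:
    min_pass_rate: 0.65
  robustness:
    add_ocr_typo:
      min_pass_rate: 0.70
"""


def write_config(directory: str) -> Path:
    """Write the sample config into `directory` and return its path."""
    path = Path(directory) / "config.yml"
    path.write_text(CONFIG_TEXT, encoding="utf-8")
    return path


# Write to a temporary directory so the example has no side effects
# on the current working directory.
config_path = write_config(tempfile.mkdtemp())
```

The resulting path can then be passed as the `config` argument, for example `config=str(config_path)`.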

📖 Documentation


❤️ Community support

  • Slack: for live discussion with the NLP Test community, join the #nlptest channel
  • GitHub: for bug reports, feature requests, and contributions
  • Discussions: to engage with other community members, share ideas, and show off how you use NLP Test!

We would love to have you join the mission 👉 open an issue, a PR, or give us some feedback on features you'd like to see! 🙌


♻️ Changelog

What's Changed

New Contributors

Full Changelog: v1.2.0...v1.3.0