Release John Snow Labs NLP Test 1.3.0: Enhancing Support for Evaluating Large Language Models in Summarization · JohnSnowLabs/langtest

John Snow Labs NLP Test 1.3.0: Enhancing Support for Evaluating Large Language Models in Summarization

📢 Overview

NLP Test 1.3.0 🚀 comes with brand new features, including new capabilities for testing Large Language Models on the summarization task, with support for robustness, bias, representation, fairness and accuracy tests on the XSum dataset. This release also adds fairness tests for the Question Answering datasets, along with many other enhancements and bug fixes!
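To make the fairness idea concrete, here is a minimal sketch of a group-fairness check: average a per-sample score within each demographic group, then require each group's average to fall inside a [min, max] band. The record format, the group labels, and the names `group_scores` and `fairness_check` are hypothetical illustrations, not NLP Test's API. We are assuming the release's fairness tests follow this general min/max-per-group pattern.

```python
from collections import defaultdict


def group_scores(records):
    """Average the per-sample scores within each group.

    `records` is a hypothetical list of (group, score) pairs, e.g. a
    per-sample F1 or ROUGE score tagged with a gender group.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, score in records:
        totals[group] += score
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}


def fairness_check(records, min_score, max_score):
    """Return {group: passed}; a group passes if its mean score
    lies within [min_score, max_score]."""
    return {
        g: min_score <= s <= max_score
        for g, s in group_scores(records).items()
    }


if __name__ == "__main__":
    records = [("male", 0.8), ("male", 0.6), ("female", 0.9),
               ("female", 0.7), ("unknown", 0.3)]
    print(fairness_check(records, min_score=0.5, max_score=0.95))
    # prints {'male': True, 'female': True, 'unknown': False}
```

Checking both a lower and an upper bound flags a model that serves one group much worse than the others as well as one whose scores are suspiciously high for a single group.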

A big thank you to our early-stage community for their contributions, feedback, questions, and feature requests 🎉

Make sure to give the project a star right here ⭐

🔥 New Features & Enhancements

Adding support for summarization with the XSum dataset #433
Adding support for fairness tests for testing LLMs on Question Answering #430
Adding support for accuracy/fairness tests for testing LLMs on summarization #446
Adding new robustness test called add_ocr_typo #428
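The idea behind an OCR-typo robustness test is to inject the kinds of character confusions that optical character recognition tends to produce, then check whether the model's output stays stable. The sketch below is only an illustration under that assumption: the confusion table and the function name `apply_ocr_typos` are hypothetical and are not the implementation behind `add_ocr_typo`.

```python
import random

# Hypothetical table of character sequences OCR engines commonly confuse.
# Illustrative only; not the table used by NLP Test's add_ocr_typo.
OCR_CONFUSIONS = {
    "rn": "m",
    "m": "rn",
    "l": "1",
    "O": "0",
    "e": "c",
    "cl": "d",
}


def apply_ocr_typos(text, prob=0.1, seed=None):
    """Return a copy of `text` where each space-separated word, with
    probability `prob`, has its first matching confusable sequence
    (in table order) replaced once."""
    rng = random.Random(seed)
    out = []
    for word in text.split(" "):
        if rng.random() < prob:
            for src, dst in OCR_CONFUSIONS.items():
                if src in word:
                    word = word.replace(src, dst, 1)
                    break
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    # With prob=1.0 every word is perturbed, so the output is deterministic.
    print(apply_ocr_typos("modern clean model", prob=1.0))
    # prints modem c1ean rnodel
```

A seeded `random.Random` instance keeps perturbations reproducible across runs, which matters when comparing pass rates between model versions.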

🐛 Bug Fixes

Fixed issues with QAEval evaluation on OpenAI models for Natural Questions by using a custom prompt #444

❓ How to Use

Get started now! 👇

pip install nlptest

Create your test harness in 3 lines of code 🧪

import os

# Set your OpenAI API key (replace the empty string with your own key)
os.environ['OPENAI_API_KEY'] = ''

# Import and create a Harness object
from nlptest import Harness
h = Harness(task='summarization', model='text-davinci-002', hub='openai', data='XSum-test', config='config.yml')

# Generate test cases, run them and view a report
h.generate().run().report()
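Conceptually, a robustness test case for summarization perturbs the input, summarizes both the original and the perturbed text, and checks that the two summaries stay close. The sketch below illustrates that idea without calling an LLM. `toy_summarize` is a hypothetical lead-1 stand-in for the model, `token_f1` is a simplified unigram-overlap F1 used in place of whatever metric the library actually applies, and the 0.5 threshold is an arbitrary assumption.

```python
import re
from collections import Counter


def toy_summarize(text):
    """Hypothetical stand-in for the LLM: return the first sentence."""
    return re.split(r"(?<=[.!?])\s+", text.strip())[0]


def token_f1(a, b):
    """Case-insensitive unigram-overlap F1 between two strings
    (roughly ROUGE-1 F-measure without stemming)."""
    ta, tb = a.lower().split(), b.lower().split()
    overlap = sum((Counter(ta) & Counter(tb)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(ta), overlap / len(tb)
    return 2 * precision * recall / (precision + recall)


def robustness_case(text, perturb, summarize, threshold=0.5):
    """Summarize the original and perturbed input and compare the results."""
    original = summarize(text)
    perturbed = summarize(perturb(text))
    score = token_f1(original, perturbed)
    return {"original": original, "perturbed": perturbed,
            "score": score, "pass": score >= threshold}


if __name__ == "__main__":
    doc = "The council approved the budget. Critics said it was rushed."
    # Uppercasing the input should not change a robust summary.
    print(robustness_case(doc, str.upper, toy_summarize))
```

Comparing the two model outputs against each other, rather than against a gold summary, isolates the effect of the perturbation from the model's baseline quality.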

📖 Documentation

❤️ Community support

Slack For live discussion with the NLP Test community, join the #nlptest channel
GitHub For bug reports, feature requests, and contributions
Discussions To engage with other community members, share ideas, and show off how you use NLP Test!

We would love to have you join the mission 👉 open an issue, a PR, or give us some feedback on features you'd like to see! 🙌

♻️ Changelog

What's Changed

Docs/website llm accuracy tests by @alytarik in #412
Docs/website number to word robustness test by @RakshitKhajuria in #416
Release/1.2.0 by @ArshaanNazir in #425
Docs/add disclaimer for QAEval by @RakshitKhajuria in #429
feature/added ocr typo test by @Prikshit7766 in #428
tutorials/Cleaned notebooks by @Prikshit7766 in #431
feature/add-support-for-summarization by @ArshaanNazir in #433
feature/fairness for qa task by @alytarik in #430
Chore: add logos to landing page by @luca-martial in #435
feature/add_ocr_typo_for_QA_and_Summarization by @Prikshit7766 in #436
Fix/review issues with qa eval in open ai natural questions using custom prompt by @RakshitKhajuria in #444
Feature/update bias in summarization by @ArshaanNazir in #445
Feature/accuracy fairness for summarization by @alytarik in #446
hot-fix: harness_config in Harness Class by @chakravarthik27 in #447
Update/docs for summarization by @Prikshit7766 in #448
fix format for qa task by @alytarik in #450
hot-fix/XSum-test by @Prikshit7766 in #449
update summarization prompt by @ArshaanNazir in #451
Fix/tutorial nbs by @ArshaanNazir in #453
DRAFT: Fix/max f1 score by @alytarik in #452
Fix/tutorial nbs by @ArshaanNazir in #454
fix eval score by @alytarik in #455
update QA is_pass by @ArshaanNazir in #456
Release/1.3.0 by @ArshaanNazir in #457
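Several entries above (#452, #455, #456) touch QA scoring. We do not know the exact changes in those PRs; as an assumption, "max F1" here refers to the common SQuAD-style convention of scoring a prediction against every reference answer and keeping the best token-level F1. The sketch below shows that convention with hypothetical helper names; it is not NLP Test's code.

```python
import re
import string
from collections import Counter


def normalize_answer(s):
    """SQuAD-style normalization: lowercase, strip punctuation,
    drop the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def f1_score(prediction, ground_truth):
    """Token-level F1 between one prediction and one reference."""
    pred = normalize_answer(prediction).split()
    gold = normalize_answer(ground_truth).split()
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def max_f1(prediction, ground_truths):
    """Best F1 over all acceptable reference answers."""
    return max(f1_score(prediction, gt) for gt in ground_truths)


if __name__ == "__main__":
    refs = ["Eiffel Tower", "the tower in Paris"]
    print(max_f1("The Eiffel Tower", refs))
    # prints 1.0
```

Taking the maximum rather than the mean avoids penalizing a correct answer just because the dataset lists other valid phrasings.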

New Contributors

@Prikshit7766 made their first contribution in #428

Full Changelog: v1.2.0...v1.3.0
