John Snow Labs NLP Test 1.3.0: Enhancing Support for Evaluating Large Language Models in Summarization
Overview
NLP Test 1.3.0 comes with brand-new features, including new capabilities for testing Large Language Models on the summarization task, with support for robustness, bias, representation, fairness, and accuracy tests on the XSum dataset. This release also adds fairness tests for the Question Answering datasets, along with many other enhancements and bug fixes!
A big thank you to our early-stage community for their contributions, feedback, questions, and feature requests!
Make sure to give the project a star right here!
New Features & Enhancements
- Adding support for summarization with the XSum dataset #433
- Adding support for fairness tests for testing LLMs on Question Answering #430
- Adding support for accuracy/fairness tests for testing LLMs on summarization #446
- Adding a new robustness test called add_ocr_typo #428
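To give an intuition for what an OCR-typo robustness test perturbs, here is a toy sketch of the idea (this is not nlptest's implementation; the confusion map below is an illustrative assumption): an OCR engine tends to misread characters as visually similar ones, so the test feeds the model text with such look-alike substitutions and checks whether its predictions stay stable.

```python
# Toy sketch of an OCR-style typo perturbation (NOT nlptest's actual code).
# The confusion map is a made-up example of visually similar characters.
OCR_CONFUSIONS = {"l": "1", "o": "0", "e": "c", "m": "rn", "i": "l"}

def add_ocr_typo(text: str) -> str:
    """Replace each confusable character with a look-alike, as OCR might."""
    return "".join(OCR_CONFUSIONS.get(ch, ch) for ch in text)

print(add_ocr_typo("hello"))  # -> hc110
```

A robustness test then compares the model's output on the original and perturbed inputs and reports the fraction of cases where they agree.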
Bug Fixes
- Review issues with QAEval in OpenAI Natural Questions #444
How to Use
Get started now!
pip install nlptest
Create your test harness in 3 lines of code
# Set OpenAI API key
import os
os.environ['OPENAI_API_KEY'] = ''
# Import and create a Harness object
from nlptest import Harness
h = Harness(task='summarization', model='text-davinci-002', hub='openai', data='XSum-test', config='config.yml')
# Generate test cases, run them and view a report
h.generate().run().report()
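The `config='config.yml'` argument selects which test categories run and what pass rate each test must reach. As a hedged sketch only, assuming the `tests/<category>/<test_name>` layout with `min_pass_rate` thresholds used in the nlptest documentation (the threshold values below are illustrative, not recommendations):

```yaml
# Illustrative config.yml sketch for a summarization harness.
# Keys assume nlptest's tests/<category>/<test_name> schema;
# the min_pass_rate values are made-up example numbers.
tests:
  defaults:
    min_pass_rate: 0.65        # fallback threshold for any test
  robustness:
    add_ocr_typo:              # new robustness test in this release
      min_pass_rate: 0.60
    uppercase:
      min_pass_rate: 0.70
```

With such a config, `h.generate()` creates perturbed test cases per listed test, `h.run()` evaluates the model on them, and `h.report()` compares each test's pass rate against its `min_pass_rate`.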
Documentation
Community support
- Slack: for live discussion with the NLP Test community, join the #nlptest channel
- GitHub: for bug reports, feature requests, and contributions
- Discussions: to engage with other community members, share ideas, and show off how you use NLP Test!
We would love to have you join the mission: open an issue, a PR, or give us some feedback on features you'd like to see!
Changelog
What's Changed
- Docs/website llm accuracy tests by @alytarik in #412
- Docs/website number to word robustnes test by @RakshitKhajuria in #416
- Release/1.2.0 by @ArshaanNazir in #425
- Docs/add disclaimer for QAEval by @RakshitKhajuria in #429
- feature/added ocr typo test by @Prikshit7766 in #428
- tutorials/Cleaned notebooks by @Prikshit7766 in #431
- feature/add-support-for-summarization by @ArshaanNazir in #433
- feature/fairness for qa task by @alytarik in #430
- Chore: add logos to landing page by @luca-martial in #435
- feature/add_ocr_typo_for_QA_and_Summarization by @Prikshit7766 in #436
- Fix/review issues with qa eval in open ai natural questions using custom prompt by @RakshitKhajuria in #444
- Feature/update bias in summarization by @ArshaanNazir in #445
- Feature/accuracy fairness for summarization by @alytarik in #446
- hot-fix: harness_config in Harness Class by @chakravarthik27 in #447
- Update/docs for summarization by @Prikshit7766 in #448
- fix format for qa task by @alytarik in #450
- hot-fix/XSum-test by @Prikshit7766 in #449
- update summarization prompt by @ArshaanNazir in #451
- Fix/tutorial nbs by @ArshaanNazir in #453
- DRAFT: Fix/max f1 score by @alytarik in #452
- Fix/tutorial nbs by @ArshaanNazir in #454
- fix eval score by @alytarik in #455
- update QA is_pass by @ArshaanNazir in #456
- Release/1.3.0 by @ArshaanNazir in #457
New Contributors
- @Prikshit7766 made their first contribution in #428
Full Changelog: v1.2.0...v1.3.0