John Snow Labs NLP Test 1.1.0: Announcing Support for Testing LLMs
Overview
NLP Test 1.1.0 comes with brand new features: new capabilities for testing Large Language Models on Question Answering tasks, with support for OpenAI-based LLMs and for robustness tests on the BoolQ and Natural Questions datasets!
A big thank you to our early-stage community for their contributions, feedback, questions, and feature requests.
Make sure to give the project a star right here.
New Features & Enhancements
- Support for testing OpenAI LLMs on Question Answering #361
- Support for BoolQ and Natural Questions datasets #361
- Improved layout for configuring tests #361
- Improved warning and error messaging #361
Bug Fixes
- Fixed overlapping and mis-formatted country names in dictionaries #347
How to Use
Get started now!
pip install nlptest
Create your test harness in 3 lines of code
import os
# Set your OpenAI API key (left empty here; fill in your own)
os.environ['OPENAI_API_KEY'] = ''
# Import and create a Harness object
from nlptest import Harness
h = Harness(task='question-answering', model='gpt-3.5-turbo', hub='openai', data='BoolQ-test', config='config.yml')
# Generate test cases, run them and view a report
h.generate().run().report()
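To make the idea of a robustness test on a Question Answering model more concrete, here is a minimal, self-contained sketch. It is not NLP Test's implementation and does not call the library or OpenAI: `toy_model`, `perturb_uppercase`, `perturb_lowercase`, and `robustness_pass_rate` are hypothetical names introduced only for illustration. The sketch assumes a test passes when the model gives the same answer to the original question and to a perturbed copy of it.

```python
from typing import Callable, List


def perturb_uppercase(question: str) -> str:
    """Hypothetical perturbation: convert the whole question to uppercase."""
    return question.upper()


def perturb_lowercase(question: str) -> str:
    """Hypothetical perturbation: convert the whole question to lowercase."""
    return question.lower()


def robustness_pass_rate(
    model: Callable[[str], str],
    questions: List[str],
    perturbation: Callable[[str], str],
) -> float:
    """Fraction of questions whose answer is unchanged after perturbation."""
    if not questions:
        raise ValueError("questions must not be empty")
    passed = sum(1 for q in questions if model(q) == model(perturbation(q)))
    return passed / len(questions)


def toy_model(question: str) -> str:
    """Stand-in for an LLM: a deliberately case-sensitive yes/no answerer."""
    return "True" if question.startswith("is ") else "False"


questions = ["is the sky blue", "Is water wet", "does ice float"]

for name, perturbation in [
    ("uppercase", perturb_uppercase),
    ("lowercase", perturb_lowercase),
]:
    rate = robustness_pass_rate(toy_model, questions, perturbation)
    print(f"{name}: pass rate = {rate:.2f}")
```

Because the toy model is case-sensitive, some answers flip under each perturbation and the pass rate drops below 1.0. A real harness run would replace `toy_model` with calls to the model under test and compare the resulting pass rate with a configured minimum.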
Documentation
Community support
- Slack: For live discussion with the NLP Test community, join the #nlptest channel
- GitHub: For bug reports, feature requests, and contributions
- Discussions: To engage with other community members, share ideas, and show off how you use NLP Test!
We would love to have you join the mission: open an issue, submit a PR, or give us some feedback on features you'd like to see!
Changelog
What's Changed
- fix country names by @alytarik in #347
- Fix/country names by @alytarik in #348
- Adding support for openAI model testing for question-answering on several benchmark datasets by @chakravarthik27 in #361
- update boolQ prompt by @ArshaanNazir in #366
- Chore: Website updates for LLM release by @luca-martial in #369
- Update notebooks by @alytarik in #368
- Release/1.1.0 by @luca-martial in #367
Full Changelog: v1.0.2...v1.1.0