Release John Snow Labs NLP Test 1.4.0: Enhancing Support for Toxicity test and new QA benchmark datasets (NarrativeQA, TruthfulQA, QuAC, HellaSwag, MMLU and OpenbookQA) · PacificAI/langtest

John Snow Labs NLP Test 1.4.0: Enhancing Support for Toxicity test and new QA benchmark datasets (NarrativeQA, TruthfulQA, QuAC, HellaSwag, MMLU and OpenbookQA)

📢 Overview

NLP Test 1.4.0 🚀 comes with brand new features, including new capabilities for testing Large Language Models for toxicity and support for new QA benchmark datasets (NarrativeQA, TruthfulQA, QuAC, HellaSwag, MMLU and OpenBookQA) across robustness, representation, fairness and accuracy tests. It also adds new robustness tests (dyslexia swap, slangificator and abbreviation), along with many other enhancements and bug fixes!

A big thank you to our early-stage community for their contributions, feedback, questions, and feature requests 🎉

Make sure to give the project a star right here ⭐

🔥 New Features & Enhancements

Adding support for NarrativeQA dataset #487
Adding support for toxicity task #488
Adding support for TruthfulQA dataset #477
Adding support for new dyslexia swap test for robustness testing #474
Adding support for new slangificator test for robustness testing #463
Adding support for new abbreviation test for robustness testing #471
Adding support for OpenBookQA dataset #479
Adding support for MMLU dataset #481
Adding support for HellaSwag dataset #486
Adding support for QuAC dataset #484
Adding new tutorial notebooks #497

❓ How to Use

Get started now! 👇

pip install nlptest

Create your test harness in 3 lines of code 🧪

# Set your OpenAI API key (replace the empty string with your own key)
import os
os.environ['OPENAI_API_KEY'] = ''

# Import and create a Harness object
from nlptest import Harness
h = Harness(task='toxicity', model='text-davinci-002', hub='openai', data='toxicity-test-tiny')

# Generate test cases, run them and view a report
h.generate().run().report()
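For readers new to the generate, run and report flow, the sketch below shows the general idea behind a robustness test in plain Python, without calling nlptest at all. It is not the library's implementation: the function names, the abbreviation map, the toy model and the 0.65 pass-rate threshold are all hypothetical and chosen only for illustration.

```python
# Hypothetical, library-free illustration; none of these names come from nlptest.
ABBREVIATIONS = {"please": "pls", "thanks": "thx", "you": "u"}  # toy map (assumed)


def abbreviate(text):
    """Swap known words for a chat-style abbreviation (case-insensitive lookup)."""
    return " ".join(ABBREVIATIONS.get(w.lower(), w) for w in text.split())


def toy_model(text):
    """Stand-in classifier: 'polite' if a polite keyword appears, else 'neutral'."""
    words = {w.lower() for w in text.split()}
    return "polite" if words & {"please", "thanks"} else "neutral"


def generate(samples):
    """Build (original, perturbed) test cases from raw samples."""
    return [(s, abbreviate(s)) for s in samples]


def run(cases, model):
    """A case passes when the perturbation leaves the prediction unchanged."""
    return [(orig, pert, model(orig) == model(pert)) for orig, pert in cases]


def report(results, min_pass_rate=0.65):
    """Summarise pass/fail counts and compare the pass rate to a threshold."""
    passed = sum(ok for _, _, ok in results)
    rate = passed / len(results)
    return {
        "pass_count": passed,
        "fail_count": len(results) - passed,
        "pass_rate": rate,
        "pass": rate >= min_pass_rate,
    }


samples = ["please send the file", "see you tomorrow", "thanks for the help"]
summary = report(run(generate(samples), toy_model))
print(summary)
```

Here the toy model depends on exact keywords, so abbreviating "please" and "thanks" flips two of the three predictions and the test falls below the threshold. A real harness applies the same pattern to an actual model and a much larger dataset.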

📖 Documentation

❤️ Community support

Slack: For live discussion with the NLP Test community, join the #nlptest channel
GitHub: For bug reports, feature requests, and contributions
Discussions: To engage with other community members, share ideas, and show off how you use NLP Test!

We would love to have you join the mission 👉 open an issue, a PR, or give us some feedback on features you'd like to see! 🙌

♻️ Changelog

What's Changed

updated/doc by @Prikshit7766 in #459
docs/Update documentation of models by @RakshitKhajuria in #465
refactor user prompt by @alytarik in #472
Feature/dyslexia swap feature by @ArkajyotiChakraborty in #417
Feature/add support for abbreviation test by @RakshitKhajuria in #471
Hotfix/get rid of some dependencies by @chakravarthik27 in #473
Draft: refactor/perturbations and samples to support QA. by @chakravarthik27 in #460
feature/Add speech to text typo by @Prikshit7766 in #475
hotfix/get rid of inflect dependency and refactoring robustness by @RakshitKhajuria in #478
Added TruthfulQA Dataset by @RakshitKhajuria in #477
feature/Add support for slangificator test by @Prikshit7766 in #463
Dataset/OpenBookQA datasets by @Prikshit7766 in #479
Datasets/MMLU Datasets by @Prikshit7766 in #481
Docs/update model hub-summarization nb-readme by @RakshitKhajuria in #480
Hotfix/fixed some tests and refactored number_to_word.py by @RakshitKhajuria in #483
Dataset/quac dataset by @Prikshit7766 in #484
Feature/dyslexia swap test by @alytarik in #474
Feature/hellaswag dataset by @alytarik in #486
Feature/narrativeqa dataset by @alytarik in #487
Feature/create toxicity test 438 by @chakravarthik27 in #488
hot-fix/fix-slangify-test by @RakshitKhajuria in #489
DRAFT : Docs/update nb and docs by @RakshitKhajuria in #490
Update datasets by @RakshitKhajuria in #493
Fix/toxicity by @chakravarthik27 in #492
Feature/add tutorial nbs by @ArshaanNazir in #497
default toxicity config by @chakravarthik27 in #498
docs/add dataset notebooks by @alytarik in #499
Release/1.4.0 by @ArshaanNazir in #500

New Contributors

@ArkajyotiChakraborty made their first contribution in #417

Full Changelog: v1.3.0...v1.4.0
