Merged pull requests:

- …formance and error handling
- …rness(langtest.py)
- …rompt-handling-for-different-datasets User prompt handling for multi-dataset testing
- …e harness reports over the timestamp.
- …derboard functionality
- …pdate leaderboard functionality
- Bug fix/performance tests
- …-augmentation-allow-access-without-harness-testing
- …-importing-of-edited-testcases-into-harness Refactor: Improved the `import_edited_testcases()` functionality in Harness.
- …pt-techniques Implementation of prompt techniques
- …ization and readability.
- …-benchmark-report Fix: Summary class to update summary dataframe and handle file path
- …allow-access-without-harness-testing Refactor: Improve code organization and readability
- …-benchmark-report Improved: `rank_by` argument added to `harness.get_leaderboard()`
- Website updates
- Updated: langtest version in pip
ArshaanNazir approved these changes on May 15, 2024.
📢 Highlights
John Snow Labs is excited to announce the release of LangTest 2.2.0! This update introduces powerful new features and enhancements to elevate your language model testing experience and deliver even greater insights.
🏆 Model Ranking & Leaderboard: LangTest introduces a comprehensive model ranking system. Use `harness.get_leaderboard()` to rank models based on various test metrics and retain previous rankings for historical comparison.
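The release names `harness.get_leaderboard()` (and, per the PR list above, a `rank_by` argument) as the entry point. Exact signatures aside, the underlying idea — aggregate each model's per-test metrics, then order the models — can be sketched in plain Python; everything below is illustrative and not the langtest API:

```python
# Illustrative sketch of metric-based model ranking (not the langtest API):
# average each model's per-test pass rates, then rank models descending.
from statistics import mean


def rank_models(results: dict[str, dict[str, float]]) -> list[tuple[int, str, float]]:
    """results maps model name -> {test category: pass rate in [0, 1]}."""
    averaged = {model: mean(scores.values()) for model, scores in results.items()}
    ordered = sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, model, score) for rank, (model, score) in enumerate(ordered, start=1)]


results = {
    "model-a": {"robustness": 0.91, "bias": 0.84},
    "model-b": {"robustness": 0.88, "bias": 0.95},
}
leaderboard = rank_models(results)  # model-b averages 0.915 and ranks first
```

Persisting each run's `leaderboard` alongside a timestamp is what makes the "retain previous rankings" comparison possible.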
🔍 Few-Shot Model Evaluation: Optimize and evaluate your models using few-shot prompt techniques. This feature enables you to assess model performance with minimal data, providing valuable insights into model capabilities with limited examples.
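At its core, few-shot prompting just prepends a handful of labelled examples to the query before sending it to the model. A minimal, library-agnostic sketch (the Q/A formatting below is an assumption, not langtest's internal template):

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Prepend labelled (question, answer) pairs to the new question."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    # The trailing "A:" invites the model to complete the answer.
    return f"{shots}\n\nQ: {question}\nA:"


prompt = build_few_shot_prompt(
    [("2 + 2 = ?", "4"), ("3 + 5 = ?", "8")],
    "7 + 6 = ?",
)
```

Varying the number and choice of shots is exactly the knob this feature exposes for judging how a model performs with limited examples.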
📊 Evaluating NER in LLMs: This release extends support for Named Entity Recognition (NER) tasks specifically for Large Language Models (LLMs). Evaluate and benchmark LLMs on their NER performance with ease.
🚀 Enhanced Data Augmentation: The new DataAugmenter module allows for streamlined and harness-free data augmentation, making it simpler to enhance your datasets and improve model robustness.
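"Harness-free" augmentation means perturbing a dataset directly rather than inside a test run. A toy robustness-style perturbation illustrates the idea (names hypothetical, not the `DataAugmenter` API):

```python
import random


def augment_uppercase(samples: list[str], fraction: float = 0.5, seed: int = 0) -> list[str]:
    """Return the originals plus uppercased copies of a random fraction of samples."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    chosen = [s for s in samples if rng.random() < fraction]
    return samples + [s.upper() for s in chosen]


data = ["the cat sat", "on the mat"]
augmented = augment_uppercase(data, fraction=1.0)  # originals plus both uppercased copies
```

Training or re-evaluating on `augmented` rather than `data` is the usual route to the robustness gains the release describes.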
🎯 Multi-Dataset Prompts: LangTest now offers optimized prompt handling for multiple datasets, allowing users to add custom prompts for each dataset, enabling seamless integration and efficient testing.
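Per-dataset prompts amount to keying prompt templates by dataset name and filling the matching template at test time. A minimal sketch of the idea (the dict shape and dataset names are assumptions, not langtest's config schema):

```python
# Hypothetical per-dataset prompt table; langtest's actual config keys may differ.
PROMPTS = {
    "BoolQ": "Answer with True or False.\n{passage}\nQuestion: {question}",
    "NQ-open": "Answer the question concisely.\nQuestion: {question}",
}


def render_prompt(dataset: str, **fields: str) -> str:
    """Pick the template registered for `dataset` and fill in its fields."""
    template = PROMPTS[dataset]
    return template.format(**fields)


p = render_prompt("NQ-open", question="Who wrote Hamlet?")
```

Because lookup happens per dataset, each benchmark in a multi-dataset run gets its own custom prompt without any cross-contamination.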