reliability-checklist is a Python framework (available via CLI) for comprehensively evaluating the reliability of NLP systems.

reliability-checklist accepts any model and dataset as input and facilitates comprehensive evaluation of a wide range of reliability-related aspects such as accuracy, selective prediction, novelty detection, stability, sensitivity, and calibration.
✅ No coding needed
Pre-defined templates are available so you can integrate your models/datasets via the command line only.
✅ Bring Your Own Model (BYoM)
Your model template is missing? We have you covered: check out BYoM to create your own model-specific config file.
✅ Bring Your Own Data (BYoD)
Your dataset template is missing? Check out BYoD to create your own dataset-specific config file.
✅ Reliability metrics
Currently, we support a number of reliability-related aspects:
- Accuracy/F1/Precision/Recall
- Calibration: Reliability Diagram, Expected Calibration Error (ECE), Expected Overconfidence Error (EOE); a standard ECE definition is sketched after this list
- Selective Prediction: Risk-Coverage Curve (RCC), AUC of risk-coverage curve
- Sensitivity
- Stability
- Out-of-Distribution
- Adversarial Attack: model-in-the-loop adversarial attacks to evaluate a model's robustness.
- Task-Specific Augmentations: task-specific augmentations to check reliability on augmented inputs.
- Novelty
- Other Measures: We plan to incorporate other measures such as bias, fairness, toxicity, and faithfulness of models. We also plan to measure the reliability of generative models on crucial parameters such as hallucinations.
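For orientation only, here is the standard textbook definition of the Expected Calibration Error listed under Calibration; it is not necessarily the framework's exact implementation. Predictions are grouped into $M$ confidence bins and the per-bin gap between accuracy and confidence is averaged:

$$\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n}\,\bigl|\mathrm{acc}(B_m) - \mathrm{conf}(B_m)\bigr|$$

where $B_m$ is the set of predictions whose confidence falls into the $m$-th bin, $n$ is the total number of predictions, $\mathrm{acc}(B_m)$ is the accuracy within the bin, and $\mathrm{conf}(B_m)$ is the mean predicted confidence within it.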
✅ Want to integrate more features?
Our easy-to-extend infrastructure allows developers to seamlessly contribute models, datasets, augmentations, and evaluation metrics to the workflow.
# install reliability-checklist from GitHub
pip install git+https://github.com/Maitreyapatel/reliability-checklist
# download the spaCy English model
python -m spacy download en_core_web_sm
# download the NLTK WordNet data
python -c "import nltk;nltk.download('wordnet')"
Evaluate the example model/data with the default configuration
# eval on CPU
recheck
# eval on GPU
recheck trainer=gpu +trainer.gpus=[1,2,3]
Evaluate a model with a chosen dataset-specific experiment configuration from reliability_checklist/configs/task/
recheck task=<task_name>
Specify a custom model_name as shown in the following MNLI example
# if the same model_name is used for the tokenizer as well
recheck task=mnli custom_model="bert-base-uncased-mnli"
# if the tokenizer uses a different model_name
recheck task=mnli custom_model="bert-base-uncased-mnli" custom_model.tokenizer.model_name="ishan/bert-base-uncased-mnli"
# create a config folder structure similar to reliability_checklist/configs/
mkdir ./configs/
mkdir ./configs/custom_model/
# run the following command after creating a new config file at ./configs/custom_model/<your-config>.yaml
recheck task=mnli custom_model=<your-config>
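As a rough, hypothetical sketch of that step (the actual key names should be copied from an existing template under reliability_checklist/configs/custom_model/; the keys and file name below are only inferred from the CLI overrides shown above):

# hypothetical config sketch -- adapt an existing template for the real schema
cat > ./configs/custom_model/my_model.yaml <<'EOF'
model_name: "ishan/bert-base-uncased-mnli"   # assumed key, mirroring the custom_model override above
tokenizer:
  model_name: "ishan/bert-base-uncased-mnli" # assumed key, mirroring custom_model.tokenizer.model_name
EOF
# then point recheck at the new config entry
recheck task=mnli custom_model=my_model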
reliability-checklist supports a wide range of visualization tools. You can go with the default wandb online visualizer; it also generates highly informative plots that are stored in the logs directory.
Any kind of positive contribution is welcome! Please help us grow by contributing to the project.
If you wish to contribute, you can work on any of the features/issues listed here or create one of your own. After adding your code, please send us a Pull Request.
Please read CONTRIBUTING for details on our CODE OF CONDUCT, and the process for submitting pull requests to us.