A method to evaluate the responses of lightweight LLMs to TRUE-FALSE questions
Source Code for the Data Visualization: https://github.com/csisc/BoolV-Analysis.
To Cite the Work: Turki, H., Dossou, B. F. P., Nebli, A., & Valdelli, I. (2025). Evaluating the Behavior of Small Language Models in Answering Binary Questions. In 3rd International Workshop on Generalizing from Limited Resources in the Open World (GLOW@IJCAI 2025).
Evaluated models (with their parameter counts):

Model | Parameters |
---|---|
llama-3.2-1b-instruct-q8_0 | 1.24 B |
llama-3.2-3b-instruct-q8_0 | 3.21 B |
Phi-3.5-mini-instruct.Q8_0 | 3.82 B |
Mistral-7B-Instruct-v0.3.Q8_0 | 7.25 B |
llama-3.2-8b-instruct-q8_0 | 8.03 B |
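As an illustration of how one of the quantized GGUF models above can be queried on a binary question with llama-cpp-python, here is a minimal sketch. The model path and the prompt template are assumptions for illustration, not necessarily the exact setup used in the paper:

```python
try:
    from llama_cpp import Llama  # pip install llama-cpp-python
except ImportError:
    Llama = None  # lets the sketch be read without the library installed

# Hypothetical local path to one of the quantized models listed above.
MODEL_PATH = "models/llama-3.2-1b-instruct-q8_0.gguf"

def ask_true_false(llm, question, passage):
    """Pose a binary question; the prompt wording is an illustrative assumption."""
    prompt = (
        f"Passage: {passage}\n"
        f"Question: {question}\n"
        "Answer with TRUE or FALSE only.\n"
        "Answer:"
    )
    # temperature=0.0 gives greedy decoding, so the verdict is reproducible.
    out = llm(prompt, max_tokens=4, temperature=0.0)
    return out["choices"][0]["text"].strip().upper()

# Usage (requires a downloaded GGUF file):
# llm = Llama(model_path=MODEL_PATH, n_ctx=2048, verbose=False)
# print(ask_true_false(llm, "is the sky blue", "The sky appears blue in daylight."))
```

Restricting the completion to a few tokens and parsing only the leading TRUE/FALSE keeps the comparison across models uniform.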
Dataset: BoolQ (Boolean Questions), https://github.com/google-research-datasets/boolean-questions
- Train set: 9,427 labeled examples.
- Dev set: 3,270 labeled examples.
Python dependencies (pathlib and math ship with the standard library):
- llama-cpp-python
- pathlib
- pandas
- math
- jsonlines
This research was carried out using the computing resources of Wikimedia Switzerland.