- arXiv preprint is coming soon.
Model checkpoints are available on Hugging Face:
- Qwen2.5-Osiris-0.5B-Instruct: `judgmentlabs/Qwen2.5-Osiris-0.5B-Instruct`
- Qwen2.5-Osiris-1.5B-Instruct: `judgmentlabs/Qwen2.5-Osiris-1.5B-Instruct`
- Qwen2.5-Osiris-3B-Instruct: `judgmentlabs/Qwen2.5-Osiris-3B-Instruct`
- Qwen2.5-Osiris-7B-Instruct: `judgmentlabs/Qwen2.5-Osiris-7B-Instruct`
📊 The dataset is available here.
```python
from src.data.perturb_musique import DatasetPerturbator

perturbator = DatasetPerturbator(
    dataset_path="/path/to/your/dataset.jsonl",
    output_dir="/path/to/save/perturbed/dataset",
)
perturbator.perturb()
```
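The perturbator reads and writes JSONL (one JSON object per line), so the output directory can be sanity-checked with a few lines of Python. A minimal sketch, assuming only the JSONL layout; the field names below are placeholders, not the dataset's actual schema:

```python
import json
import tempfile
from pathlib import Path

def load_jsonl(path):
    """Load a JSONL file (one JSON object per line) into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo with a tiny stand-in file; the real perturbed dataset uses the same
# one-object-per-line layout, though its fields will differ.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "perturbed.jsonl"
    rows = [
        {"id": "ex-0", "question": "Who wrote Hamlet?", "perturbed": True},
        {"id": "ex-1", "question": "Capital of France?", "perturbed": False},
    ]
    sample.write_text("\n".join(json.dumps(r) for r in rows), encoding="utf-8")
    loaded = load_jsonl(sample)
    print(len(loaded), loaded[0]["id"])  # → 2 ex-0
```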
```bash
# Navigate to the evaluation directory
cd src/data/evaluation

# Run the RAGTruth benchmark on the models defined in this script.
# This evaluates how well models detect hallucinations in RAG contexts.
bash ragtruth_predict.sh

# If necessary, format the RAGTruth benchmark results into structured JSON files.
# This prepares the data for analysis and visualization.
bash format_predictions.sh

# Calculate and display evaluation metrics:
# Recall, Precision, and F1 scores for hallucination detection.
python show_results.py
```
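The Recall, Precision, and F1 numbers printed by `show_results.py` follow the standard binary-detection formulation. A minimal sketch of that arithmetic (the 0/1 label encoding here is an assumption for illustration, not the repo's actual output schema):

```python
def detection_metrics(gold, pred):
    """Precision, Recall, and F1 for binary labels (1 = hallucinated)."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))  # true positives
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))  # false positives
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# One false positive and one false negative out of four examples:
p, r, f1 = detection_metrics(gold=[1, 0, 1, 1], pred=[1, 1, 0, 1])
print(round(p, 3), round(r, 3), round(f1, 3))  # → 0.667 0.667 0.667
```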
We used LLaMA-Factory for efficient fine-tuning. A sample configuration can be found here.
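For orientation, a LoRA SFT run in LLaMA-Factory is typically driven by a YAML file like the sketch below. Every value here is illustrative — the base model, dataset name, and hyperparameters are assumptions, not the configuration used for the released checkpoints:

```yaml
### model
model_name_or_path: Qwen/Qwen2.5-0.5B-Instruct  # placeholder base model

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: my_hallucination_sft  # hypothetical entry registered in dataset_info.json
template: qwen
cutoff_len: 2048

### train
output_dir: saves/qwen2.5-osiris-0.5b
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
bf16: true
```

Such a file is usually launched with `llamafactory-cli train <config>.yaml`; see the repository's actual sample configuration for the exact settings.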