APPRISE is a lightweight GitHub companion repository for the APPRISE dataset. It contains code, documentation, prompts, persona definitions, validation scripts, mining scripts, and small sample files so researchers can inspect and reuse the project without committing the full dataset payload to GitHub.
The full APPRISE dataset is archived on Zenodo:
https://doi.org/10.5281/zenodo.20091031
Repository:
https://github.com/SoftALL/APPRISE
Hugging Face triplets dataset:
https://huggingface.co/datasets/SoftALL/APPRISE-triplets
APPRISE contains:
- 9,435 GitHub issues
- 13,579 persona-conditioned synthetic reviews
- 73,984 contrastive triplets
- 400 audit samples
- 10 personas
- 4 Android applications: Brave Browser, Signal Android, AnkiDroid, and K-9 Mail
This companion repository includes:
DATASHEET.md: dataset documentation, intended uses, limitations, and maintenance notespersonas.json: persona definitions used for synthetic review generationprompts/generation_prompt_template.txt: prompt template for generating persona-conditioned synthetic reviewsscripts/validate_dataset.py: validation script for checking APPRISE dataset consistencyscripts/mine_hard_negatives.py: script for mining lexical hard-negative candidatesscripts/compute_audit_stats.py: script for summarizing audit scores and distributional fieldsdata_sample/: small sample files for repository inspectionCITATION.cff: citation metadataLICENSE: MIT License
The full dataset files must not be committed to GitHub:
issues.jsonlreviews.jsonltriplets.tsvaudit.csv.ziparchives containing the full dataset
These filenames are listed in .gitignore. Keep the full files local or obtain them from the archived Zenodo dataset.
python -m pip install -r requirements.txtAfter downloading the full APPRISE dataset from Zenodo, run validation by pointing the script to the folder containing the full files:
python scripts/validate_dataset.py --data-dir path/to/apprise/dataTo regenerate small sample files from local full dataset files:
python scripts/validate_dataset.py --data-dir path/to/apprise/data --write-samples data_sample --sample-size 20python scripts/compute_audit_stats.py --audit path/to/apprise/data/audit.csvpython scripts/mine_hard_negatives.py \
--issues path/to/apprise/data/issues.jsonl \
--reviews path/to/apprise/data/reviews.jsonl \
--output hard_negatives.tsvIf you use APPRISE, cite the archived dataset and companion repository.
Zenodo DOI:
https://doi.org/10.5281/zenodo.20091031
See CITATION.cff for machine-readable citation metadata.
APPRISE includes attribution and provenance notes for BugRMSys-derived Brave Browser issue metadata. Please preserve those notes in downstream reuse.
APPRISE is released under the MIT License. See LICENSE.
-
Ogtay Hasanov, King Fahd University of Petroleum and Minerals
Email: g202417720@kfupm.edu.sa -
Saad Ezzini, King Fahd University of Petroleum and Minerals
Email: saad.ezzini@kfupm.edu.sa