This project classifies key aspects of criminal cases within the Israeli legal framework. It uses a few-shot learning approach to classify verdict sentences that are relevant to sentencing decisions.
- Code: Implements few-shot learning approaches for sentence classification.
- Data: Utilizes two datasets: one developed in collaboration with criminal law experts from the Israeli Ministry of Justice, focusing on key aspects of criminal cases, and another generated by ChatGPT and refined for this specific task.
- Results: Presents performance evaluations of the classification methods on weapon-related and drug-related cases, along with sample outputs from ChatGPT's automated sentence tagging.
The project provides two methodologies located in the code directory. Each methodology includes its own README file with detailed execution instructions.
The data directory includes the following subdirectories:
- `tagged_data_manually`: This folder contains two subdirectories, `drugs` and `weapons`. Each of these directories includes:
  - `train.csv` / `test.csv` / `eval.csv`: Predefined data splits of verdict sentences for training, testing, and evaluation.
  - `agreement.csv`: Contains statistics on inter-annotator agreement for the manual tagging process.
- `tagged_data_auto`: This folder contains two subdirectories, `drugs` and `weapons`. Each of these directories includes:
  - `results_batch_LABEL_fewshot.csv`: A dataset of 10,000 sentences automatically tagged by ChatGPT, categorized by labels (`LABEL`).
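The directory layout above can be read with the Python standard library alone. The following is a minimal sketch, not part of the project's code: the helper names `load_splits` and `auto_tagged_files` are hypothetical, the files are assumed to be UTF-8 encoded with a header row, and no column names are assumed.

```python
import csv
from pathlib import Path

SPLITS = ("train", "test", "eval")
DOMAINS = ("drugs", "weapons")


def load_splits(data_root, domain):
    """Read train/test/eval CSVs for one domain into lists of row dicts.

    Hypothetical helper: assumes UTF-8 files whose first row is a header.
    """
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    base = Path(data_root) / "tagged_data_manually" / domain
    splits = {}
    for name in SPLITS:
        # utf-8-sig also accepts files that start with a byte-order mark.
        with open(base / f"{name}.csv", newline="", encoding="utf-8-sig") as f:
            splits[name] = list(csv.DictReader(f))
    return splits


def auto_tagged_files(data_root, domain):
    """Map each label to its results_batch_<label>_fewshot.csv path."""
    base = Path(data_root) / "tagged_data_auto" / domain
    prefix, suffix = "results_batch_", "_fewshot.csv"
    return {
        # Strip the fixed prefix and suffix; what remains is the label.
        p.name[len(prefix):-len(suffix)]: p
        for p in sorted(base.glob(f"{prefix}*{suffix}"))
    }
```

For example, `load_splits("data", "drugs")["eval"]` would return the rows of `data/tagged_data_manually/drugs/eval.csv`, and `auto_tagged_files("data", "weapons")` would list whichever label files are present for the weapons domain.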
The result files are located in the results directory:
- `drugs_setfit&gpt_experiments.csv`: Contains evaluation results on `eval.csv`, located at `data/tagged_data_manually/drugs/eval.csv`.
- `verification_drugs_auto.csv`: Contains results for a sample of sentences from `data/tagged_data_auto/drugs/results_batch_LABEL_fewshot.csv`.
- `verification_weapon_auto.csv`: Contains results for a sample of sentences from `data/tagged_data_auto/weapons/results_batch_LABEL_fewshot.csv`.
- `wep_setfit&gpt_experiments.csv`: Contains evaluation results on `eval.csv`, located at `data/tagged_data_manually/weapons/eval.csv`.
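The relationship between each result file and the data it reports on can be written down as a small lookup table. This is an illustrative sketch only: `RESULT_SOURCES` and `missing_result_files` are hypothetical names, and `LABEL` is kept as the literal placeholder used in this README rather than being filled in.

```python
from pathlib import Path

# Result file name -> data file it reports on, as listed above.
# LABEL is the placeholder from this README, not a real file name.
RESULT_SOURCES = {
    "drugs_setfit&gpt_experiments.csv":
        "data/tagged_data_manually/drugs/eval.csv",
    "verification_drugs_auto.csv":
        "data/tagged_data_auto/drugs/results_batch_LABEL_fewshot.csv",
    "verification_weapon_auto.csv":
        "data/tagged_data_auto/weapons/results_batch_LABEL_fewshot.csv",
    "wep_setfit&gpt_experiments.csv":
        "data/tagged_data_manually/weapons/eval.csv",
}


def missing_result_files(repo_root):
    """Return the expected result files that are absent from results/."""
    results_dir = Path(repo_root) / "results"
    return [name for name in RESULT_SOURCES
            if not (results_dir / name).is_file()]
```

Two of the file names contain `&`, so they need quoting when passed on a shell command line; building paths with `pathlib` as above avoids that issue.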
- Language: Hebrew
- License Type: OpenRAIL
- Resource Type: Corpora, Models and Tools
- Model Sub Category: Pre-trained language models
- Corpora Sub Category: Annotated Dataset
- Task: Text Classification