By janniss91
The objective of this project is to investigate whether Transformer models make use of selectional preferences when making predictions. E.g. there is a strong association between the words eat and spaghetti if spaghetti is an object. However, there is a weak association if it is a subject.
The question is whether Transformers make use of the same kind of information for their predictions.
For the setup of this repository simply type:
make
This will
- set up a virtual environment for this repository,
- install all necessary project dependencies.
Make sure that Python 3.7 or higher is installed in your virtual environment. Otherwise, there might be problems installing the transformers library.
After having run the make command you will have installed a virtual environment.
Always work in this environment to make sure to use the correct python interpreter and have access to the relevant dependencies.
To enter the environment in a shell, type:
. env/bin/activate
If you work in an IDE like e.g. Pycharm, make sure that it also makes use of the correct Python interpreter.
To deactivate the virtual environment, type:
deactivate
To reset the repository to its inital state, type:
make dist-clean
This will remove the virtual environment and all dependencies.
With the make command you can re-install them.
To remove temporary files like .pyc or .pyo files, type:
make clean
You can run the pipeline of the whole repository very easily with just one command:
./scripts/pipeline.sh your-input-file.tsv language search_term
your_input_file: a tsv gold file from the SORTS repository
language: german or dutch
search_term: name for sentence type (e.g. vlight or base-acc)
This includes the following steps:
- Filter specific sentence type.
- Predict the next word for the filtered sentences with all models.
- Generate statistics for the predictions that were just made for this sentence type.
- Generate the files with binary information about whether the correct prediction was found.
- Test significance for all models.
The data that was used in this project comes from the SORTS project (https://github.com/DiveFish/SORTS).
More specifically the german_part-ambiguous_gold.tsv file was filtered for sentence entries with light verbs.
The filtering was done by using the script data/data_prep_tools/filter_data.py.
The data can be found in the data folder; it is called filtered_vlight_data.tsv.
For dutch the dutch_part-ambiguous_gold.tsv file was filtered for sentence entries with light verbs.
To regenerate the filtered data, simply run:
python3 data/data_prep_tools/filter_data.py path_to_sorts_data.tsv data/filtered_file.tsv language search_term
The file names must be adapted to your case.
Predictions have been done for the following transformer models:
- xlm-roberta-base
- xlm-roberta-large
- bert-base-german-dbmdz-cased
- bert-base-german-dbmdz-uncased
- bert-base-german-cased
- bert-base-multilingual-uncased
- bert-base-multilingual-cased
Predictions have been done for the following transformer models:
- xlm-roberta-base
- xlm-roberta-large
- wietsedv/bert-base-dutch-cased
- bert-base-multilingual-uncased
- bert-base-multilingual-cased
To make a prediction for a single of these models, run e.g.:
python3 scripts/selective_preferences.py data/filtered_file.tsv -m xlm-roberta-base -o predictions/name_of_output_file.json
If you do not want to store the results in a json file but print the output, just remove the -o flag and output file name from the command.
To make predictions for all models, run:
./scripts/predict_all_models_pipeline.sh data/filtered_file.tsv language search_term
To set up prediction statistics, run e.g.:
python3 scripts/generate_stats.py data/filtered_file.tsv language search_term
To create the binary file for one model, run:
python3 binary_output_per_sentence.py statistics/binary_results/prediction_file.json data/filtered_file.tsv binary_output_file
You can also create the binary files for all models at once.
./write_all_binary_files.sh data/filtered_file.tsv language search_term
You can test superiority of one model output over the other by running a significance test. To do this, run:
./scripts/significance_test_all_files.sh language search_term
The output can be found in the file statistics/significance_test_results.tsv.
It is split up into four columns: model1, model2 and p-value, is_significant.
Usually a value under 0.05 indicates a significant result.
Everything above is considered insignificant.
Note that the script scripts/testSignificance.py has been taken from this repository: https://github.com/rtmdrr/testSignificanceNLP.git
It has been amended (drastically shortened) for the purposes of this project and only makes use of the Wilcoxon significance test.
IMPORTANT: The script was written in Python 2.7. It can be run with python 3.8 but if you encounter any problems, try running it with a python 2.7 interpreter outside of the virtual environment (make sure the library scipy is installed).