To install the Python packages, run the following command in the PyCharm terminal:
pip install -r requirements.txt
The first few lines of every function contain a comment describing its purpose.
The output structure is as follows:
- BM25/
- DIRICHLET/
- IF-ROCCHIO-MODEL/
- TF-IDF/
- PROBABALISTIC_MODEL/
- EResult1.dat
- EResult2.dat
- EResult3.dat
- EResult4.dat
- EResult5.dat
Items 1-5 are folders and items 6-10 are .dat files.
Each folder contains one file per training set (50 in total). Each file holds the scores of the corresponding query for every document in that training set.
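The sketch below shows one way the per-training-set score files in a model folder might be loaded and ranked. It is a hypothetical helper, not part of this project. It assumes that each line of a score file has the form "<document_id> <score>", which this README does not confirm. The function names load_model_scores and rank_documents are illustrative only.

```python
from pathlib import Path


# ASSUMPTION: each line of a score file is "<document_id> <score>",
# separated by whitespace. Adjust the parsing if the real files differ.
def load_model_scores(model_dir):
    """Return {file_stem: {doc_id: score}} for every file in one model folder."""
    scores = {}
    for path in sorted(Path(model_dir).iterdir()):
        if not path.is_file():
            continue
        doc_scores = {}
        for line in path.read_text().splitlines():
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            doc_scores[parts[0]] = float(parts[-1])
        scores[path.stem] = doc_scores
    return scores


def rank_documents(doc_scores):
    """Return document ids sorted from highest to lowest score."""
    return sorted(doc_scores, key=doc_scores.get, reverse=True)
```

For example, load_model_scores("BM25") would return one dictionary per training-set file in the BM25/ folder. That result holds only under the assumed line format.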
The .dat files contain the following scores:
- EResult1.dat: Information Filtering Rocchio model
- EResult2.dat: BM25 baseline model
- EResult3.dat: Dirichlet smoothing model
- EResult4.dat: TF-IDF model
- EResult5.dat: Information Filtering Probabilistic model
from scipy import stats
SciPy is the only additional package used alongside NumPy. Its stats module computes the t-test scores used to decide whether the null hypothesis can be rejected.
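The sketch below shows how such a significance test could be run with scipy.stats. It assumes a paired (related-samples) t-test between two models, with each model evaluated on the same 50 training sets. This README does not say which t-test variant or which evaluation metric the project uses, so treat those choices as assumptions. The helper name compare_models is hypothetical.

```python
import numpy as np
from scipy import stats


def compare_models(scores_a, scores_b, alpha=0.05):
    """Paired t-test on per-training-set evaluation scores of two models.

    ASSUMPTION: element i of scores_a and scores_b is the score of each
    model (e.g. average precision) on the same training set i.
    Returns (t_statistic, p_value, reject_null).
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("score arrays must have the same length")
    t_stat, p_value = stats.ttest_rel(a, b)  # two-sided by default
    # Reject the null hypothesis (no mean difference) when p < alpha.
    return float(t_stat), float(p_value), bool(p_value < alpha)
```

A paired test fits this setup because both models are scored on identical training sets, so per-set differences are compared directly. In SciPy 1.6 and later, passing alternative="greater" to stats.ttest_rel gives a one-sided test, which checks whether model A outperforms model B rather than whether the two simply differ.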