Code for the AP Research paper: "A Comparison of Training Models on the Accuracy of Artificial Essay".
To install the required dependencies:
- cd to the project folder
- Run pip install -r requirements.txt
The dataset and GloVe embedding are too large to include in the project. If you want to use them, please download them separately and put them in the ComparisonAES/data directory:
- Training Data
- GloVe Embedding (note: the paper uses the 300d embedding)
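As an illustration of how a downloaded GloVe file can be consumed, here is a minimal sketch (not the project's own loader) that parses the standard GloVe text format (one token followed by its vector components per line) and builds an embedding matrix for a word index. The function names and the word_index mapping are hypothetical; the sketch assumes indices run from 1 to len(word_index), as produced by a Keras-style tokenizer, with row 0 reserved for padding.

```python
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str], dim: int = 300) -> Dict[str, List[float]]:
    """Parse GloVe text-format lines ("word v1 v2 ... vN") into a word -> vector dict."""
    embeddings: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or malformed lines
        # The last `dim` fields are the vector; anything before them is the token
        # (a few tokens in some GloVe releases contain spaces).
        word = " ".join(parts[:-dim])
        embeddings[word] = [float(v) for v in parts[-dim:]]
    return embeddings


def build_embedding_matrix(word_index: Dict[str, int],
                           embeddings: Dict[str, List[float]],
                           dim: int = 300) -> List[List[float]]:
    """Row i holds the GloVe vector of the word with index i; unknown words stay all-zero."""
    # Assumption: indices are 1..len(word_index); row 0 is reserved for padding.
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vector = embeddings.get(word)
        if vector is not None:
            matrix[idx] = vector
    return matrix
```

To use it, open the downloaded file from ComparisonAES/data with encoding="utf-8" and pass the file object to load_glove; the exact file name depends on which GloVe release you download. A matrix like this is typically used to initialize a Keras Embedding layer.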
This project is split into 4 Jupyter notebooks:
- pipeline.ipynb is an easy-to-use pipeline that evaluates models with 5-fold cross-validation
- Feature Selection.ipynb extracts features and generates a *.pkl representation of the dataset for use by the supervised models
- Regression Models.ipynb trains the supervised models from their generated *.pkl representations
- Neural Models.ipynb is used to develop the pipeline for each type of unsupervised network
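The pipeline notebook evaluates models over 5 folds. Assuming standard k-fold cross-validation, the sketch below shows how 5 non-overlapping folds can be formed with only the standard library; the function name is hypothetical and the folds are contiguous (no shuffling). In practice, scikit-learn's KFold with n_splits=5 (optionally shuffle=True) does the same job.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for k contiguous, non-overlapping folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Everything outside the current test fold is used for training.
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size
```

Each sample lands in exactly one test fold, so averaging a metric over the 5 test folds uses every essay once for evaluation.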
These notebooks also use helper Python functions, stored in the ComparisonAES/utils folder:
- customLayers.py contains custom Mean over Time and Attention pooling layers for Keras
- customUtils.py contains helper functions for preprocessing and for building the layers used by the unsupervised networks
- pipeline.py tokenizes individual essays to support the web demo
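To show what the two pooling layers compute, here is a framework-free sketch (not the Keras code in customLayers.py). Mean over Time averages the hidden states of the non-padding timesteps; attention pooling takes a softmax-weighted sum of the hidden states. For brevity the attention scores here are plain dot products with a context vector, which is an assumption: a learned layer would typically pass each state through a trainable projection first.

```python
import math
from typing import List

Vector = List[float]


def mean_over_time(sequence: List[Vector], mask: List[int]) -> Vector:
    """Average the timestep vectors whose mask value is 1, ignoring padding."""
    kept = [vec for vec, m in zip(sequence, mask) if m]
    if not kept:
        return [0.0] * len(sequence[0])  # all padding: return a zero vector
    return [sum(col) / len(kept) for col in zip(*kept)]


def attention_pool(sequence: List[Vector], context: Vector) -> Vector:
    """Softmax-weighted sum of timestep vectors, scored by a dot product with `context`."""
    scores = [sum(h * c for h, c in zip(vec, context)) for vec in sequence]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(sequence[0])
    return [sum(w * vec[d] for w, vec in zip(weights, sequence)) for d in range(dim)]
```

Both functions reduce a variable-length sequence of hidden states to a single fixed-size vector, which is what allows a scoring layer to follow a recurrent or convolutional encoder.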
To start the web demo:
- cd to the ComparisonAES/mysite folder
- Run python manage.py runserver
- Navigate to http://localhost:8000/ in your browser
The server uses models stored in *.h5 format. To use your own models, place them in the ComparisonAES/mysite/evaluator/models/draft folder, naming them [model name]_[prompt ID]_model.h5 for models and [model name]_[prompt ID]_weights.h5 for their weights.
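The naming convention can be expressed as a pair of helpers for preparing or checking files before copying them into the draft folder. This is a hypothetical sketch, not the server's own loading code; rsplit is used so that model names containing underscores still parse, and the prompt ID is kept as a string.

```python
import os
from typing import Tuple

KINDS = ("model", "weights")


def model_filename(model_name: str, prompt_id, kind: str = "model") -> str:
    """Build a name following the [model name]_[prompt ID]_{model|weights}.h5 convention."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}, got {kind!r}")
    return f"{model_name}_{prompt_id}_{kind}.h5"


def parse_model_filename(filename: str) -> Tuple[str, str, str]:
    """Split a file name (or path) into (model name, prompt ID, kind)."""
    stem, ext = os.path.splitext(os.path.basename(filename))
    if ext != ".h5":
        raise ValueError(f"not an .h5 file: {filename}")
    parts = stem.rsplit("_", 2)  # split from the right: model names may contain "_"
    if len(parts) != 3 or parts[2] not in KINDS:
        raise ValueError(f"does not match [model name]_[prompt ID]_{{model|weights}}.h5: {filename}")
    model_name, prompt_id, kind = parts
    return model_name, prompt_id, kind
```

For example, parse_model_filename("lstm_attention_3_weights.h5") returns ("lstm_attention", "3", "weights").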
Here is the schematic of the architecture of the unsupervised models: