FAQ_rank

This repository is currently being updated...

This repository contains the code for:

Gonzalez-Garduno, Ana Valeria; Augenstein, Isabelle; Søgaard, Anders. 2018. A strong baseline for question relevancy ranking. Conference on Empirical Methods in Natural Language Processing (EMNLP) 2018. Brussels, Belgium.

Note: extra data preprocessing has been added, which improves accuracy over the results reported in the paper.

Before Running

The code was written in Python 3.5 and requires Keras (with the TensorFlow backend), Gensim, and NLTK. The NLTK data must also be downloaded. If you have not downloaded it already, run the following in a terminal:

      >> python -c "import nltk; nltk.download()"

The model uses the pretrained GloVe embeddings available at https://nlp.stanford.edu/projects/glove/. Specifically, we use the vectors trained on Wikipedia (glove.6B.50d.txt). We place the embeddings in the folder feature_extraction/, but you can place them anywhere and pass their location to the script extract_features.py.
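For readers unfamiliar with the GloVe text format, the following is a minimal sketch of how such a file can be parsed. It is not taken from extract_features.py; the function name `load_glove` is hypothetical, and the sketch only assumes the standard GloVe layout of one word per line followed by its space-separated vector components.

```python
import io


def load_glove(lines):
    """Parse GloVe-style text lines into a dict mapping word -> list of floats.

    Hypothetical helper for illustration; the repository's own loader may differ.
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            # Skip blank or malformed lines
            continue
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


# Tiny in-memory stand-in for a GloVe file (3 dimensions instead of 50)
sample = io.StringIO("the 0.1 0.2 0.3\ncat -0.5 0.0 0.25\n")
vectors = load_glove(sample)
print(len(vectors["cat"]))
```

To read the real file, the same function could be given an open file handle instead, for example `open("glove.6B.50d.txt", encoding="utf-8")`.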

Extracting queries from XML files

The files used to extract the train, dev and test sets are under QA_data/semEval_data/. To extract the queries and the relevant information for preprocessing, go into the directory feature_extraction/ and run:

      >> python run_queryExtractor.py

The extracted queries and other data will be dumped in QA_data/data_dumps/.

Extracting feature vectors

Once the queries have been extracted from the XML files, run the following script:

      >> python extract_features.py [path_to_glove_embeddings] [path_to_data_dumps]

e.g.  >> python extract_features.py glove.6B.50d.txt /Users/username/FAQ_rank/QA_data/data_dumps

This will create a list of vectors and labels in the following format:

[vectors_train, vectors_dev, vectors_test, labels_train, labels_dev, labels_test]

This will be dumped in the data_dumps folder.
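The six-element layout above can be unpacked as in the sketch below. This assumes the dump is a Python pickle file, which the .p extension of qq_data.p suggests but the README does not confirm; the helper name `load_splits` and the dummy data are hypothetical.

```python
import os
import pickle
import tempfile


def load_splits(path):
    """Load a dump laid out as
    [vectors_train, vectors_dev, vectors_test,
     labels_train, labels_dev, labels_test]
    and group each split's vectors with its labels.

    Assumption: the file is a pickle; adjust if the script writes another format.
    """
    with open(path, "rb") as f:
        data = pickle.load(f)
    if len(data) != 6:
        raise ValueError("expected 6 entries, got %d" % len(data))
    v_train, v_dev, v_test, y_train, y_dev, y_test = data
    return {
        "train": (v_train, y_train),
        "dev": (v_dev, y_dev),
        "test": (v_test, y_test),
    }


# Demonstration with dummy data written to a temporary file
dummy = [[[0.1, 0.2]], [[0.3, 0.4]], [[0.5, 0.6]], [1], [0], [1]]
tmp_path = os.path.join(tempfile.mkdtemp(), "dummy_data.p")
with open(tmp_path, "wb") as f:
    pickle.dump(dummy, f)

splits = load_splits(tmp_path)
print(sorted(splits))
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.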

Training a STL model

Once the vector files are created, you can train and test a single-task (STL) model. Go into the models/ folder and run:

    >> python stl_mlp.py [model_name] [path_to_vectors]

    e.g. >> python stl_mlp.py test_model /Users/username/FAQ_rank/QA_data/data_dumps/qq_data.p
