squad-QA

by Boxiao Pan, Gael Colas and Shervine Amidi, graduate students from Stanford University

This is our final project for the CS224N "Natural Language Processing with Deep Learning" class at Stanford (2019). Our instructor was Prof. Christopher Manning.

Language: Python (PyTorch)

Goal: Question Answering on the updated Stanford Question Answering Dataset, named SQuAD 2.0

Our final model applies a Segment-Based Aggregation method on top of the BiDAF architecture. To improve performance, we enriched the word representation by combining pretrained GloVe embeddings, character-level word embeddings, tag features (POS, NER) and hand-engineered features (EM, TF*IDF). Finally, we designed a new loss term, the "index distance-aware loss". The idea is that, unlike the cross-entropy loss, which only looks at the probability assigned to the true index, the loss should penalize wrong predictions that are far from the true index more heavily than those that are close to it.
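To make the idea behind the distance-aware term concrete, here is a minimal sketch in plain Python. It is not the formulation from our report: the function names, the `weight` parameter and the choice of "expected absolute distance to the true index" as the penalty are all assumptions made for illustration. In a PyTorch model, the same computation would presumably be done on batched tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distance_aware_loss(logits, true_idx, weight=0.1):
    """Cross-entropy plus a penalty that grows with the distance to true_idx.

    Illustrative assumption: the penalty is the expected absolute distance
    between the predicted index and the true index under the softmax
    distribution, scaled by the hypothetical hyperparameter `weight`.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[true_idx])
    expected_distance = sum(p * abs(i - true_idx) for i, p in enumerate(probs))
    return cross_entropy + weight * expected_distance


# Two wrong predictions for a true start index of 0:
near_miss = [0.0, 5.0, 0.0, 0.0, 0.0]  # peak one position away
far_miss = [0.0, 0.0, 0.0, 0.0, 5.0]   # peak four positions away

print(distance_aware_loss(near_miss, 0), distance_aware_loss(far_miss, 0))
```

Both predictions give the same cross-entropy, because the probability assigned to the true index is identical; only the added term distinguishes the near miss from the far miss.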

For more details, please refer to our final report at the root: "cs224n_project_final-report.pdf".

Test results: EM: 61.2 ; F1: 65.0
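For readers unfamiliar with these metrics, the sketch below shows how EM and F1 are typically computed for a single prediction, following the conventions of the official SQuAD evaluation script (lowercasing, stripping punctuation and articles, token-level overlap). The function names are ours and this is not the evaluation code of this repository. Reported scores are averages over the dataset, expressed as percentages.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and ground truth."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # SQuAD 2.0 no-answer case: an empty string stands for "no answer",
    # so the score is 1 only if both sides are empty.
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)
    overlap = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(overlap.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower!", "eiffel tower"))
print(f1_score("in Paris France", "Paris"))
```

When a question has several ground-truth answers, the usual convention is to take the maximum score over them.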

Acknowledgement

The starter code for this project was a custom BiDAF implementation provided by Chris Chute: starter code

Requirements

To create the environment with all the necessary packages, run "conda env create -f environment.yml".

To activate the environment: "conda activate squad".

To use the Stanford CoreNLP tokenizer

Guidelines from: https://github.com/Lynten/stanford-corenlp/blob/master/README.md

Java 1.8+ (Check with command: java -version) (Download Page)

Stanford CoreNLP (Download Page)

Python wrapper version	CoreNLP version
v3.7.0.1, v3.7.0.2	CoreNLP 3.7.0
v3.8.0.1	CoreNLP 3.8.0
v3.9.1.1	CoreNLP 3.9.1
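Once Java and CoreNLP are installed, the wrapper can be used to produce tokens together with their POS and NER tags. The sketch below is an illustration and not code from this repository: `tag_features` and `demo` are hypothetical helpers, and the only assumption about the wrapper is that its `StanfordCoreNLP` object exposes `pos_tag(text)` and `ner(text)`, each returning a list of (token, tag) pairs, as described in the guidelines linked above.

```python
def tag_features(nlp, text):
    """Return (token, POS tag, NER tag) triples for `text`.

    `nlp` is any object with `pos_tag(text)` and `ner(text)` methods that
    return lists of (token, tag) pairs over the same tokenization.
    """
    pos_pairs = nlp.pos_tag(text)
    ner_pairs = nlp.ner(text)
    return [(token, pos, ner) for (token, pos), (_, ner) in zip(pos_pairs, ner_pairs)]


def demo(corenlp_dir):
    """Run the tagger on one sentence.

    `corenlp_dir` is the folder where the CoreNLP download was unpacked
    (a placeholder argument; use the location on your own machine).
    """
    # Imported lazily so the helper above can be used without the wrapper.
    from stanfordcorenlp import StanfordCoreNLP

    nlp = StanfordCoreNLP(corenlp_dir)
    try:
        return tag_features(nlp, "Stanford University is located in California.")
    finally:
        nlp.close()  # shut down the background Java server
```

Calling `demo` with the path to the unpacked CoreNLP folder starts a Java server in the background, so the first call can take a few seconds.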
