PyTorch Machine Reading Comprehension Toolkit

Introduction

The PyTorch Machine Reading Comprehension (PyTorch-MRC) toolkit was rewritten on the basis of the Sogou Machine Reading Comprehension (SMRC) toolkit. It is designed for fast and efficient development of modern machine reading comprehension models, including both published models and original prototypes.

Need Teammates!

I write and maintain the whole project alone, so I hope that friends who like NLP and are interested in MRC will join me in maintaining it. Please contact me by email at yingzq0116@163.com.

Toolkit Architecture

Installation

Quick Start

Modules

data
- vocabulary.py: Vocabulary building, word/char index mapping, and pretrained word embedding construction.
- batch_generator.py: Maps words and tags to indices, wraps them in a PyTorch Dataset, pads variable-length features dynamically, converts all features into tensors, and batches them with a PyTorch DataLoader.
dataset
- squad.py: Dataset reader and evaluator (adapted from the official code) for SQuAD 1.1
examples
- Examples for running different models; the data path must be specified before running them
model
- Base class and subclasses of models; every model should inherit from the base class
- Built-in models such as BiDAF, R-Net and QANet
nn
- attention.py: Attention functions such as BiAttention, Trilinear and MultiHeadAttention
- layers: Commonly used layers for machine reading comprehension, such as VariationalDropout, Highway and PointerNetwork
- recurrent: Special wrappers for LSTM and GRU
- similarity_function.py: Similarity functions for attention, such as dot_product, trilinear, and symmetric_nolinear
- util: Useful utility functions such as sequence_mask, weighted_sum and masked_softmax
utils
- tokenizer.py: Tokenizers that can be used for both English and Chinese
- feature_extractor: Extracting linguistic features used in some papers, e.g., POS, NER, and Lemma
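The batching pipeline described for batch_generator.py (indexed features, a PyTorch Dataset, dynamic padding, and a DataLoader) can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit's actual code: the class MRCDataset, the function pad_collate, the feature keys context_ids/question_ids and the padding index 0 are all hypothetical. Padding happens inside the collate function, so each batch is padded only to its own longest sequence.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

PAD_ID = 0  # hypothetical padding index; a real vocabulary defines its own


class MRCDataset(Dataset):
    """Hypothetical dataset holding already-indexed token ids per instance."""

    def __init__(self, instances):
        # Each instance is a dict of int lists, e.g. {"context_ids": [...], "question_ids": [...]}
        self.instances = instances

    def __len__(self):
        return len(self.instances)

    def __getitem__(self, idx):
        return self.instances[idx]


def pad_collate(batch):
    """Hypothetical collate_fn: pad every feature to the longest sequence in this batch."""
    out = {}
    for key in batch[0]:
        seqs = [torch.tensor(inst[key], dtype=torch.long) for inst in batch]
        # (batch, max_len_in_batch) tensor, filled with PAD_ID after each sequence ends
        out[key] = pad_sequence(seqs, batch_first=True, padding_value=PAD_ID)
        # Keep the true lengths so that a model can build masks later
        out[key + "_len"] = torch.tensor([len(s) for s in seqs], dtype=torch.long)
    return out


if __name__ == "__main__":
    instances = [
        {"context_ids": [4, 8, 15], "question_ids": [16, 23]},
        {"context_ids": [42, 7], "question_ids": [5, 9, 11, 3]},
    ]
    loader = DataLoader(MRCDataset(instances), batch_size=2, shuffle=False, collate_fn=pad_collate)
    batch = next(iter(loader))
    print(batch["context_ids"].shape, batch["question_ids"].shape)
```

Padding per batch rather than to a global maximum length keeps tensors small when most passages are short.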

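The nn utilities named above (sequence_mask, masked_softmax, weighted_sum) are typically implemented along the following lines. This is a hedged sketch assuming batch-first tensors and boolean masks; the signatures shown are illustrative and may differ from the toolkit's util module.

```python
import torch


def sequence_mask(lengths, max_len=None):
    """Return a (batch, max_len) bool mask that is True at real (non-padding) positions."""
    if max_len is None:
        max_len = int(lengths.max())
    positions = torch.arange(max_len, device=lengths.device)
    return positions.unsqueeze(0) < lengths.unsqueeze(1)


def masked_softmax(logits, mask, dim=-1):
    """Softmax over `dim` that gives (near-)zero probability to masked-out positions."""
    # Replace padding logits with the most negative finite value before normalizing
    logits = logits.masked_fill(~mask, torch.finfo(logits.dtype).min)
    return torch.softmax(logits, dim=dim)


def weighted_sum(vectors, weights):
    """Combine (batch, seq_len, dim) vectors with attention weights.

    weights of shape (batch, seq_len) give (batch, dim);
    weights of shape (batch, q_len, seq_len) give (batch, q_len, dim).
    """
    if weights.dim() == 2:
        return torch.bmm(weights.unsqueeze(1), vectors).squeeze(1)
    return torch.bmm(weights, vectors)


if __name__ == "__main__":
    mask = sequence_mask(torch.tensor([3, 1]))
    probs = masked_softmax(torch.zeros(2, 3), mask)
    vectors = torch.arange(12, dtype=torch.float32).view(2, 3, 2)
    print(probs)
    print(weighted_sum(vectors, probs))
```

Filling with the most negative finite value (instead of -inf) avoids NaNs when a row is entirely masked; such a row simply becomes uniform.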
Custom Model and Dataset

Performance

F1/EM score on SQuAD 1.1 dev set

Model	Toolkit implementation (F1/EM)	Original paper (F1/EM)
BiDAF	77.8/68.1	77.3/67.7
R-Net(sogou)	79.0/70.5	79.5/71.1
R-Net(hkust)	78.3/69.8	79.5/71.1
IARNN-Word	-	-
IARNN-hidden	-	-
DrQA	-	78.8/69.5
FusionNet	-	82.5/74.1
QANet	-	82.7/73.6
BERT-Base	-	88.5/80.8
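The F1/EM numbers above follow the SQuAD 1.1 evaluation protocol, whose evaluator squad.py takes from the official code. The sketch below re-creates the per-question logic of that protocol as commonly published (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace, then token-overlap F1 and exact match, each taking the best score over all reference answers). It is an illustration, not a copy of the toolkit's file; dataset-level scores are the per-question averages multiplied by 100.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(s):
    """Lowercase, remove punctuation and articles (a/an/the), and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match_score(prediction, ground_truth):
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def f1_score(prediction, ground_truth):
    """Token-level F1 between a predicted span and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def metric_max_over_ground_truths(metric_fn, prediction, ground_truths):
    """SQuAD questions can have several reference answers; keep the best match."""
    return max(metric_fn(prediction, gt) for gt in ground_truths)


if __name__ == "__main__":
    golds = ["the Eiffel Tower", "Eiffel Tower in Paris"]
    pred = "Eiffel Tower"
    print(metric_max_over_ground_truths(exact_match_score, pred, golds))
    print(metric_max_over_ground_truths(f1_score, pred, golds))
```

For example, predicting "Eiffel Tower in Paris" against the single reference "the Eiffel Tower" gets precision 2/4 and recall 2/2, so F1 = 2/3 while EM = 0.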

F1/EM score on SQuAD 2.0 dev set

F1 score on CoQA dev set

Contact information

For help or issues using this toolkit, please submit a GitHub issue or contact me by email at yingzq0116@163.com.

Additional information

When implementing the MRC models, I did not always reproduce the papers exactly, because some parts of the papers were unclear to me or did not seem to play a decisive role. The changes I made are listed below.

YingZiqiang/PyTorch-MRCToolkit
