To run the code, there are two steps:
- Download `corpus.word2vec` from https://wikipedia2vec.github.io/wikipedia2vec/#pretrained-embeddings and place it under the main folder. (The file is too large to upload to GitHub.)
- Split `geo-multi-question.txt` into `train.txt` and `test.txt`, and store them under the `data` folder.
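The split step above can be sketched as follows. The repository does not state the train/test ratio, so an 80/20 random split is assumed here; paths follow the folder layout described above.

```python
import random

def split_corpus(src="data/geo-multi-question.txt",
                 train_path="data/train.txt",
                 test_path="data/test.txt",
                 train_ratio=0.8, seed=42):
    """Shuffle the question file and write train/test splits.

    The 80/20 ratio and the fixed seed are illustrative assumptions,
    not values taken from the repository.
    """
    with open(src, encoding="utf-8") as f:
        lines = f.readlines()
    random.seed(seed)
    random.shuffle(lines)
    cut = int(len(lines) * train_ratio)
    with open(train_path, "w", encoding="utf-8") as f:
        f.writelines(lines[:cut])
    with open(test_path, "w", encoding="utf-8") as f:
        f.writelines(lines[cut:])
```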
Results: after training on the knowledge, the model scores 44/100 on the test set (unseen questions) and 77/100 on the training set (seen questions). Future work aims to optimize the model further and reach 60/100 on the test set.
Question bank source: http://igeocn.com/igeocn/tiku/tk-1st/igeocn-qa-1.html (see `data/geo-multi-question.txt` for details)

Background knowledge source: https://www.liuxue86.com/gaokao/dilizhishidian/ (see `data/database_org.txt` for details)
The network architecture draws on the ideas of BAMnet, from "Bidirectional Attentive Memory Networks for Question Answering over Knowledge Base".

During knowledge retrieval, both background text and knowledge-base facts are consulted, which partially mitigates the incomplete-KB problem; this idea follows "Improving Question Answering over Incomplete KBs with Knowledge-Aware Reader".
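The core of the BAMnet idea is attention flowing in both directions between the question and the memory (knowledge) slots. The NumPy sketch below illustrates that mechanism only; the function names, shapes, and the plain dot-product scoring are illustrative assumptions and not the repository's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_attention(q, m):
    """q: (Tq, d) question token vectors; m: (Tm, d) memory slot vectors.

    Returns:
      q2m: (Tq, d) per-token summaries of the memory (question attends to memory)
      m2q: (Tm, d) per-slot summaries of the question (memory attends to question)
    """
    scores = q @ m.T                      # (Tq, Tm) similarity matrix
    q2m = softmax(scores, axis=1) @ m     # question-to-memory attention
    m2q = softmax(scores.T, axis=1) @ q   # memory-to-question attention
    return q2m, m2q
```

Attending in both directions lets the question representation be refined by relevant knowledge while the memory slots are re-weighted by the question, which is what distinguishes BAMnet from one-way memory networks.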
Parameter | Value
---|---
sim_num | 5
rnn_size | 100
sigma | 0.45
embedding_dim | 50
max_len | 100
lr | 0.4
reg_factor | 0.5
train_period | 50
batch_size | 256
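The hyperparameters above can be collected in one configuration dict. The key names mirror the table headers; mapping them directly onto the training script's parameters is an assumption.

```python
# Hyperparameters from the table above; names assumed to match the
# training script's expected parameter names.
HPARAMS = {
    "sim_num": 5,
    "rnn_size": 100,
    "sigma": 0.45,
    "embedding_dim": 50,
    "max_len": 100,
    "lr": 0.4,
    "reg_factor": 0.5,
    "train_period": 50,
    "batch_size": 256,
}
```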