Peking University Semantic Computation and Knowledge Retrieval Course Project
Calculate the word similarity using different methods
-
data
MTURK-771.csv: Data set with the ground truth.
text8: The Wikipedia corpus for training Word2vec. (Please add glove.6B.300d.txt to the dataset yourself.)
-
result
corpus: The result and model of word2vec.
web_search: The results of jaccard, overlap, pmi and dice.
wordnet: The results of path, wup, lch, res, lin and jcn.
-
word_similarity.py: Source code.
-
report.pdf: The report of Course Project.
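The web_search results above are named after four standard co-occurrence measures. As a minimal sketch of how such scores can be computed from search hit counts, assuming `hits_p` and `hits_q` are the page counts for each word, `hits_pq` the count for both words together, and `n_pages` an estimate of the total number of indexed pages (all four names are hypothetical; word_similarity.py may define or smooth these measures differently):

```python
import math


def jaccard(hits_p, hits_q, hits_pq):
    """Jaccard coefficient: co-occurrences divided by the size of the union."""
    union = hits_p + hits_q - hits_pq
    return hits_pq / union if union > 0 else 0.0


def overlap(hits_p, hits_q, hits_pq):
    """Overlap coefficient: co-occurrences divided by the smaller count."""
    smaller = min(hits_p, hits_q)
    return hits_pq / smaller if smaller > 0 else 0.0


def dice(hits_p, hits_q, hits_pq):
    """Dice coefficient: twice the co-occurrences over the summed counts."""
    total = hits_p + hits_q
    return 2 * hits_pq / total if total > 0 else 0.0


def pmi(hits_p, hits_q, hits_pq, n_pages):
    """Pointwise mutual information, log base 2.

    Returns 0.0 when any count is zero (an assumption made here to
    avoid taking the logarithm of zero).
    """
    if min(hits_p, hits_q, hits_pq) <= 0:
        return 0.0
    return math.log2((hits_pq * n_pages) / (hits_p * hits_q))
```

Jaccard, overlap and dice stay in [0, 1], while pmi is unbounded and depends on the chosen `n_pages`, so the four scores are not directly comparable with each other.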
Tweet Sentiment Classification (SemEval2017 Task 4 Subtask A)
-
data
glove.6B.300d: Pre-trained word vectors (dimension = 300). (Please add glove.6B.300d.txt to the dataset yourself.)
twitter-2016train-A/twitter-2016dev-A/twitter-2016test-A: Tweets, divided into training, validation and test sets.
-
code: Source code.
-
report.pdf: The report of Course Project.
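The GloVe file has to be added by hand, so it helps to know its layout: plain text, one word per line, followed by its space-separated vector components. A minimal sketch of reading it and comparing two words by cosine similarity is given here; the helper names `parse_glove_lines` and `cosine` are hypothetical and are not taken from the project code:

```python
import math


def parse_glove_lines(lines):
    """Parse GloVe-style text lines ("word v1 v2 ... vd") into a dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Usage, assuming the file was placed under data/ (the path is an assumption):
# with open("data/glove.6B.300d.txt", encoding="utf-8") as f:
#     vectors = parse_glove_lines(f)
# print(cosine(vectors["good"], vectors["great"]))
```

This pure-Python version keeps every vector as a list of floats, which is slow and memory-hungry for the full 300-dimensional file; an array library would be the more practical choice for training.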
Document-based Question Answering task (DBQA)
-
data
hanlp-wiki-vec-zh.txt: Pre-trained word vectors (dimension = 300). (Please add hanlp-wiki-vec-zh.txt to the dataset yourself.)
stop_words.txt: Chinese stop words.
nlpcc-iccpol-2016.dbqa.training-data/nlpcc-iccpol-2016.dbqa.testing-data/test.txt: NLPCC2017DBQA data, divided into training, validation and test sets.
-
code
DBQA_CNN&Attention1: CNN with Attention Model 1.
DBQA_CNN&Attention2: CNN with Attention Model 2.
-
report.pdf: The report of Course Project.