YahooEmbeddings

Code for a project in the Text Mining course (732A81) at Linköping University, fall 2022. See the paper folder for the final report.

Abstract

Pretrained word embeddings are readily available and can be used to build representations of longer pieces of text. This paper investigates three simple strategies for representing paragraphs in a Q/A topic-classification problem, together with several common classifiers. The data is the large-scale Yahoo! Answers dataset, which contains triples of question title, question content, and best answer. The investigated paragraph representations are distributed bag of words (DBOW), mean pooling, and projecting an observation's word embeddings onto their first principal component. The DBOW and mean-pooling representations perform equally well with logistic regression (69% accuracy) and a multilayer perceptron (71-72% accuracy). SVMs with linear and radial-basis-function kernels are also investigated. The best model falls 4 percentage points short of the state of the art on accuracy. Still, the simplicity of these approaches demonstrates the power of pretrained word embeddings and of simple strategies for representing longer pieces of text for topic classification in question-and-answer settings. spaCy word embeddings are used throughout the study.
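
As a rough illustration (not the code from this repository), the sketch below shows how the mean-pooling and first-principal-component representations could be built from spaCy's static word vectors and fed to a scikit-learn classifier. The en_core_web_md model, the toy texts, and the labels are assumptions made for the example; the DBOW representation would be trained separately, e.g. with gensim's Doc2Vec.

```python
# Minimal sketch of two paragraph representations from the abstract,
# assuming spaCy's en_core_web_md model is installed:
#   python -m spacy download en_core_web_md
import numpy as np
import spacy
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

nlp = spacy.load("en_core_web_md")

def mean_pool(text: str) -> np.ndarray:
    """Average the static word vectors of all tokens that have one."""
    doc = nlp(text)
    vectors = [t.vector for t in doc if t.has_vector]
    return np.mean(vectors, axis=0) if vectors else np.zeros(nlp.vocab.vectors_length)

def first_pc(text: str) -> np.ndarray:
    """Use the first principal component of the token-vector matrix as the
    paragraph vector (one reading of the PCA strategy described above)."""
    doc = nlp(text)
    vectors = np.array([t.vector for t in doc if t.has_vector])
    if len(vectors) < 2:  # PCA needs at least two token vectors
        return mean_pool(text)
    return PCA(n_components=1).fit(vectors).components_[0]

# Hypothetical usage: toy texts and labels stand in for the concatenated
# Yahoo! Answers triples and their topic classes.
texts = ["How do vaccines work?", "Who won the 1998 World Cup?"]
labels = [0, 1]
X = np.vstack([mean_pool(t) for t in texts])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
```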

Data

Data is available on Google Drive.

See the official repo for the data and the corresponding paper.
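
A minimal loading sketch, assuming the headerless CSV layout of the Yahoo! Answers release (class index followed by the three text fields) and the file name train.csv:

```python
import pandas as pd

# Assumed column order for the headerless Yahoo! Answers CSV release:
# class index, question title, question content, best answer.
columns = ["label", "title", "content", "answer"]
train = pd.read_csv("train.csv", header=None, names=columns)

# Concatenate the triple into one paragraph per observation.
train["text"] = (
    train[["title", "content", "answer"]].fillna("").agg(" ".join, axis=1)
)
```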
