Paper and related materials for Spirling & Rodriguez (2019) word embeddings overview and assessment

Word Embeddings: What works, what doesn’t, and how to tell the difference for applied research

Paper and related materials for Spirling & Rodriguez (2019). The abstract for the paper is as follows:

We consider the properties and performance of word embeddings techniques in the context
of political science research. In particular, we explore key parameter choices—including
context window length, embedding vector dimensions and the use of pre-trained vs locally
fit variants—in terms of effects on the efficiency and quality of inferences possible with these
models. Reassuringly, with caveats, we show that results are robust to such choices for political
corpora of various sizes and in various languages. Beyond reporting extensive technical findings,
we provide a novel crowd-sourced “Turing test”-style method for examining the relative
performance of any two models that produce substantive, text-based outputs. Encouragingly,
we show that popular, easily available pre-trained embeddings perform at a level close to, or
surpassing, both human coders and more complicated locally-fit models. For completeness,
we provide best practice advice for cases where local fitting is required.

You can find the paper here and an FAQ for the project here.

Comments are very welcome!
