Political Vector Projector
Given word vectors trained by word2vec (Mikolov et al. 2013) or fastText (Bojanowski et al. 2016), this program projects the vectors of U.S. senators onto a "conservative" to "liberal" axis. The scalar components of such projections may be interpreted as a valid metric of political ideology.
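As a rough illustration of the idea, here is a minimal sketch of a scalar projection onto an ideology axis. It assumes the axis is the difference between the "conservative" and "liberal" word vectors, with "conservative" as the positive direction; the repo's actual definition may differ. The function names and the toy 3-d vectors are invented for this example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def scalar_projection(vec, pos_pole, neg_pole):
    """Scalar component of `vec` along the axis pointing from neg_pole to pos_pole."""
    axis = [p - n for p, n in zip(pos_pole, neg_pole)]
    return dot(vec, axis) / math.sqrt(dot(axis, axis))


# Toy vectors, invented for illustration only (real embeddings have 100+ dimensions).
conservative = [1.0, 0.0, 0.0]
liberal = [-1.0, 0.0, 0.0]
senator_a = [0.6, 0.3, 0.1]
senator_b = [-0.4, 0.5, 0.2]

score_a = scalar_projection(senator_a, conservative, liberal)
score_b = scalar_projection(senator_b, conservative, liberal)
print(score_a, score_b)
```

Under this assumed sign convention, a larger score means the senator's vector points more toward the "conservative" pole.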
Plotting the vector-projected ideology against DW-NOMINATE, an ideology metric widely used in political science, reveals a strong correlation:
| Training Corpus | Avg. Pearson’s r | Avg. Spearman’s ρ |
|---|---|---|
| NYT 1981 - 2016 | 0.7559 | 0.7602 |
| Wash Post 1977 - 2007 | 0.7902 | 0.8003 |
| WSJ 1997 - 2017 | 0.7205 | 0.7184 |
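For readers unfamiliar with the two statistics, the sketch below computes Pearson's r and Spearman's ρ for a pair of score lists using only the standard library. It is not the repo's evaluation code, and the numbers are made up for illustration; they are not real projections or DW-NOMINATE scores.

```python
import math


def pearson(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical scores for five senators (invented for illustration).
projected = [0.62, -0.41, 0.15, -0.70, 0.33]
dw_nominate = [0.45, -0.30, 0.20, -0.55, 0.05]

r = pearson(projected, dw_nominate)
rho = spearman(projected, dw_nominate)
print(r, rho)
```

Pearson's r measures linear agreement between the raw scores, while Spearman's ρ only asks whether the two metrics order the senators the same way.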
Besides members of Congress, you can also project the vectors of public policies. These results are quite amusing but still highly experimental. For a detailed account, please refer to here.
Gensim (optional, only needed for the experimental feature of projecting public policies.)
The DW-NOMINATE ideology data is available at voteview.com. Some example data is already included in this repo.
I apologize that I have tested the code only with Python 3.6.
There are two methods for loading vectors into PoliVec Projector:
First Method: Use the Word2VecProjector class to read vector files generated by word2vec. Call evaluate_ideology_projection() to evaluate a single congressional session of ideology data. Call multiyear_evaluation() and pass an iterator of session numbers, e.g. multiyear_evaluation(cgrs_sess=range(97,115)), to evaluate multiple years of data. The IPython notebook includes several examples that will help you get started.
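If the session numbers are unfamiliar, this small sketch shows which Congresses and calendar years a range such as range(97, 115) covers. The helper congress_years is hypothetical and not part of this repo; it relies only on the fact that the 1st Congress convened in 1789 and each Congress lasts two years.

```python
def congress_years(session):
    """Return the (start, end) calendar years of the given numbered Congress."""
    start = 1789 + 2 * (session - 1)
    return start, start + 2


# range() excludes its upper bound, so this covers the 97th through 114th Congress.
sessions = range(97, 115)
span = {s: congress_years(s) for s in sessions}

for s in (97, 114):
    print(s, span[s])
```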
Second Method (seemingly more complicated, but more efficient for comparing multiple years of data): Provide a plain-text list of the words you want to query, along with the axes onto which you want to project them. The axis words come first, in the order [positive x axis, negative x axis, positive y axis, negative y axis]. An example queries.txt looks like this:
conservative liberal good bad johnson nixon carter reagan etc.
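To make the layout concrete, here is a minimal sketch that splits such a list into its four axis words and the remaining query words. It assumes the words are separated by whitespace (spaces or newlines); parse_queries and the dictionary keys are hypothetical names, not part of this repo.

```python
def parse_queries(text):
    """Split a query list into the four axis words and the remaining query words."""
    words = text.split()  # handles both space- and newline-separated lists
    if len(words) < 4:
        raise ValueError("expected at least four axis words")
    axes = {
        "x_pos": words[0],
        "x_neg": words[1],
        "y_pos": words[2],
        "y_neg": words[3],
    }
    return axes, words[4:]


sample = "conservative liberal good bad johnson nixon carter reagan"
axes, queries = parse_queries(sample)
print(axes)
print(queries)
```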
If you are interested in members of Congress, the gen_queries.py script in this repo can take care of this step for you. The name lists of the 95th - 114th Senate (1977 - 2017) are also already included in this repo at
gen_vectors.sh, which takes multiple lists of queries and feeds them to fastText's print-word-vectors function. Be sure to revise the directories specified in the shell script so that it loads your own pre-trained fastText models. The vectors of members of the 95th to 114th Senate are also already included in this repo at
Lastly, create a FastTextProjector object to load the queried vectors, then call
(In principle, you can use PoliVec Projector with any word embedding model, so long as you write a subclass and tweak the file I/O methods to load your vectors properly.)
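As a starting point for loading vectors from another source, the sketch below reads the common plain-text layout of one word per line followed by its vector components, which is what fastText's print-word-vectors emits. It also skips an optional "vocab_size dimension" header line of the kind found in word2vec text files. read_text_vectors is a hypothetical standalone helper; how it would plug into a projector subclass depends on this repo's base class, which is not shown here.

```python
import io


def read_text_vectors(handle):
    """Read 'word v1 v2 ... vn' lines from a text stream into a dict of float lists."""
    vectors = {}
    for line_no, line in enumerate(handle):
        parts = line.split()
        # Skip an optional "vocab_size dimension" header on the first line.
        if line_no == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        # Skip blank or malformed lines that carry no vector components.
        if len(parts) < 2:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Toy data, invented for illustration only.
sample = io.StringIO("2 3\nreagan 0.1 0.2 0.3\ncarter -0.1 0.0 0.4 \n")
vectors = read_text_vectors(sample)
print(vectors)
```

With a real file, pass an open file object instead of the StringIO, for example inside a `with open(path, encoding="utf-8") as handle:` block.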