Unsupervised learning of political ideology by word vector projections.
dw-nominate
queried_vectors
queries
.gitignore
PoliVec_Projector.ipynb
README.md
gen_queries.py
gen_vectors.sh

Political Vector Projector

Given word vectors trained by word2vec (Mikolov et al. 2013) or fastText (Bojanowski et al. 2016), this program projects the vectors of U.S. senators onto a "conservative" to "liberal" axis. The scalar components of such projections may be interpreted as a valid metric of political ideology.
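
The projection described above can be sketched in a few lines. This is a minimal illustration, not the code in this repo: it assumes the axis is the difference between the "conservative" and "liberal" vectors, and the function name `scalar_projection` and the toy vectors are hypothetical. Plain Python is used to keep the sketch dependency-free.

```python
import math

def scalar_projection(vec, pos_pole, neg_pole):
    """Scalar component of `vec` along the axis running from neg_pole to pos_pole.

    Assumption: the axis is defined as (pos_pole - neg_pole). The repo may
    define or normalize it differently.
    """
    axis = [p - n for p, n in zip(pos_pole, neg_pole)]
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0.0:
        raise ValueError("the two pole vectors are identical; the axis is undefined")
    return sum(v * a for v, a in zip(vec, axis)) / norm

# Toy 3-dimensional vectors, made up for illustration only.
conservative = [1.0, 0.0, 0.0]
liberal = [-1.0, 0.0, 0.0]
senator_a = [0.6, 0.3, 0.1]
senator_b = [-0.4, 0.5, 0.2]

score_a = scalar_projection(senator_a, conservative, liberal)
score_b = scalar_projection(senator_b, conservative, liberal)
print(score_a, score_b)
```

With this sign convention, a larger score means the vector lies closer to the "conservative" end of the axis, and a negative score means it lies toward the "liberal" end.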

Learn more about this project here. See the IPython notebook (PoliVec_Projector.ipynb) for complete experiment results.

Highlight

Plotting the vector-projected ideology against DW-NOMINATE, an ideology metric widely used in political science, reveals a strong correlation:

| Training Corpus | Avg. Pearson's r | Avg. Spearman's ρ |
| --- | --- | --- |
| NYT 1981 - 2016 | 0.7559 | 0.7602 |
| Wash Post 1977 - 2007 | 0.7902 | 0.8003 |
| WSJ 1997 - 2017 | 0.7205 | 0.7184 |
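
For readers unfamiliar with the two statistics reported above, here is a minimal sketch of how Pearson's r and Spearman's ρ can be computed between projected scores and DW-NOMINATE scores. The helper names and the five data points are made up for illustration. The repo's own evaluation code may differ, for example by using a statistics library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical scores for five senators (not real data).
projected = [0.62, 0.35, -0.10, -0.48, 0.05]
dw_nominate = [0.55, 0.20, -0.25, -0.40, -0.05]

r = pearson_r(projected, dw_nominate)
rho = spearman_rho(projected, dw_nominate)
print(round(r, 3), round(rho, 3))
```

Pearson's r measures linear agreement between the two scores, while Spearman's ρ only asks whether the two metrics order the senators the same way.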

In addition to members of Congress, you can also project vectors of public policies. These results are quite amusing but still highly experimental. Again, for a detailed account, please refer to the project write-up here.

Requirements

fastText or word2vec

Gensim (optional; only needed for the experimental feature of projecting public policies)

The DW-NOMINATE ideology data is available at voteview.com. Some example data is already included in this repo.

I apologize that I have tested the code only with Python 3.6.

How-To

There are two methods for loading vectors into PoliVec Projector:

First Method: Use the Word2VecProjector class to read vector files generated by word2vec. Call evaluate_ideology_projection() to evaluate a single congressional session of ideology data. Call multiyear_evaluation() and pass an iterable of session numbers, e.g. multiyear_evaluation(cgrs_sess=range(97,115)) for the 97th through 114th sessions, to evaluate multiple years of data. The IPython notebook includes several examples that will help you get started.

Second Method: (seemingly more complicated, but more efficient for comparing multiple years of data) Provide a plain text list of words you want to query, along with the axes onto which you want to project. The axes follow the order of: [positive x axis, negative x axis, positive y axis, negative y axis]. An example queries.txt looks like this:

conservative
liberal
good
bad
johnson
nixon
carter
reagan
etc.

If you are interested in members of Congress, the gen_queries.py script in this repo can take care of this step for you. The name lists of the 95th - 114th Senate (1977 - 2017) are also already included in this repo at queries/.
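
If you want to query words other than members of Congress, a query file in the layout shown above can be written by hand or with a few lines of Python. The sketch below is not gen_queries.py; the helper `write_queries` and the file name are hypothetical, and it only assumes the layout described in this README: four axis words first, then one query word per line.

```python
import tempfile
from pathlib import Path

def write_queries(path, axes, words):
    """Write a query file: the four axis words first, then the query words.

    `axes` must be ordered as
    [positive x axis, negative x axis, positive y axis, negative y axis].
    """
    if len(axes) != 4:
        raise ValueError("expected exactly four axis words")
    lines = list(axes) + list(words)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines

axes = ["conservative", "liberal", "good", "bad"]
words = ["johnson", "nixon", "carter", "reagan"]

# Written to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "queries_example.txt"
    write_queries(out, axes, words)
    content = out.read_text(encoding="utf-8").splitlines()

print(content)
```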

Then, run gen_vectors.sh, which takes multiple lists of queries and feeds them to fastText's print-word-vectors function. Be sure to revise the directories specified in the shell script so that it loads your own pre-trained fastText models. The vectors of members of the 95th to 114th Senate are also already included in this repo at queried_vectors/.

Lastly, create a FastTextProjector object to load the queried vectors, then call evaluate_ideology_projection() or multiyear_evaluation().
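
To inspect a queried vector file yourself, the sketch below parses it into a dictionary. It is not the FastTextProjector implementation. It assumes the usual output of print-word-vectors, one word per line followed by its space-separated components, and the function name `parse_vector_lines` and the sample lines are made up for illustration.

```python
def parse_vector_lines(lines):
    """Map each word to its vector, given lines of the form 'word v1 v2 ... vn'.

    Assumption: whitespace-separated fields with the word in the first field.
    Lines without any vector components are skipped.
    """
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

# Made-up sample in the assumed format; real vectors have many more dimensions.
sample = [
    "conservative 0.12 -0.40 0.33",
    "liberal -0.25 0.18 0.07",
    "",
    "reagan 0.30 -0.22 0.15",
]

vectors = parse_vector_lines(sample)
print(sorted(vectors))
```

To read an actual file, pass an open file object, for example `parse_vector_lines(open(path, encoding="utf-8"))`, where `path` points at one of your own queried vector files.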

(In principle, you can use PoliVec Projector with any word embedding model, so long as you make a subclass and tweak the file I/O methods to load your vectors properly.)

License

MIT