Measuring the most prominent people on Wikipedia
Switch branches/tags
Nothing to show
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


Support files for TIME's ranking of the prominent people on Wikipedia.

Data was collected over several days in May using node-wikipedia, a Node.js module maintained by @wilson428.

We considered eight data points for each entry:

  • Number of words
  • Number of links to other Wikipedia pages
  • Number of external links (which are typically references)
  • Number of categories the person is in
  • Total number of revisions to the page
  • Number of unique individuals who have edited the page as a signed-in editors
  • Number of anonymous edits
  • Number of vandalisms, as identified in editing notes

Data for the top 100,000-or-so people is available as a 15MB CSV file.


Using out-of-the-box R functions, we reduced these eight variables to their principal components (using this handy guide). As you can see, a huge amount of the variance is contained in the first PC:


You can rerun the principal component analysis like so:

RScript wikipedia.r

(This may require installing the relevant libraries first).

By trial and error, the ranking that most satisfied our anecdotal sense for "influence" in the real world was PC1 + PC2, which becomes the score for each person.