## Project 05: Wikipedia Page Histories

- **Due**: (essay and project) Tuesday, 4 December 2018; 12:00pm
- **Total Points**: 350
    - selected and described a coherent corpus for analysis, 40 points
    - completed a thoughtful, well-written essay on your topic, 320 points
    - oral presentation is clear and rehearsed, 40 points
    
This project serves as the "final" project for the semester. It has several
components that we will build up to through the final week of classes. You must
again create a set of interactive webpages, as in Projects 3 and 4, but this
time the set should be smaller and more focused. At the same time, your analysis
will also include information about page histories. 

Here are the steps that you should take, roughly in order, for the project:

1. Run my sample code below exactly as-is. This will create a page for the page
histories of six well-known philosophers. Click around and see all of the information
that is shown for the pages. Note that I have manually listed the pages that I want
to include in a text file.
2. Select a topic that you want to study for this analysis, which should contain
roughly 30 to 200 pages. Start by filling in just a few links and doing
another test run. Look at the pages and think about what information is captured
and shown.
3. Finish filling in the links and have Python pull all of the data. You should
now have a reasonable-looking corpus to study.
4. Now, write a short essay about the collection, the methods you used, and
interesting patterns. Connect this with some other literature on your topic of 
interest and include citations. Note that this is the largest component of your
grade! I expect somewhere around 1000-1500 words of content. Please write this in
either LaTeX or Word. 
5. In class on December 4th, the website and essay are due. That day we will
put your essay into the index page of your site and show you how to publish your
results.
6. On the last day of class, December 6th, you will briefly present your site to
the rest of the class.

**Make sure to hand in this file, your list of pages, your essay, and the entire website
on GitHub prior to the due date.**

### Application

The only thing you need to change in the code this time is the parameter
`page_name`. All of your other changes will come from editing the data
found in the file `page-names.txt`.
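Since everything else is driven by `page-names.txt`, it may be worth sanity-checking that file before pulling any data. The sketch below is not part of the course modules; the helper name `load_page_names` and its parameters are made up for illustration. It assumes the file holds one page title per line, and it uses the 30 to 200 page range from the project steps as a guideline only.

```python
from pathlib import Path


def load_page_names(path="page-names.txt", min_pages=30, max_pages=200):
    """Read one page title per line, dropping blanks and exact duplicates.

    Hypothetical helper: returns the cleaned list of titles plus a flag
    saying whether the list size falls inside the suggested range.
    """
    raw_lines = Path(path).read_text(encoding="UTF-8").splitlines()
    titles = []
    seen = set()
    for line in raw_lines:
        title = line.strip()
        # Skip empty lines and titles that were already listed.
        if not title or title in seen:
            continue
        seen.add(title)
        titles.append(title)
    # The range is taken from the project description, not enforced anywhere.
    in_range = min_pages <= len(titles) <= max_pages
    return titles, in_range
```

Calling `titles, in_range = load_page_names()` in a scratch cell gives a cleaned list and a quick signal of whether the corpus is close to the suggested size; the resulting `titles` list has the same shape as `links` in the cells that follow.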

In [1]:
import wiki
import wikitext
import wikihistory

In [2]:
with open('page-names.txt', 'rt', encoding='UTF-8') as finput:
    links = [x for x in finput.read().splitlines() if x]

In [3]:
links

['Boston Celtics',
 'Brooklyn Nets',
 'New York Knicks',
 'Philadelphia 76ers',
 'Toronto Raptors',
 'Chicago Bulls',
 'Cleveland Cavaliers',
 'Detroit Pistons',
 'Indiana Pacers',
 'Milwaukee Bucks',
 'Atlanta Hawks',
 'Charlotte Hornets',
 'Miami Heat',
 'Orlando Magic',
 'Washington Wizards',
 'Dallas Mavericks',
 'Houston Rockets',
 'Memphis Grizzlies',
 'New Orleans Pelicans',
 'San Antonio Spurs',
 'Denver Nuggets',
 'Minnesota Timberwolves',
 'Oklahoma City Thunder',
 'Portland Trail Blazers',
 'Utah Jazz',
 'Golden State Warriors',
 'Los Angeles Clippers',
 'Los Angeles Lakers',
 'Phoenix Suns',
 'Sacramento Kings']

In [4]:
wcorp = wikitext.WikiCorpus(links, num_clusters=5)

Pulling data from MediaWiki API: 'Boston Celtics'
Pulling data from MediaWiki API: 'Brooklyn Nets'
Pulling data from MediaWiki API: 'New York Knicks'
Pulling data from MediaWiki API: 'Philadelphia 76ers'
Pulling data from MediaWiki API: 'Toronto Raptors'
Pulling data from MediaWiki API: 'Chicago Bulls'
Pulling data from MediaWiki API: 'Cleveland Cavaliers'
Pulling data from MediaWiki API: 'Detroit Pistons'
Pulling data from MediaWiki API: 'Indiana Pacers'
Pulling data from MediaWiki API: 'Milwaukee Bucks'
Pulling data from MediaWiki API: 'Atlanta Hawks'
Pulling data from MediaWiki API: 'Charlotte Hornets'
Pulling data from MediaWiki API: 'Miami Heat'
Pulling data from MediaWiki API: 'Orlando Magic'
Pulling data from MediaWiki API: 'Washington Wizards'
Pulling data from MediaWiki API: 'Dallas Mavericks'
Pulling data from MediaWiki API: 'Houston Rockets'
Pulling data from MediaWiki API: 'Memphis Grizzlies'
Pulling data from MediaWiki API: 'New Orleans Pelicans'
Pulling data from MediaWik

In [5]:
wikitext.wiki_text_explorer(wcorp, page_name="Current NBA Team")

In [6]:
wikihistory.wiki_text_explore_page(wcorp, page_name="Current NBA Team")

Loaded 1000 revisions, through 2017-05-14T07:54:28Z
Loaded 1500 revisions, through 2016-05-31T21:52:32Z
Loaded 2000 revisions, through 2015-04-26T00:59:04Z
Loaded 2500 revisions, through 2014-01-15T18:09:21Z
Loaded 3000 revisions, through 2013-02-28T05:15:18Z
Loaded 3500 revisions, through 2012-06-22T14:56:34Z
Loaded 4000 revisions, through 2011-08-09T21:10:54Z
Loaded 4500 revisions, through 2011-01-16T07:51:42Z
Loaded 5000 revisions, through 2010-10-18T21:45:18Z
Loaded 5500 revisions, through 2010-06-17T18:12:08Z
Loaded 6000 revisions, through 2010-03-14T02:15:54Z
Loaded 6500 revisions, through 2009-06-22T16:40:02Z
Loaded 7000 revisions, through 2009-02-11T18:17:07Z
Loaded 7500 revisions, through 2008-07-01T02:03:08Z
Loaded 8000 revisions, through 2008-03-21T01:26:50Z
Loaded 8500 revisions, through 2007-11-19T18:58:22Z
Loaded 9000 revisions, through 2007-06-23T08:21:40Z
Loaded 9500 revisions, through 2006-11-02T00:16:04Z
Loaded 10000 revisions, through 2005-01-03T02:22:06Z
Loaded 1007

Loaded 4000 revisions, through 2007-05-18T18:52:13Z
Loaded 4500 revisions, through 2006-11-14T16:02:00Z
Loaded 5000 revisions, through 2006-05-05T15:46:05Z
Loaded 5500 revisions, through 2005-12-31T06:37:25Z
Loaded 6000 revisions, through 2004-06-16T16:10:51Z
Loaded 6057 revisions, through 2002-08-17T20:53:11Z
Grabbed page at 870707781
Grabbed page at 817789516
Grabbed page at 756591764
Grabbed page at 697591460
Grabbed page at 640168358
Grabbed page at 588073525
Grabbed page at 529985280
Grabbed page at 468845806
Grabbed page at 404271459
Grabbed page at 335061415
Grabbed page at 260671401
Grabbed page at 181079947
Grabbed page at 97529249
Grabbed page at 33351204
Grabbed page at 9107508
Grabbed page at 2426501
Grabbed page at 629941
Loaded 1000 revisions, through 2014-04-03T21:57:26Z
Loaded 1500 revisions, through 2012-05-18T15:29:52Z
Loaded 2000 revisions, through 2009-11-05T23:02:25Z
Loaded 2500 revisions, through 2007-05-06T20:20:04Z
Loaded 3000 revisions, through 2005-08-30T21:19

Grabbed page at 9021465
Grabbed page at 2259533
Grabbed page at 599210
Loaded 1000 revisions, through 2012-11-14T02:16:00Z
Loaded 1500 revisions, through 2009-01-31T20:49:05Z
Loaded 2000 revisions, through 2007-02-03T14:49:20Z
Loaded 2392 revisions, through 2002-08-17T21:31:30Z
Grabbed page at 870721211
Grabbed page at 816720337
Grabbed page at 757542466
Grabbed page at 696776687
Grabbed page at 636812421
Grabbed page at 586714604
Grabbed page at 529527845
Grabbed page at 467576913
Grabbed page at 404584460
Grabbed page at 333392245
Grabbed page at 260201362
Grabbed page at 181150144
Grabbed page at 97617284
Grabbed page at 33322764
Grabbed page at 9604344
Grabbed page at 2092764
Grabbed page at 1078752
Loaded 1000 revisions, through 2013-05-20T21:31:12Z
Loaded 1500 revisions, through 2011-06-30T04:48:46Z
Loaded 2000 revisions, through 2009-05-11T19:33:16Z
Loaded 2500 revisions, through 2008-03-15T15:53:05Z
Loaded 3000 revisions, through 2007-02-10T02:11:58Z
Loaded 3500 revisions, thro

Loaded 3500 revisions, through 2008-05-19T07:45:26Z
Loaded 4000 revisions, through 2008-02-23T21:44:43Z
Loaded 4500 revisions, through 2007-10-07T05:59:51Z
Loaded 5000 revisions, through 2007-04-20T01:50:01Z
Loaded 5500 revisions, through 2006-12-13T15:22:50Z
Loaded 6000 revisions, through 2006-04-29T01:08:34Z
Loaded 6500 revisions, through 2005-10-17T01:50:31Z
Loaded 6797 revisions, through 2002-08-17T22:02:38Z
Grabbed page at 870096181
Grabbed page at 817044564
Grabbed page at 755328001
Grabbed page at 696605082
Grabbed page at 640161645
Grabbed page at 587912599
Grabbed page at 530409825
Grabbed page at 467950283
Grabbed page at 404957974
Grabbed page at 334816327
Grabbed page at 260792492
Grabbed page at 181285562
Grabbed page at 97473129
Grabbed page at 33277619
Grabbed page at 9016614
Grabbed page at 2340099
Grabbed page at 564395
Loaded 1000 revisions, through 2013-11-10T05:18:17Z
Loaded 1500 revisions, through 2011-03-03T07:51:33Z
Loaded 2000 revisions, through 2009-05-29T01:20
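
The progress output above can itself be a rough source of information for the essay. The following is a minimal sketch, assuming that the revisions are loaded newest-first (as the decreasing timestamps suggest), so that the last complete `Loaded N revisions, through ...` line for a page gives its approximate revision count and the timestamp of the oldest revision loaded. That reading is an inference from the output, not documented behavior of `wikihistory`, and the function name `summarize_revision_log` is hypothetical. Lines that were cut off mid-timestamp are skipped rather than guessed at.

```python
import re
from datetime import datetime, timezone

# Matches only complete progress lines; truncated lines will not match.
LOG_LINE = re.compile(
    r"^Loaded (\d+) revisions, through (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)$"
)


def summarize_revision_log(lines):
    """Return one (revision_count, oldest_timestamp) pair per page.

    A page is considered finished when a "Grabbed page at" line appears
    or when the revision counter drops (assumed to mean a new page began).
    """
    pages = []
    last = None
    for line in lines:
        text = line.strip()
        match = LOG_LINE.match(text)
        if match:
            count = int(match.group(1))
            stamp = datetime.strptime(
                match.group(2), "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
            if last is not None and count < last[0]:
                pages.append(last)
            last = (count, stamp)
        elif text.startswith("Grabbed page at") and last is not None:
            pages.append(last)
            last = None
    if last is not None:
        pages.append(last)
    return pages
```

To use it, paste the cell output into a string and pass `text.splitlines()` to the function. Because the pairs carry no page titles, they would need to be matched back to the order of `links` by hand, and any page whose output was truncated will show only the last complete line that survived.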