

Welcome!

You have just opened a collection of notebooks that lets you inspect the evolution of the revision history of a Wikipedia article, up to now (From the English language edition). It also allows you to highlight **article- or word-specific conflicts as well as the productivity of any given editor.** 

Specifically, for the notebooks after this initial one, it interfaces with the API of a specialized service called [WikiWho](www.wikiwho.net), which provides fine-grained change information about the tokens (words) in an article. 

It is written in a way that you can **explore it like a Web app, without interacting with the code behind it**, or - if you choose to - click on "edit app" in the Juypter navigation bar and play around with the code yourself. 

The default introduction example is the article "The Camp of the Saints" (a novel), which we recommend to start with. You can enter/search an article of your choice and explore it as well. 

Let's first get live data of some general statistics from Wikipedias own API and a service called Xtools:

In [1]:
from IPython.display import display
from ipywidgets import widgets, Output
from toggle import hide_toggle2

# design the button
toggle_modules = widgets.Button(description='Modules Imported', button_style='success')
display(toggle_modules)

# cell show/hide to play around with
def hide_modules(b):
    with out1:
        clear_output()
        display(hide_toggle2(for_next=True))
        
        
out1 = Output()
display(out1)

toggle_modules.on_click(hide_modules)

Button(button_style='success', description='Modules Imported', style=ButtonStyle())

Output()

In [2]:
## IMPORT MODULES ##
# for display
from IPython.display import display, Markdown as md, clear_output, Javascript, HTML
from ipywidgets import widgets, Output
import urllib

# for data process
import pandas as pd

# for visualization
from visualization.views_listener import ViewsListener
from ipywidgets import interact
from ipywidgets.widgets import Dropdown

# APIs
from external.wikipedia import WikipediaDV, WikipediaAPI
from external.wikimedia import WikiMediaDV, WikiMediaAPI
from external.xtools import XtoolsAPI, XtoolsDV

# toggle cells
from toggle import hide_toggle, hide_toggle2, hide_cell

# show codes and wrapper as Markdown
from to_markdown import code_to_md, wrapper_to_md

## SOME EXTENTIONS ##
#%load_ext autoreload
%reload_ext autoreload
%autoreload 2
%store -r the_page

if 'the_page' not in locals():
    import pickle
    print("Loading default data...")
    the_page = pickle.load(open("data/the_page.p",'rb'))

---

# A. Basic Info from Wikipedia

***Search for an article on the English Wikipedia***

In [3]:
# Hide all cell prompts.
display(HTML('<style> div.prompt{display: none} </style>'))

# Hide all input cells.
hide_cell(hide_code=True)

In [4]:
wikipedia_dv = WikipediaDV(WikipediaAPI(domain='en.wikipedia.org'))

# the method that listens to the click event
def on_button_clicked(b):
    global the_page
    
    # use the out widget so the output is overwritten when two or more
    # searches are performed
    with out:
        try:
            # query wikipedia
            search_result = wikipedia_dv.search_page(searchTerm.value)
            the_page = wikipedia_dv.get_page(search_result)
            %store the_page
            clear_output()
            display(the_page.to_frame('value'))
            display(md(f'You selected:'))
            display(the_page['title'])
            display(Javascript('Jupyter.notebook.execute_cells([8])'))

        except:
            clear_output()
            display(md(f'The page title *"{searchTerm.value}"* was not found'))
            display(Javascript('Jupyter.notebook.execute_cells([8])'))

# by default display the last search
try:
    searchTerm = widgets.Text(the_page['title'], description='Page title:')
except:
    searchTerm = widgets.Text("The Camp of the Saints", description='Page title:')

display(searchTerm)

# create and display the button    
button = widgets.Button(description="Search")
example = md("e.g. *The Camp of the Saints*")
display(example, button)

# the output widget is used to remove the output after the search field
out = Output()
display(out)

# set the event
button.on_click(on_button_clicked)

# trigger the event with the default value
on_button_clicked(button)

Text(value='PetroChina', description='Page title:')

e.g. *The Camp of the Saints*

Button(description='Search', style=ButtonStyle())

Output()

In [5]:
# design the button
toggle_cell = widgets.Button(description='Show/Hide The Code', button_style='success')

# cell show/hide to play around with
def hide_search(b):
    with out2:
        display(hide_toggle2(for_next_next=True))
        clear_output()
        
out2 = Output()
display(out2)

toggle_cell.on_click(hide_search)
display(toggle_cell)

Output()

Button(button_style='success', description='Show/Hide The Code', style=ButtonStyle())

In [80]:
# hide the tutorial code output
html_searchbutton = """
                        <script>
                            $('div.cell.code_cell.rendered.selected').next().find('div.output').hide()
                        </script>
                    """
display(HTML(html_searchbutton))
display(Javascript('Jupyter.notebook.execute_cells([18])'))

<IPython.core.display.Javascript object>

In [66]:
## TRY YOURSELF! THIS IS WHAT WILL HAPPEN WHEN YOU CLICK 'Search' BUTTON ##

# the page you are interested in
page_title = the_page['title']

# query wikipedia using WikiWho API, more details please see:
# https://github.com/gesiscss/wikiwho_demo/blob/master/external/wikipedia.py
# https://github.com/gesiscss/wikiwho_demo/blob/master/external/api.py
wikipedia_dv = WikipediaDV(WikipediaAPI(domain='en.wikipedia.org')) # create an instance
search_result = wikipedia_dv.search_page(page_title)
url = f"{wikipedia_dv.api.base}action=opensearch&search={urllib.parse.quote_plus(page_title)}&limit=1&namespace=0&format=json"
print('The raw data can be found in:', url)
print("The found page is with the title of", search_result)


# values of the found page
the_page = wikipedia_dv.get_page(search_result)
print('The page id is', the_page['page_id'])

The raw data can be found in: https://en.wikipedia.org/w/api.php?action=opensearch&search=The+Camp+of+the+Saints&limit=1&namespace=0&format=json
The found page is with the title of The Camp of the Saints
The page id is 1636145


If this is correct, load the data and set this as the article to explore.

In [67]:
def run_below(ev):
    display(Javascript('IPython.notebook.execute_cells_below()'))
    
button = widgets.Button(description="Load data", button_style='info', min_width=500)
button.on_click(run_below)
display(button)

<IPython.core.display.Javascript object>

Button(button_style='info', description='Load data', style=ButtonStyle())

In [68]:
# design the button
toggle_load = widgets.Button(description='Show/Hide The Code', button_style='success')

# cell show/hide to play around with
def hide_load(b):
    with out3:
        display(hide_toggle2(for_next=True))
        clear_output()
        
out3 = Output()
display(out3)

toggle_load.on_click(hide_load)
display(toggle_load)

Output()

Button(button_style='success', description='Show/Hide The Code', style=ButtonStyle())

In [69]:
## TRY YOURSELF! THIS IS WHAT WILL HAPPEN WHEN YOU CLICK 'Load data' BUTTON ##

# run all the cells below
display(Javascript('IPython.notebook.execute_cells_below()'))

<IPython.core.display.Javascript object>

---

# B. General Statistics

Provided through the Xtools API (1)

In [70]:
display(md(f"***Page: {the_page['title']}***"))

***Page: The Camp of the Saints***

In [71]:
xtools_api = XtoolsAPI(project = 'en.wikipedia.org')
xtools_dv = XtoolsDV(xtools_api)
page_info = xtools_dv.get_page_info(the_page['title'])
page_info['assessment'] = page_info['assessment']['value']

page_info = page_info.to_frame('value').rename(index={
    'project': 'Project name',
    'page': 'Page name',
    'watchers': 'Watchers (2)',    'pageviews': f"Page Views (per {page_info['pageviews_offset']} days)",
    'revisions': 'Revisions',
    'editors': 'Editors',
    'author': 'Creator of the page',
    'created_at': 'Creation Date',
    'created_rev_id': 'Creation revision id',
    'modified_at': 'Last modified',
    'last_edit_id': 'Last revision id',
    'assessment': 'Content Assessment (3)',
}).drop(index = ['pageviews_offset', 'author_editcount', 'secs_since_last_edit','elapsed_time'])

display(page_info)

Unnamed: 0,value
Project name,en.wikipedia.org
Page name,The Camp of the Saints
Watchers (2),96
Page Views (per 30 days),33003
Revisions,554
Editors,258
minor_edits,109
Creator of the page,Morning star
Creation Date,2005-03-22
Creation revision id,12053908


<sup>**(1)** *A community-built service for article statistics at xtools.wmflabs.org* **(2)** *Users that added this page to their watchlist.* **(3)** *See [Wikipedia Content Assessment](https://en.wikipedia.org/wiki/Wikipedia:Content_assessment)*</sup>


In [72]:
# design the button
toggle_xtools = widgets.Button(description='Show/Hide The Code', button_style='success')

# cell show/hide to play around with
def hide_xtools(b):
    with out4:
        display(hide_toggle2(for_next_next=True))
        clear_output()
        
out4 = Output()
display(out4)

toggle_xtools.on_click(hide_xtools)
display(toggle_xtools)

Output()

Button(button_style='success', description='Show/Hide The Code', style=ButtonStyle())

In [81]:
# hide the tutorial code output
html_searchbutton = """
                        <script>
                            $('div.cell.code_cell.rendered.selected').next().find('div.input').hide()
                            $('div.cell.code_cell.rendered.selected').next().find('div.output').hide()
                        </script>
                    """
display(HTML(html_searchbutton))

In [74]:
## TRY YOURSELF! THIS IS HOW XTOOLS API QUERY THE PAGE. ##

# define a xtools instance, more details see:
# https://github.com/gesiscss/wikiwho_demo/blob/master/external/xtools.py
# https://github.com/gesiscss/wikiwho_demo/blob/master/external/api.py
xtools_api = XtoolsAPI(project = 'en.wikipedia.org')
xtools_dv = XtoolsDV(xtools_api)

# the page you are insterested in
print('The page that was found:', the_page['title'])

# get the page info through the xtools method get_page_info()
page_info = xtools_dv.get_page_info(the_page['title'])

url = f"{xtools_dv.api.base}page/articleinfo/{xtools_dv.api.project}/" + urllib.parse.quote(the_page['title'])
print("Raw data can be found in:", url)

# which info are going to be demonstrated
print('The project url:', page_info['project'])
print('The page of interest:', page_info['page'])
print('How many users added this page to their watchlist:', page_info['watchers'])
print('Pageviews per 30 days:', page_info['pageviews'])
print('Revisions of the page in total:', page_info['revisions'])
print('Number of editors in total:', page_info['editors'])
print('Minor edits of the page in total:', page_info['minor_edits'])
print('The creator of the page:', page_info['author'])
print('Creation date and revision ID:', page_info['created_at'], 'and', page_info['created_rev_id'])
print('Last modify date and revision ID:', page_info['modified_at'], 'and', page_info['last_edit_id'])
print('Content Assessment:', page_info['assessment']['value'])

The page that was found: The Camp of the Saints
Raw data can be found in: https://xtools.wmflabs.org/api/page/articleinfo/en.wikipedia.org/The%20Camp%20of%20the%20Saints
The project url: en.wikipedia.org
The page of interest: The Camp of the Saints
How many users added this page to their watchlist: 96
Pageviews per 30 days: 33003
Revisions of the page in total: 554
Number of editors in total: 258
Minor edits of the page in total: 109
The creator of the page: Morning star
Creation date and revision ID: 2005-03-22 and 12053908
Last modify date and revision ID: 2019-11-19 00:55 and 926878134
Content Assessment: C


---

# C. Page Views

Provided through the Wikimedia API

In [75]:
display(md(f"***Page: {the_page['title']}***"))

***Page: The Camp of the Saints***

In [76]:
# Query request
wikimedia_api = WikiMediaAPI(project='en.wikipedia')
wikimedia_dv = WikiMediaDV(wikimedia_api)
views = wikimedia_dv.get_pageviews(the_page['title'], 'daily')

# Visualization

listener = ViewsListener(views)
interact(listener.listen, 
         begin=Dropdown(options=views.timestamp),
         end=Dropdown(options=views.timestamp.sort_values(ascending=False)),
         granularity=Dropdown(options=['Yearly', 'Monthly', 'Weekly', 'Daily'], value='Monthly'))

# The df_plotted keeps a reference to the plotted data above
display(listener.df_plotted['views'].agg({
        'Total views': sum,
        'Max views period': max,
        'Min views period': min,
        'Average views': min,}).to_frame('Value'))

interactive(children=(Dropdown(description='begin', options=(Timestamp('2015-07-01 00:00:00'), Timestamp('2015…

Unnamed: 0,Value
Total views,595828
Max views period,76388
Min views period,4082
Average views,4082



After we have no seen some general statistics of the article and the views it attracted, we will go on to take a look at what specific kinds of changes by which editors it was subject to over time. 

Click below to go to the next notebook. You can later come back to this notebook and simply enter another article name to start the process over with that new article. 

In [77]:
# from utils.notebooks import get_next_notebook
# from IPython.display import HTML
# display(HTML(f'<a href="{get_next_notebook()}" target="_blank">Go to next workbook</a>'))

In [79]:
# Run turorial cells and hide their outputs by default.
display(Javascript('Jupyter.notebook.execute_cells([7])'))


hide_toggle()

<IPython.core.display.Javascript object>