

Welcome!

You have just opened a collection of notebooks that lets you inspect the evolution of the revision history of a Wikipedia article, up to now (From the English language edition). It also allows you to highlight **article- or word-specific conflicts as well as the productivity of any given editor.** 

Specifically, for the notebooks after this initial one, it interfaces with the API of a specialized service called [WikiWho](www.wikiwho.net), which provides fine-grained change information about the tokens (words) in an article. 

It is written in a way that you can **explore it like a Web app, without interacting with the code behind it**, or - if you choose to - click on "edit app" in the Juypter navigation bar and play around with the code yourself. 

The default introduction example is the article "The Camp of the Saints" (a novel), which we recommend to start with. You can enter/search an article of your choice and explore it as well. 

Let's first get live data of some general statistics from Wikipedias own API and a service called Xtools:

In [1]:
from IPython.display import display
from ipywidgets import widgets, Output
from toggle import hide_toggle2

# design the button
toggle_modules = widgets.Button(description='Modules Imported', button_style='success')
display(toggle_modules)

# cell show/hide to play around with
def hide_modules(b):
    with out1:
        clear_output()
        display(hide_toggle2(for_next=True))
        
        
out1 = Output()
display(out1)

toggle_modules.on_click(hide_modules)

Button(button_style='success', description='Modules Imported', style=ButtonStyle())

Output()

In [2]:
## IMPORT MODULES ##
# for display
from IPython.display import display, Markdown as md, clear_output, Javascript, HTML
from ipywidgets import widgets, Output

# for data process
import pandas as pd

# for visualization
from visualization.views_listener import ViewsListener
from ipywidgets import interact
from ipywidgets.widgets import Dropdown

# APIs
from external.wikipedia import WikipediaDV, WikipediaAPI
from external.wikimedia import WikiMediaDV, WikiMediaAPI
from requests import Session
from urllib.parse import quote_plus

# toggle cells
from toggle import hide_toggle, hide_toggle2, hide_cell

# show codes and wrapper as Markdown
from to_markdown import code_to_md, wrapper_to_md

## SOME EXTENTIONS ##
#%load_ext autoreload
%reload_ext autoreload
%autoreload 2
%store -r the_page

if 'the_page' not in locals():
    import pickle
    print("Loading default data...")
    the_page = pickle.load(open("data/the_page.p",'rb'))

---

# A. Basic Info from Wikipedia

***Search for an article on the English Wikipedia***

In [3]:
wikipedia_dv = WikipediaDV(WikipediaAPI(domain='en.wikipedia.org'))

# the method that listens to the click event
def on_button_clicked(b):
    global the_page
    
    # use the out widget so the output is overwritten when two or more
    # searches are performed
    with out:
        try:
            # query wikipedia
            search_result = wikipedia_dv.search_page(searchTerm.value)
            the_page = wikipedia_dv.get_page(search_result)
            %store the_page
            clear_output()
            display(the_page.to_frame('value'))
            display(md(f'You selected:'))
            display(the_page['title'])
            display(Javascript('Jupyter.notebook.execute_cells([7])'))

        except:
            clear_output()
            display(md(f'The page title *"{searchTerm.value}"* was not found'))
            display(Javascript('Jupyter.notebook.execute_cells([7])'))

# by default display the last search
try:
    searchTerm = widgets.Text(the_page['title'], description='Page title:')
except:
    searchTerm = widgets.Text("The Camp of the Saints", description='Page title:')

display(searchTerm)

# create and display the button    
button = widgets.Button(description="Search")
example = md("e.g. *The Camp of the Saints*")
display(example, button)

# the output widget is used to remove the output after the search field
out = Output()
display(out)

# set the event
button.on_click(on_button_clicked)

# trigger the event with the default value
on_button_clicked(button)

Text(value='Blackpink', description='Page title:')

e.g. *The Camp of the Saints*

Button(description='Search', style=ButtonStyle())

Output()

In [4]:
# design the button
toggle_cell = widgets.Button(description='Show/Hide The Code', button_style='success')

# cell show/hide to play around with
def hide_search(b):
    with out2:
        display(hide_toggle2(for_next_next=True))
        clear_output()
        
out2 = Output()
display(out2)

toggle_cell.on_click(hide_search)
display(toggle_cell)

Output()

Button(button_style='success', description='Show/Hide The Code', style=ButtonStyle())

In [30]:
# hide the tutorial code output
html_searchbutton = """
                        <script>
                            $('div.cell.code_cell.rendered.selected').next().find('div.output').hide()
                        </script>
                    """
display(HTML(html_searchbutton))

In [20]:
## TRY YOURSELF! THIS IS WHAT WILL HAPPEN WHEN YOU CLICK 'Search' BUTTON ##

# the page you are interested in
page_title = the_page['title']

# query wikipedia using WikiWho API
wikipedia_dv = WikipediaDV(WikipediaAPI(domain='en.wikipedia.org')) # create an instance
search_result = wikipedia_dv.search_page(page_title)

print("The found page is with the title of", search_result)
the_page = wikipedia_dv.get_page(search_result)

# values of the found page
print('The page id is', the_page['page_id'])

The found page is with the title of Big Bang
The page id is 4116


If this is correct, load the data and set this as the article to explore.

In [21]:
def run_below(ev):
    display(Javascript('IPython.notebook.execute_cells_below()'))
    
button = widgets.Button(description="Load data", button_style='info', min_width=500)
button.on_click(run_below)
display(button)

<IPython.core.display.Javascript object>

Button(button_style='info', description='Load data', style=ButtonStyle())

---

# B. General Statistics

In [22]:
query_attempts = 2

# functions for query
def request(url):
    for attempt in range(0, query_attempts + 1):
        try:
            response = Session().get(url)
            response.raise_for_status()
            return response.json()
        except Exception as exc:
            if attempt == attempts:
                raise exc
            else:
                print(f'Request ({url}) failed (attempt (attempt {attempt + 1} of query_attemps))')

Provided through the Xtools API (1)

In [23]:
display(md(f"***Page: {the_page['title']}***"))

***Page: Big Bang***

In [24]:
# URL for Xtools API
def xtools_url(project, page_name, protocol = 'https', domain = 'xtools.wmflabs.org'):
    url = f'{protocol}://{domain}/api/page/articleinfo/{project}/{page_name}'
    
    return url

In [25]:
url = xtools_url(project = 'en.wikipedia.org', page_name = the_page['title'])
page_info = request(url)
page_info = pd.Series(page_info)
page_info['assessment'] = page_info['assessment']['value']

page_info = page_info.to_frame('value').rename(index={
    'project': 'Project name',
    'page': 'Page name',
    'watchers': 'Watchers (2)',    'pageviews': f"Page Views (per {page_info['pageviews_offset']} days)",
    'revisions': 'Revisions',
    'editors': 'Editors',
    'author': 'Creator of the page',
    'created_at': 'Creation Date',
    'created_rev_id': 'Creation revision id',
    'modified_at': 'Last modified',
    'last_edit_id': 'Last revision id',
    'assessment': 'Content Assessment (3)',
}).drop(index = ['pageviews_offset', 'author_editcount', 'secs_since_last_edit','elapsed_time'])

display(page_info)

Unnamed: 0,value
Project name,en.wikipedia.org
Page name,Big Bang
Watchers (2),1648
Page Views (per 30 days),156936
Revisions,6920
Editors,2561
minor_edits,2082
Creator of the page,129.128.137.xxx
Creation Date,2001-11-07
Creation revision id,239117


<sup>**(1)** *A community-built service for article statistics at xtools.wmflabs.org* **(2)** *Users that added this page to their watchlist.* **(3)** *See [Wikipedia Content Assessment](https://en.wikipedia.org/wiki/Wikipedia:Content_assessment)*</sup>


---

# C. Page Views

Provided through the Wikimedia API

In [26]:
display(md(f"***Page: {the_page['title']}***"))

***Page: Big Bang***

In [27]:
# Query request
wikimedia_api = WikiMediaAPI(project='en.wikipedia')
wikimedia_dv = WikiMediaDV(wikimedia_api)
views = wikimedia_dv.get_pageviews(the_page['title'], 'daily')

# Visualization

listener = ViewsListener(views)
interact(listener.listen, 
         begin=Dropdown(options=views.timestamp),
         end=Dropdown(options=views.timestamp.sort_values(ascending=False)),
         granularity=Dropdown(options=['Yearly', 'Monthly', 'Weekly', 'Daily'], value='Monthly'))

# The df_plotted keeps a reference to the plotted data above
display(listener.df_plotted['views'].agg({
        'Total views': sum,
        'Max views period': max,
        'Min views period': min,
        'Average views': min,}).to_frame('Value'))

interactive(children=(Dropdown(description='begin', options=(Timestamp('2015-07-01 00:00:00'), Timestamp('2015…

Unnamed: 0,Value
Total views,8507431
Max views period,230712
Min views period,101576
Average views,101576



After we have no seen some general statistics of the article and the views it attracted, we will go on to take a look at what specific kinds of changes by which editors it was subject to over time. 

Click below to go to the next notebook. You can later come back to this notebook and simply enter another article name to start the process over with that new article. 

In [28]:
# from utils.notebooks import get_next_notebook
# from IPython.display import HTML
# display(HTML(f'<a href="{get_next_notebook()}" target="_blank">Go to next workbook</a>'))

In [29]:
# Hide all cell prompts.
display(HTML('<style> div.prompt{display: none} </style>'))

# Hide all input cells.
hide_cell(hide_code=True)

# Run turorial cells and hide their outputs by default.
display(Javascript('Jupyter.notebook.execute_cells([6])'))

hide_toggle()

<IPython.core.display.Javascript object>