

Welcome!

You have just opened a collection of notebooks that lets you inspect the evolution of the revision history of a Wikipedia article, up to now (From the English language edition). It also allows you to highlight **article- or word-specific conflicts as well as the productivity of any given editor.** 

Specifically, for the notebooks after this initial one, it interfaces with the API of a specialized service called [WikiWho](www.wikiwho.net), which provides fine-grained change information about the tokens (words) in an article. 

It is written in a way that you can **explore it like a Web app, without interacting with the code behind it**, or - if you choose to - click on "edit app" in the Juypter navigation bar and play around with the code yourself. 

The default introduction example is the article "The Camp of the Saints" (a novel), which we recommend to start with. You can enter/search an article of your choice and explore it as well. 

Let's first get live data of some general statistics from Wikipedias own API and a service called Xtools:

In [1]:
from IPython.display import display
from ipywidgets import widgets, Output
from toggle import hide_toggle2

# design the button
toggle_modules = widgets.Button(description='Modules Imported', button_style='success')
display(toggle_modules)

# cell show/hide to play around with
def hide_modules(b):
    with out1:
        clear_output()
        display(hide_toggle2(for_next=True))
        
        
out1 = Output()
display(out1)

toggle_modules.on_click(hide_modules)

Button(button_style='success', description='Modules Imported', style=ButtonStyle())

Output()

In [2]:
## IMPORT MODULES ##
# for display
from IPython.display import display, Markdown as md, clear_output, Javascript, HTML
from ipywidgets import widgets, Output
import urllib

# for data process
import pandas as pd

# for visualization
from visualization.views_listener import ViewsListener
from ipywidgets import interact, interactive
from ipywidgets.widgets import Dropdown

# APIs
from external.wikipedia import WikipediaDV, WikipediaAPI
from external.wikimedia import WikiMediaDV, WikiMediaAPI
from external.xtools import XtoolsAPI, XtoolsDV

# toggle cells
from toggle import hide_toggle, hide_toggle2, hide_cell

# show codes and wrapper as Markdown
from to_markdown import code_to_md, wrapper_to_md

## SOME EXTENTIONS ##
#%load_ext autoreload
%reload_ext autoreload
%autoreload 2
%store -r the_page

if 'the_page' not in locals():
    import pickle
    print("Loading default data...")
    the_page = pickle.load(open("data/the_page.p",'rb'))

---

# A. Basic Info from Wikipedia

***Search for an article on the English Wikipedia***

In [3]:
# Hide all cell prompts.
display(HTML('<style> div.prompt{display: none} </style>'))

# Hide all input cells.
hide_cell(hide_code=True)

In [4]:
wikipedia_dv = WikipediaDV(WikipediaAPI(domain='en.wikipedia.org'))

# the method that listens to the click event
def on_button_clicked(b):
    global the_page
    
    # use the out widget so the output is overwritten when two or more
    # searches are performed
    with out:
        try:
            # query wikipedia
            search_result = wikipedia_dv.search_page(searchTerm.value)
            the_page = wikipedia_dv.get_page(search_result)
            %store the_page
            clear_output()
            display(the_page.to_frame('value'))
            display(md(f'You selected:'))
            display(the_page['title'])
            display(Javascript('Jupyter.notebook.execute_cells([8])'))

        except:
            clear_output()
            display(md(f'The page title *"{searchTerm.value}"* was not found'))
            display(Javascript('Jupyter.notebook.execute_cells([8])'))

# by default display the last search
try:
    searchTerm = widgets.Text(the_page['title'], description='Page title:')
except:
    searchTerm = widgets.Text("The Camp of the Saints", description='Page title:')

display(searchTerm)

# create and display the button    
button = widgets.Button(description="Search")
example = md("e.g. *The Camp of the Saints*")
display(example, button)

# the output widget is used to remove the output after the search field
out = Output()
display(out)

# set the event
button.on_click(on_button_clicked)

# trigger the event with the default value
on_button_clicked(button)

Text(value='Big Bang', description='Page title:')

e.g. *The Camp of the Saints*

Button(description='Search', style=ButtonStyle())

Output()

In [5]:
# design the button
toggle_cell = widgets.Button(description='Show/Hide The Code', button_style='success')

# cell show/hide to play around with
def hide_search(b):
    with out2:
        display(hide_toggle2(for_next_next=True))
        clear_output()
        
out2 = Output()
display(out2)

toggle_cell.on_click(hide_search)
display(toggle_cell)

Output()

Button(button_style='success', description='Show/Hide The Code', style=ButtonStyle())

In [27]:
# hide the tutorial code output
html_searchbutton = """
                        <script>
                            $('div.cell.code_cell.rendered.selected').next().find('div.output').hide()
                        </script>
                    """
display(HTML(html_searchbutton))

# hide next tutorial output
display(Javascript('Jupyter.notebook.execute_cells([13])'))

<IPython.core.display.Javascript object>

In [32]:
## TRY YOURSELF! THIS IS WHAT WILL HAPPEN WHEN YOU CLICK 'Search' BUTTON ##

# the page you are interested in
page_title = the_page['title']

# query wikipedia using WikiWho API, more details please see:
# https://github.com/gesiscss/wikiwho_demo/blob/master/external/wikipedia.py
# https://github.com/gesiscss/wikiwho_demo/blob/master/external/api.py
wikipedia_dv = WikipediaDV(WikipediaAPI(domain='en.wikipedia.org')) # create an instance
search_result = wikipedia_dv.search_page(page_title)
url = f"{wikipedia_dv.api.base}action=opensearch&search={urllib.parse.quote_plus(page_title)}&limit=1&namespace=0&format=json"
print("The page that was found:", search_result)
print('The raw data can be found in:', url)

# values of the found page
the_page = wikipedia_dv.get_page(search_result)
print('Page id:', the_page['page_id'])

The page that was found: The Camp of the Saints
The raw data can be found in: https://en.wikipedia.org/w/api.php?action=opensearch&search=The+Camp+of+the+Saints&limit=1&namespace=0&format=json
Page id: 1636145


---

# B. General Statistics

Provided through the Xtools API (1)

In [8]:
def xtools_on_click(b):
    with out_xtools:
        clear_output()
        xtools_api = XtoolsAPI(project = 'en.wikipedia.org')
        xtools_dv = XtoolsDV(xtools_api)
        page_info = xtools_dv.get_page_info(the_page['title'])
        page_info['assessment'] = page_info['assessment']['value']

        page_info = page_info.to_frame('value').rename(index={
            'project': 'Project name',
            'page': 'Page name',
            'watchers': 'Watchers (2)',    'pageviews': f"Page Views (per {page_info['pageviews_offset']} days)",
            'revisions': 'Revisions',
            'editors': 'Editors',
            'author': 'Creator of the page',
            'created_at': 'Creation Date',
            'created_rev_id': 'Creation revision id',
            'modified_at': 'Last modified',
            'last_edit_id': 'Last revision id',
            'assessment': 'Content Assessment (3)',
        }).drop(index = ['pageviews_offset', 'author_editcount', 'secs_since_last_edit','elapsed_time'])
        
        display(md(f"***Page: {the_page['title']}***"))
        display(page_info)
        display(Javascript('Jupyter.notebook.execute_cells([14])'))           
        display(Javascript('Jupyter.notebook.execute_cells([18])'))
        

# create and display the button    
button = widgets.Button(description="Get Page Info")
display(button)

# the output widget is used to remove the output after the search field
out_xtools = Output()
display(out_xtools)

# set the event
button.on_click(xtools_on_click)

# trigger the event with the default value
xtools_on_click(button)

Button(description='Get Page Info', style=ButtonStyle())

Output()

<sup>**(1)** *A community-built service for article statistics at xtools.wmflabs.org* **(2)** *Users that added this page to their watchlist.* **(3)** *See [Wikipedia Content Assessment](https://en.wikipedia.org/wiki/Wikipedia:Content_assessment)*</sup>


In [9]:
# design the button
toggle_xtools = widgets.Button(description='Show/Hide The Code', button_style='success')

# cell show/hide to play around with
def hide_xtools(b):
    with out4:
        display(hide_toggle2(for_next_next=True))
        clear_output()
        
out4 = Output()
display(out4)

toggle_xtools.on_click(hide_xtools)
display(toggle_xtools)

Output()

Button(button_style='success', description='Show/Hide The Code', style=ButtonStyle())

In [29]:
# hide the tutorial code output
html_searchbutton = """
                        <script>
                            $('div.cell.code_cell.rendered.selected').next().find('div.output').hide()
                        </script>
                    """
display(HTML(html_searchbutton))

# hide next tutorial output
display(Javascript('Jupyter.notebook.execute_cells([18])'))

<IPython.core.display.Javascript object>

In [34]:
## TRY YOURSELF! THIS IS HOW XTOOLS API TO GET THE PAGE. ##

# define a xtools instance, more details see:
# https://github.com/gesiscss/wikiwho_demo/blob/master/external/xtools.py
# https://github.com/gesiscss/wikiwho_demo/blob/master/external/api.py
xtools_api = XtoolsAPI(project = 'en.wikipedia.org')
xtools_dv = XtoolsDV(xtools_api)

# the page you are insterested in
print('The page that was found:', the_page['title'])

# get the page info through the xtools method get_page_info()
page_info = xtools_dv.get_page_info(the_page['title'])

# raw data url
url = f"{xtools_dv.api.base}page/articleinfo/{xtools_dv.api.project}/" + urllib.parse.quote(the_page['title'])
print("Raw data can be found in:", url)

# which info are going to be demonstrated
print('The project url:', page_info['project'])
print('The page of interest:', page_info['page'])
print('How many users added this page to their watchlist:', page_info['watchers'])
print('Pageviews per 30 days:', page_info['pageviews'])
print('Revisions of the page in total:', page_info['revisions'])
print('Number of editors in total:', page_info['editors'])
print('Minor edits of the page in total:', page_info['minor_edits'])
print('The creator of the page:', page_info['author'])
print('Creation date and revision ID:', page_info['created_at'], 'and', page_info['created_rev_id'])
print('Last modify date and revision ID:', page_info['modified_at'], 'and', page_info['last_edit_id'])
print('Content Assessment:', page_info['assessment']['value'])

The page that was found: The Camp of the Saints
Raw data can be found in: https://xtools.wmflabs.org/api/page/articleinfo/en.wikipedia.org/The%20Camp%20of%20the%20Saints
The project url: en.wikipedia.org
The page of interest: The Camp of the Saints
How many users added this page to their watchlist: 99
Pageviews per 30 days: 46862
Revisions of the page in total: 560
Number of editors in total: 260
Minor edits of the page in total: 109
The creator of the page: Morning star
Creation date and revision ID: 2005-03-22 and 12053908
Last modify date and revision ID: 2019-11-22 23:25 and 927514955
Content Assessment: C


---

# C. Page Views

Provided through the Wikimedia API

In [12]:
def pageviews_button(b):
    with out_pageviews:
        clear_output()
        
        display(md(f"***Page: {the_page['title']}***"))
        # Query request
        wikimedia_api = WikiMediaAPI(project='en.wikipedia')
        wikimedia_dv = WikiMediaDV(wikimedia_api)
        views = wikimedia_dv.get_pageviews(the_page['title'], 'daily')

        # Visualization
        listener = ViewsListener(views)
        inter_func = interact(listener.listen, 
                         begin=Dropdown(options=views.timestamp),
                         end=Dropdown(options=views.timestamp.sort_values(ascending=False)),
                         granularity=Dropdown(options=['Yearly', 'Monthly', 'Weekly', 'Daily'], value='Monthly'))

        # The df_plotted keeps a reference to the plotted data above
        pageviews_agg = listener.df_plotted['views'].agg({
                            'Total views': sum,
                            'Max views period': max,
                            'Min views period': min,
                            'Average views': min,}).to_frame('Value')
        
        

# create and display the button    
button = widgets.Button(description="Get Pageviews Info")
display(button)

# the output widget is used to remove the output after the search field
out_pageviews = Output()
display(out_pageviews)

# set the event
button.on_click(pageviews_button)

# trigger the event with the default value
pageviews_button(button)

Button(description='Get Pageviews Info', style=ButtonStyle())

Output()

In [13]:
# design the button
toggle_pageviews = widgets.Button(description='Show/Hide The Code', button_style='success')

# cell show/hide to play around with
def hide_pageviews(b):
    with out5:
        clear_output()
        display(hide_toggle2(for_next_next=True))
        display(hide_toggle2(for_next_next_next=True))
        display(Javascript('Jupyter.notebook.execute_cells([19])')) 
        display(Javascript('Jupyter.notebook.execute_cells([20])'))
        
out5 = Output()
display(out5)

toggle_pageviews.on_click(hide_pageviews)
display(toggle_pageviews)

Output()

Button(button_style='success', description='Show/Hide The Code', style=ButtonStyle())

In [36]:
# hide the tutorial code output
html_searchbutton = """
                        <script>
                            $('div.cell.code_cell.rendered.selected').next().find('div.output').hide()
                            $('div.cell.code_cell.rendered.selected').next().next().find('div.output').hide()
                        </script>
                    """
display(HTML(html_searchbutton))

In [42]:
## TRY YOURSELF! VISUALIZATION.##
## PART 1 ##

# define a WikiMediaAPI instance, more details see:
# https://github.com/gesiscss/wikiwho_demo/blob/master/external/wikimedia.py
# https://github.com/gesiscss/wikiwho_demo/blob/master/external/api.py
wikimedia_api = WikiMediaAPI(project='en.wikipedia')
wikimedia_dv = WikiMediaDV(wikimedia_api)

# page of insterest
print('The page that was found:', the_page['title'])

# get pageview counts for the article, more details see:
# https://github.com/gesiscss/wikiwho_demo/blob/master/external/wikimedia.py
views = wikimedia_dv.get_pageviews(the_page['title'], 'daily')

# visualization, core visual code lies in ViewsListener, then the interact function
# make it interactive, mode details see:
# https://github.com/gesiscss/wikiwho_demo/blob/master/visualization/views_listener.py
# https://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html
listener = ViewsListener(views)
inter_func = interact(listener.listen, 
                 begin=Dropdown(options=views.timestamp),
                 end=Dropdown(options=views.timestamp.sort_values(ascending=False)),
                 granularity=Dropdown(options=['Yearly', 'Monthly', 'Weekly', 'Daily'], value='Monthly'))

The page that was found: The Camp of the Saints


interactive(children=(Dropdown(description='begin', options=(Timestamp('2015-07-01 00:00:00'), Timestamp('2015…

In [44]:
## TRY YOURSELF! BASIC INFO FOR VISUALIZATION.##
## PART 2 ##

# url of raw data
granularity = inter_func.widget.kwargs['granularity']
start = inter_func.widget.kwargs['begin']
end = inter_func.widget.kwargs['end']

url = (f'{wikimedia_dv.api.base}metrics/pageviews/per-article/'
       f"{wikimedia_dv.api.project}/all-access/all-agents/{urllib.parse.quote(the_page['title'])}/{granularity}"
       f"/{urllib.parse.quote(str(start))}/{urllib.parse.quote(str(end))}")
print('Raw data can be found in:', url)

# time range you selected
print('Granularity:', granularity)
print('Start date:', start)
print('End date:', end)

# pageviews aggregation data: pageviews_agg
pageviews_agg = listener.df_plotted['views'].agg({
                    'Total views': sum,
                    'Max views period': max,
                    'Min views period': min,
                    'Average views': min,}).to_frame('Value')

print('Total views of this page:', pageviews_agg['Value']['Total views'])
print('Max views during the selected period:', pageviews_agg['Value']['Max views period'])
print('Min views during the selected period:', pageviews_agg['Value']['Min views period'])
print('Average views during the selected period:', pageviews_agg['Value']['Average views'])

Raw data can be found in: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/The%20Camp%20of%20the%20Saints/Monthly/2015-07-01%2000%3A00%3A00/2019-11-25%2000%3A00%3A00
Granularity: Monthly
Start date: 2015-07-01 00:00:00
End date: 2019-11-25 00:00:00
Total views of this page: 611309
Max views during the selected period: 76388
Min views during the selected period: 4082
Average views during the selected period: 4082



After we have no seen some general statistics of the article and the views it attracted, we will go on to take a look at what specific kinds of changes by which editors it was subject to over time. 

Click below to go to the next notebook. You can later come back to this notebook and simply enter another article name to start the process over with that new article. 

In [17]:
# from utils.notebooks import get_next_notebook
# from IPython.display import HTML
# display(HTML(f'<a href="{get_next_notebook()}" target="_blank">Go to next workbook</a>'))

In [18]:
#hide_cell(hide_code=False)
hide_toggle()

In [19]:
# Run turorial cells and hide their outputs by default.
display(Javascript('Jupyter.notebook.execute_cells([7])'))

<IPython.core.display.Javascript object>