# ADA Project: Creative Extension
## Chilling Effects: Online Surveillance and Wikipedia Use

In this notebook, we extend the paper's analysis to a longer time range (2008-2018) and try to answer the following questions:

1. Is there a persistent, long-term chilling effect related to Snowden’s revelations?
2. Can we highlight new chilling effects due to other scandals?
3. Which other factors may govern the traffic, and how do they affect article views?

In [7]:
import pandas as pd
import numpy as np

import scraping

In [8]:
DATA_DIRECTORY = 'data/'
ARTICLES_TITLES = 'articles.txt'
TERRORISM_DATA = 'terrorism.csv'

### Web scraping

The first step is to scrape the website [wikipediaviews](https://wikipediaviews.org) in order to get monthly pageviews of the terrorism-related articles for the time range 2008-2018. Note that the data provided by this site is based on both `stats.grok.se` (for the period before June 2015) and the Wikimedia REST API (for the period after).

We consider only the English Wikipedia and ignore pageviews from mobile devices.
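For reference, the Wikimedia REST API mentioned above can also be queried directly for the period it covers. The following is a minimal sketch, not the code used in this notebook: it only builds the request URL for monthly, desktop-only pageviews of one English Wikipedia article. The helper name `monthly_pageviews_url` and the choice of `agent="user"` are assumptions made for illustration.

```python
import calendar
from urllib.parse import quote

# Public per-article pageviews endpoint of the Wikimedia REST API.
API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def monthly_pageviews_url(article, start_year, start_month, end_year, end_month,
                          project="en.wikipedia", access="desktop", agent="user"):
    """Build the URL for monthly pageviews of one article (no request is sent).

    Hypothetical helper: access="desktop" mirrors the choice to ignore mobile
    traffic; agent="user" (human traffic only) is an assumption.
    """
    # Article titles use underscores instead of spaces and must be URL-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    # Timestamps are given as YYYYMMDD: first day of the start month,
    # last day of the end month.
    last_day = calendar.monthrange(end_year, end_month)[1]
    start = f"{start_year:04d}{start_month:02d}01"
    end = f"{end_year:04d}{end_month:02d}{last_day:02d}"
    return f"{API_ROOT}/{project}/{access}/{agent}/{title}/monthly/{start}/{end}"


url = monthly_pageviews_url("Abu Sayyaf", 2015, 7, 2018, 12)
print(url)
```

Sending a GET request to this URL should return a JSON document with one entry per month. Earlier months would still have to come from another source, such as the `stats.grok.se` data aggregated by wikipediaviews.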

In [9]:
# We can scrape the data and save it in a CSV file using this function from the scraping module
# scraping.scrape_wikipedia_pageviews(2008, 1, 2018, 12, DATA_DIRECTORY, ARTICLES_TITLES, TERRORISM_DATA)

### Data processing

In [24]:
terrorism_raw = pd.read_csv(DATA_DIRECTORY + TERRORISM_DATA, usecols=[1, 2, 3])
print(terrorism_raw.shape[0], 'lines')
terrorism_raw.head()

6336 lines


|   | article | date | views |
|---|---|---|---|
| 0 | abu_sayyaf | 2008-01 | NaN |
| 1 | abu_sayyaf | 2008-02 | 9533.0 |
| 2 | abu_sayyaf | 2008-03 | 11594.0 |
| 3 | abu_sayyaf | 2008-04 | 10507.0 |
| 4 | abu_sayyaf | 2008-05 | 10789.0 |


In [23]:
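# Keep only rows whose view count is present and non-zero.
# The parentheses are required because & binds more tightly than !=.
terrorism = terrorism_raw[terrorism_raw.views.notna() & (terrorism_raw.views != 0)]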
print(terrorism.shape[0], 'lines')
terrorism.head()

4105 lines


|   | article | date | views |
|---|---|---|---|
| 1 | abu_sayyaf | 2008-02 | 9533.0 |
| 2 | abu_sayyaf | 2008-03 | 11594.0 |
| 3 | abu_sayyaf | 2008-04 | 10507.0 |
| 4 | abu_sayyaf | 2008-05 | 10789.0 |
| 5 | abu_sayyaf | 2008-06 | 15748.0 |
