In [1]:
pip install mwclient

Note: you may need to restart the kernel to use updated packages.


The notebook first installs the mwclient library. Later cells import the libraries the analysis relies on: mwclient for querying the MediaWiki API, time for formatting timestamps, transformers for sentiment analysis, statistics for computing means, pandas for data manipulation, and datetime for the end of the date range.

In [2]:
import mwclient
import time

site = mwclient.Site('en.wikipedia.org')
page = site.pages['Bitcoin']

Connect to English Wikipedia with mwclient, retrieve the 'Bitcoin' page, and collect its revisions into a list.

In [3]:
revs = list(page.revisions())

In [4]:
revs[0]

OrderedDict([('revid', 1184077795),
             ('parentid', 1183615206),
             ('minor', ''),
             ('user', 'Grayfell'),
             ('timestamp',
              time.struct_time(tm_year=2023, tm_mon=11, tm_mday=8, tm_hour=5, tm_min=37, tm_sec=49, tm_wday=2, tm_yday=312, tm_isdst=-1)),
             ('comment', '/* Creation */  Fixing typo, per talk')])

In [5]:
revs = sorted(revs, key=lambda rev: rev["timestamp"]) 

Sort the revisions by timestamp so that the oldest revision comes first (mwclient returned the newest first, as the output above shows).

In [6]:
revs[0]

OrderedDict([('revid', 275832581),
             ('parentid', 0),
             ('user', 'Pratyeka'),
             ('timestamp',
              time.struct_time(tm_year=2009, tm_mon=3, tm_mday=8, tm_hour=16, tm_min=41, tm_sec=7, tm_wday=6, tm_yday=67, tm_isdst=-1)),
             ('comment', 'creation (stub)')])
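
The sort works because `time.struct_time` values compare like tuples (year, month, day, and so on). A minimal sketch of that behavior, using made-up sample revisions (`sample_revs` and its `revid` values are hypothetical, not taken from the Bitcoin page):

```python
import time

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical revisions, shaped like the dicts shown in the outputs above.
sample_revs = [
    {"revid": 3, "timestamp": time.strptime("2023-11-08 05:37:49", FMT)},
    {"revid": 1, "timestamp": time.strptime("2009-03-08 16:41:07", FMT)},
    {"revid": 2, "timestamp": time.strptime("2015-06-01 12:00:00", FMT)},
]

# struct_time is compared field by field, so sorting on it is chronological.
ordered = sorted(sample_revs, key=lambda rev: rev["timestamp"])

print([rev["revid"] for rev in ordered])  # [1, 2, 3]
```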

In [8]:
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")

def find_sentiment(text):
    sent = sentiment_pipeline([text[:250]])[0]
    score = sent["score"]
    if sent["label"] == "NEGATIVE":
        score *= -1
    return score

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



Create a sentiment-analysis pipeline with the transformers library and define find_sentiment, which scores the first 250 characters of a text and returns the model's confidence, negated when the label is NEGATIVE.
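
The sign convention can be checked without downloading the model. The sketch below isolates that step in a hypothetical helper, `to_signed_score`, and feeds it hand-written results in the `label`/`score` format that the cell above reads from the pipeline; the scores themselves are invented.

```python
def to_signed_score(result):
    """Map a {'label', 'score'} classifier result to a value in [-1, 1]."""
    score = result["score"]
    if result["label"] == "NEGATIVE":
        score = -score  # negative labels become negative scores
    return score

# Invented classifier outputs, not real model predictions.
positive = {"label": "POSITIVE", "score": 0.98}
negative = {"label": "NEGATIVE", "score": 0.91}

print(to_signed_score(positive))  # 0.98
print(to_signed_score(negative))  # -0.91
```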

In [23]:
edits = {}

for rev in revs:        
    date = time.strftime("%Y-%m-%d", rev["timestamp"])
    if date not in edits:
        edits[date] = dict(sentiments=list(), edit_count=0)
    
    edits[date]["edit_count"] += 1
    
    comment = rev.get("comment", "")
    edits[date]["sentiments"].append(find_sentiment(comment))

Iterate through the sorted revisions and, for each calendar date, count the edits and collect the sentiment score of every edit comment in the 'edits' dictionary.
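
The same grouping can be sketched on a few hand-made records. Here `sample` is hypothetical data with precomputed scores standing in for find_sentiment, and `collections.defaultdict` replaces the explicit `if date not in edits` check; this is one possible variant, not the notebook's own code.

```python
import time
from collections import defaultdict

# Hypothetical (timestamp, sentiment score) pairs.
sample = [
    ("2009-03-08 16:41:07", -0.9),
    ("2009-03-08 18:02:11", 0.7),
    ("2009-08-05 09:15:00", 0.75),
]

daily = defaultdict(lambda: {"sentiments": [], "edit_count": 0})
for stamp, score in sample:
    ts = time.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    date = time.strftime("%Y-%m-%d", ts)  # keep the date, drop the time of day
    daily[date]["edit_count"] += 1
    daily[date]["sentiments"].append(score)

print(dict(daily))
```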

In [24]:
from statistics import mean

for key in edits:
    if len(edits[key]["sentiments"]) > 0:
        edits[key]["sentiment"] = mean(edits[key]["sentiments"])
        edits[key]["neg_sentiment"] = len([s for s in edits[key]["sentiments"] if s < 0]) / len(edits[key]["sentiments"])
    else:
        edits[key]["sentiment"] = 0
        edits[key]["neg_sentiment"] = 0
    
    del edits[key]["sentiments"]

Calculate the mean sentiment and the share of negative comments for each date in the 'edits' dictionary, then drop the raw score lists.
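
As a worked example of the two statistics, the sketch below wraps the calculation in a hypothetical helper, `summarize`, and applies it to four invented scores. Three of the four are negative, so the negative share is 0.75.

```python
from statistics import mean

def summarize(scores):
    """Return the mean score and the fraction of scores below zero."""
    if not scores:
        return {"sentiment": 0, "neg_sentiment": 0}
    negatives = sum(1 for s in scores if s < 0)
    return {"sentiment": mean(scores), "neg_sentiment": negatives / len(scores)}

# Invented scores for a single day with four edits.
day = summarize([-0.9, -0.8, -0.7, 0.4])

print(round(day["sentiment"], 6), day["neg_sentiment"])  # -0.5 0.75
```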

In [25]:
import pandas as pd

edits_df = pd.DataFrame.from_dict(edits, orient="index")

Convert the 'edits' dictionary into a pandas DataFrame with one row per date.

In [26]:
edits_df

date,edit_count,sentiment,neg_sentiment
2009-03-08,4,-0.550525,0.75
2009-08-05,1,0.748121,0.00
2009-08-06,2,0.995746,0.00
2009-08-14,1,0.930021,0.00
2009-10-13,2,-0.227499,0.50
...,...,...,...
2023-10-26,1,0.995126,0.00
2023-11-03,2,-0.987722,1.00
2023-11-04,4,0.839732,0.00
2023-11-05,1,-0.813071,1.00


In [27]:
edits_df.index = pd.to_datetime(edits_df.index)

In [28]:
from datetime import datetime

dates = pd.date_range(start="2009-03-08",end=datetime.today())

Prepare the DataFrame for analysis: convert the index to datetime, build a daily date range from the first edit (2009-03-08) to today, and reindex so that days without edits are filled with zeros.

In [29]:
edits_df = edits_df.reindex(dates, fill_value=0)
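
To make the effect of reindexing with a fill value concrete, here is a plain-Python analogue on a tiny, hypothetical series of edit counts (`sparse` is invented data). It is a sketch of the idea, not a replacement for the pandas call above.

```python
from datetime import date, timedelta

# Hypothetical edit counts, with gaps on the days nobody edited.
sparse = {date(2009, 3, 8): 4, date(2009, 3, 11): 1}

start, end = min(sparse), max(sparse)
n_days = (end - start).days + 1

# One entry per calendar day; days missing from `sparse` get 0.
dense = {}
for offset in range(n_days):
    day = start + timedelta(days=offset)
    dense[day] = sparse.get(day, 0)

print(list(dense.values()))  # [4, 0, 0, 1]
```

Note that filling with zero treats a day without edits the same as a day with exactly neutral sentiment, which is worth keeping in mind when reading the averages.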

In [30]:
edits_df

date,edit_count,sentiment,neg_sentiment
2009-03-08,4,-0.550525,0.75
2009-03-09,0,0.000000,0.00
2009-03-10,0,0.000000,0.00
2009-03-11,0,0.000000,0.00
2009-03-12,0,0.000000,0.00
...,...,...,...
2023-11-04,4,0.839732,0.00
2023-11-05,1,-0.813071,1.00
2023-11-06,0,0.000000,0.00
2023-11-07,0,0.000000,0.00


In [31]:
rolling_edits = edits_df.rolling(30, min_periods=30).mean()

Calculate 30-day rolling averages, drop the rows with NaN values (the first 29 days, which lack a full window), and save the result to "wikipedia_edits.csv". The result summarizes the sentiment of edit comments on the Bitcoin Wikipedia page over time as a 30-day rolling average.

In [32]:
rolling_edits = rolling_edits.dropna()

In [33]:
rolling_edits

date,edit_count,sentiment,neg_sentiment
2009-04-06,0.133333,-0.018351,0.025000
2009-04-07,0.000000,0.000000,0.000000
2009-04-08,0.000000,0.000000,0.000000
2009-04-09,0.000000,0.000000,0.000000
2009-04-10,0.000000,0.000000,0.000000
...,...,...,...
2023-11-04,0.966667,-0.022490,0.174359
2023-11-05,1.000000,-0.049592,0.207692
2023-11-06,1.000000,-0.049592,0.207692
2023-11-07,1.000000,-0.049592,0.207692
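
The first rows of this table can be reproduced by hand. The sketch below uses a hypothetical pure-Python helper, `rolling_mean`, that mimics a trailing window requiring a full 30 values. Its inputs are the 2009-03-08 row from the daily table followed by zeros, as in the reindexed data through 2009-04-07.

```python
from collections import deque

def rolling_mean(values, window):
    """Return the mean of each full trailing window, skipping incomplete ones."""
    buf = deque(maxlen=window)  # the oldest value drops out automatically
    out = []
    for v in values:
        buf.append(v)
        if len(buf) == window:
            out.append(sum(buf) / window)
    return out

# 2009-03-08 through 2009-04-07: one active day, then 30 days without edits.
edit_count = [4] + [0] * 30
sentiment = [-0.550525] + [0.0] * 30

counts = rolling_mean(edit_count, 30)
scores = rolling_mean(sentiment, 30)

print(round(counts[0], 6), counts[1])  # 0.133333 0.0
print(round(scores[0], 6))             # -0.018351
```

The first full window ends on 2009-04-06 and still contains the four edits from 2009-03-08, giving 4 / 30. One day later that day falls out of the window and the average drops to zero.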


In [34]:
rolling_edits.to_csv("wikipedia_edits.csv")