# ETH Sentiment Analysis

This notebook applies DistilBERT base uncased fine-tuned on SST-2 English (distilbert-base-uncased-finetuned-sst-2-english), loaded through Hugging Face Transformers, to the edit comments of Wikipedia revisions of the Ethereum article. It imports the required libraries, fetches the raw revision records, scores each edit comment with the transformer model, aggregates the scores per day, and saves the processed data as a CSV file. Tracking sentiment this way gives a qualitative view of how attitudes expressed in Ethereum-related Wikipedia edits change over time, which may help in understanding community dynamics, shifts in interest, and possible implications for the broader cryptocurrency landscape.

# Table of Contents
- [Import Library](#import-library)
- [Acquire Wikipedia Revisions](#acquire-wikipedia-revisions)
- [Use Transformer From Huggingface For Sentiment Analysis](#use-transformer-from-huggingface-for-sentiment-analysis)
- [Store Sentiments](#store-sentiments)
- [Save CSV](#save-csv)

# Import Library

In [35]:
import mwclient
import time
import os
import warnings
import pandas as pd

# Set before importing transformers so the Hugging Face Hub symlink warning is suppressed.
os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"

from transformers import pipeline
from statistics import mean
from datetime import datetime

# Acquire Wikipedia Revisions

In [12]:
website = mwclient.Site('en.wikipedia.org')
website_pages = website.pages['Ethereum']

In [14]:
website_revisions = list(website_pages.revisions())
# Sort oldest first, so the list starts in 2014 rather than 2024.
website_revisions = sorted(website_revisions, key=lambda revision: revision['timestamp'])
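
The sort works because each revision's `timestamp` is a `time.struct_time`, which compares field by field starting with the year. The following is a minimal sketch of that behavior and of the date formatting used later in the notebook. The two records are hand-written stand-ins with hypothetical `revid` values, not real Wikipedia data.

```python
import time

# Hypothetical stand-in records; real ones come from mwclient.
revisions = [
    {"revid": 2, "timestamp": time.strptime("2024-01-05", "%Y-%m-%d")},
    {"revid": 1, "timestamp": time.strptime("2014-01-27", "%Y-%m-%d")},
]

# struct_time values compare like tuples (year, month, day, ...),
# so sorting on the timestamp yields chronological order.
oldest_first = sorted(revisions, key=lambda revision: revision["timestamp"])

# Format the earliest timestamp as a calendar date string.
first_day = time.strftime("%Y-%m-%d", oldest_first[0]["timestamp"])
print(first_day)  # 2014-01-27
```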

In [15]:
website_revisions[0]

OrderedDict([('revid', 592567939),
             ('parentid', 0),
             ('user', 'Sanpitch'),
             ('timestamp',
              time.struct_time(tm_year=2014, tm_mon=1, tm_mday=27, tm_hour=1, tm_min=53, tm_sec=45, tm_wday=0, tm_yday=27, tm_isdst=-1)),
             ('comment',
              "[[WP:AES|←]]Created page with '{{Infobox currency | image_1 =  | image_title_1 =  | image_width_1 =  | image_2 =  | image_title_2 =  | image_width_2 =  |issuing_authority = None. The Ethereum...'")])

# Use Transformer From Huggingface For Sentiment Analysis

In [21]:
sentiment_model = pipeline('sentiment-analysis')

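def text_sentiment(text):
    """Return a signed confidence score: positive for POSITIVE, negative for NEGATIVE."""
    # Only the first 250 characters of the text are scored.
    sentiment = sentiment_model([text[:250]])[0]

    sentiment_score = sentiment['score']

    # The pipeline reports the winning label's confidence; flip the sign for NEGATIVE.
    if sentiment['label'] == 'NEGATIVE':
        sentiment_score *= -1

    return sentiment_score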

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
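
The warning above suggests naming the model and revision explicitly. One possible way to do that is sketched below, reusing the model name and revision printed in the warning. The helper name `build_sentiment_model` is hypothetical, and the import sits inside the function only so that defining the helper does not load the library.

```python
# Model name and revision copied from the warning message above.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "af0f99b"


def build_sentiment_model():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported here so that defining this helper does not load transformers.
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        model=MODEL_NAME,
        revision=MODEL_REVISION,
    )
```

Calling `build_sentiment_model()` downloads the model on first use, so the result should be stable across library upgrades as long as the pinned revision remains available.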


In [22]:
print(f"Sentiment score: {text_sentiment('I love you')}")
print(f"Sentiment score: {text_sentiment('I hate you')}")
print(f"Sentiment score: {text_sentiment('I feel neutral about you')}") # SST-2 has no neutral class, so neutral text still gets a positive or negative label

Sentiment score: 0.9998656511306763
Sentiment score: -0.9991129040718079
Sentiment score: -0.9937905669212341
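
Because the model only chooses between two labels, the winning label's score is at least 0.5, so the signed score used here never lands close to zero. The sketch below mirrors the sign rule on hand-written results and contrasts it with a hypothetical alternative, the margin P(POSITIVE) minus P(NEGATIVE). That alternative is not what this notebook does, and it assumes the pipeline is called with `top_k=None`, which in recent Transformers releases returns a score for every label.

```python
def signed_score(result):
    """Mirror the notebook's rule: negate the score when the label is NEGATIVE."""
    score = result["score"]
    return -score if result["label"] == "NEGATIVE" else score


def margin_score(all_scores):
    """Hypothetical alternative: P(POSITIVE) - P(NEGATIVE), anywhere in [-1, 1]."""
    by_label = {item["label"]: item["score"] for item in all_scores}
    return by_label["POSITIVE"] - by_label["NEGATIVE"]


# Hand-written stand-in for a model output, not a real inference result.
example = [
    {"label": "NEGATIVE", "score": 0.55},
    {"label": "POSITIVE", "score": 0.45},
]
top = max(example, key=lambda item: item["score"])

print(signed_score(top))      # -0.55
print(margin_score(example))  # close to -0.10
```

A margin would not fix the neutral example above, since the model is confident there, but it does distinguish a narrow win from a decisive one.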


# Store Sentiments

In [27]:
edits = {}

for website_revision in website_revisions:
    date = time.strftime('%Y-%m-%d', website_revision['timestamp'])
    
    if date not in edits:
        edits[date] = dict(sentiments=list(), edit_count=0)
    
    edits[date]['edit_count'] += 1
    
    comment = website_revision.get('comment', '')
    edits[date]['sentiments'].append(text_sentiment(comment))
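
The loop above calls the model once per comment. A possible variation, sketched here, scores comments in batches by passing lists of texts to the classifier. The function `score_comments` and the stand-in `fake_classifier` are hypothetical; the stand-in only imitates the list-of-dicts shape of a pipeline result so that the sketch runs without downloading a model.

```python
def score_comments(comments, classifier, max_chars=250, batch_size=32):
    """Return one signed score per comment, scoring the comments in batches."""
    scores = []
    for start in range(0, len(comments), batch_size):
        # Truncate each comment the same way text_sentiment does.
        batch = [c[:max_chars] for c in comments[start:start + batch_size]]
        for result in classifier(batch):
            score = result["score"]
            scores.append(-score if result["label"] == "NEGATIVE" else score)
    return scores


def fake_classifier(texts):
    """Stand-in for the pipeline: labels any text mentioning 'revert' as NEGATIVE."""
    return [
        {"label": "NEGATIVE" if "revert" in t.lower() else "POSITIVE", "score": 0.9}
        for t in texts
    ]


comments = ["Reverted vandalism", "Added citation", ""]
print(score_comments(comments, fake_classifier, batch_size=2))  # [-0.9, 0.9, 0.9]
```

With the real pipeline, `classifier` would be the `sentiment_model` object created earlier.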

In [29]:
for day in edits.values():
    scores = day.pop('sentiments')
    if scores:
        day['sentiment'] = mean(scores)
        # Fraction of the day's edits whose score is negative.
        day['negative_sentiment'] = sum(score < 0 for score in scores) / len(scores)
    else:
        day['sentiment'] = 0
        day['negative_sentiment'] = 0
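
The same three daily statistics can also be computed with a pandas `groupby`, shown here as an alternative sketch rather than the notebook's method. The dates and scores are made-up illustration values.

```python
import pandas as pd

# Made-up per-edit scores; one row per revision comment.
scores = pd.DataFrame({
    "date": ["2014-04-09", "2014-04-09", "2014-04-09", "2014-04-10"],
    "score": [0.9, 0.8, -0.7, -0.6],
})

daily = scores.groupby("date").agg(
    edit_count=("score", "size"),                            # edits per day
    sentiment=("score", "mean"),                             # mean signed score
    negative_sentiment=("score", lambda s: (s < 0).mean()),  # share of negative edits
)
print(daily)
```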

In [30]:
edits_df = pd.DataFrame.from_dict(edits, orient='index')
edits_df

date,edit_count,sentiment,negative_sentiment
2014-01-27,1,-0.998511,1.000000
2014-02-01,1,-0.997276,1.000000
2014-04-06,5,0.790979,0.000000
2014-04-09,24,0.646407,0.083333
2014-04-10,9,-0.361518,0.666667
...,...,...,...
2023-12-05,1,-0.999764,1.000000
2023-12-11,1,-0.994897,1.000000
2023-12-23,1,0.748121,0.000000
2024-01-05,1,0.965976,0.000000


In [36]:
edits_df.index = pd.to_datetime(edits_df.index)
dates = pd.date_range(start='2014-01-27', end=datetime.today())
edits_df = edits_df.reindex(dates, fill_value=0)

In [37]:
edits_df

date,edit_count,sentiment,negative_sentiment
2014-01-27,1,-0.998511,1.0
2014-01-28,0,0.000000,0.0
2014-01-29,0,0.000000,0.0
2014-01-30,0,0.000000,0.0
2014-01-31,0,0.000000,0.0
...,...,...,...
2024-01-11,0,0.000000,0.0
2024-01-12,0,0.000000,0.0
2024-01-13,0,0.000000,0.0
2024-01-14,0,0.000000,0.0
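
The effect of `reindex` with `fill_value=0` is easier to see on a small frame. The sketch below uses made-up values: two days with edits become five consecutive days, and the days without edits get zeros in every column.

```python
import pandas as pd

# Made-up daily statistics with a two-day gap between the rows.
sparse = pd.DataFrame(
    {"edit_count": [1, 2], "sentiment": [-0.9, 0.5]},
    index=pd.to_datetime(["2014-01-27", "2014-01-30"]),
)

# One row per calendar day; days missing from the index are filled with 0.
all_days = pd.date_range(start="2014-01-27", end="2014-01-31")
dense = sparse.reindex(all_days, fill_value=0)
print(dense)
```

Note that a filled-in sentiment of 0 means "no edits that day", not "neutral edits", which matters when the column is averaged later.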


In [38]:
rolling_edits = edits_df.rolling(30, min_periods=30).mean()
rolling_edits = rolling_edits.dropna()
rolling_edits

date,edit_count,sentiment,negative_sentiment
2014-02-25,0.066667,-0.066526,0.066667
2014-02-26,0.033333,-0.033243,0.033333
2014-02-27,0.033333,-0.033243,0.033333
2014-02-28,0.033333,-0.033243,0.033333
2014-03-01,0.033333,-0.033243,0.033333
...,...,...,...
2024-01-11,0.066667,0.057137,0.000000
2024-01-12,0.066667,0.057137,0.000000
2024-01-13,0.066667,0.057137,0.000000
2024-01-14,0.066667,0.057137,0.000000
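
The rolling step can be illustrated at a smaller scale. This sketch uses a 3-day window on made-up edit counts instead of the 30-day window above. With `min_periods` equal to the window size, the first two days have no complete window and are dropped.

```python
import pandas as pd

# Made-up daily edit counts for five consecutive days.
daily = pd.Series([3, 0, 0, 6, 0], index=pd.date_range("2014-01-27", periods=5))

# Each value is the mean of that day and the two days before it.
smoothed = daily.rolling(3, min_periods=3).mean().dropna()
print(smoothed.tolist())  # [1.0, 2.0, 2.0]
```

The same logic explains why the 30-day output above starts on 2014-02-25, the 30th day of the series.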


# Save CSV

In [39]:
rolling_edits.to_csv('ethereum_wikipedia.csv')
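
When the file is loaded again, the date index has to be restored explicitly. The sketch below round-trips a small made-up frame through an in-memory buffer instead of a file on disk. Without `index_col=0`, pandas would read the dates as an ordinary column named `Unnamed: 0`.

```python
import io

import pandas as pd

# Made-up frame standing in for rolling_edits.
frame = pd.DataFrame(
    {"edit_count": [0.1, 0.2]},
    index=pd.date_range("2014-02-25", periods=2),
)

# Write to an in-memory buffer rather than a file on disk.
buffer = io.StringIO()
frame.to_csv(buffer)
buffer.seek(0)

# Restore the first column as a parsed datetime index.
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)
print(restored.index[0])  # 2014-02-25 00:00:00
```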