# Champions of sentiment discourse

AUTHOR: Michal Mochtak (michal.mochtak@ru.nl), Peter Rupnik (peter.rupnik@ijs.si), Nikola Ljubešić

DATE: 2024-06-24

---

In this notebook we look at the sentiment scores of specific countries at the speaker and party level.

On the first run, the data is downloaded from the internet. The second cell defines a function that filters the dataset by specific conditions (e.g. keeping only the MPs who have a minimum number of speeches on the record). In the following cells we inspect two countries, Croatia and the Netherlands, in a comparable time frame, and then the entire corpus across its full time span.

In [1]:
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd
import seaborn as sns
from IPython.display import display

pd.set_option('display.max_rows', None)

# Download the dataset on the first run only.
DATA_URL = "https://huggingface.co/datasets/5roop/parlasent_data/resolve/main/speeches.csv.zip"
DATA_PATH = Path("speeches.csv.zip")
if not DATA_PATH.exists():
    urlretrieve(DATA_URL, DATA_PATH)
df = pd.read_csv(DATA_PATH)


In [2]:
def calculate_sentiment(target="Speaker_name", *, country=None, term=None):
    """Return mean sentiment and speech count per `target`, sorted by ascending mean."""
    all_countries = df.country.unique().tolist()
    if country is None:
        country = input(f"Choose country from {all_countries} \n(empty for all): ")

    # Filtering
    # Select speeches from a specific country ("all" or "" keeps every country):
    if country in ["all", ""]:
        c0 = pd.Series(True, index=df.index)
    else:
        c0 = df.country == country
    # Keep only MPs:
    c1 = df.Speaker_MP == "MP"
    # Keep only speeches of at least 100 characters:
    c2 = df.char_length >= 100
    # Include only speakers with at least 10 such speeches:
    gb = df[c0 & c1 & c2].groupby("Speaker_name").logits_pondered.count().reset_index()
    speakers_to_keep = gb.Speaker_name[gb.logits_pondered >= 10]
    c3 = df.Speaker_name.isin(speakers_to_keep)
    ndf = df[c0 & c1 & c2 & c3]
    if term is None:
        print("Available terms:")
        display(ndf.groupby("Term").agg({
            "Date": ["min", "max", "count"],
        }).sort_values(("Date", "min")), clear=True)
        term = input(f"Choose term from {ndf.Term.unique().tolist()} (empty for all): ")
    # Restrict to a single term if one was chosen:
    if term:
        nndf = ndf[ndf.Term == term].reset_index(drop=True)
    else:
        nndf = ndf
    # Mean score and number of speeches per speaker or party:
    gb2 = nndf.groupby(target).agg({
        "logits_pondered": ["mean", "count"],
    }).reset_index()
    gb2.columns = [target, "mean", "count"]

    # Lowest mean score first:
    gb2 = gb2.sort_values(by="mean", ascending=True)
    return gb2
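
To make the filtering steps in `calculate_sentiment` concrete, here is a minimal sketch on made-up data. The column names match the notebook's `df`, but the rows, the `toy` frame and the `MIN_SPEECHES` threshold of 2 are assumptions chosen for illustration (the function itself uses 10).

```python
import pandas as pd

# Made-up rows; only the column names are taken from the notebook.
toy = pd.DataFrame({
    "Speaker_name": ["A", "A", "A", "B", "B", "C", "D", "D"],
    "Speaker_MP": ["MP", "MP", "MP", "MP", "MP", "notMP", "MP", "MP"],
    "char_length": [250, 90, 400, 120, 80, 500, 150, 300],
    "logits_pondered": [1.0, 4.0, 3.0, 2.0, 2.5, 5.0, 0.5, 1.5],
})

MIN_SPEECHES = 2  # hypothetical threshold; the notebook uses 10

# Keep MPs whose speeches have at least 100 characters.
base = toy[(toy.Speaker_MP == "MP") & (toy.char_length >= 100)]
# Count the remaining speeches per speaker and keep the frequent speakers.
counts = base.groupby("Speaker_name").logits_pondered.count()
keep = counts[counts >= MIN_SPEECHES].index
filtered = base[base.Speaker_name.isin(keep)]

# Mean score and speech count per speaker, lowest mean first.
toy_summary = (
    filtered.groupby("Speaker_name")
    .logits_pondered.agg(["mean", "count"])
    .reset_index()
    .sort_values("mean")
)
print(toy_summary)
```

Speaker B drops out because only one of their speeches is long enough, and speaker C drops out because they are not an MP.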

Let's inspect the available terms so that we can pick roughly comparable time frames for the two countries:

In [3]:
df[df.country.isin(["HR", "NL"])].groupby("country Term Speaker_MP".split()).agg({
    "Date": ["min", "max", "count"]
}).sort_values(by=("Date", "min"))



country,Term,Speaker_MP,Date min,Date max,Date count
HR,5. mandat,MP,2003-12-22,2007-10-12,74856
HR,5. mandat,notMP,2003-12-22,2007-10-12,4404
HR,5. mandat,-,2004-04-01,2007-10-11,2632
HR,6. mandat,MP,2008-01-11,2011-10-28,68561
HR,6. mandat,notMP,2008-01-11,2011-10-28,4200
HR,6. mandat,-,2008-02-21,2011-10-27,455
HR,7. mandat,notMP,2011-12-22,2015-09-24,3824
HR,7. mandat,MP,2011-12-22,2015-09-25,96544
HR,7. mandat,-,2012-01-27,2015-07-03,1650
NL,Meeting of the 28th Tweede Kamer,notMP,2014-04-16,2017-10-25,76021
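
The aggregation above produces two-level column names such as `("Date", "min")`. If flat column names are preferred, pandas' named aggregation is one option. The sketch below uses made-up rows, and the output names `first_date`, `last_date` and `speeches` as well as the term label `T1` are illustrative assumptions.

```python
import pandas as pd

# Made-up rows using the notebook's column names; "T1" is a hypothetical term label.
toy = pd.DataFrame({
    "country": ["HR", "HR", "NL", "NL"],
    "Term": ["5. mandat", "5. mandat", "T1", "T1"],
    "Speaker_MP": ["MP", "MP", "MP", "MP"],
    "Date": ["2003-12-22", "2007-10-12", "2014-04-16", "2017-10-25"],
})

# Named aggregation: each keyword becomes a flat output column.
term_spans = (
    toy.groupby(["country", "Term", "Speaker_MP"])
    .agg(
        first_date=("Date", "min"),
        last_date=("Date", "max"),
        speeches=("Date", "count"),
    )
    .sort_values("first_date")
)
print(term_spans)
```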


In [4]:
calculate_sentiment("Speaker_name", country="HR", term="10. mandat").shape

(179, 3)

In [5]:
calculate_sentiment("Speaker_name", country="NL", term="Meeting of the 36th Eerste Kamer").shape

(24, 3)

In [6]:
calculate_sentiment("Speaker_party", country="HR", term="10. mandat").shape


(20, 3)

In [7]:
calculate_sentiment("Speaker_party", country="NL", term="Meeting of the 36th Eerste Kamer")

index,Speaker_party,mean,count
4,FvD,1.660307,297
6,PvdD,1.77233,84
0,-,1.841281,262
5,PvdA,1.911831,419
9,vanPareren,1.956687,7
1,CDA,2.010493,179
3,D66,2.20987,166
2,CU,2.225707,212
7,SP,2.553617,29
8,VVD,2.611193,3482


In [8]:
df["Date"] = pd.to_datetime(df.Date)
df[df.country=="NL"].set_index("Date").groupby([
    # pd.Grouper(freq="1YS"),
    "Term",
    "Party_status",
    "Speaker_MP"
]).logits_pondered.count()

Term                              Party_status  Speaker_MP
Meeting of the 28th Tweede Kamer  -             MP             15091
                                                notMP           9856
                                  Coalition     MP             54753
                                                notMP          33608
                                  Opposition    MP             30175
                                                notMP          32557
Meeting of the 29th Tweede Kamer  -             MP            110694
                                                notMP          31010
                                  Coalition     MP             28734
                                                notMP          74127
                                  Opposition    MP             22160
                                                notMP          54426
Meeting of the 30th Tweede Kamer  -             MP                48
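
The commented-out `pd.Grouper(freq="1YS")` in the cell above hints at a per-year breakdown. A minimal sketch of that variant on made-up rows, assuming `Date` has already been converted with `pd.to_datetime` and set as the index:

```python
import pandas as pd

# Made-up rows; only the column names are taken from the notebook.
toy = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2014-04-16", "2014-11-02", "2015-03-10", "2015-06-01", "2015-12-24"]
    ),
    "Party_status": ["Coalition", "Opposition", "Coalition", "Coalition", "Opposition"],
    "logits_pondered": [2.1, 1.4, 2.6, 2.2, 1.0],
})

# Group by calendar year (bins labelled by year start) and party status.
yearly_counts = (
    toy.set_index("Date")
    .groupby([pd.Grouper(freq="1YS"), "Party_status"])
    .logits_pondered.count()
)
print(yearly_counts)
```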
                                            

# Overall most negative and most positive parties



In [9]:
calculate_sentiment("Speaker_party", country="", term="").head(20)

index,Speaker_party,mean,count
453,SNS;NSS,0.679782,4
2,64RT,0.705099,16
208,ICV,0.960849,110
579,Živi zid,0.994839,3429
169,GP-CH,1.010155,1090
479,Szolidaritás,1.05228,27
223,JOBBIK,1.058236,33
525,Vox,1.063593,1586
244,KV,1.070499,136
125,EMEP,1.090846,211
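
Some of the most extreme means above rest on very few speeches (for example `SNS;NSS` with 4). One possible way to make the ranking less sensitive to such cases is to require a minimum number of speeches per party. The sketch below reuses the first five rows of the output above; the `MIN_PARTY_SPEECHES` cut-off of 100 is a hypothetical value, not something the notebook applies.

```python
import pandas as pd

# First five rows of the ranking shown above.
ranking = pd.DataFrame({
    "Speaker_party": ["SNS;NSS", "64RT", "ICV", "Živi zid", "GP-CH"],
    "mean": [0.679782, 0.705099, 0.960849, 0.994839, 1.010155],
    "count": [4, 16, 110, 3429, 1090],
})

MIN_PARTY_SPEECHES = 100  # hypothetical cut-off, not used in the notebook

# Use ranking["count"] rather than ranking.count, which is a DataFrame method.
robust = ranking[ranking["count"] >= MIN_PARTY_SPEECHES]
print(robust)
```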


In [10]:
calculate_sentiment("Speaker_party", country="", term="").tail(20)

index,Speaker_party,mean,count
30,BM365,3.074956,11
237,KNDP-frakció,3.077919,18829
27,BDSS,3.093356,249
275,LK,3.11605,591
471,SVM,3.116631,1254
373,PR,3.14426,4989
200,Holos,3.191994,28
70,CPU,3.237316,626
447,SN,3.24189,18468
43,Batkivshchyna,3.262185,5858
