# Champions of sentiment discourse

AUTHOR: Michal Mochtak (michal.mochtak@ru.nl), Peter Rupnik (peter.rupnik@ijs.si), Nikola Ljubešić

DATE: 2024-06-24

---

In this notebook we look at sentiment scores for specific countries at the speaker and party level.

On the first run, the data is downloaded from the internet. The function defined below filters the dataset by specific conditions (e.g. keeping only MPs with at least a given number of speeches on record) and aggregates the sentiment scores. We then inspect two countries, Croatia and the Netherlands.

In [1]:
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd
import seaborn as sns
from IPython.display import display

pd.set_option('display.max_rows', None)

# Download the dataset only if it is not already present locally.
data_path = Path("speeches.csv.zip")
if not data_path.exists():
    urlretrieve(
        "https://huggingface.co/datasets/5roop/parlasent_data/resolve/main/speeches.csv.zip",
        data_path,
    )
df = pd.read_csv(data_path)


In [2]:
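def calculate_sentiment(target="Speaker_name", *, country=None, term=None):
    """Return mean sentiment and speech counts per `target` group.

    Only MPs with Coalition or Opposition status and at least 10 speeches
    (counted across all terms of the chosen country) are kept. If `country`
    or `term` is not given, the user is prompted for it.
    """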
    all_countries = df.country.unique().tolist()
    if not country:
        country = input(f"Choose country from {all_countries}: ")

    # Filtering
    # Select speeches from a specific country:
    c0 = df.country == country
    # Keep only MPs
    c1 = (df.Speaker_MP == "MP")
    # Keep only speeches where speaker is either Opposition or Coalition:
    c2 = df.Party_status.isin(["Opposition", "Coalition"])
    # Keep only people that have at least 10 speeches:
    gb = df[c0&c1&c2].groupby("Speaker_name").logits_pondered.count().reset_index()
    speakers_to_keep = gb.Speaker_name[gb.logits_pondered >= 10]
    c3 = df.Speaker_name.isin(speakers_to_keep)
    ndf = df[c0&c1&c2&c3]
    if not term:
        print("Available terms:")
        display(ndf.groupby("Term").agg({
            "Date": ["min", "max", "count"],
        }).sort_values(("Date", "min")), clear=True)
        term = input(f"Choose term from {ndf.Term.unique().tolist()} (empty for all): ")
    if term:
        c0 = ndf.Term == term
        nndf = ndf[c0].reset_index(drop=True)
    else:
        nndf = ndf
    # Mean sentiment and speech count per target, split by party status:
    gb1 = nndf.groupby([target, "Party_status"]).agg({
        "logits_pondered": ["mean", "count"]
    }).reset_index()

    def flatten(columns):
        # Join MultiIndex levels with "_" and drop the "logits_pondered_" prefix.
        return columns.map(lambda l: "_".join(l).rstrip("_").replace("logits_pondered_", ""))

    # One row per target, with columns such as mean_Coalition and count_Opposition:
    gb1 = gb1.set_axis(flatten(gb1.columns), axis=1).pivot(
        index=target,
        columns="Party_status",
        values=["mean", "count"]
    ).reset_index()
    gb1 = gb1.set_axis(flatten(gb1.columns), axis=1)
    # Overall mean and count per target, regardless of party status:
    gb2 = nndf.groupby([target]).agg({
        "logits_pondered": ["mean", "count"]
    }).reset_index()
    gb2.columns = [target, "mean", "count"]

    gb2 = gb2.merge(gb1, on=f"{target}", how="left")
    gb2 = gb2.sort_values(by="mean", ascending=True)
    return gb2
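
The minimum-speech filter inside `calculate_sentiment` can be tried in isolation. The following is a minimal sketch on made-up toy data; the helper name `keep_frequent_speakers` and its `min_speeches` parameter are hypothetical and not part of the notebook, and it uses `groupby(...).transform("count")` instead of the `isin` lookup used above.

```python
import pandas as pd

# Toy data: speaker names and scores are invented for illustration only.
toy = pd.DataFrame({
    "Speaker_name": ["A", "A", "A", "B", "C", "C"],
    "logits_pondered": [1.0, 2.0, 3.0, 4.0, 0.5, 1.5],
})


def keep_frequent_speakers(frame, min_speeches=10):
    """Return the rows whose speaker has at least `min_speeches` scored rows."""
    # transform("count") repeats each speaker's row count on every one of their rows.
    n_speeches = frame.groupby("Speaker_name")["logits_pondered"].transform("count")
    return frame[n_speeches >= min_speeches]


filtered = keep_frequent_speakers(toy, min_speeches=2)
print(sorted(filtered.Speaker_name.unique()))
```

With a threshold of 2, speaker "B" (one speech) is dropped while "A" and "C" are kept.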

Let's inspect the available terms so that we can choose approximately matching time frames for the two countries:

In [3]:
df[df.country.isin(["HR", "NL"])].groupby(["country", "Term"]).agg({
    "Date": ["min", "max"]
}).sort_values(by=("Date", "min"))

country,Term,Date min,Date max
HR,5. mandat,2003-12-22,2007-10-12
HR,6. mandat,2008-01-11,2011-10-28
HR,7. mandat,2011-12-22,2015-09-25
NL,Meeting of the 28th Tweede Kamer,2014-04-16,2017-10-25
NL,Meeting of the 34th Eerste Kamer,2014-12-15,2015-06-02
NL,Meeting of the 35th Eerste Kamer,2015-06-09,2019-06-04
HR,8. mandat,2015-12-03,2016-06-20
HR,9. mandat,2016-10-14,2020-05-13
NL,Meeting of the 29th Tweede Kamer,2017-10-31,2021-12-21
NL,Meeting of the 36th Eerste Kamer,2019-07-02,2022-07-12
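
To judge how well two terms line up in time, their shared date range can be computed directly. This is a small sketch, assuming the dates of two rows copied from the table above (HR "9. mandat" and NL "Meeting of the 29th Tweede Kamer"); the variable names are illustrative.

```python
import pandas as pd

# Date ranges copied from the table above for one Croatian and one Dutch term.
terms = pd.DataFrame({
    "country": ["HR", "NL"],
    "Term": ["9. mandat", "Meeting of the 29th Tweede Kamer"],
    "first_date": pd.to_datetime(["2016-10-14", "2017-10-31"]),
    "last_date": pd.to_datetime(["2020-05-13", "2021-12-21"]),
})

# The shared window starts at the later start date and ends at the earlier end date.
overlap_start = terms.first_date.max()
overlap_end = terms.last_date.min()
has_overlap = overlap_start <= overlap_end

print(overlap_start.date(), overlap_end.date(), has_overlap)
```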


In [4]:
calculate_sentiment("Speaker_name", country="HR", term="10. mandat")

index,Speaker_name,mean,count,mean_Coalition,mean_Opposition,count_Coalition,count_Opposition
50,"Hasanbegović, Zlatko",1.068624,29,1.068624,,29.0,
46,"Grmoja, Nikola",1.088536,901,,1.088536,,901.0
28,"Bulj, Miro",1.122184,1832,,1.122184,,1832.0
18,"Beljak, Krešo",1.160652,180,1.160652,,180.0,
89,"Mlinarić, Stipo",1.166643,184,1.166643,,184.0,
118,"Spajić, Daniel",1.18825,74,1.18825,,74.0,
100,"Peović, Katarina",1.204643,1063,1.204643,,1063.0,
96,"Orešković, Dalija",1.306683,1301,1.306683,,1301.0,
102,"Petrov, Božo",1.31062,195,,1.31062,,195.0
99,"Penava, Ivan",1.315106,30,1.315106,,30.0,
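
The `mean_Coalition` and `mean_Opposition` columns are empty where a speaker has no speeches under that status. A minimal sketch of how such a split can be produced, using `groupby` with `unstack` rather than the `pivot` call in `calculate_sentiment`; the toy speakers and scores are made up for illustration.

```python
import pandas as pd

# Toy data: speaker "A" only speaks in Coalition, speaker "B" only in Opposition.
toy = pd.DataFrame({
    "Speaker_name": ["A", "A", "B", "B"],
    "Party_status": ["Coalition", "Coalition", "Opposition", "Opposition"],
    "logits_pondered": [1.0, 3.0, 2.0, 4.0],
})

means = (
    toy.groupby(["Speaker_name", "Party_status"])["logits_pondered"]
    .mean()
    .unstack("Party_status")   # one column per party status, NaN where absent
    .add_prefix("mean_")       # mean_Coalition, mean_Opposition
    .reset_index()
)
print(means)
```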


In [5]:
calculate_sentiment("Speaker_name", country="NL", term="Meeting of the 36th Eerste Kamer")

index,Speaker_name,mean,count,mean_Coalition,mean_Opposition,count_Coalition,count_Opposition
3,"Cliteur, Paul",1.773396,103,,1.773396,,103.0
7,"Teunissen, Christine",1.840137,90,,1.840137,,90.0
4,"Frentrop, Paul",1.85746,226,,1.85746,,226.0
6,"Knapen, Ben",1.932604,70,1.932604,,70.0,
1,"Bikker, Mirjam",2.268604,200,2.268604,,200.0,
8,"van Huffelen, Alexandra",2.391884,25,2.391884,,25.0,
0,"Adriaansens, Micky",2.550818,58,2.550818,,58.0,
5,"Karabulut, Sadet",2.71989,35,,2.71989,,35.0
2,"Bruijn, Jan Anthonie",2.887518,9352,2.887518,,9352.0,


In [6]:
calculate_sentiment("Speaker_party", country="HR", term="10. mandat")


index,Speaker_party,mean,count,mean_Coalition,mean_Opposition,count_Coalition,count_Opposition
8,HSS,1.243623,1490,1.243623,,1490.0,
11,MOST,1.296922,4075,,1.296922,,4075.0
16,SIP,1.306683,1301,1.306683,,1301.0,
0,DP,1.559042,2126,1.559042,,2126.0,
17,ZK,1.767277,2190,1.767277,,2190.0,
12,Pametno,1.90515,517,1.90515,,517.0,
2,GLAS,2.03214,1426,2.03214,,1426.0,
14,SDP,2.035653,5700,2.035653,,5700.0,
1,Fokus,2.06573,133,2.06573,,133.0,
10,IDS,2.215527,549,2.215527,,549.0,


In [7]:
calculate_sentiment("Speaker_party", country="NL", term="Meeting of the 36th Eerste Kamer")

index,Speaker_party,mean,count,mean_Coalition,mean_Opposition,count_Coalition,count_Opposition
3,FvD,1.831142,329,,1.831142,,329.0
4,PvdD,1.840137,90,,1.840137,,90.0
0,CDA,1.932604,70,1.932604,,70.0,
1,CU,2.268604,200,2.268604,,200.0,
2,D66,2.391884,25,2.391884,,25.0,
5,SP,2.71989,35,,2.71989,,35.0
6,VVD,2.885443,9410,2.885443,,9410.0,
