1. [Working directory and packages](#chapter1)
2. [Data](#chapter2)
3. [Preprocessing](#chapter3)
4. [Wrapper function](#chapter4)
5. [Rooduijn & Pauwels (speeches dataset)](#chapter5)
   1. [Construct validity](#subparagraph1)
   2. [Face validity](#subparagraph2)
   3. [External validity](#subparagraph3)
       1. [CHES](#subparagraph4)
       2. [PopuList](#subparagraph5)
6. [Decadri & Boussalis (speeches dataset)](#chapter6)
   1. [Construct validity](#subparagraph6)
   2. [Face validity](#subparagraph7)
   3. [External validity](#subparagraph8)
       1. [CHES](#subparagraph9)
       2. [PopuList](#subparagraph10)
7. [Grundl (Manifesto project)](#chapter7)
8. [Decadri and Boussalis (Manifesto Project)](#chapter8)
9. [Decadri and Boussalis + Grundl](#chapter9)
10. [Keywords in context](#chapter10)

# Working directory and packages <a class="anchor" id="chapter1"></a>

Setting the working directory

In [1]:
setwd("C:/Users/jacop/Tesi/")

Loading the libraries

In [2]:
suppressWarnings(suppressPackageStartupMessages(library(dtplyr)))
suppressWarnings(suppressPackageStartupMessages(library(tidyverse)))
suppressWarnings(suppressPackageStartupMessages(library(data.table)))
suppressWarnings(suppressPackageStartupMessages(library(quanteda)))
suppressWarnings(suppressPackageStartupMessages(library(manifestoR)))

The 'tokens_group' function often returns an error when grouping the tokens by more than one variable. One way to fix this is to install an earlier version of quanteda. Let's check which version is currently installed.

In [3]:
sessionInfo()

R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] manifestoR_1.5.0  tm_0.7-8          NLP_0.2-1         quanteda_2.1.2   
 [5] data.table_1.14.2 forcats_0.5.1     stringr_1.4.0     dplyr_1.0.7      
 [9] purrr_0.3.4       readr_2.1.0       tidyr_1.1.4       tibble_3.1.6     
[13] ggplot2_3.3.5     tidyverse_1.3.1   dtplyr_1.1.0     

loaded via a namespace (and not attached):
 [1] httr_1.4.2         jsonlite_1.7.2     tmvnsim_1.0-2      modelr_0.1.8      
 [5] functional_0.6     RcppParallel_5.1.4 assertthat_0.2.1   cellranger_1.1.0  
 [9] yaml_2.2.

If it's the latest one, we'll need to uninstall it and replace it with an earlier version (2.1.2 in this case, but others may work as well).

In [4]:
# remove.packages('quanteda')
# devtools::install_version("quanteda", version = "2.1.2", repos = "http://cran.us.r-project.org")
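
The version check can also be done programmatically instead of reading the 'sessionInfo()' output. The following is a minimal sketch, not part of the original analysis: 'needs_downgrade()' is a hypothetical helper name, the target 2.1.2 is simply the version mentioned above, and "3.0.0" is only an illustrative newer version string.

```r
# Hypothetical helper (not part of quanteda): TRUE when the installed
# version is newer than the version we want to pin.
needs_downgrade <- function(installed, target = "2.1.2") {
  package_version(installed) > package_version(target)
}

needs_downgrade("3.0.0")   # a newer release would have to be replaced
needs_downgrade("2.1.2")   # the pinned release itself is fine

# Applied to the local installation, only if quanteda is available
if (requireNamespace("quanteda", quietly = TRUE)) {
  print(needs_downgrade(as.character(utils::packageVersion("quanteda"))))
}
```

Comparing 'package_version' objects avoids the pitfalls of comparing version strings alphabetically.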

# Data <a class="anchor" id="chapter2"></a>

## Speeches dataset

Loading the data and passing it through dtplyr's lazy data.table interface (the trailing 'as_tibble()' collects the result back into a tibble)

In [5]:
# load("data/parliamentary_groups2.rds")
texts <- readRDS("data/joined_texts.rds") %>% lazy_dt() %>% as_tibble()

Casting the "legislatura" variable to integer

In [6]:
texts <- texts %>% mutate(legislatura = as.integer(legislatura)) %>% as_tibble()

Filtering the dataset to keep only the last seven legislatures (from the 12th onwards)

In [7]:
texts <- texts %>% filter(legislatura >= 12) %>% as_tibble()

## Manifesto Project dataset

Setting the API key in our work environment

In [8]:
mp_setapikey("data/manifesto_apikey.txt")

Filtering the dataset by focusing only on the following parties: LN, M5S, PdL, FI, SC, CD, UDC, FDI-CDN, SEL, PD

In [9]:
party_codes <- c(32061, 32230, 32440, 32460, 32530, 32610, 32630, 32720, 32956, 32450)

ita_manifestoes <- mp_corpus(countryname == "Italy" & party %in% party_codes)

Connecting to Manifesto Project DB API... 
Connecting to Manifesto Project DB API... corpus version: 2021-1 
Connecting to Manifesto Project DB API... 
Connecting to Manifesto Project DB API... corpus version: 2021-1 
Connecting to Manifesto Project DB API... corpus version: 2021-1 
Connecting to Manifesto Project DB API... corpus version: 2021-1 


## External validity datasets

Let's load the two datasets we'll be using to test the dictionaries' external validity: the Chapel Hill Expert Survey and the PopuList dataset.

In [10]:
ches <- read_csv("data/1999-2019_CHES_dataset_means(v2).csv", show_col_types = FALSE)

populist <- readxl::read_xlsx("data/populist-version-2-20200626.xlsx")

## Stopwords

Decadri and Boussalis' additional stopwords

In [11]:
db_additional_stopwords  <- suppressMessages(read_csv("data/it_stopwords_new_list.csv")) %>% 
                            pull(stopwords)

Procedural stopwords

In [12]:
procedural_stopwords <- suppressMessages(read_csv("data/it_stopwords_procedural.csv")) %>% 
                        pull(it_stopwords_procedural)

## Dictionaries

Rooduijn and Pauwels' dictionary

In [13]:
anti_elitism <- c("elit*", "consens*", "antidemocratic*", "referend*", "corrot*", "propagand*", 
                  "politici*","ingann*", "tradi*", "vergogn*", "scandal*", "verita", "disonest*", 
                  "partitocrazia", "menzogn*", "mentir*")

rp_dictionary <- dictionary(list(anti_elitism = anti_elitism))

Decadri and Boussalis' dictionary

In [14]:
anti_elitism <- c("antidemocratic*", "casta", "consens*", "corrot*", "disonest*", "elit*", 
                  "establishment", "ingann*", "mentir*", "menzogn*", "partitocrazia", "propagand*", 
                  "scandal*", "tradim*", "tradir*", "tradit*", "vergogn*", "verita")

people_centrism  <- c("abitant*", "cittadin*", "consumator*", "contribuent*", "elettor*", "gente", "popol*")

db_dictionary <- dictionary(list(anti_elitism = anti_elitism, 
                                 people_centrism = people_centrism))
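
The entries ending in '*' are glob patterns: quanteda's dictionary lookup treats '*' as a wildcard by default, so 'elit*' matches every token that starts with "elit". The sketch below mimics this behaviour with base R's 'glob2rx()' rather than with quanteda itself; 'matches_glob()' and the sample tokens are illustrative assumptions. It also shows one practical difference between the two anti-elitism lists: 'tradi*' matches "tradizione" (tradition), whereas 'tradim*', 'tradir*' and 'tradit*' do not.

```r
# Hypothetical helper: TRUE for each token matched by at least one glob pattern.
# Matching is case-insensitive, as in quanteda's default lookup.
matches_glob <- function(tokens, patterns) {
  regexes <- utils::glob2rx(patterns)   # e.g. "elit*" becomes "^elit"
  hits <- vapply(tokens, function(tok) {
    any(vapply(regexes, function(rx) grepl(rx, tok, ignore.case = TRUE), logical(1)))
  }, logical(1))
  unname(hits)
}

# Made-up tokens chosen to show hits and misses
toks <- c("elite", "elitario", "delitto", "tradizione", "tradimento")

rp_hits <- matches_glob(toks, c("elit*", "tradi*"))
db_hits <- matches_glob(toks, c("elit*", "tradim*", "tradir*", "tradit*"))

data.frame(token = toks, rooduijn_pauwels = rp_hits, decadri_boussalis = db_hits)
```

Note that "delitto" is not matched by 'elit*' because the wildcard is only at the end of the pattern.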

The full (integral) translation of Grundl's dictionary

In [15]:
grundl <- readxl::read_xlsx("data/gruendl_terms_Fedra_Silvia_comments3.xlsx", sheet = 1) %>% 
filter(!is.na(Italian_integral)) %>% # Removing nulls
mutate(Italian_integral = str_split(Italian_integral, ', ')) %>% # Some cells contain more than one value: let's split and unnest everything
unnest(cols = c(Italian_integral)) %>% 
distinct(Italian_integral) %>% # Removing duplicate terms
pull(Italian_integral) # Extracting the 'terms' vector

g_dictionary <- dictionary(list(populism = grundl))
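
The split, unnest and deduplicate steps above can be sketched in base R on a toy column. The cell values below are made up for illustration and are not taken from the spreadsheet.

```r
# Hypothetical spreadsheet column: some cells hold several comma-separated
# terms, some are empty, and some terms appear more than once
cells <- c("popolo, gente", NA, "casta", "gente, casta")

terms <- cells[!is.na(cells)]                          # drop empty cells
terms <- unlist(strsplit(terms, ", ", fixed = TRUE))   # one term per element
terms <- unique(terms)                                 # remove duplicates

terms
```

The resulting character vector has the same shape as the one passed to 'dictionary()' above.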

Combining Decadri and Boussalis' dictionary with a translation of Grundl's dictionary adapted to the Italian context

In [16]:
dbg <- readxl::read_xlsx("data/gruendl_terms_Fedra_Silvia_comments3.xlsx", sheet = 1) %>% 
filter(!is.na(Decadri_Boussalis_Grundl)) %>% 
mutate(Decadri_Boussalis_Grundl = str_split(Decadri_Boussalis_Grundl, ', ')) %>% 
unnest(cols = c(Decadri_Boussalis_Grundl)) %>% 
distinct(Decadri_Boussalis_Grundl) %>% 
pull(Decadri_Boussalis_Grundl)

dbg_dictionary <- dictionary(list(populism = dbg))

# Preprocessing <a class="anchor" id="chapter3"></a>

## Speeches dataset

Creating the corpus

In [17]:
speeches_corpus <- corpus(texts, text_field = "textclean")

Tokenizing the corpus, removing stopwords and grouping the tokens by the 'year' and 'gruppoP' variables

In [18]:
speeches_toks <- speeches_corpus %>% 
                 tokens(., remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE, remove_separators = TRUE)  %>% 
                 tokens_remove(., pattern = stopwords("it"), padding = TRUE) %>% 
                 tokens_remove(., pattern = db_additional_stopwords) %>% 
                 tokens_remove(., pattern = procedural_stopwords) %>% 
                 quanteda:::tokens_group(x = ., groups = c('year', 'gruppoP'))
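
Grouping by two variables amounts to pooling all tokens that share the same (year, gruppoP) pair. The following base R sketch illustrates the idea; it is not quanteda's implementation, and the toy documents and tokens are made up. 'interaction()' builds a combined key of the form "year.group", the same form as document ids such as 1994.AN in the results further down.

```r
# Toy document variables and tokens (illustrative values only)
docvars_toy <- data.frame(year = c(1994, 1994, 1995),
                          gruppoP = c("AN", "AN", "LEGA"))
tokens_toy <- list(c("onorevoli", "colleghi"),
                   c("governo"),
                   c("nord", "tasse", "roma"))

# One key per document, combining both grouping variables
group_key <- interaction(docvars_toy$year, docvars_toy$gruppoP, drop = TRUE)

# Pool the tokens of all documents that share a key
grouped <- lapply(split(tokens_toy, group_key), unlist, use.names = FALSE)

names(grouped)     # one pooled document per year-group pair
lengths(grouped)   # number of tokens in each pooled document
```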

## Manifesto project dataset

Creating the corpus, tokenizing it, removing stopwords and grouping the tokens by the 'party' variable

In [19]:
manifesto_corpus <- corpus(ita_manifestoes)

manifesto_toks <- manifesto_corpus %>% 
                  tokens(., remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE, remove_separators = TRUE)  %>% 
                  tokens_remove(., pattern = stopwords("it"), padding = TRUE) %>% 
                  tokens_remove(., pattern = db_additional_stopwords) %>% 
                  tokens_remove(., pattern = procedural_stopwords) %>% 
                  quanteda:::tokens_group(x = ., groups = 'party')

# Wrapper function <a class="anchor" id="chapter4"></a>

In [20]:
dict_analysis <- function(tokens, data, dictionary) {
        
  # Applying Rooduijn and Pauwels' dictionary to the speeches dataset
  
  if (data == "speeches" & dictionary == "Rooduijn_Pauwels") {
    
  my_dfm <- tokens_lookup(x = tokens, dictionary = rp_dictionary) %>% 
            dfm(.)  %>% 
            convert(., to = "data.frame") %>% 
            mutate(year = docvars(tokens)$year,
                   party = docvars(tokens)$gruppoP,
                   total_toks = ntoken(tokens),
                   perc_of_populist_toks = anti_elitism / total_toks,
                   standardized_perc_of_populist_toks = as.double(scale(perc_of_populist_toks))) %>% 
            relocate(doc_id, year, party, anti_elitism, total_toks, perc_of_populist_toks, 
                     standardized_perc_of_populist_toks) %>% 
            as_tibble()

  }
    
  # Applying Decadri and Boussalis' dictionary to the speeches dataset
  
  if (data == 'speeches' & dictionary == "Decadri_Boussalis") {
    
    my_dfm <- tokens_lookup(x = tokens, dictionary = db_dictionary) %>% 
              dfm(.) %>% 
              convert(., to = "data.frame") %>% 
              mutate(year = docvars(tokens)$year,
                     party = docvars(tokens)$gruppoP,
                     populist_toks = anti_elitism + people_centrism,
                     total_toks = ntoken(tokens),
                     perc_of_populist_toks = populist_toks / total_toks,
                     standardized_perc_of_populist_toks = as.double(scale(perc_of_populist_toks))) %>% 
              relocate(doc_id, year, party, anti_elitism, people_centrism, populist_toks,
                       total_toks, perc_of_populist_toks, standardized_perc_of_populist_toks) %>% 
              as_tibble()
    
  }
    
  # Applying Grundl's dictionary to the Manifesto Project dataset
    
  if (data == "manifesto" & dictionary == "Grundl") {
      
      my_dfm <- tokens_lookup(x = tokens, dictionary = g_dictionary) %>% 
                dfm(.)  %>% 
                convert(., to = "data.frame")  %>% 
                rename(party = doc_id) %>% 
                mutate(party = case_when(
                                           party == '32630' ~ 'FDI-CDN', 
                                           party == '32610' ~ 'FI',
                                           party == '32720' ~ 'LN',
                                           party == '32956' ~ 'M5S',
                                           party == '32061' ~ 'PdL',
                                           party == '32460' ~ 'SC',
                                           party == '32450' ~ 'CD',
                                           party == '32530' ~ 'UDC',
                                           party == '32230' ~ 'SEL',
                                           party == '32440' ~ 'PD'),
                       total_toks = ntoken(tokens),
                       perc_of_populist_toks = populism / total_toks,
                       standardized_perc_of_populist_toks = as.double(scale(perc_of_populist_toks))) %>% 
               arrange(desc(perc_of_populist_toks)) %>% 
               as_tibble()
  }
    
  # Applying Decadri and Boussalis' dictionary to the Manifesto Project dataset
    
  if (data == "manifesto" & dictionary == "Decadri_Boussalis") {
      
      my_dfm <- tokens_lookup(x = tokens, dictionary = db_dictionary) %>% 
                dfm(.) %>% 
                convert(., to = "data.frame") %>% 
                rename(party = doc_id) %>% 
                mutate(party = case_when(
                                           party == '32630' ~ 'FDI-CDN', 
                                           party == '32610' ~ 'FI',
                                           party == '32720' ~ 'LN',
                                           party == '32956' ~ 'M5S',
                                           party == '32061' ~ 'PdL',
                                           party == '32460' ~ 'SC',
                                           party == '32450' ~ 'CD',
                                           party == '32530' ~ 'UDC',
                                           party == '32230' ~ 'SEL',
                                           party == '32440' ~ 'PD'),
                       total_toks = ntoken(tokens),
                       populist_toks = anti_elitism + people_centrism,
                       perc_of_populist_toks = populist_toks / total_toks,
                       standardized_perc_of_populist_toks = as.double(scale(perc_of_populist_toks))) %>% 
                arrange(desc(perc_of_populist_toks)) %>% 
                as_tibble()
  }
    
    
  # Applying the combined Decadri and Boussalis + Grundl dictionary to the Manifesto Project dataset
    
  if (data == "manifesto" & dictionary == "Decadri_Boussalis_Grundl") {
      
      my_dfm <- tokens_lookup(x = tokens, dictionary = dbg_dictionary) %>% 
                dfm(.) %>% 
                convert(., to = "data.frame") %>% 
                rename(party = doc_id) %>% 
                mutate(party = case_when(
                                           party == '32630' ~ 'FDI-CDN', 
                                           party == '32610' ~ 'FI',
                                           party == '32720' ~ 'LN',
                                           party == '32956' ~ 'M5S',
                                           party == '32061' ~ 'PdL',
                                           party == '32460' ~ 'SC',
                                           party == '32450' ~ 'CD',
                                           party == '32530' ~ 'UDC',
                                           party == '32230' ~ 'SEL',
                                           party == '32440' ~ 'PD'),
                       total_toks = ntoken(tokens),
                       perc_of_populist_toks = populism / total_toks,
                       standardized_perc_of_populist_toks = as.double(scale(perc_of_populist_toks))) %>% 
                arrange(desc(perc_of_populist_toks)) %>% 
                as_tibble()
  }  
  
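  # Fail with a clear message when no data/dictionary combination matched
  if (!exists("my_dfm", inherits = FALSE)) stop("Unsupported combination of 'data' and 'dictionary'")
  
  return(my_dfm)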
    
  
}
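
The three manifesto branches repeat the same 'case_when()' recoding. One possible way to state the mapping only once is a named character vector used as a lookup table. This is a sketch of an alternative, not the code that produced the results in this notebook: the codes and labels are copied from the function above, while 'party_labels' and 'recode_party()' are hypothetical names.

```r
# Manifesto Project party codes and their labels, as used in the wrapper
party_labels <- c("32630" = "FDI-CDN", "32610" = "FI",  "32720" = "LN",
                  "32956" = "M5S",     "32061" = "PdL", "32460" = "SC",
                  "32450" = "CD",      "32530" = "UDC", "32230" = "SEL",
                  "32440" = "PD")

# Hypothetical helper: codes without an entry become NA, which is also what
# case_when() returns when no condition matches
recode_party <- function(codes) {
  unname(party_labels[as.character(codes)])
}

recode_party(c(32720, 32956, 99999))
```

Inside the wrapper, such a helper could replace each 'case_when()' call with 'mutate(party = recode_party(party))'.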


# Rooduijn & Pauwels (speeches dataset) <a class="anchor" id="chapter5"></a>

Let's run the dictionary analysis using Rooduijn and Pauwels' dictionary

In [21]:
df_rp <- dict_analysis(tokens = speeches_toks, data = "speeches", dictionary = "Rooduijn_Pauwels")

The first rows of the dataframe

In [22]:
head(df_rp)

doc_id,year,party,anti_elitism,total_toks,perc_of_populist_toks,standardized_perc_of_populist_toks
<chr>,<int>,<chr>,<dbl>,<int>,<dbl>,<dbl>
1994.AN,1994,AN,245,206560,0.001186096,0.45833531
1995.AN,1995,AN,508,350039,0.0014512669,1.06035033
1996.AN,1996,AN,229,272610,0.0008400279,-0.3273403
1997.AN,1997,AN,414,381823,0.001084272,0.22716511
1998.AN,1998,AN,740,746852,0.0009908255,0.01501423
1999.AN,1999,AN,472,399795,0.0011806051,0.44586916
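
The last two columns follow directly from the others: 'perc_of_populist_toks' is the number of dictionary hits divided by the total number of tokens (a proportion, despite the name), and the standardized column is its z-score. The following is a minimal sketch using the first three rows above. Because 'scale()' here sees only three rows rather than the full dataframe, the z-scores it returns differ from the ones printed above.

```r
# Values copied from the first three rows of the output above
anti_elitism <- c(245, 508, 229)
total_toks   <- c(206560, 350039, 272610)

# Share of dictionary hits over all tokens
share <- anti_elitism / total_toks

# z-score via scale(), and the same thing written out by hand
z_scale  <- as.double(scale(share))
z_manual <- (share - mean(share)) / sd(share)

round(share, 9)
all.equal(z_scale, z_manual)
```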


## Construct validity <a class="anchor" id="subparagraph1"></a>

Rooduijn and Pauwels' dictionary captures the "anti-elitism" component of populism, but not the "people-centrism" one. As a result, from a construct validity standpoint, it is only partially valid. The authors motivated the decision to leave out the "people-centrism" dimension by pointing out that the "people" is often referred to with words such as "us", "we" and "our", which are also used to refer to entities other than the people (such as political parties). Including these words in the dictionary, they argue, would produce a large number of false positives.

## Face validity <a class="anchor" id="subparagraph2"></a>

A populism dictionary has face validity if the parties it scores as most populist are the ones generally regarded as populist. In the Italian case, we would expect populism scores to be higher for parties that the literature deems populist (i.e. Five Star Movement, Lega Nord, Forza Italia and Il Popolo della Libertà).

The following are the 20 party-year combinations with the highest populism scores in the 1994-2021 period. Consistent with our expectations, we find populist parties such as FDI (2013, 2014, 2017), FI-PDL (2019) and LEGA (1995). However, we also find mainstream parties such as SI-SEL-POS-LU (2016, 2018), IV (2018) and PD (2018, 2019). These results could be interpreted as evidence either of populist contagion of mainstream parties or of a lack of face validity. The absence of M5S among the most populist parties makes me lean towards the latter.

In [23]:
df_rp %>% 
arrange(desc(standardized_perc_of_populist_toks)) %>% 
head(20)

doc_id,year,party,anti_elitism,total_toks,perc_of_populist_toks,standardized_perc_of_populist_toks
<chr>,<int>,<chr>,<dbl>,<int>,<dbl>,<dbl>
1996.FLD,1996,FLD,11,2894,0.003800968,6.394856
1996.PPI,1996,PPI,34,9296,0.003657487,6.069113
2018.SI-SEL-POS-LU,2018,SI-SEL-POS-LU,7,2189,0.003197807,5.025506
1995.FLD,1995,FLD,120,57421,0.002089828,2.51007
2000.DEM-U,2000,DEM-U,108,52222,0.002068094,2.460727
2016.SI-SEL-POS-LU,2016,SI-SEL-POS-LU,346,168754,0.002050322,2.42038
2014.FDI,2014,FDI,99,51695,0.001915079,2.113339
2019.FI-PDL,2019,FI-PDL,949,533560,0.001778619,1.803535
1995.DEMO,1995,DEMO,85,49325,0.001723264,1.677863
1996.PROGR-F,1996,PROGR-F,46,26734,0.001720655,1.671941


The following are the party-year combinations with the lowest populism scores. Consistent with our expectations, all parties included in this subset are mainstream. This might be interpreted as evidence of face validity for Rooduijn and Pauwels' dictionary.

In [24]:
df_rp %>% 
arrange(desc(standardized_perc_of_populist_toks)) %>% 
tail(20) %>% 
arrange(standardized_perc_of_populist_toks)

doc_id,year,party,anti_elitism,total_toks,perc_of_populist_toks,standardized_perc_of_populist_toks
<chr>,<int>,<chr>,<dbl>,<int>,<dbl>,<dbl>
2018.AP-CPE-NCD-NCI,2018,AP-CPE-NCD-NCI,0,269,0.0,-2.234448
2018.CI,2018,CI,0,550,0.0,-2.234448
2008.COM/IT/,2008,COM/IT/,0,770,0.0,-2.234448
2008.DCA-NPSI,2008,DCA-NPSI,0,269,0.0,-2.234448
2009.DCA-NPSI,2009,DCA-NPSI,0,26,0.0,-2.234448
2013.FLPTP,2013,FLPTP,0,2,0.0,-2.234448
1994.LIFED,1994,LIFED,0,217,0.0,-2.234448
1995.LIFED,1995,LIFED,0,1870,0.0,-2.234448
1996.LIFED,1996,LIFED,0,979,0.0,-2.234448
2018.MDP-LU,2018,MDP-LU,0,1252,0.0,-2.234448


## External validity <a class="anchor" id="subparagraph3"></a>

### Chapel Hill Expert Survey <a class="anchor" id="subparagraph4"></a>

As Rooduijn and Pauwels' dictionary only captures the anti-elite dimension of populism, we will validate it against the anti-elite salience variable from the CHES dataset, which was introduced in 2014.

The country code for Italy is 8. The following is a list of all Italian parties in the CHES dataset in the 2014-2019 period.

In [25]:
ches %>% filter(country == 8 & year >= 2014 & year <= 2019) %>% distinct(party)

party
<chr>
UDC
SC
VdA
PD
FI
LN
FdI
SEL
M5S
CD


These are the parties included in our dataset in the same time frame

In [26]:
df_rp %>% filter(year >= 2014 & year <= 2019) %>% distinct(party)

party
<chr>
AP-CPE-NCD-NCI
CI
DES-CD
FDI
FI-PDL
IV
LEGA
LEU
M5S
MDP-LU


Let's now compare how R&P's dictionary and the CHES dataset rank party-year combinations by populism in 2014 and 2019. We'll only keep parties that are present in both datasets.

The difference between the two rankings is stark. PD (2019) ranks among the most populist party-year combinations according to the dictionary analysis, while the opposite is true in the CHES dataset. Moreover, Lega (2019) and M5S (2019), two of the most populist party-year combinations according to CHES, are only slightly populist according to R&P's dictionary.

In [27]:
df_rp %>% 
filter((year == 2014 | year == 2019) & party != "MISTO" & party != "IV") %>% 
arrange(desc(standardized_perc_of_populist_toks))

doc_id,year,party,anti_elitism,total_toks,perc_of_populist_toks,standardized_perc_of_populist_toks
<chr>,<int>,<chr>,<dbl>,<int>,<dbl>,<dbl>
2014.FDI,2014,FDI,99,51695,0.0019150788,2.113338668
2019.FI-PDL,2019,FI-PDL,949,533560,0.0017786191,1.803535286
2019.PD-ULIVO,2019,PD-ULIVO,888,557509,0.0015927994,1.38167044
2014.MDP-LU,2014,MDP-LU,148,101300,0.0014610069,1.082463081
2019.FDI,2019,FDI,368,262423,0.0014023161,0.949217856
2014.AP-CPE-NCD-NCI,2014,AP-CPE-NCD-NCI,136,101160,0.0013444049,0.817742543
2019.LEU,2019,LEU,71,53261,0.001333058,0.791981675
2014.LEGA,2014,LEGA,164,148505,0.0011043399,0.27272507
2014.M5S,2014,M5S,561,552684,0.0010150466,0.070003146
2019.LEGA,2019,LEGA,216,214370,0.0010076037,0.053105601


In [28]:
to_drop <- c('VdA', 'SVP', 'RI')

ches %>% 
filter(country == 8 & year >= 2014 & year <= 2019 & (!party %in% to_drop))  %>% 
group_by(party, year) %>% 
summarize(mean_anti_elite_salience = mean(antielite_salience), .groups = "keep") %>% 
arrange(desc(mean_anti_elite_salience))

party,year,mean_anti_elite_salience
<chr>,<dbl>,<dbl>
M5S,2014,10.0
RC,2014,9.333333
M5S,2019,8.888889
LN,2014,8.8
LN,2019,8.333333
FdI,2019,8.0
SEL,2014,6.8
FdI,2014,6.25
PD,2014,4.4
FI,2019,4.176471


### The PopuList <a class="anchor" id="subparagraph5"></a>

All the Italian parties in the PopuList dataset

In [29]:
populist %>% filter(country_name == "Italy") %>% distinct(party_name)

party_name
<chr>
Fiamma Tricolore
Forza Italia – Il Popolo della Libertà
Fratelli d'Italia – Centrodestra Nazionale
Il Popolo della Libertà
Lega (Nord)
Lega d'Azione Meridionale
Liga Veneta
Movimento 5 Stelle
Movimento Sociale Italiano
Partito dei Comunisti Italiani


Let's compare the populism scores from PopuList and R&P's dictionary by focusing on parties that are present in both datasets.

According to the dictionary analysis, FI-PDL, FDI, Lega and M5S have higher populism scores than most parties. These parties are all coded as populist in the PopuList dataset. However, the dictionary analysis also assigns high populism scores to left-wing parties such as SI-SEL-POS-LU (2016-2018) and RC (1995, 1999), which are labeled as non-populist in the PopuList dataset. The two measures are thus only partially consistent.

In [30]:
to_keep <- c("F-ITA", "FI", "PDL", "FI-PDL", "FDI-AN", "FDI", "LEGA-N", "LEGA-NORD-P", "LNA", "LEGA", "LNP", "M5S", 
             "RC-PROGR", "COMUNISTA", "RC", "COM/IT/", "RC-SE", "SI-SEL-POS-LU")

df_rp %>% 
filter(party %in% to_keep) %>% 
arrange(desc(perc_of_populist_toks)) %>% 
head(20)

doc_id,year,party,anti_elitism,total_toks,perc_of_populist_toks,standardized_perc_of_populist_toks
<chr>,<int>,<chr>,<dbl>,<int>,<dbl>,<dbl>
2018.SI-SEL-POS-LU,2018,SI-SEL-POS-LU,7,2189,0.003197807,5.0255059
2016.SI-SEL-POS-LU,2016,SI-SEL-POS-LU,346,168754,0.002050322,2.4203796
2014.FDI,2014,FDI,99,51695,0.001915079,2.1133387
2019.FI-PDL,2019,FI-PDL,949,533560,0.001778619,1.8035353
2013.FDI,2013,FDI,140,82996,0.001686828,1.5951434
2017.FDI,2017,FDI,58,38123,0.001521391,1.2195532
1995.LEGA,1995,LEGA,238,156792,0.001517935,1.2117055
1995.RC,1995,RC,445,303724,0.001465146,1.09186
2018.FDI,2018,FDI,194,135424,0.001432538,1.0178299
2000.LEGA,2000,LEGA,420,293739,0.001429841,1.0117069


In [31]:
to_drop <- c("Fiamma Tricolore", "Lega d'Azione Meridionale", "Movimento Sociale Italiano")

populist %>% 
filter(country_name == "Italy" & (!party_name %in% to_drop)) %>% 
select(party_name, populist) %>% 
arrange(desc(populist))

party_name,populist
<chr>,<dbl>
Forza Italia – Il Popolo della Libertà,1
Fratelli d'Italia – Centrodestra Nazionale,1
Il Popolo della Libertà,1
Lega (Nord),1
Liga Veneta,1
Movimento 5 Stelle,1
Partito dei Comunisti Italiani,0
Partito della Rifondazione Comunista,0
Rivoluzione Civile,0
Sinistra,0


# Decadri & Boussalis (speeches dataset) <a class="anchor" id="chapter6"></a>

Let's run the dictionary analysis with Decadri and Boussalis' dictionary

In [32]:
df_db <- dict_analysis(tokens = speeches_toks, data = "speeches", dictionary = "Decadri_Boussalis")

The first rows of the dataframe

In [33]:
head(df_db)

doc_id,year,party,anti_elitism,people_centrism,populist_toks,total_toks,perc_of_populist_toks,standardized_perc_of_populist_toks
<chr>,<int>,<chr>,<dbl>,<dbl>,<dbl>,<int>,<dbl>,<dbl>
1994.AN,1994,AN,168,475,643,206560,0.003112897,-0.1599508
1995.AN,1995,AN,285,1024,1309,350039,0.003739583,0.2260893
1996.AN,1996,AN,150,476,626,272610,0.002296321,-0.6629634
1997.AN,1997,AN,279,660,939,381823,0.002459255,-0.5625958
1998.AN,1998,AN,446,1490,1936,746852,0.002592214,-0.4806927
1999.AN,1999,AN,311,824,1135,399795,0.002838955,-0.3286996


## Construct validity <a class="anchor" id="subparagraph6"></a>

Decadri and Boussalis' dictionary captures both the "anti-elitism" and the "people-centrism" dimensions of populist ideology, and it thus constitutes an improvement over Rooduijn and Pauwels' dictionary in terms of construct validity.

## Face validity <a class="anchor" id="subparagraph7"></a>

To assess the face validity of Decadri and Boussalis' dictionary, we'll look at the share of populist tokens (both anti-establishment and people-centrism) for each party-year combination.

As was the case for R&P's dictionary, both mainstream (UDEUR, FLPTP, PPI, DEMO) and populist (Lega, M5S, FDI-AN) party-year combinations received high populism scores.

In [34]:
df_db %>% 
arrange(desc(standardized_perc_of_populist_toks)) %>% 
head(20)

doc_id,year,party,anti_elitism,people_centrism,populist_toks,total_toks,perc_of_populist_toks,standardized_perc_of_populist_toks
<chr>,<int>,<chr>,<dbl>,<dbl>,<dbl>,<int>,<dbl>,<dbl>
2009.SOCRAD-RNP,2009,SOCRAD-RNP,0,10,10,582,0.017182131,8.506726
2008.UDEUR,2008,UDEUR,2,33,35,2338,0.01497006,7.144086
2008.DCA-NPSI,2008,DCA-NPSI,0,3,3,269,0.011152416,4.79241
1996.PPI,1996,PPI,22,72,94,9296,0.010111876,4.151435
1996.FLD,1996,FLD,8,13,21,2894,0.007256393,2.392451
2008.FLPTP,2008,FLPTP,9,142,151,23058,0.006548703,1.956513
2008.SDPSE,2008,SDPSE,4,34,38,6108,0.006221349,1.754862
2008.LEGA,2008,LEGA,41,498,539,94856,0.005682297,1.422805
2015.LEGA,2015,LEGA,80,617,697,122917,0.005670493,1.415534
2014.FDI,2014,FDI,82,209,291,51695,0.005629171,1.390079


Similarly, when we look at the party-year combinations with the lowest populism scores, we find both mainstream and populist parties. This seems to suggest that D&B's dictionary lacks face validity.

In [35]:
df_db %>% 
arrange(desc(standardized_perc_of_populist_toks)) %>% 
tail(20) %>% 
arrange(standardized_perc_of_populist_toks)

doc_id,year,party,anti_elitism,people_centrism,populist_toks,total_toks,perc_of_populist_toks,standardized_perc_of_populist_toks
<chr>,<int>,<chr>,<dbl>,<dbl>,<dbl>,<int>,<dbl>,<dbl>
2018.CI,2018,CI,0,0,0,550,0.0,-2.0775017
2008.COM/IT/,2008,COM/IT/,0,0,0,770,0.0,-2.0775017
2009.DCA-NPSI,2009,DCA-NPSI,0,0,0,26,0.0,-2.0775017
2013.FLPTP,2013,FLPTP,0,0,0,2,0.0,-2.0775017
1994.LIFED,1994,LIFED,0,0,0,217,0.0,-2.0775017
1996.LIFED,1996,LIFED,0,0,0,979,0.0,-2.0775017
2009.SDPSE,2009,SDPSE,0,0,0,80,0.0,-2.0775017
2016.DES-CD,2016,DES-CD,1,3,4,5886,0.0006795787,-1.6588799
1996.UDEUR,1996,UDEUR,10,13,23,29778,0.0007723823,-1.6017127
2001.UDR,2001,UDR,2,2,4,4213,0.0009494422,-1.4926434


## External validity <a class="anchor" id="subparagraph8"></a>

### Chapel Hill Expert Survey <a class="anchor" id="subparagraph9"></a>

As Decadri and Boussalis' dictionary captures both dimensions of populism, we will validate it against a combination of two variables from the CHES dataset, i.e. "antielite_salience" and "people_vs_elite". We'll use the former as a proxy for the anti-establishment component and the latter as a proxy for the people-centrist one. The "people_vs_elite" variable was introduced in the 2019 edition of the dataset, so we'll only work with observations from that year.

The following are the Italian parties in the CHES dataset for the year 2019

In [36]:
ches %>% filter(country == 8 & year == 2019) %>% select(party, antielite_salience, people_vs_elite)

party,antielite_salience,people_vs_elite
<chr>,<dbl>,<dbl>
RI,2.2,3.357143
M5S,8.888889,9.529411
SI,3.785714,2.666667
FdI,8.0,6.625
PD,1.882353,2.0625
LN,8.333333,6.9375
SVP,2.166667,1.4
FI,4.176471,4.066667


The parties in our dataset in the same year

In [37]:
df_db %>% filter(year == 2019) %>% distinct(party)

party
<chr>
FDI
FI-PDL
IV
LEGA
LEU
M5S
MISTO
PD-ULIVO


Let's compute an overall populism value for each party in the CHES dataset by summing the "people_vs_elite" and "antielite_salience" variables and then taking the mean within each party. "Radicali Italiani" and "Südtiroler Volkspartei" are not in our dataset, so we'll drop them from CHES.

In [38]:
to_drop <- c("RI", "SVP")

ches %>% 
filter(country == 8 & year == 2019 & (!party %in% to_drop)) %>% 
group_by(party) %>% 
summarize(mean_populism = mean(people_vs_elite + antielite_salience)) %>% 
arrange(desc(mean_populism))

party,mean_populism
<chr>,<dbl>
M5S,18.418301
LN,15.270833
FdI,14.625
FI,8.243137
SI,6.452381
PD,3.944853


The two rankings are rather different. According to CHES, M5S and Lega rank as the two most populist parties, whereas in the dictionary analysis they turn out to be the two least populist of the six parties compared.

In [39]:
to_drop <- c("IV", "MISTO")

df_db %>% 
filter(year == 2019 & (! party %in% to_drop)) %>% 
arrange(desc(perc_of_populist_toks))

doc_id,year,party,anti_elitism,people_centrism,populist_toks,total_toks,perc_of_populist_toks,standardized_perc_of_populist_toks
<chr>,<int>,<chr>,<dbl>,<dbl>,<dbl>,<int>,<dbl>,<dbl>
2019.FI-PDL,2019,FI-PDL,383,2108,2491,533560,0.004668641,0.79839064
2019.FDI,2019,FDI,226,823,1049,262423,0.003997363,0.38488214
2019.PD-ULIVO,2019,PD-ULIVO,512,1638,2150,557509,0.00385644,0.29807325
2019.LEU,2019,LEU,38,160,198,53261,0.003717542,0.21251175
2019.LEGA,2019,LEGA,144,603,747,214370,0.003484629,0.06903715
2019.M5S,2019,M5S,204,1547,1751,509145,0.003439099,0.04099027
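
One way to put a number on the disagreement between the two rankings is a Spearman rank correlation. The sketch below copies the six values from the two tables above and pairs the parties by hand. The pairings (LN with LEGA, FdI with FDI, FI with FI-PDL, PD with PD-ULIVO and especially SI with LEU) are assumptions made for this illustration, not something the datasets encode, and six observations are far too few for anything more than a descriptive summary.

```r
# CHES 2019 scores and dictionary shares, copied from the outputs above.
# The party pairing across the two sources is an assumption.
comparison <- data.frame(
  party              = c("M5S", "LN", "FdI", "FI", "SI", "PD"),
  ches_mean_populism = c(18.418301, 15.270833, 14.625, 8.243137, 6.452381, 3.944853),
  dict_share         = c(0.003439099, 0.003484629, 0.003997363,
                         0.004668641, 0.003717542, 0.00385644)
)

# Rank correlation between the expert scores and the dictionary shares
rho <- cor(comparison$ches_mean_populism, comparison$dict_share, method = "spearman")
rho   # negative for these six values: the two rankings tend to run in opposite directions
```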


### The PopuList <a class="anchor" id="subparagraph10"></a>

Let's now compare D&B's dictionary with the PopuList dataset.

Lega, FdI, FI/PdL and M5S rank among the most populist parties according to D&B's dictionary. These parties have all been coded as populist by PopuList. The two measures can thus be considered broadly consistent.

In [40]:
populist %>% 
filter(country_name == "Italy") %>%
select(party_name, populist) %>% 
arrange(desc(populist))

party_name,populist
<chr>,<dbl>
Forza Italia – Il Popolo della Libertà,1
Fratelli d'Italia – Centrodestra Nazionale,1
Il Popolo della Libertà,1
Lega (Nord),1
Lega d'Azione Meridionale,1
Liga Veneta,1
Movimento 5 Stelle,1
Fiamma Tricolore,0
Movimento Sociale Italiano,0
Partito dei Comunisti Italiani,0


In [41]:
to_keep <- c("F-ITA", "FI", "PDL", "FI-PDL", "FDI-AN", "FDI", "LEGA-N", "LEGA-NORD-P", "LNA", "LEGA", "LNP", "M5S", 
             "RC-PROGR", "COMUNISTA", "RC", "COM/IT/", "RC-SE", "SI-SEL-POS-LU")

df_db %>% 
filter(party %in% to_keep) %>% 
arrange(desc(perc_of_populist_toks)) %>% 
head(20)

doc_id,year,party,anti_elitism,people_centrism,populist_toks,total_toks,perc_of_populist_toks,standardized_perc_of_populist_toks
<chr>,<int>,<chr>,<dbl>,<dbl>,<dbl>,<int>,<dbl>,<dbl>
2008.LEGA,2008,LEGA,41,498,539,94856,0.005682297,1.4228052
2015.LEGA,2015,LEGA,80,617,697,122917,0.005670493,1.4155337
2014.FDI,2014,FDI,82,209,291,51695,0.005629171,1.3900793
2017.LEGA,2017,LEGA,72,483,555,103023,0.005387147,1.2409917
2015.M5S,2015,M5S,449,2332,2781,527990,0.005267145,1.1670706
2015.FI-PDL,2015,FI-PDL,177,1181,1358,263332,0.005156988,1.0992136
2000.LEGA,2000,LEGA,279,1158,1437,293739,0.004892098,0.9360408
2014.LEGA,2014,LEGA,122,603,725,148505,0.004881991,0.9298145
2006.LEGA,2006,LEGA,50,322,372,77125,0.004823339,0.8936849
2006.COM/IT/,2006,COM/IT/,5,48,53,11028,0.004805948,0.8829725


# Grundl (Manifesto project) <a class="anchor" id="chapter7"></a>

Running the dictionary analysis on the Manifesto Project dataset with Grundl's dictionary

In [42]:
dict_analysis(tokens = manifesto_toks, data = "manifesto", dictionary = "Grundl")

party,populism,total_toks,perc_of_populist_toks,standardized_perc_of_populist_toks
<chr>,<dbl>,<int>,<dbl>,<dbl>
FDI-CDN,46,13480,0.003412463,1.41739125
SC,27,8385,0.003220036,0.93933809
FI,94,29483,0.003188278,0.86044115
LN,305,97142,0.003139734,0.73984066
UDC,25,8738,0.002861067,0.04753895
PD,86,31632,0.002718766,-0.3059837
CD,42,15853,0.002649341,-0.47845852
M5S,454,172008,0.002639412,-0.50312472
SEL,41,16357,0.002506572,-0.83314361
PdL,14,6719,0.002083643,-1.88383957
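
The two derived columns can be reproduced from the counts in the table. The sketch below assumes that `perc_of_populist_toks` is the share of populist tokens and that the standardized column is a z-score based on the sample standard deviation; this appears consistent with the values printed above, but the actual definition lives in the `dict_analysis` wrapper.

```r
# Counts copied from the output above
grundl <- data.frame(
  party      = c("FDI-CDN", "SC", "FI", "LN", "UDC",
                 "PD", "CD", "M5S", "SEL", "PdL"),
  populism   = c(46, 27, 94, 305, 25, 86, 42, 454, 41, 14),
  total_toks = c(13480, 8385, 29483, 97142, 8738,
                 31632, 15853, 172008, 16357, 6719)
)

# Share of populist tokens per manifesto
grundl$perc <- grundl$populism / grundl$total_toks

# z-score: centre on the mean, divide by the sample standard deviation
# (assumed definition of the standardized column)
grundl$z <- (grundl$perc - mean(grundl$perc)) / sd(grundl$perc)

print(grundl[order(-grundl$z), ])
```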


# Decadri and Boussalis (Manifesto Project) <a class="anchor" id="chapter8"></a>

Running the dictionary analysis on the Manifesto Project dataset with Decadri and Boussalis' dictionary

In [43]:
dict_analysis(tokens = manifesto_toks, data = "manifesto", dictionary = "Decadri_Boussalis")

party,anti_elitism,people_centrism,total_toks,populist_toks,perc_of_populist_toks,standardized_perc_of_populist_toks
<chr>,<dbl>,<dbl>,<int>,<dbl>,<dbl>,<dbl>
UDC,4,51,8738,55,0.006294347,2.0371665
FI,5,139,29483,144,0.004884171,0.8311817
FDI-CDN,6,57,13480,63,0.004673591,0.6510933
LN,20,398,97142,418,0.004302979,0.3341459
PdL,1,24,6719,25,0.003720792,-0.1637417
SC,2,27,8385,29,0.003458557,-0.3880054
M5S,28,563,172008,591,0.003435887,-0.407393
SEL,7,48,16357,55,0.003362475,-0.470175
PD,4,84,31632,88,0.002781993,-0.966604
CD,1,34,15853,35,0.002207784,-1.4576684
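
To gauge how far the two dictionaries agree on the manifesto data, one option is a rank correlation between their standardized scores. The sketch below copies the values from the two tables above; choosing Spearman's rho is an assumption made for illustration, not a step of the original analysis.

```r
# Standardized scores from the Grundl table above
grundl_z <- c("FDI-CDN" = 1.41739125, "SC" = 0.93933809, "FI" = 0.86044115,
              "LN" = 0.73984066, "UDC" = 0.04753895, "PD" = -0.3059837,
              "CD" = -0.47845852, "M5S" = -0.50312472, "SEL" = -0.83314361,
              "PdL" = -1.88383957)

# Standardized scores from the Decadri and Boussalis table above
db_z <- c("UDC" = 2.0371665, "FI" = 0.8311817, "FDI-CDN" = 0.6510933,
          "LN" = 0.3341459, "PdL" = -0.1637417, "SC" = -0.3880054,
          "M5S" = -0.407393, "SEL" = -0.470175, "PD" = -0.966604,
          "CD" = -1.4576684)

# Align both vectors on the same party order before correlating
parties <- names(grundl_z)
rho <- cor(grundl_z[parties], db_z[parties], method = "spearman")
print(rho)
```

With only ten manifestos the estimate is noisy, so it is best read as a rough indication of agreement.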


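# Decadri and Boussalis + Grundl <a class="anchor" id="chapter9"></a>

Running the dictionary analysis on the Manifesto Project dataset with the combined Decadri and Boussalis and Grundl dictionary

In [44]: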
dict_analysis(tokens = manifesto_toks, data = "manifesto", dictionary = "Decadri_Boussalis_Grundl")

party,populism,total_toks,perc_of_populist_toks,standardized_perc_of_populist_toks
<chr>,<dbl>,<int>,<dbl>,<dbl>
UDC,58,8738,0.006637675,2.0194866
FI,156,29483,0.005291185,0.8143831
FDI-CDN,67,13480,0.004970326,0.5272161
LN,466,97142,0.004797101,0.3721801
SC,38,8385,0.004531902,0.134828
SEL,66,16357,0.00403497,-0.3099247
M5S,664,172008,0.003860286,-0.4662664
PdL,25,6719,0.003720792,-0.5911129
PD,99,31632,0.003129742,-1.1201002
CD,45,15853,0.002838579,-1.3806897


# Keywords in context <a class="anchor" id="chapter10"></a>
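
This section runs the same `kwic()` call once per multi-word pattern. A more compact variant could loop over a vector of patterns and report the number of matches for each. The sketch below shows the idea on a made-up tokens object (`toy_toks` stands in for `manifesto_toks`, and the two toy sentences are invented for illustration).

```r
library(quanteda)

# Toy tokens object standing in for `manifesto_toks` (made-up text)
toy_toks <- tokens(c(doc1 = "vecchi partiti hanno fallito",
                     doc2 = "serve democrazia diretta"))

# A few of the patterns checked in this section
patterns <- c("vecch? partit?", "democrazia diretta", "uomo medio")

# Number of KWIC matches per pattern; phrase() marks each as multi-word
n_hits <- vapply(patterns,
                 function(p) nrow(kwic(toy_toks, pattern = phrase(p))),
                 integer(1))
print(n_hits)
```

Patterns with a non-zero count can then be inspected individually with `kwic()`, as in the cells below.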

puzza sotto il naso

In [45]:
kwic(x = manifesto_toks, pattern = phrase("puzza sotto il naso"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


senso di superiorita

In [46]:
kwic(x = manifesto_toks, pattern = phrase("senso di superiorita"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


pezz? gross?

In [47]:
kwic(x = manifesto_toks, pattern = phrase("pezz? gross?"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


uomo della strada

In [48]:
kwic(x = manifesto_toks, pattern = phrase("uomo della strada"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


democrazia diretta

In [49]:
kwic(x = manifesto_toks, pattern = phrase("democrazia diretta"))

Unnamed: 0_level_0,docname,from,to,pre,keyword,post,pattern
Unnamed: 0_level_1,<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>
1,32720,70876,70877,rafforzamento istituti,democrazia diretta,concepiti correttivo possibili,democrazia diretta
2,32956,2700,2701,mirano stravolgerla semplificazione partecipazione,democrazia diretta,miglioramento rapporto cittadini,democrazia diretta
3,32956,2935,2936,referendum popolare crediamo,democrazia diretta,referendum popolare davvero esprimere,democrazia diretta
4,32956,3623,3624,referendum propositivi quórum,democrazia diretta,rivoluzionaria concezione,democrazia diretta
5,32956,5703,5704,riferimento tramite ricorso,democrazia diretta,seguito adeguato,democrazia diretta
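
The contexts printed above contain no function words (for example, "rafforzamento istituti democrazia diretta"), which suggests that stopwords were removed during preprocessing. If that is the case, patterns that include words such as "di", "dei" or "il" cannot match, so an empty result for them says little about the manifestos. The sketch below illustrates this on a made-up sentence; the explicit stopword vector is an assumption standing in for whatever list the preprocessing step used.

```r
library(quanteda)

# Made-up sentence, for illustration only
toks <- tokens("un certo senso di superiorita verso i cittadini")

# With function words still present, the full phrase matches
n_full <- nrow(kwic(toks, pattern = phrase("senso di superiorita")))

# Remove a few function words (assumed stopword list, no padding)
toks_nostop <- tokens_remove(toks, pattern = c("un", "di", "i"))

# The original phrase can no longer match ...
n_nostop <- nrow(kwic(toks_nostop, pattern = phrase("senso di superiorita")))

# ... but the same phrase without the function word does
n_adapted <- nrow(kwic(toks_nostop, pattern = phrase("senso superiorita")))

print(c(full = n_full, nostop = n_nostop, adapted = n_adapted))
```

If stopwords were indeed removed, dropping them from the patterns as well (or running `kwic()` on tokens from before stopword removal) would make the checks in this section more informative.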


italiano medio

In [50]:
kwic(x = manifesto_toks, pattern = phrase("italiano medio"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


uomo medio

In [51]:
kwic(x = manifesto_toks, pattern = phrase("uomo medio"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


torre d avorio

In [52]:
kwic(x = manifesto_toks, pattern = phrase("torre d avorio"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


solit? partit

In [53]:
kwic(x = manifesto_toks, pattern = phrase("solit? partit"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


vecch? partit?

In [54]:
kwic(x = manifesto_toks, pattern = phrase("vecch? partit?"))

Unnamed: 0_level_0,docname,from,to,pre,keyword,post,pattern
Unnamed: 0_level_1,<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>
1,32530,1555,1556,giudizio,vecchi partiti,neanche disinvolti protagonisti,vecch? partit?
2,32530,1711,1712,pensiero politico,vecchi partiti,tramontati partiti necessari,vecch? partit?
3,32720,36941,36942,battaglia referendum,vecchi partiti,voluti minimamente,vecch? partit?
4,32956,3111,3112,tentativi fermati,vecchi partiti,rinviato l'approvazione,vecch? partit?
5,32956,3564,3565,rappresentanti,vecchi partiti,parla rimuovere Fiscal,vecch? partit?


uomini onesti

In [55]:
kwic(x = manifesto_toks, pattern = phrase("uomini onesti"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


senso comune

In [56]:
kwic(x = manifesto_toks, pattern = phrase("senso comune"))

Unnamed: 0_level_0,docname,from,to,pre,keyword,post,pattern
Unnamed: 0_level_1,<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>
1,32530,7107,7108,separano,senso comune,idee opposte,senso comune
2,32956,35436,35437,scriveva parole,senso comune,insegna l'acqua,senso comune
3,32956,35460,35461,sete mondo,senso comune,comune sensi,senso comune
4,32956,47727,47728,comprare entrato,senso comune,l'idea buona prestazione,senso comune


attaccat? all? poltron?

In [57]:
kwic(x = manifesto_toks, pattern = phrase("attaccat? all? poltron?"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


assetat? di potere

In [58]:
kwic(x = manifesto_toks, pattern = phrase("assetat? di potere"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


comun? mortal?

In [59]:
kwic(x = manifesto_toks, pattern = phrase("comun? mortal?"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


bugie dei partiti

In [60]:
kwic(x = manifesto_toks, pattern = phrase("bugie dei partiti"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


falsita dei partiti

In [61]:
kwic(x = manifesto_toks, pattern = phrase("falsita dei partiti"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


senza valori

In [62]:
kwic(x = manifesto_toks, pattern = phrase("senza valori"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


pseudo-partit?

In [63]:
kwic(x = manifesto_toks, pattern = phrase("pseudo-partit?"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


sistema-partito

In [64]:
kwic(x = manifesto_toks, pattern = phrase("sistema-partito"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


non democratic*

In [65]:
kwic(x = manifesto_toks, pattern = phrase("non democratic*"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


prendere in giro

In [66]:
kwic(x = manifesto_toks, pattern = phrase("prendere in giro"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


dittatur* di partito

In [67]:
kwic(x = manifesto_toks, pattern = phrase("dittatur* di partito"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


teatr* politico

In [68]:
kwic(x = manifesto_toks, pattern = phrase("teatr* politico"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


cosiddett? giornalist?

In [69]:
kwic(x = manifesto_toks, pattern = phrase("cosiddett? giornalist?"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>


cosiddetti media

In [70]:
kwic(x = manifesto_toks, pattern = phrase("cosiddetti media"))

docname,from,to,pre,keyword,post,pattern
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<fct>
