Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
324 lines (286 sloc) 19.4 KB
---
title: 'Analyzing the European Election: The Candidates'
author: Corrie
date: '2019-05-21'
categories:
- R
tags:
- R
- European Election 2019
- Election
slug: european-election-data-analysis
comments: yes
image: images/tea_with_books.jpg
share: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = F, warning = F, message = F, comment=NA)
```
The European Election is coming up and for the first time, I have the impression this election is actually talked about and might have an impact. I don't remember people caring that much about European Elections in the years before, but this, of course, could also just be because I got more interested in European politics.
Unfortunately, European politics are complex and this is also mirrored in the quantity of parties that are up for vote in Germany. In Germany alone, there are 40 parties on the ballot paper and in total 1.380 candidates are trying to get one of the 96 seats allocated for Germany. No joke, my ballot paper is almost as long as I'm tall!
Reading through some of the names on the ballot paper, I thought it might be interesting to do a small exploratory data analysis and to compare some of the parties regarding their candidate composition.
Conveniently, the data is provided by the government in a [CSV-file](https://www.bundeswahlleiter.de/dam/jcr/0f8779e8-c05b-4247-be85-6317fbb5ae4d/ew19_kandidaten.zip).
Almost all parties have the same candidate list for all of Germany, the only party that has differing lists for each state is _CDU/CSU_. Depending on the analysis, I sometimes collapse _CDU_ and _CSU_ to a single party or only consider their list for Berlin.
```{r, echo=F}
library(tidyverse)
library(lubridate)
library(scales)
library(cowplot)
library(knitr)
theme_set(theme_minimal())
ggplot <- function(...) ggplot2::ggplot(...) +
scale_color_brewer(palette="Set1") +
scale_fill_brewer(palette="Set1", direction = -1)
```
```{r, echo=F}
df <- read_csv2("../../data/ew19_kandidaten_utf8.csv", skip = 7) %>%
rename(party=Gruppenname)
```
## How many candidates are in a party?
While the ballot paper only shows the top ten candidates for each party (or less if the party has less), many of them have more candidates. Interestingly, some parties have surprisingly many:
```{r, echo=T}
df %>%
group_by(party) %>%
summarise(n=n(),
isReplacement = sum(Kennzeichen == "Ersatzbewerber")) %>%
filter(n >= 96) %>% arrange(-n) %>% kable()
```
The _FDP_ (democratic liberal party) and the _CDU_ (Christian democratic union) nominate both almost twice as many candidates as there are seats available! The SPD (social democratic party) only has more candidates than seats since it has many replacement candidates. Die PARTEI is a satire party, so I'm not really surprised they show up with too many candidates. _ÖDP_ (ecological democratic party), a small party that isn't in the Bundestag, also places quite many candidates in the election. Looking them up, I found out the _ÖDP_ does has one seat in the current parliament. I don't know if there are any other benefits on being on the list if they can't be elected. Maybe for taxing or employment reason? Or there is some event in which even the list places above 96 could still become elected?
## A gender-balanced parliament?
Right now, about a third of the members of the European Parliament (MEPs) are women. We can check the data on how good the chances are that the new parliament might sport more women (at least on the German seats). Since we've seen before that some parties have many more candidates than seats available, it might be misleading to look at all candidates. Instead, it gives a better picture to also look at only the "Spitzenkandidaten", that is the top candidates:
```{r, echo=F}
all <- df %>%
ggplot(aes(x=Geschlecht, fill=Geschlecht)) +
geom_bar(aes(y=..prop.., group=1,
fill=factor(..x..)), stat="count", show.legend = F) +
scale_y_continuous(labels = percent, name="", breaks = c(0, 0.34, 0.5, 0.66), limits=c(0, 0.7)) +
scale_x_discrete(labels=c("Male", "Female"), name="") +
labs(subtitle="All Candidates", title=" \n")
```
```{r}
top <- df %>%
filter( Listenplatz <= 10 & Kennzeichen == "Bewerber") %>%
ggplot(aes(x=Geschlecht, fill=Geschlecht)) +
geom_bar(aes(y=..prop.., group=1,
fill=factor(..x..)), stat="count", show.legend = F) +
scale_y_continuous(labels = percent, name="", breaks = c(0, 0.38, 0.5, 0.62),
limits=c(0, 0.7)) +
scale_x_discrete(labels=c("Male", "Female"), name="") +
labs(subtitle="Top Candidates", title="How gender-balanced is the candidate pool?\n")
plot_grid(top, all)
```
The top candidates have a slightly more equal gender-ratio but nevertheless, the gender ratio in the candidates seems the same as the ratio in the old parliament. However, there are 40 parties and all of them have 5 to 10 top candidates but the top candidates of the big party surely have better chances of entering the parliament. It is thus reasonable to look at the gender ratio for the bigger parties in particular. Since the bigger parties have good chances to get more than the top ten candidates in the parliament, we will look at the whole candidate pool for these parties. Also, I've pooled the parties _CDU_ and _CSU_ together for this plot:
```{r, message=F, warning=F, echo=F}
main_parties <- c("CDU/CSU", "SPD", "GRÜNE", "AfD", "DIE LINKE", "FDP")
df %>%
mutate(party = fct_collapse(party,
`CDU/CSU` = c("CDU", "CSU"))) %>%
filter(party %in% main_parties &
Kennzeichen == "Bewerber") %>%
group_by(party) %>%
summarise(fem = sum( Geschlecht == "w"), n=n(), fem_prop = fem/n) %>%
mutate(party = fct_reorder(party, fem_prop),
bias = ifelse(fem_prop < 0.5, "male",
ifelse(fem_prop > 0.5, "female", "balanced"))) %>%
ggplot(aes(y=fem_prop, x=party, col=bias)) +
geom_point() +
geom_hline(yintercept = 0.5, col="grey", alpha=.9) +
geom_segment(aes(x=party,
xend=party,
y=0,
yend=1),
color="grey",
alpha=0.6,
size=0.2) +
geom_segment(aes(x=party,
xend=party,
y=pmin(fem_prop, 0.5),
yend= pmax(fem_prop, 0.5)), ) +
scale_color_manual(values = c("balanced"="grey", "male"="#377EB8", "female"="#E41A1C"), guide=F) +
scale_y_continuous(labels=percent, limits=c(0, 1), name="Proportion of female top candidates") +
labs(x="", title="How gender-balanced are the top parties?") +
coord_flip() + theme_minimal() + theme(panel.grid = element_blank())
```
_Die Linke_, the left party, has exactly 50% female and male candidates, whereas the Green (_GRÜNE_) party and the _SPD_ have a very slight female bias. All three parties are classified by Wikipedia as centre-left to left. The other three bigger parties, all classified as conservative by Wikipedia, have a male bias that is especially strong for the _FDP_ and _AfD_. The _AfD_, by many considered as far-right, has been described as advocating "old gender roles", which might contribute to their poor gender ratio. Personally, I didn't expect _FDP_ to do so poorly as well. I expected them to be more on the same line as _CDU/CSU_, around 40% of females.
Having seen the gender ratio of the six main parties, it would be interesting to check which parties show the most extreme gender ratio. I classified gender ratios as extreme if they were above 75% or below 25%.
```{r, echo=F}
df %>%
mutate(party = fct_collapse(party,
`CDU/CSU` = c("CDU", "CSU"))) %>%
filter( Kennzeichen == "Bewerber" ) %>%
group_by(party) %>%
summarise(fem = sum( Geschlecht == "w"), n=n(), fem_prop = fem/n) %>%
filter( fem_prop <= 0.25 | fem_prop >= 0.75) %>%
mutate(party = fct_reorder(party, fem_prop),
bias = ifelse(fem_prop < 0.5, "male",
ifelse(fem_prop > 0.5, "female", "balanced"))) %>%
ggplot(aes(y=fem_prop, x=party, col=bias)) +
geom_point() +
geom_hline(yintercept = 0.5, col="grey", alpha=.9) +
geom_segment(aes(x=party,
xend=party,
y=0,
yend=1),
color="grey",
alpha=0.6,
size=0.2) +
geom_segment(aes(x=party,
xend=party,
y=pmin(fem_prop, 0.5),
yend= pmax(fem_prop, 0.5)), ) +
scale_color_manual(values = c("balanced"="grey", "male"="#377EB8", "female"="#E41A1C"), guide=F) +
scale_y_continuous(labels=percent, limits=c(0, 1), name="Proportion of female candidates") +
labs(x="", title="Which parties have the most extreme gender ratios?") +
coord_flip() + theme_minimal() + theme(panel.grid = element_blank())
```
The feminist party _DIE FRAUEN_ has only female candidates but this isn't really surprising given their name. The only other party with a rather extreme female bias is the animal protection party _TIERSCHUTZ hier!_. The list of parties with an extreme male bias is comparable long: 15 parties, so more than a third of all parties on the ballot, have less than 25% female candidates. There are the parties _III. Weg_, _AfD_ and _DIE RECHTE_ (all described as right-leaning or even far-right by Wikipedia). Interestingly, the party _NPD_, by Wikipedia considered to be far-right and ultranationalist, is not listed here and thus has a better gender ratio than for example the _FDP_. The list further includes some other small conservative parties such as _ÖDP_, _BP_ (Bavaria party) and _Bündnis C_ (Christian party). I found especially curious the family party _FAMILIE_: no female candidates, draw your own conclusion what image they have of a family. The parties _SGP_ (socialist equality party) and _BIG_ (party for innovation and equity/fairness) have equality or equity in their name but it doesn't seem to apply to gender equality.
The internet party _PIRATEN_ and _DIE HUMANISTEN_ (the humanist party) are both described by Wikipedia as progressive but still seem to be rather male based so this is not just an issue of conservative or right parties.
## What professional background do the candidates have?
The data provides the title (doctor or professor title) of each candidate so we can easily check which party has the highest percentage of candidates with a doctorate:
```{r, fig.width=7}
df %>%
filter(Kennzeichen == "Bewerber" &
Listenplatz <= 10) %>%
group_by(party) %>%
summarise(titel = sum(!is.na(Titel)), n=n(), prop = titel/n,p=percent(prop)) %>%
arrange(-prop) %>%
select(party=party, percentage=p, prop) %>%
head(10) %>% mutate(party = fct_reorder(party, prop)) %>%
ggplot(aes(x=party, y=prop)) +
geom_bar(stat = "identity", fill="#377EB8") +
scale_y_continuous(label=percent) +
labs(y="", x="", title="Parties with the highest proportion of \nTop Candidates with a Doctorate") +
coord_flip()
```
The _AfD_ has with 50% the highest percentage of doctorates among the candidates, followed by the party _LKR_ (liberal conservative reformers), which is actually a split from the _AfD_. Overall, 8.5% of all candidates have a doctoral degree, while in the general population only about 8% have even any university degree and only 1% have a doctoral degree.
While the data also provides the job title for each candidate, they're not standardized so it would be more difficult to analyze. Luckily, on top of the job title, the data also provides us with a job key which indicates the professional area in which the candidate is working. It took me a while to find a mapping from the job index number to a description but eventually found one provided by the stats department of the [Arbeitsamt](https://statistik.arbeitsagentur.de/Statischer-Content/Grundlagen/Klassifikation-der-Berufe/KldB2010/Systematik-Verzeichnisse/Generische-Publikationen/Systematisches-Verzeichnis-Berufsbenennung.xls).
```{r, echo=F}
job_dict <- c("Administration & \nBusiness Management"="71",
"Finance"="72",
"Law"="73",
"Students"="04",
"Education"="84",
"Humanities, Social Sciences & \nEconomics"="91",
"Health Care"="81",
"Pensioner"="05",
"Marketing & Media"="92",
"IT Jobs"="43",
"Sales"="61")
```
```{r}
df %>%
filter(Listenplatz <= 10 &
Kennzeichen == "Bewerber") %>%
mutate(job_key = fct_lump(Berufsschluessel, n=10) %>%
fct_infreq() %>%
fct_recode(!!!job_dict)) %>%
filter(job_key != "Other") %>%
group_by(job_key) %>%
summarize(n=n()) %>%
mutate(prop = n/sum(n), job_key=fct_reorder(job_key, n)) %>%
ggplot(aes(x=job_key, y=prop)) +
geom_bar(stat="identity",show.legend = F, fill="#377EB8") +
scale_y_continuous(labels = percent, name="") +
labs(x="", title="Of which profession are the top candidates?") +
coord_flip()
```
About one-third of candidates work in Business Management and Administration which makes the single largest area of employment. One thing to note here is that the job key doesn't provide information if someone works as a CEO or low-level employee, it really only gives the area. An interesting fact: Students is the sixth largest group, closely followed by pensioners, both of which are groups I personally didn't expect to be that well represented in the candidate pool.
## Generation change
There are a few cliches out there regarding politics and age: Young people don't care about politics or that politicians are always old people. While this data set is not the right one to answer how much truth there is to these stereotypes, we can check how the age distribution is for the candidates:
```{r}
df <- df %>%
mutate(age = year(today())- Geburtjahr)
df %>%
ggplot(aes(x=age)) +
geom_histogram(aes(y=..density..), binwidth = 1, alpha=0.4, fill="#377EB8") +
stat_density(geom="line", col="#377EB8", size=1) +
scale_x_continuous(breaks = seq(20, 80, by=10), name="") +
scale_y_continuous(breaks=NULL, name = "") +
labs(title="How old are the candidates?")
```
Interestingly, the age distribution is bi-modal, that is, we have a peak at 30 years old and one at 55 years old. Of course, not all candidates are elected into the parliament and most likely rookie candidates are more often delegated to later places in the list. So we shouldn't expect the elected parliament to have a similar age distribution. But this figure is at least an indicator that young people (if you consider people around 25 still young) are interested and especially also active in politics.
```{r, dpi=250, out.height='7cm'}
df %>%
mutate(party = fct_reorder(party, age),
med_age = median(age),
perc25 = quantile(age, prob=0.33),
perc75 = quantile(age, prob=0.66)) %>%
group_by(party) %>%
summarise(med = median(age), med_age = median(med_age), perc25=median(perc25), perc75=median(perc75) ) %>%
filter(med <= perc25 | med >= perc75) %>%
ggplot(aes(x=med, y=party)) +
geom_vline(aes(xintercept = med_age), col="grey") +
geom_point() +
scale_x_continuous(breaks = c(30, 40, 46, 50, 60)) +
coord_fixed(ratio=1.5) +
labs(x="Median Age", y="", title="Which are the youngest and oldest parties?")
```
The party _Volksabstimmung_ (Democracy by referendum, Wikipedia says they used to be far-right) has a median age of 65 years, meaning most of its candidates are older than this. It is even older than the party _Graue Panther_, a party for pensioners. On the other side is the party _DIE DIREKTE_ that similarly to the party _Volksabstimmung_ supports direct democracy (Wikipedia knows of no far-right history though) with a median age of 25. The latter party originated from a student initiative and almost all its candidates are students. Surprisingly, the conservative _CSU_ party has a median age that places it in the younger parties, only a bit older than the new pan-European party _Volt_.
## Where do they come from?
The data also provides us with the birthplace for each candidate which gives us the opportunity to check how many candidates were born in a country other than Germany. Unfortunately, the birthplace column only names the city but not the country. To obtain the country, I used the Google Geocoding API. Only three cities could not be identified by Google's API which was easy to code by hand.
```{r, eval=F}
library(ggmap)
get_country <- function(result) {
for (comp in result$address_components) {
if (comp$types[[1]] == "country")
return (comp$long_name)
}
}
locs <- unique(df$Geburtsort)
res <- purrr::map_chr(locs, possibly(
function(loc) {
geocode(loc, output = "all", force=F,
inject=c("region"="de"))$results[[1]] %>%
get_country() }, otherwise = NA_character_ ) )
countries <- res %>% transpose() %>% simplify_all() %>% first %>% as.character()
country_mapping <- tibble(
city = locs,
country = na_if(countries,"NULL")
) %>%
mutate(country = case_when(city == "Spizewka" ~ "Russia",
city == "Alpen" ~ "Germany",
city == "Babone, Kamerun" ~ "Cameroon",
TRUE ~ country))
```
```{r, eval=F, echo=F}
write_csv(country_mapping, "../../data/country_mapping.csv")
```
```{r, echo=F}
country_mapping <- read_csv("../../data/country_mapping.csv")
```
Not very surprisingly, most candidates were born in Germany:
```{r, echo=T}
df <- df %>%
left_join(country_mapping, by=c("Geburtsort"="city"))
df%>%
mutate(country=fct_infreq(country) %>%
fct_lump(n=5)) %>%
count(country) %>%
kable
```
Next in line are the countries with the biggest immigrant population in Germany: Turkey, Poland and Russia. For a better comparison, I excluded Germany as country of origin from the following plot:
```{r}
df %>%
filter(country != "Germany") %>%
mutate(country = fct_lump(country, n=10) %>%
fct_infreq %>% fct_rev) %>%
ggplot(aes(x=country)) +
geom_bar(fill="#377EB8") + coord_flip() +
labs(x="", title="In which countries were \nthe candidates born?",
subtitle = "(Excluding Germany)\n")
```
Now, are some parties more diverse (in terms of birthplaces of their candidates) than others or are they relatively equally distributed over the parties?
Let's compare the parties by the number of distinct birth countries. To make the comparison equal, I restrict to only the top candidates and also restrict the _CDU_ to their top candidates in Berlin. This ensures that each party is represented by at most 10 candidates in this comparison:
```{r}
df %>%
filter(Listenplatz <= 10 &
Kennzeichen == "Bewerber" &
Gebietsname %in% c("Bundesgebiet", "Berlin")) %>%
group_by(party) %>%
summarise(n_countries=n_distinct(country), percentage=percent(n_countries/n())) %>%
arrange(-n_countries) %>% select(-n_countries) %>% head() %>% kable
```
_BIG_ is one of the first parties in Germany founded by Muslims and has a particular focus on migrant politics. It thus comes as no surprise that they have such a high percentage of candidates born outside Germany.
## Summary
While this data does not allow to directly analyse political ideology and party positions, some positions shine through in the data about the candidates. Especially for some of the smaller parties whose name I didn't know before, I now have a better idea of what kind of positions or ideology they might represent. If you're eligible for voting, I recommend you to check the positions for the parties in more detail. And of course, don't forget to vote.
The full code for this analysis (plus data) can be found [here](https://github.com/corriebar/blogdown_source/blob/master/content/post/2019-05-21-european-election.Rmd).
You can’t perform that action at this time.