# Candidate Test 2022 Analysis Part 1

This exercise focuses on the candidate tests from two television networks: DR and TV2. Responses in both tests are recorded on a five-point scale (-2, -1, 0, 1, 2), from strongly disagree (-2) to strongly agree (2).

---

There are 6 datasets included in this exercise:

- `alldata.xlsx`: Contains responses from both TV stations.
- `drdata.xlsx`: Contains responses from DR.
- `drq.xlsx`: Contains questions from DR.
- `tv2data.xlsx`: Contains responses from TV2.
- `tv2q.xlsx`: Contains questions from TV2.
- `electeddata.xlsx`: Contains responses from both TV stations for candidates who were elected to the parliament. Note that 9 members are missing; 7 of them didn't take any of the tests. Additionally, some notable figures like Mette F. and Lars Løkke did not participate in any of the tests.

---

It's entirely up to you how you approach this data, but at a *minimum*, your analysis should include:
- Age of the candidates grouped by parties.
- An overview of the most "confident" candidates, i.e., those with the highest proportion of "strongly agree" or "strongly disagree" responses.
- Differences in responses between candidates, both inter-party and intra-party, along with an explanation of which parties have the most internal disagreements.
- Classification models to predict candidates' party affiliations. Investigate whether any candidates seem to be in the "wrong" party based on their positions in the political landscape. You must use the following three algorithms: **Decision Tree, Random Forest, and Gradient Boosted Tree**, and **two other** classification algorithms of your choice.
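
One possible way to approach the intra-party disagreement item in the list above is to measure how much answers vary within each party. The sketch below is only an illustration on made-up toy data: the helper `party_disagreement`, the toy frame, and the choice of standard deviation as the disagreement measure are all assumptions, not part of the exercise.

```python
import pandas as pd

# Hypothetical toy data: two questions, two parties. A real analysis would use
# the answer columns of alldata.xlsx instead.
toy = pd.DataFrame({
    "parti": ["A", "A", "A", "B", "B", "B"],
    "q1": [2, 2, 2, -2, 0, 2],
    "q2": [1, 1, 2, -1, 2, -2],
})


def party_disagreement(df, party_col="parti"):
    """Mean per-question standard deviation of answers within each party."""
    question_cols = list(df.columns.drop(party_col))
    # Standard deviation of every question inside every party ...
    per_question_std = df.groupby(party_col)[question_cols].std()
    # ... averaged over the questions to give one number per party.
    return per_question_std.mean(axis=1).sort_values(ascending=False)


disagreement = party_disagreement(toy)
print(disagreement)
```

A higher value means the party's candidates spread their answers more widely. On the real data, the non-answer columns (such as `alder`, `storkreds`, and `navn`) would have to be dropped before calling the helper.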

---

The following parties are represented:

| Party letter | Party name | Party name (English) | Political position |
| :-: | :-: | :-: | :-: |
| A | Socialdemokratiet | Social Democrats | Centre-left |
| V | Venstre | Danish Liberal Party | Centre-right |
| M | Moderaterne | Moderates | Centre-right |
| F | Socialistisk Folkeparti | Socialist People's Party | Left-wing |
| Æ | Danmarksdemokraterne | Denmark Democrats | Right-wing |
| I | Liberal Alliance | Liberal Alliance | Right-wing |
| C | Konservative | Conservative People's Party | Right-wing |
| Ø | Enhedslisten | Red-Green Alliance | Far-left |
| B | Radikale Venstre | Social Liberal Party | Centre-left |
| D | Nye Borgerlige | New Right | Far-right |
| Å | Alternativet | The Alternative | Centre-left |
| O | Dansk Folkeparti | Danish People's Party | Far-right |
| Q | Frie Grønne | Free Greens | Centre-left |
| K | Kristendemokraterne | Christian Democrats | Centre-right |

The image below shows the election results and the colors chosen to represent the parties. Use these colors in your analysis.

![Election results and party colors](image-1.png)


Others have undertaken similar analyses. You can draw inspiration from the following (use Google Translate if your Danish is rusty):

- [Analysis of where individual candidates stand relative to each other and their parties](https://v2022.dumdata.dk/)
- [Candidate Test 2022 – A deep dive into the data](https://kwedel.github.io/kandidattest2022/)
- [The Political Landscape 2019](https://kwedel.github.io/kandidattest2019/)



In [8]:
import pandas as pd

data = pd.read_excel('alldata.xlsx')

print(data.head())

   530  531  533  534  535  537  538  540  541  543  ...  9a  9b  10a  10b  \
0   -1   -2    1   -2    2    1   -2    1    1    2  ...   2   0    1   -2   
1    2    2   -1   -2   -1   -2    1   -2    2   -2  ...  -2   0   -1    2   
2    2    1   -2   -2    1   -2    1   -1    1   -1  ...  -1  -2    0    2   
3    2    1   -2   -1    1    1    1    1    1   -2  ...  -2   2    2    2   
4    1    1   -2    2   -2    1   -2    1    2   -2  ...  -2   0   -1    0   

   11a  11b  12a  12b               storkreds  alder  
0   -2    1    1    1    Københavns Storkreds     78  
1    1   -2    0    0          Fyns Storkreds     64  
2    0   -1    1   -2     Bornholms Storkreds     37  
3    2   -1    2    0  Nordjyllands Storkreds     28  
4   -2    0    2   -2    Københavns Storkreds     58  

[5 rows x 53 columns]


In [10]:
average_age = data.groupby('parti')['alder'].mean().sort_values(ascending=False)
print(average_age)

parti
Danmarksdemokraterne                           51.216216
Kristendemokraterne                            49.023810
Løsgænger                                      48.000000
Nye Borgerlige                                 47.568627
Venstre                                        46.678571
Alternativet                                   46.000000
Dansk Folkeparti                               45.530612
Radikale Venstre                               44.955224
Socialdemokratiet                              44.344828
Moderaterne                                    44.261905
Det Konservative Folkeparti                    42.897959
Socialistisk Folkeparti                        40.928571
Enhedslisten                                   40.513514
Liberal Alliance                               36.885714
Frie Grønne, Danmarks Nye Venstrefløjsparti    34.807692
Name: alder, dtype: float64


In [27]:
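# Keep only the answer columns plus 'navn', which identifies each candidate.
data_questions = data.drop(['alder', 'storkreds', 'parti'], axis=1)

# Per candidate, count the answers that are "strongly agree" (2) or "strongly disagree" (-2).
answered_with_2_or_minus_2 = data_questions.groupby('navn').apply(lambda x: (x == 2) | (x == -2)).sum(axis=1)

# Drop the inner row index left by groupby/apply and show the ten highest counts.
print(answered_with_2_or_minus_2.reset_index(level=1, drop=True).sort_values(ascending=False).head(10))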

navn
Søren Vanting          49
Sarah Nørris           49
Kim Andkjær Doberck    44
Rashid Ali             43
Mohamed Abdikarim      43
Frank Sørensen         42
Elise Bjerkrheim       42
Lone Vase Langballe    42
John Bjerg             42
Jan Filbært            42
dtype: int64
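
The cell above reports counts, while the exercise asks for the proportion of strong responses. A minimal sketch of the proportion variant follows, run on a made-up toy frame. The helper name `strong_answer_share` and the toy data are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy frame standing in for `data`; only the column layout
# (answer columns plus 'navn') mirrors the notebook.
toy = pd.DataFrame({
    "navn": ["Candidate 1", "Candidate 2", "Candidate 3"],
    "530": [2, 1, -2],
    "531": [-2, 0, 2],
    "1a": [2, -1, 0],
    "1b": [-2, 1, 1],
})


def strong_answer_share(df, name_col="navn"):
    """Fraction of each candidate's answers that are +2 or -2."""
    answers = df.set_index(name_col)
    # isin gives a boolean frame; the row mean of booleans is the share of True.
    return answers.isin([2, -2]).mean(axis=1).sort_values(ascending=False)


share = strong_answer_share(toy)
print(share)
```

On the real data, columns such as `alder`, `storkreds`, and `parti` would need to be dropped first, as in the cell above, so that only answer columns enter the mean.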


For each topic on which we want to compare candidates, we create a dictionary.

Each key is a question id, and each value is a boolean indicating whether answering "2" to that question is the progressive or the conservative position.

For example, for the statement "Den økonomiske ulighed i det danske samfund bør mindskes" ("Economic inequality in Danish society should be reduced"):
- "2": inequality should be reduced (progressive)
- "-2": inequality should be increased or left unchanged (conservative)

In [None]:
economy_questions = {
  '4a': True,
  '4b': True,
  '541': True,
  '531': True,
  '537': False,
}

immigration_questions = {
  '6a': True,
  '6b': False,
  '555': False,
  '551': False,
}

social_questions = {
  '10a': False,
  '10b': True,
  '544': True,
  '538': True,
  '550': True,
  '553': False,
}

welfare_questions = {
  '9a': False,
  '557': True,
  '548': True,
  '543': False,
  '545': True,
}

climate_questions = {
  '559': True,
  '546': False,
  '530': True,
  '1a': True,
  '1b': True,
  '7a': False,
  '7b': False,
}
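
One way dictionaries like these could be used is to flip the answers to the conservative-leaning questions and average per candidate, so that a higher score always means a more progressive position. The sketch below assumes that `True` marks questions where "2" is the progressive answer. The helper `topic_score` and the toy data are hypothetical.

```python
import pandas as pd


def topic_score(df, questions):
    """Mean answer over a topic's questions, oriented so higher = more progressive.

    Assumes `questions` maps column name -> True when answering 2 is the
    progressive position and False when it is the conservative one.
    """
    # +1 keeps the answer as it is, -1 flips its sign.
    signs = pd.Series({q: 1 if progressive else -1 for q, progressive in questions.items()})
    # Multiply each question column by its sign, then average across the topic.
    return df[list(signs.index)].mul(signs, axis=1).mean(axis=1)


# Hypothetical toy answers for two of the keys in the economy dictionary.
toy = pd.DataFrame({"4a": [2, -2, 1], "537": [-2, 2, 1]})
toy_questions = {"4a": True, "537": False}

scores = topic_score(toy, toy_questions)
print(scores.tolist())
```

If the numeric DR column labels are read from Excel as integers rather than strings, they may need converting first (for example with `data.columns = data.columns.astype(str)`) so that keys such as `'541'` match.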