# Candidate Test 2022 Analysis Part 1

This exercise focuses on the candidate tests from two television networks: DR and TV2. Responses in both tests are recorded on a five-point scale (-2, -1, 0, 1, 2).

---

There are 6 datasets included in this exercise:

- `alldata.xlsx`: Contains responses from both TV stations.
- `drdata.xlsx`: Contains responses from DR.
- `drq.xlsx`: Contains questions from DR.
- `tv2data.xlsx`: Contains responses from TV2.
- `tv2q.xlsx`: Contains questions from TV2.
- `electeddata.xlsx`: Contains responses from both TV stations for candidates who were elected to the parliament. Note that 9 members are missing; 7 of them didn't take any of the tests. Additionally, some notable figures like Mette F. and Lars Løkke did not participate in any of the tests.

---

It's entirely up to you how you approach this data, but at a *minimum*, your analysis should include:
- Age of the candidates grouped by parties.
- An overview of the most "confident" candidates, i.e., those with the highest proportion of "strongly agree" or "strongly disagree" responses.
- Differences in responses between candidates, both inter-party and intra-party, along with an explanation of which parties have the most internal disagreements.
- Classification models to predict candidates' party affiliations. Investigate whether any candidates seem to be in the "wrong" party based on their positions in the political landscape. You must use the following three algorithms: **Decision Tree, Random Forest, and Gradient Boosted Tree**, plus **two other** classification algorithms of your choice, i.e. a total of 5 models are to be trained.
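The "confident candidates" and "internal disagreement" requirements can each be computed with a few lines of pandas. The sketch below runs on a small hypothetical DataFrame: the candidate names and the `name`, `q1`..`q4` columns are invented for illustration, and only `parti` matches a column in the real data. It assumes "strongly agree" and "strongly disagree" are the two extremes of the scale (±2), and it divides by the number of questions a candidate actually answered, so candidates who skipped one of the tests are not penalised. The within-party standard deviation is one possible disagreement measure, not the only valid one.

```python
import pandas as pd

# Hypothetical toy data: one row per candidate, answers on the -2..2 scale
df = pd.DataFrame({
    "name": ["Cand 1", "Cand 2", "Cand 3", "Cand 4"],
    "parti": ["Venstre", "Venstre", "Enhedslisten", "Enhedslisten"],
    "q1": [2, -2, 1, 1],
    "q2": [2, 0, 1, 0],
    "q3": [-2, 1, 1, -1],
    "q4": [2, -1, 0, 1],
})
question_cols = ["q1", "q2", "q3", "q4"]
answers = df[question_cols]

# Confidence: share of answered questions that hit an extreme (-2 or +2)
df["confidence"] = answers.abs().eq(2).sum(axis=1) / answers.notna().sum(axis=1)
most_confident = df.sort_values("confidence", ascending=False)[["name", "parti", "confidence"]]
print(most_confident)

# Intra-party disagreement: per-question standard deviation within each party,
# averaged over all questions (higher = more internal disagreement)
disagreement = (
    df.groupby("parti")[question_cols]
    .std(ddof=0)
    .mean(axis=1)
    .sort_values(ascending=False)
)
print(disagreement)
```

On the real data, `question_cols` would be the answer columns of `all_data` rather than the invented `q1`..`q4`.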

---

The following parties are represented:

| Party letter | Party name | Party name (English) | Political position |
| :-: | :-: | :-: | :-: |
| A | Socialdemokratiet | Social Democrats | Centre-left |
| V | Venstre | Danish Liberal Party | Centre-right |
| M | Moderaterne | Moderates | Centre-right |
| F | Socialistisk Folkeparti | Socialist People's Party | Left-wing |
| Æ | Danmarksdemokraterne | Denmark Democrats | Right-wing |
| I | Liberal Alliance | Liberal Alliance | Right-wing |
| C | Konservative | Conservative People's Party | Right-wing |
| Ø | Enhedslisten | Red-Green Alliance | Far-left |
| B | Radikale Venstre | Social Liberal Party | Centre-left |
| D | Nye Borgerlige | New Right | Far-right |
| Z | Alternativet | The Alternative | Centre-left |
| O | Dansk Folkeparti | Danish People's Party | Far-right |
| Q | Frie Grønne | Free Greens | Centre-left |
| K | Kristendemokraterne | Christian Democrats | Centre-right |

The image below shows the results and the colors chosen to represent each party. Use these colors throughout your analysis.

![Results and party colors](image-1.png)


Others have undertaken similar analyses. You can draw inspiration from the following (use Google Translate if your Danish is rusty):

- [Analysis of where individual candidates stand relative to each other and their parties](https://v2022.dumdata.dk/)
- [Candidate Test 2022 – A deep dive into the data](https://kwedel.github.io/kandidattest2022/)
- [The Political Landscape 2019](https://kwedel.github.io/kandidattest2019/)



```python
# pip install pandas openpyxl scikit-learn
# (openpyxl is the engine pandas uses to read .xlsx files)
import pandas as pd

# Load the combined DR and TV2 responses into a DataFrame and print it
all_data = pd.read_excel('alldata.xlsx')
print(all_data)
```

```
     530  531  533  534  535  537  538  540  541  543  ...  9a  9b  10a  10b  \
0     -1   -2    1   -2    2    1   -2    1    1    2  ...   2   0    1   -2   
1      2    2   -1   -2   -1   -2    1   -2    2   -2  ...  -2   0   -1    2   
2      2    1   -2   -2    1   -2    1   -1    1   -1  ...  -1  -2    0    2   
3      2    1   -2   -1    1    1    1    1    1   -2  ...  -2   2    2    2   
4      1    1   -2    2   -2    1   -2    1    2   -2  ...  -2   0   -1    0   
..   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ..  ..  ...  ...   
862    1    1    1   -2    2   -1    1   -2    1    1  ...   1   2   -1    2   
863    1   -2   -2   -2    1   -2   -1   -2   -2   -2  ...  -1   0   -1    2   
864    1    1    1   -2    2   -1    1   -1   -1    1  ...  -1   0   -1    2   
865    1   -1    1   -2    1   -1   -1   -1   -1    1  ...   0   0    0    2   
866    1    1   -1   -2    1   -1    1   -1    1   -1  ...  -2  -1   -1    2   

     11a  11b  12a  12b
```
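For the classification requirement, a minimal sketch of the five-model setup is shown below. It uses a hypothetical synthetic stand-in for `all_data` (three invented parties, 40 candidates each, 20 questions scattered around a per-party "profile"), because the real column layout and missing-answer handling still need to be decided: candidates who only took one of the two tests will have gaps that must be imputed or excluded first. Logistic Regression and k-Nearest Neighbours are an illustrative choice for the two free models, not something the task prescribes. Out-of-fold predictions from `cross_val_predict` ensure each candidate is classified by a model that never saw them, and candidates whose most common predicted party differs from their actual party are flagged as possibly being in the "wrong" party.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real answers: 3 parties x 40 candidates,
# 20 questions on the -2..2 scale, scattered around a per-party "profile"
parties = ["A", "V", "F"]
n_candidates, n_questions = 40, 20
profiles = rng.integers(-2, 3, size=(len(parties), n_questions))
answers, labels = [], []
for party, profile in zip(parties, profiles):
    noise = rng.integers(-1, 2, size=(n_candidates, n_questions))
    answers.append(np.clip(profile + noise, -2, 2))
    labels += [party] * n_candidates
X = pd.DataFrame(np.vstack(answers), columns=[f"q{i}" for i in range(n_questions)])
y = pd.Series(labels, name="parti")

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosted Tree": GradientBoostingClassifier(random_state=0),
    # The two extra models below are an illustrative choice, not prescribed by the task
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "k-Nearest Neighbours": KNeighborsClassifier(n_neighbors=5),
}

# Out-of-fold predictions: each candidate is predicted by a model that never saw it
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
predictions, accuracies = {}, {}
for name, model in models.items():
    predictions[name] = cross_val_predict(model, X, y, cv=cv)
    accuracies[name] = (predictions[name] == y.to_numpy()).mean()
    print(f"{name}: accuracy {accuracies[name]:.2f}")

# Flag candidates whose most common predicted party differs from their actual party
votes = pd.DataFrame(predictions)
most_common = votes.mode(axis=1)[0]
misplaced = votes.index[(most_common != y).to_numpy()]
print("Possibly in the 'wrong' party:", list(misplaced))
```

With the real data, `X` would be the answer columns and `y` the `parti` column; the flagged indices can then be looked up to find the candidates' names.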

```python
# Load the TV2 questions and print them
tv2q = pd.read_excel('tv2q.xlsx')
print(tv2q)
```

```
            id  type               header  \
0   bornholm-1  rate  Bornholms Storkreds   
1   bornholm-2  rate  Bornholms Storkreds   
2   bornholm-3  rate  Bornholms Storkreds   
3   bornholm-4  rate  Bornholms Storkreds   
4   bornholm-5  rate  Bornholms Storkreds   
..         ...   ...                  ...   
69         10b  rate        Socialområdet   
70         11a  rate           Mink-sagen   
71         11b  rate           Mink-sagen   
72         12a  rate          Coronavirus   
73         12b  rate          Coronavirus   

                                             question  \
0   Med særligt attraktive vilkår bør staten sikre...   
1   Staten bør sørge for, at flytrafikken til og f...   
2   Der skal sættes flere penge af til Forsvarets ...   
3   Det skal være lettere at hente udenlandsk arbe...   
4   Beslutningen om at etablere en naturnationalpa...   
..                                                ...   
69  Den såkaldte Arne-pension, der giver mulighed ...   
70
```

```python
# Load the DR questions and print them
drq = pd.read_excel('drq.xlsx')
print(drq)
```

```
     ID                     Title  \
0   530           KLIMA OG ENERGI   
1   531  ARBEJDSMARKED OG ØKONOMI   
2   533              RET OG STRAF   
3   534            EU OG UDENRIGS   
4   535            EU OG UDENRIGS   
5   537                   VELFÆRD   
6   538                UDDANNELSE   
7   540                   VELFÆRD   
8   541  ARBEJDSMARKED OG ØKONOMI   
9   543                   VELFÆRD   
10  544                 DEMOKRATI   
11  545  ARBEJDSMARKED OG ØKONOMI   
12  546           KLIMA OG ENERGI   
13  547                 DEMOKRATI   
14  548                   VELFÆRD   
15  550  ARBEJDSMARKED OG ØKONOMI   
16  551            EU OG UDENRIGS   
17  552                   SUNDHED   
18  553                UDDANNELSE   
19  555            EU OG UDENRIGS   
20  556  ARBEJDSMARKED OG ØKONOMI   
21  557                   VELFÆRD   
22  559           KLIMA OG ENERGI   
23  561           KLIMA OG ENERGI   
24  563                   SUNDHED   
```
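The DR question IDs in `drq` (530, 531, ...) appear to match the response column headers printed for `all_data`, which makes it possible to compare parties theme by theme rather than question by question. The sketch below assumes that match holds and that both sides are integers once loaded; it uses small hypothetical versions of `drq` and the DR responses, melts the responses into long form, attaches each question's `Title`, and averages the answers per party and theme.

```python
import pandas as pd

# Hypothetical toy versions of drq and the DR part of all_data
drq = pd.DataFrame({
    "ID": [530, 531, 546],
    "Title": ["KLIMA OG ENERGI", "ARBEJDSMARKED OG ØKONOMI", "KLIMA OG ENERGI"],
})
responses = pd.DataFrame({
    "parti": ["Venstre", "Venstre", "Enhedslisten"],
    530: [-1, 1, 2],
    531: [2, 2, -2],
    546: [1, -1, 2],
})

# Long format: one row per (candidate, question, answer)
long = responses.melt(id_vars="parti", var_name="ID", value_name="answer")
long["ID"] = long["ID"].astype(int)  # align dtype with drq["ID"]

# Attach each question's theme, then average per party and theme
by_theme = (
    long.merge(drq, on="ID")
    .groupby(["parti", "Title"])["answer"]
    .mean()
    .unstack("Title")
)
print(by_theme)
```

Averaging within a theme can hide questions that pull in opposite directions (here Venstre's climate average is 0 even though individual answers are not), so theme means are best read alongside the per-question view.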

```python
# Load the responses of candidates who were elected to parliament
elected_data = pd.read_excel('electeddata.xlsx')
print(elected_data)
```

```
     530  531  533  534  535  537  538  540  541  543  ...  9a  9b  10a  10b  \
0      2    2   -1   -2   -1   -2    1   -2    2   -2  ...  -2   0   -1    2   
1      2    2   -2   -2   -2   -2    2    2    2   -2  ...  -2   0   -2    2   
2      2    2   -2   -1   -1   -1    1   -2    2   -2  ...  -2   0   -1    2   
3     -1   -1    2   -1    2   -1   -2    1   -2    1  ...  -1  -1    0    2   
4     -1   -1    1   -2    2    1   -2    1   -2    1  ...  -1   0    1    1   
..   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ..  ..  ...  ...   
164    1    1    2   -2    2   -1    1   -1   -1    1  ...   0  -1    1    2   
165    1    1    1   -2    2   -1    1   -1   -1    1  ...  -1   0   -1    2   
166    1   -2   -2   -2    1   -2   -1   -2   -2   -2  ...  -1   0   -1    2   
167    1    1    1   -2    2   -1    1   -1   -1    1  ...  -1   0   -1    2   
168    1   -1    1   -2    1   -1   -1   -1   -1    1  ...   0   0    0    2   

     11a  11b  12a  12b
```

```python
# Mean age per party ('parti' = party, 'alder' = age in the Danish source data)
grouped_average = all_data.groupby('parti')['alder'].mean()
print(grouped_average)
```

```
parti
Alternativet                                   46.000000
Danmarksdemokraterne                           51.216216
Dansk Folkeparti                               45.530612
Det Konservative Folkeparti                    42.897959
Enhedslisten                                   40.513514
Frie Grønne, Danmarks Nye Venstrefløjsparti    34.807692
Kristendemokraterne                            49.023810
Liberal Alliance                               36.885714
Løsgænger                                      48.000000
Moderaterne                                    44.261905
Nye Borgerlige                                 47.568627
Radikale Venstre                               44.955224
Socialdemokratiet                              44.344828
Socialistisk Folkeparti                        40.928571
Venstre                                        46.678571
Name: alder, dtype: float64
```
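The mean alone hides how spread out the ages are within each party and how many candidates each average rests on. A minimal sketch of a fuller summary is shown below, run on hypothetical toy data that reuses the notebook's `parti` and `alder` column names; it adds count, median, range and sample standard deviation, then sorts parties by mean age. On the real data the same chain can be applied directly to `all_data`.

```python
import pandas as pd

# Hypothetical toy data using the same column names as the notebook
candidates = pd.DataFrame({
    "parti": ["Venstre", "Venstre", "Venstre", "Liberal Alliance", "Liberal Alliance"],
    "alder": [30, 50, 70, 25, 35],
})

# Several statistics per party, youngest party (by mean) first
age_by_party = (
    candidates.groupby("parti")["alder"]
    .agg(["count", "mean", "median", "min", "max", "std"])
    .sort_values("mean")
    .round(1)
)
print(age_by_party)
```

The `count` column is worth keeping: a group such as `Løsgænger` (independent candidates) may contain very few people, so its average is much less reliable than those of the larger parties.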
