# Candidate Test 2022 Analysis Part 1

This exercise focuses on the candidate tests from two television networks: DR and TV2. Responses in both tests are coded on a five-point scale (-2, -1, 0, 1, 2).

---

There are 6 datasets included in this exercise:

- `alldata.xlsx`: Contains responses from both TV stations.
- `drdata.xlsx`: Contains responses from DR.
- `drq.xlsx`: Contains questions from DR.
- `tv2data.xlsx`: Contains responses from TV2.
- `tv2q.xlsx`: Contains questions from TV2.
- `electeddata.xlsx`: Contains responses from both TV stations for candidates who were elected to parliament. Note that 9 members are missing; 7 of them did not take either test. Some notable figures, such as Mette Frederiksen and Lars Løkke Rasmussen, did not participate in either test.
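
As a convenience, the six workbooks listed above could be loaded in one go. This is only a sketch: the helper name `load_datasets` and its `reader` parameter are hypothetical, and it assumes the files sit in a single directory under the names given in the list.

```python
from pathlib import Path

import pandas as pd

# File stems taken from the dataset list above.
DATASET_FILES = ["alldata", "drdata", "drq", "tv2data", "tv2q", "electeddata"]


def load_datasets(data_dir=".", reader=pd.read_excel):
    """Return a dict mapping dataset name to DataFrame.

    `reader` defaults to pandas.read_excel; it is a parameter only so the
    helper can be exercised without the real files (an illustrative choice).
    """
    base = Path(data_dir)
    return {name: reader(base / f"{name}.xlsx") for name in DATASET_FILES}
```

With the files in the working directory, `load_datasets()["alldata"]` would give the same frame as `pd.read_excel("alldata.xlsx")`.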

---

It's entirely up to you how you approach this data, but at a *minimum*, your analysis should include:
- Age of the candidates grouped by parties.
- An overview of the most "confident" candidates, i.e., those with the highest proportion of "strongly agree" or "strongly disagree" responses.
- Differences in responses between candidates, both inter-party and intra-party, along with an explanation of which parties have the most internal disagreements.
- Classification models to predict candidates' party affiliations. Investigate whether any candidates seem to be in the "wrong" party based on their positions in the political landscape. You must use the following three algorithms: **Decision Tree, Random Forest, and Gradient Boosted Tree**, plus **two other** classification algorithms of your choice, for a total of five trained models.
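
The first three items above could be approached roughly as follows. This is a minimal sketch on invented toy data, not the real files: the column names `navn`, `parti` and `alder` mirror the datasets, while the party labels `X` and `Y`, the candidate names and all values are made up. "Confidence" is taken here to mean the share of answers equal to -2 or 2, and internal disagreement is measured as the average per-question standard deviation within a party, which is one possible choice among several.

```python
import pandas as pd

# Toy stand-in for alldata.xlsx (values invented for illustration).
toy = pd.DataFrame({
    "navn": ["A1", "A2", "B1", "B2"],
    "parti": ["X", "X", "Y", "Y"],
    "alder": [30, 50, 40, 60],
    "530": [2, -2, 1, 1],
    "531": [2, 2, 0, -1],
    "1a": [-2, 1, 1, 1],
})
question_cols = ["530", "531", "1a"]

# 1. Age of the candidates grouped by party.
age_by_party = toy.groupby("parti")["alder"].agg(["mean", "min", "max"])

# 2. "Confidence": share of strongly agree / strongly disagree answers.
toy["confidence"] = (toy[question_cols].abs() == 2).mean(axis=1)
most_confident = toy.sort_values("confidence", ascending=False)[
    ["navn", "parti", "confidence"]
]

# 3. Intra-party disagreement: mean of the per-question standard deviations.
intra_party = (
    toy.groupby("parti")[question_cols].std().mean(axis=1)
    .sort_values(ascending=False)
)

print(age_by_party)
print(most_confident)
print(intra_party)
```

For inter-party differences, one option is to compare the party mean vectors, for example `toy.groupby("parti")[question_cols].mean()`.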

---

The following parties are represented:

| Party letter | Party name | Party name (English) | Political position |
| :-: | :-: | :-: | :-: |
| A | Socialdemokratiet | Social Democrats | Centre-left |
| V | Venstre | Danish Liberal Party | Centre-right |
| M | Moderaterne | Moderates | Centre-right |
| F | Socialistisk Folkeparti | Socialist People's Party | Left-wing |
| Æ | Danmarksdemokraterne | Denmark Democrats | Right-wing |
| I | Liberal Alliance | Liberal Alliance | Right-wing |
| C | Konservative | Conservative People's Party | Right-wing |
| Ø | Enhedslisten | Red-Green Alliance | Far-left |
| B | Radikale Venstre | Social Liberal Party | Centre-left |
| D | Nye Borgerlige | New Right | Far-right |
| Å | Alternativet | The Alternative | Centre-left |
| O | Dansk Folkeparti | Danish People's Party | Far-right |
| Q | Frie Grønne | Free Greens | Centre-left |
| K | Kristendemokraterne | Christian Democrats | Centre-right |

Below you can see the results and the colors chosen to represent the parties. Use these colors in your analysis above.

![Election results and party colors](image-1.png)


Others have undertaken similar analyses. You can draw inspiration from the following (use Google Translate if your Danish is rusty):

- [Analysis of where individual candidates stand relative to each other and their parties](https://v2022.dumdata.dk/)
- [Candidate Test 2022 – A deep dive into the data](https://kwedel.github.io/kandidattest2022/)
- [The Political Landscape 2019](https://kwedel.github.io/kandidattest2019/)



In [39]:
import pandas as pd

df = pd.read_excel("alldata.xlsx")
display(df.shape)
display(df.describe())
display(df.head())

(867, 53)

Unnamed: 0,530,531,533,534,535,537,538,540,541,543,...,8b,9a,9b,10a,10b,11a,11b,12a,12b,alder
count,867.0,867.0,867.0,867.0,867.0,867.0,867.0,867.0,867.0,867.0,...,867.0,867.0,867.0,867.0,867.0,867.0,867.0,867.0,867.0,867.0
mean,0.378316,-0.303345,-0.489043,-1.131488,0.94925,-0.126874,-0.974625,0.410611,-0.294118,0.136101,...,0.547866,-0.340254,-0.035755,0.490196,0.366782,-0.72549,0.589389,0.657439,-0.422145,43.876586
std,1.423131,1.615893,1.529029,1.386595,1.353196,1.543086,1.332418,1.521145,1.570514,1.526494,...,0.955574,1.410381,1.0137,1.235851,1.615944,1.474442,1.603195,1.028047,1.222147,14.386282
min,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,...,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,0.0
25%,-1.0,-2.0,-2.0,-2.0,1.0,-2.0,-2.0,-1.0,-2.0,-1.0,...,0.0,-2.0,-1.0,-1.0,-1.0,-2.0,-1.0,0.0,-2.0,34.0
50%,1.0,-1.0,-1.0,-2.0,1.0,-1.0,-2.0,1.0,-1.0,1.0,...,1.0,-1.0,0.0,1.0,1.0,-1.0,1.0,1.0,0.0,45.0
75%,2.0,1.0,1.0,-1.0,2.0,1.0,1.0,2.0,1.0,1.0,...,1.0,1.0,1.0,1.0,2.0,1.0,2.0,1.0,0.0,54.0
max,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,...,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,79.0


Unnamed: 0,530,531,533,534,535,537,538,540,541,543,...,9a,9b,10a,10b,11a,11b,12a,12b,storkreds,alder
0,-1,-2,1,-2,2,1,-2,1,1,2,...,2,0,1,-2,-2,1,1,1,Københavns Storkreds,78
1,2,2,-1,-2,-1,-2,1,-2,2,-2,...,-2,0,-1,2,1,-2,0,0,Fyns Storkreds,64
2,2,1,-2,-2,1,-2,1,-1,1,-1,...,-1,-2,0,2,0,-1,1,-2,Bornholms Storkreds,37
3,2,1,-2,-1,1,1,1,1,1,-2,...,-2,2,2,2,2,-1,2,0,Nordjyllands Storkreds,28
4,1,1,-2,2,-2,1,-2,1,2,-2,...,-2,0,-1,0,-2,0,2,-2,Københavns Storkreds,58
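
The `describe()` output above reports a minimum `alder` of 0, which most likely marks a missing age rather than a real one. That reading is an assumption; the data description does not confirm it. Under that assumption, a small sketch for masking such values before grouping ages by party could look like this (the helper name `clean_age` and the toy values are hypothetical):

```python
import pandas as pd


def clean_age(df, age_col="alder"):
    """Return a copy where non-positive ages are replaced by NaN."""
    out = df.copy()
    # Series.where keeps values where the condition holds, NaN elsewhere.
    out[age_col] = out[age_col].where(out[age_col] > 0)
    return out


# Toy example: the middle candidate has the placeholder age 0.
toy_ages = pd.DataFrame({"alder": [78, 0, 37]})
cleaned = clean_age(toy_ages)
print(cleaned["alder"].mean())  # NaN is skipped by default
```

Leaving the zeros in would pull the party averages downwards, so it is worth deciding explicitly how to treat them.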


In [40]:
import pandas as pd

df = pd.read_excel("electeddata.xlsx")
display(df.shape)
display(df.describe())
display(df.head())

(169, 53)

Unnamed: 0,530,531,533,534,535,537,538,540,541,543,...,8b,9a,9b,10a,10b,11a,11b,12a,12b,alder
count,169.0,169.0,169.0,169.0,169.0,169.0,169.0,169.0,169.0,169.0,...,169.0,169.0,169.0,169.0,169.0,169.0,169.0,169.0,169.0,169.0
mean,0.088757,-0.218935,0.065089,-1.491124,1.284024,-0.153846,-0.650888,0.272189,-0.710059,0.508876,...,0.491124,-0.254438,-0.159763,0.402367,0.816568,-0.35503,0.142012,0.609467,-0.230769,45.0
std,1.357669,1.513627,1.468486,1.04151,1.053246,1.40577,1.468726,1.478978,1.424264,1.376323,...,0.824609,1.309442,0.804328,1.166596,1.518367,1.619631,1.826706,0.958126,1.123345,12.504285
min,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,...,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,0.0
25%,-1.0,-2.0,-1.0,-2.0,1.0,-1.0,-2.0,-1.0,-2.0,-1.0,...,0.0,-1.0,-1.0,-1.0,0.0,-2.0,-2.0,0.0,-1.0,37.0
50%,-1.0,-1.0,1.0,-2.0,2.0,-1.0,-2.0,1.0,-1.0,1.0,...,1.0,0.0,0.0,1.0,1.0,-1.0,1.0,1.0,0.0,46.0
75%,1.0,1.0,1.0,-1.0,2.0,1.0,1.0,2.0,1.0,1.0,...,1.0,1.0,0.0,1.0,2.0,1.0,2.0,1.0,1.0,53.0
max,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,...,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,75.0


Unnamed: 0,530,531,533,534,535,537,538,540,541,543,...,9a,9b,10a,10b,11a,11b,12a,12b,storkreds,alder
0,2,2,-1,-2,-1,-2,1,-2,2,-2,...,-2,0,-1,2,1,-2,0,0,Fyns Storkreds,64
1,2,2,-2,-2,-2,-2,2,2,2,-2,...,-2,0,-2,2,1,1,0,-2,Østjyllands Storkreds,58
2,2,2,-2,-1,-1,-1,1,-2,2,-2,...,-2,0,-1,2,0,-2,-1,1,Østjyllands Storkreds,30
3,-1,-1,2,-1,2,-1,-2,1,-2,1,...,-1,-1,0,2,-2,2,0,0,Østjyllands Storkreds,53
4,-1,-1,1,-2,2,1,-2,1,-2,1,...,-1,0,1,1,-2,2,1,-1,Østjyllands Storkreds,46


In [41]:
import pandas as pd

df = pd.read_excel("drq.xlsx")
display(df.shape)
# The numeric columns summarised here could all be dropped; they carry no information
display(df.describe())
# Rule holds a single value, so it seems useless as well
display(df['Rule'].unique())
display(df.head())

(25, 10)

Unnamed: 0,ID,Info,WordMerger,ID_CandidateQuestionType,QuestionListData
count,25.0,0.0,25.0,25.0,0.0
mean,545.96,,0.0,1.0,
std,9.684524,,0.0,0.0,
min,530.0,,0.0,1.0,
25%,538.0,,0.0,1.0,
50%,546.0,,0.0,1.0,
75%,553.0,,0.0,1.0,
max,563.0,,0.0,1.0,


array(['FT'], dtype=object)

Unnamed: 0,ID,Title,Question,Info,ArgumentFor,ArgumentAgainst,WordMerger,ID_CandidateQuestionType,Rule,QuestionListData
0,530,KLIMA OG ENERGI,Danmark skal bruge flere penge på at styrke to...,,Den kollektive trafik bør være billig og tilgæ...,Det er dyrt for statskassen at opretholde drif...,0,1,FT,
1,531,ARBEJDSMARKED OG ØKONOMI,Der skal indføres en særlig skat på de allerhø...,,Historisk rammer kriser de fattigste hårdest. ...,Det danske samfund er et af de mest lige samfu...,0,1,FT,
2,533,RET OG STRAF,Kriminalitet begået i udsatte boligområder ska...,,Den mest effektive måde at stoppe bandekrimina...,"Det er urimeligt, at den samme forbrydelse ska...",0,1,FT,
3,534,EU OG UDENRIGS,På sigt skal Danmark meldes ud af EU,,"Så længe Danmark er medlem af EU, kan flertall...",Danmark er bedst tjent med at være en del af E...,0,1,FT,
4,535,EU OG UDENRIGS,"Det er fornuftigt, at Danmark i de kommende år...",,Ruslands angreb på Ukraine har ændret alting o...,Vi er medlemmer af Nato og har i forvejen et s...,0,1,FT,
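
The output above suggests that `Info` and `QuestionListData` are entirely empty, while `WordMerger`, `ID_CandidateQuestionType` and `Rule` each hold a single value. One way to drop such columns without naming them by hand is sketched below; the helper name `drop_uninformative` is hypothetical and the toy frame only imitates a few columns of `drq.xlsx`.

```python
import pandas as pd


def drop_uninformative(df):
    """Keep only columns with more than one distinct non-null value."""
    keep = [col for col in df.columns if df[col].nunique(dropna=True) > 1]
    return df[keep]


# Toy frame imitating part of drq.xlsx.
toy_q = pd.DataFrame({
    "ID": [530, 531],
    "Title": ["KLIMA OG ENERGI", "ARBEJDSMARKED OG ØKONOMI"],
    "Info": [None, None],
    "WordMerger": [0, 0],
    "Rule": ["FT", "FT"],
})
slim = drop_uninformative(toy_q)
print(list(slim.columns))
```

Note that this rule would also drop every column of a one-row frame, so it only makes sense on the full question table.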


In [42]:
import pandas as pd

df = pd.read_excel("drdata.xlsx")
display(df.shape)
display(df.describe())
display(df.head())

(904, 27)

Unnamed: 0,530,531,533,534,535,537,538,540,541,543,...,550,551,552,553,555,556,557,559,561,563
count,904.0,904.0,904.0,904.0,904.0,904.0,904.0,904.0,904.0,904.0,...,904.0,904.0,904.0,904.0,904.0,904.0,904.0,904.0,904.0,904.0
mean,0.363938,-0.30531,-0.50885,-1.119469,0.952434,-0.108407,-0.996681,0.432522,-0.29646,0.142699,...,-0.382743,-0.768805,0.238938,-0.00885,-0.202434,-0.337389,0.803097,0.496681,0.469027,0.570796
std,1.431053,1.62065,1.524234,1.394935,1.348045,1.548541,1.325694,1.514743,1.577299,1.529551,...,1.587455,1.484016,1.470413,1.388104,1.655875,1.651234,1.184503,1.513319,1.408381,1.435767
min,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,...,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0
25%,-1.0,-2.0,-2.0,-2.0,1.0,-2.0,-2.0,-1.0,-2.0,-1.0,...,-2.0,-2.0,-1.0,-1.0,-2.0,-2.0,0.0,-1.0,-1.0,-1.0
50%,1.0,-1.0,-1.0,-2.0,1.0,-1.0,-2.0,1.0,-1.0,1.0,...,-1.0,-1.0,1.0,1.0,-1.0,-1.0,1.0,1.0,1.0,1.0
75%,2.0,1.0,1.0,-1.0,2.0,1.0,1.0,2.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,2.0
max,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,...,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0


Unnamed: 0,530,531,533,534,535,537,538,540,541,543,...,552,553,555,556,557,559,561,563,navn,parti
0,-1,-2,1,-2,2,1,-2,1,1,2,...,2,-2,2,2,1,2,2,-2,Lars Philipsen Prahm,Moderaterne
1,2,2,-1,-2,-1,-2,1,-2,2,-2,...,1,1,-1,-2,1,2,1,2,Karsten Hønge,Socialistisk Folkeparti
2,2,1,-2,-2,1,-2,1,-1,1,-1,...,2,-1,-2,-1,1,2,2,2,Martin Kelleher-Petersen,Alternativet
3,2,1,-2,-1,1,1,1,1,1,-2,...,2,1,2,-2,2,2,2,2,Nicklas Gjedsig Larsen,Alternativet
4,1,1,-2,2,-2,1,-2,1,2,-2,...,2,-2,-2,-1,1,-2,-1,-2,Tom Gillesberg,Løsgænger


In [43]:
import pandas as pd

df = pd.read_excel("tv2q.xlsx")
display(df.shape)
display(df.describe())
display(df.head())

(74, 5)

Unnamed: 0,id,type,header,question,depends
count,74,74,74,74,50
unique,74,1,22,72,10
top,bornholm-1,rate,Bornholms Storkreds,Der skal være strengere miljøkrav til industri...,{'selectedArea': 'bornholms storkreds'}
freq,1,74,5,2,5


Unnamed: 0,id,type,header,question,depends
0,bornholm-1,rate,Bornholms Storkreds,Med særligt attraktive vilkår bør staten sikre...,{'selectedArea': 'bornholms storkreds'}
1,bornholm-2,rate,Bornholms Storkreds,"Staten bør sørge for, at flytrafikken til og f...",{'selectedArea': 'bornholms storkreds'}
2,bornholm-3,rate,Bornholms Storkreds,Der skal sættes flere penge af til Forsvarets ...,{'selectedArea': 'bornholms storkreds'}
3,bornholm-4,rate,Bornholms Storkreds,Det skal være lettere at hente udenlandsk arbe...,{'selectedArea': 'bornholms storkreds'}
4,bornholm-5,rate,Bornholms Storkreds,Beslutningen om at etablere en naturnationalpa...,{'selectedArea': 'bornholms storkreds'}
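
Of the 74 TV2 questions, 50 have a value in `depends`, which leaves 24. That matches the 24 answer columns (`1a` to `12b`) in `tv2data.xlsx`. A plausible reading, not confirmed by the source, is that rows with an empty `depends` are the national questions and the rest are constituency-specific. A sketch of that split follows; the helper name `split_questions` and the national row's `id` value are assumptions.

```python
import pandas as pd


def split_questions(tv2q):
    """Split questions into (national, regional) by whether `depends` is set."""
    regional = tv2q[tv2q["depends"].notna()]
    national = tv2q[tv2q["depends"].isna()]
    return national, regional


# Toy frame; the "1a" id for a national question is assumed, not verified.
toy_tv2q = pd.DataFrame({
    "id": ["bornholm-1", "bornholm-2", "1a"],
    "depends": [
        "{'selectedArea': 'bornholms storkreds'}",
        "{'selectedArea': 'bornholms storkreds'}",
        None,
    ],
})
national, regional = split_questions(toy_tv2q)
print(len(national), len(regional))
```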


In [44]:
import pandas as pd

df = pd.read_excel("tv2data.xlsx")
display(df.shape)
display(df.describe())
display(df.head())

(962, 28)

Unnamed: 0,1a,1b,2a,2b,3a,3b,4a,4b,5a,5b,...,8b,9a,9b,10a,10b,11a,11b,12a,12b,alder
count,962.0,962.0,962.0,962.0,962.0,962.0,962.0,962.0,962.0,962.0,...,962.0,962.0,962.0,962.0,962.0,962.0,962.0,962.0,962.0,962.0
mean,-0.377339,0.389813,0.844075,0.309771,0.673597,-0.064449,0.471933,0.56341,-1.177755,0.634096,...,0.560291,-0.320166,-0.028067,0.485447,0.367983,-0.738046,0.62474,0.674636,-0.407484,44.2921
std,1.482477,1.364337,1.168709,1.45186,1.420641,1.575701,1.391868,1.199476,0.966991,0.935596,...,0.954208,1.432995,1.0253,1.246972,1.618614,1.471897,1.594583,1.036865,1.246125,14.714449
min,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,...,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,-2.0,0.0
25%,-2.0,-1.0,0.0,-1.0,0.0,-2.0,-1.0,0.0,-2.0,0.0,...,0.0,-2.0,-1.0,-1.0,-1.0,-2.0,-1.0,0.0,-2.0,34.0
50%,-1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,-1.0,1.0,...,1.0,-1.0,0.0,1.0,1.0,-1.0,1.0,1.0,0.0,46.0
75%,1.0,2.0,2.0,2.0,2.0,2.0,2.0,1.0,-1.0,1.0,...,1.0,1.0,1.0,1.0,2.0,1.0,2.0,1.0,1.0,55.0
max,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,...,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,79.0


Unnamed: 0,1a,1b,2a,2b,3a,3b,4a,4b,5a,5b,...,10a,10b,11a,11b,12a,12b,parti,navn,storkreds,alder
0,0,1,1,1,1,-1,0,1,0,0,...,0,2,-1,2,0,0,Venstre,Birthe Tindbæk Bredo,Bornholms Storkreds,63
1,-1,1,0,-1,2,-2,-1,-1,-2,1,...,1,-1,-1,2,1,-1,Venstre,Julie Pauch Nymark,Bornholms Storkreds,27
2,-2,0,0,1,1,-1,0,1,-1,1,...,1,1,-2,2,1,1,Venstre,Peter Juel-Jensen,Bornholms Storkreds,56
3,-2,-1,2,2,2,-2,0,-1,-2,1,...,1,-1,-2,2,2,-2,Dansk Folkeparti,Mette Sode Hansen,Bornholms Storkreds,42
4,-2,-1,1,1,2,-2,1,0,-1,1,...,2,1,-1,2,0,1,Dansk Folkeparti,René Danielsson,Bornholms Storkreds,35
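
With the data explored, the classification task from the assignment could be set up along these lines. This is a sketch on synthetic data, not a result: the three made-up parties, their answer profiles and the choice of Logistic Regression and k-Nearest Neighbours as the two extra algorithms are all assumptions for illustration. On the real data, `X` would be the question columns and `y` the `parti` column.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in: 3 invented parties, 40 candidates each, 10 questions.
# Each candidate answers near a party profile, clipped to the -2..2 scale.
profiles = rng.integers(-2, 3, size=(3, 10))
X = np.vstack([
    np.clip(profile + rng.integers(-1, 2, size=(40, 10)), -2, 2)
    for profile in profiles
])
y = np.repeat(["party_1", "party_2", "party_3"], 40)

# The three required algorithms plus two freely chosen ones.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosted Trees": GradientBoostingClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "k-Nearest Neighbours": KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validated accuracy per model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Candidates who are consistently predicted to belong to a different party than their own, for instance via `sklearn.model_selection.cross_val_predict`, would be natural starting points when looking for people in the "wrong" party.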
