# Candidate Test 2022 Analysis Part 1

This exercise focuses on the candidate tests from two television networks: DR and TV2. Responses in both tests are coded on a five-point scale (-2, -1, 0, 1, 2).

---

There are 6 datasets included in this exercise:

- `alldata.xlsx`: Contains responses from both TV stations.
- `drdata.xlsx`: Contains responses from DR.
- `drq.xlsx`: Contains questions from DR.
- `tv2data.xlsx`: Contains responses from TV2.
- `tv2q.xlsx`: Contains questions from TV2.
- `electeddata.xlsx`: Contains responses from both TV stations for candidates who were elected to the parliament. Note that 9 members are missing; 7 of them didn't take any of the tests. Additionally, some notable figures like Mette F. and Lars Løkke did not participate in any of the tests.

---

It's entirely up to you how you approach this data, but at a *minimum*, your analysis should include:
- Age of the candidates grouped by parties.
- An overview of the most "confident" candidates, i.e., those with the highest proportion of "strongly agree" or "strongly disagree" responses.
- Differences in responses between candidates, both inter-party and intra-party, along with an explanation of which parties have the most internal disagreements.
- Classification models to predict candidates' party affiliations. Investigate if there are any candidates who seem to be in the "wrong" party based on their political landscape positions. You must use the following three algorithms: **Decision Tree, Random Forest, and Gradient Boosted Tree**, and **two other** classification algorithms of your choice.

---

The following parties are represented:

| Party letter | Party name | Party name (English) | Political position |
| :-: | :-: | :-: | :-: |
| A | Socialdemokratiet | Social Democrats | Centre-left |
| V | Venstre | Danish Liberal Party | Centre-right |
| M | Moderaterne | Moderates | Centre-right |
| F | Socialistisk Folkeparti | Socialist People's Party | Left-wing |
| Æ | Danmarksdemokraterne | Denmark Democrats | Right-wing |
| I | Liberal Alliance | Liberal Alliance | Right-wing |
| C | Konservative | Conservative People's Party | Right-wing |
| Ø | Enhedslisten | Red-Green Alliance | Far-left |
| B | Radikale Venstre | Social Liberal Party | Centre-left |
| D | Nye Borgerlige | New Right | Far-right |
| Å | Alternativet | The Alternative | Centre-left |
| O | Dansk Folkeparti | Danish People's Party | Far-right |
| Q | Frie Grønne | Free Greens | Centre-left |
| K | Kristendemokraterne | Christian Democrats | Centre-right |

Below you can see the results and the colors chosen to represent the parties. Use these colors in your analysis above.

![Election results and the colors chosen for each party](image-1.png)


Others have undertaken similar analyses. You can draw inspiration from the following (use Google Translate if your Danish is rusty):

- [Analysis of where individual candidates stand relative to each other and their parties](https://v2022.dumdata.dk/)
- [Candidate Test 2022 – A deep dive into the data](https://kwedel.github.io/kandidattest2022/)
- [The Political Landscape 2019](https://kwedel.github.io/kandidattest2019/)



**Age of the candidates grouped by parties.**

In [1]:
import pandas as pd

# Load the TV2 responses and keep only party and age
tv2data = pd.read_excel('tv2data.xlsx')
tv2data = tv2data[['parti', 'alder']]

# Drop independents (Løsgænger), who do not belong to a party
tv2data = tv2data[tv2data['parti'] != 'Løsgænger']

# Mean age per party, rounded to one decimal, oldest party first
tv2data = tv2data.groupby('parti').mean()
tv2data['alder'] = tv2data['alder'].round(1)
tv2data = tv2data.sort_values(by='alder', ascending=False)

display(tv2data)

parti,alder
Danmarksdemokraterne,51.2
Kristendemokraterne,50.8
Alternativet,47.3
Venstre,47.3
Nye Borgerlige,46.9
Radikale Venstre,45.5
Dansk Folkeparti,45.3
Moderaterne,44.8
Socialdemokratiet,44.6
Det Konservative Folkeparti,42.8
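
The mean alone does not show how many candidates each party fields or how widely their ages vary. The following is a minimal sketch, assuming the same `parti` and `alder` columns as in the cell above. The helper name `age_summary` and the small inline DataFrame are made up for illustration and are not taken from `tv2data.xlsx`.

```python
import pandas as pd

def age_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-party candidate count plus mean, median, min and max age."""
    # Hypothetical helper; assumes columns 'parti' and 'alder'
    df = df[df['parti'] != 'Løsgænger']
    summary = df.groupby('parti')['alder'].agg(['count', 'mean', 'median', 'min', 'max'])
    summary['mean'] = summary['mean'].round(1)
    # Oldest party (by mean age) first
    return summary.sort_values(by='mean', ascending=False)

# Made-up toy data, only to show the shape of the result
toy = pd.DataFrame({
    'parti': ['Venstre', 'Venstre', 'Alternativet', 'Alternativet', 'Løsgænger'],
    'alder': [40, 50, 30, 36, 60],
})
print(age_summary(toy))
```

On the real data, the call would be `age_summary(pd.read_excel('tv2data.xlsx'))`. The `count` column helps judge how much weight a party's mean deserves when it has only a few candidates.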


**An overview of the most "confident" candidates, i.e., those with the highest proportion of "strongly agree" or "strongly disagree" responses.**

In [2]:
allCandidates = pd.read_excel('alldata.xlsx')
allCandidates = allCandidates.drop(columns=['alder', 'storkreds'])
# Work on a copy so the new column is not added to allCandidates as well
confidentCandidates = allCandidates.copy()

# Count the "strongly agree" (2) and "strongly disagree" (-2) answers in a row
def count_2_or_minus_2(row):
    # The string columns 'navn' and 'parti' never equal 2 or -2, so they add nothing
    return (row == 2).sum() + (row == -2).sum()

# axis=1 applies the function to each row, i.e. to each candidate
confidentCandidates['confident_answers'] = confidentCandidates.apply(count_2_or_minus_2, axis=1)

# Keep the ten candidates with the most confident answers
confidentCandidates = confidentCandidates[['navn', 'parti', 'confident_answers']]
confidentCandidates = confidentCandidates.sort_values(by='confident_answers', ascending=False)
confidentCandidates = confidentCandidates.head(10).reset_index(drop=True)

display(confidentCandidates)

,navn,parti,confident_answers
0,Sarah Nørris,Enhedslisten,49
1,Søren Vanting,Det Konservative Folkeparti,49
2,Kim Andkjær Doberck,Nye Borgerlige,44
3,Rashid Ali,"Frie Grønne, Danmarks Nye Venstrefløjsparti",43
4,Mohamed Abdikarim,"Frie Grønne, Danmarks Nye Venstrefløjsparti",43
5,Jan Filbært,Enhedslisten,42
6,Elise Bjerkrheim,"Frie Grønne, Danmarks Nye Venstrefløjsparti",42
7,John Bjerg,Nye Borgerlige,42
8,Lone Vase Langballe,Dansk Folkeparti,42
9,Frank Sørensen,Dansk Folkeparti,42
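
The task asks for the *proportion* of confident answers, while the cell above reports a raw count. The two rankings agree only if every candidate answered the same number of questions. Below is a minimal sketch of the proportion variant. The helper name `confident_share`, the `meta_cols` parameter, and the toy rows are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

def confident_share(df: pd.DataFrame, meta_cols=('navn', 'parti')) -> pd.DataFrame:
    """Share of each candidate's answers that are +2 or -2."""
    # Hypothetical helper; every column not in meta_cols is treated as a question
    questions = df.drop(columns=list(meta_cols))
    share = questions.isin([2, -2]).mean(axis=1)
    out = df[list(meta_cols)].copy()
    out['confident_share'] = share.round(3)
    return out.sort_values(by='confident_share', ascending=False).reset_index(drop=True)

# Made-up toy data with four questions per candidate
toy = pd.DataFrame({
    'navn': ['Candidate X', 'Candidate Y', 'Candidate Z'],
    'parti': ['Party 1', 'Party 2', 'Party 1'],
    'q1': [2, 0, -2],
    'q2': [-2, 1, 2],
    'q3': [2, -1, 0],
    'q4': [1, 2, 0],
})
print(confident_share(toy))
```

With the notebook's data, `confident_share(allCandidates)` would work on the frame after `alder` and `storkreds` have been dropped, since only `navn` and `parti` remain as non-question columns.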


**An overview of the most "neutral" candidates, i.e., those with the highest proportion of "neutral" responses.**

In [3]:
allCandidates = pd.read_excel('alldata.xlsx')
allCandidates = allCandidates.drop(columns=['alder', 'storkreds'])
# Work on a copy so the new column is not added to allCandidates as well
neutralCandidates = allCandidates.copy()

# Count the neutral (0) answers in a row
def count_0(row):
    return (row == 0).sum()

# axis=1 applies the function to each row, i.e. to each candidate
neutralCandidates['neutral_answers'] = neutralCandidates.apply(count_0, axis=1)

# Keep the ten candidates with the most neutral answers
neutralCandidates = neutralCandidates[['navn', 'parti', 'neutral_answers']]
neutralCandidates = neutralCandidates.sort_values(by='neutral_answers', ascending=False)
neutralCandidates = neutralCandidates.head(10).reset_index(drop=True)

display(neutralCandidates)

,navn,parti,neutral_answers
0,Barbara Krarup Hansen,Socialdemokratiet,32
1,Thorkild Holmboe-Hay,Socialdemokratiet,28
2,Kenneth Fredslund Petersen,Danmarksdemokraterne,23
3,Anne Hegelund,Enhedslisten,23
4,Kasper Sten Krebs,Alternativet,21
5,Dorthe Hecht,Enhedslisten,15
6,Kim Rasmussen,Moderaterne,14
7,Malte Larsen,Socialdemokratiet,14
8,Claus Buch,Venstre,14
9,Carsten Damgaard Møller,Moderaterne,14


**Differences in responses between candidates, both inter-party and intra-party, along with an explanation of which parties have the most internal disagreements.**

In [4]:
responses = pd.read_excel('alldata.xlsx')
responses = responses.drop(columns=['alder', 'storkreds', 'navn'])

# Drop individual candidates, who are not a part of a party
responses = responses[responses['parti'] != 'Løsgænger']

# INTER PARTY

responsesWithoutParty = responses.drop(columns=['parti'])

# Standard deviation of each question across all candidates
# (overall spread, not yet split by party)
responsesVariance = responsesWithoutParty.std()
print("Differences in responses inter-party")
display(responsesVariance)

sortedDifferences = responsesVariance.sort_values(ascending=False)

# Display the 10 questions on which the candidates disagreed most
print("Top 10 contradictory questions:")
display(sortedDifferences.head(10))

# ------------------------------------------------------------

# INTRA PARTY

# Standard deviation of each question within each party
responsesVariationsGroupedByParty = responses.groupby('parti').std()
print("Differences in responses between candidates intra-party")
display(responsesVariationsGroupedByParty)

# Average the per-question standard deviations to get one score per party
responsesVariationsGroupedByPartyMean = responsesVariationsGroupedByParty.mean(axis=1)

# Sort so the parties with the most internal disagreement come first
sortedMeans = responsesVariationsGroupedByPartyMean.sort_values(ascending=False)

# Display the 10 parties with the highest mean standard deviation
print("Top 10 Parties with most internal disagreements:")
display(sortedMeans.head(10))

Differences in responses inter-party


530    1.422987
531    1.616885
533    1.528374
534    1.380771
535    1.350876
537    1.543304
538    1.333816
540    1.522558
541    1.567395
543    1.525814
544    1.604332
545    1.379807
546    1.621371
547    1.458963
548    1.404736
550    1.588156
551    1.482317
552    1.462314
553    1.384817
555    1.658040
556    1.648914
557    1.178702
559    1.505487
561    1.405903
563    1.427938
1a     1.452737
1b     1.364897
2a     1.151963
2b     1.440227
3a     1.410110
3b     1.579852
4a     1.378342
4b     1.185192
5a     0.959794
5b     0.928012
6a     1.605280
6b     1.604415
7a     1.548218
7b     1.095970
8a     0.831373
8b     0.951641
9a     1.410224
9b     1.015459
10a    1.235777
10b    1.617698
11a    1.472809
11b    1.602712
12a    1.027731
12b    1.220717
dtype: float64

Top 10 contradictory questions:


555    1.658040
556    1.648914
546    1.621371
10b    1.617698
531    1.616885
6a     1.605280
6b     1.604415
544    1.604332
11b    1.602712
550    1.588156
dtype: float64

Differences in responses between candidates intra-party


parti,530,531,533,534,535,537,538,540,541,543,...,8a,8b,9a,9b,10a,10b,11a,11b,12a,12b
Alternativet,0.477442,0.983123,0.308188,0.490064,1.125744,0.569463,1.312019,1.300419,1.033584,0.856771,...,0.789747,0.832949,0.765906,1.023911,1.076663,1.086698,1.200972,1.223174,1.078342,1.172773
Danmarksdemokraterne,0.727,0.54525,0.901284,0.725966,0.494717,0.967179,0.419137,0.664411,0.3635,0.763271,...,0.57735,0.651863,0.554804,0.570812,0.68225,0.276725,0.164399,0.3148,0.65071,0.776919
Dansk Folkeparti,0.935414,0.837574,0.867497,0.596902,0.708908,1.172604,0.614452,1.094979,0.708908,1.226826,...,0.968026,1.023117,1.093036,1.158377,0.731344,0.765431,0.547878,0.455503,0.928113,1.193805
Det Konservative Folkeparti,1.081178,0.329489,1.271191,0.673504,0.389209,0.784341,0.142119,0.536193,1.028573,0.858431,...,0.786484,0.809488,1.017881,0.976098,0.762579,0.68312,0.422515,0.446248,0.783871,1.123408
Enhedslisten,0.258509,0.381932,0.274823,1.291902,0.625421,0.368362,1.292833,0.437293,0.538245,0.476007,...,0.955992,1.048703,0.417366,0.899319,0.957153,0.33796,1.003142,0.860504,1.061685,1.165339
"Frie Grønne, Danmarks Nye Venstrefløjsparti",0.271746,0.271746,0.0,0.470679,0.429669,0.0,0.651625,0.859338,0.884047,0.325813,...,1.176697,1.066987,0.928191,1.33359,1.55514,0.752432,1.070586,1.017539,1.058301,1.250846
Kristendemokraterne,1.086556,1.319029,1.110608,0.912235,0.94322,1.309307,1.167703,1.199351,1.309307,1.257036,...,0.53289,0.972622,1.199351,1.206111,0.912235,1.074463,1.023816,1.047368,0.923622,1.439649
Liberal Alliance,0.747667,0.167802,1.1457,0.740572,0.630652,0.302166,0.0,0.167802,0.882777,0.572127,...,0.924589,0.97643,0.856228,0.819534,0.557464,0.259399,0.203997,0.337142,0.649892,0.996993
Moderaterne,0.98331,0.957882,1.293242,0.297102,0.496796,0.882137,0.354169,0.739228,1.208756,0.771517,...,0.696765,0.912235,0.923622,1.018984,0.676896,1.001741,0.680319,0.696765,0.866528,1.164965
Nye Borgerlige,0.458471,0.237635,0.981396,0.529891,0.196039,0.756151,0.140028,0.325396,0.691687,0.737244,...,0.621825,0.621194,0.73458,0.623085,0.36729,0.930949,0.196039,0.0,0.941838,0.814573


Top 10 Parties with most internal disagreements:


parti
Kristendemokraterne            1.030463
Alternativet                   0.909648
Moderaterne                    0.890754
Dansk Folkeparti               0.888009
Det Konservative Folkeparti    0.887302
Radikale Venstre               0.857237
Venstre                        0.834677
Socialdemokratiet              0.825141
Liberal Alliance               0.730710
Enhedslisten                   0.678929
dtype: float64
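
The per-question standard deviation over all candidates shows which questions divide the field, but it does not say which *parties* are far apart. One way to look at inter-party differences directly is to compare the mean answer vector of each party. The sketch below assumes a frame shaped like `responses` (a `parti` column plus numeric question columns). The helper name `party_distance_matrix` and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd

def party_distance_matrix(df: pd.DataFrame, party_col: str = 'parti') -> pd.DataFrame:
    """Euclidean distance between the mean answer vectors of every pair of parties."""
    # One row per party: the mean answer to each question
    centroids = df.groupby(party_col).mean()
    values = centroids.to_numpy(dtype=float)
    # Pairwise differences via broadcasting: shape (parties, parties, questions)
    diff = values[:, None, :] - values[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return pd.DataFrame(dist, index=centroids.index, columns=centroids.index)

# Made-up toy data: P1 and P3 share the same mean position, P2 is opposite on q1
toy = pd.DataFrame({
    'parti': ['P1', 'P1', 'P2', 'P2', 'P3'],
    'q1': [2, 2, -2, -2, 2],
    'q2': [1, -1, 0, 0, 0],
})
print(party_distance_matrix(toy).round(2))
```

Note that P1 and P3 end up at distance 0 even though P1's members disagree on `q2`. Centroid distance measures inter-party difference only, which is why it complements the intra-party standard deviations above rather than replacing them.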

**Classification models to predict candidates' party affiliations. Investigate if there are any candidates who seem to be in the "wrong" party based on their political landscape positions. You must use the following three algorithms: *Decision Tree, Random Forest, and Gradient Boosted Tree*, and *two other* classification algorithms of your choice.**

In [5]:
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

candidatesData = pd.read_excel('alldata.xlsx')
candidatesData = candidatesData.drop(columns=['alder', 'storkreds', 'navn'])

# Features: the answers to all questions. The column names mix ints and strings,
# so cast them all to str for scikit-learn.
candidatesResponses = candidatesData.drop(columns=['parti'])
candidatesResponses.columns = candidatesResponses.columns.astype(str)

# Target: the party of each candidate
candidatesParty = candidatesData['parti']

# Use stratify to ensure that the class distribution is similar in both training and testing sets
X_train, X_test, y_train, y_test = train_test_split(candidatesResponses, candidatesParty, stratify=candidatesParty, random_state=42)

# Split the training data once more to obtain a validation set for model evaluation
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, stratify=y_train, random_state=42)


In [6]:
# DECISION TREE

# Train the decision tree classifier
decisionTree = DecisionTreeClassifier(max_depth=5, random_state=42)
decisionTree.fit(X_train, y_train)

# Evaluate the model
y_pred = decisionTree.predict(X_val)
accuracy = accuracy_score(y_val, y_pred)
print("Accuracy:", accuracy)


# Predict party affiliations for the held-out test set
candidatesPartyAffiliationsTree = decisionTree.predict(X_test)

# List test candidates whose predicted party differs from their actual party;
# these may be candidates in the "wrong" party, or simply model errors
for candidate, predictedParty, actualParty in zip(X_test.index, candidatesPartyAffiliationsTree, y_test):
    if actualParty != predictedParty:
        print(f"Candidate: {candidate}, Predicted Party: {predictedParty}, Actual Party: {actualParty}")


Accuracy: 0.6871165644171779
Candidate: 164, Predicted Party: Venstre, Actual Party: Moderaterne
Candidate: 357, Predicted Party: Alternativet, Actual Party: Frie Grønne, Danmarks Nye Venstrefløjsparti
Candidate: 778, Predicted Party: Socialistisk Folkeparti, Actual Party: Socialdemokratiet
Candidate: 806, Predicted Party: Alternativet, Actual Party: Frie Grønne, Danmarks Nye Venstrefløjsparti
Candidate: 230, Predicted Party: Venstre, Actual Party: Alternativet
Candidate: 65, Predicted Party: Nye Borgerlige, Actual Party: Liberal Alliance
Candidate: 519, Predicted Party: Nye Borgerlige, Actual Party: Venstre
Candidate: 3, Predicted Party: Socialdemokratiet, Actual Party: Alternativet
Candidate: 585, Predicted Party: Kristendemokraterne, Actual Party: Radikale Venstre
Candidate: 692, Predicted Party: Alternativet, Actual Party: Radikale Venstre
Candidate: 782, Predicted Party: Socialistisk Folkeparti, Actual Party: Alternativet
Candidate: 392, Predicted Party: Venstre, Actual Party: Det
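
The loop above identifies candidates only by their row index, because the `navn` column was dropped before training. Below is a minimal sketch of a reporting helper that puts the names back. It assumes the names are kept in a separate Series indexed like the original data. The function name `misclassified` and the toy values are hypothetical.

```python
import pandas as pd

def misclassified(names: pd.Series, y_true: pd.Series, y_pred) -> pd.DataFrame:
    """Table of candidates whose predicted party differs from the actual one."""
    # y_pred is aligned positionally with y_true, as returned by model.predict(X_test)
    report = pd.DataFrame({
        'navn': names.loc[y_true.index].to_numpy(),
        'actual': y_true.to_numpy(),
        'predicted': list(y_pred),
    }, index=y_true.index)
    return report[report['actual'] != report['predicted']]

# Made-up toy data: names for all rows, labels for a shuffled test subset
names = pd.Series(['Candidate W', 'Candidate X', 'Candidate Y', 'Candidate Z'], index=[0, 1, 2, 3])
y_true = pd.Series(['Venstre', 'Alternativet', 'Venstre'], index=[2, 0, 3])
y_pred = ['Venstre', 'Socialdemokratiet', 'Nye Borgerlige']
print(misclassified(names, y_true, y_pred))
```

In the notebook, `names` could be the `navn` column of `alldata.xlsx` saved before it is dropped, and the call would be `misclassified(names, y_test, decisionTree.predict(X_test))`.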

In [7]:
from sklearn.ensemble import RandomForestClassifier

# RANDOM FOREST

forest = RandomForestClassifier(n_estimators = 200, max_depth = 6, 
                                max_features = 20, random_state = 42)
forest.fit(X_train, y_train)

# Evaluate the model
y_pred = forest.predict(X_val)
accuracy = accuracy_score(y_val, y_pred)
print("Accuracy:", accuracy)

# Predict party affiliations for the held-out test set
candidatesPartyAffiliationsForest = forest.predict(X_test)

# List test candidates whose predicted party differs from their actual party;
# these may be candidates in the "wrong" party, or simply model errors
for candidate, predictedParty, actualParty in zip(X_test.index, candidatesPartyAffiliationsForest, y_test):
    if actualParty != predictedParty:
        print(f"Candidate: {candidate}, Predicted Party: {predictedParty}, Actual Party: {actualParty}")


Accuracy: 0.8957055214723927
Candidate: 1, Predicted Party: Enhedslisten, Actual Party: Socialistisk Folkeparti
Candidate: 65, Predicted Party: Nye Borgerlige, Actual Party: Liberal Alliance
Candidate: 519, Predicted Party: Dansk Folkeparti, Actual Party: Venstre
Candidate: 3, Predicted Party: Socialdemokratiet, Actual Party: Alternativet
Candidate: 782, Predicted Party: Socialistisk Folkeparti, Actual Party: Alternativet
Candidate: 92, Predicted Party: Liberal Alliance, Actual Party: Det Konservative Folkeparti
Candidate: 42, Predicted Party: Alternativet, Actual Party: Frie Grønne, Danmarks Nye Venstrefløjsparti
Candidate: 76, Predicted Party: Alternativet, Actual Party: Socialistisk Folkeparti
Candidate: 600, Predicted Party: Socialistisk Folkeparti, Actual Party: Socialdemokratiet
Candidate: 863, Predicted Party: Socialistisk Folkeparti, Actual Party: Socialdemokratiet
Candidate: 8, Predicted Party: Radikale Venstre, Actual Party: Løsgænger
Candidate: 391, Predicted Party: Venstre,

In [8]:
from sklearn.ensemble import GradientBoostingClassifier

# GRADIENT BOOSTED TREE

gbt = GradientBoostingClassifier(random_state = 42, n_estimators = 100,
                                max_depth = 2, learning_rate = 0.03)
gbt.fit(X_train, y_train)

# Evaluate the model
y_pred = gbt.predict(X_val)
accuracy = accuracy_score(y_val, y_pred)
print("Accuracy:", accuracy)

# Predict party affiliations for the held-out test set
candidatesPartyAffiliationsGBT = gbt.predict(X_test)

# List test candidates whose predicted party differs from their actual party;
# these may be candidates in the "wrong" party, or simply model errors
for candidate, predictedParty, actualParty in zip(X_test.index, candidatesPartyAffiliationsGBT, y_test):
    if actualParty != predictedParty:
        print(f"Candidate: {candidate}, Predicted Party: {predictedParty}, Actual Party: {actualParty}")


Accuracy: 0.8957055214723927
Candidate: 806, Predicted Party: Alternativet, Actual Party: Frie Grønne, Danmarks Nye Venstrefløjsparti
Candidate: 36, Predicted Party: Det Konservative Folkeparti, Actual Party: Venstre
Candidate: 1, Predicted Party: Enhedslisten, Actual Party: Socialistisk Folkeparti
Candidate: 296, Predicted Party: Alternativet, Actual Party: Radikale Venstre
Candidate: 519, Predicted Party: Dansk Folkeparti, Actual Party: Venstre
Candidate: 3, Predicted Party: Socialdemokratiet, Actual Party: Alternativet
Candidate: 782, Predicted Party: Radikale Venstre, Actual Party: Alternativet
Candidate: 92, Predicted Party: Liberal Alliance, Actual Party: Det Konservative Folkeparti
Candidate: 42, Predicted Party: Alternativet, Actual Party: Frie Grønne, Danmarks Nye Venstrefløjsparti
Candidate: 76, Predicted Party: Alternativet, Actual Party: Socialistisk Folkeparti
Candidate: 863, Predicted Party: Socialistisk Folkeparti, Actual Party: Socialdemokratiet
Candidate: 124, Predicte

In [9]:
from sklearn.neighbors import KNeighborsClassifier

# K NEAREST NEIGHBOUR

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Evaluate the model
y_pred = knn.predict(X_val)
accuracy = accuracy_score(y_val, y_pred)
print("Accuracy:", accuracy)

# Predict party affiliations for the held-out test set
candidatesPartyAffiliationsKNN = knn.predict(X_test)

# List test candidates whose predicted party differs from their actual party;
# these may be candidates in the "wrong" party, or simply model errors
for candidate, predictedParty, actualParty in zip(X_test.index, candidatesPartyAffiliationsKNN, y_test):
    if actualParty != predictedParty:
        print(f"Candidate: {candidate}, Predicted Party: {predictedParty}, Actual Party: {actualParty}")


Accuracy: 0.901840490797546
Candidate: 519, Predicted Party: Nye Borgerlige, Actual Party: Venstre
Candidate: 3, Predicted Party: Socialdemokratiet, Actual Party: Alternativet
Candidate: 782, Predicted Party: Enhedslisten, Actual Party: Alternativet
Candidate: 92, Predicted Party: Liberal Alliance, Actual Party: Det Konservative Folkeparti
Candidate: 42, Predicted Party: Alternativet, Actual Party: Frie Grønne, Danmarks Nye Venstrefløjsparti
Candidate: 8, Predicted Party: Radikale Venstre, Actual Party: Løsgænger
Candidate: 626, Predicted Party: Det Konservative Folkeparti, Actual Party: Venstre
Candidate: 155, Predicted Party: Danmarksdemokraterne, Actual Party: Dansk Folkeparti
Candidate: 680, Predicted Party: Liberal Alliance, Actual Party: Det Konservative Folkeparti
Candidate: 763, Predicted Party: Venstre, Actual Party: Det Konservative Folkeparti
Candidate: 139, Predicted Party: Det Konservative Folkeparti, Actual Party: Venstre
Candidate: 623, Predicted Party: Danmarksdemokrate

In [10]:
from sklearn.svm import SVC

# SVM

svc = SVC()
svc.fit(X_train, y_train)

# Evaluate the model
y_pred = svc.predict(X_val)
accuracy = accuracy_score(y_val, y_pred)
print("Accuracy:", accuracy)

# Predict party affiliations for the held-out test set
candidatesPartyAffiliationsSVC = svc.predict(X_test)

# List test candidates whose predicted party differs from their actual party;
# these may be candidates in the "wrong" party, or simply model errors
for candidate, predictedParty, actualParty in zip(X_test.index, candidatesPartyAffiliationsSVC, y_test):
    if actualParty != predictedParty:
        print(f"Candidate: {candidate}, Predicted Party: {predictedParty}, Actual Party: {actualParty}")


Accuracy: 0.9570552147239264
Candidate: 1, Predicted Party: Enhedslisten, Actual Party: Socialistisk Folkeparti
Candidate: 519, Predicted Party: Nye Borgerlige, Actual Party: Venstre
Candidate: 3, Predicted Party: Socialdemokratiet, Actual Party: Alternativet
Candidate: 92, Predicted Party: Liberal Alliance, Actual Party: Det Konservative Folkeparti
Candidate: 42, Predicted Party: Alternativet, Actual Party: Frie Grønne, Danmarks Nye Venstrefløjsparti
Candidate: 76, Predicted Party: Alternativet, Actual Party: Socialistisk Folkeparti
Candidate: 8, Predicted Party: Radikale Venstre, Actual Party: Løsgænger
Candidate: 626, Predicted Party: Det Konservative Folkeparti, Actual Party: Venstre
Candidate: 139, Predicted Party: Det Konservative Folkeparti, Actual Party: Venstre
Candidate: 343, Predicted Party: Venstre, Actual Party: Danmarksdemokraterne
Candidate: 500, Predicted Party: Alternativet, Actual Party: Enhedslisten
Candidate: 816, Predicted Party: Det Konservative Folkeparti, Actual
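
A single model's misclassification may just be a model error. A candidate flagged by several models is a stronger hint of sitting in the "wrong" party. In the printed outputs above, candidates 519 and 3 appear in the lists of all five models. The sketch below shows one possible way to count such agreement. The function name `consensus_flags`, the `min_models` threshold, and the toy dictionaries are assumptions for illustration, not part of the original analysis.

```python
from collections import Counter

def consensus_flags(actual: dict, predictions: dict, min_models: int = 3) -> dict:
    """Candidates misclassified by at least `min_models` models.

    actual:      {candidate_id: party}
    predictions: {model_name: {candidate_id: predicted_party}}
    Returns      {candidate_id: number_of_models_that_disagree}
    """
    votes = Counter()
    for model_preds in predictions.values():
        for candidate, party in model_preds.items():
            if party != actual[candidate]:
                votes[candidate] += 1
    return {candidate: n for candidate, n in votes.items() if n >= min_models}

# Made-up toy data: three candidates, three models
actual = {1: 'A', 2: 'B', 3: 'C'}
predictions = {
    'tree':   {1: 'A', 2: 'C', 3: 'A'},
    'forest': {1: 'A', 2: 'C', 3: 'C'},
    'knn':    {1: 'A', 2: 'C', 3: 'C'},
}
print(consensus_flags(actual, predictions, min_models=3))
```

With the notebook's variables, each inner dictionary could be built as `dict(zip(X_test.index, model.predict(X_test)))` and `actual` as `y_test.to_dict()`.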