# Candidate Test 2022 Analysis Part 1

This exercise focuses on the candidate tests from two television networks: DR and TV2. Responses in both tests are coded on a five-point scale (-2, -1, 0, 1, 2), where -2 means "strongly disagree" and 2 means "strongly agree".

---

There are 6 datasets included in this exercise:

- `alldata.xlsx`: Contains responses from both TV stations.
- `drdata.xlsx`: Contains responses from DR.
- `drq.xlsx`: Contains questions from DR.
- `tv2data.xlsx`: Contains responses from TV2.
- `tv2q.xlsx`: Contains questions from TV2.
- `electeddata.xlsx`: Contains responses from both TV stations for candidates who were elected to the parliament. Note that 9 members are missing; 7 of them didn't take any of the tests. Additionally, some notable figures like Mette F. and Lars Løkke did not participate in any of the tests.

---

It's entirely up to you how you approach this data, but at a *minimum*, your analysis should include:
- Age of the candidates grouped by parties.
- An overview of the most "confident" candidates, i.e., those with the highest proportion of "strongly agree" or "strongly disagree" responses.
- Differences in responses between candidates, both inter-party and intra-party, along with an explanation of which parties have the most internal disagreements.
- Classification models to predict candidates' party affiliations. Investigate whether any candidates seem to be in the "wrong" party based on their positions in the political landscape. You must use the following three algorithms: **Decision Tree, Random Forest, and Gradient Boosted Tree**, plus **two other** classification algorithms of your choice, for a total of five trained models.
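
As a starting point for the classification requirement, the sketch below compares the three mandatory tree-based models with two additional ones and flags candidates whose out-of-fold prediction disagrees with their actual party. It runs on synthetic data, not on the survey files: the party labels `P1` to `P3`, the question names `q0` to `q9`, the choice of logistic regression and k-nearest neighbours as the two extra models, and the "at least three models disagree" threshold are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the survey (assumption): 3 hypothetical parties,
# 40 candidates each, 10 questions answered on the -2..2 scale.
profiles = {"P1": -1.5, "P2": 0.0, "P3": 1.5}
frames = []
for party, centre in profiles.items():
    answers = np.clip(np.rint(rng.normal(centre, 0.8, size=(40, 10))), -2, 2)
    frame = pd.DataFrame(answers, columns=[f"q{i}" for i in range(10)])
    frame["parti"] = party
    frames.append(frame)
data = pd.concat(frames, ignore_index=True)

X = data.drop(columns="parti")
y = data["parti"]

# Three required models plus two freely chosen ones (assumption).
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosted Trees": GradientBoostingClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "k-Nearest Neighbours": KNeighborsClassifier(n_neighbors=5),
}

# Out-of-fold predictions: each candidate is predicted by a model
# that did not see that candidate during training.
predictions = {}
for name, model in models.items():
    predictions[name] = cross_val_predict(model, X, y, cv=5)
    accuracy = (predictions[name] == y).mean()
    print(f"{name}: accuracy {accuracy:.2f}")

# Count, per candidate, how many models predict a party other than the real one.
votes = pd.DataFrame(predictions, index=data.index)
misplaced = votes.ne(y, axis=0).sum(axis=1)

# Candidates that a majority of the models would place elsewhere.
flagged = data.loc[misplaced >= 3, "parti"]
print(flagged)
```

On the real data, `X` would be the question columns of `alldata.xlsx` and `y` the `parti` column; small parties may need fewer folds than `cv=5` for stratified splitting to work.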

---

The following parties are represented:

| Party letter | Party name | Party name (English) | Political position |
| :-: | :-: | :-: | :-: |
| A | Socialdemokratiet | Social Democrats | Centre-left |
| V | Venstre | Danish Liberal Party | Centre-right |
| M | Moderaterne | Moderates | Centre-right |
| F | Socialistisk Folkeparti | Socialist People's Party | Left-wing |
| Æ | Danmarksdemokraterne | Denmark Democrats | Right-wing |
| I | Liberal Alliance | Liberal Alliance | Right-wing |
| C | Konservative | Conservative People's Party | Right-wing |
| Ø | Enhedslisten | Red-Green Alliance | Far-left |
| B | Radikale Venstre | Social Liberal Party | Centre-left |
| D | Nye Borgerlige | New Right | Far-right |
| Å | Alternativet | The Alternative | Centre-left |
| O | Dansk Folkeparti | Danish People's Party | Far-right |
| Q | Frie Grønne | Free Greens | Centre-left |
| K | Kristendemokraterne | Christian Democrats | Centre-right |

Below you can see the results and the colors chosen to represent the parties. Use these colors in your analysis above.

![Alt text](image-1.png)


Others have undertaken similar analyses. You can draw inspiration from the following (use Google Translate if your Danish is rusty):

- [Analysis of where individual candidates stand relative to each other and their parties](https://v2022.dumdata.dk/)
- [Candidate Test 2022 – A deep dive into the data](https://kwedel.github.io/kandidattest2022/)
- [The Political Landscape 2019](https://kwedel.github.io/kandidattest2019/)
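
For the inter-party and intra-party comparison requested in the task list, one simple measure is the spread of answers within each party. The sketch below computes the standard deviation of every question per party and averages it across questions, so a higher value suggests more internal disagreement. The toy DataFrame, the party labels, and the question names `q1` and `q2` are assumptions standing in for `alldata.xlsx`; other measures (for example pairwise distances between candidates) would be equally defensible.

```python
import pandas as pd

# Toy data standing in for alldata.xlsx (assumption): a 'parti' column
# plus two hypothetical question columns on the -2..2 scale.
toy = pd.DataFrame({
    "parti": ["A", "A", "A", "B", "B", "B"],
    "q1": [2, 2, 2, -2, 0, 2],
    "q2": [1, 1, 1, -1, 1, 2],
})
question_cols = ["q1", "q2"]

# Intra-party view: standard deviation of each question within each party,
# averaged over all questions. Higher means more internal disagreement.
per_question_std = toy.groupby("parti")[question_cols].std()
disagreement = per_question_std.mean(axis=1).sort_values(ascending=False)

# Inter-party view: the mean position of each party on each question.
party_means = toy.groupby("parti")[question_cols].mean()

print(disagreement)
print(party_means)
```

In this toy example party `A` answers identically on every question and gets a score of 0, while party `B` is split and ranks first.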



In [23]:
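import pandas as pd

# Load the combined responses plus the per-network responses and question texts
all_data = pd.read_excel('alldata.xlsx')
dr_data = pd.read_excel('drdata.xlsx')
drq = pd.read_excel('drq.xlsx')
tv2_data = pd.read_excel('tv2data.xlsx')
tv2q = pd.read_excel('tv2q.xlsx')

# Print each DataFrame for a first look at its rows and columns
print(all_data)
print(dr_data)
print(drq)
print(tv2_data)
print(tv2q)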


     530  531  533  534  535  537  538  540  541  543  ...  9a  9b  10a  10b  \
0     -1   -2    1   -2    2    1   -2    1    1    2  ...   2   0    1   -2   
1      2    2   -1   -2   -1   -2    1   -2    2   -2  ...  -2   0   -1    2   
2      2    1   -2   -2    1   -2    1   -1    1   -1  ...  -1  -2    0    2   
3      2    1   -2   -1    1    1    1    1    1   -2  ...  -2   2    2    2   
4      1    1   -2    2   -2    1   -2    1    2   -2  ...  -2   0   -1    0   
..   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ..  ..  ...  ...   
862    1    1    1   -2    2   -1    1   -2    1    1  ...   1   2   -1    2   
863    1   -2   -2   -2    1   -2   -1   -2   -2   -2  ...  -1   0   -1    2   
864    1    1    1   -2    2   -1    1   -1   -1    1  ...  -1   0   -1    2   
865    1   -1    1   -2    1   -1   -1   -1   -1    1  ...   0   0    0    2   
866    1    1   -1   -2    1   -1    1   -1    1   -1  ...  -2  -1   -1    2   

     11a  11b  12a  12b                

In [26]:
# Loading the entire Excel file
df = pd.read_excel('alldata.xlsx')

# Drop the metadata columns 'parti', 'storkreds' and 'alder' by name
df_filtered = df.drop(columns=['parti', 'storkreds', 'alder'])

# Print the resulting DataFrame
print(df_filtered)


     530  531  533  534  535  537  538  540  541  543  ...  8a  8b  9a  9b  \
0     -1   -2    1   -2    2    1   -2    1    1    2  ...   1   0   2   0   
1      2    2   -1   -2   -1   -2    1   -2    2   -2  ...   0   0  -2   0   
2      2    1   -2   -2    1   -2    1   -1    1   -1  ...   1   1  -1  -2   
3      2    1   -2   -1    1    1    1    1    1   -2  ...   2   2  -2   2   
4      1    1   -2    2   -2    1   -2    1    2   -2  ...   1   0  -2   0   
..   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ..  ..  ..  ..   
862    1    1    1   -2    2   -1    1   -2    1    1  ...   2   1   1   2   
863    1   -2   -2   -2    1   -2   -1   -2   -2   -2  ...   1   0  -1   0   
864    1    1    1   -2    2   -1    1   -1   -1    1  ...   1   0  -1   0   
865    1   -1    1   -2    1   -1   -1   -1   -1    1  ...   0   0   0   0   
866    1    1   -1   -2    1   -1    1   -1    1   -1  ...   1   1  -2  -1   

     10a  10b  11a  11b  12a  12b  
0      1   -2   -2    1    
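
The brief defines "confident" candidates by the proportion of "strongly agree" or "strongly disagree" answers. A minimal sketch of that calculation follows, using toy data with hypothetical candidate names and question columns `q1` to `q4`. On the real file the metadata columns to drop would be `navn`, `parti`, `storkreds` and `alder`.

```python
import pandas as pd

# Toy data (assumption): three hypothetical candidates, four questions.
toy = pd.DataFrame({
    "navn": ["Candidate X", "Candidate Y", "Candidate Z"],
    "parti": ["A", "B", "A"],
    "q1": [2, 1, 0],
    "q2": [-2, 2, 1],
    "q3": [2, 0, -1],
    "q4": [1, -2, 0],
})

# Keep only the answer columns.
answers = toy.drop(columns=["navn", "parti"])

# An answer is "strong" when its absolute value is 2; the row-wise mean of
# that boolean mask is the share of strong answers per candidate.
strong_share = answers.abs().eq(2).mean(axis=1)

result = (
    toy[["navn"]]
    .assign(strong_share=strong_share)
    .sort_values("strong_share", ascending=False)
)
print(result)
```

A share is comparable across candidates even if the number of answered questions differs, which a raw count is not.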

In [2]:
import pandas as pd

# Loading the Excel file into a DataFrame
df = pd.read_excel('alldata.xlsx')

# Defining a function to count 2's and -2's in each row ("strongly agree" and "strongly disagree" answers)
def count_values(row):
    count_2 = (row == 2).sum()  # Count occurrences of 2
    count_neg_2 = (row == -2).sum()  # Count occurrences of -2
    return pd.Series({'Count_2': count_2, 'Count_-2': count_neg_2})

# Applying the function row-wise, excluding the irrelevant, non-numerical columns
df_counts = df.drop(columns=['navn', 'parti', 'storkreds', 'alder']).apply(count_values, axis=1)

# Concatenating the counts with the 'navn' column
df_final = pd.concat([df['navn'], df_counts], axis=1)

# Grouping by 'navn' if there are multiple rows with the same name
df_grouped = df_final.groupby('navn').sum().reset_index()

# Creating a new column 'Total_Count' that sums 'Count_2' and 'Count_-2'
df_grouped['Total_Count'] = df_grouped['Count_2'] + df_grouped['Count_-2']

# Sorting the DataFrame by 'Total_Count' in descending order
df_sorted = df_grouped.sort_values(by='Total_Count', ascending=False)

# Printing the first 50 rows of the sorted DataFrame
print(df_sorted.head(50))



                               navn  Count_2  Count_-2  Total_Count
801                   Søren Vanting       18        31           49
735                    Sarah Nørris       24        25           49
397             Kim Andkjær Doberck       22        22           44
596               Mohamed Abdikarim       21        22           43
689                      Rashid Ali       23        20           43
337                      John Bjerg       18        24           42
298                     Jan Filbært       19        23           42
482             Lone Vase Langballe       23        19           42
213                  Frank Sørensen       25        17           42
189                Elise Bjerkrheim       23        19           42
752                     Simon Hampe       21        20           41
687               Qasam Nazir Ahmad       21        20           41
55                     Asham Nadeem       20        21           41
399                Kim Christiansen       23    