Election Results

An election is held in a city, and everyone can vote for one or more candidates or choose not to vote at all. Each person has 1 vote, so if they vote for multiple candidates, their vote is split equally across those candidates. For example, if a person votes for 2 candidates, each of them receives the equivalent of 0.5 votes.
Find out who got the most votes and won the election. Output the name of the candidate, or multiple names in case of a tie. To avoid floating-point errors, you can round the number of votes received by each candidate to 3 decimal places.

In [1]:
import pandas as pd
import numpy as np

In [3]:
voting_results = pd.read_excel("../CSV/voting_results.xlsx")
voting_results.head()

Unnamed: 0,Tаблица 1,Unnamed: 1
0,voter,candidate
1,Kathy,
2,Charles,Ryan
3,Charles,Christine
4,Charles,Kathy


In [4]:
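# The real column names landed in the first data row,
# so map each placeholder header to the value in that row.
dic = {}
for i, j in voting_results[:1].items():
    dic[i] = j.values[0]
dic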

{'Tаблица 1': 'voter', 'Unnamed: 1': 'candidate'}

In [5]:
voting_results.rename(columns=dic, inplace=True)
voting_results.head()

Unnamed: 0,voter,candidate
0,voter,candidate
1,Kathy,
2,Charles,Ryan
3,Charles,Christine
4,Charles,Kathy


In [6]:
voting_results.drop(0, inplace=True)
voting_results.head()

Unnamed: 0,voter,candidate
1,Kathy,
2,Charles,Ryan
3,Charles,Christine
4,Charles,Kathy
5,Benjamin,Christine


In [7]:
voting_results = voting_results[~voting_results['candidate'].isna()]
voting_results.head()

Unnamed: 0,voter,candidate
2,Charles,Ryan
3,Charles,Christine
4,Charles,Kathy
5,Benjamin,Christine
6,Anthony,Paul


In [10]:
voting_results['vote_value'] = voting_results['voter'].apply(lambda x: 1/(voting_results['voter'] == x).sum())
voting_results.head(10)

Unnamed: 0,voter,candidate,vote_value
2,Charles,Ryan,0.333333
3,Charles,Christine,0.333333
4,Charles,Kathy,0.333333
5,Benjamin,Christine,1.0
6,Anthony,Paul,0.2
7,Anthony,Anthony,0.2
8,Edward,Ryan,0.333333
9,Edward,Paul,0.333333
10,Edward,Kathy,0.333333
12,Nancy,Ryan,0.2


In [11]:
voting_results = voting_results.groupby('candidate').sum().reset_index()
voting_results

Unnamed: 0,candidate,voter,vote_value
0,Anthony,AnthonyEvelynShirleyMarthaMarieAnthony,2.4
1,Christine,CharlesBenjaminNancyEvelynBobbyHelenAndrewAlan...,5.283333
2,Kathy,CharlesEdwardNancyShirleyHelenKevinKathy,1.95
3,Nicole,NancyEvelynShirleyHelenMatthewNicoleKathy,2.7
4,Paul,AnthonyEdwardNancyEvelynKevinAnthony,1.516667
5,Ryan,CharlesEdwardNancyShirleyBobbyHelenKevinAndrew...,5.15


In [12]:
voting_results['place'] = voting_results['vote_value'].round(3).rank(ascending=False, method='min')
voting_results

Unnamed: 0,candidate,voter,vote_value,place
0,Anthony,AnthonyEvelynShirleyMarthaMarieAnthony,2.4,4.0
1,Christine,CharlesBenjaminNancyEvelynBobbyHelenAndrewAlan...,5.283333,1.0
2,Kathy,CharlesEdwardNancyShirleyHelenKevinKathy,1.95,5.0
3,Nicole,NancyEvelynShirleyHelenMatthewNicoleKathy,2.7,3.0
4,Paul,AnthonyEdwardNancyEvelynKevinAnthony,1.516667,6.0
5,Ryan,CharlesEdwardNancyShirleyBobbyHelenKevinAndrew...,5.15,2.0


In [13]:
result = voting_results[voting_results['place'] == 1]['candidate']
result

1    Christine
Name: candidate, dtype: object

Solution Walkthrough
This problem involves analyzing voting data to determine which candidate received the most votes in an election. The input table has one row per (voter, candidate) pair; the fraction of a vote that each row is worth is not given and has to be computed.

The given code performs several steps to achieve this result. Let's break down the code and understand each part.

Understanding The Data
The voting data is stored in a table called voting_results. Once the header row has been fixed and the vote values have been computed, it has the following columns:

'voter' represents the name of the person who voted.
'candidate' represents the name of the candidate they voted for. It is empty for a person who did not vote.
'vote_value' represents the value of the vote (i.e., the fraction of a vote each candidate receives if the voter voted for multiple candidates). This column is not part of the input file; the code derives it.

The Problem Statement
The problem is to find out which candidate received the most votes and won the election. In case of a tie, the code should output multiple names. To avoid floating-point errors, the vote count for each candidate is rounded to 3 decimal places.

Breaking Down The Code
Let's break down the given code into smaller steps and understand their purpose.

voting_results = voting_results[~voting_results["candidate"].isna()]
This line removes any rows from the voting_results table where the 'candidate' column is empty, which are the rows of people who did not vote. The isna() method returns True for missing values and False otherwise. The ~ operator inverts that boolean mask, so ~voting_results["candidate"].isna() keeps only the rows where a candidate is present.
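
To make the mask and its negation concrete, here is a minimal sketch on a made-up three-row table (the data and the names votes, mask, kept and same are illustrative, not taken from the notebook). It also shows dropna(subset=...), which should give the same rows.

```python
import pandas as pd

# Hypothetical toy data: Kathy did not vote, so her candidate is missing.
votes = pd.DataFrame(
    {
        "voter": ["Kathy", "Charles", "Charles"],
        "candidate": [None, "Ryan", "Christine"],
    }
)

mask = votes["candidate"].isna()            # True where no candidate was chosen
kept = votes[~mask]                         # keep rows where the mask is False
same = votes.dropna(subset=["candidate"])   # built-in shortcut for the same filter
```

Both kept and same retain the original row labels, which is why the notebook output starts at index 2 after this step.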

voting_results["vote_value"] = voting_results["voter"].apply(
    lambda x: 1 / (voting_results["voter"] == x).sum()
)
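This statement calculates the vote value for each row in the voting_results table. The 'vote_value' column is created using the apply() method, which applies a function to each value in the 'voter' column. For each voter name x, the lambda function (lambda x: 1/(voting_results['voter'] == x).sum()) counts how many rows belong to that voter and divides 1 by that count, so a voter with three rows contributes 1/3 to each of their candidates. Because the whole column is scanned once per row, the cost grows quadratically with the number of rows, which is fine for a small table but slow for a large one.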
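
As a possible alternative to the per-row lambda, the same split can be sketched with groupby(...).transform("count"), which counts each voter's rows once and broadcasts the count back to every row. The toy data below is invented for illustration; rows_per_voter is a name introduced here, not one used in the notebook.

```python
import pandas as pd

# Hypothetical toy data: Charles splits his vote three ways, Benjamin does not split.
votes = pd.DataFrame(
    {
        "voter": ["Charles", "Charles", "Charles", "Benjamin"],
        "candidate": ["Ryan", "Christine", "Kathy", "Christine"],
    }
)

# Count the candidate rows of each voter and align the count with every row.
rows_per_voter = votes.groupby("voter")["candidate"].transform("count")
votes["vote_value"] = 1 / rows_per_voter
```

This assumes rows with a missing candidate have already been removed, as in the previous step, since "count" ignores missing values.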

voting_results = (
    voting_results.groupby("candidate").sum().reset_index()
)
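This statement groups the data by the 'candidate' column and sums every remaining column within each group. For 'vote_value' this yields the total number of votes per candidate. The 'voter' column holds strings, so summing it concatenates the voter names, which is why the output shows values such as CharlesEdwardKathy-style run-together names; that column carries no useful information after this step. The reset_index() method then turns 'candidate' from the index back into a regular column.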
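
If the concatenated 'voter' column is unwanted, one option is to select 'vote_value' before summing. The sketch below uses invented data and a hypothetical name, totals, to show the idea.

```python
import pandas as pd

# Hypothetical toy data with vote values already computed.
votes = pd.DataFrame(
    {
        "voter": ["Charles", "Charles", "Benjamin"],
        "candidate": ["Ryan", "Christine", "Christine"],
        "vote_value": [0.5, 0.5, 1.0],
    }
)

# Sum only the numeric column; as_index=False keeps 'candidate' as a column.
totals = votes.groupby("candidate", as_index=False)["vote_value"].sum()
```

The result has just the 'candidate' and 'vote_value' columns, with groups sorted by candidate name.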

voting_results["place"] = (
    voting_results["vote_value"]
    .round(3)
    .rank(ascending=False, method="min")
)
This statement assigns a place to each candidate based on their vote value. The 'vote_value' column is first rounded to 3 decimal places using the round() method, so that totals which differ only by floating-point noise compare as equal. The rank() method then assigns a rank to each value; ascending=False ranks in descending order, so the highest value receives rank 1. The method='min' parameter gives tied candidates the same place, namely the smallest (best) rank in the tied group, so two candidates tied for first both get place 1 and the next candidate gets place 3. The resulting ranks are stored in the 'place' column.
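
The effect of rounding before ranking can be seen in a small sketch with invented totals. Candidate A's total is built as 0.1 + 0.2, which in floating point is slightly larger than 0.3, so without rounding A and B would not be treated as tied.

```python
import pandas as pd

# Hypothetical totals: A and B are "really" tied at 0.3 votes.
totals = pd.Series({"A": 0.1 + 0.2, "B": 0.3, "C": 0.25})

# Ranking the raw floats separates A and B because of floating-point noise.
raw_place = totals.rank(ascending=False, method="min")

# Rounding first makes the tie visible; method="min" gives both place 1.
place = totals.round(3).rank(ascending=False, method="min")
```

With method="min", the place after a two-way tie for first is 3, not 2.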

result = voting_results[voting_results["place"] == 1]["candidate"]
This line selects the candidate(s) with a place of 1 (i.e., the candidate(s) with the most votes) from the voting_results table. The condition voting_results['place'] == 1 selects rows where the 'place' column is equal to 1, and the ['candidate'] portion selects the 'candidate' column from those rows.
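
A variation that skips the 'place' column is to compare the rounded totals with their maximum directly. This is only a sketch on made-up totals, with winners as an illustrative name; it returns a plain list, which may be more convenient when several names have to be printed.

```python
import pandas as pd

# Hypothetical totals in which A and B tie for first place.
totals = pd.DataFrame(
    {"candidate": ["A", "B", "C"], "vote_value": [2.5, 2.5, 1.0]}
)

rounded = totals["vote_value"].round(3)
# Keep every candidate whose rounded total equals the highest rounded total.
winners = totals.loc[rounded == rounded.max(), "candidate"].tolist()
```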

Bringing It All Together
The code first removes any rows from the voting_results table where the 'candidate' column is empty. It then calculates the vote value for each row as 1 divided by the number of rows that voter has. Next, it groups the data by the 'candidate' column and sums the vote values in each group. It rounds the totals to 3 decimal places and assigns a place to each candidate, with tied candidates sharing the best place in their group. Finally, it selects the candidate(s) with a place of 1 (i.e., the candidate(s) with the most votes) and assigns them to the result variable.
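
The whole pipeline can be condensed into one function. The sketch below is not the notebook's code: find_winners and the sample table are hypothetical, and it assumes the input already has clean 'voter' and 'candidate' columns.

```python
import pandas as pd


def find_winners(votes: pd.DataFrame) -> list[str]:
    """Return the candidate(s) with the highest split-vote total."""
    # Drop people who did not vote for anyone.
    cast = votes.dropna(subset=["candidate"]).copy()
    # Each voter's single vote is split equally across their rows.
    cast["vote_value"] = 1 / cast.groupby("voter")["candidate"].transform("count")
    # Total per candidate, rounded to avoid floating-point noise in comparisons.
    totals = cast.groupby("candidate")["vote_value"].sum().round(3)
    return sorted(totals[totals == totals.max()].index)


# Hypothetical sample: Charles splits his vote, Kathy abstains.
sample = pd.DataFrame(
    {
        "voter": ["Kathy", "Charles", "Charles", "Benjamin", "Anthony"],
        "candidate": [None, "Ryan", "Christine", "Christine", "Ryan"],
    }
)
winners = find_winners(sample)
```

In this sample Ryan and Christine each end up with 1.5 votes, so both names are returned.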

Conclusion
The given code analyzes voting data to determine the candidate(s) with the most votes and outputs their name(s). The code performs several steps, including removing rows with missing candidate names, calculating vote values, grouping and summing the data by candidate, assigning ranks based on vote values, and selecting the candidate(s) with the highest rank.