# Challenge Problem: Simulating the Effect of Low Voter Turnout

## Basic assumptions

Let $\mathcal{R}$ be the set of **all** registered voters and let $\mathcal{V}=\{v_i\}$ be the set of their votes, where $v_i\in\mathcal{C}$ is the vote of voter $\mathcal{R}_i$ and represents that voter's **opinion**. $\mathcal{C}:=\{A,B\}$ is the set of candidates, with $A$ and $B$ being values that represent a vote for candidate $A$ or $B$, respectively.

The **preferred candidate** is the candidate who collects the most votes out of $\mathcal{V}$.


## Voting procedure

Now let $\mathcal{R}_v\subset\mathcal{R}$ be the subset of registered voters who actually voted, and let $\mathcal{V}_v\subset\mathcal{V}$ be the set of their votes.

The **winner of the election** is the candidate who collects the most votes out of $\mathcal{V}_v$.

## Research question

1. Assume 45% (47.5%) of $\mathcal{V}$ are votes for candidate $A$ and 55% (52.5%) of $\mathcal{V}$ are votes for candidate $B$ (everybody has a favorite candidate; nobody is undecided).
2. Assume further that only a random subset $\mathcal{R}_v$ with $|\mathcal{R}_v| = |\mathcal{R}|/3$ actually votes, i.e., only one third of the registered voters vote.

What is the probability that the **winner of the election** is **NOT** the **preferred candidate** of the electorate?
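
Before simulating, a rough analytic estimate can serve as a cross-check. The sketch below is not part of the original notebook. It assumes that voters are drawn without replacement, that a tie does not count as a win for $A$, and that the normal approximation to the vote share is adequate; the function name `upset_probability` is hypothetical.

```python
import math

def upset_probability(share_a, n_registered, turnout):
    """Normal-approximation estimate of P(minority candidate A wins).

    Assumes voters are drawn without replacement and ties are not wins.
    """
    m = int(n_registered * turnout)          # number of actual voters
    if m < 1:
        raise ValueError("turnout too low: nobody votes")
    # finite-population correction: the variance shrinks as m approaches N
    fpc = (n_registered - m) / (n_registered - 1) if n_registered > 1 else 0.0
    sd = math.sqrt(share_a * (1.0 - share_a) / m * fpc)
    if sd == 0.0:                            # everybody votes: no randomness left
        return 1.0 if share_a > 0.5 else 0.0
    z = (0.5 - share_a) / sd                 # distance from A's share to 50% in standard deviations
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # upper tail of the standard normal

for share in (0.45, 0.475):
    print(share, upset_probability(share, 1_000_000, 1/3))
```

Under these assumptions the estimate for one million registered voters is vanishingly small, so any simulation with a feasible number of elections should report essentially no wins for $A$ at one-third turnout. The approximation is poor for very small voter counts.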

## Approach

Develop

1. a function that randomly generates a list $\mathcal{R}_v$ and their votes $\mathcal{V}_v$ by picking the respective votes out of $\mathcal{V}$,
2. a function that analyzes a set $\mathcal{V}$ (or $\mathcal{V}_v$) and returns the **winner of the election**,
3. a driver function that takes a `number of elections` as argument, simulates that many elections with random voters, and collects the election results,
4. a function that analyzes the set of election results for the probability of each candidate being the **winner of the election**.

**Hints**: Plausibility test cases.
- If $\mathcal{R}_v = \mathcal{R}$, then $\mathcal{V}_v = \mathcal{V}$, and the **preferred candidate** should win 100% of the elections.
- If $\mathcal{R}_v$ consists of a single voter, each candidate wins with probability equal to that candidate's share of $\mathcal{V}$: the **preferred candidate** and the **opposing candidate** should win in 55% (52.5%) and 45% (47.5%) of the elections, respectively.
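
The two plausibility cases can be checked on a toy electorate before running the full simulation. This is a minimal sketch, not the notebook's own code: the electorate of 1000 voters, the seed, and the helper `b_win_rate` are assumptions chosen for illustration. A single randomly chosen voter picks each candidate with probability equal to that candidate's share of the votes.

```python
import numpy as np

rng = np.random.default_rng(0)               # fixed seed, chosen arbitrarily
votes = np.array(['A'] * 450 + ['B'] * 550)  # toy electorate: 45% A, 55% B

def b_win_rate(n_voters, n_elections=20_000):
    """Fraction of simulated elections won by B, drawing n_voters without replacement."""
    wins = 0
    for _ in range(n_elections):
        sample = rng.choice(votes, size=n_voters, replace=False)
        n_b = np.count_nonzero(sample == 'B')
        wins += n_b > n_voters - n_b         # strict majority; ties are not wins
    return wins / n_elections

full = b_win_rate(len(votes), n_elections=100)   # everybody votes
single = b_win_rate(1)                           # one voter decides
print(f"full turnout: B wins {100*full:.2f}%, single voter: B wins {100*single:.2f}%")
```

With full turnout the sample is the whole electorate, so B wins every time; with one voter the win rate should be close to B's share of 55%.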

## Defining the electorate and their choices

In [31]:
import numpy as np

# candidate pool; the repeating pattern sets the vote shares
# 9:11 gives 45% A and 55% B
C = ('A',)*9 + ('B',)*11
# 9:10 gives 47.37% A and 52.63% B (close to the 47.5% : 52.5% case);
# this second assignment overrides the first one
C = ('A',)*9 + ('B',)*10

# voters
N_voters = 1000000

R = np.arange(N_voters)
V = np.array([ C[k%len(C)] for k in range(N_voters) ])

# reduce to unique candidates
C = ('A','B')

In [32]:
# verify that the generated votes V have the intended shares

for candidate in C:
    print("Candidate {}\t{:6.2f}%".format(candidate,100*np.count_nonzero(V==candidate)/N_voters))

Candidate A	 47.37%
Candidate B	 52.63%


## Defining election functions

In [3]:
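def pick_voters(R,V,turnout=1./3.):
    """Return a random group of voters Rv and their votes Vv."""
    
    N = len(R)              # number of registered voters
    M = int(N*turnout)      # number of actual voters (truncated)
    # note: randint draws indices WITH replacement, so the same voter may be picked more than once
    idx = np.random.randint(0,N,size=M)
    Rv = R[idx]
    Vv = V[idx]
    
    return (Rv,Vv)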
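
Since $\mathcal{R}_v$ is defined as a subset of $\mathcal{R}$, each voter should appear at most once, whereas `np.random.randint` can return the same index repeatedly. A possible variant that samples without replacement is sketched below. The name `pick_voters_unique` and the optional `rng` argument are hypothetical and not part of the original notebook; the outputs shown elsewhere in this notebook were produced with the original `pick_voters`.

```python
import numpy as np

def pick_voters_unique(R, V, turnout=1./3., rng=None):
    """Hypothetical variant of pick_voters: each voter is drawn at most once."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(R)
    M = int(round(N * turnout))                  # round instead of truncating
    idx = rng.choice(N, size=M, replace=False)   # M distinct indices from 0..N-1
    return (R[idx], V[idx])

# small usage example with a toy electorate of 10 voters
R_demo = np.arange(10)
V_demo = np.array(list("AAAABBBBBB"))
Rv, Vv = pick_voters_unique(R_demo, V_demo, turnout=1.0, rng=np.random.default_rng(1))
print(sorted(Rv.tolist()))   # every voter appears exactly once at full turnout
```

With this variant, full turnout reproduces $\mathcal{V}_v = \mathcal{V}$ exactly, so the vote shares no longer fluctuate in that test case.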

In [4]:
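def count_election(Rv,Vv,Candidates):
    """Return the candidate with the largest vote share among the votes Vv."""
    N = len(Rv)
    
    winner = {'name':   'unknown',
              'result': 0.00}
    
    for candidate in Candidates:
        # vote share of this candidate
        result = sum(1 for i in Vv if i==candidate)/N
        # strict '>': on a tie, the candidate listed first in Candidates stays the winner
        if result > winner['result']:
            winner['name']   = candidate
            winner['result'] = result
            
    return winner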
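
The strict comparison in `count_election` silently awards a tie to whichever candidate is listed first, which matters when only a handful of people vote. One way to make ties visible is sketched below. The function name `count_election_with_ties` and the labels `'tie'` and `'nobody voted'` are hypothetical choices, not part of the original notebook.

```python
import numpy as np

def count_election_with_ties(Vv, Candidates):
    """Hypothetical variant of count_election that reports ties explicitly."""
    N = len(Vv)
    if N == 0:                                   # avoid dividing by zero
        return {'name': 'nobody voted', 'result': 0.0}
    # vote share per candidate; Vv is assumed to be a NumPy array
    shares = {c: np.count_nonzero(Vv == c) / N for c in Candidates}
    best = max(shares.values())
    leaders = [c for c in Candidates if shares[c] == best]
    name = leaders[0] if len(leaders) == 1 else 'tie'
    return {'name': name, 'result': best}

print(count_election_with_ties(np.array(list("AABB")), ('A', 'B')))
print(count_election_with_ties(np.array(list("ABB")), ('A', 'B')))
```

Because `analyze_elections` only counts winners whose name is a known candidate, a result labeled `'tie'` would count toward the total number of elections without being credited to either candidate.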
        

In [5]:
def run_elections(R,V,Candidates,n=1,turnout=1./3.):
    results = []
    
    for e in range(n):
        (Rv,Vv) = pick_voters(R,V,turnout)
        winner  = count_election(Rv,Vv,Candidates)
        results.append(winner)
    
    return results

In [11]:
def analyze_elections(Candidates,Results):
    """Count wins per candidate and convert them to percentages of all elections."""
    ans = {}
    # one record per candidate
    for candidate in Candidates:
        rec = {'name':candidate, 'wins':0, 'percent_wins':0.0, 'results':[]}
        ans[candidate] = rec
        
    # tally wins and keep the winner's vote share of each election
    for winner in Results:
        if winner['name'] in ans:
            rec = ans[winner['name']]
            rec['wins'] += 1
            rec['results'].append(winner['result'])
    
    # convert win counts to percentages
    for candidate in Candidates:
        rec = ans[candidate]
        rec['percent_wins'] = 100.*rec['wins']/len(Results)
        
    return ans

## Run the election simulation

In [33]:
# full turnout: the preferred candidate B should win every election
results = run_elections(R,V,C,10,1.0)
analysis = analyze_elections(C,results)

# print(analysis)

for candidate in analysis:
    print("Candidate {name}\twins {percent_wins:6.2f}% of the time".format(**analysis[candidate]))
    
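# vote share of candidate B in each election
# (varies slightly because pick_voters samples with replacement)
print(analysis[C[1]]['results'])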

Candidate A	wins   0.00% of the time
Candidate B	wins 100.00% of the time
[0.526676, 0.525981, 0.526498, 0.526727, 0.52656, 0.525674, 0.52653, 0.526725, 0.52593, 0.526365]


In [34]:
# a single voter per election: each candidate should win in proportion to their vote share
results = run_elections(R,V,C,10000,1./N_voters)
analysis = analyze_elections(C,results)

# print(analysis)

for candidate in analysis:
    print("Candidate {name}\twins {percent_wins:6.2f}% of the time".format(**analysis[candidate]))

Candidate A	wins  46.72% of the time
Candidate B	wins  53.28% of the time


In [35]:
# 10 voters per election (an even number, so ties are possible)
results = run_elections(R,V,C,10000,0.00001)
analysis = analyze_elections(C,results)

# print(analysis)

for candidate in analysis:
    print("Candidate {name}\twins {percent_wins:6.2f}% of the time".format(**analysis[candidate]))

Candidate A	wins  55.74% of the time
Candidate B	wins  44.26% of the time
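
Candidate A winning most of these elections looks surprising, but it is most likely an artifact of tie handling. A turnout of 0.00001 of 1,000,000 registered voters is 10 voters, a 5:5 tie is common, and `count_election` uses a strict `>` comparison, so a tie goes to the first-listed candidate A. The sketch below estimates the effect with a binomial model, which assumes independent voters drawn with replacement; the helper `prob_a_at_least` is hypothetical.

```python
from math import comb

def prob_a_at_least(k_min, n, p_a):
    """P(at least k_min of n independent voters choose A), binomial model."""
    return sum(comb(n, k) * p_a**k * (1 - p_a)**(n - k) for k in range(k_min, n + 1))

p_a = 9 / 19                                 # share of A in the generated electorate
n = 10                                       # voters per election at this turnout
p_tie_or_win = prob_a_at_least(5, n, p_a)    # A is declared winner on a 5:5 tie
p_strict_win = prob_a_at_least(6, n, p_a)    # A needs a real majority
print(f"A wins or ties: {p_tie_or_win:.4f}, A wins outright: {p_strict_win:.4f}")
```

Under this model A wins or ties in roughly 56% of elections, in line with the simulated value, while A wins outright in only about 31%. The difference is the probability of a tie.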


In [36]:
# 100 voters per election
results = run_elections(R,V,C,10000,0.0001)
analysis = analyze_elections(C,results)

# print(analysis)

for candidate in analysis:
    print("Candidate {name}\twins {percent_wins:6.2f}% of the time".format(**analysis[candidate]))

Candidate A	wins  32.82% of the time
Candidate B	wins  67.18% of the time


In [37]:
# 10,000 voters per election
results = run_elections(R,V,C,10000,0.01)
analysis = analyze_elections(C,results)

# print(analysis)

for candidate in analysis:
    print("Candidate {name}\twins {percent_wins:6.2f}% of the time".format(**analysis[candidate]))
    
#print(analysis[C[1]]['results'])

Candidate A	wins   0.00% of the time
Candidate B	wins 100.00% of the time


In [38]:
# 100,000 voters per election
results = run_elections(R,V,C,1000,0.1)
analysis = analyze_elections(C,results)

# print(analysis)

for candidate in analysis:
    print("Candidate {name}\twins {percent_wins:6.2f}% of the time".format(**analysis[candidate]))

Candidate A	wins   0.00% of the time
Candidate B	wins 100.00% of the time


In [39]:
# the research question: one third of the registered voters vote
results = run_elections(R,V,C,100,1./3.)
analysis = analyze_elections(C,results)

# print(analysis)

for candidate in analysis:
    print("Candidate {name}\twins {percent_wins:6.2f}% of the time".format(**analysis[candidate]))

Candidate A	wins   0.00% of the time
Candidate B	wins 100.00% of the time
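
Zero wins for candidate A in a finite number of simulated elections does not prove that the probability is exactly zero; it only bounds it from above. A rough sketch of that bound follows, using the "rule of three" and the exact expression it approximates. Both function names are hypothetical, and the 95% confidence level is an assumption.

```python
def rule_of_three_upper_bound(n_trials):
    """Approximate 95% upper bound on an event probability after 0 hits in n_trials."""
    return 3.0 / n_trials

def exact_upper_bound(n_trials, confidence=0.95):
    """Largest p for which seeing 0 hits in n_trials still has probability 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)

for n in (100, 1000, 10000):
    print(n, rule_of_three_upper_bound(n), exact_upper_bound(n))
```

For the 100 elections simulated at one-third turnout, this only shows that the probability of the wrong winner is below about 3% at the 95% level. Tightening the bound requires more simulated elections or an analytic estimate.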
