# Programming for Data Analysis - Project
***

This notebook will contain my submission for the project piece of the assessment in the Programming for Data Analysis module, Winter 2021.

<br>

### Modules Required
***

In [1]:
# NumPy for numerical operations
import numpy as np

# Pandas for data analysis
import pandas as pd

# Statistics for descriptive statistics
import statistics

<br>

### Problem Statement
***

For this project you must create a data set by simulating a real-world phenomenon of
your choosing. You may pick any phenomenon you wish – you might pick one that is
of interest to you in your personal or professional life. Then, rather than collect data
related to the phenomenon, you should model and synthesise such data using Python.
We suggest you use the numpy.random package for this purpose.

<br>

### Ideas
***

1. Covid hospital admissions with variables gender, age, vaccination status, underlying health condition

2. Voter turnout with possible variables age, socio-economic status, family history of voting, proximity to polling station, day of voting, political mobilisation, type of election, compulsory voting

<br>

### Intention & Scope
***

Simulate a dataset investigating voter turnout in Ireland. I intend to do this by showing the results of whether a person chooses to exercise their right to vote in an election/referendum or not.  

<br>

### Variables to Investigate
***

Main point to investigate: whether a person votes in a particular election or not

Other Variables to investigate:  

- age of citizen  
- voter location
- type of election  

The final variable will be the outcome of whether a person chooses to vote or not based on the relationships between the different variables.  


<br>

### Background: Who is Entitled to Vote?
***

Election: General election (and Presidential)  
Entitled to vote: Citizens 18 and over; British citizens resident  
To calculate estimated Voting-Age Population: Population aged 18 and over minus non-Irish citizens but including British citizens  

Election: Referendum  
Entitled to vote: Irish citizens aged 18 and over  
To calculate estimated Voting-Age Population: Population aged 18 and overminus all non-Irish citizens (i.e. excluding British citizens resident in Ireland)  

Election: Local elections  
Entitled to vote: All residents (of 12 months) aged 18 or over  
To calculate estimated Voting-Age Population: Population aged 18 and over  

Election: European elections  
Entitled to vote: Irish citizens and all residents who are citizens of EU Member States  
To calculate estimated Voting-Age Population: Population aged 18 and over minus non-EU citizens  

*Ref:* Data oireachtais

<br>

### Investigating Voter Turnout in Ireland
***

From data.oireachtais.ie - Turnout amongst the 18-25 age groups is lower than average in most European countries.
Cross-national data from the European Social Survey, analysed by political scientist James
Sloam, found that average reported turnout since 2000 in 30 Europe countries among the
18-25 age category was 59% compared with 82% reported turnout amongst the population
as a whole. The Survey also found that there is a strong socio-economic dynamic to this
pattern: young people with low levels of educational achievement who are eligible to vote do
so in ‘alarmingly small numbers’ (average of 25% since across the 15 European States).... higher turnout has tended to be associated with middle class areas and lower turnouts with working class areas for general elections up to 2002.... very high turnout levels were associated with mainly middle class
and mainly settled areas (that is experiencing relatively little population in migration
compared with other parts of the city). He also found that turnout was, relatively speaking,
higher in older working class communities which tend to be settled. A key determent of turnout, perhaps equally important to the socio-economic background of an area, appears to
be the extent to which an area is ‘settled’ or experiences regular population change. 

A downward trend in turnout at local elections is clear regardless of which measure is
used. Over the period from 1967 to 1999 turnout fell from 67 per cent to 50 per cent
(REG). This trend was reversed in 2004, when an official turnout of 59% was
recorded, a level almost maintained in 2009 (58 % turnout).
15 However, in 2014, turnout dropped back to 51.6%, the second lowest official turnout level in Irish local
elections...turnout in local elections has tended to be higher in rural areas, the 2014 local elections saw a narrowing of this urban-rural difference...

Year Subject Turnout Result
1937 Draft Constitution 75.8% Yes
1959 PR 58.4% No
1968 Redrawing of constituencies 65.8% No
1968 Constituencies 65.8% No
1972 Accession to the EC 70.9% Yes
1972 Reducing voting age to 18 50.7% Yes
1972 Recognition of specified religions 50.7% Yes
1979 Adoption 28.6% Yes
1979 University representation in Seanad 28.6% Yes
1983 Right to life of unborn 53.7% Yes
1984 Extension of voting rights at Dáil elections 47.5% Yes
1986 Dissolution of marriage 60.8% No
1987 Single European Act 44.1% Yes
1992 Maastricht Treaty 57.3% Yes
1992 Right to life of unborn 68.2% No
1992 Right to travel 68.2% Yes
1992 Right to information 68.1% Yes
1995 Dissolution of marriage 62.1% Yes
1996 Bail 29.2% Yes
1997 Cabinet confidentiality 47.2% Yes
1998 Amsterdam Treaty 56.2% Yes
1998 British-Irish Agreement 56.2% Yes
1999 Local government 51.1% Yes
2001 Death penalty 34.8% Yes
2001 International Criminal Court 34.8% Yes
2001 Treaty of Nice 34.8% No
2002 Protection of life in pregnancy 42.8% No
2002 Treaty of Nice 49.5% Yes
2004 Citizenship 59.9% Yes
2008 Lisbon Treaty 53.1% No
2009 Lisbon Treaty 59.0% Yes
2011 Judges' remuneration 55.9% Yes
2011 Oireachtas inquiries 55.9% No
2012 Stability EMU 50.60% Yes
2012 Children 33.50% Yes
2013 Seanad abolition 39.2% No
2013 Court of Appeal 39.2% Yes
2015 Marriage equality 60.5% Yes
2015 Age of eligibility to be President 60.5% No 

Note that the eligible voting population for Referendums is smaller in size than that for
General Elections as it excludes British citizens resident in Ireland. Yet turnout (REG) is
reported as a proportion of the number of voters on the electoral register for general
elections. Actual turnout is therefore always some percentage points higher than reported
turnout... volatility in the level of turnout at referendums which has ranged from 28.6%
in the 1979 referendum on adoption rights and Seanad university representation to
75.8% in the 1937 referendum on the Constitution.
 Research into the reasons for abstaining in referendums, considered in more detail in
Section 6, suggests that the perceived saliency and profile of the issue affects this
decision with referendums on moral issues and some European issues frequently
having higher turnout...

turnout is lowest amongst students and the
unemployed...In referendums, post-poll surveys undertaken by the Referendum Commission suggests
that non-voters are circumstantial or voluntary (lack of interest) in equal numbers. A
substantial category of non-voters also cite ‘lack of/insufficient understanding’ which, while
slightly different to ‘lack of interest,’ also fits into the intentional rather than the circumstantial
category. The Commission has found a direct relationship between the level of
understanding of the referendum proposal, and the propensity to vote...

<br>

### Investigating Overall Turnout
***

According to the article "Election Turnout in Ireland: measurement, trends and policy implications", published by the Oireachtas Library & Research Service in 2016, voter turnout is measured in two ways. Firstly, it can be measured as a percentage of all voters on the electoral register (REG), or it can be measured e as a percentage of the estimated voting-age population (VAP). Depending on the type of election, one measurement may be considered a more accurate reflection of voter turnout.  

Observations: steadily downward trend across turnout in both general and local elections.  

**Turnout at general elections**  
For general elections, I will be using the VAP data available where possible. VAP is considered a better method of measuring turnout in GEs, as the REG figure is inflated due to the inclusion of deceased people and/or duplicate records. Turnout at Irish GEs held in the last 30 years is listed below.  

1992 73.65%  
1997 67.38%  
2002 66.98%  
2007 68.89%  
2011 63.78%  
2016 58.04%      
2020 56.65%       

*Data Source:* [idea.int](https://www.idea.int/data-tools/country-view/143/40)  

In [2]:
# Calculating the mean turnout in Irish general elections held in the past 30 years
gen_turnout = (73.65, 67.38, 66.98, 68.89, 63.78, 69.28, 58.04, 56.65)
mean_gen_turnout = statistics.mean(gen_turnout)

print ("The mean voter turnout at Irish general elections in the last 30 years was", round(mean_gen_turnout,2), "%")

The mean voter turnout at Irish general elections in the last 30 years was 65.58 %


<br>

**Turnout at local elections**  
I'm going to use the REG figure when looking at turnout in local elections, as this is considered more accurate in this kind of election due to the larger number of people entitled to vote (ref Data oireachtais). Turnout at Irish local elections held in the last 30 years are listed below.  
 
1991 55.6%  
1999 50.2%  
2004 58.6%    
2009 57.8%  
2014 51.7%  
2019 50.2%  

*Data Source:* [data.oireachtas.ie](https://data.oireachtas.ie/ie/oireachtas/libraryResearch/2016/2016-01-28_l-rs-note-election-turnout-in-ireland-measurement-trends-and-policy-implications_en.pdf) & [rte.ie](https://www.rte.ie/news/elections-2019/results/#/local)  


In [3]:
# Calculating the average turnout at Irish local elections held in the past 30 years
loc_turnout = (55.6, 50.2, 58.6, 57.8, 51.7, 50.2)
ave_loc_turnout = statistics.mean(loc_turnout)
print ("The average voter turnout at Irish local elections in the last 30 years was", round(ave_loc_turnout,2),"%")

The average voter turnout at Irish local elections in the last 30 years was 54.02 %


<br>

**Turnout at Referendums**  

As was the case for General Elections, the VAP method of measuring turnout may be the better option for Referendums, as not everyone on the register is entitled to vote in them. Turnout at referendums that have taken place in the last 30 years are listed below.  
_Note:_ Referendums taking place on the same day are listed together.  

1992 Maastricht Treaty 59.75%*    
1992 Right to life of unborn, Right to travel & Right to information 70.65%*    
1995 Dissolution of marriage 64.35%*    
1996 Bail 31.65%*    
1997 Cabinet confidentiality 49.65%*    
1998 British-Irish Agreement & Amsterdam Treaty 58.65%*   
1999 Local government 53.35%*    
2001 Death penalty, International Criminal Court & Treaty of Nice 37.25%*    
2002 Protection of life in pregnancy 45.25%*    
2002 Treaty of Nice 51.95%*    
2004 Citizenship 62.35%*  
2008 Lisbon Treaty 55.55%*    
2009 Lisbon Treaty 61.45%*    
2011 Oir Inquiries & Judges' remuneration 60%    
2012 EU Stability 52.4%  
2012 Children 35.1%  
2013 Seanad & Court of Appeal 40.9%  
2015 Marriage Equality & Age of Eligibility for Election of President 64.3%  
2018 Blasphemy & Termination of Pregnancy 66.95%*  
2019 Diveorce 53.28%*  

*Note:* VAP figures were only available for the referenda that have taken place in the last 10 years. In order to estimate the VAP figure, I took the difference between the VAP & REG figures for the referenda that have taken place since 2011, and calculated the average difference between the two figures, which was VAP being 2.45% higher than the corresponding REG figure. This was then added to the REG figures for all other referenda that have taken place in the last 30 years to estimate the VAP figure (denoted by an asterix in the figures shown above).  

*Data Source:* Data Oireachtas, RTE & independent.ie

In [4]:
# Calculating the mean turnout at referendums that have taken place in the last 30 years
ref_turnout = (59.75, 70.65, 64.35, 31.65, 49.65, 58.65, 53.35, 37.25, 45.25, 51.95, 62.35, 55.55, 
                   61.45, 60, 52.4, 35.1, 40.9, 64.3, 66.95, 53.28)
mean_ref_turnout = statistics.mean(ref_turnout)
print ("The mean voter turnout at Irish referendums over the last 30 years was", round(mean_ref_turnout,2), "%")

The mean voter turnout at Irish referendums over the last 30 years was 53.74 %


<br>

**Turnout at European Elections**  

VAP is again considered the better method of measuring voter turnout at European elections. The VAP figure for turnout at European elections held in Ireland over the last 30 years is listed below.  

1994 47.92%  
1999 54.72%  
2004 62.52%  
2009 54.19%  
2014 47.45%  
2019 45.63%  

*Data Source:* [idea.int](https://www.idea.int/data-tools/country-view/143/40)  

In [5]:
# Calculating the average turnout at European elections held in the past 30 years
eu_turnout = (47.92, 54.72, 62.52, 54.19, 47.45, 45.63)
ave_eu_turnout = statistics.mean(eu_turnout)
print ("The average voter turnout at European elections in the last 30 years was", round(ave_eu_turnout,2),"%")

The average voter turnout at European elections in the last 30 years was 52.07 %


<br>

**Turnout at Presidential Elections**  

As the eligibility to vote in a Presidential election is the same as for a General Election, it would again be better to use the VAP method for measuring turnout here. Unfortunately this data was not available for the 1990 Presidential election. Therefore, the REG figure was used for the Presidential election held in that year, but VAP figures were used for the other three elections taking place in the last 30 years.   

1990 64%*    
1997 47.72%  
2011 50.72%  
2018 39.33%  

*Data Source:* [idea.int](https://www.idea.int/data-tools/country-view/143/40) & [thejournal.ie](https://www.thejournal.ie/low-turnout-presidential-election-4309871-Oct2018/)

In [6]:
# Calculating the average turnout at Irish Presidential elections held in the past 30 years
pres_turnout = (64, 47.72, 50.72, 39.33)
ave_pres_turnout = statistics.mean(pres_turnout)
print ("The average voter turnout at Irish Presidential elections in the last 30 years was", round(ave_pres_turnout,2),"%")

The average voter turnout at Irish Presidential elections in the last 30 years was 50.44 %


In [7]:
# Saving all of these figures as variables for use later
pvote_gen = (round(mean_gen_turnout,2) / 100)
pvote_loc = (round(ave_loc_turnout,2) / 100)
pvote_ref = (round(mean_ref_turnout,2) /100)
pvote_eu = (round(ave_eu_turnout,2) / 100)
pvote_pres = (round(ave_pres_turnout,2) / 100)

<br>

### Data for variable: voter participation in different age groups
***

**Distribution of Age Across Irish Population**  

The following breakdown of the Irish population was copied from indexmundi.com:  

0-14 years: 21.15% (male 560,338/female 534,570)  

15-24 years: 12.08% (male 316,239/female 308,872)  

25-54 years: 42.19% (male 1,098,058/female 1,085,794)  

55-64 years: 10.77% (male 278,836/female 278,498)  

65 years and over: 13.82% (male 331,772/female 383,592) (2020 est.)  

_Ref:_ IndexMundi, as of September 2021.

<br>

For this project, we are interested in the following figures:  

15-24 years: 625111  
25-54 years: 2183852  
55-64 years: 557334  
65 years and over: 715364  

<br>

As we are only interested in the population aged 18 and over, we need to remove any persons aged 15, 16 and 17 from the total figure in the 15-24 years age group (those born in 2004, 2005 & 2006).  

61,972 born in 2004  
61,372 born in 2005  
65,425 born in 2006  
Total to remove: 188769  
_Ref:_ CSO

<br>

Therefore, we will we working with the following figures:  

18-24 years: 436342  
25-54 years: 2183852  
55-64 years: 557334  
65 years and over: 715364  

In [8]:
# Creating some variables to represent these figures
age18_24 = 436342
age25_54 = 2183852
age55_64 = 557334
age65_up = 715364
total_register = age18_24 + age25_54 + age55_64 + age65_up
print ("The total number of Irish citizens eligible to vote is", total_register)

The total number of Irish citizens eligible to vote is 3892892


In [9]:
# Breaking down the probability of a randomly drawn citizen being from one of these age groups
p_age18_24 = age18_24 / total_register
p_age25_54 = age25_54 / total_register
p_age55_64 = age55_64 / total_register
p_age65_up = age65_up / total_register

In [10]:
# Creating an array of voter age groups using the probability calculated above
age_groups = np.random.choice(["18-24", "25-54", "55-64", "65+"], size=(100,), p=[p_age18_24, p_age25_54, p_age55_64, p_age65_up])

<br>

**Relationship between Age and Voter Participation**  

From my research on voter participation in both Ireland and other countries, it is clear that those in the younger age groups are less likely to vote, and the older a person gets, the more likely they are to vote. Unfortunately I was unable to find a comprehensive breakdown of voter participation across these different groups in Ireland. Instead, the sources I found tended to focus on the difference between voter turnout in the 18-25 age group versus the rest of the population, so I will need to estimate participation levels across the other groups based on data I found for other countries such as Britain and the USA. (REF census, british, tcd)

**Irish voter turnout in 18-25 age group**  
2002 53.3% vs. 76.3% across all ages  
2007 69.2% vs. 79.2%  
2011 75.4% vs. 89.7%  

Turnout since 2000 in 30 European countries among the 18-25 age category was 59% compared with 82% across all ages.  

*Data Source:* [data.oireachtas.ie](https://data.oireachtas.ie/ie/oireachtas/libraryResearch/2016/2016-01-28_l-rs-note-election-turnout-in-ireland-measurement-trends-and-policy-implications_en.pdf)  

<br>

**Data from Britain & USA**  

The following figure illustrates voter turnout amongst the different age groups in the 2015, 2017 & 2019 general elections in Britain:  

![](https://www.britishelectionstudy.com/wp-content/uploads/2021/01/turnoutBayesPlot-1.png)  
_Image Source:_ [britishelectionstudy.com](https://www.britishelectionstudy.com/wp-content/uploads/2021/01/turnoutBayesPlot-1.png)  

<br>

The following information regarding the 2020 Presidential Election in USA was copied directly from [census.gov](https://www.census.gov/library/stories/2021/04/record-high-turnout-in-2020-general-election.html):

_Voting rates were higher in 2020 than in 2016 across all age groups, with turnout by voters ages 18-34 increasing the most between elections:_  

- _For citizens ages 18-34, 57% voted in 2020, up from 49% in 2016_  
- _In the 35-64 age group, turnout was 69%, compared to 65% in 2016_  
- _In the 65 and older group, 74% voted in 2020, compared to 71% in 2016_  

<br>

**Conclusions Drawn on Voter Turnout in Different Age Groups**

From all of this information, we can see that voter participation seems to be increasing amongst the younger age groups but still remains lowest amongst those aged 18-24, and steadily increases as you move upwards through the different age cohorts.  
Now lets try and assign some values for the probability of those falling into the age groups participating in an election.  

In [11]:
# Calculating average voter participation amongst 18-24 years with data available
# Excluding US as parameters are different to data for other countries
ire_18_24 = (53.3 + 69.2 + 75.4) / 3
eu_18_24 = 59
uk_18_24 = (48 + 50 + 52) / 3

all_18_24 = (ire_18_24, eu_18_24, uk_18_24)
mean_18_24 = statistics.mean(all_18_24)

# Calculating average voter participation amongst 65+ age group, US again excluded
# Excluding US as parameters are different to data for other countries
uk_65plus = (80 + 82 + 76 + 81 + 82 + 81) / 6
usa_65plus = 74

all_65plus = (uk_65plus, usa_65plus)
mean_65plus = statistics.mean(all_65plus)

# Calculate the difference between these two averages
diff = round((mean_65plus - mean_18_24),2)

In [12]:
# Calculating mean for 25-54 year olds
mean_25_54 = (mean_18_24 + (diff / 4 )) # Divide the difference by 4 as there are 4 age groups

# Calculating mean for 55-64 year olds (as per calculation above)
mean_55_64 = (mean_25_54 + (diff / 4 ))

# Saving these figures in variables for probability to vote
pvote_1824 = (round(mean_18_24,2) / 100)
pvote_2554 = (round(mean_25_54,2) / 100)
pvote_5564 = (round(mean_55_64,2) / 100)
pvote_65plus = (round(mean_65plus,2) / 100)

In [13]:
# Create a list showing the impact of age on voter turnout using variables created above
age_impact = []

for i in age_groups:
    if i == "18-24":
        age_impact.append(round(pvote_1824,2))
    elif i == "25-54":
        age_impact.append(round(pvote_2554,2))
    elif i == "55-64":
        age_impact.append(round(pvote_5564,2))
    elif i == "65+":
        age_impact.append(round(pvote_5564,2))

In [14]:
# Creating a list to serve as index later on
survey_id = []

# Setting counter
count = 0
for i in age_groups:
    survey_id.append(count + 1)
    count += 1

<br>

**Creating a Dataframe for Age Group**  

In [15]:
df_agegroups = pd.DataFrame(list(zip(survey_id, age_groups, age_impact)), columns = ["Survey ID", "Age Group", "Age Impact"])
df_agegroups.head()

Unnamed: 0,Survey ID,Age Group,Age Impact
0,1,25-54,0.63
1,2,55-64,0.68
2,3,25-54,0.63
3,4,25-54,0.63
4,5,65+,0.68


<BR>

### Data for variable: type of election
***

We have five options for the different types of election - General Election, Local Election, Referendum, EUropean Election, Presidential Election.  

In the last 30 years there have been:  
7 x General Elections  
6 x Local Elections  
20 x Referendums (*note:* where more than one referendum was held on the same day, these were counted once)  
6 x European Elections  
4 x Presidential Elections  


We can clearly see that referendums are held far more frequently than any other kind of election, and above we saw that referendums & Presidential elections have a much lower turnout than the likes of general elections.  

In [16]:
# Calculating the probability of an election falling into each of these categories based on the last 30 years' data
gen, loc, ref, eu, pres = 7, 6, 20, 6, 4
elections = (gen + loc + ref + eu + pres)
prob_gen = gen / elections
prob_loc = loc / elections
prob_ref = ref / elections
prob_eu = eu / elections
prob_pres = pres / elections

In [17]:
# Lets create an array of election types using the probability discussed above
election_type = np.random.choice(["general", "local", "referendum", "european", "presidential"], size=(100), 
                     p=[prob_gen, prob_loc, prob_ref, prob_eu, prob_pres])

In [18]:
# Create a list showing the impact of election type on voter turnout using variables previously created
election_impact = []

for i in election_type:
    if i == "general":
        election_impact.append(round(pvote_gen,2))
    elif i == "local":
        election_impact.append(round(pvote_loc,2))
    elif i == "referendum":
        election_impact.append(round(pvote_ref,2))
    elif i == "european":
        election_impact.append(round(pvote_eu,2))
    elif i == "presidential":
        election_impact.append(round(pvote_pres,2))      

<br>

**Creating a Dataframe for Election Type**

In [19]:
df_electype = pd.DataFrame(list(zip(election_type, election_impact)), columns = ["Election Type", "Election Impact"])
df_electype.head()

Unnamed: 0,Election Type,Election Impact
0,european,0.52
1,referendum,0.54
2,referendum,0.54
3,local,0.54
4,referendum,0.54


<br>

### Data for Variable: Voter Location (Urban / Rural Population Split)
***

Rurally based voters are usually associated with a higher level of turnout than urban based voters (maynooth). This is particularly true for General Elections, and to an extent in Local Elections, while turnout at Referendums tends to be higher in urban areas (irish times). No significant difference is noted for Presidential or European Elections.  

The latest figures available show that the urban / rural population split in Ireland is heavily weighted on the urban side:  
Urban - 63.65%  
Rural - 36.35%  
_Data Source:_ [tradingeconomics.com](https://tradingeconomics.com/ireland/rural-population-percent-of-total-population-wb-data.html)  

<br>

**Urban and Rural Turnout in Local Elections**  

The following figures were taken from the 2016 article "Election Turnout in Ireland: measurement, trends and policy implications" (available [here](https://data.oireachtas.ie/ie/oireachtas/libraryResearch/2016/2016-01-28_l-rs-note-election-turnout-in-ireland-measurement-trends-and-policy-implications_en.pdf)):  

1999 National Turnout: 50.2%  
1999 Urban Turnout: 36%  
1999 Rural Turnout: 61.9%  

2004 National Turnout: 59.3%  
2004 Urban Turnout: 53%  
2004 Rural Turnout: 63.9%  

2009 National Turnout: 57.6%  
2009 Urban Turnout: 49.3%  
2009 Rural Turnout: 63.8%  

2014 National Turnout: 51.6%  
2009 Urban Turnout: 43%  
2009 Rural Turnout: 58.9%    

In [20]:
# Calculating the average difference in turnout between urban and rurally based voters
urb_rural_diff = (61.9-36),(63.9-53),(63.8-49.3),(58.9-43)

mean_geo_diff = statistics.mean(urb_rural_diff)
print ("The average difference in turnout between urban and rurally based voters is", 
           round(mean_geo_diff,2), "% in favour of rural voters")

The average difference in turnout between urban and rurally based voters is 16.8 % in favour of rural voters


In [21]:
# Using above figures to calculate a general national turnout figure
nat_turnout = (50.2, 59.3, 57.6, 51.6)

mean_turnout = statistics.mean(nat_turnout)

print("The average national turnout figure we will be using when looking at voter location is", 
         round(mean_turnout,2), "%")

The average national turnout figure we will be using when looking at voter location is 54.67 %


In [22]:
# Creating varibales for probability to vote in urban/rural areas for use later
pvote_urban = (mean_turnout - mean_geo_diff) / 100
pvote_rural = (mean_turnout  + mean_geo_diff) / 100

<br>

As we are only interested in two results here, we can use a binomial distribution to generate the voter locations using the probability of a voter being urban based being 63.65%, and rurally based being 36.35%.  

In [23]:
# Using numpy.random.binomial to generate voter locations
urban = 1
prob = (0.6365)

geo_list = []
geo_location = np.random.binomial(urban, p=prob, size=100)
for i in geo_location:
    if i == 0:
        geo_list.append("rural")
    else: 
        geo_list.append("urban")

In [24]:
# Confirming the number of entries in the geo_list
len(geo_list)

100

In [25]:
# I could also generate an array of voter locations using random.choice
# np.random.choice(["urban", "rural"], size=(100,), p=[0.6365,0.3635])

In [26]:
# Create a list showing the impact of location on voter turnout using variables created above
location_impact = []

for i in geo_list:
    if i == "urban":
        location_impact.append(round(pvote_urban,2))
    else:
        location_impact.append(round(pvote_rural,2))    

<br>

**Creating a Dataframe for Voter Location**

In [27]:
df_voterloc = pd.DataFrame(list(zip(geo_list, location_impact)), columns = ["Location", "Location Impact"])
df_voterloc.head()

Unnamed: 0,Location,Location Impact
0,rural,0.71
1,rural,0.71
2,rural,0.71
3,urban,0.38
4,rural,0.71


<br>

### Combining the Dataframes
***

In [28]:
df_all_data = pd.concat([df_agegroups, df_voterloc, df_electype], axis=1)

In [29]:
# Adding another column for Probability Voted, which will be the sum of the 3 impact columns divided by 3
df_all_data["Probability Voted"] =  (df_all_data[["Age Impact", "Location Impact", "Election Impact"]].sum(axis=1)) / 3

df_all_data.head()

Unnamed: 0,Survey ID,Age Group,Age Impact,Location,Location Impact,Election Type,Election Impact,Probability Voted
0,1,25-54,0.63,rural,0.71,european,0.52,0.62
1,2,55-64,0.68,rural,0.71,referendum,0.54,0.643333
2,3,25-54,0.63,rural,0.71,referendum,0.54,0.626667
3,4,25-54,0.63,urban,0.38,local,0.54,0.516667
4,5,65+,0.68,rural,0.71,referendum,0.54,0.643333


In [30]:
df_all_data.describe()

Unnamed: 0,Survey ID,Age Impact,Location Impact,Election Impact,Probability Voted
count,100.0,100.0,100.0,100.0,100.0
mean,50.5,0.6445,0.5054,0.5612,0.570367
std,29.011492,0.031217,0.160984,0.053997,0.06025
min,1.0,0.58,0.38,0.5,0.5
25%,25.75,0.63,0.38,0.54,0.516667
50%,50.5,0.63,0.38,0.54,0.545
75%,75.25,0.68,0.71,0.54,0.626667
max,100.0,0.68,0.71,0.66,0.683333


In [31]:
# Converting this column into a numpy array so I can perform some random operations on it
prob_array = df_all_data[["Probability Voted"]].to_numpy() 

# Setting variables for voted and abstained & creating a list for use in random function below
voted = 1
abstained = 0
voters = []

# Using random.binomial again as I can only have 2 outcomes, with p = probability in array above
for i in prob_array:
    voters.append(np.random.binomial(voted, p=i, size=1))

# Making final list with string values based on these values
voted = []
for i in voters:
    if i == 0:
        voted.append("abstained")
    else: 
        voted.append("voted")

In [32]:
# Adding this new list into a dataframe
df_voted = pd.DataFrame(voted, columns = ["Voted/Abstained"])

df_new = pd.concat([df_all_data, df_voted], axis=1)

<br>

### Simulating the Final Dataset
***

In [33]:
df_dataset = df_new[["Survey ID", "Age Group", "Location", "Election Type", "Voted/Abstained"]]

# Setting Survey ID as the index
df_dataset.set_index(["Survey ID"], inplace = True)

df_dataset

Unnamed: 0_level_0,Age Group,Location,Election Type,Voted/Abstained
Survey ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1,25-54,rural,european,abstained
2,55-64,rural,referendum,voted
3,25-54,rural,referendum,abstained
4,25-54,urban,local,voted
5,65+,rural,referendum,voted
...,...,...,...,...
96,25-54,urban,european,abstained
97,25-54,urban,referendum,abstained
98,25-54,rural,referendum,voted
99,55-64,urban,referendum,abstained


<br>

### References & Data Sources  
***


[] https://adriankavanaghelections.org/  

[] https://www.britishelectionstudy.com/bes-findings/age-and-voting-behaviour-at-the-2019-general-election/#.Yc8q3mjP02w  

[] https://www.census.gov/library/stories/2021/04/record-high-turnout-in-2020-general-election.html  

[] https://www.cso.ie/en/csolatestnews/pressreleases/2006pressreleases/reportonvitalstatistics2004/  

[] https://www.cso.ie/en/csolatestnews/pressreleases/2008pressreleases/reportonvitalstatistics2005/  

[] https://www.cso.ie/en/csolatestnews/pressreleases/2009pressreleases/reportonvitalstatistics2006/  

[] https://www.cso.ie/en/qnhs/qnhsmethodology/voterregistrationandparticipationmodule/

[] https://data.oireachtas.ie/ie/oireachtas/libraryResearch/2016/2016-01-28_l-rs-note-election-turnout-in-ireland-measurement-trends-and-policy-implications_en.pdf

[] https://www.europarl.europa.eu/election-results-2019/en/turnout/  

[] https://www.europeanmovement.ie/irish-general-election-february-2020/  

[] https://www.fairvote.org/what_affects_voter_turnout_rates

[] https://www.geeksforgeeks.org/create-a-pandas-dataframe-from-lists/  

[] https://www.idea.int/data-tools/country-view/143/40  

[] https://www.independent.ie/irish-news/abortion-referendum/abortion-referendum-turn-out-is-third-highest-ever-for-a-referendum-in-ireland-36949137.html  

[] https://www.indexmundi.com/ireland/demographics_profile.html  

[] https://www.ipa.ie/_fileUpload/Documents/LA_Times_Summer_2019.pdf  

[] https://www.irishtimes.com/news/politics/urban-rural-divide-among-voters-made-clear-by-turnout-figures-1.1553322  

[] https://www.maynoothuniversity.ie/research/spotlight-research/getting-out-vote-what-influences-voter-turnout

[] https://numpy.org/doc/stable/reference/random/generated/numpy.random.binomial.html  

[] https://ourworldindata.org/age-structure  

[] https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html  

[] https://python-course.eu/numerical-programming/weighted-probabilities.php  

[] https://pythonexamples.org/pandas-set-column-as-index/#2  

[] https://www.refcom.ie/previous-referendums/  

[] https://www.researchgate.net/publication/228215776_What_Affects_Voter_Turnout

[] https://www.researchgate.net/publication/253829692_The_geography_of_Irish_voter_turnout_A_case_study_of_the_2002_general_election  

[] https://www.rte.ie/brainstorm/2018/1019/1005247-voter-turnout-elections-ireland/  

[] https://www.rte.ie/news/2019/0526/1051741-divorce-referendum/  

[] https://www.rte.ie/news/elections-2019/results/#/local  

[] https://stackoverflow.com/questions/34023918/make-new-column-in-panda-dataframe-by-adding-values-from-other-columns/42634214  

[] https://www.statista.com/statistics/710767/irish-population-by-age/  

[] https://www.tcd.ie/Political_Science/people/michael_gallagher/Election2016.php  

[] https://towardsdatascience.com/bernoulli-and-binomial-random-variables-d0698288dd36  

[] https://tradingeconomics.com/ireland/rural-population-percent-of-total-population-wb-data.html  

[] https://www.thejournal.ie/low-turnout-presidential-election-4309871-Oct2018/  

[] https://towardsdatascience.com/building-a-logistic-regression-in-python-step-by-step-becd4d56c9c8  

<br>

# End
***