<h1>MODELLING SWISS VOTATION OUTCOMES</h1>
<h3>Coursera Capstone Report for the IBM Professional Certificate in Data Science</h3><br>
Juliane Klatt
<hr>
<h2>Table of Contents</h2>
<ol>
    <li><a href="#intro">Introduction</a></li>
    <li><a href="#data">Data</a>
        <ol>
            <li><a href="#pleb">Swiss Plebiscite Data</a></li>
            <li><a href="#mun">List of Swiss Municipalities</a></li>
            <li><a href="#four">Foursquare Location Data</a></li>
        </ol>
    </li>
    <li><a href="#meth">Methodology</a></li>
    <li><a href="#ana">Analysis</a></li>
    <li><a href="#res">Results</a></li>
    <li><a href="#con">Conclusion</a></li>
    <li><a href="#ref">References</a></li>
</ol>

<a id='intro'></a>
<hr>
<h2>1. Introduction</h2>

<p>In Switzerland, plebiscites at the federal, cantonal, and municipal level are a central feature of political life. Whether and when a referendum is held is not up to the government; it is a legal procedure regulated by the Swiss constitution. There are two types of referendums: optional and mandatory. Federal laws and amendments to them, certain other federal decrees, and international treaties of unlimited duration may be subject to an optional referendum. Such a referendum takes place if at least 50,000 eligible voters or eight cantons request it within 100 days of the act's official publication. Mandatory referendums take place on every amendment to the constitution and on any accession to a supranational community or an organization for collective security. Constitutional amendments may be proposed by parliament, by the cantons, or by a federal popular initiative.</p>
<p>In this project, we will investigate whether the canton-level outcomes of Swiss federal plebiscites are correlated with the type and number of venues in a canton. Once correlations have been identified, we will build several classification models and compare how well they predict Swiss votation outcomes by canton from each canton's venue characteristics.</p>
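<p>To make the notion of correlation concrete: for a single referendum, each canton contributes a binary outcome (1 = accepted, 0 = rejected) and, for a given venue category, a venue count or density. The Pearson correlation between a 0/1 variable and a numeric variable is known as the point-biserial correlation. The minimal sketch below computes it in plain Python; the numbers are made-up placeholders, not data from this project.</p>

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences.

    If one sequence contains only 0s and 1s, this equals the
    point-biserial correlation between a binary outcome and a numeric feature.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder data for five cantons and one referendum
outcome = [1, 1, 0, 0, 1]                    # 1 = canton accepted, 0 = canton rejected
venues_per_1000 = [4.2, 3.9, 1.1, 1.5, 3.0]  # made-up density of one venue category
print(round(pearson(outcome, venues_per_1000), 2))  # → 0.94
```

<p>In the notebook, one row of <code>plebiscite_df</code> restricted to the canton columns would play the role of <code>outcome</code>. Cantons without a result for that referendum (e.g. Jura, whose column is empty for early referendums) would have to be dropped from both sequences first.</p>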

<a id='data'></a>
<hr>
<h2>2. Data</h2>
<p>
To meet the above objectives, we need several data sources: the canton-level outcomes of past referendums, and a list of venues per canton, which we assemble from the venues of the individual municipalities.
</p>
<ul>
    <li><a href="https://www.bk.admin.ch/bk/de/home/politische-rechte/volksabstimmungen.html">Swiss plebiscite data</a><br>
        Swiss authorities publish the canton-level outcomes of federal plebiscites since 1848.
    </li>
    <li><a href="https://en.wikipedia.org/wiki/List_of_municipalities_of_Switzerland">List of Swiss municipalities</a><br>
        Wikipedia provides a list of Swiss municipalities by canton.
    </li>
    <li><a href="https://foursquare.com/">Foursquare location data</a><br>
        Foursquare provides venue data for the municipalities of interest.
    </li>
</ul>
<p>
In the following, we compile the aforementioned data into dataframes and briefly characterize their properties.
</p>

<a id='pleb'></a>
<h3>2.A. Swiss Plebiscite Data</h3>
<p>The canton-level outcomes of all Swiss federal plebiscites since 1848 are published by the Swiss authorities on <a href="https://www.bk.admin.ch/ch/d/pore/va/vab_2_2_4_1.html">www.bk.admin.ch</a>. We retrieve the web pages with the <b>requests</b> package and scrape their content with the <b>BeautifulSoup</b> package.</p>
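<p>The mis-decoded umlauts handled by the <code>rep</code> helper below (e.g. <code>Ã¼</code> instead of <code>ü</code>) are typical of UTF-8 text decoded as ISO-8859-1. <b>requests</b> falls back to ISO-8859-1 when a <code>text/html</code> response declares no charset; assuming this is what happens here and that the pages are UTF-8 encoded, setting <code>response.encoding = 'utf-8'</code> before reading <code>response.text</code> would avoid the problem at the source. Already garbled strings can be repaired generically, as in this sketch (the function name <code>fix_mojibake</code> is hypothetical).</p>

```python
def fix_mojibake(s):
    """Repair UTF-8 text that was wrongly decoded as ISO-8859-1 (Latin-1).

    Re-encoding with Latin-1 recovers the original bytes, which are then
    decoded as UTF-8. Strings that are not mojibake either fail one of the
    two steps (and are returned unchanged) or round-trip to themselves.
    """
    try:
        return s.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return s


print(fix_mojibake("MilitÃ¤rschulen"))  # → Militärschulen
print(fix_mojibake("Zürich"))           # → Zürich (already correct, unchanged)
```

<p>Unlike <code>rep</code>, this repairs every affected character rather than a fixed list, and it keeps the guillemets « » instead of replacing them with apostrophes. Applied to the canton names, it would also turn the <code>MilitÃ¤rschulen</code> column into <code>Militärschulen</code>, so the column name passed to <code>drop</code> would have to change accordingly.</p>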

In [1]:
import requests                                                                                           # importing the requests package for pulling html documents from links
from bs4 import BeautifulSoup as bs                                                                       # importing the beautifulsoup package for scraping web content
                                                                                                          #
plebiscite_link = 'https://www.bk.admin.ch/ch/d/pore/va/vab_2_2_4_1_gesamt.html'                          # the link to the overview of all federal plebiscites since 1848
plebiscite_doc  = requests.get(plebiscite_link).text                                                      # pulling the html document pointed to by the link
plebiscite_soup = bs(plebiscite_doc,'html.parser')                                                        # parsing the html document into an XML-like tree structure
plebiscite_rows = plebiscite_soup.find('table').findAll('tr')[1:]                                         # searching the tree for the table of plebiscites and storing all but its 1st row in a list
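<p>The loop in the next cell builds absolute links by string concatenation and by slicing off the last 11 characters of the date page's URL (<code>r_link[:-11]</code>). This assumes that the last path segment of <code>r_link</code>, including its leading slash, is exactly 11 characters long, as <code>/index.html</code> is. The standard library's <code>urllib.parse.urljoin</code> resolves relative links such as <code>./can...</code> against the page they appear on, regardless of the file name's length. In this sketch the concrete relative paths are hypothetical, chosen only to illustrate the resolution rules.</p>

```python
from urllib.parse import urljoin

overview_link = "https://www.bk.admin.ch/ch/d/pore/va/vab_2_2_4_1_gesamt.html"

# Hypothetical relative hrefs, for illustration only
date_page = urljoin(overview_link, "19720512/index.html")  # link as found in the overview table
canton_page = urljoin(date_page, "./can123.html")          # "./can..." link as found on the date page

print(date_page)    # → https://www.bk.admin.ch/ch/d/pore/va/19720512/index.html
print(canton_page)  # → https://www.bk.admin.ch/ch/d/pore/va/19720512/can123.html

# The slicing approach gives the same result here, but only because
# "/index.html" happens to be 11 characters long
assert date_page[:-11] + "./can123.html"[1:] == canton_page
```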

In [2]:
from tqdm import tqdm                                                                                     # importing tqdm package in order to display progress bars for loops
import pandas as pd                                                                                       # importing pandas package in order to be able to deal with dataframes
                                                                                                          #
def rep(s):                                                                                               # defining a function which corrects for ill-decoded umlauts and other non-standard characters
    t = s.replace('Ã¼','ü').replace('Ã¶','ö').replace('Ã¤','ä').replace("Â«","'").replace("Â»","'")       #
    return t                                                                                              #
                                                                                                          #
plebiscite_df = pd.DataFrame(index=range(1000),columns=['date','title'])                                  # allocating dataframe for plebiscites with 1000 rows, i.e., room for 1000 referendums
i             = 0                                                                                         # initiate counter keeping track of dataframe index
previous_link = ''                                                                                        # initiate link object
for r in tqdm(plebiscite_rows):                                                                           # loop through all referendums
    r_link  = 'https://www.bk.admin.ch/ch/d/pore/va/'+r.findAll('a',href=True)[0]['href']                 # construct link to overview of all referendums on a given date
    if  r_link != previous_link:                                                                          # check whether the link has been visited before (happens if multiple referendums on same day)
        r_doc   = requests.get(r_link).text                                                               # pulling the html document pointed to by the link
        r_soup  = bs(r_doc,'html.parser')                                                                 # parsing the html document into an XML-like tree structure
        if len(r_soup.findAll('article')) != 0:                                                           # check whether page contains overview of referendums on that date
            for href in r_soup.findAll('article')[0].findAll('a',href=True):                              # loop through all link extensions on the page
                if href['href'].startswith("./can"):                                                      # checking whether link extension points to canton-level results of a referendum
                    p_link  = r_link[:-11]+href['href'][1:]                                               # construct link to canton-level results
                    p_doc   = requests.get(p_link).text                                                   # pulling the html document pointed to by the link
                    p_soup  = bs(p_doc,'html.parser')                                                     # parsing the html document into an XML-like tree structure
                    entries = p_soup.find('table').find('tbody').findAll('tr')                            # searching the tree for the table of cantons and storing all canton entries in a list
                    for e in entries[:-1]:                                                                # looping through all but the last row, because the last row lists the Swiss total, not a canton
                        canton = e.findAll('td')[0].string.replace('Ã¼','ü')                              # name of canton extracted from 0th column, ill-decoded 'ü' repaired
                        yes    = int(e.findAll('td')[4].string.replace("'",''))                           # number of yes votes extracted from 4th column, thousands separator "'" removed
                        no     = int(e.findAll('td')[5].string.replace("'",''))                           # number of no votes extracted from 5th column, thousands separator "'" removed
                        if yes > no : vote = 1                                                            # if more yes than no votes, then 1 for affirmative result
                        else:         vote = 0                                                            # else 0 for non-affirmative result
                        if canton not in plebiscite_df.columns: plebiscite_df[canton] = ''                # add canton column to dataframe if not yet existent
                        plebiscite_df.loc[i,canton] = vote                                                # canton column of current referendum (indexed i) is filled with canton's result
                    plebiscite_df.loc[i,'date']     = r.findAll('a',href=True)[0].string                  # after all canton results have been entered, record date of the referendum in date column
                    plebiscite_df.loc[i,'title']    = rep(p_soup.findAll('h3')[0].string)                 # after all canton results have been entered, record title of the referendum in title column
                    i += 1                                                                                # increasing index by 1 such that next referendum results are stored in next row
        previous_link = r_link                                                                            # store link to overview of all referendums on a given date to be able to compare to it
plebiscite_df = plebiscite_df[:i].drop('MilitÃ¤rschulen',axis=1).drop_duplicates().reset_index(drop=True) # dropping duplicate entries and the military school votes since they do not correspond to any canton
print("Dataframe of "+str(plebiscite_df.shape[0])+" referendums successfully constructed!")               #

100%|██████████| 638/638 [14:21<00:00,  1.35s/it]

Dataframe of 607 referendums successfully constructed!
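<p>The per-canton logic of the loop above can be isolated into two small, testable helpers: removing the apostrophe that Swiss notation uses as a thousands separator, and turning yes/no counts into the binary outcome. This is a sketch with hypothetical function names; as in the loop, a tie counts as 0 (not accepted).</p>

```python
def parse_count(cell_text):
    """Parse a vote count in Swiss notation, e.g. "123'456" -> 123456."""
    return int(cell_text.replace("'", "").strip())


def canton_outcome(yes, no):
    """Return 1 if the canton accepted (more yes than no votes), else 0."""
    return 1 if yes > no else 0


# Hypothetical table row; yes and no votes sit in columns 4 and 5 as in the loop above
cells = ["Zürich", "", "", "", "123'456", "98'765"]
yes, no = parse_count(cells[4]), parse_count(cells[5])
print(yes, no, canton_outcome(yes, no))  # → 123456 98765 1
```

<p>BeautifulSoup's <code>.string</code> returns <code>None</code> when a cell has more than one child node; <code>get_text()</code> is a more forgiving way to read the cell text if that ever happens.</p>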

In [3]:
plebiscite_df.head()

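<table>
  <thead>
    <tr><th></th><th>date</th><th>title</th><th>Zürich</th><th>Bern</th><th>Luzern</th><th>Uri</th><th>Schwyz</th><th>Obwalden</th><th>Nidwalden</th><th>Glarus</th><th>...</th><th>St. Gallen</th><th>Graubünden</th><th>Aargau</th><th>Thurgau</th><th>Tessin</th><th>Waadt</th><th>Wallis</th><th>Neuenburg</th><th>Genf</th><th>Jura</th></tr>
  </thead>
  <tbody>
    <tr><th>0</th><td>12.05.1872</td><td>Totalrevision</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>...</td><td>1</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td></td></tr>
    <tr><th>1</th><td>19.04.1874</td><td>Totalrevision</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>...</td><td>1</td><td>1</td><td>1</td><td>1</td><td>0</td><td>1</td><td>0</td><td>1</td><td>1</td><td></td></tr>
    <tr><th>2</th><td>23.05.1875</td><td>Bundesgesez betreffend Feststellung und Beurku...</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>...</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td></td></tr>
    <tr><th>3</th><td>23.05.1875</td><td>Bundesgesez über die politische Stimmberechtig...</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>...</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td></td></tr>
    <tr><th>4</th><td>23.04.1876</td><td>Bundesgesez über die Ausgabe und Einlösung von...</td><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>1</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td></td></tr>
  </tbody>
</table>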


<a id='mun'></a>
<h3>2.B. Swiss Municipalities by Canton and Georeferences</h3>
<p>As mentioned in the introduction, we want to investigate correlations between the above canton-level votation outcomes and the venue profiles of the Swiss cantons. To infer a canton's venues via the Foursquare API, we need a finer grid of georeferences within each canton. To that end, we retrieve all 2,551 Swiss municipalities listed on <a href="https://en.wikipedia.org/wiki/List_of_municipalities_of_Switzerland">Wikipedia</a>. The venues of each municipality can then be queried from Foursquare, and by associating every municipality with its canton, we can build the venue profile of each canton.</p>
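<p>The aggregation step described above amounts to summing venue-category counts over all municipalities of a canton. The following minimal sketch assumes that the Foursquare results have already been reduced to one list of category names per municipality; this input format and the category names are hypothetical. Converting counts to shares removes the effect of how many venues were found in total per canton.</p>

```python
from collections import Counter, defaultdict


def canton_profiles(records):
    """Sum venue-category counts per canton.

    records: iterable of (canton, municipality, categories) tuples, where
    categories is a list of venue category names found in that municipality.
    Returns a dict mapping canton -> Counter(category -> count).
    """
    profiles = defaultdict(Counter)
    for canton, _municipality, categories in records:
        profiles[canton].update(categories)
    return dict(profiles)


def to_shares(profile):
    """Turn a Counter of category counts into category shares summing to 1."""
    total = sum(profile.values())
    return {category: count / total for category, count in profile.items()} if total else {}


# Hypothetical Foursquare results for three municipalities
records = [
    ("Aargau", "Aarau", ["Café", "Supermarket", "Café"]),
    ("Aargau", "Aarburg", ["Supermarket"]),
    ("Uri", "Altdorf", ["Hotel"]),
]
profiles = canton_profiles(records)
print(profiles["Aargau"])
print(to_shares(profiles["Aargau"]))  # → {'Café': 0.5, 'Supermarket': 0.5}
```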

In [6]:
municipality_link = 'https://en.wikipedia.org/wiki/List_of_municipalities_of_Switzerland'                 # the link to the wikipedia page on Swiss municipalities
municipality_doc  = requests.get(municipality_link).text                                                  # pulling the html document pointed to by the link
municipality_soup = bs(municipality_doc,'html.parser')                                                    # parsing the html document into an XML-like tree structure
municipality_rows = municipality_soup.find('table').findAll('tr')[1:]                                     # searching the tree for the table of municipalities and storing all but its first row in a list

In [20]:
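municipality_df = pd.DataFrame(index=range(len(municipality_rows)),columns=['canton','municipality'])     # allocating dataframe with one row per municipality
                                                                                                          #
for i in tqdm(municipality_df.index):                                                                     # looping through all table rows, i.e., all municipalities
    municipality_df.loc[i,'municipality'] = municipality_rows[i].findAll('td')[0].string                  # name of municipality extracted from 0th column
    municipality_df.loc[i,'canton']       = municipality_rows[i].findAll('td')[1].string                  # name of canton extracted from 1st column
                                                                                                          #
municipality_df = municipality_df.sort_values(['canton','municipality']).reset_index(drop=True)           # sorting by canton, then by municipality, and renumbering the index
print("Dataframe of "+str(municipality_df.shape[0])+" municipalities successfully constructed!")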

100%|██████████| 2551/2551 [00:02<00:00, 1045.04it/s]

Dataframe of 2551 municipalities successfully constructed!

In [21]:
municipality_df.head()

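<table>
  <thead>
    <tr><th></th><th>canton</th><th>municipality</th></tr>
  </thead>
  <tbody>
    <tr><th>0</th><td>Aargau</td><td>Aarau</td></tr>
    <tr><th>1</th><td>Aargau</td><td>Aarburg</td></tr>
    <tr><th>2</th><td>Aargau</td><td>Abtwil</td></tr>
    <tr><th>3</th><td>Aargau</td><td>Ammerswil</td></tr>
    <tr><th>4</th><td>Aargau</td><td>Aristau</td></tr>
  </tbody>
</table>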
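<p>Before the two dataframes can be combined, their canton labels must match. The canton columns of <code>plebiscite_df</code> use German names (e.g. <code>Luzern</code>, <code>Tessin</code>, <code>Waadt</code>, <code>Genf</code>), while an English-language source such as the Wikipedia list may spell some cantons differently. The sketch below checks for mismatches and renames; the entries of the hypothetical <code>CANTON_DE</code> mapping are assumptions about the English spellings and should be verified against <code>municipality_df['canton'].unique()</code>. The mismatch check would also reveal further differences, e.g. in how the half-cantons are written.</p>

```python
# Hypothetical mapping from English spellings to the German labels used in plebiscite_df
CANTON_DE = {
    "Zurich": "Zürich",
    "Lucerne": "Luzern",
    "Fribourg": "Freiburg",
    "Grisons": "Graubünden",
    "Ticino": "Tessin",
    "Vaud": "Waadt",
    "Valais": "Wallis",
    "Neuchâtel": "Neuenburg",
    "Geneva": "Genf",
}


def unmatched(names, reference):
    """Names that do not occur among the reference labels, sorted for readability."""
    return sorted(set(names) - set(reference))


def harmonize(names, mapping=CANTON_DE):
    """Translate names via the mapping; names without an entry are kept as they are."""
    return [mapping.get(name, name) for name in names]


plebiscite_cantons = ["Zürich", "Bern", "Luzern", "Tessin", "Genf"]  # subset of plebiscite_df columns
wiki_cantons = ["Zurich", "Bern", "Lucerne", "Ticino", "Geneva"]     # hypothetical spellings
print(unmatched(wiki_cantons, plebiscite_cantons))             # → ['Geneva', 'Lucerne', 'Ticino', 'Zurich']
print(unmatched(harmonize(wiki_cantons), plebiscite_cantons))  # → []
```

<p>With pandas, the same mapping could be applied to the whole column via <code>municipality_df['canton'].replace(CANTON_DE)</code>.</p>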


<a id='four'></a>
<h3>2.C. Foursquare Location Data</h3>
<a href="https://foursquare.com/">Source</a>
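<p>As a sketch of one possible way to request venues per municipality, the following builds a query URL for Foursquare's legacy v2 <code>venues/explore</code> endpoint, which accepts a free-text place name via the <code>near</code> parameter; since the municipality list carries no coordinates, this avoids a separate geocoding step. The endpoint's availability, the version date passed as <code>v</code>, and the <code>limit</code> value are assumptions to check against Foursquare's current documentation (newer accounts may have to use the Places API instead), and the credentials are placeholders.</p>

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Legacy v2 endpoint (assumption: still available for your credentials)
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_url(municipality, canton, client_id, client_secret,
                version="20200101", limit=50):
    """Build a hypothetical venues/explore query URL for one municipality.

    The place is passed as free text via `near`; adding canton and country
    helps to disambiguate municipality names. `version` is the YYYYMMDD date
    expected by the v2 `v` parameter (value chosen arbitrarily here).
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "near": f"{municipality}, {canton}, Switzerland",
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)


url = explore_url("Aarau", "Aargau", "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
print(parse_qs(urlparse(url).query)["near"])  # → ['Aarau, Aargau, Switzerland']
```

<p>The JSON response could then be fetched with <code>requests.get(url).json()</code> and reduced to one list of venue categories per municipality.</p>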

<a id='meth'></a>
<hr>
<h2>3. Methodology</h2>

<a id='ana'></a>
<hr>
<h2>4. Analysis</h2>

<a id='res'></a>
<hr>
<h2>5. Results</h2>

<a id='con'></a>
<hr>
<h2>6. Conclusion</h2>

<a id='ref'></a>
<hr>
<h2>7. References</h2>