# Automatic collection and processing of data on the results of the largest federal elections in the Russian Federation, 2004-2020
## The data collection part

Website of the Central Election Commission (CEC) for data collection (may not be available from some countries): http://www.vybory.izbirkom.ru/region/izbirkom

## All-Russian voting on amendments to the Russian Constitution

In this section I will collect extended data on the results of voting on amendments to the Russian Constitution in 2020. Most of the code for parsing has been put into separate functions, which can be found in the `parsing_tools` module.

Please note that the code runs slowly on purpose: it mimics the behavior of a real user on the Central Election Commission website so that it is not recognized as a robot.
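One common way to get this kind of pacing is to pause for a random interval between page loads. The sketch below is only an illustration: the helper name `polite_pause` and the delay bounds are assumptions, not the values used inside `parsing_tools`.

```python
import random
import time


def polite_pause(min_seconds=2.0, max_seconds=6.0, sleep=time.sleep):
    """Sleep for a random interval and return the chosen delay in seconds.

    Hypothetical helper: the bounds are placeholders, not the delays
    actually used in parsing_tools.
    """
    # A uniformly random delay makes the request rhythm irregular
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling such a helper between page loads keeps the request rate low and irregular. The `sleep` argument exists only so that the function can be exercised without actually waiting.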

In [None]:
# version specifications as well as all dependencies can be found in the requirements.txt file
# !pip install -r requirements.txt

In [1]:
# import dependencies
import pandas as pd
import requests
import datetime as dt

from bs4 import BeautifulSoup
from time import sleep
from lxml import etree

from selenium import webdriver as wb
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By

from IPython.display import clear_output

### Collect links to results by region
First, let's launch the Google Chrome browser in automatic mode:

In [2]:
# Set path_to_chromedriver to the location of
# the chromedriver file on your computer

# This file, which is needed to control the browser automatically,
# can be downloaded from the following link:
# https://googlechromelabs.github.io/chrome-for-testing/
path_to_chromedriver = '<your-path-to-chromedriver>'

In [20]:
# Start Google Chrome in automatic mode
s = Service(path_to_chromedriver)
br = wb.Chrome(service=s)

Let's collect links to the voting results of the regional election commissions (as well as the commissions located outside the Russian Federation):

In [4]:
# Function to search for links to results by region
from parsing_tools.parse_constitution import find_regional_links

In [28]:
links = find_regional_links(br)

[INFO] 87 links found


### Collecting electoral statistics
Let's open each of the 87 links and collect information about voting results from there:

In [6]:
# Function for parsing results from a single region page
from parsing_tools.parse_constitution import parse_one_regional_page

In [78]:
# Collecting information by region
regional_dfs = []
for i, link in enumerate(links):
    print(f'[INFO] Analysis of the region no. {i + 1} out of {len(links)}')
    # In this loop, the wait for code execution will be long,
    # to avoid blocking and captcha
    regional_dfs.append(parse_one_regional_page(link))

    clear_output()

# Concatenate all regional results once and
# reset the indices to 0 to len(stacked_df) - 1
stacked_df = pd.concat(regional_dfs, ignore_index=True)
# Save the collected data
stacked_df.to_csv('data/data-raw/constitution_2020_data_raw.csv')

We have successfully collected the amendment voting data for all available territorial election commissions (TECs)! The processing and analysis of this data are presented in the file `Data-processing-and-analysis.ipynb`.

## Russian Presidential Elections 2004-2018
In this section we will collect extended data on the results of four presidential elections in Russia: in 2004, 2008, 2012 and 2018 for all available regions and TECs within them. 

Most of the parsing code in this section has also been put into separate functions, which can be found in the `parsing_tools` module.
A separate function has been written to handle each of the 4 votes, as the structure of data storage on the site differs from election to election, but is identical within a single vote. 

Note that the code runs slowly on purpose: it mimics the behavior of a real user on the CEC site so that it is not recognized as a robot.

### Collect links to results by region
First, let's launch the Google Chrome browser in automatic mode:

In [3]:
# Start Google Chrome in automatic mode
s = Service(path_to_chromedriver)
br = wb.Chrome(service=s)

Let's collect links to the voting results of the regional election commissions (as well as the commissions located outside the Russian Federation):

In [5]:
from parsing_tools.parse_presidential import find_presidential_elections_links

In [32]:
region_links_all_years = find_presidential_elections_links(br, s)

[INFO] Found a link to the presidential election: http://www.vybory.izbirkom.ru/region/izbirkom?action=show&global=1&vrn=1001000882950&region=0&prver=0&pronetvd=null
[INFO] Found a link to the presidential election: http://www.vybory.izbirkom.ru/region/izbirkom?action=show&global=1&vrn=100100022176412&region=0&prver=0&pronetvd=null
[INFO] Found a link to the presidential election: http://www.vybory.izbirkom.ru/region/izbirkom?action=show&global=1&vrn=100100031793505&region=0&prver=0&pronetvd=null
[INFO] Found a link to the presidential election: http://www.vybory.izbirkom.ru/region/izbirkom?action=show&global=1&vrn=100100084849062&region=0&prver=0&pronetvd=null
[INFO] 91 links were found for 2004
[INFO] 85 links were found for 2004
[INFO] 85 links were found for 2004
[INFO] 87 links were found for 2004
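The four election links printed above differ only in their `vrn` query parameter. As a small sketch, the standard library's `urllib.parse` can pull that value out of a link. Treating `vrn` as the identifier of an election is my reading of these URLs rather than something documented here, and `extract_vrn` is a hypothetical helper that is not part of `parsing_tools`.

```python
from urllib.parse import parse_qs, urlsplit


def extract_vrn(url):
    """Return the value of the `vrn` query parameter, or None if it is absent."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("vrn")
    return values[0] if values else None


# First link from the output above
first_link = (
    "http://www.vybory.izbirkom.ru/region/izbirkom"
    "?action=show&global=1&vrn=1001000882950&region=0&prver=0&pronetvd=null"
)
print(extract_vrn(first_link))
```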


### Collecting electoral statistics for each of the 4 elections
First, let's define a logger using Python's built-in [`logging`](https://docs.python.org/3/library/logging.html) library.
It is needed to track and save the errors that occur while processing a large amount of data; doing this with the `print()` function would be impractical here.
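A minimal sketch of the pattern a logger makes possible: a failure on one page is recorded and the loop moves on, instead of stopping the whole run. The names `collect_all`, `fake_parse` and `demo_scrape_log` are made up for this illustration; how the functions in `parsing_tools` actually use the logger is not shown in this notebook.

```python
import logging

# Hypothetical logger name, separate from the notebook's own logger
demo_logger = logging.getLogger("demo_scrape_log")


def collect_all(links, parse_page, log):
    """Parse every link, logging failures instead of stopping the run."""
    results = []
    for link in links:
        try:
            results.append(parse_page(link))
        except Exception:
            # log.exception writes the message at ERROR level plus the traceback
            log.exception("Failed to parse %s", link)
    return results


def fake_parse(link):
    # Stand-in parser for the demo: fails on one specific link
    if link == "bad":
        raise ValueError("no results table")
    return link.upper()


collected = collect_all(["a", "bad", "b"], fake_parse, demo_logger)
```

Here `collected` holds the results of the two pages that parsed, while the failed one leaves a record in the log.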

In [5]:
import logging

# Define the logger
# Keep the default level: the logger inherits WARNING from the root logger,
# so warnings, errors and critical messages (numeric levels 30, 40 and 50) are recorded
logger = logging.getLogger("high_level_log")

# Define a handler that writes the logs to a file
handler_for_logger = logging.FileHandler("errors.log", mode='w')

# Define a formatter to standardize the output format
# Based on docs: https://docs.python.org/3/library/logging.html#logrecord-attributes
# Format: date, time, logger name, error level, comment (message)
handler_for_logger.setFormatter(logging.Formatter(fmt="%(asctime)s %(name)s %(levelname)s %(message)s"))

# Add the handler to the logger
logger.addHandler(handler_for_logger)

Let's collect the results for all the elections under consideration in turn, using separate functions:

In [6]:
from parsing_tools.parse_presidential import parse_2004_presidential_elections
from parsing_tools.parse_presidential import parse_2008_presidential_elections
from parsing_tools.parse_presidential import parse_2012_presidential_elections
from parsing_tools.parse_presidential import parse_2018_presidential_elections

In [16]:
# Go through all elections and each region, collect data
# In the region_links_all_years list, the 4 sets are in chronological
# order of elections (see the function for collecting links)
parsers_by_year = [
    (2004, parse_2004_presidential_elections),
    (2008, parse_2008_presidential_elections),
    (2012, parse_2012_presidential_elections),
    (2018, parse_2018_presidential_elections),
]

# Collected data frames, keyed by election year
presidential_dfs = {}

for (year, parse_elections), region_links in zip(parsers_by_year, region_links_all_years):
    # Collect the data for one election
    df = parse_elections(s, region_links, logger)
    # Reset the index
    df = df.reset_index(drop=True)
    # Save the data for this year
    df.to_csv(f'data/data-raw/presidential_{year}_data_raw.csv')
    presidential_dfs[year] = df

We have successfully collected data on the Russian presidential elections from 2004 to 2018! The processing and analysis of this data are presented in the file `Data-processing-and-analysis.ipynb`.

### Analyzing errors in the `errors.log` file

The error file shows that the parser failed to collect the results of the 2004 presidential election in 6 regions:
* Nizhny Novgorod region
* Moscow city
* Yaroslavl region
* Kursk region
* Rostov region
* Murmansk region

However, this is not a bug in my program: if you follow the corresponding links and look up the 2004 presidential election in these regions, you will see that the site really has no results for them, although an entry for the election exists. I should note that this would have been very difficult to detect without the logger!
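For reference, here is a rough sketch of how lines of `errors.log` could be split back into fields. It relies on the formatter defined earlier (`%(asctime)s %(name)s %(levelname)s %(message)s`) and on the default `asctime` layout, which contains a single space between date and time. The helper `parse_log_line` and the sample line are hypothetical; the real message text written by `parsing_tools` may differ.

```python
LEVEL_NAMES = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def parse_log_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    parts = line.rstrip("\n").split(" ", 4)
    # Lines such as traceback continuations do not follow the formatter layout
    if len(parts) != 5 or parts[3] not in LEVEL_NAMES:
        return None
    date, time_of_day, name, level, message = parts
    return {
        "timestamp": f"{date} {time_of_day}",
        "logger": name,
        "level": level,
        "message": message,
    }


# Hypothetical sample line, not copied from the real errors.log
sample = "2024-01-01 12:00:00,123 high_level_log ERROR No results table for region X\n"
record = parse_log_line(sample)
```

Filtering the parsed records by `level` gives a quick list of the regions that need a manual check on the site.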

In the user interface (see below), where there should be a link to the tables with results, there is an empty space:

<img src="https://www.dropbox.com/scl/fi/clknzma1qlraewz67xcjx/City-of-Moscow.png?rlkey=egkdwdas5pqi1g8fxd5jqzxx1&dl=1" align="left" width="40%" style="margin-left: 20px; margin-bottom: 5px; margin-top: 5px; margin-right: 10px; clear: right">

<img src="https://www.dropbox.com/scl/fi/dzuqenugak1gyshiwi9ef/Kursk-region.png?rlkey=9vif5uquiw1rte0mvqv3qtto0&dl=1" align="left" width="40%" style="margin-left: 20px; margin-bottom: 5px; margin-top: 5px; margin-right: 10px; clear: right">

<img src="https://www.dropbox.com/scl/fi/bxnvbz79afxthesb1jmzb/Murmansk-region.png?rlkey=fcf4iqnz9ivpv6hu6o5925a8w&dl=1" align="left" width="40%" style="margin-left: 20px; margin-bottom: 5px; margin-top: 5px; margin-right: 10px; clear: right">

<img src="https://www.dropbox.com/scl/fi/3zwti35eqxfpxy95zhgdu/Nizhny-Novgorod-region.png?rlkey=pymzp1skt99u630oxvv48d1yk&dl=1" align="left" width="40%" style="margin-left: 20px; margin-bottom: 5px; margin-top: 5px; margin-right: 10px; clear: right">

<img src="https://www.dropbox.com/scl/fi/zs9t15qdyuhx782td9afs/Rostov-region.png?rlkey=47nq83mhpf0ao7r0nh6q1c5st&dl=1" align="left" width="40%" style="margin-left: 20px; margin-bottom: 5px; margin-top: 5px; margin-right: 10px; clear: right">

<img src="https://www.dropbox.com/scl/fi/05q98rgvy62rbdcguwwvn/Yaroslavl-region.png?rlkey=9gf8ysmkwrrgk6043wjf19er1&dl=1" align="left" width="40%" style="margin-left: 20px; margin-bottom: 5px; margin-top: 5px; margin-right: 10px; clear: right">






