Data QA Case: ``mggg-states``
=============================

The steps below run automated data quality checks on ``mggg-states`` data.

*Note:* the automated checks are not exhaustive; further manual checks are still required.

Step 0. Setup
----------------

In [1]:
# !pip3 install numpy
# !pip3 install pandas
# !pip3 install geopandas
# !pip3 install wikipedia

# !pip3 install git+https://github.com/KeiferC/gdutils.git

In [2]:
import numpy as np
import pandas as pd
import geopandas as gpd
import json
import wikipedia
import os
import re

import gdutils.datamine as dm
import gdutils.dataqa as dq
import gdutils.extract as et

from typing import Any, List, Tuple, Dict, Hashable, Union, NoReturn

Step 1. Data collection
---------------------------

In [3]:
state_names = [
    'Alabama', 'Alaska','Arizona', 'Arkansas', 'California', 
    'Colorado', 'Connecticut', 'Delaware',  'Florida', 'Georgia', 
    'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 
    'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 
    'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 
    'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 
    'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 
    'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 
    'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 
    'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming']

state_abbreviations = [
    'AL', 'AK', 'AZ', 'AR', 'CA', 
    'CO', 'CT', 'DE', 'FL', 'GA', 
    'HI', 'ID', 'IL', 'IN', 'IA', 
    'KS', 'KY', 'LA', 'ME', 'MD', 
    'MA', 'MI', 'MN', 'MS', 'MO', 
    'MT', 'NE', 'NV', 'NH', 'NJ', 
    'NM', 'NY', 'NC', 'ND', 'OH', 
    'OK', 'OR', 'PA', 'RI', 'SC', 
    'SD', 'TN', 'TX', 'UT', 'VT', 
    'VA', 'WA', 'WV', 'WI', 'WY']

states = list(zip(state_names, state_abbreviations))

__Step 1.1.__ Gather ``mggg-states`` data

In [4]:
# commented out to save local space and time
# # this will take some time to complete
# dm.clone_gh_repos(account='mggg-states', account_type='orgs', 
#                   outpath=os.path.join('src', 'mggg'))

In [5]:
mggg_gdfs = {}

# Store each extracted GeoDataFrame in a dictionary keyed by the name of the
# file it was read from, with the '.zip' extension removed
for filepath in dm.list_files_of_type('.zip', os.path.join('src', 'mggg')):
    mggg_gdfs[os.path.basename(filepath)[:-4]] = et.read_file(filepath).extract()
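
The slice ``[:-4]`` above assumes every archive name ends in a four-character extension such as ``.zip``. A minimal sketch of an alternative that uses only the standard library is shown below. The helper name ``archive_key`` and the example path are hypothetical and are not part of ``gdutils``.

```python
import os


def archive_key(filepath):
    """Return the file name without its directory or final extension."""
    # splitext drops only the last extension, whatever its length
    return os.path.splitext(os.path.basename(filepath))[0]


# Hypothetical path, for illustration only
print(archive_key(os.path.join('src', 'mggg', 'example-shapefiles.zip')))
```

Only the final extension is removed, so a name like ``a.tar.gz`` would become ``a.tar`` rather than ``a``.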

__Step 1.2.__ Gather MEDSL data for comparison purposes

In [6]:
# Print available MEDSL data to select applicable datasets

# # commented out to save local space and time
# print('{:27} : {}'.format('Repo Name', 'Repo URL'))
# print('------------------------------------------------------------------')
# for (repo, url) in dm.list_gh_repos(account='MEDSL', account_type='orgs'):
#     print("{:27} : {}".format(repo, url))

In [7]:
medsl_repos = ['official-precinct-returns', # precinct-level 2016 election results
               '2018-elections-official']   # constituency-level 2018 election results

# commented out to save local space and time
# # this will take some time to complete
# dm.clone_gh_repos(account='MEDSL', account_type='orgs', repos=medsl_repos, 
#                   outpath=os.path.join('src', 'medsl'))

In [8]:
medsl_dfs = {}

# Store each extracted GeoDataFrame in a dictionary keyed by the name of the
# file it was read from, with the '.zip' extension removed
for filepath in dm.list_files_of_type('.zip', os.path.join('src', 'medsl')):
    medsl_dfs[os.path.basename(filepath)[:-4]] = et.read_file(filepath).extract()

__Step 1.3.__ Gather Wikipedia data for comparison purposes

In [9]:
# Generate wikipedia page titles

pres_election = ('PRES', 'United States presidential election')
fed_elections = [('SEN',  'United States Senate election'),
                 ('USH',  'United States House of Representatives election')]
election_years_to_check = [2016, 2017, 2018]

def generate_key(yr, ekey, st_abv):
    return ekey + str(yr % 100) + '_' + st_abv

def generate_title(yr, etype, st):
    return str(yr) + ' ' + etype + ' in ' + st

wiki_titles = []
for yr in election_years_to_check:
    # presidential titles are generated only for years divisible by four
    if yr % 4 == 0:
        for st, st_abv in states:
            wiki_titles.append((generate_key(yr, pres_election[0], st_abv),
                                generate_title(yr, pres_election[1], st)))

    for ekey, etype in fed_elections:
        for st, st_abv in states:
            wiki_titles.append((generate_key(yr, ekey, st_abv),
                                generate_title(yr, etype, st)))
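
Each entry in ``wiki_titles`` pairs a short key with a Wikipedia page title. The sketch below restates the two naming rules as plain functions so the resulting formats are easy to inspect. The function names are illustrative only. Zero-padding the two-digit year is an assumption that goes beyond the cell above: ``str(yr % 100)`` would turn 2008 into ``'8'``, although that case does not arise for the years checked here.

```python
def make_key(year, election_key, state_abbreviation):
    """Build a key such as 'PRES16_AL' (zero-padded two-digit year)."""
    return '{}{:02d}_{}'.format(election_key, year % 100, state_abbreviation)


def make_title(year, election_type, state_name):
    """Build a page title such as '2016 <election type> in Alabama'."""
    return '{} {} in {}'.format(year, election_type, state_name)


print(make_key(2016, 'PRES', 'AL'))
print(make_title(2016, 'United States presidential election', 'Alabama'))
```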

In [10]:
# Gather wikipedia page URLs from wiki page titles

wiki_urls = {}
for wiki_title in wiki_titles:
    key, title = wiki_title
    
    try:
        url = wikipedia.page(title=title).url

        if set(title.split(' ')).issubset(
                set(re.findall('[a-zA-Z0-9]+', url))):
            wiki_urls[key] = (title, url)
            
    except Exception:
        continue # it's okay to not find a page
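
The URL check above keeps a page only when every word of the requested title also appears among the alphanumeric tokens of the returned URL. A minimal sketch of that check as a standalone function follows. ``title_matches_url`` is a hypothetical name, and the URLs are written out by hand for illustration rather than fetched.

```python
import re


def title_matches_url(title, url):
    """Return True if every word in the title appears as a token in the URL."""
    url_tokens = set(re.findall('[a-zA-Z0-9]+', url))
    return set(title.split(' ')).issubset(url_tokens)


title = '2016 United States Senate election in Alabama'
exact = 'https://en.wikipedia.org/wiki/2016_United_States_Senate_election_in_Alabama'
other = 'https://en.wikipedia.org/wiki/2016_United_States_Senate_elections'

print(title_matches_url(title, exact))  # True
print(title_matches_url(title, other))  # False
```

The comparison is case-sensitive and works on whole tokens, so it filters out obvious mismatches but does not replace a manual review of the retrieved URLs.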

In [None]:
# Print retrieved page URLs
# Needed to manually verify the URL-to-election mapping, because the
# Wikipedia API returns the best match rather than an exact match

# commented out to save screen space
# for wiki_key in wiki_urls:
#     title, url = wiki_urls[wiki_key]
#     print('{:9} : {}\n\t{}'.format(wiki_key, title, url))

In [12]:
# Gather wikipedia tabular election results

wiki_tables = {}
for wiki_key in wiki_urls:
    try:
        wiki_tables[wiki_key] = pd.read_html(wiki_urls[wiki_key][1])
    except Exception as e:
        print("Unable to gather Wikipedia tabular data:", e)

In [13]:
# Display wikipedia tabular election data
# Needed to find the applicable table, because a page can contain
# several unnamed tables and their order differs from page to page

def print_wiki_tables(key):
    for wiki in wiki_tables:
        if wiki.startswith(key):
            print('================================================')
            print('Wiki: {} '.format(wiki))
            print('================================================')

            for i in range(len(wiki_tables[wiki])):
                print('TABLE {}: ############################\n{}\n\n\n'.format(
                        i, wiki_tables[wiki][i].head()))
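
Because the position of the results table varies from page to page, one way to narrow the manual search is to filter tables by their column labels. The sketch below is an assumption about how that could be done and is not part of the original workflow. ``find_tables_with_columns`` is a hypothetical helper, and the default labels ``Candidate`` and ``Votes`` are taken from the results tables printed further down.

```python
def find_tables_with_columns(tables, required=('Candidate', 'Votes')):
    """Return the indices of tables whose column labels include all of `required`.

    `tables` is any sequence of objects with a `.columns` attribute,
    such as the list of DataFrames returned by `pd.read_html`.
    """
    matches = []
    for index, table in enumerate(tables):
        # Compare labels as strings so integer column labels do not raise
        labels = {str(label) for label in table.columns}
        if set(required).issubset(labels):
            matches.append(index)
    return matches
```

For example, ``find_tables_with_columns(wiki_tables['SEN16_AL'])`` would list the candidate results tables on that page. Each hit still needs a manual look, since primary and general election tables share the same labels.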

In [None]:
# commented out to save screen space
# print_wiki_tables('PRES16')

In [15]:
# Print the tables from the 2016 U.S. Senate election pages
print_wiki_tables('SEN16')

Wiki: SEN16_AL 
TABLE 0: ############################
                                                   0  \
0                                                NaN   
1                     ← 2010 November 8, 2016 2022 →   
2                                             ← 2010   
3  Nominee Richard Shelby Ron Crumpton Party Repu...   
4                                                NaN   

                                                   1       2   3  
0                                                NaN     NaN NaN  
1                     ← 2010 November 8, 2016 2022 →     NaN NaN  
2                                   November 8, 2016  2022 → NaN  
3  Nominee Richard Shelby Ron Crumpton Party Repu...     NaN NaN  
4                                                NaN     NaN NaN  



TABLE 1: ############################
        0                 1       2
0  ← 2010  November 8, 2016  2022 →



TABLE 2: ############################
              0               1             2   3
0  


TABLE 3: ############################
              0            1                2           3
0           NaN          NaN              NaN         NaN
1       Nominee  John McCain  Ann Kirkpatrick  Gary Swing
2         Party   Republican       Democratic       Green
3  Popular vote      1359267          1031245      138634
4    Percentage        53.7%            40.8%        5.5%



TABLE 4: ############################
                                                   0  \
0  U.S. senator before election John McCain Repub...   

                                             1  
0  Elected U.S. Senator John McCain Republican  



TABLE 5: ############################
                                Elections in Arizona
0  Federal government Presidential elections 1912...
1                             Presidential elections
2  1912 1916 1920 1924 1928 1932 1936 1940 1944 1...
3                             Presidential primaries
4  Democratic 2004 2008 2016 2020 Republican 2008...





TABLE 50: ############################
  vteNotable third party performances in United States elections  \
0                          Presidential (since 1832)               
1                            Senatorial (since 1990)               
2                         Gubernatorial (since 1990)               
3  Portal:Politics Third party (United States) Th...               

  vteNotable third party performances in United States elections.1  
0  1832 1848 1856 1860 1892 1912 1924 1948 1968 1...                
1  Virginia 1990 Alaska 1992 Arizona 1992 Hawaii ...                
2  Alaska 1990 Connecticut 1990 Kansas 1990 Maine...                
3  Portal:Politics Third party (United States) Th...                



TABLE 51: ############################
  vte Elections in Arizona                         vte Elections in Arizona.1
0                 Governor  1911 1914 1916 1918 1920 1922 1924 1926 1928 1...
1           U.S. President  1912 1916 1920 1924 1928 1932 1936 1940 1944 1..


TABLE 34: ############################
                              Pollsource Date(s)administered  Samplesize  \
0                         Marist College     May 29–31, 2016        2485   
1                         The Field Poll     May 26–31, 2016        1002   
2  Public Policy Institute of California     May 13–22, 2016         996   
3                              SurveyUSA     May 19–22, 2016        1416   
4  Public Policy Institute of California     May 13–22, 2016         996   

  Margin oferror Tom Del Beccaro (R) KamalaHarris (D) LorettaSánchez (D)  \
0         ± 2.3%                  8%              37%                19%   
1         ± 3.1%                  4%              30%                14%   
2         ± 4.3%                  8%              27%                19%   
3         ± 2.7%                  9%              31%                22%   
4         ± 4.3%                  8%              27%                19%   

  DufSundheim (R) RonUnz (R) Other Undecided  


TABLE 6: ############################
                                     State elections
0                                          2010 2014
1                            Gubernatorial elections
2  1776 1880 1882 1884 1886 1888 1890 1892 1894 1...
3                         Attorney General elections
4                                               2010



TABLE 7: ############################
  Mayoral Elections
0         2015 2019



TABLE 8: ############################
  Mayoral Elections
0    2011 2015 2019



TABLE 9: ############################
     Mayoral Elections
0  2013 2015 2017 2019



TABLE 10: ############################
   Mayoral Elections
0               2017



TABLE 11: ############################
   Mayoral Elections
0               2019



TABLE 12: ############################
       Candidate  Delegates Percentage
0     Dan Carter        907      76.7%
1    August Wolf        123      10.4%
2  Jack Orchulli         20       1.7%
3    Not Present        132 


TABLE 30: ############################
                   Poll source Date(s)administered  Samplesize Margin oferror  \
0                  Mason-Dixon  August 22–24, 2016         400           ± 5%   
1  Florida Atlantic University  August 19–22, 2016         364            NaN   
2  Florida Chamber of Commerce  August 17–22, 2016         258         ± 4.0%   
3           St. Leo University  August 14–18, 2016         532         ± 4.5%   
4           Suffolk University    August 1–3, 2016         194         ± 4.4%   

  AlanGrayson PamKeith LateresaJones PatrickMurphy Other/Undecided  
0         22%       4%             —           55%             19%  
1          8%       7%             —           54%             22%  
2         11%        —             —           40%             38%  
3         17%       8%             —           48%             27%  
4       17.2%     2.4%             –         35.7%           44.7%  



TABLE 31: ############################
  Party     Party


TABLE 63: ############################
             Poll source     Date(s)administered  Samplesize Margin oferror  \
0               iCitizen      August 18–24, 2016         600         ± 4.0%   
1     St. Leo University      August 14–18, 2016        1380         ± 3.0%   
2    Monmouth University      August 12–15, 2016         402         ± 4.9%   
3  Quinnipiac University  July 30–August 7, 2016        1056         ± 3.0%   
4     Suffolk University        August 1–3, 2016         500         ± 4.4%   

  MarcoRubio (R) AlanGrayson (D) Other Undecided  
0            44%             39%     —       16%  
1            47%             34%     —       19%  
2            50%             39%    5%        6%  
3            49%             43%    1%        8%  
4            45%             31%     —       24%  



TABLE 64: ############################
  Party      Party.1                Candidate    Votes       %        ±
0   NaN   Republican  Marco Rubio (incumbent)  4835191  51.98%   


TABLE 6: ############################
                                   General elections
0                                               2014
1                            Gubernatorial elections
2  1928 1930 1932 1934 1936 1938 1940 1942 1944 1...
3                        State legislature elections
4                                               2006



TABLE 7: ############################
     Gubernatorial elections
0  Amendment 2 Proposition 2



TABLE 8: ############################
  Mayoral elections
0         2015 2019



TABLE 9: ############################
         Party      Party.1               Candidate   Votes        %
0          NaN   Republican  Mike Crapo (incumbent)  119633  100.00%
1  Total votes  Total votes             Total votes  119633  100.00%



TABLE 10: ############################
         Party      Party.1       Candidate  Votes        %
0          NaN   Democratic  Jerry Sturgill  26471  100.00%
1  Total votes  Total votes     Total votes  26471  1


TABLE 22: ############################
                                Hypothetical polling           Unnamed: 1  \
0  with Baron Hill Poll source Date(s)administere...                  NaN   
1                                        Poll source  Date(s)administered   
2                                Bellwether Research      May 11–15, 2016   
3                                         WTHR/Howey    April 18–21, 2016   
4                                        Poll source  Date(s)administered   

   Unnamed: 2      Unnamed: 3          Unnamed: 4     Unnamed: 5 Unnamed: 6  
0         NaN             NaN                 NaN            NaN        NaN  
1  Samplesize  Margin oferror       ToddYoung (R)  BaronHill (D)  Undecided  
2         600          ± 4.0%                 36%            22%        30%  
3         500          ± 4.3%                 48%            30%        22%  
4  Samplesize  Margin oferror  MarlinStutzman (R)  BaronHill (D)  Undecided  



TABLE 23: ################


TABLE 13: ############################
                  Poll source          Date(s)administered  Samplesize  \
0                SurveyMonkey           November 1–7, 2016        1311   
1                SurveyMonkey  October 31–November 6, 2016        1139   
2  Fort Hays State University           November 1–3, 2016         313   
3                SurveyMonkey  October 28–November 3, 2016        1162   
4                SurveyMonkey  October 27–November 2, 2016        1123   

  Margin oferror JerryMoran (R) PatrickWiesner (D) RobertGarrard (L) Undecided  
0         ± 4.6%            59%                37%                 —        4%  
1         ± 4.6%            58%                38%                 —        4%  
2         ± 3.5%            77%                13%               10%        0%  
3         ± 4.6%            58%                38%                 —        4%  
4         ± 4.6%            57%                38%                 —        5%  



TABLE 14: ################


TABLE 23: ############################
                             Source   Ranking             As of
0    The Cook Political Report[133]    Safe R  November 8, 2016
1        Sabato's Crystal Ball[134]  Likely R  November 7, 2016
2  Rothenberg Political Report[135]    Safe R  November 3, 2016
3                    Daily Kos[136]    Safe R  November 8, 2016
4          Real Clear Politics[137]  Likely R  November 8, 2016



TABLE 24: ############################
         Party          Party.1           Candidate            Votes  \
0          NaN       Republican  John Neely Kennedy           536191   
1          NaN       Democratic     Foster Campbell           347816   
2  Total votes      Total votes         Total votes        '884,007'   
3          NaN  Republican hold     Republican hold  Republican hold   

                 %                ±  
0           60.65%           +4.09%  
1           39.35%           +1.68%  
2         '100.0%'              NaN  
3  Republican hold  R


TABLE 12: ############################
                           Poll source Date(s)administered  Samplesize  \
0  St. Louis Post-Dispatch/Mason-Dixon    July 23–27, 2016         400   

  Margin oferror RoyBlunt KristiNichols BernieMowinski RyanLuethy Undecided  
0         ± 5.0%      66%            9%             5%         1%       19%  



TABLE 13: ############################
                                Hypothetical polling           Unnamed: 1  \
0  Poll source Date(s)administered Samplesize Mar...                  NaN   
1                                        Poll source  Date(s)administered   
2                           Remington Research Group         January 2015   
3                           Remington Research Group   February 2–3, 2015   

   Unnamed: 2      Unnamed: 3 Unnamed: 4   Unnamed: 5 Unnamed: 6 Unnamed: 7  
0         NaN             NaN        NaN          NaN        NaN        NaN  
1  Samplesize  Margin oferror   RoyBlunt  JohnBrunner      Other  Undec


TABLE 7: ############################
  Municipal elections
0                2009
1   Mayoral elections
2                2017



TABLE 8: ############################
                                Hypothetical polling           Unnamed: 1  \
0  Poll source Date(s)administered Samplesize Mar...                  NaN   
1                                        Poll source  Date(s)administered   
2                              Public Policy Polling     April 9–13, 2015   

   Unnamed: 2      Unnamed: 3   Unnamed: 4       Unnamed: 5 Unnamed: 6  \
0         NaN             NaN          NaN              NaN        NaN   
1  Samplesize  Margin oferror  KellyAyotte  OvideLamontagne      Other   
2         358             ± ?          57%              32%          —   

  Unnamed: 7  
0        NaN  
1  Undecided  
2        12%  



TABLE 9: ############################
             Poll source Date(s)administered  Samplesize Margin oferror  \
0  Public Policy Polling    April 9–13, 2015      


TABLE 12: ############################
             Poll source Date(s)administered  Samplesize Margin oferror  \
0  Public Policy Polling      July 2–6, 2015         288         ± 5.8%   

  RichardBurr MarkMeadows Undecided  
0         62%          9%       28%  



TABLE 13: ############################
         Party      Party.1                 Candidate    Votes        %
0          NaN   Republican  Richard Burr (incumbent)   622074   61.41%
1          NaN   Republican              Greg Brannon   255030   25.17%
2          NaN   Republican               Paul Wright    85944    8.48%
3          NaN   Republican           Larry Holmquist    50010    4.94%
4  Total votes  Total votes               Total votes  1013058  100.00%



TABLE 14: ############################
             Poll source   Date(s)administered  Samplesize Margin oferror  \
0  Public Policy Polling     March 11–13, 2016         746          ±3.6%   
1  High Point University      March 9–10, 2016         669     


TABLE 15: ############################
  vte(2015 ←) 2016 United States elections (→ 2017)  \
0                                     U.S.President   
1                                        U.S.Senate   
2                                         U.S.House   
3                                         Governors   
4                                            Mayors   

  vte(2015 ←) 2016 United States elections (→ 2017).1  
0  Alabama Alaska American Samoa Arizona Arkansas...   
1  Alabama Alaska Arizona Arkansas California Col...   
2  Alabama Alaska American Samoa Arizona Arkansas...   
3  American Samoa Delaware Indiana Missouri Monta...   
4  Augusta, GA Bakersfield, CA Baltimore, MD Bato...   



Wiki: SEN16_OH 
TABLE 0: ############################
                                                   0  \
0                                                NaN   
1                     ← 2010 November 8, 2016 2022 →   
2                                             ← 2010   
3  Nominee R


TABLE 7: ############################
                      List of Oregon ballot measures
0  1990 1992 1994 1996 1997 1998 2002 2003 2004 2...



TABLE 8: ############################
                        Portland elections
0  1996 2000 2004 2006 2008 2012 2016 2020



TABLE 9: ############################
         Party      Party.1    Candidate   Votes        %
0          NaN   Democratic    Ron Wyden  501903   83.20%
1          NaN   Democratic  Kevin Stine   78287   12.98%
2          NaN   Democratic  Paul Weaver   20346    3.37%
3          NaN          NaN    write-ins    2740    0.45%
4  Total votes  Total votes  Total votes  603276  100.00%



TABLE 10: ############################
  Party     Party.1      Candidate   Votes       %
0   NaN  Republican  Mark Callahan  123473  38.24%
1   NaN  Republican  Sam Carpenter  104494  32.36%
2   NaN  Republican   Faye Stewart   57399  17.78%
3   NaN  Republican  Dan Laschober   34157  10.58%
4   NaN         NaN      write-ins    3357


TABLE 40: ############################
  vte(2015 ←) 2016 United States elections (→ 2017)  \
0                                     U.S.President   
1                                        U.S.Senate   
2                                         U.S.House   
3                                         Governors   
4                                            Mayors   

  vte(2015 ←) 2016 United States elections (→ 2017).1  
0  Alabama Alaska American Samoa Arizona Arkansas...   
1  Alabama Alaska Arizona Arkansas California Col...   
2  Alabama Alaska American Samoa Arizona Arkansas...   
3  American Samoa Delaware Indiana Missouri Monta...   
4  Augusta, GA Bakersfield, CA Baltimore, MD Bato...   



Wiki: SEN16_SC 
TABLE 0: ############################
                                                   0  \
0                                                NaN   
1            ← 2014(special) November 8, 2016 2022 →   
2                                    ← 2014(special)   
3  Nominee T

TABLE 17: ############################
            Poll source Date(s)administered  Samplesize Margin oferror  \
0  UtahPolicy/Dan Jones      May 4–12, 2015         803         ± 3.5%   

  MikeLee (R) DougOwens (D) Other Undecided  
0         55%           36%     —       10%  



TABLE 18: ############################
            Poll source     Date(s)administered  Samplesize Margin oferror  \
0  UtahPolicy/Dan Jones  March 30–April 7, 2015         601         ± 4.0%   

  MikeLee (R) JimMatheson (D) BenMcAdams (D) DougOwens (D) JoshRomney (R)  \
0         33%             20%             5%            8%            20%   

  ThomasWright (R) Undecided  
0               2%       12%  



TABLE 19: ############################
         Party               Party.1             Candidate        Votes  \
0          NaN            Republican  Mike Lee (incumbent)       760241   
1          NaN            Democratic            Misty Snow       301860   
2          NaN  Independent American 


TABLE 17: ############################
             Poll source Date(s)administered  Samplesize Margin oferror  \
0  Public Policy Polling     March 6–8, 2015        1071         ± 3.0%   

  RonJohnson (R) MaryBurke (D) Undecided  
0            45%           46%        9%  



TABLE 18: ############################
             Poll source Date(s)administered  Samplesize Margin oferror  \
0  Public Policy Polling     March 6–8, 2015        1071         ± 3.0%   

  RonJohnson (R) MarkPocan (D) Undecided  
0            43%           36%       20%  



TABLE 19: ############################
             Poll source Date(s)administered  Samplesize Margin oferror  \
0  Public Policy Polling     March 6–8, 2015        1071         ± 3.0%   

  RonJohnson (R) GwenMoore (D) Other Undecided  
0            45%           37%     —       18%  



TABLE 20: ############################
             Poll source    Date(s)administered  Samplesize Margin oferror  \
0  Public Policy Polling        M

In [16]:
# Print the tables from the 2016 U.S. House election pages
print_wiki_tables('USH16')

Wiki: USH16_AK 
TABLE 0: ############################
                                                   0  \
0                                                NaN   
1                     ← 2014 November 8, 2016 2018 →   
2                                             ← 2014   
3  Nominee Don Young Steve Lindbeck Jim McDermott...   
4                                                NaN   

                                                   1       2    3  
0                                                NaN     NaN  NaN  
1                     ← 2014 November 8, 2016 2018 →     NaN  NaN  
2                                   November 8, 2016  2018 →  NaN  
3  Nominee Don Young Steve Lindbeck Jim McDermott...     NaN  NaN  
4                                                NaN     NaN  NaN  



TABLE 1: ############################
        0                 1       2
0  ← 2014  November 8, 2016  2018 →



TABLE 2: ############################
              0           1               2    


TABLE 5: ############################
                              Presidential Elections
0  1892 1896 1900 1904 1908 1912 1916 1920 1924 1...
1                             Presidential Primaries
2  .mw-parser-output .nobold{font-weight:normal}D...
3                             U. S. Senate elections
4  1890 1895 1899 1901 1905 1907 1911 1913 1916 1...



TABLE 6: ############################
                             Gubernatorial elections
0  1916 1920 1924 1928 1932 1936 1940 1944 1948 1...



TABLE 7: ############################
         Party      Party.1               Candidate   Votes      %
0          NaN   Republican  Ryan Zinke (incumbent)  144660  100.0
1  Total votes  Total votes             Total votes  144660  100.0



TABLE 8: ############################
         Party      Party.1      Candidate   Votes      %
0          NaN   Democratic  Denise Juneau  112821  100.0
1  Total votes  Total votes    Total votes  112821  100.0



TABLE 9: ###########################


TABLE 5: ############################
                              Presidential elections
0  1792 1796 1800 1804 1808 1812 1816 1820 1824 1...
1                              U.S. Senate elections
2  Class One 1916 1922 1928 1934 1940 1946 1952 1...
3                               U.S. House elections
4  1791 1793 1794/95 1796 1798 1800 1802/03 1804/...



TABLE 6: ############################
                                   General elections
0  2002 2004 2006 2008 2010 2012 2014 2016 2018 2020
1                            Gubernatorial elections
2  1789 1865 1866 1867 1868 1869 1870 1872 1874 1...
3                              Legislative elections
4  Senate 2018 2020 House of Representatives 2018...



TABLE 7: ############################
                                   Mayoral elections
0  City of Burlington 1865 1981 1983 1985 1987 19...



TABLE 8: ############################
         Party      Party.1                Candidate  Votes       %
0          NaN   Democratic 

In [17]:
# Print the tables from the 2017 U.S. Senate election pages
print_wiki_tables('SEN17')

Wiki: SEN17_AL 
TABLE 0: ############################
                                                   0  \
0                                                NaN   
1                    ← 2014 December 12, 2017 2020 →   
2                                             ← 2014   
3                                            Turnout   
4  Nominee Doug Jones Roy Moore Party Democratic ...   

                                                   1       2   3  
0                                                NaN     NaN NaN  
1                    ← 2014 December 12, 2017 2020 →     NaN NaN  
2                                  December 12, 2017  2020 → NaN  
3                                           40.5%[1]     NaN NaN  
4  Nominee Doug Jones Roy Moore Party Democratic ...     NaN NaN  



TABLE 1: ############################
        0                  1       2
0  ← 2014  December 12, 2017  2020 →



TABLE 2: ############################
              0           1           2   3
0      


TABLE 50: ############################
  vte(2016 ←) 2017 United States elections (→ 2018)  \
0                                       U.S. Senate   
1                                        U.S. House   
2                                         Governors   
3                                            Mayors   
4                                            Cities   

  vte(2016 ←) 2017 United States elections (→ 2018).1  
0                                  Alabama (special)   
1  California 34th sp Georgia 6th sp Kansas 4th s...   
2                                New Jersey Virginia   
3  Albuquerque, NM Annapolis, MD Arlington, TX At...   
4              Houston, TX Minneapolis, MN Plano, TX   





In [18]:
# Print the tables from the 2017 U.S. House election pages
print_wiki_tables('USH17')

In [19]:
# Print the tables from the 2018 U.S. Senate election pages
print_wiki_tables('SEN18')

Wiki: SEN18_AZ 
TABLE 0: ############################
                                                   0  \
0                                                NaN   
1                     ← 2012 November 6, 2018 2024 →   
2                                             ← 2012   
3                                            Turnout   
4  Nominee Kyrsten Sinema Martha McSally Party De...   

                                                   1       2   3  
0                                                NaN     NaN NaN  
1                     ← 2012 November 6, 2018 2024 →     NaN NaN  
2                                   November 6, 2018  2024 → NaN  
3                                          64.85%[1]     NaN NaN  
4  Nominee Kyrsten Sinema Martha McSally Party De...     NaN NaN  



TABLE 1: ############################
        0                 1       2
0  ← 2012  November 6, 2018  2024 →



TABLE 2: ############################
              0               1               2   3
0

TABLE 31: ############################
  Campaign finance reports as of October 17, 2018  \
                                Candidate (party)   
0                              Kyrsten Sinema (D)   
1                              Martha McSally (R)   
2        Source: Federal Election Commission[201]   

                                             \
                             Total receipts   
0                               $19,287,249   
1                               $16,211,836   
2  Source: Federal Election Commission[201]   

                                             \
                        Total disbursements   
0                               $20,249,341   
1                               $13,688,178   
2  Source: Federal Election Commission[201]   

                                             
                               Cash on hand  
0                                $1,301,542  
1                                $2,523,657  
2  Source: Federal Election Commission[


TABLE 41: ############################
  Poll source Date(s)administered  Samplesize Marginof error Kevinde León(D)  \
0   SurveyUSA   March 22–25, 2018         517         ± 5.0%              5%   
1   SurveyUSA   January 7–9, 2018         506         ± 4.4%              4%   

  DianneFeinstein(D) TimothyCharlesKalemkarian(R) CarenLancona(R)  \
0                31%                           5%              2%   
1                34%                           6%              5%   

  PatrickLittle(R) JohnMelendez(D) StephenSchrader(R) Other /Undecided  
0               5%              5%                 7%         42%[131]  
1               5%              2%                 5%         38%[132]  



TABLE 42: ############################
  Pollsource Date(s)administered  Samplesize Marginof error Kevinde León(D)  \
0  SurveyUSA     January–9, 2018         506         ± 4.4%              3%   

  DianneFeinstein(D) TimothyCharlesKalemkarian(R) CarenLancona(R)  \
0                29%  


TABLE 15: ############################
                          Source    Ranking               As of
0  The Cook Political Report[21]    Solid D  September 28, 2018
1           Inside Elections[22]    Solid D  September 14, 2018
2      Sabato's Crystal Ball[23]     Safe D   September 6, 2018
3                  Daily Kos[24]     Safe D  September 17, 2018
4                   Fox News[25]  Likely D^  September 19, 2018



TABLE 16: ############################
             Poll source            Date(s)administered  Samplesize  \
0       Gravis Marketing  October 30 – November 1, 2018         681   
1        Emerson College            October 27–29, 2018         780   
2  Quinnipiac University            October 22–28, 2018        1201   
3  Quinnipiac University              October 3–8, 2018         767   
4       Gravis Marketing             August 24–27, 2018         606   

  Marginof error ChrisMurphy (D) MatthewCorey (R) Other Undecided  
0         ± 3.8%             58%       


TABLE 5: ############################
                                      U.S. President
0  1960 1964 1968 1972 1976 1980 1984 1988 1992 1...
1                                        U.S. Senate
2  1959 1962 1964 1968 1970 1974 1976 1980 1982 1...
3                      U.S. House of Representatives
4  1900 (Terr) 1902 (Terr) 1922 (Terr) 1926 (Terr...



TABLE 6: ############################
                             Gubernatorial elections
0  1959 1962 1966 1970 1974 1978 1982 1986 1990 1...
1                             State Senate elections
2                                          2012 2018
3           State House of Representatives elections
4                                          2012 2018



TABLE 7: ############################
          1998
0  Amendment 2



TABLE 8: ############################
               Mayoral elections
0  2010 (special) 2012 2016 2020



TABLE 9: ############################
                                        Mazie Hirono
0  U.S. Sena


TABLE 4: ############################
                                  Elections in Maine
0  Federal offices U.S. President 1820 1824 1828 ...
1                                     U.S. President
2  1820 1824 1828 1832 1836 1840 1844 1848 1852 1...
3                                        U.S. Senate
4  1820 1821 1823 1823 1827 1829 sp 1835 1835 sp ...



TABLE 5: ############################
                                      U.S. President
0  1820 1824 1828 1832 1836 1840 1844 1848 1852 1...
1                                        U.S. Senate
2  1820 1821 1823 1823 1827 1829 sp 1835 1835 sp ...
3                      U.S. House of Representatives
4  1820–1821 1822 sp 2nd 1823 1824–1825 1826–1827...



TABLE 6: ############################
                             Gubernatorial elections
0  1820 1848 1880 1936 1938 1940 1942 1944 1946 1...
1                             State Senate elections
2                                          2018 2020
3                              


TABLE 21: ############################
                      Poll source  Date(s)administered Samplesize  \
0                         MassINC  October 25–28, 2018        502   
1              Suffolk University  October 24–28, 2018        500   
2  Western New England University  October 10–27, 2018     402 LV   
3  Western New England University  October 10–27, 2018     485 RV   
4                    UMass Lowell    October 1–7, 2018     485 LV   

  Marginof error ElizabethWarren (D) GeoffDiehl (R) ShivaAyyadurai (I) Other  \
0         ± 4.4%                 54%            32%                 6%    3%   
1         ± 4.4%                 56%            34%                 4%     –   
2         ± 5.0%                 57%            27%                 7%     –   
3         ± 4.0%                 54%            27%                 6%     –   
4         ± 5.6%                 56%            31%                 8%    3%   

  Undecided  
0        3%  
1        5%  
2        8%  
3       


TABLE 1: ############################
        0                 1       2
0  ← 2012  November 6, 2018  2024 →



TABLE 2: ############################
              0                1           2   3
0           NaN              NaN         NaN NaN
1       Nominee  Debbie Stabenow  John James NaN
2         Party       Democratic  Republican NaN
3  Popular vote          2214478     1938818 NaN
4    Percentage            52.3%       45.8% NaN



TABLE 3: ############################
                                                   0  \
0  U.S. senator before election Debbie Stabenow D...   

                                                 1  
0  Elected U.S. Senator Debbie Stabenow Democratic  



TABLE 4: ############################
                               Elections in Michigan
0  Federal government U.S. president 1836 1840 18...
1                                     U.S. president
2  1836 1840 1844 1848 1852 1856 1860 1864 1868 1...
3                                        


TABLE 26: ############################
                                Hypothetical polling              Unnamed: 1  \
0  with Sandy Pensler Poll source Date(s)administ...                     NaN   
1                                        Poll source     Date(s)administered   
2                                    Emerson College        July 19–21, 2018   
3                                     Marist College        July 15–19, 2018   
4                                 SurveyMonkey/Axios  June 11 – July 2, 2018   

   Unnamed: 2      Unnamed: 3          Unnamed: 4        Unnamed: 5  \
0         NaN             NaN                 NaN               NaN   
1  Samplesize  Marginof error  DebbieStabenow (D)  SandyPensler (R)   
2         600          ± 4.3%                 48%               32%   
3         886          ± 3.9%                 52%               37%   
4         978          ± 5.0%                 53%               41%   

  Unnamed: 6 Unnamed: 7  
0        NaN        NaN  



TABLE 22: ############################
       Poll source  Date(s)administered Samplesize Marginof error  \
0  Change Research   November 2–4, 2018       1003              –   
1   Marist College  October 13–18, 2018     511 LV         ± 6.1%   
2   Marist College  October 13–18, 2018     511 LV         ± 6.1%   
3   Marist College  October 13–18, 2018     856 RV         ± 4.7%   
4   Marist College  October 13–18, 2018     856 RV         ± 4.7%   

  RogerWicker (R) DavidBaria (D) DannyBedwell (L)   Other Undecided  
0             48%            40%               5%  3%[47]         –  
1             57%            31%               2%  2%[48]        9%  
2             60%            32%                –      2%        7%  
3             54%            30%               3%  2%[48]       10%  
4             57%            32%                –      2%        9%  



TABLE 23: ############################
         Party      Party.1                 Candidate   Votes       %       ±
0    


TABLE 27: ############################
                                Hypothetical polling           Unnamed: 1  \
0  with Austin Petersen Poll source Date(s)admini...                  NaN   
1                                        Poll source  Date(s)administered   
2                      Gravis Marketing (R-Petersen)         May 16, 2018   
3                                        Poll source  Date(s)administered   
4                                   Fabrizio Lee (R)     July 10–11, 2017   

   Unnamed: 2      Unnamed: 3           Unnamed: 4          Unnamed: 5  \
0         NaN             NaN                  NaN                 NaN   
1  Samplesize  Marginof error  ClaireMcCaskill (D)  AustinPetersen (R)   
2         822          ± 3.4%                  40%                 56%   
3  Samplesize  Marginof error  ClaireMcCaskill (D)   GenericRepublican   
4         500          ± 4.4%                  38%                 54%   

  Unnamed: 6  
0        NaN  
1  Undecided  
2      


TABLE 17: ############################
                                  Matt Rosendale (R)
0  U.S. Executive Branch officials Donald Trump, ...



TABLE 18: ############################
  Campaign finance reports as of October 17, 2018  \
                                Candidate (party)   
0                                  Jon Tester (D)   
1                              Matt Rosendale (R)   
2                           Rick Breckenridge (L)   
3         Source: Federal Election Commission[88]   

                                            \
                            Total receipts   
0                              $19,499,290   
1                               $5,034,075   
2                                        -   
3  Source: Federal Election Commission[88]   

                                            \
                       Total disbursements   
0                              $17,946,600   
1                               $4,515,910   
2                               


TABLE 23: ############################
       Poll source            Date(s)administered Samplesize Marginof error  \
0          HarrisX             November 3–5, 2018        600         ± 4.0%   
1          HarrisX             November 2–4, 2018        600         ± 4.0%   
2  Emerson College             November 1–4, 2018       1197         ± 3.0%   
3          HarrisX             November 1–3, 2018        600         ± 4.0%   
4          HarrisX  October 31 – November 2, 2018        600         ± 4.0%   

  DeanHeller (R) JackyRosen (D) TimHagan (L) None ofthese Other Undecided  
0            45%            47%            –            –     –         –  
1            46%            46%            –            –     –         –  
2            45%            49%            –            –    3%        4%  
3            46%            45%            –            –     –         –  
4            45%            44%            –            –     –         –  



TABLE 24: ################


TABLE 22: ############################
  Party      Party.1                 Candidate    Votes       %       ±
0   NaN   Democratic  Bob Menendez (incumbent)  1711654  54.01%  -4.86%
1   NaN   Republican                 Bob Hugin  1357355  42.83%  +3.46%
2   NaN        Green           Madelyn Hoffman    25150   0.79%  +0.32%
3   NaN  Libertarian             Murray Sabrin    21212   0.67%  +0.17%
4   NaN  Independent            Natalie Rivera    19897   0.63%     NaN



TABLE 23: ############################
       County Menendez %  Menendez votes Hugin %  Hugin votes Other %  \
0    Atlantic     47.43%           44617  48.85%        45954   3.72%   
1      Bergen     54.69%          188235  42.54%       146406   2.77%   
2  Burlington     52.78%           98749  43.96%        82240   3.26%   
3      Camden     61.82%          113137  34.58%        63279   3.60%   
4    Cape May     35.78%           14555  61.02%        24823   3.20%   

   Other votes  
0         3502  
1         954


TABLE 20: ############################
                 Poll source Date(s)administered  Samplesize Marginof error  \
0  NSON Opinion Strategy (L)           July 2018         500              –   
1         Carroll Strategies    June 15–16, 2018        1199         ± 2.8%   

  MartinHeinrich (D) MickRich (R) AubreyDunn (L) Undecided  
0                47%          30%             7%       16%  
1                50%          39%             5%        6%  



TABLE 21: ############################
         Party          Party.1                    Candidate            Votes  \
0          NaN       Democratic  Martin Heinrich (incumbent)           376998   
1          NaN       Republican                    Mick Rich           212813   
2          NaN      Libertarian                 Gary Johnson           107201   
3  Total votes      Total votes                  Total votes           697012   
4          NaN  Democratic hold              Democratic hold  Democratic hold   

          


TABLE 20: ############################
                   Poll source   Date(s)administered  Samplesize  \
0  The Tarrance Group (R-NRSC)  February 18–20, 2018         500   

  Marginof error GenericDemocrat GenericRepublican Undecided  
0         ± 4.5%             34%               48%       18%  



TABLE 21: ############################
                 Poll source    Date(s)administered  Samplesize  \
0  1892 Polling (R-Campbell)    October 11–12, 2017         500   
1  1892 Polling (R-Campbell)  May 30 – June 1, 2017         500   

  Marginof error HeidiHeitkamp (D) TomCampbell (R) Undecided  
0         ± 4.9%               41%             44%       15%  
1         ± 4.9%               43%             37%       20%  



TABLE 22: ############################
                            Poll source    Date(s)administered  Samplesize  \
0  WPA Intelligence (R-Club for Growth)  September 10–11, 2017         406   

  Marginof error HeidiHeitkamp (D) KellySchmidt (R) Undecided  
0


TABLE 25: ############################
                                Hypothetical polling  \
0  with Mike Gibbons Poll source Date(s)administe...   
1                                        Poll source   
2                                          SurveyUSA   
3                         Baldwin Wallace University   
4                                        Poll source   

                    Unnamed: 1  Unnamed: 2      Unnamed: 3        Unnamed: 4  \
0                          NaN         NaN             NaN               NaN   
1          Date(s)administered  Samplesize  Marginof error  SherrodBrown (D)   
2            March 16–20, 2018        1408          ± 3.5%               52%   
3  February 28 – March 9, 2018        1011          ± 3.0%               41%   
4          Date(s)administered  Samplesize  Marginof error  SherrodBrown (D)   

           Unnamed: 5 Unnamed: 6  
0                 NaN        NaN  
1     MikeGibbons (R)  Undecided  
2                 38%        10%  
3 


TABLE 18: ############################
         Party      Party.1       Candidate   Votes       %
0          NaN   Republican    Lou Barletta  433312  63.03%
1          NaN   Republican  Jim Christiana  254118  36.97%
2  Total votes  Total votes     Total votes  687430    100%



TABLE 19: ############################
                          Source   Ranking             As of
0  The Cook Political Report[34]  Likely D  October 26, 2018
1           Inside Elections[35]    Safe D  November 1, 2018
2      Sabato's Crystal Ball[36]    Safe D  November 5, 2018
3                  Daily Kos[37]    Safe D  November 5, 2018
4                Fox News[38][a]  Likely D  November 5, 2018



TABLE 20: ############################
                                    Lou Barletta (R)
0  Federal officials Donald Trump, President of t...



TABLE 21: ############################
                                   Bob Casey Jr. (D)
0  Federal officials Barack Obama, former Preside...



TABLE 22: ###


TABLE 15: ############################
         Party      Party.1       Candidate   Votes       %
0          NaN   Democratic   Phil Bredesen  349718  91.51%
1          NaN   Democratic      Gary Davis   20170   5.28%
2          NaN   Democratic  John Wolfe Jr.   12269   3.21%
3  Total votes  Total votes     Total votes  382157    100%



TABLE 16: ############################
                          Source Ranking             As of
0          RealClearPolitics[43]  Tossup  November 5, 2018
1  The Cook Political Report[44]  Tossup  October 26, 2018
2           Inside Elections[45]  Lean R  November 1, 2018
3      Sabato's Crystal Ball[46]  Lean R  November 5, 2018
4                  Daily Kos[47]  Lean R  November 5, 2018



TABLE 17: ############################
                                Marsha Blackburn (R)
0  U.S. Executive Branch officials Donald Trump, ...



TABLE 18: ############################
                                   Phil Bredesen (D)
0  Former U.S. Execut


TABLE 21: ############################
             Poll source Date(s)administered  Samplesize Marginof error  \
0  Public Policy Polling  August 12–14, 2016         522              –   
1       Dixie Strategies    August 8–9, 2016         448              –   

  TedCruz DanPatrick Other Undecided  
0     49%        27%     –       24%  
1     38%        23%   15%       24%  



TABLE 22: ############################
             Poll source Date(s)administered  Samplesize Marginof error  \
0  Public Policy Polling  August 12–14, 2016         522              –   

  TedCruz RickPerry Undecided  
0     37%       46%       18%  



TABLE 23: ############################
  Party     Party.1             Candidate    Votes       %
0   NaN  Republican  Ted Cruz (incumbent)  1322724  85.36%
1   NaN  Republican           Mary Miller    94715   6.11%
2   NaN  Republican   Bruce Jacobson, Jr.    64791   4.18%
3   NaN  Republican    Stefano de Stefano    44456   2.87%
4   NaN  Republican    


TABLE 10: ############################
  State Republican Convention results, 2018                       \
                                  Candidate First ballot    Pct.   
0                              Mike Kennedy         1354  40.69%   
1                               Mitt Romney         1539  46.24%   
2                               Loy Brunson            4   0.12%   
3                             Alicia Colvin           29   0.87%   
4                              Stoney Fonua            7   0.21%   

                             
  Second ballot      Pct..1  
0          1642      50.88%  
1          1585      49.12%  
2    Eliminated  Eliminated  
3    Eliminated  Eliminated  
4    Eliminated  Eliminated  



TABLE 11: ############################
  Hostnetwork          Date Link(s) Participants              \
  Hostnetwork          Date Link(s)   MittRomney MikeKennedy   
0     KBYU-TV  May 29, 2018    [26]      Invited     Invited   

                      
  Unnamed: 5_le


Wiki: SEN18_VT 
TABLE 0: ############################
                                                   0  \
0                                                NaN   
1                     ← 2012 November 6, 2018 2024 →   
2                                             ← 2012   
3                                            Turnout   
4  Nominee Bernie Sanders Lawrence Zupan Party In...   

                                                   1       2   3  
0                                                NaN     NaN NaN  
1                     ← 2012 November 6, 2018 2024 →     NaN NaN  
2                                   November 6, 2018  2024 → NaN  
3                                             55.57%     NaN NaN  
4  Nominee Bernie Sanders Lawrence Zupan Party In...     NaN NaN  



TABLE 1: ############################
        0                 1       2
0  ← 2012  November 6, 2018  2024 →



TABLE 2: ############################
              0               1               2   3



TABLE 10: ############################
                      Mayor
0  2004 2008 2012 2016 2020
1              City Council
2                      2016



TABLE 11: ############################
                      Mayor
0  2004 2008 2012 2016 2018



TABLE 12: ############################
                                        Nick Freitas
0  U.S. Senators Mike Lee (R-UT)[23] Rand Paul (R...



TABLE 13: ############################
                                       E. W. Jackson
0  Cabinet-level officials William G. Boykin, exe...



TABLE 14: ############################
                             Poll source  Date(s)administered  Samplesize  \
0  Atlantic Media & Research (R-Stewart)      May 14–18, 2018         355   
1         Christopher Newport University  February 5–28, 2018         422   

  Marginof error NickFreitas E. W.Jackson CoreyStewart   Other Undecided  
0         ± 5.2%          9%           5%          32%       –         –  
1         ± 2.5%          6%  


TABLE 30: ############################
             Poll source   Date(s)administered  Samplesize Marginof error  \
0  Quinnipiac University      April 6–10, 2017        1115         ± 2.9%   
1  Quinnipiac University  February 10–15, 2017         989         ± 3.1%   

  TimKaine (D) LauraIngraham (R) Other Undecided  
0          56%               35%     –        7%  
1          56%               36%    2%        7%  



TABLE 31: ############################
                     Poll source   Date(s)administered Samplesize  \
0  University of Mary Washington  September 5–12, 2017     562 LV   
1  University of Mary Washington  September 5–12, 2017     867 RV   

  Marginof error TimKaine (D) ScottTaylor (R) Undecided  
0         ± 5.2%          52%             37%        7%  
1         ± 4.1%          53%             41%        4%  



TABLE 32: ############################
         Party      Party.1              Candidate    Votes       %       ±
0          NaN   Democratic  Tim 


TABLE 4: ############################
                          Elections in West Virginia
0  Federal elections Presidential Elections 1864 ...
1                             Presidential Elections
2  1864 1868 1872 1876 1880 1884 1888 1892 1896 1...
3                             Presidential Primaries
4  Democratic Primaries 2008 2016 2020 Republican...



TABLE 5: ############################
                              Presidential Elections
0  1864 1868 1872 1876 1880 1884 1888 1892 1896 1...
1                             Presidential Primaries
2  Democratic Primaries 2008 2016 2020 Republican...
3                              U.S. Senate Elections
4  1958 1964 1970 1978 1982 1984 1988 1994 2000 2...



TABLE 6: ############################
                             Gubernatorial elections
0  1863 1864 1866 1868 1870 1872 1876 1880 1884 1...



TABLE 7: ############################
                               Paula Jean Swearengin
0  Individuals Cenk Uygur, host of The Youn


TABLE 28: ############################
      Poll source   Date(s)administered  Samplesize Marginof error  \
0  Harper Polling  November 16–17, 2016         500         ± 4.4%   

  CarteGoodwin (D) AlexMooney (R) Undecided  
0              41%            31%       28%  



TABLE 29: ############################
      Poll source   Date(s)administered  Samplesize Marginof error  \
0  Harper Polling  November 16–17, 2016         500         ± 4.4%   

  CarteGoodwin (D) EvanJenkins (R) Undecided  
0              31%             43%       25%  



TABLE 30: ############################
      Poll source   Date(s)administered  Samplesize Marginof error  \
0  Harper Polling  November 16–17, 2016         500         ± 4.4%   

  CarteGoodwin (D) PatrickMorrisey (R) Undecided  
0              39%                 43%       18%  



TABLE 31: ############################
         Party          Party.1                Candidate            Votes  \
0          NaN       Democratic  Joe Manchin (


TABLE 22: ############################
                                Hypothetical polling              Unnamed: 1  \
0  with Kevin Nicholson Poll source Date(s)admini...                     NaN   
1                                        Poll source     Date(s)administered   
2                                    Emerson College        July 26–28, 2018   
3                                     Marist College        July 15–19, 2018   
4                                 SurveyMonkey/Axios  June 11 – July 2, 2018   

   Unnamed: 2      Unnamed: 3        Unnamed: 4          Unnamed: 5  \
0         NaN             NaN               NaN                 NaN   
1  Samplesize  Marginof error  TammyBaldwin (D)  KevinNicholson (R)   
2         632          ± 4.2%               49%                 40%   
3         906          ± 3.8%               54%                 39%   
4         968          ± 4.5%               55%                 42%   

  Unnamed: 6 Unnamed: 7  
0        NaN        NaN  


In [20]:
# prints every scraped table; comment out the call to save screen space
print_wiki_tables('USH18')

Wiki: USH18_AK 
TABLE 0: ############################
                                                   0  \
0                                                NaN   
1                     ← 2016 November 6, 2018 2020 →   
2                                             ← 2016   
3                                            Turnout   
4  Nominee Don Young Alyse Galvin[a] Party Republ...   

                                                   1       2   3  
0                                                NaN     NaN NaN  
1                     ← 2016 November 6, 2018 2020 →     NaN NaN  
2                                   November 6, 2018  2020 → NaN  
3                                             49.34%     NaN NaN  
4  Nominee Don Young Alyse Galvin[a] Party Republ...     NaN NaN  



TABLE 1: ############################
        0                 1       2
0  ← 2016  November 6, 2018  2020 →



TABLE 2: ############################
              0           1                2   3
0   


TABLE 9: ############################
                                          Lee Murphy
0  Political candidates Gene Truono, Republican c...



TABLE 10: ############################
         Party      Party.1     Candidate  Votes       %
0          NaN   Republican  Scott Walker  19573  53.00%
1          NaN   Republican    Lee Murphy  17359  47.00%
2  Total votes  Total votes   Total votes  36932    100%



TABLE 11: ############################
                                Lisa Blunt Rochester
0  U.S. Senators Tom Carper, U.S. Senator from De...



TABLE 12: ############################
                Scott Walker
0  Individuals Lil B, rapper



TABLE 13: ############################
                              Andrew Webb (write-in)
0  County Councilpersons Rob Arlett, Sussex Count...



TABLE 14: ############################
                                 Declined to endorse
0  Declined to endorse Scott Walker Rob Arlett, S...



TABLE 15: ############################


TABLE 4: ############################
                           Elections in North Dakota
0  Federal government Presidential 1892 1896 1900...
1                                       Presidential
2  1892 1896 1900 1904 1908 1912 1916 1920 1924 1...
3                                        U.S. Senate
4  1889 1891 1893 1897 1899 1903 1905 1909 1911 1...



TABLE 5: ############################
                                        Presidential
0  1892 1896 1900 1904 1908 1912 1916 1920 1924 1...
1                                        U.S. Senate
2  1889 1891 1893 1897 1899 1903 1905 1909 1911 1...
3                                         U.S. House
4  1889 1890 1892 1894 1896 1898 1900 1902 1904 1...



TABLE 6: ############################
                                   General elections
0            2006 2008 2010 2012 2014 2016 2018 2020
1                            Gubernatorial elections
2  1912 1914 1916 1918 1920 1921 1922 1924 1926 1...
3                       Secreta

Wiki: USH18_VT 
TABLE 0: ############################
                                                   0  \
0                                                NaN   
1                     ← 2016 November 6, 2018 2020 →   
2                                             ← 2016   
3  Nominee Peter Welch Anya Tynio Party Democrati...   
4                                                NaN   

                                                   1       2   3  
0                                                NaN     NaN NaN  
1                     ← 2016 November 6, 2018 2020 →     NaN NaN  
2                                   November 6, 2018  2020 → NaN  
3  Nominee Peter Welch Anya Tynio Party Democrati...     NaN NaN  
4                                                NaN     NaN NaN  



TABLE 1: ############################
        0                 1       2
0  ← 2016  November 6, 2018  2020 →



TABLE 2: ############################
              0            1           2   3
0       

In [21]:
# Manually select the election-results table from each Wikipedia page.
# This has to be done by hand because every Wikipedia page is laid out
# differently, and because Wikipedia offers no downloadable datasets,
# so every table has to be scraped.

wiki_dfs = {}
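
The table indices in the cells that follow were found by reading the printed tables by eye. As a rough aid, a keyword search over column labels can narrow down which tables to inspect. The sketch below is not part of ``gdutils``; ``find_candidate_tables`` is a hypothetical helper, and it assumes each entry of ``wiki_tables`` is a list of pandas DataFrames, as the indexing in this notebook suggests. The final choice still needs a manual look, since several tables on a page can share the same headers.

```python
from typing import List

import pandas as pd


def find_candidate_tables(tables: List[pd.DataFrame],
                          keywords: List[str]) -> List[int]:
    """Return indices of tables whose column labels mention every keyword.

    Hypothetical helper for illustration; not part of gdutils.
    """
    matches = []
    for i, table in enumerate(tables):
        # Flatten the (possibly multi-level) column labels into one
        # lowercase string so that keyword matching is case-insensitive
        header = ' '.join(str(col) for col in table.columns).lower()
        if all(kw.lower() in header for kw in keywords):
            matches.append(i)
    return matches


# Toy stand-in for one entry of wiki_tables (assumed: a list of DataFrames)
example_tables = [
    pd.DataFrame({'Source': ['A'], 'Ranking': ['Safe D']}),
    pd.DataFrame({'County': ['Atlantic'], 'Votes': [44617]}),
]
candidates = find_candidate_tables(example_tables, ['county', 'votes'])
print(candidates)  # [1]
```

In the notebook, the same call could be tried as ``find_candidate_tables(wiki_tables['PRES16_AL'], ['county'])`` to shortlist tables before picking an index.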

In [22]:
# PRES16

wiki_dfs['PRES16_AL'] = wiki_tables['PRES16_AL'][6]  # county-level
wiki_dfs['PRES16_AK'] = wiki_tables['PRES16_AK'][17] # state-level
wiki_dfs['PRES16_AZ'] = wiki_tables['PRES16_AZ'][20] # county-level
wiki_dfs['PRES16_AR'] = wiki_tables['PRES16_AR'][8]  # county-level
wiki_dfs['PRES16_CA'] = wiki_tables['PRES16_CA'][36] # county-level
wiki_dfs['PRES16_CO'] = wiki_tables['PRES16_CO'][21] # county-level
wiki_dfs['PRES16_CT'] = wiki_tables['PRES16_CT'][16] # county-level
wiki_dfs['PRES16_DE'] = wiki_tables['PRES16_DE'][13] # county-level
wiki_dfs['PRES16_FL'] = wiki_tables['PRES16_FL'][14] # county-level
wiki_dfs['PRES16_GA'] = wiki_tables['PRES16_GA'][13] # county-level
wiki_dfs['PRES16_HI'] = wiki_tables['PRES16_HI'][12] # county-level
wiki_dfs['PRES16_ID'] = wiki_tables['PRES16_ID'][16] # county-level
wiki_dfs['PRES16_IL'] = wiki_tables['PRES16_IL'][26] # county-level
wiki_dfs['PRES16_IN'] = wiki_tables['PRES16_IN'][14] # county-level
wiki_dfs['PRES16_IA'] = wiki_tables['PRES16_IA'][13] # county-level
wiki_dfs['PRES16_KS'] = wiki_tables['PRES16_KS'][17] # county-level
wiki_dfs['PRES16_KY'] = wiki_tables['PRES16_KY'][12] # county-level
wiki_dfs['PRES16_LA'] = wiki_tables['PRES16_LA'][8]  # parish-level
wiki_dfs['PRES16_ME'] = wiki_tables['PRES16_ME'][19] # county-level
wiki_dfs['PRES16_MD'] = wiki_tables['PRES16_MD'][15] # county-level
wiki_dfs['PRES16_MA'] = wiki_tables['PRES16_MA'][16] # county-level
wiki_dfs['PRES16_MI'] = wiki_tables['PRES16_MI'][17] # county-level
wiki_dfs['PRES16_MN'] = wiki_tables['PRES16_MN'][18] # county-level
wiki_dfs['PRES16_MS'] = wiki_tables['PRES16_MS'][17] # county-level
wiki_dfs['PRES16_MO'] = wiki_tables['PRES16_MO'][16] # county-level
wiki_dfs['PRES16_MT'] = wiki_tables['PRES16_MT'][10] # county-level
wiki_dfs['PRES16_NE'] = wiki_tables['PRES16_NE'][26] # county-level
wiki_dfs['PRES16_NV'] = wiki_tables['PRES16_NV'][17] # county-level
wiki_dfs['PRES16_NH'] = wiki_tables['PRES16_NH'][20] # county-level
wiki_dfs['PRES16_NJ'] = wiki_tables['PRES16_NJ'][14] # county-level
wiki_dfs['PRES16_NM'] = wiki_tables['PRES16_NM'][13] # county-level
wiki_dfs['PRES16_NY'] = wiki_tables['PRES16_NY'][27] # county-level
wiki_dfs['PRES16_NC'] = wiki_tables['PRES16_NC'][18] # county-level
wiki_dfs['PRES16_ND'] = wiki_tables['PRES16_ND'][12] # county-level
wiki_dfs['PRES16_OH'] = wiki_tables['PRES16_OH'][23] # county-level
wiki_dfs['PRES16_OK'] = wiki_tables['PRES16_OK'][20] # county-level
wiki_dfs['PRES16_OR'] = wiki_tables['PRES16_OR'][21] # county-level
wiki_dfs['PRES16_PA'] = wiki_tables['PRES16_PA'][19] # county-level
wiki_dfs['PRES16_RI'] = wiki_tables['PRES16_RI'][12] # county-level
wiki_dfs['PRES16_SC'] = wiki_tables['PRES16_SC'][16] # county-level
wiki_dfs['PRES16_SD'] = wiki_tables['PRES16_SD'][11] # county-level
wiki_dfs['PRES16_TN'] = wiki_tables['PRES16_TN'][13] # county-level
wiki_dfs['PRES16_TX'] = wiki_tables['PRES16_TX'][29] # county-level
wiki_dfs['PRES16_UT'] = wiki_tables['PRES16_UT'][13] # county-level
wiki_dfs['PRES16_VT'] = wiki_tables['PRES16_VT'][16] # county-level
wiki_dfs['PRES16_VA'] = wiki_tables['PRES16_VA'][20] # county-level
wiki_dfs['PRES16_WA'] = wiki_tables['PRES16_WA'][13] # county-level
wiki_dfs['PRES16_WV'] = wiki_tables['PRES16_WV'][11] # county-level
wiki_dfs['PRES16_WI'] = wiki_tables['PRES16_WI'][16] # county-level
wiki_dfs['PRES16_WY'] = wiki_tables['PRES16_WY'][12] # county-level

In [23]:
# SEN16

wiki_dfs['SEN16_AL'] = wiki_tables['SEN16_AL'][19]
wiki_dfs['SEN16_AK'] = wiki_tables['SEN16_AK'][20]
wiki_dfs['SEN16_AZ'] = wiki_tables['SEN16_AZ'][45]
wiki_dfs['SEN16_AR'] = wiki_tables['SEN16_AR'][16]
wiki_dfs['SEN16_CA'] = wiki_tables['SEN16_CA'][53]
wiki_dfs['SEN16_CO'] = wiki_tables['SEN16_CO'][25]
wiki_dfs['SEN16_CT'] = wiki_tables['SEN16_CT'][20]
wiki_dfs['SEN16_FL'] = wiki_tables['SEN16_FL'][64]
wiki_dfs['SEN16_GA'] = wiki_tables['SEN16_GA'][16]
wiki_dfs['SEN16_HI'] = wiki_tables['SEN16_HI'][18]
wiki_dfs['SEN16_ID'] = wiki_tables['SEN16_ID'][15]
wiki_dfs['SEN16_IL'] = wiki_tables['SEN16_IL'][29]
wiki_dfs['SEN16_IN'] = wiki_tables['SEN16_IN'][25]
wiki_dfs['SEN16_IA'] = wiki_tables['SEN16_IA'][20]
wiki_dfs['SEN16_KS'] = wiki_tables['SEN16_KS'][17]
wiki_dfs['SEN16_KY'] = wiki_tables['SEN16_KY'][22]
wiki_dfs['SEN16_LA'] = wiki_tables['SEN16_LA'][24]
wiki_dfs['SEN16_MD'] = wiki_tables['SEN16_MD'][29]
wiki_dfs['SEN16_MO'] = wiki_tables['SEN16_MO'][24]
wiki_dfs['SEN16_NV'] = wiki_tables['SEN16_NV'][32]
wiki_dfs['SEN16_NH'] = wiki_tables['SEN16_NH'][23]
wiki_dfs['SEN16_NY'] = wiki_tables['SEN16_NY'][15]
wiki_dfs['SEN16_NC'] = wiki_tables['SEN16_NC'][42]
wiki_dfs['SEN16_ND'] = wiki_tables['SEN16_ND'][14]
wiki_dfs['SEN16_OH'] = wiki_tables['SEN16_OH'][29]
wiki_dfs['SEN16_OK'] = wiki_tables['SEN16_OK'][12]
wiki_dfs['SEN16_OR'] = wiki_tables['SEN16_OR'][14]
wiki_dfs['SEN16_PA'] = wiki_tables['SEN16_PA'][38]
wiki_dfs['SEN16_SC'] = wiki_tables['SEN16_SC'][16]
wiki_dfs['SEN16_SD'] = wiki_tables['SEN16_SD'][9]
wiki_dfs['SEN16_UT'] = wiki_tables['SEN16_UT'][19]
wiki_dfs['SEN16_VT'] = wiki_tables['SEN16_VT'][12]
wiki_dfs['SEN16_WA'] = wiki_tables['SEN16_WA'][15]
wiki_dfs['SEN16_WI'] = wiki_tables['SEN16_WI'][21]

In [24]:
# USH16

# wiki_dfs['USH16_AK'] = wiki_tables['USH16_AK'][]
# wiki_dfs['USH16_DE'] = wiki_tables['USH16_DE'][]
# wiki_dfs['USH16_MT'] = wiki_tables['USH16_MT'][]
# wiki_dfs['USH16_ND'] = wiki_tables['USH16_ND'][]
# wiki_dfs['USH16_SD'] = wiki_tables['USH16_SD'][]
# wiki_dfs['USH16_VT'] = wiki_tables['USH16_VT'][]
# wiki_dfs['USH16_WY'] = wiki_tables['USH16_WY'][]

In [25]:
# SEN17

# wiki_dfs['SEN17_AL'] = wiki_tables['SEN17_AL'][]

In [26]:
# USH17

In [27]:
# SEN18

wiki_dfs['SEN18_AZ'] = wiki_tables['SEN18_AZ'][40]
wiki_dfs['SEN18_CA'] = wiki_tables['SEN18_CA'][54]
wiki_dfs['SEN18_CT'] = wiki_tables['SEN18_CT'][17]
wiki_dfs['SEN18_DE'] = wiki_tables['SEN18_DE'][29]
wiki_dfs['SEN18_FL'] = wiki_tables['SEN18_FL'][29]
wiki_dfs['SEN18_HI'] = wiki_tables['SEN18_HI'][14]
wiki_dfs['SEN18_IN'] = wiki_tables['SEN18_IN'][29]
wiki_dfs['SEN18_ME'] = wiki_tables['SEN18_ME'][20]
wiki_dfs['SEN18_MD'] = wiki_tables['SEN18_MD'][25]
wiki_dfs['SEN18_MA'] = wiki_tables['SEN18_MA'][29]
wiki_dfs['SEN18_MI'] = wiki_tables['SEN18_MI'][30]
wiki_dfs['SEN18_MN'] = wiki_tables['SEN18_MN'][20]
wiki_dfs['SEN18_MS'] = wiki_tables['SEN18_MS'][23]
wiki_dfs['SEN18_MO'] = wiki_tables['SEN18_MO'][35]
wiki_dfs['SEN18_MT'] = wiki_tables['SEN18_MT'][22]
wiki_dfs['SEN18_NE'] = wiki_tables['SEN18_NE'][19]
wiki_dfs['SEN18_NV'] = wiki_tables['SEN18_NV'][28]
wiki_dfs['SEN18_NJ'] = wiki_tables['SEN18_NJ'][22]
wiki_dfs['SEN18_NM'] = wiki_tables['SEN18_NM'][21]
wiki_dfs['SEN18_NY'] = wiki_tables['SEN18_NY'][16]
wiki_dfs['SEN18_ND'] = wiki_tables['SEN18_ND'][23]
wiki_dfs['SEN18_OH'] = wiki_tables['SEN18_OH'][32]
wiki_dfs['SEN18_PA'] = wiki_tables['SEN18_PA'][28]
wiki_dfs['SEN18_RI'] = wiki_tables['SEN18_RI'][17]
wiki_dfs['SEN18_TN'] = wiki_tables['SEN18_TN'][29]
wiki_dfs['SEN18_TX'] = wiki_tables['SEN18_TX'][37]
wiki_dfs['SEN18_UT'] = wiki_tables['SEN18_UT'][31]
wiki_dfs['SEN18_VT'] = wiki_tables['SEN18_VT'][13]
wiki_dfs['SEN18_VA'] = wiki_tables['SEN18_VA'][32]
wiki_dfs['SEN18_WA'] = wiki_tables['SEN18_WA'][12]
wiki_dfs['SEN18_WV'] = wiki_tables['SEN18_WV'][31]
wiki_dfs['SEN18_WI'] = wiki_tables['SEN18_WI'][27]
wiki_dfs['SEN18_WY'] = wiki_tables['SEN18_WY'][13]

In [28]:
# USH18

# wiki_dfs['USH18_AK'] = wiki_tables['USH18_AK'][]
# wiki_dfs['USH18_DE'] = wiki_tables['USH18_DE'][]
# wiki_dfs['USH18_MT'] = wiki_tables['USH18_MT'][]
# wiki_dfs['USH18_ND'] = wiki_tables['USH18_ND'][]
# wiki_dfs['USH18_SD'] = wiki_tables['USH18_SD'][]
# wiki_dfs['USH18_VT'] = wiki_tables['USH18_VT'][]
# wiki_dfs['USH18_WY'] = wiki_tables['USH18_WY'][]

In [29]:
# Save all the scraped tables

# writes one CSV per election to src/wiki/ (comment out to save local space and time)
for election in wiki_dfs:
    outpath = os.path.join('src', 'wiki', election + '.csv')
    et.ExtractTable(wiki_dfs[election], 
                    outfile=outpath).extract_to_file()

[Coming Soon] __Step 1.4.__ Gather Ballotpedia data for comparison purposes

*Note:* Depending on response to API access, this step may need to be done manually

Step 2. Data wrangling
---------------------------

__Step 2.1.__ Wrangle MEDSL data

In [30]:
def pivot_medsl(medsl_dfs_dict: Dict[str, Union[pd.DataFrame, gpd.GeoDataFrame]]
        ) -> None:
    """
    Given a dictionary of MEDSL (Geo)DataFrames, replaces each entry in place
    with a pivoted DataFrame that has one row per precinct, one column per
    office and party, and the votes as values. Duplicate (precinct, office, 
    party) rows are averaged, since pivot_table defaults to aggfunc='mean'.
    
    """
    for df in medsl_dfs_dict:
        medsl_pvt = medsl_dfs_dict[df].pivot_table(index='precinct',
                                                   columns=['office', 'party'],
                                                   values='votes')
        medsl_pvt.columns = [' '.join(col).strip() for col in medsl_pvt.columns.values]
        medsl_dfs_dict[df] = et.ExtractTable(medsl_pvt).extract()
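
As a sanity check on the pivot, the sketch below runs the same ``pivot_table`` call on a few made-up rows (the precinct names and vote counts are assumptions, not MEDSL data). Because pandas' ``pivot_table`` defaults to ``aggfunc='mean'``, a precinct with several rows for one office and party (for example, one row per voting mode) is averaged rather than summed. Passing ``aggfunc='sum'`` is one way to total them instead; whether that is appropriate for the MEDSL files would need to be checked.

```python
import pandas as pd

# Made-up long-format rows; P1 has two democratic rows (e.g. two vote modes)
toy = pd.DataFrame({
    'precinct': ['P1', 'P1', 'P1', 'P2', 'P2'],
    'office':   ['US Senate'] * 5,
    'party':    ['democratic', 'democratic', 'republican',
                 'democratic', 'republican'],
    'votes':    [10, 5, 20, 30, 40],
})

def pivot_and_flatten(df, aggfunc):
    """Pivot to one row per precinct and flatten the (office, party) columns."""
    pvt = df.pivot_table(index='precinct', columns=['office', 'party'],
                         values='votes', aggfunc=aggfunc)
    pvt.columns = [' '.join(col).strip() for col in pvt.columns.values]
    return pvt

mean_pvt = pivot_and_flatten(toy, 'mean')  # the pivot_table default
sum_pvt = pivot_and_flatten(toy, 'sum')
print(mean_pvt)
print(sum_pvt)
```

In this toy case the two pivots disagree only for precinct P1, where the duplicated democratic rows are averaged in one table and summed in the other.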

In [31]:
# View available data to find applicable datasets

# commented out to save screen space
# for df in medsl_dfs:
#     print('--------{}--------'.format(df))
#     print(medsl_dfs[df].head())
#     print()

In [None]:
# Pivot and extract relevant MEDSL election data

medsl_18_dfs = {}
medsl_pres16_dfs = {}
medsl_sen16_dfs = {}
medsl_ush16_dfs = {}

for state in states:
    st, st_abv = state
    try:
        medsl_18_dfs[st] = et.ExtractTable(medsl_dfs['precinct_2018'], 
                                           column='state', value=st).extract()
    except Exception as e:
        print('Missing data in medsl_18:', e)
    
    try:
        medsl_pres16_dfs[st] = et.ExtractTable(medsl_dfs['2016-precinct-president'], 
                                               column='state', value=st).extract()
    except Exception as e:
        print('Missing data in medsl_pres16:', e)
    
    try:
        medsl_sen16_dfs[st] = et.ExtractTable(medsl_dfs['2016-precinct-senate'], 
                                              column='state', value=st).extract()
    except Exception as e:
        print('Missing data in medsl_sen16:', e)
    
    try:
        medsl_ush16_dfs[st] = et.ExtractTable(medsl_dfs['2016-precinct-house'], 
                                              column='state', value=st).extract()
    except Exception as e:
        print('Missing data in medsl_ush16:', e)

        
pivot_medsl(medsl_18_dfs)
pivot_medsl(medsl_pres16_dfs)
pivot_medsl(medsl_sen16_dfs)
pivot_medsl(medsl_ush16_dfs)

__Step 2.2.__ Wrangle Wikipedia data

Step 3. Data standardization check of ``mggg-states``
---------------------------------------------------------------

__Step 3.1__ Generate naming conventions

In [33]:
with open(os.path.join('src', 'naming_convention.json')) as json_file:
    standards_raw = json.load(json_file)
    
offices = dm.get_keys_by_category(standards_raw, 'offices')
parties = dm.get_keys_by_category(standards_raw, 'parties')
counts  = dm.get_keys_by_category(standards_raw, 'counts')
others  = dm.get_keys_by_category(standards_raw, ['geographies', 
                                                  'demographics', 
                                                  'districts', 
                                                  'other'])

elections = [office + format(year, '02') + party 
             for office in offices
             for year in range(0, 21)
             for party in parties 
             if not (office == 'PRES' and year % 4 != 0)]

counts    = [count + format(year, '02') 
             for count in counts 
             for year in range(0, 20)]

standards = elections + counts + others
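
To see what the comprehension above generates, here is a small stand-alone sketch. The ``offices`` and ``parties`` lists are placeholder assumptions standing in for whatever ``naming_convention.json`` actually defines.

```python
# Placeholder values; the real lists come from naming_convention.json
offices = ['PRES', 'SEN']
parties = ['D', 'R']

# Two-digit, zero-padded years 00..20; presidential names only every 4th year
elections = [office + format(year, '02') + party
             for office in offices
             for year in range(0, 21)
             for party in parties
             if not (office == 'PRES' and year % 4 != 0)]

print(elections[:4])
print('PRES16D' in elections, 'PRES17D' in elections, 'SEN17R' in elections)
```

With these placeholders, ``PRES16D`` and ``SEN17R`` are valid names while ``PRES17D`` is not, because 2017 was not a presidential election year.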

__Step 3.2.__ Check ``mggg-states`` data compliance with naming conventions

In [34]:
naming_check = {}

for gdf in mggg_gdfs:
    naming_check[gdf] = dq.compare_column_names(mggg_gdfs[gdf], standards)

In [35]:
# Print and store results of naming convention check

# dictionary with mggg GeoDataFrame names as keys (names of files from
# which dataset was gathered), and a set of columns that fit the 
# naming convention as values
matched_columns = {}

for gdf in naming_check:
    print('=========================================')
    print('Dataset: {}'.format(gdf))
    print('=========================================')
    
    (matches, diffs) = naming_check[gdf]
    matched_columns[gdf] = matches
    
    diffs = list(diffs)
    diffs.sort()
    
    print('Discrepancies from naming convention:')
    print(diffs)
    print()

Dataset: MN12
Discrepancies from naming convention:
['CA1NO12', 'CA1YES12', 'CA2NO12', 'CA2YES12', 'CONGDIST', 'COUNTYFIPS', 'COUNTYNAME', 'CTU_TYPE', 'CTYCOMDIST', 'JUDDIST', 'MCDCODE', 'MCDNAME', 'MNLEGDIST', 'MNSENDIST', 'PCTCODE', 'PCTNAME', 'VTD']

Dataset: MN16
Discrepancies from naming convention:
['CONGDIST', 'COUNTYFIPS', 'COUNTYNAME', 'CTU_TYPE', 'CTYCOMDIST', 'JUDDIST', 'MCDCODE', 'MCDNAME', 'MNLEGDIST', 'MNSENDIST', 'PCTCODE', 'PCTNAME', 'VTDID']

Dataset: MN14
Discrepancies from naming convention:
['AG14LM', 'AUD14LC', 'CONGDIST', 'COUNTYFIPS', 'COUNTYNAME', 'CTU_TYPE', 'CTYCOMDIST', 'JUDDIST', 'MCDCODE', 'MCDNAME', 'MNLEGDIST', 'MNSENDIST', 'PCTCODE', 'PCTNAME', 'VTDID']

Dataset: MN12_18
Discrepancies from naming convention:
['AG14LM', 'AG18LC', 'AUD14LC', 'AUD18LM', 'CA1NO12', 'CA1YES12', 'CA2NO12', 'CA2YES12', 'CONGDIST', 'COUNTYFIPS', 'COUNTYNAME', 'CTU_TYPE', 'CTYCOMDIST', 'GOV18LC', 'JUDDIST', 'MCDCODE', 'MCDNAME', 'MNLEGDIST', 'MNSENDIST', 'PCTCODE', 'PCTNAME', 'SE

Step 4. Compare ``mggg-states`` data with external sources
----------------------------------------------------------

In [36]:
# Categorize mggg-states dataframes by the names of their States

# a list of tuples of state names (left) and a list of GeoDataFrames (right)
available_mggg_states = []

for state in states:
    state_name, state_abv = state
    
    # messy name matching because file naming isn't standardized
    mggg_gdf_names = [gdf_name for gdf_name in list(mggg_gdfs) 
                               if gdf_name.startswith(state_abv.lower() + '_') or
                                  gdf_name.startswith(state_abv + '_') or
                                  gdf_name.startswith(state_name.lower() + '_') or
                                  gdf_name.startswith(state_name.upper() + '_') or
                                  gdf_name.startswith(state_name + '_') or
                                  gdf_name.startswith(
                                      state_name.replace(' ', '_').lower() + '_') or
                                  gdf_name.startswith(
                                      state_name.replace(' ', '_').upper() + '_') or
                                  gdf_name.startswith(
                                      state_name.replace(' ', '_') + '_')]
                      
    available_mggg_states.append((state_name, mggg_gdf_names))
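
The chain of ``startswith`` calls above could be condensed by normalizing case before comparing. The sketch below is one way to do that. ``matches_state`` is a hypothetical helper, not part of ``gdutils``, and it is slightly more permissive than the original because it also accepts mixed-case prefixes.

```python
def matches_state(gdf_name: str, state_name: str, state_abv: str) -> bool:
    """Return True if gdf_name starts with a state prefix followed by '_'."""
    name = gdf_name.casefold()
    prefixes = {
        state_abv.casefold(),
        state_name.casefold(),
        state_name.replace(' ', '_').casefold(),
    }
    return any(name.startswith(prefix + '_') for prefix in prefixes)

# 'new_york_precincts' is a made-up file name used only for illustration
for gdf_name, st, abv in [('GA_precincts', 'Georgia', 'GA'),
                          ('new_york_precincts', 'New York', 'NY'),
                          ('VT_towns', 'Virginia', 'VA')]:
    print(gdf_name, st, matches_state(gdf_name, st, abv))
```

Requiring the trailing underscore keeps an abbreviation such as ``VA`` from matching a dataset that belongs to another state, such as ``VT_towns``.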

In [None]:
# View available columns that can be compared in mggg-states

# commented out to save screen space
# for state in available_mggg_states:
#     st, gdf_names = state
#     if gdf_names:
#         print('======== {} ========'.format(st))
#         for name in gdf_names:
#             print('{} --------'.format(name))
#             print(sorted(list(matched_columns[name])))
#             print()
#         print()

__Step 4.1.__ Compare against MEDSL

In [38]:
# Generate Naming Convention Translations between MGGG and MEDSL
# Has to be done manually because some MEDSL data don't use the 
# same naming convention

pres16_cols = [
    ('PRES16D', 'US President democratic'), 
    ('PRES16G', 'US President green'), 
    ('PRES16L', 'US President libertarian'), 
    ('PRES16R', 'US President republican')
]

sen16_cols = [
    ('SEN16D', 'US Senate democrat'),
    ('SEN16G', 'US Senate green'),
    ('SEN16L', 'US Senate libertarian'),
    ('SEN16R', 'US Senate republican')
]

ush16_cols = [
    ('USH16D', 'US House democratic'),
    ('USH16G', 'US House green'),
    ('USH16L', 'US House libertarian'),
    ('USH16R', 'US House republican')
]

fed18_cols = [
    ('SEN18D', 'US Senate democratic'),
    ('SEN18G', 'US Senate green'),
    ('SEN18L', 'US Senate libertarian'),
    ('SEN18R', 'US Senate republican'),
    ('USH18D', 'US House democrat'),
    ('USH18G', 'US House green'),
    ('USH18L', 'US House libertarian'),
    ('USH18R', 'US House republican')
]

In [39]:
def bulk_compare(bulk_results: Dict[str, Dict[str, List[Tuple[Hashable, Any]]]], 
                 st: str, 
                 mggg_names: List[str], 
                 medsls: Dict[str, pd.DataFrame], 
                 cols: List[Tuple[str, str]]
        ) -> None: # Updates bulk_results in place
    """
    Updates bulk_results in place. bulk_results is a dict with state names as 
    keys and, as values, dicts mapping mggg_gdf names to the accumulated results 
    of dq.compare_column_sums.
    
    """
    if st in medsls:
        mggg_results = bulk_results.get(st, {})
            
        for mggg_name in mggg_names:
            x = []
            y = []
            
            for mggg_col, medsl_col in cols:
                if (mggg_col in list(mggg_gdfs[mggg_name]) and 
                    medsl_col in list(medsls[st])):
                    x.append(mggg_col)
                    y.append(medsl_col)
                
            if x:
                try:
                    # append results
                    mggg_results[mggg_name] += \
                        dq.compare_column_sums(mggg_gdfs[mggg_name], 
                                               medsls[st], x, y)
                except Exception as e:
                    try: # if results don't already exist
                        mggg_results[mggg_name] = \
                            dq.compare_column_sums(mggg_gdfs[mggg_name], 
                                                   medsls[st], x, y)
                    except Exception as z:
                        print("Unable to compare {} and {} in {}.\n\t{}".format(
                              x, y, mggg_name, z))
            else:
                # keep any results recorded by earlier calls
                mggg_results.setdefault(mggg_name, [])
        
            bulk_results[st] = mggg_results
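
The function above follows an append-or-create pattern: add new comparison results to an existing list, or start the list if none exists yet. As a minimal sketch of the same idea, ``dict.setdefault`` can express it without exception handling. The ``record`` helper and the dataset name are illustrative assumptions; the real results come from ``dq.compare_column_sums``.

```python
from typing import Any, Dict, Hashable, List, Tuple

Results = Dict[str, Dict[str, List[Tuple[Hashable, Any]]]]

def record(bulk_results: Results, st: str, mggg_name: str,
           new_results: List[Tuple[Hashable, Any]]) -> None:
    """Append new_results under bulk_results[st][mggg_name], creating
    the nested entries on first use."""
    bulk_results.setdefault(st, {}).setdefault(mggg_name, []).extend(new_results)

results: Results = {}
# 'AK_example' is a placeholder dataset name
record(results, 'Alaska', 'AK_example',
       [('PRES16D [vs] US President democratic', -47220.0)])
record(results, 'Alaska', 'AK_example',
       [('SEN16R [vs] US Senate republican', -50259.5)])
record(results, 'Alaska', 'AK_example', [])  # nothing comparable: list unchanged
print(results)
```

An empty result list leaves earlier entries untouched, so a later election with no comparable columns does not erase what was already recorded.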

In [40]:
# Compare mggg-states data with MEDSL data

# a dictionary with state names as keys and, as values, dicts 
# that map each mggg-states dataframe name to its list of 
# column comparison results
results = {}

# printed errors indicate differing column data types --
# to manually investigate as part of QA
for state in available_mggg_states:
    st, mggg_names = state
    
    if mggg_names:
        bulk_compare(results, st, mggg_names, medsl_pres16_dfs, pres16_cols)
        bulk_compare(results, st, mggg_names, medsl_sen16_dfs, sen16_cols)
        bulk_compare(results, st, mggg_names, medsl_ush16_dfs, ush16_cols)
        bulk_compare(results, st, mggg_names, medsl_18_dfs, fed18_cols)

Unable to compare ['PRES16D', 'PRES16L', 'PRES16R'] and ['US President democratic', 'US President libertarian', 'US President republican'] in GA_precincts.
	ufunc 'subtract' did not contain a loop with signature matching types (dtype('<U8310'), dtype('<U8310')) -> dtype('<U8310')
Unable to compare ['SEN16L', 'SEN16R'] and ['US Senate libertarian', 'US Senate republican'] in GA_precincts.
	ufunc 'subtract' did not contain a loop with signature matching types (dtype('<U5508'), dtype('<U5508')) -> dtype('<U5508')
Unable to compare ['PRES16D', 'PRES16R'] and ['US President democratic', 'US President republican'] in MA_no_islands_12_16.
	ufunc 'subtract' did not contain a loop with signature matching types (dtype('<U7939'), dtype('<U7939')) -> dtype('<U7939')
Unable to compare ['SEN16R'] and ['US Senate republican'] in VT_towns.
	ufunc 'subtract' did not contain a loop with signature matching types (dtype('<U747'), dtype('<U747')) -> dtype('<U747')


In [41]:
# Print comparison results

print("============================================================================")
print("Results of State-level Aggregation comparisons between mggg-states and MEDSL")
print("============================================================================")
print()
print('{:37} : {}'.format('mggg-states column [vs] MEDSL column', 'difference in sums'))
print('----------------------------------------------------------------------------')

for st in results:
    print("######## {} ########".format(st))
    
    for mggg_name in results[st]:
        print("{} ========".format(mggg_name))
        
        if results[st][mggg_name]:
            for col_v_col, diff in results[st][mggg_name]:
                print('{:37} : {}'.format(col_v_col, diff))
        else:
            print("No comparable columns found.")
            
        print()
    print()


Results of State-level Aggregation comparisons between mggg-states and MEDSL

mggg-states column [vs] MEDSL column  : difference in sums
----------------------------------------------------------------------------
######## Alaska ########
PRES16D [vs] US President democratic  : -47220.0
PRES16G [vs] US President green       : -1947.5
PRES16L [vs] US President libertarian : -6717.5
PRES16R [vs] US President republican  : -59910.0
SEN16L [vs] US Senate libertarian     : -30050.0
SEN16R [vs] US Senate republican      : -50259.5
USH16D [vs] US House democratic       : -43628.5
USH16L [vs] US House libertarian      : -11665.5
USH16R [vs] US House republican       : -54359.5
USH18R [vs] US House republican       : -46390.5


######## Arizona ########
SEN18R [vs] US Senate republican      : 0.0
USH18D [vs] US House democrat         : 0.0
USH18G [vs] US House green            : -3672.0
USH18R [vs] US House republican       : 0.0


######## Colorado ########
USH18R [vs] US House republican     

__Don't panic just yet -- potential explanations for differences:__

- MGGG data may have columns with mixed datatypes. Some columns contain strings, when they should contain numbers
- Some shapefiles use VTDs and towns, which may differ from precincts
- Undercounts may be the result of excluding absentee ballots in the count
- Sums should be spot-checked manually in addition to the automated checks
- The comparisons are conducted on the intersection between mggg-states and MEDSL datasets and are limited to 2016 and 2018 data
- MEDSL float values indicate proration -- to investigate
- I also haven't rigorously examined the MEDSL data :P
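
The mixed-datatype explanation can be checked directly. In pandas, summing an object column of strings concatenates them, which is consistent with the long string dtypes in the "Unable to compare" messages from the comparison step. Below is a minimal sketch, using made-up values, of detecting and coercing such a column with ``pd.to_numeric``. Whether coercion is the right fix for a given ``mggg-states`` column still needs a manual look.

```python
import pandas as pd

# Made-up vote column stored as strings, with one non-numeric entry
votes = pd.Series(['120', '85', 'n/a', '40'], name='PRES16D', dtype=object)

# Summing strings concatenates them instead of adding numbers
concatenated = votes[votes != 'n/a'].sum()
print(repr(concatenated))

# Coerce to numbers; values that cannot be parsed become NaN
numeric = pd.to_numeric(votes, errors='coerce')
unparseable = votes[numeric.isna()].tolist()
total = numeric.sum()  # NaN entries are skipped by default
print(unparseable, total)
```

Listing the unparseable entries before summing makes it clear which rows were dropped from the total.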

[Coming Soon] __Step 4.2.__ Compare against Wikipedia

[Coming Soon] __Step 4.3.__ Compare against Ballotpedia

__Step 4.4__ Data summation for manual checks against external sources

In [None]:
# Print MGGG summation results

# commented out to save screen space -- TODO: uncomment final
# print("============================================================================")
# print("Results of State-level Aggregations - mggg-states")
# print("============================================================================")
# print()
# print('{:11} : {:10}\t\t{}'.format('mggg-states column', 'sum', 'dtype'))
# print('----------------------------------------------------------------------------')

# for st, gdf_names in available_mggg_states:
#     if gdf_names:
#         print('######## {} ########'.format(st))
        
#         for gdf_name in gdf_names:
#             print("{} ========".format(gdf_name))

#             cols_to_sum = [col for col in matched_columns[gdf_name] if col != 'geometry']
#             col_sums = dq.sum_column_values(mggg_gdfs[gdf_name], cols_to_sum)

#             for result in col_sums:
#                 col_name, col_sum = result
                
#                 if col_name in matched_columns[gdf_name]:
#                     if isinstance(col_sum, str):
#                         col_sum = "{} is a str".format(col_name)
                        
#                     print('{:11} : {:10}\t\t{}'.format(col_name, col_sum, str(type(col_sum))))
                    
#             print('\n\n')
    

In [None]:
# Print MEDSL summation results

def print_medsl_sums(st: str, 
                     medsl_name: str, 
                     medsls: Dict[str, Union[pd.DataFrame, gpd.GeoDataFrame]]
        ) -> None:
    if st in medsls:
        print("######## {} : {} ########".format(st, medsl_name))
        
        cols_to_sum = [col for col in list(medsls[st]) if col != 'geometry']
        
        try:
            col_sums = dq.sum_column_values(medsls[st], cols_to_sum)
        except Exception as e:
            # col_sums would be undefined below, so report and bail out
            print('Unable to sum columns for {}: {}'.format(st, e))
            return
        
        for result in col_sums:
            col_name, col_sum = result
            print('{:65} : {}\t{}'.format(col_name, col_sum, str(type(col_sum))))

        print('\n\n')

# commented out to save screen space -- TODO: uncomment final        
# print("============================================================================")
# print("Results of State-level Aggregations - MEDSL")
# print("============================================================================")
# print()
# print('{:65} : {}\t{}'.format('MEDSL column', 'sum', 'dtype'))
# print('----------------------------------------------------------------------------')

# for st, st_abv in states:
#     print_medsl_sums(st, 'MEDSL 2018', medsl_18_dfs)
#     print_medsl_sums(st, 'MEDSL PRES16', medsl_pres16_dfs)
#     print_medsl_sums(st, 'MEDSL SEN16', medsl_sen16_dfs)
#     print_medsl_sums(st, 'MEDSL USH16', medsl_ush16_dfs)
    

Step 5. Check topological soundness of ``mggg-states`` data
-----------------------------------------------------------------------

__Step 5.1.__ Check for empty or missing geometries

In [44]:
topological_warnings = []
for gdf in mggg_gdfs:
    if any(mggg_gdfs[gdf]['geometry'].isna()):
        topological_warnings.append('{} has missing geometries.'.format(gdf))
        
    if any(mggg_gdfs[gdf]['geometry'].is_empty):
        topological_warnings.append('{} has empty geometries.'.format(gdf))

if len(topological_warnings) == 0:
    print("No missing or empty geometries.")
else:
    for msg in topological_warnings:
        print(msg)

No missing or empty geometries.


Step 6. Cleanup
-------------------

In [45]:
# Remove cloned repos

# commented out to save local space and time
# dm.remove_repos('src/')

In [46]:
# Uninstall installed python packages

# commented out to save local space and time
# !echo y | pip3 uninstall numpy
# !echo y | pip3 uninstall pandas
# !echo y | pip3 uninstall geopandas
# !echo y | pip3 uninstall wikipedia

# !echo y | pip3 uninstall gdutils

In [47]:
# Reset Jupyter Notebook IPython Kernel

# commented out to save local space and time
# from IPython.core.display import HTML
# HTML("<script>Jupyter.notebook.kernel.restart()</script>")

Next Steps
-------------

__Data Standardization__

- Manually evaluate column naming discrepancies to determine if changes are needed
- Manually evaluate column datatypes to determine if changes are needed

__Data Comparison__

- Manually investigate large differences found through comparing ``mggg-states`` data with external sources (e.g. Are absentee ballots counted? Are the precinct counts accurate? etc.) 
- For more accurate comparisons, compare ``mggg-states`` data with the results published on each state's Secretary of State website
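
To help prioritize the manual investigation, the comparison results could be filtered by magnitude. The sketch below assumes the nested structure of ``results`` used in Step 4.1 (state, then dataset name, then a list of ``(label, difference)`` pairs). The ``flag_large_diffs`` helper, the threshold, and the dataset names are illustrative assumptions.

```python
from typing import Any, Dict, List, Tuple

def flag_large_diffs(results: Dict[str, Dict[str, List[Tuple[str, Any]]]],
                     threshold: float = 1000.0
                     ) -> List[Tuple[str, str, str, float]]:
    """Return (state, dataset, label, diff) rows with |diff| above threshold,
    largest absolute difference first. Non-numeric diffs are skipped."""
    flagged = []
    for st, datasets in results.items():
        for name, comparisons in datasets.items():
            for label, diff in comparisons:
                if isinstance(diff, (int, float)) and abs(diff) > threshold:
                    flagged.append((st, name, label, float(diff)))
    return sorted(flagged, key=lambda row: abs(row[3]), reverse=True)

# Differences taken from the Step 4.1 output; dataset names are placeholders
sample = {
    'Alaska': {'AK_example': [('PRES16D [vs] US President democratic', -47220.0),
                              ('PRES16G [vs] US President green', -1947.5)]},
    'Arizona': {'AZ_example': [('SEN18R [vs] US Senate republican', 0.0),
                               ('USH18G [vs] US House green', -3672.0)]},
}
flagged = flag_large_diffs(sample, threshold=2000.0)
for row in flagged:
    print(row)
```

An absolute threshold is the simplest choice; a relative one (difference divided by the external total) may be more informative for states of very different sizes.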

__Topological Soundness__

- Manually examine shapefiles for gaps and overlaps. *Note:* although gaps and overlaps are not necessarily indicators of inaccurate data (because some counties have precinct islands), they do mean that the data cannot be used for chain runs.

__Data Documentation__

- Do the READMEs provide data sources?
- Do the READMEs describe what aggregation/disaggregation processes were used?
- Do the READMEs discuss discrepancies/caveats in the data?
- Do the READMEs provide scripts used and/or discuss the data wrangling/processing process?