# Consistency over time

Over a week (02.06.2017 - 09.06.2017) I took a daily snapshot of the renovation votes dataset (the file list below has no snapshot for 03.06). Voting was still in progress during this week, so the figures are not final (voting ends on 15.06.2017), and each snapshot captures intermediate voting figures. The idea behind this notebook is to check how consistent the changes in those figures are. For example, one can expect that attendance for a house will either stay the same or grow, but won't drop.

In [1]:
%matplotlib inline

import json
import pandas as pd
import numpy as np
import seaborn as sns
from haversine import haversine

Read and merge datasets.

In [2]:
days = [2, 4, 5, 6, 7, 8, 9]  # June days for which a snapshot file exists
records = []
for i in days:
    # Snapshot files are named renovation_votes_DD0617.json
    with open('../data/renovation_votes_{:02d}0617.json'.format(i)) as file:
        data = json.load(file)
    for building in data:
        card_fields = building['card_fields']
        records.append({'area_name': card_fields['area_name'],
                        'district_name': card_fields['district_name'],
                        'name': card_fields['name'],
                        'vote_against': card_fields['result']['protiv'],
                        'vote_for': card_fields['result']['za'],
                        'vote_attendance': card_fields['result']['yavka'],
                        'updated_at': card_fields['updated_at'],
                        # Index 0 is latitude and index 1 is longitude in this dataset
                        'center_lat': building['center']['coordinates'][0],
                        'center_lon': building['center']['coordinates'][1]})
df = pd.DataFrame(records)
# Convert percentages to fractions
df['vote_against'] = df.vote_against * 0.01
df['vote_for'] = df.vote_for * 0.01
df['vote_attendance'] = df.vote_attendance * 0.01
# updated_at starts with the day of month, e.g. '08.06'
df['june_day'] = df.updated_at.apply(lambda x: int(x[:2]))
df.shape

(31822, 10)

Show a sample of the data.

In [3]:
df.tail()

Unnamed: 0,area_name,center_lat,center_lon,district_name,name,updated_at,vote_against,vote_attendance,vote_for,june_day
31817,Бабушкинский,55.861982,37.668428,СВАО,"улица Искры, дом 7",09.06,0.06,0.57,0.94,9
31818,Южное Тушино,55.849816,37.448227,СЗАО,"Химкинский бульвар, дом 11",09.06,0.19,0.52,0.81,9
31819,Марфино,55.82745,37.601244,СВАО,"Ботаническая улица, дом 17",09.06,0.17,0.68,0.83,9
31820,Останкинский,55.816973,37.630957,СВАО,"улица Цандера, дом 4, корпус 1",09.06,0.21,0.63,0.79,9
31821,Можайский,55.720121,37.395743,ЗАО,"улица Говорова, дом 14, корпус 3",09.06,0.08,0.81,0.92,9


First, let's check whether there are cases where the attendance percentage has dropped. It turns out there are three such records (in the grid below). Attendance should be cumulative and can't drop, so the existence of such records suggests that this data is invalid for some reason.
Note: the '_nextday' column suffix means the measurement is from the next snapshot after the day given in 'updated_at_day'.
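
To make the suffix convention concrete, here is a minimal sketch on made-up data. The two houses and their attendance values are hypothetical and exist only to illustrate the pairing; the merge keys are reduced to `name` for brevity.

```python
import pandas as pd

# Hypothetical toy data: two buildings seen in two consecutive snapshots.
toy = pd.DataFrame({
    'name': ['house A', 'house B', 'house A', 'house B'],
    'june_day': [2, 2, 4, 4],
    'vote_attendance': [0.50, 0.60, 0.55, 0.58],
})

# Pair each building's earlier record with its record from the next snapshot.
# Columns that exist on both sides get the '_day' / '_nextday' suffixes.
pairs = toy[toy.june_day == 2].merge(toy[toy.june_day == 4],
                                     on='name',
                                     suffixes=('_day', '_nextday'))

# Keep only buildings whose attendance went down between the snapshots.
dropped = pairs[pairs.vote_attendance_nextday < pairs.vote_attendance_day]
print(dropped[['name', 'vote_attendance_day', 'vote_attendance_nextday']])
```

In this toy frame only 'house B' is reported, because its attendance goes from 0.60 to 0.58.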

In [4]:
dfs = []
for day, nday in zip(days[:-1], days[1:]):
    m = df[df.june_day == day].merge(df[df.june_day == nday], 
                                 on=['name', 'area_name', 'district_name'], 
                                 suffixes=('_day', '_nextday'))
    dfs.append(m[(m.vote_attendance_nextday - m.vote_attendance_day) < 0])
pd.concat(dfs)[['area_name', 
                'district_name', 
                'name', 
                'updated_at_day', 
                'updated_at_nextday',
                'vote_against_day',
                'vote_against_nextday',
                'vote_for_day',
                'vote_for_nextday',
                'vote_attendance_day',
                'vote_attendance_nextday']]    

Unnamed: 0,area_name,district_name,name,updated_at_day,updated_at_nextday,vote_against_day,vote_against_nextday,vote_for_day,vote_for_nextday,vote_attendance_day,vote_attendance_nextday
910,Чертаново Южное,ЮАО,"улица Газопровод, дом 6Г, корпус 3",02.06,04.06,0.0,0.0,1.0,1.0,1.0,0.75
3622,Соколиная Гора,ВАО,"5-я улица Соколиной Горы, дом 21, корпус 1",02.06,04.06,0.0,0.0,1.0,1.0,1.0,0.92
2699,Северное Измайлово,ВАО,"5-я Парковая улица, дом 57, корпус 1",05.06,06.06,0.2,0.18,0.8,0.82,0.56,0.55


It turns out that changes in attendance figures are enough to calculate limits on how much the vote percentages can change. For example, if we know that on a particular day attendance is 40% and 20% of voters have voted against, and the next day attendance is 45%, we can expect (by applying some math) that the against percentage will be roughly between 16% and 35%. So here I check whether the voting figures match this expectation.
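
One way to see where such limits come from is sketched here; the symbols are introduced only for illustration. Let $a_0$ and $a_1$ be attendance on the earlier and the later day (as shares of all eligible voters), and let $p_0$ and $p_1$ be the corresponding shares of votes against. Assuming that cast votes are never withdrawn or changed, votes against make up $p_0 a_0$ of all eligible voters on the earlier day, and the newcomers $a_1 - a_0$ may all vote for or all vote against:

$$
\frac{p_0 a_0}{a_1} \le p_1 \le \frac{p_0 a_0 + (a_1 - a_0)}{a_1}
$$

With $a_0 = 0.40$, $p_0 = 0.20$ and $a_1 = 0.45$ this gives $0.08 / 0.45 \approx 0.18$ and $0.13 / 0.45 \approx 0.29$ before rounding is taken into account. Allowing each published figure to be off by 0.01 widens the range to roughly 16% and 35%.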

It turns out that there are 77 records that violate this rule. Again, this should indicate that the data is invalid for some reason.

A couple of notes:
1. The formula takes into account that the voting figures are rounded.
2. Records are sorted by decreasing extent to which they violate this rule.
3. The 'vote_against_low' and 'vote_against_high' columns show the range in which 'vote_against_nextday' should fall.
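4. The first table shows records where the against percentage is lower than expected; the second table (of 1 record) shows where it is higher than expected. In that single record 'vote_against_nextday' and 'vote_against_high' are both displayed as 0.41, so it is probably flagged only because of floating-point rounding in the comparison rather than a real violation.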
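
For a single building, the same bounds can be sketched as a plain function. This mirrors the formula used in the cell below; the function name `against_bounds` and the `eps` parameter are assumptions made for this illustration, with `eps=0.01` standing in for the possible rounding error of each published figure.

```python
import math


def against_bounds(against_day, attendance_day, attendance_nextday, eps=0.01):
    """Return (low, high) limits for the next-day share of votes against.

    All arguments are fractions (0.20 means 20%). `eps` is the assumed
    rounding error of each published figure.
    """
    # Lowest case: fewest earlier against votes, no newcomer votes against.
    low = math.floor((against_day - eps) * (attendance_day - eps)
                     / (attendance_nextday + eps) * 100) / 100
    # Highest case: most earlier against votes, every newcomer votes against.
    high = math.ceil(((against_day + eps) * (attendance_day + eps)
                      + (attendance_nextday + eps) - (attendance_day - eps))
                     / (attendance_nextday - eps) * 100) / 100
    return low, high


# Example from the text: attendance 40% -> 45%, 20% against on the first day.
print(against_bounds(0.20, 0.40, 0.45))
# Values of the first record in the table below (Люблинская улица, дом 133).
print(against_bounds(0.29, 0.53, 0.69))
```

For the example from the text the function returns 0.16 and 0.36: the unrounded upper limit is about 35.5%, and rounding it up gives 36%. For the second call it returns 0.20 and 0.51, matching the 'vote_against_low' and 'vote_against_high' values shown for that record.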

In [5]:
against_too_low = []
against_too_high = []
for day, nday in zip(days[:-1], days[1:]):
    m = df[df.june_day == day].merge(df[df.june_day == nday], 
                                 on=['name', 'area_name', 'district_name'], 
                                 suffixes=('_day', '_nextday'))
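    # Keep pairs where attendance grew by at least 0.02; copy so new columns can be added safely
    m = m[(m.vote_attendance_nextday - m.vote_attendance_day) >= 0.02].copy()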
    m['vote_against_low'] = np.floor((m.vote_against_day - 0.01) * 
                                     (m.vote_attendance_day - 0.01) / 
                                     (m.vote_attendance_nextday + 0.01) * 100) / 100
    m['vote_against_high'] = np.ceil(((m.vote_against_day + 0.01) * 
                                      (m.vote_attendance_day + 0.01) + 
                                      (m.vote_attendance_nextday + 0.01) - 
                                      (m.vote_attendance_day - 0.01)) / 
                                      (m.vote_attendance_nextday - 0.01) * 100) / 100
    against_too_low.append(m[m.vote_against_low > m.vote_against_nextday])
    against_too_high.append(m[m.vote_against_high < m.vote_against_nextday])
   
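against_too_low = pd.concat(against_too_low)
# Size of the violation, computed from the concatenated frame itself rather than
# from `m`, which after the loop only holds the last pair of days
against_too_low['vote_against_diff'] = (against_too_low['vote_against_low'] -
                                        against_too_low['vote_against_nextday'])
against_too_high = pd.concat(against_too_high)
against_too_high['vote_against_diff'] = (against_too_high['vote_against_nextday'] -
                                         against_too_high['vote_against_high'])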

columns_to_show = ['area_name', 
                'district_name', 
                'name', 
                'updated_at_day', 
                'updated_at_nextday',
                'vote_against_day',
                'vote_against_nextday',
                'vote_for_day',
                'vote_for_nextday',
                'vote_attendance_day',
                'vote_attendance_nextday', 
                'vote_against_low', 
                'vote_against_high']
against_too_low.sort_values('vote_against_diff', ascending=False)[columns_to_show]

Unnamed: 0,area_name,district_name,name,updated_at_day,updated_at_nextday,vote_against_day,vote_against_nextday,vote_for_day,vote_for_nextday,vote_attendance_day,vote_attendance_nextday,vote_against_low,vote_against_high
2029,Люблино,ЮВАО,"Люблинская улица, дом 133",08.06,09.06,0.29,0.14,0.71,0.86,0.53,0.69,0.20,0.51
3715,Люблино,ЮВАО,"Таганрогская улица, дом 19",08.06,09.06,0.12,0.06,0.88,0.94,0.53,0.56,0.10,0.22
4242,Кузьминки,ЮВАО,"улица Юных Ленинцев, дом 94",08.06,09.06,0.27,0.15,0.73,0.85,0.51,0.68,0.18,0.51
3782,Кунцево,ЗАО,"улица Леси Украинки, дом 6, корпус 2",08.06,09.06,0.62,0.53,0.38,0.47,0.67,0.71,0.55,0.70
4290,Ярославский,СВАО,"Ярославское шоссе, дом 107А",08.06,09.06,0.10,0.06,0.90,0.94,0.62,0.64,0.08,0.18
589,Кузьминки,ЮВАО,"Волгоградский проспект, дом 152, корпус 2",08.06,09.06,0.09,0.04,0.91,0.96,0.55,0.71,0.06,0.34
2247,Коптево,САО,"3-й Новомихалковский проезд, дом 13",08.06,09.06,0.22,0.16,0.78,0.84,0.38,0.42,0.18,0.37
2856,Даниловский,ЮАО,"2-й Павелецкий проезд, дом 4, корпус 2",08.06,09.06,0.20,0.16,0.80,0.84,0.59,0.63,0.17,0.31
2580,Марьина Роща,СВАО,"Октябрьская улица, дом 68",08.06,09.06,0.24,0.19,0.76,0.81,0.48,0.53,0.20,0.38
3338,Нижегородский,ЮВАО,"Рязанский проспект, дом 23",08.06,09.06,0.12,0.09,0.88,0.91,0.55,0.57,0.10,0.21


In [6]:
against_too_high.sort_values('vote_against_diff', ascending=False)[columns_to_show]

Unnamed: 0,area_name,district_name,name,updated_at_day,updated_at_nextday,vote_against_day,vote_against_nextday,vote_for_day,vote_for_nextday,vote_attendance_day,vote_attendance_nextday,vote_against_low,vote_against_high
202,Соколиная Гора,ВАО,"Буракова улица, дом 23",07.06,08.06,0.35,0.41,0.65,0.59,0.89,0.91,0.32,0.41
