![POLITICO](https://rawgithub.com/The-Politico/src/master/images/logo/badge.png)

# POLITICO partisan voting district similarity maps

POLITICO partisan voting district similarity maps align districts by their similarity based on voting history.

The maps are created by calculating the weighted [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) between districts based on major party returns in past federal elections.

---

## The data

We use federal election results for the following races:
- 2012, 2014 & 2016 U.S. House
- 2012 & 2016 U.S. president

This [data](https://www.politico.com/election-results/2018/race-ratings/data/historical/house.json) is sourced from MIT Election Lab and Daily Kos.

In [3]:
import requests

response = requests.get('https://www.politico.com/election-results/2018/race-ratings/data/historical/house.json')
data = response.json()
# We are excluding PA (new districts) and LA (jungle general election) this year
districts = [district for district in list(data.keys()) if 'LA' not in district and 'PA' not in district]
districts.sort()

---

## Calculate district similarity

We use vote percent to standardize returns in each district.

In [4]:
def get_results(year, results):
    result = list(filter(lambda x: x['year'] == year, results))[0]
    dem = result.get('dem', {}).get('votePct', 0)
    gop = result.get('gop', {}).get('votePct', 0)
    return [dem, gop]
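
To make the expected input concrete, here is a minimal sketch of how a lookup like `get_results` behaves on a hypothetical record. The `sample_results` data is invented for illustration. Its shape (a `year` string plus `dem` and `gop` objects with a `votePct` field on a 0 to 1 scale) is inferred from the code above rather than from a documented schema. The sketch uses `next()` in place of `filter(...)[0]`, a swapped-in lookup that returns the same first match.

```python
def get_results(year, results):
    # Find the first record for the requested year
    result = next(r for r in results if r['year'] == year)
    # A missing party (uncontested race) falls back to 0
    dem = result.get('dem', {}).get('votePct', 0)
    gop = result.get('gop', {}).get('votePct', 0)
    return [dem, gop]

# Hypothetical records, not real returns
sample_results = [
    {'year': '2014', 'dem': {'votePct': 0.45}, 'gop': {'votePct': 0.55}},
    {'year': '2016', 'dem': {'votePct': 0.95}},  # no GOP candidate
]

contested_2014 = get_results('2014', sample_results)
uncontested_2016 = get_results('2016', sample_results)
```

A zero in the returned pair is what later code treats as the sign of an uncontested race.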

### Uncontested race discount

If a party is uncontested for a House seat during a cycle but has been contested in one or more other cycles, we discount the party's result in the uncontested race. We do this to limit the effect of any single uncontested race, which may otherwise overstate the party's dominance and skew how competitive the district appears.

The discount is arbitrary: half the difference between the party's return in the uncontested race and the maximum return the party received in contested races during the period.

For example, if a Democratic candidate was uncontested in one cycle and received 95% of the vote but had only received a maximum of 55% of the vote in contested races, the uncontested result is discounted by (95 - 55) / 2 = 20 percentage points. The discounted return in the uncontested year would be **75%**.

If *all* races in the period are uncontested, we **do not** discount the uncontested return.
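
The arithmetic in the example above can be sketched for a single party as follows. `discounted_return` is a hypothetical helper written for illustration, not part of the notebook, and the input figures are the ones from the worked example plus an invented second contested return.

```python
def discounted_return(uncontested_pct, contested_pcts):
    # With no contested races to compare against, leave the return as is
    if not contested_pcts:
        return uncontested_pct
    # Discount by half the gap to the party's best contested return
    discount = (uncontested_pct - max(contested_pcts)) / 2
    return uncontested_pct - discount

# 95% uncontested, best contested return of 55%
example = discounted_return(95, [55, 52])

# Every race uncontested: no discount is applied
all_uncontested = discounted_return(95, [])
```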



In [5]:
def flatten(list_of_lists):
    return [item for sublist in list_of_lists for item in sublist]

def discount_uncontested(results):
    contested = [result for result in results if 0 not in result]
    # If all races are uncontested, do not discount result
    if len(contested) == 0:
        return flatten(results)

    discounted = []

    # Keep results in their original (chronological) order so they
    # line up with the per-year weights applied later
    for result in results:
        if 0 not in result:
            discounted.append(result)
        # No Dem, GOP uncontested
        elif result.index(0) == 0:
            max_return = max([r[1] for r in contested])
            discount = (result[1] - max_return) / 2
            discounted.append([0, result[1] - discount])
        # No GOP, Dem uncontested
        else:
            # Use the maximum contested return, as for the GOP case
            max_return = max([r[0] for r in contested])
            discount = (result[0] - max_return) / 2
            discounted.append([result[0] - discount, 0])
    return flatten(discounted)

In [6]:
district_results = {}

for district in districts:
    seat = data[district]['seat']
    pres = data[district]['president']
    
    # List of results [Dem, GOP]
    h2012 = get_results('2012', seat)
    h2014 = get_results('2014', seat)
    h2016 = get_results('2016', seat)
    p2012 = get_results('2012', pres)
    p2016 = get_results('2016', pres)
    
    house_results = discount_uncontested([h2012, h2014, h2016])
    
    district_results[district] = house_results + p2012 + p2016
    

### Weighted Euclidean distance


Each major party result represents a coordinate along an axis from 0 to 1 (0 to 100% of the vote). Using those coordinates, we calculate the [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) between each pair of districts.

We weight results, favoring each *more recent* election cycle over the previous cycle for the same office **2 to 1**.
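
Written out, the weighted distance between two districts $a$ and $b$ takes the following form. The symbols are introduced here for illustration: $a_i$ and $b_i$ are the ten party returns for each district (five races, two parties each) and $w_i$ is the weight of the corresponding race.

$$
d(a, b) = \sqrt{\sum_{i=1}^{10} w_i \, (a_i - b_i)^2}
$$

Because each weight multiplies a squared difference, doubling a race's weight scales that race's contribution to the distance by $\sqrt{2}$ rather than by 2.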

In [7]:
from scipy.spatial import distance

weights = {
    'h2012': 0.5, # House results
    'h2014': 1,
    'h2016': 2,
    'p2012': 1, # President results
    'p2016': 2,
}

def get_distance(district, comparitor):
    w = [
        #       DEM               GOP
        weights['h2012'], weights['h2012'],
        weights['h2014'], weights['h2014'],
        weights['h2016'], weights['h2016'],
        weights['p2012'], weights['p2012'],
        weights['p2016'], weights['p2016'],
    ]
    return distance.euclidean(district, comparitor, w)
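
As a rough check on what the weighted call computes, the same distance can be sketched in plain Python. `weighted_euclidean` and the toy districts below are hypothetical and shortened to two races. The formula is the square root of the weighted sum of squared differences, which is what `distance.euclidean` evaluates when given a weight vector.

```python
from math import sqrt

def weighted_euclidean(u, v, w):
    # Square root of the weighted sum of squared coordinate differences
    return sqrt(sum(wi * (ui - vi) ** 2 for ui, vi, wi in zip(u, v, w)))

# Hypothetical [Dem, GOP] returns for one older and one newer race
district_a = [0.60, 0.40, 0.55, 0.45]
district_b = [0.50, 0.50, 0.55, 0.45]  # differs from A in the older race
district_c = [0.60, 0.40, 0.45, 0.55]  # differs from A in the newer race
race_weights = [1, 1, 2, 2]

older_gap = weighted_euclidean(district_a, district_b, race_weights)
newer_gap = weighted_euclidean(district_a, district_c, race_weights)
```

The same 10-point swing pushes districts further apart when it occurs in the more heavily weighted race, so `newer_gap` is larger than `older_gap`.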
    

In [8]:
district_distances = {}

for district, districtResults in district_results.items():
    district_distances[district] = []
    for comparitor, comparitorResults in district_results.items():
        if district == comparitor:
            continue
        district_distances[district].append({
            'district': comparitor,
            'distance': get_distance(districtResults, comparitorResults)
        })

For each district we keep the **22** most similar districts, about 5% of the total number of districts.

In [9]:
N = 22

In [10]:
similar_districts = {}
similar_district_ids = {}

for district, distances in district_distances.items():
    sorted_distances = sorted(distances, key=lambda k: k['distance'])
    
    similar_districts[district] = sorted_distances[:N]
    similar_district_ids[district] = [k['district'] for k in sorted_distances[:N]]
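
The sort-and-slice step above can also be sketched with the standard library's `heapq.nsmallest`, which returns the `n` smallest items in ascending order without sorting the whole list. This is an alternative shown for illustration, not the notebook's method. The distances and the name `top_n` are invented.

```python
from heapq import nsmallest

# Hypothetical distances from one district to four others
distances = [
    {'district': 'A', 'distance': 0.30},
    {'district': 'B', 'distance': 0.05},
    {'district': 'C', 'distance': 0.12},
    {'district': 'D', 'distance': 0.41},
]

top_n = 2
closest = nsmallest(top_n, distances, key=lambda k: k['distance'])
closest_ids = [k['district'] for k in closest]
```

With only a few hundred districts either approach is fast, so the choice is mostly a matter of readability.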
    

---

## Output

JSON

In [11]:
import json

with open('data/political-similarity.json', 'w') as file:
    json.dump(similar_district_ids, file)

CSV with similarity score stats

In [12]:
import csv

with open('data/political-similarity.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['district', 'min_similarity', 'max_similarity', 'similarity_range', 'most_similar ⬅'])

    for district in districts:
        MIN = similar_districts[district][0]['distance']
        MAX = similar_districts[district][-1]['distance']
        row = [district, MIN, MAX, MAX - MIN] + [k['district'] for k in similar_districts[district]]
        
        writer.writerow(row)

### Compare similarity maps to POLITICO race ratings

These maps list the ratings of similar districts. We also calculate the variance for each map based on a point scale for the ratings.

In [13]:
from statistics import pvariance

response = requests.get('https://www.politico.com/election-results/2018/race-ratings/data/ratings.json')
ratings = {}
rating_codes = {}

for rating in response.json():
    ratings[rating['id']] = rating['latest_rating']['short_label']
    rating_codes[ratings[rating['id']]] = rating['latest_rating']['order']

    
with open('data/political-similarity-with-ratings.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['district', 'rating', 'variance', 'most_similar_ratings'])
    
    for district in districts:
        row = [
            district,
            ratings[district],
            pvariance([rating_codes[ratings[k['district']]] for k in similar_districts[district]])
        ] + [ratings[k['district']] for k in similar_districts[district]]
        
        writer.writerow(row)
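
To illustrate how the variance column might be read, here is a small sketch using `statistics.pvariance` on two invented sets of rating codes. The 1 to 7 scale is a hypothetical stand-in, since the actual values of the `order` field are not shown here.

```python
from statistics import pvariance

# Hypothetical rating codes on an invented 1 to 7 point scale
uniform_map = [3, 3, 3, 3]  # similar districts all share one rating
mixed_map = [1, 3, 5, 7]    # similar districts rated very differently

uniform_variance = pvariance(uniform_map)
mixed_variance = pvariance(mixed_map)
```

A variance near zero suggests that a district's most similar peers are rated alike. A larger value flags a map whose ratings disagree.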

---

## Upload

In [14]:
import boto3
import os

s3 = boto3.resource('s3')

# Use a distinct name so the file handle does not shadow the `data` dict loaded earlier
with open('data/political-similarity.json', 'rb') as file:
    key = 'election-results/2018/district-similarity-maps/political-similarity.json'
    s3.Bucket(os.getenv('AWS_S3_BUCKET')).put_object(
        Key=key,
        Body=file,
        ACL='public-read',
        CacheControl='max-age=300',
        ContentType='application/json'
    )