![POLITICO](https://rawgithub.com/The-Politico/src/master/images/logo/badge.png)

# POLITICO political district similarity maps

POLITICO district political similarity maps create lists of the most similar districts to any one district based on voting history.

The maps are created by calculating a weighted Euclidean distance between major party returns in each district.

---

## Data

Voting data used to determine similarity includes major party returns for:
- 2012, 2014 & 2016 U.S. House
- 2012 & 2016 U.S. president

The [data](https://www.politico.com/election-results/2018/race-ratings/data/historical/house.json) is sourced from MIT Election Lab and Daily Kos.

In [12]:
import requests

response = requests.get('https://www.politico.com/election-results/2018/race-ratings/data/historical/house.json')
data = response.json()
districts = [district for district in list(data.keys()) if 'LA' not in district and 'PA' not in district]
districts.sort()

---

## Processing

We use vote percent to standardize returns in each district.

In [13]:
def get_results(year, results):
    result = list(filter(lambda x: x['year'] == year, results))[0]
    dem = result.get('dem', {}).get('votePct', 0)
    gop = result.get('gop', {}).get('votePct', 0)
    return [dem, gop]

### Uncontested race discount

If a party is uncontested for a House seat during a cycle but has been contested in one or more other cycles, we discount the party's result in the uncontested race. We do this to limit the effect party dominance in any single uncontested race has because it may otherwise skew how competitive the district truly is.

The discount is arbitrary and calculated as half the return in the uncontested race minus the average of the party's contested returns.

For example, if a Democratic candidate was uncontested in one cycle and received 95% of the vote but had only received an average of 55% of the vote in contested races, the uncontested result is discounted by (95 - 55) / 2 = 20%. The discounted return in the uncontested year would be **75%**.

If *all* races in the period are uncontested, we do not discount the uncontested return.



In [14]:
from statistics import mean

def flatten(list_of_lists):
    return [item for sublist in list_of_lists for item in sublist]

def discount_uncontested(results):
    uncontested = [result for result in results if 0 in result]
    contested = [result for result in results if 0 not in result]
    # If all races are uncontested, do not discount result
    if len(contested) == 0:
        return flatten(uncontested)
    
    discounted_uncontested = []

    for result in uncontested:
        # No Dem, GOP uncontested
        if result.index(0) == 0:
            avg_return = mean([r[1] for r in contested])
            discount = (result[1] - avg_return) / 2
            discounted_uncontested.append([0, result[1] - discount])
        # No GOP, Dem uncontested
        else:
            avg_return = mean([result[0] for result in contested])
            discount = (result[0] - avg_return) / 2
            discounted_uncontested.append([result[0] - discount, 0])
    return flatten(discounted_uncontested) + flatten(contested)

In [15]:
district_results = {}

for district in districts:
    seat = data[district]['seat']
    pres = data[district]['president']
    
    # List of results [Dem, GOP]
    h2012 = get_results('2012', seat)
    h2014 = get_results('2014', seat)
    h2016 = get_results('2016', seat)
    p2012 = get_results('2012', pres)
    p2016 = get_results('2016', pres)
    
    house_results = discount_uncontested([h2012, h2014, h2016])
    
    district_results[district] = house_results + p2012 + p2016
    

### Weighted Euclidean distance


Each major party result represents a coordinate point along an axis of 0 to 1 (100% of the vote). Using those coordinates, we calculate the [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) between each district.

We weight results to favor presidential results and more recent results, respectively.

In [26]:
from scipy.spatial import distance

weights = {
    'h2012': 1, # House results
    'h2014': 2,
    'h2016': 5,
    'p2012': 4, # President results
    'p2016': 5,
}

def get_distance(district, comparitor):
    w = [
        #       DEM               GOP
        weights['h2012'], weights['h2012'],
        weights['h2014'], weights['h2014'],
        weights['h2016'], weights['h2016'],
        weights['p2012'], weights['p2012'],
        weights['p2016'], weights['p2016'],
    ]
    return distance.euclidean(district, comparitor, w)
    

In [27]:
district_distances = {}

for district, districtResults in district_results.items():
    district_distances[district] = []
    for comparitor, comparitorResults in district_results.items():
        if district == comparitor:
            continue
        district_distances[district].append({
            'district': comparitor,
            'distance': get_distance(districtResults, comparitorResults)
        })

For each district we calculate the **25** most similar districts.

In [28]:
N = 25

In [29]:
similar_districts = {}
similar_districts_diagnostics = {}

for district, distances in district_distances.items():
    sorted_distances = list(sorted(distances, key=lambda k: k['distance']))
    similar_districts[district] = [k['district'] for k in sorted_distances[:N]]
    similar_districts_diagnostics[district] = sorted_distances[:N]

---

## Output

JSON

In [30]:
import json

with open('data/political-similarity.json', 'w') as file:
    json.dump(similar_districts, file)

CSV for diagnostics

In [35]:
import csv

with open('data/political-similarity-diagnostics.csv','w') as file:
    writer = csv.writer(file)
    writer.writerow(['district', 'min_similarity', 'max_similarity', 'similarity_range', 'most_similar'])

    for district in districts:
        min = similar_districts_diagnostics[district][0]['distance']
        max = similar_districts_diagnostics[district][-1]['distance']
        row = [district, min, max, max-min] + [k['district'] for k in similar_districts_diagnostics[district]]
        
        writer.writerow(row)
        
        print(row)

['AK-00', 0.23428269248922343, 0.33659278067124376, 0.10231008818202034, 'WI-06', 'OK-05', 'AR-02', 'MT-00', 'CO-03', 'MI-08', 'MI-07', 'TX-25', 'WA-05', 'NC-06', 'TX-06', 'KS-02', 'WA-04', 'FL-03', 'TX-10', 'NC-02', 'CA-01', 'TX-31', 'MO-02', 'MI-01', 'NC-05', 'MI-11', 'CA-04', 'MI-03', 'VA-01']
['AL-01', 0.030823124436046363, 0.5587881821406032, 0.5279650577045568, 'GA-08', 'GA-10', 'GA-03', 'OK-01', 'TX-05', 'FL-04', 'AZ-08', 'TX-03', 'CA-23', 'MS-03', 'OH-08', 'IL-18', 'TX-26', 'TN-02', 'GA-11', 'SC-04', 'GA-14', 'TX-12', 'FL-25', 'FL-01', 'TN-08', 'KY-02', 'OK-04', 'AL-05', 'TX-04']
['AL-02', 0.24850845056053933, 0.43255396195156964, 0.18404551139103031, 'IN-09', 'TX-27', 'FL-03', 'CA-01', 'FL-17', 'TX-25', 'KS-04', 'NC-05', 'NC-11', 'MT-00', 'SD-00', 'TX-31', 'TX-14', 'OK-05', 'NC-10', 'AR-04', 'AZ-05', 'IN-02', 'WV-02', 'TX-06', 'CA-42', 'IN-04', 'WA-04', 'TX-10', 'IA-04']
['AL-03', 0.09904594893280602, 0.24126618080452142, 0.1422202318717154, 'FL-11', 'IN-04', 'TN-03', 'MO-06',

In [34]:
from statistics import pvariance

response = requests.get('https://www.politico.com/election-results/2018/race-ratings/data/ratings.json')
ratings = {}
rating_codes = {}

for rating in response.json():
    ratings[rating['id']] = rating['latest_rating']['short_label']
    rating_codes[ratings[rating['id']]] = rating['latest_rating']['order']

    
with open('data/political-similarity-ratings.csv','w') as file:
    writer = csv.writer(file)
    writer.writerow(['district', 'rating', 'variance', 'most_similar_ratings'])
    
    for district in districts:
        row = [
            district,
            ratings[district],
            pvariance([rating_codes[ratings[k['district']]] for k in similar_districts_diagnostics[district]])
        ] + [ratings[k['district']] for k in similar_districts_diagnostics[district]]
        
        writer.writerow(row)
        
        print(row)

['AK-00', 'Likely-R', 1.1136, 'Lean-R', 'Likely-R', 'Lean-R', 'Likely-R', 'Likely-R', 'Toss-Up', 'Likely-R', 'Solid-R', 'Lean-R', 'Solid-R', 'Likely-R', 'Toss-Up', 'Solid-R', 'Solid-R', 'Solid-R', 'Toss-Up', 'Solid-R', 'Likely-R', 'Likely-R', 'Likely-R', 'Solid-R', 'Toss-Up', 'Likely-R', 'Solid-R', 'Solid-R']
['AL-01', 'Solid-R', 0.10560000000000001, 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Likely-R', 'Likely-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Likely-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R']
['AL-02', 'Solid-R', 0.2176, 'Likely-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Likely-R', 'Solid-R', 'Solid-R', 'Likely-R', 'Solid-R', 'Likely-R', 'Solid-R', 'Likely-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Likely-R', 'Solid-R', 'Likely-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Solid-R', 'Likely-R']
['AL-03', 'Solid-R', 0.038400000000000004, 'Solid-R', 'So