![POLITICO](https://rawgithub.com/The-Politico/src/master/images/logo/badge.png)

# POLITICO demographic district similarity maps

POLITICO demographic district similarity maps list the districts most similar to any given district, based on census demographics.

The maps are created by calculating a weighted Euclidean distance between demographic characteristics in each district.
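
As a rough illustration of the idea, the sketch below ranks districts by weighted Euclidean distance. The district IDs, feature values and weights are made up for the example. The formula used here, the square root of the weighted sum of squared differences, is one common form of weighted Euclidean distance and is an assumption, not a confirmed detail of how the maps are built.

```python
import math

# Hypothetical normalized profiles (0 to 1) for three districts.
DISTRICTS = {
    "A": {"white": 0.80, "age": 0.50},
    "B": {"white": 0.75, "age": 0.90},
    "C": {"white": 0.40, "age": 0.50},
}

# Hypothetical weights: whiteness counts twice as much as age.
EXAMPLE_WEIGHTS = {"white": 2.0, "age": 1.0}


def weighted_distance(a, b, weights):
    """Square root of the weighted sum of squared differences."""
    return math.sqrt(sum(
        weight * (a[key] - b[key]) ** 2
        for key, weight in weights.items()
    ))


def most_similar(target, districts, weights):
    """Return the other district IDs, nearest first."""
    others = [d for d in districts if d != target]
    return sorted(
        others,
        key=lambda d: weighted_distance(districts[target], districts[d], weights),
    )


print(most_similar("A", DISTRICTS, EXAMPLE_WEIGHTS))
```

With these made-up numbers, district B ranks ahead of C for district A: the large gap in whiteness between A and C is counted double, which outweighs the age gap between A and B.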

---

## Data

The demographic profile for each district is based on four characteristics, each drawn from an American Community Survey table:

- Non-Hispanic whiteness (B03002)
- Median age (B01002)
- Median income (B19013)
- Educational attainment (B15003)

---

## Calculating weights

We weight demographic variables by how significant they are in determining political identity. To do that, we estimate the statistical relationship between the variables and voting behavior with a multivariate linear model.

While our goal is to determine the similarity of districts, we calculate the weights from county-level returns and demographic measures, because there are more counties than districts in which to test the relationship between demographics and voting. Our voting data (the dependent variable) are the 2012 and 2016 presidential results. Our demographic data (the independent variables) come from the 2012 and 2016 5-year American Community Survey. In the model, we weight 2016 data twice as heavily as 2012 data.

To calculate the weights, we need to estimate how influential each variable is, relative to the others, in its impact on party margin.

To begin, we normalize each demographic variable to a scale of 0 to 1, where 0 and 1 represent the minimum and maximum of that variable's distribution. That way we can compare their model coefficients. We then take the absolute value of each coefficient and simplify the set to a ratio. That ratio represents how influential the variables are compared to each other when used together to predict voting behavior. These are our weights.
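
The normalization step can be sketched as a small helper. The function name `min_max_normalize` and the sample ages are hypothetical, and the handling of a constant variable is an assumption added for safety.

```python
def min_max_normalize(values):
    """Rescale values so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        # Assumption: a constant variable carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical median ages for four counties.
median_ages = [30.0, 35.0, 40.0, 50.0]
print(min_max_normalize(median_ages))  # [0.0, 0.25, 0.5, 1.0]
```

Because every variable ends up on the same 0 to 1 scale, a coefficient of a given size means the same thing for income as it does for age.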

For example, if non-Hispanic whiteness has a coefficient of 4 while age has a coefficient of 2 in the model, we weight whiteness 2 to 1 against age when calculating our Euclidean distance between districts.
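
A minimal sketch of that simplification, assuming the ratio is expressed relative to the smallest absolute coefficient. The helper name `coefficients_to_ratio` is hypothetical, and the negative sign on the whiteness coefficient is added only to show why the absolute value matters.

```python
def coefficients_to_ratio(coefficients):
    """Express absolute coefficients relative to the smallest one."""
    magnitudes = {name: abs(value) for name, value in coefficients.items()}
    smallest = min(magnitudes.values())
    # Assumption: no coefficient is exactly zero; otherwise the ratio is undefined.
    return {name: value / smallest for name, value in magnitudes.items()}


# Coefficients of magnitude 4 and 2, as in the example above.
example = {"white": -4.0, "age": 2.0}
print(coefficients_to_ratio(example))  # {'white': 2.0, 'age': 1.0}
```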

---

### Get census data



In [91]:
import os
from census import Census
from us import states

c = Census(os.getenv('CENSUS_API_KEY'))


def get_census_series(year):
    FIPS = {}
    FIPS_POP = {}
    
    def add_to_fips(d):
        if d["fips"] not in FIPS:
            FIPS[d["fips"]] = {}
        FIPS[d["fips"]] = {**FIPS[d["fips"]], **d}
    
    # Get total population in a dict
    for d in c.acs5.get(['B03002_001E'], {'for': 'county:*'}, year=year):
        fips = d["state"] + d["county"]
        FIPS_POP[fips] = int(d["B03002_001E"])
    
    white = [{
        "fips": w["state"] + w["county"],
        "year": str(year),
        "white": w["B03002_003E"] / w["B03002_001E"]
    } for w in c.acs5.get(['B03002_003E', 'B03002_001E'], {'for': 'county:*'}, year=year)]
    white_values = [d["white"] for d in white]
    white_max = max(white_values)
    white_min = min(white_values)
    for d in white:
        d["white_norm"] = (d["white"] - white_min) / (white_max - white_min)
        add_to_fips(d)
    white.sort(key=lambda d: d["fips"])

    age = [{
        "fips": a["state"] + a["county"],
        "year": str(year),
        "age": a["B01002_001E"]
    } for a in c.acs5.get('B01002_001E', {'for': 'county:*'}, year=year)]
    age_values = [d["age"] for d in age]
    age_max = max(age_values)
    age_min = min(age_values)
    for d in age:
        d["age_norm"] = (d["age"] - age_min) / (age_max - age_min)
        add_to_fips(d)
    age.sort(key=lambda d: d["fips"])

    income = [{
        "fips": i["state"] + i["county"],
        "year": str(year),
        "income": i["B19013_001E"]
    } for i in c.acs5.get('B19013_001E', {'for': 'county:*'}, year=year)]
    income_values = [d["income"] for d in income]
    income_max = max(income_values)
    income_min = min(income_values)
    for d in income:
        d["income_norm"] = (d["income"] - income_min) / (income_max - income_min)
        add_to_fips(d)
    income.sort(key=lambda d: d["fips"])

    education = [{
        "fips": e["state"] + e["county"],
        "year": str(year),
        "education": (
            e["B15003_019E"] +
            e["B15003_020E"] +
            e["B15003_021E"] +
            e["B15003_022E"] +
            e["B15003_023E"] +
            e["B15003_024E"] +
            e["B15003_025E"]
        ) / e["B15003_001E"],
    } for e in c.acs5.get([
        'B15003_001E',
        'B15003_019E',
        'B15003_020E',
        'B15003_021E',
        'B15003_022E',
        'B15003_023E',
        'B15003_024E',
        'B15003_025E'
    ], {'for': 'county:*'}, year=year)]
    education_values = [d["education"] for d in education]
    education_max = max(education_values)
    education_min = min(education_values)
    for d in education:
        d["education_norm"] = (d["education"] - education_min) / (education_max - education_min)
        add_to_fips(d)
    education.sort(key=lambda d: d["fips"])
    
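    # Population density: ACS total population divided by land area
    # (AREALAND, in square meters) from the 2010 decennial census.
    density = [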
        {
            "fips": d["state"] + d["county"],
            "density": FIPS_POP[d["state"] + d["county"]] / int(d["AREALAND"])
        } for d in c.sf1.get('AREALAND', {'for': 'county:*'}, year=2010) if d["state"] + d["county"] in FIPS_POP
    ]
    density_values = [d["density"] for d in density]
    density_max = max(density_values)
    density_min = min(density_values)
    for d in density:
        d["density_norm"] = (d["density"] - density_min) / (density_max - density_min)
        add_to_fips(d)
    
    
    return FIPS

CENSUS_2016 = get_census_series(2016)
CENSUS_2012 = get_census_series(2012)

---

### Get election data

In [58]:
import requests
import io
import csv


def get_results_year(year):
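    response = requests.get('https://raw.githubusercontent.com/The-Politico/presidential-county-data/master/output/{}.csv'.format(year))
    response.raise_for_status()  # Fail early if the download did not succeed

    results = {}
    # Parse the CSV text into one dict per county row
    reader = csv.DictReader(io.StringIO(response.text))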
    for row in reader:
        fips = row['county_fips']
        results[fips] = {
            "dem": int(row['democrat']),
            "dem_pct": int(row["democrat"]) / int(row["total"]),
            "gop": int(row['republican']),
            "gop_pct": int(row["republican"]) / int(row["total"]),
            "total": int(row['total']),
            "margin": (int(row["democrat"]) / int(row["total"])) - (int(row["republican"]) / int(row["total"]))
        }
    return results

RESULTS_2012 = get_results_year(2012)
RESULTS_2016 = get_results_year(2016)


---

### Fit the model

In [123]:
from sklearn.linear_model import LinearRegression

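X = []  # One row of normalized demographics per county and year
y = []  # Democratic minus Republican vote share for the same county and year
w = []  # Sample weights: 2 for 2016 rows, 1 for 2012 rows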

for fips, demos in CENSUS_2016.items():
    if fips not in RESULTS_2016:
        continue
    y.append(RESULTS_2016[fips]["margin"])
    X.append([
        demos["white_norm"],
        demos["age_norm"],
        demos["income_norm"],
        demos["education_norm"],
        demos["density_norm"],
    ])
    w.append(2)

for fips, demos in CENSUS_2012.items():
    if fips not in RESULTS_2012:
        continue
    y.append(RESULTS_2012[fips]["margin"])
    X.append([
        demos["white_norm"],
        demos["age_norm"],
        demos["income_norm"],
        demos["education_norm"],
        demos["density_norm"],
    ])
    w.append(1)

model = LinearRegression().fit(X, y, sample_weight=w)

print('R2:', model.score(X, y, sample_weight=w))
coefficients = {
    'white': model.coef_[0],
    'age': model.coef_[1],
    'income': model.coef_[2],
    'education': model.coef_[3],
    'density': model.coef_[4]
}
print('\nCoefficients', coefficients)

WEIGHTS = {
    'white': abs(model.coef_[0]),
    'age': abs(model.coef_[1]),
    'income': abs(model.coef_[2]),
    'education': abs(model.coef_[3]),
    'density': abs(model.coef_[4])
}
print('\nWeights', WEIGHTS)

R2: 0.38875571065021575

Coefficients {'white': -0.8512145403228027, 'age': 0.12831134722076493, 'income': -0.15450339436212904, 'education': 0.7075274067274355, 'density': 1.5896494061548279}

Weights {'white': 0.8512145403228027, 'age': 0.12831134722076493, 'income': 0.15450339436212904, 'education': 0.7075274067274355, 'density': 1.5896494061548279}


In [102]:
'https://www2.census.gov/geo/relfiles/cdsld16/natl/natl_landarea_cd_delim.txt'
