# North Carolina Scorecard

This notebook generates the county, school district, and statewide "future voter scorecards" for NC It is generalized to be updated every month, with minimal changes.

Scorecard outputs (tables) are written back to BigQuery, where they are then read into Google Sheets for formatting

In [4]:
import pandas as pd
import numpy as np
import pandas_gbq
from google.cloud import storage
from io import BytesIO


## Inputs
Update the fields below each month

In [2]:
# Date in name of GCS folder
date_string = '20250503'


## clean unused cells
Run the cells below, without edits each month

### County Scorecard

In [3]:
# Define table names
voter_file_table = data_date_suffix + "_scorecard_nc"
acs_S0101_table = "S0101_us_counties_acs5y_" + acs_year

NameError: name 'data_date_suffix' is not defined

#### Query from BQ
This query:
* Summarizes the voter file by county, counting the number of registrants in a given birth year.
* Then, left joins the county estimates for the total number of 18 (and 19) yos from the ACS
    * The estimates for the total number of 18 (and 19) yos are derived from the raw estimates of 15-17 yos, **assuming a uniform distribution of population across 15, 16, and 17 year olds.**
    * Since the ACS trails by 2 years, the ACS estimate of 15-17yos is used as a proxy for the number of 17-19yos today. (This means we are intentionally *not* trying to count the college student or "group quarters" population in our denominator)

In [5]:
latest_45_yob

1979

In [6]:
# Define GCP project
project_id = "tcc-research"

# Define query, including variables and column names that adjust with time
sql = """
WITH voter_file_county AS(
SELECT
COUNTY_FIPS,
COUNTY_NAME,
COUNT(VOTER_ID) AS N_VOTERS,
COUNTIF(YEAR_OF_BIRTH = """ + str(latest_18_yob) + ") AS " + REG_YOB_LATE_YEAR + """,
COUNTIF(YEAR_OF_BIRTH = """ + str(earliest_18_yob) + ") AS " + REG_YOB_EARLY_YEAR + """,
COUNTIF(YEAR_OF_BIRTH <= """ + str(latest_45_yob) + ") AS " + REG_45_PLUS_YOB_LATE_YEAR + """,
COUNTIF(YEAR_OF_BIRTH <= """ + str(earliest_45_yob) + ") AS " + REG_45_PLUS_YOB_EARLY_YEAR + """,

FROM `tcc-research.me_production.""" + voter_file_table + """`
GROUP BY COUNTY_FIPS, COUNTY_NAME

), acs_county AS(
SELECT 
COUNTY_FIPS,
EST_15_TO_17_YO,
MOE_15_TO_17_YO,
EST_15_TO_17_YO / 3 AS """ + EST_18_YO_THIS_YEAR + """, 
EST_15_TO_17_YO * 2 / 3 AS """ + EST_18_AND_19_YO_THIS_YEAR + """,
EST_45_TO_49_YO + EST_50_TO_54_YO + EST_55_TO_59_YO + EST_55_TO_59_YO + EST_60_AND_OVER AS """ + EST_45_PLUS_YO_THIS_YEAR + """
FROM `tcc-research.acs_sources.""" + acs_S0101_table + """` 
WHERE STATE_FIPS = "23"

)

SELECT
voter_file_county.*,
acs_county.EST_15_TO_17_YO,
acs_county.MOE_15_TO_17_YO,
acs_county.""" + EST_18_YO_THIS_YEAR + """, 
acs_county.""" + EST_18_AND_19_YO_THIS_YEAR + """,
acs_county.""" + EST_45_PLUS_YO_THIS_YEAR + """
FROM voter_file_county LEFT JOIN acs_county ON voter_file_county.COUNTY_FIPS = acs_county.COUNTY_FIPS
"""
# Query
df = pandas_gbq.read_gbq(sql, project_id=project_id)

  record_batch = self.to_arrow(


In [7]:
# Preview
df.head()

Unnamed: 0,COUNTY_FIPS,COUNTY_NAME,N_VOTERS,REG_YOB_2006,REG_YOB_2005,REG_45_PLUS_YOB_1979,REG_45_PLUS_YOB_1978,EST_15_TO_17_YO,MOE_15_TO_17_YO,EST_18_YO_2024,EST_18_AND_19_YO_2024,EST_45_PLUS_YO_2024
0,1,Androscoggin,86503,282,549,51756,50395,4184,187,1394.666667,2789.333333,58699
1,3,Aroostook,52166,141,193,36757,35980,2226,28,742.0,1484.0,41710
2,5,Cumberland,275023,954,1574,162797,158559,10173,56,3391.0,6782.0,164114
3,7,Franklin,25064,64,103,16359,16024,938,28,312.666667,625.333333,17446
4,9,Hancock,50988,183,290,34950,34208,1726,68,575.333333,1150.666667,34844


In [8]:
df_reg_est = df.copy()

#### Metric 1: Estimated registration rate of 18 year olds as of a rolling date (i.e. latest month)
Ex: In March 2024, we consider the registration rate among those born between March 2nd 2005 and March 1st 2006


Notes:
- The ME voter file does *not* include full birth dates for registrants – only year of birth is included
- 18 yos as of a given date in the middle of the calendar year can have 2 potential years of birth. We refer to these as the "later 18 yo year of birth" (2006 for 2024 scorecards) and the "earlier 18 yo year of birth" (2005 for 2024 scorecards)
Estimation:

To estimate the number of 18 yos as of a rolling date, we "pro-rate" the number of registrants born in a given year based on the share of days in the year that could be 18yo birthdays. There are two steps:
- For the later 18 yo year of birth: Estimate the number of days that could be 18yo birthdays, Calculate the number of total potential birthdays in the year. Divide these numbers to get a 'share' of 18yo birthdays.
- For the earlier 18 yo year of birth: Estimate the number of days that could be 18yo birthdays. Calculate the number of total potential birthdays included in the voter file (just ~365). Calculate the ratio of these numbers.


Assumptions:
- Even distribution of birthdays across all days of year
- Uniform registration rates among older 18 yos, and younger 19yos
- Maine data includes only individuals who are currently 18. This was confirmed with the Manie SoS.

In [9]:
# Define column names
EST_REG_18_YO_AS_OF_ROLLING = 'EST_REG_18_YO_AS_OF_' + data_date_suffix # col name for estimated 18yo as of rolling date


# Birthday splits 18 yo vs. 19 yo in voter file (hypothetical)
earliest_bday_18 = as_of_data_date - pd.tseries.offsets.DateOffset(years=19) + pd.tseries.offsets.DateOffset(days=1) # earliest possible bday for 18yo

n_bdays_of_18_early = pd.Timestamp(str(earliest_18_yob) +"-12-31") - earliest_bday_18 # number of possible bdays of 18 yos in earlier 18 yo year of birth
n_total_days_of_early_year = pd.Timestamp(str(earliest_18_yob) +"-12-31") - pd.Timestamp(str(earliest_18_yob) +"-01-01") # number of total birthdays in earlier 18 yo year of birth (should be 365)

# Discounts
    # Share of 18yo in late year
share_18_late_year = 1 # No 17yo
    
    # Share of 18yo in early year
share_18_early_year = n_bdays_of_18_early / n_total_days_of_early_year

    # CHECKS
print("share of 18 yo in late year: {}".format(share_18_late_year))
print("share of 18 yo in early year: {}".format(share_18_early_year))

share of 18 yo in late year: 1
share of 18 yo in early year: 0.3021978021978022


In [10]:
# Calculate numerator (registrants)
df_reg_est[EST_REG_18_YO_AS_OF_ROLLING] = df_reg_est[REG_YOB_LATE_YEAR] * share_18_late_year  + df_reg_est[REG_YOB_EARLY_YEAR] * share_18_early_year

# Calculate estimated registration rate
EST_REG_RATE_18_YO_AS_OF_ROLLING = 'EST_REG_RATE_18_YO_AS_OF_' + data_date_suffix # col name for estimated 18yo as of rolling date
df_reg_est[EST_REG_RATE_18_YO_AS_OF_ROLLING] = df_reg_est[EST_REG_18_YO_AS_OF_ROLLING] / df_reg_est[EST_18_YO_THIS_YEAR] # estimated registered 18yo over ACS 18yo population estimate

In [11]:
df_reg_est = df_reg_est.sort_values('N_VOTERS', ascending=False)
df_reg_est

Unnamed: 0,COUNTY_FIPS,COUNTY_NAME,N_VOTERS,REG_YOB_2006,REG_YOB_2005,REG_45_PLUS_YOB_1979,REG_45_PLUS_YOB_1978,EST_15_TO_17_YO,MOE_15_TO_17_YO,EST_18_YO_2024,EST_18_AND_19_YO_2024,EST_45_PLUS_YO_2024,EST_REG_18_YO_AS_OF_20240911,EST_REG_RATE_18_YO_AS_OF_20240911
2,5,Cumberland,275023,954,1574,162797,158559,10173,56,3391.0,6782.0,164114,1429.659341,0.421604
15,31,York,181254,611,885,119905,117227,7237,96,2412.333333,4824.666667,123419,878.445055,0.364147
9,19,Penobscot,122397,330,655,73792,71995,5100,96,1700.0,3400.0,84390,527.93956,0.310553
5,11,Kennebec,103714,322,518,66399,64861,4251,87,1417.0,2834.0,71070,478.538462,0.337712
0,1,Androscoggin,86503,282,549,51756,50395,4184,187,1394.666667,2789.333333,58699,447.906593,0.321157
1,3,Aroostook,52166,141,193,36757,35980,2226,28,742.0,1484.0,41710,199.324176,0.268631
4,9,Hancock,50988,183,290,34950,34208,1726,68,575.333333,1150.666667,34844,270.637363,0.470401
8,17,Oxford,48222,137,217,33859,33116,2051,53,683.666667,1367.333333,35792,202.576923,0.296309
12,25,Somerset,39733,116,170,27704,27105,1748,61,582.666667,1165.333333,30490,167.373626,0.287255
13,27,Waldo,35096,102,166,23953,23394,1295,20,431.666667,863.333333,23415,152.164835,0.352505


#### Metric 2: Estimated registration rate of 18 year olds as of end-of-year
Ex: In March 2024, we consider the registration rate among those born anytime in 2006 – who will be 18 by December 31st 2024

This is a cumulative measure - it should grow throughout the year.

In [12]:
# Define column names
EST_REG_18_YO_AS_OF_EOY = 'EST_REG_18_YO_AS_OF_' + end_of_year_suffix # col name for estimated 18yo as of end of year


# Calculate scaling 18yo in late year
latest_bday_18 = as_of_data_date - pd.tseries.offsets.DateOffset(years=18) # earliest possible bday for 18yo


n_bdays_of_18_late = latest_bday_18 - pd.Timestamp(str(latest_18_yob) +"-01-01")  # number of possible bdays of 18 yos in earlier 18 yo year of birth
n_total_days_of_late_year = pd.Timestamp(str(latest_18_yob) +"-12-31") - pd.Timestamp(str(latest_18_yob) +"-01-01") # number of total birthdays in earlier 18 yo year of birth (should be 365)

share_18_late_year = n_bdays_of_18_late / n_total_days_of_late_year 

# Calculate numerator (registrants)
df_reg_est[EST_REG_18_YO_AS_OF_EOY] = df_reg_est[REG_YOB_LATE_YEAR] / share_18_late_year# number of registrants born in later 18yo year of birth

# Calculate estimated registration rate
EST_REG_RATE_18_YO_AS_OF_EOY = 'EST_REG_RATE_18_YO_AS_OF_' + end_of_year_suffix # col name for 18yo registration rate as of end of year
df_reg_est[EST_REG_RATE_18_YO_AS_OF_EOY] = df_reg_est[EST_REG_18_YO_AS_OF_EOY] / df_reg_est[EST_18_YO_THIS_YEAR] # estimated registered 18yo over ACS 18yo population estimate

    # CHECKS
print("share of 18 yo in late year: {}".format(share_18_late_year))

share of 18 yo in late year: 0.695054945054945


In [13]:
df_reg_est

Unnamed: 0,COUNTY_FIPS,COUNTY_NAME,N_VOTERS,REG_YOB_2006,REG_YOB_2005,REG_45_PLUS_YOB_1979,REG_45_PLUS_YOB_1978,EST_15_TO_17_YO,MOE_15_TO_17_YO,EST_18_YO_2024,EST_18_AND_19_YO_2024,EST_45_PLUS_YO_2024,EST_REG_18_YO_AS_OF_20240911,EST_REG_RATE_18_YO_AS_OF_20240911,EST_REG_18_YO_AS_OF_202412,EST_REG_RATE_18_YO_AS_OF_202412
2,5,Cumberland,275023,954,1574,162797,158559,10173,56,3391.0,6782.0,164114,1429.659341,0.421604,1372.55336,0.404764
15,31,York,181254,611,885,119905,117227,7237,96,2412.333333,4824.666667,123419,878.445055,0.364147,879.067194,0.364405
9,19,Penobscot,122397,330,655,73792,71995,5100,96,1700.0,3400.0,84390,527.93956,0.310553,474.782609,0.279284
5,11,Kennebec,103714,322,518,66399,64861,4251,87,1417.0,2834.0,71070,478.538462,0.337712,463.272727,0.326939
0,1,Androscoggin,86503,282,549,51756,50395,4184,187,1394.666667,2789.333333,58699,447.906593,0.321157,405.72332,0.290911
1,3,Aroostook,52166,141,193,36757,35980,2226,28,742.0,1484.0,41710,199.324176,0.268631,202.86166,0.273398
4,9,Hancock,50988,183,290,34950,34208,1726,68,575.333333,1150.666667,34844,270.637363,0.470401,263.288538,0.457628
8,17,Oxford,48222,137,217,33859,33116,2051,53,683.666667,1367.333333,35792,202.576923,0.296309,197.106719,0.288308
12,25,Somerset,39733,116,170,27704,27105,1748,61,582.666667,1165.333333,30490,167.373626,0.287255,166.893281,0.28643
13,27,Waldo,35096,102,166,23953,23394,1295,20,431.666667,863.333333,23415,152.164835,0.352505,146.750988,0.339964


#### Metric 3: Estimated registration rate of 18 and 19 year olds as of end of year

Ex: In March 2024, we consider the registration rate among those born anytime in 2006 or 2005 – who will be 18 or 19 by end of year.

This is a cumulative measure - it should grow throughout the year.

In [14]:
# Define column names
EST_REG_18_AND_19_YO_AS_OF_EOY = 'EST_REG_18_AND_19_YO_AS_OF_' + end_of_year_suffix # col name for estimated 18yo and 19yo as of end of year

# Calculate numerator (registrants)
df_reg_est[EST_REG_18_AND_19_YO_AS_OF_EOY] = (df_reg_est[REG_YOB_LATE_YEAR] / share_18_late_year)+ df_reg_est[REG_YOB_EARLY_YEAR] # number of registrants born in later or earlier 18yo year of birth

# Calculate estimated registration rate
EST_REG_RATE_18_AND_19_YO_AS_OF_EOY = 'EST_REG_RATE_18_AND_19_YO_AS_OF_' + end_of_year_suffix # col name for 18yo and 19yo registration rate as of end of year
df_reg_est[EST_REG_RATE_18_AND_19_YO_AS_OF_EOY] = df_reg_est[EST_REG_18_AND_19_YO_AS_OF_EOY] / df_reg_est[EST_18_AND_19_YO_THIS_YEAR] # estimated registered 18yo and 19yo over ACS 18yo and 19yo population estimate

In [15]:
df_reg_est.head()

Unnamed: 0,COUNTY_FIPS,COUNTY_NAME,N_VOTERS,REG_YOB_2006,REG_YOB_2005,REG_45_PLUS_YOB_1979,REG_45_PLUS_YOB_1978,EST_15_TO_17_YO,MOE_15_TO_17_YO,EST_18_YO_2024,EST_18_AND_19_YO_2024,EST_45_PLUS_YO_2024,EST_REG_18_YO_AS_OF_20240911,EST_REG_RATE_18_YO_AS_OF_20240911,EST_REG_18_YO_AS_OF_202412,EST_REG_RATE_18_YO_AS_OF_202412,EST_REG_18_AND_19_YO_AS_OF_202412,EST_REG_RATE_18_AND_19_YO_AS_OF_202412
2,5,Cumberland,275023,954,1574,162797,158559,10173,56,3391.0,6782.0,164114,1429.659341,0.421604,1372.55336,0.404764,2946.55336,0.434467
15,31,York,181254,611,885,119905,117227,7237,96,2412.333333,4824.666667,123419,878.445055,0.364147,879.067194,0.364405,1764.067194,0.365635
9,19,Penobscot,122397,330,655,73792,71995,5100,96,1700.0,3400.0,84390,527.93956,0.310553,474.782609,0.279284,1129.782609,0.332289
5,11,Kennebec,103714,322,518,66399,64861,4251,87,1417.0,2834.0,71070,478.538462,0.337712,463.272727,0.326939,981.272727,0.34625
0,1,Androscoggin,86503,282,549,51756,50395,4184,187,1394.666667,2789.333333,58699,447.906593,0.321157,405.72332,0.290911,954.72332,0.342277


#### Metric 4: Estimated registration rate of 45 year olds as of a rolling date (i.e. latest month)
To count the 45+ yo as of a rolling date, we need to discount some folks born in the latest year of 45 year olds, because they are still 44

Assumptions:
- Even distribution of birthdays across all days of year
- Uniform registration rates among older 44 yos, and younger 45yos

In [16]:
# Define column names
EST_REG_45_PLUS_YO_AS_OF_ROLLING = 'EST_REG_45_PLUS_YO_AS_OF_' + data_date_suffix # col name for estimated 18yo as of rolling date

# Birthday splits 44 yo vs. 45 yo in voter file (hypothetical)
earliest_bday_45 = as_of_data_date - pd.tseries.offsets.DateOffset(years=45) + pd.tseries.offsets.DateOffset(days=1) # earliest possible bday for 18yo

n_bdays_of_45 = pd.Timestamp(str(latest_45_yob) +"-12-31") - earliest_bday_45 # number of possible bdays of 18 yos in earlier 18 yo year of birth
n_total_days_of_late_year = pd.Timestamp(str(latest_45_yob) +"-12-31") - pd.Timestamp(str(latest_45_yob) +"-01-01") # number of total birthdays in earlier 18 yo year of birth (should be 365)


    # Share of 45yo in early year
share_45_late_year = n_bdays_of_45 / n_total_days_of_late_year

    # CHECKS
print("share of 45 yo in early year: {}".format(share_45_late_year))

share of 45 yo in early year: 0.3021978021978022


In [17]:
# Calculate numerator (registrants)
df_reg_est[EST_REG_45_PLUS_YO_AS_OF_ROLLING] = ((df_reg_est[REG_45_PLUS_YOB_LATE_YEAR] - df_reg_est[REG_45_PLUS_YOB_EARLY_YEAR]) * share_45_late_year)  + df_reg_est[REG_45_PLUS_YOB_EARLY_YEAR] 

# Calculate estimated registration rate
EST_REG_RATE_45_PLUS_YO_AS_OF_ROLLING = 'EST_REG_RATE_45_PLUS_YO_AS_OF_' + data_date_suffix # col name for estimated 18yo as of rolling date
df_reg_est[EST_REG_RATE_45_PLUS_YO_AS_OF_ROLLING] = df_reg_est[EST_REG_45_PLUS_YO_AS_OF_ROLLING] / df_reg_est[EST_45_PLUS_YO_THIS_YEAR] # estimated registered 18yo over ACS 18yo population estimate

In [18]:
df_reg_est = df_reg_est.sort_values('N_VOTERS', ascending=False)
df_reg_est.head()

Unnamed: 0,COUNTY_FIPS,COUNTY_NAME,N_VOTERS,REG_YOB_2006,REG_YOB_2005,REG_45_PLUS_YOB_1979,REG_45_PLUS_YOB_1978,EST_15_TO_17_YO,MOE_15_TO_17_YO,EST_18_YO_2024,EST_18_AND_19_YO_2024,EST_45_PLUS_YO_2024,EST_REG_18_YO_AS_OF_20240911,EST_REG_RATE_18_YO_AS_OF_20240911,EST_REG_18_YO_AS_OF_202412,EST_REG_RATE_18_YO_AS_OF_202412,EST_REG_18_AND_19_YO_AS_OF_202412,EST_REG_RATE_18_AND_19_YO_AS_OF_202412,EST_REG_45_PLUS_YO_AS_OF_20240911,EST_REG_RATE_45_PLUS_YO_AS_OF_20240911
2,5,Cumberland,275023,954,1574,162797,158559,10173,56,3391.0,6782.0,164114,1429.659341,0.421604,1372.55336,0.404764,2946.55336,0.434467,159839.714286,0.973955
15,31,York,181254,611,885,119905,117227,7237,96,2412.333333,4824.666667,123419,878.445055,0.364147,879.067194,0.364405,1764.067194,0.365635,118036.285714,0.956387
9,19,Penobscot,122397,330,655,73792,71995,5100,96,1700.0,3400.0,84390,527.93956,0.310553,474.782609,0.279284,1129.782609,0.332289,72538.049451,0.859557
5,11,Kennebec,103714,322,518,66399,64861,4251,87,1417.0,2834.0,71070,478.538462,0.337712,463.272727,0.326939,981.272727,0.34625,65325.78022,0.919175
0,1,Androscoggin,86503,282,549,51756,50395,4184,187,1394.666667,2789.333333,58699,447.906593,0.321157,405.72332,0.290911,954.72332,0.342277,50806.291209,0.865539


In [19]:
df_reg_est_cty = df_reg_est.copy()

#### Output
Write back to BQ

##### Wide

In [20]:
# Flag largest counties
df_reg_est['is_in_10_largest'] = np.where(df_reg_est.COUNTY_NAME.isin(df_reg_est.nlargest(10, columns='EST_18_YO_2024').COUNTY_NAME),1,0)

In [21]:
# write
project_id = "tcc-research"
table_id = 'me_output.' + data_date_suffix+ '_me_county_scorecard_output'

pandas_gbq.to_gbq(df_reg_est, table_id, project_id=project_id, if_exists='replace')

### Statewide Scorecard

In [22]:
# Define table names
voter_file_table = data_date_suffix + "_scorecard_me"
acs_S0101_table = "S0101_us_states_acs5y_" + acs_year

#### Query from BQ
This query:
* Summarizes the voter file for ME state, counting the number of registrants in a given birth year.
* Then, left joins the statewide estimates for the total number of 18 (and 19) yos from the ACS
    * The estimates for the total number of 18 (and 19) yos is derived from the raw estimates of 15-17 yos, **assuming a uniform distribution of population across 15, 16, and 17 year olds.**
    * Since the ACS trails by 2 years, the ACS estimate of 15-17yos is used as a proxy for the number of 17-19yos today. (This means we are intentionally *not* trying to count the college student or "group quarters" population in our denominator)

In [23]:
# Define GCP project
project_id = "tcc-research"

# Define query, including variables and column names that adjust with time
sql = """
WITH voter_file_me AS(
SELECT
STATE_FIPS,
COUNT(VOTER_ID) AS N_VOTERS,
COUNTIF(YEAR_OF_BIRTH = """ + str(latest_18_yob) + ") AS " + REG_YOB_LATE_YEAR + """,
COUNTIF(YEAR_OF_BIRTH = """ + str(earliest_18_yob) + ") AS " + REG_YOB_EARLY_YEAR + """,
COUNTIF(YEAR_OF_BIRTH <= """ + str(latest_45_yob) + ") AS " + REG_45_PLUS_YOB_LATE_YEAR + """,
COUNTIF(YEAR_OF_BIRTH <= """ + str(earliest_45_yob) + ") AS " + REG_45_PLUS_YOB_EARLY_YEAR + """,
FROM `tcc-research.me_production.""" + voter_file_table + """`
GROUP BY STATE_FIPS

), acs_me AS(
SELECT 
STATE_FIPS,
EST_15_TO_17_YO,
EST_15_TO_17_YO / 3 AS """ + EST_18_YO_THIS_YEAR + """, 
EST_15_TO_17_YO * 2 / 3 AS """ + EST_18_AND_19_YO_THIS_YEAR + """,
EST_45_TO_49_YO + EST_50_TO_54_YO + EST_55_TO_59_YO + EST_55_TO_59_YO + EST_60_AND_OVER AS """ + EST_45_PLUS_YO_THIS_YEAR + """

FROM `tcc-research.acs_sources.""" + acs_S0101_table + """` 
WHERE STATE_FIPS = "23"

)

SELECT
voter_file_me.*,
acs_me.EST_15_TO_17_YO,
acs_me.""" + EST_18_YO_THIS_YEAR + """, 
acs_me.""" + EST_18_AND_19_YO_THIS_YEAR + """,
acs_me.""" + EST_45_PLUS_YO_THIS_YEAR + """

FROM voter_file_me LEFT JOIN acs_me ON voter_file_me.STATE_FIPS = acs_me.STATE_FIPS
"""
# Query
df = pandas_gbq.read_gbq(sql, project_id=project_id)

  record_batch = self.to_arrow(


In [24]:
# Preview
df

Unnamed: 0,STATE_FIPS,N_VOTERS,REG_YOB_2006,REG_YOB_2005,REG_45_PLUS_YOB_1979,REG_45_PLUS_YOB_1978,EST_15_TO_17_YO,EST_18_YO_2024,EST_18_AND_19_YO_2024,EST_45_PLUS_YO_2024
0,23,1159292,3680,5993,746072,728664,46263,15421.0,30842.0,784949


In [25]:
df_reg_est = df.copy()

#### Metric 1: Estimated registration rate of 18 year olds as of a rolling date (i.e. latest month)
Ex: In March 2024, we consider the registration rate among those born between March 2nd 2005 and March 1st 2006


Notes:
- The ME voter file does *not* include full birth dates for registrants – only year of birth is included
- 18 yos as of a given date in the middle of the calendar year can have 2 potential years of birth. We refer to these as the "later 18 yo year of birth" (2006 for 2024 scorecards) and the "earlier 18 yo year of birth" (2005 for 2024 scorecards)
Estimation:

To estimate the number of 18 yos as of a rolling date, we "pro-rate" the number of registrants born in a given year based on the share of days in the year that could be 18yo birthdays. There are two steps:
- For the later 18 yo year of birth: Estimate the number of days that could be 18yo birthdays, Calculate the number of total potential birthdays in the year. Divide these numbers to get a 'share' of 18yo birthdays.
- For the earlier 18 yo year of birth: Estimate the number of days that could be 18yo birthdays. Calculate the number of total potential birthdays included in the voter file (just ~365). Calculate the ratio of these numbers.


Assumptions:
- Even distribution of birthdays across all days of year
- Uniform registration rates among older 18 yos, and younger 19yos
- Maine data includes only individuals who are currently 18. This was confirmed with the Manie SoS.

In [26]:
# Define column names
EST_REG_18_YO_AS_OF_ROLLING = 'EST_REG_18_YO_AS_OF_' + data_date_suffix # col name for estimated 18yo as of rolling date


# Birthday splits 18 yo vs. 19 yo in voter file (hypothetical)
earliest_bday_18 = as_of_data_date - pd.tseries.offsets.DateOffset(years=19) + pd.tseries.offsets.DateOffset(days=1) # earliest possible bday for 18yo

n_bdays_of_18_early = pd.Timestamp(str(earliest_18_yob) +"-12-31") - earliest_bday_18 # number of possible bdays of 18 yos in earlier 18 yo year of birth
n_total_days_of_early_year = pd.Timestamp(str(earliest_18_yob) +"-12-31") - pd.Timestamp(str(earliest_18_yob) +"-01-01") # number of total birthdays in earlier 18 yo year of birth (should be 365)

# Discounts
    # Share of 18yo in late year
share_18_late_year = 1 # No 17yo
    
    # Share of 18yo in early year
share_18_early_year = n_bdays_of_18_early / n_total_days_of_early_year

    # CHECKS
print("share of 18 yo in late year: {}".format(share_18_late_year))
print("share of 18 yo in early year: {}".format(share_18_early_year))

share of 18 yo in late year: 1
share of 18 yo in early year: 0.3021978021978022


In [27]:
# Calculate numerator (registrants)
df_reg_est[EST_REG_18_YO_AS_OF_ROLLING] = df_reg_est[REG_YOB_LATE_YEAR] * share_18_late_year  + df_reg_est[REG_YOB_EARLY_YEAR] * share_18_early_year

# Calculate estimated registration rate
EST_REG_RATE_18_YO_AS_OF_ROLLING = 'EST_REG_RATE_18_YO_AS_OF_' + data_date_suffix # col name for estimated 18yo as of rolling date
df_reg_est[EST_REG_RATE_18_YO_AS_OF_ROLLING] = df_reg_est[EST_REG_18_YO_AS_OF_ROLLING] / df_reg_est[EST_18_YO_THIS_YEAR] # estimated registered 18yo over ACS 18yo population estimate

In [28]:
df_reg_est = df_reg_est.sort_values('N_VOTERS', ascending=False)
df_reg_est.head()

Unnamed: 0,STATE_FIPS,N_VOTERS,REG_YOB_2006,REG_YOB_2005,REG_45_PLUS_YOB_1979,REG_45_PLUS_YOB_1978,EST_15_TO_17_YO,EST_18_YO_2024,EST_18_AND_19_YO_2024,EST_45_PLUS_YO_2024,EST_REG_18_YO_AS_OF_20240911,EST_REG_RATE_18_YO_AS_OF_20240911
0,23,1159292,3680,5993,746072,728664,46263,15421.0,30842.0,784949,5491.071429,0.356078


#### Metric 2: Estimated registration rate of 18 year olds as of end-of-year
Ex: In March 2024, we consider the registration rate among those born anytime in 2006 – who will be 18 by December 31st 2024

This is a cumulative measure - it should grow throughout the year.

In [29]:
# Define column names
EST_REG_18_YO_AS_OF_EOY = 'EST_REG_18_YO_AS_OF_' + end_of_year_suffix # col name for estimated 18yo as of end of year


# Calculate scaling 18yo in late year
latest_bday_18 = as_of_data_date - pd.tseries.offsets.DateOffset(years=18) # earliest possible bday for 18yo


n_bdays_of_18_late = latest_bday_18 - pd.Timestamp(str(latest_18_yob) +"-01-01")  # number of possible bdays of 18 yos in earlier 18 yo year of birth
n_total_days_of_late_year = pd.Timestamp(str(latest_18_yob) +"-12-31") - pd.Timestamp(str(latest_18_yob) +"-01-01") # number of total birthdays in earlier 18 yo year of birth (should be 365)

share_18_late_year = n_bdays_of_18_late / n_total_days_of_late_year 

# Calculate numerator (registrants)
df_reg_est[EST_REG_18_YO_AS_OF_EOY] = df_reg_est[REG_YOB_LATE_YEAR] / share_18_late_year# number of registrants born in later 18yo year of birth

# Calculate estimated registration rate
EST_REG_RATE_18_YO_AS_OF_EOY = 'EST_REG_RATE_18_YO_AS_OF_' + end_of_year_suffix # col name for 18yo registration rate as of end of year
df_reg_est[EST_REG_RATE_18_YO_AS_OF_EOY] = df_reg_est[EST_REG_18_YO_AS_OF_EOY] / df_reg_est[EST_18_YO_THIS_YEAR] # estimated registered 18yo over ACS 18yo population estimate

    # CHECKS
print("share of 18 yo in late year: {}".format(share_18_late_year))

share of 18 yo in late year: 0.695054945054945


In [30]:
df_reg_est.head()

Unnamed: 0,STATE_FIPS,N_VOTERS,REG_YOB_2006,REG_YOB_2005,REG_45_PLUS_YOB_1979,REG_45_PLUS_YOB_1978,EST_15_TO_17_YO,EST_18_YO_2024,EST_18_AND_19_YO_2024,EST_45_PLUS_YO_2024,EST_REG_18_YO_AS_OF_20240911,EST_REG_RATE_18_YO_AS_OF_20240911,EST_REG_18_YO_AS_OF_202412,EST_REG_RATE_18_YO_AS_OF_202412
0,23,1159292,3680,5993,746072,728664,46263,15421.0,30842.0,784949,5491.071429,0.356078,5294.545455,0.343333


#### Metric 3: Estimated registration rate of 18 and 19 year olds as of election

Ex: In March 2024, we consider the registration rate among those born anytime in 2006 or 2005 – who will be 18 or 19 by Nov 5th 2024

This is a cumulative measure - it should grow throughout the year.

In [31]:
# Define column names
EST_REG_18_AND_19_YO_AS_OF_EOY = 'EST_REG_18_AND_19_YO_AS_OF_' + end_of_year_suffix # col name for estimated 18yo and 19yo as of end of year

# Calculate numerator (registrants)
df_reg_est[EST_REG_18_AND_19_YO_AS_OF_EOY] = (df_reg_est[REG_YOB_LATE_YEAR] / share_18_late_year)+ df_reg_est[REG_YOB_EARLY_YEAR] # number of registrants born in later or earlier 18yo year of birth

# Calculate estimated registration rate
EST_REG_RATE_18_AND_19_YO_AS_OF_EOY = 'EST_REG_RATE_18_AND_19_YO_AS_OF_' + end_of_year_suffix # col name for 18yo and 19yo registration rate as of end of year
df_reg_est[EST_REG_RATE_18_AND_19_YO_AS_OF_EOY] = df_reg_est[EST_REG_18_AND_19_YO_AS_OF_EOY] / df_reg_est[EST_18_AND_19_YO_THIS_YEAR] # estimated registered 18yo and 19yo over ACS 18yo and 19yo population estimate

In [32]:
df_reg_est.head()

Unnamed: 0,STATE_FIPS,N_VOTERS,REG_YOB_2006,REG_YOB_2005,REG_45_PLUS_YOB_1979,REG_45_PLUS_YOB_1978,EST_15_TO_17_YO,EST_18_YO_2024,EST_18_AND_19_YO_2024,EST_45_PLUS_YO_2024,EST_REG_18_YO_AS_OF_20240911,EST_REG_RATE_18_YO_AS_OF_20240911,EST_REG_18_YO_AS_OF_202412,EST_REG_RATE_18_YO_AS_OF_202412,EST_REG_18_AND_19_YO_AS_OF_202412,EST_REG_RATE_18_AND_19_YO_AS_OF_202412
0,23,1159292,3680,5993,746072,728664,46263,15421.0,30842.0,784949,5491.071429,0.356078,5294.545455,0.343333,11287.545455,0.36598


#### Metric 4: Estimated registration rate of 45 year olds as of a rolling date (i.e. latest month)
To count the 45+ yo as of a rolling date, we need to discount some folks born in the latest year of 45 year olds, because they are still 44

Assumptions:
- Even distribution of birthdays across all days of year
- Uniform registration rates among older 44 yos, and younger 45yos

In [33]:
# Define column names
EST_REG_45_PLUS_YO_AS_OF_ROLLING = 'EST_REG_45_PLUS_YO_AS_OF_' + data_date_suffix # col name for estimated 18yo as of rolling date

# Birthday splits 44 yo vs. 45 yo in voter file (hypothetical)
earliest_bday_45 = as_of_data_date - pd.tseries.offsets.DateOffset(years=45) + pd.tseries.offsets.DateOffset(days=1) # earliest possible bday for 18yo

n_bdays_of_45 = pd.Timestamp(str(latest_45_yob) +"-12-31") - earliest_bday_45 # number of possible bdays of 18 yos in earlier 18 yo year of birth
n_total_days_of_late_year = pd.Timestamp(str(latest_45_yob) +"-12-31") - pd.Timestamp(str(latest_45_yob) +"-01-01") # number of total birthdays in earlier 18 yo year of birth (should be 365)


    # Share of 45yo in early year
share_45_late_year = n_bdays_of_45 / n_total_days_of_late_year

    # CHECKS
print("share of 45 yo in early year: {}".format(share_45_late_year))

share of 45 yo in early year: 0.3021978021978022


In [34]:
# Calculate numerator (registrants)
df_reg_est[EST_REG_45_PLUS_YO_AS_OF_ROLLING] = ((df_reg_est[REG_45_PLUS_YOB_LATE_YEAR] - df_reg_est[REG_45_PLUS_YOB_EARLY_YEAR]) * share_45_late_year)  + df_reg_est[REG_45_PLUS_YOB_EARLY_YEAR] 

# Calculate estimated registration rate
EST_REG_RATE_45_PLUS_YO_AS_OF_ROLLING = 'EST_REG_RATE_45_PLUS_YO_AS_OF_' + data_date_suffix # col name for estimated 18yo as of rolling date
df_reg_est[EST_REG_RATE_45_PLUS_YO_AS_OF_ROLLING] = df_reg_est[EST_REG_45_PLUS_YO_AS_OF_ROLLING] / df_reg_est[EST_45_PLUS_YO_THIS_YEAR] # estimated registered 18yo over ACS 18yo population estimate

In [35]:
df_reg_est = df_reg_est.sort_values('N_VOTERS', ascending=False)
df_reg_est.head()

Unnamed: 0,STATE_FIPS,N_VOTERS,REG_YOB_2006,REG_YOB_2005,REG_45_PLUS_YOB_1979,REG_45_PLUS_YOB_1978,EST_15_TO_17_YO,EST_18_YO_2024,EST_18_AND_19_YO_2024,EST_45_PLUS_YO_2024,EST_REG_18_YO_AS_OF_20240911,EST_REG_RATE_18_YO_AS_OF_20240911,EST_REG_18_YO_AS_OF_202412,EST_REG_RATE_18_YO_AS_OF_202412,EST_REG_18_AND_19_YO_AS_OF_202412,EST_REG_RATE_18_AND_19_YO_AS_OF_202412,EST_REG_45_PLUS_YO_AS_OF_20240911,EST_REG_RATE_45_PLUS_YO_AS_OF_20240911
0,23,1159292,3680,5993,746072,728664,46263,15421.0,30842.0,784949,5491.071429,0.356078,5294.545455,0.343333,11287.545455,0.36598,733924.659341,0.934997


#### Output
Write back to BQ

In [36]:
# write
project_id = "tcc-research"
table_id = 'me_output.' + data_date_suffix+ '_me_statewide_scorecard_output'

pandas_gbq.to_gbq(df_reg_est, table_id, project_id=project_id, if_exists='replace')

## Estimate per municipality
As we don't have a linked file with breakdown by municipality, we will attempt an estimation.

This operates under the assumption that the proportion of 18-year-olds registering in a municipality is directly proportional to the overall registration rate in that municipality relative to the entire county. 

Rationale: We assume that municipalities with higher total registration rates will also have a higher 18-year-old registration rate, scaled similarly to the overall registration trend in the county.

### Load in Necessary Files
For Maine, the municipalities are associated with a total voter count-- thus we can use them for estimation purposes.


In [37]:
date_string = '2024-09-11'

# Instantiates a client
storage_client = storage.Client(project='tcc-research')
bucket_name = "maine-data"
blobs = storage_client.list_blobs(bucket_name)

bucket= storage_client.bucket(bucket_name)

file_list = []
 # Note: The call returns a response only when the iterator is consumed.
for blob in blobs:
    if date_string in blob.name and blob.name.endswith('Report.txt'):
        file_list.append(blob.name)

In [38]:
# Initialize DataFrame for combined report data
df_report_combined = pd.DataFrame()

# List all files in the bucket
blobs = storage_client.list_blobs(bucket_name)
for section_file in blobs:
    # Look for 'Report.txt' instead of 'Voter.txt'
    if section_file.name.endswith('Report.txt'):
        print('Processing: ' + section_file.name)
        
        # Download the file content as bytes
        content = section_file.download_as_bytes()
        
        # Use BytesIO to create a file-like object
        file_obj = BytesIO(content)
        
        # Read the CSV (or text file) with the given encoding and delimiter
        try:
            df_report = pd.read_csv(file_obj, encoding='latin1', delimiter='|', dtype=str)
            df_report_combined = pd.concat([df_report_combined, df_report], ignore_index=True)
        except Exception as e:
            print(f"Error processing {section_file.name}: {str(e)}")

print("Finished processing all report files")
print(f"Total rows in combined DataFrame: {len(df_report_combined)}")


Processing: 2024-09-11 Registered & Enrolled/Registered And Enrolled Voters Report.txt
Finished processing all report files
Total rows in combined DataFrame: 743


In [39]:
df_report_combined.head()

Unnamed: 0,COUNTY,MUNICIPALITY,W/P,CG,SS,SR,CC,D,G,L,NL,R,U,TOTAL,Unnamed: 14
0,AND,AUBURN,1-1,2,20,89,5,668,110,41,59,445,633,1956,
1,AND,AUBURN,1-1,2,20,90,5,531,68,33,33,466,483,1614,
2,AND,AUBURN,2-1,2,20,89,5,254,26,21,23,137,285,746,
3,AND,AUBURN,2-1,2,20,90,5,1039,123,34,66,680,985,2927,
4,AND,AUBURN,3-1,2,20,88,5,719,95,19,29,690,747,2299,


In [40]:
df_report_combined.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 743 entries, 0 to 742
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   COUNTY        743 non-null    object
 1   MUNICIPALITY  743 non-null    object
 2   W/P           742 non-null    object
 3   CG            743 non-null    object
 4   SS            743 non-null    object
 5   SR            743 non-null    object
 6   CC            742 non-null    object
 7   D             743 non-null    object
 8   G             743 non-null    object
 9   L             743 non-null    object
 10  NL            743 non-null    object
 11  R             743 non-null    object
 12  U             743 non-null    object
 13  TOTAL         743 non-null    object
 14  Unnamed: 14   0 non-null      object
dtypes: object(15)
memory usage: 87.2+ KB


In [41]:
#select the three relevant columns
df_mun= df_report_combined[['COUNTY', 'MUNICIPALITY','TOTAL' ]]
df_mun['TOTAL'] = df_mun['TOTAL'].astype(np.int64)
df_mun.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_mun['TOTAL'] = df_mun['TOTAL'].astype(np.int64)


Unnamed: 0,COUNTY,MUNICIPALITY,TOTAL
0,AND,AUBURN,1956
1,AND,AUBURN,1614
2,AND,AUBURN,746
3,AND,AUBURN,2927
4,AND,AUBURN,2299


In [42]:
# Group by 'COUNTY' and 'MUNICIPALITY', then sum the 'TOTAL' column
df_mun_total = df_mun.groupby(['COUNTY', 'MUNICIPALITY'], as_index=False)['TOTAL'].sum()

#rename column name

# Display the first few rows of the grouped DataFrame
df_mun_total.head()

Unnamed: 0,COUNTY,MUNICIPALITY,TOTAL
0,AND,AUBURN,17061
1,AND,DURHAM,3840
2,AND,GREENE,3511
3,AND,LEEDS,1863
4,AND,LEWISTON,29655


In [43]:
# Dictionary to map three-letter abbreviations to full county names
county_name_mapping = {
    'AND': 'Androscoggin',
    'ARO': 'Aroostook',
    'CUM': 'Cumberland',
    'FRA': 'Franklin',
    'HAN': 'Hancock',
    'KEN': 'Kennebec',
    'KNO': 'Knox',
    'LIN': 'Lincoln',
    'OXF': 'Oxford',
    'PEN': 'Penobscot',
    'PIS': 'Piscataquis',
    'SAG': 'Sagadahoc',
    'SOM': 'Somerset',
    'WAL': 'Waldo',
    'WAS': 'Washington',
    'YOR': 'York'
}

# Apply the mapping to the municipality DataFrame
df_mun_total['COUNTY'] = df_mun['COUNTY'].map(county_name_mapping)
df_mun_total.head()


Unnamed: 0,COUNTY,MUNICIPALITY,TOTAL
0,Androscoggin,AUBURN,17061
1,Androscoggin,DURHAM,3840
2,Androscoggin,GREENE,3511
3,Androscoggin,LEEDS,1863
4,Androscoggin,LEWISTON,29655


In [44]:
# Rename 'COUNTY' to 'COUNTY_NAME' in the municipality DataFrame
df_mun_total = df_mun_total.rename(columns={'COUNTY': 'COUNTY_NAME'})
df_mun_total.head()

Unnamed: 0,COUNTY_NAME,MUNICIPALITY,TOTAL
0,Androscoggin,AUBURN,17061
1,Androscoggin,DURHAM,3840
2,Androscoggin,GREENE,3511
3,Androscoggin,LEEDS,1863
4,Androscoggin,LEWISTON,29655


In [45]:
# Merge municipality-level data (df_mun_total) with county-level registration estimates (df_reg_est)
df_merged = pd.merge(df_mun_total, df_reg_est_cty, on='COUNTY_NAME')

# Calculate the total registered people per county
county_total_reg = df_merged.groupby('COUNTY_NAME')['TOTAL'].sum().reset_index()
county_total_reg.columns = ['COUNTY_NAME', 'COUNTY_TOTAL_REGISTERED']

# Merge to bring in the total registered people per county
df_merged = pd.merge(df_merged, county_total_reg, on='COUNTY_NAME')

# Calculate the estimated number of registered 18-year-olds in each municipality
df_merged['EST_REGISTERED_18_YO_MUN'] = (df_merged['TOTAL'] / df_merged['COUNTY_TOTAL_REGISTERED']) * df_merged['EST_REG_18_YO_AS_OF_20240911']

# View the results
df_merged[['MUNICIPALITY', 'COUNTY_NAME', 'EST_REGISTERED_18_YO_MUN']]


Unnamed: 0,MUNICIPALITY,COUNTY_NAME,EST_REGISTERED_18_YO_MUN
0,AUBURN,Androscoggin,69.108436
1,DURHAM,Androscoggin,15.554563
2,GREENE,Androscoggin,14.221893
3,LEEDS,Androscoggin,7.546393
4,LEWISTON,Androscoggin,120.12254
...,...,...,...
499,SHAPLEIGH,Penobscot,6.247553
500,SOUTH BERWICK,Penobscot,17.128842
501,WATERBORO,Penobscot,16.066278
502,WELLS,Penobscot,29.44286


In [46]:

# Create a DataFrame with the ten most populous cities in Maine and their populations (all caps for municipality names)
data = {
    'MUNICIPALITY': ['PORTLAND', 'LEWISTON', 'BANGOR', 'SOUTH PORTLAND', 'AUBURN', 
                     'BIDDEFORD', 'SANFORD', 'SACO', 'WESTBROOK', 'AUGUSTA'],
    'POPULATION': [68408, 37121, 31753, 26498, 24061, 
                   22552, 21982, 20381, 20400, 18899]
}


df_maine_populous_cities = pd.DataFrame(data)
#cast to correct type
df_maine_populous_cities['POPULATION'] = df_maine_populous_cities['POPULATION'].astype('int64')
# Display the DataFrame
df_maine_populous_cities



Unnamed: 0,MUNICIPALITY,POPULATION
0,PORTLAND,68408
1,LEWISTON,37121
2,BANGOR,31753
3,SOUTH PORTLAND,26498
4,AUBURN,24061
5,BIDDEFORD,22552
6,SANFORD,21982
7,SACO,20381
8,WESTBROOK,20400
9,AUGUSTA,18899


In [47]:
# Merge df_maine_populous_cities with df_merged on 'MUNICIPALITY'
df_top = pd.merge(df_merged, df_maine_populous_cities, on='MUNICIPALITY', how='inner')

# Display the merged DataFrame
df_top

Unnamed: 0,COUNTY_NAME,MUNICIPALITY,TOTAL,COUNTY_FIPS,N_VOTERS,REG_YOB_2006,REG_YOB_2005,REG_45_PLUS_YOB_1979,REG_45_PLUS_YOB_1978,EST_15_TO_17_YO,...,EST_REG_RATE_18_YO_AS_OF_20240911,EST_REG_18_YO_AS_OF_202412,EST_REG_RATE_18_YO_AS_OF_202412,EST_REG_18_AND_19_YO_AS_OF_202412,EST_REG_RATE_18_AND_19_YO_AS_OF_202412,EST_REG_45_PLUS_YO_AS_OF_20240911,EST_REG_RATE_45_PLUS_YO_AS_OF_20240911,COUNTY_TOTAL_REGISTERED,EST_REGISTERED_18_YO_MUN,POPULATION
0,Androscoggin,AUBURN,17061,1,86503,282,549,51756,50395,4184,...,0.321157,405.72332,0.290911,954.72332,0.342277,50806.291209,0.865539,110576,69.108436,24061
1,Androscoggin,LEWISTON,29655,1,86503,282,549,51756,50395,4184,...,0.321157,405.72332,0.290911,954.72332,0.342277,50806.291209,0.865539,110576,120.12254,37121
2,Aroostook,PORTLAND,66136,3,52166,141,193,36757,35980,2226,...,0.268631,202.86166,0.273398,395.86166,0.266753,36214.807692,0.868252,333806,39.491512,68408
3,Aroostook,SOUTH PORTLAND,22749,3,52166,141,193,36757,35980,2226,...,0.268631,202.86166,0.273398,395.86166,0.266753,36214.807692,0.868252,333806,13.584015,26498
4,Aroostook,WESTBROOK,15342,3,52166,141,193,36757,35980,2226,...,0.268631,202.86166,0.273398,395.86166,0.266753,36214.807692,0.868252,333806,9.161104,20400
5,Cumberland,AUGUSTA,15346,5,275023,954,1574,162797,158559,10173,...,0.421604,1372.55336,0.404764,2946.55336,0.434467,159839.714286,0.973955,224087,97.906404,18899
6,Hancock,BANGOR,21940,9,50988,183,290,34950,34208,1726,...,0.470401,263.288538,0.457628,553.288538,0.480842,34432.230769,0.988182,82202,72.234054,31753
7,Penobscot,BIDDEFORD,16569,19,122397,330,655,73792,71995,5100,...,0.310553,474.782609,0.279284,1129.782609,0.332289,72538.049451,0.859557,198245,44.124344,22552
8,Penobscot,SACO,17053,19,122397,330,655,73792,71995,5100,...,0.310553,474.782609,0.279284,1129.782609,0.332289,72538.049451,0.859557,198245,45.413268,20381
9,Penobscot,SANFORD,15927,19,122397,330,655,73792,71995,5100,...,0.310553,474.782609,0.279284,1129.782609,0.332289,72538.049451,0.859557,198245,42.414655,21982


In [48]:
#create new column w municipal total rate
df_top['EST_MUNICIPAL_RATE_202409'] = df_top['TOTAL'] / df_top['POPULATION']

#calculate estimated rate
df_top['EST_MUN_REG_RATE_18_YO_AS_OF_202409']= df_top['EST_REG_RATE_18_YO_AS_OF_202409']*df_top['EST_MUNICIPAL_RATE_202409']

# fix municipal names
df_top['MUNICIPALITY'] = df_top['MUNICIPALITY'].str.title()
#sort for columns we care about
df_top= df_top[['MUNICIPALITY', 'COUNTY_NAME', 'EST_REGISTERED_18_YO_MUN', 'EST_MUNICIPAL_RATE_202409', 'EST_MUN_REG_RATE_18_YO_AS_OF_202409', 'POPULATION' ]]

KeyError: 'EST_REG_RATE_18_YO_AS_OF_202409'

In [246]:
df_top

Unnamed: 0,MUNICIPALITY,COUNTY_NAME,EST_REGISTERED_18_YO_MUN,EST_MUNICIPAL_RATE_202409,EST_MUN_REG_RATE_18_YO_AS_OF_202409,POPULATION
0,Auburn,Androscoggin,61.338728,0.709073,0.202121,24061
1,Lewiston,Androscoggin,106.61743,0.798874,0.227719,37121
2,Portland,Aroostook,34.502956,0.966788,0.226903,68408
3,South Portland,Aroostook,11.868086,0.858518,0.201492,26498
4,Westbrook,Aroostook,8.003876,0.752059,0.176506,20400
5,Augusta,Cumberland,86.239949,0.812001,0.301549,18899
6,Bangor,Hancock,63.51203,0.690958,0.285781,31753
7,Biddeford,Penobscot,39.199188,0.734702,0.202696,22552
8,Saco,Penobscot,40.344243,0.836711,0.230839,20381
9,Sanford,Penobscot,37.680335,0.724547,0.199895,21982


In [247]:
#write to BQ
project_id = "tcc-research"
table_id = 'me_output.' + data_date_suffix+ '_me_munest_scorecard_output'

pandas_gbq.to_gbq(df_top, table_id, project_id=project_id, if_exists='replace')

100%|██████████| 1/1 [00:00<?, ?it/s]
