Currently, each record in the table corresponds to one candidate and their votes in one county. You need to reformat the table so that each record corresponds to a single county, with fields showing the votes for each candidate in that election year.
You could do this with the [Pivot Table geoprocessing tool](https://pro.arcgis.com/en/pro-app/tool-reference/data-management/pivot-table.htm) or an Excel pivot table, but Python can make the process easier to automate and share.
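For comparison, the same long-to-wide reshape can be sketched in pandas with `DataFrame.pivot_table`. This is a minimal sketch, not the method used in this notebook: the sample rows are hypothetical, the column names (`county`, `FIPS`, `year`, `party`, `candidatevotes`) are the ones the loop in this notebook reads, and it assumes party names start with `d`, `r`, or `O`. Only the vote counts are reshaped, not the candidate names.

```python
import pandas as pd

# Hypothetical long-format sample: one row per candidate per county
long_df = pd.DataFrame({
    "county": ["Autauga"] * 3 + ["Baldwin"] * 3,
    "FIPS": [1001] * 3 + [1003] * 3,
    "year": [2016] * 6,
    "party": ["democrat", "republican", "Other"] * 2,
    "candidatevotes": [5936, 18172, 865, 18458, 72883, 3874],
})

# Same naming rule as the loop: first letter of the party name
long_df["party_key"] = long_df["party"].str.strip().str[0]

# Long -> wide: one row per county/FIPS/year, one vote column per party
wide_df = long_df.pivot_table(
    index=["county", "FIPS", "year"],
    columns="party_key",
    values="candidatevotes",
    aggfunc="sum",
)
wide_df.columns = [f"votes ({key})" for key in wide_df.columns]
wide_df = wide_df.reset_index()
print(wide_df)
```

A pivot keeps the reshape declarative; the explicit loop used below is easier to extend when several fields (candidate name and votes) must be spread per party.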
The animation below illustrates the steps in restructuring the table:

![reformat_table](img/reformat_table.gif "Reformat Table")

The following code cells perform these steps.


In [19]:
c = dask_df["county"].unique().compute()
# One empty dictionary per county name; the loop below fills these in
county = {name: {} for name in c}

A new Dask dataframe could be created with `dd.DataFrame()`, but Dask advises against using this class directly. Instead, it recommends functions such as
``dd.read_csv``, ``dd.read_parquet``, or ``dd.from_pandas``.
So, we will build the new dataframe with pandas and then convert it to a Dask dataframe.

In [49]:
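# Materialize the Dask dataframe once; calling compute() inside the loop
# would re-run the whole computation on every iteration.
# reset_index gives a unique 0..n-1 index (for example, partitions created
# by dd.read_csv each start their own index at 0).
df = dask_df.compute().reset_index(drop=True)

for row in range(len(df)):
    c = df.loc[row, "county"]
    f = df.loc[row, "FIPS"]
    y = df.loc[row, "year"]

    can_nm = df.loc[row, "candidate"]
    party = df.loc[row, "party"]
    votes = df.loc[row, "candidatevotes"]

    # Field-name suffix: first letter of the party name
    suffix = party.strip()[0]

    # County names repeat across states, so records are keyed by FIPS within each name
    if f not in county[c]:
        county[c][f] = {}

    county[c][f]["county"] = c
    county[c][f]["fips"] = f
    county[c][f][f"candidate({suffix})"] = can_nm
    county[c][f][f"votes ({suffix})"] = votes
    county[c][f]["year"] = y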

In [50]:
# Flatten the nested {county: {fips: record}} dictionary into a list of records
data = [record for records in county.values() for record in records.values()]

In [51]:
dt = pd.DataFrame(data)
df = dd.from_pandas(dt,npartitions=1)

In [52]:
df.head()

|   | county | fips | candidate(d) | votes (d) | candidate(r) | votes (r) | candidate(O) | votes (O) | year |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Autauga | 1001 | Hillary Clinton | 5936.0 | Donald Trump | 18172.0 | Other | 865.0 | 2016 |
| 1 | Baldwin | 1003 | Hillary Clinton | 18458.0 | Donald Trump | 72883.0 | Other | 3874.0 | 2016 |
| 2 | Baldwin | 13009 | Hillary Clinton | 7970.0 | Donald Trump | 7697.0 | Other | 449.0 | 2016 |
| 3 | Barbour | 1005 | Hillary Clinton | 4871.0 | Donald Trump | 5454.0 | Other | 144.0 | 2016 |
| 4 | Barbour | 54001 | Hillary Clinton | 1222.0 | Donald Trump | 4527.0 | Other | 305.0 | 2016 |


***

## Calculate additional columns: Feature Engineering

Here, you will use the values in the updated table to add columns of information, such as the total number of votes, each party's share of the vote, and the margins between parties. Each column is referred to as an attribute of the dataset.

##### Calculate an attribute for the total votes

In [53]:
df['votes_total'] = df['votes (d)'] + df['votes (r)'] + df['votes (O)']

In [54]:
df.head()

|   | county | fips | candidate(d) | votes (d) | candidate(r) | votes (r) | candidate(O) | votes (O) | year | votes_total |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Autauga | 1001 | Hillary Clinton | 5936.0 | Donald Trump | 18172.0 | Other | 865.0 | 2016 | 24973.0 |
| 1 | Baldwin | 1003 | Hillary Clinton | 18458.0 | Donald Trump | 72883.0 | Other | 3874.0 | 2016 | 95215.0 |
| 2 | Baldwin | 13009 | Hillary Clinton | 7970.0 | Donald Trump | 7697.0 | Other | 449.0 | 2016 | 16116.0 |
| 3 | Barbour | 1005 | Hillary Clinton | 4871.0 | Donald Trump | 5454.0 | Other | 144.0 | 2016 | 10469.0 |
| 4 | Barbour | 54001 | Hillary Clinton | 1222.0 | Donald Trump | 4527.0 | Other | 305.0 | 2016 | 6054.0 |
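One design choice worth checking: element-wise `+` returns NaN for a county if any of the three vote columns is missing, whereas `sum(axis=1)` skips missing values. Whether any county actually lacks, say, an "Other" value is an assumption, not something shown above. A minimal pandas sketch on hypothetical rows shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the second county has no "Other" votes recorded
votes = pd.DataFrame({
    "votes (d)": [5936.0, 1200.0],
    "votes (r)": [18172.0, 3400.0],
    "votes (O)": [865.0, np.nan],
})

# Element-wise '+' propagates NaN: a missing part makes the total missing
plus_total = votes["votes (d)"] + votes["votes (r)"] + votes["votes (O)"]

# sum(axis=1) skips NaN by default, so the missing part counts as zero
row_sum_total = votes[["votes (d)", "votes (r)", "votes (O)"]].sum(axis=1)

print(plus_total.tolist())
print(row_sum_total.tolist())
```

Which behavior is right depends on whether a missing value means "zero votes" or "unknown"; the `+` form at least makes gaps visible as nulls.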


##### Calculate additional attributes

In [55]:
# Calculate voter share attributes
df['voter_share_major_party'] = (df['votes (d)'] + df['votes (r)']) / df['votes_total']
df['voter_share_dem'] = df['votes (d)'] / df['votes_total']
df['voter_share_rep'] = df['votes (r)'] / df['votes_total']
df['voter_share_other'] = df['votes (O)'] / df['votes_total']

# Calculate raw difference attributes
df['rawdiff_dem_vs_rep'] = df['votes (d)'] - df['votes (r)']
df['rawdiff_rep_vs_dem'] = df['votes (r)'] - df['votes (d)']
df['rawdiff_dem_vs_other'] = df['votes (d)'] - df['votes (O)']
df['rawdiff_rep_vs_other'] = df['votes (r)'] - df['votes (O)']
df['rawdiff_other_vs_dem'] = df['votes (O)'] - df['votes (d)']
df['rawdiff_other_vs_rep'] = df['votes (O)'] - df['votes (r)']

# Calculate percent difference attributes
df['pctdiff_dem_vs_rep'] = (df['votes (d)'] - df['votes (r)']) / df['votes_total']
df['pctdiff_rep_vs_dem'] = (df['votes (r)'] - df['votes (d)']) / df['votes_total']
df['pctdiff_dem_vs_other'] = (df['votes (d)'] - df['votes (O)']) / df['votes_total']
df['pctdiff_rep_vs_other'] = (df['votes (r)'] - df['votes (O)']) / df['votes_total']
df['pctdiff_other_vs_dem'] = (df['votes (O)'] - df['votes (d)']) / df['votes_total']
df['pctdiff_other_vs_rep'] = (df['votes (O)'] - df['votes (r)']) / df['votes_total']

df.head()

Unnamed: 0,county,fips,candidate(d),votes (d),candidate(r),votes (r),candidate(O),votes (O),year,votes_total,voter_share_major_party,voter_share_dem,voter_share_rep,voter_share_other,rawdiff_dem_vs_rep,rawdiff_rep_vs_dem,rawdiff_dem_vs_other,rawdiff_rep_vs_other,rawdiff_other_vs_dem,rawdiff_other_vs_rep,pctdiff_dem_vs_rep,pctdiff_rep_vs_dem,pctdiff_dem_vs_other,pctdiff_rep_vs_other,pctdiff_other_vs_dem,pctdiff_other_vs_rep
0,Autauga,1001,Hillary Clinton,5936.0,Donald Trump,18172.0,Other,865.0,2016,24973.0,0.965363,0.237697,0.727666,0.034637,-12236.0,12236.0,5071.0,17307.0,-5071.0,-17307.0,-0.489969,0.489969,0.203059,0.693028,-0.203059,-0.693028
1,Baldwin,1003,Hillary Clinton,18458.0,Donald Trump,72883.0,Other,3874.0,2016,95215.0,0.959313,0.193856,0.765457,0.040687,-54425.0,54425.0,14584.0,69009.0,-14584.0,-69009.0,-0.571601,0.571601,0.153169,0.72477,-0.153169,-0.72477
2,Baldwin,13009,Hillary Clinton,7970.0,Donald Trump,7697.0,Other,449.0,2016,16116.0,0.972139,0.49454,0.4776,0.027861,273.0,-273.0,7521.0,7248.0,-7521.0,-7248.0,0.01694,-0.01694,0.466679,0.449739,-0.466679,-0.449739
3,Barbour,1005,Hillary Clinton,4871.0,Donald Trump,5454.0,Other,144.0,2016,10469.0,0.986245,0.465278,0.520967,0.013755,-583.0,583.0,4727.0,5310.0,-4727.0,-5310.0,-0.055688,0.055688,0.451524,0.507212,-0.451524,-0.507212
4,Barbour,54001,Hillary Clinton,1222.0,Donald Trump,4527.0,Other,305.0,2016,6054.0,0.94962,0.20185,0.74777,0.05038,-3305.0,3305.0,917.0,4222.0,-917.0,-4222.0,-0.54592,0.54592,0.15147,0.69739,-0.15147,-0.69739
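The six raw-difference and six percent-difference attributes follow a single pattern, so they can also be generated by looping over ordered party pairs. A minimal pandas sketch on two sample rows from the table above; the `parties` mapping is an illustrative assumption, and the resulting column order differs from the cell above because each raw column is followed by its percent column:

```python
from itertools import permutations

import pandas as pd

# Two sample rows (Autauga and Baldwin, GA) from the table above
sample_df = pd.DataFrame({
    "votes (d)": [5936.0, 7970.0],
    "votes (r)": [18172.0, 7697.0],
    "votes (O)": [865.0, 449.0],
})
sample_df["votes_total"] = sample_df[["votes (d)", "votes (r)", "votes (O)"]].sum(axis=1)

# Short names used in the attribute names -> vote columns
parties = {"dem": "votes (d)", "rep": "votes (r)", "other": "votes (O)"}

# Every ordered pair (a, b) with a != b yields rawdiff_a_vs_b and pctdiff_a_vs_b
for a, b in permutations(parties, 2):
    raw = sample_df[parties[a]] - sample_df[parties[b]]
    sample_df[f"rawdiff_{a}_vs_{b}"] = raw
    sample_df[f"pctdiff_{a}_vs_{b}"] = raw / sample_df["votes_total"]

print(sample_df.round(6))
```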


***

## Geoenable the data

You will eventually use this data in a spatial analysis. This means that the data needs to include location information to determine where the data is located on a map. You will geoenable the data, or add location to the data, using existing geoenabled county data.

##### Define the ArcGIS Pro project, database, and existing geoenabled data

In [56]:
# Create variables that represent the ArcGIS Pro project and map
aprx = arcpy.mp.ArcGISProject("CURRENT")
mp = aprx.listMaps('Data Engineering')[0]

# Create a variable that represents the default file geodatabase
fgdb = r"Data Engineering and Visualization.gdb"
aprx.defaultGeodatabase = fgdb
arcpy.env.workspace = fgdb

There are various resources that you can use to find geoenabled data. [ArcGIS Living Atlas of the World](https://livingatlas.arcgis.com) is an authoritative source provided by Esri. Each record in your election data represents information for a county, so you will use a Living Atlas dataset that represents county geometry. This dataset has been downloaded and added to your project.

In [57]:
# Create a variable that represents the county geometry dataset
counties_fc_name = "Counties_2016_VotingAgePopulation"
counties_fc = os.path.join(fgdb, counties_fc_name)

**Note: Executing the following cell may take a few minutes.**

In [58]:
# Load the dataset into a spatially-enabled dataframe
counties_df = pd.DataFrame.spatial.from_featureclass(counties_fc)

The county geometry dataset includes many attributes. You will simplify the dataframe so that it includes only the attributes you need. The `Total_cvap_est` attribute represents the number of people in each county who were of voting age in 2016.

In [59]:
# Modify the dataframe to only include the attributes that are needed
counties_df = counties_df[['OBJECTID', 'GEOID', 'GEONAME',
                           'Total_cvap_est',
                           'SHAPE', 'Shape__Area', 'Shape__Length']]

counties_df.head()

|   | OBJECTID | GEOID | GEONAME | Total_cvap_est | SHAPE | Shape__Area | Shape__Length |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1001 | Autauga County, Alabama | 40690 | {'rings': [[[-9619465, 3856529.0001000017], [-... | 2208654000.0 | 249886.4 |
| 1 | 2 | 1003 | Baldwin County, Alabama | 151770 | {'rings': [[[-9746859, 3539643.0001000017], [-... | 5671048000.0 | 1655940.0 |
| 2 | 3 | 1005 | Barbour County, Alabama | 20375 | {'rings': [[[-9468394, 3771591.0001000017], [-... | 3257902000.0 | 320896.4 |
| 3 | 4 | 1007 | Bibb County, Alabama | 17590 | {'rings': [[[-9692114, 3928124.0001000017], [-... | 2311999000.0 | 227918.4 |
| 4 | 5 | 1009 | Blount County, Alabama | 42430 | {'rings': [[[-9623907, 4063676.0001000017], [-... | 2456909000.0 | 292642.9 |


***

## Join the data

You have a dataframe with election data ('df') and a spatially-enabled dataframe of the county geometry data ('counties_df'). You will merge these datasets into one. 

In [60]:
type(df), type(counties_df)

(<class 'dask.dataframe.core.DataFrame'>, <class 'pandas.core.frame.DataFrame'>)

In [61]:
df['fips'].compute().nunique(), counties_df['GEOID'].nunique() 

(3155, 3220)
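The two counts differ (3,155 election FIPS codes versus 3,220 county GEOIDs), so a left join will leave some election records without county geometry. A minimal pandas sketch of listing those records before joining, using `merge(..., indicator=True)` on hypothetical stand-in tables:

```python
import pandas as pd

# Hypothetical stand-ins for the election table and the county geometry table
election = pd.DataFrame({"fips": [1001, 1003, 2701], "votes_total": [24973.0, 95215.0, 6638.0]})
counties = pd.DataFrame({"fips": [1001, 1003, 1005], "Total_cvap_est": [40690, 151770, 20375]})

# indicator=True adds a '_merge' column: 'both', 'left_only', or 'right_only'
check = pd.merge(election, counties, how="left", on="fips", indicator=True)

# Election records that found no matching county geometry
unmatched = check.loc[check["_merge"] == "left_only", "fips"].tolist()
print(unmatched)
```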

In [62]:
# Rename GEOID to fips so both dataframes share the join key
counties_df = counties_df.rename(columns={'GEOID': 'fips'})
counties_df.head()

|   | OBJECTID | fips | GEONAME | Total_cvap_est | SHAPE | Shape__Area | Shape__Length |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1001 | Autauga County, Alabama | 40690 | {'rings': [[[-9619465, 3856529.0001000017], [-... | 2208654000.0 | 249886.4 |
| 1 | 2 | 1003 | Baldwin County, Alabama | 151770 | {'rings': [[[-9746859, 3539643.0001000017], [-... | 5671048000.0 | 1655940.0 |
| 2 | 3 | 1005 | Barbour County, Alabama | 20375 | {'rings': [[[-9468394, 3771591.0001000017], [-... | 3257902000.0 | 320896.4 |
| 3 | 4 | 1007 | Bibb County, Alabama | 17590 | {'rings': [[[-9692114, 3928124.0001000017], [-... | 2311999000.0 | 227918.4 |
| 4 | 5 | 1009 | Blount County, Alabama | 42430 | {'rings': [[[-9623907, 4063676.0001000017], [-... | 2456909000.0 | 292642.9 |
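A join on `fips` only matches rows whose keys have the same type and format: an integer `1001` does not equal the string `"01001"`. If the two tables store FIPS codes differently, one option is to normalize both to five-character, zero-padded strings before merging. A minimal pandas sketch with hypothetical sample data; `normalize_fips` is an illustrative helper, not part of pandas or the ArcGIS API:

```python
import pandas as pd


def normalize_fips(series):
    """Return FIPS codes as 5-character, zero-padded strings (hypothetical helper)."""
    # to_numeric accepts ints, floats such as 1001.0, and numeric strings;
    # values it cannot parse become missing instead of raising
    codes = pd.to_numeric(series, errors="coerce").astype("Int64")
    return codes.astype("string").str.zfill(5)


# Hypothetical keys: integers on one side, zero-padded text on the other
election = pd.DataFrame({"fips": [1001, 8111, 48301]})
counties = pd.DataFrame({
    "fips": ["01001", "08111", "48301"],
    "GEONAME": ["Autauga County, Alabama", "San Juan County, Colorado", "Loving County, Texas"],
})

election["fips"] = normalize_fips(election["fips"])
counties["fips"] = normalize_fips(counties["fips"])

joined = election.merge(counties, how="left", on="fips")
print(joined)
```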


In [63]:
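# Both inputs are pandas dataframes (df.compute() returns pandas), so use a pandas merge.
# A left join keeps every election record, even those without matching county geometry.
geo_df = pd.merge(df.compute(), counties_df, how='left', on='fips')
geo_df.head()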

Unnamed: 0,county,fips,candidate(d),votes (d),candidate(r),votes (r),candidate(O),votes (O),year,votes_total,voter_share_major_party,voter_share_dem,voter_share_rep,voter_share_other,rawdiff_dem_vs_rep,rawdiff_rep_vs_dem,rawdiff_dem_vs_other,rawdiff_rep_vs_other,rawdiff_other_vs_dem,rawdiff_other_vs_rep,pctdiff_dem_vs_rep,pctdiff_rep_vs_dem,pctdiff_dem_vs_other,pctdiff_rep_vs_other,pctdiff_other_vs_dem,pctdiff_other_vs_rep,OBJECTID,GEONAME,Total_cvap_est,SHAPE,Shape__Area,Shape__Length
0,Autauga,1001,Hillary Clinton,5936.0,Donald Trump,18172.0,Other,865.0,2016,24973.0,0.965363,0.237697,0.727666,0.034637,-12236.0,12236.0,5071.0,17307.0,-5071.0,-17307.0,-0.489969,0.489969,0.203059,0.693028,-0.203059,-0.693028,1.0,"Autauga County, Alabama",40690.0,"{'rings': [[[-9619465, 3856529.0001000017], [-...",2208654000.0,249886.4
1,Baldwin,1003,Hillary Clinton,18458.0,Donald Trump,72883.0,Other,3874.0,2016,95215.0,0.959313,0.193856,0.765457,0.040687,-54425.0,54425.0,14584.0,69009.0,-14584.0,-69009.0,-0.571601,0.571601,0.153169,0.72477,-0.153169,-0.72477,2.0,"Baldwin County, Alabama",151770.0,"{'rings': [[[-9746859, 3539643.0001000017], [-...",5671048000.0,1655940.0
2,Baldwin,13009,Hillary Clinton,7970.0,Donald Trump,7697.0,Other,449.0,2016,16116.0,0.972139,0.49454,0.4776,0.027861,273.0,-273.0,7521.0,7248.0,-7521.0,-7248.0,0.01694,-0.01694,0.466679,0.449739,-0.466679,-0.449739,392.0,"Baldwin County, Georgia",36225.0,"{'rings': [[[-9270032, 3920184.0001000017], [-...",992118800.0,189429.4
3,Barbour,1005,Hillary Clinton,4871.0,Donald Trump,5454.0,Other,144.0,2016,10469.0,0.986245,0.465278,0.520967,0.013755,-583.0,583.0,4727.0,5310.0,-4727.0,-5310.0,-0.055688,0.055688,0.451524,0.507212,-0.451524,-0.507212,3.0,"Barbour County, Alabama",20375.0,"{'rings': [[[-9468394, 3771591.0001000017], [-...",3257902000.0,320896.4
4,Barbour,54001,Hillary Clinton,1222.0,Donald Trump,4527.0,Other,305.0,2016,6054.0,0.94962,0.20185,0.74777,0.05038,-3305.0,3305.0,917.0,4222.0,-917.0,-4222.0,-0.54592,0.54592,0.15147,0.69739,-0.15147,-0.69739,2993.0,"Barbour County, West Virginia",13410.0,"{'rings': [[[-8893931, 4764677.000100002], [-8...",1477859000.0,190122.8


## Query and calculate attributes

Because you have the voting age population for 2016, you can now calculate voter participation (voter turnout) in each county for 2016. The dataframe includes records from 2010-2016 but has voting age population only for 2016, so you will create a subset dataframe for 2016 before calculating voter turnout.

In [65]:
# Select the 2016 records and copy them into a new dataframe.
# Comparing as text works whether 'year' is stored as a number or as a string.
data_2016_df = geo_df.loc[geo_df['year'].astype(str) == '2016'].copy()
data_2016_df.head()

Unnamed: 0,county,fips,candidate(d),votes (d),candidate(r),votes (r),candidate(O),votes (O),year,votes_total,voter_share_major_party,voter_share_dem,voter_share_rep,voter_share_other,rawdiff_dem_vs_rep,rawdiff_rep_vs_dem,rawdiff_dem_vs_other,rawdiff_rep_vs_other,rawdiff_other_vs_dem,rawdiff_other_vs_rep,pctdiff_dem_vs_rep,pctdiff_rep_vs_dem,pctdiff_dem_vs_other,pctdiff_rep_vs_other,pctdiff_other_vs_dem,pctdiff_other_vs_rep,OBJECTID,GEONAME,Total_cvap_est,SHAPE,Shape__Area,Shape__Length
0,Autauga,1001,Hillary Clinton,5936.0,Donald Trump,18172.0,Other,865.0,2016,24973.0,0.965363,0.237697,0.727666,0.034637,-12236.0,12236.0,5071.0,17307.0,-5071.0,-17307.0,-0.489969,0.489969,0.203059,0.693028,-0.203059,-0.693028,1.0,"Autauga County, Alabama",40690.0,"{'rings': [[[-9619465, 3856529.0001000017], [-...",2208654000.0,249886.4
1,Baldwin,1003,Hillary Clinton,18458.0,Donald Trump,72883.0,Other,3874.0,2016,95215.0,0.959313,0.193856,0.765457,0.040687,-54425.0,54425.0,14584.0,69009.0,-14584.0,-69009.0,-0.571601,0.571601,0.153169,0.72477,-0.153169,-0.72477,2.0,"Baldwin County, Alabama",151770.0,"{'rings': [[[-9746859, 3539643.0001000017], [-...",5671048000.0,1655940.0
2,Baldwin,13009,Hillary Clinton,7970.0,Donald Trump,7697.0,Other,449.0,2016,16116.0,0.972139,0.49454,0.4776,0.027861,273.0,-273.0,7521.0,7248.0,-7521.0,-7248.0,0.01694,-0.01694,0.466679,0.449739,-0.466679,-0.449739,392.0,"Baldwin County, Georgia",36225.0,"{'rings': [[[-9270032, 3920184.0001000017], [-...",992118800.0,189429.4
3,Barbour,1005,Hillary Clinton,4871.0,Donald Trump,5454.0,Other,144.0,2016,10469.0,0.986245,0.465278,0.520967,0.013755,-583.0,583.0,4727.0,5310.0,-4727.0,-5310.0,-0.055688,0.055688,0.451524,0.507212,-0.451524,-0.507212,3.0,"Barbour County, Alabama",20375.0,"{'rings': [[[-9468394, 3771591.0001000017], [-...",3257902000.0,320896.4
4,Barbour,54001,Hillary Clinton,1222.0,Donald Trump,4527.0,Other,305.0,2016,6054.0,0.94962,0.20185,0.74777,0.05038,-3305.0,3305.0,917.0,4222.0,-917.0,-4222.0,-0.54592,0.54592,0.15147,0.69739,-0.15147,-0.69739,2993.0,"Barbour County, West Virginia",13410.0,"{'rings': [[[-8893931, 4764677.000100002], [-8...",1477859000.0,190122.8


You will calculate new voter turnout fields with column arithmetic. Because `geo_df` was produced by merging two pandas dataframes, `data_2016_df` is a pandas dataframe, and each operation is applied element-wise to every row.

In [68]:
# Calculate voter turnout attributes
data_2016_df['voter_turnout'] = data_2016_df['votes_total'] / data_2016_df['Total_cvap_est']
data_2016_df['voter_turnout_majparty'] = (data_2016_df['votes (d)']+data_2016_df['votes (r)']) / data_2016_df['Total_cvap_est']
data_2016_df['voter_turnout_dem'] = data_2016_df['votes (d)'] / data_2016_df['Total_cvap_est']
data_2016_df['voter_turnout_gop'] = data_2016_df['votes (r)'] / data_2016_df['Total_cvap_est']
data_2016_df['voter_turnout_other'] = data_2016_df['votes (O)'] / data_2016_df['Total_cvap_est']
data_2016_df.head()

Unnamed: 0,county,fips,candidate(d),votes (d),candidate(r),votes (r),candidate(O),votes (O),year,votes_total,voter_share_major_party,voter_share_dem,voter_share_rep,voter_share_other,rawdiff_dem_vs_rep,rawdiff_rep_vs_dem,rawdiff_dem_vs_other,rawdiff_rep_vs_other,rawdiff_other_vs_dem,rawdiff_other_vs_rep,pctdiff_dem_vs_rep,pctdiff_rep_vs_dem,pctdiff_dem_vs_other,pctdiff_rep_vs_other,pctdiff_other_vs_dem,pctdiff_other_vs_rep,OBJECTID,GEONAME,Total_cvap_est,SHAPE,Shape__Area,Shape__Length,voter_turnout,voter_turnout_majparty,voter_turnout_dem,voter_turnout_gop,voter_turnout_other
0,Autauga,1001,Hillary Clinton,5936.0,Donald Trump,18172.0,Other,865.0,2016,24973.0,0.965363,0.237697,0.727666,0.034637,-12236.0,12236.0,5071.0,17307.0,-5071.0,-17307.0,-0.489969,0.489969,0.203059,0.693028,-0.203059,-0.693028,1.0,"Autauga County, Alabama",40690.0,"{'rings': [[[-9619465, 3856529.0001000017], [-...",2208654000.0,249886.4,0.613738,0.59248,0.145884,0.446596,0.021258
1,Baldwin,1003,Hillary Clinton,18458.0,Donald Trump,72883.0,Other,3874.0,2016,95215.0,0.959313,0.193856,0.765457,0.040687,-54425.0,54425.0,14584.0,69009.0,-14584.0,-69009.0,-0.571601,0.571601,0.153169,0.72477,-0.153169,-0.72477,2.0,"Baldwin County, Alabama",151770.0,"{'rings': [[[-9746859, 3539643.0001000017], [-...",5671048000.0,1655940.0,0.627364,0.601838,0.121618,0.48022,0.025525
2,Baldwin,13009,Hillary Clinton,7970.0,Donald Trump,7697.0,Other,449.0,2016,16116.0,0.972139,0.49454,0.4776,0.027861,273.0,-273.0,7521.0,7248.0,-7521.0,-7248.0,0.01694,-0.01694,0.466679,0.449739,-0.466679,-0.449739,392.0,"Baldwin County, Georgia",36225.0,"{'rings': [[[-9270032, 3920184.0001000017], [-...",992118800.0,189429.4,0.444886,0.432491,0.220014,0.212478,0.012395
3,Barbour,1005,Hillary Clinton,4871.0,Donald Trump,5454.0,Other,144.0,2016,10469.0,0.986245,0.465278,0.520967,0.013755,-583.0,583.0,4727.0,5310.0,-4727.0,-5310.0,-0.055688,0.055688,0.451524,0.507212,-0.451524,-0.507212,3.0,"Barbour County, Alabama",20375.0,"{'rings': [[[-9468394, 3771591.0001000017], [-...",3257902000.0,320896.4,0.513816,0.506748,0.239067,0.267681,0.007067
4,Barbour,54001,Hillary Clinton,1222.0,Donald Trump,4527.0,Other,305.0,2016,6054.0,0.94962,0.20185,0.74777,0.05038,-3305.0,3305.0,917.0,4222.0,-917.0,-4222.0,-0.54592,0.54592,0.15147,0.69739,-0.15147,-0.69739,2993.0,"Barbour County, West Virginia",13410.0,"{'rings': [[[-8893931, 4764677.000100002], [-8...",1477859000.0,190122.8,0.451454,0.42871,0.091126,0.337584,0.022744
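This notebook recalculates the same five turnout fields after correcting the population values in the validation step. Wrapping the formulas in a function keeps both calculations identical. A minimal sketch; `add_turnout_columns` is a hypothetical helper using this notebook's column names, not part of any library:

```python
import pandas as pd


def add_turnout_columns(frame, pop_col="Total_cvap_est"):
    """Add the five voter turnout columns to `frame` in place and return it."""
    pop = frame[pop_col]
    frame["voter_turnout"] = frame["votes_total"] / pop
    frame["voter_turnout_majparty"] = (frame["votes (d)"] + frame["votes (r)"]) / pop
    frame["voter_turnout_dem"] = frame["votes (d)"] / pop
    frame["voter_turnout_gop"] = frame["votes (r)"] / pop
    frame["voter_turnout_other"] = frame["votes (O)"] / pop
    return frame


# Sample row: Autauga County values from the tables above
sample = pd.DataFrame({
    "votes (d)": [5936.0], "votes (r)": [18172.0], "votes (O)": [865.0],
    "votes_total": [24973.0], "Total_cvap_est": [40690.0],
})
add_turnout_columns(sample)
print(sample.round(6))
```

In the notebook, the two turnout cells would each become a single call such as `add_turnout_columns(data_2016_df)`.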


***

## Validate the data

Before continuing with other data preparation, you should confirm that the output data has been successfully created. 

First, you will validate the values for voter turnout. You will remove null values, and because these values represent a fraction (total votes divided by voting age population), you will confirm that the values range between 0 and 1.

In [69]:
# Check for null values
data_2016_df.loc[data_2016_df['voter_turnout'].isnull()]

Unnamed: 0,county,fips,candidate(d),votes (d),candidate(r),votes (r),candidate(O),votes (O),year,votes_total,voter_share_major_party,voter_share_dem,voter_share_rep,voter_share_other,rawdiff_dem_vs_rep,rawdiff_rep_vs_dem,rawdiff_dem_vs_other,rawdiff_rep_vs_other,rawdiff_other_vs_dem,rawdiff_other_vs_rep,pctdiff_dem_vs_rep,pctdiff_rep_vs_dem,pctdiff_dem_vs_other,pctdiff_rep_vs_other,pctdiff_other_vs_dem,pctdiff_other_vs_rep,OBJECTID,GEONAME,Total_cvap_est,SHAPE,Shape__Area,Shape__Length,voter_turnout,voter_turnout_majparty,voter_turnout_dem,voter_turnout_gop,voter_turnout_other
446,District 1,2701,Hillary Clinton,2573.0,Donald Trump,3180.0,Other,885.0,2016,6638.0,0.866677,0.387617,0.47906,0.133323,-607.0,607.0,1688.0,2295.0,-1688.0,-2295.0,-0.091443,0.091443,0.254293,0.345737,-0.254293,-0.345737,,,,,,,,,,,
447,District 2,2702,Hillary Clinton,1585.0,Donald Trump,3188.0,Other,719.0,2016,5492.0,0.869082,0.288602,0.580481,0.130918,-1603.0,1603.0,866.0,2469.0,-866.0,-2469.0,-0.291879,0.291879,0.157684,0.449563,-0.157684,-0.449563,,,,,,,,,,,
448,District 3,2703,Hillary Clinton,1241.0,Donald Trump,5403.0,Other,969.0,2016,7613.0,0.872718,0.163011,0.709707,0.127282,-4162.0,4162.0,272.0,4434.0,-272.0,-4434.0,-0.546696,0.546696,0.035728,0.582425,-0.035728,-0.582425,,,,,,,,,,,
449,District 4,2704,Hillary Clinton,4162.0,Donald Trump,4070.0,Other,1289.0,2016,9521.0,0.864615,0.437139,0.427476,0.135385,92.0,-92.0,2873.0,2781.0,-2873.0,-2781.0,0.009663,-0.009663,0.301754,0.292091,-0.301754,-0.292091,,,,,,,,,,,
450,District 5,2705,Hillary Clinton,3187.0,Donald Trump,3683.0,Other,1036.0,2016,7906.0,0.86896,0.403112,0.465849,0.13104,-496.0,496.0,2151.0,2647.0,-2151.0,-2647.0,-0.062737,0.062737,0.272072,0.334809,-0.272072,-0.334809,,,,,,,,,,,
451,District 6,2706,Hillary Clinton,2536.0,Donald Trump,4929.0,Other,995.0,2016,8460.0,0.882388,0.299764,0.582624,0.117612,-2393.0,2393.0,1541.0,3934.0,-1541.0,-3934.0,-0.282861,0.282861,0.182151,0.465012,-0.182151,-0.465012,,,,,,,,,,,
452,District 7,2707,Hillary Clinton,1510.0,Donald Trump,5935.0,Other,849.0,2016,8294.0,0.897637,0.182059,0.715578,0.102363,-4425.0,4425.0,661.0,5086.0,-661.0,-5086.0,-0.533518,0.533518,0.079696,0.613214,-0.079696,-0.613214,,,,,,,,,,,
453,District 8,2708,Hillary Clinton,1218.0,Donald Trump,6126.0,Other,729.0,2016,8073.0,0.909699,0.150873,0.758826,0.090301,-4908.0,4908.0,489.0,5397.0,-489.0,-5397.0,-0.607952,0.607952,0.060572,0.668525,-0.060572,-0.668525,,,,,,,,,,,
454,District 9,2709,Hillary Clinton,1843.0,Donald Trump,6100.0,Other,1011.0,2016,8954.0,0.88709,0.20583,0.68126,0.11291,-4257.0,4257.0,832.0,5089.0,-832.0,-5089.0,-0.47543,0.47543,0.092919,0.568349,-0.092919,-0.568349,,,,,,,,,,,
455,District 10,2710,Hillary Clinton,1808.0,Donald Trump,6255.0,Other,977.0,2016,9040.0,0.891925,0.2,0.691925,0.108075,-4447.0,4447.0,831.0,5278.0,-831.0,-5278.0,-0.491925,0.491925,0.091925,0.58385,-0.091925,-0.58385,,,,,,,,,,,


In [70]:
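# Remove records with no voter turnout value.
# .copy() makes an independent dataframe, so later assignments to it
# don't raise pandas' SettingWithCopyWarning.
data_2016_df = data_2016_df.loc[data_2016_df['voter_turnout'].notnull()].copy()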

In [71]:
# Run a describe to get the distribution of voter turnout values
data_2016_df['voter_turnout'].describe()

count    3111.000000
mean        0.594240
std         0.093300
min         0.158585
25%         0.530583
50%         0.595269
75%         0.655690
max         1.121277
Name: voter_turnout, dtype: float64

The maximum value reported by `describe()` is greater than 1, which means at least one county has a voter turnout above 100%. You will investigate further by querying for these records.

In [72]:
# Perform query for voter turnout above 100%
data_2016_df.loc[data_2016_df['voter_turnout'] > 1]

Unnamed: 0,county,fips,candidate(d),votes (d),candidate(r),votes (r),candidate(O),votes (O),year,votes_total,voter_share_major_party,voter_share_dem,voter_share_rep,voter_share_other,rawdiff_dem_vs_rep,rawdiff_rep_vs_dem,rawdiff_dem_vs_other,rawdiff_rep_vs_other,rawdiff_other_vs_dem,rawdiff_other_vs_rep,pctdiff_dem_vs_rep,pctdiff_rep_vs_dem,pctdiff_dem_vs_other,pctdiff_rep_vs_other,pctdiff_other_vs_dem,pctdiff_other_vs_rep,OBJECTID,GEONAME,Total_cvap_est,SHAPE,Shape__Area,Shape__Length,voter_turnout,voter_turnout_majparty,voter_turnout_dem,voter_turnout_gop,voter_turnout_other
951,San Juan,8111,Hillary Clinton,265.0,Donald Trump,215.0,Other,26.0,2016,506.0,0.948617,0.523715,0.424901,0.051383,50.0,-50.0,239.0,189.0,-239.0,-189.0,0.098814,-0.098814,0.472332,0.373518,-0.472332,-0.373518,301.0,"San Juan County, Colorado",495.0,"{'rings': [[[-11964863, 4528625.000100002], [-...",1611963000.0,209299.379233,1.022222,0.969697,0.535354,0.434343,0.052525
2385,Harding,35021,Hillary Clinton,156.0,Donald Trump,311.0,Other,60.0,2016,527.0,0.886148,0.296015,0.590133,0.113852,-155.0,155.0,96.0,251.0,-96.0,-251.0,-0.294118,0.294118,0.182163,0.476281,-0.182163,-0.476281,1807.0,"Harding County, New Mexico",470.0,"{'rings': [[[-11578210, 4330676.000100002], [-...",8400382000.0,492631.196575,1.121277,0.993617,0.331915,0.661702,0.12766
2903,Loving,48301,Hillary Clinton,4.0,Donald Trump,58.0,Other,3.0,2016,65.0,0.953846,0.061538,0.892308,0.046154,-54.0,54.0,1.0,55.0,-1.0,-55.0,-0.830769,0.830769,0.015385,0.846154,-0.015385,-0.846154,2674.0,"Loving County, Texas",60.0,"{'rings': [[[-11502370, 3717641.0001000017], [...",2435674000.0,254898.035389,1.083333,1.033333,0.066667,0.966667,0.05
2908,McMullen,48311,Hillary Clinton,40.0,Donald Trump,454.0,Other,5.0,2016,499.0,0.98998,0.08016,0.90982,0.01002,-414.0,414.0,35.0,449.0,-35.0,-449.0,-0.829659,0.829659,0.07014,0.8998,-0.07014,-0.8998,2679.0,"McMullen County, Texas",460.0,"{'rings': [[[-10946606, 3326438.0001000017], [...",3882883000.0,253408.774844,1.084783,1.073913,0.086957,0.986957,0.01087


Four counties, each with a very small population, have voter turnout values above 100%. You could remove these records from the data or research the source of the issue further.
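If you choose to remove the records instead, one approach is to keep only turnout values between 0 and 1. A minimal pandas sketch on hypothetical values; `Series.between` is inclusive on both ends by default and returns `False` for missing values, so nulls are flagged too:

```python
import pandas as pd

# Hypothetical turnout values: one valid, one above 100%, one missing
turnout = pd.DataFrame({
    "GEONAME": ["Autauga County, Alabama", "Loving County, Texas", "District 1"],
    "voter_turnout": [0.613738, 1.083333, None],
})

# True only for non-missing values in [0, 1]
valid = turnout["voter_turnout"].between(0, 1)

kept = turnout.loc[valid]
flagged = turnout.loc[~valid]
print(flagged)
```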

***

## Update validated data

After reviewing the Census Bureau's 2016 voting age population data, you found that these counties have small voting age populations with relatively large margins of error, which may explain turnout rates above 100%. You will recalculate the voter turnout fields for these counties using the upper bound of each county's margin of error:
- San Juan County, Colorado: 574
- Harding County, New Mexico: 562
- Loving County, Texas: 86
- McMullen County, Texas: 566

**Note: This information was extracted from this [table](https://data.census.gov/cedsci/table?q=voting%20age%20population%202016&g=0500000US08111,35021,48301,48311&hidePreview=true&table=DP05&tid=ACSDP5Y2016.DP05&t=Age%20and%20Sex&y=2016&lastDisplayedRow=6&vintage=2016&mode=&moe=true).**
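Before changing the dataframe, a quick arithmetic check confirms that these upper-bound populations bring each county's turnout below 100%. The vote totals come from the query output above; this is a plain-Python sketch, not part of the workflow:

```python
# (votes_total, upper-bound voting age population) for each county
corrections = {
    "San Juan County, Colorado": (506, 574),
    "Harding County, New Mexico": (527, 562),
    "Loving County, Texas": (65, 86),
    "McMullen County, Texas": (499, 566),
}

# Turnout = total votes / voting age population
turnouts = {name: votes / cvap for name, (votes, cvap) in corrections.items()}

for name, rate in turnouts.items():
    print(f"{name}: {rate:.3f}")
```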

In [75]:
# Correct each county's voting age population (upper bound of the margin of error).
# pd.to_numeric lets the match work whether 'fips' holds integers (8111)
# or zero-padded strings ("08111").
cvap_corrections = {8111: 574, 35021: 562, 48301: 86, 48311: 566}
fips_num = pd.to_numeric(data_2016_df['fips'], errors='coerce')
for fips_code, cvap in cvap_corrections.items():
    data_2016_df.loc[fips_num == fips_code, 'Total_cvap_est'] = cvap

In [80]:
data_2016_df.head()

Unnamed: 0,county,fips,candidate(d),votes (d),candidate(r),votes (r),candidate(O),votes (O),year,votes_total,voter_share_major_party,voter_share_dem,voter_share_rep,voter_share_other,rawdiff_dem_vs_rep,rawdiff_rep_vs_dem,rawdiff_dem_vs_other,rawdiff_rep_vs_other,rawdiff_other_vs_dem,rawdiff_other_vs_rep,pctdiff_dem_vs_rep,pctdiff_rep_vs_dem,pctdiff_dem_vs_other,pctdiff_rep_vs_other,pctdiff_other_vs_dem,pctdiff_other_vs_rep,OBJECTID,GEONAME,Total_cvap_est,SHAPE,Shape__Area,Shape__Length,voter_turnout,voter_turnout_majparty,voter_turnout_dem,voter_turnout_gop,voter_turnout_other
0,Autauga,1001,Hillary Clinton,5936.0,Donald Trump,18172.0,Other,865.0,2016,24973.0,0.965363,0.237697,0.727666,0.034637,-12236.0,12236.0,5071.0,17307.0,-5071.0,-17307.0,-0.489969,0.489969,0.203059,0.693028,-0.203059,-0.693028,1.0,"Autauga County, Alabama",40690.0,"{'rings': [[[-9619465, 3856529.0001000017], [-...",2208654000.0,249886.4,0.613738,0.59248,0.145884,0.446596,0.021258
1,Baldwin,1003,Hillary Clinton,18458.0,Donald Trump,72883.0,Other,3874.0,2016,95215.0,0.959313,0.193856,0.765457,0.040687,-54425.0,54425.0,14584.0,69009.0,-14584.0,-69009.0,-0.571601,0.571601,0.153169,0.72477,-0.153169,-0.72477,2.0,"Baldwin County, Alabama",151770.0,"{'rings': [[[-9746859, 3539643.0001000017], [-...",5671048000.0,1655940.0,0.627364,0.601838,0.121618,0.48022,0.025525
2,Baldwin,13009,Hillary Clinton,7970.0,Donald Trump,7697.0,Other,449.0,2016,16116.0,0.972139,0.49454,0.4776,0.027861,273.0,-273.0,7521.0,7248.0,-7521.0,-7248.0,0.01694,-0.01694,0.466679,0.449739,-0.466679,-0.449739,392.0,"Baldwin County, Georgia",36225.0,"{'rings': [[[-9270032, 3920184.0001000017], [-...",992118800.0,189429.4,0.444886,0.432491,0.220014,0.212478,0.012395
3,Barbour,1005,Hillary Clinton,4871.0,Donald Trump,5454.0,Other,144.0,2016,10469.0,0.986245,0.465278,0.520967,0.013755,-583.0,583.0,4727.0,5310.0,-4727.0,-5310.0,-0.055688,0.055688,0.451524,0.507212,-0.451524,-0.507212,3.0,"Barbour County, Alabama",20375.0,"{'rings': [[[-9468394, 3771591.0001000017], [-...",3257902000.0,320896.4,0.513816,0.506748,0.239067,0.267681,0.007067
4,Barbour,54001,Hillary Clinton,1222.0,Donald Trump,4527.0,Other,305.0,2016,6054.0,0.94962,0.20185,0.74777,0.05038,-3305.0,3305.0,917.0,4222.0,-917.0,-4222.0,-0.54592,0.54592,0.15147,0.69739,-0.15147,-0.69739,2993.0,"Barbour County, West Virginia",13410.0,"{'rings': [[[-8893931, 4764677.000100002], [-8...",1477859000.0,190122.8,0.451454,0.42871,0.091126,0.337584,0.022744


In [81]:
data_2016_df.columns

Index(['county', 'fips', 'candidate(d)', 'votes (d)', 'candidate(r)',
       'votes (r)', 'candidate(O)', 'votes (O)', 'year', 'votes_total',
       'voter_share_major_party', 'voter_share_dem', 'voter_share_rep',
       'voter_share_other', 'rawdiff_dem_vs_rep', 'rawdiff_rep_vs_dem',
       'rawdiff_dem_vs_other', 'rawdiff_rep_vs_other', 'rawdiff_other_vs_dem',
       'rawdiff_other_vs_rep', 'pctdiff_dem_vs_rep', 'pctdiff_rep_vs_dem',
       'pctdiff_dem_vs_other', 'pctdiff_rep_vs_other', 'pctdiff_other_vs_dem',
       'pctdiff_other_vs_rep', 'OBJECTID', 'GEONAME', 'Total_cvap_est',
       'SHAPE', 'Shape__Area', 'Shape__Length', 'voter_turnout',
       'voter_turnout_majparty', 'voter_turnout_dem', 'voter_turnout_gop',
       'voter_turnout_other'],
      dtype='object')

In [83]:
# Recalculate voter turnout fields
data_2016_df['voter_turnout'] = (data_2016_df['votes_total'] / data_2016_df['Total_cvap_est'])
data_2016_df['voter_turnout_majparty'] = ((data_2016_df['votes (d)']+data_2016_df['votes (r)']) / data_2016_df['Total_cvap_est'])
data_2016_df['voter_turnout_dem'] = (data_2016_df['votes (d)'] / data_2016_df['Total_cvap_est'])
data_2016_df['voter_turnout_gop'] = (data_2016_df['votes (r)'] / data_2016_df['Total_cvap_est'])
data_2016_df['voter_turnout_other'] = (data_2016_df['votes (O)'] / data_2016_df['Total_cvap_est'])

To confirm that this correction addressed the issue, you will again query for counties with a voter turnout value above 100%.

In [85]:
data_2016_df.loc[data_2016_df['voter_turnout'] > 1]

Unnamed: 0,county,fips,candidate(d),votes (d),candidate(r),votes (r),candidate(O),votes (O),year,votes_total,voter_share_major_party,voter_share_dem,voter_share_rep,voter_share_other,rawdiff_dem_vs_rep,rawdiff_rep_vs_dem,rawdiff_dem_vs_other,rawdiff_rep_vs_other,rawdiff_other_vs_dem,rawdiff_other_vs_rep,pctdiff_dem_vs_rep,pctdiff_rep_vs_dem,pctdiff_dem_vs_other,pctdiff_rep_vs_other,pctdiff_other_vs_dem,pctdiff_other_vs_rep,OBJECTID,GEONAME,Total_cvap_est,SHAPE,Shape__Area,Shape__Length,voter_turnout,voter_turnout_majparty,voter_turnout_dem,voter_turnout_gop,voter_turnout_other


No records are returned, indicating that there are no counties with a turnout value above 100%. Well done! You have cleaned the data. Next, you will convert the dataframe to a permanent dataset called a feature class. Feature classes are stored in an ArcGIS Pro file geodatabase.

***

## Convert dataframes to feature classes

You will use the ArcGIS API for Python, imported at the beginning of this notebook, to export the spatially-enabled dataframe to a feature class.

**Note: Executing the following cell may take a few minutes.**

In [86]:
# Create a feature class for the 2016 presidential election 
out_2016_fc_name = "county_elections_pres_2016"
out_2016_fc = data_2016_df.spatial.to_featureclass(os.path.join(fgdb, out_2016_fc_name))
out_2016_fc

RuntimeError: The operation was attempted on an empty geometry.

1. At the top of the page, click the Data Engineering map tab.

2. Drag the Data Engineering map tab to display as its own window. 

3. Review the feature class that was added to the Data Engineering map.

![DataFrameToFeatureClass](img/DataFrameToFeatureClass.PNG "Map of counties, with missing county")

**Note: The color of the data will vary every time it is added to the map.** 

Part 2 entails:
- Geoenable the data
- Join the data
- Query and calculate attributes
- Validate the data
- Update validated data
- Convert dataframes to feature classes
- Correct for missing values