# GOTV.ML

## Data Scrubbing
The dataset is publicly available voter registration data for the State of Rhode Island, published by the Rhode Island Secretary of State and covering eight statewide elections from 2018 to 2022.

### Objectives
* Output a statewide dataframe with the following features:
    * City
    * Zip Code
    * Year of Birth (experimental feature)
    * Current Party 
    * Election 3 
    * Election 4 
    * Election 5 
    * Election 6
    * Election 7 
    * Election 8
    * Party 5
    * Party 6
    * Party 8
* Output a dataframe containing only South Kingstown voters with the following features:
    * City
    * Zip Code
    * Year of Birth
    * Current Party
    * Election 3
    * Election 4
    * Election 5
    * Election 6
    * Election 7
    * Election 8
    * Party 5
    * Party 6
    * Party 8
* Our target variables will be:
    * Election 2
    * Party 2


### Importing Python Modules/Libraries

All systems/configurations:

In [None]:
import pandas as pd
import numpy as np

path = ""

Google Colab: 

In [None]:
# Uncomment cell if using Google Colab

# from google.colab import drive
# drive.mount("/content/drive")

# path = "/content/drive/My Drive/csc-461-final-project/"

### Cleaning our data

#### Reading CSVs into dataframes

The State of Rhode Island provides voter data in two files. One contains basic voter registration data, such as current party, address, and year of birth. The other contains the vote history for each registered voter: the prior elections they voted in and, where applicable, their party affiliation in each.

Importing the statewide voter history into a dataframe and inspecting our data.

In [None]:
Statewide_VoterFile = pd.read_csv(path + "data/VoterHistory_Statewide.csv", low_memory=False)

In [None]:
print(Statewide_VoterFile.shape)
Statewide_VoterFile.head()

(816297, 68)


Unnamed: 0,VOTER ID,LAST NAME,FIRST NAME,MIDDLE NAME,SUFFIX,DATE 1,ELECTION 1,TYPE 1,PRECINCT 1,PARTY 1,...,CONGRESSIONAL DISTRICT,STATE SENATE DISTRICT,STATE REP DISTRICT,PRECINCT,WARD/COUNCIL,WARD DISTRICT,SCHOOL COMMITTEE DISTRICT,SPECIAL DISTRICT,FIRE DISTRICT,STATUS
0,7000241358,AADAL FERRARA,LISA,C,,11/8/2022,STATEWIDE GENERAL ELECTION,E,707.0,,...,2,26,41,707,4.0,1.0,4.0,,,A
1,7001407044,AAIN,QURATUL,,,,,,,,...,2,28,16,721,3.0,4.0,3.0,,,A
2,21001110470,AAKRE,SHERI,M,,11/8/2022,STATEWIDE GENERAL ELECTION,R,3306.0,,...,1,12,70,3306,,,,,,A
3,21001103199,AAKRE,THOR,D,,11/8/2022,STATEWIDE GENERAL ELECTION,R,3306.0,,...,1,12,70,3306,,,,,,A
4,35001545945,AALTO,JILL,ANNE,,11/8/2022,STATEWIDE GENERAL ELECTION,R,3514.0,,...,2,30,22,3514,5.0,3.0,2.0,,,A


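Importing the statewide voter file into a second dataframe and inspecting our data. This cell reads the same `VoterHistory_Statewide.csv` file as the cell above, so the two dataframes are identical (both have shape (816297, 68)). That export already includes the registration fields, such as `CURRENT PARTY`, `YEAR OF BIRTH`, and the address and district columns, alongside the vote history.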

In [None]:
Statewide_VoterHistory = pd.read_csv(path + "data/VoterHistory_Statewide.csv", low_memory=False)

In [None]:
print(Statewide_VoterHistory.shape)
Statewide_VoterHistory.head()

(816297, 68)


Unnamed: 0,VOTER ID,LAST NAME,FIRST NAME,MIDDLE NAME,SUFFIX,DATE 1,ELECTION 1,TYPE 1,PRECINCT 1,PARTY 1,...,CONGRESSIONAL DISTRICT,STATE SENATE DISTRICT,STATE REP DISTRICT,PRECINCT,WARD/COUNCIL,WARD DISTRICT,SCHOOL COMMITTEE DISTRICT,SPECIAL DISTRICT,FIRE DISTRICT,STATUS
0,7000241358,AADAL FERRARA,LISA,C,,11/8/2022,STATEWIDE GENERAL ELECTION,E,707.0,,...,2,26,41,707,4.0,1.0,4.0,,,A
1,7001407044,AAIN,QURATUL,,,,,,,,...,2,28,16,721,3.0,4.0,3.0,,,A
2,21001110470,AAKRE,SHERI,M,,11/8/2022,STATEWIDE GENERAL ELECTION,R,3306.0,,...,1,12,70,3306,,,,,,A
3,21001103199,AAKRE,THOR,D,,11/8/2022,STATEWIDE GENERAL ELECTION,R,3306.0,,...,1,12,70,3306,,,,,,A
4,35001545945,AALTO,JILL,ANNE,,11/8/2022,STATEWIDE GENERAL ELECTION,R,3514.0,,...,2,30,22,3514,5.0,3.0,2.0,,,A


#### Merging dataframes

In [None]:
different_columns = Statewide_VoterHistory.columns.difference(Statewide_VoterFile.columns)
different_columns = different_columns.tolist()
different_columns.append("VOTER ID")

In [None]:
Statewide = pd.merge(Statewide_VoterFile, Statewide_VoterHistory[different_columns], on='VOTER ID')
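The two cells above merge on `VOTER ID` but pull in only the columns that the history frame adds. This keeps overlapping columns from being duplicated with `_x`/`_y` suffixes.

The sketch below reproduces the technique on two made-up three-voter frames. It also passes `validate="one_to_one"`, an optional `pd.merge` argument that raises a `MergeError` if a voter ID repeats on either side. That check is an addition for illustration, not part of the notebook. Note that `pd.merge` defaults to an inner join, so a voter present in only one file would be dropped.

```python
import pandas as pd

# Made-up stand-ins for the two statewide exports.
voter_file = pd.DataFrame({
    "VOTER ID": [1, 2, 3],
    "CURRENT PARTY": ["D", "R", "U"],
    "CITY": ["CRANSTON", "TIVERTON", "WARWICK"],
})
voter_history = pd.DataFrame({
    "VOTER ID": [1, 2, 3],
    "CITY": ["CRANSTON", "TIVERTON", "WARWICK"],  # overlaps with voter_file
    "ELECTION 2": ["STATEWIDE PRIMARY", None, None],
})

# Columns that exist only in the history frame, plus the join key.
new_cols = voter_history.columns.difference(voter_file.columns).tolist()
new_cols.append("VOTER ID")

# Inner join on VOTER ID; validate= guards against duplicate IDs.
merged = pd.merge(voter_file, voter_history[new_cols], on="VOTER ID",
                  validate="one_to_one")
print(merged.columns.tolist())

# For contrast: merging every column duplicates CITY as CITY_x / CITY_y.
naive = pd.merge(voter_file, voter_history, on="VOTER ID")
print(naive.columns.tolist())
```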

In [None]:
print(Statewide.shape)
Statewide.head()

(816297, 68)


Unnamed: 0,VOTER ID,LAST NAME,FIRST NAME,MIDDLE NAME,SUFFIX,DATE 1,ELECTION 1,TYPE 1,PRECINCT 1,PARTY 1,...,CONGRESSIONAL DISTRICT,STATE SENATE DISTRICT,STATE REP DISTRICT,PRECINCT,WARD/COUNCIL,WARD DISTRICT,SCHOOL COMMITTEE DISTRICT,SPECIAL DISTRICT,FIRE DISTRICT,STATUS
0,7000241358,AADAL FERRARA,LISA,C,,11/8/2022,STATEWIDE GENERAL ELECTION,E,707.0,,...,2,26,41,707,4.0,1.0,4.0,,,A
1,7001407044,AAIN,QURATUL,,,,,,,,...,2,28,16,721,3.0,4.0,3.0,,,A
2,21001110470,AAKRE,SHERI,M,,11/8/2022,STATEWIDE GENERAL ELECTION,R,3306.0,,...,1,12,70,3306,,,,,,A
3,21001103199,AAKRE,THOR,D,,11/8/2022,STATEWIDE GENERAL ELECTION,R,3306.0,,...,1,12,70,3306,,,,,,A
4,35001545945,AALTO,JILL,ANNE,,11/8/2022,STATEWIDE GENERAL ELECTION,R,3514.0,,...,2,30,22,3514,5.0,3.0,2.0,,,A


In [None]:
Statewide.columns

Index(['VOTER ID', 'LAST NAME', 'FIRST NAME', 'MIDDLE NAME', 'SUFFIX',
       'DATE 1', 'ELECTION 1', 'TYPE 1', 'PRECINCT 1', 'PARTY 1', 'DATE 2',
       'ELECTION 2', 'TYPE 2', 'PRECINCT 2', 'PARTY 2', 'DATE 3', 'ELECTION 3',
       'TYPE 3', 'PRECINCT 3', 'PARTY 3', 'DATE 4', 'ELECTION 4', 'TYPE 4',
       'PRECINCT 4', 'PARTY 4', 'DATE 5', 'ELECTION 5', 'TYPE 5', 'PRECINCT 5',
       'PARTY 5', 'DATE 6', 'ELECTION 6', 'TYPE 6', 'PRECINCT 6', 'PARTY 6',
       'DATE 7', 'ELECTION 7', 'TYPE 7', 'PRECINCT 7', 'PARTY 7', 'DATE 8',
       'ELECTION 8', 'TYPE 8', 'PRECINCT 8', 'PARTY 8', 'CURRENT PARTY',
       'YEAR OF BIRTH', 'STREET NUMBER', 'SUFFIX A', 'SUFFIX B', 'STREET NAME',
       'STREET NAME 2', 'UNIT', 'CITY', 'POSTAL CITY', 'STATE', 'ZIP CODE',
       'ZIP CODE 4', 'CONGRESSIONAL DISTRICT', 'STATE SENATE DISTRICT',
       'STATE REP DISTRICT', 'PRECINCT', 'WARD/COUNCIL', 'WARD DISTRICT',
       'SCHOOL COMMITTEE DISTRICT', 'SPECIAL DISTRICT', 'FIRE DISTRICT',
       'STATUS']

#### Creating new dataframes & selecting features

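Filtering the merged data down to South Kingstown voters and dropping features we don't need. We build two South Kingstown dataframes:

* `SouthKingstown` keeps the same features as `Statewide`.
* `SouthKingstown_Exp` also keeps each voter's street address (`STREET NUMBER`, `STREET NAME`, `POSTAL CITY`) for future feature engineering.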

In [None]:
SouthKingstown_Exp = Statewide.loc[Statewide["CITY"] == "SOUTH KINGSTOWN"].copy()

In [None]:
SouthKingstown = Statewide.loc[Statewide["CITY"] == "SOUTH KINGSTOWN"].copy()

In [None]:
Statewide = Statewide[["CITY", "ZIP CODE", "CURRENT PARTY", "YEAR OF BIRTH",
                       "ELECTION 3", "ELECTION 4", "ELECTION 5", "ELECTION 6", "ELECTION 7", "ELECTION 8",
                       "PARTY 5", "PARTY 6", "PARTY 8", "ELECTION 2", "PARTY 2"]]

In [None]:
SouthKingstown_Exp = SouthKingstown_Exp[["STREET NUMBER", "STREET NAME", "POSTAL CITY", "CITY", "ZIP CODE", "CURRENT PARTY", "YEAR OF BIRTH",
                                         "ELECTION 3", "ELECTION 4", "ELECTION 5", "ELECTION 6", "ELECTION 7", "ELECTION 8",
                                         "PARTY 5", "PARTY 6", "PARTY 8", "ELECTION 2", "PARTY 2"]]

In [None]:
SouthKingstown = SouthKingstown[["CITY", "ZIP CODE", "CURRENT PARTY", "YEAR OF BIRTH",
                                 "ELECTION 3", "ELECTION 4", "ELECTION 5", "ELECTION 6", "ELECTION 7", "ELECTION 8",
                                 "PARTY 5", "PARTY 6", "PARTY 8", "ELECTION 2", "PARTY 2"]]

In [None]:
SouthKingstown_Exp.head()

Unnamed: 0,STREET NUMBER,STREET NAME,POSTAL CITY,CITY,ZIP CODE,CURRENT PARTY,YEAR OF BIRTH,ELECTION 3,ELECTION 4,ELECTION 5,ELECTION 6,ELECTION 7,ELECTION 8,PARTY 5,PARTY 6,PARTY 8,ELECTION 2,PARTY 2
30,10,CARPENTER DR,WAKEFIELD,SOUTH KINGSTOWN,2879,D,1941,SPECIAL STATEWIDE REFERENDA ELECTION,STATEWIDE GENERAL ELECTION,STATEWIDE PRIMARY,PRESIDENTIAL PRIMARY,STATEWIDE GENERAL ELECTION,STATEWIDE PRIMARY,D,D,D,STATEWIDE PRIMARY,D
31,10,CARPENTER DR,WAKEFIELD,SOUTH KINGSTOWN,2879,D,1940,SPECIAL STATEWIDE REFERENDA ELECTION,STATEWIDE GENERAL ELECTION,STATEWIDE PRIMARY,PRESIDENTIAL PRIMARY,STATEWIDE GENERAL ELECTION,STATEWIDE PRIMARY,D,D,D,STATEWIDE PRIMARY,D
123,50,HOPKINS LN,WAKEFIELD,SOUTH KINGSTOWN,2879,R,1990,,STATEWIDE GENERAL ELECTION,,,,,,,,,
233,14,FAGAN CT,WAKEFIELD,SOUTH KINGSTOWN,2879,R,1981,,STATEWIDE GENERAL ELECTION,,,STATEWIDE GENERAL ELECTION,,,,,,
241,14,FAGAN CT,WAKEFIELD,SOUTH KINGSTOWN,2879,D,1984,,STATEWIDE GENERAL ELECTION,,,,,,,,,


In [None]:
SouthKingstown.head()

Unnamed: 0,CITY,ZIP CODE,CURRENT PARTY,YEAR OF BIRTH,ELECTION 3,ELECTION 4,ELECTION 5,ELECTION 6,ELECTION 7,ELECTION 8,PARTY 5,PARTY 6,PARTY 8,ELECTION 2,PARTY 2
30,SOUTH KINGSTOWN,2879,D,1941,SPECIAL STATEWIDE REFERENDA ELECTION,STATEWIDE GENERAL ELECTION,STATEWIDE PRIMARY,PRESIDENTIAL PRIMARY,STATEWIDE GENERAL ELECTION,STATEWIDE PRIMARY,D,D,D,STATEWIDE PRIMARY,D
31,SOUTH KINGSTOWN,2879,D,1940,SPECIAL STATEWIDE REFERENDA ELECTION,STATEWIDE GENERAL ELECTION,STATEWIDE PRIMARY,PRESIDENTIAL PRIMARY,STATEWIDE GENERAL ELECTION,STATEWIDE PRIMARY,D,D,D,STATEWIDE PRIMARY,D
123,SOUTH KINGSTOWN,2879,R,1990,,STATEWIDE GENERAL ELECTION,,,,,,,,,
233,SOUTH KINGSTOWN,2879,R,1981,,STATEWIDE GENERAL ELECTION,,,STATEWIDE GENERAL ELECTION,,,,,,
241,SOUTH KINGSTOWN,2879,D,1984,,STATEWIDE GENERAL ELECTION,,,,,,,,,


In [None]:
Statewide.head()

Unnamed: 0,CITY,ZIP CODE,CURRENT PARTY,YEAR OF BIRTH,ELECTION 3,ELECTION 4,ELECTION 5,ELECTION 6,ELECTION 7,ELECTION 8,PARTY 5,PARTY 6,PARTY 8,ELECTION 2,PARTY 2
0,CRANSTON,2921,U,1962,SPECIAL STATEWIDE REFERENDA ELECTION,STATEWIDE GENERAL ELECTION,STATEWIDE PRIMARY,,STATEWIDE GENERAL ELECTION,STATEWIDE PRIMARY,R,,D,,
1,CRANSTON,2920,D,1984,,STATEWIDE GENERAL ELECTION,,,,,,,,,
2,TIVERTON,2878,R,1968,SPECIAL STATEWIDE REFERENDA ELECTION,STATEWIDE GENERAL ELECTION,,PRESIDENTIAL PRIMARY,STATEWIDE GENERAL ELECTION,STATEWIDE PRIMARY,,R,R,STATEWIDE PRIMARY,R
3,TIVERTON,2878,R,1962,SPECIAL STATEWIDE REFERENDA ELECTION,STATEWIDE GENERAL ELECTION,,PRESIDENTIAL PRIMARY,STATEWIDE GENERAL ELECTION,,,R,,STATEWIDE PRIMARY,R
4,WARWICK,2889,R,1996,,STATEWIDE GENERAL ELECTION,,,,,,,,,


#### Recoding features

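Machine-learning models need numeric inputs, so we recode the categorical features as integers:

* **Party columns** (`CURRENT PARTY`, `PARTY 2`, `PARTY 5`, `PARTY 6`, `PARTY 8`): missing values, `N`, and `NP` map to 0; `U` to 1; `D` to 2; `R` to 3; and `M` to 4.
* **Election columns**: every recorded election maps to 1 and a missing value maps to 0. Each column therefore becomes a binary indicator of whether the voter participated in that election.
* **City**: each of Rhode Island's 39 municipalities maps to an integer from 0 to 38, in alphabetical order.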

In [None]:
dict_party = {np.nan: 0, "N": 0, "NP": 0, "U": 1, "D": 2, "R": 3, "M": 4}
# Every recorded election -> 1 (voted); missing -> 0 (did not vote)
dict_elections = {"STATEWIDE GENERAL ELECTION": 1, "STATEWIDE PRIMARY": 1, "SPECIAL STATEWIDE REFERENDA ELECTION": 1, "PRESIDENTIAL PRIMARY": 1, np.nan: 0}
dict_cities = {"BARRINGTON": 0, "BRISTOL": 1, "BURRILLVILLE": 2, "CENTRAL FALLS": 3, "CHARLESTOWN": 4, "COVENTRY": 5, "CRANSTON": 6, "CUMBERLAND": 7, "EAST GREENWICH": 8, "EAST PROVIDENCE": 9, "EXETER": 10, "FOSTER": 11, "GLOCESTER": 12, "HOPKINTON": 13, "JAMESTOWN": 14, "JOHNSTON": 15, "LINCOLN": 16, "LITTLE COMPTON": 17, "MIDDLETOWN": 18, "NARRAGANSETT": 19, "NEW SHOREHAM": 20, "NEWPORT": 21, "NORTH KINGSTOWN": 22, "NORTH PROVIDENCE": 23, "NORTH SMITHFIELD": 24, "PAWTUCKET": 25, "PORTSMOUTH": 26, "PROVIDENCE": 27, "RICHMOND": 28, "SCITUATE": 29, "SMITHFIELD": 30, "SOUTH KINGSTOWN": 31, "TIVERTON": 32, "WARREN": 33, "WARWICK": 34, "WEST GREENWICH": 35, "WEST WARWICK": 36, "WESTERLY": 37, "WOONSOCKET": 38}
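`dict_cities` assigns codes to Rhode Island's 39 municipalities in alphabetical order, so the same mapping can be generated rather than typed by hand. In this sketch, `city_codes` and `code_to_city` are illustrative names, not variables from the notebook. The reverse lookup may be handy later for decoding model output back into town names.

```python
# Rhode Island's 39 municipalities, spelled as in the CITY column.
cities = [
    "BARRINGTON", "BRISTOL", "BURRILLVILLE", "CENTRAL FALLS", "CHARLESTOWN",
    "COVENTRY", "CRANSTON", "CUMBERLAND", "EAST GREENWICH", "EAST PROVIDENCE",
    "EXETER", "FOSTER", "GLOCESTER", "HOPKINTON", "JAMESTOWN", "JOHNSTON",
    "LINCOLN", "LITTLE COMPTON", "MIDDLETOWN", "NARRAGANSETT", "NEW SHOREHAM",
    "NEWPORT", "NORTH KINGSTOWN", "NORTH PROVIDENCE", "NORTH SMITHFIELD",
    "PAWTUCKET", "PORTSMOUTH", "PROVIDENCE", "RICHMOND", "SCITUATE",
    "SMITHFIELD", "SOUTH KINGSTOWN", "TIVERTON", "WARREN", "WARWICK",
    "WEST GREENWICH", "WEST WARWICK", "WESTERLY", "WOONSOCKET",
]

# sorted() guarantees alphabetical codes even if the list above is reordered.
# A space sorts before letters, so "NEW SHOREHAM" precedes "NEWPORT",
# which matches dict_cities.
city_codes = {city: code for code, city in enumerate(sorted(cities))}

# Reverse mapping for turning integer codes back into town names.
code_to_city = {code: city for city, code in city_codes.items()}

print(city_codes["SOUTH KINGSTOWN"], code_to_city[31])
```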


In [None]:
party_cols = ["CURRENT PARTY", "PARTY 2", "PARTY 5", "PARTY 6", "PARTY 8"]
replace_map = {col: dict_party for col in party_cols}
replace_map.update({f"ELECTION {i}": dict_elections for i in range(2, 9)})  # ELECTION 2-8
replace_map["CITY"] = dict_cities
for df in (Statewide, SouthKingstown_Exp, SouthKingstown):
    df.replace(replace_map, inplace=True)
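Every election name in `dict_elections` maps to 1 and a missing value maps to 0, so the election recoding amounts to a "did this voter participate" indicator. The sketch below uses a made-up three-voter frame to check that `notna()` gives the same result as the dictionary-based `replace`. It also produces `int8` directly, skipping the `float64` intermediate seen in the `.info()` output below.

This equivalence assumes every non-missing `ELECTION` value is one of the listed names. An unexpected label would stay a string under `replace` but would count as a vote under `notna()`.

```python
import numpy as np
import pandas as pd

# Same mapping as dict_elections in the notebook (duplicate keys removed).
dict_elections = {
    "STATEWIDE GENERAL ELECTION": 1,
    "STATEWIDE PRIMARY": 1,
    "SPECIAL STATEWIDE REFERENDA ELECTION": 1,
    "PRESIDENTIAL PRIMARY": 1,
    np.nan: 0,
}

# Made-up vote history for three voters.
toy = pd.DataFrame({
    "ELECTION 3": ["SPECIAL STATEWIDE REFERENDA ELECTION", np.nan, np.nan],
    "ELECTION 4": ["STATEWIDE GENERAL ELECTION", "STATEWIDE GENERAL ELECTION", np.nan],
})
election_cols = ["ELECTION 3", "ELECTION 4"]

# Dictionary-based recoding, as in the notebook.
via_replace = toy.replace({col: dict_elections for col in election_cols})

# Participation indicator: 1 if an election is recorded, else 0.
via_notna = toy[election_cols].notna().astype("int8")

print(via_notna)
```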

In [None]:
Statewide.head()

Unnamed: 0,CITY,ZIP CODE,CURRENT PARTY,YEAR OF BIRTH,ELECTION 3,ELECTION 4,ELECTION 5,ELECTION 6,ELECTION 7,ELECTION 8,PARTY 5,PARTY 6,PARTY 8,ELECTION 2,PARTY 2
0,6,2921,1,1962,1.0,1.0,1.0,0.0,1.0,1.0,3,0,2,0.0,0
1,6,2920,2,1984,0.0,1.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0
2,32,2878,3,1968,1.0,1.0,0.0,1.0,1.0,1.0,0,3,3,1.0,3
3,32,2878,3,1962,1.0,1.0,0.0,1.0,1.0,0.0,0,3,0,1.0,3
4,34,2889,3,1996,0.0,1.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0


In [None]:
SouthKingstown_Exp.head()

Unnamed: 0,STREET NUMBER,STREET NAME,POSTAL CITY,CITY,ZIP CODE,CURRENT PARTY,YEAR OF BIRTH,ELECTION 3,ELECTION 4,ELECTION 5,ELECTION 6,ELECTION 7,ELECTION 8,PARTY 5,PARTY 6,PARTY 8,ELECTION 2,PARTY 2
30,10,CARPENTER DR,WAKEFIELD,31,2879,2,1941,1.0,1.0,1.0,1.0,1.0,1.0,2,2,2,1.0,2
31,10,CARPENTER DR,WAKEFIELD,31,2879,2,1940,1.0,1.0,1.0,1.0,1.0,1.0,2,2,2,1.0,2
123,50,HOPKINS LN,WAKEFIELD,31,2879,3,1990,0.0,1.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0
233,14,FAGAN CT,WAKEFIELD,31,2879,3,1981,0.0,1.0,0.0,0.0,1.0,0.0,0,0,0,0.0,0
241,14,FAGAN CT,WAKEFIELD,31,2879,2,1984,0.0,1.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0


In [None]:
SouthKingstown.head()

Unnamed: 0,CITY,ZIP CODE,CURRENT PARTY,YEAR OF BIRTH,ELECTION 3,ELECTION 4,ELECTION 5,ELECTION 6,ELECTION 7,ELECTION 8,PARTY 5,PARTY 6,PARTY 8,ELECTION 2,PARTY 2
30,31,2879,2,1941,1.0,1.0,1.0,1.0,1.0,1.0,2,2,2,1.0,2
31,31,2879,2,1940,1.0,1.0,1.0,1.0,1.0,1.0,2,2,2,1.0,2
123,31,2879,3,1990,0.0,1.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0
233,31,2879,3,1981,0.0,1.0,0.0,0.0,1.0,0.0,0,0,0,0.0,0
241,31,2879,2,1984,0.0,1.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0


### Optimizing memory usage

#### Statewide

Checking and changing our datatypes for the Statewide dataframe.

In [None]:
Statewide.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 816297 entries, 0 to 816296
Data columns (total 15 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   CITY           816297 non-null  int64  
 1   ZIP CODE       816297 non-null  int64  
 2   CURRENT PARTY  816297 non-null  int64  
 3   YEAR OF BIRTH  816297 non-null  int64  
 4   ELECTION 3     816297 non-null  float64
 5   ELECTION 4     816297 non-null  float64
 6   ELECTION 5     816297 non-null  float64
 7   ELECTION 6     816297 non-null  float64
 8   ELECTION 7     816297 non-null  float64
 9   ELECTION 8     816297 non-null  float64
 10  PARTY 5        816297 non-null  int64  
 11  PARTY 6        816297 non-null  int64  
 12  PARTY 8        816297 non-null  int64  
 13  ELECTION 2     816297 non-null  float64
 14  PARTY 2        816297 non-null  int64  
dtypes: float64(7), int64(8)
memory usage: 99.6 MB


In [None]:
print(Statewide["ZIP CODE"].max())
print(Statewide["ZIP CODE"].min())

2921
2802


In [None]:
print(np.iinfo(np.int8).max)
print(np.iinfo(np.int8).min)

print(np.iinfo(np.int16).max)
print(np.iinfo(np.int16).min)

print(np.iinfo(np.int32).max)
print(np.iinfo(np.int32).min)

127
-128
32767
-32768
2147483647
-2147483648
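These ranges justify the `astype` choices below. ZIP codes top out at 2921, which fits in `int16` but not `int8`, and the same holds for years of birth. The city, party, and election codes all fit in `int8`.

A small helper can make that check explicit. `smallest_int_dtype` is a hypothetical name, not part of NumPy or pandas. pandas' own `pd.to_numeric(..., downcast="integer")` performs a similar downcast on a single column.

```python
import numpy as np
import pandas as pd

def smallest_int_dtype(values):
    """Return the narrowest signed NumPy integer dtype that holds every value."""
    lo, hi = int(np.min(values)), int(np.max(values))
    for dtype in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(dtype)
        if info.min <= lo and hi <= info.max:
            return np.dtype(dtype)
    raise ValueError("values do not fit in int64")

zip_codes = pd.Series([2802, 2879, 2921])  # observed ZIP CODE min/max range
city_codes = pd.Series([0, 31, 38])        # dict_cities spans 0-38

print(smallest_int_dtype(zip_codes))
print(smallest_int_dtype(city_codes))

# Built-in alternative: downcast one Series to the smallest signed int type.
print(pd.to_numeric(zip_codes, downcast="integer").dtype)
```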


In [None]:
Statewide = Statewide.astype({"CITY": np.int8, "ZIP CODE": np.int16, "CURRENT PARTY": np.int8, "YEAR OF BIRTH": np.int16, "ELECTION 3": np.int8, "ELECTION 4": np.int8, "ELECTION 5": np.int8, "ELECTION 6": np.int8, "ELECTION 7": np.int8, "ELECTION 8": np.int8, "PARTY 5": np.int8, "PARTY 6": np.int8, "PARTY 8": np.int8, "ELECTION 2": np.int8, "PARTY 2": np.int8})

In [None]:
Statewide.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 816297 entries, 0 to 816296
Data columns (total 15 columns):
 #   Column         Non-Null Count   Dtype
---  ------         --------------   -----
 0   CITY           816297 non-null  int8 
 1   ZIP CODE       816297 non-null  int16
 2   CURRENT PARTY  816297 non-null  int8 
 3   YEAR OF BIRTH  816297 non-null  int16
 4   ELECTION 3     816297 non-null  int8 
 5   ELECTION 4     816297 non-null  int8 
 6   ELECTION 5     816297 non-null  int8 
 7   ELECTION 6     816297 non-null  int8 
 8   ELECTION 7     816297 non-null  int8 
 9   ELECTION 8     816297 non-null  int8 
 10  PARTY 5        816297 non-null  int8 
 11  PARTY 6        816297 non-null  int8 
 12  PARTY 8        816297 non-null  int8 
 13  ELECTION 2     816297 non-null  int8 
 14  PARTY 2        816297 non-null  int8 
dtypes: int16(2), int8(13)
memory usage: 19.5 MB


In [None]:
Statewide.head()

Unnamed: 0,CITY,ZIP CODE,CURRENT PARTY,YEAR OF BIRTH,ELECTION 3,ELECTION 4,ELECTION 5,ELECTION 6,ELECTION 7,ELECTION 8,PARTY 5,PARTY 6,PARTY 8,ELECTION 2,PARTY 2
0,6,2921,1,1962,1,1,1,0,1,1,3,0,2,0,0
1,6,2920,2,1984,0,1,0,0,0,0,0,0,0,0,0
2,32,2878,3,1968,1,1,0,1,1,1,0,3,3,1,3
3,32,2878,3,1962,1,1,0,1,1,0,0,3,0,1,3
4,34,2889,3,1996,0,1,0,0,0,0,0,0,0,0,0


By changing our datatypes, we realized the following memory savings:

**Before:** 99.6 MB  
**After:** 19.5 MB

We decreased our memory usage by about 80.1 MB, or roughly 80%.
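The before and after figures can be reproduced by hand. Without `deep=True`, `DataFrame.info()` reports the index plus each column at its fixed item size. Here that means an 8-byte `Int64Index` and 15 columns.

The sketch below assumes those sizes and pandas' 1024-based units. It also shows why `Statewide` and `SouthKingstown` shrink by the same fraction: the saving is a fixed number of bytes per row.

```python
INDEX_BYTES = 8                                # Int64Index: 8 bytes per row
before_per_row = INDEX_BYTES + 15 * 8          # 8 int64 + 7 float64 columns
after_per_row = INDEX_BYTES + 2 * 2 + 13 * 1   # 2 int16 + 13 int8 columns

for name, rows in [("Statewide", 816_297), ("SouthKingstown", 23_271)]:
    before = rows * before_per_row
    after = rows * after_per_row
    print(f"{name}: {before / 1024**2:.1f} MB -> {after / 1024**2:.2f} MB "
          f"({1 - after / before:.1%} smaller)")
```

Since 25 of the original 128 bytes per row remain, any frame with this column layout ends up about 80.5% smaller.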


#### SouthKingstown

In [None]:
SouthKingstown.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23271 entries, 30 to 816289
Data columns (total 15 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   CITY           23271 non-null  int64  
 1   ZIP CODE       23271 non-null  int64  
 2   CURRENT PARTY  23271 non-null  int64  
 3   YEAR OF BIRTH  23271 non-null  int64  
 4   ELECTION 3     23271 non-null  float64
 5   ELECTION 4     23271 non-null  float64
 6   ELECTION 5     23271 non-null  float64
 7   ELECTION 6     23271 non-null  float64
 8   ELECTION 7     23271 non-null  float64
 9   ELECTION 8     23271 non-null  float64
 10  PARTY 5        23271 non-null  int64  
 11  PARTY 6        23271 non-null  int64  
 12  PARTY 8        23271 non-null  int64  
 13  ELECTION 2     23271 non-null  float64
 14  PARTY 2        23271 non-null  int64  
dtypes: float64(7), int64(8)
memory usage: 2.8 MB


In [None]:
SouthKingstown = SouthKingstown.astype({"CITY": np.int8, "ZIP CODE": np.int16, "CURRENT PARTY": np.int8, "YEAR OF BIRTH": np.int16, "ELECTION 3": np.int8, "ELECTION 4": np.int8, "ELECTION 5": np.int8, "ELECTION 6": np.int8, "ELECTION 7": np.int8, "ELECTION 8": np.int8, "PARTY 5": np.int8, "PARTY 6": np.int8, "PARTY 8": np.int8, "ELECTION 2": np.int8, "PARTY 2": np.int8})

In [None]:
SouthKingstown.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23271 entries, 30 to 816289
Data columns (total 15 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   CITY           23271 non-null  int8 
 1   ZIP CODE       23271 non-null  int16
 2   CURRENT PARTY  23271 non-null  int8 
 3   YEAR OF BIRTH  23271 non-null  int16
 4   ELECTION 3     23271 non-null  int8 
 5   ELECTION 4     23271 non-null  int8 
 6   ELECTION 5     23271 non-null  int8 
 7   ELECTION 6     23271 non-null  int8 
 8   ELECTION 7     23271 non-null  int8 
 9   ELECTION 8     23271 non-null  int8 
 10  PARTY 5        23271 non-null  int8 
 11  PARTY 6        23271 non-null  int8 
 12  PARTY 8        23271 non-null  int8 
 13  ELECTION 2     23271 non-null  int8 
 14  PARTY 2        23271 non-null  int8 
dtypes: int16(2), int8(13)
memory usage: 568.1 KB


In [None]:
SouthKingstown.head()

Unnamed: 0,CITY,ZIP CODE,CURRENT PARTY,YEAR OF BIRTH,ELECTION 3,ELECTION 4,ELECTION 5,ELECTION 6,ELECTION 7,ELECTION 8,PARTY 5,PARTY 6,PARTY 8,ELECTION 2,PARTY 2
30,31,2879,2,1941,1,1,1,1,1,1,2,2,2,1,2
31,31,2879,2,1940,1,1,1,1,1,1,2,2,2,1,2
123,31,2879,3,1990,0,1,0,0,0,0,0,0,0,0,0
233,31,2879,3,1981,0,1,0,0,1,0,0,0,0,0,0
241,31,2879,2,1984,0,1,0,0,0,0,0,0,0,0,0


By changing our datatypes, we realized the following memory savings:

**Before:** 2.8 MB  
**After:** 568.1 KB (about 0.55 MB)

We decreased our memory usage by roughly 2.3 MB, or about 80%.


#### SouthKingstown_Exp

In [None]:
SouthKingstown_Exp.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23271 entries, 30 to 816289
Data columns (total 18 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   STREET NUMBER  23271 non-null  int64  
 1   STREET NAME    23271 non-null  object 
 2   POSTAL CITY    23271 non-null  object 
 3   CITY           23271 non-null  int64  
 4   ZIP CODE       23271 non-null  int64  
 5   CURRENT PARTY  23271 non-null  int64  
 6   YEAR OF BIRTH  23271 non-null  int64  
 7   ELECTION 3     23271 non-null  float64
 8   ELECTION 4     23271 non-null  float64
 9   ELECTION 5     23271 non-null  float64
 10  ELECTION 6     23271 non-null  float64
 11  ELECTION 7     23271 non-null  float64
 12  ELECTION 8     23271 non-null  float64
 13  PARTY 5        23271 non-null  int64  
 14  PARTY 6        23271 non-null  int64  
 15  PARTY 8        23271 non-null  int64  
 16  ELECTION 2     23271 non-null  float64
 17  PARTY 2        23271 non-null  int64  
dtypes: f

In [None]:
SouthKingstown_Exp = SouthKingstown_Exp.astype({"CITY": np.int8, "ZIP CODE": np.int16, "CURRENT PARTY": np.int8, "YEAR OF BIRTH": np.int16, "ELECTION 3": np.int8, "ELECTION 4": np.int8, "ELECTION 5": np.int8, "ELECTION 6": np.int8, "ELECTION 7": np.int8, "ELECTION 8": np.int8, "PARTY 5": np.int8, "PARTY 6": np.int8, "PARTY 8": np.int8, "ELECTION 2": np.int8, "PARTY 2": np.int8})

In [None]:
SouthKingstown_Exp.head()

Unnamed: 0,STREET NUMBER,STREET NAME,POSTAL CITY,CITY,ZIP CODE,CURRENT PARTY,YEAR OF BIRTH,ELECTION 3,ELECTION 4,ELECTION 5,ELECTION 6,ELECTION 7,ELECTION 8,PARTY 5,PARTY 6,PARTY 8,ELECTION 2,PARTY 2
30,10,CARPENTER DR,WAKEFIELD,31,2879,2,1941,1,1,1,1,1,1,2,2,2,1,2
31,10,CARPENTER DR,WAKEFIELD,31,2879,2,1940,1,1,1,1,1,1,2,2,2,1,2
123,50,HOPKINS LN,WAKEFIELD,31,2879,3,1990,0,1,0,0,0,0,0,0,0,0,0
233,14,FAGAN CT,WAKEFIELD,31,2879,3,1981,0,1,0,0,1,0,0,0,0,0,0
241,14,FAGAN CT,WAKEFIELD,31,2879,2,1984,0,1,0,0,0,0,0,0,0,0,0


In [None]:
SouthKingstown_Exp.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23271 entries, 30 to 816289
Data columns (total 18 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   STREET NUMBER  23271 non-null  int64 
 1   STREET NAME    23271 non-null  object
 2   POSTAL CITY    23271 non-null  object
 3   CITY           23271 non-null  int8  
 4   ZIP CODE       23271 non-null  int16 
 5   CURRENT PARTY  23271 non-null  int8  
 6   YEAR OF BIRTH  23271 non-null  int16 
 7   ELECTION 3     23271 non-null  int8  
 8   ELECTION 4     23271 non-null  int8  
 9   ELECTION 5     23271 non-null  int8  
 10  ELECTION 6     23271 non-null  int8  
 11  ELECTION 7     23271 non-null  int8  
 12  ELECTION 8     23271 non-null  int8  
 13  PARTY 5        23271 non-null  int8  
 14  PARTY 6        23271 non-null  int8  
 15  PARTY 8        23271 non-null  int8  
 16  ELECTION 2     23271 non-null  int8  
 17  PARTY 2        23271 non-null  int8  
dtypes: int16(2), int64(1), i

### Exporting data

In [None]:
rename_cols = {"ELECTION 2": "TGT STATEWIDE PRIMARY", "PARTY 2": "TGT PARTY AFFILIATION"}

Statewide.rename(columns=rename_cols, inplace=True)
SouthKingstown.rename(columns=rename_cols, inplace=True)
SouthKingstown_Exp.rename(columns=rename_cols, inplace=True)

In [None]:
Statewide.head()

Unnamed: 0,CITY,ZIP CODE,CURRENT PARTY,YEAR OF BIRTH,ELECTION 3,ELECTION 4,ELECTION 5,ELECTION 6,ELECTION 7,ELECTION 8,PARTY 5,PARTY 6,PARTY 8,TGT STATEWIDE PRIMARY,TGT PARTY AFFILIATION
0,6,2921,1,1962,1,1,1,0,1,1,3,0,2,0,0
1,6,2920,2,1984,0,1,0,0,0,0,0,0,0,0,0
2,32,2878,3,1968,1,1,0,1,1,1,0,3,3,1,3
3,32,2878,3,1962,1,1,0,1,1,0,0,3,0,1,3
4,34,2889,3,1996,0,1,0,0,0,0,0,0,0,0,0
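The objectives above name `ELECTION 2` and `PARTY 2` as the targets; after the rename they are `TGT STATEWIDE PRIMARY` and `TGT PARTY AFFILIATION`. A downstream modeling notebook could separate features from targets roughly as follows.

This is a hypothetical sketch. The two rows are the first and third rows of `Statewide.head()` above, reduced to a few columns. `TARGETS`, `X`, `y_turnout`, and `y_party` are illustrative names.

```python
import pandas as pd

TARGETS = ["TGT STATEWIDE PRIMARY", "TGT PARTY AFFILIATION"]

# Rows 0 and 2 of Statewide.head(), subset of columns.
df = pd.DataFrame({
    "CITY": [6, 32],
    "ZIP CODE": [2921, 2878],
    "CURRENT PARTY": [1, 3],
    "ELECTION 4": [1, 1],
    "PARTY 8": [2, 3],
    "TGT STATEWIDE PRIMARY": [0, 1],
    "TGT PARTY AFFILIATION": [0, 3],
})

X = df.drop(columns=TARGETS)            # model inputs
y_turnout = df["TGT STATEWIDE PRIMARY"]  # binary: voted in the primary
y_party = df["TGT PARTY AFFILIATION"]    # party code used in that primary

print(X.columns.tolist())
```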


#### Exporting to CSV

In [None]:
Statewide.to_csv(path + "data/Statewide.csv", index=False)

In [None]:
SouthKingstown.to_csv(path + "data/SouthKingstown.csv", index=False)

In [None]:
SouthKingstown_Exp.to_csv(path + "data/SouthKingstown_Exp.csv", index=False)

#### Exporting to a pickle file

In [None]:
Statewide.to_pickle(path + "data/Statewide.pkl")

In [None]:
SouthKingstown.to_pickle(path + "data/SouthKingstown.pkl")

In [None]:
SouthKingstown_Exp.to_pickle(path + "data/SouthKingstown_Exp.pkl")
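One practical difference between the two export formats is sketched below on a made-up two-row frame. A pickle preserves the narrow `int8`/`int16` dtypes chosen above. `pd.read_csv`, by contrast, infers `int64` for integer columns, so after loading a CSV the memory savings have to be reapplied, for example through its `dtype=` argument.

Pickles should only be loaded from trusted sources, since unpickling can execute arbitrary code.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "CITY": np.array([31, 31], dtype=np.int8),
    "ZIP CODE": np.array([2879, 2879], dtype=np.int16),
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "toy.csv")
    pkl_path = os.path.join(tmp, "toy.pkl")
    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)    # integer columns come back as int64
    from_pkl = pd.read_pickle(pkl_path)  # original int8 / int16 dtypes kept
    # Restoring narrow dtypes when reading a CSV:
    from_csv_typed = pd.read_csv(csv_path, dtype={"CITY": "int8", "ZIP CODE": "int16"})

print(from_csv.dtypes)
print(from_pkl.dtypes)
```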