# Election Ads Project Writeup
Samuel Robbins

### Abstract

The ultimate goal of this project is to determine the most competitive and cost-effective congressional districts on which to focus advertising spending in upcoming elections. I hypothesize that a robust analysis of past election data, combined with information on average ad spending in different media markets, will support informed and impactful decisions about where to spend a limited budget in the next election cycle. I suggest an initial exploratory data analysis to identify trends in election results and determine the key swing districts to focus on.

From an initial analysis, I would recommend focusing ad spending in two categories: (i) NY-1, NY-24, CA-21, IA-1, and IA-2 to attempt to flip seats from R to D, and (ii) PA-8, NY-19, IL-17, IA-3, and AZ-1 to attempt to hold seats with a slight D advantage. Further regression analysis of ads from previous election cycles could identify what makes an ad effective. More precise data for smaller geographic units (e.g. counties, precincts) should also be used to target potential voter groups more effectively.

### Design

#### Client Opportunity
The proposed client for this project is a progressive political ad company, such as [Putnam Partners](https://putnampartners.net/political-campaigns), that is looking for insights into where to direct their political ad spending. They want to reach the highest number of voters for the most economical price, while staying on budget and attempting to win and hold the most competitive seats.

#### Impact Hypothesis
The desired impact of this project is to inform decisions about where to direct ad spending in competitive races by weighing factors such as media market costs, average partisan swing, and number of voters. The **impact hypothesis**, then, is that swing states and seats outside of major media markets (e.g. New York, LA, Miami) that have a higher average Democratic Party performance are the best places to direct political ad spending. These markets should be cheaper on average, and ads there are more likely to reach enough potential voters to produce a winning margin.

#### Solution Path
The proposed solution path is to take past election data and determine how the partisan swing of each district has evolved in recent elections. I will narrow the dataset down to the districts that are either trending toward the Democratic Party or have been decided by 5 points or fewer in the last two elections. This will allow me to focus only on the most competitive districts. I will then merge the voting history dataset with the media costs dataset based on each district's location. This will allow me to map media costs and voting history across the US and combine the two to determine the best locations for focused ad spending.
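The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the actual analysis was done in Excel, the district names and margins below are made up, and the definition of "trending" (a Democratic margin that rises in every consecutive election) is my own assumption rather than a rule taken from the project.

```python
# Hypothetical Democratic margins in percentage points (D share minus R share).
# These numbers are invented for illustration, not taken from the project data.
margins = {
    "XX-1": {2016: -12.0, 2018: -4.0, 2020: -3.0},
    "XX-2": {2016: -20.0, 2018: -15.0, 2020: -9.0},
    "XX-3": {2016: 25.0, 2018: 22.0, 2020: 24.0},
    "XX-4": {2016: 8.0, 2018: 3.0, 2020: 1.5},
}


def is_competitive(by_year, threshold=5.0):
    """True if the two most recent elections were both within `threshold` points."""
    last_two = sorted(by_year)[-2:]
    return all(abs(by_year[year]) <= threshold for year in last_two)


def is_trending_democratic(by_year):
    """True if the Democratic margin rose in every consecutive election.

    Assumed definition of "trending"; other definitions are possible.
    """
    ordered = [by_year[year] for year in sorted(by_year)]
    return all(later > earlier for earlier, later in zip(ordered, ordered[1:]))


# Keep districts that meet either criterion.
targets = sorted(
    district
    for district, by_year in margins.items()
    if is_competitive(by_year) or is_trending_democratic(by_year)
)
print(targets)
```

With these made-up inputs, XX-3 is dropped because it is neither close nor moving steadily toward the Democratic Party.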

### Data

Two main datasets were used in this analysis:
> (1) Congressional election data from 1976-present were obtained from the [MIT Election Data and Science Lab](https://electionlab.mit.edu/data). Only data from 2012-2020 were used because those districts are based on the 2010 census and are consistent throughout the time frame. These data were used to determine the yearly and average partisan lean, the number of potential swing voters in each district, and winning vote margins (among other features).

> (2) Election advertisement data were obtained from the [Wesleyan Media Project](https://mediaproject.wesleyan.edu/dataaccess/), which provided data on over 1 million ads shown during the 2018 election cycle. Each row of data represented a specific ad, shown at a specific time, on a specific network, for a specific congressional race, and for an estimated cost. The dataset also contained coded features on the ad content, but those are outside the scope of this initial data analysis.

### Algorithms

#### Data Cleaning and EDA
Data cleaning and primary data analysis were conducted using Excel. Classifications based on aggregated values (see below) were created to identify swing districts in each election cycle and the margin of victory in those districts.

#### Aggregation
Election results were summarized for the 2012-2020 elections using pivot tables to determine the number and percentage of votes for each major party candidate. These aggregations were then used to determine the yearly and 10-year average of the partisan lean of each congressional district. 
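For readers who prefer code to pivot tables, the aggregation above could be reproduced roughly as follows. This is a minimal sketch, not the method actually used (which was Excel): the row layout, party labels, and vote counts are hypothetical, and partisan lean is assumed here to mean the Democratic minus Republican share of the two-party vote, in percentage points.

```python
from collections import defaultdict

# Hypothetical rows: (year, district, party, votes). Values are illustrative only.
rows = [
    (2018, "XX-1", "DEMOCRAT", 120_000),
    (2018, "XX-1", "REPUBLICAN", 130_000),
    (2020, "XX-1", "DEMOCRAT", 160_000),
    (2020, "XX-1", "REPUBLICAN", 140_000),
]

# Pivot step: total votes per (district, year) and party.
totals = defaultdict(lambda: defaultdict(int))
for year, district, party, votes in rows:
    totals[(district, year)][party] += votes

# Yearly lean (assumed definition): D share minus R share of the two-party vote.
yearly_lean = {}
for key, by_party in totals.items():
    dem, rep = by_party["DEMOCRAT"], by_party["REPUBLICAN"]
    if dem + rep == 0:
        continue  # no major-party votes recorded; skip rather than divide by zero
    yearly_lean[key] = 100.0 * (dem - rep) / (dem + rep)

# Average lean per district across all election years present in the data.
leans_by_district = defaultdict(list)
for (district, _year), lean in yearly_lean.items():
    leans_by_district[district].append(lean)
average_lean = {d: sum(v) / len(v) for d, v in leans_by_district.items()}

print(yearly_lean)
print(average_lean)
```

A positive value indicates a Democratic advantage and a negative value a Republican one, so the sign of the yearly value also identifies the winner among the two major parties.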

Election advertisement data were summarized for each House district and media market using pivot tables. Because media market data are proprietary to Nielsen, aggregations by House district were used instead. These data were joined to the election results data to identify the most cost-effective swing districts to focus on.
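The per-district ad summary and the join to election results might look like the sketch below. Everything in it is an assumption made for illustration: the field names, the dollar amounts, and the choice of average cost per airing as a stand-in for cost-effectiveness are mine, since the writeup does not define a specific metric.

```python
from collections import defaultdict

# Hypothetical ad airings: (district, estimated cost in USD). Illustrative only.
ads = [("XX-1", 500.0), ("XX-1", 1500.0), ("XX-2", 4000.0)]

# Hypothetical election summary keyed by district.
results = {
    "XX-1": {"total_votes": 250_000, "margin_votes": 10_000},
    "XX-2": {"total_votes": 300_000, "margin_votes": 8_000},
    "XX-3": {"total_votes": 280_000, "margin_votes": 90_000},
}

# Pivot step: total estimated spend and number of airings per district.
spend = defaultdict(float)
airings = defaultdict(int)
for district, cost in ads:
    spend[district] += cost
    airings[district] += 1

# Inner join: keep only districts present in both datasets.
joined = {}
for district, summary in results.items():
    if district not in spend:
        continue
    joined[district] = {
        **summary,
        "ad_spend": spend[district],
        "airings": airings[district],
        # Assumed proxy for how expensive it is to advertise in the district.
        "avg_cost_per_airing": spend[district] / airings[district],
    }

# Cheapest districts first, according to the assumed proxy.
ranked = sorted(joined, key=lambda d: joined[d]["avg_cost_per_airing"])
print(ranked)
```

An inner join silently drops districts with no recorded ads (XX-3 here), so it is worth checking how many districts fall out before drawing conclusions.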

### Tools

- I used IBM SPSS Statistics to access the election advertising data and convert it to Excel format.
- I used Excel for data cleaning, aggregation, and analysis.
- I used Tableau for data visualization and creating interactive voter maps. 

### Communication

In addition to the slides and visuals presented here, Tableau dashboards for [individual election years](https://public.tableau.com/app/profile/samuel.robbins/viz/CongressionalElectionAnalysis-Yearlylook/CongressionalElectionsYearlyAnalysis?publish=yes) and the [10-year trend](https://public.tableau.com/app/profile/samuel.robbins/viz/ElectionAdAnalysis-10yearlook/HouseElectionAnalysis2012-2020?publish=yes) are available. This writeup will also be posted on my personal GitHub.

![yearly_analysis.png](attachment:yearly_analysis.png)

![10_yr_analysis.png](attachment:10_yr_analysis.png)