# Data Analysis: Moral Foundations Theory
---
<img src="https://c1.staticflickr.com/7/6240/6261650491_0cd6c701bb_b.jpg" style="width: 500px; height: 275px;" />

### Professor Amy Tick

Moral Foundations Theory (MFT) hypothesizes that people's sensitivity to the five moral foundations differs with their political ideology: liberals rely mainly on care and fairness, while conservatives draw more evenly on all five. Here, we'll explore whether we can find evidence for MFT in the campaign speeches of 2016 United States presidential candidates. For our main analysis, we'll follow the data science process we learned on Day 1 to recreate a simplified version of the analysis done by Jesse Graham, Jonathan Haidt, and Brian A. Nosek in their 2009 paper ["Liberals and Conservatives Rely on Different Sets of Moral Foundations"](http://projectimplicit.net/nosek/papers/GHN2009.pdf). In Part 3, we'll look at other NLP techniques that might be useful in applying this theory.

*Estimated Time: 50 minutes*

---

### Topics Covered
- Plotting data with Matplotlib
- Interpreting graphs
- Textual analysis methods

### Table of Contents


1 - [Data Set and Test Statistic](#section 1)<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1.1 - [2016 Campaign Speeches](#subsection 1)<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1.2 - [Moral Foundations Dictionary](#subsection 2) <br>

2 - [Exploratory Data Analysis](#section 2)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.1 - [Hypothesis](#subsection 3)<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.2 - [Democrats](#subsection 4)<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.3 - [Republicans](#subsection 5) <br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2.4 - [Democrats vs Republicans](#subsection 6) <br>

3 - [Further explorations](#section 3)<br>




**Dependencies:**

In [3]:
from datascience import *
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import json

---
## Part 1: Data Set and Test Statistic  <a id='section 1'></a>

As data scientists starting a new analysis, we know we need to start with two things: some data and a question. In Part 1, we'll get familiar with our data set and determine a way to answer our question using the data.


### 2016 Campaign Speeches <a id='subsection 1'></a>

Our data set is the texts of speeches from the 2016 US presidential campaign. Run the cell below to load the data.

In [72]:
# load the data from every CSV file in the 'csv' folder into one table
import os

campaign_data = None
for file in os.listdir('csv'):
    if not file.endswith('.csv'):
        continue  # skip anything that isn't a CSV file
    table = Table.read_table(os.path.join('csv', file))
    if campaign_data is None:
        campaign_data = table
    else:
        campaign_data.append(table)  # add all rows of `table` in place

campaign_data

| Candidate | Party | Type | Date | Title | Speech |
|---|---|---|---|---|---|
| Jeb Bush | R | c | June 15, 2015 | b'Remarks Announcing Candidacy for President at Miami Da ... | b'Thank you all very much. I always feel welcome at Miam ... |
| Jeb Bush | R | c | July 30, 2015 | b'Remarks to the National Urban League Conference in For ... | b"Thank you all very much. I appreciate your hospitality ... |
| Jeb Bush | R | c | August 11, 2015 | b'Remarks at the Ronald Reagan Presidential Library in S ... | b"Thank you very much. It's good to be with all of you, ... |
| Jeb Bush | R | c | September 9, 2015 | b'Remarks in Garner, North Carolina' | b'Thank you very much. I appreciate your hospitality tod ... |
| Jeb Bush | R | c | November 2, 2015 | b'Remarks in Tampa, Florida' | b'Thank you. It\'s great to be in Tampa with so many fri ... |
| Jeb Bush | R | c | November 18, 2015 | b'Remarks at The Citadel in Charleston, South Carolina' | b'Thank you very much.I appreciate the hospitality of th ... |
| Jeb Bush | R | p | June 14, 2015 | b'Press Release - The Best Conservative Governor in America' | b'"Florida is a place where conservative principles have ... |
| Jeb Bush | R | p | June 14, 2015 | b'Press Release - Jeb: Today and Tomorrow' | b'As Jeb has traveled the country, listening to voters h ... |
| Jeb Bush | R | p | June 15, 2015 | b'Press Release - All In For Jeb' | b'Tony Alonso (AKA "Asik") is not a political partisan. ... |
| Jeb Bush | R | p | June 15, 2015 | b'Press Release - Greatest Century' | b"As Governor of Florida, Jeb Bush made a difference and ... |


Take a moment to look at this table. What information does it contain? What are the different columns? What does each row represent? How large is this table altogether? Hint: there are three Types ('c' for campaign speech, 'p' for press release, and 's' for statement) and two Parties ('R' for Republican and 'D' for Democrat).

In [None]:
# answer

On Day 1, we learned that the first step in the data science process is data cleaning. While this data set is mostly clean (how can we tell?), it does contain some information we don't need: the press releases and statements. Run the next cell to create a table with only Type 'c' documents.

In [73]:
# create a new table containing only campaign speeches
speeches = campaign_data.where('Type', 'c')
speeches

| Candidate | Party | Type | Date | Title | Speech |
|---|---|---|---|---|---|
| Jeb Bush | R | c | June 15, 2015 | b'Remarks Announcing Candidacy for President at Miami Da ... | b'Thank you all very much. I always feel welcome at Miam ... |
| Jeb Bush | R | c | July 30, 2015 | b'Remarks to the National Urban League Conference in For ... | b"Thank you all very much. I appreciate your hospitality ... |
| Jeb Bush | R | c | August 11, 2015 | b'Remarks at the Ronald Reagan Presidential Library in S ... | b"Thank you very much. It's good to be with all of you, ... |
| Jeb Bush | R | c | September 9, 2015 | b'Remarks in Garner, North Carolina' | b'Thank you very much. I appreciate your hospitality tod ... |
| Jeb Bush | R | c | November 2, 2015 | b'Remarks in Tampa, Florida' | b'Thank you. It\'s great to be in Tampa with so many fri ... |
| Jeb Bush | R | c | November 18, 2015 | b'Remarks at The Citadel in Charleston, South Carolina' | b'Thank you very much.I appreciate the hospitality of th ... |
| Ben Carson | R | c | May 4, 2015 | b'Remarks Announcing Candidacy for President in Detroit, ... | b'Thank you. We have limited time. Thank you. Thank you ... |
| Ben Carson | R | c | July 19, 2016 | b'Remarks to the Republican National Convention in Cleve ... | b'Thank you. Thank you. Thank you, everyone. Thank you. ... |
| Lincoln Chafee | D | c | June 3, 2015 | b'Remarks Announcing Candidacy for President at George M ... | b'Thank you, Bob. Thank you, Bob and Mark, very much. A ... |
| Lincoln Chafee | D | c | July 17, 2015 | b'Remarks at the Iowa Democrats Hall of Fame Dinner in C ... | b'Congratulations to the Hall of Fame Inductees.Thank yo ... |
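One visible sign that cleaning isn't quite finished: the Title and Speech values are wrapped in `b'...'` or `b"..."`, which suggests the scraper stored Python bytes literals as text. The cell below is a minimal sketch, assuming that is what happened, of turning such a string back into plain text. It uses the standard library's `ast.literal_eval` (which only evaluates literals, never arbitrary code) and assumes the bytes are UTF-8. The helper name `clean_bytes_literal` is hypothetical, not part of the original notebook.

```python
import ast

def clean_bytes_literal(value):
    """Turn a string like "b'Thank you...'" back into plain text.

    Hypothetical helper. If `value` looks like a Python bytes literal, it is
    evaluated safely with ast.literal_eval and decoded as UTF-8 (an
    assumption about the original encoding). Anything else, including a
    malformed literal, is returned unchanged.
    """
    if isinstance(value, str) and value[:2] in ("b'", 'b"'):
        try:
            raw = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value  # not a valid literal after all; leave it alone
        if isinstance(raw, bytes):
            return raw.decode("utf-8", errors="replace")
    return value
```

With the `datascience` library, one way to apply it is `speeches.with_column('Speech', speeches.apply(clean_bytes_literal, 'Speech'))`, which returns a new table with the Speech column replaced; the same works for Title.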


### Moral Foundations Dictionary <a id='subsection 2'></a>

In ["Liberals and Conservatives Rely on Different Sets of Moral Foundations"](http://projectimplicit.net/nosek/papers/GHN2009.pdf), one of the methods Graham, Haidt, and Nosek use to measure how much people rely on each moral foundation is to count how often they use words related to that foundation. This word frequency will be our test statistic today. To calculate it, we need a dictionary of words related to each moral foundation. Run the cell below to load the dictionary you created in the first module.

In [5]:
# Run this cell to load the dictionary into a variable
with open('foundations_dict.json') as json_data:
    mft_dict = json.load(json_data)

# Show the dictionary entry for the 'care' foundation
mft_dict['care']

['care',
 'attent',
 'aid',
 'tend',
 'caution',
 'precaut',
 'forethought',
 'concern',
 'charg',
 'tutelag',
 'guardianship',
 'mainten',
 'upkeep',
 'give car',
 'manag',
 'handl',
 'worri']
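Most entries above are word stems rather than full words ('attent', 'precaut', 'worri'), and one entry ('give car') spans two words, so an exact word lookup would miss most matches. The cell below sketches one simple way to count matches, assumed here rather than taken from the paper or the first module: prefix matching, where a token counts if it starts with a stem, and a multi-word entry counts if each of its words is a prefix of consecutive tokens. Each token position is counted at most once per foundation. The helper names (`tokenize`, `matches_at`, `foundation_counts`) are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def matches_at(tokens, i, parts):
    """True if each word in `parts` is a prefix of tokens[i], tokens[i+1], ..."""
    if i + len(parts) > len(tokens):
        return False  # not enough tokens left for a multi-word entry
    return all(tokens[i + j].startswith(part) for j, part in enumerate(parts))

def foundation_counts(text, mft_dict):
    """Return {foundation: number of token positions matching any of its entries}."""
    tokens = tokenize(text)
    counts = {}
    for foundation, entries in mft_dict.items():
        patterns = [entry.split() for entry in entries]  # 'give car' -> ['give', 'car']
        counts[foundation] = sum(
            1 for i in range(len(tokens))
            if any(matches_at(tokens, i, parts) for parts in patterns)
        )
    return counts
```

For example, `foundation_counts(speeches.column('Speech')[0], mft_dict)` would score the first speech. Prefix matching is crude: 'care' also matches 'career', and 'worri' misses 'worry'. If these stems came from a stemmer such as NLTK's `PorterStemmer`, stemming each token and comparing for equality would likely be more faithful.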

Graham, Haidt, and Nosek also used a dictionary to calculate their test statistic, but their dictionary was created in a very different way. From the paper:
> Dictionary development had an expansive phase and a contractive phase, all occurring before reading the sermons. In the expansive phase Jesse Graham and five research assistants generated as many associations, synonyms, and antonyms for the base foundation words as possible, using thesauruses and conversations with colleagues. This included full words and word stems (for instance, *nation* covers *national*, *nationalistic*, etc.). The resulting lists included foundation-supporting words (e.g., kindness, equality, patriot, obey, wholesome), as well as foundation-violating words (e.g., hurt, prejudice, betray, disrespect, disgusting). In the contractive phase, Jesse Graham and Jonathan Haidt deleted words that seemed too distantly related to the five foundations and also words whose primary meanings were not moral (e.g., *just* more often means *only* than *fair*).

How is their process similar to how you made your dictionary? How is it different? What are some pros and cons to each method?

In [None]:
# answer

---
## Part 2: Exploratory Data Analysis <a id='section 2'></a>

Now that we have our speech data and our dictionary, we can start our analysis. First, we'll formally state our hypothesis. Then, to visualize the data, we'll perform three steps:
1. Count the occurrences of words from our dictionary in each speech
2. Calculate how often words from each foundation are used by each political party
3. Plot the proportions on a bar graph
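Step 2 needs a choice of denominator, because speeches differ a lot in length. The sketch below assumes one reasonable choice: pool all of a party's speeches, then divide each foundation's total matches by the party's total word count. It also assumes you already have, for each speech, its Party value ('D' or 'R'), a dictionary of per-foundation match counts, and its word count. The helper name `party_proportions` is hypothetical.

```python
from collections import defaultdict

def party_proportions(records):
    """Pool per-speech counts by party and divide by the party's total words.

    Hypothetical helper. `records` is an iterable of (party, counts, n_words)
    tuples, where `counts` maps each foundation to the number of dictionary
    matches in one speech and `n_words` is that speech's word count.
    Returns {party: {foundation: matches / total words}}.
    """
    match_totals = defaultdict(lambda: defaultdict(int))  # party -> foundation -> matches
    word_totals = defaultdict(int)                        # party -> total words
    for party, counts, n_words in records:
        word_totals[party] += n_words
        for foundation, n in counts.items():
            match_totals[party][foundation] += n
    return {
        party: {f: n / word_totals[party] for f, n in by_foundation.items()}
        for party, by_foundation in match_totals.items()
        if word_totals[party] > 0  # skip parties with no words to avoid dividing by zero
    }
```

Pooling means longer speeches carry more weight. An alternative is to compute a proportion for each speech and then average those per party, which gives every speech equal weight; the two can differ noticeably when a few speeches are very long.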

### Hypothesis <a id='subsection 3'></a>

An important part of data science is understanding the question you're trying to answer and formulating an appropriate hypothesis. The hypothesis must be testable given your data, and you must be able to say what kinds of results would support or refute your hypothesis _even before you've done any analysis_. 

Today's question is whether the word choices of 2016 presidential candidates align with Moral Foundations Theory.

Think about what you know about Moral Foundations Theory. If this data is consistent with the theory, what should our analysis show for Republican candidates? What about for Democratic candidates? Try sketching a possible graph for each political party, assuming that candidates' speech aligns with the theory.

In [None]:
# answer

### Democrats <a id='subsection 4'></a>

### Republicans <a id='subsection 5'></a>

### Democrats vs Republicans <a id='subsection 6'></a>
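To compare the parties side by side, a grouped bar chart puts each foundation on the x-axis with one bar per party. The sketch below assumes the proportions are stored as `{party: {foundation: proportion}}` and that the party codes are 'D' and 'R', as in the table; the helper name `plot_party_proportions` is hypothetical. Missing values are drawn as zero-height bars.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_party_proportions(proportions, foundations, parties=("D", "R")):
    """Draw a grouped bar chart: one group per foundation, one bar per party.

    Hypothetical helper. `proportions` maps party -> {foundation: proportion};
    missing values are plotted as 0. Returns the Axes for further tweaking.
    """
    x = np.arange(len(foundations))   # one slot per foundation
    width = 0.8 / len(parties)        # split each slot among the parties
    fig, ax = plt.subplots(figsize=(8, 4))
    for k, party in enumerate(parties):
        heights = [proportions.get(party, {}).get(f, 0) for f in foundations]
        ax.bar(x + k * width, heights, width, label=party)
    ax.set_xticks(x + width * (len(parties) - 1) / 2)  # center labels under each group
    ax.set_xticklabels(foundations)
    ax.set_xlabel("Moral foundation")
    ax.set_ylabel("Proportion of words")
    ax.legend(title="Party")
    return ax
```

For example, `plot_party_proportions(props, list(mft_dict))` would draw one group per foundation in the dictionary, where `props` holds the per-party proportions from step 2.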

---
## Part 3: Further explorations <a id='section 3'></a>

Intro to section 3 here.

In [8]:
# CODE

---

## Bibliography

Election documents scraped from http://www.presidency.ucsb.edu/2016_election.php

---
Notebook developed by: Keeley Takimoto, Sean Seungwoo Son, Sujude Dalieh

Data Science Modules: http://data.berkeley.edu/education/modules
