# Part 1: causality


## Preparation

*This is Exercise 2.8.3 in Imai (2016).*

One longstanding debate in the study of international relations
concerns the question of whether individual political leaders can make a difference.  Some emphasize that leaders with different ideologies and personalities can significantly affect the course of a nation. Others argue that political leaders are severely constrained by historical and institutional forces.  Did individuals like Hitler, Mao, Roosevelt, and Churchill make a big difference?  The difficulty of empirically testing these arguments stems from the fact that the change of leadership is not random and there are many confounding factors to be adjusted for.

In this exercise, we consider a *natural experiment* in which the
success or failure of assassination attempts is assumed to be
essentially random.

This exercise is based on:
Jones, Benjamin F, and Benjamin A Olken. 2009. “[Hit or Miss?
 The Effect of Assassinations on Institutions and
 War.](http://dx.doi.org/10.1257/mac.1.2.55)”
 *American Economic Journal: Macroeconomics* 1(2): 55–87.

You can load the data as follows:

```python
import pandas as pd

# Build the URL of the raw CSV file in the QSS GitHub repository
gh_raw = "https://raw.githubusercontent.com/"
user = "kosukeimai/"
repo = "qss/"
branch = "master/"
filepath = "CAUSALITY/leaders.csv"

url = gh_raw + user + repo + branch + filepath
df = pd.read_csv(url)
```

Each observation of the CSV data set
`leaders.csv` contains information about an assassination
attempt.  The variables are:

- `country`: The name of the country
- `year`: Year of the assassination attempt
- `leadername`: Name of the leader who was targeted
- `age`: Age of the targeted leader
- `politybefore`: Average polity score during the 3-year period prior to the attempt
- `polityafter`: Average polity score during the 3-year period after the attempt
- `civilwarbefore`: 1 if the country is in civil war during the 3-year period prior to the attempt, 0 otherwise
- `civilwarafter`: 1 if the country is in civil war during the 3-year period after the attempt, 0 otherwise
- `interwarbefore`: 1 if the country is in international war during the 3-year period prior to the attempt, 0 otherwise
- `interwarafter`: 1 if the country is in international war during the 3-year period after the attempt, 0 otherwise
- `result`: Result of the assassination attempt, one of 10 categories

The `politybefore` and `polityafter` variables are based on the so-called
*polity score* from the Polity Project.  The Polity Project systematically
documents and quantifies the regime types of all countries in the world
since 1800.  The polity score is a 21-point scale ranging from -10
(hereditary monarchy) to 10 (consolidated democracy).

The `result` variable is a categorical variable with 10 categories
describing the result of each assassination attempt.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.formula.api as smf

sns.set()

%matplotlib inline
```


## Question 1

- How many assassination attempts are recorded in the data?  

- How many countries experienced at least one leader assassination attempt? (Recall that the `nunique` method of a Series returns its number of unique values.)

- What is the average number of assassination attempts per year among these countries?
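
As a hint, the following minimal sketch shows the counting operations on a small made-up data frame. The rows in `toy` are hypothetical and are not taken from `leaders.csv`; the per-year average shown is one possible reading of the question, namely the mean number of attempts across years in which at least one attempt occurred.

```python
import pandas as pd

# Hypothetical toy rows for illustration only (NOT from leaders.csv)
toy = pd.DataFrame({
    "country": ["A", "A", "B", "C", "C", "C"],
    "year": [1900, 1900, 1950, 1910, 1920, 1920],
})

# Each row is one attempt, so the number of rows is the number of attempts
n_attempts = len(toy)

# Number of distinct countries appearing in the data
n_countries = toy["country"].nunique()

# Count attempts in each year, then average over the years that appear
attempts_per_year = toy.groupby("year").size().mean()

print(n_attempts, n_countries, attempts_per_year)
```

The same three expressions should carry over to `df` unchanged, since they only rely on the `country` and `year` columns.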

## Question 2
Create a new binary variable named `success` that is equal
to 1 if a leader dies from the attack and to 0 if the leader
survives.  Store this new variable as part of the original data
frame.  

What is the overall success rate of leader assassination?

Does the result speak to the validity of the assumption that the
success of assassination attempts is randomly determined? Are the outcomes balanced?
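
One possible way to build such an indicator from a text column is sketched below. The category labels in `toy` are hypothetical, and the rule "fatal outcomes are exactly the labels starting with `dies`" is an assumption; inspect `df["result"].value_counts()` first to check which labels actually correspond to the leader's death.

```python
import pandas as pd

# Hypothetical labels for illustration only (check the real categories first)
toy = pd.DataFrame({
    "result": [
        "dies within a day",
        "not wounded",
        "wounded lightly",
        "dies, timing unknown",
    ]
})

# Assumption: a label starting with "dies" means the leader died
toy["success"] = toy["result"].str.startswith("dies").astype(int)

# The mean of a 0/1 variable is the share of ones, i.e. the success rate
success_rate = toy["success"].mean()

print(toy["success"].tolist(), success_rate)
```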

## Question 3


Investigate whether the average polity score over 3 years prior
  to an assassination attempt differs on average between successful
  and failed attempts.  Also, examine whether there is any difference
  in the age of targeted leaders between successful and failed
  attempts.  Briefly interpret the results in light of the validity of
  the aforementioned assumption.
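
A comparison of pre-treatment characteristics between the two groups could be sketched as follows. The numbers in `toy` are made up, and the sketch assumes the `success` column from Question 2 already exists.

```python
import pandas as pd

# Hypothetical values for illustration only (NOT from leaders.csv)
toy = pd.DataFrame({
    "success": [1, 1, 0, 0],
    "politybefore": [-2.0, 0.0, -4.0, -2.0],
    "age": [60, 50, 55, 45],
})

# Mean of each pre-attempt characteristic within failed (0) and successful (1) attempts
group_means = toy.groupby("success")[["politybefore", "age"]].mean()

# Difference in means: successful minus failed
diff = group_means.loc[1] - group_means.loc[0]

print(group_means)
print(diff)
```

If success is as good as random, differences like these should be small in the real data; large differences would cast doubt on the assumption.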


## Question 4

Repeat the same analysis as in the previous question, but this
  time using the country's experience of civil and international war.
  Create a new binary variable in the data frame called
  `warbefore`.  Code the variable such that it is equal to 1 if
  a country is in either civil or international war during the 3 years
  prior to an assassination attempt.  Provide a brief interpretation
  of the result.
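
The "either civil or international war" coding can be sketched with an element-wise logical OR. The rows in `toy` are hypothetical and simply cover all four combinations of the two indicators.

```python
import pandas as pd

# Hypothetical rows covering all four combinations (NOT from leaders.csv)
toy = pd.DataFrame({
    "civilwarbefore": [0, 1, 0, 1],
    "interwarbefore": [0, 0, 1, 1],
})

# 1 if either indicator equals 1, and 0 otherwise
either_war = (toy["civilwarbefore"] == 1) | (toy["interwarbefore"] == 1)
toy["warbefore"] = either_war.astype(int)

print(toy)
```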



## Question 5

- Does successful leader assassination cause democratization?
- Does successful leader assassination lead countries to war?
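
One way to approach the first question, offered here only as a sketch, is to compare the before-to-after change in the polity score between successful and failed attempts. The numbers in `toy` are made up, the column name `politychange` is introduced for illustration, and the `success` column from Question 2 is assumed to exist. The same pattern could be applied to the war indicators.

```python
import pandas as pd

# Hypothetical values for illustration only (NOT from leaders.csv)
toy = pd.DataFrame({
    "success": [1, 1, 0, 0],
    "politybefore": [-5.0, -3.0, -4.0, -2.0],
    "polityafter": [-3.0, -2.0, -4.0, -3.0],
})

# Change in the polity score from before to after the attempt
toy["politychange"] = toy["polityafter"] - toy["politybefore"]

# Average change within failed (0) and successful (1) attempts
change_by_group = toy.groupby("success")["politychange"].mean()

# Difference in average change: successful minus failed
diff_in_change = change_by_group.loc[1] - change_by_group.loc[0]

print(change_by_group)
print(diff_in_change)
```

Whether such a difference can be read causally depends on how convincing the randomness assumption looked in Questions 2 to 4.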

# Part 2: regression in supervised machine learning

In this second part of the exercise, we will implement a machine learning model for predicting tips. We will use the same data as in Exercise 1.

Load the tips data from Seaborn.

## Question 1

Specify `y` and `X` for the supervised machine learning problem. As part of this, convert categorical/discrete variables to dummy variables. Note that the pandas function `pd.get_dummies` with the keyword argument `drop_first=True` may be useful.
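
The effect of `drop_first=True` can be seen on a tiny example. The rows below are hypothetical; only a few column names are borrowed from the tips data, and the real data set has more columns.

```python
import pandas as pd

# Hypothetical rows for illustration only (NOT the actual tips data)
toy = pd.DataFrame({
    "tip": [1.0, 2.5, 3.0],
    "total_bill": [10.0, 20.0, 25.0],
    "sex": ["Female", "Male", "Male"],
    "day": ["Thur", "Fri", "Sat"],
})

# Target: the variable to predict
y = toy["tip"]

# Features: everything else, with categorical columns expanded into dummies.
# drop_first=True omits one level per variable to avoid perfect collinearity.
X = pd.get_dummies(toy.drop(columns="tip"), drop_first=True)

print(list(X.columns))
```

Numeric columns pass through unchanged, while each categorical column with k levels becomes k - 1 dummy columns.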


## Question 2

Using 5-fold cross-validation, compare the MAE and RMSE of OLS and ridge regression. For ridge regression, use `alpha=100` and `random_state=123`.
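
A minimal sketch of such a comparison is given below, assuming scikit-learn is used (the exercise does not name a library). The data are synthetic, and the shuffled `KFold` with its own `random_state` is an illustrative choice rather than part of the exercise. Note that the RMSE reported here is the average of the per-fold RMSEs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_validate

# Synthetic data for illustration only (NOT the tips data)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

# 5-fold cross-validation; shuffling is an assumption made for this sketch
cv = KFold(n_splits=5, shuffle=True, random_state=123)

# scikit-learn scorers are "higher is better", so errors are negated
scoring = {
    "mae": "neg_mean_absolute_error",
    "rmse": "neg_root_mean_squared_error",
}

models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=100, random_state=123),
}

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Flip the sign back and average across the five folds
    results[name] = {m: -scores[f"test_{m}"].mean() for m in scoring}

print(results)
```

Which model comes out ahead depends on the data; on this synthetic example there is no reason to expect the same ranking as on the tips data.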

## Question 3

Explain in words why ridge regression performs better than OLS regression.