![Banner](images/dsep-banner.png)
# **Welcome to the Applied Methods for Social Sciences in Python Workshop**

By John Park, Simran, Barry, etc

In Collaboration with the Division of Data Science's [Data Peer Consulting](https://data.berkeley.edu/ds-peer-consulting)

### BEFORE WE BEGIN PLEASE COMPLETE THIS SURVEY
[TODO](https://forms.gle/rEtcPP1VRRJTsjxQ9)  

## John Park
![John](https://data.berkeley.edu/sites/default/files/styles/width_400/public/john_pic_-_john_park_0.jpg?itok=-kg9pNQg&timestamp=1599267808)

Quick Facts About Me:

    🐻 Senior at Cal
    🎒 Studying Computer Science and Economics
    🏢 Interned and returning full-time as an SDE@Amazon
    📊 Joined the Data Peer Consulting team in Fall 2018

How to Reach Me:

    📮 Email: jhp@berkeley.edu

## Simran Sachdev
![Simran](https://data.berkeley.edu/sites/default/files/styles/width_400/public/headshot_-_simran_sachdev.jpg?itok=PgdDBm5M&timestamp=1599267430)

Quick Facts About Me:

    🐻 Senior at Cal
    🎒 Studying Data Science and Applied Math
    🏢 Data Scientist Intern at Boston Scientific
    📊 Joined the Data Peer Consulting team in Spring 2020

How to Reach Me:

    📮 Email: ssach@berkeley.edu

## Barry's bio (altho do we realllllly need it?)

## Spencer's bio (very important)

<a class="anchor" id="tof"></a>
## Table of Contents

Use anchors to set these hyperlinks to jump to certain locations in the notebook. 

- [Introduction](#1)
- [Example](#2)
- [Reference Sheets](#rs)

---

## Workshop Goals

The goal of this workshop is to cover the fundamental tools offered in Python for applied methods in the social sciences. 

    - Learn how to import and modify data in Python via Pandas
    - Apply OLS and related statistical techniques
    - Demonstrate a basic example workflow from raw data to analysis
    
Specifically, we will be working with statsmodels and linearmodels. These packages provide a wide range of statistical tools for the social sciences, including least squares, panel methods, and mixed models.

--- 

## Why Python?

Python is one of the most popular general-purpose programming languages, and for good reason. Python's readability, maintainability, and robust community support, especially in the IPython sphere, make it a strong choice for data science, including in the social sciences.

### Transitioning from R

Both statsmodels and linearmodels provide support for R-like formula expressions. You can read more about it [here](https://www.statsmodels.org/stable/examples/notebooks/generated/formulas.html).
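For readers coming from R, here is a minimal sketch of what a formula-based fit can look like. The data frame `toy` and its columns `x`, `group`, and `y` are made up for illustration and are not part of the workshop data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data, for illustration only (not the workshop dataset)
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "x": rng.normal(size=n),
    "group": rng.choice(["a", "b", "c"], size=n),
})
toy["y"] = 1.0 + 2.0 * toy["x"] + rng.normal(scale=0.5, size=n)

# Rough R equivalent: lm(y ~ x + factor(group), data = toy)
fit = smf.ols("y ~ x + C(group)", data=toy).fit()
print(fit.params)
```

`C(group)` marks `group` as categorical, so the formula machinery adds an intercept and creates the dummy columns for you.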

## Introduction 

Let's go through a basic example from the [statsmodels documentation](https://www.statsmodels.org/stable/gettingstarted.html). The dataset is a collection of historical data used in support of Andre-Michel Guerry’s 1833 Essay on the Moral Statistics of France.

In [6]:
### standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

In [11]:
# Grab sample data from Statsmodels dataset repo

df = sm.datasets.get_rdataset("Guerry", "HistData").data

In [12]:
### Note that the data is already in a Pandas DataFrame

type(df)

pandas.core.frame.DataFrame

In [13]:
df

Unnamed: 0,dept,Region,Department,Crime_pers,Crime_prop,Literacy,Donations,Infants,Suicides,MainCity,...,Crime_parents,Infanticide,Donation_clergy,Lottery,Desertion,Instruction,Prostitutes,Distance,Area,Pop1831
0,1,E,Ain,28870,15890,37,5098,33120,35039,2:Med,...,71,60,69,41,55,46,13,218.372,5762,346.03
1,2,N,Aisne,26226,5521,51,8901,14572,12831,2:Med,...,4,82,36,38,82,24,327,65.945,7369,513.00
2,3,C,Allier,26747,7925,13,10973,17044,114121,2:Med,...,46,42,76,66,16,85,34,161.927,7340,298.26
3,4,E,Basses-Alpes,12935,7289,46,2733,23018,14238,1:Sm,...,70,12,37,80,32,29,2,351.399,6925,155.90
4,5,E,Hautes-Alpes,17488,8174,69,6962,23076,16171,1:Sm,...,22,23,64,79,35,7,1,320.280,5549,129.10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
81,86,W,Vienne,15010,4710,25,8922,35224,21851,2:Med,...,20,1,44,40,38,65,18,170.523,6990,282.73
82,87,C,Haute-Vienne,16256,6402,13,13817,19940,33497,2:Med,...,68,6,78,55,11,84,7,198.874,5520,285.13
83,88,E,Vosges,18835,9044,62,4040,14978,33029,2:Med,...,58,34,5,14,85,11,43,174.477,5874,397.99
84,89,C,Yonne,18006,6516,47,4276,16616,12789,2:Med,...,32,22,35,51,66,27,272,81.797,7427,352.49


Let's explore whether literacy rates are associated with per capita wagers in the Royal Lottery. Our model will need to control for wealth in each region, and use dummy variables to control for unobserved heterogeneity due to regional effects.

We will use OLS to estimate this model, as described below:

$$\hat{\beta} = (X'X)^{-1}X'y$$

where $y$ is an $N \times 1$ column of per-capita lottery wagers (Lottery), and $X$ is an $N \times 7$ matrix containing a constant, literacy, wealth, and 4 regional dummy variables.
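As a sanity check on the formula, here is a small sketch that computes $\hat{\beta}$ directly with NumPy. The data are simulated and the names `x1`, `x2`, and `true_beta` are assumptions for illustration; the design matrix has 3 columns rather than the 7 used in the Guerry model.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 500
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
X = np.column_stack([np.ones(N), x1, x2])  # N x 3 design matrix with a constant
true_beta = np.array([1.0, 2.0, -0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=N)

# beta_hat = (X'X)^{-1} X'y, solved without forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same least-squares problem
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
print(beta_lstsq)
```

Solving the normal equations with `np.linalg.solve` is numerically preferable to calling `np.linalg.inv` and multiplying. In practice a library routine such as `sm.OLS` also reports standard errors.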

To actually implement this model, there are a couple of options, especially with regard to how we encode the dummy variables.
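One way to see the encoding choices is the sketch below. The frame `toy` reuses the `Region` column name, but its values are made up for illustration.

```python
import pandas as pd

# Toy frame with a categorical column; the values are made up for illustration.
toy = pd.DataFrame({"Region": ["C", "E", "N", "S", "W", "E", "N"]})

# Option 1: build the dummies by hand. drop_first=True drops the first
# category ("C") so the dummies are not collinear with the constant.
dummies = pd.get_dummies(toy["Region"], prefix="Region", drop_first=True, dtype=float)
print(dummies.columns.tolist())
# ['Region_E', 'Region_N', 'Region_S', 'Region_W']

# Option 2 (formula API): write C(Region) in the formula and let the
# formula machinery create the 4 dummies, e.g.
#   smf.ols("Lottery ~ Literacy + Wealth + C(Region)", data=df)
```

With five regions, either option yields four dummy columns, which together with the constant, literacy, and wealth gives the seven columns of $X$.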

## TODO 

Finish basic intro + modified model exercise 

## Example: Contingent valuation

The file <code>LoomisCVData.csv</code> contains a subset of the dataset used by Loomis et al. (1996). The dataset consists of five columns. The first column lists a bid amount randomly proposed to a respondent to assess their willingness to pay for a fire management program for old growth Pacific Northwest forests. The second column lists the number of respondents saying yes to the project under version one of the survey. The third column lists the number of respondents saying no to the project under version one of the survey. Columns four and five give yes and no responses for the same bid amount but under a different version of the survey. 

In [4]:
loomisCVD = pd.read_csv("LoomisCVData.csv")
loomisCVD

Unnamed: 0,BidAmount,NumYes_v1,NumNo_v1,NumYes_v2,NumNo_v2
0,2,3,2,8,1
1,5,6,2,7,1
2,8,4,1,5,4
3,10,5,1,8,2
4,12,6,4,7,1
5,15,10,0,4,1
6,20,6,1,3,2
7,25,0,1,2,1
8,30,3,1,5,0
9,35,3,2,6,0


1. Let's begin by writing a short Python script to transform the given data file into one-respondent-per-row form. The first column should equal $D = 1$ if the respondent answered yes and $D = 0$ if they answered no. The second column should give the bid amount, $A$. The third column is $X = 1$ if the response was solicited from version one of the survey and $X = 0$ if it was from version two. 

In [13]:
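def transform_data(df):
    """Expand the per-bid yes/no counts into one row per respondent."""
    D_ = []  # 1 if the respondent said yes, 0 if no
    A_ = []  # bid amount offered
    X_ = []  # 1 for survey version one, 0 for version two
    
    def add_trow(d, a, x):
        D_.append(d)
        A_.append(a)
        X_.append(x)

    for _, row in df.iterrows():
        b = row['BidAmount']
        # emit one row per respondent counted in each (answer, version) cell
        for _ in range(int(row['NumYes_v1'])):
            add_trow(1, b, 1)
        for _ in range(int(row['NumNo_v1'])):
            add_trow(0, b, 1)
        for _ in range(int(row['NumYes_v2'])):
            add_trow(1, b, 0)
        for _ in range(int(row['NumNo_v2'])):
            add_trow(0, b, 0)
            
    return pd.DataFrame(columns=['D', 'A', 'X'], data=np.array([D_, A_, X_]).T)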

In [15]:
cvd_transformed = transform_data(loomisCVD)

### transformed data set
print(cvd_transformed)

     D    A  X
0    1    2  1
1    1    2  1
2    1    2  1
3    0    2  1
4    0    2  1
..  ..  ... ..
255  0  300  0
256  0  300  0
257  0  300  0
258  0  300  0
259  0  300  0

[260 rows x 3 columns]
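The same reshaping can also be done without explicit loops. The sketch below is one possible alternative, not the approach used above: it melts the counts into long form and repeats each row by its count. The two-row `counts` frame is made up for illustration, and the resulting row order differs from the loop version.

```python
import pandas as pd

# Two made-up rows in the same layout as the counts table (illustration only)
counts = pd.DataFrame({
    "BidAmount": [2, 5],
    "NumYes_v1": [3, 1],
    "NumNo_v1": [2, 0],
    "NumYes_v2": [1, 2],
    "NumNo_v2": [1, 1],
})

# Long form: one row per (bid, response cell) with its count
long_df = counts.melt(id_vars="BidAmount", var_name="cell", value_name="n")
long_df["D"] = long_df["cell"].str.startswith("NumYes").astype(int)  # 1 = yes
long_df["X"] = long_df["cell"].str.endswith("_v1").astype(int)       # 1 = version one

# Repeat each row n times to get one row per respondent
expanded = long_df.loc[long_df.index.repeat(long_df["n"]), ["D", "BidAmount", "X"]]
expanded = expanded.rename(columns={"BidAmount": "A"}).reset_index(drop=True)
print(expanded.shape)
```

If the real file carries an extra index column such as `Unnamed: 0`, drop it before melting so it is not treated as a response cell.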


2. Let's assume that willingness-to-pay for the fire management program for a randomly sampled person is: $$ W = \alpha + X'\beta + V$$ where $V | X, A \sim \mathcal{N}(0, \sigma^2)$ captures heterogeneity in willingness-to-pay across individuals. 

(Is this assumption robust? See errata)

3. Assume that individuals respond yes to the proposal if their willingness-to-pay exceeds the bid they were offered. Then:

$$Pr (D = 1| X, A) = \Phi\left(\frac \alpha \sigma - \frac 1 \sigma A + X' \frac \beta \sigma \right)$$ with $\Phi(\cdot)$ the CDF of the standard normal distribution.

(See errata for proof)
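To get a feel for this expression, the sketch below evaluates it at a few bids. The parameter values are illustrative round numbers chosen for this example, not estimates.

```python
from scipy.stats import norm

# Illustrative values only (assumed for this sketch)
alpha, sigma, beta = 90.0, 105.0, -17.0

def prob_yes(A, X):
    """Pr(D = 1 | X, A) = Phi(alpha/sigma - A/sigma + X*beta/sigma)."""
    return norm.cdf(alpha / sigma - A / sigma + X * beta / sigma)

for bid in (10, 50, 100):
    print(bid, round(prob_yes(bid, 0), 3), round(prob_yes(bid, 1), 3))
```

The probability of a yes falls as the bid rises, and with $X = 0$ it equals $0.5$ exactly when the bid equals $\alpha$.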

4. Use probit regression analysis to construct estimates of the composite parameters $\frac \alpha \sigma$, $- \frac 1 \sigma$, $\frac \beta \sigma$. From these estimates recover estimates of the fundamental preference parameters $\alpha$, $\beta$ and $\sigma$. Describe and implement a bootstrap procedure to construct standard error estimates for these parameters. Summarize your results in a table and provide a brief discussion.

In [2]:
#define endog, explanatory variables
X = sm.add_constant(cvd_transformed[['A', 'X']])
y = cvd_transformed['D']

#probit model
probit_model = sm.Probit(y, X)
results = probit_model.fit()
param_estimates = results.params
print(param_estimates)


In [17]:
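### Bootstrap Procedure: Probit on const, A, X

B = 1000  # number of bootstrap replications
N = 260   # resample size (number of respondents)

def param_bootstrap(df):
    const = []
    A_ = []
    X_ = []
    for i in range(B):
        
        # resample respondents with replacement
        sample_df = df.sample(n=N, replace=True)
        
        # probit procedure from earlier
        X = sm.add_constant(sample_df[['A', 'X']])
        y = sample_df['D']
        
        probit_model = sm.Probit(y, X)
        results = probit_model.fit()  # fit(disp=False) would silence the per-fit message
        
        const.append(results.params['const'])
        A_.append(results.params['A'])
        X_.append(results.params['X'])
    # Columns hold the raw probit coefficients (alpha/sigma, -1/sigma, beta/sigma),
    # labeled by the parameter each one is later rescaled into.
    return pd.DataFrame(columns=['alpha', 'sigma', 'beta'], data=np.array([const, A_, X_]).T)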

In [18]:
bootstrap_param_results = param_bootstrap(cvd_transformed)

Optimization terminated successfully.
         Current function value: 0.577221
         Iterations 6
...
[output truncated: the same convergence message is printed for each bootstrap fit]

In [19]:
#estimated parameters
estimate_sigma = -(1 / param_estimates['A'])
estimate_beta = estimate_sigma * param_estimates['X']
estimate_alpha = estimate_sigma * param_estimates['const']

#estimated parameters from bootstrap
sigma_bs = -(1/bootstrap_param_results['sigma'])
beta_bs = bootstrap_param_results['beta'] * sigma_bs
alpha_bs = sigma_bs * bootstrap_param_results['alpha']

bs_params=pd.DataFrame(columns=['alpha', 'sigma', 'beta'], data=np.array([alpha_bs, sigma_bs, beta_bs]).T)

print(estimate_alpha,estimate_sigma, estimate_beta)
print(bs_params.describe())

89.93127075029903 104.96104077116776 -16.864657183530923
             alpha        sigma         beta
count  1000.000000  1000.000000  1000.000000
mean     89.759382   104.632860   -16.721134
std      13.465770    16.759658    18.233961
min      55.268984    59.705079   -74.850496
25%      80.600313    92.855216   -29.356307
50%      89.257741   103.921370   -16.895612
75%      98.753326   114.438039    -5.370041
max     137.295144   166.904707    55.173546


The parameters recovered from our probit regression on the dataset are:

$$\hat{\alpha} = 89.93$$
$$\hat{\sigma} = 104.96$$
$$\hat{\beta} = -16.86$$

The bootstrap distribution of the recovered parameters is summarized in the table above. The bootstrap standard errors are the values in its `std` row:

$$SE(\hat{\alpha})  = 13.47 $$
$$SE(\hat{\sigma}) = 16.76 $$
$$SE(\hat{\beta}) = 18.23 $$
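Because the resampling is random, bootstrap standard errors change slightly from run to run unless the random number generator is seeded. Below is a minimal sketch of a seeded bootstrap. The helper `bootstrap_se` is hypothetical, and it uses a sample mean on made-up data in place of the probit fit to keep the example short.

```python
import numpy as np
import pandas as pd

def bootstrap_se(df, stat, n_boot=500, seed=0):
    """Bootstrap standard error of stat(df); the seed makes the draws repeatable."""
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_boot):
        # resample row positions with replacement
        idx = rng.integers(0, len(df), size=len(df))
        draws.append(stat(df.iloc[idx]))
    return float(np.std(draws, ddof=1))

# Made-up responses, for illustration only
toy = pd.DataFrame({"D": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]})
se_1 = bootstrap_se(toy, lambda d: d["D"].mean())
se_2 = bootstrap_se(toy, lambda d: d["D"].mean())
print(se_1, se_1 == se_2)
```

With the same seed, the two calls return identical standard errors, which makes reported numbers reproducible.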

### Discussion:

Our parameter estimates have the expected signs: the coefficient on the bid amount is negative, so a higher bid lowers the probability of support. The sign of $\hat{\beta}$ is more interesting. Loomis et al. found no statistical difference between the two surveys. Although our point estimate $\hat{\beta} = -16.86$ is below zero, it lies well inside the 95% band around zero implied by the bootstrap standard deviation of $18.23$ in the table above, roughly $[-35.7, 35.7]$, so we also cannot reject $\beta = 0$ at the $0.05$ level. Our results are therefore in line with what was expected both from the literature and intuitively.
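The arithmetic behind this comparison can be sketched in a few lines. The two inputs are the rounded point estimate and the bootstrap `std` of beta from the output above. The normal approximation with critical value 1.96 is an assumption of this sketch.

```python
beta_hat = -16.86  # point estimate reported above (rounded)
se_beta = 18.23    # bootstrap std of beta from the describe() table (rounded)

z = beta_hat / se_beta       # distance from zero in standard-error units
half_width = 1.96 * se_beta  # 95% normal-approximation band around zero
reject = abs(beta_hat) > half_width

print(round(z, 2), round(half_width, 1), reject)
```

Since the estimate is less than one standard error from zero, the test does not reject equality of the two survey versions.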

5. You are part of an environmental conservation group that is campaigning for a ballot initiative that would fund a fire management program like the one studied by Loomis et al. (1996). The type of initiative you wrote needs to pass with a majority of 67 percent. Your organization wrote the ballot initiative with a proposed tax of $\hat{A}^{*} - 0.05$ per person, with $\hat{A}^{*}$ equal to $$ \hat{A}^{*} = \hat{\alpha} - \hat{\sigma}\Phi^{-1}(0.67)$$ Here $\hat{\alpha}$ and $\hat{\sigma}$ correspond to your point estimates from question 4 above. Explain the reasoning behind choosing the proposed tax in this way. Construct an estimate of this tax (as well as a standard error using the bootstrap).

In [20]:
from scipy.stats import norm

tax_estimate = estimate_alpha - estimate_sigma*norm.ppf(0.67) - 0.05
print(tax_estimate)

43.70752703229726


In [None]:
### Bootstrap Procedure: Tax = A^* - 0.05

B = 10000
N = 260
def a_est_bootstrap(df):
    
    a_hats = []
    
    for i in range(B):
        
        #sample with replacement
        sample_df = df.sample(n=N, replace=True)
        
        X = sm.add_constant(sample_df[['A', 'X']])
        y = sample_df['D']
        
        probit_model = sm.Probit(y, X)
        results = probit_model.fit()
        
        sigma_hat = -(1 / results.params['A'])
        alpha_hat = sigma_hat * results.params['const']
        
        a_hat_estimate = alpha_hat - (sigma_hat*norm.ppf(0.67)) - 0.05
        a_hats.append(a_hat_estimate)
        
    return pd.DataFrame(columns=['a_hat'], data=np.array([a_hats]).T)

In [None]:
a_hat_bs = a_est_bootstrap(cvd_transformed)
a_hat_bs

In [82]:
print(tax_estimate)  # point estimate from the earlier cell; a_hat_estimate only exists inside a_est_bootstrap
print(a_hat_bs.describe())

43.70752703229727
              a_hat
count  10000.000000
mean      43.499230
std       12.815674
min      -17.709199
25%       35.109533
50%       43.667682
75%       52.098677
max       97.761347


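We construct the tax in this fashion in order to target the willingness-to-pay (WTP) of at least $67\%$ of the population, i.e. to find a tax level that is less than or equal to the WTP of at least $67\%$ of the population. Under the model, the share of people whose WTP exceeds a tax $A$ is $\Phi\left(\frac{\hat{\alpha} - A}{\hat{\sigma}}\right)$; setting this equal to $0.67$ and solving for $A$ gives $\hat{A}^{*}$. The $\hat{\sigma}\Phi^{-1}(0.67)$ term shifts the tax below $\hat{\alpha}$ (the mean willingness to pay) by exactly the amount needed for $67\%$ of the population to have a WTP above it, and subtracting a further $0.05$ keeps predicted support strictly above the threshold.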

We find our estimate of the tax to be $43.71$, with a bootstrap standard error of $12.82$, as shown in the output above.
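A quick numerical check of this reasoning is sketched below, using the rounded point estimates and taking $X = 0$, as the formula for $\hat{A}^{*}$ does.

```python
from scipy.stats import norm

alpha_hat, sigma_hat = 89.93, 104.96  # rounded point estimates from above

A_star = alpha_hat - sigma_hat * norm.ppf(0.67)

# Predicted share of yes votes at A* and at the proposed tax A* - 0.05
support_at_A_star = norm.cdf((alpha_hat - A_star) / sigma_hat)
support_at_tax = norm.cdf((alpha_hat - (A_star - 0.05)) / sigma_hat)

print(round(A_star, 2), round(support_at_A_star, 4), support_at_tax > 0.67)
```

At $\hat{A}^{*}$ the predicted support is exactly $0.67$, and the slightly lower proposed tax pushes it just above the threshold.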

<a class="anchor" id="rs"></a>
### Reference Sheets!
[Back to Table of Contents](#tof)

Links updated as of 1/1/11.

- [NumPy Cheat Sheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)  
- [Pandas Cheat Sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)  
- [Matplotlib Cheat Sheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf)  
- [Seaborn Cheat Sheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Seaborn_Cheat_Sheet.pdf)

Documentation

- [Statsmodels Docs](https://www.statsmodels.org/stable/index.html)
- [Linearmodels Docs](https://bashtage.github.io/linearmodels/)


### More Resources
1. Data Peer Consultants - That's us! We help undergrads and graduate students with projects, research, and more! Come to our drop-in hours.  
https://data.berkeley.edu/ds-peer-consulting

2. Towards Data Science - Website full of good blogs and helpful introductions to data science stuff.  
https://towardsdatascience.com/

3. Stack Overflow // Google - A great data scientist is adept at using Stack Overflow and Google to find the answers to their bugs. More likely than not, someone out there has run into the exact same problem as you, so you might as well use their solutions as a resource!

## Thanks for Coming! PLEASE COMPLETE THIS POST-WORKSHOP SURVEY!  
[TODO](https://forms.gle/gfuYbKTFEscnkrMY8)

### Errata

#### 2. 
The survey design ensures independence of $V$ from $X$ and $A$, as $X$ and $A$ are randomly assigned to the population without any selection criterion. More specifically, the random selection of respondents, coupled with the further random assignment of $X$ (via survey version), ensures independence by construction. 

#### 3.

We begin our analysis by examining the point of indifference between $W$ and $A$:

$$A = W$$

Standardize

$$\frac A \sigma = \frac W \sigma$$

Expand $W$

$$\frac{A}{\sigma} = \frac{\alpha}{\sigma} + X^{'} \frac{\beta}{\sigma} + \frac{V}{\sigma}$$

Move $\frac{A}{\sigma}$ to the right-hand side. The point of indifference is where the following expression equals zero:

$$\frac{\alpha}{\sigma} - \frac{1}{\sigma} A + X^{'} \frac{\beta}{\sigma} + \frac{V}{\sigma}$$

Let the above expression be $Y$. We realize that:

$$Pr(D=1|X,A) = Pr(Y > 0 | X, A)$$

Substituting:

$$Pr(D=1|X,A) = Pr\left(\frac{\alpha}{\sigma} - \frac{1}{\sigma} A + X^{'} \frac{\beta}{\sigma} + \frac{V}{\sigma} > 0 | X, A \right)$$


$$ = Pr\left(\frac{V}{\sigma} > -\frac{\alpha}{\sigma} + \frac{1}{\sigma} A - X^{'} \frac{\beta}{\sigma}| X, A \right)$$

By symmetry of the normal distribution and as $V | X, A \sim \mathcal{N}(0, \sigma^2)$:

$$ = Pr\left(\frac{V}{\sigma} < \frac{\alpha}{\sigma} - \frac{1}{\sigma} A + X^{'} \frac{\beta}{\sigma}| X, A \right)$$

Thus:

$$ Pr(D=1|X,A) = \Phi\left(\frac{\alpha}{\sigma} - \frac{1}{\sigma} A + X^{'} \frac{\beta}{\sigma}\right)$$
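The result can also be checked by simulation. The sketch below draws $V$ from the assumed normal distribution, applies the yes rule $W > A$, and compares the simulated share of yes answers with the closed form. All parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha, beta, sigma = 90.0, -17.0, 105.0  # illustrative values
A, X = 50.0, 1                           # one bid / survey-version cell
n = 200_000

V = rng.normal(0.0, sigma, size=n)
W = alpha + X * beta + V                 # simulated willingness-to-pay
simulated = np.mean(W > A)               # share answering yes
closed_form = norm.cdf(alpha / sigma - A / sigma + X * beta / sigma)

print(round(simulated, 3), round(closed_form, 3))
```

The two numbers should agree up to simulation noise, which shrinks as `n` grows.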