### Hypothesis: Having (more) daughters makes legislators more likely to vote liberally (in terms of political alignment, and in contrast to conservatively) on issues concerning women.

As a measure of a liberal voting record, we use scores assigned by the American Association of University Women (AAUW), a liberal group that concerns itself with issues of interest to women. For the 108th Congress, the AAUW selected 9 pieces of legislation in the areas of education, equality and reproductive rights. The AAUW then assigned a score to each member of Congress. The scores range from 0 to 100 and measure the percentage of times the legislator voted in favor of the position held by the AAUW.

 The dataset `legislators.dta` contains the following characteristics for a random sample of 386 members of the 108th Congress:
 
 * $ngirls$ number of daughters
 * $totchi$ total number of children
 * $age$ age
 * $female$ indicator for being female
 * $repub$ indicator for being a Republican
 * $moredef$ proportion of people in the legislator's district who are in favor of "more spending on defense" 
 * $aauw$ AAUW score
 
(For the purposes of this exercise, you can assume all members of the 108th Congress were either Democrats or Republicans and were either male or female.) 

### Estimate and report results for the following regression models:

In [27]:
library(tidyverse)
library(haven)

leg <- read_dta("legislators.dta")
leg <- mutate(leg, ngirls2 = ngirls^2)

reg1 <- lm(aauw ~ female+repub+ngirls, data = leg)
reg2 <- lm(aauw ~ female+repub+ngirls+ngirls2+totchi, data = leg)
reg3 <- lm(aauw ~ female+repub+ngirls+repub*ngirls+repub*ngirls2+totchi+moredef, data = leg)

summary(reg1)  
summary(reg2)
summary(reg3)


Call:
lm(formula = aauw ~ female + repub + ngirls, data = leg)

Residuals:
    Min      1Q  Median      3Q     Max 
-86.215  -6.668  -5.976  13.439  56.024 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  86.5608     1.6251  53.266  < 2e-16 ***
female       11.4167     2.8473   4.010 7.31e-05 ***
repub       -79.5468     1.7993 -44.210  < 2e-16 ***
ngirls       -0.3460     0.7894  -0.438    0.661    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.4 on 382 degrees of freedom
Multiple R-squared:  0.8449,	Adjusted R-squared:  0.8437 
F-statistic: 693.9 on 3 and 382 DF,  p-value: < 2.2e-16



Call:
lm(formula = aauw ~ female + repub + ngirls + ngirls2 + totchi, 
    data = leg)

Residuals:
    Min      1Q  Median      3Q     Max 
-88.508  -7.606  -1.361  11.839  54.859 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  88.1609     1.9844  44.428  < 2e-16 ***
female       11.3457     2.8444   3.989 7.97e-05 ***
repub       -78.8260     1.8076 -43.609  < 2e-16 ***
ngirls        2.6152     1.7157   1.524   0.1283    
ngirls2      -0.1932     0.3217  -0.601   0.5485    
totchi       -2.0752     0.8056  -2.576   0.0104 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.28 on 380 degrees of freedom
Multiple R-squared:  0.8478,	Adjusted R-squared:  0.8458 
F-statistic: 423.5 on 5 and 380 DF,  p-value: < 2.2e-16



Call:
lm(formula = aauw ~ female + repub + ngirls + repub * ngirls + 
    repub * ngirls2 + totchi + moredef, data = leg)

Residuals:
    Min      1Q  Median      3Q     Max 
-85.436  -7.964  -1.367  11.292  54.591 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    95.5997     3.6777  25.995  < 2e-16 ***
female         11.6079     2.8334   4.097 5.13e-05 ***
repub         -79.4364     3.0424 -26.110  < 2e-16 ***
ngirls          0.4452     3.1682   0.141   0.8883    
ngirls2         0.5286     0.8568   0.617   0.5376    
totchi         -2.0364     0.8066  -2.525   0.0120 *  
moredef        -0.3166     0.1247  -2.540   0.0115 *  
repub:ngirls    2.1281     3.6217   0.588   0.5571    
repub:ngirls2  -0.7477     0.9302  -0.804   0.4220    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.2 on 377 degrees of freedom
Multiple R-squared:  0.8505,	Adjusted R-squared:  0.8474 
F-statistic: 268.2 on 8 and 377 DF, 

### Best fit model and why

Model 3 fits best on both criteria reported above. It has the lowest residual standard error (17.2 on 377 degrees of freedom, versus 17.28 on 380 for Model 2 and 17.4 on 382 for Model 1). It also has the highest adjusted R-squared (0.8474, versus 0.8458 and 0.8437). Adjusted R-squared is the better yardstick here: plain R-squared, the proportion of the variance in the response variable explained by the predictors, can never fall when regressors are added, whereas adjusted R-squared penalizes each extra regressor. The improvement from Model 2 to Model 3 is small.
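
For reference, the adjusted R-squared values printed in the summaries can be reproduced from the plain R-squared, the sample size $n = 386$, and the number of regressors $k$. A short worked check for Model 1 ($k = 3$), using the rounded values from the output:

$$ \bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} $$

$$ \bar{R}^2_{\text{Model 1}} = 1 - (1 - 0.8449)\,\frac{385}{382} \approx 0.8437 $$

The factor $(n-1)/(n-k-1)$ grows with $k$, so an added regressor raises the adjusted value only if it improves the fit by more than this penalty.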

### Interpreting the marginal effect of the number of daughters on the AAUW score, evaluated at the mean, in each model

In [28]:
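mean_girls <- mean(leg$ngirls)

# Coefficient tables: column 1 holds the estimates
reg1_coeff <- summary(reg1)$coef
reg2_coeff <- summary(reg2)$coef
reg3_coeff <- summary(reg3)$coef

# Model 1: linear in ngirls, so the marginal effect is the coefficient itself
reg1_coeff[4,1]
# Model 2: derivative of the quadratic, evaluated at the mean of ngirls
reg2_coeff[4,1] + 2 * reg2_coeff[5,1] * mean_girls
# Model 3, Republicans (repub = 1): includes the repub:ngirls and repub:ngirls2 terms
reg3_coeff[4,1] + 2 * reg3_coeff[5,1] * mean_girls + reg3_coeff[8,1] + 2 * reg3_coeff[9,1] * mean_girls
# Model 3, Democrats (repub = 0): interaction terms drop out
reg3_coeff[4,1] + 2 * reg3_coeff[5,1] * mean_girls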

**Model 1 = -0.35** 

In Model 1, each additional daughter is associated with an AAUW score about 0.35 points lower, holding gender and party constant. The estimate is small and not statistically distinguishable from zero (p = 0.661), so Model 1 provides no evidence that daughters shift the voting record in either direction.

**Model 2 = 2.15**

In Model 2, the marginal effect is positive: evaluated at the mean number of daughters, one more daughter is associated with an AAUW score about 2.15 points higher, holding gender, party, and total number of children constant. Because the model is quadratic in the number of daughters, this effect depends on where it is evaluated; the negative coefficient on $ngirls2$ means it shrinks as the number of daughters grows.

**Model 3 (republicans = 0) = 1.73**
**Model 3 (republicans = 1) = 2.04**

In Model 3, the marginal effect of the number of daughters on the AAUW score, evaluated at the mean and holding the other variables constant, is positive for both Democrats and Republicans. The code prints the Republican value first: the expression that includes the `repub:ngirls` and `repub:ngirls2` terms gives 2.04, and the expression without them gives 1.73. The effect is therefore slightly larger for Republicans (republicans = 1) than for Democrats (republicans = 0), although none of the daughter-related coefficients in Model 3 is individually significant.
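
The expressions evaluated in the code come from differentiating each fitted equation with respect to $ngirls$. A sketch using the coefficient estimates printed in the summaries, leaving the number of daughters symbolic rather than plugging in the sample mean:

$$
\begin{aligned}
\text{Model 2:}\quad \frac{\partial\,\widehat{aauw}}{\partial\,ngirls} &= 2.6152 + 2(-0.1932)\,ngirls = 2.6152 - 0.3864\,ngirls \\
\text{Model 3:}\quad \frac{\partial\,\widehat{aauw}}{\partial\,ngirls} &= (0.4452 + 2.1281\,repub) + 2\,(0.5286 - 0.7477\,repub)\,ngirls \\
repub = 0:\quad & 0.4452 + 1.0572\,ngirls \\
repub = 1:\quad & 2.5733 - 0.4382\,ngirls
\end{aligned}
$$

In Model 1 the fitted equation is linear in $ngirls$, so the derivative is simply the coefficient, $-0.3460$.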


### Testing the effect of number of daughters on AAUW scores using the second model

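$$ H_0: \beta_3 = \beta_4 = 0 $$
$$ H_1: \beta_3 \neq 0 \ \text{or} \ \beta_4 \neq 0 $$

Here $\beta_3$ and $\beta_4$ are the coefficients on $ngirls$ and $ngirls2$ in Model 2.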

In [29]:
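# Restricted model drops ngirls and ngirls2; the unrestricted model is Model 2
restricted <- lm(aauw ~ female + repub + totchi, data = leg)
unrestricted <- reg2

R2r <- summary(restricted)$r.squared
R2u <- summary(unrestricted)$r.squared
q <- 2                  # number of restrictions under H0
k <- 5                  # regressors in the unrestricted model
n <- nobs(restricted)   # sample size

# R-squared form of the F statistic (named F_stat to avoid masking R's F alias for FALSE)
F_stat <- ((R2u - R2r) / q) / ((1 - R2u) / (n - k - 1))
F_stat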

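Our F statistic is 1.477. The 5% critical value of the F distribution with 2 and 380 degrees of freedom is approximately 3, so we fail to reject the null hypothesis at the 5% significance level: $ngirls$ and $ngirls2$ are not jointly significant in Model 2.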
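
The critical value and a p-value for this test can be checked directly in base R. A minimal sketch, assuming the F statistic of 1.477 reported above and the degrees of freedom of Model 2 (2 restrictions, 380 residual degrees of freedom); the variable names `f_stat`, `df_res`, `f_crit`, and `p_value` are illustrative and do not appear in the original code.

```r
# Values taken from the Model 2 test above (names are illustrative)
f_stat <- 1.477   # F statistic reported above
q      <- 2       # number of restrictions (ngirls and ngirls2)
df_res <- 380     # residual degrees of freedom of Model 2 (n - k - 1)

# Critical value of the F(q, df_res) distribution at the 5% level
f_crit <- qf(0.95, df1 = q, df2 = df_res)

# p-value: probability of an F statistic at least this large under H0
p_value <- pf(f_stat, df1 = q, df2 = df_res, lower.tail = FALSE)

f_crit
p_value
f_stat > f_crit   # TRUE would mean rejecting H0 at the 5% level
```

Reporting the p-value alongside the critical-value comparison shows how far the statistic is from the rejection region, not just which side of it the statistic falls on.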

### Predicting the AAUW score for male Democrats who have 3 daughters and 0 sons, and whose districts have 36% of constituents in favor of more spending on defense, with a 95% CI for the predicted value

In [37]:
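# Recenter the regressors at the target profile: 3 daughters, 3 children, moredef = 36.
# With female = 0 and repub = 0, the intercept of the recentered model is the prediction.
ngirls_n <- leg$ngirls - 3
ngirls2_n <- ngirls_n^2
repubngirls_n <- ngirls_n * leg$repub
repubngirls2_n <- ngirls2_n * leg$repub
totchi_n <- leg$totchi - 3
moredef_n <- leg$moredef - 36

recentered <- lm(aauw ~ female + repub + ngirls_n + ngirls2_n + repubngirls_n + repubngirls2_n + totchi_n + moredef_n, data = leg)
SE <- summary(recentered)$coeff[1,2]   # standard error of the intercept
B0 <- summary(recentered)$coeff[1,1]   # intercept = predicted AAUW score

# 95% confidence interval using the normal critical value 1.96
upper_CI <- B0 + 1.96 * SE
lower_CI <- B0 - 1.96 * SE

upper_CI
lower_CI

AAUW_SCORE <- B0
AAUW_SCORE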

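The predicted AAUW score is 84.19, with a 95% confidence interval of [77.72, 90.66]. Because the interval is built from the standard error of the intercept, it is a confidence interval for the expected score of legislators with this profile, not a prediction interval for an individual legislator.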
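
As a cross-check, the same point prediction can be recovered by hand from the Model 3 coefficients reported earlier, setting $female = 0$ and $repub = 0$ so that the party and interaction terms drop out. The standard error of about 3.30 used below is not printed anywhere in the output; it is backed out from the reported interval as $(90.66 - 77.72)/(2 \times 1.96)$ and should be read as an implied value.

$$ \widehat{aauw} = 95.5997 + 0.4452\,(3) + 0.5286\,(3^2) - 2.0364\,(3) - 0.3166\,(36) \approx 84.19 $$

$$ 84.19 \pm 1.96 \times 3.30 \approx [77.72,\ 90.66] $$

The hand calculation reproduces the point estimate, but not the standard error: that depends on the full covariance matrix of the coefficients, which is why the recentered regression is used.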