In [1]:
from __future__ import division
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

### 1. Run a multiple linear probability model (have at least 2 Xs in the model). Tell me how you think your independent variables will affect your dependent variable. Interpret your results. Were your expectations correct? Why or why not?

My dependent variable is:

- What about sexual relations between two adults of the same sex -- do you think it is always wrong (1), almost always wrong (2), wrong only sometimes (3), or not wrong at all (4)? (HOMOSEX)

My independent variables are:

- What race do you consider yourself? (RACE)

 1) White

 2) Black

 3) Other

- How many (family) are strongly liberal? (ACQFMLIB)

 1) 0

 2) 1

 3) 2-5

 4) 6-10

 5) More than 10

I expect white respondents to be more accepting of same-sex relations than Black respondents and respondents of other races, and I expect respondents with more strongly liberal family members to be more accepting as well.

In [2]:
g = pd.read_csv("GSS.2006.csv")
g.head()

Unnamed: 0,vpsu,vstrat,adults,ballot,dateintv,famgen,form,formwt,gender1,hompop,...,away7,gender14,old14,relate14,relhh14,relhhd14,relsp14,where12,where6,where7
0,1,1957,1,3,316,2,1,1,2,3,...,,,,,,,,,,
1,1,1957,2,2,630,1,2,1,2,2,...,,,,,,,,,,
2,1,1957,2,2,314,2,1,1,2,2,...,,,,,,,,,,
3,1,1957,1,1,313,1,2,1,2,1,...,,,,,,,,,,
4,1,1957,3,1,322,2,2,1,2,3,...,,,,,,,,,,


In [3]:
# Drop rows with a missing value on any model variable; copy to avoid SettingWithCopyWarning
sub = g.dropna(subset=["homosex", "race", "acqfmlib"]).copy()

In [4]:
# Recode the dependent variable as binary:
# 0 = always or almost always wrong, 1 = wrong only sometimes or not wrong at all
conditions = [
    (sub['homosex'] <= 2) ,
    (sub['homosex'] > 2)]
choices = [0, 1]
sub['hms'] = np.select(conditions, choices, default=np.nan)

In [5]:
# Check that the recode worked okay
pd.crosstab(index=sub["hms"], columns="count")

col_0,count
hms,Unnamed: 1_level_1
0.0,155
1.0,98
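
Once missing values have been dropped, the same recode can also be written as a single comparison. The following is a minimal sketch on made-up data (the `toy` frame is hypothetical, not GSS data) that checks the two approaches agree:

```python
import numpy as np
import pandas as pd

# Hypothetical responses on the 1-4 HOMOSEX scale, with no missing values
toy = pd.DataFrame({"homosex": [1, 2, 3, 4, 4, 1]})

# Recode with np.select, as in the cell above
conditions = [toy["homosex"] <= 2, toy["homosex"] > 2]
via_select = np.select(conditions, [0, 1], default=np.nan)

# Equivalent recode with a boolean comparison cast to integers
via_compare = (toy["homosex"] > 2).astype(int)

# Confirm both recodes give the same values, then tabulate them
same = bool((via_select == via_compare.to_numpy()).all())
counts = via_compare.value_counts().sort_index()
print(same)
print(counts)
```

The comparison form only matches `np.select` when the column has no missing values, because a missing value compared with 2 evaluates to False and would be coded 0 rather than NaN.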


In [7]:
# Recode acqfmlib as binary:
# 0 = zero or one strongly liberal family member, 1 = two or more
conditions = [
    (sub['acqfmlib'] <= 2) ,
    (sub['acqfmlib'] > 2)]
choices = [0, 1]
sub['lbfam'] = np.select(conditions, choices, default=np.nan)

In [8]:
# Check that the recode worked okay
pd.crosstab(index=sub["lbfam"], columns="count")

col_0,count
lbfam,Unnamed: 1_level_1
0.0,80
1.0,173


In [9]:
lm1 = smf.ols(formula='hms ~ C(race) + lbfam', data=sub).fit()
print(lm1.summary())

                            OLS Regression Results                            
Dep. Variable:                    hms   R-squared:                       0.062
Model:                            OLS   Adj. R-squared:                  0.051
Method:                 Least Squares   F-statistic:                     5.529
Date:                Thu, 30 Nov 2023   Prob (F-statistic):            0.00109
Time:                        03:57:34   Log-Likelihood:                -168.88
No. Observations:                 253   AIC:                             345.8
Df Residuals:                     249   BIC:                             359.9
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept        0.3291      0.055      5.957   

On average, net of the number of strongly liberal family members, Black respondents were 27.5 percentage points less likely than white respondents to be accepting of same-sex relations (coefficient = -0.275), and respondents of other races were 8.1 percentage points more likely (coefficient = 0.081). The coefficient for Black respondents is statistically significant (p < 0.05), but the coefficient for other races is not (p = 0.461), so I cannot conclude that respondents of other races differ from white respondents. My expectation about race is therefore only partially supported: white respondents were more accepting than Black respondents, but not more accepting than respondents of other races.

Moreover, net of race, respondents with two or more strongly liberal family members were, on average, 13.2 percentage points more likely to be accepting of same-sex relations than those with one or none (coefficient = 0.132). This matches my expectation, and the result is statistically significant (p < 0.05).

The R-squared is 0.062, meaning that the model explains 6.2% of the variation in the binary acceptance measure (hms).
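
Because each LPM coefficient is a difference in probabilities, the fitted probability for any group is the intercept plus the relevant coefficients. The following is a minimal sketch using the rounded coefficients reported above; the dictionary and the `lpm_prob` helper are illustrative names, not part of the original analysis:

```python
# Rounded LPM coefficients copied from the regression output and text above
lpm_coefs = {"Intercept": 0.3291, "black": -0.275, "other": 0.081, "lbfam": 0.132}

def lpm_prob(black=0, other=0, lbfam=0):
    """Fitted probability of hms = 1 from the linear probability model."""
    return (lpm_coefs["Intercept"]
            + black * lpm_coefs["black"]
            + other * lpm_coefs["other"]
            + lbfam * lpm_coefs["lbfam"])

# White respondents with zero or one strongly liberal family member (reference group)
white_few = lpm_prob()
# Black respondents with two or more strongly liberal family members
black_many = lpm_prob(black=1, lbfam=1)
print(round(white_few, 3), round(black_many, 3))
```

An LPM does not constrain fitted values to lie between 0 and 1, which is one reason to compare these results with the logit model.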

### 2. Run a multiple (binary) logistic model. (It can be the same as the above LPM or a new model.) If it is a new model, tell me how you think your independent variables will affect your dependent variable. Interpret your results in the logit scale. Were your expectations correct? Why or why not?

In [10]:
logit1 = smf.logit(formula='hms ~ C(race) + lbfam', data=sub).fit()
print(logit1.summary())

Optimization terminated successfully.
         Current function value: 0.632890
         Iterations 6
                           Logit Regression Results                           
Dep. Variable:                    hms   No. Observations:                  253
Model:                          Logit   Df Residuals:                      249
Method:                           MLE   Df Model:                            3
Date:                Thu, 30 Nov 2023   Pseudo R-squ.:                 0.05192
Time:                        03:57:34   Log-Likelihood:                -160.12
converged:                       True   LL-Null:                       -168.89
Covariance Type:            nonrobust   LLR p-value:                 0.0005479
                   coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept       -0.7474      0.255     -2.928      0.003      -1.248      -0.247
C(race)[T.2]    -1.4961

Net of the number of strongly liberal family members, the logit (log-odds) of being accepting of same-sex relations is, on average, 1.496 lower for Black respondents than for white respondents and 0.320 higher for respondents of other races than for white respondents. The coefficient for Black respondents is statistically significant (p < 0.05), but the coefficient for other races is not (p = 0.491). My expectation about race is therefore only partially supported: white respondents were more accepting than Black respondents, but not more accepting than respondents of other races.

Moreover, net of race, respondents with two or more strongly liberal family members have a logit of being accepting that is, on average, 0.608 higher than that of respondents with one or none. This matches my expectation, and the result is statistically significant (p < 0.05).

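The pseudo R-squared is 0.052. Unlike the R-squared of the LPM, it is not a share of explained variation. It indicates that the model's log-likelihood is about 5.2% closer to zero than the log-likelihood of the intercept-only model.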
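
The `Pseudo R-squ.` value in the summary can be reproduced from the two log-likelihoods it reports, since statsmodels uses McFadden's definition. A minimal sketch, with the values copied by hand from the output above:

```python
# Log-likelihoods copied from the Logit summary above
ll_model = -160.12  # "Log-Likelihood": fitted model
ll_null = -168.89   # "LL-Null": intercept-only model

# McFadden's pseudo R-squared: 1 - (model log-likelihood / null log-likelihood)
pseudo_r2 = 1 - ll_model / ll_null
print(round(pseudo_r2, 4))
```

On the fitted results object, the same quantities are available as `logit1.llf`, `logit1.llnull`, and `logit1.prsquared`.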

### 3. Get odds ratios from your logit model in Question 2 and interpret some of them.

In [11]:
np.exp(logit1.params)

Intercept       0.473590
C(race)[T.2]    0.224005
C(race)[T.3]    1.377657
lbfam           1.838130
dtype: float64

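Net of the number of strongly liberal family members, the odds of being accepting of same-sex relations are 77.6% lower for Black respondents than for white respondents (odds ratio = 0.224) and 37.8% higher for respondents of other races than for white respondents (odds ratio = 1.378). The latter difference is not statistically significant in the logit model.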

Net of race, the odds of being accepting are 83.8% higher for respondents with two or more strongly liberal family members than for those with one or none (odds ratio = 1.838).
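
The percentages above come from subtracting 1 from each odds ratio and multiplying by 100. A minimal sketch of that arithmetic, with the odds ratios copied by hand from the output above (the dictionary names are illustrative):

```python
# Odds ratios copied from the np.exp(logit1.params) output above
odds_ratios = {
    "C(race)[T.2]": 0.224005,
    "C(race)[T.3]": 1.377657,
    "lbfam": 1.838130,
}

# Percent change in odds relative to the reference group: (OR - 1) * 100
pct_change = {name: (ratio - 1) * 100 for name, ratio in odds_ratios.items()}

for name, pct in pct_change.items():
    print(f"{name}: {pct:+.1f}%")
```

An odds ratio below 1 gives a negative percent change (lower odds than the reference group), and an odds ratio above 1 gives a positive one.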

### 4. Extra Credit: Get predicted probabilities from your logit model in Question 2 for some constellations of X values and interpret the results.

In [12]:
def logit2prob(logit):
    # Convert a logit (log-odds) to a probability
    odds = np.exp(logit)
    prob = odds / (1 + odds)
    return prob

intercept = logit1.params.Intercept
b_race_2 = logit1.params['C(race)[T.2]']
b_race_3 = logit1.params['C(race)[T.3]']
b_lbfam = logit1.params.lbfam

First, I compute the predicted probability of being accepting of same-sex relations for Black respondents with two or more strongly liberal family members.

In [13]:
logits_hms = intercept + (1 * b_race_2) + (1 * b_lbfam)
logit2prob(logits_hms)

0.16318041921959836

The predicted probability is 0.163, meaning that Black respondents with two or more strongly liberal family members have only a 16.3% predicted probability of being accepting of same-sex relations.

Next, I compute the predicted probability of being accepting of same-sex relations for respondents of other races with zero or one strongly liberal family member.

In [14]:
logits_hms = intercept + (1 * b_race_3) + (0 * b_lbfam)
logit2prob(logits_hms)

0.39483585542354166

The predicted probability is 0.395, meaning that respondents of other races with zero or one strongly liberal family member have a 39.5% predicted probability of being accepting of same-sex relations.
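
The same calculation can be repeated for every combination of race and family category. The following is a minimal, self-contained sketch that recovers the logit coefficients from the odds ratios printed in Question 3; the `predicted_prob` helper and the `probs` dictionary are illustrative names, not part of the original notebook:

```python
import math

# Odds ratios copied from the np.exp(logit1.params) output in Question 3
odds_ratios = {
    "Intercept": 0.473590,
    "C(race)[T.2]": 0.224005,
    "C(race)[T.3]": 1.377657,
    "lbfam": 1.838130,
}

# Taking logs recovers the logit coefficients
coefs = {name: math.log(ratio) for name, ratio in odds_ratios.items()}

def predicted_prob(race, lbfam):
    """race: 1 = white, 2 = Black, 3 = other; lbfam: 0 or 1."""
    logit = coefs["Intercept"] + lbfam * coefs["lbfam"]
    if race in (2, 3):
        # White is the reference category, so it adds no race term
        logit += coefs[f"C(race)[T.{race}]"]
    return 1 / (1 + math.exp(-logit))

# Predicted probability for each of the six race-by-lbfam cells
probs = {(race, lbfam): predicted_prob(race, lbfam)
         for race in (1, 2, 3) for lbfam in (0, 1)}

for (race, lbfam), p in probs.items():
    print(f"race={race}, lbfam={lbfam}: {p:.3f}")
```

With the fitted model in memory, `logit1.predict` applied to a small DataFrame of `race` and `lbfam` values should give the same probabilities without copying any numbers by hand.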