# Table of Contents

* [Imports](#Imports)
* [Data Read In](#Data-Read-In)
* Model Fitting
    * [Parameter Combination](#Parameter-Combination)
    * [Split On Sex](#Split-On-Sex)
    * [Split On Sex Parameter Combination](#Split-On-Sex-Parameter-Combination)
* Model Evaluation
    * [Cross Validation](#Cross-Validation)
* Bootstrapping
    * [Setup](#Setup)
    * [Running](#Running)
    * [Results](#Results)

# Imports
[Back to Top](#Table-of-Contents)

In [1]:
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
from itertools import combinations

# Data Read In
[Back to Top](#Table-of-Contents)

In [17]:
root = "../data/"

knnDat = pd.read_csv(root + "violenceKNN_splitForImpute.csv")
features = pd.read_csv(root + "Economic_Data.csv", index_col=0)
knnSplResp = pd.read_csv(root + "vioRespKNNSexSplit.csv")
#meanSplResp = pd.read_csv(root + "vioRespMeanSexSplit.csv")
#MedSplResp = pd.read_csv(root + "vioRespMedSexSplit.csv")
#ModeSplResp = pd.read_csv(root + "vioRespModeSexSplit.csv")

In [18]:
knnSplResp["sex"] = knnSplResp["sex"].replace({"Male" : 1,
                                               "Female" : 0})

# Model Fitting

## Parameter Combination
[Back to Top](#Table-of-Contents)

In [20]:
dropCols = ["Year", 
            "State / County Name",
            "Ages 5 to 17 in Families SAIPE Poverty Universe",
            "Ages 5 to 17 in Families in Poverty Count",
            "Ages 5 to 17 in Families in Poverty Percent",
            "All Ages in Poverty Count",
            "All Ages SAIPE Poverty Universe",
            "Under Age 18 SAIPE Poverty Universe",
            "Under Age 18 in Poverty Count"]
knnDat = knnDat.merge(features, how="inner", left_on=["year", "sitename"],
                      right_on=["Year", "State / County Name"]).drop(dropCols, axis=1)
knnDat.columns = ['year', 'sitename', 'violenceScore',
       'AllAgesInPovertyPercent', 'UnderAge18inPovertyPercent',
       'MedianHouseholdIncomeInDollars', 'UnemploymentRate', 'Population',
       'SNAP']

In [21]:
allFeatures = np.asarray(knnDat.drop(["sitename", "violenceScore"], axis=1).columns)

In [22]:
# Fit one random-intercept model per feature subset and record its AIC.
# Note: with 7 candidate features, range(1, 7) stops at subsets of size 6,
# so the full 7-feature model is not fitted in this search.
rows = []
for nComb in reversed(range(1, 7)):
    for comb in combinations(allFeatures, nComb):
        equ = "violenceScore ~ " + "+".join(comb)
        md = smf.mixedlm(equ, knnDat,
                         groups=knnDat["sitename"])
        mdf = md.fit(reml=False)
        rows.append({"AIC": mdf.aic, "features": comb})
# DataFrame.append was removed in pandas 2.0, so build the frame in one step
modelsAIC = pd.DataFrame(rows, columns=["AIC", "features"])







In [23]:
modelsAIC.sort_values("AIC")

| | AIC | features |
|---|---|---|
| 101 | 772.352482 | (year, UnemploymentRate) |
| 69 | 773.931227 | (year, UnderAge18inPovertyPercent, Unemploymen... |
| 76 | 774.176704 | (year, UnemploymentRate, SNAP) |
| 75 | 774.197420 | (year, UnemploymentRate, Population) |
| 72 | 774.232398 | (year, MedianHouseholdIncomeInDollars, Unemplo... |
| ... | ... | ... |
| 121 | 870.691511 | (UnderAge18inPovertyPercent,) |
| 111 | 872.638754 | (UnderAge18inPovertyPercent, Population) |
| 120 | 874.095724 | (AllAgesInPovertyPercent,) |
| 107 | 876.185382 | (AllAgesInPovertyPercent, Population) |
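
The exhaustive search enumerates every feature subset up to size 6 and turns each into a formula string. The sketch below reproduces only that enumeration step, without fitting any models, to show how many candidates the loop visits. The helper name `build_formulas` is hypothetical; the feature names are the renamed columns used in this notebook.

```python
from itertools import combinations
from math import comb


def build_formulas(response, features, max_size):
    """Return one formula string per feature subset of size 1..max_size."""
    formulas = []
    for size in range(1, max_size + 1):
        for subset in combinations(features, size):
            formulas.append(f"{response} ~ " + "+".join(subset))
    return formulas


# Candidate features when "year" is included (7 in total).
features = ["year", "AllAgesInPovertyPercent", "UnderAge18inPovertyPercent",
            "MedianHouseholdIncomeInDollars", "UnemploymentRate",
            "Population", "SNAP"]

formulas = build_formulas("violenceScore", features, max_size=6)

# Closed-form count: sum of C(7, k) for k = 1..6.
expected = sum(comb(len(features), k) for k in range(1, 7))
print(len(formulas), expected)
```

With 7 features and a maximum subset size of 6, the search covers 126 models, which matches the row labels 0 to 125 in the results table. The single model containing all 7 features is outside that range.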


In [24]:
allFeatures = np.asarray(knnDat.drop(["sitename", "violenceScore", "year"], axis=1).columns)

# Same search without "year": 6 candidate features, so range(1, 7)
# covers every subset, including the full model.
rows = []
for nComb in reversed(range(1, 7)):
    for comb in combinations(allFeatures, nComb):
        equ = "violenceScore ~ " + "+".join(comb)
        md = smf.mixedlm(equ, knnDat,
                         groups=knnDat["sitename"])
        mdf = md.fit(reml=False)
        rows.append({"AIC": mdf.aic, "features": comb})
# DataFrame.append was removed in pandas 2.0, so build the frame in one step
modelsAIC = pd.DataFrame(rows, columns=["AIC", "features"])







In [25]:
modelsAIC.sort_values("AIC")

| | AIC | features |
|---|---|---|
| 55 | 784.257982 | (UnemploymentRate, SNAP) |
| 39 | 785.452798 | (MedianHouseholdIncomeInDollars, UnemploymentR... |
| 41 | 785.666856 | (UnemploymentRate, Population, SNAP) |
| 36 | 786.048424 | (UnderAge18inPovertyPercent, UnemploymentRate,... |
| 11 | 786.171478 | (AllAgesInPovertyPercent, UnderAge18inPovertyP... |
| ... | ... | ... |
| 58 | 870.691511 | (UnderAge18inPovertyPercent,) |
| 49 | 872.638754 | (UnderAge18inPovertyPercent, Population) |
| 57 | 874.095724 | (AllAgesInPovertyPercent,) |
| 45 | 876.185382 | (AllAgesInPovertyPercent, Population) |


In [26]:
md = smf.mixedlm("violenceScore ~ 1", knnDat, groups=knnDat["sitename"])
mdf = md.fit(reml=False)
print(mdf.summary())#-431.9609 



           Mixed Linear Model Regression Results
Model:            MixedLM Dependent Variable: violenceScore
No. Observations: 122     Method:             ML           
No. Groups:       14      Scale:              69.5832      
Min. group size:  8       Log-Likelihood:     -435.5395    
Max. group size:  9       Converged:          No           
Mean group size:  8.7                                      
------------------------------------------------------------
             Coef.   Std.Err.    z     P>|z|  [0.025  0.975]
------------------------------------------------------------
Intercept    38.443     0.980  39.245  0.000  36.523  40.363
Group Var     5.439     0.535                               





In [27]:
mdf.aic

877.0790323019997
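
The AIC values in this notebook can be checked by hand from the log-likelihoods printed in the summaries, using AIC = 2k - 2 log L. The sketch below assumes the parameter count k includes the fixed effects, the group variance, and the residual scale; that count is inferred from the reported numbers rather than taken from the statsmodels source, and the function name `aic` is illustrative.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_likelihood


# Intercept-only model: log-likelihood -435.5395 from the summary above.
# Assumed k = 3: intercept, group variance, residual scale.
null_aic = aic(-435.5395, 3)

# Additive model "sex + UnemploymentRate + SNAP": log-likelihood -822.7881.
# Assumed k = 6: four fixed effects, group variance, residual scale.
additive_aic = aic(-822.7881, 6)

print(round(null_aic, 3), round(additive_aic, 3))
```

Both values agree with the `mdf.aic` outputs (877.079 and 1657.576) up to the rounding of the printed log-likelihoods, which supports the assumed parameter counts.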

## Split On Sex
[Back to Top](#Table-of-Contents)

In [80]:
dropCols = ["Year", 
            "State / County Name",
            "Ages 5 to 17 in Families SAIPE Poverty Universe",
            "Ages 5 to 17 in Families in Poverty Count",
            "Ages 5 to 17 in Families in Poverty Percent",
            "All Ages in Poverty Count",
            "All Ages SAIPE Poverty Universe",
            "Under Age 18 SAIPE Poverty Universe",
            "Under Age 18 in Poverty Count"]
dat = knnSplResp.merge(features, how="inner", left_on=["year", "sitename"],
                       right_on=["Year", "State / County Name"]).drop(dropCols, axis=1)

In [81]:
dat.columns = ['year', 'sitename', 'sex', 'violenceScore',
       'AllAgesInPovertyPercent', 'UnderAge18inPovertyPercent',
       'MedianHouseholdIncomeInDollars', 'UnemploymentRate', 'Population',
       'SNAP']

In [84]:
md = smf.ols("violenceScore ~ sex*UnemploymentRate + sex*SNAP", dat)
mdf = md.fit()
print(mdf.aic)
print(mdf.summary())

1640.0872233672335
                            OLS Regression Results                            
Dep. Variable:          violenceScore   R-squared:                       0.826
Model:                            OLS   Adj. R-squared:                  0.822
Method:                 Least Squares   F-statistic:                     226.2
Date:                Sat, 20 Nov 2021   Prob (F-statistic):           2.86e-88
Time:                        14:53:55   Log-Likelihood:                -814.04
No. Observations:                 244   AIC:                             1640.
Df Residuals:                     238   BIC:                             1661.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                                   coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------

In [79]:
md = smf.mixedlm("violenceScore ~ sex + UnemploymentRate + SNAP", dat, groups=dat["sitename"])
mdf = md.fit(reml=False)
print(mdf.aic)
print(mdf.summary())

1657.5762363435827
           Mixed Linear Model Regression Results
Model:             MixedLM Dependent Variable: violenceScore
No. Observations:  244     Method:             ML           
No. Groups:        14      Scale:              46.1138      
Min. group size:   16      Log-Likelihood:     -822.7881    
Max. group size:   18      Converged:          Yes          
Mean group size:   17.4                                     
------------------------------------------------------------
                 Coef.  Std.Err.    z    P>|z| [0.025 0.975]
------------------------------------------------------------
Intercept        47.548    2.681  17.734 0.000 42.293 52.803
sex[T.Male]      26.111    0.869  30.032 0.000 24.407 27.816
UnemploymentRate  1.571    0.179   8.760 0.000  1.220  1.923
SNAP             -0.247    0.020 -12.379 0.000 -0.287 -0.208
Group Var         7.169    0.572                            



In [80]:
md = smf.mixedlm("violenceScore ~ sex*UnemploymentRate + sex*SNAP", dat, groups=dat["sitename"])
mdf = md.fit(reml=False)
print(mdf.aic)
print(mdf.summary())

1622.658252423583
                 Mixed Linear Model Regression Results
Model:                MixedLM     Dependent Variable:     violenceScore
No. Observations:     244         Method:                 ML           
No. Groups:           14          Scale:                  38.9244      
Min. group size:      16          Log-Likelihood:         -803.3291    
Max. group size:      18          Converged:              Yes          
Mean group size:      17.4                                             
-----------------------------------------------------------------------
                             Coef.  Std.Err.   z    P>|z| [0.025 0.975]
-----------------------------------------------------------------------
Intercept                    35.895    3.358 10.689 0.000 29.313 42.476
sex[T.Male]                  49.469    4.579 10.804 0.000 40.495 58.442
UnemploymentRate              1.126    0.226  4.989 0.000  0.684  1.568
sex[T.Male]:UnemploymentRate  0.870    0.307  2.830 0.005  0.26

In [81]:
md = smf.mixedlm("violenceScore ~ sex + UnemploymentRate + SNAP", dat, groups=dat["sitename"], re_formula="~sex")
mdf = md.fit(reml=False)
print(mdf.aic)
print(mdf.summary())

1660.6332095921155
               Mixed Linear Model Regression Results
Model:                MixedLM   Dependent Variable:   violenceScore
No. Observations:     244       Method:               ML           
No. Groups:           14        Scale:                45.8600      
Min. group size:      16        Log-Likelihood:       -822.3166    
Max. group size:      18        Converged:            Yes          
Mean group size:      17.4                                         
-------------------------------------------------------------------
                        Coef.  Std.Err.    z    P>|z| [0.025 0.975]
-------------------------------------------------------------------
Intercept               47.471    2.644  17.952 0.000 42.288 52.654
sex[T.Male]             26.109    0.906  28.820 0.000 24.334 27.885
UnemploymentRate         1.580    0.179   8.808 0.000  1.228  1.931
SNAP                    -0.247    0.020 -12.410 0.000 -0.286 -0.208
Group Var                4.807    0.622     

In [82]:
md = smf.mixedlm("violenceScore ~ sex*UnemploymentRate + sex*SNAP", dat, groups=dat["sitename"], re_formula="~sex")
mdf = md.fit(reml=False)
print(mdf.aic)
print(mdf.summary())

1625.7340980885758
                 Mixed Linear Model Regression Results
Model:                MixedLM     Dependent Variable:     violenceScore
No. Observations:     244         Method:                 ML           
No. Groups:           14          Scale:                  38.0806      
Min. group size:      16          Log-Likelihood:         -802.8670    
Max. group size:      18          Converged:              Yes          
Mean group size:      17.4                                             
-----------------------------------------------------------------------
                             Coef.  Std.Err.   z    P>|z| [0.025 0.975]
-----------------------------------------------------------------------
Intercept                    35.854    3.324 10.785 0.000 29.339 42.370
sex[T.Male]                  49.555    4.590 10.798 0.000 40.560 58.551
UnemploymentRate              1.116    0.232  4.813 0.000  0.662  1.571
sex[T.Male]:UnemploymentRate  0.901    0.323  2.791 0.005  0.2

## Split On Sex Parameter Combination
[Back to Top](#Table-of-Contents)

In [5]:
dropCols = ["Year", 
            "State / County Name",
            "Ages 5 to 17 in Families SAIPE Poverty Universe",
            "Ages 5 to 17 in Families in Poverty Count",
            "Ages 5 to 17 in Families in Poverty Percent",
            "All Ages in Poverty Count",
            "All Ages SAIPE Poverty Universe",
            "Under Age 18 SAIPE Poverty Universe",
            "Under Age 18 in Poverty Count"]

dat = knnSplResp.merge(features, how="inner", left_on=["year", "sitename"],
                       right_on=["Year", "State / County Name"]).drop(dropCols, axis=1)

dat.columns = ['year', 'sitename',
               'sex', 'violenceScore',
               'AllAgesInPovertyPercent',
               'UnderAge18inPovertyPercent', 
               'MedianHouseholdIncomeInDollars',
               'UnemploymentRate', 'Population',
               'SNAP']

datMales = dat[dat["sex"] == 1]
datFemales = dat[dat["sex"] == 0]

In [91]:
def allParamComb(dat, allFeatures):
    """Fit a random-intercept model for every feature subset; return AICs."""
    rows = []
    didntWork = []
    for nComb in reversed(range(1, 7)):
        # all feature subsets of size nComb
        for comb in combinations(allFeatures, nComb):
            # build the formula, e.g. "violenceScore ~ year+SNAP"
            equ = "violenceScore ~ " + "+".join(comb)

            # fit the model; record formulas that fail instead of stopping
            try:
                md = smf.mixedlm(equ, dat,
                                 groups=dat["sitename"])
                mdf = md.fit(reml=False)
                rows.append({"AIC": mdf.aic, "features": comb})
            except Exception:
                didntWork.append(equ)

    print("Didn't Work", didntWork)
    # DataFrame.append was removed in pandas 2.0, so build the frame at the end
    return pd.DataFrame(rows, columns=["AIC", "features"])

In [92]:
allFeatures = np.asarray(dat.drop(["sitename",
                                   "violenceScore",
                                   "sex"],
                                  axis=1).columns)

maleModelsAIC = allParamComb(datMales, allFeatures)
femaleModelsAIC = allParamComb(datFemales, allFeatures)





Didn't Work []






Didn't Work []


In [93]:
maleModelsAIC.sort_values("AIC")

| | AIC | features |
|---|---|---|
| 101 | 838.593040 | (year, UnemploymentRate) |
| 76 | 839.735151 | (year, UnemploymentRate, SNAP) |
| 65 | 840.037641 | (year, AllAgesInPovertyPercent, UnemploymentRate) |
| 75 | 840.339865 | (year, UnemploymentRate, Population) |
| 69 | 840.464688 | (year, UnderAge18inPovertyPercent, Unemploymen... |
| ... | ... | ... |
| 121 | 954.538224 | (UnderAge18inPovertyPercent,) |
| 111 | 956.689661 | (UnderAge18inPovertyPercent, Population) |
| 120 | 956.768956 | (AllAgesInPovertyPercent,) |
| 124 | 958.028958 | (Population,) |


In [94]:
femaleModelsAIC.sort_values("AIC")

| | AIC | features |
|---|---|---|
| 71 | 751.915476 | (year, UnderAge18inPovertyPercent, SNAP) |
| 69 | 751.973126 | (year, UnderAge18inPovertyPercent, Unemploymen... |
| 99 | 752.681740 | (year, UnderAge18inPovertyPercent) |
| 101 | 752.741765 | (year, UnemploymentRate) |
| 65 | 753.054362 | (year, AllAgesInPovertyPercent, UnemploymentRate) |
| ... | ... | ... |
| 121 | 800.067935 | (UnderAge18inPovertyPercent,) |
| 111 | 800.177094 | (UnderAge18inPovertyPercent, Population) |
| 120 | 801.471199 | (AllAgesInPovertyPercent,) |
| 107 | 803.494639 | (AllAgesInPovertyPercent, Population) |


In [95]:
allFeatures = np.asarray(dat.drop(["sitename",
                                   "violenceScore",
                                   "year", "sex"],
                                  axis=1).columns)

maleModelsAIC = allParamComb(datMales, allFeatures)
femaleModelsAIC = allParamComb(datFemales, allFeatures)





Didn't Work []






Didn't Work []


In [96]:
maleModelsAIC.sort_values("AIC")

| | AIC | features |
|---|---|---|
| 4 | 851.952430 | (AllAgesInPovertyPercent, UnderAge18inPovertyP... |
| 11 | 851.982874 | (AllAgesInPovertyPercent, UnderAge18inPovertyP... |
| 30 | 852.085407 | (AllAgesInPovertyPercent, UnemploymentRate, SNAP) |
| 55 | 852.376483 | (UnemploymentRate, SNAP) |
| 41 | 852.855823 | (UnemploymentRate, Population, SNAP) |
| ... | ... | ... |
| 58 | 954.538224 | (UnderAge18inPovertyPercent,) |
| 49 | 956.689661 | (UnderAge18inPovertyPercent, Population) |
| 57 | 956.768956 | (AllAgesInPovertyPercent,) |
| 61 | 958.028958 | (Population,) |


In [104]:
femaleModelsAIC.sort_values("AIC")

| | AIC | features |
|---|---|---|
| 36 | 760.217388 | (UnderAge18inPovertyPercent, UnemploymentRate,... |
| 55 | 761.176652 | (UnemploymentRate, SNAP) |
| 30 | 761.298159 | (AllAgesInPovertyPercent, UnemploymentRate, SNAP) |
| 11 | 761.509663 | (AllAgesInPovertyPercent, UnderAge18inPovertyP... |
| 39 | 761.982929 | (MedianHouseholdIncomeInDollars, UnemploymentR... |
| ... | ... | ... |
| 58 | 800.067935 | (UnderAge18inPovertyPercent,) |
| 49 | 800.177094 | (UnderAge18inPovertyPercent, Population) |
| 57 | 801.471199 | (AllAgesInPovertyPercent,) |
| 45 | 803.494639 | (AllAgesInPovertyPercent, Population) |


In [105]:
maleModelsAIC.sort_values("AIC").features.iloc[:2,].values

array([('AllAgesInPovertyPercent', 'UnderAge18inPovertyPercent', 'UnemploymentRate', 'Population', 'SNAP'),
       ('AllAgesInPovertyPercent', 'UnderAge18inPovertyPercent', 'UnemploymentRate', 'SNAP')],
      dtype=object)

In [106]:
femaleModelsAIC.sort_values("AIC").features.iloc[:2,].values

array([('UnderAge18inPovertyPercent', 'UnemploymentRate', 'SNAP'),
       ('UnemploymentRate', 'SNAP')], dtype=object)

In [6]:
md = smf.mixedlm("violenceScore ~ UnemploymentRate + sex*year", dat, groups=dat["sitename"])
mdf = md.fit(reml=False)
print(mdf.aic)
print(mdf.summary())

1590.9309200620357
              Mixed Linear Model Regression Results
Model:              MixedLM   Dependent Variable:   violenceScore
No. Observations:   244       Method:               ML           
No. Groups:         14        Scale:                33.3495      
Min. group size:    16        Log-Likelihood:       -788.4655    
Max. group size:    18        Converged:            Yes          
Mean group size:    17.4                                         
-----------------------------------------------------------------
                  Coef.   Std.Err.   z    P>|z|  [0.025   0.975] 
-----------------------------------------------------------------
Intercept        1240.826  209.254  5.930 0.000  830.696 1650.956
UnemploymentRate    0.506    0.154  3.278 0.001    0.204    0.809
sex              2197.946  291.250  7.547 0.000 1627.107 2768.785
year               -0.606    0.104 -5.825 0.000   -0.809   -0.402
sex:year           -1.080    0.145 -7.457 0.000   -1.364   -0.796
Group

In [7]:
md = smf.mixedlm("violenceScore ~ sex*UnemploymentRate + sex*SNAP + sex*year", dat, groups=dat["sitename"])
mdf = md.fit(reml=False)
print(mdf.aic)
print(mdf.summary())

1596.6449394236654
               Mixed Linear Model Regression Results
Model:               MixedLM    Dependent Variable:    violenceScore
No. Observations:    244        Method:                ML           
No. Groups:          14         Scale:                 33.2886      
Min. group size:     16         Log-Likelihood:        -788.3225    
Max. group size:     18         Converged:             Yes          
Mean group size:     17.4                                           
--------------------------------------------------------------------
                      Coef.   Std.Err.   z    P>|z|  [0.025  0.975] 
--------------------------------------------------------------------
Intercept            1450.916  580.222  2.501 0.012 313.702 2588.131
sex                  1854.080  713.562  2.598 0.009 455.524 3252.637
UnemploymentRate        0.406    0.330  1.230 0.219  -0.241    1.054
sex:UnemploymentRate    0.162    0.399  0.406 0.685  -0.620    0.944
SNAP                    0.026  

In [9]:
md = smf.mixedlm("violenceScore ~ sex*UnemploymentRate + sex*SNAP + sex:UnderAge18inPovertyPercent", dat, groups=dat["sitename"])
mdf = md.fit(reml=False)
print(mdf.aic)
print(mdf.summary())

1617.9116461546412
                  Mixed Linear Model Regression Results
Model:                  MixedLM     Dependent Variable:     violenceScore
No. Observations:       244         Method:                 ML           
No. Groups:             14          Scale:                  37.0685      
Min. group size:        16          Log-Likelihood:         -799.9558    
Max. group size:        18          Converged:              Yes          
Mean group size:        17.4                                             
-------------------------------------------------------------------------
                               Coef.  Std.Err.   z    P>|z| [0.025 0.975]
-------------------------------------------------------------------------
Intercept                      36.140    3.327 10.862 0.000 29.619 42.661
sex                            54.400    4.851 11.214 0.000 44.891 63.908
UnemploymentRate                1.014    0.226  4.497 0.000  0.572  1.456
sex:UnemploymentRate            1.271

In [8]:
md = smf.mixedlm("violenceScore ~ sex*UnemploymentRate + sex*SNAP + sex*UnderAge18inPovertyPercent", dat, groups=dat["sitename"])
mdf = md.fit(reml=False)
print(mdf.aic)
print(mdf.summary())

1619.7110180969696
                  Mixed Linear Model Regression Results
Model:                  MixedLM     Dependent Variable:     violenceScore
No. Observations:       244         Method:                 ML           
No. Groups:             14          Scale:                  37.3713      
Min. group size:        16          Log-Likelihood:         -799.8555    
Max. group size:        18          Converged:              Yes          
Mean group size:        17.4                                             
-------------------------------------------------------------------------
                               Coef.  Std.Err.   z    P>|z| [0.025 0.975]
-------------------------------------------------------------------------
Intercept                      34.786    4.420  7.870 0.000 26.123 43.449
sex                            54.714    4.917 11.126 0.000 45.075 64.352
UnemploymentRate                0.963    0.253  3.808 0.000  0.467  1.459
sex:UnemploymentRate            1.296

In [10]:
md = smf.mixedlm("violenceScore ~ sex*UnemploymentRate + sex*SNAP", dat, groups=dat["sitename"])
mdf = md.fit(reml=False)
print(mdf.aic)
print(mdf.summary())

1622.658252423583
             Mixed Linear Model Regression Results
Model:              MixedLM  Dependent Variable:  violenceScore
No. Observations:   244      Method:              ML           
No. Groups:         14       Scale:               38.9244      
Min. group size:    16       Log-Likelihood:      -803.3291    
Max. group size:    18       Converged:           Yes          
Mean group size:    17.4                                       
---------------------------------------------------------------
                     Coef.  Std.Err.   z    P>|z| [0.025 0.975]
---------------------------------------------------------------
Intercept            35.895    3.358 10.689 0.000 29.313 42.476
sex                  49.469    4.579 10.804 0.000 40.495 58.442
UnemploymentRate      1.126    0.226  4.989 0.000  0.684  1.568
sex:UnemploymentRate  0.870    0.307  2.830 0.005  0.268  1.473
SNAP                 -0.134    0.026 -5.207 0.000 -0.184 -0.083
sex:SNAP             -0.227    0.03

# Model Evaluation
## Cross Validation
[Back to Top](#Table-of-Contents)

For cross validation, we hold out n counties in each run, fit the model on the remaining counties, and score its predictions on the held-out ones. We repeat this for every combination of n counties. The results show which model predicts best and whether splitting on sex improves prediction.
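
The hold-out scheme described above can be sketched with `itertools.combinations`. This is a minimal illustration of how the folds are enumerated, not the notebook's scoring function: the generator name `leave_n_out_folds` and the site labels are hypothetical, and only the count of 14 groups is taken from the model summaries.

```python
from itertools import combinations


def leave_n_out_folds(groups, n_left_out):
    """Yield (train_groups, held_out_groups) for every n-subset of groups."""
    unique = sorted(set(groups))
    for held_out in combinations(unique, n_left_out):
        train = [g for g in unique if g not in held_out]
        yield train, list(held_out)


# Hypothetical labels standing in for the 14 sites in the data.
sites = [f"site_{i}" for i in range(14)]

folds = list(leave_n_out_folds(sites, 2))
print(len(folds))             # number of model fits per cross-validation run
print(folds[0][1])            # the first pair of held-out sites
```

Holding out 2 of 14 sites gives C(14, 2) = 91 folds, so each cross-validation call below fits 91 models. The cost grows quickly with n, which is one reason to keep n small.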

Note:

"Male" : 1  
"Female" : 0

In [28]:
def CrossValMSE(modelEqu, modelGroup, nLeftOut, dat, mseSplitOn=None):
    """Leave-n-groups-out cross validation.

    Despite the name, each fold is scored by the sum of squared errors on
    the held-out rows (not the mean); the result is the average of these
    sums over all folds.
    """
    leftOutCols = combinations(dat[modelGroup].unique(), nLeftOut)
    mses = []
    if mseSplitOn is not None:
        msesSplit = {group: [] for group in dat[mseSplitOn].unique()}

    for lo in leftOutCols:
        # rows belonging to the held-out groups form the test set
        held = dat[modelGroup].isin(lo)
        train, test = dat[~held], dat[held]
        md = smf.mixedlm(modelEqu, train, groups=train[modelGroup])
        mdf = md.fit(reml=False)
        # squared error summed over every held-out row
        pred = mdf.predict(test)
        mses.append(np.square(test["violenceScore"] - pred).sum())
        if mseSplitOn is not None:
            # same score, computed separately for each level of mseSplitOn
            for group in msesSplit:
                sub = test[test[mseSplitOn] == group]
                pred = mdf.predict(sub)
                msesSplit[group].append(
                    np.square(sub["violenceScore"] - pred).sum())
    if mseSplitOn is not None:
        return np.mean(mses), pd.DataFrame(msesSplit).mean()
    return np.mean(mses)

In [29]:
import warnings
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter('ignore', ConvergenceWarning)

In [67]:
CrossValMSE("violenceScore ~ sex*UnemploymentRate + sex*SNAP", "sitename",
            2, dat, mseSplitOn="sex")

(1763.1716347685156,
 0     613.458954
 1    1149.712681
 dtype: float64)

In [68]:
CrossValMSE("violenceScore ~ sex*UnemploymentRate + sex*SNAP + sex*UnderAge18inPovertyPercent",
            "sitename", 2, dat, mseSplitOn="sex")

(1905.9720669190135,
 0     633.964273
 1    1272.007794
 dtype: float64)

In [69]:
CrossValMSE("violenceScore ~ sex*UnemploymentRate + sex*SNAP + sex:UnderAge18inPovertyPercent",
            "sitename", 2, dat, mseSplitOn="sex")

(1888.1006708149769,
 0     619.724057
 1    1268.376614
 dtype: float64)

In [70]:
CrossValMSE("violenceScore ~ sex*UnemploymentRate + sex*SNAP + sex*year", "sitename",
            2, dat, mseSplitOn="sex")

(1793.5042402507318,
 0     671.458066
 1    1122.046174
 dtype: float64)

In [71]:
CrossValMSE("violenceScore ~ UnemploymentRate + sex*year", "sitename",
            2, dat, mseSplitOn="sex")

(1756.437523886009,
 0     643.193382
 1    1113.244142
 dtype: float64)

In [30]:
CrossValMSE("violenceScore ~ UnemploymentRate + SNAP", "sitename", 2, knnDat)

667.4821611912405

In [31]:
CrossValMSE("violenceScore ~ UnemploymentRate + year", "sitename", 2, knnDat)

683.9819454578279

# Bootstrapping
## Setup
[Back to Top](#Table-of-Contents)

## Running
[Back to Top](#Table-of-Contents)

## Results
[Back to Top](#Table-of-Contents)