This is the notebook for performing linear regression model.

* Build linear regression on different dependent variables: Number of senior-involved accident, number of dead senior and number of dead senior pedestrain.
* post: equals to 1 if this accident occurred after 2009, 0 otherwise
* round 1,2,3: equals to 1 if this census tract contains safety zone, 0 otherwise.
* We care about the direction of the interaction term to be negative, because negative means the SSfS efficiently decreases the number of accidents after implementation.
* Descriptive stats for control variables
* Add controls - anything that could be different between an area that has been treated by the SPFA program and an area that has not been treated:
    * Characteristics of the area (share of residential v commercial buildings)
    * Number of people involved in crash
    * Weather - need to harmonize across years without losing observations
    * Drinking/drugs
    * Time of crash

Conclusion:
- Number of senior-involved accident has the highest R-squared (0.686) while number of dead senior has the lowest R-squared(0.572).
- Both of them has a significant and negative interaction term, which means SSfS is efficient.

## Import Data

In [1]:
import pandas as pd
import statsmodels as sm
import statsmodels.formula.api as smf
import statsmodels.api as sma

One row per census, and one column: number of accident in each census track, one column: whether it contains safe zone

In [2]:
pwd

'/projects/cps2019_aging/shared/Github'

In [3]:
ct = pd.read_csv("../data/ct_df.csv")
ct.head()

Unnamed: 0,SENIOR,S_PED,S_NOT_PED,S_PED_DEAD,S_SURVIVED,S_DEAD,S_DRINKING+,S_DRINKING-,S_MALE,S_FEMALE,...,AFTERNOON,NIGHT,MIDNIGHT,MONTH_1,MONTH_2,MONTH_3,rounds_1.0,rounds_2.0,rounds_3.0,YEAR
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,2001
1,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,2001
2,3.0,1.0,2.0,1.0,0.0,3.0,0.0,1.0,3.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,2001
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,2001
4,2.0,2.0,0.0,2.0,0.0,2.0,0.0,2.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2001


In [4]:
ct.S_PED_DEAD.sum()

916.0

In [5]:
ct['post'] = ct['YEAR'] >= 2009

In [6]:
ct['post'].unique()

array([False, True], dtype=object)

In [7]:
ct['post'] = ct['post'].astype(str)

In [8]:
ct['post'] = ct['post'].replace('False', 0)
ct['post'] = ct['post'].replace('True', 1)

In [9]:
ct.head()

Unnamed: 0,SENIOR,S_PED,S_NOT_PED,S_PED_DEAD,S_SURVIVED,S_DEAD,S_DRINKING+,S_DRINKING-,S_MALE,S_FEMALE,...,NIGHT,MIDNIGHT,MONTH_1,MONTH_2,MONTH_3,rounds_1.0,rounds_2.0,rounds_3.0,YEAR,post
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,2001,0
1,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,2001,0
2,3.0,1.0,2.0,1.0,0.0,3.0,0.0,1.0,3.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,2001,0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,2001,0
4,2.0,2.0,0.0,2.0,0.0,2.0,0.0,2.0,1.0,1.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2001,0


In [10]:
ct.rename(columns={"rounds_1.0":"rounds_1","rounds_2.0":"rounds_2","rounds_3.0":"rounds_3"},inplace=True)

In [11]:
ct.columns

Index(['SENIOR', 'S_PED', 'S_NOT_PED', 'S_PED_DEAD', 'S_SURVIVED', 'S_DEAD',
       'S_DRINKING+', 'S_DRINKING-', 'S_MALE', 'S_FEMALE', 'S_DRUG+',
       'S_DRUG-', 'S_DEAD_AFTER', 'S_DEAD_SCENE', 'S_DEAD_ROUTE', 'YOUNG',
       'Y_PED', 'Y_NOT_PED', 'Y_SURVIVED', 'Y_DEAD', 'Y_PED_DEAD',
       'Y_DRINKING+', 'Y_DRINKING-', 'Y_MALE', 'Y_FEMALE', 'Y_DRUG+',
       'Y_DRUG-', 'Y_DEAD_AFTER', 'Y_DEAD_SCENE', 'Y_DEAD_ROUTE', 'FATALS',
       'WEATHER_GOOD', 'WEATHER_RAIN', 'WEATHER_SLEET', 'WEATHER_SNOW',
       'WEATHER_FOG', 'WEATHER_CLOUDY', 'LGT_COND_DAYLIGHT',
       'LGT_COND_DARK_NOT_LIGHTED', 'LGT_COND_DARK_LIGHTED', 'LGT_COND_DAWN',
       'LGT_COND_DUSK', 'LGT_COND_DARK_UNKNOWN_LIGHTING', 'WEEKDAY', 'WEEKEND',
       'YEAR_Q1', 'YEAR_Q2', 'YEAR_Q3', 'MONTH_Q4', 'MORNING', 'NOON',
       'AFTERNOON', 'NIGHT', 'MIDNIGHT', 'MONTH_1', 'MONTH_2', 'MONTH_3',
       'rounds_1', 'rounds_2', 'rounds_3', 'YEAR', 'post'],
      dtype='object')

Definition of time-related variables:
- YEAR_Q1: January, Febuary, March
- YEAR_Q2: April, May, June
- YEAR_Q3: July, August, September
- YEAR_Q4: October, November, December
- MORNING: 6-11
- NOON: 12-14
- AFTERNOON: 15-19
- NIGHT: 20-23
- MIDNIGHT: 0-5

We got three dependent variables that we want to estimate:
    1. Number of senior-involved accidents
    2. Number of dead senior
    3. Number of dead senior pedestrain

# Y = Number of senior-involved accidents 

In [12]:
mod = smf.ols(formula="SENIOR ~ post+post*rounds_1", data = ct).fit()
print(mod.summary())

                            OLS Regression Results                            
Dep. Variable:                 SENIOR   R-squared:                       0.067
Model:                            OLS   Adj. R-squared:                  0.067
Method:                 Least Squares   F-statistic:                     359.5
Date:                Wed, 17 Jul 2019   Prob (F-statistic):          1.65e-225
Time:                        08:32:48   Log-Likelihood:                -6896.6
No. Observations:               15028   AIC:                         1.380e+04
Df Residuals:                   15024   BIC:                         1.383e+04
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------
Intercept         0.1036      0.005     22.559

- R-squared: 0.067
- Coefficient of interaction term: negative
- Significance of interaction term: yes

In [13]:
ct.columns

Index(['SENIOR', 'S_PED', 'S_NOT_PED', 'S_PED_DEAD', 'S_SURVIVED', 'S_DEAD',
       'S_DRINKING+', 'S_DRINKING-', 'S_MALE', 'S_FEMALE', 'S_DRUG+',
       'S_DRUG-', 'S_DEAD_AFTER', 'S_DEAD_SCENE', 'S_DEAD_ROUTE', 'YOUNG',
       'Y_PED', 'Y_NOT_PED', 'Y_SURVIVED', 'Y_DEAD', 'Y_PED_DEAD',
       'Y_DRINKING+', 'Y_DRINKING-', 'Y_MALE', 'Y_FEMALE', 'Y_DRUG+',
       'Y_DRUG-', 'Y_DEAD_AFTER', 'Y_DEAD_SCENE', 'Y_DEAD_ROUTE', 'FATALS',
       'WEATHER_GOOD', 'WEATHER_RAIN', 'WEATHER_SLEET', 'WEATHER_SNOW',
       'WEATHER_FOG', 'WEATHER_CLOUDY', 'LGT_COND_DAYLIGHT',
       'LGT_COND_DARK_NOT_LIGHTED', 'LGT_COND_DARK_LIGHTED', 'LGT_COND_DAWN',
       'LGT_COND_DUSK', 'LGT_COND_DARK_UNKNOWN_LIGHTING', 'WEEKDAY', 'WEEKEND',
       'YEAR_Q1', 'YEAR_Q2', 'YEAR_Q3', 'MONTH_Q4', 'MORNING', 'NOON',
       'AFTERNOON', 'NIGHT', 'MIDNIGHT', 'MONTH_1', 'MONTH_2', 'MONTH_3',
       'rounds_1', 'rounds_2', 'rounds_3', 'YEAR', 'post'],
      dtype='object')

In [14]:
ct.rename(columns={"S_DRINKING+":"S_DRINKING","S_DRUG+":"S_DRUG"},inplace=True)

In [15]:
mod2 = smf.ols(formula="SENIOR ~ post+post*rounds_1+S_DRINKING+S_DRUG+WEATHER_GOOD+\
        WEATHER_RAIN+WEATHER_SLEET+WEATHER_SNOW+WEATHER_FOG+WEATHER_CLOUDY+LGT_COND_DAYLIGHT\
       +LGT_COND_DARK_NOT_LIGHTED+LGT_COND_DARK_LIGHTED+LGT_COND_DAWN+\
       LGT_COND_DUSK+LGT_COND_DARK_UNKNOWN_LIGHTING+WEEKDAY+\
       YEAR_Q1+YEAR_Q2+YEAR_Q3+NOON+\
       AFTERNOON+NIGHT+rounds_2+rounds_3", data = ct).fit()
print(mod2.summary())

                            OLS Regression Results                            
Dep. Variable:                 SENIOR   R-squared:                       0.477
Model:                            OLS   Adj. R-squared:                  0.476
Method:                 Least Squares   F-statistic:                     526.8
Date:                Wed, 17 Jul 2019   Prob (F-statistic):               0.00
Time:                        08:32:53   Log-Likelihood:                -2543.3
No. Observations:               15028   AIC:                             5141.
Df Residuals:                   15001   BIC:                             5346.
Df Model:                          26                                         
Covariance Type:            nonrobust                                         
                                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------
Intercept   

- R-squared: 0.477
- Coefficient of interaction term: negative
- Significance of interaction term: yes

In [16]:
mod3 = smf.ols(formula="SENIOR ~ post+post*rounds_1+S_DRINKING+S_DRUG+WEATHER_GOOD+\
        WEATHER_RAIN+WEATHER_SLEET+WEATHER_SNOW+WEATHER_FOG+WEATHER_CLOUDY+LGT_COND_DAYLIGHT\
       +LGT_COND_DARK_NOT_LIGHTED+LGT_COND_DARK_LIGHTED+LGT_COND_DAWN+\
       LGT_COND_DUSK+LGT_COND_DARK_UNKNOWN_LIGHTING+WEEKDAY+\
       YEAR_Q1+YEAR_Q2+YEAR_Q3+MORNING+NOON+\
       AFTERNOON+NIGHT+rounds_2+rounds_3+S_MALE", data = ct).fit()
print(mod3.summary())

                            OLS Regression Results                            
Dep. Variable:                 SENIOR   R-squared:                       0.686
Model:                            OLS   Adj. R-squared:                  0.685
Method:                 Least Squares   F-statistic:                     1169.
Date:                Wed, 17 Jul 2019   Prob (F-statistic):               0.00
Time:                        08:33:00   Log-Likelihood:                 1281.6
No. Observations:               15028   AIC:                            -2505.
Df Residuals:                   14999   BIC:                            -2284.
Df Model:                          28                                         
Covariance Type:            nonrobust                                         
                                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------
Intercept   

- R-squared: 0.686
- Coefficient of interaction term: negative
- Significance of interaction term: no

# Y = Number of dead senior pedestrain

In [17]:
mod4 = smf.ols(formula="S_PED_DEAD ~ post+post*rounds_1", data = ct).fit()
print(mod4.summary())

                            OLS Regression Results                            
Dep. Variable:             S_PED_DEAD   R-squared:                       0.068
Model:                            OLS   Adj. R-squared:                  0.068
Method:                 Least Squares   F-statistic:                     367.3
Date:                Wed, 17 Jul 2019   Prob (F-statistic):          3.04e-230
Time:                        08:33:02   Log-Likelihood:                -108.54
No. Observations:               15028   AIC:                             225.1
Df Residuals:                   15024   BIC:                             255.6
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------
Intercept         0.0589      0.003     20.133

- R-squared: 0.068
- Coefficient of interaction term: negative
- Significance of interaction term: yes

In [18]:
mod5 = smf.ols(formula="S_PED_DEAD ~ post+post*rounds_1+S_DRINKING+S_DRUG+WEATHER_GOOD+\
        WEATHER_RAIN+WEATHER_SLEET+WEATHER_SNOW+WEATHER_FOG+WEATHER_CLOUDY+LGT_COND_DAYLIGHT\
       +LGT_COND_DARK_NOT_LIGHTED+LGT_COND_DARK_LIGHTED+LGT_COND_DAWN+\
       LGT_COND_DUSK+LGT_COND_DARK_UNKNOWN_LIGHTING+WEEKDAY+\
       YEAR_Q1+YEAR_Q2+YEAR_Q3+MORNING+NOON+\
       AFTERNOON+NIGHT", data = ct).fit()
print(mod5.summary())

                            OLS Regression Results                            
Dep. Variable:             S_PED_DEAD   R-squared:                       0.436
Model:                            OLS   Adj. R-squared:                  0.435
Method:                 Least Squares   F-statistic:                     464.1
Date:                Wed, 17 Jul 2019   Prob (F-statistic):               0.00
Time:                        08:33:03   Log-Likelihood:                 3664.3
No. Observations:               15028   AIC:                            -7277.
Df Residuals:                   15002   BIC:                            -7078.
Df Model:                          25                                         
Covariance Type:            nonrobust                                         
                                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------
Intercept   

- R-squared: 0.436
- Coefficient of interaction term: negative
- Significance of interaction term: yes

In [19]:
mod6 = smf.ols(formula="S_PED_DEAD ~ post+post*rounds_1+S_DRINKING+S_DRUG+WEATHER_GOOD+\
        WEATHER_RAIN+WEATHER_SLEET+WEATHER_SNOW+WEATHER_FOG+WEATHER_CLOUDY+LGT_COND_DAYLIGHT\
       +LGT_COND_DARK_NOT_LIGHTED+LGT_COND_DARK_LIGHTED+LGT_COND_DAWN+\
       LGT_COND_DUSK+LGT_COND_DARK_UNKNOWN_LIGHTING+WEEKDAY+\
       YEAR_Q1+YEAR_Q2+YEAR_Q3+MORNING+NOON+\
       AFTERNOON+NIGHT+rounds_2+rounds_3", data = ct).fit()
print(mod6.summary())

                            OLS Regression Results                            
Dep. Variable:             S_PED_DEAD   R-squared:                       0.445
Model:                            OLS   Adj. R-squared:                  0.444
Method:                 Least Squares   F-statistic:                     445.3
Date:                Wed, 17 Jul 2019   Prob (F-statistic):               0.00
Time:                        08:33:05   Log-Likelihood:                 3782.3
No. Observations:               15028   AIC:                            -7509.
Df Residuals:                   15000   BIC:                            -7295.
Df Model:                          27                                         
Covariance Type:            nonrobust                                         
                                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------
Intercept   

- R-squared: 0.445
- Coefficient of interaction term: negative
- Significance of interaction term: yes

In [20]:
mod7 = smf.ols(formula="S_PED_DEAD ~ post+post*rounds_1+S_DRINKING+S_DRUG+WEATHER_GOOD+\
        WEATHER_RAIN+WEATHER_SLEET+WEATHER_SNOW+WEATHER_FOG+WEATHER_CLOUDY+LGT_COND_DAYLIGHT\
       +LGT_COND_DARK_NOT_LIGHTED+LGT_COND_DARK_LIGHTED+LGT_COND_DAWN+\
       LGT_COND_DUSK+LGT_COND_DARK_UNKNOWN_LIGHTING+WEEKDAY+\
       YEAR_Q1+YEAR_Q2+YEAR_Q3+MORNING+NOON+\
       AFTERNOON+NIGHT+MONTH_1+MONTH_2+rounds_2+rounds_3+S_MALE", data = ct).fit()
print(mod7.summary())

                            OLS Regression Results                            
Dep. Variable:             S_PED_DEAD   R-squared:                       0.572
Model:                            OLS   Adj. R-squared:                  0.571
Method:                 Least Squares   F-statistic:                     667.6
Date:                Wed, 17 Jul 2019   Prob (F-statistic):               0.00
Time:                        08:33:06   Log-Likelihood:                 5733.0
No. Observations:               15028   AIC:                        -1.140e+04
Df Residuals:                   14997   BIC:                        -1.117e+04
Df Model:                          30                                         
Covariance Type:            nonrobust                                         
                                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------
Intercept   

- R-squared: 0.572
- Coefficient of interaction term: negative
- Significance of interaction term: yes

 # Y = Number of dead senior

In [21]:
mod8 = smf.ols(formula="S_DEAD ~ post+post*rounds_1", data = ct).fit()
print(mod8.summary())

                            OLS Regression Results                            
Dep. Variable:                 S_DEAD   R-squared:                       0.076
Model:                            OLS   Adj. R-squared:                  0.076
Method:                 Least Squares   F-statistic:                     411.7
Date:                Wed, 17 Jul 2019   Prob (F-statistic):          5.09e-257
Time:                        08:33:08   Log-Likelihood:                -4485.8
No. Observations:               15028   AIC:                             8980.
Df Residuals:                   15024   BIC:                             9010.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------
Intercept         0.1036      0.004     26.484

- R-squared: 0.076
- Coefficient of interaction term: negative
- Significance of interaction term: yes

In [22]:
mod9 = smf.ols(formula="S_DEAD ~ post+post*rounds_1+S_DRINKING+S_DRUG+WEATHER_GOOD+\
        WEATHER_RAIN+WEATHER_SLEET+WEATHER_SNOW+WEATHER_FOG+WEATHER_CLOUDY+LGT_COND_DAYLIGHT\
       +LGT_COND_DARK_NOT_LIGHTED+LGT_COND_DARK_LIGHTED+LGT_COND_DAWN+\
       LGT_COND_DUSK+LGT_COND_DARK_UNKNOWN_LIGHTING+WEEKDAY+\
       YEAR_Q1+YEAR_Q2+YEAR_Q3+MORNING+NOON+\
       AFTERNOON+NIGHT", data = ct).fit()
print(mod9.summary())

                            OLS Regression Results                            
Dep. Variable:                 S_DEAD   R-squared:                       0.436
Model:                            OLS   Adj. R-squared:                  0.435
Method:                 Least Squares   F-statistic:                     463.7
Date:                Wed, 17 Jul 2019   Prob (F-statistic):               0.00
Time:                        08:33:12   Log-Likelihood:                -777.53
No. Observations:               15028   AIC:                             1607.
Df Residuals:                   15002   BIC:                             1805.
Df Model:                          25                                         
Covariance Type:            nonrobust                                         
                                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------
Intercept   

- R-squared: 0.436
- Coefficient of interaction term: negative
- Significance of interaction term: yes

In [23]:
mod10 = smf.ols(formula="S_DEAD ~ post+post*rounds_1+S_DRINKING+S_DRUG+WEATHER_GOOD+\
        WEATHER_RAIN+WEATHER_SLEET+WEATHER_SNOW+WEATHER_FOG+WEATHER_CLOUDY+LGT_COND_DAYLIGHT\
       +LGT_COND_DARK_NOT_LIGHTED+LGT_COND_DARK_LIGHTED+LGT_COND_DAWN+\
       LGT_COND_DUSK+LGT_COND_DARK_UNKNOWN_LIGHTING+WEEKDAY+\
       YEAR_Q1+YEAR_Q2+YEAR_Q3+MORNING+NOON+\
       AFTERNOON+NIGHT+rounds_2+rounds_3", data = ct).fit()
print(mod10.summary())

                            OLS Regression Results                            
Dep. Variable:                 S_DEAD   R-squared:                       0.441
Model:                            OLS   Adj. R-squared:                  0.440
Method:                 Least Squares   F-statistic:                     438.3
Date:                Wed, 17 Jul 2019   Prob (F-statistic):               0.00
Time:                        08:33:16   Log-Likelihood:                -708.81
No. Observations:               15028   AIC:                             1474.
Df Residuals:                   15000   BIC:                             1687.
Df Model:                          27                                         
Covariance Type:            nonrobust                                         
                                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------
Intercept   

- R-squared: 0.441
- Coefficient of interaction term: negative
- Significance of interaction term: yes

In [24]:
mod11 = smf.ols(formula="S_DEAD ~ post+post*rounds_1+S_DRINKING+S_DRUG+WEATHER_GOOD+\
        WEATHER_RAIN+WEATHER_SLEET+WEATHER_SNOW+WEATHER_FOG+WEATHER_CLOUDY+LGT_COND_DAYLIGHT\
       +LGT_COND_DARK_NOT_LIGHTED+LGT_COND_DARK_LIGHTED+LGT_COND_DAWN+\
       LGT_COND_DUSK+LGT_COND_DARK_UNKNOWN_LIGHTING+WEEKDAY+\
       YEAR_Q1+YEAR_Q2+YEAR_Q3+MORNING+NOON+\
       AFTERNOON+NIGHT+rounds_2+rounds_3+S_MALE", data = ct).fit()
print(mod11.summary())

                            OLS Regression Results                            
Dep. Variable:                 S_DEAD   R-squared:                       0.603
Model:                            OLS   Adj. R-squared:                  0.602
Method:                 Least Squares   F-statistic:                     813.7
Date:                Wed, 17 Jul 2019   Prob (F-statistic):               0.00
Time:                        08:33:20   Log-Likelihood:                 1862.4
No. Observations:               15028   AIC:                            -3667.
Df Residuals:                   14999   BIC:                            -3446.
Df Model:                          28                                         
Covariance Type:            nonrobust                                         
                                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------
Intercept   

- R-squared: 0.603
- Coefficient of interaction term: negative
- Significance of interaction term: yes

In [25]:
ct.describe().to_csv("ct_describe.csv")

In [28]:
des = ct.describe()

In [29]:
des.columns

Index(['SENIOR', 'S_PED', 'S_NOT_PED', 'S_PED_DEAD', 'S_SURVIVED', 'S_DEAD',
       'S_DRINKING', 'S_DRINKING-', 'S_MALE', 'S_FEMALE', 'S_DRUG', 'S_DRUG-',
       'S_DEAD_AFTER', 'S_DEAD_SCENE', 'S_DEAD_ROUTE', 'YOUNG', 'Y_PED',
       'Y_NOT_PED', 'Y_SURVIVED', 'Y_DEAD', 'Y_PED_DEAD', 'Y_DRINKING+',
       'Y_DRINKING-', 'Y_MALE', 'Y_FEMALE', 'Y_DRUG+', 'Y_DRUG-',
       'Y_DEAD_AFTER', 'Y_DEAD_SCENE', 'Y_DEAD_ROUTE', 'FATALS',
       'WEATHER_GOOD', 'WEATHER_RAIN', 'WEATHER_SLEET', 'WEATHER_SNOW',
       'WEATHER_FOG', 'WEATHER_CLOUDY', 'LGT_COND_DAYLIGHT',
       'LGT_COND_DARK_NOT_LIGHTED', 'LGT_COND_DARK_LIGHTED', 'LGT_COND_DAWN',
       'LGT_COND_DUSK', 'LGT_COND_DARK_UNKNOWN_LIGHTING', 'WEEKDAY', 'WEEKEND',
       'YEAR_Q1', 'YEAR_Q2', 'YEAR_Q3', 'MONTH_Q4', 'MORNING', 'NOON',
       'AFTERNOON', 'NIGHT', 'MIDNIGHT', 'MONTH_1', 'MONTH_2', 'MONTH_3',
       'rounds_1', 'rounds_2', 'rounds_3', 'YEAR', 'post'],
      dtype='object')

In [30]:
des = des[["SENIOR","S_PED","S_PED_DEAD","S_DEAD","S_DRINKING","S_MALE","S_DRUG","WEATHER_GOOD","WEATHER_RAIN",
          "WEATHER_SNOW","WEATHER_FOG","WEATHER_CLOUDY","LGT_COND_DAYLIGHT","LGT_COND_DARK_LIGHTED","LGT_COND_DAWN",
          "WEEKDAY","YEAR_Q1","YEAR_Q2","YEAR_Q3","MORNING","NOON","AFTERNOON","NIGHT","MIDNIGHT","rounds_1",
           "rounds_2","rounds_3"]]

In [33]:
des = des.T

In [27]:
des.drop(columns={"count","min","25%","50%","75%","max"},axis=1,inplace)

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
SENIOR,15028.0,0.106268,0.396404,0.0,0.0,0.0,0.0,6.0
S_PED,15028.0,0.061951,0.256067,0.0,0.0,0.0,0.0,3.0
S_NOT_PED,15028.0,0.044317,0.246960,0.0,0.0,0.0,0.0,6.0
S_PED_DEAD,15028.0,0.060953,0.252513,0.0,0.0,0.0,0.0,3.0
S_SURVIVED,15028.0,0.021826,0.172055,0.0,0.0,0.0,0.0,4.0
S_DEAD,15028.0,0.084442,0.339287,0.0,0.0,0.0,0.0,6.0
S_DRINKING,15028.0,0.000998,0.031579,0.0,0.0,0.0,0.0,1.0
S_DRINKING-,15028.0,0.014573,0.135478,0.0,0.0,0.0,0.0,3.0
S_MALE,15028.0,0.047711,0.243474,0.0,0.0,0.0,0.0,4.0
S_FEMALE,15028.0,0.036133,0.205303,0.0,0.0,0.0,0.0,4.0
