# The Associated Press and Life Expectancy

**Story:** [AP analysis: Unemployment, income affect life expectancy](https://www.apnews.com/66ac44186b6249709501f07a7eab36da)

**Author:** Nicky Forster, Associated Press

**Topics:** Census Data, Linear Regression

**Datasets**

* **R12221544_SL140.csv:** ACS 2015 5-year, tract level, from [Social Explorer](https://www.socialexplorer.com)
    - Table B23025: Employment Status
    - **R12221544.txt** is the data dictionary
* **R12221550_SL140.csv:** ACS 2015 5-year, tract level, from [Social Explorer](https://www.socialexplorer.com)
    - Table B23025: Employment Status
    - Table B06009: Educational Attainment
    - Table B03002: Race
    - Table B19013: Median income
    - Table C17002: Ratio of income to poverty level
    - **R12221544.txt** is the data dictionary
* **US_A.CSV:** life expectancy by census tract, from [USALEEP](https://www.cdc.gov/nchs/nvss/usaleep/usaleep.html)
    - **Record_Layout_CensusTract_Life_Expectancy.pdf** is the data dictionary

# What's the story?

We're trying to figure out how the **life expectancy in a census tract** is related to factors such as unemployment, income, education, race, and poverty.

# PREPWORK BONUS!

Download the data yourself from Social Explorer and USALEEP (linked above) instead of relying on the data included.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import numpy as np
from statsmodels.sandbox.regression.predstd import wls_prediction_std

In [2]:
pd.options.display.max_columns = 200
# pd.reset_option('display.max_columns')

## Reading in our data

### Read in `US_A.CSV`

Rename any columns with cryptic names to something more descriptive.

In [3]:
lifeexp_df = pd.read_csv('sources/US_A.csv', dtype={'Tract ID':str,'STATE2KX':str,'CNTY2KX':str,'TRACT2KX':str})
lifeexp_df.rename(columns={'Tract ID':'FIPS',
                           'STATE2KX':'StateFIPS',
                           'CNTY2KX':'CountyFIPS',
                           'TRACT2KX':'Tract',
                           'e(0)':'Life_Exp',
                           'se(e(0))':'Life_Exp_StdError',
                           'Abridged life table flag':'Life_Table_Flag'
                          }, inplace=True)
lifeexp_df = lifeexp_df.drop(['StateFIPS','CountyFIPS','Tract','Life_Exp_StdError','Life_Table_Flag'], axis=1)
lifeexp_df.sample(10)

Unnamed: 0,FIPS,Life_Exp
52621,42079216900,84.6
60928,48453001823,79.2
64813,53061053604,79.2
52515,42077006401,77.9
36994,33011018501,77.5
41298,36055007500,73.8
25740,21157950600,74.3
3106,5131000200,72.2
26796,22087030602,74.2
9234,6075047901,82.2


### Open `R12221544_SL140.csv`

You'll need to give an option to `pd.read_csv` to make sure it's read in successfully.

In [4]:
columns = ['Geo_FIPS',
           'Geo_STUSAB',
           'Geo_NAME',
           'ACS15_5yr_B23025001',
           'ACS15_5yr_B23025002',
           'ACS15_5yr_B23025003',
           'ACS15_5yr_B23025004',
           'ACS15_5yr_B23025005',
           'ACS15_5yr_B23025006',
           'ACS15_5yr_B23025007']
workforce_df = pd.read_csv('sources/R12221544_SL140.csv', encoding='latin-1', usecols=columns, dtype={'Geo_FIPS':str})
workforce_df.rename(columns={'Geo_FIPS':'FIPS',
                             'Geo_STUSAB':'State',
                             'Geo_NAME':'County',
                             'ACS15_5yr_B23025001':'Labor_Total',
                             'ACS15_5yr_B23025002':'Labor_Force',
                             'ACS15_5yr_B23025003':'LF_Civilian',
                             'ACS15_5yr_B23025004':'LF_C_Employed',
                             'ACS15_5yr_B23025005':'LF_C_Unemployed',
                             'ACS15_5yr_B23025006':'LF_Armed_Forces',
                             'ACS15_5yr_B23025007':'Non_Labor_Force'
                            }, inplace=True)
workforce_df.County = workforce_df.County.str.extract(r', (.*), ', expand=True)
workforce_df.State = workforce_df.State.str.upper()
workforce_df.sample(15)

Unnamed: 0,FIPS,County,State,Labor_Total,Labor_Force,LF_Civilian,LF_C_Employed,LF_C_Unemployed,LF_Armed_Forces,Non_Labor_Force
41256,34013001400,Essex County,NJ,1947,1228,1228,916,312,0,719
64555,48201533902,Harris County,TX,3143,1927,1927,1618,309,0,1216
57396,42087960400,Mifflin County,PA,2552,1482,1482,1346,136,0,1070
51102,39035141300,Cuyahoga County,OH,2468,1596,1596,1477,119,0,872
42489,34035052603,Somerset County,NJ,3514,2167,2167,2100,67,0,1347
10784,6085511500,Santa Clara County,CA,6434,4319,4319,4217,102,0,2115
62509,48085031405,Collin County,TX,14374,9996,9996,9580,416,0,4378
43629,36007000900,Broome County,NY,1329,699,699,664,35,0,630
67521,50007000900,Chittenden County,VT,2435,1551,1551,1433,118,0,884
2662,4019004642,Pima County,AZ,2749,1144,1144,1016,128,0,1605


#### Filter out any columns we aren't interested in

#### Create a new column for percent unemployment

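The assignment suggests using the total population in the census tract as the baseline for employment. The cell below instead divides the civilian unemployed count by the labor force (`Labor_Force`, B23025002), so the result is a share of the labor force stored as a fraction between 0 and 1.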

In [5]:
workforce_df['Unemployment_Rate'] = workforce_df.LF_C_Unemployed / workforce_df.Labor_Force
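
A tract with an empty labor force has no defined rate. The sketch below uses made-up counts (not rows from the ACS file) to show that pandas returns `NaN` for such a tract, which is presumably why `Unemployment_Rate` is included in the `dropna` call before the regression.

```python
import pandas as pd

# Toy tract counts: invented numbers, not taken from R12221544_SL140.csv
toy = pd.DataFrame({
    "FIPS": ["00000000001", "00000000002", "00000000003"],
    "Labor_Force": [1000, 0, 250],
    "LF_C_Unemployed": [80, 0, 25],
})

# Same calculation as the notebook cell: unemployed / labor force
toy["Unemployment_Rate"] = toy["LF_C_Unemployed"] / toy["Labor_Force"]

# The tract with zero labor force divides 0 by 0 and comes out as NaN
print(toy)
print("Tracts with undefined rate:", int(toy["Unemployment_Rate"].isna().sum()))
```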

## Merging the data

Merge the dataframes together based on their census tract.

In [6]:
print(f'Work Force table: {workforce_df.shape} and Life Expectancy table: {lifeexp_df.shape}')

Work Force table: (74001, 11) and Life Expectancy table: (65662, 2)


In [7]:
workforce_df = workforce_df.merge(lifeexp_df, how='left', left_on='FIPS', right_on='FIPS')

In [8]:
workforce_df.sample(10)

Unnamed: 0,FIPS,County,State,Labor_Total,Labor_Force,LF_Civilian,LF_C_Employed,LF_C_Unemployed,LF_Armed_Forces,Non_Labor_Force,Unemployment_Rate,Life_Exp
58666,44007002900,Providence County,RI,5355,3342,3342,3076,266,0,2013,0.079593,76.7
3688,6001422300,Alameda County,CA,3421,2302,2302,2148,154,0,1119,0.066898,85.7
35981,27053024200,Hennepin County,MN,2570,2001,1995,1934,61,6,569,0.030485,80.6
67113,49035112404,Salt Lake County,UT,3471,2720,2699,2402,297,21,751,0.109191,76.9
71818,55035000200,Eau Claire County,WI,3764,2495,2495,2409,86,0,1269,0.034469,
54522,41003000900,Benton County,OR,4712,3031,3031,2718,313,0,1681,0.103266,87.6
45966,36065023300,Oneida County,NY,2197,1581,1581,1409,172,0,616,0.108792,81.2
44414,36047016600,Kings County,NY,1696,1015,1015,917,98,0,681,0.096552,81.0
8937,6071002015,San Bernardino County,CA,3895,2594,2594,2334,260,0,1301,0.100231,78.2
33373,26073000800,Isabella County,MI,3109,2189,2189,1933,256,0,920,0.116948,
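
The blank `Life_Exp` values in the sample are tracts that had no match in the life expectancy table, which is expected with a left merge of 74,001 tracts against 65,662. One way to count them is the `indicator` option of `DataFrame.merge`. The sketch below runs on toy frames with invented FIPS codes and values; `validate="one_to_one"` is an extra safety check that assumes each tract appears only once per table.

```python
import pandas as pd

# Toy stand-ins for workforce_df and lifeexp_df (invented FIPS codes and values)
left = pd.DataFrame({
    "FIPS": ["01001020100", "01001020200", "01001020300"],
    "Unemployment_Rate": [0.05, 0.10, 0.07],
})
right = pd.DataFrame({
    "FIPS": ["01001020100", "01001020300"],
    "Life_Exp": [78.1, 80.4],
})

# indicator=True adds a _merge column saying where each row came from;
# validate raises an error if a FIPS code is duplicated on either side
merged = left.merge(right, how="left", on="FIPS",
                    indicator=True, validate="one_to_one")

match_counts = merged["_merge"].value_counts()
unmatched = merged.loc[merged["_merge"] == "left_only", "FIPS"].tolist()

print(match_counts)
print("Tracts without a life expectancy estimate:", unmatched)
```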


## Running the regression

Using the `statsmodels` package, run a linear regression to find the coefficient relating unemployment and life expectancy.

In [9]:
workforce_df = workforce_df.dropna(subset=['Life_Exp','Unemployment_Rate'])

In [10]:
X = workforce_df[['Unemployment_Rate']]
X = sm.add_constant(X)
Y = workforce_df.Life_Exp               

mod = sm.OLS(Y,X)
res = mod.fit()
res.summary()


0,1,2,3
Dep. Variable:,Life_Exp,R-squared:,0.219
Model:,OLS,Adj. R-squared:,0.219
Method:,Least Squares,F-statistic:,18460.0
Date:,"Sat, 20 Jul 2019",Prob (F-statistic):,0.0
Time:,00:02:31,Log-Likelihood:,-176050.0
No. Observations:,65662,AIC:,352100.0
Df Residuals:,65660,BIC:,352100.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,81.2989,0.026,3120.604,0.000,81.248,81.350
Unemployment_Rate,-33.8243,0.249,-135.870,0.000,-34.312,-33.336

0,1,2,3
Omnibus:,488.287,Durbin-Watson:,1.189
Prob(Omnibus):,0.0,Jarque-Bera (JB):,721.801
Skew:,-0.068,Prob(JB):,1.83e-157
Kurtosis:,3.496,Cond. No.,18.2


Translate that into the form **"every X percentage point change in unemployment translates to a Y change in life expectancy"**
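
One possible reading, as a sketch rather than the AP's own wording: `Unemployment_Rate` is stored as a fraction, so the coefficient of -33.8243 describes a move from 0 to 1, that is, 100 percentage points. Dividing by 100 gives the change per percentage point. The helper name `per_points` is made up for this example, and the result describes an association across tracts, not a causal effect.

```python
# Coefficient and 95% interval copied from the regression summary above
coef_unemployment = -33.8243
ci_low, ci_high = -34.312, -33.336


def per_points(coef, points):
    """Scale a per-unit coefficient (unit = 100 percentage points)
    to a change of `points` percentage points."""
    return coef * points / 100


one_point = per_points(coef_unemployment, 1)
ten_points = per_points(coef_unemployment, 10)

print(f"1 point more unemployment: {one_point:.2f} years of life expectancy")
print(f"  (95% interval: {per_points(ci_low, 1):.2f} to {per_points(ci_high, 1):.2f})")
print(f"  that is about {abs(one_point) * 12:.1f} months")
print(f"10 points more unemployment: {ten_points:.2f} years")
```

In words: in this single-variable model, every 1 percentage point increase in a tract's unemployment rate is associated with roughly a third of a year (about four months) less life expectancy.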

## Bringing more columns into the mix

Looking only at unemployment is a narrow view, so let's expand our reach a bit.

### Read in `R12221550_SL140.csv`

It's also from the Census, and has many, many, many more columns available to you compared to the last dataset.

In [11]:
columns = ['Geo_FIPS',
           'Geo_STUSAB',
           'Geo_NAME',
           'ACS15_5yr_B03002001',
           'ACS15_5yr_B03002003',
           'ACS15_5yr_B03002013',
           'ACS15_5yr_B03002004',
           'ACS15_5yr_B03002014',
           'ACS15_5yr_B03002012',
           'ACS15_5yr_B06009001',
           'ACS15_5yr_B06009002',
           'ACS15_5yr_B06009003',
           'ACS15_5yr_B06009004',
           'ACS15_5yr_B06009005',
           'ACS15_5yr_B06009006',
           'ACS15_5yr_C17002001',
           'ACS15_5yr_C17002002',
           'ACS15_5yr_C17002003',
           'ACS15_5yr_C17002004',
           'ACS15_5yr_C17002005',
           'ACS15_5yr_C17002006',
           'ACS15_5yr_C17002007',
           'ACS15_5yr_C17002008',
           'ACS15_5yr_B19013001',
           'ACS15_5yr_B23025002',
           'ACS15_5yr_B23025005']
workEthEdu_df = pd.read_csv('sources/R12221550_SL140.csv', encoding='latin-1', usecols=columns, dtype={'Geo_FIPS':str})
workEthEdu_df.rename(columns={'Geo_FIPS':'FIPS',
                              'Geo_STUSAB':'State',
                              'Geo_NAME':'County',
                              'ACS15_5yr_B03002001':'Eth_Total',
                              'ACS15_5yr_B03002003':'Eth_White_NonHisp',
                              'ACS15_5yr_B03002013':'Eth_White_Hisp',
                              'ACS15_5yr_B03002004':'Eth_Black_NonHisp',
                              'ACS15_5yr_B03002014':'Eth_Black_Hisp',
                              'ACS15_5yr_B03002012':'Eth_Hispanic_Total',
                              'ACS15_5yr_B06009001':'Edu_Total',
                              'ACS15_5yr_B06009002':'Edu_Lessser',
                              'ACS15_5yr_B06009003':'Edu_HighSchool',
                              'ACS15_5yr_B06009004':'Edu_SomeCollege',
                              'ACS15_5yr_B06009005':'Edu_Bachelor',
                              'ACS15_5yr_B06009006':'Edu_Graduate',
                              'ACS15_5yr_C17002001':'Poverty_Total',
                              'ACS15_5yr_C17002002':'Pov_0p00_0p50',
                              'ACS15_5yr_C17002003':'Pov_0p50_0p99',
                              'ACS15_5yr_C17002004':'Pov_1p00_1p24',
                              'ACS15_5yr_C17002005':'Pov_1p25_1p49',
                              'ACS15_5yr_C17002006':'Pov_1p50_1p84',
                              'ACS15_5yr_C17002007':'Pov_1p85_1p99',
                              'ACS15_5yr_C17002008':'Pov_2p00_plus',
                              'ACS15_5yr_B19013001':'Median_Household_Income',
                              'ACS15_5yr_B23025002':'Labor_Force',
                              'ACS15_5yr_B23025005':'Labor_Unemployed'
                             }, inplace=True)
workEthEdu_df.County = workEthEdu_df.County.str.extract(r', (.*), ', expand=True)
workEthEdu_df.State = workEthEdu_df.State.str.upper()
workEthEdu_df.sample(10)

Unnamed: 0,FIPS,County,State,Eth_Total,Eth_White_NonHisp,Eth_Black_NonHisp,Eth_Hispanic_Total,Eth_White_Hisp,Eth_Black_Hisp,Edu_Total,Edu_Lessser,Edu_HighSchool,Edu_SomeCollege,Edu_Bachelor,Edu_Graduate,Poverty_Total,Pov_0p00_0p50,Pov_0p50_0p99,Pov_1p00_1p24,Pov_1p25_1p49,Pov_1p50_1p84,Pov_1p85_1p99,Pov_2p00_plus,Median_Household_Income,Labor_Force,Labor_Unemployed
66102,48439113002,Tarrant County,TX,7494,883,2645,3558,3087,0,4297.0,1003.0,1334.0,1277.0,503.0,180.0,7414,796,651,373,1046,1163,195,3190,36566.0,3866,260
9007,6071003607,San Bernardino County,CA,5474,738,445,4248,3440,22,3106.0,1283.0,766.0,799.0,145.0,113.0,5468,499,537,268,648,753,423,2340,45300.0,2413,392
14343,12011010610,Broward County,FL,6134,3765,458,1697,1535,0,4116.0,296.0,1766.0,1192.0,539.0,323.0,6045,217,677,267,232,1182,33,3437,50563.0,3224,376
44443,36047019700,Kings County,NY,3508,1569,1237,233,110,55,2769.0,48.0,282.0,446.0,949.0,1044.0,3498,193,300,58,41,70,67,2769,83650.0,2049,221
35566,27003050807,Anoka County,MN,3843,2658,377,225,213,0,2448.0,330.0,912.0,899.0,214.0,93.0,3834,178,374,305,42,443,126,2366,59145.0,1869,174
40895,34003047100,Bergen County,NJ,5873,4063,35,599,515,11,3915.0,91.0,418.0,623.0,1479.0,1304.0,5779,48,41,0,92,79,49,5470,153819.0,3074,98
18327,12131950400,Walton County,FL,3178,2652,222,59,59,0,1966.0,481.0,662.0,583.0,173.0,67.0,3107,434,583,215,156,358,0,1361,32600.0,1150,98
46180,36071001200,Orange County,NY,2815,408,505,1696,356,34,1619.0,556.0,543.0,348.0,115.0,57.0,2801,414,512,283,318,213,117,944,38889.0,1286,139
7062,6037980019,Los Angeles County,CA,214,143,8,40,40,0,143.0,0.0,0.0,55.0,20.0,68.0,214,0,0,0,0,0,0,214,146875.0,106,10
70887,53067010800,Thurston County,WA,6497,5554,31,336,321,0,4582.0,317.0,925.0,2035.0,811.0,494.0,6497,428,251,468,312,561,113,4364,52849.0,3269,184


Using this census data, create a new dataframe that includes the following columns:

* Percent unemployed
* Percents Black, White, and Hispanic
* Median Income (in increments of 10,000 dollars)
* Percent of the population with less than a high school education
* Percent of the population between 1-1.5x the poverty line

If you have to make any editorial decisions about which columns you choose or how you do your math, please explain them.

In [12]:
workEthEdu_df.Median_Household_Income.describe()

count     72939.000000
mean      57302.244725
std       28935.010119
min        2499.000000
25%       37685.000000
50%       51098.000000
75%       70147.000000
max      250001.000000
Name: Median_Household_Income, dtype: float64

In [13]:
buckets = [0,10000,20000,30000,40000,50000,60000,70000,80000,90000,
           100000,110000,120000,130000,140000,150000,160000,170000,180000,190000,
           200000,210000,220000,230000,240000,250002]
brackets = ['0-10k','10k-20k','20k-30k','30k-40k','40k-50k','50k-60k','60k-70k','70k-80k','80k-90k','90k-100k',
           '100-110k','110k-120k','120k-130k','130k-140k','140k-150k','150k-160k','160k-170k','170k-180k','180k-190k','190k-200k',
           '200-210k','210k-220k','220k-230k','230k-240k','240k-250k']
buckLimit = [10000,20000,30000,40000,50000,60000,70000,80000,90000,100000,
           110000,120000,130000,140000,150000,160000,170000,180000,190000,200000,
           210000,220000,230000,240000,250000]
buckLim10 = [10,20,30,40,50,60,70,80,90,100,
           110,120,130,140,150,160,170,180,190,200,
           210,220,230,240,250]
workEthEdu_df['Income_Group'] = pd.cut(workEthEdu_df.Median_Household_Income, buckets, labels=brackets)
workEthEdu_df['Income_Under'] = pd.cut(workEthEdu_df.Median_Household_Income, buckets, labels=buckLim10)
workEthEdu_df.Income_Under = workEthEdu_df.Income_Under.astype(int)
workEthEdu_df[['Median_Household_Income','Income_Group','Income_Under']].sample(5)

Unnamed: 0,Median_Household_Income,Income_Group,Income_Under
56159,55362.0,50k-60k,60
65830,30287.0,30k-40k,40
57923,22279.0,20k-30k,30
28079,27159.0,20k-30k,30
60708,35396.0,30k-40k,40
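
The bucketing above relies on `pd.cut` treating each bin as open on the left and closed on the right, which is also why the last edge is 250002 rather than 250000: the top-coded value 250001 has to fall inside the final bin. The sketch below rebuilds the same edges with `range` and applies them to a few hand-picked incomes; the values and the names `edges` and `upper_k` are illustrative only.

```python
import pandas as pd

# A few hand-picked incomes, including the observed min (2499) and max (250001)
incomes = pd.Series([2499.0, 10000.0, 10001.0, 55362.0, 250001.0])

# Same 26 edges as `buckets`: 0, 10000, ..., 240000, then 250002
edges = list(range(0, 250000, 10000)) + [250002]

# Same 25 labels as `buckLim10`: the upper bound of each bin, in thousands
upper_k = list(range(10, 260, 10))

# Bins are (low, high], so exactly 10000 lands in the first bin
income_under = pd.cut(incomes, edges, labels=upper_k)

print(income_under.tolist())
```

With these inputs the labels come out as 10, 10, 20, 60 and 250, so `Income_Under` is the top of each tract's income bracket in thousands of dollars.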


In [14]:
workEthEdu_df.Income_Group.value_counts()

40k-50k      13660
30k-40k      12030
50k-60k      11554
60k-70k       8233
20k-30k       6812
70k-80k       5498
80k-90k       3971
90k-100k      2750
10k-20k       2180
100-110k      2097
110k-120k     1294
120k-130k      897
130k-140k      528
140k-150k      381
150k-160k      303
160k-170k      201
0-10k          130
170k-180k      121
180k-190k       75
200-210k        68
190k-200k       55
240k-250k       43
210k-220k       33
230k-240k       14
220k-230k       11
Name: Income_Group, dtype: int64

In [15]:
workEthEdu_df['Pop_pctWhite'] = (workEthEdu_df.Eth_White_NonHisp + workEthEdu_df.Eth_White_Hisp) / workEthEdu_df.Eth_Total
workEthEdu_df['Pop_pctBlack'] = (workEthEdu_df.Eth_Black_NonHisp + workEthEdu_df.Eth_Black_Hisp) / workEthEdu_df.Eth_Total
workEthEdu_df['Pop_pctHisp'] = workEthEdu_df.Eth_Hispanic_Total / workEthEdu_df.Eth_Total
workEthEdu_df['Pop_pctEduNoHS'] = workEthEdu_df.Edu_Lessser / workEthEdu_df.Edu_Total
workEthEdu_df['Pop_pctEduNoB'] = (workEthEdu_df.Edu_Lessser + workEthEdu_df.Edu_HighSchool + workEthEdu_df.Edu_SomeCollege) / workEthEdu_df.Edu_Total
workEthEdu_df['Pop_pctPovUnder'] =  (workEthEdu_df.Pov_0p00_0p50 + workEthEdu_df.Pov_0p50_0p99) / workEthEdu_df.Poverty_Total
workEthEdu_df['Pop_pctPov1p5'] = (workEthEdu_df.Pov_1p00_1p24 + workEthEdu_df.Pov_1p25_1p49)  / workEthEdu_df.Poverty_Total
workEthEdu_df['Unemployment_Rate'] = workEthEdu_df.Labor_Unemployed / workEthEdu_df.Labor_Force

### Join your datasets

Combine your life expectancy dataset with this census dataset to create a new dataframe.

In [16]:
merge_df = workEthEdu_df[['FIPS','State','County','Pop_pctWhite','Pop_pctBlack','Pop_pctHisp','Pop_pctEduNoHS','Pop_pctPov1p5','Unemployment_Rate','Income_Group','Income_Under']]
merge_df = merge_df.merge(lifeexp_df, how='left', left_on='FIPS', right_on='FIPS')

In [17]:
merge_df.sample(10)

Unnamed: 0,FIPS,State,County,Pop_pctWhite,Pop_pctBlack,Pop_pctHisp,Pop_pctEduNoHS,Pop_pctPov1p5,Unemployment_Rate,Income_Group,Income_Under,Life_Exp
17714,12105010702,FL,Polk County,0.916962,0.063239,0.202423,0.098091,0.12617,0.0,40k-50k,50,79.3
57293,42079215504,PA,Luzerne County,0.995752,0.0,0.0,0.068692,0.035427,0.056332,60k-70k,70,80.6
53921,40109103602,OK,Oklahoma County,0.6,0.293056,0.033333,0.281804,0.101408,0.238739,50k-60k,60,
26868,20155001200,KS,Reno County,0.915748,0.04252,0.02126,0.04148,0.024547,0.041165,50k-60k,60,76.4
53932,40109104700,OK,Oklahoma County,0.557925,0.037669,0.628998,0.407603,0.227434,0.094178,10k-20k,20,67.9
18135,12117021606,FL,Seminole County,0.780477,0.104904,0.244117,0.063895,0.134291,0.059368,40k-50k,50,79.8
31701,25013812901,MA,Hampden County,0.881859,0.0,0.019037,0.113626,0.037794,0.110892,60k-70k,70,82.0
25875,19085290500,IA,Harrison County,0.981694,0.0,0.0,0.080778,0.048106,0.032209,70k-80k,80,82.5
49465,37119005829,NC,Mecklenburg County,0.592657,0.34946,0.106263,0.057692,0.094643,0.080697,30k-40k,40,75.7
40015,32003004301,NV,Clark County,0.458414,0.063185,0.89813,0.604567,0.043843,0.20202,20k-30k,30,


## Running your multivariate regression

Using the `statsmodels` package and this new dataframe, run a multivariate linear regression to find the coefficients relating your columns to life expectancy.

In [18]:
merge_df.shape

(74001, 12)

In [19]:
merge_df.Pop_pctEduNoHS.value_counts(dropna=False).head()

NaN         1587
0.000000     381
0.333333      18
0.111111      18
0.083333      16
Name: Pop_pctEduNoHS, dtype: int64

In [20]:
merge_df = merge_df.dropna(axis=0, how='any')
merge_df.shape

(65656, 12)

In [21]:
X = merge_df[['Unemployment_Rate','Pop_pctPov1p5']]
X = sm.add_constant(X)
Y = merge_df.Life_Exp               

mod = sm.OLS(Y,X)
res = mod.fit()
res.summary()

0,1,2,3
Dep. Variable:,Life_Exp,R-squared:,0.305
Model:,OLS,Adj. R-squared:,0.305
Method:,Least Squares,F-statistic:,14380.0
Date:,"Sat, 20 Jul 2019",Prob (F-statistic):,0.0
Time:,00:02:32,Log-Likelihood:,-172230.0
No. Observations:,65656,AIC:,344500.0
Df Residuals:,65653,BIC:,344500.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,82.5195,0.028,2936.624,0.000,82.464,82.575
Unemployment_Rate,-23.7974,0.260,-91.394,0.000,-24.308,-23.287
Pop_pctPov1p5,-21.3895,0.238,-89.773,0.000,-21.856,-20.922

0,1,2,3
Omnibus:,859.179,Durbin-Watson:,1.272
Prob(Omnibus):,0.0,Jarque-Bera (JB):,1568.561
Skew:,0.038,Prob(JB):,0.0
Kurtosis:,3.753,Cond. No.,23.2


In [22]:
X = merge_df[['Pop_pctWhite','Pop_pctEduNoHS']]
X = sm.add_constant(X)
Y = merge_df.Life_Exp               

mod = sm.OLS(Y,X)
res = mod.fit()
res.summary()

0,1,2,3
Dep. Variable:,Life_Exp,R-squared:,0.182
Model:,OLS,Adj. R-squared:,0.182
Method:,Least Squares,F-statistic:,7315.0
Date:,"Sat, 20 Jul 2019",Prob (F-statistic):,0.0
Time:,00:02:32,Log-Likelihood:,-177560.0
No. Observations:,65656,AIC:,355100.0
Df Residuals:,65653,BIC:,355200.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,78.2204,0.056,1386.871,0.000,78.110,78.331
Pop_pctWhite,2.5041,0.060,41.517,0.000,2.386,2.622
Pop_pctEduNoHS,-12.2981,0.138,-89.369,0.000,-12.568,-12.028

0,1,2,3
Omnibus:,595.569,Durbin-Watson:,1.024
Prob(Omnibus):,0.0,Jarque-Bera (JB):,951.242
Skew:,0.054,Prob(JB):,2.76e-207
Kurtosis:,3.58,Cond. No.,13.0


In [23]:
X = merge_df[['Pop_pctWhite','Pop_pctBlack','Pop_pctHisp','Pop_pctEduNoHS','Pop_pctPov1p5','Unemployment_Rate','Income_Under']]
X = sm.add_constant(X)
Y = merge_df.Life_Exp               

mod = sm.OLS(Y,X)
res = mod.fit()
res.summary()

0,1,2,3
Dep. Variable:,Life_Exp,R-squared:,0.489
Model:,OLS,Adj. R-squared:,0.489
Method:,Least Squares,F-statistic:,8973.0
Date:,"Sat, 20 Jul 2019",Prob (F-statistic):,0.0
Time:,00:02:32,Log-Likelihood:,-162130.0
No. Observations:,65656,AIC:,324300.0
Df Residuals:,65648,BIC:,324300.0
Df Model:,7,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,80.6128,0.110,732.353,0.000,80.397,80.829
Pop_pctWhite,-3.3171,0.088,-37.681,0.000,-3.490,-3.145
Pop_pctBlack,-6.0662,0.103,-58.628,0.000,-6.269,-5.863
Pop_pctHisp,4.0169,0.078,51.649,0.000,3.864,4.169
Pop_pctEduNoHS,-8.4944,0.179,-47.475,0.000,-8.845,-8.144
Pop_pctPov1p5,-6.0775,0.274,-22.186,0.000,-6.614,-5.541
Unemployment_Rate,-9.3583,0.267,-35.000,0.000,-9.882,-8.834
Income_Under,0.0466,0.001,80.849,0.000,0.045,0.048

0,1,2,3
Omnibus:,2177.088,Durbin-Watson:,1.531
Prob(Omnibus):,0.0,Jarque-Bera (JB):,4786.287
Skew:,0.204,Prob(JB):,0.0
Kurtosis:,4.258,Cond. No.,1750.0


Translate some of your coefficients into the form **"every X percentage point change in unemployment translates to a Y change in life expectancy."** Do this with numbers that are meaningful, and in a way that is easily understandable to your reader.
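
A minimal sketch of that translation for the seven-variable model, using the coefficients printed above. It assumes the share columns are fractions (so a coefficient covers 100 percentage points) and that `Income_Under` is measured in thousands of dollars, as built earlier. The 10-point and $10,000 step sizes are a choice made here for readability, not something the assignment prescribes.

```python
# Coefficients copied from the final regression summary
share_coefs = {
    "Unemployment_Rate": -9.3583,
    "Pop_pctEduNoHS": -8.4944,
    "Pop_pctPov1p5": -6.0775,
}
income_coef_per_thousand = 0.0466

# Share columns are fractions, so 10 percentage points = 0.10 of a unit
per_10_points = {name: coef * 10 / 100 for name, coef in share_coefs.items()}

# Income_Under is in thousands of dollars, so $10,000 = 10 units
income_per_10k = income_coef_per_thousand * 10

for name, years in per_10_points.items():
    print(f"{name}: 10 points higher -> {years:+.2f} years")
print(f"Median income: $10,000 higher -> {income_per_10k:+.2f} years")
```

Each figure holds the other variables in the model fixed. The race coefficients are left out of the sketch because `Pop_pctWhite` and `Pop_pctBlack` include Hispanic residents who are also counted in `Pop_pctHisp`, so those shares overlap and their coefficients are harder to read one at a time.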