# Redlining and current-day poverty: How historic disinvestment might cause heat vulnerabilities for NYC residents

Hypothesis: Residents of neighborhoods with high percentages of C or D grades during the redlining era and higher poverty rates today are most vulnerable to the impacts of hot summers in New York City.

In [21]:
%load_ext rpy2.ipython
%load_ext autoreload
%autoreload 2

%matplotlib inline  
from matplotlib import rcParams
rcParams['figure.figsize'] = (16, 100)

import warnings
from rpy2.rinterface import RRuntimeWarning
warnings.filterwarnings("ignore") # Ignore all warnings
# warnings.filterwarnings("ignore", category=RRuntimeWarning) # Show some warnings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, HTML



In [2]:
%%javascript
// Disable auto-scrolling
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}


In [3]:
import pandas as pd
df = pd.read_csv("Complete.csv")

I want to run a linear regression to see how much of the variability in HVI (Heat Vulnerability Index) scores can be explained by economic factors and disinvestment. Here, Score is regressed on Perc_D and Percent_Poverty.

In [22]:
%%R -i df

# regress Score (dependent variable) on Perc_D and Percent_Poverty (independent variables)
fit <- lm("Score ~ Perc_D + Percent_Poverty" , data = df)
# summarize the regression results
summary(fit)


Call:
lm(formula = "Score ~ Perc_D + Percent_Poverty", data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.04740 -0.61887 -0.09419  0.62414  2.75648 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.94342    0.29329   3.217  0.00216 ** 
Perc_D           0.13536    0.39648   0.341  0.73408    
Percent_Poverty  0.10680    0.01483   7.200 1.62e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.006 on 56 degrees of freedom
Multiple R-squared:  0.5276,	Adjusted R-squared:  0.5107 
F-statistic: 31.27 on 2 and 56 DF,  p-value: 7.59e-10
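
To read the coefficient table: each t value is just the estimate divided by its standard error. Here is a minimal Python sketch that recomputes them from the numbers printed above. The cutoff of 2 is a rule of thumb I'm assuming for 56 residual degrees of freedom, not an exact test, and the dictionary name is only for illustration. Results can differ from R's in the last digit because the printed estimates are rounded.

```python
# (estimate, standard error) pairs copied from the R summary above
coefficients = {
    "(Intercept)": (0.94342, 0.29329),
    "Perc_D": (0.13536, 0.39648),
    "Percent_Poverty": (0.10680, 0.01483),
}

# Rule of thumb (an assumption, not an exact test): with 56 residual
# degrees of freedom, |t| above about 2 corresponds to p < 0.05.
T_THRESHOLD = 2.0

# t value = estimate / standard error
t_values = {name: est / se for name, (est, se) in coefficients.items()}

for name, t in t_values.items():
    flag = "significant" if abs(t) > T_THRESHOLD else "not significant"
    print(f"{name:16s} t = {t:6.3f}  ({flag} at roughly the 5% level)")
```

Read this way, Percent_Poverty clears the threshold easily while Perc_D does not.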



Let's add the "Percent" variable as well. It represents vegetative cover (trees, parks, etc.), which cools a neighborhood down. According to an interview I did, a lack of it is a sign of disinvestment in a city.

In [23]:
%%R -i df

# regress Score (dependent variable) on Perc_D, Percent_Poverty, and Percent (independent variables)
fit <- lm("Score ~ Perc_D + Percent_Poverty + Percent" , data = df)
# summarize the regression results
summary(fit)


Call:
lm(formula = "Score ~ Perc_D + Percent_Poverty + Percent", data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.8175 -0.6784 -0.1438  0.3979  2.8207 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      2.26155    0.51365   4.403 4.97e-05 ***
Perc_D          -0.16467    0.38327  -0.430  0.66913    
Percent_Poverty  0.09899    0.01409   7.025 3.43e-09 ***
Percent         -0.03698    0.01219  -3.034  0.00368 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9396 on 55 degrees of freedom
Multiple R-squared:  0.5953,	Adjusted R-squared:  0.5733 
F-statistic: 26.97 on 3 and 55 DF,  p-value: 7.342e-11



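An adjusted R-squared of .57 is pretty solid: over half the variability in HVI scores can be explained by this model. Note, though, that Percent_Poverty and Percent (vegetative cover) are the significant predictors. The Perc_D coefficient is not statistically significant (p = 0.67), so this model doesn't show an independent effect of redlining.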
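
As a sanity check on the three-predictor summary, the adjusted R-squared and the F-statistic can be recomputed from the reported multiple R-squared. This is a rough sketch: the sample size n is not read from the data but inferred from the residual degrees of freedom (n = 55 + 3 + 1), and because the inputs are rounded, the results should only match R's output to within rounding.

```python
# Values copied from the R summary of the three-predictor model
r_squared = 0.5953      # Multiple R-squared
residual_df = 55        # residual degrees of freedom
k = 3                   # predictors: Perc_D, Percent_Poverty, Percent

# n is inferred, not read from the data: residual df = n - k - 1
n = residual_df + k + 1

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / residual_df

# Overall F-statistic: explained variance per predictor over
# unexplained variance per residual degree of freedom
f_statistic = (r_squared / k) / ((1 - r_squared) / residual_df)

print(f"n = {n}")
print(f"adjusted R-squared = {adj_r_squared:.4f}")
print(f"F-statistic = {f_statistic:.2f}")
```

The adjusted value is the fairer one to quote when comparing this model with the two-predictor model, since adding a variable can never lower the plain R-squared.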

The overall F-test also has a really low p-value (7.342e-11), so the fit as a whole is very unlikely to be due to chance alone!