## <span style = "color:#1A237E;">Hypothesis Three</span>
### <span style = "color:green;">Null Hypothesis</span>
1. There is no significant relationship between Headcount Rate (%) and Severity of Poverty (%).
2. There is no significant relationship between Poverty Gap (%) and Severity of Poverty (%).
3. There is no significant relationship between Distribution of the Poor (%) and Severity of Poverty (%).
### <span style = "color:green;">Alternative Hypothesis</span>
1. There is a significant relationship between Headcount Rate (%) and Severity of Poverty (%).
2. There is a significant relationship between Poverty Gap (%) and Severity of Poverty (%).
3. There is a significant relationship between Distribution of the Poor (%) and Severity of Poverty (%).
### <span style = "color:green;">Relevance</span>
By exploring how poverty indicators such as headcount rate, poverty gap, and distribution of the poor relate to the severity of poverty, we can gain insight into the key drivers of poverty in the country and inform policy and intervention strategies.

In [1]:
# Import required libraries
import pandas as pd
import numpy as np
import statsmodels.api as sm
import warnings

warnings.filterwarnings('ignore')

In [2]:
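# Load the county-level poverty estimates and list the available columns
# (a bare data.head() in the middle of a cell is never displayed, so it is dropped)
data = pd.read_csv('overall_poverty_est.csv')
print(data.columns)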

Index(['residence_county', 'Headcount Rate (%)', 'Severity of Poverty (%)',
       'Population (ths)', 'Number of Poor (ths)',
       'Proportion of households that sought credit (%)',
       'Proportion of households that sought and accessed credit (%)',
       'Number of Households that sought credit (ths)',
       'Distribution of the Poor (%)', 'Poverty Gap (%)'],
      dtype='object')


In [3]:
# Display number of rows and columns
data.shape

(47, 10)

In [5]:
# Display statistical summary
data.describe()

| Statistic | Headcount Rate (%) | Severity of Poverty (%) | Number of Poor (ths) | Proportion of households that sought credit (%) | Proportion of households that sought and accessed credit (%) | Number of Households that sought credit (ths) | Distribution of the Poor (%) | Poverty Gap (%) |
|---|---|---|---|---|---|---|---|---|
| count | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 |
| mean | 40.557447 | 5.306383 | 349.042553 | 33.082979 | 85.814894 | 81.851064 | 2.12766 | 10.314894 |
| std | 16.291085 | 5.254911 | 182.080125 | 16.18717 | 16.561138 | 79.732972 | 1.142404 | 6.053852 |
| min | 16.7 | 0.5 | 36.0 | 5.5 | 33.9 | 4.0 | 0.2 | 3.0 |
| 25% | 28.8 | 2.5 | 231.0 | 21.3 | 84.05 | 39.5 | 1.3 | 6.75 |
| 50% | 35.8 | 3.5 | 321.0 | 32.9 | 92.5 | 69.0 | 2.0 | 9.1 |
| 75% | 47.45 | 5.8 | 455.5 | 43.1 | 97.55 | 108.5 | 2.65 | 11.75 |
| max | 79.4 | 30.8 | 860.0 | 66.1 | 99.2 | 510.0 | 4.9 | 32.9 |


In [6]:
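# Pairwise Pearson correlations between the numeric columns
# numeric_only=True skips text columns such as 'residence_county';
# without it, pandas >= 2.0 raises a ValueError on non-numeric data
corr_matrix = data.corr(numeric_only=True)

# Print correlation matrix
print(corr_matrix)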

                                                    Headcount Rate (%)  \
Headcount Rate (%)                                            1.000000   
Severity of Poverty (%)                                       0.845491   
Number of Poor (ths)                                          0.242787   
Proportion of households that sought credit (%)              -0.413257   
Proportion of households that sought and access...           -0.291743   
Number of Households that sought credit (ths)                -0.511304   
Distribution of the Poor (%)                                  0.148212   
Poverty Gap (%)                                               0.890842   

                                                    Severity of Poverty (%)  \
Headcount Rate (%)                                                 0.845491   
Severity of Poverty (%)                                            1.000000   
Number of Poor (ths)                                               0.343458   
Proportion of hou

#### <span style = "color:brown;">Observations and Inferences</span>
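1. **Severity of Poverty (%):** Has a _strong positive correlation_ with `Headcount Rate (%)` (r = 0.845) and `Poverty Gap (%)`, and a _moderate positive correlation_ with `Distribution of the Poor (%)`, making all three relevant indicators for this hypothesis.
2. **Poverty Gap (%):** Has a _strong positive correlation_ with `Headcount Rate (%)` (r = 0.891) and `Severity of Poverty (%)`, and a moderate positive correlation with `Distribution of the Poor (%)`. Because it overlaps so strongly with `Headcount Rate (%)`, these two predictors carry largely the same information.
3. **Distribution of the Poor (%):** Has a moderate positive correlation with `Severity of Poverty (%)`, `Number of Poor (ths)`, and `Poverty Gap (%)`, but only a weak one with `Headcount Rate (%)` (r = 0.148), so it may be a useful predictor that adds information headcount rate does not.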
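A correlation matrix reports coefficients but no p-values, so on its own it cannot decide the three null hypotheses. A minimal sketch, assuming SciPy is installed, that runs a two-sided Pearson test of each predictor against `Severity of Poverty (%)`. The helper `bivariate_tests` and the synthetic `demo` frame are hypothetical illustrations, not the notebook's data; on the real data the call would be `bivariate_tests(data)`.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

PREDICTORS = ['Headcount Rate (%)', 'Poverty Gap (%)', 'Distribution of the Poor (%)']
TARGET = 'Severity of Poverty (%)'

def bivariate_tests(df: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper: Pearson r and two-sided p-value of each predictor vs. the target."""
    rows = []
    for col in PREDICTORS:
        r, p = pearsonr(df[col], df[TARGET])
        rows.append({'predictor': col, 'r': r, 'p_value': p, 'reject_H0': p < alpha})
    return pd.DataFrame(rows).set_index('predictor')

# Synthetic stand-in for overall_poverty_est.csv (47 rows of made-up values)
rng = np.random.default_rng(0)
headcount = rng.uniform(15, 80, 47)
gap = 0.25 * headcount + rng.normal(0, 2, 47)
demo = pd.DataFrame({
    'Headcount Rate (%)': headcount,
    'Poverty Gap (%)': gap,
    'Distribution of the Poor (%)': rng.uniform(0.2, 5, 47),
    'Severity of Poverty (%)': 0.8 * gap - 3 + rng.normal(0, 1, 47),
})

result = bivariate_tests(demo)
print(result)
```

These bivariate tests ignore the other predictors, so they can reject a null hypothesis that the multiple regression below does not.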

### <span style = "color:green;">OLS Model</span>

In [7]:
# Fit an OLS model of severity of poverty on the three poverty indicators
X = data[['Headcount Rate (%)', 'Poverty Gap (%)', 'Distribution of the Poor (%)']]
y = data['Severity of Poverty (%)']

X = sm.add_constant(X)  # add an intercept term
model = sm.OLS(y, X).fit()
print(model.summary())

                               OLS Regression Results                              
Dep. Variable:     Severity of Poverty (%)   R-squared:                       0.882
Model:                                 OLS   Adj. R-squared:                  0.874
Method:                      Least Squares   F-statistic:                     107.1
Date:                     Wed, 10 May 2023   Prob (F-statistic):           5.64e-20
Time:                             15:16:52   Log-Likelihood:                -93.959
No. Observations:                       47   AIC:                             195.9
Df Residuals:                           43   BIC:                             203.3
Df Model:                                3                                         
Covariance Type:                 nonrobust                                         
                                   coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------

#### <span style = "color:brown;">Observations and Inferences</span>
1. The `R-squared` value of `0.882` indicates that _`88.2%` of the in-sample variation in `Severity of Poverty (%)` is explained by the three independent variables_.
2. The `adjusted R-squared` of `0.874`, which penalizes the number of predictors, is only slightly lower, so _the fit remains strong after accounting for model size_.
3. The `F-statistic` of `107.1` (`p = 5.64e-20`) indicates that _the three predictors are jointly statistically significant_.
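These statistics are linked by standard formulas, so the summary can be checked by hand. With n = 47 observations and k = 3 predictors:

```latex
\begin{aligned}
\bar{R}^2 &= 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
           = 1 - (1 - 0.882)\cdot\frac{46}{43} \approx 0.874 \\
F &= \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
   = \frac{0.882 / 3}{0.118 / 43} \approx 107.1
\end{aligned}
```

Both values agree with the `model.summary()` output.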

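`Poverty Gap (%)` has a `positive coefficient of 0.7962`, indicating that _a one-percentage-point increase in poverty gap is associated with an increase of about 0.80 points in severity of poverty_, holding the other two variables constant.
<br>`Distribution of the Poor (%)` has a `negative coefficient of -0.1028`. This column is each county's share of the country's poor (its mean is 100/47 ≈ 2.13%), so the coefficient indicates that _counties holding a larger share of the nation's poor tend to have a slightly lower severity of poverty_, holding the other variables constant (interesting).
<br>However, `Headcount Rate (%)` has a coefficient of only `0.0102`, which is `not statistically significant`; therefore, _once poverty gap and distribution of the poor are controlled for, we cannot infer a significant relationship between headcount rate and severity of poverty_. Because headcount rate is strongly correlated with poverty gap (r = 0.891), its effect is likely absorbed by the poverty gap term rather than absent.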

### <span style = 'color:green;'>Root Mean Square Error</span>

In [8]:
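from sklearn.metrics import mean_squared_error

# In-sample predictions from the fitted OLS model
y_pred = model.predict(X)
# RMSE is the square root of the MSE; np.sqrt avoids the `squared=False`
# argument, which recent scikit-learn releases have removed
rmse = np.sqrt(mean_squared_error(y, y_pred))
print('RMSE:', rmse)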

RMSE: 1.7863860335702446


#### <span style = "color:brown;">Observations and Inferences</span>
The RMSE of about `1.79` percentage points is an in-sample error measured in the units of `Severity of Poverty (%)`. It is **moderately high** next to a median severity of `3.5%` (mean `5.31%`), which suggests that _these three independent variables alone may not predict the severity of poverty precisely for individual counties_.
<br>Therefore, it may be necessary to explore other variables that could be relevant to predicting poverty severity.
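RMSE and R-squared are computed from the same residuals, so one can be recovered from the other. Using the standard deviation of `Severity of Poverty (%)` from the summary statistics (s_y ≈ 5.255) and n = 47:

```latex
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
  = s_y \sqrt{\frac{n-1}{n}}\,\sqrt{1 - R^2} \\
  &\approx 5.255 \times \sqrt{46/47} \times \sqrt{0.118} \approx 1.79 \\
\mathrm{RMSE}_{\text{mean only}} &= s_y \sqrt{\frac{n-1}{n}} \approx 5.20
\end{aligned}
```

The model therefore cuts the typical in-sample error from about 5.20 percentage points (always predicting the mean severity) to about 1.79, which is why the fit looks strong in relative terms even though the absolute error is large next to the median severity.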

### <span style = 'color:green;'>Conclusion</span>
1. Most of the variation in the severity of poverty across the 47 counties (`R-squared = 0.882`) is explained by the independent variables in the model.
2. A larger poverty gap is associated with a higher severity of poverty.
3. Counties holding a larger share of the country's poor are associated with a slightly lower severity of poverty, holding the other indicators constant.
4. There is insufficient evidence of a significant relationship between headcount rate and severity of poverty once poverty gap is controlled for, even though the two are strongly correlated on their own (r = 0.845).