## <span style = "color:#1A237E;">Hypothesis Four</span>
### <span style = "color:green;">Null Hypothesis</span>
There is no significant difference in the proportion of households that accessed credit between 
those that sought credit and those that did not seek credit.
### <span style = "color:green;">Alternative Hypothesis</span>
There is a significant difference in the proportion of households that accessed credit between 
those that sought credit and those that did not seek credit.
### <span style = "color:green;">Relevance</span>
This hypothesis aims to investigate the factors that affect credit access among Kenyan households, 
with a focus on the impact of seeking credit. It is hypothesized that households that have sought 
credit before are more likely to access credit in the future.

## <span style = "color:#1A237E;">Implementation</span>
To compare the proportion of households that accessed credit between the two groups 
(households that sought credit and households that did not seek credit), we can use a __chi-square test of independence__.
<br>First, we need to create a contingency table to summarize the data.
<br>Each row of the dataset is a county and the credit columns are percentages, so each county is classified 
using a 50% threshold, and the contingency table counts counties rather than individual households. 
We will use the "Proportion of households that sought credit (%)" column to split the counties into two groups: 
those where at least 50% of households sought credit ("Sought credit") and the rest ("Did not seek credit"). 
Then, we will use the "Proportion of households that sought and accessed credit (%)" column to mark a county 
as "Accessed credit" if at least 50% of its households that sought credit obtained it.
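As a quick illustration of this classification, the sketch below applies the 50% threshold to the first five counties in the dataset (the rows that `data.head()` displays in the cells that follow), with their two credit columns copied by hand. The `sample`, `sought_high` and `accessed_high` names are illustrative only. All five counties fall below 50% on seeking and at or above 50% on access, so the resulting table has a single cell.

```python
import pandas as pd

# The first five counties from the dataset, with the two credit columns copied by hand
sample = pd.DataFrame({
    "residence_county": ["Baringo", "Bomet", "Bungoma", "Busia", "Elgeyo/Marakwet"],
    "Proportion of households that sought credit (%)": [44.4, 19.5, 32.8, 5.5, 43.1],
    "Proportion of households that sought and accessed credit (%)": [98.6, 83.8, 58.0, 62.2, 98.7],
})

# Each county becomes True/False on each measure using the 50% threshold
sought_high = sample["Proportion of households that sought credit (%)"] >= 50
accessed_high = sample["Proportion of households that sought and accessed credit (%)"] >= 50

# crosstab counts how many counties fall in each (sought, accessed) combination
table = pd.crosstab(sought_high, accessed_high)
print(table)
```

The same two boolean series, computed over all 47 counties, are what the `pd.crosstab` call in the contingency-table cell tabulates.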
### <span style = "color:green;">Contingency Table</span>

In [1]:
# Import required libraries
import pandas as pd

In [2]:
# Load the data into a DataFrame
data = pd.read_csv('overall_poverty_est.csv')
data.head()

|   | residence_county | Headcount Rate (%) | Distribution of the Poor (%) | Poverty Gap (%) | Severity of Poverty (%) | Population (ths) | Number of Poor (ths) | Proportion of households that sought credit (%) | Number of Households (ths) | Proportion of households that sought and accessed credit (%) | Number of Households that sought credit (ths) | Distribution of the Poor (%).1 | Poverty Gap (%).1 | Severity of Poverty (%).1 | Population (ths).1 | Number of Poor (ths).1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Baringo | 39.6 | 1.7 | 9.7 | 4.2 | 704 | 278 | 44.4 | 152 | 98.6 | 68 | 2.0 | 10.8 | 4.1 | 704 | 291 |
| 1 | Bomet | 48.8 | 2.7 | 9.3 | 2.8 | 916 | 447 | 19.5 | 179 | 83.8 | 35 | 2.1 | 5.6 | 1.6 | 916 | 300 |
| 2 | Bungoma | 35.7 | 3.4 | 9.5 | 3.6 | 1553 | 555 | 32.8 | 321 | 58.0 | 105 | 3.5 | 9.5 | 3.9 | 1553 | 503 |
| 3 | Busia | 69.3 | 3.6 | 22.3 | 9.3 | 840 | 583 | 5.5 | 177 | 62.2 | 10 | 3.4 | 17.5 | 7.2 | 840 | 500 |
| 4 | Elgeyo/Marakwet | 43.4 | 1.2 | 13.4 | 5.6 | 469 | 204 | 43.1 | 99 | 98.7 | 43 | 1.4 | 10.8 | 4.0 | 469 | 210 |


In [3]:
# Display statistical summary
data.describe()

|   | Headcount Rate (%) | Distribution of the Poor (%) | Poverty Gap (%) | Severity of Poverty (%) | Number of Poor (ths) | Proportion of households that sought credit (%) | Number of Households (ths) | Proportion of households that sought and accessed credit (%) | Number of Households that sought credit (ths) | Distribution of the Poor (%).1 | Poverty Gap (%).1 | Severity of Poverty (%).1 | Number of Poor (ths).1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 | 47.0 |
| mean | 40.557447 | 2.129787 | 12.085106 | 5.306383 | 349.042553 | 33.082979 | 242.808511 | 85.814894 | 81.851064 | 2.12766 | 10.314894 | 4.46383 | 309.297872 |
| std | 16.291085 | 1.109429 | 8.496751 | 5.254911 | 182.080125 | 16.18717 | 225.060343 | 16.561138 | 79.732972 | 1.142404 | 6.053852 | 3.569827 | 166.793795 |
| min | 16.7 | 0.2 | 2.4 | 0.5 | 36.0 | 5.5 | 30.0 | 33.9 | 4.0 | 0.2 | 3.0 | 0.8 | 25.0 |
| 25% | 28.8 | 1.4 | 7.35 | 2.5 | 231.0 | 21.3 | 127.0 | 84.05 | 39.5 | 1.3 | 6.75 | 2.55 | 192.0 |
| 50% | 35.8 | 2.0 | 9.4 | 3.5 | 321.0 | 32.9 | 210.0 | 92.5 | 69.0 | 2.0 | 9.1 | 3.5 | 287.0 |
| 75% | 47.45 | 2.75 | 13.4 | 5.8 | 455.5 | 43.1 | 277.5 | 97.55 | 108.5 | 2.65 | 11.75 | 4.95 | 385.5 |
| max | 79.4 | 5.2 | 46.0 | 30.8 | 860.0 | 66.1 | 1503.0 | 99.2 | 510.0 | 4.9 | 32.9 | 20.4 | 717.0 |


In [4]:
# display the number of rows and columns
data.shape

(47, 16)

In [5]:
# display the columns
print(data.columns)

Index(['residence_county', 'Headcount Rate (%)',
       'Distribution of the Poor (%)', 'Poverty Gap (%)',
       'Severity of Poverty (%)', 'Population (ths)', 'Number of Poor (ths)',
       'Proportion of households that sought credit (%)',
       'Number of Households (ths)',
       'Proportion of households that sought and accessed credit (%)',
       'Number of Households that sought credit (ths)',
       'Distribution of the Poor (%).1', 'Poverty Gap (%).1',
       'Severity of Poverty (%).1', 'Population (ths).1',
       'Number of Poor (ths).1'],
      dtype='object')


In [6]:
# Create a contingency table
contingency_table = pd.crosstab(
    data['Proportion of households that sought credit (%)'] >= 50,
    data['Proportion of households that sought and accessed credit (%)'] >= 50
)
print(contingency_table)

Proportion of households that sought and accessed credit (%)  False  True 
Proportion of households that sought credit (%)                           
False                                                             3     35
True                                                              0      9


In [7]:
# Add labels to the rows and columns
contingency_table.index = ['Did not seek credit', 'Sought credit']
contingency_table.columns = ['Did not access credit', 'Accessed credit']

# Display the contingency table
print(contingency_table)

                     Did not access credit  Accessed credit
Did not seek credit                      3               35
Sought credit                            0                9


#### <span style = "color:brown;">Observations and Inference</span>
1. Of the counties in the "Sought credit" group (at least 50% of households sought credit), the share in the 
"Accessed credit" group is 9/(9+0) = 1, so all 9 of these counties are also accessed-credit counties.
2. Of the counties in the "Did not access credit" group, the share in the "Sought credit" group is 
0/(3+0) = 0, so none of these 3 counties is a sought-credit county.
3. For comparison, 35 of the 38 counties in the "Did not seek credit" group (about 92%) are in the "Accessed credit" group.
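These shares can be computed directly by normalizing the table by rows and by columns. A minimal sketch, with the counts hard-coded from the labelled contingency table (the `counts`, `row_props` and `col_props` names are illustrative):

```python
import pandas as pd

# Counts copied from the labelled contingency table (rows: seeking group, columns: access group)
counts = pd.DataFrame(
    [[3, 35], [0, 9]],
    index=["Did not seek credit", "Sought credit"],
    columns=["Did not access credit", "Accessed credit"],
)

# Row-normalised: share of each seeking group that falls in each access group
row_props = counts.div(counts.sum(axis=1), axis=0)

# Column-normalised: share of each access group that falls in each seeking group
col_props = counts.div(counts.sum(axis=0), axis=1)

print(row_props.round(3))
print(col_props.round(3))
```

On the full notebook, `pd.crosstab(..., normalize="index")` or `normalize="columns"` should give the same proportions without the manual division.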

### <span style = "color:green;">Chi-Square Test</span>

In [8]:
# Import required libraries
from scipy.stats import chi2_contingency

In [9]:
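# Perform the chi-square test of independence; for a 2x2 table, chi2_contingency
# applies Yates' continuity correction by default (correction=True)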
chi2, p_value, dof, expected = chi2_contingency(contingency_table)


In [10]:
# Display results
print("Chi-square test results:")
print('Chi-square statistic:', chi2)
print('p-value:', p_value)
print("Degrees of freedom: ", dof)

Chi-square test results:
Chi-square statistic: 0.012753632819422302
p-value: 0.910084479631301
Degrees of freedom:  1
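To see where these numbers come from, here is a minimal sketch that recomputes the statistic by hand from the county counts, applying Yates' continuity correction the way `chi2_contingency` does for a 2×2 table, and also computing the uncorrected Pearson statistic for comparison. The chi-square distribution is imported as `chi2_dist` so it does not shadow the `chi2` statistic variable used in the notebook.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist

# Observed county counts from the contingency table (rows: seeking group, columns: access group)
observed = np.array([[3, 35], [0, 9]], dtype=float)

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Yates' continuity correction: shrink each |O - E| by 0.5 (never below zero)
corrected_diff = np.maximum(np.abs(observed - expected) - 0.5, 0.0)
stat_yates = float((corrected_diff ** 2 / expected).sum())

# Uncorrected Pearson statistic, for comparison
stat_pearson = float(((observed - expected) ** 2 / expected).sum())

# A 2x2 table has (2 - 1) * (2 - 1) = 1 degree of freedom
p_yates = float(chi2_dist.sf(stat_yates, df=1))
p_pearson = float(chi2_dist.sf(stat_pearson, df=1))

print("Yates-corrected:", stat_yates, p_yates)
print("Uncorrected:    ", stat_pearson, p_pearson)
```

The corrected values should match the output of `chi2_contingency` above. The uncorrected statistic is larger, but its p-value is still above 0.05, so the conclusion does not hinge on the correction.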


#### <span style = "color:brown;">Observations and Inference</span>
The chi-square statistic is 0.01275 with 1 degree of freedom, and the p-value is 0.910. 
The p-value is well above the conventional significance level of 0.05, so there is not enough 
evidence to reject the null hypothesis. Note that two of the four expected cell counts are below 5, 
so the chi-square approximation is unreliable for this table.
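A common rule of thumb for the chi-square approximation is that every expected count should be at least 5. When that fails on a 2×2 table, Fisher's exact test is a standard alternative. A minimal sketch, assuming the county counts from the contingency table and using `scipy.stats.fisher_exact`; this is an additional check, not part of the original analysis:

```python
import numpy as np
from scipy.stats import fisher_exact

# Observed county counts from the contingency table (rows: seeking group, columns: access group)
observed = np.array([[3, 35], [0, 9]])

# Expected counts under independence, to check the "all expected >= 5" guideline
expected = observed.sum(axis=1, keepdims=True) * observed.sum(axis=0, keepdims=True) / observed.sum()
print("Expected counts:\n", expected.round(2))
print("Cells with expected count < 5:", int((expected < 5).sum()))

# Fisher's exact test works from the hypergeometric distribution directly, so it does not
# rely on a large-sample approximation; the sample odds ratio is infinite because one cell is 0
odds_ratio, p_exact = fisher_exact(observed, alternative="two-sided")
print("Fisher's exact test p-value:", p_exact)
```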

### <span style = "color:green;">One Tailed T-Test</span>

In [11]:
# Import relevant libraries 
from scipy import stats

In [12]:
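# Perform an independent two-sample t-test on the two "accessed credit" proportions.
# Caution: std1=0 and std2=0 make the standard error zero, so the t-statistic
# comes out infinite (and the p-value 0) whatever the two means are.
# ttest_ind_from_stats is two-sided by default; a one-tailed test needs
# alternative='greater' (available in SciPy 1.6 and later).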
t_stat, p_val = stats.ttest_ind_from_stats(
    mean1=contingency_table.loc[
        'Sought credit', 
        'Accessed credit'
    ] / contingency_table.loc['Sought credit'].sum(),
    std1=0,
    nobs1=contingency_table.loc['Sought credit'].sum(),
    mean2=contingency_table.loc[
        'Did not seek credit', 
        'Accessed credit'
    ] / contingency_table.loc['Did not seek credit'].sum(),
    std2=0,
    nobs2=contingency_table.loc['Did not seek credit'].sum())

# Print the results
print("t-test results:")
print("t-statistic: ", t_stat)
print("p-value: ", p_val)

t-test results:
t-statistic:  inf
p-value:  0.0


#### <span style = "color:brown;">Observations and Inferences</span>
1. The infinite t-statistic and the p-value of 0 are artifacts of the inputs: setting `std1=0` and `std2=0` 
makes the standard error zero, so the t-statistic is a division by zero. A standard deviation of 0 is also 
inconsistent with the data, because the "Did not seek credit" group contains both outcomes (3 and 35 counties).
2. This result therefore cannot be read as evidence of a significant difference between the two groups.
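For comparing two proportions, a two-proportion z-test with a pooled standard error is a more conventional choice than a t-test with zero standard deviations. A minimal sketch, swapped in as an alternative to the t-test above (not the original analysis), using the county counts from the contingency table; with a zero cell and only 9 counties in one group, the normal approximation is itself rough, so the result should be read cautiously.

```python
import math
from scipy.stats import norm

# County counts from the contingency table: (number in "Accessed credit", group size)
accessed_sought, n_sought = 9, 9              # "Sought credit" row
accessed_not_sought, n_not_sought = 35, 38    # "Did not seek credit" row

p1 = accessed_sought / n_sought
p2 = accessed_not_sought / n_not_sought

# Pooled proportion under H0: both groups share the same access proportion
p_pool = (accessed_sought + accessed_not_sought) / (n_sought + n_not_sought)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_sought + 1 / n_not_sought))

# One-tailed alternative: the "Sought credit" group has the higher access proportion
z = (p1 - p2) / se
p_one_tailed = norm.sf(z)

print("z-statistic:", round(z, 3))
print("one-tailed p-value:", round(p_one_tailed, 3))
```

Unlike the zero-variance t-test, the pooled standard error here is strictly positive, so the statistic stays finite.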

### <span style = "color:green;">Conclusions</span>
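* Based on the contingency table and the conditional proportions, there appears to be a relationship 
between seeking credit and accessing credit: all 9 counties in the "Sought credit" group are in the 
"Accessed credit" group, compared with 35 of the 38 counties (about 92%) in the "Did not seek credit" group. 
* However, the chi-square test (p = 0.910) does not find this difference significant, so the null hypothesis 
is not rejected. With only 47 counties and two expected cell counts below 5, the test has little power to 
detect a difference of this size.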