## Proportion Value Test: Task 1

An internet provider plans a special advertising campaign in regions with below-average internet usage. The criterion should be a proportion of at most 70% of households with internet access. To select suitable regions, random samples of 100 households each are taken. The test is based on a significance level of 1%.

In [1]:
# importing all the necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
import math

### 1. Check if the conditions for the normal distribution are met and determine the parameters of this distribution.

In [3]:
# We ask households whether or not they have internet access, and no household is asked twice (n draws without replacement).
# The number of households with internet access therefore follows a hypergeometric distribution.
# The hypergeometric distribution can be approximated by the normal distribution if the following condition is met:

# n * p * (1 - p) >= 9

n = 100
p = 0.7

norm_condition = n * p * (1 - p)  # compare the unrounded value; round only for display

if norm_condition >= 9:
    print(f"n * p * (1 - p) = {norm_condition:.0f} and that is greater than 9   => Therefore the condition is met" )
else:
    print("The condition is not met")

n * p * (1 - p) = 21 and that is greater than 9   => Therefore the condition is met


In [4]:
# Since the condition is met, the parameters of the normal distribution (for the number of households with internet access) are:

mu = round(n * p) 
var = round(n * p * (1 - p))

print(f"The parameters of the normal distribution function are mu = {mu} and var = {var}.")

The parameters of the normal distribution function are mu = 70 and var = 21.
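
As a rough plausibility check of the approximation, one can compare a tail probability under the normal distribution with the corresponding exact value. The sketch below assumes the population of each region is large enough that the hypergeometric distribution is practically binomial; that assumption, the cut-off `k = 60`, and the continuity correction of 0.5 are illustrative choices and not part of the task.

```python
import math
from scipy.stats import binom, norm

n, p = 100, 0.7
mu = n * p                          # expected number of households with internet access
sigma = math.sqrt(n * p * (1 - p))  # standard deviation of that number

# Illustrative cut-off (assumption): probability of at most 60 households with access
k = 60
exact = binom.cdf(k, n, p)                       # binomial stand-in for the hypergeometric model
approx = norm.cdf(k + 0.5, loc=mu, scale=sigma)  # normal approximation with continuity correction

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

If the two values are close, the normal parameters computed above are a reasonable working model for the tests that follow.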


### 2. Which hypothesis would be formulated by a rather risk-averse provider, and which by a rather cautious provider?

A risk-averse provider does not want to miss out on any region for the advertising campaign. He therefore assumes that the proportion of households with internet access is at most 70% and gives up this assumption only if the sample clearly contradicts it.

A cautious provider, on the other hand, does not want to waste resources in regions where the campaign would be unprofitable. He therefore assumes that the proportion is 70% or more and starts the campaign only if the sample clearly contradicts this.

So the hypotheses for the risk-averse provider would be:

H0: p <= 0.7

H1: p > 0.7

And for the cautious provider, it would be:

H0: p >= 0.7

H1: p < 0.7

### 3. Determine the acceptance or rejection bounds that would lead to the start of the advertising campaign in each case.

##### Risk-Averse Provider

Since H0 is p <= 0.7, we are looking for the upper bound of the acceptance region: H0 is kept, and the campaign starts, as long as the sample proportion stays below this bound.

In [5]:
n = 100
p_0 = 0.7
alpha = 0.01
sigma_0 = math.sqrt((p_0*(1-p_0)/n))

z = norm.ppf(1 - alpha)  # (1 - alpha) quantile of the standard normal distribution

CI_upper_bound = p_0 + (z * sigma_0)

print(f"The risk-averse provider would start the campaign if the sample proportion is lower than {CI_upper_bound}.")

The risk-averse provider would start the campaign if the sample proportion is lower than 0.8066066522559174.


##### Cautious Provider

Since H0 is p >= 0.7, we are looking for the lower bound of the acceptance region: H0 is rejected, and the campaign starts, only if the sample proportion falls below this bound.

In [6]:
n = 100
p_0 = 0.7
alpha = 0.01
sigma_0 = math.sqrt((p_0*(1-p_0)/n))

z = norm.ppf(1 - alpha)  # (1 - alpha) quantile of the standard normal distribution

CI_lower_bound = p_0 - (z * sigma_0)

print(f"The cautious provider would start the campaign if the sample proportion is lower than {CI_lower_bound}.")

The cautious provider would start the campaign if the sample proportion is lower than 0.5933933477440825.
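
Both bounds follow the same pattern, so they can be computed by one small helper. The function name `critical_proportion` and its `tail` argument are hypothetical and introduced only for this sketch; the formula is the one used in the two cells above.

```python
import math
from scipy.stats import norm

def critical_proportion(p_0, n, alpha, tail):
    """Bound of the acceptance region for a one-sided proportion test.

    tail="upper": bound for H0: p <= p_0 (H0 is rejected above the bound)
    tail="lower": bound for H0: p >= p_0 (H0 is rejected below the bound)
    """
    sigma_0 = math.sqrt(p_0 * (1 - p_0) / n)  # standard error under H0
    z = norm.ppf(1 - alpha)                   # one-sided critical z-value
    if tail == "upper":
        return p_0 + z * sigma_0
    if tail == "lower":
        return p_0 - z * sigma_0
    raise ValueError("tail must be 'upper' or 'lower'")

upper = critical_proportion(0.7, 100, 0.01, "upper")  # risk-averse provider
lower = critical_proportion(0.7, 100, 0.01, "lower")  # cautious provider
print(f"upper bound = {upper:.4f}, lower bound = {lower:.4f}")
```

The risk-averse provider starts the campaign while the sample proportion stays below `upper`; the cautious provider starts it only when the sample proportion falls below `lower`.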


### 4. What conclusions can be drawn regarding the acceptance or rejection of the hypotheses after drawing a sample with a proportion value p=0.68?

##### Risk-Averse Provider

H0: p <= 0.7

H1: p > 0.7

In [8]:
p = 0.68

if p < CI_upper_bound:
    print("The null hypothesis that the percentage of households with internet access is at most 70% cannot be rejected.\nTherefore, this region is eligible for the campaign.")
else:
    print("The null hypothesis that the percentage of households with internet access is at most 70% must be rejected. This region is not eligible for the campaign.")

The null hypothesis that the percentage of households with internet access is at most 70% cannot be rejected.
Therefore, this region is eligible for the campaign.


##### Cautious Provider

H0: p >= 0.7

H1: p < 0.7

In [10]:
p = 0.68

if p > CI_lower_bound:
    print("The null hypothesis that the percentage of households with internet access is at least 70% cannot be rejected.\nTherefore, this region is not suitable for the campaign.")
else:
    print("The null hypothesis that the percentage of households with internet access is at least 70% must be rejected. This region is suitable for the campaign.")

The null hypothesis that the percentage of households with internet access is at least 70% cannot be rejected.
Therefore, this region is not suitable for the campaign.
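
The same two decisions can also be expressed through p-values instead of bounds. The following is a minimal sketch under the normal approximation used throughout this task; the names `p_hat`, `z_stat`, `p_value_upper` and `p_value_lower` are introduced here for illustration only.

```python
import math
from scipy.stats import norm

n, p_0, alpha = 100, 0.7, 0.01
p_hat = 0.68  # observed sample proportion

sigma_0 = math.sqrt(p_0 * (1 - p_0) / n)  # standard error under H0
z_stat = (p_hat - p_0) / sigma_0          # standardized test statistic

p_value_upper = norm.sf(z_stat)   # risk-averse provider: H0: p <= 0.7 vs. H1: p > 0.7
p_value_lower = norm.cdf(z_stat)  # cautious provider:    H0: p >= 0.7 vs. H1: p < 0.7

for label, p_value in [("risk-averse", p_value_upper), ("cautious", p_value_lower)]:
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"{label}: p-value = {p_value:.4f} -> {decision}")
```

A p-value below `alpha` corresponds exactly to a sample proportion outside the acceptance region, so this check leads to the same conclusions as the comparison with the bounds.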


## Proportion Value Test: Task 2

In the state of Texas, the illiteracy rate in the population was only 20% a few years ago. The government at that time aimed to reduce this rate through an intensive support program. After two years, the government announced that the program was already bearing fruit and the illiteracy rate had significantly decreased.

The opposition randomly selected 350 people for a test, 49 of whom did not pass. Conduct a hypothesis test at a significance level of 0.05, which allows for a clear statement about the reduction of the rate.

Does it make sense to test the hypothesis π ≤ 0.2 for acceptance or the hypothesis π ≥ 0.2 for rejection? What conclusion do you come to?

In [None]:
# We again have a hypergeometric distribution, because we draw without replacement.

# Let's check whether it can be approximated by the normal distribution.

n = 350
p = 49/350
p_0 = 0.2

approx = n * p_0 * (1 - p_0)  # compare the unrounded value; round only for display

if approx >= 9:
    print(f'n * p_0 * (1 - p_0) = {approx:.2f} >= 9    =>  approximation via normal distribution eligible!')
else:
    print(f'n * p_0 * (1 - p_0) = {approx:.2f} < 9    =>  approximation via normal distribution not eligible!')

n * p_0 * (1 - p_0) = 56.00 >= 9    =>  approximation via normal distribution eligible!


Testing the hypothesis p <= 0.2 for acceptance cannot give a clear statement: failing to reject a null hypothesis does not prove it, and p <= 0.2 also includes the case p = 0.2, in which nothing has improved.

What we actually want to show is that the illiteracy rate has decreased and the program was successful.

To do so, we have to reject the assumption that p is still 20% or higher.
The claim we want to support, p < 0.2, therefore becomes the alternative hypothesis.

Consequently, our hypotheses are as follows:

H0: p >= 0.2

H1: p < 0.2


Therefore, we are looking for the lower bound of the acceptance region: H0 is rejected if the sample proportion falls below it.

In [11]:
# computing the lower bound 

n = 350
alpha = 0.05
p_0 = 0.2
sigma_0 = math.sqrt((p_0*(1-p_0))/n)

z = norm.ppf(1 - alpha)  # (1 - alpha) quantile of the standard normal distribution

CI_lower_bound = p_0 - (z * sigma_0)

# testing our hypotheses for rejection

p = 49/350

if p < CI_lower_bound:
    print(f'The illiteracy rate is {p} and therefore lower than the lower bound of {CI_lower_bound}. Hence, H0 must be rejected. The program was successful and the illiteracy rate decreased.')
else:
    print(f'The illiteracy rate is {p} and therefore not lower than the lower bound of {CI_lower_bound}. H0 cannot be rejected. A decrease of the illiteracy rate below 20% cannot be demonstrated at the given significance level.')


The illiteracy rate is 0.14 and therefore lower than the lower bound of 0.16483155015174353. Hence, H0 must be rejected. The program was successful and the illiteracy rate decreased.
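
The strength of this result can be gauged with a p-value. The sketch below computes it once with the normal approximation used above and once with a binomial model as a cross-check; the binomial model assumes a population large enough that drawing without replacement hardly matters, and all variable names are introduced here for illustration.

```python
import math
from scipy.stats import binom, norm

n, failures, p_0, alpha = 350, 49, 0.2, 0.05
p_hat = failures / n  # observed illiteracy rate in the sample

# Normal approximation: standardized distance of p_hat from p_0 under H0
sigma_0 = math.sqrt(p_0 * (1 - p_0) / n)
z_stat = (p_hat - p_0) / sigma_0
p_value_normal = norm.cdf(z_stat)  # lower-tailed test, H1: p < 0.2

# Cross-check (assumption: binomial model): P(X <= 49) if p were exactly 0.2
p_value_exact = binom.cdf(failures, n, p_0)

print(f"z = {z_stat:.3f}")
print(f"p-value (normal approximation) = {p_value_normal:.4f}")
print(f"p-value (binomial model)       = {p_value_exact:.4f}")
```

Both p-values are compared with `alpha`; values below it support rejecting H0, in line with the comparison against the lower bound.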
