In [1]:
import numpy as np
from scipy import stats

### Problem Statement 1

Is gender independent of education level? 

A random sample of 395 people was surveyed, and each person was asked to report the highest education level they had
obtained. 

The data that resulted from the survey is summarized in the following table:

|   |High School | Bachelors | Masters | Ph.D. | Total|
|:-:|:----------:|:---------:|:-------:|:-----:|:----:|
|female|60|54|46|41|201|
|male|40|44|53|57|194|
|total|100|98|99|98|395|

Question: Are gender and education level dependent at the 5% level of significance? 

In other words, given the data collected above, is there a relationship between the gender of an individual and the level of education that they have obtained?

## Solution

Hypotheses

$H_{0}$ : There is no dependency between gender and education level

$H_{1}$ : There is dependency between gender and education level

## Observed results

|   |High School | Bachelors | Masters | Ph.D. | Total|
|:-:|:----------:|:---------:|:-------:|:-----:|:----:|
|female|60|54|46|41|201|
|male|40|44|53|57|194|
|total|100|98|99|98|395|


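## Expected Results

Each expected count is (row total × column total) / grand total, rounded here to the nearest whole number.

|   |High School | Bachelors | Masters | Ph.D. | Total|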
|:-:|:----------:|:---------:|:-------:|:-----:|:----:|
|female|51|50|50|50|201|
|male|49|48|49|48|194|
|total|100|98|99|98|395|
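
As a minimal sketch of where the expected counts come from, they can be computed from the observed table as the outer product of the row and column totals divided by the grand total. The variable names (`observed`, `row_totals`, `col_totals`, `expected`) are illustrative and not part of the original notebook; the unrounded values differ slightly from the whole numbers in the table.

```python
import numpy as np

# Observed counts: rows = female, male; columns = High School, Bachelors, Masters, Ph.D.
observed = np.array([[60, 54, 46, 41],
                     [40, 44, 53, 57]])

row_totals = observed.sum(axis=1)   # totals per gender
col_totals = observed.sum(axis=0)   # totals per education level
n = observed.sum()                  # grand total

# Expected count under independence: (row total * column total) / grand total
expected = np.outer(row_totals, col_totals) / n

print(np.round(expected, 2))
```

Rounding these values to whole numbers gives the table used in the manual calculation.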



In [2]:
ch1 = ((60 - 51) ** 2) / 51
ch2 = ((40 - 49) ** 2) / 49
ch3 = ((54 - 50) ** 2) / 50
ch4 = ((44 - 48) ** 2) / 48
ch5 = ((46 - 50) ** 2) / 50
ch6 = ((53 - 49) ** 2) / 49
ch7 = ((41 - 50) ** 2) / 50
ch8 = ((57 - 48) ** 2) / 48

chi = ch1 + ch2 + ch3 + ch4 + ch5 + ch6 + ch7 + ch8

chi

7.8486604641856745

Degrees of freedom = (2 - 1) * (4 - 1) = 3

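The critical value of $\chi^{2}$ at the 5% significance level for 3 degrees of freedom is 7.815.

Since the calculated value of $\chi^{2}$ is 7.849 (based on the rounded expected counts), which is greater than the critical value of 7.815, **we reject the null hypothesis** and conclude that gender and education level are dependent.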
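
The hand calculation can be cross-checked with SciPy. The sketch below is an illustration rather than part of the original solution. It uses `scipy.stats.chi2_contingency`, which works with unrounded expected counts, so the statistic it reports (about 8.006) is slightly larger than the value obtained from the rounded table. The decision at the 5% level should be the same.

```python
import numpy as np
from scipy import stats

# Observed counts: rows = female, male; columns = High School, Bachelors, Masters, Ph.D.
observed = np.array([[60, 54, 46, 41],
                     [40, 44, 53, 57]])

# Chi-square test of independence (no continuity correction applies when dof > 1)
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)

# Critical value at the 5% significance level
critical = stats.chi2.ppf(0.95, dof)

print(f"chi2 = {chi2_stat:.3f}, dof = {dof}, p = {p_value:.4f}")
print(f"critical value = {critical:.3f}")
print("Reject H0" if chi2_stat > critical else "Fail to reject H0")
```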

***
***
***
### Problem Statement 2

Using the following data, perform a one-way analysis of variance using α = .05. Write
up the results in APA format.

[Group1: 51, 45, 33, 45, 67]

[Group2: 23, 43, 23, 43, 45]

[Group3: 56, 76, 74, 87, 56]

### Solution

Hypotheses

$H_{0}$ : There is no significant difference between the means of the three groups

$H_{1}$ : There is a significant difference between the means of the three groups

Significance level : 5%

In [3]:
data_1 = np.array([51, 45, 33, 45, 67])
data_2 = np.array([23, 43, 23, 43, 45])
data_3 = np.array([56, 76, 74, 87, 56])

mean_1             = np.mean(data_1)
deviation_1        = data_1 - mean_1
sq_deviation_1     = deviation_1 ** 2
sum_sq_deviation_1 = sum(sq_deviation_1)

mean_2             = np.mean(data_2)
deviation_2        = data_2 - mean_2
sq_deviation_2     = deviation_2 ** 2
sum_sq_deviation_2 = sum(sq_deviation_2)

mean_3             = np.mean(data_3)
deviation_3        = data_3 - mean_3
sq_deviation_3     = deviation_3 ** 2
sum_sq_deviation_3 = sum(sq_deviation_3)

# With equal group sizes, the mean of the group means equals the grand mean.
mean_of_means = (mean_1 + mean_2 + mean_3) / 3

ssc = len(data_1) * (((mean_1 - mean_of_means) ** 2) + ((mean_2 - mean_of_means) ** 2) + ((mean_3 - mean_of_means) ** 2))

sse = sum_sq_deviation_1 + sum_sq_deviation_2 + sum_sq_deviation_3

msc = ssc / 2  # columns = 3, so df = (3 - 1)
mse = sse / 12 # number = 15, columns = 3, so df = (15 - 3)

f_score = msc / mse

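The critical value of F for (2, 12) degrees of freedom at the 5% significance level is 3.89.

Since the calculated F score is 9.75, which is greater than 3.89, **we can reject the null hypothesis**.

In APA format: the difference among the three group means was significant, F(2, 12) = 9.75, p < .05.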
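
As a cross-check on the manual sums of squares, the same test can be run with `scipy.stats.f_oneway`, and the critical value can be looked up with `scipy.stats.f.ppf`. This is a sketch for verification only; the names `group_1`, `group_2`, and `group_3` are illustrative.

```python
from scipy import stats

group_1 = [51, 45, 33, 45, 67]
group_2 = [23, 43, 23, 43, 45]
group_3 = [56, 76, 74, 87, 56]

# One-way ANOVA across the three groups
f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)

# Degrees of freedom: between groups = k - 1, within groups = N - k
k = 3
n_total = len(group_1) + len(group_2) + len(group_3)
df_between, df_within = k - 1, n_total - k

# Critical F value at the 5% significance level
critical = stats.f.ppf(0.95, df_between, df_within)

print(f"F({df_between}, {df_within}) = {f_stat:.2f}, p = {p_value:.4f}")
print(f"critical value = {critical:.2f}")
```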

***
***
***
### Problem Statement 3

Calculate the F-test statistic for the two samples 10, 20, 30, 40, 50 and 5, 10, 15, 20, 25.

In [4]:
## Solution

data1 = np.array([10, 20, 30, 40, 50])

var1 = np.var(data1, ddof=1)
var1

250.0

In [5]:
data2 = np.array([5, 10, 15, 20, 25])

var2 = np.var(data2, ddof=1)
var2

62.5

In [6]:
# Use the larger variance as the numerator.

f_test_score = var1 / var2

print(f"The F test score = {f_test_score}")

The F test score = 4.0
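
The problem statement does not give a significance level or ask for a conclusion. Assuming a 5% level and a one-sided test (both assumptions, not from the original), the statistic can be compared with the critical value of the F distribution with (4, 4) degrees of freedom, which is about 6.39. A minimal sketch:

```python
import numpy as np
from scipy import stats

data1 = np.array([10, 20, 30, 40, 50])
data2 = np.array([5, 10, 15, 20, 25])

# Sample variances (ddof=1), larger variance in the numerator
var1 = np.var(data1, ddof=1)
var2 = np.var(data2, ddof=1)
f_stat = max(var1, var2) / min(var1, var2)

# Both samples have 5 observations, so each has 5 - 1 = 4 degrees of freedom
dfn = len(data1) - 1
dfd = len(data2) - 1

# Assumed: 5% significance level, one-sided test
critical = stats.f.ppf(0.95, dfn, dfd)
p_one_sided = stats.f.sf(f_stat, dfn, dfd)

print(f"F = {f_stat}, critical value = {critical:.2f}, one-sided p = {p_one_sided:.3f}")
print("Reject H0" if f_stat > critical else "Fail to reject H0")
```

Under these assumptions the F score of 4.0 falls below the critical value, so the two variances would not be judged significantly different.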
