# Hypothesis Testing - Chi-Squared Goodness of Fit Test

### Objective
The main purpose is to provide an understanding and in-depth knowledge of the chi-square goodness of fit test.

## 1. Formulation of Hypotheses

**Null Hypothesis (H₀):** The proportions of the categories in the sample are equal to the proportions in the general population.

**Alternative Hypothesis (H₁):** The proportions of the categories in the sample are different from the proportions in the general population.

## 2. Comparison Table

| Significance Level | Sample Size | Degrees of Freedom | $X^2$-Statistic | Non-rejection Region | P-Value | Decision                   |
|---------------------|-------------|---------------------|-------------------------|----------------------|---------|----------------------------|
| **α = 0.05**            | 20          | 19                   | 0.554                   | (-0.0, 0.0)          | 0.7581  | Fail to Reject H₀         |
|                     | 50          | 49                   | 0.912                   | (-2.13e-15, 2.13e-15) | 0.6338  | Fail to Reject H₀         |
|                     | 100         | 99                   | 0.101                   | (-3.01e-15, 3.01e-15) | 0.9508  | Fail to Reject H₀         |
| **α = 0.01**            | 20          | 19                   | 0.554                   | (-0.0, 0.0)          | 0.7581  | Fail to Reject H₀         |
|                     | 50          | 49                   | 0.912                   | (-2.13e-15, 2.13e-15) | 0.6338  | Fail to Reject H₀         |
|                     | 100         | 99                   | 0.101                   | (-3.01e-15, 3.01e-15) | 0.9508  | Fail to Reject H₀         |

## 3. Brief Conclusions

- The chi-square test results indicate that, regardless of the sample size and significance level, there is not enough evidence to suggest that the proportions observed in the sample are different from the proportions in the general population.
- As the sample size increases, the chi-square test statistic tends to increase slightly, but the p-value remains consistently high, leading to a failure to reject the null hypothesis.
- Adjusting the significance level does not significantly affect the acceptance or rejection of the null hypothesis, as the p-values are consistently higher than the chosen alpha values.
- Overall, the analysis suggests that the proportions observed in the sample are likely to be representative of the proportions in the general population, as indicated by the failure to reject the null hypothesis across different sample sizes and significance levels.

In [1]:
import numpy as np
from scipy.stats import chi2_contingency

np.random.seed(42)

# Define the general population proportions
population_proportions = np.array([0.2, 0.1, 0.7])

# Function to generate a sample with proportions corresponding to the general population
def generate_sample(population_proportions, sample_size):
    sample = np.random.choice(len(population_proportions), size=sample_size, p=population_proportions)
    return sample

# Function to perform chi-square test
def chi_square_test(sample, population_proportions, alpha):
    # Count observed frequencies in the sample
    observed_counts = np.bincount(sample, minlength=len(population_proportions))
    
    # Calculate expected frequencies based on population proportions and sample size
    expected_counts = population_proportions * len(sample)
    
    # Reshape observed and expected counts into a contingency table format
    contingency_table = np.array([observed_counts, expected_counts])
    
    # Perform chi-square test
    chi2_stat, p_value, _, _ = chi2_contingency(contingency_table)
    
    # Calculate critical values for non-rejection region
    chi2_critical = np.sqrt(chi2_contingency([np.ones_like(observed_counts)*len(sample)/len(observed_counts), np.ones_like(expected_counts)*len(sample)/len(expected_counts)])[0])
    non_rejection_region = (chi2_critical * (-1), chi2_critical)
    
    # Print results
    print(f"Sample size: {len(sample)}")
    print(f"Significance level: {alpha}")
    print(f"Chi-square test statistic: {chi2_stat}")
    print(f"P-value: {p_value}")
    print(f"Non-rejection Region: {non_rejection_region}")
    
    # Compare p-value to significance level
    if p_value < alpha:
        print("Reject null hypothesis: There is significant evidence to suggest that the proportions are different from the general population.")
    else:
        print("Fail to reject null hypothesis: There is not enough evidence to suggest that the proportions are different from the general population.")

# Define significance levels
alpha_values = [0.05, 0.01]

# Perform tasks for different sample sizes
sample_sizes = [20, 50, 100]
for sample_size in sample_sizes:
    # Generate sample
    sample = generate_sample(population_proportions, sample_size)
    
    # Perform chi-square test for each significance level
    for alpha in alpha_values:
        chi_square_test(sample, population_proportions, alpha)
        print()  # Add empty line for readability

Sample size: 20
Significance level: 0.05
Chi-square test statistic: 0.5538461538461539
P-value: 0.7581128112377238
Non-rejection Region: (-0.0, 0.0)
Fail to reject null hypothesis: There is not enough evidence to suggest that the proportions are different from the general population.

Sample size: 20
Significance level: 0.01
Chi-square test statistic: 0.5538461538461539
P-value: 0.7581128112377238
Non-rejection Region: (-0.0, 0.0)
Fail to reject null hypothesis: There is not enough evidence to suggest that the proportions are different from the general population.

Sample size: 50
Significance level: 0.05
Chi-square test statistic: 0.912106135986733
P-value: 0.6337802027625075
Non-rejection Region: (-2.131628207280301e-15, 2.131628207280301e-15)
Fail to reject null hypothesis: There is not enough evidence to suggest that the proportions are different from the general population.

Sample size: 50
Significance level: 0.01
Chi-square test statistic: 0.912106135986733
P-value: 0.6337802027