# Ali Foroozmand

# Homework 7

# Confidence Intervals

## Airline meals must look appetizing and attractive to the general public. For this, the airline industry spends considerable research time on the latest trends in food presentation. However, because of space, both the volume and weight of a meal are particularly important.

## A study focused on estimating the mean $\mu$ (g) of the normally distributed variable "weight of sausage on packaged breakfast" (Source: Ansett Airlines).

## As this was part of an ongoing monitoring program, there was prior knowledge of the standard deviation of similar datasets. It was assumed $\sigma = 7$.

### Question 1: Construct a 99% confidence interval for $\mu$ assuming that $\sigma = 7$. The weight in grams of the "sausage component of breakfast" was measured to be:

| | | | | | | | | | |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|14|15|16|17|17|17|18|18|19|19|
|20|21|22|22|22|23|23|25|25|26|
|28|29|30|30|30|32|33|26|38|43|

### Remember that:

| Level of Confidence | $z_c$ |
|:-:|:-:|
|0.80|1.28|
|0.90|1.645|
|0.95|1.96|
|0.99|2.576|
|0.999|3.290|


In [1]:
import numpy as np

sample = np.array([14,15,16,17,17,17,18,18,19,19,20,21,22,22,22,23,23,25,25,26,28,29,30,30,30,32,33,26,38,43])

sigma = 7  # known population standard deviation (g)
z_loc = 2.576  # critical value z_c for the 99% confidence level
N = len(sample)

x_bar = np.mean(sample)

# Report the interval as (lower bound, upper bound)
margin = z_loc * sigma / np.sqrt(N)
confidence_interval = x_bar - margin, x_bar + margin

print(f"confidence interval is = {confidence_interval}")

confidence interval is = (20.641155614355615, 27.225511052311052)
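
As a cross-check, the critical value can be taken from the standard normal distribution instead of the table. The sketch below assumes SciPy is available; `z_exact`, `margin_exact`, and `ci_exact` are illustrative names that are not used elsewhere in this notebook. Because 2.576 is a rounded table value, the bounds should agree with the result above only to about three decimal places.

```python
import numpy as np
from scipy.stats import norm

sample = np.array([14, 15, 16, 17, 17, 17, 18, 18, 19, 19,
                   20, 21, 22, 22, 22, 23, 23, 25, 25, 26,
                   28, 29, 30, 30, 30, 32, 33, 26, 38, 43])
sigma = 7          # known population standard deviation (g)
confidence = 0.99  # desired confidence level

# Two-sided critical value: leave (1 - confidence) / 2 in each tail
z_exact = norm.ppf(1 - (1 - confidence) / 2)

# Half-width of the interval and the interval itself as (lower, upper)
margin_exact = z_exact * sigma / np.sqrt(len(sample))
ci_exact = (sample.mean() - margin_exact, sample.mean() + margin_exact)

print(f"z_c = {z_exact:.4f}")
print(f"99% CI = ({ci_exact[0]:.2f}, {ci_exact[1]:.2f})")
```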


### Question 2: Determine the (approximate) minimum sample size required to be 99% sure that the true average weight $\mu$ of all sausages will not differ by more than 3 grams from an observed sample mean weight.

#### To determine the minimum sample size $N$ required for a 99% confidence level, we use the following formula:

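- $N = \left(\dfrac{z_c \, \sigma}{E}\right)^2$, where $z_c$ is the critical value for the chosen confidence level, $\sigma$ is the known standard deviation, and $E$ is the maximum allowable margin of error.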

In [2]:
import math

E = 3  # maximum allowable margin of error (g)

n_min = (z_loc * sigma / E) ** 2
n_min_rounded = math.ceil(n_min)

print (f"N_min is roughly = {n_min_rounded}")

N_min is roughly = 37
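
To see why the result is rounded up, the margin of error can be evaluated just below and at the computed sample size. This is a minimal sketch; `margin_of_error` is a helper name introduced here for illustration only.

```python
import math

sigma = 7    # assumed population standard deviation (g)
z_c = 2.576  # critical value for 99% confidence (from the table above)
E = 3        # maximum allowable margin of error (g)

def margin_of_error(n):
    """Half-width of the confidence interval for a sample of size n."""
    return z_c * sigma / math.sqrt(n)

# Smallest integer sample size whose margin does not exceed E
n_min = math.ceil((z_c * sigma / E) ** 2)

# Compare the margin one step below the minimum with the margin at the minimum
for n in (n_min - 1, n_min):
    print(f"n = {n}: margin = {margin_of_error(n):.3f} g")
```

The smaller sample size leaves the margin slightly above 3 g, which is why the fractional result of the formula is rounded up rather than to the nearest integer.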


# Wilcoxon-Mann-Whitney U test

## In 2012, an average Austrian's annual income was $\sim$ 25400 Euro.

### Question 3: Write your own Python function that calculates the Wilcoxon-Mann-Whitney U test to determine, with probability of error $<$5%, if the samples shown below could have been drawn from the same population.





### Table 1: Annual income in Euro of 13 randomly selected Austrian people

| | | | |
|:-:|:-:|:-:|:-:|
|15171|12274|12707|14098|
|24872|13823|21178|13847|
|16877|12521|47137|14742|
|22091| | | |

             
### Table 2: Annual income in Euro of 13 randomly selected (Austrian?) people

| | | | |
|:-:|:-:|:-:|:-:|
|16895|43307|24688|32949|
|20808|19743|17601|21829|
|18278|17274|21439|25643|
|27836| | | |

### *Hint: For samples larger than $N_1 = N_2 = 8$, instead of using the U-table as in the example, you can calculate a Z-score for each sample from its rank sum $T_1$ or $T_2$ using the following formulas:*

\begin{equation}
Z_1 = \frac{T_1 - \frac{N_1 (N_1 + N_2 + 1)}{2}}{\sqrt{\frac{N_1 N_2 (N_1 + N_2 + 1)}{12}}}
\end{equation}

\begin{equation}
Z_2 = \frac{T_2 - \frac{N_2 (N_1 + N_2 + 1)}{2}}{\sqrt{\frac{N_1 N_2 (N_1 + N_2 + 1)}{12}}}
\end{equation}

### *and then find the p-value using a built-in Python function:*

#### `p_values = scipy.stats.norm.sf(abs(z_scores)) * 2  # two-sided`
#### `help(scipy.stats.norm.sf)`


In [3]:
from scipy.stats import norm

#data
table1 = np.array([15171, 12274, 12707, 14098, 24872, 13823, 21178, 13847, 16877, 12521, 47137, 14742, 22091])
table2 = np.array([16895, 43307, 24688, 32949, 20808, 19743, 17601, 21829, 18278, 17274, 21439, 25643, 27836])

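def wmwu(sample1, sample2, alpha=0.05):
    """Wilcoxon-Mann-Whitney U test using the normal approximation (two-sided)."""
    combined = np.concatenate([sample1, sample2])
    # Ordinal ranks starting at 1. This assumes no tied values (true for these
    # data); tied values would need average ranks instead.
    ranks = np.argsort(np.argsort(combined)) + 1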
    
   
    ranks1 = ranks[:len(sample1)]
    ranks2 = ranks[len(sample1):]
    
   
    T1 = np.sum(ranks1)
    T2 = np.sum(ranks2)
    
    
    N1, N2 = len(sample1), len(sample2)
    
    # Expected rank sums and their common standard deviation under the null hypothesis
    mean_rank1 = N1 * (N1 + N2 + 1) / 2
    mean_rank2 = N2 * (N1 + N2 + 1) / 2
    std_rank = np.sqrt(N1 * N2 * (N1 + N2 + 1) / 12)
    
    Z1 = (T1 - mean_rank1) / std_rank
    Z2 = (T2 - mean_rank2) / std_rank
    
    # Two-sided p-value
    p_value = norm.sf(abs(Z1)) * 2  # Z2 = -Z1, so abs(Z2) gives the same p-value
    
    
    result = "Reject null hypothesis" if p_value < alpha else "Fail to reject null hypothesis"
    
    return {
        "T1": T1,
        "T2": T2,
        "Z1": Z1,
        "Z2": Z2,
        "p_value": p_value,
        "result": result
    }

# Perform the test
result = wmwu(table1, table2)
print("Wilcoxon-Mann-Whitney U Test Results:")
print(f"T1: {result['T1']},             T2: {result['T2']}")
print(f"Z1: {result['Z1']:.2f},         Z2: {result['Z2']:.2f}")
print(f"p-value: {result['p_value']:.4f}")
print(f"Conclusion: {result['result']}")

Wilcoxon-Mann-Whitney U Test Results:
T1: 127,             T2: 224
Z1: -2.49,         Z2: 2.49
p-value: 0.0129
Conclusion: Reject null hypothesis
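
The hand-written function can be compared with SciPy's built-in test. This sketch assumes SciPy 1.7 or later (needed for the `method` argument); the continuity correction is switched off so that the built-in normal approximation corresponds to the Z-score formula from the hint. SciPy reports the statistic $U_1 = T_1 - N_1(N_1+1)/2$ rather than the rank sum $T_1$, so the rank sum is converted before comparing.

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

table1 = np.array([15171, 12274, 12707, 14098, 24872, 13823, 21178,
                   13847, 16877, 12521, 47137, 14742, 22091])
table2 = np.array([16895, 43307, 24688, 32949, 20808, 19743, 17601,
                   21829, 18278, 17274, 21439, 25643, 27836])

# Built-in test: normal approximation without continuity correction
res = mannwhitneyu(table1, table2, alternative="two-sided",
                   method="asymptotic", use_continuity=False)

# Rank sum of the first sample; rankdata assigns average ranks to tied values
ranks = rankdata(np.concatenate([table1, table2]))
N1 = len(table1)
T1 = ranks[:N1].sum()

# Convert the rank sum T1 into the U statistic of the first sample
U1 = T1 - N1 * (N1 + 1) / 2

print(f"U1 from rank sum: {U1}, U from SciPy: {res.statistic}")
print(f"p-value from SciPy: {res.pvalue:.4f}")
```

With these data the built-in p-value should agree with the 0.0129 obtained above, since there are no tied incomes and both approaches use the same normal approximation.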
