7. In 1861, 10 essays appeared in the New Orleans Daily Crescent. They
were signed “Quintus Curtius Snodgrass” and some people suspected
they were actually written by Mark Twain. To investigate this, we will
consider the proportion of three-letter words found in an author's work.

From eight Twain essays we have:

.225 .262 .217 .240 .230 .229 .235 .217

From 10 Snodgrass essays we have:

.209 .205 .196 .210 .202 .207 .224 .223 .220 .201

(a) Perform a Wald test for equality of the means. Use the nonparametric plug-in estimator. Report the p-value and a 95 per cent confidence
interval for the difference of means. What do you conclude?
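
One possible way to carry out part (a) is sketched below. It assumes the plug-in variance divides by n rather than n - 1 (`ddof=0`) and that the standard error of the difference is the square root of the summed variance-over-n terms. The variable names are illustrative and do not come from the original solution.

```python
import numpy as np
from scipy import stats

twain = np.array([0.225, 0.262, 0.217, 0.240, 0.230, 0.229, 0.235, 0.217])
snodgrass = np.array([0.209, 0.205, 0.196, 0.210, 0.202, 0.207, 0.224, 0.223, 0.220, 0.201])

# Plug-in estimate of the difference of means
diff_hat = twain.mean() - snodgrass.mean()

# Plug-in variances divide by n (ddof=0); this is an assumption about the intended estimator
se_hat = np.sqrt(twain.var(ddof=0) / len(twain) + snodgrass.var(ddof=0) / len(snodgrass))

# Wald statistic and two-sided p-value from the standard normal
wald_stat = diff_hat / se_hat
p_value_wald = 2 * stats.norm.sf(abs(wald_stat))

# 95 per cent confidence interval for the difference of means
z = stats.norm.ppf(0.975)
ci_low, ci_high = diff_hat - z * se_hat, diff_hat + z * se_hat

print(f"Estimated difference: {diff_hat:.6f}")
print(f"Wald statistic: {wald_stat:.3f}")
print(f"p-value: {p_value_wald:.6f}")
print(f"95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```

If the interval excludes 0 and the p-value is below 0.05, the test rejects equality of the means at the 5 per cent level.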

# **(b) Now use a permutation test to avoid the use of large sample methods. What is your conclusion? (Brinegar (1963)).**

In [1]:
import numpy as np

twain = np.array([0.225, 0.262, 0.217, 0.240, 0.230, 0.229, 0.235, 0.217])
snodgrass = np.array([0.209, 0.205, 0.196, 0.210, 0.202, 0.207, 0.224, 0.223, 0.220, 0.201])

n_twain = len(twain)
n_snodgrass = len(snodgrass)

mean_twain = np.mean(twain)
mean_snodgrass = np.mean(snodgrass)

combined_data = np.concatenate([twain, snodgrass])
observed_difference = mean_twain - mean_snodgrass
num_permutations = 50000
differences = []

np.random.seed(28)
for _ in range(num_permutations):
    np.random.shuffle(combined_data)
    perm_twain = combined_data[:n_twain]
    perm_snodgrass = combined_data[n_twain:]
    difference = np.mean(perm_twain) - np.mean(perm_snodgrass)
    differences.append(difference)


diffs = np.array(differences)
# Two-sided p-value: share of permuted differences at least as extreme as the observed one
p_value_permutation = np.mean(np.abs(diffs) >= np.abs(observed_difference))

print("Observed Difference:", observed_difference)
print("p-value from Permutation Test:", p_value_permutation)

Observed Difference: 0.022175
p-value from Permutation Test: 0.0007


# Conclusion:
Because the p-value (0.0007) is well below 0.05, there is **strong evidence to reject the null hypothesis** of equal means. This agrees with the conclusion of the Wald test in part (a): the mean proportion of three-letter words differs between Twain's and Snodgrass's essays, with Twain's essays higher by about 0.022 on average.
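
With only 18 essays, the random shuffling can be cross-checked by enumerating every split of the pooled data into groups of 8 and 10. The sketch below is one way to do that, not the method used above. The tolerance `1e-12` is an assumption, added so that splits tying with the observed difference are counted despite floating-point rounding.

```python
from itertools import combinations
from math import comb
import numpy as np

twain = np.array([0.225, 0.262, 0.217, 0.240, 0.230, 0.229, 0.235, 0.217])
snodgrass = np.array([0.209, 0.205, 0.196, 0.210, 0.202, 0.207, 0.224, 0.223, 0.220, 0.201])

combined = np.concatenate([twain, snodgrass])
n_twain, n_total = len(twain), len(combined)
total_sum = combined.sum()
observed = twain.mean() - snodgrass.mean()

n_extreme = 0
n_splits = 0
# Every way of choosing which essays are relabelled as "Twain"
for idx in combinations(range(n_total), n_twain):
    group_sum = combined[list(idx)].sum()
    diff = group_sum / n_twain - (total_sum - group_sum) / (n_total - n_twain)
    # Tolerance (assumed) keeps ties with the observed value despite rounding error
    if abs(diff) >= abs(observed) - 1e-12:
        n_extreme += 1
    n_splits += 1

p_value_exact = n_extreme / n_splits

print(f"Splits enumerated: {n_splits} (expected {comb(n_total, n_twain)})")
print(f"Exact two-sided p-value: {p_value_exact}")
```

The exact p-value has no Monte Carlo error, so it can be used to judge how accurate the 50,000-shuffle estimate is.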

11. A randomized, double-blind experiment was conducted to assess the
effectiveness of several drugs for reducing postoperative nausea. The
data are as follows.

| | Number of Patients | Incidence of Nausea |
|---|---|---|
| Placebo | 80 | 45 |
| Chlorpromazine | 75 | 26 |
| Dimenhydrinate | 85 | 52 |
| Pentobarbital (100 mg) | 67 | 35 |
| Pentobarbital (150 mg) | 85 | 37 |

(a) Test each drug versus the placebo at the 5 per cent level. Also, report
the estimated odds ratios. Summarize your findings.

In [None]:
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Each array holds [nausea, no nausea] counts for one treatment group
placebo = np.array([45, 35])
chlorpromazine = np.array([26, 49])
dimenhydrinate = np.array([52, 33])
pentobarbital_100 = np.array([35, 32])
pentobarbital_150 = np.array([37, 48])

contingency_tables = {
    "Chlorpromazine": np.array([placebo, chlorpromazine]),
    "Dimenhydrinate": np.array([placebo, dimenhydrinate]),
    "Pentobarbital (100 mg)": np.array([placebo, pentobarbital_100]),
    "Pentobarbital (150 mg)": np.array([placebo, pentobarbital_150])
}

chi2_results = {}
odds_ratios = {}

for drug, table in contingency_tables.items():
    chi2, p, dof, expected = chi2_contingency(table, correction=False)
    chi2_results[drug] = (chi2, p)

    # Odds of nausea on the drug (row 1) relative to the odds on placebo (row 0)
    a, b = table[1]
    c, d = table[0]
    odds_ratio = (a / b) / (c / d)
    odds_ratios[drug] = odds_ratio

    observed = table.flatten()
    expected_flat = expected.flatten()
    oi_minus_ei = observed - expected_flat
    oi_minus_ei_sq = oi_minus_ei ** 2
    oi_minus_ei_sq_div_ei = oi_minus_ei_sq / expected_flat

    chi_square_df = pd.DataFrame({
        "Observed (Oi)": observed,
        "Expected (Ei)": expected_flat,
        "Oi – Ei": oi_minus_ei,
        "(Oi – Ei)²": oi_minus_ei_sq,
        "(Oi – Ei)² / Ei": oi_minus_ei_sq_div_ei
    })

    print(f"--- {drug} ---")
    print(f"Contingency Table:\n{table}")
    print(f"\nChi-square Components:\n{chi_square_df}")
    print(f"\nChi-square Statistic: {chi2}")
    print(f"p-value: {p}")
    print(f"Odds Ratio: {odds_ratio}")
    print(f"Significant: {p < 0.05}\n")


--- Chlorpromazine ---
Contingency Table:
[[45 35]
 [26 49]]

Chi-square Components:
   Observed (Oi)  Expected (Ei)   Oi – Ei  (Oi – Ei)²  (Oi – Ei)² / Ei
0             45      36.645161  8.354839    69.80333         1.904844
1             35      43.354839 -8.354839    69.80333         1.610047
2             26      34.354839 -8.354839    69.80333         2.031834
3             49      40.645161  8.354839    69.80333         1.717384

Chi-square Statistic: 7.264108959311418
p-value: 0.007034616082374464
Odds Ratio: 0.4126984126984127
Significant: True

--- Dimenhydrinate ---
Contingency Table:
[[45 35]
 [52 33]]

Chi-square Components:
   Observed (Oi)  Expected (Ei)   Oi – Ei  (Oi – Ei)²  (Oi – Ei)² / Ei
0             45      47.030303 -2.030303     4.12213         0.087648
1             35      32.969697  2.030303     4.12213         0.125028
2             52      49.969697  2.030303     4.12213         0.082493
3             33      35.030303 -2.030303     4.12213         0.117673

# Conclusion:
The results show that **Chlorpromazine** significantly reduces the incidence of nausea compared to the placebo (p = 0.007, below 0.05; OR = 0.413, below 1), indicating a strong protective effect: the odds of nausea on Chlorpromazine are about 41% of the odds on placebo.

The other drugs show no significant difference from the placebo at the 5 per cent level: **Dimenhydrinate** (p = 0.521, OR = 1.226), **Pentobarbital (100 mg)** (p = 0.627, OR = 0.851), and **Pentobarbital (150 mg)** (p = 0.102, OR = 0.600). The odds ratios for both Pentobarbital doses are below 1, but the data are also compatible with no effect.
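
The odds ratios can also be given approximate 95 per cent confidence intervals. The sketch below uses the usual large-sample standard error of the log odds ratio, sqrt(1/a + 1/b + 1/c + 1/d). This is an addition to the analysis above, not part of the original solution, and the names `counts` and `or_intervals` are illustrative.

```python
import numpy as np
from scipy import stats

# [nausea, no nausea] counts derived from the table in the problem statement
counts = {
    "Placebo": (45, 35),
    "Chlorpromazine": (26, 49),
    "Dimenhydrinate": (52, 33),
    "Pentobarbital (100 mg)": (35, 32),
    "Pentobarbital (150 mg)": (37, 48),
}

z = stats.norm.ppf(0.975)
c, d = counts["Placebo"]
or_intervals = {}

for drug, (a, b) in counts.items():
    if drug == "Placebo":
        continue
    # Log odds ratio of the drug versus placebo and its large-sample standard error
    log_or = np.log((a / b) / (c / d))
    se_log_or = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low, high = np.exp(log_or - z * se_log_or), np.exp(log_or + z * se_log_or)
    or_intervals[drug] = (np.exp(log_or), low, high)
    print(f"{drug}: OR = {np.exp(log_or):.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

An interval that excludes 1 points the same way as a significant test at the 5 per cent level; an interval that contains 1 does not rule out equal odds.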

12. Let X1, ..., Xn ∼ Poisson(λ).

(a) Let λ0 > 0. Find the size α Wald test for

H0 : λ = λ0 versus H1 : λ ≠ λ0.
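
A sketch of the standard derivation for part (a), offered as one reasonable reading of the question rather than a worked solution from the source: the MLE of λ is the sample mean, and its variance is λ/n, so the estimated standard error is the square root of the sample mean over n.

$$
\hat{\lambda} = \bar{X}_n, \qquad
\widehat{\mathrm{se}} = \sqrt{\frac{\bar{X}_n}{n}}, \qquad
W = \frac{\bar{X}_n - \lambda_0}{\sqrt{\bar{X}_n / n}}, \qquad
\text{reject } H_0 \text{ when } |W| > z_{\alpha/2},
$$

where $z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$. A common variant evaluates the standard error under the null, $W_0 = (\bar{X}_n - \lambda_0) / \sqrt{\lambda_0 / n}$; both statistics are approximately standard normal under $H_0$ for large n.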

# **(b) (Computer Experiment.) Let λ0 = 1, n = 20 and α = .05. Simulate X1,...,Xn ∼ Poisson(λ0) and perform the Wald test. Repeat many times and count how often you reject the null. How close is the type I error rate to .05?**

In [4]:
import numpy as np
import scipy.stats as stats

lambda_0 = 1
n = 20
alpha = 0.05
num_simulations = 50000
critical_value = stats.norm.ppf(1 - alpha / 2)

rejections = 0
np.random.seed(28)

for _ in range(num_simulations):
    sample = np.random.poisson(lambda_0, n)
    sample_mean = np.mean(sample)
    # Standard error evaluated at the null value lambda_0 (not at the sample mean)
    wald_stat = (sample_mean - lambda_0) / np.sqrt(lambda_0 / n)
    if abs(wald_stat) > critical_value:
        rejections += 1

type_i_error_rate = rejections / num_simulations

print(f"Wald Test Rejections: {rejections}")
print(f"Type I Error Rate: {type_i_error_rate}")
print(f"Closeness of type I Error Rate to 0.05: {type_i_error_rate - 0.05}")

Wald Test Rejections: 2738
Type I Error Rate: 0.05476
Closeness of type I Error Rate to 0.05: 0.00476


# Conclusion:
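The estimated Type I error rate, 0.0548, is **close** to the significance level of 0.05 but slightly above it. With 50,000 simulations the Monte Carlo standard error is about 0.001, so the excess of 0.0048 is not simulation noise; it reflects the normal approximation being applied to a discrete statistic at n = 20. Overall, the **Wald test performs reasonably well under the null hypothesis**, with only a mild inflation of the Type I error rate.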
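
The simulated rate can be cross-checked without simulation, because under H0 the sample total is Poisson(n λ0). The sketch below sums the Poisson probabilities of every total that the test rejects. It assumes the same statistic as the code above, with the standard error evaluated at λ0, and truncates the support at 200, which is far into the upper tail for a mean of 20.

```python
import numpy as np
from scipy import stats

lambda_0, n, alpha = 1, 20, 0.05
critical_value = stats.norm.ppf(1 - alpha / 2)

# Possible values of the sample total; under H0 it is Poisson(n * lambda_0)
totals = np.arange(0, 201)

# Wald statistic for each possible total, with the standard error evaluated at lambda_0
wald_stats = (totals / n - lambda_0) / np.sqrt(lambda_0 / n)
reject = np.abs(wald_stats) > critical_value

# Exact rejection probability: total Poisson mass on the rejected totals
exact_type_i_error = stats.poisson.pmf(totals, n * lambda_0)[reject].sum()

print(f"Rejected totals below the mean: up to {totals[reject & (totals < 20)].max()}")
print(f"Rejected totals above the mean: from {totals[reject & (totals > 20)].min()}")
print(f"Exact Type I error rate: {exact_type_i_error:.5f}")
```

Because the total is an integer, the rejection region can only move in whole steps, which is why the exact rate cannot be tuned to 0.05 precisely.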