In [1]:
# Inferential
# Task 5
# A company produces chocolate bars with a standard weight of 100 g.
# As a quality-control measure, the producer weighs 15 bars and obtains the
# following results:
# 98.32,97.26,99.85,99.52,95.73,95.56,100.49,98.19,95.16,
# 98.26,96.46,100.23,99.76,98.58,97.43
# sample <- c(98.32,97.26,99.85,99.52,95.73,95.56,100.49,98.19,95.16,
# 98.26,96.46,100.23,99.76,98.58,97.43)
# mu0 <- 100
# (a) What is an appropriate hypothesis regarding the expected weight
# µ for a two-sided test?
# H0: mu == 100, H1: mu != 100
# (b) If the weights can be assumed to be normally distributed, which test
# should be used to test these hypotheses?
# A one-sample t-test is appropriate: the population standard deviation is
# unknown and we are testing the mean mu
# (normal model)
# (c) Conduct the test suggested in (b) at the 5%
# level. What is your test decision? Specify the p-value.
# alpha <- 0.05
# t.test(x = sample, mu = mu0, alternative = "two.sided",
#        conf.level = 1 - alpha)
# The p-value is 0.0007251, which is well below alpha,
# so we reject the null hypothesis.
# (d) Based on the sample, the producer changes the production settings.
# To check whether the correction has led to an improvement,
# the producer again takes 15 chocolate bars and weighs them.
# 100.14,100.05,96.51,98.70,98.22,101.06,103.55,100.16,
# 100.60,102.85,103.15,100.66,102.52,102.09,100.84
# What is an appropriate hypothesis for comparing the expected
# weights of the two samples?
# sample2 <- c(100.14,100.05,96.51,98.70,98.22,101.06,103.55,100.16,
# 100.60,102.85,103.15,100.66,102.52,102.09,100.84)
# (e) Provide an appropriate statistical test for the hypothesis and
# perform it at the 5% level. Assume that the variances of the
# populations of the two samples are equal. What is your test decision?
# Specify the p-value.
# H0: mu1 >= mu2, H1: mu1 < mu2
# t.test(sample, sample2, alternative = "less", paired = FALSE, var.equal = TRUE,
#        conf.level = 1 - alpha)
# The p-value is 0.0002228, well below alpha -> reject H0.
# (f) In question (e) the population variances of the two samples are
# assumed to be equal. Verify that the variances are equal using an
# appropriate test at the 10% level.
# var.test(sample, sample2, alternative = "two.sided",
#          conf.level = 1 - 0.1)
# The p-value is 0.702, which is above 0.10 -> H0 (equal variances) is not rejected.

(a) **Unbiased Estimator of Prevalence:**  
   - To show that \( X = \frac{m}{n} \) is an unbiased estimator of the prevalence, we need to show that the expected value of \( X \) equals the true prevalence \( p \).
   - The true prevalence \( p \) is the proportion of individuals in the population who have the disease.
   - Since the sample is drawn at random, each sampled individual has the disease with probability \( p \), so the number \( m \) of diseased individuals among the \( n \) sampled satisfies \( m \sim \text{Bin}(n, p) \) and \( E(m) = np \).
   - Therefore \( E(X) = \frac{E(m)}{n} = p \), which makes \( X \) an unbiased estimator of the prevalence.

(b) **Variance of the Estimator \( X \):**  
   - The variance of \( X \), denoted by \( \text{Var}(X) \), follows from the variance of the binomial distribution: the number of diseased individuals \( m \) in the sample has \( \text{Var}(m) = np(1-p) \).
   - Hence \( \text{Var}(X) = \frac{\text{Var}(m)}{n^2} = \frac{p(1-p)}{n} \), where \( p \) is the true prevalence and \( n \) is the sample size.
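The results in (a) and (b) can be checked empirically with a small simulation. This is only a sketch: the values `p_true`, `n_sim` and `n_rep` are illustrative assumptions and are not part of the task.

```r
# Simulation sketch: compare the empirical mean and variance of X = m/n
# with the theoretical values p and p(1 - p)/n.
# p_true, n_sim and n_rep are assumed values chosen for illustration.
set.seed(1)
p_true <- 0.06
n_sim  <- 200
n_rep  <- 100000

# Each replicate draws m ~ Bin(n_sim, p_true) and forms X = m / n_sim.
x_sim <- rbinom(n_rep, size = n_sim, prob = p_true) / n_sim

mean_sim   <- mean(x_sim)
var_sim    <- var(x_sim)
var_theory <- p_true * (1 - p_true) / n_sim

print(c(mean_sim = mean_sim, p_true = p_true))
print(c(var_sim = var_sim, var_theory = var_theory))
```

With this many replicates, the simulated mean and variance should lie close to the theoretical values, but a simulation only illustrates the result and does not replace the derivation.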

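(c) **Upper 95% Confidence Bound for Prevalence:**  
   - For a large enough sample size \( n \), a normal approximation can be used to calculate the upper 95% confidence bound for the prevalence.
   - The standard error of \( X \) is \( \sqrt{\frac{p(1-p)}{n}} \); since \( p \) is unknown, it is estimated by plugging in \( X \). Because only an upper bound is asked for, the bound is one-sided and uses the 95% quantile of the standard normal distribution, \( z_{0.95} \approx 1.645 \) (the value 1.96 belongs to a two-sided 95% interval). The margin of error is \( \text{ME} = 1.645 \times \sqrt{\frac{X(1-X)}{n}} \).
   - Adding the margin of error to \( X \) gives the upper bound: \( X + \text{ME} \).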
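As a sketch of this calculation, the bound can be computed for the figures noted further down in this notebook (n = 200 sampled, m = 12 with the disease). The helper name `upper_bound_prop` is hypothetical, and the code assumes a one-sided bound based on `qnorm(0.95)` with a plug-in standard error.

```r
# One-sided upper confidence bound for a proportion (normal approximation).
# upper_bound_prop is a hypothetical helper, not a function from any package.
upper_bound_prop <- function(m, n, conf.level = 0.95) {
  p_hat <- m / n
  z  <- qnorm(conf.level)               # one-sided normal quantile
  se <- sqrt(p_hat * (1 - p_hat) / n)   # plug-in standard error
  p_hat + z * se
}

n <- 200
m <- 12
upper_95 <- upper_bound_prop(m, n)
print(upper_95)
```

For a small number of cases the normal approximation can be rough; `binom.test(m, n, alternative = "less")$conf.int` from base R gives an exact (Clopper-Pearson) one-sided interval for comparison.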

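(d) **Minimum Sample Size for Desired Confidence Bound:**  
   - To find the minimum sample size for which the upper 95% confidence bound is at most 0.01 greater than the estimate, we require the margin of error from (c) to satisfy \( \text{ME} \le 0.01 \) and solve this inequality for \( n \).
   - The prevalence is known to be ≤ 0.1, and \( p(1-p) \) is increasing on \( [0, 0.5] \), so the worst case is \( p = 0.1 \). The required \( n \) can be found by iterating through possible values until the inequality holds, or in closed form from \( n \ge \left( \frac{z}{0.01} \right)^2 \times 0.1 \times 0.9 \), where \( z \) is the normal quantile used in (c).
   - The normal approximation for the confidence bound is used in this calculation.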
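The search for the minimum sample size can be sketched as follows. The helper name `min_sample_size` is hypothetical, and the code assumes a one-sided bound (`qnorm(0.95)`) evaluated at the worst-case prevalence of 0.1.

```r
# Smallest n for which the one-sided margin of error is at most `margin`,
# evaluated at the worst-case prevalence p_max.
# min_sample_size is a hypothetical helper, not a function from any package.
min_sample_size <- function(margin = 0.01, p_max = 0.1, conf.level = 0.95) {
  z <- qnorm(conf.level)
  n <- 1
  # Increase n until the margin of error no longer exceeds the target.
  while (z * sqrt(p_max * (1 - p_max) / n) > margin) {
    n <- n + 1
  }
  n
}

n_min <- min_sample_size()

# Closed-form check: n >= (z / margin)^2 * p_max * (1 - p_max)
n_closed <- ceiling((qnorm(0.95) / 0.01)^2 * 0.1 * 0.9)

print(c(iterative = n_min, closed_form = n_closed))
```

The iterative search and the closed-form expression should agree; the loop mirrors the description above, while the closed form is the quicker route in practice.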


In [3]:
# (a) (b)
# The solution to these parts is in my OneNote notes under WS22.4

In [None]:
# (c)

# Given:
# - a 95% upper confidence bound is required
# - sample size n = 200
# - m = 12 individuals in the sample have the disease
# The sample proportion (p.hat) can also be determined:
# p.hat <- m/n

In [1]:
library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(tibble)
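# Package names are case-sensitive: the CRAN package is "TeachingDemos".
if (!requireNamespace("TeachingDemos", quietly = TRUE)) install.packages("TeachingDemos")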
library(TeachingDemos)

# Inferential
# Task 5
# A company produces chocolate bars with a standard weight of 100 g.
# As a quality-control measure, the producer weighs 15 bars and obtains the
# following results:
sample <- c(98.32, 97.26, 99.85, 99.52, 95.73, 95.56, 100.49, 98.19, 95.16,
            98.26, 96.46, 100.23, 99.76, 98.58, 97.43)
mu0 <- 100

# (a) What is an appropriate hypothesis regarding the expected weight
# µ for a two-sided test?
# H0: mu == 100, H1: mu != 100

# (b) If the weights can be assumed to be normally distributed, which test
# should be used to test these hypotheses?
# A one-sample t-test is appropriate: the population standard deviation is
# unknown and we are testing the mean mu (normal model)

# (c) Conduct the test that was suggested to be used in b) at a 5%
# level. What is your test decision? Specify the p-value.
alpha <- 0.05
test_result <- t.test(x = sample, mu = mu0, alternative = "two.sided",
                      conf.level = 1 - alpha)
p_value_c <- test_result$p.value
test_decision_c <- ifelse(p_value_c < alpha, "Reject H0", "Fail to reject H0")
print(test_result)
print(p_value_c)
print(test_decision_c)

# (d) Based on the sample, the producer changes the settings in production.
# To check whether the correction has led to an improvement,
# he again takes 15 chocolate bars and weighs them.
sample2 <- c(100.14, 100.05, 96.51, 98.70, 98.22, 101.06, 103.55, 100.16,
             100.60, 102.85, 103.15, 100.66, 102.52, 102.09, 100.84)

# (e) Provide an appropriate statistical test for the hypothesis and
# perform it at the 5% level. Assume that the variances of the populations
# of the two samples are equal. What is your test decision?
# Specify the p-value.
# H0: mu1 >= mu2, H1: mu1 < mu2
test_result_e <- t.test(sample, sample2, alternative = "less", paired = FALSE,
                        var.equal = TRUE, conf.level = 1 - alpha)
p_value_e <- test_result_e$p.value
test_decision_e <- ifelse(p_value_e < alpha, "Reject H0", "Fail to reject H0")
print(test_result_e)
print(p_value_e)
print(test_decision_e)

# (f) In question e) the population variances of the two samples are
# assumed to be equal. Verify that the variances are equal using an
# appropriate test at the 10% level.
var_test_result <- var.test(sample, sample2, alternative = "two.sided",
                            conf.level = 1 - 0.1)
print(var_test_result)



Attaching package: 'dplyr'


The following objects are masked from 'package:stats':

    filter, lag


The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union




	One Sample t-test

data:  sample
t = -4.306, df = 14, p-value = 0.0007251
alternative hypothesis: true mean is not equal to 100
95 percent confidence interval:
 97.08371 99.02295
sample estimates:
mean of x 
 98.05333 

[1] 0.0007250832
[1] "Reject H0"

	Two Sample t-test

data:  sample and sample2
t = -3.9781, df = 28, p-value = 0.0002228
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -1.537796
sample estimates:
mean of x mean of y 
 98.05333 100.74000 

[1] 0.0002227782
[1] "Reject H0"

	F test to compare two variances

data:  sample and sample2
F = 0.8119, num df = 14, denom df = 14, p-value = 0.702
alternative hypothesis: true ratio of variances is not equal to 1
90 percent confidence interval:
 0.3268884 2.0165399
sample estimates:
ratio of variances 
         0.8119012 

