# Problem

A coffee manufacturer is interested in estimating the difference in the average daily coffee consumption of regular-coffee drinkers and decaffeinated-coffee drinkers. Its researcher randomly selects 13 regular-coffee drinkers and asks how many cups of coffee per day they drink. He randomly locates 15 decaffeinated-coffee drinkers and asks how many cups of coffee per day they drink. The average for the regular-coffee drinkers is 4.35 cups, with a standard deviation of 1.20 cups. The average for the decaffeinated-coffee drinkers is 6.84 cups, with a standard deviation of 1.42 cups. The researcher assumes, for each population, that the daily consumption is normally distributed.

# To Find

Task 1	Calculate the mean and standard deviation of the dataset <br>
Task 2	Determine the appropriate statistic to use<br>
Task 3	Calculate the 95% confidence interval to estimate the difference in the averages of the two populations.<br>
Task 4	Interpret the result<br>


# Solutions

# Given

In [19]:
#Sample statistics for regular-coffee drinkers

n_reg_coffee = 13
avg_reg_coffee = 4.35
std_reg_coffee = 1.20

#Sample statistics for decaffeinated-coffee drinkers

n_decaf_coffee = 15
avg_decaf_coffee = 6.84
std_decaf_coffee = 1.42

#Daily consumption is normally distributed

# Task 2

**The appropriate statistic for this question is the two-sample t statistic with a pooled standard deviation. We are comparing the means of two independent groups (regular-coffee drinkers and decaffeinated-coffee drinkers) on continuous data (cups of coffee per day), the population standard deviations are unknown, and the samples are small (13 and 15), so the t distribution is used rather than the z distribution. Daily consumption is assumed to be normally distributed in each population, which satisfies the normality assumption of the t procedure. The pooled form additionally assumes that the two populations have equal variances.**
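For reference, the pooled interval can be written out as below. The symbols are notation introduced here, not taken from the problem statement: $\bar{x}_1, s_1, n_1$ denote the sample mean, standard deviation and size for regular-coffee drinkers, $\bar{x}_2, s_2, n_2$ the same for decaffeinated-coffee drinkers, and $1-\alpha$ the confidence level.

$$
s_p = \sqrt{\frac{(n_1-1)\,s_1^2 + (n_2-1)\,s_2^2}{n_1+n_2-2}}, \qquad
(\bar{x}_1 - \bar{x}_2) \;\pm\; t_{\alpha/2,\; n_1+n_2-2}\; s_p \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}
$$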

# Task 3

In [20]:
import numpy as np
from scipy.stats import t

#Pooled standard deviation
pooled_std = np.sqrt(((n_reg_coffee-1)*(std_reg_coffee**2) + (n_decaf_coffee-1)*(std_decaf_coffee**2)) / (n_reg_coffee+n_decaf_coffee-2))

#t-value for 95% confidence interval
alpha = 0.05                              #Significance level (1 - confidence level)
df = n_reg_coffee + n_decaf_coffee - 2    #Degrees of freedom
t_crit = t.ppf(1 - alpha/2, df)           #Critical t-value using Percent Point Function(t.ppf)

#Required confidence interval
diff = avg_reg_coffee - avg_decaf_coffee
std_error = pooled_std * np.sqrt(1/n_reg_coffee + 1/n_decaf_coffee)
lower = diff - t_crit * std_error
upper = diff + t_crit * std_error

print("95% confidence interval for difference in average of 2 populations is:", (lower, upper))

95% confidence interval for difference in average of 2 populations is: (-3.520505338977948, -1.4594946610220523)


# Task 4

**The 95% confidence interval for the difference in average daily coffee consumption (regular-coffee drinkers minus decaffeinated-coffee drinkers) is approximately (-3.52, -1.46) cups of coffee per day.**

This means that we can be 95% confident that the true difference in average daily consumption between the two groups falls within this range. Because the difference was computed as regular minus decaffeinated, the negative values indicate that decaffeinated-coffee drinkers drink more: we are 95% confident that they consume between 1.46 and 3.52 more cups of coffee per day, on average, than regular-coffee drinkers.

Since the confidence interval does not include zero, we can conclude that the difference in the average daily coffee consumption between the two groups is statistically significant at the 0.05 level of significance. This suggests that decaffeinated-coffee drinkers may consume significantly more coffee on average than regular-coffee drinkers. However, **it is important to note that this is based on a sample of individuals and not the entire population of regular-coffee drinkers and decaffeinated-coffee drinkers**, so we should exercise caution when generalizing these results.
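The significance claim above can be cross-checked with a pooled two-sample t-test computed directly from the summary statistics. The following is a minimal sketch, assuming SciPy's `ttest_ind_from_stats` with `equal_var=True` (the same equal-variance assumption as the pooled interval); the names `t_stat` and `p_value` are introduced here for illustration.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the problem statement
# (group 1 = regular-coffee drinkers, group 2 = decaffeinated-coffee drinkers)
result = ttest_ind_from_stats(
    mean1=4.35, std1=1.20, nobs1=13,
    mean2=6.84, std2=1.42, nobs2=15,
    equal_var=True,  # pooled variance, matching the interval computed above
)

t_stat, p_value = result.statistic, result.pvalue

print("t statistic:", t_stat)
print("p-value:", p_value)
print("Significant at the 0.05 level:", p_value < 0.05)
```

A p-value below 0.05 here is equivalent to the 95% pooled interval excluding zero, so the two checks should always agree.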

If we had surveyed the entire population, the difference would be known exactly rather than estimated. The point estimate of the difference is 4.35 - 6.84 = -2.49 cups, but with samples of only 13 and 15 drinkers the **interval around it is quite wide: 3.52 - 1.46 = 2.06 cups.**

# Optional Task

# 90% CI

In [21]:
#t-value for 90% confidence interval
alpha = 0.10                              #Significance level (1 - confidence level)
df = n_reg_coffee + n_decaf_coffee - 2    #Degrees of freedom
t_crit = t.ppf(1 - alpha/2, df)           #Critical t-value using Percent Point Function(t.ppf)

#Required confidence interval
diff = avg_reg_coffee - avg_decaf_coffee
std_error = pooled_std * np.sqrt(1/n_reg_coffee + 1/n_decaf_coffee)
lower = diff - t_crit * std_error
upper = diff + t_crit * std_error

print("90% confidence interval for difference in average of 2 populations is:", (lower, upper))

90% confidence interval for difference in average of 2 populations is: (-3.345083045528577, -1.6349169544714235)


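**The 90% confidence interval, approximately (-3.35, -1.63), is narrower than the 95% interval (about 1.71 cups wide instead of 2.06). The point estimate of the difference is unchanged; accepting a lower confidence level simply yields a tighter range.**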

# 99% CI

In [22]:
#t-value for 99% confidence interval
alpha = 0.01                              #Significance level (1 - confidence level)
df = n_reg_coffee + n_decaf_coffee - 2    #Degrees of freedom
t_crit = t.ppf(1 - alpha/2, df)           #Critical t-value using Percent Point Function(t.ppf)

#Required confidence interval
diff = avg_reg_coffee - avg_decaf_coffee
std_error = pooled_std * np.sqrt(1/n_reg_coffee + 1/n_decaf_coffee)
lower = diff - t_crit * std_error
upper = diff + t_crit * std_error

print("99% confidence interval for difference in average of 2 populations is:", (lower, upper))

99% confidence interval for difference in average of 2 populations is: (-3.883062102764934, -1.096937897235066)


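**The 99% confidence interval, approximately (-3.88, -1.10), is wider than the 95% interval (about 2.79 cups wide instead of 2.06). Demanding a higher confidence level requires a wider range. The interval still excludes zero, so the difference remains statistically significant at the 0.01 level.**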
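The three intervals can also be produced in one pass, which makes the relationship between confidence level and interval width easier to see. This is a minimal sketch rather than part of the original solution; the helper `pooled_ci` and the `widths` dictionary are hypothetical names introduced here, and the calculation repeats the pooled formula used in the cells above.

```python
import numpy as np
from scipy.stats import t


def pooled_ci(mean1, std1, n1, mean2, std2, n2, confidence):
    """Pooled-variance t interval for (mean1 - mean2) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * std1**2 + (n2 - 1) * std2**2) / df
    std_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_crit = t.ppf(1 - (1 - confidence) / 2, df)  # two-sided critical value
    diff = mean1 - mean2
    return diff - t_crit * std_error, diff + t_crit * std_error


# Interval width at each confidence level (regular minus decaffeinated)
widths = {}
for confidence in (0.90, 0.95, 0.99):
    lower, upper = pooled_ci(4.35, 1.20, 13, 6.84, 1.42, 15, confidence)
    widths[confidence] = upper - lower
    print(f"{confidence:.0%} CI: ({lower:.2f}, {upper:.2f}), width = {upper - lower:.2f}")
```

The centre of every interval is the same point estimate; only the critical t-value, and therefore the width, changes with the confidence level.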