Q1. To calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5 using Python, you can use the `scipy.stats` module. Here's how you can do it:

```python
from scipy import stats

# Sample data
mean = 50
std_dev = 5
sample_size = 100  # Assuming a sample size of 100 for demonstration

# Calculate the standard error
std_error = std_dev / (sample_size ** 0.5)

# Calculate the margin of error
margin_of_error = stats.norm.ppf(0.975) * std_error  # For a 95% confidence interval, z = 1.96

# Calculate the confidence interval
lower_bound = mean - margin_of_error
upper_bound = mean + margin_of_error

print("95% Confidence Interval:", (lower_bound, upper_bound))
```

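Interpretation: Assuming a sample size of 100, the standard error is 5 / √100 = 0.5 and the margin of error is 1.96 × 0.5 ≈ 0.98, so the 95% confidence interval is approximately (49.02, 50.98). With 95% confidence, we estimate that the population mean lies within this interval. The problem does not state the sample size, so a different n would give a different interval.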
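The interval above depends on the assumed sample size of 100. The following sketch keeps the same mean and standard deviation and shows how the margin of error scales with 1/√n. The sample sizes in the loop are arbitrary illustrative values, not part of the problem:

```python
from scipy import stats

mean = 50
std_dev = 5
z = stats.norm.ppf(0.975)  # two-sided 95% critical value, about 1.96

# Illustrative sample sizes (assumptions; the problem does not state n)
margins = {}
for n in [25, 100, 400]:
    std_error = std_dev / n ** 0.5
    margins[n] = z * std_error
    print(f"n={n}: 95% CI = ({mean - margins[n]:.2f}, {mean + margins[n]:.2f})")
```

Each fourfold increase in n halves the margin of error, because the standard error is proportional to 1/√n.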

Q2. To conduct a chi-square goodness of fit test to determine if the distribution of colors of M&Ms in a bag matches the expected distribution, you can use the `scipy.stats` module as well. Here's how you can do it:

```python
from scipy.stats import chisquare

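# Observed counts per color (hypothetical placeholder values; replace with your own counts)
# Order: blue, orange, green, yellow, red, brown
observed = [20, 18, 22, 15, 12, 13]

# Expected proportions (assuming equal proportions for the six colors; if the problem
# specifies a different distribution, use it instead, making sure the proportions sum to 1)
expected_proportions = [1 / 6] * 6

# Convert proportions to expected counts so that they sum to the observed total
total_mms_count = sum(observed)
expected = [p * total_mms_count for p in expected_proportions]

# Perform chi-square goodness of fit test
chi2_stat, p_value = chisquare(observed, f_exp=expected)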

print("Chi-square statistic:", chi2_stat)
print("p-value:", p_value)

# Interpret the results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. The distribution of M&Ms colors in the bag does not match the expected distribution.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence that the distribution of M&Ms colors in the bag differs from the expected distribution.")
```
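To show what `chisquare` computes, here is a sketch that calculates the Pearson statistic by hand and compares it with SciPy's result. It uses the same hypothetical counts and the equal-proportion assumption, so substitute the real counts and expected distribution as needed:

```python
import numpy as np
from scipy.stats import chi2, chisquare

# Hypothetical observed counts (blue, orange, green, yellow, red, brown)
observed = np.array([20, 18, 22, 15, 12, 13])

# Expected counts under equal proportions (an assumption)
expected = np.full(6, observed.sum() / 6)

# Pearson statistic: sum of (O - E)^2 / E over all categories
chi2_manual = np.sum((observed - expected) ** 2 / expected)

# Degrees of freedom: number of categories minus 1 (no parameters estimated)
dof = len(observed) - 1
p_manual = chi2.sf(chi2_manual, dof)

# Cross-check against SciPy's implementation
chi2_scipy, p_scipy = chisquare(observed, f_exp=expected)
print("Manual:", chi2_manual, p_manual)
print("SciPy: ", chi2_scipy, p_scipy)
```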

Q3. To calculate the chi-square statistic and p-value for a contingency table, you can use the `scipy.stats` module as well. Here's how you can do it:

```python
from scipy.stats import chi2_contingency

# Contingency table
observed = [[20, 15],
            [10, 25],
            [15, 20]]

# Perform chi-square test
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

print("Chi-square statistic:", chi2_stat)
print("p-value:", p_value)

# Interpret the results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant association between Group A and Group B.")
else:
    print("Fail to reject the null hypothesis. There is no significant association between Group A and Group B.")
```
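The expected counts that `chi2_contingency` returns come from the independence formula E = (row total × column total) / grand total. This sketch reproduces them by hand for the same table and checks the degrees of freedom, (rows − 1) × (columns − 1):

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[20, 15],
                     [10, 25],
                     [15, 20]])

# Expected count for each cell under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()
expected_manual = row_totals * col_totals / grand_total

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

chi2_stat, p_value, dof, expected = chi2_contingency(observed)
print(expected_manual)
print("dof:", dof_manual)
```

For this table every row sums to 35 and the columns sum to 45 and 60, so each row's expected counts are 15 and 20. All are well above 5, the usual rule-of-thumb minimum for the chi-square approximation.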


Q4. To calculate the 95% confidence interval for the true proportion of individuals in the population who smoke, you can use the normal approximation method for binomial proportions. Here's how you can do it in Python:

```python
import numpy as np
from scipy.stats import norm

# Sample size
n = 500

# Number of individuals who smoke
x = 60

# Proportion of smokers
p_hat = x / n

# Calculate the standard error
std_error = np.sqrt(p_hat * (1 - p_hat) / n)

# Calculate the margin of error
margin_of_error = norm.ppf(0.975) * std_error  # For a 95% confidence interval

# Calculate the confidence interval
lower_bound = p_hat - margin_of_error
upper_bound = p_hat + margin_of_error

print("95% Confidence Interval for Proportion of Smokers:", (lower_bound, upper_bound))
```

Interpretation: Here p̂ = 60 / 500 = 0.12, and the interval is approximately (0.092, 0.148). With 95% confidence, we estimate that the true proportion of individuals in the population who smoke lies within this interval.
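The normal approximation is reasonable only when the sample has enough successes and failures. The sketch below wraps the calculation in a reusable function that first checks a common rule of thumb (at least 10 of each). The function name `wald_proportion_ci` and the threshold of 10 are illustrative choices, not a library API:

```python
import numpy as np
from scipy.stats import norm

def wald_proportion_ci(x, n, confidence=0.95):
    """Normal-approximation (Wald) CI for a proportion.

    Hypothetical helper for illustration. Raises ValueError when the rule of
    thumb of at least 10 successes and 10 failures is not met.
    """
    if x < 10 or n - x < 10:
        raise ValueError("Normal approximation is questionable: need >= 10 successes and failures")
    p_hat = x / n
    z = norm.ppf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin = z * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = wald_proportion_ci(60, 500)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

With 60 smokers and 440 non-smokers the check passes easily. The same function covers other confidence levels, for example `wald_proportion_ci(520, 1000, confidence=0.99)`.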

Q5. To calculate the 90% confidence interval for a sample of data with a mean of 75 and a standard deviation of 12, you can use the formula for the confidence interval for a population mean. Here's how you can do it in Python:

```python
import numpy as np
from scipy.stats import norm

# Sample mean and standard deviation
mean = 75
std_dev = 12

# Sample size (assuming a large sample size for the normal approximation)
n = 100

# Calculate the standard error
std_error = std_dev / np.sqrt(n)

# Calculate the margin of error
margin_of_error = norm.ppf(0.95) * std_error  # For a 90% confidence interval

# Calculate the confidence interval
lower_bound = mean - margin_of_error
upper_bound = mean + margin_of_error

print("90% Confidence Interval for Population Mean:", (lower_bound, upper_bound))
```

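Interpretation: Assuming n = 100, the standard error is 12 / √100 = 1.2 and the margin of error is 1.645 × 1.2 ≈ 1.97, which gives an interval of approximately (73.03, 76.97). With 90% confidence, we estimate that the true population mean lies within this interval.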
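The standard deviation of 12 is a sample estimate, so a t-based interval is arguably more appropriate than the z-based one. The sketch below compares the two under the same assumed n = 100. At that size the t interval is only slightly wider:

```python
import numpy as np
from scipy.stats import norm, t

mean = 75
std_dev = 12   # sample standard deviation
n = 100        # assumed sample size, as in the code above

std_error = std_dev / np.sqrt(n)

# t-based 90% interval with n - 1 degrees of freedom
t_lower, t_upper = t.interval(0.90, n - 1, loc=mean, scale=std_error)

# z-based 90% interval, as computed above, for comparison
z_margin = norm.ppf(0.95) * std_error
z_lower, z_upper = mean - z_margin, mean + z_margin

print(f"t interval: ({t_lower:.3f}, {t_upper:.3f})")
print(f"z interval: ({z_lower:.3f}, {z_upper:.3f})")
```

The first argument is passed positionally because its keyword name changed across SciPy versions.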

Q6. To plot the chi-square distribution with 10 degrees of freedom and shade the area corresponding to a chi-square statistic of 15, you can use matplotlib. Here's how you can do it in Python:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

# Degrees of freedom
df = 10

# Chi-square statistic
chi_stat = 15

# Generate x values for the chi-square distribution
x = np.linspace(0, 30, 1000)

# Plot the chi-square distribution
plt.plot(x, chi2.pdf(x, df), 'b-', label=f'Chi-square Distribution (df={df})')

# Shade the upper-tail area to the right of the chi-square statistic of 15 (this area is the p-value)
plt.fill_between(x, 0, chi2.pdf(x, df), where=(x >= chi_stat), color='red', alpha=0.5, label=f'Chi-square Statistic = {chi_stat}')

# Label axes and add legend
plt.xlabel('Chi-square Statistic')
plt.ylabel('Probability Density')
plt.legend()

# Show plot
plt.show()
```

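This code plots the chi-square distribution with 10 degrees of freedom and shades the upper-tail region to the right of 15. The shaded area is the probability of observing a chi-square statistic of 15 or larger under the null hypothesis, which is the p-value.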
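The shaded area can also be computed directly with the survival function `chi2.sf`, which equals 1 − CDF but is more accurate far in the tail. This sketch also computes the 5% critical value for comparison:

```python
from scipy.stats import chi2

df = 10
chi_stat = 15

# Area to the right of the statistic = upper-tail probability (the p-value)
shaded_area = chi2.sf(chi_stat, df)
print(f"P(X >= {chi_stat}) for df={df}: {shaded_area:.4f}")

# Critical value at alpha = 0.05 for comparison
critical_value = chi2.ppf(0.95, df)
print(f"Critical value (alpha = 0.05): {critical_value:.3f}")
```

The critical value is about 18.31. A statistic of 15 falls below it, so it would not be significant at the 0.05 level.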

Q7. To calculate a 99% confidence interval for the true proportion of people in the population who prefer Coke, you can use the normal approximation method for binomial proportions. Here's how you can do it in Python:

```python
import numpy as np
from scipy.stats import norm

# Sample size
n = 1000

# Number of people who prefer Coke
x = 520

# Proportion of people who prefer Coke
p_hat = x / n

# Calculate the standard error
std_error = np.sqrt(p_hat * (1 - p_hat) / n)

# Calculate the margin of error
margin_of_error = norm.ppf(0.995) * std_error  # For a 99% confidence interval

# Calculate the confidence interval
lower_bound = p_hat - margin_of_error
upper_bound = p_hat + margin_of_error

print("99% Confidence Interval for Proportion of People Preferring Coke:", (lower_bound, upper_bound))
```

Interpretation: Here p̂ = 520 / 1000 = 0.52, and the interval is approximately (0.479, 0.561). With 99% confidence, we estimate that the true proportion of people in the population who prefer Coke lies within this interval.

Q8. To conduct a chi-square goodness of fit test to determine if the observed frequencies match the expected frequencies of a fair coin, you can use the `chisquare` function from the `scipy.stats` module. (`chi2_contingency` tests independence in a contingency table and is not the right tool for a goodness of fit test.) Here's how you can do it in Python:

```python
from scipy.stats import chisquare

# Observed frequencies: [tails, heads]
observed = [45, 55]  # 45 tails, 55 heads

# Expected frequencies for a fair coin over 100 flips
expected = [50, 50]  # 50 tails, 50 heads

# Perform chi-square goodness of fit test
chi2_stat, p_value = chisquare(observed, f_exp=expected)

# Print results
print("Chi-square Statistic:", chi2_stat)
print("p-value:", p_value)

# Determine significance
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The coin does not appear to be fair.")
else:
    print("Fail to reject the null hypothesis: There is no significant evidence that the coin is unfair.")
```

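Interpretation: The chi-square statistic measures the discrepancy between the observed and expected frequencies. The p-value is the probability of a discrepancy at least this large if the coin were fair. If the p-value is less than the chosen significance level (e.g., 0.05), we reject the null hypothesis that the coin is fair. The goodness of fit test is non-directional, so rejecting only shows that the coin is not fair. It does not show that the coin is biased towards tails. Here χ² = (45 − 50)²/50 + (55 − 50)²/50 = 1.0 with 1 degree of freedom, which gives p ≈ 0.32, so we fail to reject the null hypothesis. In any case, 45 tails out of 100 leans away from tails, not towards it.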
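The sketch below computes the statistic by hand to confirm the numbers above. Because the research question is directional ("biased towards tails"), it also runs a one-sided exact binomial test with `scipy.stats.binomtest`, which is available in recent SciPy releases. The one-sided test is an alternative technique added here for comparison. It is not part of the original answer:

```python
from scipy.stats import binomtest, chi2

tails, heads = 45, 55
n = tails + heads
expected = n / 2  # fair coin: half tails, half heads

# Pearson chi-square statistic by hand, 1 degree of freedom (2 categories - 1)
chi2_stat = (tails - expected) ** 2 / expected + (heads - expected) ** 2 / expected
p_gof = chi2.sf(chi2_stat, 1)

# One-sided exact binomial test of H1: P(tails) > 0.5
result = binomtest(tails, n, p=0.5, alternative="greater")

print("Chi-square:", chi2_stat, "p-value:", p_gof)
print("One-sided binomial p-value (tails > 0.5):", result.pvalue)
```

The one-sided p-value is large because observing 45 or more tails is very likely for a fair coin. This agrees with the conclusion that the data give no support for a tails bias.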

Q9. A study was conducted to determine if there is an association between smoking status (smoker or non-smoker) and lung cancer diagnosis (yes or no). The results are shown in the contingency table below. Conduct a chi-square test for independence to determine if there is a significant association between smoking status and lung cancer diagnosis.

| | Lung Cancer: Yes | Lung Cancer: No |
|---|---|---|
| Smoker | 60 | 140 |
| Non-smoker | 30 | 170 |

Use a significance level of 0.05.

To conduct a chi-square test for independence for the given contingency table, you can use the `chi2_contingency` function from the `scipy.stats` module. Here's how you can perform the test in Python:

```python
from scipy.stats import chi2_contingency

# Contingency table
observed = [[60, 140],
            [30, 170]]

# Perform chi-square test for independence
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

# Print results
print("Chi-square Statistic:", chi2_stat)
print("p-value:", p_value)

# Determine significance
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant association between smoking status and lung cancer diagnosis.")
else:
    print("Fail to reject the null hypothesis: There is no significant association between smoking status and lung cancer diagnosis.")
```

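Interpretation: The chi-square statistic measures how far the observed counts depart from the counts expected if smoking status and lung cancer diagnosis were independent. It does not measure the strength of the association directly. The p-value is the probability of a statistic at least this large if there were no true association between the variables. If the p-value is less than the chosen significance level (e.g., 0.05), we reject the null hypothesis of independence and conclude that there is a significant association between smoking status and lung cancer diagnosis. Otherwise, we fail to reject the null hypothesis. For this table, `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default and gives χ² ≈ 12.06 with p < 0.001, so we reject the null hypothesis.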
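Two details are worth checking for a 2×2 table. The first is how much Yates' continuity correction changes the statistic. The second is the effect size, because the chi-square statistic grows with sample size. This sketch computes the statistic with and without the correction, then computes the phi coefficient, which equals Cramér's V for a 2×2 table, from the uncorrected statistic:

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[60, 140],
                     [30, 170]])

# For a 2x2 table (1 degree of freedom), chi2_contingency applies Yates'
# continuity correction by default; correction=False gives the plain Pearson statistic
chi2_yates, p_yates, _, _ = chi2_contingency(observed)
chi2_plain, p_plain, _, _ = chi2_contingency(observed, correction=False)

# Phi coefficient (equal to Cramer's V for a 2x2 table) as an effect size
n = observed.sum()
phi = np.sqrt(chi2_plain / n)

print(f"With Yates correction: chi2 = {chi2_yates:.3f}, p = {p_yates:.5f}")
print(f"Without correction:    chi2 = {chi2_plain:.3f}, p = {p_plain:.5f}")
print(f"Phi coefficient: {phi:.3f}")
```

Both versions are significant at 0.05. A phi value of about 0.18 indicates an association that is statistically clear but modest in size.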

Q10. A study was conducted to determine if the proportion of people who prefer milk chocolate, dark chocolate, or white chocolate is different in the U.S. versus the U.K. A random sample of 500 people from the U.S. and a random sample of 500 people from the U.K. were surveyed. The results are shown in the contingency table below. Conduct a chi-square test for independence to determine if there is a significant association between chocolate preference and country of origin.

| | Milk Chocolate | Dark Chocolate | White Chocolate |
|---|---|---|---|
| U.S. (n=500) | 200 | 150 | 150 |
| U.K. (n=500) | 225 | 175 | 100 |

Use a significance level of 0.01.

To conduct a chi-square test for independence for the given contingency table, you can use the `chi2_contingency` function from the `scipy.stats` module. Here's how you can perform the test in Python:

```python
from scipy.stats import chi2_contingency

# Contingency table
observed = [[200, 150, 150],
            [225, 175, 100]]

# Perform chi-square test for independence
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

# Print results
print("Chi-square Statistic:", chi2_stat)
print("p-value:", p_value)

# Determine significance
alpha = 0.01
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant association between chocolate preference and country of origin.")
else:
    print("Fail to reject the null hypothesis: There is no significant association between chocolate preference and country of origin.")
```

Interpretation: The chi-square statistic measures how far the observed counts depart from the counts expected if chocolate preference and country of origin were independent. The p-value is the probability of a statistic at least this large if there were no true association between the variables. If the p-value is less than the chosen significance level (e.g., 0.01), we reject the null hypothesis of independence and conclude that there is a significant association between chocolate preference and country of origin. Otherwise, we fail to reject the null hypothesis. For this table, χ² ≈ 13.39 with 2 degrees of freedom and p ≈ 0.0012. Because p < 0.01, we reject the null hypothesis.
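A significant result says that preference and country are associated, but it does not say which cells drive the association. One common follow-up is to inspect Pearson residuals, (O − E) / √E, whose squares sum to the chi-square statistic. The sketch below computes them:

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[200, 150, 150],
                     [225, 175, 100]])
columns = ["Milk", "Dark", "White"]

chi2_stat, p_value, dof, expected = chi2_contingency(observed)

# Pearson residuals (O - E) / sqrt(E): cells with large |residual| contribute most to chi2
residuals = (observed - expected) / np.sqrt(expected)

print(f"chi2 = {chi2_stat:.3f}, dof = {dof}, p = {p_value:.5f}")
for row_label, row in zip(["U.S.", "U.K."], residuals):
    print(row_label, dict(zip(columns, np.round(row, 2))))
```

The largest residuals, about ±2.24, are in the white chocolate column. There the U.S. count (150) exceeds the expected 125 and the U.K. count (100) falls short of it.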

Q11. A random sample of 30 people was selected from a population with an unknown mean and standard
deviation. The sample mean was found to be 72 and the sample standard deviation was found to be 10.
Conduct a hypothesis test to determine if the population mean is significantly different from 70. Use a
significance level of 0.05.

To conduct a hypothesis test for the population mean using a one-sample t-test, we can follow these steps:

1. State the null hypothesis (H0) and the alternative hypothesis (H1):
   - Null hypothesis (H0): The population mean is equal to 70 (μ = 70).
   - Alternative hypothesis (H1): The population mean is different from 70 (μ ≠ 70).

2. Choose the significance level (α), which is given as 0.05.

3. Calculate the test statistic (t-score) using the formula:
   \[ t = \frac{{\bar{x} - \mu}}{{\frac{s}{{\sqrt{n}}}}} \]
   where:
   - \(\bar{x}\) is the sample mean,
   - \(\mu\) is the hypothesized population mean,
   - \(s\) is the sample standard deviation,
   - \(n\) is the sample size.

4. Determine the degrees of freedom (df), which is \(n - 1\).

5. Determine the critical t-value(s) from the t-distribution table for the given significance level and degrees of freedom.

6. Compare the absolute value of the test statistic to the critical t-value(s):
   - If the absolute value of the test statistic is greater than the critical t-value, reject the null hypothesis.
   - If the absolute value of the test statistic is less than or equal to the critical t-value, fail to reject the null hypothesis.

Let's perform these steps in Python:

```python
from scipy.stats import t

# Given data
sample_mean = 72
sample_std = 10
sample_size = 30
population_mean = 70
significance_level = 0.05

# Calculate the test statistic (t-score)
t_score = (sample_mean - population_mean) / (sample_std / (sample_size ** 0.5))

# Determine the degrees of freedom
df = sample_size - 1

# Calculate the critical t-values
t_critical = t.ppf(1 - significance_level / 2, df)

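# Report the test statistic and the critical value
print("t-score:", t_score)
print("Critical t-value:", t_critical)

# Compare the absolute test statistic to the critical t-value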
if abs(t_score) > t_critical:
    print("Reject the null hypothesis: The population mean is significantly different from 70.")
else:
    print("Fail to reject the null hypothesis: The population mean is not significantly different from 70.")
```

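This code will output whether to reject or fail to reject the null hypothesis based on the calculated test statistic and critical t-value. With these data, t = 2 / (10/√30) ≈ 1.10, which is smaller than the critical value of about 2.045 for df = 29, so we fail to reject the null hypothesis.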
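The same decision can be reached through the p-value, or through a 95% confidence interval for the mean: the two-sided test rejects H0 exactly when 70 lies outside that interval. `scipy.stats.ttest_1samp` needs the raw observations, and only summary statistics are given here, so this sketch computes both quantities directly from the t-distribution:

```python
from scipy.stats import t

sample_mean = 72
sample_std = 10
sample_size = 30
population_mean = 70

std_error = sample_std / sample_size ** 0.5
t_score = (sample_mean - population_mean) / std_error
df = sample_size - 1

# Two-sided p-value: probability of |T| at least as large as observed under H0
p_value = 2 * t.sf(abs(t_score), df)

# Equivalent check: 95% CI for the mean; H0 is rejected iff 70 lies outside it
ci_lower, ci_upper = t.interval(0.95, df, loc=sample_mean, scale=std_error)

print(f"t = {t_score:.3f}, df = {df}, p = {p_value:.4f}")
print(f"95% CI for the mean: ({ci_lower:.2f}, {ci_upper:.2f})")
```

The p-value is well above 0.05, and the interval contains 70. Both agree with the critical-value comparison above.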