Q1. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5 using Python. Interpret the results.
Q2. Conduct a chi-square goodness of fit test to determine if the distribution of colors of M&Ms in a bag
matches the expected distribution of 20% blue, 20% orange, 20% green, 10% yellow, 10% red, and 20%
brown. Use Python to perform the test with a significance level of 0.05.
Q3. Use Python to calculate the chi-square statistic and p-value for a contingency table with the following
data:

Interpret the results of the test.
Q4. A study of the prevalence of smoking in a population of 500 individuals found that 60 individuals
smoked. Use Python to calculate the 95% confidence interval for the true proportion of individuals in the
population who smoke.
Q5. Calculate the 90% confidence interval for a sample of data with a mean of 75 and a standard deviation
of 12 using Python. Interpret the results.
Q6. Use Python to plot the chi-square distribution with 10 degrees of freedom. Label the axes and shade the
area corresponding to a chi-square statistic of 15.
Q7. A random sample of 1000 people was asked if they preferred Coke or Pepsi. Of the sample, 520
preferred Coke. Calculate a 99% confidence interval for the true proportion of people in the population who
prefer Coke.
Q8. A researcher hypothesizes that a coin is biased towards tails. They flip the coin 100 times and observe
45 tails. Conduct a chi-square goodness of fit test to determine if the observed frequencies match the
expected frequencies of a fair coin. Use a significance level of 0.05.
Q9. A study was conducted to determine if there is an association between smoking status (smoker or
non-smoker) and lung cancer diagnosis (yes or no). The results are shown in the contingency table below.
Conduct a chi-square test for independence to determine if there is a significant association between
smoking status and lung cancer diagnosis.

Use a significance level of 0.05.

Q10. A study was conducted to determine if the proportion of people who prefer milk chocolate, dark
chocolate, or white chocolate is different in the U.S. versus the U.K. A random sample of 500 people from
the U.S. and a random sample of 500 people from the U.K. were surveyed. The results are shown in the
contingency table below. Conduct a chi-square test for independence to determine if there is a significant
association between chocolate preference and country of origin.

Use a significance level of 0.01.
Q11. A random sample of 30 people was selected from a population with an unknown mean and standard
deviation. The sample mean was found to be 72 and the sample standard deviation was found to be 10.
Conduct a hypothesis test to determine if the population mean is significantly different from 70. Use a
significance level of 0.05.

Solution 




### Q1: Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5 using Python. Interpret the results.

Here's how you can calculate the confidence interval in Python using the `scipy.stats` module:

```python
from scipy import stats

# Sample data
sample_mean = 50
sample_std = 5
sample_size = 100  # assumed; the question does not state a sample size

# Calculate the standard error
standard_error = sample_std / (sample_size ** 0.5)

# Calculate the margin of error for a 95% confidence interval
margin_of_error = stats.norm.ppf(0.975) * standard_error

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

print("95% Confidence Interval:", confidence_interval)
```

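Interpretation: With the assumed sample size of 100, the standard error is 0.5 and the interval is approximately (49.02, 50.98). We are 95% confident that the true population mean lies in this range; that is, about 95% of intervals built this way from repeated samples would contain the true mean. The question does not state a sample size, so the width of the interval depends on that assumption.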
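As a cross-check, the same kind of interval can be obtained in a single call with `stats.norm.interval`. This is a minimal sketch that assumes a sample size of 100, which the question itself does not give.

```python
from scipy import stats

sample_mean = 50
sample_std = 5
sample_size = 100  # assumption: not stated in the question

standard_error = sample_std / sample_size ** 0.5

# The confidence level is passed positionally so the call works across SciPy versions
lower, upper = stats.norm.interval(0.95, loc=sample_mean, scale=standard_error)

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

A smaller assumed sample size would widen the interval, since the standard error grows as the sample shrinks.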



### Q2: Conduct a chi-square goodness of fit test to determine if the distribution of colors of M&Ms in a bag matches the expected distribution of 20% blue, 20% orange, 20% green, 10% yellow, 10% red, and 20% brown. Use Python to perform the test with a significance level of 0.05.

Here's how you can conduct a chi-square goodness of fit test in Python using the `scipy.stats` module:

```python
from scipy.stats import chisquare

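# Observed frequencies of colors in the M&Ms bag (illustrative counts; the question gives none)
observed = [35, 45, 20, 10, 15, 25]  # Blue, Orange, Green, Yellow, Red, Brown

# Expected frequencies based on the stated distribution (20/20/20/10/10/20 percent)
expected_props = [0.20, 0.20, 0.20, 0.10, 0.10, 0.20]
expected = [p * sum(observed) for p in expected_props]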

# Perform the chi-square goodness of fit test
chi2_stat, p_value = chisquare(observed, f_exp=expected)

# Print the results
print("Chi-square Statistic:", chi2_stat)
print("P-value:", p_value)

# Check if the null hypothesis is rejected based on the significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The distribution of colors in the bag does not match the expected distribution.")
else:
    print("Fail to reject the null hypothesis: The distribution of colors in the bag matches the expected distribution.")
```

This code tests whether the observed M&M color counts are consistent with the stated distribution. The null hypothesis is that the colors follow the 20/20/20/10/10/20 percent split. The observed counts are illustrative, since the question does not supply them.
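To make the test statistic less of a black box, the sketch below computes it by hand as the sum of (observed - expected)² / expected, with expected counts derived from the stated proportions. The observed counts are hypothetical, since the question does not provide any.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical counts; the question does not supply observed data
observed = np.array([35, 45, 20, 10, 15, 25])  # Blue, Orange, Green, Yellow, Red, Brown
proportions = np.array([0.20, 0.20, 0.20, 0.10, 0.10, 0.20])

# Expected counts scale the stated proportions by the total number of candies
expected = proportions * observed.sum()

# Pearson chi-square statistic and its upper-tail probability
chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = len(observed) - 1
p_value = chi2.sf(chi2_stat, dof)

print("Expected counts:", expected)
print("Chi-square statistic:", round(chi2_stat, 4))
print("Degrees of freedom:", dof)
print("P-value:", p_value)
```

The degrees of freedom are the number of categories minus one because the expected counts are constrained to sum to the observed total.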


### Q3: Use Python to calculate the chi-square statistic and p-value for a contingency table with the following data:

```
Outcome 1: Group A - 20, Group B - 15
Outcome 2: Group A - 10, Group B - 25
Outcome 3: Group A - 15, Group B - 20
```

Interpret the results of the test.

Here's how you can calculate the chi-square statistic and p-value for the given contingency table using Python:

```python
from scipy.stats import chi2_contingency

# Contingency table data
observed = [[20, 15], [10, 25], [15, 20]]  # Outcome 1, Outcome 2, Outcome 3

# Perform chi-square test for independence
chi2_stat, p_value, _, _ = chi2_contingency(observed)

# Print the results
print("Chi-square Statistic:", chi2_stat)
print("P-value:", p_value)

# Interpretation of results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant association between outcome and group.")
else:
    print("Fail to reject the null hypothesis: There is no significant association between outcome and group.")
```

The chi-square test for independence is used to determine if there is a significant association between two categorical variables. In this case, the contingency table cross-tabulates outcome (1, 2, or 3) against group (A or B), and the null hypothesis is that outcome and group are independent. For these counts the statistic is about 5.83 with 2 degrees of freedom (p ≈ 0.054), so the null hypothesis is not rejected at the 0.05 level.
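The two values discarded with `_` in the call above are the degrees of freedom and the table of expected counts under independence. A short sketch that keeps them, using the same table, might look like this:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: Outcome 1-3; columns: Group A, Group B
observed = [[20, 15], [10, 25], [15, 20]]

chi2_stat, p_value, dof, expected = chi2_contingency(observed)

print("Chi-square statistic:", round(chi2_stat, 4))
print("Degrees of freedom:", dof)  # (rows - 1) * (columns - 1)
print("Expected counts under independence:")
print(np.round(expected, 2))
```

Inspecting the expected counts is also a quick way to check the usual rule of thumb that each expected cell count should be at least about 5.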

### Q4: A study of the prevalence of smoking in a population of 500 individuals found that 60 individuals smoked. Use Python to calculate the 95% confidence interval for the true proportion of individuals in the population who smoke.

Here's how you can calculate the confidence interval for the proportion of smokers in Python:

```python
from statsmodels.stats.proportion import proportion_confint

# Total population and number of smokers
population_size = 500
smokers_count = 60

# Calculate the confidence interval
ci = proportion_confint(smokers_count, population_size, alpha=0.05, method='normal')

print("95% Confidence Interval for Proportion of Smokers:", ci)
```

For these data the normal-approximation interval is roughly (0.092, 0.148): we can be 95% confident that the true proportion of smokers in the population lies between about 9.2% and 14.8%.
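For readers without `statsmodels`, the normal-approximation interval can be sketched directly from the formula p̂ ± z·sqrt(p̂(1 - p̂)/n). The variable names here are illustrative.

```python
from scipy.stats import norm

n = 500        # number of individuals in the study
smokers = 60   # number who smoke

p_hat = smokers / n
standard_error = (p_hat * (1 - p_hat) / n) ** 0.5

# Two-sided 95% interval uses the 97.5th percentile of the standard normal
z = norm.ppf(0.975)
lower = p_hat - z * standard_error
upper = p_hat + z * standard_error

print(f"95% CI for proportion of smokers: ({lower:.4f}, {upper:.4f})")
```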

### Q5: Calculate the 90% confidence interval for a sample of data with a mean of 75 and a standard deviation of 12 using Python. Interpret the results.

Here's how you can calculate the 90% confidence interval for the given sample data in Python:

```python
from scipy.stats import norm

# Sample data
sample_mean = 75
sample_std = 12
sample_size = 100  # assumed; the question does not state a sample size

# Calculate the standard error
standard_error = sample_std / (sample_size ** 0.5)

# Calculate the margin of error for a 90% confidence interval
margin_of_error = norm.ppf(0.95) * standard_error

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

print("90% Confidence Interval:", confidence_interval)
```

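Interpretation: With the assumed sample size of 100, the standard error is 1.2 and the interval is approximately (73.03, 76.97). We are 90% confident that the true population mean lies in this range.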
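Because the standard deviation here comes from a sample, a t-based interval is a common alternative to the normal-based one. The sketch below compares the two under the same assumed sample size of 100 (not given in the question); with a sample this large the difference is small.

```python
from scipy import stats

sample_mean = 75
sample_std = 12
sample_size = 100  # assumption: not stated in the question

standard_error = sample_std / sample_size ** 0.5

# Normal-based and t-based 90% intervals (confidence level passed positionally)
z_lower, z_upper = stats.norm.interval(0.90, loc=sample_mean, scale=standard_error)
t_lower, t_upper = stats.t.interval(0.90, sample_size - 1, loc=sample_mean, scale=standard_error)

print(f"z-based 90% CI: ({z_lower:.2f}, {z_upper:.2f})")
print(f"t-based 90% CI: ({t_lower:.2f}, {t_upper:.2f})")
```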


### Q6: Use Python to plot the chi-square distribution with 10 degrees of freedom. Label the axes and shade the area corresponding to a chi-square statistic of 15.

Here's how you can plot the chi-square distribution with 10 degrees of freedom and shade the area corresponding to a chi-square statistic of 15 using Python:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

# Define degrees of freedom
df = 10

# Generate x values for the chi-square distribution
x = np.linspace(0, 30, 1000)

# Calculate the chi-square probability density function (PDF)
pdf = chi2.pdf(x, df)

# Plot the chi-square PDF
plt.plot(x, pdf, label=f'Chi-Square DF={df}')

# Shade the area corresponding to chi-square statistic of 15
plt.fill_between(x, pdf, where=(x >= 15), color='skyblue', alpha=0.5, label='Chi-Square Statistic >= 15')

# Label the axes and add a legend
plt.xlabel('Chi-Square Statistic')
plt.ylabel('Probability Density Function')
plt.legend()

# Show the plot
plt.show()
```

This code plots the chi-square density with 10 degrees of freedom and shades the upper-tail region to the right of a chi-square statistic of 15.
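The size of the shaded region is the upper-tail probability P(X ≥ 15), which would be the p-value if 15 were an observed statistic. A minimal sketch using the survival function:

```python
from scipy.stats import chi2

df = 10
statistic = 15

# Survival function = 1 - CDF, i.e. the area to the right of the statistic
shaded_area = chi2.sf(statistic, df)

print(f"P(X >= {statistic}) with {df} degrees of freedom: {shaded_area:.4f}")
```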

### Q7: A random sample of 1000 people was asked if they preferred Coke or Pepsi. Of the sample, 520 preferred Coke. Calculate a 99% confidence interval for the true proportion of people in the population who prefer Coke.

Here's how you can calculate the confidence interval for the proportion of people who prefer Coke using Python:

```python
from statsmodels.stats.proportion import proportion_confint

# Total sample size and number of Coke preferences
total_sample = 1000
coke_preferences = 520

# Calculate the confidence interval
ci_coke = proportion_confint(coke_preferences, total_sample, alpha=0.01, method='normal')

print("99% Confidence Interval for Proportion Preferring Coke:", ci_coke)
```

For these data the interval is roughly (0.479, 0.561). Because it contains 0.5, the sample does not show at the 99% confidence level that a majority of the population prefers Coke.


### Q8: A researcher hypothesizes that a coin is biased towards tails. They flip the coin 100 times and observe 45 tails. Conduct a chi-square goodness of fit test to determine if the observed frequencies match the expected frequencies of a fair coin. Use a significance level of 0.05.

Here's how you can conduct the chi-square goodness of fit test for the coin flipping scenario using Python:

```python
from scipy.stats import chisquare

# Observed frequencies (tails and heads)
observed = [45, 55]  # Tails, Heads

# Expected frequencies for a fair coin
expected = [50, 50]  # Equal chances for tails and heads

# Perform the chi-square goodness of fit test
chi2_stat, p_value = chisquare(observed, f_exp=expected)

# Print the results
print("Chi-square Statistic:", chi2_stat)
print("P-value:", p_value)

# Check if the null hypothesis is rejected based on the significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The observed frequencies do not match the expected frequencies of a fair coin.")
else:
    print("Fail to reject the null hypothesis: The observed frequencies match the expected frequencies of a fair coin.")
```

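This code conducts the chi-square goodness of fit test to determine if the observed frequencies of tails and heads are consistent with a fair coin. The statistic is 1.0 with 1 degree of freedom (p ≈ 0.317), so the null hypothesis of a fair coin is not rejected at the 0.05 level. Note also that 45 tails is fewer than half the flips, so the data give no support to a bias towards tails.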
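With only two categories the statistic is easy to verify by hand. The sketch below reproduces it from the definition and obtains the p-value from the chi-square distribution with one degree of freedom.

```python
from scipy.stats import chi2

observed = [45, 55]  # Tails, Heads
expected = [50, 50]  # Fair coin over 100 flips

# Sum of (observed - expected)^2 / expected over both categories
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1
p_value = chi2.sf(chi2_stat, dof)

print("Chi-square statistic:", chi2_stat)
print("P-value:", round(p_value, 4))
```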

### Q9: A study was conducted to determine if there is an association between smoking status (smoker or non-smoker) and lung cancer diagnosis (yes or no). The results are shown in the contingency table below. Conduct a chi-square test for independence to determine if there is a significant association between smoking status and lung cancer diagnosis.

```
Group A
Outcome 1: Smoker - 60, Non-smoker - 140
Outcome 2: Smoker - 30, Non-smoker - 170
```

Here's how you can conduct the chi-square test for independence for the given contingency table using Python:

```python
from scipy.stats import chi2_contingency

# Contingency table data
observed = [[60, 140], [30, 170]]  # Outcome 1 (Smoker, Non-smoker), Outcome 2 (Smoker, Non-smoker)

# Perform chi-square test for independence
chi2_stat, p_value, _, _ = chi2_contingency(observed)

# Print the results
print("Chi-square Statistic:", chi2_stat)
print("P-value:", p_value)

# Interpretation of results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant association between smoking status and lung cancer diagnosis.")
else:
    print("Fail to reject the null hypothesis: There is no significant association between smoking status and lung cancer diagnosis.")
```

This code performs the chi-square test for independence to determine if there is a significant association between smoking status and lung cancer diagnosis based on the provided contingency table.
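One detail worth knowing for 2x2 tables: `chi2_contingency` applies Yates' continuity correction by default (`correction=True`), which slightly lowers the statistic. The sketch below compares both settings on the same table.

```python
from scipy.stats import chi2_contingency

observed = [[60, 140], [30, 170]]

# Default behaviour: Yates' continuity correction is applied for 2x2 tables
stat_yates, p_yates, dof, _ = chi2_contingency(observed)

# Plain Pearson chi-square without the correction
stat_plain, p_plain, _, _ = chi2_contingency(observed, correction=False)

print(f"With Yates' correction:    chi2 = {stat_yates:.4f}, p = {p_yates:.5f}")
print(f"Without Yates' correction: chi2 = {stat_plain:.4f}, p = {p_plain:.5f}")
```

For this table both versions lead to the same decision at the 0.05 level; the choice matters more when counts are small or the p-value is near the threshold.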


### Q10: A study was conducted to determine if the proportion of people who prefer milk chocolate, dark chocolate, or white chocolate is different in the U.S. versus the U.K. A random sample of 500 people from the U.S. and a random sample of 500 people from the U.K. were surveyed. The results are shown in the contingency table below. Conduct a chi-square test for independence to determine if there is a significant association between chocolate preference and country of origin.

```
Chocolate Preference:
- Milk Chocolate: US - 150, UK - 200
- Dark Chocolate: US - 100, UK - 150
- White Chocolate: US - 250, UK - 150
```

Here's how you can conduct the chi-square test for independence for the chocolate preference and country of origin using Python:

```python
from scipy.stats import chi2_contingency

# Contingency table data
observed = [[150, 200], [100, 150], [250, 150]]  # Milk, Dark, White Chocolate preferences for US and UK

# Perform chi-square test for independence
chi2_stat, p_value, _, _ = chi2_contingency(observed)

# Print the results
print("Chi-square Statistic:", chi2_stat)
print("P-value:", p_value)

# Interpretation of results
alpha = 0.01
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant association between chocolate preference and country of origin.")
else:
    print("Fail to reject the null hypothesis: There is no significant association between chocolate preference and country of origin.")
```

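This code performs the chi-square test for independence to determine if there is a significant association between chocolate preference (milk, dark, white) and country of origin (US, UK) based on the provided contingency table. For these counts the statistic is about 42.14 with 2 degrees of freedom, and the p-value is far below 0.01, so the null hypothesis of independence is rejected.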

### Q11: A random sample of 30 people was selected from a population with an unknown mean and standard deviation. The sample mean was found to be 72 and the sample standard deviation was found to be 10. Conduct a hypothesis test to determine if the population mean is significantly different from 70. Use a significance level of 0.05.


To conduct a hypothesis test to determine if the population mean is significantly different from 70 given a random sample of 30 people with a sample mean of 72 and a sample standard deviation of 10, we can perform a one-sample t-test. Here's how you can do it in Python:

```python
from scipy.stats import t

# Sample data
sample_mean = 72
sample_std = 10
sample_size = 30
population_mean = 70

# Calculate the t-statistic
t_stat = (sample_mean - population_mean) / (sample_std / (sample_size ** 0.5))

# Degrees of freedom
df = sample_size - 1

# Calculate the critical t-value for a two-tailed test at alpha = 0.05
alpha = 0.05
critical_t = t.ppf(1 - alpha/2, df)

# Calculate the p-value
p_value = 2 * (1 - t.cdf(abs(t_stat), df))

# Print the results
print("t-statistic:", t_stat)
print("Critical t-value:", critical_t)
print("P-value:", p_value)

# Interpretation of results
if abs(t_stat) > critical_t:
    print("Reject the null hypothesis: The population mean is significantly different from 70.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to conclude that the population mean is significantly different from 70.")
```

In this code:

- We calculate the t-statistic using the formula \(\frac{{\bar{x} - \mu}}{{s/\sqrt{n}}}\), where \(\bar{x}\) is the sample mean, \(s\) is the sample standard deviation, \(n\) is the sample size, and \(\mu\) is the population mean we are testing against (70 in this case).
- We determine the degrees of freedom (\(df\)) which is equal to \(n - 1\).
- We calculate the critical t-value for a two-tailed test at a significance level of 0.05 using the inverse cumulative distribution function (ppf) of the t-distribution.
- We calculate the two-tailed p-value using the cumulative distribution function (cdf) of the t-distribution. Comparing it with the significance level (alpha) gives the same decision as comparing the absolute t-statistic with the critical value.

For these data the t-statistic is about 1.095, the critical t-value with 29 degrees of freedom is about 2.045, and the p-value is about 0.28. Because the absolute t-statistic is smaller than the critical value, we fail to reject the null hypothesis: there is not enough evidence to conclude that the population mean differs from 70.
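The decision can equally be made from the p-value. The sketch below recomputes the test with the survival function `t.sf`, which is a slightly more numerically stable way to get a tail probability than `1 - t.cdf(...)`, and compares the p-value with alpha directly.

```python
from scipy.stats import t

sample_mean = 72
sample_std = 10
sample_size = 30
hypothesized_mean = 70
alpha = 0.05

# One-sample t-statistic from summary statistics
t_stat = (sample_mean - hypothesized_mean) / (sample_std / sample_size ** 0.5)
df = sample_size - 1

# Two-tailed p-value: twice the upper-tail area beyond |t|
p_value = 2 * t.sf(abs(t_stat), df)
reject = p_value < alpha

print(f"t-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```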