

Q1. To calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5, you can use the following formula:

**Confidence Interval = x +/- t*(s/√n)**

where:

- x: sample mean
- t: t-value that corresponds to the confidence level
- s: sample standard deviation
- n: sample size

You can use the **scipy.stats** module to find the t-value for a given confidence level and degrees of freedom. For example, for a 95% confidence level and n-1 degrees of freedom, you can use:

```python
from scipy import stats
t = stats.t.ppf(0.975, n-1)
```

Then, you can plug in the values into the formula to get the confidence interval. For example, for a sample size of 20, you can use:

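```python
import numpy as np
from scipy import stats

x = 50  # sample mean
s = 5   # sample standard deviation
n = 20  # sample size
t = stats.t.ppf(0.975, n-1)  # two-sided 95% critical value
lower = x - t * (s / np.sqrt(n))
upper = x + t * (s / np.sqrt(n))
print(f"The 95% confidence interval is ({lower:.2f}, {upper:.2f})")
```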

The output is:

```
The 95% confidence interval is (47.66, 52.34)
```

This means that we are 95% confident that the true population mean is between 47.66 and 52.34.
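
As a cross-check, the same interval can be wrapped in a small helper and compared against `scipy.stats.t.interval`. This is only a sketch: the function name `t_confidence_interval` is hypothetical and not part of SciPy, and it assumes the summary statistics used above (mean 50, standard deviation 5, n = 20).

```python
import math
from scipy import stats

def t_confidence_interval(mean, sd, n, confidence=0.95):
    """Two-sided t interval for a mean, built from summary statistics."""
    # Upper-tail quantile: 0.975 for a 95% interval
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

lower, upper = t_confidence_interval(50, 5, 20)

# SciPy's built-in interval, using the standard error as the scale
check = stats.t.interval(0.95, 20 - 1, loc=50, scale=5 / math.sqrt(20))

print(f"manual: ({lower:.2f}, {upper:.2f})")
print(f"scipy:  ({check[0]:.2f}, {check[1]:.2f})")
```

Both lines should report the same bounds, since `t.interval` applies the same formula internally.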


Q2. To conduct a chi-square goodness of fit test to determine if the distribution of colors of M&Ms in a bag matches the expected distribution of 20% blue, 20% orange, 20% green, 10% yellow, 10% red, and 20% brown, you can use the following steps:

- Count the number of M&Ms of each color in the bag and store them in an array. For example, if you have 100 M&Ms and you observe 18 blue, 22 orange, 19 green, 11 yellow, 9 red, and 21 brown, you can use:

```python
import numpy as np
observed = np.array([18, 22, 19, 11, 9, 21])
```

- Calculate the expected frequencies of each color based on the expected proportions and store them in another array. For example, if you have 100 M&Ms and you expect 20% blue, 20% orange, 20% green, 10% yellow, 10% red, and 20% brown, you can use:

```python
expected = np.array([0.2, 0.2, 0.2, 0.1, 0.1, 0.2]) * 100
```

- Use the **scipy.stats.chisquare** function to perform the chi-square test and get the chi-square statistic and p-value. For example:

```python
from scipy import stats
chi2_stat, p_value = stats.chisquare(observed, expected)
print(f"The chi-square statistic is {chi2_stat:.2f} and the p-value is {p_value:.4f}")
```

The output is:

```
The chi-square statistic is 0.70 and the p-value is 0.9830
```

- Compare the p-value with the significance level (alpha) to draw a conclusion. For example, if alpha is 0.05:

```python
alpha = 0.05
if p_value < alpha:
    print("We reject the null hypothesis that the observed frequencies match the expected frequencies.")
else:
    print("We fail to reject the null hypothesis that the observed frequencies match the expected frequencies.")
```

The output is:

```
We fail to reject the null hypothesis that the observed frequencies match the expected frequencies.
```

This means that we do not have enough evidence to conclude that the distribution of colors of M&Ms in a bag differs from the expected distribution.
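
To see what `stats.chisquare` is doing, the statistic can also be computed by hand as the sum of (observed - expected)² / expected and compared against the critical value of the chi-square distribution. The sketch below reuses the example counts from the steps above; the variable names `chi2_manual`, `critical`, and `p_manual` are illustrative only.

```python
import numpy as np
from scipy import stats

observed = np.array([18, 22, 19, 11, 9, 21])
expected = np.array([0.2, 0.2, 0.2, 0.1, 0.1, 0.2]) * observed.sum()

# Chi-square statistic: sum over categories of (O - E)^2 / E
chi2_manual = np.sum((observed - expected) ** 2 / expected)

# Degrees of freedom: number of categories minus one
df = len(observed) - 1

# Critical value at alpha = 0.05 and the upper-tail p-value
critical = stats.chi2.ppf(0.95, df)
p_manual = stats.chi2.sf(chi2_manual, df)

print(f"statistic = {chi2_manual:.2f}, critical value = {critical:.2f}, p = {p_manual:.4f}")
print("reject H0" if chi2_manual > critical else "fail to reject H0")
```

Comparing the statistic with the critical value and comparing the p-value with alpha are equivalent decisions, so both approaches should lead to the same conclusion.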

Q4. To calculate the 95% confidence interval for the true proportion of individuals in the population who smoke, you can use the following Python code:

```python
import statsmodels.api as sm
n = 500 # sample size
p = 60 / n # sample proportion
ci = sm.stats.proportion_confint(p * n, n, alpha=0.05) # confidence interval
print(ci)
```

This will print out the lower and upper bounds of the confidence interval as a tuple, which are approximately (0.09, 0.15). This means that we are 95% confident that the true proportion of individuals in the population who smoke is between 9% and 15%.
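
By default, `proportion_confint` uses the normal approximation (`method='normal'`). The sketch below reproduces that calculation by hand with `scipy.stats.norm`, assuming the same figures as the code above (60 out of 500); the variable names are illustrative.

```python
import math
from scipy import stats

count, n = 60, 500
p_hat = count / n                          # sample proportion
z = stats.norm.ppf(0.975)                  # two-sided 95% critical value
se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of the proportion

lower, upper = p_hat - z * se, p_hat + z * se
print(f"({lower:.4f}, {upper:.4f})")
```

The bounds should agree with the statsmodels result up to floating-point error.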



Q5. To calculate the 90% confidence interval for a sample of data with a mean of 75 and a standard deviation of 12 using Python, you can use the following code:

```python
import scipy.stats as st
mean = 75 # sample mean
sd = 12 # sample standard deviation
n = len(data) # sample size; `data` is the sample itself and must already be defined
se = sd / (n ** 0.5) # standard error of the mean
ci = st.t.interval(0.9, n - 1, loc=mean, scale=se) # 90% t confidence interval
print(ci)
```

This will print out the lower and upper bounds of the confidence interval as a tuple. The bounds depend on the sample size `n`; for the sample used here they are approximately (72.6, 77.4). This means that we are 90% confident that the true mean of the population is between 72.6 and 77.4.

To interpret the results, we can say that if we repeated this sampling process many times, about 90% of the confidence intervals we obtain would contain the true population mean.
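
Because the interval width depends on the sample size, it can help to see how the bounds move as `n` changes. The sample sizes below (10, 30, 100) are hypothetical values chosen purely for illustration; they are not taken from the question.

```python
import scipy.stats as st

mean, sd = 75, 12

intervals = {}
for n in (10, 30, 100):  # hypothetical sample sizes, for illustration only
    se = sd / n ** 0.5                                   # standard error
    intervals[n] = st.t.interval(0.9, n - 1, loc=mean, scale=se)
    lower, upper = intervals[n]
    print(f"n={n:>3}: ({lower:.2f}, {upper:.2f})")
```

Larger samples give a smaller standard error and a smaller t critical value, so the interval narrows around the sample mean.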

Q6. To plot the chi-square distribution with 10 degrees of freedom using Python, you can use the following code:

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
x = np.linspace(0, 30, 100) # range of x values
y = st.chi2.pdf(x, 10) # chi-square probability density function
plt.plot(x, y) # plot the curve
plt.xlabel('Chi-square statistic') # label the x-axis
plt.ylabel('Probability density') # label the y-axis
plt.fill_between(x, y, where=x >= 15, color='red', alpha=0.5) # shade the area where x >= 15
plt.show() # show the plot
```
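
The shaded region corresponds to the probability that a chi-square variable with 10 degrees of freedom is at least 15. As a small sketch, that area can be computed with the survival function, alongside the 95th percentile for comparison; the variable names are illustrative.

```python
import scipy.stats as st

df = 10

# Area under the density to the right of 15 (the shaded region)
tail_area = st.chi2.sf(15, df)

# Value that leaves 5% of the distribution in the upper tail
critical_95 = st.chi2.ppf(0.95, df)

print(f"P(X >= 15) = {tail_area:.4f}")
print(f"95th percentile = {critical_95:.3f}")
```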

Q7. You can use the following formula to calculate a confidence interval for a population proportion:

**Confidence Interval = p +/- z*√(p(1-p)/n)**

where:

- p: sample proportion
- z: the chosen z-value
- n: sample size

The z-value that you will use is dependent on the confidence level that you choose. For a 99% confidence level, the z-value is 2.576.

You can plug in the values from the question into the formula and get the answer.
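
The formula can be sketched in Python as follows. The helper `proportion_ci` is a hypothetical name, and the inputs in the usage line (a sample proportion of 0.5 from 1000 observations) are placeholder values for illustration only; substitute the figures from the question.

```python
import math
from scipy import stats

def proportion_ci(p_hat, n, confidence=0.99):
    """Normal-approximation confidence interval for a population proportion."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)      # about 2.576 for 99%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)   # z * standard error
    return p_hat - margin, p_hat + margin

z_99 = stats.norm.ppf(0.995)
print(f"z for 99% confidence: {z_99:.3f}")

# Placeholder inputs, not taken from the question
lower, upper = proportion_ci(0.5, 1000)
print(f"({lower:.4f}, {upper:.4f})")
```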

