Q1. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5 using Python. Interpret the results.

In [None]:
You can calculate the 95% confidence interval for a sample mean in Python with the help of the `scipy` library. The question does not give a sample size, so the example assumes one of 100:

from scipy.stats import norm
import numpy as np

# Sample data
sample_mean = 50
sample_std_dev = 5
sample_size = 100  # Replace this with your sample size

# Calculate standard error
standard_error = sample_std_dev / np.sqrt(sample_size)

# Define the confidence level
confidence_level = 0.95

# Calculate the margin of error
margin_of_error = norm.ppf((1 + confidence_level) / 2) * standard_error

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"95% Confidence Interval: ({lower_bound}, {upper_bound})")

The question does not specify a sample size, so `sample_size = 100` is a placeholder; replace it with the actual size of your sample. With a mean of 50, a standard deviation of 5, and a sample size of 100, the standard error is 0.5 and the 95% confidence interval is approximately (49.02, 50.98).

Interpreting the results: We can be 95% confident that the true population mean lies between the lower and upper bounds. More precisely, if you were to draw many samples and compute a confidence interval from each, approximately 95% of those intervals would contain the true population mean.
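
Because the standard deviation here is estimated from the sample, a t-based interval is a common alternative to the normal approximation, especially for small samples. The following is a minimal sketch, assuming the same hypothetical sample size of 100 (the question does not state one):

```python
from math import sqrt
from scipy.stats import t

sample_mean = 50
sample_std_dev = 5
sample_size = 100  # assumption: the question does not give a sample size

standard_error = sample_std_dev / sqrt(sample_size)
degrees_freedom = sample_size - 1

# t.interval returns the (lower, upper) bounds of the central interval
t_lower, t_upper = t.interval(0.95, degrees_freedom, loc=sample_mean, scale=standard_error)

print(f"95% t-based Confidence Interval: ({t_lower:.3f}, {t_upper:.3f})")
```

With 99 degrees of freedom the t-based interval is only slightly wider than the normal one; the gap grows as the sample gets smaller.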

Q2. Conduct a chi-square goodness of fit test to determine if the distribution of colors of M&Ms in a bag
matches the expected distribution of 20% blue, 20% orange, 20% green, 10% yellow, 10% red, and 20%
brown. Use Python to perform the test with a significance level of 0.05.

In [None]:
You can conduct a chi-square goodness-of-fit test in Python using the `chisquare` function from `scipy.stats`, which compares observed counts against expected counts. Here's an example:

from scipy.stats import chisquare

# Observed frequencies (actual counts from the bag of M&Ms), in the order
# blue, orange, green, yellow, red, brown
observed_freq = [18, 22, 15, 12, 11, 22]  # Replace this with your observed frequencies

# Expected frequencies: the expected proportions scaled to the observed total
expected_props = [0.20, 0.20, 0.20, 0.10, 0.10, 0.20]
total = sum(observed_freq)
expected_freq = [p * total for p in expected_props]

# Perform chi-square goodness-of-fit test
chi2_stat, p_val = chisquare(f_obs=observed_freq, f_exp=expected_freq)

print(f"Chi-square Statistic: {chi2_stat}")
print(f"P-value: {p_val}")

# Set the significance level
alpha = 0.05

# Compare p-value with significance level
if p_val < alpha:
    print("Reject the null hypothesis: The observed distribution differs significantly from the expected distribution.")
else:
    print("Fail to reject the null hypothesis: The observed distribution is consistent with the expected distribution.")

Replace `observed_freq` with the actual counts of each color observed in your bag of M&Ms; the counts shown are placeholders, since the question does not supply data. `chisquare` requires the observed and expected counts to have the same total, which is why the expected proportions are scaled by the number of M&Ms observed. (`chi2_contingency` is meant for tests of independence on contingency tables, not for goodness of fit.)

The test determines whether the observed distribution differs significantly from the expected distribution at a 5% significance level (alpha = 0.05). With the placeholder counts, the statistic is 2.35 with 5 degrees of freedom and the p-value is about 0.80, so the test fails to reject the null hypothesis.
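
The statistic behind a goodness-of-fit test is the sum of (observed - expected)^2 / expected over the categories, referred to a chi-square distribution with (number of categories - 1) degrees of freedom. Here is a minimal sketch that computes it by hand, using the same placeholder counts as above (illustrative values, not real M&M data):

```python
import numpy as np
from scipy.stats import chi2

# Placeholder counts in the order blue, orange, green, yellow, red, brown
observed = np.array([18, 22, 15, 12, 11, 22])

# Expected counts: the stated proportions scaled to the observed total
expected = np.array([0.20, 0.20, 0.20, 0.10, 0.10, 0.20]) * observed.sum()

# Sum of squared deviations, each scaled by its expected count
manual_stat = np.sum((observed - expected) ** 2 / expected)
manual_dof = len(observed) - 1

# Upper-tail probability of the chi-square distribution
manual_p = chi2.sf(manual_stat, manual_dof)

print(f"Chi-square Statistic: {manual_stat:.4f}")
print(f"Degrees of freedom: {manual_dof}")
print(f"P-value: {manual_p:.4f}")
```

Computing the statistic manually is a useful cross-check on the library result and makes the role of each category's deviation explicit.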

Q3. Use Python to calculate the chi-square statistic and p-value for a contingency table with the following
data:
              Group A   Group B

Outcome 1       20        15
Outcome 2       10        25
Outcome 3       15        20

In [None]:
You can use Python, particularly the `scipy` library, to compute the chi-square statistic and p-value for the given contingency table of outcomes by group. Here's an example:

from scipy.stats import chi2_contingency

# Contingency table data
observed = [[20, 15],
            [10, 25],
            [15, 20]]

# Perform chi-square test for independence
chi2_stat, p_val, dof, expected = chi2_contingency(observed)

print(f"Chi-square Statistic: {chi2_stat}")
print(f"P-value: {p_val}")

The `chi2_contingency` function from `scipy.stats` is used to compute the chi-square statistic, p-value, degrees of freedom, and expected frequencies based on the observed contingency table.

Interpreting the results:
- The chi-square statistic measures how far the observed counts deviate from the counts expected if group and outcome were independent. Larger values indicate a larger deviation, but the statistic also grows with sample size, so it is not by itself a measure of the strength of association.
- The p-value is the probability of seeing a statistic at least this large if the null hypothesis of independence were true. If the p-value is less than a chosen significance level (e.g., 0.05), there is evidence to reject the null hypothesis and conclude that outcome is associated with group. If the p-value is greater, there isn't sufficient evidence to reject the null hypothesis of independence.
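
To make the comparison concrete, the sketch below also reports the degrees of freedom and the expected counts, and applies a decision rule. The significance level of 0.05 is an assumption, since the question does not state one:

```python
from scipy.stats import chi2_contingency

# Rows are outcomes 1-3; columns are Group A and Group B
observed = [[20, 15],
            [10, 25],
            [15, 20]]

chi2_stat, p_val, dof, expected = chi2_contingency(observed)

alpha = 0.05  # assumption: not specified in the question

print(f"Chi-square Statistic: {chi2_stat:.4f}")
print(f"P-value: {p_val:.4f}")
print(f"Degrees of freedom: {dof}")
print("Expected counts under independence:")
print(expected)

# A common rule of thumb is that every expected count should be at least 5
min_expected = expected.min()
print(f"Smallest expected count: {min_expected}")

reject_null = p_val < alpha
if reject_null:
    print("Reject the null hypothesis of independence.")
else:
    print("Fail to reject the null hypothesis of independence.")
```

For this table the p-value lands just above 0.05, so the conclusion is sensitive to the chosen significance level.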

Q4. A study of the prevalence of smoking in a population of 500 individuals found that 60 individuals
smoked. Use Python to calculate the 95% confidence interval for the true proportion of individuals in the
population who smoke.

In [None]:
To calculate the 95% confidence interval for the true proportion of individuals in the population who smoke, you can use Python's statsmodels library. Here's an example:

from statsmodels.stats.proportion import proportion_confint

# Sample size and number of smokers in the sample
sample_size = 500
smokers = 60

# Calculate the confidence interval (normal approximation)
conf_interval = proportion_confint(smokers, sample_size, alpha=0.05, method='normal')

print(f"95% Confidence Interval for Smoking Rate: {conf_interval}")

This code uses the `proportion_confint` function from the `statsmodels.stats.proportion` module to compute the confidence interval for the proportion of individuals who smoke in the population. Replace the `sample_size` and `smokers` variables with your actual sample size and number of individuals who smoke, respectively.

The result is a tuple containing the lower and upper bounds of the 95% confidence interval for the true proportion of smokers. For these data the sample proportion is 60/500 = 0.12 and the output is approximately (0.092, 0.148), indicating that the estimated proportion of smokers in the population falls between 9.2% and 14.8% with 95% confidence.
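
The normal-approximation interval is the sample proportion plus or minus z times the standard error sqrt(p(1 - p) / n). Here is a minimal sketch of that formula written out by hand, which should agree with the `method='normal'` result; the variable names are illustrative:

```python
from math import sqrt
from scipy.stats import norm

n = 500          # number of individuals in the study
smokers = 60     # number who smoked

p_hat = smokers / n                  # sample proportion
z = norm.ppf(1 - 0.05 / 2)           # two-sided critical value for 95%
std_err = sqrt(p_hat * (1 - p_hat) / n)

wald_lower = p_hat - z * std_err
wald_upper = p_hat + z * std_err

print(f"Sample proportion: {p_hat:.3f}")
print(f"95% Confidence Interval: ({wald_lower:.4f}, {wald_upper:.4f})")
```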

Q5. Calculate the 90% confidence interval for a sample of data with a mean of 75 and a standard deviation
of 12 using Python. Interpret the results.

In [None]:
You can calculate the 90% confidence interval for a sample mean using Python. Here's an example using the `scipy` library, assuming a sample size of 100 because the question does not give one:

from scipy.stats import norm

# Sample data
sample_mean = 75
sample_std_dev = 12
sample_size = 100  # Replace this with your sample size

# Calculate standard error
standard_error = sample_std_dev / (sample_size ** 0.5)

# Define the confidence level
confidence_level = 0.90

# Calculate the margin of error
margin_of_error = norm.ppf((1 + confidence_level) / 2) * standard_error

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"90% Confidence Interval: ({lower_bound}, {upper_bound})")

The question does not specify a sample size, so `sample_size = 100` is a placeholder; replace it with the actual size of your sample. With a mean of 75, a standard deviation of 12, and a sample size of 100, the standard error is 1.2 and the 90% confidence interval is approximately (73.03, 76.97).

Interpreting the results: We can be 90% confident that the true population mean lies between the lower and upper bounds. More precisely, if you were to draw many samples and compute a confidence interval from each, approximately 90% of those intervals would contain the true population mean.
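
Since the result depends on the assumed sample size, it can help to see how the interval changes with it. The sketch below wraps the calculation in a small helper; `mean_confidence_interval` is a hypothetical name introduced here for illustration, and the sample sizes are arbitrary:

```python
from math import sqrt
from scipy.stats import norm

def mean_confidence_interval(mean, std_dev, n, confidence):
    """Normal-approximation confidence interval for a mean (illustrative helper)."""
    z = norm.ppf((1 + confidence) / 2)
    margin = z * std_dev / sqrt(n)
    return mean - margin, mean + margin

# Arbitrary sample sizes, chosen only to show the trend
for n in (25, 100, 400):
    lower, upper = mean_confidence_interval(75, 12, n, 0.90)
    print(f"n={n}: ({lower:.3f}, {upper:.3f}), width={upper - lower:.3f}")
```

Quadrupling the sample size halves the width of the interval, because the standard error shrinks with the square root of n.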

Q6. Use Python to plot the chi-square distribution with 10 degrees of freedom. Label the axes and shade the
area corresponding to a chi-square statistic of 15.

In [None]:
You can plot the chi-square distribution with 10 degrees of freedom using `scipy.stats.chi2` for the density and Python's `matplotlib` library for the plot. Here's an example:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

# Degrees of freedom
df = 10

# Create an array of x values
x = np.linspace(0, 30, 1000)

# Calculate the chi-square probability density function (PDF) for the given degrees of freedom
pdf = chi2.pdf(x, df)

# Plot the chi-square distribution
plt.plot(x, pdf, label=f'Chi-square Distribution (df={df})')

# Shade the area under the curve up to a chi-square statistic of 15
x_fill = np.linspace(0, 15, 1000)
pdf_fill = chi2.pdf(x_fill, df)
plt.fill_between(x_fill, pdf_fill, color='skyblue', alpha=0.5, label='Area for statistic <= 15')

# Label axes and add a legend
plt.xlabel('Chi-square Statistic')
plt.ylabel('Probability Density Function')
plt.title('Chi-square Distribution and Statistic')
plt.legend()
plt.grid(True)
plt.show()

This code creates a plot of the chi-square distribution with 10 degrees of freedom and shades the area under the curve from 0 to a chi-square statistic of 15. The shaded area is the cumulative probability of observing a statistic of 15 or less; the unshaded right tail is the corresponding p-value. Adjust the values or parameters as needed for your specific requirements.
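
The size of the shaded region can also be computed directly. A short sketch using the cumulative distribution function and the survival function of `scipy.stats.chi2`:

```python
from scipy.stats import chi2

df = 10
statistic = 15

# Area under the density to the left of the statistic
area_left = chi2.cdf(statistic, df)

# Area to the right of the statistic (upper-tail probability)
area_right = chi2.sf(statistic, df)

print(f"P(X <= {statistic}) = {area_left:.4f}")
print(f"P(X >  {statistic}) = {area_right:.4f}")
```

The two areas sum to 1, so shading either side conveys the same information.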

Q7. A random sample of 1000 people was asked if they preferred Coke or Pepsi. Of the sample, 520
preferred Coke. Calculate a 99% confidence interval for the true proportion of people in the population who
prefer Coke.

In [None]:
To compute the 99% confidence interval for the true proportion of people in the population who prefer Coke based on the sample data, you can use Python's statsmodels library. Here's an example:

from statsmodels.stats.proportion import proportion_confint

# Total sample size and number of individuals who prefer Coke
total_sample = 1000
coke_preference = 520

# Calculate the confidence interval
conf_interval = proportion_confint(coke_preference, total_sample, alpha=0.01, method='normal')

print(f"99% Confidence Interval for Coke Preference: {conf_interval}")

This code utilizes the `proportion_confint` function from the `statsmodels.stats.proportion` module to compute the confidence interval for the proportion of people in the population who prefer Coke. Replace `total_sample` and `coke_preference` with the actual total sample size and number of individuals who prefer Coke, respectively.

The output is a tuple containing the lower and upper bounds of the 99% confidence interval for the true proportion of people in the population who prefer Coke. For these data the sample proportion is 0.52 and the result is approximately (0.479, 0.561), indicating that the estimated proportion of people who prefer Coke in the population falls between 47.9% and 56.1% with 99% confidence.

Q8. A researcher hypothesizes that a coin is biased towards tails. They flip the coin 100 times and observe
45 tails. Conduct a chi-square goodness of fit test to determine if the observed frequencies match the
expected frequencies of a fair coin. Use a significance level of 0.05.

In [None]:
To conduct a chi-square goodness of fit test in Python to determine if the observed frequencies match the expected frequencies of a fair coin, you can use the `chisquare` function from the `scipy` library. Here's an example:

from scipy.stats import chisquare

# Observed frequencies
observed_tails = 45
total_flips = 100

# Expected frequencies for a fair coin (50% heads, 50% tails)
expected_heads = total_flips / 2
expected_tails = total_flips / 2

# Create the observed and expected frequency arrays (tails first, then heads)
observed = [observed_tails, total_flips - observed_tails]
expected = [expected_tails, expected_heads]

# Perform chi-square goodness of fit test
chi2_stat, p_val = chisquare(f_obs=observed, f_exp=expected)

print(f"Chi-square Statistic: {chi2_stat}")
print(f"P-value: {p_val}")

This code uses the `chisquare` function from `scipy.stats` to perform the chi-square goodness of fit test. It compares the observed frequencies (45 tails and 55 heads in 100 coin flips) against the expected frequencies for a fair coin (50% heads, 50% tails). (`chi2_contingency` tests independence in a contingency table and would treat the expected counts as a second sample, so it is not suitable here.)

The `chi2_stat` variable holds the chi-square statistic, and `p_val` contains the p-value resulting from the test. For these data the statistic is 1.0 with 1 degree of freedom and the p-value is about 0.317.

Interpretation:
- If the p-value is less than the significance level of 0.05, it suggests that there is evidence to reject the null hypothesis of the coin being fair.
- If the p-value is greater than 0.05, it indicates that there isn't sufficient evidence to reject the null hypothesis, suggesting that the observed frequencies are consistent with what would be expected from a fair coin. That is the case here: 0.317 exceeds 0.05, so we fail to reject the null hypothesis. Note also that the chi-square test is non-directional, and 45 tails out of 100 is fewer than a fair coin would be expected to produce, so the data give no support for a bias towards tails.
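
The chi-square test is non-directional, while the researcher's hypothesis (a bias towards tails) is one-sided. As a swapped-in alternative that is not part of the original exercise, an exact one-sided binomial test addresses the directional question directly. This sketch assumes SciPy 1.7 or later, where `scipy.stats.binomtest` is available:

```python
from scipy.stats import binomtest

observed_tails = 45
total_flips = 100

# H0: P(tails) = 0.5; H1: P(tails) > 0.5 (bias towards tails)
result = binomtest(observed_tails, n=total_flips, p=0.5, alternative='greater')
one_sided_p = result.pvalue

print(f"One-sided p-value: {one_sided_p:.4f}")

alpha = 0.05
if one_sided_p < alpha:
    print("Reject the null hypothesis: evidence of a bias towards tails.")
else:
    print("Fail to reject the null hypothesis: no evidence of a bias towards tails.")
```

Since fewer than half of the flips were tails, the one-sided p-value is large and the test gives no evidence for the researcher's hypothesis.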

Q9. A study was conducted to determine if there is an association between smoking status (smoker or
non-smoker) and lung cancer diagnosis (yes or no). The results are shown in the contingency table below.
Conduct a chi-square test for independence to determine if there is a significant association between
smoking status and lung cancer diagnosis.

              Lung Cancer: Yes   Lung Cancer: No

Smoker            60                140
Non-smoker        30                170

Use a significance level of 0.05.


In [None]:
To conduct a chi-square test for independence in Python using the given contingency table, you can use the `scipy.stats` module. Here's an example:

from scipy.stats import chi2_contingency

# Create the contingency table
observed = [[60, 140], [30, 170]]

# Perform chi-square test for independence
chi2_stat, p_val, _, _ = chi2_contingency(observed)

print(f"Chi-square Statistic: {chi2_stat}")
print(f"P-value: {p_val}")

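This code uses the `chi2_contingency` function from `scipy.stats` to perform the chi-square test for independence on the provided contingency table. For a 2x2 table such as this one, the function applies Yates' continuity correction by default (`correction=True`).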

The `chi2_stat` variable contains the chi-square statistic, and `p_val` contains the p-value resulting from the test.

Interpretation:
- If the p-value is less than the significance level of 0.05, it suggests that there is evidence to reject the null hypothesis. In this case, it would indicate a significant association between smoking status and lung cancer diagnosis.
- If the p-value is greater than 0.05, it implies that there isn't enough evidence to reject the null hypothesis, suggesting no significant association between smoking status and lung cancer diagnosis.
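
For a 2x2 table, `chi2_contingency` applies Yates' continuity correction unless told otherwise, so it is worth checking that the conclusion does not hinge on it. The sketch below runs the test both ways and applies the 0.05 decision rule:

```python
from scipy.stats import chi2_contingency

# Rows: smoker, non-smoker; columns: lung cancer yes, no
observed = [[60, 140], [30, 170]]
alpha = 0.05

# Default behaviour: Yates' continuity correction for 2x2 tables
chi2_corrected, p_corrected, dof, expected = chi2_contingency(observed)

# Same test without the continuity correction
chi2_uncorrected, p_uncorrected, _, _ = chi2_contingency(observed, correction=False)

print(f"With correction:    chi2={chi2_corrected:.3f}, p={p_corrected:.5f}")
print(f"Without correction: chi2={chi2_uncorrected:.3f}, p={p_uncorrected:.5f}")
print(f"Degrees of freedom: {dof}")

if p_corrected < alpha:
    print("Reject the null hypothesis: smoking status and lung cancer diagnosis are associated.")
else:
    print("Fail to reject the null hypothesis of independence.")
```

Both versions give a p-value far below 0.05, so the test rejects independence for these data regardless of the correction.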

Q10. A study was conducted to determine if the proportion of people who prefer milk chocolate, dark
chocolate, or white chocolate is different in the U.S. versus the U.K. A random sample of 500 people from
the U.S. and a random sample of 500 people from the U.K. were surveyed. The results are shown in the
contingency table below. Conduct a chi-square test for independence to determine if there is a significant
association between chocolate preference and country of origin.

                Milk Chocolate   Dark Chocolate   White Chocolate

U.S. (n=500)        200             150               150
U.K. (n=500)        225             175               100

Use a significance level of 0.01.

In [None]:
To conduct a chi-square test for independence in Python based on the provided contingency table representing chocolate preference and country of origin, you can use the `scipy.stats` library. Here's an example:

from scipy.stats import chi2_contingency

# Create the contingency table
observed = [[200, 150, 150], [225, 175, 100]]

# Perform chi-square test for independence
chi2_stat, p_val, _, _ = chi2_contingency(observed)

print(f"Chi-square Statistic: {chi2_stat}")
print(f"P-value: {p_val}")

This code utilizes the `chi2_contingency` function from `scipy.stats` to perform the chi-square test for independence on the provided contingency table.

The `chi2_stat` variable contains the chi-square statistic, and `p_val` contains the p-value resulting from the test.

Interpretation:
- If the p-value is less than the significance level of 0.01, it suggests that there is evidence to reject the null hypothesis. It indicates a significant association between chocolate preference and country of origin.
- If the p-value is greater than 0.01, it implies that there isn't enough evidence to reject the null hypothesis, suggesting no significant association between chocolate preference and country of origin.
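
Beyond the overall decision, it can be informative to see which chocolate types drive the statistic. The sketch below applies the 0.01 decision rule and breaks the statistic into per-cell contributions of (observed - expected)^2 / expected; the breakdown is an illustrative addition, not part of the question:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: U.S., U.K.; columns: milk, dark, white chocolate
observed = np.array([[200, 150, 150],
                     [225, 175, 100]])
alpha = 0.01

chi2_stat, p_val, dof, expected = chi2_contingency(observed)

# Each cell's share of the chi-square statistic
contributions = (observed - expected) ** 2 / expected
by_column = contributions.sum(axis=0)

print(f"Chi-square Statistic: {chi2_stat:.3f}")
print(f"P-value: {p_val:.5f}")
print(f"Degrees of freedom: {dof}")
print(f"Contribution by chocolate type (milk, dark, white): {by_column}")

if p_val < alpha:
    print("Reject the null hypothesis: preference is associated with country.")
else:
    print("Fail to reject the null hypothesis of independence.")
```

For these data the test rejects independence at the 0.01 level, and the white chocolate column accounts for most of the statistic.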

Q11. A random sample of 30 people was selected from a population with an unknown mean and standard
deviation. The sample mean was found to be 72 and the sample standard deviation was found to be 10.
Conduct a hypothesis test to determine if the population mean is significantly different from 70. Use a
significance level of 0.05.

In [None]:
To conduct a hypothesis test to determine if the population mean is significantly different from 70 based on the provided sample information, you can perform a one-sample t-test in Python. Here's an example using the `scipy` library:

from scipy.stats import t

# Sample information
sample_mean = 72
sample_std = 10
sample_size = 30
population_mean = 70
significance_level = 0.05

# Calculate the t-statistic
t_stat = (sample_mean - population_mean) / (sample_std / (sample_size ** 0.5))

# Calculate degrees of freedom
degrees_freedom = sample_size - 1

# Calculate the critical t-value
critical_t_value = t.ppf(1 - significance_level / 2, degrees_freedom)

print(f"T-Statistic: {t_stat}")
print(f"Critical T-Value: {critical_t_value}")

# Perform the hypothesis test
if abs(t_stat) > critical_t_value:
    print("Reject the null hypothesis: Population mean is significantly different from 70")
else:
    print("Fail to reject the null hypothesis: No significant difference in population mean")
    
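This code computes the t-statistic using the sample mean, sample standard deviation, sample size, and the population mean of interest. It then compares the absolute value of the t-statistic to the critical t-value obtained from the t-distribution at the given significance level and degrees of freedom. With these values the t-statistic is about 1.095 and the critical value (29 degrees of freedom) is about 2.045, so the test fails to reject the null hypothesis at the 0.05 level.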

- If the absolute value of the calculated t-statistic is greater than the critical t-value, you would reject the null hypothesis, indicating a significant difference in the population mean from 70.
- If the absolute value of the calculated t-statistic is not greater than the critical t-value, you would fail to reject the null hypothesis, suggesting no significant difference in the population mean from 70.
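
The same decision can be reached by comparing a p-value with the significance level instead of comparing the statistic with a critical value. A minimal sketch of the two-sided p-value, computed from the summary statistics with the t-distribution's survival function:

```python
from math import sqrt
from scipy.stats import t

sample_mean = 72
sample_std = 10
sample_size = 30
population_mean = 70
significance_level = 0.05

t_stat = (sample_mean - population_mean) / (sample_std / sqrt(sample_size))
degrees_freedom = sample_size - 1

# Two-sided p-value: probability of a statistic at least this extreme in either tail
p_value = 2 * t.sf(abs(t_stat), degrees_freedom)

print(f"T-Statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < significance_level:
    print("Reject the null hypothesis: Population mean is significantly different from 70")
else:
    print("Fail to reject the null hypothesis: No significant difference in population mean")
```

The two approaches always agree: the p-value is below the significance level exactly when the absolute t-statistic exceeds the critical value.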