Q1: A **t-test** is a statistical test that compares a sample mean with a hypothesized value, or the means of two samples, when the population variance is unknown and has to be estimated from the data. It matters most when the sample is small (commonly taken as fewer than 30 observations). A **z-test** makes the same kinds of comparison when the population variance is known, or when the sample is large enough (commonly 30 or more) that the sample variance is a close approximation to it.¹²³

For example, you can use a t-test to compare the average heights of male and female students in a small class, where you don't know the population variance. You can use a z-test to compare the average weights of men and women in a large city, where you know the population variance from previous studies.
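The difference between the two statistics can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the height values and the "known" population standard deviations are invented for the example, and the function names `two_sample_z` and `welch_t` are hypothetical. The t statistic here is the Welch (unequal variances) form, which is one common choice rather than the only one.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_z(x, y, sigma_x, sigma_y):
    """z statistic and two-sided p-value when the population SDs are known."""
    se = sqrt(sigma_x**2 / len(x) + sigma_y**2 / len(y))
    z = (mean(x) - mean(y)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def welch_t(x, y):
    """Welch t statistic: the SDs are estimated from the samples themselves."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


# Hypothetical heights in cm, made up for illustration only
men = [175.0, 180.0, 172.0, 178.0, 169.0, 183.0]
women = [162.0, 158.0, 165.0, 170.0, 160.0, 163.0]

# Assumed known population SDs (7 cm and 6 cm) for the z-test
z_stat, z_p = two_sample_z(men, women, sigma_x=7.0, sigma_y=6.0)
t_stat = welch_t(men, women)
print(f"z = {z_stat:.3f}, two-sided p = {z_p:.5f}")
print(f"t = {t_stat:.3f}")
```

Turning the t statistic into a p-value needs the t distribution, which the standard library does not provide; a statistics package would normally be used for that step.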

Q2: A **one-tailed test** checks for a difference in one specified direction: whether the mean or proportion of a sample is greater than (or, alternatively, less than) a specified value or another sample. A **two-tailed test** checks for a difference in either direction: whether the mean or proportion differs from a specified value or another sample, without saying in advance whether it is higher or lower.⁴

For example, you can use a one-tailed test to test whether the average score of students who took an online course is higher than 80. You can use a two-tailed test to test whether the average score of students who took an online course is different from the average score of students who took an in-person course.
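As a rough sketch of how the choice of tails changes the p-value, the snippet below runs a one-sample z-test on the online-course example. The numbers (36 students, sample mean 83, known standard deviation 9) are assumptions made up for illustration, and `z_test_p_values` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_values(sample_mean, mu0, sigma, n):
    """Return z and the p-values for the three possible alternatives."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    p_greater = 1 - std_normal.cdf(z)         # H1: mean > mu0 (upper tail)
    p_less = std_normal.cdf(z)                # H1: mean < mu0 (lower tail)
    p_two_sided = 2 * min(p_greater, p_less)  # H1: mean != mu0 (both tails)
    return z, p_greater, p_less, p_two_sided


# Hypothetical data: 36 students, mean score 83, assumed known SD of 9
z, p_greater, p_less, p_two_sided = z_test_p_values(83.0, 80.0, 9.0, 36)
print(f"z = {z:.2f}")
print(f"one-tailed (greater) p = {p_greater:.4f}")
print(f"two-tailed p = {p_two_sided:.4f}")
```

For the same data the two-tailed p-value is twice the smaller one-tailed p-value, which is why the direction of the test should be chosen before looking at the data.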

Q3: A **Type 1 error** is a statistical error that occurs when you reject a true null hypothesis. A **Type 2 error** is a statistical error that occurs when you fail to reject a false null hypothesis.⁴

For example, suppose you are testing whether a new drug is effective in curing a disease. The null hypothesis is that the drug has no effect, and the alternative hypothesis is that the drug has an effect. A Type 1 error would be concluding that the drug is effective when it is not, and a Type 2 error would be concluding that the drug is not effective when it is.
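Both error rates can be estimated by simulation. The sketch below is illustrative only: it assumes normally distributed data with a known standard deviation of 1, a two-sided z-test at a significance level of 0.05, and a hypothetical true effect of 0.5 for the case where the null hypothesis is false. The function name `rejection_rate` is invented for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25,
                   alpha=0.05, trials=4000, seed=0):
    """Fraction of simulated samples in which H0: mean == mu0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (mean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# Null hypothesis is true: every rejection is a Type 1 error
type1_rate = rejection_rate(true_mean=0.0)

# Null hypothesis is false: every failure to reject is a Type 2 error
type2_rate = 1 - rejection_rate(true_mean=0.5)

print(f"estimated Type 1 error rate: {type1_rate:.3f}")
print(f"estimated Type 2 error rate: {type2_rate:.3f}")
```

The estimated Type 1 rate should land close to the chosen significance level, while the Type 2 rate depends on the effect size and the sample size.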


Q4: **Bayes's theorem** is a formula that describes how to update the probability of a hypothesis based on new evidence. It is based on the idea that the probability of an event depends on the prior knowledge of the conditions that might be related to the event.¹²

The formula for Bayes's theorem is:

$$P(H|E) = \frac{P(E|H)P(H)}{P(E)}$$

where:

- $P(H|E)$ is the probability of hypothesis $H$ given evidence $E$ (also called the posterior probability).
- $P(E|H)$ is the probability of evidence $E$ given hypothesis $H$ (also called the likelihood).
- $P(H)$ is the probability of hypothesis $H$ before observing evidence $E$ (also called the prior probability).
- $P(E)$ is the probability of evidence $E$ regardless of hypothesis $H$ (also called the marginal probability).

For example, suppose you want to know the probability that a person has a rare disease given that they tested positive for it. The test has a 99% accuracy rate, meaning that it correctly identifies 99% of people who have the disease and 99% of people who don't have the disease. The disease affects 1 in 10,000 people in the population.

Using Bayes's theorem, you can calculate:

- $P(H)$: The prior probability of having the disease is 0.0001 (1 in 10,000).
- $P(E)$: The marginal probability of testing positive is $0.0001 \times 0.99 + 0.9999 \times 0.01 = 0.010098$ (the sum of true positives and false positives).
- $P(E|H)$: The likelihood of testing positive given that you have the disease is 0.99 (the accuracy rate).
- $P(H|E)$: The posterior probability of having the disease given that you tested positive is $\frac{0.99 \times 0.0001}{0.010098} \approx 0.0098$ (about 1%).

This means that even if you test positive for the disease, there is only about a 1% chance that you actually have it. The disease is so rare that false positives from the large healthy population outnumber true positives by roughly 100 to 1.
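The calculation above can be reproduced with a short function. This is a minimal sketch; the function name `posterior_given_positive` and the split of the 99% accuracy into `sensitivity` and `specificity` parameters are choices made for this example.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes's theorem."""
    true_positive = sensitivity * prior               # P(E|H) * P(H)
    false_positive = (1 - specificity) * (1 - prior)  # P(E|not H) * P(not H)
    evidence = true_positive + false_positive         # P(E)
    return true_positive / evidence


p = posterior_given_positive(prior=0.0001, sensitivity=0.99, specificity=0.99)
print(round(p, 4))  # 0.0098
```

Raising the prior (for example, by testing only people with symptoms) raises the posterior sharply, which is a quick way to see how much the result depends on the base rate.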

Q5: A **confidence interval** is a range of values that estimates an unknown parameter of a population, such as the mean, proportion, or difference between two groups. A confidence interval is calculated at a specified confidence level, such as 95%, which is the proportion of intervals built this way from repeated samples that would contain the true parameter value.¹²³

To calculate a confidence interval, you need to know:

- The point estimate of your parameter, which is the value obtained from your sample (e.g., the sample mean or proportion).
- The standard error of your estimate, which is the standard deviation of its sampling distribution (e.g., for a sample mean, the sample standard deviation divided by the square root of the sample size).
- The critical value for your confidence level, which is the number of standard errors away from the point estimate that you need to go in order to capture the true parameter value with a certain probability (e.g., for a 95% confidence level, the critical value is 1.96 for a normal distribution).

The formula for a confidence interval is:

$$\text{point estimate} \pm \text{critical value} \times \text{standard error}$$

For example, suppose you want to estimate the mean height of adult males in India using a random sample of 100 men. You find that the sample mean is 167 cm and the sample standard deviation is 10 cm. You want to calculate a 95% confidence interval for the population mean.

Using the formula, you can calculate:

- The point estimate is 167 cm.
- The standard error is $\frac{10}{\sqrt{100}} = 1$ cm.
- The critical value is 1.96 for a 95% confidence level (the normal approximation, which is reasonable for a sample of 100).
- The confidence interval is $167 \pm 1.96 \times 1 = (165.04, 168.96)$ cm.

This means that you are 95% confident that the true mean height of adult males in India is between 165.04 cm and 168.96 cm.
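The same steps can be written as a small function. This is a sketch that uses the normal critical value from the standard library, as in the worked example; the function name `mean_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    # Two-sided critical value, e.g. about 1.96 for 95%
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sample_sd / sqrt(n)
    margin = z_crit * standard_error
    return sample_mean - margin, sample_mean + margin


low, high = mean_confidence_interval(167.0, 10.0, 100)
print(round(low, 2), round(high, 2))  # 165.04 168.96
```

Passing a different `confidence` value, such as 0.99, widens the interval because the critical value grows.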


Q6: A **normal distribution** is a continuous probability distribution in which values near the mean are the most likely and values farther from the mean become progressively rarer. It has a bell-shaped curve that is symmetric around the mean, which is also the median and mode of the distribution.¹²³

The formula for the probability density function of a normal distribution is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where:

- $x$ is the value of the random variable.
- $\mu$ is the mean of the distribution.
- $\sigma$ is the standard deviation of the distribution.
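As a quick check of the formula, the density can be coded directly and compared with the standard library's implementation. This is a sketch; `normal_pdf` is a name chosen for this example.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and SD sigma at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


# Height of the standard normal curve at its peak (x = mu)
peak = normal_pdf(0.0, 0.0, 1.0)
print(round(peak, 4))  # 0.3989

# The hand-written formula agrees with statistics.NormalDist
difference = abs(normal_pdf(1.5, 1.0, 2.0) - NormalDist(1.0, 2.0).pdf(1.5))
print(difference < 1e-12)  # True
```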

The normal distribution has some important properties, such as:

- About 68% of the values are within one standard deviation of the mean, i.e., in the interval $(\mu-\sigma, \mu+\sigma)$.
- About 95% of the values are within two standard deviations of the mean, i.e., in the interval $(\mu-2\sigma, \mu+2\sigma)$.
- About 99.7% of the values are within three standard deviations of the mean, i.e., in the interval $(\mu-3\sigma, \mu+3\sigma)$.
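These three percentages can be verified from the cumulative distribution function. The sketch below uses `statistics.NormalDist`; the helper name `coverage_within` is hypothetical.

```python
from statistics import NormalDist


def coverage_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal value lies within k SDs of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(k, round(coverage_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The result does not depend on the particular mean and standard deviation, since the interval is always expressed in units of $\sigma$.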

The normal distribution is widely used in statistics and many other fields because it approximates many natural phenomena, such as heights, weights, IQ scores, blood pressure, and measurement errors. It is also the basis for many statistical tests and inference procedures.⁴⁵

Q7: To calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5, follow these steps:

- Find the sample size ($n$) by counting the number of data points in the sample.
- Find the standard error (SE) by dividing the standard deviation (SD) by the square root of the sample size: $\text{SE} = \text{SD} / \sqrt{n}$.
- Find the margin of error (ME) by multiplying the standard error by the z-score for a 95% confidence level, which is 1.96: $\text{ME} = 1.96 \times \text{SE}$.
- Find the lower and upper limits of the confidence interval by subtracting the margin of error from the sample mean and adding it to the sample mean: lower limit = mean - ME, upper limit = mean + ME.
- Interpret the result by stating that you are 95% confident that the true population mean lies between the lower and upper limits.

For example, if your sample size is 25, then your standard error is $5 / \sqrt{25} = 1$, your margin of error is $1.96 \times 1 = 1.96$, your lower limit is $50 - 1.96 = 48.04$, and your upper limit is $50 + 1.96 = 51.96$. You can say that you are 95% confident that the true population mean is between 48.04 and 51.96. Because a sample of 25 is small, if the standard deviation of 5 was estimated from the sample, the t critical value with 24 degrees of freedom (about 2.064) is the more appropriate choice and gives a slightly wider interval, roughly 47.94 to 52.06.
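Since the question does not state a sample size, it can help to see how the interval changes with $n$. The sketch below repeats the z-based calculation for a few sample sizes; the sizes other than 25 are hypothetical, and `ci_for_sample_size` is a name invented for this example.

```python
from math import sqrt
from statistics import NormalDist


def ci_for_sample_size(mean, sd, n, confidence=0.95):
    """z-based confidence interval for the mean at a given sample size."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sd / sqrt(n)
    return mean - margin, mean + margin


# Mean 50 and SD 5 come from the question; 100 and 400 are illustrative sizes
intervals = {n: ci_for_sample_size(50.0, 5.0, n) for n in (25, 100, 400)}
for n, (low, high) in intervals.items():
    print(n, round(low, 2), round(high, 2))
# 25 48.04 51.96
# 100 49.02 50.98
# 400 49.51 50.49
```

Quadrupling the sample size halves the margin of error, because the standard error shrinks with the square root of $n$.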