### Q1: What is Estimation Statistics? Explain point estimate and interval estimate.

Estimation statistics is the branch of statistics concerned with estimating population parameters from sample data. A point estimate is a single value used as the estimate of a population parameter. For example, the sample mean is a point estimate of the population mean. An interval estimate is a range of values, computed from the sample, that is likely to contain the true value of the population parameter. For example, a 95% confidence interval for the population mean comes from a procedure that, over repeated sampling, captures the true population mean in about 95% of samples.
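
As a minimal sketch of the two kinds of estimate, the snippet below computes a point estimate and a 95% confidence interval from a small sample. The data values and variable names are illustrative assumptions, not part of the original question.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of eight measurements (illustrative values only).
sample = np.array([4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0])

# Point estimate: the sample mean is a single-value estimate of the population mean.
point_estimate = sample.mean()

# Interval estimate: mean +/- (t critical value) * (standard error).
n = sample.size
standard_error = sample.std(ddof=1) / np.sqrt(n)
t_critical = stats.t.ppf(0.975, df=n - 1)
interval = (point_estimate - t_critical * standard_error,
            point_estimate + t_critical * standard_error)

print("Point estimate:", point_estimate)
print("95% confidence interval:", interval)
```

The t critical value is used here because the sample is small and the population standard deviation is unknown.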

### Q2. Write a Python function to estimate the population mean using a sample mean and standard deviation.

In [1]:
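import math

def estimate_population_mean(sample_mean, sample_std, sample_size, z=1.96):
  """
  Estimate the population mean using a sample mean and standard deviation.

  Args:
    sample_mean: The sample mean (the point estimate of the population mean).
    sample_std: The sample standard deviation.
    sample_size: The number of observations in the sample.
    z: The critical value (1.96 gives an approximate 95% interval).

  Returns:
    A tuple of (point estimate, lower bound, upper bound).
  """

  # The margin of error is the critical value times the standard error.
  margin_of_error = z * sample_std / math.sqrt(sample_size)
  return sample_mean, sample_mean - margin_of_error, sample_mean + margin_of_error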


### Q3: What is Hypothesis testing? Why is it used? State the importance of Hypothesis testing.

Hypothesis testing is a statistical method used to assess a hypothesis about a population parameter using sample data. A hypothesis is a statement about a population parameter that we want to test. For example, we might hypothesize that the average weight of male college students is greater than the average weight of female college students.

Hypothesis testing is used to determine whether there is enough evidence to reject the null hypothesis. If we reject the null hypothesis, we conclude that the data are inconsistent with it. Otherwise, we cannot conclude that the null hypothesis is false, although this does not prove that it is true.

Hypothesis testing is an important tool for making decisions in many different fields, such as science, medicine, and business. It can be used to test the effectiveness of new treatments, to make predictions about future events, and to make decisions about how to allocate resources.

### Q4. Create a hypothesis that states whether the average weight of male college students is greater than the average weight of female college students.

The following is a hypothesis that states whether the average weight of male college students is greater than the average weight of female college students:

Null hypothesis: The average weight of male college students is equal to the average weight of female college students.

Alternative hypothesis: The average weight of male college students is greater than the average weight of female college students.

### Q5. Write a Python script to conduct a hypothesis test on the difference between two population means, given a sample from each population.

In [2]:
import numpy as np
from scipy.stats import t

def hypothesis_test(sample_mean_1, sample_mean_2, sample_std_1, sample_std_2, sample_size_1, sample_size_2):
  """
  Conduct a two-sided hypothesis test on the difference between two population means.

  Args:
    sample_mean_1: The mean of the first sample.
    sample_mean_2: The mean of the second sample.
    sample_std_1: The standard deviation of the first sample.
    sample_std_2: The standard deviation of the second sample.
    sample_size_1: The size of the first sample.
    sample_size_2: The size of the second sample.

  Returns:
    A tuple of (p-value, decision).
  """

  # Squared standard error contributed by each sample.
  var_1 = sample_std_1**2 / sample_size_1
  var_2 = sample_std_2**2 / sample_size_2

  # Calculate the t-statistic (Welch's form, which does not assume equal variances).
  t_statistic = (sample_mean_1 - sample_mean_2) / np.sqrt(var_1 + var_2)

  # Approximate the degrees of freedom with the Welch-Satterthwaite formula.
  df = (var_1 + var_2)**2 / (var_1**2 / (sample_size_1 - 1) + var_2**2 / (sample_size_2 - 1))

  # Calculate the two-sided p-value from the upper tail of the t-distribution.
  p_value = 2 * t.sf(abs(t_statistic), df=df)

  # Make a decision at the 0.05 significance level.
  decision = "Reject the null hypothesis" if p_value < 0.05 else "Do not reject the null hypothesis"

  return p_value, decision


### Q6: What is a null and alternative hypothesis? Give some examples.

A null hypothesis is a statement about a population parameter that we assume to be true unless there is enough evidence to reject it. It usually expresses "no effect" or "no difference". An alternative hypothesis is a statement that contradicts the null hypothesis and that we accept when the null hypothesis is rejected.

Some examples of null and alternative hypotheses are:

Null hypothesis: The average weight of male college students is equal to the average weight of female college students.

Alternative hypothesis: The average weight of male college students is greater than the average weight of female college students.

Null hypothesis: The average price of a gallon of gasoline is $3.00.

Alternative hypothesis: The average price of a gallon of gasoline is not $3.00.

### Q7: Write down the steps involved in hypothesis testing.

The steps involved in hypothesis testing are:

1. State the null and alternative hypotheses.
2. Select a significance level.
3. Calculate the test statistic.
4. Determine the p-value.
5. Make a decision.
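
The five steps can be sketched in code with a one-sample t-test. The data and the hypothesized mean of 5.0 are made-up values for illustration; `scipy.stats.ttest_1samp` performs steps 3 and 4 together.

```python
import numpy as np
from scipy import stats

# Hypothetical data: eight observed values (illustrative only).
sample = np.array([5.2, 4.9, 5.4, 5.1, 5.3, 5.0, 5.5, 5.2])

# Step 1: state the hypotheses. H0: mean = 5.0, H1: mean != 5.0.
hypothesized_mean = 5.0

# Step 2: select a significance level.
alpha = 0.05

# Steps 3 and 4: calculate the test statistic and its two-sided p-value.
result = stats.ttest_1samp(sample, popmean=hypothesized_mean)

# Step 5: make a decision by comparing the p-value with alpha.
reject_null = result.pvalue < alpha

print("t-statistic:", result.statistic)
print("p-value:", result.pvalue)
print("Reject H0" if reject_null else "Do not reject H0")
```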

### Q8. Define p-value and explain its significance in hypothesis testing.

The p-value is the probability of getting a test statistic at least as extreme as the one we observed, assuming that the null hypothesis is true. It is compared with the significance level to decide whether to reject the null hypothesis.

A low p-value (below the chosen significance level, commonly 0.05) means that a test statistic at least as extreme as the one we observed would be unlikely if the null hypothesis were true. In this case, we reject the null hypothesis and conclude that the data support the alternative hypothesis.

A high p-value (above the chosen significance level) means that a test statistic at least as extreme as the one we observed would not be unusual if the null hypothesis were true. In this case, we do not reject the null hypothesis, and we cannot conclude that the alternative hypothesis is true.
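
As a small sketch of how a p-value is obtained from a test statistic, the snippet below uses the survival function of the t-distribution. The observed statistic of 2.1 and the 20 degrees of freedom are assumed values chosen for illustration.

```python
from scipy import stats

# Hypothetical observed t-statistic and degrees of freedom.
t_observed = 2.1
df = 20

# Upper-tail p-value: P(T >= t_observed) under the null hypothesis.
p_upper = stats.t.sf(t_observed, df)

# Two-sided p-value: P(|T| >= |t_observed|) under the null hypothesis.
p_two_sided = 2 * stats.t.sf(abs(t_observed), df)

print("One-sided (upper-tail) p-value:", p_upper)
print("Two-sided p-value:", p_two_sided)
```

Which tail to use depends on the alternative hypothesis: a "greater than" alternative uses the upper tail, a "less than" alternative uses the lower tail, and a "not equal" alternative uses both.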

### Q9. Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom parameter set to 10.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import t

# Set the degrees of freedom.
df = 10

# Evaluate the t-distribution density on a grid of x values.
x = np.linspace(-4, 4, 500)
density = t.pdf(x, df)

# Plot the t-distribution.
plt.plot(x, density)
plt.xlabel("t-statistic")
plt.ylabel("Probability density")
plt.title("Student's t-distribution with df={}".format(df))
plt.show()


### Q10. Write a Python program to calculate the two-sample t-test for independent samples, given two random samples of equal size and a null hypothesis that the population means are equal.

In [4]:
import numpy as np
import scipy.stats as stats

# Define the two samples.
sample_1 = np.random.randn(10)
sample_2 = np.random.randn(10)

# Run the two-sample t-test for independent samples.
result = stats.ttest_ind(sample_1, sample_2)

# Extract the t-statistic and the p-value from the result.
t_statistic = result.statistic
p_value = result.pvalue

# Print the results.
print("t-statistic:", t_statistic)
print("p-value:", p_value)


t-statistic: 0.1333752787572409
p-value: 0.8953764179921657




### Q11: What is Student’s t distribution? When to use the t-Distribution.

Student's t-distribution is a probability distribution that is used when the sample size is small and the population standard deviation is unknown. It is a bell-shaped distribution that is similar to the normal distribution, but it has heavier tails. As the degrees of freedom increase, it approaches the standard normal distribution.

The t-distribution is used in hypothesis testing when the sample size is small and the population standard deviation is unknown. It is also used to calculate confidence intervals for the population mean when the sample size is small and the population standard deviation is unknown.
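
The heavier tails can be seen by comparing critical values. The sketch below prints two-sided 95% critical values of the t-distribution for a few degrees of freedom next to the normal value; the particular degrees of freedom are arbitrary choices for illustration.

```python
from scipy import stats

# Two-sided 95% critical value of the standard normal distribution.
z_critical = stats.norm.ppf(0.975)

# Corresponding t critical values for increasing degrees of freedom.
t_criticals = {}
for df in (5, 10, 30, 100):
    t_criticals[df] = stats.t.ppf(0.975, df)
    print(f"df={df:>3}: t critical = {t_criticals[df]:.3f}")

print(f"normal: z critical = {z_critical:.3f}")
```

The t critical values are larger than the normal value and shrink toward it as the degrees of freedom grow, which is why the distinction matters most for small samples.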

### Q12: What is t-statistic? State the formula for t-statistic.

The t-statistic is a test statistic that measures how far a sample estimate lies from its hypothesized value, in units of standard error. When comparing two sample means, it is calculated by subtracting one sample mean from the other and dividing by the standard error of the difference between the means.

The formula for the t-statistic is:

```
t = (x̄1 - x̄2) / (s̄ * √(1/n1 + 1/n2))
```

where:

* x̄1 is the mean of the first sample
* x̄2 is the mean of the second sample
* s̄ is the pooled standard deviation of the two samples
* n1 is the size of the first sample
* n2 is the size of the second sample
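
A pooled two-sample t-statistic divides the difference in sample means by the pooled standard deviation times √(1/n1 + 1/n2). The sketch below implements that calculation and compares it with `scipy.stats.ttest_ind`, which pools the variances by default. The function name `pooled_t_statistic` and the two groups of scores are hypothetical.

```python
import numpy as np
from scipy import stats

def pooled_t_statistic(sample_1, sample_2):
    """Two-sample t-statistic using the pooled standard deviation."""
    n1, n2 = len(sample_1), len(sample_2)
    var_1 = np.var(sample_1, ddof=1)
    var_2 = np.var(sample_2, ddof=1)
    # Pooled standard deviation: weighted average of the two sample variances.
    pooled_std = np.sqrt(((n1 - 1) * var_1 + (n2 - 1) * var_2) / (n1 + n2 - 2))
    standard_error = pooled_std * np.sqrt(1 / n1 + 1 / n2)
    return (np.mean(sample_1) - np.mean(sample_2)) / standard_error

# Hypothetical scores for two independent groups.
group_a = [82, 75, 90, 68, 77, 85]
group_b = [70, 72, 65, 80, 74]

manual_t = pooled_t_statistic(group_a, group_b)
scipy_t = stats.ttest_ind(group_a, group_b).statistic

print("Manual t-statistic:", manual_t)
print("SciPy t-statistic: ", scipy_t)
```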

### Q13. A coffee shop owner wants to estimate the average daily revenue for their shop. They take a random sample of 50 days and find the sample mean revenue to be $500 with a standard deviation of $50. Estimate the population mean revenue with a 95% confidence interval.

The 95% confidence interval for the population mean revenue is:

```
(500 - 1.96 * 50 / √50, 500 + 1.96 * 50 / √50)
```

```
= (486.14, 513.86)
```

We can be 95% confident that the true population mean revenue is between $486.14 and $513.86.
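
The interval can be checked numerically. This is a minimal sketch that uses the normal critical value from `scipy.stats.norm.ppf`; the variable names are illustrative.

```python
import math
from scipy import stats

# Summary statistics from the question.
sample_mean = 500.0
sample_std = 50.0
sample_size = 50
confidence = 0.95

# Critical value for a two-sided interval (about 1.96 for 95%).
z_critical = stats.norm.ppf(1 - (1 - confidence) / 2)

# Margin of error = critical value * standard error of the mean.
margin_of_error = z_critical * sample_std / math.sqrt(sample_size)
lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```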

### Q14. A researcher hypothesizes that a new drug will decrease blood pressure by 10 mmHg. They conduct a clinical trial with 100 patients and find that the sample mean decrease in blood pressure is 8 mmHg with a standard deviation of 3 mmHg. Test the hypothesis with a significance level of 0.05.

The null hypothesis is that the mean decrease in blood pressure is 10 mmHg, as the researcher claims. The alternative hypothesis is that the mean decrease is not 10 mmHg.

The t-statistic for this test is:

```
t = (8 - 10) / (3 / √100)
```

```
= -6.67
```

With 99 degrees of freedom, the two-sided p-value for this test is less than 0.001.

Since the p-value is less than the significance level of 0.05, we reject the null hypothesis. The data are inconsistent with a mean decrease of 10 mmHg: the observed mean decrease of 8 mmHg is significantly smaller than hypothesized.
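
The calculation can be reproduced from the summary statistics. This sketch assumes a two-sided, one-sample t-test of the claim that the mean decrease equals 10 mmHg; the variable names are illustrative.

```python
import math
from scipy import stats

# Summary statistics from the question.
hypothesized_decrease = 10.0
sample_mean = 8.0
sample_std = 3.0
sample_size = 100
alpha = 0.05

# One-sample t-statistic: (sample mean - hypothesized mean) / standard error.
standard_error = sample_std / math.sqrt(sample_size)
t_statistic = (sample_mean - hypothesized_decrease) / standard_error

# Two-sided p-value with n - 1 degrees of freedom.
p_value = 2 * stats.t.sf(abs(t_statistic), df=sample_size - 1)
reject_null = p_value < alpha

print("t-statistic:", t_statistic)
print("p-value:", p_value)
print("Reject H0" if reject_null else "Do not reject H0")
```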

### Q15. An electronics company produces a certain type of product with a mean weight of 5 pounds and a standard deviation of 0.5 pounds. A random sample of 25 products is taken, and the sample mean weight is found to be 4.8 pounds. Test the hypothesis that the true mean weight of the products is less than 5 pounds with a significance level of 0.01.

The null hypothesis is that the true mean weight of the products is 5 pounds. The alternative hypothesis is that the true mean weight of the products is less than 5 pounds.

Because the population standard deviation (0.5 pounds) is given, the test statistic is a z-statistic:

```
z = (4.8 - 5) / (0.5 / √25)
```

```
= -2.00
```

The one-sided (lower-tail) p-value for this test is approximately 0.023.

Since the p-value is greater than the significance level of 0.01, we cannot reject the null hypothesis. Therefore, there is not enough evidence at the 0.01 level to conclude that the true mean weight of the products is less than 5 pounds.
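
The test can be reproduced as a one-sided z-test, on the assumption that the stated 0.5 pounds is the known population standard deviation. Variable names are illustrative.

```python
import math
from scipy import stats

# Values from the question.
population_mean = 5.0
population_std = 0.5
sample_mean = 4.8
sample_size = 25
alpha = 0.01

# z-statistic: (sample mean - hypothesized mean) / standard error.
standard_error = population_std / math.sqrt(sample_size)
z_statistic = (sample_mean - population_mean) / standard_error

# Lower-tail p-value, because the alternative is "less than 5 pounds".
p_value = stats.norm.cdf(z_statistic)
reject_null = p_value < alpha

print("z-statistic:", z_statistic)
print("p-value:", p_value)
print("Reject H0" if reject_null else "Do not reject H0")
```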


### Q16. Two groups of students are given different study materials to prepare for a test. The first group (n1 = 30) has a mean score of 80 with a standard deviation of 10, and the second group (n2 = 40) has a mean score of 75 with a standard deviation of 8. Test the hypothesis that the population means for the two groups are equal with a significance level of 0.01.

The null hypothesis is that the population means for the two groups are equal. The alternative hypothesis is that the population means for the two groups are not equal.

The t-statistic for this test is:

```
t = (80 - 75) / √(10^2/30 + 8^2/40)
```

```
= 2.25
```

The two-sided p-value for this test is approximately 0.03.

Since the p-value is greater than the significance level of 0.01, we cannot reject the null hypothesis. Therefore, there is not enough evidence at the 0.01 level to conclude that the population means for the two groups are different.
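
The same test can be run directly from the summary statistics with `scipy.stats.ttest_ind_from_stats`. Passing `equal_var=False` selects Welch's test, which matches the unpooled standard error in the formula above; that choice is an assumption about the intended test.

```python
from scipy import stats

alpha = 0.01

# Welch's two-sample t-test computed from summary statistics.
result = stats.ttest_ind_from_stats(
    mean1=80, std1=10, nobs1=30,
    mean2=75, std2=8, nobs2=40,
    equal_var=False,
)
reject_null = result.pvalue < alpha

print("t-statistic:", result.statistic)
print("p-value:", result.pvalue)
print("Reject H0" if reject_null else "Do not reject H0")
```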

### Q17. A marketing company wants to estimate the average number of ads watched by viewers during a TV program. They take a random sample of 50 viewers and find that the sample mean is 4 with a standard deviation of 1.5. Estimate the population mean with a 99% confidence interval.

The 99% confidence interval for the population mean is:

```
(4 - 2.576 * 1.5 / √50, 4 + 2.576 * 1.5 / √50)
```

```
= (3.45, 4.55)
```

We can be 99% confident that the true population mean number of ads watched by viewers during a TV program is between 3.45 and 4.55.
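
As a cross-check, `scipy.stats.norm.interval` returns the central interval of a normal distribution, so centering it on the sample mean with the standard error as the scale gives the same confidence interval. This is a sketch under the normal approximation used above.

```python
import math
from scipy import stats

# Summary statistics from the question.
sample_mean = 4.0
sample_std = 1.5
sample_size = 50

# Standard error of the mean.
standard_error = sample_std / math.sqrt(sample_size)

# Central 99% interval of a normal distribution centered on the sample mean.
lower, upper = stats.norm.interval(0.99, loc=sample_mean, scale=standard_error)

print(f"99% CI: ({lower:.2f}, {upper:.2f})")
```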

