# Q1: What is Estimation Statistics? Explain point estimate and interval estimate.

Estimation in Statistics:
Estimation in statistics refers to the process of using sample data to make educated guesses or inferences about population parameters (such as means, proportions, variances, etc.) when the entire population cannot be measured or observed directly. Estimation involves using the information gathered from a sample to estimate the true, unknown values of population parameters.

There are two main types of estimation in statistics: point estimation and interval estimation.

Point Estimate:
A point estimate is a single value that is calculated from sample data and is used to estimate an unknown population parameter. It provides a single "best guess" of the parameter's value based on the available sample information. However, a point estimate does not provide information about the uncertainty associated with the estimate.

For example, if you want to estimate the average height of students in a university, you might take a sample of students and calculate the sample mean height. The calculated sample mean would be a point estimate of the population mean height.

Interval Estimate:
An interval estimate, also known as a confidence interval, provides a range of values that is expected to contain the true population parameter at a stated confidence level. It accounts for the inherent uncertainty in estimation by providing a range of values rather than a single point. A 95% confidence level, for example, means that about 95% of intervals constructed this way from repeated samples would contain the true parameter; a wider interval indicates greater uncertainty in the estimate.

# Q2. Write a Python function to estimate the population mean using a sample mean and standard deviation.
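
One possible sketch of such a function is shown below. It assumes the population standard deviation is unknown, so it builds a t-based interval around the sample mean; the function name `estimate_population_mean` and the input values are illustrative, not taken from the question.

```python
import math
from scipy import stats


def estimate_population_mean(sample_mean, sample_std, n, confidence=0.95):
    """Return (point_estimate, (lower, upper)) for the population mean.

    Uses a t-based interval because the population standard deviation
    is assumed unknown and is replaced by the sample standard deviation.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    standard_error = sample_std / math.sqrt(n)
    # Two-sided critical value from the t-distribution with n - 1 df
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    margin = t_crit * standard_error
    return sample_mean, (sample_mean - margin, sample_mean + margin)


# Hypothetical inputs chosen only for illustration
point, (lower, upper) = estimate_population_mean(170.0, 8.0, 36)
print(f"point estimate = {point:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The point estimate is simply the sample mean; the interval adds the margin of error that the point estimate alone does not convey.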

# Q3: What is Hypothesis testing? Why is it used? State the importance of Hypothesis testing.

Hypothesis Testing:
Hypothesis testing is a fundamental method in statistics that involves making decisions about population parameters based on sample data. It's a structured process for evaluating and testing claims, assertions, or hypotheses about a population using evidence from a sample. Hypothesis testing allows us to draw conclusions and make informed decisions about the validity of certain statements or assumptions.

Importance of Hypothesis Testing:
Hypothesis testing serves several important purposes in statistics and scientific research:

Evidence-Based Decisions: Hypothesis testing provides a systematic approach to making decisions based on evidence from data. It helps determine whether a claim or hypothesis is supported by the available data or if there's a need for further investigation.

Scientific Validity: In scientific research, hypotheses are formulated to test theories or explanations. Hypothesis testing allows researchers to validate or reject these hypotheses using empirical evidence.

Statistical Significance: Hypothesis testing quantifies the level of statistical significance, which indicates whether observed differences or effects in the data are likely to have occurred due to random chance or if they represent genuine patterns.

Comparison and Generalization: Hypothesis testing enables comparison between sample data and known or hypothesized population parameters. It helps determine if observed differences are statistically significant and if they are likely to generalize to the entire population.

# Q4. Create a hypothesis that states whether the average weight of male college students is greater than the average weight of female college students.

Null Hypothesis (H0):
The average weight of male college students is equal to or less than the average weight of female college students.
H0: μ_male ≤ μ_female

Alternative Hypothesis (Ha):
The average weight of male college students is greater than the average weight of female college students.
Ha: μ_male > μ_female

In this case, the null hypothesis assumes that the average weight of male college students is not greater than that of female college students, while the alternative hypothesis states that the average weight of male students is greater. Because the alternative specifies a direction, this is a one-sided (right-tailed) test.

# Q5. Write a Python script to conduct a hypothesis test on the difference between two population means, given a sample from each population.
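
A minimal sketch of such a script follows. It assumes the two samples are independent and uses Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`), which does not require equal population variances. The helper name `compare_means` and the simulated data are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def compare_means(sample_a, sample_b, alpha=0.05):
    """Welch's two-sample t-test of H0: mu_a == mu_b (two-sided)."""
    result = stats.ttest_ind(sample_a, sample_b, equal_var=False)
    t_stat, p_value = result.statistic, result.pvalue
    reject = p_value < alpha
    return t_stat, p_value, reject


# Simulated data standing in for real measurements (illustrative only)
rng = np.random.default_rng(seed=0)
sample_a = rng.normal(loc=70.0, scale=10.0, size=40)
sample_b = rng.normal(loc=62.0, scale=9.0, size=35)

t_stat, p_value, reject = compare_means(sample_a, sample_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```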

# Q6: What is a null and alternative hypothesis? Give some examples.

Null Hypothesis (H0):
The null hypothesis is a statement of no effect, no difference, or no relationship. It's often formulated as the default assumption to be tested against the available data. In hypothesis testing, the null hypothesis serves as the benchmark against which we evaluate the evidence from the sample.

Examples of Null Hypotheses:

Population Mean: H0: The average IQ score of students is 100.
No Effect: H0: A new drug has no effect on reducing blood pressure.
No Difference: H0: There is no difference in the sales performance between two marketing strategies.
Equal Proportions: H0: The proportion of defective products is 0.05.

Alternative Hypothesis (Ha):
The alternative hypothesis is a statement that contradicts the null hypothesis and represents the hypothesis that researchers want to support or establish. It's formulated based on the research question and the expected relationship between variables.

Examples of Alternative Hypotheses:

Population Mean: Ha: The average IQ score of students is greater than 100.
Effect: Ha: The new drug significantly reduces blood pressure.
Difference: Ha: Strategy A leads to higher sales performance compared to Strategy B.
Not Equal Proportions: Ha: The proportion of defective products is not 0.05.

# Q7: Write down the steps involved in hypothesis testing.


Hypothesis testing involves a structured process to make informed decisions about population parameters based on sample data. Here are the general steps involved in hypothesis testing:

1. Formulate the Hypotheses:

   State the null hypothesis (H0) and the alternative hypothesis (Ha) based on the research question and the expected relationship between variables.

2. Set the Significance Level (α):

   Choose the significance level (α), which represents the probability of making a Type I error (rejecting a true null hypothesis). Common values for α include 0.05, 0.01, or 0.10.

3. Collect and Analyze Data:

   - Collect sample data relevant to the hypothesis being tested.
   - Calculate relevant sample statistics (e.g., sample mean, sample standard deviation).

4. Choose a Statistical Test:

   Select an appropriate statistical test based on the type of data, the research question, and the distributional assumptions. Common tests include t-tests, ANOVA, chi-squared tests, etc.

5. Calculate Test Statistic:

   Compute the test statistic using the sample data. The test statistic measures the difference between the observed sample data and the values expected under the null hypothesis.

6. Determine the Critical Region or P-value:

   - Depending on the test chosen, determine the critical region in the distribution of the test statistic or calculate the p-value.
   - The critical region is the range of values that would lead to rejecting the null hypothesis if the test statistic falls within it.
   - The p-value is the probability of observing a test statistic as extreme as or more extreme than the one obtained, assuming the null hypothesis is true.

7. Make a Decision:

   - If using the critical region approach: Compare the test statistic to the critical values. If the test statistic falls in the critical region, reject the null hypothesis. If not, fail to reject the null hypothesis.
   - If using the p-value approach: Compare the p-value to the chosen significance level (α). If p-value < α, reject the null hypothesis. If p-value ≥ α, fail to reject the null hypothesis.

8. Draw Conclusions:

   - Based on the decision made in the previous step, draw conclusions about the null hypothesis and the research question.
   - If the null hypothesis is rejected, support the alternative hypothesis. If not, there isn't enough evidence to support the alternative hypothesis.
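
The eight steps above can be traced in a short script. The sketch below assumes a two-sided one-sample t-test; the hypothesized mean of 100 and the ten data values are made up purely to illustrate the flow.

```python
import math
import statistics
from scipy import stats

# Step 1: hypotheses (hypothetical claim): H0: mu = 100, Ha: mu != 100
mu_0 = 100.0

# Step 2: significance level
alpha = 0.05

# Step 3: collect data (made-up values) and summarise them
sample = [102.1, 98.4, 105.3, 101.7, 99.8, 104.2, 103.5, 100.9, 106.0, 97.6]
n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Step 4: choose the test: one-sample t-test, since sigma is unknown

# Step 5: test statistic
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

# Step 6: critical value and p-value for a two-sided test
df = n - 1
t_crit = stats.t.ppf(1 - alpha / 2, df)
p_value = 2 * stats.t.sf(abs(t_stat), df)

# Step 7: decision, using both approaches
reject_by_critical = abs(t_stat) > t_crit
reject_by_p = p_value < alpha

# Step 8: conclusion
print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_by_p else "Fail to reject H0")
```

The critical-region and p-value approaches are two views of the same comparison, so they lead to the same decision.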


# Q8. Define p-value and explain its significance in hypothesis testing.

The p-value, or probability value, is a fundamental concept in hypothesis testing. It quantifies the strength of evidence against the null hypothesis (H0) provided by the sample data. The p-value represents the probability of obtaining a test statistic as extreme as or more extreme than the one observed, assuming that the null hypothesis is true.

Significance of the p-value:

Decision Criterion: In hypothesis testing, the p-value is compared to the chosen significance level (α), also known as the level of significance or alpha level. If the p-value is less than α, it suggests that the observed sample data is unlikely under the null hypothesis. This leads to the rejection of the null hypothesis in favor of the alternative hypothesis.

Strength of Evidence: A small p-value (typically below α) indicates strong evidence against the null hypothesis. It suggests that the observed data is inconsistent with the null hypothesis and supports the alternative hypothesis. Conversely, a large p-value suggests weak evidence against the null hypothesis.

Quantification of Uncertainty: The p-value provides a measure of how compatible the data are with the null hypothesis. A small p-value indicates that data at least as extreme as those observed would be unlikely if the null hypothesis were true, while a large p-value suggests that the observed data could reasonably occur under the null hypothesis. The p-value is not the probability that the null hypothesis itself is true.

Interpretation of Results: Researchers can use the p-value to interpret the results of a hypothesis test. If the p-value is very small, it suggests that the sample data provides strong evidence against the null hypothesis, leading to a conclusion in favor of the alternative hypothesis.
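
To make the definition concrete, the sketch below converts a test statistic into a p-value for left-, right-, and two-tailed alternatives. It assumes the statistic follows a standard normal distribution under H0 (a z-test) so that only the standard library is needed; the function name `p_value_from_z` and the observed value 2.1 are illustrative.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative="two-sided"):
    """P-value of a z statistic under a standard normal null distribution."""
    std_normal = NormalDist()
    if alternative == "greater":
        # Probability of a value at least as large as z
        return 1 - std_normal.cdf(z)
    if alternative == "less":
        # Probability of a value at least as small as z
        return std_normal.cdf(z)
    if alternative == "two-sided":
        # Probability of a value at least as far from 0 as z, in either tail
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


z_observed = 2.1  # hypothetical test statistic
for alt in ("less", "greater", "two-sided"):
    print(f"{alt:>9}: p = {p_value_from_z(z_observed, alt):.4f}")
```

The same statistic gives different p-values depending on the direction of the alternative hypothesis, which is why Ha must be fixed before looking at the data.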

# Q9. Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom parameter set to 10.
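
A minimal plotting sketch is given below. The standard normal curve is added only as a visual reference, and the output file name `t_distribution_df10.png` is an arbitrary choice; in a notebook, `plt.show()` can be used instead of saving.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

df = 10                         # degrees of freedom requested in the question
x = np.linspace(-4, 4, 400)     # grid of t values
pdf = stats.t.pdf(x, df)        # density of the t-distribution at each point

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x, pdf, label=f"t-distribution (df = {df})")
# Standard normal curve for comparison (not required by the question)
ax.plot(x, stats.norm.pdf(x), linestyle="--", label="standard normal")
ax.set_title("Student's t-distribution with 10 degrees of freedom")
ax.set_xlabel("t")
ax.set_ylabel("Probability density")
ax.legend()
ax.grid(True, alpha=0.3)

# Arbitrary file name chosen for this sketch
fig.savefig("t_distribution_df10.png", dpi=150)
```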

# Q10. Write a Python program to calculate the two-sample t-test for independent samples, given two random samples of equal size and a null hypothesis that the population means are equal.
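
One way to sketch this is to compute the pooled-variance t statistic by hand, which simplifies when both samples have the same size. The sketch assumes equal population variances (the classical Student's t-test); the function name `two_sample_t_test` and the simulated samples are illustrative.

```python
import math
import numpy as np
from scipy import stats


def two_sample_t_test(sample_1, sample_2):
    """Pooled-variance t-test of H0: mu_1 == mu_2 for equal-sized samples."""
    x1 = np.asarray(sample_1, dtype=float)
    x2 = np.asarray(sample_2, dtype=float)
    if x1.size != x2.size:
        raise ValueError("this sketch expects samples of equal size")
    n = x1.size
    # With equal sizes, the pooled variance is the mean of the two sample variances
    pooled_var = (x1.var(ddof=1) + x2.var(ddof=1)) / 2
    t_stat = (x1.mean() - x2.mean()) / math.sqrt(pooled_var * 2 / n)
    df = 2 * n - 2
    # Two-sided p-value, since H0 states that the means are equal
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, df, p_value


# Two random samples of equal size (simulated for illustration)
rng = np.random.default_rng(seed=42)
group_1 = rng.normal(loc=50.0, scale=5.0, size=30)
group_2 = rng.normal(loc=52.0, scale=5.0, size=30)

t_stat, df, p_value = two_sample_t_test(group_1, group_2)
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.4f}")
```

The result can be cross-checked against `scipy.stats.ttest_ind(group_1, group_2, equal_var=True)`, which performs the same test.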

# Q11: What is Student's t-distribution? When should the t-distribution be used?

The Student's t-distribution, often referred to simply as the t-distribution, is a probability distribution that is widely used in statistics. It is used to make inferences about the population mean when the sample size is small (typically when the sample size is less than 30) and when the population standard deviation is unknown.

The t-distribution is similar in shape to the normal distribution but has heavier tails. As the sample size increases, the t-distribution approaches the normal distribution. The t-distribution is parameterized by a parameter called "degrees of freedom" (df), which affects the shape of the distribution.
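
The convergence toward the normal distribution can be checked numerically. The sketch below compares the two-sided 95% critical value of the t-distribution with the normal value (about 1.96) for a few arbitrarily chosen degrees of freedom.

```python
from scipy import stats

# Critical value of the standard normal for a two-sided 95% interval
z_crit = stats.norm.ppf(0.975)

# Corresponding t critical values for increasing degrees of freedom
critical_values = {df: stats.t.ppf(0.975, df) for df in (2, 5, 10, 30, 100, 1000)}

print(f"normal: {z_crit:.4f}")
for df, t_crit in critical_values.items():
    print(f"df = {df:>4}: t critical = {t_crit:.4f}, excess over normal = {t_crit - z_crit:.4f}")
```

The heavier tails show up as larger critical values at small df, and the excess over the normal value shrinks as df grows.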

When to Use the t-Distribution:

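Small Sample Sizes: The t-distribution is especially useful when dealing with small sample sizes (typically less than 30) drawn from an approximately normal population. In such cases the sample standard deviation is a noisy estimate of the population standard deviation, so normal (z) critical values understate the uncertainty, and the heavier-tailed t-distribution gives more accurate results.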

Unknown Population Standard Deviation: When the population standard deviation is unknown and needs to be estimated from the sample, the t-distribution is used to account for the uncertainty introduced by the estimation process.

Confidence Intervals: When calculating confidence intervals for the population mean based on a sample mean, the t-distribution is used. This accounts for the fact that the sample mean's distribution is less certain when the sample size is small.

Hypothesis Testing: When conducting hypothesis tests involving the population mean, and the sample size is small and/or the population standard deviation is unknown, the t-distribution is used for calculating p-values and making decisions about the null hypothesis.

# Q12: What is t-statistic? State the formula for t-statistic.

The t-statistic is a crucial statistic used in hypothesis testing and confidence interval calculations when the sample size is small or when the population standard deviation is unknown. It measures how many standard errors the sample mean is away from the hypothesized population mean under the null hypothesis. The t-statistic is used to assess whether the observed difference between the sample mean and the hypothesized population mean is statistically significant.

The formula for the t-statistic depends on the context in which it is used. Here are two common scenarios:

For One-Sample t-Test:

Suppose you have a sample of size n, a sample mean x̄, a sample standard deviation s, and a hypothesized population mean μ0. The formula for the t-statistic for a one-sample t-test is:

t = (x̄ - μ0) / (s / √n)

For Two-Sample t-Test (Independent Samples):

Suppose you have two independent samples, each with its own sample size (n1, n2), sample mean (x̄1, x̄2), and sample standard deviation (s1, s2). The formula for the t-statistic for an independent two-sample t-test that does not assume equal population variances (Welch's t-test) is:

t = (x̄1 - x̄2) / √((s1² / n1) + (s2² / n2))
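
Both formulas translate directly into code. The sketch below uses only the standard library; the function names and the input numbers are illustrative and were picked so the arithmetic is easy to follow.

```python
import math


def one_sample_t(x_bar, mu_0, s, n):
    """t = (x_bar - mu_0) / (s / sqrt(n))"""
    return (x_bar - mu_0) / (s / math.sqrt(n))


def two_sample_t(x_bar_1, s_1, n_1, x_bar_2, s_2, n_2):
    """t = (x_bar_1 - x_bar_2) / sqrt(s_1^2 / n_1 + s_2^2 / n_2)"""
    return (x_bar_1 - x_bar_2) / math.sqrt(s_1**2 / n_1 + s_2**2 / n_2)


# Hypothetical summary statistics
t_one = one_sample_t(x_bar=52.0, mu_0=50.0, s=4.0, n=16)
t_two = two_sample_t(52.0, 4.0, 16, 50.0, 3.0, 9)
print(f"one-sample t = {t_one:.4f}")
print(f"two-sample t = {t_two:.4f}")
```

In the one-sample case the standard error is 4 / √16 = 1, so the sample mean sits 2 standard errors above the hypothesized mean.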

# Q13. A coffee shop owner wants to estimate the average daily revenue for their shop. They take a random sample of 50 days and find the sample mean revenue to be $500 with a standard deviation of $50. Estimate the population mean revenue with a 95% confidence interval.
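
A possible worked solution is sketched below. It treats the $50 as the sample standard deviation, so a t-based interval with 49 degrees of freedom is used; that reading of the question is an assumption.

```python
import math
from scipy import stats

n = 50
sample_mean = 500.0   # dollars
sample_std = 50.0     # dollars, assumed to be the sample standard deviation
confidence = 0.95

standard_error = sample_std / math.sqrt(n)
# t-based interval with n - 1 degrees of freedom
lower, upper = stats.t.interval(confidence, n - 1, loc=sample_mean, scale=standard_error)
print(f"95% CI for mean daily revenue: (${lower:.2f}, ${upper:.2f})")
```

Under these assumptions the interval runs from roughly $485.8 to $514.2, so the owner can be 95% confident that the true average daily revenue lies in that range.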

# Q14. A researcher hypothesizes that a new drug will decrease blood pressure by 10 mmHg. They conduct a clinical trial with 100 patients and find that the sample mean decrease in blood pressure is 8 mmHg with a standard deviation of 3 mmHg. Test the hypothesis with a significance level of 0.05.
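
One way to set this up is as a two-sided one-sample t-test with H0: μ = 10 and Ha: μ ≠ 10, where μ is the true mean decrease in blood pressure. The two-sided alternative is an assumption, since the question does not state a direction.

```python
import math
from scipy import stats

mu_0 = 10.0          # hypothesized mean decrease (mmHg)
n = 100
sample_mean = 8.0    # observed mean decrease (mmHg)
sample_std = 3.0     # sample standard deviation (mmHg)
alpha = 0.05

# t = (x_bar - mu_0) / (s / sqrt(n))
t_stat = (sample_mean - mu_0) / (sample_std / math.sqrt(n))
df = n - 1
# Two-sided p-value from the t-distribution
p_value = 2 * stats.t.sf(abs(t_stat), df)
reject_h0 = p_value < alpha

print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.2e}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The sample mean lies about 6.67 standard errors below 10 mmHg, so the p-value is far below 0.05 and H0 is rejected: the data are not consistent with a mean decrease of 10 mmHg.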

# Q15. An electronics company produces a certain type of product with a mean weight of 5 pounds and a standard deviation of 0.5 pounds. A random sample of 25 products is taken, and the sample mean weight is found to be 4.8 pounds. Test the hypothesis that the true mean weight of the products is less than 5 pounds with a significance level of 0.01.
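
A possible solution is sketched below as a left-tailed test with H0: μ ≥ 5 and Ha: μ < 5. It treats the 0.5 pounds as the known population standard deviation, which makes a z-test appropriate; that reading is an assumption, and it allows the sketch to rely on the standard library only.

```python
import math
from statistics import NormalDist

mu_0 = 5.0          # hypothesized mean weight (pounds)
sigma = 0.5         # population standard deviation, assumed known (pounds)
n = 25
sample_mean = 4.8   # pounds
alpha = 0.01

# z = (x_bar - mu_0) / (sigma / sqrt(n))
z_stat = (sample_mean - mu_0) / (sigma / math.sqrt(n))
# Left-tailed p-value: probability of a z value at least this small under H0
p_value = NormalDist().cdf(z_stat)
# Critical value that cuts off the lowest 1% of the standard normal
z_crit = NormalDist().inv_cdf(alpha)
reject_h0 = p_value < alpha

print(f"z = {z_stat:.3f}, critical value = {z_crit:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Here z is about -2.0 and the p-value is about 0.023, which is larger than 0.01, so H0 is not rejected: at the 1% level there is not enough evidence that the true mean weight is below 5 pounds.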

# Q16. Two groups of students are given different study materials to prepare for a test. The first group (n1 = 30) has a mean score of 80 with a standard deviation of 10, and the second group (n2 = 40) has a mean score of 75 with a standard deviation of 8. Test the hypothesis that the population means for the two groups are equal with a significance level of 0.01.
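
Because only summary statistics are given, one option is `scipy.stats.ttest_ind_from_stats`. The sketch below tests H0: μ1 = μ2 against Ha: μ1 ≠ μ2 and uses Welch's version (`equal_var=False`), which is an assumption since the question does not say whether the variances are equal.

```python
from scipy import stats

alpha = 0.01

# Welch's two-sample t-test computed from summary statistics
result = stats.ttest_ind_from_stats(
    mean1=80.0, std1=10.0, nobs1=30,
    mean2=75.0, std2=8.0, nobs2=40,
    equal_var=False,
)
t_stat, p_value = result.statistic, result.pvalue
reject_h0 = p_value < alpha

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The statistic is about 2.25 and the two-sided p-value falls between 0.01 and 0.05, so H0 is not rejected at α = 0.01. The pooled-variance version (`equal_var=True`) leads to the same decision for these numbers.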

# Q17. A marketing company wants to estimate the average number of ads watched by viewers during a TV program. They take a random sample of 50 viewers and find that the sample mean is 4 with a standard deviation of 1.5. Estimate the population mean with a 99% confidence interval.
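
A possible worked solution follows, this time building the interval by hand from the critical value and the standard error. It assumes the 1.5 is the sample standard deviation, so the t-distribution with 49 degrees of freedom is used.

```python
import math
from scipy import stats

n = 50
sample_mean = 4.0    # ads watched
sample_std = 1.5     # assumed to be the sample standard deviation
confidence = 0.99

standard_error = sample_std / math.sqrt(n)
# Two-sided critical value: 0.5% in each tail for a 99% interval
t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
margin = t_crit * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"t critical = {t_crit:.3f}, margin of error = {margin:.3f}")
print(f"99% CI for mean ads watched: ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the interval is roughly 3.43 to 4.57 ads. It is wider than a 95% interval from the same data would be, because a higher confidence level requires a larger critical value.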