In [None]:
Q1: What is Estimation Statistics? Explain point estimate and interval estimate.

In [None]:
Estimation Statistics
Estimation statistics is a branch of statistics that focuses on estimating population parameters based on sample data. 
The main goal is to provide a way to infer characteristics about a larger population without having to collect data 
from every individual within that population.

Estimation can be divided into two main types: point estimation and interval estimation.

Point Estimate
A point estimate provides a single value as an estimate of a population parameter. It is derived from sample data and 
serves as the best guess of the parameter's true value. 

Examples:
- Sample Mean: The average of a sample (\(\bar{x}\)) is used as a point estimate for the population mean (\(\mu\)).
- Sample Proportion: The proportion of successes in a sample (\(\hat{p}\)) is used as a point estimate for the 
population proportion (\(p\)).

Characteristics:
- Simplicity: Point estimates are straightforward to compute and interpret.
- Limited Information: They provide no indication of the uncertainty or variability associated with the estimate.

Interval Estimate
An interval estimate provides a range of values within which the population parameter is expected to lie, along with 
a certain level of confidence. This approach incorporates the idea of uncertainty by acknowledging that point estimates
can vary due to sampling error.

Examples:
- Confidence Interval for the Mean: A confidence interval (e.g., 95% CI) for the population mean might be expressed as 
\((\bar{x} - E, \bar{x} + E)\), where \(E\) is the margin of error.
- Confidence Interval for Proportions: For a sample proportion, the interval might be expressed as 
\((\hat{p} - E, \hat{p} + E)\).

Characteristics:
- Provides Information about Uncertainty: Interval estimates convey the reliability of the estimate and how much 
uncertainty is involved.
- Level of Confidence: The confidence level (e.g., 95% or 99%) is the long-run proportion of intervals, constructed 
this way from repeated samples, that would contain the true parameter.
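
As an illustration of the proportion case, here is a minimal sketch that computes a point estimate and a normal-approximation interval estimate. The function name `proportion_estimates` and the counts (56 successes in 100 trials) are hypothetical, and the normal approximation is an assumption that is only reasonable for reasonably large samples.

```python
import math
from scipy import stats

def proportion_estimates(successes, n, confidence_level=0.95):
    """Return (point_estimate, lower, upper) for a population proportion."""
    p_hat = successes / n                             # point estimate of p
    z = stats.norm.ppf((1 + confidence_level) / 2)    # normal critical value
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)   # margin of error E
    return p_hat, p_hat - margin, p_hat + margin

# Hypothetical data: 56 successes out of 100 trials
p_hat, lower, upper = proportion_estimates(56, 100)
print(f"Point estimate: {p_hat:.3f}")
print(f"95% interval estimate: ({lower:.3f}, {upper:.3f})")
```

The point estimate is a single number, while the interval widens or narrows with the sample size and the chosen confidence level.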

In [None]:
Q2. Write a Python function to estimate the population mean using a sample mean and standard deviation.

In [None]:
import numpy as np
import scipy.stats as stats

def estimate_population_mean(sample_mean, sample_std, sample_size, confidence_level=0.95):
    """
    Estimate the population mean using a sample mean and standard deviation.
    
    Parameters:
    sample_mean (float): The mean of the sample
    sample_std (float): The standard deviation of the sample
    sample_size (int): The size of the sample
    confidence_level (float): The confidence level for the interval (default is 0.95)
    
    Returns:
    tuple: A tuple containing the estimated population mean, 
           lower bound of the confidence interval, 
           and upper bound of the confidence interval
    """
    # Calculate the standard error
    standard_error = sample_std / np.sqrt(sample_size)
    
    # Calculate the critical value
    critical_value = stats.t.ppf((1 + confidence_level) / 2, df=sample_size - 1)
    
    # Calculate the margin of error
    margin_of_error = critical_value * standard_error
    
    # Calculate confidence interval
    lower_bound = sample_mean - margin_of_error
    upper_bound = sample_mean + margin_of_error
    
    return sample_mean, lower_bound, upper_bound

# Example usage
sample_mean = 50
sample_std = 10
sample_size = 30
confidence_level = 0.95

estimated_mean, lower_ci, upper_ci = estimate_population_mean(sample_mean, sample_std, sample_size, confidence_level)
print(f"Estimated Population Mean: {estimated_mean}")
print(f"{confidence_level:.0%} Confidence Interval: ({lower_ci:.2f}, {upper_ci:.2f})")

In [None]:
Q3: What is Hypothesis testing? Why is it used? State the importance of Hypothesis testing.

In [None]:
Hypothesis Testing
Hypothesis testing is a statistical method used to make decisions about a population based on sample data. It involves
formulating a statement (hypothesis) about a population parameter and then using sample data to determine whether to 
reject that hypothesis.

Key Components of Hypothesis Testing
1. Null Hypothesis: This is the default assumption that there is no effect or no difference. It represents a statement 
of no change or status quo.
2. Alternative Hypothesis: This is the hypothesis that indicates the presence of an effect or a difference. It is what 
the researcher aims to support.
3. Test Statistic: A standardized value calculated from sample data, which is used to determine the likelihood of 
observing the sample data under the null hypothesis.
4. Significance Level: The probability threshold (commonly set at 0.05) below which the null hypothesis will be 
rejected. It represents the risk of making a Type I error (rejecting a true null hypothesis).
5. P-Value: The probability of obtaining a test statistic at least as extreme as the one observed, assuming the null 
hypothesis is true. A smaller p-value indicates stronger evidence against the null hypothesis.
6. Decision: Based on the p-value and the significance level, researchers decide whether to reject the null hypothesis 
or fail to reject it.

Why is Hypothesis Testing Used?

- Decision Making: It helps researchers make informed decisions based on data rather than intuition or speculation.
- Scientific Validation: Hypothesis testing is essential in scientific research to validate findings and ensure they 
are not due to random chance.
- Comparative Analysis: It allows for comparing groups, treatments, or conditions to determine if differences are 
statistically significant.

Importance of Hypothesis Testing
1. Objective Analysis: Provides a structured framework for evaluating evidence, reducing biases in decision-making.
2. Quantitative Assessment: Allows researchers to quantify the strength of evidence against the null hypothesis 
through p-values and confidence intervals.
3. Guiding Research: Helps researchers determine the direction of further studies based on whether hypotheses are 
supported or refuted.
4. Error Management: Aids in controlling errors (Type I and Type II) by defining acceptable risk levels, which is 
critical in many fields such as medicine, psychology, and social sciences.
5. Reproducibility: Enhances the credibility of research findings, as hypotheses can be tested and validated by others.

In [None]:
Q4. Create a hypothesis that states whether the average weight of male college students is greater than
the average weight of female college students.

In [None]:
Hypothesis Statement
To compare the average weights of male and female college students, we can formulate the null and alternative 
hypotheses as follows:
1. Null Hypothesis: The average weight of male college students is equal to or less than the average weight of female 
college students.
2. Alternative Hypothesis: The average weight of male college students is greater than the average weight of female 
college students.

- This hypothesis can be tested using sample data from male and female college students, applying appropriate 
statistical methods (e.g., a t-test) to determine whether there is enough evidence to reject the null hypothesis in 
favor of the alternative hypothesis.
- The significance level (typically \(\alpha = 0.05\)) will be used to determine the threshold for rejecting the null 
hypothesis based on the calculated p-value.

In [None]:
Q5. Write a Python script to conduct a hypothesis test on the difference between two population means,
given a sample from each population.

In [None]:
import numpy as np
import scipy.stats as stats

def hypothesis_test(sample1, sample2, alpha=0.05):
    """
    Conduct a one-sided two-sample t-test of whether the mean of population 1 exceeds the mean of population 2.

    Parameters:
    sample1 (list or array): Sample data from population 1
    sample2 (list or array): Sample data from population 2
    alpha (float): Significance level for the hypothesis test

    Returns:
    None
    """
    
    # Calculate the sample means and standard deviations
    mean1 = np.mean(sample1)
    mean2 = np.mean(sample2)
    std1 = np.std(sample1, ddof=1)  # Sample standard deviation
    std2 = np.std(sample2, ddof=1)

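    # Conduct a right-tailed t-test (H1: mean of sample1 > mean of sample2).
    # ttest_ind pools the variances by default (equal_var=True); the
    # `alternative` argument requires SciPy 1.6 or later.
    t_statistic, p_value = stats.ttest_ind(sample1, sample2, alternative='greater')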

    # Output the results
    print(f"Sample 1 Mean: {mean1:.2f}, Sample 2 Mean: {mean2:.2f}")
    print(f"Sample 1 Std Dev: {std1:.2f}, Sample 2 Std Dev: {std2:.2f}")
    print(f"T-statistic: {t_statistic:.2f}, P-value: {p_value:.4f}")

    # Decision
    if p_value < alpha:
        print("Reject the null hypothesis: There is significant evidence that the average weight of male college students is greater.")
    else:
        print("Fail to reject the null hypothesis: No significant evidence that the average weight of male college students is greater.")

# Example data
sample_male_weights = [70, 75, 80, 85, 90, 78, 82, 76, 84, 88]  # Sample from male college students
sample_female_weights = [65, 68, 70, 72, 75, 67, 71, 74, 69, 66]  # Sample from female college students

# Conduct the hypothesis test
hypothesis_test(sample_male_weights, sample_female_weights, alpha=0.05)

In [None]:
Q6: What is a null and alternative hypothesis? Give some examples.

In [None]:
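Null Hypothesis (\(H_0\)): This is a statement that there is no effect, no difference, or no relationship between groups or 
variables in a study. It serves as the baseline that the sample data are tested against.

Alternative Hypothesis (\(H_1\)): This is a statement that an effect, difference, or relationship is present. 
It is the claim the researcher seeks evidence for; data can support it but never prove it.

Examples:
- Weights of college students (Q4): \(H_0\): the mean weight of males is less than or equal to that of females; 
\(H_1\): the mean weight of males is greater.
- Blood pressure drug (Q14): \(H_0\): the mean decrease in blood pressure is 10 mmHg; \(H_1\): the mean decrease is 
less than 10 mmHg.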

In [None]:
Q7: Write down the steps involved in hypothesis testing.

In [None]:
Steps in Hypothesis Testing

1. Formulate the Hypotheses:
   - Null Hypothesis: State the null hypothesis, which represents no effect or no difference.
   - Alternative Hypothesis: State the alternative hypothesis, which represents the effect or difference you expect 
to find.

2. Choose the Significance Level:
   - Select a significance level, usually set at 0.05, 0.01, or 0.10. This value defines the threshold for rejecting 
the null hypothesis.

3. Collect Data:
   - Gather the relevant data through experiments, surveys, or observations. Ensure that the data is representative 
of the population being studied.

4. Select the Appropriate Test:
   - Choose a statistical test based on the type of data, the sample size, and the hypotheses (e.g., t-test, chi-square
test, ANOVA).

5. Calculate the Test Statistic:
   - Use the selected statistical test to calculate the test statistic from the sample data. This statistic will help 
determine how far the sample data is from the null hypothesis.

6. Determine the p-value:
   - Calculate the p-value, which is the probability of observing the test statistic (or something more extreme) under 
the null hypothesis. 

7. Make a Decision:
   - Compare the p-value to the significance level:
     - If \(p \leq \alpha\): Reject the null hypothesis.
     - If \(p > \alpha\): Fail to reject the null hypothesis.

8. Draw a Conclusion:
   - Summarize the findings, including whether the null hypothesis was rejected or not. Interpret the results in the 
context of the research question.

9. Report the Results:
   - Clearly present the results, including the test statistic, p-value, confidence intervals, and any relevant graphs 
or tables. Discuss the implications of the findings.
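
The steps above can be sketched in code as follows. This is only an illustration: the hypothesized mean `mu_0 = 100` and the ten data values are made up, and a two-sided one-sample t-test is assumed as the appropriate test.

```python
import numpy as np
from scipy import stats

# Step 1: hypotheses (hypothetical): H0: mu = 100, H1: mu != 100
mu_0 = 100

# Step 2: significance level
alpha = 0.05

# Step 3: collect data (made-up measurements for illustration)
sample = np.array([102, 98, 105, 110, 99, 104, 101, 107, 103, 100])

# Steps 4-6: choose a one-sample t-test, compute the statistic and p-value
t_statistic, p_value = stats.ttest_1samp(sample, popmean=mu_0)

# Step 7: decision rule
reject_null = p_value <= alpha

# Steps 8-9: conclusion and report
print(f"t = {t_statistic:.3f}, p = {p_value:.4f}, alpha = {alpha}")
print("Reject H0" if reject_null else "Fail to reject H0")
```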

In [None]:
Q8. Define p-value and explain its significance in hypothesis testing.

In [None]:
Definition of p-value
The p-value is a statistical measure that helps to determine the significance of the results obtained from a hypothesis 
test. Specifically, it is the probability of observing the test statistic, or something more extreme, assuming that 
the null hypothesis is true.

Significance of p-value in Hypothesis Testing
1. Decision-Making Tool: The p-value aids researchers in deciding whether to reject the null hypothesis:
   - A low p-value (typically \(p \leq 0.05\)) indicates strong evidence against the null hypothesis, 
leading to its rejection.
   - A high p-value (typically (p > 0.05)) suggests insufficient evidence to reject the null hypothesis, so it is not 
rejected.

2. Measures Strength of Evidence: The p-value quantifies how compatible the sample data is with the null hypothesis. 
Smaller p-values indicate that the observed data is less likely under the null hypothesis, suggesting that an effect or
difference is present.

3. Interpretation:
   - \(p < 0.01\): Strong evidence against \(H_0\).
   - \(0.01 \leq p < 0.05\): Moderate evidence against \(H_0\).
   - \(0.05 \leq p < 0.10\): Weak evidence against \(H_0\).
   - \(p \geq 0.10\): Weak to no evidence against \(H_0\).

4. Contextual Interpretation: The significance of a p-value depends on the context of the study and the chosen 
significance level. Different fields may have varying thresholds for what constitutes "significant."

5. Not a Measure of Effect Size: It is crucial to understand that the p-value does not indicate the magnitude of an 
effect or difference. A small p-value can arise from large sample sizes, even with trivial differences, while a larger
p-value might indicate no substantial evidence against \(H_0\).
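
To make the last point concrete, here is a small sketch that converts a t-statistic into a two-sided p-value and applies it to the same tiny difference at two sample sizes. The helper names and the numbers (a difference of 0.5 with a standard deviation of 10) are hypothetical.

```python
from scipy import stats

def two_sided_p_value(t_statistic, df):
    """P(|T| >= |t|) for T following Student's t with df degrees of freedom."""
    return 2 * stats.t.sf(abs(t_statistic), df)

def one_sample_t(mean_diff, sample_std, n):
    """t-statistic for an observed mean difference from the null value."""
    return mean_diff / (sample_std / n ** 0.5)

# Same small difference, two hypothetical sample sizes
for n in (30, 10_000):
    t_stat = one_sample_t(0.5, 10, n)
    p = two_sided_p_value(t_stat, n - 1)
    print(f"n={n:>6}: t = {t_stat:.3f}, p = {p:.6f}")
```

The effect is identical in both rows; only the sample size changes, yet the p-value moves from clearly non-significant to very small.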

In [None]:
Q9. Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom
parameter set to 10.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

# Set the degrees of freedom
df = 10

# Generate x values
x = np.linspace(-5, 5, 1000)

# Calculate the t-distribution PDF
y = t.pdf(x, df)

# Create the plot
plt.figure(figsize=(10, 6))
plt.plot(x, y, label=f"Student's t-distribution (df={df})", color='blue')
plt.title("Student's t-Distribution")
plt.xlabel("x")
plt.ylabel("Probability Density Function (PDF)")
plt.axhline(0, color='black', linewidth=0.5, linestyle='--')
plt.axvline(0, color='red', linewidth=0.5, linestyle='--')
plt.grid()
plt.legend()
plt.show()

In [None]:
Q10. Write a Python program to calculate the two-sample t-test for independent samples, given two
random samples of equal size and a null hypothesis that the population means are equal.

In [None]:
import numpy as np
from scipy import stats

# Generate two random samples of equal size
np.random.seed(0)  # For reproducibility
sample_size = 30
sample1 = np.random.normal(loc=50, scale=10, size=sample_size)
sample2 = np.random.normal(loc=55, scale=10, size=sample_size)

# Display the samples
print("Sample 1:", sample1)
print("Sample 2:", sample2)

# Perform the two-sample t-test
t_statistic, p_value = stats.ttest_ind(sample1, sample2)

# Print the results
print("\nTwo-sample t-test results:")
print("T-statistic:", t_statistic)
print("P-value:", p_value)

# Set a significance level
alpha = 0.05

# Decision based on the p-value
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the population means.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the population means.")


In [None]:
Q11: What is Student’s t distribution? When to use the t-Distribution.

In [None]:
What is Student’s t-Distribution?
Student’s t-distribution, commonly referred to as the t-distribution, is a type of probability distribution that is 
symmetric and bell-shaped, similar to the normal distribution but with heavier tails. This distribution is particularly
useful in statistics for small sample sizes and when the population standard deviation is unknown.

Key Characteristics:
1. Shape: The t-distribution is similar in shape to the normal distribution but has thicker tails. This means that it 
provides a greater probability for extreme values, which helps account for the uncertainty introduced by estimating the 
population standard deviation from a small sample.
2. Degrees of Freedom: The shape of the t-distribution is determined by the degrees of freedom (df), which is related 
to the sample size. As the sample size increases, the t-distribution approaches the normal distribution. Specifically, 
the df is typically calculated as \(n - 1\) for a single sample, where \(n\) is the sample size.

When to Use the t-Distribution
1. Small Sample Sizes: Use the t-distribution when you have a small sample size (typically \(n < 30\)). The heavier tails
of the t-distribution help to provide a more accurate estimation of the population parameters under these conditions.
2. Unknown Population Standard Deviation: When the population standard deviation is unknown and needs to be estimated
from the sample data, the t-distribution should be used instead of the normal distribution.
3. Hypothesis Testing: The t-distribution is frequently used in hypothesis testing, particularly in tests such as:
   - One-sample t-tests
   - Two-sample t-tests (independent samples)
   - Paired sample t-tests (dependent samples)
4. Confidence Intervals: When constructing confidence intervals for the mean of a small sample with an unknown 
population standard deviation, the t-distribution is the appropriate choice.
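
A quick way to see the heavier tails and the convergence to the normal distribution is to compare critical values. The sketch below assumes a 95% two-sided confidence level and an arbitrary set of degrees of freedom chosen for illustration.

```python
from scipy import stats

confidence_level = 0.95
upper_tail = (1 + confidence_level) / 2

# Normal critical value, the limit of the t critical value as df grows
z_critical = stats.norm.ppf(upper_tail)

# t critical values for increasing degrees of freedom
t_criticals = {df: stats.t.ppf(upper_tail, df) for df in (5, 10, 30, 100, 1000)}

for df, t_c in t_criticals.items():
    print(f"df = {df:>4}: t critical = {t_c:.4f}")
print(f"normal   : z critical = {z_critical:.4f}")
```

The t critical value is always larger than the normal one, so intervals based on it are wider, and the gap shrinks as the degrees of freedom increase.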

In [None]:
Q12: What is t-statistic? State the formula for t-statistic.

In [None]:
What is t-Statistic?
The t-statistic is a standardized value that is calculated from sample data during a hypothesis test. It is used to 
determine whether to reject the null hypothesis based on the sample data. The t-statistic helps assess how much the 
sample mean deviates from the population mean, considering the variability of the data and the size of the sample.

Formula for t-Statistic
The formula for calculating the t-statistic depends on the context of the hypothesis test (e.g., one-sample, two-sample,
or paired sample t-tests). Here are the formulas for the different scenarios:

1. One-Sample t-Test
For a one-sample t-test, the t-statistic is calculated as follows:

\[ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \]

Where:
- \(\bar{X}\) = Sample mean
- \(\mu\) = Population mean (under the null hypothesis)
- \(s\) = Sample standard deviation
- \(n\) = Sample size

2. Two-Sample t-Test (Independent Samples)
For a two-sample t-test comparing two independent samples, the formula is:

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

Where:
- \(\bar{X}_1\) = Mean of the first sample
- \(\bar{X}_2\) = Mean of the second sample
- \(s_1\) = Standard deviation of the first sample
- \(s_2\) = Standard deviation of the second sample
- \(n_1\) = Size of the first sample
- \(n_2\) = Size of the second sample
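
Both formulas can be checked against SciPy. The sketch below implements them directly and compares the results with `scipy.stats.ttest_1samp` and `scipy.stats.ttest_ind`. The function names are hypothetical, the data are the example weights from Q5, and the null mean of 75 is made up. Note that the two-sample formula above uses separate variances, so it corresponds to `equal_var=False`.

```python
import numpy as np
from scipy import stats

def one_sample_t_statistic(sample, mu):
    """t = (x_bar - mu) / (s / sqrt(n)), with s the sample standard deviation."""
    x = np.asarray(sample, dtype=float)
    return (x.mean() - mu) / (x.std(ddof=1) / np.sqrt(x.size))

def two_sample_t_statistic(sample1, sample2):
    """t = (x1_bar - x2_bar) / sqrt(s1^2/n1 + s2^2/n2), variances not pooled."""
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    standard_error = np.sqrt(x1.var(ddof=1) / x1.size + x2.var(ddof=1) / x2.size)
    return (x1.mean() - x2.mean()) / standard_error

a = [70, 75, 80, 85, 90, 78, 82, 76, 84, 88]
b = [65, 68, 70, 72, 75, 67, 71, 74, 69, 66]

t_one = one_sample_t_statistic(a, mu=75)
t_two = two_sample_t_statistic(a, b)
print(f"one-sample: {t_one:.4f} vs SciPy {stats.ttest_1samp(a, 75).statistic:.4f}")
print(f"two-sample: {t_two:.4f} vs SciPy {stats.ttest_ind(a, b, equal_var=False).statistic:.4f}")
```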

In [None]:
Q13. A coffee shop owner wants to estimate the average daily revenue for their shop. They take a random
sample of 50 days and find the sample mean revenue to be $500 with a standard deviation of $50.
Estimate the population mean revenue with a 95% confidence interval.

In [None]:
To estimate the population mean revenue with a 95% confidence interval, we can use the formula for the confidence 
interval for the mean when the population standard deviation is unknown:

\[ \text{Confidence Interval} = \bar{X} \pm t \cdot \frac{s}{\sqrt{n}} \]

Where:
- \(\bar{X}\) = Sample mean
- \(t\) = t-value from the t-distribution for the desired confidence level and degrees of freedom
- \(s\) = Sample standard deviation
- \(n\) = Sample size

Given Data:
- Sample mean \(\bar{X}\) = $500
- Sample standard deviation \(s\) = $50
- Sample size \(n\) = 50
- Confidence level = 95%

Step 1: Determine the t-value
The degrees of freedom (df) is calculated as:

\[ \text{df} = n - 1 = 50 - 1 = 49 \]

Using a t-table or a statistical calculator for a 95% confidence level and 49 degrees of freedom, the t-value is 
approximately:

\[ t \approx 2.0096 \]

Step 2: Calculate the Standard Error (SE)

\[ \text{SE} = \frac{s}{\sqrt{n}} = \frac{50}{\sqrt{50}} \approx 7.071 \]

Step 3: Calculate the Margin of Error (ME)
    
\[ \text{ME} = t \cdot \text{SE} = 2.0096 \times 7.071 \approx 14.21 \]

Step 4: Calculate the Confidence Interval

Now we can calculate the confidence interval:

\[ \text{Confidence Interval} = \bar{X} \pm \text{ME} = 500 \pm 14.21 \]

So the confidence interval is:

\[ (500 - 14.21, 500 + 14.21) = (485.79, 514.21) \]
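
The same calculation can be reproduced in a few lines as a check on the hand arithmetic. This is a sketch using the figures given in the question; the variable names are illustrative.

```python
import math
from scipy import stats

sample_mean, sample_std, n = 500, 50, 50
confidence_level = 0.95

# Critical t-value for a two-sided interval with n - 1 degrees of freedom
t_value = stats.t.ppf((1 + confidence_level) / 2, df=n - 1)

standard_error = sample_std / math.sqrt(n)
margin_of_error = t_value * standard_error

ci_lower = sample_mean - margin_of_error
ci_upper = sample_mean + margin_of_error
print(f"t = {t_value:.4f}, SE = {standard_error:.3f}, ME = {margin_of_error:.2f}")
print(f"95% CI: ({ci_lower:.2f}, {ci_upper:.2f})")
```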

In [None]:
Q14. A researcher hypothesizes that a new drug will decrease blood pressure by 10 mmHg. They conduct a
clinical trial with 100 patients and find that the sample mean decrease in blood pressure is 8 mmHg with a
standard deviation of 3 mmHg. Test the hypothesis with a significance level of 0.05.

In [None]:
To test the hypothesis that the new drug will decrease blood pressure by 10 mmHg, we will conduct a one-sample t-test.
Here are the steps to perform the hypothesis test:

Step 1: Define the Hypotheses
- Null Hypothesis (\(H_0\)): The mean decrease in blood pressure is 10 mmHg.
- Alternative Hypothesis (\(H_1\)): The mean decrease in blood pressure is less than 10 mmHg.

Step 2: Gather the Data
- Sample mean \(\bar{X}\) = 8 mmHg
- Population mean under the null hypothesis \(\mu\) = 10 mmHg
- Sample standard deviation \(s\) = 3 mmHg
- Sample size \(n\) = 100

Step 3: Calculate the Test Statistic
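Using the formula for the t-statistic:

\[ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{8 - 10}{3 / \sqrt{100}} = \frac{-2}{0.3} \approx -6.67 \]

Step 4: Determine the Critical Value
For a left-tailed test at \(\alpha = 0.05\) with \(n - 1 = 99\) degrees of freedom, the critical value is 
\(t_{0.05, 99} \approx -1.660\).

Step 5: Compare the Test Statistic to the Critical Value
- Test Statistic: \(t \approx -6.67\)
- Critical Value: \(t_{0.05, 99} \approx -1.660\)

Since \(-6.67 < -1.660\), we reject the null hypothesis. At the 0.05 significance level, the data indicate that the 
mean decrease in blood pressure is less than 10 mmHg.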
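
The test can be reproduced from the summary statistics alone. The sketch below assumes the left-tailed alternative stated in Step 1; the variable names are illustrative.

```python
import math
from scipy import stats

sample_mean, mu_0, sample_std, n = 8, 10, 3, 100
alpha = 0.05

# One-sample t-statistic from summary statistics
t_statistic = (sample_mean - mu_0) / (sample_std / math.sqrt(n))

# Left-tailed critical value and p-value with n - 1 degrees of freedom
critical_value = stats.t.ppf(alpha, df=n - 1)
p_value = stats.t.cdf(t_statistic, df=n - 1)

reject_null = t_statistic < critical_value
print(f"t = {t_statistic:.2f}, critical value = {critical_value:.3f}, p = {p_value:.2e}")
print("Reject H0" if reject_null else "Fail to reject H0")
```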

In [None]:
Q15. An electronics company produces a certain type of product with a mean weight of 5 pounds and a
standard deviation of 0.5 pounds. A random sample of 25 products is taken, and the sample mean weight
is found to be 4.8 pounds. Test the hypothesis that the true mean weight of the products is less than 5
pounds with a significance level of 0.01.

In [None]:
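Because the population standard deviation is given (0.5 pounds), a one-sample z-test is appropriate.

- Null Hypothesis (\(H_0\)): The true mean weight is 5 pounds.
- Alternative Hypothesis (\(H_1\)): The true mean weight is less than 5 pounds.

Test statistic:

\[ z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{4.8 - 5}{0.5 / \sqrt{25}} = \frac{-0.2}{0.1} = -2.0 \]

For a left-tailed test at \(\alpha = 0.01\), the critical value is \(z \approx -2.326\), and the p-value for 
\(z = -2.0\) is about 0.0228. Since \(-2.0 > -2.326\) (equivalently, \(0.0228 > 0.01\)), we fail to reject the null 
hypothesis.

At the 0.01 significance level, there is not enough evidence to conclude that the true mean weight of the products is 
less than 5 pounds.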
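
A short sketch of this test in code, assuming a one-sample z-test is appropriate because the population standard deviation is given in the question. The variable names are illustrative.

```python
import math
from scipy import stats

pop_mean, pop_std = 5.0, 0.5      # values stated in the question
sample_mean, n = 4.8, 25
alpha = 0.01

# z-statistic using the known population standard deviation
z_statistic = (sample_mean - pop_mean) / (pop_std / math.sqrt(n))

# Left-tailed critical value and p-value from the standard normal
z_critical = stats.norm.ppf(alpha)
p_value = stats.norm.cdf(z_statistic)

reject_null = p_value < alpha
print(f"z = {z_statistic:.2f}, critical value = {z_critical:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```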

In [None]:
Q16. Two groups of students are given different study materials to prepare for a test. The first group (n1 =
30) has a mean score of 80 with a standard deviation of 10, and the second group (n2 = 40) has a mean
score of 75 with a standard deviation of 8. Test the hypothesis that the population means for the two
groups are equal with a significance level of 0.01.

In [None]:
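- Null Hypothesis (\(H_0\)): The two population means are equal.
- Alternative Hypothesis (\(H_1\)): The two population means are not equal.

Test statistic (two-sample t-test with separate variances):

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{80 - 75}{\sqrt{\frac{10^2}{30} + \frac{8^2}{40}}} = \frac{5}{\sqrt{4.933}} \approx 2.25 \]

For a two-tailed test at \(\alpha = 0.01\), the critical value is about \(\pm 2.67\) (roughly 54 degrees of freedom 
under the Welch approximation). Since \(|2.25| < 2.67\), we fail to reject the null hypothesis.

At the 0.01 significance level, there is not enough evidence to conclude that the population means for the two 
groups are different. The data do not support the claim that the study materials had a significant effect on the 
test scores.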
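
Since only summary statistics are given, the test can be run with `scipy.stats.ttest_ind_from_stats`. The sketch below assumes unequal variances (Welch's test, `equal_var=False`); that choice is an assumption and is not specified in the question.

```python
from scipy import stats

alpha = 0.01

# Two-sided test from summary statistics; Welch's version (unequal variances)
t_statistic, p_value = stats.ttest_ind_from_stats(
    mean1=80, std1=10, nobs1=30,
    mean2=75, std2=8, nobs2=40,
    equal_var=False,
)

reject_null = p_value < alpha
print(f"t = {t_statistic:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Setting `equal_var=True` instead gives the pooled-variance version of the test, which leads to the same decision for these numbers.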

In [None]:
Q17. A marketing company wants to estimate the average number of ads watched by viewers during a TV
program. They take a random sample of 50 viewers and find that the sample mean is 4 with a standard
deviation of 1.5. Estimate the population mean with a 99% confidence interval.

In [None]:
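With \(\bar{X} = 4\), \(s = 1.5\), and \(n = 50\), the standard error is \(1.5 / \sqrt{50} \approx 0.212\).

Using the normal critical value \(z \approx 2.576\), the 99% confidence interval for the population mean number of ads 
watched by viewers during the TV program is approximately:
(3.45, 4.55)

Using the t critical value for 49 degrees of freedom (about 2.680), as in Q13, the interval is slightly wider, about 
(3.43, 4.57).

This means we are 99% confident that the true population mean lies within this interval.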
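
The interval can be computed from the summary statistics as follows. The sketch shows both a t-based interval (49 degrees of freedom) and a normal-approximation interval, since the two differ slightly at this sample size. The helper name `mean_confidence_interval` is hypothetical.

```python
import math
from scipy import stats

def mean_confidence_interval(mean, std, n, confidence_level, use_t=True):
    """Two-sided confidence interval for a mean from summary statistics."""
    tail = (1 + confidence_level) / 2
    # t critical value with n - 1 df, or the normal critical value
    critical = stats.t.ppf(tail, df=n - 1) if use_t else stats.norm.ppf(tail)
    margin = critical * std / math.sqrt(n)
    return mean - margin, mean + margin

t_lower, t_upper = mean_confidence_interval(4, 1.5, 50, 0.99, use_t=True)
z_lower, z_upper = mean_confidence_interval(4, 1.5, 50, 0.99, use_t=False)
print(f"t-based 99% CI: ({t_lower:.2f}, {t_upper:.2f})")
print(f"z-based 99% CI: ({z_lower:.2f}, {z_upper:.2f})")
```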