Q1: What is Estimation Statistics? Explain point estimate and interval estimate.

In [1]:
# Estimation statistics uses sample data to estimate unknown population parameters. There are two main types of estimates:
# point estimates and interval estimates.

# Point Estimate: A single value computed from the sample that approximates the population parameter
# (for example, the sample mean as an estimate of the population mean).

# Interval Estimate: A range of values within which the true population parameter is likely to lie,
# stated together with a confidence level (for example, a 95% confidence interval).
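
As a minimal sketch of the difference, the snippet below computes both kinds of estimate from a small made-up sample (the `sample` values are hypothetical). It uses the normal critical value for simplicity; for a sample this small, a t critical value would give a somewhat wider interval.

```python
import math
import statistics

# Hypothetical sample of measurements (illustrative values only)
sample = [48.2, 51.5, 49.8, 52.1, 50.4, 47.9, 53.0, 50.7, 49.3, 51.1]

# Point estimate: a single number, here the sample mean
point_estimate = statistics.mean(sample)

# Interval estimate: point estimate +/- margin of error at 95% confidence
z_value = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
margin_of_error = z_value * standard_error
interval_estimate = (point_estimate - margin_of_error, point_estimate + margin_of_error)

print(f"Point estimate: {point_estimate:.2f}")
print(f"Interval estimate (95%): ({interval_estimate[0]:.2f}, {interval_estimate[1]:.2f})")
```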


Q2. Write a Python function to estimate the population mean using a sample mean and standard
deviation.

In [2]:
import scipy.stats as stats
import math

def estimate_population_mean(sample_mean, sample_std, sample_size, confidence_level=0.95):
    # Calculate the critical value (z) based on the confidence level
    z_value = stats.norm.ppf((1 + confidence_level) / 2)

    # Calculate the margin of error
    margin_of_error = z_value * (sample_std / math.sqrt(sample_size))

    # Calculate the confidence interval
    lower_bound = sample_mean - margin_of_error
    upper_bound = sample_mean + margin_of_error

    return lower_bound, upper_bound

# Example usage:
sample_mean = 50.5
sample_std = 10.2
sample_size = 100
confidence_level = 0.95

interval = estimate_population_mean(sample_mean, sample_std, sample_size, confidence_level)
print(f"Population mean estimate: {sample_mean:.2f}")
print(f"{confidence_level:.0%} Confidence Interval: ({interval[0]:.2f}, {interval[1]:.2f})")

Population mean estimate: 50.50
95% Confidence Interval: (48.50, 52.50)


Q3: What is Hypothesis testing? Why is it used? State the importance of Hypothesis testing.

In [3]:
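# Hypothesis testing is a statistical method for making inferences about a population from a sample of data. It involves
# formulating a hypothesis about a population parameter, collecting and analyzing data, and then deciding whether
# the evidence is consistent with or contradicts that hypothesis.
# Importance of Hypothesis Testing:
# 1. Inference about populations: conclusions about a whole population can be drawn from a sample.
# 2. Decision making: it gives an objective, repeatable rule for choosing between competing claims.
# 3. Risk assessment: the significance level fixes in advance the probability of a Type I error (rejecting a true null hypothesis).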

Q4. Create a hypothesis that states whether the average weight of male college students is greater than
the average weight of female college students.

In [4]:
# To compare the average weight of male and female college students, formulate a pair of null and
# alternative hypotheses.
# NULL HYPOTHESIS :
#     average weight of male <= average weight of female
# ALTERNATE HYPOTHESIS :
#     average weight of male > average weight of female

Q5. Write a Python script to conduct a hypothesis test on the difference between two population means,
given a sample from each population.

In [5]:
import scipy.stats as stats

def two_sample_t_test(sample1, sample2, alpha=0.05):
    # Perform Welch's two-sample t-test (two-sided; equal_var=False does not assume equal variances)
    t_stat, p_value = stats.ttest_ind(sample1, sample2, equal_var=False)

    # Compare p-value to the significance level (alpha)
    if p_value < alpha:
        print("Reject the null hypothesis")
        print("There is enough evidence to suggest a significant difference between the population means.")
    else:
        print("Fail to reject the null hypothesis")
        print("There is not enough evidence to suggest a significant difference between the population means.")

    # Print the t-statistic and p-value
    print(f"T-statistic: {t_stat:.4f}")
    print(f"P-value: {p_value:.4f}")

# Example usage:
# Assume 'sample_male' and 'sample_female' are the weight samples for male and female college students.
sample_male = [75, 80, 85, 78, 77, 82, 79, 81, 84, 76]
sample_female = [65, 70, 75, 68, 67, 72, 69, 71, 74, 66]

# Set the significance level (alpha)
alpha = 0.05

# Conduct the two-sample t-test
two_sample_t_test(sample_male, sample_female, alpha)

Reject the null hypothesis
There is enough evidence to suggest a significant difference between the population means.
T-statistic: 6.7049
P-value: 0.0000
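
The test above is two-sided, whereas the hypothesis pair in Q4 is one-sided (males heavier than females). A hedged sketch of the one-sided version follows; it assumes SciPy 1.6 or later, where `ttest_ind` accepts the `alternative` argument, and reuses the same illustrative weights.

```python
import scipy.stats as stats

# Same illustrative weight samples as in the example above
sample_male = [75, 80, 85, 78, 77, 82, 79, 81, 84, 76]
sample_female = [65, 70, 75, 68, 67, 72, 69, 71, 74, 66]

# H0: mean(male) <= mean(female)    H1: mean(male) > mean(female)
# alternative="greater" tests whether the first sample's mean is larger
# (assumes SciPy >= 1.6, where this argument is available).
result = stats.ttest_ind(sample_male, sample_female, equal_var=False, alternative="greater")
t_stat, p_value = result.statistic, result.pvalue

alpha = 0.05
decision = "Reject" if p_value < alpha else "Fail to reject"
print(f"T-statistic: {t_stat:.4f}")
print(f"One-sided P-value: {p_value:.6f}")
print(f"{decision} the null hypothesis at alpha = {alpha}")
```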


Q6: What is a null and alternative hypothesis? Give some examples.

In [6]:
# Null Hypothesis (H0): Represents a default or status quo assumption. It often states that there is no effect or no difference.
# Alternative Hypothesis (H1): Represents a statement that contradicts the null hypothesis, suggesting an effect, difference, or
# relationship.
# Examples:
# 1. Comparing the weights of male and female college students (one-sided):
#     H0: average weight of male <= average weight of female
#     H1: average weight of male > average weight of female
# 2. Checking whether a coin is fair (two-sided):
#     H0: probability of heads = 0.5
#     H1: probability of heads != 0.5

Q7: Write down the steps involved in hypothesis testing.

In [7]:
# The process of hypothesis testing generally involves the following steps:

# 1. Formulate Hypotheses:
#    Null Hypothesis: Represents a default or status quo assumption. It often states that there is no effect or no difference.
#    Alternative Hypothesis: Represents a statement that contradicts the null hypothesis, suggesting an effect, difference, or
#    relationship.
# 2. Collect and Analyze Data: Choose a significance level, collect a sample of data relevant to the hypothesis, and compute
#    the test statistic and p-value from it.
# 3. Make a Decision: Based on the analysis, decide whether to reject the null hypothesis in favor of the alternative hypothesis,
#    or fail to reject the null hypothesis.
# 4. Draw Conclusions: Interpret the results and draw conclusions about the population based on the sample data.
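
The steps above can be sketched on a small example. Everything here is hypothetical: the sample values, the hypothesized mean of 100, and the assumption that the population standard deviation (15) is known, which is what allows a z-test using only the standard library.

```python
import math
import statistics

# Step 1: formulate hypotheses (hypothetical values)
# H0: population mean = 100    H1: population mean != 100
hypothesized_mean = 100
population_std = 15  # assumed known for this sketch
alpha = 0.05

# Step 2: collect and analyze data (illustrative sample)
sample = [112, 98, 105, 121, 109, 94, 117, 103, 110, 108, 99, 115]
sample_mean = statistics.mean(sample)
z_stat = (sample_mean - hypothesized_mean) / (population_std / math.sqrt(len(sample)))
p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z_stat)))

# Step 3: make a decision
reject_null = p_value < alpha

# Step 4: draw a conclusion
print(f"Sample mean: {sample_mean:.2f}, z = {z_stat:.4f}, p-value = {p_value:.4f}")
if reject_null:
    print("Reject H0: the data suggest the population mean differs from 100.")
else:
    print("Fail to reject H0: the data are consistent with a population mean of 100.")
```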

Q8. Define p-value and explain its significance in hypothesis testing.

In [8]:
# P-value: The p-value, or probability value, measures the strength of evidence against a null hypothesis
# in hypothesis testing. It is the probability of obtaining results as extreme as, or more extreme than, the observed
# results, assuming that the null hypothesis is true.

# Significance in Hypothesis Testing: The p-value is compared with the chosen significance level (alpha). If p-value < alpha,
# the null hypothesis is rejected; otherwise we fail to reject it. A smaller p-value indicates stronger evidence against
# the null hypothesis; it is not the probability that the null hypothesis is true.
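
To make the definition concrete, here is a minimal sketch that turns a test statistic into a two-sided p-value. It assumes the statistic follows a standard normal distribution under the null hypothesis; the helper name `two_sided_p_value` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z_stat):
    """Probability, under H0, of a standard normal result at least as extreme as z_stat."""
    tail_area = 1 - NormalDist().cdf(abs(z_stat))
    return 2 * tail_area

alpha = 0.05
for z_stat in (0.5, 1.96, 3.0):
    p_value = two_sided_p_value(z_stat)
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"z = {z_stat:.2f}  p-value = {p_value:.4f}  ->  {decision}")
```

Larger values of |z| leave less area in the tails, so the p-value shrinks as the observed result moves further from what the null hypothesis predicts.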

Q9. Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom
parameter set to 10.

In [9]:
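import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Set the degrees of freedom
degrees_of_freedom = 10

# Generate x values
x_values = np.linspace(-4, 4, 1000)

# Calculate the probability density function (PDF) for the t-distribution
t_pdf = stats.t.pdf(x_values, df=degrees_of_freedom)

# Plot the PDF
plt.plot(x_values, t_pdf, label=f"t-distribution (df={degrees_of_freedom})")
plt.title("Student's t-Distribution")
plt.xlabel("t")
plt.ylabel("Probability density")
plt.legend()
plt.show()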


Q10. Write a Python program to calculate the two-sample t-test for independent samples, given two random samples of equal size and a null hypothesis that the population means are equal.

In [10]:
import numpy as np
from scipy.stats import ttest_ind

def two_sample_t_test(sample1, sample2, alpha=0.05):
    # Perform two-sample t-test assuming equal variances
    t_stat, p_value = ttest_ind(sample1, sample2)

    # Display the results
    print("Sample 1 Mean:", np.mean(sample1))
    print("Sample 2 Mean:", np.mean(sample2))
    print("Two-Sample T-Test Results:")
    print(f"T-Statistic: {t_stat:.4f}")
    print(f"P-Value: {p_value:.4f}")

    # Check for significance at the specified alpha level
    if p_value < alpha:
        print("Reject the null hypothesis: There is enough evidence to suggest a significant difference.")
    else:
        print("Fail to reject the null hypothesis: There is not enough evidence to suggest a significant difference.")

# Generate two random samples with equal size
np.random.seed(42)  # Setting seed for reproducibility
sample_size = 30
sample1 = np.random.normal(loc=50, scale=10, size=sample_size)
sample2 = np.random.normal(loc=55, scale=10, size=sample_size)

# Perform the two-sample t-test
two_sample_t_test(sample1, sample2)

Sample 1 Mean: 48.118531041489625
Sample 2 Mean: 53.788375297100565
Two-Sample T-Test Results:
T-Statistic: -2.3981
P-Value: 0.0197
Reject the null hypothesis: There is enough evidence to suggest a significant difference.


Q11: What is Student’s t distribution? When to use the t-Distribution.

In [11]:
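# The Student's t-distribution, often referred to simply as the t-distribution, is a probability distribution that arises in
# statistical inference. It models the distribution of the standardized sample mean when the underlying population standard
# deviation is unknown and must be estimated from the sample. It is bell-shaped like the normal distribution but has heavier
# tails, and it approaches the normal distribution as the degrees of freedom increase.
# When to use the t-Distribution:
# 1. Small sample size (commonly n < 30), where the normal approximation is less reliable.
# 2. Population standard deviation is unknown and is replaced by the sample standard deviation.
# 3. Comparing means, as in one-sample, two-sample, and paired t-tests.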
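
One way to see why sample size matters is to compare two-sided 95% critical values of the t-distribution with the normal value. The degrees of freedom listed below are arbitrary illustrative choices.

```python
import scipy.stats as stats

confidence_level = 0.95
upper_quantile = (1 + confidence_level) / 2  # 0.975 for a two-sided interval

# Normal critical value for reference
z_critical = stats.norm.ppf(upper_quantile)

# t critical values shrink toward the normal value as degrees of freedom grow
t_critical = {df: stats.t.ppf(upper_quantile, df=df) for df in (5, 10, 30, 100, 1000)}

for df, value in t_critical.items():
    print(f"df = {df:>4}: t critical = {value:.4f}")
print(f"normal   : z critical = {z_critical:.4f}")
```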

Q12: What is t-statistic? State the formula for t-statistic.

In [12]:
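# The t-statistic is a measure used in hypothesis testing to determine whether there is a significant difference between a
# sample mean and a hypothesized population mean, or between the means of two independent samples.
# The formula for the one-sample t-statistic:
#     t = (x_bar - mu) / (s / sqrt(n))
# where x_bar is the sample mean, mu is the hypothesized population mean, s is the sample standard deviation,
# and n is the sample size.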
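
The formula translates directly into code. The function name and the example numbers below are illustrative assumptions, not part of the original exercise.

```python
import math

def one_sample_t_statistic(sample_mean, hypothesized_mean, sample_std, sample_size):
    """t = (x_bar - mu) / (s / sqrt(n))"""
    standard_error = sample_std / math.sqrt(sample_size)
    return (sample_mean - hypothesized_mean) / standard_error

# Illustrative values: x_bar = 52, mu = 50, s = 8, n = 16
t_stat = one_sample_t_statistic(52, 50, 8, 16)
print(f"t-statistic: {t_stat:.4f}")
```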

Q13. A coffee shop owner wants to estimate the average daily revenue for their shop. They take a random
sample of 50 days and find the sample mean revenue to be $500 with a standard deviation of $50.
Estimate the population mean revenue with a 95% confidence interval.

In [13]:
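# n = 50
# x bar = 500
# s = 50
# confidence level = 95%
# alpha = 0.05
# z critical value at 1 - alpha/2 = 0.975: z = 1.96
# margin of error = 1.96 * 50 / sqrt(50) = 13.859
# lower ci = 500 - 13.859 = 486.141
# higher ci = 500 + 13.859 = 513.859
# (A t critical value with df = 49, about 2.01, would give a slightly wider interval.)

# Confidence Interval (95.0%): (486.141, 513.859)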
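
The hand calculation can be cross-checked with the standard library. This sketch uses the normal critical value, in line with the 1.96 used in the worked answer; that choice is an approximation, since the standard deviation here is estimated from the sample.

```python
import math
from statistics import NormalDist

n = 50
x_bar = 500
s = 50
confidence_level = 0.95

# Two-sided critical value: the (1 + confidence_level) / 2 quantile of the standard normal
z_value = NormalDist().inv_cdf((1 + confidence_level) / 2)
margin_of_error = z_value * s / math.sqrt(n)
lower_ci = x_bar - margin_of_error
upper_ci = x_bar + margin_of_error

print(f"Margin of error: {margin_of_error:.3f}")
print(f"Confidence Interval ({confidence_level:.0%}): ({lower_ci:.3f}, {upper_ci:.3f})")
```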

Q14. A researcher hypothesizes that a new drug will decrease blood pressure by 10 mmHg. They conduct a
clinical trial with 100 patients and find that the sample mean decrease in blood pressure is 8 mmHg with a
standard deviation of 3 mmHg. Test the hypothesis with a significance level of 0.05.

In [14]:
import numpy as np
import scipy.stats as stats

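# Summary statistics from the trial
sample_mean = 8
hypothesized_mean = 10
sample_std = 3
sample_size = 100
significance_level = 0.05

# H0: mean decrease = 10 mmHg    H1: mean decrease != 10 mmHg (two-tailed)
# One-sample t-statistic: (x_bar - mu0) / (s / sqrt(n))
t_statistic = (sample_mean - hypothesized_mean) / (sample_std / np.sqrt(sample_size))
degrees_of_freedom = sample_size - 1
# Two-tailed critical value at the chosen significance level
critical_t_value = stats.t.ppf(1 - significance_level / 2, df=degrees_of_freedom)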
print(f"t-Statistic: {t_statistic:.4f}")
print(f"Critical t-Value: {critical_t_value:.4f}")

# Check for significance
if np.abs(t_statistic) > critical_t_value:
    print("Reject the null hypothesis: There is enough evidence to suggest a significant difference.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to suggest a significant difference.")

t-Statistic: -6.6667
Critical t-Value: 1.9842
Reject the null hypothesis: There is enough evidence to suggest a significant difference.


Q15. An electronics company produces a certain type of product with a mean weight of 5 pounds and a
standard deviation of 0.5 pounds. A random sample of 25 products is taken, and the sample mean weight
is found to be 4.8 pounds. Test the hypothesis that the true mean weight of the products is less than 5
pounds with a significance level of 0.01.

In [15]:
# population mean (mu0) = 5
# s = 0.5
# n = 25
# x bar = 4.8
# H0: population mean >= 5
# H1: population mean < 5
# t = (x bar - mu0) / (s / sqrt(n))
# t = (4.8 - 5) / (0.5 / sqrt(25))
# t = -2
# df = n - 1 = 25 - 1
# df = 24
# critical value (left tail, alpha = 0.01) = -2.4922
# P-Value: 0.0285
# Since t = -2 is not below -2.4922 (equivalently, 0.0285 > 0.01), fail to reject the null hypothesis:
# there is not enough evidence at the 0.01 level to conclude that the true mean weight is less than 5 pounds.
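
As a cross-check, the test can be run in code. Because the problem states the standard deviation of the product weights (0.5), this sketch uses a left-tailed z-test rather than the t-test worked by hand; treating 0.5 as a known population value is an assumption about how to read the problem.

```python
import math
from statistics import NormalDist

population_mean = 5
population_std = 0.5  # treated as known (assumption)
n = 25
x_bar = 4.8
alpha = 0.01

# H0: population mean >= 5    H1: population mean < 5 (left-tailed)
z_stat = (x_bar - population_mean) / (population_std / math.sqrt(n))
critical_value = NormalDist().inv_cdf(alpha)
p_value = NormalDist().cdf(z_stat)

print(f"z-statistic: {z_stat:.4f}")
print(f"Critical value: {critical_value:.4f}")
print(f"P-value: {p_value:.4f}")
if z_stat < critical_value:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```

The null hypothesis is rejected only when the statistic falls below the critical value, which is the same as the p-value falling below alpha.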

Q16. Two groups of students are given different study materials to prepare for a test. The first group (n1 =
30) has a mean score of 80 with a standard deviation of 10, and the second group (n2 = 40) has a mean
score of 75 with a standard deviation of 8. Test the hypothesis that the population means for the two
groups are equal with a significance level of 0.01.

In [16]:
# H0 : population_mean1 = population_mean2
# H1 : population_mean1 != population_mean2
# Group 1 data
# mean1 = 80
# std1 = 10
# n1 = 30

# Group 2 data
# mean2 = 75
# std2 = 8
# n2 = 40

# Significance level
# significance_level = 0.01
# t_statistic = (mean1 - mean2) / sqrt((std1**2 / n1) + (std2**2 / n2))
#             = (80 - 75) / sqrt((10**2 / 30) + (8**2 / 40))
#             = 2.2511
# degrees_of_freedom = n1 + n2 - 2 = 68
# (The statistic uses unpooled variances; Welch's degrees of freedom, about 54, lead to the same decision.)
# Critical t-Value (two-tailed, alpha = 0.01): 2.6501
# P-Value: 0.0276
# Since |2.2511| < 2.6501 (equivalently, 0.0276 > 0.01), fail to reject the null hypothesis:
# there is not enough evidence to suggest the population means are not equal.
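
The summary statistics can also be passed straight to SciPy. This sketch uses `scipy.stats.ttest_ind_from_stats` with `equal_var=False` (Welch's test), so its degrees of freedom, and therefore its p-value, differ slightly from a hand calculation based on df = 68.

```python
from scipy.stats import ttest_ind_from_stats

significance_level = 0.01

# Welch's t-test from summary statistics (no raw data needed)
t_statistic, p_value = ttest_ind_from_stats(
    mean1=80, std1=10, nobs1=30,
    mean2=75, std2=8, nobs2=40,
    equal_var=False,
)

print(f"t-Statistic: {t_statistic:.4f}")
print(f"P-Value: {p_value:.4f}")
if p_value < significance_level:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```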

Q17. A marketing company wants to estimate the average number of ads watched by viewers during a TV
program. They take a random sample of 50 viewers and find that the sample mean is 4 with a standard
deviation of 1.5. Estimate the population mean with a 99% confidence interval.

In [17]:
import numpy as np
import scipy.stats as stats

sample_mean = 4
sample_std = 1.5
sample_size = 50
confidence_level = 0.99  

# Calculate the degrees of freedom
degrees_of_freedom = sample_size - 1

# Calculate the critical t-value
critical_t_value = stats.t.ppf((1 + confidence_level) / 2, df=degrees_of_freedom)

# Calculate the margin of error
margin_of_error = critical_t_value * (sample_std / np.sqrt(sample_size))

# Calculate the confidence interval
confidence_interval_lower = sample_mean - margin_of_error
confidence_interval_upper = sample_mean + margin_of_error

# Print the results
print(f"Sample Mean: {sample_mean}")
print(f"Confidence Interval ({confidence_level * 100}%): ({confidence_interval_lower:.4f}, {confidence_interval_upper:.4f})")


Sample Mean: 4
Confidence Interval (99.0%): (3.4315, 4.5685)
