***Q1:*** Generate a list of 100 integers with values between 90 and 130 and store it in the variable `int_list`.
After generating the list, do the following:
 (i) Write a Python function to calculate the mean of a given list of numbers.
 (ii) Create a function to find the median of a list of numbers.
 (iii) Develop a program to compute the mode of a list of integers.
 (iv) Implement a function to calculate the weighted mean of a list of values and their corresponding weights.
 (v) Write a Python function to find the geometric mean of a list of positive numbers.
 (vi) Create a program to calculate the harmonic mean of a list of values.
 (vii) Build a function to determine the midrange of a list of numbers (average of the minimum and maximum).
 (viii) Implement a Python program to find the trimmed mean of a list, excluding a certain percentage of
outliers.

In [2]:
# Generate a List of 100 Integers between 90 and 130

import random

int_list = [random.randint(90, 130) for _ in range(100)]


In [3]:
#Calculating the Mean

def calculate_mean(numbers):
    return sum(numbers) / len(numbers)


mean = calculate_mean(int_list)

In [4]:
#Finding the Median

def calculate_median(numbers):
    sorted_numbers = sorted(numbers)
    n = len(sorted_numbers)
    midpoint = n // 2

    if n % 2 == 0:
        return (sorted_numbers[midpoint - 1] + sorted_numbers[midpoint]) / 2
    else:
        return sorted_numbers[midpoint]

median = calculate_median(int_list)


In [5]:
#Computing the Mode

from collections import Counter

def calculate_mode(numbers):
    count = Counter(numbers)
    max_count = max(count.values())
    mode = [k for k, v in count.items() if v == max_count]
    return mode

mode = calculate_mode(int_list)

In [6]:
#Calculating the Weighted Mean

def calculate_weighted_mean(values, weights):
    weighted_sum = sum(v * w for v, w in zip(values, weights))
    total_weight = sum(weights)
    return weighted_sum / total_weight

weights = [random.uniform(0.5, 1.5) for _ in range(100)]  # Example weights
weighted_mean = calculate_weighted_mean(int_list, weights)

In [7]:
#Finding the Geometric Mean

import math

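def calculate_geometric_mean(numbers):
    # The geometric mean is only defined here for strictly positive values
    if any(x <= 0 for x in numbers):
        raise ValueError("Geometric mean requires positive numbers.")
    # Averaging logarithms avoids the overflow a direct product can hit on long lists
    log_sum = sum(math.log(x) for x in numbers)
    return math.exp(log_sum / len(numbers))

geometric_mean = calculate_geometric_mean(int_list)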

In [8]:
#Calculating the Harmonic Mean

def calculate_harmonic_mean(numbers):
    reciprocal_sum = sum(1 / num for num in numbers)
    return len(numbers) / reciprocal_sum


harmonic_mean = calculate_harmonic_mean(int_list)

In [9]:
#Determining the Midrange

def calculate_midrange(numbers):
    return (min(numbers) + max(numbers)) / 2

midrange = calculate_midrange(int_list)

In [10]:
#Finding the Trimmed Mean

def calculate_trimmed_mean(numbers, trim_percent):
    sorted_numbers = sorted(numbers)
    n = len(numbers)
    trim_count = int(n * trim_percent / 100)
    trimmed_numbers = sorted_numbers[trim_count:n - trim_count]
    return sum(trimmed_numbers) / len(trimmed_numbers)

trim_percent = 10  # Trim 10% from both ends
trimmed_mean = calculate_trimmed_mean(int_list, trim_percent)

In [None]:
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {mode}")
print(f"Weighted Mean: {weighted_mean}")
print(f"Geometric Mean: {geometric_mean}")
print(f"Harmonic Mean: {harmonic_mean}")
print(f"Midrange: {midrange}")
print(f"Trimmed Mean: {trimmed_mean}")
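
As a cross-check, most of these measures are also available in Python's standard `statistics` module. The sketch below recomputes them on a freshly generated list; the seed and the name `check_list` are assumptions made only so the example is reproducible and self-contained.

```python
import random
import statistics

random.seed(42)  # assumed seed, only for reproducibility of this sketch
check_list = [random.randint(90, 130) for _ in range(100)]

# Reference values from the standard library
reference = {
    "mean": statistics.mean(check_list),
    "median": statistics.median(check_list),
    "modes": statistics.multimode(check_list),  # list of all most-common values
    "geometric_mean": statistics.geometric_mean(check_list),
    "harmonic_mean": statistics.harmonic_mean(check_list),
    "midrange": (min(check_list) + max(check_list)) / 2,  # no stdlib helper for this
}

for name, value in reference.items():
    print(f"{name}: {value}")
```

For positive data the harmonic mean should not exceed the geometric mean, which in turn should not exceed the arithmetic mean, so the printed values also give a quick sanity check on the hand-written functions.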

***Q2:*** Generate a list of 500 integers with values between 200 and 300 and store it in the variable `int_list2`.
After generating the list, do the following:
 (i) Compare the following visualizations of the data:
 1. Frequency & Gaussian distribution
 2. Frequency & smoothed KDE plot
 3. Gaussian distribution & smoothed KDE plot
 (ii) Write a Python function to calculate the range of a given list of numbers.
 (iii) Create a program to find the variance and standard deviation of a list of numbers.
 (iv) Implement a function to compute the interquartile range (IQR) of a list of values.
 (v) Build a program to calculate the coefficient of variation for a dataset.
 (vi) Write a Python function to find the mean absolute deviation (MAD) of a list of numbers.
 (vii) Create a program to calculate the quartile deviation of a list of values.
 (viii) Implement a function to find the range-based coefficient of dispersion for a dataset

In [None]:
# Generate a list of 500 integers between 200 and 300

import random

int_list2 = [random.randint(200, 300) for _ in range(500)]

In [None]:
'''Visualizations
Frequency & Gaussian distribution
Frequency smoothed KDE plot
Gaussian distribution & smoothed KDE plot'''

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy.stats import norm

# Frequency & Gaussian distribution
def plot_frequency_and_gaussian(data):
    plt.figure(figsize=(10, 6))
    sns.histplot(data, bins=20, kde=False, color='blue', label='Frequency', stat='density')
    mean = np.mean(data)
    std_dev = np.std(data)
    x = np.linspace(min(data), max(data), 1000)
    plt.plot(x, norm.pdf(x, mean, std_dev), color='red', label='Gaussian Distribution')
    plt.title('Frequency & Gaussian Distribution')
    plt.legend()
    plt.show()

# Frequency smoothened KDE plot
def plot_frequency_kde(data):
    plt.figure(figsize=(10, 6))
    sns.histplot(data, bins=20, kde=True, color='blue', label='Frequency', stat='density')
    plt.title('Frequency with Smoothed KDE Plot')
    plt.legend()
    plt.show()

# Gaussian distribution & smoothened KDE plot
def plot_gaussian_kde(data):
    plt.figure(figsize=(10, 6))
    mean = np.mean(data)
    std_dev = np.std(data)
    x = np.linspace(min(data), max(data), 1000)
    sns.kdeplot(data, color='blue', label='Smoothed KDE')
    plt.plot(x, norm.pdf(x, mean, std_dev), color='red', label='Gaussian Distribution')
    plt.title('Gaussian Distribution & Smoothed KDE Plot')
    plt.legend()
    plt.show()

plot_frequency_and_gaussian(int_list2)
plot_frequency_kde(int_list2)
plot_gaussian_kde(int_list2)

In [None]:
#Range Calculation

def calculate_range(numbers):
    return max(numbers) - min(numbers)

range_value = calculate_range(int_list2)

In [None]:
#Variance and Standard Deviation

def calculate_variance(numbers):
    # Population variance: squared deviations are averaged over N (not N - 1)
    mean = sum(numbers) / len(numbers)
    return sum((x - mean) ** 2 for x in numbers) / len(numbers)

def calculate_std_dev(numbers):
    return calculate_variance(numbers) ** 0.5

variance = calculate_variance(int_list2)
std_dev = calculate_std_dev(int_list2)


In [None]:
#Interquartile Range (IQR)

def calculate_iqr(numbers):
    sorted_numbers = sorted(numbers)
    q1 = np.percentile(sorted_numbers, 25)
    q3 = np.percentile(sorted_numbers, 75)
    return q3 - q1

iqr = calculate_iqr(int_list2)

In [None]:
#Coefficient of Variation

def calculate_coefficient_of_variation(numbers):
    mean = np.mean(numbers)
    std_dev = np.std(numbers)
    return std_dev / mean

coefficient_of_variation = calculate_coefficient_of_variation(int_list2)

In [None]:
#Mean Absolute Deviation (MAD)

def calculate_mad(numbers):
    mean = np.mean(numbers)
    return np.mean([abs(x - mean) for x in numbers])

mad = calculate_mad(int_list2)

In [None]:
#Quartile Deviation

def calculate_quartile_deviation(numbers):
    return calculate_iqr(numbers) / 2


quartile_deviation = calculate_quartile_deviation(int_list2)

In [None]:
#Range-Based Coefficient of Dispersion

def calculate_range_coefficient_of_dispersion(numbers):
    range_value = max(numbers) - min(numbers)
    return range_value / (max(numbers) + min(numbers))


range_coefficient_of_dispersion = calculate_range_coefficient_of_dispersion(int_list2)

In [None]:
print(f"Range: {range_value}")
print(f"Variance: {variance}")
print(f"Standard Deviation: {std_dev}")
print(f"Interquartile Range (IQR): {iqr}")
print(f"Coefficient of Variation: {coefficient_of_variation}")
print(f"Mean Absolute Deviation (MAD): {mad}")
print(f"Quartile Deviation: {quartile_deviation}")
print(f"Range-Based Coefficient of Dispersion: {range_coefficient_of_dispersion}")
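
The variance above divides by N, which treats the list as a whole population. A minimal sketch with the standard `statistics` module shows how that differs from the sample variance (divides by N - 1) and gives an independent way to obtain quartiles. The seed and variable names are assumptions for illustration; the `inclusive` quantile method interpolates linearly between data points, which should be close to what `np.percentile` does by default.

```python
import random
import statistics

random.seed(7)  # assumed seed, only for reproducibility of this sketch
data = [random.randint(200, 300) for _ in range(500)]

population_variance = statistics.pvariance(data)  # divides by N
sample_variance = statistics.variance(data)       # divides by N - 1

# Quartiles with linear interpolation between data points
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr_check = q3 - q1

print(f"Population variance: {population_variance:.4f}")
print(f"Sample variance:     {sample_variance:.4f}")
print(f"Q1, median, Q3:      {q1}, {q2}, {q3}")
print(f"IQR:                 {iqr_check}")
```

With 500 values the two variances differ only slightly, but the gap grows as the sample gets smaller.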

***Q3:*** Write a Python class representing a discrete random variable with methods to calculate its expected
value and variance.

In [None]:
class DiscreteRandomVariable:
    def __init__(self, values, probabilities):
        #Initialize the discrete random variable.

        if len(values) != len(probabilities):
            raise ValueError("Values and probabilities must have the same length.")
        if not abs(sum(probabilities) - 1.0) < 1e-6:
            raise ValueError("The sum of probabilities must be 1.")

        self.values = values
        self.probabilities = probabilities

    def expected_value(self):
        #Calculate the expected value (mean) of the random variable.
        return sum(value * prob for value, prob in zip(self.values, self.probabilities))

    def variance(self):
        #Calculate the variance of the random variable.
        mean = self.expected_value()
        return sum(((value - mean) ** 2) * prob for value, prob in zip(self.values, self.probabilities))

values = [1, 2, 3, 4, 5]
probabilities = [0.1, 0.2, 0.3, 0.2, 0.2]

# Create a discrete random variable
random_var = DiscreteRandomVariable(values, probabilities)

# Calculate the expected value and variance
expected_value = random_var.expected_value()
variance = random_var.variance()

print(f"Expected Value: {expected_value}")
print(f"Variance: {variance}")
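
The variance can also be checked with the shortcut identity Var(X) = E[X²] - (E[X])². The sketch below applies both the definition and the shortcut to the same example distribution; the variable names are chosen here for illustration.

```python
values = [1, 2, 3, 4, 5]
probabilities = [0.1, 0.2, 0.3, 0.2, 0.2]

# E[X] and E[X^2] computed directly from the distribution
mean_x = sum(v * p for v, p in zip(values, probabilities))
mean_x_squared = sum(v ** 2 * p for v, p in zip(values, probabilities))

# Shortcut formula: Var(X) = E[X^2] - (E[X])^2
variance_shortcut = mean_x_squared - mean_x ** 2

# Definition: Var(X) = E[(X - E[X])^2]
variance_definition = sum((v - mean_x) ** 2 * p for v, p in zip(values, probabilities))

print(f"E[X]: {mean_x:.4f}")
print(f"Variance (shortcut):   {variance_shortcut:.4f}")
print(f"Variance (definition): {variance_definition:.4f}")
```

Both routes should agree up to floating-point rounding.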


***Q4:*** Implement a program to simulate the rolling of a fair six-sided die and calculate the expected value and
variance of the outcomes.

In [None]:
import random

class FairDie:
    def __init__(self, sides=6):
        #Initialize a fair die with the given number of sides.

        self.sides = sides
        self.values = list(range(1, sides + 1))  # Possible outcomes: [1, 2, ..., sides]
        self.probabilities = [1 / sides] * sides  # Equal probability for each outcome

    def roll(self):
        #Simulate rolling the die.

        return random.choice(self.values)

    def expected_value(self):

        #Calculate the expected value of the outcomes of the die.

        return sum(value * prob for value, prob in zip(self.values, self.probabilities))

    def variance(self):
        #Calculate the variance of the outcomes of the die.

        mean = self.expected_value()
        return sum(((value - mean) ** 2) * prob for value, prob in zip(self.values, self.probabilities))

# Create a fair six-sided die
die = FairDie()

# Calculate the expected value and variance
expected_value = die.expected_value()
variance = die.variance()

print(f"Expected Value: {expected_value}")
print(f"Variance: {variance}")
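
The values above are the theoretical expected value and variance. Since the question also asks for a simulation, here is a minimal sketch that rolls a die many times and compares the empirical statistics with the theoretical ones (3.5 and 35/12). The seed and the number of rolls are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # assumed seed, only for reproducibility of this sketch
num_rolls = 50_000  # assumed number of simulated rolls

# Simulate the rolls of a fair six-sided die
rolls = [random.randint(1, 6) for _ in range(num_rolls)]

empirical_mean = statistics.fmean(rolls)
empirical_variance = statistics.pvariance(rolls)

theoretical_mean = 3.5
theoretical_variance = 35 / 12

print(f"Empirical mean:     {empirical_mean:.4f} (theory {theoretical_mean})")
print(f"Empirical variance: {empirical_variance:.4f} (theory {theoretical_variance:.4f})")
```

The empirical values should approach the theoretical ones as the number of rolls grows.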


***Q5:*** Create a Python function to generate random samples from a given probability distribution (e.g.,
binomial, Poisson) and calculate their mean and variance.

In [None]:
import numpy as np

def generate_samples(distribution, params, sample_size):

    #Generate random samples from a specified probability distribution and calculate their mean and variance.


    if distribution == 'binomial':
        n = params.get('n')
        p = params.get('p')
        if n is None or p is None:
            raise ValueError("For binomial distribution, 'n' and 'p' parameters are required.")
        samples = np.random.binomial(n, p, sample_size)
    elif distribution == 'poisson':
        lam = params.get('lam')
        if lam is None:
            raise ValueError("For Poisson distribution, 'lam' parameter is required.")
        samples = np.random.poisson(lam, sample_size)
    else:
        raise ValueError("Unsupported distribution. Use 'binomial' or 'poisson'.")

    mean = np.mean(samples)
    variance = np.var(samples)

    return samples, mean, variance


samples_binomial, mean_binomial, variance_binomial = generate_samples(
    'binomial', {'n': 10, 'p': 0.5}, 1000)
print(f"Binomial Distribution - Mean: {mean_binomial}, Variance: {variance_binomial}")


samples_poisson, mean_poisson, variance_poisson = generate_samples(
    'poisson', {'lam': 3}, 1000)
print(f"Poisson Distribution - Mean: {mean_poisson}, Variance: {variance_poisson}")
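
To judge whether the sample statistics are plausible, they can be compared with the theoretical moments: a binomial(n, p) variable has mean n·p and variance n·p·(1 - p), and a Poisson(λ) variable has mean and variance both equal to λ. The sketch below does this for the binomial case using only the standard library; the helper `draw_binomial`, the seed, and the sample size are assumptions for illustration and are not part of the function above.

```python
import random
import statistics

rng = random.Random(123)  # assumed seed, only for reproducibility of this sketch

def draw_binomial(n, p):
    # One binomial draw: number of successes in n independent Bernoulli(p) trials
    return sum(rng.random() < p for _ in range(n))

n, p, sample_size = 10, 0.5, 5000
samples = [draw_binomial(n, p) for _ in range(sample_size)]

sample_mean = statistics.fmean(samples)
sample_variance = statistics.pvariance(samples)

theoretical_mean = n * p
theoretical_variance = n * p * (1 - p)

print(f"Mean:     sample {sample_mean:.3f} vs theory {theoretical_mean:.3f}")
print(f"Variance: sample {sample_variance:.3f} vs theory {theoretical_variance:.3f}")
```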


***Q6:*** Write a Python script to generate random numbers from a Gaussian (normal) distribution and compute
the mean, variance, and standard deviation of the samples.

In [None]:
import numpy as np

def generate_gaussian_samples(mean, std_dev, sample_size):

    #Generate random samples from a Gaussian (normal) distribution and calculate their mean, variance, and standard deviation.


    # Generate random samples from the Gaussian distribution
    samples = np.random.normal(mean, std_dev, sample_size)

    # Calculate the mean, variance, and standard deviation of the samples
    sample_mean = np.mean(samples)
    sample_variance = np.var(samples)
    sample_std_dev = np.std(samples)

    return samples, sample_mean, sample_variance, sample_std_dev


mean = 0  # Mean of the Gaussian distribution
std_dev = 1  # Standard deviation of the Gaussian distribution
sample_size = 1000  # Number of samples to generate

# Generate Gaussian samples and calculate statistics
samples, sample_mean, sample_variance, sample_std_dev = generate_gaussian_samples(mean, std_dev, sample_size)

print(f"Sample Mean: {sample_mean}")
print(f"Sample Variance: {sample_variance}")
print(f"Sample Standard Deviation: {sample_std_dev}")


***Q7.*** Use the seaborn library to load the `tips` dataset. Find the following from the dataset for the columns `total_bill`
and `tip`:
 (i) Write a Python function that calculates their skewness.
 (ii) Create a program that determines whether the columns exhibit positive skewness, negative skewness, or are
approximately symmetric.
 (iii) Write a function that calculates the covariance between two columns.
 (iv) Implement a Python program that calculates the Pearson correlation coefficient between two columns.
 (v) Write a script to visualize the correlation between two specific columns in a Pandas DataFrame using
scatter plots.

In [None]:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import skew

# Load the tips dataset
tips = sns.load_dataset('tips')




In [None]:
# (i) Function to calculate the skewness of a column
def calculate_skewness(data, column):

    #Calculate the skewness of a specified column in the DataFrame.

    return skew(data[column])

# Calculate skewness for 'total_bill' and 'tip'
skew_total_bill = calculate_skewness(tips, 'total_bill')
skew_tip = calculate_skewness(tips, 'tip')

print(f"Skewness of 'total_bill': {skew_total_bill}")
print(f"Skewness of 'tip': {skew_tip}")

# (ii) Determine the type of skewness
def skewness_type(skew_value, tolerance=0.5):

    #Determine whether a skewness value indicates positive skewness, negative skewness, or approximate symmetry.
    #The tolerance of 0.5 is a common rule of thumb, not a fixed standard; adjust it as needed.

    if skew_value > tolerance:
        return 'Positive skewness'
    elif skew_value < -tolerance:
        return 'Negative skewness'
    else:
        return 'Approximately symmetric'

print(f"'total_bill' skewness type: {skewness_type(skew_total_bill)}")
print(f"'tip' skewness type: {skewness_type(skew_tip)}")
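
For reference, skewness can also be computed by hand as the Fisher-Pearson coefficient g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments. To my understanding this is the quantity `scipy.stats.skew` returns with its default settings. The sketch below uses small made-up lists, not the `tips` data, and the function name is an assumption for illustration.

```python
def fisher_pearson_skewness(data):
    # g1 = m3 / m2**1.5, with central moments averaged over n
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]      # hypothetical data, mirror-symmetric around 3
right_tailed = [1, 2, 3, 4, 20]  # hypothetical data with one large value

print(f"Symmetric list:    {fisher_pearson_skewness(symmetric):.4f}")
print(f"Right-tailed list: {fisher_pearson_skewness(right_tailed):.4f}")
```

A long right tail gives a positive value and a long left tail a negative one, which is the distinction the classification in part (ii) relies on.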



In [None]:
# (iii) Function to calculate covariance between two columns
def calculate_covariance(data, column1, column2):

    #Calculate the covariance between two specified columns in the DataFrame.

    return np.cov(data[column1], data[column2])[0, 1]

# Calculate covariance between 'total_bill' and 'tip'
covariance = calculate_covariance(tips, 'total_bill', 'tip')
print(f"Covariance between 'total_bill' and 'tip': {covariance}")



In [None]:
# (iv) Function to calculate Pearson correlation coefficient between two columns
def calculate_pearson_correlation(data, column1, column2):

    #Calculate the Pearson correlation coefficient between two specified columns in the DataFrame.


    return data[column1].corr(data[column2])

# Calculate Pearson correlation coefficient between 'total_bill' and 'tip'
pearson_correlation = calculate_pearson_correlation(tips, 'total_bill', 'tip')
print(f"Pearson correlation coefficient between 'total_bill' and 'tip': {pearson_correlation}")



In [None]:
# (v) Visualize the correlation using scatter plot
def plot_scatter(data, column1, column2):

    #Create a scatter plot to visualize the correlation between two columns in the DataFrame.

    plt.figure(figsize=(8, 6))
    sns.scatterplot(data=data, x=column1, y=column2)
    plt.title(f'Scatter Plot of {column1} vs {column2}')
    plt.xlabel(column1)
    plt.ylabel(column2)
    plt.show()

# Plot scatter plot for 'total_bill' and 'tip'
plot_scatter(tips, 'total_bill', 'tip')

***Q8:*** Write a Python function to calculate the probability density function (PDF) of a continuous random
variable for a given normal distribution.

In [None]:
import math

def normal_pdf(x, mean, std_dev):
    #Calculate the Probability Density Function (PDF) of a normal distribution at a given point.

    # Calculate the PDF using the formula
    pdf_value = (1 / (std_dev * math.sqrt(2 * math.pi))) * math.exp(-((x - mean) ** 2) / (2 * std_dev ** 2))
    return pdf_value

mean = 0  # Mean of the normal distribution
std_dev = 1  # Standard deviation of the normal distribution
x = 1  # Point at which to calculate the PDF

pdf_value = normal_pdf(x, mean, std_dev)
print(f"PDF of normal distribution at x = {x} with mean = {mean} and std_dev = {std_dev}: {pdf_value}")
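
The formula can be verified against the standard library's `statistics.NormalDist`, which exposes a `pdf` method. This is a small sketch; `pdf_by_formula` is a name chosen here and simply restates the same expression so the block is self-contained.

```python
import math
from statistics import NormalDist

def pdf_by_formula(x, mean, std_dev):
    # Normal density: exp(-(x - mean)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
    coefficient = 1 / (std_dev * math.sqrt(2 * math.pi))
    exponent = -((x - mean) ** 2) / (2 * std_dev ** 2)
    return coefficient * math.exp(exponent)

x, mean, std_dev = 1.0, 0.0, 1.0

formula_value = pdf_by_formula(x, mean, std_dev)
library_value = NormalDist(mu=mean, sigma=std_dev).pdf(x)

print(f"Formula: {formula_value:.6f}")
print(f"Library: {library_value:.6f}")
```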


***Q9:*** Create a program to calculate the cumulative distribution function (CDF) of an exponential distribution.

In [None]:
import math

def exponential_cdf(x, rate):
    #Calculate the Cumulative Distribution Function (CDF) of an exponential distribution at a given point.

    if x < 0:
        return 0  # CDF is 0 for x < 0 in an exponential distribution
    else:
        return 1 - math.exp(-rate * x)

rate = 0.8  # Rate parameter (λ) of the exponential distribution
x = 5  # Point at which to calculate the CDF

cdf_value = exponential_cdf(x, rate)
print(f"CDF of exponential distribution at x = {x} with rate = {rate}: {cdf_value}")
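
One way to sanity-check the closed form F(x) = 1 - e^(-λx) is a Monte Carlo estimate: draw exponential samples and count the fraction that fall at or below x. The sketch below uses `random.expovariate`; the seed, sample count, and evaluation point are assumptions for illustration.

```python
import math
import random

rng = random.Random(2024)  # assumed seed, only for reproducibility of this sketch
rate = 0.8
x = 1.0  # assumed evaluation point

# Fraction of simulated exponential draws that are <= x
draws = [rng.expovariate(rate) for _ in range(50_000)]
empirical_cdf = sum(d <= x for d in draws) / len(draws)

# Closed-form CDF of the exponential distribution
analytic_cdf = 1 - math.exp(-rate * x)

print(f"Empirical CDF at x = {x}: {empirical_cdf:.4f}")
print(f"Analytic CDF at x = {x}:  {analytic_cdf:.4f}")
```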


***Q10:*** Write a Python function to calculate the probability mass function (PMF) of a Poisson distribution.

In [None]:
import math

def poisson_pmf(k, lamb):
    #Calculate the Probability Mass Function (PMF) of a Poisson distribution.

    if k < 0:
        return 0  # PMF is 0 for negative values of k in a Poisson distribution
    else:
        return (lamb ** k) * math.exp(-lamb) / math.factorial(k)

lamb = 3  # Average number of events (λ)
k = 2  # Number of events for which we want the probability

pmf_value = poisson_pmf(k, lamb)
print(f"PMF of Poisson distribution for k = {k} with λ = {lamb}: {pmf_value}")
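
A quick way to validate a PMF is to check that it sums to 1 over its support and reproduces the known moments; for a Poisson distribution both the mean and the variance equal λ. The sketch below truncates the infinite support at an assumed cutoff of 60, beyond which the probabilities are negligible for λ = 3. The function name `pmf` is local to this example.

```python
import math

def pmf(k, lam):
    # Poisson PMF: lam^k * exp(-lam) / k!
    return (lam ** k) * math.exp(-lam) / math.factorial(k)

lam = 3
support = range(0, 60)  # assumed truncation of the infinite support

total_probability = sum(pmf(k, lam) for k in support)
mean_k = sum(k * pmf(k, lam) for k in support)
variance_k = sum((k - mean_k) ** 2 * pmf(k, lam) for k in support)

print(f"Total probability: {total_probability:.6f}")
print(f"Mean:              {mean_k:.6f}")
print(f"Variance:          {variance_k:.6f}")
```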


***Q11.*** A company wants to test if a new website layout leads to a higher conversion rate (percentage of visitors
who make a purchase). They collect data from the old and new layouts to compare.
 To generate the data use the following command:
 ```python
 import numpy as np
 # 50 purchases out of 1000 visitors
 old_layout = np.array([1] * 50 + [0] * 950)
 # 70 purchases out of 1000 visitors
 new_layout = np.array([1] * 70 + [0] * 930)
 ```
 Apply z-test to find which layout is successful.

In [None]:
import numpy as np
from scipy import stats

# Generate the data
old_layout = np.array([1] * 50 + [0] * 950)
new_layout = np.array([1] * 70 + [0] * 930)

# Calculate the number of successes and total number of visitors
old_successes = np.sum(old_layout)
old_total = len(old_layout)

new_successes = np.sum(new_layout)
new_total = len(new_layout)

# Calculate the proportions
p_old = old_successes / old_total
p_new = new_successes / new_total

# Calculate the pooled proportion
p_pooled = (old_successes + new_successes) / (old_total + new_total)

# Calculate the standard error
se = np.sqrt(p_pooled * (1 - p_pooled) * (1 / old_total + 1 / new_total))

# Calculate the z-score
z_score = (p_new - p_old) / se

# Calculate the p-value for a one-tailed test (H1: the new layout converts better than the old one)
p_value = stats.norm.sf(z_score)

print(f"Old layout conversion rate: {p_old:.4f}")
print(f"New layout conversion rate: {p_new:.4f}")
print(f"Z-score: {z_score:.4f}")
print(f"P-value: {p_value:.4f}")

# Determine the result
alpha = 0.05  # Significance level
if p_value < alpha:
    print("The new layout significantly improves the conversion rate.")
else:
    print("There is not enough evidence that the new layout improves the conversion rate.")
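
Whether the alternative hypothesis is one-sided ("the new layout is better") or two-sided ("the layouts differ") matters for this data set. The sketch below wraps the same pooled two-proportion z-test in a function and reports both p-values, using `statistics.NormalDist` for the normal CDF. The function name and return layout are assumptions for illustration.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(successes_old, n_old, successes_new, n_new):
    # Pooled two-proportion z-test for the difference p_new - p_old
    p_old = successes_old / n_old
    p_new = successes_new / n_new
    p_pooled = (successes_old + successes_new) / (n_old + n_new)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_old + 1 / n_new))
    z = (p_new - p_old) / se

    standard_normal = NormalDist()
    p_one_sided = 1 - standard_normal.cdf(z)             # H1: p_new > p_old
    p_two_sided = 2 * (1 - standard_normal.cdf(abs(z)))  # H1: p_new != p_old
    return z, p_one_sided, p_two_sided

z, p_one_sided, p_two_sided = two_proportion_z_test(50, 1000, 70, 1000)

print(f"z = {z:.4f}")
print(f"one-sided p-value = {p_one_sided:.4f}")
print(f"two-sided p-value = {p_two_sided:.4f}")
```

For these counts the two p-values fall on opposite sides of α = 0.05, so the hypothesis should be fixed before looking at the result.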


***Q12:*** A tutoring service claims that its program improves students' exam scores. A sample of students who
participated in the program was taken, and their scores before and after the program were recorded.
 Use the below code to generate samples of respective arrays of marks:
 ```python
 before_program = np.array([75, 80, 85, 70, 90, 78, 92, 88, 82, 87])
 after_program = np.array([80, 85, 90, 80, 92, 80, 95, 90, 85, 88])
 ```
 Use a z-test to determine whether the claim made by the tutoring service is true or false.

In [None]:
import numpy as np
from scipy import stats

# Generate the data
before_program = np.array([75, 80, 85, 70, 90, 78, 92, 88, 82, 87])
after_program = np.array([80, 85, 90, 80, 92, 80, 95, 90, 85, 88])

# Calculate the differences
differences = after_program - before_program

# Calculate the mean and standard deviation of the differences
mean_diff = np.mean(differences)
std_diff = np.std(differences, ddof=0)  # Population standard deviation

# Number of pairs
n = len(differences)

# Calculate the standard error of the mean difference
se = std_diff / np.sqrt(n)

# Calculate the z-score
z_score = mean_diff / se

# Calculate the p-value for the two-tailed test
p_value = 2 * (1 - stats.norm.cdf(np.abs(z_score)))

print(f"Mean difference: {mean_diff:.2f}")
print(f"Standard deviation of differences: {std_diff:.2f}")
print(f"Standard error of the mean difference: {se:.2f}")
print(f"Z-score: {z_score:.2f}")
print(f"P-value: {p_value:.4f}")

# Determine the result
alpha = 0.05  # Significance level
if p_value < alpha:
    print("The tutoring program has a significant effect on students' exam scores.")
else:
    print("There is no significant effect of the tutoring program on students' exam scores.")
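
With only ten pairs, the choice between the population standard deviation (`ddof=0`) and the sample standard deviation (`ddof=1`) noticeably changes the test statistic. The sketch below computes both with the standard library so the difference is visible; the variable names are chosen here for illustration. For samples this small, a paired t-test with n - 1 degrees of freedom is the more usual choice, although the exercise asks for a z-test.

```python
import math
import statistics

before_program = [75, 80, 85, 70, 90, 78, 92, 88, 82, 87]
after_program = [80, 85, 90, 80, 92, 80, 95, 90, 85, 88]

# Paired differences (after minus before)
differences = [a - b for a, b in zip(after_program, before_program)]
n = len(differences)
mean_diff = statistics.fmean(differences)

# Statistic using the population SD (divides by n), as in the cell above
z_population_sd = mean_diff / (statistics.pstdev(differences) / math.sqrt(n))

# Statistic using the sample SD (divides by n - 1)
z_sample_sd = mean_diff / (statistics.stdev(differences) / math.sqrt(n))

print(f"Mean difference: {mean_diff:.2f}")
print(f"Statistic with population SD: {z_population_sd:.4f}")
print(f"Statistic with sample SD:     {z_sample_sd:.4f}")
```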


***Q13.*** A pharmaceutical company wants to determine if a new drug is effective in reducing blood pressure. They
conduct a study and record blood pressure measurements before and after administering the drug.
 Use the below code to generate samples of respective arrays of blood pressure:
 ```python
 before_drug = np.array([145, 150, 140, 135, 155, 160, 152, 148, 130, 138])
 after_drug = np.array([130, 140, 132, 128, 145, 148, 138, 136, 125, 130])
 ```
 Implement a z-test to determine whether the drug really works.


In [None]:
import numpy as np
from scipy import stats

# Generate the data
before_drug = np.array([145, 150, 140, 135, 155, 160, 152, 148, 130, 138])
after_drug = np.array([130, 140, 132, 128, 145, 148, 138, 136, 125, 130])

# Calculate the differences
differences = after_drug - before_drug

# Calculate the mean and standard deviation of the differences
mean_diff = np.mean(differences)
std_diff = np.std(differences, ddof=0)  # Population standard deviation

# Number of pairs
n = len(differences)

# Calculate the standard error of the mean difference
se = std_diff / np.sqrt(n)

# Calculate the z-score
z_score = mean_diff / se

# Calculate the p-value for the two-tailed test
p_value = 2 * (1 - stats.norm.cdf(np.abs(z_score)))

print(f"Mean difference: {mean_diff:.2f}")
print(f"Standard deviation of differences: {std_diff:.2f}")
print(f"Standard error of the mean difference: {se:.2f}")
print(f"Z-score: {z_score:.2f}")
print(f"P-value: {p_value:.4f}")

# Determine the result
alpha = 0.05  # Significance level
if p_value < alpha:
    print("The drug has a significant effect in reducing blood pressure.")
else:
    print("There is no significant effect of the drug on blood pressure.")


***Q14.*** A customer service department claims that their average response time is less than 5 minutes. A sample
of recent customer interactions was taken, and the response times were recorded.
 Implement the below code to generate the array of response time:
 ```python
 response_times = np.array([4.3, 3.8, 5.1, 4.9, 4.7, 4.2, 5.2, 4.5, 4.6, 4.4])
 ```
 Implement a z-test to determine whether the claim made by the customer service department is true or false.

In [None]:
import numpy as np
from scipy import stats

# Generate the data
response_times = np.array([4.3, 3.8, 5.1, 4.9, 4.7, 4.2, 5.2, 4.5, 4.6, 4.4])

# Population mean (claimed)
claimed_mean = 5.0

# Calculate the sample mean and standard deviation
sample_mean = np.mean(response_times)
sample_std = np.std(response_times, ddof=0)  # Population standard deviation

# Number of samples
n = len(response_times)

# Calculate the standard error of the mean
se = sample_std / np.sqrt(n)

# Calculate the z-score
z_score = (sample_mean - claimed_mean) / se

# Calculate the p-value for a one-tailed test (less than claimed mean)
p_value = stats.norm.cdf(z_score)

print(f"Sample mean: {sample_mean:.2f}")
print(f"Sample standard deviation: {sample_std:.2f}")
print(f"Standard error of the mean: {se:.2f}")
print(f"Z-score: {z_score:.2f}")
print(f"P-value: {p_value:.4f}")

# Determine the result
alpha = 0.05  # Significance level
if p_value < alpha:
    print("The claim that the average response time is less than 5 minutes is supported.")
else:
    print("There is not enough evidence to support the claim that the average response time is less than 5 minutes.")


***Q15.*** A company is testing two different website layouts to see which one leads to higher click-through rates.
Write a Python function to perform an A/B test analysis, including calculating the t-statistic, degrees of
freedom, and p-value.
 Use the following data:
 ```python
 layout_a_clicks = [28, 32, 33, 29, 31, 34, 30, 35, 36, 37]
 layout_b_clicks = [40, 41, 38, 42, 39, 44, 43, 41, 45, 47]
 ```

In [None]:
import numpy as np
from scipy import stats

def ab_test_analysis(layout_a, layout_b):
    # Convert lists to numpy arrays for calculations
    layout_a = np.array(layout_a)
    layout_b = np.array(layout_b)

    # Calculate the sample means
    mean_a = np.mean(layout_a)
    mean_b = np.mean(layout_b)

    # Calculate the sample standard deviations
    std_a = np.std(layout_a, ddof=1)  # Sample standard deviation
    std_b = np.std(layout_b, ddof=1)  # Sample standard deviation

    # Calculate the sample sizes
    n_a = len(layout_a)
    n_b = len(layout_b)

    # Calculate the standard error of the difference in means
    se = np.sqrt((std_a**2 / n_a) + (std_b**2 / n_b))

    # Calculate the t-statistic
    t_statistic = (mean_a - mean_b) / se

    # Calculate the degrees of freedom
    # Using Welch's t-test for unequal variances
    df = ((std_a**2 / n_a + std_b**2 / n_b)**2) / \
         (((std_a**2 / n_a)**2 / (n_a - 1)) + ((std_b**2 / n_b)**2 / (n_b - 1)))

    # Calculate the p-value for a two-tailed test
    p_value = 2 * (1 - stats.t.cdf(np.abs(t_statistic), df))

    return t_statistic, df, p_value

# Data for the two layouts
layout_a_clicks = [28, 32, 33, 29, 31, 34, 30, 35, 36, 37]
layout_b_clicks = [40, 41, 38, 42, 39, 44, 43, 41, 45, 47]

# Perform A/B test analysis
t_statistic, df, p_value = ab_test_analysis(layout_a_clicks, layout_b_clicks)

print(f"T-statistic: {t_statistic:.4f}")
print(f"Degrees of freedom: {df:.2f}")
print(f"P-value: {p_value:.4f}")

# Determine the result
alpha = 0.05  # Significance level
if p_value < alpha:
    print("There is a significant difference in click-through rates between the two layouts.")
else:
    print("There is no significant difference in click-through rates between the two layouts.")


***Q16.*** A pharmaceutical company wants to determine if a new drug is more effective than an existing drug in
reducing cholesterol levels. Create a program to analyze the clinical trial data and calculate the t-statistic.
 Use the following cholesterol level data:
 ```python
 existing_drug_levels = [180, 182, 175, 185, 178, 176, 172, 184, 179, 183]
 new_drug_levels = [170, 172, 165, 168, 175, 173, 170, 178, 172, 176]
 ```

In [None]:
import numpy as np
from scipy import stats

def analyze_drug_effect(existing_drug_levels, new_drug_levels):
    # Convert lists to numpy arrays
    existing_drug_levels = np.array(existing_drug_levels)
    new_drug_levels = np.array(new_drug_levels)

    # Calculate the sample means
    mean_existing = np.mean(existing_drug_levels)
    mean_new = np.mean(new_drug_levels)

    # Calculate the sample standard deviations
    std_existing = np.std(existing_drug_levels, ddof=1)  # Sample standard deviation
    std_new = np.std(new_drug_levels, ddof=1)  # Sample standard deviation

    # Calculate the sample sizes
    n_existing = len(existing_drug_levels)
    n_new = len(new_drug_levels)

    # Calculate the standard error of the difference in means
    se = np.sqrt((std_existing**2 / n_existing) + (std_new**2 / n_new))

    # Calculate the t-statistic
    t_statistic = (mean_new - mean_existing) / se

    # Calculate the degrees of freedom using Welch's t-test formula
    df = ((std_existing**2 / n_existing + std_new**2 / n_new)**2) / \
         (((std_existing**2 / n_existing)**2 / (n_existing - 1)) + ((std_new**2 / n_new)**2 / (n_new - 1)))

    # Calculate the p-value for a one-tailed test (new drug has lower levels)
    p_value = stats.t.cdf(t_statistic, df)

    return t_statistic, df, p_value

# Data for the two drugs
existing_drug_levels = [180, 182, 175, 185, 178, 176, 172, 184, 179, 183]
new_drug_levels = [170, 172, 165, 168, 175, 173, 170, 178, 172, 176]

# Perform the analysis
t_statistic, df, p_value = analyze_drug_effect(existing_drug_levels, new_drug_levels)

print(f"T-statistic: {t_statistic:.4f}")
print(f"Degrees of freedom: {df:.2f}")
print(f"P-value: {p_value:.4f}")

# Determine the result
alpha = 0.05  # Significance level
if p_value < alpha:
    print("The new drug is significantly more effective in reducing cholesterol levels than the existing drug.")
else:
    print("There is no significant difference between the new drug and the existing drug in reducing cholesterol levels.")


***Q17.*** A school district introduces an educational intervention program to improve math scores. Write a Python
function to analyze pre- and post-intervention test scores, calculating the t-statistic and p-value to
determine if the intervention had a significant impact.
 Use the following test score data:
 ```python
 pre_intervention_scores = [80, 85, 90, 75, 88, 82, 92, 78, 85, 87]
 post_intervention_scores = [90, 92, 88, 92, 95, 91, 96, 93, 89, 93]
 ```


In [None]:
import numpy as np
from scipy import stats

def analyze_intervention(pre_scores, post_scores):
    # Convert lists to numpy arrays
    pre_scores = np.array(pre_scores)
    post_scores = np.array(post_scores)

    # Calculate the differences between post and pre scores
    differences = post_scores - pre_scores

    # Calculate the mean and standard deviation of the differences
    mean_diff = np.mean(differences)
    std_diff = np.std(differences, ddof=1)  # Sample standard deviation

    # Number of paired samples
    n = len(differences)

    # Calculate the standard error of the mean difference
    se = std_diff / np.sqrt(n)

    # Calculate the t-statistic
    t_statistic = mean_diff / se

    # Degrees of freedom
    df = n - 1

    # Calculate the p-value for a two-tailed test
    p_value = 2 * (1 - stats.t.cdf(np.abs(t_statistic), df))

    return t_statistic, df, p_value

# Data for pre- and post-intervention scores
pre_intervention_scores = [80, 85, 90, 75, 88, 82, 92, 78, 85, 87]
post_intervention_scores = [90, 92, 88, 92, 95, 91, 96, 93, 89, 93]

# Perform the analysis
t_statistic, df, p_value = analyze_intervention(pre_intervention_scores, post_intervention_scores)

print(f"T-statistic: {t_statistic:.4f}")
print(f"Degrees of freedom: {df}")
print(f"P-value: {p_value:.4f}")

# Determine the result
alpha = 0.05  # Significance level
if p_value < alpha:
    print("The intervention had a significant impact on improving math scores.")
else:
    print("There is no significant impact of the intervention on math scores.")
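
As a cross-check on the manual calculation above, the sketch below runs the same data through SciPy's built-in paired test, `scipy.stats.ttest_rel`. Passing the post-intervention scores first keeps the `post - pre` sign convention used in the function. The names `result` and `manual_t` are introduced here only for illustration.

```python
import numpy as np
from scipy import stats

pre_intervention_scores = [80, 85, 90, 75, 88, 82, 92, 78, 85, 87]
post_intervention_scores = [90, 92, 88, 92, 95, 91, 96, 93, 89, 93]

# Built-in paired t-test (two-tailed by default); post first so that a
# positive statistic means scores went up after the intervention
result = stats.ttest_rel(post_intervention_scores, pre_intervention_scores)

# Manual t-statistic from the paired differences, for comparison
differences = np.array(post_intervention_scores) - np.array(pre_intervention_scores)
manual_t = differences.mean() / (differences.std(ddof=1) / np.sqrt(len(differences)))

print(f"SciPy t-statistic:  {result.statistic:.4f}")
print(f"Manual t-statistic: {manual_t:.4f}")
print(f"SciPy p-value:      {result.pvalue:.4f}")
```

If the two t-statistics agree, the hand-written formula and the library routine are computing the same paired test.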

***Q18.*** An HR department wants to investigate if there's a gender-based salary gap within the company. Develop
a program to analyze salary data, calculate the t-statistic, and determine if there's a statistically
significant difference between the average salaries of male and female employees.
 Use the code below to generate synthetic data:
 ```python
 # Generate synthetic salary data for male and female employees
 np.random.seed(0)  # For reproducibility
 male_salaries = np.random.normal(loc=50000, scale=10000, size=20)
 female_salaries = np.random.normal(loc=55000, scale=9000, size=20)
 ```

In [None]:
import numpy as np
from scipy import stats

def analyze_salary_gap(male_salaries, female_salaries):
    # Convert lists to numpy arrays
    male_salaries = np.array(male_salaries)
    female_salaries = np.array(female_salaries)

    # Calculate the sample means
    mean_male = np.mean(male_salaries)
    mean_female = np.mean(female_salaries)

    # Calculate the sample standard deviations
    std_male = np.std(male_salaries, ddof=1)  # Sample standard deviation
    std_female = np.std(female_salaries, ddof=1)  # Sample standard deviation

    # Calculate the sample sizes
    n_male = len(male_salaries)
    n_female = len(female_salaries)

    # Calculate the standard error of the difference in means
    se = np.sqrt((std_male**2 / n_male) + (std_female**2 / n_female))

    # Calculate the t-statistic
    t_statistic = (mean_female - mean_male) / se

    # Calculate the degrees of freedom using Welch's t-test formula
    df = ((std_male**2 / n_male + std_female**2 / n_female)**2) / \
         (((std_male**2 / n_male)**2 / (n_male - 1)) + ((std_female**2 / n_female)**2 / (n_female - 1)))

    # Calculate the p-value for a two-tailed test
    p_value = 2 * stats.t.sf(np.abs(t_statistic), df)  # sf = 1 - cdf, more accurate in the tails

    return t_statistic, df, p_value

# Generate synthetic salary data
np.random.seed(0)  # For reproducibility
male_salaries = np.random.normal(loc=50000, scale=10000, size=20)
female_salaries = np.random.normal(loc=55000, scale=9000, size=20)

# Perform the analysis
t_statistic, df, p_value = analyze_salary_gap(male_salaries, female_salaries)

print(f"T-statistic: {t_statistic:.4f}")
print(f"Degrees of freedom: {df:.2f}")
print(f"P-value: {p_value:.4f}")

# Determine the result
alpha = 0.05  # Significance level
if p_value < alpha:
    print("There is a statistically significant gender-based salary gap.")
else:
    print("There is no statistically significant gender-based salary gap.")
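
The hand-computed Welch statistic can be compared against `scipy.stats.ttest_ind` with `equal_var=False`, which performs Welch's t-test. This is a minimal sketch, assuming the same seed and synthetic data as above; `result`, `se`, and `manual_t` are illustrative names.

```python
import numpy as np
from scipy import stats

# Same synthetic data as in the question
np.random.seed(0)
male_salaries = np.random.normal(loc=50000, scale=10000, size=20)
female_salaries = np.random.normal(loc=55000, scale=9000, size=20)

# equal_var=False selects Welch's t-test (no pooled variance)
result = stats.ttest_ind(female_salaries, male_salaries, equal_var=False)

# Manual Welch t-statistic with the same (female - male) sign convention
se = np.sqrt(male_salaries.var(ddof=1) / len(male_salaries)
             + female_salaries.var(ddof=1) / len(female_salaries))
manual_t = (female_salaries.mean() - male_salaries.mean()) / se

print(f"SciPy t-statistic:  {result.statistic:.4f}")
print(f"Manual t-statistic: {manual_t:.4f}")
print(f"SciPy p-value:      {result.pvalue:.4f}")
```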

***Q19.*** A manufacturer produces two different versions of a product and wants to compare their quality scores.
Create a Python function to analyze quality assessment data, calculate the t-statistic, and decide
whether there's a significant difference in quality between the two versions.
 Use the following data:
 ```python
 version1_scores = [85, 88, 82, 89, 87, 84, 90, 88, 85, 86, 91, 83, 87, 84, 89, 86, 84, 88, 85, 86, 89, 90, 87, 88, 85]
 version2_scores = [80, 78, 83, 81, 79, 82, 76, 80, 78, 81, 77, 82, 80, 79, 82, 79, 80, 81, 79, 82, 79, 78, 80, 81, 82]
 ```

In [None]:
import numpy as np
from scipy import stats

def analyze_quality_scores(version1_scores, version2_scores):
    # Convert lists to numpy arrays
    version1_scores = np.array(version1_scores)
    version2_scores = np.array(version2_scores)

    # Calculate the sample means
    mean_v1 = np.mean(version1_scores)
    mean_v2 = np.mean(version2_scores)

    # Calculate the sample standard deviations
    std_v1 = np.std(version1_scores, ddof=1)  # Sample standard deviation
    std_v2 = np.std(version2_scores, ddof=1)  # Sample standard deviation

    # Calculate the sample sizes
    n_v1 = len(version1_scores)
    n_v2 = len(version2_scores)

    # Calculate the standard error of the difference in means
    se = np.sqrt((std_v1**2 / n_v1) + (std_v2**2 / n_v2))

    # Calculate the t-statistic
    t_statistic = (mean_v1 - mean_v2) / se

    # Calculate the degrees of freedom using Welch's t-test formula
    df = ((std_v1**2 / n_v1 + std_v2**2 / n_v2)**2) / \
         (((std_v1**2 / n_v1)**2 / (n_v1 - 1)) + ((std_v2**2 / n_v2)**2 / (n_v2 - 1)))

    # Calculate the p-value for a two-tailed test
    p_value = 2 * stats.t.sf(np.abs(t_statistic), df)  # sf = 1 - cdf, more accurate in the tails

    return t_statistic, df, p_value

# Data for the two versions
version1_scores = [85, 88, 82, 89, 87, 84, 90, 88, 85, 86, 91, 83, 87, 84, 89, 86, 84, 88, 85, 86, 89, 90, 87, 88, 85]
version2_scores = [80, 78, 83, 81, 79, 82, 76, 80, 78, 81, 77, 82, 80, 79, 82, 79, 80, 81, 79, 82, 79, 78, 80, 81, 82]

# Perform the analysis
t_statistic, df, p_value = analyze_quality_scores(version1_scores, version2_scores)

print(f"T-statistic: {t_statistic:.4f}")
print(f"Degrees of freedom: {df:.2f}")
print(f"P-value: {p_value:.4f}")

# Determine the result
alpha = 0.05  # Significance level
if p_value < alpha:
    print("There is a significant difference in quality scores between the two versions.")
else:
    print("There is no significant difference in quality scores between the two versions.")


***Q20.*** A restaurant chain collects customer satisfaction scores for two different branches. Write a program to
analyze the scores, calculate the t-statistic, and determine if there's a statistically significant difference in
customer satisfaction between the branches.
 Use the following score data:
 ```python
 branch_a_scores = [4, 5, 3, 4, 5, 4, 5, 3, 4, 4, 5, 4, 4, 3, 4, 5, 5, 4, 3, 4, 5, 4, 3, 5, 4, 4, 5, 3, 4, 5, 4]
 branch_b_scores = [3, 4, 2, 3, 4, 3, 4, 2, 3, 3, 4, 3, 3, 2, 3, 4, 4, 3, 2, 3, 4, 3, 2, 4, 3, 3, 4, 2, 3, 4, 3]
 ```

In [None]:
import numpy as np
from scipy import stats

def analyze_satisfaction_scores(branch_a_scores, branch_b_scores):
    # Convert lists to numpy arrays
    branch_a_scores = np.array(branch_a_scores)
    branch_b_scores = np.array(branch_b_scores)

    # Calculate the sample means
    mean_a = np.mean(branch_a_scores)
    mean_b = np.mean(branch_b_scores)

    # Calculate the sample standard deviations
    std_a = np.std(branch_a_scores, ddof=1)  # Sample standard deviation
    std_b = np.std(branch_b_scores, ddof=1)  # Sample standard deviation

    # Calculate the sample sizes
    n_a = len(branch_a_scores)
    n_b = len(branch_b_scores)

    # Calculate the standard error of the difference in means
    se = np.sqrt((std_a**2 / n_a) + (std_b**2 / n_b))

    # Calculate the t-statistic
    t_statistic = (mean_a - mean_b) / se

    # Calculate the degrees of freedom using Welch's t-test formula
    df = ((std_a**2 / n_a + std_b**2 / n_b)**2) / \
         (((std_a**2 / n_a)**2 / (n_a - 1)) + ((std_b**2 / n_b)**2 / (n_b - 1)))

    # Calculate the p-value for a two-tailed test
    p_value = 2 * stats.t.sf(np.abs(t_statistic), df)  # sf = 1 - cdf, more accurate in the tails

    return t_statistic, df, p_value

# Data for the two branches
branch_a_scores = [4, 5, 3, 4, 5, 4, 5, 3, 4, 4, 5, 4, 4, 3, 4, 5, 5, 4, 3, 4, 5, 4, 3, 5, 4, 4, 5, 3, 4, 5, 4]
branch_b_scores = [3, 4, 2, 3, 4, 3, 4, 2, 3, 3, 4, 3, 3, 2, 3, 4, 4, 3, 2, 3, 4, 3, 2, 4, 3, 3, 4, 2, 3, 4, 3]

# Perform the analysis
t_statistic, df, p_value = analyze_satisfaction_scores(branch_a_scores, branch_b_scores)

print(f"T-statistic: {t_statistic:.4f}")
print(f"Degrees of freedom: {df:.2f}")
print(f"P-value: {p_value:.10f}")

# Determine the result
alpha = 0.05  # Significance level
if p_value < alpha:
    print("There is a significant difference in customer satisfaction scores between the two branches.")
else:
    print("There is no significant difference in customer satisfaction scores between the two branches.")
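
Q18 to Q20 repeat the same Welch computation with different variable names. One possible way to factor it out is a single helper, sketched below. The function name `welch_t_test` and its argument names are hypothetical and not part of the original solutions.

```python
import numpy as np
from scipy import stats

def welch_t_test(sample_a, sample_b):
    """Return (t_statistic, df, p_value) for a two-tailed Welch's t-test."""
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)

    # Variance of each sample mean (s^2 / n)
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size

    t_statistic = (a.mean() - b.mean()) / np.sqrt(va + vb)

    # Welch-Satterthwaite degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))

    # Two-tailed p-value from the survival function
    p_value = 2 * stats.t.sf(abs(t_statistic), df)
    return t_statistic, df, p_value

branch_a_scores = [4, 5, 3, 4, 5, 4, 5, 3, 4, 4, 5, 4, 4, 3, 4, 5, 5, 4, 3, 4, 5, 4, 3, 5, 4, 4, 5, 3, 4, 5, 4]
branch_b_scores = [3, 4, 2, 3, 4, 3, 4, 2, 3, 3, 4, 3, 3, 2, 3, 4, 4, 3, 2, 3, 4, 3, 2, 4, 3, 3, 4, 2, 3, 4, 3]

t_statistic, df, p_value = welch_t_test(branch_a_scores, branch_b_scores)
print(f"T-statistic: {t_statistic:.4f}")
print(f"Degrees of freedom: {df:.2f}")
print(f"P-value: {p_value:.10f}")
```

In this dataset every Branch B score is exactly one point below the corresponding Branch A score, so the two sample variances are equal and the Welch degrees of freedom reduce to 2(n - 1) = 60.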


***Q21:*** A political analyst wants to determine if there is a significant association between age groups and voter
preferences (Candidate A or Candidate B). They collect data from a sample of 500 voters and classify
them into different age groups and candidate preferences. Perform a Chi-Square test to determine if
there is a significant association between age groups and voter preferences.
 Use the code below to generate data:
 ```python
 np.random.seed(0)
 age_groups = np.random.choice(['18-30', '31-50', '51+', '51+'], size=30)
 voter_preferences = np.random.choice(['Candidate A', 'Candidate B'], size=30)
 ```

In [None]:
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Generate synthetic data (size=500 matches the stated sample of 500 voters; the snippet in the question uses size=30)
np.random.seed(0)
age_groups = np.random.choice(['18-30', '31-50', '51+', '51+'], size=500)
voter_preferences = np.random.choice(['Candidate A', 'Candidate B'], size=500)

# Create a DataFrame to easily compute the contingency table
data = pd.DataFrame({'Age Group': age_groups, 'Voter Preference': voter_preferences})

# Create the contingency table
contingency_table = pd.crosstab(data['Age Group'], data['Voter Preference'])

# Perform Chi-Square test
chi2_stat, p_value, dof, expected = chi2_contingency(contingency_table)

print("Contingency Table:")
print(contingency_table)
print("\nChi-Square Statistic:", chi2_stat)
print("Degrees of Freedom:", dof)
print("P-Value:", p_value)

# Interpret the result
alpha = 0.05  # Significance level
if p_value < alpha:
    print("There is a significant association between age groups and voter preferences.")
else:
    print("There is no significant association between age groups and voter preferences.")


***Q22.*** A company conducted a customer satisfaction survey to determine if there is a significant relationship
between product satisfaction levels (Satisfied, Neutral, Dissatisfied) and the region where customers are
located (East, West, North, South). The survey data is summarized in a contingency table. Conduct a Chi-Square
test to determine if there is a significant relationship between product satisfaction levels and
customer regions.
 Sample data:
 ```python
 # Sample data: Product satisfaction levels (rows) vs. Customer regions (columns)
 data = np.array([[50, 30, 40, 20], [30, 40, 30, 50], [20, 30, 40, 30]])
 ```

In [None]:
import numpy as np
from scipy.stats import chi2_contingency

# Sample data: Product satisfaction levels (rows) vs. Customer regions (columns)
data = np.array([[50, 30, 40, 20],
                 [30, 40, 30, 50],
                 [20, 30, 40, 30]])

# Perform Chi-Square test
chi2_stat, p_value, dof, expected = chi2_contingency(data)

print("Contingency Table:")
print(data)
print("\nChi-Square Statistic:", chi2_stat)
print("Degrees of Freedom:", dof)
print("P-Value:", p_value)

# Interpret the result
alpha = 0.05  # Significance level
if p_value < alpha:
    print("There is a significant relationship between product satisfaction levels and customer regions.")
else:
    print("There is no significant relationship between product satisfaction levels and customer regions.")
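
To show what `chi2_contingency` does with this table, the sketch below rebuilds the expected counts and the test statistic by hand with NumPy and compares them to SciPy's output. Under independence, each expected count is (row total × column total) / grand total. Names ending in `_manual` are illustrative.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Product satisfaction levels (rows) vs. customer regions (columns)
data = np.array([[50, 30, 40, 20],
                 [30, 40, 30, 50],
                 [20, 30, 40, 30]])

row_totals = data.sum(axis=1)
col_totals = data.sum(axis=0)
grand_total = data.sum()

# Expected count for each cell if satisfaction and region were independent
expected_manual = np.outer(row_totals, col_totals) / grand_total

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi2_manual = ((data - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (data.shape[0] - 1) * (data.shape[1] - 1)
p_manual = chi2.sf(chi2_manual, dof_manual)

# Library result for comparison
chi2_scipy, p_scipy, dof_scipy, expected_scipy = chi2_contingency(data)

print("Manual chi-square:", chi2_manual, "SciPy:", chi2_scipy)
print("Manual dof:", dof_manual, "SciPy:", dof_scipy)
print("Manual p-value:", p_manual, "SciPy:", p_scipy)
```

SciPy applies Yates' continuity correction only when the table has one degree of freedom, so for this 3×4 table the manual and library statistics should match.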


***Q23.*** A company implemented an employee training program to improve job performance (Effective, Neutral,
Ineffective). After the training, they collected data from a sample of employees and classified them based
on their job performance before and after the training. Perform a Chi-Square test to determine if there is a
significant difference between job performance levels before and after the training.
 Sample data:
 ```python
 # Sample data: Job performance levels before (rows) and after (columns) training
 data = np.array([[50, 30, 20], [30, 40, 30], [20, 30, 40]])
 ```

In [None]:
import numpy as np
from scipy.stats import chi2_contingency

# Sample data: Job performance levels before (rows) and after (columns) training
data = np.array([[50, 30, 20],
                 [30, 40, 30],
                 [20, 30, 40]])

# Perform Chi-Square test
chi2_stat, p_value, dof, expected = chi2_contingency(data)

print("Contingency Table:")
print(data)
print("\nChi-Square Statistic:", chi2_stat)
print("Degrees of Freedom:", dof)
print("P-Value:", p_value)

# Interpret the result
alpha = 0.05  # Significance level
if p_value < alpha:
    print("There is a significant difference in job performance levels before and after the training.")
else:
    print("There is no significant difference in job performance levels before and after the training.")


***Q24.*** A company produces three different versions of a product: Standard, Premium, and Deluxe. The
company wants to determine if there is a significant difference in customer satisfaction scores among the
three product versions. They conducted a survey and collected customer satisfaction scores for each
version from a random sample of customers. Perform an ANOVA test to determine if there is a significant
difference in customer satisfaction scores.
 Use the following data:
 ```python
 # Sample data: Customer satisfaction scores for each product version
 standard_scores = [80, 85, 90, 78, 88, 82, 92, 78, 85, 87]
 premium_scores = [90, 92, 88, 92, 95, 91, 96, 93, 89, 93]
 deluxe_scores = [95, 98, 92, 97, 96, 94, 98, 97, 92, 99]
 ```

In [None]:
import numpy as np
from scipy.stats import f_oneway

# Sample data: Customer satisfaction scores for each product version
standard_scores = [80, 85, 90, 78, 88, 82, 92, 78, 85, 87]
premium_scores = [90, 92, 88, 92, 95, 91, 96, 93, 89, 93]
deluxe_scores = [95, 98, 92, 97, 96, 94, 98, 97, 92, 99]

# Perform ANOVA test
f_stat, p_value = f_oneway(standard_scores, premium_scores, deluxe_scores)

print("F-Statistic:", f_stat)
print("P-Value:", p_value)

# Interpret the result
alpha = 0.05  # Significance level
if p_value < alpha:
    print("There is a significant difference in customer satisfaction scores among the three product versions.")
else:
    print("There is no significant difference in customer satisfaction scores among the three product versions.")
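
The F-statistic returned by `f_oneway` is the ratio of between-group to within-group mean squares. The sketch below computes it by hand for the same three samples and compares the result with SciPy. Names such as `ss_between` and `f_manual` are illustrative.

```python
import numpy as np
from scipy import stats

standard_scores = [80, 85, 90, 78, 88, 82, 92, 78, 85, 87]
premium_scores = [90, 92, 88, 92, 95, 91, 96, 93, 89, 93]
deluxe_scores = [95, 98, 92, 97, 96, 94, 98, 97, 92, 99]

groups = [np.array(g, dtype=float) for g in (standard_scores, premium_scores, deluxe_scores)]
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

k = len(groups)            # number of groups
n_total = all_scores.size  # total number of observations

# Between-group sum of squares: group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = k - 1
df_within = n_total - k

f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

# Library result for comparison
f_scipy, p_scipy = stats.f_oneway(*groups)

print("Manual F:", f_manual, "SciPy F:", f_scipy)
print("Manual p-value:", p_manual, "SciPy p-value:", p_scipy)
```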
