11 A company wants to test if a new website layout leads to a higher conversion rate (percentage of visitors
who make a purchase). They collect data from the old and new layouts to compare.


To generate the data use the following command:

```python

import numpy as np

# 50 purchases out of 1000 visitors

old_layout = np.array([1] * 50 + [0] * 950)

# 70 purchases out of 1000 visitors  

new_layout = np.array([1] * 70 + [0] * 930)

  ```

Apply a z-test to determine which layout is more successful.

We perform a two-proportion z-test to compare the conversion rates of the old and new website layouts.

**1. Define Hypotheses**

* **Null Hypothesis (H0):** The old and new layouts have the same conversion rate.
    * Mathematically: p1 - p2 = 0, where p1 is the conversion rate of the old layout and p2 that of the new layout
* **Alternative Hypothesis (H1):** The new layout has a higher conversion rate than the old layout.
    * Mathematically: p1 - p2 < 0
    * (We use a one-tailed test since we're specifically interested in whether the new layout is *better*)

**2. Calculate Sample Proportions**

* **Old Layout:**
    * p1 = number of conversions in old layout / total visitors in old layout = 50/1000 = 0.05
* **New Layout:**
    * p2 = number of conversions in new layout / total visitors in new layout = 70/1000 = 0.07

**3. Calculate Pooled Proportion**

* Pooled Proportion (p) = (number of conversions in old layout + number of conversions in new layout) / (total visitors in old layout + total visitors in new layout)
    * p = (50 + 70) / (1000 + 1000) = 0.06

**4. Calculate Standard Error**

* Standard Error (SE) = sqrt(p * (1-p) * (1/n1 + 1/n2)) 
    * SE = sqrt(0.06 * 0.94 * (1/1000 + 1/1000)) 
    * SE = sqrt(0.0001128) 
    * SE ≈ 0.0106

**5. Calculate Test Statistic (z-score)**

* z = (p1 - p2) / SE
    * z = (0.05 - 0.07) / 0.0106 
    * z = -0.02 / 0.0106 
    * z ≈ -1.88 (computed with the unrounded SE of 0.010621)

**6. Determine Critical Value**

* Choose a significance level (alpha). Let's use alpha = 0.05.
* Find the critical value for a one-tailed (lower-tail) test at alpha = 0.05.
    * Using a standard normal distribution table, the critical value is -1.645.

**7. Make a Decision**

* Since the calculated z-score (-1.88) is less than the critical value (-1.645), we **reject the null hypothesis**.

**8. Conclusion**

* There is sufficient evidence to suggest that the new website layout has a significantly higher conversion rate than the old layout at the 0.05 significance level.

**In Summary:**

The z-test results indicate that the new website layout is more successful in terms of conversion rates compared to the old layout.

**Note:**

* This analysis assumes that the samples are independent and that the conditions for using the z-test are met (e.g., large sample sizes, independent observations).
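
The steps above can be reproduced in a few lines of Python. This is a minimal sketch, assuming SciPy is available for the normal CDF; the variable names (`p_pool`, `se`, `p_value`) are illustrative and not part of the original exercise.

```python
import numpy as np
from scipy.stats import norm

# Data from the exercise: 1 = purchase, 0 = no purchase
old_layout = np.array([1] * 50 + [0] * 950)
new_layout = np.array([1] * 70 + [0] * 930)

n1, n2 = len(old_layout), len(new_layout)
p1, p2 = old_layout.mean(), new_layout.mean()

# Pooled proportion and standard error under H0: p1 = p2
p_pool = (old_layout.sum() + new_layout.sum()) / (n1 + n2)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# z-score and lower-tailed p-value (H1: p1 - p2 < 0)
z = (p1 - p2) / se
p_value = norm.cdf(z)

alpha = 0.05  # significance level used in step 6
print(f"z = {z:.3f}, one-tailed p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Comparing the p-value with alpha is equivalent to comparing the z-score with the critical value of -1.645.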



12 A tutoring service claims that its program improves students' exam scores. A sample of students who
participated in the program was taken, and their scores before and after the program were recorded.


Use the below code to generate samples of respective arrays of marks:

```python

before_program = np.array([75, 80, 85, 70, 90, 78, 92, 88, 82, 87])

after_program = np.array([80, 85, 90, 80, 92, 80, 95, 90, 85, 88])

```

Use a z-test to determine whether the tutoring service's claim is true or false.

In [9]:
import numpy as np

before_program = np.array([75, 80, 85, 70, 90, 78, 92, 88, 82, 87])
after_program = np.array([80, 85, 90, 80, 92, 80, 95, 90, 85, 88])

differences = after_program - before_program
print(differences)  # Output: [ 5  5  5 10  2  2  3  2  3  1]
mean_diff = np.mean(differences) 

print("Mean of differences:", mean_diff)  
std_diff = np.std(differences, ddof=1)
print("Standard deviation of differences:", std_diff)  

n = len(differences) 
z_score = mean_diff / (std_diff / np.sqrt(n)) 
print("Z-score:", z_score)


[ 5  5  5 10  2  2  3  2  3  1]
Mean of differences: 3.8
Standard deviation of differences: 2.616188916046478
Z-score: 4.593190894944668
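
The cell above stops at the z-score without making a decision. A minimal sketch of the remaining step follows. It assumes a significance level of 0.05 and an upper-tailed test (H1: the mean improvement is greater than 0); neither choice is stated in the original exercise. With only 10 students the normal approximation is rough, so the result should be read with that in mind.

```python
import numpy as np
from scipy.stats import norm

before_program = np.array([75, 80, 85, 70, 90, 78, 92, 88, 82, 87])
after_program = np.array([80, 85, 90, 80, 92, 80, 95, 90, 85, 88])

# z-score of the paired differences, as in the cell above
differences = after_program - before_program
n = len(differences)
z_score = differences.mean() / (differences.std(ddof=1) / np.sqrt(n))

alpha = 0.05  # assumed significance level
critical_value = norm.ppf(1 - alpha)  # about 1.645 for an upper-tailed test
p_value = norm.sf(z_score)  # P(Z > z_score)

if z_score > critical_value:
    print("Reject the null hypothesis. The program appears to improve exam scores.")
else:
    print("Fail to reject the null hypothesis. There is no significant evidence of improvement.")
```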


13 A pharmaceutical company wants to determine if a new drug is effective in reducing blood pressure. They
conduct a study and record blood pressure measurements before and after administering the drug.


Use the below code to generate samples of respective arrays of blood pressure:


```python

before_drug = np.array([145, 150, 140, 135, 155, 160, 152, 148, 130, 138])

after_drug = np.array([130, 140, 132, 128, 145, 148, 138, 136, 125, 130])

  ```


Implement a z-test to determine whether the drug really works.

In [13]:
import numpy as np

# Sample data
before_drug = np.array([145, 150, 140, 135, 155, 160, 152, 148, 130, 138])
after_drug = np.array([130, 140, 132, 128, 145, 148, 138, 136, 125, 130])

# Calculate the difference in blood pressure for each individual
differences = after_drug - before_drug

# Calculate the mean and standard deviation of the differences
mean_diff = np.mean(differences)
std_diff = np.std(differences, ddof=1)  # ddof=1 for sample standard deviation

# Calculate the z-score
z_score = mean_diff / (std_diff / np.sqrt(len(differences)))

# Define significance level (alpha)
alpha = 0.05

# Determine critical value for a one-tailed test (we expect blood pressure to decrease)
critical_value = -1.645  # For alpha = 0.05

# Make a decision
if z_score < critical_value:
    print("Reject the null hypothesis. The drug appears to be effective in reducing blood pressure.")
else:
    print("Fail to reject the null hypothesis. There is no significant evidence that the drug reduces blood pressure.")

Reject the null hypothesis. The drug appears to be effective in reducing blood pressure.
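
With only 10 patients, a paired t-test is a common alternative to the z-test, because it accounts for estimating the standard deviation from a small sample. The sketch below is not part of the original solution; it swaps the z-test for `scipy.stats.ttest_rel` and assumes SciPy 1.6 or later, where that function accepts the `alternative` argument.

```python
import numpy as np
from scipy.stats import ttest_rel

before_drug = np.array([145, 150, 140, 135, 155, 160, 152, 148, 130, 138])
after_drug = np.array([130, 140, 132, 128, 145, 148, 138, 136, 125, 130])

# H1: mean(after - before) < 0, i.e. blood pressure decreases
t_stat, p_value = ttest_rel(after_drug, before_drug, alternative="less")

alpha = 0.05
print(f"t = {t_stat:.3f}, one-tailed p-value = {p_value:.2e}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The t-statistic is numerically identical to the z-score computed above; only the reference distribution (Student's t with 9 degrees of freedom instead of the standard normal) changes.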


14 A customer service department claims that their average response time is less than 5 minutes. A sample
of recent customer interactions was taken, and the response times were recorded.


Implement the below code to generate the array of response time:

```python

response_times = np.array([4.3, 3.8, 5.1, 4.9, 4.7, 4.2, 5.2, 4.5, 4.6, 4.4])

```

Implement a z-test to determine whether the claim made by the customer service department is true or false.

In [16]:
import numpy as np

# Sample data
response_times = np.array([4.3, 3.8, 5.1, 4.9, 4.7, 4.2, 5.2, 4.5, 4.6, 4.4])

# Calculate sample mean and standard deviation
sample_mean = np.mean(response_times)
sample_std = np.std(response_times, ddof=1)  # ddof=1 for sample standard deviation

# Define hypothesized population mean (5 minutes)
population_mean = 5

# Calculate z-score
z_score = (sample_mean - population_mean) / (sample_std / np.sqrt(len(response_times)))

# Define significance level (alpha)
alpha = 0.05

# Determine critical value for a one-tailed test (we expect the mean to be less than 5 minutes)
critical_value = -1.645  # For alpha = 0.05

# Make a decision
if z_score < critical_value:
    print("Reject the null hypothesis. The average response time is significantly less than 5 minutes.")
else:
    print("Fail to reject the null hypothesis. There is no significant evidence that the average response time is less than 5 minutes.")

Reject the null hypothesis. The average response time is significantly less than 5 minutes.
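
The cell above prints only the decision. The following sketch also reports the z-score and the lower-tailed p-value. The helper name `one_sample_z_test` is hypothetical and introduced here for illustration only.

```python
import numpy as np
from scipy.stats import norm

def one_sample_z_test(sample, mu0):
    """Return (z, lower-tailed p-value) for H1: mean < mu0.

    Uses the sample standard deviation (ddof=1) in place of the
    unknown population standard deviation, as in the cell above.
    """
    sample = np.asarray(sample, dtype=float)
    se = sample.std(ddof=1) / np.sqrt(len(sample))
    z = (sample.mean() - mu0) / se
    return z, norm.cdf(z)

response_times = np.array([4.3, 3.8, 5.1, 4.9, 4.7, 4.2, 5.2, 4.5, 4.6, 4.4])
z_score, p_value = one_sample_z_test(response_times, mu0=5)

print(f"z = {z_score:.3f}, one-tailed p-value = {p_value:.5f}")
```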


15 A company is testing two different website layouts to see which one leads to higher click-through rates.
Write a Python function to perform an A/B test analysis, including calculating the t-statistic, degrees of
freedom, and p-value.


Use the following data:

```python

layout_a_clicks = [28, 32, 33, 29, 31, 34, 30, 35, 36, 37]

layout_b_clicks = [40, 41, 38, 42, 39, 44, 43, 41, 45, 47]

```

In [20]:
import numpy as np
from scipy.stats import ttest_ind

def a_b_test_analysis(layout_a_clicks, layout_b_clicks):
  """
  Performs an A/B test analysis to compare click-through rates of two website layouts.

  Args:
    layout_a_clicks: List of click-through rates for layout A.
    layout_b_clicks: List of click-through rates for layout B.

  Returns:
    A tuple containing:
      - t-statistic: The calculated t-statistic.
      - df: Degrees of freedom.
      - p_value: The p-value.
  """

  t_statistic, p_value = ttest_ind(layout_a_clicks, layout_b_clicks)
  df = len(layout_a_clicks) + len(layout_b_clicks) - 2 
  return t_statistic, df, p_value

# Example usage
layout_a_clicks = [28, 32, 33, 29, 31, 34, 30, 35, 36, 37]
layout_b_clicks = [40, 41, 38, 42, 39, 44, 43, 41, 45, 47]

t_stat, df, p_value = a_b_test_analysis(layout_a_clicks, layout_b_clicks)

print("t-statistic:", t_stat)
print("Degrees of Freedom:", df)
print("p-value:", p_value)

if p_value < 0.05:
  print("Reject the null hypothesis. There is a statistically significant difference in click-through rates between the two layouts.")
else:
  print("Fail to reject the null hypothesis. There is no statistically significant difference in click-through rates.")

t-statistic: -7.298102156175071
Degrees of Freedom: 18
p-value: 8.833437608301987e-07
Reject the null hypothesis. There is a statistically significant difference in click-through rates between the two layouts.
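
`ttest_ind` with its default settings performs the pooled-variance (Student's) t-test, which is why the degrees of freedom are n1 + n2 - 2. As a cross-check, the same statistic can be computed by hand. This is a sketch for verification only; the names `pooled_var`, `t_manual`, and `p_manual` are illustrative.

```python
import numpy as np
from scipy import stats

a = np.array([28, 32, 33, 29, 31, 34, 30, 35, 36, 37])  # layout_a_clicks
b = np.array([40, 41, 38, 42, 39, 44, 43, 41, 45, 47])  # layout_b_clicks

n_a, n_b = len(a), len(b)
df = n_a + n_b - 2

# Pooled sample variance and standard error of the difference in means
pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / df
se = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_manual = (a.mean() - b.mean()) / se
p_manual = 2 * stats.t.sf(abs(t_manual), df)  # two-sided p-value

t_scipy, p_scipy = stats.ttest_ind(a, b)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```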


16 A pharmaceutical company wants to determine if a new drug is more effective than an existing drug in
reducing cholesterol levels. Create a program to analyze the clinical trial data and calculate the t-statistic and p-value for the treatment effect.


Use the following data of cholesterol level:

```python

existing_drug_levels = [180, 182, 175, 185, 178, 176, 172, 184, 179, 183]

new_drug_levels = [170, 172, 165, 168, 175, 173, 170, 178, 172, 176]

```

In [24]:
import numpy as np
from scipy.stats import ttest_ind

# Sample data (replace with actual clinical trial data)
existing_drug_levels = np.array([180, 182, 175, 185, 178, 176, 172, 184, 179, 183])
new_drug_levels = np.array([170, 172, 165, 168, 175, 173, 170, 178, 172, 176])

# Perform independent samples t-test
t_statistic, p_value = ttest_ind(existing_drug_levels, new_drug_levels)

# Print results
print("t-statistic:", t_statistic)
print("p-value:", p_value)

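# Interpretation: lower cholesterol is better, so a positive t-statistic
# (existing mean > new mean) favors the new drug
if p_value < 0.05 and t_statistic > 0:  # Assuming a significance level of 0.05
    print("The new drug appears to be significantly more effective in reducing cholesterol levels.")
else:
    print("There is no significant evidence that the new drug is more effective in reducing cholesterol levels.")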

t-statistic: 4.14048098620866
p-value: 0.0006143398442372505
The new drug appears to be significantly more effective in reducing cholesterol levels.
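
The question is directional (is the new drug *more* effective?), while `ttest_ind` is two-sided by default. A one-sided version can be sketched as follows, assuming SciPy 1.6 or later, where `ttest_ind` accepts the `alternative` argument.

```python
from scipy.stats import ttest_ind

existing_drug_levels = [180, 182, 175, 185, 178, 176, 172, 184, 179, 183]
new_drug_levels = [170, 172, 165, 168, 175, 173, 170, 178, 172, 176]

# H1: mean(existing) > mean(new), i.e. the new drug lowers cholesterol more
t_stat, p_one_sided = ttest_ind(
    existing_drug_levels, new_drug_levels, alternative="greater"
)

print("t-statistic:", t_stat)
print("one-sided p-value:", p_one_sided)
```

Because the t-statistic is positive, the one-sided p-value is half of the two-sided p-value reported above.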


17 A school district introduces an educational intervention program to improve math scores. Write a Python
function to analyze pre- and post-intervention test scores, calculating the t-statistic and p-value to
determine if the intervention had a significant impact.


Use the following data of test score:


  ```python

  pre_intervention_scores = [80, 85, 90, 75, 88, 82, 92, 78, 85, 87]

  post_intervention_scores = [90, 92, 88, 92, 95, 91, 96, 93, 89, 93]
  ```

In [27]:
import numpy as np
from scipy.stats import ttest_rel

def analyze_intervention(pre_scores, post_scores):
  """
  Analyzes pre- and post-intervention test scores to determine 
  if the intervention had a significant impact.

  Args:
    pre_scores: List of pre-intervention test scores.
    post_scores: List of post-intervention test scores.

  Returns:
    A tuple containing:
      - t_statistic: The calculated t-statistic.
      - p_value: The p-value.
  """

  t_statistic, p_value = ttest_rel(pre_scores, post_scores)
  return t_statistic, p_value

# Example usage
pre_intervention_scores = [80, 85, 90, 75, 88, 82, 92, 78, 85, 87]
post_intervention_scores = [90, 92, 88, 92, 95, 91, 96, 93, 89, 93]

t_stat, p_value = analyze_intervention(pre_intervention_scores, post_intervention_scores)

print("t-statistic:", t_stat)
print("p-value:", p_value)

if p_value < 0.05:  # Assuming a significance level of 0.05
  print("The intervention appears to have a significant impact on improving math scores.")
else:
  print("There is no significant evidence that the intervention improved math scores.")

t-statistic: -4.42840883965761
p-value: 0.0016509548165795493
The intervention appears to have a significant impact on improving math scores.
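
A paired t-test is equivalent to a one-sample t-test on the per-student differences against a mean of 0. The sketch below illustrates that equivalence with `scipy.stats.ttest_1samp`; it is a cross-check, not part of the original solution.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

pre_intervention_scores = np.array([80, 85, 90, 75, 88, 82, 92, 78, 85, 87])
post_intervention_scores = np.array([90, 92, 88, 92, 95, 91, 96, 93, 89, 93])

# Same sign convention as ttest_rel(pre, post): differences are pre - post
differences = pre_intervention_scores - post_intervention_scores

t_one, p_one = ttest_1samp(differences, 0.0)
t_paired, p_paired = ttest_rel(pre_intervention_scores, post_intervention_scores)

print(t_one, t_paired)
print(p_one, p_paired)
```

The negative t-statistic reflects that post-intervention scores are higher on average than pre-intervention scores.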


18 An HR department wants to investigate if there's a gender-based salary gap within the company. Develop
a program to analyze salary data, calculate the t-statistic, and determine if there's a statistically
significant difference between the average salaries of male and female employees.


Use the below code to generate synthetic data:


```python

# Generate synthetic salary data for male and female employees

np.random.seed(0)  # For reproducibility

male_salaries = np.random.normal(loc=50000, scale=10000, size=20)

female_salaries = np.random.normal(loc=55000, scale=9000, size=20)

  ```


In [30]:
import numpy as np
from scipy.stats import ttest_ind

# Generate synthetic salary data for male and female employees
np.random.seed(0)  # For reproducibility
male_salaries = np.random.normal(loc=50000, scale=10000, size=20)
female_salaries = np.random.normal(loc=55000, scale=9000, size=20)

# Perform independent samples t-test
t_statistic, p_value = ttest_ind(male_salaries, female_salaries)

# Print results
print("t-statistic:", t_statistic)
print("p-value:", p_value)

# Interpretation
if p_value < 0.05:  # Assuming a significance level of 0.05
    print("There is a statistically significant difference in average salaries between male and female employees.")
else:
    print("There is no statistically significant evidence of a gender-based salary gap.")

t-statistic: 0.06114208969631383
p-value: 0.9515665020676465
There is no statistically significant evidence of a gender-based salary gap.
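
The two groups are generated with different standard deviations (10000 and 9000), so Welch's t-test, which does not assume equal variances, is a reasonable alternative to the default pooled test. A minimal sketch using the `equal_var=False` option of `ttest_ind`:

```python
import numpy as np
from scipy.stats import ttest_ind

np.random.seed(0)  # For reproducibility
male_salaries = np.random.normal(loc=50000, scale=10000, size=20)
female_salaries = np.random.normal(loc=55000, scale=9000, size=20)

# Pooled-variance (Student's) test, as in the cell above
t_student, p_student = ttest_ind(male_salaries, female_salaries)

# Welch's test: separate variance estimates, adjusted degrees of freedom
t_welch, p_welch = ttest_ind(male_salaries, female_salaries, equal_var=False)

print("Student:", t_student, p_student)
print("Welch:  ", t_welch, p_welch)
```

Because both groups have 20 observations, the two t-statistics coincide; only the degrees of freedom, and therefore the p-values, differ slightly.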


19 A manufacturer produces two different versions of a product and wants to compare their quality scores.
Create a Python function to analyze quality assessment data, calculate the t-statistic, and decide
whether there's a significant difference in quality between the two versions.
Use the following data:
```python

version1_scores = [85, 88, 82, 89, 87, 84, 90, 88, 85, 86, 91, 83, 87, 84, 89, 86, 84, 88, 85, 86, 89, 90, 87, 88, 85]

version2_scores = [80, 78, 83, 81, 79, 82, 76, 80, 78, 81, 77, 82, 80, 79, 82, 79, 80, 81, 79, 82, 79, 78, 80, 81, 82]
```

In [35]:
import numpy as np
from scipy.stats import ttest_ind

def compare_product_versions(version1_scores, version2_scores):
  """
  Analyzes quality assessment data for two product versions and 
  determines if there's a significant difference in quality.

  Args:
    version1_scores: List of quality scores for version 1.
    version2_scores: List of quality scores for version 2.

  Returns:
    A tuple containing:
      - t_statistic: The calculated t-statistic.
      - p_value: The p-value.
  """

  t_statistic, p_value = ttest_ind(version1_scores, version2_scores)
  return t_statistic, p_value

# Example usage
version1_scores = [85, 88, 82, 89, 87, 84, 90, 88, 85, 86, 91, 83, 87, 84, 89, 86, 84, 88, 85, 86, 89, 90, 87, 88, 85]
version2_scores = [80, 78, 83, 81, 79, 82, 76, 80, 78, 81, 77, 82, 80, 79, 82, 79, 80, 81, 79, 82, 79, 78, 80, 81, 82]

t_stat, p_value = compare_product_versions(version1_scores, version2_scores)

print("t-statistic:", t_stat)
print("p-value:", p_value)

if p_value < 0.05:  # Assuming a significance level of 0.05
  print("There is a statistically significant difference in quality between the two product versions.")
else:
  print("There is no statistically significant difference in quality between the two product versions.")

t-statistic: 11.325830417646698
p-value: 3.6824250702873965e-15
There is a statistically significant difference in quality between the two product versions.
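
The decision can also be made with a critical value, mirroring the approach used in the z-test exercises. This is a sketch assuming a two-sided test at alpha = 0.05 and the pooled degrees of freedom n1 + n2 - 2; the names `t_crit` and `significant` are illustrative.

```python
from scipy import stats

version1_scores = [85, 88, 82, 89, 87, 84, 90, 88, 85, 86, 91, 83, 87,
                   84, 89, 86, 84, 88, 85, 86, 89, 90, 87, 88, 85]
version2_scores = [80, 78, 83, 81, 79, 82, 76, 80, 78, 81, 77, 82, 80,
                   79, 82, 79, 80, 81, 79, 82, 79, 78, 80, 81, 82]

t_stat, p_value = stats.ttest_ind(version1_scores, version2_scores)

alpha = 0.05
df = len(version1_scores) + len(version2_scores) - 2
t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value

# Reject H0 when |t| exceeds the critical value
significant = abs(t_stat) > t_crit
print(f"|t| = {abs(t_stat):.3f}, critical value = {t_crit:.3f}, significant = {significant}")
```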


20 A restaurant chain collects customer satisfaction scores for two different branches. Write a program to
analyze the scores, calculate the t-statistic, and determine if there's a statistically significant difference in
customer satisfaction between the branches.


Use the below data of scores:

  ```python

branch_a_scores = [4, 5, 3, 4, 5, 4, 5, 3, 4, 4, 5, 4, 4, 3, 4, 5, 5, 4, 3, 4, 5, 4, 3, 5, 4, 4, 5, 3, 4, 5, 4]

branch_b_scores = [3, 4, 2, 3, 4, 3, 4, 2, 3, 3, 4, 3, 3, 2, 3, 4, 4, 3, 2, 3, 4, 3, 2, 4, 3, 3, 4, 2, 3, 4, 3]
  ```

In [38]:
import numpy as np
from scipy.stats import ttest_ind

def compare_branch_satisfaction(branch_a_scores, branch_b_scores):
  """
  Analyzes customer satisfaction scores for two restaurant branches 
  and determines if there's a statistically significant difference.

  Args:
    branch_a_scores: List of customer satisfaction scores for branch A.
    branch_b_scores: List of customer satisfaction scores for branch B.

  Returns:
    A tuple containing:
      - t_statistic: The calculated t-statistic.
      - p_value: The p-value.
  """

  t_statistic, p_value = ttest_ind(branch_a_scores, branch_b_scores)
  return t_statistic, p_value

# Example usage
branch_a_scores = [4, 5, 3, 4, 5, 4, 5, 3, 4, 4, 5, 4, 4, 3, 4, 5, 5, 4, 3, 4, 5, 4, 3, 5, 4, 4, 5, 3, 4, 5, 4]
branch_b_scores = [3, 4, 2, 3, 4, 3, 4, 2, 3, 3, 4, 3, 3, 2, 3, 4, 4, 3, 2, 3, 4, 3, 2, 4, 3, 3, 4, 2, 3, 4, 3]

t_stat, p_value = compare_branch_satisfaction(branch_a_scores, branch_b_scores)

print("t-statistic:", t_stat)
print("p-value:", p_value)

if p_value < 0.05:  # Assuming a significance level of 0.05
  print("There is a statistically significant difference in customer satisfaction between the two branches.")
else:
  print("There is no statistically significant difference in customer satisfaction between the two branches.")

t-statistic: 5.480077554195743
p-value: 8.895290509945655e-07
There is a statistically significant difference in customer satisfaction between the two branches.


21 A political analyst wants to determine if there is a significant association between age groups and voter
preferences (Candidate A or Candidate B). They collect data from a sample of 500 voters and classify
them into different age groups and candidate preferences. Perform a Chi-Square test to determine if
there is a significant association between age groups and voter preferences.


Use the below code to generate data:

```python

np.random.seed(0)

age_groups = np.random.choice(['18-30', '31-50', '51+', '51+'], size=30)

voter_preferences = np.random.choice(['Candidate A', 'Candidate B'], size=30)
```

In [41]:
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Generate synthetic data (replace with actual data)
np.random.seed(0) 
age_groups = np.random.choice(['18-30', '31-50', '51+'], size=500) 
voter_preferences = np.random.choice(['Candidate A', 'Candidate B'], size=500)

# Create a DataFrame for easier manipulation
data = pd.DataFrame({'Age Group': age_groups, 'Voter Preference': voter_preferences})

# Create a contingency table
contingency_table = pd.crosstab(data['Age Group'], data['Voter Preference'])

# Perform Chi-Square test
chi2_stat, p_value, dof, expected = chi2_contingency(contingency_table)

# Print results
print("Chi-Square Statistic:", chi2_stat)
print("p-value:", p_value)
print("Degrees of Freedom:", dof)
print("Expected Frequencies:\n", expected)

# Interpretation
if p_value < 0.05:  # Assuming a significance level of 0.05
    print("There is a statistically significant association between age groups and voter preferences.")
else:
    print("There is no statistically significant association between age groups and voter preferences.")

Chi-Square Statistic: 0.8779923945254768
p-value: 0.6446832311860852
Degrees of Freedom: 2
Expected Frequencies:
 [[96.824 85.176]
 [89.908 79.092]
 [79.268 69.732]]
There is no statistically significant association between age groups and voter preferences.


22 A company conducted a customer satisfaction survey to determine if there is a significant relationship
between product satisfaction levels (Satisfied, Neutral, Dissatisfied) and the region where customers are
located (East, West, North, South). The survey data is summarized in a contingency table. Conduct a Chi-Square
test to determine if there is a significant relationship between product satisfaction levels and
customer regions.


Sample data:

```python

#Sample data: Product satisfaction levels (rows) vs. Customer regions (columns)

data = np.array([[50, 30, 40, 20], [30, 40, 30, 50], [20, 30, 40, 30]])
```

In [44]:
import numpy as np
from scipy.stats import chi2_contingency

# Sample data: Product satisfaction levels (rows) vs. Customer regions (columns)
data = np.array([[50, 30, 40, 20], 
                 [30, 40, 30, 50], 
                 [20, 30, 40, 30]])

# Perform Chi-Square test
chi2_stat, p_value, dof, expected = chi2_contingency(data)

# Print results
print("Chi-Square Statistic:", chi2_stat)
print("p-value:", p_value)
print("Degrees of Freedom:", dof)
print("Expected Frequencies:\n", expected)

# Interpretation
if p_value < 0.05:  # Assuming a significance level of 0.05
    print("There is a statistically significant relationship between product satisfaction levels and customer regions.")
else:
    print("There is no statistically significant relationship between product satisfaction levels and customer regions.")

Chi-Square Statistic: 27.777056277056275
p-value: 0.00010349448486004387
Degrees of Freedom: 6
Expected Frequencies:
 [[34.14634146 34.14634146 37.56097561 34.14634146]
 [36.58536585 36.58536585 40.24390244 36.58536585]
 [29.26829268 29.26829268 32.19512195 29.26829268]]
There is a statistically significant relationship between product satisfaction levels and customer regions.
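
The expected frequencies and the statistic reported by `chi2_contingency` can be reproduced by hand: each expected count is (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells. A minimal sketch for cross-checking, with illustrative variable names:

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

data = np.array([[50, 30, 40, 20],
                 [30, 40, 30, 50],
                 [20, 30, 40, 30]])

row_totals = data.sum(axis=1)
col_totals = data.sum(axis=0)
grand_total = data.sum()

# Expected counts under independence of rows and columns
expected_manual = np.outer(row_totals, col_totals) / grand_total

chi2_manual = ((data - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (data.shape[0] - 1) * (data.shape[1] - 1)
p_manual = chi2.sf(chi2_manual, dof_manual)

chi2_scipy, p_scipy, dof_scipy, expected_scipy = chi2_contingency(data)
print(chi2_manual, chi2_scipy)
print(p_manual, p_scipy)
```

The values agree because `chi2_contingency` applies the Yates continuity correction only when there is a single degree of freedom.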


23 A company implemented an employee training program to improve job performance (Effective, Neutral,
Ineffective). After the training, they collected data from a sample of employees and classified them based
on their job performance before and after the training. Perform a Chi-Square test to determine if there is a
significant difference between job performance levels before and after the training.


Sample data:

```python

# Sample data: Job performance levels before (rows) and after (columns) training

data = np.array([[50, 30, 20], [30, 40, 30], [20, 30, 40]])
```

In [47]:
import numpy as np
from scipy.stats import chi2_contingency

# Sample data: Job performance levels before (rows) and after (columns) training
data = np.array([[50, 30, 20], 
                 [30, 40, 30], 
                 [20, 30, 40]])

# Perform Chi-Square test
chi2_stat, p_value, dof, expected = chi2_contingency(data)

# Print results
print("Chi-Square Statistic:", chi2_stat)
print("p-value:", p_value)
print("Degrees of Freedom:", dof)
print("Expected Frequencies:\n", expected)

# Interpretation
if p_value < 0.05:  # Assuming a significance level of 0.05
    print("There is a statistically significant difference in job performance levels before and after the training.")
else:
    print("There is no statistically significant difference in job performance levels before and after the training.")

Chi-Square Statistic: 22.161728395061726
p-value: 0.00018609719479882557
Degrees of Freedom: 4
Expected Frequencies:
 [[34.48275862 34.48275862 31.03448276]
 [34.48275862 34.48275862 31.03448276]
 [31.03448276 31.03448276 27.93103448]]
There is a statistically significant difference in job performance levels before and after the training.
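
Instead of comparing the p-value with 0.05, the statistic can be compared with the critical value of the chi-square distribution. A sketch assuming alpha = 0.05; the name `critical_value` is illustrative.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

data = np.array([[50, 30, 20],
                 [30, 40, 30],
                 [20, 30, 40]])

chi2_stat, p_value, dof, expected = chi2_contingency(data)

alpha = 0.05
critical_value = chi2.ppf(1 - alpha, dof)  # upper-tail critical value

print(f"chi-square = {chi2_stat:.3f}, critical value = {critical_value:.3f}")
print("Reject H0" if chi2_stat > critical_value else "Fail to reject H0")
```

Both decision rules are equivalent: the statistic exceeds the critical value exactly when the p-value is below alpha.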
