In [1]:
from statsmodels.stats import weightstats as stests
from scipy import stats
import statsmodels.api as sm
import numpy as np


In [4]:
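def TwoSampZTest(samp_mean_1, samp_mean_2, samp_std_1, samp_std_2, n1, n2):
    """Large-sample two-sample Z statistic computed from summary statistics."""
    # Standard error of the difference between the two sample means
    denominator = np.sqrt((samp_std_1**2 / n1) + (samp_std_2**2 / n2))
    # Test statistic for H0: the two population means are equal
    z_score = (samp_mean_1 - samp_mean_2) / denominator
    return z_score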

# Q1. Average hourly wage

The average hourly wage of a sample of 150 workers in plant 'A' was Rs. 2.87, with a standard deviation of Rs. 1.08.

The average hourly wage of a sample of 200 workers in plant 'B' was Rs. 2.56, with a standard deviation of Rs. 1.28.

(i) Calculate the Z-score for this scenario.

(ii) Can an applicant safely assume that the hourly wages paid by plant 'A' are higher than those paid by plant 'B' at a 1% significance level?
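Only summary statistics are given, so part (i) can be worked out by hand with the large-sample two-sample Z statistic. This is the formula `TwoSampZTest` implements, with the sample standard deviations standing in for the population ones. For part (ii), the test is $H_0: \mu_A = \mu_B$ against the one-sided $H_1: \mu_A > \mu_B$:

$$
\begin{aligned}
z &= \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
   = \frac{2.87 - 2.56}{\sqrt{\dfrac{1.08^2}{150} + \dfrac{1.28^2}{200}}} \\
  &= \frac{0.31}{\sqrt{0.007776 + 0.008192}}
   = \frac{0.31}{\sqrt{0.015968}}
   \approx \frac{0.31}{0.126365}
   \approx 2.4532
\end{aligned}
$$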

In [5]:
# Given data (plant 'A' is labelled X, plant 'B' is labelled Y)
sample_mean_X = 2.87 # Average hourly wage in plant 'A' (Rs.)
sample_mean_Y = 2.56 # Average hourly wage in plant 'B' (Rs.)
sample_std_X = 1.08 # Standard deviation of hourly wages in plant 'A'
sample_std_Y = 1.28 # Standard deviation of hourly wages in plant 'B'
significance_level = 0.01
sample_size_X = 150 # Sample size for plant 'A'
sample_size_Y = 200 # Sample size for plant 'B'

# Calculate the z-score using the function
z_score = TwoSampZTest(sample_mean_X, sample_mean_Y, sample_std_X, sample_std_Y, sample_size_X, sample_size_Y)
z_score


2.453219634102559

In [8]:
# One-tailed test: H0: mean wage A <= mean wage B vs H1: mean wage A > mean wage B
# The p-value is the upper-tail area beyond the observed z-score
p_value = 1 - stats.norm.cdf(z_score)

# Compare the p-value to the significance level
if p_value < significance_level:
    conclusion = "Reject the null hypothesis. Hourly wages in plant 'A' are higher than those in plant 'B' at a 1% significance level."
else:
    conclusion = "Fail to reject the null hypothesis. There is not enough evidence at a 1% significance level that hourly wages in plant 'A' are higher than those in plant 'B'."

# Print the results
print(f'z-score: {z_score:.4f}')
print(f'p-value: {p_value:.4f}')
print('Conclusion:', conclusion)

z-score: 2.4532
p-value: 0.0071
Conclusion: Reject the null hypothesis. Hourly wages in plant 'A' are higher than those in plant 'B' at a 1% significance level.
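The critical-value approach reaches the same decision without a p-value. The sketch below recomputes the statistic from the Q1 summary figures. It compares the statistic with the upper-tail critical value at α = 0.01, obtained from `stats.norm.ppf`:

```python
import numpy as np
from scipy import stats

# Q1 summary statistics (plant A vs plant B)
mean_a, std_a, n_a = 2.87, 1.08, 150
mean_b, std_b, n_b = 2.56, 1.28, 200
alpha = 0.01

# Large-sample two-sample z statistic (same formula as TwoSampZTest)
z_score = (mean_a - mean_b) / np.sqrt(std_a**2 / n_a + std_b**2 / n_b)

# Upper-tail critical value for H1: mean_A > mean_B
z_critical = stats.norm.ppf(1 - alpha)

print(f"z = {z_score:.4f}, critical value = {z_critical:.4f}")
print("Reject H0" if z_score > z_critical else "Fail to reject H0")
```

The critical value is about 2.3263, and z = 2.4532 exceeds it. H0 is therefore rejected, matching the p-value decision above.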


# Q2. Complexity of SQL queries
The head of the Data Analyst Department is conducting a comparative analysis of the complexity of SQL queries written by two analysts, Analyst X and Analyst Y.

He has gathered data on the number of lines of code in each SQL query:
```
Analyst X's SQL lines of code: [15, 18, 20, 17, 16, 19, 22, 16, 18, 21, 23, 18, 17, 19, 20, 24, 25, 26, 27, 28, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
Analyst Y's SQL lines of code: [14, 17, 19, 16, 15, 18, 21, 15, 17, 20, 22, 17, 16, 18, 19, 23, 24, 25, 26, 27, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
```
The head hypothesizes that Analyst Y writes less complex code than Analyst X, measuring complexity by lines of code. To investigate this hypothesis, conduct an appropriate test at a 90% confidence level.

In [21]:
# Number of lines of code for SQL queries by Analyst X
sql_lines_X = [15, 18, 20, 17, 16, 19, 22, 16, 18, 21, 23, 18, 17, 19, 20, 24, 25, 26, 27, 28, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]

# Number of lines of code for SQL queries by Analyst Y
sql_lines_Y = [14, 17, 19, 16, 15, 18, 21, 15, 17, 20, 22, 17, 16, 18, 19, 23, 24, 25, 26, 27, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]

# Perform a one-sided two-sample Z-test
# H0: mean(X) <= mean(Y) vs H1: mean(X) > mean(Y), i.e. Analyst Y writes fewer lines (less complex code)
z_score, p_value = stests.ztest(sql_lines_X, sql_lines_Y, alternative='larger')

# Confidence level
confidence_level = 0.90
alpha = 1 - confidence_level

# Print the results
print(f'z-score: {z_score:.4f}')
print(f'p-value: {p_value:.4f}')

if p_value < alpha:
    conclusion = "Reject the null hypothesis. At the 10% significance level, Analyst Y writes less complex (shorter) SQL queries than Analyst X."
else:
    conclusion = "Fail to reject the null hypothesis. At the 10% significance level, there is not enough evidence that Analyst Y writes less complex SQL queries than Analyst X."

print('Conclusion:', conclusion)


z-score: 0.9187
p-value: 0.1791
Conclusion: Fail to reject the null hypothesis. At the 10% significance level, there is not enough evidence that Analyst Y writes less complex SQL queries than Analyst X.
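The Z-statistic can also be recomputed by hand. The sketch below assumes the statsmodels defaults `usevar='pooled'` and `ddof=1`, under which `ztest` pools the two sample variances. Each value in Analyst Y's list is exactly one less than the corresponding value in Analyst X's list. The difference in means is therefore exactly 1 line, and that is small relative to the spread of about 4.3 lines in each list.

```python
import numpy as np
from scipy import stats
from statsmodels.stats import weightstats as stests

sql_lines_X = np.array([15, 18, 20, 17, 16, 19, 22, 16, 18, 21, 23, 18, 17, 19, 20, 24,
                        25, 26, 27, 28, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30])
sql_lines_Y = np.array([14, 17, 19, 16, 15, 18, 21, 15, 17, 20, 22, 17, 16, 18, 19, 23,
                        24, 25, 26, 27, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

n1, n2 = len(sql_lines_X), len(sql_lines_Y)

# Pooled variance with ddof=1 (assumed to match ztest's defaults usevar='pooled', ddof=1)
pooled_var = ((n1 - 1) * sql_lines_X.var(ddof=1)
              + (n2 - 1) * sql_lines_Y.var(ddof=1)) / (n1 + n2 - 2)
std_diff = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-sided statistic and upper-tail p-value for H1: mean(X) > mean(Y)
z_manual = (sql_lines_X.mean() - sql_lines_Y.mean()) / std_diff
p_manual = stats.norm.sf(z_manual)

z_sm, p_sm = stests.ztest(sql_lines_X, sql_lines_Y, alternative='larger')

print(f"manual:      z = {z_manual:.4f}, p = {p_manual:.4f}")
print(f"statsmodels: z = {z_sm:.4f}, p = {p_sm:.4f}")
```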


# Q3. Rice and Wheat

Out of a sample of 1,000 people living in Maharashtra, 540 are rice eaters, while the rest primarily eat wheat.

Can we assume that rice and wheat are equally popular in this state at a 5% significance level?

In [35]:
# Data for the one-sample Z-proportion test
total_population = 1000
rice_eaters = 540
wheat_eaters = total_population - rice_eaters
assumed_proportion = 0.5  # Assuming equal popularity of rice and wheat

# Two-sided test: H0: p_rice = 0.5 vs H1: p_rice != 0.5
z_stat, p_value = sm.stats.proportions_ztest(rice_eaters, total_population, assumed_proportion, alternative='two-sided')

# Print the results
print(f"Z-statistic = {z_stat}")
print(f"P-value = {p_value}")

alpha = 0.05
if p_value < alpha:
    decision = "Reject the null hypothesis"
else:
    decision = "Fail to reject the null hypothesis"

if decision == "Reject the null hypothesis":
    conclusion = "There is enough evidence at the 5% significance level to conclude that rice and wheat are not equally popular in Maharashtra."
else:
    conclusion = "There is not enough evidence at the 5% significance level to conclude that rice and wheat differ in popularity in Maharashtra."

# Print the results
print(f"Decision: {decision}")
print(f"Conclusion: {conclusion}")

Z-statistic = 2.537956625422939
P-value = 0.011150180283180655
Decision: Reject the null hypothesis
Conclusion: There is enough evidence at the 5% significance level to conclude that rice and wheat are not equally popular in Maharashtra.


# Q4. Politician Support for Environment

A state senator cannot decide how to vote on an environmental protection bill.

The senator requests a survey and will vote for the bill if the proportion of registered voters supporting it exceeds 0.60.

A random sample of 750 voters is selected, and 495 are found to support the bill.

Conduct an appropriate test at a 90% confidence level.

In [25]:
n = 750 # Sample size
x = 495 # Number of voters supporting the bill
p_hat = x/n # Sample proportion
p = 0.60 # Hypothesized proportion (H0: p <= 0.60 vs H1: p > 0.60)

# Calculate test statistic value for one sample proportion test
Z = (p_hat - p) / np.sqrt((p * (1 - p)) / n)
print('Test statistic:',Z)

# Calculate the p-value for the test statistic
p_value = 1 - stats.norm.cdf(Z)
print('p-value:', p_value)

# Define the significance level
alpha = 0.10

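# Make a decision based on the p-value and significance level
if p_value < alpha:
    print('Reject the null hypothesis. At the 10% significance level, the proportion of voters supporting the bill exceeds 0.60, so the senator should vote for it.')
else:
    print('Fail to reject the null hypothesis. There is not enough evidence that the proportion of voters supporting the bill exceeds 0.60.')

Test statistic: 3.354101966249688
p-value: 0.0003981150787953913
Reject the null hypothesis. At the 10% significance level, the proportion of voters supporting the bill exceeds 0.60, so the senator should vote for it.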


# Q5. Find the Hypotheses

A fair coin should land showing tails with a relative frequency of 50% in a long series of flips.

John was told by a friend that spinning a coin on a flat surface, rather than flipping it, would not be fair: spinning would make the coin more likely to land showing tails.

To test this claim, he spun his own penny 100 times. The penny showed tails in 60% of the spins.

Let p represent the proportion of spins that this penny would land showing tails.

What are appropriate hypotheses for John's significance test?

## Correct option: Null: p = 50%, Alternative: p > 50%

## Explanation:
Null Hypothesis (H0):

The null hypothesis represents the assumption that there is no difference from the expected proportion of tails for a fair coin,
i.e. H0: p = 50% (p = 0.5).

Alternative Hypothesis (H1):

The alternative hypothesis expresses the claim being tested: spinning the penny makes it more likely to land showing tails. Because the claim points in one direction, the test is one-sided,
i.e. H1: p > 50% (p > 0.5).
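The question asks only for the hypotheses and states no significance level. Still, a minimal sketch shows how the one-sided alternative translates into code. `H1: p > 0.5` corresponds to the upper tail (`stats.norm.sf`) under the normal approximation, and to `alternative='greater'` in `scipy.stats.binomtest`. The sketch assumes SciPy 1.7 or later, where `binomtest` is available.

```python
import numpy as np
from scipy import stats

tails, spins, p0 = 60, 100, 0.5  # 60% of 100 spins showed tails

# Normal-approximation z statistic for H0: p = 0.5 vs H1: p > 0.5
p_hat = tails / spins
z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / spins)
p_value_z = stats.norm.sf(z)  # upper tail because H1 is p > 0.5

# Exact binomial test with the same one-sided alternative (assumes SciPy >= 1.7)
exact = stats.binomtest(tails, spins, p=p0, alternative='greater')

print(f"z = {z:.4f}, normal-approx p-value = {p_value_z:.4f}")
print(f"exact binomial p-value = {exact.pvalue:.4f}")
```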

# Q6. Quidditch teams

The Quidditch teams at Hogwarts conducted tryouts for two positions: Chasers and Seekers.

In Group Chasers, out of 90 students who tried out, 57 were selected. In Group Seekers, out of 120 students who tried out, 98 were selected.

Is there a significant difference in the proportion of students selected for Chasers and Seekers positions?

Conduct a test at a 90% confidence level.

In [30]:

# Data for Chasers
selected_chasers = 57
total_chasers = 90

# Data for Seekers
selected_seekers = 98
total_seekers = 120

# Perform two-sample Z-proportion test
z_stat, p_value = sm.stats.proportions_ztest([selected_chasers, selected_seekers], [total_chasers, total_seekers], alternative = 'two-sided')

# Confidence level
confidence_level = 0.90
# Significance level for the two-tailed test
alpha = 1 - confidence_level

# Print the results
print(f"Z-statistic: {z_stat}")
print(f"P-value: {p_value}")

# Decision rule
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference in the proportion of students selected for Chasers and Seekers positions.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in the proportion of students selected for Chasers and Seekers positions.")

Z-statistic: -2.990306921349541
P-value: 0.002786972588958094
Reject the null hypothesis. There is a significant difference in the proportion of students selected for Chasers and Seekers positions.
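Because the question specifies a 90% confidence level, a confidence interval for the difference in selection rates is a natural companion to the p-value. The sketch below recomputes the pooled statistic by hand, assuming the pooled form that `proportions_ztest` uses for two samples. It then adds a 90% Wald interval. The Wald interval with an unpooled standard error is a choice made here, not something the original analysis specifies.

```python
import numpy as np
from scipy import stats

selected = np.array([57, 98])     # Chasers, Seekers
tried_out = np.array([90, 120])
confidence_level = 0.90

p_hat = selected / tried_out
diff = p_hat[0] - p_hat[1]

# Pooled z statistic for H0: p_chasers = p_seekers
p_pooled = selected.sum() / tried_out.sum()
se_pooled = np.sqrt(p_pooled * (1 - p_pooled) * (1 / tried_out[0] + 1 / tried_out[1]))
z_stat = diff / se_pooled
p_value = 2 * stats.norm.sf(abs(z_stat))  # two-sided p-value

# 90% Wald confidence interval for p_chasers - p_seekers (unpooled standard error)
se_unpooled = np.sqrt(np.sum(p_hat * (1 - p_hat) / tried_out))
z_crit = stats.norm.ppf(1 - (1 - confidence_level) / 2)
ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

print(f"z = {z_stat:.4f}, p = {p_value:.4f}")
print(f"90% CI for the difference: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The interval lies entirely below zero, which agrees with the rejection above: Chasers had a lower selection rate than Seekers.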


# Q7. Best Season of Naruto

As a product manager, you want to evaluate the user satisfaction for two different seasons of Naruto Shippuden (Season 1 and Season 2).

You collected feedback from 250 viewers who watched Season 1 of Naruto Shippuden, and 120 expressed satisfaction. Similarly, for Season 2, you gathered data from 300 viewers, and 150 of them expressed satisfaction.

Conduct an appropriate test at a 95% confidence level to determine whether user satisfaction is higher for Season 2 than for Season 1.

In [32]:

total = np.array([250, 300])
satisfied = np.array([120, 150])

# Perform a one-sided two-sample Z-proportion test
# H0: p_season1 >= p_season2 vs H1: p_season1 < p_season2 (Season 2 has higher satisfaction)
z_stat, p_value = sm.stats.proportions_ztest(satisfied, total, alternative='smaller')

# Confidence level
confidence_level = 0.95
# Significance level for the one-tailed test
alpha = 1 - confidence_level

# Print the results
print(f"Z-statistic: {z_stat}")
print(f"P-value: {p_value}")

# Decision rule
if p_value < alpha:
    print("Reject the null hypothesis. Season 2 has higher user satisfaction than Season 1.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence that Season 2 has higher user satisfaction than Season 1.")

Z-statistic: -0.46717659215115714
P-value: 0.3201867697265242
Fail to reject the null hypothesis. There is not enough evidence that Season 2 has higher user satisfaction than Season 1.


# Q8. Assess Customer Satisfaction

A company is conducting a survey to assess customer satisfaction with two different support approaches.

The company collects feedback from customers who experienced each approach and wants to compare the proportion of satisfied customers.

Which statistical test would be most appropriate for the company to compare the proportion of satisfied customers between the two support approaches, and what would be the relevant null hypothesis?

- One-sample z-test for mean, H0: The proportion of satisfied customers is different for the two customer support approaches.
- Two-sample z-test for mean, H0: The proportion of satisfied customers is the same for both customer support approaches.
- One-sample z-proportion test, H0: The proportion of satisfied customers is different for the two customer support approaches.
- Two-sample z-proportion test, H0: The proportion of satisfied customers is the same for both customer support approaches.

## Correct Answer: 
Two-sample z-proportion test, H0: The proportion of satisfied customers is the same for both customer support approaches.

## Explanation:

In this scenario, the company is comparing the proportion of satisfied customers between two different groups (support approaches).
Therefore, we need a statistical test that compares the proportions between two independent samples.

- One-sample z-test for mean: This is not suitable as it compares the mean of a single sample to a known mean.
- Two-sample z-test for mean: This is not applicable as we are dealing with proportions, not means.
- One-sample z-proportion test: This is only suitable for comparing the proportion of a single sample to a known proportion.
- Two-sample z-proportion test: This is the best option as it specifically compares the proportions of two independent samples.

Null Hypothesis (H0): The proportion of satisfied customers is the same for both customer support approaches.

Alternative Hypothesis (H1): The proportion of satisfied customers is different for the two customer support approaches.

By performing a two-sample z-proportion test, the company can statistically assess whether the observed difference in customer satisfaction between the two support approaches is simply due to chance or reflects a real difference in the effectiveness of the approaches.
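For reference, the pooled two-sample z-proportion statistic can be written as follows. This is the form `sm.stats.proportions_ztest` computes for two samples in Q6 and Q7. Here $x_1, x_2$ are the numbers of satisfied customers and $n_1, n_2$ the numbers of customers surveyed under each approach. Each sample proportion is $\hat{p}_i = x_i / n_i$, and $\hat{p}$ is the pooled proportion under H0:

$$
z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},
\qquad
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}
$$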