# <center> **Hypothesis Testing** </center>

### Null & Alternate Hypothesis

**Question 1:**
Suppose you are conducting a study to determine whether a new drug is effective in reducing blood pressure. Define the null hypothesis (H0) and the alternate hypothesis (H1) for this study. Explain the significance of each hypothesis in the context of your research.

**Question 2:**
You are working with a manufacturing company that claims its new manufacturing process can produce car batteries with a longer lifespan than the current process. Formulate the null and alternate hypotheses for this scenario. Additionally, explain why it's essential to have clear and well-defined hypotheses before conducting hypothesis testing.

**Question 3:**
A fast-food restaurant chain is concerned that the average delivery time for online orders has increased in recent months, potentially affecting customer satisfaction. Create the null and alternate hypotheses for testing whether there has been a significant increase in delivery times.

*These questions will help you to practice defining and understanding null and alternate hypotheses in various real-world scenarios.*

### Rejection Region Approach & Z-test Implementation

**Question 1:**
You are conducting a Z-test to determine whether the mean score of students in your class is significantly different from the national average. You have collected a sample of 30 student scores, and you know the population standard deviation is 10. Explain the Rejection Region Approach in the context of this test. Provide the formula for calculating the Z-score and describe how you would determine whether to reject the null hypothesis.
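As a rough illustration of the rejection region approach described in this question, the sketch below computes $Z = (\bar{x} - \mu_0) / (\sigma / \sqrt{n})$ and compares it with the two-sided critical value. The values $\sigma = 10$ and $n = 30$ come from the question; the sample mean, the national average, and the significance level of 0.05 are hypothetical placeholders, since the question does not supply them.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample Z-test using the rejection region approach."""
    # Test statistic: distance of the sample mean from mu0, in standard errors
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-sided critical value: alpha/2 of the probability sits in each tail
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Reject H0 when z lands in either tail beyond the critical value
    reject = abs(z) > z_crit
    return z, z_crit, reject


# sigma and n are from the question; both means and alpha are hypothetical
z, z_crit, reject = one_sample_z_test(sample_mean=78.0, mu0=75.0, sigma=10.0, n=30)
print(f"z = {z:.3f}, critical value = +/-{z_crit:.3f}, reject H0: {reject}")
```

With these placeholder numbers the statistic stays inside the critical values, so the null hypothesis would not be rejected; a different sample mean could of course change that decision.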

**Question 2:**
A pharmaceutical company is testing a new drug to reduce cholesterol levels in patients. They want to know if the new drug is more effective than the current medication. You have been given data on the cholesterol levels of 50 patients who took the new drug and 50 patients who took the current medication. Using a significance level of 0.05, perform a two-sample Z-test to determine whether the new drug produces a significantly greater reduction in cholesterol than the current medication. Show all the steps involved in the test, including setting up the null and alternate hypotheses, calculating the Z-score, and making a decision based on the Rejection Region Approach.

**Question 3:**
A car manufacturer claims that its new electric car model can travel an average of 250 miles on a single charge. To test this claim, a consumer group selects a random sample of 36 cars from the manufacturer's production line and records their mileage on a single charge. Using the Rejection Region Approach and a significance level of 0.01, conduct a Z-test to determine if there is enough evidence to reject the manufacturer's claim. Provide the step-by-step process, including the formulation of hypotheses and the calculation of the Z-score.

**Question 4:**
A researcher is investigating whether there is a significant difference in the mean salary of employees with a master's degree and those with a bachelor's degree in a particular industry. A random sample of 100 employees with master's degrees and 100 employees with bachelor's degrees is collected. Using the Rejection Region Approach and a significance level of 0.05, perform a Z-test to determine if there is a significant difference in mean salaries between the two groups. Explain how you would set up the null and alternate hypotheses and interpret the results of the test.
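For the two-group questions in this section, one possible sketch of a two-sample Z-test from summary statistics is shown below. It assumes the samples are independent and large enough for the sample standard deviations to stand in for the population values. Only the group sizes (100 each) and the significance level come from the question; the salary means and standard deviations are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """Two-sided two-sample Z-test for a difference in means."""
    # Standard error of the difference between two independent sample means
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = (mean1 - mean2) / se
    # Two-sided critical value for the chosen significance level
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit


# Hypothetical summary statistics (master's group first, bachelor's second)
z2, z2_crit, reject2 = two_sample_z_test(72000, 12000, 100, 68000, 11000, 100)
print(f"z = {z2:.3f}, critical value = +/-{z2_crit:.3f}, reject H0: {reject2}")
```

For a one-sided alternative, the critical value would instead be `NormalDist().inv_cdf(1 - alpha)` and the comparison would use `z` rather than `abs(z)`.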

### Type 1 & Type 2 Errors

**Case Study 1: Medical Testing**

**Scenario**:
A medical company has developed a new diagnostic test for a rare disease. The test is highly sensitive, meaning it rarely misses true cases of the disease (low Type II error rate). However, the company is concerned about the Type I error rate, as false positives can lead to unnecessary stress and further testing for patients.

**Assignment:**

1. Describe the potential consequences of a Type I error in this medical testing scenario.
2. Explain the potential consequences of a Type II error.
3. Discuss the trade-off between Type I and Type II errors in medical testing and how it influences the decision on test thresholds.

**Case Study 2: Criminal Justice System**

**Scenario:**
In a criminal trial, the prosecution aims to convict a defendant based on evidence. The consequences of a wrongful conviction (Type I error) are severe, but so are the consequences of letting a guilty person go free (Type II error).

**Assignment:**

1. Describe a situation in the criminal justice system where a Type I error can occur and its implications.
2. Provide an example of a situation where a Type II error can occur and its potential consequences.
3. Discuss the challenges and ethical considerations in finding a balance between minimizing Type I and Type II errors in criminal trials.

**Case Study 3: Product Quality Control**

**Scenario**:
A manufacturing company uses a quality control process to ensure that its products meet specific standards. Rejecting a good product (Type I error) can be costly, while accepting a defective product (Type II error) can harm the company's reputation.

**Assignment**:

1. Describe a scenario in product quality control where a Type I error may occur and its financial implications.
2. Provide an example of a situation where a Type II error can occur and its potential impact on the company's reputation.
3. Discuss strategies that the company can implement to strike a balance between minimizing Type I and Type II errors in its quality control process.
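The trade-off that runs through these case studies can be made concrete with a small calculation. The sketch below assumes an upper-tailed one-sample Z-test and shows how the Type II error rate ($\beta$) grows as the Type I error rate ($\alpha$) is tightened. The effect size, standard deviation, and sample size are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_rate(alpha, effect, sigma, n):
    """Beta for an upper-tailed Z-test when the true mean is mu0 + effect."""
    std_normal = NormalDist()
    # Rejection threshold on the Z scale for the chosen alpha
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, Z is centred at effect / standard error
    shift = effect / (sigma / sqrt(n))
    # Beta = probability that Z stays below the threshold despite the effect
    return std_normal.cdf(z_crit - shift)


# Hypothetical setting: true shift of 5 units, sigma = 10, n = 25
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error_rate(alpha, effect=5.0, sigma=10.0, n=25)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Holding the sample size fixed, a stricter threshold (smaller $\alpha$) yields a larger $\beta$; increasing `n` is one way to reduce both at once.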

### P-value Approach & T-test Implementation

**Case Study 1: Education**

**Scenario**:
A school district is implementing a new teaching method aimed at improving students' math scores. They want to know if the new method is effective. A random sample of 30 students who received the new instruction method is compared to a sample of 30 students who received the traditional method. The district wants to use a t-test to analyze the data.

**Assignment:**

1. Formulate the null and alternate hypotheses for this study.
2. Perform a t-test using the collected data and calculate the t-statistic.
3. Calculate the P-value and interpret its significance.
4. Based on the P-value and a significance level of 0.05, make a decision regarding the null hypothesis.

**Case Study 2: Baseball Player Performance**

**Scenario**:
You are a data analyst for a professional baseball team. The team's management wants to assess whether a new training program has led to a significant improvement in the batting averages of their players. You have collected batting average data for two groups of players: one group who underwent the new training program (Group A) and another group who did not (Group B).

**Data**:

**Group A** (New Training Program):

Sample Size: 25 players,
Mean Batting Average: 0.285,
Sample Standard Deviation: 0.040.

**Group B** (No Training Program):

Sample Size: 30 players,
Mean Batting Average: 0.270,
Sample Standard Deviation: 0.045.

**Assignment:**

1. Formulate the null and alternate hypotheses to test whether the new training program has led to a significant improvement in batting averages.
2. Conduct a two-sample t-test using the provided data. Calculate the t-statistic.
3. Calculate the degrees of freedom for the test.
4. Calculate the p-value associated with the test.
5. Interpret the results and make a recommendation to the team's management based on the p-value approach.
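One way to work through these steps from the summary statistics above is sketched below. It assumes Welch's unequal-variance form of the two-sample t-test and a one-sided alternative (Group A higher than Group B); a pooled-variance test with $n_1 + n_2 - 2$ degrees of freedom is an equally common choice and gives a slightly different statistic. To stay within the standard library, the p-value is approximated by numerically integrating the t density, which is a convenience of this sketch rather than a standard routine.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, dof):
    """Density of Student's t distribution with dof degrees of freedom."""
    log_coef = lgamma((dof + 1) / 2) - lgamma(dof / 2)
    return exp(log_coef) / sqrt(dof * pi) * (1 + x * x / dof) ** (-(dof + 1) / 2)


def t_upper_tail(t, dof, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    tail = 0.5 - total * h / 3  # 0.5 minus the area between 0 and |t|
    return tail if t >= 0 else 1 - tail


def welch_t_test(m1, s1, n1, m2, s2, n2):
    """Welch's t-statistic, degrees of freedom, and one-sided p-value."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    dof = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, dof, t_upper_tail(t, dof)


# Summary statistics from the case study: Group A first, then Group B
t_stat, dof, p_value = welch_t_test(0.285, 0.040, 25, 0.270, 0.045, 30)
print(f"t = {t_stat:.3f}, dof = {dof:.1f}, one-sided p = {p_value:.3f}")
```

The decision then follows from comparing the printed p-value with the chosen significance level: reject the null hypothesis only if the p-value falls below it.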

## 📈 Advanced Task: Linear Regression
Predicting Price using hardware features.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# `df` is the DataFrame loaded earlier in the notebook
features = ['RAM', 'Bty_Pwr', 'Int_Mem', 'Px_h', 'Px_w', 'Weight']
X = df[features]
y = df['Price']

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

print("Coefficients:", lr.coef_)
print("Intercept:", lr.intercept_)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))
```

## 🔢 Advanced Task: Principal Component Analysis (PCA)

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the features (X and y come from the regression cell above)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Project onto the first two principal components
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)

plt.figure(figsize=(8,6))
plt.scatter(components[:, 0], components[:, 1], c=y, cmap='viridis', edgecolor='k')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('PCA Projection Colored by Price')
plt.colorbar(label='Price')
plt.grid(True)
plt.show()
```

## 💡 Advanced Task: Top Configurations for Price

```python
# Top 5 price rows with key features
df.sort_values(by='Price', ascending=False)[features + ['Price']].head()
```