# Practical 4: Hypothesis Testing

## Objective:
- Formulate null and alternative hypotheses for a given problem.
- Conduct a hypothesis test using appropriate statistical tests (e.g., t-test, chi-square test).
- Interpret the results and draw conclusions based on the test outcomes.

### 1. Loading the Data
We begin by loading the `cars.csv` dataset and selecting the columns relevant for our hypothesis tests. Rows with a missing value in any of the selected columns are dropped.

In [None]:
import pandas as pd
from scipy.stats import ttest_ind, chi2_contingency

print("--- Loading Data ---")
df = pd.read_csv('cars.csv')

# Columns used by the two hypothesis tests below.
columns_to_use = [
    'Engine Information.Engine Statistics.Horsepower',
    'Engine Information.Fuel Type',
    'Identification.Make',
    'Engine Information.Engine Statistics.Cylinders',
]

# Keep only complete rows; copy() avoids modifying a view of df.
df_subset = df[columns_to_use].dropna().copy()
print("Original Data (subset):")
print(df_subset.head())
print("\n")
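Before relying on `dropna()`, it can help to see how many rows it discards and which labels a categorical column actually contains, since the later filters depend on exact string matches. The following is a minimal sketch on a small made-up frame: the values are hypothetical and not taken from `cars.csv`, and `summarize_subset` is an illustrative helper, not part of pandas.

```python
import pandas as pd

def summarize_subset(df, columns, category_col):
    """Report rows lost to dropna() and the label counts of one categorical column."""
    subset = df[columns]
    cleaned = subset.dropna()
    return {
        "rows_before": len(subset),
        "rows_after": len(cleaned),
        "missing_per_column": subset.isna().sum().to_dict(),
        "label_counts": cleaned[category_col].value_counts().to_dict(),
    }

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "Horsepower": [150, 200, None, 120, 180],
    "Fuel Type": ["Gasoline", "Diesel", "Gasoline", None, "Gasoline"],
})

summary = summarize_subset(toy, ["Horsepower", "Fuel Type"], "Fuel Type")
print(summary)
```

Applied to the real data, the same helper could be called with `df`, `columns_to_use`, and `'Engine Information.Fuel Type'` to confirm the fuel-type labels before filtering on them.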

### 2. Independent Samples T-test
We want to determine if there is a statistically significant difference in the mean horsepower of cars based on their fuel type (Gasoline vs. Diesel).

**Hypotheses:**
- **H0 (Null Hypothesis):** The mean horsepower of gasoline cars is equal to the mean horsepower of diesel cars.
- **H1 (Alternative Hypothesis):** The mean horsepower of gasoline cars is not equal to the mean horsepower of diesel cars.

In [None]:
print("--- Independent Samples T-test ---")
fuel_type_col = 'Engine Information.Fuel Type'
hp_col = 'Engine Information.Engine Statistics.Horsepower'

# Split horsepower by fuel type; the labels must match the values in the data exactly.
gasoline_hp = df_subset.loc[df_subset[fuel_type_col] == 'Gasoline', hp_col]
diesel_hp = df_subset.loc[df_subset[fuel_type_col] == 'Diesel', hp_col]

# An empty group would make the test return NaN, so report the group sizes first.
print(f"Group sizes: gasoline={len(gasoline_hp)}, diesel={len(diesel_hp)}")

# Welch's t-test: does not assume equal variances in the two groups.
t_stat, p_value = ttest_ind(gasoline_hp, diesel_hp, equal_var=False)

print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

alpha = 0.05  # significance level
if p_value < alpha:
    print("Conclusion: Reject the null hypothesis.")
    print("There is a statistically significant difference in horsepower between gasoline and diesel cars.")
else:
    print("Conclusion: Fail to reject the null hypothesis.")
    print("There is no statistically significant difference in horsepower between gasoline and diesel cars.")
print("\n")
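To make the `equal_var=False` option less of a black box, the sketch below computes Welch's t-statistic and the Welch-Satterthwaite degrees of freedom by hand and compares the statistic with `ttest_ind`. The horsepower values are hypothetical and `welch_t` is an illustrative helper, not a SciPy function.

```python
import math
import statistics
from scipy.stats import ttest_ind

def welch_t(sample_a, sample_b):
    """Return Welch's t-statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Squared standard error contributed by each group.
    se_a, se_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, dof

# Hypothetical horsepower values, for illustration only.
group_a = [150.0, 165.0, 180.0, 200.0, 210.0, 175.0]
group_b = [140.0, 150.0, 155.0, 160.0]

t_manual, dof_manual = welch_t(group_a, group_b)
result = ttest_ind(group_a, group_b, equal_var=False)

print(f"manual t = {t_manual:.4f}, dof = {dof_manual:.2f}")
print(f"scipy  t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```

The two t-statistics should agree up to floating-point error. The degrees of freedom are generally not an integer, which is expected for Welch's test.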

### 3. Chi-square Test of Independence
Here, we want to check if there is a significant association between two categorical variables: the car's make and its number of cylinders.

**Hypotheses:**
- **H0 (Null Hypothesis):** There is no association between the car make and the number of cylinders (they are independent).
- **H1 (Alternative Hypothesis):** There is an association between the car make and the number of cylinders (they are dependent).

In [None]:
print("--- Chi-square Test of Independence ---")
make_col = 'Identification.Make'
cylinders_col = 'Engine Information.Engine Statistics.Cylinders'
alpha = 0.05  # same significance level as in the t-test; repeated so this cell runs on its own

# To make the test meaningful, consider only the five most frequent car makes.
top_makes = df_subset[make_col].value_counts().nlargest(5).index
df_filtered_makes = df_subset[df_subset[make_col].isin(top_makes)]

# Create a contingency table of observed counts (rows: make, columns: cylinders).
contingency_table = pd.crosstab(df_filtered_makes[make_col], df_filtered_makes[cylinders_col])
print("Contingency Table (Make vs. Cylinders):")
print(contingency_table)
print("\n")

# `expected` holds the counts expected under independence.
chi2, p_value_chi, dof, expected = chi2_contingency(contingency_table)

print(f"Chi-square statistic: {chi2:.4f}")
print(f"P-value: {p_value_chi:.4f}")
print(f"Degrees of freedom: {dof}")

if p_value_chi < alpha:
    print("Conclusion: Reject the null hypothesis.")
    print("There is a statistically significant association between car make and the number of cylinders.")
else:
    print("Conclusion: Fail to reject the null hypothesis.")
    print("There is no statistically significant association between car make and the number of cylinders.")
print("\n")

print("--- Practical 4 execution finished ---")
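The chi-square approximation is usually considered reliable only when the expected counts are not too small; a common rule of thumb asks for at least 5 in most cells. A p-value also says nothing about how strong an association is. The sketch below checks the expected counts returned by `chi2_contingency` and computes Cramér's V as an effect size. The table is hypothetical, and both the threshold check and Cramér's V are additions for illustration, not steps from the practical itself.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical make-by-cylinders counts, for illustration only.
observed = np.array([
    [30, 20, 5],
    [25, 25, 10],
    [10, 30, 20],
])

chi2, p_value, dof, expected = chi2_contingency(observed)

# Share of cells whose expected count falls below the rule-of-thumb threshold of 5.
low_expected_share = (expected < 5).mean()

# Cramér's V: effect size between 0 and 1, from chi2, sample size, and table shape.
n = observed.sum()
min_dim = min(observed.shape) - 1
cramers_v = np.sqrt(chi2 / (n * min_dim))

print(f"chi2 = {chi2:.4f}, dof = {dof}, p = {p_value:.4f}")
print(f"cells with expected count < 5: {low_expected_share:.0%}")
print(f"Cramér's V = {cramers_v:.3f}")
```

With the real data, `contingency_table.to_numpy()` could be passed in place of `observed`. If many cells have small expected counts, merging rare cylinder categories is one option to consider.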