# Welch Test
## Independent-samples t-test without the equal-variance assumption

### Here's a brief note on Welch's t-test:

Welch's t-test, also known as the unequal variances t-test, is a statistical test used to compare the means of two independent groups without assuming that the groups have equal variances. It is an alternative to Student's independent-samples t-test, which assumes equal variances and combines them into a single pooled estimate (the pooled t-test).

### Key points about Welch's t-test:

**Unequal Variances:** Welch's t-test is designed for situations where the variances of the two groups being compared may differ.

**Degrees of Freedom:** Welch's t-test approximates the degrees of freedom with the Welch-Satterthwaite equation. This equation weights each group's variance by its sample size, and its result is usually not a whole number.

**Test Statistic:** Welch's t-statistic is the difference between the two sample means divided by a standard error. That standard error is built from each group's own variance rather than from a pooled variance.
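For reference, the standard textbook forms of the statistic and of the Welch-Satterthwaite degrees of freedom are given below. Here $\bar{x}_i$, $s_i^2$ and $n_i$ denote the sample mean, the sample variance (denominator $n_i - 1$) and the size of group $i$. Under the null hypothesis of equal means, $t$ is compared with a t distribution with $\nu$ degrees of freedom.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
$$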

**Assumptions:** Welch's t-test does not assume equal variances or equal sample sizes. It still assumes that observations are independent and that the data within each group are approximately normally distributed. With larger samples, the test tolerates moderate departures from normality.
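One common way to check the normality assumption is a Shapiro-Wilk test on each group, for example with `scipy.stats.shapiro`. The sketch below runs it on synthetic normal samples. The names `group_a` and `group_b` and the distribution parameters are illustrative only; this is not the `bloodg` data. With large samples such tests flag even trivial departures from normality, so a histogram or Q-Q plot is a useful companion.

```python
import numpy as np
from scipy.stats import shapiro

# Synthetic stand-in data (NOT the bloodg dataset): two normal samples
# with different spreads.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=100, scale=9.6, size=200)
group_b = rng.normal(loc=110, scale=11.4, size=200)

alpha = 0.05
results = {}
for name, sample in {"group_a": group_a, "group_b": group_b}.items():
    # H0 of Shapiro-Wilk: the sample comes from a normal distribution
    stat, p = shapiro(sample)
    results[name] = p
    verdict = "no evidence against normality" if p >= alpha else "departs from normality"
    print(f"{name}: W = {stat:.4f}, p = {p:.4f} ({verdict})")
```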

**Interpretation:** Like other t-tests, Welch's t-test produces a p-value that indicates the statistical significance of the difference between the means of the two groups. If the p-value is below a predetermined significance level (e.g., 0.05), the difference between the group means is considered statistically significant at that level.
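The minimal sketch below makes the link between the statistic, the degrees of freedom and the p-value concrete. It computes Welch's test by hand with NumPy and `scipy.stats.t`, then cross-checks the result against `scipy.stats.ttest_ind(..., equal_var=False)`. The helper `welch_t_test` and the synthetic samples are hypothetical and serve only as an illustration.

```python
import numpy as np
from scipy import stats


def welch_t_test(x, y):
    """Return (t, df, two-sided p) for Welch's t-test, computed by hand."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Squared standard error of each mean, using the sample variance (ddof=1)
    se2_x = x.var(ddof=1) / x.size
    se2_y = y.var(ddof=1) / y.size
    t = (x.mean() - y.mean()) / np.sqrt(se2_x + se2_y)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (x.size - 1) + se2_y ** 2 / (y.size - 1))
    # Two-sided p-value: probability of a |T| at least this large under H0
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p


# Synthetic samples with unequal sizes and spreads (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(100, 9.6, size=40)
y = rng.normal(104, 15.0, size=25)

t, df, p = welch_t_test(x, y)
ref = stats.ttest_ind(x, y, equal_var=False)
print(f"manual: t = {t:.4f}, df = {df:.1f}, p = {p:.4g}")
print(f"scipy:  t = {ref.statistic:.4f}, p = {ref.pvalue:.4g}")
```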

**When to Use:** Welch's t-test is recommended when the variances are unequal or unknown. It matters most when the groups also differ in size, because the pooled t-test is least reliable when unequal variances occur together with unequal group sizes.
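The sketch below shows why unequal group sizes matter. It uses made-up normal samples: a small group with a large spread and a large group with a small spread. It runs `ttest_ind` twice, once with `equal_var=True` (pooled) and once with `equal_var=False` (Welch). The sample names and parameters are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic example: a small, high-variance group versus a large,
# low-variance group (illustrative numbers only).
rng = np.random.default_rng(1)
small_noisy = rng.normal(loc=105, scale=20, size=15)
large_quiet = rng.normal(loc=100, scale=5, size=120)

pooled = ttest_ind(small_noisy, large_quiet, equal_var=True)   # Student's (pooled) t-test
welch = ttest_ind(small_noisy, large_quiet, equal_var=False)   # Welch's t-test

print(f"pooled: t = {pooled.statistic:.3f}, p = {pooled.pvalue:.4f}")
print(f"Welch:  t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
```

Both tests share the same numerator, the difference in sample means, and differ only in the standard error. In this setting the pooled variance is dominated by the large, low-variance group. The pooled standard error therefore comes out too small, and its t-statistic is inflated relative to Welch's.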

Overall, Welch's t-test is a reliable way to compare the means of two independent groups when equal variances cannot be assumed. When the variances differ, it keeps the false-positive rate close to the chosen significance level, which the pooled t-test may not. When the variances happen to be equal, its results are very close to those of the pooled test.

## Here is an outline of the steps we'll cover:

* Importing the necessary libraries
* Loading the dataset
* Data preprocessing (if needed)
* Formulating the null and alternative hypotheses
* Choosing the appropriate t-test based on the study design
* Conducting the t-test
* Interpreting the results and making conclusions

Let's perform Welch's t-test on the `bloodg` dataset (`bloodg.csv`).

### Here's the code with explanations for each analytical step

### Step 1: Import the necessary libraries
* We'll import the required libraries: numpy, pandas, `ttest_ind` from scipy.stats, and os (to set the working directory).

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind
import os
```

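```python
# Change the working directory to the folder that contains the dataset
# (replace this path with the folder on your own machine)
os.chdir("C:\\Users\\HP\\Desktop\\JITSOLUTIONS\\Datasets0")
```

```python
# Confirm the current working directory
os.getcwd()
```

```
'C:\\Users\\HP\\Desktop\\JITSOLUTIONS\\Datasets0'
```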

________________

### Step 2: Load the dataset
* If your dataset is a CSV file, load it into a pandas DataFrame with the `read_csv()` function.
* If your dataset is an Excel (.xlsx) file, use the `read_excel()` function instead (this requires the openpyxl package).

```python
# Import the dataset
df = pd.read_csv("bloodg.csv")
```

```python
df.head()
```

|   | group | blood_pressure |
|---|---|---|
| 0 | 1 | 94.949597 |
| 1 | 1 | 90.409626 |
| 2 | 1 | 95.462942 |
| 3 | 1 | 95.711593 |
| 4 | 1 | 105.650325 |


### Step 3: Formulating the null and alternative hypotheses
Define the null hypothesis (H0) and alternative hypothesis (H1) based on your research question. These hypotheses should be specific to your analysis.

```python
# Hypotheses for this example (mu = mean blood pressure):
# H0: mu_1 = mu_2   (the mean blood pressure of group 1 and group 2 is the same)
# H1: mu_1 != mu_2  (the mean blood pressure of group 1 and group 2 differs)
```
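These hypotheses are two-sided. If the research question is directional, for example that group 1 has a *lower* mean blood pressure than group 2, then H1 becomes one-sided, and the direction should be fixed before looking at the data. In SciPy 1.6 and later, `ttest_ind` accepts an `alternative` argument for this. The sketch below compares both forms on synthetic samples, which are hypothetical and not the `bloodg` data.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic stand-in samples (NOT the bloodg data)
rng = np.random.default_rng(7)
sample_1 = rng.normal(100, 9.6, size=50)
sample_2 = rng.normal(105, 11.4, size=50)

# Two-sided H1: mu_1 != mu_2 (the default, alternative="two-sided")
two_sided = ttest_ind(sample_1, sample_2, equal_var=False)
# One-sided H1: mu_1 < mu_2 (the `alternative` argument needs SciPy >= 1.6)
one_sided = ttest_ind(sample_1, sample_2, equal_var=False, alternative="less")

print(f"t = {two_sided.statistic:.3f}")
print(f"two-sided p = {two_sided.pvalue:.4g}")
print(f"one-sided p = {one_sided.pvalue:.4g}")
```

When the t-statistic points in the hypothesised direction (negative here), the one-sided p-value is half the two-sided one.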

```python
# Extract the blood pressure values for each of the two independent groups
group_1 = df.loc[df['group'] == 1, 'blood_pressure']
group_2 = df.loc[df['group'] == 2, 'blood_pressure']
```
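An alternative to writing one boolean mask per group is `groupby`. It also makes it easy to confirm that the grouping column contains exactly the two labels you expect. The sketch below uses a tiny in-memory CSV with the same column names as `bloodg.csv`. The values and the names `df_demo`, `samples`, `group_1_demo` and `group_2_demo` are invented for illustration.

```python
import io

import pandas as pd

# Tiny in-memory CSV with the same columns as bloodg.csv (invented values)
csv_text = """group,blood_pressure
1,95.0
1,91.0
1,96.0
2,112.0
2,118.0
2,134.0
"""
df_demo = pd.read_csv(io.StringIO(csv_text))

# One NumPy array of blood pressures per group label
samples = {label: s.to_numpy() for label, s in df_demo.groupby("group")["blood_pressure"]}
assert len(samples) == 2, "expected exactly two groups"

group_1_demo, group_2_demo = samples[1], samples[2]
print({label: len(values) for label, values in samples.items()})
```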

```python
df1 = pd.DataFrame(group_1)
df2 = pd.DataFrame(group_2)
```

```python
df1.head()
```

|   | blood_pressure |
|---|---|
| 0 | 94.949597 |
| 1 | 90.409626 |
| 2 | 95.462942 |
| 3 | 95.711593 |
| 4 | 105.650325 |


```python
df1.describe().transpose()
```

|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| blood_pressure | 200.0 | 100.256984 | 9.602504 | 77.571338 | 94.124365 | 100.17083 | 106.50145 | 140.434395 |


```python
df2.head()
```

|   | blood_pressure |
|---|---|
| 200 | 112.047931 |
| 201 | 117.970208 |
| 202 | 133.632701 |
| 203 | 101.544808 |
| 204 | 121.880076 |


```python
df2.describe().transpose()
```

|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| blood_pressure | 200.0 | 110.147865 | 11.377692 | 79.919855 | 102.625807 | 109.404953 | 117.723177 | 144.026969 |
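Welch's statistic depends only on each group's mean, standard deviation and size. The test can therefore be cross-checked directly from the two `describe()` tables with `scipy.stats.ttest_ind_from_stats`. pandas' `std` is the sample standard deviation (ddof=1), which matches what `ttest_ind` uses internally.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics copied from the two describe() tables
result = ttest_ind_from_stats(
    mean1=100.256984, std1=9.602504, nobs1=200,
    mean2=110.147865, std2=11.377692, nobs2=200,
    equal_var=False,  # Welch's t-test
)
print(f"t = {result.statistic:.4f}, p = {result.pvalue:.3g}")
```

The summary values are rounded to six decimals, so the result should agree with the `ttest_ind` output in Step 4 to about four decimal places.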


### Step 4: Conduct Welch's independent-samples t-test
* Carry out Levene's test for homogeneity of variance.
* Use the `ttest_ind` function from the scipy.stats module to perform the independent-samples t-test.
* Pass `group_1` and `group_2` as arguments and set `equal_var=False` to request Welch's version of the test.

#### Levene's Test:

* Levene's test is a statistical test that examines the equality of variances between groups.
* It tests the null hypothesis that the variances are equal.
* You can use the levene() function from the scipy.stats module to perform Levene's test.
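By default, SciPy's `levene()` centers each group on its median (`center='median'`). This is the Brown-Forsythe variant, which is more robust to skewed data; `center='mean'` gives Levene's original test. The minimal sketch below compares the two on synthetic samples, shown next to the plain ratio of sample variances. The samples `a` and `b` are hypothetical.

```python
import numpy as np
from scipy.stats import levene

# Synthetic samples with different spreads (illustrative, not the bloodg data)
rng = np.random.default_rng(3)
a = rng.normal(loc=100, scale=9.6, size=200)
b = rng.normal(loc=110, scale=11.4, size=200)

# center="median" (SciPy's default) is the Brown-Forsythe variant;
# center="mean" is Levene's original formulation.
results = {center: levene(a, b, center=center) for center in ("median", "mean")}
for center, res in results.items():
    print(f"center={center!r}: W = {res.statistic:.3f}, p = {res.pvalue:.4f}")

# Ratio of sample variances (ddof=1), a simple descriptive companion to the test
ratio = b.var(ddof=1) / a.var(ddof=1)
print(f"variance ratio (b / a) = {ratio:.2f}")
```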

```python
from scipy.stats import levene
```

```python
# Perform Levene's test (H0: the two groups have equal variances).
# SciPy's default center='median' is the Brown-Forsythe variant.
levene_stat, levene_p = levene(group_1, group_2)

alpha = 0.05

if levene_p < alpha:
    print("The variances are significantly different. Use Welch's t-test or another appropriate variant.")
else:
    print("The variances are not significantly different. You can consider using the pooled t-test if other assumptions are met.")
```

```
The variances are significantly different. Use Welch's t-test or another appropriate variant.
```


```python
# Perform the independent-samples t-test without assuming equal variances (Welch's t-test)
t_statistic, p_value = ttest_ind(group_1, group_2, equal_var=False)
```

```python
print(f"t_statistic: {t_statistic}")
print(f"p_value: {p_value}")
```

```
t_statistic: -9.395207080912995
p_value: 5.03969621392083e-19
```


### Step 5: Interpret the results and make conclusions
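* Compare the p-value with a significance level (α) chosen before running the test to decide whether the result is statistically significant.
* Print the results and state a conclusion based on the outcome of the test.
* Report the direction as well. `ttest_ind` computes the statistic as group 1 minus group 2, so the negative t-statistic here means that group 2 has the higher mean blood pressure (about 110.1 versus 100.3 in the summary tables).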
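A p-value says whether the difference is statistically significant, but not how large it is. A confidence interval for the difference in means adds that information and is built from the same Welch standard error and degrees of freedom. For the `bloodg` data the point estimate is about -9.89 (100.256984 - 110.147865 from the summary tables). The helper `welch_ci` below is a hypothetical sketch, run here on synthetic samples.

```python
import numpy as np
from scipy import stats


def welch_ci(x, y, confidence=0.95):
    """Confidence interval for mean(x) - mean(y) without assuming equal variances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    se2_x = x.var(ddof=1) / x.size
    se2_y = y.var(ddof=1) / y.size
    se = np.sqrt(se2_x + se2_y)
    # Welch-Satterthwaite degrees of freedom, as in the test itself
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (x.size - 1) + se2_y ** 2 / (y.size - 1))
    diff = x.mean() - y.mean()
    # Two-sided critical value, e.g. the 0.975 quantile for a 95% interval
    t_crit = stats.t.ppf(0.5 + confidence / 2, df)
    return diff, (diff - t_crit * se, diff + t_crit * se)


# Synthetic samples (illustrative, not the bloodg data)
rng = np.random.default_rng(5)
x = rng.normal(100, 9.6, size=200)
y = rng.normal(110, 11.4, size=200)

diff, (lo, hi) = welch_ci(x, y)
print(f"difference = {diff:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Because the interval uses the same standard error and degrees of freedom as the test, the 95% interval excludes zero exactly when the two-sided Welch p-value is below 0.05.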

```python
alpha = 0.05

if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference between the mean blood_pressure of group 1 and group 2.")
else:
    print("Fail to reject the null hypothesis. There is no significant evidence of a difference between the mean blood_pressure of group 1 and group 2.")
```

```
Reject the null hypothesis. There is a significant difference between the mean blood_pressure of group 1 and group 2.
```
