## Statistical_Parametric_T_Tests

### Problem Statement: Analyzing Factors Influencing Subscription to a Term Deposit

A bank has collected data on various attributes of its clients and their interactions with the bank's marketing campaigns. The bank is interested in understanding the factors that influence whether a client subscribes to a term deposit or not. The bank wants to perform statistical tests to analyze these factors and gain insights into the data.

#### Objective:

The objective of this analysis is to identify significant differences and relationships between various attributes and the likelihood of a client subscribing to a term deposit. This can provide insights into which attributes play a key role in influencing the clients' decision to subscribe, helping the bank tailor its marketing strategies and improve its campaign success rate.

#### Analysis:

To achieve this objective, we will perform three different types of t-tests on the provided dataset: one-sample t-test, two-sample t-test, and paired sample t-test. Each test will target specific attributes and relationships to determine their statistical significance with respect to the subscription outcome. By conducting these tests, we can uncover patterns, trends, and key factors that impact the clients' decisions regarding term deposit subscriptions.

#### Data Attributes:

Age: The age of the clients.

Job: The type of job the client has.

Marital: The marital status of the client.

Education: The level of education of the client.

Default: Whether the client has credit in default.

Balance: The average yearly balance in euros.

Housing: Whether the client has a housing loan.

Loan: Whether the client has a personal loan.

Contact: The communication type used for the last contact.

Day: The last contact day of the month.

Month: The last contact month of the year.

Duration: The duration of the last contact in seconds.

Campaign: The number of contacts performed during the current campaign.

Pdays: The number of days since the client was last contacted from a previous campaign.

Previous: The number of contacts performed before the current campaign.

Poutcome: The outcome of the previous marketing campaign.

Y: Whether the client subscribed to a term deposit. In the loaded dataset this column is named Target.

In [1]:
# Import the required libraries
import pandas as pd
import numpy as np
from scipy import stats 

In [2]:
data = pd.read_csv(r"C:\Users\venka\OneDrive\Documents\Datasets\bank_dataset.csv")
data

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,Target
0,58,management,married,tertiary,no,2143,yes,no,unknown,5,may,261,1,-1,0,unknown,no
1,44,technician,single,secondary,no,29,yes,no,unknown,5,may,151,1,-1,0,unknown,no
2,33,entrepreneur,married,secondary,no,2,yes,yes,unknown,5,may,76,1,-1,0,unknown,no
3,47,blue-collar,married,unknown,no,1506,yes,no,unknown,5,may,92,1,-1,0,unknown,no
4,33,unknown,single,unknown,no,1,no,no,unknown,5,may,198,1,-1,0,unknown,no
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45206,51,technician,married,tertiary,no,825,no,no,cellular,17,nov,977,3,-1,0,unknown,yes
45207,71,retired,divorced,primary,no,1729,no,no,cellular,17,nov,456,2,-1,0,unknown,yes
45208,72,retired,married,secondary,no,5715,no,no,cellular,17,nov,1127,5,184,3,success,yes
45209,57,blue-collar,married,secondary,no,668,no,no,telephone,17,nov,508,4,-1,0,unknown,no


### Proposed Hypothesis For One_Sample_T-Test:

I want to test whether the average age of my data sample is different from 40 years.

#### Null Hypothesis (H0): 

The average age of the population is equal to 40 years.
    
#### Alternative Hypothesis (Ha): 

The average age of the population is not equal to 40 years.

Mathematically, this can be expressed as:

H0: μ = 40

Ha: μ ≠ 40

Here, μ represents the population mean age.

After performing the one-sample t-test on the "age" column, if the resulting p-value is less than the chosen significance level (often 0.05), we reject the null hypothesis. This would indicate that there is enough evidence that the average age of the population differs from 40 years. If the p-value is greater than or equal to the significance level, we fail to reject the null hypothesis, meaning there is not enough evidence to claim that the average age differs from 40 years.
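To make the test statistic concrete, here is a minimal sketch that computes the one-sample t-statistic by hand and compares it with `stats.ttest_1samp`. The array `sample_ages` is a handful of illustrative ages, not the full dataset, and the variable names are chosen only for this example.

```python
import numpy as np
from scipy import stats

# Illustrative ages only (an assumption for this sketch, not the full dataset)
sample_ages = np.array([58, 44, 33, 47, 33, 51, 71, 72, 57], dtype=float)
hypothesized_mean = 40.0  # value of mu under H0

n = sample_ages.size
sample_mean = sample_ages.mean()
sample_std = sample_ages.std(ddof=1)      # sample standard deviation (n - 1 in the denominator)
standard_error = sample_std / np.sqrt(n)  # standard error of the mean

# t = (sample mean - hypothesized mean) / standard error
t_manual = (sample_mean - hypothesized_mean) / standard_error
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(sample_ages, hypothesized_mean)
print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

Both lines should print the same statistic and p-value, which shows that `ttest_1samp` is the sample mean's distance from the hypothesized mean, measured in standard errors.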

In [3]:
# Extract relevant columns for each test
# One-Sample T-Test: Using 'age' column for this example
one_sample_data = data['age']

In [4]:
# Mean of 'age' computed from the data (despite the variable name, this is the sample mean)
cal_population_mean = data["age"].mean()
cal_population_mean

40.93621021432837

### One_Sample T-Test

In [5]:
population_mean = 40.014  # Hypothesized mean under H0; note this cell uses 40.014 rather than the 40 stated in the hypothesis
t_statistic, p_value = stats.ttest_1samp(one_sample_data, population_mean)
print("*" * 40)
print("One-Sample T-Test:")
print("t-statistic:", t_statistic)
print("p-value:", p_value)
print("*" * 40)

****************************************
One-Sample T-Test:
t-statistic: 18.46622385678433
p-value: 7.349675256922722e-76
****************************************


The p-value is approximately 7.35e-76, that is, 7.35 multiplied by 10 raised to the power of -76, which is an extremely small value.

In hypothesis testing, a small p-value indicates strong evidence against the null hypothesis. The decision rule is:

If p-value < alpha (usually 0.05): Reject the null hypothesis.

If p-value >= alpha (usually 0.05): Fail to reject the null hypothesis.

Here the p-value is far below common significance levels such as 0.05 or 0.01, so we reject the null hypothesis in favor of the alternative: the sample provides strong evidence that the population mean age differs from the hypothesized value of 40.014 used in the test. With more than 45,000 rows, even the difference of less than one year between the sample mean (40.94) and the hypothesized value is statistically significant, so the result by itself says little about practical importance.
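The decision rule above can be written as a small helper. This is only a sketch: the function name `decide` and its default `alpha` are assumptions made for illustration, not part of SciPy.

```python
def decide(p_value, alpha=0.05):
    """Return the textbook decision for a p-value at significance level alpha."""
    # Hypothetical helper: reject H0 only when the p-value is strictly below alpha
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(7.349675256922722e-76))  # p-value from the one-sample test
print(decide(0.20))                   # a large p-value
print(decide(0.03, alpha=0.01))       # same rule with a stricter alpha
```

Changing `alpha` changes the decision but not the p-value, so the significance level should be fixed before looking at the result.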

### Proposed Hypothesis For Two_Sample_T-Test:

I want to perform a two-sample t-test to compare the average balances of two groups: clients with personal loans and clients without personal loans. Here's how I can propose the hypotheses:

#### Null Hypothesis (H0): 

The average balance of clients with personal loans is equal to the average balance of clients without personal loans.

#### Alternative Hypothesis (Ha): 

The average balance of clients with personal loans is different from the average balance of clients without personal loans.

Mathematically, this can be expressed as:

H0: μ1 = μ2

Ha: μ1 ≠ μ2

Here, μ1 represents the population mean balance of clients with personal loans, and μ2 represents the population mean balance of clients without personal loans.

After performing the two-sample t-test, if the resulting p-value is less than your chosen significance level (often 0.05), you may reject the null hypothesis. This would indicate that there is enough evidence to suggest that the average balance of clients with personal loans is significantly different from the average balance of clients without personal loans.

Conversely, if the p-value is greater than your significance level, you would fail to reject the null hypothesis, indicating that you do not have enough evidence to claim a significant difference in average balances between the two groups.
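As a sketch of the comparison these hypotheses describe, the two groups can be built by filtering on the `loan` column so that they do not overlap. The small DataFrame `toy` below is hypothetical and only mirrors the column names of the bank data. Passing `equal_var=False` (Welch's t-test) is an assumption of this sketch, chosen because the two groups may have different variances.

```python
import pandas as pd
from scipy import stats

# Hypothetical miniature dataset with the same column names as the bank data
toy = pd.DataFrame({
    "loan": ["yes", "no", "no", "yes", "no", "yes", "no", "no"],
    "balance": [29, 2143, 1506, 2, 825, 668, 1729, 5715],
})

# Two non-overlapping groups: clients with and without a personal loan
with_loan = toy.loc[toy["loan"] == "yes", "balance"]
without_loan = toy.loc[toy["loan"] == "no", "balance"]

# equal_var=False requests Welch's t-test, which does not assume equal variances
t_stat, p_val = stats.ttest_ind(with_loan, without_loan, equal_var=False)
print("t-statistic:", t_stat)
print("p-value:", p_val)
```

The sign of the statistic follows the argument order: it is negative when the first group (`with_loan`) has the lower mean.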

### Two_Sample T-Test

In [6]:
# Two-Sample T-Test on the 'balance' column
# Note: as written, group_A holds the balances of ALL clients (with and without loans),
# while group_B holds only the balances of clients with a personal loan
group_A = data['balance']
group_B = data[data['loan'] == 'yes']['balance']

In [7]:
# Perform Two-Sample T-Test
t_statistic, p_value = stats.ttest_ind(group_A, group_B)
print("*" * 40)
print("Two-Sample T-Test:")
print("t-statistic:", t_statistic)
print("p-value:", p_value)
print("*" * 40)

****************************************
Two-Sample T-Test:
t-statistic: 15.941484422605951
p-value: 4.448233894975193e-57
****************************************


t-Statistic: The t-statistic is 15.94. It is the difference between the two sample means divided by the standard error of that difference, so a larger absolute value indicates a difference that is large relative to the sampling variability.

p-Value: The p-value is 4.45e-57, which is extremely small. A p-value this small means that a difference at least as large as the one observed would be very unlikely if the null hypothesis were true.
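The relationship between the mean difference and the variability can be sketched by computing the pooled-variance t-statistic by hand. The arrays `a` and `b` are hypothetical values for illustration. With its default `equal_var=True`, `stats.ttest_ind` uses the same pooled formula.

```python
import numpy as np
from scipy import stats

# Hypothetical balances for two independent groups
a = np.array([2143, 1506, 825, 1729, 5715], dtype=float)
b = np.array([29, 2, 668], dtype=float)

n_a, n_b = a.size, b.size
# Pooled variance: weighted average of the two sample variances
pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
# Standard error of the difference between the two means
standard_error = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_manual = (a.mean() - b.mean()) / standard_error
t_scipy, p_scipy = stats.ttest_ind(a, b)  # equal_var=True by default
print("manual t:", t_manual)
print("scipy t: ", t_scipy)
```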

Conclusion:

The p-value is far below a typical significance level (e.g., 0.05), so we reject the null hypothesis: the difference in mean balance between the two groups is highly statistically significant.

Note that, as coded, group_A contains the balances of all clients, including those with personal loans. The test therefore compares clients with personal loans against the full client base rather than against clients without personal loans. The positive t-statistic indicates that the mean balance in group_A is higher than the mean balance of clients with personal loans.

### Proposed Hypothesis For Paired_Sample_T-Test:

In a paired sample t-test, you typically compare the same group of subjects or items before and after a treatment, condition, or event. In this case, I want to perform a paired sample t-test using the 'previous' and 'campaign' columns, which record, for each client, the number of contacts before the current campaign and during it. Let's propose the hypotheses:

#### Null Hypothesis (H0): 

The mean of 'previous' is equal to the mean of 'campaign', that is, the mean of the paired differences is zero.

#### Alternative Hypothesis (Ha): 

The mean of 'previous' is different from the mean of 'campaign', that is, the mean of the paired differences is not zero.

Mathematically, this can be expressed as:

H0: μ_previous = μ_campaign

Ha: μ_previous ≠ μ_campaign

Here, μ_previous represents the population mean of the 'previous' column, and μ_campaign represents the population mean of the 'campaign' column.

After performing the paired sample t-test, if the resulting p-value is less than your chosen significance level (often 0.05), you may reject the null hypothesis. This would indicate that there is enough evidence to suggest that there is a significant difference between the mean values of 'previous' and 'campaign', implying that some change or effect has occurred.

Conversely, if the p-value is greater than your significance level, you would fail to reject the null hypothesis, suggesting that you do not have enough evidence to claim a significant difference between the mean values of the two columns.
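A paired t-test is equivalent to a one-sample t-test on the per-row differences against zero. The sketch below checks this on two short hypothetical arrays, `before` and `after`, which stand in for paired columns.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (one pair per subject)
before = np.array([0, 0, 3, 1, 2, 0, 4], dtype=float)
after = np.array([1, 2, 5, 1, 3, 4, 2], dtype=float)

# Paired test on the two columns
t_rel, p_rel = stats.ttest_rel(after, before)

# One-sample test on the differences, with hypothesized mean 0
differences = after - before
t_diff, p_diff = stats.ttest_1samp(differences, 0.0)

print("paired:     ", t_rel, p_rel)
print("differences:", t_diff, p_diff)
```

Both lines should print the same values, which is why the pairing (same client in both columns) matters: the test looks only at within-pair differences.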

In [8]:
# Paired Sample T-Test: Using 'previous' and 'campaign' columns for this example
pre_test = data['previous']
post_test = data['campaign']

In [9]:
# Perform Paired Sample T-Test
# 'differences' is kept for inspection only; ttest_rel computes the paired differences internally
differences = post_test - pre_test
t_statistic, p_value = stats.ttest_rel(post_test, pre_test)
print("*" * 40)
print("Paired Sample T-Test:")
print("t-statistic:", t_statistic)
print("p-value:", p_value)
print("*" * 40)

****************************************
Paired Sample T-Test:
t-statistic: 118.41495518727946
p-value: 0.0
****************************************


t-Statistic: The t-statistic is 118.41. It is the mean of the paired differences ('campaign' minus 'previous') divided by the standard error of those differences. The positive sign means that 'campaign' is larger than 'previous' on average.

p-Value: The p-value is reported as 0.0. It is not exactly zero: it is smaller than the smallest number a floating-point value can represent, so it underflows to 0.0. Such a value means the observed mean difference would be extremely unlikely if the null hypothesis were true.
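A p-value printed as 0.0 is a floating-point underflow, not a true zero. The rough sketch below illustrates this by working on the log scale. It assumes a standard normal approximation to the t distribution, which is reasonable for the bulk of the distribution with this many rows but understates the far tail, so the resulting exponent is indicative only.

```python
import numpy as np
from scipy import stats

t_value = 118.41495518727946  # t-statistic reported by the paired test

# Direct two-sided tail probability under the normal approximation: underflows to 0.0
p_two_sided = 2 * stats.norm.sf(t_value)

# Same quantity on the log scale, which stays finite
log_p = np.log(2) + stats.norm.logsf(t_value)
log10_p = log_p / np.log(10)

print("direct p-value:", p_two_sided)
print("log10 of p-value (normal approximation):", log10_p)
```

The log-scale figure shows that the p-value lies far below the smallest positive double (about 5e-324), which is why it is displayed as 0.0.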

Conclusion:

The p-value is effectively zero, which is very strong evidence against the null hypothesis. We therefore reject it and conclude that the mean number of contacts during the current campaign ('campaign') differs significantly from the mean number of contacts before it ('previous').