# Shapiro-Wilk Test 

The Shapiro-Wilk test is a statistical test used to determine whether a sample comes from a normally distributed population. It is commonly used to assess the normality assumption in parametric testing.

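Test Statistic:

The Shapiro-Wilk test produces a test statistic W, which measures how well the sample distribution matches a normal distribution. W lies between 0 and 1: values close to 1 are consistent with normality, while smaller values indicate a departure from it.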
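To get a feel for how W behaves, the sketch below compares W on a simulated normal sample and on a strongly skewed (exponential) sample. The seed, the sample size of 500, and the choice of distributions are illustrative assumptions and are not part of the worked example in this notebook.

```python
import numpy as np
from scipy.stats import shapiro

# Assumed setup: seeded generator and n = 500, chosen only for illustration
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=500)
skewed_sample = rng.exponential(scale=1.0, size=500)

w_normal, p_normal = shapiro(normal_sample)
w_skewed, p_skewed = shapiro(skewed_sample)

# W near 1 indicates good agreement with normality; lower W indicates departure
print(f"normal sample: W={w_normal:.3f}, p={p_normal:.3f}")
print(f"skewed sample: W={w_skewed:.3f}, p={p_skewed:.3g}")
```

The skewed sample should give a visibly lower W and a very small p-value, while the normal sample should give W close to 1.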

Null Hypothesis (H0):
 
The sample is drawn from a normally distributed population.

Alternative Hypothesis (H1): 

The sample is not drawn from a normally distributed population.

Decision Rule:

If the p-value is less than the chosen significance level (e.g., 0.05), reject the null hypothesis (conclude that the data does not follow a normal distribution).

If the p-value is greater than or equal to the chosen significance level, fail to reject the null hypothesis: there is insufficient evidence to conclude that the data is not normally distributed. This does not prove that the data is normal.

In [2]:
#import library
from scipy.stats import shapiro

In [6]:
sample_data=[12,15,14,10,13,14,11,13,12,15]

In [17]:
sample_data

[12, 15, 14, 10, 13, 14, 11, 13, 12, 15]

In [7]:
data = sample_data  # pass the 1-D sample directly; wrapping it in another list is unnecessary
stat, p = shapiro(data)
print('Statistics=%.3f, p=%.3f' % (stat, p))

Statistics=0.948, p=0.646


In [8]:
if p > 0.05:
    print('Sample looks Gaussian (fail to reject H0)')
else:
    print('Sample does not look Gaussian (reject H0)')


Sample looks Gaussian (fail to reject H0)


# Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov (K-S) test is a non-parametric test used to determine if a sample comes from a specified distribution or to compare two samples to assess whether they come from the same distribution. It is widely used due to its versatility and minimal assumptions.

The test statistic D measures the maximum absolute difference between the empirical distribution function (EDF) of the sample and the cumulative distribution function (CDF) of the specified distribution (one-sample) or between the EDFs of two samples (two-sample).
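As a rough illustration of the one-sample definition of D, the sketch below computes the statistic by hand and compares it with `kstest`. It reuses the notebook's sample and assumes a normal reference distribution whose mean and standard deviation are estimated from that sample. The variable names (`d_plus`, `d_minus`, `d_manual`) are illustrative.

```python
import numpy as np
from scipy.stats import kstest, norm

data = [12, 15, 14, 10, 13, 14, 11, 13, 12, 15]
x = np.sort(np.asarray(data, dtype=float))
n = len(x)

# Assumed reference distribution: normal with parameters estimated from the sample
mean, std = x.mean(), x.std()
cdf = norm.cdf(x, loc=mean, scale=std)

# The EDF is a step function, so check the gap just after and just before each step
d_plus = np.max(np.arange(1, n + 1) / n - cdf)   # EDF above the CDF
d_minus = np.max(cdf - np.arange(0, n) / n)      # CDF above the EDF
d_manual = max(d_plus, d_minus)

d_scipy = kstest(x, 'norm', args=(mean, std)).statistic
print(round(d_manual, 4), round(d_scipy, 4))
```

Both values should agree, since the largest gap between a step function and a continuous CDF can only occur at a data point.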

One-Sample K-S Test: To test if a sample comes from a specific distribution (e.g., normal, exponential).

Two-Sample K-S Test: To compare two samples to determine if they come from the same distribution.
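For the two-sample case, D is the largest gap between the two EDFs. The sketch below evaluates both EDFs at every pooled data point and compares the result with `ks_2samp`. The helper `edf` is a hypothetical function written for this illustration, not a SciPy API.

```python
import numpy as np
from scipy.stats import ks_2samp

data1 = [12, 15, 14, 10, 13]
data2 = [14, 13, 15, 12, 11]

def edf(sample, points):
    """Hypothetical helper: fraction of sample values <= each point."""
    sample = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(sample, points, side='right') / len(sample)

# Both EDFs only change at observed values, so the pooled sample covers every candidate
pooled = np.sort(np.concatenate([data1, data2]))
d_manual = np.max(np.abs(edf(data1, pooled) - edf(data2, pooled)))

d_scipy = ks_2samp(data1, data2).statistic
print(d_manual, d_scipy)
```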

Null Hypothesis (H0): 

For the one-sample test, the sample follows the specified distribution. For the two-sample test, the two samples come from the same distribution.

Alternative Hypothesis (H1): 

For the one-sample test, the sample does not follow the specified distribution. For the two-sample test, the two samples come from different distributions.

Interpretation:

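If the p-value is greater than the significance level (e.g., 0.05), we fail to reject the null hypothesis: there is no significant evidence that the sample departs from the specified distribution (one-sample) or that the two samples differ in distribution (two-sample). If the p-value is smaller, we reject the null hypothesis.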

# K-S test for one sample

In [19]:
data = [12, 15, 14, 10, 13, 14, 11, 13, 12, 15]

In [20]:
data

[12, 15, 14, 10, 13, 14, 11, 13, 12, 15]

In [18]:
#import library
import numpy as np
from scipy.stats import kstest

In [21]:
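# Estimate the reference normal's parameters from the sample (np.std uses ddof=0).
# Because they are estimated from the same data, the reported p-value is
# larger than the true one, i.e. the test is conservative.
mean = np.mean(data)
std = np.std(data)
result = kstest(data, 'norm', args=(mean, std))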

In [25]:
# Print the result
print("Statistic:", round(result.statistic,4))
print("p-value:", round(result.pvalue,4))

Statistic: 0.1571
p-value: 0.9347


In [35]:
p=result.pvalue

In [39]:
if p > 0.05:
    print('Sample looks Gaussian (fail to reject H0)')
else:
    print('Sample does not look Gaussian (reject H0)')

Sample looks Gaussian (fail to reject H0)


# K-S test for two samples

In [40]:
#import library
from scipy.stats import ks_2samp

In [41]:
data1 = [12, 15, 14, 10, 13]
data2 = [14, 13, 15, 12, 11]

In [42]:
data1

[12, 15, 14, 10, 13]

In [43]:
data2

[14, 13, 15, 12, 11]

In [44]:
# Perform the two-sample K-S test
result = ks_2samp(data1, data2)

In [45]:
print("Statistic:", result.statistic)
print("p-value:", result.pvalue)

Statistic: 0.2
p-value: 1.0


In [46]:
p=result.pvalue

In [47]:
if p > 0.05:
    print('Samples look like they come from the same distribution (fail to reject H0)')
else:
    print('Samples look like they come from different distributions (reject H0)')

Samples look like they come from the same distribution (fail to reject H0)


# Anderson-Darling Test

The Anderson-Darling test is a statistical test used to determine if a sample of data comes from a specific distribution. It is a more powerful alternative to the Kolmogorov-Smirnov test, particularly for detecting deviations in the tails of the distribution.

The test statistic $A^2$ measures the deviation of the empirical cumulative distribution function (EDF) from the specified cumulative distribution function (CDF).
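To make the statistic concrete, the sketch below computes it by hand for a normal reference distribution using the standard formula $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(y_i) + \ln\left(1 - F(y_{n+1-i})\right)\right]$, where $y_1 \le \dots \le y_n$ is the sorted sample and $F$ is the fitted normal CDF. It reuses the small integer sample from earlier in the notebook. Standardizing with the `ddof=1` standard deviation is an assumption about how `anderson` fits the normal distribution internally.

```python
import numpy as np
from scipy.stats import anderson, norm

data = [12, 15, 14, 10, 13, 14, 11, 13, 12, 15]
y = np.sort(np.asarray(data, dtype=float))
n = len(y)

# Standardize with the sample mean and the ddof=1 standard deviation
# (assumption: this mirrors the fit SciPy uses for dist='norm')
z = (y - y.mean()) / y.std(ddof=1)
log_cdf = norm.logcdf(z)   # ln F(y_i)
log_sf = norm.logsf(z)     # ln(1 - F(y_i))

# Reversing log_sf pairs y_i with y_(n+1-i), as the formula requires
i = np.arange(1, n + 1)
a2_manual = -n - np.sum((2 * i - 1) / n * (log_cdf + log_sf[::-1]))

a2_scipy = anderson(data, dist='norm').statistic
print(a2_manual, a2_scipy)
```

The `(2i - 1)` weights combined with the logarithms put more emphasis on the smallest and largest observations, which is why the test is sensitive to deviations in the tails.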

Null Hypothesis (H0): The sample comes from the specified distribution.

Alternative Hypothesis (H1): The sample does not come from the specified distribution.

Interpreting the Results:

Compare the test statistic to the critical values at various significance levels.

If the test statistic is greater than a critical value, the null hypothesis is rejected at that significance level, indicating that the sample does not come from the specified distribution.

If the test statistic is less than all the critical values, there is no evidence to reject the null hypothesis at any of the listed significance levels; the data are consistent with the specified distribution, although this does not prove that they come from it.

In [48]:
#import library
import numpy as np
from scipy.stats import anderson

In [50]:
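# 20 draws from the uniform distribution on [0, 1); no seed is set, so values differ between runs
data = np.random.sample(20)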

In [59]:
np.round(data,2)

array([0.88, 0.32, 0.91, 0.2 , 0.3 , 0.95, 0.1 , 0.81, 0.98, 0.19, 0.  ,
       0.65, 0.15, 0.56, 0.63, 0.22, 0.66, 0.34, 0.05, 0.29])

In [60]:
# Note: data1 is defined here but the test below runs on `data` (the uniform sample)
data1 = [12, 15, 14, 10, 13, 14, 11, 13, 12, 15]

In [61]:
# Perform the Anderson-Darling test for normality
result = anderson(data, dist='norm')

In [63]:
print("Statistic:", result.statistic)
print("Critical Values:", result.critical_values)
print("Significance Levels:", result.significance_level)

Statistic: 0.666216528861181
Critical Values: [0.506 0.577 0.692 0.807 0.96 ]
Significance Levels: [15.  10.   5.   2.5  1. ]


In [68]:
for cv, sl in zip(result.critical_values, result.significance_level):
    if result.statistic > cv:
        print(f"At the {sl}% significance level, the null hypothesis is rejected.")
    else:
        print(f"At the {sl}% significance level, the null hypothesis is not rejected.")

At the 15.0% significance level, the null hypothesis is rejected.
At the 10.0% significance level, the null hypothesis is rejected.
At the 5.0% significance level, the null hypothesis is not rejected.
At the 2.5% significance level, the null hypothesis is not rejected.
At the 1.0% significance level, the null hypothesis is not rejected.
