<a href="https://colab.research.google.com/github/AnjulaMehto/Sampling/blob/main/CheckGoodnessofaSample.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Check Goodness of a Sample

In [None]:
import numpy as np
import scipy.stats as stats

def check_goodness_of_sample(sample):
    # Calculate central tendency measures
    sample_mean = np.mean(sample)
    sample_median = np.median(sample)
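    # np.atleast_1d handles both older SciPy (array result) and newer SciPy (scalar result)
    sample_mode = np.atleast_1d(stats.mode(sample).mode)[0]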
    
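    # Calculate variability measures
    # Note: np.var and np.std default to ddof=0 (divide by n);
    # pass ddof=1 for the usual sample (n - 1) estimates.
    sample_range = np.ptp(sample)
    sample_var = np.var(sample)
    sample_std = np.std(sample)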
    
    # Calculate skewness and kurtosis
    sample_skewness = stats.skew(sample)
    sample_kurtosis = stats.kurtosis(sample)
    
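    # Calculate the 95% confidence interval for the mean (t-distribution).
    # The confidence level is passed positionally because the `alpha` keyword
    # was renamed to `confidence` in newer SciPy releases.
    n = len(sample)
    sample_confidence_interval = stats.t.interval(0.95, df=n - 1, loc=sample_mean, scale=sample_std / np.sqrt(n))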
    
    # Print results
    print("Sample Mean: ", sample_mean)
    print("Sample Median: ", sample_median)
    print("Sample Mode: ", sample_mode)
    print("Sample Range: ", sample_range)
    print("Sample Variance: ", sample_var)
    print("Sample Standard Deviation: ", sample_std)
    print("Sample Skewness: ", sample_skewness)
    print("Sample Kurtosis: ", sample_kurtosis)
    print("Sample Confidence Interval: ", sample_confidence_interval)

# Example usage
sample = [50, 49, 50, 51, 49, 50, 51, 50, 49, 51]
check_goodness_of_sample(sample)

Sample Mean:  50.0
Sample Median:  50.0
Sample Mode:  50
Sample Range:  2
Sample Variance:  0.6
Sample Standard Deviation:  0.7745966692414834
Sample Skewness:  0.0
Sample Kurtosis:  -1.3333333333333333
Sample Confidence Interval:  (49.44588692333024, 50.55411307666976)
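
The variance, standard deviation, and confidence interval above are computed with NumPy's default `ddof=0`, which divides by n. A minimal sketch of the n - 1 variant follows, assuming the goal is a conventional t-interval for the mean. The variable names (`var_unbiased`, `sem`, `ci_low`, `ci_high`) are illustrative and not part of the original function.

```python
import numpy as np
from scipy import stats

sample = [50, 49, 50, 51, 49, 50, 51, 50, 49, 51]
n = len(sample)
mean = np.mean(sample)

# ddof=1 divides by (n - 1) instead of n
var_unbiased = np.var(sample, ddof=1)
std_unbiased = np.std(sample, ddof=1)

# stats.sem uses ddof=1 by default, so it equals std_unbiased / sqrt(n)
sem = stats.sem(sample)

# 95% t-interval for the mean; confidence level passed positionally
ci_low, ci_high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)

print("Sample variance (ddof=1): ", var_unbiased)
print("Sample standard deviation (ddof=1): ", std_unbiased)
print("95% confidence interval: ", (ci_low, ci_high))
```

With only ten observations the interval is slightly wider than the one printed above, because the n - 1 standard deviation is larger than the n version.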


# Normality Test to Check the Goodness of a Sample

# Shapiro-Wilk Test
The Shapiro-Wilk test is a commonly used statistical test for checking whether a data set is consistent with a normal distribution. In Python it is available as `scipy.stats.shapiro`.

The null hypothesis of the Shapiro-Wilk test is that the data were drawn from a normal distribution.

The p-value determines whether the null hypothesis can be rejected. If the p-value is less than the significance level (usually 0.05), the null hypothesis is rejected and the data are concluded not to be normally distributed. If the p-value is greater than the significance level, the null hypothesis cannot be rejected: the data are consistent with a normal distribution, although this does not prove that they are normal.
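
The decision rule described above can be sketched as a small helper. The function name `shapiro_decision` and its `alpha` parameter are hypothetical, introduced only for illustration; `scipy.stats.shapiro` is the real SciPy call.

```python
from scipy import stats

def shapiro_decision(data, alpha=0.05):
    """Run the Shapiro-Wilk test and report whether normality is rejected at level alpha."""
    stat, p = stats.shapiro(data)
    # Reject the null hypothesis of normality only when p falls below alpha
    return stat, p, bool(p < alpha)

data = [1, 2, 5, 6, 2, 4, 3, 3]
stat, p_value, reject_normality = shapiro_decision(data)

if reject_normality:
    print("Reject H0: the data do not look normally distributed (p=%.3f)" % p_value)
else:
    print("Fail to reject H0: the data are consistent with normality (p=%.3f)" % p_value)
```

Returning the boolean alongside the statistic and p-value keeps the significance level explicit, so the same data can be checked at a stricter or looser `alpha` without rerunning by hand.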

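Note that the Shapiro-Wilk test is sensitive to sample size. With very large samples, even small and practically unimportant departures from normality tend to produce small p-values, and SciPy's documentation warns that the p-value may not be accurate for more than 5000 observations. Conversely, with very small samples the test has little power to detect non-normality. For large datasets, other normality tests such as the Anderson-Darling or Lilliefors tests may be used.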
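
To illustrate the effect of sample size, the sketch below applies the Shapiro-Wilk test to a small and a large sample from the same heavy-tailed distribution. The choice of a Student's t distribution with 5 degrees of freedom, the sample sizes, and the seed are all assumptions made for this illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Both samples come from the same non-normal (heavy-tailed) distribution
small_sample = rng.standard_t(5, size=20)
large_sample = rng.standard_t(5, size=4000)

_, p_small = stats.shapiro(small_sample)
_, p_large = stats.shapiro(large_sample)

print("n=20:   p=%.3g" % p_small)
print("n=4000: p=%.3g" % p_large)
```

The large sample gives the test enough information to detect the heavy tails, whereas a sample of 20 points from the same distribution may well pass as normal.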

In [1]:
import scipy.stats as stats
data = [1,2,5,6,2,4,3,3]
stat, p = stats.shapiro(data)
print('Statistics=%.3f, p=%.3f' % (stat, p))

Statistics=0.959, p=0.801


# Anderson-Darling Test

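The Anderson-Darling test is another test for normality and is considered one of the more sensitive tests.

It gives more weight to the tails of the distribution than the Kolmogorov-Smirnov test, so it is well suited to detecting deviations from normality in the tails. Its power is generally comparable to that of the Shapiro-Wilk test. The test statistic is based on the weighted squared differences between the empirical cumulative distribution function of the sample and the cumulative distribution function of the hypothesized distribution.

In Python, the test is available as `scipy.stats.anderson`. Instead of a p-value, it returns the test statistic together with critical values at several significance levels (given in percent); normality is rejected at a given level when the statistic exceeds the corresponding critical value.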
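
To make the statistic concrete, the sketch below computes the Anderson-Darling statistic by hand for the normal case and compares it with SciPy's result. It assumes the standard computational formula, with the mean and standard deviation (n - 1 version) estimated from the data. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

data = np.array([1, 2, 5, 6, 2, 4, 3, 3], dtype=float)
n = data.size

# Standardize the sorted data using the sample mean and ddof=1 standard deviation
z = (np.sort(data) - data.mean()) / data.std(ddof=1)

# A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) * [ln F(z_i) + ln(1 - F(z_{n+1-i}))]
# where F is the standard normal CDF and z_1 <= ... <= z_n
i = np.arange(1, n + 1)
log_cdf = stats.norm.logcdf(z)   # ln F(z_i)
log_sf = stats.norm.logsf(z)     # ln(1 - F(z_i))
a2_manual = -n - np.sum((2 * i - 1) / n * (log_cdf + log_sf[::-1]))

a2_scipy = stats.anderson(data).statistic

print("Manual A^2: ", a2_manual)
print("SciPy A^2:  ", a2_scipy)
```

The logarithms grow large in magnitude when an observation sits far out in either tail, which is why tail deviations influence this statistic so strongly.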

In [4]:
import numpy as np
from scipy.stats import anderson

data = [1,2,5,6,2,4,3,3]
result = anderson(data)
print("Anderson-Darling test statistic: ", result.statistic)

# Compare the statistic with the critical value at each significance level (in percent)
for sl, cv in zip(result.significance_level, result.critical_values):
    if result.statistic < cv:
        print("Sample looks normal with significance level: ", sl)
    else:
        print("Sample does not look normal with significance level: ", sl)

Anderson-Darling test statistic:  0.22455957924563208
Sample looks normal with significance level:  15.0
Sample looks normal with significance level:  10.0
Sample looks normal with significance level:  5.0
Sample looks normal with significance level:  2.5
Sample looks normal with significance level:  1.0
