We import the libraries needed to analyze the dataset.

In [28]:
import numpy as np              #import numpy with the alias np
import matplotlib.pyplot as plt #import matplotlib.pyplot with the alias plt
import pandas as pd             #import pandas with the alias pd
import seaborn as sns           #import seaborn with the alias sns
import scipy.stats as ss

We read the CSV file, then compute the mean and the confidence intervals (CIs) "manually":

1. Choose a feature.
2. Iterate over the classes.
3. Compute the mean for that class and feature.
4. Compute the standard deviation for that class and feature.
5. Iterate over the confidence levels.
6. Compute the critical t value using the ppf function.
7. Compute the standard error of the mean.
8. Compute the confidence interval.
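
In formula form, the steps above amount to the two-sided t interval for the mean. The symbols are notation introduced here, not taken from the notebook: n is the number of samples in the class, \bar{x} the sample mean, s the standard deviation, and 1 - \alpha the confidence level.

```latex
\bar{x} \;\pm\; t_{1-\alpha/2,\; n-1} \, \frac{s}{\sqrt{n}}
```

The quantile t_{1-\alpha/2, n-1} corresponds to the ppf call evaluated at (1 + confidence) / 2 with n - 1 degrees of freedom.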

In [29]:
iris = pd.read_csv("iris.csv")

species = iris['species'].unique()
confidenceLevels = [0.95, 0.99, 0.999]

for s in species:
    print("Specie: " + s)
    petal_length = iris[iris['species']==s]['petal_length']
    mean = np.mean(petal_length)
    standard = np.std(petal_length)  # population std (ddof=0); pass ddof=1 for the sample std

    print("mean -> " + str(mean))
    print("std -> " + str(standard))
    
    for confidence in confidenceLevels:

        #Two-sided interval: take the upper-tail quantile so that t is positive
        t = ss.t.ppf((1 + confidence) / 2, len(petal_length) - 1)

        sem = standard / np.sqrt(len(petal_length))
        ci_lower = mean - t * sem
        ci_upper = mean + t * sem

        print("ci " + str(confidence) + " -> (", str(ci_lower) + ", " + str(ci_upper))
print("\n")

Specie: Iris-setosa
mean -> 1.464
std -> 0.17176728442867112
ci 0.95 -> ( 1.415184277888331, 1.512815722111669
ci 0.99 -> ( 1.3988997796149678, 1.5291002203850321
ci 0.999 -> ( 1.378968773352899, 1.549031226647101
Specie: Iris-versicolor
mean -> 4.26
std -> 0.4651881339845203
ci 0.95 -> ( 4.1277949951076724, 4.392205004892327
ci 0.99 -> ( 4.08369256087605, 4.43630743912395
ci 0.999 -> ( 4.02971448442031, 4.490285515579689
Specie: Iris-virginica
mean -> 5.5520000000000005
std -> 0.546347874526844
ci 0.95 -> ( 5.396729652052696, 5.707270347947305
ci 0.99 -> ( 5.344932820956594, 5.759067179043407
ci 0.999 -> ( 5.2815373790091815, 5.8224626209908195




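Now we compute the same intervals using ss.t.interval(). The bounds differ slightly from the manual ones because ss.sem() uses the sample standard deviation (ddof=1), whereas np.std() defaults to ddof=0.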

In [30]:
confidenceLevels = [0.05, 0.01, 0.001]  # these are significance levels (alpha); confidence = 1 - alpha

for s in species:
    print("Specie: " + s)
    petal_length = iris[iris['species']==s]['petal_length']
    mean = np.mean(petal_length)
    standard = np.std(petal_length)

    print("mean -> " + str(mean))
    print("std -> " + str(standard))
    
    for conf in confidenceLevels:

        sem = ss.sem(petal_length)

        ci = ss.t.interval(
            confidence = 1 - conf,
            df = len(petal_length) - 1,
            loc = mean,
            scale = sem
        )
        

        # Label with the level actually used (1 - conf), not the stale `confidence` from the previous cell
        print("ci " + str(1 - conf) + " -> (", str(ci[0]) + ", " + str(ci[1]))
print("\n")

Specie: Iris-setosa
mean -> 1.464
std -> 0.17176728442867112
ci 0.95 -> ( 1.414688674094744, 1.513311325905256
ci 0.99 -> ( 1.39823884672715, 1.52976115327285
ci 0.999 -> ( 1.378105490036036, 1.5498945099639647
Specie: Iris-versicolor
mean -> 4.26
std -> 0.4651881339845203
ci 0.95 -> ( 4.126452777905478, 4.393547222094521
ci 0.99 -> ( 4.081902591745458, 4.438097408254541
ci 0.999 -> ( 4.027376500463661, 4.492623499536341
Specie: Iris-virginica
mean -> 5.5520000000000005
std -> 0.546347874526844
ci 0.95 -> ( 5.395153262927524, 5.708846737072477
ci 0.99 -> ( 5.342830562196055, 5.761169437803946
ci 0.999 -> ( 5.278791495199867, 5.825208504800137
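
The manual formula and ss.t.interval() should give identical bounds when both use the same standard deviation estimator. The following is a minimal sketch of that check, assuming synthetic data in place of iris.csv; `manual_ci` is a hypothetical helper name, not part of the notebook.

```python
import numpy as np
import scipy.stats as ss

def manual_ci(x, confidence):
    """Two-sided t interval for the mean, using the sample std (ddof=1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sem = np.std(x, ddof=1) / np.sqrt(n)            # same estimator as ss.sem
    t_crit = ss.t.ppf((1 + confidence) / 2, n - 1)  # upper-tail quantile
    return x.mean() - t_crit * sem, x.mean() + t_crit * sem

# Synthetic stand-in for one class of petal lengths (assumption, not the iris data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=1.5, scale=0.2, size=50)

for confidence in (0.95, 0.99, 0.999):
    lo, hi = manual_ci(sample, confidence)
    ref = ss.t.interval(confidence, df=sample.size - 1,
                        loc=sample.mean(), scale=ss.sem(sample))
    print(confidence, (lo, hi), "matches t.interval:", np.allclose((lo, hi), ref))
```

Passing the confidence level positionally to ss.t.interval() sidesteps the keyword name, which changed between SciPy versions.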




Write the code to conduct the following hypothesis test, using the Shapiro-Wilk test and the
Anderson-Darling test, for all the features K and classes J.

• Null hypothesis H0: Feature K from class J comes from a Gaussian distribution at the significance
level α.

Note: use the shapiro() and anderson() functions from scipy.stats.

For each test, complete the corresponding table with the decisions (acceptance/rejection) for the null
hypothesis H0 (feature Gaussianity), and the p-value or the critical and statistic values, respectively,
for α = 0.05 and α = 0.01.
Explain the meaning of the p-value / critical value and interpret the results accordingly. 
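
One possible way to sketch the decision rules for both tests is shown here. It assumes synthetic samples instead of the iris features, and `shapiro_decision` and `anderson_decision` are hypothetical helper names. Shapiro-Wilk rejects H0 when the p-value is below α. Anderson-Darling rejects H0 when the statistic exceeds the critical value tabulated for α.

```python
import numpy as np
import scipy.stats as ss

def shapiro_decision(x, alpha):
    """Shapiro-Wilk: reject H0 (Gaussianity) when p-value < alpha."""
    stat, p_value = ss.shapiro(x)
    return {"statistic": float(stat), "p_value": float(p_value),
            "reject_H0": bool(p_value < alpha)}

def anderson_decision(x, alpha):
    """Anderson-Darling: reject H0 when the statistic exceeds the critical value."""
    result = ss.anderson(x, dist="norm")
    # SciPy reports the tabulated significance levels in percent (e.g. 5.0, 1.0)
    idx = int(np.argmin(np.abs(result.significance_level - 100 * alpha)))
    critical = float(result.critical_values[idx])
    return {"statistic": float(result.statistic), "critical": critical,
            "reject_H0": bool(result.statistic > critical)}

# Synthetic samples (assumption): one Gaussian, one clearly skewed
rng = np.random.default_rng(1)
gaussian = rng.normal(loc=4.0, scale=0.5, size=50)
skewed = rng.exponential(scale=1.0, size=200)

for name, data in (("gaussian", gaussian), ("skewed", skewed)):
    for alpha in (0.05, 0.01):
        print(name, alpha, shapiro_decision(data, alpha), anderson_decision(data, alpha))
```

For the iris data, the same two helpers could be called inside a loop over species and feature columns to fill one table row per (feature, class) pair.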
Table for Shapiro-Wilk test