<font size=7> T-Tests

<font color="red">

**To Do**
* Read wiki article on T-test
* Read one sample t-test article
* put in data for first example
* run T test

# Summary

This notebook implements T-tests on the data collected from students who took one of the two classes.  This is for an education paper being submitted to eNeuro. The reviewers asked us to perform some statistical analysis on our survey data. An example of a survey question from the Math of Mind class is shown below. 



<div><img src="example_student_response.png" width="500"/></div>


We are implementing a one-sample T-test.

The question we are attempting to answer is, "In which survey questions are student responses statistically different from neutral?" To get analyzable distributions, we code student responses, from "Strongly Disagree" to "Strongly Agree," as integers from -2 to 2. We then run a one-sample T-test on each question to test for significance. The null hypothesis is that the mean response is 0 (neutral). The test does not need a separate "null" data set: it compares the sample mean to 0 in units of the standard error, and the spread under the null is estimated from the standard deviation of the corresponding survey data. This approach is somewhat hand-wavy, but it is probably sufficient for this paper.
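
As a minimal sketch of what the one-sample T-test computes, the statistic is t = (mean - mu0) / (s / sqrt(n)), where s is the sample standard deviation (`ddof=1`) and mu0 = 0 is the neutral response. The snippet below rebuilds the first question's responses from the counts used in the example data of this notebook and checks the hand computation against `scipy.stats.ttest_1samp`. The variable names (`counts`, `responses`, `t_manual`, `p_manual`) are illustrative only.

```python
import numpy as np
from scipy import stats

# Response codes and counts for the first survey question
# (same values as the example data in this notebook)
survey_range = np.array([-2, -1, 0, 1, 2])
counts = np.array([16, 6, 2, 0, 0])
responses = np.repeat(survey_range, counts)

mu0 = 0.0                              # null hypothesis: mean response is neutral
n = len(responses)
sample_mean = responses.mean()
sample_sd = responses.std(ddof=1)      # sample SD (n - 1 in the denominator)
std_err = sample_sd / np.sqrt(n)
t_manual = (sample_mean - mu0) / std_err

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result = stats.ttest_1samp(responses, mu0)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```

The hand-computed statistic and p-value should agree with the SciPy result up to floating-point error.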

**References**
* [T-Test wiki](https://en.wikipedia.org/wiki/Student%27s_t-test)
* [1 sample T-test description](https://www.jmp.com/en_us/statistics-knowledge-portal/t-test/one-sample-t-test.html)

# <font color="orchid"> T-Tests

## <font color="orchid"> Set up Data and code

import packages for analysis

In [2]:
# import packages for analysis
from scipy import stats
import numpy as np
from matplotlib import pyplot as plt

set general parameters

In [4]:
num_students_math = 24 # number of students in the math of mind class
num_students_bio = 10 # number of students in the high school class
survey_range = np.array([-2,-1,0,1,2]) # range of survey responses from "strongly disagree" to "strongly agree"

example data

In [5]:
questions = ["I am interested in the application of mathematics in biological processes", "fake data"]

survey_results = np.array([[16,6,2,0,0], [6,4,4,4,6]])

process data for t-test

In [6]:
# create one array per question by repeating each value in survey_range by the corresponding count in survey_results
survey_data =[]
for i in range(len(survey_results)):
    survey_data.append( np.repeat(survey_range, survey_results[i]) )
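
To make the expansion step concrete, here is a small sketch of how `np.repeat` turns per-category counts into one entry per student. The `counts` row is the "fake data" row of `survey_results`; the name `expanded` is illustrative.

```python
import numpy as np

survey_range = np.array([-2, -1, 0, 1, 2])
counts = np.array([6, 4, 4, 4, 6])   # the "fake data" row of survey_results

# Each response code is repeated as many times as it was chosen
expanded = np.repeat(survey_range, counts)

print(expanded)
print(len(expanded) == counts.sum())   # one entry per respondent
# The mean of the expanded data equals the count-weighted mean of the codes
print(np.isclose(expanded.mean(), np.average(survey_range, weights=counts)))
```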

function to run t-test

In [9]:
# write a function that prints summary statistics, a one-sample T-test, and a Wilcoxon signed-rank test for each question
def t_test(data, questions):
    for i in range(len(questions)):
        print(questions[i])
        print("Mean: ", np.mean(data[i]))
        print("Standard Deviation: ", np.std(data[i]))
        print("Standard Error: ", np.std(data[i])/np.sqrt(len(data[i])))
        print("Median: ", np.median(data[i]))
        print("Mode: ", stats.mode(data[i])[0] )
        print("Range: ", np.max(data[i]) - np.min(data[i]))
        print("Variance: ", np.var(data[i]))
        print("Skewness: ", stats.skew(data[i]))
        print("Kurtosis: ", stats.kurtosis(data[i]))
        print("T-Test: ", stats.ttest_1samp(data[i], 0))
        print("Wilcox: ", stats.wilcoxon(data[i], zero_method='wilcox', correction=False))
        print("")
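
One detail worth checking in the summary statistics: `np.std` defaults to `ddof=0` (the population formula), whereas `stats.ttest_1samp` uses the sample standard deviation (`ddof=1`). The sketch below, using the first question's counts, suggests that the printed "Standard Error" therefore differs slightly from the one that enters the t statistic. Whether to report the `ddof=1` version is a judgment call for the paper; the names `se_population` and `se_sample` are illustrative.

```python
import numpy as np
from scipy import stats

# First survey question, expanded from its response counts
data = np.repeat(np.array([-2, -1, 0, 1, 2]), np.array([16, 6, 2, 0, 0]))
n = len(data)

se_population = np.std(data) / np.sqrt(n)        # ddof=0, as printed by t_test
se_sample = np.std(data, ddof=1) / np.sqrt(n)    # ddof=1, as used by the t-test

t_stat = stats.ttest_1samp(data, 0).statistic

print(se_population, se_sample)
print(np.mean(data) / se_sample, t_stat)   # these two agree
print(np.mean(data) / se_population)       # slightly larger in magnitude
```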


## <font color="orchid">Run Code


run t-test

In [10]:
t_test(survey_data, questions)

I am interested in the application of mathematics in biological processes
Mean:  -1.5833333333333333
Standard Deviation:  0.6400954789890506
Standard Error:  0.13065894251546376
Median:  -2.0
Mode:  -2
Range:  2
Variance:  0.40972222222222215
Skewness:  1.2665836424236083
Kurtosis:  0.4119505889112327
T-Test:  TtestResult(statistic=-11.862917582084185, pvalue=2.778826173484555e-11, df=23)
P-Value:  2.778826173484555e-11

fake data
Mean:  0.0
Standard Deviation:  1.5275252316519468
Standard Error:  0.3118047822311618
Median:  0.0
Mode:  -2
Range:  4
Variance:  2.3333333333333335
Skewness:  0.0
Kurtosis:  -1.4693877551020411
T-Test:  TtestResult(statistic=0.0, pvalue=1.0, df=23)
P-Value:  1.0



# <font color="green"> Wilcoxon Signed-Rank Test

We also run the one-sample Wilcoxon signed-rank test against a median of 0, so that we have a non-parametric test alongside the T-test.

**References**
* [Wilcoxon signed-rank test wiki](https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test)
* [Video Tutorial](https://www.youtube.com/watch?v=PGiXtngX3YQ)

In [12]:
survey_data

[array([-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -1,
        -1, -1, -1, -1, -1,  0,  0]),
 array([-2, -2, -2, -2, -2, -2, -1, -1, -1, -1,  0,  0,  0,  0,  1,  1,  1,
         1,  2,  2,  2,  2,  2,  2])]

In [14]:
# run the wilcoxon signed rank test, one-sample, median 0,  on the survey_data
stats.wilcoxon(survey_data[1], zero_method='wilcox', correction=False)


WilcoxonResult(statistic=105.0, pvalue=1.0)
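
The statistic of 105.0 can be reproduced by hand, which may help when explaining the test in the paper. The sketch below assumes the standard two-sided procedure: drop zeros (`zero_method='wilcox'`), rank the absolute values with ties sharing the average rank, sum the ranks of the positive and negative responses separately, and take the smaller sum. The names `w_plus`, `w_minus`, and `w_manual` are illustrative.

```python
import numpy as np
from scipy import stats

# The "fake data" question, expanded from its response counts
data = np.repeat(np.array([-2, -1, 0, 1, 2]), np.array([6, 4, 4, 4, 6]))

nonzero = data[data != 0]                  # zero_method='wilcox' discards zeros
ranks = stats.rankdata(np.abs(nonzero))    # ties receive the average rank
w_plus = ranks[nonzero > 0].sum()          # rank sum of positive responses
w_minus = ranks[nonzero < 0].sum()         # rank sum of negative responses
w_manual = min(w_plus, w_minus)            # two-sided test statistic

w_scipy = stats.wilcoxon(data, zero_method='wilcox', correction=False).statistic
print(w_plus, w_minus, w_manual, w_scipy)
```

Because the fake data are symmetric around 0, the positive and negative rank sums are equal, which is why the test reports no evidence against a median of 0.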