<font size=7> Test Statistics

# <font color="peru">Summary

This notebook runs t-tests on the survey data collected from students who took one of the two classes. The analysis is for an education paper being submitted to eNeuro; the reviewers asked us to perform some statistical analysis on our survey data. An example of a survey question from the Math of Mind class is shown below.



<div><img src="example_student_response.png" width="500"/></div>

We are implementing a one-sample t-test.

The question we are attempting to answer is, "In which survey questions are student responses statistically different from neutral?" To get analyzable distributions, we code the student responses, from "Strongly Disagree" to "Strongly Agree," as integers from -2 to 2. We then run a one-sample t-test on each question to show significance. The null hypothesis is that the mean response is 0 (neutral); the test compares the observed mean to 0 using the standard deviation of the corresponding survey data. This approach is somewhat hand-wavy, but it is probably sufficient for this paper.
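
As a rough illustration of what the one-sample t-test computes, the sketch below expands one row of response counts into individual scores, evaluates $t = \bar{x} / (s / \sqrt{n})$ by hand, and compares the result with `scipy.stats.ttest_1samp`. The counts `[5, 5, 0, 0, 0]` are the first row of the Figure 2 data. The variable names are illustrative, and the mapping of count columns to scores simply mirrors `survey_range` in the functions below.

```python
import numpy as np
from scipy import stats

# Response counts for one question, ordered to match the scores in survey_range
counts = np.array([5, 5, 0, 0, 0])
survey_range = np.array([-2, -1, 0, 1, 2])

# Expand the counts into one score per student
scores = np.repeat(survey_range, counts)

n = len(scores)
mean = scores.mean()
sd = scores.std(ddof=1)        # sample standard deviation (n - 1 in the denominator)
se = sd / np.sqrt(n)           # standard error of the mean
t_manual = (mean - 0) / se     # null hypothesis: mean response is 0 (neutral)

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result = stats.ttest_1samp(scores, 0)
print(f"manual: t = {t_manual:.3f}, p = {p_manual:.3g}")
print(f"scipy:  t = {result.statistic:.3f}, p = {result.pvalue:.3g}")
```

Note that the t-test uses the sample standard deviation (`ddof=1`), whereas `np.std` defaults to `ddof=0`, so a standard error computed from the default `np.std` will be slightly smaller than the one the test uses.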

**References**
* [T-Test wiki](https://en.wikipedia.org/wiki/Student%27s_t-test)
* [1 sample T-test description](https://www.jmp.com/en_us/statistics-knowledge-portal/t-test/one-sample-t-test.html)
* [Wilcoxon signed-rank test wiki](https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test)
* [Wilcoxon test video tutorial](https://www.youtube.com/watch?v=PGiXtngX3YQ)

# <font color="blue">Statistics Function

Import packages for analysis.

In [2]:
# import packages for analysis
from scipy import stats
import numpy as np
from matplotlib import pyplot as plt

Create a function for running statistics on the datasets.

In [3]:
# write a function that prints descriptive statistics, a t-test, and a Wilcoxon test for each question
def get_statistics(data, questions):
    for i in range(len(questions)):

        # format the data from a question so that we can run statistics on it
        survey_range = np.array([-2,-1,0,1,2])   # array of scores from -2 to 2, representing the survey responses
        if len(data[i]) == 4:                    # some surveys only have 4 possible responses, so we remove 0 from the range
            survey_range = np.array([-2,-1,1,2])
        data_i = np.repeat(survey_range, data[i])   # build the sample by repeating each score in survey_range by its response count

        # print results from statistics
        print(questions[i])
        print("Mean, Median, Mode : ", f"[{np.mean(data_i):0.3f}, {np.median(data_i):0.3f}, {stats.mode(data_i)[0]:0.3f} ]")
        print("SD, Var, SE : ", f"[ {np.std(data_i):0.3f}, {np.var(data_i):0.3f}, {np.std(data_i)/np.sqrt(len(data_i)):0.3f}  " )
        print("Range, Skew, Kurtosis : ", f"[ {np.max(data_i) - np.min(data_i):0.3f}, {stats.skew(data_i):0.3f}, {stats.kurtosis(data_i):0.3f} ]")
        print("T-Test: ", stats.ttest_1samp(data_i, 0))
        # NOTE: this call receives the raw counts data[i], not the expanded responses data_i
        print("Wilcox: ", stats.wilcoxon( data[i], zero_method='wilcox', correction=False))
        print("")
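
Note that `get_statistics` passes the raw count vector `data[i]` to `stats.wilcoxon`, so the signed-rank test ranks the five (or four) counts rather than the individual student responses. A minimal sketch of running the test on the expanded responses instead is shown below. The helper name `wilcoxon_on_responses` is hypothetical, and whether this matches the intended analysis is an assumption.

```python
import numpy as np
from scipy import stats

def wilcoxon_on_responses(counts):
    """Hypothetical helper: signed-rank test of the expanded responses against 0."""
    counts = np.asarray(counts)
    # Same score ranges as get_statistics: no neutral 0 for 4-option questions
    if len(counts) == 4:
        survey_range = np.array([-2, -1, 1, 2])
    else:
        survey_range = np.array([-2, -1, 0, 1, 2])
    responses = np.repeat(survey_range, counts)
    # Neutral (0) responses are discarded by zero_method='wilcox'
    return stats.wilcoxon(responses, zero_method='wilcox', correction=False)

counts = [5, 5, 0, 0, 0]   # first row of the Figure 2 data
on_counts = stats.wilcoxon(counts, zero_method='wilcox', correction=False)
on_responses = wilcoxon_on_responses(counts)
print("Wilcoxon on counts:    p =", on_counts.pvalue)
print("Wilcoxon on responses: p =", on_responses.pvalue)
```

With only two non-zero counts, the count-based call has almost no power, which would explain why the Wilcoxon p-values in the outputs below stay above 0.05 even when the t-test is strongly significant. SciPy may warn about ties or small samples when running either call.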


Function to summarize statistical results across surveys.

In [4]:
def get_summary(data, questions):
    sig_t_test = 0 
    sig_wilcox = 0
    non_sig_questions = []
    
    for i in range(len(questions)):
        # format the data from a question so that we can run statistics on it
        survey_range = np.array([-2,-1,0,1,2])   # array of scores from -2 to 2, representing the survey responses
        if len(data[i]) == 4:                    # some surveys only have 4 possible responses, so we remove 0 from the range
            survey_range = np.array([-2,-1,1,2])
        data_i = np.repeat(survey_range, data[i])   # build the sample by repeating each score in survey_range by its response count

        # Find out how many questions are significant via t-test and wilcox, and which ones are not
        if stats.ttest_1samp(data_i, 0)[1]< 0.05:
            sig_t_test += 1
        else:
            non_sig_questions.append(questions[i])
        # NOTE: as in get_statistics, the Wilcoxon test receives the raw counts data[i], not data_i
        if stats.wilcoxon( data[i], zero_method='wilcox', correction=False)[1] < 0.10:
            sig_wilcox += 1
        
    # print results from statistics
    print("T-Test: % Significant (p-val=0.05) ---", sig_t_test/len(questions) ) 
    print("Wilcox: % Significant (p-val=0.10) ---", sig_wilcox/len(questions) )
    print("Non-Significant Questions (T-Test): ", non_sig_questions)
    print("")
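
Both functions repeat the same count-expansion step. One possible refactor, sketched below with the hypothetical names `expand_counts` and `ttest_pvalues`, factors that step out and returns the p-values instead of printing them, which makes the results easier to reuse. The two example rows are taken from the Figure 3 data.

```python
import numpy as np
from scipy import stats

def expand_counts(counts):
    """Turn per-option response counts into one score per student."""
    counts = np.asarray(counts)
    if len(counts) == 4:
        scores = np.array([-2, -1, 1, 2])      # 4-option questions have no neutral choice
    else:
        scores = np.array([-2, -1, 0, 1, 2])
    return np.repeat(scores, counts)

def ttest_pvalues(data):
    """One-sample t-test p-value (against 0) for each row of counts."""
    return np.array([stats.ttest_1samp(expand_counts(row), 0).pvalue for row in data])

# A 4-option and a 5-option question from the Figure 3 data
example = [[5, 10, 3, 0], [1, 7, 6, 3, 1]]
pvals = ttest_pvalues(example)
print("p-values:", pvals)
print("fraction significant at 0.05:", np.mean(pvals < 0.05))
```

Iterating over rows rather than indexing a NumPy array also works when the rows have different lengths, as they do in the Figure 3 data.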

# <font color="orchid">  Run Statistics

Here we run our statistics on the survey data for each figure.

In [11]:
print("Total number of survey questions asked:", 11+16+6+16)
print("All Data, T-Test: % Significant (p-val=0.05) --- ", (11+16*.88+6*0.6666+16)/(11+16+6+16))

Total number of survey questions asked: 49
All Data, T-Test: % Significant (p-val=0.05) ---  0.9199918367346939
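
The expression above plugs in rounded per-figure fractions (0.88 and 0.6666). As a cross-check, the sketch below sums the number of significant questions per figure directly, using the question counts and t-test fractions printed by `get_summary` for each figure in this notebook. The dictionary layout is only illustrative.

```python
# (number of questions, fraction significant by t-test) for each figure,
# taken from the get_summary outputs in this notebook
summary = {
    "Figure 2": (11, 1.0),
    "Figure 3": (16, 0.875),
    "Figure 4": (6, 4 / 6),
    "Figure 6": (16, 1.0),
}

total_questions = sum(n for n, _ in summary.values())
# Convert each fraction back to a whole number of significant questions
total_significant = sum(round(n * frac) for n, frac in summary.values())

print("Total questions:", total_questions)
print("Significant (t-test, alpha=0.05):", total_significant)
print("Overall fraction:", total_significant / total_questions)
```

Counted this way, 45 of the 49 questions are significant, or roughly 0.918, slightly below the value obtained from the rounded inputs.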


## <font color="orchid"> Figure 2 Data

In [4]:
questions_fig2 = ["I enjoyed learning about organoids", "Performing a remote microscopy experiment was interesting",
                      "The experiment selected helped solidify concepts discussed in class", "I felt that performing remote microscopy allowed me to use novel and complex experimental models",
                      "Performing remote experiments allowed me to multitask more than in person labs", "I enjoyed performing these experiments", 
                      "Data analysis was straightforward", "Performing this experiment made me want to learn more about organoids",
                      "Performing this experiment made me want to learn more about stem cells", "I would recommend this experiment to my peers",
                      "I would recommend this course to my peers"]

survey_results_fig2 = np.array([ [5,5,0,0,0], [4,4,1,0,0], [5,3,2,0,0], [5,3,1,0,0], [5,0,4,1,0], [5,2,3,0,0], [4,6,0,0,0], [6,2,2,0,0], [4,6,0,0,0], [7,2,1,0,0], [9,1,0,0,0] ])

In [5]:
get_summary( survey_results_fig2, questions_fig2)

T-Test: % Significant (p-val=0.05) --- 1.0
Wilcox: % Significant (p-val=0.10) --- 0.0
Non-Significant Questions (T-Test):  []





In [6]:
get_statistics( survey_results_fig2, questions_fig2)

I enjoyed learning about organoids
Mean, Median, Mode :  [-1.500, -1.500, -2.000 ]
SD, Var, SE :  [ 0.500, 0.250, 0.158  
Range, Skew, Kurtosis :  [ 1.000, 0.000, -2.000 ]
T-Test:  TtestResult(statistic=-8.999999999999998, pvalue=8.538051223166285e-06, df=9)
Wilcox:  WilcoxonResult(statistic=0.0, pvalue=0.15729920705028502)

Performing a remote microscopy experiment was interesting
Mean, Median, Mode :  [-1.333, -1.000, -2.000 ]
SD, Var, SE :  [ 0.667, 0.444, 0.222  
Range, Skew, Kurtosis :  [ 2.000, 0.500, -0.750 ]
T-Test:  TtestResult(statistic=-5.65685424949238, pvalue=0.0004776140575940057, df=8)
Wilcox:  WilcoxonResult(statistic=0.0, pvalue=0.10247043485974937)

The experiment selected helped solidify concepts discussed in class
Mean, Median, Mode :  [-1.300, -1.500, -2.000 ]
SD, Var, SE :  [ 0.781, 0.610, 0.247  
Range, Skew, Kurtosis :  [ 2.000, 0.579, -1.136 ]
T-Test:  TtestResult(statistic=-4.993438317382942, pvalue=0.000745618060127258, df=9)
Wilcox:  WilcoxonResult(statistic

## <font color="orchid"> Figure 3 Data

In [6]:
questions_fig3 = ["What is your previous experience in Mathematics?", "What is your previous experience in computer programming?",
                  "What is your previous experience in stem cell biology?", "What is your previous experience in neuroscience?",
                        
                 "I am interested in the application of mathematics in biological processes", "This course requires me to think at a deep level or use critical thinking",
                 "This course enables me to develop my skills examining questions that matter beyond the classroom", "I am interested in learning about stem cells",
                 "I am interested in learning about neuroscience", "I am interested in learning about organoids and connectoids", 
                 "I am comfortable using WetAi in the classroom setting", "I am comfortable using Jupyter notebooks in the classroom setting",
                 "I am comfortable performing multielectrode array (MEA) experiments", "I am comfortable identifying burst signals in a multi electrode array (MA)",
                 "I am interested in learning more about Internet-enabled technologies in the lab", "I am comfortable reading neuroscience literature, in order to develop my own hypothesis for an experiment" 
                 ]

survey_results_fig3 = [ [5,10,3,0], [7,8,3,0], [1,5,6,6], [0,1,11,6],
    [12,5,1,0,0], [14,3,0,1,0], [11,4,3,0,0],  [8,6,4,0,0], [14,3,1,0,0], [11,5,2,0,0], [7,5,5,1,0], [11,2,4,1,0], [1,7,6,3,1], [2,10,5,1,0], [12,6,0,0,0], [5,6,5,1,1] ]

In [7]:
get_summary( survey_results_fig3, questions_fig3)

T-Test: % Significant (p-val=0.05) --- 0.875
Wilcox: % Significant (p-val=0.10) --- 0.3125
Non-Significant Questions (T-Test):  ['What is your previous experience in stem cell biology?', 'I am comfortable performing multielectrode array (MEA) experiments']





In [9]:
get_statistics( survey_results_fig3, questions_fig3)

What is your previous experience in Mathematics?
Mean, Median, Mode :  [-0.944, -1.000, -1.000 ]
SD, Var, SE :  [ 0.970, 0.941, 0.229  
Range, Skew, Kurtosis :  [ 3.000, 0.984, 0.078 ]
T-Test:  TtestResult(statistic=-4.01350180282898, pvalue=0.0009004769977088531, df=17)
Wilcox:  WilcoxonResult(statistic=0.0, pvalue=0.10880943004054568)

What is your previous experience in computer programming?
Mean, Median, Mode :  [-1.056, -1.000, -1.000 ]
SD, Var, SE :  [ 1.026, 1.052, 0.242  
Range, Skew, Kurtosis :  [ 3.000, 1.037, -0.034 ]
T-Test:  TtestResult(statistic=-4.242295068554326, pvalue=0.0005491669226711871, df=17)
Wilcox:  WilcoxonResult(statistic=0.0, pvalue=0.10880943004054568)

What is your previous experience in stem cell biology?
Mean, Median, Mode :  [0.611, 1.000, 1.000 ]
SD, Var, SE :  [ 1.339, 1.793, 0.316  
Range, Skew, Kurtosis :  [ 4.000, -0.516, -1.227 ]
T-Test:  TtestResult(statistic=1.8816076913913078, pvalue=0.07712677433397398, df=17)
Wilcox:  WilcoxonResult(statistic

## <font color="orchid"> Figure 4 Data

In [4]:
questions_fig4 = ["I think I am capable and skillful at Mathematics", "Being a good mathematics student makes me feel that my classmates and teachers think more of me",
                  "My performance in mathematics largely depends on the methodology and empathy of the teachers", "In mathematics exams, I feel unsure, desperate and nervous",
                  "Mathematics is useful and necessary in all areas of life", "Mathematics is useful and necessary for a career in Biology" ]
survey_results_fig4 = np.array([ [1,10,5,2,0], [1,7,7,2,0], [2,6,3,5,2], [4,5,3,2,3], [11,4,0,3,0], [7,9,1,1,0]  ])

In [5]:
get_summary( survey_results_fig4, questions_fig4)

T-Test: % Significant (p-val=0.05) --- 0.6666666666666666
Wilcox: % Significant (p-val=0.10) --- 0.8333333333333334
Non-Significant Questions (T-Test):  ['My performance in mathematics largely depends on the methodology and empathy of the teachers', 'In mathematics exams, I feel unsure, desperate and nervous']





In [6]:
get_statistics( survey_results_fig4, questions_fig4)

I think I am capable and skillful at Mathematics
Mean, Median, Mode :  [-0.556, -1.000, -1.000 ]
SD, Var, SE :  [ 0.762, 0.580, 0.180  
Range, Skew, Kurtosis :  [ 3.000, 0.565, -0.206 ]
T-Test:  TtestResult(statistic=-3.0070838351282063, pvalue=0.007933939339956862, df=17)
Wilcox:  WilcoxonResult(statistic=0.0, pvalue=0.06788915486182899)

Being a good mathematics student makes me feel that my classmates and teachers think more of me
Mean, Median, Mode :  [-0.412, 0.000, -1.000 ]
SD, Var, SE :  [ 0.771, 0.595, 0.187  
Range, Skew, Kurtosis :  [ 3.000, 0.088, -0.451 ]
T-Test:  TtestResult(statistic=-2.1349799846564648, pvalue=0.04857488920688251, df=16)
Wilcox:  WilcoxonResult(statistic=0.0, pvalue=0.06559969214707187)

My performance in mathematics largely depends on the methodology and empathy of the teachers
Mean, Median, Mode :  [-0.056, 0.000, -1.000 ]
SD, Var, SE :  [ 1.223, 1.497, 0.288  
Range, Skew, Kurtosis :  [ 4.000, 0.106, -1.134 ]
T-Test:  TtestResult(statistic=-0.18722058

## <font color="orchid"> Figure 6 Data

In [7]:
questions_fig6 = [  "I enjoyed learning about organoids", "I enjoyed performing these experiments", 
                        "Performing a remote electrophysiology experiment was interesting", "The experiment selected helped solidify concepts discussed in class",
                        "I felt that performing remote electrophysiology allowed me to use novel and complex experimental models", "Performing remote experiments allowed me to multitask more than in person labs",
                        "Performing remote experiments allowed me to do projects that require complex training", "Performing remote experiments allowed me to study new areas beyond my academic program",
                        "Performing remote experiments allowed me to do projects that are not available for most students around the world", "After this course, I feel more comfortable reading and discussing mathematics literature",
                        
                        "Performing this experiment made me want to learn more about organoids", "Performing this experiment made me want to learn more about stem cells",
                        "I would consider applying for a job working with stem cell and neuroscience data", "I would consider further education in stem cell and neuroscience data",
                        "After this course, I feel more comfortable reading and discussing stem cell literature", "After this course, I feel more comfortable reading and discussing neuroscience literature" ]

survey_results_fig6 = np.array([ [16,7,1,0,0], [13,9,2,0,0], [14,8,2,0,0], [12,11,0,1,0], [16,4,4,0,0], [10,5,6,1,1], [11,10,3,0,0], [16,6,2,0,0], [20,3,0,0,0], [6,10,7,1,0],  
                                 [15,6,2,1,0], [15,5,3,1,0], [12,7,2,3,0], [10,11,2,1,0], [10,9,5,0,0], [11,11,2,0,0] ])



In [8]:
get_summary( survey_results_fig6, questions_fig6)

T-Test: % Significant (p-val=0.05) --- 1.0
Wilcox: % Significant (p-val=0.10) --- 0.375
Non-Significant Questions (T-Test):  []



In [5]:
get_statistics( survey_results_fig6, questions_fig6)

I enjoyed learning about organoids
Mean, Median, Mode :  [-1.625, -2.000, -2.000 ]
SD, Var, SE :  [ 0.564, 0.318, 0.115  
Range, Skew, Kurtosis :  [ 2.000, 1.200, 0.450 ]
T-Test:  TtestResult(statistic=-13.826204628394212, pvalue=1.2452221358497965e-12, df=23)
Wilcox:  WilcoxonResult(statistic=0.0, pvalue=0.10880943004054568)

I enjoyed performing these experiments
Mean, Median, Mode :  [-1.458, -2.000, -2.000 ]
SD, Var, SE :  [ 0.644, 0.415, 0.131  
Range, Skew, Kurtosis :  [ 2.000, 0.780, -0.444 ]
T-Test:  TtestResult(statistic=-10.857579347566515, pvalue=1.5833688686517335e-10, df=23)
Wilcox:  WilcoxonResult(statistic=0.0, pvalue=0.10880943004054568)

Performing a remote electrophysiology experiment was interesting
Mean, Median, Mode :  [-1.500, -2.000, -2.000 ]
SD, Var, SE :  [ 0.645, 0.417, 0.132  
Range, Skew, Kurtosis :  [ 2.000, 0.930, -0.240 ]
T-Test:  TtestResult(statistic=-11.144505372604026, pvalue=9.528315069947521e-11, df=23)
Wilcox:  WilcoxonResult(statistic=0.0, pvalue=

