<font size=7> Test Statistics

<font color="red">

**To Do**




# <font color="peru">Summary

This notebook implements T-tests on the data collected from students who took one of the two classes.  This is for an education paper being submitted to eNeuro. The reviewers asked us to perform some statistical analysis on our survey data. An example of a survey question from the Math of Mind class is shown below. 



<div><img src="example_student_response.png" width="500"/></div>

We are implementing a 1-sample T-Test

The question we are attempting to answer is, "In which survey questions are student responses statistically different from neutral?" To get analyzable distributions, we label student responses, "Strongly Disagree" to "Strongly Agree," from -2 to 2. We then do a T-test to show significance. For the T-test, we must compare the mean and standard deviation of the survey data to a "null" distribution. I will define the null to have mean 0, and a standard deviation that's the same as the corresponding survey data. This approach is somewhat hand-wavy. For this paper it's probably sufficient though.

**References**
* [T-Test wiki](https://en.wikipedia.org/wiki/Student%27s_t-test)
* [1 sample T-test description](https://www.jmp.com/en_us/statistics-knowledge-portal/t-test/one-sample-t-test.html)
* [Wilcox wiki](https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test)
* [Wilcox Video Tutorial](https://www.youtube.com/watch?v=PGiXtngX3YQ)

# <font color="orchid"> Set up Data 

import packages for analysis

In [1]:
# import packages for analysis
from scipy import stats
import numpy as np
from matplotlib import pyplot as plt

set general parameters

In [2]:
#num_students_math = 18 # number of students in the math of mind class
#num_students_bio = 10 # number of students in the highschool class
survey_range = np.array([-2,-1,0,1,2]) # range of survey responses from "strongly disagree" to "strongly agree"

**Figure 2 Data**

In [2]:
mom_questions_fig2 = ["I enjoyed learning about organoids", "Performing a remote microscopy experiment was interesting",
                      "The experiment selected helped solidify concepts discussed in class", "I felt that performing remote microscopy allowed me to use novel and complex experimental models",
                      "Performing remote experiments allowed me to multitask more than in person labs", "I enjoyed performing these experiments", 
                      "Data analysis was straightforward", "Performing this experiment made me want to learn more about organoids",
                      "Performing this experiment made me want to lear more about stem cells", "I would recommend this experiment to my peers",
                      "I would recommend this course to my peers"]

mom_survey_results_fig2 = np.array([ [5,5,0,0,0], [4,4,1,0,0], [5,3,2,0,0], [5,3,1,0,0], [5,0,4,1,0], [5,2,3,0,0], [4,6,0,0,0], [6,2,2,0,0], [4,6,0,0,0], [7,2,1,0,0], [9,1,0,0,0] ])

**Figure 3 Data**

In [4]:
mom_questions_fig3_1 = ["What is your previous experience in Mathematics?", "What is your previous experience in computer programming?",
                        "What is your previous experience in stem cell biology?", "What is your previous experience in neuroscience?"]
mom_survey_results_fig3_1 = np.array([ [5,10,3,0], [7,8,3,0], [1,5,6,6], [0,1,11,6] ])

mom_questions_fig3_2 = [ "I am interested in the application of mathematics in biological processes", "This course requires me to think at a deep level or use critical thinking",
                 "This course enables me to develop my skills examining questions that matter beyond the classroom", "I am interested in learning about stem cells",
                 "I am interested in learning about neuroscience", "I am interested in learning about organoids and connectoids", 
                 "I am comfortable using WetAi in the classroom setting", "I am comfortable using Jupyter notebooks in the classroom setting",
                 "I am comfortable performing multielectrode array (MEA) experiments", "I am comfortable identifying burst signals in a multi electrode array (MA)",
                 "I am interested in learning more about Internet-enabled technologies in the lab", "I am comfortable reading neuroscience literature, in order to develop my own hypothesis for an experiment" 
                 ] #, "fake data"]

mom_survey_results_fig3_2 = np.array([ [12,5,1,0,0], [14,3,0,1,0], [11,4,3,0,0],  [8,6,4,0,0], [14,3,1,0,0], [11,5,2,0,0], [7,5,5,1,0], [11,2,4,1,0], [1,7,6,3,1], [2,10,5,1,0], [12,6,0,0,0], [5,6,5,1,1] ] )#, [6,4,4,4,6]])

**Figure 4 Data**

In [6]:
mom_questions_fig4 = ["I think I am capable and skillful at Mathematics", "Being a good mathematics student makes me feel that my classmates and teachers think more of me",
                  "My performance in mathematics largely depends on the methodology and empathy of the teachers", "In mathematics exams, I feel unsure, desperate and nervous",
                  "Mathematics is useful and necessary in all areas of life", "Mathematics is useful and necessary for a career in Biology" ]
mom_survey_results_fig4 = np.array([ [1,10,5,2,0], [1,7,7,2,0], [2,6,3,5,2], [4,5,3,2,3], [11,4,0,3,0], [7,9,1,1,0]  ])

**Figure 6 Data**

In [1]:
mom_questions_fig6 = [  "I enjoyed learning about organoids", "I enjoyed performing these experiments", 
                        "Performing a remote electrophysiology experiment was interesting", "The experiment selected helped solidify concepts discussed in class",
                        "I felt that performing remote electrophysiology allowed me to use novel and complex experimental models", "Performing remote experiments allowed me to multitask more than in person labs",
                        "Performing remote experiments allowed me to do projects that require complex training", "Performing remote experiments allowed me to study new areas beyond my academic program",
                        "Performing remote experiments allowed me to do projects that are not available for most students around the world", "Data analysis was straightforward",
                        
                        "Performing this experiment made me want to learn more about organoids", "Performing this experiment made me want to lear more about stem cells",
                        "I would recommend this experiment to my peers", "I would consider applying for a job working with stem cell and neuroscience data",
                        "I would consider further education in stem cell and neuroscience data", "After this course, I feel more comfortable reading and discussing stem cell literature",
                        "After this course, I feel more comfortable reading and discussing neuroscience literature", "After this course, I feel more comfortable reading and discussing mathematics literature"]

mom_survey_results_fig6 = np.array([ [16,7,1,0,0], [13,9,2,0,0], [14,8,2,0,0], [12,11,0,1,0], [16,4,4,0,0], [10,5,6,1,1], [11,10,3,0,0], [16,6,2,0,0], [20,3,0,0,0],  [5,7,6,3,3],
                                      [15,6,2,1,0], [15,5,3,1,0], [15,8,1,0,0], [12,7,2,3,0], [10,11,2,1,0], [10,9,5,0,0], [11,11,2,0,0], [6,10,7,1,0]  ])

NameError: name 'np' is not defined

# <font color="blue">Run Statistics

proccess data fro t-test

In [25]:
# # create an array by repeating the numbers in survery_range by the corresponding number in survey_results
# survey_data =[]
# for i in range(len(survey_results)):
#     survey_data.append( np.repeat(survey_range, survey_results[i]) )

function to run t-test

We run the wilcoxon signed rank test for one sample median 0, so that we have a non-parametric test.

In [5]:
i=0

In [3]:
questions = mom_questions_fig2
data = mom_survey_results_fig2 

In [None]:
data_i

5

In [7]:
survey_range = np.array([-2,-1,0,1,2])   #. create an array of numbers from -2 to 2, representing the survey responses
if len(data[i]) == 4:                    # Some surveys only have 4 possibilities, so we need to remove 0 from the range
    survey_range = np.array([-2,-1,1,2])
data_i = np.repeat(survey_range, survey_results[i])   # create data for stats by repeating the numbers in survery_range by the corresponding results


In [10]:
survey_range

array([-2, -1,  0,  1,  2])

In [36]:
# wrtie a function that does a T-test
def get_statistics(data, questions):
    for i in range(len(questions)):

        # format the data from a questions so that we can run statistics on it
        survey_range = np.array([-2,-1,0,1,2])   #. create an array of numbers from -2 to 2, representing the survey responses
        if len(data[i]) == 4:                    # Some surveys only have 4 possibilities, so we need to remove 0 from the range
            survey_range = np.array([-2,-1,1,2])
        data_i = np.repeat(survey_range, survey_results[i])   # create data for stats by repeating the numbers in survery_range by the corresponding results

        # print results from statistics
        print(questions[i])
        print("Mean, Median, Mode : ", f"[{np.mean(data_i):0.3f}, {np.median(data_i):0.3f}, {stats.mode(data_i)[0]:0.3f} ]")
        print("SD, Var, SE : ", f"[ {np.std(data_i):0.3f}, {np.var(data_i):0.3f}, {np.std(data_i)/np.sqrt(len(data_i)):0.3f}  " )
        print("Range, Skew, Kurtosis : ", f"[ {np.max(data_i) - np.min(data_i):0.3f}, {stats.skew(data_i):0.3f}, {stats.kurtosis(data_i):0.3f} ]")
        print("T-Test: ", stats.ttest_1samp(data_i, 0))
        print("Wilcox: ", stats.wilcoxon( data[i], zero_method='wilcox', correction=False))
        print("")


## <font color="orchid">Run Code


run t-test

In [37]:
get_statistics(survey_data, questions)

I am interested in the application of mathematics in biological processes
Mean, Median, Mode :  [-1.583, -2.000, -2.000 ]
SD, Var, SE :  [ 0.640, 0.410, 0.131  
Range, Skew, Kurtosis :  [ 2.000, 1.267, 0.412 ]
T-Test:  TtestResult(statistic=-11.862917582084185, pvalue=2.778826173484555e-11, df=23)
Wilcox:  WilcoxonResult(statistic=0.0, pvalue=1.594711030217795e-05)



