# Application of Statistics: Research studies

# I. Inferential Statistics: t-test

![T-Table.PNG](attachment:T-Table.PNG)

In [165]:
import numpy as np

# Function that computes a t-statistic and compares it with a critical value

def ttest(data1, data2, t_critical, ttest_type):
    
    if ttest_type == 'paired':
        
        # Pairwise differences between the two conditions
        difference = [val1-val2 for val1, val2 in zip(data1, data2)]
        mean_difference = np.mean(difference)
        
        # Standard error of the mean difference
        std_err = np.std(difference, ddof=1)/np.sqrt(len(difference))
        
        # t-statistic value for paired t-test
        dof = len(difference) - 1
        t_stat = np.abs(mean_difference/std_err)
        
    elif ttest_type == 'independent':
        
        # Difference between the two group means
        mean_difference = np.mean(data1) - np.mean(data2)
        
        # Sample standard deviations of the two groups
        std1 = np.std(data1, ddof=1)
        std2 = np.std(data2, ddof=1)
        
        # Standard error of the difference between the two means
        std_err = np.sqrt(std1**2/len(data1)+std2**2/len(data2))
        
        # t-statistic value for independent t-test
        # (dof = n1 + n2 - 2 assumes equal variances; with equal group sizes
        # this standard error is identical to the pooled one)
        dof = len(data1) + len(data2) - 2
        t_stat = np.abs(mean_difference/std_err)
        
    else:
        raise ValueError("ttest_type must be 'paired' or 'independent'")
        
        
    print('| Calculated t-statistic: %.3f |' %t_stat)
    
    if t_stat >= t_critical:
        print("\nt-critical: {}\nDegrees of freedom: {}\nError possibility, if any: Type I error\n\nNull hypothesis can be rejected as we have sufficient evidence to support our claim".format(t_critical, dof))
    
    else:
        print("\nt-critical: {}\nDegrees of freedom: {}\nError possibility, if any: Type II error\n\nNull hypothesis cannot be rejected as we don't have sufficient evidence to support our claim".format(t_critical, dof))
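
The critical value passed to `ttest` is read from the t-table for the test's degrees of freedom. As a rough cross-check, the sketch below recomputes two-tailed critical values numerically using only the standard library. The helper names `t_cdf` and `t_critical_two_tailed` are illustrative and not part of the original notebook; the approach (Simpson's rule plus bisection) is one simple option, not the only one. For the degrees of freedom that occur in the problems below (9, 18 and 28) at the 5% level, it should reproduce the table values 2.262, 2.101 and 2.048.

```python
import math

def t_cdf(t, dof, steps=1000):
    """P(T <= t) for t >= 0 under Student's t distribution (Simpson's rule on [0, t])."""
    # Normalising constant of the t density
    coef = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))

    def pdf(x):
        return coef * (1 + x * x / dof) ** (-(dof + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The density is symmetric, so half of the probability lies below zero
    return 0.5 + total * h / 3

def t_critical_two_tailed(dof, alpha=0.05):
    """Two-tailed critical value: the t for which P(|T| > t) = alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    target = 1 - alpha / 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, dof) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for dof in (9, 18, 28):
    print("dof = %d | t-critical = %.3f" % (dof, t_critical_two_tailed(dof)))
```

Because `math.gamma` overflows for large arguments, this sketch is only suitable for small to moderate degrees of freedom.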

# Independent samples t-test: Problem 1

* A research study was conducted to examine the differences between older and younger adults on perceived life satisfaction. A pilot study was conducted to examine this hypothesis. Ten older adults (over the age of 70) and ten younger adults (between 20 and 30) were given a life satisfaction test (known to have high reliability and validity). Scores on the measure range from 0 to 60, with high scores indicating high life satisfaction and low scores indicating low life satisfaction. The data are presented below. Compute the appropriate t-test.


* Null and Alternate Hypothesis,
   
   Ho: There is no significant difference between older adults and younger adults on life satisfaction

   Ha: There is a significant difference between older adults and younger adults on life satisfaction

   Level of significance: 5% (or 95% confidence level)

In [166]:
older_adults = [45,38,52,48,25,39,51,46,55,46]
young_adults = [34,22,15,27,37,41,24,19,26,36]
t_crit = 2.101  # two-tailed, alpha = 0.05, dof = 10 + 10 - 2 = 18

ttest(older_adults, young_adults, t_crit, 'independent')

| Calculated t-statistic: 4.258 |

t-critical: 2.101
Degrees of freedom: 18
Error possibility, if any: Type I error

Null hypothesis can be rejected as we have sufficient evidence to support our claim
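
The `ttest` function pairs the standard error sqrt(s1²/n1 + s2²/n2) with n1 + n2 - 2 degrees of freedom, which is the degrees of freedom of the pooled-variance (equal variances) t-test. The sketch below, written with the standard library only, computes the pooled-variance form for the data above to illustrate that the two standard errors coincide when both groups have the same size, so the t-statistic of 4.258 is unchanged. The variable names are illustrative.

```python
import math
import statistics

older_adults = [45, 38, 52, 48, 25, 39, 51, 46, 55, 46]
young_adults = [34, 22, 15, 27, 37, 41, 24, 19, 26, 36]

n1, n2 = len(older_adults), len(young_adults)
var1 = statistics.variance(older_adults)  # sample variance (n - 1 denominator)
var2 = statistics.variance(young_adults)

# Pooled variance: sample variances weighted by their degrees of freedom
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Unpooled standard error, as used in the ttest function
se_unpooled = math.sqrt(var1 / n1 + var2 / n2)

mean_difference = statistics.mean(older_adults) - statistics.mean(young_adults)
t_pooled = abs(mean_difference) / se_pooled

print("t (pooled): %.3f" % t_pooled)
print("standard errors equal:", math.isclose(se_pooled, se_unpooled))
```

With unequal group sizes the two standard errors generally differ, and the choice between the pooled and unpooled form then matters.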


# Independent samples t-test: Problem 2

* Researchers want to examine the effect of perceived control on health complaints of geriatric patients in a long-term care facility. Thirty patients are randomly selected to participate in the study. Half are given a plant to care for, and half are given a plant that is cared for by the staff. The number of health complaints is recorded for each patient over the following seven days. Compute the appropriate t-test for the data provided below.


* Ho: Control over the plant doesn't have an impact on the number of health complaints

   Ha: Control over the plant has an impact on the number of health complaints

   Level of significance: 5% (or 95% confidence level)

In [167]:
control = [23,12,6,15,18,5,21,18,34,10,23,14,19,23,8]
no_control = [35,21,26,24,17,23,37,22,16,38,23,41,27,24,32]
t_crit = 2.048  # two-tailed, alpha = 0.05, dof = 15 + 15 - 2 = 28

ttest(control, no_control, t_crit, 'independent')

| Calculated t-statistic: 3.691 |

t-critical: 2.048
Degrees of freedom: 28
Error possibility, if any: Type I error

Null hypothesis can be rejected as we have sufficient evidence to support our claim


# Paired t-test: Problem 1

* A researcher hypothesizes that electrical stimulation of the lateral habenula will result in a decrease in food intake (in this case, chocolate chips) in rats. Rats undergo stereotaxic surgery and an electrode is implanted in the right lateral habenula. Following a ten-day recovery period, rats (kept at 80 percent body weight) are tested for the number of chocolate chips consumed during a 10-minute period, both with and without electrical stimulation. The testing conditions are counterbalanced. Compute the appropriate t-test for the data provided below.


* Null and Alternate Hypothesis,
   
   Ho: There is no significant difference in amount of food intake

   Ha: There is a significant difference in amount of food intake

   Level of significance: 5% (or 95% confidence level)

In [168]:
stimulation = [12,7,3,11,8,5,14,7,9,10]
no_stimulation = [8,7,4,14,6,7,12,5,5,8]
t_crit = 2.262  # two-tailed, alpha = 0.05, dof = 10 - 1 = 9

ttest(stimulation, no_stimulation, t_crit, 'paired')

| Calculated t-statistic: 1.316 |

t-critical: 2.262
Degrees of freedom: 9
Error possibility, if any: Type II error

Null hypothesis cannot be rejected as we don't have sufficient evidence to support our claim
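
The same conclusion can be read from a 95% confidence interval for the mean paired difference: if the interval contains zero, the null hypothesis is not rejected at the 5% level. The sketch below is a minimal illustration using only the standard library. It assumes the two-tailed 5% critical value for 9 degrees of freedom, 2.262, taken from a standard t-table, and the variable names are illustrative.

```python
import math
import statistics

stimulation = [12, 7, 3, 11, 8, 5, 14, 7, 9, 10]
no_stimulation = [8, 7, 4, 14, 6, 7, 12, 5, 5, 8]

# Pairwise differences and their standard error
difference = [a - b for a, b in zip(stimulation, no_stimulation)]
mean_difference = statistics.mean(difference)
std_err = statistics.stdev(difference) / math.sqrt(len(difference))

# Assumed table value: two-tailed, alpha = 0.05, dof = 9
t_crit_9dof = 2.262

margin = t_crit_9dof * std_err
ci_lower = mean_difference - margin
ci_upper = mean_difference + margin

print("95%% CI for the mean difference: (%.3f, %.3f)" % (ci_lower, ci_upper))
print("Interval contains zero:", ci_lower < 0 < ci_upper)
```

For these data the interval runs from roughly -0.719 to 2.719, so it includes zero, in line with the failure to reject the null hypothesis.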


# Paired t-test: Problem 2

* Sleep researchers decide to test the impact of REM sleep deprivation on a computerized assembly line task. Subjects are required to participate in two nights of testing. On the nights of testing, EEG, EMG, and EOG measures are taken. On each night of testing the subject is allowed a total of four hours of sleep. However, on one of the nights, the subject is awakened immediately upon achieving REM sleep. On the alternate night, subjects are randomly awakened at various times throughout the four-hour total sleep session. Testing conditions are counterbalanced so that half of the subjects experience REM deprivation on the first night of testing and half experience REM deprivation on the second night of testing. After the sleep session, each subject is required to complete a computerized assembly line task. The task involves five rows of widgets slowly passing across the computer screen. Randomly placed on a one/five ratio are widgets missing a component that must be "fixed" by the subject. The number of missed widgets is recorded. Compute the appropriate t-test for the data provided below.



* Null and Alternate Hypothesis,
   
   Ho: There is no significant impact of REM deprivation on subjects' performance

   Ha: There is a significant impact of REM deprivation on subjects' performance

   Level of significance: 5% (or 95% confidence level)

In [169]:
rem_deprived = [26,15,8,44,26,13,38,24,17,29]
normal_condition = [20,4,9,36,20,3,25,10,6,14]
t_crit = 2.262  # two-tailed, alpha = 0.05, dof = 10 - 1 = 9

ttest(rem_deprived, normal_condition, t_crit, 'paired')

| Calculated t-statistic: 6.176 |

t-critical: 2.262
Degrees of freedom: 9
Error possibility, if any: Type I error

Null hypothesis can be rejected as we have sufficient evidence to support our claim


# II. Inferential Statistics: ANOVA test

![f-table.PNG](attachment:f-table.PNG)

In [174]:
def anova(f_critical,*args):
    
    ngroups = len(args)
    
    # Grand mean over all observations
    all_values = [val for treatment_group in args for val in treatment_group]
    grand_mean = np.mean(all_values)
    
    # Calculating ssw and ssb
    ssw = 0
    ssb = 0
    
    for treatment_group in args:
        
        group_mean = np.mean(treatment_group)
        
        # Sum of squares within groups
        ssw += np.sum([(group_mean-val)**2 for val in treatment_group])
        
        # Sum of squares between groups, weighted by the size of each group
        ssb += len(treatment_group)*(group_mean-grand_mean)**2
    
    # Degrees of freedom within groups
    dofw = len(all_values)-ngroups # n_group_values-ngroups
    
    # Degrees of freedom between groups
    dofb = ngroups-1 # n_means-1 
    
    f_statistic = (ssb/dofb)/(ssw/dofw)

    print('| Calculated f-statistic: %.3f |' %f_statistic)
    
    # Hypothesis testing
    
    if f_statistic >= f_critical:
        print("\nf-critical: {}\ndof within: {} | dof between: {}\nError possibility, if any: Type I error\n\nNull hypothesis can be rejected as we have sufficient evidence to support our claim".format(f_critical, dofw, dofb))
    
    else:
        print("\nf-critical: {}\ndof within: {} | dof between: {} \nError possibility, if any: Type II error\n\nNull hypothesis cannot be rejected as we don't have sufficient evidence to support our claim".format(f_critical, dofw, dofb))

# One-way ANOVA: Problem 1

* A research study was conducted to examine the clinical efficacy of a new antidepressant. Depressed patients were randomly assigned to one of three groups: a placebo group, a group that received a low dose of the drug, and a group that received a moderate dose of the drug. After four weeks of treatment, the patients completed the Beck Depression Inventory. The higher the score, the more depressed the patient. The data are presented below. Compute the appropriate test.


* Ho: The mean depression score is the same for the placebo, low-dose, and moderate-dose groups

   Ha: The mean depression score differs for at least one of the groups
   
   Level of significance: 1% (or 99% confidence level)

In [171]:
f_crit = 6.93
anova(f_crit, [38,47,39,25,42],[22,19,8,23,31],[14,26,11,18,5])

| Calculated f-statistic: 11.267 |

f-critical: 6.93
dof within: 12 | dof between: 2
Error possibility, if any: Type I error

Null hypothesis can be rejected as we have sufficient evidence to support our claim
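
To make the f-statistic above easier to follow, the sketch below works through the sums of squares for this problem in plain Python and checks that the total sum of squares splits into the between-group and within-group parts (SST = SSB + SSW). The variable names are illustrative; the data are the three groups passed to `anova` above.

```python
groups = [[38, 47, 39, 25, 42], [22, 19, 8, 23, 31], [14, 26, 11, 18, 5]]

all_values = [val for group in groups for val in group]
grand_mean = sum(all_values) / len(all_values)

# Within groups: squared deviations of each value from its own group mean
ssw = sum((val - sum(g) / len(g)) ** 2 for g in groups for val in g)

# Between groups: squared deviations of group means from the grand mean,
# weighted by the group size
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Total: squared deviations of each value from the grand mean
sst = sum((val - grand_mean) ** 2 for val in all_values)

dofb = len(groups) - 1
dofw = len(all_values) - len(groups)
f_statistic = (ssb / dofb) / (ssw / dofw)

print("SSB = %.3f | SSW = %.3f | SST = %.3f" % (ssb, ssw, sst))
print("F = %.3f" % f_statistic)
```

Here SSB is about 1484.933 and SSW is 790.8, which add up to the total of about 2275.733 and give the same f-statistic of 11.267.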


# One-way ANOVA: Problem 2

* A researcher is concerned about the level of knowledge possessed by university students regarding United States history. Students completed a high school senior level standardized U.S. history exam. Each student's major was also recorded. Data in terms of percent correct are recorded below for 32 students. Compute the appropriate test for the data provided below.


* Ho: There is no difference in scores between students with different majors

   Ha: There is a difference in scores between students with different majors
   
   Level of significance: 1% (or 99% confidence level)

In [172]:
f_crit = 4.57
anova(f_crit, [62,81,75,58,67,48,26,36],[72,49,63,68,39,79,40,15],[42,52,31,80,22,71,68,76],[80,57,87,64,28,29,62,45])

| Calculated f-statistic: 0.048 |

f-critical: 4.57
dof within: 28 | dof between: 3 
Error possibility, if any: Type II error

Null hypothesis cannot be rejected as we don't have sufficient evidence to support our claim


# One-way ANOVA: Problem 3

* Neuroscience researchers examined the impact of environment on rat development. Rats were randomly assigned to be raised in one of the following four test conditions: impoverished (wire mesh cage, housed alone), standard (cage with other rats), enriched (cage with other rats and toys), super enriched (cage with rats and toys changed on a periodic basis). After two months, the rats were tested on a variety of learning measures (including the number of trials to learn a maze to a criterion of three perfect trials), and several neurological measures (overall cortical weight, degree of dendritic branching, etc.). The data for the maze task are below. Compute the appropriate test for the data provided below.


* Ho: There is no impact of environment on rat development

   Ha: There is an impact of environment on rat development
   
   Level of significance: 1% (or 99% confidence level)

In [173]:
f_crit = 5.29
anova(f_crit, [22,19,15,24,18],[17,21,15,12,19],[12,14,11,9,15],[8,7,10,9,12])

| Calculated f-statistic: 12.718 |

f-critical: 5.29
dof within: 16 | dof between: 3
Error possibility, if any: Type I error

Null hypothesis can be rejected as we have sufficient evidence to support our claim


> This is a bona fide work of Aarush Gandhi