In [2]:
import numpy
import scipy.stats
import pandas

def compare_averages(filename):
    """
    Performs a t-test on two sets of baseball data (left-handed and right-handed hitters).

    You will be given a csv file that has three columns.  A player's
    name, handedness (L for lefthanded or R for righthanded) and their
    career batting average (called 'avg'). You can look at the csv
    file by downloading the baseball_stats file from Downloadables below. 
    
    Write a function that will read the csv file into a pandas data frame,
    and run Welch's t-test on the two cohorts defined by handedness.
    
    One cohort should be a data frame of right-handed batters. And the other
    cohort should be a data frame of left-handed batters.
    
    We have included the scipy.stats library to help you write
    or implement Welch's t-test:
    http://docs.scipy.org/doc/scipy/reference/stats.html
    
    At a 95% confidence level (significance level of 0.05), if there is
    no difference between the two cohorts, return a tuple consisting of
    True, and then the tuple returned by scipy.stats.ttest_ind.
    
    If there is a difference, return a tuple consisting of
    False, and then the tuple returned by scipy.stats.ttest_ind.
    
    For example, the tuple that you return may look like:
    (True, (9.93570222, 0.000023))
    """
    
    baseball = pandas.read_csv(filename)
    
    # Boolean Series: True for each row where the batter is right-handed
    baseball_right = baseball['handedness'] == 'R'
    
    # Indexing with a boolean Series returns a DataFrame of the matching rows
    baseball_right_2 = baseball[baseball['handedness'] == 'R']
    baseball_left_2 = baseball[baseball['handedness'] == 'L']
    
    # Result format -> (t-statistic, two-sided p-value); equal_var=False selects Welch's t-test
    result = scipy.stats.ttest_ind(baseball_right_2['avg'], baseball_left_2['avg'], equal_var=False)
    # ttest_ind returns a two-sided p-value, so compare it directly with alpha = 0.05
    if result[1] < 0.05:
        return (False, (result[0], result[1]))
    else:
        return (True, (result[0], result[1]))
    
compare_averages('baseball_stats.csv')

(False, (-9.9357022262420944, 3.8102742258887383e-23))

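T-test: 
- Lets us decide whether to reject or fail to reject a null hypothesis (here, that the two cohorts have the same mean batting average).
- The t-test (also called Student’s t-test) compares the means of two groups and tells you whether the difference between them is statistically significant. Welch's variant, selected with equal_var=False, does not assume the two groups have equal variances.
- In other words, the p-value tells you how likely a difference at least this large would be if it arose by chance alone. For the baseball data the p-value is about 3.8e-23, so the difference between left- and right-handed batters is very unlikely to be due to chance.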
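
To make the notes above concrete, here is a minimal sketch of what Welch's t-test computes. It uses only the standard library, and the helper name `welch_t` and the batting averages are made up for illustration (they are not taken from baseball_stats.csv). It returns the t-statistic and the Welch-Satterthwaite degrees of freedom; turning those into a p-value needs the t-distribution, which is the part `scipy.stats.ttest_ind(..., equal_var=False)` handles.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t_statistic, degrees_of_freedom) for Welch's unequal-variance t-test.

    Hypothetical helper for illustration only; it does not compute a p-value.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)

    # Sample variances (n - 1 in the denominator)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)

    # Each group's contribution to the squared standard error of the difference
    se2_a, se2_b = var_a / n_a, var_b / n_b

    # Difference in means, scaled by its standard error
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation to the degrees of freedom
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Made-up batting averages, not real data
right_handed = [0.250, 0.260, 0.240, 0.255, 0.245]
left_handed = [0.280, 0.290, 0.270, 0.300, 0.275]

t_stat, dof = welch_t(right_handed, left_handed)
print(t_stat, dof)
```

A negative t-statistic here means the first sample (right-handed) has the lower mean, which matches the sign of the result in the output above.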