# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context or the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
</div>
****

<p>Prior to determining which test to run, orienting to the data and/or preparing the data for analysis is important.</p>  

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import scipy.stats as stats

In [2]:
# Import data
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# Orient to data
data.head() 

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [4]:
# Orient to data columns (names, type, complete data, etc)
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4870 entries, 0 to 4869
Data columns (total 65 columns):
id                    4870 non-null object
ad                    4870 non-null object
education             4870 non-null int8
ofjobs                4870 non-null int8
yearsexp              4870 non-null int8
honors                4870 non-null int8
volunteer             4870 non-null int8
military              4870 non-null int8
empholes              4870 non-null int8
occupspecific         4870 non-null int16
occupbroad            4870 non-null int8
workinschool          4870 non-null int8
email                 4870 non-null int8
computerskills        4870 non-null int8
specialskills         4870 non-null int8
firstname             4870 non-null object
sex                   4870 non-null object
race                  4870 non-null object
h                     4870 non-null float32
l                     4870 non-null float32
call                  4870 non-null float32
city        

<h6><b>1.  What test is appropriate for this problem?  Does CLT apply?</b></h6>

<p> As the exercise sets a challenge to establish if race has a significant impact on callbacks for resumes, there are two notable areas to search for evidence:
<ul>- Testing distrbutions by race to determine if *true* distributions are similar [or different], and</ul>
<ul>- Determining if a relationship exists between variables "call" and "race".</ul></p>

<p>Pursuing the first, the data may be separated into two sets, by race, and tested as a two sample test.  That said, options for testing are available:
<ul>- A two sample permutation test offers a bootstrap method that will evaluate multiple permutations of the concatinated data.</ul>
<ul>- A two sample z-test will compute a z-score representing the measurement of the relationship between a mean and group of data.  
$$z = \frac{\left( \mu_w - \mu_b \right) - \Delta_{H_0}}{\sqrt{\mu\left( 1 - \mu \right ) \left ( \frac{1}{n_w} + \frac{1}{n_b}\right)}},$$
where $\mu$ represents the mean of rate of calls.</ul></p>

<p>According to the parameters of the exercise, the data consist of sets of identical resumes with names randomly applied.  The random sampling allows the treatement of the data as independent.  Furthermore, the independence, and reasonably large number, of samples allows treatment of the data as representative.  CLT applies.</p>

In [5]:
# Create function to return permutation of data for hypothesis testing
def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets"""
    
    data = np.concatenate((data1, data2))
    permuted_data = np.random.permutation(data)
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]
    
    return perm_sample_1, perm_sample_2

def draw_perm_reps(data_1, data_2, func, size=1):
    """Generate multipl permutation replicates"""
    
    perm_replicates = np.empty(size)
    
    for i in range(size):
        perm_sample_1, perm_sample_2 = permutation_sample(data_1, data_2)
        perm_replicates[i] = func(perm_sample_1, perm_sample_2)
        
    return perm_replicates

# Test statistice for permutation test.
def diff_of_means(data_1, data_2):
    """Difference in means of two arrays"""
    
    prop1 = np.mean(data_1)
    prop2 = np.mean(data_2) 
    
    diff = prop1 - prop2
    
    return abs(diff)

# Two sample z test formula. 
def ztst_2samp(data1, data2, h_0):
    """Performs a 2 sample z-test, returning z-score and p-value"""
    
    d1m, d1n = np.mean(data1), len(data1)
    d2m, d2n = np.mean(data2), len(data2)
    mean = (np.sum(data1) + np.sum(data2)) / (len(data1) + len(data2))
    
    top = d1m - d2m - h_0 
    bottom = mean * (1 - mean) * ((1/d1n) + (1/d2n))
    
    z = top / np.sqrt(bottom)
    
    if d1m >= d2m:
        p = stats.norm.cdf(-z)*2
    
    else:
        p = stats.norm.cdf(z)*2
    
    return z,p

# Define functions for test stat data and margin of error
def z_merci2(data1, data2, z_conf_int):
    full = np.concatenate((data1, data2))
    mean = np.mean(full)
    diff = np.mean(data1) - np.mean(data2)
    moe = z_conf_int * np.sqrt(mean * (1-mean) * ((1/len(data1)+(1/len(data2)))))
    ci = (mean - diff) + np.array([-1, 1]) * moe
    
    return moe, ci, diff

def perm_merci(perm_data, confidence):
    """Computes margin of error and confidence interval for permutation testing and confidence range.  
    Enter confidence variable as a decimal value (e.g. 95% = .95)"""
    
    sem = abs(np.std(perm_data)) 
    moeh = np.mean(perm_data) + (sem * 2)
    moel = np.mean(perm_data) - (sem * 2)
    pctout = ((1 - confidence)/2)
    pcth = (1 - pctout) * 100
    pctl = (0 + pctout) * 100
    pctci = np.percentile(perm_data,[pctl,pcth])
    moe = sem * 2
    sem2moe = [moel, moeh]
    
    return pctci, sem2moe, moe



In [6]:
# Separate data for comparison
w = data[data.race=='w']
b = data[data.race=='b']

call_w = w['call']
call_b = b['call']

len(call_w), len(call_b)

(2435, 2435)

<h6><b>2. What are the null and alternate hypotheses?</b></h6>

<p>Both will be used to test the hypotheses: </p>
$$H_0: \mu_w = \mu_b$$ 
$$H_A: \mu_w \ne \mu_b$$

<h6><b>3. Computer margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.</b></h6>


In [7]:
# Calculate empirical difference of proportions
emp_means_diff = diff_of_means(call_w, call_b)

# Draw 10000 permutation replicates
perm_replicates = draw_perm_reps(call_w, call_b, diff_of_means, size=10000)

# Calculate & print p-value
p_perm = np.sum(perm_replicates >= emp_means_diff) / len(perm_replicates)

# Compute z & p values via z-test
z,p = ztst_2samp(call_w, call_b, 0)

In [8]:
# Calculate margin of error with 95% confidence interval
permci, sem2, permoe = perm_merci(perm_replicates, 0.95)
moci = z_merci2(call_w, call_b, 1.96)

print('--- Permuation Test Stats ---')
print('p-value: ',p_perm)
print('Test-stat (diff of means): ', emp_means_diff)
print('confidence interval: ', permci)
print('margin of error: ', permoe)
print('confidence range (2SEM): ', sem2)
print('--- z-test Stats ---')
print('p-value: ',p)
print('z-score:',z)
print('Test-stat: ', moci[2])
print('95% confidence interval: ', moci[1])
print('margin of error : ', moci[0])

--- Permuation Test Stats ---
p-value:  0.0002
Test-stat (diff of means):  0.03203285485506058
confidence interval:  [ 0.          0.01724846]
margin of error:  0.00941653672373
confidence range (2SEM):  [-0.003234031413407148, 0.015599042034060779]
--- z-test Stats ---
p-value:  3.9838854095e-05
z-score: 4.10841223524
Test-stat:  0.03203285485506058
95% confidence interval:  [ 0.03317805  0.06374187]
margin of error :  0.0152819126334


<p><b><u>Permutation Test</b></u>:  With a p-value of zero, the null hypothesis that white and black sounding names have the same callback rate should be rejected.  Given the test statistic lies outside the 95% confidence interval, rejection of the null hypothesis is further supported.</p>

<p><b><u>z-test</b></u>:  Similarly to the permutation test, the smallness of the p-value indicates it can be regarded as zero.  That said, the null hypothesis is rejected.  Whereas the test stat lies outside the confidence interval, further evidence is provided for rejection of the null hypthesis.</p>

<p>Another test usefull in determining the relationships between variables would be the $\chi^2$ test.  The claim of the null hypothesis, that distributions of callbacks for race are equal, is a claim that "callback rate" and "race" are variables independent from each other.  The $\chi^2$ test will measure how well the null hypothesis fits the data.</p> 

In [9]:
# Compute variables for test
mcalls = np.sum(call_w) + np.sum(call_b)
tcalls = len(call_w) + len(call_b)
tprop = mcalls / tcalls

obsv = [np.sum(call_w), np.sum(call_b)]
expv = [tprop * len(call_w), tprop * len(call_b)]

print(stats.chisquare(obsv, expv))


Power_divergenceResult(statistic=15.520408163265309, pvalue=8.1619303597043819e-05)


<p><b><u>$\chi^2$ test</b></u>: As expected, the p-value is practically zero; supporting rejection of the null hypothesis.</p>

<h6><b>4. Write a story describing the statistical significance in the context of the original problem.</b></h6>

<p>Attempting to examine levels of racial discrimination in the United States labor market, researchers assembled identical resumes and randomly assigned them white-sounding and black-sounding names.  Noting the callbacks associated with names, data were collected for examination.</p>

<p>Three statistical testing methods were performed on the data under the assumption that callback rates for black-sounding names and white-sounding names were the same. A brief summary of the tests:

<ul>- <b>Permutation Test</b>:  This test utilizes an approach similar to shuffling cards.  In effect, the data sets were shuffled together and randomly *redealt* a large number of times in order to simulate data recollection under identical circumstances.  The difference in callback rates were evaluated against the actual callback rate difference.  If given equal true callback rate distributions, the permutation test determined, with 95% confidence, the difference of callback rates would be between 0 and 0.018 callbacks per resume.  With an actual callback difference of 0.032 callbacks per resume, the probability of distributions including, at least, a 0.032 callback rate computes to practically zero.</ul>
<ul>- <b>*z*-test</b>:  This test provides a quantification of a hypothetical mean value in relation to the group of data.  The computed z-score indicates the relationship in terms of standard deviation.  In normal distributions, approximately 95% of all values occur within 2 standard deviations.  In other words, if the callback rates have are equally distributed, the hypothetical mean will have a close relationship with the data resulting in a lower z-score.  In this case, the z-score of 4.1 indicates the data do not share a close relationship.  According to this test, with 95% confidence, the difference in mean callback rates would be between 0.33 and 0.64 callbacks per resume.  However close it seems, the actual difference in callback rates (0.032) lies outside the confidence range and the probability of distributions including, at least, a 0.032 callback rate also computes to practically zero.</ul>
<ul>-<b>$\chi^2$ test</b>: This test, also referred to as the Pearson $\chi^2$ test, measures the relationship between variables.  Results from this test generally lend to an interpretation of indepence or dependence of variables.  As with the Pearson coefficient, divergence indicates the impact of relationship between variables; results nearest zero indicate variables are independent and vice versa.  In this case, if the callback rates are equally distributed, then there should be minimal impact and, therefore, a low divergence statistic.  The $\chi^2$ test resulted in a divergence statistic of 15.5, indicating that callback rates and race are likely not independent variables.  That said, the probability of distributions showing independence with inclusion of, at least, a 0.032 callback rate computes, once again, to practically zero. </ul>


<h6><b>5. Does your analysis mean that race/name is the most important factor in callback success?  Why or why not? If not, how would you amend  your analysis?</b></h6>

<p>Each of the test models used indicated that callback rates for white-sounding names and black-sounding names were not similarly distributed.  However, while these tests may indicate that a statistical relationship may exist between callback rate and race, they do not provide enough evidence to suggest that race is the most important factor for callback success.  The scope of this exercise does not measure race against other factors for callback success, so a conclusion of race as the most important factor would be premature.</p>

<p>According to Monster contributing writer, Catherine Conlan, resume attributes employers primarily look for are personal summary, skills/competencies, related experience, education, and presentation.  While names may act as indicators for race, it would be important to measure race against these factors before making any claim toward asserting "most important factor".</p> 

<p>The background information provides that identical resumes were assigned to white-sounding names and black-sounding names.  This would indicated that, within the data, duplicated resumes may be found in which related experience, skills, and education are equal.  If, in these cases, the only difference is race, then observations of callback rates may provide useful information.</p> 

#### Resources
+ Article - 5 Critical Elements of any Resume: https://www.monster.com/career-advice/article/5-critical-elements-of-resume

<div class="span5 alert alert-success">
<h5><b>Additional Experimentation</b></h5>

<p>What follows is a brief experiment regarding a potential direction of analysis of the impact of race and the success of callbacks.  Given that duplicate resumes are assumed to exist, samples will be drawn where duplicates will be weighed against each other.  The "occupspecific" category of the data appears to identify a resume entry akin to "related experience".  Samples will be drawn according to this parameter and duplicates within the samples will be identified for testing.</p>

<p>Data collected will be tested using the three methods previously used and test the same hypothesis:
$$H_0: \mu_w = \mu_b$$ 
$$H_A: \mu_w \ne \mu_b$$

In [10]:
# Simplify data into general resume entry categories
dfdata = data[['education','yearsexp','empholes','occupspecific', 'occupbroad', 'sex','race', 'call']]

dfdata['occupspecific'].unique()

array([ 17, 316,  19, 313, 266,  13, 263, 379,  27,  21, 268, 267, 265,
       317, 189, 185, 255, 323, 195, 329, 337, 326, 365,  34,  22, 229,
       389, 387, 264, 214, 256, 374, 903, 274, 285, 253,  25, 448, 364,
         7, 257, 443,  33, 307, 386, 785, 385, 233, 188,   8, 304, 461,
       804, 338, 213, 269, 276,   9, 243, 234, 286])

In [11]:
# Create array for random sample selection.
occupspecifics = [ 17, 316, 313, 266,  13, 263, 379,  27,  21, 268, 267, 317, 189,
       185, 255, 323, 265, 195, 329, 337, 326,  34,  22, 229, 389, 365,
       274, 285, 253,  19,  25, 387, 364,   7, 257, 443, 264,  33, 307,
       386, 785, 385, 233, 188, 461, 448, 256,   8]

# Randomly select three samples from data and assign as test data.
selected = np.random.choice(occupspecifics,size=3)

test1 = dfdata[dfdata['occupspecific']==selected[0]]
test2 = dfdata[dfdata['occupspecific']==selected[1]]
test3 = dfdata[dfdata['occupspecific']==selected[2]]

# Inspect test data inormation 
#test1.info(), test2.info(), test3.info()

In [12]:
# Identifies columns for duplicate matching criteria.
columns = ['education', 'yearsexp', 'empholes', 'occupspecific', 'occupbroad']

# Reduce test samples by rows having no duplicate rows per matching criteria
test1 = test1[test1[columns].duplicated(keep=False)]
test2 = test2[test2[columns].duplicated(keep=False)]
test3 = test3[test3[columns].duplicated(keep=False)]

# Store test sample arrays of indices of duplicated rows
a = test1.groupby(columns).apply(lambda x: list(x.index)).tolist()
b = test2.groupby(columns).apply(lambda x: list(x.index)).tolist()
c = test3.groupby(columns).apply(lambda x: list(x.index)).tolist()

In [13]:
# Observe test data
lets = [a,b,c]
count = 1

for x in lets:
    print()
    print('Test set ',count)
    
    for i,y in enumerate(x): 
        if x == a:
            print('w:b ',np.array(data.iloc[y]['race'].value_counts()))
        elif x == b:
            print('w:b ',np.array(data.iloc[y]['race'].value_counts()))
        else:
            print('w:b ',np.array(data.iloc[y]['race'].value_counts()))
    count = count + 1


Test set  1
w:b  [3 2]
w:b  [4 3]
w:b  [3]
w:b  [14  6]
w:b  [1 1]
w:b  [4 1]
w:b  [3 1]
w:b  [5]
w:b  [12 10]
w:b  [2]
w:b  [2]
w:b  [1 1]

Test set  2
w:b  [17 15]

Test set  3
w:b  [15  7]
w:b  [5 3]
w:b  [2]


In [14]:
# Organize data for testing.
t1w = test1[test1['race']=='w']
t1b = test1[test1['race']=='b']

t2w = test2[test2['race']=='w']
t2b = test2[test2['race']=='b']

t3w = test3[test3['race']=='w']
t3b = test3[test3['race']=='b']

print('w:b ', len(t1w['race']), len(t1b['race'])) 
print('w:b ', len(t2w['race']), len(t2b['race'])) 
print('w:b ', len(t3w['race']), len(t3b['race']))

w:b  46 33
w:b  17 15
w:b  20 12


<div class="span5 alert alert-success">
<p>Observing the test sets indidicates that counts for duplication of resumes may not be an ideal 1:1 ratio.  In some cases, the duplicate resumes are present within one sample rather than both.  This could be further explored to establish if bias exists within the sets.</p>


In [15]:
# Set call data variables for functions
call_t1w = t1w['call']
call_t1b = t1b['call']
call_t2w = t2w['call']
call_t2b = t2b['call']
call_t3w = t3w['call']
call_t3b = t3b['call']

# Calculate empirical difference of means
pdf1 = diff_of_means(call_t1w, call_t1b)
pdf2 = diff_of_means(call_t2w, call_t2b)
pdf3 = diff_of_means(call_t2w, call_t3b)

# Observe test stats to determine p-value calculations as > <
print('Test stat 1 = ', pdf1)
print('Test stat 2 = ', pdf2)
print('Test stat 3 = ', pdf3)


Test stat 1 =  0.03491435945034027
Test stat 2 =  0.11764705926179886
Test stat 3 =  0.03431372344493866


In [17]:
# Draw 10000 permutation replicates
t1reps = draw_perm_reps(call_t1w, call_t1b, diff_of_means, size=10000)
t2reps = draw_perm_reps(call_t2w, call_t2b, diff_of_means, size=10000)
t3reps = draw_perm_reps(call_t3w, call_t3b, diff_of_means, size=10000)

# Compute p-values
t1p = np.sum(t1reps >= pdf1) / len(t1reps)
t2p = np.sum(t2reps >= pdf1) / len(t2reps)
t3p = np.sum(t3reps >= pdf1) / len(t3reps)

# Compute margin of error, confidence intervals, 2 SEM
t1mci, t1sem2, t1moe = perm_merci(t1reps, 0.95)
t2mci, t2sem2, t2moe = perm_merci(t2reps, 0.95)
t3mci, t3sem2, t3moe = perm_merci(t3reps, 0.95)

# Computer z-score & p values with z-test
t1z, t1zp = ztst_2samp(call_t1w, call_t1b, 0)
t2z, t2zp = ztst_2samp(call_t2w, call_t2b, 0)
t3z, t3zp = ztst_2samp(call_t3w, call_t3b, 0)

# Determine margin of error & confidence interval.
merr1 = z_merci2(call_t1w, call_t1b, 1.96)
merr2 = z_merci2(call_t2w, call_t2b, 1.96)
merr3 = z_merci2(call_t2w, call_t3b, 1.96)



In [18]:
print('------ Permutation Test values ------')
print('******* Test 1 *******')
print('p-value: ', t1p)
print('Test-stat (diff of means): ', pdf1)
print('confidence interval: ', t1mci)
print('bs margin of error: ', t1moe)
print('****** Test 2 *******')
print('p-value: = ', t2p)
print('Test-stat (diff of means): ', pdf2)
print('confidence interval: ',t2mci)
print('margin of error: ',t2moe)
print('****** Test 3 ******')
print('p-value: = ', t3p)
print('Test-stat (diff of means): ', pdf3)
print('confidence interval: ',t3mci)
print('margin of error: ',t3moe)
print()
print('------ z Test values ------')
print('******* Test 1 *******')
print('p-value: ',t1zp )
print('z-score: ',t1z)
print('Test-stat: ', merr1[2])
print('95% confidence interval: ', merr1[1])
print('margin of error= ', merr1[0])
print('******* Test 2 *******')
print('p-value: ', t2zp)
print('z-score: ',t2z)
print('Test-stat: ', merr2[2])
print('95% confidence interval: ', merr2[1])
print('margin of error= ', merr2[0])
print('******* Test 3 *******')
print('p-value: ', t3zp)
print('z-score: ', t3z)
print('Test-stat: ', merr3[2])
print('95% confidence interval: ', merr3[1])
print('margin of error= ', merr3[0])

------ Permutation Test values ------
******* Test 1 *******
p-value:  0.6349
Test-stat (diff of means):  0.03491435945034027
confidence interval:  [ 0.0171278   0.12121212]
bs margin of error:  0.054760256562
****** Test 2 *******
p-value: =  0.4898
Test-stat (diff of means):  0.11764705926179886
confidence interval:  [ 0.00784314  0.13333334]
margin of error:  0.117179396432
****** Test 3 ******
p-value: =  1.0
Test-stat (diff of means):  0.03431372344493866
confidence interval:  [ 0.06666667  0.2       ]
margin of error:  0.129248401292

------ z Test values ------
******* Test 1 *******
p-value:  0.485139541397
z-score:  0.69806020878
Test-stat:  0.03491435945034027
95% confidence interval:  [-0.08231331  0.11375042]
margin of error=  0.0980318667021
******* Test 2 *******
p-value:  0.170066959863
z-score:  1.37198868625
Test-stat:  0.11764705926179886
95% confidence interval:  [-0.22321567  0.11292155]
margin of error=  0.16806861344
******* Test 3 *******
p-value:  0.580912400608

In [19]:
# Organize veariables to test for independence.
mcalls1 = np.sum(call_t1w) + np.sum(call_t1b)
mcalls2 = np.sum(call_t2w) + np.sum(call_t2b)
mcalls3 = np.sum(call_t3w) + np.sum(call_t3b)

tcalls1 = len(call_t1w) + len(call_t1b)
tcalls2 = len(call_t3w) + len(call_t2b)
tcalls3 = len(call_t3w) + len(call_t3b)

tprop = mcalls1 / tcalls1
tprop = mcalls2 / tcalls2
tprop = mcalls3 / tcalls3

obsv1 = [np.sum(call_t1w), np.sum(call_t1b)]
obsv2 = [np.sum(call_t2w), np.sum(call_t2b)]
obsv3 = [np.sum(call_t3w), np.sum(call_t3b)]

expv1 = [tprop * len(call_t1w), tprop * len(call_t1b)]
expv2 = [tprop * len(call_t2w), tprop * len(call_t2b)]
expv3 = [tprop * len(call_t3w), tprop * len(call_t3b)]

# Run chisquare function
t1chi = stats.chisquare(obsv1, expv1)
t2chi = stats.chisquare(obsv2, expv2)
t3chi = stats.chisquare(obsv3, expv3)

print(t1chi)
print(t2chi)
print(t3chi)

Power_divergenceResult(statistic=3.6826416337285899, pvalue=0.054981675065636892)
Power_divergenceResult(statistic=1.8823529411764706, pvalue=0.17006696145390077)
Power_divergenceResult(statistic=0.26666666666666666, pvalue=0.60557661633534621)


<div class="span5 alert alert-success">
<p>According to each of the testing methods, evidence exists to support the null hypothesis that distributions of white-sounding names are equal to black-sounding names.  In most cases, the test statistic fell within the respective 95% confidence range.  Furthermore, the $\chi^2$ tests suggest independence of the callback rates and race when considered with other factors like education, skills and related experience.</p>

<p>It can be noted that one of the $\chi^2$ tests did show a divergence greater than 3 and p-value indicating probability of only 0.055 of the distribution including, at least, the test statistic.  While these results may not strongly support acceptance of the null hypothesis, they do not provide significant evidence to reject the null hypothesis.</p>

<p>In each of these narrow test cases, it may be reasonable to conclude that, given equal skills, education, and related experience, race may not be the most important factor in callback success.  However, given the limits of the data provided, other factors may likely require exploration before drawing specific conclusions.</p>