In 1973, the University of California-Berkeley (UC-Berkeley) was sued for sex discrimination. Its admission data showed that men applying to graduate school at UC-Berkeley were more likely to be admitted than women.

The graduate schools had just accepted 44% of male applicants but only 35% of female applicants. The difference was so great that it was unlikely to be due to chance.  *Let's investigate*

In [14]:
import pandas
import numpy as np

In [7]:
# Read in the Berkeley graduate admissions data (one row per Admit/Gender/Dept combination).
berkeley = pandas.read_csv("berkeley_discrimination_data.csv")

In [11]:
print(list(berkeley.columns.values))

['Admit', 'Gender', 'Dept', 'Freq']


In [12]:
print(berkeley)

       Admit  Gender Dept  Freq
0   Admitted    Male    A   512
1   Rejected    Male    A   313
2   Admitted  Female    A    89
3   Rejected  Female    A    19
4   Admitted    Male    B   353
5   Rejected    Male    B   207
6   Admitted  Female    B    17
7   Rejected  Female    B     8
8   Admitted    Male    C   120
9   Rejected    Male    C   205
10  Admitted  Female    C   202
11  Rejected  Female    C   391
12  Admitted    Male    D   138
13  Rejected    Male    D   279
14  Admitted  Female    D   131
15  Rejected  Female    D   244
16  Admitted    Male    E    53
17  Rejected    Male    E   138
18  Admitted  Female    E    94
19  Rejected  Female    E   299
20  Admitted    Male    F    22
21  Rejected    Male    F   351
22  Admitted  Female    F    24
23  Rejected  Female    F   317


In [26]:
# Admitted/rejected counts for each department and gender.
freq_dept = berkeley.pivot_table(index=["Dept", "Gender"], values="Freq", columns="Admit", aggfunc="sum")

In [27]:
print(freq_dept)

Admit        Admitted  Rejected
Dept Gender                    
A    Female        89        19
     Male         512       313
B    Female        17         8
     Male         353       207
C    Female       202       391
     Male         120       205
D    Female       131       244
     Male         138       279
E    Female        94       299
     Male          53       138
F    Female        24       317
     Male          22       351


Just from looking at the raw counts, women don't appear to be at a significant disadvantage once you break the counts down by both gender and department. Let's check out admissions and rejections as a proportion of each group's applicants.

In [28]:
# Divide each row by its total, so Admitted + Rejected sums to 1 within each Dept/Gender group.
freq_dept.apply(lambda x: x / x.sum(), axis=1)

Admit        Admitted  Rejected
Dept Gender                    
A    Female  0.824074  0.175926
     Male    0.620606  0.379394
B    Female  0.680000  0.320000
     Male    0.630357  0.369643
C    Female  0.340641  0.659359
     Male    0.369231  0.630769
D    Female  0.349333  0.650667
     Male    0.330935  0.669065
E    Female  0.239186  0.760814
     Male    0.277487  0.722513
F    Female  0.070381  0.929619
     Male    0.058981  0.941019


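So within departments, women actually seem to have a slight advantage over men: their admission rate is higher in four of the six departments (A, B, D and F) and only slightly lower in C and E. We would still need to calculate p-values to determine whether these differences are significant.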
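As a first pass at those p-values, here is a minimal sketch of a two-sided, pooled two-proportion z-test run separately for each department, using only the standard library. The counts are copied from the raw table above; the choice of a z-test and the helper names `two_proportion_p_value` and `department_p_values` are our own assumptions, not part of the original analysis. The z-test relies on a normal approximation, so for small cells such as department B's 25 female applicants an exact test may be a better fit.

```python
import math

# (admitted, rejected) counts per department and gender, copied from the raw table above.
COUNTS = {
    "A": {"Female": (89, 19), "Male": (512, 313)},
    "B": {"Female": (17, 8), "Male": (353, 207)},
    "C": {"Female": (202, 391), "Male": (120, 205)},
    "D": {"Female": (131, 244), "Male": (138, 279)},
    "E": {"Female": (94, 299), "Male": (53, 138)},
    "F": {"Female": (24, 317), "Male": (22, 351)},
}


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def department_p_values(counts):
    """Map each department to (female admission rate, male admission rate, p-value)."""
    results = {}
    for dept, by_gender in counts.items():
        f_adm, f_rej = by_gender["Female"]
        m_adm, m_rej = by_gender["Male"]
        f_n, m_n = f_adm + f_rej, m_adm + m_rej
        p = two_proportion_p_value(f_adm, f_n, m_adm, m_n)
        results[dept] = (f_adm / f_n, m_adm / m_n, p)
    return results


for dept, (f_rate, m_rate, p) in department_p_values(COUNTS).items():
    print(f"{dept}: female {f_rate:.3f}  male {m_rate:.3f}  p = {p:.4f}")
```

With these counts, only department A's gap (in women's favor) comes out far below 0.05; the differences in the other five departments are consistent with chance.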

In [30]:
# Admitted/rejected counts for each gender, summed over all departments.
freq_gender = berkeley.pivot_table(index=["Gender"], values="Freq", columns="Admit", aggfunc="sum")

In [31]:
print(freq_gender)

Admit   Admitted  Rejected
Gender                    
Female       557      1278
Male        1198      1493


In [32]:
# Admission and rejection rates for each gender, ignoring department.
freq_gender.apply(lambda x: x / x.sum(), axis=1)

Admit   Admitted  Rejected
Gender                    
Female  0.303542  0.696458
Male    0.445188  0.554812


Yet when we ignore department and look only at gender, we arrive at a higher admission rate for men: about 44.5% versus 30.4% for women.
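One way to see how the aggregate can point the other way is that each gender's overall rate is a weighted average of its department rates, weighted by where that gender applied. The minimal sketch below (standard library only; the helper names `rate` and `overall_rate` are hypothetical) first reproduces both aggregate rates from their own application mix, then re-weights both genders by the same department mix (all applicants combined). That second step is a direct standardization we chose for illustration, not something done in the original notebook.

```python
# (admitted, rejected) counts per department and gender, copied from the raw table above.
COUNTS = {
    "A": {"Female": (89, 19), "Male": (512, 313)},
    "B": {"Female": (17, 8), "Male": (353, 207)},
    "C": {"Female": (202, 391), "Male": (120, 205)},
    "D": {"Female": (131, 244), "Male": (138, 279)},
    "E": {"Female": (94, 299), "Male": (53, 138)},
    "F": {"Female": (24, 317), "Male": (22, 351)},
}
GENDERS = ("Female", "Male")


def rate(admitted, rejected):
    """Admission rate for one Dept/Gender cell."""
    return admitted / (admitted + rejected)


def overall_rate(gender, weights):
    """Weighted average of one gender's department admission rates.

    `weights` maps department -> number of applicants used as that department's weight.
    """
    total = sum(weights.values())
    return sum(weights[d] * rate(*COUNTS[d][gender]) for d in COUNTS) / total


# Each gender's own application mix: how many of its applicants went to each department.
own_mix = {g: {d: sum(COUNTS[d][g]) for d in COUNTS} for g in GENDERS}
# A common mix: all applicants to each department, regardless of gender.
common_mix = {d: sum(sum(COUNTS[d][g]) for g in GENDERS) for d in COUNTS}

for g in GENDERS:
    print(f"{g}: own mix {overall_rate(g, own_mix[g]):.3f}, "
          f"common mix {overall_rate(g, common_mix):.3f}")
```

Weighted by their own application mix, the two rates reproduce the aggregate figures above (about 0.304 for women and 0.445 for men). Weighted by the common mix, the order flips: women come out ahead, roughly 0.430 versus 0.387.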

In [41]:
# Share of each department's applicants who were female vs. male.
dept_gender = berkeley.pivot_table(index=["Dept"], columns="Gender", values="Freq", aggfunc="sum")
dept_gender.apply(lambda x: x / x.sum(), axis=1)

Gender    Female      Male
Dept                      
A       0.115756  0.884244
B       0.042735  0.957265
C       0.645969  0.354031
D       0.473485  0.526515
E       0.672945  0.327055
F       0.477591  0.522409


Interestingly, two departments, A and B, drew overwhelmingly male applicant pools (about 88% and 96% male). As you might recall, those were also the two departments with by far the highest acceptance rates for both sexes (over 60%), and even higher for women. Looking back at the raw counts, they also admitted the most students: 601 in A and 370 in B, compared with only 46 in F. I suspect there's a pattern here. Women tended not to apply to the departments that had both more spots and higher acceptance rates: only 133 of the 1,835 female applicants (about 7%) applied to A or B, versus 1,385 of the 2,691 male applicants (about 51%). Instead, most women applied to departments C through F, which rejected most applicants regardless of gender. Therefore, the data doesn't support the claim that women were discriminated against in the admissions process; in fact, they may have had a slight advantage over men.
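A standard way to check this conclusion is to compare men and women while holding department fixed, which is what the Cochran-Mantel-Haenszel (CMH) test does: it pools the six department 2x2 tables into one test of "no association between gender and admission". The sketch below is a minimal standard-library version, assuming the usual 0.5 continuity correction; `mantel_haenszel` is a hypothetical helper name, and the counts are copied from the raw table above. The pooled odds ratio is most meaningful when the odds ratio is roughly similar across departments, and department A, where women fared much better, stands out here.

```python
import math

# (admitted, rejected) counts per department and gender, copied from the raw table above.
COUNTS = {
    "A": {"Female": (89, 19), "Male": (512, 313)},
    "B": {"Female": (17, 8), "Male": (353, 207)},
    "C": {"Female": (202, 391), "Male": (120, 205)},
    "D": {"Female": (131, 244), "Male": (138, 279)},
    "E": {"Female": (94, 299), "Male": (53, 138)},
    "F": {"Female": (24, 317), "Male": (22, 351)},
}


def mantel_haenszel(counts):
    """Return (pooled odds ratio of admission, male vs. female; CMH chi-square; p-value),
    stratifying by department and using a 0.5 continuity correction."""
    num = den = 0.0   # pieces of the Mantel-Haenszel pooled odds ratio
    diff = var = 0.0  # pieces of the CMH test statistic
    for by_gender in counts.values():
        a, b = by_gender["Male"]    # male admitted, male rejected
        c, d = by_gender["Female"]  # female admitted, female rejected
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
        row_m, row_f = a + b, c + d
        col_adm, col_rej = a + c, b + d
        diff += a - row_m * col_adm / n  # observed minus expected male admits
        var += row_m * row_f * col_adm * col_rej / (n ** 2 * (n - 1))
    odds_ratio = num / den
    chi2 = (abs(diff) - 0.5) ** 2 / var
    # Chi-square with 1 degree of freedom: P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


or_mh, chi2, p = mantel_haenszel(COUNTS)
print(f"MH odds ratio (male vs female): {or_mh:.3f}")
print(f"CMH chi-square: {chi2:.3f}, p = {p:.3f}")
```

On these counts the pooled odds ratio comes out at about 0.90 (slightly lower odds of admission for men once department is held fixed) and the p-value at roughly 0.23. That fits the conclusion above: no evidence of discrimination against women, though the slight advantage for women is not statistically significant either.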