### Reproduction study - Fairness and Bias in Online Selection
This notebook contains all experiments for the reproduction study of the paper "Fairness and Bias in Online Selection" for the 2022 ML Reproducibility Challenge.

The following cell downloads the Pokec dataset and places it in the data directory.

In [4]:
!mkdir -p data
!wget -P data http://snap.stanford.edu/data/soc-pokec-relationships.txt.gz
!wget -P data http://snap.stanford.edu/data/soc-pokec-profiles.txt.gz
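
Once the download finishes, it can be useful to check that the archives are readable before starting the long-running experiments. The helper below is a minimal sketch for that purpose; `peek_gzip_lines` is a hypothetical name that is not part of this repository, and the UTF-8 encoding is an assumption about the files.

```python
import gzip
from itertools import islice


def peek_gzip_lines(path, n=5):
    """Return the first n lines of a gzip-compressed text file.

    Hypothetical helper for a quick sanity check; UTF-8 is assumed,
    and undecodable bytes are replaced rather than raising an error.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        # islice stops after n lines, so the whole archive is never decompressed.
        return [line.rstrip("\n") for line in islice(handle, n)]
```

For example, `peek_gzip_lines("data/soc-pokec-relationships.txt.gz")` returns the first five lines of the relationships file without unpacking it to disk.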

#### Secretary experiments & Extension experiment
The following cell contains all secretary experiments of the paper. Additionally, the secretary algorithm is applied to a new data set (UFRGS). The same parameters are used as in the paper, except for the Pokec experiment, due to time constraints. The plots generated by these experiments can be found within the "plots" folder.

In [5]:
from secretary_experiments import PrintStatistics, SecretaryExperiment, PlotSecretary

# Number of experiment repetitions 
num_rep = 20000
num_rep_pokec = 40000

# Variables for synthetic experiments (equal p, general p)
sizes = [10, 100, 1000, 10000]
equal_prob = [0.25, 0.25, 0.25, 0.25]
general_prob = [0.3, 0.25, 0.25, 0.2]

# Variables for maximization dataset experiments (bank, pokec)
max_prob = [0.2, 0.2, 0.2, 0.2, 0.2]

# Variables for research extension dataset experiment (ufrgs)
ufrgs_prob = [0.5, 0.5]

# Secretary experiments
PlotSecretary(num_rep, sizes, equal_prob, 'plots/Secretaryplot_equal.png', 'synth')
PlotSecretary(num_rep, sizes, general_prob, 'plots/Secretaryplot_general.png', 'synth')
PlotSecretary(num_rep, [], max_prob, 'plots/Secretaryplot_bank.png', 'bank')
PlotSecretary(num_rep_pokec, [], max_prob, 'plots/Secretaryplot_pokec.png', 'pokec')
PlotSecretary(num_rep, [], ufrgs_prob, 'plots/Secretaryplot_ufrgs.png', 'ufrgs')
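
For readers unfamiliar with the secretary setting, the sketch below shows the textbook single-group rule: observe roughly the first n/e candidates without selecting, then pick the first candidate that beats everything seen so far. It is only an illustrative baseline under that assumption and does not reproduce the group-aware algorithm run by `PlotSecretary`; the function names `classical_secretary` and `success_rate` are hypothetical and not part of `secretary_experiments`.

```python
import math
import random


def classical_secretary(values):
    """Return the index picked by the textbook 1/e rule, or None if nobody is picked."""
    n = len(values)
    cutoff = int(n / math.e)  # length of the observation-only phase
    best_seen = max(values[:cutoff], default=float("-inf"))
    for i in range(cutoff, n):
        # Select the first candidate strictly better than all observed ones.
        if values[i] > best_seen:
            return i
    return None


def success_rate(n, num_rep, seed=0):
    """Estimate how often the rule picks the overall best of n candidates."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_rep):
        values = rng.sample(range(n), n)  # random arrival order, distinct values
        pick = classical_secretary(values)
        if pick is not None and values[pick] == n - 1:
            hits += 1
    return hits / num_rep
```

Calling `success_rate(100, 20000)` should give an estimate close to 1/e, the known asymptotic success probability of this rule.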

#### Prophet experiments
The following cell contains all prophet experiments of the paper. The same parameters are used as in the paper. The plots generated by these experiments can be found within the "plots" folder.

In [2]:
from prophet_experiments import ProphetExperiment, PlotProphet
import distributions as ds

# Prophet experiment with uniform distributions.
num_rep = 50000
size = 50 
unif_dist = ds.UniformDistribution(loc=0, scale=1, n=size)
distributions_a = [unif_dist, unif_dist]
data = PlotProphet(num_rep, size, distributions_a, 'plots/Prophetplot_unif.png', printeval=False)

# Prophet experiment with binomial distributions.
size = 1000
bi_dist = ds.BinomialDistribution(size, 0.5)
distributions_b = [bi_dist, bi_dist]
data = PlotProphet(num_rep, size, distributions_b, 'plots/Prophetplot_bi.png', printeval=False)
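
As background for the prophet setting, the sketch below compares a textbook single-threshold rule against the prophet, who always obtains the maximum. It assumes i.i.d. Uniform(0, 1) values and uses the median of the maximum as the threshold; it is not the algorithm evaluated by `PlotProphet`, and all function names are hypothetical.

```python
import random


def median_of_max_threshold(n):
    """Threshold t with P(max <= t) = 1/2 for n i.i.d. Uniform(0, 1) values."""
    # For Uniform(0, 1), P(max <= t) = t**n, so t = 0.5 ** (1 / n).
    return 0.5 ** (1.0 / n)


def threshold_rule(values, threshold):
    """Accept the first value at or above the threshold; return 0.0 if none qualifies."""
    for v in values:
        if v >= threshold:
            return v
    return 0.0


def compare_to_prophet(n, num_rep, seed=0):
    """Return (average value of the threshold rule, average maximum) over num_rep runs."""
    rng = random.Random(seed)
    t = median_of_max_threshold(n)
    alg_total = 0.0
    prophet_total = 0.0
    for _ in range(num_rep):
        values = [rng.random() for _ in range(n)]
        alg_total += threshold_rule(values, t)
        prophet_total += max(values)
    return alg_total / num_rep, prophet_total / num_rep
```

With `n = 50`, the ratio between the two returned averages should land near one half, in line with the classical prophet inequality guarantee for this kind of threshold.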

#### Distribution plots
The following cell contains additional code that plots the distributions of the data sets used in our research. The plots generated by this cell can be found within the "distributions" folder.

In [3]:
from plot_distributions import PlotInstanceDistribution
from secretary_data import GetSecretaryInputBank, GetSecretaryInputPokec, GetSecretaryInputUfrgs
import pickle 

# Creating candidate lists from data sets
instance_bank = GetSecretaryInputBank(100000)
# Load the stored Pokec instance; the context manager closes the file afterwards
with open('data/pokec_instance.dat', 'rb') as f:
    instance_pokec = pickle.load(f)
instance_ufrgs = GetSecretaryInputUfrgs(100000)

# Plotting candidate list distributions
PlotInstanceDistribution(instance_bank, 'Bank')
PlotInstanceDistribution(instance_pokec, 'Pokec')
PlotInstanceDistribution(instance_ufrgs, 'Urfgs')