# Introduction

The goal of this work is to detect subtle gender biases, such as microaggressions, objectification, and condescension, in second-person text. These biased comments usually depend on the preceding context, so existing classifiers that detect hate speech, offensive language, or negative sentiment fail to catch them.

Some example biased sentences are:
* "Oh, you work at an office? I bet you're a secretary"
* "Total tangent I know, but you're gorgeous"

In these examples, the second segment alone may not be biased, but it becomes problematic once it is read in context.

An automated classifier that detects such biased sentences would help to:
* Make posters aware of the bias so they do not post such comments in the first place (proactive response)
* Flag harmful comments so readers can filter them out (active response)

A straightforward approach is to train a supervised classifier on statements paired with bias labels. However, these biases are subtle and implicit, and even experts struggle to identify them, so collecting an annotated dataset of reasonable scale is expensive.

In this work, the authors propose an unsupervised method. They first train a classifier that predicts the gender of the person the text is addressed to (whether the addressee is male or female). If the classifier makes a prediction with high confidence, the text likely contains bias.
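The scoring idea can be sketched in a few lines. This is only an illustration, not the authors' implementation: the function names, the 0.9 threshold, and the logits are all assumptions made up for the example.

```python
import math

def softmax(logits):
    """Convert raw classifier scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def flag_biased(comments, logits_per_comment, threshold=0.9):
    """Return (comment, confidence) pairs whose top gender probability meets the threshold."""
    flagged = []
    for comment, logits in zip(comments, logits_per_comment):
        confidence = max(softmax(logits))
        if confidence >= threshold:
            flagged.append((comment, confidence))
    return flagged

# Hypothetical logits, ordered as [addressed to a man, addressed to a woman]
comments = ["Total tangent I know, but you're gorgeous", "Thanks for sharing this"]
logits = [[-2.0, 2.0], [0.1, -0.1]]
flagged = flag_biased(comments, logits)
```

With these made-up logits, only the first comment is flagged, because the classifier is confident about the addressee's gender for it and close to undecided for the second.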

A significant challenge for this unsupervised method is confounds: the input text may carry other information that is not indicative of bias but still leads to a high-confidence prediction. For example, in the comment "Bro, golf is better", the word "bro" strongly suggests that the addressee is male, yet the comment is not biased.

To deal with this challenge, the work further proposes methods to control for confounding variables when training the classifier.

To summarize, the general idea of this paper is that comments contain bias if they are **highly predictive** of gender **despite confound control**.

# Data

Each data point contains the following variables:
* OW

[TODO] more about data point, all variables
[TODO] more about the facebook dataset with some examples

# Method: Classifier for Addressee Gender Prediction

The input to the prediction model is the comment text, and the output is the gender of the addressee; the aim is to identify bias in the comment. The authors propose two methods that control confounds from different perspectives:
* Observed confounding variables are balanced through propensity matching
* Latent confounding variables are demoted through adversarial training

## Controlling Observed Confounding Variables through Propensity Matching

In our problem setting, the addressee writes an "original text", and another writer then writes a "comment" in reply. The content of the comment is influenced both by the original text and by potential bias factors such as the gender of the addressee. The first method for controlling confounds aims to reduce the influence of the original text.

The primary method for this is propensity matching. We discard any training comment whose original text is heavily associated with only one gender. The goal is to balance the dataset so that comments addressed to men and comments addressed to women are equally likely to be associated with a given original text.

In practice, however, it is hard to find comments addressed to men and to women that reply to identical original text (that's why there is bias in the dataset!).

Matching is therefore done on a propensity score instead. The propensity score of a comment is defined as the probability that the writer of the original text (the addressee) is female, given the original text.
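A minimal sketch of matching on propensity scores follows. It assumes the scores have already been estimated, and it uses a greedy one-to-one match with a caliper. That is one common way to do propensity matching and is not necessarily the authors' procedure. The tuple layout, the `"W"`/`"M"` labels, and the caliper value are all hypothetical.

```python
def match_by_propensity(samples, caliper=0.05):
    """Greedy one-to-one matching of comments addressed to women and to men.

    `samples` is a list of (comment_id, addressee_gender, propensity) tuples,
    where propensity estimates P(addressee is female | original text).
    Pairs whose scores differ by more than `caliper` are not matched, and
    unmatched samples are discarded.
    """
    women = sorted((s for s in samples if s[1] == "W"), key=lambda s: s[2])
    men = sorted((s for s in samples if s[1] == "M"), key=lambda s: s[2])
    matched = []
    used = set()  # indices into `men` that already have a partner
    for w in women:
        best = None
        for i, m in enumerate(men):
            if i in used:
                continue
            gap = abs(w[2] - m[2])
            if gap <= caliper and (best is None or gap < best[1]):
                best = (i, gap)
        if best is not None:
            used.add(best[0])
            matched.append((w[0], men[best[0]][0]))
    return matched

# Hypothetical comments with made-up propensity scores
samples = [
    ("c1", "W", 0.52), ("c2", "W", 0.95), ("c3", "M", 0.50), ("c4", "M", 0.10),
]
pairs = match_by_propensity(samples)
```

Here `c1` and `c3` reply to original texts with similar scores and are kept as a pair, while `c2` and `c4` sit at opposite extremes and are dropped from the training set.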

## Controlling Latent Confounding Variables through Adversarial Training

Comments may also be influenced by traits of the addressee other than gender, such as occupation, nationality, or nicknames. These traits are specific to individuals and too numerous to enumerate, yet we still want to control for their influence.

Traits are inferred from comments using log-odds scores and represented as a vector. A GAN-like adversarial training procedure then discourages the model from learning these traits.
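To make the log-odds step concrete, here is a small sketch that scores how strongly each word is associated with comments addressed to one person compared with comments addressed to everyone else. The add-0.5 smoothing, the function name, and the toy tokens are assumptions for illustration only. The authors' exact formulation may differ.

```python
import math
from collections import Counter

def log_odds(target_tokens, other_tokens, smoothing=0.5):
    """Smoothed log-odds ratio for each word: target corpus versus the rest."""
    target, other = Counter(target_tokens), Counter(other_tokens)
    vocab = set(target) | set(other)
    # Smoothed corpus sizes, so every word has a non-zero count in both corpora
    n_target = sum(target.values()) + smoothing * len(vocab)
    n_other = sum(other.values()) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        t = target[word] + smoothing
        o = other[word] + smoothing
        scores[word] = math.log(t / (n_target - t)) - math.log(o / (n_other - o))
    return scores

# Toy tokens: comments addressed to one (hypothetical) person versus everyone else
to_person = "great speech senator thanks senator".split()
to_others = "great photo thanks for sharing great game".split()
scores = log_odds(to_person, to_others)
top_word = max(scores, key=scores.get)
```

In this toy data, "senator" gets the highest score, so it would be treated as a trait word for that addressee. Scores like these, one per word, form the trait vector that the adversarial objective discourages the classifier from encoding.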

In [1]:
from __future__ import print_function

import argparse
import codecs
import copy
import os
import random
import sys
import time

import numpy as np
import torch
import torch.nn as nn
from tqdm import tqdm

sys.path.insert(0, os.path.abspath('./src'))

# cuDNN is disabled here, so the benchmark flag below has no effect
torch.backends.cudnn.enabled = False
torch.backends.cudnn.benchmark = True

from datasets import make_rt_gender, make_rt_gender_op_posts
from model import *
from torchtext.vocab import GloVe
from train import *

# `size_average=False` is deprecated; `reduction='sum'` is the equivalent setting
topic_criterion = nn.KLDivLoss(reduction='sum')
# topic_criterion = nn.CrossEntropyLoss()

pretrained_GloVe_sizes = [50, 100, 200, 300]

args = make_parser().parse_args()
print("[Model hyperparams]: {}".format(str(args)))

cuda = torch.cuda.is_available() and args.cuda
print(torch.cuda.is_available())
print(torch.cuda.device_count())
# device = torch.device("cpu") if not cuda else torch.device("cuda:"+str(args.gpu))
device = torch.device("cpu")
print(device)
seed_everything(seed=1337, cuda=cuda)
vectors = None #don't use pretrained vectors
# vectors = load_pretrained_vectors(args.emsize)

# Load dataset iterators
if args.data == "RT_GENDER":
    iters, TEXT, LABEL, INDEX = make_rt_gender(args.batch_size, base_path=args.base_path, train_file=args.train_file, valid_file=args.valid_file, test_file=args.test_file, device=-1, vectors=vectors, topics=False)
    train_iter, val_iter, test_iter = iters
elif args.data == "RT_GENDER_OP_POSTS":
    iters, TEXT, LABEL, INDEX = make_rt_gender_op_posts(args.batch_size, base_path=args.base_path, train_file=args.train_file, valid_file=args.valid_file, test_file=args.test_file, device=-1, vectors=vectors)
    if len(iters) == 2:
        train_iter, test_iter = iters
        val_iter = test_iter
    else:
        train_iter, val_iter, test_iter = iters
else:
    raise ValueError("Unknown dataset: {}".format(args.data))

print("[Corpus]: train: {}, test: {}, vocab: {}, labels: {}".format(
        len(train_iter.dataset), len(test_iter.dataset), len(TEXT.vocab), len(LABEL.vocab)))

if args.model == "CNN":
    args.embed_num = len(TEXT.vocab)
    args.nlabels = len(LABEL.vocab)
    args.kernel_sizes = [int(k) for k in args.kernel_sizes.split(',')]
    args.embed_dim = args.emsize

    model = CNN_Text(args, num_topics=args.num_topics)

elif args.model == "FFN_BOW":
    args.embed_num = len(TEXT.vocab)
    args.nlabels = len(LABEL.vocab)
    args.embed_dim = args.emsize

    model = FFN_BOW_Text(args)

elif args.model == "FFN":
    args.nlabels = len(LABEL.vocab)
    model = FFN_Text(args)

else:
    ntokens, nlabels = len(TEXT.vocab), len(LABEL.vocab)
    args.nlabels = nlabels # hack to not clutter function arguments

    embedding = nn.Embedding(ntokens, args.emsize, padding_idx=1)
    if vectors is not None:
        embedding.weight.data.copy_(TEXT.vocab.vectors)
    encoder = Encoder(args.emsize, args.hidden, nlayers=args.nlayers,
                      dropout=args.drop, bidirectional=args.bi, rnn_type=args.rnn_model)

    attention_dim = args.hidden if not args.bi else 2*args.hidden
    attention = BahdanauAttention(attention_dim, attention_dim)

    model = Classifier(embedding, encoder, attention, attention_dim, nlabels, num_topics=args.num_topics)
    print('model initialized')

model.to(device)
print('Moved model to device, ', device)

criterion = nn.CrossEntropyLoss()
# topic_criterion = nn.KLDivLoss(size_average=False)
topic_criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), args.lr, amsgrad=True)

print('A')
if not args.load:
    for p in model.parameters():
        if not p.requires_grad:
            print ("OMG", p)
            p.requires_grad = True
        p.data.uniform_(-0.5, 0.5)
    # print (p.data.norm())

print('B')
# trainloss = evaluate(best_model, train_iter, optimizer, criterion, args, datatype='train', writetopics=args.save_output_topics, itos=TEXT.vocab.itos)
if args.load:
    print(args.save_dir+"/"+args.model_name+"_bestmodel")
    if args.latest:
        # map_location keeps loading consistent with the forced CPU device above
        best_model = torch.load(args.save_dir+"/"+args.model_name+"_latestmodel", map_location=device)
    else:
        print('C')
        # best_model = torch.load(args.save_dir+"/"+args.model_name+"_bestmodel")
        best_model = torch.load(args.save_dir+"/"+args.model_name+"_bestmodel", map_location=torch.device(device))
    print('saved model loaded')
else:
    try:
        best_valid_loss = None
        best_model = None
        for epoch in range(1, args.epochs + 1):
            train(model, train_iter, optimizer, criterion, args, epoch)
            loss = evaluate(model, val_iter, optimizer, criterion, args)

            # Compare against None explicitly so that a loss of 0.0 is not treated as "unset"
            if best_valid_loss is None or loss < best_valid_loss:
                best_valid_loss = loss
                print ("Updating best model")
                best_model = copy.deepcopy(model)
                torch.save(best_model, args.save_dir+"/"+args.model_name+"_bestmodel")
            torch.save(model, args.save_dir+"/"+args.model_name+"_latestmodel")
    except KeyboardInterrupt:
        print("[Ctrl+C] Training stopped!")

if not args.load:
    trainloss = evaluate(best_model, train_iter, optimizer, criterion, args, datatype='train', writetopics=args.save_output_topics, itos=TEXT.vocab.itos, litos=LABEL.vocab.itos)
    valloss = evaluate(best_model, val_iter, optimizer, criterion, args, datatype='valid', writetopics=args.save_output_topics, itos=TEXT.vocab.itos, litos=LABEL.vocab.itos)
print('start evaluating...')
loss = evaluate(best_model, test_iter, optimizer, criterion, args, datatype=os.path.basename(args.test_file).replace(".txt", "").replace(".tsv", ""), writetopics=args.save_output_topics, itos=TEXT.vocab.itos, litos=LABEL.vocab.itos)


usage: ipykernel_launcher.py [-h] [--data {RT_GENDER,RT_GENDER_OP_POSTS}]
                             --base_path BASE_PATH [--test_file TEST_FILE]
                             [--ood_test_file OOD_TEST_FILE]
                             [--train_file TRAIN_FILE]
                             [--valid_file VALID_FILE] [--rnn_model RNN_MODEL]
                             [--save_dir SAVE_DIR] [--model MODEL]
                             [--model_name MODEL_NAME]
                             [--topic_loss TOPIC_LOSS] [--emsize EMSIZE]
                             [--hidden HIDDEN] [--nlayers NLAYERS]
                             [--num_topics NUM_TOPICS] [--lr LR] [--clip CLIP]
                             [--epochs EPOCHS] [--gpu GPU] [--alpha ALPHA]
                             [--batch_size N] [--drop DROP] [--gradreverse]
                             [--bi] [--save_output_topics]
                             [--output_topics_save_filename OUTPUT_TOPICS_SAVE_FILENAME]
                

SystemExit: 2

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
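The usage message and `SystemExit: 2` above are what `argparse` prints when parsing fails. Inside a notebook, `parse_args()` with no arguments reads `sys.argv`, which holds the kernel's launch arguments rather than the script's flags, and the required `--base_path` is missing. One way around this is to pass an explicit argument list. The sketch below uses a hypothetical stand-in parser with a few flags copied from the usage message. It is not the real `make_parser()`, and the defaults and the `data/` path are placeholders.

```python
import argparse

def make_demo_parser():
    """Hypothetical stand-in for make_parser(), with a few flags from the usage message."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", choices=["RT_GENDER", "RT_GENDER_OP_POSTS"],
                        default="RT_GENDER")  # default is an assumption
    parser.add_argument("--base_path", required=True)
    parser.add_argument("--epochs", type=int, default=5)  # default is an assumption
    return parser

# An explicit list replaces sys.argv, so the kernel's own arguments are ignored.
args = make_demo_parser().parse_args(["--base_path", "data/", "--epochs", "3"])
print(args.data, args.base_path, args.epochs)
```

The same pattern should apply to the real parser, with the list holding whichever flags the cell needs.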


In [None]:
%%bash
################################## TO FILL IN ############################################################
# source activate your-env-here

TOP_DIR="/home/ma/fairness/unsupervised_gender_bias_models" # the path to unzipped tarball of saved models

SUFFIX="facebook_wiki"  # Flip comment to change which data set to run
# SUFFIX="facebook_congress"
####################################################################################################


# Paths to data
DATA_DIR="${TOP_DIR}/${SUFFIX}"

# Intermediate suffixes
MATCHED_SUFFIX="matched_${SUFFIX}"
SUBS_SUFFIX="subs_name2"
EXTRA_SUFFIX="withtopics"
NO_SUFFIX="notopics"

export CUDA_VISIBLE_DEVICES=1

# `-m` expects a module name, so the `.py` suffix is dropped
python -m src.train \
    --data RT_GENDER \
    --base_path "${DATA_DIR}" \
    --train_file "train.${SUBS_SUFFIX}.${MATCHED_SUFFIX}.${NO_SUFFIX}.txt" \
    --valid_file "valid.${SUBS_SUFFIX}.${SUFFIX}.txt" \
    --test_file "test.${SUBS_SUFFIX}.${SUFFIX}.txt" \
    --save_dir "${DATA_DIR}/matched_notopics_${SUBS_SUFFIX}" \
    --model RNN \
    --model_name "rt_gender_${SUFFIX}_matched_notopics.model" \
    --gpu 0 \
    --batch_size 32 \
    --write_attention \
    --epochs 5 \
    --lr 0.0001 \
    --load

# Evaluation