(bias mitigation)=

# Bias Mitigation (Debias)

This guide presents the general details of using the package to mitigate bias in
word embedding models (debiasing). The following sections show how to:

- run the {class}`~wefe.debias.hard_debias.HardDebias` mitigation method on an
  embedding model to mitigate gender bias (using the ``fit-transform`` interface).
- apply the ``target`` parameter when executing the transformation.
- apply the ``ignore`` parameter when executing the transformation.
- apply the ``copy`` parameter when executing the transformation.
- run the {class}`~wefe.debias.multiclass_hard_debias.MulticlassHardDebias` mitigation
  method on a word embedding model to mitigate ethnic bias.

:::{note}

For a list of debias methods implemented in WEFE, refer to the
[debias section](debias-API) of the API reference.

:::


## Hard Debias

Hard Debias is a method that mitigates biases through geometric operations on embeddings.
The method is binary: it only supports two classes of the same bias criterion,
such as male and female.

:::{note}

For multiclass debiasing (e.g., for Latinos, Asians and Whites), use the
{class}`~wefe.debias.multiclass_hard_debias.MulticlassHardDebias` class.

:::




The main idea of this method is:

1. Identify a bias subspace through the definitional sets. In the case of gender,
   these could be, e.g., ``[['woman', 'man'], ['she', 'he'], ...]``.

2. Neutralize the bias subspace in the embeddings that should not be biased.
   First, a set of words that are legitimately related to the bias criterion is
   defined: the *criterion-specific* words. For example, in the case of gender,
   the *gender-specific* words are
   ``['he', 'his', 'He', 'her', 'she', 'him', 'She', 'man', 'women', 'men', ...]``.

   All words outside this set are then assumed to have no legitimate relation to
   the bias criterion, and are therefore candidates for being biased (e.g., for
   gender: ``['doctor', 'nurse', ...]``). This set of words is neutralized with
   respect to the bias subspace found in the previous step.

   The neutralization is carried out as follows, where:

   - $u$ : embedding
   - $v$ : bias direction

   First, compute the projection of the embedding onto the bias direction:

   $$u_B = \frac{v \cdot u}{v \cdot v}\, v$$

   Then subtract the projection from the embedding:

   $$u' = u - u_B$$

3. Equalize the embeddings with respect to the bias direction.
   Given an equalization set (a set of word pairs such as ``['she', 'he'],
   ['men', 'women'], ...``, not limited to the definitional set), this step
   equalizes each pair with respect to the bias direction. That is, it takes
   both embeddings of the pair and places them at the same distance from the
   bias direction, so that neither is closer to the bias direction than the other.
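
The three steps above can be sketched in plain NumPy. This is a minimal illustration of the geometry under stated assumptions, not WEFE's implementation: embeddings are assumed to be unit-length NumPy arrays, the bias direction is estimated as the first principal component of the centered definitional pairs, and `unit`, `bias_direction`, `neutralize` and `equalize` are hypothetical helper names.

```python
import numpy as np


def unit(x: np.ndarray) -> np.ndarray:
    """Scale x to unit length."""
    return x / np.linalg.norm(x)


def bias_direction(pairs: list[tuple[np.ndarray, np.ndarray]]) -> np.ndarray:
    """Step 1: estimate a bias direction from definitional pairs.

    Each pair is centered on its own midpoint; the first principal component
    of the centered vectors is used as the (unit-length) bias direction.
    """
    centered = []
    for a, b in pairs:
        center = (a + b) / 2
        centered.extend([a - center, b - center])
    # Each pair contributes x and -x, so the rows already have zero mean and
    # the top right singular vector is the first principal component.
    # Its sign is arbitrary.
    _, _, vt = np.linalg.svd(np.array(centered), full_matrices=False)
    return unit(vt[0])


def neutralize(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Step 2: u' = u - u_B, where u_B = ((v . u) / (v . v)) v."""
    u_b = (np.dot(v, u) / np.dot(v, v)) * v
    return u - u_b


def equalize(a: np.ndarray, b: np.ndarray,
             v: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Step 3: make a unit-length pair symmetric with respect to v.

    Both outputs share the bias-free part of the pair's midpoint and differ
    only along v, by equal and opposite amounts (assumes a . v != b . v).
    """
    v = unit(v)
    mu = (a + b) / 2
    mu_b = np.dot(mu, v) * v
    nu = mu - mu_b  # bias-free component shared by both words
    scale = np.sqrt(max(1.0 - np.dot(nu, nu), 0.0))  # keeps outputs at unit length
    a_b = np.dot(a, v) * v
    b_b = np.dot(b, v) * v
    return nu + scale * unit(a_b - mu_b), nu + scale * unit(b_b - mu_b)


# Usage: neutralizing [1, 2] against the direction [1, 0] removes the first component.
print(neutralize(np.array([1.0, 2.0]), np.array([1.0, 0.0])))  # [0. 2.]
```

After equalization, both words of a pair share the same bias-free point ``nu`` and are offset from it by equal and opposite amounts along ``v``, which is what makes neither word closer to the bias direction than the other.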


The fit parameters define how the neutralization will be calculated. In
Hard Debias, you have to provide the ``definitional_pairs`` and the
``equalize_pairs`` (which could be the same as the definitional pairs).
Optionally, a ``criterion_name`` can be passed when instantiating the method;
it is used to name the debiased model.


The code below shows how to run Hard Debias for gender on the test model
provided by WEFE (a reduced version of word2vec).

```python
from wefe.utils import load_test_model

model = load_test_model()  # load a reduced version of word2vec
model
```

```
<wefe.word_embedding_model.WordEmbeddingModel at 0x7f3b83b3b700>
```

Load the required word sets.

```python
from wefe.datasets import fetch_debiaswe
from wefe.debias.hard_debias import HardDebias

debiaswe_wordsets = fetch_debiaswe()

definitional_pairs = debiaswe_wordsets["definitional_pairs"]
equalize_pairs = debiaswe_wordsets["equalize_pairs"]
gender_specific = debiaswe_wordsets["gender_specific"]

print(f"definitional_pairs: \n{definitional_pairs}")
print(f"equalize_pairs: \n{equalize_pairs}")
print(f"gender_specific: \n{gender_specific}")
print("-" * 70, "\n")
```

```
definitional_pairs:
[['woman', 'man'], ['girl', 'boy'], ['she', 'he'], ['mother', 'father'], ['daughter', 'son'], ['gal', 'guy'], ['female', 'male'], ['her', 'his'], ['herself', 'himself'], ['Mary', 'John']]
equalize_pairs:
[['monastery', 'convent'], ['spokesman', 'spokeswoman'], ['Catholic_priest', 'nun'], ['Dad', 'Mom'], ['Men', 'Women'], ['councilman', 'councilwoman'], ['grandpa', 'grandma'], ['grandsons', 'granddaughters'], ['prostate_cancer', 'ovarian_cancer'], ['testosterone', 'estrogen'], ['uncle', 'aunt'], ['wives', 'husbands'], ['Father', 'Mother'], ['Grandpa', 'Grandma'], ['He', 'She'], ['boy', 'girl'], ['boys', 'girls'], ['brother', 'sister'], ['brothers', 'sisters'], ['businessman', 'businesswoman'], ['chairman', 'chairwoman'], ['colt', 'filly'], ['congressman', 'congresswoman'], ['dad', 'mom'], ['dads', 'moms'], ['dudes', 'gals'], ['ex_girlfriend', 'ex_boyfriend'], ['father', 'mother'], ['fatherhood', 'motherhood'], ['fathers', 'mothers'], ['fella', 'granny'], ['fraterni
```

Instantiate the debias method and fit its parameters.
In the fit stage, parameters such as the bias direction are computed, and the
embeddings are prepared for the equalization stage.

```python
hd = HardDebias(verbose=False, criterion_name="gender")

hd.fit(
    model, definitional_pairs=definitional_pairs, equalize_pairs=equalize_pairs,
)
```

```
<wefe.debias.hard_debias.HardDebias at 0x7f3b810ad6d0>
```

### Mitigation Parameters

The parameters of the transform method are largely the same for all
methods. The most important ones are ``target``, ``ignore`` and
``copy``.

In the following example we use ``ignore`` and ``copy``, which are
described below:

-  ``ignore`` (by default, ``None``):

    A list of words on which the debias method will not be performed; the
    debias is applied to all other words. If ``ignore`` is not specified or
    is ``None``, the transformation is performed on all embeddings. This may
    cause words that are specific to social groups to lose that component
    (for example, leaving ``'she'`` and ``'he'`` without a gender component).

-  ``copy`` (by default ``True``):

    If ``copy`` is ``True``, the method attempts to create a copy of the
    model and runs the debias on the copy. If ``False``, the method is
    applied to the original model, mutating its vectors in place.

    **WARNING:** Setting ``copy=True`` requires at least twice the model's
    size in RAM. Otherwise, the debias may raise ``MemoryError``.
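
The memory trade-off of ``copy`` can be illustrated with a NumPy matrix standing in for the model's vectors. The function below is a hypothetical sketch (it is not part of WEFE and does not reproduce its internals): it neutralizes only the selected rows, duplicating the matrix first when ``copy=True`` and writing into the caller's array when ``copy=False``.

```python
import numpy as np


def neutralize_rows(vectors: np.ndarray, rows: list[int], direction: np.ndarray,
                    copy: bool = True) -> np.ndarray:
    """Hypothetical stand-in for a debias transform (not part of WEFE).

    Removes the `direction` component from the selected rows. With copy=True the
    full matrix is duplicated first, so peak memory is roughly twice the matrix
    size; with copy=False the caller's array is modified in place.
    """
    out = vectors.copy() if copy else vectors
    v = direction / np.linalg.norm(direction)
    for i in rows:
        out[i] = out[i] - np.dot(out[i], v) * v
    return out


embeddings = np.array([[1.0, 2.0], [3.0, 4.0]])
gender = np.array([1.0, 0.0])

debiased = neutralize_rows(embeddings, [0], gender, copy=True)
print(embeddings[0], debiased[0])  # original row untouched: [1. 2.] [0. 2.]

neutralize_rows(embeddings, [0], gender, copy=False)
print(embeddings[0])  # original row mutated: [0. 2.]
```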

The following transformation is executed using a copy of the model,
ignoring the words contained in ``gender_specific``.


```python
gender_debiased_model = hd.transform(model, ignore=gender_specific, copy=True)
```

```
Copy argument is True. Transform will attempt to create a copy of the original model. This may fail due to lack of memory.
Model copy created successfully.

100%|██████████| 13013/13013 [00:00<00:00, 118668.18it/s]
```


### Measuring the Decrease of Bias

Using the metrics described in the {ref}`bias measurement` user guide, we
can measure whether the gender bias of the debiased model differs from that
of the original model.

```python
from wefe.datasets import load_weat
from wefe.query import Query
from wefe.metrics import WEAT

weat_wordset = load_weat()
weat = WEAT()
```


Next, we measure the gender bias exposed by query 1 (Male terms and Female terms wrt Career and Family) in both the debiased and the original model.

```python
gender_query_1 = Query(
    [weat_wordset["male_terms"], weat_wordset["female_terms"]],
    [weat_wordset["career"], weat_wordset["family"]],
    ["Male terms", "Female terms"],
    ["Career", "Family"],
)
print(gender_query_1, "\n", "-" * 70, "\n")

biased_results_1 = weat.run_query(gender_query_1, model, normalize=True)
debiased_results_1 = weat.run_query(
    gender_query_1, gender_debiased_model, normalize=True
)

print("Debiased vs Biased (absolute values)")
print(
    round(abs(debiased_results_1["weat"]), 3),
    "<",
    round(abs(biased_results_1["weat"]), 3),
)
```

```
<Query: Male terms and Female terms wrt Career and Family
- Target sets: [['male', 'man', 'boy', 'brother', 'he', 'him', 'his', 'son'], ['female', 'woman', 'girl', 'sister', 'she', 'her', 'hers', 'daughter']]
- Attribute sets:[['executive', 'management', 'professional', 'corporation', 'salary', 'office', 'business', 'career'], ['home', 'parents', 'children', 'family', 'cousins', 'marriage', 'wedding', 'relatives']]>
 ----------------------------------------------------------------------

Debiased vs Biased (absolute values)
0.047 < 0.463
```


The above results show that there was a decrease in the measured gender bias.

Next, we measure the gender bias exposed by query 2 (Male Names and Female Names wrt Pleasant and Unpleasant terms) in both the debiased and the original model.

```python
gender_query_2 = Query(
    [weat_wordset["male_names"], weat_wordset["female_names"]],
    [weat_wordset["pleasant_5"], weat_wordset["unpleasant_5"]],
    ["Male Names", "Female Names"],
    ["Pleasant", "Unpleasant"],
)

print(gender_query_2, "\n", "-" * 70, "\n")

biased_results_2 = weat.run_query(
    gender_query_2, model, normalize=True, preprocessors=[{}, {"lowercase": True}]
)
debiased_results_2 = weat.run_query(
    gender_query_2,
    gender_debiased_model,
    normalize=True,
    preprocessors=[{}, {"lowercase": True}],
)

print("Debiased vs Biased (absolute values)")
print(
    round(abs(debiased_results_2["weat"]), 3),
    "<",
    round(abs(biased_results_2["weat"]), 3),
)
```

```
<Query: Male Names and Female Names wrt Pleasant and Unpleasant
- Target sets: [['John', 'Paul', 'Mike', 'Kevin', 'Steve', 'Greg', 'Jeff', 'Bill'], ['Amy', 'Joan', 'Lisa', 'Sarah', 'Diana', 'Kate', 'Ann', 'Donna']]
- Attribute sets:[['caress', 'freedom', 'health', 'love', 'peace', 'cheer', 'friend', 'heaven', 'loyal', 'pleasure', 'diamond', 'gentle', 'honest', 'lucky', 'rainbow', 'diploma', 'gift', 'honor', 'miracle', 'sunrise', 'family', 'happy', 'laughter', 'paradise', 'vacation'], ['abuse', 'crash', 'filth', 'murder', 'sickness', 'accident', 'death', 'grief', 'poison', 'stink', 'assault', 'disaster', 'hatred', 'pollute', 'tragedy', 'divorce', 'jail', 'poverty', 'ugly', 'cancer', 'kill', 'rotten', 'vomit', 'agony', 'prison']]>
 ----------------------------------------------------------------------

Debiased vs Biased (absolute values)
0.055 < 0.074
```


Again, the above results show that there was a decrease in the measured gender bias.

### Target Parameter

If a set of words is specified in the ``target`` parameter, the debias method is
applied only to the embeddings associated with that set.
If ``target`` is ``None`` (the default), the transformation is applied to all
vocabulary words except those specified in ``ignore``.
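
The word selection described above can be summarized as a set computation. The helper below is hypothetical and not part of WEFE; in particular, also subtracting ``ignore`` when ``target`` is given is an assumption of this sketch, since the text above only specifies ``ignore`` for the ``target=None`` case.

```python
from typing import Iterable, Optional


def words_to_transform(vocab: Iterable[str],
                       target: Optional[Iterable[str]] = None,
                       ignore: Optional[Iterable[str]] = None) -> set[str]:
    """Hypothetical helper (not part of WEFE) mirroring the described semantics.

    target=None selects the whole vocabulary; otherwise only target words that
    exist in the vocabulary are selected. Words in ignore are then removed
    (assumed here to apply in both cases).
    """
    vocab_set = set(vocab)
    selected = vocab_set if target is None else vocab_set & set(target)
    return selected - set(ignore or [])


vocab = ["he", "she", "doctor", "nurse", "career", "family"]
print(sorted(words_to_transform(vocab, ignore=["he", "she"])))
# ['career', 'doctor', 'family', 'nurse']
print(sorted(words_to_transform(vocab, target=["career", "family", "salary"])))
# ['career', 'family']
```

Out-of-vocabulary target words (``'salary'`` above) are simply skipped in this sketch.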

In the following example, the ``target`` parameter is used to execute the transformation
only on the career and family word sets:

```python
targets = [
    "executive",
    "management",
    "professional",
    "corporation",
    "salary",
    "office",
    "business",
    "career",
    "home",
    "parents",
    "children",
    "family",
    "cousins",
    "marriage",
    "wedding",
    "relatives",
]

hd = HardDebias(verbose=False, criterion_name="gender").fit(
    model, definitional_pairs=definitional_pairs, equalize_pairs=equalize_pairs,
)

gender_debiased_model = hd.transform(model, target=targets, copy=True)
```

```
Copy argument is True. Transform will attempt to create a copy of the original model. This may fail due to lack of memory.
Model copy created successfully.

100%|██████████| 16/16 [00:00<00:00, 9428.05it/s]
```


Next, a bias test is run on the mitigated embeddings associated with the
target words.

In this case, the metric value is lower for the query executed on the
mitigated model than for the one executed on the original model.
These results indicate that the bias in the embeddings of these words was mitigated.


```python
gender_query_1 = Query(
    [weat_wordset["male_terms"], weat_wordset["female_terms"]],
    [weat_wordset["career"], weat_wordset["family"]],
    ["Male terms", "Female terms"],
    ["Career", "Family"],
)
print(gender_query_1, "\n", "-" * 70, "\n")

biased_results_1 = weat.run_query(gender_query_1, model, normalize=True)
debiased_results_1 = weat.run_query(
    gender_query_1, gender_debiased_model, normalize=True
)

print("Debiased vs Biased (absolute values)")
print(
    round(abs(debiased_results_1["weat"]), 3),
    "<",
    round(abs(biased_results_1["weat"]), 3),
)
```

```
<Query: Male terms and Female terms wrt Career and Family
- Target sets: [['male', 'man', 'boy', 'brother', 'he', 'him', 'his', 'son'], ['female', 'woman', 'girl', 'sister', 'she', 'her', 'hers', 'daughter']]
- Attribute sets:[['executive', 'management', 'professional', 'corporation', 'salary', 'office', 'business', 'career'], ['home', 'parents', 'children', 'family', 'cousins', 'marriage', 'wedding', 'relatives']]>
 ----------------------------------------------------------------------

Debiased vs Biased (absolute values)
0.047 < 0.463
```


However, if a bias test is run with words outside the ``target``
word set, the results are almost unchanged. The slight difference in the
metric scores arises because the equalize pairs were still
equalized.

:::{warning}

The equalization process can modify embeddings that have not been marked in the target.

Equalization can be deactivated by passing an empty equalize set (``[]``).

:::

```python
gender_query_2 = Query(
    [weat_wordset["male_names"], weat_wordset["female_names"]],
    [weat_wordset["pleasant_5"], weat_wordset["unpleasant_5"]],
    ["Male Names", "Female Names"],
    ["Pleasant", "Unpleasant"],
)

print(gender_query_2, "\n", "-" * 70, "\n")

biased_results_2 = weat.run_query(
    gender_query_2, model, normalize=True, preprocessors=[{}, {"lowercase": True}]
)
debiased_results_2 = weat.run_query(
    gender_query_2,
    gender_debiased_model,
    normalize=True,
    preprocessors=[{}, {"lowercase": True}],
)

print("Debiased vs Biased (absolute values)")
print(
    round(abs(debiased_results_2["weat"]), 3),
    ">",
    round(abs(biased_results_2["weat"]), 3),
)
```

```
<Query: Male Names and Female Names wrt Pleasant and Unpleasant
- Target sets: [['John', 'Paul', 'Mike', 'Kevin', 'Steve', 'Greg', 'Jeff', 'Bill'], ['Amy', 'Joan', 'Lisa', 'Sarah', 'Diana', 'Kate', 'Ann', 'Donna']]
- Attribute sets:[['caress', 'freedom', 'health', 'love', 'peace', 'cheer', 'friend', 'heaven', 'loyal', 'pleasure', 'diamond', 'gentle', 'honest', 'lucky', 'rainbow', 'diploma', 'gift', 'honor', 'miracle', 'sunrise', 'family', 'happy', 'laughter', 'paradise', 'vacation'], ['abuse', 'crash', 'filth', 'murder', 'sickness', 'accident', 'death', 'grief', 'poison', 'stink', 'assault', 'disaster', 'hatred', 'pollute', 'tragedy', 'divorce', 'jail', 'poverty', 'ugly', 'cancer', 'kill', 'rotten', 'vomit', 'agony', 'prison']]>
 ----------------------------------------------------------------------

Debiased vs Biased (absolute values)
0.08 > 0.074
```


Note that the equalization caused the measured bias of the debiased model to be slightly larger than that of the original model.


### Saving the Debiased Model

To save the mitigated model, access its ``KeyedVectors`` (the
gensim object that contains the embeddings) through ``wv`` and then use
the ``save`` method to store the model in a file.



```python
gender_debiased_model.wv.save("gender_debiased_word2vec.kv")
```



## Multiclass Hard Debias

Multiclass Hard Debias is a generalized version of Hard Debias that
enables multiclass debiasing. "Generalized" means that this method
extends Hard Debias to support more than two social groups within each
definitional set.

For example, for religion bias, it supports debiasing with
words associated with Christianity, Islam and Judaism.

Its usage is very similar to that of Hard Debias, except that each of the
``definitional_sets`` may contain more than two words.
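
For intuition only, the sketch below shows one common way of generalizing the single bias direction to a bias subspace. It is not WEFE's implementation; the function names, the toy data and the choice of ``k`` are assumptions. Each definitional set is centered on its own mean, the top-``k`` principal directions form the subspace, and neutralization removes a vector's projection onto that subspace. With pairs and ``k=1``, this reduces to the single-direction neutralization used by Hard Debias.

```python
import numpy as np


def bias_subspace(definitional_sets: list[np.ndarray], k: int = 2) -> np.ndarray:
    """Estimate a k-dimensional bias subspace from multiclass definitional sets.

    Each element is an (n_words, dim) array holding the embeddings of one set,
    e.g. the vectors of ['black', 'caucasian', 'asian']. Every set is centered
    on its own mean, and the top-k principal directions of the stacked rows are
    returned as a (k, dim) matrix with orthonormal rows.
    """
    centered = np.vstack([s - s.mean(axis=0) for s in definitional_sets])
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]


def neutralize_subspace(u: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove from u its projection onto the span of the (orthonormal) rows of basis."""
    return u - basis.T @ (basis @ u)


rng = np.random.default_rng(0)
sets = [rng.normal(size=(3, 6)) for _ in range(4)]  # 4 toy sets of 3 words each
basis = bias_subspace(sets, k=2)
u_neutral = neutralize_subspace(rng.normal(size=6), basis)
print(np.allclose(basis @ u_neutral, 0.0))  # True: no component left in the subspace
```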


```python
from wefe.datasets import fetch_debias_multiclass
from wefe.debias.multiclass_hard_debias import MulticlassHardDebias

multiclass_debias_wordsets = fetch_debias_multiclass()
weat_wordsets = load_weat()
weat = WEAT()

ethnicity_definitional_sets = multiclass_debias_wordsets["ethnicity_definitional_sets"]
ethnicity_equalize_sets = list(
    multiclass_debias_wordsets["ethnicity_analogy_templates"].values()
)

print(f"ethnicity_definitional_sets: \n{ethnicity_definitional_sets}")
print(f"ethnicity_equalize_sets: \n{ethnicity_equalize_sets}")
print("-" * 70, "\n")

mhd = MulticlassHardDebias(verbose=False, criterion_name="ethnicity")
mhd.fit(
    model=model,
    definitional_sets=ethnicity_definitional_sets,
    equalize_sets=ethnicity_equalize_sets,
)

ethnicity_debiased_model = mhd.transform(model, copy=True)
```

```
ethnicity_definitional_sets:
[['black', 'caucasian', 'asian'], ['african', 'caucasian', 'asian'], ['black', 'white', 'asian'], ['africa', 'america', 'asia'], ['africa', 'america', 'china'], ['africa', 'europe', 'asia']]
ethnicity_equalize_sets:
[['manager', 'executive', 'redneck', 'hillbilly', 'leader', 'farmer'], ['doctor', 'engineer', 'laborer', 'teacher'], ['slave', 'musician', 'runner', 'criminal', 'homeless']]
----------------------------------------------------------------------

copy argument is True. Transform will attempt to create a copy of the original model. This may fail due to lack of memory.
Model copy created successfully.

100%|██████████| 13003/13003 [00:00<00:00, 18357.20it/s]
```


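### Measuring the Decrease of Bias

As in the gender example, WEAT is used to compare the ethnic bias measured in the original and the debiased model.
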
```python
ethnicity_query = Query(
    [
        multiclass_debias_wordsets["white_terms"],
        multiclass_debias_wordsets["black_terms"],
    ],
    [
        multiclass_debias_wordsets["white_biased_words"],
        multiclass_debias_wordsets["black_biased_words"],
    ],
    ["white_terms", "black_terms"],
    ["white_biased_words", "black_biased_words"],
)

print(ethnicity_query, "\n", "-" * 70, "\n")

biased_results = weat.run_query(
    ethnicity_query, model, normalize=True, preprocessors=[{}, {"lowercase": True}],
)
debiased_results = weat.run_query(
    ethnicity_query,
    ethnicity_debiased_model,
    normalize=True,
    preprocessors=[{}, {"lowercase": True}],
)

# Compare the results of this ethnicity query (not the earlier gender query results).
print("Debiased vs Biased (absolute values)")
print(
    round(abs(debiased_results["weat"]), 3),
    "vs",
    round(abs(biased_results["weat"]), 3),
)
```

```
<Query: white_terms and black_terms wrt white_biased_words and black_biased_words
- Target sets: [['america', 'caucasian', 'europe', 'white'], ['africa', 'african', 'black']]
- Attribute sets:[['manager', 'executive', 'redneck', 'hillbilly', 'leader', 'farmer'], ['slave', 'musician', 'runner', 'criminal', 'homeless']]>
 ----------------------------------------------------------------------
```
