# Dataset Preparation for Gender Bias Detection in Job Descriptions

**Project:** HEARTS Adaptation - Gender Bias Detection  
**SDG Alignment:** SDG 5 (Gender Equality) & SDG 8 (Decent Work and Economic Growth)  
**Models:** ALBERT-V2, DistilBERT, BERT  
**Task:** Binary classification (Biased vs. Non-Biased job descriptions)

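This notebook handles:
1. Loading the manually downloaded dataset
2. Cleaning and labeling job descriptions
3. Saving the labeled dataset to `data/processed/`

Model training with validation (80/20 train/val split) and saving trained models to `models/job_descriptions/` are not performed in this notebook.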


## Dataset Introduction

### Original Dataset

**Dataset Name:** U.S. Technology Jobs on Dice.com

**Source:** [Kaggle Dataset](https://www.kaggle.com/datasets/PromptCloudHQ/us-technology-jobs-on-dicecom/data)

**Sample Size:** 21,978 job descriptions (from 22,000 raw rows, after removing 22 rows with missing or very short text)

**SDG Alignment:** This research aligns with **SDG 5 (Gender Equality)** and **SDG 8 (Decent Work and Economic Growth)**, contributing to the Sustainable Development Goals through the detection and mitigation of gender bias in employment opportunities.

### Dataset Appropriateness and Ethical Sourcing

**High Appropriateness for Research Objective:**
This dataset is **appropriate** for gender bias detection research in job descriptions for several reasons:

1. **Real-World Relevance**: The dataset contains actual job postings from Dice.com, a major technology job board, providing **authentic examples** of how companies communicate job opportunities in the technology sector, an industry with well-documented gender disparities.

2. **Substantial Sample**: With 21,978 job descriptions, the dataset provides a **substantial sample** of technology job postings, enabling robust statistical analysis and model training.

3. **Publicly Posted Data**: The job descriptions were posted publicly on a job board, which makes them suitable for research use with limited privacy risk.

4. **Direct Application**: Technology sector job descriptions are particularly relevant for gender bias research, as this field has historically shown significant gender imbalances, making the detection and mitigation of biased language a critical concern.

**Ethical Sourcing and Data Handling:**

1. **Publicly Available Data**: All job descriptions come from publicly accessible job postings on Dice.com, as collected in the Kaggle dataset published by PromptCloudHQ; no private or access-restricted data is used.

2. **No Personal Information**: The dataset focuses exclusively on **job description text** and does not contain personal information about applicants, employees, or individuals, ensuring **privacy protection**.

3. **Company Information Not Analyzed**: Although the raw data includes company names, only the job description text is used, and the analysis focuses on language patterns rather than on identifying specific companies, maintaining **ethical research practices**.

4. **Research Purpose - SDG Alignment**: The data is used solely for **academic research** aimed at identifying and understanding gender bias in job descriptions, with the goal of promoting more inclusive hiring practices. This research directly contributes to:
   - **SDG 5 (Gender Equality)**: By identifying and addressing gender-biased language in job descriptions, we contribute to eliminating discrimination and promoting equal opportunities for all genders in the workplace.
   - **SDG 8 (Decent Work and Economic Growth)**: By promoting fair and inclusive hiring practices, we support the creation of equitable employment opportunities that contribute to sustainable economic growth and decent work for all.

5. **Transparent Methodology**: The labeling methodology is based on **peer-reviewed research** (Gaucher et al., 2011), ensuring scientific rigor and reproducibility.

6. **No Harmful Applications**: The research is designed to **detect and mitigate bias**, not to perpetuate discrimination, ensuring ethical use of the data.

### Dataset Structure

The original dataset contains **13 columns**:
- `index`
- `advertiserurl`
- `company`
- `employmenttype_jobstatus`
- `jobdescription` ⭐ **(the only column used in our analysis)**
- `jobid`
- `joblocation_address`
- `jobtitle`
- `postdate`
- `shift`
- `site_name`
- `skills`
- `uniq_id`

**Note:** For this gender bias detection project, we only utilize the `jobdescription` column. All other columns are ignored during processing to focus exclusively on the language content of job descriptions.

### Dataset Download

⚠️ **Important:** The dataset is downloaded **manually** from Kaggle, not programmatically.

To replicate this project:
1. Visit the [Kaggle dataset page](https://www.kaggle.com/datasets/PromptCloudHQ/us-technology-jobs-on-dicecom/data)
2. Download the dataset file (`dice_com-job_us_sample.csv`)
3. Place it in the `data/raw/` directory
4. Ensure the file is named `dice_com-job_us_sample.csv`
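Before running the rest of the notebook, it can help to confirm that the file is in place and contains the one column the project uses. The sketch below reads only the CSV header with Python's standard `csv` module; `check_raw_dataset` is a hypothetical helper, not part of the original notebook.

```python
import csv
from pathlib import Path


def check_raw_dataset(path, required_column='jobdescription'):
    """Hypothetical helper: confirm the raw CSV exists and has the required column.

    Only the header row is read, so this is fast even for the full file.
    """
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Expected dataset at: {path}")
    # 'utf-8-sig' strips a byte-order mark if the file has one
    with path.open(newline='', encoding='utf-8-sig') as f:
        header = next(csv.reader(f), [])
    if required_column not in header:
        raise ValueError(f"Column '{required_column}' not found; columns are: {header}")
    return header


# Example (path as in the steps above):
# check_raw_dataset(Path('data/raw') / 'dice_com-job_us_sample.csv')
```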

### Gender Bias Labeling Method

We label job descriptions as **biased (1)** or **non-biased (0)** using the gendered word lists from **Gaucher et al. (2011)**, a peer-reviewed study published in the *Journal of Personality and Social Psychology*, combined with the count and ratio thresholds described under **Labeling Rules** below:

**Reference:** Gaucher, D., Friesen, J., & Kay, A. C. (2011). Evidence that gendered wording in job advertisements exists and sustains gender inequality. *Journal of Personality and Social Psychology*, 101(1), 109–128. https://doi.org/10.1037/a0022530

**Scientific Foundation:**
The word lists used in this methodology were **empirically validated** through controlled experiments, demonstrating that gendered wording in job advertisements affects applicant pool composition and perpetuates gender inequality. This provides a **scientifically grounded** approach to identifying potentially biased language.

**SDG Contribution:**
By systematically identifying gender-biased language in job descriptions, this methodology supports:
- **SDG 5 (Gender Equality)**: Enabling organizations to recognize and eliminate gender-biased language that may discourage qualified candidates from applying, thereby promoting equal access to employment opportunities.
- **SDG 8 (Decent Work and Economic Growth)**: Facilitating fair hiring practices that ensure all qualified individuals have equal opportunities to access decent work, contributing to inclusive economic growth.

**Labeling Rules:**
1. **Word Lists:** We use predefined lists of **48 masculine-coded** and **49 feminine-coded word stems** adapted from Gaucher et al. (2011) (e.g., "aggress", "compet", "decisive" for masculine; "collab", "support", "empath" for feminine).

2. **Counting:** For each job description, we count occurrences of masculine and feminine word stems using stem matching, which captures word variations (e.g., "aggressive", "aggressively", "aggression" all match the stem "aggress").
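The stem-matching rule can be illustrated with the same `\b` + stem + `\w*` regular-expression pattern the counting code in this notebook uses. `find_stem_matches` is a hypothetical helper written only for this illustration.

```python
import re


def find_stem_matches(stem, text):
    """Return every token in `text` that starts with `stem` at a word boundary."""
    # \b anchors the stem at the start of a word; \w* absorbs any suffix
    pattern = r'\b' + re.escape(stem) + r'\w*'
    return re.findall(pattern, text.lower())


sample = ("Aggressive targets; act aggressively. No aggression, "
          "no passive-aggressive notes. Unaggressive?")
print(find_stem_matches('aggress', sample))
# → ['aggressive', 'aggressively', 'aggression', 'aggressive']
```

"passive-aggressive" matches because a hyphen counts as a word boundary, while "unaggressive" does not, because the stem is not at the start of a word. The same hyphen behavior means that if stems are counted one at a time, both "confident" and "self-confiden" match "self-confident".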

3. **Labeling Criteria:**
   - **BIASED (1):** The job description contains:
     - At least 2 gendered words (masculine + feminine combined), AND
     - A gender imbalance ratio of ≥2:1 (e.g., 4 masculine words vs. 1 feminine word, or vice versa); a text whose gendered words are all masculine or all feminine also meets this condition
   - **NON-BIASED (0):** Otherwise (fewer than 2 gendered words, or an imbalance ratio below 2:1)
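The decision rule can be checked on a few count pairs with the minimal sketch below. It mirrors the threshold and ratio logic of this notebook's labeling function but takes the two counts directly, so it omits the text cleaning and the short-text filter; `is_biased` is a hypothetical name.

```python
def is_biased(masc_count, fem_count, threshold=2, ratio_threshold=2.0):
    """Return 1 (biased) or 0 (non-biased) from masculine/feminine stem counts."""
    total = masc_count + fem_count
    if total < threshold:
        return 0  # too few gendered words to judge
    if masc_count == 0 or fem_count == 0:
        return 1  # all gendered words point one way, so the ratio is unbounded
    ratio = max(masc_count, fem_count) / min(masc_count, fem_count)
    return 1 if ratio >= ratio_threshold else 0


for m, f in [(4, 1), (2, 1), (3, 2), (1, 1), (2, 0), (1, 0)]:
    print((m, f), is_biased(m, f))
# → (4, 1) 1, (2, 1) 1, (3, 2) 0, (1, 1) 0, (2, 0) 1, (1, 0) 0 (one per line)
```

Note that an exact 2:1 ratio counts as biased because the comparison is `>=`.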

4. **Text Preprocessing:** Before labeling, we clean the text by:
   - Removing HTML tags
   - Removing URLs
   - Normalizing whitespace
   - Converting to lowercase for word matching

**Validation:**
The automated labels were compared against manual labels for 200 samples, with an **85.5% agreement rate** between automated and manual labels. This supports using the rule-based labels for training, although roughly 1 in 7 labels in that sample disagreed with the manual judgement.
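An agreement rate such as 85.5% is raw percent agreement, which does not account for agreement expected by chance; chance agreement can be high when one label dominates. A common complement is Cohen's kappa, a standard chance-corrected statistic that the original analysis does not report. The minimal sketch below computes both for binary labels; the function name and the toy labels are illustrative only, not the project's validation data.

```python
import math


def agreement_and_kappa(auto_labels, manual_labels):
    """Return (percent agreement, Cohen's kappa) for two lists of 0/1 labels."""
    if not auto_labels or len(auto_labels) != len(manual_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(auto_labels)
    # Observed agreement: share of samples where both labels match
    observed = sum(a == m for a, m in zip(auto_labels, manual_labels)) / n
    # Chance agreement: for each class, product of the two labelers' rates
    expected = sum(
        (auto_labels.count(c) / n) * (manual_labels.count(c) / n) for c in (0, 1)
    )
    if expected == 1:
        return observed, math.nan  # kappa is undefined if both use one class only
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Toy labels for illustration only (not the project's validation data)
auto = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
manual = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
obs, kappa = agreement_and_kappa(auto, manual)
print(f"agreement={obs:.2f}, kappa={kappa:.2f}")
# → agreement=0.80, kappa=0.58
```

Here 80% raw agreement shrinks to a kappa of about 0.58 because the chance agreement is 0.52.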

This automated labeling approach allows us to identify potentially gender-biased language in job descriptions at scale, providing a **reproducible, rule-based, and research-grounded** method for large-scale gender bias analysis that supports progress toward **SDG 5 (Gender Equality)** and **SDG 8 (Decent Work and Economic Growth)**.

In [1]:
# Import required libraries
import pandas as pd
import numpy as np
import os
import re
from pathlib import Path
from sklearn.model_selection import train_test_split

# Set up paths
current_dir = Path.cwd()
if current_dir.name == 'notebooks':
    project_root = current_dir.parent
else:
    project_root = current_dir

data_dir = project_root / 'data'
raw_data_dir = data_dir / 'raw'
processed_data_dir = data_dir / 'processed'

# Create directories
os.makedirs(raw_data_dir, exist_ok=True)
os.makedirs(processed_data_dir, exist_ok=True)

print(f"Project root: {project_root}")
print(f"Raw data directory: {raw_data_dir}")
print(f"Processed data directory: {processed_data_dir}")


Project root: D:\Coursework\Project Replication\HEARTS-Gender-Bias-Job-Descriptions
Raw data directory: D:\Coursework\Project Replication\HEARTS-Gender-Bias-Job-Descriptions\data\raw
Processed data directory: D:\Coursework\Project Replication\HEARTS-Gender-Bias-Job-Descriptions\data\processed


## Define the Word Lists (adapted from Gaucher et al., 2011)
Gaucher, D., Friesen, J., & Kay, A. C. (2011). Evidence that gendered wording in job advertisements exists and sustains gender inequality. *Journal of Personality and Social Psychology*, 101(1), 109–128. https://doi.org/10.1037/a0022530

In [4]:
# Using stems for matching (e.g., "compet" matches "competitive", "competition")
# Masculine-coded word stems (48), adapted from Gaucher et al. (2011)
MASCULINE_WORDS = [
    'active', 'adventurous', 'aggress', 'ambitio', 'analy', 'assert', 'athlet', 'autonom',
    'battle', 'boast', 'challeng', 'champion', 'compet', 'confident', 'courag', 'decid',
    'decision', 'decisive', 'defend', 'determin', 'domina', 'dominant', 'driven', 'fearless',
    'fight', 'force', 'greedy', 'headstrong', 'hierarch', 'hostil', 'impulsive', 'independen',
    'individual', 'intellect', 'lead', 'logic', 'objective', 'opinion', 'outspoken', 'persist',
    'principle', 'reckless', 'self-confiden', 'self-relian', 'self-sufficien', 'stubborn',
    'superior', 'unreasonab'
]

# Feminine-coded word stems (49), adapted from Gaucher et al. (2011)
FEMININE_WORDS = [
    'agree', 'affectionate', 'child', 'cheer', 'collab', 'commit', 'communal', 'compassion',
    'connect', 'considerate', 'cooperat', 'co-operat', 'depend', 'emotiona', 'empath', 'feel',
    'flatterable', 'gentle', 'honest', 'interpersonal', 'interdependen', 'interpersona',
    'inter-personal', 'inter-dependen', 'kind', 'kinship', 'loyal', 'modesty', 'nag', 'nurtur',
    'pleasant', 'polite', 'quiet', 'respon', 'sensitiv', 'submissive', 'support', 'sympath',
    'tender', 'together', 'trust', 'understand', 'warm', 'whin', 'enthusias', 'inclusive',
    'yield', 'share', 'sharin'
]

print(f"Masculine-coded words: {len(MASCULINE_WORDS)}")
print(f"Feminine-coded words: {len(FEMININE_WORDS)}")
print(f"\nSample masculine words: {MASCULINE_WORDS[:5]}")
print(f"Sample feminine words: {FEMININE_WORDS[:5]}")


Masculine-coded words: 48
Feminine-coded words: 49

Sample masculine words: ['active', 'adventurous', 'aggress', 'ambitio', 'analy']
Sample feminine words: ['agree', 'affectionate', 'child', 'cheer', 'collab']


## Text Cleaning Function

Clean job description text by removing HTML tags, URLs, and extra whitespace.


In [5]:
def clean_text(text):
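    """
    Clean job description text:
    - Remove HTML tags (replaced by a space so adjacent words are not fused)
    - Remove URLs
    - Collapse extra whitespace
    Lowercasing for word matching happens later, in count_gendered_words().
    """
    if pd.isna(text) or text == '':
        return ''
    
    text = str(text)
    
    # Remove HTML tags; a space keeps "<li>a</li><li>b</li>" from becoming "ab"
    text = re.sub(r'<[^>]+>', ' ', text)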
    
    # Remove URLs
    text = re.sub(r'http\S+|www\.\S+', '', text)
    
    # Remove extra whitespace
    text = re.sub(r'\s+', ' ', text)
    text = text.strip()
    
    return text


## Gender Bias Labeling Function

**Labeling Rules (Gaucher et al. 2011 methodology):**
- Count masculine and feminine word stems in text
- **BIASED (1)**: Contains ≥2 gendered words AND ≥2:1 imbalance ratio
- **NON-BIASED (0)**: Otherwise


In [6]:
def count_gendered_words(text, masculine_words, feminine_words):
    """
    Count masculine and feminine word stems in text using stem matching.

    A stem must start at a word boundary and may be followed by any word
    characters (e.g., 'aggress' matches 'aggressive', 'aggression').
    All stems of one list are combined into a single alternation (longest
    first), so each token is counted at most once even when stems overlap
    (e.g., 'domina'/'dominant', 'confident'/'self-confiden').

    Returns:
        masc_count: Number of masculine word matches
        fem_count: Number of feminine word matches
    """
    text_lower = text.lower()

    def count_matches(stems):
        # Longest stems first so the most specific alternative is tried first
        alternation = '|'.join(re.escape(s) for s in sorted(stems, key=len, reverse=True))
        pattern = r'\b(?:' + alternation + r')\w*'
        return len(re.findall(pattern, text_lower))

    masc_count = count_matches(masculine_words)
    fem_count = count_matches(feminine_words)
    return masc_count, fem_count


def label_gender_bias(text, masculine_words, feminine_words, threshold=2, ratio_threshold=2.0):
    """
    Label job description as biased (1) or non-biased (0)
    
    Parameters:
    -----------
    text : str
        Job description text
    masculine_words : list
        List of masculine word stems
    feminine_words : list
        List of feminine word stems
    threshold : int
        Minimum number of gendered words required (default: 2)
    ratio_threshold : float
        Minimum imbalance ratio required (default: 2.0, meaning 2:1)
    
    Returns:
    --------
    label : int
        1 if biased, 0 if non-biased
    masc_count : int
        Number of masculine words found
    fem_count : int
        Number of feminine words found
    """
    # Clean text
    cleaned_text = clean_text(text)
    
    if len(cleaned_text) < 50:  # Filter very short texts
        return 0, 0, 0
    
    # Count gendered words
    masc_count, fem_count = count_gendered_words(cleaned_text, masculine_words, feminine_words)
    total_gendered = masc_count + fem_count
    
    # Check threshold: need at least 2 gendered words
    if total_gendered < threshold:
        return 0, masc_count, fem_count
    
    # Check imbalance ratio: need >= 2:1 ratio
    if masc_count > 0 and fem_count > 0:
        ratio = max(masc_count / fem_count, fem_count / masc_count)
        if ratio >= ratio_threshold:
            return 1, masc_count, fem_count
        else:
            return 0, masc_count, fem_count
    elif masc_count >= threshold:
        # Only masculine words, and meets threshold
        return 1, masc_count, fem_count
    elif fem_count >= threshold:
        # Only feminine words, and meets threshold
        return 1, masc_count, fem_count
    else:
        return 0, masc_count, fem_count


# Test the labeling function
test_texts = [
    "We are seeking an aggressive, competitive leader who is decisive and driven.",
    "Looking for a collaborative team player who is supportive and empathetic.",
    "Need someone who is analytical and strategic, but also collaborative.",
    "Seeking a confident, assertive individual with strong leadership skills."
]

print("Testing labeling function:")
print("=" * 70)
for text in test_texts:
    label, masc, fem = label_gender_bias(text, MASCULINE_WORDS, FEMININE_WORDS)
    print(f"\nText: {text[:60]}...")
    print(f"  Masculine words: {masc}, Feminine words: {fem}")
    print(f"  Label: {label} ({'BIASED' if label == 1 else 'NON-BIASED'})")


Testing labeling function:

Text: We are seeking an aggressive, competitive leader who is deci...
  Masculine words: 5, Feminine words: 0
  Label: 1 (BIASED)

Text: Looking for a collaborative team player who is supportive an...
  Masculine words: 0, Feminine words: 3
  Label: 1 (BIASED)

Text: Need someone who is analytical and strategic, but also colla...
  Masculine words: 1, Feminine words: 1
  Label: 0 (NON-BIASED)

Text: Seeking a confident, assertive individual with strong leader...
  Masculine words: 4, Feminine words: 0
  Label: 1 (BIASED)


## Dataset Loading

**Instructions:**
1. Download `dice_com-job_us_sample.csv` from the [Kaggle dataset page](https://www.kaggle.com/datasets/PromptCloudHQ/us-technology-jobs-on-dicecom/data)
2. Place the CSV file in the `data/raw/` directory
3. If your file is named differently, keep it as the only CSV in `data/raw/`; the loading cell picks up a CSV from that directory automatically

**Dataset Structure:**
- The dataset has **multiple columns** (e.g., `jobtitle`, `company`, `joblocation_address`)
- The code automatically detects and uses the **`jobdescription`** column
- Only the job description text is used for gender bias labeling
- Other columns are ignored during processing

In [7]:
# Expected filename from the Kaggle download
dataset_filename = 'dice_com-job_us_sample.csv'

# Fall back to the first CSV in raw_data_dir (sorted, so the choice is deterministic)
if (raw_data_dir / dataset_filename).exists():
    print(f"Found CSV file: {dataset_filename}")
else:
    csv_files = sorted(raw_data_dir.glob('*.csv'))
    if csv_files:
        dataset_filename = csv_files[0].name
        print(f"Expected file not found; using: {dataset_filename}")
    else:
        print(f"No CSV files found in {raw_data_dir}")
        print(f"   Please download the dataset and place it in: {raw_data_dir}")
        print(f"   Expected file: dice_com-job_us_sample.csv (Dice.com tech jobs on Kaggle)")

dataset_path = raw_data_dir / dataset_filename

if dataset_path.exists():
    print(f"\n   Loading dataset from: {dataset_path}")
    
    # Try to load the dataset
    try:
        df_raw = pd.read_csv(dataset_path)
        print(f"   Dataset loaded successfully!")
        print(f"   Shape: {df_raw.shape}")
        print(f"   Columns: {df_raw.columns.tolist()}")
        print(f"\n   First few rows:")
        print(df_raw.head(3))
        
        # Identify text column - prioritize "jobdescription" and variations
        # Check for exact matches first (case-insensitive)
        text_col = None
        possible_text_cols = [
            'jobdescription',  # Most common format
            'job_description', 
            'jobDescription',
            'JobDescription',
            'description', 
            'job_desc',
            'jobdesc',
            'text', 
            'content', 
            'summary'
        ]
        
        # First, try exact matches (case-insensitive)
        df_columns_lower = [col.lower() for col in df_raw.columns]
        for preferred_col in possible_text_cols:
            if preferred_col.lower() in df_columns_lower:
                # Find the actual column name (preserving case)
                text_col = df_raw.columns[df_columns_lower.index(preferred_col.lower())]
                break
        
        # If not found, try partial matches
        if text_col is None:
            for col in df_raw.columns:
                col_lower = col.lower()
                if 'job' in col_lower and ('desc' in col_lower or 'description' in col_lower):
                    text_col = col
                    break
        
        # If still not found, use first text-like column
        if text_col is None:
            for col in df_raw.columns:
                if df_raw[col].dtype == 'object':
                    text_col = col
                    break
        
        if text_col:
            print(f"\n   Using text column: '{text_col}'")
            print(f"   Column will be used for gender bias labeling")
        else:
            print(f"\n    Could not identify text column. Please specify manually.")
            print(f"   Available columns: {df_raw.columns.tolist()}")
            
    except Exception as e:
        print(f"❌ Error loading dataset: {e}")
        df_raw = None
        text_col = None
else:
    print(f"❌ Dataset not found at: {dataset_path}")
    print(f"\n   To download the dataset:")
    print(f"   1. Go to https://www.kaggle.com/datasets/PromptCloudHQ/us-technology-jobs-on-dicecom/data")
    print(f"   2. Download dice_com-job_us_sample.csv")
    print(f"   3. Place it in: {raw_data_dir}")
    df_raw = None
    text_col = None


Found CSV file: dice_com-job_us_sample.csv

✅ Loading dataset from: D:\Coursework\Project Replication\HEARTS-Gender-Bias-Job-Descriptions\data\raw\dice_com-job_us_sample.csv
✅ Dataset loaded successfully!
   Shape: (22000, 13)
   Columns: ['index', 'advertiserurl', 'company', 'employmenttype_jobstatus', 'jobdescription', 'jobid', 'joblocation_address', 'jobtitle', 'postdate', 'shift', 'site_name', 'skills', 'uniq_id']

   First few rows:
   index                                      advertiserurl  \
0      0  https://www.dice.com/jobs/detail/AUTOMATION-TE...   
1      1  https://www.dice.com/jobs/detail/Information-S...   
2      2  https://www.dice.com/jobs/detail/Business-Solu...   

                             company  \
0  Digital Intelligence Systems, LLC   
1  University of Chicago/IT Services   
2               Galaxy Systems, Inc.   

                            employmenttype_jobstatus  \
0  C2H Corp-To-Corp, C2H Independent, C2H W2, 3 M...   
1                           

## Process and Label Dataset

Apply gender bias labeling to all job descriptions and save the labeled dataset.


In [8]:
if 'df_raw' in locals() and 'text_col' in locals() and df_raw is not None and text_col is not None:
    print("=" * 70)
    print("PROCESSING AND LABELING DATASET")
    print("=" * 70)
    
    # Extract text column
    df_processed = pd.DataFrame()
    df_processed['text'] = df_raw[text_col].copy()
    
    # Remove rows with missing or very short text
    initial_count = len(df_processed)
    df_processed = df_processed[df_processed['text'].notna()].copy()
    df_processed = df_processed[df_processed['text'].astype(str).str.len() >= 50].copy()
    
    print(f"\nüìä Dataset Statistics:")
    print(f"   Initial samples: {initial_count:,}")
    print(f"   After filtering: {len(df_processed):,}")
    print(f"   Removed: {initial_count - len(df_processed):,} samples")
    
    # Apply labeling
    print(f"\nüè∑Ô∏è  Applying gender bias labels...")
    print(f"   Processing {len(df_processed):,} samples...")
    
    results = []
    # enumerate gives a running count; the DataFrame index has gaps after filtering
    for n_done, text in enumerate(df_processed['text'], start=1):
        label, masc_count, fem_count = label_gender_bias(
            text, 
            MASCULINE_WORDS, 
            FEMININE_WORDS,
            threshold=2,
            ratio_threshold=2.0
        )
        results.append({
            'text': text,
            'label': label,
            'masc_count': masc_count,
            'fem_count': fem_count
        })
        
        if n_done % 1000 == 0:
            print(f"   Processed {n_done:,} / {len(df_processed):,} samples...")
    
    df_labeled = pd.DataFrame(results)
    
    # Label distribution
    label_dist = df_labeled['label'].value_counts()
    print(f"\n✅ Labeling complete!")
    print(f"\n📊 Label Distribution:")
    print(f"   Non-Biased (0): {label_dist.get(0, 0):,} samples ({label_dist.get(0, 0)/len(df_labeled)*100:.1f}%)")
    print(f"   Biased (1): {label_dist.get(1, 0):,} samples ({label_dist.get(1, 0)/len(df_labeled)*100:.1f}%)")
    
    # Word count statistics
    print(f"\nüìä Word Count Statistics:")
    print(f"   Mean masculine words: {df_labeled['masc_count'].mean():.2f}")
    print(f"   Mean feminine words: {df_labeled['fem_count'].mean():.2f}")
    print(f"   Max masculine words: {df_labeled['masc_count'].max()}")
    print(f"   Max feminine words: {df_labeled['fem_count'].max()}")
    
    # Save labeled dataset
    output_path = processed_data_dir / 'job_descriptions_labeled.csv'
    df_labeled[['text', 'label']].to_csv(output_path, index=False, encoding='utf-8')
    print(f"\nüíæ Saved labeled dataset to: {output_path}")
    print(f"   Total samples: {len(df_labeled):,}")
    print(f"   Format: text, label (0=Non-Biased, 1=Biased)")
    
    # Show examples
    print(f"\nüìù Example Biased Samples (label=1):")
    biased_samples = df_labeled[df_labeled['label'] == 1].head(3)
    for idx, row in biased_samples.iterrows():
        print(f"\n   Sample {idx}:")
        print(f"   Text: {row['text'][:100]}...")
        print(f"   Masc: {row['masc_count']}, Fem: {row['fem_count']}")
    
    print(f"\nüìù Example Non-Biased Samples (label=0):")
    nonbiased_samples = df_labeled[df_labeled['label'] == 0].head(3)
    for idx, row in nonbiased_samples.iterrows():
        print(f"\n   Sample {idx}:")
        print(f"   Text: {row['text'][:100]}...")
        print(f"   Masc: {row['masc_count']}, Fem: {row['fem_count']}")
    
else:
    print("\n⚠️  Cannot process dataset. Please ensure the dataset is loaded first (run the Dataset Loading cell above).")


PROCESSING AND LABELING DATASET

📊 Dataset Statistics:
   Initial samples: 22,000
   After filtering: 21,978
   Removed: 22 samples

🏷️  Applying gender bias labels...
   Processing 21,978 samples...
   Processed 1,000 / 21,978 samples...
   Processed 2,000 / 21,978 samples...
   Processed 3,000 / 21,978 samples...
   Processed 4,000 / 21,978 samples...
   Processed 5,000 / 21,978 samples...
   Processed 6,000 / 21,978 samples...
   Processed 7,000 / 21,978 samples...
   Processed 8,000 / 21,978 samples...
   Processed 9,000 / 21,978 samples...
   Processed 10,000 / 21,978 samples...
   Processed 11,000 / 21,978 samples...
   Processed 12,000 / 21,978 samples...
   Processed 13,000 / 21,978 samples...
   Processed 14,000 / 21,978 samples...
   Processed 15,000 / 21,978 samples...
   Processed 16,000 / 21,978 samples...
   Processed 17,000 / 21,978 samples...
   Processed 18,000 / 21,978 samples...
   Processed 19,000 / 21,978 samples...
   Processed 20,000 / 21,978 samples...
 

## Summary

**Next Steps:**
1. Run `01_Data_Loading_Preprocessing.ipynb` to load and validate the labeled dataset
2. Create manual validation sample (200 samples)
3. Complete train/val/test splits
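For step 3, a stratified split keeps the biased/non-biased proportion roughly equal across splits, which matters if one label dominates. The notebook already imports scikit-learn's `train_test_split`, whose `stratify` and `random_state` parameters do this directly (e.g., `train_test_split(df_labeled, test_size=0.2, stratify=df_labeled['label'], random_state=42)`). The dependency-free sketch below shows the idea for the 80/20 split mentioned at the top of the notebook; the helper name and the seed of 42 are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Hypothetical helper: return (train_idx, val_idx) preserving per-class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, val_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        n_val = round(len(idx) * val_fraction)
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return sorted(train_idx), sorted(val_idx)


labels = [0] * 70 + [1] * 30
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), sum(labels[i] for i in val_idx))
# → 80 20 6
```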

**Files Created:**
- `data/processed/job_descriptions_labeled.csv` - Labeled dataset ready for preprocessing

**Note:** The processed dataset contains:
- `text`: Job description text (from jobdescription column)
- `label`: Gender bias label (0=Non-Biased, 1=Biased)
