# 42578 Project: Advanced Business Analytics
## Title: Bias-detection systems in job descriptions and exploratory analysis of underrepresented groups in the workforce
### Theme: AI for the betterment of society
#### Group members:

- Anna Matzen - s214978 
- Anne Moll-Elsborg - s214986
- Kalle Leander Johansen - s204099
- Paula Granlund - s215001

#### Date: May 2025

## Introduction
A growing body of research suggests that organizations with diverse workforces tend to outperform their peers across a wide range of metrics. Diversity benefits every level of a business, from front-line employees to executive leadership.

By welcoming a wide range of perspectives, working styles, and experiences, companies foster innovation, integration, and sustainable growth. Organizations that proactively cultivate and attract diverse talent position themselves for greater long-term success.

A balanced workforce begins with job advertisements that are free from unconscious bias. Employers who prioritize inclusive language demonstrate forward-thinking values and attract a broader, more diverse pool of qualified candidates.

Unconscious biases within job descriptions can inadvertently keep highly qualified candidates from applying. These biases commonly appear in both the language and format of job postings. By carefully reviewing and revising job advertisements, organizations can eliminate these barriers and ensure they appeal to all suitable applicants.

**Note:** *Unconscious bias* refers to implicit attitudes, stereotypes, or assumptions about certain groups of people that individuals hold without conscious awareness. These biases are shaped by personal experiences, cultural influences, and societal norms, and they can unintentionally influence decisions, behaviors, and interactions, often leading to unfair outcomes or discrimination even when people consciously believe in equality and fairness.

## Motivation and related studies
The study "Evidence That Gendered Wording in Job Advertisements Exists and Sustains Gender Inequality" by Danielle Gaucher, Justin Friesen (both University of Waterloo), and Aaron C. Kay (Duke University) presented men and women with job advertisements containing different kinds of gender-coded language. It then recorded how appealing each job seemed and how strongly participants felt that they "belonged" in that occupation. The authors note that, despite widespread egalitarian ideals, women remain underrepresented in male-dominated fields such as engineering, business leadership, and the natural sciences.

The paper identifies job advertisements as an institutional-level factor that perpetuates gender inequality through subtle gendered wording, which reinforces gender stereotypes and discourages women from applying. Two additional findings are worth highlighting:

- Empirical evidence demonstrates that job ads for male-dominated occupations systematically contain more masculine-themed words (e.g., “leader,” “competitive,” “dominant”) than advertisements in female-dominated areas. 
- No corresponding increase in feminine wording (“support,” “understand,” “interpersonal”) is observed in advertisements for female-dominated occupations, suggesting an asymmetry aligned with social dominance theory rather than social role theory.

From a practical perspective, the results suggest the importance of consciously revising job advertisements to remove masculine biases, thus promoting gender diversity and inclusion in workplaces.

**By now, it should be clear why recognizing and addressing bias in hiring practices is essential, and why eliminating unconscious bias must begin at the earliest stage: the job description itself.**

## Exploratory analysis
In the previous section, it was mentioned that “women remain underrepresented in male-dominated fields.” This statement is originally based on a U.S. analysis from 2011, with many recent studies continuing to support this observation. Given the critical importance of this assumption for the current study, this section will further examine it by analyzing and comparing gender distributions across education and employment in OECD countries, using data from 2021.

In [None]:
# THIS IS WHERE TO BEGIN CODING
## The purpose is to:
## 1) Detect whether there are sectors where the dominant gender in education differs from the dominant gender in the workforce (employment)
## 2) Check the above assumption about underrepresentation
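A minimal sketch of how the first check could look, assuming two hypothetical tables that hold the female share (in per cent) per sector for education and for employment. The column names, sector labels, and numbers are invented for illustration and are not taken from the project data.

```python
import pandas as pd

# Hypothetical female shares (per cent) per sector; illustrative values only
education = pd.DataFrame({
    "sector": ["Engineering", "Health", "Business"],
    "women_pct_education": [28.0, 76.0, 54.0],
})
employment = pd.DataFrame({
    "sector": ["Engineering", "Health", "Business"],
    "women_pct_employment": [22.0, 79.0, 45.0],
})


def dominant_gender(women_pct):
    """Return the majority gender for a female share given in per cent."""
    if women_pct > 50:
        return "women"
    if women_pct < 50:
        return "men"
    return "balanced"


def compare_dominance(education, employment):
    """Join the two tables on sector and flag sectors where the majority flips."""
    merged = education.merge(employment, on="sector", how="inner")
    merged["dominant_education"] = merged["women_pct_education"].apply(dominant_gender)
    merged["dominant_employment"] = merged["women_pct_employment"].apply(dominant_gender)
    merged["flips"] = merged["dominant_education"] != merged["dominant_employment"]
    return merged


comparison = compare_dominance(education, employment)
flipped_sectors = comparison.loc[comparison["flips"], "sector"].tolist()
print(flipped_sectors)
```

Sectors listed in `flipped_sectors` are those where one gender is the majority in education and the other is the majority in employment, which is the pattern the first purpose asks about.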

In [None]:
### FEMALE SHARE IN INDUSTRIES

import pandas as pd

# Path to the Excel export, relative to the notebook's working directory
path = "2025411141143535010615LIGEAI3.xlsx"

# Load the data; the code below uses the columns "Industry" and "Women (per cent)"
df = pd.read_excel(path)

# Keyword lists that map each detailed industry name to a broader industry group
target_groups = {
    "Landbrug, skovbrug og fiskeri": [
        "Growing", "Raising", "Plant propagation", "Hunting", "Silviculture", "Logging",
        "Gathering", "Support services to forestry", "Marine fishing", "Freshwater fishing",
        "Marine aquaculture", "Freshwater aquaculture", "farming", "textile", "Fish"
    ],
    "Industri, råstofindvinding og forsyningsvirksomhed": [
        "Mining", "Extraction", "Quarrying", "Manufacture", "Processing", "Production",
        "Refining", "Casting", "Forging", "Repair of fabricated metal products",
        "Electricity", "Gas", "Water", "Sewerage", "Waste", "Remediation", "textile", "textiles", "leather", "wood", "Printing", "stone", "cold",
        "metals", "Machining", "Building", "manufacturing", "Repair", "drilling", "wall", "Bricklayers", "Tyre",  
    ],
    "Bygge og anlæg": [
        "Construction", "Demolition", "Site preparation", "Electrical installation",
        "Plumbing", "Joinery", "Roofing", "Painting", "Glazing", "Building completion"
    ],
    "Handel og transport mv": [
        "Wholesale", "Retail", "Sale", "Trade", "Repair of motor vehicles", "Transport", 
        "Storage", "Warehousing", "Cargo handling", "Postal activities", "Courier", "Supermarkets", "stores", "Taxi", "taxi", 
        "car", "roads", "harbours", "affairs"
    ],
    "Information og kommunikation": [
        "Publishing", "Motion picture", "Broadcasting", "Telecommunications",
        "Computer programming", "IT", "Web portals", "Information service", "media", "Media", "public"
    ],
    "Finansiering og forsikring": [
        "Banking", "Monetary", "Financial", "Insurance", "Pension", "Fund management",
        "Credit", "Securities", "Investment", "Trusts", "Money", "Risk", "analysis"
    ],
    "Ejendomshandel og udlejning": [
        "Real estate", "Housing", "Renting", "Leasing", "Accommodation"
    ],
    "Erhvervsservice": [
        "Legal", "Accounting", "Consultancy", "Engineering", "Scientific", "Advertising",
        "Design", "Translation", "Veterinary", "Employment", "Security", "Cleaning", 
        "Landscape", "Office administrative", "Call centres", "Business support"
    ],
    "Offentlig administration, undervisning og sundhed": [
        "Public administration", "Education", "Hospitals", "Medical", "Dental",
        "Health care", "Nursing", "Residential", "Social work", "school", "schools", "care", "Day-care", "day-care",
        "Kindergartens", 
    ],
    "Kultur, fritid og anden service": [
        "Theatres", "Artists", "Museums", "Libraries", "Sports", "Recreation",
        "Amusement", "Membership organizations", "Hairdressing", "Beauty treatment",
        "Funeral", "Laundries", "Repair of personal", "Well-being", "Dismantling", "Hotels", "Holiday", "Restaurants", 
        "takeaways", "food", "bars"
    ],
    "Uoplyst aktivitet": [
        "Activity not stated"
    ]
}

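# Classify an industry name by the first group that has a keyword occurring in it.
# Matching is a case-insensitive substring test, so the order of the groups matters
# and short keywords can match inside longer words (e.g. "car" also matches "care").
def classify_industry(industry):
    name = str(industry).lower()
    for group, keywords in target_groups.items():
        if any(keyword.lower() in name for keyword in keywords):
            return group
    return "Uoplyst aktivitet"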

# Apply classification
df["Grouped_Industry"] = df["Industry"].apply(classify_industry)

# Unweighted mean of 'Women (per cent)' across the industries in each group
mean_women = df.groupby("Grouped_Industry")["Women (per cent)"].mean().reset_index()

# Sort from the highest to the lowest female share
mean_women = mean_women.sort_values(by="Women (per cent)", ascending=False)
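
Because `classify_industry` tests for substrings, a short keyword can match inside an unrelated word: "car" matches "care", and "IT" (lowercased) matches "activities". The sketch below shows one possible alternative that matches keywords only at word boundaries using the `re` module. The helper names, the reduced keyword dictionary, and the example industry name are assumptions made for illustration. Whole-word matching also stops stems such as "Fish" from matching "fishing", so the keyword lists would need to be reviewed before adopting it.

```python
import re

# Reduced, hypothetical keyword dictionary (not the full mapping above)
example_groups = {
    "Handel og transport mv": ["Retail", "Transport", "car"],
    "Information og kommunikation": ["Telecommunications", "IT"],
    "Offentlig administration, undervisning og sundhed": ["Hospitals", "care"],
}


def classify_substring(industry, groups):
    """Substring matching, as in classify_industry above."""
    name = str(industry).lower()
    for group, keywords in groups.items():
        if any(keyword.lower() in name for keyword in keywords):
            return group
    return "Uoplyst aktivitet"


def classify_whole_word(industry, groups):
    """Match keywords only as whole words (or whole phrases)."""
    name = str(industry)
    for group, keywords in groups.items():
        for keyword in keywords:
            pattern = r"\b" + re.escape(keyword) + r"\b"
            if re.search(pattern, name, flags=re.IGNORECASE):
                return group
    return "Uoplyst aktivitet"


industry = "Residential care activities"
print(classify_substring(industry, example_groups))   # matched via "car" inside "care"
print(classify_whole_word(industry, example_groups))  # matched via the word "care"
```

Comparing the two classifications over the full list of industry names is a quick way to spot rows whose group depends on an accidental substring match.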

## Bias-detection

In the context of job advertisements, unconscious bias often manifests subtly through the language and structure used to describe roles. For example, job titles such as "chairman", "fireman", or "councilman" implicitly suggest a preference for male applicants, potentially deterring other qualified candidates. Similarly, the choice of pronouns can introduce bias: using gendered pronouns like "he" or "she" rather than the gender-neutral "they", or rather than addressing the candidate directly as "you", can unintentionally reinforce gender stereotypes.

Biased language also appears in descriptions of the ideal candidate. Terms like "assertive" or "competitive" tend to align with stereotypically masculine traits, while adjectives such as "bubbly" or "nurturing" are typically associated with femininity. Additionally, overly detailed job requirements may disproportionately discourage women, who are reported to be more likely to apply only when they meet all stated criteria, which narrows the applicant pool.
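
The checks for gendered job titles and pronouns described above could be sketched as a simple whole-word search. The title list contains only the examples named in this section; the pronoun forms "him", "his", "her", and "hers" are added here as an assumption, and the function name `find_gendered_terms` is hypothetical. A real system would need far more complete lists.

```python
import re

# Titles taken from the examples in the text; deliberately incomplete
GENDERED_TITLES = ["chairman", "fireman", "councilman"]
# "he" and "she" come from the text; the remaining forms are assumed additions
GENDERED_PRONOUNS = ["he", "she", "him", "his", "her", "hers"]


def find_gendered_terms(text):
    """Return gendered titles and pronouns found in text, as whole words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        "titles": sorted({w for w in words if w in GENDERED_TITLES}),
        "pronouns": sorted({w for w in words if w in GENDERED_PRONOUNS}),
    }


ad = "The chairman will present his strategy, and he reports to the board."
found = find_gendered_terms(ad)
print(found)
```

Splitting the text into whole words first avoids false hits such as "he" inside "the".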

**Recognizing these issues, this study specifically focuses on addressing unconscious bias in hiring by examining the effects of biased language in job descriptions.**

### Topic-modelling

In [1]:
# THIS IS WHERE TO BEGIN CODING

Based on similar studies, a list of gender-coded (non-neutral) words has been collected. Some words have been reduced to a stem to cover a range of noun, verb, and adjective variants; for instance, "compet" covers "compete", "competitive", and "competition".

**Feminine-coded words**

| agree- | affectionate- | child-       | cheer-      | collab-     | commit-     |
|--------|----------------|--------------|-------------|-------------|-------------|
| communal- | compassion- | connect-     | considerate-| cooperat-   | co-operat-  |
| depend-   | emotiona-   | empath-      | feel-       | flatterable-| gentle-     |
| honest-   | interpersonal- | interdependen- | interpersona- | inter-personal- | inter-dependen- |
| inter-persona- | kind- | kinship-     | loyal-      | modesty-    | nag-        |
| nurtur-   | pleasant-   | polite-      | quiet-      | respon-     | sensitiv-   |
| submissive- | support- | sympath-     | tender-     | together-   | trust-      |
| understand- | warm-     | whin-        | enthusias-  | inclusive-  | yield-      |
| share-     | sharin-    |              |             |             |             |

**Masculine-coded words**

| active- | adventurous- | aggress-     | ambitio-    | analy-      | assert-     |
|---------|---------------|--------------|-------------|-------------|-------------|
| athlet- | autonom-     | battle-      | boast-      | challeng-   | champion-   |
| compet- | confident-   | courag-      | decid-      | decision-   | decisive-   |
| defend- | determin-    | domina-      | dominant-   | driven-     | fearless-   |
| fight-  | force-       | greedy-      | head-strong-| headstrong- | hierarch-   |
| hostil- | impulsive-   | independen-  | individual- | intellect-  | lead-       |
| logic-  | objective-   | opinion-     | outspoken-  | persist-    | principle-  |
| reckless- | self-confiden- | self-relian- | self-sufficien- | selfconfiden- | selfrelian- |
| selfsufficien- | stubborn- | superior- | unreasonab- |             |             |
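
A minimal sketch of how the stems in the tables above could be matched against a job description: each word in the text is compared to every stem with a prefix test. Only a handful of stems from each table are included to keep the example short, and the function names and the sample text are assumptions made for illustration, not the project's implementation.

```python
import re

# Small subsets of the stems in the tables above (trailing hyphens dropped)
FEMININE_STEMS = ["collab", "support", "understand", "interpersonal", "nurtur"]
MASCULINE_STEMS = ["compet", "lead", "domina", "assert", "analy"]


def matching_words(text, stems):
    """Return the words in text that begin with any of the given stems."""
    # Keep hyphenated words such as "self-confident" together
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [w for w in words if w.startswith(tuple(stems))]


def count_coded_words(text):
    """Collect feminine- and masculine-coded words in a job description."""
    feminine = matching_words(text, FEMININE_STEMS)
    masculine = matching_words(text, MASCULINE_STEMS)
    return {
        "feminine": feminine,
        "masculine": masculine,
        "balance": len(feminine) - len(masculine),
    }


ad = "We seek a competitive leader who dominates the market and supports the team."
result = count_coded_words(ad)
print(result)
```

A negative `balance` indicates more masculine-coded than feminine-coded words. Prefix matching is deliberately crude: "lead" also matches "leading" in a phrase like "leading supplier", so matches are best reviewed in context.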

### Bias-detection system

In [2]:
# THIS IS WHERE TO BEGIN CODING

## Further work
THIS IS WHERE TO ADD CONCLUSIONS, FURTHER WORK, OR RECOMMENDATIONS