<a href="https://colab.research.google.com/github/Maggiey01/Rights-Colab-YH/blob/main/4Channels_Pipeline_2022_DEI_DEF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**This notebook runs the Spring 2022 pipeline for identifying DEI-related practice and outcome terms and their co-occurrences in a corpus of texts. It is tailored for the Proxy statement dataset. It will output CSVs.**

*For your information, here are examples of:*
- *the [most recent full CSV](https://docs.google.com/spreadsheets/d/1xr0n7pO6l76EUKbABBI7ZiIp_mjf57reXp8TP-5bNK8/edit?usp=sharing) for TVL GIC Employee Engagement, Diversity, & Inclusion*
- *the [most recent validation sample CSV](https://docs.google.com/spreadsheets/d/1D7f7wb7VpKbkYspcpE-oDcm93XY59_g2HseQbcevR0M/edit?usp=sharing) for TVL GIC Employee Engagement, Diversity, & Inclusion*

Further explanation of our methodology using this pipeline is in [this document](https://docs.google.com/document/d/1GJseNlhVhQhbRLBv-iWtV3ycXFBAWUItFikzLxijok4/edit?usp=sharing).

## (RUN ENTIRE SECTION AS IS): Imports

In [None]:
import datetime
import numpy as np
import os
import pandas as pd
import re
import matplotlib.pyplot as plt
import seaborn as sns
import random

from tqdm import tqdm

import plotly.graph_objects as go
import plotly.figure_factory as ff
import plotly.express as px

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from nltk.tokenize import word_tokenize

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)

In [None]:
import itertools
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


## (SLIGHT MODIFICATION): Mount your drive
- MODIFY-1: make sure you are in your working directory

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
'/gdrive/My Drive/DFG/DEF_para/'

'/gdrive/My Drive/DFG/DEF_para/'

In [None]:
# MODIFY-1: change the filepath to your working directory
%cd '/content/drive/MyDrive/Rights Colab RK/TVL'


## (RUN ENTIRE SECTION AS IS): Define functions to capture terms

In [None]:
# Functions to create patterns of practice/outcome terms

def create_pattern_2(buck1, buck2, rangelist):
  '''
  Creates a pattern used to check whether at least one search word from each 
  of the 2 buckets is 1) within a certain number of words of the other and 
  2) as a phrase, within a certain number of words of a DEI context word in a 
  text (context window).

  Parameters
  ----------
  buck1 : list of strings; len(buck1)>=1
      The strings are search words in regex format
  buck2 : list of strings; len(buck2)>=1
      The strings are search words in regex format
  rangelist : list of 2 ints of form [a,b]
      a = context window between buck1 and buck2 search words
      b = context window between buck1-buck2 phrase and DEI context word

  Returns
  -------
  [[buck1, buck2], rangelist] : formatted list of the parameters
  '''
  return [[buck1, buck2], rangelist]

def create_pattern_3(buck1, buck2, buck3, rangelist):
  '''
  Creates a pattern used to check whether at least one search word from each 
  of the 3 buckets is 1) within a certain number of words of the others and 
  2) as a phrase, within a certain number of words of a DEI context word in a 
  text (context window).

  Parameters
  ----------
  buck1 : list of strings; len(buck1)>=1
      The strings are search words in regex format
  buck2 : list of strings; len(buck2)>=1
      The strings are search words in regex format
  buck3 : list of strings; len(buck3)>=1
      The strings are search words in regex format
  rangelist : list of 4 ints of form [a_1_2,a_2_3,a_1_3,b]
      a_1_2 = context window between buck1 and buck2 search words
      a_2_3 = context window between buck2 and buck3 search words
      a_1_3 = context window between buck1 and buck3 search words
      b = context window between buck1-buck2-buck3 phrase and DEI context word

  Returns
  -------
  [[buck1, buck2, buck3], rangelist] : formatted list of the parameters
  '''
  return [[buck1, buck2, buck3], rangelist]
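
To make the return format concrete, here is a minimal sketch that builds one 2-bucket pattern and expands it into the search-word combinations that `check_cooccur()` iterates over. The short `dei_context` list is a hypothetical stand-in for the full DEI context list, and `create_pattern_2` is redefined only so the snippet runs on its own.

```python
import itertools

def create_pattern_2(buck1, buck2, rangelist):
    # Same shape as the notebook's helper: [[bucket1, bucket2], [a, b]]
    return [[buck1, buck2], rangelist]

# Illustrative buckets and windows (values assumed for this example only)
pattern = create_pattern_2(['lawsuit', ' sue(d)? '], ['discriminat'], [15, 25])
dei_context = ['wom(e|a)n', 'female']  # hypothetical short DEI context list

buckets, rangelist = pattern

# One search word from each bucket plus one DEI context word
combos = list(itertools.product(*(buckets + [dei_context])))

for combo in combos:
    print(combo)
print(len(combos), rangelist)
```

Each tuple is one candidate combination; a term is flagged only if every word in some tuple is found in the text and the positions satisfy the windows in `rangelist`.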

In [None]:
# Functions to find practice/outcome terms in a text

def word_indices(input_str, search_word_lst, DEI_contx_list):
  '''
  Finds all indices (by word position) of every occurrence of every search 
  word for practice/outcome terms and DEI context terms. 

  Parameters
  ----------
  input_str : string
      This is the input text you are searching through.
  search_word_lst : list of strings
      The strings are the practice/outcome term pattern search words in regex format
  DEI_contx_list : list of strings
      The strings are the DEI context search words in regex format

  Returns
  -------
  word_ind_dict : dict of indices where each search word was found
      key: string of search word in regex format
      value: list of indices where that search word was found in the text

      Example: dict with 4 search words
      word_ind_dict = {'apprentice': [54],
                       'female': [0, 17, 21],
                       'program': [27],
                       'wom(e|a)n': [4]}
  '''
  # Build dict with all words as keys w/ empty lists as values
  total_list = search_word_lst + DEI_contx_list
  word_ind_dict = {w: [] for w in total_list}

  # Fill empty lists with index matches
  for w in total_list:
    for match in re.finditer(w, input_str):
      before_str = word_tokenize(input_str[:match.start()])
      word_ind_dict[w].append(len(before_str))
  
  return word_ind_dict

def check_cooccur(word_ind_dict, terms_dict, DEI_contx_list):
  '''
  Finds every instance of all practice/outcome terms (as pattern + DEI context 
  word) in a text. Uses a dict of indices where each search word is found in the 
  text, the output of function word_indices().

  Parameters
  ----------
  word_ind_dict : dict of indices where each search word is found in a text
      Output of function word_indices()
      key: string of search word in regex format
      value: list of indices where that search word was found in the text
  terms_dict : dict of practice/outcome term patterns
      key: name of practice/outcome term
      value: list of patterns for that term; [p0,p1...]
  DEI_contx_list : list of strings
      The strings are the DEI context search words in regex format

  Returns
  -------
  flagged_terms : dict of every instance of all practice/outcome terms
      key: name of practice/outcome term
      value: list of instances with each instance being a list of search words 
      in regex format and the context window used to flag them

      Example: dict with 1 practice term found in 4 instances
      {'program-retain': [[('program', 'retain', 'talent', 'wom(e|a)n'), [10, 10, 10, 30]],
                          [('program', 'retain', 'talent', 'wom(e|a)n'), [10, 10, 10, 30]],
                          [('program', 'retain', 'talent', 'female'), [10, 10, 10, 30]],
                          [('program', 'apprenticeship', 'gender'), [4, 50]]]}
  '''  
  flagged_terms = {}

  # Iterate through all of the terms and their patterns
  for key, value in terms_dict.items():
    is_match = 0
    match_pattern = []
    
    # Iterate through each pattern of a term
    for pattern in value: # pattern = [[[buck1], [buck2]], [a,b]]
      combo_buck = pattern[0] + [DEI_contx_list] # combo_buck = [[buck1], [buck2], [DEI_contx_list]]
      combos = list(itertools.product(*combo_buck)) # List of combos of 1-ea word + 1 DEI context term, per pattern
      
      # Iterate through each possible search word combo for a pattern
      for c in combos: # c = ('lawsuit', 'discriminat', 'wom(e|a)n')
        
        # Collect indices for each search word
        ind_list = [] # List of indices-lists for each word in combo
        for w in c: # w = 'lawsuit'
          if w in word_ind_dict:
            ind_list.append(word_ind_dict[w]) # [[2,56],[],[45,23,12]]

        # Check if ind_list has enough lists (every word in combo is found in text)
        if len(ind_list) == len(c):
          # Check if indices are in range
          combos_inds = list(itertools.product(*ind_list)) # All possible combos of indices: combos_inds = [(24, 53, 20), (24, 53, 28)]

          # Iterate through each combo of indices
          for c_i in combos_inds: 
            
            # if is a 2-bucket pattern, c_i = (24, 53, 20)
            if len(pattern[1]) == 2: # pattern[1] = [a,b] -> rangelist
              subrange = [c_i[0],c_i[1]]
              subrange.sort()
              # if DEI context word is within term phrase buffer zone (24-(b+1) <= 20 <= 53+(b+1))
              # AND search words are within term pattern context window (|24-53|-1 <= a)
              if (subrange[0]-(pattern[1][1]+1) <= c_i[2] <= subrange[1]+(pattern[1][1]+1)) and (abs(c_i[0] - c_i[1])-1 <= pattern[1][0]):
                is_match = 1
                match_pattern.append([c,pattern[1]])
            
            # if is a 3-bucket pattern, c_i = (50, 34, 60, 61)
            elif len(pattern[1]) == 4: # pattern[1] = [a_1_2,a_2_3,a_1_3,b] -> rangelist
              subrange = [c_i[0],c_i[1],c_i[2]]
              subrange.sort()
              # if DEI context word is within term phrase buffer zone (34-(b+1) <= 61 <= 60+(b+1))
              # AND search words are within term pattern context windows (|50-34|-1 <= a_1_2 AND |34-60|-1 <= a_2_3 AND |50-60|-1 <= a_1_3)
              if (subrange[0]-(pattern[1][3]+1) <= c_i[3] <= subrange[2]+(pattern[1][3]+1)) and (abs(c_i[0] - c_i[1])-1 <= pattern[1][0]) and (abs(c_i[1] - c_i[2])-1 <= pattern[1][1]) and (abs(c_i[0] - c_i[2])-1 <= pattern[1][2]):
                is_match = 1
                match_pattern.append([c,pattern[1]])
            
            else:
              print("wrong range length")
              is_match = -1
    
    # If there is at least one instance found for the term, insert the match pattern into flagged_terms dict
    if is_match == 1:
      flagged_terms[key] = match_pattern
    
  return flagged_terms        
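
The window arithmetic for a 2-bucket pattern can be checked in isolation. The sketch below restates the condition from `check_cooccur()` as a standalone function. The name `in_window_2` is hypothetical and is not part of the pipeline, and the index values are made up for illustration.

```python
def in_window_2(i1, i2, i_dei, a, b):
    """Standalone restatement of the 2-bucket test (hypothetical helper)."""
    lo, hi = sorted((i1, i2))
    # At most `a` words may sit between the two search words
    words_close = abs(i1 - i2) - 1 <= a
    # The DEI context word must lie within `b` words of either end of the phrase
    dei_close = lo - (b + 1) <= i_dei <= hi + (b + 1)
    return words_close and dei_close

# Search words at word indices 24 and 53, DEI context word at index 20
print(in_window_2(24, 53, 20, a=30, b=25))  # 28 words apart, within a=30
print(in_window_2(24, 53, 20, a=7, b=25))   # 28 words apart, outside a=7
print(in_window_2(24, 26, 60, a=7, b=25))   # DEI word past 26 + (25 + 1) = 52
```

The first call returns `True`; the other two return `False` because one of the two windows is exceeded.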

## (UPDATE REGULARLY) Term dictionaries and lists
Refer to the [original notebook](https://colab.research.google.com/drive/1TMTydjBmS3cxpAbHm1H3rL_I1keMKbBW?usp=sharing) for the most up-to-date term dictionaries, and replace your `CELL_1` and `CELL_3` with their up-to-date equivalents. You can then run this entire section as is.

*For your information:*
- *practice terms are things a company does: managing, hiring, training, setting up programs (typically gerunds)*
- *outcome terms are everything else, i.e. what results from those actions*
- [Spreadsheet](https://docs.google.com/spreadsheets/d/1kvx0vdwRB8C9WJ3vALMviU7i4UqXBugtYJ3FsxQROQE/edit?usp=sharing) of terms and relevant info

In [None]:
# CELL_1: Create the dictionaries for the PRACTICE and OUTCOME terms

terms_dict = {}
terms_category = {}

################################################################################
################################################################################
#                        Talent-Attraction-Retention                           #
################################################################################
################################################################################

#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#
#                   PRACTICE - Talent-Attraction-Retention                     #
#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#

# Practice: 'compensation-equal'
p0 = create_pattern_2(['(-| )pa(id|y)',' wage(s)? ','compensat'],['disparity','(un)?equal',' same','(un)?fair',' more ', ' less', ' gap '],[7,25])
terms_dict['compensation-equal'] = [p0]
terms_category['compensation-equal'] = ('Talent-Attraction-Retention','PRACTICE')

# Practice: 'programs/initiatives-attract'
p0 = create_pattern_3(['program', 'initiative', 'campaign'],['attract',' hir(e(s|d)?|ing) ', 'recruit', ' f(ind|ound)', '(br(ing(ing|s)?|ought)|dr(aw(ing|s)?|ew)) in', ' entic(e|ing)', 'discover', 'acquir(e|ing)', ' gain', 'collect', 'gather', 'procur(e|ing)'],['talent','skill','leader'],[10,10,10,25])
p1 = create_pattern_2(['program', 'initiative', 'campaign'],[' hir(e(s|d)?|ing) ', 'recruit'],[7,25])
terms_dict['programs/initiatives-attract'] = [p0,p1]
terms_category['programs/initiatives-attract'] = ('Talent-Attraction-Retention','PRACTICE')

# Practice: 'programs/initiatives-retain'
p0 = create_pattern_3(['program', 'initiative', 'campaign'],['retain', 'retention',' keep', 'preserv(ation|e)', 'maintain', ' train', ' hold', 'develop'],['talent','skill','leader'],[10,10,10,25])
p1 = create_pattern_2(['program', 'initiative', 'campaign'],[' mentor', 'apprentice'],[7,25])
p2 = create_pattern_3(['program', 'initiative', 'campaign'],['career','employee','work(er|force)','contractor'],['advanc', 'develop', 'promot(e(d|s)?|ing|ion(s)?) ', ' train', '(present(ed|ing|s)|grant(ed|ing|s)|g(ave|iv(e(s)?|ing))|provid(e(d|s)?|ing)|offer(ed|ing|s)?) opportunit'],[10,10,10,25])
terms_dict['programs/initiatives-retain'] = [p0,p1,p2]
terms_category['programs/initiatives-retain'] = ('Talent-Attraction-Retention','PRACTICE')

# Practice: 'policies/public commitment-attract'
p0 = create_pattern_3(['polic(ies|y)', 'commitment'],['attract',' hir(e(s|d)?|ing) ', 'recruit', ' f(ind|ound)', '(br(ing(ing|s)?|ought)|dr(aw(ing|s)?|ew)) in', ' entic(e|ing)', 'discover', 'acquir(e|ing)', ' gain', 'collect', 'gather', 'procur(e|ing)'],['talent','skill','leader'],[10,10,10,25])
p1 = create_pattern_2(['polic(ies|y)', 'commitment'],[' hir(e(s|d)?|ing) ', 'recruit'],[7,25])
terms_dict['policies/public commitment-attract'] = [p0,p1]
terms_category['policies/public commitment-attract'] = ('Talent-Attraction-Retention','PRACTICE')

# Practice: 'policies/public commitment-retain'
p0 = create_pattern_3(['polic(ies|y)', 'commitment'],['retain', 'retention',' keep', 'preserv(ation|e)', 'maintain', ' train', ' hold', 'develop'],['talent','skill','leader'],[10,10,10,25])
p1 = create_pattern_2(['polic(ies|y)', 'commitment'],[' mentor', 'apprentice'],[7,25])
p2 = create_pattern_3(['polic(ies|y)', 'commitment'],['career','employee','work(er|force)','contractor'],['advanc', 'develop', 'promot(e(d|s)?|ing|ion(s)?) ', ' train', '(present(ed|ing|s)|grant(ed|ing|s)|g(ave|iv(e(s)?|ing))|provid(e(d|s)?|ing)|offer(ed|ing|s)?) opportunit'],[10,10,10,25])
terms_dict['policies/public commitment-retain'] = [p0,p1,p2]
terms_category['policies/public commitment-retain'] = ('Talent-Attraction-Retention','PRACTICE')

# Practice: 'whistleblower protection'
p0 = create_pattern_2(['whistle(-| )?blow'],['protect', 'safeguard', 'preserv(ation|e)', 'shelter', 'shield', 'support', 'promot(e(d|s)?|ing|ion(s)?) ', 'defend(ed|ing|s)? ', ' listen', 'accept', 'bolster', 'assist'],[7,25])
terms_dict['whistleblower protection'] = [p0]
terms_category['whistleblower protection'] = ('Talent-Attraction-Retention','PRACTICE')

# Practice: 'hiring/recruitment'
p0 = create_pattern_2([' hir(e(s|d)?|ing) ', 'recruit'],['different','vari(ety|ous)','divers'],[10,25])
p1 = create_pattern_3(['welcom(e|ing)','embrac','celebrat','proud','support'],['employ', 'build(ing)? a (team|work(force|place)?)'],['different','vari(ety|ous)','divers'],[10,10,10,25])
p2 = create_pattern_2([' hir(e(s|d)?|ing) ', 'recruit', 'attract','(bring|draw(n)?) (in|to)'],['employ', 'work(er|force)', 'contractor', ' position', 'opportunit', 'opening'],[8,25])
terms_dict['hiring/recruitment'] = [p0,p1,p2]
terms_category['hiring/recruitment'] = ('Talent-Attraction-Retention','PRACTICE')

# Practice: 'training-employee-development'
p0 = create_pattern_2(['apprentice', 'mentor'],['develop','advanc','skill','leader','progress','opportunit','promot(ed|ion) ','(rise|climb|progress|move) up', 'support','empower'],[7, 25])
p1 = create_pattern_3([' train'],['career','employee','work(er|force)','contractor'],['develop','advanc','skill','leader','progress','opportunit','promot(ed|ion) ','(rise|climb|progress|move) up', 'support','empower'],[7,7,7,25])
terms_dict['training-employee-development'] = [p0,p1]
terms_category['training-employee-development'] = ('Talent-Attraction-Retention','PRACTICE')

# Practice: 'promotion-employee'
p0 = create_pattern_2(['promot(ed|ion) ', 'advanc', '(rise|climb|progress|move) up', 'career(.*)progress','progress(.*)career'],['employ(ee|ment)', 'work(er|force|( )?place)', ' job ', 'contractor'],[8,25])
terms_dict['promotion-employee'] = [p0]
terms_category['promotion-employee'] = ('Talent-Attraction-Retention','PRACTICE')

# Practice: 'worker union'
p0 = create_pattern_2(['union'],['employee', 'work(er|force|( )?place)', 'contractor',' labo(u)?r'],[5,25])
terms_dict['worker union'] = [p0]
terms_category['worker union'] = ('Talent-Attraction-Retention','PRACTICE')

# Practice: 'worker committee'
p0 = create_pattern_2(['committee'],['employee', 'work(er|force|( )?place)', 'contractor',' labo(u)?r'],[5,25])
terms_dict['worker committee'] = [p0]
terms_category['worker committee'] = ('Talent-Attraction-Retention','PRACTICE')

#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#
#                   OUTCOME - Talent-Attraction-Retention                      #
#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#

# Outcome: 'labor shortage'
p0 = create_pattern_2(['shortage'],[' labo(u)?r','work(er|force)','employee','contractor'],[7,25])
p1 = create_pattern_2(['shortage'],['significant','persistent', 'pervasive','critical','massive','severe','sustained','well(-| )(known|documented)'],[10,25])
p2 = create_pattern_2(['shortage'],['opportunit','opening'],[10,25])
p3 = create_pattern_2([' position', 'opportunit','opening',' labo(u)?r','work(er|force)','employee','contractor'],['insufficient','unfilled','empty'],[6,25])
terms_dict['labor shortage'] = [p0,p1,p2,p3]
terms_category['labor shortage'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'skill shortage/gap'
p0 = create_pattern_2(['shortage',' gap ','limited pool'],['((low|un|semi|high)(-|ly | |))?skill','talent'],[7,25])
terms_dict['skill shortage/gap'] = [p0]
terms_category['skill shortage/gap'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'discrimination lawsuit'
p0 = create_pattern_2(['lawsuit', ' sue(d)? '],['discriminat'],[15,25])
terms_dict['discrimination lawsuit'] = [p0]
terms_category['discrimination lawsuit'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'attrition'
p0 = create_pattern_2(['turnover'],[' high', 'disruptive', 'work(er|force)', 'employee', 'contractor',' rate', 'voluntary'],[7,25])
p1 = create_pattern_2(['attrition'],[' high', 'disruptive', 'work(er|force)', 'employee', ' rate','contractor'],[7,25])
p2 = create_pattern_2([' quit', ' leav(e(r(s)?)?|ing) ', ' left '],[' high', ' rate'],[6,25])
terms_dict['attrition'] = [p0,p1,p2]
terms_category['attrition'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'talent retention'
p0 = create_pattern_2(['retain', 'retention',' keep', 'preserv(ation|e)', 'maintain', 'invest'],['talent','skill'],[7,25])
terms_dict['talent retention'] = [p0]
terms_category['talent retention'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'talent attraction'
p0 = create_pattern_2(['attract','(br(ing|ought)|draw(n)?) (in|to)'],['talent','skill'],[7,25])
terms_dict['talent attraction'] = [p0]
terms_category['talent attraction'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'diverse workforce composition'
p0 = create_pattern_2(['(under(-| )?)?represent(ation|ed)', 'demographic','compos(e|ition)', 'make(-| )?up of', ' only', ' few', 'divers'],['(non(-|))?executive', 'director', 'board( member|( )?room)', 'manage(ment|r)','(base|low|mid|senior)(-| )level', 'leader', 'c-suite', 'employee', 'work(er|force|( )?place)', ' ceo ', 'contractor', 'professionals', 'technical'],[7,25])
terms_dict['diverse workforce composition'] = [p0]
terms_category['diverse workforce composition'] = ('Talent-Attraction-Retention', 'OUTCOME')

# Outcome: 'aging workforce'
p0 = create_pattern_2(['aging', ' old(er|est)? '],['employee','work(er|force|( )?place)','contractor'],[7,25])
terms_dict['aging workforce'] = [p0]
terms_category['aging workforce'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'go public'
p0 = create_pattern_2(['whistle(-| )?blow', 'worker','employee','contractor'],['(go(es|ing)|went) public'],[8,25])
p1 = create_pattern_3(['whistle(-| )?blow', 'worker','employee','contractor'],['expos(e|ing)', 'alleg', 'report', ' leak', 'br(ing(ing|s)?|ought) to light', 'disclos(e|ing)', 'uncover', 'unmask', 'document', ' claim', 'complain'],['publicly', '(in |to )(the )?public'],[10,10,20,25])
terms_dict['go public'] = [p0,p1]
terms_category['go public'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'retaliation/reprisal'
p0 = create_pattern_2(['retaliat', 'reprisal'],['employee','work(er|force|( )?place)', 'contractor'],[12,25])
terms_dict['retaliation/reprisal'] = [p0]
terms_category['retaliation/reprisal'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'harassment-employee'
p0 = create_pattern_2(['harass','bull(ie|y)',' torment', ' teas(e|ing)', 'mistreat'],['employee','work(er|force|( )?place)', 'contractor'],[10,25])
terms_dict['harassment-employee'] = [p0]
terms_category['harassment-employee'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'unsafe conditions'
p0 = create_pattern_3(['conditions', 'environment', 'work( )?place'],['work','employee','contractor'],['unsafe', 'dangerous', 'hazard', 'violen(ce|t)', 'abus(e|ive)', ' harm', 'threat', 'precarious'],[10,10,10,25])
terms_dict['unsafe conditions'] = [p0]
terms_category['unsafe conditions'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'inclusive culture'
p0 = create_pattern_2(['inclusive', 'friendly', 'support', 'welcom(e|ing)', 'amicable', 'be heard', 'empower'],['culture', 'work( conditions| environment|( )?place)', 'manage(ment|r)', 'boss ', 'leader'],[7,25])
p1 = create_pattern_3(['conditions', 'environment', 'work( )?place'],['work','employee','contractor'],['inclusive', 'friendly', 'support', 'welcom(e|ing)', 'amicable', 'be heard'],[10,10,10,25])
terms_dict['inclusive culture'] = [p0,p1]
terms_category['inclusive culture'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'toxic culture'
p0 = create_pattern_2(['malpractice', 'violat','non(-|)inclusive','toxic','hostile','(not |un)friendly','(not |un)welcom(e|ing)','cruel','mean','disparag','exclus(ion|ive)', 'dismissive', 'abus(e|ive)', 'threat'],['culture', 'work( conditions| environment|( )?place)', 'manage(ment|r)', 'boss ', 'leader'],[7,25])
p1 = create_pattern_3(['(lack|void) of', ' no(ne|t)? ', 'lacking', 'loss of'],['support', 'communicat', 'protect', 'feedback', 'investigat', 'report', ' record', 'follow(-| )up', 'action', 'anonymity', 'confiden(ce|tiality)','integrity','trust', 'transparen', 'approachab', 'accountab', 'psychological(ly)? safe(ty)?', 'disclos', 'honest'],['culture', 'work( conditions| environment|( )?place)', 'manage(ment|r)', 'boss ', 'leader'],[7,10,15,25])
p2 = create_pattern_3(['fear(ing| of)', 'risk(ing| of)', '(scared|afraid) of'],['(lack of |in)action', 'retaliat', 'dismiss', 'judg(ed|ment)', 'sham(e|ing)', 'authority', 'embarrass', 'blame', 'offen(d|se)', 'terminat', ' fire(d)? ', 'promot(ed|ion) ', 'trouble', 'job (safety|security)', 'perc(eive|eption)', 'reputation', 'dismiss', 'relationship'],['culture', 'work( conditions| environment|( )?place)', 'manage(ment|r)', 'boss ', 'leader'],[7,10,15,25])
terms_dict['toxic culture'] = [p0,p1,p2]
terms_category['toxic culture'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'job satisfaction'
p0 = create_pattern_2(['employ(ee|ment)', 'work( conditions| environment|er|force|( )?place)', ' job', 'contractor'],['satisf(action|ied)', 'morale', 'happy', 'engag(ed|ement)', 'content'],[7,25])
terms_dict['job satisfaction'] = [p0]
terms_category['job satisfaction'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'to be liable'
p0 = create_pattern_3(['liab(ility|le)'],[' (is|are|was|were) ','accept', 'take on', 'assume', 'acknowledge',' hold','affirm','recognize'],['company', 'firm', 'business', 'leader', ' ceo ', 'executive', 'president', ' vp ', 'director', 'chair(-)?(wo)?man','spokes(person|man|woman)'],[10,15,10,25])
terms_dict['to be liable'] = [p0]
terms_category['to be liable'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'trust-employee'
p0 = create_pattern_2(['trust'],['employee', 'work(er|force)', 'contractor'],[7,25])
terms_dict['trust-employee'] = [p0]
terms_category['trust-employee'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'quit/resign'
p0 = create_pattern_2(['resign',' quit', ' leav(e(r(s)?)?|ing) ', ' left ','depart(ed|ing|s)'],['employee', 'work(er|force)', 'contractor'],[7,25])
terms_dict['quit/resign'] = [p0]
terms_category['quit/resign'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'strike/walk-out-employee'
p0 = create_pattern_2(['strik(e|ing)', 'walk( |-)?out','refuse(d)? to work'],['employee', 'work(er|force|( )?place)', 'contractor'],[7,25])
terms_dict['strike/walk-out-employee'] = [p0]
terms_category['strike/walk-out-employee'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'sit-in-employee'
p0 = create_pattern_2(['s(at|it)-in'],['employee', 'work(er|force|( )?place)', 'contractor'],[7,25])
terms_dict['sit-in-employee'] = [p0]
terms_category['sit-in-employee'] = ('Talent-Attraction-Retention','OUTCOME')

# Outcome: 'protest-employee'
p0 = create_pattern_2(['protest', 'demonstrat(e|ion)', 'march', 'picket'],['employee', 'work(er|force|( )?place)', 'contractor'],[7,25])
terms_dict['protest-employee'] = [p0]
terms_category['protest-employee'] = ('Talent-Attraction-Retention','OUTCOME')



################################################################################
################################################################################
#           Product-DMD (Product Design, Marketing & Delivery)                 #
################################################################################
################################################################################

#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#
#                           PRACTICE - Product-DMD                             #
#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#

#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#
#                           OUTCOME - Product-DMD                              #
#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#



################################################################################
################################################################################
#                            Community-Relations                               #
################################################################################
################################################################################

#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#
#                       PRACTICE - Community-Relations                         #
#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#

#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#
#                        OUTCOME - Community-Relations                         #
#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#



################################################################################
################################################################################
#                        Innovation-Risk-Recognition                           #
################################################################################
################################################################################

#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#
#                   PRACTICE - Innovation-Risk-Recognition                     #
#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#

#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#
#                   OUTCOME - Innovation-Risk-Recognition                      #
#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#

################################################################################
################################################################################
#                                    Other                                     #
################################################################################
################################################################################

#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#
#                               PRACTICE - Other                               #
#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#

#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#
#                               OUTCOME - Other                                #
#------------------------------------------------------------------------------#
#------------------------------------------------------------------------------#


In [None]:
search_words_list = []
terms_list = []
for key, val in terms_dict.items():
  terms_list.append(key)
  for pattern in val:
    bucks_list = pattern[0]
    for bucket in bucks_list:
      for word in bucket:
        search_words_list.append(word)

search_words_list = list(set(search_words_list))
terms_list = list(set(terms_list))
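
Because every search word is later passed to `re.finditer`, it can help to confirm that each one compiles after the term dictionaries are updated. The sketch below is one possible check, not part of the original pipeline. `toy_terms` is a small stand-in with the same nesting as `terms_dict`.

```python
import re

# Toy stand-in for terms_dict: {term name: [[[bucket1, bucket2], [a, b]], ...]}
toy_terms = {
    'worker union': [[[['union'], ['employee', ' labo(u)?r']], [5, 25]]],
    'trust-employee': [[[['trust'], ['employee', 'contractor']], [7, 25]]],
}

# Flatten term -> pattern -> bucket -> word, dropping duplicates
search_words = sorted({word
                       for patterns in toy_terms.values()
                       for buckets, _rangelist in patterns
                       for bucket in buckets
                       for word in bucket})

# Compile each search word so a malformed regex surfaces here, not mid-run
bad = []
for word in search_words:
    try:
        re.compile(word)
    except re.error as err:
        bad.append((word, str(err)))

print(search_words)
print(bad)
```

An empty `bad` list means every search word is a valid regular expression. It says nothing about whether the pattern matches what was intended.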

In [None]:
# CELL_3: Create the dictionary for the DEI context terms
DEI_context_dict = {
    'ethnic': 'ethnic',
    '(disab(i|l)| abilit(y|ies))': '(dis)ability',
    '(marital status|married)': 'marital status',
    'bias': 'bias',
    'religio': 'religio',
    'inclusiv': 'inclusive',
    'divers': 'diverse',
    #' access': 'access',
    ' race ': 'race',
    'racism': 'race',
    'racist': 'race',
    'racial': 'race',
    'bipoc': 'race',
    'people of colo(u)?r': 'race',
    'blackface': 'race',
    'black': 'race',
    'white': 'race',
    'asian': 'race',
    'latino': 'race',
    'hispanic':'ethnic',
    '(indigenous|native(s| (america|population|communit|govern|reservation))|american indian|first nations|trib(al|e)|aborigin)': 'race',
    '(environmental human rights defender|ehrd)': 'advocate',
    'working (famil|parent|mother|mom|father|dad)': 'familial status',
    'veteran': 'military status',
    '(service|guard|reserve) member': 'military status',
    'minorities': 'minorit',
    'minority group': 'minorit',
    'lgbt': 'LGBT', 
    'sexual orientation': 'LGBT',
    'gender identit': 'LGBT',
    ' gay': 'LGBT',
    'lesbian': 'LGBT',
    'bisexual': 'LGBT',
    'transgender': 'LGBT',
    'queer': 'LGBT',
    'asexual': 'LGBT',
    '(homo|trans)phobia': 'LGBT',
    'non(-)?binary': 'LGBT',
    'wom(e|a)n':'gender-M/F',
    'female':'gender-M/F',
    'gender':'gender-M/F',
    'based on sex':'gender-M/F',
    'pregnant':'gender-M/F',
    'on the basis of sex':'gender-M/F',
    'maternity leave':'gender-M/F',
    'sexist':'gender-M/F',
    'sex discrimination':'gender-M/F',
    'age discrimin': 'age',
    'age bias': 'age',
    'ageism': 'age',
    'average age': 'age',
    ' old(er)? ': 'age',
    'youth': 'youth',
    'young': 'youth',
    'next generation': 'youth',
    'nationalit': 'nationality',
    'national origin': 'nationality',
    'foreign nationals': 'nationality',
    'under(-)?represented': 'underrepresented',
    'migrant': 'migrant',
    'foreigner': 'migrant',
    ' visa ': 'migrant',
    'citizen': 'migrant',
    'foreign worker': 'migrant',
    'entry(-| )level': 'education/skill level',
    'education level': 'education/skill level',
    '(college|undergraduate|graduate) degree': 'education/skill level',
    'high school diploma': 'education/skill level',
    '(low|un|semi|high)(-|ly | |)skill': 'education/skill level',
    'economic status': 'economic status',
    'economic class': 'economic status',
    'low(-| )income': 'economic status', 
    'high(-| )income': 'economic status',
    'impoverish': 'economic status',
    'poverty': 'economic status',
    'middle(-| )class': 'economic status',
    'working(-| )class': 'economic status',
    'criminal history': 'criminal history',
    'felon': 'criminal history',
    'background check': 'criminal history',
    '(convict(s)? |formerly convicted|convicted formerly)': 'criminal history',
    'factory work': 'factory work',
    'supplier contract':'supplier contract'
}

# The regex keys of DEI_context_dict are the DEI context search patterns.
# Dict keys are already unique, so no deduplication is needed.
DEI_context_list = list(DEI_context_dict.keys())
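
The keys of `DEI_context_dict` are regular expressions and the values are the categories they report under. The actual matching is done by `word_indices`, which is defined elsewhere in the notebook. The sketch below only illustrates how a few of these keys behave as regex patterns, assuming a plain `re.search` over lowercased text. `find_context_categories` and `toy_context_dict` are hypothetical names.

```python
import re

# A small subset of the DEI context patterns from DEI_context_dict above.
toy_context_dict = {
    ' race ': 'race',
    'wom(e|a)n': 'gender-M/F',
    'under(-)?represented': 'underrepresented',
}

def find_context_categories(text, context_dict):
    """Return the sorted categories whose pattern occurs in the lowercased text."""
    lowered = text.lower()
    return sorted({category for pattern, category in context_dict.items()
                   if re.search(pattern, lowered)})

# Matches 'women' and 'under-represented'
print(find_context_categories('Women remain under-represented on the board.', toy_context_dict))
# ' race ' is padded with spaces, so 'embrace' does not trigger the race category
print(find_context_categories('The embrace of change', toy_context_dict))
```

The space padding in keys such as `' race '` and `' visa '` avoids substring hits inside longer words, at the cost of missing occurrences next to punctuation or at the very start or end of a string.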

## PROXY STATEMENT: Create output dataframe w/ indicator and summary columns
- MODIFY-1: Change the `para_text` assignment so that it covers every column in your dataframe that holds text data. Be sure to call `.lower()` on each to remove uppercase characters.
- MODIFY-2: Depending on the document-information columns in your input dataframe `all_industries_events_master_df`, you may want to concatenate the indicator and summary columns in `ind_df` with those input columns in a different order that makes sense for your data. Name the combined dataframe `full_master_df`.
- MODIFY-3: Insert the `ANY_*` indicator columns for practice term, outcome term, DEI context term, and practice-outcome term co-occurrence into `full_master_df` at the column indices you prefer for the column order of the final CSV.
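
A minimal sketch of MODIFY-1 and MODIFY-3 on a toy dataframe. The column names `title`, `body`, and `ANY_ind`, and the choice of `max` to derive the summary flag, are assumptions for illustration only. Substitute the names and logic that fit your data.

```python
import pandas as pd

# Hypothetical input with two text columns; real column names depend on your data.
toy_df = pd.DataFrame({
    'doc_id': [1, 2],
    'title': ['Board Diversity', 'Executive Pay'],
    'body': ['We value an INCLUSIVE culture.', 'No change this year.'],
    'TAR_ind': [1, 0],
    'PDMD_ind': [0, 0],
})

# MODIFY-1: combine all text columns into one lowercase string per row
text_cols = ['title', 'body']
toy_df['para_text'] = (
    toy_df[text_cols].fillna('').astype(str).apply(' '.join, axis=1).str.lower()
)

# MODIFY-3: insert a summary indicator at a chosen column position
# (1 if any of the listed 0/1 indicator columns is 1)
any_ind = toy_df[['TAR_ind', 'PDMD_ind']].max(axis=1)
toy_df.insert(1, 'ANY_ind', any_ind)

print(toy_df[['doc_id', 'ANY_ind', 'para_text']])
```

`DataFrame.insert` modifies the dataframe in place and raises an error if the column already exists, so re-running the cell requires dropping the column first.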

### Create Indicator Columns Function

In [None]:
def check_columns(row):
    """Build the summary-text and 0/1 indicator columns for one row of paragraph text."""
    input_string = row['para_text'].lower() #MODIFY-1: replace w/ the column names that hold your text data
    word_index_dict = word_indices(input_string, search_words_list, DEI_context_list)
    flag_term_dict = check_cooccur(word_index_dict, terms_dict, DEI_context_list)
    
    flagged_terms = []
    OUTCOME_list = ''
    PRACTICE_list = ''
    TAR_list = ''
    PDMD_list = ''
    CR_list = ''
    IRR_list = ''
    OTHER_list = ''
    DEI_terms_list = ''
    
    TAR_bool = 0
    PDMD_bool = 0
    CR_bool = 0
    IRR_bool = 0
    OTHER_bool = 0

    add_columns = ['PRACTICE_TERMS_FOUND','OUTCOME_TERMS_FOUND','DEI-CONTEXT_TERMS_FOUND','TAR_TERMS_FOUND','PDMD_TERMS_FOUND','CR_TERMS_FOUND','IRR_TERMS_FOUND','OTHER_TERMS_FOUND','TAR_ind','PDMD_ind','CR_ind','IRR_ind','OTHER_ind']
    series_columns = []
    for term in terms_list:
      indicator_col_name = "{}_{}_{}".format(term, terms_category[term][1], terms_category[term][0])
      add_columns.append(indicator_col_name)
      series_columns.append(term)
    for dt in DEI_context_list:
      indicator_col_name = "{}_{}_{}".format(dt, 'DEI-CONTEXT', DEI_context_dict[dt])
      add_columns.append(indicator_col_name)
      series_columns.append(dt)

    for key, value in flag_term_dict.items():
      flagged_terms.append(key)
      
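      # Deduplicate the matches by their string form, then collect the DEI context
      # word of each match (the last element of the match tuple)
      value_unique_str = list(set([str(v) for v in value]))
      DEI_unique = list(set([instance[0][-1] for instance in value]))
      flagged_terms = flagged_terms + DEI_unique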

      if terms_category[key][1] == 'OUTCOME':
        OUTCOME_list = OUTCOME_list + key + ' ('+ terms_category[key][0] + '): '+ '\n'.join(value_unique_str) + '\n\n'
      elif terms_category[key][1] == 'PRACTICE':
        PRACTICE_list = PRACTICE_list + key + ' ('+ terms_category[key][0] + '): '+ '\n'.join(value_unique_str) + '\n\n'

      if terms_category[key][0] == 'Talent-Attraction-Retention':
        TAR_list = TAR_list + key + ' ('+ terms_category[key][1] + '): '+ '\n'.join(value_unique_str) + '\n\n'
        TAR_bool = 1
      elif terms_category[key][0] == 'Product-DMD':
        PDMD_list = PDMD_list + key + ' ('+ terms_category[key][1] + '): '+ '\n'.join(value_unique_str) + '\n\n'
        PDMD_bool = 1
      elif terms_category[key][0] == 'Community-Relations':
        CR_list = CR_list + key + ' ('+ terms_category[key][1] + '): '+ '\n'.join(value_unique_str) + '\n\n'
        CR_bool = 1
      elif terms_category[key][0] == 'Innovation-Risk-Recognition':
        IRR_list = IRR_list + key + ' ('+ terms_category[key][1] + '): '+ '\n'.join(value_unique_str) + '\n\n'
        IRR_bool = 1
      elif terms_category[key][0] == 'Other':
        OTHER_list = OTHER_list + key + ' ('+ terms_category[key][1] + '): '+ '\n'.join(value_unique_str) + '\n\n'
        OTHER_bool = 1

      for DEI_word in DEI_unique:
        DEI_terms_list = DEI_terms_list + DEI_word + ' ('+ DEI_context_dict[DEI_word] + ') '+ ' ['+ key + '], \n'
    
    # 1 if the term or DEI context pattern was flagged in this row, else 0
    flagged_terms_ind = [1 if s in flagged_terms else 0 for s in series_columns]
    
    pre_list = [PRACTICE_list, OUTCOME_list, DEI_terms_list, TAR_list, PDMD_list, CR_list, IRR_list, OTHER_list, TAR_bool, PDMD_bool, CR_bool, IRR_bool, OTHER_bool]
    series_data = pre_list + flagged_terms_ind
    final_series = pd.Series(data=series_data, index =add_columns)
    
    return final_series
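
`check_columns` returns a `pd.Series` whose index holds the new column names. When such a function is passed to `DataFrame.apply(..., axis=1)`, pandas assembles the returned Series into a dataframe with one column per index label. The sketch below shows that mechanism in isolation. `toy_indicators` is a hypothetical, much simpler stand-in that uses plain substring checks rather than the notebook's co-occurrence logic.

```python
import pandas as pd

def toy_indicators(row):
    """Return one summary string and one 0/1 flag per row (toy stand-in for check_columns)."""
    text = row['para_text'].lower()
    found = [w for w in ('divers', 'inclusiv') if w in text]
    return pd.Series(
        data=[', '.join(found), int(bool(found))],
        index=['TERMS_FOUND', 'ANY_ind'],
    )

toy_df = pd.DataFrame({'para_text': ['A diverse and inclusive board', 'Audit fees']})

# Each returned Series becomes one row; its index labels become the columns
toy_out = toy_df.apply(toy_indicators, axis=1)
print(toy_out)
```

The result keeps the row index of the input, which is why it can later be merged back onto the source rows with `left_index=True, right_index=True`.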

### Import paragraph text csv

In [None]:
import os
txt_path = '/content/drive/My Drive/DFG/DEF_para'
DEF20_txtlist = []
DEF20_1df=0
DEF20_2df=0
DEF20_3df=0
DEF20_4df=0
DEF20_5df=0
for file in sorted(os.listdir(txt_path)):  # sort so the file order is reproducible
  if file.startswith('DEF20_') and file.endswith('csv'):
    DEF20_txtlist.append(file)
    if file[6]=='1':
      DEF20_1df+=1
    if file[6]=='2':
      DEF20_2df+=1
    if file[6]=='3':
      DEF20_3df+=1
    if file[6]=='4':
      DEF20_4df+=1
    if file[6]=='5':
      DEF20_5df+=1
    print(file)
print(f'1df has {DEF20_1df} paragraph files')
print(f'2df has {DEF20_2df} paragraph files')
print(f'3df has {DEF20_3df} paragraph files')
print(f'4df has {DEF20_4df} paragraph files')
print(f'5df has {DEF20_5df} paragraph files')

DEF20_1df_para_100.csv
DEF20_1df_para_200.csv
DEF20_1df_para_300.csv
DEF20_1df_para_400.csv
DEF20_1df_para_500.csv
DEF20_1df_para_600.csv
DEF20_1df_para_629.csv
DEF20_2df_para_100.csv
DEF20_2df_para_200.csv
DEF20_2df_para_300.csv
DEF20_2df_para_400.csv
DEF20_2df_para_500.csv
DEF20_2df_para_600.csv
DEF20_2df_para_649.csv
DEF20_3df_para_100.csv
DEF20_3df_para_200.csv
DEF20_3df_para_300.csv
DEF20_3df_para_400.csv
DEF20_3df_para_500.csv
DEF20_3df_para_600.csv
DEF20_3df_para_646.csv
DEF20_4df_para_100.csv
DEF20_4df_para_200.csv
DEF20_4df_para_300.csv
DEF20_4df_para_400.csv
DEF20_4df_para_500.csv
DEF20_4df_para_600.csv
DEF20_4df_para_613.csv
DEF20_5df_para_100.csv
DEF20_5df_para_200.csv
DEF20_5df_para_300.csv
DEF20_5df_para_400.csv
DEF20_5df_para_500.csv
DEF20_5df_para_600.csv
DEF20_5df_para_692.csv
1df has 7 paragraph files
2df has 7 paragraph files
3df has 7 paragraph files
4df has 7 paragraph files
5df has 7 paragraph files
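
The per-dataframe counts printed above could also be tallied with `collections.Counter` instead of five separate counter variables. This is a sketch on a hand-written list of file names (`toy_files` is illustrative, not read from Drive):

```python
from collections import Counter

# A few of the file names printed above, plus one name that should be ignored.
toy_files = ['DEF20_1df_para_100.csv', 'DEF20_1df_para_200.csv',
             'DEF20_2df_para_100.csv', 'notes.txt']

# f[6] is the dataframe number in names like 'DEF20_1df_para_100.csv'
counts = Counter(f[6] for f in toy_files
                 if f.startswith('DEF20_') and f.endswith('.csv'))

for group, n in sorted(counts.items()):
    print(f'{group}df has {n} paragraph files')
```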


In [None]:
# Print the (rows, columns) shape of each paragraph file
for i in DEF20_txtlist:
  print(pd.read_csv(os.path.join(txt_path, i)).shape)

(8683, 5)
(9689, 5)
(6931, 5)
(9031, 5)
(8257, 5)
(9872, 5)
(2861, 5)
(9989, 5)
(7682, 5)
(8661, 5)
(7472, 5)
(9008, 5)
(8624, 5)
(4001, 5)
(8257, 5)
(10994, 5)
(9559, 5)
(7586, 5)
(9035, 5)
(10470, 5)
(4662, 5)
(9125, 5)
(10343, 5)
(8284, 5)
(9713, 5)
(8122, 5)
(10592, 5)
(756, 5)
(8073, 5)
(8588, 5)
(6745, 5)
(7452, 5)
(8468, 5)
(9478, 5)
(6963, 5)


### Apply on Xdf_para_Y00 csv and save indicators dataframe as csv

In [None]:
import pandas as pd
txt_path = '/content/drive/My Drive/DFG/DEF_para'
dft = pd.read_csv(txt_path+'/'+'DEF20_1df_para_300.csv')
dft['para_text'] = dft['para_text'].astype(str)

In [None]:
# change the row index accordingly
dft1 = dft.iloc[8000:9000,:]
# apply the function to create indicators columns
df_indi_col = dft1.apply(check_columns, axis=1)
# match with each row's characteristics accordingly
df_indi_col1 = pd.merge(dft1[['para_index','label', 'industry_id']], df_indi_col,left_index=True, right_index=True)

# Drive is already mounted at /content/drive by the mount cell near the top of the notebook,
# so write the DataFrame to CSV there directly. Change the row index in the file name accordingly.
df_indi_col1.to_csv('/content/drive/My Drive/DFG/DEF_Indicator/DEF20_1df_300_para_8000_9000.csv')
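
The cell above processes one hand-picked range of 1,000 rows at a time. `iloc` slices that run past the end of a dataframe return fewer (or zero) rows without raising an error, so deriving the ranges from the dataframe length can help avoid writing empty or mislabeled files. A minimal sketch follows. `chunk_bounds` and `chunk_filename` are hypothetical helpers, and the chunk size of 1000 simply mirrors the range used above.

```python
import pandas as pd

def chunk_bounds(n_rows, chunk_size=1000):
    """Yield (start, stop) row ranges that cover n_rows without overlap."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)

def chunk_filename(prefix, start, stop):
    """Build an output name following the pattern used in the cell above."""
    return f'{prefix}_para_{start}_{stop}.csv'

# Toy dataframe standing in for one paragraph file
toy_df = pd.DataFrame({'para_text': ['text'] * 2500})

for start, stop in chunk_bounds(len(toy_df)):
    chunk = toy_df.iloc[start:stop, :]
    # In the notebook, this is where check_columns would be applied and the CSV written
    print(chunk_filename('DEF20_1df_300', start, stop), chunk.shape)
```

Note that the last range stops at the actual row count, so the file name reflects the rows it really contains.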

In [None]:
df_indi_col1.head()

Unnamed: 0,PRACTICE_TERMS_FOUND,OUTCOME_TERMS_FOUND,DEI-CONTEXT_TERMS_FOUND,TAR_TERMS_FOUND,PDMD_TERMS_FOUND,CR_TERMS_FOUND,IRR_TERMS_FOUND,OTHER_TERMS_FOUND,TAR_ind,PDMD_ind,CR_ind,IRR_ind,OTHER_ind,promotion-employee_PRACTICE_Talent-Attraction-Retention,discrimination lawsuit_OUTCOME_Talent-Attraction-Retention,protest-employee_OUTCOME_Talent-Attraction-Retention,policies/public commitment-attract_PRACTICE_Talent-Attraction-Retention,hiring/recruitment_PRACTICE_Talent-Attraction-Retention,toxic culture_OUTCOME_Talent-Attraction-Retention,talent retention_OUTCOME_Talent-Attraction-Retention,go public_OUTCOME_Talent-Attraction-Retention,talent attraction_OUTCOME_Talent-Attraction-Retention,whistleblower protection_PRACTICE_Talent-Attraction-Retention,sit-in-employee_OUTCOME_Talent-Attraction-Retention,trust-employee_OUTCOME_Talent-Attraction-Retention,harassment-employee_OUTCOME_Talent-Attraction-Retention,to be liable_OUTCOME_Talent-Attraction-Retention,attrition_OUTCOME_Talent-Attraction-Retention,training-employee-development_PRACTICE_Talent-Attraction-Retention,compensation-equal_PRACTICE_Talent-Attraction-Retention,diverse workforce composition_OUTCOME_Talent-Attraction-Retention,job satisfaction_OUTCOME_Talent-Attraction-Retention,quit/resign_OUTCOME_Talent-Attraction-Retention,retaliation/reprisal_OUTCOME_Talent-Attraction-Retention,programs/initiatives-attract_PRACTICE_Talent-Attraction-Retention,skill shortage/gap_OUTCOME_Talent-Attraction-Retention,strike/walk-out-employee_OUTCOME_Talent-Attraction-Retention,programs/initiatives-retain_PRACTICE_Talent-Attraction-Retention,inclusive culture_OUTCOME_Talent-Attraction-Retention,worker committee_PRACTICE_Talent-Attraction-Retention,unsafe conditions_OUTCOME_Talent-Attraction-Retention,policies/public commitment-retain_PRACTICE_Talent-Attraction-Retention,labor shortage_OUTCOME_Talent-Attraction-Retention,aging workforce_OUTCOME_Talent-Attraction-Retention,worker union_PRACTICE_Talent-Attraction-Retention,high 
school diploma_DEI-CONTEXT_education/skill level,minorities_DEI-CONTEXT_minorit,white_DEI-CONTEXT_race,based on sex_DEI-CONTEXT_gender-M/F,maternity leave_DEI-CONTEXT_gender-M/F,transgender_DEI-CONTEXT_LGBT,divers_DEI-CONTEXT_diverse,sexual orientation_DEI-CONTEXT_LGBT,(service|guard|reserve) member_DEI-CONTEXT_military status,old(er)? _DEI-CONTEXT_age,ageism_DEI-CONTEXT_age,(convict(s)? |formerly convicted|convicted formerly)_DEI-CONTEXT_criminal history,queer_DEI-CONTEXT_LGBT,felon_DEI-CONTEXT_criminal history,national origin_DEI-CONTEXT_nationality,average age_DEI-CONTEXT_age,criminal history_DEI-CONTEXT_criminal history,asexual_DEI-CONTEXT_LGBT,foreigner_DEI-CONTEXT_migrant,(disab(i|l)| abilit(y|ies))_DEI-CONTEXT_(dis)ability,bipoc_DEI-CONTEXT_race,gender_DEI-CONTEXT_gender-M/F,nationalit_DEI-CONTEXT_nationality,(college|undergraduate|graduate) degree_DEI-CONTEXT_education/skill level,youth_DEI-CONTEXT_youth,visa _DEI-CONTEXT_migrant,age bias_DEI-CONTEXT_age,high(-| )income_DEI-CONTEXT_economic status,sexist_DEI-CONTEXT_gender-M/F,factory work_DEI-CONTEXT_factory work,bias_DEI-CONTEXT_bias,non(-)?binary_DEI-CONTEXT_LGBT,citizen_DEI-CONTEXT_migrant,veteran_DEI-CONTEXT_military status,working(-| )class_DEI-CONTEXT_economic status,blackface_DEI-CONTEXT_race,ethnic_DEI-CONTEXT_ethnic,next generation_DEI-CONTEXT_youth,entry(-| )level_DEI-CONTEXT_education/skill level,low(-| )income_DEI-CONTEXT_economic status,poverty_DEI-CONTEXT_economic status,female_DEI-CONTEXT_gender-M/F,background check_DEI-CONTEXT_criminal history,young_DEI-CONTEXT_youth,gender identit_DEI-CONTEXT_LGBT,foreign worker_DEI-CONTEXT_migrant,working (famil|parent|mother|mom|father|dad)_DEI-CONTEXT_familial status,on the basis of sex_DEI-CONTEXT_gender-M/F,(marital status|married)_DEI-CONTEXT_marital status,people of 
colo(u)?r_DEI-CONTEXT_race,gay_DEI-CONTEXT_LGBT,wom(e|a)n_DEI-CONTEXT_gender-M/F,(homo|trans)phobia_DEI-CONTEXT_LGBT,black_DEI-CONTEXT_race,latino_DEI-CONTEXT_race,(low|un|semi|high)(-|ly | |)skill_DEI-CONTEXT_education/skill level,migrant_DEI-CONTEXT_migrant,religio_DEI-CONTEXT_religio,impoverish_DEI-CONTEXT_economic status,sex discrimination_DEI-CONTEXT_gender-M/F,middle(-| )class_DEI-CONTEXT_economic status,under(-)?represented_DEI-CONTEXT_underrepresented,pregnant_DEI-CONTEXT_gender-M/F,inclusiv_DEI-CONTEXT_inclusive,bisexual_DEI-CONTEXT_LGBT,hispanic_DEI-CONTEXT_ethnic,foreign nationals_DEI-CONTEXT_nationality,(indigenous|native(s| (america|population|communit|govern|reservation))|american indian|first nations|trib(al|e)|aborigin)_DEI-CONTEXT_race,supplier contract_DEI-CONTEXT_supplier contract,racist_DEI-CONTEXT_race,(environmental human rights defender|ehrd)_DEI-CONTEXT_advocate,race _DEI-CONTEXT_race,lgbt_DEI-CONTEXT_LGBT,racism_DEI-CONTEXT_race,economic status_DEI-CONTEXT_economic status,economic class_DEI-CONTEXT_economic status,age discrimin_DEI-CONTEXT_age,racial_DEI-CONTEXT_race,lesbian_DEI-CONTEXT_LGBT,minority group_DEI-CONTEXT_minorit,asian_DEI-CONTEXT_race,education level_DEI-CONTEXT_education/skill level
0,,,,,,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,,,,,,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,,"diverse workforce composition (Talent-Attraction-Retention): [('divers', 'leader', 'wom(e|a)n'), [7, 25]]\n[('divers', 'work(er|force|( )?place)', 'divers'), [7, 25]]\n[('divers', ' ceo ', 'divers'), [7, 25]]\n[('divers', ' ceo ', 'gender'), [7, 25]]\n[('divers', 'leader', 'divers'), [7, 25]]\n[('divers', 'leader', 'gender'), [7, 25]]\n[('divers', 'work(er|force|( )?place)', 'wom(e|a)n'), [7, 25]]\n[('divers', 'work(er|force|( )?place)', 'gender'), [7, 25]]\n[('divers', 'leader', 'female'), [7, 25]]\n[('divers', ' ceo ', 'wom(e|a)n'), [7, 25]]\n[('divers', 'leader', 'inclusiv'), [7, 25]]\n\ninclusive culture (Talent-Attraction-Retention): [('inclusive', 'culture', 'divers'), [7, 25]]\n[('inclusive', 'culture', 'inclusiv'), [7, 25]]\n[('inclusive', 'culture', 'female'), [7, 25]]\n\n","gender (gender-M/F) [diverse workforce composition], \nfemale (gender-M/F) [diverse workforce composition], \ndivers (diverse) [diverse workforce composition], \ninclusiv (inclusive) [diverse workforce composition], \nwom(e|a)n (gender-M/F) [diverse workforce composition], \ndivers (diverse) [inclusive culture], \ninclusiv (inclusive) [inclusive culture], \nfemale (gender-M/F) [inclusive culture], \n","diverse workforce composition (OUTCOME): [('divers', 'leader', 'wom(e|a)n'), [7, 25]]\n[('divers', 'work(er|force|( )?place)', 'divers'), [7, 25]]\n[('divers', ' ceo ', 'divers'), [7, 25]]\n[('divers', ' ceo ', 'gender'), [7, 25]]\n[('divers', 'leader', 'divers'), [7, 25]]\n[('divers', 'leader', 'gender'), [7, 25]]\n[('divers', 'work(er|force|( )?place)', 'wom(e|a)n'), [7, 25]]\n[('divers', 'work(er|force|( )?place)', 'gender'), [7, 25]]\n[('divers', 'leader', 'female'), [7, 25]]\n[('divers', ' ceo ', 'wom(e|a)n'), [7, 25]]\n[('divers', 'leader', 'inclusiv'), [7, 25]]\n\ninclusive culture (OUTCOME): [('inclusive', 'culture', 'divers'), [7, 25]]\n[('inclusive', 'culture', 'inclusiv'), [7, 25]]\n[('inclusive', 'culture', 'female'), [7, 
25]]\n\n",,,,,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,,,,,,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,,,,,,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
