# Data Cleaning for Gender IDEAL

Before committing, please restart the kernel and clear all outputs to avoid merge conflicts between Jupyter and GitHub.

## Imports

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import altair as alt

## Reading in CSVs

#### Summary of columns in the Vision file

* There are 44 columns in Vision & Commitment. Multiple-choice answers are not exported with their question label. Instead, each answer option is used as a column name, and the first column for a question is the first option a respondent could choose.
* If an option label was already used by an earlier question, the repeated column name gets a numeric suffix, such as `None.1`.
* Tableau prefers survey data to be "tall and thin", i.e., one row per respondent per question instead of one row per respondent (https://www.tableau.com/about/blog/2018/2/prepare-survey-data-analysis-three-easy-steps-83122). This reshaping can be done in Tableau with the Pivot feature.
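The numeric suffix described in the second bullet is most likely added by pandas rather than by the survey export: `pandas.read_csv` renames repeated header names by appending `.1`, `.2`, and so on. A quick check on a made-up header:

```python
import io

import pandas as pd

# Made-up header where two questions both offer "None" and "Other" as options
raw = "None,Other,None,Other\n1,,,x\n"

df_demo = pd.read_csv(io.StringIO(raw))

# read_csv de-duplicates repeated header names by appending ".1", ".2", ...
print(list(df_demo.columns))  # ['None', 'Other', 'None.1', 'Other.1']
```

Because the suffix depends on column order, renaming columns by their suffixed names (such as `None.1` or `Other.2`) assumes the export's column order stays fixed between downloads.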



### Vision
This file includes the benchmark questions: number of employees, years in operation, industry, and workforce type.

It also includes an open-response "other relevant info" question. Can we remove this?

In [4]:
vision = pd.read_csv("../data_original/vision_anon.csv")
vision

Unnamed: 0,How many employees does your organization have?,For how many years has your organization been in operation?,What is your organization's industry?,Tell us about your organization's workforce?,"Thanks! You're done with the registration process. Before we jump into the assessment, can you tell us a few other things about your workplace (if relevant).","N/A, No EDI policy in place",Women and people who identify as women,Working mothers,"Women of color, including all races and ethnicities",Women of all types of abilities,...,"Vision & Commitment: Does your workplace publish an annual report on its performance on gender equity, diversity and inclusion that includes the following:",Other.2,Womens Empowerment Principles (UN),HRC's Corporate Equality Index,LeanIn.org/McKinsey Workplace Survey,Paradigm for Parity,Gender Equality Index,None.3,Other.3,"Vision & Commitment: Before we move to the next section, is there anything else you want to tell us about how your organization has articulated and reinforces its vision and commitment to gender equity?"
0,,,,,,,,,,,...,,,,,,,,,,
1,Fewer than 100 Employees,Fewer than 5 years,541611,At least 80% of employees are Hourly,,,Women and people who identify as women,,"Women of color, including all races and ethnic...",,...,Publish an annual EDI report that includes per...,,Womens Empowerment Principles (UN),,,,,,,
2,250-999 Employees,5-14 years,518210,We have a mixed workforce of hourly and salari...,US employees only,"N/A, No EDI policy in place",,,,,...,,Internal report of demographic metrics,,,,,,,,
3,Fewer than 100 Employees,Fewer than 5 years,611699,At least 80% of employees are Salaried,,"N/A, No EDI policy in place",,,,,...,No EDI report is published,,,,,,,,,The CEO has mentioned to me (the only female i...
4,Fewer than 100 Employees,Fewer than 5 years,523920,At least 80% of employees are Salaried,Not relevant,"N/A, No EDI policy in place",,,,,...,Publish an annual EDI report that includes per...,,,,,,,,,"We have a broader DEI policy, goals, etc. tha..."
5,Fewer than 100 Employees,5-14 years,61,At least 80% of employees are Salaried,No employees outside of the US. Not challengin...,"N/A, No EDI policy in place",,,,,...,No EDI report is published,,,,,,,,,We have a predominantly female workforce - 70...
6,Fewer than 100 Employees,5-14 years,2426347,At least 80% of employees are Salaried,not relevant,"N/A, No EDI policy in place",,,,,...,No EDI report is published,,,,,,,,,
7,Fewer than 100 Employees,Fewer than 5 years,236210,At least 80% of employees are Hourly,All of our employees are in US. We are a mun...,,,,"Women of color, including all races and ethnic...",,...,,"Prepare report on work on EDI, no gender perfo...",,,,,,,,I am completing this from the perspective of o...
8,100-249 Employees,15 or more years,Finance (private equity),At least 80% of employees are Salaried,"We have employees in London, Dublin & Singapor...",,Women and people who identify as women,Working mothers,"Women of color, including all races and ethnic...",Women of all types of abilities,...,No EDI report is published,,,,,,,,,"We have women partners, etc that are members o..."
9,Fewer than 100 Employees,5-14 years,523920,We have a mixed workforce of hourly and salari...,We have one hourly employee currently working ...,,Women and people who identify as women,Working mothers,"Women of color, including all races and ethnic...",Women of all types of abilities,...,No EDI report is published,,Womens Empowerment Principles (UN),,,,,,Gender Equity Now,We reinforce our commitment to gender equity e...


In [5]:
df = vision.copy()
df



In [27]:
df.columns

Index(['How many employees does your organization have?',
       'For how many years has your organization been in operation?',
       'What is your organization's industry?',
       'Tell us about your organization's workforce?',
       'Thanks! You're done with the registration process.  Before we jump into the assessment, can you tell us a few other things about your workplace (if relevant).',
       'N/A, No EDI policy in place', 'Women and people who identify as women',
       'Working mothers',
       'Women of color, including all races and ethnicities',
       'Women of all types of abilities',
       'Persons of all non-binary gender identities',
       'Women of all sexual orientations', 'Women of all religions',
       'Women of all socio-economic levels',
       'Conducted a demographic analysis of the current workforce',
       'Conducted a baseline culture analysis using employee feedback to identify gaps',
       'Established goals regarding gender equity for the whole o

In [28]:
# Add a placeholder letter (A, B, C, ...) as an anonymized company name for each row
df["company_name"] = [chr(ord("A") + i) for i in range(len(df))]

In [29]:
# Rename the benchmark columns to shorter names
df.rename(columns={
    "How many employees does your organization have?": "number_of_employees",
    "For how many years has your organization been in operation?": "number_of_years",
    "What is your organization's industry?": "industry",
    "Tell us about your organization's workforce?": "workforce",
}, inplace=True)
df

Unnamed: 0,number_of_employees,number_of_years,industry,workforce,"Thanks! You're done with the registration process. Before we jump into the assessment, can you tell us a few other things about your workplace (if relevant).","N/A, No EDI policy in place",Women and people who identify as women,Working mothers,"Women of color, including all races and ethnicities",Women of all types of abilities,...,Other.2,Womens Empowerment Principles (UN),HRC's Corporate Equality Index,LeanIn.org/McKinsey Workplace Survey,Paradigm for Parity,Gender Equality Index,None.3,Other.3,"Vision & Commitment: Before we move to the next section, is there anything else you want to tell us about how your organization has articulated and reinforces its vision and commitment to gender equity?",company_name
0,,,,,,,,,,,...,,,,,,,,,,A
1,Fewer than 100 Employees,Fewer than 5 years,541611,At least 80% of employees are Hourly,,,Women and people who identify as women,,"Women of color, including all races and ethnic...",,...,,Womens Empowerment Principles (UN),,,,,,,,B
2,250-999 Employees,5-14 years,518210,We have a mixed workforce of hourly and salari...,US employees only,"N/A, No EDI policy in place",,,,,...,Internal report of demographic metrics,,,,,,,,,C
3,Fewer than 100 Employees,Fewer than 5 years,611699,At least 80% of employees are Salaried,,"N/A, No EDI policy in place",,,,,...,,,,,,,,,The CEO has mentioned to me (the only female i...,D
4,Fewer than 100 Employees,Fewer than 5 years,523920,At least 80% of employees are Salaried,Not relevant,"N/A, No EDI policy in place",,,,,...,,,,,,,,,"We have a broader DEI policy, goals, etc. tha...",E
5,Fewer than 100 Employees,5-14 years,61,At least 80% of employees are Salaried,No employees outside of the US. Not challengin...,"N/A, No EDI policy in place",,,,,...,,,,,,,,,We have a predominantly female workforce - 70...,F
6,Fewer than 100 Employees,5-14 years,2426347,At least 80% of employees are Salaried,not relevant,"N/A, No EDI policy in place",,,,,...,,,,,,,,,,G
7,Fewer than 100 Employees,Fewer than 5 years,236210,At least 80% of employees are Hourly,All of our employees are in US. We are a mun...,,,,"Women of color, including all races and ethnic...",,...,"Prepare report on work on EDI, no gender perfo...",,,,,,,,I am completing this from the perspective of o...,H
8,100-249 Employees,15 or more years,Finance (private equity),At least 80% of employees are Salaried,"We have employees in London, Dublin & Singapor...",,Women and people who identify as women,Working mothers,"Women of color, including all races and ethnic...",Women of all types of abilities,...,,,,,,,,,"We have women partners, etc that are members o...",I
9,Fewer than 100 Employees,5-14 years,523920,We have a mixed workforce of hourly and salari...,We have one hourly employee currently working ...,,Women and people who identify as women,Working mothers,"Women of color, including all races and ethnic...",Women of all types of abilities,...,,Womens Empowerment Principles (UN),,,,,,Gender Equity Now,We reinforce our commitment to gender equity e...,J


In [30]:
# Drop the two open-response columns, which are not needed here
df.drop(columns=[
    "Thanks! You're done with the registration process.  Before we jump into the assessment, can you tell us a few other things about your workplace (if relevant).",
    "Vision & Commitment: Before we move to the next section, is there anything else you want to tell us about how your organization has articulated and reinforces its vision and commitment to gender equity?",
], inplace=True)
df

Unnamed: 0,number_of_employees,number_of_years,industry,workforce,"N/A, No EDI policy in place",Women and people who identify as women,Working mothers,"Women of color, including all races and ethnicities",Women of all types of abilities,Persons of all non-binary gender identities,...,"Vision & Commitment: Does your workplace publish an annual report on its performance on gender equity, diversity and inclusion that includes the following:",Other.2,Womens Empowerment Principles (UN),HRC's Corporate Equality Index,LeanIn.org/McKinsey Workplace Survey,Paradigm for Parity,Gender Equality Index,None.3,Other.3,company_name
0,,,,,,,,,,,...,,,,,,,,,,A
1,Fewer than 100 Employees,Fewer than 5 years,541611,At least 80% of employees are Hourly,,Women and people who identify as women,,"Women of color, including all races and ethnic...",,,...,Publish an annual EDI report that includes per...,,Womens Empowerment Principles (UN),,,,,,,B
2,250-999 Employees,5-14 years,518210,We have a mixed workforce of hourly and salari...,"N/A, No EDI policy in place",,,,,,...,,Internal report of demographic metrics,,,,,,,,C
3,Fewer than 100 Employees,Fewer than 5 years,611699,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,,...,No EDI report is published,,,,,,,,,D
4,Fewer than 100 Employees,Fewer than 5 years,523920,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,,...,Publish an annual EDI report that includes per...,,,,,,,,,E
5,Fewer than 100 Employees,5-14 years,61,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,,...,No EDI report is published,,,,,,,,,F
6,Fewer than 100 Employees,5-14 years,2426347,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,,...,No EDI report is published,,,,,,,,,G
7,Fewer than 100 Employees,Fewer than 5 years,236210,At least 80% of employees are Hourly,,,,"Women of color, including all races and ethnic...",,,...,,"Prepare report on work on EDI, no gender perfo...",,,,,,,,H
8,100-249 Employees,15 or more years,Finance (private equity),At least 80% of employees are Salaried,,Women and people who identify as women,Working mothers,"Women of color, including all races and ethnic...",Women of all types of abilities,Persons of all non-binary gender identities,...,No EDI report is published,,,,,,,,,I
9,Fewer than 100 Employees,5-14 years,523920,We have a mixed workforce of hourly and salari...,,Women and people who identify as women,Working mothers,"Women of color, including all races and ethnic...",Women of all types of abilities,Persons of all non-binary gender identities,...,No EDI report is published,,Womens Empowerment Principles (UN),,,,,,Gender Equity Now,J


In [31]:
#Printing out the column names
df.columns

Index(['number_of_employees', 'number_of_years', 'industry', 'workforce',
       'N/A, No EDI policy in place', 'Women and people who identify as women',
       'Working mothers',
       'Women of color, including all races and ethnicities',
       'Women of all types of abilities',
       'Persons of all non-binary gender identities',
       'Women of all sexual orientations', 'Women of all religions',
       'Women of all socio-economic levels',
       'Conducted a demographic analysis of the current workforce',
       'Conducted a baseline culture analysis using employee feedback to identify gaps',
       'Established goals regarding gender equity for the whole organization',
       'Developed a timebound roadmap to meet established goals', 'None',
       'Dedicated team responsible for establishing and tracking progress toward gender equity targets',
       'Specific timeline on when targets should be met',
       'Engagement with all staff on targets and their importance to the or

In [32]:
# Rename answer columns to <question number>_<option letter> (e.g., Q9_A) for
# multiple-selection questions; the single-select question keeps just its number (Q14).
# Question numbers follow the survey at: https://gender-ideal.org/the-assessment
df = df.rename(columns={
    "N/A, No EDI policy in place": "Q9_A",
    "Women and people who identify as women": "Q9_B",
    "Working mothers": "Q9_C",
    "Women of color, including all races and ethnicities": "Q9_D",
    "Women of all types of abilities": "Q9_E",
    "Persons of all non-binary gender identities": "Q9_F",
    "Women of all sexual orientations": "Q9_G",
    "Women of all religions": "Q9_H",
    "Women of all socio-economic levels": "Q9_I",
    "Conducted a demographic analysis of the current workforce": "Q10_A",
    "Conducted a baseline culture analysis using employee feedback to identify gaps": "Q10_B",
    "Established goals regarding gender equity for the whole organization": "Q10_C",
    "Developed a timebound roadmap to meet established goals": "Q10_D",
    "None": "Q10_E",
    "Dedicated team responsible for establishing and tracking progress toward gender equity targets": "Q11_A",
    "Specific timeline on when targets should be met": "Q11_B",
    "Engagement with all staff on targets and their importance to the organization": "Q11_C",
    "Training of all employees who have a direct impact on meeting targets": "Q11_D",
    "None.1": "Q11_E",
    "Other": "Q11_F",
    "All Hiring Managers": "Q12_A",
    "Human Resources Dept": "Q12_B",
    "All Senior Management": "Q12_C",
    "C-Suite": "Q12_D",
    "None.2": "Q12_E",
    "Other.1": "Q12_F",
    "No": "Q13_A",
    "Yes, internal statement made to all employees": "Q13_B",
    "Yes, official public/external statement made about organization's commitment to gender equity": "Q13_C",
    "Vision & Commitment: Does your workplace publish an annual report on its performance on gender "
    "equity, diversity and inclusion that includes the following:": "Q14",
    "Other.2": "Q14_Other",
    "Womens Empowerment Principles (UN)": "Q15_A",
    "HRC's Corporate Equality Index": "Q15_B",
    "LeanIn.org/McKinsey Workplace Survey": "Q15_C",
    "Paradigm for Parity": "Q15_D",
    "Gender Equality Index": "Q15_E",
    "None.3": "Q15_F",
    "Other.3": "Q15_G",
})
df

Unnamed: 0,number_of_employees,number_of_years,industry,workforce,Q9_A,Q9_B,Q9_C,Q9_D,Q9_E,Q9_F,...,Q14,Q14_Other,Q15_A,Q15_B,Q15_C,Q15_D,Q15_E,Q15_F,Q15_G,company_name
0,,,,,,,,,,,...,,,,,,,,,,A
1,Fewer than 100 Employees,Fewer than 5 years,541611,At least 80% of employees are Hourly,,Women and people who identify as women,,"Women of color, including all races and ethnic...",,,...,Publish an annual EDI report that includes per...,,Womens Empowerment Principles (UN),,,,,,,B
2,250-999 Employees,5-14 years,518210,We have a mixed workforce of hourly and salari...,"N/A, No EDI policy in place",,,,,,...,,Internal report of demographic metrics,,,,,,,,C
3,Fewer than 100 Employees,Fewer than 5 years,611699,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,,...,No EDI report is published,,,,,,,,,D
4,Fewer than 100 Employees,Fewer than 5 years,523920,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,,...,Publish an annual EDI report that includes per...,,,,,,,,,E
5,Fewer than 100 Employees,5-14 years,61,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,,...,No EDI report is published,,,,,,,,,F
6,Fewer than 100 Employees,5-14 years,2426347,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,,...,No EDI report is published,,,,,,,,,G
7,Fewer than 100 Employees,Fewer than 5 years,236210,At least 80% of employees are Hourly,,,,"Women of color, including all races and ethnic...",,,...,,"Prepare report on work on EDI, no gender perfo...",,,,,,,,H
8,100-249 Employees,15 or more years,Finance (private equity),At least 80% of employees are Salaried,,Women and people who identify as women,Working mothers,"Women of color, including all races and ethnic...",Women of all types of abilities,Persons of all non-binary gender identities,...,No EDI report is published,,,,,,,,,I
9,Fewer than 100 Employees,5-14 years,523920,We have a mixed workforce of hourly and salari...,,Women and people who identify as women,Working mothers,"Women of color, including all races and ethnic...",Women of all types of abilities,Persons of all non-binary gender identities,...,No EDI report is published,,Womens Empowerment Principles (UN),,,,,,Gender Equity Now,J
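Typing every option into the rename dictionary by hand is error-prone. A minimal sketch of generating the `<question>_<letter>` codes from an ordered list of option columns follows; `option_codes` is a hypothetical helper, and it assumes the options are listed in survey order so that the first option gets `A`.

```python
import string

import pandas as pd


def option_codes(question: str, options: list[str]) -> dict[str, str]:
    """Map each answer-option column name to '<question>_<letter>'.

    Hypothetical helper: assumes `options` is in survey order, so the
    first option becomes '<question>_A', the second '<question>_B', etc.
    """
    if len(options) > len(string.ascii_uppercase):
        raise ValueError("more than 26 options; letters A-Z would run out")
    return {opt: f"{question}_{letter}"
            for letter, opt in zip(string.ascii_uppercase, options)}


# Option columns for Q12, in the order used in the rename cell above
q12_options = ["All Hiring Managers", "Human Resources Dept",
               "All Senior Management", "C-Suite", "None.2", "Other.1"]
q12 = option_codes("Q12", q12_options)

# Apply the generated mapping to an empty frame with the same column names
demo = pd.DataFrame(columns=q12_options).rename(columns=q12)
print(list(demo.columns))  # ['Q12_A', 'Q12_B', 'Q12_C', 'Q12_D', 'Q12_E', 'Q12_F']
```

Mappings for several questions can be merged with `{**a, **b}` and passed to `df.rename(columns=...)` in a single call.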


In [33]:
#Printing the renamed columns
df.columns

Index(['number_of_employees', 'number_of_years', 'industry', 'workforce',
       'Q9_A', 'Q9_B', 'Q9_C', 'Q9_D', 'Q9_E', 'Q9_F', 'Q9_G', 'Q9_H', 'Q9_I',
       'Q10_A', 'Q10_B', 'Q10_C', 'Q10_D', 'Q10_E', 'Q11_A', 'Q11_B', 'Q11_C',
       'Q11_D', 'Q11_E', 'Q11_F', 'Q12_A', 'Q12_B', 'Q12_C', 'Q12_D', 'Q12_E',
       'Q12_F', 'Q13_A', 'Q13_B', 'Q13_C', 'Q14', 'Q14_Other', 'Q15_A',
       'Q15_B', 'Q15_C', 'Q15_D', 'Q15_E', 'Q15_F', 'Q15_G', 'company_name'],
      dtype='object')

In [34]:
# Add the topic name as a column (the scalar is broadcast to every row)
df["topic"] = "Vision & Commitment"
df

Unnamed: 0,number_of_employees,number_of_years,industry,workforce,Q9_A,Q9_B,Q9_C,Q9_D,Q9_E,Q9_F,...,Q14_Other,Q15_A,Q15_B,Q15_C,Q15_D,Q15_E,Q15_F,Q15_G,company_name,topic
0,,,,,,,,,,,...,,,,,,,,,A,Vision & Commitment
1,Fewer than 100 Employees,Fewer than 5 years,541611,At least 80% of employees are Hourly,,Women and people who identify as women,,"Women of color, including all races and ethnic...",,,...,,Womens Empowerment Principles (UN),,,,,,,B,Vision & Commitment
2,250-999 Employees,5-14 years,518210,We have a mixed workforce of hourly and salari...,"N/A, No EDI policy in place",,,,,,...,Internal report of demographic metrics,,,,,,,,C,Vision & Commitment
3,Fewer than 100 Employees,Fewer than 5 years,611699,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,,...,,,,,,,,,D,Vision & Commitment
4,Fewer than 100 Employees,Fewer than 5 years,523920,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,,...,,,,,,,,,E,Vision & Commitment
5,Fewer than 100 Employees,5-14 years,61,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,,...,,,,,,,,,F,Vision & Commitment
6,Fewer than 100 Employees,5-14 years,2426347,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,,...,,,,,,,,,G,Vision & Commitment
7,Fewer than 100 Employees,Fewer than 5 years,236210,At least 80% of employees are Hourly,,,,"Women of color, including all races and ethnic...",,,...,"Prepare report on work on EDI, no gender perfo...",,,,,,,,H,Vision & Commitment
8,100-249 Employees,15 or more years,Finance (private equity),At least 80% of employees are Salaried,,Women and people who identify as women,Working mothers,"Women of color, including all races and ethnic...",Women of all types of abilities,Persons of all non-binary gender identities,...,,,,,,,,,I,Vision & Commitment
9,Fewer than 100 Employees,5-14 years,523920,We have a mixed workforce of hourly and salari...,,Women and people who identify as women,Working mothers,"Women of color, including all races and ethnic...",Women of all types of abilities,Persons of all non-binary gender identities,...,,Womens Empowerment Principles (UN),,,,,,Gender Equity Now,J,Vision & Commitment


In [35]:
# Q14 is single-select, but some rows contain NaN
df["Q14"]

0                                                  NaN
1    Publish an annual EDI report that includes per...
2                                                  NaN
3                           No EDI report is published
4    Publish an annual EDI report that includes per...
5                           No EDI report is published
6                           No EDI report is published
7                                                  NaN
8                           No EDI report is published
9                           No EDI report is published
Name: Q14, dtype: object

In [36]:
# Q14_Other holds the free-text answers typed in for "Other".
# Rows with a value here are NaN in Q14 above: when a respondent selects
# "Other", Typeform leaves Q14 empty and stores the typed answer in this column.
df["Q14_Other"]

0                                                  NaN
1                                                  NaN
2               Internal report of demographic metrics
3                                                  NaN
4                                                  NaN
5                                                  NaN
6                                                  NaN
7    Prepare report on work on EDI, no gender perfo...
8                                                  NaN
9                                                  NaN
Name: Q14_Other, dtype: object

In [37]:
# Fix the NaN-for-"Other" issue in Q14:
# relabel Q14 as "Other" wherever Q14_Other has a typed answer.
# A single .loc assignment avoids the chained-assignment SettingWithCopyWarning.
df.loc[df["Q14_Other"].notna(), "Q14"] = "Other"
df["Q14"]

0                                                  NaN
1    Publish an annual EDI report that includes per...
2                                                Other
3                           No EDI report is published
4    Publish an annual EDI report that includes per...
5                           No EDI report is published
6                           No EDI report is published
7                                                Other
8                           No EDI report is published
9                           No EDI report is published
Name: Q14, dtype: object
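The same relabeling can be wrapped in a small function and checked on toy data. This is a sketch: `relabel_other` is a hypothetical name, the toy values are made up, and the function returns a modified copy rather than changing its input in place.

```python
import numpy as np
import pandas as pd


def relabel_other(frame: pd.DataFrame, question: str, other_col: str) -> pd.DataFrame:
    """Return a copy where `question` is "Other" wherever `other_col` has text.

    Uses one .loc assignment (row mask + column label), which writes to the
    frame itself instead of to a possibly temporary intermediate Series.
    """
    out = frame.copy()
    out.loc[out[other_col].notna(), question] = "Other"
    return out


# Toy data shaped like Q14 / Q14_Other above (values are made up)
toy = pd.DataFrame({
    "Q14": [np.nan, "No EDI report is published", np.nan],
    "Q14_Other": [np.nan, np.nan, "Internal report of demographic metrics"],
})
fixed = relabel_other(toy, "Q14", "Q14_Other")
print(fixed["Q14"].tolist())  # [nan, 'No EDI report is published', 'Other']
```

Rows that are empty in both columns (like row 0 in the Vision data) stay NaN, so genuinely blank responses are not mislabeled as "Other".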

In [39]:
df



In [38]:
# Write out to CSV (index=False: company_name already identifies each row)
df.to_csv("vision_clean.csv", index=False)

## Overall Steps
1. Ran the notebook code above to generate vision_clean.csv.
2. Dragged the file into Tableau.
3. Selected all columns beginning with "Q" and chose Pivot.
4. Renamed the two pivoted columns to question and response.
5. Grouped the Q9_ questions and renamed the group to the Q9 question label.
6. 
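Steps 3 to 5 can also be approximated in pandas, which is handy for checking the reshaped data outside Tableau. A minimal sketch on made-up data, assuming the question columns are exactly those whose names start with "Q"; `DataFrame.melt` plays the role of Tableau's Pivot, and the grouping rule (text before the first underscore, so `Q9_A` maps to `Q9`) is an assumption, not the Tableau group itself.

```python
import pandas as pd

# Tiny stand-in for vision_clean.csv (made-up values, a few columns only)
wide = pd.DataFrame({
    "company_name": ["B", "C"],
    "topic": ["Vision & Commitment"] * 2,
    "Q9_A": [None, "N/A, No EDI policy in place"],
    "Q9_B": ["Women and people who identify as women", None],
    "Q14": ["Publish an annual EDI report", "Other"],
})

# Step 3: pivot every column starting with "Q" into (question, response) rows
q_cols = [c for c in wide.columns if c.startswith("Q")]
tall = wide.melt(id_vars=["company_name", "topic"], value_vars=q_cols,
                 var_name="question", value_name="response")

# Step 5: group sub-options such as Q9_A, Q9_B under their parent question Q9
tall["question_group"] = tall["question"].str.split("_").str[0]

print(tall.shape)  # (6, 5)
print(tall["question_group"].unique().tolist())  # ['Q9', 'Q14']
```

Under this rule `Q14_Other` would also land in the `Q14` group, which keeps the typed "Other" answers next to the single-select question they belong to.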

TODO before combining:
1. Confirm the cleaning approach.
2. Confirm the naming for Vision and fix the name typos.

## Benefits_inclusion

In [44]:
benefits = pd.read_csv("benefits_inclusion_anon.csv")
benefits

Unnamed: 0,Benefits & Policies: Does your organization provide a universal family leave policy to all employees?,Less than 12 weeks for primary caregiver,Less than 12 weeks for secondary caregiver,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,26-51 weeks for primary caregiver,26-51 for secondary caregiver,52+ weeks for primary caregiver,52+ weeks for secondary caregiver,None for primary caregiver,...,No action taken yet,"Inclusion, Culture, Training & Community: If a retention action plan has been in place for at least 12 months, have the retention goals been met?","Inclusion, Culture, Training & Community: Is there a written policy that sets expectations for gender equity and diversity representation at all internal and external-facing organization events or events at which the organization presents?","Inclusion, Culture, Training & Community: Is there a written policy about the expected gender equity and diversity alignment standards that all philanthropic, community or other partner organizations would be expected to meet?","Inclusion, Culture, Training & Community: Does your organization evaluate and confirm alignment on gender equity and diversity standards with potential recipients of philanthropy, community or business partners?","Inclusion, Culture, Training & Community: What % of annual corporate giving - covering both philanthropic or operating dollars (if relevant) - is targeted toward initiatives that have a gender equity and diversity focus?","Inclusion, Culture, Training & Community: Has your organization supported state, local, or federal-level gender equity initiatives? 
(ie - Business Coalition for Equality, Equal Rights Amendment, funding for reproductive health access).","Inclusion, Culture, Training & Community: Has your organization partnered with other organizations in your industry to advocate for improved standards and practices related to gender equity?","Inclusion, Culture, Training & Community: Before we move to the next section, is there anything else you want to tell us about how your organization approaches creating a gender diverse, inclusive culture and community?",organization
0,1,Less than 12 weeks for primary caregiver,Less than 12 weeks for secondary caregiver,,,,,,,,...,,N/A (no action plan in place),0,0,0,0.0,No,No,Guidance on all of this would be most welcome!,Compnay 1F
1,1,,,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,,,,,,...,,N/A (no action plan in place),0,0,1,0.0,not sure,,,Company 1E
2,1,,,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,,,,,,...,,N/A (no action plan in place),0,0,0,100.0,no,no,no,Company 1D
3,1,Less than 12 weeks for primary caregiver,Less than 12 weeks for secondary caregiver,,,,,,,,...,,N/A (no action plan in place),0,0,1,10.0,no,no,no,Company 1C
4,0,,Less than 12 weeks for secondary caregiver,12-25 weeks for primary caregiver,,,,,,,...,,N/A (no action plan in place),0,1,1,,,"Yes, Women In Private Equity",Imbedding gender equity and inclusiveness into...,Company 1B
5,1,,,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,,,,,,...,,N/A (no action plan in place),1,1,1,75.0,"Yes, we regularly support any initiative that ...",Yes,We have made it our mission to support a gende...,Company 1A
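The family-leave answers above follow the export convention described at the top of the notebook: each answer option is its own column, and a cell holds the option text when it was chosen and is empty otherwise. A possible next step is to turn those columns into 0/1 flags that can be counted. This is a sketch under that assumption; `to_indicators` is a hypothetical helper and the two-row frame is illustrative (it copies the choices shown for Company 1C and Company 1D).

```python
import numpy as np
import pandas as pd

def to_indicators(frame, option_cols):
    """Return a copy in which each answer-option column is 1 if selected, else 0.

    Assumes the column name is the answer option and a cell holds that option's
    text when the respondent chose it, NaN otherwise.
    """
    out = frame.copy()
    out[option_cols] = out[option_cols].notna().astype(int)
    return out

# Illustrative two-row frame shaped like the family-leave answers above.
opts = ["Less than 12 weeks for primary caregiver", "12-25 weeks for primary caregiver"]
demo = pd.DataFrame({
    opts[0]: [opts[0], np.nan],
    opts[1]: [np.nan, opts[1]],
    "organization": ["Company 1C", "Company 1D"],
})
flags = to_indicators(demo, opts)
print(flags)
```

With 0/1 columns, `flags[opts].sum()` counts how many organizations picked each option.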


In [48]:
recruit = pd.read_csv("Recruitment_compensation_anon.csv")
recruit

Unnamed: 0,"Recruitment, Promotion & Pipeline: Has your organization performed an organization-wide review to identify teams that have no or low gender diversity (ie - ""the lonely only"" - one woman, one female of color, one LGBTQ-identifying person, one woman who is differently-abled)?","Recruitment, Promotion & Pipeline: If a review has been conducted, what have been the results?\n","Recruitment, Promotion & Pipeline: Have you designed and implemented a recruitment process to eliminate gender equity bias? \n",Statement about gender equity goals in recruitment on all job postings,Process in-place (with designated and trained teammates or usage of technology) to review and revise job descriptions prior to posting to ensure gender-biased language is eliminated,"Move toward skills-based competencies and growth mindsets and away from focus on certain schools, prior experiences","Requirement to post jobs on a diverse group of job boards, recruiting firms, etc to ensure opportunities are accessible to a diverse candidate pool",Blind resume review and blind auditions (as applicable),Training for HR and all hiring managers on how to identify and reduce gender equity biases throughout the recruitment process,Oversight and support by trained HR or other designated-internal teams to ensure unbiased recruitment process,...,Women who identify as White/Caucasian.1,Women who identify as Disabled.1,Women who identify as LGBTQ.1,Persons who identify as gender non-binary.1,None.5,Give us feedback - how challenging is it to provide demographic details for your workforce to the degree asked above?,"Compensation and Pay Equity: If your organization has a stock-option program or employee ownership program, have you evaluated whether shares have been allocated equitably across genders?","Compensation & Pay Equity: Is the lowest paid woman - part-time, full-time or on contract - paid a living wage?","Compensation & Pay Equity: Before we move to the next section, is there anything else 
you want to tell us about how your organization tracks, reports on or is working to improve its compensation and pay equity structures?",organization
0,No,,"Process has been designed, not yet implemented",,Process in-place (with designated and trained ...,,Requirement to post jobs on a diverse group of...,,Training for HR and all hiring managers on how...,,...,,,,,,Again - size of company makes some of these qu...,No evaluation conducted,Yes,No - but more suggestions welcome!,Company 1F
1,In progress,,Process has been designed and implemented,,Process in-place (with designated and trained ...,Move toward skills-based competencies and grow...,Requirement to post jobs on a diverse group of...,,Training for HR and all hiring managers on how...,Oversight and support by trained HR or other d...,...,,,,,,disaggregating race within female/male for the...,N/A (no plan),Yes,still plan to do an equity audit taking into a...,Company 1E
2,Yes,No low-diversity teams were identified.,Process has been designed and implemented,,Process in-place (with designated and trained ...,Move toward skills-based competencies and grow...,Requirement to post jobs on a diverse group of...,,Training for HR and all hiring managers on how...,Oversight and support by trained HR or other d...,...,,,,,,easy!,No evaluation conducted,Yes,,Company1D
3,No,,No process has been designed yet,,,,,,,,...,,,,,,"Difficult, our payscale is set and increases a...",N/A (no plan),No evaluation conducted,No,Company1C
4,Yes,A hiring and promotion plan is in place to add...,Process has been designed and implemented,,Process in-place (with designated and trained ...,,Requirement to post jobs on a diverse group of...,,,,...,Women who identify as White/Caucasian,,Women who identify as LGBTQ,,,Not difficult.,No evaluation conducted,Yes,,Company1B
5,Yes,No low-diversity teams were identified.,Process has been designed and implemented,Statement about gender equity goals in recruit...,Process in-place (with designated and trained ...,Move toward skills-based competencies and grow...,Requirement to post jobs on a diverse group of...,,,,...,,,,,,Some percentage of our employees do not wish t...,N/A (no plan),No evaluation conducted,"We are a small team, and many of the questions...",Company1A
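The `organization` labels are not consistent across files: the benefits output uses `Company 1D` while this file uses `Company1D`, and one row reads `Compnay 1F`. Joining the files on `organization` as-is would miss those matches. A minimal sketch of a normalizer, assuming every anonymized label is a (possibly misspelled) word starting with "co" followed by an ID such as `1D`; `normalize_org` is a hypothetical helper.

```python
import re
import pandas as pd

# Assumption: labels look like "Company 1D", possibly with a typo or a missing space.
ORG_PATTERN = re.compile(r"\s*co[a-z]*\s*(\d+[a-z]*)\s*", flags=re.IGNORECASE)

def normalize_org(label):
    """Map variants like 'Company1D' or 'Compnay 1F' to 'Company 1D' / 'Company 1F'.

    Non-strings (e.g. NaN) are returned unchanged; strings that don't match the
    assumed pattern are only stripped of surrounding whitespace.
    """
    if not isinstance(label, str):
        return label
    m = ORG_PATTERN.fullmatch(label)
    return f"Company {m.group(1).upper()}" if m else label.strip()

orgs = pd.Series(["Compnay 1F", "Company 1E", "Company1D", " Company 1C ", None])
cleaned = orgs.map(normalize_org)
print(cleaned.tolist())
```

The same function could be applied to the `organization` column of each DataFrame (for example `recruit["organization"].map(normalize_org)`) before any merge.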


In [46]:
leader = pd.read_csv("leadership_customers_legal_anon.csv")
leader

Unnamed: 0,"Leadership Demographics: Has your organization conducted a review to determine whether the diversity of your workplace reflects the diversity of the localities in which your offices/facilities are located? If you have multiple offices or facilities, your response should capture at least 80% of your workforce.","Leadership Demographics: If yes, how closely does your total workforce reflect the diversity of your local region?",Total Number of Workers,% Workers who identify as Women,% Women who identify as Black or African American,% Women who identify as Latina,% Women who identify as Asian,% Women who identify as Native American/Alaskan/Pacific Islander:,% Women who identify as Mixed Race:,% Women who identify as LGBTQ:,...,Legal Compliance: Does your organization have any litigation regarding harassment claims?,"Legal Compliance: If there is current discrimination or EEOC litigation pending, please provide details..1","Legal Compliance: Over the last 5 years, how much has the organization paid in settlements on discrimination or harassment suits?","Legal Compliance: Describe how and with what frequency your organization ensures compliance with all federal, state and local EEOC mandates.",Legal Compliance: Do employees have a collective bargaining agreement?,Legal Compliance: Has your organization eliminated the use of non-disclosure agreements in settling harassment or discrimination claims?,Legal Compliance: Has your organization eliminated forced mediation requirements from all employee contracts and agreements?,"You've done it! Congratulations on completing the Gender IDEAL Assessment. Before you go, let us know how you would rate the assessment.","Before you go, are there any last bits of feedback you can share? Feedback will help us continuously improve.",organization
0,No,,7,43,0,0,33,0.0,0,0.0,...,0,,0.0,We have an outside manager so this should be a...,0,0,0,9,The survey provided LOTS of food for thought b...,Company 1F
1,No,,83,70,9,18,4,,7,,...,0,,,,0,0,0,8,I think it would be helpful prior to completin...,Company 1E
2,No,,35,45,12,6,0,6.0,0,30.0,...,0,na,0.0,"We are co-employed with a PEO, Insperity, who ...",0,0,0,6,no,Company 1D
3,No,,30,17,3,0,3,0.0,0,0.0,...,0,,0.0,Our County does,1,0,0,8,Easier navigation through the sections.,Company 1C
4,No,,216,38,1,1,6,0.0,0,1.0,...,0,,,ongoing,0,0,0,10,Thank you for creating this survey. DEI is so ...,Company 1B
5,In progress,Minimal to no disparities (ie - the differenti...,8,100,13,13,13,26.0,26,13.0,...,0,,0.0,We have outside audits of our adherence to EEO...,1,1,1,8,"Some of these answers were difficult, as we ar...",Company 1A
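The leadership answers are self-reported percentages, and some cells are blank. Before charting them it may help to flag values that are not on a 0-100 scale. This sketch assumes every column whose name starts with `%` is a percentage from 0 to 100; `out_of_range_pct` is a hypothetical helper, and the demo values (including the 138) are made up for illustration.

```python
import pandas as pd

def out_of_range_pct(frame, lo=0, hi=100):
    """Return (row index, column, value) for '%' columns with values outside [lo, hi].

    Non-numeric entries are coerced to NaN and therefore not reported.
    """
    pct_cols = [c for c in frame.columns if str(c).startswith("%")]
    problems = []
    for col in pct_cols:
        values = pd.to_numeric(frame[col], errors="coerce")
        bad = values[(values < lo) | (values > hi)]
        problems.extend((idx, col, val) for idx, val in bad.items())
    return problems

demo = pd.DataFrame({
    "Total Number of Workers": [7, 216],          # a count, not a percentage: skipped
    "% Workers who identify as Women": [43, 138],  # 138 is a deliberately bad value
    "% Women with disabilities": ["n/a", 1],       # text is coerced to NaN and ignored
})
print(out_of_range_pct(demo))
```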


In [49]:
leader.columns

Index(['Leadership Demographics: Has your organization conducted a review to determine whether the diversity of your workplace reflects the diversity of the localities in which your offices/facilities are located?  If you have multiple offices or facilities, your response should capture at least 80% of your workforce.',
       'Leadership Demographics: If yes, how closely does your total workforce reflect the diversity of your local region?',
       'Total Number of Workers', '% Workers who identify as Women',
       '% Women who identify as Black or African American',
       '%  Women who identify as Latina', '%  Women who identify as Asian',
       '% Women who identify as Native American/Alaskan/Pacific Islander:',
       '% Women who identify as Mixed Race:', '% Women who identify as LGBTQ:',
       '% Women with disabilities',
       '% Persons who identify as Gender non-binary',
       'Give us feedback - how challenging is it to provide demographic details for your workforce to 
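The column listing shows that some names contain double spaces (`'%  Women who identify as Latina'`), and some recruitment headers end in a newline. The `drop` and `rename` calls only work if their strings match these names exactly. A small sketch, assuming plain `re` whitespace collapsing is acceptable; `squash_ws` and `irregular_whitespace` are hypothetical helpers that report which names would change.

```python
import re
import pandas as pd

def squash_ws(name):
    """Collapse runs of whitespace (including newlines) to one space and trim the ends."""
    return re.sub(r"\s+", " ", str(name)).strip()

def irregular_whitespace(columns):
    """List the column names that squash_ws would change."""
    return [c for c in columns if squash_ws(c) != c]

cols = pd.Index([
    "% Women who identify as Black or African American",
    "%  Women who identify as Latina",
    "Recruitment, Promotion & Pipeline: If a review has been conducted, what have been the results?\n",
])
print(irregular_whitespace(cols))
```

If the columns are normalized (`df.columns = [squash_ws(c) for c in df.columns]`), the keys used in `drop` and `rename` would need the same treatment, e.g. `{squash_ws(k): v for k, v in mapping.items()}`, and it is worth checking that no two names collapse to the same string.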

In [52]:
# Drop the open-ended feedback columns. The same feedback question follows several
# blocks of the survey, and read_csv de-duplicates repeated headers by appending
# ".1", ".2", ... to the name, so there are six variants of it.
feedback_q = ("Give us feedback - how challenging is it to provide demographic details "
              "for your workforce to the degree asked above?")
feedback_cols = [feedback_q] + [f"{feedback_q}.{i}" for i in range(1, 6)]
leader = leader.drop(columns=feedback_cols + [
    ("Leadership Demographics: Before we move to the next section, is there anything else you want "
     "to tell us about how your organization tracks, reports on or seeks to change its leadership "
     "demographics?"),
    # Note the double spaces: they are part of the exported column names.
    ("You've done it!  Congratulations on completing the Gender IDEAL Assessment.  "
     "Before you go, let us know how you would rate the assessment."),
    ("Before you go, are there any last bits of feedback you can share?  "
     "Feedback will help us continuously improve."),
])
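Listing every feedback column by its full text is brittle: a new `.6` copy or a reworded prompt would be missed. An alternative sketch selects columns by prefix instead. The prefixes are taken from the column names dropped in the cell above; `FEEDBACK_PREFIXES` and `drop_by_prefix` are hypothetical names, and the demo frame is illustrative.

```python
import pandas as pd

# Prefixes of the open-ended / feedback prompts (assumed stable across exports).
FEEDBACK_PREFIXES = (
    "Give us feedback",
    "Leadership Demographics: Before we move to the next section",
    "You've done it!",
    "Before you go, are there any",
)

def drop_by_prefix(frame, prefixes):
    """Drop every column whose name starts with one of the given prefixes."""
    to_drop = [c for c in frame.columns if str(c).startswith(prefixes)]
    return frame.drop(columns=to_drop)

demo = pd.DataFrame(columns=[
    "Total Number of Workers",
    "Give us feedback - how challenging is it to provide demographic details for your workforce to the degree asked above?",
    "Give us feedback - how challenging is it to provide demographic details for your workforce to the degree asked above?.1",
    "Before you go, are there any last bits of feedback you can share?  Feedback will help us continuously improve.",
    "organization",
])
print(drop_by_prefix(demo, FEEDBACK_PREFIXES).columns.tolist())
# ['Total Number of Workers', 'organization']
```

The trade-off is that a too-broad prefix can remove columns you meant to keep, so it is worth printing the list of matched columns once before relying on it.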

In [56]:
# Rename the long survey questions to short question IDs
# How should the Q numbers map onto the four parts of the survey?
leader = leader.rename(columns={
    # Q1-Q2: review of workforce diversity against the local region
    ("Leadership Demographics: Has your organization conducted a review to determine whether the diversity "
     "of your workplace reflects the diversity of the localities in which your offices/facilities are located?  "
     "If you have multiple offices or facilities, your response should capture at least 80% of your workforce."): "Q1",
    ("Leadership Demographics: If yes, how closely does your total workforce reflect the diversity "
     "of your local region?"): "Q2",
    # Q3: total workforce
    "Total Number of Workers": "Q3_A",
    "% Workers who identify as Women": "Q3_B",
    "% Women who identify as Black or African American": "Q3_C",
    "%  Women who identify as Latina": "Q3_D",
    "%  Women who identify as Asian": "Q3_E",
    "% Women who identify as Native American/Alaskan/Pacific Islander:": "Q3_F",
    "% Women who identify as Mixed Race:": "Q3_G",
    "% Women who identify as LGBTQ:": "Q3_H",
    "% Women with disabilities": "Q3_I",
    "% Persons who identify as Gender non-binary": "Q3_J",
    # Q4: managers
    "% Managers who identify as Women": "Q4_A",
    "% Women managers who identify as Black or African American": "Q4_B",
    "%  Women managers who identify as Latina": "Q4_C",
    "%  Women managers who identify as Asian": "Q4_D",
    "% Women managers who identify as Native American/Alaskan/Pacific Islander:": "Q4_E",
    "% Women managers who identify as Mixed-Race:": "Q4_F",
    "% Women managers who identify as LGBTQ:": "Q4_G",
    "% Women managers with disabilities": "Q4_H",
    "% managers who identify as Gender non-binary": "Q4_I",
    # Q5: executive management
    "% Executive management who identify as Women": "Q5_A",
    "% Women executive managers who identify as Black or African American": "Q5_B",
    "%  Women executive managers who identify as Latina": "Q5_C",
    "%  Women executive managers who identify as Asian": "Q5_D",
    "% Women executive managers who identify as Native American/Alaskan/Pacific Islander:": "Q5_E",
    "% Women executive managers who identify as Mixed-Race:": "Q5_F",
    "% Women executive managers who identify as LGBTQ:": "Q5_G",
    "% Women executive managers with disabilities": "Q5_H",
    "% executive managers who identify as Gender non-binary": "Q5_I",
    # Q6: C-Suite
    "% C-Suite Level who identify as Women": "Q6_A",
    "% Women C-Suite Level who identify as Black or African American": "Q6_B",
    "%  Women C-Suite Level who identify as Latina": "Q6_C",
    "%  Women C-Suite Level who identify as Asian": "Q6_D",
    "% Women C-Suite Level who identify as Native American/Alaskan/Pacific Islander:": "Q6_E",
    "% Women C-Suite Level who identify as Mixed-Race:": "Q6_F",
    "% Women C-Suite Level who identify as LGBTQ:": "Q6_G",
    "% Women C-Suite Level with disabilities": "Q6_H",
    "% C-Suite Level who identify as Gender non-binary": "Q6_I",
    # Q7: Board of Directors or Governing Body
    "% Board of Directors or Governing Body who identify as Women": "Q7_A",
    "% Women Board of Directors or Governing Body who identify as Black or African American": "Q7_B",
    "%  Women Board of Directors or Governing Body who identify as Latina": "Q7_C",
    "%  Women Board of Directors or Governing Body who identify as Asian": "Q7_D",
    "% Women Board of Directors or Governing Body who identify as Native American/Alaskan/Pacific Islander:": "Q7_E",
    "% Women Board of Directors or Governing Body who identify as Mixed-Race:": "Q7_F",
    "% Women Board of Directors or Governing Body who identify as LGBTQ:": "Q7_G",
    "% Women Board of Directors or Governing Body with disabilities": "Q7_H",
    "% Board of Directors or Governing Body who identify as Gender non-binary": "Q7_I",
    # Q8: ownership
    "Our organization is not-for-profit.": "Q8_A",
    "% Ownership who identify as Women": "Q8_B",
    "% Women Ownership who identify as Black or African American": "Q8_C",
    "%  Women Ownership who identify as Latina": "Q8_D",
    "%  Women Ownership who identify as Asian": "Q8_E",
    "% Women Ownership who identify as Native American/Alaskan/Pacific Islander:": "Q8_F",
    "% Women Ownership who identify as Mixed-Race:": "Q8_G",
    "% Women Ownership who identify as LGBTQ:": "Q8_H",
    "% Women Ownership with disabilities": "Q8_I",
    "% Ownership who identify as Gender non-binary": "Q8_J",
    # Q9: multiple-choice answer columns (the column name is the answer option)
    "C-Suite": "Q9_A",
    "Board/Advisory Bodies": "Q9_B",
    "Senior Management": "Q9_C",
    "Management": "Q9_D",
    "Mentorship Programs": "Q9_E",
    "Internal Promotions": "Q9_F",
    "No": "Q9_G",
    # Q10: progress toward representation targets
    ("In the last 12 months, did your organization make progress toward achieving gender "
     "representation targets?"): "Q10",
})
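`DataFrame.rename` ignores mapping keys that are not existing columns (its default is `errors="ignore"`), so a key with one space instead of two, or a missing trailing colon, would silently leave the long name in place. A sketch of a stricter wrapper that reports every unmatched key; `rename_strict` is a hypothetical helper. Passing `errors="raise"` to `rename` is the built-in alternative.

```python
import pandas as pd

def rename_strict(frame, mapping):
    """Rename columns, raising KeyError that lists every mapping key not found."""
    missing = [k for k in mapping if k not in frame.columns]
    if missing:
        raise KeyError(f"{len(missing)} rename key(s) not found: {missing}")
    return frame.rename(columns=mapping)

demo = pd.DataFrame(columns=["%  Women who identify as Latina", "Total Number of Workers"])
ok = rename_strict(demo, {"%  Women who identify as Latina": "Q3_D",
                          "Total Number of Workers": "Q3_A"})
print(ok.columns.tolist())  # ['Q3_D', 'Q3_A']

caught = None
try:
    # Single space after '%': does not match the exported name, so this raises.
    rename_strict(demo, {"% Women who identify as Latina": "Q3_D"})
except KeyError as err:
    caught = err
print(caught)
```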

In [57]:
# Write out to CSV; index=False keeps the row index from becoming another unnamed column
leader.to_csv("leader_clean.csv", index=False)
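The input files all start with an `Unnamed: 0` column, which is how `read_csv` names a blank header; this is consistent with those files having been written with the default `index=True`. The small round trip below shows where the column comes from and two ways to avoid it. The frame is illustrative.

```python
import io
import pandas as pd

frame = pd.DataFrame({"Q1": ["No", "In progress"], "organization": ["Company 1F", "Company 1A"]})

# Default to_csv writes the index under a blank header...
with_index = frame.to_csv()
print(pd.read_csv(io.StringIO(with_index)).columns.tolist())
# ['Unnamed: 0', 'Q1', 'organization']

# ...which can be avoided on the way out (index=False) or absorbed on the way in (index_col=0).
no_index_cols = pd.read_csv(io.StringIO(frame.to_csv(index=False))).columns.tolist()
index_col_cols = pd.read_csv(io.StringIO(with_index), index_col=0).columns.tolist()
print(no_index_cols, index_col_cols)
```

For the files already on disk, `pd.read_csv(path, index_col=0)` or `df.drop(columns="Unnamed: 0")` would remove the extra column.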

In [None]:
# TODO: add a "topic" column with the value "Leadership Demographics" for questions Q1-Q10
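One reading of the note in the cell above is to reshape the cleaned leadership data into the tall-and-thin layout Tableau prefers (see the note at the top of the notebook) and tag each Q1-Q10 row with its section. The sketch assumes `organization` identifies the respondent and that the renamed IDs look like `Q1`, `Q3_A`, ..., `Q10`; `LEADERSHIP_Q` and `tag_topic_long` are hypothetical names. Columns that don't match the pattern (such as `Unnamed: 0` or the Legal Compliance questions, which still carry their full text) are left out.

```python
import re
import pandas as pd

# Question IDs produced by the rename cell: Q1, Q2, Q3_A ... Q10 (assumed format).
LEADERSHIP_Q = re.compile(r"Q(?:[1-9]|10)(?:_[A-Z])?")

def tag_topic_long(frame, pattern, topic, id_col="organization"):
    """Melt the question columns matching `pattern` to one row per (respondent, question)
    and label every row with the given survey section name."""
    q_cols = [c for c in frame.columns if pattern.fullmatch(str(c))]
    long = frame.melt(id_vars=[id_col], value_vars=q_cols,
                      var_name="question", value_name="answer")
    long["topic"] = topic
    return long

demo = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Q1": ["No", "In progress"],
    "Q3_A": [7, 8],
    "Legal Compliance: Do employees have a collective bargaining agreement?": [0, 1],
    "organization": ["Company 1F", "Company 1A"],
})
tall = tag_topic_long(demo, LEADERSHIP_Q, "Leadership Demographics")
print(tall)
```

Other sections could be tagged the same way with their own patterns and concatenated with `pd.concat` before export.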
