# Data Cleaning for Gender IDEAL

Before committing, please restart the kernel and clear all outputs to avoid merge conflicts between Jupyter and GitHub.

## Imports

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import altair as alt

## Reading in CSVs

#### Summary of the columns in the Vision file

* There are 44 columns in Vision & Commitment. Multiple-choice answers are not downloaded with their question label: each answer option becomes its own column, named after the answer text, and the first such column for a question is the first option a respondent could choose.
* If an answer label was already used as a column name for an earlier question, a numeric suffix is appended to it, e.g. None.1 (`pandas.read_csv` de-duplicates repeated column names the same way).
* Tableau prefers data to be "tall and thin", i.e. one row per respondent per question instead of one row per respondent: https://www.tableau.com/about/blog/2018/2/prepare-survey-data-analysis-three-easy-steps-83122. This reshaping can be done in Tableau with its pivot functionality.
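A small illustration of the suffixing described above, using a made-up header row rather than the real export. When `pandas.read_csv` meets the same header label twice, it keeps the first as-is and renames later repeats with `.1`, `.2`, and so on:

```python
import io

import pandas as pd

# Made-up Typeform-style export: two questions both offer a "None" option,
# so the header row contains the label "None" twice.
raw = io.StringIO(
    "organization,Working mothers,None,C-Suite,None\n"
    "Company A,Working mothers,,C-Suite,\n"
    "Company B,,,C-Suite,\n"
)

survey = pd.read_csv(raw)

# The second "None" header comes back as "None.1" so column names stay unique.
print(list(survey.columns))
# ['organization', 'Working mothers', 'None', 'C-Suite', 'None.1']
```

Because the suffix depends only on how many times a label appeared earlier in the header, the same answer text (e.g. Other) can map to different suffixes in different files.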



### Vision
This includes the benchmark questions, such as number of employees, years in operation, industry, and workforce type.

It also includes "other relevant info", which is an open response. Can we remove this?

In [5]:
vision = pd.read_csv("../data_original/vision_2_anon.csv")
vision

Unnamed: 0.1,Unnamed: 0,How many employees does your organization have?,For how many years has your organization been in operation?,What is your organization's industry?,Tell us about your organization's workforce?,"Thanks! You're done with the registration process. Before we jump into the assessment, can you tell us a few other things about your workplace (if relevant).","N/A, No EDI policy in place",Women and people who identify as women,Working mothers,"Women of color, including all races and ethnicities",...,"Vision & Commitment: Does your workplace publish an annual report on its performance on gender equity, diversity and inclusion that includes the following:",Other.2,Womens Empowerment Principles (UN),HRC's Corporate Equality Index,LeanIn.org/McKinsey Workplace Survey,Paradigm for Parity,Gender Equality Index,None.3,Other.3,"Vision & Commitment: Before we move to the next section, is there anything else you want to tell us about how your organization has articulated and reinforces its vision and commitment to gender equity?"
0,Company 1M,Fewer than 100 Employees,15 or more years,541613,At least 80% of employees are Salaried,"Teams also in Canada, Singapore and UK","N/A, No EDI policy in place",,,,...,No EDI report is published,,,,,,,,,
1,Company 1L,Fewer than 100 Employees,5-14 years,813410,We have a mixed workforce of hourly and salari...,Team is entirely based in NY,,,,"Women of color, including all races and ethnic...",...,No EDI report is published,,,,,,,,,
2,Company 1K,Fewer than 100 Employees,15 or more years,624110,At least 80% of employees are Salaried,Massachusetts-focused,,Women and people who identify as women,,,...,No EDI report is published,,,,,,,,,
3,Company 1J,"1000-4,999 Employees",15 or more years,Financial Services,We have a mixed workforce of hourly and salari...,,,Women and people who identify as women,,"Women of color, including all races and ethnic...",...,No EDI report is published,,,,,,,,,
4,Company 1I,Fewer than 100 Employees,5-14 years,813319 - Other Social Advocacy Organizations,At least 80% of employees are Salaried,"Liberia, Malawi, Ethiopia, Uganda. We did not ...","N/A, No EDI policy in place",,,,...,No EDI report is published,,,,,,,,,
5,Company 1H,250-999 Employees,5-14 years,518210,We have a mixed workforce of hourly and salari...,US employees only,"N/A, No EDI policy in place",,,,...,,Internal report of demographic metrics,,,,,,,,
6,Company 1G,Fewer than 100 Employees,Fewer than 5 years,611699,At least 80% of employees are Salaried,,"N/A, No EDI policy in place",,,,...,No EDI report is published,,,,,,,,,The CEO has mentioned to me (the only female i...


In [6]:
df = vision.copy()
df



In [7]:
df.columns

Index(['Unnamed: 0', 'How many employees does your organization have?',
       'For how many years has your organization been in operation?',
       'What is your organization's industry?',
       'Tell us about your organization's workforce?',
       'Thanks! You're done with the registration process.  Before we jump into the assessment, can you tell us a few other things about your workplace (if relevant).',
       'N/A, No EDI policy in place', 'Women and people who identify as women',
       'Working mothers',
       'Women of color, including all races and ethnicities',
       'Women of all types of abilities',
       'Persons of all non-binary gender identities',
       'Women of all sexual orientations', 'Women of all religions',
       'Women of all socio-economic levels',
       'Conducted a demographic analysis of the current workforce',
       'Conducted a baseline culture analysis using employee feedback to identify gaps',
       'Established goals regarding gender equity f

In [9]:
#Renaming some of the columns to be shorter
df.rename(columns={"Unnamed: 0":'organization','How many employees does your organization have?':'number_of_employees','For how many years has your organization been in operation?':'number_of_years',"What is your organization's industry?":'industry',"Tell us about your organization's workforce?":"workforce"},inplace=True)
df

Unnamed: 0,organization,number_of_employees,number_of_years,industry,workforce,"Thanks! You're done with the registration process. Before we jump into the assessment, can you tell us a few other things about your workplace (if relevant).","N/A, No EDI policy in place",Women and people who identify as women,Working mothers,"Women of color, including all races and ethnicities",...,"Vision & Commitment: Does your workplace publish an annual report on its performance on gender equity, diversity and inclusion that includes the following:",Other.2,Womens Empowerment Principles (UN),HRC's Corporate Equality Index,LeanIn.org/McKinsey Workplace Survey,Paradigm for Parity,Gender Equality Index,None.3,Other.3,"Vision & Commitment: Before we move to the next section, is there anything else you want to tell us about how your organization has articulated and reinforces its vision and commitment to gender equity?"
0,Company 1M,Fewer than 100 Employees,15 or more years,541613,At least 80% of employees are Salaried,"Teams also in Canada, Singapore and UK","N/A, No EDI policy in place",,,,...,No EDI report is published,,,,,,,,,
1,Company 1L,Fewer than 100 Employees,5-14 years,813410,We have a mixed workforce of hourly and salari...,Team is entirely based in NY,,,,"Women of color, including all races and ethnic...",...,No EDI report is published,,,,,,,,,
2,Company 1K,Fewer than 100 Employees,15 or more years,624110,At least 80% of employees are Salaried,Massachusetts-focused,,Women and people who identify as women,,,...,No EDI report is published,,,,,,,,,
3,Company 1J,"1000-4,999 Employees",15 or more years,Financial Services,We have a mixed workforce of hourly and salari...,,,Women and people who identify as women,,"Women of color, including all races and ethnic...",...,No EDI report is published,,,,,,,,,
4,Company 1I,Fewer than 100 Employees,5-14 years,813319 - Other Social Advocacy Organizations,At least 80% of employees are Salaried,"Liberia, Malawi, Ethiopia, Uganda. We did not ...","N/A, No EDI policy in place",,,,...,No EDI report is published,,,,,,,,,
5,Company 1H,250-999 Employees,5-14 years,518210,We have a mixed workforce of hourly and salari...,US employees only,"N/A, No EDI policy in place",,,,...,,Internal report of demographic metrics,,,,,,,,
6,Company 1G,Fewer than 100 Employees,Fewer than 5 years,611699,At least 80% of employees are Salaried,,"N/A, No EDI policy in place",,,,...,No EDI report is published,,,,,,,,,The CEO has mentioned to me (the only female i...


In [10]:
#Dropping some of the unneeded columns
df.drop(columns=["Thanks! You're done with the registration process.  Before we jump into the assessment, can you tell us a few other things about your workplace (if relevant).","Vision & Commitment: Before we move to the next section, is there anything else you want to tell us about how your organization has articulated and reinforces its vision and commitment to gender equity?"],inplace=True)
df

Unnamed: 0,organization,number_of_employees,number_of_years,industry,workforce,"N/A, No EDI policy in place",Women and people who identify as women,Working mothers,"Women of color, including all races and ethnicities",Women of all types of abilities,...,"Yes, official public/external statement made about organization's commitment to gender equity","Vision & Commitment: Does your workplace publish an annual report on its performance on gender equity, diversity and inclusion that includes the following:",Other.2,Womens Empowerment Principles (UN),HRC's Corporate Equality Index,LeanIn.org/McKinsey Workplace Survey,Paradigm for Parity,Gender Equality Index,None.3,Other.3
0,Company 1M,Fewer than 100 Employees,15 or more years,541613,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,...,,No EDI report is published,,,,,,,,
1,Company 1L,Fewer than 100 Employees,5-14 years,813410,We have a mixed workforce of hourly and salari...,,,,"Women of color, including all races and ethnic...",Women of all types of abilities,...,,No EDI report is published,,,,,,,,
2,Company 1K,Fewer than 100 Employees,15 or more years,624110,At least 80% of employees are Salaried,,Women and people who identify as women,,,,...,"Yes, official public/external statement made a...",No EDI report is published,,,,,,,,
3,Company 1J,"1000-4,999 Employees",15 or more years,Financial Services,We have a mixed workforce of hourly and salari...,,Women and people who identify as women,,"Women of color, including all races and ethnic...",Women of all types of abilities,...,"Yes, official public/external statement made a...",No EDI report is published,,,,,,,,
4,Company 1I,Fewer than 100 Employees,5-14 years,813319 - Other Social Advocacy Organizations,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,...,,No EDI report is published,,,,,,,,
5,Company 1H,250-999 Employees,5-14 years,518210,We have a mixed workforce of hourly and salari...,"N/A, No EDI policy in place",,,,,...,,,Internal report of demographic metrics,,,,,,,
6,Company 1G,Fewer than 100 Employees,Fewer than 5 years,611699,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,...,,No EDI report is published,,,,,,,,


In [11]:
#Printing out the column names
df.columns

Index(['organization', 'number_of_employees', 'number_of_years', 'industry',
       'workforce', 'N/A, No EDI policy in place',
       'Women and people who identify as women', 'Working mothers',
       'Women of color, including all races and ethnicities',
       'Women of all types of abilities',
       'Persons of all non-binary gender identities',
       'Women of all sexual orientations', 'Women of all religions',
       'Women of all socio-economic levels',
       'Conducted a demographic analysis of the current workforce',
       'Conducted a baseline culture analysis using employee feedback to identify gaps',
       'Established goals regarding gender equity for the whole organization',
       'Developed a timebound roadmap to meet established goals', 'None',
       'Dedicated team responsible for establishing and tracking progress toward gender equity targets',
       'Specific timeline on when targets should be met',
       'Engagement with all staff on targets and their impo

In [12]:
# Rename the answer columns to Q<question number>_<letter>, e.g. Q9_A, where the letter
# is the answer option's position for multiple-selection questions
# Question numbers follow the survey at: https://gender-ideal.org/the-assessment
# The new codes replace the original answer labels as column names
df=df.rename(columns={"N/A, No EDI policy in place":"Q9_A","Women and people who identify as women":"Q9_B",
                   "Working mothers":"Q9_C","Women of color, including all races and ethnicities":"Q9_D",
                  "Women of all types of abilities":"Q9_E","Persons of all non-binary gender identities":"Q9_F",
                  "Women of all sexual orientations":"Q9_G","Women of all religions":"Q9_H",
                   "Women of all socio-economic levels":"Q9_I",
                   "Conducted a demographic analysis of the current workforce"
                   :"Q10_A","Conducted a baseline culture analysis using employee feedback to identify gaps":"Q10_B",
                  "Established goals regarding gender equity for the whole organization":"Q10_C",
                  "Developed a timebound roadmap to meet established goals":"Q10_D","None":"Q10_E",
                   "Dedicated team responsible for establishing and tracking progress toward gender equity targets":
                  "Q11_A","Specific timeline on when targets should be met":"Q11_B",
                   "Engagement with all staff on targets and their importance to the organization":"Q11_C",
                  "Training of all employees who have a direct impact on meeting targets":"Q11_D",
                  "None.1":"Q11_E","Other":"Q11_F","All Hiring Managers":"Q12_A","Human Resources Dept":"Q12_B",
                  "All Senior Management":"Q12_C","C-Suite":"Q12_D","None.2":"Q12_E","Other.1":"Q12_F",
                  "No":"Q13_A","Yes, internal statement made to all employees":"Q13_B",
                  "Yes, official public/external statement made about organization's commitment to gender equity":
                  "Q13_C","Other.2":"Q14_Other",
                   "Vision & Commitment: Does your workplace publish an annual report on its performance on gender " +
                   "equity, diversity and inclusion that includes the following:":"Q14",
                   "Womens Empowerment Principles (UN)":"Q15_A","HRC's Corporate Equality Index":"Q15_B",
                  "LeanIn.org/McKinsey Workplace Survey":"Q15_C","Paradigm for Parity":"Q15_D","Gender Equality Index"
                  :"Q15_E","None.3":"Q15_F","Other.3":"Q15_G"})
df

Unnamed: 0,organization,number_of_employees,number_of_years,industry,workforce,Q9_A,Q9_B,Q9_C,Q9_D,Q9_E,...,Q13_C,Q14,Q14_Other,Q15_A,Q15_B,Q15_C,Q15_D,Q15_E,Q15_F,Q15_G
0,Company 1M,Fewer than 100 Employees,15 or more years,541613,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,...,,No EDI report is published,,,,,,,,
1,Company 1L,Fewer than 100 Employees,5-14 years,813410,We have a mixed workforce of hourly and salari...,,,,"Women of color, including all races and ethnic...",Women of all types of abilities,...,,No EDI report is published,,,,,,,,
2,Company 1K,Fewer than 100 Employees,15 or more years,624110,At least 80% of employees are Salaried,,Women and people who identify as women,,,,...,"Yes, official public/external statement made a...",No EDI report is published,,,,,,,,
3,Company 1J,"1000-4,999 Employees",15 or more years,Financial Services,We have a mixed workforce of hourly and salari...,,Women and people who identify as women,,"Women of color, including all races and ethnic...",Women of all types of abilities,...,"Yes, official public/external statement made a...",No EDI report is published,,,,,,,,
4,Company 1I,Fewer than 100 Employees,5-14 years,813319 - Other Social Advocacy Organizations,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,...,,No EDI report is published,,,,,,,,
5,Company 1H,250-999 Employees,5-14 years,518210,We have a mixed workforce of hourly and salari...,"N/A, No EDI policy in place",,,,,...,,,Internal report of demographic metrics,,,,,,,
6,Company 1G,Fewer than 100 Employees,Fewer than 5 years,611699,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,...,,No EDI report is published,,,,,,,,
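Typing each code by hand is error-prone. One hedged alternative is a hypothetical helper, `question_codes` (not part of the original notebook), that generates the codes from an ordered list of answer labels. It assumes the labels are listed in the same order as the options appear in the survey:

```python
import string

import pandas as pd


def question_codes(question: int, labels: list[str]) -> dict[str, str]:
    """Map answer-column labels to Q<question>_A, Q<question>_B, ... in order.

    Hypothetical helper: assumes `labels` follows the survey's option order.
    """
    if len(labels) > len(string.ascii_uppercase):
        raise ValueError("more than 26 options for one question")
    return {
        label: f"Q{question}_{letter}"
        for letter, label in zip(string.ascii_uppercase, labels)
    }


# The Q12 labels exactly as they appear in the renaming cell above.
q12_map = question_codes(
    12,
    ["All Hiring Managers", "Human Resources Dept", "All Senior Management",
     "C-Suite", "None.2", "Other.1"],
)
print(q12_map["C-Suite"])  # Q12_D

# Usage on an empty frame with those column names.
demo = pd.DataFrame(columns=list(q12_map)).rename(columns=q12_map)
print(list(demo.columns))  # ['Q12_A', 'Q12_B', 'Q12_C', 'Q12_D', 'Q12_E', 'Q12_F']
```

Several per-question maps can be merged with `{**map_a, **map_b}` before a single `df.rename(columns=...)` call.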


In [13]:
#Printing the renamed columns
df.columns

Index(['organization', 'number_of_employees', 'number_of_years', 'industry',
       'workforce', 'Q9_A', 'Q9_B', 'Q9_C', 'Q9_D', 'Q9_E', 'Q9_F', 'Q9_G',
       'Q9_H', 'Q9_I', 'Q10_A', 'Q10_B', 'Q10_C', 'Q10_D', 'Q10_E', 'Q11_A',
       'Q11_B', 'Q11_C', 'Q11_D', 'Q11_E', 'Q11_F', 'Q12_A', 'Q12_B', 'Q12_C',
       'Q12_D', 'Q12_E', 'Q12_F', 'Q13_A', 'Q13_B', 'Q13_C', 'Q14',
       'Q14_Other', 'Q15_A', 'Q15_B', 'Q15_C', 'Q15_D', 'Q15_E', 'Q15_F',
       'Q15_G'],
      dtype='object')

In [14]:
# This column holds the single-select responses, but some rows are NaN
df["Q14"]

0    No EDI report is published
1    No EDI report is published
2    No EDI report is published
3    No EDI report is published
4    No EDI report is published
5                           NaN
6    No EDI report is published
Name: Q14, dtype: object

In [15]:
# The Other column holds the free-text responses
# The rows with a response here are the rows that are NaN above
# Typeform leaves the choice column NaN when a user selects Other and stores the typed answer in this column
df["Q14_Other"]

0                                       NaN
1                                       NaN
2                                       NaN
3                                       NaN
4                                       NaN
5    Internal report of demographic metrics
6                                       NaN
Name: Q14_Other, dtype: object

In [16]:
# Address the NaN/Other issue in Q14:
# relabel Q14 as "Other" wherever a free-text Other response exists.
# .loc sets the values on df itself and avoids the SettingWithCopyWarning raised by chained indexing
df.loc[df["Q14_Other"].notna(), "Q14"] = "Other"
df["Q14"]


0    No EDI report is published
1    No EDI report is published
2    No EDI report is published
3    No EDI report is published
4    No EDI report is published
5                         Other
6    No EDI report is published
Name: Q14, dtype: object

In [17]:
df

Unnamed: 0,organization,number_of_employees,number_of_years,industry,workforce,Q9_A,Q9_B,Q9_C,Q9_D,Q9_E,...,Q13_C,Q14,Q14_Other,Q15_A,Q15_B,Q15_C,Q15_D,Q15_E,Q15_F,Q15_G
0,Company 1M,Fewer than 100 Employees,15 or more years,541613,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,...,,No EDI report is published,,,,,,,,
1,Company 1L,Fewer than 100 Employees,5-14 years,813410,We have a mixed workforce of hourly and salari...,,,,"Women of color, including all races and ethnic...",Women of all types of abilities,...,,No EDI report is published,,,,,,,,
2,Company 1K,Fewer than 100 Employees,15 or more years,624110,At least 80% of employees are Salaried,,Women and people who identify as women,,,,...,"Yes, official public/external statement made a...",No EDI report is published,,,,,,,,
3,Company 1J,"1000-4,999 Employees",15 or more years,Financial Services,We have a mixed workforce of hourly and salari...,,Women and people who identify as women,,"Women of color, including all races and ethnic...",Women of all types of abilities,...,"Yes, official public/external statement made a...",No EDI report is published,,,,,,,,
4,Company 1I,Fewer than 100 Employees,5-14 years,813319 - Other Social Advocacy Organizations,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,...,,No EDI report is published,,,,,,,,
5,Company 1H,250-999 Employees,5-14 years,518210,We have a mixed workforce of hourly and salari...,"N/A, No EDI policy in place",,,,,...,,Other,Internal report of demographic metrics,,,,,,,
6,Company 1G,Fewer than 100 Employees,Fewer than 5 years,611699,At least 80% of employees are Salaried,"N/A, No EDI policy in place",,,,,...,,No EDI report is published,,,,,,,,


### Examining Each Question
#### Question 9

In [20]:
# Select all of the Q9 columns from the dataframe
# Q9 is a choose-all-that-apply question, so each answer option gets its own column
df[['Q9_A', 'Q9_B', 'Q9_C', 'Q9_D', 'Q9_E', 'Q9_F', 'Q9_G', 'Q9_H', 'Q9_I']]

Unnamed: 0,Q9_A,Q9_B,Q9_C,Q9_D,Q9_E,Q9_F,Q9_G,Q9_H,Q9_I
0,"N/A, No EDI policy in place",,,,,,,,
1,,,,"Women of color, including all races and ethnic...",Women of all types of abilities,,,,
2,,Women and people who identify as women,,,,,,,
3,,Women and people who identify as women,,"Women of color, including all races and ethnic...",Women of all types of abilities,Persons of all non-binary gender identities,Women of all sexual orientations,Women of all religions,
4,"N/A, No EDI policy in place",,,,,,,,
5,"N/A, No EDI policy in place",,,,,,,,
6,"N/A, No EDI policy in place",,,,,,,,


It looks like there's at least one response per row, so no missing data.
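The same check can be made programmatic. A minimal sketch, assuming the `Q<n>_<letter>` naming used in this notebook; `rows_without_response` is a hypothetical helper, not part of the original notebook. It flags rows where every answer column for a multi-select question is NaN:

```python
import numpy as np
import pandas as pd


def rows_without_response(frame: pd.DataFrame, question: str) -> list:
    """Return index labels of rows where every `question`_* column is NaN.

    Assumes Q<n>_<letter> naming: question="Q1" matches Q1_A but not Q10_A.
    Single-select columns without an underscore (e.g. Q14) are not included.
    """
    cols = [c for c in frame.columns if str(c).startswith(f"{question}_")]
    answered = frame[cols].notna().any(axis=1)
    return frame.index[~answered].tolist()


# Toy data shaped like the Q11 table (hypothetical values).
toy = pd.DataFrame({
    "Q11_A": ["Dedicated team", np.nan, np.nan],
    "Q11_B": [np.nan, np.nan, "Specific timeline"],
})
print(rows_without_response(toy, "Q11"))  # [1]
```

Calling `rows_without_response(df, "Q9")` on the cleaned frame should return an empty list if the observation above holds.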

#### Question 10

In [22]:
# Select all of the Q10 columns from the dataframe
# Q10 is a choose-all-that-apply question, so each answer option gets its own column
df[['Q10_A', 'Q10_B', 'Q10_C', 'Q10_D', 'Q10_E']]

Unnamed: 0,Q10_A,Q10_B,Q10_C,Q10_D,Q10_E
0,Conducted a demographic analysis of the curren...,Conducted a baseline culture analysis using em...,,,
1,Conducted a demographic analysis of the curren...,,,,
2,Conducted a demographic analysis of the curren...,,,,
3,Conducted a demographic analysis of the curren...,Conducted a baseline culture analysis using em...,,,
4,Conducted a demographic analysis of the curren...,Conducted a baseline culture analysis using em...,Established goals regarding gender equity for ...,Developed a timebound roadmap to meet establis...,
5,Conducted a demographic analysis of the curren...,Conducted a baseline culture analysis using em...,Established goals regarding gender equity for ...,,
6,,Conducted a baseline culture analysis using em...,,,


It looks like there's at least one response per row, so no missing data.

#### Question 11

In [25]:
df[['Q11_A','Q11_B', 'Q11_C', 'Q11_D', 'Q11_E', 'Q11_F']]

Unnamed: 0,Q11_A,Q11_B,Q11_C,Q11_D,Q11_E,Q11_F
0,Dedicated team responsible for establishing an...,,Engagement with all staff on targets and their...,Training of all employees who have a direct im...,,
1,,,,,,
2,,,,,,
3,Dedicated team responsible for establishing an...,,,,,
4,,Specific timeline on when targets should be met,Engagement with all staff on targets and their...,,,Hired an external DEI consultant to help evalu...
5,,,Engagement with all staff on targets and their...,,,
6,,,,,,Engagement with SOME staff on targets and thei...


There is no response for the companies in rows 1 and 2. Will note this in the Python spreadsheet.

In [33]:
# Needs double-checking, but for now recode the missing Q11 answer for row 2 as "None"
# .loc[row, column] sets a single cell; indexing df['Q11_E', 2] does not
df.loc[2, 'Q11_E'] = "None"

In [34]:
# Double-check that row 2 now has None in Q11_E
df[['Q11_A','Q11_B', 'Q11_C', 'Q11_D', 'Q11_E', 'Q11_F']]

Unnamed: 0,Q11_A,Q11_B,Q11_C,Q11_D,Q11_E,Q11_F
0,Dedicated team responsible for establishing an...,,Engagement with all staff on targets and their...,Training of all employees who have a direct im...,,
1,,,,,,
2,,,,,None,
3,Dedicated team responsible for establishing an...,,,,,
4,,Specific timeline on when targets should be met,Engagement with all staff on targets and their...,,,Hired an external DEI consultant to help evalu...
5,,,Engagement with all staff on targets and their...,,,
6,,,,,,Engagement with SOME staff on targets and thei...


#### Question 12

In [26]:
df[['Q12_A', 'Q12_B', 'Q12_C','Q12_D', 'Q12_E', 'Q12_F']]

Unnamed: 0,Q12_A,Q12_B,Q12_C,Q12_D,Q12_E,Q12_F
0,,,,,,
1,,,,,,
2,,,,,,
3,,,,,,
4,All Hiring Managers,,All Senior Management,C-Suite,,
5,,,,,,
6,,,,,,


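Only the company in row 4 selected any answers here; the other rows are blank even though this question offers a None answer (Q12_E). Confirm whether these blanks are missing responses.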

#### Question 13

In [28]:
df[['Q13_A', 'Q13_B', 'Q13_C']]

Unnamed: 0,Q13_A,Q13_B,Q13_C
0,No,,
1,No,,
2,,,"Yes, official public/external statement made a..."
3,,,"Yes, official public/external statement made a..."
4,,"Yes, internal statement made to all employees",
5,No,,
6,,"Yes, internal statement made to all employees",


All companies provided a response for this one.

#### Question 14

In [29]:
df[['Q14','Q14_Other']]

Unnamed: 0,Q14,Q14_Other
0,No EDI report is published,
1,No EDI report is published,
2,No EDI report is published,
3,No EDI report is published,
4,No EDI report is published,
5,Other,Internal report of demographic metrics
6,No EDI report is published,


#### Question 15

In [30]:
df[['Q15_A', 'Q15_B', 'Q15_C', 'Q15_D', 'Q15_E', 'Q15_F','Q15_G']]

Unnamed: 0,Q15_A,Q15_B,Q15_C,Q15_D,Q15_E,Q15_F,Q15_G
0,,,,,,,
1,,,,,,,
2,,,,,,,
3,,,,,,,
4,,,,,,,
5,,,,,,,
6,,,,,,,


In [35]:
#Write out to csv
df.to_csv("vision_clean2.csv")
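The `Unnamed: 0` column in the raw file (the one renamed to `organization`) looks like an index written by an earlier `to_csv` call. By default `DataFrame.to_csv` writes the index as an unnamed first column, which `read_csv` then labels `Unnamed: 0`. A minimal round trip on a toy frame shows the difference `index=False` makes:

```python
import io

import pandas as pd

frame = pd.DataFrame({"organization": ["Company 1M", "Company 1L"]})

# Default: the row index is written as an unnamed first column...
with_index = pd.read_csv(io.StringIO(frame.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'organization']

# ...whereas index=False writes only the data columns.
without_index = pd.read_csv(io.StringIO(frame.to_csv(index=False)))
print(list(without_index.columns))  # ['organization']
```

Passing `index=False` when writing `vision_clean2.csv` is one option if Tableau does not need the row number.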

## Overall Steps
1. Ran notebook code above to generate vision_clean file
2. Dragged file into Tableau
3. Selected all columns that began with "Q" and then selected pivot
4. Renamed the pivoted columns to questions and responses
5. Grouped the Q9_ questions together and renamed the group with the Q9 question label.
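Steps 3 to 5 can also be sketched in pandas, assuming `DataFrame.melt` as a stand-in for Tableau's pivot and deriving the question group from the text before the first underscore (so `Q9_A` becomes `Q9`, and `Q14_Other` becomes `Q14`). The frame below is a tiny hypothetical sample, and dropping unselected options is a choice the Tableau steps above do not make:

```python
import pandas as pd

# Tiny wide frame in the shape produced above (hypothetical values).
wide = pd.DataFrame({
    "organization": ["Company 1M", "Company 1L"],
    "number_of_employees": ["Fewer than 100 Employees"] * 2,
    "Q9_A": ["N/A, No EDI policy in place", None],
    "Q9_D": [None, "Women of color, including all races and ethnicities"],
    "Q13_A": ["No", "No"],
})

# Step 3: pivot every column that starts with "Q" into (column, response) pairs.
q_cols = [c for c in wide.columns if c.startswith("Q")]
tall = wide.melt(
    id_vars=["organization", "number_of_employees"],
    value_vars=q_cols,
    var_name="column",
    value_name="response",
)

# Step 5: group answer columns by question, e.g. "Q9_A" -> "Q9".
tall["question"] = tall["column"].str.split("_").str[0]

# Unselected options are NaN/None in the wide data; drop them from the tall table.
tall = tall.dropna(subset=["response"])
print(tall[["organization", "question", "response"]])
```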

TODO Before Combining:
1. Confirm cleaning approach
2. Confirm naming for vision and edit the name typos.

# Exploring other data files

In [7]:
leader = pd.read_csv("../data_original/leadership_customers_legal_anon.csv")
leader

Unnamed: 0,"Leadership Demographics: Has your organization conducted a review to determine whether the diversity of your workplace reflects the diversity of the localities in which your offices/facilities are located? If you have multiple offices or facilities, your response should capture at least 80% of your workforce.","Leadership Demographics: If yes, how closely does your total workforce reflect the diversity of your local region?",Total Number of Workers,% Workers who identify as Women,% Women who identify as Black or African American,% Women who identify as Latina,% Women who identify as Asian,% Women who identify as Native American/Alaskan/Pacific Islander:,% Women who identify as Mixed Race:,% Women who identify as LGBTQ:,...,Legal Compliance: Does your organization have any litigation regarding harassment claims?,"Legal Compliance: If there is current discrimination or EEOC litigation pending, please provide details..1","Legal Compliance: Over the last 5 years, how much has the organization paid in settlements on discrimination or harassment suits?","Legal Compliance: Describe how and with what frequency your organization ensures compliance with all federal, state and local EEOC mandates.",Legal Compliance: Do employees have a collective bargaining agreement?,Legal Compliance: Has your organization eliminated the use of non-disclosure agreements in settling harassment or discrimination claims?,Legal Compliance: Has your organization eliminated forced mediation requirements from all employee contracts and agreements?,"You've done it! Congratulations on completing the Gender IDEAL Assessment. Before you go, let us know how you would rate the assessment.","Before you go, are there any last bits of feedback you can share? Feedback will help us continuously improve.",organization
0,No,,7,43,0,0,33,0.0,0,0.0,...,0,,0.0,We have an outside manager so this should be a...,0,0,0,9,The survey provided LOTS of food for thought b...,Company 1F
1,No,,83,70,9,18,4,,7,,...,0,,,,0,0,0,8,I think it would be helpful prior to completin...,Company 1E
2,No,,35,45,12,6,0,6.0,0,30.0,...,0,na,0.0,"We are co-employed with a PEO, Insperity, who ...",0,0,0,6,no,Company 1D
3,No,,30,17,3,0,3,0.0,0,0.0,...,0,,0.0,Our County does,1,0,0,8,Easier navigation through the sections.,Company 1C
4,No,,216,38,1,1,6,0.0,0,1.0,...,0,,,ongoing,0,0,0,10,Thank you for creating this survey. DEI is so ...,Company 1B
5,In progress,Minimal to no disparities (ie - the differenti...,8,100,13,13,13,26.0,26,13.0,...,0,,0.0,We have outside audits of our adherence to EEO...,1,1,1,8,"Some of these answers were difficult, as we ar...",Company 1A


In [6]:
leader.columns

Index(['Leadership Demographics: Has your organization conducted a review to determine whether the diversity of your workplace reflects the diversity of the localities in which your offices/facilities are located?  If you have multiple offices or facilities, your response should capture at least 80% of your workforce.',
       'Leadership Demographics: If yes, how closely does your total workforce reflect the diversity of your local region?',
       'Total Number of Workers', '% Workers who identify as Women',
       '% Women who identify as Black or African American',
       '%  Women who identify as Latina', '%  Women who identify as Asian',
       '% Women who identify as Native American/Alaskan/Pacific Islander:',
       '% Women who identify as Mixed Race:', '% Women who identify as LGBTQ:',
       '% Women with disabilities',
       '% Persons who identify as Gender non-binary',
       'Give us feedback - how challenging is it to provide demographic details for your workforce to 

In [52]:
# Drop the feedback, rating, and open-response columns
leader=leader.drop(columns=["Give us feedback - how challenging is it to provide demographic details for your workforce "+
                     "to the degree asked above?","Give us feedback - how challenging is it to provide demographic "+
                     "details for your workforce to the degree asked above?.1","Give us feedback - how challenging "+
                     "is it to provide demographic details for your workforce to the degree asked above?.2",
                    "Give us feedback - how challenging is it to provide demographic details for your workforce to "+
                     "the degree asked above?.3","Give us feedback - how challenging is it to provide demographic "+
                     "details for your workforce to the degree asked above?.4","Give us feedback - how challenging "+
                     "is it to provide demographic details for your workforce to the degree asked above?.5",
                     "Leadership Demographics: Before we move to the next section, is there anything else you want "+
                     "to tell us about how your organization tracks, reports on or seeks to change its leadership "+
                     "demographics?", "You've done it!  Congratulations on completing the Gender IDEAL Assessment.  "+
                     "Before you go, let us know how you would rate the assessment.", "Before you go, are there any "+
                     "last bits of feedback you can share?  Feedback will help us continuously improve."
                    ])
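pandas appends `.1`, `.2`, and so on to repeated column labels, so the repeated feedback columns differ only by suffix. Instead of listing every label, you could select them by prefix. The sketch below is minimal and assumes two things: every column to drop starts with `"Give us feedback"`, and no column you want to keep does. It runs on toy data, not the survey file:

```python
import pandas as pd

FEEDBACK_PREFIX = "Give us feedback"  # assumed common prefix of the repeated question

# Toy frame mimicking the ".1" suffix pandas adds to duplicate headers
df = pd.DataFrame({
    "Total Number of Workers": [100, 250],
    "Give us feedback - how challenging is it?": ["Easy", "Hard"],
    "Give us feedback - how challenging is it?.1": ["Easy", "Easy"],
    "% Workers who identify as Women": [40, 55],
})

# Boolean mask over the column labels: True for every feedback column
is_feedback = df.columns.str.startswith(FEEDBACK_PREFIX)
print(df.columns[is_feedback].tolist())

# Keep only the columns that are not feedback columns
df = df.loc[:, ~is_feedback]
print(df.columns.tolist())
```

The other three free-text columns in the drop cell do not share this prefix, so they would still need to be listed explicitly.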

In [56]:
# Renaming the columns to short question codes
# How to think about Q#s in terms of the different 4 parts of survey?
leader = leader.rename(columns={
    # Q1-Q2: workforce diversity compared with the local region
    ("Leadership Demographics: Has your organization conducted a review to determine whether the "
     "diversity of your workplace reflects the diversity of the localities in which your "
     "offices/facilities are located?  If you have multiple offices or facilities, your response "
     "should capture at least 80% of your workforce."): "Q1",
    ("Leadership Demographics: If yes, how closely does your total workforce reflect the diversity "
     "of your local region?"): "Q2",
    # Q3: total workforce
    "Total Number of Workers": "Q3_A",
    "% Workers who identify as Women": "Q3_B",
    "% Women who identify as Black or African American": "Q3_C",
    "%  Women who identify as Latina": "Q3_D",  # note the double space after %
    "%  Women who identify as Asian": "Q3_E",
    "% Women who identify as Native American/Alaskan/Pacific Islander:": "Q3_F",
    "% Women who identify as Mixed Race:": "Q3_G",
    "% Women who identify as LGBTQ:": "Q3_H",
    "% Women with disabilities": "Q3_I",
    "% Persons who identify as Gender non-binary": "Q3_J",
    # Q4: managers
    "% Managers who identify as Women": "Q4_A",
    "% Women managers who identify as Black or African American": "Q4_B",
    "%  Women managers who identify as Latina": "Q4_C",
    "%  Women managers who identify as Asian": "Q4_D",
    "% Women managers who identify as Native American/Alaskan/Pacific Islander:": "Q4_E",
    "% Women managers who identify as Mixed-Race:": "Q4_F",
    "% Women managers who identify as LGBTQ:": "Q4_G",
    "% Women managers with disabilities": "Q4_H",
    "% managers who identify as Gender non-binary": "Q4_I",
    # Q5: executive management
    "% Executive management who identify as Women": "Q5_A",
    "% Women executive managers who identify as Black or African American": "Q5_B",
    "%  Women executive managers who identify as Latina": "Q5_C",
    "%  Women executive managers who identify as Asian": "Q5_D",
    "% Women executive managers who identify as Native American/Alaskan/Pacific Islander:": "Q5_E",
    "% Women executive managers who identify as Mixed-Race:": "Q5_F",
    "% Women executive managers who identify as LGBTQ:": "Q5_G",
    "% Women executive managers with disabilities": "Q5_H",
    "% executive managers who identify as Gender non-binary": "Q5_I",
    # Q6: C-Suite
    "% C-Suite Level who identify as Women": "Q6_A",
    "% Women C-Suite Level who identify as Black or African American": "Q6_B",
    "%  Women C-Suite Level who identify as Latina": "Q6_C",
    "%  Women C-Suite Level who identify as Asian": "Q6_D",
    "% Women C-Suite Level who identify as Native American/Alaskan/Pacific Islander:": "Q6_E",
    "% Women C-Suite Level who identify as Mixed-Race:": "Q6_F",
    "% Women C-Suite Level who identify as LGBTQ:": "Q6_G",
    "% Women C-Suite Level with disabilities": "Q6_H",
    "% C-Suite Level who identify as Gender non-binary": "Q6_I",
    # Q7: Board of Directors or Governing Body
    "% Board of Directors or Governing Body who identify as Women": "Q7_A",
    "% Women Board of Directors or Governing Body who identify as Black or African American": "Q7_B",
    "%  Women Board of Directors or Governing Body who identify as Latina": "Q7_C",
    "%  Women Board of Directors or Governing Body who identify as Asian": "Q7_D",
    "% Women Board of Directors or Governing Body who identify as Native American/Alaskan/Pacific Islander:": "Q7_E",
    "% Women Board of Directors or Governing Body who identify as Mixed-Race:": "Q7_F",
    "% Women Board of Directors or Governing Body who identify as LGBTQ:": "Q7_G",
    "% Women Board of Directors or Governing Body with disabilities": "Q7_H",
    "% Board of Directors or Governing Body who identify as Gender non-binary": "Q7_I",
    # Q8: ownership
    "Our organization is not-for-profit.": "Q8_A",
    "% Ownership who identify as Women": "Q8_B",
    "% Women Ownership who identify as Black or African American": "Q8_C",
    "%  Women Ownership who identify as Latina": "Q8_D",
    "%  Women Ownership who identify as Asian": "Q8_E",
    "% Women Ownership who identify as Native American/Alaskan/Pacific Islander:": "Q8_F",
    "% Women Ownership who identify as Mixed-Race:": "Q8_G",
    "% Women Ownership who identify as LGBTQ:": "Q8_H",
    "% Women Ownership with disabilities": "Q8_I",
    "% Ownership who identify as Gender non-binary": "Q8_J",
    # Q9: one column per multiple-choice option
    "C-Suite": "Q9_A",
    "Board/Advisory Bodies": "Q9_B",
    "Senior Management": "Q9_C",
    "Management": "Q9_D",
    "Mentorship Programs": "Q9_E",
    "Internal Promotions": "Q9_F",
    "No": "Q9_G",
    # Q10
    ("In the last 12 months, did your organization make progress toward achieving gender "
     "representation targets?"): "Q10",
})
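`DataFrame.rename` silently ignores mapping keys that match no column, because its `errors` parameter defaults to `"ignore"`. A single mistyped label, such as a missing double space or `Mixed Race` versus `Mixed-Race`, would leave that column under its long name with no warning. The minimal sketch below shows two ways to catch this. It uses toy data and a deliberately wrong key:

```python
import pandas as pd

# Toy frame; note the double space after "%" in the second label, as in the survey export
df = pd.DataFrame(columns=["Total Number of Workers", "%  Women who identify as Latina"])

# Deliberate typo for illustration: the second key has only one space after "%"
mapping = {
    "Total Number of Workers": "Q3_A",
    "% Women who identify as Latina": "Q3_D",
}

# Check 1: list the mapping keys that match no existing column
missing = [key for key in mapping if key not in df.columns]
print("unmatched keys:", missing)

# With the default errors="ignore", the mistyped key is skipped without a warning
renamed = df.rename(columns=mapping)
print(renamed.columns.tolist())  # ['Q3_A', '%  Women who identify as Latina']

# Check 2: errors="raise" makes rename fail on unmatched keys instead
rename_failed = False
try:
    df.rename(columns=mapping, errors="raise")
except KeyError as err:
    rename_failed = True
    print("rename failed:", err)
```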

In [57]:
#Write out to CSV
leader.to_csv("leader_clean.csv")
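By default, `to_csv` writes the row index as an unnamed first column. That column comes back as `Unnamed: 0` when the file is read again. The `Unnamed: 0.1` and `Unnamed: 0` columns in the Vision file look like this happening twice. The minimal sketch below shows the round trip with an in-memory buffer instead of a real file:

```python
import io

import pandas as pd

df = pd.DataFrame({"Q1": ["Yes", "No"], "Q3_A": [100, 250]})

# Default (index=True): the index is written and re-read as an extra column
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf).columns.tolist()
print(with_index)  # ['Unnamed: 0', 'Q1', 'Q3_A']

# index=False writes only the data columns
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
without_index = pd.read_csv(buf).columns.tolist()
print(without_index)  # ['Q1', 'Q3_A']
```

If other notebooks already read `leader_clean.csv` with `index_col=0`, switching to `index=False` would change what they receive. Check the downstream readers before changing the write call.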