## TODO

- feature engineer age from DOB
- feature engineer time from date charge was filed to date sent to EEOC
- feature engineer case type from basis (race, age, etc.)

## Introduction: Injustice at Work

Our data source explores the relationship between the attributes of complainants and their complaints in employment discrimination charges and the outcome of each charge.

Because we are working on our personal machines, we use "X" rows chosen at random to represent the full dataset. The original dataset is available here: https://github.com/PublicI/employment-discrimination/blob/master/data/complaints_10.txt
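
As a minimal sketch of how such a random subset could be drawn with pandas: the helper names `sample_rows` and `sample_cases`, the seed, and the toy frame below are illustrative assumptions, and the row count is a placeholder because "X" has not been fixed yet. Since a case can span several rows (one per related charge), the second helper samples whole cases by `unique_id` rather than individual rows.

```python
import pandas as pd

def sample_rows(df, n_rows, seed=0):
    """Draw n_rows rows uniformly at random, without replacement."""
    return df.sample(n=n_rows, random_state=seed)

def sample_cases(df, n_cases, seed=0, id_col="unique_id"):
    """Draw n_cases case IDs at random and keep every row of those cases."""
    ids = df[id_col].drop_duplicates().sample(n=n_cases, random_state=seed)
    return df[df[id_col].isin(ids)]

# Toy stand-in for the full dataset: 500 cases with 2 charges each (made-up values)
full = pd.DataFrame({
    "unique_id": [i // 2 for i in range(1000)],
    "closure_code": ["M3", "N2"] * 500,
})

row_subset = sample_rows(full, n_rows=100)     # 100 individual rows
case_subset = sample_cases(full, n_cases=50)   # all rows of 50 cases
```

A fixed `random_state` makes the subset reproducible across machines, which matters when several people work from the same sample.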

According to the Injustice at Work Center, each attribute is defined as follows:

- Unique ID: unique identifier for each case (a case is a collection of related charges)
- State Code: complainant state
- No of Employees Code: code indicating the approximate number of employees working for respondent employer
- No of Employees: approximate number of employees working for respondent employer
- NAICS Code: North American Industry Classification System code of respondent employer
- NAICS Description: North American Industry Classification System description of respondent company (e.g., crude petroleum and natural gas extraction)
- Institution Type Code: classification code of respondent employer
- Institution Type: classification of respondent employer (e.g., private employer)
- CP Date of Birth: complainant’s date of birth
- CP Sex: complainant’s gender
- Date First Office: date charge was filed
- Date FEPA Sent to EEOC: date charge was forwarded to the EEOC
- Closure Date: date investigation of case was closed
- Closure Code: code indicating how case was closed
- Closure Type: description indicating how case was closed (e.g., no cause finding issued)
- Monetary Benefits: monetary benefit complainant received
- Statute Code: code for statute under which charge was filed
- Statute: statute under which charge was filed (e.g., Americans with Disabilities Act)
- Basis Code: code for basis of discrimination
- Basis: basis of discrimination (e.g., race-black/African American)
- Issue Code: type code for adverse action alleged by complainant
- Issue: adverse action alleged by complainant (e.g., harassment)
- Court Filing Date: date complainant filed a lawsuit
- Civil Action Number: case number of lawsuit
- Court: court in which lawsuit was filed
- Litigation Resolution Date: date lawsuit was resolved
- Litigation Monetary Benefits: monetary damages recovered through lawsuit
- Litigation Case Type: case type of lawsuit

Our analysis aims to classify charges by "Closure Code" (or possibly "Closure Type"; this is still undecided). We have identified the following attributes as candidate predictors:

- State Code: complainant state
- No of Employees Code: code indicating the approximate number of employees working for respondent employer
- NAICS Code: North American Industry Classification System code of respondent employer
- Institution Type Code: classification code of respondent employer
- CP Date of Birth: complainant’s date of birth *
- CP Sex: complainant’s gender
- Date First Office: date charge was filed *
- Date FEPA Sent to EEOC: date charge was forwarded to the EEOC *
- Statute Code: code for statute under which charge was filed
- Basis Code: code for basis of discrimination
- Issue Code: type code for adverse action alleged by complainant
- Litigation Case Type: case type of lawsuit
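
The split into predictors and target can be sketched as follows. This is only an illustration: it uses the snake_case column names that this notebook assigns when loading the file, assumes "Closure Code" is the target (that choice is still open), and the helper `split_features_target` and the two toy rows are hypothetical.

```python
import pandas as pd

TARGET = "closure_code"  # assumption: could also be "closure_action"
FEATURES = [
    "state_code", "num_employee_code", "naics_code", "inst_type_code",
    "birth_date", "sex", "date_filed", "date_sent_eeoc",
    "statute_code", "basis_code", "issue_code", "litigation_case_type",
]

def split_features_target(df):
    """Drop rows without a target value, then return (X, y)."""
    labeled = df.dropna(subset=[TARGET])
    return labeled[FEATURES], labeled[TARGET]

# Two toy rows with only the columns needed here
toy = pd.DataFrame({
    "state_code": ["MD", "AL"],
    "num_employee_code": ["B", "U"],
    "naics_code": [611110.0, None],
    "inst_type_code": ["E", "E"],
    "birth_date": ["11/13/66", "05/02/79"],
    "sex": ["N", "F"],
    "date_filed": ["03/03/10", "02/03/10"],
    "date_sent_eeoc": [None, None],
    "statute_code": ["T", "T"],
    "basis_code": ["OR", "OR"],
    "issue_code": ["T2", "D2"],
    "litigation_case_type": [None, None],
    "closure_code": ["M3", "N2"],
})

X, y = split_features_target(toy)
```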


```python
import pandas as pd

pd.set_option("display.max_columns", 50)
pd.set_option("display.max_rows", 100)
pd.options.display.float_format = "{:,.2f}".format
```

```python
# Load the tab-separated complaints file, skipping its header row and
# assigning our own snake_case column names.
data = pd.read_csv("complaints_10.txt", sep="\t", skiprows=1,
                   dtype={"state_code": str},  # keep state codes as strings
                   names=["unique_id", "state_code", "num_employee_code", "num_employees",
                          "naics_code", "naics_desc", "inst_type_code", "inst_type",
                          "birth_date", "sex", "date_filed", "date_sent_eeoc", "date_closed",
                          "closure_code", "closure_action", "monetary_benefits", "statute_code",
                          "statute", "basis_code", "basis", "issue_code", "issue",
                          "court_filing_date", "civil_action_num", "court", "resolution_date",
                          "litigation_monetary_benefits", "litigation_case_type"])
data.head()
```

Unnamed: 0,unique_id,state_code,num_employee_code,num_employees,naics_code,naics_desc,inst_type_code,inst_type,birth_date,sex,date_filed,date_sent_eeoc,date_closed,closure_code,closure_action,monetary_benefits,statute_code,statute,basis_code,basis,issue_code,issue,court_filing_date,civil_action_num,court,resolution_date,litigation_monetary_benefits,litigation_case_type
0,3288411.72,MD,B,101 - 200 Employees,611110.0,Elementary and Secondary Schools,E,Private Employer,11/13/66,N,03/03/10,,03/03/10,M3,No Cause Finding Issued,,T,Title VII,OR,Retaliation,T2,Terms/Conditions,,,,,,
1,3288411.72,MD,B,101 - 200 Employees,611110.0,Elementary and Secondary Schools,E,Private Employer,11/13/66,N,03/03/10,,03/03/10,M3,No Cause Finding Issued,,A,ADEA,OR,Retaliation,T2,Terms/Conditions,,,,,,
2,3288411.72,MD,B,101 - 200 Employees,611110.0,Elementary and Secondary Schools,E,Private Employer,11/13/66,N,03/03/10,,03/03/10,M3,No Cause Finding Issued,,A,ADEA,OA,Age,T2,Terms/Conditions,,,,,,
3,3593445.5595,AL,U,Unknown Number Of Employees,,,E,Private Employer,05/02/79,F,02/03/10,,07/30/10,N2,NRTS Issued At CP Request,,T,Title VII,OR,Retaliation,D2,Discharge,,,,,,
4,3593445.5595,AL,U,Unknown Number Of Employees,,,E,Private Employer,05/02/79,F,02/03/10,,07/30/10,N2,NRTS Issued At CP Request,,T,Title VII,OR,Retaliation,B3,Benefits-Insurance,,,,,,
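
The three feature-engineering items in the TODO list could be approached along these lines. This is a sketch under stated assumptions, not a settled design: the helper `add_features`, the output column names, and the toy rows are hypothetical, dates are assumed to be in `MM/DD/YY` form as in the preview above, and "case type" is taken to be the part of `basis` before the first hyphen (so a basis like "Race-Black/African American" becomes "Race"). One pitfall worth noting: two-digit years 00 to 68 parse as 20xx, so a birth date such as `11/13/66` lands in 2066 and has to be shifted back a century.

```python
import pandas as pd

def add_features(df):
    """Add age at filing, days from filing to EEOC referral, and case type."""
    out = df.copy()
    fmt = "%m/%d/%y"
    born = pd.to_datetime(out["birth_date"], format=fmt, errors="coerce")
    filed = pd.to_datetime(out["date_filed"], format=fmt, errors="coerce")
    sent = pd.to_datetime(out["date_sent_eeoc"], format=fmt, errors="coerce")

    # A birth date later than the filing date means the century was
    # guessed wrong (e.g. "66" read as 2066), so move it back 100 years.
    wrong_century = born > filed
    born = born.where(~wrong_century, born - pd.DateOffset(years=100))

    out["age_at_filing"] = (filed - born).dt.days / 365.25  # fractional years
    out["days_to_eeoc"] = (sent - filed).dt.days            # NaN if never sent
    out["case_type"] = out["basis"].str.split("-").str[0].str.strip()
    return out

# Toy rows loosely based on the preview above; the second "date_sent_eeoc"
# value and the second "basis" value are made up for illustration.
sample = pd.DataFrame({
    "birth_date": ["11/13/66", "05/02/79"],
    "date_filed": ["03/03/10", "02/03/10"],
    "date_sent_eeoc": [None, "02/10/10"],
    "basis": ["Age", "Race-Black/African American"],
})

features = add_features(sample)
```

Missing dates are coerced to `NaT`, so the derived columns are simply empty for those rows rather than raising an error.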
