### **EXTERNAL DATA**
#### **Data from Third-Party Sources**

- **Description:**  
  The dataset contains employee attributes such as demographics, job roles, performance ratings, and work-related factors, which are used to analyze employee performance at INX Future Inc.

- **Data Source:**  
  The employee performance data for INX Future Inc. was downloaded from the following third-party source:  
  [INX Future Inc. Employee Performance Data](http://data.iabac.org/exam/p2/data/INX_Future_Inc_Employee_Performance_CDS_Project2_Data_V1.8.xls)

  ----

### **PROCESSED DATA**
#### **The final, canonical data sets for modeling**

- **Description:**  
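  The processed data is the final, cleaned, and transformed dataset used for modeling. It has been preprocessed to remove inconsistencies, handle missing values, and put every feature in a format suitable for model training and evaluation. In the preview below, categorical attributes such as `Gender`, `MaritalStatus`, and `EmpDepartment` appear as integer codes, `PerformanceRating` takes the values 1.0 and 2.0 in the rows shown, and the `EmpNumber` identifier from the raw data is absent.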

In [6]:
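import pandas as pd

# Load the processed dataset (path is relative to the notebook's working directory)
processed_data = pd.read_csv("processed_data.csv")
# Show every column instead of truncating wide frames
pd.set_option('display.max_columns', None)
processed_data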

Unnamed: 0,Age,Gender,EducationBackground,MaritalStatus,EmpDepartment,EmpJobRole,BusinessTravelFrequency,DistanceFromHome,EmpEducationLevel,EmpEnvironmentSatisfaction,EmpHourlyRate,EmpJobInvolvement,EmpJobLevel,EmpJobSatisfaction,NumCompaniesWorked,OverTime,EmpLastSalaryHikePercent,EmpRelationshipSatisfaction,TotalWorkExperienceInYears,TrainingTimesLastYear,EmpWorkLifeBalance,ExperienceYearsAtThisCompany,ExperienceYearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager,Attrition,PerformanceRating
0,32,1.0,2,2,5,13,2.0,10,3,4,55,3,2,4,1,0,12,4,10,2,2,10,7,0,8,0.0,1.0
1,47,1.0,2,2,5,13,2.0,14,4,4,42,3,2,1,2,0,12,4,20,2,3,7,7,1,7,0.0,1.0
2,40,1.0,1,1,5,13,1.0,5,4,4,48,2,3,1,5,1,21,3,20,2,3,18,13,1,12,0.0,2.0
3,41,1.0,0,0,3,8,2.0,10,4,2,73,2,5,4,3,0,15,2,23,2,2,21,6,12,6,0.0,1.0
4,60,1.0,2,2,5,13,2.0,16,4,1,84,3,2,1,8,0,14,4,10,1,3,2,2,2,2,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1195,27,0.0,3,0,5,13,1.0,3,1,4,71,4,2,4,1,1,20,2,6,3,3,6,5,0,4,0.0,2.0
1196,37,1.0,1,2,1,15,2.0,10,2,4,80,4,1,4,3,0,17,1,4,2,3,1,0,0,0,0.0,1.0
1197,50,1.0,3,1,1,15,2.0,28,1,4,74,4,1,3,1,1,11,3,20,3,3,20,8,3,8,0.0,1.0
1198,34,0.0,3,2,0,1,2.0,9,3,4,46,2,3,2,1,0,14,2,9,3,4,8,7,7,7,0.0,1.0
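
The description above claims that missing values are handled and that features are in a suitable format for modeling. A minimal sketch of how those two claims could be checked is shown here. The helper `check_model_ready` and the two-row `sample` frame are illustrative assumptions, not part of the original notebook; in practice the function would be called on `processed_data`.

```python
import pandas as pd

# Hypothetical stand-in for processed_data: two rows and four columns
# copied from the preview above.
sample = pd.DataFrame({
    "Age": [32, 47],
    "Gender": [1.0, 1.0],
    "EmpDepartment": [5, 5],
    "PerformanceRating": [1.0, 1.0],
})

def check_model_ready(df):
    """Return a list of problems that would block model training (empty if none)."""
    problems = []
    # Columns that still contain missing values
    missing = df.columns[df.isna().any()].tolist()
    if missing:
        problems.append(f"missing values in: {missing}")
    # Columns that are not numeric, for example strings that were never encoded
    non_numeric = df.select_dtypes(exclude="number").columns.tolist()
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    return problems

print(check_model_ready(sample))  # prints []
```

An empty list means every column is numeric and free of missing values. It does not say anything about whether the encoding itself is sensible.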


### **RAW DATA**
#### **Original, Immutable Data Dump**

- **Description:**  
  The original, immutable data dump from the source, not yet processed or cleaned.
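
One way to back up the "immutable" claim is to record a checksum of the raw file once and compare it before each run. The sketch below uses only the Python standard library. The helper names `sha256_of_file` and `is_unchanged` are assumptions introduced for illustration, and no reference digest for the INX file is given here because none appears in the source.

```python
import hashlib
from pathlib import Path

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_unchanged(path, expected_digest):
    """True if the file's current digest matches a previously recorded one."""
    return sha256_of_file(path) == expected_digest
```

In use, `sha256_of_file('INX_Future_Inc_Employee_Performance_CDS_Project2_Data_V1.8.xls')` would be run once after download and the result stored next to the data. A later `is_unchanged(...)` call returning `False` would indicate the raw file has been modified.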

In [9]:
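# Load the original Excel file as downloaded (reading .xls files requires the xlrd package)
raw_data = pd.read_excel('INX_Future_Inc_Employee_Performance_CDS_Project2_Data_V1.8.xls')
# Show every column instead of truncating wide frames
pd.set_option('display.max_columns', None)
raw_data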

Unnamed: 0,EmpNumber,Age,Gender,EducationBackground,MaritalStatus,EmpDepartment,EmpJobRole,BusinessTravelFrequency,DistanceFromHome,EmpEducationLevel,EmpEnvironmentSatisfaction,EmpHourlyRate,EmpJobInvolvement,EmpJobLevel,EmpJobSatisfaction,NumCompaniesWorked,OverTime,EmpLastSalaryHikePercent,EmpRelationshipSatisfaction,TotalWorkExperienceInYears,TrainingTimesLastYear,EmpWorkLifeBalance,ExperienceYearsAtThisCompany,ExperienceYearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager,Attrition,PerformanceRating
0,E1001000,32,Male,Marketing,Single,Sales,Sales Executive,Travel_Rarely,10,3,4,55,3,2,4,1,No,12,4,10,2,2,10,7,0,8,No,3
1,E1001006,47,Male,Marketing,Single,Sales,Sales Executive,Travel_Rarely,14,4,4,42,3,2,1,2,No,12,4,20,2,3,7,7,1,7,No,3
2,E1001007,40,Male,Life Sciences,Married,Sales,Sales Executive,Travel_Frequently,5,4,4,48,2,3,1,5,Yes,21,3,20,2,3,18,13,1,12,No,4
3,E1001009,41,Male,Human Resources,Divorced,Human Resources,Manager,Travel_Rarely,10,4,2,73,2,5,4,3,No,15,2,23,2,2,21,6,12,6,No,3
4,E1001010,60,Male,Marketing,Single,Sales,Sales Executive,Travel_Rarely,16,4,1,84,3,2,1,8,No,14,4,10,1,3,2,2,2,2,No,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1195,E100992,27,Female,Medical,Divorced,Sales,Sales Executive,Travel_Frequently,3,1,4,71,4,2,4,1,Yes,20,2,6,3,3,6,5,0,4,No,4
1196,E100993,37,Male,Life Sciences,Single,Development,Senior Developer,Travel_Rarely,10,2,4,80,4,1,4,3,No,17,1,4,2,3,1,0,0,0,No,3
1197,E100994,50,Male,Medical,Married,Development,Senior Developer,Travel_Rarely,28,1,4,74,4,1,3,1,Yes,11,3,20,3,3,20,8,3,8,No,3
1198,E100995,34,Female,Medical,Single,Data Science,Data Scientist,Travel_Rarely,9,3,4,46,2,3,2,1,No,14,2,9,3,4,8,7,7,7,No,3


This section shows the raw, unprocessed data as obtained from the third-party source: all records and attributes in their original form, with no transformations applied. It is the starting point for all subsequent cleaning, processing, and modeling.
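
To illustrate the kind of processing that turns raw text attributes into model-ready numbers, here is a minimal sketch of integer-coding categorical columns with pandas. The pipeline that actually produced `processed_data.csv` is not shown in this document, so this is an assumption: it is simply one encoding (alphabetical category codes) that reproduces the values visible in the processed preview for these rows. The helper `encode_categoricals` is hypothetical.

```python
import pandas as pd

# Three rows (index 0, 2, and 1195) and three columns copied from the raw preview.
raw_sample = pd.DataFrame({
    "Gender": ["Male", "Male", "Female"],
    "MaritalStatus": ["Single", "Married", "Divorced"],
    "OverTime": ["No", "Yes", "Yes"],
})

def encode_categoricals(df):
    """Replace each non-numeric column with integer codes in sorted category order."""
    out = df.copy()
    for col in out.select_dtypes(exclude="number").columns:
        # astype("category") sorts the distinct values; .cat.codes maps them to 0, 1, 2, ...
        out[col] = out[col].astype("category").cat.codes
    return out

encoded = encode_categoricals(raw_sample)
print(encoded)
```

For these rows the codes come out as `Gender` 1, 1, 0; `MaritalStatus` 2, 1, 0; and `OverTime` 0, 1, 1, which matches the corresponding rows of the processed preview. The codes depend on which categories are present, so the encoding should be fitted on the full dataset rather than on a sample.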

-----