## Feature Engineering

In this notebook, we analyse the Iris dataset and use **Feature Engineering** to modify existing features and build new ones.

The following are some of the most important components of feature engineering:
1. Imputation
2. Handling Outliers
3. Binning
4. One-Hot Encoding
5. Grouping Operations
6. Scaling
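
Four of the components listed above (imputation, outlier handling, binning and scaling) can be sketched on a small column. The `toy` frame, the median fill, the 1.5 × IQR fences, the three equal-width bins and the choice of `MinMaxScaler` are all illustrative assumptions, not steps taken from this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: one missing value and one obvious outlier.
toy = pd.DataFrame({"age": [22.0, 25.0, np.nan, 31.0, 29.0, 95.0]})

# Imputation: fill the missing value with the column median.
toy["age_imputed"] = toy["age"].fillna(toy["age"].median())

# Handling outliers: cap values that fall outside the 1.5 * IQR fences.
q1, q3 = toy["age_imputed"].quantile([0.25, 0.75])
iqr = q3 - q1
toy["age_capped"] = toy["age_imputed"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Binning: cut the capped values into three equal-width bins.
toy["age_bin"] = pd.cut(toy["age_capped"], bins=3, labels=["low", "mid", "high"])

# Scaling: rescale the capped values to the [0, 1] range.
toy["age_scaled"] = MinMaxScaler().fit_transform(toy[["age_capped"]]).ravel()

print(toy)
```

Capping is only one way to treat outliers; dropping the rows or applying a log transform are common alternatives, and which is appropriate depends on the data.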


In [1]:
## Import Libraries

import pandas as pd
import numpy as np
# Import label encoder 
from sklearn import preprocessing

In [2]:
## load the iris dataset and store in iris_df

iris_df = pd.read_csv('Iris.csv')
iris_df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [3]:
## Information of the dataset

iris_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


**LabelEncoder()** converts the labels of a categorical variable into integer codes. **species** is a categorical variable, so we convert it to a numeric one.

In [4]:
label_encoder = preprocessing.LabelEncoder()


In [5]:
## Get the unique species available in the dataset
iris_df['species'].unique()

array(['setosa', 'versicolor', 'virginica'], dtype=object)

The species column has three categories to encode: 'setosa', 'versicolor' and 'virginica'.

In [6]:
## fit_transform() learns the distinct labels in the column and replaces each label with its integer code (0, 1, 2 in sorted label order).

iris_df['species'] = label_encoder.fit_transform(iris_df['species'])
iris_df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


In [7]:
## View the information: the species feature is now a numerical type
iris_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    int32  
dtypes: float64(4), int32(1)
memory usage: 5.4 KB
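
Once the labels are encoded, it is often useful to know which integer stands for which species. A minimal sketch, using a short hypothetical list in place of the `species` column: the fitted encoder exposes the learned labels in `classes_`, and `inverse_transform` maps codes back to labels.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the species column.
species = ["setosa", "versicolor", "virginica", "setosa"]

encoder = LabelEncoder()
codes = encoder.fit_transform(species)

# classes_ holds the sorted unique labels; each label's position is its code.
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns the integer codes back into the original labels.
restored = list(encoder.inverse_transform(codes))
print(restored)
```

Note that the codes imply an order (0 < 1 < 2) that the species do not actually have, which is one reason one-hot encoding is introduced next.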


In [8]:
## Import the one-hot encoder class

from sklearn import datasets
from sklearn.preprocessing import OneHotEncoder


In [9]:
## Load the iris dataset and create a dataframe

iris_data = datasets.load_iris()
iris_data = pd.DataFrame(data=np.c_[iris_data["data"], iris_data["target"]],
 columns=iris_data["feature_names"] + ["target"])
y = iris_data.target.values

In [10]:
## View the dataset
iris_data.head()

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,0.0
1,4.9,3.0,1.4,0.2,0.0
2,4.7,3.2,1.3,0.2,0.0
3,4.6,3.1,1.5,0.2,0.0
4,5.0,3.6,1.4,0.2,0.0


In [11]:
## View the target variable
y

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.])

In [12]:
## Use one-hot encoding to split the target variable into three separate indicator columns.

onehotencoder = OneHotEncoder(categories='auto')
y = onehotencoder.fit_transform(y.reshape(-1,1))
print(y.toarray())

[[1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0.

**get_dummies()** is a pandas function that converts a categorical variable into indicator (dummy) columns, one column per category.


In [13]:
pd.get_dummies(iris_data.target).head()

Unnamed: 0,0.0,1.0,2.0
0,1,0,0
1,1,0,0
2,1,0,0
3,1,0,0
4,1,0,0
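
The dummy columns above are named after the raw codes (0.0, 1.0, 2.0). As a sketch, mapping the codes to species names first and passing `prefix` gives more readable columns. The short `target` series is a hypothetical stand-in for `iris_data.target`, and `dtype=int` is passed because recent pandas versions return booleans by default.

```python
import pandas as pd

# Hypothetical stand-in for iris_data.target.
target = pd.Series([0.0, 1.0, 2.0, 0.0], name="target")

# Code-to-name mapping for the three iris species.
names = {0.0: "setosa", 1.0: "versicolor", 2.0: "virginica"}

# prefix names the new columns; dtype=int forces 0/1 instead of True/False.
dummies = pd.get_dummies(target.map(names), prefix="species", dtype=int)
print(dummies)
```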


## Feature Hashing

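Feature hashing is an important technique for handling sparse, high-dimensional features in machine learning. It uses a hash function to map each feature value to one of a fixed number of columns, so no dictionary of categories has to be stored.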

In [14]:
## FeatureHasher library import to perform encoding

from sklearn.feature_extraction import FeatureHasher

In [15]:
## Load the vgsales dataset and view its first few rows

game_df = pd.read_csv("vgsales.csv", encoding="utf-8")
game_df.head()

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


In [16]:
## columns of the dataframe
game_df.columns

Index(['Rank', 'Name', 'Platform', 'Year', 'Genre', 'Publisher', 'NA_Sales',
       'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales'],
      dtype='object')

In [17]:
## shape of the dataframe
game_df.shape

(16598, 11)

In [18]:
## View selected columns; iloc[1:7] returns the rows at positions 1 to 6.

game_df[['Name', 'Platform', 'Year', 'Genre', 'Publisher']].iloc[1:7]

Unnamed: 0,Name,Platform,Year,Genre,Publisher
1,Super Mario Bros.,NES,1985.0,Platform,Nintendo
2,Mario Kart Wii,Wii,2008.0,Racing,Nintendo
3,Wii Sports Resort,Wii,2009.0,Sports,Nintendo
4,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo
5,Tetris,GB,1989.0,Puzzle,Nintendo
6,New Super Mario Bros.,DS,2006.0,Platform,Nintendo


In [19]:
## Print the number of unique genres present in game_df and list them.

u_genres = np.unique(game_df["Genre"])
print("Total game genres:", len(u_genres))
print(u_genres)


Total game genres: 12
['Action' 'Adventure' 'Fighting' 'Misc' 'Platform' 'Puzzle' 'Racing'
 'Role-Playing' 'Shooter' 'Simulation' 'Sports' 'Strategy']
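
With the genres listed, `FeatureHasher` could be applied to them roughly as follows. This is a sketch under assumptions: the short `genres` list stands in for `game_df["Genre"]`, and `n_features=6` is an arbitrary width chosen for illustration.

```python
from sklearn.feature_extraction import FeatureHasher

# Hypothetical stand-in for game_df["Genre"].
genres = ["Sports", "Platform", "Racing", "Sports", "Role-Playing"]

# n_features fixes the output width regardless of how many genres exist.
hasher = FeatureHasher(n_features=6, input_type="string")

# Each sample must be an iterable of strings, so wrap every genre in a list.
hashed = hasher.transform([[genre] for genre in genres]).toarray()
print(hashed)
```

Each row contains a single non-zero entry (+1 or -1) in the column its genre hashes to. With only six columns for twelve genres, some genres will necessarily share a column; a larger `n_features` reduces such collisions at the cost of a wider matrix.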


## Problem Statement

You work in HR analytics. You have been given the task of predicting which employees are going to leave the organization, so that the process of replacing them can be started within the available time frame.

In [20]:
#import required library
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

In [21]:
# Load the Dataset and store in df

df = pd.read_csv("HR-Employee-Attrition.csv")
df.head()

Unnamed: 0,Age,Attrition,BusinessTravel,DailyRate,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeNumber,...,RelationshipSatisfaction,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
0,41,Yes,Travel_Rarely,1102,Sales,1,2,Life Sciences,1,1,...,1,80,0,8,0,1,6,4,0,5
1,49,No,Travel_Frequently,279,Research & Development,8,1,Life Sciences,1,2,...,4,80,1,10,3,3,10,7,1,7
2,37,Yes,Travel_Rarely,1373,Research & Development,2,2,Other,1,4,...,2,80,0,7,3,3,0,0,0,0
3,33,No,Travel_Frequently,1392,Research & Development,3,4,Life Sciences,1,5,...,3,80,0,8,3,3,8,7,3,0
4,27,No,Travel_Rarely,591,Research & Development,2,1,Medical,1,7,...,4,80,1,6,3,3,2,2,2,2


In [22]:
df.columns

Index(['Age', 'Attrition', 'BusinessTravel', 'DailyRate', 'Department',
       'DistanceFromHome', 'Education', 'EducationField', 'EmployeeCount',
       'EmployeeNumber', 'EnvironmentSatisfaction', 'Gender', 'HourlyRate',
       'JobInvolvement', 'JobLevel', 'JobRole', 'JobSatisfaction',
       'MaritalStatus', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked',
       'Over18', 'OverTime', 'PercentSalaryHike', 'PerformanceRating',
       'RelationshipSatisfaction', 'StandardHours', 'StockOptionLevel',
       'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance',
       'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion',
       'YearsWithCurrManager'],
      dtype='object')

In [23]:
# Check the shape of the dataset
df.shape

(1470, 35)

In [24]:
# Check whether there are any null values
df.isna().sum()

Age                         0
Attrition                   0
BusinessTravel              0
DailyRate                   0
Department                  0
DistanceFromHome            0
Education                   0
EducationField              0
EmployeeCount               0
EmployeeNumber              0
EnvironmentSatisfaction     0
Gender                      0
HourlyRate                  0
JobInvolvement              0
JobLevel                    0
JobRole                     0
JobSatisfaction             0
MaritalStatus               0
MonthlyIncome               0
MonthlyRate                 0
NumCompaniesWorked          0
Over18                      0
OverTime                    0
PercentSalaryHike           0
PerformanceRating           0
RelationshipSatisfaction    0
StandardHours               0
StockOptionLevel            0
TotalWorkingYears           0
TrainingTimesLastYear       0
WorkLifeBalance             0
YearsAtCompany              0
YearsInCurrentRole          0
YearsSince

In [25]:
for i in df.columns:
    print(i, ":", df[i].value_counts())
    print("_"*40)
    print("_"*40)

Age : 35    78
34    77
31    69
36    69
29    68
32    61
30    60
33    58
38    58
40    57
37    50
27    48
28    48
42    46
39    42
45    41
41    40
26    39
46    33
44    33
43    32
50    30
24    26
25    26
47    24
49    24
55    22
48    19
51    19
53    19
52    18
54    18
22    16
56    14
58    14
23    14
21    13
20    11
59    10
19     9
18     8
60     5
57     4
Name: Age, dtype: int64
________________________________________
________________________________________
Attrition : No     1233
Yes     237
Name: Attrition, dtype: int64
________________________________________
________________________________________
BusinessTravel : Travel_Rarely        1043
Travel_Frequently     277
Non-Travel            150
Name: BusinessTravel, dtype: int64
________________________________________
________________________________________
DailyRate : 691     6
1082    5
329     5
1329    5
530     5
       ..
897     1
891     1
889     1
888     1
102     1
Name: DailyRate, Le

The dataset is imbalanced: the Attrition column has 1233 "No" values but only 237 "Yes" values.
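
To put a number on the imbalance, the class shares can be computed from the Attrition counts printed above. This is a small sketch; `attrition_counts` is rebuilt by hand here rather than taken from `df`.

```python
import pandas as pd

# Counts copied from the value_counts() output for Attrition.
attrition_counts = pd.Series({"No": 1233, "Yes": 237})

# Share of each class in the 1470 rows.
shares = attrition_counts / attrition_counts.sum()
print(shares.round(3))
```

Roughly 16% of employees left, so a model that always predicts "No" would already be about 84% accurate. Accuracy alone is therefore a weak yardstick for this problem.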

Over18 is Y (Yes) for every employee, so it carries no information for prediction.

StandardHours is 80 for every employee, so it carries no information either.

EmployeeCount is 1 for every employee and EmployeeNumber is a unique identifier, so neither will help us predict attrition.
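
Columns like these can also be found programmatically instead of by reading the value counts. A minimal sketch on a hypothetical four-row frame: `nunique()` equal to 1 marks a constant column, and `nunique()` equal to the number of rows marks an identifier-like column.

```python
import pandas as pd

# Hypothetical miniature of the HR data.
toy = pd.DataFrame({
    "EmployeeNumber": [1, 2, 4, 5],
    "Over18": ["Y", "Y", "Y", "Y"],
    "StandardHours": [80, 80, 80, 80],
    "Age": [41, 49, 37, 41],
})

n_unique = toy.nunique()

# Constant columns have exactly one distinct value.
constant_cols = n_unique[n_unique == 1].index.tolist()

# Identifier-like columns have a distinct value in every row.
id_like_cols = n_unique[n_unique == len(toy)].index.tolist()

print("Constant:", constant_cols)
print("Identifier-like:", id_like_cols)
```

The second check is only a heuristic: a continuous measurement can also be unique in every row, so candidates should be reviewed before they are dropped.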


In [26]:
# Print the unique values in each column

for i in df.columns:
    print (i , ":", df[i].unique())
    print (" _ "*40)
    print (" _ "*40)


Age : [41 49 37 33 27 32 59 30 38 36 35 29 31 34 28 22 53 24 21 42 44 46 39 43
 50 26 48 55 45 56 23 51 40 54 58 20 25 19 57 52 47 18 60]
 _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _ 
 _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _ 
Attrition : ['Yes' 'No']
 _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _ 
 _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _ 
BusinessTravel : ['Travel_Rarely' 'Travel_Frequently' 'Non-Travel']
 _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _ 
 _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _ 
DailyRate : [1102  279 1373 1392  591 1005 

In [27]:
df=df.drop(['Over18','EmployeeNumber','EmployeeCount','StandardHours'],axis=1)


In [28]:
df.shape

(1470, 31)

In [29]:
df.columns

Index(['Age', 'Attrition', 'BusinessTravel', 'DailyRate', 'Department',
       'DistanceFromHome', 'Education', 'EducationField',
       'EnvironmentSatisfaction', 'Gender', 'HourlyRate', 'JobInvolvement',
       'JobLevel', 'JobRole', 'JobSatisfaction', 'MaritalStatus',
       'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked', 'OverTime',
       'PercentSalaryHike', 'PerformanceRating', 'RelationshipSatisfaction',
       'StockOptionLevel', 'TotalWorkingYears', 'TrainingTimesLastYear',
       'WorkLifeBalance', 'YearsAtCompany', 'YearsInCurrentRole',
       'YearsSinceLastPromotion', 'YearsWithCurrManager'],
      dtype='object')

In [30]:
df[['RelationshipSatisfaction','JobSatisfaction','EnvironmentSatisfaction','JobInvolvement']]

Unnamed: 0,RelationshipSatisfaction,JobSatisfaction,EnvironmentSatisfaction,JobInvolvement
0,1,4,2,3
1,4,2,3,2
2,2,3,4,2
3,3,3,4,3
4,4,2,1,3
...,...,...,...,...
1465,3,4,3,4
1466,1,1,4,2
1467,2,2,2,4
1468,4,2,4,2


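Let us now calculate the mean of the five satisfaction-related scores (relationship, environment and job satisfaction, job involvement, and work-life balance). If the mean is greater than 2.35, the new Satif column is set to one; otherwise it is set to zero.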

In [31]:
df['TotalSatisfaction_mean'] = (df['RelationshipSatisfaction'] + df['EnvironmentSatisfaction']
                                + df['JobSatisfaction'] + df['JobInvolvement'] + df['WorkLifeBalance']) / 5

# Return 1 when the row's mean satisfaction exceeds 2.35, else 0
def Satif(row):
    if row['TotalSatisfaction_mean'] > 2.35:
        return 1
    else:
        return 0


df['Satif'] = df.apply(Satif, axis=1)
df['Satif']

0       0
1       1
2       1
3       1
4       1
       ..
1465    1
1466    0
1467    1
1468    1
1469    1
Name: Satif, Length: 1470, dtype: int64
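
The same flag can be computed without a row-wise `apply`. The sketch below is a vectorized alternative, not the notebook's own code; the `toy` frame holds the first three rows of the five score columns shown earlier.

```python
import pandas as pd

# First three rows of the five score columns from the HR data.
toy = pd.DataFrame({
    "RelationshipSatisfaction": [1, 4, 2],
    "EnvironmentSatisfaction": [2, 3, 4],
    "JobSatisfaction": [4, 2, 3],
    "JobInvolvement": [3, 2, 2],
    "WorkLifeBalance": [1, 3, 3],
})
score_cols = list(toy.columns)

# Row-wise mean of the five scores.
toy["TotalSatisfaction_mean"] = toy[score_cols].mean(axis=1)

# The comparison yields booleans; astype(int) turns them into 1/0.
toy["Satif"] = (toy["TotalSatisfaction_mean"] > 2.35).astype(int)
print(toy[["TotalSatisfaction_mean", "Satif"]])
```

The means are 2.2, 2.8 and 2.8, giving flags 0, 1 and 1, which matches the first three values of `df['Satif']` above.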

In [32]:
df.shape

(1470, 33)

In [33]:
# Create a separate column for the mean of job satisfaction and job involvement.

df['JobSatisf_mean'] = (df['JobSatisfaction'] + df['JobInvolvement']) / 2

In [34]:
df.shape

(1470, 34)

In [35]:
# Flag employees who have worked at more than four companies (1) versus the rest (0)
def MovingPeople(row):
    if row['NumCompaniesWorked'] > 4:
        return 1
    else:
        return 0

df['MovingPeople'] = df.apply(MovingPeople, axis=1)
df['MovingPeople']

0       1
1       0
2       1
3       0
4       1
       ..
1465    0
1466    0
1467    0
1468    0
1469    0
Name: MovingPeople, Length: 1470, dtype: int64

In [36]:
df.shape

(1470, 35)

In [37]:
# Flag employees whose 'DistanceFromHome' is greater than 11 (1) versus the rest (0)

def LongDis(row):
    if row['DistanceFromHome'] > 11:
        return 1
    else:
        return 0

df['LongDis'] = df.apply(LongDis, axis=1)
df['LongDis']

0       0
1       0
2       0
3       0
4       0
       ..
1465    1
1466    0
1467    0
1468    0
1469    0
Name: LongDis, Length: 1470, dtype: int64

In [38]:
# Flag employees who were trained between 3 and 6 times last year, inclusive (1), versus the rest (0)

def MiddleTraining(row):
    if row['TrainingTimesLastYear'] >= 3 and row['TrainingTimesLastYear'] <= 6:
        return 1
    else:
        return 0

df['MiddleTraining'] = df.apply(MiddleTraining, axis=1)
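
The three threshold flags (MovingPeople, LongDis, MiddleTraining) follow the same pattern, so they could also be written as vectorized comparisons. A sketch on hypothetical values, using `Series.between`, which includes both endpoints by default:

```python
import pandas as pd

# Hypothetical values chosen to sit on both sides of each threshold.
toy = pd.DataFrame({
    "NumCompaniesWorked": [8, 1, 6, 4],
    "DistanceFromHome": [1, 8, 23, 12],
    "TrainingTimesLastYear": [0, 3, 6, 7],
})

# Strict comparisons, as in the functions above.
toy["MovingPeople"] = (toy["NumCompaniesWorked"] > 4).astype(int)
toy["LongDis"] = (toy["DistanceFromHome"] > 11).astype(int)

# between(3, 6) is equivalent to >= 3 and <= 6.
toy["MiddleTraining"] = toy["TrainingTimesLastYear"].between(3, 6).astype(int)
print(toy)
```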

In [39]:
# Estimate the average number of years spent at each company, assuming a working life that starts at age 20

df['Time_in_each_comp'] = (df['Age'] - 20) / (df['NumCompaniesWorked'] + 1)
df['Time_in_each_comp']


0        2.333333
1       14.500000
2        2.428571
3        6.500000
4        0.700000
          ...    
1465     3.200000
1466     3.800000
1467     3.500000
1468     9.666667
1469     4.666667
Name: Time_in_each_comp, Length: 1470, dtype: float64
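
The formula is easier to follow on a few concrete rows. In the sketch below the `NumCompaniesWorked` values are illustrative; the first two were picked so that the results reproduce the first two outputs above. Reading the `+ 1` as "previous employers plus the current one" is an interpretation, but it also keeps the denominator non-zero when `NumCompaniesWorked` is 0.

```python
import pandas as pd

# Illustrative rows; the last one shows an employee younger than 20.
toy = pd.DataFrame({
    "Age": [41, 49, 18],
    "NumCompaniesWorked": [8, 1, 0],
})

# (years since age 20) / (number of companies + 1)
toy["Time_in_each_comp"] = (toy["Age"] - 20) / (toy["NumCompaniesWorked"] + 1)
print(toy)
```

The last row comes out negative. The Age counts show employees aged 18 and 19 in the data, so the real feature contains negative values for them as well.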

In [40]:
df.shape

(1470, 38)

In [41]:
# Split the dataframe into its numeric columns and its categorical columns

numeric_df= df.select_dtypes(include=[np.number])

categoric_df=df.select_dtypes(exclude=[np.number])

In [42]:
numericcol=numeric_df.columns.tolist()
categorycol=categoric_df.columns.tolist()

print ("Category :",categorycol)
print ("\n Numeric :",numericcol)

Category : ['Attrition', 'BusinessTravel', 'Department', 'EducationField', 'Gender', 'JobRole', 'MaritalStatus', 'OverTime']

 Numeric : ['Age', 'DailyRate', 'DistanceFromHome', 'Education', 'EnvironmentSatisfaction', 'HourlyRate', 'JobInvolvement', 'JobLevel', 'JobSatisfaction', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked', 'PercentSalaryHike', 'PerformanceRating', 'RelationshipSatisfaction', 'StockOptionLevel', 'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance', 'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion', 'YearsWithCurrManager', 'TotalSatisfaction_mean', 'Satif', 'JobSatisf_mean', 'MovingPeople', 'LongDis', 'MiddleTraining', 'Time_in_each_comp']
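
To count the two groups rather than list them, take the length of each column list. A short sketch on a hypothetical frame with two numeric and two text columns:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature with mixed column types.
toy = pd.DataFrame({
    "Age": [41, 49],
    "MonthlyIncome": [5993, 5130],
    "Gender": ["Female", "Male"],
    "OverTime": ["Yes", "No"],
})

# include/exclude select columns by dtype family.
numeric_cols = toy.select_dtypes(include=[np.number]).columns.tolist()
categorical_cols = toy.select_dtypes(exclude=[np.number]).columns.tolist()

print("Numeric:", len(numeric_cols), numeric_cols)
print("Categorical:", len(categorical_cols), categorical_cols)
```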


In [43]:
# Drop the raw columns that the engineered features above were built from, along with DailyRate and HourlyRate.

df=df.drop(['DailyRate', 'DistanceFromHome', 'EnvironmentSatisfaction','HourlyRate', 'JobInvolvement', 
            'JobSatisfaction', 'NumCompaniesWorked','RelationshipSatisfaction', 'TrainingTimesLastYear'],axis=1)

In [44]:
df.shape

(1470, 29)

In [45]:
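# One-hot encode the categorical columns; drop_first=True drops the first level of each column to avoid redundant dummies
data = pd.get_dummies(df, columns=categorycol, drop_first=True)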
print(data.columns)
print(data.shape)

Index(['Age', 'Education', 'JobLevel', 'MonthlyIncome', 'MonthlyRate',
       'PercentSalaryHike', 'PerformanceRating', 'StockOptionLevel',
       'TotalWorkingYears', 'WorkLifeBalance', 'YearsAtCompany',
       'YearsInCurrentRole', 'YearsSinceLastPromotion', 'YearsWithCurrManager',
       'TotalSatisfaction_mean', 'Satif', 'JobSatisf_mean', 'MovingPeople',
       'LongDis', 'MiddleTraining', 'Time_in_each_comp', 'Attrition_Yes',
       'BusinessTravel_Travel_Frequently', 'BusinessTravel_Travel_Rarely',
       'Department_Research & Development', 'Department_Sales',
       'EducationField_Life Sciences', 'EducationField_Marketing',
       'EducationField_Medical', 'EducationField_Other',
       'EducationField_Technical Degree', 'Gender_Male',
       'JobRole_Human Resources', 'JobRole_Laboratory Technician',
       'JobRole_Manager', 'JobRole_Manufacturing Director',
       'JobRole_Research Director', 'JobRole_Research Scientist',
       'JobRole_Sales Executive', 'JobRole_Sales Rep

In [46]:
data.head()

Unnamed: 0,Age,Education,JobLevel,MonthlyIncome,MonthlyRate,PercentSalaryHike,PerformanceRating,StockOptionLevel,TotalWorkingYears,WorkLifeBalance,...,JobRole_Laboratory Technician,JobRole_Manager,JobRole_Manufacturing Director,JobRole_Research Director,JobRole_Research Scientist,JobRole_Sales Executive,JobRole_Sales Representative,MaritalStatus_Married,MaritalStatus_Single,OverTime_Yes
0,41,2,2,5993,19479,11,3,0,8,1,...,0,0,0,0,0,1,0,0,1,1
1,49,1,2,5130,24907,23,4,1,10,3,...,0,0,0,0,1,0,0,1,0,0
2,37,2,1,2090,2396,15,3,0,7,3,...,1,0,0,0,0,0,0,0,1,1
3,33,4,1,2909,23159,11,3,0,8,3,...,0,0,0,0,1,0,0,1,0,1
4,27,1,1,3468,16632,12,3,1,6,3,...,1,0,0,0,0,0,0,1,0,0
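
Because `Attrition` is one of the categorical columns, it is encoded together with the features and appears as `Attrition_Yes`. A natural next step, sketched here on a three-row hypothetical frame rather than on `data` itself, is to separate that column as the target. The names `X` and `y` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature built from a few columns of the HR data.
toy = pd.DataFrame({
    "Age": [41, 49, 37],
    "Attrition": ["Yes", "No", "Yes"],
    "OverTime": ["Yes", "No", "Yes"],
    "MaritalStatus": ["Single", "Married", "Single"],
})

# drop_first=True keeps k-1 dummy columns for a k-level variable.
encoded = pd.get_dummies(
    toy,
    columns=["Attrition", "OverTime", "MaritalStatus"],
    drop_first=True,
    dtype=int,
)

# Separate the target (1 = employee left) from the features.
y = encoded["Attrition_Yes"]
X = encoded.drop(columns="Attrition_Yes")

print(X.columns.tolist())
print(y.tolist())
```

Leaving the target inside the feature matrix would leak the answer to the model, so this split should happen before any training.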
