# <center>Feature Engineering</center>

<pre>
<b><u>Data Variables</u></b>                                                          <b><u>Type(Measurement)</u></b>
Industry Energy Consumption                                              Continuous(kWh)
Lagging Current reactive power                                          Continuous(kVarh)
Leading Current reactive power                                          Continuous(kVarh)
tCO2(CO2)                                                                Continuous(ppm)
Lagging Current power factor                                              Continuous(%)
Leading Current Power factor                                              Continuous(%)
Number of Seconds from midnight                                           Continuous(s)
Week status                                                      Categorical(Weekend (0) or a Weekday(1))
Day of week                                                   Categorical Sunday, Monday... Saturday
Load Type                                                 Categorical Light Load, Medium Load, Maximum Load
</pre>

## Import Libraries

In [24]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

## Load Dataset

In [2]:
df = pd.read_csv('Steel_industry_data.csv')

In [3]:
df.head()

Unnamed: 0,date,Usage_kWh,Lagging_Current_Reactive.Power_kVarh,Leading_Current_Reactive_Power_kVarh,CO2(tCO2),Lagging_Current_Power_Factor,Leading_Current_Power_Factor,NSM,WeekStatus,Day_of_week,Load_Type
0,01/01/2018 00:15,3.17,2.95,0.0,0.0,73.21,100.0,900,Weekday,Monday,Light_Load
1,01/01/2018 00:30,4.0,4.46,0.0,0.0,66.77,100.0,1800,Weekday,Monday,Light_Load
2,01/01/2018 00:45,3.24,3.28,0.0,0.0,70.28,100.0,2700,Weekday,Monday,Light_Load
3,01/01/2018 01:00,3.31,3.56,0.0,0.0,68.09,100.0,3600,Weekday,Monday,Light_Load
4,01/01/2018 01:15,3.82,4.5,0.0,0.0,64.72,100.0,4500,Weekday,Monday,Light_Load


## Feature Engineering: date

**Convert the `date` column from object to datetime and store it in a new `date-time` column**

In [4]:
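# The timestamps are assumed to be day-first (dd/mm/yyyy HH:MM), so the format is given explicitly
df['date-time'] = pd.to_datetime(df['date'], format='%d/%m/%Y %H:%M')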

**Extract day, month, year, hour, minute**

In [5]:
df['day'] = pd.DatetimeIndex(df['date-time']).day

In [6]:
df['month'] = pd.DatetimeIndex(df['date-time']).month

In [7]:
df['year'] = pd.DatetimeIndex(df['date-time']).year

In [8]:
df['hour'] = pd.DatetimeIndex(df['date-time']).hour

In [9]:
df['minute'] = pd.DatetimeIndex(df['date-time']).minute
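The same components can also be read through the `.dt` accessor of a datetime Series. The following is a minimal sketch on a hypothetical three-row sample (the `sample` frame and its values are illustrative, not taken from the dataset), assuming the timestamps use the day-first `dd/mm/yyyy HH:MM` layout:

```python
import pandas as pd

# Hypothetical sample in the assumed 'dd/mm/yyyy HH:MM' layout of the 'date' column
sample = pd.DataFrame({'date': ['01/01/2018 00:15', '13/01/2018 09:30', '31/12/2018 23:45']})

# An explicit format removes any month/day ambiguity during parsing
sample['date-time'] = pd.to_datetime(sample['date'], format='%d/%m/%Y %H:%M')

# The .dt accessor returns each component as a Series aligned with the frame's index
for part in ['day', 'month', 'year', 'hour', 'minute']:
    sample[part] = getattr(sample['date-time'].dt, part)

print(sample[['day', 'month', 'year', 'hour', 'minute']])
```

Looping over the component names keeps the extraction in one cell; the resulting columns match those produced by the `pd.DatetimeIndex` calls above.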

In [10]:
df.head()

Unnamed: 0,date,Usage_kWh,Lagging_Current_Reactive.Power_kVarh,Leading_Current_Reactive_Power_kVarh,CO2(tCO2),Lagging_Current_Power_Factor,Leading_Current_Power_Factor,NSM,WeekStatus,Day_of_week,Load_Type,date-time,day,month,year,hour,minute
0,01/01/2018 00:15,3.17,2.95,0.0,0.0,73.21,100.0,900,Weekday,Monday,Light_Load,2018-01-01 00:15:00,1,1,2018,0,15
1,01/01/2018 00:30,4.0,4.46,0.0,0.0,66.77,100.0,1800,Weekday,Monday,Light_Load,2018-01-01 00:30:00,1,1,2018,0,30
2,01/01/2018 00:45,3.24,3.28,0.0,0.0,70.28,100.0,2700,Weekday,Monday,Light_Load,2018-01-01 00:45:00,1,1,2018,0,45
3,01/01/2018 01:00,3.31,3.56,0.0,0.0,68.09,100.0,3600,Weekday,Monday,Light_Load,2018-01-01 01:00:00,1,1,2018,1,0
4,01/01/2018 01:15,3.82,4.5,0.0,0.0,64.72,100.0,4500,Weekday,Monday,Light_Load,2018-01-01 01:15:00,1,1,2018,1,15
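Because `NSM` is the number of seconds from midnight, the extracted `hour` and `minute` can be cross-checked against it. Below is a small sketch of such a sanity check, using a hypothetical `sample` frame that copies the five rows displayed above; the names `rebuilt_nsm` and `nsm_consistent` are illustrative:

```python
import pandas as pd

# Hypothetical sample copying hour, minute and NSM from the five rows displayed above
sample = pd.DataFrame({
    'hour':   [0, 0, 0, 1, 1],
    'minute': [15, 30, 45, 0, 15],
    'NSM':    [900, 1800, 2700, 3600, 4500],
})

# Seconds since midnight rebuilt from the extracted parts
rebuilt_nsm = sample['hour'] * 3600 + sample['minute'] * 60

# True only when every rebuilt value agrees with the recorded NSM
nsm_consistent = bool((rebuilt_nsm == sample['NSM']).all())
print(nsm_consistent)
```

Running the same comparison on the full dataframe would flag any row where the parsed time and `NSM` disagree.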


**Delete date and date-time from dataframe**

In [11]:
df.drop(columns=['date', 'date-time'], inplace = True)

In [None]:
df.head()

## Encode: WeekStatus

**Weekday --> 1, Weekend --> 0**

In [13]:
df['WeekStatus'].unique()

array(['Weekday', 'Weekend'], dtype=object)

In [14]:
def mapWeekStatus(day):
    if day == 'Weekday':
        return 1
    else:
        return 0

In [15]:
df['WeekStatus'] = df['WeekStatus'].apply(mapWeekStatus)
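The same encoding can be written as a dictionary lookup with `Series.map`, which also makes unexpected labels visible instead of silently sending them to 0. This is a sketch on a hypothetical sample; `week_status_codes` and `sample` are illustrative names:

```python
import pandas as pd

# Mapping used in this section: Weekday -> 1, Weekend -> 0
week_status_codes = {'Weekday': 1, 'Weekend': 0}

# Hypothetical sample standing in for df['WeekStatus']
sample = pd.Series(['Weekday', 'Weekend', 'Weekday'], name='WeekStatus')

encoded = sample.map(week_status_codes)

# map() yields NaN for labels missing from the dictionary, so they can be detected
if encoded.isna().any():
    raise ValueError('Unexpected WeekStatus label found')

encoded = encoded.astype(int)
print(encoded.tolist())
```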

## Encode: Day_of_week

**Sunday --> 0, Monday --> 1, Tuesday --> 2, Wednesday --> 3, Thursday --> 4, Friday --> 5, Saturday --> 6**

In [16]:
df['Day_of_week'].unique()

array(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
       'Sunday'], dtype=object)

In [17]:
weekDay = {
    'Sunday' : 0,
    'Monday' : 1,
    'Tuesday' : 2,
    'Wednesday' : 3,
    'Thursday' : 4,
    'Friday' : 5,
    'Saturday' : 6
}
def mapWeekDay(day):
    return weekDay[day]

In [18]:
df['Day_of_week'] = df['Day_of_week'].apply(mapWeekDay)
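The day codes can also be derived directly from a datetime column, which gives a way to cross-check the `Day_of_week` labels. The sketch below uses three hypothetical dates; note that pandas numbers days Monday=0 through Sunday=6, so the values are shifted to match the Sunday=0 convention used here. In this notebook such a check would have to run before the `date-time` column is dropped.

```python
import pandas as pd

# Same mapping as above: Sunday=0 ... Saturday=6
week_day = {'Sunday': 0, 'Monday': 1, 'Tuesday': 2, 'Wednesday': 3,
            'Thursday': 4, 'Friday': 5, 'Saturday': 6}

# Hypothetical dates: a Monday, a Saturday and a Sunday in January 2018
dates = pd.to_datetime(pd.Series(['2018-01-01', '2018-01-06', '2018-01-07']))

# pandas uses Monday=0 ... Sunday=6; adding one and wrapping gives Sunday=0 ... Saturday=6
derived_codes = ((dates.dt.dayofweek + 1) % 7).tolist()

# Codes obtained by mapping the day names through the dictionary
mapped_codes = dates.dt.day_name().map(week_day).tolist()

print(derived_codes, mapped_codes)
```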

In [19]:
df.head()

Unnamed: 0,Usage_kWh,Lagging_Current_Reactive.Power_kVarh,Leading_Current_Reactive_Power_kVarh,CO2(tCO2),Lagging_Current_Power_Factor,Leading_Current_Power_Factor,NSM,WeekStatus,Day_of_week,Load_Type,day,month,year,hour,minute
0,3.17,2.95,0.0,0.0,73.21,100.0,900,1,1,Light_Load,1,1,2018,0,15
1,4.0,4.46,0.0,0.0,66.77,100.0,1800,1,1,Light_Load,1,1,2018,0,30
2,3.24,3.28,0.0,0.0,70.28,100.0,2700,1,1,Light_Load,1,1,2018,0,45
3,3.31,3.56,0.0,0.0,68.09,100.0,3600,1,1,Light_Load,1,1,2018,1,0
4,3.82,4.5,0.0,0.0,64.72,100.0,4500,1,1,Light_Load,1,1,2018,1,15


## Encode: Load_Type

**Light_Load --> 0, Medium_Load --> 1, Maximum_Load --> 2**

In [20]:
df['Load_Type'].unique()

array(['Light_Load', 'Medium_Load', 'Maximum_Load'], dtype=object)

In [21]:
def mapLoadType(s):
    if s == 'Light_Load':
        return 0
    elif s == 'Medium_Load':
        return 1
    else:
        return 2

In [22]:
df['Load_Type'] = df['Load_Type'].apply(mapLoadType)
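Since the three load types have a natural order, an ordered categorical is one alternative way to obtain the same codes. The following is a sketch on a hypothetical sample (`load_order` and `sample` are illustrative names). Unlike the `else` branch of `mapLoadType`, which returns 2 for any label that is not `Light_Load` or `Medium_Load`, an unknown label receives the code -1 here:

```python
import pandas as pd

# Category order defines the codes: Light_Load=0, Medium_Load=1, Maximum_Load=2
load_order = ['Light_Load', 'Medium_Load', 'Maximum_Load']

# Hypothetical sample standing in for df['Load_Type']
sample = pd.Series(['Light_Load', 'Maximum_Load', 'Medium_Load', 'Light_Load'])

codes = pd.Categorical(sample, categories=load_order, ordered=True).codes

# Labels outside load_order are encoded as -1 rather than being treated as Maximum_Load
unknown_codes = pd.Categorical(['Heavy_Load'], categories=load_order, ordered=True).codes

print(codes.tolist(), unknown_codes.tolist())
```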

In [23]:
df.head()

Unnamed: 0,Usage_kWh,Lagging_Current_Reactive.Power_kVarh,Leading_Current_Reactive_Power_kVarh,CO2(tCO2),Lagging_Current_Power_Factor,Leading_Current_Power_Factor,NSM,WeekStatus,Day_of_week,Load_Type,day,month,year,hour,minute
0,3.17,2.95,0.0,0.0,73.21,100.0,900,1,1,0,1,1,2018,0,15
1,4.0,4.46,0.0,0.0,66.77,100.0,1800,1,1,0,1,1,2018,0,30
2,3.24,3.28,0.0,0.0,70.28,100.0,2700,1,1,0,1,1,2018,0,45
3,3.31,3.56,0.0,0.0,68.09,100.0,3600,1,1,0,1,1,2018,1,0
4,3.82,4.5,0.0,0.0,64.72,100.0,4500,1,1,0,1,1,2018,1,15


## Save Modified Dataset

In [25]:
df.to_csv('Steel_Feature_Engineering.csv', index=False)
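Passing `index=False` keeps the row index out of the file. The sketch below illustrates the difference on a hypothetical two-row frame, writing to in-memory buffers instead of disk; all names in it are illustrative:

```python
import io
import pandas as pd

# Hypothetical two-row frame with already-encoded columns
sample = pd.DataFrame({'WeekStatus': [1, 1], 'Day_of_week': [1, 1], 'Load_Type': [0, 0]})

# Written without the index: reading back returns exactly the original columns
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
restored = pd.read_csv(without_index)

# Written with the default index: reading back adds an 'Unnamed: 0' column
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
restored_with_index = pd.read_csv(with_index)

print(list(restored.columns))
print(list(restored_with_index.columns))
```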

<pre>
<b><u>Data Variables</u></b>                                                          <b><u>Type(Measurement)</u></b>
Industry Energy Consumption                                              Continuous(kWh)
Lagging Current reactive power                                          Continuous(kVarh)
Leading Current reactive power                                          Continuous(kVarh)
tCO2(CO2)                                                                Continuous(ppm)
Lagging Current power factor                                              Continuous(%)
Leading Current Power factor                                              Continuous(%)
Number of Seconds from midnight                                           Continuous(s)
Week status                                                      Categorical(Weekend (0) or a Weekday(1))
Day of week                                                   Categorical Sunday, Monday... Saturday
Load Type                                                 Categorical Light Load, Medium Load, Maximum Load
day                                                                       Categorical
month                                                                     Categorical
year                                                                      Categorical
hour                                                                      Categorical
minute                                                                    Categorical
</pre>