# **Problem Description:**
>Significant investments are being made to improve building efficiencies to reduce costs and emissions. The question is, are the improvements working? Under pay-for-performance financing, *the building owner makes payments based on the difference between their real energy consumption and what they would have used without any retrofits.* The latter values have to come from a model. Current methods of estimation are fragmented and do not scale well. Some assume a specific meter type or don’t work with different building types.




# **Solution Description:**
> We have to develop accurate models of metered building energy usage in the following areas:  
1. Chilled water
2. Electric
3. Hot water
4. Steam meters

> The data comes from over 1,000 buildings over a three-year timeframe. With better estimates of these energy-saving investments, large scale investors and financial institutions will be more inclined to invest in this area to enable progress in building efficiencies.









In [2]:
'''
Directory: /content/drive/My Drive/Csc-219-Project/DataSet

building_metadata.csv
sample_submission.csv
test.csv
train.csv
weather_test.csv
weather_train.csv

'''

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [0]:
#Importing all the libraries in this block
import pandas as pd
import pandas_profiling as pdp
%matplotlib inline
import missingno as msno

import seaborn as sb
import matplotlib.pyplot as plt
import missingno as ms
import numpy as np


import os,random, math, psutil, pickle


#libraries for Data Preprocessing
from sklearn.preprocessing import LabelEncoder
from sklearn import metrics
import lightgbm as lgb
from sklearn.model_selection import train_test_split

#library for cost function
from sklearn.metrics import mean_squared_error

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)



**Phase-1: Data Collection**

Load all the raw data that we are going to use in this project.

In [4]:
%%time
raw_building_meta = pd.read_csv("/content/drive/My Drive/Csc-219-Project/DataSet/building_metadata.csv")
raw_sample_data = pd.read_csv("/content/drive/My Drive/Csc-219-Project/DataSet/sample_submission.csv")
raw_test = pd.read_csv("/content/drive/My Drive/Csc-219-Project/DataSet/test.csv")
raw_train = pd.read_csv("/content/drive/My Drive/Csc-219-Project/DataSet/train.csv")
raw_weather_test = pd.read_csv("/content/drive/My Drive/Csc-219-Project/DataSet/weather_test.csv")
raw_weather_train = pd.read_csv("/content/drive/My Drive/Csc-219-Project/DataSet/weather_train.csv")




CPU times: user 23.5 s, sys: 4.24 s, total: 27.7 s
Wall time: 1min 40s


In [0]:
def reduce_mem_usage(df, verbose=True):
    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    start_mem = df.memory_usage().sum() / 1024**2    
    for col in df.columns:
        col_type = df[col].dtypes
        if col_type in numerics:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)  
            else:
                if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)    
    end_mem = df.memory_usage().sum() / 1024**2
    if verbose: print('Mem. usage decreased to {:5.2f} Mb ({:.1f}% reduction)'.format(end_mem, 100 * (start_mem - end_mem) / start_mem))
    return df



def plot_dist_col(column):
  '''plot dist curves for train and test weather data for the given column name'''
  # the docstring has to be the first statement of the function to be picked up as one
  import seaborn as sns
  fig, ax = plt.subplots(figsize=(10, 10))
  sns.distplot(raw_weather_train[column].dropna(), color='green', ax=ax).set_title(column, fontsize=16)
  sns.distplot(raw_weather_test[column].dropna(), color='purple', ax=ax).set_title(column, fontsize=16)
  plt.xlabel(column, fontsize=15)
  plt.legend(['train', 'test'])
  plt.show()



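def report_missing_data(df):
    '''print the number and percentage of missing rows for every column of df'''
    print('Total Number of rows :', len(df))
    for column in df.columns:
        # count the missing rows once, using the vectorised pandas sum
        n_missing = df[column].isnull().sum()
        print(column,':\t\t', 'Missing rows:\t', n_missing, '|\t\t', '% Missing:\t {:.2f}'.format(n_missing*100/len(df)),'%')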


Reduce the memory footprint of the data, as its current size is more than 1 GB.

In [6]:
%%time 
## Reducing memory of the data frames
train = reduce_mem_usage(raw_train)
test = reduce_mem_usage(raw_test)
weather_train = reduce_mem_usage(raw_weather_train)
weather_test = reduce_mem_usage(raw_weather_test)
building_metadata = reduce_mem_usage(raw_building_meta)

Mem. usage decreased to 289.19 Mb (53.1% reduction)
Mem. usage decreased to 596.49 Mb (53.1% reduction)
Mem. usage decreased to  3.07 Mb (68.1% reduction)
Mem. usage decreased to  6.08 Mb (68.1% reduction)
Mem. usage decreased to  0.03 Mb (60.3% reduction)
CPU times: user 1.26 s, sys: 194 ms, total: 1.46 s
Wall time: 1.46 s
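As a rough illustration of the idea behind `reduce_mem_usage`, the sketch below downcasts a small made-up frame with `pd.to_numeric(..., downcast=...)`. The frame `toy` and the helper `downcast_numeric` are hypothetical names introduced only for this example; this is not the function used above, just one way to express the same technique.

```python
import numpy as np
import pandas as pd

def downcast_numeric(df):
    """Return a copy of df with numeric columns stored in a smaller dtype where possible."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out

# Hypothetical toy data, not the project data set.
toy = pd.DataFrame({
    "building_id": np.arange(1000, dtype=np.int64),
    "meter": np.tile(np.array([0, 1, 2, 3], dtype=np.int64), 250),
    "meter_reading": np.linspace(0.0, 500.0, 1000),
})

small = downcast_numeric(toy)
before = toy.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(small.dtypes)
print("bytes before:", before, "bytes after:", after)
```

Smaller float types trade precision for memory, so it is worth checking that the downcast values are still accurate enough for the model.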


**Phase-1a: Univariate Analysis**

*Most of the data visualisation is done with Pandas Profiling; the results are shown in report.html.*

To detect outliers, we plot the distribution of each feature.

We addressed the outlier problem by standard-scaling all the features in the training data.

No data preprocessing is done on the test data!
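A numeric complement to the distribution plots could look like the sketch below. It standardises a made-up series with the same formula that standard scaling uses, `z = (x - mean) / std`, and flags values whose absolute z-score exceeds 3. The threshold of 3 and the names `zscores` and `readings` are assumptions made for this illustration and are not taken from the notebook.

```python
import numpy as np

def zscores(values):
    """Standardise values to zero mean and unit (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

# Hypothetical readings with one extreme value appended at the end.
rng = np.random.default_rng(0)
readings = np.append(rng.normal(loc=50.0, scale=5.0, size=200), 500.0)

z = zscores(readings)
outlier_mask = np.abs(z) > 3.0  # assumed threshold, a common rule of thumb
print("flagged values:", readings[outlier_mask])
```

Note that scaling only rescales the values: the extreme reading is still extreme after standardisation, it is simply expressed in units of standard deviations.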

In [7]:
'''#plotting normal distribution 
plot_dist_col('air_temperature')
plot_dist_col('cloud_coverage')
plot_dist_col('dew_temperature')
plot_dist_col('precip_depth_1_hr')
plot_dist_col('sea_level_pressure')
plot_dist_col('wind_speed')'''

"#plotting normal distribution \nplot_dist_col('air_temperature')\nplot_dist_col('cloud_coverage')\nplot_dist_col('dew_temperature')\nplot_dist_col('precip_depth_1_hr')\nplot_dist_col('sea_level_pressure')\nplot_dist_col('wind_speed')"

In [8]:
# First merge train and building data
train = pd.merge(train,building_metadata,how = 'left')           
print(train.shape)
train.head()

# Then Merge train_building with weather_train data
train = pd.merge(train,weather_train, on = ['site_id','timestamp'], how = 'left')
print(train.shape)

# First merge test and building data
test = pd.merge(test,building_metadata,how = 'left')           
print(test.shape)
test.head()

# Now Merge test_building with weather_test data
test = pd.merge(test,weather_test, on = ['site_id','timestamp'], how = 'left')
print(test.shape)
test.head()


(20216100, 9)
(20216100, 16)
(41697600, 9)
(41697600, 16)


Unnamed: 0,row_id,building_id,meter,timestamp,site_id,primary_use,square_feet,year_built,floor_count,air_temperature,cloud_coverage,dew_temperature,precip_depth_1_hr,sea_level_pressure,wind_direction,wind_speed
0,0,0,0,2017-01-01 00:00:00,0,Education,7432,2008.0,,17.796875,4.0,11.703125,,1021.5,100.0,3.599609
1,1,1,0,2017-01-01 00:00:00,0,Education,2720,2004.0,,17.796875,4.0,11.703125,,1021.5,100.0,3.599609
2,2,2,0,2017-01-01 00:00:00,0,Education,5376,1991.0,,17.796875,4.0,11.703125,,1021.5,100.0,3.599609
3,3,3,0,2017-01-01 00:00:00,0,Education,23685,2002.0,,17.796875,4.0,11.703125,,1021.5,100.0,3.599609
4,4,4,0,2017-01-01 00:00:00,0,Education,116607,1975.0,,17.796875,4.0,11.703125,,1021.5,100.0,3.599609
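The two left merges keep every meter reading and attach building and weather columns where a match exists. A minimal sketch on made-up frames is shown below; the frames `readings`, `buildings` and `weather` are hypothetical stand-ins, and the `validate="many_to_one"` argument is an optional safety check added here that the notebook itself does not use.

```python
import pandas as pd

# Hypothetical miniature versions of the train, building and weather tables.
readings = pd.DataFrame({
    "building_id": [0, 0, 1],
    "timestamp": ["2016-01-01 00:00:00", "2016-01-01 01:00:00", "2016-01-01 00:00:00"],
    "meter_reading": [10.0, 12.0, 7.5],
})
buildings = pd.DataFrame({"building_id": [0, 1], "site_id": [0, 1]})
weather = pd.DataFrame({
    "site_id": [0, 0],
    "timestamp": ["2016-01-01 00:00:00", "2016-01-01 01:00:00"],
    "air_temperature": [25.0, 24.4],
})

# validate="many_to_one" raises if the right-hand keys are not unique,
# which would otherwise silently duplicate rows.
merged = readings.merge(buildings, on="building_id", how="left", validate="many_to_one")
merged = merged.merge(weather, on=["site_id", "timestamp"], how="left", validate="many_to_one")

print(merged)
print("rows preserved:", len(merged) == len(readings))
```

A reading whose site and timestamp have no weather record keeps its row but receives `NaN` in the weather columns, which is one source of the missing values reported further down.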


In [9]:
'''
#to convert data into pickle

train_df.to_pickle('train_df.pkl')
test_df.to_pickle('test_df.pkl')
   
del train_df, test_df
gc.collect()
'''

'''
#read pickle
train_df = pd.read_pickle('train_df.pkl')
test_df = pd.read_pickle('test_df.pkl')

'''

#%%time 
#report = pdp.ProfileReport(train)

"\n#read pickle\ntrain_df = pd.read_pickle('train_df.pkl')\ntest_df = pd.read_pickle('test_df.pkl')\n\n"

In [10]:
report_missing_data(train)

Total Number of rows : 20216100
building_id :		 Missing rows:	 0 |		 % Missing:	 0.00 %
meter :		 Missing rows:	 0 |		 % Missing:	 0.00 %
timestamp :		 Missing rows:	 0 |		 % Missing:	 0.00 %
meter_reading :		 Missing rows:	 0 |		 % Missing:	 0.00 %
site_id :		 Missing rows:	 0 |		 % Missing:	 0.00 %
primary_use :		 Missing rows:	 0 |		 % Missing:	 0.00 %
square_feet :		 Missing rows:	 0 |		 % Missing:	 0.00 %
year_built :		 Missing rows:	 12127645 |		 % Missing:	 59.99 %
floor_count :		 Missing rows:	 16709167 |		 % Missing:	 82.65 %
air_temperature :		 Missing rows:	 96658 |		 % Missing:	 0.48 %
cloud_coverage :		 Missing rows:	 8825365 |		 % Missing:	 43.66 %
dew_temperature :		 Missing rows:	 100140 |		 % Missing:	 0.50 %
precip_depth_1_hr :		 Missing rows:	 3749023 |		 % Missing:	 18.54 %
sea_level_pressure :		 Missing rows:	 1231669 |		 % Missing:	 6.09 %
wind_direction :		 Missing rows:	 1449048 |		 % Missing:	 7.17 %
wind_speed :		 Missing rows:	 143676 |		 % Missing:	 0.71 %
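The same report can also be collected into a table, which makes it easy to sort columns by how incomplete they are. The sketch below is one possible variant on a made-up frame; `missing_summary` and `toy` are hypothetical names and not part of the notebook.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Return one row per column with the count and percentage of missing values."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing_rows": counts,
        "pct_missing": 100 * counts / len(df),
    })
    return summary.sort_values("pct_missing", ascending=False)

# Hypothetical toy frame, not the project data.
toy = pd.DataFrame({
    "floor_count": [np.nan, np.nan, 3.0, np.nan],
    "square_feet": [7432, 2720, 5376, 23685],
})

summary = missing_summary(toy)
print(summary)
```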


**Phase-2: Data Cleansing**

There are 6 tables, and all of them contain null values. The first task is to impute the null values so that we can at least plot our graphs.


In [79]:
'''
#msno.matrix(train)
from google.colab import files
'''
pdp.ProfileReport(train.iloc[:20000])


0,1
Number of variables,16
Number of observations,20000
Total Missing (%),0.0%
Total size in memory,1.6 MiB
Average record size in memory,83.0 B

0,1
Numeric,15
Categorical,0
Boolean,0
Date,0
Text (Unique),0
Rejected,1
Unsupported,0

0,1
Distinct count,1425
Unique (%),7.1%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,798.71
Minimum,0
Maximum,1448
Zeros (%),0.0%

0,1
Minimum,0
5-th percentile,111
Q1,397
Median,893
Q3,1169
95-th percentile,1377
Maximum,1448
Range,1448
Interquartile range,772

0,1
Standard deviation,423.68
Coef of variation,0.53045
Kurtosis,-1.2306
Mean,798.71
MAD,368.72
Skewness,-0.29528
Sum,15974189
Variance,179500
Memory size,312.5 KiB

Value,Count,Frequency (%),Unnamed: 3
1294,32,0.2%,
1298,32,0.2%,
1293,32,0.2%,
1301,32,0.2%,
1241,32,0.2%,
1331,32,0.2%,
1259,32,0.2%,
1295,32,0.2%,
1232,32,0.2%,
1296,32,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0,9,0.0%,
1,9,0.0%,
2,9,0.0%,
3,9,0.0%,
4,9,0.0%,

Value,Count,Frequency (%),Unnamed: 3
1444,8,0.0%,
1445,8,0.0%,
1446,8,0.0%,
1447,8,0.0%,
1448,8,0.0%,

0,1
Distinct count,4
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.6445
Minimum,0
Maximum,3
Zeros (%),61.0%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,1
95-th percentile,3
Maximum,3
Range,3
Interquartile range,1

0,1
Standard deviation,0.9274
Coef of variation,1.4389
Kurtosis,0.27786
Mean,0.6445
MAD,0.78681
Skewness,1.2162
Sum,12890
Variance,0.86006
Memory size,312.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,12208,61.0%,
1,3905,19.5%,
2,2676,13.4%,
3,1211,6.1%,

Value,Count,Frequency (%),Unnamed: 3
0,12208,61.0%,
1,3905,19.5%,
2,2676,13.4%,
3,1211,6.1%,

Value,Count,Frequency (%),Unnamed: 3
0,12208,61.0%,
1,3905,19.5%,
2,2676,13.4%,
3,1211,6.1%,

0,1
Distinct count,11852
Unique (%),59.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,893.29
Minimum,0
Maximum,3241600
Zeros (%),14.6%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,8.0
Median,47.195
Q3,174.09
95-th percentile,1537.2
Maximum,3241600.0
Range,3241600.0
Interquartile range,166.09

0,1
Standard deviation,35315
Coef of variation,39.533
Kurtosis,7501.402
Mean,893.29
MAD,1450.2
Skewness,85.039474
Sum,17865764.0
Variance,1247100000
Memory size,234.4 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,2926,14.6%,
25.0,57,0.3%,
20.0,39,0.2%,
10.0,35,0.2%,
30.0,31,0.2%,
21.0,29,0.1%,
15.0,27,0.1%,
3.3333001136779785,27,0.1%,
24.0,27,0.1%,
32.0,25,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0.0,2926,14.6%,
0.0001999999949475,1,0.0%,
0.0003999999898951,8,0.0%,
0.0119000002741813,1,0.0%,
0.0131000000983476,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
160896.0,1,0.0%,
260880.0,1,0.0%,
262584.0,1,0.0%,
1896620.0,1,0.0%,
3241630.0,2,0.0%,

0,1
Correlation,0.98063

0,1
Distinct count,16
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,3.1569
Minimum,0
Maximum,15
Zeros (%),40.5%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,1
Q3,6
95-th percentile,9
Maximum,15
Range,15
Interquartile range,6

0,1
Standard deviation,3.4629
Coef of variation,1.0969
Kurtosis,-0.20827
Mean,3.1569
MAD,3.0644
Skewness,0.7925
Sum,63138
Variance,11.991
Memory size,312.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0,8099,40.5%,
6,4270,21.3%,
1,2267,11.3%,
4,2125,10.6%,
9,1689,8.4%,
3,373,1.9%,
7,241,1.2%,
8,211,1.1%,
5,125,0.6%,
2,111,0.6%,

Value,Count,Frequency (%),Unnamed: 3
0,8099,40.5%,
1,2267,11.3%,
2,111,0.6%,
3,373,1.9%,
4,2125,10.6%,

Value,Count,Frequency (%),Unnamed: 3
11,108,0.5%,
12,97,0.5%,
13,87,0.4%,
14,59,0.3%,
15,104,0.5%,

0,1
Distinct count,1374
Unique (%),6.9%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,107940
Minimum,283
Maximum,875000
Zeros (%),0.0%

0,1
Minimum,283
5-th percentile,7133
Q1,31635
Median,72193
Q3,139680
95-th percentile,327260
Maximum,875000
Range,874717
Interquartile range,108050

0,1
Standard deviation,117990
Coef of variation,1.0931
Kurtosis,9.9668
Mean,107940
MAD,80220
Skewness,2.6689
Sum,2158807229
Variance,13922000000
Memory size,312.5 KiB

Value,Count,Frequency (%),Unnamed: 3
387638,63,0.3%,
150695,54,0.3%,
64583,45,0.2%,
42755,40,0.2%,
22000,36,0.2%,
24456,36,0.2%,
53130,36,0.2%,
72958,33,0.2%,
84346,33,0.2%,
65000,33,0.2%,

Value,Count,Frequency (%),Unnamed: 3
283,9,0.0%,
356,9,0.0%,
366,9,0.0%,
481,9,0.0%,
520,9,0.0%,

Value,Count,Frequency (%),Unnamed: 3
809530,9,0.0%,
819577,9,0.0%,
850354,9,0.0%,
861524,27,0.1%,
875000,9,0.0%,

0,1
Distinct count,117
Unique (%),0.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,201.9
Minimum,-999
Maximum,2017
Zeros (%),0.0%

0,1
Minimum,-999
5-th percentile,-999
Q1,-999
Median,-999
Q3,1963
95-th percentile,2005
Maximum,2017
Range,3016
Interquartile range,2962

0,1
Standard deviation,1456.2
Coef of variation,7.2124
Kurtosis,-1.8487
Mean,201.9
MAD,1429.4
Skewness,0.38824
Sum,4038068
Variance,2120500
Memory size,312.5 KiB

Value,Count,Frequency (%),Unnamed: 3
-999,11903,59.5%,
1976,538,2.7%,
1964,268,1.3%,
1966,234,1.2%,
1968,194,1.0%,
1975,182,0.9%,
1970,179,0.9%,
2004,176,0.9%,
1960,175,0.9%,
1967,173,0.9%,

Value,Count,Frequency (%),Unnamed: 3
-999,11903,59.5%,
1900,54,0.3%,
1902,16,0.1%,
1903,42,0.2%,
1904,25,0.1%,

Value,Count,Frequency (%),Unnamed: 3
2013,90,0.4%,
2014,98,0.5%,
2015,18,0.1%,
2016,54,0.3%,
2017,9,0.0%,

0,1
Distinct count,19
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,-817.78
Minimum,-999
Maximum,26
Zeros (%),0.0%

0,1
Minimum,-999
5-th percentile,-999
Q1,-999
Median,-999
Q3,-999
95-th percentile,5
Maximum,26
Range,1025
Interquartile range,0

0,1
Standard deviation,385.96
Coef of variation,-0.47196
Kurtosis,0.75692
Mean,-817.78
MAD,296.97
Skewness,1.6603
Sum,-16355544
Variance,148970
Memory size,312.5 KiB

Value,Count,Frequency (%),Unnamed: 3
-999,16387,81.9%,
1,981,4.9%,
2,699,3.5%,
4,360,1.8%,
3,342,1.7%,
6,306,1.5%,
5,261,1.3%,
8,180,0.9%,
7,135,0.7%,
9,88,0.4%,

Value,Count,Frequency (%),Unnamed: 3
-999,16387,81.9%,
1,981,4.9%,
2,699,3.5%,
3,342,1.7%,
4,360,1.8%,

Value,Count,Frequency (%),Unnamed: 3
14,9,0.0%,
16,9,0.0%,
19,27,0.1%,
21,27,0.1%,
26,27,0.1%,

0,1
Distinct count,59
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,5.3982
Minimum,-14.398
Maximum,25
Zeros (%),0.0%

0,1
Minimum,-14.398
5-th percentile,-8.2969
Q1,3.0
Median,5.6016
Q3,9.3984
95-th percentile,21.094
Maximum,25.0
Range,39.398
Interquartile range,6.3984

0,1
Standard deviation,7.8438
Coef of variation,1.453
Kurtosis,0.2415
Mean,5.3982
MAD,5.7841
Skewness,-0.041547
Sum,107960
Variance,61.525
Memory size,312.5 KiB

Value,Count,Frequency (%),Unnamed: 3
3.44926582942876,2063,10.3%,
10.0,1901,9.5%,
5.6015625,1537,7.7%,
9.3984375,1309,6.5%,
-7.19921875,924,4.6%,
7.19921875,899,4.5%,
5.0,892,4.5%,
8.8984375,879,4.4%,
-1.0,711,3.6%,
7.80078125,686,3.4%,

Value,Count,Frequency (%),Unnamed: 3
-14.3984375,98,0.5%,
-13.296875,147,0.7%,
-11.1015625,48,0.2%,
-10.6015625,48,0.2%,
-10.0,49,0.2%,

Value,Count,Frequency (%),Unnamed: 3
20.59375,171,0.9%,
21.09375,513,2.6%,
22.796875,171,0.9%,
24.40625,171,0.9%,
25.0,171,0.9%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,-610.19
Minimum,-999
Maximum,8
Zeros (%),19.8%

0,1
Minimum,-999
5-th percentile,-999
Q1,-999
Median,-999
Q3,0
95-th percentile,8
Maximum,8
Range,1007
Interquartile range,999

0,1
Standard deviation,488.13
Coef of variation,-0.79997
Kurtosis,-1.7895
Mean,-610.19
MAD,475.75
Skewness,0.4589
Sum,-12203720
Variance,238270
Memory size,312.5 KiB

Value,Count,Frequency (%),Unnamed: 3
-999,12236,61.2%,
0,3951,19.8%,
8,1423,7.1%,
2,1151,5.8%,
6,701,3.5%,
4,538,2.7%,

Value,Count,Frequency (%),Unnamed: 3
-999,12236,61.2%,
0,3951,19.8%,
2,1151,5.8%,
4,538,2.7%,
6,701,3.5%,

Value,Count,Frequency (%),Unnamed: 3
0,3951,19.8%,
2,1151,5.8%,
4,538,2.7%,
6,701,3.5%,
8,1423,7.1%,

0,1
Distinct count,57
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,-0.75877
Minimum,-17.203
Maximum,21.094
Zeros (%),5.4%

0,1
Minimum,-17.203
5-th percentile,-11.703
Q1,-5.6016
Median,-1.5996
Q3,2.1992
95-th percentile,20.0
Maximum,21.094
Range,38.297
Interquartile range,7.8008

0,1
Standard deviation,7.8391
Coef of variation,-10.331
Kurtosis,1.8862
Mean,-0.75877
MAD,5.4166
Skewness,1.1056
Sum,-15175
Variance,61.451
Memory size,312.5 KiB

Value,Count,Frequency (%),Unnamed: 3
-2.399274042492159,2063,10.3%,
3.900390625,1585,7.9%,
2.19921875,1412,7.1%,
-5.6015625,1140,5.7%,
0.0,1076,5.4%,
-10.6015625,924,4.6%,
-6.69921875,853,4.3%,
-2.80078125,785,3.9%,
-2.19921875,731,3.7%,
-0.60009765625,687,3.4%,

Value,Count,Frequency (%),Unnamed: 3
-17.203125,98,0.5%,
-16.703125,98,0.5%,
-15.6015625,49,0.2%,
-15.0,49,0.2%,
-13.8984375,96,0.5%,

Value,Count,Frequency (%),Unnamed: 3
8.8984375,66,0.3%,
19.40625,171,0.9%,
20.0,513,2.6%,
20.59375,171,0.9%,
21.09375,684,3.4%,

0,1
Distinct count,4
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.09706
Minimum,-1
Maximum,5
Zeros (%),61.8%

0,1
Minimum,-1.0
5-th percentile,-1.0
Q1,0.0
Median,0.0
Q3,0.58622
95-th percentile,0.58622
Maximum,5.0
Range,6.0
Interquartile range,0.58622

0,1
Standard deviation,0.49434
Coef of variation,5.0931
Kurtosis,25.05
Mean,0.09706
MAD,0.31146
Skewness,1.9102
Sum,1941.2
Variance,0.24437
Memory size,312.5 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,12366,61.8%,
0.5862238555843402,5836,29.2%,
-1.0,1745,8.7%,
5.0,53,0.3%,

Value,Count,Frequency (%),Unnamed: 3
-1.0,1745,8.7%,
0.0,12366,61.8%,
0.5862238555843402,5836,29.2%,
5.0,53,0.3%,

Value,Count,Frequency (%),Unnamed: 3
-1.0,1745,8.7%,
0.0,12366,61.8%,
0.5862238555843402,5836,29.2%,
5.0,53,0.3%,

0,1
Distinct count,28
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1020.7
Minimum,1013.5
Maximum,1038
Zeros (%),0.0%

0,1
Minimum,1013.5
5-th percentile,1015.5
Q1,1016.7
Median,1020.0
Q3,1022.5
95-th percentile,1029.0
Maximum,1038.0
Range,24.5
Interquartile range,5.8449

0,1
Standard deviation,4.6292
Coef of variation,0.0045353
Kurtosis,1.6297
Mean,1020.7
MAD,3.5275
Skewness,1.2808
Sum,20414000
Variance,21.43
Memory size,312.5 KiB

Value,Count,Frequency (%),Unnamed: 3
1016.6551242422306,3723,18.6%,
1021.5,1547,7.7%,
1023.0,1358,6.8%,
1021.0,1303,6.5%,
1029.0,1217,6.1%,
1019.5,1194,6.0%,
1020.0,948,4.7%,
1022.5,868,4.3%,
1017.0,853,4.3%,
1019.0,738,3.7%,

Value,Count,Frequency (%),Unnamed: 3
1013.5,36,0.2%,
1014.0,142,0.7%,
1014.5,142,0.7%,
1015.0,53,0.3%,
1015.5,661,3.3%,

Value,Count,Frequency (%),Unnamed: 3
1029.0,1217,6.1%,
1030.0,609,3.0%,
1036.0,49,0.2%,
1037.0,292,1.5%,
1038.0,49,0.2%,

0,1
Distinct count,34
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,190.08
Minimum,0
Maximum,360
Zeros (%),11.0%

0,1
Minimum,0
5-th percentile,0
Q1,50
Median,240
Q3,280
95-th percentile,350
Maximum,360
Range,360
Interquartile range,230

0,1
Standard deviation,121.46
Coef of variation,0.639
Kurtosis,-1.2861
Mean,190.08
MAD,105.31
Skewness,-0.39749
Sum,3801600
Variance,14753
Memory size,312.5 KiB

Value,Count,Frequency (%),Unnamed: 3
182.18337445987967,2821,14.1%,
0.0,2194,11.0%,
270.0,1813,9.1%,
300.0,1420,7.1%,
350.0,1398,7.0%,
280.0,1106,5.5%,
340.0,1093,5.5%,
20.0,912,4.6%,
10.0,877,4.4%,
250.0,787,3.9%,

Value,Count,Frequency (%),Unnamed: 3
0.0,2194,11.0%,
10.0,877,4.4%,
20.0,912,4.6%,
30.0,675,3.4%,
40.0,66,0.3%,

Value,Count,Frequency (%),Unnamed: 3
310.0,102,0.5%,
330.0,474,2.4%,
340.0,1093,5.5%,
350.0,1398,7.0%,
360.0,82,0.4%,

0,1
Distinct count,19
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,3.0241
Minimum,0
Maximum,7.6992
Zeros (%),11.0%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,2.0996
Median,3.0996
Q3,4.1016
95-th percentile,5.6992
Maximum,7.6992
Range,7.6992
Interquartile range,2.002

0,1
Standard deviation,1.5809
Coef of variation,0.52277
Kurtosis,0.44541
Mean,3.0241
MAD,1.1913
Skewness,-0.091182
Sum,60481
Variance,2.4992
Memory size,312.5 KiB

Value,Count,Frequency (%),Unnamed: 3
4.1015625,3414,17.1%,
3.099609375,3071,15.4%,
0.0,2194,11.0%,
1.5,2109,10.5%,
3.457863086317877,2063,10.3%,
3.599609375,1784,8.9%,
2.599609375,1717,8.6%,
2.099609375,766,3.8%,
4.6015625,662,3.3%,
5.1015625,577,2.9%,

Value,Count,Frequency (%),Unnamed: 3
0.0,2194,11.0%,
0.5,53,0.3%,
1.0,242,1.2%,
1.5,2109,10.5%,
2.099609375,766,3.8%,

Value,Count,Frequency (%),Unnamed: 3
5.1015625,577,2.9%,
5.69921875,308,1.5%,
6.19921875,358,1.8%,
7.19921875,89,0.4%,
7.69921875,269,1.3%,

0,1
Distinct count,117
Unique (%),0.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,50.764
Minimum,1
Maximum,118
Zeros (%),0.0%

0,1
Minimum,1.0
5-th percentile,13.0
Q1,50.802
Median,50.802
Q3,50.802
95-th percentile,94.0
Maximum,118.0
Range,117.0
Interquartile range,0.0

0,1
Standard deviation,19.111
Coef of variation,0.37647
Kurtosis,2.7678
Mean,50.764
MAD,9.722
Skewness,0.58698
Sum,1015300
Variance,365.22
Memory size,312.5 KiB

Value,Count,Frequency (%),Unnamed: 3
50.801937100054566,11903,59.5%,
42.0,538,2.7%,
54.0,268,1.3%,
52.0,234,1.2%,
50.0,194,1.0%,
43.0,182,0.9%,
48.0,179,0.9%,
14.0,176,0.9%,
58.0,175,0.9%,
51.0,173,0.9%,

Value,Count,Frequency (%),Unnamed: 3
1.0,9,0.0%,
2.0,54,0.3%,
3.0,18,0.1%,
4.0,98,0.5%,
5.0,90,0.4%,

Value,Count,Frequency (%),Unnamed: 3
113.0,9,0.0%,
114.0,25,0.1%,
115.0,42,0.2%,
116.0,16,0.1%,
118.0,54,0.3%,

Unnamed: 0,building_id,meter,meter_reading,site_id,primary_use,square_feet,year_built,floor_count,air_temperature,cloud_coverage,dew_temperature,precip_depth_1_hr,sea_level_pressure,wind_direction,wind_speed,age
0,0,0,0.0,0,0,7432,2008,-999,25.0,6,20.0,0.586224,1019.5,0.0,0.0,10.0
1,1,0,0.0,0,0,2720,2004,-999,25.0,6,20.0,0.586224,1019.5,0.0,0.0,14.0
2,2,0,0.0,0,0,5376,1991,-999,25.0,6,20.0,0.586224,1019.5,0.0,0.0,27.0
3,3,0,0.0,0,0,23685,2002,-999,25.0,6,20.0,0.586224,1019.5,0.0,0.0,16.0
4,4,0,0.0,0,0,116607,1975,-999,25.0,6,20.0,0.586224,1019.5,0.0,0.0,43.0


**Handling categorical features**

In [0]:
'''#dendo=msno.dendrogram(train)

matrix = msno.matrix(train.sample(200))'''
le = LabelEncoder()

# train_df['primary_use'] = train_df['primary_use'].astype(str)
train['primary_use'] = le.fit_transform(train['primary_use']).astype(np.int8)

# test_df['primary_use'] = test_df['primary_use'].astype(str)
# reuse the encoder fitted on train so that both frames share the same category-to-integer mapping
test['primary_use'] = le.transform(test['primary_use']).astype(np.int8)
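A label encoder that is fitted separately on two frames only yields the same integer codes when both contain exactly the same set of categories. The sketch below, on made-up category values, contrasts fitting once on the training values and reusing the encoder against refitting on the test values. The series `train_use` and `test_use` are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical category values; the test series lacks "Education".
train_use = pd.Series(["Education", "Office", "Retail", "Office"])
test_use = pd.Series(["Retail", "Office"])

encoder = LabelEncoder()
train_codes = encoder.fit_transform(train_use)   # learns Education=0, Office=1, Retail=2
test_codes = encoder.transform(test_use)         # reuses the training mapping

# Refitting on the test values alone assigns different codes to the same labels.
refit_codes = LabelEncoder().fit_transform(test_use)

print("shared mapping:", list(test_codes))
print("refit mapping: ", list(refit_codes))
```

`transform` raises an error for a category that was not seen during fitting, so it also acts as a check that train and test cover the same categories.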

**Feature engineering**

Deriving age from year built.

In [0]:
train['age'] = train['year_built'].max() - train['year_built'] + 1
test['age'] = test['year_built'].max() - test['year_built'] + 1

**Handling missing values**

To streamline this thought process, it is useful to know the three categories into which missing data can be classified:
1.   Missing Completely at Random (MCAR)
2.   Missing at Random (MAR)
3.   Missing Not at Random (MNAR)
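The three mechanisms can be made concrete with a small simulation. The sketch below uses synthetic data only: the relationship between `square_feet` and `floor_count`, the missingness probabilities and all variable names are assumptions chosen for illustration and say nothing about which mechanism applies to the real columns.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Synthetic buildings: square_feet is always observed, floor_count may go missing.
square_feet = rng.uniform(1_000, 100_000, size=n)
noise = rng.normal(0.0, 2.0, size=n)
floor_count = np.clip(np.round(square_feet / 10_000 + noise), 1, None)

masks = {
    # MCAR: every row has the same 30% chance of being missing.
    "MCAR": rng.random(n) < 0.3,
    # MAR: missingness depends only on an observed column (small buildings).
    "MAR": rng.random(n) < np.where(square_feet < 30_000, 0.6, 0.1),
    # MNAR: missingness depends on the value that is itself missing (tall buildings).
    "MNAR": rng.random(n) < np.where(floor_count > 7, 0.6, 0.1),
}

true_mean = floor_count.mean()
observed_means = {name: floor_count[~mask].mean() for name, mask in masks.items()}
for name, value in observed_means.items():
    print(f"{name}: observed mean {value:.2f} vs true mean {true_mean:.2f}")
```

In this simulation the observed mean stays close to the true mean only under MCAR. Under MAR the bias can in principle be corrected by conditioning on the observed column, whereas under MNAR the observed data alone do not contain enough information to correct it.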



In [0]:
train['floor_count'] = train['floor_count'].fillna(-999).astype(np.int16)
test['floor_count'] = test['floor_count'].fillna(-999).astype(np.int16)

train['year_built'] = train['year_built'].fillna(-999).astype(np.int16)
test['year_built'] = test['year_built'].fillna(-999).astype(np.int16)

#train['age'] = train['age'].fillna(-999).astype(np.int16)
#test['age'] = test['age'].fillna(-999).astype(np.int16)

train['cloud_coverage'] = train['cloud_coverage'].fillna(-999).astype(np.int16)
test['cloud_coverage'] = test['cloud_coverage'].fillna(-999).astype(np.int16) 

**Handling missing data with an imputer (for train)**

In [0]:
import numpy as np
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

In [27]:

# Each column is imputed with its own mean; the labels state which column and stage is being counted
imputer.fit(train[["air_temperature"]])
train['air_temperature'] = imputer.transform(train[["air_temperature"]])
print("missing values in precip_depth_1_hr(before imputer):",train.precip_depth_1_hr.isnull().sum())

print("missing values in air_temperature(after imputer):",train.air_temperature.isnull().sum())
imputer.fit(train[["precip_depth_1_hr"]])
train['precip_depth_1_hr'] = imputer.transform(train[["precip_depth_1_hr"]])
print("missing values in precip_depth_1_hr(after imputer):",train.precip_depth_1_hr.isnull().sum())


print("missing values in dew_temperature(before imputer):",train.dew_temperature.isnull().sum())
imputer.fit(train[["dew_temperature"]])
train['dew_temperature'] = imputer.transform(train[["dew_temperature"]])
print("missing values in dew_temperature(after imputer):",train.dew_temperature.isnull().sum())


print("missing values in wind_direction(before imputer):",train.wind_direction.isnull().sum())
imputer.fit(train[["wind_direction"]])
train['wind_direction'] = imputer.transform(train[["wind_direction"]])
print("missing values in wind_direction(after imputer):",train.wind_direction.isnull().sum())



print("missing values in sea_level_pressure(before imputer):",train.sea_level_pressure.isnull().sum())
imputer.fit(train[["sea_level_pressure"]])
train['sea_level_pressure'] = imputer.transform(train[["sea_level_pressure"]])
print("missing values in sea_level_pressure(after imputer):",train.sea_level_pressure.isnull().sum())


print("missing values in wind_speed(before imputer):",train.wind_speed.isnull().sum())
imputer.fit(train[["wind_speed"]])
train['wind_speed'] = imputer.transform(train[["wind_speed"]])
print("missing values in wind_speed(after imputer):",train.wind_speed.isnull().sum())

print("missing values in age(before imputer):",train.age.isnull().sum())
imputer.fit(train[["age"]])
train['age'] = imputer.transform(train[["age"]])
print("missing values in age(after imputer):",train.age.isnull().sum())

missing values in precip_depth_1_hr(before imputer): 186681
missing values in air_temperature(after imputer): 0
missing values in precip_depth_1_hr(after imputer): 0
missing values in dew_temperature(before imputer): 22190
missing values in dew_temperature(after imputer): 0
missing values in wind_direction(before imputer): 65486
missing values in wind_direction(after imputer): 0
missing values in sea_level_pressure(before imputer): 108073
missing values in sea_level_pressure(after imputer): 0
missing values in wind_speed(before imputer): 23378
missing values in wind_speed(after imputer): 0
missing values in age(before imputer): 596820
missing values in age(after imputer): 0
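The repeated fit, transform and print steps can also be written as a loop over the column names, which keeps each label next to the column it describes. The sketch below runs on a made-up frame; `toy`, `columns_to_impute` and `mean_imputer` are hypothetical names, and the loop is only one possible way to organise the same mean imputation.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy frame with one gap per column.
toy = pd.DataFrame({
    "air_temperature": [25.0, np.nan, 22.0],
    "wind_speed": [0.0, 1.5, np.nan],
})
columns_to_impute = ["air_temperature", "wind_speed"]
mean_imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

for column in columns_to_impute:
    missing_before = toy[column].isnull().sum()
    # fit_transform returns a 2-D array; ravel() turns it back into one column
    toy[column] = mean_imputer.fit_transform(toy[[column]]).ravel()
    missing_after = toy[column].isnull().sum()
    print(f"missing values in {column}: before={missing_before}, after={missing_after}")
```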


In [29]:
train.isnull().any()

building_id           False
meter                 False
meter_reading         False
site_id               False
primary_use           False
square_feet           False
year_built            False
floor_count           False
air_temperature       False
cloud_coverage        False
dew_temperature       False
precip_depth_1_hr     False
sea_level_pressure    False
wind_direction        False
wind_speed            False
age                   False
dtype: bool

Removing timestamp for the model.


In [0]:
train = train.drop(columns=['timestamp'])

**Phase-3: Splitting test and train data**

Train = 85%
Test  = 15%

In [0]:
from sklearn.model_selection import train_test_split

#train = train.iloc[0:1000000,]
train_set, test_set = train_test_split(train, test_size=0.15)


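# 'meter' is the meter-type code (0-3); the energy reading itself is the 'meter_reading' column,
# which stays among the features with this choice of target
target = 'meter'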
features = list(train.columns)
features = [f for f in features if f!=target]


X_tr = train_set[features]
y_tr = train_set[[target]]

X_te = test_set[features]
y_te = test_set[[target]]



Scaling the training data

In [0]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)

In [0]:
#pdp.ProfileReport(train.iloc[:20000])

In [0]:
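#cost function
def rmse(predicted, test):
  '''print and return the root mean squared error between predictions and true values'''
  mse = mean_squared_error(test, predicted)
  score = np.sqrt(mse)
  print(score)
  # return the value as well, so that assignments such as train_rmse = rmse(...) are not None
  return score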
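For reference, the root mean squared error is the square root of the average squared difference between the true and predicted values. The sketch below computes it directly with NumPy on two made-up values; `rmse_value` is a hypothetical helper name chosen so that it does not clash with the `rmse` function above.

```python
import numpy as np

def rmse_value(predicted, actual):
    """Root mean squared error computed directly from its definition."""
    predicted = np.asarray(predicted, dtype=float).ravel()
    actual = np.asarray(actual, dtype=float).ravel()
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Errors are -1 and 3, so the mean squared error is (1 + 9) / 2 = 5.
example = rmse_value([2.0, 4.0], [1.0, 7.0])
print(example)
```

Because the errors are squared before averaging, a few large misses dominate the score, which matters for a target with a long right tail.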

**Phase-4: Model selection**



> We chose 4 models:
1. Linear Regression
2. Ridge
3. Light Gradient Boosting Machine (LightGBM, for regression)
4. Neural network regressor (MLPRegressor)




In [36]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from time import time

lin_reg = LinearRegression()
start = time()
lin_reg.fit(X_tr, np.ravel(y_tr))
print("lin_reg took %.2f seconds for candidates"
      " parameter settings." % (time() - start))

pred_train= lin_reg.predict(X_tr)
pred_test = lin_reg.predict(X_te)

lin_reg took 0.28 seconds for candidates parameter settings.


Training and test error for detecting overfitting or underfitting.

In [40]:
print("Train rmse:")
train_rmse = rmse(pred_train, y_tr)
print("\nTest rmse:")
test_rmse = rmse(pred_test, y_te)  

Train rmse:
0.8716823561554352

Test rmse:
0.8706234603590223


Ridge Model

In [41]:
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV


param_grid = [{'alpha': [0.600,0.69,0.699,0.7, 0.705, 0.777,0.799, 0.8, 0.889,0.899, 0.9,]}]
scoring_method = 'r2'

grid_search_rr = GridSearchCV(Ridge(), param_grid, cv=5, scoring='neg_mean_squared_error', verbose= 2)
start = time()
grid_search_rr.fit(X_tr, y_tr)
print("grid_search_rr took %.2f seconds"
      " parameter settings." % (time() - start))

Fitting 3 folds for each of 11 candidates, totalling 33 fits
[CV] alpha=0.6 .......................................................


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV] ........................................ alpha=0.6, total=   0.2s
[CV] alpha=0.6 .......................................................


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s remaining:    0.0s


[CV] ........................................ alpha=0.6, total=   0.2s
[CV] alpha=0.6 .......................................................
[CV] ........................................ alpha=0.6, total=   0.2s
[CV] alpha=0.69 ......................................................
[CV] ....................................... alpha=0.69, total=   0.2s
[CV] alpha=0.69 ......................................................
[CV] ....................................... alpha=0.69, total=   0.2s
[CV] alpha=0.69 ......................................................
[CV] ....................................... alpha=0.69, total=   0.2s
[CV] alpha=0.699 .....................................................
[CV] ...................................... alpha=0.699, total=   0.2s
[CV] alpha=0.699 .....................................................
[CV] ...................................... alpha=0.699, total=   0.2s
[CV] alpha=0.699 .....................................................
[CV] .

[Parallel(n_jobs=1)]: Done  33 out of  33 | elapsed:    7.0s finished


Compare the training and test errors of the Ridge model to check for overfitting or underfitting.

In [46]:
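print("Best Params: ")
print(grid_search_rr.best_params_)
# best_score_ is the negated mean squared error, so negate it before taking
# the square root to get the cross-validated RMSE.
print("sqrt: ")
print(np.sqrt(-grid_search_rr.best_score_))
print("Best Score: ")
print(grid_search_rr.best_score_)

# Evaluate the best estimator, refit on the full training set, on the test split.
rr_model = grid_search_rr.best_estimator_
pred_ridge = rr_model.predict(X_te)

print("\n**Test rmse:")
rmse(pred_ridge, y_te)

# Reuse the same variable for the training-set predictions.
pred_ridge = rr_model.predict(X_tr)
print("\n**Train rmse:")
rmse(pred_ridge, y_tr)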

Best Params: 
{'alpha': 0.9}
sqrt: 
0.8717186788912231
Best Score: 
-0.7598934551278593

**Test rmse:
0.870623431363034

**Train rmse:
0.871682356236957
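
The selected `alpha` of 0.9 is the largest value in the grid, which often means the search range is too narrow to locate the optimum. Below is a minimal sketch of a wider, log-spaced search. It assumes synthetic data from `make_regression` as a stand-in for `X_tr`, `y_tr`; the names `X_demo`, `y_demo`, `alphas`, and `search` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the notebook would pass X_tr, y_tr instead.
X_demo, y_demo = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

# Log-spaced candidates cover several orders of magnitude: 0.001 ... 1000.
alphas = np.logspace(-3, 3, 7)
search = GridSearchCV(Ridge(), {'alpha': alphas}, cv=3, scoring='neg_mean_squared_error')
search.fit(X_demo, y_demo)

# Convert the negated MSE of every candidate into a cross-validated RMSE.
cv_rmse = np.sqrt(-search.cv_results_['mean_test_score'])
for a, r in zip(alphas, cv_rmse):
    print(f"alpha={a:g}  cv_rmse={r:.4f}")

# If the winner sits on the boundary, the grid probably needs to be extended.
best_alpha = search.best_params_['alpha']
at_edge = best_alpha in (alphas[0], alphas[-1])
print("best alpha:", best_alpha, "| at grid edge:", at_edge)
```

If the best value still lands on an edge, extending the range in that direction is the usual next step before refining with a finer grid.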


LightGBM Model

In [0]:
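import lightgbm as lgb

# Wrap the train and validation splits in LightGBM Dataset objects.
lgb_train = lgb.Dataset(X_tr, y_tr)
lgb_val = lgb.Dataset(X_te, y_te)

params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': 'rmse',           # 'rmsle' is not a built-in LightGBM metric
    'max_depth': 8,
    'num_leaves': 77,
    'learning_rate': 0.09,
    'min_samples_split': 700,   # scikit-learn name; not a LightGBM parameter
    'verbose': 10,
    'subsample': 0.8,
    'random_state': 8,
    'max_features': 7,          # scikit-learn name; not a LightGBM parameter
    'num_iterations': 3500,     # 'n_estimators' is an alias, so only one is kept
}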


lgbm_model = lgb.train(params, train_set = lgb_train, valid_sets = lgb_val, verbose_eval=1)




In [0]:
test_pred_lgb = lgbm_model.predict(X_te)
train_pred_lgb = lgbm_model.predict(X_tr)

In [178]:
print("\n**Train rmse:")
rmse(train_pred_lgb, y_tr)

print("\n**Test rmse:")
test_rmse_lgb=rmse(test_pred_lgb, y_te)


**Train rmse:
0.24226924729675892

**Test rmse:
0.29156041425987694


Neural Network Model: MLPRegressor 

In [0]:
from sklearn.neural_network import MLPRegressor
# verbose expects a boolean; learning_rate only takes effect with solver='sgd'.
MLP_reg = MLPRegressor(activation='relu', max_iter=15, verbose=True, hidden_layer_sizes=(5,),
                       solver='adam', learning_rate='adaptive')


In [0]:
param_grid = {'alpha': [0.001,0.005,0.0075, 0.008, 0.009]
              }
grid_search_MLPR = GridSearchCV(MLP_reg, param_grid,  cv=5, scoring='neg_mean_squared_error', verbose = 2)

In [61]:
start = time()
grid_search_MLPR.fit(X_tr, y_tr)
print("grid_search_MLPR took %.2f seconds"
      " parameter settings." % (time() - start))

grid_search

Fitting 5 folds for each of 4 candidates, totalling 20 fits
[CV] alpha=0.001 .....................................................


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Iteration 1, loss = 0.38873593
Iteration 2, loss = 0.34305503
Iteration 3, loss = 0.32364677
Iteration 4, loss = 0.31374757
Iteration 5, loss = 0.30907719
Iteration 6, loss = 0.30540030
Iteration 7, loss = 0.30297993
Iteration 8, loss = 0.30094266
Iteration 9, loss = 0.29743212
Iteration 10, loss = 0.29541430
Iteration 11, loss = 0.29444251
Iteration 12, loss = 0.29329471
Iteration 13, loss = 0.29244844
Iteration 14, loss = 0.29205906
Iteration 15, loss = 0.29187237
[CV] ...................................... alpha=0.001, total=  12.8s
[CV] alpha=0.001 .....................................................


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   12.8s remaining:    0.0s


Iteration 1, loss = 0.41091447
Iteration 2, loss = 0.35993593
Iteration 3, loss = 0.33916680
Iteration 4, loss = 0.32338423
Iteration 5, loss = 0.31340696
Iteration 6, loss = 0.30596808
Iteration 7, loss = 0.30170923
Iteration 8, loss = 0.29918307
Iteration 9, loss = 0.29775014
Iteration 10, loss = 0.29666650
Iteration 11, loss = 0.29595815
Iteration 12, loss = 0.29533380
Iteration 13, loss = 0.29503780
Iteration 14, loss = 0.29467755
Iteration 15, loss = 0.29445425
[CV] ...................................... alpha=0.001, total=  12.6s
[CV] alpha=0.001 .....................................................
Iteration 1, loss = 0.40585525
Iteration 2, loss = 0.35627555
Iteration 3, loss = 0.33034366
Iteration 4, loss = 0.31657454
Iteration 5, loss = 0.30913075
Iteration 6, loss = 0.30557666
Iteration 7, loss = 0.30363233
Iteration 8, loss = 0.30256899
Iteration 9, loss = 0.30213105
Iteration 10, loss = 0.30171580
Iteration 11, loss = 0.30156410
Iteration 12, loss = 0.30136618
Iteration 13

[Parallel(n_jobs=1)]: Done  20 out of  20 | elapsed:  4.2min finished


Iteration 1, loss = 0.38974316
Iteration 2, loss = 0.34661094
Iteration 3, loss = 0.32434640
Iteration 4, loss = 0.31347257
Iteration 5, loss = 0.30760827
Iteration 6, loss = 0.30420281
Iteration 7, loss = 0.30012482
Iteration 8, loss = 0.29596266
Iteration 9, loss = 0.29329561
Iteration 10, loss = 0.29127001
Iteration 11, loss = 0.29040038
Iteration 12, loss = 0.28991588
Iteration 13, loss = 0.28954651
Iteration 14, loss = 0.28929117
Iteration 15, loss = 0.28890891
grid_search_MLPR took 270.47 seconds parameter settings.


In [195]:
#model=grid_search_MLPR.best_estimator_
model = MLPRegressor(activation='relu', alpha=0.05, batch_size='auto', beta_1=0.9,
             beta_2=0.999, early_stopping=False, epsilon=1e-08,
             hidden_layer_sizes=50, learning_rate='adaptive',
             learning_rate_init=0.001, max_iter=100, momentum=0.9,
             n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
             random_state=1000, shuffle=True, solver='adam', tol=0.0001,
             validation_fraction=0.15, verbose=True, warm_start=False)

model.fit(X_tr,y_tr)

Iteration 1, loss = 0.35332349
Iteration 2, loss = 0.32150534
Iteration 3, loss = 0.30796254
Iteration 4, loss = 0.30155866
Iteration 5, loss = 0.29796846
Iteration 6, loss = 0.29603591
Iteration 7, loss = 0.29463402
Iteration 8, loss = 0.29397760
Iteration 9, loss = 0.29286618
Iteration 10, loss = 0.29266735
Iteration 11, loss = 0.29193542
Iteration 12, loss = 0.29180998
Iteration 13, loss = 0.29140392
Iteration 14, loss = 0.29109353
Iteration 15, loss = 0.29081943
Iteration 16, loss = 0.29055331
Iteration 17, loss = 0.29031810
Iteration 18, loss = 0.29039018
Iteration 19, loss = 0.29019613
Iteration 20, loss = 0.29006317
Iteration 21, loss = 0.28971013
Iteration 22, loss = 0.28966246
Iteration 23, loss = 0.28934025
Iteration 24, loss = 0.28938069
Iteration 25, loss = 0.28925457
Iteration 26, loss = 0.28927816
Iteration 27, loss = 0.28944324
Iteration 28, loss = 0.28872742
Iteration 29, loss = 0.28875200
Iteration 30, loss = 0.28861450
Iteration 31, loss = 0.28868305
Iteration 32, los

MLPRegressor(activation='relu', alpha=0.05, batch_size='auto', beta_1=0.9,
             beta_2=0.999, early_stopping=False, epsilon=1e-08,
             hidden_layer_sizes=50, learning_rate='adaptive',
             learning_rate_init=0.001, max_iter=100, momentum=0.9,
             n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
             random_state=1000, shuffle=True, solver='adam', tol=0.0001,
             validation_fraction=0.15, verbose='true', warm_start=False)

In [196]:
pred_mlrp = model.predict(X_tr)
print("\n**Train rmse:")
rmse(pred_mlrp, y_tr)

pred_mlrp = model.predict(X_te)

print("\n**Test rmse:")
rmse(pred_mlrp, y_te)


**Train rmse:
0.7329422470347852

**Test rmse:
0.7324916614472208
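
To compare the four models side by side, the RMSE values printed in the cells above can be collected into one table. This is a small sketch: the numbers are copied from the outputs and rounded to four decimals, and the `results` frame and its column names are illustrative.

```python
import pandas as pd

# RMSE values copied from the cell outputs above (rounded to 4 decimals).
results = pd.DataFrame(
    {
        "train_rmse": [0.8717, 0.8717, 0.2423, 0.7329],
        "test_rmse": [0.8706, 0.8706, 0.2916, 0.7325],
    },
    index=["LinearRegression", "Ridge", "LightGBM", "MLPRegressor"],
)

# A positive gap means the model does worse on the test split than on training data.
results["gap"] = results["test_rmse"] - results["train_rmse"]
print(results.sort_values("test_rmse"))
```

On these numbers LightGBM has the lowest test error and also the largest train/test gap, while the linear, Ridge, and MLP models show almost no gap at a much higher error level.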
