### DecisionTree_Power Distribution 

#### Problem Statement

Build a regression model that predicts household power consumption from the following measurements:

1. date: Date in format dd/mm/yyyy

2. time: Time in format hh:mm:ss

3. global_active_power: household global minute-averaged active power (in kilowatt)

4. global_reactive_power: household global minute-averaged reactive power (in kilowatt)

5. voltage: minute-averaged voltage (in volt)

6. global_intensity: household global minute-averaged current intensity (in ampere)

7. sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).

8. sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.

9. sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.
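
The model below is trained on `Year`, `Month`, `Day`, `Hours` and `Minute` columns rather than on the raw `date` and `time` strings. The feature engineering itself lives in the linked notebook and is not shown here, so the following is only a sketch of how such columns could be derived with pandas. The two sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row sample in the raw layout described above (values are made up).
raw = pd.DataFrame({
    "date": ["16/12/2006", "11/02/2010"],
    "time": ["17:24:00", "03:27:00"],
    "global_active_power": [4.216, 0.326],
})

# Combine date and time, parsing with the stated dd/mm/yyyy and hh:mm:ss formats.
stamp = pd.to_datetime(raw["date"] + " " + raw["time"], format="%d/%m/%Y %H:%M:%S")

# Split the timestamp into calendar columns like the ones used as model inputs.
raw["Year"] = stamp.dt.year
raw["Month"] = stamp.dt.month
raw["Day"] = stamp.dt.day
raw["Hours"] = stamp.dt.hour
raw["Minute"] = stamp.dt.minute

print(raw[["Year", "Month", "Day", "Hours", "Minute"]])
```

Passing an explicit `format` avoids pandas guessing month-first dates for values such as `11/02/2010`.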

#### Approach To Model

1. Data collection
2. EDA
3. Feature Engineering
4. Modelling
5. Hyperparameter tuning
6. Evaluating Metrics

The EDA and feature engineering for the power distribution data are in this notebook: https://github.com/Murali423/ML-assingment/blob/master/Householdpower-assignment.ipynb

They are also summarized in this LinkedIn post: https://www.linkedin.com/posts/murali-divya-teja-gummadidala-machine-learning_linearridge-assignment-activity-6994367928766722048-U-LC?utm_source=share&utm_medium=member_desktop

In [2]:
import pandas as pd
import numpy as np

#visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [3]:
df = pd.read_csv(r'C:\Users\HP\Documents\ML-Assignment\ML-assingment\Regression\Household_FE.csv')
df.head()

Unnamed: 0,Global_active_power,Global_reactive_power,Voltage,Sub_metering_1,Sub_metering_2,Sub_metering_3,Year,Month,Day,Hours,Minute,Total_metering,Energy_minute
0,0.197943,0.049246,239.13,-0.0,-0.0,-0.0,2007,10,15,23,57,-0.0,4.133333
1,0.240113,-0.0,245.87,-0.0,-0.0,0.598538,2010,2,11,3,27,0.632205,4.3
2,0.435888,0.139043,241.34,-0.0,0.41599,-0.0,2008,5,25,17,3,0.950787,11.1
3,0.188924,0.095637,242.79,-0.0,-0.0,0.598538,2009,8,9,11,59,0.632205,2.9
4,0.252441,0.074875,242.65,-0.0,-0.0,-0.0,2008,2,19,22,53,-0.0,5.666667


In [5]:
df.columns

Index(['Global_active_power', 'Global_reactive_power', 'Voltage',
       'Sub_metering_1', 'Sub_metering_2', 'Sub_metering_3', 'Year', 'Month',
       'Day', 'Hours', 'Minute', 'Total_metering', 'Energy_minute'],
      dtype='object')

In [4]:
from sklearn.model_selection import train_test_split

In [6]:
X = df.drop(['Energy_minute'],axis=1)
X.columns

Index(['Global_active_power', 'Global_reactive_power', 'Voltage',
       'Sub_metering_1', 'Sub_metering_2', 'Sub_metering_3', 'Year', 'Month',
       'Day', 'Hours', 'Minute', 'Total_metering'],
      dtype='object')

In [7]:
y = df['Energy_minute']


In [8]:
y.head()

0     4.133333
1     4.300000
2    11.100000
3     2.900000
4     5.666667
Name: Energy_minute, dtype: float64

In [9]:
X.head()

Unnamed: 0,Global_active_power,Global_reactive_power,Voltage,Sub_metering_1,Sub_metering_2,Sub_metering_3,Year,Month,Day,Hours,Minute,Total_metering
0,0.197943,0.049246,239.13,-0.0,-0.0,-0.0,2007,10,15,23,57,-0.0
1,0.240113,-0.0,245.87,-0.0,-0.0,0.598538,2010,2,11,3,27,0.632205
2,0.435888,0.139043,241.34,-0.0,0.41599,-0.0,2008,5,25,17,3,0.950787
3,0.188924,0.095637,242.79,-0.0,-0.0,0.598538,2009,8,9,11,59,0.632205
4,0.252441,0.074875,242.65,-0.0,-0.0,-0.0,2008,2,19,22,53,-0.0


In [10]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.33,random_state=42)
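
To make the effect of `test_size=0.33` and `random_state=42` concrete, here is a small sketch on a synthetic 100-row frame (the frame and the names `demo_X`, `demo_y` are assumptions, not the household data). It shows the resulting split sizes and that a fixed `random_state` reproduces the same split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 2 features (not the household dataset).
demo_X = pd.DataFrame({"a": np.arange(100), "b": np.arange(100) * 2.0})
demo_y = pd.Series(np.arange(100) * 0.5, name="target")

# Same arguments as the cell above: 33% of the rows are held out for testing.
Xa_train, Xa_test, ya_train, ya_test = train_test_split(
    demo_X, demo_y, test_size=0.33, random_state=42
)
print(Xa_train.shape, Xa_test.shape)

# Repeating the call with the same random_state yields the same rows.
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    demo_X, demo_y, test_size=0.33, random_state=42
)
print(Xa_test.index.equals(Xb_test.index))
```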

In [11]:
X_train.head()

Unnamed: 0,Global_active_power,Global_reactive_power,Voltage,Sub_metering_1,Sub_metering_2,Sub_metering_3,Year,Month,Day,Hours,Minute,Total_metering
23990,0.167505,0.077479,236.26,-0.0,-0.0,-0.0,2007,4,27,6,12,-0.0
8729,0.233211,-0.0,239.62,-0.0,-0.0,0.598538,2009,4,23,12,33,0.632205
3451,0.615852,0.114589,242.61,-0.0,0.356905,1.675643,2010,9,18,20,42,2.076484
2628,0.190225,-0.0,241.53,-0.0,-0.0,-0.0,2008,2,10,4,28,-0.0
38352,0.625605,-0.0,235.47,-0.0,-0.0,1.646295,2008,12,2,9,58,2.007244


In [12]:
X_train.shape

(33500, 12)

In [13]:
y_train.shape

(33500,)

In [14]:
from sklearn.tree import DecisionTreeRegressor

In [15]:
model = DecisionTreeRegressor()

In [16]:
model.fit(X_train,y_train)

In [25]:
from sklearn import tree

In [26]:
fig = plt.figure(figsize=(20,6))

<Figure size 1440x432 with 0 Axes>

In [27]:
tree.plot_tree(model,filled=True)

[Text(0.45800109611166523, 0.9833333333333333, 'X[0] <= 0.629\nsquared_error = 90.546\nsamples = 33500\nvalue = 9.314'),
 Text(0.1898135159859659, 0.95, 'X[0] <= 0.307\nsquared_error = 22.467\nsamples = 27265\nvalue = 6.316'),
 Text(0.04526725450325768, 0.9166666666666666, 'X[0] <= 0.211\nsquared_error = 2.074\nsamples = 14484\nvalue = 4.073'),
 Text(0.01904303112890176, 0.8833333333333333, 'X[0] <= 0.16\nsquared_error = 0.963\nsamples = 6115\nvalue = 2.91'),
 Text(0.009705691351190223, 0.85, 'X[11] <= 0.316\nsquared_error = 0.515\nsamples = 1812\nvalue = 1.885'),
 Text(0.005155325128816591, 0.8166666666666667, 'X[0] <= 0.12\nsquared_error = 0.263\nsamples = 962\nvalue = 2.315'),
 Text(0.002360218030064302, 0.7833333333333333, 'X[0] <= 0.096\nsquared_error = 0.074\nsamples = 445\nvalue = 1.845'),
 Text(0.0009879487288676915, 0.75, 'X[0] <= 0.085\nsquared_error = 0.031\nsamples = 164\nvalue = 1.557'),
 Text(0.0005450751607545885, 0.7166666666666667, 'X[0] <= 0.077\nsquared_error = 0.001

Error in callback <function flush_figures at 0x0000029A9B6A5D30> (for post_execute):


KeyboardInterrupt: 
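
Drawing every node of an unrestricted tree fitted on 33,500 rows is very slow, which is the likely reason the rendering had to be interrupted. Creating the figure in one cell and plotting in another also leaves the first figure empty under `%matplotlib inline`. A possible workaround, sketched here on synthetic data (the names `X_demo`, `y_demo`, `demo_model` and the feature labels are assumptions), is to create the axes and call `plot_tree` together, limiting the drawing with its `max_depth` argument.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.tree import DecisionTreeRegressor, plot_tree

# Synthetic stand-in data (not the household dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(500, 3))
y_demo = 10 * X_demo[:, 0] + rng.normal(0, 0.1, size=500)

# The tree itself is still grown to full depth.
demo_model = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)

# Only the top levels are drawn; deeper nodes are collapsed into placeholders.
fig, ax = plt.subplots(figsize=(20, 6))
annotations = plot_tree(
    demo_model,
    max_depth=2,
    filled=True,
    feature_names=["f0", "f1", "f2"],
    ax=ax,
)
print(len(annotations), "boxes drawn for", demo_model.tree_.node_count, "tree nodes")
```

In a notebook the `matplotlib.use("Agg")` line can be dropped so the figure shows inline. Note that `max_depth` here only limits what is drawn, not how the model was fitted.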

In [17]:
model.score(X_train,y_train)

1.0
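
A training score of 1.0 is expected from a `DecisionTreeRegressor` with default settings, because the tree keeps splitting until it reproduces the training targets. The hyperparameter tuning step listed in the approach is not shown in this notebook; the sketch below is one way it could look, using `GridSearchCV` on synthetic data. The grid values and the names `X_demo`, `y_demo`, `search` are assumptions, not the author's settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (not the household dataset).
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 1, size=(600, 4))
y_demo = 10 * X_demo[:, 0] + 3 * X_demo[:, 1] + rng.normal(0, 0.5, size=600)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.33, random_state=42)

# An unrestricted tree memorizes the training targets.
full_tree = DecisionTreeRegressor(random_state=42).fit(X_tr, y_tr)
print("full tree, train R2:", full_tree.score(X_tr, y_tr))
print("full tree, test R2: ", full_tree.score(X_te, y_te))

# Search over tree size with 5-fold cross-validation on the training split only.
param_grid = {"max_depth": [3, 5, 8, None], "min_samples_leaf": [1, 5, 20]}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42), param_grid, cv=5, scoring="r2"
)
search.fit(X_tr, y_tr)

print("best parameters:", search.best_params_)
print("tuned tree, test R2:", search.score(X_te, y_te))
```

Comparing the train and test scores of the full tree against the tuned one gives a rough picture of how much of the perfect training fit is memorization.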

In [18]:
y_pred = model.predict(X_test)

In [19]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [21]:
print(mean_squared_error(y_test,y_pred))
print(mean_absolute_error(y_test,y_pred))
print(np.sqrt(mean_squared_error(y_test,y_pred)))

0.9216680134680136
0.14241212121212266
0.9600354230277202
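
The three numbers above are, in order, MSE, MAE and RMSE. The RMSE (about 0.96) is much larger than the MAE (about 0.14), which suggests that most predictions are close and a small number of large errors dominate the squared metric. As a reminder of what each metric computes, here is a minimal sketch on four made-up values, checked against the scikit-learn functions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up targets and predictions, for illustration only.
y_true = np.array([4.0, 5.5, 11.0, 3.0])
y_hat = np.array([4.5, 5.0, 10.0, 3.0])

errors = y_true - y_hat
mse = np.mean(errors ** 2)      # mean of squared errors
mae = np.mean(np.abs(errors))   # mean of absolute errors
rmse = np.sqrt(mse)             # square root of MSE, in the units of the target

print(mse, mae, rmse)

# The hand-computed values agree with scikit-learn.
print(np.isclose(mse, mean_squared_error(y_true, y_hat)))
print(np.isclose(mae, mean_absolute_error(y_true, y_hat)))
```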


In [23]:
score = r2_score(y_test,y_pred)
print(score)

0.9897545166433215


In [24]:
# Adjusted R2, computed here with the full dataset size len(y); using len(y_test) would match the test-set score more strictly
adjR = 1 - ( 1-score ) * ( len(y) - 1 ) / ( len(y) - X.shape[1] - 1 )
print(adjR)

0.9897520570878314
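
The adjusted R² formula is 1 - (1 - R²)(n - 1)/(n - p - 1), with n samples and p features. The cell above takes n from the whole dataset, while the R² itself was measured on the test split. The sketch below wraps the formula in a helper (`adjusted_r2` is a hypothetical name) and compares both choices. The row counts 50,000 and 16,500 are inferred from the 33,500 training rows and `test_size=0.33`, so treat them as assumptions.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R2 for a score measured on n_samples rows with n_features predictors."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


r2 = 0.9897545166433215  # test-set R2 reported above

# n taken from the full dataset (inferred size), as in the cell above.
full = adjusted_r2(r2, n_samples=50000, n_features=12)

# n taken from the test split only (inferred size).
test_only = adjusted_r2(r2, n_samples=16500, n_features=12)

print(full)
print(test_only)
```

With 12 features and thousands of rows the correction is tiny either way, so the choice of n barely changes the reported value here.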


### The model reaches an R² of about 0.99 on the test data

### Thank you