# **Artificial Neural Network in R for Regression**

## **Data Set Information:**
The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), when the power plant was set to work with full load. The features are hourly averages of four ambient variables, Temperature (T), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V), which are used to predict the net hourly electrical energy output (EP) of the plant. In the data file these appear as the columns `AT`, `AP`, `RH`, `V` and `PE`.

**Attribute Information:**

The features are hourly averages of the ambient variables; the last item is the target:
* Temperature (T) in the range 1.81°C to 37.11°C
* Ambient Pressure (AP) in the range 992.89 to 1033.30 millibar
* Relative Humidity (RH) in the range 25.56% to 100.16%
* Exhaust Vacuum (V) in the range 25.36 to 81.56 cm Hg
* Net hourly electrical energy output (EP) in the range 420.26 to 495.76 MW

## **Part - 1: Data Preprocessing**

### **Importing the dataset**

In [1]:
ds = read.csv('/content/Power Plant Data.csv')
head(ds)

     AT     V      AP    RH     PE
  <dbl> <dbl>   <dbl> <dbl>  <dbl>
1 14.96 41.76 1024.07 73.17 463.26
2 25.18 62.96 1020.04 59.08 444.37
3  5.11 39.40 1012.16 92.14 488.56
4 20.86 57.32 1010.24 76.64 446.48
5 10.82 37.50 1009.23 96.62 473.90
6 26.27 59.44 1012.23 58.77 443.67


In [2]:
summary(ds)

       AT              V               AP               RH        
 Min.   : 1.81   Min.   :25.36   Min.   : 992.9   Min.   : 25.56  
 1st Qu.:13.51   1st Qu.:41.74   1st Qu.:1009.1   1st Qu.: 63.33  
 Median :20.34   Median :52.08   Median :1012.9   Median : 74.97  
 Mean   :19.65   Mean   :54.31   Mean   :1013.3   Mean   : 73.31  
 3rd Qu.:25.72   3rd Qu.:66.54   3rd Qu.:1017.3   3rd Qu.: 84.83  
 Max.   :37.11   Max.   :81.56   Max.   :1033.3   Max.   :100.16  
       PE       
 Min.   :420.3  
 1st Qu.:439.8  
 Median :451.6  
 Mean   :454.4  
 3rd Qu.:468.4  
 Max.   :495.8  

In [3]:
#find number of rows with missing values
sum(!complete.cases(ds))
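
To make clear what this check counts, here is a minimal sketch on a small made-up data frame (the `toy` values are illustrative assumptions, not rows taken from the power plant file). `complete.cases()` returns `TRUE` for rows without any `NA`, so negating and summing it gives the number of rows with at least one missing value.

```r
# Toy data frame (hypothetical values) with missing entries in two rows
toy <- data.frame(AT = c(14.96, NA, 5.11),
                  V  = c(41.76, 62.96, NA),
                  PE = c(463.26, 444.37, 488.56))

# Number of rows containing at least one NA
n_incomplete <- sum(!complete.cases(toy))
print(n_incomplete)

# Missing values per column, useful for locating the gaps
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# One possible remedy (an assumption, not a step in this notebook): drop those rows
toy_clean <- na.omit(toy)
```

If the count is zero for `ds`, no imputation or row removal is needed before splitting.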

### **Splitting the dataset into the Training set and Test set**

In [4]:
install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(ds$PE, SplitRatio = 4/5)
head(split)

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)



In [5]:
train_set = subset(ds, split == TRUE)
test_set = subset(ds, split == FALSE)
print(head(train_set,5))

print(head(test_set,3))

     AT     V      AP    RH     PE
1 14.96 41.76 1024.07 73.17 463.26
2 25.18 62.96 1020.04 59.08 444.37
3  5.11 39.40 1012.16 92.14 488.56
6 26.27 59.44 1012.23 58.77 443.67
7 15.89 43.96 1014.02 75.24 467.35
     AT     V      AP    RH     PE
4 20.86 57.32 1010.24 76.64 446.48
5 10.82 37.50 1009.23 96.62 473.90
8  9.48 44.71 1019.12 66.43 478.42
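
For readers without `caTools`, the same 80/20 idea can be sketched in base R. This is a plain random split with `sample()`, not a reimplementation of `sample.split()`, and the `toy` data frame is a synthetic stand-in for `ds`.

```r
set.seed(123)

# Synthetic stand-in for `ds` (hypothetical values within the documented ranges)
toy <- data.frame(AT = runif(100, 1.81, 37.11),
                  PE = runif(100, 420.26, 495.76))

# Draw 80% of the row indices at random for training
n_train   <- floor(0.8 * nrow(toy))
train_idx <- sample(seq_len(nrow(toy)), size = n_train)

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]   # every row not selected for training

cat("train rows:", nrow(toy_train), " test rows:", nrow(toy_test), "\n")
```

Because the test rows are selected by negative indexing, no row can end up in both sets.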


## **Part 2 - Building & Training the ANN**

In [6]:
install.packages("h2o")
library(h2o)
h2o.init(nthreads = -1)
model = h2o.deeplearning(y = 'PE',
                         training_frame = as.h2o(train_set),
                         activation = 'Rectifier',
                         hidden = c(5,5),
                         epochs = 100,
                         stopping_metric = "MSE", # other regression options include "RMSE", "MAE" and "deviance"
                         train_samples_per_iteration = -2)

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

also installing the dependency ‘RCurl’



----------------------------------------------------------------------

Your next step is to start H2O:
    > h2o.init()

For H2O package documentation, ask for help:
    > ??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit https://docs.h2o.ai

----------------------------------------------------------------------



Attaching package: ‘h2o’


The following objects are masked from ‘package:stats’:

    cor, sd, var


The following objects are masked from ‘package:base’:

    &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
    colnames<-, ifelse, is.character, is.factor, is.numeric, log,
    log10, log1p, log2, round, signif, trunc





H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /tmp/RtmpbrGEff/file1257ca93818/h2o_UnknownUser_started_from_r.out
    /tmp/RtmpbrGEff/file1253ecb9e8f/h2o_UnknownUser_started_from_r.err


Starting H2O JVM and connecting: .... Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         2 seconds 764 milliseconds 
    H2O cluster timezone:       Etc/UTC 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.36.0.3 
    H2O cluster version age:    1 month and 4 days  
    H2O cluster name:           H2O_started_from_R_root_ppf172 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   3.17 GB 
    H2O cluster total cores:    2 
    H2O cluster allowed cores:  2 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    R Version:   

## **Part 3 - Making the predictions and evaluating the model**

### **Predicting the Test set results**

In [7]:
y_pred = h2o.predict(model, newdata = as.h2o(test_set[-5]))

y_pred = as.data.frame(y_pred)




In [8]:
summary(model)

Model Details:

H2ORegressionModel: deeplearning
Model Key:  DeepLearning_model_R_1647874682704_1 
Status of Neuron Layers: predicting PE, regression, gaussian distribution, Quadratic loss, 61 weights/biases, 4.4 KB, 765,400 training samples, mini-batch size 1
  layer units      type dropout       l1       l2 mean_rate rate_rms momentum
1     1     4     Input  0.00 %       NA       NA        NA       NA       NA
2     2     5 Rectifier  0.00 % 0.000000 0.000000  0.001753 0.000660 0.000000
3     3     5 Rectifier  0.00 % 0.000000 0.000000  0.016275 0.044116 0.000000
4     4     1    Linear      NA 0.000000 0.000000  0.001900 0.000975 0.000000
  mean_weight weight_rms mean_bias bias_rms
1          NA         NA        NA       NA
2   -0.133833   0.401025  0.514058 0.586826
3   -0.200495   0.803140  0.832801 0.288028
4   -0.203933   0.584445  0.103761 0.000000

H2ORegressionMetrics: deeplearning
** Reported on training data. **
** Metrics reported on full training frame **

MSE:  17.1261
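
The "61 weights/biases" figure in the summary can be checked by hand from the layer sizes. The sketch below assumes a fully connected network in which every unit has one bias, which matches the layer table above; the variable names are illustrative.

```r
# Layer sizes reported in the summary: 4 inputs, two hidden layers of 5 units, 1 output
layer_units <- c(4, 5, 5, 1)

n_in  <- head(layer_units, -1)  # units feeding each layer: 4, 5, 5
n_out <- tail(layer_units, -1)  # units in each layer:      5, 5, 1

weights_per_layer <- n_in * n_out  # 20, 25, 5
biases_per_layer  <- n_out         # 5, 5, 1

total_params <- sum(weights_per_layer + biases_per_layer)
print(total_params)

# Assumption: all 100 epochs ran over the full training frame
samples_per_epoch <- 765400 / 100
print(samples_per_epoch)
```

Under that assumption, the 765,400 training samples correspond to 7,654 rows per epoch, which is consistent with an 80% split of 9,568 rows.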

In [9]:
y_test = as.data.frame(test_set$PE)

### **Metrics**

In [10]:
d = y_pred - y_test
head(d)

      predict
        <dbl>
1  1.32983971
2 -0.08426849
3  1.64182828
4  4.85579652
5 -3.95473218
6  6.07932437


In [11]:
d = as.double(unlist(d))
d

In [12]:
mse = mean((d)^2)
mae = mean(abs(d))
rmse = sqrt(mse)
R2 = 1-(sum((d)^2)/sum((test_set$PE-mean(test_set$PE))^2))

cat(" MAE:", mae, "\n", "MSE:", mse, "\n", 
    "RMSE:", rmse, "\n", "R-squared:", R2)

 MAE: 3.170411 
 MSE: 17.36656 
 RMSE: 4.16732 
 R-squared: 0.939426
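
The four metrics above can be wrapped in a small helper so they are computed the same way every time. This is a sketch: `regression_metrics` is a hypothetical name, and the input vectors are toy values rather than model output.

```r
# Hypothetical helper: regression metrics from actual and predicted values
regression_metrics <- function(actual, predicted) {
  err    <- predicted - actual
  mse    <- mean(err^2)
  ss_res <- sum(err^2)                      # residual sum of squares
  ss_tot <- sum((actual - mean(actual))^2)  # total sum of squares
  c(MAE = mean(abs(err)), MSE = mse, RMSE = sqrt(mse), R2 = 1 - ss_res / ss_tot)
}

# Toy values (illustrative only)
actual    <- c(460, 445, 488, 447)
predicted <- c(462, 444, 485, 451)

m <- regression_metrics(actual, predicted)
print(round(m, 4))
```

With the notebook's objects, the call would presumably be `regression_metrics(test_set$PE, y_pred$predict)`.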

In [16]:
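#install.packages('hydroGOF')
library(hydroGOF)
# Store the results under distinct names so they do not shadow
# hydroGOF's mae(), mse() and rmse() functions
mae_gof = mae(y_pred$predict, test_set$PE)
mse_gof = mse(y_pred$predict, test_set$PE)
# Calculate RMSE
rmse_gof = rmse(y_pred$predict, test_set$PE)
cat(" MAE:", mae_gof, "\n", "MSE:", mse_gof, "\n", 
    "RMSE:", rmse_gof, "\n")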

 MAE: 3.170411 
 MSE: 17.36656 
 RMSE: 4.16732 


In [17]:
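# Correlation and covariance between predictions and observed values.
# Note: var(x, y) with two arguments returns the covariance of x and y,
# so "Variance" here reports the same value as "Covariance".
Correlation = cor(y_pred, test_set$PE)
Covariance = cov(y_pred, test_set$PE)
Variance = var(y_pred, test_set$PE)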
cat("Correlation:", Correlation, "\n", "Covariance:", Covariance, "\n", 
    "Variance:", Variance, "\n")

Correlation: 0.9700344 
 Covariance: 278.987 
 Variance: 278.987 
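
The identical Covariance and Variance values above follow from how `var()` treats a second argument. The sketch below illustrates this on toy vectors (not model output). It calls the `stats::` versions explicitly because loading `h2o` masks `cor`, `sd` and `var`.

```r
# Toy vectors (illustrative only)
x <- c(462, 444, 485, 451)   # stand-in for predictions
y <- c(460, 445, 488, 447)   # stand-in for observed values

# With two arguments, var() returns the covariance of x and y
two_arg_var <- stats::var(x, y)
covariance  <- stats::cov(x, y)

# With one argument, var() returns the variance of that vector
var_x <- stats::var(x)

# Pearson correlation is the covariance scaled by both standard deviations
correlation <- covariance / (stats::sd(x) * stats::sd(y))

cat("var(x, y):", two_arg_var, " cov(x, y):", covariance, "\n")
cat("var(x):", var_x, " cor(x, y):", correlation, "\n")
```

To report the spread of the predictions themselves, the one-argument form, for example `var(y_pred$predict)`, is likely what was intended.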


In [18]:
h2o.shutdown()

Are you sure you want to shutdown the H2O instance running at http://localhost:54321/ (Y/N)? Y
