# **Artificial Neural Network in R**

## **Part 1 - Data Preprocessing**

### **Importing the dataset**

In [1]:
ds = read.csv('/content/Churn_Modelling.csv')
head(ds)

,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
,<int>,<int>,<chr>,<int>,<chr>,<chr>,<int>,<int>,<dbl>,<int>,<int>,<int>,<dbl>,<int>
1,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
2,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
3,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
4,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
5,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0
6,6,15574012,Chu,645,Spain,Male,44,8,113755.78,2,1,0,149756.71,1


In [2]:
# Keep columns 4 to 14 (CreditScore through Exited); drop RowNumber, CustomerId and Surname
ds = ds[4:14]
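
Positional indexing such as `ds[4:14]` depends on the column order of the CSV. As a minimal sketch of an alternative, the identifier columns can be dropped by name. The toy data frame below is a stand-in built from a few values shown in the output above, and `toy`, `id_cols` and `toy_features` are illustrative names, not part of the notebook.

```r
# Toy data frame standing in for a few columns of Churn_Modelling.csv
toy <- data.frame(
  RowNumber   = 1:3,
  CustomerId  = c(15634602, 15647311, 15619304),
  Surname     = c("Hargrave", "Hill", "Onio"),
  CreditScore = c(619, 608, 502),
  Exited      = c(1, 0, 1)
)

# Drop the identifier columns by name instead of by position
id_cols <- c("RowNumber", "CustomerId", "Surname")
toy_features <- toy[, !(names(toy) %in% id_cols), drop = FALSE]

print(names(toy_features))
```

Selecting by name keeps working if the columns are later reordered.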

### **Encoding the categorical variables as factors**

In [3]:
ds$Geography = as.numeric(factor(ds$Geography,
                                      levels = c('France', 'Spain', 'Germany'),
                                      labels = c(1, 2, 3)))
ds$Gender = as.numeric(factor(ds$Gender,
                                   levels = c('Female', 'Male'),
                                   labels = c(1, 2)))
head(ds)

,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
,<int>,<dbl>,<dbl>,<int>,<int>,<dbl>,<int>,<int>,<int>,<dbl>,<int>
1,619,1,1,42,2,0.0,1,1,1,101348.88,1
2,608,2,1,41,1,83807.86,1,0,1,112542.58,0
3,502,1,1,42,8,159660.8,3,1,0,113931.57,1
4,699,1,1,39,1,0.0,2,0,0,93826.63,0
5,850,2,1,43,2,125510.82,1,1,1,79084.1,0
6,645,2,2,44,8,113755.78,2,1,0,149756.71,1
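
To see what the encoding step does, here is a small sketch on a toy vector (the values in `geo` are made up for illustration). `as.numeric()` on a factor returns the position of each value in `levels`, and a value that is missing from `levels` becomes `NA`.

```r
# Toy vector standing in for the Geography column
geo <- c("France", "Spain", "France", "Germany")

# The order of `levels` decides which integer each country receives
geo_factor <- factor(geo, levels = c("France", "Spain", "Germany"))
geo_codes <- as.numeric(geo_factor)
print(geo_codes)

# A value that is not listed in `levels` is encoded as NA
unseen <- as.numeric(factor("Italy", levels = c("France", "Spain", "Germany")))
print(unseen)
```

Checking for `NA` after encoding is a cheap way to catch categories that were not listed in `levels`.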


### **Splitting the dataset into the Training set and Test set**

In [4]:
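install.packages('caTools')
library(caTools)
set.seed(123)  # make the split reproducible
# sample.split keeps the ratio of Exited classes the same in both subsets
split = sample.split(ds$Exited, SplitRatio = 0.8)
training_set = subset(ds, split == TRUE)
test_set = subset(ds, split == FALSE)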
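
The idea behind a label-preserving split can be sketched in base R. This is not how `caTools` implements `sample.split`; `stratified_mask` is a hypothetical helper and the labels in `y` are toy data, used only to show that each class is sampled separately.

```r
set.seed(123)
y <- c(rep(0, 80), rep(1, 20))  # toy labels with 20% positives

# Hypothetical helper: mark roughly `ratio` of each class as training rows
stratified_mask <- function(y, ratio) {
  mask <- rep(FALSE, length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    n_train <- round(length(idx) * ratio)
    chosen <- idx[sample.int(length(idx), n_train)]
    mask[chosen] <- TRUE
  }
  mask
}

in_train <- stratified_mask(y, 0.8)
print(mean(y[in_train]))   # share of positives in the training part
print(mean(y[!in_train]))  # share of positives in the test part
```

Both parts keep the same share of positives as the full vector, which matters for an imbalanced target such as `Exited`.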


### **Feature Scaling**

In [5]:
# Column 11 is the target (Exited), so it is excluded from scaling
training_set[-11] = scale(training_set[-11])
test_set[-11] = scale(test_set[-11])
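
The cell above standardizes the test set with its own mean and standard deviation. A common alternative is to reuse the statistics computed on the training set, so that both sets go through the same transformation. The sketch below shows this on toy data; `train`, `test`, `center` and `spread` are illustrative names, and this is not what the notebook ran.

```r
# Toy training and test sets with two numeric features
train <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(10, 20, 30, 40))
test  <- data.frame(x1 = c(2, 5),       x2 = c(15, 50))

# scale() stores the statistics it used as attributes of its result
train_scaled <- scale(train)
center <- attr(train_scaled, "scaled:center")
spread <- attr(train_scaled, "scaled:scale")

# Apply the training statistics to the test set
test_scaled <- scale(test, center = center, scale = spread)
print(test_scaled)
```

With this approach the test columns are generally not exactly zero-mean, which is expected.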

## **Part 2 - Building & Training the ANN**

### **Fitting ANN to the Training set**

In [8]:
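install.packages("h2o")
library(h2o)
h2o.init(nthreads = -1)  # start a local H2O cluster using all available cores
model = h2o.deeplearning(y = 'Exited',
                         training_frame = as.h2o(training_set),
                         activation = 'Rectifier',   # ReLU activation
                         hidden = c(5,5),            # two hidden layers of 5 neurons each
                         epochs = 100,
                         train_samples_per_iteration = -2)  # -2 lets H2O auto-tune this value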




Attaching package: ‘h2o’


The following objects are masked from ‘package:stats’:

    cor, sd, var


The following objects are masked from ‘package:base’:

    &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
    colnames<-, ifelse, is.character, is.factor, is.numeric, log,
    log10, log1p, log2, round, signif, trunc





H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /tmp/RtmpLaWNQq/file3a78838be/h2o_UnknownUser_started_from_r.out
    /tmp/RtmpLaWNQq/file3a7e30fc30/h2o_UnknownUser_started_from_r.err


Starting H2O JVM and connecting: .... Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         2 seconds 721 milliseconds 
    H2O cluster timezone:       Etc/UTC 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.36.0.3 
    H2O cluster version age:    1 month and 4 days  
    H2O cluster name:           H2O_started_from_R_root_cti312 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   3.17 GB 
    H2O cluster total cores:    2 
    H2O cluster allowed cores:  2 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    R Version:      

“Response is numeric, so the regression model will be trained. However, the cardinality is equaled to two, so if you want to train a classification model, convert the response column to categorical before training..
”
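
To make the architecture concrete, here is a rough base-R sketch of a forward pass through a network shaped like the one above: 10 inputs, two hidden layers of 5 units with a rectifier (ReLU) activation, and one linear output, matching the regression setup mentioned in the warning. The weights are random placeholders, not the trained H2O model, and `relu`, `sizes` and `forward` are illustrative names.

```r
set.seed(1)
relu <- function(z) pmax(z, 0)  # rectifier: negative values become 0

n_inputs <- 10
sizes <- c(n_inputs, 5, 5, 1)   # input, two hidden layers, output

# Random placeholder weights and zero biases (nothing here is trained)
weights <- lapply(seq_len(length(sizes) - 1), function(i) {
  matrix(rnorm(sizes[i] * sizes[i + 1], sd = 0.1),
         nrow = sizes[i], ncol = sizes[i + 1])
})
biases <- lapply(sizes[-1], function(n) rep(0, n))

forward <- function(x) {
  a <- x
  n_layers <- length(weights)
  for (i in seq_len(n_layers)) {
    z <- sweep(a %*% weights[[i]], 2, biases[[i]], "+")
    # ReLU on hidden layers, identity on the output layer
    a <- if (i < n_layers) relu(z) else z
  }
  a
}

x <- matrix(rnorm(3 * n_inputs), nrow = 3)  # three toy observations
out <- forward(x)
print(dim(out))
```

Each row of `out` is one continuous score, which is why the prediction step has to apply a threshold to obtain class labels.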




## **Part 3 - Making the predictions and evaluating the model**

### **Predicting the Test set results**

In [9]:
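# Exited is numeric, so the model returns a continuous score; threshold it at 0.5
y_pred = h2o.predict(model, newdata = as.h2o(test_set[-11]))
y_pred = (y_pred > 0.5)
y_pred = as.vector(y_pred)

# Making the Confusion Matrix (rows: actual, columns: predicted)
cm = table(test_set[, 11], y_pred)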
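
The thresholding and tabulation steps can be tried without H2O on toy values. In this sketch `actual` and `scores` are made-up numbers. Wrapping both vectors in `factor()` with fixed levels is an optional safeguard (an assumption, not something the notebook does) that keeps the table 2 x 2 even if one class is never predicted.

```r
# Toy labels and toy continuous scores
actual <- c(0, 0, 1, 1, 0, 1)
scores <- c(0.10, 0.70, 0.80, 0.30, 0.20, 0.90)

# Turn scores into class predictions with a 0.5 cut-off
predicted <- as.integer(scores > 0.5)

# Fixed levels guarantee a row and a column for both classes
cm_toy <- table(
  actual    = factor(actual, levels = c(0, 1)),
  predicted = factor(predicted, levels = c(0, 1))
)
print(cm_toy)
```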



### **Evaluation Metrics**

In [10]:
n = sum(cm) # number of instances
nc = nrow(cm) # number of classes
diag = diag(cm) # number of correctly classified instances per class
rowsums = apply(cm, 1, sum) # number of instances per class
colsums = apply(cm, 2, sum) # number of predictions per class
p = rowsums / n # distribution of instances over the actual classes
q = colsums / n # distribution of instances over the predicted classes
accuracy = sum(diag) / n
cat("Accuracy of ANN Model is:", accuracy)
precision = diag / colsums
recall = diag / rowsums
f1 = 2 * precision * recall / (precision + recall)
data.frame(precision, recall, f1)

Accuracy of ANN Model is: 0.867

,precision,recall,f1
,<dbl>,<dbl>,<dbl>
0,0.878494,0.9667294,0.9205021
1,0.7854251,0.4766585,0.5932722
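
As a worked check of the formulas, the sketch below recomputes the metrics from a 2 x 2 matrix. The notebook never prints `cm`, so the counts here are an assumption: they are one set of counts, for a 2000-row test set, that reproduces the reported accuracy, precision, recall and F1. All names ending in `_example` are illustrative.

```r
# Assumed counts (rows: actual class, columns: predicted class)
cm_example <- matrix(c(1540,  53,
                        213, 194),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(actual = c("0", "1"),
                                     predicted = c("0", "1")))

correct <- diag(cm_example)  # correctly classified instances per class

accuracy_example  <- sum(correct) / sum(cm_example)
precision_example <- correct / colSums(cm_example)  # per predicted class
recall_example    <- correct / rowSums(cm_example)  # per actual class
f1_example <- 2 * precision_example * recall_example /
  (precision_example + recall_example)

print(accuracy_example)
print(data.frame(precision_example, recall_example, f1_example))
```

The recall of about 0.48 for class 1 shows that, despite the high overall accuracy, roughly half of the customers who left are missed.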


In [11]:
h2o.shutdown()

Are you sure you want to shutdown the H2O instance running at http://localhost:54321/ (Y/N)? Y
