<h2 align = "center"> Neural Network </h2>

### Some Important Terms

#### Activation function

* Transforms a neuron's combined input signals into a single output signal to be broadcast further in the network (think of packing your belongings into a single piece of luggage)
* The function can be plotted with the sum of the input signals on the x-axis and the output signal on the y-axis; the output range is usually (0, 1), (-1, 1), or (-inf, +inf)
* Common choices: sigmoid (logistic sigmoid), linear, Gaussian (the basis of a Radial Basis Function network)
* Squashing function: many activation functions compress their input into a narrow output range, so it is hard to differentiate the effect of input values with a relatively large absolute value. Standardizing or normalizing is therefore very important, so that the features' values fall within a small range around 0.
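
The squashing effect can be illustrated with a few lines of base R. This is a minimal sketch: the `sigmoid` helper and the input values are made up for illustration, and `scale()` stands in for whichever standardization is actually used.

```r
# Logistic sigmoid: maps any real input into the open interval (0, 1)
sigmoid <- function(x) 1 / (1 + exp(-x))

# Illustrative raw inputs with large absolute values (hypothetical data)
raw <- c(-100, -10, -1, 0, 1, 10, 100)

# Without rescaling, the large inputs are squashed to almost identical outputs
round(sigmoid(raw), 4)

# After standardizing (mean 0, sd 1), the outputs remain distinguishable
scaled <- as.numeric(scale(raw))
round(sigmoid(scaled), 4)
```

Comparing the two printed vectors shows why inputs are rescaled before training: on the raw scale, 10 and 100 produce nearly the same activation.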

#### Network topology: the number of neurons and the number of layers and how they are connected

* The number of layers: whether there are hidden layers, and whether the nodes in one layer are fully connected to all the nodes in the next layer (a network with multiple hidden layers is a deep neural network)
* Whether information in the network is allowed to travel backward: in feedforward networks it is not, while recurrent (feedback) networks allow it
* The number of nodes within each layer of the network
* A neural network with at least one hidden layer of sufficient neurons is a universal function approximator. 
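
The topology choices above determine how many weights have to be trained. A minimal sketch, assuming a fully connected feedforward network with one bias term per non-input node; `count_weights` is a hypothetical helper and the layer sizes are illustrative:

```r
# Count the trainable weights of a fully connected feedforward network.
# layer_sizes: number of nodes per layer, from the input layer to the output layer.
# Assumption: every non-input node also has one bias weight.
count_weights <- function(layer_sizes) {
  n_in  <- head(layer_sizes, -1)  # nodes feeding each set of connections
  n_out <- tail(layer_sizes, -1)  # nodes receiving each set of connections
  sum((n_in + 1) * n_out)         # +1 accounts for the bias term
}

count_weights(c(8, 1, 1))  # 8 inputs, 1 hidden node, 1 output
count_weights(c(8, 5, 1))  # 8 inputs, 5 hidden nodes, 1 output
```

Even a modest increase in hidden nodes multiplies the number of weights, which is one reason larger networks are slower to train and more prone to overfitting.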

#### Training algorithm: how weights are set in order to inhibit or excite neurons in proportion to the input signal

* Backpropagation: the network is trained by iterating through many cycles of two phases.
* Each cycle is known as an epoch. The starting weights are typically set at random; the algorithm then iterates through the two phases until a stopping criterion is reached.
* Each epoch has two phases. 1) Forward phase: neurons are activated in sequence from the input layer to the output layer, applying each neuron's weights and activation function along the way. 2) Backward phase: the output signal resulting from the forward phase is compared to the true target value in the training data. The difference between the network's output signal and the true value results in an error that is propagated backwards through the network to modify the connection weights between neurons and reduce future errors (gradient descent).
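
The two phases can be sketched in base R for a tiny network with one hidden layer. This is an illustration under stated assumptions, not the algorithm of any particular package: the toy data, the three hidden nodes, the learning rate, and the fixed number of epochs used as the stopping criterion are all hypothetical choices, and the update rule is plain full-batch gradient descent.

```r
set.seed(1)

# Toy data (hypothetical): one input feature and a numeric target
x <- matrix(seq(-1, 1, length.out = 50), ncol = 1)
y <- x^2

sigmoid <- function(z) 1 / (1 + exp(-z))

n_hidden <- 3
W1 <- matrix(rnorm(n_hidden), nrow = 1)  # input -> hidden weights, random start
b1 <- rep(0, n_hidden)                   # hidden bias terms
W2 <- matrix(rnorm(n_hidden), ncol = 1)  # hidden -> output weights, random start
b2 <- 0                                  # output bias term
rate <- 0.1                              # learning rate (assumed value)

errors <- numeric(2000)                  # stopping criterion: fixed epoch count
n <- nrow(x)
for (epoch in seq_along(errors)) {
  # Forward phase: activate the hidden layer, then the linear output node
  h   <- sigmoid(sweep(x %*% W1, 2, b1, "+"))
  out <- h %*% W2 + b2

  # Compare the output signal with the true target values
  err <- out - y
  errors[epoch] <- mean(err^2)

  # Backward phase: propagate the error back to obtain the gradients
  d_out   <- err / n
  grad_W2 <- t(h) %*% d_out
  grad_b2 <- sum(d_out)
  d_h     <- (d_out %*% t(W2)) * h * (1 - h)  # chain rule through the sigmoid
  grad_W1 <- t(x) %*% d_h
  grad_b1 <- colSums(d_h)

  # Gradient descent: move each weight against its gradient
  W2 <- W2 - rate * grad_W2
  b2 <- b2 - rate * grad_b2
  W1 <- W1 - rate * grad_W1
  b1 <- b1 - rate * grad_b1
}

# Mean squared error at the first and last epoch
c(first = errors[1], last = errors[length(errors)])
```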

#### Think of it this way: the first layer produces several smaller models, and the outputs of these models are used as inputs to the next layer. The stack of models can be represented by one large parameterized numeric function whose parameters are the coefficients of all the models

  Strengths     | Weaknesses
  ------------- | -------------
  1) Can be adapted to classification or numeric prediction problems <br/> 2) Capable of modeling more complex patterns <br/> 3) Makes few assumptions about the data's underlying relationships | 1) Extremely computationally intensive and slow to train <br/> 2) Prone to overfitting the data <br/> 3) Difficult to explain

In [1]:
##### Chapter 7: Neural Networks and Support Vector Machines -------------------

##### Part 1: Neural Networks -------------------
## Example: Modeling the Strength of Concrete  ----

## Step 2: Exploring and preparing the data ----
# read in data and examine structure
setwd("E:/Personal/InterviewQuestion/Rscripts/Machine Learning with R, Second Edition_Code/Chapter 07")

In [2]:
concrete <- read.csv("concrete.csv")
str(concrete)

'data.frame':	1030 obs. of  9 variables:
 $ cement      : num  141 169 250 266 155 ...
 $ slag        : num  212 42.2 0 114 183.4 ...
 $ ash         : num  0 124.3 95.7 0 0 ...
 $ water       : num  204 158 187 228 193 ...
 $ superplastic: num  0 10.8 5.5 0 9.1 0 0 6.4 0 9 ...
 $ coarseagg   : num  972 1081 957 932 1047 ...
 $ fineagg     : num  748 796 861 670 697 ...
 $ age         : int  28 14 28 28 28 90 7 56 28 28 ...
 $ strength    : num  29.9 23.5 29.2 45.9 18.3 ...


In [3]:
# custom normalization function
normalize <- function(x) { 
  return((x - min(x)) / (max(x) - min(x)))
}

# apply normalization to entire data frame
concrete_norm <- as.data.frame(lapply(concrete, normalize))

# confirm that the range is now between zero and one
summary(concrete_norm$strength) 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.2664  0.4001  0.4172  0.5457  1.0000 

In [4]:
# compared to the original minimum and maximum
summary(concrete$strength)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.33   23.71   34.44   35.82   46.14   82.60 

Any transformation applied to the data prior to training the model has to be applied in reverse later on in order to convert predictions back to the original units of measurement.
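
For the min-max normalization used here, the reverse step could look like the following sketch. `denormalize` is a hypothetical helper that is not part of the original script, and the `strength` values are just the summary statistics printed above, used as illustrative data.

```r
# Min-max normalization, written as in the notebook
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical inverse: map normalized values back to the original units,
# using the minimum and maximum of the original (unnormalized) vector
denormalize <- function(x_norm, original) {
  x_norm * (max(original) - min(original)) + min(original)
}

# Illustrative values taken from the summary of concrete$strength
strength <- c(2.33, 23.71, 34.44, 46.14, 82.60)

restored <- denormalize(normalize(strength), strength)
all.equal(restored, strength)
```

The same helper could be applied to predicted values, provided the minimum and maximum come from the original `strength` column rather than from the predictions.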

In [5]:
# create training and test data
concrete_train <- concrete_norm[1:773, ]
concrete_test <- concrete_norm[774:1030, ]

In [6]:
## Step 3: Training a model on the data ----
# train the neuralnet model
library(neuralnet) # the nnet package is an alternative

Loading required package: grid
Loading required package: MASS


In [7]:
# simple ANN with only a single hidden neuron
set.seed(12345) # to guarantee repeatable results
concrete_model <- neuralnet(formula = strength ~ cement + slag +
                              ash + water + superplastic + 
                              coarseagg + fineagg + age,
                              data = concrete_train)

* A neural network with a single hidden node can be thought of as a distant cousin of the linear regression model. The weights between each input node and the hidden node are similar to the regression coefficients, and the weight for the bias term is similar to the intercept.

In [14]:
# visualize the network topology
plot(concrete_model)

dev.new(): using pdf(file="Rplots10.pdf")


In [19]:
## Step 4: Evaluating model performance ----
# obtain model results
model_results <- compute(concrete_model, concrete_test[1:8])
# obtain predicted strength values
predicted_strength <- model_results$net.result
# examine the correlation between predicted and actual values
cor(predicted_strength, concrete_test$strength)

0
0.806465557619181


In [20]:
## Step 5: Improving model performance ----
# a more complex neural network topology with 5 hidden neurons
set.seed(12345) # to guarantee repeatable results
concrete_model2 <- neuralnet(strength ~ cement + slag +
                               ash + water + superplastic + 
                               coarseagg + fineagg + age,
                               data = concrete_train, hidden = 5)

# plot the network
plot(concrete_model2)

# evaluate the results as we did before
model_results2 <- compute(concrete_model2, concrete_test[1:8])
predicted_strength2 <- model_results2$net.result
cor(predicted_strength2, concrete_test$strength)

dev.new(): using pdf(file="Rplots8.pdf")


0
0.92445334258464


In [17]:
dev.off()

In [18]:
dev.off()

* training algorithm: