<a href="https://colab.research.google.com/github/dyjdlopez/intro-compvis/blob/main/modules%5Cmodule_1_deeplearning_intro%5Ccv_1_1_the_perceptron.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Module 1.1 Machine Learning Review
Copyright (c) D.Lopez 2024 | All Rights reserved <br><br>
The main difference between machine learning and traditional programming is how a solution is created.<br>
<b>Traditional Programming</b><br>
In traditional programming, we start with a set of inputs and hand-code the rules, so that running the program on those inputs produces the desired output.<br>
<b>Machine Learning</b><br>
In machine learning, we start with a set of inputs and their corresponding outputs, and we try to determine the rule, pattern, or equation that describes their relationship.
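
As a minimal sketch of this contrast, the first function below hard-codes a rule, while the second part recovers the same rule from input/output pairs alone. The rule `y = 2x - 1` and the names `rule_based`, `xs`, and `ys` are illustrative assumptions, not part of this module's dataset.

```python
import numpy as np

# Traditional programming: the rule is written by hand.
def rule_based(x):
    return 2 * x - 1

# Machine learning: only inputs and outputs are given,
# and the rule is estimated from them.
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
ys = rule_based(xs)  # stand-in for observed outputs

# Fit a first-order polynomial; np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(xs, ys, deg=1)
print(f"learned rule: y = {slope:.2f}x + {intercept:.2f}")
```

The fitted slope and intercept should match the hand-written rule closely, because the toy data contain no noise.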

![image](https://raw.githubusercontent.com/JiaRuiShao/TensorFlow/master/1-Introduction%20to%20Tensorflow%20for%20AI%2C%20ML%20and%20DL/images/W1.1.PNG?raw=true)<br>

In this module, we will get started with machine learning. We’ll learn about datasets and the learning algorithms we can use to recognize patterns in them.

## Part 1: Datasets
A dataset is a collection of data relevant to a particular scenario or subject of interest. It may contain numerical, text, or image data, or a mix of them. Depending on where it was obtained, a dataset usually needs to be cleaned and transformed to fit your needs.
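
To make "cleaned and transformed" concrete, here is a small sketch on a made-up table. The column names and values are hypothetical; it only illustrates two common steps, dropping incomplete rows and converting a text column to numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: a missing value and a numeric column stored as text.
raw = pd.DataFrame({
    "rooms": ["3", "4", None, "5"],
    "price": [120000.0, 150000.0, 130000.0, np.nan],
})

# Cleaning: drop rows with any missing value.
clean = raw.dropna().copy()

# Transforming: cast text to integers and rescale price to thousands.
clean["rooms"] = clean["rooms"].astype(int)
clean["price_k"] = clean["price"] / 1000

print(clean)
```

Whether dropping rows is appropriate depends on the dataset; filling in missing values is another common choice.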

### 1.1 Pandas

Another tool to add to the machine learning engineer's toolbox is Pandas. [Pandas](https://pandas.pydata.org/docs/#module-pandas) is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. <br>
Check out:
* [Setting up DataFrames in Pandas](https://pandas.pydata.org/docs/getting_started/intro_tutorials/01_table_oriented.html#min-tut-01-tableoriented)
* [Reading and writing Data](https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html#min-tut-02-read-write)
* [Summarizing Statistics of a Dataset](https://pandas.pydata.org/docs/getting_started/intro_tutorials/06_calculate_statistics.html#min-tut-06-stats)

In [None]:
# !pip install pandas
import pandas as pd
import numpy as np

In [None]:
# We'll use a preset dataset available in Google Colab
ds = pd.read_csv('/content/sample_data/california_housing_train.csv')

In [None]:
ds.keys()

In [None]:
ds.describe()

### 1.2 Visualizing a dataset

In [None]:
import matplotlib.pyplot as plt

In [None]:
ds.plot(x='housing_median_age', y='median_house_value', style='o', alpha=0.2)
plt.title('Housing Median Age vs Median House Value', fontsize=16)
plt.xlabel('Housing Median Age')
plt.ylabel('Median House Value')
plt.show()

## Part 2: Curve Fitting
A fundamental concept in pattern recognition is curve fitting, which lets our programs approximate and optimize a function given a dataset. For this section, we will use [scikit-learn](https://scikit-learn.org/stable/index.html). scikit-learn is one of the most widely used libraries for data science and machine learning engineering. Its APIs cover data wrangling, model validation, supervised learning, and unsupervised learning. Check out the scikit-learn [user guide](https://scikit-learn.org/stable/user_guide.html) for a better understanding.

In [None]:
#!pip install scikit-learn
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

### 2.1 Linear Regression
Linear regression is one of the most fundamental and simplest curve-fitting techniques in pattern recognition. Long story short, linear regression finds the best-fit first-order polynomial (a straight line) relating an input $X$ to a target $y$. This line is represented as:
$$y = \omega X+b$$
A linear regression algorithm, or linear regressor, learns the weight $\omega$ and the bias $b$ that best fit the data.
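
For a single feature, the best-fit weight and bias have a closed form under ordinary least squares. The sketch below applies it to synthetic data; the true line `y = 3x + 2` and the noise level are assumed values, and this is not necessarily how scikit-learn computes the fit internally.

```python
import numpy as np

# Synthetic data generated from y = 3x + 2 plus small noise (assumed values).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3 * x + 2 + rng.normal(0, 0.1, size=x.shape)

# Ordinary least squares for one feature:
# w = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
# b = mean_y - w * mean_x
w = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - w * x.mean()

print(f"w = {w:.3f}, b = {b:.3f}")
```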

In [None]:
X = ds['population'].values.reshape(-1,1)
y = ds['total_bedrooms'].values.reshape(-1,1)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=1)

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)

In [None]:
model_summary = pd.DataFrame(['population'], columns=['Features'])
model_summary['Weights Raw'] = model.coef_.ravel()
# y_train is 2-D here, so intercept_ is a 1-element array; take its only entry
model_summary = pd.concat([model_summary, pd.DataFrame({'Features':['Intercept'], 'Weights Raw':[model.intercept_.item()]})], ignore_index=True)
model_summary

From here, we can write the regressor as:
$$\begin{aligned}
y_{\text{total bedrooms}} &= \omega_{\text{population}}X+b \\
y_{\text{total bedrooms}} &= 0.321\cdot X + 80.366
\end{aligned}$$
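
As a quick worked example of reading this equation, the sketch below plugs in a hypothetical population of 1000. The coefficients are copied from the equation above and may differ slightly if the notebook is rerun with a different split.

```python
# Coefficients copied from the fitted equation in the text.
w_population = 0.321
b = 80.366

population = 1000  # hypothetical block population
total_bedrooms = w_population * population + b
print(f"Estimated total bedrooms: {total_bedrooms:.3f}")
```

Under these values the model estimates roughly 401 bedrooms for a block of 1000 people.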

In [None]:
preds = model.predict(X_test)
out = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': preds.flatten()})
out

In [None]:
plt.figure(figsize=(10,6))
plt.title('Predictions', fontsize=16)

plt.scatter(y_test, preds, s = 50,  alpha=0.4)
plt.xlabel('Ground Truth', fontsize=10)
plt.ylabel('Prediction', fontsize=10)

plt.show()

In [None]:
plt.figure(figsize=(10,6))
plt.scatter(X_test, y_test,  s = 50, alpha=0.5)
plt.plot(X_test, preds, color='red', linewidth=2)
plt.show()

<b>Formula: Adjusted $R^2$</b><br>
$R^2_{adj.} = 1-(1-R^2)\cdot\frac{n-1}{n-p-1}$

where $p$ is the number of predictors and $n$ is the number of observations.

In [None]:
def adjr2(r2,x):
    n = x.shape[0]
    p = x.shape[1]
    adjusted_r2 = 1-(1-r2)*(n-1)/(n-p-1)
    return adjusted_r2

In [None]:
MSE = metrics.mean_squared_error(y_test, preds)
RMSE = np.sqrt(MSE)
R2 = metrics.r2_score(y_test, preds)
# R2 is computed on the test set, so n and p are taken from X_test
AR2 = adjr2(R2,X_test)
model_metrics = pd.DataFrame([['MSE'],['RMSE'],['R^2'],
                              ['Adjusted R^2']],
                             columns=['Metrics'])
model_metrics['Simple Regression'] = MSE, RMSE, R2, AR2
model_metrics

### 2.2 Multiple Linear Regression
Taking linear regressors to the next level, we’ll use multiple linear regression. In a multiple linear regressor, we use $n$ features $X_1, X_2, \dots, X_n$ to describe a target $y$. We can represent this as:
$$y = \omega_1X_1 + \omega_2X_2 + \dots +\omega_nX_n + b$$
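
One way to see how one weight per feature plus a bias can be fitted is to solve the least-squares problem directly with NumPy. This is a sketch on synthetic data: the true weights `2`, `-1` and bias `5` are assumed values, and `X_design` and `theta` are illustrative names.

```python
import numpy as np

# Synthetic data from y = 2*x1 - 1*x2 + 5 (assumed values, no noise).
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 2))
y = 2 * X[:, 0] - 1 * X[:, 1] + 5

# Append a column of ones so the bias is learned as an extra weight.
X_design = np.c_[X, np.ones(len(X))]

# Solve the least-squares problem X_design @ theta ~ y.
theta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
w1, w2, b = theta
print(f"w1 = {w1:.2f}, w2 = {w2:.2f}, b = {b:.2f}")
```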

In [None]:
ds.plot(x='population', y='total_bedrooms', style='o', alpha=0.2)
plt.title('Population vs Total Bedrooms', fontsize=16)
plt.xlabel('Population')
plt.ylabel('Total Bedrooms')
plt.show()

In [None]:
ds.plot(x='households', y='total_bedrooms', style='o', alpha=0.2)
plt.title('Number of households vs Total bedrooms', fontsize=16)
plt.xlabel('Number of Households')
plt.ylabel('Total bedrooms')
plt.show()

In [None]:
X = pd.DataFrame(np.c_[ds['population'], ds['households']], columns=['population','households'])
y = ds['total_bedrooms']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=1)
model = LinearRegression()
model.fit(X_train, y_train)

In [None]:
model_summary = pd.DataFrame(X.columns, columns=['Features'])
model_summary['Weights Raw'] = model.coef_.reshape(2,1)
model_summary = pd.concat([model_summary, pd.DataFrame({'Features':['Intercept'], 'Weights Raw':[float(model.intercept_)]})], ignore_index=True)
model_summary

In [None]:
preds = model.predict(X_test)
out = pd.DataFrame({'Actual': y_test, 'Predicted': preds})
out

In [None]:
plt.figure(figsize=(10,6))
plt.title('Predictions', fontsize=16)

plt.scatter(y_test, preds, s = 50,  alpha=0.4)
plt.xlabel('Ground Truth')
plt.ylabel('Prediction', fontsize=10)

plt.show()

In [None]:
MSE = metrics.mean_squared_error(y_test, preds)
RMSE = np.sqrt(MSE)
R2 = metrics.r2_score(y_test, preds)
# R2 is computed on the test set, so n and p are taken from X_test
AR2 = adjr2(R2,X_test)
model_metrics['Multiple Regression'] = MSE, RMSE, R2, AR2
model_metrics

In [None]:
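# Use the training column names so scikit-learn can match the features
site1 = pd.DataFrame([[5126, 1270]], columns=['population', 'households'])
model.predict(site1)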

## Part 3: Gradient Descent
Diving deeper into the Machine Learning rabbit hole, we need to discuss the fundamental technique in machine learning—Gradient Descent. Gradient descent is an optimization algorithm used to minimize functions by iteratively moving in the direction of the steepest descent as defined by the negative of the gradient. In machine learning, gradient descent is used to update the parameters of models. In this section, we'll try to apply this algorithm with the fundamental unit of a neural network—the Perceptron.
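
Before applying it to a model, here is a minimal sketch of gradient descent on a one-variable function. The function $f(x) = (x-3)^2$, the starting point, and the learning rate are assumptions chosen so the minimum at $x = 3$ is easy to verify.

```python
# Minimize f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3).
def grad(x):
    return 2 * (x - 3)

x = 0.0      # starting point (assumed)
lr = 0.1     # learning rate (assumed)

for step in range(100):
    x = x - lr * grad(x)   # move against the gradient

print(f"x after 100 steps: {x:.4f}")
```

Each step shrinks the distance to the minimum by a constant factor here. A learning rate that is too large would instead overshoot and diverge.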

### 3.1 Perceptron Algorithm
The Perceptron was introduced by Frank Rosenblatt in his 1958 paper [The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain](https://www.ling.upenn.edu/courses/cogs501/Rosenblatt1958.pdf). It is one of the earliest mathematical models of a brain neuron. In simplest terms, a perceptron computes a weighted sum of its inputs and then applies an activation function. In early implementations of the perceptron, the activation was a step function:
$$step(z) = \left\{
  \begin{array}{ll}
    1 & \text{if } z = b+ \sum_i w_iX_i\geq 0 \\
    0 & \text{otherwise}
    \end{array}
\right.
$$
![image](https://jontysinai.github.io/assets/article_images/2017-11-11-the-perceptron/bio-vs-MCP.png)
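
As a small illustration of the weighted sum followed by a step activation, the sketch below evaluates a perceptron on the four inputs of a logical AND. The weights and bias are hand-picked assumptions rather than learned values.

```python
import numpy as np

def step(z):
    return np.where(z >= 0, 1, 0)

# Hand-picked parameters (assumed, not learned) that implement logical AND.
w = np.array([1.0, 1.0])
b = -1.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
z = X @ w + b          # weighted sum plus bias: [-1.5, -0.5, -0.5, 0.5]
outputs = step(z)      # step activation
print(outputs)         # [0 0 0 1]
```

Only the input `[1, 1]` pushes the weighted sum above zero, so only that input activates the perceptron.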


First, let's create a dummy dataset for binary classification.

In [None]:
N = 100
m1 = np.array([-2, 0]).T
m2 = np.array([2, 0]).T
S = np.identity(2)

In [None]:
np.random.seed(0)
# Stack the two Gaussian clusters row-wise: the first half is class 1
# (centered at m1) and the second half is class 0 (centered at m2)
X_train = np.concatenate((np.random.multivariate_normal(m1,S,N//2),
                          np.random.multivariate_normal(m2,S,N-N//2)), axis=0)
y_train = np.concatenate((np.ones(N//2), np.zeros(N-N//2))).reshape((-1,1))

In [None]:
X_train[0:5]

In [None]:
X_train.shape

In [None]:
y_train[0:5]

In [None]:
y_train.shape

In [None]:
np.random.seed(100)
M = int(N*0.4)
X_test = np.concatenate((np.random.multivariate_normal(m1,S,M//2),
                         np.random.multivariate_normal(m2,S,M-M//2)), axis=0)
y_test = np.concatenate((np.ones(M//2), np.zeros(M-M//2))).reshape((-1,1))

In [None]:
def visualize(X):
  plt.figure(figsize=(8,8))
  mid = int(X.shape[0]/2)
  plt.scatter(X[:mid,0], X[:mid,1], c='r', label='1')
  plt.scatter(X[mid:,0], X[mid:,1], c='b', label='0')

  plt.legend()
  plt.grid()
  plt.show()


In [None]:
visualize(X_train)

In [None]:
visualize(X_test)

In [None]:
def step_activation(z):
  """
  Compute the step activation of z

  Arguments:
  z -- A scalar or numpy array of any size.

  Return:
  filtered step activations step(z)
  """
  return np.where(z>=0, 1,0)

In [None]:
x = np.arange(-0.5,0.5,0.01)
plt.plot(x, step_activation(x))
plt.ylabel('Activation')
plt.xlabel('Input Feature')
plt.grid()
plt.show()

In [None]:
def init_weights(dim):
  """
  Does a zero-initialization of the weights and bias

  Arguments:
  dim -- Desired dimension for the weights.

  Return:
  w -- initialized weights
  b -- initialized bias
  """
  w = np.zeros(shape=(dim,1))
  b = 0
  return w, b

In [None]:
def sum_err(preds,y):
  """
  Computes the Sum of Squared Errors for a set of predictions
  and truth values

  Arguments:
  preds -- Set of predictions.
  y -- Set of truth values

  Return:
  sse -- Sum of the squared errors
  """
  sse = np.sum(np.square(y-preds))
  return sse


In [None]:
def accuracy(preds, Y):
  """
  Computes the accuracy for a set of predictions
  and truth values

  Arguments:
  preds -- Set of predictions.
  Y -- Set of truth values

  Return:
  accuracy -- Computed accuracy
  """
  accuracy = 1-np.mean(np.abs(preds-Y))
  return accuracy

In [None]:
def propagate(X,y,w,b):

  # Compute the weighted sum of the inputs plus the bias
  z = (X@w) + b

  # Compute the step activation
  A = step_activation(z)

  # Compute the prediction error and the accuracy
  error = A-y
  acc = accuracy(A,y)

  # Compute the update directions for the weights and bias
  # (perceptron learning rule, summed over the whole batch)
  dw = np.dot(X.T,error)
  db = np.sum(error)

  # Compute the cost
  cost = sum_err(A,y)

  # Store the updates in a dictionary for tracking
  grads = {"dw": dw,
           "db": db}

  return grads, cost, acc

In [None]:
w,b = init_weights(X_train.shape[1])
propagate(X_train,y_train,w,b)

In [None]:
from tqdm.notebook import tqdm
def train(w, b, X, y, lr, epochs, early_stopping=True, stop_thresh=0.9):
  costs = []
  accuracies = []

  for i in tqdm(range(epochs)):
    # Do a forward pass to obtain the updates, cost, and accuracy
    # (named acc so it does not shadow the accuracy() function)
    grads, cost, acc = propagate(X,y,w,b)

    # Locally store the updates
    dw=grads['dw']
    db=grads['db']

    # Update routine per epoch
    w = w - lr*dw
    b = b - lr*db

    # Store the cost and accuracy per epoch for logs
    # print (f"Epoch {i}: Loss: {cost} Accuracy: {acc}")
    costs.append(cost)
    accuracies.append(acc)

    # Store the learned parameters for logs
    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}
    if early_stopping and acc >= stop_thresh:
      print(f"Target metric met, stopping the training at epoch {i}.\n")
      break

  return params, grads, costs


In [None]:
w,b = init_weights(X_train.shape[1])
learning_rate = 1
epochs = 100

params, grads, ff_costs = train(w, b, X_train, y_train,
                             lr=learning_rate, epochs=epochs,
                             early_stopping=True, stop_thresh=1.0)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))

In [None]:
# Early stopping can end training before `epochs`, so index by the logged costs
plt.plot(np.arange(len(ff_costs)), ff_costs, 'bo-')
plt.ylabel('Training Cost')
plt.xlabel('Epoch')
plt.grid()
plt.show()

In [None]:
def predict(X, weights, bias):
  z = (X@weights)+bias
  return np.where(z>=0, 1,0)

In [None]:
weights = params["w"]
bias = params["b"]
preds = predict(X_test,weights,bias)
accuracy(y_test, preds)

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns

c_matrix = confusion_matrix(y_test, preds)
sns.heatmap(c_matrix, annot=True)
# confusion_matrix puts true labels on rows (y-axis) and predictions on columns (x-axis)
plt.xlabel("Predicted")
plt.ylabel("Ground Truths")
plt.show()

In [None]:
from sklearn.metrics import f1_score, recall_score, precision_score
print(f"F1 Score: \t{f1_score(y_test, preds)}")
print(f"Recall: \t{recall_score(y_test, preds)}")
print(f"Precision: \t{precision_score(y_test, preds)}")

In [None]:
def plot_weights(X,w,b):
  plt.figure(figsize=(10,10))
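  mid = int(X.shape[0]/2)
  # First half of X is class 1, second half is class 0 (same colors as visualize)
  plt.scatter(X[:mid,0], X[:mid,1],
              s = 50, color='red', alpha=0.5, label=1)
  plt.scatter(X[mid:,0], X[mid:,1],
              s = 50, color='blue', alpha=0.5, label=0)
  x_min, x_max = X[:,0].min() - 1, X[:,0].max() + 1
  linex = np.linspace(x_min, x_max)
  # Decision boundary: w0*x + w1*y + b = 0, solved for y
  liney = -w[0]/w[1] * linex - b/w[1]
  plt.plot(linex, liney, label='decision boundary')
  # Keep the view on the data even when the boundary is steep
  plt.ylim(X[:,1].min() - 1, X[:,1].max() + 1)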
  plt.legend()
  plt.axhline(color='black')
  plt.axvline(color='black')
  plt.grid()
  plt.show()

In [None]:
plot_weights(X_train,params['w'],params['b'])

### 3.2 Gradient with Backpropagation
Although the perceptron with a step activation works well as a linear classifier, it lacks another fundamental technique needed for a robust neural network model: backpropagation. Backpropagation is short for "backward propagation of errors." It is a method for training artificial neural networks that computes the gradient of a loss function with respect to every weight in the network. <br>
In this section, we will use a sigmoid function as the activation instead of the step function. Backpropagation is not effective with the step function because its derivative is zero everywhere except at the jump at $z=0$, where it is undefined, so it gives no useful gradient for updating the weights.<br>
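
The difference between the two activations can be checked numerically. The sketch below estimates both derivatives with a central difference; the helper `numerical_derivative` and the sample points are assumptions made for illustration.

```python
import numpy as np

def step(z):
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def numerical_derivative(f, z, h=1e-5):
    # Central difference approximation of f'(z).
    return (f(z + h) - f(z - h)) / (2 * h)

z = np.array([-2.0, -0.5, 0.5, 2.0])   # points away from the jump at 0
step_grad = numerical_derivative(step, z)
sigmoid_grad = numerical_derivative(sigmoid, z)
print("step gradient:   ", step_grad)
print("sigmoid gradient:", sigmoid_grad)
```

The step function yields a zero gradient at every sampled point, while the sigmoid yields a positive gradient that an update rule can follow.
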
<b>Loss Function</b><br>
The function we want to minimize or maximize is called the objective function or criterion. When we are minimizing it, we may also call it the cost function, loss function, or error function [[1]](https://www.deeplearningbook.org/contents/numerical.html). <br>
To save you the time and brainpower, the loss function for our example is the binary cross-entropy, where $h_\theta(x)$ is the sigmoid output of the neuron and $m$ is the number of samples:
$$\begin{aligned}
J(\theta) &= \frac{1}{m} \sum^m_{i=1}cost(h_{\theta}(x^{(i)}), y^{(i)}) \\
cost(h_\theta(x), y) &= \left\{
  \begin{array}{ll}
    -\log{(h_\theta(x))} & \text{if } y = 1 \\
    -\log{(1-h_\theta(x))} & \text{if } y = 0
  \end{array}
\right. \\
J(\theta) &= -\frac{1}{m} \sum^m_{i=1}\left[y^{(i)}\log{(h_\theta(x^{(i)}))}+(1-y^{(i)})\log{(1-h_\theta(x^{(i)}))}\right] \\
J(\theta) &= -\frac{1}{m} \left[Y^T\log(h)+(1-Y)^T\log(1-h)\right]
\end{aligned}
$$
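
To check that the per-case form and the vectorized form of this loss agree, the sketch below evaluates both on four made-up samples. The labels in `Y` and the probabilities in `h` are hypothetical values, not outputs of the model in this notebook.

```python
import numpy as np

# Hypothetical labels and predicted probabilities for four samples.
Y = np.array([[1], [0], [1], [0]])
h = np.array([[0.9], [0.2], [0.6], [0.4]])
m = Y.shape[0]

# Per-sample cost: -log(h) when y = 1, -log(1 - h) when y = 0.
per_sample = np.where(Y == 1, -np.log(h), -np.log(1 - h))
J_cases = per_sample.mean()

# Vectorized form using transposes; the result is a 1x1 array.
J_vec = (-(Y.T @ np.log(h) + (1 - Y).T @ np.log(1 - h)) / m).item()

print(f"J (case form) = {J_cases:.4f}, J (vectorized) = {J_vec:.4f}")
```

Confident and correct predictions, such as `0.9` for a label of 1, contribute little to the loss, while uncertain ones contribute more.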


In [None]:
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s

In [None]:
x = np.arange(-10,10,0.01)
plt.plot(x, sigmoid(x))
plt.ylabel('Activation')
plt.xlabel('Input Feature')
plt.grid()
plt.show()

In [None]:
def transfer_derivative(d):
  return d*(1.0-d)

In [None]:
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of shape (n_features, 1)
    b -- bias, a scalar
    X -- data of shape (m, n_features)
    Y -- true "label" vector of shape (m, 1)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    """

    m = X.shape[0]
    eps = 10**-8                                           # avoids log(0)

    # FORWARD PROPAGATION (FROM X TO COST)
    h = sigmoid((X@w)+b)                                   # compute activation
    J = -1 / m * np.sum(Y * np.log(h+eps) + (1-Y) * np.log((1-h)+eps))  # compute cost
    # For the cross-entropy cost with a sigmoid activation, the sigmoid
    # derivative cancels out and the error term reduces to h - Y
    error = h-Y
    # BACKWARD PROPAGATION (TO FIND GRAD)
    dw = 1/m * X.T @ error
    db = 1/m * np.sum(error)

    cost = np.squeeze(J)

    grads = {"dw": dw,
             "db": db}

    return grads, cost

In [None]:
def optimize(w, b, X, Y, epochs, lr, print_cost = True):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of shape (n_features, 1)
    b -- bias, a scalar
    X -- data of shape (m, n_features)
    Y -- true "label" vector of shape (m, 1)
    epochs -- number of iterations of the optimization loop
    lr -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """

    costs = []

    for i in tqdm(range(epochs)):


        # Cost and gradient calculation
        grads, cost = propagate(w, b, X, Y)

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        w = w - lr * dw
        b = b - lr * db

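        # Record the costs
        costs.append(cost)

        # Print the cost every 100 steps, as described in the docstring
        if print_cost and i % 100 == 0:
            print(f"Cost after epoch {i}: {cost}")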


    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs

In [None]:
w,b = init_weights(X_train.shape[1])
learning_rate = 0.1
epochs = 100
params, grads, bp_costs = optimize(w, b, X_train, y_train,
                             lr=learning_rate, epochs=epochs)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))

In [None]:
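# Index by the logged costs, since early stopping can shorten ff_costs
plt.plot(np.arange(len(ff_costs)), ff_costs, 'b-', label='Perceptron (sum of squared errors)')
plt.plot(np.arange(len(bp_costs)), bp_costs, 'r-', label='Sigmoid neuron (cross-entropy)')
plt.legend()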

plt.ylabel('Training Cost')
plt.xlabel('Epoch')
plt.grid()
plt.show()

In [None]:
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of shape (n_features, 1)
    b -- bias, a scalar
    X -- data of shape (m, n_features)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''

    A = sigmoid((X@w)+b)
    Y_prediction = np.where(A>=0.5,1,0)

    return Y_prediction

In [None]:
def model(X_train, Y_train, X_test, Y_test, num_iterations = 10, learning_rate = 0.5, print_cost = True):
    """
    Builds the logistic regression model by calling the functions implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (m_train, n_features)
    Y_train -- training labels represented by a numpy array of shape (m_train, 1)
    X_test -- test set represented by a numpy array of shape (m_test, n_features)
    Y_test -- test labels represented by a numpy array of shape (m_test, 1)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """

    # initialize parameters with zeros
    w, b = init_weights(X_train.shape[1])

    # Gradient descent
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))


    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train" : Y_prediction_train,
         "w" : w,
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}

    return d

In [None]:
neuron_model = model(X_train, y_train, X_test, y_test, num_iterations=100, learning_rate=1)

In [None]:
c_matrix = confusion_matrix(y_test, neuron_model['Y_prediction_test'])
sns.heatmap(c_matrix, annot=True)
# confusion_matrix puts true labels on rows (y-axis) and predictions on columns (x-axis)
plt.xlabel("Predicted")
plt.ylabel("Ground Truths")
plt.show()

In [None]:
print(f"F1 Score: \t{f1_score(y_test, neuron_model['Y_prediction_test'])}")
print(f"Recall: \t{recall_score(y_test, neuron_model['Y_prediction_test'])}")
print(f"Precision: \t{precision_score(y_test, neuron_model['Y_prediction_test'])}")

In [None]:
plot_weights(X_train,neuron_model['w'],neuron_model['b'])

# Up Next: Artificial Neural Networks
![image](https://www.researchgate.net/profile/Sandra_Vieira5/publication/312205163/figure/fig1/AS:453658144972800@1485171938968/a-The-building-block-of-deep-neural-networks-artificial-neuron-or-node-Each-input-x.png)
<br><i>Image from: Vieira, Sandra & Pinaya, Walter & Mechelli, Andrea. (2017). [Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: Methods and applications.](https://www.researchgate.net/publication/312205163_Using_deep_learning_to_investigate_the_neuroimaging_correlates_of_psychiatric_and_neurological_disorders_Methods_and_applications) Neuroscience & Biobehavioral Reviews. 74. 10.1016/j.neubiorev.2017.01.002.</i>