# Task proposal


Concepts and calculations applied to neural networks and how they learn, from calculating model parameters to evaluation methods.


Course: https://www.udemy.com/course/deep-learning-com-python-az-curso-completo/?couponCode=ST7MT41824

# Calculations in neural networks (section 3)


The first layer of a network is formed by neurons that represent the problem's inputs. Each input is connected to the neurons of the next layer, and every connection carries a weight. At a receiving neuron, each input value is multiplied by its weight and the products are added together. The sum is then passed through the activation function, which produces the neuron's output.
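
The calculation described above can be sketched in a few lines of plain Python. This is only an illustration: the input values, the weights and the choice of the sigmoid as activation function are assumptions, not values from the course.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, activation=sigmoid):
    """Multiply each input by its weight, sum the products, then activate."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)

# Hypothetical values: two inputs and the weights of their connections.
output = neuron_output([1.0, 0.0], [0.5, -0.3])
print(output)
```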


Neural network applications are concerned with finding the most appropriate weights for each neuron in order to increase the effectiveness of the model.


The simplest model is the single-layer perceptron. It is suited to linearly separable problems, which are a minority of real problems. For the rest, there are neural networks with more layers.


The video shows an example of a model that must learn to predict the values of the logical operator XOR. On the first attempt, all of its weights are initialized randomly, so it performs poorly.


To measure how well a network learns, one of the ways is:


error = correct answer - calculated answer
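
As a small sketch of this formula applied to the XOR example: the predicted values below are made up to stand in for an untrained network, and averaging the absolute errors is one simple way (an assumption here) to summarize them in a single number.

```python
# XOR truth table: expected output for the inputs (0,0), (0,1), (1,0), (1,1).
expected = [0, 1, 1, 0]
# Hypothetical outputs of an untrained network (made-up values).
predicted = [0.41, 0.44, 0.44, 0.48]

# error = correct answer - calculated answer, one value per record.
errors = [e - p for e, p in zip(expected, predicted)]

# Averaging the absolute errors gives a single number for the whole set.
mean_absolute_error = sum(abs(err) for err in errors) / len(errors)
print(errors, mean_absolute_error)
```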


# Gradient


Consider the error as a function of the network's weights: it receives the weight values and associates an error with them. The graph of this function can be pictured as a landscape of valleys and mountains. The valleys hold the smallest errors, and therefore the best weights.


Gradient descent follows paths along this surface in search of the lowest possible point.


So, in short, the gradient:


- Tries to find the best combination of weights
- Shows how much to adjust the weights
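
A minimal sketch of this idea, assuming a made-up error function with a single weight and a single valley at w = 3. The learning rate and the number of steps are also assumptions chosen for illustration.

```python
def error(w):
    """Hypothetical error surface with a single valley at w = 3."""
    return (w - 3.0) ** 2

def error_gradient(w):
    """Derivative of the error with respect to the weight."""
    return 2.0 * (w - 3.0)

w = 0.0              # arbitrary starting weight
learning_rate = 0.1  # assumed step size
for _ in range(100):
    # Move against the gradient, that is, downhill on the error surface.
    w -= learning_rate * error_gradient(w)

print(w, error(w))
```

The sign of the gradient says in which direction the weight should move, and its size says how much to adjust it.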


## Delta


The gradient is calculated using the derivative of the activation function. It is first applied to the error measured in the model's first iteration, which shows what step the model should take to improve.


In the end, for each input, the derivative of the neuron's activation function is taken (something that I understand as how prone the neuron is to change), multiplied by the neuron's own output weight, and multiplied again by the output delta, which tells the model which way to move to reduce the error.
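
The two delta calculations can be sketched as below, assuming a sigmoid activation (whose derivative can be written in terms of the neuron's output). The function names and all numbers are hypothetical.

```python
def sigmoid_derivative(y):
    """Derivative of the sigmoid, written in terms of its output y."""
    return y * (1.0 - y)

def output_delta(error, output):
    """Delta of the output neuron: error times the activation derivative."""
    return error * sigmoid_derivative(output)

def hidden_delta(hidden_output, weight_to_output, delta_out):
    """Delta of a hidden neuron: derivative * outgoing weight * output delta."""
    return sigmoid_derivative(hidden_output) * weight_to_output * delta_out

# Made-up numbers for one record.
delta_out = output_delta(error=-0.4, output=0.4)
delta_hid = hidden_delta(hidden_output=0.5, weight_to_output=-0.2, delta_out=delta_out)
print(delta_out, delta_hid)
```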


## Back propagation


The new value of each weight is then calculated backwards through the layers. The output delta gives the direction in which the model should move, and this value influences the entire network as the new weights are calculated.
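
A minimal sketch of one backward pass for a tiny network (2 inputs, 2 hidden neurons, 1 output), assuming sigmoid activations and the simple update rule weight + learning rate * input * delta. The starting weights, neuron outputs, output delta and learning rate are all made-up values, and details such as momentum are left out.

```python
def updated_weights(weights, layer_inputs, deltas, learning_rate):
    """Return w[i][j] + learning_rate * layer_inputs[i] * deltas[j] for every weight."""
    return [
        [w + learning_rate * x * d for w, d in zip(row, deltas)]
        for row, x in zip(weights, layer_inputs)
    ]

learning_rate = 0.3  # assumed value

# Made-up state for one record.
inputs = [1.0, 0.0]
hidden_outputs = [0.5, 0.6]
hidden_weights = [[0.1, 0.4], [-0.3, 0.2]]  # hidden_weights[i][j]: input i -> hidden j
output_weights = [[-0.2], [0.3]]            # output_weights[j][0]: hidden j -> output

# Deltas are computed with the old weights, starting from the output.
output_deltas = [-0.096]
hidden_deltas = [
    h * (1.0 - h) * row[0] * output_deltas[0]
    for h, row in zip(hidden_outputs, output_weights)
]

# Then the weights are updated layer by layer, from the output backwards.
new_output_weights = updated_weights(output_weights, hidden_outputs, output_deltas, learning_rate)
new_hidden_weights = updated_weights(hidden_weights, inputs, hidden_deltas, learning_rate)
print(new_output_weights, new_hidden_weights)
```

Note that weights leaving an input whose value is 0 do not change for this record, since the update is proportional to the input.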


# Bias


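These are extra units with a constant value (typically 1) added to the layers of the network. Each bias unit has its own weights, which are learned like any others and shift the weighted sum of the neurons it feeds, changing the result.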
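
A short sketch of the effect, assuming a sigmoid neuron and a bias unit fixed at 1. The weights are made up. Without a bias, an all-zero input always produces 0.5, whatever the weights are; the bias weight lets the neuron shift that result.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias_weight=0.0):
    """Weighted sum plus the bias unit (constant 1) times its own weight."""
    total = sum(x * w for x, w in zip(inputs, weights)) + 1.0 * bias_weight
    return sigmoid(total)

without_bias = neuron([0.0, 0.0], [0.7, -0.4])
with_bias = neuron([0.0, 0.0], [0.7, -0.4], bias_weight=-2.0)
print(without_bias, with_bias)
```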


# MSE and RMSE


The way of calculating the error shown above is not the most suitable for real models. For those, there are the Mean Squared Error (MSE) and the Root Mean Squared Error (RMSE).


## MSE


It starts from the same error as before, but each error is squared, and the squared errors are then summed and averaged.


Squaring penalizes large errors more heavily than small ones.


## RMSE


Identical to MSE, except that the square root of the calculated average is taken at the end.
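
Both measures can be sketched in plain Python as follows. The expected and predicted values are made up for illustration.

```python
import math

def mse(expected, predicted):
    """Mean of the squared differences between expected and predicted values."""
    squared = [(e - p) ** 2 for e, p in zip(expected, predicted)]
    return sum(squared) / len(squared)

def rmse(expected, predicted):
    """Square root of the MSE, in the same unit as the original values."""
    return math.sqrt(mse(expected, predicted))

expected = [1.0, 0.0, 1.0, 0.0]
predicted = [0.5, 0.5, 1.0, 0.0]
print(mse(expected, predicted), rmse(expected, predicted))
```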


# Stochastic gradient


Unlike batch gradient descent, which calculates the error over all records before updating the weights, stochastic gradient descent calculates the error for each record and updates the weights right away.


This approach helps the model escape local minima and is faster because it does not need to load all the data into memory.


# Mini batch gradient descent


Mixing the two approaches above, mini-batch gradient descent chooses a number of records to process (the batch size), calculates the error over them, and then updates the weights.
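
In the usual terminology, the three variants differ only in how many records are processed before each weight update. The sketch below, with a hypothetical helper named `batches` and 10 stand-in records, counts the updates made in one pass over the data.

```python
def batches(records, batch_size):
    """Yield consecutive slices of `records` with at most `batch_size` items."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

records = list(range(10))  # stand-in for 10 training records

updates_batch = len(list(batches(records, len(records))))  # all records at once
updates_stochastic = len(list(batches(records, 1)))        # one record at a time
updates_mini = len(list(batches(records, 4)))              # blocks of 4 records
print(updates_batch, updates_stochastic, updates_mini)
```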


# Parameters


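Learning rate = size of the step taken at each weight update

Batch size = number of records processed before each weight update

Epochs = number of complete passes over the training set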
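
To show where each of the three parameters acts, here is a minimal training loop that fits a single weight by mini-batch gradient descent. The data (following y = 2x), the model and the parameter values are assumptions made up for this sketch.

```python
def train(xs, ys, learning_rate, batch_size, epochs):
    """Fit y = w * x by mini-batch gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(epochs):  # one epoch = one full pass over the data
        for start in range(0, len(xs), batch_size):
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            # Gradient of the batch MSE with respect to w.
            grad = sum(2 * (w * x - y) * x for x, y in zip(bx, by)) / len(bx)
            w -= learning_rate * grad  # step size set by the learning rate
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # made-up data following y = 2x
w = train(xs, ys, learning_rate=0.01, batch_size=2, epochs=50)
print(w)
```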


# Activation functions


## Step function


It returns only 0 or 1, and is therefore used in linearly separable problems. It is not used in more complex problems.


## Sigmoid


Returns values between 0 and 1. Widely used to return probabilities and in binary problems (two classes).


## Hyperbolic tangent


Returns values between -1 and 1. Also used for binary classification and, unlike the sigmoid, it can output negative values.


## ReLU


Returns 0 or values greater than 0, without a maximum value.


## Linear


It just returns the input value. Used in regression problems.


## Softmax


Returns one probability per class, for problems with more than two classes.
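
The functions in this section can be sketched in plain Python as below. The threshold of 0 for the step function is an assumption (other thresholds are possible), and subtracting the maximum inside softmax is a common numerical-stability trick that does not change the result.

```python
import math

def step(x, threshold=0.0):
    """Return 1 when x reaches the threshold, otherwise 0."""
    return 1 if x >= threshold else 0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def linear(x):
    return x

def softmax(values):
    """Turn a list of scores into probabilities that add up to 1."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

print(step(-0.3), sigmoid(0.0), tanh(0.0), relu(-3.0), linear(-1.2))
print(softmax([1.0, 2.0, 3.0]))
```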


# Example breast cancer (section 4)


One case where neural networks can be used is binary classification of breast cancer data, predicting whether there is cancer or not.


The data is taken from UCI, a data repository for machine learning.


Present in the data are attributes such as radius, texture, perimeter, area, smoothness, among others.


The course uses separation of training and test data.
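
The idea of the split can be sketched in plain Python as follows. The helper name `split_train_test`, the 25% test fraction and the fixed seed are assumptions for illustration, not the course's exact code.

```python
import random

def split_train_test(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(20))  # stand-in for 20 dataset rows
train, test = split_train_test(records)
print(len(train), len(test))
```

The network only sees the training part while learning; the test part is kept aside to check how it behaves on unseen data.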


## Keras


The course uses the Keras library, which allows deep learning models to be created in a few lines. Questions about it can be answered by consulting its documentation.


First, the Sequential model is imported from keras.models. With this model, the layers are added one at a time, in order.


The layer itself is created with a Dense object from keras.layers. It receives the number of neurons, the activation function, the weight initialization and the input dimension.


NOTE: The Dense layer connects all neurons in a layer with all neurons in the subsequent layer.
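
What a fully connected layer computes can be sketched in plain Python. This is not Keras code, only a hand-written illustration with made-up weights, a hypothetical `dense` function and ReLU as the assumed activation.

```python
def dense(inputs, weights, biases, activation):
    """weights[i][j] links input i to neuron j; every input reaches every neuron."""
    outputs = []
    for j in range(len(biases)):
        total = biases[j] + sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        outputs.append(activation(total))
    return outputs

def relu(x):
    return max(0.0, x)

# 3 inputs fully connected to 2 neurons (made-up weights).
weights = [[0.2, -0.5], [0.4, 0.1], [-0.3, 0.3]]
biases = [0.0, 0.0]
hidden = dense([1.0, 2.0, 3.0], weights, biases, activation=relu)
print(hidden)
```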


The last layer must match the problem at hand (for binary classification, for example, a single neuron with a sigmoid activation).


After the network is created, it is compiled using parameters such as optimizer, loss function, and metric.


Finally, the network is trained using the fit method in the classifier.


## Neural network predictions and evaluation


With a trained network, the predict method receives the test data and returns predictions for it, which can then be evaluated.


In the course, the methods used are the confusion matrix and accuracy_score, both from sklearn.metrics.


The classifier can also be evaluated using the evaluate method.
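
To make the two evaluation measures concrete, here is a hand-written stand-in for what a binary confusion matrix and an accuracy score compute. The probabilities are made up, and the 0.5 cut-off used to turn them into classes is an assumption.

```python
def binary_confusion_matrix(actual, predicted):
    """Rows are the actual class (0, 1), columns are the predicted class (0, 1)."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy(actual, predicted):
    """Fraction of records whose predicted class equals the actual class."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)

actual = [1, 0, 1, 1, 0, 0]
probabilities = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]  # made-up network outputs
predicted = [1 if p > 0.5 else 0 for p in probabilities]

matrix = binary_confusion_matrix(actual, predicted)
score = accuracy(actual, predicted)
print(matrix, score)
```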


## Creating an optimizer


Keras optimizers are created as objects, with parameters such as the learning rate, decay (how much the learning rate decreases over time) and clipvalue (a limit on the size of each gradient value).
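
The effect of these two parameters can be sketched in plain Python. The time-based decay formula below is one common schedule; whether the Keras version used in the course applies exactly this formula is an assumption. The function names are hypothetical.

```python
def decayed_learning_rate(initial_rate, decay, iteration):
    """Time-based decay: the rate shrinks as the number of updates grows."""
    return initial_rate / (1.0 + decay * iteration)

def clip(gradient, clipvalue):
    """Limit a gradient value to the range [-clipvalue, clipvalue]."""
    return max(-clipvalue, min(clipvalue, gradient))

print(decayed_learning_rate(0.001, 0.0001, iteration=10000))
print(clip(0.8, 0.5), clip(-2.0, 0.5), clip(0.1, 0.5))
```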


## Cross validation


As previously studied in this bootcamp, cross-validation can also be applied to neural networks.


Validation is done using KerasClassifier together with cross_val_score.
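
The splitting behind k-fold cross-validation can be sketched in plain Python. This is a hand-written illustration with a hypothetical `k_fold_indices` helper, not the library code; it leaves out shuffling and stratification.

```python
def k_fold_indices(n_records, k):
    """Yield (train_indices, test_indices) pairs; each record is tested exactly once."""
    fold_sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_records) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(n_records=10, k=5))
for train, test in folds:
    print(train, test)
```

In each round a new network would be trained on the train indices and scored on the test indices, giving one result per fold.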


## Dropout


As a way of preventing overfitting, dropout zeroes some randomly chosen neurons of the layer it is applied to, so part of that layer's values is disregarded during training. Applied to the input layer, this means part of the input is ignored.


Dropout is added right after a layer, and its effect applies to that layer.


Dropout should be considered by default when implementing a neural network, since the large number of parameters in a network can easily lead to overfitting.
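
A minimal sketch of the mechanism in plain Python. The rate of 0.2, the seed and the activation values are assumptions. Rescaling the surviving values by 1 / (1 - rate), so that the expected sum stays the same, is a common implementation detail included here as an assumption.

```python
import random

def dropout(values, rate, rng):
    """Zero each value with probability `rate`; rescale the survivors."""
    keep = 1.0 - rate
    return [0.0 if rng.random() < rate else v / keep for v in values]

rng = random.Random(42)
activations = [0.5, 1.2, 0.8, 0.3, 0.9]  # made-up outputs of a layer
dropped = dropout(activations, rate=0.2, rng=rng)
print(dropped)
```

Dropout is only active during training; at prediction time all neurons are used.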


## Tuning


Unlike the process used so far, where a single value is passed for each parameter, grid search (GridSearchCV in scikit-learn) receives lists of possible values for the model's parameters. It tries the different combinations, each yielding a different accuracy, and returns the combination that performed best, thus finding good parameters.
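
The search itself can be sketched in plain Python. The `fake_accuracy` function below is a hypothetical stand-in for training and cross-validating a network with the given parameters, and the grid values are made up.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every combination of parameter values and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def fake_accuracy(batch_size, epochs):
    """Hypothetical stand-in for training and evaluating a network."""
    return 0.80 + 0.001 * epochs - 0.002 * abs(batch_size - 10)

grid = {"batch_size": [10, 30], "epochs": [50, 100]}
best_params, best_score = grid_search(grid, fake_accuracy)
print(best_params, best_score)
```

Since every combination means training the network again, the number of values per parameter has a large impact on the total running time.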


## Model in production


Once the model has been trained and tuned, it can be saved in a json file so that the same model can be used in the future or by someone else.


NOTE: The weights are saved separately in a .h5 file.


# Iris multiclass classification (section 5)


The iris database consists of determining which flower is being analyzed based on attributes such as the size of the petal, sepal and others.


For this problem, there are 3 possible classes of flowers, thus making it a multiclass problem.


These three classes translate into 3 neurons in the last layer of the network, each one representing a class.


NOTE: The activation function of this layer must be softmax.


To compile the network, the loss function and metric are changed to suit the multiclass context.


For the network to understand the dataset's output data, it must be numeric, so transformations are applied when necessary. In the course's dataset the classes came as strings, and LabelEncoder is used to convert them.


Furthermore, the output data must have as many columns as there are neurons in the last layer, so that the network can compare results. (np_utils from keras.utils is used for this in the course.)


This format is undone afterwards so that the model can be analyzed with the confusion matrix.
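
The whole round trip (strings to integers, integers to one column per class, and network outputs back to class names) can be sketched in plain Python. The class names are shortened and the network outputs are made up; this is a hand-written illustration, not the library code used in the course.

```python
labels = ["setosa", "virginica", "versicolor", "setosa"]  # class names as strings

# Label encoding: map each distinct class name to an integer.
classes = sorted(set(labels))
class_to_index = {name: i for i, name in enumerate(classes)}
encoded = [class_to_index[name] for name in labels]

# One column per class, matching the 3 neurons of the last layer.
one_hot = [[1 if i == idx else 0 for i in range(len(classes))] for idx in encoded]

# Hypothetical softmax outputs of the network for the same 4 records.
outputs = [[0.8, 0.1, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3], [0.4, 0.35, 0.25]]

# Undo the format: the index of the largest value is the predicted class.
predicted = [row.index(max(row)) for row in outputs]
predicted_names = [classes[i] for i in predicted]
print(encoded, one_hot, predicted_names)
```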


The cross-validation process is also applied, but this time the mean and standard deviation of the results are used for better analysis.
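
A short sketch of that summary, using made-up accuracies for 5 folds. The population standard deviation (`pstdev`) is used here as an assumption about which variant is wanted.

```python
import statistics

# Hypothetical accuracy obtained in each of 5 folds.
fold_scores = [0.9, 1.0, 0.8, 1.0, 0.9]

mean_score = statistics.mean(fold_scores)
# A large standard deviation means the results vary a lot between folds.
std_score = statistics.pstdev(fold_scores)
print(mean_score, std_score)
```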


# Used car base (section 6)


The objective is to determine the price of the vehicle based on attributes that describe it. Unlike the problems seen before, which were classification problems, this problem requires a real value as an answer, thus being a regression problem.


Firstly, the dataset is analyzed and those columns that have no relation to the final value of the car are excluded.


Value counts are also used to extract information about the columns. Columns whose value counts turn out to be heavily imbalanced are excluded as well.


The same LabelEncoder process is used to handle non-numeric data.


## One-hot encoder


Previously, the LabelEncoder produced categorical data as a single column with values such as 1, 2, 3 and so on. With one-hot encoding, a number of columns equal to the number of possible classes of the attribute is created instead. For a given class, only the column corresponding to it receives the value 1, while all the other columns receive 0.
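
For a feature column, the transformation can be sketched in plain Python as below. The car records, the column names and the `one_hot_column` helper are hypothetical; this is not the library code used in the course.

```python
def one_hot_column(rows, column):
    """Replace the categorical value at index `column` with 0/1 indicator columns."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        indicators = [1 if row[column] == c else 0 for c in categories]
        encoded.append(row[:column] + indicators + row[column + 1:])
    return encoded, categories

# Hypothetical car records: [gearbox, power]
rows = [["manual", 75], ["automatic", 150], ["manual", 102]]
encoded, categories = one_hot_column(rows, column=0)
print(categories, encoded)
```

The advantage over a single integer column is that the network does not see an artificial order between the categories.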


## Neural network creation


The network is compiled with a loss function and metric suited to regression, such as mean_absolute_error.


The last layer of the network has only one neuron, which uses the linear activation function.


# Video game database (section 7)


Like the last task, this is a regression problem, with the objective of predicting the value of a game. This time, however, there are several outputs: the value of a game is predicted for different countries.


Just like the last example, columns that don't relate to the game value are excluded.


Predictive data is separated from desired outputs.


String data is processed with LabelEncoder and OneHotEncoder, as in the car example in section 6.


Unlike the previous examples, this network does not use the Sequential model. It uses the Input object, which is connected to layers of the Dense type.


Through this relationship, 3 independent neurons representing the predicted value of 3 countries are connected to the last hidden layer. In the end, these three neurons are passed to the model as output.
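
The structure of the outputs can be sketched in plain Python. This is not Keras code: the hidden-layer values, the weights and the region names are made up, and a linear activation is assumed for the outputs since this is a regression problem.

```python
def weighted_sum(inputs, weights, bias=0.0):
    """Linear activation: the weighted sum is returned unchanged."""
    return bias + sum(x * w for x, w in zip(inputs, weights))

# Output of the last hidden layer for one game (made-up values).
hidden = [0.5, 1.0, 2.0]

# One independent output neuron per region, each with its own weights.
output_weights = {
    "region_a": [1.0, 0.0, 0.5],
    "region_b": [0.0, 2.0, 0.0],
    "region_c": [0.2, 0.2, 0.2],
}

predictions = {name: weighted_sum(hidden, w) for name, w in output_weights.items()}
print(predictions)
```

All three outputs share the same hidden layers, so what the network learns about a game is reused for every prediction.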