### Q) What are ANN (artificial neural networks)?

* Artificial neural networks are computational models inspired by the structure and function of the human brain. 


* They consist of interconnected nodes (neurons) organized into layers, where each neuron processes information and passes it on to other neurons.


* ANNs can learn from data to perform various tasks, such as classification, regression, and pattern recognition.

### Q: What are the main components of an ANN?

The main components are the input layer, hidden layers (if any), the output layer, weights, biases, and activation functions. Each layer contains neurons that process information and pass it to the neurons of the next layer.

### Q) What are the advantages and disadvantages of ANNs?

ANNs have several **`advantages`** over other machine learning models, including:

* They can learn complex patterns from data.


* They are able to generalize to new data.


* They can be used to solve a wide variety of problems.


**`Disadvantages`**


* They can be computationally expensive to train.


* They can be difficult to interpret.


* They can be sensitive to noise in the data.



### Q) What are the different activation functions in an ANN?
The different activation functions in an ANN are:

**Sigmoid function**: A non-linear function that maps its input to the range (0, 1). It is typically used in the output layer for binary classification.


**ReLU function**: A non-linear function that outputs zero for negative inputs and the input itself otherwise. It is typically used in hidden layers.


**Tanh function**: A non-linear function that maps its input to the range (-1, 1). It is typically used in hidden layers.
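
As a rough illustration, the three functions listed here can be written in a few lines of NumPy. This is a minimal sketch rather than a library implementation; the sample input `z` is made up for the example.

```python
import numpy as np

def sigmoid(z):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Set negative values to zero and pass positive values through."""
    return np.maximum(0.0, z)

def tanh(z):
    """Squash values into the range (-1, 1), centered at zero."""
    return np.tanh(z)

# Hypothetical pre-activation values for three neurons
z = np.array([-2.0, 0.0, 2.0])
print("sigmoid:", sigmoid(z))
print("relu:   ", relu(z))
print("tanh:   ", tanh(z))
```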

### Q) What is the difference between ANNs and machine learning?
Machine learning is a field of computer science that deals with the development of algorithms that can learn from data. ANNs are a type of machine learning model that is inspired by the human brain.

### Q) What are the different layers in an ANN?

The different layers in an ANN are:

**Input layer**: This layer is responsible for receiving the input data.


**Hidden layers**: These layers are responsible for processing the input data and learning patterns.


**Output layer**: This layer is responsible for generating the output data.

### Q) What are the advantages of using ANNs in comparison to traditional machine learning algorithms?

* With an ANN, we usually don't need a separate manual feature selection step followed by a comparison of several candidate models. The network can learn which features matter during training.


* ANNs can handle large and complex data and often perform better than traditional ML algorithms on large datasets.


* ANNs can also handle high-dimensional data, and there is often no need to first reduce it to principal components (PCs).


* We can perform computer vision, natural language processing, and speech recognition tasks using neural networks.

### Q) What is the difference between supervised and unsupervised learning in ANN?

Supervised learning involves providing labeled training data (input and corresponding output), while unsupervised learning deals with unlabeled data, and the network learns to find patterns or representations in the data.

### Q) How do you handle imbalanced datasets in ANN?
Techniques like oversampling, undersampling, and using class weights can be used to handle imbalanced datasets in ANN.

### Q) Can ANNs be used for regression tasks?


Yes, ANNs can be used for regression tasks by adjusting the output layer and using appropriate loss functions like mean squared error (MSE).

### Q) Can batch normalization be used in an ANN, and what are the best practices for applying it (with code)?

Yes, batch normalization can be used in Artificial Neural Networks (ANNs). In fact, batch normalization was originally introduced for feedforward ANNs and has since been widely adopted in various neural network architectures.

**Steps**:-

1) Input data (features) are fed into the layer.


2) The layer applies a linear transformation (weights and biases) to the input data.


3) Batch Normalization is applied.


4) The normalized output is passed through an activation function, producing the activations of the layer.

**`Best Practice`**:-

**Apply Batch Normalization after Every Hidden Layer (or Almost All)**

**Avoid Batch Normalization in the Output Layer**

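```python
import tensorflow as tf

# Placeholder values; set these to match your own dataset.
input_dim = 20      # number of input features
output_dim = 3      # number of classes
batch_size = 32
num_epochs = 10

model = tf.keras.Sequential([
    tf.keras.Input(shape=(input_dim,)),

    # Hidden block 1: linear transformation -> batch normalization -> activation
    tf.keras.layers.Dense(128),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation('relu'),

    # Hidden block 2: same ordering
    tf.keras.layers.Dense(64),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation('relu'),

    # Output layer: no batch normalization
    tf.keras.layers.Dense(output_dim, activation='softmax'),
])

# categorical_crossentropy expects one-hot encoded labels
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# x_train, y_train, x_val and y_val are assumed to be prepared beforehand
model.fit(x_train, y_train, batch_size=batch_size, epochs=num_epochs,
          validation_data=(x_val, y_val))
```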

### Q) What is the class weight technique for dealing with imbalanced datasets?


Class weights assign higher weights to the minority class and lower weights to the majority class during the training process. By doing so, the model gives more importance to the minority class, effectively compensating for its lower representation in the dataset. This helps in making the model more sensitive to the minority class and improves its ability to correctly classify instances of the minority class.
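
One common way to pick the weights is inverse class frequency, `n_samples / (n_classes * class_count)`, which is the heuristic behind scikit-learn's `class_weight='balanced'`. The sketch below assumes a made-up label array with a 90/10 split; the helper name `balanced_class_weights` is hypothetical.

```python
import numpy as np

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Hypothetical imbalanced labels: 90 samples of class 0, 10 of class 1
y = np.array([0] * 90 + [1] * 10)
class_weights = balanced_class_weights(y)
print(class_weights)  # the minority class (1) gets the larger weight
```

A dictionary of this shape can be passed to Keras through the `class_weight` argument of `model.fit`.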

### Q) Explain the concept of transfer learning in ANN.


Transfer learning involves using pre-trained neural network models and adapting them to new tasks. By reusing learned features, transfer learning can significantly reduce the training time and data requirements for new tasks.

### Q) Can ANNs handle missing data?

ANNs cannot process missing values directly, so missing data is handled by imputing or replacing the missing values, for example with mean imputation or with autoencoders used for feature learning.
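
A minimal sketch of mean imputation with NumPy, assuming missing entries are encoded as `NaN`. The helper `mean_impute` and the small matrix are made up for illustration. In a real pipeline the column means would normally be computed on the training set only and then reused for validation and test data.

```python
import numpy as np

def mean_impute(X):
    """Replace NaNs in each column with the mean of that column's observed values."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)       # per-column mean, ignoring NaNs
    rows, cols = np.where(np.isnan(X))      # positions of the missing entries
    X[rows, cols] = col_means[cols]
    return X

# Hypothetical feature matrix with two missing values
X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])
X_filled = mean_impute(X)
print(X_filled)
```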

### Q) Explain the concept of weight initialization in ANN.


Weight initialization refers to setting initial values for the weights in the network. Proper weight initialization is crucial to prevent issues like vanishing or exploding gradients during training. Common initialization methods include random initialization and Xavier/Glorot initialization.
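
As a sketch of the Xavier/Glorot idea, the uniform variant draws weights from `[-limit, limit]` with `limit = sqrt(6 / (fan_in + fan_out))`, so the weight variance shrinks as layers get wider. The layer sizes below are arbitrary and the helper name is hypothetical; Keras `Dense` layers use this scheme by default (`glorot_uniform`).

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Sample a (fan_in, fan_out) weight matrix from the Glorot uniform range."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = glorot_uniform(128, 64, rng)   # weights between a 128-unit and a 64-unit layer
print(W.shape)
print("empirical std:", W.std())
print("target std:   ", np.sqrt(2.0 / (128 + 64)))
```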

### Q: What is the role of weights in an ANN?

Weights are learnable parameters (not hyperparameters) that represent the strength of the connections between neurons. The network adjusts them during training to improve its performance.

--------

Weights are responsible for learning the patterns and relationships in the data during the training process. The weights determine the strength of the connections between neurons in different layers and play a crucial role in the overall performance and convergence of the neural network.

### Q) What is bias?



Bias is a learnable parameter in a neural network, not a hyperparameter. Every neuron has its own bias, which is added to the weighted sum of the neuron's inputs; the activation function is then applied to this sum to generate the output.
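
To make the computation concrete, here is a minimal sketch of a single neuron with a ReLU activation. The inputs, weights, and bias values are made up. With an all-zero input the weighted sum is zero, so the bias alone determines the output.

```python
import numpy as np

def neuron_output(x, w, b):
    """Weighted sum of the inputs plus the bias, followed by a ReLU activation."""
    z = np.dot(w, x) + b
    return float(max(0.0, z))

x = np.array([0.0, 0.0])      # all-zero input
w = np.array([0.5, -0.3])     # hypothetical weights

print(neuron_output(x, w, b=0.0))   # no bias: the neuron stays silent
print(neuron_output(x, w, b=0.7))   # the bias shifts the output away from zero
```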

### Q: What is the purpose of activation functions in neurons?

* Activation functions introduce non-linearity to the output of each neuron. This allows the network to learn complex patterns and make decisions based on the input data. Common activation functions include ReLU, Sigmoid, and Tanh.



### Q) What is the linear activation function?
The linear activation function returns its input unchanged, f(z) = z. For regression tasks, it is a common choice for the output layer: the output of the neuron is the weighted sum of the inputs (plus the bias) without any non-linearity applied. In this case, the output directly represents the predicted continuous values.

### Q) What is the role of the activation function in the output layer?
The activation function in the output layer is selected based on the nature of the problem. For binary classification, a sigmoid function is commonly used; for multi-class classification, a softmax function is often used; and for regression problems, a linear activation function is used.

### Q) What is batch normalization?

Batch normalization is a technique used in neural networks to improve the training process and the performance of the model. It normalizes a layer's output, typically before the activation function is applied. During training, the normalization statistics are computed separately for each mini-batch (subset) of data.

The normalization step rescales the pre-activations (the weighted sum of the inputs plus the bias) within each mini-batch to a mean of zero and a standard deviation of one, after which a learnable scale and shift are applied. This is beneficial for training because it stabilizes gradients and helps the model learn more effectively.

By normalizing the activations within each mini-batch, batch normalization allows the model to converge faster during training and can lead to an overall improvement in performance. It is an important technique in deep learning for enhancing the training process and achieving better results.
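
The training-time computation can be sketched in NumPy as follows. This is an illustration only: `gamma` and `beta` stand for the learnable scale and shift, `eps` is a small constant for numerical stability, and the mini-batch is random data. At inference time, libraries use running averages of the mean and variance instead of per-batch statistics, which this sketch does not cover.

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mean = z.mean(axis=0)                       # per-feature mean over the batch
    var = z.var(axis=0)                         # per-feature variance over the batch
    z_hat = (z - mean) / np.sqrt(var + eps)     # zero mean, unit variance
    return gamma * z_hat + beta

rng = np.random.default_rng(0)
z = rng.normal(loc=5.0, scale=3.0, size=(32, 4))   # hypothetical batch: 32 samples, 4 features
out = batch_norm_forward(z, gamma=np.ones(4), beta=np.zeros(4))

print("mean per feature:", out.mean(axis=0))
print("std per feature: ", out.std(axis=0))
```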

### Q: How does backpropagation work in ANN?

Backpropagation is the algorithm used during training to compute the gradients of the loss with respect to each weight. An optimizer such as gradient descent then uses these gradients to update the weights and reduce the loss.

--------


Backpropagation is an algorithm that is used to train ANNs. Backpropagation works by calculating the error between the predicted output and the actual output, and then using this error to update the weights of the ANN.
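
A minimal sketch of one backpropagation step for a tiny network (3 inputs, 4 tanh hidden units, 1 linear output) with a squared-error loss. The architecture, the random data, and the learning rate are assumptions chosen to keep the example short. The gradients are derived by hand with the chain rule, which is what a framework's automatic differentiation does for you.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    h = np.tanh(W1 @ x + b1)      # hidden activations
    y_hat = W2 @ h + b2           # linear output (a scalar)
    return h, y_hat

def loss_fn(y_hat, y):
    return 0.5 * (y_hat - y) ** 2

def backward(x, y, W1, b1, W2, b2):
    """Gradients of the loss with respect to every parameter, via the chain rule."""
    h, y_hat = forward(x, W1, b1, W2, b2)
    d_yhat = y_hat - y                  # dL/dy_hat: the prediction error
    dW2 = d_yhat * h                    # output-layer weights
    db2 = d_yhat
    d_h = d_yhat * W2                   # error propagated back to the hidden layer
    d_z1 = d_h * (1.0 - h ** 2)         # through tanh: tanh'(z) = 1 - tanh(z)^2
    dW1 = np.outer(d_z1, x)             # hidden-layer weights
    db1 = d_z1
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), 1.0
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=4), 0.0

lr = 0.01
loss_before = loss_fn(forward(x, W1, b1, W2, b2)[1], y)
dW1, db1, dW2, db2 = backward(x, y, W1, b1, W2, b2)

# Gradient descent update using the backpropagated gradients
W1, b1 = W1 - lr * dW1, b1 - lr * db1
W2, b2 = W2 - lr * dW2, b2 - lr * db2
loss_after = loss_fn(forward(x, W1, b1, W2, b2)[1], y)

print("loss before update:", loss_before)
print("loss after update: ", loss_after)
```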

### Q: How do you prevent overfitting in an ANN?
To prevent overfitting, techniques like dropout, regularization, and early stopping can be applied. These methods help generalize the network's learning beyond the training data and improve its performance on new data.

### Q: What is the role of the learning rate in training an ANN?

The learning rate determines the step size in gradient descent during weight updates. A higher learning rate can lead to faster convergence (the loss decreases faster) but may overshoot the optimal solution (step past a minimum). A lower learning rate takes longer to converge (more time and epochs to reach a minimum) but usually follows the loss surface more precisely.

### Q) What happens if we don't specify a learning rate in a neural network?

The learning rate is a hyperparameter that sets the step size by which the model's weights are updated at each iteration of the optimization process. Gradient descent cannot run without a step size, so when none is specified, libraries such as Keras fall back on the optimizer's default value. If that value is poorly suited to the problem, the loss may fluctuate or fail to approach a minimum.


### Q) What happens when the learning rate is small, large, or not specified?


**Small learning rate**: The network takes a long time to converge, and it may not reach a good solution within the available training time.


**Large learning rate**: The network may not converge to a minimum loss value, and the loss may oscillate wildly or diverge.


**No learning rate (a step size of zero)**: The weights are never updated, so the network cannot learn from the training data and the loss does not decrease. In practice, libraries use a default learning rate when none is given.
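
These cases can be seen on a toy problem. The sketch below minimizes `f(w) = w**2` (gradient `2*w`, minimum at `w = 0`) with plain gradient descent. The function, the starting point, and the learning rates are chosen purely for illustration.

```python
def run_gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 from w0 using a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2.0 * w    # gradient of w**2 is 2*w
    return w

# 0.0: no update, 0.01: small, 0.4: well suited to this problem, 1.1: too large
final_w = {lr: run_gradient_descent(lr) for lr in (0.0, 0.01, 0.4, 1.1)}
for lr, w in final_w.items():
    print(f"lr={lr}: w after 20 steps = {w:.6g}")
```

With a learning rate of zero `w` never moves, with 0.01 it creeps toward zero, with 0.4 it gets very close to zero, and with 1.1 each step overshoots so `|w|` grows.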

### Q: What is the difference between supervised and unsupervised learning in ANN?
In supervised learning, the ANN is trained on labeled data with known outputs, while in unsupervised learning, the network learns from unlabeled data and identifies patterns without explicit output labels.

### Q: How do you choose the architecture of an ANN?
The architecture, such as the number of layers and neurons, depends on the complexity of the task, the amount of data available, and the computational resources. It is often determined through experimentation and validation.

### Q) How many neurons are present in the output layer?

* The number of neurons in the output layer depends on the problem statement. For regression and binary classification problems, the output layer typically has a single neuron. For multi-class classification problems, the number of neurons equals the number of categories (labels) in the dependent variable. Example: with 8 categories, the output layer has 8 neurons.


* For classification, the network computes a probability for each class, and the class with the highest probability is taken as the output.
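
For the multi-class case, a minimal sketch of how raw output-layer scores become a predicted class: softmax turns the scores into probabilities, and `argmax` picks the most likely class. The three scores are made up for the example.

```python
import numpy as np

def softmax(logits):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])      # hypothetical scores from 3 output neurons
probs = softmax(logits)
predicted_class = int(np.argmax(probs)) # index of the highest probability

print("probabilities:", probs)
print("predicted class:", predicted_class)
```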

### Q) What is regularization in ANN, and why is it used?

Regularization techniques, such as L1 and L2 regularization, are used to reduce or prevent overfitting in a neural network by adding a penalty term to the loss function during training, which helps the network generalize. The penalty discourages large weight values and pushes the weights toward smaller values.


* **L1 Regularization (Lasso)**: In L1 regularization, the penalty term is proportional to the absolute values of the weights. Mathematically, the L1 regularization term is the sum of the absolute values of all the weights in the network, multiplied by a hyperparameter (usually denoted as λ or alpha) that controls the strength of the regularization:

L1 Regularization Term = λ * Σ|w|


* **L2 Regularization (Ridge)**: In L2 regularization, the penalty term is proportional to the square of the weights. The L2 regularization term is the sum of the squares of all the weights in the network, multiplied by the regularization strength (λ or alpha):

L2 Regularization Term = λ * Σ(w^2)


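**In code, weight regularization is configured per layer (typically on hidden layers, and optionally on the output layer), while dropout is added as a separate layer between hidden layers.**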
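
The two penalty terms can be computed directly from the formulas above. In this sketch the weight vector, the value of λ (`lam`), and the data loss are hypothetical numbers; a framework would add the penalty to the loss for you once a regularizer is attached to a layer.

```python
import numpy as np

def l1_penalty(weights, lam):
    """lam * sum of absolute weight values."""
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam):
    """lam * sum of squared weight values."""
    return lam * np.sum(weights ** 2)

w = np.array([1.0, -2.0, 3.0])   # hypothetical weights
lam = 0.1                        # regularization strength
data_loss = 0.5                  # hypothetical loss on the training data

print("loss with L1 penalty:", data_loss + l1_penalty(w, lam))
print("loss with L2 penalty:", data_loss + l2_penalty(w, lam))
```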

### Q) What is Dropout?

Dropout is a regularization technique where neurons are randomly deactivated during training. This prevents the network from relying too much on specific neurons and improves generalization.
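
A minimal NumPy sketch of "inverted" dropout, the variant used by Keras: during training a random fraction `rate` of the activations is set to zero and the survivors are scaled by `1 / (1 - rate)`, so the expected activation stays the same; at inference time nothing is dropped. The function signature and the example array are assumptions for illustration.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero a random fraction `rate` of units and rescale the remaining ones."""
    if not training or rate == 0.0:
        return activations                      # dropout is disabled at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob   # True for units that are kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((4, 5))                             # hypothetical layer activations

print(dropout(a, rate=0.5, rng=rng))                    # entries are 0.0 or 2.0
print(dropout(a, rate=0.5, rng=rng, training=False))    # unchanged
```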

### Q) What is the role of the input layer in ANN?

The input layer receives and passes the input data to the subsequent layers for processing.

### Q) How do you determine the number of hidden layers in an ANN?

The number of hidden layers is typically determined by experimentation and the complexity of the problem.

### Q) What is the purpose of the output layer?

The output layer provides the final results of the ANN's computations.

### Q) How do you calculate the number of neurons in a hidden layer?

The number of neurons in a hidden layer is usually set based on the problem's complexity and heuristics.

### Q) What is early stopping?
* Early stopping is a technique used to prevent overfitting. It halts training before the network starts to overfit the training data.


* Early stopping is a technique used in machine learning and deep learning to prevent overfitting and improve model generalization. It involves monitoring the performance of the model on a validation dataset during training and stopping the training process when the performance on the validation data starts to degrade.
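
The monitoring logic can be sketched in plain Python. Here `patience` (an assumed parameter name, matching the one used by the Keras `EarlyStopping` callback) is the number of consecutive epochs without improvement that are tolerated, and the validation losses are made-up numbers.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1   # never triggered: all epochs were run

# Hypothetical validation losses: improving until epoch 2, then degrading
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
stop_epoch = early_stopping_epoch(val_losses, patience=2)
print("training stops at epoch", stop_epoch)
```

In practice the weights from the best epoch (epoch 2 in this example) are usually restored after stopping.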

### Q) What is gradient descent?

Gradient descent is an optimization algorithm used to minimize the cost function by adjusting the weights.
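
As a small worked example, the sketch below uses gradient descent to fit a line `y = w*x + b` by minimizing the mean squared error. The data is synthetic (generated from `y = 2x + 1`), and the learning rate and step count are arbitrary choices that happen to work for this problem.

```python
import numpy as np

# Synthetic data from y = 2x + 1, chosen for illustration
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0

w, b = 0.0, 0.0     # initial parameters
lr = 0.1            # learning rate (step size)

for _ in range(2000):
    error = (w * x + b) - y
    grad_w = 2.0 * np.mean(error * x)   # d(MSE)/dw
    grad_b = 2.0 * np.mean(error)       # d(MSE)/db
    w -= lr * grad_w                    # step against the gradient
    b -= lr * grad_b

print("learned w:", w)
print("learned b:", b)
```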

### Q) How can you prevent overfitting in an ANN?

Techniques like early stopping, dropout, and regularization can help prevent overfitting.

### Q) What are the advantages of ANNs over traditional algorithms?

ANNs can learn complex patterns and generalize well to new data, making them suitable for various tasks.

### Q) What is the role of bias in ANNs?

Bias allows ANNs to adjust the output even when the input features are zero. A suitable bias can also help reduce the risk of the dead neuron problem, in which a neuron always outputs zero.

### Q) How do you evaluate the performance of an ANN?
Performance evaluation depends on the problem type: metrics like accuracy, precision, recall, and F1-score are used for classification, while error measures such as mean squared error (MSE) are used for regression.
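
A minimal sketch of how the classification metrics are computed from predictions, assuming binary labels where 1 is the positive class. The label arrays are made up, and the helper `binary_metrics` is hypothetical; libraries such as scikit-learn provide equivalent functions.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))   # true positives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))   # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))   # false negatives
    accuracy = float(np.mean(y_true == y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical ground truth and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```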

### Q) How can you increase the accuracy of an ANN model?


1) Increase the number of hidden layers and the number of neurons in each layer, and use dropout regularization to prevent overfitting.


2) Increase the number of epochs, and use early stopping to prevent overfitting.