**<h2>Introduction to Neural Networks</h2>**

Neural networks are a class of machine learning algorithms designed to recognize patterns. They are composed of layers of interconnected nodes, called neurons, which are inspired by biological neurons in the human brain.


**<h2>Key Concepts</h2>**

- **Neurons:** Fundamental units in a neural network that receive input, process it, and pass it on to the next layer.
- **Layers:** Groups of neurons. Common types include:
  - **Input Layer:** The first layer that receives the input data.
  - **Hidden Layers:** Intermediate layers between the input and output layers. Each one processes the output of the layer before it.
  - **Output Layer:** The final layer that produces the output.
- **Weights and Biases:** Parameters adjusted during training to minimize the error in the network's predictions.
- **Activation Functions:** Functions that determine the output of neurons by introducing non-linearity.
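
To make the concepts above concrete, here is a minimal sketch of a single neuron in plain Python. The input, weight, and bias values are hypothetical, and the sigmoid is only one possible choice of activation function.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values chosen only for illustration
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1

activation = neuron(x, w, b)
print(activation)
```

Training adjusts w and b; the activation function stays fixed.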



**<h2>Creating a Simple Neural Network</h2>**

Let's build a simple neural network using TensorFlow and the Keras API. We'll start by importing the necessary libraries.


**Step 1: Import Libraries**

In [2]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy, MeanSquaredError
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import Accuracy, MeanAbsoluteError

**Step 2: Define the Model**

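We'll create a simple feedforward neural network with one hidden layer. The two cells below build the same model in two ways: Method 1 uses the Sequential API and Method 2 uses the functional API.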

In [None]:
# Method 1: Sequential API

# Define the model
model = Sequential()

# Input layer and first hidden layer
model.add(Dense(units=10, input_shape=(8,), activation='relu'))

# Output layer
model.add(Dense(units=1, activation='sigmoid'))

In [None]:
# Method 2: functional API

# Define the model
# Named "inputs" and "outputs" to avoid shadowing Python's built-in input()
inputs = tf.keras.Input(shape=(8,))
hidden = Dense(10, activation='relu')(inputs)
outputs = Dense(1, activation='sigmoid')(hidden)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
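
Both methods describe the same architecture: 8 input features, 10 hidden units, and 1 output unit. One way to sanity-check a model like this is to count its trainable parameters by hand. The sketch below does that in plain Python; the helper name dense_params is hypothetical, and model.summary() should report the same total.

```python
def dense_params(n_inputs, n_units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_params(8, 10)   # 8 features -> 10 hidden units
output_params = dense_params(10, 1)   # 10 hidden units -> 1 output unit
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 90 11 101
```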

**Activation Functions**

Activation functions introduce non-linearity into the network, enabling it to learn complex patterns. Here are some common activation functions:

- ReLU (Rectified Linear Unit): $f(x) = \max(0, x)$
- Sigmoid: $f(x) = 1 / (1 + \exp(-x))$
- Tanh: $f(x) = \tanh(x)$
- Softmax: $f(x)_i = \exp(x_i) / \sum_j \exp(x_j)$. Often used in the output layer for multi-class classification tasks.
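
The functions listed above can be sketched in a few lines of plain Python. This is for illustration only; in a model you would pass the activation name to the layer, as in Step 2.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # Subtract the maximum first for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))   # negative inputs are clipped to zero
print(sigmoid(0.0))            # 0.5
print(math.tanh(0.0))          # 0.0

probs = softmax([2.3, 2.5, 2.6])
print(probs, sum(probs))       # probabilities that sum to 1
```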



**Step 3: Compile the Model**

Compilation involves specifying the loss function, the optimizer, and any metrics we want to monitor.

In [None]:
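# Compile the model
# BinaryAccuracy thresholds the sigmoid output at 0.5; Accuracy() would
# only count predictions that exactly equal the 0/1 label.
model.compile(optimizer=Adam(),
              loss=BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy()])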
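
The choice of accuracy metric matters for a sigmoid output. The Keras Accuracy metric counts how often predictions exactly equal the labels, while BinaryAccuracy first thresholds the predicted probability (0.5 by default). The sketch below mimics both behaviors in plain Python; the function names and values are hypothetical.

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of predictions exactly equal to the label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if (p > threshold) == bool(t))
    return hits / len(y_true)

labels = [0, 0, 1, 1]
probabilities = [0.1, 0.6, 0.8, 0.9]  # hypothetical sigmoid outputs

exact = exact_match_accuracy(labels, probabilities)
thresholded = binary_accuracy(labels, probabilities)
print(exact, thresholded)  # 0.0 0.75
```

A probability such as 0.8 never equals the label 1 exactly, so the exact-match version stays at zero even when most predictions are on the right side of the threshold.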

**Loss Functions**

Loss functions measure how well the model's predictions match the actual data. TensorFlow provides several loss functions under tf.keras.losses.

**Common Loss Functions**

- Mean Squared Error (MSE): Used for regression tasks.
- Binary Crossentropy: Used for binary classification tasks.
- Categorical Crossentropy (one-hot labels) and Sparse Categorical Crossentropy (integer labels): Used for multi-class classification tasks.

**MSE example**

In [4]:
model_outputs = tf.constant([[0.1], [0.2], [0.3]])
expected_outputs = tf.constant([[0.9], [0.8], [0.7]])
print(model_outputs.shape)

(3, 1)


In [14]:
mse = MeanSquaredError(reduction="sum_over_batch_size")
mse(expected_outputs, model_outputs)

<tf.Tensor: shape=(), dtype=float32, numpy=0.38666666>

In [10]:
mse = MeanSquaredError(reduction="auto")
mse(expected_outputs, model_outputs)

<tf.Tensor: shape=(), dtype=float32, numpy=0.38666666>

In [7]:
mse = MeanSquaredError(reduction="none")
mse(expected_outputs, model_outputs)

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.6399999 , 0.36      , 0.15999998], dtype=float32)>

In [16]:
# reduction="sum" adds the per-sample losses; "sum_over_batch_size" divides that sum by N
mse = MeanSquaredError(reduction="sum")
mse(expected_outputs, model_outputs)

<tf.Tensor: shape=(), dtype=float32, numpy=1.16>
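
The reduction modes shown in the MSE cells can be reproduced by hand. The sketch below uses the same three values in plain Python. Each sample has a single output here, so the per-sample loss is simply the squared error.

```python
model_outputs = [0.1, 0.2, 0.3]
expected_outputs = [0.9, 0.8, 0.7]

# reduction="none": one squared error per sample
per_sample = [(y - p) ** 2 for y, p in zip(expected_outputs, model_outputs)]

# reduction="sum": add the per-sample losses
total = sum(per_sample)

# reduction="sum_over_batch_size": divide the sum by the number of samples
mean = total / len(per_sample)

print(per_sample)  # approximately [0.64, 0.36, 0.16]
print(total)       # approximately 1.16
print(mean)        # approximately 0.3867
```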

**Binary Crossentropy Example**

In [17]:
model_outputs = tf.constant([[2.3], [3.2], [1.7]])
expected_outputs = tf.constant([[0], [0], [1]])
print(model_outputs.shape)

(3, 1)


In [18]:
# from_logits=True treats the inputs as raw scores (logits).
# Use from_logits=False if the last layer already applies a sigmoid.

binary_crossentropy = BinaryCrossentropy(from_logits=True, reduction="sum_over_batch_size")
binary_crossentropy(expected_outputs, model_outputs)

<tf.Tensor: shape=(), dtype=float32, numpy=1.9344283>

In [19]:
binary_crossentropy = BinaryCrossentropy(from_logits=True, reduction="none")
binary_crossentropy(expected_outputs, model_outputs)

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([2.3955455 , 3.2399533 , 0.16778602], dtype=float32)>

In [20]:
binary_crossentropy = BinaryCrossentropy(from_logits=True, reduction="sum")
binary_crossentropy(expected_outputs, model_outputs)

<tf.Tensor: shape=(), dtype=float32, numpy=5.803285>
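
The binary crossentropy values above can be checked by hand. The sketch below applies a sigmoid to each logit and then evaluates $-[y \log(p) + (1 - y) \log(1 - p)]$ in plain Python. It follows the textbook formula; Keras computes the same quantity in a numerically safer way, so tiny floating-point differences are possible.

```python
import math

def bce_from_logit(label, logit):
    """Binary crossentropy for one sample, computed from a raw logit."""
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid turns the logit into a probability
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

logits = [2.3, 3.2, 1.7]
labels = [0, 0, 1]

per_sample = [bce_from_logit(y, z) for y, z in zip(labels, logits)]  # reduction="none"
total = sum(per_sample)                                              # reduction="sum"
mean = total / len(per_sample)                                       # reduction="sum_over_batch_size"

print(per_sample)  # approximately [2.3955, 3.2400, 0.1678]
print(total)       # approximately 5.8033
print(mean)        # approximately 1.9344
```

The first two samples are penalized heavily because a large positive logit predicts class 1 while the label is 0.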

**Categorical Crossentropy**

In [26]:
model_outputs = tf.constant([[2.3, 2.5, 2.6], [3.2, 3.9, 2.1], [1.7, 0.9, 1.2]])
expected_outputs = tf.constant([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
print(model_outputs.shape)

(3, 3)


In [27]:
categorical_crossentropy = tf.keras.losses.CategoricalCrossentropy(from_logits=True, reduction="sum_over_batch_size")
categorical_crossentropy(expected_outputs, model_outputs)

<tf.Tensor: shape=(), dtype=float32, numpy=1.0005217>

In [28]:
categorical_crossentropy = tf.keras.losses.CategoricalCrossentropy(from_logits=True, reduction="none")
categorical_crossentropy(expected_outputs, model_outputs)

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([1.2729189, 0.5079519, 1.2206941], dtype=float32)>

In [29]:
categorical_crossentropy = tf.keras.losses.CategoricalCrossentropy(from_logits=True, reduction="sum")
categorical_crossentropy(expected_outputs, model_outputs)

<tf.Tensor: shape=(), dtype=float32, numpy=3.001565>
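
With from_logits=True, categorical crossentropy amounts to a log-softmax over each row of logits followed by picking out the entry for the true class. The plain-Python sketch below reproduces the per-sample values above; the helper names are hypothetical.

```python
import math

def log_softmax(logits):
    """Log of the softmax probabilities, using the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def categorical_crossentropy_from_logits(one_hot, logits):
    """Negative log-probability assigned to the true class."""
    return -sum(t * lp for t, lp in zip(one_hot, log_softmax(logits)))

logits = [[2.3, 2.5, 2.6], [3.2, 3.9, 2.1], [1.7, 0.9, 1.2]]
one_hot = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

per_sample = [categorical_crossentropy_from_logits(t, z) for t, z in zip(one_hot, logits)]
mean = sum(per_sample) / len(per_sample)

print(per_sample)  # approximately [1.2729, 0.5080, 1.2207]
print(mean)        # approximately 1.0005
```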

**Sparse Categorical Crossentropy**

In [30]:
model_outputs = tf.constant([[2.3, 2.5, 2.6], [3.2, 3.9, 2.1], [1.7, 0.9, 1.2]])
expected_outputs = tf.constant([[0], [1], [2]])
print(model_outputs.shape)

(3, 3)


In [32]:
sparse_categorical_crossentropy = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction="sum_over_batch_size")
sparse_categorical_crossentropy(expected_outputs, model_outputs)

<tf.Tensor: shape=(), dtype=float32, numpy=1.0005217>

In [33]:
sparse_categorical_crossentropy = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction="none")
sparse_categorical_crossentropy(expected_outputs, model_outputs)

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([1.2729189, 0.5079519, 1.2206941], dtype=float32)>

In [34]:
sparse_categorical_crossentropy = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction="sum")
sparse_categorical_crossentropy(expected_outputs, model_outputs)

<tf.Tensor: shape=(), dtype=float32, numpy=3.001565>
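
The sparse and one-hot versions give identical results because they differ only in how the labels are encoded. The sketch below, in plain Python with a hypothetical log_softmax helper, computes the loss both ways from the same logits.

```python
import math

def log_softmax(logits):
    """Log of the softmax probabilities, using the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

logits = [[2.3, 2.5, 2.6], [3.2, 3.9, 2.1], [1.7, 0.9, 1.2]]
sparse_labels = [0, 1, 2]

# Sparse labels: index directly into the log-probabilities
sparse_losses = [-log_softmax(row)[label] for row, label in zip(logits, sparse_labels)]

# One-hot labels: expand each index, then take a dot product
one_hot_labels = [[1 if i == label else 0 for i in range(len(row))]
                  for row, label in zip(logits, sparse_labels)]
dense_losses = [-sum(t * lp for t, lp in zip(target, log_softmax(row)))
                for row, target in zip(logits, one_hot_labels)]

print(one_hot_labels)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(sparse_losses)
print(dense_losses)
```

Integer labels avoid materializing a one-hot vector per sample, which matters when the number of classes is large.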

**Sparse Categorical Crossentropy in Machine Translation**

In [35]:
# Logits with shape (batch, sequence length, vocabulary size)
model_outputs = tf.constant([
    [[2.3, 2.5, 2.6], [3.3, 3.5, 2.4], [2.2, 2.2, 2.0]],
    [[3.2, 3.9, 2.1], [3.2, 3.9, 2.1], [3.2, 3.9, 2.1]],
    [[1.7, 0.9, 1.2], [1.7, 0.9, 1.2], [1.7, 0.9, 1.2]],
])
# One integer class label per sequence position
expected_outputs = tf.constant([[0, 2, 1], [1, 1, 0], [2, 2, 2]])
print(model_outputs.shape)

(3, 3, 3)


In [37]:
sparse_categorical_crossentropy = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction="sum_over_batch_size")
sparse_categorical_crossentropy(expected_outputs, model_outputs)

<tf.Tensor: shape=(), dtype=float32, numpy=1.1179284>

In [36]:
sparse_categorical_crossentropy = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction="none")
sparse_categorical_crossentropy(expected_outputs, model_outputs)

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1.2729189, 1.8662125, 1.0362867],
       [0.5079519, 0.5079519, 1.207952 ],
       [1.2206941, 1.2206941, 1.2206941]], dtype=float32)>

In [38]:
sparse_categorical_crossentropy = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction="sum")
sparse_categorical_crossentropy(expected_outputs, model_outputs)

<tf.Tensor: shape=(), dtype=float32, numpy=10.061356>

In [42]:
10.061356 / (model_outputs.shape[0]*model_outputs.shape[1])

1.1179284444444444

**Optimizers**

Optimizers adjust the weights of the neural network to minimize the loss function. TensorFlow provides various optimizers under tf.keras.optimizers.

**Common Optimizers**

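- SGD (Stochastic Gradient Descent): The basic optimizer. Each weight moves against its gradient by a step scaled by the learning rate.
- Adam (Adaptive Moment Estimation): Combines ideas from two other extensions of stochastic gradient descent, AdaGrad and RMSProp, by keeping running averages of the gradients and of their squares.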

In [None]:
# Stochastic Gradient Descent
sgd = tf.keras.optimizers.SGD(learning_rate=0.01)

# Adam Optimizer
adam = tf.keras.optimizers.Adam(learning_rate=0.001)
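
To show what these optimizers do at each step, the sketch below applies the textbook SGD and Adam update rules to a hypothetical one-parameter loss, $L(w) = (w - 3)^2$, in plain Python. The learning rates match the cell above. The beta and epsilon values are assumed defaults, and the Keras implementations may differ in detail.

```python
import math

def gradient(w):
    """Gradient of the toy loss L(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def run_sgd(steps, learning_rate=0.01):
    w = 0.0
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

def run_adam(steps, learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = gradient(w)
        m = beta_1 * m + (1.0 - beta_1) * g      # running average of gradients
        v = beta_2 * v + (1.0 - beta_2) * g * g  # running average of squared gradients
        m_hat = m / (1.0 - beta_1 ** t)          # bias correction for the early steps
        v_hat = v / (1.0 - beta_2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w

w_sgd = run_sgd(1000)
w_adam = run_adam(1000)
print(w_sgd, w_adam)
```

On this toy problem SGD ends up very close to the minimum at w = 3. Adam normalizes each step by the gradient's recent magnitude, so with the smaller learning rate it moves by roughly 0.001 per step and is still approaching the minimum after 1000 steps.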

**<h2>Training the Model</h2>**

Let's train the model using some dummy data. We'll generate random data for demonstration purposes.

In [None]:
import numpy as np

# Generate dummy data
X_train = np.random.random((1000, 8))
y_train = np.random.randint(2, size=(1000, 1))

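# Compile the model, this time tracking two metrics.
# BinaryAccuracy thresholds the sigmoid output at 0.5; Accuracy() would
# only count predictions that exactly equal the 0/1 label.
model.compile(optimizer=Adam(),
              loss=BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy(), MeanAbsoluteError()])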

# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32)
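
The epochs and batch_size arguments determine how many weight updates the call above performs. A quick plain-Python check of the arithmetic, using the same numbers:

```python
import math

n_samples, batch_size, epochs = 1000, 32, 10

# Each epoch walks through the data once, one batch per weight update
steps_per_epoch = math.ceil(n_samples / batch_size)

# The final batch of each epoch holds whatever is left over
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size

total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)  # 32 8 320
```

Because the labels here are random, the accuracy should stay near 50% no matter how many updates are run.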