# https://blog.datumo.com/en/ai_tech/16035
- The applications of algorithms trained on images are virtually endless. Images are used to train driverless cars, to detect cancerous skin cells, to identify COVID-19 from lung X-rays, and so on.
- TensorFlow Keras treats an image as a matrix whose data points (pixels) have spatial relationships with one another. A grayscale image is treated as a single matrix. A coloured image, on the other hand, is treated as a stack of three matrices (one for each of the red, green, and blue colour channels).
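
As a rough illustration of the matrix view described above, the sketch below builds a tiny grayscale image and a tiny colour image with NumPy. The 2x2 size and the pixel values are made up for the example.

```python
import numpy as np

# Hypothetical 2x2 grayscale image: one matrix, one intensity per pixel.
gray = np.array([[0, 255],
                 [128, 64]], dtype=np.uint8)

# Hypothetical 2x2 colour image: three matrices (R, G, B) stacked
# channels-last, giving the shape (height, width, channels).
red = np.array([[255, 0], [0, 0]], dtype=np.uint8)
green = np.array([[0, 255], [0, 0]], dtype=np.uint8)
blue = np.array([[0, 0], [255, 0]], dtype=np.uint8)
colour = np.stack([red, green, blue], axis=-1)

print(gray.shape)    # (2, 2)
print(colour.shape)  # (2, 2, 3)
print(colour[0, 0])  # top-left pixel: red=255, green=0, blue=0
```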

Spatial Relationships:

Unlike structured data such as tables, where swapping column positions doesn't change the information in the dataset, images have spatial relationships between regions. Swapping regions of an image changes what the image shows.

Difficulty in Management:

Image datasets are harder to manage because images vary in shape (height/width) and in the number of colour channels, whereas structured data has a fixed set of columns.

Large Input Dimensions:

Passing images directly to a fully connected Deep Neural Network (DNN) leads to large input dimensions, which can slow down model training. DNNs are also sensitive to spatial variation between images, such as the same object appearing at a different position. Convolutional Neural Networks (CNNs) are recommended instead, as they are better suited to processing image data efficiently.

Structure:

- DNN (Deep Neural Network): DNNs consist of fully connected layers, where each neuron is connected to every neuron in the next layer. They are used for general-purpose machine learning tasks.
- CNN (Convolutional Neural Network): CNNs also have layers but include specialized ones such as convolutional and pooling layers. These layers help extract features and preserve spatial relationships within images.

Use Case:

- DNN: Often used for generic machine learning tasks, text data analysis, structured data, and simpler image tasks.
- CNN: Primarily designed for image-related tasks such as object recognition, classification, segmentation, and other computer vision applications.

Architecture:

- DNN: Stacks fully connected layers, with each neuron connected to every neuron in the next layer.
- CNN: Employs convolutional layers that apply filters to detect features in different parts of the input image. Pooling layers reduce dimensionality while preserving the essential features.

Parameter Sharing:

- DNN: Each neuron has its own weight for every input, resulting in a high number of parameters.
- CNN: Shares parameters through filters/kernels that are reused across the whole image, significantly reducing the number of parameters and enabling efficient feature extraction.
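
To put rough numbers on the parameter-sharing point, here is a minimal sketch that counts the parameters of a single layer of each kind. The 224x224 RGB input and the 32 units/filters are assumptions chosen for illustration, not values from these notes.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, n_filters: int) -> int:
    """Weights plus biases of a 2D convolutional layer (independent of image size)."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


# Assumed example: a 224x224 RGB image feeding a layer with 32 units / 32 filters.
height, width, channels = 224, 224, 3

dense_count = dense_params(height * width * channels, 32)
conv_count = conv2d_params(3, 3, channels, 32)

print(dense_count)  # 4816928
print(conv_count)   # 896
```

The dense layer needs one weight per pixel value per unit, while the convolutional layer only stores its 3x3 filters, however large the image is.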


```python
import numpy as np

# Grayscale image with dimensions 3x3
image_array = np.array([
    [100, 150, 200],
    [50, 120, 180],
    [25, 75, 100]
])
```

Input Layer (inp_layer):

- What it does: Represents the input layer of the model.
- Connection: Connects to the first convolutional layer (conv1).
- Why we need it: It defines the shape of the input data, which should match the shape of a single image in the dataset.

First Convolutional Layer (conv1):

- What it does: Applies convolutional operations to learn spatial hierarchies of features.
- Connection: Connected to the input layer (inp_layer).
- Why we need it: Captures low-level features like edges and patterns in the input images. The ReLU activation introduces non-linearity.

Max Pooling Layer (maxp1):

- What it does: Performs down-sampling by selecting the maximum value from a group of values.
- Connection: Connected to the first convolutional layer (conv1).
- Why we need it: Reduces spatial dimensions, retaining important information while reducing computational complexity.

Second Convolutional Layer (conv2):

- What it does: Another convolutional layer to capture higher-level features.
- Connection: Connected to the max pooling layer (maxp1).
- Why we need it: Learns more complex features and patterns as it goes deeper into the network.

Flatten Layer (flat1):

- What it does: Flattens the output from the previous layer into a one-dimensional tensor.
- Connection: Connected to the second convolutional layer (conv2).
- Why we need it: Prepares the data for the transition from convolutional layers to dense layers.

First Dense Layer (dense1):

- What it does: Fully connected layer with 256 neurons and ReLU activation.
- Connection: Connected to the flatten layer (flat1).
- Why we need it: Learns global patterns and combinations of features from the flattened output.

Second Dense Layer (dense2):

- What it does: Another fully connected layer with 128 neurons and ReLU activation.
- Connection: Connected to the first dense layer (dense1).
- Why we need it: Further refines learned features for higher-level abstraction.

Output Dense Layer (out_layer):

- What it does: Output layer with neurons equal to the number of unique labels, using softmax activation.
- Connection: Connected to the second dense layer (dense2).
- Why we need it: Produces the final probability distribution over the classes. The number of neurons matches the number of unique labels in the dataset.

Model Compilation (model_func.compile(...)):

- What it does: Compiles the model with a specific optimizer, loss function, and metrics.
- Why we need it: Prepares the model for training by specifying how it should learn and evaluate its performance.

In summary, the convolutional layers extract hierarchical features, the flatten layer prepares the data for dense layers, and the dense layers learn global patterns. The output layer produces class probabilities for classification. Each layer contributes to the model's ability to understand and classify images.
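
The compilation step can be sketched as follows. The notes do not say which optimizer, loss, or metrics were used, so `'adam'`, `'sparse_categorical_crossentropy'` (which assumes integer class labels), and `'accuracy'` are assumptions. The model here is a tiny stand-in so that the snippet runs on its own; only the `compile` call is the point.

```python
from keras.layers import Input, Flatten, Dense
from keras.models import Model

# Tiny stand-in model (not the full CNN) so the snippet is self-contained.
inp_layer = Input(shape=(28, 28, 1))
flat1 = Flatten()(inp_layer)
out_layer = Dense(10, activation='softmax')(flat1)
model_func = Model(inp_layer, out_layer)

# Assumed choices, not taken from the notes:
# Adam optimizer, integer labels (hence the sparse loss), accuracy metric.
model_func.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)

print(model_func.count_params())  # 784 * 10 weights + 10 biases = 7850
```

If the labels were one-hot encoded instead, `'categorical_crossentropy'` would be the matching loss.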




User: What is the difference between convolutional and dense layers?

The main difference between convolutional (Conv) layers and dense (fully connected) layers lies in how they process and connect neurons:

Convolutional Layers:

- Spatial Hierarchies: Convolutional layers are designed to capture spatial hierarchies of features in input data, such as images. They learn local patterns (edges, textures) and progressively combine them to recognize more complex structures.
- Local Receptive Fields: Neurons in a convolutional layer are connected to local regions (receptive fields) of the input. This lets them focus on specific areas and detect local patterns.
- Parameter Sharing: Convolutional layers use parameter sharing, meaning the same set of weights and biases is applied across different locations in the input. This significantly reduces the number of parameters compared to fully connected layers.
- Translation Invariance: Because the same weights are applied at every location, a pattern produces the same response wherever it appears, just at a shifted position in the feature map (translation equivariance). Combined with pooling, this makes CNNs partially translation-invariant.

Dense (Fully Connected) Layers:

- Global Patterns: Dense layers, on the other hand, connect each neuron to every neuron in the previous layer, capturing global patterns and relationships. They don't preserve spatial information as convolutional layers do.
- Global Receptive Field: Neurons in dense layers have a global receptive field, meaning they consider the entire input. They are not specialized for recognizing local patterns but instead learn complex combinations of features.
- High Parameter Count: Dense layers typically have a higher number of parameters because each neuron is connected to every neuron in the previous layer. This can lead to a higher risk of overfitting, especially in datasets with limited samples.

In summary, convolutional layers are well suited to grid-like data (e.g., images) because they capture spatial hierarchies and share parameters, while dense layers are more general-purpose and learn global patterns by connecting every neuron to all neurons in the previous layer. In convolutional neural networks (CNNs), convolutional layers are often used for feature extraction, and dense layers are used for classification or regression based on the learned features.
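
The shared-weights idea can be made concrete with a small NumPy sketch. It slides one hand-picked 3x3 kernel over a made-up 6x6 image, then over a copy whose edge is shifted by one pixel. The image, the kernel values, and the helper name `conv2d_valid` are all hypothetical. In a CNN the kernel weights would be learned, and the operation is the cross-correlation that deep learning libraries call convolution.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide one kernel over the image (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same 9 kernel weights are reused at every location.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Hypothetical 6x6 image with a vertical edge between columns 1 and 2.
image = np.zeros((6, 6))
image[:, 2:] = 1.0

# The same image with the edge shifted one pixel to the right.
shifted = np.zeros((6, 6))
shifted[:, 3:] = 1.0

# Hand-picked vertical-edge kernel.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = conv2d_valid(image, kernel)
response_shifted = conv2d_valid(shifted, kernel)

print(response[0])          # [3. 3. 0. 0.]
print(response_shifted[0])  # [0. 3. 3. 0.]
```

The edge is detected in both images with the same strength. Only the position of the response moves, which is what shared weights provide.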



Let's compare a convolutional neural network (CNN) with a 2-layer neural network (two hidden dense layers) that uses ReLU and softmax activations.

Convolutional Neural Network (CNN):
```python
from keras.layers import Input, Conv2D, Dense, Flatten, MaxPool2D
from keras.models import Model

# Input layer
inp_layer = Input(shape=(28, 28, 1))  # Assuming input images are 28x28 pixels with 1 color channel

# Convolutional layers
conv1 = Conv2D(32, (3, 3), strides=2, activation='relu')(inp_layer)
maxp1 = MaxPool2D(pool_size=2, strides=2)(conv1)
conv2 = Conv2D(64, (3, 3), strides=2, activation='relu')(maxp1)

# Flatten layer
flat1 = Flatten()(conv2)

# Dense layers
dense1 = Dense(256, activation='relu')(flat1)
dense2 = Dense(128, activation='relu')(dense1)

# Output layer
out_layer = Dense(10, activation='softmax')(dense2)  # Assuming 10 output classes for simplicity

# Model
model_cnn = Model(inp_layer, out_layer)
```
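
To see how the spatial size shrinks through the CNN above, here is a small sketch that applies the standard output-size formula for unpadded ("valid") layers, which is the Keras default for `Conv2D` and `MaxPool2D`. The helper name `out_size` is made up for this example.

```python
def out_size(n: int, window: int, stride: int) -> int:
    """Output length along one axis when no padding is used."""
    return (n - window) // stride + 1


size = 28                           # input is 28x28x1
after_conv1 = out_size(size, 3, 2)         # conv1: 3x3 kernel, stride 2
after_pool = out_size(after_conv1, 2, 2)   # maxp1: 2x2 window, stride 2
after_conv2 = out_size(after_pool, 3, 2)   # conv2: 3x3 kernel, stride 2

# conv2 has 64 filters, so Flatten sees after_conv2 x after_conv2 x 64 values.
flat_units = after_conv2 * after_conv2 * 64

print(after_conv1, after_pool, after_conv2, flat_units)  # 13 6 2 256
```

With these strides the 28x28 image is reduced to a 2x2x64 volume, so `flat1` passes 256 values to `dense1`.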

2-Layer Neural Network (ReLU and Softmax):

```python
from keras.layers import Input, Dense
from keras.models import Model

# Input layer
inp_layer = Input(shape=(28 * 28,))  # Each 28x28 image is flattened into a vector of 784 values

# Dense layers
dense1 = Dense(256, activation='relu')(inp_layer)
dense2 = Dense(128, activation='relu')(dense1)

# Output layer
out_layer = Dense(10, activation='softmax')(dense2)  # Assuming 10 output classes for simplicity

# Model
model_nn = Model(inp_layer, out_layer)
```

Comparison:

Input Layer:

- CNN: Accepts images as input with dimensions (28, 28, 1).
- NN: Expects each 28x28 image already flattened into a vector of size 784.

Architecture:

- CNN: Utilizes convolutional layers to capture spatial features in the input images.
- NN: Consists of fully connected dense layers.

Activation Function:

- Both: Use ReLU activation for intermediate layers.

Output Layer:

- Both: Apply softmax activation for multi-class classification.

Parameters:

- CNN: Learns filters in convolutional layers to detect patterns in images.
- NN: Learns weights in dense layers to capture relationships in flattened input.

The CNN is particularly well suited to image-related tasks because it captures spatial hierarchies and local patterns. Its convolutional layers learn hierarchical features, while its fully connected layers capture global relationships. The 2-layer neural network, in contrast, works on flattened input, so it has no built-in notion of which pixels are neighbours and must learn any spatial structure from the data.
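
For a sense of scale, the trainable parameters of the two models above can be tallied by hand. This is a back-of-the-envelope sketch. It assumes the default unpadded convolutions, under which the CNN's flatten layer outputs 2 x 2 x 64 = 256 values. Calling `model_cnn.summary()` and `model_nn.summary()` should report the same totals.

```python
# CNN: weights + biases per layer (pooling and flatten have no parameters).
cnn_params = {
    "conv1": 3 * 3 * 1 * 32 + 32,
    "conv2": 3 * 3 * 32 * 64 + 64,
    "dense1": 256 * 256 + 256,
    "dense2": 256 * 128 + 128,
    "out_layer": 128 * 10 + 10,
}

# Dense-only network: the first layer sees all 784 pixel values.
nn_params = {
    "dense1": 784 * 256 + 256,
    "dense2": 256 * 128 + 128,
    "out_layer": 128 * 10 + 10,
}

cnn_total = sum(cnn_params.values())
nn_total = sum(nn_params.values())

print(cnn_total)  # 118794
print(nn_total)   # 235146
```

Most of the dense-only network's parameters sit in its first layer, which needs a separate weight for every pixel.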



Downsampling, in the context of convolutional neural networks (CNNs), refers to the process of reducing the spatial dimensions of an input volume. In the case of image data, downsampling typically involves decreasing the width and height of the input.

In the context of max-pooling, which is a common downsampling technique, a window (often 2x2 or 3x3) slides over the input feature map, and only the maximum value within each window is retained. The result is a downsampled feature map with reduced spatial dimensions.

For example, consider an input feature map of size 4x4:

```
[1, 2, 3, 4]
[5, 6, 7, 8]
[9, 10, 11, 12]
[13, 14, 15, 16]
```

Applying max-pooling with a 2x2 window and a stride of 2 (meaning the window moves 2 steps at a time) would result in:

```
[6, 8]
[14, 16]
```

Here, the spatial dimensions are halved (from 4x4 to 2x2). Downsampling is beneficial in neural networks because it reduces computational cost, keeps the strongest response in each region, and can help reduce overfitting by producing a more generalized representation of the input.
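
The same 4x4 example can be reproduced with NumPy. This is a minimal sketch that relies on a reshape trick, which only works when the window size equals the stride and divides the input evenly. It is not how Keras implements `MaxPool2D` internally.

```python
import numpy as np

# The 4x4 feature map from the example: values 1..16, row by row.
feature_map = np.arange(1, 17).reshape(4, 4)

# Split into 2x2 blocks; axes are (block_row, row_in_block, block_col, col_in_block).
blocks = feature_map.reshape(2, 2, 2, 2)

# Keep the maximum of each block.
pooled = blocks.max(axis=(1, 3))

print(pooled)
# [[ 6  8]
#  [14 16]]
```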

For image data, the input should be specified in the form (height, width, number of colour channels).
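
As a sketch of what that means in practice, the 3x3 grayscale array from earlier can be given a channel axis, and then a batch axis, since Keras models receive batches of images while `Input(shape=...)` describes a single image. Scaling the pixel values to the range 0 to 1 is a common preprocessing step and is an assumption here, not something the notes prescribe.

```python
import numpy as np

# The 3x3 grayscale example image, shape (height, width).
image_array = np.array([
    [100, 150, 200],
    [50, 120, 180],
    [25, 75, 100],
])

# Add a trailing channel axis: (height, width) -> (height, width, 1).
with_channel = image_array[..., np.newaxis]

# Add a leading batch axis: (height, width, 1) -> (1, height, width, 1).
batch = with_channel[np.newaxis, ...]

# Optional scaling to [0, 1] (assumed preprocessing step).
batch = batch.astype("float32") / 255.0

print(with_channel.shape)  # (3, 3, 1)
print(batch.shape)         # (1, 3, 3, 1)
```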


# Blog and GitHub links used while working with MobileNet
https://medium.com/hackernoon/tf-serving-keras-mobilenetv2-632b8d92983c
https://github.com/malnakli/ML/blob/master/tf_serving_keras_mobilenetv2/main.ipynb

# Mobilenet webpage for support
https://www.tensorflow.org/api_docs/python/tf/keras/applications/mobilenet_v2/MobileNetV2