# Signature Validation AI Development Outline

---

## 1. Data Collection and Preparation

- **Data Acquisition:**
  - Obtain a dataset of signatures with labels (genuine or forged).

- **Data Preprocessing:**
  - Convert signature images into a suitable format (e.g., grayscale, standardized size).
  - Split the dataset into training and testing sets.

---

## 2. Feature Extraction

- **Image Processing Techniques:**
  - Extract features using methods like HOG, SIFT, or CNNs.
  - Normalize and preprocess extracted features.

---

## 3. Model Selection and Training

- **Choose a Model:**
  - Select a convolutional neural network (CNN).

- **Training:**
  - Train the selected model on the preprocessed training dataset.

---

## 4. Model Evaluation

- **Testing:**
  - Evaluate the model's accuracy using the testing dataset.
  - Use metrics like accuracy.

- **Fine-tuning:**
  - Adjust the model based on evaluation results.

---



### Import Everything

In [None]:
import os
import cv2
import numpy as np
from tqdm import tqdm
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import AdamW
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.layers import (BatchNormalization, Conv2D, Dense, DepthwiseConv2D, Dropout,
                                     GlobalAveragePooling2D, Input, MaxPooling2D, Reshape, multiply)


### Preprocess the image so that the model can understand it better

In [None]:
def preprocess_image(image_path):
    # Load the image in grayscale and resize it to (128, 128)
    img = load_img(image_path, color_mode='grayscale', target_size=(128, 128))
    
    # Convert the image to a numpy array of type 'uint8'
    img = img_to_array(img).astype('uint8')
    
    # Apply Gaussian Blur to the image to reduce noise
    blurred = cv2.GaussianBlur(img, (3, 3), 1)
    
    # Apply Canny edge detection to detect edges in the image
    edges = cv2.Canny(blurred, threshold1=70, threshold2=100)
    
    # Dilate the edges to thicken the detected edges
    kernel = np.ones((2, 2), np.uint8)
    edges = cv2.dilate(edges, kernel, iterations=2)
    
    # Invert the edges: detected edges should be black, and the rest should be white
    edges = cv2.bitwise_not(edges)
    
    # Normalize the image pixel values to be between 0 and 1
    edges = edges / 255.0
    
    # Add a channel dimension to the image
    edges = np.expand_dims(edges, axis=-1)
    
    return edges

### Explanation of Steps:

1. **Load and Resize Image:**
   - `img = load_img(image_path, color_mode='grayscale', target_size=(128, 128))`
     - Loads the image from `image_path` in grayscale mode and resizes it to a fixed size of 128x128 pixels.

2. **Convert to Numpy Array:**
   - `img = img_to_array(img).astype('uint8')`
     - Converts the image to a NumPy array of type `uint8`. `img_to_array` returns floats, but OpenCV's `Canny` requires 8-bit input.

3. **Apply Gaussian Blur:**
   - `blurred = cv2.GaussianBlur(img, (3, 3), 1)`
     - Applies Gaussian blur with a kernel size of (3, 3) and a sigma of 1 to reduce noise in the image.

4. **Edge Detection (Canny):**
   - `edges = cv2.Canny(blurred, threshold1=70, threshold2=100)`
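     - Detects edges with the Canny algorithm. Pixels whose gradient magnitude exceeds 100 are kept as strong edges; pixels between 70 and 100 are kept only if they connect to a strong edge (hysteresis thresholding).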

5. **Dilate Edges:**
   - `edges = cv2.dilate(edges, kernel, iterations=2)`
     - Dilates the detected edges twice with a 2x2 kernel of ones, thickening the thin Canny edges so that strokes remain visible at the low 128x128 resolution.

6. **Invert Edges:**
   - `edges = cv2.bitwise_not(edges)`
     - Inverts the edge map so that detected edges are black (0) and the background is white (255), resembling dark ink on light paper.

7. **Normalize Image:**
   - `edges = edges / 255.0`
     - Normalizes the pixel values of the image to be between 0 and 1, which is typical for neural network inputs.

8. **Expand Dimension:**
   - `edges = np.expand_dims(edges, axis=-1)`
     - Adds a channel dimension to the image to make it suitable for feeding into a convolutional neural network (CNN), where `-1` signifies the last axis.

This preprocessing function prepares the image for further feature extraction or directly as input to a CNN model for signature verification tasks.
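The effect of steps 6 to 8 on pixel values can be checked without any image file. The sketch below runs them on a tiny synthetic edge map using NumPy only; `np.bitwise_not` stands in for `cv2.bitwise_not`, which gives the same result on `uint8` arrays. The 4x4 array is made-up illustration data.

```python
import numpy as np

# Toy 4x4 edge map in the uint8 format cv2.Canny produces:
# 255 marks an edge pixel, 0 marks background (illustrative data only).
edges = np.zeros((4, 4), dtype=np.uint8)
edges[1, :] = 255  # one horizontal edge

# Step 6: invert; for uint8 this equals 255 - x, just like cv2.bitwise_not
inverted = np.bitwise_not(edges)

# Step 7: scale to [0, 1]
normalized = inverted / 255.0

# Step 8: add the trailing channel axis expected by Conv2D
model_input = np.expand_dims(normalized, axis=-1)

print(model_input.shape)
print(normalized[1, 0], normalized[0, 0])  # edge pixel vs. background pixel
```

Edge pixels end up at 0.0 (black) and background pixels at 1.0 (white), matching the description of steps 6 and 7.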

### Load images from folder

In [None]:
def load_images_from_folder(folder):
    images = []  # Initialize an empty list to store processed images
    for filename in tqdm(os.listdir(folder)):  # Iterate through each file in the folder
        img_path = os.path.join(folder, filename)  # Construct the full path to the image file
        if os.path.isfile(img_path):  # Check if the path leads to a file (not a directory)
            processed_img = preprocess_image(img_path)  # Preprocess the image using a predefined function
            images.append(processed_img)  # Append the processed image to the list of images
    return np.array(images)  # Convert the list of processed images into a numpy array and return it

### Explanation:

1. **Iterate through Files in Folder:**
   - `for filename in tqdm(os.listdir(folder)):`  
     - Iterates through each file in the `folder` directory using `os.listdir()`. The `tqdm` wrapper is used to display a progress bar, which is helpful when processing a large number of images.

2. **Skip Non-File Entries:**
   - `if os.path.isfile(img_path):`
     - Ensures that `img_path` points to a file and not a directory. This check prevents errors that could occur if `filename` refers to a subdirectory or invalid path.

3. **Preprocess and Append Images:**
   - `processed_img = preprocess_image(img_path)`
     - Calls the `preprocess_image` function defined earlier to preprocess each image before appending it to the `images` list.

4. **Convert to Numpy Array:**
   - `return np.array(images)`
     - Converts the list of processed images into a numpy array before returning it. This conversion is typically necessary for compatibility with neural network frameworks like Keras or TensorFlow.

### Notes:
- Run the cell that defines `preprocess_image` before calling `load_images_from_folder`; Python looks up the name when the function is called.
- Consider handling exceptions that may arise during image loading or preprocessing (for example, corrupt or non-image files in the folder) so that one bad file does not stop the whole run.
- Adapt the function to your needs, for example by adding logging or changing the preprocessing steps.

This function will load and preprocess images from a specified folder, making them ready for further use in machine learning models or other applications.
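Following the notes above, here is a minimal sketch of a more defensive loader. `load_images_safely`, the `IMAGE_EXTENSIONS` set and the `(images, skipped)` return value are hypothetical choices for illustration, not part of the original notebook; you would pass it the existing `preprocess_image` as `preprocess_fn`.

```python
import os
import numpy as np

# Hypothetical set of accepted extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}

def load_images_safely(folder, preprocess_fn, extensions=IMAGE_EXTENSIONS):
    """Hypothetical variant of load_images_from_folder with basic error handling.

    Returns (images, skipped): a NumPy array of preprocessed images and a list
    of (filename, reason) pairs for files that were not loaded.
    """
    images, skipped = [], []
    for filename in sorted(os.listdir(folder)):  # sorted for a reproducible order
        path = os.path.join(folder, filename)
        if not os.path.isfile(path):
            continue  # ignore subdirectories
        if os.path.splitext(filename)[1].lower() not in extensions:
            skipped.append((filename, "unsupported extension"))
            continue
        try:
            images.append(preprocess_fn(path))
        except Exception as exc:  # e.g. a corrupt or unreadable image
            skipped.append((filename, str(exc)))
    return np.array(images), skipped
```

For example, `images, skipped = load_images_safely('sign_data/train/001', preprocess_image)` returns the preprocessed images plus a list explaining any files that were left out. Sorting the file names also makes the image order reproducible, which `os.listdir` alone does not guarantee.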

In [None]:
def create_dataset(base_path):
    X = []  # Initialize an empty list to store images
    y = []  # Initialize an empty list to store labels
    for person_id in range(1, 65):  # Assuming there are 64 persons in total
        if person_id in [5, 7, 8, 10, 11]:  # Skip specific person IDs
            continue
        
        person_id_str = str(person_id).zfill(3)  # Format person_id as 3-digit string (e.g., '001')
        
        real_folder = os.path.join(base_path, person_id_str)  # Path to genuine signatures folder
        forge_folder = os.path.join(base_path, f"{person_id_str}_forg")  # Path to forged signatures folder

        real_images = load_images_from_folder(real_folder)  # Load genuine signature images
        forge_images = load_images_from_folder(forge_folder)  # Load forged signature images

        for img in real_images:
            X.append(img)
            y.append(0)  # Label 0 for genuine signatures

        for img in forge_images:
            X.append(img)
            y.append(1)  # Label 1 for forged signatures

    X = np.array(X)  # Convert list of images to numpy array
    y = np.array(y)  # Convert list of labels to numpy array
    
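    # Note: preprocess_image already returns values in [0, 1], so this extra division
    # rescales them to [0, 1/255]. preprocess_image1 applies the same extra scaling at
    # inference time, so remove both divisions together or neither.
    X = X / 255.0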
    X = X.reshape(-1, 128, 128, 1)  # Reshape images for CNN input (assuming 128x128 grayscale images)
    
    return X, y

base_path = 'sign_data/train'  # Base path where genuine and forged signature folders are located
X, y = create_dataset(base_path)  # Create the dataset


### Explanation with Comments:

1. **Initialize Lists for Images and Labels:**
   - `X = []` and `y = []`
     - Initialize empty lists `X` to store images and `y` to store corresponding labels.

2. **Iterate through Person IDs:**
   - `for person_id in range(1, 65):`
     - Loop through each person ID from 1 to 64, assuming each person has a folder of genuine signatures named by the zero-padded ID (e.g. `001`) and a folder of forged signatures with a `_forg` suffix (e.g. `001_forg`).

3. **Skip Specific Person IDs:**
   - `if person_id in [5, 7, 8, 10, 11]: continue`
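     - Skips person IDs 5, 7, 8, 10 and 11. Every other ID in the range needs both its genuine and its forged folder under `base_path`; otherwise `os.listdir` raises `FileNotFoundError`.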

4. **Format Person ID:**
   - `person_id_str = str(person_id).zfill(3)`
     - Format the person ID as a 3-digit string (e.g., '001', '002') using `.zfill(3)`.

5. **Construct Paths to Signature Folders:**
   - `real_folder = os.path.join(base_path, person_id_str)`
     - Construct the path to the folder containing genuine signatures for the current person.
   - `forge_folder = os.path.join(base_path, f"{person_id_str}_forg")`
     - Construct the path to the folder containing forged signatures for the current person.

6. **Load Images from Folders:**
   - `real_images = load_images_from_folder(real_folder)`
     - Load genuine signature images from `real_folder` using the `load_images_from_folder` function.
   - `forge_images = load_images_from_folder(forge_folder)`
     - Load forged signature images from `forge_folder` using the `load_images_from_folder` function.

7. **Label Assignment:**
   - For each image loaded:
     - Append the image (`img`) to `X`.
     - Append the corresponding label (`0` for genuine, `1` for forged) to `y`.

8. **Convert to Numpy Arrays:**
   - `X = np.array(X)` and `y = np.array(y)`
     - Convert lists `X` and `y` into numpy arrays for efficient handling in machine learning frameworks.

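9. **Rescaling and Reshaping:**
   - `X = X / 255.0`
     - Divides the pixel values by 255 a second time. Because `preprocess_image` already scales them to [0, 1], the result lies in [0, 1/255]. `preprocess_image1` repeats this extra division at inference time, so training and inference inputs stay consistent.
   - `X = X.reshape(-1, 128, 128, 1)`
     - Reshapes the images for CNN input (128x128 pixels, one grayscale channel); `-1` lets NumPy infer the number of samples. Since `preprocess_image` already returns arrays of shape (128, 128, 1), this acts as a safeguard rather than a change.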

10. **Return Dataset:**
    - `return X, y`
      - Return the preprocessed images (`X`) and corresponding labels (`y`) as the final dataset.

### Additional Notes:

- **Progress Bar (tqdm):** `load_images_from_folder` wraps its file loop in `tqdm`, so a progress bar is shown for each folder. It is optional and can be removed without changing the result.

- **Error Handling:** Consider adding error handling within the `create_dataset` function or within the `load_images_from_folder` function to manage potential errors that may occur during image loading or preprocessing.

- **Adjustments:** Depending on specific requirements (e.g., different image sizes, additional preprocessing steps), you may need to modify the function accordingly.

This function is designed to create a dataset suitable for training a machine learning model for signature verification, combining genuine and forged signature images with appropriate labels. Adjust paths and parameters as per your dataset structure and project needs.
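Before training, it is worth confirming the class balance and the actual pixel range of `X`. A quick check like the hypothetical `summarize_dataset` below would, for example, reveal that the extra `X / 255.0` in `create_dataset` leaves the maximum pixel value at 1/255 rather than 1. The toy arrays only stand in for the real `X` and `y`.

```python
import numpy as np

def summarize_dataset(X, y):
    """Hypothetical helper: class counts and pixel range of a dataset built by create_dataset."""
    counts = np.bincount(y, minlength=2)  # index 0 = genuine, index 1 = forged
    return {
        "n_samples": len(y),
        "genuine": int(counts[0]),
        "forged": int(counts[1]),
        "pixel_min": float(X.min()),
        "pixel_max": float(X.max()),
    }

# Toy stand-in data: values in [0, 1] as returned by preprocess_image,
# then divided by 255 again as in create_dataset.
X_demo = np.ones((4, 128, 128, 1)) / 255.0
y_demo = np.array([0, 0, 0, 1])
print(summarize_dataset(X_demo, y_demo))
```

On the real data you would call `summarize_dataset(X, y)` right after `create_dataset`.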

In [None]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
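Later, `model.fit` uses `X_test`, `y_test` as validation data, and the same arrays serve for the final evaluation. A sketch of a stratified three-way split, assuming scikit-learn's `train_test_split` with its `stratify` argument, keeps a separate validation set and preserves the genuine/forged ratio in every part. The function name `three_way_split` and the 10%/10% sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def three_way_split(X, y, val_size=0.1, test_size=0.1, seed=42):
    """Hypothetical helper: train/validation/test split that preserves the class ratio.

    The validation set is for monitoring during model.fit; the test set stays
    untouched until the final model.evaluate call.
    """
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    # Convert val_size to a fraction of the remaining data, not of the full dataset
    val_fraction = val_size / (1.0 - test_size)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=val_fraction, stratify=y_rest, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test
```

You would then pass `validation_data=(X_val, y_val)` to `model.fit` and keep `X_test`, `y_test` for the final `model.evaluate`. Note that this is still a per-image split: signatures from the same person can land in both the training and the test set.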

### Display some sample images to see how the preprocessing works

In [None]:

# Display a single image before and after preprocessing
def display_image_before_after(file_path):
    img = cv2.imread(file_path, cv2.IMREAD_GRAYSCALE)
    if img is None:  # cv2.imread returns None instead of raising when a file cannot be read
        raise FileNotFoundError(f"Could not read image: {file_path}")
    img_resized = cv2.resize(img, (128, 128))
    processed_img = preprocess_image(file_path)

    plt.figure(figsize=(10, 5))
    plt.subplot(1, 2, 1)
    plt.title('Original Image')
    plt.imshow(img_resized, cmap='gray')
    plt.subplot(1, 2, 2)
    plt.title('Processed Image')
    plt.imshow(processed_img.squeeze(), cmap='gray')  # drop the channel axis: (128, 128, 1) -> (128, 128)
    plt.show()

# Example images (provide valid image paths here)
example_image_paths = [
    'WhatsApp Image 2024-07-10 at 10.49.40 PM.jpeg',
    'img1.jpeg',
    'sign_data/test/049/08_049.png',
    'sign_data/train/001_forg/0201001_01.png',
]
for example_image_path in example_image_paths:
    display_image_before_after(example_image_path)

## Now the Actual AI Development Starts

In [None]:
def squeeze_excite_block(input_tensor, ratio=16):
    init = input_tensor
    channel_axis = -1
    filters = init.shape[channel_axis]

    se = GlobalAveragePooling2D()(init)
    se = Reshape((1, 1, filters))(se)
    se = Dense(filters // ratio, activation='relu', kernel_initializer='he_normal', use_bias=False)(se)
    se = Dense(filters, activation='sigmoid', kernel_initializer='he_normal', use_bias=False)(se)

    x = multiply([init, se])
    return x

def depthwise_separable_conv_block(x, filters, kernel_size=(3, 3), stride=1):
    x = DepthwiseConv2D(kernel_size, padding='same', strides=stride, activation='relu')(x)
    x = Conv2D(filters, (1, 1), activation='relu', padding='same')(x)
    x = BatchNormalization()(x)
    x = squeeze_excite_block(x)
    return x

def build_smaller_model(input_shape=(128, 128, 1)):
    inputs = Input(shape=input_shape)
    
    # Initial convolutional block
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
    x = BatchNormalization()(x)
    x = MaxPooling2D((2, 2))(x)
    
    # Depthwise separable convolution blocks
    x = depthwise_separable_conv_block(x, 64)
    x = MaxPooling2D((2, 2))(x)
    
    x = depthwise_separable_conv_block(x, 128)
    x = MaxPooling2D((2, 2))(x)
    
    x = depthwise_separable_conv_block(x, 256)
    x = MaxPooling2D((2, 2))(x)
    
    x = depthwise_separable_conv_block(x, 512)
    x = MaxPooling2D((2, 2))(x)

    x = depthwise_separable_conv_block(x, 1024)
    x = MaxPooling2D((2, 2))(x)

    # Global Average Pooling and final dense layers
    x = GlobalAveragePooling2D()(x)
    x = Dropout(0.2)(x)
    outputs = Dense(1, activation='sigmoid')(x)
    
    model = Model(inputs, outputs)
    
    optimizer = AdamW(learning_rate=0.0001)
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

def lr_schedule(epoch, lr):
    if epoch < 10:
        return lr
    else:
        return float(lr * tf.math.exp(-0.1))

model = build_smaller_model()
# Callbacks
callbacks = [
    LearningRateScheduler(lr_schedule),
]

# Summary of the model
model.summary()

### Explanation:

1. **Squeeze and Excite Block (`squeeze_excite_block`):**
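   - Implements a squeeze-and-excite block, which learns a weight between 0 and 1 for every channel and rescales the feature map with it, amplifying informative channels and suppressing less relevant ones.
   - `GlobalAveragePooling2D()` computes the average value of each channel across the spatial dimensions (the "squeeze").
   - `Reshape((1, 1, filters))(se)` reshapes the pooled vector to shape (1, 1, filters) so it can be broadcast over the spatial dimensions.
   - Two `Dense` layers form a bottleneck: the first (ReLU) reduces the channel count to `filters // ratio`, and the second (sigmoid) restores it to `filters`, producing one weight per channel. `multiply` then scales each input channel by its weight (the "excite").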

2. **Depthwise Separable Convolution Block (`depthwise_separable_conv_block`):**
   - Uses depthwise separable convolution, which needs far fewer parameters and multiply-adds than a standard convolution with the same kernel size and channel counts.
   - `DepthwiseConv2D` applies one spatial filter to each input channel independently, and a 1x1 (pointwise) `Conv2D` then mixes the channels and sets the output channel count.
   - `BatchNormalization` normalizes the output of the pointwise convolution, which typically stabilizes and speeds up training.
   - `squeeze_excite_block` enhances the feature representation learned by convolution layers.

3. **Model Architecture (`build_smaller_model`):**
   - Begins with a standard convolutional layer followed by batch normalization and max pooling.
   - Applies five depthwise separable convolution blocks (64, 128, 256, 512 and 1024 filters), each followed by 2x2 max pooling, so the 128x128 input ends up as 2x2 feature maps.
   - Concludes with global average pooling, which averages each channel over its spatial positions and yields one value per channel.
   - Adds dropout (rate 0.2) to reduce overfitting and a final dense layer with sigmoid activation that outputs the probability that the signature is forged (label 1).

4. **Optimizer and Compilation:**
   - Uses `AdamW` optimizer with a specified learning rate (`0.0001`).
   - Compiles the model with binary cross-entropy loss (suitable for binary classification tasks) and accuracy metric.

5. **Learning Rate Scheduler (`lr_schedule`):**
   - Adjusts the learning rate at the start of each epoch.
   - Keeps the initial rate (`0.0001`) for epochs 0 to 9. From epoch 10 on, it multiplies the current rate by e^(-0.1), about 0.905, every epoch, so epoch n >= 10 uses 0.0001 * e^(-0.1 * (n - 9)).

6. **Callbacks:**
   - `LearningRateScheduler` is a callback that adjusts the learning rate according to the defined `lr_schedule` function during training.

7. **Model Summary:**
   - `model.summary()` prints a concise summary of the model architecture, displaying layer types, output shapes, and number of parameters.
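To make the squeeze-and-excite arithmetic from item 1 concrete, the sketch below reproduces the computation of `squeeze_excite_block` for a single feature map in plain NumPy. The random weights stand in for trained `Dense` kernels, and `squeeze_excite_numpy` is a hypothetical name; as in the Keras block, neither projection has a bias.

```python
import numpy as np

def squeeze_excite_numpy(feature_map, w1, w2):
    """Hypothetical NumPy sketch of squeeze-and-excite for one (H, W, C) feature map.

    w1: (C, C // ratio) kernel of the ReLU bottleneck Dense layer (no bias)
    w2: (C // ratio, C) kernel of the sigmoid Dense layer (no bias)
    """
    squeezed = feature_map.mean(axis=(0, 1))        # squeeze: GlobalAveragePooling2D -> (C,)
    hidden = np.maximum(squeezed @ w1, 0.0)         # Dense(C // ratio, activation='relu')
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # Dense(C, activation='sigmoid'): one weight per channel
    return feature_map * weights                    # excite: broadcast over H and W, like multiply()

rng = np.random.default_rng(0)
C, ratio = 64, 16
x = rng.standard_normal((8, 8, C))
out = squeeze_excite_numpy(x, rng.standard_normal((C, C // ratio)), rng.standard_normal((C // ratio, C)))
print(out.shape)
```

Every spatial position of a given channel is multiplied by the same weight in (0, 1), which is what "reweighting channels" means here.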

### Additional Considerations:

- **Data Input:** Ensure that input data (`X`) matches the expected shape (`input_shape`) defined in `build_smaller_model`.
- **Training and Evaluation:** Implement the training loop using appropriate training data (`X_train, y_train`) and evaluation metrics to monitor model performance.
- **Hyperparameters:** Fine-tune hyperparameters such as learning rate, dropout rate, and model depth based on validation performance.
- **Deployment:** Consider requirements for deployment, such as model serialization (`model.save`) and inference implementation (`model.predict`).

Depthwise separable convolutions keep the parameter count low, and the squeeze-and-excite blocks let the network reweight channels at little extra cost. Whether this architecture suits your signatures should be judged on validation results; adjust the architecture and parameters to the characteristics of your data.
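The parameter saving of depthwise separable convolutions (item 2 of the explanation) can be worked out by hand. The helpers below are illustrative and assume Keras defaults, where both `DepthwiseConv2D` and `Conv2D` include a bias term; they count the weights for the 256 to 512 channel block in `build_smaller_model`.

```python
def standard_conv_params(k, c_in, c_out, bias=True):
    """Illustrative helper: weights in a k x k Conv2D mapping c_in -> c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def depthwise_separable_params(k, c_in, c_out, bias=True):
    """Illustrative helper: DepthwiseConv2D (one k x k filter per input channel) + 1x1 Conv2D."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

# The 256 -> 512 channel block of build_smaller_model with 3x3 kernels
print(standard_conv_params(3, 256, 512))        # 1180160
print(depthwise_separable_params(3, 256, 512))  # 134144
```

For this layer the depthwise separable version has 134,144 parameters versus 1,180,160 for a standard 3x3 convolution, roughly 8.8 times fewer.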

In [None]:
# Train the model
history = model.fit(X_train, y_train, batch_size=32, epochs=50, validation_data=(X_test, y_test), callbacks=callbacks)
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc}")

### Explanation:

1. **Training (`model.fit`):**
   - `model.fit(X_train, y_train, batch_size=32, epochs=50, validation_data=(X_test, y_test), callbacks=callbacks)`
     - Trains the model on training data (`X_train`, `y_train`) for 50 epochs with a batch size of 32.
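     - Uses `X_test`, `y_test` as validation data to report the model's performance after each epoch. Because the same data is used for the final evaluation below, the test accuracy is not an independent estimate if these per-epoch results guide choices such as the number of epochs.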
     - `callbacks` include the `LearningRateScheduler` defined earlier, which adjusts the learning rate during training epochs.

2. **Evaluation (`model.evaluate`):**
   - `test_loss, test_acc = model.evaluate(X_test, y_test)`
     - Evaluates the trained model on the test data (`X_test`, `y_test`).
     - Computes and returns the test loss (`test_loss`) and test accuracy (`test_acc`).

3. **Printing Test Accuracy:**
   - `print(f"Test Accuracy: {test_acc}")`
     - Prints the test accuracy after evaluating the model on the test set.
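Because `LearningRateScheduler` feeds the current rate back into `lr_schedule` at the start of every epoch, the decay compounds. The sketch below replays that logic in plain Python (no TensorFlow) for the 50-epoch run with the initial rate of `0.0001`; `lr_schedule_values` is a hypothetical helper.

```python
import math

def lr_schedule_values(initial_lr=1e-4, epochs=50, hold=10, decay=0.1):
    """Hypothetical helper replaying lr_schedule: keep the rate for `hold` epochs,
    then multiply the current rate by exp(-decay) at the start of each later epoch."""
    lrs, lr = [], initial_lr
    for epoch in range(epochs):
        if epoch >= hold:
            lr = lr * math.exp(-decay)
        lrs.append(lr)
    return lrs

lrs = lr_schedule_values()
print(f"epoch 0: {lrs[0]:.2e}, epoch 10: {lrs[10]:.2e}, epoch 49: {lrs[49]:.2e}")
```

The rate stays at 1e-4 for epochs 0 to 9, drops to about 9.05e-5 at epoch 10, and reaches 1e-4 * e^(-4), about 1.83e-6, at epoch 49.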

### Additional Considerations:

- **Data Preparation:** Ensure `X_train`, `y_train`, `X_test`, and `y_test` are properly prepared and normalized as required by the model.
  
- **Callbacks:** Adjust callbacks (`LearningRateScheduler`, etc.) based on training requirements and model performance.

- **Hyperparameters:** Fine-tune hyperparameters such as batch size, epochs, learning rate, and model architecture based on validation results to achieve optimal performance.

- **Monitoring Training Progress:** Utilize `history` object returned by `model.fit` to visualize training metrics (e.g., loss, accuracy) over epochs for both training and validation sets.

- **Model Saving:** After training, consider saving the trained model (`model.save`) for future deployment or further analysis.
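`history.history` is a plain dictionary that maps each metric name (with `metrics=['accuracy']`, typically `loss`, `accuracy`, `val_loss` and `val_accuracy`) to a list with one value per epoch. Besides plotting those lists with Matplotlib, a small hypothetical helper like `best_epoch` can report where validation performance peaked.

```python
def best_epoch(history_dict, metric="val_loss", mode="min"):
    """Hypothetical helper: (epoch_number, value) of the best entry for `metric`
    in a Keras History.history dict. Epoch numbers are 1-based, like the Keras log."""
    values = history_dict[metric]
    pick = min if mode == "min" else max
    best_index = values.index(pick(values))
    return best_index + 1, values[best_index]

# Usage with the real run (assumes `history` returned by model.fit above):
# epoch, value = best_epoch(history.history, "val_accuracy", mode="max")
```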

Iterate on these settings based on validation performance and your project's requirements. Keep in mind that a high accuracy on this random per-image split does not by itself show that the model will generalize to signatures from people or scanning conditions it has not seen.
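The outline lists accuracy as the evaluation metric. In signature verification the two kinds of error usually carry different costs, so it is common to also report the false acceptance rate (forgeries accepted as genuine) and the false rejection rate (genuine signatures rejected). A minimal sketch, assuming the 0 = genuine / 1 = forged labels from `create_dataset` and a 0.5 decision threshold; `verification_rates` is a hypothetical helper.

```python
import numpy as np

def verification_rates(y_true, forged_prob, threshold=0.5):
    """Hypothetical helper: accuracy, false acceptance rate (FAR), false rejection rate (FRR).

    Labels follow create_dataset: 0 = genuine, 1 = forged.
    A signature is predicted forged when its probability is >= threshold.
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = (np.asarray(forged_prob).ravel() >= threshold).astype(int)
    forged, genuine = y_true == 1, y_true == 0
    far = float(np.mean(y_pred[forged] == 0)) if forged.any() else float("nan")   # forgeries accepted
    frr = float(np.mean(y_pred[genuine] == 1)) if genuine.any() else float("nan")  # genuine rejected
    return {"accuracy": float(np.mean(y_pred == y_true)), "FAR": far, "FRR": frr}

# Usage with the trained model (assumes model, X_test and y_test from above):
# rates = verification_rates(y_test, model.predict(X_test))
```

Raising the threshold lowers FRR at the cost of a higher FAR, so the threshold is best chosen on validation data.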

## Development Is Done: Saving the Model for Future Use

In [None]:
# Save the model
model.save('enhanced_signature_verification_model3_4.keras')

#### Keras can save a model in several formats; here the native `.keras` format is used, which is the recommended format in recent Keras versions.

### Now the Actual Job Begins

#### The task is to take real-world data and test the model's performance

In [None]:
from tensorflow.keras.models import load_model
import tensorflow as tf
import numpy as np
from tensorflow.keras.preprocessing.image import load_img

models = []
models.append(load_model("enhanced_signature_verification_model3_4.keras"))  # Load the model

In [None]:
import numpy as np

def preprocess_image1(image_path):
    # Preprocess the image using the existing preprocess_image function (output already in [0, 1])
    processed_img = preprocess_image(image_path)

    # Divide by 255 again to match the extra scaling applied in create_dataset,
    # so inference inputs have the same range as the training data
    processed_img = processed_img / 255.0

    # Add a batch dimension to match the model input shape (1, 128, 128, 1)
    processed_img = processed_img.reshape(1, 128, 128, 1)

    return processed_img

def predict_signature(model, real_signature_path, test_signature_path):
    # Preprocess real and test signatures
    real_img = preprocess_image1(real_signature_path)
    test_img = preprocess_image1(test_signature_path)

    # Predict the forgery probability (label 1 = forged) of each image independently
    real_pred = model.predict(real_img)
    test_pred = model.predict(test_img)

    # Absolute difference between the two forgery probabilities;
    # this compares model scores, not the two signatures with each other
    difference = np.abs(real_pred - test_pred)

    # Express the difference in percentage points (0 to 100) as a plain float
    difference_scaled = float(difference[0, 0] * 100)

    return difference_scaled


In [None]:
# Example usage
real_signature_path = 'img1.jpeg'  # Provide a valid image path of real image
test_signature_path = 'img2.jpeg'  # Provide a valid image path of test image

In [None]:
for i,model in enumerate(models):
    difference = predict_signature(model, real_signature_path, test_signature_path) # use the model to get the test result
    print(f"Difference: {difference} by model {i + 1}\n")
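`predict_signature` returns the gap between two forgery probabilities, each computed by the model for one image on its own. It does not compare the two signatures with each other, so two forgeries can also produce a small difference. To read the model's verdict on a single test image directly, its output can be thresholded. The helper name and the 0.5 threshold below are assumptions to be tuned on validation data.

```python
def interpret_forgery_probability(prob, threshold=0.5):
    """Hypothetical helper mapping the model's sigmoid output to a label.

    The model was trained with 0 = genuine and 1 = forged, so its output is a
    forgery probability for ONE image. The 0.5 threshold is an assumption;
    tune it on validation data (e.g. to balance FAR and FRR).
    """
    return "forged" if prob >= threshold else "genuine"

# Usage (assumes models, preprocess_image1 and test_signature_path from above):
# prob = float(models[0].predict(preprocess_image1(test_signature_path), verbose=0)[0, 0])
# print(interpret_forgery_probability(prob))
```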

## 🚀 Development and testing are complete! You can now integrate it into real-world websites and apps for signature testing.
# Happy coding! 🎉