# Signature Detection by Reading Pixels

* Author: docai-incubator@google.com


## Disclaimer

This tool is not supported by the Google engineering team or product team. It is provided and supported on a best-effort basis by the DocAI Incubator Team. No guarantees of performance are implied.

## Purpose and Description
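This document describes how to detect whether a signature is present in a document, given the normalized bounding box coordinates of the region where the signature is expected. The tool crops that region from the page image, converts it to a binary (black and white) image, and counts the black pixels.
When calling the function, set two values: (a) `blank_line_pixel_count`, the black-pixel count of a blank signature field, and (b) `signature_threshold_pixel_count`, the minimum black-pixel count for a field to be treated as signed.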


## Prerequisites

1. Access to a Vertex AI Notebook or Google Colab
2. Python
3. Python libraries such as cv2, PIL, base64, io, and numpy

## Step-by-Step Procedure

### 1. Import the required Libraries

In [None]:
%pip install google-cloud-documentai google-cloud-storage

In [None]:
!wget https://raw.githubusercontent.com/GoogleCloudPlatform/document-ai-samples/main/incubator-tools/best-practices/utilities/utilities.py

In [None]:
import io
from io import BytesIO
from typing import Dict
import os
import base64
from PIL import Image
import numpy as np
import json
import cv2
import PIL
from utilities import documentai_json_proto_downloader, file_names
from google.cloud import documentai_v1beta3 as documentai

### 2. Input details

<ul>
    <li><b>INPUT_PATH: </b>GCS path of the DocAI processed output JSON file, in the form <code>gs://{bucket_name}/{folder_path}/{file_name}.json</code>.</li>
    <li><b>normalized_vertices: </b>The 4 normalized coordinates (each with <code>x</code> and <code>y</code>) of the region where the signature entity is expected to be present.</li>
    <li><b>page_number: </b>Zero-based index of the page where the signature is expected to be present.</li>
    <li><b>blank_line_pixel_count: </b>Total count of black pixels in a blank signature field, measured on the binary (black and white) image.</li>
    <li><b>signature_threshold_pixel_count: </b>Threshold count of black pixels, measured on the binary (black and white) image, above which the signature field is considered to contain a signature.</li>
</ul>
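
The relationship between normalized vertices and pixel coordinates can be sketched as follows. This is an illustration only: `to_pixel_box` is a hypothetical helper, and the page size of 1000 x 2000 pixels is an assumed value. The function in this document applies the same scaling and then pads the box by a few pixels.

```python
from typing import Dict, List, Tuple


def to_pixel_box(
    normalized_vertices: List[Dict[str, float]],
    img_width: int,
    img_height: int,
) -> Tuple[float, float, float, float]:
    """Scale normalized vertices (0..1) to a (left, top, right, bottom) pixel box."""
    xs = [vertex["x"] for vertex in normalized_vertices]
    ys = [vertex["y"] for vertex in normalized_vertices]
    return (
        min(xs) * img_width,
        min(ys) * img_height,
        max(xs) * img_width,
        max(ys) * img_height,
    )


# Hypothetical field covering the middle of an assumed 1000 x 2000 pixel page
vertices = [
    {"x": 0.25, "y": 0.5},
    {"x": 0.5, "y": 0.5},
    {"x": 0.5, "y": 0.75},
    {"x": 0.25, "y": 0.75},
]
box = to_pixel_box(vertices, img_width=1000, img_height=2000)
print(box)  # (250.0, 1000.0, 500.0, 1500.0)
```

Taking the minimum and maximum of the coordinates means the order of the four vertices does not matter.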

In [None]:
def signature_detection(
    document_proto: documentai.Document,
    normalized_vertices: list[Dict[str, float]],
    page_number: int,
    blank_line_pixel_count: int = 600,
    signature_threshold_pixel_count: int = 1000,
) -> bool:
    """
    Detects signatures within a document.

    Args:
        document_proto (documentai.Document): Document AI proto object.
        normalized_vertices (list[Dict[str, float]]): Normalized vertices ({"x": ..., "y": ...})
            of the bounding box where the signature is expected.
        page_number (int): Zero-based index of the page to process.
        blank_line_pixel_count (int): Black-pixel count of a blank signature field. Default is 600.
        signature_threshold_pixel_count (int): Minimum black-pixel count for a signed field. Default is 1000.

    Returns:
        bool: True if signature is detected, False otherwise.
    """

    bounding_box = normalized_vertices

    # Getting the height & width of the page
    img_height = document_proto.pages[page_number].image.height
    img_width = document_proto.pages[page_number].image.width

    x = [i["x"] for i in bounding_box]
    y = [i["y"] for i in bounding_box]

    # Convert to pixels and pad the box slightly. Clamp to the page bounds,
    # because PIL fills any cropped area outside the image with black pixels,
    # which would inflate the black-pixel count.
    left = max(min(x) * img_width - 1, 0)
    top = max(min(y) * img_height - 2, 0)
    right = min(max(x) * img_width + 5, img_width)
    bottom = min(max(y) * img_height + 17, img_height)

    # Setting up the bounding box coordinates to crop the image to only the signature part.
    bounding_box_coordinates = (left, top, right, bottom)

    # Fetching the page image bytes (base64 in the JSON, already decoded to bytes in the proto)
    content = document_proto.pages[page_number].image.content

    image = Image.open(io.BytesIO(content))

    # Cropping the image where signature is present
    cropped_image = image.crop(bounding_box_coordinates)

    # Convert the cropped image to a grayscale array in memory (no temporary file needed)
    cropped_img = np.array(cropped_image.convert("L"))

    # Apply binary thresholding: pixels above 127 become white (255), the rest black (0)
    _, cropped_bw_image = cv2.threshold(cropped_img, 127, 255, cv2.THRESH_BINARY)

    # Count the total black & white pixels in the cropped image part
    pixel_value, occurrence = np.unique(cropped_bw_image, return_counts=True)
    pixel_counts = dict(zip(pixel_value, occurrence))

    cropped_black_pixel = pixel_counts.get(0, 0)  # Count of black pixels

    # Logic to determine if the cropped part contains a signature
    if (
        cropped_black_pixel > blank_line_pixel_count
        and cropped_black_pixel > signature_threshold_pixel_count
    ):
        print("Signature Detected")
        return True
    else:
        print("No Signature Detected")
        return False
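
To choose values for `blank_line_pixel_count` and `signature_threshold_pixel_count`, it can help to measure the black-pixel count of a field known to be blank and of one known to be signed. The following is a minimal sketch under stated assumptions: `count_black_pixels` is a hypothetical helper that is not part of the tool, and the two arrays are synthetic stand-ins for grayscale crops taken from your own documents.

```python
import numpy as np


def count_black_pixels(gray_image: np.ndarray, threshold: int = 127) -> int:
    """Count pixels that binarize to black, i.e. grayscale values <= threshold."""
    return int(np.count_nonzero(gray_image <= threshold))


# Synthetic blank field: white background with a printed line 2 pixels high
blank_field = np.full((100, 300), 255, dtype=np.uint8)
blank_field[90:92, :] = 0  # 2 rows x 300 columns = 600 black pixels

# Synthetic signed field: the same line plus a dark 20 x 50 block of "ink"
signed_field = blank_field.copy()
signed_field[40:60, 100:150] = 0  # 1000 additional black pixels

blank_count = count_black_pixels(blank_field)
signed_count = count_black_pixels(signed_field)
print(blank_count, signed_count)  # 600 1600
```

With these synthetic counts, the default values of 600 and 1000 classify the first field as blank and the second as signed. Real scans vary with resolution and line thickness, so the counts should be measured on representative samples.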

In [None]:
# INPUT: GCS path of the Document AI processed output JSON file
INPUT_PATH = "gs://{bucket_name}/{folder_path}/{file_name}.json"
normalized_vertices = [
    {"x": 0.33105803, "y": 0.7846154},
    {"x": 0.41695109, "y": 0.7846154},
    {"x": 0.41695109, "y": 0.80000001},
    {"x": 0.33105803, "y": 0.80000001},
]
# Split the GCS path into the bucket name and the file path inside the bucket
input_bucket_name = INPUT_PATH.split("/")[2]
file_name = "/".join(INPUT_PATH.split("/")[3:])
document_proto = documentai_json_proto_downloader(input_bucket_name, file_name)
# Call the function for the first page (index 0)
signature_detection(document_proto, normalized_vertices, 0)
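
Instead of hard-coding the vertices, they could be read from an entity in the Document AI output. The sketch below works on the output JSON loaded as a plain dictionary and assumes the camelCase field names (`pageAnchor`, `pageRefs`, `boundingPoly`, `normalizedVertices`) used in that JSON. The helper `find_entity_box` and the entity type `"signature"` are hypothetical, so substitute the type defined in your own processor schema.

```python
from typing import Dict, List, Optional, Tuple


def find_entity_box(
    document: Dict, entity_type: str
) -> Optional[Tuple[int, List[Dict[str, float]]]]:
    """Return (page_index, normalized_vertices) of the first matching entity, or None."""
    for entity in document.get("entities", []):
        if entity.get("type") != entity_type:
            continue
        for page_ref in entity.get("pageAnchor", {}).get("pageRefs", []):
            vertices = page_ref.get("boundingPoly", {}).get("normalizedVertices")
            if vertices:
                # "page" may be missing from the JSON when the entity is on page 0
                return int(page_ref.get("page", 0)), vertices
    return None


# Hypothetical, heavily trimmed Document AI output
sample_document = {
    "entities": [
        {
            "type": "signature",
            "pageAnchor": {
                "pageRefs": [
                    {
                        "boundingPoly": {
                            "normalizedVertices": [
                                {"x": 0.25, "y": 0.5},
                                {"x": 0.5, "y": 0.5},
                                {"x": 0.5, "y": 0.75},
                                {"x": 0.25, "y": 0.75},
                            ]
                        }
                    }
                ]
            },
        }
    ]
}

result = find_entity_box(sample_document, "signature")
```

If a match is found, `result` holds the page index and the vertices in the same form that `signature_detection` expects for `page_number` and `normalized_vertices`.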

### 3. Output

Running the code above prints whether the region defined by the bounding box coordinates contains a signature.

<table>
    <tr>
        <td><h3><b>Input Json Image</b></h3></td>
        <td><h3><b>Output</b></h3></td>
    </tr>
<tr>
<td><img src="./images/image_input.png"></td>
<td><img src="./images/image_output.png"></td>
</tr>
</table>

