# Bank Statement Post-Processing Guide

* Author: docai-incubator@google.com

## Disclaimer

This tool is not supported by the Google engineering team or product team. It is provided and supported on a best-effort basis by the DocAI Incubator Team. No guarantees of performance are implied.

## Purpose and Description

<p><span>This tool takes bank statements from Google Cloud Storage (GCS), parses them with a DocAI bank statement processor, post-processes the parser response (following the gateless requirements in the </span><span><a href="https://www.google.com/url?q=http://go/gateless-bank-statement-parser-project-specs&amp;sa=D&amp;source=editors&amp;ust=1704297767371296&amp;usg=AOvVaw3e5p8Z7cQPeaJM3pJ6G1EU">project specs</a></span><span>), and writes the output in JSON format.</span></p>
      <p><span></span></p>
      <p><span>The tool follows these steps:</span></p>
      <p><span></span></p>
      <ol  start="1">
         <li><span>Bank statements are parsed by the bank statement processor.</span></li>
         <li ><span>The response from the bank statement processor is post-processed and the result is saved in JSON format in the bucket.</span></li>
         <li ><span>This script</span><span>&nbsp;includes an option to parse checks for the top three banks, which requires training a CDE model.</span></li>
         <li ><span>Top three banks: Wells Fargo, Bank of America, Chase.</span></li>
      </ol>
      <p><span></span></p>
      <img alt="" src="images/image2.png" style="width: 1020.00px; height: 92.67px; " title="">
      <p><span style="padding-left:25%"><b>Bank Statement Parser working chart</b></span><span>&nbsp;</span></p>
<div style="padding-bottom:30px"></div>

# Installation Guide
This Bank Statement Parser is provided as a Python notebook that can be used in the <b>Vertex AI JupyterLab environment</b>. First, upload the Bank Statement notebook to JupyterLab, then put all the reference documents in a dedicated folder. Also create a new folder, or pick an existing one, to serve as the output folder.

# Step 1: Installing modules

In [None]:
%pip install deepparse
%pip install google-cloud-documentai google-cloud-storage
%pip install pandas
%pip install numpy
# dataclasses and difflib ship with the Python standard library (3.7+), so they need no install

In [None]:
!wget https://raw.githubusercontent.com/GoogleCloudPlatform/document-ai-samples/main/incubator-tools/best-practices/utilities/utilities.py

# Step by Step procedure
<div style="background-color:#f5f569"><b>NOTE:</b> Stepwise status can be seen in “logging.txt”</div><br>

 <img alt="" src="images/image5.png" style="width: 817.50px; height: 553.35px; " title="">
      <p><span style="padding-left:25%"><b>Bank Statement Parser logging.txt</b></span></p>
   <h3> <b>Step 1:</b> Create a <span>Bank Statement Parser from Processor Gallery (Workbench).</span></h3><br>
   <h4>Step 1.2: Make <span>pretrained-bankstatement-v3.0-2022-05-16</span> the default processor version.</h4>


<h3><b>Step 3: Input Parameters</b></h3><p><span>Fill in the project details and the GCS folder paths of the input files in the code.</span></p>
     <img alt="" src="./images/image4.png" style="width: 720.00px; height: 138.67px; margin-left: 0.00px" title="">
      <ul >
         <li ><span style="font-weight:800">Project name:</span><span>&nbsp;Enter the Google Cloud project name</span></li>
         <li ><span style="font-weight:800">Project_Id</span><span>: Enter the Google Cloud project ID</span></li>
         <li ><span style="font-weight:800">Processor_Id:</span><span>&nbsp;Enter the bank statement processor ID</span></li>
         <li ><span style="font-weight:800">gcs_input_dir: </span><span>Enter the path of the files to be parsed</span></li>
         <li ><span style="font-weight:800">gcs_output_dir: </span><span>Enter the path where the JSON output of the bank statement parser should be saved</span></li>
         <li ><span style="font-weight:800">gcs_new_output_json_path: </span><span>Enter the path where the post-processed JSON files should be saved</span></li>
      </ul>
      <p><span></span></p>
      <p><span>If checks have to be parsed, fill in the details of the CDE-trained processor below.</span></p>
      <p><span></span></p>
      <ul >
         <li ><span style="font-weight:800">checksFlag</span><span>: </span><span>True</span></li>
      </ul>
      <p><span>Set checksFlag to </span><span>True</span><span> if checks need to be parsed by the CDE-trained parser; otherwise set it to </span><span>False</span><span>.</span></p>
      <p><span>Fill in the details below only if checksFlag is True; otherwise they are not needed.</span></p>
      <p><span></span></p>
      <ul >
         <li ><span style="font-weight:800">Processor_id_checks: </span><span>Enter the CDE trained processor id</span></li>
         <li ><span style="font-weight:800">Processor_version_checks: </span><span>Enter the CDE trained processor version</span></li>
      </ul>
<div style="padding-bottom:30px"></div>
    

In [3]:
# input details
project_name = "xxxxxxx"  # project name
project_id = "xxxxxxxx"  # project number
processor_id = "xxxxxxxxxx"  # processor id
gcs_input_dir = "gs://xxxxxxx/xxxxxxx/xxxxxxxx/input_pdfs"  # input documents path
gcs_output_dir = "gs://xxxxxxx/xxxxxxx/xxxxxxxxxx/processor_output"  # output path for async parsing; a different bucket than 'gcs_input_dir' is suggested
gcs_new_output_json_path = "gs://xxxxxx/xxxxxx/xxxxxx/pp_output/"  # post-processed JSON path; a different bucket than 'gcs_input_dir' is suggested
### To parse check table items, train a CDE model, provide its processor ID and processor version ID below, and set checksFlag=True
checksFlag = (
    True  # Checks Flag, if True, It will use CDE Model to parse the checks table
)
processor_id_checks = "xxxxxxx"  # CDE processor_id
processor_version_checks = "xxxxxx"  # CDE processor_version_id
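
The comments above suggest keeping the output folders away from `gcs_input_dir`. A minimal sketch of a sanity check that could be run before processing is shown below. The helper names `split_gcs_uri` and `paths_overlap` are hypothetical and are not part of `utilities.py`; the sketch only assumes that every path has the form `gs://bucket/prefix`.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    if not uri.startswith("gs://"):
        raise ValueError(f"Not a GCS URI: {uri!r}")
    bucket, _, prefix = uri[len("gs://"):].partition("/")
    if not bucket:
        raise ValueError(f"Missing bucket name in: {uri!r}")
    return bucket, prefix.strip("/")


def paths_overlap(a: str, b: str) -> bool:
    """True when both paths are in the same bucket and one contains the other."""
    bucket_a, prefix_a = split_gcs_uri(a)
    bucket_b, prefix_b = split_gcs_uri(b)
    if bucket_a != bucket_b:
        return False
    # Compare whole path components so that 'in' does not match 'input'.
    parts_a = [p for p in prefix_a.split("/") if p]
    parts_b = [p for p in prefix_b.split("/") if p]
    shorter = min(len(parts_a), len(parts_b))
    return parts_a[:shorter] == parts_b[:shorter]
```

Calling `paths_overlap(gcs_input_dir, gcs_output_dir)` and `paths_overlap(gcs_input_dir, gcs_new_output_json_path)` before the processing cells gives an early warning when an output folder sits inside the input folder.
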


<h3 ><b>Step 4:</b></h3><p><span>Processing and post-processing the documents (run these cells without editing anything).</span></p>
<img alt="" src="images/image1.png" style="width: 861.50px; height: 580.27px;" title=""><br>
      <p><span>If a consolidated CSV is needed, uncomment the CSV generation section shown below.</span></p>
      <img alt="" src="images/image10.png" style="width: 1020.00px; height: 202.00px;" title="">
<div style="padding-bottom:30px"></div>
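
The notebook's own CSV cell appears only in the screenshot, so the following is a rough illustration of what a consolidated CSV could look like, not the notebook's implementation. It assumes the post-processed entities of each file have already been reduced to plain dictionaries with `type`, `mention_text`, and `confidence` keys; the function name `consolidate_to_csv` and the column layout are hypothetical.

```python
import csv
import io
from typing import Dict, List


def consolidate_to_csv(entities_by_file: Dict[str, List[dict]]) -> str:
    """Flatten {file_name: [entity, ...]} into one CSV string, one row per entity."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["file_name", "type", "mention_text", "confidence"],
        lineterminator="\n",
    )
    writer.writeheader()
    # Sort by file name so the output is stable across runs.
    for file_name in sorted(entities_by_file):
        for entity in entities_by_file[file_name]:
            writer.writerow(
                {
                    "file_name": file_name,
                    "type": entity.get("type", ""),
                    "mention_text": entity.get("mention_text", ""),
                    "confidence": entity.get("confidence", ""),
                }
            )
    return buffer.getvalue()
```

The returned string can be written to a local file or uploaded to the output bucket, whichever fits the workflow.
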
    
    
 <p><span>Check entity detection (currently considered only for the top 3 banks):</span></p>
      <p><span></span></p>
      <ol start="1">
         <li ><span>A CDE model is trained for check entity detection; its output then needs to be post-processed and combined with the post-processed JSON file from the bank statement parser.</span></li>
      </ol>
      <p><span></span></p>
      <ol start="2">
         <li ><span>The CDE model has to be trained as described below. [</span><span>Refer to the</span><span><a href="https://www.google.com/url?q=https://cloud.google.com/document-ai/docs/workbench/build-custom-processor&amp;sa=D&amp;source=editors&amp;ust=1704297767379409&amp;usg=AOvVaw02t9j3qCz7KaDInejsTV3s">&nbsp;DocAI Workbench CDE Guide</a></span><span>&nbsp;to set up a CDE processor</span><span>]</span></li>
      </ol>
      <ul >
         <li ><span>Create a CDE parser with the below schema</span></li>
         <li ><span>Select and label documents that contain check details. Use the convention below and train the processor on those documents. Example training instructions (illustration) are given below.</span></li>
      </ul>
      <p><span style="background-color:#f5f569">Incubator Implementation Notes:</span></p>
      <ul style="padding-left:50px">
         <li ><span>For Check Description (check_desc), the Incubator Team did not have any samples, so that entity was disabled while training the CDE model.</span></li>
         <li ><span>For the POC, 45 training and 15 test documents were used.</span></li>
         <li ><span>Accuracy was approximately 90 percent for the top 3 banks.</span></li>
      </ul>
      <p><span>Schema:</span></p>
      <table style="border: 1px solid black;padding:0px; margin:0px">
         <tr style="border: 1px solid black; font-weight:700;">
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>#</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>Description of the item</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>Entity name</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>Occurrence Type</span></p>
            </td>
         </tr>
         <tr style="border: 1px solid black;">
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>1. </span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>Total line item (includes Check Number, check date and check amount, check description) &nbsp;</span><span>[</span><span>Parent</span><span>]</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>check_item</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>Optional multiple</span></p>
            </td>
         </tr>
         <tr style="border: 1px solid black;">
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>2.</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>Check number [child]</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>check_number</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>Optional once</span></p>
            </td>
         </tr>
         <tr style="border: 1px solid black;">
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>3.</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>Check date [child]</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>check_date</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>Optional once</span></p>
            </td>
         </tr>
         <tr style="border: 1px solid black;">
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>4.</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>Check amount [child]</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>check_amount</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>Optional once</span></p>
            </td>
         </tr>
         <tr style="border: 1px solid black;">
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>5.</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>Check Description [child]</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>check_desc</span></p>
            </td>
            <td  style="border: 1px solid black;" colspan="1" rowspan="1">
               <p><span>Optional once</span></p>
            </td>
         </tr>
      </table>
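
To make the occurrence types in the table concrete, here is a small sketch that checks whether the children of a single `check_item` respect the "Optional once" rule. It works on a plain list of child entity type names; the function name `occurrence_violations` is hypothetical and the check is not part of the notebook.

```python
from collections import Counter
from typing import Dict, List

# Child entity names from the schema table; each is "Optional once" per check_item.
CHILD_TYPES = ("check_number", "check_date", "check_amount", "check_desc")


def occurrence_violations(child_types: List[str]) -> Dict[str, int]:
    """Return child types that are unknown or appear more than once in one check_item."""
    counts = Counter(child_types)
    return {
        name: count
        for name, count in counts.items()
        if name not in CHILD_TYPES or count > 1
    }
```

An empty result means the parent item matches the schema; a non-empty result (for example a `check_number` counted twice) hints that several checks were merged into one parent item.
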
      <p><span style="overflow: hidden; display: inline-block; margin: 0.00px -0.00px; border: 1.33px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 667.33px; height: 265.00px;"><img alt="" src="images/image6.png" style="width: 667.33px; height: 332.54px; margin-left: 0.00px; margin-top: -0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p>
      <p><span style="overflow: hidden; display: inline-block; margin: 0.00px -0.00px; border: 1.33px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 667.36px; height: 253.50px;"><img alt="" src="images/image7.png" style="width: 754.86px; height: 477.23px; margin-left: -87.50px; margin-top: -88.34px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p>
      <p><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;* </span><span>Blue</span><span>&nbsp;boxes are the labeled bounding boxes</span></p>
      <ul class="c22 lst-kix_yqix1oj3s0ix-0 start">
         <li ><span>The trained processor detects the check entities, but when multiple tables are stacked horizontally it tends to detect the whole row as a single parent item (check_item).</span></li>
      </ul>
      <p><span style="overflow: hidden; display: inline-block; margin: -0.00px -0.00px; border: 1.33px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 733.55px; height: 189.71px;"><img alt="" src="images/image3.png" style="width: 758.84px; height: 189.71px; margin-left: 0.00px; margin-top: 0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p>
      <p><span></span></p>
      <p><span>This issue is handled in the post-processing code; the output after post-processing is shown below.</span></p>
      <p><span></span></p>
      <p><span style="overflow: hidden; display: inline-block; margin: 0.00px -0.00px; border: 1.33px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 734.00px; height: 426.60px;"><img alt="" src="./images/image8.png" style="width: 1030.38px; height: 519.40px; margin-left: 0.00px; margin-top: -68.40px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p>
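
The idea of splitting one over-wide `check_item` row into separate checks can be sketched as follows. This is an illustrative simplification and not the notebook's actual post-processing logic: it assumes each child entity has been reduced to a dictionary with `type`, `mention_text`, and a hypothetical `x` key holding the left edge of its bounding box, and it starts a new check whenever a child type repeats while reading left to right.

```python
from typing import Dict, List


def split_stacked_check_item(children: List[dict]) -> List[Dict[str, str]]:
    """Split the children of one wide check_item row into one dict per check."""
    groups: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    # Read the row left to right.
    for child in sorted(children, key=lambda c: c["x"]):
        # A repeated child type means the next horizontally stacked table has started.
        if child["type"] in current:
            groups.append(current)
            current = {}
        current[child["type"]] = child["mention_text"]
    if current:
        groups.append(current)
    return groups
```

Because the split is triggered by a repeated type, a check with a missing field (for example no date) is still separated correctly as long as at least one of its fields also appears in the preceding check.
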
      <p><span></span></p>
      <p><span></span></p>
      <p><span></span></p>
      <ol class="c22 lst-kix_uwprj8q9rdjm-0" start="3">
         <li ><span>The post-processing code that modifies the CDE output and combines it with the post-processed JSON from the bank statement parser is given below.</span></li>
      </ol>
<div style="padding-bottom:30px"></div>

In [None]:
from google.cloud import storage
import json
import re
import pandas as pd
import copy
import os
import random
import string
from dataclasses import dataclass
from difflib import SequenceMatcher
import numpy as np
from io import BytesIO
from deepparse.parser import AddressParser
from datetime import datetime
from google.cloud import documentai_v1beta3 as documentai
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union
from utilities import (
    file_names,
    documentai_json_proto_downloader,
    copy_blob,
    process_document_sample,
    store_document_as_json,
    batch_process_documents_sample,
    blob_downloader,
    create_pdf_bytes_from_json,
)
import warnings

warnings.filterwarnings("ignore")


# finds maximum id
def maxIdFinder(jsonData: documentai.Document) -> int:
    """
    Function to get the maximum id from the entity id attribute.

    Parameters
    ----------
    jsonData : documentai.Document
            The document proto having all the entities with the id attribute.
    Returns
    -------
    int
        Returns the maximum id.
    """
    global maxId

    # Collect the ids of top-level entities and of their child properties;
    # entries with an empty id are skipped.
    ids = []
    for entity in jsonData.entities:
        for candidate in (entity, *entity.properties):
            if candidate.id:
                ids.append(int(candidate.id))

    maxId = max(ids)
    return maxId


def delete_folder(bucket_name: str, folder_name: str) -> None:
    """
    Function to delete a folder in a given bucket.

    Parameters
    ----------
    bucket_name : str
            The bucket name where all the folders are stored.
    folder_name : str
            The folder name which needs to be removed.
    """

    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    # Delete every object under the folder prefix
    blobs = list(bucket.list_blobs(prefix=folder_name))
    bucket.delete_blobs(blobs)
    print(f"Folder {folder_name} deleted.")


def get_files_not_parsed(gcs_input_dir: str, gcs_output_dir: str) -> Tuple:
    """
    Function to get the files which have not yet been processed by the processor.

    Parameters
    ----------
    gcs_input_dir : str
            The GCS path where the original documents (PDFs) are stored.
    gcs_output_dir : str
            The GCS path to store the output from the processor.
    Returns
    -------
    Tuple
        Returns a tuple of (temporary initial path, temporary folder name, temporary bucket name).
    """
    now = datetime.now()
    OutputDirPrefix = now.strftime("%H%M%S%d%m%Y")
    pdfs_names_list, pdfs_names_dict_1 = file_names(gcs_input_dir)
    Jsons_names_list, Jsons_names_dict_1 = file_names(gcs_output_dir)
    file_name_dict = {a.split(".")[0]: a for a in pdfs_names_list}
    json_name_dict = {a.split(".")[0]: a for a in Jsons_names_list}
    files_list = list(file_name_dict.keys())
    list_json_name_dict = list(json_name_dict.keys())
    dict_json = {}
    for i in range(len(list_json_name_dict)):
        if list_json_name_dict[i].endswith("-0"):
            dict_json[(list_json_name_dict[i][:-2])] = list_json_name_dict[i]
        else:
            dict_json[(list_json_name_dict[i])] = list_json_name_dict[i]
    temp_bucket = gcs_input_dir.split("/")[2]
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket(temp_bucket)
    list_new = []
    for i in range(len(files_list)):
        if files_list[i] in dict_json.keys():
            print(
                " Processed json file already exists for:{} ".format(
                    file_name_dict[files_list[i]]
                )
            )
        else:
            list_new.append(files_list[i])
            source_blob = source_bucket.blob(file_name_dict[files_list[i]])
            # print(file_name_dict[files_list[i]])
            temp = f"{file_name_dict[files_list[i]]}"
            file_name_temp = pdfs_names_dict_1[temp]
            prefix = (
                gcs_input_dir.split("/")[-1]
                + "/"
                + "temp_"
                + f"{OutputDirPrefix}"
                + "/"
                + temp
            )
            # new_blob = source_bucket.copy_blob(source_blob, destination_bucket, filename[i])
            copy_blob(temp_bucket, file_name_temp, temp_bucket, prefix)
    temp_initial_path = (
        "gs://"
        + temp_bucket
        + "/"
        + gcs_input_dir.split("/")[-1]
        + "/"
        + "temp_"
        + f"{OutputDirPrefix}"
    )
    temp_folder = (
        ("/").join(gcs_input_dir.split("/")[3:])
        + "/"
        + "temp_"
        + f"{OutputDirPrefix}"
        + "/"
    )
    return temp_initial_path, temp_folder, temp_bucket


def get_files_not_postparsed(
    gcs_output_dir: str, gcs_new_output_json_path: str
) -> Tuple:
    """
    Function to get the files which have not yet been post-processed by this script.

    Parameters
    ----------
    gcs_output_dir : str
            The GCS path where the documents already parsed by the processor are available.
    gcs_new_output_json_path : str
            The GCS path to store the output from this script.
    Returns
    -------
    Tuple
        Returns a tuple of (temporary initial path, temporary folder name, temporary bucket name).
    """
    now = datetime.now()
    OutputDirPrefix = now.strftime("%H%M%S%d%m%Y")
    pdfs_names_list, pdfs_names_dict_1 = file_names(gcs_output_dir)
    Jsons_names_list, Jsons_names_dict_1 = file_names(gcs_new_output_json_path)
    file_name_dict = {a.split(".")[0]: a for a in pdfs_names_list}
    json_name_dict = {a.split(".")[0]: a for a in Jsons_names_list}
    files_list = list(file_name_dict.keys())
    temp_bucket = gcs_output_dir.split("/")[2]
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket(temp_bucket)
    list_new = []
    for i in range(len(files_list)):
        if files_list[i] in json_name_dict.keys():
            print(
                " Processed json file already exists for:{} ".format(
                    file_name_dict[files_list[i]]
                )
            )
        else:
            list_new.append(files_list[i])
            source_blob = source_bucket.blob(file_name_dict[files_list[i]])
            # print(file_name_dict[files_list[i]])
            temp = f"{file_name_dict[files_list[i]]}"
            file_name_temp = pdfs_names_dict_1[temp]
            prefix = (
                gcs_output_dir.split("/")[-1]
                + "/"
                + "temp_"
                + f"{OutputDirPrefix}"
                + "/"
                + temp
            )
            # new_blob = source_bucket.copy_blob(source_blob, destination_bucket, filename[i])
            copy_blob(temp_bucket, file_name_temp, temp_bucket, prefix)
    temp_initial_path = (
        "gs://"
        + temp_bucket
        + "/"
        + gcs_output_dir.split("/")[-1]
        + "/"
        + "temp_"
        + f"{OutputDirPrefix}"
    )
    temp_folder = (
        ("/").join(gcs_output_dir.split("/")[3:])
        + "/"
        + "temp_"
        + f"{OutputDirPrefix}"
        + "/"
    )
    return temp_initial_path, temp_folder, temp_bucket


# dictionary for entity renaming
dict_ent_rename = {
    "statement_start_date": "Statement_Start_Date",
    "statement_end_date": "Statement_End_Date",
    "bank_name": "Financial_Institution",
}


def accounttype_change(json_data: documentai.Document) -> documentai.Document:
    """
    Function to compare sequences for the account entities (account type, account number) and update the entity name.

    Parameters
    ----------
    json_data : documentai.Document
            The document proto having all the entities
    Returns
    -------
    documentai.Document
        Returns the updated document proto.
    """
    import difflib
    from difflib import SequenceMatcher

    accountnodict = {}
    accountnamedict = {}

    def detials_account(account_type):
        account_dict_lst = []
        for i in range(len(json_data.entities)):
            if not hasattr(json_data.entities[i], "properites"):
                if (
                    difflib.SequenceMatcher(
                        None, json_data.entities[i].type, account_type
                    ).ratio()
                    >= 0.9
                ):
                    try:
                        id1 = json_data.entities[i].id
                    except:
                        id1 = ""
                    try:
                        page1 = json_data.entities[i].page_anchor.page_refs[0].page
                    except:
                        page1 = 0
                    try:
                        textSegments1 = json_data.entities[i].text_anchor.text_segments[
                            0
                        ]
                    except:
                        textSegments1 = ""
                    try:
                        temp_y_list = []
                        temp_x_list = []
                        for j in (
                            json_data.entities[i]
                            .page_anchor.page_refs[0]
                            .bounding_poly.normalized_vertices
                        ):
                            temp_y_list.append(float(j.y))
                        for j in (
                            json_data.entities[i]
                            .page_anchor.page_refs[0]
                            .bounding_poly.normalized_vertices
                        ):
                            temp_x_list.append(float(j.x))
                        x_max1 = max(temp_x_list)
                        y_max1 = max(temp_y_list)

                    except:
                        y_max1 = ""
                        x_max1 = ""
                    account_dict_lst.append(
                        {
                            json_data.entities[i].mention_text: {
                                "id": id1,
                                "page": page1,
                                "text_segments": textSegments1,
                                "x_max": x_max1,
                                "y_max": y_max1,
                            }
                        }
                    )

        return account_dict_lst

    accountnamedict = detials_account("account_type")
    accountnodict = detials_account("account_i_number")
    # Collect account-type mentions that are really statement headers
    temp_del = []
    for i in range(len(accountnamedict)):
        for k in accountnamedict[i]:
            if re.search(r"\sstatement", k, re.IGNORECASE):
                temp_del.append(k)
    # Rebuild the list instead of deleting by index while iterating,
    # which would skip entries after each deletion
    accountnamedict = [
        entry for entry in accountnamedict if not any(k in temp_del for k in entry)
    ]
    account_comp = []
    for i in range(len(accountnamedict)):
        for j in range(len(accountnodict)):
            for k in accountnamedict[i]:
                for m in accountnodict[j]:
                    y_diff = abs(
                        accountnamedict[i][k]["y_max"] - accountnodict[j][m]["y_max"]
                    )
                    account_comp.append({k: {m: y_diff}})
    final_account_match = {}
    for i in range(len(account_comp)):
        for j in account_comp[i]:
            for k in account_comp[i][j]:
                if j.lower() not in final_account_match.keys():
                    final_account_match[j.lower()] = {k: account_comp[i][j][k]}
                else:
                    for m in final_account_match:
                        for n in final_account_match[m]:
                            if j.lower() == m.lower():
                                if account_comp[i][j][k] < final_account_match[m][n]:
                                    final_account_match[j.lower()] = {
                                        k: account_comp[i][j][k]
                                    }
                                else:
                                    final_account_match[j.lower()] = {
                                        n: final_account_match[m][n]
                                    }

    for i in json_data.entities:
        if not hasattr(i, "properites"):
            for j in final_account_match:
                for k in final_account_match[j]:
                    if i.mention_text:
                        if i.mention_text.lower() == j.lower():
                            for m in json_data.entities:
                                if m.mention_text:
                                    if m.mention_text.lower() == k.lower():
                                        i.type = ("_").join(
                                            m.type.split("_")[:2]
                                        ) + "_name"

    account_names = {}
    for i in range(len(json_data.entities)):
        if not hasattr(json_data.entities[i], "properites"):
            if (
                difflib.SequenceMatcher(
                    None, json_data.entities[i].type, "account_name"
                ).ratio()
                >= 0.9
            ):
                account_names[json_data.entities[i].mention_text] = {
                    "id": json_data.entities[i].id,
                    "type": json_data.entities[i].type,
                }

    for i in range(len(json_data.entities)):
        if json_data.entities[i].type == "account_type":
            for j in list(account_names.keys()):
                if (
                    difflib.SequenceMatcher(
                        None, (json_data.entities[i].mention_text).lower(), j.lower()
                    ).ratio()
                ) > 0.9:
                    # account_names values are plain dicts, so use key access
                    json_data.entities[i].type = account_names[j]["type"]
    for i in range(len(json_data.entities)):
        try:
            while json_data.entities[i].type == "account_type":
                del json_data.entities[i]
        except Exception as e:
            pass
    return json_data


# logging function


def logger(filename: str, message: str) -> None:
    """
    Function to write the message (error message, warning message, info message) to the logging text file.

    Parameters
    ----------
    filename : str
            The text file name where the message needs to be written.
    message : str
            The string message from functions (error message, warning message, info message).

    """
    # the context manager closes the file even if the write fails
    with open(filename, "a") as f:
        f.write(
            "{0} -- {1}\n".format(datetime.now().strftime("%Y-%m-%d %H:%M"), message)
        )


# Borrower name split and page Anchors
def borrowerNameFix(jsonData):
    """
    Function to fix the borrower name present in the document by fixing the suffix and prefix of the name and by dividing the full name into
    smaller chunks of first name, middle name, last name.

    Parameters
    ----------
    jsonData : documentai.Document
            The document proto having all the entities with the full name of borrower.
    Returns
    -------
    documentai.Document
        Returns the updated document proto.
    """
    global maxId
    extraDict = documentai.Document()
    for i in jsonData.entities:
        if i.type == "client_name":
            extraDict.entities.append(i)

    # iterate over a copy so that removing entries does not skip neighbours
    for i in list(jsonData.entities):
        if i.type == "client_name":
            jsonData.entities.remove(i)

    def ent_rename_borrower_name(document):
        google_name = "client_name"
        # Map each distinct borrower name to the ids of the entities that mention it.
        entity_dict = {}
        for entity in document.entities:
            if not entity.properties and entity.type == google_name:
                entity_dict.setdefault(entity.mention_text, []).append(entity.id)

        # Number the distinct names by the smallest entity id at which each appears.
        ordered_names = sorted(
            entity_dict, key=lambda name: min(int(k) for k in entity_dict[name])
        )
        sorted_dict = {name: rank for rank, name in enumerate(ordered_names)}

        for entity in document.entities:
            if not entity.properties and entity.type == google_name:
                entity.type = (
                    "Borrower_"
                    + str(sorted_dict[entity.mention_text] + 1)
                    + "_Full_Name"
                )

        return document, entity_dict

    def suffix_checker(json_data):
        import copy

        possible_suffixes = ["JR", "Jr", "III", "II", "MD", "PhD", "DVM", "DDS"]
        for i in range(len(json_data.entities)):
            # Only entity types containing the lowercase substring "name" are checked.
            if "name" in json_data.entities[i].type:
                words = json_data.entities[i].mention_text.split()
                if words and words[-1] in possible_suffixes:
                    suffix = words[-1]
                    # e.g. "Borrower_1_Full_Name" -> "Borrower_1"
                    borrower_number = "_".join(
                        json_data.entities[i].type.split("_")[:2]
                    )
                    # Strip the suffix from the name and add it as its own entity.
                    json_data.entities[i].mention_text = " ".join(words[:-1])
                    temp = copy.deepcopy(json_data.entities[i])
                    temp.type = borrower_number + "_Suffix"
                    temp.text_anchor.text_segments[0].start_index = str(
                        int(temp.text_anchor.text_segments[0].end_index) - len(suffix)
                    )
                    temp.mention_text = suffix
                    temp.text_anchor.content = suffix
                    json_data.entities.append(temp)
        return json_data

    def split_rename(document, entity_dict):
        import copy

        type_three_names = ["first_name", "middle_name", "last_name"]
        type_three_names_with_comma = ["last_name", "middle_name", "first_name"]
        type_two_names = ["first_name", "last_name"]
        type_two_names_with_comma = ["last_name", "first_name"]
        prefix = ["mr", "mrs", "miss", "ms", "mx", "sir", "dr"]
        try:
            for_entity_count = len(document.entities)
            for i in range(for_entity_count):
                for j in entity_dict.keys():
                    for k in range(len(entity_dict[j])):
                        if document.entities[i].properties:
                            continue
                        if document.entities[i].id != entity_dict[j][k]:
                            continue
                        name = document.entities[i].mention_text.split(" ")

                        # Keep a title such as "Mr" attached to the first name.
                        if len(name) > 1 and name[0].lower() in prefix:
                            name[0:2] = [name[0] + " " + name[1]]

                        # A trailing comma on the first part signals "Last, First" order.
                        comma_first = name[0].endswith(",")
                        if len(name) == 2:
                            labels = (
                                type_two_names_with_comma
                                if comma_first
                                else type_two_names
                            )
                        elif len(name) == 3:
                            labels = (
                                type_three_names_with_comma
                                if comma_first
                                else type_three_names
                            )
                        else:
                            # Only two-part and three-part names are split.
                            continue

                        for m in range(len(name)):
                            temp = copy.deepcopy(document.entities[i])
                            temp.mention_text = name[m]
                            segment = temp.text_anchor.text_segments[0]
                            start = int(segment.start_index)
                            end = int(segment.end_index)
                            if m == 0:
                                # First part: starts where the full name starts.
                                segment.end_index = str(start + len(name[0]))
                            elif m == len(name) - 1:
                                # Last part: ends where the full name ends.
                                segment.start_index = str(end - len(name[m]))
                            else:
                                # Middle part of a three-part name.
                                segment.start_index = str(start + len(name[0]) + 1)
                                segment.end_index = str(end - len(name[2]))
                            temp.type = (
                                "_".join(temp.type.split("_")[:2]) + "_" + labels[m]
                            )
                            document.entities.append(temp)

        except Exception as e:
            print(e, " :: ", i)
        return document

    def text_anchorFix(jsonData, tempVar):
        def text_anchorFixKid(jsonData, entityDict):
            entityDict.text_anchor.content = entityDict.mention_text
            if entityDict.type[-9:] != "Full_Name":
                start = int(entityDict.text_anchor.text_segments[0].start_index)
                end = int(entityDict.text_anchor.text_segments[0].end_index)

                while (
                    entityDict.mention_text != jsonData.text[start:end] and end > start
                ):
                    end -= 1
                entityDict.text_anchor.text_segments[0].start_index = str(start)
                entityDict.text_anchor.text_segments[0].end_index = str(end)
            return entityDict

        tempVarEntities = []
        for i in tempVar.entities:
            fixedDict = text_anchorFixKid(jsonData, i)
            tempVarEntities.append(i)

        tempVar.entities = tempVarEntities
        return tempVar

    def page_anchorFix(jsonData, tempVar):
        tokenRange = {}
        for i in range(0, len(jsonData.pages)):
            for j in range(0, len(jsonData.pages[i].tokens)):
                pageNumber = i
                tokenNumber = j
                try:
                    startIndex = int(
                        jsonData.pages[i]
                        .tokens[j]
                        .layout.text_anchor.text_segments[0]
                        .start_index
                    )
                except:
                    startIndex = 0
                endIndex = int(
                    jsonData.pages[i]
                    .tokens[j]
                    .layout.text_anchor.text_segments[0]
                    .end_index
                )
                tokenRange[range(startIndex, endIndex)] = {
                    "page_number": pageNumber,
                    "token_number": tokenNumber,
                }

        for i in tempVar.entities:
            if i.type != "Borrower_Full_Address":
                start = int(i.text_anchor.text_segments[0].start_index)
                end = int(i.text_anchor.text_segments[0].end_index) - 1

                for j in tokenRange:
                    if start in j:
                        lowerToken = tokenRange[j]
                for j in tokenRange:
                    if end in j:
                        upperToken = tokenRange[j]

                lowerTokenData = (
                    jsonData.pages[int(lowerToken["page_number"])]
                    .tokens[int(lowerToken["token_number"])]
                    .layout.bounding_poly.normalized_vertices
                )
                upperTokenData = (
                    jsonData.pages[int(upperToken["page_number"])]
                    .tokens[int(upperToken["token_number"])]
                    .layout.bounding_poly.normalized_vertices
                )
                # for A

                xA = float(lowerTokenData[0].x)
                yA = float(lowerTokenData[0].y)
                xA_ = float(upperTokenData[0].x)
                yA_ = float(upperTokenData[0].y)
                # for B
                xB = float(lowerTokenData[1].x)
                yB = float(lowerTokenData[1].y)
                xB_ = float(upperTokenData[1].x)
                yB_ = float(upperTokenData[1].y)
                # for C
                xC = float(lowerTokenData[2].x)
                yC = float(lowerTokenData[2].y)
                xC_ = float(upperTokenData[2].x)
                yC_ = float(upperTokenData[2].y)
                # for D
                xD = float(lowerTokenData[3].x)
                yD = float(lowerTokenData[3].y)
                xD_ = float(upperTokenData[3].x)
                yD_ = float(upperTokenData[3].y)

                A = {"x": min(xA, xA_), "y": min(yA, yA_)}
                B = {"x": max(xB, xB_), "y": min(yB, yB_)}
                C = {"x": max(xC, xC_), "y": max(yC, yC_)}
                D = {"x": min(xD, xD_), "y": max(yD, yD_)}
                i.page_anchor.page_refs[0].bounding_poly.normalized_vertices = [
                    A,
                    B,
                    C,
                    D,
                ]
        return tempVar

    x1, y1 = ent_rename_borrower_name(extraDict)
    extraDict = suffix_checker(extraDict)
    tempVar = split_rename(x1, y1)

    tempVar_2 = text_anchorFix(jsonData, tempVar)

    tempVar_3 = page_anchorFix(jsonData, tempVar_2)

    for i in tempVar_3.entities:
        if i.type[-9:] != "Full_Name":
            maxId += 1
            i.id = str(maxId)
    for i in tempVar_3.entities:
        jsonData.entities.append(i)
    return jsonData
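

# Illustrative sketch (not part of the original tool): the page-anchor fix above
# builds an entity's bounding box from the boxes of its first and last tokens.
# A simplified version of that idea is the axis-aligned union of two boxes,
# shown below. It takes the overall min/max of all corners, whereas the code
# above combines each corner separately. The helper name `_example_union_box`
# and the (x, y) tuple representation are assumptions made for this example.
from typing import List, Tuple


def _example_union_box(
    first: List[Tuple[float, float]], last: List[Tuple[float, float]]
) -> List[Tuple[float, float]]:
    """Return the four corners (clockwise from top-left) of the box covering both inputs."""
    xs = [x for x, _ in first + last]
    ys = [y for _, y in first + last]
    return [
        (min(xs), min(ys)),  # A: top-left
        (max(xs), min(ys)),  # B: top-right
        (max(xs), max(ys)),  # C: bottom-right
        (min(xs), max(ys)),  # D: bottom-left
    ]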


# Entities statement_start_date,statement_end_date,starting_balance,ending_balance,bank_name


def ent_rename(
    document: documentai.Document, google_name: str, specific_name: str
) -> documentai.Document:
    """
    Function to rename the entities given by the user in the variable dict_ent_rename.

    Parameters
    ----------
    document : documentai.Document
            The document proto having all the entities.
    google_name : str
            The entity name, present as a key in dict_ent_rename, which needs to be replaced.
    specific_name : str
            The specific name, present as the value in dict_ent_rename, which replaces the entity name.
    Returns
    -------
    documentai.Document
        Returns the updated document proto.
    """
    for entity in document.entities:
        if entity.type == google_name:
            entity.type = specific_name
        # Rename matching nested properties (child entities) as well.
        for prop in entity.properties:
            if prop.type == google_name:
                prop.type = specific_name
    return document


# adding pages entity
# total pages function


def add_total_pages(jsonData):
    """
    Function to add the total page number and update the bounding poly.

    Parameters
    ----------
    jsonData : documentai.Document
            The document proto having all the pages data.

    Returns
    -------
    documentai.Document
        Returns the updated document proto.
    """
    pages_dict = documentai.Document.Entity()
    pages_dict.type = "Total_Pages"
    total = str(len(jsonData.pages))
    pages_dict.mention_text = total
    # jsonData.entities.append(pages_dict)
    pages_dict.confidence = 1
    tokenRange = {}
    start = jsonData.text.rfind("Page" + " " + str(len(jsonData.pages)))
    end = int(start) + 11

    for j in range(0, len(jsonData.pages[(len(jsonData.pages)) - 1].tokens)):
        pageNumber = (len(jsonData.pages)) - 1
        tokenNumber = j
        try:
            startIndex = int(
                jsonData.pages[(len(jsonData.pages)) - 1]
                .tokens[j]
                .layout.text_anchor.text_segments[0]
                .start_index
            )
        except:
            startIndex = 0
        endIndex = int(
            jsonData.pages[(len(jsonData.pages)) - 1]
            .tokens[j]
            .layout.text_anchor.text_segments[0]
            .end_index
        )
        tokenRange[range(startIndex, endIndex)] = {
            "pageNumber": pageNumber,
            "tokenNumber": tokenNumber,
        }

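    lowerToken = upperToken = None
    for j in tokenRange:
        if start in j:
            lowerToken = tokenRange[j]
        if end in j:
            upperToken = tokenRange[j]
    if lowerToken is None or upperToken is None:
        # Fail with a clear message instead of a NameError further down.
        raise ValueError(
            "Could not locate the 'Page N' text among the tokens of the last page."
        )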

    lowerTokenData = (
        jsonData.pages[int(lowerToken["pageNumber"])]
        .tokens[int(lowerToken["tokenNumber"])]
        .layout.bounding_poly.normalized_vertices
    )
    upperTokenData = (
        jsonData.pages[int(upperToken["pageNumber"])]
        .tokens[int(upperToken["tokenNumber"])]
        .layout.bounding_poly.normalized_vertices
    )

    # for A
    xA = float(lowerTokenData[0].x)
    yA = float(lowerTokenData[0].y)
    xA_ = float(upperTokenData[0].x)
    yA_ = float(upperTokenData[0].y)
    # for B
    xB = float(lowerTokenData[1].x)
    yB = float(lowerTokenData[1].y)
    xB_ = float(upperTokenData[1].x)
    yB_ = float(upperTokenData[1].y)
    # for C
    xC = float(lowerTokenData[2].x)
    yC = float(lowerTokenData[2].y)
    xC_ = float(upperTokenData[2].x)
    yC_ = float(upperTokenData[2].y)
    # for D
    xD = float(lowerTokenData[3].x)
    yD = float(lowerTokenData[3].y)
    xD_ = float(upperTokenData[3].x)
    yD_ = float(upperTokenData[3].y)

    A = {"x": min(xA, xA_), "y": min(yA, yA_)}
    B = {"x": max(xB, xB_), "y": min(yB, yB_)}
    C = {"x": max(xC, xC_), "y": max(yC, yC_)}
    D = {"x": min(xD, xD_), "y": max(yD, yD_)}
    boundpoly = {}
    boundpoly["normalized_vertices"] = [A, B, C, D]
    # pages_dict.page_anchor.page_refs[0].bounding_poly.normalized_vertices = [A, B, C, D]
    pages_dict.page_anchor = {
        "page_refs": [
            {
                "bounding_poly": {"normalized_vertices": [A, B, C, D]},
                "page": str(int(total) - 1),
            }
        ]
    }
    pages_dict.text_anchor = {
        "content": pages_dict.mention_text,
        "text_segments": [{"end_index": str(end), "start_index": str(start)}],
    }
    jsonData.entities.append(pages_dict)

    return jsonData


def delete_empty(document: documentai.Document) -> documentai.Document:
    """
    Function to remove an entity from the entities list if the entity is empty.

    Parameters
    ----------
    document : :documentai.Document
            The document proto having all the entities

    Returns
    -------
    documentai.Document
        Returns the updated document proto after removal of empty entities
    """
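    # An entity is treated as empty when none of its fields are set. Iterate in
    # reverse so that deletions do not shift the indices still to be visited.
    for i in reversed(range(len(document.entities))):
        if not document.entities[i]:
            del document.entities[i]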
    return document
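

# Illustrative sketch (not part of the original tool): when entries are deleted
# from a list while looping over it by index, the later indices shift and items
# can be skipped. Walking the indices in reverse avoids that. The helper name
# `_example_remove_matching` and the plain-list input are assumptions made for
# this example; the tool itself works on Document AI entity lists.
from typing import Callable, List


def _example_remove_matching(items: List, should_remove: Callable) -> List:
    """Delete, in place, every item for which should_remove(item) is true."""
    for index in reversed(range(len(items))):
        if should_remove(items[index]):
            del items[index]
    return items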


# splitting client_address and creating new entities


def has_digit(s: str) -> bool:
    """
    Function to check whether a string contains a digit.

    Parameters
    ----------
    s : str
            The string which may contain a digit.

    Returns
    -------
    bool
        Returns True if a digit is present, otherwise False.
    """
    return any(char.isdigit() for char in s)


def parse_last_line(last_line: str) -> Tuple:
    """
    Function to parse the last line of an address (city, state, and ZIP code).

    Parameters
    ----------
    last_line : str
            The string in address format.

    Returns
    -------
    Tuple
        Returns the tuple (city, state, zip, unmatched, zip_to_end), or None if no
        state and ZIP code pair is found.
    """
    match = re.search(r"([A-Z]{2})((\)|\s|,|\.)*)(\d{5})", last_line)
    # match = re.search(r'([A-Za-z]*)|\(|([A-Z]{2})\)|((\s|,|\.)*)(\d{5})', last_line)

    if not match:
        return None
    elif match.start() > 0 and last_line[match.start() - 1].isalnum():
        # The two capital letters are the tail of a longer token, not a state code.
        return None
    matched_state_zip = last_line[match.start() : match.end()]
    zip_start = re.search(r"\d{5}", matched_state_zip).start()

    state, zip_code = (
        re.sub(r"[^\w\s]", "", matched_state_zip[0:zip_start].strip()),
        matched_state_zip[zip_start:],
    )
    zip_to_end = last_line[match.start() + zip_start :]

    # Tokens before the state: those containing digits are kept as unmatched
    # (street) tokens; the rest are candidates for the city name.
    unmatched_tokens = [
        t for t in last_line[0 : match.start()].split() if t and has_digit(t)
    ]
    city_candidates = [
        t for t in last_line[0 : match.start()].split() if t and not has_digit(t)
    ]

    city = None
    if city_candidates:
        if len(city_candidates) > 2:
            # Keep only the last two words as the city name.
            unmatched_tokens.extend(city_candidates[0:-2])
            city = " ".join(city_candidates[-2:])
        else:
            city = " ".join(city_candidates)
    unmatched = " ".join(unmatched_tokens)

    return (city, state, zip_code, unmatched, zip_to_end)
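

# Illustrative sketch (not part of the original tool): what the state/ZIP
# pattern used in parse_last_line captures on a sample address line. Unlike
# parse_last_line, this sketch does not reject matches preceded by another
# letter or digit. The helper name `_example_state_zip` and the sample address
# in the usage note are assumptions made for this example only.
import re


def _example_state_zip(last_line: str):
    """Return (state, zip) found in the line, or None when there is no match."""
    match = re.search(r"([A-Z]{2})((\)|\s|,|\.)*)(\d{5})", last_line)
    if not match:
        return None
    # Group 1 is the two-letter state code, group 4 is the five-digit ZIP code.
    return match.group(1), match.group(4)


# Usage note: _example_state_zip("Springfield, IL 62704") yields the state
# code "IL" and the ZIP code "62704".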


def split_address_entities(entity_type: str, mention_text: str) -> Dict:
    """
    Function to split the address into multiple address parts such as ZIP code, state, city, and street address.

    Parameters
    ----------
    entity_type : str
            The entity name from the document proto object.
    mention_text : str
            The OCR text of the entity holding the actual address data.
    Returns
    -------
    Dict
        Returns a dictionary with the split address, with the entity name as key and the text as value.
    """
    text_lines = [line.strip() for line in mention_text.split("\n") if line.strip()]
    if len(text_lines) == 2 or len(text_lines) == 3:
        parsing = parse_last_line(text_lines[-1])
        if parsing is not None and parsing[0] is not None:
            if (
                len(text_lines) == 3
                and not has_digit(text_lines[0])
                and "box" not in text_lines[0].casefold()
                and not text_lines[0].startswith("o ")
            ):
                del text_lines[0]
            line2 = text_lines[1] if len(text_lines) == 3 else ""
            return {
                f"{entity_type}_StreetAddressOrPostalBox": text_lines[0],
                f"{entity_type}_AdditionalStreetAddressOrPostalBox": line2,
                f"{entity_type}_City": parsing[0],
                f"{entity_type}_State": parsing[1],
                f"{entity_type}_Zip": parsing[4],
            }

    if len(text_lines) == 1:
        parsing = parse_last_line(text_lines[0])
        if parsing is None:
            raise ValueError("Likely invalid redaction.")
        else:
            return {
                f"{entity_type}_StreetAddressOrPostalBox": parsing[3],
                f"{entity_type}_City": parsing[0] if parsing[0] else "",
                f"{entity_type}_State": parsing[1],
                f"{entity_type}_Zip": parsing[2],
            }
    else:
        last_line_candidates = [
            i for i in range(len(text_lines)) if parse_last_line(text_lines[i])
        ]
        if not last_line_candidates:
            all_tokens = " ".join(text_lines).split()
            state_token_id, zip_token_id = None, None
            for i, token in enumerate(all_tokens):
                if re.fullmatch(r"([A-Z]{2})((\s|,|\.)*)", token):
                    state_token_id = i
                if re.fullmatch(r"\d{5}", token):
                    zip_token_id = i

            if state_token_id is None or zip_token_id is None:
                raise ValueError("Likely invalid redaction, no zip or state.")
            else:
                search_start = max(min(state_token_id, zip_token_id) - 1, 0)
                search_end = min(max(state_token_id, zip_token_id) + 2, len(all_tokens))
                city_candidates = [
                    i
                    for i in range(search_start, search_end)
                    if i not in [state_token_id, zip_token_id]
                    and not has_digit(all_tokens[i])
                ]

                ids_to_remove = [state_token_id, zip_token_id]
                city = None
                if city_candidates:
                    city = all_tokens[city_candidates[-1]]
                    ids_to_remove.append(city_candidates[-1])
                line_1 = " ".join(
                    [
                        all_tokens[i]
                        for i in range(len(all_tokens))
                        if i not in ids_to_remove
                    ]
                )
                return {
                    f"{entity_type}_StreetAddressOrPostalBox": line_1,
                    f"{entity_type}_City": city if city else "",
                    f"{entity_type}_State": all_tokens[state_token_id],
                    f"{entity_type}_Zip": all_tokens[zip_token_id],
                }
        else:
            last_line_id = max(last_line_candidates)
            parsing = parse_last_line(text_lines[last_line_id])
            remaining_lines = text_lines[0:last_line_id]
            if not remaining_lines:
                return {
                    f"{entity_type}_StreetAddressOrPostalBox": parsing[3],
                    f"{entity_type}_City": parsing[0] if parsing[0] else "",
                    f"{entity_type}_State": parsing[1],
                    f"{entity_type}_Zip": parsing[2],
                }
            else:
                city = parsing[0]
                if city is None and not has_digit(remaining_lines[-1]):
                    city = remaining_lines[-1]
                    remaining_lines = remaining_lines[0:-1]
                if parsing[3]:
                    remaining_lines.append(parsing[3])
                line_1, line_2 = None, None
                if remaining_lines:
                    line_1 = remaining_lines[0]
                    if len(remaining_lines) > 1:
                        line_2 = " ".join(remaining_lines[1:])
                return {
                    f"{entity_type}_StreetAddressOrPostalBox": line_1 if line_1 else "",
                    f"{entity_type}_AdditionalStreetAddressOrPostalBox": line_2
                    if line_2
                    else "",
                    f"{entity_type}_City": city if city else "",
                    f"{entity_type}_State": parsing[1],
                    f"{entity_type}_Zip": parsing[2],
                }


# Replacing the Address , splitting and page anchors


def address_function(data: documentai.Document) -> documentai.Document:
    """
    Function to fix the address by entity name (e.g. Borrower_Street_Address, Borrower_City, Borrower_State, Borrower_Zip)
    by fixing the text anchor and page anchor.

    Parameters
    ----------
    data : documentai.Document
            The document proto data having the entities which need to be changed.

    Returns
    -------
    documentai.Document
        Returns the updated document proto object.
    """
    global maxId
    newData = documentai.Document()
    for i in data.entities:
        if i.type == "client_address":
            newData.entities.append(i)

    # Delete by index in reverse so that no client_address entity is skipped
    # when consecutive entities are removed.
    for idx in reversed(range(len(data.entities))):
        if data.entities[idx].type == "client_address":
            del data.entities[idx]

    def address_text_anchor(jsonData, tempVar):
        # text_anchor fix
        for i in tempVar.entities:
            if i.type == "Borrower_Full_Address":
                start = int(i.text_anchor.text_segments[0].start_index)
                end = int(i.text_anchor.text_segments[0].end_index)
            else:
                i.mention_text = i.mention_text.replace("\\n", " ")
                end = start + len(i.mention_text)
                while i.mention_text.split() != jsonData.text[start:end].split() and (
                    end < len(jsonData.text) - 1
                ):
                    start += 1
                    end += 1
                i.text_anchor.text_segments[0].start_index = str(start)
                i.text_anchor.text_segments[0].end_index = str(end)
                i.text_anchor.content = i.mention_text
                start = end
        return tempVar

    def address_page_anchor(jsonData, tempVar):
        tokenRange = {}
        for i in range(0, len(jsonData.pages)):
            for j in range(0, len(jsonData.pages[i].tokens)):
                pageNumber = i
                tokenNumber = j
                try:
                    startIndex = int(
                        jsonData.pages[i]
                        .tokens[j]
                        .layout.text_anchor.text_segments[0]
                        .start_index
                    )
                except Exception:
                    startIndex = 0
                endIndex = int(
                    jsonData.pages[i]
                    .tokens[j]
                    .layout.text_anchor.text_segments[0]
                    .end_index
                )
                tokenRange[range(startIndex, endIndex)] = {
                    "pageNumber": pageNumber,
                    "tokenNumber": tokenNumber,
                }

        for i in tempVar.entities:
            if i.type != "Borrower_Full_Address":
                start = int(i.text_anchor.text_segments[0].start_index)
                end = int(i.text_anchor.text_segments[0].end_index) - 1

                for j in tokenRange:
                    if start in j:
                        lowerToken = tokenRange[j]
                for j in tokenRange:
                    if end in j:
                        upperToken = tokenRange[j]

                lowerTokenData = (
                    jsonData.pages[int(lowerToken["pageNumber"])]
                    .tokens[int(lowerToken["tokenNumber"])]
                    .layout.bounding_poly.normalized_vertices
                )
                upperTokenData = (
                    jsonData.pages[int(upperToken["pageNumber"])]
                    .tokens[int(upperToken["tokenNumber"])]
                    .layout.bounding_poly.normalized_vertices
                )
                # for A
                xA = float(lowerTokenData[0].x)
                yA = float(lowerTokenData[0].y)
                xA_ = float(upperTokenData[0].x)
                yA_ = float(upperTokenData[0].y)
                # for B
                xB = float(lowerTokenData[1].x)
                yB = float(lowerTokenData[1].y)
                xB_ = float(upperTokenData[1].x)
                yB_ = float(upperTokenData[1].y)
                # for C
                xC = float(lowerTokenData[2].x)
                yC = float(lowerTokenData[2].y)
                xC_ = float(upperTokenData[2].x)
                yC_ = float(upperTokenData[2].y)
                # for D
                xD = float(lowerTokenData[3].x)
                yD = float(lowerTokenData[3].y)
                xD_ = float(upperTokenData[3].x)
                yD_ = float(upperTokenData[3].y)

                A = {"x": min(xA, xA_), "y": min(yA, yA_)}
                B = {"x": max(xB, xB_), "y": min(yB, yB_)}
                C = {"x": max(xC, xC_), "y": max(yC, yC_)}
                D = {"x": min(xD, xD_), "y": max(yD, yD_)}
                i.page_anchor.page_refs[0].bounding_poly.normalized_vertices = [
                    A,
                    B,
                    C,
                    D,
                ]
        return tempVar

    def address_function_new(data):
        deleted_entities = []
        address_entity_names = [
            "Borrower_Street_Address",
            "Borrower_City",
            "Borrower_State",
            "Borrower_Zip",
        ]
        address_entity_name_and_value = {}
        address_parser = AddressParser(device=0)  # On GPU device 0
        for i in range(len(data.entities)):
            try:
                if data.entities[i].type == "client_address":
                    deleted_entities.append(i)
                    full_address = " ".join(data.entities[i].mention_text.split())
                    parse_address = address_parser(full_address)

                    StreetNameMatch = re.search(
                        (
                            parse_address.StreetNumber
                            + " "
                            + parse_address.StreetName
                            + " "
                        ),
                        full_address,
                        flags=re.IGNORECASE,
                    )
                    address_entity_name_and_value[
                        "Borrower_Street_Address"
                    ] = full_address[StreetNameMatch.start() : StreetNameMatch.end()]

                    CityNameMatch = re.search(
                        parse_address.Municipality, full_address, flags=re.IGNORECASE
                    )
                    address_entity_name_and_value["Borrower_City"] = full_address[
                        CityNameMatch.start() : CityNameMatch.end() + 1
                    ]

                    StateNameMatch = re.search(
                        parse_address.Province.upper(), full_address
                    )
                    address_entity_name_and_value["Borrower_State"] = full_address[
                        StateNameMatch.start() : StateNameMatch.end()
                    ]

                    PostalCodeMatch = re.search(parse_address.PostalCode, full_address)
                    address_entity_name_and_value["Borrower_Zip"] = full_address[
                        PostalCodeMatch.start() : PostalCodeMatch.end() + 1
                    ]

                    for j in range(4):
                        temp = copy.deepcopy(data.entities[i])
                        temp.type = address_entity_names[j]
                        temp.mention_text = address_entity_name_and_value[
                            address_entity_names[j]
                        ]
                        data.entities.append(temp)

            except Exception:
                print("Can't split full_address in sub-parts using deepParse")
                print(data.entities[i])
                if data.entities[i].type == "client_address":
                    deleted_entities.append(i)
                    s = data.entities[i].mention_text
                    split_result = split_address_entities("client_address", s)
                    for j in range(4):
                        temp = copy.deepcopy(data.entities[i])
                        temp.type = address_entity_names[j]
                        if address_entity_names[j] == "Borrower_Street_Address":
                            if (
                                "client_address_AdditionalStreetAddressOrPostalBox"
                                in split_result.keys()
                            ):
                                temp.mention_text = (
                                    split_result[
                                        "client_address_StreetAddressOrPostalBox"
                                    ]
                                    + split_result[
                                        "client_address_AdditionalStreetAddressOrPostalBox"
                                    ]
                                )
                            else:
                                temp.mention_text = split_result[
                                    "client_address_StreetAddressOrPostalBox"
                                ]
                        elif address_entity_names[j] == "Borrower_City":
                            temp.mention_text = split_result["client_address_City"]
                        elif address_entity_names[j] == "Borrower_State":
                            temp.mention_text = split_result["client_address_State"]
                        else:
                            temp.mention_text = split_result["client_address_Zip"]
                        data.entities.append(temp)
        for i in deleted_entities[::-1]:
            data.entities[i].type = "Borrower_Full_Address"
        return data

    tempVar = address_function_new(newData)
    tempVar_2 = address_text_anchor(data, tempVar)
    tempVar_3 = address_page_anchor(data, tempVar)
    for i in tempVar_3.entities:
        if i.type != "Borrower_Full_Address":
            maxId += 1
            i.id = str(maxId)
    for i in tempVar_3.entities:
        data.entities.append(i)
    return data
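

# --- Illustrative sketch (an addition, not part of the original tool) ---
# address_page_anchor above merges the bounding boxes of the first and last
# token of an entity into a single box. The hypothetical helper below shows the
# same per-vertex min/max idea on plain (x, y) tuples. It assumes the vertex
# order is top-left, top-right, bottom-right, bottom-left.
def union_bounding_box_sketch(lower_vertices, upper_vertices):
    """Return four (x, y) vertices of a box enclosing both input boxes."""
    (ax, ay), (bx, by), (cx, cy), (dx, dy) = lower_vertices
    (ax2, ay2), (bx2, by2), (cx2, cy2), (dx2, dy2) = upper_vertices
    return [
        (min(ax, ax2), min(ay, ay2)),  # A: smallest x, smallest y
        (max(bx, bx2), min(by, by2)),  # B: largest x, smallest y
        (max(cx, cx2), max(cy, cy2)),  # C: largest x, largest y
        (min(dx, dx2), max(dy, dy2)),  # D: smallest x, largest y
    ]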


def fixAccountBalance(document: documentai.Document) -> documentai.Document:
    """
    Fix the account balances: keep the most frequent value for each beginning
    and ending balance type, retype entities whose value matches another
    balance type, and remove the remaining mismatches.

    Parameters
    ----------
    document : documentai.Document
            The document proto whose balance entities need to be updated.

    Returns
    -------
    documentai.Document
        Returns the updated document proto object.
    """
    from collections import Counter

    def most_frequent(List):
        occurence_count = Counter(List)
        return occurence_count.most_common(1)[0][0]

    tempDict = {}
    beginning_balance_unique = []
    ending_balance_unique = []
    for i in document.entities:
        if "beginning_balance" in i.type:
            if i.type not in beginning_balance_unique:
                beginning_balance_unique.append(i.type)
        if "ending_balance" in i.type:
            if i.type not in ending_balance_unique:
                ending_balance_unique.append(i.type)
    beg_end_dict = {}
    for i in beginning_balance_unique:
        temp = []

        for j in range(0, len(document.entities)):
            if i == document.entities[j].type:
                temp.append(document.entities[j].mention_text.strip("$#"))
        beg_end_dict[i] = most_frequent(temp)
    for i in ending_balance_unique:
        temp = []
        for j in range(0, len(document.entities)):
            if i == document.entities[j].type:
                temp.append(document.entities[j].mention_text.strip("$#"))
        beg_end_dict[i] = most_frequent(temp)
    # Iterate over a copy: removing items from the list being iterated skips entities
    for i in list(document.entities):
        if i.type in beg_end_dict.keys():
            if i.mention_text.strip("$#") != beg_end_dict[
                i.type
            ] and i.mention_text.strip("$#") in list(beg_end_dict.values()):
                i.type = list(beg_end_dict.keys())[
                    list(beg_end_dict.values()).index(i.mention_text.strip("$#"))
                ]
            elif i.mention_text.strip("$#") != beg_end_dict[i.type]:
                document.entities.remove(i)
    return document
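

# --- Illustrative sketch (an addition, not part of the original tool) ---
# fixAccountBalance above picks, per balance type, the value that occurs most
# often after stripping "$" and "#". The hypothetical helper below isolates that
# majority vote on a plain list of strings. It assumes the list is not empty.
from collections import Counter


def most_common_balance_sketch(mention_texts):
    """Return the most frequent balance text after stripping "$" and "#"."""
    cleaned = [text.strip("$#") for text in mention_texts]
    return Counter(cleaned).most_common(1)[0][0]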


def Boundary_markers(
    jsonData: documentai.Document,
) -> documentai.Document:  # TODO : check jsonDict for dict or entity obj
    """
    Mark the account boundaries: number the account_number entities and relabel
    balance and transaction entities with the account region (account_0,
    account_1, ...) that their text start index falls into.

    Parameters
    ----------
    jsonData : documentai.Document
            The document proto whose entities need to be updated.

    Returns
    -------
    documentai.Document
        Returns the updated document proto object.
    """
    allEntities = jsonData.entities
    noOfEntitiesInJsonFile = len(allEntities)

    # Find entityIdSchema of Json
    entityIdSchema = {}
    for i in range(0, noOfEntitiesInJsonFile):
        try:
            if allEntities[i].id:
                entityIdSchema[i] = [int(allEntities[i].id)]
        except:
            temp_arr = []
            for j in allEntities[i].properties:
                temp_arr.append(int(j.id))
            entityIdSchema[i] = temp_arr

    # Single Level Entities file : jsonDict
    jsonDict = {
        "confidence": [],
        "id": [],
        "mention_text": [],
        "normalized_value": [],
        "page_anchor": [],
        "text_anchor": [],
        "type": [],
    }
    entitiesArray = []

    for i in range(0, noOfEntitiesInJsonFile):
        try:
            if allEntities[i].id:
                entitiesArray.append(allEntities[i])
        except:
            for j in allEntities[i].properties:
                entitiesArray.append(j)
    entitiesArray = sorted(entitiesArray, key=lambda x: x.id)
    for i in range(0, len(entitiesArray)):
        try:
            jsonDict["confidence"].append(entitiesArray[i].confidence)
        except:
            jsonDict["confidence"].append("")
        try:
            jsonDict["id"].append(entitiesArray[i].id)
        except:
            jsonDict["id"].append("")
        try:
            jsonDict["mention_text"].append(entitiesArray[i].mention_text)
        except:
            jsonDict["mention_text"].append("")
        try:
            jsonDict["normalized_value"].append(entitiesArray[i].normalized_value)
        except:
            jsonDict["normalized_value"].append("")
        try:
            jsonDict["page_anchor"].append(entitiesArray[i].page_anchor)
        except:
            jsonDict["page_anchor"].append("")
        try:
            jsonDict["text_anchor"].append(entitiesArray[i].text_anchor)
        except:
            jsonDict["text_anchor"].append("")
        try:
            jsonDict["type"].append(entitiesArray[i].type)
        except:
            jsonDict["type"].append("")

    # No startIndex handling
    for i in range(0, len(jsonDict["type"])):
        try:
            if jsonDict["text_anchor"][i]["text_segments"][0]["start_index"]:
                pass
        except:
            try:
                jsonDict["text_anchor"][i]["text_segments"][0]["start_index"] = "0"
            except:
                pass
    accountNumbers = dict()
    for i in range(0, len(jsonDict["id"])):
        if jsonDict["type"][i] == "account_number":
            if (
                re.sub(r"\D", "", jsonDict["mention_text"][i].strip(".#:' "))
                not in accountNumbers
                and len(re.sub(r"\D", "", jsonDict["mention_text"][i].strip(".#:' ")))
                > 5
            ):
                accountNumbers[
                    re.sub(r"\D", "", jsonDict["mention_text"][i].strip(".#:' "))
                ] = ("account_" + str(len(accountNumbers)) + "_number")
    account_number_dict = {}
    import sys

    accountNumberDict = {}
    accountNumberPageDict = {}
    for i in accountNumbers.keys():
        temp_list = []
        temp_page_list = set()
        for j in range(len(jsonDict["mention_text"])):
            if re.sub(r"\D", "", jsonDict["mention_text"][j].strip(".#:' ")) == i:
                page = 0
                # Proto messages are not subscriptable, so use attribute access
                if jsonDict["page_anchor"][j].page_refs[0].page:
                    page = int(jsonDict["page_anchor"][j].page_refs[0].page)
                temp_list.append(
                    (
                        int(jsonDict["text_anchor"][j].text_segments[0].start_index),
                        int(jsonDict["text_anchor"][j].text_segments[0].end_index),
                        page,
                    )
                )
                temp_page_list.add(page)
        accountNumberPageDict[accountNumbers[i]] = temp_page_list
        accountNumberDict[accountNumbers[i]] = temp_list
    n = set(range(len(jsonData.pages)))
    for i in accountNumberPageDict:
        n = n & accountNumberPageDict[i]

    n = list(n)
    for i in accountNumberDict:
        accountNumberDict[i].sort(key=lambda x: x[2])
    accountNumbersToDelete = []
    for i in accountNumberDict:
        if i != "account_0_number":
            minStartIndex = sys.maxsize
            minEndIndex = sys.maxsize
            minPage = sys.maxsize
            tuppleToRemove = []
            if len(accountNumberDict[i]) > 1:
                for j in accountNumberDict[i]:
                    if (
                        j[2] in n
                        and j[2] < 3
                        and len(tuppleToRemove) < len(accountNumberDict[i]) - 1
                    ):
                        tuppleToRemove.append(j)
                    else:
                        minStartIndex = min(minStartIndex, j[0])
                        minEndIndex = min(minEndIndex, j[1])
                        minPage = min(minPage, j[2])
                        tuppleToRemove.append(j)
                # accountNumberDict[i]=[(minStartIndex,minEndIndex,minPage)]
                for k in tuppleToRemove:
                    accountNumberDict[i].remove(k)
                accountNumberDict[i] = [(minStartIndex, minEndIndex, minPage)]
            else:
                accountNumbersToDelete.append(i)
        else:
            minStartIndex = 0
            minEndIndex = 0
            minPage = sys.maxsize
            tuppleToRemove = []
            for j in accountNumberDict[i]:
                if j[2] == n[0]:
                    minStartIndex = max(minStartIndex, j[0])
                    minEndIndex = max(minEndIndex, j[1])
                    minPage = min(minPage, j[2])
                    tuppleToRemove.append(j)
            for k in tuppleToRemove:
                accountNumberDict[i].remove(k)
            accountNumberDict["account_0_number"] = [
                (minStartIndex, minEndIndex, minPage)
            ]
    for i in accountNumbersToDelete:
        del accountNumberDict[i]

    if len(accountNumbers) > 1:
        borderIndex = []
        for i in accountNumberDict:
            borderIndex.append((accountNumberDict[i][0][0], accountNumberDict[i][0][1]))

        regionSplitter = []
        for i in range(0, len(borderIndex)):
            regionSplitter.append(borderIndex[i][0])

        # regionSplitterDict = {0 : 'account_summary'}
        regionSplitterDict = {}
        for i in range(0, len(regionSplitter)):
            regionSplitterDict[int(regionSplitter[i])] = "account_" + str(i)
        regionSplitterDict[len(jsonData.text)] = "last_index"

    else:
        tempVar = len(jsonData.text)
        regionSplitterDict = {tempVar: "account_0"}
        regionSplitterDict[len(jsonData.text) + 1] = "last_index"

    for i in range(0, len(jsonDict["id"])):
        if (
            jsonDict["type"][i] == "account_number"
            and len(re.sub(r"\D", "", jsonDict["mention_text"][i].strip(".#:' "))) > 5
        ):
            jsonDict["type"][i] = accountNumbers[
                re.sub(r"\D", "", jsonDict["mention_text"][i].strip(".#:' "))
            ]

    for i in range(0, len(jsonDict["id"])):
        try:
            # Proto messages are not subscriptable, so use attribute access
            si = jsonDict["text_anchor"][i].text_segments[0].start_index
        except (IndexError, AttributeError):
            continue

        if jsonDict["type"][i] == "starting_balance":
            for j in range(1, len(regionSplitterDict)):
                if int(si) < list(regionSplitterDict.keys())[j]:
                    jsonDict["type"][i] = (
                        regionSplitterDict[list(regionSplitterDict.keys())[j - 1]]
                        + "_beginning_balance"
                    )
                    break
        if jsonDict["type"][i] == "ending_balance":
            for j in range(1, len(regionSplitterDict)):
                if int(si) < list(regionSplitterDict.keys())[j]:
                    jsonDict["type"][i] = (
                        regionSplitterDict[list(regionSplitterDict.keys())[j - 1]]
                        + "_ending_balance"
                    )
                    break

        if jsonDict["type"][i] == "table_item/transaction_deposit_date":
            for j in range(1, len(regionSplitterDict)):
                if int(si) < list(regionSplitterDict.keys())[j]:
                    jsonDict["type"][i] = (
                        regionSplitterDict[list(regionSplitterDict.keys())[j - 1]]
                        + "_transaction"
                        + "/"
                        + "deposit_date"
                    )
                    break

        if jsonDict["type"][i] == "table_item/transaction_deposit_description":
            for j in range(1, len(regionSplitterDict)):
                if int(si) < list(regionSplitterDict.keys())[j]:
                    jsonDict["type"][i] = (
                        regionSplitterDict[list(regionSplitterDict.keys())[j - 1]]
                        + "_transaction"
                        + "/"
                        + "deposit_desc"
                    )
                    break

        if jsonDict["type"][i] == "table_item/transaction_deposit":
            for j in range(1, len(regionSplitterDict)):
                if int(si) < list(regionSplitterDict.keys())[j]:
                    jsonDict["type"][i] = (
                        regionSplitterDict[list(regionSplitterDict.keys())[j - 1]]
                        + "_transaction"
                        + "/"
                        + "deposit_amount"
                    )
                    break

        if jsonDict["type"][i] == "table_item/transaction_withdrawal_date":
            for j in range(1, len(regionSplitterDict)):
                if int(si) < list(regionSplitterDict.keys())[j]:
                    jsonDict["type"][i] = (
                        regionSplitterDict[list(regionSplitterDict.keys())[j - 1]]
                        + "_transaction"
                        + "/"
                        + "withdraw_date"
                    )
                    break

        if jsonDict["type"][i] == "table_item/transaction_withdrawal_description":
            for j in range(1, len(regionSplitterDict)):
                if int(si) < list(regionSplitterDict.keys())[j]:
                    jsonDict["type"][i] = (
                        regionSplitterDict[list(regionSplitterDict.keys())[j - 1]]
                        + "_transaction"
                        + "/"
                        + "withdraw_desc"
                    )
                    break

        if jsonDict["type"][i] == "table_item/transaction_withdrawal":
            for j in range(1, len(regionSplitterDict)):
                if int(si) < list(regionSplitterDict.keys())[j]:
                    jsonDict["type"][i] = (
                        regionSplitterDict[list(regionSplitterDict.keys())[j - 1]]
                        + "_transaction"
                        + "/"
                        + "withdraw_amount"
                    )
                    break

    newEntitiesArray = documentai.Document()
    for i in range(0, len(entitiesArray)):
        newEntitiesArray.entities.append(
            {
                "confidence": jsonDict["confidence"][i],
                "id": jsonDict["id"][i],
                "mention_text": jsonDict["mention_text"][i],
                "normalized_value": jsonDict["normalized_value"][i],
                "page_anchor": jsonDict["page_anchor"][i],
                "text_anchor": jsonDict["text_anchor"][i],
                "type": jsonDict["type"][i],
            }
        )

    newEntitiesArrayToIdDict = {}
    for i in newEntitiesArray.entities:
        newEntitiesArrayToIdDict[int(i.id)] = i

    allEntitiesNewArray = (" " * len(entityIdSchema)).split(" ")
    for i in entityIdSchema:
        if len(entityIdSchema[i]) == 1:
            allEntitiesNewArray[i] = newEntitiesArrayToIdDict[entityIdSchema[i][0]]
        else:
            tempA = []
            for j in range(0, len(entityIdSchema[i])):
                tempA.append(newEntitiesArrayToIdDict[entityIdSchema[i][j]])
            allEntitiesNewArray[i] = allEntities[i]
            allEntitiesNewArray[i].properties = tempA
    allEntitiesNewArray = [x for x in allEntitiesNewArray if x]
    for i in allEntitiesNewArray:
        if i == "":
            allEntitiesNewArray.remove(i)
        if i.id:
            if i.id == "":
                del i.id
        if i.normalized_value:
            if i.normalized_value == "":
                del i.normalized_value
        if i.confidence:
            if i.confidence == "":
                del i.confidence
        if i.page_anchor:
            if i.page_anchor == "":
                del i.page_anchor
        if i.mention_text:
            if i.mention_text == "":
                del i.mention_text
        if i.text_anchor:
            if i.text_anchor == "":
                del i.text_anchor
        if i.properties:
            for j in i.properties:
                if j.normalized_value:
                    if j.normalized_value == "":
                        del j.normalized_value
                if j.confidence:
                    if j.confidence == "":
                        del j.confidence
                if j.page_anchor:
                    if j.page_anchor == "":
                        del j.page_anchor
                if j.id:
                    if j.id == "":
                        del j.id
                if j.mention_text:
                    if j.mention_text == "":
                        del j.mention_text
                if j.text_anchor:
                    if j.text_anchor == "":
                        del j.text_anchor
    for i in allEntitiesNewArray:
        if i.type == "table_item":
            account_prefix = i.properties[0].type.split("/")[0]
            i.type = account_prefix

    newJsonData = jsonData
    newJsonData.entities = allEntitiesNewArray

    return newJsonData
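

# --- Illustrative sketch (an addition, not part of the original tool) ---
# Boundary_markers above repeats one lookup for every entity type: find the
# first boundary (from the second one on) that is larger than the start index
# and use the label of the boundary before it. The hypothetical helper below
# expresses that lookup with bisect. It assumes `boundaries` is sorted in
# ascending order and `labels` is aligned with it.
import bisect


def region_label_sketch(boundaries, labels, start_index):
    """Return the region label for start_index, or None past the last boundary."""
    j = max(bisect.bisect_right(boundaries, start_index), 1)
    if j >= len(boundaries):
        return None
    return labels[j - 1]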


def groupChecks(jsonData: documentai.Document, bankName: str) -> documentai.Document:
    """
    Group the check entities for one of the top 3 banks (Wells Fargo, Bank of America, Chase).
    Parent entities are sorted top to bottom and their properties left to right, then
    regrouped following the check-table column order of the given bank.

    Parameters
    ----------
    jsonData : documentai.Document
            The document proto whose entities need to be updated.
    bankName : str
            Bank name found in the document OCR; it must be one of the keys of bankFormat
            ("wellsfargo", "bankofamerica" or "chase").

    Returns
    -------
    documentai.Document
        Returns the updated document proto object.
    """
    # dictionary storing format of the table for top 3 banks
    bankFormat = {
        "wellsfargo": ["check_number", "check_date", "check_amount"],
        "bankofamerica": ["check_date", "check_number", "check_amount"],
        "chase": ["check_number", "check_date", "check_amount"],
    }
    bankChecksColumn = bankFormat[bankName]
    allEntities = jsonData.entities
    noOfEntitiesInJsonFile = len(allEntities)

    # Find entityIdSchema of Json
    entityIdSchema = {}
    for i in range(0, noOfEntitiesInJsonFile):
        try:
            if allEntities[i].id:
                entityIdSchema[i] = [int(allEntities[i].id)]
        except:
            temp_arr = []
            for j in allEntities[i].properties:
                temp_arr.append(int(j.id))
            entityIdSchema[i] = temp_arr

    # Single Level Entities file : jsonDict
    jsonDict = documentai.Document.Entity()

    entitiesArray = []

    for i in range(0, noOfEntitiesInJsonFile):
        try:
            if allEntities[i].id:
                entitiesArray.append(allEntities[i])
        except:
            for j in allEntities[i].properties:
                entitiesArray.append(j)
    # Sorting the entities using y-coordinates to order them according to the rows
    entitiesArray = sorted(
        entitiesArray,
        key=lambda x: x.page_anchor.page_refs[0].bounding_poly.normalized_vertices[0].y,
    )
    newEntitiesArray = []
    for i in entitiesArray:
        print("-------------------")
        print("Parent : ", i.type, " : ", i.mention_text)
        # Sorting the properties of a single line item using x-coordinates to order them according to the table
        if len(i.properties) > 0:
            x2 = sorted(
                i.properties,
                key=lambda x: x.page_anchor.page_refs[0]
                .bounding_poly.normalized_vertices[0]
                .x,
            )
            j = 0
            while j < len(x2):
                k = 0
                newEntity = documentai.Document.Entity()  # Adding a new parentItem
                newEntity.confidence = i.confidence
                newEntity.mention_text = ""
                xValues = []
                yValues = []
                properties = []
                textSegments = []
                while k < len(bankChecksColumn) and j < len(x2):
                    if x2[j].type == bankChecksColumn[k]:
                        newEntity.mention_text = (
                            newEntity.mention_text + x2[j].mention_text
                        )
                        for m in (
                            x2[j]
                            .page_anchor.page_refs[0]
                            .bounding_poly.normalized_vertices
                        ):
                            xValues.append(m.x)
                            yValues.append(m.y)
                        print(x2[j].type, ":", x2[j].mention_text)
                        properties.append(x2[j])
                        textSegments.append(x2[j].text_anchor.text_segments[0])
                    else:
                        k += 1
                        continue
                    j += 1
                    if j < len(x2) and x2[j].type == bankChecksColumn[k]:
                        newEntity.mention_text = (
                            newEntity.mention_text + x2[j].mention_text
                        )
                        for m in (
                            x2[j]
                            .page_anchor.page_refs[0]
                            .bounding_poly.normalized_vertices
                        ):
                            xValues.append(m.x)
                            yValues.append(m.y)
                        print(x2[j].type, ":", x2[j].mention_text)
                        properties.append(x2[j])
                        textSegments.append(x2[j].text_anchor.text_segments[0])
                        j += 1

                    k += 1
                    # j+=1
                # if len(xValues)>0:
                xParentMax = max(xValues)
                xParentMin = min(xValues)
                yParentMax = max(yValues)
                yParentMin = min(yValues)
                # Build the parent page reference once; both cases share the same box.
                page_ref = {
                    "bounding_poly": {
                        "normalized_vertices": [
                            {"x": xParentMin, "y": yParentMin},
                            {"x": xParentMax, "y": yParentMin},
                            {"x": xParentMax, "y": yParentMax},
                            {"x": xParentMin, "y": yParentMax},
                        ]
                    }
                }
                # Page 0 is the proto default, so only set "page" when it is non-zero.
                if i.page_anchor.page_refs[0].page:
                    page_ref["page"] = i.page_anchor.page_refs[0].page
                newEntity.page_anchor = {"page_refs": [page_ref]}
                newEntity.properties = properties
                newEntity.text_anchor = {"text_segments": textSegments}
                newEntity.type = i.type
                print("*****************")
                print(newEntity)
                newEntitiesArray.append(newEntity)
                print("*****************")
    entitiesArray = newEntitiesArray
    # Drop child IDs and prefix each child type with its parent type.
    for e in entitiesArray:
        for j in e.properties:
            if j.id:
                del j.id
            if j.type:
                print(e.type)
                j.type = e.type + "/" + j.type
    jsonData.entities = entitiesArray
    return jsonData
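

# Illustrative sketch (not part of the original tool): the grouping code above
# builds a parent bounding box from the min/max of all child vertices. The
# helper name `example_enclosing_box` and the (x, y) tuple input are assumptions
# made for this example; the real code reads `normalized_vertices` from the proto.
def example_enclosing_box(vertices):
    """Return the four corners of the axis-aligned box enclosing `vertices`."""
    if not vertices:
        # max()/min() raise ValueError on an empty sequence, so guard first.
        return []
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Same corner order as above: top-left, top-right, bottom-right, bottom-left.
    return [
        {"x": x_min, "y": y_min},
        {"x": x_max, "y": y_min},
        {"x": x_max, "y": y_max},
        {"x": x_min, "y": y_max},
    ]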


def jsonToResultDf(jsonData: documentai.Document, jsonFileName: str) -> pd.DataFrame:
    """
    Convert the document proto into a DataFrame with one row per entity: file name, entity ID, entity type, confidence and text.

    Parameters
    ----------
    jsonData : documentai.Document
            The document proto with the updated data.
    jsonFileName :str
            Document file name.

    Returns
    -------
    pd.DataFrame
        DataFrame with the columns File Name, ID, Entity Type, Confidence and Text.
    """
    import pandas as pd

    allEntities = jsonData.entities
    noOfEntitiesInJsonFile = len(allEntities)

    def get_jsonDict_prop(allEntities, i):
        # Copy the summary fields of entity i into a fresh Entity. Proto fields
        # that are not set read back as their defaults ("" or 0.0) instead of
        # raising, so no try/except is needed. Assigning "" to the float
        # `confidence` field, as the old fallback did, would itself raise.
        source = allEntities[i]
        jsonDict_prop = documentai.Document.Entity()
        jsonDict_prop.id = source.id
        jsonDict_prop.type = source.type
        jsonDict_prop.confidence = source.confidence
        jsonDict_prop.mention_text = source.mention_text
        return jsonDict_prop

    # Single Level Entities file : jsonDict
    jsonDict = {
        "File Name": [],
        "ID": [],
        "Entity Type": [],
        "Confidence": [],
        "Text": [],
    }

    entitiesArray = []

    for i in range(0, noOfEntitiesInJsonFile):
        if allEntities[i].id:
            entitiesArray.append(allEntities[i])
        else:
            # An unset proto field reads back as "" rather than raising, so branch
            # on the value instead of relying on an exception. Entities without
            # an ID (for example, grouped parents) contribute a summary row
            # followed by one row per child property.
            entitiesArray.append(get_jsonDict_prop(allEntities, i))
            for j in allEntities[i].properties:
                entitiesArray.append(j)

    for i in range(0, len(entitiesArray)):
        jsonDict["File Name"].append(jsonFileName)
        try:
            jsonDict["ID"].append(entitiesArray[i].id)
        except:
            jsonDict["ID"].append("")
        try:
            jsonDict["Entity Type"].append(entitiesArray[i].type)
        except:
            jsonDict["Entity Type"].append("")
        try:
            jsonDict["Confidence"].append(entitiesArray[i].confidence)
        except:
            jsonDict["Confidence"].append(None)
        try:
            jsonDict["Text"].append(entitiesArray[i].mention_text)
        except:
            jsonDict["Text"].append("")

    df = pd.DataFrame(jsonDict)
    return df
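

# Illustrative sketch (not part of the original tool): the row layout that
# jsonToResultDf is meant to produce, shown with plain dicts so it runs without
# Document AI or pandas. `example_flatten_entities` and the dict keys on the
# input are assumptions made for this example. An entity with an id becomes one
# row; an entity without an id contributes one row for itself and one row per
# child property.
def example_flatten_entities(entities, file_name):
    rows = []
    for entity in entities:
        members = [entity]
        if not entity.get("id"):
            members += entity.get("properties", [])
        for member in members:
            rows.append(
                {
                    "File Name": file_name,
                    "ID": member.get("id", ""),
                    "Entity Type": member.get("type", ""),
                    "Confidence": member.get("confidence"),
                    "Text": member.get("mention_text", ""),
                }
            )
    return rows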


# Batch (asynchronous) processing of the input files with the Bank Statement Parser
logger(
    "logging.txt",
    "----------------------------------------LOGGING STARTED----------------------------------------",
)
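# Initialise the temp locations so the error path can tell whether they exist.
temp_pdfbucket = temp_pdffolder = None
try:
    temp_intital_pdfpath, temp_pdffolder, temp_pdfbucket = get_files_not_parsed(
        gcs_input_dir, gcs_output_dir
    )
    logger("logging.txt", "Batch processing the documents.......")
    res = batch_process_documents_sample(
        project_id, "us", processor_id, temp_intital_pdfpath, gcs_output_dir
    )
    logger("logging.txt", "Batch processing of documents done")
    delete_folder(temp_pdfbucket, temp_pdffolder)
except Exception as e:
    logger(
        "logging.txt",
        "Files are not processed because of error message--> {}".format(e),
    )
    # Clean up only if get_files_not_parsed got far enough to create the folder;
    # otherwise these names would be unset and the cleanup itself would fail.
    if temp_pdfbucket is not None:
        delete_folder(temp_pdfbucket, temp_pdffolder)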
# Getting bucket names and prefixes for further use
Input_bucket_name = gcs_input_dir.split("/")[2]
prefix_input_files = "/".join(gcs_input_dir.split("/")[3:])
Output_bucket_name = gcs_output_dir.split("/")[2]
prefix_output_files = "/".join(gcs_output_dir.split("/")[3:])
New_output_json_bucket = gcs_new_output_json_path.split("/")[2]
New_prefix_output_jsons = "/".join(gcs_new_output_json_path.split("/")[3:])
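

# Illustrative sketch (not part of the original tool): the six assignments
# above repeat one pattern, splitting "gs://bucket/prefix" on "/" so that
# index 2 is the bucket and the remainder is the object prefix. The helper
# name `example_split_gcs_uri` is an assumption made for this example.
def example_split_gcs_uri(gcs_uri):
    """Split 'gs://bucket/path/to/dir/' into ('bucket', 'path/to/dir/')."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a gs:// URI, got {!r}".format(gcs_uri))
    parts = gcs_uri.split("/")
    return parts[2], "/".join(parts[3:])


# For example, example_split_gcs_uri(gcs_input_dir) would return the same pair
# as (Input_bucket_name, prefix_input_files) above.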


df3 = pd.DataFrame()
try:
    temp_json_path, temp_json_folder, temp_json_bucket = get_files_not_postparsed(
        gcs_output_dir, gcs_new_output_json_path
    )
    json_f, file_dict = file_names(temp_json_path)
    json_files = list(file_dict.values())
    logger("logging.txt", "list of json files prepared in the output folder")
    # delete_folder(temp_json_bucket,temp_json_folder)
    try:
        for i in range(len(json_f)):
            try:
                temp_json_gcs_path = temp_json_path + "/" + json_f[i]
                temp_bucket_name = temp_json_gcs_path.split("/")[2]
                prefix_temp_file_name = "/".join(temp_json_gcs_path.split("/")[3:])
                document = documentai_json_proto_downloader(
                    temp_bucket_name, prefix_temp_file_name
                )
                logger(
                    "logging.txt",
                    "loaded json file--||{}||-- from the output GCS folder".format(
                        json_f[i]
                    ),
                )

                try:
                    maxId = maxIdFinder(document)
                    logger("logging.txt", "Getting the Max id in the json data")
                except Exception as e:
                    logger(
                        "logging.txt",
                        "Couldnt get the Max id because of error message--> {}".format(
                            e
                        ),
                    )
                    continue
                try:
                    document = Boundary_markers(document)
                    logger("logging.txt", "account number, transactions are renamed")
                except Exception as e:
                    logger(
                        "logging.txt",
                        "Couldnt rename account related entities because of error message--> {}".format(
                            e
                        ),
                    )
                    continue
                try:
                    document = fixAccountBalance(document)
                    logger(
                        "logging.txt",
                        "Account balance (starting and ending balance ) is fixed",
                    )
                except Exception as e:
                    logger(
                        "logging.txt",
                        "Couldnt fix starting and ending balance because of error message--> {}".format(
                            e
                        ),
                    )
                    continue
                try:
                    document = accounttype_change(document)
                    logger("logging.txt", "account_type is changed to account_name")
                except Exception as e:
                    logger(
                        "logging.txt",
                        "Couldnt change account_type to account_name because of error message--> {}".format(
                            e
                        ),
                    )
                    continue
                try:
                    document = borrowerNameFix(document)
                    logger(
                        "logging.txt",
                        "entities- client_name is split into first_name, last_name and middle_name if available",
                    )
                except Exception as e:
                    logger(
                        "logging.txt",
                        "Couldnt split the client_name because of error message--> {}".format(
                            e
                        ),
                    )
                    continue
                try:
                    for key in dict_ent_rename.keys():
                        document = ent_rename(document, key, dict_ent_rename[key])
                    logger(
                        "logging.txt",
                        "Some entities are renamed as per the Gateless names given",
                    )
                except Exception as e:
                    logger(
                        "logging.txt",
                        "Couldnt rename some entities because of error message--> {}".format(
                            e
                        ),
                    )
                    continue
                try:
                    document = address_function(document)
                    logger(
                        "logging.txt",
                        "Client address is split into street name,zip code and city",
                    )
                except Exception as e:
                    logger(
                        "logging.txt",
                        "Couldnt split the Client address because of error message--> {}".format(
                            e
                        ),
                    )
                    continue
                try:
                    document = add_total_pages(document)  # TODO : here will start
                    logger("logging.txt", "Adding total pages entity into json")
                except Exception as e:
                    logger(
                        "logging.txt",
                        "Couldnt add total pages entity into json because of error message--> {}".format(
                            e
                        ),
                    )
                    pass
                try:
                    if checksFlag == True:
                        Financial_Institution = ""
                        for e in document.entities:
                            if e.type == "Financial_Institution":
                                Financial_Institution = e.mention_text
                                break
                        file_path = gcs_input_dir + "/" + json_f[i][:-7] + ".pdf"
                        file_bucket_name = file_path.split("/")[2]
                        prefex_file_path = "/".join(file_path.split("/")[3:])
                        Financial_Institution = "".join(
                            Financial_Institution.strip().split()
                        ).lower()
                        if Financial_Institution in [
                            "wellsfargo",
                            "chase",
                            "bankofamerica",
                        ]:
                            storage_client = storage.Client()
                            bucket = storage_client.bucket(file_bucket_name)
                            blob = bucket.blob(prefex_file_path)
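                            # download_as_string() is deprecated in google-cloud-storage;
                            # download_as_bytes() returns the same bytes payload.
                            pdf_bytes = blob.download_as_bytes()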
                            check_json = process_document_sample(
                                project_id,
                                "us",
                                processor_id_checks,
                                file_path,
                                pdf_bytes,
                                processor_version_checks,
                            )
                            check_json_output = groupChecks(
                                check_json.document, Financial_Institution
                            )
                            combined_entities = (
                                document.entities + check_json_output.entities
                            )
                            document.entities = combined_entities
                            # fs.pipe(gcs_new_output_checks_json_path+json_f[i].split('/')[-1],bytes(json.dumps(check_json_output,ensure_ascii=False),'utf-8'),content_type='application/json')
                            logger("logging.txt", "Checks Details are added ")
                        else:
                            logger(
                                "logging.txt",
                                "Financial Institution is out of Scope--> {}".format(
                                    Financial_Institution
                                ),
                            )
                            pass
                    else:
                        pass
                except Exception as e:
                    logger(
                        "logging.txt",
                        "Couldnt find Checks Detail due to error message--> {}".format(
                            e
                        ),
                    )
                    continue
                try:
                    final_files_list, final_files_dict = file_names(
                        gcs_new_output_json_path
                    )
                    store_document_as_json(
                        documentai.Document.to_json(document),
                        New_output_json_bucket,
                        New_prefix_output_jsons + final_files_list[i],
                    )
                    logger(
                        "logging.txt",
                        "post processed json files are moved to gcs postprocessed folder provided",
                    )
                except Exception as e:
                    logger(
                        "logging.txt",
                        "Couldnt upload the post processed json file because of error message--> {}".format(
                            e
                        ),
                    )
                    continue
            except Exception as e:
                print(e)
                continue
    except Exception as e:
        logger(
            "logging.txt",
            "Couldnt load json file because of error message--> {}".format(e),
        )
        pass
except Exception as e:
    logger(
        "logging.txt",
        "Couldnt get list of json files because of error message--> {}".format(e),
    )
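    # Cleanup happens once, after the metadata update below; calling delete_folder
    # here would raise NameError if get_files_not_postparsed failed early.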
    pass
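

# Illustrative sketch (not part of the original tool): the loop above repeats
# the same try / log / continue block for every post-processing step. One way
# to express that pattern is a list of (function, success message, failure
# message) tuples. `example_run_steps` and its `log` callback are assumptions
# made for this example; `log` stands in for logger("logging.txt", ...).
def example_run_steps(document, steps, log):
    """Apply each step in order and return (document, succeeded).

    If a step raises, the failure is logged, the remaining steps are skipped,
    and the document is returned as it was before the failing step.
    """
    for func, ok_message, fail_message in steps:
        try:
            document = func(document)
            log(ok_message)
        except Exception as err:
            log("{} because of error message--> {}".format(fail_message, err))
            return document, False
    return document, True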

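# Set the Content-Type metadata of the post-processed files. Note that an
# IPython "!" shell command does not raise when the command fails, so a gsutil
# failure will not reach the except branch below.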

try:
    !gsutil -m setmeta -h "content-Type:application/json" {gcs_new_output_json_path}*
    logger(
        "logging.txt",
        "meta data for post processed json files changed to application/json ",
    )
except Exception as e:
    logger(
        "logging.txt",
        "Couldnt update meta data for post processed json files changed to application/json  because of error message--> {}".format(
            e
        ),
    )

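try:
    delete_folder(temp_json_bucket, temp_json_folder)
except NameError:
    # get_files_not_postparsed failed before the temp folder names were set.
    pass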
# creating data frame and saving data into csv
try:
    logger("logging.txt", "creating dataframe to create consolidated csv")
    final_files_list, final_files_dict = file_names(gcs_new_output_json_path)
    for i in range(len(final_files_list)):
        json_2 = documentai_json_proto_downloader(
            New_output_json_bucket, New_prefix_output_jsons + final_files_list[i]
        )
        df1 = jsonToResultDf(json_2, final_files_list[i])
        df3 = pd.concat([df3, df1], ignore_index=True)
except Exception as e:
    logger(
        "logging.txt",
        "failed to create dataframe to create consolidated csv because of error--> {}".format(
            e
        ),
    )
    print(e)

try:
    df3.to_csv("Consolidated.csv")
    logger("logging.txt", "Consolidated CSV file created")
except Exception as e:
    logger(
        "logging.txt",
        "Couldnt create consolidated CSV file  because of error message--> {}".format(
            e
        ),
    )


logger(
    "logging.txt",
    "----------------------------------------END OF POST PROCESSING----------------------------------------",
)
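

# Illustrative sketch (not part of the original tool): the checks branch above
# removes all whitespace from the Financial_Institution text and lowercases it
# before comparing against the supported banks. `example_normalize_institution`
# and `EXAMPLE_SUPPORTED_BANKS` are names assumed for this example.
EXAMPLE_SUPPORTED_BANKS = ("wellsfargo", "chase", "bankofamerica")


def example_normalize_institution(name):
    """Collapse a name such as ' Bank Of  America ' to 'bankofamerica'."""
    return "".join(name.strip().split()).lower()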