# DocAI Processor Migration

* Author: docai-incubator@google.com

## Disclaimer

This tool is not supported by the Google engineering team or product team. It is provided and supported on a best-effort basis by the DocAI Incubator Team. No guarantees of performance are implied.

## Purpose and Description

This Python script automates the migration of a Document AI processor from one project to another. It imports the data, creates the schema, and automatically trains the processor using the dataset from the source project.

## Pre-Requisites

* Python: a Jupyter notebook (Vertex AI)
* Permission to grant roles to the service account in both the source and destination projects.


## Installation Procedure

The script consists of Python code. To load and run it, upload the IPYNB file or copy the code into a Vertex AI notebook, then follow the step-by-step procedure below.

Drive link to the IPYNB file: [DocAI_Processor_Migration.ipynb](https://drive.google.com/file/d/10GbjjQl56n79D9kj5GYbED3QPxO_ELB7/view?resourcekey=0-3qE82iFBnbX99AS8r_B3fg)


## Step by Step Procedure

### 1. Identifying the Service Account associated with VertexAI Notebook

In [None]:
!gcloud config list account # This gives your account and active configuration details.

### 2. Granting Required Permissions to the Service Account

<img src="./Images/image_1.png" width=800 height=400></img>

For the migration to work, the service account that runs this notebook (identified in the previous step) needs the two roles below in both the source and destination projects. These roles allow it to create the dataset bucket (if it does not exist) and to read and write all objects.

* Document AI Administrator
* Storage Admin
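
The role grants above can also be scripted. The sketch below only builds the `gcloud projects add-iam-policy-binding` command strings for both projects so that they can be reviewed before running. The role IDs `roles/documentai.admin` and `roles/storage.admin` are assumed to correspond to the two roles listed above, and the project IDs and service account email are placeholders, not values from this notebook.

```python
from typing import List

# Assumed role IDs for "Document AI Administrator" and "Storage Admin".
REQUIRED_ROLES = ["roles/documentai.admin", "roles/storage.admin"]


def build_grant_commands(project_ids: List[str], service_account: str) -> List[str]:
    """Return one gcloud command string per (project, role) pair."""
    commands = []
    for project_id in project_ids:
        for role in REQUIRED_ROLES:
            commands.append(
                f"gcloud projects add-iam-policy-binding {project_id} "
                f'--member="serviceAccount:{service_account}" '
                f'--role="{role}"'
            )
    return commands


# Placeholder values; replace them with your own projects and service account.
for command in build_grant_commands(
    ["source-project-id", "destination-project-id"],
    "notebook-sa@example.iam.gserviceaccount.com",
):
    print(command)
```

Each printed command can be run from a terminal or from a notebook cell prefixed with `!`, by a user who is allowed to change IAM policy on the project.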


### 3. Installing the dependencies


In [None]:
!pip install google-api-core google-cloud-documentai google-cloud-storage tqdm ipywidgets -q

`pip install` installs the Python packages that the notebook needs.
`google-cloud-documentai` is the client library for Document AI, the Google Cloud service for extracting structured information from documents.
`ipywidgets` is a library for creating interactive widgets in Jupyter notebooks.
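
To confirm that the installation succeeded before running the remaining cells, the installed versions can be checked with the standard library. This is an optional sketch and not part of the original notebook; the package names are taken from the `pip install` command above.

```python
from importlib import metadata
from typing import Dict, List, Optional

# Distribution names taken from the pip command above.
REQUIRED_PACKAGES = [
    "google-api-core",
    "google-cloud-documentai",
    "google-cloud-storage",
    "tqdm",
    "ipywidgets",
]


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


for name, version in installed_versions(REQUIRED_PACKAGES).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```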

### 4. Import the modules

In [2]:
# importing necessary modules

from google.api_core.client_options import ClientOptions
import google.auth.transport.requests
from google import auth
from google.cloud import documentai
from google.cloud import storage

import requests
import re
import time
import json
from typing import Dict, List
from tqdm.auto import tqdm

from ipywidgets import Output
from IPython.display import clear_output, display

### 5. Set up the required inputs

* **source_processor_name** - The full resource name of the source processor, which contains the source project number and the source processor ID.

     Ex: projects/**Project_number**/locations/us/processors/**Processor_ID**
* **destination_project_number** - The number of the project to which the processor is migrated.
* **destination_processor_location** - The location of the destination processor.
* **destination_processor_dataset_gcs_uri** - The GCS bucket path used for the destination processor dataset. The bucket is created automatically if it does not exist.
* **source_exported_dataset_gcs_uri** - The GCS bucket path where the dataset from the source processor has been exported. Export the dataset through the user interface first, then enter its path here.
* **destination_exported_dataset_gcs_uri** - The GCS bucket path in the destination project to which the exported dataset is copied. Provide an empty bucket path here.
  
  Allowed path example: gs://bucket
  
  Not allowed path example: gs://bucket/sub_folder
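
The input formats described above can be checked before running the migration. The sketch below is not part of the original notebook: `parse_processor_name` and `is_bucket_root` are hypothetical helper names. Because the script derives bucket names with `split("//")[1]`, the sketch assumes the bucket-only restriction applies to all three GCS inputs, and it also rejects a trailing slash.

```python
import re
from typing import Dict

PROCESSOR_NAME_PATTERN = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/processors/(?P<processor>[^/]+)"
)
# A bucket-only URI: "gs://" followed by a name with no further slashes.
BUCKET_ROOT_PATTERN = re.compile(r"gs://[^/]+")


def parse_processor_name(name: str) -> Dict[str, str]:
    """Split a processor resource name into project, location, and processor."""
    match = PROCESSOR_NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"Unexpected processor name format: {name}")
    return match.groupdict()


def is_bucket_root(gcs_uri: str) -> bool:
    """Return True only for URIs of the form gs://bucket."""
    return BUCKET_ROOT_PATTERN.fullmatch(gcs_uri) is not None


print(parse_processor_name("projects/123/locations/us/processors/abc"))
print(is_bucket_root("gs://bucket"))             # allowed form
print(is_bucket_root("gs://bucket/sub_folder"))  # not allowed form
```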


In [None]:
# Configure the Inputs
kms_key_name = ""
source_processor_name = (
    "projects/<your-project-number>/locations/us/processors/<your-processor-id>"
)
destination_project_number = "<your-project-number>"
destination_processor_location = "us"
destination_processor_dataset_gcs_uri = "gs://<bucket-name-1>"
source_exported_dataset_gcs_uri = "gs://<bucket-name-2>"
destination_exported_dataset_gcs_uri = "gs://<bucket-name-3>"
gcs_documents_train = []
gcs_documents_test = []
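
The two empty lists above are filled later by `move_exported_dataset`, which looks at the first path component of each exported file (`train` or `test`). The standalone sketch below illustrates that grouping without touching Cloud Storage. `split_exported_blobs` is a hypothetical helper, the blob names and bucket are made up, and each document is added to its list once.

```python
from typing import Dict, List, Tuple

GcsDocument = Dict[str, str]


def split_exported_blobs(
    blob_names: List[str], destination_uri: str, prefix: str
) -> Tuple[List[GcsDocument], List[GcsDocument]]:
    """Group copied blobs into train and test lists by their top-level folder."""
    train: List[GcsDocument] = []
    test: List[GcsDocument] = []
    for blob_name in blob_names:
        gcs_document = {
            "gcsUri": f"{destination_uri}/{prefix}/{blob_name}",
            "mimeType": "application/json",
        }
        top_level_folder = blob_name.split("/")[0]
        if top_level_folder == "train":
            train.append(gcs_document)
        elif top_level_folder == "test":
            test.append(gcs_document)
    return train, test


# Made-up blob names; files outside train/ and test/ are ignored.
train_docs, test_docs = split_exported_blobs(
    ["train/doc_1.json", "test/doc_2.json", "unassigned/doc_3.json"],
    "gs://destination-bucket",
    "2023-01-10-15-41-17",
)
print(train_docs)
print(test_docs)
```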

### 6. Run the Required Functions

In [None]:
def create_destination_dataset_bucket(
    project_id: str, destination_exported_dataset_gcs_uri: str
) -> None:
    """
    This function will create destination dataset bucket.

    Args:
      project_id (str): The number representing the Google Cloud project.
      destination_exported_dataset_gcs_uri (str): This is the GCS bucket path where the dataset from the source processor will be copied over to the destination project.
                                                  Simply provide an empty bucket path here.

    Returns:
            None
    """

    client = storage.Client(project=project_id)
    bucket = client.bucket(destination_exported_dataset_gcs_uri.split("//")[1])
    if not bucket.exists():
        tqdm.write(f"Creating bucket {bucket.name}")
        client.create_bucket(bucket)


def move_exported_dataset(
    source_exported_dataset_gcs_uri: str, destination_exported_dataset_gcs_uri: str
) -> None:
    """
    This function copies files from the source exported dataset bucket into the destination exported dataset bucket and splits them into train and test documents.

     Args:
         source_exported_dataset_gcs_uri (str) : This is the bucket path where the dataset from the source processor has been exported.
         destination_exported_dataset_gcs_uri (str): This is the GCS bucket path where the dataset from the source processor will be copied over to the destination project.
                                                   Simply provide an empty bucket path here.

    Returns:
         None
    """

    client = storage.Client()
    bucket_src = client.get_bucket(source_exported_dataset_gcs_uri.split("//")[1])
    blobs_src = client.list_blobs(source_exported_dataset_gcs_uri.split("//")[1])
    bucket_dest = storage.Bucket(
        client, destination_exported_dataset_gcs_uri.split("//")[1]
    )

    from datetime import datetime

    now = datetime.now()
    dt_string = now.strftime("%Y-%m-%d-%H-%M-%S")
    print("date and time =", dt_string)
    for blob_src in blobs_src:
        blob_new = bucket_src.copy_blob(
            blob_src, bucket_dest, new_name=dt_string + "/" + blob_src.name
        )
        print(
            f"Copied [{source_exported_dataset_gcs_uri}/{blob_src.name}] into: [{destination_exported_dataset_gcs_uri}/{dt_string}]"
        )
        gcs_document = {
            "gcsUri": destination_exported_dataset_gcs_uri
            + "/"
            + dt_string
            + "/"
            + blob_src.name,
            "mimeType": "application/json",
        }
        # Add each document once; duplicate entries would be imported twice.
        if blob_src.name.split("/")[0] == "train":
            gcs_documents_train.append(gcs_document)
        if blob_src.name.split("/")[0] == "test":
            gcs_documents_test.append(gcs_document)

    print("gcs_documents_train:")
    print(gcs_documents_train)
    print("\n")
    print("gcs_documents_test:")
    print(gcs_documents_test)


def import_document_by_type(
    destination_processor_name: str,
    gcs_documents: List[Dict[str, str]],
    dataset_type: str,
) -> Dict[str, str]:
    """
    This function imports documents into the destination processor dataset as either test or train documents.

    Args:
        destination_processor_name (str) : Name of the destination processor.
        gcs_documents (list) : List of documents (each a dictionary with "gcsUri" and "mimeType") from the train or test split in the destination exported dataset bucket.
        dataset_type (str) : Takes the values 'DATASET_SPLIT_TEST' or 'DATASET_SPLIT_TRAIN'.

    Returns:
         Dictionary representing the JSON data of the completed import operation.
    """

    tqdm.write("Import document")
    url = get_base_url(destination_processor_name) + "/dataset:importDocuments"
    headers = {"Authorization": f"Bearer {get_access_token()}"}
    import_documents_request = {
        "batch_documents_import_configs": {
            "dataset_split": dataset_type,
            "batch_input_config": {"gcs_documents": {"documents": gcs_documents}},
        }
    }
    import_document_response = requests.post(
        url, headers=headers, json=import_documents_request
    )
    import_document_response.raise_for_status()
    import_document_result = get_operation_result(
        import_document_response.json()["name"]
    )
    return import_document_result


def get_access_token() -> str:
    """
    This function is used as an authentication mechanism to obtain current user / service account credentials.

    Returns:
         A string representing the access token.
    """

    credentials, _ = auth.default()
    credentials.refresh(google.auth.transport.requests.Request())
    return credentials.token


def get_base_url(name: str) -> str:
    """
    The function uses a regular expression to extract a specific part of the input name.

    Args:
       name (str) : This is a string containing some kind of identifier or path.

    Returns:
         A formatted URL string using the extracted location and name.
    """

    location = re.search(r"projects/[^/]+/locations/([^/]+)/.*", name).group(1)
    return f"https://{location}-documentai.googleapis.com/v1beta3/{name}"


def get_operation_result(
    operation_name: str, message: str = "Waiting for operation to finish."
) -> Dict[str, str]:
    """
    This function retrieves the result of a long-running operation.
    It interacts with an API using HTTP requests and uses the tqdm library for progress reporting

    Args:
        operation_name (str): This is a string representing the name or identifier of a long-running operation.
        message (str with default value): This is a string that provides a message to be displayed while waiting for the operation to finish.
                                          It has a default value of "Waiting for operation to finish."

    Returns:
        Dictionary representing JSON data.
    """

    tqdm.write(message, end="")
    url = get_base_url(operation_name)
    # Poll in a loop rather than recursing, so that long-running operations
    # cannot exceed Python's recursion limit.
    while True:
        headers = {"Authorization": f"Bearer {get_access_token()}"}
        get_operation_response = requests.get(url, headers=headers)
        get_operation_response.raise_for_status()
        operation = get_operation_response.json()
        if operation.get("done"):
            break
        time.sleep(1)
        tqdm.write(".", end="")
    tqdm.write("")
    return operation


def get_processor_details(processor_name: str) -> Dict[str, str]:
    """
    This function is used to retrieve the processor details using processor name.

    Args:
       processor_name (str) : This is the processor name for which you want to retrieve the details of a processor.

    Returns:
       Dictionary representing JSON data of processor.
    """

    tqdm.write("Getting processor details")
    url = get_base_url(processor_name)
    headers = {"Authorization": f"Bearer {get_access_token()}"}
    get_processor_response = requests.get(url, headers=headers)
    get_processor_response.raise_for_status()
    return get_processor_response.json()


def get_processor_version_details(processor_name: str, version_name: str) -> str:
    """
    This function is used to get the processor version details.

    Args:
       processor_name (str) : This is the name of the processor for which you want to retrieve details.
       version_name (str) : This is the name of the version for which you want to retrieve details.

    Returns:
         A String containing the deployed_version displayName.
    """

    tqdm.write("Getting processor version details")
    url = get_base_url(processor_name) + "/processorVersions"
    headers = {"Authorization": f"Bearer {get_access_token()}"}
    get_processor_version_response = requests.get(url, headers=headers)
    get_processor_version_response.raise_for_status()
    deployed_version = ""
    for data in get_processor_version_response.json()["processorVersions"]:
        if data["name"] == version_name and data["state"] == "DEPLOYED":
            deployed_version = data["displayName"]
            print(deployed_version)
            break
    return deployed_version


def get_processor_dataset_schema(processor_name: str) -> Dict[str, str]:
    """
    This function is used to get the processor dataset schema.

    Args:
       processor_name (str) : This is the name of the processor for which you want to retrieve dataset schema.

    Returns:
        Dictionary representing JSON data of processor schema.
    """

    tqdm.write("Getting processor dataset schema")
    url = get_base_url(processor_name) + "/dataset/datasetSchema"
    headers = {"Authorization": f"Bearer {get_access_token()}"}
    get_schema_response = requests.get(url, headers=headers)
    get_schema_response.raise_for_status()
    return get_schema_response.json()


def create_processor(
    project_id: str,
    location: str,
    processor_details: Dict[str, str],
    kms_key_name: str = "",
) -> str:
    """
    This function is used to create a processor in the destination project.

    Args:
       project_id (str): This is a string representing the ID of the project.
       location (str): This is a string representing the location of the project.
       processor_details (dictionary): This is a dictionary containing details about the processor being created.
       kms_key_name (str): This is a string representing the Key Management Service (KMS) key name.
                           It has a default value of an empty string.

    Returns:
          A string representing the name of the created processor.
    """

    tqdm.write("Create processor")
    url = f"https://{location}-documentai.googleapis.com/uiv1beta3/projects/{project_id}/locations/{location}/processors"
    headers = {"Authorization": f"Bearer {get_access_token()}"}
    create_processor_request = {
        "type": processor_details["type"],
        "displayName": processor_details["displayName"] + "_v2",
    }
    # enable CMEK if kms_key_name not empty
    if kms_key_name:
        create_processor_request["kms_key_name"] = kms_key_name
    create_processor_response = requests.post(
        url, headers=headers, json=create_processor_request
    )
    create_processor_response.raise_for_status()
    return create_processor_response.json()["name"]


def add_processor_dataset(
    processor_name: str, dataset_gcs_uri: str, project_id: str
) -> Dict[str, str]:
    """
    This function is used to add processor dataset into destination project.

    Args:
        processor_name (str): This is a string representing the name or identifier of the processor.
        dataset_gcs_uri (str): This is a string representing the URI of the dataset in Google Cloud Storage.
        project_id (str): This is a string representing the ID of the Document AI project.

    Returns:
        Return value would likely be a JSON object containing information about the operation status or result.
    """

    tqdm.write("Add processor dataset")
    # first check if bucket of dataset_gcs_uri exists
    create_destination_dataset_bucket(project_id, dataset_gcs_uri)
    url = get_base_url(processor_name) + "/dataset"
    headers = {"Authorization": f"Bearer {get_access_token()}"}
    update_dataset_request = {
        "gcsManagedConfig": {"gcsPrefix": {"gcsUriPrefix": dataset_gcs_uri}},
        "spannerIndexingConfig": {},
    }
    add_dataset_response = requests.patch(
        url, headers=headers, json=update_dataset_request
    )
    add_dataset_response.raise_for_status()
    add_dataset_result = get_operation_result(add_dataset_response.json()["name"])
    return add_dataset_result


def update_processor_dataset_schema(
    processor_name: str, schema: Dict[str, str]
) -> Dict[str, str]:
    """
    This function is responsible for updating the processor dataset schema in Document AI Project.

    Args:
       processor_name (str) : This is a string representing the name or identifier of the processor.
       schema (dictionary) : This is a dictionary containing the updated schema for the dataset.

    Returns:
       Dictionary representing JSON data likely to have information about the status of the schema update.
    """

    tqdm.write("Updating processor dataset schema")
    url = get_base_url(processor_name) + "/dataset/datasetSchema"
    headers = {"Authorization": f"Bearer {get_access_token()}"}
    update_schema_response = requests.patch(url, headers=headers, json=schema)
    update_schema_response.raise_for_status()
    return update_schema_response.json()


def get_dataset_split_stats(processor_name: str) -> Dict[str, str]:
    """
    This function retrieves statistics about dataset splits associated with a processor in a Document AI project.

    Args:
       processor_name (str) : This is a string representing the name or identifier of the processor.

    Returns:
       Dictionary representing JSON data contains information about the dataset split statistics.
    """

    tqdm.write("Getting dataset split statistics")
    url = get_base_url(processor_name) + "/dataset:getAllDatasetSplitStats"
    headers = {"Authorization": f"Bearer {get_access_token()}"}
    get_dataset_split_stats_response = requests.get(url, headers=headers)
    get_dataset_split_stats_response.raise_for_status()
    return get_dataset_split_stats_response.json()


def list_processor_dataset_documents(
    processor_name: str,
    page_size: int = 50,
    next_page_token: str = None,
    dataset_split: str = None,
) -> Dict[str, str]:
    """
    This function will list the processor dataset documents.

    Args:
        processor_name (str) : This is a string representing the name or identifier of the processor.
        page_size (int) : This parameter is optional and represents the number of documents to retrieve per page. If not provided, it defaults to 50.
        next_page_token (str) : This parameter is optional and is used for pagination. It represents a token that indicates which page of results to retrieve next.
        dataset_split (str) : This parameter is optional and represents a specific split of the dataset. If provided, it filters the documents based on this split type.

    Returns:
        JSON content of the response which is about the listed documents.
    """

    tqdm.write("List documents in processor dataset")
    url = get_base_url(processor_name) + "/dataset:listDocuments"
    headers = {"Authorization": f"Bearer {get_access_token()}"}
    # page_size is always sent; page_token only when continuing a listing.
    list_documents_request = {"page_size": page_size}
    if next_page_token:
        list_documents_request["page_token"] = next_page_token
    if dataset_split:
        list_documents_request["filter"] = f"SplitType={dataset_split}"
    list_documents_response = requests.post(
        url, headers=headers, json=list_documents_request
    )
    list_documents_response.raise_for_status()
    return list_documents_response.json()


def get_document(
    processor_name: str, document_metadata: Dict[str, str]
) -> Dict[str, str]:
    """
    This function retrieves a specific document from the dataset of the given processor.

    Args:
       processor_name (str): This is a string representing the name or identifier of the processor.
       document_metadata (dictionary): This is a dictionary containing metadata about the document to retrieve.

    Returns:
       Dictionary representing the JSON data of the document.
    """

    tqdm.write("Get document")
    url = get_base_url(processor_name) + "/dataset:getDocument"
    headers = {"Authorization": f"Bearer {get_access_token()}"}
    params = {
        "documentId.gcsManagedDocId.gcsUri": document_metadata["documentId"][
            "gcsManagedDocId"
        ]["gcsUri"]
    }
    get_document_response = requests.get(url, headers=headers, params=params)
    get_document_response.raise_for_status()
    return get_document_response.json()["document"]


def upload_document(
    destination_dataset_gcs_uri: str, display_name: str, document: Dict[str, str]
) -> str:
    """
    This function is used to upload document into a temporary GCS bucket.

    Args:
       destination_dataset_gcs_uri (str) : This is the GCS bucket path where the document will be copied over to the destination project dataset.
       display_name (str) : This is a string which contains display name of the document.
       document (dictionary) : This is a dictionary representing the content of the document.

    Returns:
       A string representing the GCS URI.
    """

    tqdm.write("Upload document to temporary GCS import location")
    storage_client = storage.Client()
    gcs_uri = destination_dataset_gcs_uri.strip("/") + "/import/" + display_name
    blob = storage.Blob.from_string(gcs_uri, storage_client)
    blob.upload_from_string(json.dumps(document), content_type="application/json")
    return gcs_uri


def remove_imported_document(gcs_uri: str) -> None:
    """
    This function is used to remove the imported documents from temporary bucket.

    Args:
       gcs_uri (str) : This is the bucket from which documents needs to be removed.
    """

    tqdm.write("Remove document from temporary GCS import location")
    storage_client = storage.Client()
    blob = storage.Blob.from_string(gcs_uri, storage_client)
    blob.delete()


def migrate_documents(
    source_processor_name: str,
    destination_processor_name: str,
    destination_dataset_gcs_uri: str,
) -> None:
    """
    This function is used to migrate documents from source processor to destination processor.

    Args:
        source_processor_name (str) : This is a String containing the source processor name of Document AI project.
        destination_processor_name (str) : This is a String containing the destination processor name of Document AI project.
        destination_dataset_gcs_uri (str) : The GCS bucket path which is used for the destination processor dataset, automatically created if it does not exist.

    Raise:
        ValueError : "List document response is missing documentMetadata"
    """

    get_dataset_split_stats_response = get_dataset_split_stats(source_processor_name)
    total_documents = sum(
        dataset_split_stat.get("datasetStats", {}).get("documentCount", 0)
        for dataset_split_stat in get_dataset_split_stats_response["splitStats"]
    )
    print(total_documents)
    progress_bar = tqdm(
        total=total_documents, desc="Migrating documents", unit="document(s)"
    )
    print(f"Migrating {total_documents} documents")

    counter = 0
    s = set()
    for dataset_split in [
        "DATASET_SPLIT_TEST",
        "DATASET_SPLIT_TRAIN",
        "DATASET_SPLIT_UNASSIGNED",
    ]:
        total_documents = sum(
            dataset_split_stat.get("datasetStats", {}).get("documentCount", 0)
            for dataset_split_stat in get_dataset_split_stats_response["splitStats"]
            if dataset_split_stat.get("type", "") == dataset_split
        )
        print(
            f" Migrating {total_documents} documents of dataset split type {dataset_split}"
        )
        next_page_token = None
        while True:
            out = Output()
            display(out)
            with out:
                list_documents_response = list_processor_dataset_documents(
                    source_processor_name,
                    next_page_token=next_page_token,
                    dataset_split=dataset_split,
                )
                clear_output()
            if not list_documents_response:
                break
            if "documentMetadata" in list_documents_response:
                document_metadata_list = list_documents_response["documentMetadata"]
            else:
                raise ValueError("List document response is missing documentMetadata")
            print(f"  Migrating batch of {len(document_metadata_list)} documents")
            out = Output()
            display(out)
            with out:
                gcs_documents = []
                for document_metadata in document_metadata_list:
                    document = get_document(source_processor_name, document_metadata)

                    if document_metadata["displayName"] not in s:
                        gcs_uri = upload_document(
                            destination_dataset_gcs_uri,
                            document_metadata["displayName"],
                            document,
                        )
                        gcs_document = {
                            "gcsUri": gcs_uri,
                            "mimeType": "application/json",
                        }
                        gcs_documents.append(gcs_document)
                        s.add(document_metadata["displayName"])
                    else:
                        print(
                            "removed document as it is already present in the new processor",
                            dataset_split,
                        )
                    counter += 1
                    clear_output()

                for gcs_document in gcs_documents:
                    try:
                        remove_imported_document(gcs_document["gcsUri"])
                        clear_output()
                    except Exception as error:
                        print(f"file removal error: {error}")
                progress_bar.update(len(document_metadata_list))
                # Stop when the response carries no token for a further page.
                next_page_token = list_documents_response.get("nextPageToken")
                if not next_page_token:
                    break
            print("dataset_split ", dataset_split)
            print("len(set)= ", len(s))
        print("set = ", s)


def train_processor(
    destination_processor_name: str, version_display_name: str
) -> Dict[str, str]:
    """
    This function is used to train the destination processor for the required version.

    Args:
       destination_processor_name (str) : This is the name of the destination processor that you want to train.
       version_display_name (str) : This is the display name to give the processor version that is trained.

    Returns:
        Dictionary of JSON data.
    """

    tqdm.write("Training Processor")
    url = get_base_url(destination_processor_name) + "/processorVersions:train"
    headers = {"Authorization": f"Bearer {get_access_token()}"}
    train_processor_request = {
        "processorVersion": {"displayName": version_display_name}
    }
    train_processor_response = requests.post(
        url, headers=headers, json=train_processor_request
    )
    train_processor_response.raise_for_status()
    return train_processor_response.json()


def deploy_processor(trained_processor_version: str) -> Dict[str, str]:
    """
    This function deploys a trained processor version.

    Args:
       trained_processor_version (str) : This is the full resource name of the trained processor version.

    Returns:
       Dictionary representing the JSON data of the completed deploy operation.
    """

    tqdm.write("Deploying Processor")
    url = get_base_url(trained_processor_version) + ":deploy"
    headers = {"Authorization": f"Bearer {get_access_token()}"}
    deploy_processor_response = requests.post(url, headers=headers)
    deploy_processor_response.raise_for_status()
    deploy_processor_result = get_operation_result(
        deploy_processor_response.json()["name"]
    )
    return deploy_processor_result


def migrate_processor(
    source_processor_name: str,
    destination_project_id: str,
    destination_processor_location: str,
    destination_dataset_gcs_uri: str,
    kms_key_name: str,
) -> None:
    """
    This is the main function which we need to run for migration of processor from one project to another.

    Args:
       source_processor_name (str) : This is a String containing the source processor name of Document AI project.
       destination_project_id (str) : This is a String containing the destination project ID of the destination project.
       destination_processor_location (str) : This is a String containing the destination project processor location.
       destination_dataset_gcs_uri (str) : The GCS bucket path which is used for the destination processor dataset, automatically created if it does not exist.
       kms_key_name (str) : This is a string representing the Key Management Service (KMS) key name.
    """

    processor_details = get_processor_details(source_processor_name)
    tqdm.write(
        f"Migrating processor {processor_details['displayName']} of type {processor_details['type']}"
    )
    destination_processor_name = create_processor(
        destination_project_id,
        destination_processor_location,
        processor_details,
        kms_key_name,
    )
    tqdm.write(
        f"Destination processor created with processor name {destination_processor_name}"
    )
    add_processor_dataset(
        destination_processor_name, destination_dataset_gcs_uri, destination_project_id
    )
    schema = get_processor_dataset_schema(source_processor_name)
    update_processor_dataset_schema(destination_processor_name, schema)
    create_destination_dataset_bucket(
        destination_project_id, destination_exported_dataset_gcs_uri
    )
    move_exported_dataset(
        source_exported_dataset_gcs_uri, destination_exported_dataset_gcs_uri
    )
    import_document_by_type(
        destination_processor_name, gcs_documents_train, "DATASET_SPLIT_TRAIN"
    )
    import_document_by_type(
        destination_processor_name, gcs_documents_test, "DATASET_SPLIT_TEST"
    )

    tqdm.write(
        f"Link to UI of migrated processor dataset: https://console.cloud.google.com/ai/document-ai/{'/'.join(destination_processor_name.split('/')[2:])}/dataset?project={destination_project_id}"
    )
    version_display_name = get_processor_version_details(
        source_processor_name, processor_details["defaultProcessorVersion"]
    )
    trained_processor_response = train_processor(
        destination_processor_name, version_display_name
    )

**NOTE**: `migrate_processor` starts training but does not deploy the resulting processor version. To deploy it from code, call `deploy_processor` with the name of the trained processor version once training is complete. Alternatively, deploy the processor manually through the user interface once training is complete.
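
Deploying from code means waiting for the training operation to finish and then reading the trained version name from its result. The sketch below shows that flow with simulated operation states instead of real API calls. `wait_for_operation` and `trained_version_name` are hypothetical helpers, and the `response.processorVersion` field is an assumption about the shape of the finished training operation that should be verified against an actual response.

```python
import time
from typing import Any, Callable, Dict

Operation = Dict[str, Any]


def wait_for_operation(
    fetch: Callable[[], Operation],
    poll_seconds: float = 1.0,
    max_polls: int = 3600,
) -> Operation:
    """Call fetch() until the operation reports done, then return it."""
    for _ in range(max_polls):
        operation = fetch()
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"Operation failed: {operation['error']}")
            return operation
        time.sleep(poll_seconds)
    raise TimeoutError("Operation did not finish within the polling budget")


def trained_version_name(operation: Operation) -> str:
    # Assumption: the finished training operation carries the new version
    # name under response.processorVersion.
    return operation["response"]["processorVersion"]


# Simulated operation states standing in for real API responses.
states = iter(
    [
        {"name": "operations/example"},
        {
            "name": "operations/example",
            "done": True,
            "response": {"processorVersion": "example-processor-version"},
        },
    ]
)
finished = wait_for_operation(lambda: next(states), poll_seconds=0)
print(trained_version_name(finished))
```

In the notebook, `fetch` could wrap the authenticated `requests.get` call used in `get_operation_result`, and the returned name could then be passed to `deploy_processor`. Training can take a long time, so the polling interval and `max_polls` would likely need to be raised.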

### 7. Execute the Processor Migration code

In [None]:
processor = migrate_processor(
    source_processor_name,
    destination_project_number,
    destination_processor_location,
    destination_processor_dataset_gcs_uri,
    kms_key_name,
)

**NOTE**: If you encounter a rate-limiting error, open the destination processor (its ID appears in the output of the command above) and trigger the training manually. If training fails because a minimum requirement is not met, fix the issue in the UI and trigger the training from there.
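
As an alternative to retrying by hand, a rate-limited request could be retried with exponential backoff. This is a generic sketch and not part of the original notebook: `RateLimitError` is a hypothetical stand-in for however the rate-limiting error surfaces (for example an HTTP 429 raised by `raise_for_status()`), and `call_with_backoff` is a hypothetical helper.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a rate-limiting error from the API."""


def call_with_backoff(
    action: Callable[[], T],
    max_attempts: int = 5,
    base_delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run action(), doubling the wait after each rate-limit failure."""
    for attempt in range(max_attempts):
        try:
            return action()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay_seconds * (2**attempt))
    raise ValueError("max_attempts must be at least 1")


# Demo with a fake request that fails twice before succeeding.
attempts = {"count": 0}


def flaky_training_request() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("quota exceeded")
    return "training started"


delays: List[float] = []
# Record the delays instead of sleeping so the demo finishes immediately.
print(call_with_backoff(flaky_training_request, sleep=delays.append))
print(delays)
```

If the error persists after several attempts, triggering the training from the UI as described above remains the fallback.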

### 8. Output


Getting processor details
Migrating processor test_processor of type CUSTOM_EXTRACTION_PROCESSOR
Create processor
Destination processor created with processor name projects/XXXXXXXXX/locations/us/processors/XXXXXXXXXX
Add processor dataset
Waiting for operation to finish....
Getting processor dataset schema
Updating processor dataset schema
date and time = 2023-01-10-15-41-17
.............................................
................................
.......................
Import document
Waiting for operation to finish....
Import document
Waiting for operation to finish....
Link to UI of migrated processor dataset: https://console.cloud.google.com/ai/document-ai/locations/us/processors/XXXXXXXXX/dataset?project=XXXXXXXX
Getting processor version details
Training Processor
Waiting for operation to finish....


### Screenshot of the source project and its schema
<img src="./Images/Processor_A_Schema.png" width=800 height=400></img>
### Screenshot of the destination project, which has the same schema as the source project
<img src="./Images/Processor_B_Schema.png" width=800 height=400></img>

##### Use the link in the output to find and access the newly created processor.

### 9. Deploy the processor


After the new processor has completed training, deploy it by navigating to the "Manage versions" tab in the user interface. 