# Document AI: Migrating a Schema Between Processors

* Author: docai-incubator@google.com

## Objective

This notebook uses the Google Cloud Document AI client library to copy a Document AI dataset schema from one processor to another. The source and destination processors can be in the same Google Cloud project or in different projects.

## Prerequisites

* Vertex AI Notebook
* Access to Projects and Document AI Processors


## Step-by-Step Procedure

### 1. Import the Libraries

In [None]:
from google.cloud import documentai_v1beta3

### 2. Configure the Inputs

* **source_processor_id** : ID of the source processor, as shown in the source processor details.
* **destination_processor_id** : ID of the destination processor, as shown in the destination processor details.
* **source_project_id** : Project ID of the project that contains the source processor.
* **destination_project_id** : Project ID of the project that contains the destination processor. Use the same value as **source_project_id** when migrating within one project.

In [None]:
source_processor_id = "XXX-XXX-XXX"  # Source Processor ID
destination_processor_id = "YYY-YYY-YYY"  # Destination Processor ID
source_project_id = "ZZZ-ZZZ-ZZZ"  # Source Project ID
destination_project_id = "ZZZ-ZZZ-ZZZ"  # Destination Project ID
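
The resource names used in the next step hard-code the `us` location. As a minimal sketch, the helpers below build the dataset schema resource name for any location, together with a matching regional endpoint. The function names `dataset_schema_name` and `regional_endpoint` are hypothetical (they are not part of the Document AI library), and the `{location}-documentai.googleapis.com` endpoint pattern is an assumption to verify for your region.

```python
def dataset_schema_name(project_id: str, processor_id: str, location: str = "us") -> str:
    """Build the dataset schema resource name for a processor (hypothetical helper)."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/processors/{processor_id}/dataset/datasetSchema"
    )


def regional_endpoint(location: str = "us") -> str:
    """Return the assumed regional API endpoint for a Document AI location."""
    return f"{location}-documentai.googleapis.com"


# Placeholder values for illustration only.
print(dataset_schema_name("my-project", "my-processor"))
print(regional_endpoint("eu"))
```

For a processor outside `us`, the endpoint would typically be passed when creating the client, for example `DocumentServiceClient(client_options={"api_endpoint": regional_endpoint("eu")})`.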

### 3. Execute the code

In [None]:
# The default client endpoint is used here, matching the hard-coded "us"
# location in the resource names below.
client = documentai_v1beta3.DocumentServiceClient()

# Read the dataset schema of the source processor.
request = documentai_v1beta3.GetDatasetSchemaRequest(
    name=f"projects/{source_project_id}/locations/us/processors/{source_processor_id}/dataset/datasetSchema",
    visible_fields_only=True,
)
old_schema = client.get_dataset_schema(request=request)

# print(old_schema)  # Uncomment to inspect the source schema

# Point the schema at the destination processor's dataset.
old_schema.name = f"projects/{destination_project_id}/locations/us/processors/{destination_processor_id}/dataset/datasetSchema"

# Overwrite the destination schema with the source schema.
request = documentai_v1beta3.UpdateDatasetSchemaRequest(dataset_schema=old_schema)
response = client.update_dataset_schema(request=request)

print("Schema Updated")

The code above reads the dataset schema from the source processor and writes it to the destination processor.

**Note**:
* The **visible_fields_only=True** parameter in the GetDatasetSchemaRequest ensures that only the enabled fields from the source schema are transferred. If set to False, all fields from the source schema, regardless of their visibility status, will be transferred to the destination schema.

* When the schema is transferred, any existing schema in the destination processor is overwritten and replaced with the schema from the source processor.
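
Because the update overwrites the destination schema, it can be useful to keep a copy of the previous schema before migrating. The following is a minimal sketch, not a definitive implementation: `migrate_schema` is a hypothetical helper that expects a `documentai_v1beta3.DocumentServiceClient` as `client` and passes requests as dictionaries, which the generated client methods accept in place of request objects.

```python
def migrate_schema(client, source_name, destination_name, visible_fields_only=True):
    """Copy a dataset schema and return the destination's previous schema.

    Hypothetical helper: `client` is assumed to be a
    documentai_v1beta3.DocumentServiceClient; both names are full
    ".../dataset/datasetSchema" resource names.
    """
    # Fetch the destination's current schema (all fields) so it can be restored.
    previous = client.get_dataset_schema(
        request={"name": destination_name, "visible_fields_only": False}
    )

    # Fetch the source schema, optionally limited to enabled fields.
    schema = client.get_dataset_schema(
        request={"name": source_name, "visible_fields_only": visible_fields_only}
    )

    # Retarget the schema and overwrite the destination.
    schema.name = destination_name
    client.update_dataset_schema(request={"dataset_schema": schema})
    return previous
```

If the migration needs to be undone, the returned schema can be written back with `client.update_dataset_schema(request={"dataset_schema": previous})`.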


### 4. Output

* Source Project Processor Schema:

<img src = "./Images/Project_A_Source_Schema.png" width=800 height=400 alt="Project_A_Source_Schema"></img>

* Destination Project Processor Schema:

<img src = "./Images/Project_B_Destination.png" width=800 height=400 alt="Project_B_Destination"></img>
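
The result can also be checked programmatically rather than by comparing screenshots. The sketch below assumes that a fetched dataset schema exposes its fields as `document_schema.entity_types`, each with a `name` and a list of `properties` that also have a `name`. The helper names `schema_fields` and `schema_diff` are hypothetical, and entity types without properties are not reported.

```python
def schema_fields(dataset_schema):
    """Return the set of (entity type name, property name) pairs in a schema."""
    fields = set()
    for entity_type in dataset_schema.document_schema.entity_types:
        for prop in entity_type.properties:
            fields.add((entity_type.name, prop.name))
    return fields


def schema_diff(source_schema, destination_schema):
    """Report fields present in only one of the two schemas."""
    source = schema_fields(source_schema)
    destination = schema_fields(destination_schema)
    return {
        "missing_in_destination": sorted(source - destination),
        "extra_in_destination": sorted(destination - source),
    }
```

After a migration, fetching both schemas with the same `visible_fields_only` setting and calling `schema_diff` on them should return two empty lists if the transfer succeeded.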