# CMEK-DocAI-Processor



* Author: docai-incubator@google.com

## Disclaimer

This tool is not supported by the Google engineering team or product team. It is provided and supported on a best-effort basis by the DocAI Incubator Team. No guarantees of performance are implied. 


## Objective

This document describes how to create a DocAI processor protected by a CMEK key, rotate the key on a regular schedule, and delete the old processor after copying its trained processor version into a new one.



## Prerequisites

* Vertex AI Notebook or Colab (if using Colab, authenticate first)
* Details of the source processor to import
* Permissions for Google Cloud Storage and Vertex AI Notebook
* GCS path where the labeled documents are stored

## Step by Step procedure

## 1. CMEK KEY CREATION & DESTROYING

### Install Libraries



In [None]:
!pip install google-cloud-kms
!pip install google-cloud-functions
!pip install google-cloud-logging
!pip install google-cloud-documentai
!pip install google-cloud-storage

### IAM Roles

To manage keys from a Python Jupyter notebook, the service account used by the notebook needs the roles below to create the CMEK key and to create a processor with that key.

* Cloud KMS Admin
* Cloud Document Understanding AI Resource Admin (or Document AI Admin)
* Storage Admin


### Key ring creation - [Reference Documentation](https://cloud.google.com/kms/docs/create-key-ring)
* A key ring must exist before a CMEK key can be created in it.

### Import the Required libraries

In [12]:
import time

from google.api_core.client_options import ClientOptions
from google.api_core.exceptions import NotFound
from google.cloud import functions_v2, kms, kms_v1, storage
import google.cloud.documentai_v1beta3 as documentai
from google.cloud.documentai_v1beta3 import DocumentProcessorServiceClient

#### Input & Function

In [None]:
Project_id = "XXXXXXXXXXXXXXXXX"  # ID of the Google Cloud project
Location_id = "xxxx"  # Location in which the CMEK key will be created
Key_ring_id = "xxx-xxx-x"  # Unique name for the new key ring


def create_key_ring(project_id, location_id, key_ring_id):
    """
    Creates a new key ring in Cloud KMS

    Args:
        project_id (string): Google Cloud project ID (e.g. 'my-project').
        location_id (string): Cloud KMS location (e.g. 'us-east1').
        key_ring_id (string): ID of the key ring to create (e.g. 'my-key-ring').

    Returns:
        KeyRing: Cloud KMS key ring.

    """

    # Create the client.
    client = kms.KeyManagementServiceClient()

    # Build the parent location name.
    location_name = f"projects/{project_id}/locations/{location_id}"

    # Build the key ring.
    key_ring = {}

    # Call the API.
    created_key_ring = client.create_key_ring(
        request={
            "parent": location_name,
            "key_ring_id": key_ring_id,
            "key_ring": key_ring,
        }
    )
    print("Created key ring: {}".format(created_key_ring.name))
    return created_key_ring
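
Before calling the API, it can help to validate the IDs locally. The sketch below assumes the Cloud KMS naming rule for key ring and key IDs (letters, digits, underscores, and hyphens, 1 to 63 characters). The helper names `validate_kms_id` and `key_ring_name` are illustrative and are not part of the KMS client library.

```python
import re

# Assumption: KMS resource IDs use letters, digits, "_" and "-", 1 to 63 characters.
_KMS_ID_PATTERN = re.compile(r"[a-zA-Z0-9_-]{1,63}")


def validate_kms_id(resource_id: str) -> str:
    """Return resource_id unchanged if it looks like a valid KMS ID, else raise ValueError."""
    if not _KMS_ID_PATTERN.fullmatch(resource_id):
        raise ValueError(f"Invalid Cloud KMS ID: {resource_id!r}")
    return resource_id


def key_ring_name(project_id: str, location_id: str, key_ring_id: str) -> str:
    """Build the full key ring resource name (hypothetical helper)."""
    validate_kms_id(key_ring_id)
    return f"projects/{project_id}/locations/{location_id}/keyRings/{key_ring_id}"


print(key_ring_name("my-project", "us-east1", "my-key-ring"))
```

A call such as `create_key_ring(Project_id, Location_id, validate_kms_id(Key_ring_id))` would then fail fast on a malformed ID instead of waiting for the API to reject it.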

### Key creation without rotation - [Documentation Link](https://cloud.google.com/kms/docs/create-key)
* The CMEK key is created inside the key ring from the previous step, using the same project and location.

In [5]:
def create_key_symmetric_encrypt_decrypt(project_id, location_id, key_ring_id, key_id):
    """
    Creates a new symmetric encryption/decryption key in Cloud KMS.

    Args:
        project_id (string): Google Cloud project ID (e.g. 'my-project').
        location_id (string): Cloud KMS location (e.g. 'us-east1').
        key_ring_id (string): ID of the Cloud KMS key ring (e.g. 'my-key-ring').
        key_id (string): ID of the key to create (e.g. 'my-symmetric-key').

    Returns:
        CryptoKey: Cloud KMS key.

    """

    # Create the client.
    client = kms.KeyManagementServiceClient()

    # Build the parent key ring name.
    key_ring_name = client.key_ring_path(project_id, location_id, key_ring_id)

    # Build the key.
    purpose = kms.CryptoKey.CryptoKeyPurpose.ENCRYPT_DECRYPT
    algorithm = (
        kms.CryptoKeyVersion.CryptoKeyVersionAlgorithm.GOOGLE_SYMMETRIC_ENCRYPTION
    )
    key = {
        "purpose": purpose,
        "version_template": {
            "algorithm": algorithm,
        },
    }

    # Call the API.
    created_key = client.create_crypto_key(
        request={"parent": key_ring_name, "crypto_key_id": key_id, "crypto_key": key}
    )
    print("Created symmetric key: {}".format(created_key.name))
    return created_key

### Key Rotation - [Documentation Link](https://cloud.google.com/kms/docs/create-key)
* In the code below, change the rotation period (in days) to suit your requirements.

In [6]:
def create_key_rotation_schedule(project_id, location_id, key_ring_id, key_id):
    """
    Creates a new key in Cloud KMS that automatically rotates.

    Args:
        project_id (string): Google Cloud project ID (e.g. 'my-project').
        location_id (string): Cloud KMS location (e.g. 'us-east1').
        key_ring_id (string): ID of the Cloud KMS key ring (e.g. 'my-key-ring').
        key_id (string): ID of the key to create (e.g. 'my-rotating-key').

    Returns:
        CryptoKey: Cloud KMS key.

    """

    # Create the client.
    client = kms.KeyManagementServiceClient()

    # Build the parent key ring name.
    key_ring_name = client.key_ring_path(project_id, location_id, key_ring_id)

    # Build the key.
    purpose = kms.CryptoKey.CryptoKeyPurpose.ENCRYPT_DECRYPT
    algorithm = (
        kms.CryptoKeyVersion.CryptoKeyVersionAlgorithm.GOOGLE_SYMMETRIC_ENCRYPTION
    )
    key = {
        "purpose": purpose,
        "version_template": {
            "algorithm": algorithm,
        },
        # Rotate the key every 30 days.
        "rotation_period": {"seconds": 60 * 60 * 24 * 30},
        # Start the first rotation in 24 hours.
        "next_rotation_time": {"seconds": int(time.time()) + 60 * 60 * 24},
    }

    # Call the API.
    created_key = client.create_crypto_key(
        request={"parent": key_ring_name, "crypto_key_id": key_id, "crypto_key": key}
    )
    print("Created key with rotation schedule: {}".format(created_key.name))
    return created_key
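
The rotation values in the cell above are plain second counts. As a minimal sketch, the helper below derives them from a number of days so the period can be changed in one place. The name `rotation_fields` and its parameters are illustrative, not part of the KMS library.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 60 * 60 * 24


def rotation_fields(
    rotation_days: int, first_rotation_in_days: int = 1, now: Optional[int] = None
) -> dict:
    """Return the rotation-related fields of a CryptoKey dict for the given schedule."""
    if rotation_days < 1:
        raise ValueError("rotation_days must be at least 1")
    if now is None:
        now = int(time.time())
    return {
        "rotation_period": {"seconds": rotation_days * SECONDS_PER_DAY},
        "next_rotation_time": {"seconds": now + first_rotation_in_days * SECONDS_PER_DAY},
    }


# Example: a 21-day rotation period, measured from a fixed timestamp of 0.
fields = rotation_fields(21, now=0)
print(fields["rotation_period"]["seconds"])  # 1814400
```

The returned dict can be merged into the `key` dict with `key.update(rotation_fields(30))` before calling `create_crypto_key`.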

### Key destruction function 
* The function below schedules a key version for destruction. A version scheduled for destruction can still be restored and re-enabled until the scheduled-for-destruction period ends (24 hours in the setup described here; the period is configured on the key). After that, any data encrypted with this key version is unrecoverable.

In [7]:
def destroy_key_version(project_id, location_id, key_ring_id, key_id, version_id):
    """
    Schedule destruction of the given key version.

    Args:
        project_id (string): Google Cloud project ID (e.g. 'my-project').
        location_id (string): Cloud KMS location (e.g. 'us-east1').
        key_ring_id (string): ID of the Cloud KMS key ring (e.g. 'my-key-ring').
        key_id (string): ID of the key to use (e.g. 'my-key').
        version_id (string): ID of the key version to destroy (e.g. '1').

    Returns:
        CryptoKeyVersion: The version.

    """

    # Create the client.
    client = kms.KeyManagementServiceClient()

    # Build the key version name.
    key_version_name = client.crypto_key_version_path(
        project_id, location_id, key_ring_id, key_id, version_id
    )

    # Call the API.
    destroyed_version = client.destroy_crypto_key_version(
        request={"name": key_version_name}
    )
    print("Destroyed key version: {}".format(destroyed_version.name))
    return destroyed_version

### Granting access to the key for Doc AI and Cloud Storage
* To use the CMEK key for a Doc AI processor and a GCS bucket, grant the Cloud KMS CryptoKey Encrypter/Decrypter role on the key to the following service agents.

* Doc AI processor: service-{PROJECT_NUMBER}@gcp-sa-prod-dai-core.iam.gserviceaccount.com
* GCS bucket: service-{PROJECT_NUMBER}@gs-project-accounts.iam.gserviceaccount.com
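
The grant can also be made from the notebook. The sketch below follows the read-modify-write pattern on the key's IAM policy. `service_agent_members` and `grant_key_access` are hypothetical helper names, and the project number in the example is a placeholder. The KMS import is done inside the function so the pure helper can be run without the library installed.

```python
ENCRYPTER_DECRYPTER_ROLE = "roles/cloudkms.cryptoKeyEncrypterDecrypter"


def service_agent_members(project_number: str) -> list:
    """Return IAM member strings for the Doc AI and Cloud Storage service agents."""
    return [
        f"serviceAccount:service-{project_number}@gcp-sa-prod-dai-core.iam.gserviceaccount.com",
        f"serviceAccount:service-{project_number}@gs-project-accounts.iam.gserviceaccount.com",
    ]


def grant_key_access(project_id, location_id, key_ring_id, key_id, project_number):
    """Add both service agents as Encrypter/Decrypter on the key (not executed here)."""
    from google.cloud import kms  # lazy import: only needed when the grant is performed

    client = kms.KeyManagementServiceClient()
    resource_name = client.crypto_key_path(project_id, location_id, key_ring_id, key_id)

    # Read the current policy, add a binding, and write the policy back.
    policy = client.get_iam_policy(request={"resource": resource_name})
    policy.bindings.add(
        role=ENCRYPTER_DECRYPTER_ROLE, members=service_agent_members(project_number)
    )
    client.set_iam_policy(request={"resource": resource_name, "policy": policy})
    return policy


# Placeholder project number, for illustration only.
print(service_agent_members("123456789012"))
```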

<img src="./Images/key_ring_destroy.png" width=800 height=400></img>

## 2. CREATE NEW GCS BUCKET USING CMEK KEY 

* To create a GCS bucket with a CMEK key, provide the project ID, the CMEK key name, the bucket name, and the bucket location (which must be the same as the CMEK key location).


In [10]:
def create_bucket_cmek(project_id, bucket_name, cmek_key_name, location):
    """
    Create a Cloud Storage bucket and set a CMEK key as its default encryption key.

    Args:
        project_id (string): Google Cloud project ID (e.g. 'my-project').
        bucket_name (string): Name of the bucket to create.
        cmek_key_name (string): Key name in the format 'projects/{project_id}/locations/{location_key}/keyRings/{key_ring_id}/cryptoKeys/{key_id}'.
        location (string): Bucket location; must be the same as the key location.

    Returns:
        Bucket: The created bucket.

    """
    # Local import keeps this cell self-contained.
    from google.cloud import storage

    # Create a client for the given project.
    client = storage.Client(project=project_id)

    # Create the bucket, then set the CMEK key as its default KMS key.
    bucket = client.bucket(bucket_name)
    bucket = client.create_bucket(bucket, location=location)
    bucket.default_kms_key_name = cmek_key_name
    bucket.patch()
    print(f"bucket {bucket_name} is created")
    return bucket
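
Because the bucket and the key must share a location, a local check before the API call can catch a mismatch early. This is a minimal sketch: `key_location` and `locations_match` are hypothetical helpers, and the comparison is case-insensitive on the assumption that Cloud Storage may report locations in upper case.

```python
def key_location(cmek_key_name: str) -> str:
    """Extract the location segment from a full CMEK key resource name."""
    parts = cmek_key_name.split("/")
    # Expected: projects/{project}/locations/{location}/keyRings/{ring}/cryptoKeys/{key}
    expected = {0: "projects", 2: "locations", 4: "keyRings", 6: "cryptoKeys"}
    if len(parts) != 8 or any(parts[i] != label for i, label in expected.items()):
        raise ValueError(f"Unexpected key name format: {cmek_key_name!r}")
    return parts[3]


def locations_match(cmek_key_name: str, bucket_location: str) -> bool:
    """Return True when the key and the bucket are in the same location."""
    return key_location(cmek_key_name).lower() == bucket_location.lower()


name = "projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key"
print(locations_match(name, "US-CENTRAL1"))  # True
print(locations_match(name, "eu"))           # False
```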

## 3. CREATE NEW PROCESSOR USING CMEK KEY 
* Doc AI processors can be created with a CMEK key.

* Provide the CMEK key in the `kms_key_name` field, as shown below.

In the code below, change the processor display name from 'new_processor' to the name you want.

Set the processor type to the type you need.


In [None]:
# Parent resource under which the processor is created
parent = f"projects/{project_id}/locations/{location}"

# Full resource name of the CMEK key; set project_name, key_location,
# key_ring_name and key_name before running this cell
kms_key_name = f"projects/{project_name}/locations/{key_location}/keyRings/{key_ring_name}/cryptoKeys/{key_name}"

processor = {
    "display_name": "new_processor",
    "type": "CUSTOM_EXTRACTION_PROCESSOR",
    "kms_key_name": kms_key_name,
}

new_processor = documentai.CreateProcessorRequest(parent=parent, processor=processor)

client = DocumentProcessorServiceClient()

processor = client.create_processor(request=new_processor)
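
The client in the cell above is created without an explicit endpoint, whereas the training and deletion cells later in this guide pass a regional `api_endpoint`. As a sketch of how the same pattern could be applied here, the helpers below build the endpoint and parent strings. `documentai_endpoint`, `location_parent`, and `make_client` are illustrative names, and the client libraries are imported lazily so the string helpers run on their own.

```python
def documentai_endpoint(location: str) -> str:
    """Return the regional Document AI API endpoint for a processor location."""
    return f"{location}-documentai.googleapis.com"


def location_parent(project_id: str, location: str) -> str:
    """Return the parent resource name used when creating a processor."""
    return f"projects/{project_id}/locations/{location}"


def make_client(location: str):
    """Build a Document AI client bound to the regional endpoint (not executed here)."""
    # Lazy imports: only needed when a real client is created.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai_v1beta3 as documentai

    opts = ClientOptions(api_endpoint=documentai_endpoint(location))
    return documentai.DocumentProcessorServiceClient(client_options=opts)


print(documentai_endpoint("eu"))            # eu-documentai.googleapis.com
print(location_parent("my-project", "eu"))  # projects/my-project/locations/eu
```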

## 4. COPYING AND DEPLOYING THE PROCESSOR VERSION TO THE NEW PROCESSOR
* The code below copies an existing processor version into the new processor created in the previous step. It does not copy the dataset from the source processor.


In [None]:
# provide the source version(to copy) processor details in the below format
client = documentai.DocumentProcessorServiceClient()

source_version = client.processor_version_path(
    project=project_number,
    location=location,
    processor=processor_id,
    processor_version=processor_version,
)
# provide the new processor name in the parent variable in format 'projects/{project_number}/locations/{location}/processors/{new_processor_id}'

op_import_version_req = (
    documentai.types.document_processor_service.ImportProcessorVersionRequest(
        processor_version_source=source_version, parent=processor.name
    )
)

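# Copy the processor version; this starts a long-running operation

op_import_version = client.import_processor_version(request=op_import_version_req)

# Wait for the import to finish; the response holds the new version's resource name
imported_version = op_import_version.result().processor_version


# Deploy the imported version

req_deploy = documentai.types.document_processor_service.DeployProcessorVersionRequest(
    name=imported_version
)
client.deploy_processor_version(req_deploy)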
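
Import and deploy are both long-running operations, so the order matters: the copy has to finish before the new version can be deployed. The sketch below wraps the two steps in one function that takes the client as a parameter. `import_and_deploy` is a hypothetical helper, and it assumes the import response exposes the new version name as `processor_version`.

```python
def import_and_deploy(client, source_version: str, target_processor: str) -> str:
    """Copy source_version into target_processor, wait, then deploy the copy.

    client is expected to be a DocumentProcessorServiceClient (or a compatible object).
    Returns the resource name of the newly imported processor version.
    """
    import_op = client.import_processor_version(
        request={
            "parent": target_processor,
            "processor_version_source": source_version,
        }
    )
    # Block until the import finishes; the response names the new version.
    new_version = import_op.result().processor_version

    # Deploy the imported version and wait for deployment to complete.
    deploy_op = client.deploy_processor_version(name=new_version)
    deploy_op.result()
    return new_version
```

With the variables from the cell above, this would be called as `import_and_deploy(client, source_version, processor.name)`.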

## 5. TRAINING A DOC AI PROCESSOR 
[Reference sample](https://cloud.google.com/document-ai/docs/samples/documentai-train-processor-version)
* Set `base_processor_version` when you want to up-train from an existing version.

In [None]:
# TODO(developer): Uncomment these variables before running the sample.
# project_id = 'YOUR_PROJECT_ID'
# location = 'YOUR_PROCESSOR_LOCATION' # Format is 'us' or 'eu'
# processor_id = 'YOUR_PROCESSOR_ID'
# processor_version_display_name = 'new-processor-version'
# train_data_uri = 'gs://bucket/directory/' # (Optional)
# test_data_uri = 'gs://bucket/directory/' # (Optional)
# base_processor_version='projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}'  # (Optional) The processor version to use as a base for training. This processor version must be a child of parent. Format: projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}.


def train_processor_version_sample(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version_display_name: str,
    train_data_uri: str = None,
    test_data_uri: str = None,
    base_processor_version: str = None,
):
    # You must set the api_endpoint if you use a location other than 'us', e.g.:
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")

    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    # The full resource name of the processor
    # e.g. `projects/{project_id}/locations/{location}/processors/{processor_id}
    parent = client.processor_path(project_id, location, processor_id)

    processor_version = documentai.ProcessorVersion(
        display_name=processor_version_display_name
    )

    # If train/test data is not supplied, the default sets in the Cloud Console will be used
    input_data = documentai.TrainProcessorVersionRequest.InputData(
        training_documents=documentai.BatchDocumentsInputConfig(
            gcs_prefix=documentai.GcsPrefix(gcs_uri_prefix=train_data_uri)
        ),
        test_documents=documentai.BatchDocumentsInputConfig(
            gcs_prefix=documentai.GcsPrefix(gcs_uri_prefix=test_data_uri)
        ),
    )

    request = documentai.TrainProcessorVersionRequest(
        parent=parent,
        processor_version=processor_version,
        input_data=input_data,
    )
    # Optional: up-train from an existing version instead of training from scratch
    if base_processor_version:
        request.base_processor_version = base_processor_version

    operation = client.train_processor_version(request=request)
    # Print operation details
    print(operation.operation.name)
    # Wait for operation to complete
    response = documentai.TrainProcessorVersionResponse(operation.result())

    metadata = documentai.TrainProcessorVersionMetadata(operation.metadata)

    print(f"New Processor Version:{response.processor_version}")
    print(f"Training Set Validation: {metadata.training_dataset_validation}")
    print(f"Test Set Validation: {metadata.test_dataset_validation}")
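
When only one of the two data URIs is available, the request input can be assembled from just the supplied values. This is a sketch under the assumption that the client accepts a plain dict in place of the `InputData` message. `build_input_data` is a hypothetical helper and is not part of the Document AI library.

```python
from typing import Optional


def _gcs_prefix(uri: str) -> dict:
    """Wrap a gs:// URI in the nested structure expected for a documents config."""
    if not uri.startswith("gs://"):
        raise ValueError(f"Expected a gs:// URI, got {uri!r}")
    return {"gcs_prefix": {"gcs_uri_prefix": uri}}


def build_input_data(
    train_data_uri: Optional[str] = None, test_data_uri: Optional[str] = None
) -> dict:
    """Return an input_data dict that contains only the URIs that were supplied."""
    input_data = {}
    if train_data_uri:
        input_data["training_documents"] = _gcs_prefix(train_data_uri)
    if test_data_uri:
        input_data["test_documents"] = _gcs_prefix(test_data_uri)
    return input_data


print(build_input_data(train_data_uri="gs://bucket/directory/"))
```

An empty dict is returned when neither URI is given, which corresponds to the case where the default sets from the Cloud Console are used.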

## 6. DELETING PROCESSOR 
* A processor can be deleted with the code below. Once a processor is deleted, it cannot be recovered.

In [None]:
# TODO(developer): Uncomment these variables before running the sample.
# project_id = 'YOUR_PROJECT_ID'
# location = 'YOUR_PROCESSOR_LOCATION' # Format is 'us' or 'eu'
# processor_id = 'YOUR_PROCESSOR_ID'


def delete_processor_sample(project_id: str, location: str, processor_id: str):
    # You must set the api_endpoint if you use a location other than 'us'.
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")

    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    # The full resource name of the processor
    # e.g.: projects/project_id/locations/location/processors/processor_id
    processor_name = client.processor_path(project_id, location, processor_id)

    # Delete a processor
    try:
        operation = client.delete_processor(name=processor_name)
        # Print operation details
        print(operation.operation.name)
        # Wait for operation to complete
        operation.result()
    except NotFound as e:
        print(e.message)

## 7. DESTROYING CMEK KEY AUTOMATICALLY
* To destroy the CMEK key automatically after 21 days, use a Cloud Function triggered through Eventarc, as described below.

* First create a CMEK key with a rotation period of 21 days (change this to suit your requirements), then create the Cloud Function by following the steps below.


### Service account

Create a service account, or use an existing one, with the following IAM roles:

* Cloud Functions Service Agent
* Cloud KMS Admin
* Compute Admin
* Eventarc Event Receiver

<img src="./Images/service_account_role.png" width=800 height=400></img>

### Enable KMS audit logs
* Enable audit logs for Cloud KMS as shown below so that Eventarc can trigger the Cloud Function when a new CMEK key version is created.
<img src="./Images/Audit_logs.png" width=800 height=400></img>

### Cloud function 
* Create the Cloud Function using the function and source code provided below.

* Provide the input details for the Cloud Function as shown below.


In [11]:
# input Details #
project_id = "xxx-xxxx-xxxx"  # update your project id#
bucket = "xxxxxxxxxxx"  # update the bucket path where zip file is saved #
object_1 = "xxx/source.zip"  # ZIP file location in GCP#
function_name = "delete_version_v1"  # function name; also used as the entry point, so it must match the function in main.py (do not change)#
service_account_email = "xxxxxx@xxxxx-xxxx-xxxx.iam.gserviceaccount.com"  # use service account which has roles specified above#
key_location = "us-central1"  # location of key#
key_ring_name = "xx-xxxxxx"  # key ring name#
key_name = "xx-xxxx"  # key name #


# function to create cloud function#
def cloud_func_cmek(
    project_id,
    bucket,
    object_1,
    service_account_email,
    key_ring_name,
    key_location,
    key_name,
    function_name,
):
    """
    Create a Cloud Function with a Cloud Audit Log trigger.

    Args:
        project_id (str): The Google Cloud project ID.
        bucket (str): The Cloud Storage bucket storing the Cloud Function source code.
        object_1 (str): The Cloud Storage object path to the source code.
        service_account_email (str): The service account email to associate with the Cloud Function.
        key_ring_name (str): The name of the key ring in Cloud KMS.
        key_location (str): The location of the key ring in Cloud KMS.
        key_name (str): The name of the crypto key in Cloud KMS.
        function_name (str): The name of the Cloud Function.

    Returns:
        None
    """

    function_build = functions_v2.BuildConfig(
        runtime="python311",
        entry_point=function_name,
        source=functions_v2.Source(
            storage_source=functions_v2.StorageSource(bucket=bucket, object=object_1)
        ),
    )

    service_build = functions_v2.ServiceConfig(
        service_account_email=service_account_email
    )

    event_trigger = functions_v2.EventTrigger(
        trigger_region=key_location,
        event_type="google.cloud.audit.log.v1.written",
        service_account_email=service_account_email,
    )
    event_trigger.event_filters = [
        {"attribute": "methodName", "value": "CreateCryptoKeyVersion"},
        {
            "attribute": "resourceName",
            "value": f"projects/{project_id}/locations/{key_location}/keyRings/{key_ring_name}/cryptoKeys/{key_name}",
            "operator": "match-path-pattern",
        },
        {"attribute": "serviceName", "value": "cloudkms.googleapis.com"},
    ]

    function = functions_v2.Function(
        name=f"projects/{project_id}/locations/{key_location}/functions/{function_name}",
        environment=functions_v2.Environment.GEN_2,
        build_config=function_build,
        event_trigger=event_trigger,
        service_config=service_build,
    )
    request = functions_v2.CreateFunctionRequest(
        parent=f"projects/{project_id}/locations/{key_location}",
        function_id=function_name,
        function=function,
    )

    function_c = functions_v2.FunctionServiceClient()
    function_new = function_c.create_function(request=request)


# calling the function #


cloud_func = cloud_func_cmek(
    project_id,
    bucket,
    object_1,
    service_account_email,
    key_ring_name,
    key_location,
    key_name,
    function_name,
)

### Source code
 * Update the input details and save the code below as main.py.

In [None]:
import time

from google.cloud import kms_v1


def delete_version_v1(*args):
    # Input details
    project_id='xxxx-xxxx-xxxx'
    key_location='us-central1'
    key_ring_name='xxxx-xxx'
    key_name='xxxx-xxxx-xx'
    key_version=5  # key version to destroy (the latest version created after rotation)
    # --- Do not edit below this line ---
    print("==============Inside Cloud Function=================")
    def update_key_remove_rotation(project_id, location_id, key_ring_id, key_id):
        """
        Remove a rotation schedule from an existing key.

        Args:
            project_id (string): Google Cloud project ID (e.g. 'my-project').
            location_id (string): Cloud KMS location (e.g. 'us-east1').
            key_ring_id (string): ID of the Cloud KMS key ring (e.g. 'my-key-ring').
            key_id (string): ID of the key to use (e.g. 'my-key').

        Returns:
            CryptoKey: Updated Cloud KMS key.

        """
        print("#################### Remove Rotation Function #######################")
        # Create the client.
        client = kms_v1.KeyManagementServiceClient()

        # Build the key name.
        key_name = client.crypto_key_path(project_id, location_id, key_ring_id, key_id)

        key = {
            'name': key_name
        }

        # Build the update mask.
        update_mask = {'paths': ['rotation_period', 'next_rotation_time']}

        # Call the API.
        updated_key = client.update_crypto_key(request={'crypto_key': key, 'update_mask': update_mask})
        print('Updated key: {}'.format(updated_key.name))
        return updated_key
    
    def destroy_key_version(project_id, location_id, key_ring_id, key_id, version_id):
        """
        Schedule destruction of the given key version.

        Args:
            project_id (string): Google Cloud project ID (e.g. 'my-project').
            location_id (string): Cloud KMS location (e.g. 'us-east1').
            key_ring_id (string): ID of the Cloud KMS key ring (e.g. 'my-key-ring').
            key_id (string): ID of the key to use (e.g. 'my-key').
            version_id (string): ID of the key version to destroy (e.g. '1').

        Returns:
            CryptoKeyVersion: The version.

        """

        # Create the client.
        client = kms_v1.KeyManagementServiceClient()

        # Build the key version name.
        key_version_name = client.crypto_key_version_path(project_id, location_id, key_ring_id, key_id, version_id)

        # Call the API.
        destroyed_version = client.destroy_crypto_key_version(request={'name': key_version_name})
        print('Destroyed key version: {}'.format(destroyed_version.name))
        return destroyed_version
    update_key_remove_rotation(project_id=project_id,location_id=key_location,key_ring_id=key_ring_name, key_id=key_name)   
    time.sleep(30)
    x=destroy_key_version(project_id=project_id,location_id=key_location,key_ring_id=key_ring_name, key_id=key_name, version_id=key_version)

### requirements.txt 

Create a requirements.txt file that lists the library below so it is installed with the function:

* google-cloud-kms

Now zip main.py and requirements.txt together, save the archive in GCS, and provide its location in the `bucket` and `object_1` inputs of the Cloud Function step above.
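
The packaging step can be scripted as well. The sketch below builds the archive with the standard library and, optionally, uploads it with the Cloud Storage client. `build_source_zip` and `upload_source_zip` are hypothetical helper names, and the upload function is defined but not executed here.

```python
import zipfile
from pathlib import Path


def build_source_zip(source_dir: str, zip_path: str = "source.zip") -> str:
    """Zip main.py and requirements.txt from source_dir into zip_path."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in ("main.py", "requirements.txt"):
            # Store each file at the archive root rather than under a subfolder.
            archive.write(Path(source_dir) / name, arcname=name)
    return zip_path


def upload_source_zip(bucket_name: str, object_name: str, zip_path: str) -> str:
    """Upload the archive to GCS and return its gs:// URI (not executed here)."""
    from google.cloud import storage  # lazy import: only needed for the upload

    blob = storage.Client().bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(zip_path)
    return f"gs://{bucket_name}/{object_name}"
```

For example, `build_source_zip(".")` followed by `upload_source_zip(bucket, object_1, "source.zip")` would produce the object referenced by the Cloud Function inputs above.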
