### <font color='#4285f4'>Overview</font>

This notebook automates tagging BigQuery tables with aspect tags based on the data each table contains.

There are three approaches:
1. Use Gemini and sample data to generate the tags. This generates Python source code.
2. Use Gemini and sample data to generate the tags. This generates structured JSON output.
3. Use the output of the Sensitive Data Protection (SDP) scan to generate the tags. This should be more accurate, since the SDP scans should correctly identify sensitive data. This generates structured JSON output.
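Approach 3 prefers the SDP scan and, as step 4b of the process flow notes, falls back to the sample data technique for columns the scan does not report. A minimal sketch of that precedence rule, assuming hypothetical finding dictionaries keyed by column name (this is not the real SDP result format, and `merge_column_findings` is an illustrative helper, not part of the notebook):

```python
def merge_column_findings(columns, sdp_findings, gemini_findings):
    """Return one finding per column, preferring SDP over Gemini.

    sdp_findings / gemini_findings are hypothetical dicts of
    {column_name: {"pii-type": ...}}; they are NOT the real SDP result format.
    """
    merged = {}
    for column in columns:
        if column in sdp_findings:
            # The SDP scan covered this column: trust it first.
            merged[column] = {**sdp_findings[column], "source": "sdp"}
        elif column in gemini_findings:
            # Not in the scan results: fall back to the sample data technique.
            merged[column] = {**gemini_findings[column], "source": "gemini"}
        else:
            # Neither source classified it: flag it for human review.
            merged[column] = {"pii-type": None, "source": "unclassified"}
    return merged


columns = ["customer_id", "email", "date_of_birth"]
sdp = {"email": {"pii-type": "EMAIL_ADDRESS"}}
gemini = {"email": {"pii-type": "Email"}, "date_of_birth": {"pii-type": "Date of Birth"}}
print(merge_column_findings(columns, sdp, gemini))
```

Keeping a `source` field on each result makes it easy to audit later which columns were tagged from the scan and which from an LLM guess.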

Process Flow:
1. Gather the table schema.
2. Gather data from the table (or a sample of it) and convert it to JSON.
3. Provide the Aspect Type definitions so Gemini knows which fields to output.
   a. Construct the response schema for Gemini (structured JSON output).
4. Gather the Sensitive Data Protection (SDP) results.
   a. Not every table has been scanned in this demo.
   b. Not every field appears in the scan results; for those fields we use the sample data technique.
5. Construct the prompt.
6. Run the prompt.
7. Optional (highly recommended): to automate this without human intervention, have Gemini check the results of the first LLM call.
   a. For an example, see the last cell of the Data Engineering Agents demo notebook ([GitHub](https://github.com/GoogleCloudPlatform/data-analytics-golden-demo/blob/main/data-analytics-demos/data-engineering-agents/Data-Engineering-Agents-Demo.ipynb)).
8. Construct the code to update the aspects on the table and its columns.
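Step 3a can be derived mechanically from an aspect type definition instead of being written by hand. A sketch, assuming the `recordFields` shape used by the aspect definitions in this notebook and an assumed mapping of `string`/`enum`/`bool`/`array` onto response-schema types (other field types fall back to `string`); `aspect_fields_to_response_schema` is a hypothetical helper:

```python
import json

# Assumed mapping from the aspect field types used in this notebook to
# response-schema types; anything else falls back to "string".
TYPE_MAP = {"string": "string", "enum": "string", "bool": "boolean"}


def aspect_fields_to_response_schema(template: dict) -> dict:
    """Build a Gemini responseSchema (object) from an aspect type's recordFields."""
    properties = {}
    required = []
    for field in template.get("recordFields", []):
        name = field["name"]
        if field["type"] == "array":
            item_type = TYPE_MAP.get(field.get("arrayItems", {}).get("type"), "string")
            prop = {"type": "array", "items": {"type": item_type}}
        else:
            prop = {"type": TYPE_MAP.get(field["type"], "string")}
        if field["type"] == "enum":
            # Keep the enum order defined by the aspect type's index values.
            values = sorted(field["enumValues"], key=lambda v: v["index"])
            prop["enum"] = [v["name"] for v in values]
        description = field.get("annotations", {}).get("description")
        if description:
            prop["description"] = description
        properties[name] = prop
        if field.get("constraints", {}).get("required"):
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}


column_template = {
    "recordFields": [
        {"name": "contains-pii", "type": "bool", "constraints": {"required": True}},
        {"name": "data-sensitivity-level", "type": "enum",
         "constraints": {"required": True},
         "enumValues": [{"name": "High", "index": 3}, {"name": "Low", "index": 1}]},
        {"name": "compliance-requirements", "type": "array",
         "arrayItems": {"type": "string"}, "constraints": {"required": False}},
    ]
}
print(json.dumps(aspect_fields_to_response_schema(column_template), indent=2))
```

The `aspect_type_column_defination` dictionary defined later in this notebook has the same shape, so it could be passed in directly; the string-based table definition would first need to be turned into a dictionary.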

Notes:
* If you plan to use automated data governance, design your aspect types so they are easy to automate. This might mean separating fields like "Data Owner" from "Privacy Fields", or upserting only the privacy attributes of your aspect type.
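One way to picture that separation: split a single governance payload into the part a person maintains and the part an automated job may overwrite. A minimal sketch; which keys count as human-owned is an assumption (`HUMAN_OWNED_KEYS` and `split_governance_fields` are hypothetical), and the two results would be written to two separate aspect types of your own design:

```python
from typing import Dict, Tuple

# Hypothetical split: keys a person owns versus keys an automated job may overwrite.
HUMAN_OWNED_KEYS = {"data-steward", "owner-group", "business-owner",
                    "documentation-url", "data-lifecycle"}


def split_governance_fields(data: Dict) -> Tuple[Dict, Dict]:
    """Split one aspect payload into (human-owned, automated) payloads."""
    human = {k: v for k, v in data.items() if k in HUMAN_OWNED_KEYS}
    automated = {k: v for k, v in data.items() if k not in HUMAN_OWNED_KEYS}
    return human, automated


table_data = {
    "data-steward": "Jane Doe",
    "owner-group": "data-governance-team",
    "business-owner": "Jane Doe",
    "documentation-url": "http://yourcompany.com/customer-table-documentation",
    "data-lifecycle": "Dev",
    "classification-level": "Restricted",
    "data-sensitivity-level": "High",
    "contains-pii": True,
}
human, automated = split_governance_fields(table_data)
print(automated)
```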

Cost:
* Approximate cost: less than $1 (does not include SDP scan costs)

Author:
* Adam Paternostro

In [None]:
# Architecture Diagram
from IPython.display import Image
Image(url='https://storage.googleapis.com/data-analytics-golden-demo/colab-diagrams/BigQuery-Data-Governance-Automated-Data-Governance.png', width=1000)

### <font color='#4285f4'>Video Walkthrough</font>

[Video](https://storage.googleapis.com/data-analytics-golden-demo/colab-videos/Automated-Aspect-Data-Governance.mp4)


In [None]:
from IPython.display import HTML

HTML("""
<video width="800" height="600" controls>
  <source src="https://storage.googleapis.com/data-analytics-golden-demo/colab-videos/Automated-Aspect-Data-Governance.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
""")

### <font color='#4285f4'>License</font>

```
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```

### <font color='#4285f4'>Pip installs</font>

In [None]:
# PIP Installs (if necessary)
import sys

# !{sys.executable} -m pip install REPLACE-ME

### <font color='#4285f4'>Initialize</font>

In [None]:
from PIL import Image
from IPython.display import HTML
import IPython.display
import google.auth
import google.auth.transport.requests  # needed for google.auth.transport.requests.Request()
import requests
import json
import uuid
import base64
import os
import cv2
import random
import time
import datetime

import logging
from tenacity import retry, wait_exponential, stop_after_attempt, before_sleep_log, retry_if_exception

In [None]:
# Set these (run this cell to verify the output)

bigquery_location = "${bigquery_location}"
dataplex_region = "${dataplex_region}"
location = "${dataplex_region}"

# Get the current date and time
now = datetime.datetime.now()

# Format the date and time as desired
formatted_date = now.strftime("%Y-%m-%d-%H-%M")

# Get some values using gcloud
project_id = os.environ["GOOGLE_CLOUD_PROJECT"]
user = !(gcloud auth list --filter=status:ACTIVE --format="value(account)")

if len(user) != 1:
  raise RuntimeError(f"user is not set: {user}")
user = user[0]

print(f"project_id = {project_id}")
print(f"user = {user}")

### <font color='#4285f4'>Helper Methods</font>

#### restAPIHelper
Calls the Google Cloud REST API using the current user's credentials.

In [None]:
def restAPIHelper(url: str, http_verb: str, request_body: dict = None) -> dict:
  """Calls the Google Cloud REST API passing in the current user's credentials"""

  import requests
  import google.auth
  import google.auth.transport.requests  # provides google.auth.transport.requests.Request
  import json

  # Get an access token based upon the current user
  creds, project = google.auth.default()
  auth_req = google.auth.transport.requests.Request()
  creds.refresh(auth_req)
  access_token = creds.token

  headers = {
    "Content-Type" : "application/json",
    "Authorization" : "Bearer " + access_token
  }

  if http_verb == "GET":
    response = requests.get(url, headers=headers)
  elif http_verb == "POST":
    response = requests.post(url, json=request_body, headers=headers)
  elif http_verb == "PUT":
    response = requests.put(url, json=request_body, headers=headers)
  elif http_verb == "PATCH":
    response = requests.patch(url, json=request_body, headers=headers)
  elif http_verb == "DELETE":
    response = requests.delete(url, headers=headers)
  else:
    raise RuntimeError(f"Unknown HTTP verb: {http_verb}")

  if response.status_code == 200:
    return json.loads(response.content)
  else:
    error = f"Error restAPIHelper -> Status: '{response.status_code}' Text: '{response.text}'"
    raise RuntimeError(error)

#### RetryCondition (for retrying LLM calls)

In [None]:
def RetryCondition(error):
  error_string = str(error)
  print(error_string)

  retry_errors = [
      "RESOURCE_EXHAUSTED",
      "No content in candidate",
      # Add more error messages here as needed
  ]

  for retry_error in retry_errors:
    if retry_error in error_string:
      print("Retrying...")
      return True

  return False
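The decorator on `GeminiLLM` below pairs `RetryCondition` with `wait_exponential(multiplier=1, min=1, max=60)` and `stop_after_attempt(10)`. To see what that costs in the worst case, here is a stdlib-only sketch that reproduces, as we understand it from tenacity's source, the formula `multiplier * 2 ** (attempt_number - 1)` clamped to `[min, max]`; `exponential_waits` is a hypothetical helper, not a tenacity API:

```python
def exponential_waits(attempts: int, multiplier: float = 1, minimum: float = 1,
                      maximum: float = 60, exp_base: float = 2) -> list:
    """Sleep (seconds) before each retry, assuming tenacity-style exponential backoff."""
    waits = []
    # A sleep follows every failed attempt except the last one, where it stops instead.
    for attempt_number in range(1, attempts):
        raw = multiplier * exp_base ** (attempt_number - 1)
        waits.append(max(minimum, min(raw, maximum)))
    return waits


waits = exponential_waits(attempts=10)
print(waits)       # [1, 2, 4, 8, 16, 32, 60, 60, 60]
print(sum(waits))  # 243
```

So a call that keeps hitting `RESOURCE_EXHAUSTED` sleeps about four minutes in total (not counting the time spent in the calls themselves) before the final failure is raised. Errors for which `RetryCondition` returns `False` are not retried at all.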

#### Gemini LLM

In [None]:
@retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(10), retry=retry_if_exception(RetryCondition), before_sleep=before_sleep_log(logging.getLogger(), logging.INFO))
def GeminiLLM(prompt, model = "gemini-2.0-flash", response_schema = None,
                 temperature = 1, topP = 1, topK = 32):

  # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#supported_models

  llm_response = None
  if temperature < 0:
    temperature = 0

  creds, project = google.auth.default()
  auth_req = google.auth.transport.requests.Request() # required to get the access token
  creds.refresh(auth_req)
  access_token=creds.token

  headers = {
      "Content-Type" : "application/json",
      "Authorization" : "Bearer " + access_token
  }

  # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference
  url = f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers/google/models/{model}:generateContent"

  generation_config = {
    "temperature": temperature,
    "topP": topP,
    "maxOutputTokens": 8192,
    "candidateCount": 1,
    "responseMimeType": "application/json",
  }

  # Add in the response schema when it is provided
  if response_schema is not None:
    generation_config["responseSchema"] = response_schema

  if model == "gemini-2.0-flash":
    generation_config["topK"] = topK

  payload = {
    "contents": {
      "role": "user",
      "parts": {
          "text": prompt
      },
    },
    "generation_config": {
      **generation_config
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }

  response = requests.post(url, json=payload, headers=headers)

  if response.status_code == 200:
    try:
      json_response = json.loads(response.content)
    except Exception as error:
      raise RuntimeError(f"An error occurred parsing the JSON: {error}")

    if "candidates" in json_response:
      candidates = json_response["candidates"]
      if len(candidates) > 0:
        candidate = candidates[0]
        if "content" in candidate:
          content = candidate["content"]
          if "parts" in content:
            parts = content["parts"]
            if len(parts):
              part = parts[0]
              if "text" in part:
                text = part["text"]
                llm_response = text
              else:
                raise RuntimeError(f"No text in part: {response.content}")
            else:
              raise RuntimeError(f"Empty parts in content: {response.content}")
          else:
            raise RuntimeError(f"No parts in content: {response.content}")
        else:
          raise RuntimeError(f"No content in candidate: {response.content}")
      else:
        raise RuntimeError(f"No candidates: {response.content}")
    else:
      raise RuntimeError(f"No candidates: {response.content}")

    # Remove typical response characters (if asking for a JSON reply)
    llm_response = llm_response.replace("```json","")
    llm_response = llm_response.replace("```","")
    llm_response = llm_response.replace("\n","")

    return llm_response

  else:
    raise RuntimeError(f"Error with prompt:'{prompt}'  Status:'{response.status_code}' Text:'{response.text}'")
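The nested `if` blocks above can be flattened with `dict.get`. A sketch of that alternative (hypothetical helpers `extract_candidate_text` and `strip_json_fences`): it keeps the "No content in candidate" wording so `RetryCondition` still retries that case, and it strips a code fence only at the start and end of the reply instead of removing every newline:

```python
import json
import re

# Matches an opening fence (optionally tagged json) or a closing fence at the ends.
_FENCE_RE = re.compile(r"^`{3}(?:json)?\s*|\s*`{3}$", re.IGNORECASE)


def extract_candidate_text(json_response: dict) -> str:
    """Return the first text part of the first candidate, or raise RuntimeError."""
    candidates = json_response.get("candidates") or []
    if not candidates:
        raise RuntimeError(f"No candidates: {json_response}")
    content = candidates[0].get("content")
    if not content:
        # Same wording as GeminiLLM so RetryCondition still retries this case.
        raise RuntimeError(f"No content in candidate: {json_response}")
    parts = content.get("parts") or []
    if not parts:
        raise RuntimeError(f"No parts in content: {json_response}")
    if "text" not in parts[0]:
        raise RuntimeError(f"No text in part: {json_response}")
    return parts[0]["text"]


def strip_json_fences(text: str) -> str:
    """Remove a leading and trailing Markdown code fence, if present."""
    return _FENCE_RE.sub("", text.strip())


fence = "`" * 3
fake_response = {"candidates": [{"content": {"role": "model", "parts": [
    {"text": fence + 'json\n{"contains-pii": true}\n' + fence}]}}]}
print(json.loads(strip_json_fences(extract_candidate_text(fake_response))))  # {'contains-pii': True}
```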

In [None]:
@retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(10), retry=retry_if_exception(RetryCondition), before_sleep=before_sleep_log(logging.getLogger(), logging.INFO))
def GeminiLLM_VerifyImage(prompt, imageBase64, model = "gemini-2.0-flash", response_schema = None,
                 temperature = 1, topP = 1, topK = 32):

  # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#supported_models

  llm_response = None
  if temperature < 0:
    temperature = 0

  creds, project = google.auth.default()
  auth_req = google.auth.transport.requests.Request() # required to get the access token
  creds.refresh(auth_req)
  access_token=creds.token

  headers = {
      "Content-Type" : "application/json",
      "Authorization" : "Bearer " + access_token
  }

  # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference
  url = f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers/google/models/{model}:generateContent"

  generation_config = {
    "temperature": temperature,
    "topP": topP,
    "maxOutputTokens": 8192,
    "candidateCount": 1,
    "responseMimeType": "application/json",
  }

  # Add in the response schema when it is provided
  if response_schema is not None:
    generation_config["responseSchema"] = response_schema

  if model == "gemini-2.0-flash":
    generation_config["topK"] = topK

  payload = {
    "contents": {
      "role": "user",
      "parts": [
          { "text": prompt },
          { "inlineData": {  "mimeType": "image/png", "data": f"{imageBase64}" } }
        ]
    },
    "generation_config": {
      **generation_config
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }

  response = requests.post(url, json=payload, headers=headers)

  if response.status_code == 200:
    try:
      json_response = json.loads(response.content)
    except Exception as error:
      raise RuntimeError(f"An error occurred parsing the JSON: {error}")

    if "candidates" in json_response:
      candidates = json_response["candidates"]
      if len(candidates) > 0:
        candidate = candidates[0]
        if "content" in candidate:
          content = candidate["content"]
          if "parts" in content:
            parts = content["parts"]
            if len(parts):
              part = parts[0]
              if "text" in part:
                text = part["text"]
                llm_response = text
              else:
                raise RuntimeError(f"No text in part: {response.content}")
            else:
              raise RuntimeError(f"Empty parts in content: {response.content}")
          else:
            raise RuntimeError(f"No parts in content: {response.content}")
        else:
          raise RuntimeError(f"No content in candidate: {response.content}")
      else:
        raise RuntimeError(f"No candidates: {response.content}")
    else:
      raise RuntimeError(f"No candidates: {response.content}")

    # Remove typical response characters (if asking for a JSON reply)
    llm_response = llm_response.replace("```json","")
    llm_response = llm_response.replace("```","")
    llm_response = llm_response.replace("\n","")

    return llm_response

  else:
    raise RuntimeError(f"Error with prompt:'{prompt}'  Status:'{response.status_code}' Text:'{response.text}'")

#### Helper Functions

In [None]:
def RunQuery(sql):
  import time
  from google.cloud import bigquery
  client = bigquery.Client()

  # Statements that return rows go to a DataFrame; DDL/DML is polled until it finishes
  if sql.lstrip().upper().startswith(("SELECT", "WITH")):
    df_result = client.query(sql).to_dataframe()
    return df_result
  else:
    job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.INTERACTIVE)
    query_job = client.query(sql, job_config=job_config)

    # Check on the progress by getting the job's updated state.
    query_job = client.get_job(
        query_job.job_id, location=query_job.location
    )
    print("Job {} is currently in state {} with error result of {}".format(query_job.job_id, query_job.state, query_job.error_result))

    while query_job.state != "DONE":
      time.sleep(2)
      query_job = client.get_job(
          query_job.job_id, location=query_job.location
          )
      print("Job {} is currently in state {} with error result of {}".format(query_job.job_id, query_job.state, query_job.error_result))

    if query_job.error_result is None:
      return True
    else:
      raise Exception(query_job.error_result)
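`RunQuery` chooses the DataFrame path only when the SQL begins with `SELECT` or `WITH`, so a query that starts with a `--` comment takes the DDL/DML path and returns `True` instead of rows. A sketch of a more tolerant check (`is_read_query` is a hypothetical helper); it skips blank lines and leading `--` or `#` line comments, but does not handle `/* */` block comments:

```python
import re

_LEADING_KEYWORD = re.compile(r"(SELECT|WITH)\b", re.IGNORECASE)


def is_read_query(sql: str) -> bool:
    """True if the first statement keyword is SELECT or WITH.

    Skips blank lines and leading "--" / "#" line comments; block comments
    (/* */) are not handled in this sketch.
    """
    for line in sql.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("--", "#")):
            continue
        return _LEADING_KEYWORD.match(stripped) is not None
    return False


print(is_read_query("\n  -- list tables\n  select table_name FROM ds.INFORMATION_SCHEMA.TABLES"))
print(is_read_query("CREATE TABLE ds.t (a INT64)"))
```

Inside `RunQuery`, the first `if` would become `if is_read_query(sql):`.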

### <font color='#4285f4'>Table Schema and Data</font>

In [None]:
def GetTableSchema(project, dataset_name, table_name):
  import io
  from google.cloud import bigquery
  client = bigquery.Client()

  # Fully qualified table ID; Client.dataset() is deprecated in newer versions of google-cloud-bigquery
  table = client.get_table(f"{project}.{dataset_name}.{table_name}")

  f = io.StringIO("")
  client.schema_to_json(table.schema, f)
  return f.getvalue()

In [None]:
def GetTableSampleData(project_id, dataset_name, table_name):
  # Get some sample data to pass to Gemini (we should only sample the data if it has many rows)
  sample_data_results_json = []
  try:
    sql = f"""SELECT TOTAL_ROWS AS Cnt
                FROM `{project_id}.region-{bigquery_location}.INFORMATION_SCHEMA.TABLE_STORAGE_BY_PROJECT`
               WHERE TABLE_NAME = '{table_name}'
                 AND TABLE_SCHEMA = '{dataset_name}'"""

    print(f"sql: {sql}")
    results = RunQuery(sql)
    count = 0
    for index, row in results.iterrows():
        count = int(row["Cnt"])

    print(f"{table_name} count: {count}")
  except Exception as e:  # Catch any standard exception and assign it to 'e'
      # do nothing, we might not be able to query this table due to security access
      print(f"\nTable: {table_name}: Cannot view data in INFORMATION_SCHEMA.TABLE_STORAGE_BY_PROJECT.")
      print(f"Reason (suppressed): {type(e).__name__}: {e}")
      count = 10000 # Trigger table sample

  try:
    # Read everything from small tables; sample 10 percent of the storage blocks from large ones
    if count < 10000:
      sample_percent = 100
    else:
      sample_percent = 10

    sql = f"SELECT * FROM `{project_id}.{dataset_name}.{table_name}` TABLESAMPLE SYSTEM ({sample_percent} PERCENT) LIMIT 100"
    print(f"sql: {sql}")
    sample_data_results = RunQuery(sql)
    sample_data_results_json = sample_data_results.to_json(orient='records')
    print(f"Table: {table_name}: {sample_data_results_json}")
    return sample_data_results_json
  except Exception as e:  # Catch any standard exception and assign it to 'e'
      print(f"\nTable: {table_name}: Cannot view data in table.")
      print(f"Reason (suppressed): {type(e).__name__}: {e}")

  return []
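The sampling decision in `GetTableSampleData` can be isolated as a pure function, which makes it easy to test without BigQuery. A sketch with the same threshold, percentages, and `LIMIT`; `build_sample_sql` and the project name in the example are hypothetical:

```python
def build_sample_sql(project_id: str, dataset_name: str, table_name: str,
                     row_count: int, threshold: int = 10000, limit: int = 100) -> str:
    """Mirror GetTableSampleData: all blocks for small tables, 10 percent for large ones."""
    sample_percent = 100 if row_count < threshold else 10
    return (f"SELECT * FROM `{project_id}.{dataset_name}.{table_name}` "
            f"TABLESAMPLE SYSTEM ({sample_percent} PERCENT) LIMIT {limit}")


print(build_sample_sql("my-project", "governed_data_curated", "customer", row_count=10000))
```

Two points follow from the original code. When `INFORMATION_SCHEMA.TABLE_STORAGE_BY_PROJECT` cannot be read, `count` is forced to 10000, so the 10 percent branch is taken. BigQuery's `TABLESAMPLE SYSTEM` samples storage blocks rather than individual rows, so the 100 rows Gemini sees are not a uniform row sample, and a mid-sized table may return fewer than 100 rows at 10 percent.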

In [None]:
dataset_name = "governed_data_curated"

table_list_sql = f"""SELECT table_name
                       FROM {dataset_name}.INFORMATION_SCHEMA.TABLES
                      WHERE table_type = 'BASE TABLE'
                        AND table_name IN ('customer','customer_transaction','product','product_category') -- Same tables as SDP scan, you can comment this out
                      ORDER BY table_name;"""

results = RunQuery(table_list_sql)

tables_to_create_aspects = []

for index, row in results.iterrows():
  table_name = row['table_name']
  sample_data_results_json = GetTableSampleData(project_id, dataset_name, table_name)
  data_result = {
      "project_id" : project_id,
      "dataset_name": dataset_name,
      "table_name": table_name,
      "sample_data_json": sample_data_results_json
  }

  tables_to_create_aspects.append(data_result)

In [None]:
tables_to_create_aspects

### <font color='#4285f4'>Automated Data Governance - Aspect Types (Gemini Only) - Code Generation</font>

Generate Python code that we can copy, paste, and run to set the aspects on our tables. Some values, such as the business owner, must be provided in the prompt.

We could separate items like the business owner and data steward into their own aspect type, and put our privacy attributes (PII, GDPR, etc.) in a separate aspect type that is fully automated.
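Before letting a privacy aspect type run with no human in the loop, a cheap deterministic check can reject values the aspect type would not accept; it complements, rather than replaces, a Gemini review pass. A sketch, assuming a trimmed, hand-copied version of the column aspect definition (`aspect_type_column_defination` in the prompt function below); `COLUMN_FIELDS` and `validate_aspect_data` are hypothetical:

```python
# Hand-copied, trimmed version of the column aspect definition in this notebook.
COLUMN_FIELDS = {
    "contains-pii": {"type": "bool", "required": True},
    "pii-type": {"type": "string", "required": False},
    "data-sensitivity-level": {"type": "enum", "required": True,
                               "values": {"Low", "Medium", "High", "Critical"}},
    "compliance-requirements": {"type": "array", "required": False},
}


def validate_aspect_data(data: dict, fields: dict = COLUMN_FIELDS) -> list:
    """Return a list of problems; an empty list means the data looks applicable."""
    errors = []
    for name, spec in fields.items():
        if name not in data:
            if spec["required"]:
                errors.append(f"missing required field '{name}'")
            continue
        value = data[name]
        kind = spec["type"]
        if kind == "bool" and not isinstance(value, bool):
            errors.append(f"'{name}' must be a bool, got {value!r}")
        elif kind == "string" and not isinstance(value, str):
            errors.append(f"'{name}' must be a string, got {value!r}")
        elif kind == "enum" and (not isinstance(value, str) or value not in spec["values"]):
            errors.append(f"'{name}' must be one of {sorted(spec['values'])}, got {value!r}")
        elif kind == "array" and not (isinstance(value, list)
                                      and all(isinstance(v, str) for v in value)):
            errors.append(f"'{name}' must be a list of strings, got {value!r}")
    # Flag keys that the definition does not declare.
    for name in data:
        if name not in fields:
            errors.append(f"unknown field '{name}'")
    return errors


print(validate_aspect_data({"contains-pii": True, "data-sensitivity-level": "High",
                            "compliance-requirements": ["GDPR", "CCPA"]}))  # []
print(validate_aspect_data({"contains-pii": "yes", "data-sensitivity-level": "Severe"}))
```

Rows that come back with errors can be routed to a person instead of being written automatically.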

##### <font color='#4285f4'>Gemini Prompts</font>

In [None]:
def CreateGeminiPrompt_TableSample_Python_Results(project_id, dataset_name, table_name, sample_data_results_json, metadata_info, model):
  table_schema = GetTableSchema(project_id, dataset_name, table_name)
  response_schema = {
    "type": "object",
    "required": [
      "explanation",
      "generated_python_code",
    ],
    "properties": {
      "explanation": {
        "type": "string",
        "format": "text"
      },
      "generated_python_code": {
        "type": "string",
        "format": "text"
      }
    }
  }

  dataGovernanceAspectId = "data-governance-aspect-type"
  governedTableEntryTypeId = "governed-table"
  governedTableEntryTypeLocation = "global"
  entryGroupLocation = "us"  # This has to be "us" since our tables are US multi-region
  bigqueryProjectId = project_id
  bigqueryDataset = dataset_name
  bigqueryTable = table_name

  aspect_type_table_sample_code = """aspects = {
      f"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}": {
          "data": {
          "data-steward": "Jane Doe",
          "owner-group": "data-governance-team",
          "business-owner": "Jane Doe",
          "documentation-url": "http://yourcompany.com/customer-table-documentation",
          "data-lifecycle": "Dev",
          "classification-level": "Restricted",
          "data-sensitivity-level": "High",
          "contains-pii": True
          }}
  }"""

  aspect_type_column_sample_code = """aspects[f"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}"] = {
        "data": {
            "contains-pii": True,
            "pii-type": pii_info["pii_type"],
            "data-sensitivity-level": "High",
            "compliance-requirements": ["GDPR", "CCPA"]
        }}"""

  aspect_type_table_defination = """  {
      "name": "data-governance-aspect-type-metadataTemplate",
      "type": "record",
      "recordFields": [
          {
              "name": "data-steward",
              "type": "string",
              "annotations": {
                  "displayName": "Data Steward",
                  "description": "The name or ID of the data steward responsible for this table."
              },
              "index": 1,
              "constraints": {
                  "required": True
              }
          },
          {
              "name": "owner-group",
              "type": "string",
              "annotations": {
                  "displayName": "Owner Group",
                  "description": "The IAM group or team responsible for this table."
              },
              "index": 2,
              "constraints": {
                  "required": True
              }
          },
          {
              "name": "business-owner",
              "type": "string",
              "annotations": {
                  "displayName": "Business Owner",
                  "description": "Name of the owner or contact for the data asset"
              },
              "index": 3,
              "constraints": {
                  "required": True
              }
          },
          {
              "name": "documentation-url",
              "type": "string",
              "annotations": {
                  "displayName": "Documentation URL",
                  "description": "URL to documentation about the table, including access, usage, etc."
              },
              "index": 4,
              "constraints": {
                  "required": False
              }
          },
          {
              "name": "data-lifecycle",
              "type": "enum",
              "annotations": {
                  "displayName": "Data Lifecycle",
                  "description": "The lifecycle stage of the asset (Dev, Test, QA, Production, Deprecated)"
              },
              "index": 5,
              "constraints": {
                  "required": True
              },
              "enumValues": [
                  {
                      "name": "Dev",
                      "index": 1
                  },
                  {
                      "name": "Test",
                      "index": 2
                  },
                  {
                      "name": "QA",
                      "index": 3
                  },
                  {
                      "name": "Production",
                      "index": 4
                  },
                  {
                      "name": "Deprecated",
                      "index": 6
                  },
              ]
          },
          {
              "name": "classification-level",
              "type": "enum",
              "annotations": {
                  "displayName": "Classification Level",
                  "description": "Indicates the sensitivity and access restrictions for this data asset (Public, Internal, Confidential, Restricted)."
              },
              "index": 6,
              "constraints": {
                  "required": True
              },
              "enumValues": [
                  {
                      "name": "Public",
                      "index": 1
                  },
                  {
                      "name": "Internal",
                      "index": 2
                  },
                  {
                      "name": "Confidential",
                      "index": 3
                  },
                  {
                      "name": "Restricted",
                      "index": 4
                  },
              ]
          },
          {
              "name": "data-sensitivity-level",
              "type": "enum",
              "annotations": {
                  "displayName": "Data Sensitivity Level",
                  "description": "The general sensitivity classification of the table. (Low, Medium, High, Critical)"
              },
              "index": 7,
              "constraints": {
                  "required": True
              },
              "enumValues": [
                  {
                      "name": "Low",
                      "index": 1
                  },
                  {
                      "name": "Medium",
                      "index": 2
                  },
                  {
                      "name": "High",
                      "index": 3
                  },
                  {
                      "name": "Critical",
                      "index": 4
                  },
              ]
          },
          {
              "name": "contains-pii",
              "type": "bool",
              "annotations": {
                  "displayName": "Contains PII",
                  "description": "Indicates if this table contains any Personally Identifiable Information (PII)."
              },
              "index": 8,
              "constraints": {
                  "required": True
              }
          }
      ]
  }
  """

  aspect_type_column_defination = {
    "name": "data-sensitivity-aspect-type-metadataTemplate",
    "type": "record",
    "recordFields": [
        {
            "name": "contains-pii",
            "type": "bool",
            "annotations": {
                "displayName": "Contains PII",
                "description": "Indicates if this column contains any Personally Identifiable Information (PII)."
            },
            "index": 1,
            "constraints": {
                "required": True
            }
        },
        {
            "name": "pii-type",
            "type": "string",
            "annotations": {
                "displayName": "PII Type",
                "description": "The type of PII contained within this column (e.g., Name, Email, Phone Number, etc.)."
            },
            "index": 2,
            "constraints": {
                "required": False
            }
        },
        {
            "name": "data-sensitivity-level",
            "type": "enum",
           "annotations": {
                "displayName": "Data Sensitivity Level",
                "description": "The sensitivity level of the data for data masking or other protection needs. (Low, Medium, High, Critical)"
            },
            "index": 3,
            "constraints": {
                "required": True
            },
            "enumValues": [
                {
                  "name": "Low",
                  "index": 1
                },
                {
                  "name": "Medium",
                  "index": 2
                },
                {
                  "name": "High",
                  "index": 3
                },
                {
                  "name": "Critical",
                  "index": 4
                },
            ]
        },
        {
            "name": "compliance-requirements",
            "type": "array",
            "arrayItems":
                {
                "name": "compliance-requirements-metadata-template",
                "type": "string"
                },
            "annotations": {
                "displayName": "Compliance Requirements",
                "description": "List of regulations that are relevant to this column (e.g., GDPR, CCPA, HIPAA)."
            },
            "index": 4,
            "constraints": {
                "required": False
            }
        }
    ]
}

  prompt = f"""You are a data governance expert and need to create data governance aspect tags on a BigQuery table and each column.
  Sample data has been provided about the table.
  Generate the Python code to set the aspects for the table.
  Generate the Python code to set the aspects for each and every column in the table.
  Do not generate additional code comments.
  Only generate the code in the code-to-generate xml.
  The function updateDataplexSystemEntry_BigQueryTable has already been defined.
  Follow the sample code exactly.  Do not create loops or additional variables.

  Use the following for attributes not related to data:
  Check the metadata hint for other details.
  data-steward: Customer tables should use "Customer Service".  Order tables should use "Order Service".  Use "REPLACE-ME" if you cannot determine.
  owner-group: Customer tables should use "Customer Group".  Order tables should use "Order Group".  Use "REPLACE-ME" if you cannot determine.
  business-owner: Customer tables should use "Customer Business Owner".  Order tables should use "Order Business Owner".  Use "REPLACE-ME" if you cannot determine.
  documentation-url: All documentation is at http://www.mycompany.com/"replace-table-name.pdf"
  data-lifecycle: If the project id is "governed-data-umyps6maku" we are in Production; otherwise, use Dev.
  classification-level: None of this data is Public so use the provided Aspect Choices.
  data-sensitivity-level: Low has no sensitive data. Medium has some information like addresses. High contains email addresses or dates of birth. Critical is SSNs, passports, credit cards.
  contains-pii: True if you detect items like: PII, GDPR or The California Consumer Privacy Act (CCPA).

  Metadata hint (a hint to what the table might be about):
  {metadata_info}

  <table_schema>
  {table_schema}
  </table_schema>

  <sample_table_data>
  {sample_data_results_json}
  </sample_table_data>

  <aspect-type-table-definition>
  {aspect_type_table_defination}
  </aspect-type-table-definition>

  <aspect-type-column-definition>
  {aspect_type_column_defination}
  </aspect-type-column-definition>

  <code-to-generate>
  project_id = "{project_id}"
  entryGroupLocation = "{entryGroupLocation}"
  bigqueryProjectId = "{bigqueryProjectId}"
  bigqueryDataset = "{bigqueryDataset}"
  bigqueryTable = "{bigqueryTable}"
  governedTableEntryTypeId = "{governedTableEntryTypeId}"
  governedTableEntryTypeLocation = "{governedTableEntryTypeLocation}"

  # Sample Code to set Aspect Type on the table.
  {aspect_type_table_sample_code}

  # Sample Code to set Aspect Type for each column in the table.
  {aspect_type_column_sample_code}

  # This will assign the aspect to the table and for all the columns
  updateDataplexSystemEntry_BigQueryTable(project_id,
                                          entryGroupLocation,
                                          bigqueryProjectId, bigqueryDataset, bigqueryTable,
                                          governedTableEntryTypeId, governedTableEntryTypeLocation,
                                          aspects)

  </code-to-generate>

  """

  gemini_json = GeminiLLM(prompt, model=model, response_schema=response_schema)
  #print(gemini_json)
  gemini_dict = json.loads(gemini_json)
  return gemini_dict

##### <font color='#4285f4'>Generate Auto Data Governance</font>

In [None]:
# Customer table (we could loop through all tables, but this will just show one)
model = "gemini-2.5-pro-preview-05-06" # "gemini-2.5-flash-preview-04-17" # You can test other models for speed or accuracy

generated_python_code = CreateGeminiPrompt_TableSample_Python_Results(tables_to_create_aspects[0]["project_id"],
                                                                      tables_to_create_aspects[0]["dataset_name"],
                                                                      tables_to_create_aspects[0]["table_name"],
                                                                      tables_to_create_aspects[0]["sample_data_json"],
                                                                      "This table holds customer data for our order system",
                                                                      model)
print(generated_python_code["explanation"])
print("############# Begin Generated Code #############")
print(generated_python_code["generated_python_code"])

In [None]:
# Product table (we could loop through all tables, but this will just show one)
model = "gemini-2.5-pro-preview-05-06" # "gemini-2.5-flash-preview-04-17" # You can test other models for speed or accuracy

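# Look the table up by name: with ORDER BY table_name, index 1 is customer_transaction, not product
product_table = next(t for t in tables_to_create_aspects if t["table_name"] == "product")

generated_python_code = CreateGeminiPrompt_TableSample_Python_Results(product_table["project_id"],
                                                                      product_table["dataset_name"],
                                                                      product_table["table_name"],
                                                                      product_table["sample_data_json"],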
                                                                      "This table holds product data for our order system",
                                                                      model)

print(generated_python_code["explanation"])
print("############# Begin Generated Code #############")
print(generated_python_code["generated_python_code"])

### <font color='#4285f4'>Automated Data Governance - Aspect Types (Gemini Only) - Json Structured for Code</font>

Full code generation can produce imperfect code. Since we already know which methods we need to call, we only ask Gemini for the values.

This also keeps the prompt smaller, because the response schema carries much of the logic.
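Even with a response schema, it can be worth re-checking the returned JSON before anything is written to Dataplex. The sketch below is a hypothetical helper (`validate_governance_result` is not a library API). It assumes the field names and enum values from the response schema defined in the next cell and only checks required keys and enum membership, not whether the classification itself is correct.

```python
# Allowed values copied from the response_schema enums in this notebook.
ALLOWED_CLASSIFICATION = {"Public", "Internal", "Confidential", "Restricted"}
ALLOWED_SENSITIVITY = {"Low", "Medium", "High", "Critical"}


def _check_object(obj, label, required, enums):
    """Collect missing-key and out-of-enum problems for a single JSON object."""
    problems = []
    for key in required:
        if key not in obj:
            problems.append(f"{label}: missing '{key}'")
    for key, allowed in enums.items():
        if key in obj and obj[key] not in allowed:
            problems.append(f"{label}: '{key}' has unexpected value {obj[key]!r}")
    return problems


def validate_governance_result(result):
    """Return a list of problems; an empty list means the structure looks usable."""
    table = result.get("table")
    columns = result.get("columns")
    if not isinstance(table, dict) or not isinstance(columns, list):
        return ["result must contain a 'table' object and a 'columns' array"]

    problems = _check_object(
        table, "table",
        required=("classification-level", "data-sensitivity-level", "contains-pii"),
        enums={"classification-level": ALLOWED_CLASSIFICATION,
               "data-sensitivity-level": ALLOWED_SENSITIVITY},
    )
    for column in columns:
        problems += _check_object(
            column, f"column {column.get('column-name', '?')}",
            required=("column-name", "contains-pii", "data-sensitivity-level"),
            enums={"data-sensitivity-level": ALLOWED_SENSITIVITY},
        )
    return problems


# Usage with a hand-written example (not real Gemini output).
sample = {
    "table": {"classification-level": "Confidential",
              "data-sensitivity-level": "High", "contains-pii": True},
    "columns": [
        {"column-name": "email", "contains-pii": True, "data-sensitivity-level": "High"},
        {"column-name": "zip", "contains-pii": False, "data-sensitivity-level": "Medium-ish"},
    ],
}
print(validate_governance_result(sample))  # reports the out-of-enum value on zip
```

A non-empty list is a good signal to route the table to a human (or a second Gemini check) instead of upserting automatically.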

We then loop through the results and upsert the values. Before writing, we can read the existing aspects and preserve values that people maintain by hand, such as data steward and data owner.
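A minimal sketch of the preserve-then-upsert idea, assuming `existing_data` is the `data` object of an aspect read from Dataplex (shaped like the sample `entries.get` response at the end of this notebook) and `generated_data` holds Gemini's values. `merge_aspect_data` and `PRESERVED_FIELDS` are hypothetical names; the preserved keys mirror the commented-out fields of the data governance aspect used later in this notebook.

```python
# Hand-maintained fields that automation should not overwrite (hypothetical list;
# it mirrors the commented-out keys of the data governance aspect in this notebook).
PRESERVED_FIELDS = ("data-steward", "owner-group", "business-owner",
                    "documentation-url", "data-lifecycle")


def merge_aspect_data(existing_data, generated_data, preserved_fields=PRESERVED_FIELDS):
    """Overlay generated values on existing aspect data without clobbering
    preserved fields that already have a value."""
    merged = dict(existing_data or {})  # copy so the caller's dict is untouched
    for key, value in generated_data.items():
        if key in preserved_fields and key in merged:
            continue  # keep the human-curated value
        if value is None:
            continue  # don't write nulls for optional fields Gemini left empty
        merged[key] = value
    return merged


existing = {"data-steward": "Jane Doe", "owner-group": "data-governance-team",
            "classification-level": "Internal", "contains-pii": False}
generated = {"classification-level": "Restricted", "data-sensitivity-level": "High",
             "contains-pii": True, "data-steward": "Gemini guess"}
print(merge_aspect_data(existing, generated))
```

In this sketch a preserved field is only protected when it already has a value, so automation can still fill an empty one; drop the `key in merged` check if those fields should never be written by automation.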

##### <font color='#4285f4'>Gemini Prompts</font>

In [None]:
def CreateGeminiPrompt_TableSample_Json_Results(project_id, dataset_name, table_name, sample_data_results_json, metadata_info, model):
  table_schema = GetTableSchema(project_id, dataset_name, table_name)
  response_schema = {
  "type": "object",
  "properties": {
    "table": {
      "type": "object",
      "description": "Metadata about the table.",
      "properties": {
        "classification-level": {
          "type": "string",
          "description": "Indicates the sensitivity and access restrictions for this data asset (Public, Internal, Confidential, Restricted).",
          "enum": [
            "Public",
            "Internal",
            "Confidential",
            "Restricted"
          ],
          "example": "Confidential"
        },
        "data-sensitivity-level": {
          "type": "string",
          "description": "The general sensitivity classification of the table. (Low, Medium, High, Critical).",
          "enum": [
            "Low",
            "Medium",
            "High",
            "Critical"
          ],
          "example": "High"
        },
        "contains-pii": {
          "type": "boolean",
          "description": "Indicates if this table contains any Personally Identifiable Information (PII).",
          "example": True
        }
      },
      "required": [
        "classification-level",
        "data-sensitivity-level",
        "contains-pii"
      ]
    },
    "columns": {
      "type": "array",
      "description": "An array of objects, each describing a column in the table and its data sensitivity attributes.",
      "items": {
        "type": "object",
        "properties": {
          "column-name": {
            "type": "string",
            "description": "The name of the column.",
            "example": "email_address"
          },
          "contains-pii": {
            "type": "boolean",
            "description": "Indicates if this column contains any Personally Identifiable Information (PII).",
            "example": True
          },
          "pii-type": {
            "type": "string",
            "description": "The type of PII contained within this column (e.g., Name, Email, Phone Number, etc.). This field is relevant only if 'contains-pii' is true.",
            "nullable": True,
            "example": "Email Address"
          },
          "data-sensitivity-level": {
            "type": "string",
            "description": "The sensitivity level of the data for data masking or other protection needs. (Low, Medium, High, Critical).",
            "enum": [
                "Low",
                "Medium",
                "High",
                "Critical"
            ],
            "example": "High"
          },
          "compliance-requirements": {
            "type": "array",
            "description": "List of regulations that are relevant to this column (e.g., GDPR, CCPA, HIPAA).",
            "items": {
              "type": "string",
              "example": "GDPR"
            },
            "nullable": True,
            "example": ["GDPR", "CCPA"]
          }
        },
        "required": [
          "column-name",
          "contains-pii",
          "data-sensitivity-level"
        ]
      }
    }
  },
  "required": [
    "table",
    "columns"
  ]
}

  prompt = f"""You are a data governance expert and need to create data governance aspect tags on a BigQuery table and each column.
  Sample data has been provided about the table.

  Metadata hint (a hint to what the table might be about):
  {metadata_info}

  <table_schema>
  {table_schema}
  </table_schema>

  <sample_table_data>
  {sample_data_results_json}
  </sample_table_data>
  """

  gemini_json = GeminiLLM(prompt, model=model, response_schema=response_schema)
  print(gemini_json)
  gemini_dict = json.loads(gemini_json)
  return gemini_dict

##### <font color='#4285f4'>Generate Auto Data Governance</font>

In [None]:
# Customer table (we could loop through all tables, but this will just show one)
model = "gemini-2.5-pro-preview-05-06" # "gemini-2.5-flash-preview-04-17" # You can test other models for speed or accuracy

generated_json_results = CreateGeminiPrompt_TableSample_Json_Results(tables_to_create_aspects[0]["project_id"],
                                                                     tables_to_create_aspects[0]["dataset_name"],
                                                                     tables_to_create_aspects[0]["table_name"],
                                                                     tables_to_create_aspects[0]["sample_data_json"],
                                                                     "This table holds customer data for our order system",
                                                                     model)

print(json.dumps(generated_json_results, indent=2))

In [None]:
# Product table (we could loop through all tables, but this will just show one)
model = "gemini-2.5-pro-preview-05-06" # "gemini-2.5-flash-preview-04-17" # You can test other models for speed or accuracy

generated_json_results = CreateGeminiPrompt_TableSample_Json_Results(tables_to_create_aspects[1]["project_id"],
                                                                     tables_to_create_aspects[1]["dataset_name"],
                                                                     tables_to_create_aspects[1]["table_name"],
                                                                     tables_to_create_aspects[1]["sample_data_json"],
                                                                     "This table holds product data for our order system",
                                                                     model)

print(json.dumps(generated_json_results, indent=2))

### <font color='#4285f4'>Automated Data Governance - Aspect Types (Using Sensitive Data Protection Results)</font>

This approach queries the output of the Sensitive Data Protection (SDP) scan and passes those findings to Gemini, together with the table schema and sample data, to determine the data sensitivity levels.

You can then loop through the results and update the aspects on your tables.
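Not every column appears in the scan findings, so it can help to know which columns the scan covered before prompting. This sketch assumes each SDP finding has already been flattened into a dict with hypothetical `column` and `info_type` keys. In the real findings table these values live in nested records (for example `info_type.name`, as used in the query below), and the exact path to the column name depends on your findings table schema, so adjust the extraction accordingly. The info type names in the usage example are illustrative inputs, not scan results.

```python
from collections import defaultdict


def summarize_findings_by_column(findings):
    """Group flattened SDP findings into {column: sorted list of distinct info types}."""
    by_column = defaultdict(set)
    for finding in findings:
        by_column[finding["column"]].add(finding["info_type"])
    return {column: sorted(types) for column, types in sorted(by_column.items())}


def columns_without_findings(schema_columns, summary):
    """Columns the scan reported nothing for; these fall back to the sample-data judgement."""
    return [column for column in schema_columns if column not in summary]


# Illustrative, hand-written findings (hypothetical flattened shape).
findings = [
    {"column": "email", "info_type": "EMAIL_ADDRESS"},
    {"column": "ssn", "info_type": "US_SOCIAL_SECURITY_NUMBER"},
    {"column": "email", "info_type": "EMAIL_ADDRESS"},
    {"column": "first_name", "info_type": "PERSON_NAME"},
]
summary = summarize_findings_by_column(findings)
print(summary)
print(columns_without_findings(["customer_id", "first_name", "email", "ssn", "zip"], summary))
```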

##### <font color='#4285f4'>Gemini Prompts</font>

In [None]:
def CreateGeminiPrompt_SDP_Scan_Json_Results(project_id, dataset_name, table_name, sample_data_results_json, metadata_info, model):
  table_schema = GetTableSchema(project_id, dataset_name, table_name)
  response_schema = {
  "type": "object",
  "properties": {
    "table": {
      "type": "object",
      "description": "Metadata about the table.",
      "properties": {
        "explanation": {
          "type": "string",
          "description": "Your reasoning for setting specific values. State whether you used the sensitive data protection information in your decision.",
        },
        "classification-level": {
          "type": "string",
          "description": "Indicates the sensitivity and access restrictions for this data asset (Public, Internal, Confidential, Restricted).",
          "enum": [
            "Public",
            "Internal",
            "Confidential",
            "Restricted"
          ],
          "example": "Confidential"
        },
        "data-sensitivity-level": {
          "type": "string",
          "description": "The general sensitivity classification of the table. (Low, Medium, High, Critical).",
          "enum": [
            "Low",
            "Medium",
            "High",
            "Critical"
          ],
          "example": "High"
        },
        "contains-pii": {
          "type": "boolean",
          "description": "Indicates if this table contains any Personally Identifiable Information (PII).",
          "example": True
        }
      },
      "required": [
        "explanation",
        "classification-level",
        "data-sensitivity-level",
        "contains-pii"
      ]
    },
    "columns": {
      "type": "array",
      "description": "An array of objects, each describing a column in the table and its data sensitivity attributes.",
      "items": {
        "type": "object",
        "properties": {
          "column-name": {
            "type": "string",
            "description": "The name of the column.",
            "example": "email_address"
          },
          "explanation": {
            "type": "string",
            "description": "Your reasoning for setting specific values. State whether you used the sensitive data protection information in your decision.",
          },
          "contains-pii": {
            "type": "boolean",
            "description": "Indicates if this column contains any Personally Identifiable Information (PII).",
            "example": True
          },
          "pii-type": {
            "type": "string",
            "description": "The type of PII contained within this column (e.g., Name, Email, Phone Number, etc.). This field is relevant only if 'contains-pii' is true.",
            "nullable": True,
            "example": "Email Address"
          },
          "data-sensitivity-level": {
            "type": "string",
            "description": "The sensitivity level of the data for data masking or other protection needs. (Low, Medium, High, Critical).",
            "enum": [
                "Low",
                "Medium",
                "High",
                "Critical"
            ],
            "example": "High"
          },
          "compliance-requirements": {
            "type": "array",
            "description": "List of regulations that are relevant to this column (e.g., GDPR, CCPA, HIPAA).",
            "items": {
              "type": "string",
              "example": "GDPR"
            },
            "nullable": True,
            "example": ["GDPR", "CCPA"]
          }
        },
        "required": [
          "column-name",
          "explanation",
          "contains-pii",
          "data-sensitivity-level"
        ]
      }
    }
  },
  "required": [
    "table",
    "columns"
  ]
}

  sdp_dataset_name = "governed_data_sdp_scan"
  sql = f"""SELECT * EXCEPT (ranking)
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY info_type.name ORDER BY create_time.timestamp DESC) AS ranking FROM `{sdp_dataset_name}.{table_name}`)
  WHERE ranking = 1;"""
  sdp_data_results = RunQuery(sql)
  sdp_data_results_json = sdp_data_results.to_json(orient='records')
  #print (sdp_data_results_json)

  prompt = f"""You are a data governance expert and need to create data governance aspect tags on a BigQuery table and each column.
  Sample data has been provided about the table.
  A sensitive data protection scan was done on this table.
  The sensitive data protection scan results have been included.
  If sensitive data protection scan results exist for a column, use them to determine the aspect type attributes for that column.

  Metadata hint (a hint to what the table might be about):
  {metadata_info}

  <table_schema>
  {table_schema}
  </table_schema>

  <sample_table_data>
  {sample_data_results_json}
  </sample_table_data>

  <sensitive_data_protection_scan_results>
  {sdp_data_results_json}
  </sensitive_data_protection_scan_results>
  """

  gemini_json = GeminiLLM(prompt, model=model, response_schema=response_schema)
  print(gemini_json)
  gemini_dict = json.loads(gemini_json)
  return gemini_dict

##### <font color='#4285f4'>Generate Auto Data Governance</font>

In [None]:
# Customer table (we could loop through all tables, but this will just show one)
model = "gemini-2.5-pro-preview-05-06" # "gemini-2.5-flash-preview-04-17" # You can test other models for speed or accuracy

generated_json_results = CreateGeminiPrompt_SDP_Scan_Json_Results(tables_to_create_aspects[0]["project_id"],
                                                                  tables_to_create_aspects[0]["dataset_name"],
                                                                  tables_to_create_aspects[0]["table_name"],
                                                                  tables_to_create_aspects[0]["sample_data_json"],
                                                                  "This table holds customer data for our order system",
                                                                  model)

saved_for_code_example_generated_json_results = generated_json_results # Used below to generate code

print(json.dumps(generated_json_results, indent=2))

In [None]:
# Product table (we could loop through all tables, but this will just show one)
model = "gemini-2.5-pro-preview-05-06" # "gemini-2.5-flash-preview-04-17" # You can test other models for speed or accuracy

generated_json_results = CreateGeminiPrompt_SDP_Scan_Json_Results(tables_to_create_aspects[1]["project_id"],
                                                                  tables_to_create_aspects[1]["dataset_name"],
                                                                  tables_to_create_aspects[1]["table_name"],
                                                                  tables_to_create_aspects[1]["sample_data_json"],
                                                                  "This table holds product data for our order system",
                                                                  model)

print(json.dumps(generated_json_results, indent=2))

### <font color='#4285f4'>Automated Data Governance - Sample Code Generation from Sensitive Data Protection Json Results</font>

In [None]:
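entryGroupLocation = "us"
# Target the same table that produced saved_for_code_example_generated_json_results (the customer table above)
bigqueryProjectId = tables_to_create_aspects[0]["project_id"]
bigqueryDataset = tables_to_create_aspects[0]["dataset_name"]
bigqueryTable = tables_to_create_aspects[0]["table_name"]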
governedTableEntryTypeId = "governed-table"
governedTableEntryTypeLocation = "global"
dataGovernanceAspectId = "data-governance-aspect-type-metadataTemplate"
dataSensitivityAspectId = "data-sensitivity-aspect-type-metadataTemplate"
governedColumnEntryTypeLocation = "global"

aspects = {
    f"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}": {
        "data": {
            # "data-steward": "*** You would want to preserve your existing value. ***",
            # "owner-group": "*** You would want to preserve your existing value. ***",
            # "business-owner": "*** You would want to preserve your existing value. ***",
            # "documentation-url": "*** You would want to preserve your existing value. ***",
            # "data-lifecycle": "*** You would want to preserve your existing value. ***",
            "classification-level": saved_for_code_example_generated_json_results["table"]["classification-level"],
            "data-sensitivity-level": saved_for_code_example_generated_json_results["table"]["data-sensitivity-level"],
            "contains-pii": saved_for_code_example_generated_json_results["table"]["contains-pii"]
        }
    }
}

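# "pii-type" and "compliance-requirements" are optional in the response schema,
# so read them with .get() and omit empty values instead of raising a KeyError.
for item in saved_for_code_example_generated_json_results["columns"]:
  column_data = {
      "contains-pii": item["contains-pii"],
      "pii-type": item.get("pii-type"),
      "data-sensitivity-level": item["data-sensitivity-level"],
      "compliance-requirements": item.get("compliance-requirements"),
  }
  aspects[f"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{item['column-name']}"] = {
      "data": {key: value for key, value in column_data.items() if value is not None}
  }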

print("######################################################################")
print("Generated Aspects")
print("######################################################################")
print(json.dumps(aspects, indent=2))

print()
print("######################################################################")
print("You would then run this code from notebook: 04-Data-Governance")
print("######################################################################")
print("""updateDataplexSystemEntry_BigQueryTable(project_id,
                                        entryGroupLocation,
                                        bigqueryProjectId, bigqueryDataset, bigqueryTable,
                                        governedTableEntryTypeId, governedTableEntryTypeLocation,
                                        aspects)""")

### <font color='#4285f4'>Sample Get Aspects for BigQuery Table</font>

https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.entryGroups.entries/get

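You will need to replace the following placeholders:
- `MY-PROJECT-ID`: your Google Cloud project ID
- `MY-BIGQUERY-LOCATION`: the BigQuery dataset location, e.g. `us`
- `MY-DATASET`: the dataset name, e.g. `governed_data_raw`
- `MY-TABLE`: the table name, e.g. `customer`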

```
curl -H 'Content-Type: application/json' -H "Authorization: Bearer $(gcloud -q auth print-access-token)" \
https://dataplex.googleapis.com/v1/projects/MY-PROJECT-ID/locations/MY-BIGQUERY-LOCATION/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/MY-PROJECT-ID/datasets/MY-DATASET/tables/MY-TABLE?view=ALL
```
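If you prefer to call the API from Python rather than curl, the URL can be assembled the same way. `bigquery_entry_url` is a hypothetical helper that reproduces the curl URL above; like the curl example, it assumes the Dataplex project and the BigQuery project are the same.

```python
def bigquery_entry_url(project_id, location, dataset, table, view="ALL"):
    """Build the Dataplex entries.get URL for a BigQuery table, mirroring the curl example.

    Assumes the Dataplex project and the BigQuery project are the same, as the curl does.
    """
    entry_id = (f"bigquery.googleapis.com/projects/{project_id}"
                f"/datasets/{dataset}/tables/{table}")
    return (f"https://dataplex.googleapis.com/v1/projects/{project_id}"
            f"/locations/{location}/entryGroups/@bigquery/entries/{entry_id}"
            f"?view={view}")


print(bigquery_entry_url("my-project", "us", "governed_data_raw", "customer"))
```

The request still needs an OAuth bearer token, as in the curl command. One option is `AuthorizedSession` from the `google-auth` library, created with credentials from `google.auth.default()`.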

Sample Return JSON
```
{
  "name": "projects/MY-PROJECT/locations/us/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/MY-PROJECT/datasets/governed_data_raw/tables/customer",
  "entryType": "projects/00000000000/locations/global/entryTypes/bigquery-table",
  "createTime": "2025-02-11T14:14:06.261127Z",
  "updateTime": "2025-03-25T18:36:49.648322Z",
  "aspects": {
    "00000000000.global.data-domain-aspect-type": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-domain-aspect-type",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "zone": "Raw"
      },
      "aspectSource": {}
    },
    "00000000000.global.data-governance-aspect-type": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-governance-aspect-type",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "data-steward": "Jane Doe",
        "owner-group": "data-governance-team",
        "business-owner": "Jane Doe",
        "documentation-url": "http://yourcompany.com/customer-table-documentation",
        "data-lifecycle": "Dev",
        "classification-level": "Restricted",
        "data-sensitivity-level": "High",
        "contains-pii": true
      },
      "aspectSource": {}
    },
    "00000000000.global.data-retention-aspect-type": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-retention-aspect-type",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "retention-days": 365,
        "retention-policy": "http://yourcompany.com/retention-policy"
      },
      "aspectSource": {}
    },
    "00000000000.global.my-aspect-type": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/my-aspect-type",
      "createTime": "2025-02-11T14:35:25.359859Z",
      "updateTime": "2025-02-11T14:35:25.359859Z",
      "data": {
        "type": "VIEW"
      },
      "aspectSource": {}
    },
    "00000000000.global.bigquery-table": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/bigquery-table",
      "createTime": "2025-02-11T14:14:06.261127Z",
      "updateTime": "2025-02-12T22:16:06.170015Z",
      "data": {
        "type": "TABLE",
        "tableType": "TABLE"
      },
      "aspectSource": {
        "createTime": "2025-02-11T14:14:04.298Z",
        "updateTime": "2025-02-12T22:16:05.825Z",
        "dataVersion": "Ingestion/1.0.0"
      }
    },
    "00000000000.global.contacts": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/contacts",
      "createTime": "2025-02-11T14:35:25.963924Z",
      "updateTime": "2025-03-25T18:36:49.613151Z",
      "data": {
        "identities": [
          {
            "role": "Data Steward",
            "name": "Jane Doe"
          },
          {
            "role": "Owner",
            "name": "Data Team"
          }
        ]
      },
      "aspectSource": {}
    },
    "00000000000.global.overview": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/overview",
      "createTime": "2025-02-11T14:35:25.963924Z",
      "updateTime": "2025-03-25T18:36:49.613151Z",
      "data": {
        "content": "This is customer master data and contains PII."
      },
      "aspectSource": {}
    },
    "00000000000.global.schema": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/schema",
      "createTime": "2025-02-11T14:14:06.261127Z",
      "updateTime": "2025-02-12T22:16:06.170015Z",
      "data": {
        "fields": [
          {
            "name": "customer_id",
            "description": "Unique identifier for the customer.",
            "dataType": "INTEGER",
            "metadataType": "NUMBER",
            "mode": "NULLABLE"
          },
          {
            "name": "first_name",
            "description": "The first name of the customer.",
            "dataType": "STRING",
            "metadataType": "STRING",
            "mode": "NULLABLE"
          },
          {
            "name": "last_name",
            "description": "The last name of the customer.",
            "dataType": "STRING",
            "metadataType": "STRING",
            "mode": "NULLABLE"
          },
          {
            "name": "email",
            "description": "The email address of the customer.",
            "dataType": "STRING",
            "metadataType": "STRING",
            "mode": "NULLABLE"
          },
          {
            "name": "phone",
            "description": "The phone number of the customer.",
            "dataType": "STRING",
            "metadataType": "STRING",
            "mode": "NULLABLE"
          },
          {
            "name": "gender",
            "description": "The gender of the customer.",
            "dataType": "STRING",
            "metadataType": "STRING",
            "mode": "NULLABLE"
          },
          {
            "name": "ip_address",
            "description": "The IP address of the customer.",
            "dataType": "STRING",
            "metadataType": "STRING",
            "mode": "NULLABLE"
          },
          {
            "name": "ssn",
            "description": "The Social Security Number of the customer.",
            "dataType": "STRING",
            "metadataType": "STRING",
            "mode": "NULLABLE"
          },
          {
            "name": "address",
            "description": "The street address of the customer.",
            "dataType": "STRING",
            "metadataType": "STRING",
            "mode": "NULLABLE"
          },
          {
            "name": "city",
            "description": "The city of the customer.",
            "dataType": "STRING",
            "metadataType": "STRING",
            "mode": "NULLABLE"
          },
          {
            "name": "state",
            "description": "The state of the customer.",
            "dataType": "STRING",
            "metadataType": "STRING",
            "mode": "NULLABLE"
          },
          {
            "name": "zip",
            "description": "The zip code of the customer.",
            "dataType": "INTEGER",
            "metadataType": "NUMBER",
            "mode": "NULLABLE"
          }
        ]
      },
      "aspectSource": {
        "createTime": "2025-02-11T14:14:04.298Z",
        "updateTime": "2025-02-12T22:16:05.825Z",
        "dataVersion": "Ingestion/1.0.0"
      }
    },
    "00000000000.global.storage": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/storage",
      "createTime": "2025-02-11T14:14:06.261127Z",
      "updateTime": "2025-02-12T22:16:06.170015Z",
      "data": {
        "service": "BIGQUERY",
        "resourceName": "//bigquery.googleapis.com/projects/MY-PROJECT/datasets/governed_data_raw/tables/customer"
      },
      "aspectSource": {
        "createTime": "2025-02-11T14:14:04.298Z",
        "updateTime": "2025-02-12T22:16:05.825Z",
        "dataVersion": "Ingestion/1.0.0"
      }
    },
    "00000000000.global.usage": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/usage",
      "createTime": "2025-02-25T08:53:38.070737Z",
      "updateTime": "2025-03-16T09:15:37.503692Z",
      "data": {},
      "aspectSource": {
        "createTime": "2025-02-23T07:59:59.982Z",
        "updateTime": "2025-03-16T09:13:14.158097Z"
      }
    },
    "00000000000.global.data-sensitivity-aspect-type@Schema.city": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-sensitivity-aspect-type",
      "path": "Schema.city",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "contains-pii": false,
        "data-sensitivity-level": "Low"
      },
      "aspectSource": {}
    },
    "00000000000.global.data-sensitivity-aspect-type@Schema.zip": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-sensitivity-aspect-type",
      "path": "Schema.zip",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "contains-pii": false,
        "data-sensitivity-level": "Low"
      },
      "aspectSource": {}
    },
    "00000000000.global.data-sensitivity-aspect-type@Schema.customer_id": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-sensitivity-aspect-type",
      "path": "Schema.customer_id",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "contains-pii": false,
        "data-sensitivity-level": "Low"
      },
      "aspectSource": {}
    },
    "00000000000.global.data-sensitivity-aspect-type@Schema.ssn": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-sensitivity-aspect-type",
      "path": "Schema.ssn",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "contains-pii": true,
        "pii-type": "ssn",
        "data-sensitivity-level": "High",
        "compliance-requirements": [
          "GDPR",
          "CCPA"
        ]
      },
      "aspectSource": {}
    },
    "00000000000.global.data-sensitivity-aspect-type@Schema.email": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-sensitivity-aspect-type",
      "path": "Schema.email",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "contains-pii": true,
        "pii-type": "Email",
        "data-sensitivity-level": "High",
        "compliance-requirements": [
          "GDPR",
          "CCPA"
        ]
      },
      "aspectSource": {}
    },
    "00000000000.global.data-sensitivity-aspect-type@Schema.address": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-sensitivity-aspect-type",
      "path": "Schema.address",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "contains-pii": true,
        "pii-type": "Street Address",
        "data-sensitivity-level": "High",
        "compliance-requirements": [
          "GDPR",
          "CCPA"
        ]
      },
      "aspectSource": {}
    },
    "00000000000.global.data-sensitivity-aspect-type@Schema.ip_address": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-sensitivity-aspect-type",
      "path": "Schema.ip_address",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "contains-pii": true,
        "pii-type": "IP Address",
        "data-sensitivity-level": "High",
        "compliance-requirements": [
          "GDPR",
          "CCPA"
        ]
      },
      "aspectSource": {}
    },
    "00000000000.global.data-sensitivity-aspect-type@Schema.phone": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-sensitivity-aspect-type",
      "path": "Schema.phone",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "contains-pii": true,
        "pii-type": "Phone Number",
        "data-sensitivity-level": "High",
        "compliance-requirements": [
          "GDPR",
          "CCPA"
        ]
      },
      "aspectSource": {}
    },
    "00000000000.global.data-sensitivity-aspect-type@Schema.first_name": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-sensitivity-aspect-type",
      "path": "Schema.first_name",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "contains-pii": true,
        "pii-type": "Name",
        "data-sensitivity-level": "High",
        "compliance-requirements": [
          "GDPR",
          "CCPA"
        ]
      },
      "aspectSource": {}
    },
    "00000000000.global.data-sensitivity-aspect-type@Schema.last_name": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-sensitivity-aspect-type",
      "path": "Schema.last_name",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "contains-pii": true,
        "pii-type": "Name",
        "data-sensitivity-level": "High",
        "compliance-requirements": [
          "GDPR",
          "CCPA"
        ]
      },
      "aspectSource": {}
    },
    "00000000000.global.data-sensitivity-aspect-type@Schema.state": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-sensitivity-aspect-type",
      "path": "Schema.state",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "contains-pii": false,
        "data-sensitivity-level": "Low"
      },
      "aspectSource": {}
    },
    "00000000000.global.data-sensitivity-aspect-type@Schema.gender": {
      "aspectType": "projects/00000000000/locations/global/aspectTypes/data-sensitivity-aspect-type",
      "path": "Schema.gender",
      "createTime": "2025-02-11T14:35:27.271045Z",
      "updateTime": "2025-02-11T14:35:27.271045Z",
      "data": {
        "contains-pii": false,
        "data-sensitivity-level": "Low"
      },
      "aspectSource": {}
    }
  },
  "parentEntry": "projects/MY-PROJECT/locations/us/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/MY-PROJECT/datasets/governed_data_raw",
  "fullyQualifiedName": "bigquery:MY-PROJECT.governed_data_raw.customer",
  "entrySource": {
    "resource": "projects/MY-PROJECT/datasets/governed_data_raw/tables/customer",
    "system": "BIGQUERY",
    "platform": "GCP",
    "displayName": "customer",
    "description": "Table containing customer raw information.",
    "labels": {
      "dataplex-data-documentation-published-location": "us-central1",
      "dataplex-dp-published-scan": "governed-data-raw-customer-profile-scan",
      "dataplex-dp-published-project": "MY-PROJECT",
      "dataplex-dq-published-scan": "governed-data-raw-customer-quality-scan",
      "dataplex-dp-published-location": "us-central1",
      "dataplex-data-documentation-published-project": "MY-PROJECT",
      "dataplex-dq-published-project": "MY-PROJECT",
      "dataplex-dq-published-location": "us-central1",
      "dataplex-data-documentation-published-scan": "governed-data-raw-customer-insight-scan"
    },
    "ancestors": [
      {
        "name": "projects/MY-PROJECT/datasets/governed_data_raw",
        "type": "dataplex-types.global.bigquery-dataset"
      }
    ],
    "createTime": "2025-02-11T14:14:04.298Z",
    "updateTime": "2025-02-12T22:16:05.825Z",
    "location": "us"
  }
}
```
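To preserve existing values before an upsert, you need to pull them out of this response. A sketch, assuming the response has been parsed with `json.loads` into a dict shaped like the sample above: column-level aspects carry a `path` of the form `Schema.<column>`, while table-level aspects have no `path`. `column_aspects` and `table_aspect` are hypothetical helper names; pass whichever aspect type ID your project uses.

```python
def column_aspects(entry, aspect_type_id):
    """Return {column_name: aspect data} for column-level aspects of one aspect type."""
    result = {}
    for aspect in entry.get("aspects", {}).values():
        path = aspect.get("path", "")
        if not path.startswith("Schema."):
            continue  # table-level aspect
        # The API reports aspectType with the project number, so match on the suffix only.
        if not aspect.get("aspectType", "").endswith(f"/aspectTypes/{aspect_type_id}"):
            continue
        result[path[len("Schema."):]] = aspect.get("data", {})
    return result


def table_aspect(entry, aspect_type_id):
    """Return the data of the table-level aspect of this type, or None if it is absent."""
    for aspect in entry.get("aspects", {}).values():
        if "path" in aspect:
            continue  # column-level aspect
        if aspect.get("aspectType", "").endswith(f"/aspectTypes/{aspect_type_id}"):
            return aspect.get("data", {})
    return None


# A trimmed copy of the sample response above.
entry = {
    "aspects": {
        "00000000000.global.data-governance-aspect-type": {
            "aspectType": "projects/00000000000/locations/global/aspectTypes/data-governance-aspect-type",
            "data": {"data-steward": "Jane Doe", "classification-level": "Restricted"},
        },
        "00000000000.global.data-sensitivity-aspect-type@Schema.ssn": {
            "aspectType": "projects/00000000000/locations/global/aspectTypes/data-sensitivity-aspect-type",
            "path": "Schema.ssn",
            "data": {"contains-pii": True, "pii-type": "ssn", "data-sensitivity-level": "High"},
        },
        "00000000000.global.data-sensitivity-aspect-type@Schema.zip": {
            "aspectType": "projects/00000000000/locations/global/aspectTypes/data-sensitivity-aspect-type",
            "path": "Schema.zip",
            "data": {"contains-pii": False, "data-sensitivity-level": "Low"},
        },
    }
}
print(table_aspect(entry, "data-governance-aspect-type"))
print(column_aspects(entry, "data-sensitivity-aspect-type"))
```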

### <font color='#4285f4'>Clean Up</font>

In [None]:
# Placeholder

### <font color='#4285f4'>Reference Links</font>


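- [Dataplex REST API: projects.locations.entryGroups.entries.get](https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.entryGroups.entries/get)
- [Data Engineering Agents Demo notebook (GitHub)](https://github.com/GoogleCloudPlatform/data-analytics-golden-demo/blob/main/data-analytics-demos/data-engineering-agents/Data-Engineering-Agents-Demo.ipynb)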