### <font color='#4285f4'>Overview</font>

Overview: Demonstrates row-level and column-level security along with data masking in BigQuery.

Process Flow:
1.  Create the Taxonomy
2.  Create the Parent Policy
    a.  Create Child Policy Tags under the Parent Policy Tag (these are hierarchical, you can nest many levels)
        i.  Credit Card
        ii. Predicted Credit Amount
        iii. SSN
        iv. IP Address
        v.  Email
        vi. Address
    b.  Create Data Masking Policies
    c.  Assign Column Level Security and Data Masking Security (you should understand the rule hierarchy - [link](link_placeholder))
3.  Update each BigQuery table setting the policy tag on each column
4.  Run test queries (separate tables are used for demo purposes)
    a.  You can optionally use a separate user to see how security works for a second user.


Notes:
* Row-level security (RLS), column-level security (CLS) and data masking work on BigLake tables (CSV, Parquet, etc.), native tables and BigLake tables on BigQuery Omni

Cost:
* Approximate cost: less than $1

Author:
* Adam Paternostro

In [None]:
# Architecture Diagram
from IPython.display import Image
Image(url='https://storage.googleapis.com/data-analytics-golden-demo/colab-diagrams/BigQuery-Data-Governance-CLS-RLS-DM.png', width=1000)

### <font color='#4285f4'>Video Walkthrough</font>

[Video](https://storage.googleapis.com/REPLACE-ME.mp4)


In [None]:
from IPython.display import HTML

HTML("""
<video width="800" height="600" controls>
  <source src="https://storage.googleapis.com/REPLACE-ME.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
""")

### <font color='#4285f4'>License</font>

```
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```

### <font color='#4285f4'>Pip installs</font>

In [None]:
# PIP Installs (if necessary)
import sys

# !{sys.executable} -m pip install REPLACE-ME

### <font color='#4285f4'>Initialize</font>

In [None]:
from PIL import Image
from IPython.display import HTML
import IPython.display
import google.auth
import google.auth.transport.requests  # required for google.auth.transport.requests.Request()
import requests
import json
import uuid
import base64
import os
import cv2
import random
import time
import datetime

import logging
from tenacity import retry, wait_exponential, stop_after_attempt, before_sleep_log, retry_if_exception

In [None]:
# Set these (run this cell to verify the output)

bigquery_location = "${bigquery_location}"
dataplex_region = "${dataplex_region}"
location = "${dataplex_region}"

# Get the current date and time
now = datetime.datetime.now()

# Format the date and time as desired
formatted_date = now.strftime("%Y-%m-%d-%H-%M")

# Get some values using gcloud
project_id = os.environ["GOOGLE_CLOUD_PROJECT"]
user = !(gcloud auth list --filter=status:ACTIVE --format="value(account)")

if len(user) != 1:
  raise RuntimeError(f"user is not set: {user}")
user = user[0]

print(f"project_id = {project_id}")
print(f"user = {user}")

### <font color='#4285f4'>Helper Methods</font>

#### restAPIHelper
Calls the Google Cloud REST API using the current user's credentials.

In [None]:
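def restAPIHelper(url: str, http_verb: str, request_body: dict) -> dict:
  """Calls the Google Cloud REST API with the current user's credentials and returns the parsed JSON response."""

  import requests
  import google.auth
  import google.auth.transport.requests  # the Request class lives in this submodule
  import json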

  # Get an access token based upon the current user
  creds, project = google.auth.default()
  auth_req = google.auth.transport.requests.Request()
  creds.refresh(auth_req)
  access_token=creds.token

  headers = {
    "Content-Type" : "application/json",
    "Authorization" : "Bearer " + access_token
  }

  if http_verb == "GET":
    response = requests.get(url, headers=headers)
  elif http_verb == "POST":
    response = requests.post(url, json=request_body, headers=headers)
  elif http_verb == "PUT":
    response = requests.put(url, json=request_body, headers=headers)
  elif http_verb == "PATCH":
    response = requests.patch(url, json=request_body, headers=headers)
  elif http_verb == "DELETE":
    response = requests.delete(url, headers=headers)
  else:
    raise RuntimeError(f"Unknown HTTP verb: {http_verb}")

  if response.status_code == 200:
    return json.loads(response.content)
  else:
    error = f"Error restAPIHelper -> Status: '{response.status_code}' Text: '{response.text}'"
    raise RuntimeError(error)

#### RetryCondition (for retrying LLM calls)

In [None]:
def RetryCondition(error):
  error_string = str(error)
  print(error_string)

  retry_errors = [
      "RESOURCE_EXHAUSTED",
      "No content in candidate",
      # Add more error messages here as needed
  ]

  for retry_error in retry_errors:
    if retry_error in error_string:
      print("Retrying...")
      return True

  return False
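
The retry decorator used on the Gemini helpers combines this predicate with an exponential wait. As a rough, standalone sketch of that behavior (the helper names `backoff_seconds` and `should_retry` are hypothetical, and the exact exponent offset used by `tenacity` may differ between versions), the wait schedule and the string-matching check can be reasoned about like this:

```python
def backoff_seconds(attempt, multiplier=1, base=2, min_wait=1, max_wait=60):
    """Clamped exponential wait before retry number `attempt` (1-based).

    Assumption: this approximates wait_exponential(multiplier=1, min=1, max=60);
    it is not tenacity's own implementation.
    """
    raw = multiplier * base ** (attempt - 1)
    return max(min_wait, min(raw, max_wait))


def should_retry(error, retry_errors=("RESOURCE_EXHAUSTED", "No content in candidate")):
    """Same idea as RetryCondition: retry only when a known marker is in the message."""
    return any(marker in str(error) for marker in retry_errors)


# stop_after_attempt(10) allows at most 9 waits between the 10 attempts
schedule = [backoff_seconds(n) for n in range(1, 10)]
print(schedule)
print(f"worst-case total wait: {sum(schedule)} seconds")
```

Under these assumptions the wait grows quickly and then stays at the 60 second cap, so a persistently failing call gives up after a few minutes rather than retrying indefinitely.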

#### Gemini LLM

In [None]:
@retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(10), retry=retry_if_exception(RetryCondition), before_sleep=before_sleep_log(logging.getLogger(), logging.INFO))
def GeminiLLM(prompt, model = "gemini-2.0-flash", response_schema = None,
                 temperature = 1, topP = 1, topK = 32):

  # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#supported_models

  llm_response = None
  if temperature < 0:
    temperature = 0

  creds, project = google.auth.default()
  auth_req = google.auth.transport.requests.Request() # required to refresh the credentials and obtain an access token
  creds.refresh(auth_req)
  access_token=creds.token

  headers = {
      "Content-Type" : "application/json",
      "Authorization" : "Bearer " + access_token
  }

  # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference
  url = f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers/google/models/{model}:generateContent"

  generation_config = {
    "temperature": temperature,
    "topP": topP,
    "maxOutputTokens": 8192,
    "candidateCount": 1,
    "responseMimeType": "application/json",
  }

  # Add the response schema when one is provided
  if response_schema is not None:
    generation_config["responseSchema"] = response_schema

  if model == "gemini-2.0-flash":
    generation_config["topK"] = topK

  payload = {
    "contents": {
      "role": "user",
      "parts": {
          "text": prompt
      },
    },
    "generation_config": {
      **generation_config
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }

  response = requests.post(url, json=payload, headers=headers)

  if response.status_code == 200:
    try:
      json_response = json.loads(response.content)
    except Exception as error:
      raise RuntimeError(f"An error occurred parsing the JSON: {error}")

    if "candidates" in json_response:
      candidates = json_response["candidates"]
      if len(candidates) > 0:
        candidate = candidates[0]
        if "content" in candidate:
          content = candidate["content"]
          if "parts" in content:
            parts = content["parts"]
            if len(parts):
              part = parts[0]
              if "text" in part:
                text = part["text"]
                llm_response = text
              else:
                raise RuntimeError(f"No text in part: {response.content}")
            else:
              raise RuntimeError(f"No parts in content: {response.content}")
          else:
            raise RuntimeError(f"No parts in content: {response.content}")
        else:
          raise RuntimeError(f"No content in candidate: {response.content}")
      else:
        raise RuntimeError(f"No candidates: {response.content}")
    else:
      raise RuntimeError(f"No candidates: {response.content}")

    # Remove characters that typically wrap the response (when asking for a JSON reply)
    llm_response = llm_response.replace("```json","")
    llm_response = llm_response.replace("```","")
    llm_response = llm_response.replace("\n","")

    return llm_response

  else:
    raise RuntimeError(f"Error with prompt:'{prompt}'  Status:'{response.status_code}' Text:'{response.text}'")

In [None]:
@retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(10), retry=retry_if_exception(RetryCondition), before_sleep=before_sleep_log(logging.getLogger(), logging.INFO))
def GeminiLLM_VerifyImage(prompt, imageBase64, model = "gemini-2.0-flash", response_schema = None,
                 temperature = 1, topP = 1, topK = 32):

  # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#supported_models

  llm_response = None
  if temperature < 0:
    temperature = 0

  creds, project = google.auth.default()
  auth_req = google.auth.transport.requests.Request() # required to refresh the credentials and obtain an access token
  creds.refresh(auth_req)
  access_token=creds.token

  headers = {
      "Content-Type" : "application/json",
      "Authorization" : "Bearer " + access_token
  }

  # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference
  url = f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers/google/models/{model}:generateContent"

  generation_config = {
    "temperature": temperature,
    "topP": topP,
    "maxOutputTokens": 8192,
    "candidateCount": 1,
    "responseMimeType": "application/json",
  }

  # Add the response schema when one is provided
  if response_schema is not None:
    generation_config["responseSchema"] = response_schema

  if model == "gemini-2.0-flash":
    generation_config["topK"] = topK

  payload = {
    "contents": {
      "role": "user",
      "parts": [
          { "text": prompt },
          { "inlineData": {  "mimeType": "image/png", "data": f"{imageBase64}" } }
        ]
    },
    "generation_config": {
      **generation_config
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }

  response = requests.post(url, json=payload, headers=headers)

  if response.status_code == 200:
    try:
      json_response = json.loads(response.content)
    except Exception as error:
      raise RuntimeError(f"An error occurred parsing the JSON: {error}")

    if "candidates" in json_response:
      candidates = json_response["candidates"]
      if len(candidates) > 0:
        candidate = candidates[0]
        if "content" in candidate:
          content = candidate["content"]
          if "parts" in content:
            parts = content["parts"]
            if len(parts):
              part = parts[0]
              if "text" in part:
                text = part["text"]
                llm_response = text
              else:
                raise RuntimeError(f"No text in part: {response.content}")
            else:
              raise RuntimeError(f"No parts in content: {response.content}")
          else:
            raise RuntimeError(f"No parts in content: {response.content}")
        else:
          raise RuntimeError(f"No content in candidate: {response.content}")
      else:
        raise RuntimeError(f"No candidates: {response.content}")
    else:
      raise RuntimeError(f"No candidates: {response.content}")

    # Remove characters that typically wrap the response (when asking for a JSON reply)
    llm_response = llm_response.replace("```json","")
    llm_response = llm_response.replace("```","")
    llm_response = llm_response.replace("\n","")

    return llm_response

  else:
    raise RuntimeError(f"Error with prompt:'{prompt}'  Status:'{response.status_code}' Text:'{response.text}'")
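
Both Gemini helpers walk the same `candidates` -> `content` -> `parts` -> `text` path with deeply nested `if` statements. One possible way to flatten that logic is a small shared helper; the sketch below is only an illustration (the name `extract_text` and the sample payload are hypothetical, not part of the notebook), and it keeps the "No content in candidate" wording so that `RetryCondition` would still match it:

```python
def extract_text(json_response):
    """Return the text of the first part of the first candidate, or raise."""
    candidates = json_response.get("candidates") or []
    if not candidates:
        raise RuntimeError(f"No candidates: {json_response}")

    content = candidates[0].get("content")
    if not content:
        raise RuntimeError(f"No content in candidate: {json_response}")

    parts = content.get("parts") or []
    if not parts:
        raise RuntimeError(f"No parts in content: {json_response}")

    text = parts[0].get("text")
    if text is None:
        raise RuntimeError(f"No text in part: {json_response}")
    return text


# Hypothetical response with the same shape the helpers above expect
sample = {
    "candidates": [
        {"content": {"role": "model", "parts": [{"text": '{"ok": true}'}]}}
    ]
}
print(extract_text(sample))
```

Using guard clauses keeps each failure message next to the check that produces it, which makes the retryable cases easier to spot.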

#### Helper Functions

In [None]:
def RunQuery(sql):
  import time
  from google.cloud import bigquery
  client = bigquery.Client()

  # Ignore leading whitespace and letter case when detecting row-returning statements
  if sql.lstrip().upper().startswith(("SELECT", "WITH")):
      df_result = client.query(sql).to_dataframe()
      return df_result
  else:
    job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.INTERACTIVE)
    query_job = client.query(sql, job_config=job_config)

    # Check on the progress by getting the job's updated state.
    query_job = client.get_job(
        query_job.job_id, location=query_job.location
    )
    print("Job {} is currently in state {} with error result of {}".format(query_job.job_id, query_job.state, query_job.error_result))

    while query_job.state != "DONE":
      time.sleep(2)
      query_job = client.get_job(
          query_job.job_id, location=query_job.location
          )
      print("Job {} is currently in state {} with error result of {}".format(query_job.job_id, query_job.state, query_job.error_result))

    if query_job.error_result is None:
      return True
    else:
      raise Exception(query_job.error_result)

In [None]:
def PrettyPrintJson(json_string):
  json_object = json.loads(json_string)
  json_formatted_str = json.dumps(json_object, indent=2)
  return json_formatted_str

In [None]:
def GetNextPrimaryKey(fully_qualified_table_name, field_name):
  from google.cloud import bigquery
  client = bigquery.Client()
  sql = f"""
  SELECT IFNULL(MAX({field_name}),0) AS result
    FROM `{fully_qualified_table_name}`
  """
  # print(sql)
  df_result = client.query(sql).to_dataframe()
  # display(df_result)
  return df_result['result'].iloc[0] + 1

### <font color='#4285f4'>Taxonomy / Policy Helper Functions</font>

#### createTaxonomy
Creates the top level Taxonomy

In [None]:
def createTaxonomy(project_id, bigquery_location, taxonomy_name, description):
  """Creates a Taxonomy."""

  # First look for an existing taxonomy
  # https://cloud.google.com/data-catalog/docs/reference/rest/v1/projects.locations.taxonomies/list
  url = f"https://datacatalog.googleapis.com/v1/projects/{project_id}/locations/{bigquery_location}/taxonomies"

  # Gather existing taxonomies
  json_result = restAPIHelper(url, "GET", None)
  print(f"createTaxonomy (GET) json_result: {json_result}")

  # Test to see if the taxonomy exists, if so return its name
  if "taxonomies" in json_result:
    for item in json_result["taxonomies"]:
      print(f"displayName: {item['displayName']}")
      # "projects/test/locations/us/taxonomies/2620666826070342226"
      # NOTE: We cannot test the complete name since it contains an unknown, generated number
      if item["displayName"] == taxonomy_name:
        print("Taxonomy already exists")
        name = item["name"]
        return name

  # Create the taxonomy
  # https://cloud.google.com/data-catalog/docs/reference/rest/v1/projects.locations.taxonomies/create
  print("Creating Taxonomy")

  url = f"https://datacatalog.googleapis.com/v1/projects/{project_id}/locations/{bigquery_location}/taxonomies"

  request_body = {
      "displayName": taxonomy_name,
      "description": description,
  }

  json_result = restAPIHelper(url, "POST", request_body)

  name = json_result["name"]
  print("Taxonomy created: ", name)
  return name
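
The "look up by display name, otherwise create" pattern above is what makes the cell safe to re-run. A minimal sketch of the lookup half, separated from the REST call so it can be tried on its own (the helper name `find_resource_name` and the sample response are hypothetical), might look like this:

```python
from typing import Optional


def find_resource_name(list_response: dict, collection_key: str, display_name: str) -> Optional[str]:
    """Return the full resource name of the item with a matching displayName.

    Matching is done on displayName because the full resource name ends in a
    generated id that is not known in advance.
    """
    for item in list_response.get(collection_key, []):
        if item.get("displayName") == display_name:
            return item["name"]
    return None


# Hypothetical list response in the shape printed by createTaxonomy
sample_response = {
    "taxonomies": [
        {"name": "projects/my-project/locations/us/taxonomies/123", "displayName": "my-project"}
    ]
}

print(find_resource_name(sample_response, "taxonomies", "my-project"))
print(find_resource_name(sample_response, "taxonomies", "missing"))
```

The same helper would apply to the `policyTags` list used by `createPolicyTag`. This sketch only inspects a single page of results; a project with many taxonomies would presumably need to follow the list API's page tokens as well.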

#### createPolicyTag
Creates a Taxonomy Policy Tag or Child Policy Tag

In [None]:
def createPolicyTag(project_id, taxonomy_name, policy_parent, policy_name):
  """Creates Taxonomy Policy Tag or Sub-Policy Tag"""

  # First look for an existing policy tag
  # https://cloud.google.com/data-catalog/docs/reference/rest/v1/projects.locations.taxonomies.policyTags/list

  url = f"https://datacatalog.googleapis.com/v1/{taxonomy_name}/policyTags"

  # Gather existing policy tags
  json_result = restAPIHelper(url, "GET", None)
  print(f"createPolicyTag (GET) json_result: {json_result}")

  # Test to see if the policy tag exists, if so return its name
  if "policyTags" in json_result:
    for item in json_result["policyTags"]:
      # print(f"displayName: {item['displayName']}")
      # "projects/test/locations/us/taxonomies/2620666826070342226"
      # NOTE: We cannot test the complete name since it contains an unknown, generated number
      if item["displayName"] == policy_name:
        print(f"{policy_name} already exists")
        return item["name"]


  # Create the policy tag (top level when policy_parent is None, otherwise a child)
  # https://cloud.google.com/data-catalog/docs/reference/rest/v1/projects.locations.taxonomies.policyTags/create
  print(f"Creating Policy {policy_name}")

  url = f"https://datacatalog.googleapis.com/v1/{taxonomy_name}/policyTags"

  if policy_parent is  None:
    request_body = {
        "displayName": policy_name,
        "description": "BigQuery Data Governance Demo - " + policy_name,
    }
  else:
    request_body = {
        "parentPolicyTag" : policy_parent,
        "displayName": policy_name,
        "description": "BigQuery Data Governance Demo - " + policy_name,
    }

  json_result = restAPIHelper(url, "POST", request_body)

  policy_full_name = json_result["name"]
  print("Policy created: ", policy_full_name)

  return policy_full_name

#### securePolicyTag
Secures a policy (column level security)

In [None]:
def securePolicyTag(project_id, bigquery_location, user, policy_tag_name):
  """Grants a user fine-grained read access on a policy tag (column level security)."""

  role = "roles/datacatalog.categoryFineGrainedReader"
  member = "user:" + user

  # First read the current IAM policy of the policy tag
  # https://cloud.google.com/data-catalog/docs/reference/rest/v1/projects.locations.taxonomies.policyTags/getIamPolicy
  url = f"https://datacatalog.googleapis.com/v1/{policy_tag_name}:getIamPolicy"

  request_body = { }
  json_result = restAPIHelper(url, "POST", request_body)
  print(f"getIamPolicy (POST) json_result: {json_result}")

  # Find (or create) the binding for the role so existing members are preserved
  bindings = json_result.get("bindings", [])
  binding = next((item for item in bindings if item["role"] == role), None)
  if binding is None:
    binding = { "role": role, "members": [] }
    bindings.append(binding)

  # Test for existence
  if member in binding["members"]:
    print(f"securePolicyTag: Permissions exist member: {member}")
    return

  binding["members"].append(member)
  print(f"members: {binding['members']}")

  # Set IAM (the request replaces the bindings, so send the full list)
  # https://cloud.google.com/data-catalog/docs/reference/rest/v1/projects.locations.taxonomies.policyTags/setIamPolicy
  url = f"https://datacatalog.googleapis.com/v1/{policy_tag_name}:setIamPolicy"

  request_body = {
        "policy": {
            "bindings": bindings
            },
        "updateMask" : "bindings"
        }

  json_result = restAPIHelper(url, "POST", request_body)
  print("IAM Security Set: ", policy_tag_name)

#### createDataPolicy
Create a data masking policy

In [None]:
def createDataPolicy(project_id, bigquery_location, policyTag, policy_name, dataPolicyType, predefinedExpression):
  """Creates a Data Policy."""

  # First look for an existing data policy
  # https://cloud.google.com/bigquery/docs/reference/bigquerydatapolicy/rest/v1/projects.locations.dataPolicies/list

  url = f"https://bigquerydatapolicy.googleapis.com/v1/projects/{project_id}/locations/{bigquery_location}/dataPolicies"

  # Gather existing data policies
  json_result = restAPIHelper(url, "GET", None)
  print(f"createDataPolicies (GET) json_result: {json_result}")

  # Test for policy_name
  if "dataPolicies" in json_result:
    for item in json_result["dataPolicies"]:
      # print(f"name: {item['name']}")
      if item["name"] == f"projects/{project_id}/locations/{bigquery_location}/dataPolicies/{policy_name}":
        print(f"createDataPolicy policy exists: {policy_name}")
        return item["name"]

  # Create Data Policy
  # https://cloud.google.com/bigquery/docs/reference/bigquerydatapolicy/rest/v1/projects.locations.dataPolicies/create
  url = f"https://bigquerydatapolicy.googleapis.com/v1/projects/{project_id}/locations/{bigquery_location}/dataPolicies"

  # Create
  print(f"Creating Data Policy {policy_name}")

  request_body = {
      "dataPolicyId": policy_name,
      "dataPolicyType": dataPolicyType,
      "policyTag" : policyTag,
      "dataMaskingPolicy": {
          "predefinedExpression": predefinedExpression
          }
  }

  json_result = restAPIHelper(url, "POST", request_body)

  policy_name = json_result["name"]
  print("Data Policy created: ", policy_name)

  return policy_name

#### secureDataPolicy
Secures a data policy (masking) tag

In [None]:
def secureDataPolicy(project_id, bigquery_location, user, data_policy_name):
  """Grants a user the Masked Reader role on a data (masking) policy."""

  role = "roles/bigquerydatapolicy.maskedReader"
  member = "user:" + user

  # First read the current IAM policy of the data policy
  # https://cloud.google.com/bigquery/docs/reference/bigquerydatapolicy/rest/v1/projects.locations.dataPolicies/getIamPolicy

  url = f"https://bigquerydatapolicy.googleapis.com/v1/{data_policy_name}:getIamPolicy"

  request_body = { }
  json_result = restAPIHelper(url, "POST", request_body)
  print(f"getIamPolicy (POST) json_result: {json_result}")

  # Find (or create) the binding for the role so existing members are preserved
  bindings = json_result.get("bindings", [])
  binding = next((item for item in bindings if item["role"] == role), None)
  if binding is None:
    binding = { "role": role, "members": [] }
    bindings.append(binding)

  # Test for existence
  if member in binding["members"]:
    print(f"secureDataPolicy: Permissions exist member: {member}")
    return

  binding["members"].append(member)
  print(f"members: {binding['members']}")

  # Set IAM (the request replaces the bindings, so send the full list)
  # https://cloud.google.com/bigquery/docs/reference/bigquerydatapolicy/rest/v1/projects.locations.dataPolicies/setIamPolicy
  url = f"https://bigquerydatapolicy.googleapis.com/v1/{data_policy_name}:setIamPolicy"

  request_body = {
        "policy": {
            "bindings": bindings
            },
        "updateMask" : "bindings"
        }

  json_result = restAPIHelper(url, "POST", request_body)
  print("IAM Security Set: ", data_policy_name)
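
Because `setIamPolicy` replaces the bindings it is given, granting a second user on the same data policy only works if the members already present are carried over into the request. The following standalone sketch shows one way to merge a member into an IAM policy document without mutating the original (the helper name `add_member` and the example accounts are hypothetical):

```python
import copy


def add_member(policy: dict, role: str, member: str) -> dict:
    """Return a copy of `policy` in which `member` holds `role`.

    Existing bindings and members are preserved; nothing is added twice.
    """
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


MASKED_READER = "roles/bigquerydatapolicy.maskedReader"

policy = {}  # shape of a getIamPolicy response with no bindings yet
policy = add_member(policy, MASKED_READER, "user:manager@example.com")
policy = add_member(policy, MASKED_READER, "user:employee@example.com")
policy = add_member(policy, MASKED_READER, "user:manager@example.com")  # no duplicate
print(policy)
```

Keeping the merge as a pure function makes it straightforward to check the resulting bindings before sending them to the API.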

#### getTableSchema
Retrieve a BigQuery table schema as JSON

In [None]:
def getTableSchema(project_id, dataset_name, table_name):
  import io
  import google.cloud.bigquery as bigquery

  client = bigquery.Client()

  dataset_ref = client.dataset(dataset_name, project=project_id)
  table_ref = dataset_ref.table(table_name)
  table = client.get_table(table_ref)

  f = io.StringIO("")
  client.schema_to_json(table.schema, f)
  return f.getvalue()

#### updateTableSchema
Sets the schema for a BigQuery table (CLS , Data Masking)

In [None]:
def updateTableSchema(project_id, dataset_name, table_name, new_schema):
  import io
  import google.cloud.bigquery as bigquery

  client = bigquery.Client()

  dataset_ref = client.dataset(dataset_name, project=project_id)
  table_ref = dataset_ref.table(table_name)
  table = client.get_table(table_ref)

  table.schema = new_schema
  table = client.update_table(table, ["schema"])

  print(f"Table {table_name} schema updated!")
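
The schema passed to `updateTableSchema` is a list of field dictionaries in which protected columns carry a `policyTags` entry. Rather than editing that list by hand, the tags could be merged into the output of `getTableSchema`. A minimal sketch, assuming a flat schema with no nested RECORD fields (the helper name `attach_policy_tags` and the tag resource name are hypothetical):

```python
def attach_policy_tags(schema: list, column_tags: dict) -> list:
    """Return a copy of `schema` with policyTags set on the mapped columns.

    `column_tags` maps a column name to a policy tag resource name.
    Columns that are not in the mapping are copied unchanged.
    """
    tagged = []
    for field in schema:
        new_field = dict(field)
        tag = column_tags.get(field["name"])
        if tag is not None:
            new_field["policyTags"] = {"names": [tag]}
        tagged.append(new_field)
    return tagged


# Hypothetical two-column schema and policy tag name
schema = [
    {"mode": "NULLABLE", "name": "customer_id", "type": "INTEGER"},
    {"mode": "NULLABLE", "name": "email", "type": "STRING"},
]
email_tag = "projects/my-project/locations/us/taxonomies/123/policyTags/456"

for field in attach_policy_tags(schema, {"email": email_tag}):
    print(field)
```

The input list is left untouched, so the untagged schema remains available if the tags need to be removed again.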

### <font color='#4285f4'>Configure Policies</font>

| Column                  | Manager               | Employee             |
|-------------------------|-----------------------|----------------------|
| credit_card_number      | Last 4                | No Access            |
| predicted_credit_amount | Access                | No Access            |
| ssn                     | Last 4                | Last 4               |
| ip_address              | Last 4                | Last 4               |
| email                   | Email mask            | Email mask           |
| address                 | First four characters | Last four characters |
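
To get a feel for what the masked columns look like, the predefined masking rules used by the data policies in this notebook can be approximated locally. This is only an emulation for illustration, not BigQuery's implementation: the function names are hypothetical, the `XXXXX` placeholder reflects my understanding of the documented behavior, and values of four characters or fewer (or invalid emails), which BigQuery handles differently, are simply replaced by the placeholder here.

```python
MASK = "XXXXX"


def last_four_characters(value: str) -> str:
    """Approximates LAST_FOUR_CHARACTERS: keep the last 4 characters."""
    if len(value) <= 4:
        return MASK  # simplification, see the note above
    return MASK + value[-4:]


def first_four_characters(value: str) -> str:
    """Approximates FIRST_FOUR_CHARACTERS: keep the first 4 characters."""
    if len(value) <= 4:
        return MASK  # simplification, see the note above
    return value[:4] + MASK


def email_mask(value: str) -> str:
    """Approximates EMAIL_MASK: hide the user name, keep the domain."""
    user, separator, domain = value.rpartition("@")
    if not separator or not user or not domain:
        return MASK  # simplification for values that are not emails
    return MASK + "@" + domain


# Made-up sample values
print(last_four_characters("4111111111111111"))
print(first_four_characters("123 Main Street"))
print(email_mask("jane.doe@example.com"))
```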

In [None]:
################################################################################
# Create Hierarchical Data Policies
################################################################################
taxonomy_name = project_id.lower()

################################################################################
# To see the Taxonomy open this link in a new tab: https://console.cloud.google.com/bigquery/policy-tags
################################################################################
taxonomy_name = createTaxonomy(project_id, bigquery_location, taxonomy_name, "BigQuery Data Governance Demo")
print(f"taxonomy_name: {taxonomy_name}")

In [None]:
################################################################################
# Policies [create policy tags on columns we want to secure and/or mask]
################################################################################
policy = createPolicyTag(project_id, taxonomy_name, None, "data-security")
print(f"policy: {policy}")

# Credit Card
policy_credit_card_number = createPolicyTag(project_id, taxonomy_name, policy, "policy_credit_card_number")
print(f"policy_credit_card_number: {policy_credit_card_number}")

# Predicted Credit Amount
policy_predicted_credit_amount = createPolicyTag(project_id, taxonomy_name, policy, "policy_predicted_credit_amount")
print(f"policy_predicted_credit_amount: {policy_predicted_credit_amount}")

# SSN
policy_ssn = createPolicyTag(project_id, taxonomy_name, policy, "policy_ssn")
print(f"policy_ssn: {policy_ssn}")

# IP Address
policy_ip_address = createPolicyTag(project_id, taxonomy_name, policy, "policy_ip_address")
print(f"policy_ip_address: {policy_ip_address}")

# Email
policy_email = createPolicyTag(project_id, taxonomy_name, policy, "policy_email")
print(f"policy_email: {policy_email}")

# Address
policy_address = createPolicyTag(project_id, taxonomy_name, policy, "policy_address")
print(f"policy_address: {policy_address}")
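
The policy tags created above form a hierarchy in which each child refers to its parent through `parentPolicyTag`. As a sketch of how a flat list of policy tags could be turned back into an indented tree for inspection (the function name `render_tag_tree` and the sample resource names are hypothetical):

```python
from collections import defaultdict


def render_tag_tree(policy_tags: list) -> list:
    """Return display names indented by their depth in the hierarchy."""
    children = defaultdict(list)
    for tag in policy_tags:
        # Top-level tags have no parentPolicyTag (treated as None here)
        children[tag.get("parentPolicyTag") or None].append(tag)

    def render(parent, depth):
        lines = []
        for tag in sorted(children.get(parent, []), key=lambda t: t["displayName"]):
            lines.append("  " * depth + tag["displayName"])
            lines.extend(render(tag["name"], depth + 1))
        return lines

    return render(None, 0)


# Hypothetical flat list in the shape of a policyTags list response
tags = [
    {"name": "taxonomies/1/policyTags/10", "displayName": "data-security"},
    {"name": "taxonomies/1/policyTags/11", "displayName": "policy_ssn",
     "parentPolicyTag": "taxonomies/1/policyTags/10"},
    {"name": "taxonomies/1/policyTags/12", "displayName": "policy_email",
     "parentPolicyTag": "taxonomies/1/policyTags/10"},
]

for line in render_tag_tree(tags):
    print(line)
```

Since tags can be nested several levels deep, the recursion follows each tag's `name` down to its children rather than assuming a fixed depth.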

In [None]:
################################################################################
# Data Masking Policies
################################################################################

# Credit Card: LAST_FOUR_CHARACTERS
datamask_policy_credit_card_number_last_four = createDataPolicy(project_id, bigquery_location, policy_credit_card_number, "datamask_policy_credit_card_number_last_four", "DATA_MASKING_POLICY", "LAST_FOUR_CHARACTERS")

# Predicted Credit Amount
# No data masking needed

# SSN: LAST_FOUR_CHARACTERS
datamask_policy_ssn_last_four = createDataPolicy(project_id, bigquery_location, policy_ssn, "datamask_policy_ssn_last_four", "DATA_MASKING_POLICY", "LAST_FOUR_CHARACTERS")

# IP Address: LAST_FOUR_CHARACTERS
datamask_policy_ip_address_last_four = createDataPolicy(project_id, bigquery_location, policy_ip_address, "datamask_policy_ip_address_last_four", "DATA_MASKING_POLICY", "LAST_FOUR_CHARACTERS")

# Email
datamask_policy_email_email_mask = createDataPolicy(project_id, bigquery_location, policy_email, "datamask_policy_email_email_mask", "DATA_MASKING_POLICY", "EMAIL_MASK")

# Address
datamask_policy_address_first_four = createDataPolicy(project_id, bigquery_location, policy_address, "datamask_policy_address_first_four", "DATA_MASKING_POLICY", "FIRST_FOUR_CHARACTERS")
datamask_policy_address_last_four = createDataPolicy(project_id, bigquery_location, policy_address, "datamask_policy_address_last_four", "DATA_MASKING_POLICY", "LAST_FOUR_CHARACTERS")


In [None]:
print("################################################################################")
print(f"To view policies: https://console.cloud.google.com/bigquery/policy-tags?inv=1&project={project_id}")
print("################################################################################")

### <font color='#4285f4'>Set Permissions</font>


In [None]:
################################################################################
# Manager Permissions
################################################################################

# Credit Card Data Masking Last 4
secureDataPolicy(project_id, bigquery_location, user, datamask_policy_credit_card_number_last_four)

# Predicted Credit Amount (we set this at the policy level, not the data mask)
securePolicyTag(project_id, bigquery_location, user, policy_predicted_credit_amount)

# SSN
secureDataPolicy(project_id, bigquery_location, user, datamask_policy_ssn_last_four)

# IP Address
secureDataPolicy(project_id, bigquery_location, user, datamask_policy_ip_address_last_four)

# Email
secureDataPolicy(project_id, bigquery_location, user, datamask_policy_email_email_mask)

# Address
secureDataPolicy(project_id, bigquery_location, user, datamask_policy_address_first_four)

In [None]:
employee_user = "gemini@paternostro.altostrat.com"

In [None]:
################################################################################
# Employee Permissions
################################################################################

# Credit Card
# No access allowed

# Predicted Credit Amount
# No access allowed

# SSN
secureDataPolicy(project_id, bigquery_location, employee_user, datamask_policy_ssn_last_four)

# IP Address
secureDataPolicy(project_id, bigquery_location, employee_user, datamask_policy_ip_address_last_four)

# Email
secureDataPolicy(project_id, bigquery_location, employee_user, datamask_policy_email_email_mask)

# Address
secureDataPolicy(project_id, bigquery_location, employee_user, datamask_policy_address_last_four)

In [None]:
print("################################################################################")
print(f"To view policies: https://console.cloud.google.com/bigquery/policy-tags?inv=1&project={project_id}")
print("################################################################################")

### <font color='#4285f4'>Update the BigQuery table with the Policy Tags</font>

In [None]:
%%bigquery

CREATE OR REPLACE TABLE governed_data_curated.customer_policy_demo AS
SELECT * FROM governed_data_curated.customer;

In [None]:
# To see the existing schema
# print(getTableSchema(project_id, "governed_data_curated", "customer_policy_demo"))

In [None]:
customer_policy_demo_schema = [
  {
    "mode": "NULLABLE",
    "name": "customer_id",
    "type": "INTEGER"
  },
  {
    "mode": "NULLABLE",
    "name": "first_name",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "last_name",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "email",
    "type": "STRING",
    "policyTags": {
      "names": [
          policy_email
      ]
    }
  },
  {
    "mode": "NULLABLE",
    "name": "phone",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "gender",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "ip_address",
    "type": "STRING",
    "policyTags": {
      "names": [
          policy_ip_address
      ]
    }
  },
  {
    "mode": "NULLABLE",
    "name": "ssn",
    "type": "STRING",
    "policyTags": {
      "names": [
          policy_ssn
      ]
    }
  },
  {
    "mode": "NULLABLE",
    "name": "address",
    "type": "STRING",
    "policyTags": {
      "names": [
          policy_address
      ]
    }
  },
  {
    "mode": "NULLABLE",
    "name": "city",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "state",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "zip",
    "type": "INTEGER"
  },
  {
    "mode": "NULLABLE",
    "name": "credit_card_number",
    "type": "STRING",
    "policyTags": {
      "names": [
          policy_credit_card_number
      ]
    },
  },
  {
    "mode": "NULLABLE",
    "name": "predicted_credit_amount",
    "type": "FLOAT",
    "policyTags": {
      "names": [
          policy_predicted_credit_amount
      ]
    }
  }
]

In [None]:
updateTableSchema(project_id, "governed_data_curated", "customer_policy_demo",customer_policy_demo_schema)

### <font color='#4285f4'>Manager Access</font>

| Column                  | Manager               | Employee             |
|-------------------------|-----------------------|----------------------|
| credit_card_number      | Last 4                | No Access            |
| predicted_credit_amount | Access                | No Access            |
| ssn                     | Last 4                | Last 4               |
| ip_address              | Last 4                | Last 4               |
| email                   | Email mask            | Email mask           |
| address                 | First four characters | Last four characters |

In [None]:
%%bigquery

-- We can query the entire table since the manager has access to every column (masked where a data masking policy applies)
SELECT * FROM governed_data_curated.customer_policy_demo;

### <font color='#4285f4'>Employee Access</font>

Grant another user access to this project, then log in as that user and run the queries below.

| Column                  | Manager               | Employee             |
|-------------------------|-----------------------|----------------------|
| credit_card_number      | Last 4                | No Access            |
| predicted_credit_amount | Access                | No Access            |
| ssn                     | Last 4                | Last 4               |
| ip_address              | Last 4                | Last 4               |
| email                   | Email mask            | Email mask           |
| address                 | First four characters | Last four characters |

```
-- You will get an error since you do not have access to the credit card or predicted credit amount
SELECT *
  FROM `${project_id}.governed_data_curated.customer_policy_demo`;
```

```
-- We can exclude the CLS-protected fields.
-- The remaining protected columns are returned with data masking applied.
-- NOTE: Address is masked to the last 4 characters for an employee, whereas a manager sees the first 4.
SELECT * EXCEPT (credit_card_number,predicted_credit_amount)
  FROM `${project_id}.governed_data_curated.customer_policy_demo`;
```

### <font color='#4285f4'>Row Level Security</font>

In [None]:
# Only allow states FL/NY for the current user
sql = f"""
CREATE OR REPLACE ROW ACCESS POLICY customer_employee_example_rls
    ON `governed_data_curated.customer_policy_demo`
    GRANT TO ("user:{user}")
FILTER USING (state = 'FL' OR state = 'NY');"""

RunQuery(sql)
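
If the list of allowed states or grantees changes often, the DDL above could be generated instead of typed by hand. The sketch below builds the same kind of statement from Python lists (the function name `build_row_access_policy_sql` and the example account are hypothetical); it rejects filter values that are not plain alphanumeric strings rather than attempting to escape them.

```python
def build_row_access_policy_sql(policy_name, table, grantees, column, allowed_values):
    """Build a CREATE OR REPLACE ROW ACCESS POLICY statement as a string."""
    if not grantees or not allowed_values:
        raise ValueError("At least one grantee and one allowed value are required")
    if not all(value.isalnum() for value in allowed_values):
        raise ValueError("Allowed values must be alphanumeric")

    grantee_list = ", ".join('"' + grantee + '"' for grantee in grantees)
    value_list = ", ".join("'" + value + "'" for value in allowed_values)

    return (
        f"CREATE OR REPLACE ROW ACCESS POLICY {policy_name}\n"
        f"    ON `{table}`\n"
        f"    GRANT TO ({grantee_list})\n"
        f"FILTER USING ({column} IN ({value_list}));"
    )


rls_sql = build_row_access_policy_sql(
    policy_name="customer_employee_example_rls",
    table="governed_data_curated.customer_policy_demo",
    grantees=["user:manager@example.com"],
    column="state",
    allowed_values=["FL", "NY"],
)
print(rls_sql)
```

The resulting string could then be passed to `RunQuery` in the same way as the hand-written statement.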

In [None]:
%%bigquery

-- We query the fields we have access to, with Data Masking and Row Level Security applied
SELECT * EXCEPT (credit_card_number,predicted_credit_amount) FROM governed_data_curated.customer_policy_demo;

### <font color='#4285f4'>Clean Up</font>

In [None]:
%%bigquery

# Remove the row level policy
DROP ALL ROW ACCESS POLICIES ON `governed_data_curated.customer_policy_demo`;

### <font color='#4285f4'>Reference Links</font>


- [REPLACE-ME](https://REPLACE-ME)