GCP-3078: Bind Permissions To Lowest Level Resource #73
.param("--project", project_id)
.param("--filter", f"name={FULL_BUCKET_NAME}")
)
if len(bucket_search) == 1:
nit: maybe >= 1 for extra safety?
Bucket names must be globally unique, so an exact-name search can't return more than one result. If I were filtering with the ~ operator, it would match using a regular expression, which can return multiple results. In this case, though, we're intentionally using name= rather than name~, so only one bucket can match that condition.
https://cloud.google.com/sdk/gcloud/reference/beta/topic/filters
yeah makes sense, I imagine it's super unlikely, but it was mainly from a "just-in-case" pov
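To make the name= versus name~ distinction in this thread concrete, here is a small local sketch. It does not call gcloud; it only models the two matching behaviours on an in-memory list, and the bucket names and helper functions are hypothetical.

```python
import re

# Hypothetical bucket names, used only for illustration.
BUCKET_NAMES = [
    "etl-temp-my-project",
    "etl-temp-my-project-2",
    "unrelated-bucket",
]


def match_exact(names, value):
    """Simplified model of `name=VALUE`: keep names equal to the value."""
    return [n for n in names if n == value]


def match_regex(names, pattern):
    """Simplified model of `name~PATTERN`: keep names containing a regex match."""
    return [n for n in names if re.search(pattern, n)]


exact = match_exact(BUCKET_NAMES, "etl-temp-my-project")
fuzzy = match_regex(BUCKET_NAMES, "etl-temp-my-project")

print(exact)  # one entry at most, because names are unique
print(fuzzy)  # may contain several entries
```

With an exact match on a globally unique name, the result has either zero or one entry, so `== 1` and `>= 1` behave the same; the difference would only matter for the regex form.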
.param("--project", project_id)
.param("--location", region)
.flag("--uniform-bucket-level-access")
.param("--soft-delete-duration", "0s")
OOC why disabling this?
    step_reporter.report(message="Enabling Dataflow Prime for the job...")
    create_dataflow_job_cmd.param("--additional-experiments", "enable_prime")
elif dataflow_configuration.machine_type:
    create_dataflow_job_cmd.param(
nit: there's no logging here for like "Setting worker machine type to {config.machine_type}" not sure if that's intentional or not
good point, going to add it for consistency
katherinekim-51
left a comment
would have someone else also review this but lgtm!
thekevinhuang
left a comment
I think the change looks good. One nit, I'd probably link documentation stating min required roles in the PR description in case it is needed in the future. Maybe even screenshots in case google pulls the rug and changes the docs without telling us ;)

Summary
This change updates the Dataflow script to reduce the scope of permissions granted to the Dataflow service account. Instead of assigning multiple roles at the project level, the script now grants only the permissions required for the specific resources used by the Dataflow job.
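As a rough illustration of what binding a role at the resource level can look like, the sketch below builds the argument list for `gcloud storage buckets add-iam-policy-binding`. It is not taken from the PR: the helper name, the bucket and service account values, and the choice of `roles/storage.objectAdmin` are all assumptions made for the example, and the command is only constructed, not executed.

```python
from typing import List


def bucket_iam_binding_cmd(bucket_name: str, service_account_email: str, role: str) -> List[str]:
    """Build a gcloud command that binds a role on a single bucket.

    Hypothetical helper: it scopes the role to one bucket instead of
    granting it on the whole project.
    """
    return [
        "gcloud",
        "storage",
        "buckets",
        "add-iam-policy-binding",
        f"gs://{bucket_name}",
        f"--member=serviceAccount:{service_account_email}",
        f"--role={role}",
    ]


# Example values are placeholders, not the ones used by the script.
cmd = bucket_iam_binding_cmd(
    bucket_name="dataflow-temp-my-project",
    service_account_email="dataflow-sa@my-project.iam.gserviceaccount.com",
    role="roles/storage.objectAdmin",  # assumed role, chosen for illustration
)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` if one wanted to run it directly.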
Changes
The service account will retain only the roles/dataflow.worker role at the project level. It will now be granted:
A dedicated temporary storage bucket is now created for each Dataflow job in its respective region.
roles/storage.admin at the project level, an unnecessarily broad permission scope.
The script now accepts additional configuration parameters for the Dataflow job, including:
isStreamingEngineEnabled
parallelism
batchCount
maxNumWorkers
numWorkers
UI Preview:
Additional Notes
The new bucket’s name does not follow the same naming convention as other resources. This is intentional, as GCP bucket names must be globally unique. To ensure uniqueness, the project ID is appended to the bucket name.
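A minimal sketch of how a name with the project ID appended could be built and sanity-checked. The base name and the function are hypothetical, and the validation covers only a simplified subset of the Cloud Storage naming rules (names without dots); it is not the script's actual logic.

```python
import re

# Simplified subset of the bucket naming rules, for names without dots:
# 3-63 characters, lowercase letters, digits, dashes and underscores,
# starting and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]$")


def build_bucket_name(base_name: str, project_id: str) -> str:
    """Append the project ID to a base name and validate the result."""
    name = f"{base_name}-{project_id}"
    if not _BUCKET_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    if name.startswith("goog") or "google" in name:
        raise ValueError(f"reserved word in bucket name: {name!r}")
    return name


# "dataflow-temp-us-east1" is an assumed base name for illustration.
print(build_bucket_name("dataflow-temp-us-east1", "my-project-123"))
```

Because project IDs are themselves unique, appending one makes a collision with another project's bucket much less likely, though the length limit means long base names may need to be shortened.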