Error while using customGcsTempLocation() with Dataflow #19676

@damccorm

Description

I have the following code snippet which writes content to BigQuery via File Loads.

Currently the files are written to a GCS bucket, but I want to write them to the Dataflow workers' local file storage instead and have BigQuery load the data from there.

BigQueryIO
    .writeTableRows()
    .withNumFileShards(100)
    .withTriggeringFrequency(Duration.standardSeconds(90))
    .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
    .withSchema(getSchema())
    .withoutValidation()
    .withCustomGcsTempLocation(new ValueProvider<String>() {
        @Override
        public String get() {
            return "/home/harshit/testFiles";
        }

        @Override
        public boolean isAccessible() {
            return true;
        }
    })
    .withTimePartitioning(new TimePartitioning().setType("DAY"))
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
    .to(tableName);

When I run this, no files are written to the provided path, and the BigQuery load jobs fail with an IOException.


I looked at the docs, but I was unable to find any working example for this.
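For reference, withCustomGcsTempLocation expects a Cloud Storage path (gs://...), since BigQuery load jobs read the staged files from GCS rather than from the workers' local disks. The sketch below shows a provider that returns such a path; it uses a simplified stand-in for Beam's ValueProvider interface so it compiles on its own, and the staticProvider helper and bucket name are illustrative assumptions (Beam itself provides ValueProvider.StaticValueProvider.of(...) for this purpose).

```java
// Simplified stand-in for org.apache.beam.sdk.options.ValueProvider so this
// sketch is self-contained; the real Beam interface declares the same two methods.
interface ValueProvider<T> {
    T get();
    boolean isAccessible();
}

public class TempLocationExample {

    // Hypothetical helper mirroring the behavior of Beam's
    // ValueProvider.StaticValueProvider.of(...): wraps a fixed value
    // that is always accessible.
    static <T> ValueProvider<T> staticProvider(T value) {
        return new ValueProvider<T>() {
            @Override
            public T get() {
                return value;
            }

            @Override
            public boolean isAccessible() {
                return true;
            }
        };
    }

    public static void main(String[] args) {
        // BigQuery load jobs read staged files from Cloud Storage, so the
        // temp location must be a gs:// path; "my-bucket" is a placeholder.
        ValueProvider<String> tempLocation =
            staticProvider("gs://my-bucket/bq-temp-files");
        System.out.println(tempLocation.get());
    }
}
```

A provider built this way can be passed straight to withCustomGcsTempLocation in place of the anonymous class in the snippet above.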

Imported from Jira BEAM-8089. Original Jira may contain additional context.
Reported by: the-dagger.
