Destination Azure Blob Storage: Support writing timestamps #6063

Closed
Tracked by #6996
ghost opened this issue Sep 14, 2021 · 5 comments · Fixed by #9682
Comments

ghost commented Sep 14, 2021

Tell us about the problem you're trying to solve

I'd like to be able to use Azure Blob Storage (or S3 / GCS) as a durable data lake while also facilitating quick loads into a DW, like Snowflake and BigQuery.

Describe the solution you’d like

The option to append the current timestamp (_airbyte_emitted_at) to the resulting filename in Cloud Storage. This would allow incremental reads to create individual files that can be loaded, queried, and managed efficiently.

Describe the alternative you’ve considered or used

An alternative would be to manage a larger workflow outside of Airbyte that loads the file, copies it to a durable location, and then removes the original.

Another alternative would be to enhance DW destinations that leverage Cloud Storage by allowing the user to retain the staged data, as opposed to removing it automatically. I could see value in both enhancements.

Additional context

Similar to #4610

Are you willing to submit a PR?

Perhaps. :)

andriikorotkov commented Jan 19, 2022

Hello, @dsdorazio and @sherifnada. I would like to share my vision of this task. I have opened a pull request in which S3 and GCS can be used as staging. The data from each synchronization is saved as a separate file in staging, while Azure Blob Storage always holds a single blob, named after the Airbyte stream, containing the current data. Is this solution right for you?

If the solution that I described is not suitable, please describe in more detail the solution that you propose.

tuliren commented Jan 20, 2022

Let's move our discussion from #9336 to here.

I have opened a pull request, in which S3 and GCS can be used as staging.

I don't think this is the right solution. The purpose of the Azure Blob Storage destination is to store objects directly on Azure. Adding S3 or GCS as a staging area unnecessarily copies the data first to S3 or GCS, and then to Azure.

What's the current filename output by the Azure destination? It seems to me that if the Azure output filename follows a pattern similar to S3 or GCS, a timestamp will already be included in the filename.

Here is what the S3 destination filename looks like:

https://docs.airbyte.com/integrations/destinations/s3#configuration

testing_bucket/data_output_path/public/users/2021_01_01_1609541171643_0.csv
↑              ↑                ↑      ↑     ↑          ↑             ↑ ↑
|              |                |      |     |          |             | format extension
|              |                |      |     |          |             partition id
|              |                |      |     |          upload time in millis
|              |                |      |     upload date in YYYY-MM-DD
|              |                |      stream name
|              |                source namespace (if it exists)
|              bucket path
bucket name

Same for GCS:
https://docs.airbyte.com/integrations/destinations/gcs#configuration
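
As a rough illustration (and not the actual connector code), a name following this pattern could be assembled like the sketch below; the helper and parameter names are hypothetical and chosen only to mirror the annotated example above.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch of assembling an S3-style object name; not the actual connector code.
public class TimestampedNameSketch {

  private static final DateTimeFormatter DATE_FORMAT =
      DateTimeFormatter.ofPattern("yyyy_MM_dd").withZone(ZoneOffset.UTC);

  // e.g. objectName("public", "users", Instant.ofEpochMilli(1609541171643L), 0, "csv")
  //   -> public/users/2021_01_01_1609541171643_0.csv
  static String objectName(String namespace, String stream, Instant uploadTime,
                           int partitionId, String extension) {
    String prefix = (namespace == null || namespace.isEmpty()) ? "" : namespace + "/";
    return prefix + stream + "/"
        + DATE_FORMAT.format(uploadTime) + "_"   // upload date in yyyy_MM_dd
        + uploadTime.toEpochMilli() + "_"        // upload time in millis
        + partitionId + "."                      // partition id
        + extension;                             // format extension
  }

  public static void main(String[] args) {
    System.out.println(objectName("public", "users",
        Instant.ofEpochMilli(1609541171643L), 0, "csv"));
  }
}
```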

andriikorotkov commented Jan 20, 2022

@tuliren, the current filename output by the Azure destination is the stream name. Also, all blobs are stored in a single container, which is specified by the user when creating the destination.

testing_container/blob_name
↑                 ↑
|                 |
|                 stream name
|
user container name

Are you suggesting changing this to -

container_with_stream_name/2021_01_01_1609541171643_0

Or is the next option better?

testing_container/stream_name__2021_01_01_1609541171643_0

tuliren commented Jan 20, 2022

Got it. I think keeping the same pattern as S3 should be good:

<bucket>/<output_path>/<namespace-if-there-is-one>/<stream-name>/2021_01_01_1609541171643_0.csv

tuliren commented Jan 24, 2022

@andriikorotkov, was a previous comment deleted? Although folders are not supported in Azure, the object path can have / in it so that it looks like a traditional path.
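
To illustrate that last point, here is a minimal, hypothetical sketch using the Azure Storage Blob SDK for Java (v12); the connection string, container name, and blob path are placeholders, and this is not the destination's actual code. It simply shows that a blob name containing `/` can be uploaded as-is, so the proposed S3-style path works on Azure.

```java
import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobContainerClientBuilder;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class SlashBlobNameSketch {
  public static void main(String[] args) {
    // Placeholder credentials and container name.
    BlobContainerClient container = new BlobContainerClientBuilder()
        .connectionString(System.getenv("AZURE_STORAGE_CONNECTION_STRING"))
        .containerName("testing_container")
        .buildClient();

    // Azure Blob Storage has no real folders, but '/' is legal in a blob name,
    // so the S3-style path from the comment above can be used directly.
    String blobName = "data_output_path/public/users/2021_01_01_1609541171643_0.csv";

    byte[] data = "id,name\n1,alice\n".getBytes(StandardCharsets.UTF_8);
    BlobClient blob = container.getBlobClient(blobName);
    blob.upload(new ByteArrayInputStream(data), data.length, true); // overwrite if present

    // Tools that list the container with a '/' delimiter (e.g. Azure Storage Explorer)
    // will render the name as a folder hierarchy.
  }
}
```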
