🎉 New Source: GCS #23186
@sh4sh @natalyjazzviolin can you please take a look? 🙏
Hello @tuanchris 👋 thanks for the contribution. I tried to test the connector. It works when reading a single file, but not when reading a folder.
In my case I created the folder test_folder and added two CSV files to it.
The config below doesn't work:
{
"gcs_path": "test_folder/",
"gcs_bucket": "airbyte-integration-test-source-gcs",
"service_account": "..."
}
The documentation and the instructions to set up the connector are also missing.
As it stands, the connector is no different from the File source with the GCS option. If the connector can read a folder it can probably be accepted, though a strategy for incremental reads would probably be better.
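One way to handle the folder case from the config in the comment above is to treat `gcs_path` as an object-name prefix and keep only the CSV objects beneath it. The sketch below is a guess at that logic, not the connector's actual code: `select_csv_names` and `list_csv_names` are hypothetical helpers, and the second one assumes the `google-cloud-storage` package is installed and that `service_account` holds the key file contents as a JSON string.

```python
import json
from typing import Iterable, List


def select_csv_names(names: Iterable[str], gcs_path: str) -> List[str]:
    """Return the CSV object names found under gcs_path (hypothetical helper).

    GCS has no real folders: "test_folder/" is only a name prefix, and a
    zero-byte placeholder object with that exact name may exist. The ".csv"
    suffix check drops such placeholders along with any non-CSV objects.
    """
    return sorted(
        name
        for name in names
        if name.startswith(gcs_path) and name.lower().endswith(".csv")
    )


def list_csv_names(config: dict) -> List[str]:
    """List CSV objects for a config shaped like the one in the review comment.

    Assumption: config["service_account"] is the service-account key as JSON.
    The imports are local so select_csv_names works without the GCS client.
    """
    from google.cloud import storage
    from google.oauth2 import service_account

    info = json.loads(config["service_account"])
    credentials = service_account.Credentials.from_service_account_info(info)
    client = storage.Client(credentials=credentials, project=info.get("project_id"))
    # list_blobs accepts a prefix, so a folder and a single file take the same path.
    blobs = client.list_blobs(config["gcs_bucket"], prefix=config["gcs_path"])
    return select_csv_names((blob.name for blob in blobs), config["gcs_path"])
```

Because the filter is a plain prefix match, a `gcs_path` that names a single file (for example `test_folder/film.csv`, an illustrative name) selects just that file, while `test_folder/` selects every CSV under the folder.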
Thanks @marcosmarxm for the review. I have:
And here are the discovery results:
{
"type": "CATALOG",
"catalog": {
"streams": [
{
"name": "film",
"json_schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"film_id": { "type": "string" },
"title": { "type": "string" },
"release_year": { "type": "string" },
"language_id": { "type": "string" },
"rental_duration": { "type": "string" },
"rental_rate": { "type": "string" },
"replacement_cost": { "type": "string" },
"rating": { "type": "string" }
}
},
"supported_sync_modes": ["full_refresh"]
},
{
"name": "actor",
"json_schema": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"actor_id": { "type": "string" },
"first_name": { "type": "string" },
"last_name": { "type": "string" }
}
},
"supported_sync_modes": ["full_refresh"]
}
]
}
}
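The catalog above is consistent with deriving one stream per CSV file, named after the file, with every column typed as a string. A minimal sketch of that derivation follows. `stream_from_csv` is a hypothetical helper, and the rule mapping a file name to a stream name is an assumption rather than something stated in this PR. The data row in the example is made-up sample input.

```python
import csv
import io
import posixpath


def stream_from_csv(blob_name: str, csv_text: str) -> dict:
    """Build a catalog stream entry from a CSV header (hypothetical helper)."""
    # Assumption: the stream name is the file name without folder or extension.
    stream_name = posixpath.splitext(posixpath.basename(blob_name))[0]
    # Only the header row is needed to enumerate the columns.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return {
        "name": stream_name,
        "json_schema": {
            "$schema": "http://json-schema.org/draft-07/schema#",
            "type": "object",
            # No type inference: every column is declared as a string.
            "properties": {column: {"type": "string"} for column in header},
        },
        "supported_sync_modes": ["full_refresh"],
    }


# Made-up sample file contents, used only to exercise the helper.
stream = stream_from_csv(
    "test_folder/actor.csv",
    "actor_id,first_name,last_name\n1,Jane,Doe\n",
)
```

Declaring every column as a string keeps discovery cheap, since only the header is read, at the cost of leaving numeric or date typing to downstream steps.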
I've left one more comment, otherwise looks good to me
@marcosmarxm please proceed with this PR when you have a chance
cc @sh4sh can you pick up this one please? I guess @marcosmarxm is unavailable
/test connector=connectors/source-gcs
Build Passed
Test summary info:
/publish connector=connectors/source-gcs
If you have connectors that published successfully but failed definition generation, follow step 4 here
Thanks @tuanchris
* initial commit
* fix test error
* Update get_gcs_blobs logic
* add docs
* Update source_definitions.yaml
* Update airbyte-integrations/connectors/source-gcs/source_gcs/source.py
  Co-authored-by: sh4sh <6833405+sh4sh@users.noreply.github.com>
* Update airbyte-config/init/src/main/resources/seed/source_definitions.yaml
  Co-authored-by: Denys Davydov <davydov.den18@gmail.com>
* Update airbyte-integrations/connectors/source-gcs/source_gcs/helpers.py
  Co-authored-by: Denys Davydov <davydov.den18@gmail.com>
* Update airbyte-integrations/connectors/source-gcs/source_gcs/helpers.py
  Co-authored-by: Denys Davydov <davydov.den18@gmail.com>
* update docker file for pandas package
* reimplement read_csv file
* add logic to filter selected streams
* close file_obj after reading
* fix format and tests
* add another stream
* auto-bump connector version
---------
Co-authored-by: Sunny <6833405+sh4sh@users.noreply.github.com>
Co-authored-by: Denys Davydov <davydov.den18@gmail.com>
Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Hi! Any chance that we can have this connector working with parquet also?
Can you open a feature request?
What
Add new source GCS. This connector will:
#11135 @YowanR
How
google-cloud-storage database and a service account
get_blobs method to list all blobs
.csv files
Future improvements to be made: