-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
π Source Shopify: Add a graphql products stream - Invalid PR #19492
Conversation
Hey @PhilipCorr This PR is part of Airbyte's Community Maintainer program and will be reviewed by a member of our community. Please remain patient as a maintainer is assigned to review the PR. Thank you! |
Thanks @PhilipCorr , here Andres, part of the Community Maintainer program. I'll be assisting with the review of this PR. |
Great thanks @sajarin and @andresbravog! Looking forward to hearing what you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π Gret work @PhilipCorr , Just a few style comments and minor type check issues
@@ -26,6 +26,9 @@ | |||
"products": { | |||
"updated_at": "2025-07-08T05:40:38-07:00" | |||
}, | |||
"products_graph_ql": { | |||
"updated_at": "2022-11-16T16:00:00" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
won't be this date in the same format as the others including the timezone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran a number of tests to make sure that the Graphql API returned the same data as the REST API.
This is me thinking out loud as much as anything but posting here for context too. Depending on the state timestamp format, the number of rows returned by the connector sometimes differs. Here are the results i'm getting:
Date Used | REST API | GraphQL API |
---|---|---|
2022-11-18T14:10:00 | 99 | 99 |
2022-11-18T14:30:00 | 89 | 89 |
2022-11-18T14:50:00 | 78 | 78 |
2022-11-18T14:50:00-07:00 | 0 | 79 |
2022-11-18T14:50:00:07:00 | 79 | 79 |
2022-11-18T14:50:00+07:00 | 238 | 80 |
2022-11-18T07:50:00 | 238 | 238 |
2022-11-18T14:50:00Z | 80 | 80 |
The rest api accounts for the 7 hour offset for the timezone whereas the GraphQl API doesn't.
The state being returned by the connector looks like this:
{"type": "STATE", "state": {"type": "STREAM", "stream": {"stream_descriptor": {"name": "products"}, "stream_state": {"updated_at": "2022-11-18T16:51:10+00:00"}}, "data": {"products_graph_ql": {"updated_at": "2022-11-18T14:50:00Z"}, "products": {"updated_at": "2022-11-18T16:51:10+00:00"}}}}
I.e. in my tests, the updated_at
timestamp (which is what updates the state) is such that it always includes +00:00
. This might just be because of the timezone of the shop that I'm using. Because of this particular timezone (+00:00), both apis return the same results for both apis. I'll update the hard-coded json to include the timezone. I'm curious what your thoughts are here though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks that makes sense, could you review the other style issues and commit updates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, yes, will do that this morning!
@@ -11,6 +11,9 @@ | |||
"products": { | |||
"updated_at": "2022-10-10T06:21:56-07:00" | |||
}, | |||
"products_graph_ql": { | |||
"updated_at": "2022-11-16T16:00:00" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
won't be this date equal in format to the others including the timezone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be yes. See my comment above for some tests I ran. A date formatted with the suffix -07:00
gives different results between the REST API and the GraphQL API.
_schema_root = _schema.shopify_schema | ||
|
||
|
||
def get_query_products(first: int, filter_field: str, filter_value: str, next_page_token: Optional[str]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe break this on a second line to avoid pflake8 offenses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't changed when I ran black -l 120 . --extend-exclude virtualenv
so also considering it formatted correctly if you agree?
docs/integrations/sources/shopify.md
Outdated
@@ -145,7 +147,8 @@ This is expected when the connector hits the 429 - Rate Limit Exceeded HTTP Erro | |||
|
|||
| Version | Date | Pull Request | Subject | | |||
|:--------|:-----------|:----------------------------------------------------------|:----------------------------------------------------------------------------------------------------------| | |||
| 0.2.0 | 2022-10-21 | [18298](https://github.com/airbytehq/airbyte/pull/18298) | Updated API version to the `2022-10`, make stream schemas backward cpmpatible | | |||
| 0.2.0 | 2022-11-16 | [19492](https://github.com/airbytehq/airbyte/pull/19492) | Add support for graphql and add a graphql products stream | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be 0.3.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π
docs/integrations/sources/shopify.md
Outdated
|
||
This Source Connector is based on a [Airbyte CDK](https://docs.airbyte.io/connector-development/cdk-python). | ||
|
||
## Troubleshooting | ||
|
||
Check out common troubleshooting issues for the BigQuery destination connector on our Discourse [here](https://discuss.airbyte.io/tags/c/connector/11/source-shopify). | ||
Check out common troubleshooting issues for the Shopify destination connector on our Discourse [here](https://discuss.airbyte.io/tags/c/connector/11/source-shopify). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/destination/source/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the word destination
to source
here.
return wait_time | ||
|
||
@staticmethod | ||
def get_rest_api_wait_time(*args, threshold: float = 0.9, rate_limit_header: str = "X-Shopify-Shop-Api-Call-Limit"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line long
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also was unaffected by black -l 120 . --extend-exclude virtualenv
.
wait_time = ShopifyRateLimiter.on_mid_load | ||
elif load < mid_load: | ||
wait_time = ShopifyRateLimiter.on_low_load | ||
wait_time = ShopifyRateLimiter._convert_load_to_time(load, threshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like load here is type float
but an Optional[int]
is expected in the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot! Updated the expected type to be Optional[float]
else: | ||
load = None | ||
|
||
wait_time = ShopifyRateLimiter._convert_load_to_time(load, threshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, load here is either None
or float
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, I updated the expected type for this function to be Optional[float]
) | ||
if api_type == ApiTypeEnum.rest.value: | ||
ShopifyRateLimiter.wait_time( | ||
ShopifyRateLimiter.get_rest_api_wait_time(*args, threshold=threshold, rate_limit_header=rate_limit_header) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
long line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated via black -l 120 . --extend-exclude virtualenv
) | ||
elif api_type == ApiTypeEnum.graphql.value: | ||
ShopifyRateLimiter.wait_time( | ||
ShopifyRateLimiter.get_graphql_api_wait_time(*args, threshold=threshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
long line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not impacted by black -l 120 . --extend-exclude virtualenv
so considering it short enough if you agree?
Run
|
Running pflake8 now gives no output as the issues have been cleared up:
|
Thanks for reviewing @andresbravog, I've made the requested changes now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π Well done @PhilipCorr
@sajarin This PR is ready to run tests and merge. |
Hi @andresbravog and @sajarin, I have unfortunately become aware that the access token I was using to test my new stream was accidentally committed to this PR. I removed the file and the access token has since been rotated, but the access token is still in the commit history here. Would you mind removing it from the commit history? I deleted my forked repo too in an effort to remove the token. I can open a new PR with the same changes as soon as this issue is resolved. Thanks and sorry for the inconvenience. |
@PhilipCorr I'd recommend modifying your repo history and reopening this PR. That should modify the history here. |
Hi @andresbravog and @sajarin, The above reopening of the PR is based on a new fork that I created. This doesn't seem to have been enough to clear the commit history though. I opened a support ticket with github to get their opinion on what should be done here. I'm in favour of just deleting the internal references and preserving the comment history for context if you think this is appropriate.
|
@sajarin @marcosmarxm we have a weird situation here, can we get help from a repository admin to delete the current PR so the credentials are not exposed any more? |
Hello π, first thank you for this amazing contribution. We really appreciate the effort you've made to improve the project. If you have any questions feel free to send me a message in Slack! |
I can't delete a PR |
@PhilipCorr I can delete the PR, only you can do it. Did you try to remove using cherrypicking the commit? |
Hi @marcosmarxm, I did cherry pick the commit via the new fork I created. However, I couldn't find a way to get the history of this PR to reference the new PR. It seems like the history of this PR is tied to the old fork. Either way, as per github support message above, are you happy to authorize deleting the internal references and preserving the comment history for this PR? I'll open a new PR which points to this for context and we can merge via that new PR. |
Hey @PhilipCorr, Deleting the internal references should be fine, we want to preserve the comment history here. Let us know if you need anything else! |
Sounds great, thanks @sajarin. I'll let github support know that was are happy to:
|
The internal references have been deleted. Closing this and will reference in a new PR. |
Thanks @PhilipCorr let me know when you create the new one so I can accelerate the Review process |
Thanks @andresbravog, I have the follow-up PR here and it's ready for review: #19789 |
This PR has been closed as invalid and is for context only.
The current valid PR is here: #19789
What
Shopify have a graphQL admin API in addition to the rest admin API. Some data is only available via the graphQL api. This PR adds support for this graphQL API. It also adds the first graphQL stream for fetching products.
There is a products stream that uses the rest API already. However, some product data can only be fetched via the use of the graphQL API. In this PR I've added the
ProductsGraphQl
stream. I've named it like this to separate it from the pre-existing rest APIProducts
stream.How
I followed the github source as an example. This already has streams that use the github graphql api.
The main thing I copied from the github source is the use of the python
sgqlc
package.To make full use of this package I ran the following command to generate the json representation of shopify's graphql api:
I then ran the folloiwng command to convert that json representation to python:
The resulting
shopify_schema.py
file is being added in this PR.This follows the examples in the sgqlc documentation closely.
Recommended reading order
shopify_schema.py
file was auto-generated. Hence why it is so long.setup.py
. This was updated to add thesgqlc
depencency.source.py
. This shows the added graphql stream.graphql.py
. This builds up the graphql queries that get sent to the graphql endpoint.utils.py
. Here,ShopifyRateLimiter
was updated as the graphql api uses slightly different rate limiting compared to the rest api.test_control_rate_limit.py
. This adds tests for the new graphql rate limiting.π¨ User Impact π¨
There should not be any breaking changes. The changes here are intended to be backwards compatible.
In order to simplify the changes here and to get quicker feedback, the fields requested via this stream are hard-coded as seen in
graphql.py
. Hard coding them like this follows how the GitHub source is currently implemented.I'm open to suggestions as to how this should be handled.
Maybe we should use the catalog here to specify which fields are included in the graphql query.
Pre-merge Checklist
Updating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereTests
Unit
I had to add pytest-mock here. I'm not sure how these tests were working in the past without that dependency.Output:
Integration
There doesn't seem to be any integration tests here
Acceptance
Currently running into an error when I run the command as per the readme: