Source Linnworks: improve streams ProcessedOrders and ProcessedOrderDetails #8226

Merged
merged 23 commits into from
Dec 16, 2021

Contributor

@monai monai commented Nov 24, 2021

What

This PR makes two changes to the streams ProcessedOrders and ProcessedOrderDetails:

  1. It changes the cursor field of the stream ProcessedOrders from dReceivedDate to dProcessedOn. The latter field reflects the stream's nature more accurately and makes the following change possible.
  2. It converts the sync mode of the stream ProcessedOrderDetails to incremental. Both streams contain a property with a processed timestamp, so the timestamp can be passed transparently to the parent stream ProcessedOrders, which makes the child stream incremental.
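
A minimal sketch of the idea in item 2, written in plain Python rather than taken from the connector's code: the parent stream is filtered by dProcessedOn, and the child stream saves each parent record's timestamp as its own state. The function names, the keys `pkOrderID` and `ProcessedDateTime`, and the `fetch_details` callback are assumptions for illustration only.

```python
from typing import Any, Callable, Dict, Iterator, List

Record = Dict[str, Any]


def processed_orders(orders: List[Record], since: str) -> List[Record]:
    """Parent stream: orders processed at or after `since`, oldest first.

    Timestamps are assumed to be ISO 8601 strings in one format,
    so plain string comparison orders them correctly.
    """
    ordered = sorted(orders, key=lambda order: order["dProcessedOn"])
    return [order for order in ordered if order["dProcessedOn"] >= since]


def processed_order_details(
    orders: List[Record],
    fetch_details: Callable[[str], Record],
    state: Dict[str, str],
) -> Iterator[Record]:
    """Child stream: one detail record per parent order.

    The parent's processed timestamp is stored as the child's state,
    so the next run can resume from it.
    """
    since = state.get("ProcessedDateTime", "")
    for order in processed_orders(orders, since):
        yield fetch_details(order["pkOrderID"])
        state["ProcessedDateTime"] = order["dProcessedOn"]
```

Because the parent records are read oldest first, the saved state only ever moves forward.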

Other changes:

  1. Add missing catalog and configured catalog to the integration tests.

Recommended reading order

  1. streams.py

🚨 User Impact 🚨

🚨🚨 The cursor field of stream ProcessedOrders changed from dReceivedDate to dProcessedOn.

Pre-merge Checklist

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

@github-actions github-actions bot added the area/connectors Connector related issues label Nov 24, 2021
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Nov 24, 2021
@monai monai marked this pull request as ready for review November 24, 2021 12:33
@alafanechere alafanechere self-assigned this Dec 1, 2021
Contributor

@alafanechere alafanechere left a comment

Hi @monai, thanks for this contribution and these improvements. I'm facing some errors when running the acceptance tests; they are related to schema validation during test_read. It looks like some fields are empty (None), which your schema does not accept.

{"type": "LOG", "log": {"level": "ERROR", "message": "\nThe stock_locations stream has the following schema errors:\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['BinRack']:\n    {'description': 'Bin rack', 'type': 'string'}\n\nOn instance['BinRack']:\n    None"}}
{"type": "LOG", "log": {"level": "ERROR", "message": "\nThe stock_items stream has the following schema errors:\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['StockLevels']['items']['properties']['SKU']:\n    {'description': 'Product SKU', 'type': 'string'}\n\nOn instance['StockLevels'][1]['SKU']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['Images']['items']['properties']['ChecksumValue']:\n    {'description': 'Internal checksum value', 'type': 'string'}\n\nOn instance['Images'][0]['ChecksumValue']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['Images']['items']['properties']['RawChecksum']:\n    {'description': 'Raw file checksum (Used for UI to determine if the '\n                    'image file is the same before submitting for upload)',\n     'type': 'string'}\n\nOn instance['Images'][0]['RawChecksum']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['PackageGroupName']:\n    {'description': 'Default package group name', 'type': 'string'}\n\nOn instance['PackageGroupName']:\n    None"}}
{"type": "LOG", "log": {"level": "ERROR", "message": "\nThe processed_orders stream has the following schema errors:\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['cShippingAddress']:\n    {'description': \"Customer's shipping address\", 'type': 'string'}\n\nOn instance['cShippingAddress']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['Vendor']:\n    {'description': 'Courier name (e.g. DPD)', 'type': 'string'}\n\nOn instance['Vendor']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['BillingEmailAddress']:\n    {'type': 'string'}\n\nOn instance['BillingEmailAddress']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['PackageCategory']:\n    {'description': 'Package category', 'type': 'string'}\n\nOn instance['PackageCategory']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['PackageTitle']:\n    {'description': 'Package name', 'type': 'string'}\n\nOn instance['PackageTitle']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['FolderCollection']:\n    {'description': 'Folder name of an order', 'type': 'string'}\n\nOn instance['FolderCollection']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['cBillingAddress']:\n    {'description': 'Customer billing address', 'type': 'string'}\n\nOn 
instance['cBillingAddress']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['BillingName']:\n    {'description': 'Customer billing name', 'type': 'string'}\n\nOn instance['BillingName']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['BillingCompany']:\n    {'description': 'Customer billing company', 'type': 'string'}\n\nOn instance['BillingCompany']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['BillingAddress1']:\n    {'description': 'Billing address line one', 'type': 'string'}\n\nOn instance['BillingAddress1']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['BillingAddress2']:\n    {'description': 'Billing address line two', 'type': 'string'}\n\nOn instance['BillingAddress2']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['BillingAddress3']:\n    {'description': 'Billing address line three', 'type': 'string'}\n\nOn instance['BillingAddress3']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['BillingTown']:\n    {'description': 'Billing town', 'type': 'string'}\n\nOn instance['BillingTown']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['BillingRegion']:\n    {'description': 'Billing 
region, area, county', 'type': 'string'}\n\nOn instance['BillingRegion']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['BillingPostCode']:\n    {'description': 'Billing postcode', 'type': 'string'}\n\nOn instance['BillingPostCode']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['BillingCountryName']:\n    {'description': 'Billing country', 'type': 'string'}\n\nOn instance['BillingCountryName']:\n    None\n--------------------------------------------------------------------------------\nNone is not of type 'string'\n\nFailed validating 'type' in schema['properties']['BillingPhoneNumber']:\n    {'description': 'Billing phone number', 'type': 'string'}\n\nOn instance['BillingPhoneNumber']:\n    None"}}
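
Every error in the log above has the same shape: the API returned None for a property the schema declares as 'string'. One common way to accept such records is to widen each property type to also allow "null". The helper below is a sketch of that approach using only the standard library; the thread does not show the exact schema changes that were made, and the function names are assumptions.

```python
from typing import Any, Dict

Schema = Dict[str, Any]


def nullable(node: Schema) -> Schema:
    """Return a copy of a schema node that also accepts null, recursively."""
    result = dict(node)
    declared = result.get("type")
    if isinstance(declared, str):
        result["type"] = ["null", declared]
    elif isinstance(declared, list) and "null" not in declared:
        result["type"] = ["null", *declared]
    if isinstance(result.get("properties"), dict):
        result["properties"] = {
            name: nullable(sub) for name, sub in result["properties"].items()
        }
    if isinstance(result.get("items"), dict):
        result["items"] = nullable(result["items"])
    return result


def make_properties_nullable(schema: Schema) -> Schema:
    """Make every property below the root nullable; the root type is unchanged."""
    result = dict(schema)
    result["properties"] = {
        name: nullable(sub) for name, sub in schema.get("properties", {}).items()
    }
    return result


# The property definition is copied from the stock_locations error message.
stock_locations = {
    "type": "object",
    "properties": {"BinRack": {"description": "Bin rack", "type": "string"}},
}
relaxed = make_properties_nullable(stock_locations)
```

The input schema is left untouched, so the relaxed copy can be compared against the original before it is written back.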

EDIT: I made the required changes to the catalogs / schema to make the acceptance tests pass with our sandbox Linnworks account. But I'm afraid that the main build of this connector is failing for a reason I can't figure out:

atched_error = ContainerError('Command \'discover --config tap_config.json\' in image \'sha256:a83076bf3e0e5ccee0a9cdbd2d7571e126a450... validation_error.message) from None\nException: Config validation error: \'application_id\' is a required property\n')

Our CI credentials are exactly the same as the ones I use locally...

@@ -207,12 +207,12 @@ def request_body_data(
         self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, any] = None, next_page_token: Mapping[str, Any] = None
     ) -> MutableMapping[str, Any]:
         request = {
-            "DateField": "received",
+            "DateField": "processed",
Contributor

As this field depends on the cursor, storing a mapping between the cursor field and the date field could be useful in case of future updates.
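
A sketch of what such a mapping might look like. The two pairs are inferred from the diff and the PR description; the constant and function names are hypothetical and not part of the connector.

```python
from typing import Dict

# Cursor field -> value sent as "DateField" in the request body.
# Only these two pairs can be inferred from this PR.
CURSOR_TO_DATE_FIELD: Dict[str, str] = {
    "dReceivedDate": "received",
    "dProcessedOn": "processed",
}


def date_field_for(cursor_field: str) -> str:
    """Look up the request's DateField value for a given cursor field."""
    try:
        return CURSOR_TO_DATE_FIELD[cursor_field]
    except KeyError:
        raise ValueError(
            f"No DateField mapping for cursor field {cursor_field!r}"
        ) from None
```

With a lookup like this, the request body could be built from the stream's cursor field instead of a hard-coded string, so a later cursor change would touch a single place.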

Contributor Author

The objects that this stream pulls have several date fields that track the order's lifecycle, e.g., received, processed, and canceled. See the documentation.

The received date is the first date set when an order is created; the processed date is always later. Therefore, for existing data, some orders might be refetched after the connector upgrade. That's not a problem: in full_refresh sync they get duplicated anyway, and in incremental sync they get deduped according to primary_key.
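
The argument above can be traced with a small, made-up example: the state saved under the old cursor (dReceivedDate) is reused as a lower bound on dProcessedOn, already-synced orders come back, and merging by primary key removes the duplicates. The key `pkOrderID`, the sample records, and the helper name are assumptions for illustration.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def dedupe_by_primary_key(
    existing: List[Record], incoming: List[Record], primary_key: str
) -> List[Record]:
    """Merge records, keeping the latest version of each primary key."""
    merged = {record[primary_key]: record for record in existing}
    for record in incoming:
        merged[record[primary_key]] = record
    return list(merged.values())


# Orders synced before the upgrade, when the cursor was dReceivedDate.
already_synced = [
    {"pkOrderID": "A", "dReceivedDate": "2021-11-08", "dProcessedOn": "2021-11-12"},
    {"pkOrderID": "B", "dReceivedDate": "2021-11-10", "dProcessedOn": "2021-11-11"},
]
saved_cursor = max(order["dReceivedDate"] for order in already_synced)

# After the upgrade the same value is compared against dProcessedOn.
source = already_synced + [
    {"pkOrderID": "C", "dReceivedDate": "2021-11-13", "dProcessedOn": "2021-11-14"},
]
refetched = [order for order in source if order["dProcessedOn"] >= saved_cursor]

merged = dedupe_by_primary_key(already_synced, refetched, "pkOrderID")
```

Orders A and B are fetched a second time because they were processed after the saved value, but each order still appears only once in the merged result.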

Have I answered your concern? If not, would you please elaborate more on what concerns you with this change?

Contributor Author

monai commented Dec 8, 2021

> Hi @monai, thanks for this contribution and these improvements. I'm facing some errors when running the acceptance tests; they are related to schema validation during test_read. It looks like some fields are empty (None), which your schema does not accept.

It might be caused by a change in their API, although the documentation hasn't changed yet. I had the same issue with the 3PL Central connector, where the documentation wasn't updated together with the code.

> EDIT: I made the required changes to the catalogs / schema to make the acceptance tests pass with our sandbox Linnworks account. But I'm afraid that the main build of this connector is failing for a reason I can't figure out:
>
> atched_error = ContainerError('Command \'discover --config tap_config.json\' in image \'sha256:a83076bf3e0e5ccee0a9cdbd2d7571e126a450... validation_error.message) from None\nException: Config validation error: \'application_id\' is a required property\n')
>
> Our CI credentials are exactly the same as the ones I use locally...

It looks like something's wrong with your CI setup. The file tap_config.json generated in the container doesn't have the property application_id.

The code works for me with two different accounts as expected:

~/projects/lt/airbyte/airbyte-integrations/connectors/source-linnworks (linnworks-pod) [um] git rev-parse HEAD
9fa54aec5a6bc4efedfd06921094d710f851c100
~/projects/lt/airbyte/airbyte-integrations/connectors/source-linnworks (linnworks-pod) [um] python main.py discover --config secrets/config.json | jq
{
  "type": "CATALOG",
  "catalog": {
    "streams": [
      {
        "name": "stock_locations",
        "json_schema": {
          "$schema": "http://json-schema.org/draft-07/schema#",
          "type": "object",
          "additionalProperties": false,
...
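
The diagnosis above (a generated config without application_id) can be checked with a few lines of standard-library Python. This is only a sketch: application_id is the one property the error message names as required, while the helper name and the placeholder key in the sample config are assumptions.

```python
import json
from typing import Iterable, List


def missing_required(config_text: str, required: Iterable[str]) -> List[str]:
    """Return the required property names that are absent from a JSON config."""
    config = json.loads(config_text)
    return [key for key in required if key not in config]


# Placeholder content standing in for a generated tap_config.json.
sample_config = '{"some_other_key": "value"}'
missing = missing_required(sample_config, ["application_id"])
```

Running the same check against the config used locally and the one generated in CI would show whether the two really are identical.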

@alafanechere
Contributor

> It looks like something's wrong with your CI setup. The file tap_config.json generated in the container doesn't have the property application_id.

Yes, we're trying to figure this out and will keep you updated.

@alafanechere alafanechere temporarily deployed to more-secrets December 16, 2021 19:38 Inactive
@alafanechere alafanechere merged commit a0ec0de into airbytehq:master Dec 16, 2021
@monai monai deleted the linnworks-pod branch December 22, 2021 16:30
schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation community