Source hubspot: add incremental streams #24711

Merged
merged 34 commits into from Apr 6, 2023

Conversation

artem1205
Collaborator

What

Resolving #19265

How

Add cursors to the streams and filter records by date on the connector side.

Recommended reading order

  1. x.java
  2. y.python

🚨 User Impact 🚨

no breaking changes

Pre-merge Checklist

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)

  • Secrets in the connector's spec are annotated with airbyte_secret

  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.

  • Code reviews completed

  • Connector version has been incremented

  • Documentation updated

    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Mar 30, 2023
@artem1205
Collaborator Author

artem1205 commented Mar 30, 2023

/test connector=connectors/source-hubspot

🕑 connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4567442638
❌ connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4567442638
🐛 https://gradle.com/s/vwbdmtbt6x3sg

Build Failed

Test summary info:

=========================== short test summary info ============================
FAILED test_incremental.py::TestIncremental::test_two_sequential_reads[inputs0]
FAILED test_incremental.py::TestIncremental::test_read_sequential_slices[inputs0]
FAILED test_incremental.py::TestIncremental::test_state_with_abnormally_large_values[inputs0]
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
============= 3 failed, 38 passed, 1 skipped in 215.25s (0:03:35) ==============

@artem1205
Collaborator Author

artem1205 commented Mar 30, 2023

/test connector=connectors/source-hubspot

🕑 connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4568020523
❌ connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4568020523
🐛 https://gradle.com/s/3x72qlcfvyh76

Build Failed

Test summary info:

=========================== short test summary info ============================
FAILED test_incremental.py::TestIncremental::test_two_sequential_reads[inputs0]
FAILED test_incremental.py::TestIncremental::test_read_sequential_slices[inputs0]
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
============= 2 failed, 39 passed, 1 skipped in 220.61s (0:03:40) ==============

record_value = (
pendulum.parse(record.get(self.cursor_field)).int_timestamp
if isinstance(record.get(self.cursor_field), str)
else record.get(self.cursor_field) // 1000
Collaborator

why 1000?

Collaborator Author

Refactored. Date fields are stored as milliseconds since the Unix epoch, so dividing by 1000 converts them to seconds.
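
For readers following the thread, the conversion discussed here can be sketched as below. This is a minimal illustration and not the connector's code: the function name `cursor_to_epoch_seconds` is hypothetical, and the standard library's `datetime` stands in for the `pendulum` calls shown in the diff. It assumes string cursors are ISO 8601 and integer cursors are epoch milliseconds.

```python
from datetime import datetime, timezone
from typing import Union


def cursor_to_epoch_seconds(value: Union[str, int]) -> int:
    """Normalize a cursor value to whole seconds since the Unix epoch."""
    if isinstance(value, str):
        # ISO 8601 string; older Python versions reject a trailing "Z",
        # so it is rewritten as an explicit UTC offset first.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            # Assumption: timestamps without an offset are UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return int(parsed.timestamp())
    # Integer cursors are epoch milliseconds; 1000 ms make one second.
    return value // 1000
```

Both representations of the same instant then compare equal, which is what the state comparison needs.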

@artem1205
Collaborator Author

artem1205 commented Apr 3, 2023

/test connector=connectors/source-hubspot

🕑 connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4598592727
❌ connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4598592727
🐛 https://gradle.com/s/ogfkpzus73hry

Build Failed

Test summary info:

=========================== short test summary info ============================
FAILED test_core.py::TestDiscovery::test_backward_compatibility[inputs0] - co...
FAILED test_incremental.py::TestIncremental::test_two_sequential_reads[inputs0]
FAILED test_incremental.py::TestIncremental::test_read_sequential_slices[inputs0]
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
============= 3 failed, 38 passed, 1 skipped in 215.16s (0:03:35) ==============

@artem1205
Collaborator Author

artem1205 commented Apr 4, 2023

/test connector=connectors/source-hubspot

🕑 connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4605352973
❌ connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4605352973
🐛 https://gradle.com/s/wsbz3uqkrydus

Build Failed

Test summary info:

=========================== short test summary info ============================
FAILED test_core.py::TestDiscovery::test_backward_compatibility[inputs0] - co...
FAILED test_core.py::TestBasicRead::test_read[inputs0] - Failed: Stream conta...
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
============= 2 failed, 39 passed, 1 skipped in 337.25s (0:05:37) ==============

…9265

# Conflicts:
#	airbyte-integrations/connectors/source-hubspot/Dockerfile
#	airbyte-integrations/connectors/source-hubspot/unit_tests/test_streams.py
#	docs/integrations/sources/hubspot.md
@artem1205
Collaborator Author

artem1205 commented Apr 4, 2023

/test connector=connectors/source-hubspot

🕑 connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4605920544
❌ connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4605920544
🐛 https://gradle.com/s/tn22tny3lqnxa

Build Failed

Test summary info:

=========================== short test summary info ============================
FAILED test_core.py::TestBasicRead::test_read[inputs0] - Failed: Stream conta...
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:587: Backward compatibility tests are disabled for version 0.4.0.
============= 1 failed, 39 passed, 2 skipped in 342.96s (0:05:42) ==============

@artem1205
Collaborator Author

artem1205 commented Apr 4, 2023

/test connector=connectors/source-hubspot

🕑 connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4606734075
✅ connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4606734075
Python tests coverage:

Name                          Stmts   Miss  Cover
-------------------------------------------------
source_hubspot/errors.py          6      0   100%
source_hubspot/constants.py       2      0   100%
source_hubspot/__init__.py        2      0   100%
source_hubspot/helpers.py        70      3    96%
source_hubspot/streams.py       868     77    91%
source_hubspot/source.py         60     19    68%
-------------------------------------------------
TOTAL                          1008     99    90%
Name                          Stmts   Miss  Cover
-------------------------------------------------
source_hubspot/errors.py          6      0   100%
source_hubspot/constants.py       2      0   100%
source_hubspot/__init__.py        2      0   100%
source_hubspot/helpers.py        70     10    86%
source_hubspot/source.py         60     14    77%
source_hubspot/streams.py       868    231    73%
-------------------------------------------------
TOTAL                          1008    255    75%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:587: Backward compatibility tests are disabled for version 0.4.0.
================== 40 passed, 2 skipped in 356.54s (0:05:56) ===================

@artem1205 artem1205 requested a review from lazebnyi April 4, 2023 10:47
@artem1205 artem1205 requested a review from a team April 4, 2023 16:42
@artem1205
Collaborator Author

artem1205 commented Apr 4, 2023

/test connector=connectors/source-hubspot

🕑 connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4610872693
❌ connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4610872693
🐛 https://gradle.com/s/q32pyj3sk7yke

Build Failed

Test summary info:

=========================== short test summary info ============================
FAILED test_core.py::TestBasicRead::test_read[inputs0] - Failed: Stream engag...
FAILED test_incremental.py::TestIncremental::test_state_with_abnormally_large_values[inputs0]
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
============= 2 failed, 39 passed, 1 skipped in 393.62s (0:06:33) ==============

],
)
-def test_streams_read(stream, endpoint, requests_mock, common_params, fake_properties_list):
+def test_streams_read(stream, endpoint, cursor_value, requests_mock, common_params, fake_properties_list):
Contributor

Does this test that the filtering works properly at the SemiIncrementalStream level? I think not, right?

Collaborator Author

This is a test fix for full refresh, since the cursor value is not the same for all streams.

@@ -664,6 +666,57 @@ def _flat_associations(self, records: Iterable[MutableMapping]) -> Iterable[Muta
yield record


class SemiIncrementalStream(Stream, IncrementalMixin):
Contributor

nit: a better name for this pattern is ClientSideIncrementalStream

Collaborator Author

renamed

Contributor

Please add a note to this doc that there are two kinds of incremental sync: server-side incremental sync (i.e., the HubSpot API returns only the data updated or generated since the last sync) and client-side incremental sync (the HubSpot API returns all the data or doesn't allow filtering, and the connector emits only the records that were updated). Also update the Supported Streams list to indicate which sync types each stream supports.

Collaborator Author

docs updated
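
To illustrate the client-side variant described above, here is a rough sketch rather than the connector's actual implementation. The function `client_side_incremental_read`, the `updatedAt` field, and the strict "newer than state" comparison are all assumptions made for the example; the cursor values are plain integers to keep it self-contained.

```python
from typing import Any, List, Mapping, Optional, Tuple


def client_side_incremental_read(
    records: List[Mapping[str, Any]],
    cursor_field: str,
    state_value: Optional[int] = None,
) -> Tuple[List[Mapping[str, Any]], Optional[int]]:
    """Filter a full API response down to records newer than the saved state.

    The API is assumed to return everything on each call, so the filtering
    happens here, on the connector side.
    """
    emitted = []
    new_state = state_value
    for record in records:
        value = record[cursor_field]
        # Skip records already covered by the previous sync.
        if state_value is not None and value <= state_value:
            continue
        emitted.append(record)
        # Track the largest cursor seen so the next sync can start from it.
        if new_state is None or value > new_state:
            new_state = value
    return emitted, new_state
```

With server-side incremental sync the same state value would instead be sent to the API as a filter parameter, and no records would be discarded locally.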

record_value, datetime_format = (
(pendulum.parse(record.get(self.cursor_field)), "YYYY-MM-DDTHH:mm:ss.SSSSSSZ")
if isinstance(record.get(self.cursor_field), str)
else (pendulum.from_format(str(record.get(self.cursor_field)), "x"), "x")
Contributor

This is a smaller comment, but FWIW this should be done using inversion of control, i.e., the superclass should not know what the subclasses do. Instead, the subclasses should have a way to provide the datetime object given their knowledge of their own date format.

@artem1205
Copy link
Collaborator Author

artem1205 commented Apr 5, 2023

/test connector=connectors/source-hubspot

🕑 connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4618436154
✅ connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4618436154
Python tests coverage:

Name                          Stmts   Miss  Cover
-------------------------------------------------
source_hubspot/errors.py          6      0   100%
source_hubspot/constants.py       2      0   100%
source_hubspot/__init__.py        2      0   100%
source_hubspot/helpers.py        70      3    96%
source_hubspot/streams.py       881     77    91%
source_hubspot/source.py         60     19    68%
-------------------------------------------------
TOTAL                          1021     99    90%
Name                          Stmts   Miss  Cover
-------------------------------------------------
source_hubspot/errors.py          6      0   100%
source_hubspot/constants.py       2      0   100%
source_hubspot/__init__.py        2      0   100%
source_hubspot/helpers.py        70     10    86%
source_hubspot/source.py         60     14    77%
source_hubspot/streams.py       881    231    74%
-------------------------------------------------
TOTAL                          1021    255    75%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
================== 41 passed, 1 skipped in 355.74s (0:05:55) ===================

@artem1205
Collaborator Author

@sherifnada, agreed on the inversion of control principle. I refactored and added an @abstractmethod for cursor_field_datetime_format.
A unit test for ClientSideIncrementalStream was also added.
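
The shape of that refactoring can be sketched roughly as follows. This is an assumption-laden illustration, not the merged code: the class names and the abstract `parse_cursor` method are hypothetical (the PR itself exposes an abstract `cursor_field_datetime_format`), and the standard library replaces `pendulum` so the example stands alone.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timezone
from typing import Any, Mapping


class BaseClientSideStream(ABC):
    """Base class: compares cursors but knows nothing about their format."""

    cursor_field = "updatedAt"  # hypothetical field name

    @abstractmethod
    def parse_cursor(self, raw: Any) -> datetime:
        """Each subclass turns its own raw cursor value into a datetime."""

    def is_newer(self, record: Mapping[str, Any], state: datetime) -> bool:
        return self.parse_cursor(record[self.cursor_field]) > state


class EpochMillisStream(BaseClientSideStream):
    def parse_cursor(self, raw: Any) -> datetime:
        # This stream's cursor is epoch milliseconds.
        return datetime.fromtimestamp(int(raw) / 1000, tz=timezone.utc)


class IsoStringStream(BaseClientSideStream):
    def parse_cursor(self, raw: Any) -> datetime:
        # This stream's cursor is an ISO 8601 string; assume UTC if no offset.
        parsed = datetime.fromisoformat(str(raw).replace("Z", "+00:00"))
        return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
```

The base class no longer branches on the value's type, and adding a stream with a new date format only requires a new subclass.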

@artem1205
Collaborator Author

artem1205 commented Apr 6, 2023

/test connector=connectors/source-hubspot

🕑 connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4630526828
✅ connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/4630526828
Python tests coverage:

Name                          Stmts   Miss  Cover
-------------------------------------------------
source_hubspot/errors.py          6      0   100%
source_hubspot/constants.py       2      0   100%
source_hubspot/__init__.py        2      0   100%
source_hubspot/helpers.py        70      3    96%
source_hubspot/streams.py       881     77    91%
source_hubspot/source.py         60     19    68%
-------------------------------------------------
TOTAL                          1021     99    90%
Name                          Stmts   Miss  Cover
-------------------------------------------------
source_hubspot/errors.py          6      0   100%
source_hubspot/constants.py       2      0   100%
source_hubspot/__init__.py        2      0   100%
source_hubspot/helpers.py        70     10    86%
source_hubspot/source.py         60     14    77%
source_hubspot/streams.py       881    229    74%
-------------------------------------------------
TOTAL                          1021    253    75%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
================== 41 passed, 1 skipped in 384.31s (0:06:24) ===================

Contributor

@erohmensing erohmensing left a comment

Reviewing from the connector ops side:

  • Changes to acceptance-test-config.yml and sample files used in it
  • Version bumping, semver, changelog

Comment on lines -163 to -166
"stream": {
"name": "form_submissions",
"json_schema": {},
"supported_sync_modes": ["full_refresh"]
Contributor

Why not include this stream in the incremental catalog example/acceptance test?

Collaborator Author

This stream is empty, whereas the incremental tests require every stream to produce at least one record.

Contributor

And we don't have empty_streams for incremental tests, got it. Thanks for the explanation!

@@ -34,5 +34,5 @@ COPY source_hubspot ./source_hubspot
ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

-LABEL io.airbyte.version=0.4.0
+LABEL io.airbyte.version=0.5.0
Contributor

Backwards compatible new features: minor release 👌🏻

@artem1205
Collaborator Author

artem1205 commented Apr 6, 2023

/publish connector=connectors/source-hubspot

🕑 Publishing the following connectors:
connectors/source-hubspot
https://github.com/airbytehq/airbyte/actions/runs/4631300905


Connector Did it publish? Were definitions generated?
connectors/source-hubspot

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

…remental-19265' into artem1205/source-hubspot-add-incremental-19265
…9265

# Conflicts:
#	airbyte-config/init/src/main/resources/seed/source_definitions.yaml
#	airbyte-config/init/src/main/resources/seed/source_specs.yaml
@artem1205 artem1205 merged commit 120cb14 into master Apr 6, 2023
24 of 25 checks passed
@artem1205 artem1205 deleted the artem1205/source-hubspot-add-incremental-19265 branch April 6, 2023 20:35
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/hubspot