fix: hubspot property chunking #797
📝 Walkthrough
Always iterate request-property chunks when
Changes
Sequence Diagram(s)
sequenceDiagram
  autonumber
  participant C as Caller
  participant SR as SimpleRetriever
  participant AQP as AdditionalQueryProperties
  participant PG as Paginator
  participant PT as PaginationTracker
  C->>SR: read()
  SR->>SR: init stream_slice
  alt additional_query_properties present
    SR->>AQP: get_request_property_chunks(stream_slice)
    AQP-->>SR: chunks
    loop for each chunk
      SR->>SR: extend stream_slice with chunk
      SR->>PG: next_page(stream_slice)
      PG-->>SR: page (records)
      loop for each record
        alt property_chunking truthy
          SR->>SR: compute merge_key
          alt merge_key present
            SR->>PT: observe merged record
            SR-->>C: yield merged record
          else merge_key missing
            SR->>PT: observe record
            SR-->>C: yield raw record
          end
        else property_chunking falsy
          SR->>PT: observe record
          SR-->>C: yield raw record
        end
      end
    end
  else
    SR->>PG: next_page(stream_slice)
    PG-->>SR: page (records)
    SR->>PT: observe records
    SR-->>C: yield records
  end
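
A minimal Python sketch of the control flow in the diagram above may help when reading it. This is not the CDK implementation: `read_records`, `fetch_page`, and `merge_key_of` are hypothetical names, a shallow `dict.update` stands in for the retriever's deep merge, and emitting merged records only after all chunks have been read is an assumption.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, Any]


def read_records(
    fetch_page: Callable[[List[str]], Iterable[Record]],
    property_chunks: Optional[List[List[str]]],
    merge_key_of: Optional[Callable[[Record], Optional[str]]] = None,
) -> Iterator[Record]:
    """Hypothetical stand-in for the retriever loop shown in the diagram."""
    if property_chunks is None:
        # No additional query properties: a single request path.
        yield from fetch_page([])
        return

    merged: Dict[str, Record] = {}
    for chunk in property_chunks:
        # One request per property chunk, even when no merging is configured.
        for record in fetch_page(chunk):
            key = merge_key_of(record) if merge_key_of else None
            if key is not None:
                # Shallow merge for illustration only (assumption).
                merged.setdefault(key, {}).update(record)
            else:
                # Records without a merge key are still emitted.
                yield record
    yield from merged.values()
```

With a merge-key function, records sharing a key across chunks collapse into one record; without one, each chunk's records are emitted as they arrive.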
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Would you like a short unit-test checklist covering records-without-merge-key and chunked vs non-chunked paths, wdyt?
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
👋 Greetings, Airbyte Team Member!
Here are some helpful tips and reminders for your convenience.
Testing This CDK Version
You can test this version of the CDK using the following:
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/fix_hubspot_property_chunking#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/fix_hubspot_property_chunking
Helpful Resources
PR Slash Commands
Airbyte Maintainers can execute the following slash commands on your PR:
- /autofix
Actionable comments posted: 1
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
409-419: Consider extracting the duplicate record emission logic, wdyt?
Lines 409-414 and 416-419 perform identical operations (observe, increment, set last_record, yield). Could these be unified to reduce duplication?
For example, you could extract this into a helper:
    def _emit_record(self, record: Record, pagination_tracker: PaginationTracker) -> Tuple[int, Record]:
        """Emit a record and update tracking state."""
        pagination_tracker.observe(record)
        return 1, record  # page_size_increment, last_record
Then use it in both places:
         if merge_key:
             _deep_merge(merged_records[merge_key], current_record)
         else:
    -        # We should still emit records even if the record did not have a merge key
    -        pagination_tracker.observe(current_record)
    -        last_page_size += 1
    -        last_record = current_record
    -        yield current_record
    +        page_size_increment, last_record = self._emit_record(current_record, pagination_tracker)
    +        last_page_size += page_size_increment
    +        yield current_record
     else:
    -    pagination_tracker.observe(current_record)
    -    last_page_size += 1
    -    last_record = current_record
    -    yield current_record
    +    page_size_increment, last_record = self._emit_record(current_record, pagination_tracker)
    +    last_page_size += page_size_increment
    +    yield current_record
This is just a suggestion to improve maintainability - the current implementation is functionally correct!
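
To see the suggested shape in isolation, here is a self-contained sketch under stated assumptions: `CountingTracker` and `PageState` are hypothetical stand-ins (the real `PaginationTracker` does more than count), and `emit_record` is written as a free function rather than a retriever method. It only illustrates that the helper can own the tracking while the `yield` stays in the calling generator.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple

Record = Dict[str, Any]


class CountingTracker:
    """Hypothetical stand-in for PaginationTracker; it only counts observations."""

    def __init__(self) -> None:
        self.observed = 0

    def observe(self, record: Record) -> None:
        self.observed += 1


@dataclass
class PageState:
    """Hypothetical holder for the per-page bookkeeping variables."""

    last_page_size: int = 0
    last_record: Optional[Record] = None


def emit_record(record: Record, tracker: CountingTracker) -> Tuple[int, Record]:
    """Observe the record and return (page_size_increment, last_record)."""
    tracker.observe(record)
    return 1, record


def emit_all(
    records: Iterable[Record], tracker: CountingTracker, state: PageState
) -> Iterator[Record]:
    for current in records:
        # Tracking is delegated to the helper; yielding stays in the generator.
        increment, state.last_record = emit_record(current, tracker)
        state.last_page_size += increment
        yield current
```

The helper cannot `yield` on behalf of the caller without becoming a generator itself, which is why the suggestion returns the bookkeeping values instead.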
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (2)
airbyte_cdk/sources/declarative/requesters/query_properties/property_chunking.py (1)
- get_merge_key (70-71)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1)
- observe (510-543)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
[error] 399-406: ruff format --diff detected formatting changes needed in this file. Exit code 1 after showing the diff; please run the formatter to fix code style issues.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Check: destination-motherduck
 - GitHub Check: Check: source-intercom
 - GitHub Check: Check: source-pokeapi
 - GitHub Check: Check: source-shopify
 - GitHub Check: Check: source-hardcoded-records
 - GitHub Check: SDM Docker Image Build
 - GitHub Check: Manifest Server Docker Image Build
 - GitHub Check: Pytest (Fast)
 - GitHub Check: Pytest (All, Python 3.11, Ubuntu)
 - GitHub Check: Pytest (All, Python 3.12, Ubuntu)
 - GitHub Check: Pytest (All, Python 3.13, Ubuntu)
 - GitHub Check: Pytest (All, Python 3.10, Ubuntu)
 
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (2)
387-387: LGTM! The condition simplification makes sense.
Checking only for additional_query_properties allows property chunking to occur even when property_chunking is not defined, which aligns with the PR's intent to fix the chunking behavior.
401-414: No additional tests needed – existing test test_simple_retriever_still_emit_records_if_no_merge_key covers this edge case.
Actionable comments posted: 0
🧹 Nitpick comments (3)
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (3)
1121-1121: Add a docstring to explain the test's specific purpose?
This test covers an important edge case (additional_query_properties without property_chunking + pagination). A docstring would help future maintainers quickly understand what scenario this test validates, wdyt?
1174-1176: Consider setting get_initial_token return value for test robustness?
Other pagination tests (e.g., lines 662, 1548) explicitly set paginator.get_initial_token.return_value. While the test might work without it due to Mock's behavior, explicitly setting it to None would make the test more explicit and prevent potential issues if the retriever calls this method, wdyt?
Apply this diff:
     paginator = _mock_paginator()
    +paginator.get_initial_token.return_value = None
     paginator.next_page_token.side_effect = [{"next_page_token": 1}, None]
1194-1194: Strengthen the extra_fields assertion to verify content?
The current assertion only checks that extra_fields is truthy. Consider verifying:
- The actual content of extra_fields (should contain {"query_properties": ...})
- Both send_request calls have extra_fields populated (both pages in the same property chunk)
This would make the test more thorough and catch potential regressions, wdyt?
Apply this diff:
    -assert requester.send_request.call_args_list[0].kwargs["stream_slice"].extra_fields
    +# Verify both pages have extra_fields populated (same property chunk)
    +first_call_extra_fields = requester.send_request.call_args_list[0].kwargs["stream_slice"].extra_fields
    +second_call_extra_fields = requester.send_request.call_args_list[1].kwargs["stream_slice"].extra_fields
    +assert first_call_extra_fields.get("query_properties") == ["first_name", "last_name", "nonary", "bracelet"]
    +assert second_call_extra_fields.get("query_properties") == ["first_name", "last_name", "nonary", "bracelet"]
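
The two mock-related nitpicks above rest on standard `unittest.mock` behavior, which the following standalone sketch demonstrates. It uses plain `Mock` objects and a `SimpleNamespace` in place of the test file's `_mock_paginator()` helper and the CDK's `StreamSlice`, so those substitutions are assumptions made only to keep the example self-contained.

```python
from types import SimpleNamespace
from unittest.mock import Mock

paginator = Mock()

# Without an explicit return_value, calling a Mock attribute returns another
# Mock, which is truthy and could be mistaken for a real token.
implicit_token = paginator.get_initial_token()
assert isinstance(implicit_token, Mock)
assert bool(implicit_token)

# Setting return_value makes the intent explicit.
paginator.get_initial_token.return_value = None
assert paginator.get_initial_token() is None

# side_effect with a list returns one item per call, in order.
paginator.next_page_token.side_effect = [{"next_page_token": 1}, None]
assert paginator.next_page_token() == {"next_page_token": 1}
assert paginator.next_page_token() is None

# Checking the content of every recorded call is stricter than a truthiness
# check on the first call only.
requester = Mock()
props = ["first_name", "last_name", "nonary", "bracelet"]
stream_slice = SimpleNamespace(extra_fields={"query_properties": props})
requester.send_request(stream_slice=stream_slice)
requester.send_request(stream_slice=stream_slice)

calls = requester.send_request.call_args_list
assert len(calls) == 2
for call in calls:
    assert call.kwargs["stream_slice"].extra_fields.get("query_properties") == props
```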
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
unit_tests/sources/declarative/retrievers/test_simple_retriever.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (4)
airbyte_cdk/sources/types.py (7)
- Record (21-72)
- data (35-36)
- associated_slice (39-40)
- StreamSlice (75-169)
- cursor_slice (107-112)
- partition (99-104)
- extra_fields (115-117)
airbyte_cdk/sources/declarative/requesters/requester.py (1)
- send_request (138-156)
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)
- QueryProperties (14-48)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (2)
- name (118-126)
- name (129-131)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Check: source-intercom
 - GitHub Check: Check: destination-motherduck
 - GitHub Check: Check: source-shopify
 - GitHub Check: Check: source-pokeapi
 - GitHub Check: Check: source-hardcoded-records
 - GitHub Check: Pytest (All, Python 3.13, Ubuntu)
 - GitHub Check: Pytest (All, Python 3.12, Ubuntu)
 - GitHub Check: Pytest (All, Python 3.11, Ubuntu)
 - GitHub Check: Pytest (All, Python 3.10, Ubuntu)
 - GitHub Check: Pytest (Fast)
 - GitHub Check: Manifest Server Docker Image Build
 - GitHub Check: SDM Docker Image Build
 - GitHub Check: Analyze (python)
 
📦 lgtm
… for missing custom properties during incremental syncs (#68159)
## What
We had a bug where, during incremental syncs, HubSpot CRM Search streams did not include the custom properties in the JSON body of the POST request, so they were not received and emitted with records.
## How
The bug was in the CDK and was fixed in version `7.3.7` in airbytehq/airbyte-python-cdk#797.
We need to bump the version of SDM to get the fix and, in addition, upgrade the unit_test `pyproject.toml`, which is still on v6. I've also added a new test that validates that properties are indeed populated in the outbound request. With the bump from v6 to v7, I also fixed the tests that changed as a result.
**Note**: It does feel like we have something of a gap where our unit tests don't properly test CDK changes since the two are independently versioned... This is something we may want to investigate and solve so these types of things don't happen again.
## Can this PR be safely reverted and rolled back?
- [ ] YES 💚
- [ ] NO ❌
Kind of... If we do this wrong then we have to reset customers back to their previous state, but this is no different than the state we were previously in.
What
Follow up to https://airbytehq-team.slack.com/archives/C02U9R3AF37/p1760561662456189
Summary by CodeRabbit