Contributor

@pnilan pnilan commented Oct 17, 2025

What

Updates and removes several dependencies for compatibility with the connector-builder-mcp project.

  • Bumps jsonref to v1
  • Removes the numpy dependency
  • Reverts ddtrace to the latest stable version because of a yanked release

Summary by CodeRabbit

  • Chores
    • Removed numpy as a direct dependency; added jsonref and updated ddtrace for improved compatibility.
  • Bug Fixes
    • Improved handling of missing numeric values.
    • Enhanced Excel date/time detection to better map spreadsheet date columns to expected formats.

@github-actions github-actions bot added the chore label Oct 17, 2025
@github-actions

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@pnilan/cdk/dep-version-bumps#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch pnilan/cdk/dep-version-bumps

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

github-actions bot commented Oct 17, 2025

PyTest Results (Fast)

3 802 tests  ±0   3 790 ✅ ±0   6m 39s ⏱️ +7s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit da9f206. ± Comparison against base commit 55ea028.

♻️ This comment has been updated with latest results.

github-actions bot commented Oct 17, 2025

PyTest Results (Full)

3 805 tests  ±0   3 793 ✅ ±0   11m 18s ⏱️ +3s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit da9f206. ± Comparison against base commit 55ea028.

♻️ This comment has been updated with latest results.

@pnilan pnilan marked this pull request as ready for review October 17, 2025 17:49
Copilot AI review requested due to automatic review settings October 17, 2025 17:49
Contributor

coderabbitai bot commented Oct 17, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Removed the numpy dependency and replaced numpy-specific imports: math.nan now used instead of numpy.nan, and pandas' is_datetime64_any_dtype used instead of numpy.issubdtype. Updated public dependencies: added jsonref and relaxed/changed ddtrace version in pyproject.toml.
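
To illustrate the nan swap described in the walkthrough: math.nan is an ordinary Python float, so code that only needs a NaN sentinel can get it from the standard library. The sketch below is a minimal illustration under that assumption; `nan_to_none` is a hypothetical helper written for this example and is not the CDK's actual implementation.

```python
import math
from math import nan
from typing import Any, Dict


def nan_to_none(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: replace float NaN values with None."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }


# NaN never compares equal to itself, so test with math.isnan() rather than ==.
assert nan != nan
assert math.isnan(nan)

record = {"CITY": nan, "NAME": "Ada", "AGE": 36.0}
cleaned = nan_to_none(record)
print(cleaned)
```

Because the sentinel is a plain float, call sites that previously imported nan from numpy should not need further changes, which matches the "usage unchanged" note in the summary.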

Changes

| Cohort / File(s) | Summary |
|---|---|
| Dependency changes: pyproject.toml | Removed numpy from [tool.poetry.dependencies]; added/changed jsonref entry (now ^1); updated ddtrace version constraint to ^3. |
| Response extractor update: airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py | Replaced from numpy import nan with from math import nan (usage unchanged). |
| Excel parser dtype handling: airbyte_cdk/sources/file_based/file_types/excel_parser.py | Removed explicit NumPy dtype imports; changed ExcelParser.dtype_to_json_type parameter type to Any; replaced issubdtype(dtype, datetime64) with pd.api.types.is_datetime64_any_dtype(dtype) for datetime detection. |
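
The constraints mentioned for pyproject.toml use Poetry's caret syntax. As a rough illustration of why relaxing ddtrace from ^3.12.3 to ^3 lets a resolver pick an earlier 3.x release, the sketch below computes caret ranges by hand. `caret_bounds` and `allows` are hypothetical helpers written for this example; they handle plain numeric versions only and are not part of Poetry.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]


def caret_bounds(spec: str) -> Tuple[Version, Version]:
    """Return (inclusive lower, exclusive upper) for a caret spec like '^3.12.3'."""
    parts: List[int] = [int(p) for p in spec.lstrip("^").split(".")]
    padded = parts + [0] * (3 - len(parts))
    lower: Version = (padded[0], padded[1], padded[2])
    # The upper bound bumps the leftmost non-zero component;
    # if every given component is zero, it bumps the last one given.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper_parts = [0, 0, 0]
    upper_parts[idx] = parts[idx] + 1
    upper: Version = (upper_parts[0], upper_parts[1], upper_parts[2])
    return lower, upper


def allows(spec: str, version: Version) -> bool:
    """Check whether a version falls inside the caret range."""
    lower, upper = caret_bounds(spec)
    return lower <= version < upper


print(caret_bounds("^3"))             # ((3, 0, 0), (4, 0, 0))
print(caret_bounds("^3.12.3"))        # ((3, 12, 3), (4, 0, 0))
print(caret_bounds("^1"))             # ((1, 0, 0), (2, 0, 0))
print(allows("^3", (3, 12, 0)))       # True
print(allows("^3.12.3", (3, 12, 0)))  # False
```

Under these rules ^3 accepts any 3.x release, so the lock file rather than the constraint decides which concrete version is installed.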

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

dependencies

Suggested reviewers

  • aaronsteers
  • brianjlai
  • pedroslopez

Would you like me to also run a quick checklist of places to verify compatibility (e.g., CI matrix, any transitive deps that required numpy) or is this sufficient?

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |
| Title Check | ✅ Passed | The title "chore: update jsonref, remove numpy dep, update ddtrace" accurately reflects the main changes in this pull request. All three items mentioned in the title are present in the changeset: jsonref is updated in pyproject.toml, numpy is removed as a dependency, and ddtrace is updated from ^3.12.3 to ^3. The title is concise, specific, and avoids vague terminology. While the PR also includes code refactoring to replace numpy imports with alternatives (math.nan and pandas APIs), the primary objective centers on dependency management, which the title appropriately captures. |

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR removes the explicit numpy dependency from the project by replacing direct numpy imports with pandas and standard library alternatives, updates jsonref to v1, and reverts ddtrace to an earlier version.

  • Removed explicit numpy dependency constraint from pyproject.toml
  • Updated jsonref from ~0.2 to ^1
  • Reverted ddtrace from ^3.12.3 to ^3.12.0
  • Replaced numpy imports with pandas API calls and Python standard library

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| pyproject.toml | Updated jsonref to v1, removed numpy constraint, and reverted ddtrace to v3.12.0 |
| airbyte_cdk/sources/file_based/file_types/excel_parser.py | Replaced numpy dtype functions with pandas API equivalents |
| airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py | Replaced numpy.nan with math.nan from standard library |

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

@maxi297 maxi297 left a comment

One blocking comment. The others are nits.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
airbyte_cdk/sources/file_based/file_types/excel_parser.py (1)

154-166: Use pandas dtype predicates to correctly classify object/number/bool

The current dtype classification logic is fragile and incomplete:

  • dtype is object relies on identity checks that vary across numpy/pandas versions
  • dtype in number_types only detects int64/float64, missing int32, uint64, float32, etc.
  • dtype == "bool" uses string comparison instead of dtype introspection
  • Yet the code already uses pd.api.types.is_datetime64_any_dtype() elsewhere, creating inconsistency

Switching to pandas predicates throughout (is_object_dtype, is_bool_dtype, is_numeric_dtype) is more robust and handles edge cases better. The test suite confirms the proposed behavior remains equivalent.

-        number_types = ("int64", "float64")
         if current_type == "string":
             # Previous column values were of the string type, no need to look further.
             return current_type
-        if dtype is object:
-            return "string"
-        if dtype in number_types and (not current_type or current_type == "number"):
-            return "number"
-        if dtype == "bool" and (not current_type or current_type == "boolean"):
-            return "boolean"
+        # Prefer pandas predicates for robust dtype detection
+        if pd.api.types.is_object_dtype(dtype):
+            return "string"
+        if pd.api.types.is_bool_dtype(dtype) and (not current_type or current_type == "boolean"):
+            return "boolean"
+        if pd.api.types.is_numeric_dtype(dtype) and (not current_type or current_type == "number"):
+            return "number"
         if pd.api.types.is_datetime64_any_dtype(dtype):
             return "date-time"
         return "string"

Would you adopt this approach?
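
For readers who want to try the suggestion in isolation, here is a minimal sketch of the predicate-based classification, assuming pandas is installed. `classify_dtype` is a hypothetical standalone function that mirrors the diff above; it is not the merged ExcelParser code.

```python
from typing import Any, Optional

import pandas as pd


def classify_dtype(current_type: Optional[str], dtype: Any) -> str:
    """Hypothetical stand-in for the suggested dtype_to_json_type logic."""
    if current_type == "string":
        # Previous column values were strings, so the column stays a string.
        return current_type
    if pd.api.types.is_object_dtype(dtype):
        return "string"
    # Check bool before numeric: pandas reports bool dtypes as numeric too.
    if pd.api.types.is_bool_dtype(dtype) and (not current_type or current_type == "boolean"):
        return "boolean"
    if pd.api.types.is_numeric_dtype(dtype) and (not current_type or current_type == "number"):
        return "number"
    if pd.api.types.is_datetime64_any_dtype(dtype):
        return "date-time"
    return "string"


# Take real dtypes from small Series so no dtype strings need to be parsed.
int32 = pd.Series([1, 2], dtype="int32").dtype
float32 = pd.Series([1.5], dtype="float32").dtype
boolean = pd.Series([True, False]).dtype
text = pd.Series(["a", "b"], dtype=object).dtype
timestamps = pd.Series(pd.to_datetime(["2025-10-17"])).dtype

for name, dt in [("int32", int32), ("float32", float32), ("bool", boolean),
                 ("object", text), ("datetime", timestamps)]:
    print(name, "->", classify_dtype(None, dt))
```

The order of the bool and numeric checks matters because is_numeric_dtype also returns True for bool dtypes. One consequence that may be worth checking against the existing tests: in this sketch a bool column that follows a "number" column is classified as "number", whereas the string comparisons in the removed lines would fall through to "string" in that case.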

🧹 Nitpick comments (2)
unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py (2)

3-3: Remove unused import to keep tests tidy?

math isn’t used in this test file. Dropping it avoids linter noise, wdyt?

-import math

41-49: DRY up repetitive None assertions?

Would you prefer a compact loop to assert the same invariant across fields, keeping the test concise, wdyt?

-        assert extracted_records[0]["CITY"] is None
-        assert extracted_records[0]["STATE_PROVINCE_REGION"] is None
-        assert extracted_records[0]["POSTAL_CODE"] is None
-        assert extracted_records[0]["COUNTRY"] is None
-        assert extracted_records[0]["ALTERNATE_EMAILS"] is None
-        assert extracted_records[0]["PHONE_NUMBER"] is None
-        assert extracted_records[0]["WHATSAPP"] is None
-        assert extracted_records[0]["FACEBOOK"] is None
-        assert extracted_records[0]["UNIQUE_NAME"] is None
+        for field in [
+            "CITY",
+            "STATE_PROVINCE_REGION",
+            "POSTAL_CODE",
+            "COUNTRY",
+            "ALTERNATE_EMAILS",
+            "PHONE_NUMBER",
+            "WHATSAPP",
+            "FACEBOOK",
+            "UNIQUE_NAME",
+        ]:
+            assert extracted_records[0][field] is None
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abc0184 and b592c5d.

⛔ Files ignored due to path filters (2)
  • unit_tests/sources/declarative/extractors/decompressed_response.csv is excluded by !**/*.csv
  • unit_tests/sources/declarative/extractors/nan_response.csv is excluded by !**/*.csv
📒 Files selected for processing (2)
  • airbyte_cdk/sources/file_based/file_types/excel_parser.py (2 hunks)
  • unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py (1)
airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py (1)
  • extract_records (155-176)
🪛 GitHub Actions: Generate Docs
airbyte_cdk/sources/file_based/file_types/excel_parser.py

[error] 142-142: AttributeError: module 'pandas' has no attribute 'ExtensionDtype' while defining ExcelParser.dtype. This occurred during docs generation step (poetry run poe docs-generate). Possible pandas version incompatibility.

🪛 GitHub Actions: Linters
airbyte_cdk/sources/file_based/file_types/excel_parser.py

[error] 142-142: mypy: Name "pd.ExtensionDtype" is not defined [name-defined]. Ensure pandas is imported as 'pd' and that 'ExtensionDtype' is accessible, or adjust the type usage.

🪛 GitHub Actions: Pytest (Fast)
airbyte_cdk/sources/file_based/file_types/excel_parser.py

[error] 142-142: AttributeError: module 'pandas' has no attribute 'ExtensionDtype'. This may be due to an incompatibility with the installed pandas version.


[error] Excel parser initialization failed due to the AttributeError in pandas usage. See test failure in unit_tests/sources/file_based/test_file_based_scenarios.py.

🪛 GitHub Actions: PyTest Matrix
airbyte_cdk/sources/file_based/file_types/excel_parser.py

[error] 142-142: AttributeError: module 'pandas' has no attribute 'ExtensionDtype'.
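
The AttributeError reported above indicates that ExtensionDtype is not exported from the top-level pandas namespace in the installed version. For reference, a minimal sketch of where the class can be imported from, assuming pandas is installed; this only illustrates the import path, and the Changes summary indicates the PR ultimately typed the parameter as Any instead.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype

# A NumPy-backed column: its dtype is a numpy.dtype, not an ExtensionDtype.
numpy_backed = pd.Series([1, 2]).dtype

# A nullable-integer column: its dtype is a pandas extension dtype.
extension_backed = pd.Series([1, None], dtype="Int64").dtype

print(isinstance(numpy_backed, ExtensionDtype))
print(isinstance(extension_backed, ExtensionDtype))
```

An annotation of ExtensionDtype alone would therefore not cover NumPy-backed columns, which is one plausible reason to fall back to Any for a parameter that receives either kind.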

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-google-drive
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-shopify
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Analyze (python)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b592c5d and 1ce8768.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/file_based/file_types/excel_parser.py (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
airbyte_cdk/sources/file_based/file_types/excel_parser.py

[error] 142-142: mypy failed: Function 'pandas.core.frame.DataFrame.dtypes' is not valid as a type. Perhaps you need 'Callable[...]' or a callback protocol? (during step 'poetry run mypy --config-file mypy.ini airbyte_cdk')

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-google-drive
  • GitHub Check: Check: source-intercom
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte_cdk/sources/file_based/file_types/excel_parser.py (1)

164-164: Nice migration to pandas API!

The switch from numpy's issubdtype to pd.api.types.is_datetime64_any_dtype() is the correct pandas-native approach and aligns perfectly with removing the numpy dependency.

@pnilan pnilan force-pushed the pnilan/cdk/dep-version-bumps branch from 1ce8768 to c11d7cd Compare October 17, 2025 21:00
@pnilan pnilan changed the title from "chore: bump jsonref, remove numpy dep, revert ddtrace to unyanked ver" to "chore: update jsonref, remove numpy dep, update ddtrace" Oct 17, 2025
Contributor

@aaronsteers aaronsteers left a comment

Looks good!

@pnilan pnilan merged commit 041c201 into main Oct 17, 2025
29 of 31 checks passed
@pnilan pnilan deleted the pnilan/cdk/dep-version-bumps branch October 17, 2025 21:54