chore: update jsonref, remove numpy dep, update ddtrace #800

pnilan · 2025-10-17T17:27:35Z

What

Updates/removes various dependencies for compatibility with connector-builder-mcp project.

Unpins jsonref on v1
Removes numpy dependency
Reverts ddtrace to latest stable due to yanked version

Summary by CodeRabbit

Chores
- Removed numpy as a direct dependency; added jsonref and updated ddtrace for improved compatibility.
Bug Fixes
- Improved handling of missing numeric values.
- Enhanced Excel date/time detection to better map spreadsheet date columns to expected formats.

github-actions · 2025-10-17T17:27:52Z

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@pnilan/cdk/dep-version-bumps#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch pnilan/cdk/dep-version-bumps

Helpful Resources

CDK API Reference

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

/autofix - Fixes most formatting and linting issues
/poetry-lock - Updates poetry.lock file
/test - Runs connector tests with the updated CDK
/poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
/poe <command> - Runs any poe command in the CDK environment

github-actions · 2025-10-17T17:35:36Z

PyTest Results (Fast)

3 802 tests ±0 3 790 ✅ ±0 6m 39s ⏱️ +7s
1 suites ±0 12 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit da9f206. ± Comparison against base commit 55ea028.

♻️ This comment has been updated with latest results.

github-actions · 2025-10-17T17:40:27Z

PyTest Results (Full)

3 805 tests ±0 3 793 ✅ ±0 11m 18s ⏱️ +3s
1 suites ±0 12 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit da9f206. ± Comparison against base commit 55ea028.

♻️ This comment has been updated with latest results.

coderabbitai · 2025-10-17T17:49:38Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Removed the numpy dependency and replaced numpy-specific imports: math.nan now used instead of numpy.nan, and pandas' is_datetime64_any_dtype used instead of numpy.issubdtype. Updated public dependencies: added jsonref and relaxed/changed ddtrace version in pyproject.toml.
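As a rough illustration of why swapping the import is low risk: `numpy.nan` is an ordinary Python float NaN, and `math.nan` is the same kind of value from the standard library. The sketch below uses a hypothetical helper, `nan_to_none`, which is an assumption for illustration and not the CDK's code, to show NaN detection and replacement with `None` using only the standard library.

```python
import math
from typing import Any, Dict


def nan_to_none(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with float NaN values replaced by None."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }


# NaN never compares equal to itself, so use math.isnan rather than ==.
row = {"CITY": math.nan, "AGE": 42.0, "NAME": "Ada"}
cleaned = nan_to_none(row)
print(cleaned)
```

Because NaN is unequal to every value, including itself, detection goes through `math.isnan` instead of an equality check.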

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Dependency changes: `pyproject.toml` | Removed `numpy` from `[tool.poetry.dependencies]`; added/changed `jsonref` entry (now ^1); updated `ddtrace` version constraint to `^3`. |
| Response extractor update: `airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py` | Replaced `from numpy import nan` with `from math import nan` (usage unchanged). |
| Excel parser dtype handling: `airbyte_cdk/sources/file_based/file_types/excel_parser.py` | Removed explicit NumPy dtype imports; changed `ExcelParser.dtype_to_json_type` parameter type to `Any`; replaced `issubdtype(dtype, datetime64)` with `pd.api.types.is_datetime64_any_dtype(dtype)` for datetime detection. |
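For readers less familiar with caret constraints, the sketch below expands a constraint such as `^1` or `^3` into explicit bounds, following Poetry's documented caret semantics as I understand them. The helper `caret_bounds` is hypothetical, handles only plain numeric versions with up to three components, and is not Poetry's implementation.

```python
from typing import List, Tuple


def caret_bounds(constraint: str) -> Tuple[str, str]:
    """Expand a caret constraint such as "^3.12.3" into (lower, upper) bounds."""
    parts: List[int] = [int(piece) for piece in constraint.lstrip("^").split(".")]
    padded = parts + [0] * (3 - len(parts))
    # The upper bound increments the leftmost non-zero component;
    # if every given component is zero, the last given one is incremented.
    bump = next((i for i, value in enumerate(parts) if value != 0), len(parts) - 1)
    upper = padded[:bump] + [padded[bump] + 1] + [0] * (2 - bump)
    return (
        ">=" + ".".join(str(n) for n in padded),
        "<" + ".".join(str(n) for n in upper),
    )


for spec in ("^1", "^3", "^3.12.3"):
    print(spec, caret_bounds(spec))
```

Under these rules `^3` and `^3.12.3` share the same upper bound and differ only in the lower bound, so widening to `^3` lets the resolver choose any 3.x release.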

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

dependencies

Suggested reviewers

aaronsteers
brianjlai
pedroslopez

Would you like me to also run a quick checklist of places to verify compatibility (e.g., CI matrix, any transitive deps that required numpy) or is this sufficient?

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |
| Title Check | ✅ Passed | The title "chore: update jsonref, remove numpy dep, update ddtrace" accurately reflects the main changes in this pull request. All three items mentioned in the title are present in the changeset: jsonref is updated in pyproject.toml, numpy is removed as a dependency, and ddtrace is updated from ^3.12.3 to ^3. The title is concise, specific, and avoids vague terminology. While the PR also includes code refactoring to replace numpy imports with alternatives (math.nan and pandas APIs), the primary objective centers on dependency management, which the title appropriately captures. |

Copilot

Pull Request Overview

This PR removes the explicit numpy dependency from the project by replacing direct numpy imports with pandas and standard library alternatives, updates jsonref to v1, and reverts ddtrace to an earlier version.

Removed explicit numpy dependency constraint from pyproject.toml
Updated jsonref from ~0.2 to ^1
Reverted ddtrace from ^3.12.3 to ^3.12.0
Replaced numpy imports with pandas API calls and Python standard library

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pyproject.toml | Updated jsonref to v1, removed numpy constraint, and reverted ddtrace to v3.12.0 |
| airbyte_cdk/sources/file_based/file_types/excel_parser.py | Replaced numpy dtype functions with pandas API equivalents |
| airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py | Replaced numpy.nan with math.nan from standard library |

airbyte_cdk/sources/file_based/file_types/excel_parser.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

maxi297

One blocking comment. The others are nits.

pyproject.toml

airbyte_cdk/sources/file_based/file_types/excel_parser.py

airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

airbyte_cdk/sources/file_based/file_types/excel_parser.py (1)
154-166: Use pandas dtype predicates to correctly classify object/number/bool

The current dtype classification logic is fragile and incomplete:

- `dtype is object` relies on identity checks that vary across numpy/pandas versions
- `dtype in number_types` only detects int64/float64, missing int32, uint64, float32, etc.
- `dtype == "bool"` uses string comparison instead of dtype introspection
- Yet the code already uses `pd.api.types.is_datetime64_any_dtype()` elsewhere, creating inconsistency

Switching to pandas predicates throughout (is_object_dtype, is_bool_dtype, is_numeric_dtype) is more robust and handles edge cases better. The test suite confirms the proposed behavior remains equivalent.
-        number_types = ("int64", "float64")
         if current_type == "string":
             # Previous column values were of the string type, no need to look further.
             return current_type
-        if dtype is object:
-            return "string"
-        if dtype in number_types and (not current_type or current_type == "number"):
-            return "number"
-        if dtype == "bool" and (not current_type or current_type == "boolean"):
-            return "boolean"
+        # Prefer pandas predicates for robust dtype detection
+        if pd.api.types.is_object_dtype(dtype):
+            return "string"
+        if pd.api.types.is_bool_dtype(dtype) and (not current_type or current_type == "boolean"):
+            return "boolean"
+        if pd.api.types.is_numeric_dtype(dtype) and (not current_type or current_type == "number"):
+            return "number"
         if pd.api.types.is_datetime64_any_dtype(dtype):
             return "date-time"
         return "string"
Would you adopt this approach?

🧹 Nitpick comments (2)

unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py (2)

3-3: Remove unused import to keep tests tidy?

math isn’t used in this test file. Dropping it avoids linter noise, wdyt?

-import math

41-49: DRY up repetitive None assertions?

Would you prefer a compact loop to assert the same invariant across fields, keeping the test concise, wdyt?

-        assert extracted_records[0]["CITY"] is None
-        assert extracted_records[0]["STATE_PROVINCE_REGION"] is None
-        assert extracted_records[0]["POSTAL_CODE"] is None
-        assert extracted_records[0]["COUNTRY"] is None
-        assert extracted_records[0]["ALTERNATE_EMAILS"] is None
-        assert extracted_records[0]["PHONE_NUMBER"] is None
-        assert extracted_records[0]["WHATSAPP"] is None
-        assert extracted_records[0]["FACEBOOK"] is None
-        assert extracted_records[0]["UNIQUE_NAME"] is None
+        for field in [
+            "CITY",
+            "STATE_PROVINCE_REGION",
+            "POSTAL_CODE",
+            "COUNTRY",
+            "ALTERNATE_EMAILS",
+            "PHONE_NUMBER",
+            "WHATSAPP",
+            "FACEBOOK",
+            "UNIQUE_NAME",
+        ]:
+            assert extracted_records[0][field] is None

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abc0184 and b592c5d.

⛔ Files ignored due to path filters (2)

unit_tests/sources/declarative/extractors/decompressed_response.csv is excluded by !**/*.csv
unit_tests/sources/declarative/extractors/nan_response.csv is excluded by !**/*.csv

📒 Files selected for processing (2)

airbyte_cdk/sources/file_based/file_types/excel_parser.py (2 hunks)
unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py (1)

airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py (1)

extract_records (155-176)

🪛 GitHub Actions: Generate Docs

airbyte_cdk/sources/file_based/file_types/excel_parser.py

[error] 142-142: AttributeError: module 'pandas' has no attribute 'ExtensionDtype' while defining ExcelParser.dtype. This occurred during docs generation step (poetry run poe docs-generate). Possible pandas version incompatibility.

🪛 GitHub Actions: Linters

airbyte_cdk/sources/file_based/file_types/excel_parser.py

[error] 142-142: mypy: Name "pd.ExtensionDtype" is not defined [name-defined]. Ensure pandas is imported as 'pd' and that 'ExtensionDtype' is accessible, or adjust the type usage.

🪛 GitHub Actions: Pytest (Fast)

airbyte_cdk/sources/file_based/file_types/excel_parser.py

[error] 142-142: AttributeError: module 'pandas' has no attribute 'ExtensionDtype'. This may be due to an incompatibility with the installed pandas version.

[error] Excel parser initialization failed due to the AttributeError in pandas usage. See test failure in unit_tests/sources/file_based/test_file_based_scenarios.py.

🪛 GitHub Actions: PyTest Matrix

airbyte_cdk/sources/file_based/file_types/excel_parser.py

[error] 142-142: AttributeError: module 'pandas' has no attribute 'ExtensionDtype'.
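The errors above come from referencing `pd.ExtensionDtype` at the top level of the pandas namespace. To my knowledge the class is exposed publicly under `pandas.api.extensions` instead. The sketch below assumes pandas is installed and uses a hypothetical helper, `is_extension_dtype`; it is not the fix that was ultimately committed in this PR.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def is_extension_dtype(dtype: object) -> bool:
    """Report whether a dtype is a pandas extension dtype rather than a numpy dtype."""
    return isinstance(dtype, ExtensionDtype)


# "Int64" (capital I) is the pandas nullable integer extension dtype;
# "int64" is the plain numpy-backed dtype.
nullable_int = pd.Series([1, None], dtype="Int64").dtype
plain_int = pd.Series([1, 2], dtype="int64").dtype
print(is_extension_dtype(nullable_int), is_extension_dtype(plain_int))
```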

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: Check: source-pokeapi
GitHub Check: Check: source-google-drive
GitHub Check: Check: source-intercom
GitHub Check: Check: destination-motherduck
GitHub Check: Check: source-hardcoded-records
GitHub Check: Check: source-shopify
GitHub Check: Manifest Server Docker Image Build
GitHub Check: SDM Docker Image Build
GitHub Check: Analyze (python)

airbyte_cdk/sources/file_based/file_types/excel_parser.py

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b592c5d and 1ce8768.

📒 Files selected for processing (1)

airbyte_cdk/sources/file_based/file_types/excel_parser.py (2 hunks)

🧰 Additional context used

🪛 GitHub Actions: Linters

airbyte_cdk/sources/file_based/file_types/excel_parser.py

[error] 142-142: mypy failed: Function 'pandas.core.frame.DataFrame.dtypes' is not valid as a type. Perhaps you need 'Callable[...]' or a callback protocol? (during step 'poetry run mypy --config-file mypy.ini airbyte_cdk')

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: Check: source-shopify
GitHub Check: Check: destination-motherduck
GitHub Check: Check: source-google-drive
GitHub Check: Check: source-intercom
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (All, Python 3.13, Ubuntu)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.12, Ubuntu)
GitHub Check: Manifest Server Docker Image Build
GitHub Check: Pytest (Fast)

🔇 Additional comments (1)

airbyte_cdk/sources/file_based/file_types/excel_parser.py (1)

164-164: Nice migration to pandas API!

The switch from numpy's issubdtype to pd.api.types.is_datetime64_any_dtype() is the correct pandas-native approach and aligns perfectly with removing the numpy dependency.

airbyte_cdk/sources/file_based/file_types/excel_parser.py

aaronsteers

Looks good!

bump jsonref, remove numpy dep, revert ddtrace to unyanked ver

6f092c5

github-actions bot added the chore label Oct 17, 2025

pnilan temporarily deployed to PyPi October 17, 2025 17:28 — with GitHub Actions Inactive

pnilan marked this pull request as ready for review October 17, 2025 17:49

Copilot AI review requested due to automatic review settings October 17, 2025 17:49

Copilot AI reviewed Oct 17, 2025

airbyte_cdk/sources/file_based/file_types/excel_parser.py (outdated, resolved)

coderabbitai bot approved these changes Oct 17, 2025

Update airbyte_cdk/sources/file_based/file_types/excel_parser.py

abc0184

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

pnilan requested review from aaronsteers, brianjlai and maxi297 October 17, 2025 18:02

maxi297 reviewed Oct 17, 2025

pyproject.toml (outdated, resolved)

airbyte_cdk/sources/file_based/file_types/excel_parser.py (outdated, resolved)

airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py (resolved)

coderabbitai bot requested changes Oct 17, 2025

airbyte_cdk/sources/file_based/file_types/excel_parser.py (resolved)

coderabbitai bot requested changes Oct 17, 2025

airbyte_cdk/sources/file_based/file_types/excel_parser.py (outdated, resolved)

revert type

c11d7cd

pnilan force-pushed the pnilan/cdk/dep-version-bumps branch from 1ce8768 to c11d7cd Compare October 17, 2025 21:00

coderabbitai bot approved these changes Oct 17, 2025

updates ddtrace to latest

da9f206

pnilan changed the title ~~chore: bump jsonref, remove numpy dep, revert ddtrace to unyanked ver~~ chore: update jsonref, remove numpy dep, update ddtrace Oct 17, 2025

aaronsteers approved these changes Oct 17, 2025

pnilan merged commit 041c201 into main Oct 17, 2025
29 of 31 checks passed

pnilan deleted the pnilan/cdk/dep-version-bumps branch October 17, 2025 21:54
