- 
                Notifications
    
You must be signed in to change notification settings  - Fork 31
 
chore: update jsonref, remove numpy dep, update ddtrace #800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
          👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@pnilan/cdk/dep-version-bumps#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch pnilan/cdk/dep-version-bumpsHelpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR: 
  | 
    
| 
          
 Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughRemoved the numpy dependency and replaced numpy-specific imports:  Changes
 Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Suggested labels
 Suggested reviewers
 Would you like me to also run a quick checklist of places to verify compatibility (e.g., CI matrix, any transitive deps that required numpy) or is this sufficient? Pre-merge checks and finishing touches✅ Passed checks (3 passed)
 ✨ Finishing touches
 🧪 Generate unit tests (beta)
 Comment   | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR removes the explicit numpy dependency from the project by replacing direct numpy imports with pandas and standard library alternatives, updates jsonref to v1, and reverts ddtrace to an earlier version.
- Removed explicit numpy dependency constraint from pyproject.toml
 - Updated jsonref from ~0.2 to ^1
 - Reverted ddtrace from ^3.12.3 to ^3.12.0
 - Replaced numpy imports with pandas API calls and Python standard library
 
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.
| File | Description | 
|---|---|
| pyproject.toml | Updated jsonref to v1, removed numpy constraint, and reverted ddtrace to v3.12.0 | 
| airbyte_cdk/sources/file_based/file_types/excel_parser.py | Replaced numpy dtype functions with pandas API equivalents | 
| airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py | Replaced numpy.nan with math.nan from standard library | 
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One blocking comment. The other are nits
        
          
                airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️  Outside diff range comments (1)
airbyte_cdk/sources/file_based/file_types/excel_parser.py (1)
154-166: Use pandas dtype predicates to correctly classify object/number/boolThe current dtype classification logic is fragile and incomplete:
dtype is objectrelies on identity checks that vary across numpy/pandas versionsdtype in number_typesonly detects int64/float64, missing int32, uint64, float32, etc.dtype == "bool"uses string comparison instead of dtype introspection- Yet the code already uses
 pd.api.types.is_datetime64_any_dtype()elsewhere, creating inconsistencySwitching to pandas predicates throughout (
is_object_dtype,is_bool_dtype,is_numeric_dtype) is more robust and handles edge cases better. The test suite confirms the proposed behavior remains equivalent.- number_types = ("int64", "float64") if current_type == "string": # Previous column values were of the string type, no need to look further. return current_type - if dtype is object: - return "string" - if dtype in number_types and (not current_type or current_type == "number"): - return "number" - if dtype == "bool" and (not current_type or current_type == "boolean"): - return "boolean" + # Prefer pandas predicates for robust dtype detection + if pd.api.types.is_object_dtype(dtype): + return "string" + if pd.api.types.is_bool_dtype(dtype) and (not current_type or current_type == "boolean"): + return "boolean" + if pd.api.types.is_numeric_dtype(dtype) and (not current_type or current_type == "number"): + return "number" if pd.api.types.is_datetime64_any_dtype(dtype): return "date-time" return "string"Would you adopt this approach?
🧹 Nitpick comments (2)
unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py (2)
3-3: Remove unused import to keep tests tidy?
mathisn’t used in this test file. Dropping it avoids linter noise, wdyt?-import math
41-49: DRY up repetitive None assertions?Would you prefer a compact loop to assert the same invariant across fields, keeping the test concise, wdyt?
- assert extracted_records[0]["CITY"] is None - assert extracted_records[0]["STATE_PROVINCE_REGION"] is None - assert extracted_records[0]["POSTAL_CODE"] is None - assert extracted_records[0]["COUNTRY"] is None - assert extracted_records[0]["ALTERNATE_EMAILS"] is None - assert extracted_records[0]["PHONE_NUMBER"] is None - assert extracted_records[0]["WHATSAPP"] is None - assert extracted_records[0]["FACEBOOK"] is None - assert extracted_records[0]["UNIQUE_NAME"] is None + for field in [ + "CITY", + "STATE_PROVINCE_REGION", + "POSTAL_CODE", + "COUNTRY", + "ALTERNATE_EMAILS", + "PHONE_NUMBER", + "WHATSAPP", + "FACEBOOK", + "UNIQUE_NAME", + ]: + assert extracted_records[0][field] is None
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
unit_tests/sources/declarative/extractors/decompressed_response.csvis excluded by!**/*.csvunit_tests/sources/declarative/extractors/nan_response.csvis excluded by!**/*.csv
📒 Files selected for processing (2)
airbyte_cdk/sources/file_based/file_types/excel_parser.py(2 hunks)unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py (1)
airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py (1)
extract_records(155-176)
🪛 GitHub Actions: Generate Docs
airbyte_cdk/sources/file_based/file_types/excel_parser.py
[error] 142-142: AttributeError: module 'pandas' has no attribute 'ExtensionDtype' while defining ExcelParser.dtype. This occurred during docs generation step (poetry run poe docs-generate). Possible pandas version incompatibility.
🪛 GitHub Actions: Linters
airbyte_cdk/sources/file_based/file_types/excel_parser.py
[error] 142-142: mypy: Name "pd.ExtensionDtype" is not defined [name-defined]. Ensure pandas is imported as 'pd' and that 'ExtensionDtype' is accessible, or adjust the type usage.
🪛 GitHub Actions: Pytest (Fast)
airbyte_cdk/sources/file_based/file_types/excel_parser.py
[error] 142-142: AttributeError: module 'pandas' has no attribute 'ExtensionDtype'. This may be due to an incompatibility with the installed pandas version.
[error] Excel parser initialization failed due to the AttributeError in pandas usage. See test failure in unit_tests/sources/file_based/test_file_based_scenarios.py.
🪛 GitHub Actions: PyTest Matrix
airbyte_cdk/sources/file_based/file_types/excel_parser.py
[error] 142-142: AttributeError: module 'pandas' has no attribute 'ExtensionDtype'.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Check: source-pokeapi
 - GitHub Check: Check: source-google-drive
 - GitHub Check: Check: source-intercom
 - GitHub Check: Check: destination-motherduck
 - GitHub Check: Check: source-hardcoded-records
 - GitHub Check: Check: source-shopify
 - GitHub Check: Manifest Server Docker Image Build
 - GitHub Check: SDM Docker Image Build
 - GitHub Check: Analyze (python)
 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/file_based/file_types/excel_parser.py(2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
airbyte_cdk/sources/file_based/file_types/excel_parser.py
[error] 142-142: mypy failed: Function 'pandas.core.frame.DataFrame.dtypes' is not valid as a type. Perhaps you need 'Callable[...]' or a callback protocol? (during step 'poetry run mypy --config-file mypy.ini airbyte_cdk')
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Check: source-shopify
 - GitHub Check: Check: destination-motherduck
 - GitHub Check: Check: source-google-drive
 - GitHub Check: Check: source-intercom
 - GitHub Check: Pytest (All, Python 3.10, Ubuntu)
 - GitHub Check: Pytest (All, Python 3.13, Ubuntu)
 - GitHub Check: Pytest (All, Python 3.11, Ubuntu)
 - GitHub Check: Pytest (All, Python 3.12, Ubuntu)
 - GitHub Check: Manifest Server Docker Image Build
 - GitHub Check: Pytest (Fast)
 
🔇 Additional comments (1)
airbyte_cdk/sources/file_based/file_types/excel_parser.py (1)
164-164: Nice migration to pandas API!The switch from numpy's
issubdtypetopd.api.types.is_datetime64_any_dtype()is the correct pandas-native approach and aligns perfectly with removing the numpy dependency.
1ce8768    to
    c11d7cd      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
What
Updates/removes various dependencies for compatibility with connector-builder-mcp project.
jsonrefon v1numpydependencyddtraceto latest stable due to yanked versionSummary by CodeRabbit