
fix(sql_execution): Fix is_large_number check to use 2**53 as cutoff #73

Merged
tkislan merged 6 commits into main from tk/fix-is-large-number-serialization
Mar 11, 2026

@tkislan
Contributor

@tkislan tkislan commented Mar 10, 2026

Before
Screenshot 2026-03-10 at 18 03 42

After
Screenshot 2026-03-10 at 20 34 21

Summary by CodeRabbit

  • Bug Fixes

    • Integers above the float64 exact range (2**53) and other unsafe numeric values are now represented as strings in query results, JSON, and Parquet exports.
  • New Features

    • Added numeric-safety utilities that detect large/unsafe numeric values and convert only affected columns to strings during serialization.
  • Tests

    • Expanded coverage for boundary values, mixed-type columns, Decimals, infinities/NaN, and large-integer behavior.
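
Why 2**53 is the cutoff: a float64 has a 53-bit significand, so every integer whose magnitude is at most 2**53 is exactly representable, while 2**53 + 1 is the first integer that is not. The stdlib-only check below illustrates that boundary; `survives_float64` is a hypothetical helper written for this illustration, not part of deepnote-toolkit.

```python
# Every integer with magnitude <= 2**53 is exactly representable as a float64.
MAX_SAFE_FLOAT64_INTEGER = 2**53  # 9_007_199_254_740_992


def survives_float64(n: int) -> bool:
    """Return True if n round-trips through a float64 without changing value."""
    return int(float(n)) == n


# The cutoff itself (and its negative) round-trips exactly...
assert survives_float64(MAX_SAFE_FLOAT64_INTEGER)
assert survives_float64(-MAX_SAFE_FLOAT64_INTEGER)

# ...but the first integer past it does not: it rounds down to 2**53.
assert not survives_float64(MAX_SAFE_FLOAT64_INTEGER + 1)
assert float(MAX_SAFE_FLOAT64_INTEGER + 1) == float(MAX_SAFE_FLOAT64_INTEGER)

# Some larger integers (e.g. 2**53 + 2) happen to be representable, but above
# 2**53 a float64 can no longer tell neighbouring integers apart, so any value
# past the cutoff may already have been rounded.
```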

@coderabbitai
Contributor

coderabbitai bot commented Mar 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fa1482f6-e439-461d-bbd6-c5841dc8a4cc

📥 Commits

Reviewing files that changed from the base of the PR and between ed33db8 and ed6dea6.

📒 Files selected for processing (1)
  • deepnote_toolkit/ocelots/pandas/utils.py

📝 Walkthrough

Replaces a local large-number helper in deepnote_toolkit/sql/sql_execution.py with is_large_number from deepnote_toolkit.ocelots.pandas.utils and removes the local _is_large_number and Decimal import. Adds MAX_SAFE_FLOAT64_INTEGER, is_large_number, and cast_large_numbers_to_string in deepnote_toolkit/ocelots/pandas/utils.py. Calls cast_large_numbers_to_string(df_copy) from deepnote_toolkit/ocelots/pandas/implementation.py inside to_records when mode="json". Expands tests and fixtures to cover 2**53 boundary behavior, large integers, Decimals, mixed-type columns, and precision-preservation cases.
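
A minimal sketch of what `MAX_SAFE_FLOAT64_INTEGER` and `is_large_number` could look like, reconstructed from the walkthrough and the review comments below (an `isinstance` check followed by a magnitude comparison against 2**53, with an `Any` parameter hint). It is not the code in deepnote_toolkit/ocelots/pandas/utils.py: the NumPy scalar checks (`np.integer`, `np.floating`) are omitted to keep it stdlib-only, and the Decimal NaN guard is an assumption added here because ordering comparisons with a Decimal NaN raise `decimal.InvalidOperation` under the default context.

```python
from decimal import Decimal
from typing import Any

# Integers with magnitude up to 2**53 are exactly representable as float64.
MAX_SAFE_FLOAT64_INTEGER = 2**53


def is_large_number(x: Any) -> bool:
    """Return True if x is a number whose magnitude exceeds 2**53.

    Sketch only: NumPy integer scalars are not checked here
    (np.float64 is still caught because it subclasses float).
    """
    if isinstance(x, Decimal) and x.is_nan():
        # Assumption: ordering comparisons with a Decimal NaN would raise
        # InvalidOperation, so NaN is reported as "not large" up front.
        return False
    is_num = isinstance(x, (int, float, Decimal))
    return is_num and abs(x) > MAX_SAFE_FLOAT64_INTEGER


# Usage
print(is_large_number(2**53))               # False: 2**53 itself is exact
print(is_large_number(-(2**53 + 1)))        # True: only the magnitude counts
print(is_large_number(Decimal("1e20")))     # True
print(is_large_number(float("inf")))        # True in this sketch
print(is_large_number(float("nan")))        # False in this sketch
print(is_large_number("9007199254740993"))  # False: strings are not numbers
```

In this sketch, infinities count as large (abs(inf) exceeds any finite cutoff) and NaN does not (every ordering comparison with a float NaN is false); how the real helper treats them is what the PR's expanded tests pin down.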

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant Ocelots as Ocelots.to_records
  participant Utils as utils.cast_large_numbers_to_string
  participant DF as DataFrame
  Caller->>Ocelots: to_records(df_copy, mode="json")
  Ocelots->>Utils: cast_large_numbers_to_string(df_copy)
  Utils->>DF: inspect columns using is_large_number / MAX_SAFE_FLOAT64_INTEGER
  DF-->>Utils: convert large-number columns to strings
  Utils-->>Ocelots: return sanitized df_copy
  Ocelots->>Caller: return records (from sanitized df_copy)
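
The flow in the diagram can be sketched with pandas as follows. The function name `cast_large_numbers_to_string` follows the PR, but the bodies are assumptions: the per-column scan, the use of the `numbers` ABCs (with which NumPy registers its scalar types) in place of explicit NumPy checks, and leaving missing values as `None` rather than stringifying them are all illustrative choices. `_is_large`, `_to_string`, and `to_records_json` are hypothetical stand-ins; the last one plays the role of `to_records(..., mode="json")`.

```python
import numbers
from decimal import Decimal
from typing import Any, Dict, List

import pandas as pd

MAX_SAFE_FLOAT64_INTEGER = 2**53


def _is_large(x: Any) -> bool:
    # Hypothetical compact predicate; numbers.Real also matches NumPy scalars.
    if isinstance(x, Decimal):
        return not x.is_nan() and abs(x) > MAX_SAFE_FLOAT64_INTEGER
    return isinstance(x, numbers.Real) and abs(x) > MAX_SAFE_FLOAT64_INTEGER


def _to_string(value: Any) -> Any:
    # Keep missing values as None instead of turning them into "nan"/"None".
    return None if pd.isna(value) else str(value)


def cast_large_numbers_to_string(df: pd.DataFrame) -> pd.DataFrame:
    """Stringify only the columns that contain at least one unsafe value."""
    for column in df.columns:
        if df[column].map(_is_large).any():
            df[column] = df[column].map(_to_string)
    return df


def to_records_json(df: pd.DataFrame) -> List[Dict[str, Any]]:
    # Copy first so the caller's DataFrame is left untouched.
    df_copy = cast_large_numbers_to_string(df.copy())
    return df_copy.to_dict(orient="records")


frame = pd.DataFrame({"small": [1, 2], "big": [2**53 + 1, 5]})
print(to_records_json(frame))
```

Because this `cast_large_numbers_to_string` rewrites columns in place, the caller hands it a copy, matching the `df_copy` in the diagram. Only the affected column becomes strings (here `big`, including the small value 5, so the column keeps a single type); `small` keeps its numeric dtype.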

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 73.91%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Updates Docs (⚠️ Warning): No documentation updates found in the deepnote-toolkit repo for new public API functions; primary repos cannot be verified. Resolution: update docs in the deepnote-toolkit repo and verify updates in the deepnote/deepnote OSS and deepnote/deepnote-internal repos.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title accurately describes the main change: refactoring large-number detection to use 2**53 as the float64 precision cutoff across multiple modules.


@github-actions

github-actions bot commented Mar 10, 2026

📦 Python package built successfully!

  • Version: 2.1.2.dev15+ee8ddc8
  • Wheel: deepnote_toolkit-2.1.2.dev15+ee8ddc8-py3-none-any.whl
  • Install:
    pip install "deepnote-toolkit @ https://deepnote-staging-runtime-artifactory.s3.amazonaws.com/deepnote-toolkit-packages/2.1.2.dev15%2Bee8ddc8/deepnote_toolkit-2.1.2.dev15%2Bee8ddc8-py3-none-any.whl"

coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 10, 2026
@deepnote-bot

deepnote-bot commented Mar 10, 2026

🚀 Review App Deployment Started

  • 🌍 Review application: ra-73
  • Last deployed: 2026-03-10 20:43:05 (UTC)
  • 📜 Deployed commit: 6128b89bd47afba10f13ecb12e7d5895c6444e95
  • 🛠️ Toolkit version: ee8ddc8

@codecov

codecov bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.75%. Comparing base (2df9d55) to head (ed6dea6).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #73      +/-   ##
==========================================
+ Coverage   73.71%   73.75%   +0.03%     
==========================================
  Files          93       93              
  Lines        5261     5269       +8     
  Branches      764      766       +2     
==========================================
+ Hits         3878     3886       +8     
  Misses       1138     1138              
  Partials      245      245              
Flag Coverage Δ
combined 73.75% <100.00%> (+0.03%) ⬆️


…JSON compatibility

- Introduced `cast_large_numbers_to_string` function to convert numeric values exceeding the float64 safe integer range (2**53) to strings, preserving precision for JSON serialization.
- Updated `PandasImplementation.to_json` method to utilize the new function.
- Added unit tests to ensure correct behavior for large numbers in dataframes.
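
Why strings rather than numbers in JSON: many consumers, JavaScript's `JSON.parse` among them, decode every JSON number as a float64. The stdlib sketch below simulates such a consumer with `json.loads(..., parse_int=float)` to show the digit that gets lost and how a string keeps it; the `"id"` field is purely illustrative.

```python
import json

value = 2**53 + 1  # 9007199254740993, just past the float64-safe range

as_number = json.dumps({"id": value})
as_string = json.dumps({"id": str(value)})

# Simulate a consumer that parses every JSON number as a float64,
# as JavaScript's JSON.parse does.
parsed_number = json.loads(as_number, parse_int=float)["id"]
parsed_string = json.loads(as_string, parse_int=float)["id"]

print(int(parsed_number))  # 9007199254740992: the last digit was lost
print(int(parsed_string))  # 9007199254740993: the string kept every digit
```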
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepnote_toolkit/ocelots/pandas/utils.py`:
- Around lines 119-120: The return statement exceeds the 88-character limit;
break it into shorter lines, either by splitting the isinstance check and the
comparison across lines or by assigning parts to temporaries: e.g., compute a
boolean like is_num = isinstance(x, (int, float, Decimal, np.integer,
np.floating)) on one line, then return is_num and abs(x) >
MAX_SAFE_FLOAT64_INTEGER on the next.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fa49302c-c577-464b-8417-e4db1b3b012a

📥 Commits

Reviewing files that changed from the base of the PR and between b12a99b and 731e71b.

📒 Files selected for processing (5)
  • deepnote_toolkit/ocelots/pandas/implementation.py
  • deepnote_toolkit/ocelots/pandas/utils.py
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/helpers/testing_dataframes.py
  • tests/unit/test_dataframe_utils.py

tkislan added 2 commits March 10, 2026 19:11
- Reformatted the `is_large_number` and `cast_large_numbers_to_string` functions for better readability by using multi-line expressions.
- Updated unit tests to call the `is_large_number` function directly instead of through a different module, ensuring consistency and clarity in test cases.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepnote_toolkit/ocelots/pandas/utils.py`:
- Around lines 128-142: The function cast_large_numbers_to_string is missing
explicit type hints; update its signature to include pandas DataFrame input and
return types (e.g., df: pd.DataFrame -> pd.DataFrame) and ensure pandas is
imported as pd in the module if not already; keep the implementation and
references to helpers (is_pure_numeric, is_large_number, safe_convert_to_string)
unchanged so callers and static type checkers recognize the typed signature.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ff36bff8-c27f-468a-83ad-596bc6377706

📥 Commits

Reviewing files that changed from the base of the PR and between 731e71b and 7fbfcf1.

📒 Files selected for processing (3)
  • deepnote_toolkit/ocelots/pandas/utils.py
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/test_sql_execution_internal.py

Add explicit pd.DataFrame input and return type annotations to
cast_large_numbers_to_string so callers and static type checkers
(mypy) recognise the typed signature.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepnote_toolkit/ocelots/pandas/utils.py`:
- Around lines 112-125: Add an explicit type hint for the parameter in
is_large_number by changing its signature to is_large_number(x: Any) -> bool and
ensure typing.Any is imported at the top of the module; keep the existing
isinstance checks (Decimal, np.integer, np.floating, int, float) and return
logic using MAX_SAFE_FLOAT64_INTEGER unchanged so type checking tools accept the
function signature.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 845868d1-da3a-41ce-a2a6-36905b079c94

📥 Commits

Reviewing files that changed from the base of the PR and between 7fbfcf1 and ed33db8.

📒 Files selected for processing (1)
  • deepnote_toolkit/ocelots/pandas/utils.py

Import typing.Any and annotate the x parameter of is_large_number
so static type checkers (mypy) accept the function signature.
@tkislan tkislan marked this pull request as ready for review March 10, 2026 22:38
@tkislan tkislan requested a review from a team as a code owner March 10, 2026 22:38
@tkislan tkislan requested a review from mfranczel March 10, 2026 22:38
@tkislan tkislan merged commit 595046a into main Mar 11, 2026
32 checks passed
@tkislan tkislan deleted the tk/fix-is-large-number-serialization branch March 11, 2026 10:10