Enforce DataFrame display memory limits with max_rows + min_rows constraint (deprecate repr_rows)
#1367
+280
−59
Which issue does this PR close?
Rationale for this change
Large DataFrames could ignore the configured `max_memory_bytes` limit during display. Previously the defaults (`repr_rows=10`, `min_rows_display=20`) meant the collection loop condition `rows_so_far < min_rows` stayed true even after exceeding the memory budget, causing significantly more data to be streamed/collected than intended.

This PR resolves that by:
- Introducing a `max_rows` setting (replacing `repr_rows`).
- Ensuring the minimum row count (`min_rows_display`) cannot exceed the maximum rows cap.
- Keeping `repr_rows` as a deprecated alias so existing users aren’t broken immediately.

What changes are included in this PR?
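The collection behaviour described in the rationale can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: the real logic lives in the Rust display code, and the function name `collect_batches`, the `(row_count, size_in_bytes)` stand-in for a record batch, and the exact shape of the loop condition are hypothetical, chosen to mirror the "within `(memory && max_rows)` or until `min_rows` is reached" rule from this description.

```python
from typing import Iterable, Tuple

# Hypothetical stand-in for a record batch: (row_count, size_in_bytes).
Batch = Tuple[int, int]


def collect_batches(
    batches: Iterable[Batch],
    max_bytes: int,
    max_rows: int,
    min_rows: int,
) -> Tuple[int, int]:
    """Return (rows, bytes) collected before the display loop stops."""
    # The constraint this PR enforces: the guaranteed minimum may not
    # exceed the cap, otherwise the minimum would override every limit.
    if min_rows > max_rows:
        raise ValueError("min_rows must be <= max_rows")

    rows_so_far = 0
    bytes_so_far = 0
    for rows, size in batches:
        within_limits = bytes_so_far < max_bytes and rows_so_far < max_rows
        below_minimum = rows_so_far < min_rows
        # Stop once the limits are hit and the minimum is satisfied.
        if not (within_limits or below_minimum):
            break
        rows_so_far += rows
        bytes_so_far += size
    return rows_so_far, bytes_so_far
```

With four batches of 5 rows and 100 bytes each, `collect_batches([(5, 100)] * 4, max_bytes=1, max_rows=10, min_rows=10)` returns `(10, 200)`: the guaranteed minimum is still collected under a tiny budget, but the loop stops there. A configuration with `min_rows=20` and `max_rows=10` is rejected up front instead of silently overrunning the budget.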
Docs:
- Update user guide examples to use `max_rows` instead of `repr_rows`.

Python formatter API:
- Add `max_rows` as the primary configuration for limiting displayed rows.
- Keep `repr_rows` as a deprecated alias (constructor arg + property), emitting `DeprecationWarning`.
- Add centralized validation via `_validate_formatter_parameters()`:
  - Enforce `min_rows_display <= max_rows`.
  - Raise a `ValueError` if both `repr_rows` and `max_rows` are provided with different values.
- Store the resolved value internally as `_max_rows` and expose `max_rows` / deprecated `repr_rows` properties.
- Add `max_rows` to `configure_formatter()` allowed keys.

Rust display/streaming logic:
- Rename `repr_rows` -> `max_rows`.
- Lower the default `min_rows` (20 → 10) to avoid violating the min/max relationship.
- Validate that `min_rows <= max_rows`.
- Continue collecting while within limits `(memory && max_rows)` or until the guaranteed `min_rows` is reached, with clearer comments.

Are these changes tested?
Yes.
- Updated existing formatter tests to use `max_rows`.
- Added new tests for:
  - Memory-limit boundary conditions (tiny budget, default budget, large budget, and min-rows override).
  - `repr_rows` backward compatibility: emits `DeprecationWarning` when used and interoperates with `max_rows`.
  - Validation failures for invalid `max_rows` and for `min_rows_display > max_rows`.

Are there any user-facing changes?
Yes.
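A rough sketch of how the options are expected to interact from a user's point of view. The helper name `resolve_max_rows` is hypothetical and not part of the library; the default cap of 10 and the "positive integer" check are assumptions made for illustration, while the deprecation warning and the two `ValueError` cases follow this description.

```python
import warnings
from typing import Optional


def resolve_max_rows(
    max_rows: Optional[int] = None,
    repr_rows: Optional[int] = None,
    min_rows_display: int = 10,
    default_max_rows: int = 10,  # assumed default, for illustration only
) -> int:
    """Resolve the effective row cap from the new and deprecated options."""
    if repr_rows is not None:
        # The deprecated alias still works but warns.
        warnings.warn(
            "repr_rows is deprecated; use max_rows instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Supplying both options is only accepted when they agree.
        if max_rows is not None and max_rows != repr_rows:
            raise ValueError("repr_rows and max_rows were given different values")
        max_rows = repr_rows

    if max_rows is None:
        max_rows = default_max_rows
    # Assumed notion of an "invalid" max_rows: anything but a positive int.
    if not isinstance(max_rows, int) or max_rows <= 0:
        raise ValueError("max_rows must be a positive integer")
    if min_rows_display > max_rows:
        raise ValueError("min_rows_display must be <= max_rows")
    return max_rows
```

Under these assumptions, `resolve_max_rows(max_rows=25)` returns 25 silently, `resolve_max_rows(repr_rows=15)` returns 15 and emits a `DeprecationWarning`, and `resolve_max_rows(max_rows=20, repr_rows=15)` raises a `ValueError`.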
- New option: `max_rows` is now the preferred way to cap rows displayed in repr/HTML output.
- Deprecation: `repr_rows` is deprecated and will emit a `DeprecationWarning`.
  - Using `repr_rows` on its own continues to work.
  - Passing both `repr_rows` and `max_rows` with different values raises a `ValueError`.
- Behavioral change: Default minimum rows displayed changes from 20 to 10.
- Docs: Updated examples and clarified that `min_rows_display` must be `<= max_rows`.

If the deprecation/rename is considered a public API change, please add the `api change` label.

LLM-generated code disclosure
This PR includes code and comments generated with assistance from an LLM. All LLM-generated content has been manually reviewed and tested.