[Doc] Enhance the documentation for dataset and S3 #37241

@mapleFU

Describe the enhancement requested

Currently, we have the issues below:

  1. [Python][Dataset][Parquet] Enable Pre-Buffering by default for Parquet s3 datasets #36765
  2. [Python] Read table stuck and hangs forever #37139

When reading Parquet from S3, it is important to prefetch some files or row groups. However, a larger prefetch depth can increase memory usage. Worse, interfaces such as to_table and the other read paths may have different prefetch behaviors. So we should document this better.

Component(s)

Documentation, Parquet, Python
