docs: Add some recommendations on table partitioning

It's not obvious to new users what value to specify for `chunksize`, yet the choice is vital to getting good performance. It would be great to add some recommendations (and possibly examples) to the docs on how to choose a good `chunksize` value. Specifically, we could list a few basic recommendations:
- Specify it as a fraction of the total number of rows (e.g. `nrows / 10`)
- When the data is very large, cap it at a fixed maximum (e.g. whether the data is 20 GB or 300 GB, pick `10_000_000` rows)
- If the same analysis will be rerun many times, benchmark several `chunksize` values and pick the fastest one that doesn't cause out-of-memory (OOM) errors
- etc.
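The first two rules of thumb could be combined into a small helper along these lines. This is only a sketch: `suggest_chunksize`, `n_chunks` and `max_rows` are hypothetical names, not part of any existing API, and the defaults simply mirror the examples in the list (`nrows / 10`, capped at `10_000_000` rows).

```python
def suggest_chunksize(nrows: int, n_chunks: int = 10, max_rows: int = 10_000_000) -> int:
    """Hypothetical helper: derive a chunksize from the total row count.

    Takes nrows / n_chunks, rounded up so no rows are left over,
    then caps the result at max_rows for very large tables.
    """
    if nrows <= 0 or n_chunks <= 0:
        raise ValueError("nrows and n_chunks must be positive")
    by_ratio = -(-nrows // n_chunks)  # integer ceiling division
    return min(by_ratio, max_rows)


for nrows in (1_000, 50_000_000, 2_000_000_000):
    print(nrows, suggest_chunksize(nrows))
```

Rounding up instead of using plain `nrows / 10` keeps the result an integer and avoids a leftover sliver of rows; the cap bounds the number of rows per chunk no matter how large the table grows.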

An example of how to run such a benchmark would be very useful to users who are unsure which `chunksize` to pick.
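A benchmark of that kind might look roughly like the sketch below. The issue does not show the library's own chunked-processing call, so `process_in_chunks` is a hypothetical stand-in for the real analysis, and `benchmark_chunksizes` / `pick_chunksize` are likewise illustrative names. Peak memory is measured with the standard-library `tracemalloc`, which only sees allocations made through Python's allocator and adds its own overhead, so the timings are best compared relative to each other rather than read as absolute numbers.

```python
import time
import tracemalloc
from typing import Callable, Iterable, List, Optional, Tuple


def process_in_chunks(data: List[int], chunksize: int) -> int:
    """Hypothetical stand-in for the real analysis: sums squares chunk by chunk."""
    total = 0
    for start in range(0, len(data), chunksize):
        chunk = data[start:start + chunksize]  # materialise one chunk at a time
        total += sum(x * x for x in chunk)
    return total


def benchmark_chunksizes(
    run: Callable[[int], object],
    candidates: Iterable[int],
    repeats: int = 3,
) -> List[Tuple[int, float, int]]:
    """Return (chunksize, best wall time in s, peak traced bytes) per candidate."""
    results = []
    for chunksize in candidates:
        best = float("inf")
        peak = 0
        for _ in range(repeats):
            tracemalloc.start()
            try:
                t0 = time.perf_counter()
                run(chunksize)
                elapsed = time.perf_counter() - t0
                _, run_peak = tracemalloc.get_traced_memory()
            finally:
                tracemalloc.stop()  # always stop tracing, even if run() raises
            best = min(best, elapsed)
            peak = max(peak, run_peak)
        results.append((chunksize, best, peak))
    return results


def pick_chunksize(results: List[Tuple[int, float, int]], memory_budget_bytes: int) -> Optional[int]:
    """Fastest chunksize whose peak memory fits the budget, or None if none fit."""
    fits = [r for r in results if r[2] <= memory_budget_bytes]
    return min(fits, key=lambda r: r[1])[0] if fits else None


if __name__ == "__main__":
    data = list(range(200_000))
    results = benchmark_chunksizes(
        lambda cs: process_in_chunks(data, cs),
        candidates=[1_000, 10_000, 50_000, 200_000],
    )
    for chunksize, seconds, peak in results:
        print(f"chunksize={chunksize:>9,}  time={seconds:.3f}s  peak={peak / 1e6:.1f} MB")
    print("chosen:", pick_chunksize(results, memory_budget_bytes=50_000_000))
```

Taking the best of several repeats reduces noise from other processes, while keeping the worst-case peak memory errs on the side of avoiding OOM errors. For workloads whose memory lives mostly in native code, an OS-level measurement of process memory would be a more reliable budget check than `tracemalloc`.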

docs: Add some recommendations on table partitioning #40

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

docs: Add some recommendations on table partitioning #40

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions