Soften some wording
wjones127 committed Dec 16, 2021
1 parent b75f6ed commit ca8f8a8
Showing 3 changed files with 11 additions and 11 deletions.
8 changes: 4 additions & 4 deletions docs/source/cpp/dataset.rst
@@ -395,11 +395,11 @@ cardinality 1,000 will make that 365,365 calls.

 The most optimal partitioning layout will depend on your data, access patterns, and which
 systems will be reading the data. Most systems, including Arrow, should work across a
-range of file sizes and partitioning layouts, but there are extremes you should avoid. To
-avoid worst case behavior, keep to these guidelines:
+range of file sizes and partitioning layouts, but there are extremes you should avoid. These
+guidelines can help avoid some known worst cases:

-* Avoid files smaller than 20MB and larger than 2GB
-* Avoid partitioning layouts with more than 10,000 distinct partitions.
+* Avoid files smaller than 20MB and larger than 2GB.
+* Avoid partitioning layouts with more than 10,000 distinct partitions.

 For file formats that have a notion of groups within a file, such as Parquet, similar
 guidelines apply. Row groups can provide parallelism when reading and allow data skipping
8 changes: 4 additions & 4 deletions docs/source/python/dataset.rst
@@ -602,11 +602,11 @@ cardinality 1,000 will make that 365,365 calls.

 The most optimal partitioning layout will depend on your data, access patterns, and which
 systems will be reading the data. Most systems, including Arrow, should work across a
-range of file sizes and partitioning layouts, but there are extremes you should avoid. To
-avoid worst case behavior, keep to these guidelines:
+range of file sizes and partitioning layouts, but there are extremes you should avoid. These
+guidelines can help avoid some known worst cases:

-* Avoid files smaller than 20MB and larger than 2GB
-* Avoid partitioning layouts with more than 10,000 distinct partitions.
+* Avoid files smaller than 20MB and larger than 2GB.
+* Avoid partitioning layouts with more than 10,000 distinct partitions.

 For file formats that have a notion of groups within a file, such as Parquet, similar
 guidelines apply. Row groups can provide parallelism when reading and allow data skipping
6 changes: 3 additions & 3 deletions r/vignettes/dataset.Rmd
@@ -448,10 +448,10 @@ cardinality 1,000 will make that 365,365 calls.

 The most optimal partitioning layout will depend on your data, access patterns, and which
 systems will be reading the data. Most systems, including Arrow, should work across a
-range of file sizes and partitioning layouts, but there are extremes you should avoid. To
-avoid worst case behavior, keep to these guidelines:
+range of file sizes and partitioning layouts, but there are extremes you should avoid. These
+guidelines can help avoid some known worst cases:

-* Avoid files smaller than 20MB and larger than 2GB
+* Avoid files smaller than 20MB and larger than 2GB.
 * Avoid partitioning layouts with more than 10,000 distinct partitions.

 For file formats that have a notion of groups within a file, such as Parquet, similar
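
The two guidelines touched by this commit (file sizes between 20MB and 2GB, no more than 10,000 distinct partitions) can be checked mechanically against an existing dataset directory. The following is a minimal sketch, not part of the commit or of Arrow itself: the helper names `scan_directory` and `summarize_layout` are hypothetical, the thresholds are read as binary units (an assumption, since the docs do not say), and each directory that directly contains files is assumed to be one partition.

```python
import os

# Thresholds taken from the guidelines in the diff; binary units are an assumption.
MIN_FILE_BYTES = 20 * 1024**2   # 20MB
MAX_FILE_BYTES = 2 * 1024**3    # 2GB
MAX_PARTITIONS = 10_000


def scan_directory(root):
    """Return a {path: size_in_bytes} mapping for every file under `root`."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes[path] = os.path.getsize(path)
    return sizes


def summarize_layout(file_sizes,
                     min_bytes=MIN_FILE_BYTES,
                     max_bytes=MAX_FILE_BYTES,
                     max_partitions=MAX_PARTITIONS):
    """Flag files outside the size range and count distinct partitions.

    Assumption: every directory that directly holds a file is one partition.
    """
    too_small = sorted(p for p, s in file_sizes.items() if s < min_bytes)
    too_large = sorted(p for p, s in file_sizes.items() if s > max_bytes)
    partitions = {os.path.dirname(p) for p in file_sizes}
    return {
        "too_small": too_small,
        "too_large": too_large,
        "partition_count": len(partitions),
        "too_many_partitions": len(partitions) > max_partitions,
    }
```

A typical call would be `summarize_layout(scan_directory("path/to/dataset"))`, where the path is a placeholder. Since the commit softens the wording to "can help avoid some known worst cases", the result is best read as a hint about layouts worth a second look, not as a hard validation.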
