From ca8f8a8e2bd96c01ea941607fbd8b32020495ce8 Mon Sep 17 00:00:00 2001
From: Will Jones
Date: Thu, 16 Dec 2021 13:56:59 -0800
Subject: [PATCH] Soften some wording

---
 docs/source/cpp/dataset.rst    | 8 ++++----
 docs/source/python/dataset.rst | 8 ++++----
 r/vignettes/dataset.Rmd        | 6 +++---
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/source/cpp/dataset.rst b/docs/source/cpp/dataset.rst
index a1f1c23545bab..7827c954cfc92 100644
--- a/docs/source/cpp/dataset.rst
+++ b/docs/source/cpp/dataset.rst
@@ -395,11 +395,11 @@ cardinality 1,000 will make that 365,365 calls.
 
 The most optimal partitioning layout will depend on your data, access patterns, and which
 systems will be reading the data. Most systems, including Arrow, should work across a
-range of file sizes and partitioning layouts, but there are extremes you should avoid. To
-avoid worst case behavior, keep to these guidelines:
+range of file sizes and partitioning layouts, but there are extremes you should avoid. These
+guidelines can help avoid some known worst cases:
 
- * Avoid files smaller than 20MB and larger than 2GB
- * Avoid partitioning layouts with more than 10,000 distinct partitions.
+* Avoid files smaller than 20MB and larger than 2GB.
+* Avoid partitioning layouts with more than 10,000 distinct partitions.
 
 For file formats that have a notion of groups within a file, such as Parquet, similar
 guidelines apply. Row groups can provide parallelism when reading and allow data skipping
diff --git a/docs/source/python/dataset.rst b/docs/source/python/dataset.rst
index f7d3bb27c358a..5af24a0b08753 100644
--- a/docs/source/python/dataset.rst
+++ b/docs/source/python/dataset.rst
@@ -602,11 +602,11 @@ cardinality 1,000 will make that 365,365 calls.
 
 The most optimal partitioning layout will depend on your data, access patterns, and which
 systems will be reading the data. Most systems, including Arrow, should work across a
-range of file sizes and partitioning layouts, but there are extremes you should avoid. To
-avoid worst case behavior, keep to these guidelines:
+range of file sizes and partitioning layouts, but there are extremes you should avoid. These
+guidelines can help avoid some known worst cases:
 
- * Avoid files smaller than 20MB and larger than 2GB
- * Avoid partitioning layouts with more than 10,000 distinct partitions.
+* Avoid files smaller than 20MB and larger than 2GB.
+* Avoid partitioning layouts with more than 10,000 distinct partitions.
 
 For file formats that have a notion of groups within a file, such as Parquet, similar
 guidelines apply. Row groups can provide parallelism when reading and allow data skipping
diff --git a/r/vignettes/dataset.Rmd b/r/vignettes/dataset.Rmd
index 919cd0a72d7ce..a7e8b8050b490 100644
--- a/r/vignettes/dataset.Rmd
+++ b/r/vignettes/dataset.Rmd
@@ -448,10 +448,10 @@ cardinality 1,000 will make that 365,365 calls.
 
 The most optimal partitioning layout will depend on your data, access patterns, and which
 systems will be reading the data. Most systems, including Arrow, should work across a
-range of file sizes and partitioning layouts, but there are extremes you should avoid. To
-avoid worst case behavior, keep to these guidelines:
+range of file sizes and partitioning layouts, but there are extremes you should avoid. These
+guidelines can help avoid some known worst cases:
 
- * Avoid files smaller than 20MB and larger than 2GB
+ * Avoid files smaller than 20MB and larger than 2GB.
  * Avoid partitioning layouts with more than 10,000 distinct partitions.
 
 For file formats that have a notion of groups within a file, such as Parquet, similar