From 4eef9dc10fd5ec6586298cf5597e8704c8d11912 Mon Sep 17 00:00:00 2001
From: Alenka Frim
Date: Mon, 10 Jun 2024 14:15:33 +0200
Subject: [PATCH] Apply suggestions from code review

Co-authored-by: Joris Van den Bossche
---
 docs/source/format/Intro.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/source/format/Intro.rst b/docs/source/format/Intro.rst
index 2093b1a128847..c87157754a900 100644
--- a/docs/source/format/Intro.rst
+++ b/docs/source/format/Intro.rst
@@ -106,8 +106,8 @@ Fixed Size Primitive Layout
 ---------------------------
 
 A primitive column represents an array of values where each value
-has the same physical size measured in bytes. Data types that share the
-same fixed size primitive layout are, for example, signed and unsigned
+has the same physical size measured in bytes. Data types that use the
+fixed size primitive layout are, for example, signed and unsigned
 integer data types, floating point numbers, boolean, decimal and temporal
 data types.
 
@@ -138,7 +138,7 @@ data types.
 Variable length binary and string
 ---------------------------------
 
-The bytes of a binary or string column are stored together consecutively
+The bytes of all elements in a binary or string column are stored together consecutively
 in a single buffer or region of memory. To know where each element of the
 column starts and ends the physical layout also includes integer offsets.
 The number of elements of the offset buffer is one more than the length of the
@@ -172,10 +172,10 @@ layout used in `DuckDB`_ and `Velox`_ (and sometimes also called "German style s
 .. _UmbraDB: https://umbra-db.com/
 .. _DuckDB: https://duckdb.com
 .. _Velox: https://velox-lib.io/
-The main differences to classical binary and string layout is the views buffer.
+The main differences to the classical binary and string layout is the views buffer.
 It includes the length of the string, and then either contains the characters
 inline (for small strings) or only the first 4 bytes of the string and an offset into one of
-potentially several data buffers. It also supports binary and strings to be written
+potentially several data buffers. Because it uses an offset and length to refer to the data buffer, the bytes of all elements do not need to be stored together consecutively in one buffer, and thus it supports the bytes to be written
 out of order. These properties are important for efficient string processing. The prefix