From 814cd8f511ec0b3e3961800ce71cd46a3d4eb8a7 Mon Sep 17 00:00:00 2001 From: Alex Guteniev Date: Fri, 19 Sep 2025 20:21:42 +0300 Subject: [PATCH 01/23] Document vectorized STL algorithms --- docs/standard-library/toc.yml | 3 + .../vectorized-stl-algoritms.md | 86 +++++++++++++++++++ 2 files changed, 89 insertions(+) create mode 100644 docs/standard-library/vectorized-stl-algoritms.md diff --git a/docs/standard-library/toc.yml b/docs/standard-library/toc.yml index ce3cb76ff56..b05a170940f 100644 --- a/docs/standard-library/toc.yml +++ b/docs/standard-library/toc.yml @@ -1588,6 +1588,9 @@ items: href: iterators.md - name: Algorithms href: algorithms.md + expanded: false + - name: Vectorized STL Algorithms + href: vectorized-stl-algorithms.md - name: Allocators href: allocators.md - name: Function objects in the C++ Standard Library diff --git a/docs/standard-library/vectorized-stl-algoritms.md b/docs/standard-library/vectorized-stl-algoritms.md new file mode 100644 index 00000000000..5bb6cd68f3d --- /dev/null +++ b/docs/standard-library/vectorized-stl-algoritms.md @@ -0,0 +1,86 @@ +--- +title: "Vectorized STL Algorithms" +ms.date: "09/19/2025" +helpviewer_keywords: ["Vector Algorithms", "Vectorization", "SIMD"] +--- +# Vectorized STL Algorithms + +Under certain conditions, STL algorithms execute not element-wise, but multiple element at once on a single CPU core. This is possible due to SIMD (single instruction, multiple data). The use of such approach instead of +element-wise approach is called vectorization. The implementation that is not vectorized is called scalar. + +The conditions for vectorization are: + - The container or range is contigous. `array`, `vector`, and `basic_string` are contigous containers, `span` and `basic_string_view` provide conditions ranges. + - There are such SIMD insstructions available for the target platform that implement the particular algorithm on particular element types efficiently. 
Usually this is true for plain types (like built-in integers) and simple operations. + - Either of the following: + - The compiler is capable emiting vectorized machine code for an implementation written as scalar code (auto-vectorization) + - The implementation itself is written as vectorized code (manual vectorization) + +## Auto-vectorization in STL + +See [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer). It applies to the STL implementation code the same way as to user code. + +Algorithms like `transform`, `reduce`, `accumulate` heavily benefit from auto-vectorization. + +## Manual vectorization in STL + +For x64 and x86 targets, certain algorithms have manual vectorization implemented. This implementation is pre-compiled, and uses runtime CPU dispatch, so it is engaged on suitable CPUs only. + +The manually vectorized algorithms use template meta-programming to detect the suitable element types, so they only vectorized for simple types, like standard integer types. + +Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. In case of any problem, you can disable manual vectorization by defining `_USE_STD_VECTOR_ALGORITHMS` macro set to 0. 
+ +The following algorithms have manual vectorization controlled via `_USE_STD_VECTOR_ALGORITHMS` macro: + - `contains` + - `contains_subrange` + - `find` + - `find_last` + - `find_end` + - `find_first_of` + - `adjacent_find` + - `count` + - `mismatch` + - `search` + - `search_n` + - `swap_ranges` + - `replace` + - `remove` + - `remove_copy` + - `unique` + - `unique_copy` + - `reverse` + - `rotate` + - `is_sorted` + - `is_sorted_until` + - `minmax_element` + - `minmax` + - `lexicographical_compare` + - `lexicographical_compare_three_way` + +In addition to algorithms, the macro controls the manual vectorization of: + - `basic_string` and `basic_string_view` members: + - `find` + - `rfind` + - `find_first_of` + - `find_first_not_of` + - `find_last_of` + - `find_last_not_of` + - `bitset` constructors from string and `bitset::to_string` + +## Manually vectorized algorithms for floating point types + +Vectorization of floating point types is connected with extra difficulties: + - For floating point results, the order of operations may matter. Some reordering may yield a different result, whether more precise, or less precise. Vecotization may need operations reordering, so it may affect that. + - Floating point types may contain NaN values, which don't behave transitively while comparing. + - Floating point operations may raise exceptions. + +The STL deals with the first two difficulties safely. Only `minmax_element`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized. These algorithms: + - Do not compute new floating point values, only compare the existing values, so different order does not affect precision. + - As sorting algorithms, require elements transitivity, so NaNs are not allowed as elements. + +There's `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating point types. Set it to 0 to disable the vectorization. The macro has no effect if `_USE_STD_VECTOR_ALGORITHMS` is set to 0. 
+ +`_USE_STD_VECTOR_FLOATING_ALGORITHMS` defaults to 0 when `/fp:except` option is set. This is to avoid problems with exceptions. + +## See also + +[Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) From 4272d7e48c4e459057854ea43a6790ac528fea09 Mon Sep 17 00:00:00 2001 From: Alex Guteniev Date: Fri, 19 Sep 2025 20:31:06 +0300 Subject: [PATCH 02/23] validation errors fix --- docs/standard-library/toc.yml | 1 + docs/standard-library/vectorized-stl-algoritms.md | 1 + 2 files changed, 2 insertions(+) diff --git a/docs/standard-library/toc.yml b/docs/standard-library/toc.yml index b05a170940f..e85567f4724 100644 --- a/docs/standard-library/toc.yml +++ b/docs/standard-library/toc.yml @@ -1589,6 +1589,7 @@ items: - name: Algorithms href: algorithms.md expanded: false + items: - name: Vectorized STL Algorithms href: vectorized-stl-algorithms.md - name: Allocators diff --git a/docs/standard-library/vectorized-stl-algoritms.md b/docs/standard-library/vectorized-stl-algoritms.md index 5bb6cd68f3d..e8e0b6a136e 100644 --- a/docs/standard-library/vectorized-stl-algoritms.md +++ b/docs/standard-library/vectorized-stl-algoritms.md @@ -1,4 +1,5 @@ --- +description: "Vectorized STL Algorithms" title: "Vectorized STL Algorithms" ms.date: "09/19/2025" helpviewer_keywords: ["Vector Algorithms", "Vectorization", "SIMD"] From cc385c5f0dcf78ffeafd16af30cc2cb75efbb8ca Mon Sep 17 00:00:00 2001 From: Alex Guteniev Date: Fri, 19 Sep 2025 20:36:47 +0300 Subject: [PATCH 03/23] Un-nest to make that work --- docs/standard-library/toc.yml | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/docs/standard-library/toc.yml b/docs/standard-library/toc.yml index e85567f4724..a9a033ccd84 100644 --- a/docs/standard-library/toc.yml +++ b/docs/standard-library/toc.yml @@ -1588,10 +1588,8 @@ items: href: iterators.md - name: Algorithms href: algorithms.md - expanded: false - items: - - name: Vectorized STL Algorithms - href: 
vectorized-stl-algorithms.md +- name: Vectorized STL Algorithms + href: vectorized-stl-algorithms.md - name: Allocators href: allocators.md - name: Function objects in the C++ Standard Library From 86a3e29eff5f0cff3c8c2fa91c02e931a3bb3069 Mon Sep 17 00:00:00 2001 From: Alex Guteniev Date: Fri, 19 Sep 2025 20:42:50 +0300 Subject: [PATCH 04/23] Typo in file name --- ...vectorized-stl-algoritms.md => vectorized-stl-algorithms.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename docs/standard-library/{vectorized-stl-algoritms.md => vectorized-stl-algorithms.md} (96%) diff --git a/docs/standard-library/vectorized-stl-algoritms.md b/docs/standard-library/vectorized-stl-algorithms.md similarity index 96% rename from docs/standard-library/vectorized-stl-algoritms.md rename to docs/standard-library/vectorized-stl-algorithms.md index e8e0b6a136e..dbe8f30073d 100644 --- a/docs/standard-library/vectorized-stl-algoritms.md +++ b/docs/standard-library/vectorized-stl-algorithms.md @@ -28,7 +28,7 @@ For x64 and x86 targets, certain algorithms have manual vectorization implemente The manually vectorized algorithms use template meta-programming to detect the suitable element types, so they only vectorized for simple types, like standard integer types. -Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. In case of any problem, you can disable manual vectorization by defining `_USE_STD_VECTOR_ALGORITHMS` macro set to 0. +Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. In case of any problem, you can disable manual vectorization by defining `_USE_STD_VECTOR_ALGORITHMS` macro set to 0. It defaults to 1 on x64 and x86, which means that manually vectorized algorithms are enabled by default. 
The following algorithms have manual vectorization controlled via `_USE_STD_VECTOR_ALGORITHMS` macro: - `contains` From 53ae06f3ccc8576df13079077758633c2543e4d5 Mon Sep 17 00:00:00 2001 From: Alex Guteniev Date: Fri, 19 Sep 2025 20:52:14 +0300 Subject: [PATCH 05/23] Typoes --- docs/standard-library/vectorized-stl-algorithms.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md index dbe8f30073d..f42e7065a94 100644 --- a/docs/standard-library/vectorized-stl-algorithms.md +++ b/docs/standard-library/vectorized-stl-algorithms.md @@ -7,13 +7,13 @@ helpviewer_keywords: ["Vector Algorithms", "Vectorization", "SIMD"] # Vectorized STL Algorithms Under certain conditions, STL algorithms execute not element-wise, but multiple element at once on a single CPU core. This is possible due to SIMD (single instruction, multiple data). The use of such approach instead of -element-wise approach is called vectorization. The implementation that is not vectorized is called scalar. +element-wise approach is called vectorization. An implementation that is not vectorized is called scalar. The conditions for vectorization are: - - The container or range is contigous. `array`, `vector`, and `basic_string` are contigous containers, `span` and `basic_string_view` provide conditions ranges. - - There are such SIMD insstructions available for the target platform that implement the particular algorithm on particular element types efficiently. Usually this is true for plain types (like built-in integers) and simple operations. + - The container or range is contigous. `array`, `vector`, and `basic_string` are contigous containers, `span` and `basic_string_view` provide contiguous ranges. + - There are such SIMD instructions available for the target platform that implement the particular algorithm on particular element types efficiently. 
Often this is true for plain types (like built-in integers) and simple operations. - Either of the following: - - The compiler is capable emiting vectorized machine code for an implementation written as scalar code (auto-vectorization) + - The compiler is capable of emitting vectorized machine code for an implementation written as scalar code (auto-vectorization) - The implementation itself is written as vectorized code (manual vectorization) ## Auto-vectorization in STL @@ -76,7 +76,7 @@ Vectorization of floating point types is connected with extra difficulties: The STL deals with the first two difficulties safely. Only `minmax_element`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized. These algorithms: - Do not compute new floating point values, only compare the existing values, so different order does not affect precision. - - As sorting algorithms, require elements transitivity, so NaNs are not allowed as elements. + - As sorting algorithms, require transitivity of comparisons, so NaNs are not allowed as elements. There's `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating point types. Set it to 0 to disable the vectorization. The macro has no effect if `_USE_STD_VECTOR_ALGORITHMS` is set to 0. From a36c5a8ef181ee51c34b7124930b329d5df075f9 Mon Sep 17 00:00:00 2001 From: Alex Guteniev Date: Sat, 20 Sep 2025 16:14:12 +0300 Subject: [PATCH 06/23] Spelling --- docs/standard-library/vectorized-stl-algorithms.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md index f42e7065a94..01cf8e48e2c 100644 --- a/docs/standard-library/vectorized-stl-algorithms.md +++ b/docs/standard-library/vectorized-stl-algorithms.md @@ -10,7 +10,7 @@ Under certain conditions, STL algorithms execute not element-wise, but multiple element-wise approach is called vectorization. 
An implementation that is not vectorized is called scalar. The conditions for vectorization are: - - The container or range is contigous. `array`, `vector`, and `basic_string` are contigous containers, `span` and `basic_string_view` provide contiguous ranges. + - The container or range is contiguous. `array`, `vector`, and `basic_string` are contiguous containers, `span` and `basic_string_view` provide contiguous ranges. - There are such SIMD instructions available for the target platform that implement the particular algorithm on particular element types efficiently. Often this is true for plain types (like built-in integers) and simple operations. - Either of the following: - The compiler is capable of emitting vectorized machine code for an implementation written as scalar code (auto-vectorization) @@ -70,7 +70,7 @@ In addition to algorithms, the macro controls the manual vectorization of: ## Manually vectorized algorithms for floating point types Vectorization of floating point types is connected with extra difficulties: - - For floating point results, the order of operations may matter. Some reordering may yield a different result, whether more precise, or less precise. Vecotization may need operations reordering, so it may affect that. + - For floating point results, the order of operations may matter. Some reordering may yield a different result, whether more precise, or less precise. Vectotization may need operations reordering, so it may affect that. - Floating point types may contain NaN values, which don't behave transitively while comparing. - Floating point operations may raise exceptions. 
From 6c10dc3d959d1c4fba7a4334cc3be332d0fe2001 Mon Sep 17 00:00:00 2001 From: Alex Guteniev Date: Sat, 20 Sep 2025 16:20:38 +0300 Subject: [PATCH 07/23] Complete the lists --- docs/standard-library/vectorized-stl-algorithms.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md index 01cf8e48e2c..b2266592808 100644 --- a/docs/standard-library/vectorized-stl-algorithms.md +++ b/docs/standard-library/vectorized-stl-algorithms.md @@ -49,10 +49,15 @@ The following algorithms have manual vectorization controlled via `_USE_STD_VECT - `unique` - `unique_copy` - `reverse` + - `reverse_copy` - `rotate` - `is_sorted` - `is_sorted_until` + - `max_element` + - `min_element` - `minmax_element` + - `max` + - `min` - `minmax` - `lexicographical_compare` - `lexicographical_compare_three_way` @@ -74,7 +79,7 @@ Vectorization of floating point types is connected with extra difficulties: - Floating point types may contain NaN values, which don't behave transitively while comparing. - Floating point operations may raise exceptions. -The STL deals with the first two difficulties safely. Only `minmax_element`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized. These algorithms: +The STL deals with the first two difficulties safely. Only `max_element`, `min_element`, `minmax_element`, `max`, `min`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized. These algorithms: - Do not compute new floating point values, only compare the existing values, so different order does not affect precision. - As sorting algorithms, require transitivity of comparisons, so NaNs are not allowed as elements. 
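The NaN restriction in the last bullet can be illustrated with a small sketch. The helpers `incomparable` and `quiet_nan` are hypothetical names introduced only for this illustration; they are not part of the STL and say nothing about how the vectorized algorithms are implemented.

```cpp
#include <limits>

// Hypothetical helper: true when neither a < b nor b < a holds,
// i.e. operator< cannot tell the two values apart.
bool incomparable(double a, double b) {
    return !(a < b) && !(b < a);
}

// Hypothetical helper: returns a quiet NaN of type double.
double quiet_nan() {
    return std::numeric_limits<double>::quiet_NaN();
}
```

With these helpers, `incomparable(1.0, quiet_nan())` and `incomparable(quiet_nan(), 2.0)` both hold, yet `incomparable(1.0, 2.0)` does not. The equivalence induced by `operator<` stops being transitive once a NaN is present, which is why algorithms that rely on an ordering exclude NaN elements.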
From b1600d8e3cf80cfa66563faa548ba8b14bf8fbcf Mon Sep 17 00:00:00 2001 From: Alex Guteniev Date: Sat, 20 Sep 2025 17:27:07 +0300 Subject: [PATCH 08/23] Update docs/standard-library/vectorized-stl-algorithms.md Co-authored-by: Rageking8 --- docs/standard-library/vectorized-stl-algorithms.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md index b2266592808..eb94e128209 100644 --- a/docs/standard-library/vectorized-stl-algorithms.md +++ b/docs/standard-library/vectorized-stl-algorithms.md @@ -1,8 +1,9 @@ --- -description: "Vectorized STL Algorithms" title: "Vectorized STL Algorithms" -ms.date: "09/19/2025" -helpviewer_keywords: ["Vector Algorithms", "Vectorization", "SIMD"] +description: "Learn more about: Vectorized STL Algorithms" +ms.date: 09/19/2025 +f1_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_ALGORITHMS"] +helpviewer_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_ALGORITHMS", "Vector Algorithms", "Vectorization", "SIMD"] --- # Vectorized STL Algorithms From becc300baab1cdf3dae36f50ba33c1380ac7f4f4 Mon Sep 17 00:00:00 2001 From: Alex Guteniev Date: Sat, 20 Sep 2025 17:31:10 +0300 Subject: [PATCH 09/23] Review comments Co-authored-by: Rageking8 --- docs/standard-library/vectorized-stl-algorithms.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md index eb94e128209..05ecd40526a 100644 --- a/docs/standard-library/vectorized-stl-algorithms.md +++ b/docs/standard-library/vectorized-stl-algorithms.md @@ -7,8 +7,7 @@ helpviewer_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_AL --- # Vectorized STL Algorithms -Under certain conditions, STL algorithms execute not element-wise, but multiple element at once on a single CPU core. 
This is possible due to SIMD (single instruction, multiple data). The use of such approach instead of -element-wise approach is called vectorization. An implementation that is not vectorized is called scalar. +Under certain conditions, STL algorithms execute not element-wise, but multiple element at once on a single CPU core. This is possible due to SIMD (single instruction, multiple data). The use of such approach instead of element-wise approach is called vectorization. An implementation that is not vectorized is called scalar. The conditions for vectorization are: - The container or range is contiguous. `array`, `vector`, and `basic_string` are contiguous containers, `span` and `basic_string_view` provide contiguous ranges. @@ -52,7 +51,7 @@ The following algorithms have manual vectorization controlled via `_USE_STD_VECT - `reverse` - `reverse_copy` - `rotate` - - `is_sorted` + - `is_sorted` - `is_sorted_until` - `max_element` - `min_element` @@ -76,7 +75,7 @@ In addition to algorithms, the macro controls the manual vectorization of: ## Manually vectorized algorithms for floating point types Vectorization of floating point types is connected with extra difficulties: - - For floating point results, the order of operations may matter. Some reordering may yield a different result, whether more precise, or less precise. Vectotization may need operations reordering, so it may affect that. + - For floating point results, the order of operations may matter. Some reordering may yield a different result, whether more precise, or less precise. Vectorization may need operations reordering, so it may affect that. - Floating point types may contain NaN values, which don't behave transitively while comparing. - Floating point operations may raise exceptions. @@ -86,7 +85,7 @@ The STL deals with the first two difficulties safely. Only `max_element`, `min_e There's `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating point types. 
Set it to 0 to disable the vectorization. The macro has no effect if `_USE_STD_VECTOR_ALGORITHMS` is set to 0. -`_USE_STD_VECTOR_FLOATING_ALGORITHMS` defaults to 0 when `/fp:except` option is set. This is to avoid problems with exceptions. +`_USE_STD_VECTOR_FLOATING_ALGORITHMS` defaults to 0 when [`/fp:except`](../build/reference/fp-specify-floating-point-behavior.md#except) option is set. This is to avoid problems with exceptions. ## See also From 053dac29f43d62d3a44725d708a738bbb0f6acf6 Mon Sep 17 00:00:00 2001 From: Alex Guteniev Date: Mon, 22 Sep 2025 20:20:02 +0300 Subject: [PATCH 10/23] Update docs/standard-library/vectorized-stl-algorithms.md Co-authored-by: David Justo --- docs/standard-library/vectorized-stl-algorithms.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md index 05ecd40526a..549e8119296 100644 --- a/docs/standard-library/vectorized-stl-algorithms.md +++ b/docs/standard-library/vectorized-stl-algorithms.md @@ -75,7 +75,7 @@ In addition to algorithms, the macro controls the manual vectorization of: ## Manually vectorized algorithms for floating point types Vectorization of floating point types is connected with extra difficulties: - - For floating point results, the order of operations may matter. Some reordering may yield a different result, whether more precise, or less precise. Vectorization may need operations reordering, so it may affect that. + - Vectorization may reorder operations, which can affect the precision of floating point results. - Floating point types may contain NaN values, which don't behave transitively while comparing. - Floating point operations may raise exceptions. 
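To illustrate why reordering matters for floating point results, here is a minimal sketch that compares a strict left-to-right sum with a sum that keeps four independent partial sums, roughly the way a four-lane SIMD reduction might. Both helpers are hypothetical and are not taken from the STL implementation; the sketch assumes IEEE 754 single precision and no fast-math style compiler options.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: adds the elements strictly from left to right.
float sum_in_order(const std::vector<float>& values) {
    float total = 0.0f;
    for (float x : values) {
        total += x;
    }
    return total;
}

// Hypothetical helper: keeps four partial sums (one per "lane") and
// combines them at the end, which changes the order of the additions.
float sum_four_lanes(const std::vector<float>& values) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < values.size(); ++i) {
        lane[i % 4] += values[i];
    }
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

For the input `{1e8f, 1.0f, -1e8f, 1.0f}` the two helpers disagree under the stated assumptions: the in-order sum keeps the final `1.0f`, while the lane-wise sum absorbs both small terms into the large ones before they cancel. Algorithms that only compare existing values, rather than compute new ones, do not have this problem.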
From 8f4b39fd0f00554174452af43f67e469d76127c1 Mon Sep 17 00:00:00 2001 From: Alex Guteniev Date: Mon, 22 Sep 2025 20:23:29 +0300 Subject: [PATCH 11/23] Review feedback Co-authored-by: David Justo --- docs/standard-library/vectorized-stl-algorithms.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md index 549e8119296..3425add3bcc 100644 --- a/docs/standard-library/vectorized-stl-algorithms.md +++ b/docs/standard-library/vectorized-stl-algorithms.md @@ -74,14 +74,14 @@ In addition to algorithms, the macro controls the manual vectorization of: ## Manually vectorized algorithms for floating point types -Vectorization of floating point types is connected with extra difficulties: +Vectorization of floating point types comes with extra difficulties: - Vectorization may reorder operations, which can affect the precision of floating point results. - - Floating point types may contain NaN values, which don't behave transitively while comparing. + - Floating point types may contain `NaN` values, which don't behave transitively on comparisons. - Floating point operations may raise exceptions. The STL deals with the first two difficulties safely. Only `max_element`, `min_element`, `minmax_element`, `max`, `min`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized. These algorithms: - Do not compute new floating point values, only compare the existing values, so different order does not affect precision. - - As sorting algorithms, require transitivity of comparisons, so NaNs are not allowed as elements. + - Because they are sorting algorithms, `NaNs` are not allowed amongst the operands. There's `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating point types. Set it to 0 to disable the vectorization. The macro has no effect if `_USE_STD_VECTOR_ALGORITHMS` is set to 0. 
From 6d98a0e94135c1d34a7f6b5ba5eb1a209035a229 Mon Sep 17 00:00:00 2001 From: Alex Guteniev Date: Wed, 24 Sep 2025 20:36:51 +0300 Subject: [PATCH 12/23] STL review feedback Co-authored-by: StephanTLavavej --- docs/standard-library/vectorized-stl-algorithms.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md index 3425add3bcc..9a02386a275 100644 --- a/docs/standard-library/vectorized-stl-algorithms.md +++ b/docs/standard-library/vectorized-stl-algorithms.md @@ -10,7 +10,7 @@ helpviewer_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_AL Under certain conditions, STL algorithms execute not element-wise, but multiple element at once on a single CPU core. This is possible due to SIMD (single instruction, multiple data). The use of such approach instead of element-wise approach is called vectorization. An implementation that is not vectorized is called scalar. The conditions for vectorization are: - - The container or range is contiguous. `array`, `vector`, and `basic_string` are contiguous containers, `span` and `basic_string_view` provide contiguous ranges. + - The container or range is contiguous. `array`, `vector`, and `basic_string` are contiguous containers, `span` and `basic_string_view` provide contiguous ranges. Built-in array elements also form contiguous range. In contrast, `list` and `map` are not contiguous containers. - There are such SIMD instructions available for the target platform that implement the particular algorithm on particular element types efficiently. Often this is true for plain types (like built-in integers) and simple operations. 
- Either of the following: - The compiler is capable of emitting vectorized machine code for an implementation written as scalar code (auto-vectorization) @@ -18,15 +18,15 @@ The conditions for vectorization are: ## Auto-vectorization in STL -See [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer). It applies to the STL implementation code the same way as to user code. +See [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) and the discussion of [`/arch`](../build/reference/arch-minimum-cpu-architecture.md) switch there. It applies to the STL implementation code the same way as to user code. Algorithms like `transform`, `reduce`, `accumulate` heavily benefit from auto-vectorization. ## Manual vectorization in STL -For x64 and x86 targets, certain algorithms have manual vectorization implemented. This implementation is pre-compiled, and uses runtime CPU dispatch, so it is engaged on suitable CPUs only. +For x64 and x86 targets, certain algorithms have manual vectorization implemented. This implementation is separately compiled, and uses runtime CPU dispatch, so it is engaged on suitable CPUs only. -The manually vectorized algorithms use template meta-programming to detect the suitable element types, so they only vectorized for simple types, like standard integer types. +The manually vectorized algorithms use template meta-programming to detect the suitable element types, so they are only vectorized for simple types, like standard integer types. Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. In case of any problem, you can disable manual vectorization by defining `_USE_STD_VECTOR_ALGORITHMS` macro set to 0. It defaults to 1 on x64 and x86, which means that manually vectorized algorithms are enabled by default. 
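A minimal sketch of opting out of manual vectorization in a single source file, assuming the macro has to be visible before the first standard header is included. `count_matches` and `first_index_of` are hypothetical helpers introduced for illustration. The results are the same whether or not the vectorized code path is used; only the implementation selected by the library differs.

```cpp
// Assumption: the macro is read by the MSVC STL headers, so it is defined
// before any standard header. Other standard libraries simply ignore it.
#define _USE_STD_VECTOR_ALGORITHMS 0

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts elements equal to `value` in a contiguous
// container. std::count is one of the algorithms listed as manually vectorized.
std::size_t count_matches(const std::vector<int>& data, int value) {
    return static_cast<std::size_t>(std::count(data.begin(), data.end(), value));
}

// Hypothetical helper: index of the first element equal to `value`,
// or data.size() when no such element exists.
std::size_t first_index_of(const std::vector<int>& data, int value) {
    return static_cast<std::size_t>(
        std::find(data.begin(), data.end(), value) - data.begin());
}
```

Defining the macro in source is shown here only to keep the sketch self-contained. A compiler-level definition such as `/D_USE_STD_VECTOR_ALGORITHMS=0` is likely the safer choice, because it applies the same value to every translation unit.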
From 59f997725c8db06fee2bfeaf7d0b0c5efd31afb4 Mon Sep 17 00:00:00 2001 From: Alex Guteniev Date: Wed, 24 Sep 2025 21:01:49 +0300 Subject: [PATCH 13/23] Spelling --- docs/standard-library/vectorized-stl-algorithms.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md index 9a02386a275..ad709792fb1 100644 --- a/docs/standard-library/vectorized-stl-algorithms.md +++ b/docs/standard-library/vectorized-stl-algorithms.md @@ -7,10 +7,10 @@ helpviewer_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_AL --- # Vectorized STL Algorithms -Under certain conditions, STL algorithms execute not element-wise, but multiple element at once on a single CPU core. This is possible due to SIMD (single instruction, multiple data). The use of such approach instead of element-wise approach is called vectorization. An implementation that is not vectorized is called scalar. +Under certain conditions, STL algorithms execute not element-wise, but multiple elements at once on a single CPU core. This is possible due to SIMD (single instruction, multiple data). The use of such an approach instead of element-wise approach is called vectorization. An implementation that is not vectorized is called scalar. The conditions for vectorization are: - - The container or range is contiguous. `array`, `vector`, and `basic_string` are contiguous containers, `span` and `basic_string_view` provide contiguous ranges. Built-in array elements also form contiguous range. In contrast, `list` and `map` are not contiguous containers. + - The container or range is contiguous. `array`, `vector`, and `basic_string` are contiguous containers, `span` and `basic_string_view` provide contiguous ranges. Built-in array elements also form contiguous ranges. In contrast, `list` and `map` are not contiguous containers. 
- There are such SIMD instructions available for the target platform that implement the particular algorithm on particular element types efficiently. Often this is true for plain types (like built-in integers) and simple operations. - Either of the following: - The compiler is capable of emitting vectorized machine code for an implementation written as scalar code (auto-vectorization) @@ -28,7 +28,7 @@ For x64 and x86 targets, certain algorithms have manual vectorization implemente The manually vectorized algorithms use template meta-programming to detect the suitable element types, so they are only vectorized for simple types, like standard integer types. -Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. In case of any problem, you can disable manual vectorization by defining `_USE_STD_VECTOR_ALGORITHMS` macro set to 0. It defaults to 1 on x64 and x86, which means that manually vectorized algorithms are enabled by default. +Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. In case of any problem, you can disable manual vectorization by defining the `_USE_STD_VECTOR_ALGORITHMS` macro set to 0. It defaults to 1 on x64 and x86, which means that manually vectorized algorithms are enabled by default. 
The following algorithms have manual vectorization controlled via `_USE_STD_VECTOR_ALGORITHMS` macro: - `contains` From ce5dca49a827caa3abbf4d7a85f096f9fa1e6f71 Mon Sep 17 00:00:00 2001 From: Alex Guteniev Date: Thu, 25 Sep 2025 14:58:08 +0300 Subject: [PATCH 14/23] Global macro --- docs/standard-library/vectorized-stl-algorithms.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md index ad709792fb1..65e7d3b6659 100644 --- a/docs/standard-library/vectorized-stl-algorithms.md +++ b/docs/standard-library/vectorized-stl-algorithms.md @@ -30,6 +30,8 @@ The manually vectorized algorithms use template meta-programming to detect the s Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. In case of any problem, you can disable manual vectorization by defining the `_USE_STD_VECTOR_ALGORITHMS` macro set to 0. It defaults to 1 on x64 and x86, which means that manually vectorized algorithms are enabled by default. +When overriding `_USE_STD_VECTOR_ALGORITHMS` make sure to set the same value for all linked translation units that use algorithms. Reliable way to achieve that is using project properties rather than defining it in the source. + The following algorithms have manual vectorization controlled via `_USE_STD_VECTOR_ALGORITHMS` macro: - `contains` - `contains_subrange` @@ -87,6 +89,8 @@ There's `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vector `_USE_STD_VECTOR_FLOATING_ALGORITHMS` defaults to 0 when [`/fp:except`](../build/reference/fp-specify-floating-point-behavior.md#except) option is set. This is to avoid problems with exceptions. +When overriding `_USE_STD_VECTOR_FLOATING_ALGORITHMS` make sure to set the same value for all linked translation units that use algorithms. Reliable way to achieve that is using project properties rather than defining it in the source. 
+
 ## See also
 
 [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer)
From eff4d936720032e6e8cbb20a7b0b5b24a85e16da Mon Sep 17 00:00:00 2001
From: Alex Guteniev
Date: Wed, 1 Oct 2025 16:59:02 +0300
Subject: [PATCH 15/23] Link to documentation on how to set macro globally

---
 docs/standard-library/vectorized-stl-algorithms.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md
index 65e7d3b6659..df6b030e196 100644
--- a/docs/standard-library/vectorized-stl-algorithms.md
+++ b/docs/standard-library/vectorized-stl-algorithms.md
@@ -30,7 +30,7 @@ The manually vectorized algorithms use template meta-programming to detect the s
 
 Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. In case of any problem, you can disable manual vectorization by defining the `_USE_STD_VECTOR_ALGORITHMS` macro set to 0. It defaults to 1 on x64 and x86, which means that manually vectorized algorithms are enabled by default.
 
-When overriding `_USE_STD_VECTOR_ALGORITHMS` make sure to set the same value for all linked translation units that use algorithms. Reliable way to achieve that is using project properties rather than defining it in the source.
+When overriding `_USE_STD_VECTOR_ALGORITHMS` make sure to set the same value for all linked translation units that use algorithms. Reliable way to achieve that is using project properties rather than defining it in the source. See [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md) compiler option.
 
 The following algorithms have manual vectorization controlled via `_USE_STD_VECTOR_ALGORITHMS` macro:
  - `contains`
From d6f6bd92d63a4abc3aa48972ea2f7dd5cb860070 Mon Sep 17 00:00:00 2001
From: TylerMSFT
Date: Wed, 1 Oct 2025 16:13:57 -0700
Subject: [PATCH 16/23] touch

---
 docs/standard-library/vectorized-stl-algorithms.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md
index df6b030e196..a71b3813405 100644
--- a/docs/standard-library/vectorized-stl-algorithms.md
+++ b/docs/standard-library/vectorized-stl-algorithms.md
@@ -1,7 +1,7 @@
 ---
 title: "Vectorized STL Algorithms"
 description: "Learn more about: Vectorized STL Algorithms"
-ms.date: 09/19/2025
+ms.date: 10/1/2025
 f1_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_ALGORITHMS"]
 helpviewer_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_ALGORITHMS", "Vector Algorithms", "Vectorization", "SIMD"]
 ---
From f8904208ab97b59070fe9c2aeb64adc5bf1ab9bd Mon Sep 17 00:00:00 2001
From: TylerMSFT
Date: Wed, 1 Oct 2025 17:11:57 -0700
Subject: [PATCH 17/23] edit pass

---
 .../vectorized-stl-algorithms.md | 90 ++++++++-----------
 1 file changed, 37 insertions(+), 53 deletions(-)

diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md
index a71b3813405..6f5110236f7 100644
--- a/docs/standard-library/vectorized-stl-algorithms.md
+++ b/docs/standard-library/vectorized-stl-algorithms.md
@@ -7,89 +7,73 @@ helpviewer_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_AL
 ---
 # Vectorized STL Algorithms
 
-Under certain conditions, STL algorithms execute not element-wise, but multiple elements at once on a single CPU core. This is possible due to SIMD (single instruction, multiple data). The use of such an approach instead of element-wise approach is called vectorization. An implementation that is not vectorized is called scalar.
+Under specific conditions, algorithms in the C++ Standard Template Library (STL) can process multiple elements simultaneously on a single CPU core, rather than handling each element individually. This optimization leverages single instruction, multiple data (SIMD) instructions provided by the CPU, a technique known as vectorization. When this optimization isn't applied, the implementation is referred to as scalar.
 
-The conditions for vectorization are:
- - The container or range is contiguous. `array`, `vector`, and `basic_string` are contiguous containers, `span` and `basic_string_view` provide contiguous ranges. Built-in array elements also form contiguous ranges. In contrast, `list` and `map` are not contiguous containers.
- - There are such SIMD instructions available for the target platform that implement the particular algorithm on particular element types efficiently. Often this is true for plain types (like built-in integers) and simple operations.
- - Either of the following:
-   - The compiler is capable of emitting vectorized machine code for an implementation written as scalar code (auto-vectorization)
-   - The implementation itself is written as vectorized code (manual vectorization)
+The conditions required for vectorization are:
+ - The container or range must be contiguous. Examples of contiguous containers include `array`, `vector`, and `basic_string`. Contiguous ranges are provided by types like `span` and `basic_string_view`. Built-in arrays also form contiguous ranges. In contrast, containers like `list` and `map` aren't contiguous.
+ - The target platform must support the necessary SIMD instructions to implement the algorithm for the element types. This is typically true for intrinsic types (like built-in integers) and simple operations.
+ - One of the following conditions must be met:
+   - The compiler can emit vectorized machine code for an implementation written as scalar code (auto-vectorization).
+   - The algorithm's implementation is explicitly written to use vectorized code (manual vectorization).
 
-## Auto-vectorization in STL
+## Auto-vectorization in the STL
 
-See [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) and the discussion of [`/arch`](../build/reference/arch-minimum-cpu-architecture.md) switch there. It applies to the STL implementation code the same way as to user code.
+For more information about automatic vectorization, see [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) and the discussion about in that article about the [`/arch`](../build/reference/arch-minimum-cpu-architecture.md) switch. This applies to the STL implementation code the same way as to user code.
 
 Algorithms like `transform`, `reduce`, `accumulate` heavily benefit from auto-vectorization.
 
-## Manual vectorization in STL
+## Manual vectorization in the STL
 
-For x64 and x86 targets, certain algorithms have manual vectorization implemented. This implementation is separately compiled, and uses runtime CPU dispatch, so it is engaged on suitable CPUs only.
+For x64 and x86, certain algorithms include manual vectorization. This implementation is separately compiled, and uses runtime CPU dispatch, so it's only used on suitable CPUs.
 
-The manually vectorized algorithms use template meta-programming to detect the suitable element types, so they are only vectorized for simple types, like standard integer types.
+Manually vectorized algorithms use template meta-programming to detect whether the element type is suitable for vectorization. As a result, they're only vectorized for simple types such as standard integer types.
 
-Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. In case of any problem, you can disable manual vectorization by defining the `_USE_STD_VECTOR_ALGORITHMS` macro set to 0. It defaults to 1 on x64 and x86, which means that manually vectorized algorithms are enabled by default.
+Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. You can disable manual vectorization with `#define _USE_STD_VECTOR_ALGORITHMS=0'. Manually vectorized algorithms are enabled by default on x64 and x86 because it defaults to 1 on those platforms.
 
-When overriding `_USE_STD_VECTOR_ALGORITHMS` make sure to set the same value for all linked translation units that use algorithms. Reliable way to achieve that is using project properties rather than defining it in the source. See [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md) compiler option.
+To set `_USE_STD_VECTOR_ALGORITHMS` ensure that it's set to the same value for all linked translation units that use algorithms. A reliable way to achieve this to set it using in the project properties instead of in source. For more information about how to do that, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
 
-The following algorithms have manual vectorization controlled via `_USE_STD_VECTOR_ALGORITHMS` macro:
- - `contains`
- - `contains_subrange`
- - `find`
- - `find_last`
- - `find_end`
- - `find_first_of`
- - `adjacent_find`
+The following algorithms are manually vectorized and their behavior is controlled by the `_USE_STD_VECTOR_ALGORITHMS` macro:
+ - `contains`, `contains_subrange`
+ - `find`, `find_last`, `find_end`, `find_first_of`, `adjacent_find`
  - `count`
  - `mismatch`
- - `search`
- - `search_n`
+ - `search`, `search_n`
  - `swap_ranges`
  - `replace`
- - `remove`
- - `remove_copy`
- - `unique`
- - `unique_copy`
- - `reverse`
- - `reverse_copy`
+ - `remove`, `remove_copy`
+ - `unique`, `unique_copy`
+ - `reverse`, `reverse_copy`
  - `rotate`
- - `is_sorted`
- - `is_sorted_until`
- - `max_element`
- - `min_element`
- - `minmax_element`
- - `max`
- - `min`
- - `minmax`
- - `lexicographical_compare`
- - `lexicographical_compare_three_way`
-
-In addition to algorithms, the macro controls the manual vectorization of:
+ - `is_sorted`, `is_sorted_until`
+ - `lexicographical_compare`, `lexicographical_compare_three_way`
+ - `max`, `min`, `minmax`
+ - `max_element`, `min_element`, `minmax_element`
+
+In addition to algorithms, the `_USE_STD_VECTOR_ALGORITHMS` macro controls the manual vectorization of:
+
  - `basic_string` and `basic_string_view` members:
    - `find`
    - `rfind`
-   - `find_first_of`
-   - `find_first_not_of`
-   - `find_last_of`
-   - `find_last_not_of`
+   - `find_first_of`, `find_first_not_of`
+   - `find_last_of`, `find_last_not_of`
  - `bitset` constructors from string and `bitset::to_string`
 
 ## Manually vectorized algorithms for floating point types
 
-Vectorization of floating point types comes with extra difficulties:
+Vectorization of floating point types requires additional considerations:
  - Vectorization may reorder operations, which can affect the precision of floating point results.
  - Floating point types may contain `NaN` values, which don't behave transitively on comparisons.
  - Floating point operations may raise exceptions.
 
-The STL deals with the first two difficulties safely. Only `max_element`, `min_element`, `minmax_element`, `max`, `min`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized. These algorithms:
- - Do not compute new floating point values, only compare the existing values, so different order does not affect precision.
- - Because they are sorting algorithms, `NaNs` are not allowed amongst the operands.
+The STL deals with the first two considerations safely. Only `max_element`, `min_element`, `minmax_element`, `max`, `min`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized. These algorithms:
+ - Don't compute new floating point values, only compare the existing values, so different order does not affect precision.
+ - Because they're sorting algorithms, `NaNs` isn't an allowed operand.
 
-There's `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating point types. Set it to 0 to disable the vectorization. The macro has no effect if `_USE_STD_VECTOR_ALGORITHMS` is set to 0.
+Use `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating point types. Set it to 0 to disable vectorization. `_USE_STD_VECTOR_FLOATING_ALGORITHMS` has no effect if `_USE_STD_VECTOR_ALGORITHMS` is set to 0.
 
-`_USE_STD_VECTOR_FLOATING_ALGORITHMS` defaults to 0 when [`/fp:except`](../build/reference/fp-specify-floating-point-behavior.md#except) option is set. This is to avoid problems with exceptions.
+`_USE_STD_VECTOR_FLOATING_ALGORITHMS` defaults to 0 when [`/fp:except`](../build/reference/fp-specify-floating-point-behavior.md#except) is set.
 
-When overriding `_USE_STD_VECTOR_FLOATING_ALGORITHMS` make sure to set the same value for all linked translation units that use algorithms. Reliable way to achieve that is using project properties rather than defining it in the source.
+To set `_USE_STD_VECTOR_FLOATING_ALGORITHMS` ensure that it's set to the same value for all linked translation units that use algorithms. A reliable way to achieve this to set it using in the project properties instead of in source. For more information about how to do that, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
 
 ## See also
 
From bc3d9cf59086304f7b68802b3d66537b55139726 Mon Sep 17 00:00:00 2001
From: TylerMSFT
Date: Wed, 1 Oct 2025 17:39:24 -0700
Subject: [PATCH 18/23] edits

---
 docs/standard-library/vectorized-stl-algorithms.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md
index 6f5110236f7..cbfe7078965 100644
--- a/docs/standard-library/vectorized-stl-algorithms.md
+++ b/docs/standard-library/vectorized-stl-algorithms.md
@@ -7,7 +7,7 @@ helpviewer_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_AL
 ---
 # Vectorized STL Algorithms
 
-Under specific conditions, algorithms in the C++ Standard Template Library (STL) can process multiple elements simultaneously on a single CPU core, rather than handling each element individually. This optimization leverages single instruction, multiple data (SIMD) instructions provided by the CPU, a technique known as vectorization. When this optimization isn't applied, the implementation is referred to as scalar.
+Under specific conditions, algorithms in the C++ Standard Template Library (STL) can process multiple elements simultaneously on a single CPU core, rather than handling each element individually. This optimization uses single instruction, multiple data (SIMD) instructions provided by the CPU, a technique known as vectorization. When this optimization isn't applied, the implementation is referred to as scalar.
 
 The conditions required for vectorization are:
  - The container or range must be contiguous. Examples of contiguous containers include `array`, `vector`, and `basic_string`. Contiguous ranges are provided by types like `span` and `basic_string_view`. Built-in arrays also form contiguous ranges. In contrast, containers like `list` and `map` aren't contiguous.
@@ -24,15 +24,16 @@ Algorithms like `transform`, `reduce`, `accumulate` heavily benefit from auto-ve
 
 ## Manual vectorization in the STL
 
-For x64 and x86, certain algorithms include manual vectorization. This implementation is separately compiled, and uses runtime CPU dispatch, so it's only used on suitable CPUs.
+For x64 and x86, certain algorithms include manual vectorization. This implementation is separately compiled and relies on runtime CPU dispatch, so it applies only to suitable CPUs.
 
 Manually vectorized algorithms use template meta-programming to detect whether the element type is suitable for vectorization. As a result, they're only vectorized for simple types such as standard integer types.
 
 Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. You can disable manual vectorization with `#define _USE_STD_VECTOR_ALGORITHMS=0'. Manually vectorized algorithms are enabled by default on x64 and x86 because it defaults to 1 on those platforms.
 
-To set `_USE_STD_VECTOR_ALGORITHMS` ensure that it's set to the same value for all linked translation units that use algorithms. A reliable way to achieve this to set it using in the project properties instead of in source. For more information about how to do that, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
+Ensure that you assign the same value to `_USE_STD_VECTOR_ALGORITHMS` for all linked translation units that use algorithms. A reliable way to do this is by configuring it in the project properties instead of in the source code. For more information about how to configure it, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
 
-The following algorithms are manually vectorized and their behavior is controlled by the `_USE_STD_VECTOR_ALGORITHMS` macro:
+
+The `_USE_STD_VECTOR_ALGORITHMS` macro determines the behavior of the following manually vectorized algorithms:
  - `contains`, `contains_subrange`
  - `find`, `find_last`, `find_end`, `find_first_of`, `adjacent_find`
  - `count`
@@ -60,7 +61,7 @@ In addition to algorithms, the `_USE_STD_VECTOR_ALGORITHMS` macro controls the m
 
 ## Manually vectorized algorithms for floating point types
 
-Vectorization of floating point types requires additional considerations:
+Vectorization of floating-point types involves specific considerations:
  - Vectorization may reorder operations, which can affect the precision of floating point results.
  - Floating point types may contain `NaN` values, which don't behave transitively on comparisons.
  - Floating point operations may raise exceptions.
@@ -73,7 +74,7 @@ Use `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized
 
 `_USE_STD_VECTOR_FLOATING_ALGORITHMS` defaults to 0 when [`/fp:except`](../build/reference/fp-specify-floating-point-behavior.md#except) is set.
 
-To set `_USE_STD_VECTOR_FLOATING_ALGORITHMS` ensure that it's set to the same value for all linked translation units that use algorithms. A reliable way to achieve this to set it using in the project properties instead of in source. For more information about how to do that, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
+Ensure that you assign the same value to `_USE_STD_VECTOR_FLOATING_ALGORITHMS` for all linked translation units that use algorithms. A reliable way to do this is by configuring it in the project properties instead of in the source code. For more information about how to configure it, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
 
 ## See also
 
From 279a1dc0830a2454e2b86c17d8ef0a8da345815b Mon Sep 17 00:00:00 2001
From: TylerMSFT
Date: Fri, 3 Oct 2025 10:30:33 -0700
Subject: [PATCH 19/23] tech review

---
 docs/standard-library/vectorized-stl-algorithms.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md
index cbfe7078965..9e2ce9826a4 100644
--- a/docs/standard-library/vectorized-stl-algorithms.md
+++ b/docs/standard-library/vectorized-stl-algorithms.md
@@ -1,7 +1,7 @@
 ---
 title: "Vectorized STL Algorithms"
 description: "Learn more about: Vectorized STL Algorithms"
-ms.date: 10/1/2025
+ms.date: 10/03/2025
 f1_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_ALGORITHMS"]
 helpviewer_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_ALGORITHMS", "Vector Algorithms", "Vectorization", "SIMD"]
 ---
@@ -11,14 +11,14 @@ Under specific conditions, algorithms in the C++ Standard Template Library (STL)
 
 The conditions required for vectorization are:
  - The container or range must be contiguous. Examples of contiguous containers include `array`, `vector`, and `basic_string`. Contiguous ranges are provided by types like `span` and `basic_string_view`. Built-in arrays also form contiguous ranges. In contrast, containers like `list` and `map` aren't contiguous.
- - The target platform must support the necessary SIMD instructions to implement the algorithm for the element types. This is typically true for intrinsic types (like built-in integers) and simple operations.
+ - The target platform must support the necessary SIMD instructions to implement the algorithm for the element types. This is typically true for arithmetic types and simple operations.
  - One of the following conditions must be met:
    - The compiler can emit vectorized machine code for an implementation written as scalar code (auto-vectorization).
    - The algorithm's implementation is explicitly written to use vectorized code (manual vectorization).
 
 ## Auto-vectorization in the STL
 
-For more information about automatic vectorization, see [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) and the discussion about in that article about the [`/arch`](../build/reference/arch-minimum-cpu-architecture.md) switch. This applies to the STL implementation code the same way as to user code.
+For more information about automatic vectorization, see [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) and the discussion in that article about the [`/arch`](../build/reference/arch-minimum-cpu-architecture.md) switch. This applies to the STL implementation code the same way it does to user code.
 
 Algorithms like `transform`, `reduce`, `accumulate` heavily benefit from auto-vectorization.
 
@@ -28,7 +28,7 @@ For x64 and x86, certain algorithms include manual vectorization. This implement
 
 Manually vectorized algorithms use template meta-programming to detect whether the element type is suitable for vectorization. As a result, they're only vectorized for simple types such as standard integer types.
 
-Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. You can disable manual vectorization with `#define _USE_STD_VECTOR_ALGORITHMS=0'. Manually vectorized algorithms are enabled by default on x64 and x86 because it defaults to 1 on those platforms.
+Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. You can disable manual vectorization by defining `_USE_STD_VECTOR_ALGORITHMS=0` in your project. Manually vectorized algorithms are enabled by default on x64 and x86 because it defaults to 1 on those platforms.
 
 Ensure that you assign the same value to `_USE_STD_VECTOR_ALGORITHMS` for all linked translation units that use algorithms. A reliable way to do this is by configuring it in the project properties instead of in the source code. For more information about how to configure it, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
 
@@ -67,8 +67,8 @@ Vectorization of floating-point types involves specific considerations:
  - Floating point operations may raise exceptions.
 
 The STL deals with the first two considerations safely. Only `max_element`, `min_element`, `minmax_element`, `max`, `min`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized. These algorithms:
- - Don't compute new floating point values, only compare the existing values, so different order does not affect precision.
- - Because they're sorting algorithms, `NaNs` isn't an allowed operand.
+- Avoid computing new floating-point values; instead, they compare existing values to ensure that differences in operation order don't impact precision.
+- Since these are sorting algorithms, `NaN` values aren't allowed inputs.
 
 Use `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating point types. Set it to 0 to disable vectorization. `_USE_STD_VECTOR_FLOATING_ALGORITHMS` has no effect if `_USE_STD_VECTOR_ALGORITHMS` is set to 0.
 
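A minimal sketch, not part of the patch series, of what the macro guidance in the documented text amounts to for user code. It assumes the macro is set build-wide (for example with `/D_USE_STD_VECTOR_ALGORITHMS=0`, as the linked /D article describes) and not with a per-file `#define`. The helper names `vector_algorithms_setting`, `count_value`, and `find_index` are hypothetical and exist only for illustration. `count` and `find` are taken from the list of algorithms the macro controls; the results should be the same with either setting, since only the implementation path changes.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: reports how this translation unit sees the macro.
// With the MSVC STL the headers above provide a default when the build
// doesn't set one; other standard libraries leave the macro undefined.
inline std::string vector_algorithms_setting() {
#if !defined(_USE_STD_VECTOR_ALGORITHMS)
    return "not defined";
#elif _USE_STD_VECTOR_ALGORITHMS
    return "enabled";
#else
    return "disabled";
#endif
}

// Hypothetical helper: counts occurrences of `value` in a contiguous range of ints.
inline std::ptrdiff_t count_value(const std::vector<int>& data, int value) {
    return std::count(data.begin(), data.end(), value);
}

// Hypothetical helper: index of the first match, or data.size() if absent.
inline std::size_t find_index(const std::vector<int>& data, int value) {
    return static_cast<std::size_t>(
        std::find(data.begin(), data.end(), value) - data.begin());
}
```

Calling `vector_algorithms_setting()` from each translation unit in a debug build is one way to spot a mismatch; it is only a diagnostic aid and doesn't replace setting the value in the project properties.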
From 27b7e12a1fb239c83d845c68979b3e86f65f7c00 Mon Sep 17 00:00:00 2001
From: TylerMSFT
Date: Fri, 3 Oct 2025 10:33:06 -0700
Subject: [PATCH 20/23] edit

---
 docs/standard-library/vectorized-stl-algorithms.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md
index 9e2ce9826a4..47356b389ff 100644
--- a/docs/standard-library/vectorized-stl-algorithms.md
+++ b/docs/standard-library/vectorized-stl-algorithms.md
@@ -67,7 +67,7 @@ Vectorization of floating-point types involves specific considerations:
  - Floating point operations may raise exceptions.
 
 The STL deals with the first two considerations safely. Only `max_element`, `min_element`, `minmax_element`, `max`, `min`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized. These algorithms:
-- Avoid computing new floating-point values; instead, they compare existing values to ensure that differences in operation order don't impact precision.
+- Don’t compute new floating-point values. Instead, they only compare existing values to ensure that differences in operation order don't impact precision.
 - Since these are sorting algorithms, `NaN` values aren't allowed inputs.
 
 Use `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating point types. Set it to 0 to disable vectorization. `_USE_STD_VECTOR_FLOATING_ALGORITHMS` has no effect if `_USE_STD_VECTOR_ALGORITHMS` is set to 0.
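A small illustration, not part of the patch series, of the first bullet in the documented text: the floating-point algorithms named there only compare existing values, so what they return is an element already in the input and not a newly computed number. The `FloatSummary` type and `summarize` function are hypothetical names for this sketch, which assumes a non-empty input that contains no `NaN` values.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical result type for this sketch.
struct FloatSummary {
    float smallest;
    float largest;
    bool sorted;
};

// Assumes `values` is non-empty and contains no NaN values.
inline FloatSummary summarize(const std::vector<float>& values) {
    // minmax_element returns iterators into `values`; dereferencing them
    // yields copies of existing elements, so no arithmetic is performed.
    const auto extremes = std::minmax_element(values.begin(), values.end());
    return {*extremes.first, *extremes.second,
            std::is_sorted(values.begin(), values.end())};
}
```

Because the results are copies of input elements, comparing them to the original values with `==` is exact, which would not generally hold for an algorithm such as a sum whose result depends on the order of operations.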
From 70e0fe5f1be98ba9d1fdf412cd383dbb785e01a3 Mon Sep 17 00:00:00 2001
From: TylerMSFT
Date: Fri, 3 Oct 2025 10:56:48 -0700
Subject: [PATCH 21/23] edit pass

---
 .../vectorized-stl-algorithms.md | 40 +++++++++----------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md
index 47356b389ff..ff4bceb6546 100644
--- a/docs/standard-library/vectorized-stl-algorithms.md
+++ b/docs/standard-library/vectorized-stl-algorithms.md
@@ -7,33 +7,33 @@ helpviewer_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_AL
 ---
 # Vectorized STL Algorithms
 
-Under specific conditions, algorithms in the C++ Standard Template Library (STL) can process multiple elements simultaneously on a single CPU core, rather than handling each element individually. This optimization uses single instruction, multiple data (SIMD) instructions provided by the CPU, a technique known as vectorization. When this optimization isn't applied, the implementation is referred to as scalar.
+Under specific conditions, algorithms in the C++ Standard Template Library (STL) can process multiple elements simultaneously on a single CPU core, rather than handling each element individually. This optimization uses single instruction, multiple data (SIMD) instructions provided by the CPU, a technique called vectorization. When this optimization isn't applied, the implementation is referred to as scalar.
 
 The conditions required for vectorization are:
- - The container or range must be contiguous. Examples of contiguous containers include `array`, `vector`, and `basic_string`. Contiguous ranges are provided by types like `span` and `basic_string_view`. Built-in arrays also form contiguous ranges. In contrast, containers like `list` and `map` aren't contiguous.
+ - The container or range must be contiguous. Examples include `array`, `vector`, and `basic_string`. Types like `span` and `basic_string_view` provide contiguous ranges. Built-in arrays also form contiguous ranges. Containers like `list` and `map` aren't contiguous.
  - The target platform must support the necessary SIMD instructions to implement the algorithm for the element types. This is typically true for arithmetic types and simple operations.
- - One of the following conditions must be met:
+ - One of these conditions must be met:
    - The compiler can emit vectorized machine code for an implementation written as scalar code (auto-vectorization).
-   - The algorithm's implementation is explicitly written to use vectorized code (manual vectorization).
+   - The algorithm's implementation explicitly uses vectorized code (manual vectorization).
 
 ## Auto-vectorization in the STL
 
-For more information about automatic vectorization, see [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) and the discussion in that article about the [`/arch`](../build/reference/arch-minimum-cpu-architecture.md) switch. This applies to the STL implementation code the same way it does to user code.
+For more information about automatic vectorization, see [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) and the discussion in that article about the [`/arch`](../build/reference/arch-minimum-cpu-architecture.md) switch. This applies to the STL implementation code the same way it applies to user code.
 
-Algorithms like `transform`, `reduce`, `accumulate` heavily benefit from auto-vectorization.
+Algorithms like `transform`, `reduce`, and `accumulate` heavily benefit from auto-vectorization.
 
 ## Manual vectorization in the STL
 
-For x64 and x86, certain algorithms include manual vectorization. This implementation is separately compiled and relies on runtime CPU dispatch, so it applies only to suitable CPUs.
+Certain algorithms for x64 and x86 include manual vectorization. This implementation is separately compiled and relies on runtime CPU dispatch, so it applies only to suitable CPUs.
 
-Manually vectorized algorithms use template meta-programming to detect whether the element type is suitable for vectorization. As a result, they're only vectorized for simple types such as standard integer types.
+Manually vectorized algorithms use template metaprogramming to detect if the element type is suitable for vectorization. As a result, they're only vectorized for simple types such as standard integer types.
 
-Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. You can disable manual vectorization by defining `_USE_STD_VECTOR_ALGORITHMS=0` in your project. Manually vectorized algorithms are enabled by default on x64 and x86 because it defaults to 1 on those platforms.
+Programs generally either benefit in performance from manual vectorization or remain unaffected by it. Disable manual vectorization by defining `_USE_STD_VECTOR_ALGORITHMS=0` in your project. Manually vectorized algorithms are enabled by default on x64 and x86 because it defaults to 1 on those platforms.
 
-Ensure that you assign the same value to `_USE_STD_VECTOR_ALGORITHMS` for all linked translation units that use algorithms. A reliable way to do this is by configuring it in the project properties instead of in the source code. For more information about how to configure it, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
+Assign the same value to `_USE_STD_VECTOR_ALGORITHMS` for all linked translation units that use algorithms. Configure it in the project properties instead of in the source code for consistency. For more information about how to configure it, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
 
 
-The `_USE_STD_VECTOR_ALGORITHMS` macro determines the behavior of the following manually vectorized algorithms: +The `_USE_STD_VECTOR_ALGORITHMS` macro controls the behavior of these manually vectorized algorithms: - `contains`, `contains_subrange` - `find`, `find_last`, `find_end`, `find_first_of`, `adjacent_find` - `count` @@ -50,7 +50,7 @@ The `_USE_STD_VECTOR_ALGORITHMS` macro determines the behavior of the following - `max`, `min`, `minmax` - `max_element`, `min_element`, `minmax_element` -In addition to algorithms, the `_USE_STD_VECTOR_ALGORITHMS` macro controls the manual vectorization of: +The `_USE_STD_VECTOR_ALGORITHMS` macro also controls the manual vectorization of: - `basic_string` and `basic_string_view` members: - `find` @@ -62,19 +62,19 @@ In addition to algorithms, the `_USE_STD_VECTOR_ALGORITHMS` macro controls the m ## Manually vectorized algorithms for floating point types Vectorization of floating-point types involves specific considerations: - - Vectorization may reorder operations, which can affect the precision of floating point results. - - Floating point types may contain `NaN` values, which don't behave transitively on comparisons. - - Floating point operations may raise exceptions. + - Vectorization might reorder operations, which can affect the precision of floating-point results. + - Floating-point types might contain `NaN` values, which don't behave transitively in comparisons. + - Floating-point operations might raise exceptions. -The STL deals with the first two considerations safely. Only `max_element`, `min_element`, `minmax_element`, `max`, `min`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized. These algorithms: -- Don’t compute new floating-point values. Instead, they only compare existing values to ensure that differences in operation order don't impact precision. -- Since these are sorting algorithms, `NaN` values aren't allowed inputs. +The STL addresses the first two considerations safely. 
Only `max_element`, `min_element`, `minmax_element`, `max`, `min`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized. These algorithms:
+- Don’t compute new floating-point values. Instead, they compare existing values to ensure that differences in operation order don't impact precision.
+- Because these are sorting algorithms, `NaN` values aren't allowed as inputs.

-Use `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating point types. Set it to 0 to disable vectorization. `_USE_STD_VECTOR_FLOATING_ALGORITHMS` has no effect if `_USE_STD_VECTOR_ALGORITHMS` is set to 0.
+Use `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating-point types. Set it to 0 to disable vectorization. `_USE_STD_VECTOR_FLOATING_ALGORITHMS` doesn't affect anything if `_USE_STD_VECTOR_ALGORITHMS` is set to 0.

 `_USE_STD_VECTOR_FLOATING_ALGORITHMS` defaults to 0 when [`/fp:except`](../build/reference/fp-specify-floating-point-behavior.md#except) is set.

-Ensure that you assign the same value to `_USE_STD_VECTOR_FLOATING_ALGORITHMS` for all linked translation units that use algorithms. A reliable way to do this is by configuring it in the project properties instead of in the source code. For more information about how to configure it, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
+Assign the same value to `_USE_STD_VECTOR_FLOATING_ALGORITHMS` for all linked translation units that use algorithms. Configure it in the project properties instead of in the source code for consistency. For more information about how to configure it, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
 ## See also

From 3b81dc015a69fb2279169bc0b48ef1c3411685b1 Mon Sep 17 00:00:00 2001
From: TylerMSFT
Date: Fri, 3 Oct 2025 11:11:56 -0700
Subject: [PATCH 22/23] last small edits

---
 docs/standard-library/vectorized-stl-algorithms.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md
index ff4bceb6546..958b852db64 100644
--- a/docs/standard-library/vectorized-stl-algorithms.md
+++ b/docs/standard-library/vectorized-stl-algorithms.md
@@ -7,20 +7,20 @@ helpviewer_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_AL
 ---
 # Vectorized STL Algorithms

-Under specific conditions, algorithms in the C++ Standard Template Library (STL) can process multiple elements simultaneously on a single CPU core, rather than handling each element individually. This optimization uses single instruction, multiple data (SIMD) instructions provided by the CPU, a technique called vectorization. When this optimization isn't applied, the implementation is referred to as scalar.
+Under specific conditions, algorithms in the MSVC Standard Template Library (STL) can process multiple elements simultaneously on a single CPU core, rather than handling each element individually. This optimization uses single instruction, multiple data (SIMD) instructions provided by the CPU, a technique called vectorization. When this optimization isn't applied, the implementation is referred to as scalar.

 The conditions required for vectorization are:
 - The container or range must be contiguous. Examples include `array`, `vector`, and `basic_string`. Types like `span` and `basic_string_view` provide contiguous ranges. Built-in arrays also form contiguous ranges. Containers like `list` and `map` aren't contiguous.
 - The target platform must support the necessary SIMD instructions to implement the algorithm for the element types.
This is typically true for arithmetic types and simple operations.
 - One of these conditions must be met:
- - The compiler can emit vectorized machine code for an implementation written as scalar code (auto-vectorization).
- - The algorithm's implementation explicitly uses vectorized code (manual vectorization).
+ - The compiler can emit vectorized machine code for an implementation written as scalar code (auto-vectorization).
+ - The algorithm's implementation explicitly uses vectorized code (manual vectorization).

 ## Auto-vectorization in the STL

 For more information about automatic vectorization, see [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) and the discussion in that article about the [`/arch`](../build/reference/arch-minimum-cpu-architecture.md) switch. This applies to the STL implementation code the same way it applies to user code.

-Algorithms like `transform`, `reduce`, and `accumulate` heavily benefit from auto-vectorization.
+Algorithms like `transform`, `reduce`, and `accumulate` benefit heavily from auto-vectorization.

 ## Manual vectorization in the STL

@@ -28,7 +28,7 @@ Certain algorithms for x64 and x86 include manual vectorization. This implementa

 Manually vectorized algorithms use template metaprogramming to detect if the element type is suitable for vectorization. As a result, they're only vectorized for simple types such as standard integer types.

-Programs generally either benefit in performance from manual vectorization or remain unaffected by it. Disable manual vectorization by defining `_USE_STD_VECTOR_ALGORITHMS=0` in your project. Manually vectorized algorithms are enabled by default on x64 and x86 because it defaults to 1 on those platforms.
+Programs either benefit in performance from manual vectorization or remain unaffected by it. Disable manual vectorization by defining `_USE_STD_VECTOR_ALGORITHMS=0` in your project.
Manually vectorized algorithms are enabled by default on x64 and x86 because `_USE_STD_VECTOR_ALGORITHMS` defaults to 1 on those platforms.

 Assign the same value to `_USE_STD_VECTOR_ALGORITHMS` for all linked translation units that use algorithms. Configure it in the project properties instead of in the source code for consistency. For more information about how to configure it, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).

@@ -72,7 +72,7 @@ The STL addresses the first two considerations safely. Only `max_element`, `min_

 Use `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating-point types. Set it to 0 to disable vectorization. `_USE_STD_VECTOR_FLOATING_ALGORITHMS` doesn't affect anything if `_USE_STD_VECTOR_ALGORITHMS` is set to 0.

-`_USE_STD_VECTOR_FLOATING_ALGORITHMS` defaults to 0 when [`/fp:except`](../build/reference/fp-specify-floating-point-behavior.md#except) is set.
+The `_USE_STD_VECTOR_FLOATING_ALGORITHMS` macro defaults to 0 when [`/fp:except`](../build/reference/fp-specify-floating-point-behavior.md#except) is set.

 Assign the same value to `_USE_STD_VECTOR_FLOATING_ALGORITHMS` for all linked translation units that use algorithms. Configure it in the project properties instead of in the source code for consistency. For more information about how to configure it, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
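
Reviewer's illustration (not part of the patch): the floating-point section changed above argues that `min_element` and its relatives are safe to vectorize because they only compare existing values and never compute new ones. The sketch below, assuming NaN-free input, checks that claim from the user's side by comparing the standard algorithm with a hand-written element-wise loop. The helper names `scalar_min_index` and `stl_min_index` are hypothetical, and whether the library call actually takes a vectorized path depends on the STL implementation, the target CPU, and the `_USE_STD_VECTOR_ALGORITHMS` and `_USE_STD_VECTOR_FLOATING_ALGORITHMS` settings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the STL): element-wise search for the index
// of the first smallest value. Assumes a non-empty, NaN-free input.
std::size_t scalar_min_index(const std::vector<float>& values) {
    assert(!values.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] < values[best]) {
            best = i;  // strictly smaller, so the first minimum wins ties
        }
    }
    return best;
}

// Same query through the standard algorithm on a contiguous range.
// std::min_element also returns the first smallest element, so both helpers
// should agree regardless of how the library implements the search.
std::size_t stl_min_index(const std::vector<float>& values) {
    assert(!values.empty());
    return static_cast<std::size_t>(
        std::min_element(values.begin(), values.end()) - values.begin());
}
```

Because only comparisons are involved, the two helpers are expected to return the same index for any NaN-free input, which is the property the documentation relies on.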
From fd1aa69a9ad8d391590ce855a105a9cec829a63b Mon Sep 17 00:00:00 2001
From: TylerMSFT
Date: Fri, 3 Oct 2025 11:33:58 -0700
Subject: [PATCH 23/23] add branding

---
 docs/standard-library/vectorized-stl-algorithms.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/standard-library/vectorized-stl-algorithms.md b/docs/standard-library/vectorized-stl-algorithms.md
index 958b852db64..13cad087a01 100644
--- a/docs/standard-library/vectorized-stl-algorithms.md
+++ b/docs/standard-library/vectorized-stl-algorithms.md
@@ -1,11 +1,11 @@
 ---
-title: "Vectorized STL Algorithms"
+title: "Vectorized MSVC STL Algorithms"
 description: "Learn more about: Vectorized STL Algorithms"
 ms.date: 10/03/2025
 f1_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_ALGORITHMS"]
 helpviewer_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_ALGORITHMS", "Vector Algorithms", "Vectorization", "SIMD"]
 ---
-# Vectorized STL Algorithms
+# Vectorized MSVC STL Algorithms

 Under specific conditions, algorithms in the MSVC Standard Template Library (STL) can process multiple elements simultaneously on a single CPU core, rather than handling each element individually. This optimization uses single instruction, multiple data (SIMD) instructions provided by the CPU, a technique called vectorization. When this optimization isn't applied, the implementation is referred to as scalar.

@@ -16,13 +16,13 @@ The conditions required for vectorization are:
 - The compiler can emit vectorized machine code for an implementation written as scalar code (auto-vectorization).
 - The algorithm's implementation explicitly uses vectorized code (manual vectorization).
-## Auto-vectorization in the STL
+## Auto-vectorization in the MSVC STL

 For more information about automatic vectorization, see [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) and the discussion in that article about the [`/arch`](../build/reference/arch-minimum-cpu-architecture.md) switch. This applies to the STL implementation code the same way it applies to user code.

 Algorithms like `transform`, `reduce`, and `accumulate` benefit heavily from auto-vectorization.

-## Manual vectorization in the STL
+## Manual vectorization in the MSVC STL

 Certain algorithms for x64 and x86 include manual vectorization. This implementation is separately compiled and relies on runtime CPU dispatch, so it applies only to suitable CPUs.
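
Reviewer's illustration (not part of the patch): a minimal sketch of the kind of call the article describes as eligible for manual vectorization, namely `count` and `find` over a contiguous `vector` of a standard integer type. The helper names `count_matches` and `first_index_of` are hypothetical. Nothing in the code selects a vectorized path; whether one is used is decided by the library, the CPU, and the value of `_USE_STD_VECTOR_ALGORITHMS`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts occurrences of `needle` in a contiguous buffer
// of a simple integer type, the combination the article lists as eligible.
std::ptrdiff_t count_matches(const std::vector<std::int32_t>& data,
                             std::int32_t needle) {
    return std::count(data.begin(), data.end(), needle);
}

// Hypothetical helper: index of the first occurrence of `needle`, or -1 when
// the value is absent.
std::ptrdiff_t first_index_of(const std::vector<std::int32_t>& data,
                              std::int32_t needle) {
    const auto it = std::find(data.begin(), data.end(), needle);
    return it == data.end() ? std::ptrdiff_t{-1} : it - data.begin();
}
```

The results are the same with the macro set to 0 or 1; only the implementation behind the calls changes. To compare the two, the article's advice is to set the value project-wide, for example with a command line along the lines of `cl /EHsc /D_USE_STD_VECTOR_ALGORITHMS=0 example.cpp` (the file name is an assumption), rather than with a `#define` in a single source file.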