BLI: refactor IndexMask for better performance and memory usage

Goals of this refactor:
* Reduce memory consumption of `IndexMask`. The old `IndexMask` uses an `int64_t` for each index, which is more than necessary in pretty much all practical cases currently. Using `int32_t` might still become limiting in the future in case we use this to index e.g. byte buffers larger than a few gigabytes. We also don't want to template `IndexMask`, because that would cause a split in the "ecosystem", or everything would have to be implemented twice or templated.
* Allow for more multi-threading. The old `IndexMask` contains a single array. This is generally good but has the problem that it is hard to fill from multiple threads when the final size is not known from the beginning. This is commonly the case when e.g. converting an array of bool to an index mask. Currently, this kind of code only runs on a single thread.
* Allow for efficient set operations like join, intersect and difference. It should be possible to multi-thread those operations.
* It should be possible to iterate over an `IndexMask` very efficiently. The most important part of that is to avoid all memory access when iterating over continuous ranges. For some core nodes (e.g. math nodes), we generate optimized code for the cases of irregular index masks and simple index ranges.

To achieve these goals, a few compromises had to be made:
* Slicing of the mask (at specific indices) and random element access is `O(log #indices)` now, but with a low constant factor. It should be possible to split a mask into n approximately equally sized parts in `O(n)` though, making the time per split `O(1)`.
* Using range-based for loops does not work well when iterating over a nested data structure like the new `IndexMask`. Therefore, `foreach_*` functions with callbacks have to be used. To avoid extra code complexity at the call site, the `foreach_*` methods support multi-threading out of the box.

The new data structure splits an `IndexMask` into an arbitrary number of ordered `IndexMaskSegment`s. Each segment can contain at most `2^14 = 16384` indices. The indices within a segment are stored as `int16_t`. Each segment has an additional `int64_t` offset which allows storing arbitrary `int64_t` indices. The main benefit of this approach is that segments can be processed/constructed individually on multiple threads without a serial bottleneck. It also reduces the memory requirements significantly.

For more details see comments in `BLI_index_mask.hh`.

I did a few tests to verify that the data structure generally improves performance and does not cause regressions:
* Our field evaluation benchmarks take about as much time as before. This is to be expected because we already made sure that e.g. add node evaluation is vectorized. The important thing here is to check that changes to the way we iterate over the indices still allow for auto-vectorization.
* Memory usage by a mask is about 1/4 of what it was before in the average case. That's mainly caused by the switch from `int64_t` to `int16_t` for indices. In the worst case, the memory requirements can be larger when there are many indices that are very far apart. However, when they are far away from each other, that indicates that there aren't many indices in total. In common cases, memory usage can be way lower than 1/4 of before, because sub-ranges use static memory.
* For some more specific numbers I benchmarked `IndexMask::from_bools` in `index_mask_from_selection` on 10,000,000 elements at various probabilities for `true` at every index:
```
Probability    Old        New
0              4.6 ms     0.8 ms
0.001          5.1 ms     1.3 ms
0.2            8.4 ms     1.8 ms
0.5            15.3 ms    3.0 ms
0.8            20.1 ms    3.0 ms
0.999          25.1 ms    1.7 ms
1              13.5 ms    1.1 ms
```

Pull Request: https://projects.blender.org/blender/blender/pulls/104629
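
To make the segment layout described in the commit message more concrete, here is a minimal, self-contained sketch. It is not the code from `BLI_index_mask.hh`: the names `SketchSegment`, `SketchMask` and `sketch_mask_from_bools` are hypothetical, the sketch runs on a single thread, and it assumes segment offsets are multiples of 16384, which the real implementation may not require. It only illustrates the idea of `int16_t` local indices plus one `int64_t` offset per segment, and a callback-based `foreach_index`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

/* Maximum number of indices per segment, as stated in the commit message. */
constexpr int64_t max_segment_size = int64_t(1) << 14; /* 16384 */

/* Hypothetical simplified segment: small local indices plus one large offset. */
struct SketchSegment {
  int64_t offset = 0;           /* Added to every local index. */
  std::vector<int16_t> indices; /* Sorted, each in [0, max_segment_size). */
};

/* Hypothetical simplified mask: an ordered list of segments. */
struct SketchMask {
  std::vector<SketchSegment> segments;

  int64_t size() const
  {
    int64_t total = 0;
    for (const SketchSegment &segment : segments) {
      total += int64_t(segment.indices.size());
    }
    return total;
  }

  /* Callback-based iteration, used instead of a range-based for loop. */
  template<typename Fn> void foreach_index(Fn &&fn) const
  {
    for (const SketchSegment &segment : segments) {
      for (const int16_t local : segment.indices) {
        fn(segment.offset + int64_t(local));
      }
    }
  }
};

/* Every chunk of the input is converted independently of the others, which is
 * the property that would allow one chunk per thread in a parallel version. */
inline SketchMask sketch_mask_from_bools(const std::vector<bool> &bools)
{
  SketchMask mask;
  const int64_t total = int64_t(bools.size());
  for (int64_t start = 0; start < total; start += max_segment_size) {
    const int64_t end = std::min(start + max_segment_size, total);
    SketchSegment segment;
    segment.offset = start;
    for (int64_t i = start; i < end; i++) {
      if (bools[std::size_t(i)]) {
        assert(i - start < max_segment_size);
        segment.indices.push_back(int16_t(i - start));
      }
    }
    if (!segment.indices.empty()) {
      mask.segments.push_back(std::move(segment));
    }
  }
  return mask;
}
```

In this sketch each stored index costs 2 bytes instead of 8, which matches the "about 1/4" figure above for dense masks. The sketch always stores indices explicitly, so it does not show the static memory for sub-ranges that the commit message mentions.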
UPBGE · May 24, 2023 · 2cfcb8b
1 parent f3f2f7f
commit 2cfcb8b
Showing 182 changed files with 4,061 additions and 2,954 deletions.
diff --git a/source/blender/blenkernel/BKE_attribute_math.hh b/source/blender/blenkernel/BKE_attribute_math.hh
@@ -294,7 +294,7 @@ template<typename T> class SimpleMixer {
   /**
    * \param mask: Only initialize these indices. Other indices in the buffer will be invalid.
    */
-  SimpleMixer(MutableSpan<T> buffer, const IndexMask mask, T default_value = {})
+  SimpleMixer(MutableSpan<T> buffer, const IndexMask &mask, T default_value = {})
       : buffer_(buffer), default_value_(default_value), total_weights_(buffer.size(), 0.0f)
   {
     BLI_STATIC_ASSERT(std::is_trivial_v<T>, "");
@@ -327,7 +327,7 @@ template<typename T> class SimpleMixer {
     this->finalize(IndexMask(buffer_.size()));
   }
 
-  void finalize(const IndexMask mask)
+  void finalize(const IndexMask &mask)
   {
     mask.foreach_index([&](const int64_t i) {
       const float weight = total_weights_[i];
@@ -365,7 +365,7 @@ class BooleanPropagationMixer {
   /**
    * \param mask: Only initialize these indices. Other indices in the buffer will be invalid.
    */
-  BooleanPropagationMixer(MutableSpan<bool> buffer, const IndexMask mask) : buffer_(buffer)
+  BooleanPropagationMixer(MutableSpan<bool> buffer, const IndexMask &mask) : buffer_(buffer)
   {
     mask.foreach_index([&](const int64_t i) { buffer_[i] = false; });
   }
@@ -391,7 +391,7 @@ class BooleanPropagationMixer {
    */
   void finalize() {}
 
-  void finalize(const IndexMask /*mask*/) {}
+  void finalize(const IndexMask & /*mask*/) {}
 };
 
 /**
@@ -421,7 +421,7 @@ class SimpleMixerWithAccumulationType {
    * \param mask: Only initialize these indices. Other indices in the buffer will be invalid.
    */
   SimpleMixerWithAccumulationType(MutableSpan<T> buffer,
-                                  const IndexMask mask,
+                                  const IndexMask &mask,
                                   T default_value = {})
       : buffer_(buffer), default_value_(default_value), accumulation_buffer_(buffer.size())
   {
@@ -449,7 +449,7 @@ class SimpleMixerWithAccumulationType {
     this->finalize(buffer_.index_range());
   }
 
-  void finalize(const IndexMask mask)
+  void finalize(const IndexMask &mask)
   {
     mask.foreach_index([&](const int64_t i) {
       const Item &item = accumulation_buffer_[i];
@@ -478,12 +478,12 @@ class ColorGeometry4fMixer {
    * \param mask: Only initialize these indices. Other indices in the buffer will be invalid.
    */
   ColorGeometry4fMixer(MutableSpan<ColorGeometry4f> buffer,
-                       IndexMask mask,
+                       const IndexMask &mask,
                        ColorGeometry4f default_color = ColorGeometry4f(0.0f, 0.0f, 0.0f, 1.0f));
   void set(int64_t index, const ColorGeometry4f &color, float weight = 1.0f);
   void mix_in(int64_t index, const ColorGeometry4f &color, float weight = 1.0f);
   void finalize();
-  void finalize(IndexMask mask);
+  void finalize(const IndexMask &mask);
 };
 
 class ColorGeometry4bMixer {
@@ -500,12 +500,12 @@ class ColorGeometry4bMixer {
    * \param mask: Only initialize these indices. Other indices in the buffer will be invalid.
    */
   ColorGeometry4bMixer(MutableSpan<ColorGeometry4b> buffer,
-                       IndexMask mask,
+                       const IndexMask &mask,
                        ColorGeometry4b default_color = ColorGeometry4b(0, 0, 0, 255));
   void set(int64_t index, const ColorGeometry4b &color, float weight = 1.0f);
   void mix_in(int64_t index, const ColorGeometry4b &color, float weight = 1.0f);
   void finalize();
-  void finalize(IndexMask mask);
+  void finalize(const IndexMask &mask);
 };
 
 template<typename T> struct DefaultMixerStruct {

diff --git a/source/blender/blenkernel/BKE_curves.hh b/source/blender/blenkernel/BKE_curves.hh
@@ -160,7 +160,7 @@ class CurvesGeometry : public ::CurvesGeometry {
   /** Set all curve types to the value and call #update_curve_types. */
   void fill_curve_types(CurveType type);
   /** Set the types for the curves in the selection and call #update_curve_types. */
-  void fill_curve_types(IndexMask selection, CurveType type);
+  void fill_curve_types(const IndexMask &selection, CurveType type);
   /** Update the cached count of curves of each type, necessary after #curve_types_for_write. */
   void update_curve_types();
 
@@ -173,10 +173,10 @@ class CurvesGeometry : public ::CurvesGeometry {
   /**
    * All of the curve indices for curves with a specific type.
    */
-  IndexMask indices_for_curve_type(CurveType type, Vector<int64_t> &r_indices) const;
+  IndexMask indices_for_curve_type(CurveType type, IndexMaskMemory &memory) const;
   IndexMask indices_for_curve_type(CurveType type,
-                                   IndexMask selection,
-                                   Vector<int64_t> &r_indices) const;
+                                   const IndexMask &selection,
+                                   IndexMaskMemory &memory) const;
 
   Array<int> point_to_curve_map() const;
 
@@ -361,16 +361,16 @@ class CurvesGeometry : public ::CurvesGeometry {
 
   void calculate_bezier_auto_handles();
 
-  void remove_points(IndexMask points_to_delete,
+  void remove_points(const IndexMask &points_to_delete,
                      const AnonymousAttributePropagationInfo &propagation_info = {});
-  void remove_curves(IndexMask curves_to_delete,
+  void remove_curves(const IndexMask &curves_to_delete,
                      const AnonymousAttributePropagationInfo &propagation_info = {});
 
   /**
    * Change the direction of selected curves (switch the start and end) without changing their
    * shape.
    */
-  void reverse_curves(IndexMask curves_to_reverse);
+  void reverse_curves(const IndexMask &curves_to_reverse);
 
   /**
    * Remove any attributes that are unused based on the types in the curves.

diff --git a/source/blender/blenkernel/BKE_curves_utils.hh b/source/blender/blenkernel/BKE_curves_utils.hh
@@ -481,14 +481,14 @@ void copy_point_data(OffsetIndices<int> src_points_by_curve,
 
 void copy_point_data(OffsetIndices<int> src_points_by_curve,
                      OffsetIndices<int> dst_points_by_curve,
-                     IndexMask src_curve_selection,
+                     const IndexMask &src_curve_selection,
                      GSpan src,
                      GMutableSpan dst);
 
 template<typename T>
 void copy_point_data(OffsetIndices<int> src_points_by_curve,
                      OffsetIndices<int> dst_points_by_curve,
-                     IndexMask src_curve_selection,
+                     const IndexMask &src_curve_selection,
                      Span<T> src,
                      MutableSpan<T> dst)
 {
@@ -500,13 +500,13 @@ void copy_point_data(OffsetIndices<int> src_points_by_curve,
 }
 
 void fill_points(OffsetIndices<int> points_by_curve,
-                 IndexMask curve_selection,
+                 const IndexMask &curve_selection,
                  GPointer value,
                  GMutableSpan dst);
 
 template<typename T>
 void fill_points(const OffsetIndices<int> points_by_curve,
-                 IndexMask curve_selection,
+                 const IndexMask &curve_selection,
                  const T &value,
                  MutableSpan<T> dst)
 {
@@ -541,7 +541,9 @@ bke::CurvesGeometry copy_only_curve_domain(const bke::CurvesGeometry &src_curves
 /**
  * Copy the number of points in every curve in the mask to the corresponding index in #sizes.
  */
-void copy_curve_sizes(OffsetIndices<int> points_by_curve, IndexMask mask, MutableSpan<int> sizes);
+void copy_curve_sizes(OffsetIndices<int> points_by_curve,
+                      const IndexMask &mask,
+                      MutableSpan<int> sizes);
 
 /**
  * Copy the number of points in every curve in #curve_ranges to the corresponding index in
@@ -554,12 +556,12 @@ void copy_curve_sizes(OffsetIndices<int> points_by_curve,
 IndexMask indices_for_type(const VArray<int8_t> &types,
                            const std::array<int, CURVE_TYPES_NUM> &type_counts,
                            const CurveType type,
-                           const IndexMask selection,
-                           Vector<int64_t> &r_indices);
+                           const IndexMask &selection,
+                           IndexMaskMemory &memory);
 
 void foreach_curve_by_type(const VArray<int8_t> &types,
                            const std::array<int, CURVE_TYPES_NUM> &type_counts,
-                           IndexMask selection,
+                           const IndexMask &selection,
                            FunctionRef<void(IndexMask)> catmull_rom_fn,
                            FunctionRef<void(IndexMask)> poly_fn,
                            FunctionRef<void(IndexMask)> bezier_fn,

diff --git a/source/blender/blenkernel/BKE_geometry_fields.hh b/source/blender/blenkernel/BKE_geometry_fields.hh
@@ -136,53 +136,55 @@ class GeometryFieldInput : public fn::FieldInput {
  public:
   using fn::FieldInput::FieldInput;
   GVArray get_varray_for_context(const fn::FieldContext &context,
-                                 IndexMask mask,
+                                 const IndexMask &mask,
                                  ResourceScope &scope) const override;
   virtual GVArray get_varray_for_context(const GeometryFieldContext &context,
-                                         IndexMask mask) const = 0;
+                                         const IndexMask &mask) const = 0;
   virtual std::optional<eAttrDomain> preferred_domain(const GeometryComponent &component) const;
 };
 
 class MeshFieldInput : public fn::FieldInput {
  public:
   using fn::FieldInput::FieldInput;
   GVArray get_varray_for_context(const fn::FieldContext &context,
-                                 IndexMask mask,
+                                 const IndexMask &mask,
                                  ResourceScope &scope) const override;
   virtual GVArray get_varray_for_context(const Mesh &mesh,
                                          eAttrDomain domain,
-                                         IndexMask mask) const = 0;
+                                         const IndexMask &mask) const = 0;
   virtual std::optional<eAttrDomain> preferred_domain(const Mesh &mesh) const;
 };
 
 class CurvesFieldInput : public fn::FieldInput {
  public:
   using fn::FieldInput::FieldInput;
   GVArray get_varray_for_context(const fn::FieldContext &context,
-                                 IndexMask mask,
+                                 const IndexMask &mask,
                                  ResourceScope &scope) const override;
   virtual GVArray get_varray_for_context(const CurvesGeometry &curves,
                                          eAttrDomain domain,
-                                         IndexMask mask) const = 0;
+                                         const IndexMask &mask) const = 0;
   virtual std::optional<eAttrDomain> preferred_domain(const CurvesGeometry &curves) const;
 };
 
 class PointCloudFieldInput : public fn::FieldInput {
  public:
   using fn::FieldInput::FieldInput;
   GVArray get_varray_for_context(const fn::FieldContext &context,
-                                 IndexMask mask,
+                                 const IndexMask &mask,
                                  ResourceScope &scope) const override;
-  virtual GVArray get_varray_for_context(const PointCloud &pointcloud, IndexMask mask) const = 0;
+  virtual GVArray get_varray_for_context(const PointCloud &pointcloud,
+                                         const IndexMask &mask) const = 0;
 };
 
 class InstancesFieldInput : public fn::FieldInput {
  public:
   using fn::FieldInput::FieldInput;
   GVArray get_varray_for_context(const fn::FieldContext &context,
-                                 IndexMask mask,
+                                 const IndexMask &mask,
                                  ResourceScope &scope) const override;
-  virtual GVArray get_varray_for_context(const Instances &instances, IndexMask mask) const = 0;
+  virtual GVArray get_varray_for_context(const Instances &instances,
+                                         const IndexMask &mask) const = 0;
 };
 
 class AttributeFieldInput : public GeometryFieldInput {
@@ -212,7 +214,7 @@ class AttributeFieldInput : public GeometryFieldInput {
   }
 
   GVArray get_varray_for_context(const GeometryFieldContext &context,
-                                 IndexMask mask) const override;
+                                 const IndexMask &mask) const override;
 
   std::string socket_inspection_name() const override;
 
@@ -229,7 +231,7 @@ class IDAttributeFieldInput : public GeometryFieldInput {
   }
 
   GVArray get_varray_for_context(const GeometryFieldContext &context,
-                                 IndexMask mask) const override;
+                                 const IndexMask &mask) const override;
 
   std::string socket_inspection_name() const override;
 
@@ -239,7 +241,7 @@ class IDAttributeFieldInput : public GeometryFieldInput {
 
 VArray<float3> curve_normals_varray(const CurvesGeometry &curves, const eAttrDomain domain);
 
-VArray<float3> mesh_normals_varray(const Mesh &mesh, const IndexMask mask, eAttrDomain domain);
+VArray<float3> mesh_normals_varray(const Mesh &mesh, const IndexMask &mask, eAttrDomain domain);
 
 class NormalFieldInput : public GeometryFieldInput {
  public:
@@ -249,7 +251,7 @@ class NormalFieldInput : public GeometryFieldInput {
   }
 
   GVArray get_varray_for_context(const GeometryFieldContext &context,
-                                 IndexMask mask) const override;
+                                 const IndexMask &mask) const override;
 
   std::string socket_inspection_name() const override;
 
@@ -288,7 +290,7 @@ class AnonymousAttributeFieldInput : public GeometryFieldInput {
   }
 
   GVArray get_varray_for_context(const GeometryFieldContext &context,
-                                 IndexMask mask) const override;
+                                 const IndexMask &mask) const override;
 
   std::string socket_inspection_name() const override;
 
@@ -302,7 +304,7 @@ class CurveLengthFieldInput final : public CurvesFieldInput {
   CurveLengthFieldInput();
   GVArray get_varray_for_context(const CurvesGeometry &curves,
                                  eAttrDomain domain,
-                                 IndexMask mask) const final;
+                                 const IndexMask &mask) const final;
   uint64_t hash() const override;
   bool is_equal_to(const fn::FieldNode &other) const override;
   std::optional<eAttrDomain> preferred_domain(const bke::CurvesGeometry &curves) const final;

diff --git a/source/blender/blenkernel/BKE_instances.hh b/source/blender/blenkernel/BKE_instances.hh
@@ -155,7 +155,7 @@ class Instances {
    * Remove the indices that are not contained in the mask input, and remove unused instance
    * references afterwards.
    */
-  void remove(const blender::IndexMask mask,
+  void remove(const blender::IndexMask &mask,
               const blender::bke::AnonymousAttributePropagationInfo &propagation_info);
   /**
    * Get an id for every instance. These can be used for e.g. motion blur.

diff --git a/source/blender/blenkernel/BKE_mesh_sample.hh b/source/blender/blenkernel/BKE_mesh_sample.hh
@@ -32,7 +32,7 @@ void sample_point_attribute(Span<int> corner_verts,
                             Span<int> looptri_indices,
                             Span<float3> bary_coords,
                             const GVArray &src,
-                            IndexMask mask,
+                            const IndexMask &mask,
                             GMutableSpan dst);
 
 void sample_point_normals(Span<int> corner_verts,
@@ -47,20 +47,20 @@ void sample_corner_attribute(Span<MLoopTri> looptris,
                              Span<int> looptri_indices,
                              Span<float3> bary_coords,
                              const GVArray &src,
-                             IndexMask mask,
+                             const IndexMask &mask,
                              GMutableSpan dst);
 
 void sample_corner_normals(Span<MLoopTri> looptris,
                            Span<int> looptri_indices,
                            Span<float3> bary_coords,
                            Span<float3> src,
-                           IndexMask mask,
+                           const IndexMask &mask,
                            MutableSpan<float3> dst);
 
 void sample_face_attribute(Span<int> looptri_polys,
                            Span<int> looptri_indices,
                            const GVArray &src,
-                           IndexMask mask,
+                           const IndexMask &mask,
                            GMutableSpan dst);
 
 /**
@@ -148,7 +148,7 @@ class BaryWeightFromPositionFn : public mf::MultiFunction {
 
  public:
   BaryWeightFromPositionFn(GeometrySet geometry);
-  void call(IndexMask mask, mf::Params params, mf::Context context) const;
+  void call(const IndexMask &mask, mf::Params params, mf::Context context) const;
 };
 
 /**
@@ -163,7 +163,7 @@ class CornerBaryWeightFromPositionFn : public mf::MultiFunction {
 
  public:
   CornerBaryWeightFromPositionFn(GeometrySet geometry);
-  void call(IndexMask mask, mf::Params params, mf::Context context) const;
+  void call(const IndexMask &mask, mf::Params params, mf::Context context) const;
 };
 
 /**
@@ -183,7 +183,7 @@ class BaryWeightSampleFn : public mf::MultiFunction {
  public:
   BaryWeightSampleFn(GeometrySet geometry, fn::GField src_field);
 
-  void call(IndexMask mask, mf::Params params, mf::Context context) const;
+  void call(const IndexMask &mask, mf::Params params, mf::Context context) const;
 
  private:
   void evaluate_source(fn::GField src_field);