Geometry_Engine: GeometryHash as a single value #3251

pawelbaran · 2024-01-17T13:49:02Z

Important: highly recommended to read the text with this script open to visualise the thought process 😉

During and following the recent chats with @al-fisher and @alelom, we were investigating the possibility of using GeometryHash to efficiently hash geometry. The method itself works great, but it does return an array of doubles, which is not very practical if meant to be used as a db or dictionary key. Therefore, we started thinking of aggregating the output array into a single value.

@alelom's first guess was to simply sum the values in the array, but that yields wrong results whenever the coordinates of the points defining two different geometries add up to the same total. This happens because each geometry is hashed by reducing it to its defining points, which then get converted into a double array:

BHoM_Engine/Geometry_Engine/Query/GeometryHash.cs

Lines 405 to 413 in a5674ef

    
    private static double[] ToDoubleArray(this Point p, double typeTranslationFactor)
    {
        return new double[]
        {
            p.X + typeTranslationFactor,
            p.Y + typeTranslationFactor,
            p.Z + typeTranslationFactor
        };
    }

...where typeTranslationFactor is (usually) a constant depending on geometry type. So Point (1,2,3) will yield the same aggregate hash as (3,2,1), and Line { (1,2,3), (3,2,1) } will yield the same aggregate hash as { (0,0,6), (6,0,0) } etc.
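The collision described above can be reproduced with a minimal sketch. The class and method names below are hypothetical, the points are passed as raw coordinates rather than BHoM Point objects, and the translation factor value used in the usage note is an arbitrary assumption.

```csharp
using System;
using System.Linq;

public static class SumAggregateDemo
{
    // Mirrors the shape of ToDoubleArray: every coordinate is offset
    // by the same type translation factor (hypothetical stand-in for Point).
    public static double[] ToDoubleArray(double x, double y, double z, double typeTranslationFactor)
    {
        return new double[]
        {
            x + typeTranslationFactor,
            y + typeTranslationFactor,
            z + typeTranslationFactor
        };
    }

    // Naive aggregate: plain sum of all values of all defining points.
    public static double SumAggregate(params double[][] points)
    {
        return points.SelectMany(p => p).Sum();
    }
}
```

With any fixed factor, SumAggregate gives the same value for the point (1,2,3) and the point (3,2,1), and the same value for the line { (1,2,3), (3,2,1) } and the line { (0,0,6), (6,0,0) }, because addition ignores the order and grouping of the coordinates.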

The first solution I tried was to generate the aggregate by multiplying each subsequent 3 values from the output array by a constant set of 3 different prime numbers. This, however, only protects against points with swapped/different coordinates (e.g. (1,2,3) is considered different from (3,2,1)), but fails for higher-order geometries whose defining points are reordered (e.g. Polyline (A,B,C) has the same geometry hash aggregate as (B,A,C)).
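A minimal sketch of this first attempt, assuming the primes 2, 3 and 5 (the actual values used in the script are not stated here) and a hypothetical class name:

```csharp
using System;

public static class ConstantPrimeAggregateDemo
{
    // Illustrative multipliers only: the same three primes are reused for every trio.
    private static readonly double[] Primes = { 2, 3, 5 };

    // Weighted sum: X values are multiplied by 2, Y by 3, Z by 5,
    // regardless of which point in the geometry they belong to.
    public static double Aggregate(double[] hashArray)
    {
        double result = 0;
        for (int i = 0; i < hashArray.Length; i++)
            result += hashArray[i] * Primes[i % 3];
        return result;
    }
}
```

For the flat arrays { 1, 2, 3 } and { 3, 2, 1 } this gives 23 and 17, so swapped coordinates are told apart. For a polyline through A = (0,0,0), B = (1,0,0), C = (1,1,0), the orderings (A,B,C) and (B,A,C) both give 7, because each point contributes the same amount wherever it sits in the sequence.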

The second solution I tried, which seems to be reliable enough, is to generate the aggregate by multiplying each subsequent 3 values from the output array by a changing set of 3 different prime numbers. That is, we have a pool of n prime numbers, and for each subsequent trio of doubles we pick the next set of 3: the 1st trio uses the prime numbers at indexes 0,1,2, the next trio uses 1,2,3, etc. Please see the attached script to visualise.
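The sliding window of multipliers could look roughly like the sketch below. It is not the attached script: the class name is hypothetical, the pool of the first ten primes is an assumption, and so is the wrap-around to the start of the pool once the window runs past its end.

```csharp
using System;

public static class RollingPrimeAggregateDemo
{
    // Assumed pool: the first ten primes.
    public static readonly double[] FirstTenPrimes = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 };

    // Trio k (the k-th point) uses pool indexes k, k+1, k+2.
    // Assumption: indexes wrap around when they exceed the pool size.
    public static double Aggregate(double[] hashArray, double[] pool)
    {
        double result = 0;
        for (int i = 0; i < hashArray.Length; i++)
        {
            int trio = i / 3;        // which point the value belongs to
            int withinTrio = i % 3;  // 0 = X, 1 = Y, 2 = Z
            result += hashArray[i] * pool[(trio + withinTrio) % pool.Length];
        }
        return result;
    }
}
```

For the polyline through A = (0,0,0), B = (1,0,0), C = (1,1,0), the ordering (A,B,C) now gives 15 and (B,A,C) gives 14, so this particular pair is told apart. The aggregate is still a weighted sum of the coordinates, so a handful of passing cases is evidence rather than proof that reordered geometries never collide.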

Here come the questions 😃

1. Do you see any immediate gap in my thought process that would prove the second solution incorrect?
2. If not, shall we add a new method in Geometry_Engine, e.g. AggregateGeometryHash, or shall I keep it in my test bed for now?

I am looking forward to actioning this!


alelom · 2024-01-17T15:29:26Z

Thanks @pawelbaran !

First off, I definitely agree, we need an aggregate geometry hash. I think the current methods are useful but feel more like the "back-end" of what the GeometryHash should be. Further, I think that the general definition of a hash is a single string or number representing the signature of an object, not an array of elements, because an array is difficult to use.
For this reason, I raised #3253 , suggesting we rename the current methods to ToHashArray(), or ToHashDoubleArray(), or similar. I think we should implement an "aggregate" function as you suggest and have that as our GeometryHash() method(s).

Secondly, I also agree with the implementation you've made, it makes sense and it's robust enough.
I think we can simplify it a bit by multiplying by 1,2,3, because I think that is the simplest implementation that keeps the same robustness, while minimizing the rate of increase of the operands. I would like to avoid working with extremely large numbers if that can be avoided, because it's wasteful (it leads to overflows or wrap-around), and I don't think that prime numbers add robustness. I tried your script replacing the list of prime numbers with 1,2,3 and the result is the same.
Additionally, we also need to add an unchecked statement around these methods, as it is good practice when writing hash methods.
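As a reminder of what the unchecked context does, here is a minimal sketch with a hypothetical integer combiner (the seed 17 and factor 31 are arbitrary illustrative constants, not anything from the BHoM code):

```csharp
using System;

public static class UncheckedHashDemo
{
    // Combines integer values into one hash. Inside 'unchecked', integer
    // overflow wraps around silently instead of throwing OverflowException,
    // even if the project is compiled with overflow checking enabled.
    public static int Combine(int[] values)
    {
        unchecked
        {
            int hash = 17;
            foreach (int v in values)
                hash = hash * 31 + v;
            return hash;
        }
    }
}
```

Note that unchecked only affects integral arithmetic and conversions. Arithmetic on doubles never throws on overflow, it produces infinity instead, so the keyword matters mostly if the aggregate ends up being computed or returned as an integer.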

pawelbaran · 2024-01-17T15:43:51Z

Thanks @alelom, 100% agreed, all makes sense 👍

One word of caution though: using only 3 multipliers in alternating order leads to duplicate hashes for geometries that are represented by 3 points (e.g. arcs). I ran into exactly such an edge case while building the script (where I started from 3 prime numbers instead of 10). So I would consider starting from at least 5 different multipliers, as circles are represented by 4 points - to avoid duplicates in case of the most primitive geometries. The easiest way of testing against this edge case is creating a polyline with just a few control points in various orders.
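The reordering test suggested above can be automated with a small harness like the one below. It is a sketch only: the names are hypothetical, and the aggregate under test is passed in as a delegate so that any candidate scheme can be plugged in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PermutationCollisionCheck
{
    // Yields every ordering of the given items (simple recursive generator).
    public static IEnumerable<T[]> Permutations<T>(T[] items)
    {
        if (items.Length <= 1)
        {
            yield return items;
            yield break;
        }
        for (int i = 0; i < items.Length; i++)
        {
            // Fix items[i] as the first element and permute the remainder.
            T[] rest = items.Where((_, index) => index != i).ToArray();
            foreach (T[] tail in Permutations(rest))
                yield return new[] { items[i] }.Concat(tail).ToArray();
        }
    }

    // Flattens each ordering of the control points into one double array,
    // aggregates it, and counts how many different aggregates come out.
    // If the result is lower than the number of orderings, some collide.
    public static int CountDistinctAggregates(double[][] points, Func<double[], double> aggregate)
    {
        return Permutations(points)
            .Select(order => aggregate(order.SelectMany(p => p).ToArray()))
            .Distinct()
            .Count();
    }
}
```

As an example, take the control points (0,0,0), (1,0,0), (1,1,0) and an aggregate that multiplies the value at index i by the multiplier at index (i / 3 + i % 3) % 3 of the pool { 1, 2, 3 }, i.e. a three-multiplier window with wrap-around (my assumption of how the alternating order works). The six orderings then produce only three distinct aggregates. Geometries whose direction does not matter would need extra care, since reversed orderings are then expected to match.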

Finally, one more thought that I had in mind: I would be a bit afraid of using double as the return type, as it by nature bears flaws related to the way in which it is stored in memory. Integers or strings seem to have a more 'stable' nature - maybe I am overreacting, but better safe than sorry 😉
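To illustrate the concern, the sketch below shows that a sum of doubles can depend on the order of addition, and one possible way of moving to integers: snapping each value to a tolerance grid before aggregating. The names and the quantisation step are hypothetical assumptions, not something agreed in this thread.

```csharp
using System;

public static class DoubleStabilityDemo
{
    // Floating-point addition is not associative: the same three values
    // summed in a different grouping differ in the last bits.
    public static bool SumDependsOnOrder()
    {
        double leftFirst = (0.1 + 0.2) + 0.3;
        double rightFirst = 0.1 + (0.2 + 0.3);
        return leftFirst != rightFirst;
    }

    // Hypothetical mitigation: express a value as a whole number of
    // tolerance steps, so that tiny representation noise disappears.
    public static long Quantise(double value, double tolerance)
    {
        return (long)Math.Round(value / tolerance);
    }
}
```

With a tolerance of 1e-6, the values 1.0 and 1.0 + 1e-9 quantise to the same integer, while 1.0 and 1.00001 do not. Values sitting right at a rounding boundary can still land on different sides of it, so this reduces the problem rather than removing it.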


pawelbaran added the type:question Ask for further details or start conversation label Jan 17, 2024

pawelbaran assigned alelom, al-fisher, IsakNaslundBh and pawelbaran Jan 17, 2024

alelom mentioned this issue Jan 17, 2024

Geometry_Engine: consider renaming current GeometryHash() methods to ToHashArray(); expose a new GeometryHash() method that uses ToHashArray() to return a single number #3253

Closed

alelom added the type:feature New capability or enhancement label Jan 17, 2024

alelom added a commit that referenced this issue Jan 23, 2024

Merge branch 'Geometry_Engine-#3251-GeometryHashSingleValue' of https…

803c4dd

…://github.com/BHoM/BHoM_Engine into Geometry_Engine-#3251-GeometryHashSingleValue

FraserGreenroyd closed this as completed in #3257 Jan 26, 2024
