
Features in geometry tree tailored for clash detection. See #4278. #4282

Closed · wants to merge 42 commits

Conversation

@Moult Moult commented Feb 1, 2024

Definitely do not merge :)

First attempt at implementation of intersection clash checks. See #4278.

This considers only protruding clashes. It does not consider touching nor encroaching clashes and does not consider protrusion direction in the protrusion distance.

Still a lot to move around and tweak, so it's also not yet ready for a detailed review. You'll see lots of random prints and probably horrifically incorrect usage of pointers and structs and whatnot, but at least something is committed to a branch for now...

@Moult Moult requested a review from aothms February 1, 2024 07:33
@Moult Moult marked this pull request as draft February 1, 2024 07:34
…ults.

Add cache for points that have already been checked.

Separate out a new add_triangulated function with a new kwarg, add_element(should_triangulate=True), so that you can still have the old geom tree using AABB only if you want, also because add() is used in boolean_utils. Tolerance is now a kwarg.

I misunderstood the OBB dimension and should multiply by 2, not divide by 2, which now makes it much slower since there are more triangles to check.
… optimal OBBs, check OBB first for points_in_b prior to doing raycast (much faster now), and return the surface point of the protrusion too for convenience.
…rotrusion)

This should be equivalent to select(element) but faster and with an allow_touching toggle.
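One of the commit messages above mentions checking the OBB first for points_in_b before doing the raycast. As a rough illustration only (not the PR's C++), a point-in-OBB containment test projects the point's offset from the box center onto the box's local axes. The representation here (center, orthonormal axes, half-extents) and the function name are hypothetical:

```python
import numpy as np

def point_in_obb(p, center, axes, half_extents, tol=0.0):
    # Project the point's offset from the OBB center onto each local axis;
    # the point is inside iff every projection is within the half-extent.
    d = np.asarray(p, dtype=float) - center
    for axis, extent in zip(axes, half_extents):
        if abs(np.dot(d, axis)) > extent + tol:
            return False
    return True

# Axis-aligned unit cube centered at the origin, expressed as an OBB.
center = np.zeros(3)
axes = np.eye(3)
half = np.array([0.5, 0.5, 0.5])
print(point_in_obb([0.1, 0.2, 0.3], center, axes, half))  # True
print(point_in_obb([0.9, 0.0, 0.0], center, axes, half))  # False
```

Because this is a handful of dot products per point, it is far cheaper than a raycast, which is presumably why doing it first pays off.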
Moult commented Feb 4, 2024

I think this is now ready for a discussion (but not for merging) :)

You'll notice this PR is mostly lines added, not removed because I didn't want to touch existing functionality.

Things not yet implemented:

  - [ ] Many-many filtered clash sets
  - [ ] Self clashing
  - [ ] Considering piercing clashes in "intersection" clash mode
  - [ ] Nicer returns rather than independent protrusion_distances etc
  - [ ] Clash grouping
  - [ ] Duplication clashing
  - [ ] Box check for clearance distance
  - [ ] No-return-early option for clearance distance
  - [ ] Contained-within for collision checks
  - [ ] Multithreading!
  - [ ] UBTree to BVH tree upgrade
  - [ ] Early return for protrusion threshold
  - [ ] Tree loading / saving
  - [ ] Piping clash tests
  - [ ] Ray intersection

aothms commented Feb 7, 2024

Super awesome!

I see a couple of follow up tasks:

  - [ ] move the PhysX code to a separate file (.cpp); maybe just use the whole lib as a proper dependency
  - [ ] separate product geometries into solids/shells so that we can rely on a BVH-based point containment test instead of non-deterministic'ish ray hit counting
  - [ ] most unordered maps should probably be vectors because the key is simply a contiguous range of ints
  - [ ] factor out the common bits of the clash functions
  - [ ] I think the distinction between add/add_triangulated shouldn't be made at insertion time but rather at tree initialization
  - [ ] filter out invalid tris before tree insertion instead of maintaining a map of bools
  - [ ] as you said, replace triangleintersects.hpp with the PhysX equivalent
  - [ ] tag open shells so that we know we can't rely on/do containment tests
  - [ ] I would probably store the points in a vector and index into them for every triangle, as opposed to storing all three points duplicated, to save some mem

I don't think I'm going to touch any of the geometric predicates. It sounds rather thought through. In some cases I'd have different preferences (like I conceptually don't really like how the distinction between piercing and protruding happens based on loose edges/verts as opposed to topology, but don't see a quick way out of resolving that without a more proper triangle mesh datastructure)

I can work on these follow-up tasks if you want, but then we need to agree on some "handover time" so that we aren't losing time on conflicts.

Moult commented Feb 8, 2024

> filter out invalid tris before tree insertion instead of maintaining a map of bools

I'm not sure how to do this. The BVH tree is created from a triangle_set, and a triangle_set is created from a shape list, and a shape list is a list of TopoDS_Face, not triangles. I'm guessing somehow under the hood it reuses the triangulation data from incremental mesh, but I don't know how to get at the triangles until after the BVH is actually created.

Moult commented Feb 8, 2024

A small benchmark regarding the vertex index indirection, on a 40MB data set, clashing all 6,400 elements against each other using an intersection check (without early returns). I'm measuring this portion of the code:

    for element in sorted(all_els, key=lambda e: e.Name):
        clashes = tree.clash_intersection(element, tolerance=0.002, check_all=True)
| Test | Mem (excl. 682MB for file.open()) | Time |
| --- | --- | --- |
| Before indirection | 1,473MB | 10.2s |
| After indirection | 1,112MB | 12.7s |

Note: negligible time difference in the tree adding portion of the code.
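As a rough Python illustration of the tradeoff measured above (not the PR's C++): storing each triangle's three points inline duplicates shared vertices, while indirection keeps one shared vertex buffer plus an index per corner. The toy mesh here is made up:

```python
import numpy as np

# A toy mesh: 4 vertices, 2 triangles sharing an edge.
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=np.float64)
faces = np.array([[0, 1, 2], [0, 2, 3]], dtype=np.int32)

# "Before indirection": every triangle carries its own three points.
duplicated = verts[faces]          # shape (2, 3, 3), points copied per triangle
# "After indirection": a shared vertex buffer plus an index per corner.
indexed_bytes = verts.nbytes + faces.nbytes

print(duplicated.nbytes, indexed_bytes)  # 144 120
```

The saving grows as more triangles share vertices, while each point access now costs an extra index lookup, which lines up with the measured memory drop and the slightly slower clash time.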

Moult commented Feb 8, 2024

OK I think I've done what I can. The remaining unchecked boxes I either don't know how to do properly or probably need a bigger architectural decision (also probably worth thinking what signatures to use for many-many clashes). I'll be pens down for the rest of today if you want to write code :)

Moult commented Feb 8, 2024

Current benchmarks :) Notice the growing majority of time in opening, building the tree, and the crazy high memory use.

All times in seconds. Memory in MB. All elements are clashed against all other elements.

  • Open = time to ifcopenshell.open(...)
  • Tree = time to create tree, either (B) baseline or (T) with triangulation / elem BVH
  • Col = collision, allowing touching
  • Int = intersection, 2mm tolerance with (R) return early or (A) check all
  • Clr = clearance check of 100mm
  • Sel = tree.select(e), which is functionally equivalent to Col or Int (if you need protrusion distances - except that Sel distances aren't correct atm)
  • Mem = memory, where (B) represents a "baseline" of only ifcopenshell.open() and creating the current box-based UBTree, and (A) represents the peak memory usage after everything including triangulated tree building and clashing. Measured using print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000)

Note that tree.select(e, extend=x) is not measured (equivalent to Clr) because it's simply too slow (gave up measuring after 5 minutes).

| Dataset | # Objs | Open | Tree(B) | Tree(T) | Col | Int(R) | Int(A) | Clr(R) | Sel | Mem(B) | Mem(A) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 40MB of Elec / Fire | 6,412 | 1.65 | 1.85 | 3.3 | 0.12 | 0.35 | 0.76 | 0.65 | 9.5 | 995 | 1,760 |
| 200MB of Elec / Fire / Hyd / Mech | 18,513 | 8.8 | 34.21 | 39.6 | 0.46 | 1.8 | 3.9 | 20.5 | 130 | 8,410 | 10,820 |

Moult commented Feb 11, 2024

The latest commit introduces many-many variants of the clash functions: clash_intersection_many, clash_collision_many, and clash_clearance_many. I don't think there's any reason to keep the old 1:N clash functions (e.g. clash_intersection) because the N:N versions are always significantly faster. See the edit I've made to the benchmarks post for the huge improvement in numbers :) (especially when the same objects are in both set A and B).

Do you see any reason to keep the 1:N functions or can / should I delete them?
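A sketch of one reason the N:N variants can win, with a hypothetical check_pair function and toy interval "elements": an N:N pass tests each unordered pair exactly once, whereas calling a 1:N function once per element tests every symmetric pair twice.

```python
from itertools import combinations

def clash_one_to_n(query, others, check_pair):
    # 1:N style: test one query element against every other element.
    return [(query, o) for o in others if check_pair(query, o)]

def clash_many_many(elements, check_pair):
    # N:N style: each unordered pair is tested exactly once, so clashing a
    # set against itself does roughly half the pair checks of N separate
    # 1:N calls.
    return [(a, b) for a, b in combinations(elements, 2) if check_pair(a, b)]

# Toy "elements" as 1D intervals; a pair clashes when the intervals overlap.
intervals = {"a": (0, 2), "b": (1, 3), "c": (5, 6)}
overlap = lambda x, y: intervals[x][0] < intervals[y][1] and intervals[y][0] < intervals[x][1]
print(clash_many_many(list(intervals), overlap))  # [('a', 'b')]
```

The real functions presumably also amortise broad-phase tree traversal across the whole set, which a per-element loop cannot.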

Moult commented Feb 12, 2024

I had a shot at implementing multithreading. I attempted something similar to

    for (auto& rep : tasks_) {
        MAKE_TYPE_NAME(Kernel)* K = nullptr;
        if (threadpool.size() < kernel_pool.size()) {
            K = kernel_pool[threadpool.size()];
        }
        while (threadpool.size() == conc_threads) {
            for (int i = 0; i < (int)threadpool.size(); i++) {
                auto& fu = threadpool[i];
                std::future_status status;
                status = fu.wait_for(std::chrono::seconds(0));
                if (status == std::future_status::ready) {
                    process_finished_rep(fu.get());
                    std::swap(threadpool[i], threadpool.back());
                    threadpool.pop_back();
                    std::swap(kernel_pool[i], kernel_pool.back());
                    K = kernel_pool.back();
                    break;
                } // if
            } // for
        } // while
        std::future<geometry_conversion_task*> fu = std::async(
            std::launch::async,
            [this](
                IfcGeom::MAKE_TYPE_NAME(Kernel)* kernel,
                const IfcGeom::IteratorSettings& settings,
                geometry_conversion_task* rep) {
                this->create_element_(kernel, settings, rep);
                return rep;
            },
            K,
            std::ref(settings),
            &rep);
        threadpool.emplace_back(std::move(fu));
    }
but failed horribly. I ended up using a mutex with a very naive approach:

  1. After the box BVH clash, create a task queue of clashes
  2. Divide that queue by num_threads
  3. Use threads to do the work, use a mutex to lock, and merge the results into a final results vector to return

I've got no idea if there is a better way to do it, but I've just updated the results table again and I'm very impressed with the results. I think I've run out of tricks I can think of to further optimise the clash portion of the code. I reckon it's now time to move on to optimising opening / tree creation.
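The three steps above, rendered as a rough Python sketch (the real implementation is C++; the check function and numbers here are made up). Note that in CPython the GIL limits speedup for pure-Python work, so this only illustrates the structure:

```python
import threading

def clash_tasks_parallel(tasks, check, num_threads=4):
    # 1. The broad-phase box BVH clash has already produced a queue of
    #    candidate pairs (tasks).
    # 2. Split the queue across num_threads slices.
    # 3. Each thread narrow-phase checks its slice, then merges its hits
    #    into a shared results vector under a mutex.
    results, lock = [], threading.Lock()

    def worker(chunk):
        local = [pair for pair in chunk if check(*pair)]
        with lock:  # merge once per thread, not once per pair
            results.extend(local)

    chunks = [tasks[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks if c]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

pairs = [(i, i + 1) for i in range(10)]
hits = clash_tasks_parallel(pairs, lambda a, b: (a + b) % 3 == 0)
print(sorted(hits))  # [(1, 2), (4, 5), (7, 8)]
```

Merging per thread rather than per pair keeps lock contention negligible, which is probably why even this naive scheme scales well.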

aothms commented Feb 20, 2024

> Experiment with multithreading

Multi-threading on IO bound tasks like this likely has very little effect. It may even make things worse because of all the locking that has to be put in place.

Moult commented Feb 20, 2024

Hmm, I'm looking to crush the 140 seconds of save/load time mentioned here. I did a couple of measurements:

108 seconds to convert to H5 via Python (89 seconds for geom iteration + ~19 seconds for H5 processing).

29 seconds to load a chunked H5 into Blender. (26 seconds of H5 loading / processing, and ~3 seconds of creating Blender objects)

I was hoping that merely using C++ / multithreading would be enough to crush either the 19 or 26 seconds.

Moult commented Feb 21, 2024

The good news is that with my attempt at porting the loading code to C++, the 29 seconds it took to load a H5 into Blender has now dropped to 6.6 seconds. Woohoo! (half the time loading the H5, and the other half creating Blender objects). No multithreading was used.

I found a crazy behaviour in H5 where getting a subgroup name was very, very slow. Maybe that explains why the Python code `for shape_id, shape in model["shapes"].items()` is so slow.

I also found that the casting from SWIG wrapped vector to Python list was very slow too. I got around this by implementing numpy.i.

The bad news is that I've gone past the point of knowing what I'm doing and I have absolutely no idea how this now compiles (I manually copied over numpy.i and the numpy include .h directory, and manually included something in CMakeLists that obviously only works on my machine). So there's a huge amount of cleanup to do... but hey, I'm still really excited about the numbers! :)

Moult commented Feb 21, 2024

Given there's a week left before the release, here's the coordination use case wishlist I'd ideally like covered by then.

  1. I want to just open a model to inspect it visually as fast as possible
    • chunked loading (direct from geom iterator)
    • save sqlite
    • query sqlite
    • save to blend for future loading (no h5 necessary)
  2. I want to open many large models to inspect it visually with basic properties as fast as possible
    • Blender ui / operators to trigger preprocessing steps
    • save h5
    • save sqlite
    • load h5
    • query sqlite
  3. I want to run clash detection on a model I'm authoring
    • updated ifcclash
    • updated blender clash ui
    • build tree
    • run n:n clash functions
  4. I want to run clash detection on many large models
    • updated ifcclash
    • updated blender clash ui
    • save h5
    • load h5
    • (ideally not build) load tree
    • run n:n clash functions

Moult commented Feb 22, 2024

I tested with saving to H5 via C++ and it seems as though the 108 seconds have also dropped down to 89 seconds for saving a H5. I guess all the overhead in the past was in passing big lists to Python and handling those lists there.

(Note I originally measured 108 seconds, but when remeasuring with my own compiled version of IfcOpenShell it went down to 101 seconds. I wonder if there is some -march=native optimisation compared to IfcOpenBot, or if the latest OCCT 7.7 is faster somehow.)

I did a few measurements:

  • Given %template(FloatVector) vector<float>;, h5_shape.verts is returned as a <ifcopenshell.ifcopenshell_wrapper.FloatVector; proxy of <Swig Object of type 'std::vector< float > *' at 0x7f0f871a8ff0> >. If I use this directly in Blender code (e.g. to create meshes with), for a random file it takes 14.7 seconds. Using the SWIG objects directly is slow, it seems.
  • If I cast to list list(e.verts) and then create meshes using the Python list for all subsequent code it drops to 8.9 seconds.
  • If I use numpy.i and provide a numpy.ndarray directly from C++ then it drops to 6.5 seconds.

I wonder if this means that there could be a benefit in serving TriangulationElement geometry verts/edges/faces as numpy arrays (such as for general geometry iteration that everybody uses). I didn't look in detail as to how that's managed in SWIG but type(shape.geometry.verts) says tuple so maybe it's a different story.
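For reference, a flat verts tuple like the standard iterator returns can be viewed as an (N, 3) array on the Python side; a minimal sketch, assuming the usual x,y,z-interleaved layout:

```python
import numpy as np

# A verts payload comes back as a flat sequence (x0, y0, z0, x1, y1, z1, ...);
# one reshape gives a row per vertex without any per-element Python looping.
flat_verts = (0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0)
verts = np.array(flat_verts, dtype=np.float64).reshape(-1, 3)
print(verts.shape)  # (3, 3)
```

Serving an ndarray directly from C++ (via numpy typemaps) would skip even the tuple construction, which is the saving the 8.9s → 6.5s measurement above points at.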

Moult commented Feb 22, 2024

It's now possible to chunk directly from an IFC (instead of first having to save out a H5). This means that users can press a button in Blender and immediately load and see an IFC at a fast FPS. Something that would previously take almost 5 minutes and leave them browsing around at 3 FPS now takes 83 seconds and runs at 30 FPS.

And this should work for multiple models too! (Once I build the operator for it) Users would also be able to headlessly run it in the background and save out a Blend file and auto-link that blend file to the scene (and then memory used for ifcopenshell.open() should be freed I think). So in a single session they can load in many models and federate them conveniently.

(BTW, you've probably noticed the code getting worse and worse. I'd love to clean it up, but I need your guidance, and there's a lot of magic around SWIG which escapes me. I've definitely gone overscope in this and started working on #4279, which is related but perhaps should be in a separate PR.)

Moult commented Feb 23, 2024

OK, now when loading in a chunked model, you can activate a tool where you can "click" on objects. It'll "highlight" the object you clicked on (it's not a true editable object after all) and it'll query the SQLite db for object properties (see the right hand side of the screenshot).

[Screenshot: the clicked object highlighted in the viewport, with its properties queried from SQLite shown on the right]

… balance RAM (instancing) vs speed (chunking)
 - Only affects non-chunked instances (where verts are higher so the impact will be greater)
 - Things far away or not in view of the camera will be rendered as bounds
Moult commented Feb 27, 2024

Next steps:

  1. Clash code to be reviewed by @aothms to be ready to merge. @Moult to rewrite the ifcclash Python frontend to work with it, and unbundle the hppfcl deps (excluding any H5 functionality related to IfcClash).
  2. @aothms to investigate an alternative to numpy.i so the iterator can return efficiently to Python. @Moult to rewrite chunking in Python, reading from this "buffer" and using numpy as efficiently as possible to do chunking, and measure how this compares to the C++ process_chunk() / get_chunk(). Goal is to 1) reduce the complexity of numpy returns and 2) avoid maintaining extra code branching in the iterator.
  3. @aothms to start separating H5 into its own distinct serialiser, resolving hacks like the 2 chunk definitions, the vector vs double
  4. Adding support for AABB/OBB in H5
  5. Read/write BVH tree tri swap indices
  6. Future: memory research on pure viewer

@aothms aothms mentioned this pull request Feb 27, 2024
Moult commented Feb 28, 2024

Thanks to your awesome work @aothms I think we can close #4369, and tomorrow I'll do a quick commit to delete all the chunking C++ code, and do a bit of clean up to finish integrating it into Blender, and then I'd say action 2 is done!

@Moult Moult closed this Feb 29, 2024
@Moult Moult mentioned this pull request Feb 29, 2024