Feature request: asynchronous loading of mesh objects #533

aksyom · 2017-07-12T21:21:59Z

In an open-world type of game one would want a mechanism to load big chunks of geometry data asynchronously on a background thread. This would enable one to remove unused terrain LOD layers from RAM, as there is a guarantee one can load them from file once again when they are needed. What I mean by LOD layers here are weight-decimated low-poly versions of a full-detail geometry, each version decimated from a specific focal point on the XY-axis. This is a very basic technique, but it is actually VERY efficient because it doesn't waste CPU or GPU time on dynamic LOD generation. Extremely good for machines with a slow CPU and GPU.

Currently both BGE and UPBGE only support asynchronous loading (in a background thread) for Scenes. If scenes are async loaded from a blend, all their objects are merged into the current scene, but they do not work as they should; they only use the lighting from the original scenes they were loaded from. This is a weird quirk that needs workarounds that probably won't work very well.

The only way to instantiate objects is the scene.addObject() method, but it can only instantiate objects from an inactive layer within the current scene. If you put a very large terrain geometry (in chunks or LOD layers) in an inactive layer, the engine has to load ALL of that geometry into RAM so that it can later be instantiated. And in my case, having around 64 LOD layers of terrain with 1M faces each in an inactive layer makes UPBGE load the scene very slowly and use up a ridiculous amount of RAM.

Yes, of course you can do a huge amount of work to split a LOD layer into itty bitty pieces on the XY-axis and then bge.logic.LibLoad() them over multiple update() cycles, but this technique is not only complex, it also probably has higher overhead than a decent background loader for geometry, even on a single-core system. And on a multi-core system the background loading of meshes is going to be faster anyway (and more concise, correct and simpler).

And the most important thing is that a large geometry split into 10K or 100K parts will eventually incur a massive viewport culling overhead, which makes the game grind down to a halt. I have not tested how many objects UPBGE can handle within one scene, but common sense would say not going over a few thousand!

I made a prototype of the layered terrain LOD technique with Panda3D. It was very simple, and I could render HUGE terrains on very low-spec hardware (a laptop with an i3 CPU and Intel HD3000) so that the terrain still looked damn good. And the whole thing used at most the RAM required to hold 9 layers of LOD in memory at once, which was tolerable (around 130 megabytes with each LOD layer having 1M faces or so).

Thus I am only requesting a few things:

Make bge.logic.LibLoad() able to asynchronously load meshes from blends, just like it can do with entire object hierarchies with Scenes. This one improvement would open up great new possibilities for different kinds of games, not only Open World.
Make the objects in an async loaded scene use the lighting in the current active scene (instead of using the lighting in the scene they were loaded from).

If either of my suggestions is implemented, UPBGE would be a lot better for large scale projects.

Thanks ...

EDIT: fixed some misunderstandings, LibLoad()ing a scene does load objects in inactive layers, their references are not in scene.objects but instead inactive objects are in scene.objectsInactive

lostscience · 2017-07-13T12:46:13Z

I would like this feature too. This would be great for digging, similar to Minecraft. It would be great for it to work over many scenes. I have a low-end Hewlett-Packard.

aksyom · 2017-07-13T13:09:37Z

Actually, all scenes from a blend can be loaded asynchronously and merged into the current scene but there are some weird quirks:

the objects that share active layers with the current scene appear on the object list of the current scene and can be manipulated but
the loaded objects do not react at all to the lighting of the current scene; they only use the lighting from the respective scenes they were loaded from and
the objects from the other blend that are in inactive layers do not get loaded at all

The issue of the async loaded objects not using the lights from the current scene is a bit of a deal breaker for me, even if there is a work around. I don't think this is how this should work.

I thought I solved the issue of loading a scene asynchronously and then using the objects within it in the current scene, and closed the ticket for a short time. But then I realized this doesn't quite work as it should.

So yes, we just need a proper mechanism to merge stuff from other blend files into the active scene on the background.

lostscience · 2017-07-13T17:57:01Z

The sooner we get this into UPBGE, the better.

BluePrintRandom · 2017-07-13T19:51:06Z

If the bpplayer bidirectional streaming patch is completed, this should be faster. (load preconverted game data)


aksyom · 2017-07-13T20:22:31Z

Okay, but how does that solve the issue of loading a HUGE blob of data at once while the logic cycle waits for the loading to complete (freezing the frame updates)?

At the minimum the data blobs I will be loading are 10 to 16 MB in runtime format (assuming certain things, that is). Even if this data is pre-converted on disk, that does not change the fact that synchronous loading code needs to read at least 10 MB (and with BGE probably even more) from disk within 1 frame. If you have a very fast SSD you might just do this in under 16 ms (which is the frame interval at 60 FPS), but on most standard magnetic hard disks (esp. laptop hard disks) that's just not gonna happen with the seek delay and all that. Assuming a transfer speed of 100 MB/s, loading 10 MB would take 100 ms, which is 6.25 frames; that is, if I loaded 10 MB of data, I would experience roughly 6 dropped frames.
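The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, the 16 ms frame interval is the rounded figure used above, and seek time is ignored entirely, so a real magnetic disk would do worse.

```python
def blocking_load_cost(blob_mb, disk_mb_per_s, frame_ms=16.0):
    """Return (load time in ms, frames missed) for one blocking read.

    Hypothetical helper: ignores seek latency and conversion time.
    """
    load_ms = blob_mb / disk_mb_per_s * 1000.0
    return load_ms, load_ms / frame_ms


load_ms, frames = blocking_load_cost(blob_mb=10, disk_mb_per_s=100)
print(f"{load_ms:.0f} ms blocked, {frames:.2f} frames missed")
# 100 ms blocked, 6.25 frames missed
```

Plugging in the 16 MB upper bound, or a slower laptop disk, only makes the stall longer.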

On the other hand, if you have working asynchronous loading for objects, the logic code does NOT need to wait for a loading operation to complete. The loading will be handled in the background, and there will be no impact on frame rate whatsoever.

Honestly, I don't mind even if the data is stored in a format which is slow to convert into runtime. That does not even matter, because if you have working asynchronous background loading and conversion for objects, you can split the scene into reasonably sized pieces and then background load them in a calculated priority order. Because each blob is loaded and converted in its own thread on a separate core, there won't be any lag. I have proven this to work previously on other engines.

I am going to try streaming data in from a blend on a Python thread (or by iteratively loading data in on the main logic thread, say, max 100K chunks per logic cycle), then use LibLoad() on that data when it has been fully loaded. But the problems with this method are:

it still wastes valuable logic time because it runs with it on a single core even on a multicore system
the conversion of that data blob on LibLoad might still take more than 16ms and cause frame lag
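The iterative variant described above (read a bounded chunk per logic cycle, then hand the complete buffer to LibLoad()) could look roughly like this. It is a sketch only: `IncrementalReader` and `CHUNK_SIZE` are names invented here, and an in-memory stream stands in for the blend file so the example runs outside the engine.

```python
import io

CHUNK_SIZE = 100 * 1024  # roughly the "max 100K per logic cycle" mentioned above


class IncrementalReader:
    """Reads a stream one bounded chunk at a time (hypothetical helper)."""

    def __init__(self, stream, chunk_size=CHUNK_SIZE):
        self._stream = stream
        self._chunk_size = chunk_size
        self.data = bytearray()
        self.done = False

    def step(self):
        """Read at most one chunk; meant to be called once per logic cycle."""
        if not self.done:
            chunk = self._stream.read(self._chunk_size)
            if chunk:
                self.data.extend(chunk)
            else:
                self.done = True  # end of stream reached
        return self.done


# Stand-in for an opened blend file: 250 KiB of zero bytes.
reader = IncrementalReader(io.BytesIO(bytes(250 * 1024)))
cycles = 0
while not reader.step():
    cycles += 1  # in the game, one logic cycle would pass here
```

Once `reader.done` is true, `bytes(reader.data)` would be passed as the data argument of LibLoad(). That final conversion step is still synchronous, which is exactly the second problem listed above.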

My only logical conclusion is that only a truly asynchronous background loading of data actually supports really massive Open World data sets in a way that scales with the hardware. I can't really stress this enough.

BluePrintRandom · 2017-07-14T06:08:17Z

https://www.youtube.com/watch?v=tZOW8msMWk0 :D


aksyom · 2017-07-14T09:51:22Z

Unless this loading of geometry is done in a separate thread on a separate core then it has nothing to do with what I am asking for.

What I am trying to explain here is the importance of a threaded, asynchronous background loading for serialized assets, be these assets big or small. It is the only solution that provides best performance results on multicore hardware, which is available everywhere these days. Without such a solution you will always get needless performance overhead from loading assets.

And even if you don't care about performance in this regard, there are just certain things you absolutely cannot do if you are dealing with massive terrain meshes with in-game view distance set up to 8 km! Let me explain ...

When your view distance is up to 20km, you also make the terrain mesh span an 8km x 8km area. Further, the terrain mesh has been weight-decimated into multiple versions, one for each focal point, and each focal point is in the center of a 1024x1024 region. This yields up to 64 terrain meshes, one for each 1024x1024 region. Each of these weight-decimated meshes has 1M faces and takes 64-80 MB of disk space in a compressed blend file. When the camera moves from one region to another, a terrain mesh for that specific region needs to be loaded. In order to load a mesh like this into the game, one cannot simply LibLoad() it as a mesh from a blend, because that would freeze the game for a dozen seconds or so! The only working approach for this is to asynchronously load each mesh in the background so that it doesn't freeze the game logic cycles.
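To make the region bookkeeping concrete, here is a small sketch of how a camera position could be mapped to one of the 64 regions, plus the block of regions around it that one might keep cached. The assumptions are mine, not from the engine: the terrain origin sits at one corner, one world unit is one metre, and all names are illustrative.

```python
REGION_SIZE = 1024  # world units along one region edge
GRID = 8            # 8 x 8 regions, i.e. 64 terrain meshes


def region_of(x, y):
    """Map a world position to a (col, row) region index, clamped to the grid."""
    col = min(max(int(x // REGION_SIZE), 0), GRID - 1)
    row = min(max(int(y // REGION_SIZE), 0), GRID - 1)
    return col, row


def neighbourhood(col, row):
    """Regions within one step of (col, row), itself included (at most 9)."""
    return [
        (c, r)
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
        if 0 <= c < GRID and 0 <= r < GRID
    ]


current = region_of(1500.0, 5000.0)  # camera position in world units
wanted = neighbourhood(*current)     # candidates for background loading
```

A mesh switch would then be triggered whenever `region_of()` returns a different index than on the previous logic cycle.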

But the async loading does not work for mesh objects at all. And async loading a scene does not work, because the loaded objects and meshes do not work with the lighting in the active scene (which is highly illogical in my opinion).

Thus I've tried using Python threading to load the blend data into bytes while yielding between reads (to give CPU time to logic cycles), and then LibLoad() when all is read. But synchronous LibLoad() conversion for a 1M mesh freezes the game! That is because the conversion of Blender data into BGE runtime data is done on the same thread as all the other game engine logic, and the conversion does not yield between discrete operations for some reason.

The only quick and dirty workaround would be to split each terrain mesh into small pieces, and then read in and LibLoad() each piece separately just to make sure that no single LibLoad() incurs a frame lag. But to split each mesh into small enough pieces, I would probably have to split them into around 64 or even 256 sub-meshes. And each sub-mesh would need to be stored in its own blend file, which is inconvenient.

But having to split large terrain geometry into too many small meshes and loading them separately is a bad idea because:

It takes a lot of extra time to splice the geometry in Blender when the original geometry is even 4M faces. I normally splice a terrain geometry into 8x8 chunks just so that viewport culling works for the mesh. Of course I do it with my own tool script, but even then it takes a long time on my Core-i5 processor. But splicing it further into, say, 64x64 would take exponentially more time. I don't want to waste time unless absolutely necessary.
The more meshes you have in a scene, the more overhead there will be in culling. That is, if I had an even larger terrain mesh split into 1024 meshes, the 1024 meshes would certainly cause frame lag unless you have a powerful processor. And even if you did not care about this, I don't want to make a game where too much CPU time is spent on things when it could be more wisely spent elsewhere. This is why my original strategy of splitting the terrain into 8x8 pieces is way better.

Thus, again, we would not need to care about this, if we just had a background loading for meshes. But we don't. And this is why I am asking for this feature.

sdfgeoff · 2017-07-14T13:54:04Z

The only thing that libload async does synchronously is merging the scenes. This is something that cannot be avoided or multithreaded. So far as I know, everything else (loading, decompressing) is done in another thread. But I'm no expert in the behind the scenes for LibLoad.

Request 2

they only use the lighting from the original scenes they were loaded from

If you libload some lights and then load some models, the models are lit by the lights that came before. However, they are not (except for the scene's original objects) lit by lights that came after. Yes, this is likely a bug. One workaround: load a bunch of lights and then position them around. If you've got a loading screen and don't mind adding extra time at startup (doesn't apply here), you can recompile the shaders by toggling one of the GLSL options to ensure all the objects show all the lights. E.g.:

        import bge
        # Toggle a GLSL material setting off and on to force a shader recompile.
        bge.render.setGLSLMaterialSetting("nodes", 0)
        bge.render.setGLSLMaterialSetting("nodes", 1)

Also, since the objects in inactive layers are not loaded at all, one cannot use scene.addObject() to dynamically add/remove objects.

Uhm, I do this all the time. Objects on inactive layers are definitely loaded in my blend files!

For things like terrain, investigate other options than loading in world chunks. Consider separating your level format from the .blend format, and instead representing it as a bunch of textures (e.g. heightmaps/vectormaps) and transformation matrices for entities. Then you can load all the entities at game start, generate terrain geometry on the fly (e.g. inside a vertex shader for graphics, and probably one big physics mesh re-instanced at game start) and so on, with no need to stream things in and out. This is how they did it in Halo Wars.
Yes, it doesn't work if things are massive enough, or if you want utter and complete control over every vertex of every object in every part of the world.

aksyom · 2017-07-14T15:24:09Z

If you libload some lights and then load some models, the models are lit by the lights that came before. However, they are not (except for the scenes original objects) lit by lights that came after. Hence: load a bunch of lights and then position them around. Yes, this is likely a bug.

I think this bug should be fixed, because in my experience LibLoad() on a Scene does not work properly. I will put more details on a comment below ...

Uhm, I do this all the time. Objects on inactive layers are definitely loaded in my blend files!

Yeah, I realized that the inactive objects also get loaded with async scene loading. I have edited my original post to reflect this.

For things like terrain, investigate other options than loading in world chunks. Consider separating your level format from the .blend format, and instead representing it as a bunch of textures (e.g. heightmaps/vectormaps) and transformation matrices for entities. Then you can load all the entities at game start, generate terrain geometry on the fly (e.g. inside a vertex shader for graphics, and probably one big physics mesh re-instanced at game start) and so on, with no need to stream things in and out. This is how they did it in Halo Wars.
Yes, it doesn't work if things are massive enough, or if you want utter and complete control over every vertex of every object in every part of the world.

There are a few things that make me rather averse to making custom shaders as of now. The GLSL facing interface in UPBGE lacks reference documentation upon which to build. In order for me to generate a shader I would need a shader API which provides engine specific helper functions for adding/mixing lighting, shadow mapping, environmental cube mapping and whatnot. This is because if I would make a terrain shader now, lacking API specs, I could probably only implement a Phong-shaded surface with no shadows or anything else. Later I could implement shading and shadow stuff myself if I figure out how to use the custom shader binding routines, but I have no guarantees that the terrain would shade the same as other objects using non-custom shaders because I've no idea on how these different shading models in Blender are implemented.

Even if I'd go with the custom shader route, and I somehow figured out all the maths behind this problem domain, there is a rather hard limit on how massive geometries would work. In order for me to deform the vertices on the 1024x1024 vertices grid, I must keep the grid mesh in one piece. Having a single, 1M faces grid mesh on a scene is bad, because it cannot be culled and all the 1M faces need to be rendered over and over again by the GPU. On a decent GPU, not a problem, but on a crap-tier Intel/Radeon HD, no chance. If the terrain mesh was split into, say, 8x8 submeshes, the viewport culling would make sure that only part of these meshes need be rendered at once.

AFAIK there are only two correct ways of doing terrain shading on the GPU:

Pre-generate grids with resolutions from lowest to highest, send these grids to a geometry shader and let the GS merge and deform them into a single output mesh. That's how one guy did it with Panda3D.
Create a single, low resolution grid, send it to a tessellation shader with a heightmap, and tessellate & deform the mesh according to the heightmap.

I don't know if these optimal solutions could be done on UPBGE. But even if they could I have no experience of how to do either. I just repeat on rote what others have said, but I've no idea about the mathematics behind these optimal solutions.

And the funniest thing is that pre-generated, focal weight-decimated terrain meshes are the most optimal solution GPU-wise, because they do not require any extra FLOPS on the GPU side. Just split the mesh into 8x8 or 16x16 and let viewport culling do its job. Switch from one mesh to another (with asynchronous background loading) depending on camera position. Modern hard disks are able to output approximately 100 MB/s of streaming data, which is not a bottleneck, since the mesh will be switched to another only when the other mesh is loaded (i.e. lazy loading). If the HD is really slow, you will only notice a small drop in terrain quality, and that happens only for a moment until the mesh for a region has loaded. The only thing to keep in mind is that you need 2x the mesh size in VRAM to do the switch, and 9x the mesh size in RAM to keep the meshes for all surrounding regions cached. The only downside is that it requires more VRAM and RAM, but this is not really an issue if you use a minimal vertex array format (16 bytes per vertex, XYZ + normal, no color or any other useless stuff).
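As a rough illustration of that budget, the sketch below packs a 16-byte vertex and derives the 2x VRAM and 9x RAM figures from it. The layout (three float32 for XYZ, the normal squeezed into three signed bytes plus one padding byte) and the vertex count are assumptions made for the example, and index data is left out.

```python
import struct

# Assumed layout: XYZ as 3 x float32 (12 bytes) + packed normal with padding (4 bytes).
VERTEX_FORMAT = "<3f4b"
VERTEX_BYTES = struct.calcsize(VERTEX_FORMAT)  # 16


def memory_budget_mb(vertices_per_mesh):
    """Vertex-data footprint of one mesh and the budgets derived from it, in MB."""
    mesh_mb = vertices_per_mesh * VERTEX_BYTES / 1e6
    return {
        "mesh": mesh_mb,
        "vram_for_switch": 2 * mesh_mb,   # old and new mesh resident at once
        "ram_for_cache": 9 * mesh_mb,     # current region plus its 8 neighbours
    }


# Hypothetical vertex count; the real number depends on the decimated mesh.
budget = memory_budget_mb(vertices_per_mesh=500_000)
```

With these assumed numbers one mesh is 8 MB of vertex data, so the figures stay small next to what even a low-end machine has available.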

As for storing transformation matrices in a separate file, I think you can just fill your world with Empty objects and parent spawned objects to them. And if too many empties cause overhead, I can just partition the world into regions and merge each region's empties from a different blend file when needed. The advantage with this is that I can temporarily link the empties from an external blend if I need to see them in the viewport in order to get the bigger picture. You can't do that with matrices stored in a text file, well, unless you write a Blender plugin which reads these text files and places empties in the scene based on whatever ... but that's outside the scope of my project for now. I am not going to create a level editor for now; Blender shall do, and thus I'll use whatever is already available.

aksyom · 2017-07-14T16:59:55Z

Okay, I have now tested LibLoad() on Scenes, and in my experience it does not work in any way that I could work around. I will provide you with a set of 3 blend files, where the libload_test.blend is the main file:
https://drive.google.com/open?id=0B3u1MJ_t35aQelJJWUFYUmROb0U

When I run the libload_test.blend, it has an empty scene and a component script on an Empty which is supposed to do 2 things:

load scenes from a set of blends synchronously, and after that
load scenes from another set of blends asynchronously

In this case the libload_test.blend has been set (via component arguments) to first load libload_test_lights.blend sync, and after that load libload_test_sphere.blend async.
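For readers without the test files, the load order described above might be scripted along these lines. This is not the actual component from libload_test.blend, just a sketch: the loader is passed in as a parameter so the snippet runs outside the engine, and the keyword name for background loading is assumed to be `async` as in the BGE 2.7x API (it is a reserved word in newer Python versions, hence the dict unpacking, and other builds may spell it differently).

```python
def load_libraries(lib_load, sync_blends, async_blends, on_finish, async_kw="async"):
    """Load sync_blends blocking, then queue async_blends for background loading.

    lib_load is expected to behave like bge.logic.LibLoad and to return a
    status object that accepts an onFinish callback.
    """
    for path in sync_blends:
        lib_load(path, "Scene")

    statuses = []
    for path in async_blends:
        # Passed via a dict because the keyword may be a reserved word.
        status = lib_load(path, "Scene", **{async_kw: True})
        status.onFinish = on_finish  # called once the background load is done
        statuses.append(status)
    return statuses


# Inside the engine this would be called roughly as:
#   load_libraries(bge.logic.LibLoad,
#                  ["libload_test_lights.blend"],
#                  ["libload_test_sphere.blend"],
#                  on_finish=my_callback)
```

The shader recompile trick would go inside the `on_finish` callback, since only then is the sphere scene guaranteed to be merged.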

After libload_test_sphere.blend has been loaded, I will also do:
bge.render.setGLSLMaterialSetting("nodes", 0)
bge.render.setGLSLMaterialSetting("nodes", 1)

... to recompile shaders hoping that this will make light affect objects.

If @sdfgeoff explained correctly, then all the objects from libload_test_sphere.blend should be lit by the lights loaded previously from libload_test_lights.blend. But this is not the case.

When libload_test.blend runs, it loads the lights scene and then the sphere scene. However, the light only affects the plane -object that is part of the lights scene, and all the objects in the main scene. The loaded light does not affect the sphere from the sphere scene, which is loaded after the lights scene.

Curiously, if I do not force recompile of shaders via the trick mentioned above, the loaded lights affect only the plane object that is in the lights scene.

And an absolute deal breaker with this method, even if it would work, is that none of the lights loaded from a library cast shadows, even if I have explicitly enabled shadows on the lights. I had noticed this discrepancy earlier, but did not report it because I can work around that.

Thus, the problem is this:

lights LibLoaded() from external scene do not cast shadows, even though they can be forced to light objects in the current scene; if I am to use lights LibLoaded() separately, I absolutely need them to cast shadows
objects LibLoaded() from an external scene do not react to any lights other than those that are part of their original scene

Just try out the blend files I posted here and you'll see what I mean.

So, I would humbly ask somebody to fix this bug, and make lights affect the objects LibLoaded() from an external file's scenes. It would greatly improve the overall applicability of UPBGE. Until these things are fixed, and/or until I get good documentation on the UPBGE shader API, I will continue by implementing a split/chunked terrain LOD merger algorithm, as I can't figure out any other way to continue from here.

EDIT: the UPBGE version pulled at 2017-07-16 crashes when I do bge.render.setGLSLMaterialSetting("nodes", 0), this is definitely a bug because it worked on previous version I built from older sources.

Previously only scenes supported asynchronous libloading. But the loading of meshes and of scenes is similar in that both use a scene converter, only with a different procedure for the data to convert and register to this scene converter. This commit introduces a more flexible KX_LibLoadStatus using a lambda function, which receives as argument one of the scene converters listed in KX_LibLoadStatus and processes the conversion. This lambda is created in BL_Converter::LinkBlendFile at the same time as the list of scenes used to create the scene converters is built. KX_LibLoadStatus is constructed by passing the function and the scene list. BL_Converter also replaces the usage of the Blender task scheduler with TBB. A tbb::task_group and a std::mutex are now held. The function BL_Converter::ConvertLibraryTask is in charge of calling the conversion function, and this function can be called by a task group or manually, which helps to reuse code. As before, BL_Converter::LinkBlendFile is responsible for doing a direct conversion or launching an asynchronous conversion task. Fix issue: #533.

lordloki · 2020-05-03T13:03:05Z

Closing under the new tracker rules, but with the open and feature-for-later tags added.

aksyom closed this as completed Jul 13, 2017

aksyom reopened this Jul 13, 2017

panzergame added the feature request label May 26, 2018

panzergame mentioned this issue Oct 20, 2018

UPBGE: Allow asynchronous mesh libload. #897

Open

lordloki added feature request for later open and removed feature request labels May 3, 2020

lordloki closed this as completed May 3, 2020
