Java heap space OOM error and multifile processing #14

Closed
mwussow opened this issue Jun 5, 2020 · 3 comments

Comments

@mwussow

mwussow commented Jun 5, 2020

Dear Claus,

thanks for this great tool!
There are two errors that I frequently encounter when using citygml-tools:

1) Running out of Java heap space:
I have 8 GB of RAM on my machine, yet I often run out of memory (OOM) when using citygml-tools. I already increased the default heap size with export _JAVA_OPTIONS="-Xmx6g", which made this error less common, but I still encounter it when processing large files (e.g., >3 GB) or many files (e.g., 100+ files).

2) File sizes blow up when processing multiple files at once:
I am trying to convert ~3k GML files (each ~100 MB) to CityJSON. Converting several files at once seems to be possible by providing the path to the folder where they are stored, but the size of each subsequent output file increases linearly, which leads to dramatically oversized JSON files when processing 100+ files in one run.

I would highly appreciate any advice on how to process multiple files efficiently and any hints on how to fix the above issues.

Thanks,
Moritz

@clausnagel
Member

Thanks for your feedback.

  1. CityGML files currently must be loaded into main memory to be converted to CityJSON. Chunk-wise processing would help keep the memory footprint low, for example by reading only one cityObjectMember at a time and writing it directly to the CityJSON target file. However, CityJSON currently does not support chunk-wise processing well (see Chunk-wise parsing/writing of CityObjects cityjson/specs#6). The editors are aware of this issue, and there are first proposals for solving it.

    In the meantime, to avoid OOM errors, only run citygml-tools on CityGML files that are small enough to be loaded into main memory.

  2. This sounds like a bug. It should, of course, be possible to convert a folder of CityGML files in one run as long as each file fits into main memory. It looks like there is a memory leak in the code.

I will look into 2 and report back soon. If you are able to share your datasets, I'm happy to use them in my tests.
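Point 1 above describes chunk-wise processing: reading one cityObjectMember at a time instead of loading the whole file. The reading half of that idea can be sketched with Python's standard xml.etree.ElementTree.iterparse. This is only an illustration under assumptions, not how citygml-tools (a Java tool) works internally: iter_city_object_members is a hypothetical helper, the sample input assumes CityGML 2.0 namespaces, and the writing half is left out because CityJSON did not yet support chunk-wise writing well.

```python
import io
import xml.etree.ElementTree as ET

def iter_city_object_members(source):
    """Yield top-level cityObjectMember elements one at a time (hypothetical helper).

    Members that have been processed are detached from the root, so memory use
    depends on the size of one member rather than on the size of the whole file.
    """
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the CityModel root element
            depth += 1
        else:
            depth -= 1
            # depth == 1 here means elem was a direct child of the root
            if depth == 1 and elem.tag.endswith("}cityObjectMember"):
                yield elem
                root.clear()  # drop the members (and other children) parsed so far

GML_ID = "{http://www.opengis.net/gml}id"

# Tiny illustrative input with two buildings (made-up IDs)
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<core:CityModel xmlns:core="http://www.opengis.net/citygml/2.0"
    xmlns:bldg="http://www.opengis.net/citygml/building/2.0"
    xmlns:gml="http://www.opengis.net/gml">
  <core:cityObjectMember><bldg:Building gml:id="B1"/></core:cityObjectMember>
  <core:cityObjectMember><bldg:Building gml:id="B2"/></core:cityObjectMember>
</core:CityModel>"""

ids = [member[0].get(GML_ID) for member in iter_city_object_members(io.BytesIO(SAMPLE))]
print(ids)  # ['B1', 'B2']
```

Each member must be fully handled (for example, converted and written out) before the loop asks for the next one, because the helper detaches it from the tree afterwards.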

@mwussow
Author

mwussow commented Jun 5, 2020

Thanks for your prompt reply! A workaround that seems to work for me is to run a python script that executes citygml-tools for each file individually:

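import os
import subprocess
from tqdm import tqdm

# Placeholders: replace with your own paths
path = '[path_to_folder]'
tool = '[path_to_citygml-tools]/citygml-tools-1.3.2/bin/citygml-tools'

# List the folder given in `path` (not the current working directory)
files = [f for f in os.listdir(path) if f.endswith('.gml')]

# Run citygml-tools once per file, each in its own process
for f in tqdm(files):
    subprocess.run([tool, 'to-cityjson', os.path.join(path, f)])
print('done')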
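The same one-process-per-file idea can be combined with the heap setting from the original report. In this sketch (an assumption-based variant of the script above, not part of citygml-tools), _JAVA_OPTIONS is passed only to the child processes instead of being exported for the whole shell, and files above an optional size limit are set aside, following the advice to run the tool only on files that fit into main memory. The function names conversion_jobs and run_jobs and the max_bytes threshold are hypothetical; a suitable limit depends on your heap size and data and has to be found by trial.

```python
import os
import subprocess

def conversion_jobs(folder, tool, heap="-Xmx6g", max_bytes=None):
    """Return (jobs, skipped): one (command, env) pair per .gml file, plus oversized files."""
    # Copy the current environment and add the JVM heap option; this mirrors
    # `export _JAVA_OPTIONS="-Xmx6g"` but only affects the child processes.
    env = dict(os.environ, _JAVA_OPTIONS=heap)
    jobs, skipped = [], []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".gml"):
            continue
        path = os.path.join(folder, name)
        if max_bytes is not None and os.path.getsize(path) > max_bytes:
            skipped.append(path)  # probably too large to load into memory at once
            continue
        jobs.append(([tool, "to-cityjson", path], env))
    return jobs, skipped

def run_jobs(jobs):
    """Run each conversion in its own process and return the inputs that failed."""
    failed = []
    for cmd, env in jobs:
        # A fresh JVM per file means its heap is released when the file is done.
        if subprocess.run(cmd, env=env).returncode != 0:
            failed.append(cmd[-1])
    return failed
```

For example, call conversion_jobs('[path_to_folder]', '[path_to_citygml-tools]/citygml-tools-1.3.2/bin/citygml-tools', max_bytes=...) with a limit of your choosing, then pass the returned jobs to run_jobs; it returns the input paths whose conversion exited with an error, and skipped lists the files left for later handling.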

@clausnagel
Member

Ok, I fixed the following issues:

  • Fixed memory leak of the to-cityjson command when converting multiple files in one run
  • Fixed increasing CityJSON file sizes when running the to-cityjson command on multiple files

Both fixes are available in the master branch (aae72a2). Could you please build a new version of citygml-tools from master and test whether the fixes solve your issues? Let me know if you need help with building citygml-tools from source.
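The file-size symptom from the original report (each output larger than the previous one, growing linearly) is typical of per-file state that is never reset between files. The sketch below is a purely hypothetical illustration of that pattern; the thread does not say what the actual cause in citygml-tools was. It relies only on the fact that a CityJSON file stores its coordinates in one top-level vertices array; convert_file and the toy input are made up, and the returned dict is not valid CityJSON.

```python
def convert_file(vertices, shared_pool=None):
    """Build a minimal CityJSON-like dict for one input file (hypothetical).

    If shared_pool is given, vertices are appended to a list that outlives the
    call, reproducing the leak pattern; otherwise each file gets fresh state.
    """
    pool = shared_pool if shared_pool is not None else []
    pool.extend(vertices)
    return {"type": "CityJSON", "vertices": list(pool)}

# Three toy "files" with 100 vertices each
files = [[[i, i, 0] for i in range(100)] for _ in range(3)]

shared = []  # state kept across the whole run: the bug
leaky_sizes = [len(convert_file(v, shared)["vertices"]) for v in files]
fixed_sizes = [len(convert_file(v)["vertices"]) for v in files]
print(leaky_sizes)  # [100, 200, 300]
print(fixed_sizes)  # [100, 100, 100]
```

The leaky variant also explains why memory use rises over a long run: the shared list keeps every earlier file's data reachable.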

mwussow closed this as completed Jun 8, 2020