
Investigation: Use more performant JSON libraries #155

Open
cheahjs opened this issue Feb 21, 2024 · 3 comments
Labels
performance Issue related to improving performance of the library

Comments

@cheahjs
Owner

cheahjs commented Feb 21, 2024

The Python json stdlib is not known for its speed, and JSON dumping accounts for a significant fraction of the time taken to convert a save file to JSON.

Investigate whether it is feasible to use one of the higher-performance libraries such as orjson or ujson, and how much of an improvement they provide.

It is not a general solution: most alternative JSON libraries require strict UTF-8 compliance, which is incompatible with Unreal's treatment of UTF-16 as arbitrary 16-bit code units. Currently, surrogatepass is used to encode invalid characters as surrogate pairs, but this is not possible in a UTF-8-only environment.
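To illustrate the incompatibility, here is a minimal stdlib-only sketch (orjson/ujson are mentioned only as examples of strict UTF-8 encoders; they are not used here):

```python
import json

# Unreal may store arbitrary 16-bit code units in strings, including
# lone UTF-16 surrogates that are not valid Unicode scalar values.
s = "\ud83d"  # unpaired high surrogate

# The stdlib serialiser accepts it when not escaping to ASCII...
dumped = json.dumps({"name": s}, ensure_ascii=False)

# ...but strict UTF-8 encoding (what orjson/ujson require) rejects it:
try:
    dumped.encode("utf-8")
    raise AssertionError("expected UnicodeEncodeError")
except UnicodeEncodeError:
    pass

# The surrogatepass error handler writes the surrogate as (non-standard)
# UTF-8 bytes and can round-trip it back:
raw = dumped.encode("utf-8", "surrogatepass")
assert raw.decode("utf-8", "surrogatepass") == dumped
```

A strict UTF-8 serialiser has no equivalent of `surrogatepass`, which is why it cannot be a drop-in replacement for all save files.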

@cheahjs cheahjs added the performance Issue related to improving performance of the library label Feb 21, 2024
@AntiMoron

There are still some things we can leverage to optimize the JSON output. From my observation, the following can be done to compress it further:

  1. Even with `--minify-json` on, there are still spaces in the output; remove them.

  2. The JSON follows a recurring pattern:

  • Values are wrapped by keys such as 'values', 'value', 'RawData', and 'object'; remove those redundant wrappers.
  • Many UUIDs are all zeros ('00000000-0000-0000-0000-000000000000'); change those to null, or better yet, omit the field entirely.
  • Don't output any fields whose value is null.
  • Field-name compression: provide a map and replace the original keys, e.g. type -> t, value -> v, values -> vs.
  • Make the output a stream. It is currently not generated as a stream, which surely costs a lot of RAM. Since we are generating JSON, we could do that by concatenating strings or using a library (I'm not good at Python, so I don't know of any).

After these changes it should be very fast, and the output file should shrink by around hundreds of MB.
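As a rough stdlib-only sketch of the key-compression, null-dropping, minification, and streaming ideas above (`KEY_MAP` and `compress_keys` are hypothetical names for illustration, not part of the library):

```python
import json

# Hypothetical key map, as suggested above (not part of the library).
KEY_MAP = {"type": "t", "value": "v", "values": "vs"}

def compress_keys(obj):
    """Recursively shorten keys and drop null-valued fields."""
    if isinstance(obj, dict):
        return {KEY_MAP.get(k, k): compress_keys(v)
                for k, v in obj.items() if v is not None}
    if isinstance(obj, list):
        return [compress_keys(v) for v in obj]
    return obj

# A made-up fragment shaped like the save-file JSON described above.
data = {"type": "ArrayProperty", "value": None,
        "values": [{"type": "IntProperty"}]}

# separators=(",", ":") removes the spaces json.dumps leaves after
# commas and colons by default.
out = json.dumps(compress_keys(data), separators=(",", ":"))
assert out == '{"t":"ArrayProperty","vs":[{"t":"IntProperty"}]}'

# Streaming: JSONEncoder.iterencode yields the document in chunks,
# so the full output string never has to sit in memory at once.
encoder = json.JSONEncoder(separators=(",", ":"))
assert "".join(encoder.iterencode(compress_keys(data))) == out
```

Note that key compression and wrapper removal change the output schema, so (as discussed below) they would break existing consumers of the JSON format.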

@AntiMoron

Also, many thanks for the good work on analysing and parsing. Currently, resource usage and parsing speed matter: on dedicated servers I have to set --cpus=0.5 in Docker to make sure the server won't go down while processing the save file.
So please consider doing this in a faster language like C++ or Rust.

@cheahjs
Owner Author

cheahjs commented Feb 26, 2024

I will point you to a previous comment I made about building this in a different language: #83 (comment)

I will not make any changes to the JSON output at this time: enough time has passed that any change would have a significant downstream impact on existing users.

If you need faster performance:

  1. Don't output to JSON and operate on the Python dictionary directly
  2. Use https://github.com/magicbear/palworld-server-toolkit which has implemented various optimisations on top of this library
