
Add asynchronous processing option via coroutines #6

Closed

oratory wants to merge 6 commits

Conversation

@oratory commented Apr 12, 2021

This adds an async=true config option that makes the compress and decompress functions call coroutine.yield() periodically. This removes or reduces framerate loss while processing large strings.
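For illustration, the yielding idea can be sketched outside Lua with a Python generator, using Python's incremental zlib API as a stand-in for LibDeflate (the names here are hypothetical, not the library's API):

```python
import zlib

def compress_async(data, chunk_size=64 * 1024):
    """Compress `data` incrementally, yielding control between chunks
    so the caller's event loop (e.g. a game frame) can keep running."""
    co = zlib.compressobj()
    parts = []
    for i in range(0, len(data), chunk_size):
        parts.append(co.compress(data[i:i + chunk_size]))
        yield None                 # not done yet; caller regains control
    parts.append(co.flush())       # finish the deflate stream
    yield b"".join(parts)          # final value: the compressed bytes
```

The caller drives the generator once per frame; the last yielded value is the result.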

@SafeteeWoW (Owner)

coroutine.yield() is not the preferred solution.

I would like a DecompressBegin/DecompressContinue/DecompressEnd style API, so the decompression can be done in chunks.
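The Begin/Continue/End shape suggested here could look roughly like the following Python sketch (method names are hypothetical, with Python's incremental zlib object standing in for the library's decompressor):

```python
import zlib

class ChunkedDecompressor:
    """Sketch of a DecompressBegin/DecompressContinue/DecompressEnd
    style API: the caller feeds compressed input one chunk at a time."""
    def begin(self):
        self._obj = zlib.decompressobj()
        self._parts = []

    def continue_(self, chunk):
        # Feed one chunk of compressed input; call once per chunk,
        # e.g. once per frame, to spread the work out.
        self._parts.append(self._obj.decompress(chunk))

    def end(self):
        self._parts.append(self._obj.flush())
        return b"".join(self._parts)
```

The per-call cost is bounded by the chunk the caller chooses to feed, which is what makes the by-chunk design frame-friendly.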

@SafeteeWoW (Owner)

I agree a coroutine is an easy way. If you can provide an example of a WoW addon that really needs this feature, add some tests in tests/Test.lua covering the new code in LibDeflate.lua, and pass the CI without affecting backward compatibility, I may merge this.

@@ -92,6 +92,39 @@ local decompress_deflate_with_level = LibDeflate:DecompressDeflate(
compress_deflate_with_level)
assert(decompress_deflate_with_level == example_input)

-- Compress with async coroutine, within World of Warcraft
@SafeteeWoW (Owner)

Your changes to examples/example.lua do not work outside World of Warcraft.
You need to ensure WoW-specific code does not run outside of WoW.
This is not a WoW-specific library.

@SafeteeWoW (Owner)

There is no need to use the WoW API here.
Just use a plain coroutine as the example.

@@ -2732,13 +2740,14 @@ end
-- If the decompression fails (The first return value of this function is nil),
-- this return value is undefined.
-- @see LibDeflate:CompressDeflate
function LibDeflate:DecompressDeflate(str)
local arg_valid, arg_err = IsValidArguments(str)
function LibDeflate:DecompressDeflate(str, configs)
@SafeteeWoW (Owner)

Add some tests in tests/Test.lua, which should cover all new code in LibDeflate.lua.

LibDeflate.lua Outdated
local arg_valid, arg_err = IsValidArguments(str)
function LibDeflate:DecompressDeflate(str, configs)
if not configs then configs = {} end
local arg_valid, arg_err = IsValidArguments(str, false, nil, true, configs)
@SafeteeWoW (Owner)

Do not reuse "IsValidArguments", which is for Compress

@SafeteeWoW (Owner)

For completeness, add coroutine support for all Compress/Decompress APIs

@SafeteeWoW (Owner) commented Apr 12, 2021

I guess this is for WeakAuras during Import?
How much framerate improvement by adding this feature?

@oratory (Author) commented Apr 12, 2021

> I guess this is for WeakAuras during Import?

WeakAuras might use it but I primarily made these changes for a WagoLib I'm making to easily let any addon add Wago.io support.

> How much framerate improvement by adding this feature?

This isn't exactly comprehensive but here are in-game results with a fairly large 3.5MB table test:

Async

Serialize Time: 0.931 seconds
Compress Time: 14.345 seconds
Average 7.7 FPS over 15.276 seconds

Sync

Everything runs in one frame, so GetTime() does not update, but with a stopwatch I timed the combined serialize+compress process at 15.17 seconds.
The game is locked for those 15.17 seconds.

@rossnichols

@oratory if you want to measure the performance of the synchronous version, take a look at the debugprofilestart() and debugprofilestop() APIs.

@SafeteeWoW (Owner)

@oratory I plan to make a new version this week to merge your change. I need to add some tests for code coverage.

@oratory (Author) commented Apr 13, 2021

Great, I'll make some updates and write the tests today or tomorrow.

@oratory (Author) commented Apr 18, 2021

I've cleaned up the lib and removed the WoW-specific stuff. I also added a handful of async tests, and all are passing.

@SafeteeWoW (Owner)

  1. No time to make a release this week. Hopefully next week.
  2. I think the name "async" is not appropriate, especially after your most recent change.
    Maybe "partitioned"? You decide, but "async" is misleading.

LibDeflate.lua Outdated

local isWoW = function()
  return CreateFrame and true or false
end
@SafeteeWoW (Owner)

Remove the added WoW-specific code.

@SafeteeWoW (Owner)

Please help test whether the "async" option significantly decreases compression/decompression speed in WoW.
If so, you may need to add a "chunk_size" option to the config.
I am unable to test in-game at the moment.

@SafeteeWoW (Owner) commented Apr 18, 2021

Will change the tab indentation to 4-space indentation later, but for this PR please remove all format-only changed lines.

P.S. This lib uses tab indentation because WoW's official Lua code used tabs when this lib was first released.

LibDeflate.lua Outdated
end
return CompressZlibInternal(str, dictionary, configs, callback or true)
end

@SafeteeWoW (Owner) commented Apr 18, 2021

What is this function used for?

@SafeteeWoW (Owner)

If you add a new public API, modify TestExported in Test.lua.
This is why the CI fails.

LibDeflate.lua Outdated
@@ -105,7 +105,7 @@ do
-- 1 : v1.0.0
-- 2 : v1.0.1
-- 3 : v1.0.2
local _MINOR = 3
local _MINOR = 4

@SafeteeWoW (Owner) commented Apr 18, 2021

Do not modify the version in this PR. I will handle it.

LibDeflate.lua Outdated
local total_bitlen, result = FlushWriter(_FLUSH_MODE_OUTPUT)
local padding_bitlen = (8-total_bitlen%8)%8
return result, padding_bitlen
end
@SafeteeWoW (Owner)

Try to remove duplicate code

LibDeflate.lua Outdated
local bitlen_left = state.ReaderBitlenLeft()
local bytelen_left = (bitlen_left - bitlen_left % 8) / 8
return result, bytelen_left
end
@SafeteeWoW (Owner)

Try to remove duplicate code

@oratory (Author) commented Apr 18, 2021

> Please help test whether the "async" option significantly decreases compression/decompression speed in WoW.
> If so, you may need to add a "chunk_size" option to the config.

Will do. I think a "max_time_spent_per_chunk" option might be better in this case, but I suppose I could add both.

@SafeteeWoW (Owner)

I don't like max_time_spent_per_chunk: it involves time measurement, which is not stable.

local decompress_resume, decompress_complete = LibDeflate:DecompressDeflate(compress_deflate, {async=true})
repeat until not decompress_resume()
local decompress_async = decompress_complete()
assert(decompress_async == example_input)

@SafeteeWoW (Owner)

I don't think two functions need to be returned; one function should be enough.
Note: do not start (de)compression on direct calls to (De)CompressDeflate in async mode.

@SafeteeWoW (Owner)

Squash your commits into a single commit, and use git to remove all format-only changed lines.

@oratory (Author) commented Apr 26, 2021

I changed things around a bit. Instead of a typical async function, I'm calling it Chunks Mode: it returns a handler which is called repeatedly until processing is finished.

Some rough benchmarks

Serialize & Compress

Small table: 115 Bytes

            Standard    Chunks Mode
Serialize:  0.21 ms     1.19 ms
Compress:   1.30 ms     10.59 ms
Total:      1.51 ms     11.78 ms

Medium table: 163584 Bytes

            Standard    Chunks Mode
Serialize:  10 ms       12 ms
Compress:   579 ms      674 ms
Total:      589 ms      686 ms

Large table: 1750073 Bytes

            Standard    Chunks Mode
Serialize:  560 ms      609 ms
Compress:   7578 ms     8323 ms
Total:      8128 ms     8932 ms

X-Large table: 3511073 Bytes

            Standard    Chunks Mode
Serialize:  1130 ms     1223 ms
Compress:   16489 ms    16700 ms
Total:      17589 ms    17923 ms

Decompress & Deserialize

Small table: 115 Bytes

              Standard    Chunks Mode
Decompress:   0.24 ms     9 ms
Deserialize:  0.10 ms     11 ms
Total:        0.34 ms     20 ms

Medium table: 163584 Bytes

              Standard    Chunks Mode
Decompress:   192 ms      264 ms
Deserialize:  16 ms       27 ms
Total:        208 ms      291 ms

Large table: 1750073 Bytes

              Standard    Chunks Mode
Decompress:   3094 ms     3668 ms
Deserialize:  574 ms      772 ms
Total:        3668 ms     4440 ms

X-Large table: 3511073 Bytes

              Standard    Chunks Mode
Decompress:   6308 ms     7465 ms
Deserialize:  1074 ms     1536 ms
Total:        7382 ms     9001 ms

Chunks Mode does have a small increase in processing time, but has the benefit (in WoW and likely other environments) of not locking up the game, and of not making the OS think the game is unresponsive when processing large data.
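The Chunks Mode calling convention described above, a handler returning (ongoing, result) that is invoked repeatedly, can be sketched generically in Python (the summing below is a stand-in for the library's real work, and the names are illustrative):

```python
def make_chunked_task(items, chunk_size):
    """Return a step function in the Chunks Mode style: each call does
    one slice of work and returns (ongoing, result); result is only
    meaningful once ongoing is False."""
    state = {"pos": 0, "acc": 0}
    def step():
        end = min(state["pos"] + chunk_size, len(items))
        state["acc"] += sum(items[state["pos"]:end])  # stand-in for real work
        state["pos"] = end
        ongoing = state["pos"] < len(items)
        return ongoing, (None if ongoing else state["acc"])
    return step
```

The caller keeps invoking step() (e.g. once per frame) until ongoing comes back False, mirroring the repeat-until loop used in the examples.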

Similar changes made on my LibSerialize PR. rossnichols/LibSerialize#4

@SafeteeWoW (Owner)

Thanks. I am on vacation May 1-5 and will merge your PR during that period.

@SafeteeWoW (Owner)

  1. Reworked CI using GitHub workflows. Much faster than Travis and AppVeyor.
  2. Read the comments in "Add async serialize option via coroutines" rossnichols/LibSerialize#4.
    A separate API should be added for chunk mode because it is fundamentally different from the default mode.
    I plan to work on this this weekend.

@SafeteeWoW (Owner)

I would keep the chunk_size option but remove the chunk_time option.
It's better for the user to measure time outside of this library.

@rossnichols

> I would keep the chunk_size option but remove the chunk_time option.
> It's better for the user to measure time outside of this library.

I had asked him to add it for LibSerialize. Conceptually, yielding by size/count doesn't achieve the goal of the feature, which is to minimize impact on frame rate: with size/count, a given operation will always be divided into the same number of chunks, whereas a time-based approach allows for fluctuation based on CPU performance.

Maybe they can be combined into a single option though? The user supplies a function "shouldYield(count)" and can choose to implement it as "count > threshold" or "elapsed time > threshold". The library resets the count and yields when the function returns true.
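That combined shouldYield(count) idea might look like the following Python sketch (function names are hypothetical; the doubling stands in for a unit of library work):

```python
import time

def run_with_yield_policy(items, should_yield):
    """Process items, asking should_yield(count) after each unit of
    work; when it returns True, reset the count and yield control
    back to the caller."""
    count = 0
    done = []
    for item in items:
        done.append(item * 2)   # stand-in for one unit of work
        count += 1
        if should_yield(count):
            count = 0
            yield None          # pause point; caller regains control
    yield done                  # final value

# Count-based policy: yield every 10 units of work.
by_count = lambda c: c >= 10

# Time-based policy: yield once 2 ms have elapsed since the last yield.
def make_by_time(budget=0.002):
    last = [time.perf_counter()]
    def policy(_count):
        if time.perf_counter() - last[0] >= budget:
            last[0] = time.perf_counter()
            return True
        return False
    return policy
```

Both policies plug into the same driver, which is the appeal of the single-callback design.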

@SafeteeWoW (Owner)

  1. I do not want to measure time in this library; it is not stable.

  2. I don't think the library needs to measure the time. The user can do this.

The example in the README of this PR:

-- In WoW, coroutine processing can be handled on frame updates to reduce loss of framerate.
local processing = CreateFrame('Frame')
local WoW_compress_co = LibDeflate:CompressDeflate(example_input, {level=9, chunksMode=true})
processing:SetScript('OnUpdate', function()
  local ongoing, WoW_compressed = WoW_compress_co()
  if not ongoing then
    -- do something with `WoW_compressed`
    processing:SetScript('OnUpdate', nil)
  end
end)

If time is a concern, change the code to

local processing = CreateFrame('Frame')
local WoW_compress_co = LibDeflate:CompressDeflate(example_input, {level=9, chunksMode=true})
local time_threshold = 1.0 / 120
processing:SetScript('OnUpdate', function()
  local ongoing, WoW_compressed
  local start = os.clock()
  repeat
    ongoing, WoW_compressed = WoW_compress_co()
  until (not ongoing) or (os.clock() - start >= time_threshold)
  if not ongoing then
    -- do something with `WoW_compressed`
    processing:SetScript('OnUpdate', nil)
  end
end)

@SafeteeWoW (Owner)

A time-based chunk option will only be provided if it is much better than the alternative of size-based chunks driven in a loop, as I show above.

@SafeteeWoW (Owner)

I will decide whether to add a time option after I have completed the code and done a benchmark.

@SafeteeWoW (Owner) commented May 6, 2021

A handler function would be better than a time option:

configs = {
  chunk_size,                   -- integer
  chunk_yield_handler,          -- function to control whether to yield; nil to use the default
  chunk_yield_handler_userdata, -- user-provided data passed to the handler
}

The prototype of chunk_yield_handler is:

function handler(userdata, libdata)
  -- userdata is the user-provided data in configs.chunk_yield_handler_userdata
  -- libdata is a table; currently it contains only one field: num_bytes_processed
  -- return true to yield
  -- return false to not yield
end

The default handler returns true once every chunk_size processed bytes.
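Under the semantics proposed here, the default handler could be sketched as follows (Python for illustration; this assumes the library resets num_bytes_processed after each yield, and passes chunk_size through userdata, both of which are assumptions, not the decided design):

```python
def default_yield_handler(userdata, libdata):
    """Default policy: yield once chunk_size bytes have been processed
    since the last yield."""
    return libdata["num_bytes_processed"] >= userdata["chunk_size"]
```

A user who wants time-based yielding would supply their own handler with the same signature instead.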

@SafeteeWoW (Owner)

I can see that coroutine yield may be costly, so I will consider the handler approach.

@rossnichols

I wasn't suggesting that the library itself measure time - that's why I said that the user-supplied handler is the entity that measures elapsed time. Your example code above doesn't actually help time measurement, because it's WoW_compress_co() that needs its execution to be time-gated, not the code that calls it (unless you just specify a super small chunk size and yield much more often, returning control to the outer handler, but that'll increase the coroutine overhead).

I agree that a handler-based approach seems the most flexible and keeps the library code generic. I don't necessarily see value in returning nil to use the "default"; rather, just have the library provide a default handler that yields, e.g., based on the number of bytes. Then the config can either:

  1. Provide a handler that yields based on completely different criteria (e.g. time), or
  2. Provide an updated default byte count that the default handler uses when deciding to yield.

@SafeteeWoW (Owner)

Recently very busy and no time to work on this. Hopefully I can get to it this weekend.

@SafeteeWoW (Owner)

Currently not maintaining this. I will pick it up again if I choose to.

@SafeteeWoW closed this Jan 7, 2022