Considering smaller format for control_data #599

Closed
kolibril13 opened this issue Oct 22, 2020 · 7 comments · Fixed by #623
Labels
testing Anything related to testing the library
Projects

Comments

@kolibril13
Member

I just saw that the control_data folder is already larger than 130 MB, at 1.6 MB per file in the .npy format, which is huge!
In comparison, a PNG file rendered in low quality is only around 0.01 MB.
What I want to say: in the long run, we should probably switch to another format for the test comparison data, otherwise the repo might grow quite large.
An SVG file would be around 0.004 MB, so if manim supports SVG output one day, we could then switch to that format for the test control_data.
One more thing to note: there is software that can clean up the git history and delete certain big files from it without breaking other things, so we don't have to worry too much right now.

@kolibril13 kolibril13 added the testing Anything related to testing the library label Oct 22, 2020
@kolibril13 kolibril13 added this to New tests request : in Tests Oct 22, 2020
@leotrs
Contributor

leotrs commented Oct 23, 2020

Good catch. cc @ManimCommunity/tests

We can't output SVG now, and something tells me that this will become an actual problem way before we output to SVG.

The problem with PNG is that we can't compare that to anything that we have in-memory.

All I can think of right now is to make the test animations smaller than usual, and maybe even remove the alpha channel. This way we could immediately reduce the size by half.
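To make the size arithmetic concrete, here is a minimal sketch of both ideas on a synthetic frame. The 480x854 RGBA uint8 shape is an assumption chosen for illustration (it happens to give roughly 1.6 MB), not something confirmed in this thread. Note that dropping the alpha channel alone saves a quarter of the bytes; the larger saving comes from reducing the resolution.

```python
import numpy as np

# Hypothetical frame: 480x854 pixels, RGBA, one byte per channel (assumed shape).
frame = np.zeros((480, 854, 4), dtype=np.uint8)
frame[..., 3] = 255  # fully opaque alpha

# Drop the alpha channel: 4 channels -> 3 channels.
rgb = frame[..., :3]

# Keep every second pixel along each axis: a quarter of the pixels remain.
small = rgb[::2, ::2]

print(frame.nbytes)  # bytes for the full RGBA frame
print(rgb.nbytes)    # bytes without alpha
print(small.nbytes)  # bytes without alpha, at half resolution per axis
```

Whether the tests can actually do without alpha depends on whether any test relies on transparency, which this sketch does not address.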

@huguesdevimeux
Member

Thanks for pointing this out.
As @leotrs said, I don't see any better option. We could use a hash instead, but that would make the tests very hard to debug, which is bad.
As for SVG, I'm not sure it would be possible, since the essence of the tests is to compare the raw frame (before any post-processing, compression, etc.).
If there is an optimisation to be done, maybe it's the alpha channel, as proposed.
It's important to find a solution if we want to implement more robust graphical unit tests that check several frames of a movie instead of only the last one (as they do now).

If you come up with new ideas, feel free!
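The debugging concern about hashes can be illustrated with a small sketch. The helper `frame_digest` is hypothetical and not part of manim; it only shows that a digest mismatch reveals that two frames differ, while the stored array also reveals where.

```python
import hashlib

import numpy as np


def frame_digest(frame: np.ndarray) -> str:
    """Return a SHA-256 hex digest of the frame's shape, dtype, and raw bytes."""
    h = hashlib.sha256()
    h.update(str(frame.shape).encode())
    h.update(str(frame.dtype).encode())
    h.update(np.ascontiguousarray(frame).tobytes())
    return h.hexdigest()


expected = np.zeros((4, 4, 4), dtype=np.uint8)
actual = expected.copy()
actual[0, 0, 0] = 1  # a single-value difference

# The digests differ, but carry no information about where or by how much.
print(frame_digest(expected) == frame_digest(actual))

# With the full arrays available, the mismatch can be located directly.
print(np.argwhere(expected != actual))
```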

@kolibril13
Member Author

I have an idea: we can use compressed NumPy arrays.
Since our scenes contain large monochrome areas and only a few colours, we can use np.savez_compressed instead of np.save!

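import numpy as np

# Load an existing uncompressed control frame.
ar = np.load("GrowFromCenterTest.npy")
# Save it compressed; NumPy appends the .npz extension to "test".
np.savez_compressed("test", my_ar=ar)
# Load it back by its keyword name and check that nothing was lost.
loaded_ar = np.load("test.npz")["my_ar"]
print(np.array_equal(ar, loaded_ar))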

The output of this script is "True".
And the size of the array went down from 1.6 MB to 2.5 kB!
This could also be useful when testing videos. np.savez_compressed works a little like a dict, where more than one array can be compressed at once:
np.savez_compressed('/tmp/123', my_ar1=ar1, my_ar2=ar2, my_ar3=ar3)
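For anyone without the control files at hand, the multi-array round trip can be tried on synthetic data. This is only a sketch: the frame shape, the key names (`frame_0`, ...), and the file name `control.npz` are made up for illustration, and the compression ratio on real control data may differ.

```python
import os
import tempfile

import numpy as np

# Hypothetical frames: mostly black, with one red rectangle that shifts per frame.
frames = {}
for i in range(3):
    frame = np.zeros((480, 854, 4), dtype=np.uint8)
    frame[100:200, 100 + 50 * i:300 + 50 * i] = (255, 0, 0, 255)
    frames[f"frame_{i}"] = frame

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "control.npz")
    # Each keyword argument becomes a named array inside the archive.
    np.savez_compressed(path, **frames)
    compressed_size = os.path.getsize(path)

    # np.load returns a dict-like NpzFile; read every array back by name.
    with np.load(path) as data:
        restored = {name: data[name] for name in data.files}

raw_size = sum(f.nbytes for f in frames.values())
all_equal = all(np.array_equal(frames[k], restored[k]) for k in frames)
print(raw_size, compressed_size, all_equal)
```

Because the compression is lossless, the restored arrays compare equal to the originals, so the existing comparison logic would not need to change.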

@leotrs
Contributor

leotrs commented Oct 23, 2020

Compressed arrays are a good idea!

Another idea is to store everything in grayscale. You can go from four channels (RGBA) to just one (grayscale). If you reduce the size of each test to half the number of pixels, that's a total of 8x reduction.
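The 8x figure can be checked with a short sketch. The frame shape and the BT.601 luma weights are assumptions made for illustration; the thread does not say which grayscale conversion would be used.

```python
import numpy as np

# Hypothetical RGBA frame (shape assumed): black with one red rectangle.
frame = np.zeros((480, 854, 4), dtype=np.uint8)
frame[100:200, 100:300] = (255, 0, 0, 255)

# Weighted sum of R, G, B using ITU-R BT.601 luma weights (one possible choice).
weights = np.array([0.299, 0.587, 0.114])
gray = np.rint(frame[..., :3] @ weights).astype(np.uint8)

# Halve the number of pixels by keeping every second row.
half = gray[::2]

# 4 channels -> 1 channel (4x) times half the pixels (2x) gives 8x.
print(frame.nbytes // half.nbytes)
```

One trade-off to keep in mind: after conversion, two different colours with the same luma become indistinguishable, so a grayscale control frame cannot catch every colour regression.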

@huguesdevimeux
Member

This seems good to me.
I think we'll use this feature of compressing several arrays once our graphical tests use several frames.

@kolibril13
Member Author

See #618 for further discussions.

@behackl
Member

behackl commented Oct 26, 2020

I would really like to see us using compressed control data as soon as possible. Could we talk about implementing this soon? If it helps (and no one else wants to do it), I can open a PR for this later today.

Rewriting git history is an extremely delicate process that requires a huge amount of coordination (basically, the fewer open PRs we have at that point the better; and everyone will have to adapt their local clones as well). Changing to compressed data sooner rather than later is a good idea to make sure we keep the repo size manageable until we actually rewrite the history.

@behackl behackl linked a pull request Oct 26, 2020 that will close this issue
1 task
Tests automation moved this from New tests request : to Done Oct 26, 2020

4 participants