-
Notifications
You must be signed in to change notification settings - Fork 890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial commit of sprite atlasmaker. #137
Conversation
Thanks Junjay! I'm taking a look. |
…rate list of images from wikipedia for testing, Misc other minor bug fixes as well.
Following up from last week, here are the updates in the latest commit:
Additionally, I added a utility in the 'utils' subdir patterned on your shell script that retrieves a list of featured images from wikipedia that can be used for testing. Running 1000 images through my mac pro (dual core) finished in under 8 minutes, with 12 images failed due to their not being PIL recognized image formats (webm, svg, etc). Please let me how it's working for you. |
Update: the error when running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really great work here Junjay! Couple of comments. I'm excited to start using this tool. I think it'll help a lot.
|
||
```sh | ||
# Create temp dir to hold outputs if it doesn't exist (optional) | ||
mkdir $PWD/outputs/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you skip this step, the montage process works, but then fails at the end with a Python stacktrace ending with:
IOError: [Errno 2] No such file or directory: '</path/to/pwd>/outputs/spriteatlas.png'
Would be nice to fail fast and discover this missing dir before making the sprite atlas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to create dir if not exists and also verify we can write to the output dir. Fails fast otherwise.
Example usage for getting a list of 1000 images with debug messages printed to stdout: | ||
|
||
```sh | ||
bazel run :wikipedia_sourcelist_generator -- --num_images=1000 --outputdir=$PWD --verbosity=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sourcelist generator worked fine for creating a long list, thanks! Was able to make a list of 10k images.
However, when I tried to generate an atlas with these 10k images, it seemed to be doing OK for 15 minutes and then blew up. I'm not sure why, but here's the top and bottom of the very long stack:
[Parallel(n_jobs=-1)]: Done 2576 tasks | elapsed: 14.6min
E0629 14:57:10.262512 140735762211712 parallelize.py:33] Retrieval of file https://upload.wikimedia.org/wikipedia/commons/2/2a/Coat_of_arms_of_Mexico.svg failed with error: cannot identify image file <_io.BytesIO object at 0x109333d10>
E0629 14:57:11.501877 140735762211712 parallelize.py:33] Retrieval of file https://upload.wikimedia.org/wikipedia/commons/d/d4/Coat_of_Arms_of_Siam_%281873-1910%29.svg failed with error: cannot identify image file <_io.BytesIO object at 0x1093286b0>
E0629 14:57:12.611440 140735762211712 parallelize.py:33] Retrieval of file https://upload.wikimedia.org/wikipedia/commons/a/a1/Coat_of_arms_of_Trinidad_and_Tobago.svg failed with error: cannot identify image file <_io.BytesIO object at 0x109333e30>
E0629 14:57:15.713234 140735762211712 parallelize.py:33] Retrieval of file https://upload.wikimedia.org/wikipedia/commons/d/d6/College_of_Arms-Lant%27s_Roll.svg failed with error: cannot identify image file <_io.BytesIO object at 0x116457710>
E0629 14:57:39.827016 140735762211712 parallelize.py:33] Retrieval of file https://upload.wikimedia.org/wikipedia/commons/6/62/Cheetahs_on_the_Edge_%28Director%27s_Cut%29.ogv failed with error: cannot identify image file <_io.BytesIO object at 0x1166dec50>
E0629 14:59:09.965570 140735762211712 parallelize.py:33] Retrieval of file https://upload.wikimedia.org/wikipedia/commons/c/c0/Big_Buck_Bunny_4K.webm failed with error: cannot identify image file <_io.BytesIO object at 0x109335b30>
Joblib's exception reporting continues...
Traceback (most recent call last):
File "/private/var/tmp/_bazel_jimbo/bdc924cd5f9f9e5884088ec4eae7b2c1/execroot/ai_google_pair_facets/bazel-out/darwin-fastbuild/bin/facets_atlasmaker/atlasmaker.runfiles/ai_google_pair_facets/facets_atlasmaker/atlasmaker.py", line 182, in <module>
app.run(main)
File "/Users/jimbo/facets-env/lib/python2.7/site-packages/absl/app.py", line 274, in run
_run_main(main, args)
File "/Users/jimbo/facets-env/lib/python2.7/site-packages/absl/app.py", line 238, in _run_main
sys.exit(main(argv))
File "/private/var/tmp/_bazel_jimbo/bdc924cd5f9f9e5884088ec4eae7b2c1/execroot/ai_google_pair_facets/bazel-out/darwin-fastbuild/bin/facets_atlasmaker/atlasmaker.runfiles/ai_google_pair_facets/facets_atlasmaker/atlasmaker.py", line 164, in main
n_jobs=FLAGS.num_parallel_jobs, verbose=FLAGS.parallelization_verbosity)
File "/Users/jimbo/checkouts/facets/facets_atlasmaker/parallelize.py", line 70, in get_and_convert_images_parallel
for location in image_src_locations)
File "/Users/jimbo/facets-env/lib/python2.7/site-packages/joblib/parallel.py", line 789, in __call__
self.retrieve()
File "/Users/jimbo/facets-env/lib/python2.7/site-packages/joblib/parallel.py", line 740, in retrieve
raise exception
joblib.my_exceptions.JoblibIOError: JoblibIOError
___________________________________________________________________________
Multiprocessing exception:
...
...........................................................................
/Users/jimbo/facets-env/lib/python2.7/site-packages/PIL/Image.py in resize(self=<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=648x432>, size=(80, 53), resample=1, box=(0, 0, 648, 432))
1742 return self.convert('La').resize(size, resample, box).convert('LA')
1743
1744 if self.mode == 'RGBA':
1745 return self.convert('RGBa').resize(size, resample, box).convert('RGBA')
1746
-> 1747 self.load()
self.load = <bound method JpegImageFile.load of <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=648x432>>
1748
1749 return self._new(self.im.resize(size, resample, box))
1750
1751 def rotate(self, angle, resample=NEAREST, expand=0, center=None,
...........................................................................
/Users/jimbo/facets-env/lib/python2.7/site-packages/PIL/ImageFile.py in load(self=<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=648x432>)
223 if LOAD_TRUNCATED_IMAGES:
224 break
225 else:
226 self.tile = []
227 raise IOError("image file is truncated "
--> 228 "(%d bytes not processed)" % len(b))
b = '\x95^\xb0V\xb6\x15\xb9EC\xd6'
229
230 b = b + s
231 n, err_code = decoder.decode(b)
232 if n < 0:
IOError: image file is truncated (10 bytes not processed)
___________________________________________________________________________
You don't have to fix this, just pasting in case it's something obvious to you. I'm running atlasmaker again with 1,000 images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1,000 image atlasmaker worked great! Here's the end of the log:
[Parallel(n_jobs=-1)]: Done 952 tasks | elapsed: 5.6min
[Parallel(n_jobs=-1)]: Done 1000 out of 1000 | elapsed: 5.8min finished
I0629 15:12:59.930185 140735762211712 montage.py:175] Montaged 1000 images onto sprite atlas of size (2560, 2560) pixels.
W0629 15:12:59.930639 140735762211712 montage.py:178] 12 images had failures and were replaced by the default image
I liked that it told me how many images failed. Here's the top of the manifest.json
file:
{
"source image": "https://upload.wikimedia.org/wikipedia/commons/e/e4/1_taipei_sunrise_panorama_dxr_edit_pangen_141215_1.jpg",
"offset_x": 0,
"image_name": "1_taipei_sunrise_panorama_dxr_edit_pangen_141215_1.jpg",
"success": true,
"offset_y": 0
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error message is currently the entire message, which is much more descriptive. I can truncate this to a more concise message if needed, but for now it may be more helpful to have the full error message.
@@ -0,0 +1,126 @@ | |||
# Facets Atlasmaker | |||
|
|||
Atlasmaker is a command line utility and library for creating sprite atlases. These atlases can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for writing all this documentation! Much appreciated.
facets_atlasmaker/montage.py
Outdated
|
||
# Manifest Key names | ||
MANIFEST_IMAGE_NAME_KEY = 'image_name' | ||
MANIFEST_SOURCE_IMAGE_KEY = 'source image' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: 'source image'
should be 'source_image'
(with an underscore) to match the other key names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
facets_atlasmaker/montage.py
Outdated
MANIFEST_SOURCE_IMAGE_KEY = 'source image' | ||
MANIFEST_OFFSET_X_KEY = 'offset_x' | ||
MANIFEST_OFFSET_Y_KEY = 'offset_y' | ||
MANIFEST_IMAGE_SUCCESS_KEY = 'success' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that there's a flag to indicate whether download was successful. This allows for easy querying with jq
to find out what failed:
$ cat outputs/manifest.json | jq 'select(.success == false)'
Since success is by far the majority case, it would lead to a slightly smaller manifest to omit the success
key and instead have an error
key which only appears for the failures.
Besides switching from success
to error
, how hard would it be for the value of the error to be a string describing the kind of error? For example, the two kinds of errors I'm seeing are image is too big (DecompressionBombWarning
) and 'cannot identify image file' (e.g. SVG and webm files).
Here's an example of the DecompressionBombWarning
:
/Users/jimbo/facets-env/lib/python2.7/site-packages/PIL/Image.py:2514: DecompressionBombWarning: Image size (124090014 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack.
DecompressionBombWarning)
You don't have to implement this request, I'm just curious to get your assessment of how hard it might be to add a very brief reason for the error
, e.g. TOO_BIG
, NON_IMAGE
or OTHER
.
Hi Jimbo-- made a couple of fixes:
I ran successfully it on the 10k wikipedia test image set on my workstation, which finished in ~21 mins. |
Also renamed source image key to have underscore.
I just realized another bug: if one tries to generate an atlas as jpeg, it fails because RGBA can't be output as jpeg. Going to try to fix this, and also add a fail fast check if someone specifies an output format that PIL can't perform or is not possible with the current settings. |
In the RGBA/jpeg case, dropping the alpha channel is a reasonable approach. Partially or fully transparent pixels may appear strange, but this is OK. |
facets_atlasmaker/parallelize.py
Outdated
location, image_convert_settings, | ||
allow_truncated_images=allow_truncated_images, | ||
disk_cache=disk_cache) | ||
for location in image_src_locations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the output of Parallel looks something like this:
[Parallel(n_jobs=-1)]: Done 562 tasks | elapsed: 3.4min
It does not include the total number of tasks, just how many have been completed.
I tried casting image_src_locations
to a list, hoping that would trigger Parallel to know how many jobs there were, but it didn't seem to make a difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a "--parallelize_verbosity" flag that defines the joblib verbosity (defaulted to a pretty high level of 10, which is why the number of finished tasks is output). I increased this higher but it doesn't give the count of total tasks, like it shows in the joblib docs. I'm not sure what determines when it shows the "Done X tasks out of Y" message. I also tried superficially adding tqdm to get a progress bar but it's crashing when using it with joblib parallel. Might be tricky to add this in, so for now I've just logged at the start of the process how many images are listed in the source file, and one can extrapolate from the existing "jobs done" count. Can look into a better implementation if this is a big issue down the road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't worry about this, not worth tearing your hear out about. A log line right before the Parallel call to state something like "Scheduling 10000 tasks..." might be nice, but also optional.
… RGB if image format doesn't support RGBA (like jpeg)
A few fixes:
Please let me know if you're able to run it. You seem to always discover new edge cases ;) |
…set timeout and max retries.
Added log message of number of scheduled tasks before parallel runs. |
Thanks for all your hard work on this! I'm running the 10k example again from my laptop. If it works I'd say we're good to go. 🎉 |
It worked! @jameswex please take a look when you get a chance. |
didn't do a full code review but saw that you did a review and tested it out @jimbojw . this looks great, glad to have this addition. one note is that it would be nice to add a section to the existing Facets Dive readme about sprite making and have it link to the readme for this atlas maker. |
Hi Junjay, can you add a line or two to the Facets Dive README about this? It doesn't need to be a lot, just a pointer to your documentation. |
Hi Jimbo and James, I added a sentence to the "Providing Sprites For Dive to Render" section linking to atlasmaker. Let me know if you think it should be its own section to be more obvious, or if it looks good to you. Thanks! |
Command line utility for creating sprite atlases from a list of source image locations. Apologies in advance for the large PR.