Doesn't dl images in the reblog of an answer post #533

Open
rduwjjnh opened this issue May 21, 2024 · 3 comments
Labels: bug, help wanted

Comments

@rduwjjnh

Here's an example post: https://www.tumblr.com/fruitegg/659014555856437248/
My settings:
[screenshot: settings]
I've found the 659014555856437248 post only in answers.txt, but its content is the same as that of the original answer post (614541567356698624). And it only downloads the image from the 614541567356698624 post.
[screenshot]

It seems to work correctly with the reblogs of other types of posts.

Expected behavior
I expect it to parse the content of the reblog of an answer post and download all images.

Desktop (please complete the following information):

  • TumblThree version: v2.13.0.442
  • OS: Windows 10
@thomas694
Contributor

That's by design. Files that have already been downloaded are not downloaded again in the same blog (or globally, if that setting is enabled).
You notice the skipped image download because you build unique filenames (e.g. %i).
If you enable "dump crawler data", you'll see that both posts are actually processed/downloaded, but duplicate media is skipped.
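
To make the skip behavior concrete, here is a minimal Python sketch of per-blog versus global duplicate skipping. It is not TumblThree's actual code; `DownloadRegistry`, `should_download`, and `global_dedup` are hypothetical names used only for illustration.

```python
class DownloadRegistry:
    """Remembers which media were already saved so repeats can be skipped.

    Hypothetical illustration only, not TumblThree's implementation.
    """

    def __init__(self, global_dedup=False):
        # When global_dedup is True, a file seen in any blog is skipped everywhere.
        self.global_dedup = global_dedup
        self._seen = set()

    def should_download(self, blog, media_key):
        # Scope the lookup to the blog unless global deduplication is enabled.
        scope = None if self.global_dedup else blog
        entry = (scope, media_key)
        if entry in self._seen:
            return False  # duplicate: the post is still processed, the file is skipped
        self._seen.add(entry)
        return True
```

With this model, a reblog that repeats the original post's image is still processed, but its media is skipped on the second encounter.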

@rduwjjnh
Author

rduwjjnh commented May 26, 2024

Yes, both posts are processed, but it completely skips the content that was added in the reblog; this is what I showed in the second screenshot. These are the JSON files it gives me if I enable dumping crawler data. The content of both posts is exactly the same, namely that of the original post, whereas the reblog should contain the new text and images:
Original post 614541567356698624.json
Reblog 659014555856437248.json

In comparison, if I take a reblog of another type of post, for example https://www.tumblr.com/fruitegg/685938465659060224/, there are new text and images in the JSON of the reblog, and all images are downloaded, as I expect.
Original post 685828894320984064.json
Reblog 685938465659060224.json

I understand that it skips duplicates, but those images weren't downloaded even once. If I search 659014555856437248, I only find that one image that's in the original answer post, and if I search 614541567356698624, there are no images with that post id. There are also no occurrences of the original links (64.media.tumblr.com/*) of the images from the reblog in any text files.

@thomas694
Contributor

Ok, now I see it. After looking at the JSONs and at the post in the browser, I found the problem.
For the reblogged answer, Tumblr renders more on its own HTML page than it gives us back in its data structure.

JSON

<p>working full time doing backgrounds for an animation studio. mildly amusing given that i can count on maybe one hand the number of backgrounds ive completed in personal drawings</p><p>beyond that ive been doing embroidery and listening to a lot of united states chemical safety board videos. recently ive also gotten into eurobeat.</p><figure class=\"tmblr-full tmblr-embed\" data-provider=\"youtube\" data-url=\"https://www.youtube.com/embed/3d37Ca3E4fA?feature=oembed&amp;enablejsapi=1&amp;origin=https://safe.txmblr.com&amp;wmode=opaque\" data-orig-width=\"267\" data-orig-height=\"200\"><iframe width=\"540\" height=\"404\" id=\"youtube_iframe\" src=\"https://www.youtube.com/embed/3d37Ca3E4fA?feature=oembed&amp;enablejsapi=1&amp;origin=https://safe.txmblr.com&amp;wmode=opaque\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen title=\"Combustible Dust: An Insidious Hazard\"></iframe></figure><div class=\"npf_row\"><figure class=\"tmblr-full\" data-orig-height=\"377\" data-orig-width=\"540\"><img src=\"https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s640x960/ce3c383c324201dd063bced071e1811b040d565e.png\" data-orig-height=\"377\" data-orig-width=\"540\" srcset=\"https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s75x75_c1/24a02aa8a639466b0e822562e2d497da80973bf6.png 75w, https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s100x200/25a48dc37daaebbc08992b82a498d7c46fab5efc.png 100w, https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s250x400/74e1c24eee855b5c646d5fae08e5b43e7e85f8c9.png 250w, https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s400x600/6d0195abeacabe35a94f7651a67c59ff73424954.png 400w, 
https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s500x750/05369ee9febdf60136f616cd89d24ecfdfb77ce2.png 500w, https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s540x810/572edf4b62a53d3617567cb140c87b6b4800e80e.png 540w\" sizes=\"(max-width: 540px) 100vw, 540px\"/></figure></div><p>psd, progress gif on <a href=\"http://www.patreon.com/fruitegg\">patreon</a></p>

HTML

<div class="copy"><p><a class="tumblr_blog" href="https://fruitegg.tumblr.com/post/614541567356698624">fruitegg</a>:</p><blockquote><p>working full time doing backgrounds for an animation studio. mildly amusing given that i can count on maybe one hand the number of backgrounds ive completed in personal drawings</p><p>beyond that ive been doing embroidery and listening to a lot of united states chemical safety board videos. recently ive also gotten into eurobeat.</p><figure class="tmblr-embed tmblr-full"><iframe id="embed-665324efa263d479899774" class="embed_iframe" src="https://safe.txmblr.com/svc/embed/inline/https%3A%2F%2Fwww.youtube.com%2Fembed%2F3d37Ca3E4fA%3Ffeature%3Doembed%26enablejsapi%3D1%26origin%3Dhttps%3A%2F%2Fsafe.txmblr.com%26wmode%3Dopaque#embed-665324efa263d479899774-partied" width="500" height="374" scrolling="no" frameborder="0" allowfullscreen="allowfullscreen" mozallowfullscreen="" webkitallowfullscreen=""></iframe></figure><div class="npf_row"><div class="npf_col"><figure class="tmblr-full"><a class="post_media_photo_anchor" data-big-photo="https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s1280x1920/80705096951d53b1ca75c45a40e61c04fe3de52d.png" data-big-photo-height="377" data-big-photo-width="540"><img class="post_media_photo image" src="https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s500x750/05369ee9febdf60136f616cd89d24ecfdfb77ce2.png" srcset="https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s75x75_c1/24a02aa8a639466b0e822562e2d497da80973bf6.png 75w, https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s100x200/25a48dc37daaebbc08992b82a498d7c46fab5efc.png 100w, https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s250x400/74e1c24eee855b5c646d5fae08e5b43e7e85f8c9.png 250w, 
https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s400x600/6d0195abeacabe35a94f7651a67c59ff73424954.png 400w, https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s500x750/05369ee9febdf60136f616cd89d24ecfdfb77ce2.png 500w, https://64.media.tumblr.com/0f70d31ad418c92960a20539af9c4b39/1187fa04d278c1ac-61/s540x810/572edf4b62a53d3617567cb140c87b6b4800e80e.png 540w" sizes="(max-width: 540px) 100vw, 540px" alt="image"></a></figure></div></div><p>psd, progress gif on <a href="http://www.patreon.com/fruitegg">patreon</a></p></blockquote><p></p><p>prequel to dis pic</p><div class="npf_row"><div class="npf_col"><figure class="tmblr-full"><a class="post_media_photo_anchor" data-big-photo="https://64.media.tumblr.com/64113a30f9c16b995e5d68cbbd76d246/b8ff597f5db87163-ac/s1280x1920/5b70fb594c58ce0d5ee378d2294d8f7aa8868e93.png" data-big-photo-height="810" data-big-photo-width="540"><img class="post_media_photo image" src="https://64.media.tumblr.com/64113a30f9c16b995e5d68cbbd76d246/b8ff597f5db87163-ac/s500x750/c7f5d8e1346172738fbcfc2a50a9fa37c3c3f5da.png" srcset="https://64.media.tumblr.com/64113a30f9c16b995e5d68cbbd76d246/b8ff597f5db87163-ac/s75x75_c1/aa6d9e24c37a17b637de4905a0eb0b6ff3a551c2.png 75w, https://64.media.tumblr.com/64113a30f9c16b995e5d68cbbd76d246/b8ff597f5db87163-ac/s100x200/bb556e2696db63fd151c2e381a54ad8ae8c45c8a.png 100w, https://64.media.tumblr.com/64113a30f9c16b995e5d68cbbd76d246/b8ff597f5db87163-ac/s250x400/8795a9458ba40f2a247427270df94edcde5e4f41.png 250w, https://64.media.tumblr.com/64113a30f9c16b995e5d68cbbd76d246/b8ff597f5db87163-ac/s400x600/b36ecc391fc4eac130b1aa007f031c1d03e8b99d.png 400w, https://64.media.tumblr.com/64113a30f9c16b995e5d68cbbd76d246/b8ff597f5db87163-ac/s500x750/c7f5d8e1346172738fbcfc2a50a9fa37c3c3f5da.png 500w, https://64.media.tumblr.com/64113a30f9c16b995e5d68cbbd76d246/b8ff597f5db87163-ac/s540x810/3264b985c2bf10cd987c9df90d644ffb5de70b39.png 540w" sizes="(max-width: 540px) 
100vw, 540px" alt="image"></a></figure></div></div><div class="npf_row"><div class="npf_col"><figure class="tmblr-full"><a class="post_media_photo_anchor" data-big-photo="https://64.media.tumblr.com/7a1a48cbc7e1efe4da4010f4a4825b54/b8ff597f5db87163-d7/s1280x1920/ae5d4eba03b3d50536ba9121363c9aa43de2c80a.png" data-big-photo-height="770" data-big-photo-width="540"><img class="post_media_photo image" src="https://64.media.tumblr.com/7a1a48cbc7e1efe4da4010f4a4825b54/b8ff597f5db87163-d7/s500x750/723d1e823eba0e2295262c7525aa3a95019f98cb.png" srcset="https://64.media.tumblr.com/7a1a48cbc7e1efe4da4010f4a4825b54/b8ff597f5db87163-d7/s75x75_c1/0f6932bf539faf314dc9e1889d0d3d31b60262fb.png 75w, https://64.media.tumblr.com/7a1a48cbc7e1efe4da4010f4a4825b54/b8ff597f5db87163-d7/s100x200/b3861554b7e8b57dee2e98e2a2e94ac7d6d14f8c.png 100w, https://64.media.tumblr.com/7a1a48cbc7e1efe4da4010f4a4825b54/b8ff597f5db87163-d7/s250x400/63b07817ca4b9d4913d3615e575a591954fdf87c.png 250w, https://64.media.tumblr.com/7a1a48cbc7e1efe4da4010f4a4825b54/b8ff597f5db87163-d7/s400x600/5fb96052dbef78d47c43ab166c5770fc98e95f50.png 400w, https://64.media.tumblr.com/7a1a48cbc7e1efe4da4010f4a4825b54/b8ff597f5db87163-d7/s500x750/723d1e823eba0e2295262c7525aa3a95019f98cb.png 500w, https://64.media.tumblr.com/7a1a48cbc7e1efe4da4010f4a4825b54/b8ff597f5db87163-d7/s540x810/2b470fc4676935169840a8a7757020b8c0a038ee.png 540w" sizes="(max-width: 540px) 100vw, 540px" alt="image"></a></figure></div></div></div>

In this particular case, the images cannot be downloaded because we parse the data structure and not the HTML page. Maybe they'll fix this error one day.
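
The comparison above can be reproduced with a short script. The following is a rough Python sketch, not part of TumblThree: it collects image URLs from both fragments and lists the images that appear only in the rendered HTML. The helper names (`ImageCollector`, `image_keys`, `missing_from_json`) are made up for this example, and it assumes that the first two path segments of a `64.media.tumblr.com` URL identify an image while the rest only selects a size variant. The JSON body is assumed to be already decoded, so the `\"` escapes are plain quotes.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageCollector(HTMLParser):
    """Collects URLs from <img src> and <a data-big-photo> attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "img" and attr.get("src"):
            self.urls.append(attr["src"])
        elif tag == "a" and attr.get("data-big-photo"):
            self.urls.append(attr["data-big-photo"])


def image_keys(markup):
    """Return the set of image identities found in an HTML fragment.

    Assumption: "<hash>/<id>" (the first two path segments) names the image.
    """
    parser = ImageCollector()
    parser.feed(markup)
    parser.close()
    keys = set()
    for url in parser.urls:
        parts = urlparse(url).path.strip("/").split("/")
        if len(parts) >= 2:
            keys.add("/".join(parts[:2]))
    return keys


def missing_from_json(json_body, page_html):
    """Images present in the rendered page but absent from the JSON body."""
    return sorted(image_keys(page_html) - image_keys(json_body))


# Shortened versions of the two fragments quoted above.
BASE = "https://64.media.tumblr.com/"
JSON_BODY = (
    '<figure><img src="' + BASE + "0f70d31ad418c92960a20539af9c4b39/"
    '1187fa04d278c1ac-61/s640x960/ce3c383c324201dd063bced071e1811b040d565e.png"/></figure>'
)
PAGE_HTML = (
    '<blockquote><a data-big-photo="' + BASE + "0f70d31ad418c92960a20539af9c4b39/"
    '1187fa04d278c1ac-61/s1280x1920/80705096951d53b1ca75c45a40e61c04fe3de52d.png"></a></blockquote>'
    '<a data-big-photo="' + BASE + "64113a30f9c16b995e5d68cbbd76d246/"
    'b8ff597f5db87163-ac/s1280x1920/5b70fb594c58ce0d5ee378d2294d8f7aa8868e93.png"></a>'
    '<a data-big-photo="' + BASE + "7a1a48cbc7e1efe4da4010f4a4825b54/"
    'b8ff597f5db87163-d7/s1280x1920/ae5d4eba03b3d50536ba9121363c9aa43de2c80a.png"></a>'
)

for key in missing_from_json(JSON_BODY, PAGE_HTML):
    print(key)
```

For the fragments in this issue, the two images added in the reblog (`...-ac` and `...-d7`) are the ones reported as missing from the JSON, which matches what the browser shows outside the blockquote.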

thomas694 added the help wanted and bug labels May 26, 2024