
reduce "redundancy", so the check only happens once and that value is stored, allowing that value to be compared. #1201

Merged
merged 3 commits into from
Dec 8, 2022

Conversation

NewUserHa
Contributor

Issue: #1200

@NewUserHa
Contributor Author

It seems there's a memory leak; the process has committed 1 GB+ of memory in a long-run test. But the cause may not be this change.

@NewUserHa
Contributor Author

Even if every image download is skipped (i.e. no files are written at all), the committed memory still grows.
So the memory leak is probably not related to this PR.

@Nandaka
Owner

Nandaka commented Dec 3, 2022

> memory leak

Can you compare before and after this PR? How do you test it? Last time this happened, it was usually because the BeautifulSoup page was not decompose()d and deleted.

It may also be because the downloaded page is cached for 1 hour (useful when it needs to get the same artist info multiple times, and it reduces access to the pixiv server).
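
For reference, a minimal sketch of the decompose-and-delete pattern being referred to (not PixivUtil2's actual code; the function and variable names are made up):

```python
from bs4 import BeautifulSoup

def extract_image_links(html: str):
    soup = BeautifulSoup(html, "html.parser")
    links = [a.get("href", "") for a in soup.find_all("a")]
    # Break the tree's internal parent/child references so the whole
    # parsed page can be garbage collected promptly, then drop the name.
    soup.decompose()
    del soup
    return links
```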

@NewUserHa
Contributor Author

Without this change, the working set was about 23 MB for a medium-length run (maybe a few hours, with the cool-down delay); I don't remember what the memory usage was in a long run.

BeautifulSoup objects should be garbage collected automatically by Python once nothing references them.

It also reaches ~300 MB of committed memory when 10,000 images are skipped via "Already downloaded in DB", so it may have nothing to do with file writing or BeautifulSoup.
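
One way to compare memory before and after the PR during a long run (just a suggestion on measurement, not how it was actually tested; requires the third-party psutil package):

```python
import os
import psutil

proc = psutil.Process(os.getpid())

def log_memory(tag: str) -> None:
    mem = proc.memory_info()
    # On Windows, rss is the working set and vms is the committed (pagefile) size.
    print(f"{tag}: working set {mem.rss / 2**20:.1f} MiB, "
          f"committed {mem.vms / 2**20:.1f} MiB")

# Call log_memory("after batch N") periodically in the download loop
# and compare the curves for builds with and without this change.
```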

@Nandaka Nandaka merged commit 8318f28 into Nandaka:master Dec 8, 2022
@biggestsonicfan
Contributor

In the future: these checks are not "useless"; each check itself is important to the code. What you've done here, however, is reduce "redundancy", so the check only happens once and that value is stored, allowing that value to be compared. "Remove redundant file checks" would have been a more apt title for this PR.
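
A minimal sketch of that pattern (hypothetical names, not the actual diff): the existence check hits the disk once, and the stored result is reused for the later comparison.

```python
import os

def should_download(filename: str, get_remote_size) -> bool:
    file_exists = os.path.isfile(filename)   # hit the disk once, store the result
    if not file_exists:
        # Nothing local to compare against, so the remote size check is skipped.
        print("Skipped getting remote file size because local file not exists.")
        return True
    # Reuse the stored result instead of calling os.path.isfile() again.
    return os.path.getsize(filename) != get_remote_size()
```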

@Nandaka Nandaka changed the title from "get rid of useless checks to speed up the download, if image not exists at all." to "reduce "redundancy", so the check only happens once and that value is stored, allowing that value to be compared." Dec 8, 2022
@Nandaka
Owner

Nandaka commented Dec 8, 2022

Well, grammar is not my strong point, but at least I can understand it 😄

@Nandaka Nandaka mentioned this pull request Dec 15, 2022
@biggestsonicfan
Contributor

biggestsonicfan commented Dec 27, 2022

Just firing up PixivUtil2 again, and the "Skipped getting remote file size because local file not exists" message is really getting to me...

It should be "Remote file size check skipped because local file does not exist" or something like that. Ironically, printing this message creates more latency than the "redundant" checks it was intended to remove, so it has actually managed to slow down the program where it intended to speed it up... If no message is printed, it should be implied that the file size check was skipped; otherwise it should print only when the conditions to check the file size are met. This notification for every new image is superfluous...

@NewUserHa
Contributor Author

NewUserHa commented Dec 27, 2022

  1. That message is printed just like the other similar lines in that same code block.
  2. Printing the message will NOT slow anything down in any way, because it is just a print(). If it bothers you, you can comment that print line out.
  3. But this PR does save time and disk I/O, and you can verify that with Task Manager or a similar tool by monitoring your disk.

@biggestsonicfan
Contributor

I had to see the numbers for myself, and you are right. Printing the line adds only an additional 0:00:00.000051 per image, while six checks to see if a file exists take a whopping 0:00:00.00015.

If you are able to perceive a difference of even 0:00:00.001 per image downloaded, not taking into account the random delay used in between downloads, I don't know what to tell you.
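
For anyone who wants to reproduce numbers like these, a rough sketch using the standard library (the path is hypothetical, and the absolute timings depend heavily on the terminal and the filesystem cache):

```python
import io
import os
import timeit
from contextlib import redirect_stdout

path = "downloads/12345_p0.jpg"  # hypothetical file path

# Redirect print() output to an in-memory buffer so the console stays clean.
with redirect_stdout(io.StringIO()):
    print_cost = timeit.timeit(lambda: print("skip message"), number=10_000) / 10_000

exists_cost = timeit.timeit(lambda: os.path.isfile(path), number=10_000) / 10_000

print(f"print():          ~{print_cost:.9f} s per call")
print(f"os.path.isfile(): ~{exists_cost:.9f} s per call")
```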

@NewUserHa
Contributor Author

  1. That's how programs work, and you can consider the print() to cost essentially no time.
  2. The time a disk check takes actually depends on your case, e.g. whether your disk is an SSD, whether the file's metadata is in your device's cache, etc. If you have a large folder, it can cost a lot of time.
  3. Why would you want your disk to check the same thing (whether the file exists) again and again for the same result? No program should do that, and this PR fixed it.
