Enhance document #1197

NewUserHa · 2022-11-19T15:11:40Z

This PR tries to update the document and add some details to it.

Found some issues that need to discuss:

customCleanUpRe no document.
customBadChars no code no use.
writeRawJSON no code no use.
writeSeriesJSON no code no use. the includeSeriesJSON do the job.
useLocalTimezone can be removed since all file time is always use local time.

hope this can make the document better for coming new users

biggestsonicfan · 2022-11-19T15:50:11Z

What on earth?

NewUserHa · 2022-11-19T15:57:56Z

the document, the readme.md.

those settings are not used at all or having issues.

Nandaka · 2022-11-19T16:34:46Z

Is the document you are referring to is the Readme.md?

writeRawJSON it is used in the code and in the config file.

includeSeriesJSON is the correct one, not writeSeriesJSON ==> only need to update the readme

useLocalTimezone => not in my case see #420 as the API return in utc

customCleanUpRe is used as part of filename sanitizer if user want to use it.

customBadChars ==> customBadChars was added on #920, it is still used in the code (refer to the #920)

NewUserHa · 2022-11-19T16:56:49Z

right.
I'm trying set config based on the document, so I found the issues.

writeRawJSON, writeImageJSON

turned both on, but no '..-raw.text' file written, only a json.text from 'writeImageJSON'

writeSeriesJSON

it's created by .py when no config found

useLocalTimezone

system use time in seconds since the epoch (i.e. UTC) always. tzinfo is usually useful only in string format and time calculation.
the reason that '70850475.txt' from that issue with wrong time may be no 'os.utime()' followed after '.close()' at:

PixivUtil2/PixivImage.py

Line 434 in f2e9ab1

info.close()

so useLocalTimezone may be no need.

customCleanUpRe, customBadChars

I think maybe adding a few examples will help.

I want details of the 'checkfilesize' 'overwrite' and 'etc.' too since they are important to control downloading

NewUserHa · 2022-11-20T12:29:13Z

some more useful info that I think can help users and enhance document:

createWebp, webm, apng, gif

PixivUtil2/PixivHelper.py

Line 1045 in f2e9ab1

with zipfile.ZipFile(ugoira_file) as f:

this line extracted all files from the ugoira. it can directly convert from the source file .zip to all other formats actually, I guess.

webpparam: Parameter to be used to encode webm. default is lossless 0 -q:v 90 -loop 0 -vsync 2 -r 999

"to encode webm" ==> "to encode webp".
while actual argument miss lossless 1(true) at

PixivUtil2/PixivHelper.py

Line 1014 in f2e9ab1

_config.ffmpegParam = "-vsync 2 -r 999 -pix_fmt yuv420p"

webpparam

webp can use lossless for the best quality by default.

checkLastModified: when found db record and both overwrite and checksize is false, compare the local file's last-modified time with the uploaded time on pixiv.

PixivUtil2/PixivDownloadHandler.py

Line 75 in 2699125

    
           if config.checkLastModified and os.path.isfile(filename_save) and image is not None:

it should do the time check after finding the file size is the same.

alwaysCheckFileSize: when found db record, if both this and overwrite are false, it will only check if the file exists at the recorded path. otherwise, it'll always do a file size check.

overwrite: need alwaysCheckFileSize to work. when overwrite is true, if the file size is different, it will just delete it and then redownload it.

will https://github.com/Nandaka/PixivUtil2/blob/f2e9ab19fe471c93c0a1697d89f1f9f93f3c39cd/PixivBrowserFactory.py#L60 be affected by the socks5 `getaddrd failed`?

biggestsonicfan · 2022-11-21T01:12:30Z

readme.md

@@ -528,7 +528,7 @@ Please refer run with `--help` for latest information.
  Set to `True` to export the image information to a .XMP sidecar file, one per image in the album. The data contained within the file is the same but some software requires matching file names to detect the metadata. If set to `True`, then `writeImageXMP` is ignored.
 - verifyimage

-  Do image and zip checking after download. Set the value to `True` to enable.
+  Check if downloaded files is valid image or zip. Set the value to `True` to enable.


If "files" is plural, the proper term to use would be "are", so Check if downloaded files are a valid image or zip, however this is not what verifyimage actually does. verifyimage simply checks if the file ends in ".jpg", ".png", ".gif", ".ugoira", or ".zip". It does not verify if it is a valid image or zip file.

mistake.

that adding details is trying to make it easier to tweak PixivUtil2 for experienced users.

actually, verifyimage will validate files:

PixivUtil2/PixivDownloadHandler.py

Lines 203 to 248 in f2e9ab1

elif config.verifyImage and filename_save.endswith((".jpg", ".png", ".gif")):

fp = None

try:

from PIL import Image, ImageFile

fp = open(filename_save, "rb")

# Fix Issue #269, refer to https://stackoverflow.com/a/42682508

ImageFile.LOAD_TRUNCATED_IMAGES = True

img = Image.open(fp)

img.load()

fp.close()

PixivHelper.print_and_log('info', ' Image verified.')

except BaseException:

if fp is not None:

fp.close()

PixivHelper.print_and_log('info', ' Image invalid, deleting...')

os.remove(filename_save)

raise

elif config.verifyImage and filename_save.endswith((".ugoira", ".zip")):

fp = None

try:

import zipfile

fp = open(filename_save, "rb")

zf = zipfile.ZipFile(fp)

check_result = None

try:

check_result = zf.testzip()

# Issue #1165

except NotImplementedError as ne:

PixivHelper.print_and_log('warn', f' {ne}')

except RuntimeError as e:

if 'encrypted' in str(e):

PixivHelper.print_and_log('info', ' archive is encrypted, cannot verify.')

else:

raise

fp.close()

if check_result is None:

PixivHelper.print_and_log('info', ' Image verified.')

else:

PixivHelper.print_and_log('info', f' Corrupted file in archive: {check_result}.')

raise PixivException(f"Incomplete Downloaded for {url}", PixivException.DOWNLOAD_FAILED_OTHER)

except BaseException:

if fp is not None:

fp.close()

PixivHelper.print_and_log('info', ' Image invalid, deleting...')

os.remove(filename_save)

raise

Oh so it does, my mistake

it seems that the overwrite and alwaysCheckFileSize have too much tech debt.
and not sure how to detail those

haha, it is from 2015 and no major rewrite at all. As long it is working as my expectation, I don't plan to do anything as long it doesn't break.

PixivConfig.py

Nandaka · 2022-11-22T11:52:40Z

also send a lot of changes in one big pull request, i might not be able to understand it 😄

NewUserHa · 2022-11-22T14:56:49Z

This PR will try to improve the document only, to make both new and experienced users clear with some details.

NewUserHa · 2022-11-23T11:29:00Z

The rest of this PR is done. Except the customCleanUpRe, but I don't use those settings and didn't read those code.

@Nandaka Can you add its document?

biggestsonicfan · 2022-11-23T15:09:46Z

Did you notice r18Type isn't documented? The information on it can be found in #439.

NewUserHa · 2022-11-23T15:23:02Z

@cglmrfreeman it's updated now.
There're many options I don't use. You can also suggest any option I didn't see to the PR directly.

Nandaka · 2022-11-24T10:42:18Z

PixivConfig.py

@@ -223,7 +223,7 @@ def proxy(self):
        value = getattr(self, "proxyAddress", None)
        if not value:
            return None
-        match = re.match(r"^(?:(https?|socks[45])://)?([\w.-]+)(:\d+)?$", value)
+        match = re.match(r"^(?:(https?|socks[45]h?)://)?([\w.-]+)(:\d+)?$", value)


should we add check for socks4a too?

You can. But I don't have any socks4a server for test.

Update readme.md

cc113fb

NewUserHa added 3 commits November 20, 2022 20:37

Update readme.md

b0d7bf2

Update readme.md

cc08b6a

Update PixivConfig.py

e5d503e

will https://github.com/Nandaka/PixivUtil2/blob/f2e9ab19fe471c93c0a1697d89f1f9f93f3c39cd/PixivBrowserFactory.py#L60 be affected by the socks5 `getaddrd failed`?

biggestsonicfan reviewed Nov 21, 2022

View reviewed changes

PixivConfig.py Show resolved Hide resolved

NewUserHa added 3 commits November 23, 2022 08:43

Update readme.md

d696c90

Update readme.md

2f12865

Update PixivConfig.py

05720c5

NewUserHa requested review from Nandaka and removed request for Nandaka November 23, 2022 11:26

Update readme.md

8f14cad

Nandaka reviewed Nov 24, 2022

View reviewed changes

Merge branch 'master' into patch-2

b8fc3f7

Nandaka merged commit 3771035 into Nandaka:master Nov 25, 2022

NewUserHa mentioned this pull request Nov 27, 2022

Enhance document 2 #1202

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhance document #1197

Enhance document #1197

NewUserHa commented Nov 19, 2022 •

edited

Loading

biggestsonicfan commented Nov 19, 2022

NewUserHa commented Nov 19, 2022

Nandaka commented Nov 19, 2022 •

edited

Loading

NewUserHa commented Nov 19, 2022 •

edited

Loading

NewUserHa commented Nov 20, 2022 •

edited

Loading

biggestsonicfan Nov 21, 2022 •

edited

Loading

NewUserHa Nov 21, 2022 •

edited

Loading

biggestsonicfan Nov 21, 2022

NewUserHa Nov 21, 2022

Nandaka Nov 22, 2022

Nandaka commented Nov 22, 2022

NewUserHa commented Nov 22, 2022

NewUserHa commented Nov 23, 2022

biggestsonicfan commented Nov 23, 2022

NewUserHa commented Nov 23, 2022

Nandaka Nov 24, 2022

NewUserHa Nov 24, 2022

	elif config.verifyImage and filename_save.endswith((".jpg", ".png", ".gif")):
	fp = None
	try:
	from PIL import Image, ImageFile
	fp = open(filename_save, "rb")
	# Fix Issue #269, refer to https://stackoverflow.com/a/42682508
	ImageFile.LOAD_TRUNCATED_IMAGES = True
	img = Image.open(fp)
	img.load()
	fp.close()
	PixivHelper.print_and_log('info', ' Image verified.')
	except BaseException:
	if fp is not None:
	fp.close()
	PixivHelper.print_and_log('info', ' Image invalid, deleting...')
	os.remove(filename_save)
	raise
	elif config.verifyImage and filename_save.endswith((".ugoira", ".zip")):
	fp = None
	try:
	import zipfile
	fp = open(filename_save, "rb")
	zf = zipfile.ZipFile(fp)
	check_result = None
	try:
	check_result = zf.testzip()
	# Issue #1165
	except NotImplementedError as ne:
	PixivHelper.print_and_log('warn', f' {ne}')
	except RuntimeError as e:
	if 'encrypted' in str(e):
	PixivHelper.print_and_log('info', ' archive is encrypted, cannot verify.')
	else:
	raise
	fp.close()
	if check_result is None:
	PixivHelper.print_and_log('info', ' Image verified.')
	else:
	PixivHelper.print_and_log('info', f' Corrupted file in archive: {check_result}.')
	raise PixivException(f"Incomplete Downloaded for {url}", PixivException.DOWNLOAD_FAILED_OTHER)
	except BaseException:
	if fp is not None:
	fp.close()
	PixivHelper.print_and_log('info', ' Image invalid, deleting...')
	os.remove(filename_save)
	raise

Enhance document #1197

Enhance document #1197

Conversation

NewUserHa commented Nov 19, 2022 • edited Loading

biggestsonicfan commented Nov 19, 2022

NewUserHa commented Nov 19, 2022

Nandaka commented Nov 19, 2022 • edited Loading

NewUserHa commented Nov 19, 2022 • edited Loading

NewUserHa commented Nov 20, 2022 • edited Loading

biggestsonicfan Nov 21, 2022 • edited Loading

Choose a reason for hiding this comment

NewUserHa Nov 21, 2022 • edited Loading

Choose a reason for hiding this comment

biggestsonicfan Nov 21, 2022

Choose a reason for hiding this comment

NewUserHa Nov 21, 2022

Choose a reason for hiding this comment

Nandaka Nov 22, 2022

Choose a reason for hiding this comment

Nandaka commented Nov 22, 2022

NewUserHa commented Nov 22, 2022

NewUserHa commented Nov 23, 2022

biggestsonicfan commented Nov 23, 2022

NewUserHa commented Nov 23, 2022

Nandaka Nov 24, 2022

Choose a reason for hiding this comment

NewUserHa Nov 24, 2022

Choose a reason for hiding this comment

NewUserHa commented Nov 19, 2022 •

edited

Loading

Nandaka commented Nov 19, 2022 •

edited

Loading

NewUserHa commented Nov 19, 2022 •

edited

Loading

NewUserHa commented Nov 20, 2022 •

edited

Loading

biggestsonicfan Nov 21, 2022 •

edited

Loading

NewUserHa Nov 21, 2022 •

edited

Loading