Suggestion: Dump links from post descriptions #95

Closed
reyaz006 opened this issue Oct 23, 2015 · 9 comments

@reyaz006

Some artists like to share extended information, such as links to high-quality images or different variations of their creations that, for various reasons, they were unable to upload to Pixiv directly.

I think it would be useful to dump those descriptions, specifically those that contain links to other sites. Maybe also allow users to specify which sites they would like to monitor, e.g. imgur, mediafire, nicovideo, etc.

I'm not sure if artists are allowed to edit post descriptions. If they are, then the described feature will also introduce an issue where the user is left with an outdated copy of a description if the author updates it at some point. This can be solved by checking the size of the description data.
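
Something like this rough sketch could do the check (the function name is just illustrative), assuming each description was saved next to its image as a .txt file:

import os

def description_changed(txt_path, new_description):
    # Compare the saved file size with the freshly fetched description
    # to decide whether the saved copy is outdated.
    if not os.path.exists(txt_path):
        return True  # nothing saved yet, treat as new
    old_size = os.path.getsize(txt_path)
    # compare in bytes so the result matches the on-disk size
    return old_size != len(new_description.encode("utf-8"))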

I'm not sure if this will be a useful feature for enough people.

@NHOrus
Contributor

NHOrus commented Oct 24, 2015

It's a very interesting feature, but I am not sure what the best way to dump them is so that it would be convenient for both Nandaka and the end user.

@reyaz006
Author

Do you mean the actual implementation or how the user will see it? Here is an example:

The description itself is on the html page, so there should be no need to download it separately. Descriptions can be saved as .txt files with the same names as the image files, much like writeimageinfo does. Maybe even make the description part of that option, but then I'd like an option for those who do not want to save a .txt file at all if the description does not contain any links.

First, the user should decide whether he wants to dump all descriptions or only those that contain links. The first case is easy and means no filtering. (Now that I think about it, perhaps it would be better for most interested users to use a blacklist instead of a whitelist.) For the second case, when the user enables this option for the first time, he does not set any blacklist. Then, after he sees what was actually saved, he can decide which links he does not want to see and set up a blacklist. If a description contains links, all of those links must match the blacklist rules for that description to be skipped by PixivUtil2.
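
For example, the skip rule could look roughly like this (the function name and the example domains are just illustrative):

def should_skip_description(links, blacklist):
    # Skip only if every link in the description matches a blacklisted domain.
    if not links:
        return True  # no links at all, nothing worth dumping
    return all(any(domain in link for domain in blacklist) for link in links)

blacklist = ["twitter.com"]
should_skip_description(["http://twitter.com/link"], blacklist)  # True, skipped
should_skip_description(["http://twitter.com/link", "http://mediafire.com/file"], blacklist)  # False, kept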

There are several ways for the user to quickly see which descriptions were downloaded: additional notes can be added to the "Downloaded_on_20xx_xx_xx.txt" log files, users can just search for new .txt files in the download folder, etc.

For additional convenience, every URL can be saved in some kind of global url-log file, something like:

pixiv_id_1\image_id_1 http://twitter.com/link
pixiv_id_1\image_id_2 http://imgur.com/link
pixiv_id_1\image_id_2 http://imgur.com/link2
pixiv_id_3\image_id_10 http://mediafire.com/link
pixiv_id_8\image_id_55 http://nicovideo.jp/link
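
Appending to that file could be as simple as this sketch (the file name and ids are just placeholders):

def append_url_log(log_path, member_id, image_id, urls):
    # One line per url, in the "member_id\image_id url" format shown above.
    with open(log_path, "a", encoding="utf-8") as f:
        for url in urls:
            f.write("%s\\%s %s\n" % (member_id, image_id, url))

append_url_log("url_log.txt", "pixiv_id_1", "image_id_2",
               ["http://imgur.com/link", "http://imgur.com/link2"])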

@Nandaka
Owner

Nandaka commented Oct 26, 2015

Most likely, I will use the Downloaded_on_20xx_xx_xx.txt approach, and only for those descriptions with a link in them (e.g. //p[@class='caption']/a).

the content might look like this:

#image_id
url1
url2
url3
#next image id
url1
...

Added a sample page: http://www.pixiv.net/member_illust.php?mode=medium&illust_id=53117853
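
A rough sketch of that extraction and dump (using lxml here just for illustration; the actual parser in PixivUtil2 may differ):

from lxml import html

def dump_caption_urls(page_html, image_id, out_file):
    # Grab only the links inside the caption, matching //p[@class='caption']/a
    tree = html.fromstring(page_html)
    links = tree.xpath("//p[@class='caption']/a/@href")
    if not links:
        return  # descriptions without links are not dumped
    with open(out_file, "a", encoding="utf-8") as f:
        f.write("#%s\n" % image_id)
        for link in links:
            f.write(link + "\n")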

Nandaka added a commit that referenced this issue Oct 26, 2015
Implement Feature #95: dump url list to text file. Set writeUrlInDescription = True to enable.
@Nandaka
Owner

Nandaka commented Oct 26, 2015

@reyaz006
Author

Thank you very much, @Nandaka
It seems to work fine. I can see the links are getting dumped into url_list_20151026.txt. Maybe it would be better to use %Y-%m-%d though, like in Downloaded_on__2015-10-26_.txt.
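
For anyone else trying this, enabling it is a one-line change in config.ini (the section name here is my assumption, the option name is from the commit message):

[Settings]
writeUrlInDescription = True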

It's useful enough for me as is. Other options that I think might be useful for someone else:

  • A domain blacklist or whitelist for urls (as described in my previous comment). I would prefer a blacklist if this gets implemented.
  • An option to change the name of the url dump file, so it would be possible to force the application to use a single text file with a custom name for all links, instead of one file per day.
  • Dumping the whole description. Unlikely to be useful for many, as it may just fill the url dump with more useless data. Still, it might be a viable addition to writeimageinfo.

@Nandaka
Owner

Nandaka commented Nov 12, 2015

Try: https://github.com/Nandaka/PixivUtil2/releases/tag/v20151112

Added 2 new options: blacklisting urls using a regex and setting the dump filename.
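
In config.ini that could look roughly like this (the option names below are placeholders, not necessarily the exact names in the release):

[Settings]
writeUrlInDescription = True
# regex for urls that should not be dumped (placeholder name)
urlBlacklistRegex = .*twitter\.com.*
# custom dump file name, so all links go to one file (placeholder name)
urlDumpFilename = url_list.txt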

@reyaz006
Author

Both settings work fine for me. Thanks for this improvement!

@NHOrus
Contributor

NHOrus commented Nov 19, 2015

Question: how exactly does it work? Will it dump links on every pass, only for new images, or only when you check old images with the "alwayscheckimagesize" setting?

@Nandaka
Owner

Nandaka commented Nov 20, 2015

@Nandaka Nandaka closed this as completed Feb 4, 2016