Suggestion: Dump links from post descriptions #95

Closed
reyaz006 opened this issue Oct 23, 2015 · 9 comments

@reyaz006

Some artists like to share extended information, such as links to high-quality images or different variations of their creations that, for various reasons, they were unable to upload to Pixiv directly.

I think it would be useful to dump those descriptions, specifically those that contain links to other sites. Maybe also allow users to specify which sites they would like to monitor, e.g. imgur, mediafire, nicovideo, etc.

I'm not sure if artists are allowed to edit post descriptions. If they are, then the described feature will also introduce an issue where the user is left with an outdated copy of a description if the author updates it at some point. This can be solved by checking the size of the description data.
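
Something like this rough sketch could do the check (the function name is just illustrative), assuming each description was saved next to its image as a .txt file:

import os

def description_changed(txt_path, new_description):
    # Compare the saved file size with the freshly fetched description
    # to decide whether the saved copy is outdated.
    if not os.path.exists(txt_path):
        return True  # nothing saved yet, treat as new
    old_size = os.path.getsize(txt_path)
    # compare in bytes so the result matches the on-disk size
    return old_size != len(new_description.encode("utf-8"))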

I'm not sure if this will be a useful feature for enough people.

@NHOrus
Contributor

NHOrus commented Oct 24, 2015

It's a very interesting feature, but I am not sure what the best way to dump them is so that it would be convenient for both Nandaka and the end user.

@reyaz006
Author

Do you mean the actual implementation or how the user will see it? Here is an example:

The description itself is on the html page, so there should be no need to download it separately. Descriptions can be saved as .txt files with the same names as the image files, much like writeimageinfo does. Maybe even make the description part of that option, but then I'd like an option for those who do not want to save a .txt file at all if the description does not contain any links.

First, the user should decide whether he wants to dump all descriptions or only those that contain links. The first case is easy and means no filtering. (Now that I think about it, perhaps it would be better for most interested users to use a blacklist instead of a whitelist.) For the second case, when the user enables this option for the first time, he does not set any blacklist. Then, after he sees what was actually saved, he can decide which links he does not want to see and set up a blacklist. If a description contains links, all of those links must match the blacklist rules for that description to be skipped by PixivUtil2.
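
For example, the skip rule could look roughly like this (the function name and the example domains are just illustrative):

def should_skip_description(links, blacklist):
    # Skip only if every link in the description matches a blacklisted domain.
    if not links:
        return True  # no links at all, nothing worth dumping
    return all(any(domain in link for domain in blacklist) for link in links)

blacklist = ["twitter.com"]
should_skip_description(["http://twitter.com/link"], blacklist)  # True, skipped
should_skip_description(["http://twitter.com/link", "http://mediafire.com/file"], blacklist)  # False, kept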

There are several ways for the user to quickly see which descriptions were downloaded: additional notes can be added to the "Downloaded_on_20xx_xx_xx.txt" log files, users can just search for new .txt files in the download folder, etc.

For additional convenience, every URL can be saved in some kind of global url-log file, something like:

pixiv_id_1\image_id_1 http://twitter.com/link
pixiv_id_1\image_id_2 http://imgur.com/link
pixiv_id_1\image_id_2 http://imgur.com/link2
pixiv_id_3\image_id_10 http://mediafire.com/link
pixiv_id_8\image_id_55 http://nicovideo.jp/link
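
Appending to that file could be as simple as this sketch (the file name and ids are just placeholders):

def append_url_log(log_path, member_id, image_id, urls):
    # One line per url, in the "member_id\image_id url" format shown above.
    with open(log_path, "a", encoding="utf-8") as f:
        for url in urls:
            f.write("%s\\%s %s\n" % (member_id, image_id, url))

append_url_log("url_log.txt", "pixiv_id_1", "image_id_2",
               ["http://imgur.com/link", "http://imgur.com/link2"])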

@Nandaka
Owner

Nandaka commented Oct 26, 2015

Most likely, I will use the Downloaded_on_20xx_xx_xx.txt approach, and only for those descriptions with a link in them (e.g. //p[@class='caption']/a).

the content might look like this:

#image_id
url1
url2
url3
#next image id
url1
...

Added a sample page: http://www.pixiv.net/member_illust.php?mode=medium&illust_id=53117853
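
A rough sketch of that extraction and dump (using lxml here just for illustration; the actual parser in PixivUtil2 may differ):

from lxml import html

def dump_caption_urls(page_html, image_id, out_file):
    # Grab only the links inside the caption, matching //p[@class='caption']/a
    tree = html.fromstring(page_html)
    links = tree.xpath("//p[@class='caption']/a/@href")
    if not links:
        return  # descriptions without links are not dumped
    with open(out_file, "a", encoding="utf-8") as f:
        f.write("#%s\n" % image_id)
        for link in links:
            f.write(link + "\n")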

Nandaka added a commit that referenced this issue Oct 26, 2015
Implement Feature #95: dump url list to text file. Set writeUrlInDescription = True to enable.
@Nandaka
Owner

Nandaka commented Oct 26, 2015

@reyaz006
Author

Thank you very much, @Nandaka
It seems to work fine. I can see the links are getting dumped into url_list_20151026.txt. Maybe it would be better to use %Y-%m-%d though, like in Downloaded_on__2015-10-26_.txt.
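
For anyone else trying this, enabling it is a one-line change in config.ini (the section name here is my assumption, the option name is from the commit message):

[Settings]
writeUrlInDescription = True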

It's useful enough for me as is. Other options that I think might be useful for someone else:

  • A domain blacklist or whitelist for urls (as described in my previous comment). I would prefer a blacklist if this gets implemented.
  • An option to change the name of the url dump file, so it would be possible to force the application to use a single text file with a custom name for all links, instead of one file per day.
  • Dumping the whole description. Unlikely to be useful for many, as it may just fill the url dump with more useless data. Still, it might be a viable addition to writeimageinfo.

@Nandaka
Owner

Nandaka commented Nov 12, 2015

Try: https://github.com/Nandaka/PixivUtil2/releases/tag/v20151112

Added 2 new options: blacklisting urls using a regex and setting the dump filename.
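
In config.ini that could look roughly like this (the option names below are placeholders, not necessarily the exact names in the release):

[Settings]
writeUrlInDescription = True
# regex for urls that should not be dumped (placeholder name)
urlBlacklistRegex = .*twitter\.com.*
# custom dump file name, so all links go to one file (placeholder name)
urlDumpFilename = url_list.txt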

@reyaz006
Author

Both settings work fine for me. Thanks for this improvement!

@NHOrus
Contributor

NHOrus commented Nov 19, 2015

Question: how exactly does it work? Will it dump links on every pass, only for new images, or only when you check old images with the "alwayscheckimagesize" setting?

@Nandaka
Owner

Nandaka commented Nov 20, 2015

@Nandaka Nandaka closed this as completed Feb 4, 2016