Investigate comic.pixiv.net #2607
Comments
I get exactly the same file URLs, so they seem to be static and not dependent on the user or session. I've tested this a bit (with curl) and my results are as follows: To access the images only the referrer has to be set correctly like e.g. The json file is quite a bit trickier, the request must appear like a "XMLHttpRequest" and the correct session cookies must be set: where: Also note: The json URL apparently depends on the (comic?) session. |
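A minimal sketch of the two kinds of request headers described above. The exact header values and cookie names from the original comment are not preserved here, so everything concrete is an assumption: the Referer is assumed to be the viewer page URL, "appear like XMLHttpRequest" is assumed to mean the conventional `X-Requested-With: XMLHttpRequest` header, and `example_session` is a hypothetical placeholder, not pixiv's real cookie name.

```python
from typing import Dict


def image_headers(viewer_url: str) -> Dict[str, str]:
    """Headers for fetching a page image: only the Referer is set.

    Assumption: the viewer page URL is an acceptable Referer value.
    """
    return {"Referer": viewer_url}


def json_headers(cookies: Dict[str, str]) -> Dict[str, str]:
    """Headers for fetching the viewer JSON.

    Assumption: the conventional X-Requested-With header is what makes the
    request look like an XMLHttpRequest. Cookie names are supplied by the
    caller because the real ones are not given in this thread.
    """
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return {
        "X-Requested-With": "XMLHttpRequest",
        "Cookie": cookie_header,
    }


if __name__ == "__main__":
    viewer = "https://comic.pixiv.net/viewer/stories/9869"
    print(image_headers(viewer))
    # "example_session" is a hypothetical cookie name used for illustration.
    print(json_headers({"example_session": "abc123"}))
```

The functions only build header dictionaries, so they can be passed to whichever HTTP client is in use without sending anything themselves.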
Copying these values from one's browser works, but isn't an option for Danbooru. Trying to get the server to correctly initialize the (comic) session computationally seems to be really difficult on the other hand (or I'm missing something obvious). |
Danbooru already uses pixiv session tokens that it gets by manually logging into the site. I assume the comic session token would be the same. If the comic session is returned by logging into the comic site, then it's just a matter of storing the token locally and reusing it. I've noticed Pixiv has been switching up their login options recently though and the newer JS one was tricky for me to decipher. It seems to rely on a CAPTCHA or some sort of key verification and is therefore difficult to automate. |
Looks like it. If I delete the I guess, that's pretty much it. So the steps are as follows: When
When image from
|
The cookie itself is set via a If I copy both
But if I only copy
So there seems to be still one step missing to make this work completely. |
Investigating this some more (this time via the browser's developer tools) it seems that the This is exactly the comic-session-specific value appearing in the So (at least on the first request of the comic session) |
This was actually quite difficult to figure out. The steps below seem to work. Prerequisite: The
The server responds with
The server responds with
A json file that looks like
... and the json file (file structure already explained by Type-kun above) with the image urls is returned, now the images can be downloaded:
|
All subsequent requests |
commit to store the comic session id in aa77ba3 |
I'm not sure how much demand for this there is but it would be a fair amount of work I think. I assume it would work like the batch bookmarklet where all the posts in a comic can be uploaded. Automatically creating and tagging a pool could be handled, too. |
I expected it to work directly through the batch upload bookmarklet. There's no need for a new scheme; it fits into the current one perfectly. I'll try to describe the steps again. When the bookmarklet is used on https://comic.pixiv.net/viewer/stories/9869:
It takes up to 3 requests, but still fits into the current model. Direct image URLs require passing through image proxy, like regular pixiv images. The rule is: when I'm also not sure how much demand there is, but there's almost no other way to upload from comic.pixiv.net, because image urls are hidden and require digging through page source code or even inspecting network traffic. |
@r888888888 this should probably be reopened, as several people have requested this recently (most recently for https://comic.pixiv.net/works/5083). |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This seems to be a relatively new website by pixiv which allows artists to use a better reader for their short series than regular pixiv provides. However, I don't know if there's an easy way to extract the images. Users can't really do that manually, for example one can't right-click the image and "open in a new tab", since the reader intercepts right clicks. That said, you can see the links when you monitor the downloaded resources, but that's too hardcore for most uploaders, and also, when opened on their own, those links don't work (probably referer-based protection), which is inconvenient.
Additionally, new images seem to be loaded with AJAX on demand, not when the page is loaded, which makes automated uploads difficult. The reader script can probably be reverse-engineered; at least it's not in Flash or something, just JavaScript. If that's possible, a batch upload strategy for comic.pixiv.net would be great.
The site seems to automatically authenticate users who are logged in on the main pixiv.net website.
Example: https://comic.pixiv.net/viewer/stories/9869
Page 2:
https://img-comic.pximg.net/images/page/9869/V52hshKjl05juBvdbHJ5/2.jpg?20151030104009
Page 3:
https://img-comic.pximg.net/images/page/9869/4Dx6Cl2FiZtOkRRyUCJv/3.jpg?20151030104009
General pattern:
https://img-comic.pximg.net/images/page/<story_id>/<random_key>/<page_number>.jpg?<timestamp>
Keys seem to be consistent after reloading, so maybe they are permanently bound to images. Perhaps they are a function of the page and the user session or profile; this will need some further checks. Either way, they seem to be completely random at a glance, so we can't grab all the pages just by knowing the story id and page count.
Not sure if timestamps are mandatory or not, maybe they serve the same purpose as in pixiv, to indicate revisions.
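The general pattern above can be sketched as a small parser that splits an image URL into its parts. This is only an illustration based on the two example URLs: it assumes keys are alphanumeric, that pages always end in .jpg, and that the timestamp is an optional run of digits. `ComicPageUrl` and `parse_page_url` are hypothetical names, not anything from Danbooru.

```python
import re
from typing import NamedTuple, Optional


class ComicPageUrl(NamedTuple):
    story_id: int
    key: str
    page: int
    timestamp: Optional[str]  # None when the query string is absent


# Assumption: keys are alphanumeric and the extension is always .jpg,
# as in the two example URLs.
_PAGE_URL = re.compile(
    r"^https://img-comic\.pximg\.net/images/page/"
    r"(?P<story_id>\d+)/(?P<key>[A-Za-z0-9]+)/(?P<page>\d+)\.jpg"
    r"(?:\?(?P<timestamp>\d+))?$"
)


def parse_page_url(url: str) -> Optional[ComicPageUrl]:
    """Return the URL's components, or None if it does not fit the pattern."""
    m = _PAGE_URL.match(url)
    if m is None:
        return None
    return ComicPageUrl(
        story_id=int(m["story_id"]),
        key=m["key"],
        page=int(m["page"]),
        timestamp=m["timestamp"],
    )


if __name__ == "__main__":
    print(parse_page_url(
        "https://img-comic.pximg.net/images/page/9869/"
        "V52hshKjl05juBvdbHJ5/2.jpg?20151030104009"
    ))
```

Because the key cannot be derived from the story id and page number, a parser like this only helps recognize and label URLs that were already obtained some other way.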
Also, when the story page loads, a meta name="viewer-api-url" tag with JSON info is included in the page's HTML. In this case, I saw /api/v1/viewer/stories/rXqsSATnBA/9869.json linked and loaded in the page resources inspector; the JSON file itself has links to all pages under data.pages. However, an attempt to directly access https://comic.pixiv.net/api/v1/viewer/stories/rXqsSATnBA/9869.json returns an error. Probably some sort of protection is in use, or the pixiv API uses different authentication methods.
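As a rough sketch, the viewer API path could be pulled out of the story page HTML with the standard library alone. This assumes the path is stored in the meta tag's `content` attribute, which the description above does not state explicitly; `ViewerApiUrlParser` and `find_viewer_api_url` are hypothetical names. It only locates the URL and does not deal with whatever protection makes the direct request fail.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class ViewerApiUrlParser(HTMLParser):
    """Records the value of the first meta tag named viewer-api-url."""

    def __init__(self) -> None:
        super().__init__()
        self.api_path: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.api_path is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == "viewer-api-url":
            # Assumption: the JSON path lives in the "content" attribute.
            self.api_path = attr_map.get("content")


def find_viewer_api_url(page_html: str, page_url: str) -> Optional[str]:
    """Return the absolute viewer API URL, or None if the tag is missing."""
    parser = ViewerApiUrlParser()
    parser.feed(page_html)
    if parser.api_path is None:
        return None
    # Resolve a site-relative path against the story page URL.
    return urljoin(page_url, parser.api_path)


if __name__ == "__main__":
    sample = (
        '<html><head><meta name="viewer-api-url" '
        'content="/api/v1/viewer/stories/rXqsSATnBA/9869.json"></head></html>'
    )
    print(find_viewer_api_url(sample, "https://comic.pixiv.net/viewer/stories/9869"))
```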