Investigate comic.pixiv.net #2607

Type-kun · 2016-06-07T19:01:28Z

This seems to be a relatively new website by pixiv that gives artists a better reader for their short series than regular pixiv provides. However, I don't know if there's an easy way to extract the images. Users can't really do it manually: for example, one can't right-click an image and "open in a new tab", since the reader intercepts right clicks. You can see the links if you monitor the downloaded resources, but that's too hardcore for most uploaders, and the links don't work when opened on their own (probably referer-based protection), which is inconvenient.

Additionally, new images seem to be loaded with AJAX on demand rather than when the page is loaded, which makes automated uploads difficult. The reader script can probably be reverse-engineered; at least it's plain JavaScript and not Flash or something. If that's possible, a batch upload strategy for comic.pixiv.net would be great.

The site seems to automatically authenticate users who are logged in on the main pixiv.net website.

Example: https://comic.pixiv.net/viewer/stories/9869

Page 2:
https://img-comic.pximg.net/images/page/9869/V52hshKjl05juBvdbHJ5/2.jpg?20151030104009
Page 3:
https://img-comic.pximg.net/images/page/9869/4Dx6Cl2FiZtOkRRyUCJv/3.jpg?20151030104009

General pattern:
https://img-comic.pximg.net/images/page/<story_id>/<random_key>/<page_number>.jpg?<timestamp>

Keys seem to be consistent after reloading, so maybe they are permanently bound to images. Perhaps they are a function of the page and the user session or profile; this will need some further checks. Either way, they look completely random at a glance, so we can't grab all the pages just by knowing the story id and page count.

Not sure whether the timestamps are mandatory; maybe they serve the same purpose as on pixiv, to indicate revisions.
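
To make the URL pattern above concrete, here is a minimal Python sketch that splits an image URL into its parts. The names `PAGE_URL_RE`, `ComicPage` and `parse_page_url` are made up for illustration, and the pattern is inferred only from the two example URLs, so it may not cover every URL the site produces.

```python
import re
from typing import NamedTuple, Optional

# Inferred from the example URLs in this thread. The key is treated as an
# opaque path segment, and the timestamp is treated as optional because it
# is not yet known whether it is mandatory.
PAGE_URL_RE = re.compile(
    r"^https://img-comic\.pximg\.net/images/page/"
    r"(?P<story_id>\d+)/(?P<key>[^/]+)/(?P<page>\d+)\.jpg"
    r"(?:\?(?P<timestamp>\d+))?$"
)


class ComicPage(NamedTuple):
    story_id: int
    key: str
    page: int
    timestamp: Optional[str]


def parse_page_url(url: str) -> Optional[ComicPage]:
    """Return the parts of a comic page image URL, or None if it does not match."""
    m = PAGE_URL_RE.match(url)
    if m is None:
        return None
    return ComicPage(int(m["story_id"]), m["key"], int(m["page"]), m["timestamp"])
```

Since the key cannot be derived from the story id and page number, a parser like this only helps to recognize and classify URLs, not to generate them.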

Also, when the story page loads, a meta name="viewer-api-url" tag pointing to a json file is included in the page's html code. In this case I saw /api/v1/viewer/stories/rXqsSATnBA/9869.json linked, and saw it loaded in the page resources inspector: the json file itself has links to all pages under data.pages. However, an attempt to directly access https://comic.pixiv.net/api/v1/viewer/stories/rXqsSATnBA/9869.json returns an error. Probably some sort of protection is in use, or the pixiv api uses different authentication methods.


SD-DAken · 2016-06-08T21:43:41Z

> Page 2:
> https://img-comic.pximg.net/images/page/9869/V52hshKjl05juBvdbHJ5/2.jpg?20151030104009
> Page 3:
> https://img-comic.pximg.net/images/page/9869/4Dx6Cl2FiZtOkRRyUCJv/3.jpg?20151030104009

I get exactly the same file URLs, so they seem to be static and not dependent on the user or session.

I've tested this a bit (with curl) and my results are as follows:

To access the images, only the referrer has to be set correctly, e.g.:
curl "https://img-comic.pximg.net/images/page/9869/V52hshKjl05juBvdbHJ5/2.jpg?20151030104009" -H "Referer: https://comic.pixiv.net/viewer/stories/9869" > test.jpg

The json file is quite a bit trickier: the request must look like an "XMLHttpRequest" and the correct session cookies must be set:
curl "https://comic.pixiv.net/api/v1/viewer/stories/93WrK4ZKrs/9869.json" -H "Host: comic.pixiv.net" -H "X-Requested-With: XMLHttpRequest" -H "Cookie: PHPSESSID=<pixiv_session>; _pixiv-comic_session=<comic_session>; "

where:
<pixiv_session>: Your PHP session key created on login to pixiv
<comic_session>: The session key for the comics page, automatically created on first visit

Also note: The json URL apparently depends on the (comic?) session.
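
For anyone scripting this instead of using curl, the same request can be sketched in Python. This is only an illustration of the headers described above: `build_viewer_json_request` is a hypothetical helper name, and the request object is built but not sent.

```python
from urllib.request import Request


def build_viewer_json_request(json_url: str, pixiv_session: str, comic_session: str) -> Request:
    """Build (but do not send) a GET request for the viewer json file.

    Mirrors the curl call: an XMLHttpRequest marker plus both session cookies.
    """
    cookie = f"PHPSESSID={pixiv_session}; _pixiv-comic_session={comic_session}"
    return Request(
        json_url,
        headers={
            "X-Requested-With": "XMLHttpRequest",
            "Cookie": cookie,
        },
    )
```

Passing the result to `urllib.request.urlopen` would perform the request; whether the server accepts it still depends on both sessions being valid.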

SD-DAken · 2016-06-08T22:56:49Z

> <pixiv_session>: Your PHP session key created on login to pixiv
> <comic_session>: The session key for the comics page, automatically created on first visit

Copying these values from one's browser works, but isn't an option for Danbooru.

On the other hand, getting the server to correctly initialize the (comic) session programmatically seems to be really difficult (or I'm missing something obvious).

r888888888 · 2016-06-08T23:36:04Z

Danbooru already uses pixiv session tokens that it gets by manually logging into the site. I assume the comic session token would be the same. If the comic session is returned by logging into the comic site, then it's just a matter of storing the token locally and reusing it.

I've noticed Pixiv has been switching up their login options recently though and the newer JS one was tricky for me to decipher. It seems to rely on a CAPTCHA or some sort of key verification and is therefore difficult to automate.

Type-kun · 2016-06-09T06:18:27Z

> Danbooru already uses pixiv session tokens that it gets by manually logging into the site. I assume the comic session token would be the same. If the comic session is returned by logging into the comic site, then it's just a matter of storing the token locally and reusing it.

Looks like it. If I delete the _pixiv-comic_session cookie, disable javascript and reload the page, the cookie is still there. It comes in a Set-Cookie header, so it should be reusable just like the regular session ID. The pixiv.net cookies should most likely be passed with the request.

I guess that's pretty much it. So the steps are as follows:

When https://comic.pixiv.net/viewer/stories/<story_id> is opened with the batch upload bookmarklet:

1. Get the page HTML, passing the pixiv.net cookies in the request.
2. Store the comic.pixiv.net cookies we get back from the request.
3. Search the html for <meta name="viewer-api-url" content="...", and get the json url from the content attribute.
4. Load the json using both the pixiv.net and comic.pixiv.net cookies, along with the "X-Requested-With: XMLHttpRequest" header.
5. Search the json for the data.contents.pages array. In there, each element can have a right and a left object, and both contain the image url in data.url. This gives us all the image urls for the batch upload page.
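
The last step can be sketched in Python as follows. `extract_image_urls` is a hypothetical name, and the exact nesting is an assumption based on the description above: it is not stated whether `contents` is a single object or a list, so the sketch accepts either, and the right-before-left ordering is also a guess.

```python
from typing import Any, Dict, List


def extract_image_urls(viewer_json: Dict[str, Any]) -> List[str]:
    """Collect image URLs from data.contents.pages, right side before left."""
    contents = viewer_json["data"]["contents"]
    # Assumption: tolerate both a single object and a list of objects here.
    if isinstance(contents, dict):
        contents = [contents]
    urls: List[str] = []
    for content in contents:
        for spread in content.get("pages", []):
            for side in ("right", "left"):
                entry = spread.get(side)
                if entry and entry.get("data", {}).get("url"):
                    urls.append(entry["data"]["url"])
    return urls
```

A spread with only one side present simply contributes one URL.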

When an image from img-comic.pximg.net/images/page/<story_id> is opened in the upload page:

1. Pass it through the image proxy, like with pixiv images.
2. Supply https://comic.pixiv.net/viewer/stories/<story_id> as the referer.
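
The referer rule above is simple enough to sketch in Python. `referer_for_image` is a hypothetical helper name, not part of Danbooru's image proxy, and it assumes the story id is always the numeric path segment right after `/images/page/`.

```python
import re
from typing import Optional


def referer_for_image(image_url: str) -> Optional[str]:
    """Map an img-comic.pximg.net page image URL to the viewer URL to send as Referer."""
    m = re.match(r"^https://img-comic\.pximg\.net/images/page/(\d+)/", image_url)
    if m is None:
        return None  # not a comic page image; leave it to the other proxy rules
    return f"https://comic.pixiv.net/viewer/stories/{m.group(1)}"
```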

SD-DAken · 2016-06-11T15:37:19Z

> Looks like it. If I delete the _pixiv-comic_session cookie, disable javascript and reload the page, cookie is still there. It comes in Set-cookie header, so it should be reusable just like regular sessionID. Pixiv.net cookies should most likely be passed with the request.

The cookie itself is set via a set-cookie header, that's right. But in my tests this isn't enough to completely initialize a working session with the server.

If I copy both PHPSESSID and _pixiv-comic_session from my browser and request https://comic.pixiv.net/viewer/stories/9869 via curl, I get an html document back that contains:

<meta name="app-token" content="" />
<meta name="token-api-url" content="" />
<meta name="viewer-api-url" content="/api/v1/viewer/stories/8buf0QJJPk/9869.json" />
<meta name="works-info-api-url" content="" />

But if I only copy PHPSESSID from my browser and use the _pixiv-comic_session returned in the set-cookie header, I get an html document containing:

<meta name="app-token" content="88c4e9bf9041d3d4ddc9791346598891" />
<meta name="token-api-url" content="/api/v1/viewer/token/88c4e9bf9041d3d4ddc9791346598891.json" />
<meta name="viewer-api-url" content="" />
<meta name="works-info-api-url" content="" />

So there still seems to be one step missing to make this work completely.
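
To compare the two responses without reading the html by eye, the meta tags can be pulled out with a short Python sketch. `MetaCollector` and `extract_meta` are made-up names, and the sketch simply collects every named meta tag rather than assuming which ones matter.

```python
from html.parser import HTMLParser
from typing import Dict


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an html document."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also end up here by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            self.meta[name] = attributes.get("content") or ""


def extract_meta(html: str) -> Dict[str, str]:
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.meta
```

An empty string for `viewer-api-url` together with a non-empty `token-api-url` then identifies the second case shown above.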

SD-DAken · 2016-06-11T16:02:51Z

Investigating this some more (this time via the browser's developer tools) it seems that the /api/v1/viewer/token/<random_value>.json file contains something like:
{"error":null,"data":{"token":"IS15ZcVeGv"}}

This is exactly the comic-session-specific value appearing in the /api/v1/viewer/stories/<token>/<story-id>.json URL.

So (at least on the first request of the comic session) viewer-api-url is not set and token-api-url has to be requested instead to get the required token.
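
As a rough Python sketch of that relationship: given the body of the token response and the story id, the json path can be assembled like this. `viewer_api_path` is a hypothetical name, and treating a non-null `error` field as failure is an assumption, since no failing response is shown here.

```python
import json


def viewer_api_path(token_response_body: str, story_id: int) -> str:
    """Build the viewer json path from the token json body and a story id."""
    payload = json.loads(token_response_body)
    # Assumption: a failed request is signalled by a non-null "error" value.
    if payload.get("error") is not None:
        raise ValueError(f"token request failed: {payload['error']!r}")
    token = payload["data"]["token"]
    return f"/api/v1/viewer/stories/{token}/{story_id}.json"
```

For the sample response above and story 9869, this gives `/api/v1/viewer/stories/IS15ZcVeGv/9869.json`.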

SD-DAken · 2016-06-11T20:36:44Z

This was actually quite difficult to figure out. The steps below seem to work.

Prerequisite: The PHPSESSID value is known. (Since Danbooru already logs in to pixiv this should be the case).

curl "https://comic.pixiv.net/viewer/stories/9869" -H "Host: comic.pixiv.net" -H "Cookie: PHPSESSID=<pixiv_session>;" -D -

The server responds with Set-Cookie: _pixiv-comic_session=<comic_session>. Send the request again with that cookie added:

curl "https://comic.pixiv.net/viewer/stories/9869" -H "Host: comic.pixiv.net" -H "Cookie: PHPSESSID=<pixiv_session>; _pixiv-comic_session=<comic_session>;" -D -

The server responds with Set-Cookie: is_browser=yes and an html document, from which two values are needed:
<meta content="<csrf_token>" name="csrf-token" />
<meta name="token-api-url" content="/api/v1/viewer/token/<random>.json" />
Request the token json file with these parameters filled in. This must be a POST request, hence the Content-Length: 0 and --data "" below. Apparently the request must also be made shortly after the previous one, or it fails (tricky when doing it by hand, but it shouldn't be a problem for a script).

curl "https://comic.pixiv.net/api/v1/viewer/token/<random>.json" -H "Host: comic.pixiv.net" -H "X-CSRF-Token: <csrf_token>" -H "X-Requested-With: XMLHttpRequest" -H "Cookie: PHPSESSID=<pixiv_session>; _pixiv-comic_session=<comic_session>; is_browser=yes;" -H "Content-Length: 0" --data "" -D -

A json file that looks like {"error":null,"data":{"token":"<comic_session_token>"}} is returned.
Fill this in at the right position...

curl "https://comic.pixiv.net/api/v1/viewer/stories/<comic_session_token>/9869.json" -H "Host: comic.pixiv.net" -H "X-Requested-With: XMLHttpRequest" -H "Cookie: PHPSESSID=<pixiv_session>; _pixiv-comic_session=<comic_session>; is_browser=yes;"

... and the json file (file structure already explained by Type-kun above) with the image urls is returned, now the images can be downloaded:

curl "https://img-comic.pximg.net/images/page/9869/gB6a6Hf9VhHwHUBc1TYY/8.jpg?20151030104009" -H "Referer: https://comic.pixiv.net/viewer/stories/9869" > test.jpg
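
When scripting these steps, the fiddly part is carrying the cookies from each Set-Cookie response header into the next request. A minimal Python sketch of that hand-off, assuming the raw Set-Cookie header values have already been read from the responses (`cookie_header_after` is a made-up helper name):

```python
from http.cookies import SimpleCookie
from typing import Iterable


def cookie_header_after(set_cookie_values: Iterable[str], pixiv_session: str) -> str:
    """Merge PHPSESSID with cookies from Set-Cookie headers into one Cookie header value."""
    jar = {"PHPSESSID": pixiv_session}
    for raw in set_cookie_values:
        parsed = SimpleCookie()
        parsed.load(raw)  # attributes such as path or HttpOnly are not treated as cookies
        for name, morsel in parsed.items():
            jar[name] = morsel.value
    return "; ".join(f"{name}={value}" for name, value in jar.items())
```

Called with the Set-Cookie values from the first two responses, this would yield the `PHPSESSID=...; _pixiv-comic_session=...; is_browser=yes` string used in the later curl calls.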

SD-DAken · 2016-06-11T20:53:33Z

All subsequent requests ~~can~~ must (as long as the comic session stays active) directly use the viewer-api-url value.

r888888888 · 2016-06-14T18:59:57Z

commit to store the comic session id in aa77ba3

r888888888 · 2016-08-28T02:25:15Z

I'm not sure how much demand for this there is but it would be a fair amount of work I think. I assume it would work like the batch bookmarklet where all the posts in a comic can be uploaded. Automatically creating and tagging a pool could be handled, too.

Type-kun · 2016-08-28T07:31:24Z

> I'm not sure how much demand for this there is but it would be a fair amount of work I think. I assume it would work like the batch bookmarklet where all the posts in a comic can be uploaded. Automatically creating and tagging a pool could be handled, too.

I expected it to work directly through the batch upload bookmarklet. There's no need for a new scheme; it fits into the current one perfectly. I'll try to describe the steps again.

When the bookmarklet is used on https://comic.pixiv.net/viewer/stories/9869:

1. Get the HTML of https://comic.pixiv.net/viewer/stories/9869, passing the pixiv session as the PHPSESSID cookie and the comic session as the _pixiv-comic_session cookie. We seem to already store both.
2. Parse the resulting HTML for the viewer-api-url, token-api-url and csrf-token meta tags.
  - If viewer-api-url is not empty, use it.
  - If viewer-api-url is empty and token-api-url is not empty:
    - Perform an empty POST request on the link specified in token-api-url. Additional headers: X-CSRF-Token: <csrf_token>; X-Requested-With: XMLHttpRequest. Cookies: PHPSESSID and _pixiv-comic_session from before, and also is_browser=yes;
    - Parse the resulting JSON and retrieve the data.token value.
    - Use that value (<token>) to construct viewer-api-url: https://comic.pixiv.net/api/v1/viewer/stories/<token>/9869.json. "9869" is the story ID and can be retrieved from the initial URL passed to the bookmarklet.
3. Perform a GET query on viewer-api-url. Additional headers: X-Requested-With: XMLHttpRequest. Cookies: PHPSESSID and _pixiv-comic_session from before, and also is_browser=yes;
4. Parse the resulting JSON. Search for the data.contents.pages array. Each element in that array can have a right and a left object. Get data.url from each of those objects. It is the direct image URL that can be displayed on the batch upload page as Image 0 and so on.
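
The branching on viewer-api-url and token-api-url can be sketched in Python like this. It is only an outline of the logic described above: `resolve_viewer_api_url` is a hypothetical name, `meta` is assumed to be a dict of meta tag names to their content however it was obtained, and `post_for_token` stands in for whatever code performs the empty POST and returns data.token.

```python
from typing import Callable, Dict

BASE = "https://comic.pixiv.net"


def resolve_viewer_api_url(
    meta: Dict[str, str],
    story_id: int,
    post_for_token: Callable[[str, str], str],
) -> str:
    """Return the absolute viewer json URL, requesting a token only when needed.

    post_for_token(url, csrf_token) is a caller-supplied (hypothetical) function
    that POSTs an empty body to url and returns the data.token string.
    """
    viewer = meta.get("viewer-api-url", "")
    if viewer:
        return BASE + viewer
    token_api = meta.get("token-api-url", "")
    if not token_api:
        raise ValueError("neither viewer-api-url nor token-api-url is set")
    token = post_for_token(BASE + token_api, meta.get("csrf-token", ""))
    return f"{BASE}/api/v1/viewer/stories/{token}/{story_id}.json"
```

Keeping the network call behind a callable means the decision logic can be checked without contacting pixiv.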

It takes up to 3 requests, but still fits into the current model. Direct image URLs require passing through image proxy, like regular pixiv images. The rule is: when https://img-comic.pximg.net/images/page/<story_id>/... is passed to image proxy, use https://comic.pixiv.net/viewer/stories/<story_id> as referrer.

I'm also not sure how much demand there is, but there's almost no other way to upload from comic.pixiv.net, because image urls are hidden and require digging through page source code or even inspecting network traffic.

nonamethanks · 2018-09-19T14:33:33Z

@r888888888 this should probably be reopened, as several people have requested this recently (most recently for https://comic.pixiv.net/works/5083).

stale · 2019-07-29T21:14:58Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale · 2019-10-28T05:26:55Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Type-kun added the Enhance label Jun 7, 2016

r888888888 closed this as completed Sep 19, 2018

r888888888 reopened this Sep 19, 2018

r888888888 removed the Enhance label Mar 10, 2019

stale bot added the stale label Jul 29, 2019

evazion added the Low Priority label and removed the stale label Jul 30, 2019

stale bot added the stale label Oct 28, 2019

evazion removed the stale label Oct 28, 2019

nonamethanks added the Feature label Aug 12, 2020

nonamethanks added the Source Support label and removed the Feature and Low Priority labels Feb 1, 2023

evazion closed this as completed in a4d0e9e May 2, 2024
