
Potentially unhandled rejection - Error: read ECONNRESET #155

Open
sedimentation-fault opened this issue Apr 17, 2017 · 3 comments

At 7% of downloading all 24818 papers of the math.DG category of arxiv.org with:

getpapers --api 'arxiv' --query 'cat:math.DG' --outdir arxiv/math.DG -p

I got:

Potentially unhandled rejection [2] Error: read ECONNRESET
    at exports._errnoException (util.js:1022:11)
    at TLSWrap.onread (net.js:569:26)
Potentially unhandled rejection [4] Error: read ECONNRESET
    at exports._errnoException (util.js:1022:11)
    at TLSWrap.onread (net.js:569:26)
Potentially unhandled rejection [6] Error: read ECONNRESET
    at exports._errnoException (util.js:1022:11)
    at TLSWrap.onread (net.js:569:26)
Potentially unhandled rejection [8] Error: read ECONNRESET
    at exports._errnoException (util.js:1022:11)
    at TLSWrap.onread (net.js:569:26)
Potentially unhandled rejection [10] Error: read ECONNRESET
    at exports._errnoException (util.js:1022:11)
    at TLSWrap.onread (net.js:569:26)

Rerunning the same command produces the same error, only at a different point (4%, 1019/24818). It happens again and again, always at different places.

Reason

It seems that the ubiquitous

Connection reset by peer

error is left unhandled by getpapers. It is too common an error (I would almost say a normal one) to be thrown unhandled at the user.

Solution

Do something, maybe along the lines of
https://github.com/Vexera/retry-stream/blob/master/index.js
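A minimal sketch of that retry idea in plain Node, assuming a promise-returning download function; `retryOnReset` and `flakyDownload` are illustrative names, not part of getpapers or retry-stream:

```javascript
// Hypothetical retry helper: retries a promise-returning operation when it
// fails with a transient network error such as ECONNRESET, using a small
// exponential backoff between attempts.
function retryOnReset(operation, retries = 3, delayMs = 100) {
  return operation().catch((err) => {
    const transient = err && (err.code === 'ECONNRESET' || err.code === 'ETIMEDOUT');
    if (!transient || retries <= 0) throw err; // give up on real failures
    return new Promise((resolve) => setTimeout(resolve, delayMs))
      .then(() => retryOnReset(operation, retries - 1, delayMs * 2));
  });
}

// Simulated flaky download: rejects twice with ECONNRESET, then succeeds.
let attempts = 0;
function flakyDownload() {
  return new Promise((resolve, reject) => {
    attempts += 1;
    if (attempts < 3) {
      const err = new Error('read ECONNRESET');
      err.code = 'ECONNRESET';
      reject(err);
    } else {
      resolve('pdf-bytes');
    }
  });
}

retryOnReset(flakyDownload).then((result) => {
  console.log(result, attempts); // resolves on the third attempt
});
```

The key point is that the rejection is caught and classified: transient resets are retried with backoff, while anything else still propagates to the caller.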

sedimentation-fault (Author) commented Apr 17, 2017

To go past this error, one may try to execute curl (or a wrapper to curl that handles HTTP errors gracefully) directly, as shown in my workaround at #152.

EDIT: For the current version of my curl (or curl-wrapper) workaround, see #157.

It works: you will get past this error, only to encounter a plethora of new ones...

tarrow (Contributor) commented Apr 17, 2017

I'll have a look and see if I can replicate this.

Is this occurring during the metadata or the PDF download stage? To be honest, little work has been done recently on arxiv (due to lack of interest), but I'd be keen to make it work nicely for you.

Slightly cleaner error handling would be nice; unfortunately, in the end "connection reset by peer" is the best we can report, though it should come without a stack trace and only after retrying a few times.

In the past I've found you might want to check your wireless card isn't falling asleep. Alternatively if you're using our virtual machine image there seems to be a bug in the virtualbox drivers on some platforms that causes this.

sedimentation-fault (Author) commented:
This occurs during the PDF download stage. No Wi-Fi or virtual machine images in use here.

Connection reset is quite a common error. Start hammering any web server and, after a few hundred "200 OK"s, I bet you'll get a "connection reset by peer" error. arxiv even seems quite robust here, doing so only after the first thousand downloads...

You should be able to replicate the error by using my example with the math.DG category and its 24000+ papers.

Note that this error prevents me from going past the first 1000-1500 papers. The

if (err) throw err;

line in the downloadURL function of download.js causes getpapers to stop as soon as it encounters it. Retrying is futile.
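One way to avoid the hard stop would be to replace the throw with a bounded retry before reporting failure. A sketch under stated assumptions: `downloadWithRetry` and `fetchOnce` are hypothetical names standing in for the request logic inside getpapers' downloadURL, not its actual API:

```javascript
// Sketch: instead of `if (err) throw err;`, retry transient resets a few
// times and only surface the error once retries are exhausted.
function downloadWithRetry(url, fetchOnce, retriesLeft, callback) {
  fetchOnce(url, (err, data) => {
    if (err && err.code === 'ECONNRESET' && retriesLeft > 0) {
      // Transient reset: try again instead of crashing the whole run.
      return downloadWithRetry(url, fetchOnce, retriesLeft - 1, callback);
    }
    callback(err, data); // non-transient error, exhausted retries, or success
  });
}

// Demo with a fake fetch that fails once with ECONNRESET, then succeeds.
let calls = 0;
function fakeFetch(url, cb) {
  calls += 1;
  if (calls < 2) {
    const e = new Error('read ECONNRESET');
    e.code = 'ECONNRESET';
    return cb(e);
  }
  cb(null, 'ok');
}

downloadWithRetry('http://example.org/paper.pdf', fakeFetch, 3, (err, data) => {
  console.log(err, data); // data === 'ok' after one retry
});
```

With this shape, a single reset mid-run costs one extra request instead of aborting a 24000-paper download.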
