Creating a Snapshot API is down #21

Open
aylusltd opened this issue Aug 3, 2020 · 13 comments

Comments

@aylusltd

aylusltd commented Aug 3, 2020

https://archive.readme.io/docs/creating-a-snapshot

Example returns 502 bad gateway error. Has been down for several days.

@mekarpeles
Member

Hi @aylusltd thank you for the heads up. I should have time to debug this weekend (we've had a lot going on)

In the meantime you may wish to use the official Wayback SPN 2.0 (save page now)
https://help.archive.org/hc/en-us/articles/360001513491-Save-Pages-in-the-Wayback-Machine#:~:text=Install%20the%20Wayback%20Machine%20Chrome,give%20you%20a%20permanent%20URL.

@aylusltd
Author

aylusltd commented Aug 3, 2020

@mekarpeles, I can't actually use the extension. We use the API from a Lambda to get back the permanent URL and queue it up for human annotation.

I guess we could try to automate that, but seems like it'd be easier for us just to read the extension code and match its API invocation.

@mekarpeles
Member

Should be back up, I think :)
The annotation service for this labs project is fairly experimental (and not maintained by IA). If anyone is relying on this for mission-critical purposes (i.e. you'd be upset if the annotations db disappeared), I suggest running this service locally or submitting a PR for a simple Docker setup.

This code doesn't require any privileged setup to run and installation is pretty simple:
https://github.com/ArchiveLabs/pragma.archivelab.org#installation

@mekarpeles
Member

#21 (comment)

@jerclarke

jerclarke commented Aug 5, 2020

I just tried the example from https://archive.readme.io/docs/creating-a-snapshot and got a 502 error.

Maybe it's just the example that's the problem, but figured I'd mention it.

What I really want is the old system where a call to http://web.archive.org/save/http://google.com would create a cache of the URL, and return a Content-Location header with the /web/* path of the archive, but I think I'm probably in the wrong place for that. That "trick" stopped working July 10 (bug report on the AmberLink project).
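For reference, the client side of that old "trick" looked roughly like the sketch below (Python, standard library only). The helper names are made up for illustration, and since the header reportedly stopped being returned, `fetch_snapshot_path` may simply return `None` today.

```python
import urllib.request

# URL prefix taken from the example above.
SAVE_PREFIX = "http://web.archive.org/save/"


def build_save_url(target_url):
    """Return the save URL for a page, e.g. .../save/http://google.com."""
    return SAVE_PREFIX + target_url


def snapshot_path_from_headers(headers):
    """Return the /web/* path from a Content-Location header, or None if absent."""
    for name, value in headers.items():
        if name.lower() == "content-location" and value.startswith("/web/"):
            return value
    return None


def fetch_snapshot_path(target_url, timeout=30):
    """Request a save and read the snapshot path from the response headers."""
    with urllib.request.urlopen(build_save_url(target_url), timeout=timeout) as response:
        return snapshot_path_from_headers(response.headers)
```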

@mekarpeles
Member

If https://pragma.archivelab.org ever throws a 502, that means the service is down. Maybe it's getting DDOS'd? I just restarted the service and it looks like it's currently working.

Sincere apologies for playing close/open tag on this issue! @jerclarke or @aylusltd -- please feel free to close if this resolves your issue :) I'll defer to you two and be on standby if I can help further.

@Towito

Towito commented Apr 15, 2021

Hi, I've been trying to use this API and it appears to be down. The GET snapshot API appears to be fine, but POST requests do not appear to be working. I'm hoping to go through multiple web pages on a somewhat regular basis, so the SPN function is not particularly useful to me. Additionally, the size of webpages is far too small to justify an Archive-IT solution. Has this service been deprecated or something?

@mackuba

mackuba commented Apr 28, 2021

Yeah, not working for me either…

@SemjonWilke

Still 502

@mekarpeles mekarpeles reopened this Oct 7, 2021
@mekarpeles
Member

Here's the deal --

pragma.archivelab.org was intended to be a system for saving Wayback snapshots with annotations attached. Very few snapshots have had what I'd consider descriptive, meaningful, or specific annotations attached, leading me to believe most people just want to use the Wayback API (of which there is one).

The problem here is, I now have a database of 17,000,000 urls that y'all have archived and postgres can't keep up.

So, the course of action I'm planning is that this code be changed to just proxy using the Wayback API and not write to the database at all, unless an annotation is added.
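A rough sketch of that plan, in Python: always forward the capture to Wayback, and only touch the database when the request carries an annotation. Everything here is hypothetical (the `handle_capture` name, the `annotation` field, and the two injected callables); it is not the service's actual code.

```python
def handle_capture(payload, save_to_wayback, store_annotation):
    """Proxy a capture request, writing to the database only for annotations.

    payload          -- dict with a "url" key and an optional "annotation" key
                        (field names are assumptions for this sketch)
    save_to_wayback  -- callable(url) -> snapshot URL; stands in for the Wayback API call
    store_annotation -- callable(snapshot_url, annotation); stands in for the database write
    """
    url = payload.get("url")
    if not url:
        raise ValueError("payload must include a 'url'")

    # Every request is proxied to Wayback.
    snapshot_url = save_to_wayback(url)

    # The database is only written to when an annotation was supplied.
    annotation = payload.get("annotation")
    if annotation:
        store_annotation(snapshot_url, annotation)

    return {"snapshot_url": snapshot_url, "annotated": bool(annotation)}
```

Passing the Wayback call and the database write in as callables keeps the sketch independent of any particular client or schema, and makes the "no write without an annotation" rule easy to check with fakes.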

Given the current performance characteristics and use -- which I had not intended / planned on, given that I wrote this while I was a volunteer not even working for the Internet Archive -- I have to now consider what to do with all of this data...

@mekarpeles
Member

Ok, I added this capability:
https://github.com/ArchiveLabs/pragma.archivelab.org/blob/master/README.md#performing-a-capture

curl -X POST -H "Content-Type: application/json" -d '{"url": "https://google.com"}' https://pragma.archivelab.org/capture
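
For anyone calling this from code rather than the shell, the same request can be sketched in Python with only the standard library. The `/capture` endpoint and the `url` field come from the curl command above; the helper names `build_capture_request` and `capture` are made up for illustration, and the response format is not specified here, so the body is returned as raw text.

```python
import json
import urllib.request

CAPTURE_ENDPOINT = "https://pragma.archivelab.org/capture"


def build_capture_request(target_url, endpoint=CAPTURE_ENDPOINT):
    """Build (but do not send) the JSON POST request from the curl example."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def capture(target_url, timeout=30):
    """Send the capture request and return the raw response body as text.

    urllib raises urllib.error.HTTPError on a 502, so callers can tell
    "service is down" apart from a successful capture.
    """
    request = build_capture_request(target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```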

@Pandaklez

It still throws 502 error :(

@mekarpeles mekarpeles reopened this Jun 21, 2022
@ronthepennyhoarder

Getting a 502 error as well
