Daniel Gomes edited this page Jun 6, 2024 · 72 revisions

APIs specific to Arquivo.pt that enable the full exploration of its functions

APIs based on international standards to enable interoperability among web archives and code reuse

API usage limits

Each API has the following usage limits (thresholds). If you start receiving the HTTP response status 429 (Too Many Requests), check whether you are exceeding them:

Learn more about APIs

Bulk download of web-archived resources

If you need to download a large number of web-archived resources, such as all the URLs archived from a large website over time, we suggest the following methodology:

  1. Analyse the Arquivo.pt collections and choose those most likely to contain web-archived data relevant to your use case. If you have any doubts, contact us.

  2. Download the CDXJ index files (what is CDXJ?) of the Arquivo.pt collections you selected to process. To find them, look up "column A: Collection ID" and the corresponding CDXJ index file in "column H: Collection CDXJ File";

  3. Create a list of the selected URLs to download, extracted from the CDXJ index files (e.g. using the Linux grep command);

  4. Download the web-archived resources for the list of selected URLs from Arquivo.pt, either by using the above APIs or by building links that access the web-archived resources directly. These links are shown under Technical details in the Options menu (top right) when viewing a web-archived page. For instance, for the URL http://publico.pt/ with timestamp 20120201160355 extracted from the CDXJ index file, build the following links to download the:
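As a rough illustration of steps 3 and 4, the Python sketch below filters CDXJ lines and builds one replay link per selected capture. It rests on several assumptions that are not confirmed by this page: that each CDXJ line has the form `urlkey timestamp {json}` with a `url` field in the JSON part, and that a link is formed as the `https://arquivo.pt/noFrame/replay/` endpoint followed by `<timestamp>/<original URL>`. The function names and the two sample lines are hypothetical; only the publico.pt URL and its timestamp come from the example above.

```python
import json

# Endpoint named on this page; the path layout appended to it is an assumption.
REPLAY_BASE = "https://arquivo.pt/noFrame/replay/"


def parse_cdxj_line(line):
    """Split one CDXJ line into (urlkey, timestamp, fields).

    Assumes the layout 'urlkey timestamp {json}'.
    """
    urlkey, timestamp, payload = line.strip().split(" ", 2)
    return urlkey, timestamp, json.loads(payload)


def select_captures(lines, needle):
    """Yield (timestamp, url) for records whose original URL contains needle."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _, timestamp, fields = parse_cdxj_line(line)
        url = fields.get("url", "")
        if needle in url:
            yield timestamp, url


def replay_link(timestamp, url):
    """Build a link as <endpoint><timestamp>/<original URL> (assumed layout)."""
    return f"{REPLAY_BASE}{timestamp}/{url}"


# Made-up sample lines standing in for a downloaded CDXJ index file.
sample = [
    'pt,publico)/ 20120201160355 {"url": "http://publico.pt/", "mime": "text/html", "status": "200"}',
    'pt,example)/ 20120201160400 {"url": "http://example.pt/", "mime": "text/html", "status": "200"}',
]

links = [replay_link(ts, url) for ts, url in select_captures(sample, "publico.pt")]
print(links)
```

For a real index file, the `sample` list could be replaced by an open file handle, so that lines are read lazily instead of being loaded into memory at once.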

Access endpoints usage limits

  • Original file of the web-archived page, or the web-archived page without the Arquivo.pt UI frame (endpoint https://arquivo.pt/noFrame/replay/): 4437 requests/minute.
    • If the client exceeds this limit, it will receive an "HTTP 429 Too Many Requests" error and should decrease its download rate.
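One way a client might respect this limit is sketched below in Python: it spaces requests so that the average rate stays under 4437 per minute, and waits progressively longer whenever a 429 is returned. This is only an illustration under stated assumptions. The doubling backoff, the retry count and the function names (`http_get`, `fetch_all`) are hypothetical choices, not behaviour prescribed by Arquivo.pt.

```python
import time
import urllib.error
import urllib.request

LIMIT_PER_MINUTE = 4437                 # limit stated above for noFrame/replay
MIN_INTERVAL = 60.0 / LIMIT_PER_MINUTE  # pause between requests to stay under it


def http_get(url):
    """Return (status, body) for url, reporting HTTP errors as status codes."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, b""


def fetch_all(urls, get=http_get, sleep=time.sleep, max_retries=5):
    """Fetch each URL in turn, pacing requests and backing off on HTTP 429."""
    results = {}
    for url in urls:
        delay = 1.0  # first backoff in seconds (hypothetical value)
        for _ in range(max_retries):
            status, body = get(url)
            if status != 429:
                results[url] = (status, body)
                break
            sleep(delay)  # the server signalled too many requests: slow down
            delay *= 2    # wait twice as long before the next retry
        else:
            results[url] = (429, b"")  # still rate-limited after all retries
        sleep(MIN_INTERVAL)  # keep the average rate below the stated limit
    return results
```

A call such as `fetch_all(links)` would then download a list of previously built links. The `get` and `sleep` parameters exist so that the pacing logic can be exercised without network access.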

Learn more about bulk download

Contact us

If you have any trouble using our APIs, please contact us so that we can try to help you.

Short link to this page: arquivo.pt/api