Daniel Gomes edited this page Apr 23, 2018 · 15 revisions

robustify.arquivo.pt

robustify.js is a JavaScript script that attempts to fight link rot and content drift by implementing Herbert Van de Sompel's Memento Robust Links - Link Decoration specification, in the context of the Hiberlink project.

When a user clicks a hyperlink, robustify.js first tests whether the linked page is available online. If it is not, robustify.js redirects the user to a web archive, by default using the Memento Time Travel service.

This repository is a fork of René Voorburg's robustify.js. It provides a customized version, robustifyArquivoPT.js, that uses the Arquivo.pt Robustify Service to check the status code of a URL and to retrieve an archived version of that URL from the Arquivo.pt infrastructure.

Websites using robustify.js with the Arquivo.pt infrastructure:

How to use?

To robustify a web page, insert the following code snippet at the bottom of the page's body element:

<body>
    ...
    ...
    <!-- Code to redirect broken links to web-archived versions-->
    <script src="http://robustify.arquivo.pt/robustifyArquivoPT.js"></script>
    <script>
        robustify({});
    </script>
    <!-- End -->
</body>

How robustify.arquivo.pt works

Robustify diagram

1. Embedded script

  • The workflow starts by embedding the script snippet shown under "How to use" at the bottom of the body element of the HTML page.

2. Trigger the onclick event

When the browser loads the HTML page, an onclick handler is attached to every anchor that has an href attribute. Each click on a link calls the robustLink function of robustifyArquivoPT.js.
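The handler wiring in step 2 can be sketched as follows. This is a minimal illustration, not the code of robustifyArquivoPT.js: the helper name attachRobustHandlers is hypothetical, and the (anchor, event) signature assumed for robustLink is an assumption.

```javascript
// Hypothetical helper: attach a click handler to every anchor with an href.
// `anchors` is any array-like of anchor objects (in a browser, e.g. the result
// of document.querySelectorAll("a[href]")); `robustLink` is the per-click
// handler, assumed here to take (anchor, event).
function attachRobustHandlers(anchors, robustLink) {
  let count = 0;
  for (const anchor of Array.from(anchors)) {
    // Skip anchors without a usable href (e.g. named targets).
    if (!anchor.href) continue;
    anchor.onclick = function (event) {
      // Delegate the decision (check the link, then follow or redirect).
      return robustLink(anchor, event);
    };
    count += 1;
  }
  return count; // number of anchors that received a handler
}
```

In a browser this would be called once the page has loaded, e.g. `attachRobustHandlers(document.querySelectorAll("a[href]"), robustLink)`. Returning false from an onclick property handler cancels the default navigation, which lets robustLink decide where the browser should go.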

3. Process anchor

The robustLink function processes the link and tests whether it is available by calling a JSON service, robustify.arquivo.pt/statuscodeArquivoPT.php; the resulting object is passed to a callback.

4. Call the service

Calls the service statuscodeArquivoPT.php with the following parameters:

  • link: the url to be tested
  • uA: the user agent of the client page
  • origin: current page location
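The request issued in steps 3 and 4 can be sketched as a URL builder. The helper buildStatusRequest and its options are hypothetical. The parameter names follow the list above, but the name of the tested-link parameter is configurable because the example request at the end of this page uses url instead of link. Sending soft404detect with the value 1 is also an assumption; the page only says the parameter must be present.

```javascript
// Hypothetical helper: build the query URL for the status-code service.
// The tested-link parameter name is configurable ("link" per the list above,
// "url" per the example request); soft404detect=1 is an assumed value.
function buildStatusRequest(link, userAgent, origin, options = {}) {
  const endpoint = options.endpoint || "http://robustify.arquivo.pt/statuscodeArquivoPT.php";
  const linkParam = options.linkParam || "link";
  const params = new URLSearchParams();
  params.set(linkParam, link);   // the URL to be tested
  params.set("uA", userAgent);   // user agent of the client
  params.set("origin", origin);  // location of the current page
  // Optional soft-404 check; only sent when requested.
  if (options.soft404detect) params.set("soft404detect", "1");
  // URLSearchParams percent-encodes each value.
  return endpoint + "?" + params.toString();
}
```

In a browser, the arguments would typically be `anchor.href`, `navigator.userAgent` and `window.location.href`, and the JSON returned by the service could then be handed to the callback, e.g. with `fetch(url).then((r) => r.json()).then(callback)`.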

5. Process URL and 6. Request to link

The JSON service, implemented in PHP, starts by making a GET request to the link being tested. To do so, it builds the request headers from the information provided, such as the user agent and the referring page (origin). After sending the request and receiving the response, it extracts the status code from the response headers. If the status code is 403 or 405, it repeats the request using the HEAD method instead. Otherwise, it checks the Location field of the response headers: if it is not empty (a redirect), it follows the new link, up to a depth of 5 levels. Finally, it returns the status codes of all attempts to access the various links.
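The logic of steps 5 and 6 can be illustrated with the following JavaScript sketch (the actual service is written in PHP). It is a reconstruction from the description above, not the service's code: checkLink and the injected request(url, method) function are hypothetical, request is synchronous only to keep the example self-contained, and the sketch records one status code per URL visited (the HEAD result when a retry happened) and also honors a Location header returned by the HEAD retry.

```javascript
// Hypothetical sketch of the status check. `request(url, method)` is an
// injected, synchronous function returning { status, location } for a single
// HTTP request, without following redirects itself.
function checkLink(link, request, maxRedirects = 5) {
  const headers = [];
  let current = link;
  for (let depth = 0; depth <= maxRedirects; depth++) {
    let res = request(current, "GET");
    // As described above: on 403 or 405, repeat the request with HEAD.
    if (res.status === 403 || res.status === 405) {
      res = request(current, "HEAD");
    }
    headers.push({ statuscode: res.status });
    // No Location header: this is the final response.
    if (!res.location) break;
    // Resolve relative Location values against the current URL.
    current = new URL(res.location, current).href;
  }
  // Same shape as the service's JSON response.
  return { request: link, headers };
}
```

Because request is injected, the redirect handling and the 5-level depth limit can be exercised against a fake server without network access.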

The service can also detect soft 404s, i.e. pages that answer with status 200 but actually display a "not found" page. This check is optional and runs only if the soft404detect GET parameter is sent to the service. To perform it, the script generates a random link path that should return a 404 and computes a hash of that page. It then computes a hash of the final page reached in the previous step; if the two hashes are identical, the script assumes the link is a soft 404.
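The soft-404 comparison can be sketched in Node.js as below. The helper names are hypothetical, and both SHA-256 and the random path format (a random hexadecimal segment on the same host) are assumptions; this page does not say which hash function the service uses or how it builds the random path.

```javascript
const crypto = require("crypto");

// Hypothetical: build a probe URL on the same host with a random path that
// almost certainly does not exist (format is an assumption).
function randomProbeUrl(link) {
  const u = new URL(link);
  u.pathname = "/" + crypto.randomBytes(16).toString("hex");
  u.search = "";
  u.hash = "";
  return u.href;
}

// SHA-256 is an assumed choice of hash function.
function sha256(body) {
  return crypto.createHash("sha256").update(body).digest("hex");
}

// `finalBody`: content of the page reached after following redirects.
// `probeBody`: content returned for the random (non-existent) path.
function isSoft404(finalBody, probeBody) {
  return sha256(finalBody) === sha256(probeBody);
}
```

An exact hash comparison only matches byte-identical pages: error pages that embed timestamps, session tokens or the requested path will hash differently, so this heuristic can miss soft 404s.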

7. Redirect

After receiving the response from the service, the script checks whether the status code is 200. If it is not, it builds a new URL to redirect the user to, based on the configured settings, which in our case are:

"archive" : "http://arquivo.pt/wayback/{yyyymmddhhmmss}/{url}"

Where {yyyymmddhhmmss} is replaced by the current date* and {url} by the tested URL. Before the redirect, a popup shows the text configured in the settings:

"offlineToVersionurl" : "Página não encontrada será redirecionado para uma cópia preservada pelo Arquivo.pt."

(In English: "Page not found; you will be redirected to a copy preserved by Arquivo.pt.")

If the status code is 200, the user is sent to the tested URL as usual.

*If the anchor tag has a data-versiondate attribute, its value is used instead; otherwise, the current date is used.
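The redirect decision and URL construction of step 7 can be sketched as follows. ARCHIVE_TEMPLATE is the archive setting quoted above; resolveTarget, versionTimestamp and toTimestamp are hypothetical helpers. Using UTC for the current date, and normalizing data-versiondate (assumed to hold an ISO 8601 date such as 2017-03-15, as in the Robust Links convention) to a 14-digit timestamp, are assumptions.

```javascript
// The "archive" setting quoted above.
const ARCHIVE_TEMPLATE = "http://arquivo.pt/wayback/{yyyymmddhhmmss}/{url}";

// Format a Date as a 14-digit UTC timestamp (UTC is an assumption).
function toTimestamp(date) {
  const pad = (n) => String(n).padStart(2, "0");
  return (
    String(date.getUTCFullYear()) +
    pad(date.getUTCMonth() + 1) +
    pad(date.getUTCDate()) +
    pad(date.getUTCHours()) +
    pad(date.getUTCMinutes()) +
    pad(date.getUTCSeconds())
  );
}

// Use data-versiondate when present, otherwise the current date.
function versionTimestamp(versionDate, now = new Date()) {
  if (versionDate) {
    // Assumed normalization: keep digits ("2017-03-15" -> "20170315"),
    // then right-pad with zeros to 14 digits.
    return versionDate.replace(/\D/g, "").slice(0, 14).padEnd(14, "0");
  }
  return toTimestamp(now);
}

// Return the original link if the final status is 200, else the archived copy.
function resolveTarget(statusCode, link, versionDate, now) {
  if (statusCode === 200) return link;
  const ts = versionTimestamp(versionDate, now);
  // Replacer functions avoid treating "$&" etc. in the link as patterns.
  return ARCHIVE_TEMPLATE
    .replace("{yyyymmddhhmmss}", () => ts)
    .replace("{url}", () => link);
}
```

In a browser, versionDate would come from `anchor.getAttribute("data-versiondate")`.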

Example of request to statuscodeArquivoPT.php

http://robustify.arquivo.pt/statuscodeArquivoPT.php?url=http://www.publico.pt&origin=http://www.google.pt&uA=Mozilla/5.0%20(Windows%20NT%206.1;%20Win64;%20x64;%20rv:47.0)%20Gecko/20100101%20Firefox/47.0

Response elements

{"request":"http:\/\/www.publico.pt","headers":[{"statuscode":200}]}
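A client could read a response like the one above with helpers such as the following. finalStatus and isAlive are hypothetical names, and treating the last entry in headers as the outcome after all redirects is an assumption based on the description of steps 5 and 6.

```javascript
// Hypothetical: return the status code of the last attempt, or null if the
// response lists no attempts.
function finalStatus(responseText) {
  const data = JSON.parse(responseText);
  const headers = Array.isArray(data.headers) ? data.headers : [];
  if (headers.length === 0) return null;
  return headers[headers.length - 1].statuscode;
}

// The link is considered available when the final status is 200.
function isAlive(responseText) {
  return finalStatus(responseText) === 200;
}
```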
