Links on web-archived pages to arquivo.pt/wayback should not be rewritten #703

Closed
dcgomes opened this issue Dec 17, 2019 · 9 comments

Comments

@dcgomes
Collaborator

dcgomes commented Dec 17, 2019

Consider a page archived from the live web that contained links to web-archived pages.

Once that page is itself web-archived:
https://arquivo.pt/wayback/20180919142909/https://memoriafcsh.wordpress.com/2017/08/01/centro-de-estudos-historicos-2000-2015/
the sentence "A primeira versão preservada com o endereço fcsh.unl.pt/ceh é de 2000 e não tem alterações até 2006." ("The first preserved version at the address fcsh.unl.pt/ceh is from 2000 and has no changes until 2006.")
links to:
https://arquivo.pt/wayback/20180919142909mp_/http://arquivo.pt/wayback/20000915125205/http://www.fcsh.unl.pt/ceh/
which produces a "Not Archived" message, because the replay system rewrites the link by adding the prefix "https://arquivo.pt/wayback/20180919142909mp_/" so that it points into Arquivo.pt.

Wayback should detect links that already target web-archived pages ("https://arquivo.pt/wayback/*") and leave them as they were, without adding the prefix. In this example, the URL should be kept as it was on the original live-web page:
https://arquivo.pt/wayback/20000915125205/http://www.fcsh.unl.pt/ceh/
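
The requested behaviour can be illustrated with a small standalone sketch. This is not pywb code; the function name `unwrap_nested_wayback`, the 14-digit timestamp, the two-letter modifier such as `mp_`, and the upgrade of the inner link to `https` are all assumptions made for illustration.

```python
import re

# Assumed shape of a replay URL:
#   https://arquivo.pt/wayback/<14-digit timestamp><optional modifier>/<target URL>
WAYBACK_RE = re.compile(
    r"^https?://arquivo\.pt/wayback/(?P<ts>\d{14})(?P<mod>[a-z]{2}_)?/(?P<target>.+)$"
)


def unwrap_nested_wayback(url: str) -> str:
    """Return the inner replay URL when `url` wraps another arquivo.pt/wayback URL.

    Any other URL is returned unchanged.
    """
    outer = WAYBACK_RE.match(url)
    if outer is None:
        return url  # not a replay URL at all
    target = outer.group("target")
    if WAYBACK_RE.match(target) is None:
        return url  # ordinary replay URL, the target is a live-web address
    # Assumption: serve the inner replay URL over https, as in the example above.
    return re.sub(r"^http://", "https://", target)


rewritten = (
    "https://arquivo.pt/wayback/20180919142909mp_/"
    "http://arquivo.pt/wayback/20000915125205/http://www.fcsh.unl.pt/ceh/"
)
print(unwrap_nested_wayback(rewritten))
```

For the link in this report, the sketch prints the URL that the live-web page originally pointed to, while a replay URL whose target is an ordinary live-web address passes through unchanged.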

@igobranco
Contributor

I think this could be done in the Apache HTTPd site configuration.

@danielbicho
Contributor

The server-side redirect doesn't work; weird things start to happen:
[image]

I don't understand the pywb rewriting rules, so I will postpone this.

@danielbicho danielbicho modified the milestones: WebApp, Responsive Mar 19, 2020
@igobranco
Contributor

Can you post the Apache HTTPd configuration here so we can analyze it?

@danielbicho
Contributor

RewriteRule ^/wayback/.*/.*arquivo.pt/wayback/(.*)$ %{REQUEST_SCHEME}://%{SERVER_NAME}/wayback/$1 [PT]

It does work if you open the link in a new tab, or in an unframed context, though.
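
To see what such a rule would capture, the pattern can be tried against the example path with Python's `re` module. This is only a rough check of the regular expression, assuming the wildcards in the rule are meant to be `.*`; it does not reproduce how Apache normalises the request path or how the `[PT]` flag behaves, and `captured_target` is a hypothetical helper name.

```python
import re

# Pattern of the RewriteRule, assuming `.*` wildcards and a `(.*)` capture group.
RULE = re.compile(r"^/wayback/.*/.*arquivo.pt/wayback/(.*)$")


def captured_target(path: str):
    """Return what `$1` would hold for `path`, or None if the rule does not match."""
    match = RULE.match(path)
    return match.group(1) if match else None


nested = (
    "/wayback/20180919142909mp_/"
    "http://arquivo.pt/wayback/20000915125205/http://www.fcsh.unl.pt/ceh/"
)
ordinary = "/wayback/20180919142909/https://memoriafcsh.wordpress.com/"

print(captured_target(nested))    # timestamp and original URL of the inner capture
print(captured_target(ordinary))  # None: no nested replay URL, rule does not apply
```

Under these assumptions the captured group is the inner timestamp plus original URL, which is the part the substitution appends to `/wayback/`.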

@danielbicho danielbicho removed this from the Responsive milestone Apr 22, 2020
@dcgomes dcgomes added this to the Caronte milestone Oct 15, 2020
@dcgomes dcgomes assigned vitgou and unassigned danielbicho Oct 15, 2020
@dcgomes
Collaborator Author

dcgomes commented Oct 15, 2020

Check whether a newer version of pywb fixes this issue.

@dcgomes
Collaborator Author

dcgomes commented Oct 15, 2020

We asked Ilya about adding exceptions to the URL rewriting.

@dcgomes dcgomes modified the milestones: Caronte, Dionisius Jan 14, 2021
@vitgou vitgou added this to To do in Dionisius Mar 3, 2021
@dcgomes dcgomes modified the milestones: Dionisius, Eros Mar 18, 2021
@vitgou vitgou removed this from To do in Dionisius Mar 18, 2021
@vitgou vitgou added this to Review in Eros May 10, 2021
@vitgou vitgou moved this from Review to To Do in Eros Aug 31, 2021
@vitgou vitgou moved this from To Do to Review in Eros Aug 31, 2021
@dcgomes dcgomes removed this from the Eros milestone Nov 5, 2021
@dcgomes dcgomes added this to the Fortuna milestone Nov 5, 2021
@dcgomes
Collaborator Author

dcgomes commented Nov 5, 2021

Re-evaluate after the Eros deployment.

@vitgou vitgou removed this from Review in Eros Nov 5, 2021
@vitgou vitgou added this to To Do in Fortuna Dec 3, 2021
@dcgomes dcgomes assigned VascoRatoFCCN and unassigned vitgou Mar 4, 2022
@VascoRatoFCCN

VascoRatoFCCN commented Mar 4, 2022

I'll investigate implementing a fix on the replay page using pywb UI customization: no longer exists.

@arquivo-awp

Test this when we integrate the new version of pywb.
@VascoRatoFCCN, please report the issue to the pywb project and close this one.

Projects
Fortuna
To Do
Development

No branches or pull requests

6 participants