Architecture: Strip all JavaScript from static HTML archives by default #237
Labels
size: hard
status: idea-phase
Work is tentatively approved and is being planned / laid out, but is not ready to be implemented yet
touches: configuration
touches: data/schema/architecture
why: functionality
Intended to improve ArchiveBox functionality or features
why: security
Intended to improve ArchiveBox security or data integrity
Type
What is the problem that your feature request solves
Some websites use JavaScript to redirect any saved page back to the original site, thereby breaking archiving of pages on that site.
Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes
Ideally, an option to scan the JavaScript in each downloaded file and prevent it from setting window.location in any form. Since JS can be obfuscated in all sorts of ways, perhaps an option to simply strip out JavaScript from downloaded files would also be useful, and more reasonable to implement.
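To make the "simply strip out JavaScript" option concrete, here is a minimal sketch using only Python's standard-library html.parser. The names ScriptStripper and strip_javascript are hypothetical and are not part of ArchiveBox. It drops <script> elements and inline on* event-handler attributes, and it assumes the input is reasonably well-formed HTML.

```python
from html import escape
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit HTML while dropping <script> elements and on* attributes.

    Hypothetical helper for illustration; not an ArchiveBox API.
    """

    def __init__(self):
        # Keep character references as written so they can be re-emitted unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_script = False

    def _render_attrs(self, attrs):
        kept = []
        for name, value in attrs:
            if name.startswith("on"):  # onload, onclick, ... (inline handlers)
                continue
            if value is None:  # bare attribute such as "disabled"
                kept.append(f" {name}")
            else:  # the parser unescapes values, so escape them again
                kept.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(kept)

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        elif not self.in_script:
            self.out.append(f"<{tag}{self._render_attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if tag != "script" and not self.in_script:
            self.out.append(f"<{tag}{self._render_attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif not self.in_script:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_script:  # script bodies arrive here and are dropped
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.in_script:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.in_script:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.in_script:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_javascript(html: str) -> str:
    """Return the HTML with script elements and inline handlers removed."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


page = (
    '<html><head><script>window.location = "https://example.com";</script></head>'
    '<body onload="go()"><p>Hi &amp; bye</p></body></html>'
)
cleaned = strip_javascript(page)
```

This sketch does not cover every redirect vector: javascript: URLs in href attributes and <meta http-equiv="refresh"> tags would need separate handling, and a real implementation would probably want a more forgiving HTML parser.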
What hacks or alternative solutions have you tried to solve the problem?
Currently, the only real solution is to open the offending HTML files myself and remove the JavaScript causing the redirects from the <script> tags.
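Until something like this exists, the manual workaround can at least be narrowed to the files that need attention. The following is a rough heuristic sketch, not a reliable detector: the names REDIRECT_PATTERN, looks_like_redirect, and find_suspect_files are hypothetical, the regex only catches plain, unobfuscated assignments to location, and the "*.html" glob is an assumption about how the archive is laid out on disk.

```python
import re
from pathlib import Path

# Heuristic only: obfuscated JS (eval, string concatenation, etc.) will slip through.
REDIRECT_PATTERN = re.compile(
    r"""
    \blocation(?:\.href)?\s*=(?!=)           # assignment to location or location.href
    | \blocation\.(?:replace|assign)\s*\(    # call to location.replace or location.assign
    """,
    re.VERBOSE,
)


def looks_like_redirect(html: str) -> bool:
    """Return True if the text contains an obvious location-based redirect."""
    return REDIRECT_PATTERN.search(html) is not None


def find_suspect_files(archive_dir):
    """List .html files under archive_dir that contain an obvious redirect."""
    suspects = []
    for path in sorted(Path(archive_dir).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if looks_like_redirect(text):
            suspects.append(path)
    return suspects
```

Calling find_suspect_files on an archive folder returns the paths worth opening by hand. Comparisons such as location.href === url are deliberately not flagged, since they read the location rather than set it.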
How badly do you want this new feature?