HTML archive format and implementation #5
We have now moved the viewer functionality to a separate addon, Web Archive Viewer. Support for Firefox is implemented.
We have decided to merge the viewer functionality back. (We originally split it out because some functionality seemed impossible to implement in Firefox for Android, but those issues have finally been solved and WebScrapBook now basically works on Firefox for Android.)
Calibre (particularly ebook-viewer.exe) supports zipped HTML files and uses the .htmlz extension. Renamed .htz files work fine, with minor issues. Could you allow using Calibre's extension, at least as an option?
Thank you for the information. At a quick glance, .htmlz seems to be an archive format for ebooks, so its purpose may differ from that of .htz. This needs further investigation. By the way, could you be more specific about the "minor issues" you encountered? That would help us identify them.
Mostly CSS. For instance, https://habrahabr.ru/post/342344/ is missing its third-party fonts.
This is a very old issue. As no promising archive format has been found, we have decided to stick with HTZ, MAFF, and single HTML for now. Conversion between these formats and other formats such as MHT, .archive, .warc, etc. may be implemented in PyWebScrapBook in the future, though.
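Because HTZ and MAFF are both zip containers, converting between them is mostly a matter of renaming archive entries. The following is a minimal Python sketch of that idea, not PyWebScrapBook code. It assumes a simplified MAFF layout in which each page lives in its own top-level folder; `page_id` is a hypothetical folder name supplied by the caller, and the `index.rdf` metadata file that real MAFF archives normally carry is not generated (and is skipped when converting back).

```python
import io
import zipfile


def htz_to_maff(htz_bytes: bytes, page_id: str) -> bytes:
    """Copy every HTZ entry under a single top-level folder named page_id.

    Simplified sketch: a real MAFF page folder usually also contains an
    index.rdf metadata file, which is NOT written here.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(htz_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.is_dir():
                continue  # directory entries are implied by file paths
            dst.writestr(f"{page_id}/{info.filename}", src.read(info.filename))
    return out.getvalue()


def maff_to_htz(maff_bytes: bytes) -> bytes:
    """Extract the first page folder of a MAFF-style zip into a flat HTZ."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(maff_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        files = [info for info in src.infolist() if not info.is_dir()]
        if not files:
            raise ValueError("archive contains no files")
        # Treat the folder of the first file as the page to extract.
        top = files[0].filename.split("/", 1)[0] + "/"
        for info in files:
            if not info.filename.startswith(top):
                continue  # another page in a multi-page MAFF
            rel = info.filename[len(top):]
            if rel == "index.rdf":
                continue  # MAFF metadata has no HTZ counterpart in this sketch
            dst.writestr(rel, src.read(info.filename))
    return out.getvalue()
```

A single-page round trip (HTZ to MAFF and back) should reproduce the original entry names; metadata such as the original URL or capture time would need a real `index.rdf` writer to survive the conversion.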
In version 0.3.0 we use the .htz extension for the zipped package of a captured web page. We also implemented a viewer that can load a .htz file directly in the browser, using either the direct method (open the .htz file directly in Chrome; the user must enable "Allow access to file URLs") or the indirect method (open the viewer from the toolbar dropdown list and then pick a .htz file). Unfortunately, the viewer requires the requestFileSystem API, which is currently not available in Firefox.

Is there another way to implement the .htz viewer using the WebExtension APIs currently available in Firefox? One idea is the technique that EPUBReader uses, but its source code appears to be obfuscated and is not available for us to study.

We currently consider a zip-based format to be the best cross-platform HTML archive format. Besides the current .htz format, another zip-based approach is MAFF, used by the MAF addon, which unfortunately seems to be unmaintained. Besides zips, there are many other types of HTML archive, such as .mhtml, .warc, .webarchive, and so on. Is there another recommended format?
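Reading a .htz file needs nothing beyond a zip library; the hard part for a viewer is serving each archived file when the page requests it by a relative URL, which is what the file-system API is used for. The Python sketch below illustrates only that lookup, outside the browser, and is not the WebScrapBook viewer. It assumes the entry page is `index.html` at the archive root; the function names and the sample archive are hypothetical.

```python
import io
import posixpath
import zipfile
from typing import List

# Assumed layout: the captured page's entry document is "index.html" at the
# archive root, with its resources stored alongside it.
INDEX_NAME = "index.html"


def read_htz_index(htz_bytes: bytes) -> str:
    """Return the entry HTML document of an HTZ archive (assumed UTF-8)."""
    with zipfile.ZipFile(io.BytesIO(htz_bytes)) as zf:
        return zf.read(INDEX_NAME).decode("utf-8")


def list_htz_resources(htz_bytes: bytes) -> List[str]:
    """List all archived files other than the entry document."""
    with zipfile.ZipFile(io.BytesIO(htz_bytes)) as zf:
        return sorted(
            name for name in zf.namelist()
            if name != INDEX_NAME and not name.endswith("/")
        )


def read_resource(htz_bytes: bytes, ref: str, base: str = INDEX_NAME) -> bytes:
    """Resolve a relative reference (e.g. href="style.css") against the
    document containing it, then read the matching archive entry.

    Raises KeyError if the reference points outside the archive.
    """
    path = posixpath.normpath(posixpath.join(posixpath.dirname(base), ref))
    with zipfile.ZipFile(io.BytesIO(htz_bytes)) as zf:
        return zf.read(path)


def build_sample_htz() -> bytes:
    """Build a tiny in-memory HTZ archive for demonstration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(INDEX_NAME, '<link rel="stylesheet" href="style.css">'
                                '<img src="./images/logo.png">')
        zf.writestr("style.css", "body { color: red; }")
        zf.writestr("images/logo.png", b"fake-png-bytes")
    return buf.getvalue()


if __name__ == "__main__":
    sample = build_sample_htz()
    print(list_htz_resources(sample))  # ['images/logo.png', 'style.css']
    print(read_resource(sample, "./images/logo.png"))  # b'fake-png-bytes'
```

In a browser extension, the same path resolution would have to be paired with some way of answering the page's subresource requests. That serving step, not the unzipping, is what the missing requestFileSystem API made difficult in Firefox.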