Add the message body to the pages tab in replayweb #247

Open
peterchanws opened this issue Mar 5, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@peterchanws

Is your feature request related to a problem? Please describe.
Upon opening a WARC file of an archived email message in ArchiveWeb.page, I found the "Pages" tab to be empty, even though numerous URLs (sometimes hundreds) are listed under the "URLs" tab. To locate the email body, users have to filter the URL list by searching for "HTML" entries that contain "mailto". This process can be challenging for many users.

Describe the solution you'd like
Listing the message body directly in the "Pages" tab in ReplayWeb.page would be beneficial, given how hard it otherwise is to navigate through WARC files.

Describe alternatives you've considered
Clarify in the user documentation that, to access the email body, users should filter the "URLs" tab by searching for "HTML" entries that include "mailto".


@peterchanws peterchanws added the enhancement New feature or request label Mar 5, 2024
@edsu

edsu commented Mar 5, 2024

I think there are (at least) two approaches to this:

  1. mailbagit could use py-wacz to bundle the WARC files in a WACZ file, and then add each mailto: URI to the enclosed pages.jsonl file.
  2. indicate that a particular WARC record is a "seed" and then see if replay tools like ArchiveWeb.page could support it.

Option 1 seems doable within the context of mailbagit, but perhaps option 2 is the more elegant solution? This might be a relevant issue to track for how to indicate that a WARC record is for a seed: iipc/warc-specifications#96
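
For the first approach, here is a rough sketch of what generating the pages.jsonl content could look like. It uses only the Python standard library and does not call py-wacz. The header line and the `id`/`url`/`ts`/`title` fields follow my understanding of the WACZ pages.jsonl layout, so they should be checked against the spec. The function name `build_pages_jsonl` and the example mailto: URI and timestamp are hypothetical; in practice the `url` would need to match whatever URI mailbagit actually records for the message body in the WARC.

```python
import json
import uuid


def build_pages_jsonl(pages):
    """Return pages.jsonl text: one header line, then one JSON object per page.

    `pages` is a list of dicts with "url" and "ts" keys and optional
    "id" and "title" keys. This helper is illustrative, not part of py-wacz.
    """
    # Header line describing the page list (assumed format identifier).
    header = {"format": "json-pages-1.0", "id": "pages", "title": "All Pages"}
    lines = [json.dumps(header)]

    for page in pages:
        entry = {
            # Generate an id when the caller does not supply one.
            "id": page.get("id") or uuid.uuid4().hex,
            "url": page["url"],
            "ts": page["ts"],
        }
        if page.get("title"):
            entry["title"] = page["title"]
        lines.append(json.dumps(entry))

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical message entry; the URI and timestamp are placeholders.
    example = [
        {
            "id": "msg-1",
            "url": "mailto:someone@example.org",
            "ts": "2024-03-05T00:00:00Z",
            "title": "Example message",
        }
    ]
    print(build_pages_jsonl(example), end="")
```

The resulting text would be written to the pages.jsonl file inside the WACZ, so that replay tools have an entry to show in the "Pages" tab for each message.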
