@tisonkun
Member

@tisonkun tisonkun commented Mar 21, 2025

Split from #465

Signed-off-by: tison <wander4096@gmail.com>
@tisonkun tisonkun requested review from bproffitt and sebbASF March 21, 2025 06:29
Contributor

@sebbASF sebbASF left a comment

The changes to message links are unnecessary, because there are redirects in place.

I'm not convinced that replacing the links is the right thing to do here. Other eyes are needed on this.

@tisonkun
Member Author

> The changes to message links are unnecessary, because there are redirects in place.

  • www.mail-archive.com is under a different domain name, so I'd prefer to use lists.a.o; the UI is also different.
  • mail-archives.apache.org currently redirects to lists.a.o, so I consider the former outdated and perhaps deprecated.

@sebbASF
Contributor

sebbASF commented Mar 21, 2025

I agree that any mail-archive.com references should be replaced. However, it's not clear to me that the replacements are correct: how did you arrive at them?

==

Whilst mail-archives.apache.org links redirect to lists.a.o, the resulting Permalink is completely opaque, and unique to that host.

The existing links can readily be found in the ASF-hosted mbox files, or indeed in any other third-party archive that allows searching by message-id and list name.

That is not the case for the lists.a.o links. The only way to find such a link is to first use lists.a.o to determine the list id and message-id.

That's not to say that existing l.a.o links should be reverted to mail-archives.a.o. But I don't think we should replace these existing ones, thus losing direct access to the mail coordinates.

@tisonkun
Member Author

> how did you arrive at them

  1. Check the mailing list, date, title
  2. Go to lists.a.o, find the thread
  3. Check the content to ensure they're the same

Since it's a small set of links, I checked them manually.

@sebbASF
Contributor

sebbASF commented Mar 21, 2025

Regarding the converted mail-archive.com links: they include what appear to be message numbers. However, these don't agree with the numbers in the return path.

I have found a commit which shows an earlier set of references:

svn diff https://svn.apache.org/repos/asf/infrastructure/site/trunk/content/foundation/how-it-works.mdtext -c 202493
Index: how-it-works.xml
===================================================================
--- how-it-works.xml	(revision 202492)
+++ how-it-works.xml	(revision 202493)
@@ -218,9 +218,9 @@
 <p>The 
 <a href="bylaws.html">ASF Bylaws</a> (section 6.3) define a PMC.
 Some other emails help to clarify:
-<a href="http://mail-archives.apache.org/eyebrowse/ReadMsg?listName=community@apache.org&amp;msgNo=3944">here</a>
+<a href="http://www.mail-archive.com/community@apache.org/msg03961.html">here</a>
 and
-<a href="http://mail-archives.apache.org/eyebrowse/ReadMsg?listName=community@apache.org&amp;msgNo=3900">here</a>.
+<a href="http://www.mail-archive.com/community@apache.org/msg04005.html">here</a>.
 </p>
 
 <p>The board has the faculty to terminate a PMC at any time by resolution.</p> 

These are a different set of numbers. It seems likely that neither message number relates to the Return Path number.
However, they do agree in that the difference is 44 in both cases, as is the difference in the lists.a.o Return Path numbers.

If one of the lists.a.o links is correct, the other probably is too, but if one is wrong, the other is likely wrong as well.
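
The "difference is 44" observation can be checked directly from the numbers in the diff above. This is only a small arithmetic sketch in Python; the variable names are made up for illustration, and the lists.a.o Return Path numbers are not included because they are not quoted in this thread.

```python
# Message numbers copied from the two versions of the links in the svn diff.
eyebrowse = (3944, 3900)      # msgNo values in the removed eyebrowse links
mail_archive = (3961, 4005)   # msgNNNNN values in the added mail-archive.com links

# abs() is used because the larger number comes first in one pair
# and second in the other.
gap_eyebrowse = abs(eyebrowse[0] - eyebrowse[1])
gap_mail_archive = abs(mail_archive[0] - mail_archive[1])

print(gap_eyebrowse, gap_mail_archive)
```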

@sebbASF
Contributor

sebbASF commented Mar 21, 2025

FTR, the links were originally added here:

svn diff https://svn.apache.org/repos/asf/infrastructure/site/trunk/content/foundation/how-it-works.mdtext -c 126202
With comment:
r126202 | crossley | 2005-01-23 07:21:27 +0000 (Sun, 23 Jan 2005) | 3 lines

Link to ASF Bylaws.
Link to two very helpful emails from Greg and Dirk which clarify the PMC and chair.

@sebbASF
Contributor

sebbASF commented Mar 22, 2025

When I asked how you managed to translate the mail-archive.com links, the URLs were not working, so I assumed you must have somehow found the messages from the information in the URL itself. The links are now working, and I agree that the translations are correct.
Sorry about that. But it does show that references that only mean something to the particular archive site are not much use if the site is not available.

@tisonkun
Member Author

> only mean something to the particular archive site are not much use if the site is not available.

Yeah. Could you share some information on where the mbox infra is hosted and how it works? I agree that lists.a.o is somewhat vendor-specific and we may not want to rely on it. But currently, we redirect all the mbox links to lists.a.o, so I'm a bit confused.

@sebbASF
Contributor

sebbASF commented Mar 22, 2025

http://mail-archives.apache.org/mod_mbox/ no longer exists as an archive.
(There is a separate ASF archive, but there is no public access)

The host now just redirects URLs to lists.a.o.
It can do this, because the URLs contain the list name, date and message id (rather than an opaque id).
The URL is translated into a search on lists.a.o which then redirects to its own unique reference.
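
To illustrate why the old URLs are self-describing, here is a minimal Python sketch that pulls the list name, month and message-id out of a mod_mbox-style link. The function name `parse_mod_mbox_url` and the dictionary keys are hypothetical, and the path layout is inferred only from the example link in this PR; this is not the redirect code that runs on the host.

```python
from urllib.parse import urlsplit, unquote


def parse_mod_mbox_url(url):
    """Split a mod_mbox message URL into list name, month and message-id.

    Assumes the layout /mod_mbox/<list>/<YYYYMM>.mbox/<quoted message-id>,
    which is inferred from the example link in this PR.
    """
    # maxsplit=4 keeps any further "/" inside the message-id part.
    parts = urlsplit(url).path.split("/", 4)
    # Expected: ['', 'mod_mbox', '<list>', '<YYYYMM>.mbox', '<message-id>']
    if len(parts) != 5 or parts[1] != "mod_mbox" or not parts[3].endswith(".mbox"):
        raise ValueError(f"not a mod_mbox message URL: {url}")
    return {
        "list": parts[2],
        "month": parts[3][: -len(".mbox")],
        "message_id": unquote(parts[4]),  # %3c...%3e becomes <...>
    }


coords = parse_mod_mbox_url(
    "http://mail-archives.apache.org/mod_mbox/www-legal-discuss/200802.mbox/"
    "%3cE1A2049F-5D7B-44F7-A1E3-B9645BC52348@yahoo.com%3e"
)
print(coords)
```

Nothing comparable can be done with a lists.a.o permalink, since its identifier is opaque.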

This mostly works, but there are some caveats to this conversion.

Message ids are supposed to be unique, but there are some duplicates, sometimes even in the same month.
Lists.a.o currently returns the first match on the list for the message id; that may not correspond with the original.

Also, the search currently ignores the date, though using it might not help much, because the two archives don't agree on dates:

  • mod_mbox stored messages by arrival date (though the changeover sometimes happened late)
  • lists.a.o stores messages by the Date: field in the message, which may be very different from the arrival date (user clocks may be wildly inaccurate, and messages may be delayed several days by moderation).
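
The date mismatch described in the two bullets above can be sketched in Python as follows. The message, its timestamps and both helper functions are hypothetical, and the `YYYYMM` bucket simply mirrors the month component seen in mod_mbox URLs; how either archive actually files messages internally is an assumption here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def month_by_date_header(date_header):
    """Month bucket derived from the message's own Date: header."""
    return parsedate_to_datetime(date_header).strftime("%Y%m")


def month_by_arrival(arrival):
    """Month bucket derived from when the archive received the message."""
    return arrival.strftime("%Y%m")


# Hypothetical message: written on 31 Jan, released from moderation on 2 Feb.
date_header = "Thu, 31 Jan 2008 23:50:00 +0000"
arrival = datetime(2008, 2, 2, 9, 15, tzinfo=timezone.utc)

header_month = month_by_date_header(date_header)   # Date:-based filing
arrival_month = month_by_arrival(arrival)          # arrival-based filing

print(header_month, arrival_month)
```

In this made-up case the same message lands in different months under the two schemes, so a date taken from one archive's URL would not narrow a search in the other.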

I agree it is OK to replace links such as http://mail-archives.apache.org/mod_mbox/www-legal-discuss/ with the equivalent lists.a.o link, because no information is lost.

However replacing specific links loses what could be important information.

, and new
[exclusions](http://mail-archives.apache.org/mod_mbox/www-legal-discuss/200802.mbox/%3cE1A2049F-5D7B-44F7-A1E3-B9645BC52348@yahoo.com%3e).
[clauses](https://lists.apache.org/thread/b0806ypoqlm9ytjkvdbm90blnt5skz8y)
, and new [exclusions](https://lists.apache.org/thread/stw8nh1pg1gl97s04jcbdo5ctf4t4n6y).
Contributor

As above, I think we should keep links to specific messages in mail-archives.a.o

Contributor

@sebbASF sebbASF left a comment

We should keep references to mail-archives.a.o that relate to specific messages, as these contain vital information (list, date, message-id) which is not exposed in the lists.a.o links.

@tisonkun
Member Author

tisonkun commented Apr 7, 2025

Closing for now. Will try to figure out the history of mail-archives.a.o.

@tisonkun tisonkun closed this Apr 7, 2025
@tisonkun tisonkun deleted the fold-mail-archives.apache.org-redirect branch April 7, 2025 17:08