Can/should a mailbag contain multiple email accounts? #2

Open
gwiedeman opened this issue Jun 29, 2021 · 2 comments

@gwiedeman (Contributor) commented Jun 29, 2021

We initially, and silently, assumed that an email account was the “thing,” or “meaningful aggregate,” that users would package into a mailbag. The obvious practical case is an MBOX export or a folder of EML files.

This was challenged helpfully in two ways. First, when an archivist acquires multiple email accounts, it is probably intuitive to both the Nicholas Garza and Andrea Lee personas to package them in a single mailbag. Second, are email accounts the grouping that is most meaningful to future email users? In fact, some comments from the working meeting showed us that Isaac Hoffman, our data scientist persona, is likely to be more interested in how an organization communicates as a whole, and individual accounts may tell only part of that story. They probably want to do some network analysis of how individuals communicated about an issue. One could also see the Teresa Burns persona wanting to explore across multiple email accounts.

From the knowledge of our advisory board, it seems to be really rare to acquire all the email for an organization. Mass subpoenas or leaks of an organization’s email are probably not a realistic collecting strategy for most repositories, though they probably do happen in rare cases. While [Isaac Hoffman](https://archives.albany.edu/mailbag/personas/#isaac-hoffman-data-scientist) may want to see all of an organization’s email, Andrea Lee is more likely to acquire email using a capstone approach or from exports of email folders or individual messages.

Ideally, the Mailbag Specification wouldn’t care about this, and in a perfect world, a mailbag would scale down to a single EML file or up to the Enron emails. Yet, since it’s important to keep explicit connections between multiple representations of email messages, the specification does have to account for, say, whether two folders of EML files belong to separate accounts or not.

Supporting multiple email accounts in a mailbag would make the specification more complex and make it a bit more difficult to build tools that create or manage mailbags. More complicated tools would also add a significant burden for Aaron Santos, our tool maintainer. One approach to managing this could be to require an account name for every message, but this may be unintuitive for Emily Cooper and other command line users.

We concluded that accounts are a natural boundary for email, and it would keep the specification simpler and more useful to limit mailbags to one account. Even when accounts are stored in separate mailbags, that doesn’t mean they need to be presented to users that way. If Isaac Hoffman requests all email accounts from an organization, Andrea Lee can export the MBOX or WARC files from all the mailbags they are interested in. If a repository does not have a digital archivist, Nicholas Garza could make all the mailbags available, and we’re confident that Isaac can write a script to pull data from all of them.
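As a rough illustration of the kind of script Isaac might write, here is a minimal Python sketch that reads the MBOX files from several single-account mailbags and counts who wrote to whom. It assumes each mailbag keeps its MBOX files somewhere under `data/` with an `.mbox` extension; that layout, the glob pattern, and the function names `iter_messages` and `sender_recipient_counts` are hypothetical and not taken from the specification.

```python
import mailbox
from collections import Counter
from email.utils import getaddresses
from pathlib import Path


def iter_messages(mailbag_dirs, pattern="data/**/*.mbox"):
    """Yield (mailbag_name, message) for every MBOX file in each mailbag.

    The pattern is an assumed layout, not something the spec guarantees.
    """
    for bag in map(Path, mailbag_dirs):
        for mbox_path in sorted(bag.glob(pattern)):
            box = mailbox.mbox(str(mbox_path), create=False)
            try:
                for message in box:
                    yield bag.name, message
            finally:
                box.close()


def sender_recipient_counts(mailbag_dirs):
    """Count (sender, recipient) address pairs across all mailbags."""
    edges = Counter()
    for _, message in iter_messages(mailbag_dirs):
        senders = getaddresses(message.get_all("From", []))
        recipients = getaddresses(
            message.get_all("To", []) + message.get_all("Cc", [])
        )
        for _, sender in senders:
            for _, recipient in recipients:
                edges[(sender.lower(), recipient.lower())] += 1
    return edges
```

Calling `sender_recipient_counts(["account_a_mailbag", "account_b_mailbag"])` would return a `Counter` keyed by address pairs, which could feed a network analysis across accounts even though each account lives in its own mailbag.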

For the rare cases when archivists receive multiple email accounts, they could create a separate mailbag for each account. There seems to be less cost with that approach. Even if Andrea Lee were processing a large number of email accounts, she definitely has the skills to manage multiple mailbags. Even if she needs to package these mailbags in another bag for ingest into a repository, there is little cost to doing so. Leaving the mailbags as directories could require redundant computing power to hash every file a second time, but Andrea could also compress each mailbag into a ZIP or TAR.GZ file, so the bag containing them only has to hash that one archive.
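To make the "hash only the archive" idea concrete, here is a small Python sketch under some assumptions: each mailbag is packed into a TAR.GZ file, and one SHA-256 line per archive is produced in the `checksum  data/filename` style used by BagIt payload manifests. The function names and the `data/` placement in the outer bag are hypothetical, and a real workflow would more likely rely on an existing BagIt tool than on hand-written manifest lines.

```python
import hashlib
import tarfile
from pathlib import Path


def archive_mailbag(mailbag_dir, out_dir):
    """Pack one mailbag directory into <name>.tar.gz and return its path."""
    mailbag_dir, out_dir = Path(mailbag_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"{mailbag_dir.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(mailbag_dir, arcname=mailbag_dir.name)
    return archive


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(mailbag_dirs, out_dir):
    """Return one manifest-style line per archived mailbag (assumed layout)."""
    lines = []
    for bag in mailbag_dirs:
        archive = archive_mailbag(bag, out_dir)
        lines.append(f"{sha256_of(archive)}  data/{archive.name}")
    return lines
```

With this approach the outer bag records one checksum per account, while the checksums inside each mailbag remain available once the archive is unpacked.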

We’re the least confident in this decision, so please comment if you have thoughts! We are very interested if there are use cases that we’re not thinking of where using multiple mailbags for different accounts would be burdensome.

@stuartyeates commented Jun 29, 2021

In my organisation, we use role email accounts (AKA shared mailboxes and several other things) to communicate with clients (both within and outside of the organisation).

The boundary between an individual's email account and the several role email accounts they may use from time to time in the course of their work is at least semi-permeable, particularly when conversations / threads can last for months or years and staff move on. There are a number of issues with external vendors that have continued through the tenure of three or four different staff members, all on the same role account.

It's not clear whether these role accounts are accounts or email folders in the current spec.

@gwiedeman (Contributor, Author) commented Jul 9, 2021
