3.1.4: 2010-xx-xx
Smarthost SIZE is forwarded to SMTP submit clients.
Several bugs and one missing feature:
1. INTHREAD doesn't work now. Not sure why.
2. INTHREAD OLDER and similar searches don't work; the date-based
searches look at the outer message.
3. THREAD needs retesting. It may be broken.
4. COMPRESS now reimplemented, and may work better with iphone 4.
IMAP::emitResponses still needs a tweak or two.
And the feature: automatic flag-based views. Almost everything is
SORT(SUBJECT) is broken. The only way to fix it is to add a postgres
function to implement the moronic baseness algorithm in the RFC. I can't
write that sort of thing.
If that were to use a Cache, message injection would be speeded up
Open Bug: aox add view broken
Ron Peterson reports that aox add view breaks.
aox add view /users/auser/someone /users/auser/archive auser address
aox: Can't create view named /users/auser/someone
No idea what's wrong (yet).
1b149d fixed the immediate view creation problem, but the view doesn't
work yet. -- AMS 20100413
Views now work, but threaded views still don't.
Notify: Ron Peterson <>
Message-Id: <>
Plaintext passwords work, sort of
allow-plaintext-passwords almost works. But if we get a cleartext
password and that's disallowed, we ought to treat it as bad, and
notify the admin that a change of passwords is necessary.
Send email to Timo when that works, he suggested the feature
Open bug format blah
I envision the format thus:
"Open Bug: " as headline prefix to mark it for the processor.
The rest of the headline must never change; it'll be used to look up
the web page. This is basically the bug ID. IDs can be reused as
soon as the bug is closed, though.
The body text contains plain old text, which will be exported, and
which may be changed at will.
Lines matching "Notify: " specify an email address. The email
address is not published on the web site.
When a commit affects a note, the script sends a notification to all
specified addresses. It doesn't include the patch though. When a
commit removes a note, the script tells people "fixed".
We could try to set the References field on responses... but I don't
care to. It would be neat, but I don't care enough to do it. Or
maybe I do. If there's a Message-ID field, we use that in
References, otherwise we skip it?
Is this good?
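A minimal sketch of how the processor could pick a note apart, assuming it is handed one note at a time with leading whitespace already stripped from each line. BugNote and parseBugNote are hypothetical names, not existing code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One note from this file, as the (hypothetical) processor might see it.
struct BugNote {
    std::string id;                   // headline minus the "Open Bug: " prefix
    std::vector<std::string> notify;  // from "Notify: " lines, never published
    std::string body;                 // everything else, exported to the web
};

// Returns true and sets rest if line starts with prefix.
static bool stripPrefix(const std::string &line, const std::string &prefix,
                        std::string &rest) {
    if (line.compare(0, prefix.size(), prefix) != 0)
        return false;
    rest = line.substr(prefix.size());
    return true;
}

// Parses a single note: the first line is the headline, the rest is body.
// Returns false if the headline does not carry the "Open Bug: " marker.
bool parseBugNote(const std::string &text, BugNote &note) {
    std::istringstream in(text);
    std::string line, rest;
    if (!std::getline(in, line) || !stripPrefix(line, "Open Bug: ", note.id))
        return false;
    while (std::getline(in, line)) {
        if (stripPrefix(line, "Notify: ", rest))
            note.notify.push_back(rest);   // kept out of the exported body
        else
            note.body += line + "\n";
    }
    return true;
}
```

Since the headline doubles as the bug ID, the sketch keeps it byte for byte and only ever edits the body.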
P/W: Stuff we may want to keep
List::take()->remove() was good
also keeping iterators working
Having EH make a new Log by default
Simplifying the way Connection objects are added to the main loop
List::append( List ) -> appendList()
smtpclient has simpler logic... but don't break what works
cancelQuery rewrite. not sure.
Do (some of) these, one at a time, making sure nothing breaks after
each one. None soon.
A web page about spam filtering
Containing neat queries, such as "tell me whether the user with
address x has mail from y not tagged as spam", or "tell me whether
user x has sent mail to domain y".
With aox that can be done, uniquely. So mention it.
Functions and views
Views that'll be good for people:
- Valid local email addresses and users
- Sender information
- How many earlier messages from that address
- How many messages to that address
- How many earlier messages to/from the domain
- Create user
- Delete user
- Rename user
- Change password
- Change password, given old and new password, checking
- Enable/disable alias
- Add/delete/rename alias
Installer doesn't take steps to ensure that the installation is usable
It could do at least two things:
Run all the same checks on the new installation as 'aox check' and
archiveopteryx at startup.
Try to connect to all the server addresses and if anything's
listening anywhere, mention it on stdout.
In addition to this, the installer does the wrong thing right now if
it creates the database users and then fails to run psql to load the
schema. It exits with an error, which means the randomly-generated
passwords are lost, because the configuration file is not written.
db-address=localhost works, but needs improvement.
- A new connect(addr, port) function resolves the given address and
creates one SerialConnector object for each result. It starts the
first one, which (after an error, or a delay of 1s) initiates the
next connection in line and so on. The first one to connect swaps
out the d of the original connection with its own, and makes the
EventLoop act as though it had just connected.
It works, but the code is a little ugly. The error handling logic
needs a careful look after some time. Once that's solid, the other
callers (SpoolManager/SmtpClient etc.) can be converted.
aox check schema
This command would check several things.
a) that dbuser has the needed rights
b) that all the right tables are there, and all the right columns,
with the right types, and no unexpected constraints
c) that all the right indexes are there
d) that dbowner owns everything
i) that inserts that would duplicate a constraint are properly
As a bonus, perhaps it could list some unexpected/unknown deviations:
e) locally added tables
f) locally added columns
g) locally added indices
h) missing constraints
Change 43939 and following move towards this: the idea is to
introduce new functions e.g. Schema::checkIntegrity (in addition to
checkRevision) and Schema::grantPrivileges, that can be used both by
aox check schema/aox grant privileges/whatever, and also by the
installer (instead of lib/grant-privileges, and instead of the
half-hearted checking it does now). the server is essentially
unaffected, it just uses Database::checkSchema/checkAccess for a
quick check.
this sounds ok, but it's ugly because Schema::execute is completely
given to upgrading the schema, and neither can nor should be
repurposed to do other things besides. so that means more static
functions in Schema and separate EventHandlers to do the
checking/granting/whatevering. but that's okay.
Cleartext passwords
We help migrating away from cleartext/plaintext passwords:
1. We also store SCRAM and similar secrets in the DB (secrets which
aren't password equivalents)
2. We extend the users table with two new columns, 'last time
cleartext was needed' and 'number of successful authentications
without cleartext password usage since cleartext'.
3. If a client uses SCRAM, we increment the counter.
4. If a client uses CRAM or PLAIN, we reset counter and set the time
to today.
5. We provide some helping code to delete passwords for users with a
high count and a long-ago time.
6. We add documentation saying that if you disable auth-this and
auth-that, you can disable store-plaintext-passwords.
7. We add configuration/db sanity checks for ditto.
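Steps 2 to 5 could look roughly like this. It is a sketch only: CleartextStats, recordLogin and mayDropCleartext are made-up names, the two fields stand in for the proposed columns, and treating every mechanism whose name starts with "SCRAM-" as cleartext-free is an assumption.

```cpp
#include <ctime>
#include <string>

// Hypothetical in-memory image of the two proposed users columns.
struct CleartextStats {
    std::time_t lastCleartextNeeded = 0;  // 'last time cleartext was needed'
    unsigned successesSince = 0;          // successful logins without it since
};

// Assumption: only the SCRAM family works without the stored cleartext.
bool worksWithoutCleartext(const std::string &mechanism) {
    return mechanism.compare(0, 6, "SCRAM-") == 0;
}

// Steps 3 and 4: bump the counter for SCRAM, reset it for CRAM/PLAIN.
void recordLogin(CleartextStats &s, const std::string &mechanism,
                 std::time_t now) {
    if (worksWithoutCleartext(mechanism)) {
        ++s.successesSince;
    } else {
        s.successesSince = 0;
        s.lastCleartextNeeded = now;
    }
}

// Step 5: a high count and a long-ago time make the password deletable.
bool mayDropCleartext(const CleartextStats &s, std::time_t now,
                      unsigned minSuccesses, std::time_t minAge) {
    return s.successesSince >= minSuccesses &&
           now - s.lastCleartextNeeded >= minAge;
}
```

The thresholds are left to the caller on purpose; what counts as "high" and "long-ago" is a site policy question.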
Database schema range
People occasionally need to access the db with an old version of
mailstore. I suggest that we:
a) add a 'writable_from' column specifying the oldest version that
can write to the database.
b) add a 'readable_from' column specifying the oldest version that
can read the database
aox upgrade schema would update writable_from to the oldest schema
version for which a writer would do the right job. This would often
change when a table changes, but not when a table is added.
readable_from would be the oldest revision that can read the database.
When the server starts up, it would check:
- am I >=writable_from? If so, mailboxes can be read-write
- else, am I >=readable_from? If so, startup can proceed, but all
mailboxes are read-only. lmtp, smtp and smtp-submit do not start.
- else, quit.
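The startup check is small enough to sketch. Access and startupAccess are hypothetical names, and the comparison is assumed to be inclusive, since writable_from and readable_from are described as the oldest versions that still work.

```cpp
// What the server may do with the database it found at startup.
enum class Access { ReadWrite, ReadOnly, Refuse };

// myRevision is the schema revision this server binary was written for;
// writableFrom and readableFrom are the values of the proposed columns.
Access startupAccess(int myRevision, int writableFrom, int readableFrom) {
    if (myRevision >= writableFrom)
        return Access::ReadWrite;
    if (myRevision >= readableFrom)
        return Access::ReadOnly;   // lmtp, smtp and smtp-submit do not start
    return Access::Refuse;         // too old even to read: quit
}
```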
And in order to handle database updates, I suggest another table,
'features', with a single string column. When aox update database
fixes something, it inserts a row into features. A modern database
would have two rows in this table, 'numbered address fields' and 'no
nulls in bodyparts'.
Bounces and DSNs
Mail is currently fairly reliable. There is one big exception:
Bounces aren't 100% parsable. But generally, if you work hard, you
can know whether a message was delivered or not, and mostly they are
So we benefit from converting the most common nonstandard bounces to
DSNs, and then treat them as DSNs.
For nonstandard bounces (like those of qmail) we identify the
message by trying hard, do some hacky parsing, use the bounce
(excluding trailing message) as first part of the DSN multipart,
cook up a new DSN report based on the parsing, and save
text/822-headers as a third part.
Then, searches that tie bounces together with messages sent work
even better.
(Another trick we can/should use is to see whether the host we
deliver to seems to be the final destination based on earlier
(answered) messages.)
Different tasks, some shared code, same file. Separate this out into
different classes inheriting something. Then add the right sort of
logging statement to the end of parse().
Full-text search
There Be Problems.
The code now assumes that the IMAP client searches for one or more
words, rather than an arbitrary substring. Postgres uses word
If postgres were to use e.g. overlapping three-letter languageless
substrings, we would do what IMAP wants. sounds senseless.
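For reference, this is all that "overlapping three-letter substrings" means. The function name is made up, and it works on bytes rather than characters, which is an assumption that only holds for ASCII.

```cpp
#include <string>
#include <vector>

// Splits text into overlapping three-byte substrings, in order.
// Inputs shorter than three bytes yield nothing.
std::vector<std::string> trigrams(const std::string &text) {
    std::vector<std::string> out;
    for (std::string::size_type i = 0; i + 3 <= text.size(); ++i)
        out.push_back(text.substr(i, 3));
    return out;
}
```

An arbitrary substring search then becomes "all trigrams of the needle are present", followed by a real substring check on the candidates.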
We also have a requirement to stem search arguments less.
Specifically, a search for ARM7TDMI should not return messages about
the ARM6 or about my left arm.
Convert more parsers to use AbnfParser
There are still a few places where we roll our own messy parsers and
suffer for it (e.g. HTTP, DigestMD5). We know they work, but making
them use AbnfParser in a spare moment would be an act of kindness.
Those variables are not well described. We need a bit more.
Also, -secret is probably misnamed, we use -password for other
cases. I expect that's why aox show cf tls-certificate-secret yields
while e.g. aox show cf db-password does not.
We have vacation now, but it isn't quite right for autoresponses.
Sieve autorespond should be like this:
1. :quote should quote the first text/plain part if all of the
following are true:
1. The message is signed, and the signature verified (using any
supported signature mechanism, DKIM SHOULD be supported).
2. The first text/plain part does not have a Content-Disposition
other than inline.
If any of the conditions aren't true, :quote shouldn't quote.
If there's a signature block, :quote shouldn't quote that.
If the quoted text would be more than ten lines, :quote may crop
it down as much as it wants, ideally by skipping lines starting
with '>', otherwise by removing the last lines.
2. :subject, :from and :addresses as for vacation.
3. :cc can be used to send a copy to the specified From address.
4. The default :handle should not be based on the quoted text.
5. Two text arguments, one for text before the quoted text, one for
text after the quoted text.
6. The autoresponse goes to the envelope sender, as some RFC
requires. So we want an option to skip the response unless the
return-path matches reply-to (if present) or From (unless
reply-to is present).
Message arrival tag
Once annotate is done, we want a tag, ie. a magic annotation which
stays glued to the message wherever it goes, even after copy/move.
We also want a way to store the original RFC822 format somewhere
inside and/or outside the database, indexed by the arrival tag
identifier. It's good if the tag is split, so we can have "x-y"
where X is the CD/DVD number and Y is the file on the CD/DVD. Or
something like that.
Sieve ihave
There are three holes in our ihave rules.
Single-child anyof doesn't promote the ihave:
if anyof( ihave "foo" ) {
    foo; # errors should not be reported here
}
Not doesn't promote:
if not not ihave "foo" {
    foo; # errors should not be reported here
}
Finally, if/elsif always applies the ihave to its own block, instead
of walking along elsif/else to find the block that might be executed
if ihave returns true:
if not ihave "foo" {
    # errors should be reported here
} else {
    foo; # but not here
}
C/R sucks. But it has its uses, so we can benefit from implementing
it somehow. Here are some classes of messages we may want to treat
- replies to own mail
- messages in languages not understood by the user
- mail from previously unknown addresses
- mail from freemail providers
- vacation responses from unknowns
- messages likely, but not certain to be out-of-office-autoreply
- dkim/mass-signed messages (if verified)
The questions are: How can we ensure that we almost never challenge
real mail, while simultaneously challenging most/all messages that
don't come from valid senders? How can we provide suitable
Mail from freemail vendors tends to have a "Received: ... via HTT"
Using rrdtool
What could we want to graph with rrdtool? Lots.
- CPU seconds used
- database size
- messages in the db
- average response time
- 95th percentile response time
- messages per user
- message size per user
- average query execution time
- average query queue size
More? is interesting for generating graphs
via the web interface.
We should be able to use a read-only local database mirror.
That way, we can play nicely with most replication systems.
The way to do it: add a new db-mirror setting pointing to a
read-only database mirror. all queries that update are sent to
db-address, all selects are sent to db-mirror. db-mirror defaults to
We should test multipart/signed and multipart/encrypted support.
We must add a selection of RFC 1847 messages to canonical, and make
sure they survive the round trip. No doubt there will be bugs.
Per-user client certificates
We could store zero or more client certificates (or fingerprints, or
whatever) per user. When a user has logged in, we'd check whether
that user has a non-zero list of certificates, and if so, we'd do a
TLS renegotiation, this time demanding a client certificate. If the
client certificate matches, we allow access, otherwise we don't (and
we alert the user).
A bit difficult to do with the hands-off tlsproxy.
We should store bodyparts.text for PDF/DOC.
We need non-GPLed code to convert PDF and DOC to plaintext.
Or maybe we need a generic interface to talk to plugins.
Switch to using named constraints everywhere.
Default c-t-e of PGP signatures
Right now we give them binary. q-p or 7bit would be better, I think.
What other application/* types are really text?
From a conversation the other day: we could avoid base64 encoding an
entity whose content-type is not text if it contains only printable
ASCII. I don't know if it's worth doing, though.
The problem with doing that is that it treats sequences of CR LF, CR
and LF as equivalent. An application/foobar object that happens to
contain only CR, LF and printable ASCII can be broken.
Recognising spam
The good spam filters now all seem to require local training with
both spam and nonspam corpora. We can do clever stuff... sometimes.
Instead of filtering at delivery, we can filter when a message
becomes \recent. When we increase first_recent, we hand each new
message to the categoriser, and set $Spam or $Nonspam based on its
This lets the categoriser use all the information that's available
right up to the moment the user looks at his mail.
We can also build corpora for training easily. All messages to which
users have replied are nonspam, replies to messages from local users
are nonspam, messages in certain folders are spam, messages with a
certain flag are spam.
We can connect to a local server to ask whether a message is spam.
They seem to work that way, but with n different protocols.
Faster mapping from unicode to 8-bit encodings
At the moment, we use a while loop to find the right codepoint in an
array[256]. Mapping U+00EF to latin-1 requires looping from 0 to
0xEF, checking those 239 entries.
We could use a DAG of partial mappings to make it faster. Much
faster. Mapping U+20AC to 8859-15 would require just one lookup: In
the first partial table for 8859-15. Mapping U+0065 to 8859-15 would
require three: In the first (U+20AC, one entry long), in the
fallback (U+00A0, 96 entries long) and in the last (U+0000, 160
entries long).
Effectively, 8859-15 would be a first table of exceptions and then
fall back to 8859-1.
The tables could be built automatically, compiled in, and would be
tested by our existing apparatus.
Or we could do it simpler and perhaps even faster: Make a local
array from unicode to target at the start, fill it in as we go, and
do the slow scan only when we see a codepoint for the first time.
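The simpler variant might look like this. CachedEncoder is a hypothetical class, the table is assumed to map byte to codepoint as the array[256] above does, and a hash map stands in for the "local array" to keep the sketch short.

```cpp
#include <cstdint>
#include <unordered_map>

// Encodes codepoints to an 8-bit charset described by a 256-entry table
// (table[byte] == codepoint), remembering every answer it has computed.
class CachedEncoder {
public:
    explicit CachedEncoder(const std::uint32_t *table)
        : table_(table), scans_(0) {}

    // Returns the byte for cp, or -1 if the charset cannot represent it.
    int encode(std::uint32_t cp) {
        auto hit = cache_.find(cp);
        if (hit != cache_.end())
            return hit->second;
        ++scans_;                      // first sighting: do the slow scan
        int byte = -1;
        for (int i = 0; i < 256 && byte < 0; ++i)
            if (table_[i] == cp)
                byte = i;
        cache_[cp] = byte;             // failures are remembered as well
        return byte;
    }

    // Number of slow scans so far, handy for checking the cache works.
    unsigned slowScans() const { return scans_; }

private:
    const std::uint32_t *table_;
    std::unordered_map<std::uint32_t, int> cache_;
    unsigned scans_;
};
```

Typical mail uses a few dozen distinct codepoints, so after the first line or two nearly every lookup should be a cache hit.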
Multipart/signed automatic processing
We could check signatures automatically on delivery, and reject bad
signed messages.
The big benefit is that some forgeries are rejected, even though the
reader and the reading MUA don't do anything differently.
The disadvantage is that we (probably?) can't verify all signatures,
which gives a false sense of security for the undetectable forgeries.
In case of PKCS7, it's possible to self-sign. Those we cannot
check. In that case we remove the signature entirely from the MIME
structure, so it doesn't look checked to the end-user.
PGP cannot be checked, except it sort of can. We can have a small
default keyring including the CA key and so on, and treat
that as root CAs, using the keyservers to dig up intermediate keys.
PGP automatic processing
Apparently there are five different PGP wrapping formats. We could
detect four and transform them to the proper MIME format.
It's not given that we want to accept all mail. If we don't, who
makes the decision? A sieve script may, and refuse/reject mail it
does not like. And a little bit of pluginnery may. I think we'd do
well to support the postfix plugin protocol, so all postfix policy
servers can work with aox. (All? Or just half? Doesn't postfix have
two types of policy plugins?)
We may even support site-wide and group-wide sieve scripts and
permit a sieve script to invoke the plugin. A sieve statement like
UsePolicyServer localhost 10023 ;
If the message is multipart and the boundary occurs in a part, that
part needs encoding. Or else switch to a different body.
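Detecting the problem is the easy half. A sketch, with hypothetical function names, that treats any line starting with "--" plus the boundary as a clash (deliberately stricter than necessary):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if some line of part starts with "--" followed by boundary.
bool boundaryOccursIn(const std::string &part, const std::string &boundary) {
    const std::string delimiter = "--" + boundary;
    std::string::size_type pos = part.find(delimiter);
    while (pos != std::string::npos) {
        if (pos == 0 || part[pos - 1] == '\n')
            return true;               // found at the start of a line
        pos = part.find(delimiter, pos + 1);
    }
    return false;
}

// Returns the indexes of the parts that would have to be encoded
// (q-p or base64) if the multipart keeps using this boundary.
std::vector<std::size_t> partsNeedingEncoding(
        const std::vector<std::string> &parts, const std::string &boundary) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < parts.size(); ++i)
        if (boundaryOccursIn(parts[i], boundary))
            result.push_back(i);
    return result;
}
```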
Delaying seen-flag setting
We can move the seenflagsetter to imapsession, build up flags to
set, flush the write cache before fetch flags, store, state-altering
commands and searches which use either modseq or flags.
This ought to cut down the number of transactions issued per imap
command nicely.
Sending forged From despite check-sender-addresses
vacation :from and notify :mailto :from don't check
The injector probably needs to get the logic from the smtp server.
Per-group and systemwide sieves
People always seem to want such things. It'll be easy to implement.
Most of the tricky issues are described in
The Sieve "header" test may fail
Write a test or three that feeds the thing a 2047-encoded header
field and checks that it's correctly matched/not matched. Then make
it pass.
The subaddress specification says foo@ != foo+@ wrt. :detail
The former causes any :detail tests to evaluate to false, while the
latter treats :detail as an empty string. We treat both as an empty string.
(We could set detail to a single null byte, to \0\r\0\n\0, to a
sequence of private-use unicode characters, or even to
Entropy::string( 8 ) if there is no separator. The chance of that
appearing in an address is negligible.)
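An alternative to a magic value is to carry the distinction explicitly. A sketch, assuming '+' as the separator and splitting at its first occurrence; Subaddress and splitLocalpart are hypothetical names.

```cpp
#include <string>

// A localpart split for the subaddress tests.
struct Subaddress {
    std::string user;
    std::string detail;
    bool hasDetail;   // false for "foo", true (with empty detail) for "foo+"
};

// Splits at the first separator. Without a separator, hasDetail is
// false, so a :detail test can evaluate to false instead of matching "".
Subaddress splitLocalpart(const std::string &localpart, char separator = '+') {
    Subaddress result;
    std::string::size_type at = localpart.find(separator);
    result.hasDetail = at != std::string::npos;
    result.user = localpart.substr(0, at);   // npos means "to the end"
    if (result.hasDetail)
        result.detail = localpart.substr(at + 1);
    return result;
}
```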
SMTP extensions
Here are the ones we still don't implement, but ought to implement
at some point:
DELIVERBY (RFC 2852): At some time.
DELIVERBY has the funny little characteristic that we can support it
with great ease iff the smarthost does, so we ought to advertise it
iff the smarthost does.
The groups and group_members tables seem a little underused
We do not use them at all. We meant to use them for "advanced" ACL
support, but nobody ever asked, and it didn't seem worthwhile.
I now think it's worthwhile.
Here's what I want to add:
Make a superusers group, whose members can authenticate as anyone,
and the notion of group admins, who can authenticate as other
members of the group.
Or maybe an administrator table, linking a user to either a group or
to null. If a group, then the admin can authenticate as other
members of that group and (importantly) has 'a' right on their
mailboxes, if null, then ditto for all groups.
Extend Permissions to link against group_members when selecting
applicable permissions.
Make groups be permissible ACL identifiers.
We need to be able to disable users
- Reject mail with 5xx/4xx.
- Prevent login.
- Both of the above.
- a group admin can enable/disable group members
- a superadmin can enable/disable anyone
- a group admin cannot unblock an overall blockage
Dynamically preparing often-used queries
We can prepare queries cleverly.
Inside Query, at submit time, we first check whether a Query's text
matches a PreparedStatement, and use it if so.
If not, we check whether the query looks preparable. The condition
seems to be simple: Starts with 'select ' and contains no numbers.
If it's preparable we add it to a cache, which is discarded at GC
If a preparable query is used more than n times before the cache is
discarded, we prepare the query and keep the PreparedStatement
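The bookkeeping could be as small as this. It is a sketch: looksPreparable implements the condition exactly as stated (so anything containing a digit is out), and PrepareCandidates is a hypothetical helper, not part of Query.

```cpp
#include <cctype>
#include <string>
#include <unordered_map>

// The condition from above: starts with "select " and contains no digits.
bool looksPreparable(const std::string &sql) {
    if (sql.compare(0, 7, "select ") != 0)
        return false;
    for (unsigned char c : sql)
        if (std::isdigit(c))
            return false;
    return true;
}

// Counts uses of preparable queries between two cache discards.
class PrepareCandidates {
public:
    explicit PrepareCandidates(unsigned threshold) : threshold_(threshold) {}

    // Notes one use of sql. Returns true exactly once per cache lifetime:
    // on the use that takes the count past the threshold.
    bool noteUse(const std::string &sql) {
        if (!looksPreparable(sql))
            return false;
        return ++uses_[sql] == threshold_ + 1;
    }

    // Forgets all counts; meant to be called at GC time.
    void discard() { uses_.clear(); }

private:
    unsigned threshold_;
    std::unordered_map<std::string, unsigned> uses_;
};
```

When noteUse() returns true the caller would create the PreparedStatement and keep it somewhere that survives the discard.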
Defending against PGP Desktop and similar
There are several more things to do:
- Guess that it's repeating a query for smallish UID sets and do the
query once and for all.
- Defend against 'OR BODY asdf BODY asd' by recognising in
simplify() that when asdf matches, asd always will. Added bonus
for the base64 shit.
- Hack in Search::parse() for that/those specific search keys, and
setting up a more sensible selector.
The first two make sense IMO.
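The OR BODY case reduces to a containment check: a key that contains another key as a substring can only match messages the shorter key matches too, so the longer one can be dropped from the OR. A sketch with a hypothetical function name, comparing case-sensitively (which misses some simplifications but never makes a wrong one):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Given the arguments of OR BODY a BODY b ..., drops every key that
// contains another key, since the contained key already covers it.
std::vector<std::string> simplifyOrBody(
        const std::vector<std::string> &keys) {
    std::vector<std::string> kept;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        bool redundant = false;
        for (std::size_t j = 0; j < keys.size() && !redundant; ++j) {
            if (i == j)
                continue;
            bool contains = keys[i].find(keys[j]) != std::string::npos;
            bool equal = keys[i] == keys[j];
            // Equal keys contain each other; keep only the first copy.
            redundant = contains && (!equal || j < i);
        }
        if (!redundant)
            kept.push_back(keys[i]);
    }
    return kept;
}
```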
New RFCs
5463: sieve ihave
5490: sieve metadata
5442: lemonade profile-bis
Sieve notify
5435: sieve notify
5436: sieve notify mailto
5437: sieve notify xmpp
mailto: combined with :from is unchecked
Bug confusing U+ED00 and U+0000 in the message cache
When we write to the database, U+0000 (which occasionally occurs,
mostly by mistake but sometimes on purpose) is transformed to
U+ED00, and when we read it, back.
So if U+ED00 is written to the DB, it comes back as U+0000.
This means that Archiveopteryx works differently depending on
whether the cache is used or not. That has to be resolved somehow.
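The asymmetry is easy to reproduce in isolation. The two functions below are stand-ins for the write and read paths, not the real code:

```cpp
#include <string>

const char32_t Nul = 0;
const char32_t StandIn = 0xED00;   // what U+0000 becomes in the database

// Stand-in for the write path: U+0000 is stored as U+ED00.
std::u32string toDatabase(std::u32string s) {
    for (char32_t &c : s)
        if (c == Nul)
            c = StandIn;
    return s;
}

// Stand-in for the read path: U+ED00 is turned back into U+0000.
std::u32string fromDatabase(std::u32string s) {
    for (char32_t &c : s)
        if (c == StandIn)
            c = Nul;
    return s;
}
```

A string containing U+0000 survives the round trip, while one containing a genuine U+ED00 does not, and a cached copy never takes the round trip at all.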
Axel, /Mime problem
The problem is that the VCF file contains literal NUL bytes, but is
sent with a text/* MIME type, and we're mangling the NULs during
charset conversion (or so I guess, given that they become '?'s
Various alias-related feature requests
e.g. Benjamin wants empty localparts, a number of people want multiple
targets (Axel, Ingo).
Axel, /Unable to fetch 12MB mail
Some sort of loop in the fetcher? I didn't look.
Problems found in 3.0.0 by Timo
(Not verified because of segfault; will check later.)
Not fixed; I lean towards fixing it if it's the only thing
imaptest complains about.
Caching search results
If a selector is !dynamic(), its results can be cached until the next
modseq change on the mailbox.
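A sketch of such a cache, keyed on mailbox and selector text and validated against the mailbox's modseq. SearchCache is a hypothetical class, and the caller is assumed to have checked !dynamic() before storing anything.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

class SearchCache {
public:
    // Returns the cached UIDs if they were stored at the mailbox's
    // current modseq, or a null pointer if there is nothing usable.
    const std::vector<std::uint32_t> *find(std::uint32_t mailbox,
                                           const std::string &selector,
                                           std::int64_t modseq) const {
        auto it = entries_.find(std::make_pair(mailbox, selector));
        if (it == entries_.end() || it->second.modseq != modseq)
            return nullptr;
        return &it->second.uids;
    }

    // Stores (or replaces) the result for this mailbox and selector.
    void store(std::uint32_t mailbox, const std::string &selector,
               std::int64_t modseq, std::vector<std::uint32_t> uids) {
        Entry e;
        e.modseq = modseq;
        e.uids = std::move(uids);
        entries_[std::make_pair(mailbox, selector)] = std::move(e);
    }

private:
    struct Entry {
        std::int64_t modseq;
        std::vector<std::uint32_t> uids;
    };
    std::map<std::pair<std::uint32_t, std::string>, Entry> entries_;
};
```

Stale entries are never served because the modseq no longer matches; they are only overwritten by the next store(), so a real version would also want some eviction.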
- "aox start" doesn't complain when there's a schema mismatch error.
Gmail compatibility is easy, but
requires, which is
simple but tiresome. Oh no.
- Implement special-use mailboxes. Will need trash, inbox, allmail,
but don't need all of them right away.
- A trigger on mailbox_messages insertion:
if the mailbox has an owner:
look up the owner's allmail and trash mailboxes
if this is allmail:
insert the message into those of the owner's other mailboxes
that contain >=1 message with the same messages.thread_root
else if this is trash:
remove from allmail (perhaps?)
make sure the message is also in the owner's allmail
also insert all other messages that are in the owner's allmail
and have the same messages.thread_root.
This trigger ensures that mailboxes tend to contain complete
threads, that allmail contains everything except trash, and that
delivery just magically picks all the right mailboxes.
- Make expunge move things to trash, with a time, and delete
messages after 30 days. Probably best done as another trigger.
Expunge in allmail expunges everywhere except trash.
- X-GM-blah except the RAW search key
- X-GM-blah RAW search key. work.