Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CompactWhitespaces poor performance causes problems with big attachments #68

Closed
Jay1980 opened this issue Oct 24, 2014 · 20 comments
Closed
Assignees

Comments

@Jay1980
Copy link

Jay1980 commented Oct 24, 2014

Hi,

Bit of an odd one here. DKim signer has been working well for us but today noticed that our email had ground to a halt. Discovered that for certain emails they seem to get stuck in the submissions queue and the EdgeTransport starts using excessive CPU and memory. Disabling the DKim Signer agent and restarting the MSTransportAgent allows the messages to leave the server and everything returns to normal.

I'm also periodically seeing the following in the event viewer: (source: MSExchange Extensibility)
"The execution time of agent 'Exchange DkimSigner' exceeded 90000 milliseconds while handling event 'OnCategorizedMessage' for message with InternetMessageId: 'Not Available'. This is an unusual amount of time for an agent to process a single event. However, Transport will continue processing this message."

So far it seems to occur with emails with attachments / over a certain size. Smaller emails seem to go through ok.

We're a design agency so some of our emails are proofs of 6-10mb in size.

Any ideas?
James.

@Pro Pro added the agent label Oct 24, 2014
@Pro
Copy link
Owner

Pro commented Oct 24, 2014

90 seconds to handle a message is quite long, this shouldn't happen.
I think this is related to calculating the body hash of the message: if the message is too big, the body hash function takes too long and Exchange stops submitting the message.

Currently I can't investigate the problem but will look into it in a week or so.

You can try to send multiple emails with different attachment sizes (1MB, 2MB, .... up to 20MB) and tell me which one don't go through, this would speedup my work. Thanks

@Pro Pro added the bug label Oct 24, 2014
@Jay1980
Copy link
Author

Jay1980 commented Oct 24, 2014

Thanks for the quick response.
So far I can see there are emails stuck in the queue with the following sizes:
20715KB
6950KB

I've managed to send emails of the following sizes (larger take longer as I'd expect from your suggestion about calculating the body hash):
937KB
2106KB
3111KB
5390KB

@Jay1980
Copy link
Author

Jay1980 commented Oct 24, 2014

Interesting development....

after looking at the code and seeing the section in GetBodyHash about relaxed canonicalisation I decided to set my configuration to simple as a test.... the emails go through fine!

Looks like the bottleneck could be in the "CompactWhitespaces" function?

@Pro
Copy link
Owner

Pro commented Oct 24, 2014

Yes, that is most probably the problem, this code part has some quite costly function calls and loops.

This should be replaced with a RegEx since those have a much higher performance.
Thanks for your info.

@Pro Pro changed the title Exchange 2010 SP3 Strange hanging issue on EdgeTransport CompactWhitespaces poor performance causes problems with big attachments Oct 24, 2014
@AlexLaroche
Copy link
Collaborator

The problem could be with this function too. It's really delicate to change one of this two functions. :/

May be we could find some inspiration here : https://github.com/dmcgiv/DKIM.Net

        /// <summary>
        /// Runs through the MIME stream and sucks out the headers that should be part
        /// of the DKIM signature. The headers here have their whitespace preserved and
        /// all terminate with CRLF. The stream position is returned to the start of the
        /// stream.
        /// </summary>
        /// <param name="stream">The stream containing the MIME message.</param>
        /// <returns>An enumerable of the headers that should be a part of the signature.</returns>
        private IEnumerable<string> GetCanonicalizedHeaders(Stream stream)
        {
            Dictionary<string, string> headerNameToLineMap;
            string line;
            StreamReader reader;

            stream.Seek(0, SeekOrigin.Begin);

            headerNameToLineMap = new Dictionary<string, string>();
            reader = new StreamReader(stream);

            line = reader.ReadLine();
            while (line != null)
            {
                string header;
                string[] headerParts;

                // We've reached the end of the headers (headers are
                // separated from the body by a blank line).
                if (line.Length == 0)
                    break;

                // Read a line. Because a header can be continued onto
                // subsequent lines, we have to keep reading lines until we
                // run into the end-of-headers marker (an empty line) or another
                // line that doesn't begin with a whitespace character.
                header = line + "\r\n";
                line = reader.ReadLine();
                while (!string.IsNullOrEmpty(line) &&
                    (line.StartsWith("\t", StringComparison.Ordinal) ||
                    line.StartsWith(" ", StringComparison.Ordinal)))
                {
                    header += line + "\r\n";
                    line = reader.ReadLine();
                }

                // Extract the name of the header. Then store the full header
                // in the dictionary. We do this because DKIM mandates that we
                // only sign the LAST instance of any header that occurs.
                headerParts = header.Split(new char[] { ':' }, 2);
                if (headerParts.Length == 2)
                {
                    string headerName;

                    headerName = headerParts[0];

                    // We only want to sign the header if we were told to sign it!
                    if (this.eligibleHeaders.Contains(headerName, StringComparer.OrdinalIgnoreCase))
                    {
                        if (this.headerCanonicalization == DkimCanonicalizationKind.Relaxed)
                        {
                            // Unfold all header field continuation lines as described in
                            // [RFC5322]; in particular, lines with terminators embedded in
                            // continued header field values (that is, CRLF sequences followed by
                            // WSP) MUST be interpreted without the CRLF.  Implementations MUST
                            // NOT remove the CRLF at the end of the header field value.
                            // Delete all WSP characters at the end of each unfolded header field
                            // value.
                            // Convert all sequences of one or more WSP characters to a single SP
                            // character.  WSP characters here include those before and after a
                            // line folding boundary.

                            header = CompactWhitespaces(header);
                            header += "\r\n";

                            // Delete any WSP characters remaining before and after the colon
                            // separating the header field name from the header field value.  The
                            // colon separator MUST be retained.
                            // We can't simply replace all " : " with ":" because only the first
                            // colon should be changed. If e.g. the subject header contains a
                            // "RE: TEST" it mustn't be replaced with "RE:TEST"
                            int firstPos = header.IndexOf(':');
                            if (firstPos > 0 && firstPos < header.Length - 2)
                            {
                                // colon found. Now remove any spaces before/after the colon

                                //check how many whitespaces are before the colon
                                int beforeCount = 0;
                                for (int i = firstPos - 1; i >= 0; i--)
                                {
                                    if (header[i] == ' ')
                                        beforeCount++;
                                    else
                                        break;
                                }
                                if (beforeCount > 0)
                                {
                                    //now remove them
                                    header = header.Remove(firstPos - beforeCount, beforeCount);
                                }
                                // colon is now at another position
                                firstPos -= beforeCount;

                                //check how many whitespaces are after the colon
                                int afterCount = 0;
                                for (int i = firstPos + 1; i < header.Length; i++)
                                {
                                    if (header[i] == ' ')
                                        afterCount++;
                                    else
                                        break;
                                }
                                if (afterCount > 0)
                                {
                                    //now remove them
                                    header = header.Remove(firstPos + 1, afterCount);
                                }
                            }

                            // Convert all header field names (not the header field values) to
                            // lowercase.  For example, convert "SUBJect: AbC" to "subject: AbC".
                            string[] temp = header.Split(new char[] { ':' }, 2);
                            header = temp[0].ToLower() + ":" + temp[1];
                        }

                        headerNameToLineMap[headerName] = header;
                    }
                }
            }

            stream.Seek(0, SeekOrigin.Begin);

            return headerNameToLineMap.Values.OrderBy(x => x, StringComparer.Ordinal);
        }

@AlexLaroche
Copy link
Collaborator

@Pro : I have some problems with my developpement environnement. Can you identify with one of this two functions is in problem?

var watch = Stopwatch.StartNew();
/* ... */
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds; 
Logger.LogInformation(elapsedMs);

@Pro
Copy link
Owner

Pro commented Nov 13, 2014

The DKIM.Net project uses the following function for compact whitespaces:
https://github.com/dmcgiv/DKIM.Net/blob/120ada02a7da74b1ff6c366adc5327bc99adbd90/src/DKIM.Net/WhiteSpace.cs#L33

I've also found an interesting discussion: http://stackoverflow.com/questions/6442421/c-sharp-fastest-way-to-remove-extra-white-spaces
I still think that using a compiled RegEx would be the fastest way for CompactWhitespaces. Maybe we can implement both suggestes ways (from the Stackoverflow page) and then do some performance measurements.

Regarding the Header parsing, maybe we can find a good performing MIME parser which we can integrate to get all the headers of the mail:
http://stackoverflow.com/questions/1669797/are-there-net-framework-methods-to-parse-an-email-mime
http://www.lessanvaezi.com/email-headers-from-outlook-mailitem/
http://jeffreystedfast.blogspot.de/2013/09/time-for-rant-on-mime-parsers.html

@AlexLaroche We should implement the changes in another branch and test them thourugly before adding them back to the master branch. I'll do some tests to find out which function causes the long processing time.

@AlexLaroche
Copy link
Collaborator

@Pro : An another way to do the implementation is to make a wrapper to the c library libopendkim (http://www.opendkim.org/libopendkim/) that we call for signing. This library is part of OpenDKIM project (http://www.opendkim.org/) and OpenDKIM project seem to be a reference for DKIM implementation. The current library isn't made to be build on Windows officially but we could try to made a easy adaptation.

@Pro
Copy link
Owner

Pro commented Nov 13, 2014

Yes, this would also be a good option. But this still needs a parser for header fields. Do you want to investigate if it's possible to use libopendkim in our code?

@AlexLaroche
Copy link
Collaborator

I think if we could use libopendkim, the code no more need to make header parsing... We just past the parameters (SigningAlgorithm, HeaderCanonicalization, BodyCanonicalization, HeadersToSign, Private Key) to the wrapper. The wrapper will past all the stuff to libopendkim and exposed some methods of the library. We just have to let the library do all the signing job after.

@AlexLaroche
Copy link
Collaborator

I will investigate the use of libopendkim in our code.

@Pro
Copy link
Owner

Pro commented Nov 13, 2014

Yep, but looking at http://www.opendkim.org/libopendkim/overview.html, you need to add each header separately with http://www.opendkim.org/libopendkim/dkim_header.html
Maybe this can be used: http://www.opendkim.org/libopendkim/dkim_mail_parse.html

@AlexLaroche
Copy link
Collaborator

/// <summary>
/// Runs through the MIME stream and sucks out the headers that should be part
/// of the DKIM signature. The headers here have their whitespace preserved and
/// all terminate with CRLF. The stream position is returned to the start of the
/// stream.
/// </summary>
/// <param name="stream">The stream containing the MIME message.</param>
/// <returns>An enumerable of the headers that should be a part of the signature.</returns>
private IEnumerable<string> GetCanonicalizedHeaders(Stream stream)
{
    Dictionary<string, string> headerNameToLineMap;
    string line;
    StreamReader reader;

    stream.Seek(0, SeekOrigin.Begin);

    headerNameToLineMap = new Dictionary<string, string>();
    reader = new StreamReader(stream);

    line = reader.ReadLine();
    while (line != null)
    {
        string header;
        string[] headerParts;

        // We've reached the end of the headers (headers are
        // separated from the body by a blank line).
        if (line.Length == 0)
            break;

        // Read a line. Because a header can be continued onto
        // subsequent lines, we have to keep reading lines until we
        // run into the end-of-headers marker (an empty line) or another
        // line that doesn't begin with a whitespace character.
        header = line + "\r\n";
        line = reader.ReadLine();
        while (!string.IsNullOrEmpty(line) &&
            (line.StartsWith("\t", StringComparison.Ordinal) ||
            line.StartsWith(" ", StringComparison.Ordinal)))
        {
            header += line + "\r\n";
            line = reader.ReadLine();
        }

        // Extract the name of the header. Then store the full header
        // in the dictionary. We do this because DKIM mandates that we
        // only sign the LAST instance of any header that occurs.
        headerParts = header.Split(new char[] { ':' }, 2);
        if (headerParts.Length == 2)
        {
            string headerName;

            headerName = headerParts[0];

            // We only want to sign the header if we were told to sign it!
            if (this.eligibleHeaders.Contains(headerName, StringComparer.OrdinalIgnoreCase))
            {
                headerNameToLineMap[headerName] = header;
            }
        }
    }

    stream.Seek(0, SeekOrigin.Begin);

    return headerNameToLineMap.Values.OrderBy(x => x, StringComparer.Ordinal);
}

@AlexLaroche
Copy link
Collaborator

Just remove the header relaxed canonicalization case should probably fix the function GetCanonicalizedHeaders(). I don't think the problem is with parsing header. We don't have the problem in simple canonicalization mode.

@Pro
Copy link
Owner

Pro commented Nov 13, 2014

Yes, this should be fine to use for libopendkim since it does the canonicalization by itself.

@AlexLaroche
Copy link
Collaborator

Do you have identify with one of this two functions is in problem?

@AlexLaroche
Copy link
Collaborator

Do you have identify with one of this two functions is in problem? It will be to much work to try to use libopendkim. Libopendkim need to be port to Windows and a c++/cli wrapper should be develop.

@Pro
Copy link
Owner

Pro commented Nov 27, 2014

I investigated the performance of the two functions. I tested with a 1MB file attachment.
It took in total 61584ms (1 Minute). Here's the detailed performance data:

- Total: 61584ms
  | - DkimSigner.GetBodyHash: 61478ms
  |   | - Regex.Split(tempText, @"\r?\n|\r"): 32ms
  |   | - Loop over 18009 lines calling CompactWhitespaces: 61014ms
  | - DkimSigner.GetCanonicalizedHeaders + DkimSigner.GetSignedDkimHeader: 103ms

So the bottleneck is definetly the CompactWhitespaces method. I'll check if this function can be rewritten for better performance.

@Pro Pro self-assigned this Nov 27, 2014
@Pro Pro closed this as completed in 269de9f Nov 27, 2014
@Pro
Copy link
Owner

Pro commented Nov 27, 2014

Well, the problem was that we added each line directly to the string variable, e.g. bodyText += "This is a string"
The performance of this operation is very poor since the string is recreated in the memory.
Replacing it with a StringBuilder (see commit) increased the performance significantly:
1MB requires now 500ms, 6MB 2900ms

@AlexLaroche
Copy link
Collaborator

Very small change but a big performance improvement. Good job!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants