-
Notifications
You must be signed in to change notification settings - Fork 459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPLIT #239:::Skip attachments #284
Conversation
|
||
if ($this->attachmentsIgnore && | ||
($partStructure->type !== TYPEMULTIPART && | ||
($partStructure->type !== TYPETEXT || !in_array(strtolower($partStructure->subtype), ['plain','html'])))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree we should avoid processing parts that we don't need but I am not sure this is the way.
That sure is a lot of ifs are we certain we have covered all the basis? Just looks like something that can cause major headaches constantly having to patch for edge cases slipping through or getting ignored, isn't this a wheel we already have? Since we are checking now is it still necessary to do isAttachment
again?
php-imap/src/PhpImap/Mailbox.php
Line 652 in 56ed03e
$isAttachment = $partStructure->ifid || isset($params['filename']) || isset($params['name']); |
Instead of something that is supposed to optimise performance we've added extra checks and balances, being punished to get the reward.
Lets have a look and see what is going on here at What do we have hereMessage structureWe retrieve the structure at php-imap/src/PhpImap/Mailbox.php Lines 585 to 594 in 56ed03e
Data retrieve and decodeFetching the data for the part is what we want to avoid. php-imap/src/PhpImap/Mailbox.php Lines 613 to 618 in 56ed03e
Without the data there is no need for expensive decodes. php-imap/src/PhpImap/Mailbox.php Lines 620 to 632 in 56ed03e
No discrimination yet, we have the data long before we even know what we are looking at, now that's equal opportunity greedy loading right there. Parameter collectionNext up is parameter collection, now this is perhaps a bit cumbersome but we already have the structure data so we might as well process it here. php-imap/src/PhpImap/Mailbox.php Lines 634 to 650 in 56ed03e
By the looks of it, and I might be wrong, we are really only concerned with 3 parameters before those pretty params go to garbage:
|
if(empty($params['filename']) && empty($params['name'])) { | |
$fileName = $attachmentId . '.' . strtolower($partStructure->subtype); | |
} | |
else { | |
$fileName = !empty($params['filename']) ? $params['filename'] : $params['name']; |
and charset
php-imap/src/PhpImap/Mailbox.php
Lines 696 to 697 in 56ed03e
if(!empty($params['charset'])) { | |
$data = $this->convertStringEncoding($data, $params['charset'], $this->serverEncoding); |
Not sure how many parameters we are collecting but based on this knowledge does it still make sense to decode everyone we find?
php-imap/src/PhpImap/Mailbox.php
Line 637 in 56ed03e
$params[strtolower($param->attribute)] = $this->decodeMimeStr($param->value); |
Probably a tad over eager here but still it is based on information:
- we already have
- is required knowledge about the message
- might as well collect everything we might need so we don't have to look it up again
Type negotiation
We already have all the information we need to decide whether this is an attachment or not.
php-imap/src/PhpImap/Mailbox.php
Lines 652 to 659 in 56ed03e
$isAttachment = $partStructure->ifid || isset($params['filename']) || isset($params['name']); | |
// ignore contentId on body when mail isn't multipart (https://github.com/barbushin/php-imap/issues/71) | |
if(!$partNum && TYPETEXT === $partStructure->type) { | |
$isAttachment = false; | |
} | |
if($isAttachment) { |
If it is an attachment there is a whole lot of boiler plating to be done but it turns out the only time we actually need the data is when/if we are going to write it to file.
php-imap/src/PhpImap/Mailbox.php
Lines 676 to 692 in 56ed03e
if($this->attachmentsDir) { | |
$replace = [ | |
'/\s/' => '_', | |
'/[^0-9a-zа-яіїє_\.]/iu' => '', | |
'/_+/' => '_', | |
'/(^_)|(_$)/' => '', | |
]; | |
$fileSysName = preg_replace('~[\\\\/]~', '', $mail->id . '_' . $attachmentId . '_' . preg_replace(array_keys($replace), $replace, $fileName)); | |
$attachment->filePath = $this->attachmentsDir . DIRECTORY_SEPARATOR . $fileSysName; | |
if(strlen($attachment->filePath) > 255) { | |
$ext = pathinfo($attachment->filePath, PATHINFO_EXTENSION); | |
$attachment->filePath = substr($attachment->filePath, 0, 255 - 1 - strlen($ext)) . "." . $ext; | |
} | |
file_put_contents($attachment->filePath, $data); | |
} |
For everything else, that is apparently not an attachment we immediately start processing data again
php-imap/src/PhpImap/Mailbox.php
Lines 696 to 698 in 56ed03e
if(!empty($params['charset'])) { | |
$data = $this->convertStringEncoding($data, $params['charset'], $this->serverEncoding); | |
} |
Even before, assuming we have data after all that effort, we considered if we actually have the appropriate types to identify textPlain
or textHtml
.
php-imap/src/PhpImap/Mailbox.php
Lines 699 to 708 in 56ed03e
if($partStructure->type == 0 && $data) { | |
if(strtolower($partStructure->subtype) == 'plain') { | |
$mail->textPlain .= $data; | |
} | |
else { | |
$mail->textHtml .= $data; | |
} | |
} | |
elseif($partStructure->type == 2 && $data) { | |
$mail->textPlain .= trim($data); |
The magic numbers should actually be replaced by the constants 0 = TYPETEXT
and 2 = TYPEMESSAGE
to clarify but not to mention there are a total of 9 possible types.
After this we continue with the recursive processing of additional sub-parts. It is clear that the data is hardly the most important thing but yet takes the highest priority. Even without getting fancy there is a lot that can be optimised already by simply only fetching data that we might actually need.
Lazy loading
Of course it makes sense not to retrieve every attachment if we are not even going to look at any attachments, that is just cray cray. Doesn't the same count for the message text as well? If we are only going to look at plain text then why do we bother with collecting html data and visa versa. The same goes for all the eager decodes rather wait until we actually ask for it.
From the look of things it should be simple enough to just collect the structure data. We have everything we need to retrieve the data so instead lets just capture the ingredients for now. Then only if you actually call getAttachments
, getTextPlain
or getTextHtml
do we finally retrieve the data and do all the decode niceness.
Only pay for what you actually use.
@nickl- Lazy loading sounds good in theory but what about backward compatibility? |
@commanddotcom I am sure we can remain on the no breaking side of the fence, what concerns you? |
fixes barbushin#150 barbushin#167 barbushin#122 as discussed at barbushin#284 instead of trying to avoid loading attachments rather put it off until it is needed. we can do the same for text plain and text html as well, what you don't need won't be retrieved. solution remains non breaking, public lazy load properties are implemented as property magic getter
Please check this #286 |
Tested this change: It works fine and backward compatibility is given, so nobody needs to update his / her code with this change. I'll merge it and add some PHPUnit tests. |
…instead of directly accessing the property
- Updated README - Move phpunit to require-dev - Add note about installing dev dependencies in README - Replaced spaces with tabs - Added PHPUnit tests for MIME decoding - Updated formatting of PHPUnit function testParsedDateTimeWithEmptyHeaderDate() - Issue #209: Function to parse datetime correctly RFC2822 - Issue #280: Added 'Sender' to headers and added additional if-conditions - Issue #115: getMail() method returns an object even for nonexistent mail ID - Issue #273: Added connection check to example - Issue #227: Added Failed-Recipients to IncomingMailHeader - Issue #140, #246: Improved exception handling and added PHPUnit test - Issue #140: Added PHPUnit test for testing ConnectionException - Issue #140: Improved exception / error handling and improved / added PHPUnit tests - Issue #154: Added ability to change the imap_search option from SE_UID to SE_FREE and added PHPUnit tests - Issue #306: Added support for US-ASCII and added ability to disable serverEncoding for searchMailbox() - Imported missing namespaces to avoid 'unknown class' error messages - Issue #86: Simplified and improved one replace regex for attachment file names - Issue #247: Improved grabbing of fromName, fromHost, senderName and senderHost - Issue #39, #71, #229: Fixed body content gets incorrectly processed as attachments - Issue #122, #150, #167: Added ability to skip processing of attachments to increase performance, when attachments are not required - PR #284: Added missing PHPUnit tests - Issue #122, #150, #167: Lazy load message text and attachments data
Split PR of #239 all original commits by @commanddotcom only cherry picked, branched, squashed.
This fixes #150 #167 #122 provides a way to skip processing of attachments when it is not required.