Adding message.errors and also other wild content type handlings found from enron and corpus database run, changing Peter's body auto encode to only try decoding when asked to
mikel committed Mar 28, 2010
1 parent 26eb925 commit 132bcff
Showing 23 changed files with 754 additions and 15 deletions.
2 changes: 2 additions & 0 deletions .bundle/config
@@ -0,0 +1,2 @@
+---
+BUNDLE_WITHOUT: ""
9 changes: 9 additions & 0 deletions CHANGELOG.rdoc
@@ -1,3 +1,12 @@
+== Sun Mar 28 00:26:27 UTC 2010 Mikel Lindsaar <raasdnil@gmail.com>
+
+* Merged in Jeremy/treetop to vendored treetop
+* Merged in nathansobo/treetop to vendored treetop
+* Merged in pzbowen/mail into mail - Adds body auto encoding - awesome work
+* Fixed content-transfer-encoding parser to be more compliant per RFC, also now handles trailing semi-colons correctly
+* Fixed content-transfer-encoding parser to handle weird "from the wild" misspellings
+* Added message.errors, header.errors arrays, returns array of [field_name, value, error_object] for each field that failed to parse
+
 == Wed Feb 24 09:14:56 UTC 2010 Mikel Lindsaar <raasdnil@gmail.com>

 * Fixed multiaddress bounce messages crashing when calling .bounced? Now just take the first report and return that.
11 changes: 7 additions & 4 deletions lib/mail/attachments_list.rb
@@ -39,10 +39,13 @@ def []=(name, value)

       default_values[:body] = value.delete(:data) if value[:data]

-      if value[:transfer_encoding]
-        default_values[:content_transfer_encoding] = value.delete(:transfer_encoding)
-      elsif value[:encoding]
-        default_values[:content_transfer_encoding] = value.delete(:encoding)
+      encoding = value.delete(:transfer_encoding) || value.delete(:encoding)
+      if encoding
+        if Mail::Encodings.defined? encoding
+          default_values[:content_transfer_encoding] = encoding
+        else
+          raise "Do not know how to handle Content Transfer Encoding #{encoding}, please choose either quoted-printable or base64"
+        end
       end

       if value[:mime_type]
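The attachment change above collapses the two option keys into one lookup and then refuses encodings it does not recognise. A minimal standalone sketch of that pattern follows. `KNOWN_TRANSFER_ENCODINGS` and `pick_transfer_encoding` are hypothetical names invented for illustration; the commit itself asks `Mail::Encodings.defined?` and raises a plain `RuntimeError`.

```ruby
# Hypothetical stand-in for Mail::Encodings.defined?; this list is an
# assumption made for the sketch, not the gem's actual registry.
KNOWN_TRANSFER_ENCODINGS = %w[7bit 8bit binary quoted-printable base64].freeze

# Returns the requested encoding, nil when none was given, and raises when
# the value is not recognised. Mutates +options+ the way the diff does.
def pick_transfer_encoding(options)
  # :transfer_encoding wins; :encoding is only consumed when the first key
  # is missing, because || short-circuits.
  encoding = options.delete(:transfer_encoding) || options.delete(:encoding)
  return nil unless encoding

  unless KNOWN_TRANSFER_ENCODINGS.include?(encoding.to_s.downcase)
    raise ArgumentError, "Do not know how to handle Content Transfer Encoding #{encoding}"
  end
  encoding
end
```

Because of the short-circuit, passing both keys leaves `:encoding` in the options hash, which matches the behaviour of the `if`/`elsif` it replaces.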
5 changes: 1 addition & 4 deletions lib/mail/body.rb
@@ -173,12 +173,9 @@ def encoding(val = nil)
     end

     def encoding=( val )
-      if val == "text" then
+      if val == "text" || val.blank? then
         val = "8bit"
       end
-      if !Mail::Encodings.defined? val
-        raise UnknownEncodingType, "Don't know how to decode #{val}, please decode first"
-      end
       @encoding = (val == "text") ? "8bit" : val
     end

5 changes: 4 additions & 1 deletion lib/mail/elements/content_transfer_encoding_element.rb
@@ -6,7 +6,10 @@ class ContentTransferEncodingElement

     def initialize( string )
       parser = Mail::ContentTransferEncodingParser.new
-      if tree = parser.parse(string.downcase)
+      case
+      when string.blank?
+        @encoding = ''
+      when tree = parser.parse(string.downcase)
         @encoding = tree.encoding.text_value
       else
         raise Mail::Field::ParseError, "ContentTransferEncodingElement can not parse |#{string}|\nReason was: #{parser.failure_reason}\n"
4 changes: 3 additions & 1 deletion lib/mail/field.rb
@@ -141,8 +141,10 @@ def split(raw_field)
     def create_field(name, value)
       begin
         self.field = new_field(name, value)
-      rescue
+      rescue => e
         self.field = Mail::UnstructuredField.new(name, value)
+        self.field.errors << [name, value, e]
+        self.field
       end
     end

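The `create_field` change above is the heart of the commit: when the structured parse fails, the raw value is kept in an unstructured field and the exception is recorded as a `[name, value, error]` triple rather than being swallowed. The sketch below shows that pattern in isolation. Every class here (`SketchParseError`, `SketchStructuredField`, `SketchUnstructuredField`) is a hypothetical stand-in and not the mail gem's own code, and the list of accepted values is an assumption.

```ruby
# Hypothetical stand-ins for illustration only; not the mail gem's classes.
class SketchParseError < StandardError; end

class SketchUnstructuredField
  attr_reader :name, :value, :errors

  def initialize(name, value)
    @name = name
    @value = value
    @errors = []          # filled in by the caller when a parse failed
  end
end

class SketchStructuredField
  KNOWN = %w[7bit 8bit binary quoted-printable base64].freeze  # assumed list
  attr_reader :name, :value

  def initialize(name, value)
    unless KNOWN.include?(value.downcase)
      raise SketchParseError, "can not parse |#{value}|"
    end
    @name = name
    @value = value.downcase
  end

  # A field that parsed cleanly never carries errors.
  def errors
    []
  end
end

# Try the strict parse first; on failure keep the data and remember why.
def create_field(name, value)
  SketchStructuredField.new(name, value)
rescue StandardError => e
  field = SketchUnstructuredField.new(name, value)
  field.errors << [name, value, e]
  field
end

good = create_field("Content-Transfer-Encoding", "Base64")
bad  = create_field("Content-Transfer-Encoding", "weirdo")
```

Here `good.errors` is empty, while `bad.errors.first` holds the field name, the untouched value `"weirdo"`, and the exception object, which is the shape the header later collects.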
2 changes: 2 additions & 0 deletions lib/mail/fields/content_transfer_encoding_field.rb
@@ -10,6 +10,8 @@ class ContentTransferEncodingField < StructuredField

     def initialize(*args)
       super(CAPITALIZED_FIELD, strip_field(FIELD_NAME, args.last.to_s.downcase))
+      parse(value)
+      self
     end

     def parse(val = value)
4 changes: 4 additions & 0 deletions lib/mail/fields/structured_field.rb
@@ -31,6 +31,10 @@ def initialize(*args)
     def default
       decoded
     end
+
+    def errors
+      []
+    end

   end
 end
5 changes: 5 additions & 0 deletions lib/mail/fields/unstructured_field.rb
@@ -18,11 +18,16 @@ class UnstructuredField
     include Mail::Utilities

     def initialize(*args)
+      @errors = []
       self.name = args.first
       self.value = args.last
       self
     end

+    def errors
+      @errors
+    end
+
     def encoded
       do_encode(self.name)
     end
6 changes: 6 additions & 0 deletions lib/mail/header.rb
@@ -34,6 +34,7 @@ class Header
     # these cases, please make a patch and send it in, or at the least, send
     # me the example so we can fix it.
     def initialize(header_text = nil)
+      @errors = []
       self.raw_source = header_text.to_crlf
       split_header if header_text
     end
@@ -74,6 +75,7 @@ def fields=(unfolded_fields)
       unfolded_fields.each do |field|

         field = Field.new(field)
+        field.errors.each { |error| self.errors << error }
         selected = select_field_for(field.name)

         if selected.any? && limited_field?(field.name)
@@ -85,6 +87,10 @@ def fields=(unfolded_fields)

     end

+    def errors
+      @errors
+    end
+
     # 3.6. Field definitions
     #
     # The following table indicates limits on the number of times each
27 changes: 25 additions & 2 deletions lib/mail/message.rb
@@ -99,6 +99,7 @@ def initialize(*args, &block)
       @body = nil
       @text_part = nil
       @html_part = nil
+      @errors = nil

       @perform_deliveries = true
       @raise_delivery_errors = true
@@ -376,6 +377,27 @@ def headers(hash = {})
       end
     end

+    # Returns a list of parser errors on the header, each field that had an error
+    # will be reparsed as an unstructured field to preserve the data inside, but
+    # will not be used for further processing.
+    #
+    # It returns a nested array of [field_name, value, original_error_message]
+    # per error found.
+    #
+    # Example:
+    #
+    #  message = Mail.new("Content-Transfer-Encoding: weirdo\r\n")
+    #  message.errors.size #=> 1
+    #  message.errors.first[0] #=> "Content-Transfer-Encoding"
+    #  message.errors.first[1] #=> "weirdo"
+    #  message.errors.first[2] #=> <The original error message exception>
+    #
+    # This is a good first defence on detecting spam by the way. Some spammers send
+    # invalid emails to try and get email parsers to give up parsing them.
+    def errors
+      header.errors
+    end
+
     # Returns the Bcc value of the mail object as an array of strings of
     # address specs.
     #
@@ -1258,7 +1280,8 @@ def has_charset?
     end

     def has_content_transfer_encoding?
-      !!content_transfer_encoding
+      header[:content_transfer_encoding] &&
+        header[:content_transfer_encoding].errors.blank?
     end

     def has_transfer_encoding? # :nodoc:
@@ -1723,7 +1746,7 @@ def separate_parts
     end

     def add_encoding_to_body
-      unless content_transfer_encoding.blank?
+      if has_content_transfer_encoding?
         body.encoding = content_transfer_encoding
       end
     end
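The last two message.rb hunks make the body pick up a transfer encoding only when the header field both exists and parsed without errors, so a value such as "weirdo" no longer reaches `body.encoding=`. A rough sketch of that guard is below. `SketchField`, `usable_transfer_encoding?` and `body_encoding_for` are hypothetical names for illustration, and the sketch returns a strict boolean where the diff's `&&` expression may return `nil`.

```ruby
# Hypothetical field holder; the real objects are Mail field instances.
SketchField = Struct.new(:value, :errors)

# True only when the field is present and carried no parse errors.
def usable_transfer_encoding?(field)
  !field.nil? && field.errors.empty?
end

# The encoding handed to the body, or nil when the header value is
# missing or could not be parsed.
def body_encoding_for(field)
  usable_transfer_encoding?(field) ? field.value : nil
end

clean  = SketchField.new("base64", [])
broken = SketchField.new("weirdo", [["Content-Transfer-Encoding", "weirdo", StandardError.new("parse failed")]])
```

With these inputs `body_encoding_for(clean)` gives `"base64"`, while both `body_encoding_for(broken)` and `body_encoding_for(nil)` give `nil`, leaving the body's encoding untouched.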
