Commit

add support for usenet-style signature format
ZogStriP committed Oct 12, 2016
1 parent a90a7ea commit 53bddc4
Showing 22 changed files with 134 additions and 35 deletions.
62 changes: 41 additions & 21 deletions lib/email_reply_trimmer.rb
@@ -17,12 +17,12 @@ class EmailReplyTrimmer
   TEXT = "t"
 
   def self.identify_line_content(line)
-    return EMPTY if EmptyLineMatcher.match?(line)
-    return DELIMITER if DelimiterMatcher.match?(line)
-    return SIGNATURE if SignatureMatcher.match?(line)
-    return EMBEDDED if EmbeddedEmailMatcher.match?(line)
-    return EMAIL_HEADER if EmailHeaderMatcher.match?(line)
-    return QUOTE if QuoteMatcher.match?(line)
+    return EMPTY if EmptyLineMatcher.match? line
+    return DELIMITER if DelimiterMatcher.match? line
+    return SIGNATURE if SignatureMatcher.match? line
+    return EMBEDDED if EmbeddedEmailMatcher.match? line
+    return EMAIL_HEADER if EmailHeaderMatcher.match? line
+    return QUOTE if QuoteMatcher.match? line
     return TEXT
   end
 
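`identify_line_content` reduces each line to a one-character code, and `trim` joins those codes into a pattern string so the later steps can use plain regexes over it. The sketch below illustrates that idea under stated assumptions: only `TEXT = "t"` appears in this hunk, and the other codes are inferred from the regexes used further down in `trim` (such as `/[ds]/` and `/t[eq]*h{3,}/`). The delimiter regex is the one committed in `DelimiterMatcher`; the other matcher regexes are hypothetical, heavily simplified stand-ins, not the gem's real matchers. As in the original, the first matching rule wins, so the order of the checks matters.

```ruby
# Assumed single-character codes (only TEXT = "t" is confirmed by the diff).
EMPTY        = "e"
DELIMITER    = "d"
SIGNATURE    = "s"
EMBEDDED     = "b"
EMAIL_HEADER = "h"
QUOTE        = "q"
TEXT         = "t"

# Delimiter regex as defined in DelimiterMatcher after this commit.
DELIMITER_CHARACTERS = "-_,=+~#*ᐧ"
DELIMITER_REGEX = /^[[:space:]]*[#{Regexp.escape(DELIMITER_CHARACTERS)}]+[[:space:]]*$/

# Hypothetical, heavily simplified stand-ins for the other matchers.
MATCHERS = [
  [EMPTY,        /\A[[:space:]]*\z/],
  [DELIMITER,    DELIMITER_REGEX],
  [SIGNATURE,    /\ASent from my /i],
  [EMBEDDED,     /\AOn .* wrote:\z/],
  [EMAIL_HEADER, /\A\*?(From|Sent|To|Subject):/],
  [QUOTE,        /\A>/]
].freeze

# First matching rule wins, mirroring the early returns in identify_line_content.
def identify_line_content(line)
  MATCHERS.each { |code, regex| return code if line.match?(regex) }
  TEXT
end

# One code per line, joined into the pattern string that trim works on.
def line_pattern(text)
  text.split("\n").map { |l| identify_line_content(l) }.join
end
```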
@@ -34,22 +34,26 @@ def self.trim(text, split=false)
 
     # fix embedded email markers that might span over multiple lines
     EmbeddedEmailMatcher::ON_DATE_SOMEONE_WROTE_REGEXES.each do |r|
-      if text =~ r
-        text.gsub!(r) { |m| m.gsub(/\n[[:space:]>\-]*/, " ") }
-      end
+      text.gsub!(r) { |m| m.gsub(/\n[[:space:]>\-]*/, " ") }
     end
 
-    removed = []
-
     # from now on, we'll work on a line-by-line basis
     lines = text.split("\n")
+    lines_dup = lines.dup
 
     # identify content of each lines
     pattern = lines.map { |l| identify_line_content(l) }.join
 
-    # remove all signatures & delimiters
-    while pattern =~ /[ds]/
-      index = pattern =~ /[ds]/
+    # remove everything after the first delimiter
+    if pattern =~ /d/
+      index = pattern =~ /d/
+      pattern = pattern[0...index]
+      lines = lines[0...index]
+    end
+
+    # remove all mobile signatures
+    while pattern =~ /s/
+      index = pattern =~ /s/
       pattern.slice!(index)
       lines.slice!(index)
     end
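This hunk is the heart of the commit. Before, every delimiter (`d`) and signature (`s`) line was sliced out on its own, so the body of a usenet-style signature (the lines after `--`) stayed in the trimmed reply. Now everything from the first delimiter onwards is dropped, and only the remaining mobile-signature lines are sliced out individually. The sketch replays both versions on the lines of `test/emails/usenet.txt`; the pattern `tedtetttt` is an assumed classification of those nine lines, not output captured from the gem.

```ruby
# Old behaviour: slice out every delimiter ("d") and signature ("s") line.
def old_strip(pattern, lines)
  pattern = pattern.dup
  lines = lines.dup
  while (index = pattern.index(/[ds]/))
    pattern.slice!(index)
    lines.slice!(index)
  end
  lines
end

# New behaviour: cut everything from the first delimiter onwards,
# then slice out any mobile-signature lines that are left.
def new_strip(pattern, lines)
  pattern = pattern.dup
  lines = lines.dup
  if (index = pattern.index("d"))
    pattern = pattern[0...index]
    lines = lines[0...index]
  end
  while (index = pattern.index("s"))
    pattern.slice!(index)
    lines.slice!(index)
  end
  lines
end

USENET_LINES = [
  "Mal sehen was hier mit der Signatur passiert!",
  "",
  "--",
  "Mit lieben Grüßen",
  "",
  "John Doe",
  "http://blog.john.doe",
  "www.facebook.com/johndoe",
  "Mobil: +12 345 6789 012"
].freeze
USENET_PATTERN = "tedtetttt" # assumed classification of the lines above

old_trimmed = old_strip(USENET_PATTERN, USENET_LINES).join("\n").strip
new_trimmed = new_strip(USENET_PATTERN, USENET_LINES).join("\n").strip
```

With the new rule the result matches `test/trimmed/usenet.txt`, while the old rule keeps the name, URLs and phone number.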
@@ -58,7 +62,6 @@ def self.trim(text, split=false)
     # then take everything up to that marker
     if pattern =~ /te*b[^q]*$/
       index = pattern =~ /te*b[^q]*$/
-      removed = lines[(index + 1)..-1]
       pattern = pattern[0..index]
       lines = lines[0..index]
     end
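The embedded-email rules also work on the pattern string. `te*b[^q]*$` finds a text line, optionally followed by empty lines, then an embedded-email marker (`b`, such as an "On ... wrote:" line) with no quoted (`q`) line anywhere after it; `trim` then keeps everything up to and including that text line. With `removed` gone, this hunk no longer records the discarded tail; the elided part is recomputed at the end instead. A minimal sketch on made-up patterns, where the line contents are placeholders:

```ruby
# Keep lines up to and including the text line that precedes an embedded
# email marker, provided nothing after the marker is a quoted line.
EMBEDDED_NO_QUOTES = /te*b[^q]*$/

def cut_before_marker(pattern, lines, regex = EMBEDDED_NO_QUOTES)
  index = pattern =~ regex
  return [pattern, lines] unless index
  [pattern[0..index], lines[0..index]]
end
```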
@@ -67,16 +70,15 @@ def self.trim(text, split=false)
     # then take everything up to that marker
     if pattern =~ /te*b[eqbh]*[te]*$/
       index = pattern =~ /te*b[eqbh]*[te]*$/
-      removed = lines[(index + 1)..-1]
       pattern = pattern[0..index]
       lines = lines[0..index]
     end
 
     # if there still are some embedded email markers, just remove them
     while pattern =~ /b/
       index = pattern =~ /b/
-      pattern[index] = "e"
-      lines[index] = ""
+      pattern.slice!(index)
+      lines.slice!(index)
     end
 
     # fix email headers when they span over multiple lines
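The second rule, `te*b[eqbh]*[te]*$`, covers the more common case where the marker is followed by quoted lines, further markers or headers, and possibly trailing text or blank lines. Any `b` that survives both rules is now sliced out of `pattern` and `lines` instead of being overwritten with an empty line. The sketch applies both steps to invented patterns with placeholder line contents:

```ruby
EMBEDDED_WITH_QUOTES = /te*b[eqbh]*[te]*$/

def trim_embedded(pattern, lines)
  pattern = pattern.dup
  lines = lines.dup
  # Cut after the text line preceding a marker plus its quoted tail.
  if (index = pattern =~ EMBEDDED_WITH_QUOTES)
    pattern = pattern[0..index]
    lines = lines[0..index]
  end
  # Remaining markers are removed outright (new behaviour), not blanked.
  while (index = pattern.index("b"))
    pattern.slice!(index)
    lines.slice!(index)
  end
  [pattern, lines]
end
```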
@@ -90,7 +92,6 @@ def self.trim(text, split=false)
     # these headers
     if pattern =~ /t[eq]*h{3,}/
       index = pattern =~ /t[eq]*h{3,}/
-      removed = lines[(index + 1)..-1]
       pattern = pattern[0..index]
       lines = lines[0..index]
     end
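The header rule cuts at the text line that directly precedes (ignoring empty or quoted lines) the first run of three or more email-header lines, such as a forwarded or Outlook-style From:/Sent:/To: block. Shorter header runs are left alone. A sketch over invented patterns, showing which prefix `trim` would keep:

```ruby
HEADER_BLOCK = /t[eq]*h{3,}/

# Returns the part of the pattern that trim would keep, or the whole
# pattern when no block of three or more header lines is found.
def kept_prefix(pattern)
  index = pattern =~ HEADER_BLOCK
  index ? pattern[0..index] : pattern
end

EXAMPLES = {
  "tteehhhht" => "tt",   # forwarded-message style header block
  "teqhhhh"   => "t",    # quoted line between text and headers
  "tehht"     => "tehht" # only two header lines: nothing is cut
}.freeze
```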
@@ -111,13 +112,32 @@ def self.trim(text, split=false)
 
     # results
     trimmed = lines.join("\n").strip
-    elided = removed.join("\n").strip
 
     if split
-      [trimmed, elided]
+      [trimmed, compute_elided(lines_dup, lines)]
     else
       trimmed
     end
   end
 
+  private
+
+  def self.compute_elided(text, lines)
+    elided = []
+
+    t = 0
+    l = 0
+
+    while t < text.size
+      while l < lines.size && text[t] == lines[l]
+        t += 1
+        l += 1
+      end
+      elided << text[t]
+      t += 1
+    end
+
+    elided.join("\n").strip
+  end
+
 end
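`compute_elided` replaces the old `removed` bookkeeping: it walks the original lines (`lines_dup`) and the kept lines with two indices, advancing both while they agree and collecting every original line without a counterpart. That greedy walk only lines up if the kept lines are an in-order subsequence of the originals, which is likely why leftover embedded-email markers are now sliced out rather than blanked (assuming the header-fixing step not shown in this diff also leaves lines intact). The sketch copies the method and compares both behaviours on a three-line example. Note also that `private` has no effect on methods defined with `def self.`, so `compute_elided` stays publicly callable; hiding it would take `private_class_method :compute_elided`.

```ruby
# Copy of compute_elided from this commit, as a top-level method.
def compute_elided(text, lines)
  elided = []

  t = 0
  l = 0

  while t < text.size
    # skip over original lines that were kept, in order
    while l < lines.size && text[t] == lines[l]
      t += 1
      l += 1
    end
    # text[t] was not kept (nil past the end, which joins as "")
    elided << text[t]
    t += 1
  end

  elided.join("\n").strip
end

original = ["Hi", "On Mon, X wrote:", "Thanks"]
sliced   = ["Hi", "Thanks"]     # marker sliced out (new behaviour)
blanked  = ["Hi", "", "Thanks"] # marker blanked (old behaviour)

from_sliced  = compute_elided(original, sliced)
from_blanked = compute_elided(original, blanked)
```

With blanking, the kept line "Thanks" is wrongly reported as elided, because the walk never re-aligns after the empty placeholder.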
4 changes: 2 additions & 2 deletions lib/email_reply_trimmer/delimiter_matcher.rb
@@ -1,7 +1,7 @@
 class DelimiterMatcher
 
-  DELIMITER_CHARACTERS ||= ['-', '_', '=', '+','~', '#', '*', 'ᐧ']
-  DELIMITER_REGEX ||= /^[[:space:]]*[#{Regexp.escape(DELIMITER_CHARACTERS.join)}]+[[:space:]]*$/
+  DELIMITER_CHARACTERS ||= "-_,=+~#*ᐧ"
+  DELIMITER_REGEX ||= /^[[:space:]]*[#{Regexp.escape(DELIMITER_CHARACTERS)}]+[[:space:]]*$/
 
   def self.match?(line)
     line =~ DELIMITER_REGEX
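The delimiter characters now live in a single string instead of an array joined at load time, and `,` is added to the set. `Regexp.escape` turns `-` into `\-`, so it cannot form a range inside the character class, and the surrounding `[[:space:]]*` means the usenet convention of `-- ` (dash, dash, space; see RFC 3676, section 4.3) is classified as a delimiter too. A quick check of the regex as committed, with a hypothetical `delimiter?` helper standing in for `DelimiterMatcher.match?`:

```ruby
DELIMITER_CHARACTERS = "-_,=+~#*ᐧ"
DELIMITER_REGEX = /^[[:space:]]*[#{Regexp.escape(DELIMITER_CHARACTERS)}]+[[:space:]]*$/

# Hypothetical helper equivalent to DelimiterMatcher.match?, returning a boolean.
def delimiter?(line)
  DELIMITER_REGEX.match?(line)
end

DELIMITERS     = ["--", "-- ", "***", "~~~~~", "  ====  ", ",,,", "ᐧᐧᐧ"].freeze
NOT_DELIMITERS = ["-- John", "a-b", "", ">--"].freeze
```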
10 changes: 10 additions & 0 deletions test/elided/delimiters.txt
@@ -0,0 +1,10 @@
--
***
####
~~~~~
======
_______
++++++++

-------
2 changes: 2 additions & 0 deletions test/elided/email_headers_1.txt
@@ -1,3 +1,5 @@
------------------------------

*From:* Outlook user
*Sent:* 2016-01-27
*To:* info@discourse.org
1 change: 1 addition & 0 deletions test/elided/email_headers_2.txt
@@ -1,3 +1,4 @@
________________________________________
From: Discourse <info@discourse.org.
Sent: Thursday, 28 January 2016 8:16 p.m.
To: Someone
1 change: 1 addition & 0 deletions test/elided/email_headers_3.txt
@@ -8,6 +8,7 @@ Subject: VIS
Here's an email with some very important stuff.


________________________________
Reply here<http://foo.bar> or hit reply from your inbox to help members by sharing your ideas.
Mute this topic<http://42.wat> to stop getting updates, we'll send you the next one.

1 change: 1 addition & 0 deletions test/elided/embedded_ception.txt
@@ -31,6 +31,7 @@ On Mon, Feb 1, 2016 at 6:32 PM, Jeff Atwood <info@discourse.org> wrote:
>


--
Some One
Community Manager
foo@bar.com
7 changes: 7 additions & 0 deletions test/elided/embedded_email_10.txt
@@ -1,3 +1,8 @@
Sent from Outlook Mobile<https://foo.bar>




On Sun, Feb 7, 2016 at 12:12 AM -0800, "Arpit Jalan" <arpit.jalan@discourse.org<mailto:arpit.jalan@discourse.org>> wrote:

Hi Some,
@@ -15,7 +20,9 @@ On Fri, 5 Feb 2016 at 10:42, Some One <foo@bar.com<mailto:foo@bar.com>> wrote:
Arpit,
Yes that sounds good.

Sent from Outlook Mobile<https://foo.bar>

_____________________________
From: Arpit Jalan <arpit.jalan@discourse.org<mailto:arpit.jalan@discourse.org>>
Sent: Thursday, February 4, 2016 10:05 AM
Subject: Meta Discourse update
10 changes: 10 additions & 0 deletions test/elided/embedded_email_7.txt
@@ -0,0 +1,10 @@
On Tue, 2011-03-01 at 18:02 +0530, Some One wrote:

>
> This is another part of the embedded email.
>
>


_______________________
And here's my signature.
3 changes: 3 additions & 0 deletions test/elided/embedded_email_german_1.txt
@@ -17,3 +17,6 @@ codinghorror via Discourse Meta <info@discourse.org> schrieb:
>
>To unsubscribe from these emails, visit your [user
>preferences](http://meta.discourse.org/user_preferences).

--
Diese Nachricht wurde von meinem Android-Mobiltelefon mit K-9 Mail gesendet.
1 change: 1 addition & 0 deletions test/elided/embedded_email_italian.txt
@@ -23,5 +23,6 @@
> To unsubscribe from these emails, change your [user
> preferences](https://meta.discourse.org/my/preferences)

--
Stefano Costa @stekosteko
Editor, Journal of Open Archaeology Data
3 changes: 3 additions & 0 deletions test/elided/embedded_email_polish.txt
@@ -1,3 +1,6 @@
--
Łukasz Jan Niemier

Dnia 14 lip 2015 o godz. 00:25 Michael Downey <info@discourse.org> napisał(a):

>
1 change: 1 addition & 0 deletions test/elided/embedded_email_quote_text.txt
@@ -0,0 +1 @@
On Mon, Aug 19, 2013 at 2:36 AM, SomeOne via Discourse Meta < info@discourse.org> wrote:
1 change: 1 addition & 0 deletions test/elided/embedded_email_spanish_2.txt
@@ -7,3 +7,4 @@ Asunto: [MP]Parser del email
Visita el tema o responde a este email para publicar.
Para no recibir m=C3=A1s notificaciones de este tema en particular, haz cli=
c aqu=C3=AD. Para darte de baja de estos emails, cambia tus preferencias
=
6 changes: 6 additions & 0 deletions test/elided/forwarded_message.txt
@@ -0,0 +1,6 @@
---------- Forwarded message ----------
From: Some One <foo@bar.com>
Date: Thu, Jan 28, 2016 at 4:00 PM
Subject: Some subject that
spans over 2 lines
To: infod@discourse.org
26 changes: 26 additions & 0 deletions test/elided/signatures.txt
@@ -0,0 +1,26 @@
Envoyé depuis mon iPhone

Von meinem Mobilgerät gesendet
Diese Nachricht wurde von meinem Android-Mobiltelefon mit K-9 Mail gesendet.

Someone from mobile
From My Iphone 6
Sent via mobile
Sent with Airmail
Sent from Windows Mail
Sent from Mailbox
Sent from Mailbox for iPad
Sent from Yahoo Mail on Android
Sent from my TI-85
Sent from my iPhone
Sent from my iPod
Sent from my Alcatel Flash2
Sent from my mobile device
Sent from my cell, please excuse any typos.
Sent from my Samsung Galaxy s5 Octacore device
Sent from my HTC M8 Android phone. Please excuse typoze
Sent from my Windows 8 PC <http://windows.microsoft.com/consumer-preview>
<<sent by galaxy>>
(sent from a phone)
(Sent from mobile device)
從我的 iPhone 傳送
7 changes: 7 additions & 0 deletions test/elided/usenet.txt
@@ -0,0 +1,7 @@
--
Mit lieben Grüßen

John Doe
http://blog.john.doe
www.facebook.com/johndoe
Mobil: +12 345 6789 012
9 changes: 9 additions & 0 deletions test/emails/usenet.txt
@@ -0,0 +1,9 @@
Mal sehen was hier mit der Signatur passiert!

--
Mit lieben Grüßen

John Doe
http://blog.john.doe
www.facebook.com/johndoe
Mobil: +12 345 6789 012
2 changes: 1 addition & 1 deletion test/test_email_reply_trimmer.rb
@@ -1,7 +1,7 @@
 require "minitest/autorun"
 require "email_reply_trimmer"
 
-class TestEmailReplyTrimmer < Minitest::Unit::TestCase
+class TestEmailReplyTrimmer < Minitest::Test
 
   EMAILS = Dir["test/emails/*.txt"].map { |path| File.basename(path) }
   TRIMMED = Dir["test/trimmed/*.txt"].map { |path| File.basename(path) }
9 changes: 0 additions & 9 deletions test/trimmed/embedded_email_7.txt
@@ -1,5 +1,4 @@
This is a line before the embedded email.

> Hello
>
> This is the embedded email.
@@ -9,11 +8,3 @@ This is some text
after the

embedded email.

>
> This is another part of the embedded email.
>
>


And here's my signature.
2 changes: 0 additions & 2 deletions test/trimmed/embedded_email_polish.txt
@@ -1,3 +1 @@
Oh, I've forgot to add. MIT

Łukasz Jan Niemier
1 change: 1 addition & 0 deletions test/trimmed/usenet.txt
@@ -0,0 +1 @@
Mal sehen was hier mit der Signatur passiert!
