
Convert <div> to <p> tags from HTML emails

Fixes #820
nikolai-b committed Jan 11, 2019
1 parent 4082562 commit ca23fd7c27b764ab09967b03c1e1ce4fe694714c
Showing with 42 additions and 0 deletions.
  1. +1 −0 app/models/message_thread.rb
  2. +8 −0 spec/models/message_thread_spec.rb
  3. +33 −0 spec/support/text/html_div_email.txt
@@ -215,6 +215,7 @@ def add_messages_from_email!(mail, in_reply_to)
# For multipart messages we pull out the html part content and use python to remove the signature
body = %x(./lib/sig_strip.js #{Shellwords.escape(mail.message.html_part.decoded)})
body.gsub(%r{(</?html>|</?body>|</?head>|\r)},"")
.gsub(%r{<(/)?div},"<\\1p")
else
# When there is no HTML we get the text part or just the message and use EmailReplyParser to remove the signature
body = (mail.message.text_part || mail.message).decoded
@@ -296,6 +296,14 @@
thread.add_messages_from_email!(mail, nil)
expect(messages[-1].body).to eq("<p>\n This email has an HTML message body and a plain link <a href=\"http://www.example.com\">www.example.com</a> .\n</p>\n<br>\n<p>\nNikolai\n</p>\n<br>\n\n")
end

context "split by divs" do
let(:mail) { InboundMail.new(raw_message: File.read(raw_email_path("html_div"))) }
it 'should remove HTML signatures' do
thread.add_messages_from_email!(mail, nil)
expect(messages[-1].body).to eq("<p>  Text split by divs</p><p>And not by p tags</p>\n")
end
end
end

context 'with pgp sig' do
@@ -0,0 +1,33 @@
Return-Path: <george@example.net>
Date: Fri, 11 Jan 2019 11:08:02 +0000
From: George Coulouris <george@example.net>
To: message-2@cyclescape.org
Message-ID: <4B61E3D3-5D34-436E-BCD5-3A03DC7F550C@net>
In-Reply-To: <message-2@cyclescape.org>
Subject: Re: [Cyclescape] junction revisions
Mime-Version: 1.0
Content-Type: multipart/alternative;
boundary="Apple-Mail=_C821455D-1F0C-4FD1-A56E-B7CC812C2296"
Content-Transfer-Encoding: 7bit
Envelope-to: message-2@cyclescape.org

--Apple-Mail=_C821455D-1F0C-4FD1-A56E-B7CC812C2296
Content-Type: text/plain;
charset=utf-8
Content-Transfer-Encoding: quoted-printable

Text split by divs=0D
and not by p tags=0D

--Apple-Mail=_C821455D-1F0C-4FD1-A56E-B7CC812C2296
Content-Type: text/html;
charset=utf-8
Content-Transfer-Encoding: quoted-printable

<html><head><meta http-equiv=3D"Content-Type" content=3D"text/html; chars=
et=3Dutf-8"></head><body style=3D"word-wrap: break-word; -webkit-nbsp-mod=
e: space; line-break: after-white-space;" class=3D""><div class=3D"">&nbs=
p; Text split by divs</div><div class=3D"">And not by p tags</div></body>=
</html>=

--Apple-Mail=_C821455D-1F0C-4FD1-A56E-B7CC812C2296--
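
The substitution added to `add_messages_from_email!` can be tried in isolation. The sketch below is only an illustration: `divs_to_paragraphs` is a hypothetical helper name that does not exist in the repository, and the sample string is the already-decoded div markup from the fixture above, without the signature-stripping step. It applies the same two `gsub` calls as the diff.

```ruby
# Hypothetical helper (not part of the commit): applies the same two
# substitutions that the diff chains onto `body`.
def divs_to_paragraphs(html)
  html
    .gsub(%r{(</?html>|</?body>|</?head>|\r)}, "") # drop bare wrapper tags and carriage returns
    .gsub(%r{<(/)?div}, "<\\1p")                   # "<div" becomes "<p", "</div" becomes "</p"
end

# Decoded div markup taken from the fixture's HTML part.
sample = %(<div class="">Text split by divs</div><div class="">And not by p tags</div>)

puts divs_to_paragraphs(sample)
# prints: <p class="">Text split by divs</p><p class="">And not by p tags</p>
```

Attributes on the tag are left in place because only the tag name is rewritten. The pattern also has no word boundary after `div`, so any tag whose name merely starts with `div` would be rewritten as well; that is unlikely to matter for typical email HTML.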

3 comments on commit ca23fd7

@mvl22

Member

mvl22 replied Jan 11, 2019

@nikolai-b Could you anonymise lines 1 (e-mail) and 3 (name) of that test example? Ideally, also ensure that any @cyclescape.org e-mail addresses do not stay as such, as they will just generate spam.

@nikolai-b

Contributor

nikolai-b replied Jan 11, 2019

@mvl22 george@example.net is already anonymised. Given that there are many links to @cyclescape.org (and it is the web address), I don't think it is worth worrying about spam; we should use more sophisticated checks if we think that email is a way for spammers to post content. Currently we store all inbound emails in the InboundMail table. I don't think that is a problem, although the table is getting big (298 MB), and most of the rows are delivery failure reports and us spamming ourselves. I'll make an issue to clear it out, though.

@mvl22

Member

mvl22 replied Jan 11, 2019

> is already anonymised

I was referring to the name part directly after "From", not the e-mail.
