Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: __nameid_ directory should not be parsed (and causing invalid HTML body) #23

Closed
derrohrbach opened this issue Jan 22, 2020 · 13 comments · Fixed by #24
Closed

Bug: __nameid_ directory should not be parsed (and causing invalid HTML body) #23

derrohrbach opened this issue Jan 22, 2020 · 13 comments · Fixed by #24

Comments

@derrohrbach
Copy link

As discussed in PR #22, see also nameid-fix branch:

In a MSG file the _nameid... directory represents named properties, which are as of right now not supported by this library. It tries to parse the entries in this directory using the normal property parser though, which creates conflicts and is wrong. You can also refer to the Microsoft documentation for this: [MS-OXMSG].

I had several emails where an id in this directory collided with the property id for HTML-Content (10130102), which caused RTF-Emails to incorrctly report this invalid content as the HTML body, which prevented the RTF Conversion from being read.

grafik

@derrohrbach derrohrbach changed the title Bugfix: __nameid_ directory should not be parsed Bug: __nameid_ directory should not be parsed Jan 22, 2020
@derrohrbach
Copy link
Author

The fix in #22 could not be merged because of Tests failing! I can not reproduce this though, because my tests always fail with this error:

[ERROR] Tests run: 16, Failures: 0, Errors: 16, Skipped: 0, Time elapsed: 0.07 s <<< FAILURE! - in org.simplejavamail.outlookmessageparser.HighoverEmailsTest
[ERROR] testUnicodeMessage(org.simplejavamail.outlookmessageparser.HighoverEmailsTest)  Time elapsed: 0.009 s  <<< ERROR!
java.lang.Error:
Unresolved compilation problems:
        OutlookMessageAssert cannot be resolved
        OutlookMessageAssert cannot be resolved
        OutlookMessageAssert cannot be resolved
        OutlookMessageAssert cannot be resolved
        The method normalizeText(String) is undefined for the type HighoverEmailsTest

        at org.simplejavamail.outlookmessageparser.HighoverEmailsTest.testUnicodeMessage(HighoverEmailsTest.java:520)

@bbottema
Copy link
Owner

bbottema commented Jan 22, 2020

Ahh, those are classes generated by a maven plugin when you run mvn test

@derrohrbach
Copy link
Author

Well it seems for me those are not being generated, because i run using mvn test

@derrohrbach
Copy link
Author

Are there any news about these errors? I would really like to run the tests and fix the bugs ;)

@bbottema
Copy link
Owner

bbottema commented Jan 24, 2020

I really don't have a clue why mvn test wouldn't work for you. Except maybe that you're JDK is too new? I'm developing this library with JDK 1.8 (and Maven 3.6.0).

@derrohrbach
Copy link
Author

Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: C:\Maven\bin\..
Java version: 1.8.0_222, vendor: AdoptOpenJDK, runtime: C:\Program Files\AdoptOpenJDK\jdk-8.0.222.10-hotspot\jre
Default locale: de_DE, platform encoding: Cp1252
OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"

@derrohrbach
Copy link
Author

I got something running by copying the class files from target/test-classes to target/classes... Now trying to look into the tests

@bbottema
Copy link
Owner

That's just bizar that you have to copy anything, I have no explanation for it.

@derrohrbach
Copy link
Author

derrohrbach commented Jan 24, 2020

Well a second run of the tests does not work either and fails with an error about something being already instrumented. My current procedure for testing:

  1. mvn clean test
  2. copy files
  3. mvn test

I was able to verify though that it's an error in the tests and not in the fixed code. It always fails at getBodyHTML().isNotEmpty(), because before you had the invalid body content in there. Now it is null, which is correct, because there is no HTML body in these four mails.

To verify this just look at the supposed HTML content which the current code produces for the following mails:

  • plain chain.msg
  • chinese message.msg
  • forward with attachments and embedded images.msg
  • nested simple mail.msg

You will see that the content is just garbled data. That is essentially the bug, which the patch fixed.
I will open a new pull request with the tests fixed.

@bbottema
Copy link
Owner

Thanks for the research, much appreciated. Before I can merge the change, I would like to investigate why body HTML remains empty even if there is an RTF body. I'll be able to check that this weekend.

@derrohrbach
Copy link
Author

Well thats a different issue though, isn't it? The behaviour with this bugfix is better than before, because you can at least see, that there is no HTML Body, instead of getting invalid data. Also how would you detect if it is a real html email or if it was converted from rtf, if the html attribute is always set?

@bbottema
Copy link
Owner

Indeed and I wasn't looking properly (it's been some time since I touched that part of the code): the converted HTML is there, but in msg.getConvertedBodyHTML() instead of msg.getBodyHTML(). so that's all hunky-dory. I'll prepare a release soon.

@bbottema bbottema added this to the 1.7.1 milestone Jan 24, 2020
@bbottema bbottema changed the title Bug: __nameid_ directory should not be parsed Bug: __nameid_ directory should not be parsed (also causing invalid HTML body) Jan 24, 2020
@bbottema bbottema changed the title Bug: __nameid_ directory should not be parsed (also causing invalid HTML body) Bug: __nameid_ directory should not be parsed (and causing invalid HTML body) Jan 24, 2020
@bbottema
Copy link
Owner

Released in 1.7.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants