Skip to content
A Java parser for Outlook messages (.msg files)
Java
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.circleci Switched to latest project parent and CircleCI Orb [skip ci] Dec 19, 2019
src
.gitignore
.travis.yml
LICENSE-2.0.txt
NOTICE.txt
README.md
RELEASE.txt
how to release.txt
pom.xml
properties-list1.txt
properties-list2.txt

README.md

APACHE v2 License Latest Release Javadocs Codacy

Outlook Message Parser

Outlook Message Parser is a small open source Java library that parses Outlook .msg files.

<dependency>
  <groupId>org.simplejavamail</groupId>
  <artifactId>outlook-message-parser</artifactId>
  <version>1.7.4</version>
</dependency>

Outlook Message Parser is a continuation (or fork if that project independently continues) of msgparser.

Under the hood it uses the Apache POI - POIFS library to parse the message files which use the OLE 2 Compound Document format. Thus, it is merely a convenience library that covers the details of the .msg file. The implementation is based on the information provided at fileformat.info.

v1.7.0 - v1.7.4 (9-January-2020 - 27-January-2020)

  • v1.7.4 - #27 Same as 1.7.3, but now also for chinese senders
  • v1.7.3 - #27 When from name/address are not available (unsent emails), these fields are filled with binary garbage
  • v1.7.2 - #26 To email address is not handled properly when name is omitted
  • v1.7.1 - #25 NPE on ClientSubmitTime when original message has not been sent yet
  • v1.7.1 - #23 Bug: _nameid directory should not be parsed (and causing invalid HTML body)
  • v1.7.0 - #18 Upgrade Apache POI 3.9 -> 4.x

Note: Apache POI requires minimum Java 8

v1.6.0 (8-January-2020)

  • #21 Multiple TO recipients are not handles properly

v1.5.0 (18-December-2019)

  • #20 CC and BCC recipients are not parsed properly
  • #19 Use real Outlook ContentId Attribute to resolve CID Attachments

v1.4.1 (22-October-2019)

  • #17 Fixed encoding error for UTF-8's Windows legacy name (cp)65001

v1.4.0 (13-October-2019)

  • #9 Replaced the RFC to HTML converter with a brand new RFC-compliant convert! (thanks to @fadeyev!)

v1.3.0 (4-October-2019)

  • #14 Dependency problem with Java9+, missing Jakarta Activation Framework
  • #13 HTML start tags with extra space not handled correctly
  • #11 SimpleRTF2HTMLConverter inserts too many
    tags
  • #10 Embedded images with DOS-like names are classified as attachments
  • #9 SimpleRTF2HTMLConverter removes some valid tags during conversion

v1.2.1 (12-May-2019)

  • Ignore non S/MIME related content types when extracting S/MIME metadata
  • Added toString and equals methods to the S/MIME data classes

v1.1.21 (4-May-2019)

  • Upgraded mediatype recognition based on file extension for incomplete attachments
  • Added / improved support for public S/MIME meta data

v1.1.20 (14-April-2019)

  • #7 Fix missing S/MIME header details that are needed to determine the type of S/MIME application

v1.1.19 (10-April-2019)

  • Log rtf compression error, but otherwise ignore it and keep going and extract what we can.

v1.1.18 (5-April-2019)

  • #6 Missing mimeTag for attachments should be guessed based on file extension

v1.1.17 (19-August-2018)

  • #3 implemented robust support for character sets / code pages in RTF to HTML conversion (fixes chinese support #3)
  • fixed bug where too much text was cleaned up as part of superfluous RTF cleanup step when converting to HTML
  • Performance boost in the RTF -> HTML converter

v1.1.16 (~28-Februari-2017)

v1.16

  • Added support for replyTo name and address
  • cleaned up code (1st wave)
You can’t perform that action at this time.