alejandro-g-m/Gmail-MBOX-email-parser

Gmail MBOX email parser (Python)

Description

This is a quick solution for parsing a Gmail export in MBOX format.

The parser was written to prepare emails for machine learning classification. Its goal is therefore to produce a CSV file with two columns: the email's body and that email's classification label.
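As an illustration of the two-column output described above, here is a minimal sketch using Python's standard csv module. It is not taken from the repository: the helper names write_labelled_csv and read_labelled_csv, the header names body and label, and the choice to quote every field are all assumptions made for this example.

```python
import csv


def write_labelled_csv(rows, path):
    """Write (body, label) pairs to a two-column CSV file.

    Hypothetical helper, not part of the repository. Every field is quoted
    so that bodies containing commas or line breaks stay in one cell.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
        writer.writerow(["body", "label"])  # assumed header names
        for body, label in rows:
            writer.writerow([body, label])


def read_labelled_csv(path):
    """Read the file back as a list of (body, label) tuples, skipping the header."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        return [tuple(row) for row in reader]
```

Opening the file with newline="" lets the csv module handle line endings itself, which matters because email bodies usually contain embedded line breaks.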

Usage

Running the file as __main__ extracts the messages contained in an MBOX file and creates a list of CustomMessage objects (the messages property).

A CustomMessage contains the subject, body and content type of an extracted email.
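The extraction step can be sketched with Python's standard mailbox module. This is an illustration under stated assumptions, not the repository's implementation: the CustomMessage dataclass below only mirrors the three fields named above, and extract_messages, first_text_part and decode_part are hypothetical helpers. The sketch keeps the first text/plain part of a multipart message and skips multipart messages that have none.

```python
import mailbox
from dataclasses import dataclass


@dataclass
class CustomMessage:
    """Assumed shape: only the three fields named in this README."""
    subject: str
    body: str
    content_type: str


def first_text_part(message):
    """Return the first text/plain part of a multipart message, or the message itself."""
    if message.is_multipart():
        for part in message.walk():
            if part.get_content_type() == "text/plain":
                return part
        return None
    return message


def decode_part(part):
    """Decode a part's payload to str, falling back to UTF-8 for unknown charsets."""
    payload = part.get_payload(decode=True)
    if payload is None:
        return ""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # charset name Python does not recognise
        return payload.decode("utf-8", errors="replace")


def extract_messages(mbox_path):
    """Hypothetical helper: read an MBOX file into a list of CustomMessage objects."""
    box = mailbox.mbox(mbox_path)
    messages = []
    try:
        for message in box:
            part = first_text_part(message)
            if part is None:
                continue  # multipart message without a text/plain part
            messages.append(CustomMessage(
                subject=str(message["subject"] or ""),
                body=decode_part(part),
                content_type=part.get_content_type(),
            ))
    finally:
        box.close()
    return messages
```

Real Gmail exports contain HTML-only messages, attachments and encoded subject headers that this sketch does not handle.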

The code also includes some functionality for preparing the extracted data for machine learning algorithms. This functionality may not be relevant for every use case.

After the main processing, the code exports the extracted data to a file:

to_file(text_messages_to_string(messages), 'file_name')

Status

At the time of writing this code I could not find good working examples for this task, so I decided to upload it as a reference for anybody working on similar tasks. However, the code probably misses many corner cases and will most likely need modifications depending on the use case. It has also not been properly tested yet.

This was developed as a quick solution for a concrete use case in a machine learning task, so it should not be considered a generic way to extract emails from an MBOX file. It has only been tried with Gmail exports; it is unknown how it would work with exports from other services.
