Experiments with the Enron email corpus

Imagine we would like to know who inside our company is the best person to ask about a given subject, that is, a potential mentor. One way would be to infer each person’s speciality from their main body of written work: emails.

If we lived in a world where privacy was not an obvious concern (or if we worked at Google), reading other people's email would be totally kosher. In the normal, privacy-compliant world, this remains a purely academic exercise.

However, we do have access to a publicly released corpus of emails to work with: the Enron email dataset.

When I first approached this subject, my first idea was to use a named entity recognizer (NER), because if one were designing a recommender system for an energy company, one of the use cases would be to suggest whom to ask about a very specific technical issue. At the time, I found spaCy to have a nice NER for Python.
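To make the idea concrete, here is a minimal sketch of the "whom to ask" lookup: extract entities from each email body, count how often each sender mentions each entity, and rank senders per entity. It is an illustration under stated assumptions, not the code in this repository. The gazetteer, the sample emails, and the function names `extract_entities`, `build_expertise_index` and `whom_to_ask` are all hypothetical, and the simple term lookup stands in for a trained NER model such as spaCy's, whose entities would be read from `doc.ents`.

```python
import re
from collections import Counter, defaultdict
from email import message_from_string

# Hypothetical gazetteer standing in for a trained NER model.
# With spaCy, one would instead collect ent.text for ent in nlp(text).ents.
GAZETTEER = ("natural gas", "pipeline", "FERC")


def extract_entities(text):
    """Return one entry per occurrence of a gazetteer term (case-insensitive)."""
    found = []
    for term in GAZETTEER:
        pattern = r"\b" + re.escape(term) + r"\b"
        found.extend(term for _ in re.finditer(pattern, text, flags=re.IGNORECASE))
    return found


def build_expertise_index(raw_emails):
    """Map each entity to a Counter of the senders who mentioned it."""
    index = defaultdict(Counter)
    for raw in raw_emails:
        msg = message_from_string(raw)
        sender = msg.get("From", "unknown")
        body = msg.get_payload()  # plain string for non-multipart messages
        for entity in extract_entities(body):
            index[entity][sender] += 1
    return index


def whom_to_ask(index, entity, top_n=3):
    """Return up to top_n senders, most frequent mentioners first."""
    return [sender for sender, _ in index.get(entity, Counter()).most_common(top_n)]


# Made-up messages for illustration only; not taken from the Enron corpus.
SAMPLE_EMAILS = [
    "From: alice@example.com\nSubject: capacity\n\n"
    "The pipeline outage affects natural gas deliveries. Pipeline repairs start Monday.",
    "From: bob@example.com\nSubject: filing\n\n"
    "FERC asked about the pipeline tariff.",
]

index = build_expertise_index(SAMPLE_EMAILS)
print(whom_to_ask(index, "pipeline"))
```

Raw mention counts are a crude proxy for expertise. A fuller version would probably normalize by how much each person writes overall and ignore quoted or forwarded text.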

Read the post at
