Imagine we would like to know who inside our company is the best person to ask about a given subject: a potential mentor. One way would be to infer each person's speciality from their main body of written work, their emails.
If we lived in another world in which privacy were not an obvious concern (or if we worked at Google), reading other people's email would be totally kosher. In the normal, privacy-compliant world, this remains a purely academic exercise.
However, we do have access to a publicly released corpus of emails to work with: the Enron email dataset.
When I first approached this subject, my first idea was to use a named entity recognizer (NER), because if one were designing a recommender system for an energy company, one of the use cases would be to suggest whom to ask about a very specific technical issue. At the time, I found spaCy to have a nice NER for Python.
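To make the "whom to ask" use case concrete, here is a minimal sketch of how entity mentions could be turned into a suggestion. It rests on several assumptions: the `TECHNICAL_TERMS` set is a hypothetical word list standing in for a real NER model such as spaCy's, the sample emails and addresses are invented, and the helper names (`extract_entities`, `build_expertise_index`, `suggest_expert`) are illustrative only. Counting mentions per sender is just one simple way to rank people.

```python
from collections import Counter, defaultdict

# Hypothetical word list standing in for a trained NER model.
TECHNICAL_TERMS = {"pipeline", "turbine", "transformer"}


def extract_entities(text):
    """Toy stand-in for NER: return the known terms found in the text."""
    tokens = (tok.strip(".,;:!?()\"'").lower() for tok in text.split())
    return [tok for tok in tokens if tok in TECHNICAL_TERMS]


def build_expertise_index(emails):
    """Map each entity to a Counter of the senders who mention it."""
    index = defaultdict(Counter)
    for sender, body in emails:
        for entity in extract_entities(body):
            index[entity][sender] += 1
    return index


def suggest_expert(index, entity):
    """Return the sender who mentions the entity most often, or None."""
    counts = index.get(entity.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Invented sample data: (sender, body) pairs.
emails = [
    ("alice@example.com", "The turbine inspection is done. Turbine output looks normal."),
    ("bob@example.com", "Can someone check the pipeline pressure?"),
    ("alice@example.com", "Pipeline maintenance is scheduled."),
    ("bob@example.com", "Pipeline valve replaced; pipeline is back online."),
]

index = build_expertise_index(emails)
print(suggest_expert(index, "pipeline"))
```

With a real NER model, only `extract_entities` would need to change; the index and the lookup could stay the same.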