ClassMail

Classmail is a mail classification Python library optimized for french mails in the field of insurance. Classmail was created to automate mail classification workflow in quick experiments. Developped during my internship at [Covéa](https://www.covea.eu).

Classmail provides:

Data visualisation: For quick data analysis, based on matplotlib and seaborn
Mails preprocessing (cleaning): Optimised for inasurrance purposes, with prebuilt regular expressions (in french). This configuration file can be adapted for other languages or purposes.
Deep learning model creation (for classification): Simple interface to build Pytorch models quickly based on Flair nlp library.
Model analysis and explainer Simple interface with prebuilt seaborn graphs and model explainer based on Lime.

Quick Start

Requirements and Installation

The project is based on Python 3.7+. If you do not have Python 3.6, install it first. Then, in your favorite virtual environment, simply do:

pip install classmail

Example Usage

Let's run named entity recognition (NER) over an example sentence. All you need to do is make a Sentence, load a pre-trained model and use it to predict tags for the sentence:

Data analysis

import classmail.data_visualisation.data_visualisation as dv

# show class balancing graph
dv.plot_class_balancing(data,col_text='header_body',col_label="COMPETENCE", title="Catégories des mails")
#show most frequent bigrams
dv.plot_word_frequencies(data['message'],ngram=2,words_nb=20)
#plot a wordcloud with most frequent terms in body
dv.plot_wordcloud(data['body'])

Cleaning

from classmail.nlp.cleaning import clean_mail

#create a new column in data ("clean_text") with preprocessed header and body
data = clean_mail(data,"body","header")

I defined my own conf.json :

from classmail.nlp.configjson import ConfigJsonReader as conf

conf.set_config_path(file_path='path/to/my/conf.json')
conf_dict = conf.get_config_file()
print(conf_dict)  # will print the json.

Model creation and training

from classmail.classification.trainer import Trainer

trainer = Trainer()
#generate train / test / val csv files
trainer.prepare_data(data, col_text="clean_text",col_label="COMPETENCE", train_size=0.7, val_size=0.15, test_size=0.15)

#create a new column in data ("clean_text") with preprocessed header and body
data = clean_mail(data,"body","header")

#train a model with default parameters
trainer.train_model(model_name="default_model")

Model predictions, evaluation and explaination

from classmail.classification.model import Model

#load our model, saved in "ressources" folder
model = Model("ressources\\model_default")
#predictions
predictions=model.get_predictions(X_test)
#confusion matrix
model.plot_confusion_matrix(pred_labels=predictions, true_labels=y_test)
#explain one exemple at index 110
model.visualize_one_ex(X_test,y_test,index=110,num_features=6)
#compute most discriminants words in each category
sorted_contributions = model.get_statistical_explanation(X_test, ["class 1","class 2","class 3"] sample_size=15)
#plot them for first class
model.plot_discriminant_words(sorted_contributions, "class 1", nb_words=15)

Tutorial

Here is a more complete usage exemple for the mail classification task. Data cannot be provided for legislation and privacy matters.

Tutorial : General workflow

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
classmail		classmail
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Tutorial.ipynb		Tutorial.ipynb
classmail_icon.png		classmail_icon.png
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ClassMail

Quick Start

Requirements and Installation

Example Usage

I defined my own conf.json :

Tutorial

About

Releases

Packages

Languages

License

Fabien-Couthouis/ClassMail

Folders and files

Latest commit

History

Repository files navigation

ClassMail

Quick Start

Requirements and Installation

Example Usage

I defined my own conf.json :

Tutorial

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages