Multi Text Classification
Latest commit 5ee4cd9 May 2, 2019

Multi-class text classification and an LDA-based topic Recommender System

Here is my winning strategy for carrying out a multi-class text classification task.

Data Source:

1 - Text Mining

  • Word Frequency Plot: Compare frequencies across different texts and quantify how similar and different these sets of word frequencies are using a correlation test. How correlated are the word frequencies between text1 and text2, and between text1 and text3?

  • Most discriminant and important words per category

  • Relationships between words & pairwise correlations: examine which words tend to follow each other immediately, or tend to co-occur within the same documents.

Which word is associated with another word? Note that this is a visualization of a Markov chain, a common model in text processing. In a Markov chain, each choice of word depends only on the previous word. In this case, a random generator following this model might spit out “collect”, then “agency”, then “report/credit/score”, by following each word to the most common words that follow it. To make the visualization interpretable, we chose to show only the most common word to word connections, but one could imagine an enormous graph representing all connections that occur in the text.
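The "follow each word to the most common word that follows it" step can be sketched in a few lines of plain Python; the corpus below is a made-up toy example:

```python
from collections import Counter, defaultdict

def bigram_followers(tokens):
    """Count, for each word, which words immediately follow it."""
    followers = defaultdict(Counter)
    for w1, w2 in zip(tokens, tokens[1:]):
        followers[w1][w2] += 1
    return followers

tokens = ("the agency did not report the credit score "
          "the agency did report the score").split()
followers = bigram_followers(tokens)

# Most common word following "the", as in a first-order Markov chain
print(followers["the"].most_common(1))  # [('agency', 2)]
```

A random generator over this table would repeatedly sample a follower of the current word, exactly the Markov-chain behaviour described above.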

  • Distribution of words: show that all texts share a similar distribution, with many words that occur rarely and few words that occur frequently. This is the goal of Zipf's Law (extended with the harmonic mean) - Zipf's Law is a statistical distribution in certain data sets, such as words in a linguistic corpus, in which the frequency of a word is inversely proportional to its rank.

  • Spelling variants of a given word

  • Chi-Square to see which words are associated with each category: find the terms that are the most correlated with each of the categories

  • Part-of-Speech tags and frequency distribution of POS tags: Noun Count, Verb Count, Adjective Count, Adverb Count and Pronoun Count

  • Metrics of words:
      ◦ Word Count – total number of words in the document
      ◦ Character Count – total number of characters in the document
      ◦ Average Word Density – average length of the words used in the document
      ◦ Punctuation Count – total number of punctuation marks in the document
      ◦ Upper Case Count – total number of upper-case words in the document
      ◦ Title Word Count – total number of title-case words in the document
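These surface metrics need nothing beyond plain Python string operations; a rough sketch, with a made-up sample sentence:

```python
import string

def doc_metrics(text):
    """Surface metrics of a document: word, character, punctuation,
    upper-case and title-case counts, plus average word length."""
    words = text.split()
    return {
        "word_count": len(words),
        "char_count": len(text),
        "avg_word_density": sum(len(w) for w in words) / max(len(words), 1),
        "punctuation_count": sum(ch in string.punctuation for ch in text),
        "upper_case_count": sum(w.isupper() for w in words),
        "title_word_count": sum(w.istitle() for w in words),
    }

print(doc_metrics("The Credit Agency sent a REPORT, twice!"))
```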

2 - Word Embedding

A - Frequency Based Embedding

  • Count Vector
  • TF IDF
  • Co-Occurrence Matrix with a fixed context window (SVD)
  • TF-ICF
  • Function Aware Components
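The first two frequency-based schemes can be hand-rolled in a few lines (using the plain idf = ln(N/df) variant; the three toy documents are made up):

```python
import math
from collections import Counter

docs = [["credit", "report", "score"],
        ["credit", "card", "debt"],
        ["loan", "report", "credit"]]

# Count vector: raw term frequency per document
tf = [Counter(doc) for doc in docs]

# Document frequency and plain IDF: idf(t) = ln(N / df(t))
N = len(docs)
df = Counter(t for doc in docs for t in set(doc))
idf = {t: math.log(N / df[t]) for t in df}

# TF-IDF weight of each term in each document
tfidf = [{t: c * idf[t] for t, c in counts.items()} for counts in tf]

# "credit" appears in every document, so its weight is zero everywhere
print(tfidf[0]["credit"])  # 0.0
```

Library implementations (e.g. scikit-learn's TfidfVectorizer) add smoothing and normalization on top of this basic scheme.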

B - Prediction Based Embedding

  • CBOW (word2vec)
  • Skip-Grams (word2vec)
  • GloVe
  • FastText (at character level)
  • Topic Model as features // LDA features
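CBOW and skip-grams differ only in the orientation of the training pairs: CBOW predicts the centre word from its context, skip-gram predicts each context word from the centre. A sketch of the pair-generation step, on made-up tokens:

```python
def training_pairs(tokens, window=2):
    """Generate word2vec-style training pairs: CBOW pairs are
    (context, target); skip-gram pairs are (target, context_word)."""
    cbow, skipgram = [], []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        cbow.append((context, target))
        skipgram.extend((target, c) for c in context)
    return cbow, skipgram

cbow, skipgram = training_pairs(["credit", "score", "report", "agency"], window=1)
print(cbow[1])       # (['credit', 'report'], 'score')
print(skipgram[:2])  # [('credit', 'score'), ('score', 'credit')]
```

The actual embeddings are then learnt by training a shallow network over these pairs.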


Visualization provides a global view of the topics (and how they differ from each other), while at the same time allowing for a deep inspection of the terms most highly associated with each individual topic. It includes a novel method for choosing which terms to present to a user to aid in the task of topic interpretation, in which we define the relevance of a term to a topic.
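This is the LDAvis relevance metric of Sievert & Shirley (2014): relevance(w, t) = λ · log p(w|t) + (1 − λ) · log(p(w|t) / p(w)), trading off a term's topic probability against its lift. A direct transcription, with made-up probabilities:

```python
import math

def relevance(p_w_given_t, p_w, lam=0.6):
    """LDAvis relevance of term w to topic t:
    lam * log p(w|t) + (1 - lam) * log lift, where lift = p(w|t) / p(w)."""
    return lam * math.log(p_w_given_t) + (1 - lam) * math.log(p_w_given_t / p_w)

# A term that is frequent in the topic but rare overall ranks higher
# than one that is equally frequent in the topic and in the corpus.
print(relevance(p_w_given_t=0.05, p_w=0.001))
print(relevance(p_w_given_t=0.05, p_w=0.05))
```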

C - Poincaré Embedding [Embeddings and Hyperbolic Geometry]

The main innovation here is that these embeddings are learnt in hyperbolic space, as opposed to the commonly used Euclidean space. The reason behind this is that hyperbolic space is more suitable for capturing any hierarchical information inherently present in the graph. Embedding nodes into a Euclidean space while preserving the distance between the nodes usually requires a very high number of dimensions.
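The distance in the Poincaré ball model, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))), can be written down directly; the example points below are arbitrary:

```python
import math

def poincare_distance(u, v):
    """Distance between two points inside the unit ball, in the
    Poincare ball model of hyperbolic space."""
    sq = lambda xs: sum(x * x for x in xs)
    diff = sq(ui - vi for ui, vi in zip(u, v))
    return math.acosh(1 + 2 * diff / ((1 - sq(u)) * (1 - sq(v))))

# Near the boundary of the ball, small Euclidean gaps become large
# hyperbolic distances - which is what lets leaves of a hierarchy
# spread out there while staying close to their ancestors.
print(poincare_distance([0.0, 0.0], [0.5, 0.0]))
print(poincare_distance([0.9, 0.0], [0.95, 0.0]))
```

Gensim ships a trainable implementation of these embeddings as `gensim.models.poincare.PoincareModel`.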

Learning representations of symbolic data such as text, graphs and multi-relational data has become a central paradigm in machine learning and artificial intelligence. For instance, word embeddings such as WORD2VEC, GLOVE and FASTTEXT are widely used for tasks ranging from machine translation to sentiment analysis.

Typically, the objective of embedding methods is to organize symbolic objects (e.g., words, entities, concepts) in a way such that their similarity in the embedding space reflects their semantic or functional similarity. For this purpose, the similarity of objects is usually measured either by their distance or by their inner product in the embedding space. For instance, Mikolov et al. embed words in Rd such that their inner product is maximized when words co-occur within similar contexts in text corpora. This is motivated by the distributional hypothesis, i.e., that the meaning of words can be derived from the contexts in which they appear.

3 - Algorithms

A - Traditional Methods

  • CountVectorizer + Logistic
  • CountVectorizer + NB
  • CountVectorizer + LightGBM
  • HashingTF + IDF + Logistic Regression
  • TFIDF + NB
  • TFIDF + LightGBM
  • TF-IDF + SVM
  • Hashing Vectorizer + Logistic
  • Hashing Vectorizer + NB
  • Hashing Vectorizer + LightGBM
  • Bagging / Boosting
  • Word2Vec + Logistic
  • Word2Vec + LightGBM
  • Word2Vec + XGBoost
  • LSA + SVM

B - Deep Learning Methods

  • GRU + Attention Mechanism
  • CNN + RNN + Attention Mechanism
  • CNN + LSTM/GRU + Attention Mechanism

4 - Explainability

Goal: explain predictions of arbitrary classifiers, including text classifiers (when it is hard to get exact mapping between model coefficients and text features, e.g. if there is dimension reduction involved)
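As a toy illustration of the model-agnostic idea (LIME additionally fits a local linear surrogate model over many random perturbations), here is a leave-one-word-out (occlusion) importance sketch; `toy_predict` is a made-up stand-in for a trained classifier's class probability:

```python
def explain_by_occlusion(predict, words):
    """Model-agnostic word importance: the drop in the predicted score
    when each word is removed from the input."""
    base = predict(words)
    return {w: base - predict(words[:i] + words[i + 1:])
            for i, w in enumerate(words)}

# Made-up stand-in for a trained classifier's probability for one class
def toy_predict(words):
    return 0.9 if "refund" in words else 0.2

print(explain_by_occlusion(toy_predict, ["please", "send", "refund"]))
```

No access to model coefficients is needed, which is exactly why such methods work even when dimension reduction sits between the text and the classifier.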

  • Lime
  • Skater
  • Shap

5 - MyApp: multi-class text classification with Attention mechanism

6 - Resources / Bibliography

7 - Other Topics - Text Similarity [Word Mover's Distance]

Others [Quora Dataset]:

8 - Other Topics - Topic Modeling LDA

9 - Variational Autoencoder
