A cross-lingual text classification model trained on a single language that can predict labels for documents in a different language without machine translation.

Cross-Lingual Text Classification using MUSE Embeddings and Deep Learning


In this project, a text-classification model (sentiment analysis) is trained on English text using Facebook's MUSE embeddings. The same model can then classify text in other languages without machine translation to English and without retraining.

Models Trained

  1. Simple Dense Network, Input -> Average of Token Embeddings
  2. LSTM Network, Input -> Document Embeddings as a Sequence
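The two input representations listed above can be sketched as follows. This is a minimal illustration, not the notebook's code: the helper names `average_embedding` and `embedding_sequence`, the zero-padding scheme, and the toy 2-dimensional vectors are all assumptions made for the example. Because MUSE vectors for different languages share one space, a German document and an English document with similar content should end up with nearby vectors.

```python
from typing import Dict, List

Vector = List[float]


def average_embedding(tokens: List[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Method 1 input (sketch): mean of the vectors of all in-vocabulary tokens."""
    total = [0.0] * dim
    found = 0
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:  # skip out-of-vocabulary tokens
            continue
        found += 1
        for i, value in enumerate(vec):
            total[i] += value
    if found == 0:  # no known token: return the zero vector
        return total
    return [t / found for t in total]


def embedding_sequence(tokens: List[str], embeddings: Dict[str, Vector],
                       dim: int, max_len: int) -> List[Vector]:
    """Method 2 input (sketch): fixed-length sequence of token vectors, zero-padded."""
    seq = [embeddings[t] for t in tokens if t in embeddings][:max_len]
    seq += [[0.0] * dim for _ in range(max_len - len(seq))]
    return seq


# Toy vectors standing in for aligned English and German embeddings (hypothetical values).
toy = {
    "good": [1.0, 0.0], "gut": [0.9, 0.1],
    "book": [0.0, 1.0], "buch": [0.1, 0.9],
}
en_doc = average_embedding(["good", "book"], toy, dim=2)
de_doc = average_embedding(["gut", "buch", "unbekannt"], toy, dim=2)
lstm_input = embedding_sequence(["good", "unknown"], toy, dim=2, max_len=3)
```

In this toy setup `en_doc` and `de_doc` are nearly identical, which is the property that lets a classifier trained on English vectors be applied to German or French input.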


Model Trained on - English

Method 1 - Vector Average

  1. German - 70.5
  2. French - 72.45

Method 2 - LSTM Network (can be improved by tuning the input and the network)

  1. German - 70.15
  2. French - 67.9


Amazon book reviews in English, French, and German (included in the repo).

  • Training - 2000 records
  • Testing - 2000 records
  • Ratings are used for labelling the records
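As an illustration of deriving labels from ratings, here is a minimal sketch. The README does not state the mapping, so the thresholds below (4 and above is positive, 2 and below is negative, 3 is dropped) and the function name `rating_to_label` are assumptions, not the project's actual rule.

```python
from typing import List, Optional, Tuple


def rating_to_label(rating: float) -> Optional[int]:
    """Map a star rating to a binary sentiment label (assumed thresholds)."""
    if rating >= 4:
        return 1  # positive
    if rating <= 2:
        return 0  # negative
    return None   # neutral rating: left unlabelled in this sketch


# Hypothetical records: (review text, star rating)
records: List[Tuple[str, int]] = [
    ("Great read", 5),
    ("Average", 3),
    ("Waste of time", 1),
]

labelled = []
for text, rating in records:
    label = rating_to_label(rating)
    if label is not None:
        labelled.append((text, label))
```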

Muse Embeddings

Download the English (en), French (fr), and German (de) embeddings:

# English MUSE embeddings
curl -o data/wiki.en.vec
# French MUSE Wikipedia embeddings
curl -o data/
# German MUSE Wikipedia embeddings
curl -o data/
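Once downloaded, the embeddings can be read into a dictionary. The sketch below assumes the files use the plain-text `.vec` layout (a header line with the word count and dimension, then one word and its vector per line); `load_vec` and its `max_words` parameter are hypothetical names introduced for this example.

```python
import io
from typing import Dict, List, TextIO


def load_vec(handle: TextIO, max_words: int = 200000) -> Dict[str, List[float]]:
    """Read word vectors from a text-format .vec stream (assumed layout)."""
    header = handle.readline().split()
    dim = int(header[1])  # header line: "<word count> <dimension>"
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:  # skip malformed lines
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if len(vectors) >= max_words:  # cap memory use
            break
    return vectors


# In-memory sample standing in for a real file (values are made up).
sample = io.StringIO("2 3\nbook 0.1 0.2 0.3\nlivre 0.1 0.2 0.4\n")
vecs = load_vec(sample)
```

For a real file, the same function can be called on an open handle, for example `open("data/wiki.en.vec", encoding="utf-8")`.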

Dependencies - Python 3.6

  1. TensorFlow
  2. Keras
  3. scikit-learn
  4. nltk
  5. pandas