Cross-lingual text classification: a model trained on a single language that can predict labels for documents in a different language without machine translation.
amazon-dataset
LICENSE
README.md
cross-lingual-text-classification-average-vector.ipynb
cross-lingual-text-classification-lstm.ipynb
util.py

Cross-Lingual Text Classification using MUSE Embeddings and Deep Learning

Information

In this project, a text classification model (sentiment analysis) is trained on English text represented with Facebook's MUSE embeddings. Because the MUSE embeddings for different languages are aligned in a single shared vector space, the same model can classify text in another language without machine translation into English and without retraining.
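To make the shared-space idea concrete, here is a minimal Python sketch of nearest-neighbour lookup across languages. The vocabularies EN and FR and their 3-dimensional vectors are hypothetical toy values, not real MUSE vectors (which are 300-dimensional). The point is only that, in an aligned space, a French word's vector lies closest to its English translation, so a classifier that learned to read English vectors can be fed French ones.

```python
import math

# Hypothetical toy vectors standing in for aligned MUSE embeddings.
# Real MUSE vectors are 300-d; these values are invented for illustration.
EN = {
    "good": [0.9, 0.1, 0.0],
    "bad": [-0.8, 0.2, 0.1],
    "book": [0.0, 0.1, 0.95],
}
FR = {
    "bon": [0.85, 0.15, 0.05],
    "mauvais": [-0.75, 0.25, 0.05],
    "livre": [0.05, 0.05, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(vec, vocab):
    """Return the word in `vocab` whose vector has the highest cosine similarity to `vec`."""
    return max(vocab, key=lambda w: cosine(vec, vocab[w]))


# Each French word maps to its English translation in the shared space.
for word, vec in FR.items():
    print(word, "->", nearest(vec, EN))  # bon -> good, mauvais -> bad, livre -> book
```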

Models Trained

  1. Simple dense network. Input: the average of the document's token embeddings
  2. LSTM network. Input: the document's token embeddings as a sequence
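A rough sketch of how the two inputs could be built from token embeddings follows. It assumes whitespace tokenization, a dictionary mapping words to vectors, skipping of out-of-vocabulary tokens, and zero post-padding for the LSTM sequence. These choices, the toy EMB table and the helper names average_vector and sequence_input are hypothetical and may differ from what the notebooks actually do (the project lists nltk as a dependency, so its tokenization likely differs).

```python
# Hypothetical toy embedding table; real MUSE vectors are 300-d.
EMB = {
    "good": [1.0, 0.0, 0.0],
    "bad": [-1.0, 0.0, 0.0],
    "book": [0.0, 0.0, 1.0],
}
DIM = 3


def average_vector(tokens, emb, dim):
    """Method 1 input: element-wise mean of the embeddings of known tokens.

    Out-of-vocabulary tokens are skipped; if none are known, a zero vector
    is returned (an assumed OOV policy).
    """
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def sequence_input(tokens, emb, dim, max_len):
    """Method 2 input: one embedding per known token, truncated to max_len
    and zero-padded at the end (post-padding is an assumption)."""
    seq = [emb[t] for t in tokens if t in emb][:max_len]
    seq += [[0.0] * dim for _ in range(max_len - len(seq))]
    return seq


doc = "good book xyz".split()  # "xyz" is out of vocabulary
print(average_vector(doc, EMB, DIM))  # [0.5, 0.0, 0.5]
print(sequence_input(doc, EMB, DIM, max_len=4))
```

The averaged vector has a fixed size regardless of document length, which suits a dense network; the padded sequence keeps word order, which the LSTM can exploit.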

Results

Model trained on: English

Method 1 - Vector Average

  1. German - 70.5
  2. French - 72.45

Method 2 - LSTM Network (can be improved by tuning the input and the network)

  1. German - 70.15
  2. French - 67.9

Dataset

Amazon book reviews in English, French and German (included in the repo).

  • Training - 2000 records
  • Testing - 2000 records
  • Ratings are used to label the records
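The README does not say exactly how ratings become labels. A plausible sketch, assuming 1 to 5 star ratings and binary sentiment, is below; the thresholds (4 and above positive, 2 and below negative, 3 dropped) and the rating_to_label name are hypothetical.

```python
from typing import Optional


def rating_to_label(rating: float) -> Optional[int]:
    """Map a 1-5 star rating to a binary sentiment label.

    Hypothetical thresholds: >= 4 is positive (1), <= 2 is negative (0),
    anything in between returns None so the record can be dropped.
    """
    if rating >= 4:
        return 1
    if rating <= 2:
        return 0
    return None


reviews = [("Great read", 5.0), ("Boring", 1.0), ("It was fine", 3.0)]
labelled = [(text, rating_to_label(r)) for text, r in reviews]
labelled = [(text, y) for text, y in labelled if y is not None]  # drop neutral reviews
print(labelled)  # [('Great read', 1), ('Boring', 0)]
```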

Muse Embeddings

Download the English (en), French (fr) and German (de) MUSE embeddings:

mkdir -p data
# English MUSE Wikipedia embeddings
curl -o data/wiki.en.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.en.vec
# French MUSE Wikipedia embeddings
curl -o data/wiki.fr.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.fr.vec
# German MUSE Wikipedia embeddings
curl -o data/wiki.de.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.de.vec
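Each downloaded .vec file is plain text: a header line with the vocabulary size and the vector dimension, then one word followed by its vector components per line. A minimal standard-library loader is sketched below; load_vec and its max_words cap are hypothetical names, not taken from util.py.

```python
import io
from typing import Dict, List, TextIO


def load_vec(f: TextIO, max_words: int = 200000) -> Dict[str, List[float]]:
    """Parse a fastText-style .vec stream into {word: vector}.

    The first line holds '<vocab_size> <dim>'; every following line holds a
    word and its dim float components separated by spaces. Reading stops after
    max_words entries; lines with the wrong number of components are skipped.
    """
    dim = int(f.readline().split()[1])
    emb: Dict[str, List[float]] = {}
    for line in f:
        if len(emb) >= max_words:
            break
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim or word in emb:
            continue  # malformed line or duplicate word: keep the first copy
        emb[word] = [float(v) for v in values]
    return emb


# With a real file (after running the curl commands above):
#   with open("data/wiki.fr.vec", encoding="utf-8") as f:
#       fr = load_vec(f, max_words=50000)
sample = io.StringIO("2 3\nbon 0.1 0.2 0.3\nlivre 0.4 0.5 0.6\n")
vecs = load_vec(sample)
print(sorted(vecs))  # ['bon', 'livre']
```

Capping max_words keeps memory manageable, since only the most frequent words (listed first in the file) are usually needed.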

Dependencies - Python 3.6

  1. TensorFlow
  2. Keras
  3. scikit-learn
  4. nltk
  5. pandas