Cross-Lingual Text Classification using MUSE Embeddings and Deep Learning

Information

In this project, a text-classification model (sentiment analysis) is trained using Facebook's MUSE embeddings for English. Because MUSE embeddings for different languages are aligned in a shared vector space, the same model can classify text in other languages without machine translation to English and without retraining.

Models Trained

Simple Dense Network, Input -> Average of Token Embeddings
LSTM Network, Input -> Document Embeddings as a Sequence
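
The two input representations listed above can be sketched in plain Python. This is only an illustration, not the repository's code: the function names, the choice to skip out-of-vocabulary tokens, the zero padding, and the `max_len` parameter are all assumptions.

```python
def average_embedding(tokens, vectors, dim):
    """Method 1 input: mean of the vectors of all in-vocabulary tokens."""
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        # Assumption: a document with no known tokens maps to the zero vector
        return [0.0] * dim
    return [sum(col) / len(found) for col in zip(*found)]


def embedding_sequence(tokens, vectors, dim, max_len):
    """Method 2 input: one vector per token, truncated or zero-padded to max_len."""
    seq = [vectors[t] for t in tokens if t in vectors][:max_len]
    seq += [[0.0] * dim for _ in range(max_len - len(seq))]
    return seq


# Toy 2-dimensional vectors for illustration only (real MUSE vectors are much larger)
toy = {"good": [1.0, 0.0], "book": [0.0, 1.0]}
tokens = ["a", "good", "book"]  # "a" is out of vocabulary here and is skipped

avg = average_embedding(tokens, toy, dim=2)
seq = embedding_sequence(tokens, toy, dim=2, max_len=3)
```

The averaged vector has a fixed size regardless of document length, which suits a dense network; the padded sequence keeps token order, which is what an LSTM consumes.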

Results

Model Trained on - English

Method 1 - Vector Average

German - 70.5
French - 72.45

Method 2 - LSTM Network (can be improved by tuning the input and the network)

German - 70.15
French - 67.9

Dataset

Amazon book reviews in English, French and German (included in the repo).

Training - 2000 records
Testing - 2000 records
Review ratings are used to label the records
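
As a rough sketch of how ratings could be turned into sentiment labels: the README does not state the actual rule, so the binary labels and the `positive_from=4` threshold below are assumptions chosen purely for illustration.

```python
def rating_to_label(rating, positive_from=4):
    """Map a star rating to a binary label (1 = positive, 0 = negative).

    The threshold is hypothetical; the project may use a different rule.
    """
    return 1 if rating >= positive_from else 0


ratings = [5, 1, 4, 2]  # made-up example ratings
labels = [rating_to_label(r) for r in ratings]
```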

Muse Embeddings

Download the English (en), French (fr) and German (de) embeddings:

# English MUSE embeddings
curl -o data/wiki.en.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.en.vec
# French MUSE Wikipedia embeddings
curl -o data/wiki.fr.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.fr.vec
# German MUSE Wikipedia embeddings
curl -o data/wiki.de.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.de.vec
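
Once downloaded, the files can be read into a word-to-vector dictionary. The sketch below assumes the `.vec` files use the word2vec text layout (a header line with the word count and dimension, then one word and its values per line); `load_vec` and `max_words` are hypothetical names, not part of the repository.

```python
import io


def load_vec(handle, max_words=None):
    """Parse a word2vec-style text file into ({word: [floats]}, dim)."""
    header = handle.readline().split()
    dim = int(header[1])  # header is assumed to be "<word count> <dimension>"
    vectors = {}
    for line in handle:
        # Split from the right so the last `dim` fields are always the values
        parts = line.rstrip().rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(v) for v in parts[1:]]
        if max_words is not None and len(vectors) >= max_words:
            break
    return vectors, dim


# Tiny in-memory stand-in for data/wiki.en.vec (toy values, not real MUSE vectors)
sample = io.StringIO("2 3\ngood 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n")
vectors, dim = load_vec(sample)
```

For a real file, the same function would be called on an open handle, for example `open("data/wiki.en.vec", encoding="utf-8")`, with `max_words` set to cap memory use if needed.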

Dependencies - Python 3.6

TensorFlow
Keras
scikit-learn
nltk
pandas
