In this project, a text-classification model (sentiment analysis) is trained on English text using Facebook's MUSE embeddings. Because MUSE embeddings for different languages are aligned in a shared vector space, the same model can classify text in other languages without machine translation to English and without retraining.
- Simple Dense Network, Input -> Average of Token Embeddings
- LSTM Network, Input -> Document Embeddings as a Sequence
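The two input representations listed above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names `average_embedding` and `embedding_sequence`, the `max_len` parameter, the zero-padding choice, and the toy two-dimensional vectors are all assumptions made for the example. Plain Python lists are used to keep it dependency-free; a real pipeline would more likely use NumPy arrays.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def average_embedding(tokens: Sequence[str],
                      embeddings: Dict[str, Vector],
                      dim: int) -> Vector:
    """Method 1 input (sketch): mean of the vectors of all in-vocabulary tokens."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # No known token: fall back to a zero vector (an assumption).
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def embedding_sequence(tokens: Sequence[str],
                       embeddings: Dict[str, Vector],
                       dim: int,
                       max_len: int) -> List[Vector]:
    """Method 2 input (sketch): one vector per known token,
    truncated or zero-padded at the end to exactly max_len steps."""
    vectors = [embeddings[t] for t in tokens if t in embeddings][:max_len]
    padding = [[0.0] * dim for _ in range(max_len - len(vectors))]
    return vectors + padding


# Toy aligned vocabularies (hypothetical values): translations share a vector,
# which is the property that lets an English-trained model read French input.
toy_en = {"good": [1.0, 0.0], "book": [0.0, 1.0]}
toy_fr = {"bon": [1.0, 0.0], "livre": [0.0, 1.0]}

doc_en = average_embedding(["good", "book"], toy_en, dim=2)
doc_fr = average_embedding(["bon", "livre"], toy_fr, dim=2)
seq_en = embedding_sequence(["good", "unknown"], toy_en, dim=2, max_len=3)
```

In this toy setup `doc_en` and `doc_fr` are identical, so a classifier trained on the English vector would produce the same prediction for the French one. Real MUSE vectors are only approximately aligned, which is consistent with the lower-than-monolingual scores reported for German and French.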
Method 1 - Vector Average
- German - 70.5
- French - 72.45
Method 2 - LSTM Network (can likely be improved by tuning the input representation and the network)
- German - 70.15
- French - 67.9
Amazon book reviews in English, French, and German (included in the repo).
- Training - 2000 records
- Testing - 2000 records
- Ratings are used for labelling the records
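The README does not say how ratings are mapped to sentiment labels, so the following is only one plausible scheme, shown as a sketch. The function name `rating_to_label`, the 1 to 5 star scale, and the thresholds (4 and above positive, 2 and below negative, 3 discarded as neutral) are all assumptions and may differ from what the repository does.

```python
from typing import Optional


def rating_to_label(rating: float) -> Optional[int]:
    """Map a star rating to a binary sentiment label (hypothetical thresholds).

    Returns 1 for positive, 0 for negative, and None for neutral reviews
    that would be dropped from the dataset under this scheme.
    """
    if rating >= 4:
        return 1
    if rating <= 2:
        return 0
    return None  # 3-star reviews are treated as neutral (an assumption)


# Example: keep only the reviews that receive a label.
ratings = [5, 1, 3, 4, 2]
labels = [label for label in map(rating_to_label, ratings) if label is not None]
```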
Download the English (en), French (fr), and German (de) embeddings:
# English MUSE Wikipedia embeddings
curl -o data/wiki.en.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.en.vec
# French MUSE Wikipedia embeddings
curl -o data/wiki.fr.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.fr.vec
# German MUSE Wikipedia embeddings
curl -o data/wiki.de.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.de.vec
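Once downloaded, the files can be read into a word-to-vector dictionary. The sketch below assumes the `.vec` files follow the word2vec text layout (a header line with the vocabulary size and dimension, then one word followed by its space-separated floats per line). The function name `load_vec` and the `max_words` cap are hypothetical and not taken from the repository.

```python
import io
from typing import Dict, List, TextIO


def load_vec(handle: TextIO, max_words: int = 200000) -> Dict[str, List[float]]:
    """Read word vectors from an open text-format .vec file (sketch)."""
    header = handle.readline().split()
    dim = int(header[1])  # header is assumed to be "<vocab_size> <dimension>"
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines rather than failing
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if len(vectors) >= max_words:
            break  # cap memory use; the files are frequency-sorted in practice
    return vectors


# Tiny in-memory stand-in for a real file, to show the expected layout.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
toy_vectors = load_vec(sample)
```

For the real files, the same function could be called as `with open("data/wiki.en.vec", encoding="utf-8") as f: en_vectors = load_vec(f)`, and likewise for `data/wiki.fr.vec` and `data/wiki.de.vec`.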
- TensorFlow
- Keras
- scikit-learn
- nltk
- pandas