GitHub - komoku/bridge2vec: bridge2vec: a modification of fastText proposed in the paper "Towards robust word embeddings for noisy texts".

Introduction

This is the code accompanying the paper "Towards robust word embeddings for noisy texts". It is an adaptation of an older version of Facebook's fastText tool.

Requirements

bridge2vec is built with a Makefile, so you need a working make and a compiler with good C++11 support (g++ 4.7.2 or newer, or clang 3.3 or newer). You also need the utf8proc library.
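
Before building, it can help to confirm that the command-line prerequisites are on your PATH. The sketch below is only an illustration: check_tools is a hypothetical helper, not part of bridge2vec, and it cannot detect utf8proc because that is a library rather than a command. Check for utf8proc through your package manager instead (on Debian-based systems the development package is typically named libutf8proc-dev, but verify this for your distribution).

```shell
# check_tools: hypothetical helper, not part of bridge2vec.
# Prints one line per tool and returns 1 if any tool is missing from PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Substitute clang++ for g++ if that is the compiler you intend to use.
check_tools make g++ || echo "install the missing tools before running make"
```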

Building bridge2vec

$ git clone https://github.com/yeraidm/bridge2vec.git
$ cd bridge2vec
$ make

Example usage

$ ./fasttext skipgram -input data.txt -output model

where data.txt is a training file containing UTF-8 encoded text. When optimization finishes, the program saves two files: model.bin and model.vec. model.vec is a text file containing the word vectors, one per line. model.bin is a binary file containing the parameters of the model along with the dictionary and all hyperparameters.
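
As a quick sanity check on the text output, the sketch below summarizes a .vec file. It assumes the layout written by upstream fastText: a first line holding the word count and the vector dimension, followed by one line per word containing the word and its vector components, separated by spaces. That layout is an assumption about this fork, and vec_summary is a hypothetical helper, not part of bridge2vec.

```shell
# vec_summary: hypothetical helper, not part of bridge2vec.
# Assumes line 1 is "<word count> <dimension>" and every later line is
# "<word> <v1> ... <vD>", as in upstream fastText .vec files.
vec_summary() {
  awk '
    NR == 1 {
      printf "header: %s words, %s dimensions\n", $1, $2
      dim = $2
      next
    }
    # A well-formed vector line has the word plus dim numeric fields.
    NF != dim + 1 { bad++ }
    END {
      lines = (NR > 0) ? NR - 1 : 0
      printf "vector lines: %d, malformed: %d\n", lines, bad + 0
    }
  ' "$1"
}

# Summarize the trained vectors if they exist in the current directory.
if [ -f model.vec ]; then
  vec_summary model.vec
fi
```

If the number of vector lines differs from the word count in the header, or any line is reported as malformed, the file was probably truncated or does not follow the assumed layout.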

For more information, see the original fastText README, included in this repository as original_readme.md.

Reference

Please cite us if you use this code in your paper:

@article{doval2019robust,
  title={Towards robust word embeddings for noisy texts},
  author={Yerai Doval and Jesús Vilares and Carlos Gómez-Rodríguez},
  journal={arXiv preprint arXiv:1911.10876},
  year={2019}
}

License

bridge2vec is BSD-licensed.

