Representation learning for tokens in sequences with neural probabilistic prior (WEP-Word-Embedding)

This is the code for the paper 'Word Embedding with Neural Probabilistic Prior', Ren et al., SDM 2024.

Representation learning for tokens in sequences remains an important problem in information retrieval, vector-representation-based databases, and related areas, and it is particularly useful when only a small training dataset is available. The proposed neural probabilistic prior can also be extended to other token representation learning problems, e.g., protein sequences and molecule sequences.
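As a rough illustration of the general idea only (not the model in this repository), an embedding objective can be augmented with a prior term whose score is produced by a small neural network and weighted by a coefficient playing the role of the -alpha flag. Everything in the sketch below, including the NeuralPrior module, the skip-gram-style task loss, and the training loop, is a hypothetical PyTorch example, not code from WEPSyn.py or WEPSem.py.

import torch
import torch.nn as nn

class NeuralPrior(nn.Module):
    """Scores how plausible an embedding vector is under a learned prior.
    In practice the prior should be a properly normalized density (e.g., a flow);
    an unconstrained scorer is used here only to keep the sketch short."""
    def __init__(self, embed_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, vectors):            # (batch, embed_dim) -> (batch,)
        return self.net(vectors).squeeze(-1)

vocab_size, embed_dim, alpha = 10_000, 300, 1.0   # alpha mirrors the -alpha flag
embeddings = nn.Embedding(vocab_size, embed_dim)
prior = NeuralPrior(embed_dim)
opt = torch.optim.Adam(list(embeddings.parameters()) + list(prior.parameters()), lr=1e-3)

def training_step(center_ids, context_ids):
    """One step: a skip-gram-style task loss plus an alpha-weighted prior term."""
    c = embeddings(center_ids)             # (batch, embed_dim)
    o = embeddings(context_ids)            # (batch, embed_dim)
    task_loss = -torch.log(torch.sigmoid((c * o).sum(-1))).mean()
    prior_loss = -prior(c).mean()          # pull embeddings toward regions the prior scores highly
    loss = task_loss + alpha * prior_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Example call with random ids, purely to show the signature:
ids = torch.randint(0, vocab_size, (256,))
print(training_step(ids, torch.randint(0, vocab_size, (256,))))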

Run the code

1. Install the required packages:

pip3 install -r requirements.txt

2. Download the dataset:

pip install gdown
gdown --id 1iFpuKFpDnXCD9QpUw8wStG3ndKl7-KwX -O data.zip
unzip data.zip
rm data.zip

3. Run WEPSyn; example command line:

python3 WEPSyn.py -name test_embeddings -alpha 1.0 -gpu 1 -dump -embed_dim 300 -batch 256

4. Run WEPSem; example command line (a sketch for inspecting the dumped embeddings follows this list):

python3 WEPSem.py -embed ./embeddings/pretrained_embed -semantic synonyms -embed_dim 300 -alpha 0.001 -name fine_tuned_embeddings -dump -gpu 5
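As a quick sanity check after training, the dumped embeddings can be loaded and inspected with a nearest-neighbor lookup. The sketch below assumes the -dump flag writes a word2vec-style text file ("word v1 v2 ... vD" per line); that file format and the example path are assumptions, so only load_embeddings() should need to change if the scripts emit a different format.

import numpy as np

def load_embeddings(path):
    """Load a text embedding file into {word: unit-normalized vector}."""
    vecs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) < 3:          # skip a possible "vocab_size dim" header line
                continue
            word, vals = parts[0], np.asarray(parts[1:], dtype=np.float32)
            vecs[word] = vals / (np.linalg.norm(vals) + 1e-8)
    return vecs

def nearest(vecs, query, k=5):
    """Return the k words with the highest cosine similarity to `query`."""
    q = vecs[query]
    scored = ((w, float(q @ v)) for w, v in vecs.items() if w != query)
    return sorted(scored, key=lambda x: -x[1])[:k]

# Example usage; the path below is hypothetical, point it at the actual dump:
# emb = load_embeddings("./embeddings/fine_tuned_embeddings.txt")
# print(nearest(emb, "good"))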

To cite the paper:

@inproceedings{ren2024word,
  title={Word Embedding with Neural Probabilistic Prior},
  author={Ren, Shaogang and Li, Dingcheng and Li, Ping},
  booktitle={Proceedings of the 2024 SIAM International Conference on Data Mining (SDM)},
  pages={896--904},
  year={2024},
  organization={SIAM}
}

This package builds on the implementation of 'Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks', Vashishth et al., ACL 2019 (https://github.com/malllabiisc/WordGCN).
