Def2Vec

The codebase for the paper "Def2Vec: Extensible Word Embeddings from Dictionary Definitions". For all the references, contributions, and credits, please refer to the paper.

This code was initially developed as part of the M.Sc. Thesis in Computer Science and Engineering "Def2Vec: a model to extract word embeddings from dictionary definitions". The M.Sc. degree was awarded by the Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB) of Politecnico di Milano (PoliMI).

Pre-trained models

We release pre-trained models learned from the English Wiktionary. The pre-trained models are compatible with Gensim's KeyedVectors API (see the example below). The models are available at the following links:

Model size | Text file | Gensim KV file
50         | download  | download
100        | download  | download
300        | download  | download

Note

Only the embedding part of the Def2Vec models is available at the moment (refer to the paper for further details).

Example

The following example shows:

  • model loading
  • word embedding extraction
  • sequence embedding extraction
import numpy as np
from nltk.tokenize import word_tokenize  # requires the NLTK 'punkt' tokenizer data
from gensim.models import KeyedVectors


# Path to the Gensim KV file
path = './def2vec_en_wikitionary_50.kv'

# Load model
def2vec = KeyedVectors.load(path)
# Embed single word
embedding = def2vec['vector']
# Embed word sequence
sequence_embedding = np.vstack([
    def2vec[token.lower()] for token in word_tokenize('Vector semantics is cool!')
])
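The sequence-embedding snippet above stacks one vector per token; a common next step is to pool those token vectors into a single sequence vector and compare embeddings with cosine similarity. The sketch below illustrates this with toy 4-dimensional vectors standing in for `def2vec[token]` lookups (the token vectors here are made-up values, not actual Def2Vec embeddings):

```python
import numpy as np

def mean_pool(sequence_embedding: np.ndarray) -> np.ndarray:
    """Average the per-token rows into a single sequence vector."""
    return sequence_embedding.mean(axis=0)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for def2vec[token] lookups (illustrative only)
tokens = {
    'vector':    np.array([1.0, 0.0, 1.0, 0.0]),
    'semantics': np.array([0.8, 0.2, 0.9, 0.1]),
    'banana':    np.array([0.0, 1.0, 0.0, 1.0]),
}

# Stack per-token vectors, as in the example above, then pool
sequence = np.vstack([tokens['vector'], tokens['semantics']])
pooled = mean_pool(sequence)

print(cosine_similarity(pooled, tokens['vector']))   # high: related token
print(cosine_similarity(pooled, tokens['banana']))   # low: unrelated token
```

With real Def2Vec vectors, replace the `tokens` dictionary with lookups on the loaded KeyedVectors model.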

For further examples, refer to the Jupyter Notebook available in this repository.

Cite work

If you use our model, please cite our work through the following BibTeX entry:

@inproceedings{morazzoni-etal-2023-def2vec,
  author    = {Irene Morazzoni and
               Vincenzo Scotti and
               Roberto Tedesco},
  title     = {{Def2Vec}: Extensible Word Embeddings from Dictionary Definitions},
  booktitle = {6th International Conference on Natural Language and Speech Processing,
               {ICNLSP} 2023, Trento, Italy, December 16-17, 2023},
  publisher = {Association for Computational Linguistics},
  year      = {2023}
}

Acknowledgments
