Register Explorer

Language registers explorer powered by word embeddings.

This is the framework behind the web service http://ltr.uio.no/embeddings/registers/ accompanying the following paper:

A. Kutuzov, A. Marakasova, and E. Kuzmenko. Exploration of register-dependent lexical semantics using word embeddings. In Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH), pp. 26-34. COLING 2016, Osaka, Japan.

https://www.clarin-d.net/images/lt4dh/pdf/LT4DH05.pdf

This source code can be easily adapted to any set of distributional models.

Installation

Clone the repository
Tune the config file dsm_genres.cfg according to your setup (especially root and url parameters)
Put your models into the models subdirectory (models can be either in gzipped binary word2vec format or in Gensim format)
Describe your models in the models.csv file; the model with the identifier "all" will serve as the reference one
NB: we presuppose that your models use words augmented with PoS tags ('boot_VERB'); if they don't, you'll have to tune the code a bit
Download the Stanford CoreNLP suite (https://stanfordnlp.github.io/CoreNLP/); we use it for linguistic processing of the queries
Run CoreNLP in the background (something like java -mx2g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer --port 9999)
Run the script word2vec_server_genres.py in the background; it loads the models, stays in memory and answers queries from the web interface
Install Gunicorn (http://gunicorn.org/)
Run the service with something like gunicorn run_explorer:app_explorer -b 127.0.0.1:10000
If your url parameter was set to "/mymodels/", your service should now be available at http://127.0.0.1:10000/mymodels/
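
To illustrate how a query could be turned into PoS-augmented keys like 'boot_VERB' with the help of the CoreNLP server from the steps above, here is a minimal sketch. It is not the code from tagger.py: the function names, the port 9999 and the coarse mapping from Penn Treebank tags to tags like VERB or NOUN are all assumptions made for this example, so adjust them to your models.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the server was started on port 9999, as in the step above.
CORENLP_URL = "http://127.0.0.1:9999"

# Assumption: a coarse Penn Treebank -> universal tag mapping by tag prefix.
# The repository's own tagger.py may map tags differently.
PTB_PREFIX_TO_UPOS = (
    ("NNP", "PROPN"),
    ("NN", "NOUN"),
    ("VB", "VERB"),
    ("JJ", "ADJ"),
    ("RB", "ADV"),
)


def build_request(text, base_url=CORENLP_URL):
    """Build (but do not send) a POST request asking for lemmas and PoS tags."""
    props = {"annotators": "tokenize,ssplit,pos,lemma", "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return urllib.request.Request(f"{base_url}/?{query}", data=text.encode("utf-8"))


def to_model_keys(corenlp_json):
    """Turn a parsed CoreNLP JSON response into 'lemma_TAG' keys."""
    keys = []
    for sentence in corenlp_json.get("sentences", []):
        for token in sentence.get("tokens", []):
            upos = next(
                (u for p, u in PTB_PREFIX_TO_UPOS if token["pos"].startswith(p)),
                None,
            )
            if upos is not None:  # tokens with unmapped tags are skipped
                keys.append(f"{token['lemma']}_{upos}")
    return keys


def tag_query(text):
    """Send the text to the running CoreNLP server and return model keys."""
    with urllib.request.urlopen(build_request(text), timeout=10) as response:
        return to_model_keys(json.load(response))
```

With the CoreNLP server running, tag_query("boots") should return keys in the same lemma_TAG form that the models are expected to use; if your models have no PoS tags, drop the tag suffix in to_model_keys.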

In case you have any questions about the code, feel free to ask us.

Our contacts can be found at http://ltr.uio.no/embeddings/registers/about

