hebrew-w2v

A complete, reproducible example of training a word2vec model for Hebrew. A similar model is used by Hebrew Semantle. If you do not want to prepare the data or train the model yourself, you can download the prepared corpus and model from here. The model is a standard gensim word2vec model, trained on a corpus that is simply the Hebrew Wikipedia dump tokenized with hebpipe, to avoid tokens like "שכשנבוא".

Data Preparation

To run the data preparation step, run pip install -r requirements_data.txt and then python data_preparation/main.py.

The Wikipedia corpus is downloaded and then split into several files. In each file, articles are separated by a line break. Tokenization must be done separately with hebpipe; if you run into issues with hebpipe, see the hebpipe homepage here. Once tokenization is done, rerun python data_preparation/main.py to create the final corpus that will be used for training.
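To inspect one of the split files before tokenizing it, a reader along the following lines may help. This is a minimal sketch and not the code in data_preparation: the function name iter_articles is hypothetical, and it assumes that the separator between articles is an empty line. If the files use a single newline per article instead, iterating over the lines directly is enough.

```python
from pathlib import Path
from typing import Iterator, List


def iter_articles(path: Path) -> Iterator[str]:
    """Yield one article at a time from a split corpus file.

    Assumption (not confirmed by the repository): articles are
    separated by an empty line. Lines belonging to the same article
    are joined with a single space.
    """
    buffer: List[str] = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped:
                # Still inside the current article.
                buffer.append(stripped)
            elif buffer:
                # Empty line: the current article is complete.
                yield " ".join(buffer)
                buffer = []
    # The last article may not be followed by an empty line.
    if buffer:
        yield " ".join(buffer)
```

Because the function is a generator, only one article is held in memory at a time, which matters for a corpus the size of a Wikipedia dump.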

Model Training

To train a model, first run pip install -r requirements_train.txt. Note that there might be some conflicts with the hebpipe package. You can then run python main.py. The code for training the model and the playground notebook were inspired by this repository.
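For orientation, training a standard gensim word2vec model on a tokenized corpus can be sketched roughly as follows. This is an illustration and not the repository's train_model.py: the class TokenizedCorpus, the function train, and all hyperparameter values are hypothetical, and the sketch assumes gensim 4.x parameter names (vector_size, epochs) and a corpus file with one whitespace-tokenized sentence per line.

```python
from pathlib import Path
from typing import Iterator, List


class TokenizedCorpus:
    """Restartable iterable over a tokenized corpus file.

    Assumption: one sentence per line, tokens separated by whitespace.
    A class is used instead of a generator because word2vec training
    iterates over the corpus several times (vocabulary pass + epochs).
    """

    def __init__(self, path) -> None:
        self.path = Path(path)

    def __iter__(self) -> Iterator[List[str]]:
        with self.path.open(encoding="utf-8") as handle:
            for line in handle:
                tokens = line.split()
                if tokens:  # skip empty lines
                    yield tokens


def train(corpus_path, model_path, vector_size=100, window=5,
          min_count=5, epochs=5):
    """Train and save a word2vec model (illustrative defaults only)."""
    # Imported here so TokenizedCorpus can be used without gensim installed.
    from gensim.models import Word2Vec

    model = Word2Vec(
        sentences=TokenizedCorpus(corpus_path),
        vector_size=vector_size,  # dimensionality of the word vectors
        window=window,            # context window size
        min_count=min_count,      # drop words rarer than this
        workers=4,
        epochs=epochs,
    )
    model.save(str(model_path))
    return model
```

After a call such as train("corpus.txt", "hebrew_w2v.model") (both file names are placeholders), the trained vectors are available under model.wv, for example through model.wv.most_similar(word, topn=10).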

License

Released under the Apache License, Version 2.0.


Iddoyadlin/hebrew-w2v
