Word2Vec In Java

https://code.google.com/archive/p/word2vec/source/default/source
A Java port of the original Word2vec C code (linked above)

Usage

Put "Input.txt" in the folder containing the source code

The contents of Input.txt are as follows
There is one document per line
All documents must be preprocessed

Preprocessing: each document should be split into words (tokens) by morphological analysis
In Eclipse, you must pass the file names as program arguments (Run - Run Configurations...)

a = input.txt, b = output.txt... That is, the names of the input and output files.
However, these names are already set in the code (Line 34, 35)

Contents of "Input.txt" after preprocessing

Document 1 : KimJunho is interested in machine learning and deep learning
Document 2 : KimJunho is interested in recruiting professional researchers

KimJunho interested machine learning deep learning
KimJunho recruiting professional researchers
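
The input format described above (one preprocessed document per line, words separated by whitespace) can be read with a few lines of Java. The following is only a minimal sketch: the class InputReaderSketch and its methods are hypothetical names and are not part of this repository, and the fallback to "Input.txt" when no argument is given is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; it is not part of this repository.
final class InputReaderSketch {

    /** Splits one preprocessed line into tokens on runs of whitespace. */
    static List<String> tokenize(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    /** Treats every non-blank line as one document. */
    static List<List<String>> parseDocuments(List<String> lines) {
        List<List<String>> docs = new ArrayList<>();
        for (String line : lines) {
            List<String> tokens = tokenize(line);
            if (!tokens.isEmpty()) {
                docs.add(tokens);
            }
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        // Assumed convention: first argument is the input file, else "Input.txt".
        Path input = Paths.get(args.length > 0 ? args[0] : "Input.txt");
        if (!Files.exists(input)) {
            System.out.println("No input file found at " + input);
            return;
        }
        List<List<String>> docs =
                parseDocuments(Files.readAllLines(input, StandardCharsets.UTF_8));
        System.out.println(docs.size() + " documents read");
    }
}
```

Skipping blank lines is a choice made for this sketch; the actual code may treat them differently.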

Main Variable Description

See Line 894 (public static class Builder)

1. cbow = false
   Selects which model to train: CBOW or skip-gram
   false : use the skip-gram model
   true : use the CBOW model

2. startingAlpha = 0.025F
   The initial learning rate
   A smaller value makes each update smaller, which tends to make training more stable but slower to converge

3. window = 5
   How many neighboring words to look at around each word during training
   The default value is 5, meaning up to 5 words on each side of the current word

4. negative = 0
   Selects how the output layer is trained, which affects training speed
   The two available methods are Hierarchical Softmax and Negative Sampling
   If 0, Hierarchical Softmax is used
   Otherwise, Negative Sampling is used; typical values are 5~10

5. minCount = 5
   Words that appear fewer than minCount times in the input are ignored
   If you want to learn every word, set minCount = 0

6. layerOneSize = 200
   The dimension of each word vector
   The default value is 200
   A higher dimension can capture more detail, but training is slower
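
To make the cbow and window settings more concrete, the sketch below builds training examples from one preprocessed document. It is an illustration only: ContextWindowSketch and its methods are hypothetical names, not this repository's API. It also assumes a fixed window, whereas the original C implementation picks a random effective window between 1 and window for each position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; the names here are hypothetical.
final class ContextWindowSketch {

    /** Words within `window` positions of `center`, excluding the center word. */
    static List<String> context(List<String> doc, int center, int window) {
        List<String> out = new ArrayList<>();
        int from = Math.max(0, center - window);
        int to = Math.min(doc.size() - 1, center + window);
        for (int i = from; i <= to; i++) {
            if (i != center) {
                out.add(doc.get(i));
            }
        }
        return out;
    }

    /** Skip-gram (cbow = false): one {center, contextWord} pair per context word. */
    static List<String[]> skipGramPairs(List<String> doc, int window) {
        List<String[]> pairs = new ArrayList<>();
        for (int c = 0; c < doc.size(); c++) {
            for (String ctx : context(doc, c, window)) {
                pairs.add(new String[] {doc.get(c), ctx});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> doc =
                Arrays.asList("KimJunho", "recruiting", "professional", "researchers");
        int window = 1;

        // CBOW (cbow = true): the whole context predicts the center word.
        for (int c = 0; c < doc.size(); c++) {
            System.out.println("CBOW      " + context(doc, c, window) + " -> " + doc.get(c));
        }
        // Skip-gram: the center word predicts each context word separately.
        for (String[] pair : skipGramPairs(doc, window)) {
            System.out.println("skip-gram " + pair[0] + " -> " + pair[1]);
        }
    }
}
```

A larger window yields more examples per word, which is one reason training slows down as window grows.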
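
The name startingAlpha suggests that the learning rate changes during training. The original C implementation lowers it linearly as more words are processed and never lets it fall below 0.0001 times the starting value. The sketch below reproduces that schedule; whether this Java port follows it exactly is an assumption, and LearningRateSketch and its parameter names are hypothetical.

```java
// Illustrative sketch of a linear learning-rate decay; names are hypothetical.
final class LearningRateSketch {

    /**
     * @param startingAlpha  initial learning rate, e.g. 0.025
     * @param wordsProcessed number of training words seen so far
     * @param totalWords     total number of words that will be trained on
     */
    static double currentAlpha(double startingAlpha, long wordsProcessed, long totalWords) {
        double alpha = startingAlpha * (1.0 - wordsProcessed / (double) (totalWords + 1));
        double floor = startingAlpha * 0.0001; // lower bound so updates never vanish
        return Math.max(alpha, floor);
    }

    public static void main(String[] args) {
        long totalWords = 1_000_000L;
        for (long seen = 0; seen <= totalWords; seen += 250_000L) {
            System.out.println(seen + " words -> alpha " + currentAlpha(0.025, seen, totalWords));
        }
    }
}
```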
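
When negative is greater than 0, each training step needs that many randomly drawn "negative" words. In the original C implementation, words are drawn with probability proportional to their count raised to the power 0.75. The sketch below shows one way to sample from such a distribution. NegativeSamplingSketch is a hypothetical name, the C code uses a precomputed lookup table rather than a cumulative array, and the check that a sample differs from the target word is omitted here.

```java
import java.util.Random;

// Illustrative sketch only; names are hypothetical.
final class NegativeSamplingSketch {

    /** Cumulative distribution over word indices, proportional to count^0.75. */
    static double[] cumulativeDistribution(int[] counts) {
        double total = 0.0;
        for (int count : counts) {
            total += Math.pow(count, 0.75);
        }
        double[] cdf = new double[counts.length];
        double running = 0.0;
        for (int i = 0; i < counts.length; i++) {
            running += Math.pow(counts[i], 0.75) / total;
            cdf[i] = running;
        }
        return cdf;
    }

    /** Draws one word index according to the cumulative distribution. */
    static int sample(double[] cdf, Random rng) {
        double u = rng.nextDouble();
        for (int i = 0; i < cdf.length; i++) {
            if (u < cdf[i]) {
                return i;
            }
        }
        return cdf.length - 1; // guards against floating-point rounding
    }

    public static void main(String[] args) {
        int[] counts = {2, 1, 1, 2}; // hypothetical word counts
        int negative = 5;            // number of negative samples per step
        double[] cdf = cumulativeDistribution(counts);
        Random rng = new Random(42);
        for (int i = 0; i < negative; i++) {
            System.out.println("negative sample: word index " + sample(cdf, rng));
        }
    }
}
```

The 0.75 exponent flattens the distribution, so rare words are drawn somewhat more often than their raw frequency would suggest.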
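
The effect of minCount can be sketched as a vocabulary filter: count every word in the input, then keep only words that occur at least minCount times. VocabFilterSketch below is a hypothetical illustration, not the repository's own vocabulary code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names are hypothetical.
final class VocabFilterSketch {

    /** Counts words across all documents and drops those seen fewer than minCount times. */
    static Map<String, Integer> buildVocab(List<List<String>> docs, int minCount) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> doc : docs) {
            for (String word : doc) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        counts.values().removeIf(count -> count < minCount);
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("KimJunho", "interested", "machine", "learning", "deep", "learning"),
                Arrays.asList("KimJunho", "recruiting", "professional", "researchers"));
        System.out.println("minCount = 0: " + buildVocab(docs, 0));
        System.out.println("minCount = 2: " + buildVocab(docs, 2));
    }
}
```

With a very small input such as the two sample documents, the default minCount = 5 would remove every word, so a lower value is needed for quick experiments.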

taki0112/Word2VecJava
