Word2Vec In Java

https://code.google.com/archive/p/word2vec/source/default/source
A Java port of the original word2vec C code.

Usage

Put "Input.txt" in the folder containing the source code.

Input.txt must be formatted as follows:
There is one document per line.
All documents must be preprocessed.

Preprocessing: each document should be split into words by morphological analysis.
In Eclipse, you must pass program arguments (Run - Run Configurations...).

a = input.txt, b = output.txt, that is, the names of the input and output files.
However, the code already sets these file names (Lines 34 and 35).
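
As a rough sketch of how the two file names could be taken from the program arguments, the snippet below reads the input name first and the output name second, and falls back to fixed names when an argument is missing. The class ArgsSketch, the method resolve and the fallback names are assumptions made for illustration; they are not taken from the repository's code.

```java
// Hypothetical helper, not part of the repository.
final class ArgsSketch {
    static final String DEFAULT_INPUT = "Input.txt";    // assumed fallback name
    static final String DEFAULT_OUTPUT = "output.txt";  // assumed fallback name

    // Returns {input, output}; a missing argument falls back to its default.
    static String[] resolve(String[] args) {
        String input = args.length > 0 ? args[0] : DEFAULT_INPUT;
        String output = args.length > 1 ? args[1] : DEFAULT_OUTPUT;
        return new String[] { input, output };
    }

    public static void main(String[] args) {
        String[] files = resolve(args);
        System.out.println("input  = " + files[0]);
        System.out.println("output = " + files[1]);
    }
}
```

With this arrangement, the Eclipse program arguments are only needed when the file names differ from the fallbacks.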

Contents of "Input.txt" after preprocessing

Original documents:
Document 1 : KimJunho is interested in machine learning and deep learning
Document 2 : KimJunho is interested in recruiting professional researchers

Input.txt after preprocessing:
KimJunho interested machine learning deep learning
KimJunho recruiting professional researchers
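
To make the expected layout concrete, here is a minimal sketch that treats each line as one document and splits it into tokens. It assumes the preprocessed words are separated by whitespace; the class CorpusSketch and the method tokenize are hypothetical names and do not appear in the repository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the repository.
final class CorpusSketch {
    // Splits each non-blank line (one document) into whitespace-separated tokens.
    static List<List<String>> tokenize(List<String> lines) {
        List<List<String>> docs = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // a blank line is not a document
            }
            docs.add(Arrays.asList(trimmed.split("\\s+")));
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "KimJunho interested machine learning deep learning",
            "KimJunho recruiting professional researchers");
        for (List<String> doc : tokenize(lines)) {
            System.out.println(doc.size() + " tokens: " + doc);
        }
    }
}
```

In practice the lines could come from Files.readAllLines(Paths.get("Input.txt")) instead of the in-memory list used here.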

Main Variable Description

See Line 894 (public static class Builder)

1. cbow = false
   Selects which model to train: CBOW or skip-gram.
   false : use the skip-gram model
   true : use the CBOW model

2. startingAlpha = 0.025F
   The initial learning rate.
   A smaller value generally gives more careful, stable updates, but training takes longer to converge.

3. window = 5
   The size of the context window, i.e. how many neighboring words are considered when learning a word.
   The default value is 5, meaning up to 5 words on each side of the current word are used.

4. negative = 0
   Selects the approximation used to speed up training.
   The two available methods are Hierarchical Softmax and Negative Sampling.
   If 0, Hierarchical Softmax is used.
   Otherwise, Negative Sampling is used with that many negative samples; typical values are 5 to 10.

5. minCount = 5
   The minimum number of times a word must occur in the corpus to be learned; rarer words are ignored.
   If you want to learn every word, set minCount = 0.

6. layerOneSize = 200
   The dimension of each word vector.
   The default value is 200.
   A higher dimension can capture finer distinctions between words, but training is slower.
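
The effect of minCount, window and startingAlpha can be sketched in a few lines. The helpers below are hypothetical and are not the repository's code. The context method uses the full window on both sides, whereas the original C implementation samples a smaller window at each position. The alphaAt method follows the linear decay with a small floor used in the original C code; that this port decays the rate the same way is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical helpers, not part of the repository.
final class ParameterSketch {
    // minCount: count every word, then drop those seen fewer than minCount times.
    static Map<String, Integer> vocabulary(List<List<String>> docs, int minCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : docs) {
            for (String word : doc) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        counts.values().removeIf(c -> c < minCount);
        return counts;
    }

    // window: the words at most `window` positions left or right of `center`.
    static List<String> context(List<String> doc, int center, int window) {
        List<String> result = new ArrayList<>();
        int from = Math.max(0, center - window);
        int to = Math.min(doc.size() - 1, center + window);
        for (int i = from; i <= to; i++) {
            if (i != center) {
                result.add(doc.get(i));
            }
        }
        return result;
    }

    // startingAlpha: linear decay over training, never below 0.01% of the start value.
    static float alphaAt(float startingAlpha, long wordsSeen, long totalWords) {
        float alpha = startingAlpha * (1 - wordsSeen / (float) (totalWords + 1));
        return Math.max(alpha, startingAlpha * 0.0001f);
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("KimJunho", "interested", "machine", "learning", "deep", "learning"),
            Arrays.asList("KimJunho", "recruiting", "professional", "researchers"));
        System.out.println("vocabulary (minCount = 2): " + vocabulary(docs, 2));
        System.out.println("context of 'machine' (window = 1): " + context(docs.get(0), 2, 1));
        System.out.println("alpha halfway through: " + alphaAt(0.025f, 5, 10));
    }
}
```

On the two sample documents, minCount = 2 keeps only the words that occur at least twice, which shows why a small corpus needs a low minCount.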
