Skip to content

taki0112/Word2VecJava

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Word2Vec In Java

Usage

  • Put "Input.txt" in the folder containing the source code
The contents of Input.txt are as follows
There is one document per line
All documents must be preprocessed
  • Preprocessing: documents should be separated by words using morphemes
  • In Eclipse, you mush give arguments (Run - Run Configurations...)

Arguments

  • a = input.txt, b = output.txt... That is, the name of the input output file.
  • but, in Code I hava set it (Line 34, 35)

Contents of "Input.txt" after preprocessing

  • Document 1 : KimJunho is interested in machine learning and deep learning
  • Document 2 : KimJunho is interested in recruiting professional researches
KimJunho isterested machine learning deep learning
KimJunho recruiting professional researchers

Main Variable Description

  • See Line 894 (public static class Builder)
1. cbow = false
   Which of the cbow and skip-gram models to learn ?
   false : use skip gram
   true : use cbow model

2. startingAlpha = 0.025F
   This is a learningrate
   The smaller the value, the more accurate the learning, but the slower the learning speed

3. window = 5
   How many words to look around when learning
   The default value is 5, meaning that you see 5 words

4. negative = 0
   It can be used to improve the efficiency of calculation speed
   Methodology has Hierarchical Softmax and Negative Sampling
   If 0, Hierarchical Softmax
   else, Negative Sampling.. default value 5~10

5. minCount = 5
   Meaning that I will only see words from at least a few words in the document
   If you want to learn every word, minCount = 0

6. layerOneSize = 200
   Mean dimension of word vector
   default value is 200
   The higher the dimension, the more precise it is, but the learning speed is slower

About

Word2Vec In Java (2013 google word2vec opensource)

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages