Interesting problem. I think this is a documentation issue. I can't test it right now, but I suspect this happens because `bundle_ngrams` simply runs the bigram code multiple times. If "so_that" and "they_can" are both identified as common bigrams in the first pass, they can be joined in the second pass into a single four-word token, even though `bundle_ngrams = 3` implies phrases of at most three words.
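A minimal sketch of that two-pass mechanism, assuming (as suspected above) that `bundle_ngrams = 3` amounts to running a bigram pass twice. `bundle_pass` is a hypothetical helper, not part of wordVectors, and it joins pairs on a raw count threshold rather than the package's actual phrase score, so it only illustrates how bigrams can chain into a four-word token.

```r
# Hypothetical helper, not part of wordVectors: one greedy left-to-right pass
# that joins two adjacent tokens with "_" when that pair occurs at least
# `min_count` times. A raw count stands in for the package's phrase score.
bundle_pass <- function(tokens, min_count) {
  n <- length(tokens)
  if (n < 2) return(tokens)
  # Count every adjacent pair, e.g. "so" + "that" -> "so_that"
  pairs <- paste(tokens[-n], tokens[-1], sep = "_")
  counts <- table(pairs)
  frequent <- names(counts)[counts >= min_count]
  out <- character(0)
  i <- 1
  while (i <= n) {
    if (i < n && paste(tokens[i], tokens[i + 1], sep = "_") %in% frequent) {
      # Join the pair and skip past both tokens, so one pass never chains
      out <- c(out, paste(tokens[i], tokens[i + 1], sep = "_"))
      i <- i + 2
    } else {
      out <- c(out, tokens[i])
      i <- i + 1
    }
  }
  out
}

# Toy corpus: "so that they can" repeated three times, separated by "go"
tokens <- rep(c("so", "that", "they", "can", "go"), 3)

pass1 <- bundle_pass(tokens, min_count = 3)  # first pass: word bigrams
pass2 <- bundle_pass(pass1, min_count = 3)   # second pass: bigrams of bigrams

print(unique(pass1))
print(unique(pass2))
# Largest number of original words inside one bundled token
print(max(lengths(strsplit(pass2, "_"))))
```

In this toy run, `pass1` contains only two-word tokens ("so_that", "they_can"), while `pass2` contains "so_that_they_can": no single pass ever joins more than two tokens, but the second pass treats each first-pass bigram as one token.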
On Fri, Jul 20, 2018, 3:23 PM lawest59 wrote:
I was looking to use trigrams because there are significant three-word phrases in my corpus (e.g. "economies in transition" to refer to developing countries). I used the following code in R.

```r
library(wordVectors)
library(parallel)  # provides detectCores()

statements <- prep_word2vec(basePath,
                            "docs.txt",
                            lowercase = TRUE, bundle_ngrams = 3, threshold = 50)
w2v <- train_word2vec("docs.txt",
                      output = "./stat_vecs.bin",
                      threads = detectCores(),
                      vectors = 100,
                      window = 7,
                      force = TRUE)
```

It worked as expected, except that I got some four-word phrases (e.g. "so_that_they_can"). I'm curious why this is happening. Thanks!