parallel version of create_tcm doesn't work #296

DavidArenburg · 2019-02-03T14:27:11Z

Reproducible example from the docs

library(text2vec)
data("movie_review")

# set to number of cores on your machine
N_WORKERS = 4
if(require(doParallel)) registerDoParallel(N_WORKERS)
splits = split_into(movie_review$review, N_WORKERS)
jobs = lapply(splits, itoken, tolower, word_tokenizer)
v = create_vocabulary(jobs)
# Warning message:
#   'create_vocabulary.list' is deprecated.
# Use 'create_vocabulary.itoken_parallel()' instead.
# See help("Deprecated") 

vectorizer = vocab_vectorizer(v)
jobs = lapply(splits, itoken, tolower, word_tokenizer)
tcm = create_tcm(jobs, vectorizer, skip_grams_window = 3L, skip_grams_window_context = "symmetric")
# Error in UseMethod("create_tcm") : 
#   no applicable method for 'create_tcm' applied to an object of class "list"

It looks like jobs is supposed to be something other than a list, but I can't find how to create it otherwise.

sessionInfo()
# R version 3.5.1 (2018-07-02)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows >= 8 x64 (build 9200)
# 
# Matrix products: default
# 
# locale:
# [1] LC_COLLATE=English_Israel.1252  LC_CTYPE=English_Israel.1252    LC_MONETARY=English_Israel.1252 LC_NUMERIC=C                   
# [5] LC_TIME=English_Israel.1252    
# 
# attached base packages:
# [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] text2vec_0.5.1    doParallel_1.0.14 iterators_1.0.10  foreach_1.4.4    
# 
# loaded via a namespace (and not attached):
# [1] Rcpp_1.0.0           lattice_0.20-35      codetools_0.2-15     digest_0.6.18        grid_3.5.1           R6_2.3.0             futile.options_1.0.1
# [8] formatR_1.5          RcppParallel_4.4.2   data.table_1.11.8    futile.logger_1.4.3  Matrix_1.2-14        lambda.r_1.2.3       tools_3.5.1         
# [15] mlapi_0.1.0          compiler_3.5.1

dselivanov · 2019-02-03T15:12:23Z

Thanks for reporting. Unfortunately, I will not fix this: all high-level parallel computing will be dropped on Windows in the next release. Please consider using the serial version; it is not much slower than the parallel one on Windows.
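
For readers landing here, a minimal sketch of what the serial version could look like, assuming the same movie_review data as the report above. The n_chunks value of 10 and progressbar = FALSE are illustrative choices, not settings given in this thread.

```r
library(text2vec)
data("movie_review")

# Serial iterator over the reviews; n_chunks = 10 is an assumed, illustrative value.
it = itoken(movie_review$review, preprocessor = tolower, tokenizer = word_tokenizer,
            n_chunks = 10, progressbar = FALSE)

# Build the vocabulary and the vectorizer from the same iterator.
v = create_vocabulary(it)
vectorizer = vocab_vectorizer(v)

# Term co-occurrence matrix, with the same window settings as in the report.
tcm = create_tcm(it, vectorizer, skip_grams_window = 3L,
                 skip_grams_window_context = "symmetric")
dim(tcm)
```

No parallel backend is registered here, so the list-of-iterators dispatch that fails in the report is never reached.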

DavidArenburg · 2019-02-03T15:24:27Z

OK, that's fine. I had glove$fit_transform crashing RStudio, so I thought I'd need to parallelise, but eventually setting n_chunks to a higher value solved the issue.

Thanks for the package, by the way. You are doing a great job. Any plans to add word2vec too, or have you left that to the wordVectors package?

dselivanov · 2019-02-03T15:26:57Z

GloVe and word2vec usually give very similar results, so I don't see much value in working on it.

RezaSadeghiWSU · 2019-03-03T22:43:37Z

I faced a similar issue on Ubuntu. Do you have any suggestions?

Regards,
Reza

dselivanov · 2019-03-04T07:03:38Z

@RezaSadeghiWSU please provide a reproducible example; otherwise I can't help.

The following code works on my Ubuntu machine with text2vec 0.5.1:

library(text2vec, lib.loc = "~/temp/")
data("movie_review")

# set to number of cores on your machine
N_WORKERS = 4
if(require(doParallel)) registerDoParallel(N_WORKERS)
jobs = itoken_parallel(movie_review$review, tolower, word_tokenizer, n_chunks = N_WORKERS, ids = movie_review$id)
v = create_vocabulary(jobs)
vectorizer = vocab_vectorizer(v)
tcm = create_tcm(jobs, vectorizer, skip_grams_window = 3L, skip_grams_window_context = "symmetric")

DavidArenburg changed the title ~~parallel version of create_tcm doesn't work~~ parallel version of create_tcm doesn't work (on Windows) Feb 3, 2019

DavidArenburg closed this as completed Feb 3, 2019

RezaSadeghiWSU mentioned this issue Mar 3, 2019

parallel version of create_tcm doesn't work (on Ubuntu) #298

Closed

dselivanov changed the title ~~parallel version of create_tcm doesn't work (on Windows)~~ parallel version of create_tcm doesn't work Mar 4, 2019

dselivanov added the windows label Mar 4, 2019
