Documentation for Senti4SD-fast.jar #10

Open
maelick opened this issue Apr 8, 2019 · 0 comments

maelick commented Apr 8, 2019

I'm trying to use Senti4SD on a large dataset (~100M lines of text) and would like to drive most of the processing from R to improve performance. In particular, I'd like to avoid creating the large CSV file that contains the features.

For that, I want to run Senti4SD on chunks of the data. However, this slows the whole process down considerably, because every time the script is called, Senti4SD-fast.jar has to reload dsm.bin. To work around this, I want to use rJava to start the JVM from R itself, load dsm.bin once, and run the feature extraction on each chunk without storing the result in a file.
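
To make the idea concrete, here is a rough sketch of the pattern I have in mind on the Java side. Everything in it is an assumption for illustration: `FeatureExtractor`, `loadOnce`, `chunks`, and `extractChunk` are hypothetical names, not Senti4SD's actual API, and the loader is a placeholder that does not really read dsm.bin. It only shows the shape: pay the loading cost once, then extract features chunk by chunk in memory.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedExtractionSketch {

    /** Hypothetical stand-in for whatever Senti4SD exposes; NOT its real API. */
    interface FeatureExtractor {
        double[] extract(String text);
    }

    /** Hypothetical loader: the expensive step (reading dsm.bin) would happen once here. */
    static FeatureExtractor loadOnce(String dsmPath) {
        // Placeholder only: a real implementation would load the model from dsmPath.
        // Here each text simply maps to a one-element vector holding its length.
        return text -> new double[] { text.length() };
    }

    /** Split the documents into consecutive chunks of at most chunkSize items. */
    static List<List<String>> chunks(List<String> docs, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += chunkSize) {
            out.add(docs.subList(i, Math.min(i + chunkSize, docs.size())));
        }
        return out;
    }

    /** Extract features for one chunk entirely in memory; no file is written. */
    static List<double[]> extractChunk(FeatureExtractor fx, List<String> chunk) {
        List<double[]> rows = new ArrayList<>(chunk.size());
        for (String doc : chunk) {
            rows.add(fx.extract(doc));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Load once, then reuse the same extractor for every chunk.
        FeatureExtractor fx = loadOnce("dsm.bin");
        List<String> docs = List.of("first text", "second text", "third text");
        for (List<String> chunk : chunks(docs, 2)) {
            List<double[]> features = extractChunk(fx, chunk);
            System.out.println(chunk.size() + " docs -> " + features.size() + " feature rows");
        }
    }
}
```

From R, the plan would be to start the JVM once with rJava's `.jinit()` and then call the equivalent entry points with `.jcall()`, keeping the loaded object alive between chunks.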

Is there any documentation that would let me call the feature extraction directly through rJava, without creating files?
