`damping` arg ignored in `lexRankFromSimil` #15

Open
jwijffels opened this Issue Feb 8, 2018 · 3 comments


jwijffels commented Feb 8, 2018

I just had a look at this package because I recently created a similar one called textrank (https://cran.r-project.org/web/packages/textrank/index.html). This package seems to follow the same approach, although textrank starts from something that looks like the output of udpipe, which already contains the sentences and all the tokenised words.
While skimming the code, I noticed that the damping argument of lexRankFromSimil is not used; maybe that is something to fix.
I'm also interested to hear whether you have found a way to reduce the computational burden of doing many sentence-to-sentence similarity calculations.
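To illustrate why an ignored damping argument matters, here is a minimal sketch of PageRank-style power iteration over a sentence similarity matrix. It is not the code of lexRankFromSimil; the function name `pagerank_from_similarity` and its parameters are hypothetical, and the input is assumed to be a square matrix of non-negative similarities given as a list of lists.

```python
def pagerank_from_similarity(sim, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank nodes of a weighted graph by power iteration (illustrative sketch).

    sim[i][j] is the non-negative similarity between sentence i and sentence j.
    """
    n = len(sim)
    row_sums = [sum(row) for row in sim]
    rank = [1.0 / n] * n  # start from the uniform distribution

    for _ in range(max_iter):
        new_rank = []
        for j in range(n):
            incoming = 0.0
            for i in range(n):
                if row_sums[i] > 0:
                    # Sentence i passes its rank on in proportion to similarity.
                    incoming += rank[i] * sim[i][j] / row_sums[i]
                else:
                    # A sentence with no similar neighbours spreads rank evenly.
                    incoming += rank[i] / n
            # The damping factor mixes the graph walk with a uniform jump.
            new_rank.append((1.0 - damping) / n + damping * incoming)

        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank
```

With `damping=0` every sentence receives the same score, and larger values give more weight to the similarity graph, so a function that accepts the argument but never passes it on will return the same ranking whatever value the caller supplies.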

Owner

AdamSpannbauer commented Feb 12, 2018

Thanks for pointing out the issue with damping; I will fix it.

The only step I have taken to reduce the computation time is to write a minimal Rcpp function to perform the similarity calculation. Some parallelization could also provide some benefit here.

Another possibility would be to get creative, deviate from the methodologies laid out in lexrank/textrank, and think of a new way to get valid sentence ranks without the cumbersome pairwise comparisons. Of course, I haven't thought of a valid way, but I'd be interested in learning more about a method if there is one out there.

jwijffels commented Feb 12, 2018

There is the minhash algorithm, which I tried to wrap in the textrank package: https://cran.r-project.org/web/packages/textrank/vignettes/textrank.html#minhash but that requires the user to be familiar with the algorithm. If you know of another way, feel free to let me know :).
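For readers unfamiliar with minhash, the idea can be sketched as follows: each sentence gets a short signature, and only sentences whose signatures collide in at least one band are compared in full. This is a generic illustration, not the textrank package's API; the names `minhash_signature`, `estimate_jaccard` and `candidate_pairs`, and the choice of 64 hashes in 16 bands, are assumptions made for the example.

```python
import hashlib


def _hash(token, seed):
    # Deterministic 64-bit hash of a token, varied by an integer seed.
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=64):
    """Return the minimum hash value of the token set under each seeded hash."""
    unique = set(tokens)
    if not unique:
        raise ValueError("a sentence needs at least one token")
    return [min(_hash(t, seed) for t in unique) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def candidate_pairs(signatures, bands=16):
    """Return index pairs (i, j), i < j, that share at least one whole band."""
    rows = len(signatures[0]) // bands
    buckets = {}
    for idx, sig in enumerate(signatures):
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets.setdefault(key, []).append(idx)

    pairs = set()
    for members in buckets.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.add((members[i], members[j]))
    return pairs
```

Only the pairs returned by `candidate_pairs` would then go through the exact similarity calculation. The number of bands trades recall against the number of comparisons: more bands with fewer rows each catch less similar sentences but produce more candidates.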

Owner

AdamSpannbauer commented Feb 12, 2018

FYI, I'm going to rename this issue to focus on the damping problem and close it once that is fixed.

AdamSpannbauer changed the title from `damping / al lot of comparisons` to `damping` arg ignored in `lexRankFromSimil` Feb 12, 2018
