Extractive Text Summarization with lexRankr (an R package implementing the LexRank algorithm)

lexRankr: Extractive Text Summarization in R

Installation

# install from CRAN
install.packages("lexRankr")

# install from this GitHub repo (requires the devtools package)
devtools::install_github("AdamSpannbauer/lexRankr")

Overview

lexRankr is an R implementation of the LexRank algorithm described by Güneş Erkan & Dragomir R. Radev in LexRank: Graph-based Lexical Centrality as Salience in Text Summarization. LexRank is designed to summarize a cluster of documents by proposing which sentences subsume the most information in that particular set of documents. The algorithm may not perform well on a set of unclustered or unrelated documents. As the paper's title suggests, the sentences are ranked based on their centrality in a graph. The graph is built from the pairwise similarities of the sentences (where similarity is measured with an idf-modified cosine similarity function).

The paper describes multiple ways to calculate centrality, and these options are available in the R package. The sentences can be ranked by their degree centrality or with the PageRank algorithm (both methods require setting a minimum similarity threshold for a sentence pair to be included in the graph). A third variation is Continuous LexRank, which does not require a minimum similarity threshold but instead uses a weighted graph of sentences as the input to PageRank.
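To make the three centrality options concrete, here is a minimal base R sketch of the ideas described above. It is not the package's implementation: the toy sentences, the `page_rank()` helper, the threshold of 0.05, the choice to treat each sentence as its own document when computing idf, and the choice to keep self-links are all assumptions made for illustration.

```r
# Toy sentences, assumed to be already lower-cased and free of punctuation.
sentences <- c(
  "storm flooded the coastal town",
  "storm damaged power lines",
  "coastal town lost tourism",
  "flooded roads closed schools"
)

tokens <- strsplit(sentences, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per sentence, one column per word.
tf <- t(vapply(tokens,
               function(tok) as.numeric(table(factor(tok, levels = vocab))),
               numeric(length(vocab))))
colnames(tf) <- vocab

# idf, treating each sentence as its own "document" (a simplification).
idf <- log(length(sentences) / colSums(tf > 0))

# idf-modified cosine: cosine similarity between tf * idf vectors.
w     <- sweep(tf, 2, idf, `*`)
norms <- sqrt(rowSums(w^2))
sim   <- tcrossprod(w) / (norms %o% norms)

# Hypothetical helper: PageRank by power iteration on a non-negative matrix.
page_rank <- function(m, damping = 0.85, tol = 1e-8, max_iter = 1000) {
  n <- nrow(m)
  p <- m / rowSums(m)  # row-normalise into transition probabilities
  r <- rep(1 / n, n)
  for (i in seq_len(max_iter)) {
    r_new <- (1 - damping) / n + damping * as.vector(t(p) %*% r)
    delta <- sum(abs(r_new - r))
    r <- r_new
    if (delta < tol) break
  }
  r
}

# Options 1 and 2: threshold the similarities into an unweighted graph.
threshold <- 0.05               # illustrative value only
adj <- (sim > threshold) * 1    # self-links kept, so no row sums to zero
degree  <- rowSums(adj)         # degree centrality
lexrank <- page_rank(adj)       # PageRank on the thresholded graph

# Option 3: continuous variant, PageRank on the weighted similarity matrix.
cont_lexrank <- page_rank(sim)

data.frame(sentence = sentences, degree, lexrank, cont_lexrank)
```

With these toy sentences, the first sentence should come out on top under all three variants, since it is the only one that shares words with each of the others.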

note: the LexRank algorithm is designed to work on a cluster of documents. It is built on the idea that the documents in a cluster focus on similar topics.

note: pairwise sentence similarity is calculated for the entire set of documents passed to the function. This can be a computationally intensive process (especially with a large set of documents).
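As a rough illustration of why this gets expensive, the number of unordered sentence pairs grows quadratically with the number of sentences: n sentences give n * (n - 1) / 2 pairs. The sentence counts below are arbitrary examples, and the figures are a simple count of pairs, not a measurement of what the package actually computes.

```r
# Number of unordered sentence pairs for n sentences: n * (n - 1) / 2
n_sentences <- c(100, 1000, 10000)
n_pairs <- choose(n_sentences, 2)

data.frame(n_sentences, n_pairs)
# 100 sentences    ->      4,950 pairs
# 1,000 sentences  ->    499,500 pairs
# 10,000 sentences -> 49,995,000 pairs
```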

Basic Usage

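library(lexRankr)
library(dplyr)

# three short documents, one row per document
df <- tibble(doc_id = 1:3,
             text = c("Testing the system. Second sentence for you.",
                      "System testing the tidy documents df.",
                      "Documents will be parsed and lexranked."))

df %>%
    # split each document into one row per sentence (new column: sents)
    unnest_sentences(sents, text) %>%
    # score each sentence with LexRank (adds a lexrank column)
    bind_lexrank(sents, doc_id, level = 'sentences') %>%
    # highest-scoring sentences first
    arrange(desc(lexrank))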

More Examples
