lexRankr: Extractive Text Summarization in R

Installation

# install from CRAN
install.packages("lexRankr")

# install from this GitHub repo
devtools::install_github("AdamSpannbauer/lexRankr")

Overview

lexRankr is an R implementation of the LexRank algorithm described by Güneş Erkan and Dragomir R. Radev in LexRank: Graph-based Lexical Centrality as Salience in Text Summarization. LexRank is designed to summarize a cluster of documents by identifying the sentences that subsume the most information in that set of documents. The algorithm may not perform well on a set of unclustered or unrelated documents.

As the paper's title suggests, sentences are ranked by their centrality in a graph. The graph is built from the pairwise similarities of the sentences, where similarity is measured with a modified idf cosine similarity function. The paper describes multiple ways to calculate centrality, and these options are available in the R package. Sentences can be ranked by their degree centrality or with the PageRank algorithm; both methods require setting a minimum similarity threshold for a sentence pair to be included in the graph. A third variation, Continuous LexRank, does not require a minimum similarity threshold and instead uses a weighted graph of sentences as the input to PageRank.
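The three centrality options described above can be sketched in base R on a toy similarity matrix. This is an illustration only, not the package's internal code: the similarity values, the 0.2 threshold, the 0.85 damping factor, and the helper name `pagerank` are all assumptions made for the example.

```r
# Toy pairwise sentence-similarity matrix (hypothetical values, symmetric,
# with self-similarity of 1 on the diagonal).
sim <- matrix(c(1.0, 0.5, 0.1,
                0.5, 1.0, 0.3,
                0.1, 0.3, 1.0), nrow = 3, byrow = TRUE)

# Hypothetical helper: PageRank by power iteration over a non-negative
# adjacency or weight matrix whose rows all have a positive sum.
pagerank <- function(adj, damping = 0.85, tol = 1e-8, max_iter = 1000) {
  n <- nrow(adj)
  trans <- adj / rowSums(adj)   # row-normalise into transition probabilities
  rank <- rep(1 / n, n)         # start from a uniform distribution
  for (i in seq_len(max_iter)) {
    new_rank <- (1 - damping) / n + damping * as.vector(t(trans) %*% rank)
    converged <- sum(abs(new_rank - rank)) < tol
    rank <- new_rank
    if (converged) break
  }
  rank
}

# Variants 1 and 2 need a threshold: keep only edges at or above it.
threshold <- 0.2                       # assumed value for illustration
adj <- (sim >= threshold) * 1          # unweighted graph, self-links kept

degree <- rowSums(adj)                 # 1. degree centrality
lexrank_thresholded <- pagerank(adj)   # 2. PageRank on the thresholded graph

# Variant 3 (Continuous LexRank): no threshold, similarities act as edge weights.
lexrank_continuous <- pagerank(sim)
```

In this toy graph the second sentence is similar to both of the others, so all three scores rank it highest. The variants differ in how much of the similarity information they keep: thresholding discards weak links entirely, while the continuous variant lets every pair contribute in proportion to its similarity.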

Note: the LexRank algorithm is designed to work on a cluster of documents. LexRank is built on the idea that a cluster of documents will focus on similar topics.

Note: pairwise sentence similarity is calculated for the entire set of documents passed to the function. This can be a computationally intensive process, especially with a large set of documents.
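To make the cost concrete, here is a minimal base R sketch of an idf-weighted cosine similarity computed over every sentence pair. It assumes whitespace tokenization, treats each sentence as its own "document" when computing idf, and uses made-up sentences; the package's actual preprocessing and weighting may differ.

```r
# Hypothetical input: three short sentences.
sentences <- c("testing the system",
               "system testing the tidy documents",
               "documents will be parsed")

tokens <- strsplit(tolower(sentences), "\\s+")
vocab  <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per sentence, one column per vocabulary word.
tf <- t(vapply(tokens,
               function(tok) as.numeric(table(factor(tok, levels = vocab))),
               numeric(length(vocab))))
colnames(tf) <- vocab

# Inverse document frequency, with each sentence counted as a document (assumption).
idf <- log(length(sentences) / colSums(tf > 0))

# Cosine similarity between the tf-idf weighted sentence vectors.
tfidf <- sweep(tf, 2, idf, `*`)
norms <- sqrt(rowSums(tfidf^2))
sim   <- (tfidf %*% t(tfidf)) / (norms %o% norms)

# Number of distinct sentence pairs that had to be compared.
n_pairs <- choose(length(sentences), 2)
```

The number of pairs grows quadratically with the number of sentences: `choose(1000, 2)` is already 499,500 comparisons, which is why large document sets can be slow to process.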

Basic Usage

library(lexRankr)
library(dplyr)

# example data: one row per document
df <- tibble(doc_id = 1:3,
             text = c("Testing the system. Second sentence for you.",
                      "System testing the tidy documents df.",
                      "Documents will be parsed and lexranked."))

df %>%
    # split each document into one row per sentence
    unnest_sentences(sents, text) %>%
    # score each sentence with LexRank
    bind_lexrank(sents, doc_id, level = 'sentences') %>%
    # most central sentences first
    arrange(desc(lexrank))
