Skip to content

Latest commit

 

History

History
29 lines (23 loc) · 1.91 KB

README.md

File metadata and controls

29 lines (23 loc) · 1.91 KB

textgraph

Overview

textgraph is a R package for building and analysing large-scale text co-occurrence graphs. It provides functionality to turn document-feature data into (weighted) feature co-occurrence graphs, analyse their contents via seeded random walks, topic clustering and temporal topic clustering, as well as numerous helper functions to prepare, analyse and explore the results.

Currently, the package provides two main workflows:

  1. Retrieving terms or entities functionally equivalent to previously extracted seed terms, and retrieve associated documents. This can be compared to certain types of seeded topic modeling;
  2. Retrieving clusters of related terms or entities via community detection algorithms, either statically for a single network or dynamically for a number of temporal network snapshots. This can be compared to unsupervised topic modeling approaches.

textgraph is built to facilitate the analysis of large-scale graphs built from millions of documents. Whenever feasible, functions can be parallelized through the furrr framework. The majority of graph operations is handled via the igraph library. Random Walk functionality is provided via RandomWalkRestartMH, while dynamic topics are facilitated via Memory Community Matching.

Installation

textgraph is not currently on CRAN. Therefore, it needs to be installed directly from Github.

# install.packages("remotes")
remotes::install_github("TimBMK/textgraph", build_vignettes = TRUE)

Usage

The vignette provides a throrough overview of the workflows, with explanations and example data.

vignette("textgraph")

Bugs

Please report any and all bugs here.