❗ This is a read-only mirror of the CRAN R package repository. sbo: Text Prediction via Stupid Back-Off N-Gram Models. Homepage: https://vgherard.github.io/sbo/, https://github.com/vgherard/sbo. Report bugs for this package: https://github.com/vgherard/sbo/issues

sbo

sbo provides utilities for building and evaluating text predictors based on Stupid Back-off N-gram models in R. It includes functions such as:

  • kgram_freqs(): Extract k-gram frequency tables from a text corpus.
  • sbo_predictor(): Train a next-word predictor via Stupid Back-off.
  • eval_sbo_predictor(): Test text predictions against an independent corpus.
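To make the idea behind these functions concrete, here is a minimal base-R sketch of Stupid Back-off scoring on a toy corpus. It is not the package's implementation: the helper names (`count_kgrams`, `get_count`, `sbo_score`), the toy sentences, and the back-off penalty `lambda = 0.4` are assumptions made for illustration, and the sketch ignores dictionaries, preprocessing, and End-Of-Sentence tokens. A word is scored by its relative frequency after the full context; if that k-gram was never seen, the oldest context word is dropped and the score is multiplied by `lambda`.

```r
# Toy corpus: each element is one already-tokenized sentence (hypothetical data).
sentences <- list(
  c("i", "love", "you"),
  c("i", "love", "it"),
  c("you", "love", "me"),
  c("we", "like", "it")
)

# Count every k-gram of order 1..N, keyed by its space-joined words.
count_kgrams <- function(sentences, N) {
  keys <- character(0)
  for (s in sentences) {
    for (k in seq_len(min(N, length(s)))) {
      for (i in seq_len(length(s) - k + 1)) {
        keys <- c(keys, paste(s[i:(i + k - 1)], collapse = " "))
      }
    }
  }
  tab <- table(keys)
  setNames(as.numeric(tab), names(tab))  # plain named numeric vector
}

# Look up a k-gram count, treating unseen k-grams as zero.
get_count <- function(counts, key) {
  n <- counts[key]
  if (is.na(n)) 0 else unname(n)
}

# Stupid Back-off score of `word` given a character vector `context`.
sbo_score <- function(word, context, counts, total, lambda = 0.4) {
  penalty <- 1
  repeat {
    if (length(context) == 0) {
      # Unigram level: relative frequency of the word in the corpus.
      return(penalty * get_count(counts, word) / total)
    }
    ctx_key <- paste(context, collapse = " ")
    full <- get_count(counts, paste(ctx_key, word))
    if (full > 0) {
      return(penalty * full / get_count(counts, ctx_key))
    }
    # Unseen k-gram: drop the oldest context word and apply the penalty.
    context <- context[-1]
    penalty <- penalty * lambda
  }
}

counts <- count_kgrams(sentences, N = 3)
total  <- sum(lengths(sentences))  # total number of tokens in the corpus

sbo_score("you", c("i", "love"), counts, total)  # seen trigram: 1/2
sbo_score("me",  c("i", "love"), counts, total)  # backs off once: 0.4 * 1/3
```

The scores are not normalized probabilities, which is what makes the method "stupid" and also cheap: ranking candidate next words only requires count lookups.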

Installation

Released version

You can install the latest release of sbo from CRAN:

install.packages("sbo")

Development version

You can install the development version of sbo from GitHub:

# install.packages("devtools")
devtools::install_github("vgherard/sbo")

Example

This example shows how to build a text predictor with sbo:

library(sbo)
p <- sbo_predictor(sbo::twitter_train, # 50k tweets, example dataset
                   N = 3, # Train a 3-gram model
                   dict = sbo::twitter_dict, # Top 1k words appearing in corpus
                   .preprocess = sbo::preprocess, # Preprocessing transformation
                   EOS = ".?!:;" # End-Of-Sentence characters
                   )

The object p can now be used to generate predictive text as follows:

predict(p, "i love") # a character vector
#> [1] "you" "it"  "my"
predict(p, "you love") # another character vector
#> [1] "<EOS>" "me"    "the"
predict(p, 
        c("i love", "you love", "she loves", "we love", "you love", "they love")
        ) # a character matrix
#>      [,1]    [,2]  [,3] 
#> [1,] "you"   "it"  "my" 
#> [2,] "<EOS>" "me"  "the"
#> [3,] "you"   "my"  "me" 
#> [4,] "you"   "our" "it" 
#> [5,] "<EOS>" "me"  "the"
#> [6,] "to"    "you" "and"
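A predictor can also be checked against held-out text with eval_sbo_predictor(). The sketch below is hedged: it assumes that the package ships a `twitter_test` dataset alongside `twitter_train`, and that the returned table has a logical `correct` column marking whether the true next word was among the predictions. Neither detail is stated in this README, so consult the package documentation before relying on them.

```r
library(sbo)

# Train on the bundled training tweets (same call as in the example above).
p <- sbo_predictor(sbo::twitter_train, N = 3, dict = sbo::twitter_dict,
                   .preprocess = sbo::preprocess, EOS = ".?!:;")

set.seed(840)                           # make the sampled test set reproducible
test <- sample(sbo::twitter_test, 500)  # assumed held-out dataset, not used in training
ev <- eval_sbo_predictor(p, test = test)

# Share of test cases where the true next word was among the predictions
# (assumes a logical `correct` column in the result).
mean(ev$correct)
```

Sampling a subset of the test corpus keeps the evaluation fast; the seed only fixes which tweets are drawn.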

Help

For help, see the sbo website: https://vgherard.github.io/sbo/
