NGramExtractor for PHP

Installation

Install via Composer:

composer require linguistic/ngramextractor

Usage

Coming soon.

Example

$tokenizer = new Tokenizer();
$tokenizer->addRemovalRule('/<\/?\w+[\s\w\=\"\/\#\-\:\.\_]*>/') # Removes HTML tags
    ->addRemovalRule('/[^a-z0-9]+/', ' ') # Replaces each run of characters other than a-z and 0-9 with a space
    ->setSeperator('/\s+/'); # Splits the text into tokens, using whitespace as the delimiter

$content = ""; # The text that should get tokenized
$stopwords = array(); # (optional) array of stopwords

$extractor = new NGramExtractor($content, $tokenizer, $stopwords);
$unigrams = $extractor->getNGrams(1); # Gets all n-grams in the text for n = 1

# Gets the unigrams with their occurrence counts, keeping only those that occur at least 3 times
$unigramsFiltered = NGramExtractor::limitByOccurance($extractor->getNGramCount(1, true), 3);
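
To illustrate what the steps in the example amount to, here is a minimal sketch in plain PHP that does not depend on the library. It is not NGramExtractor's implementation. The function names sketchTokenize, sketchNGramCounts and sketchLimitByCount are hypothetical, and two behaviours are assumptions made for the illustration: the input is lowercased before the rules are applied, and any n-gram containing a stopword is dropped. It needs PHP 7.4 or later.

```php
<?php
// Hypothetical helpers for illustration only; not part of NGramExtractor.

/** Strip HTML tags, reduce the text to lowercase alphanumerics, split on whitespace. */
function sketchTokenize(string $text): array
{
    $text = strip_tags($text);                        // remove HTML tags
    $text = strtolower($text);                        // assumption: n-grams are case-insensitive
    $text = preg_replace('/[^a-z0-9]+/', ' ', $text); // other character runs become one space
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

/** Count each run of $n consecutive tokens, skipping runs that contain a stopword. */
function sketchNGramCounts(array $tokens, int $n, array $stopwords = []): array
{
    $counts = [];
    for ($i = 0; $i + $n <= count($tokens); $i++) {
        $window = array_slice($tokens, $i, $n);
        if (array_intersect($window, $stopwords)) {
            continue; // assumption: an n-gram containing a stopword is dropped
        }
        $key = implode(' ', $window);
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    return $counts;
}

/** Keep only the n-grams that occur at least $min times. */
function sketchLimitByCount(array $counts, int $min): array
{
    return array_filter($counts, fn (int $c): bool => $c >= $min);
}

$tokens  = sketchTokenize('<p>The cat sat. The cat ran; the dog sat.</p>');
$bigrams = sketchNGramCounts($tokens, 2, ['dog']);
print_r(sketchLimitByCount($bigrams, 2)); // only "the cat" (count 2) remains
```

Joining the tokens of each window into a string key keeps the counting step a plain associative-array update, which is why the filter at the end can be a single array_filter call.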

Resources

Download of stopword lists for different languages
