davicbtoliveira/TextTiling

TextTiling

A Go implementation of the TextTiling algorithm for automatic text segmentation.

Installation

go install github.com/davicbtoliveira/texttiling/cmd/textiling@latest

Usage

textiling <file.txt>

Options

  • -w: Pseudosentence size (default: 20)
  • -k: Block size (default: 10)
  • -method: Similarity method - block or vocab (default: block)
  • -policy: Cutoff policy - hc or lc (default: hc)
  • -debug: Show debug information
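To make the defaults and accepted values concrete, here is a minimal sketch of how the flags listed above could be declared and validated with Go's standard flag package. It is not taken from this repository: the options struct, its field names, the parseOptions helper, and the validation rules (positive sizes, exactly one input file) are all assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// options mirrors the documented flags. The struct and its field names are
// hypothetical and not taken from the repository.
type options struct {
	W      int    // -w: pseudosentence size
	K      int    // -k: block size
	Method string // -method: block or vocab
	Policy string // -policy: hc or lc
	Debug  bool   // -debug
	File   string // positional <file.txt>
}

// parseOptions parses args using the documented defaults and rejects values
// outside the documented choices. The validation rules are assumptions.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("textiling", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the way in this sketch
	fs.IntVar(&o.W, "w", 20, "pseudosentence size")
	fs.IntVar(&o.K, "k", 10, "block size")
	fs.StringVar(&o.Method, "method", "block", "similarity method: block or vocab")
	fs.StringVar(&o.Policy, "policy", "hc", "cutoff policy: hc or lc")
	fs.BoolVar(&o.Debug, "debug", false, "show debug information")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.W <= 0 || o.K <= 0 {
		return o, fmt.Errorf("-w and -k must be positive, got %d and %d", o.W, o.K)
	}
	if o.Method != "block" && o.Method != "vocab" {
		return o, fmt.Errorf("unknown -method %q (want block or vocab)", o.Method)
	}
	if o.Policy != "hc" && o.Policy != "lc" {
		return o, fmt.Errorf("unknown -policy %q (want hc or lc)", o.Policy)
	}
	if fs.NArg() != 1 {
		return o, fmt.Errorf("expected exactly one input file, got %d", fs.NArg())
	}
	o.File = fs.Arg(0)
	return o, nil
}

func main() {
	// Equivalent to running: textiling -w 30 -policy lc doc.txt
	opts, err := parseOptions([]string{"-w", "30", "-policy", "lc", "doc.txt"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", opts)
}
```

Note that Go's flag package stops parsing at the first non-flag argument, so in this sketch the options must come before the file name.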

Algorithm

TextTiling identifies topic boundaries in documents by computing lexical similarity between blocks of pseudosentences. It detects gaps in lexical co-occurrence that indicate topic shifts.
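The block comparison described above can be sketched roughly as follows. This is an illustrative sketch and not this repository's code: the function names are hypothetical, tokenization is plain whitespace splitting with no stopword removal or stemming, similarity is cosine similarity over term counts, and the final cutoff step that turns depth scores into boundaries is left out.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// tokenize lowercases the text and splits it on whitespace (a simplification).
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// pseudosentences groups tokens into consecutive windows of w tokens.
func pseudosentences(tokens []string, w int) [][]string {
	var out [][]string
	for i := 0; i < len(tokens); i += w {
		end := i + w
		if end > len(tokens) {
			end = len(tokens)
		}
		out = append(out, tokens[i:end])
	}
	return out
}

// counts builds a term-frequency map over a run of pseudosentences.
func counts(ps [][]string) map[string]float64 {
	m := make(map[string]float64)
	for _, p := range ps {
		for _, t := range p {
			m[t]++
		}
	}
	return m
}

// cosine returns the cosine similarity of two term-frequency maps.
func cosine(a, b map[string]float64) float64 {
	var dot, na, nb float64
	for t, x := range a {
		dot += x * b[t]
		na += x * x
	}
	for _, y := range b {
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / math.Sqrt(na*nb)
}

// gapScores compares, at every gap between pseudosentences, the block of up
// to k pseudosentences before the gap with the block of up to k after it.
func gapScores(ps [][]string, k int) []float64 {
	scores := make([]float64, 0, len(ps))
	for gap := 1; gap < len(ps); gap++ {
		lo, hi := gap-k, gap+k
		if lo < 0 {
			lo = 0
		}
		if hi > len(ps) {
			hi = len(ps)
		}
		scores = append(scores, cosine(counts(ps[lo:gap]), counts(ps[gap:hi])))
	}
	return scores
}

// depthScores measures how deep each valley is relative to the nearest peak
// on its left and on its right; deep valleys suggest topic shifts.
func depthScores(s []float64) []float64 {
	d := make([]float64, len(s))
	for i := range s {
		l, r := s[i], s[i]
		for j := i - 1; j >= 0 && s[j] >= l; j-- {
			l = s[j]
		}
		for j := i + 1; j < len(s) && s[j] >= r; j++ {
			r = s[j]
		}
		d[i] = (l - s[i]) + (r - s[i])
	}
	return d
}

func main() {
	text := "cats purr cats nap cats purr cats play " +
		"stocks rise stocks fall stocks rise stocks crash"
	ps := pseudosentences(tokenize(text), 4)
	scores := gapScores(ps, 2)
	fmt.Println("gap similarity:", scores)
	fmt.Println("depth scores:  ", depthScores(scores))
}
```

In this sketch the arguments to pseudosentences and gapScores play the roles of the -w and -k options. A full implementation would then apply a cutoff to the depth scores to decide which gaps become segment boundaries.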