Tagger Package

This is a Go package that reads in a raw text corpus and then tags the parts of speech of any input byte slice, provided the tagger has already read in the corpus. The main Go file is tagger.go, which contains the creation of the tagger and the functions for tagging a slice of bytes. This tagger is based on the Viterbi algorithm. When splitting text into words it splits on all symbols, which can be bad for possessives, contractions, compounds, and similar forms, but it could easily be modified to split only on specific symbols.
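To make the splitting behaviour concrete, here is a minimal sketch of a tokenizer that breaks on every symbol and records byte offsets. It is not the code in tagger.go: the `Token` type and the `splitOnSymbols` function are hypothetical names chosen for illustration, and "symbol" is assumed to mean any rune that is not a letter or a digit.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// Token is a hypothetical type for this sketch, not a type from the package.
type Token struct {
	Text   string
	Offset int // byte offset of the token's first byte in the input
}

// splitOnSymbols breaks the input at every rune that is not a letter or digit.
// Assumption: this approximates "split on all symbols" as the README describes it.
func splitOnSymbols(input []byte) []Token {
	var tokens []Token
	start := -1 // start of the current word, or -1 when between words
	for i := 0; i < len(input); {
		r, size := utf8.DecodeRune(input[i:])
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			if start < 0 {
				start = i
			}
		} else if start >= 0 {
			tokens = append(tokens, Token{Text: string(input[start:i]), Offset: start})
			start = -1
		}
		i += size
	}
	if start >= 0 {
		tokens = append(tokens, Token{Text: string(input[start:]), Offset: start})
	}
	return tokens
}

func main() {
	// The contraction, the compound, and the possessive are each split in two.
	for _, t := range splitOnSymbols([]byte("Don't re-use Bob's code")) {
		fmt.Printf("%d:%s\n", t.Offset, t.Text)
	}
}
```

In this sketch "Don't" becomes the two tokens "Don" and "t", which is the weakness noted above. Restricting the split to a chosen set of symbols would mean replacing the letter-or-digit test with a check against that set.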

New( path to corpus for tagging (string) );

Takes the path to the corpus from which to create the tagger module. The
path must be a string. Returns an initialized tagger module that the
following functions can be called on.

TagBytes( raw byte slice );

Returns a slice of Tagged Word objects, each holding the word, its part of
speech tag, and its byte offset in the original slice.
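As a rough illustration of how a Viterbi-based tagger can choose the tag for each word, the sketch below runs the algorithm over a tiny hand-written model and wraps the result in tagged words. Everything in it is an assumption made for the example: the `TaggedWord` field names, the `model` type, the `toyModel` probabilities, and the tag names are hypothetical and are not taken from tagger.go, where the model would be built from the corpus instead.

```go
package main

import (
	"fmt"
	"math"
)

// TaggedWord mirrors the three fields the README lists; field names are assumed.
type TaggedWord struct {
	Word   string
	Tag    string
	Offset int
}

// model holds log-probabilities for a toy hidden Markov model (hypothetical type).
type model struct {
	tags  []string
	start map[string]float64            // log P(tag at first position)
	trans map[string]map[string]float64 // log P(next tag | previous tag)
	emit  map[string]map[string]float64 // log P(word | tag)
}

const unseen = -20.0 // log-probability floor for events missing from the model

func lookup(m map[string]float64, key string) float64 {
	if v, ok := m[key]; ok {
		return v
	}
	return unseen
}

// viterbi returns the most probable tag sequence for words under m.
func viterbi(m model, words []string) []string {
	if len(words) == 0 {
		return nil
	}
	n := len(words)
	score := make([]map[string]float64, n) // best log-score ending in each tag
	back := make([]map[string]string, n)   // best previous tag for each tag
	score[0] = map[string]float64{}
	for _, t := range m.tags {
		score[0][t] = lookup(m.start, t) + lookup(m.emit[t], words[0])
	}
	for i := 1; i < n; i++ {
		score[i] = map[string]float64{}
		back[i] = map[string]string{}
		for _, t := range m.tags {
			best, bestPrev := math.Inf(-1), ""
			for _, p := range m.tags {
				if s := score[i-1][p] + lookup(m.trans[p], t); s > best {
					best, bestPrev = s, p
				}
			}
			score[i][t] = best + lookup(m.emit[t], words[i])
			back[i][t] = bestPrev
		}
	}
	best, last := math.Inf(-1), ""
	for _, t := range m.tags {
		if score[n-1][t] > best {
			best, last = score[n-1][t], t
		}
	}
	// Follow the back-pointers from the best final tag to recover the path.
	path := make([]string, n)
	path[n-1] = last
	for i := n - 1; i > 0; i-- {
		path[i-1] = back[i][path[i]]
	}
	return path
}

// toyModel returns hand-written probabilities, purely for illustration.
func toyModel() model {
	l := math.Log
	return model{
		tags:  []string{"DET", "NOUN", "VERB"},
		start: map[string]float64{"DET": l(0.8), "NOUN": l(0.1), "VERB": l(0.1)},
		trans: map[string]map[string]float64{
			"DET":  {"NOUN": l(0.9), "VERB": l(0.1)},
			"NOUN": {"VERB": l(0.8), "NOUN": l(0.2)},
			"VERB": {"DET": l(0.6), "NOUN": l(0.4)},
		},
		emit: map[string]map[string]float64{
			"DET":  {"the": l(0.9)},
			"NOUN": {"dog": l(0.5), "barks": l(0.1)},
			"VERB": {"barks": l(0.5)},
		},
	}
}

func main() {
	words := []string{"the", "dog", "barks"}
	offsets := []int{0, 4, 8} // byte offsets within "the dog barks"
	for i, tag := range viterbi(toyModel(), words) {
		fmt.Printf("%+v\n", TaggedWord{Word: words[i], Tag: tag, Offset: offsets[i]})
	}
}
```

The sketch works in log space so that long inputs do not underflow, and it uses a fixed floor for unseen words and transitions. A real tagger would more likely use a smoothing scheme derived from corpus counts.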

Tagger Package for copyrights

This package was developed specifically for copyright notice detection. However, the copyright extraction and the part of speech tagging are completely separate from each other, so other modules or packages can easily be inserted, adapted, or merged in to perform other NLP functionality after the tagging is done.

Important

The tagger.go file is mostly separate from the copyright.go part of the package, except for a small optimization: the tagger module created with the New function also saves the information needed by the copyright functions. These parts could easily be removed for other projects if needed.
