GitHub - KarnYong/C-Cat: A collection of tools for applying word senses to large corpora

The C-Cat library provides tools for large-scale text processing using the
Hadoop framework.  Its ultimate goal is to provide tools and libraries for
automatically customizing a WordNet ontology based on the contents of a
particular corpus.

It is structured into three submodules:
1) extendOntology-core: This provides a core set of text and collection-like
classes.
2) extendOntology-wordnet: This provides core WordNet libraries for reading,
writing, and modifying the WordNet hierarchy, along with several synset
similarity metrics and several word sense disambiguation algorithms.
3) extendOntology: This is the complete package that includes both the core and
wordnet submodules.  On top of the two modules, it includes a text
pre-processing framework for documents stored in HBase.  This is the most
unstable module and is under heavy development.

This project uses Maven as its build system.  Most of the library
dependencies are handled via Maven, but a few jars come from libraries that
have not been mavenized yet.

To install these jars into your local Maven repository, run

./add_non_maven_jars.sh

Then, build the entire project with

mvn package

This will create two jars in target: extendOntology-1.0.jar and
extendOntology-1.0-jar-with-dependencies.jar.  To run any of the provided main
classes without Maven, include both of these jars in the classpath.
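
As a minimal sketch of the classpath setup described above, the following
assembles both jars into one classpath and prints the resulting java command.
It assumes a Unix-like shell, where ":" separates classpath entries, and that
it is run from the project root.  The main class name is a hypothetical
placeholder, not a class known to exist in this project; substitute the fully
qualified name of the main class you want to run.

```shell
# Both jars named above, joined with the Unix classpath separator.
CP="target/extendOntology-1.0.jar:target/extendOntology-1.0-jar-with-dependencies.jar"

# Hypothetical placeholder; replace with the fully qualified name of a real main class.
MAIN_CLASS="your.package.YourMain"

# Print the command for review; remove "echo" to actually run it.
echo java -cp "$CP" "$MAIN_CLASS"
```

Printing the command first makes it easy to check the jar paths before
launching a long-running job.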
