Classify UniLang messages, among other corpus processing tasks

Corpus

A simple system to aid in analyzing corpora of data, since this has become a recurring problem. The immediate application is applying text clustering and machine learning techniques to the UniLang log files.

Corpus will serve as the automatic classification system for UniLang, which is necessary for achieving the desired capability of automatic message routing. The concept of Adjustable Autonomy is relevant here.

Corpus now has a reasonable UI and is classifying messages with reasonable accuracy. We are using rainbow, a Bayesian text classifier. It gives surprisingly good results, considering how little information the sentences appear to contain. However, it is not sufficient: while it usually chooses the correct category, the error rate is still too high, and disambiguating some of the weaker classes will require extra information. Therefore, I am looking to incorporate other sources of classification evidence, based on features recognized by other external codebases.
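To illustrate the kind of model involved, here is a minimal multinomial naive Bayes classifier that ranks categories by posterior probability, similar in shape to the ranked output in the example classification further down. This is not rainbow's code or interface: the `NaiveBayes` class, the tokenizer, and the add-one smoothing are assumptions chosen for the sketch, and the training sentences are made up.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # training messages per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, category):
        words = self.tokenize(text)
        self.class_counts[category] += 1
        self.word_counts[category].update(words)
        self.vocab.update(words)

    def rank(self, text):
        """Return (category, posterior) pairs, most probable first."""
        words = [w for w in self.tokenize(text) if w in self.vocab]  # drop unseen words
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for cat, n_docs in self.class_counts.items():
            counts = self.word_counts[cat]
            n_words = sum(counts.values())
            score = math.log(n_docs / total)  # log prior
            for w in words:
                # Smoothed log likelihood of each word given the category.
                score += math.log((counts[w] + 1) / (n_words + v))
            log_scores[cat] = score
        # Normalize with log-sum-exp so the posteriors sum to 1.
        top = max(log_scores.values())
        z = sum(math.exp(s - top) for s in log_scores.values())
        ranked = [(cat, math.exp(s - top) / z) for cat, s in log_scores.items()]
        return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("need to buy milk and eggs", "shopping-list-item")
    nb.train("remember to call the bank tomorrow", "verber-task-definition")
    nb.train("the printer on the second floor is broken again", "observation")
    for cat, p in nb.rank("buy some milk"):
        print(f"{cat:>30}\t{p:.6f}")
```

Each category's score is its log prior plus the smoothed log likelihood of every known word; extra evidence from other codebases could, in principle, be added to these log scores as further terms before normalization.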

The following features will also be added: the ability to vet the automatic classifications; a type system; the ability for recipient agents to reject messages, which will help with classification; and mass verification and classification adjustment, with subsequent message reclassification.
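Recipient-driven rejection could work roughly as follows: a message is assigned to its most probable category, and if the recipient agent rejects it, the rejection is logged as feedback for later retraining and the message is rerouted to the next candidate. This is a minimal sketch; the `Message`, `route`, and `reject` names and the fallback to `unclassifiable` are hypothetical, and the scores are taken from the example classification in this README.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Message:
    text: str
    ranked: List[Tuple[str, float]]  # classifier output, most probable first
    rejected: Set[str] = field(default_factory=set)
    assigned: Optional[str] = None


def route(msg: Message) -> str:
    """Assign the most probable category that no recipient has rejected."""
    for category, _prob in msg.ranked:
        if category not in msg.rejected:
            msg.assigned = category
            return category
    # Assumption: fall back to "unclassifiable" once every candidate is rejected.
    msg.assigned = "unclassifiable"
    return msg.assigned


def reject(msg: Message, feedback: List[Tuple[str, str]]) -> str:
    """A recipient agent rejects the current assignment; record it and reroute."""
    feedback.append((msg.text, msg.assigned))  # negative example for later retraining
    msg.rejected.add(msg.assigned)
    return route(msg)


if __name__ == "__main__":
    feedback = []
    msg = Message(
        "Forgot to pick up pay check - need to go pick that ASAP.",
        [("observation", 0.441955),
         ("verber-task-definition", 0.244548),
         ("complex-statement", 0.118441)],
    )
    print(route(msg))             # observation
    print(reject(msg, feedback))  # verber-task-definition
```

Mass verification could then replay the accumulated `feedback` list into the classifier and re-run `route` over previously classified messages.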

The following example shows a very preliminary classification, together with the current scheme (ranked by the probability assigned to the example message). Note that the classification is exactly correct. The scheme will be greatly revamped to allow a subsumption hierarchy, and will focus more on the actual routing commands. For instance, rather than "goal", we would have "(Agent: Verber) (new-goal $1)", and rather than just "icodebase-capability-request", we would have "(Agent: MyFRDCSA) (capability-request Verber $1)"; that is, the responsible agent and the corresponding command to be sent.

```
(((Forgot to pick up pay check - need to go pick that ASAP.)))

                             observation	0.441955
                  verber-task-definition	0.244548
                       complex-statement	0.118441

  0) Finished
* 1) observation
* 2) verber-task-definition
  3) complex-statement
  4) icodebase-solution-to-extant-problem
  5) icodebase-capability-request
  6) event
  7) icodebase-input-data
  8) dream
  9) solution-to-extant-problem
  10) system-request
  11) policy
  12) priority-shift
  13) quote
  14) unclassifiable
  15) intersystem-relation
  16) SOP
  17) funny-annecdote
  18) unilang-client-outgoing-message
  19) goal
  20) icodebase-task
  21) suspicion
  22) not-a-unilang-client-entry
  23) dangling-clause
  24) capability-request
  25) rant
  26) icodebase-resource
  27) propaganda
  28) inspiring-annecdote
  29) shopping-list-item
>
```
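The revamped scheme described above, where each category resolves to a responsible agent and a command, with a subsumption hierarchy supplying routes for more specific categories, might be sketched like this. The two routes come from the examples in the text; the `PARENT` mapping (placing `verber-task-definition` under `goal`), the `routing_command` name, and the quoting of `$1` are assumptions for illustration.

```python
# Routing table: category -> (responsible agent, command template).
# Both entries mirror the examples in the text above.
ROUTES = {
    "goal": ("Verber", "(new-goal $1)"),
    "icodebase-capability-request": ("MyFRDCSA", "(capability-request Verber $1)"),
}

# Hypothetical subsumption hierarchy: category -> more general parent category.
PARENT = {
    "verber-task-definition": "goal",
}


def routing_command(category, message):
    """Walk up the hierarchy until a routed category is found; fill in $1."""
    seen = set()
    while category is not None and category not in seen:
        if category in ROUTES:
            agent, template = ROUTES[category]
            # Assumption: $1 is replaced by the message as an escaped string literal.
            quoted = '"' + message.replace("\\", "\\\\").replace('"', '\\"') + '"'
            return f"(Agent: {agent}) " + template.replace("$1", quoted)
        seen.add(category)
        category = PARENT.get(category)
    return None  # no route anywhere up the hierarchy


if __name__ == "__main__":
    print(routing_command("verber-task-definition", "pick up pay check"))
    # (Agent: Verber) (new-goal "pick up pay check")
```

Walking up the hierarchy means a new, more specific category only needs a route of its own when it should be handled differently from its parent; the `seen` set guards against accidental cycles in the hierarchy.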

http://frdcsa.org/frdcsa/internal/corpus
