Parsing

cteichmann edited this page Mar 31, 2017 · 6 revisions

There are three techniques for parsing a single observation.

GUI

Load an IRTG via the file dialog. Then use the "Parse" option under "Tools" in the grammar window. This brings up a window in which you can type the input that you want parsed, one per algebra. If you do not want to specify the input for a given algebra, simply leave the field blank. How the different types of algebra values are written down is specified on the [codec page](Codec).

[Screenshot: parseWindow.png]

Once the input has been parsed, a [tree automaton window](TreeAutomatonWindow) opens, showing a tree automaton that contains all the parse trees.

Code

First you need to obtain your IRTG. The easiest way of doing this is to read it from an InputStream:

#!java

IrtgInputCodec codec = new IrtgInputCodec();
InterpretedTreeAutomaton grammar = codec.read(yourInputStream);

Assume that your inputs are given as a list of strings. Each string must be converted into an object that the underlying algebra can understand. For an input string "inputString", you can achieve this by calling:

#!java

Object actualInputObject = grammar.parseString(interpretationName, inputString);

"interpretationName" is the name of the interpretation whose algebra is used to read "inputString", according to the formats explained on the [codec page](Codec). The interpretation name must be known to the IRTG you are using. You can then put the objects into a map "representations" from each "interpretationName" to its "actualInputObject". Finally, you can parse this input as follows:

#!java

TreeAutomaton parseChart = grammar.parse(representations);
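The preparation of the "representations" map used in the call above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the library: the class `RepresentationsSketch` and the method `buildRepresentations` are hypothetical names, and the `parseString` parameter is a stand-in for `grammar.parseString` so that the example runs without the IRTG classes on the classpath. Skipping blank inputs is also an assumption, chosen to mirror the GUI behaviour of leaving a field empty.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

class RepresentationsSketch {
    /**
     * Builds a map from interpretation names to input objects.
     * `parseString` is a stand-in for grammar.parseString(interpretationName, inputString).
     */
    static Map<String, Object> buildRepresentations(
            Map<String, String> rawInputs,
            BiFunction<String, String, Object> parseString) {
        Map<String, Object> representations = new HashMap<>();
        for (Map.Entry<String, String> e : rawInputs.entrySet()) {
            String inputString = e.getValue();
            // Assumption: a missing or blank string means "no input for this interpretation".
            if (inputString == null || inputString.trim().isEmpty()) {
                continue;
            }
            representations.put(e.getKey(), parseString.apply(e.getKey(), inputString));
        }
        return representations;
    }

    public static void main(String[] args) {
        Map<String, String> rawInputs = new LinkedHashMap<>();
        rawInputs.put("english", "john watches the woman");
        rawInputs.put("german", ""); // left unspecified

        // Stand-in conversion: split on whitespace. Real code would delegate to
        // grammar.parseString and handle any exception it declares.
        Map<String, Object> representations = buildRepresentations(
                rawInputs, (name, s) -> Arrays.asList(s.trim().split("\\s+")));

        System.out.println(representations.keySet());
    }
}
```

In real use, the lambda would be replaced by a call to the grammar, and the resulting map would be the argument of the parse call.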

"parseChart" will contain all the rule trees that generate the given inputs. You can apply the interpretations from "grammar" to the trees of "parseChart" in order to see what they map to.

Shell

Bulk Parsing

Often you will want to parse a whole list of inputs rather than a single one. The options below parse an entire corpus in one go.

GUI

Load an IRTG via the file dialog. There is a "Bulk Parse" option under "Tools". If you choose this option, you will be asked to select a file that is written according to the corpus codec. Once you have chosen a corpus, you are asked to select a file in which the parsing results will be stored. Once bulk parsing is finished, the target file will contain a corpus in which a parse is associated with each corpus entry. This will be (one of) the highest-weight parse(s).

Code

Load your IRTG as explained above. Next, you need to obtain the corpus that you want to parse. This can be achieved by calling:

#!java

// "grammar" is the IRTG loaded above; "reader" is a java.io.Reader over the corpus file
Corpus c = Corpus.readCorpus(reader, grammar);
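The call above needs a Reader over the corpus file. One way to obtain one with the standard library is sketched below. The class `CorpusReaderSketch`, the interface `ReaderAction` and the method `withCorpusReader` are hypothetical helpers, not part of the library. The sketch assumes that the file is UTF-8 encoded and that the corpus is read completely before the reader is closed; neither is stated on this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CorpusReaderSketch {
    /** Hypothetical callback type: something that consumes a Reader and returns a result. */
    interface ReaderAction<T> {
        T apply(Reader reader) throws IOException;
    }

    /** Opens the file as a UTF-8 Reader, runs the action, and always closes the reader. */
    static <T> T withCorpusReader(Path corpusFile, ReaderAction<T> action) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(corpusFile, StandardCharsets.UTF_8)) {
            return action.apply(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file so that the sketch runs on its own; this is not corpus syntax.
        Path tmp = Files.createTempFile("corpus", ".txt");
        Files.write(tmp, "placeholder content\n".getBytes(StandardCharsets.UTF_8));

        // In real code the action would pass the reader and the IRTG to Corpus.readCorpus.
        String firstLine = withCorpusReader(tmp, r -> new BufferedReader(r).readLine());
        System.out.println(firstLine);

        Files.delete(tmp);
    }
}
```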

Note that the corpus should have (a subset of) the same interpretations as the IRTG with which you are going to parse it. Now you can parse the corpus and give the results to a Consumer consumer:

#!java

grammar.bulkParse(c, filter, consumer, listener);

where the consumer will be given Instance objects that contain the original entries from the corpus plus their parse (or null if there is none). listener is a ProgressListener that is informed of the progress of parsing; it may be null. Finally, filter is a Predicate that is used to select only a subset of Instances to parse.
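The roles of the filter and the consumer can be illustrated with plain `java.util.function` types. The sketch below is an assumption-laden stand-in, not the library's implementation: `Item` is a hypothetical replacement for Instance (an input plus a parse that may be null), `run` is a hypothetical driver that mimics the described contract, and whether instances rejected by the filter ever reach the consumer in the real method is not specified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

class BulkParseCallbacksSketch {
    /** Hypothetical stand-in for a corpus instance: an input and its parse (null if none). */
    static final class Item {
        final String input;
        final String parse;

        Item(String input, String parse) {
            this.input = input;
            this.parse = parse;
        }
    }

    /** Consumer that separates items with a parse from items without one. */
    static final class Collector implements Consumer<Item> {
        final List<Item> parsed = new ArrayList<>();
        final List<Item> failed = new ArrayList<>();

        @Override
        public void accept(Item item) {
            if (item.parse == null) {
                failed.add(item);
            } else {
                parsed.add(item);
            }
        }
    }

    /** Filter that accepts only inputs with at most maxWords whitespace-separated tokens. */
    static Predicate<Item> atMostWords(int maxWords) {
        return item -> item.input.trim().split("\\s+").length <= maxWords;
    }

    /** Stand-in driver: every item accepted by the filter is handed to the consumer. */
    static void run(List<Item> items, Predicate<Item> filter, Consumer<Item> consumer) {
        items.stream().filter(filter).forEach(consumer);
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("john sleeps", "some parse"),
                new Item("colorless green ideas sleep furiously", "another parse"),
                new Item("sleeps john", null));

        Collector collector = new Collector();
        run(items, atMostWords(3), collector);

        System.out.println(collector.parsed.size() + " parsed, "
                + collector.failed.size() + " without a parse");
    }
}
```

A consumer of this shape makes it easy to report afterwards which inputs could not be parsed, since those arrive with a null parse.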

Shell

This portion of the wiki is under construction. Please contact us if you have any questions.