Gabriel Weaver edited this page Jun 2, 2013 · 3 revisions

NAME

xuwc - Count the number of strings in the language of a grammar production.

SYNOPSIS

xuwc [--count=unit] [--container=unit] xupath file ...

DESCRIPTION

xuwc(1) generalizes wc(1) to count the number of strings in the language of an xupath.

Traditional wc(1) counts the number of words, lines, characters, or bytes contained in each input file or standard input.
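The four units that wc(1) counts correspond to the builtin units accepted by --count (builtin:byte, builtin:character, builtin:word, builtin:line). The following is a minimal Python sketch of those counts for a single string. It is an illustration, not xuwc's implementation, and it makes three assumptions: the text is UTF-8, words are whitespace-separated tokens, and lines are counted as newline characters, following the wc -l convention. The example deliberately includes a non-ASCII character to show why byte and character counts diverge on Unicode input.

```python
def builtin_counts(text: str) -> dict:
    """Count wc(1)-style units in a string (an illustrative sketch)."""
    return {
        "builtin:byte": len(text.encode("utf-8")),  # assumes UTF-8 encoding
        "builtin:character": len(text),  # Unicode code points
        "builtin:word": len(text.split()),  # whitespace-separated tokens
        "builtin:line": text.count("\n"),  # newline characters, as wc -l counts
    }


print(builtin_counts("interface Ethernet0\n description café\n"))
# {'builtin:byte': 39, 'builtin:character': 38, 'builtin:word': 4, 'builtin:line': 2}
```

The "é" takes two bytes in UTF-8, so the byte count exceeds the character count by one.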

In addition to counting strings in the language of an xupath, xuwc(1) reports those counts relative to language-specific contexts.

The xupath specifies both the strings to count and the context in which to report counts. When no options are specified, xuwc(1) counts the number of matches to the xupath within each file.

All files are processed in the order specified.
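The default behavior, one count of xupath matches per file with files processed in the order given, can be sketched in Python. The sketch makes one important substitution: a regular expression stands in for the ios:interface production, matching an "interface" line plus its indented sub-commands. xuwc(1) itself matches grammar productions. The pattern name, the function name, and the sample configuration are all hypothetical.

```python
import re

# Hypothetical stand-in for the ios:interface production: an "interface" line
# followed by its indented sub-command lines. xuwc(1) matches grammar
# productions; a regular expression is used here only for illustration.
INTERFACE_BLOCK = re.compile(r"^interface[ \t]+\S+.*(?:\n[ \t]+.*)*", re.MULTILINE)


def count_matches(files):
    """Return one (filename, match count) pair per file, in the given order."""
    return [(name, len(INTERFACE_BLOCK.findall(text))) for name, text in files]


config = (
    "hostname r1\n"
    "interface Ethernet0\n"
    " ip address 10.0.0.1 255.255.255.0\n"
    " no shutdown\n"
    "interface Serial0\n"
    " shutdown\n"
    "router ospf 1\n"
)

for name, n in count_matches([("router.v1.example", config)]):
    print(name, n)  # router.v1.example 2
```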

OPTIONS

--count=unit The production to count. By default, this is the final step of the xupath. It may also be set to builtin:byte, builtin:character, builtin:word, or builtin:line. Byte and character counting have not yet been tested on Unicode input.

--container=unit The context in which to report counts. By default, this is the file, but it may be any step of the xupath. If --count is set to a unit that is not a step of the xupath (for example, builtin:word), the container should be set to the final step of the xupath.
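The --container option can be made concrete with a minimal Python sketch, loosely modeled on the ios:interface examples under EXAMPLES. Each container is represented as a hypothetical (label, text) pair, and the counting function is passed in. The function names and the tab-separated output are assumptions for illustration; they are not xuwc's actual code or output format. The sketch reports the same line counts first with one count per interface and then with the file as the default container.

```python
def count_per_container(containers, count_units):
    """Return one (label, count) pair per container, in document order.

    containers: (label, text) pairs, e.g. one pair per interface block.
    count_units: function mapping a container's text to a count.
    """
    return [(label, count_units(text)) for label, text in containers]


def count_lines(text):
    """Count lines as newline characters (the builtin:line unit in this sketch)."""
    return text.count("\n")


# Hypothetical containers: two interface blocks from one configuration file.
interfaces = [
    ("interface Ethernet0",
     "interface Ethernet0\n ip address 10.0.0.1 255.255.255.0\n no shutdown\n"),
    ("interface Serial0", "interface Serial0\n shutdown\n"),
]

# Container = each interface: one count per interface.
per_interface = count_per_container(interfaces, count_lines)
for label, n in per_interface:
    print(f"{label}\t{n}")

# Container = the file (the default): a single total for the file.
per_file = sum(n for _, n in per_interface)
print(f"router.v1.example\t{per_file}")
```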

FILES

ENVIRONMENT

DIAGNOSTICS

There are currently no error codes. Future releases will give more careful thought to error codes and to how errors are reported back to the calling environment.

BUGS

This release of xuwc(1) incorporates a new API for text corpora and therefore likely contains bugs. Bug reports are welcome; we want to make these tools as robust as possible.

EXAMPLES

Cisco IOS

  • Count the number of Cisco IOS interfaces in the router configuration file:
      xuwc "/builtin:file/ios:interface" ./data/test/cisco_ios/router.v1.example
  • Count the total number of lines contained in interfaces in the router configuration file:
      xuwc "/builtin:file/ios:interface/builtin:line" ./data/test/cisco_ios/router.v1.example
  • Count the number of lines contained in each interface in the router configuration file, and report the counts relative to interface names:
      xuwc --count=builtin:line --container=ios:interface "/builtin:file/ios:interface/builtin:line" ./data/test/cisco_ios/router.v1.example
  • Count the number of lines per file:
      xuwc "/builtin:file/builtin:line" ./data/test/cisco_ios/router.v1.example
  • Count the number of words in each line contained in interfaces in both files, then sort the output:
      xuwc --count=builtin:word --container=builtin:line "/builtin:file/ios:interface/builtin:line" ./data/test/cisco_ios/router.v1.example ./data/test/cisco_ios/router.v2.example | sort -n -k 2 | sort -k 1

TEI XML

  • Count the number of sections per file in this security policy:
      xuwc "/builtin:file/tei:section" ./data/test/tei_xml/section.tei.v1.xml
  • Count the number of paragraphs per section in this security policy:
      xuwc --count=tei:paragraph --container=tei:section "/tei:section/tei:paragraph" ./data/test/tei_xml/section.tei.v1.xml
  • Count the number of words per section in this security policy:
      xuwc --count=builtin:word --container=tei:section "/tei:section" ./data/test/tei_xml/section.tei.v1.xml

AUTHOR

Gabriel A. Weaver

SEE ALSO

wc(1), xudiff(1), xupath(1), xugrep(1).


XUTools Wiki by Gabriel A. Weaver is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.