This repository was archived by the owner on Jul 3, 2023. It is now read-only.

@lewismc lewismc commented Nov 20, 2020

This PR addresses https://issues.apache.org/jira/browse/ANY23-458

This PR improves the help text printed to STDOUT, specifically the descriptions of the --extractors and --format flags. The new output:

    rover      Apache Any23 Command Line Tool.
      Usage: rover [options] input IRIs {<url>|<file>}+
        Options:
          -d, --defaultns
            Override the default namespace used to produce statements.
          -e, --extractors
            a comma-separated list of extractors, e.g. rdf-xml,rdf-turtle,
            etc. A complete extractor list can be obtained by calling ./any23
            extractor --list
            Default: [csv, html-embedded-jsonld, html-head-icbm, html-head-links, html-head-meta, html-head-title, html-mf-adr, html-mf-geo, html-mf-hcalendar, html-mf-hcard, html-mf-hlisting, html-mf-hrecipe, html-mf-hresume, html-mf-hreview, html-mf-hreview-aggregate, html-mf-license, html-mf-species, html-mf-xfn, html-microdata, html-rdfa11, html-xpath, ical, jcal, owl-functional, owl-manchester, rdf-jsonld, rdf-nq, rdf-nt, rdf-trix, rdf-turtle, rdf-xml, xcal, yaml]
          -f, --format
            a comma-separated list of writer factories, e.g.
            json,jsonld,nquads,notrivial,ntriples,trix,turtle,uri
            Default: [ntriples]
          -l, --log
            Produce log within a file.
          -n, --nesting
            Disable production of nesting triples.
            Default: false
          -t, --notrivial
            Filter trivial statements (e.g. CSS related ones). [DEPRECATED: As
            of version 2.3, use --format instead.]
            Default: false
          -o, --output
            Specify Output file (defaults to standard output)
            Default: java.io.PrintStream@2b2948e2
          -p, --pedantic
            Validate and fixes HTML content detecting commons issues.
            Default: false
          -s, --stats
            Print out extraction statistics.
            Default: false

@lewismc lewismc merged commit c075829 into apache:master Nov 20, 2020
@lewismc lewismc deleted the ANY23-458 branch November 20, 2020 20:50

 @Parameter(names = { "-f",
-"--format" }, description = "a comma-separated list of writer factories, e.g. notrivial,nquads")
+"--format" }, description = "a comma-separated list of writer factories, e.g. json,jsonld,nquads,notrivial,ntriples,trix,turtle,uri")
@HansBrende HansBrende Nov 21, 2020

@lewismc the --extractors improvement looks good; however, the new writer example (json,jsonld,nquads,notrivial,ntriples,trix,turtle,uri) won't work. The only reason notrivial can precede nquads in the old example is that notrivial is a delegating writer factory, i.e., it pre-processes the triples (to remove "trivial" triples) before passing them to the nquads writer.

(There is only 1 format, ultimately, that you write the output as, IIRC)
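
To make the distinction in the review comments concrete, here is a minimal Java sketch of the delegating pattern they describe. It is not Any23's actual API: `TripleSink`, `LineWriter`, `NoTrivialFilter`, and `compose` are hypothetical names, the rule that a predicate ending in `#stylesheet` counts as "trivial" is an assumption made only for illustration, and rejecting a list that names more than one non-delegating writer is the sketch's own reading of the "only 1 format" remark.

```java
import java.util.ArrayList;
import java.util.List;

public class DelegatingWriterSketch {

    /** Hypothetical stand-in for a triple consumer; not Any23's real interface. */
    interface TripleSink {
        void receive(String subject, String predicate, String object);
    }

    /** Terminal writer: serializes each triple as one line of text. */
    static final class LineWriter implements TripleSink {
        final List<String> lines = new ArrayList<>();

        @Override
        public void receive(String subject, String predicate, String object) {
            lines.add(subject + " " + predicate + " " + object + " .");
        }
    }

    /** Delegating writer: drops "trivial" triples and forwards the rest. */
    static final class NoTrivialFilter implements TripleSink {
        private final TripleSink delegate;

        NoTrivialFilter(TripleSink delegate) {
            this.delegate = delegate;
        }

        @Override
        public void receive(String subject, String predicate, String object) {
            // Assumed rule for this sketch only: stylesheet links are trivial.
            if (!predicate.endsWith("#stylesheet>")) {
                delegate.receive(subject, predicate, object);
            }
        }
    }

    /**
     * Builds a chain from a comma-separated list such as "notrivial,nquads".
     * The last id names the single terminal format; every id before it must
     * be a delegating writer, which wraps whatever follows it.
     */
    static TripleSink compose(String formats, TripleSink terminal) {
        String[] ids = formats.split(",");
        String last = ids[ids.length - 1].trim();
        if (last.equals("notrivial")) {
            throw new IllegalArgumentException("list must end with a concrete format");
        }
        TripleSink sink = terminal;
        for (int i = ids.length - 2; i >= 0; i--) {
            String id = ids[i].trim();
            if (!id.equals("notrivial")) {
                throw new IllegalArgumentException(
                        "'" + id + "' is not a delegating writer and cannot precede '" + last + "'");
            }
            sink = new NoTrivialFilter(sink);
        }
        return sink;
    }

    public static void main(String[] args) {
        LineWriter writer = new LineWriter();
        TripleSink sink = compose("notrivial,nquads", writer);
        sink.receive("<http://example.org/>",
                "<http://www.w3.org/1999/xhtml/vocab#stylesheet>", "<http://example.org/style.css>");
        sink.receive("<http://example.org/>",
                "<http://purl.org/dc/terms/title>", "\"Home\"");
        writer.lines.forEach(System.out::println);
    }
}
```

In this sketch, `compose("notrivial,nquads", writer)` succeeds because the filter wraps the one terminal writer, while `compose("json,jsonld,nquads", writer)` throws, which mirrors the objection that the new example in the description lists several output formats at once.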
