shellac / java-rdfa

RDFa parser for Java

Moving to markdown, updated docs 
shellac (author)
Sun Feb 07 16:27:37 -0800 2010
commit  acc50af471adbf6eb0b63810be2f8988fbe674dc
tree    2a0e8b2b2ae5217d05d59aafe4b1173a9a20af08
parent  5712fd0b151026971b78f13198f61f75cc41828a

Welcome to java-rdfa

The cruftiest RDFa parser in the world, I'll bet. Apologies that there isn't much documentation. Things may explode: you have been warned.

Currently passing all conformance tests for XHTML, and the HTML 4 and 5 tests with one exception.

This was written by Damian Steer. It is an offshoot of the Stars Project, which was funded by JISC.

Useful Links

  • Maven repository (snapshots)
  • Java api documentation
  • Online parser

Basic Use

$ ls
htmlparser-1.2.0.jar    java-rdfa-0.3.jar

$ java -jar java-rdfa-0.3.jar http://examples.tobyinkster.co.uk/hcard
<http://examples.tobyinkster.co.uk/hcard> <http://xmlns.com/foaf/0.1/primaryTopic> <http://examples.tobyinkster.co.uk/hcard#jack> .
...

or (equivalent):

$ java -cp '*' rdfa.simpleparse http://examples.tobyinkster.co.uk/hcard
<http://examples.tobyinkster.co.uk/hcard> <http://xmlns.com/foaf/0.1/primaryTopic> <http://examples.tobyinkster.co.uk/hcard#jack> .
...

For HTML sources add the --format argument; you will also need the validator.nu parser on the classpath:

$ java -cp '*' rdfa.simpleparse --format HTML http://www.slideshare.net/intdiabetesfed/world-diabetes-day-2009
<http://www.slideshare.net/intdiabetesfed/world-diabetes-day-2009> <http://www.w3.org/1999/xhtml/vocab#stylesheet> <http://public.slidesharecdn.com/v3/styles/combined.css?1265372095> .
...

The output of simpleparse is n-triples, and hard to read. If you have jena, try adding it to your classpath and using rdfa.parse instead:

$ java -cp '*:/path/to/jena/lib/*' rdfa.parse --format HTML http://www.slideshare.net/intdiabetesfed/world-diabetes-day-2009
@prefix dc:      <http://purl.org/dc/terms/> .
@prefix hx:      <http://purl.org/NET/hinclude> .
... nice turtle output ...

Java Use

To use the parser directly, without the assistance of an RDF toolkit (a bold choice), implement a StatementSink to collect the triples, then ask the ParserFactory for a reader:

XMLReader reader = ParserFactory.createReaderForFormat(sink, Format.XHTML); // or HTML, still an XMLReader
reader.parse(source); // Your sink will be sent triples
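
As a rough illustration of what "a sink that collects triples" might look like, here is a minimal sketch. It does not use the real StatementSink interface: `TripleSink` is a hypothetical stand-in declared locally so the example compiles on its own, and its method names and parameters are assumptions. Check the Java api documentation for the methods the library actually requires. The sketch also assumes blank nodes arrive as `_:` labels.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the library's StatementSink. The real interface
// may declare different methods; this one exists only for illustration.
interface TripleSink {
    void start();                                  // before parsing begins
    void addObject(String s, String p, String o);  // triple whose object is a resource
    void addLiteral(String s, String p, String lex, String lang, String datatype);
    void end();                                    // after parsing finishes
}

// Collects each triple as one n-triples-style line.
class CollectingSink implements TripleSink {
    final List<String> lines = new ArrayList<String>();

    // Assumption: blank nodes are passed as "_:label"; anything else is an IRI.
    private static String term(String node) {
        return node.startsWith("_:") ? node : "<" + node + ">";
    }

    public void start() { lines.clear(); }

    public void addObject(String s, String p, String o) {
        lines.add(term(s) + " <" + p + "> " + term(o) + " .");
    }

    public void addLiteral(String s, String p, String lex, String lang, String datatype) {
        // Escapes backslashes and double quotes only; a full serialiser
        // would also handle newlines and other control characters.
        String literal = "\"" + lex.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        if (lang != null) {
            literal += "@" + lang;
        } else if (datatype != null) {
            literal += "^^<" + datatype + ">";
        }
        lines.add(term(s) + " <" + p + "> " + literal + " .");
    }

    public void end() { }
}
```

A class like this, adapted to the real interface, would be what you pass as `sink` to the factory call above. Once `reader.parse(source)` returns, `lines` would hold whatever the parser emitted.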

java-rdfa can be used from jena. Simply invoke:

Class.forName("net.rootdev.javardfa.RDFaReader");

That will hook the two readers into jena; you will then be able to:

model.read(url, "XHTML"); // xml parsing
model.read(other, "HTML"); // html parsing
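
The `Class.forName` call above works by loading and initialising the named class. The registration presumably happens as a side effect of that initialisation, though that is an assumption about the implementation. The call throws if the jar is missing, so you may want to guard it. A minimal sketch follows, in which `RdfaJenaSetup` and `registerRdfaReaders` are hypothetical names, not part of java-rdfa:

```java
// Hypothetical helper; only the class name passed to Class.forName
// comes from the java-rdfa documentation.
class RdfaJenaSetup {
    // Returns true if the reader class was found and initialised,
    // false if it (or something it links against, such as jena) is missing.
    static boolean registerRdfaReaders() {
        try {
            Class.forName("net.rootdev.javardfa.RDFaReader");
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

Calling `RdfaJenaSetup.registerRdfaReaders()` once at startup, before the first `model.read(...)`, lets you fail with a clear message when it returns false.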

java-rdfa is available in the maven central repositories. Note that it does not depend on jena.

A sesame reader provided by Henry Story is also available.

Form Mode

There is a secret form mode (that prompted the development of this parser). In this mode you can generate basic graph patterns by including ?variables where curies are allowed, and INPUT tags generate @name variables.

Simple example (from the tests) and the query that results.

Changes

0.3

  • Updated to current conformance tests
  • Switched validator.nu to streaming mode (may live to regret this).
  • Created very simple n-triple and rdf/xml streaming serialisers.
  • Usual bug fixes etc.
  • Jena is now a provided maven dependency. Using java-rdfa won't pull in jena.
  • Sesame reader created by Henry Story added. It can't be added to the central maven repository since Sesame isn't available there, so it is spun out into a small module.
  • Tests for query, and some utilities.