first commit
emmettbutler committed Aug 22, 2012
0 parents commit d685678
Showing 30 changed files with 10,077 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
*.pyc
.*.swp
.*.un~
schemato_config.py
39 changes: 39 additions & 0 deletions README.md
@@ -0,0 +1,39 @@
schema.org/rNews Validator
==========================

This is a validator for a number of embedded metadata standards. It
works by reading a standard's object ontology and comparing each of a set of
tuples parsed from a document against that ontology.
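
The comparison described above can be sketched roughly as follows. This is a
minimal illustration and not mrSchemato's actual code: the `ONTOLOGY` table is
a hand-written subset, and `validate_triples` and its arguments are
hypothetical names chosen for this example.

```python
# Hypothetical sketch, NOT mrSchemato's implementation.
# ONTOLOGY is a hand-written subset used only for illustration:
# it maps a class name to the property names allowed on that class.
ONTOLOGY = {
    "Article": {"headline", "alternativeHeadline", "articleBody", "creator"},
    "Person": {"name"},
    "Organization": {"name", "tickerSymbol"},
}


def validate_triples(triples, types, ontology=ONTOLOGY):
    """Return error strings for triples that do not fit the ontology.

    triples: iterable of (subject, predicate, object) tuples
    types:   dict mapping each subject to its declared class name
    """
    errors = []
    for subject, predicate, _obj in triples:
        cls = types.get(subject)
        if cls not in ontology:
            # The declared class does not exist in the ontology.
            errors.append("%s: unknown class %r" % (subject, cls))
        elif predicate not in ontology[cls]:
            # The class exists but does not define this property.
            errors.append(
                "%s: %r is not a property of %s" % (subject, predicate, cls)
            )
    return errors


# Made-up input: one valid triple, one bad property, one bad class.
types = {"story": "Article", "author": "Persons"}
triples = [
    ("story", "headline", "Allies Are Split..."),
    ("story", "body", "Having largely succeeded..."),
    ("author", "name", "STEVEN LEE MYERS"),
]
errors = validate_triples(triples, types)
for error in errors:
    print(error)
```

A real validator would presumably load the class and property definitions from
the published ontology rather than from a hard-coded table.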

To test the validation, clone this repo and run

>>> from mrSchemato import Validator
>>> validator = Validator()
>>> validator.validate("docs/rdf.html")

This runs a validation on a correctly implemented RDFa document (rdf.html). To run
a validation on a document with errors, use one of the error test files:

``>>> validator.validate("docs/schema_errors.html")``

The full schema.org standard is now also supported. You can validate any page
that uses this standard against the RDFa ontology hosted at schema.org. To
test this, you can find an arbitrary nytimes.com article, or copy and paste
this example

``>>> validator.validate("http://www.nytimes.com/2012/07/19/world/middleeast/.....html")``

The ``docs`` directory also includes four documents for testing the validation in RDFa
and microdata, both with and without errors built in. Running the validator on
either of the correct files should yield no errors.
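
To see the kind of mistake the error files contain, the sketch below collects
every `typeof` value from a snippet of markup and reports the ones that are
not in a known set. It is a rough illustration only: `KNOWN_TYPES` is an
assumed subset taken from the types used in docs/rdf.html, and
`TypeofCollector` and `unknown_types` are hypothetical helpers, not part of
mrSchemato.

```python
from html.parser import HTMLParser

# Assumed subset of rNews class names (the types used in docs/rdf.html),
# not the full ontology.
KNOWN_TYPES = {"Article", "ImageObject", "Person", "Organization", "UserComment"}


class TypeofCollector(HTMLParser):
    """Collect the value of every typeof attribute in a document."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "typeof" and value:
                self.found.append(value)


def unknown_types(html, prefix="rnews:"):
    """Return the prefixed typeof values whose class is not in KNOWN_TYPES."""
    collector = TypeofCollector()
    collector.feed(html)
    collector.close()
    bad = []
    for value in collector.found:
        # A typeof attribute may hold several space-separated names.
        for curie in value.split():
            if curie.startswith(prefix) and curie[len(prefix):] not in KNOWN_TYPES:
                bad.append(curie)
    return bad


# Small sample modeled on the misspelled types in docs/rdf_errors.html.
sample = """
<body typeof="rnews:Article">
  <div typeof="rnews:imageObject"></div>
  <div typeof="rnews:Persons"></div>
  <div typeof="rnews:Person"></div>
</body>
"""
result = unknown_types(sample)
print(result)
```

Class names are compared case-sensitively here, so `rnews:imageObject` is
reported even though `rnews:ImageObject` is accepted.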

Hosted Service
--------------

The mrSchemato module is also incorporated into a web service that provides
a nice frontend for the validation. To test this service locally, run
``python server/schemato_web.py``. Then navigate to localhost:5000, paste
a URL into the search bar, and click "Validate" to run a validation on the document.

Running this service locally also requires Celery and RabbitMQ to be running
and properly configured.
Empty file added __init__.py
Empty file.
116 changes: 116 additions & 0 deletions docs/rdf.html
@@ -0,0 +1,116 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns:rnews="http://iptc.org/std/rNews/2011-10-07#" xmlns:og="http://ogp.me/ns#">
<head>
<meta name="og:title" content="The Title"/>
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
<meta name="parsely-page" content='{"title": "Obama gives speech on Iraq", "link": "http://nytimes.com/2152/obama-iraq","image_url": "http://nytimes.com/img/2152.jpg","type": "post","post_id": "2152","pub_date": "2011-05-25T13:00:00Z","section": "Politics","author": "Josh Jones"}'/>
</head>
<body about="http://dev.iptc.org/rnews/sample_story.html" typeof="rnews:Article">
<div>
<div>
<div property="rnews:headline">Allies Are Split...</div>
<span property="rnews:alternativeHeadline">NATO Takes Command</span>
<div rel="rnews:associatedMedia">
<div about="http://graphics8.nytimes.com/images/2011/03/25/world/africa/Policy/Policy-articleLarge.jpg"
typeof="rnews:ImageObject">
<img src="img/libya_sample_reuters.jpg"/>
<div>Credit:

<!-- Declaring the Person Object for the creator -->
<span rel="rnews:creator ">
<span about="http://blogs.reuters.com/goran-tomasevic/"
typeof="rnews:Person">
<span property="rnews:name">Goran Tomasevic</span>
</span>
</span>/
<span rel="rnews:copyrightHolder rnews:sourceOrganization rnews:provider">

<!--
Declaring the Organization Object for copyrightHolder/source/provider
-->
<span about="http://www.reuters.com"
typeof="rnews:Organization">
<span property="rnews:name">Reuters</span>
</span>
</span>
</div>
<div property="rnews:description">Rebel fighters take...</div>

<!-- Adding hidden triples to our image object. -->
</div>
</div>
<div rel="rnews:creator">
<div about="http://topics.nytimes.com/topics/reference/timestopics/people/m/steven_lee_myers/"
typeof="rnews:Person">By
<span property="rnews:name">STEVEN LEE MYERS</span>
</div>
</div>
<div>
<span property="rnews:dateline">WASHINGTON</span> |
<span property="rnews:dateCreated" content="2011-03-24">March 24, 2011</span>
</div>
<div property="rnews:articleBody">
<p>Having largely succeeded...</p>
</div>
<div>
<p>
<a href="http://www.nytimes.com/content/help/rights/copyright/copyright-notice.html"
rel="rnews:copyrightNotice">
© Copyright 2011
</a>
<span>The New York Times Company</span>
</p>
<p>
<a href="http://www.nytimes.com/ref/membercenter/help/agree.html"
rel="rnews:usageTerms">
Disclaimer
</a>
</p>
</div>
</div>
<div>
<div>
<div>Section</div>
<div property="rnews:articleSection">World</div>
</div>
<div>Tags</div>
<div>
<div rel="rnews:about">
<div>People</div>
<div about="http://data.nytimes.com/91178019641520997503" typeof="rnews:Person">
<span property="rnews:name">Qaddafi, Muammar el-</span>
</div>
</div>
</div>
<div rel="rnews:comment">
<div>Discussion</div>
<div about="http://community.nytimes.com/comments/www.nytimes.com/2011/03/25/world/africa/25policy.html?permid=4#comment4" typeof="rnews:UserComment">
<div property="rnews:commentText">So the question is...</div>
<div rel="rnews:creator">
<span about="http://timespeople.nytimes.com/view/user/27242827" typeof="rnews:Person">
<a href="http://timespeople.nytimes.com/view/user/27242827"
property="rnews:name">Chuck</a>
</span>
</div>
<div property="rnews:commentTime" content="2011-03-25T08:27:00">March 25th, 2011 8:27 am</div>
</div>
</div>
</div>
</div>
<div style="display:none">
<div property="rnews:description">The questions about the...</div>
<div property="rnews:inLanguage">en</div>
<div rel="rnews:thumbnailUrl"
href="http://graphics8.nytimes.com/images/common/icons/t_wb_75.gif"></div>
<div rel="rnews:copyrightHolder rnews:sourceOrganization rnews:provider">
<div about="http://www.nytimes.com" typeof="rnews:Organization">
<div rel="rnews:tickerSymbol">
NYSE NYT
</div>
</div>
</div>
</div>
</body>
</html>
116 changes: 116 additions & 0 deletions docs/rdf_errors.html
@@ -0,0 +1,116 @@
<html xmlns:rnews="http://iptc.org/std/rNews/2011-10-07#" xmlns:og="http://ogp.me/ns#">
<head>
<meta property="og:tit" content="The Title"/>
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
<meta name="parsely-page" content='{"title": "Obama gives speech on Iraq", "link": "/2152/obama-iraq","image_url": "http://nytimes.com/img/2152.jpg","type": "post","postid": "2152","pubdate": "2011-05-25","section": "Politics","author": "Josh Jones"}'/>
</head>
<body about="http://dev.iptc.org/rnews/sample_story.html" typeof="rnews:Article">
<div>
<div>
<div property="rnews:headline">Allies Are Split...</div>
<div property="rnews:headline">Something different!</div>
<span property="rnews:alternativeHeadline">NATO Takes Command</span>
<div rel="rnews:associatedMedia">
<div about="http://graphics8.nytimes.com/images/2011/03/25/world/africa/Policy/Policy-articleLarge.jpg"
typeof="rnews:imageObject">
<img src="img/libya_sample_reuters.jpg"/>
<div>Credit:

<!-- Declaring the Person Object for the creator -->
<span rel="rnews:creator ">
<span about="http://blogs.reuters.com/goran-tomasevic/"
typeof="rnews:Person">
<span property="rnews:name">Goran Tomasevic</span>
</span>
</span>/
<span rel="rnews:copyrightHolder rnews:sourceOrganization rnews:provider">

<!--
Declaring the Organization Object for copyrightHolder/source/provider
-->
<span about="http://www.reuters.com"
typeof="rnews:Organization">
<span property="rnews:articleBody">Reuters</span>
</span>
</span>
</div>
<div property="rnews:description">Rebel fighters take...</div>

<!-- Adding hidden triples to our image object. -->
</div>
</div>
<div rel="rnews:creator">
<div about="http://topics.nytimes.com/topics/reference/timestopics/people/m/steven_lee_myers/"
typeof="rnews:Persons">By
<span property="rnews:body">STEVEN LEE MYERS</span>
</div>
</div>
<div>
<span property="rnews:dateline">WASHINGTON</span> |
<span property="rnews:dateCreated" content="2011-03-24">March 24, 2011</span>
</div>
<div property="rnews:articleBody">
<p>Having largely succeeded...</p>
</div>
<div>
<p>
<a href="http://www.nytimes.com/content/help/rights/copyright/copyright-notice.html"
rel="copyrightNotice">
© Copyright 2011
</a>
<span>The New York Times Company</span>
</p>
<p>
<a href="http://www.nytimes.com/ref/membercenter/help/agree.html"
rel="rnews:usageTerms">
Disclaimer
</a>
</p>
</div>
</div>
<div>
<div>
<div>Section</div>
<div property="rnews:articleSection">World</div>
</div>
<div>Tags</div>
<div>
<div rel="rnews:about">
<div>People</div>
<div about="http://data.nytimes.com/91178019641520997503" typeof="rnews:Person">
<span property="rnews:name">Qaddafi, Muammar el-</span>
</div>
</div>
</div>
<div rel="rnews:comment">
<div>Discussion</div>
<div about="http://community.nytimes.com/comments/www.nytimes.com/2011/03/25/world/africa/25policy.html?permid=4#comment4" typeof="rnews:comment">
<div property="rnews:commentText">So the question is...</div>
<div rel="rnews:creator">
<span about="http://timespeople.nytimes.com/view/user/27242827" typeof="rnews:Person">
<a href="http://timespeople.nytimes.com/view/user/27242827"
property="rnews:name">Chuck</a>
</span>
</div>
<div property="rnews:commentTime" content="2001-03-25T08:27:00">March 25th, 2011 8:27 am</div>
</div>
</div>
</div>
</div>
<div style="display:none">
<div property="rnews:description">The questions about the...</div>
<div property="rnews:inLanguage">en</div>
<div rel="rnews:thumbnailUrl"
href="http://graphics8.nytimes.com/images/common/icons/t_wb_75.gif"></div>
<div rel="rnews:copyrightHolder rnews:sourceOrganization rnews:provider">
<div about="http://www.nytimes.com" typeof="rnews:Organization">
<div rel="rnews:tickerSymbol">
NYSE NYT
</div>
</div>
</div>
</div>
</body>
</html>
