Commit
0 parents
commit d685678
Showing 30 changed files with 10,077 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
*.pyc | ||
.*.swp | ||
.*.un~ | ||
schemato_config.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
schema.org/rNews Validator | ||
========================== | ||
|
||
This is a validator for a number of embedded metadata standards. It | ||
works by reading the object ontology and comparing each of a set of | ||
parsed tuples from a document against this ontology. | ||
|
||
To test the validation, clone this repo and run: | ||
|
||
>>> from mrSchemato import Validator | ||
>>> validator = Validator() | ||
>>> validator.validate("docs/rdf.html") | ||
|
||
This will run a validation on a correctly implemented RDFa document (rdf.html). To run | ||
a validation on a document with errors, use one of the error test files: | ||
|
||
``>>> validator.validate("docs/schema_errors.html")`` | ||
|
||
The full schema.org standard is now also supported. You can validate any page | ||
that uses this standard against the RDFa ontology hosted at schema.org. To | ||
test this, you can find an arbitrary nytimes.com article, or copy and paste | ||
this example: | ||
|
||
``>>> validator.validate("http://www.nytimes.com/2012/07/19/world/middleeast/.....html")`` | ||
|
||
The ``docs`` directory also includes four documents for testing the validation in RDFa | ||
and microdata, both with and without errors built in. Running the validator on | ||
either of the correct files should yield no errors. | ||
|
||
Hosted Service | ||
-------------- | ||
|
||
The mrSchemato module is also incorporated into a web service that provides | ||
a nice frontend for the validation. To test this service locally, run | ||
``python server/schemato_web.py``. Then navigate to localhost:5000, paste | ||
a URL into the search bar, and click "Validate" to run a validation on the document. | ||
|
||
Running this service locally also requires Celery and RabbitMQ to be running | ||
and properly configured. |
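
The README says the validator compares parsed tuples from a document against an ontology. The sketch below illustrates that idea only and is not the mrSchemato implementation. The `ONTOLOGY` dict, its contents, and the `check_triples` function are hypothetical; the dict is a hand-written fragment covering a few rNews terms that appear in the test documents, whereas a real validator would presumably load the published ontology.

```python
# Hypothetical, hand-written ontology fragment: class name -> allowed properties.
# This is an assumption for illustration, not the real rNews ontology.
ONTOLOGY = {
    "Article": {"headline", "alternativeHeadline", "articleBody", "creator"},
    "Person": {"name"},
    "Organization": {"name", "tickerSymbol"},
}


def check_triples(triples):
    """Return one error string per (class, property, value) tuple that
    does not match the ontology fragment above."""
    errors = []
    for cls, prop, _value in triples:
        if cls not in ONTOLOGY:
            errors.append(f"unknown class: {cls}")
        elif prop not in ONTOLOGY[cls]:
            errors.append(f"{prop} is not a property of {cls}")
    return errors


# Tuples modelled on the error-free and the error test documents.
good = [
    ("Article", "headline", "Allies Are Split..."),
    ("Person", "name", "Goran Tomasevic"),
]
bad = [
    ("Persons", "body", "STEVEN LEE MYERS"),
    ("Organization", "articleBody", "Reuters"),
]

print(check_triples(good))
print(check_triples(bad))
```

Keeping the ontology as a plain mapping makes each check a set lookup; how the actual module represents the ontology is not shown in this part of the commit.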
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd"> | ||
<html xmlns:rnews="http://iptc.org/std/rNews/2011-10-07#" xmlns:og="http://ogp.me/ns#"> | ||
<head> | ||
<meta name="og:title" content="The Title"/> | ||
<meta property="og:type" content="video.movie" /> | ||
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" /> | ||
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" /> | ||
<meta name="parsely-page" content='{"title": "Obama gives speech on Iraq", "link": "http://nytimes.com/2152/obama-iraq","image_url": "http://nytimes.com/img/2152.jpg","type": "post","post_id": "2152","pub_date": "2011-05-25T13:00:00Z","section": "Politics","author": "Josh Jones"}'/> | ||
</head> | ||
<body about="http://dev.iptc.org/rnews/sample_story.html" typeof="rnews:Article"> | ||
<div> | ||
<div> | ||
<div property="rnews:headline">Allies Are Split...</div> | ||
<span property="rnews:alternativeHeadline">NATO Takes Command</span> | ||
<div rel="rnews:associatedMedia"> | ||
<div about="http://graphics8.nytimes.com/images/2011/03/25/world/africa/Policy/Policy-articleLarge.jpg" | ||
typeof="rnews:ImageObject"> | ||
<img src="img/libya_sample_reuters.jpg"/> | ||
<div>Credit: | ||
|
||
<!-- Declaring the Person Object for the creator --> | ||
<span rel="rnews:creator "> | ||
<span about="http://blogs.reuters.com/goran-tomasevic/" | ||
typeof="rnews:Person"> | ||
<span property="rnews:name">Goran Tomasevic</span> | ||
</span> | ||
</span>/ | ||
<span rel="rnews:copyrightHolder rnews:sourceOrganization rnews:provider"> | ||
|
||
<!-- | ||
Declaring the Organization Object for copyrightHolder/source/provider | ||
--> | ||
<span about="http://www.reuters.com" | ||
typeof="rnews:Organization"> | ||
<span property="rnews:name">Reuters</span> | ||
</span> | ||
</span> | ||
</div> | ||
<div property="rnews:description">Rebel fighters take...</div> | ||
|
||
<!-- Adding hidden triples to our image object. --> | ||
</div> | ||
</div> | ||
<div rel="rnews:creator"> | ||
<div about="http://topics.nytimes.com/topics/reference/timestopics/people/m/steven_lee_myers/" | ||
typeof="rnews:Person">By | ||
<span property="rnews:name">STEVEN LEE MYERS</span> | ||
</div> | ||
</div> | ||
<div> | ||
<span property="rnews:dateline">WASHINGTON</span> | | ||
<span property="rnews:dateCreated" content="2011-03-24">March 24, 2011</span> | ||
</div> | ||
<div property="rnews:articleBody"> | ||
<p>Having largely succeeded...</p> | ||
</div> | ||
<div> | ||
<p> | ||
<a href="http://www.nytimes.com/content/help/rights/copyright/copyright-notice.html" | ||
rel="rnews:copyrightNotice"> | ||
© Copyright 2011 | ||
</a> | ||
<span>The New York Times Company</span> | ||
</p> | ||
<p> | ||
<a href="http://www.nytimes.com/ref/membercenter/help/agree.html" | ||
rel="rnews:usageTerms"> | ||
Disclaimer | ||
</a> | ||
</p> | ||
</div> | ||
</div> | ||
<div> | ||
<div> | ||
<div>Section</div> | ||
<div property="rnews:articleSection">World</div> | ||
</div> | ||
<div>Tags</div> | ||
<div> | ||
<div rel="rnews:about"> | ||
<div>People</div> | ||
<div about="http://data.nytimes.com/91178019641520997503" typeof="rnews:Person"> | ||
<span property="rnews:name">Qaddafi, Muammar el-</span> | ||
</div> | ||
</div> | ||
</div> | ||
<div rel="rnews:comment"> | ||
<div>Discussion</div> | ||
<div about="http://community.nytimes.com/comments/www.nytimes.com/2011/03/25/world/africa/25policy.html?permid=4#comment4" typeof="rnews:UserComment"> | ||
<div property="rnews:commentText">So the question is...</div> | ||
<div rel="rnews:creator"> | ||
<span about="http://timespeople.nytimes.com/view/user/27242827" typeof="rnews:Person"> | ||
<a href="http://timespeople.nytimes.com/view/user/27242827" | ||
property="rnews:name">Chuck</a> | ||
</span> | ||
</div> | ||
<div property="rnews:commentTime" content="2001-03-25T08:27:00">March 25th, 2011 8:27 am</div> | ||
</div> | ||
</div> | ||
</div> | ||
</div> | ||
<div style="display:none"> | ||
<div property="rnews:description">The questions about the...</div> | ||
<div property="rnews:inLanguage">en</div> | ||
<div rel="rnews:thumbnailUrl" | ||
href="http://http://graphics8.nytimes.com/images/common/icons/t_wb_75.gif"></div> | ||
<div rel="rnews:copyrightHolder rnews:sourceOrganization rnews:provider"> | ||
<div about="http://www.nytimes.com" typeof="rnews:Organization"> | ||
<div rel="rnews:tickerSymbol"> | ||
NYSE NYT | ||
</div> | ||
</div> | ||
</div> | ||
</div> | ||
</body> | ||
</html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
<html xmlns:rnews="http://iptc.org/std/rNews/2011-10-07#" xmlns:og="http://ogp.me/ns#"> | ||
<head> | ||
<meta property="og:tit" content="The Title"/> | ||
<meta property="og:type" content="video.movie" /> | ||
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" /> | ||
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" /> | ||
<meta name="parsely-page" content='{"title": "Obama gives speech on Iraq", "link": "/2152/obama-iraq","image_url": "http://nytimes.com/img/2152.jpg","type": "post","postid": "2152","pubdate": "2011-05-25","section": "Politics","author": "Josh Jones"}'/> | ||
</head> | ||
<body about="http://dev.iptc.org/rnews/sample_story.html" typeof="rnews:Article"> | ||
<div> | ||
<div> | ||
<div property="rnews:headline">Allies Are Split...</div> | ||
<div property="rnews:headline">Something different!</div> | ||
<span property="rnews:alternativeHeadline">NATO Takes Command</span> | ||
<div rel="rnews:associatedMedia"> | ||
<div about="http://graphics8.nytimes.com/images/2011/03/25/world/africa/Policy/Policy-articleLarge.jpg" | ||
typeof="rnews:imageObject"> | ||
<img src="img/libya_sample_reuters.jpg"/> | ||
<div>Credit: | ||
|
||
<!-- Declaring the Person Object for the creator --> | ||
<span rel="rnews:creator "> | ||
<span about="http://blogs.reuters.com/goran-tomasevic/" | ||
typeof="rnews:Person"> | ||
<span property="rnews:name">Goran Tomasevic</span> | ||
</span> | ||
</span>/ | ||
<span rel="rnews:copyrightHolder rnews:sourceOrganization rnews:provider"> | ||
|
||
<!-- | ||
Declaring the Organization Object for copyrightHolder/source/provider | ||
--> | ||
<span about="http://www.reuters.com" | ||
typeof="rnews:Organization"> | ||
<span property="rnews:articleBody">Reuters</span> | ||
</span> | ||
</span> | ||
</div> | ||
<div property="rnews:description">Rebel fighters take...</div> | ||
|
||
<!-- Adding hidden triples to our image object. --> | ||
</div> | ||
</div> | ||
<div rel="rnews:creator"> | ||
<div about="http://topics.nytimes.com/topics/reference/timestopics/people/m/steven_lee_myers/" | ||
typeof="rnews:Persons">By | ||
<span property="rnews:body">STEVEN LEE MYERS</span> | ||
</div> | ||
</div> | ||
<div> | ||
<span property="rnews:dateline">WASHINGTON</span> | | ||
<span property="rnews:dateCreated" content="2011-03-24">March 24, 2011</span> | ||
</div> | ||
<div property="rnews:articleBody"> | ||
<p>Having largely succeeded...</p> | ||
</div> | ||
<div> | ||
<p> | ||
<a href="http://www.nytimes.com/content/help/rights/copyright/copyright-notice.html" | ||
rel="copyrightNotice"> | ||
© Copyright 2011 | ||
</a> | ||
<span>The New York Times Company</span> | ||
</p> | ||
<p> | ||
<a href="http://www.nytimes.com/ref/membercenter/help/agree.html" | ||
rel="rnews:usageTerms"> | ||
Disclaimer | ||
</a> | ||
</p> | ||
</div> | ||
</div> | ||
<div> | ||
<div> | ||
<div>Section</div> | ||
<div property="rnews:articleSection">World</div> | ||
</div> | ||
<div>Tags</div> | ||
<div> | ||
<div rel="rnews:about"> | ||
<div>People</div> | ||
<div about="http://data.nytimes.com/91178019641520997503" typeof="rnews:Person"> | ||
<span property="rnews:name">Qaddafi, Muammar el-</span> | ||
</div> | ||
</div> | ||
</div> | ||
<div rel="rnews:comment"> | ||
<div>Discussion</div> | ||
<div about="http://community.nytimes.com/comments/www.nytimes.com/2011/03/25/world/africa/25policy.html?permid=4#comment4" typeof="rnews:comment"> | ||
<div property="rnews:commentText">So the question is...</div> | ||
<div rel="rnews:creator"> | ||
<span about="http://timespeople.nytimes.com/view/user/27242827" typeof="rnews:Person"> | ||
<a href="http://timespeople.nytimes.com/view/user/27242827" | ||
property="rnews:name">Chuck</a> | ||
</span> | ||
</div> | ||
<div property="rnews:commentTime" content="2001-03-25T08:27:00">March 25th, 2011 8:27 am</div> | ||
</div> | ||
</div> | ||
</div> | ||
</div> | ||
<div style="display:none"> | ||
<div property="rnews:description">The questions about the...</div> | ||
<div property="rnews:inLanguage">en</div> | ||
<div rel="rnews:thumbnailUrl" | ||
href="http://http://graphics8.nytimes.com/images/common/icons/t_wb_75.gif"></div> | ||
<div rel="rnews:copyrightHolder rnews:sourceOrganization rnews:provider"> | ||
<div about="http://www.nytimes.com" typeof="rnews:Organization"> | ||
<div rel="rnews:tickerSymbol"> | ||
NYSE NYT | ||
</div> | ||
</div> | ||
</div> | ||
</div> | ||
</body> | ||
</html> |
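
The second test document differs from the first in deliberate ways, for example `typeof="rnews:Persons"` and `typeof="rnews:imageObject"`. As a rough sketch of how such misspelled class names could be spotted, the code below collects `typeof` values with Python's standard `html.parser` and reports any `rnews:` term not in a known set. The `KNOWN_TYPES` set is an assumption built from the class names used in the error-free sample document, and `TypeofCollector` and `unknown_types` are hypothetical names; none of this is taken from the validator itself.

```python
from html.parser import HTMLParser

# Assumption: class names copied from the error-free sample document.
# A real validator would read them from the rNews ontology instead.
KNOWN_TYPES = {"Article", "ImageObject", "Person", "Organization", "UserComment"}


class TypeofCollector(HTMLParser):
    """Collect rnews: typeof terms whose local name is not in KNOWN_TYPES."""

    def __init__(self):
        super().__init__()
        self.unknown = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "typeof" and value:
                # typeof may hold several space-separated terms.
                for term in value.split():
                    prefix, _, local = term.partition(":")
                    if prefix == "rnews" and local not in KNOWN_TYPES:
                        self.unknown.append(term)


def unknown_types(html_text):
    """Return the unrecognized rnews: typeof terms in document order."""
    collector = TypeofCollector()
    collector.feed(html_text)
    return collector.unknown


snippet = (
    '<div typeof="rnews:Persons">'
    '<span typeof="rnews:Person"></span>'
    '<div typeof="rnews:imageObject"></div>'
    "</div>"
)
print(unknown_types(snippet))
```

The comparison is case-sensitive on purpose, so `imageObject` is reported even though `ImageObject` is known.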