
first commit

0 parents commit d685678676847bf7e81824c59250d7165ed9f182 @emmett9001 emmett9001 committed Aug 22, 2012
@@ -0,0 +1,4 @@
+*.pyc
+.*.swp
+.*.un~
+schemato_config.py
@@ -0,0 +1,39 @@
+schema.org/rNews Validator
+==========================
+
+This is a validator for a number of embedded metadata standards. It
+works by reading the object ontology and comparing each of a set of
+parsed tuples from a document against this ontology.
+
+To test the validation, clone this repo and run
+
+ >>> from mrSchemato import Validator
+ >>> validator = Validator()
+ >>> validator.validate("docs/rdf.html")
+
+This runs a validation on a correctly implemented RDFa document (rdf.html). To run
+a validation on a document with errors, use one of the error test files:
+
+``>>> validator.validate("docs/schema_errors.html")``
+
+The full schema.org standard is now also supported. You can validate any page
+that uses this standard against the RDFa ontology hosted at schema.org. To
+test this, validate any nytimes.com article, or copy and paste
+this example:
+
+``>>> validator.validate("http://www.nytimes.com/2012/07/19/world/middleeast/.....html")``
+
+The ``docs`` directory also includes four documents for testing the validation in RDFa
+and microdata, both with and without errors built in. Running the validator on
+either of the correct files should yield no errors.
+
+Hosted Service
+--------------
+
+The mrSchemato module is also incorporated into a web service that provides
+a browser frontend for the validation. To test this service locally, run
+``python server/schemato_web.py``. Then navigate to localhost:5000, paste
+a URL into the search bar, and click "Validate" to run a validation on the document.
+
+Running this service locally also requires Celery and RabbitMQ to be running
+and properly configured.
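
The README above describes validation as comparing parsed tuples from a document against an ontology. The following is only a minimal sketch of that idea, not the mrSchemato implementation: the `check_triples` function, the `KNOWN_CLASSES` and `KNOWN_PROPERTIES` sets, and the example.org subjects are all hypothetical, and the vocabulary is a small hand-picked subset of rNews terms that appear in the test documents.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RNEWS = "http://iptc.org/std/rNews/2011-10-07#"

# Hypothetical, hand-picked subset of the vocabulary (illustration only).
KNOWN_CLASSES = {
    RNEWS + name
    for name in ("Article", "ImageObject", "Person", "Organization", "UserComment")
}
KNOWN_PROPERTIES = {
    RNEWS + name
    for name in ("headline", "name", "articleBody", "creator", "description")
}


def check_triples(triples):
    """Return one message per triple whose class or property is unknown."""
    errors = []
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            # A type triple is valid only if its object is a known class.
            if obj not in KNOWN_CLASSES:
                errors.append("unknown class: " + obj)
        elif predicate not in KNOWN_PROPERTIES:
            errors.append("unknown property: " + predicate)
    return errors


# Hand-written triples modelled on the error test document.
sample = [
    ("http://example.org/story", RDF_TYPE, RNEWS + "Article"),
    ("http://example.org/story", RNEWS + "headline", "Allies Are Split..."),
    ("http://example.org/author", RDF_TYPE, RNEWS + "Persons"),  # misspelled class
    ("http://example.org/author", RNEWS + "body", "STEVEN LEE MYERS"),  # unknown property
]

for message in check_triples(sample):
    print(message)
```

A real validator would load the classes and properties from the published ontology rather than from a hard-coded set, and would likely also check that each property is allowed on the subject's type.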
No changes.
@@ -0,0 +1,116 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
+<html xmlns:rnews="http://iptc.org/std/rNews/2011-10-07#" xmlns:og="http://ogp.me/ns#">
+<head>
+ <meta name="og:title" content="The Title"/>
+ <meta property="og:type" content="video.movie" />
+ <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
+ <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
+ <meta name="parsely-page" content='{"title": "Obama gives speech on Iraq", "link": "http://nytimes.com/2152/obama-iraq","image_url": "http://nytimes.com/img/2152.jpg","type": "post","post_id": "2152","pub_date": "2011-05-25T13:00:00Z","section": "Politics","author": "Josh Jones"}'/>
+</head>
+<body about="http://dev.iptc.org/rnews/sample_story.html" typeof="rnews:Article">
+<div>
+ <div>
+ <div property="rnews:headline">Allies Are Split...</div>
+ <span property="rnews:alternativeHeadline">NATO Takes Command</span>
+ <div rel="rnews:associatedMedia">
+ <div about="http://graphics8.nytimes.com/images/2011/03/25/world/africa/Policy/Policy-articleLarge.jpg"
+ typeof="rnews:ImageObject">
+ <img src="img/libya_sample_reuters.jpg"/>
+ <div>Credit:
+
+ <!-- Declaring the Person Object for the creator -->
+ <span rel="rnews:creator ">
+ <span about="http://blogs.reuters.com/goran-tomasevic/"
+ typeof="rnews:Person">
+ <span property="rnews:name">Goran Tomasevic</span>
+ </span>
+ </span>/
+ <span rel="rnews:copyrightHolder rnews:sourceOrganization rnews:provider">
+
+ <!--
+ Declaring the Organization Object for copyrightHolder/source/provider
+ -->
+ <span about="http://www.reuters.com"
+ typeof="rnews:Organization">
+ <span property="rnews:name">Reuters</span>
+ </span>
+ </span>
+ </div>
+ <div property="rnews:description">Rebel fighters take...</div>
+
+ <!-- Adding hidden triples to our image object. -->
+ </div>
+ </div>
+ <div rel="rnews:creator">
+ <div about="http://topics.nytimes.com/topics/reference/timestopics/people/m/steven_lee_myers/"
+ typeof="rnews:Person">By
+ <span property="rnews:name">STEVEN LEE MYERS</span>
+ </div>
+ </div>
+ <div>
+ <span property="rnews:dateline">WASHINGTON</span> |
+ <span property="rnews:dateCreated" content="2011-03-24">March 24, 2011</span>
+ </div>
+ <div property="rnews:articleBody">
+ <p>Having largely succeeded...</p>
+ </div>
+ <div>
+ <p>
+ <a href="http://www.nytimes.com/content/help/rights/copyright/copyright-notice.html"
+ rel="rnews:copyrightNotice">
+ © Copyright 2011
+ </a>
+ <span>The New York Times Company</span>
+ </p>
+ <p>
+ <a href="http://www.nytimes.com/ref/membercenter/help/agree.html"
+ rel="rnews:usageTerms">
+ Disclaimer
+ </a>
+ </p>
+ </div>
+ </div>
+ <div>
+ <div>
+ <div>Section</div>
+ <div property="rnews:articleSection">World</div>
+ </div>
+ <div>Tags</div>
+ <div>
+ <div rel="rnews:about">
+ <div>People</div>
+ <div about="http://data.nytimes.com/91178019641520997503" typeof="rnews:Person">
+ <span property="rnews:name">Qaddafi, Muammar el-</span>
+ </div>
+ </div>
+ </div>
+ <div rel="rnews:comment">
+ <div>Discussion</div>
+ <div about="http://community.nytimes.com/comments/www.nytimes.com/2011/03/25/world/africa/25policy.html?permid=4#comment4" typeof="rnews:UserComment">
+ <div property="rnews:commentText">So the question is...</div>
+ <div rel="rnews:creator">
+ <span about="http://timespeople.nytimes.com/view/user/27242827" typeof="rnews:Person">
+ <a href="http://timespeople.nytimes.com/view/user/27242827"
+ property="rnews:name">Chuck</a>
+ </span>
+ </div>
+ <div property="rnews:commentTime" content="2001-03-25T08:27:00">March 25th, 2011 8:27 am</div>
+ </div>
+ </div>
+ </div>
+</div>
+<div style="display:none">
+ <div property="rnews:description">The questions about the...</div>
+ <div property="rnews:inLanguage">en</div>
+ <div rel="rnews:thumbnailUrl"
+ href="http://http://graphics8.nytimes.com/images/common/icons/t_wb_75.gif"></div>
+ <div rel="rnews:copyrightHolder rnews:sourceOrganization rnews:provider">
+ <div about="http://www.nytimes.com" typeof="rnews:Organization">
+ <div rel="rnews:tickerSymbol">
+ NYSE NYT
+ </div>
+ </div>
+ </div>
+</div>
+</body>
+</html>
@@ -0,0 +1,116 @@
+<html xmlns:rnews="http://iptc.org/std/rNews/2011-10-07#" xmlns:og="http://ogp.me/ns#">
+<head>
+ <meta property="og:tit" content="The Title"/>
+ <meta property="og:type" content="video.movie" />
+ <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
+ <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
+ <meta name="parsely-page" content='{"title": "Obama gives speech on Iraq", "link": "/2152/obama-iraq","image_url": "http://nytimes.com/img/2152.jpg","type": "post","postid": "2152","pubdate": "2011-05-25","section": "Politics","author": "Josh Jones"}'/>
+</head>
+<body about="http://dev.iptc.org/rnews/sample_story.html" typeof="rnews:Article">
+<div>
+ <div>
+ <div property="rnews:headline">Allies Are Split...</div>
+ <div property="rnews:headline">Something different!</div>
+ <span property="rnews:alternativeHeadline">NATO Takes Command</span>
+ <div rel="rnews:associatedMedia">
+ <div about="http://graphics8.nytimes.com/images/2011/03/25/world/africa/Policy/Policy-articleLarge.jpg"
+ typeof="rnews:imageObject">
+ <img src="img/libya_sample_reuters.jpg"/>
+ <div>Credit:
+
+ <!-- Declaring the Person Object for the creator -->
+ <span rel="rnews:creator ">
+ <span about="http://blogs.reuters.com/goran-tomasevic/"
+ typeof="rnews:Person">
+ <span property="rnews:name">Goran Tomasevic</span>
+ </span>
+ </span>/
+ <span rel="rnews:copyrightHolder rnews:sourceOrganization rnews:provider">
+
+ <!--
+ Declaring the Organization Object for copyrightHolder/source/provider
+ -->
+ <span about="http://www.reuters.com"
+ typeof="rnews:Organization">
+ <span property="rnews:articleBody">Reuters</span>
+ </span>
+ </span>
+ </div>
+ <div property="rnews:description">Rebel fighters take...</div>
+
+ <!-- Adding hidden triples to our image object. -->
+ </div>
+ </div>
+ <div rel="rnews:creator">
+ <div about="http://topics.nytimes.com/topics/reference/timestopics/people/m/steven_lee_myers/"
+ typeof="rnews:Persons">By
+ <span property="rnews:body">STEVEN LEE MYERS</span>
+ </div>
+ </div>
+ <div>
+ <span property="rnews:dateline">WASHINGTON</span> |
+ <span property="rnews:dateCreated" content="2011-03-24">March 24, 2011</span>
+ </div>
+ <div property="rnews:articleBody">
+ <p>Having largely succeeded...</p>
+ </div>
+ <div>
+ <p>
+ <a href="http://www.nytimes.com/content/help/rights/copyright/copyright-notice.html"
+ rel="copyrightNotice">
+ © Copyright 2011
+ </a>
+ <span>The New York Times Company</span>
+ </p>
+ <p>
+ <a href="http://www.nytimes.com/ref/membercenter/help/agree.html"
+ rel="rnews:usageTerms">
+ Disclaimer
+ </a>
+ </p>
+ </div>
+ </div>
+ <div>
+ <div>
+ <div>Section</div>
+ <div property="rnews:articleSection">World</div>
+ </div>
+ <div>Tags</div>
+ <div>
+ <div rel="rnews:about">
+ <div>People</div>
+ <div about="http://data.nytimes.com/91178019641520997503" typeof="rnews:Person">
+ <span property="rnews:name">Qaddafi, Muammar el-</span>
+ </div>
+ </div>
+ </div>
+ <div rel="rnews:comment">
+ <div>Discussion</div>
+ <div about="http://community.nytimes.com/comments/www.nytimes.com/2011/03/25/world/africa/25policy.html?permid=4#comment4" typeof="rnews:comment">
+ <div property="rnews:commentText">So the question is...</div>
+ <div rel="rnews:creator">
+ <span about="http://timespeople.nytimes.com/view/user/27242827" typeof="rnews:Person">
+ <a href="http://timespeople.nytimes.com/view/user/27242827"
+ property="rnews:name">Chuck</a>
+ </span>
+ </div>
+ <div property="rnews:commentTime" content="2001-03-25T08:27:00">March 25th, 2011 8:27 am</div>
+ </div>
+ </div>
+ </div>
+</div>
+<div style="display:none">
+ <div property="rnews:description">The questions about the...</div>
+ <div property="rnews:inLanguage">en</div>
+ <div rel="rnews:thumbnailUrl"
+ href="http://http://graphics8.nytimes.com/images/common/icons/t_wb_75.gif"></div>
+ <div rel="rnews:copyrightHolder rnews:sourceOrganization rnews:provider">
+ <div about="http://www.nytimes.com" typeof="rnews:Organization">
+ <div rel="rnews:tickerSymbol">
+ NYSE NYT
+ </div>
+ </div>
+ </div>
+</div>
+</body>
+</html>
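
To see which vocabulary terms the two test documents above actually use, one option is to collect the values of their `typeof`, `property`, and `rel` attributes. The sketch below does this with the standard library's `html.parser`; it is not a full RDFa parser and not how mrSchemato reads documents, and the `TermCollector` class and `collect_terms` helper are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser


class TermCollector(HTMLParser):
    """Collect the terms used in typeof, property and rel attributes."""

    def __init__(self):
        super().__init__()
        self.terms = []  # (attribute, term) pairs in document order

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("typeof", "property", "rel") and value:
                # One attribute may hold several space-separated terms.
                for term in value.split():
                    self.terms.append((name, term))


def collect_terms(markup):
    """Parse a markup string and return the collected (attribute, term) pairs."""
    collector = TermCollector()
    collector.feed(markup)
    collector.close()
    return collector.terms


snippet = """
<div about="http://www.reuters.com" typeof="rnews:Organization">
  <span property="rnews:name">Reuters</span>
</div>
<span rel="rnews:copyrightHolder rnews:provider"></span>
"""

for attribute, term in collect_terms(snippet):
    print(attribute, term)
```

Splitting on whitespace also handles values with stray spaces, such as the `rel="rnews:creator "` attribute in both documents.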