schemato

Schemato is a validator for HTML-embedded metadata standards. It knows the location of the official schema definitions, and uses these documents as validation templates. As a contributor, you can easily subclass the base validator class to plug into this functionality.

To see the validator in action:

from schemato import Schemato
sc = Schemato("my_test.html")
res = sc.validate()
[a.to_dict() for a in res]

The first time you run schemato, it will make requests for the latest versions of the official schema definitions. Schemato will then call the validate() method of the Validator subclasses listed in settings.py.
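The dispatch described above can be pictured with a small, self-contained sketch. This is not Schemato's actual code: `BaseValidator`, `TitleValidator`, `VALIDATORS`, and `run_validators` are hypothetical names invented for illustration, and the real base class and the format of settings.py may differ.

```python
# Hypothetical sketch of a "validators listed in settings" dispatch pattern.
# None of these names come from Schemato itself.

class BaseValidator:
    """Assumed base class: subclasses override validate()."""

    def __init__(self, document):
        self.document = document

    def validate(self):
        raise NotImplementedError


class TitleValidator(BaseValidator):
    """Toy validator: reports an error if the document lacks a <title> tag."""

    def validate(self):
        if "<title>" in self.document:
            return []
        return ["missing <title>"]


# Stand-in for the list of validator classes a settings module might hold.
VALIDATORS = [TitleValidator]


def run_validators(document, validators=VALIDATORS):
    """Instantiate each listed validator and collect its findings by class name."""
    return {cls.__name__: cls(document).validate() for cls in validators}
```

In this sketch, adding a new check means writing another subclass and appending it to the list; the runner itself does not change.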

There are a few test documents available for validation in the test_documents subdirectory.

Download

Install the package from PyPI with:

pip install schemato

You can also clone this repo and take a closer look at the code and test documents:

git clone https://github.com/Parsely/schemato.git

Then, install the library with:

python setup.py install

Tests

To run the tests for Schemato:

pip install pytest
cd schemato
py.test test.py

Distiller

Schemato's distiller framework lets you implement strategies for creating a "normalized" set of metadata by mixing and matching metadata from different supported standards.

Supported so far:

parsely-page

OpenGraph

Schema.org NewsArticle

Take a look at the clean Python class definitions that describe the strategies:

https://github.com/Parsely/schemato/blob/master/schemato/distillery.py

There are two examples: one that tries parsely-page and falls back on Schema.org and OpenGraph (called ParselyDistiller), and another that tries Schema.org and falls back on OpenGraph (called NewsDistiller).

The distiller returns a clean Python dictionary that has all the extracted fields, as well as a dictionary describing which metadata standard was used to source each field. The framework is defined here:

https://github.com/Parsely/schemato/blob/master/schemato/distillers.py

Here is an example of usage:

from schemato import Schemato
from schemato.distillery import ParselyDistiller, NewsDistiller
mashable = Schemato("http://mashable.com/2012/10/17/iphone-5-supply-problems/")
ParselyDistiller(mashable).distill()

{'author': u'Seth Fiegerman',
 'image_url': u'http://5.mshcdn.com/wp-content/uploads/2012/10/iphone-lineup.jpg',
 'link': u'http://mashable.com/2012/10/17/iphone-5-supply-problems/',
 'page_type': u'post',
 'post_id': u'1432059',
 'pub_date': u'2012-10-17T11:36:40+00:00',
 'section': u'bus',
 'site': 'Mashable',
 'title': u"Apple's Manufacturing Partner Explains iPhone 5 Supply Problems"}

In this case, Mashable implements the parsely-page metadata field, which is used to source all the defined properties for this distiller.

d = NewsDistiller(mashable)
d.distill()

{'author': None,
 'id': None,
 'image_url': 'http://5.mshcdn.com/wp-content/uploads/2012/10/iphone-lineup.jpg',
 'link': 'http://mashable.com/2012/10/17/iphone-5-supply-problems/',
 'pub_date': None,
 'section': None,
 'title': "Apple's Manufacturing Partner Explains iPhone 5 Supply Problems"}

d.sources

{'author': None,
 'id': None,
 'image_url': 'og:image',
 'link': 'og:url',
 'pub_date': None,
 'section': None,
 'title': 'og:title'}

In this case, our strategy did not involve parsely-page, and instead used Schema.org and OpenGraph. Since Mashable does not implement Schema.org but does implement OpenGraph, the distiller fills in only the fields it can and leaves the rest as None. The sources property shows which fields were populated and which metadata key supplied each value.
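The fallback behavior described above can be sketched in a few lines of plain Python. This is a minimal illustration, not Schemato's implementation: the function `distill_with_fallback`, the `NEWS_STRATEGY` mapping, and the `s:`-prefixed keys are all hypothetical, and only `og:title` and `og:url` are taken from the output shown earlier.

```python
# Hypothetical sketch of a fallback strategy; not Schemato's implementation.

def distill_with_fallback(strategy, metadata):
    """Pick each field from the first candidate key that has a value.

    strategy: maps an output field to an ordered list of candidate keys.
    metadata: flat dict of extracted values keyed by standard-specific names.
    Returns (distilled, sources), both keyed by output field.
    """
    distilled, sources = {}, {}
    for field, candidates in strategy.items():
        # Default to None so missing fields still appear in both results.
        distilled[field] = None
        sources[field] = None
        for key in candidates:
            value = metadata.get(key)
            if value is not None:
                distilled[field] = value
                sources[field] = key
                break
    return distilled, sources


# Assumed strategy: prefer (made-up) Schema.org keys, then fall back on OpenGraph.
NEWS_STRATEGY = {
    "title": ["s:headline", "og:title"],
    "link": ["s:url", "og:url"],
    "author": ["s:author"],
}

# A page that only implements OpenGraph.
page = {"og:title": "Example headline", "og:url": "http://example.com/a"}
distilled, sources = distill_with_fallback(NEWS_STRATEGY, page)
```

Here `distilled` carries the values and `sources` records which key supplied each one, so a field with no available source is None in both dictionaries.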

Support

If you need help using Schemato, or have found a bug, please create an issue on the [GitHub repo](https://github.com/Parsely/schemato/issues?state=open).
