
Commit

Rename parsers, update docs
Dmitry Dygalo committed Sep 22, 2015
1 parent 2340a54 commit 7313312
Showing 4 changed files with 87 additions and 16 deletions.
81 changes: 76 additions & 5 deletions README.md
@@ -110,6 +110,8 @@ class SecondChildParser(ParentParser):

### HTML & XML

For HTML- and XML-based interfaces, XPath 1.0 syntax is used to declare settings. Unfortunately, lxml does not support XPath 2.0.
XMLParser works much like HTMLParser, but uses a different lxml parser internally.
Here is an example of usage with ```requests```:

```Python
@@ -122,8 +124,29 @@ Here is an example of usage with ```requests```:
Example Domain
```

For HTML- and XML-based interfaces, XPath 1.0 syntax is used to declare settings. Unfortunately, lxml does not support XPath 2.0.
XMLParser works much like HTMLParser, but uses a different lxml parser internally.
If needed, you can run additional XPath queries at any time:

```Python
from pyanyapi import HTMLParser

>>> parser = HTMLParser({'header': 'string(.//h1/text())'})
>>> api = parser.parse('<html><body><h1>This is</h1><p>test</p></body></html>')
>>> api.header
This is
>>> api.parse('string(//p)')
test
```
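
To illustrate the idea behind declarative settings, here is a minimal stdlib-only sketch (not pyanyapi's implementation): each setting name is resolved on attribute access by running its query against the parsed tree. `MiniXPathAPI` is a hypothetical name, and `xml.etree.ElementTree` stands in for lxml, so only ElementTree's limited XPath subset works here (no `string()` function, no absolute `//` paths).

```python
import xml.etree.ElementTree as ET


class MiniXPathAPI:
    """Hypothetical sketch: maps setting names to queries over a parsed tree."""

    def __init__(self, settings, content):
        self._settings = settings
        self._root = ET.fromstring(content)

    def parse(self, query):
        # findtext returns the text of the first matching element,
        # or the default ('') when nothing matches.
        return self._root.findtext(query, default='')

    def __getattr__(self, name):
        # Called only when normal lookup fails, so each setting behaves
        # like a read-only attribute evaluated on demand.
        settings = self.__dict__.get('_settings', {})
        if name in settings:
            return self.parse(settings[name])
        raise AttributeError(name)


api = MiniXPathAPI({'header': './/h1'}, '<html><body><h1>This is</h1><p>test</p></body></html>')
print(api.header)         # This is
print(api.parse('.//p'))  # test
```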

### XML Objectify

lxml provides an interesting feature: an objectified interface for XML, which converts the whole XML document into a Python object. This parser doesn't require any settings. E.g.:

```Python
from pyanyapi import XMLObjectifyParser

>>> XMLObjectifyParser().parse('<xml><test>123</test></xml>').test
123
```
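
lxml's objectify builds that object tree for you. As a rough, stdlib-only illustration of the idea (not how lxml does it), a hypothetical `objectify` helper can turn each child element into an attribute and guess an `int` type for numeric leaf text:

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace


def objectify(element):
    """Hypothetical helper: turn an element tree into nested attribute objects."""
    children = list(element)
    if not children:
        text = (element.text or '').strip()
        # Crude type guess: text that int() accepts becomes int, else stays str.
        try:
            return int(text)
        except ValueError:
            return text
    # Repeated child tags overwrite each other in this naive version.
    return SimpleNamespace(**{child.tag: objectify(child) for child in children})


root = objectify(ET.fromstring('<xml><test>123</test></xml>'))
print(root.test)  # 123
```

Unlike this sketch, lxml keeps repeated elements accessible and preserves the underlying element API.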

### JSON

@@ -137,22 +160,53 @@ from pyanyapi import JSONParser
123
```

Or you can access values in lists by index:

```Python
from pyanyapi import JSONParser


>>> JSONParser({'second': 'container > 1'}).parse('{"container":["first", "second", "third"]}').second
second
```
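
The `'container > 1'` setting reads as a path: each `>`-separated segment is a dict key, or a list index when the current value is a list. The hypothetical `resolve` helper below sketches that lookup rule under this assumption; it is not JSONParser's actual code.

```python
import json


def resolve(data, path):
    """Hypothetical helper: follow a 'key > key > index' path through parsed JSON."""
    for part in path.split('>'):
        key = part.strip()
        # Lists are indexed by integer position; dicts by string key.
        data = data[int(key)] if isinstance(data, list) else data[key]
    return data


doc = json.loads('{"container":["first", "second", "third"]}')
print(resolve(doc, 'container > 1'))  # second
```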

### Regular Expressions Interface

When data is badly formatted or too complex to parse with the bundled tools, you can use a parser based on regular expressions.
Its settings are Python regular expressions. It is the most powerful parser, thanks to its simplicity.


```Python
from pyanyapi import RegExpResponseParser
from pyanyapi import RegExpParser

>>> RegExpResponseParser({'error_code': 'Error (\d+)'}).parse('Oh no!!! It is Error 100!!!').error_code
>>> RegExpParser({'error_code': r'Error (\d+)'}).parse('Oh no!!! It is Error 100!!!').error_code
100
```
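
Here is a plain-`re` sketch of what such a setting does, assuming the value of interest is the first capturing group of the first match (RegExpParser's behaviour for patterns with several groups or several matches may differ). `extract_first_group` is a hypothetical name. Note the raw string: `'\d'` in a normal string literal is an invalid escape sequence and triggers a warning in recent Python versions.

```python
import re


def extract_first_group(pattern, text):
    """Hypothetical helper: first capturing group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(extract_first_group(r'Error (\d+)', 'Oh no!!! It is Error 100!!!'))  # 100
```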
### Custom Interface

You can easily declare your own interface.
You can easily declare your own interface. To do so, define the ```execute_method``` method and, optionally, ```perform_parsing```.
Here is an example of a naive CSVInterface, which provides the ability to get a column value by index. You also need a separate parser that uses this interface.

```Python
from pyanyapi import BaseInterface, BaseParser


class CSVInterface(BaseInterface):

    def perform_parsing(self):
        return self.content.split(',')

    def execute_method(self, settings):
        return self.parsed_content[settings]


class CSVParser(BaseParser):
    interface_class = CSVInterface


>>> CSVParser({'second': 1}).parse('1,2,3').second
2
```
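
The split between the two methods can be pictured without pyanyapi: parse the content once, cache it, and let every setting query the cached result. In this sketch, `SketchInterface` and `CSVSketch` are hypothetical stand-ins, and `functools.cached_property` (Python 3.8+) plays the role of `parsed_content`; it is not BaseInterface's actual code.

```python
from functools import cached_property


class SketchInterface:
    """Hypothetical stand-in for BaseInterface: parse once, query many times."""

    def __init__(self, content):
        self.content = content

    @cached_property
    def parsed_content(self):
        # Evaluated on first access only, then stored on the instance.
        return self.perform_parsing()

    def perform_parsing(self):
        return self.content

    def execute_method(self, settings):
        raise NotImplementedError


class CSVSketch(SketchInterface):
    def perform_parsing(self):
        return self.content.split(',')

    def execute_method(self, settings):
        return self.parsed_content[settings]


row = CSVSketch('1,2,3')
print(row.execute_method(1))  # 2
```

A less naive CSV interface would likely use the standard `csv` module instead of `str.split`, so that quoted fields containing commas are handled.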

Extending interfaces
--------------------
@@ -215,6 +269,23 @@ Complex content parsing
### Combined parsers

When the content type is not known before parsing, you can create a combined parser, which lets you use multiple different parsers transparently.
E.g., a server usually returns JSON, but on server errors it returns HTML pages with some text. Then:

```Python
from pyanyapi import CombinedParser, HTMLParser, JSONParser

class Parser(CombinedParser):
    parsers = [
        JSONParser({'test': 'test'}),
        HTMLParser({'error': 'string(//span)'})
    ]

>>> parser = Parser()
>>> parser.parse('{"test": "Text"}').test
Text
>>> parser.parse('<body><span>123</span></body>').error
123
```
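
One way to picture the fallback behaviour is a chain of extractors tried in order until one yields a non-empty value. This is a hypothetical sketch of that idea, not CombinedParser's code; `json_test`, `html_error` and `first_result` are illustrative names, and the regex merely stands in for the XPath `string(//span)` query.

```python
import json
import re


def json_test(content):
    # Returns the 'test' key of a JSON object, or None for non-JSON input.
    try:
        data = json.loads(content)
    except ValueError:
        return None
    return data.get('test') if isinstance(data, dict) else None


def html_error(content):
    # Crude stand-in for the XPath query 'string(//span)'.
    match = re.search(r'<span>(.*?)</span>', content)
    return match.group(1) if match else None


def first_result(extractors, content):
    """Try each extractor in turn; return the first non-empty value."""
    for extract in extractors:
        value = extract(content)
        if value not in (None, ''):
            return value
    return None


extractors = [json_test, html_error]
print(first_result(extractors, '{"test": "Text"}'))               # Text
print(first_result(extractors, '<body><span>123</span></body>'))  # 123
```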

### Another example

4 changes: 2 additions & 2 deletions pyanyapi/__init__.py
@@ -3,13 +3,13 @@
Module provides tools for convenient interface creation over various types of data in declarative way.
"""
from .parsers import (
    ResponseParser,
    BaseParser,
    CombinedParser,
    HTMLParser,
    XMLParser,
    XMLObjectifyParser,
    JSONParser,
    RegExpResponseParser,
    RegExpParser,
)
from .interfaces import (
    BaseInterface,
14 changes: 7 additions & 7 deletions pyanyapi/parsers.py
@@ -14,7 +14,7 @@
from .helpers import attach_attribute, attach_cached_property


class ResponseParser(object):
class BaseParser(object):
    """
    Factory for API-like components that provide an interface to different types of content.
    """
@@ -101,7 +101,7 @@ def __and__(self, other):
        return CombinedParser(self, other)


class CombinedParser(ResponseParser):
class CombinedParser(BaseParser):
    """
    Combines multiple parsers into one. They can also be of different types.
    """
@@ -122,24 +122,24 @@ def get_interface_kwargs(self):
        return kwargs


class HTMLParser(ResponseParser):
class HTMLParser(BaseParser):
    interface_class = XPathInterface


class XMLParser(ResponseParser):
class XMLParser(BaseParser):
    interface_class = XMLInterface

    def prepare_content(self, content):
        return content.replace('encoding="UTF-8"', '').replace('encoding="utf-8"', '')


class XMLObjectifyParser(ResponseParser):
class XMLObjectifyParser(BaseParser):
    interface_class = XMLObjectifyInterface


class JSONParser(ResponseParser):
class JSONParser(BaseParser):
    interface_class = JSONInterface


class RegExpResponseParser(ResponseParser):
class RegExpParser(BaseParser):
    interface_class = RegExpInterface
4 changes: 2 additions & 2 deletions tests/conftest.py
@@ -1,12 +1,12 @@
# coding: utf-8
import pytest

from pyanyapi import HTMLParser, JSONParser, RegExpResponseParser, CombinedParser, interface_property, interface_method
from pyanyapi import HTMLParser, JSONParser, RegExpParser, CombinedParser, interface_property, interface_method


class EmptyValuesParser(CombinedParser):
    parsers = [
        RegExpResponseParser({'test': '\d,\d'}),
        RegExpParser({'test': '\d,\d'}),
        JSONParser(
            {
                'test': {
