Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let extended translators override what XPathExpr class is used #22

Merged
merged 1 commit into from
Jan 10, 2013

Conversation

dangra
Copy link
Member

@dangra dangra commented Jan 10, 2013

GenericTranslator offers an excelent way to support custom selectors
trough method hooks and allowing to return a new XPathExpr from this
hooks.

The main problem is that returning extended XPathExpr instances fail
for combiners because XPathExpr.join() assume a fixed XPathExpr
instance attributes (element, path and condition) to copy from other to self

XPathExpr.join() can be extended in subclass but needs that left
xpath instance to be of the extended class too, and right now we can
only control right xpath type.

The problem can be mitigated by recasting all xpath returned from
GenericTranslator.xpath_element() that only works because it is the
only hook that cast XPathExpr instances.

The proposed change allow projects extending GenericTranslator to also
safely extend XPathExpr to correctly support combiners in extended
features.

GenericTranslator offers an excelent way to support custom selectors
trough method hooks and allowing to return a *new* XPathExpr from this
hooks.

The main problem is that returning extended `XPathExpr` instances fail
for combiners because `XPathExpr.join()` assume a fixed XPathExpr
instance attributes (element, path and condition) to copy from `other` to `self`

`XPathExpr.join()` can be extended in subclass but needs that `left`
xpath instance to be of the extended class too, and right now we can
only control `right` xpath type.

The problem can be mitigated by recasting all xpath returned from
`GenericTranslator.xpath_element()` that only works because it is the
only hook that cast `XPathExpr` instances.

The proposed change allow projects extending GenericTranslator to also
safely extend `XPathExpr` to correctly support combiners in extended
features.
@dangra
Copy link
Member Author

dangra commented Jan 10, 2013

To give some real context to this pull request you must know I am working on adding CSS selectors to Scrapy project based on cssselect

right now CSS can't select xpath text nodes or attributes and we want to keep this functionality within the selector query

I found GenericTranslator can be extended trough hook (great!) but found some quirks trying to implement it,
see scrapy/selector/csssel.py

and example translations at scrapy/tests/test_selector_csselect.py

thanks!

@SimonSapin
Copy link
Contributor

This change looks easy enough, but I’m reluctant to encourage users to hook into the translators and even more so XPathExpr because backward-compatibility concerns effectively freeze the API.

However I am not confident at all in XPathExpr as an API to be frozen. It basically hasn’t changed yet since I inherited it from lxml.cssselect, but I think that it will need to change in order to properly implement CSS Selectors. See #4, #9, #12 and #18, among others.

I might still accept this change, but use this kind of hook at your own risks. In particular you might want to have cssselect==0.7.1 rather than just cssselect in your requirements, as 0.8 or 1.0 will probably break you program.

@dangra
Copy link
Member Author

dangra commented Jan 10, 2013

I just found pyquery uses cssselect, it extends cssselect.HTMLTranslator and XPathExpr, but pyquery devs took a monkeypatch approach https://github.com/gawel/pyquery/blob/master/pyquery/cssselectpatch.py#L230

Do you think the change is good for pyquery too @gawel?

@SimonSapin
Copy link
Contributor

Well, all of the warnings above also apply to pyquery. They should at least consider requiring cssselect<0.8.

Overall I’m not sure how to manage this whole situation.

@dangra
Copy link
Member Author

dangra commented Jan 10, 2013

hi @SimonSapin, I am more worried about cssselect removing its extensibility than breaking its API.

I took a look to lselect and there isn't a clear way on how to extend it,.
do you plan to remove all form of hooks/extensibility after cssselect 0.8?

@gawel
Copy link

gawel commented Jan 10, 2013

No problem. I'll just have to check if the class attribute exist and use the evil monkey patch iif it's needed

@dangra
Copy link
Member Author

dangra commented Jan 10, 2013

@gawel: the change is pretty simple and much better than monkeypatching, Simon is OK to merge it if we take the risk of changes to cssselect breaking our projects code. This is something we already do by extending undocumented/unstable GenericTranslator and XPathExpr classes.

@SimonSapin: sorry for insisting, my only concern on using cssselect is if you plan to completely remove its extensibility in future release, just breaking current apis doesn't matter to me.

thanks!

@SimonSapin
Copy link
Contributor

@dangra I’m not planning on removing the extensibility, just keeping it undocumented and reserve the right to break it at every minor version until I’m confident that the API is good enough to be frozen.

(By beaking I mean changing it in backward-incompatible ways, not removing features or extensibility.)

As to lselect, it’s a two-weekends hack that is still highly experimental and probably broken. I’m not sure yet if it’ll be extensible or not, but in any case it’s completely separated from cssselect. cssselect won’t go away.

@dangra
Copy link
Member Author

dangra commented Jan 10, 2013

cool 👍

SimonSapin added a commit that referenced this pull request Jan 10, 2013
Let extended translators override what XPathExpr class is used
@SimonSapin SimonSapin merged commit 8cf8755 into scrapy:master Jan 10, 2013
@SimonSapin
Copy link
Contributor

See eac05a4

jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Apr 11, 2014
Version 0.9.1
-------------

Released on 2013-10-17.

* **Backward incompatible change from 0.9**:
  :meth:`~GenericTranslator.selector_to_xpath` defaults to
  ignoring pseudo-elements,
  as it did in 0.8 and previous versions.
  (:meth:`~GenericTranslator.css_to_xpath` doesn’t change.)
* Drop official support for Python 2.4 and 3.1,
  as testing was becoming difficult.
  Nothing will break overnight,
  but future releases may on may not work on these versions.
  Older releases will remain available on PyPI.


Version 0.9
-----------

Released on 2013-10-11.

Add parser support for :attr:`functional
pseudo-elements <Selector.pseudo_element>`.

*Update:*
This version accidentally introduced a **backward incompatible** change:
:meth:`~GenericTranslator.selector_to_xpath` defaults to
rejecting pseudo-elements instead of ignoring them.


Version 0.8
-----------

Released on 2013-03-15.

Improvements:

* `#22 <https://github.com/SimonSapin/cssselect/issues/22>`_
  Let extended translators override what XPathExpr class is used
* `#19 <https://github.com/SimonSapin/cssselect/issues/19>`_
  Use the built-in ``lang()`` XPath function
  for implementing the ``:lang()`` pseudo-class
  with XML documents.
  This is probably faster than ``ancestor-or-self::``.

Bug fixes:

* `#14 <https://github.com/SimonSapin/cssselect/issues/14>`_
  Fix non-ASCII pseudo-classes. (Invalid selector instead of crash.)
* `#20 <https://github.com/SimonSapin/cssselect/issues/20>`_
  As per the spec, elements containing only whitespace are not considered empty
  for the ``:empty`` pseudo-class.


Version 0.7.1
-------------

Released on 2012-06-14. Code name *remember-to-test-with-tox*.

0.7 broke the parser in Python 2.4 and 2.5; the tests in 2.x.
Now all is well again.

Also, pseudo-elements are now correctly made lower-case. (They are supposed
to be case-insensitive.)
jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Apr 13, 2014
Version 0.9.1
-------------

Released on 2013-10-17.

* **Backward incompatible change from 0.9**:
  :meth:`~GenericTranslator.selector_to_xpath` defaults to
  ignoring pseudo-elements,
  as it did in 0.8 and previous versions.
  (:meth:`~GenericTranslator.css_to_xpath` doesn’t change.)
* Drop official support for Python 2.4 and 3.1,
  as testing was becoming difficult.
  Nothing will break overnight,
  but future releases may on may not work on these versions.
  Older releases will remain available on PyPI.


Version 0.9
-----------

Released on 2013-10-11.

Add parser support for :attr:`functional
pseudo-elements <Selector.pseudo_element>`.

*Update:*
This version accidentally introduced a **backward incompatible** change:
:meth:`~GenericTranslator.selector_to_xpath` defaults to
rejecting pseudo-elements instead of ignoring them.


Version 0.8
-----------

Released on 2013-03-15.

Improvements:

* `#22 <https://github.com/SimonSapin/cssselect/issues/22>`_
  Let extended translators override what XPathExpr class is used
* `#19 <https://github.com/SimonSapin/cssselect/issues/19>`_
  Use the built-in ``lang()`` XPath function
  for implementing the ``:lang()`` pseudo-class
  with XML documents.
  This is probably faster than ``ancestor-or-self::``.

Bug fixes:

* `#14 <https://github.com/SimonSapin/cssselect/issues/14>`_
  Fix non-ASCII pseudo-classes. (Invalid selector instead of crash.)
* `#20 <https://github.com/SimonSapin/cssselect/issues/20>`_
  As per the spec, elements containing only whitespace are not considered empty
  for the ``:empty`` pseudo-class.


Version 0.7.1
-------------

Released on 2012-06-14. Code name *remember-to-test-with-tox*.

0.7 broke the parser in Python 2.4 and 2.5; the tests in 2.x.
Now all is well again.

Also, pseudo-elements are now correctly made lower-case. (They are supposed
to be case-insensitive.)
jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Jun 11, 2014
Version 0.9.1
-------------

Released on 2013-10-17.

* **Backward incompatible change from 0.9**:
  :meth:`~GenericTranslator.selector_to_xpath` defaults to
  ignoring pseudo-elements,
  as it did in 0.8 and previous versions.
  (:meth:`~GenericTranslator.css_to_xpath` doesn’t change.)
* Drop official support for Python 2.4 and 3.1,
  as testing was becoming difficult.
  Nothing will break overnight,
  but future releases may on may not work on these versions.
  Older releases will remain available on PyPI.


Version 0.9
-----------

Released on 2013-10-11.

Add parser support for :attr:`functional
pseudo-elements <Selector.pseudo_element>`.

*Update:*
This version accidentally introduced a **backward incompatible** change:
:meth:`~GenericTranslator.selector_to_xpath` defaults to
rejecting pseudo-elements instead of ignoring them.


Version 0.8
-----------

Released on 2013-03-15.

Improvements:

* `#22 <https://github.com/SimonSapin/cssselect/issues/22>`_
  Let extended translators override what XPathExpr class is used
* `#19 <https://github.com/SimonSapin/cssselect/issues/19>`_
  Use the built-in ``lang()`` XPath function
  for implementing the ``:lang()`` pseudo-class
  with XML documents.
  This is probably faster than ``ancestor-or-self::``.

Bug fixes:

* `#14 <https://github.com/SimonSapin/cssselect/issues/14>`_
  Fix non-ASCII pseudo-classes. (Invalid selector instead of crash.)
* `#20 <https://github.com/SimonSapin/cssselect/issues/20>`_
  As per the spec, elements containing only whitespace are not considered empty
  for the ``:empty`` pseudo-class.


Version 0.7.1
-------------

Released on 2012-06-14. Code name *remember-to-test-with-tox*.

0.7 broke the parser in Python 2.4 and 2.5; the tests in 2.x.
Now all is well again.

Also, pseudo-elements are now correctly made lower-case. (They are supposed
to be case-insensitive.)
jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Oct 11, 2014
Version 0.9.1
-------------

Released on 2013-10-17.

* **Backward incompatible change from 0.9**:
  :meth:`~GenericTranslator.selector_to_xpath` defaults to
  ignoring pseudo-elements,
  as it did in 0.8 and previous versions.
  (:meth:`~GenericTranslator.css_to_xpath` doesn’t change.)
* Drop official support for Python 2.4 and 3.1,
  as testing was becoming difficult.
  Nothing will break overnight,
  but future releases may on may not work on these versions.
  Older releases will remain available on PyPI.


Version 0.9
-----------

Released on 2013-10-11.

Add parser support for :attr:`functional
pseudo-elements <Selector.pseudo_element>`.

*Update:*
This version accidentally introduced a **backward incompatible** change:
:meth:`~GenericTranslator.selector_to_xpath` defaults to
rejecting pseudo-elements instead of ignoring them.


Version 0.8
-----------

Released on 2013-03-15.

Improvements:

* `#22 <https://github.com/SimonSapin/cssselect/issues/22>`_
  Let extended translators override what XPathExpr class is used
* `#19 <https://github.com/SimonSapin/cssselect/issues/19>`_
  Use the built-in ``lang()`` XPath function
  for implementing the ``:lang()`` pseudo-class
  with XML documents.
  This is probably faster than ``ancestor-or-self::``.

Bug fixes:

* `#14 <https://github.com/SimonSapin/cssselect/issues/14>`_
  Fix non-ASCII pseudo-classes. (Invalid selector instead of crash.)
* `#20 <https://github.com/SimonSapin/cssselect/issues/20>`_
  As per the spec, elements containing only whitespace are not considered empty
  for the ``:empty`` pseudo-class.


Version 0.7.1
-------------

Released on 2012-06-14. Code name *remember-to-test-with-tox*.

0.7 broke the parser in Python 2.4 and 2.5; the tests in 2.x.
Now all is well again.

Also, pseudo-elements are now correctly made lower-case. (They are supposed
to be case-insensitive.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants