fixed lxml errors when reading Tomcat error messages. #92

atkinson · 2013-08-05T05:39:21Z

When parsing error messages pysolr assumes that Tomcat will send a certain flavour of invalid response.

Sometime in Tomcat 6 (or maybe Solr 4) the assumption that this code was based on became untrue, and so the error-handling code in pysolr began raising its own error. This may only be true when using lxml (it's required in my project, so I haven't tested without it).

This fix prevents pysolr from obscuring the Tomcat error message with its own if it fails to find the tag it's looking for.

This is what I was getting before making this fix:

  File "/home/webapps/.virtualenvs/myapp/local/lib/python2.7/site-packages/pysolr.py", line 404, in _scrape_response
    p_nodes = body_node.cssselect('p')
AttributeError: 'NoneType' object has no attribute 'cssselect'
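
The traceback is consistent with `find('body')` returning `None`, which is what `find` does in ElementTree-style APIs when no matching child exists. A minimal stand-in for that failure, assuming the standard library's `xml.etree.ElementTree` in place of lxml (so `findall` takes the place of `cssselect`) and a made-up response body rather than a captured Tomcat one:

```python
import xml.etree.ElementTree as ET

# Hypothetical error payload with no <body> element (not a real Tomcat response).
response = "<response><str name='msg'>undefined field</str></response>"

root = ET.fromstring(response)
body_node = root.find("body")  # no <body> child exists, so this is None

try:
    # Mirrors the failing line: a method call on a missing node.
    p_nodes = body_node.findall("p")
except AttributeError as exc:
    error_text = str(exc)

print(error_text)
```

The resulting message names `NoneType`, just like the traceback, and says nothing about what the server actually returned.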

acdha · 2013-08-05T13:34:52Z

I think this could be a bit cleaner: rather than trapping all AttributeErrors, we should probably just have an `if not body_node` bailout so we don't potentially mask other errors further down.

The `if reason is None or p_nodes is None:` check also seems dangerous: unlike `reason`, `p_nodes` is not certain to be defined. From a quick read of that code, I'm not sure why we need the second check - it seems like the existing `reason` check is sufficient.
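
A rough sketch of that kind of bailout, using the standard library's `xml.etree.ElementTree` as a stand-in for lxml. The helper name `extract_reason` is hypothetical and this is not pysolr's `_scrape_response`. One assumption worth flagging: the sketch checks `is None` rather than truthiness, because in ElementTree-style APIs an element's truth value has historically depended on whether it has children, so an empty `<body>` could look like a missing one.

```python
import xml.etree.ElementTree as ET


def extract_reason(markup):
    """Return the text of the first non-empty <p> under <body>, else None.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(markup)
    body_node = root.find("body")
    if body_node is None:
        # Bail out early so the caller can fall back to the raw response,
        # without a broad `except AttributeError` hiding unrelated bugs.
        return None
    for p_node in body_node.findall("p"):
        if p_node.text and p_node.text.strip():
            return p_node.text.strip()
    return None
```

With this shape, a response lacking `<body>` yields `None` instead of an `AttributeError`, while any other unexpected exception still propagates.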

atkinson · 2013-08-05T13:40:49Z

That would work too.

MaximusV · 2013-10-17T14:24:45Z

Had the exact same issue with Solr 4.5 on Tomcat 6 using lxml 3.2.3. Took acdha's point on board and just put an 'if body_node:' around the whole p_nodes section, line 431 in the diff. I'll comment in the diff where I made the change. Works fine for me.
Thanks!

MaximusV · 2013-10-17T14:27:18Z

pysolr.py

@@ -426,25 +426,28 @@ def _scrape_response(self, headers, response):
        dom_tree = None

        if server_type == 'tomcat':
-            # Tomcat doesn't produce a valid XML response
-            soup = lxml.html.fromstring(response)
-            body_node = soup.find('body')


Just put 'if body_node:' here and indented the p_nodes declaration and processing loop.

excieve · 2014-04-10T15:28:57Z

Is there any fix planned for this? Experiencing it with pysolr 3.2.0, lxml 3.3.4, Tomcat 6.0.35 and Solr 4.7.1.

marcelchastain · 2014-05-02T04:49:47Z

@acdha are we waiting on someone to make a pull request with cleaner code for this? @MaximusV has a pretty straightforward fix. Would that qualify?

acdha · 2014-05-02T11:33:34Z

pysolr.py


-            if reason is None:
+            if reason is None or p_nodes is None:


p_nodes won't be defined here if something causes that big try block to bail out before line 433. Is it even necessary to check here, however, given that such a failure would leave reason as None?
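
To make the concern concrete, here is a small stand-in (not the pysolr code; the function `check` and its `fail_early` flag are invented for illustration) where a `try` block can bail out before `p_nodes` is assigned:

```python
def check(fail_early):
    """Return (combined_check, p_nodes_is_bound) for a simulated parse."""
    reason = None
    try:
        if fail_early:
            raise ValueError("bailed out before p_nodes was assigned")
        p_nodes = []
        reason = "some message"
    except ValueError:
        pass

    # `or` short-circuits: when reason is None, p_nodes is never evaluated.
    combined = reason is None or p_nodes is None

    try:
        _ = p_nodes  # read on its own, the name may be unbound
        p_nodes_bound = True
    except UnboundLocalError:
        p_nodes_bound = False
    return combined, p_nodes_bound
```

In this sketch the early failure leaves `reason` as `None`, so the combined check never reaches `p_nodes` and the extra clause adds nothing. It would only be evaluated on a path where `reason` is set, and if `p_nodes` were unbound on such a path the check would raise rather than return.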

acdha · 2014-05-02T11:35:35Z

@marcelchastain I would like to see a cleaner patch: unless I'm missing something, this one introduces a check on the `p_nodes` variable, which isn't always defined. We need to clean that up before merging. If you wanted to go for honors, a test using a canned Tomcat error message which triggers this codepath would be much appreciated.

marcelchastain · 2014-05-02T14:53:18Z

@acdha thanks for the quick reply. I'll try to get something going later today.

frankamp · 2014-10-16T18:35:56Z

Looking at that method, and the issues surrounding lxml, it seems like a small bit of manual parsing that has simpler rules for finding an error message is a better idea and fixes all of the complaints around lxml. I've proposed an alternative in #133. It's not perfect, but for us it kills 3 deps, improves build time by 5-10 minutes and works great.

acdha · 2014-10-24T21:43:34Z

I tweaked @frankamp's patch in #133 a bit and am liking the reduced dependencies:

https://github.com/acdha/pysolr/tree/simple-error-extraction

Does anyone have some actual Tomcat error messages which we could pull into the test suite? I'm thinking that a simple regex or two to hit the most common cases and falling back to the raw HTML is better than spending time staying in sync with Tomcat, particularly since we're already passing the full response as extra logging data:
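
A minimal sketch of the regex-with-fallback idea, not the code on the branch above. The patterns, the function name `extract_error`, and the sample page are all assumptions: the sample is hand-written to resemble a Tomcat error report and is not a captured response, so the patterns would need checking against real messages.

```python
import re

# Assumed patterns, tried in order; not verified against real Tomcat output.
_PATTERNS = [
    re.compile(r"<b>message</b>\s*<u>(.*?)</u>", re.IGNORECASE | re.DOTALL),
    re.compile(r"<h1>(.*?)</h1>", re.IGNORECASE | re.DOTALL),
]

# Hand-written imitation of an error page, for illustration only.
SAMPLE_RESPONSE = (
    "<html><head><title>Error report</title></head><body>"
    "<h1>HTTP Status 400 - undefined field foo</h1>"
    "<p><b>type</b> Status report</p>"
    "<p><b>message</b> <u>undefined field foo</u></p>"
    "</body></html>"
)


def extract_error(html):
    """Return the first non-empty pattern match, else the raw response."""
    for pattern in _PATTERNS:
        match = pattern.search(html)
        if match and match.group(1).strip():
            return match.group(1).strip()
    # Nothing recognised: hand back the raw body rather than raising.
    return html


print(extract_error(SAMPLE_RESPONSE))
```

Because the fallback returns the input unchanged, an unrecognised page format degrades to a noisier message instead of a secondary exception.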

https://github.com/toastdriven/pysolr/blob/6e62fad989192d206c21e9acee28d5b1e1a8a0db/pysolr.py#L319-L321

pembo13 · 2015-01-21T15:19:40Z

What's the status of this? I'm still getting errors when trying to build an index.

acdha · 2015-01-21T15:27:20Z

@pembo13 If you can, please test the branch I referenced above – and send us some of the Tomcat error messages you're receiving so we can add them to the test suite.

domenkozar · 2015-05-27T21:07:07Z

This can be closed now; the code no longer exists.

fixed lxml errors when reading Tomcat error messages. I think assuming tomcat sends certain invalid responses is dangerous

3ff9745

MaximusV reviewed Oct 17, 2013

MaximusV mentioned this pull request Nov 21, 2013

Latest version of pysolr messes up the log output. yougov/mongo-connector#19

Closed

andreif mentioned this pull request Mar 20, 2014

Add support for error response in JSON format. Closes #108, #109 #113

Closed

acdha reviewed May 2, 2014

marcelchastain added a commit to marcelchastain/pysolr that referenced this pull request May 4, 2014

check for body_node in tomcat response, fixes django-haystack#92

64e5d3a

marcelchastain mentioned this pull request May 4, 2014

check for/handle additional tomcat error responses #117

Closed

llvtt mentioned this pull request Oct 16, 2014

replicated mongo data to solr failed yougov/mongo-connector#177

Closed

acdha closed this May 28, 2015
