made changes for NUTCH-2108 and formatted the previously unformatted …#62
asitang wants to merge 1 commit into apache:trunk from asitang:NUTCH-2108
…code for this plugin
|
Hey @asitang, I was thinking about this last night, and I think we missed a few points when we discussed it previously. All of the driver creation, cleanup, and content pulling is done in lib-selenium so that the functionality can be shared across plugins. I think we can add this functionality with few (or no) changes, though. If you want to do multiple content extractions and include them in the body, the handler can do that by incrementally pulling content out of the page and appending it to (or replacing) the body of the fetched page. This effectively lets the handler return whatever subset of data it wants, and it doesn't require us to make any changes. I think that's a reasonably clean way of handling the functionality. Thoughts? |
|
Do you mean we can keep appending the new content to the driver instance and return it? |
|
Hey @asitang, if I'm remembering correctly, we were talking about wanting to pull content out of various parts of the page and append it to the body in the same interaction, correct? So in pseudocode: Wouldn't this cover the use case we were looking to handle sufficiently? Or in other words, if we want to do a bunch of interactions that generate content on a page, the workflow per interaction is:
Once all the interactions we care about are done, we append this content to the body (or even replace the body completely). So imagine a paginated table that dynamically loads content. This should handle what we're looking for, I think (again, pseudocode). Now when we process all the links coming out of this page, they'll all be coming off the page with the table. |
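A minimal sketch of the paginated-table workflow described above, not the plugin's actual code. `Page`, `PaginatedTableCollector`, and their methods are hypothetical names invented for illustration; `Page` stands in for whatever the handler would really do through the Selenium driver, and `appendToBody` here works on an HTML string rather than on a live browser.

```java
/** Hypothetical stand-in for the few browser interactions this sketch needs. */
interface Page {
    String tableHtml();     // markup of the table chunk currently displayed
    boolean hasNextPage();  // true if another chunk can still be loaded
    void clickNext();       // the interaction that loads the next chunk
}

final class PaginatedTableCollector {

    /** Interact, pull out what we care about, remember it; repeat per page. */
    static String collect(Page page, int maxPages) {
        StringBuilder collected = new StringBuilder();
        int visited = 0;
        while (visited < maxPages) {
            collected.append(page.tableHtml());
            visited++;
            if (!page.hasNextPage()) {
                break;
            }
            page.clickNext();
        }
        return collected.toString();
    }

    /** Append the collected markup just before the closing body tag. */
    static String appendToBody(String html, String extra) {
        int idx = html.lastIndexOf("</body>");
        if (idx < 0) {
            // No closing body tag found: fall back to plain concatenation.
            return html + extra;
        }
        return html.substring(0, idx) + extra + html.substring(idx);
    }
}
```

The `maxPages` cap is an assumption added so a table that never runs out of pages cannot stall the fetch.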
|
Yup, I got that part, Mike. But I don't think this is possible in Selenium: driver.appendToBody(stuffWeCareAbout) |
|
Hey, have you checked out https://selenium.googlecode.com/git/docs/api/java/org/openqa/selenium/JavascriptExecutor.html? I think it might do what we're hoping to accomplish. |
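A rough sketch of how `JavascriptExecutor` could emulate the missing `driver.appendToBody(...)`. `ScriptRunner` and `BodyAppender` are hypothetical names; `ScriptRunner` is a local stand-in that only mirrors the signature of `JavascriptExecutor.executeScript(String, Object...)` so the snippet compiles without Selenium on the classpath. The script relies on the standard DOM method `insertAdjacentHTML` and on Selenium's convention of exposing extra parameters to the script as `arguments`.

```java
/** Stand-in mirroring the signature of Selenium's JavascriptExecutor.executeScript. */
interface ScriptRunner {
    Object executeScript(String script, Object... args);
}

final class BodyAppender {

    /** Appends the first script argument as HTML at the end of the document body. */
    static final String APPEND_SCRIPT =
        "document.body.insertAdjacentHTML('beforeend', arguments[0]);";

    /** Runs the append script in the page, passing the markup as arguments[0]. */
    static void appendToBody(ScriptRunner runner, String html) {
        runner.executeScript(APPEND_SCRIPT, html);
    }
}
```

With a real driver that implements `JavascriptExecutor`, the call would presumably look like `BodyAppender.appendToBody(((JavascriptExecutor) driver)::executeScript, stuffWeCareAbout)`. Whether a later `driver.getPageSource()` reflects the appended markup may depend on the driver, so that is worth checking before relying on it.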