made changes for NUTCH-2108 and formatted the previously unformatted …#62

Closed
asitang wants to merge 1 commit into apache:trunk from asitang:NUTCH-2108

Conversation

@asitang

asitang commented Sep 21, 2015

…code for this plugin

@MJJoyce
Member

MJJoyce commented Sep 22, 2015

Hey @asitang,

I was thinking about this last night. I think we may have missed a few points when we were talking about this previously. All the Driver creation, clean up, and content pulling are done in lib-selenium so we can use that functionality across plugins. I think we can add this functionality without many (or any) changes though.

If you want to do multiple content extractions and include them with the body, the handler can do that by incrementally pulling content out of the page and appending it to (or replacing) the body of the fetched page. This effectively allows the handler to return whatever subset of data it wants, and it doesn't require us to make any changes. I think that's probably a reasonably clean way of handling the functionality.

Thoughts?

@asitang
Author

asitang commented Sep 23, 2015

Do you mean we can keep appending the new content to the driver instance and return it?

@MJJoyce
Member

MJJoyce commented Sep 23, 2015

Hey @asitang,

If I'm remembering correctly, we were talking about wanting to pull content out of various parts of the page and append that to the body in the same interaction, correct? So in pseudocode:

public void processDriver(WebDriver driver) {
  String stuffWeCareAbout = ""
  for allInteractionsWeNeedToDo {
    driver.doInteraction()
    stuffWeCareAbout += fetchHTMLFromTheInteractionWeDid()
  }
  driver.appendToBody(stuffWeCareAbout)
}

Wouldn't this cover the use case we were looking to handle sufficiently? Or in other words, if we want to do a bunch of interactions that generate content on a page the workflow per-interaction is:

  • Do the interaction on the driver
  • Grab the content this generates that we care about and save it into a variable
  • Undo the interaction if necessary

Once all the interactions we care about are done, we append this content to the body (or completely replace the body even).

So imagine a paginated table that dynamically loads content. I think this should handle what we're looking for (again, pseudocode):

public void processDriver(WebDriver driver) {
  String paginatedTableContent = ""
  for tableInteractions {
    if (! onFirstTablePage)
      driver.clickPaginationButton()

    paginatedTableContent += driver.table.innerHTML
  }
  driver.appendToBody(paginatedTableContent)
}

Now when we process all the links coming out of this page they'll all be coming off the page with the table.
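The per-interaction workflow above (do the interaction, grab the content, move on) can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `TablePage` and `FakeTablePage` are hypothetical stand-ins invented here so the loop runs without a browser. They are not part of Selenium or lib-selenium, and a real handler would back these methods with WebDriver calls.

```java
import java.util.Arrays;
import java.util.List;

public class PaginatedTableSketch {

    /** Hypothetical view of the page: not a Selenium or Nutch type. */
    public interface TablePage {
        boolean hasNextPage();
        void clickNextPage();   // the "interaction"
        String tableHtml();     // the content the interaction produced
    }

    /** Visits every table page and concatenates the HTML seen on each one. */
    public static String collectTableContent(TablePage page) {
        StringBuilder collected = new StringBuilder();
        collected.append(page.tableHtml());   // first page needs no click
        while (page.hasNextPage()) {
            page.clickNextPage();
            collected.append(page.tableHtml());
        }
        return collected.toString();
    }

    /** In-memory fake (assumption for illustration) backed by a list of pages. */
    public static class FakeTablePage implements TablePage {
        private final List<String> pages;
        private int index = 0;

        public FakeTablePage(List<String> pages) {
            this.pages = pages;
        }

        @Override public boolean hasNextPage() { return index < pages.size() - 1; }
        @Override public void clickNextPage() { index++; }
        @Override public String tableHtml() { return pages.get(index); }
    }

    public static void main(String[] args) {
        TablePage page = new FakeTablePage(
            Arrays.asList("<tr><td>a</td></tr>", "<tr><td>b</td></tr>"));
        System.out.println(collectTableContent(page));
    }
}
```

The accumulated string is what would then be appended to (or substituted for) the page body once all interactions are done.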

@asitang
Author

asitang commented Sep 24, 2015

Yup, I got that part, Mike. But I don't think this is possible in Selenium: driver.appendToBody(stuffWeCareAbout)

@MJJoyce
Member

MJJoyce commented Sep 24, 2015

Hey, have you checked out JavascriptExecutor? https://selenium.googlecode.com/git/docs/api/java/org/openqa/selenium/JavascriptExecutor.html

I think it might do what we're hoping to accomplish.
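As a rough sketch of how that could stand in for the `driver.appendToBody(...)` call from the pseudocode: Selenium's `JavascriptExecutor` exposes `executeScript(String script, Object... args)`, and the arguments are visible to the script as `arguments[0]`, `arguments[1]`, and so on. To keep the example runnable without Selenium on the classpath, the `ScriptRunner` interface below is a hypothetical stand-in with the same method shape, and `appendToBody` is a helper name made up here. Whether lib-selenium would then pick up the modified DOM is an assumption that has not been verified.

```java
public class AppendToBodySketch {

    /**
     * Hypothetical stand-in mirroring the shape of Selenium's
     * JavascriptExecutor.executeScript(String, Object...).
     */
    public interface ScriptRunner {
        Object executeScript(String script, Object... args);
    }

    // In the browser, arguments[0] is the first value passed after the script.
    public static final String APPEND_SCRIPT =
        "var holder = document.createElement('div');"
        + "holder.innerHTML = arguments[0];"
        + "document.body.appendChild(holder);";

    /** Asks the runner to append the given HTML to the end of the page body. */
    public static void appendToBody(ScriptRunner runner, String html) {
        runner.executeScript(APPEND_SCRIPT, html);
    }

    public static void main(String[] args) {
        // Recording fake: prints what would be sent to the browser.
        ScriptRunner recorder = (script, scriptArgs) -> {
            System.out.println(script);
            System.out.println("arguments[0] = " + scriptArgs[0]);
            return null;
        };
        appendToBody(recorder, "<table>...</table>");
    }
}
```

With Selenium available, passing `((JavascriptExecutor) driver)::executeScript` as the runner should work for drivers that implement `JavascriptExecutor`, since the method signatures match.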

@asitang asitang closed this Sep 24, 2015