[feature] Add configurable delay to WebDriver interface to allow pages to fully load before extracting text #214

Closed
cedilla1312 opened this issue Sep 3, 2021 · 11 comments · Fixed by #606
Labels
enhancement New feature or request webdriver

Comments

@cedilla1312

cedilla1312 commented Sep 3, 2021

Some sites take a while to load their JavaScript content (e.g. an e-shop with content-filtering JS code). Sometimes the page is captured before the JS runs and sometimes after, so I constantly get change detections that are false positives. Do I have to increase the wait time myself and then build my own custom Docker image?
Thank you so much for your help and for the tool.
Thanks.

@cedilla1312 cedilla1312 changed the title from "How to tell Selenium Webdriver to wait 10secs (guard interval) to scan for change?" to "How to tell Selenium Webdriver to wait 10secs (guard interval) before scanning for change?" Sep 3, 2021
@dgtlmoon
Owner

dgtlmoon commented Sep 3, 2021

I understand what you're saying, but are you asking how to do it, or are you asking whether it should be a feature? That's the part I don't understand.

@cedilla1312
Author

cedilla1312 commented Sep 4, 2021

> I understand what you're saying, but are you asking how to do it, or are you asking if it should be a feature? this I dont understand

It definitely should be a feature. [FEATURE REQUEST]
Also, per-watch time scheduling (similar to Sken.io or Calendly) would be beneficial, since no changes are made at night. I may try changing the wait constant to an environment variable instead of a hard-coded value, because my false-positive rate is increasing. I knew it, you're either Czech or Slovak: I pedantically skimmed the whole Reddit thread and found a screenshot in a different language.

Do you know of any free cloud proxy service with a limited data allowance? Thank you.

@dgtlmoon dgtlmoon changed the title from "How to tell Selenium Webdriver to wait 10secs (guard interval) before scanning for change?" to "Add delay to WebDriver interface to allow pages to fully load before extracting text" Sep 5, 2021
@dgtlmoon dgtlmoon added the enhancement New feature or request label Sep 5, 2021
@dgtlmoon
Owner

dgtlmoon commented Sep 5, 2021

I changed the title to something more specific that reflects the feature you're asking for, rather than one that reads like a support request asking whether it can be done :)

@dgtlmoon
Owner

dgtlmoon commented Sep 5, 2021

> Also, time scheduling (similar to calendly.com) per watch url would be beneficial. Since, at night no changes are being made.

This is already covered in #164 if you skim the open issues

@dgtlmoon dgtlmoon changed the title from "Add delay to WebDriver interface to allow pages to fully load before extracting text" to "[feature] Add delay to WebDriver interface to allow pages to fully load before extracting text" Sep 5, 2021
@dgtlmoon dgtlmoon changed the title from "[feature] Add delay to WebDriver interface to allow pages to fully load before extracting text" to "[feature] Add configurable delay to WebDriver interface to allow pages to fully load before extracting text" Sep 5, 2021
@dgtlmoon
Owner

dgtlmoon commented Sep 5, 2021

Also, something smarter than a fixed delay/wait would be better, like using JS to wait until all DOM events have fired, or something along those lines.
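
As a rough illustration of the "wait for the DOM via JS" idea, here is a minimal Python sketch. It is not code from changedetection.io: `wait_for_dom_ready`, its `timeout` and `poll_interval` parameters are hypothetical names. The only WebDriver call it assumes is `execute_script`, which Selenium drivers provide, used here to poll `document.readyState`.

```python
import time


def wait_for_dom_ready(driver, timeout=10.0, poll_interval=0.25):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    `driver` is any object with a Selenium-style execute_script(script) method.
    Returns True if the page reported 'complete' in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        # execute_script runs JavaScript in the page and returns its result.
        state = driver.execute_script("return document.readyState")
        if state == "complete":
            return True
        if time.monotonic() >= deadline:
            # Give up; the caller can decide whether to extract text anyway.
            return False
        time.sleep(poll_interval)
```

Note that `readyState == "complete"` only means the initial load finished. Content rendered later by scripts (for example after an XHR call) may still be missing, which is one reason a plain delay can remain useful as a fallback.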

@IImtt

IImtt commented Nov 7, 2021

> Also, better than a delay/wait is something smarter, like using JS and waiting for all DOM events to be loaded or something

That sounds simple, clever and effective. +1

@cedilla1312
Author

cedilla1312 commented Nov 7, 2021

> Also, better than a delay/wait is something smarter, like using JS and waiting for all DOM events to be loaded or something

Yes, the source code also has a one-line comment about it, and @dgtlmoon mentioned it in another post as well.

For example, when I watch a site with changedetection.io, sometimes the DOM is fully loaded; then, minutes later, the app finds no .product-box__availability element on the same site (the diff is blank), and after that it again detects what it detected at first. As a result, I keep receiving back-and-forth notifications about changes that are not relevant.
Is there any way to prevent this? When I open the site manually I have no problem: the site loads in less than 5 seconds (though this is how it currently works in changedetection.io). I think I have a limit of 4 concurrent WebDriver Chrome sessions and I watch many websites; could this be the cause? This "bug" happens on all the sites I watch for changes as well.

To reproduce:
CD.io version: v0.39
Site: https://www.nay.sk/graficke-karty/velkost-pamate_8
CSS/JSON filter: .product-box__availability

Does anybody know how to solve this so that I don't get false positives? Should I increase the delay manually in the source code? It might not help.

@dgtlmoon
Owner

image

Maybe something like this? The JS options seem pretty tricky and unreliable at times; maybe adding something like a "domloaded" JS event that triggers it, or... But in the end, just a delay would help in most situations and be a simple solution.

@dgtlmoon
Owner

If you get a blank entry, you can also use a text filter, so that a change isn't detected until the regex filter finds a number.
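
To make the regex-filter idea concrete, here is a small Python sketch of the logic only. It is not how changedetection.io implements its filters; `is_meaningful_change` and `required_pattern` are hypothetical names, and the sample strings are made up for illustration.

```python
import re


def is_meaningful_change(previous_text, current_text, required_pattern=r"\d+"):
    """Report a change only when the new text matches the pattern AND differs.

    A blank or not-yet-rendered snapshot does not match the pattern, so it is
    ignored instead of being reported as a change.
    """
    if re.search(required_pattern, current_text) is None:
        # Nothing that looks like real content yet (e.g. an empty diff).
        return False
    return current_text.strip() != previous_text.strip()


# Example usage with made-up availability strings:
print(is_meaningful_change("In stock: 3 pcs", ""))                 # blank snapshot
print(is_meaningful_change("In stock: 3 pcs", "In stock: 5 pcs"))  # real change
```

The trade-off is that a genuine change to text without a number (say, "Sold out") would also be ignored, so the pattern has to fit the content being watched.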

@dgtlmoon
Owner

time.sleep(int(os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY", 5)))

There's a new environment variable available, WEBDRIVER_DELAY_BEFORE_CONTENT_READY: the delay in seconds, defaulting to 5.
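
For anyone adapting this, the sketch below shows one way to read the same variable more defensively. It is an illustration, not the project's code: the helper names `get_webdriver_delay` and `wait_before_extracting` are hypothetical, and falling back to the default on a non-integer value (rather than raising, as the one-liner above would) is an assumption about the desired behavior.

```python
import os
import time


def get_webdriver_delay(default=5):
    """Return the delay in seconds from WEBDRIVER_DELAY_BEFORE_CONTENT_READY.

    Falls back to `default` when the variable is unset or not an integer,
    and clamps negative values to 0.
    """
    raw = os.getenv("WEBDRIVER_DELAY_BEFORE_CONTENT_READY")
    if raw is None:
        return default
    try:
        delay = int(raw)
    except ValueError:
        # Assumption: a malformed value should not crash the fetch.
        return default
    return max(delay, 0)


def wait_before_extracting(sleep=time.sleep):
    """Sleep for the configured delay and return how long was requested."""
    delay = get_webdriver_delay()
    sleep(delay)
    return delay
```

Passing `sleep` as a parameter is only there so the wait can be swapped out in tests; in normal use `wait_before_extracting()` simply sleeps for the configured number of seconds.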

@dgtlmoon
Owner

Also handy: #608

3 participants