Configurable browser steps (enter text, scroll, wait for text, etc) #478

Merged
merged 120 commits into from
Nov 24, 2022
6db5781
WIP
dgtlmoon Mar 20, 2022
9f4bdaa
Merge branch 'master' into webdriver-steps
dgtlmoon Mar 26, 2022
ec15af0
WIP
dgtlmoon Mar 26, 2022
76c1b81
cleanup
dgtlmoon Mar 26, 2022
52115c2
WIP
dgtlmoon Mar 26, 2022
170ccca
WIP
dgtlmoon Mar 26, 2022
68ddbe1
WIP
dgtlmoon Mar 26, 2022
019333b
WIP
dgtlmoon Mar 27, 2022
8d8075e
set optional for now
dgtlmoon Mar 27, 2022
53a6a1f
Merge branch 'master' into webdriver-steps
dgtlmoon Apr 23, 2022
153bc26
Merge branch 'master' into webdriver-steps
dgtlmoon May 25, 2022
05912d9
fix formatting
dgtlmoon May 25, 2022
9b496c1
WIP
dgtlmoon May 25, 2022
99d043e
WIP
dgtlmoon May 25, 2022
89185f9
WIP
dgtlmoon May 25, 2022
9b0065f
Fix delay
dgtlmoon May 26, 2022
72229ba
set state
dgtlmoon May 26, 2022
2a5d54a
WIP
dgtlmoon May 26, 2022
793a7af
tweak text
dgtlmoon May 26, 2022
1ae1ec0
Make it easier to see what needs to be filled in
dgtlmoon May 26, 2022
07f5401
too hard for now
dgtlmoon May 26, 2022
8751d58
Merge branch 'master' into webdriver-steps
dgtlmoon Jun 6, 2022
facd5cd
Fix bad merge
dgtlmoon Jun 6, 2022
3bc0076
Merge branch 'master' into webdriver-steps
dgtlmoon Jun 6, 2022
8b8a536
Merge branch 'master' into webdriver-steps
dgtlmoon Jun 6, 2022
656998d
Add note
dgtlmoon Jun 6, 2022
0747b3c
Merge branch 'master' into webdriver-steps
dgtlmoon Jun 6, 2022
02f0020
Save a screenshot of each step (after)
dgtlmoon Jun 6, 2022
e7e8d2d
WIP
dgtlmoon Jun 7, 2022
8af3df0
Merge branch 'master' into webdriver-steps
dgtlmoon Jun 11, 2022
68b6ca5
Merge branch 'master' into webdriver-steps
dgtlmoon Jun 22, 2022
9068c6f
separate out
dgtlmoon Jun 22, 2022
c1dfc91
WIP
dgtlmoon Jun 22, 2022
122c1eb
WIP
dgtlmoon Jun 23, 2022
d33434d
WIP
dgtlmoon Jun 24, 2022
809b8d7
wip
dgtlmoon Jun 25, 2022
53844c0
WIP
dgtlmoon Jun 25, 2022
fd05e6b
tweaks
dgtlmoon Jun 25, 2022
513144b
WIP
dgtlmoon Jun 26, 2022
1598d0e
WIP
dgtlmoon Jun 26, 2022
250e45c
WIP
dgtlmoon Jun 26, 2022
1965647
WIP
dgtlmoon Jun 28, 2022
10cf1d1
UI cleanups
dgtlmoon Aug 1, 2022
006902d
fix apply/clear
dgtlmoon Aug 1, 2022
0f34e02
WIP
dgtlmoon Aug 1, 2022
9c4223b
WIP
dgtlmoon Aug 1, 2022
cfbf2ea
tweak
dgtlmoon Aug 2, 2022
e4d8926
UI cleanups
dgtlmoon Aug 2, 2022
6b03c53
Merge branch 'master' into webdriver-steps
dgtlmoon Aug 2, 2022
819dc80
Oops
dgtlmoon Aug 2, 2022
6d51df3
bump ignore
dgtlmoon Aug 2, 2022
ef7185d
tweaks
dgtlmoon Aug 2, 2022
4253d89
UI tweaks
dgtlmoon Aug 2, 2022
fe52032
misc tweaks
dgtlmoon Aug 2, 2022
6f88885
Merge branch 'master' into webdriver-steps
dgtlmoon Aug 2, 2022
2f16fa8
WIP
dgtlmoon Aug 3, 2022
ac617ed
more cleanups
dgtlmoon Aug 3, 2022
4e2dd09
loads of improvements
dgtlmoon Aug 3, 2022
4035ea9
More UI cleanups
dgtlmoon Aug 4, 2022
e0b8c14
WIP
dgtlmoon Aug 4, 2022
6920988
Merge branch 'master' into webdriver-steps
dgtlmoon Aug 5, 2022
1369146
prune locks
dgtlmoon Aug 5, 2022
138c4ca
Merge branch 'master' into webdriver-steps
dgtlmoon Nov 8, 2022
f096b0e
WIP improvements
dgtlmoon Nov 8, 2022
83d1347
WIP
dgtlmoon Nov 10, 2022
17e856c
Merge branch 'master' into webdriver-steps
dgtlmoon Nov 12, 2022
baf4747
WIP
dgtlmoon Nov 12, 2022
4be1615
WIP
dgtlmoon Nov 13, 2022
936a744
Better integration between BrowserSteps and VisualSelector
dgtlmoon Nov 13, 2022
8e4ee91
oops
dgtlmoon Nov 13, 2022
2adb5be
Not sure what happened here, css_filter upgrade didnt apply properly
dgtlmoon Nov 13, 2022
420541b
remove dupe
dgtlmoon Nov 13, 2022
a1ad2dd
remove comment
dgtlmoon Nov 13, 2022
b770e12
fix X,y click
dgtlmoon Nov 13, 2022
6b9ded0
Handle when the step doesnt work
dgtlmoon Nov 14, 2022
7d14b67
Improve error message
dgtlmoon Nov 14, 2022
89f3195
fix icon
dgtlmoon Nov 14, 2022
e815d6d
Support dynamic variables in the browser step value and selector
dgtlmoon Nov 14, 2022
81f31aa
browser steps not supported by selenium
dgtlmoon Nov 14, 2022
d6f07f6
check step was really set
dgtlmoon Nov 14, 2022
cda5aa5
jquery-3.6.0.slim.m not needed
dgtlmoon Nov 14, 2022
7d76bfd
tonnes of improvements
dgtlmoon Nov 14, 2022
6678705
Adding 'click element containing text' which also needed upgrade of p…
dgtlmoon Nov 14, 2022
8b556ef
More simple default for 'goto site'
dgtlmoon Nov 14, 2022
f254e1c
fix heavy import
dgtlmoon Nov 15, 2022
3eb66f3
Handle when playwright not enabled in the UI
dgtlmoon Nov 15, 2022
590ffd9
Moving browser-steps to its own Blueprint
dgtlmoon Nov 15, 2022
c822fa7
Use same connect as the rest of the app
dgtlmoon Nov 15, 2022
bad020e
Add comment
dgtlmoon Nov 15, 2022
64f4534
WIP on delays
dgtlmoon Nov 15, 2022
006550c
WIP
dgtlmoon Nov 15, 2022
ac1165c
Merge branch 'master' into webdriver-steps
dgtlmoon Nov 20, 2022
841d22c
Merge branch 'master' into webdriver-steps
dgtlmoon Nov 20, 2022
41b0025
resolve
dgtlmoon Nov 20, 2022
9ff3717
fix up selectors again
dgtlmoon Nov 20, 2022
c51d5c5
fix visualsecotr<>browsersteps bugs
dgtlmoon Nov 20, 2022
c4a9323
Merge branch 'webdriver-steps' of github.com:dgtlmoon/changedetection…
dgtlmoon Nov 20, 2022
fcb94ff
fix bad merge
dgtlmoon Nov 20, 2022
2db5087
Merge branch 'master' into webdriver-steps
dgtlmoon Nov 20, 2022
0de0478
Merge branch 'master' into webdriver-steps
dgtlmoon Nov 20, 2022
fdebb2d
Merge branch 'master' into webdriver-steps
dgtlmoon Nov 20, 2022
ea4e672
Only update when we are on a URL (like on first init we are not)
dgtlmoon Nov 20, 2022
bda2352
Refactor, seems to be more reliable?
dgtlmoon Nov 20, 2022
2199b74
Bring back original rule
dgtlmoon Nov 20, 2022
9209e88
Manually handle cleanup when the session time is up (because we run o…
dgtlmoon Nov 22, 2022
478ae87
WIP
dgtlmoon Nov 22, 2022
8bd4b4c
Merge branch 'master' into webdriver-steps
dgtlmoon Nov 23, 2022
b4053b4
More browser cleanup helpers
dgtlmoon Nov 23, 2022
9ee52b8
More work on session handling
dgtlmoon Nov 23, 2022
4f131a7
Adding [remove] step button
dgtlmoon Nov 23, 2022
99b844f
connection fixes
dgtlmoon Nov 23, 2022
fde0952
Improved error reporting
dgtlmoon Nov 24, 2022
b525cf8
import not needed
dgtlmoon Nov 24, 2022
72c97f3
Support type=radio
dgtlmoon Nov 24, 2022
1aa4296
browsersteps should follow the watches proxy
dgtlmoon Nov 24, 2022
7deb64e
Fix proxy logic
dgtlmoon Nov 24, 2022
55b7660
LI is also clickable
dgtlmoon Nov 24, 2022
dd2aca4
Updating README
dgtlmoon Nov 24, 2022
5d70f3c
Update readme
dgtlmoon Nov 24, 2022
cadb0e7
Fix tab load behaviour
dgtlmoon Nov 24, 2022
1 change: 1 addition & 0 deletions .gitignore
@@ -8,6 +8,7 @@ __pycache__
build
dist
venv
test-datastore/*
test-datastore
*.egg-info*
.vscode/settings.json
2 changes: 1 addition & 1 deletion Dockerfile
@@ -25,7 +25,7 @@ RUN pip install --target=/dependencies -r /requirements.txt
# Playwright is an alternative to Selenium
# Excluded this package from requirements.txt to prevent arm/v6 and arm/v7 builds from failing
# https://github.com/dgtlmoon/changedetection.io/pull/1067 also musl/alpine (not supported)
RUN pip install --target=/dependencies playwright~=1.26 \
RUN pip install --target=/dependencies playwright~=1.27.1 \
|| echo "WARN: Failed to install Playwright. The application can still run, but the Playwright option will be disabled."

# Final image stage
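
Note on the Dockerfile hunk above: Playwright stays a best-effort install, so a failed `pip install` only prints a warning and the image still builds. As a rough sketch of how an application could notice that situation at runtime, the standard library's `importlib.util.find_spec` reports whether a package is importable without importing it. The helper name `optional_package_available` and the `playwright_enabled` flag are hypothetical and are not taken from this PR.

```python
import importlib.util


def optional_package_available(name: str) -> bool:
    """Return True if the top-level package `name` can be imported."""
    # find_spec() returns None when a top-level package is not installed.
    return importlib.util.find_spec(name) is not None


# Hypothetical use: only offer the Playwright-backed option when it is installed.
playwright_enabled = optional_package_available("playwright")
```

This check avoids paying the import cost up front, which may matter given the "fix heavy import" commit in this PR, although the PR itself may detect Playwright differently.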
5 changes: 4 additions & 1 deletion MANIFEST.in
@@ -4,7 +4,10 @@ recursive-include changedetectionio/static *
recursive-include changedetectionio/model *
recursive-include changedetectionio/tests *
recursive-include changedetectionio/res *
prune changedetectionio/static/package-lock.json
prune changedetectionio/static/styles/node_modules
prune changedetectionio/static/styles/package-lock.json
include changedetection.py
global-exclude *.pyc
global-exclude node_modules
global-exclude venv
global-exclude venv
11 changes: 11 additions & 0 deletions README.md
@@ -69,6 +69,17 @@ Available when connected to a <a href="https://github.com/dgtlmoon/changedetecti

<img src="https://raw.githubusercontent.com/dgtlmoon/changedetection.io/master/docs/visualselector-anim.gif" style="max-width:100%;" alt="Self-hosted web page change monitoring context difference " title="Self-hosted web page change monitoring context difference " />

### Perform interactive browser steps

Fill in text boxes, click buttons and more, setup your changedetection scenario.

Using the **Browser Steps** configuration, add basic steps before performing change detection, such as logging into websites, adding a product to a cart, refining searches.

<img src="docs/browsersteps-anim.gif" style="max-width:100%;" alt="Self-hosted web page change monitoring context difference " title="Website change detection with interactive browser steps, login, cookies etc" />

After **Browser Steps** have been run, then visit the **Visual Selector** tab to refine the content you're interested in.
Requires Playwright to be enabled.

## Installation

### Docker
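
Note on the README hunk above: the "basic steps" it describes reach the server as three form fields, `operation`, `selector` and `optional_value`, which the POST handler added in `changedetectionio/blueprint/browser_steps/__init__.py` reads. That handler rewrites the UI label `Goto site` into a `goto_url` action aimed at the watch's own URL. A minimal sketch of that normalisation follows, assuming a hypothetical `BrowserStep` tuple and `normalise_step` helper (neither exists in the PR); the `Enter text in field` label is also only illustrative.

```python
from typing import NamedTuple, Optional


class BrowserStep(NamedTuple):
    """Hypothetical container mirroring the three form fields of one step."""
    operation: str
    selector: Optional[str]
    optional_value: Optional[str]


def normalise_step(step: BrowserStep, watch_url: str) -> BrowserStep:
    """Rewrite the 'Goto site' UI label into a goto_url action for the watch URL."""
    if step.operation == "Goto site":
        # As in the handler: the selector carries the URL, the value is cleared.
        return BrowserStep("goto_url", watch_url, None)
    # Every other step is passed through unchanged.
    return step


first = normalise_step(BrowserStep("Goto site", "", ""), "https://example.com")
# Illustrative label only; the real operation names live in the step UI config.
second = normalise_step(
    BrowserStep("Enter text in field", "#username", "alice"), "https://example.com"
)
```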
26 changes: 15 additions & 11 deletions changedetectionio/__init__.py
@@ -1,18 +1,20 @@
#!/usr/bin/python3

import datetime
import flask_login
import logging
import os
import pytz
import queue
import threading
import time
import timeago

from copy import deepcopy
from distutils.util import strtobool
from feedgen.feed import FeedGenerator
from threading import Event

import flask_login
import logging
import pytz
import timeago
from feedgen.feed import FeedGenerator
from flask import (
Flask,
abort,
@@ -27,7 +29,6 @@
)
from flask_login import login_required
from flask_restful import abort, Api

from flask_wtf import CSRFProtect

from changedetectionio import html_tools
@@ -44,7 +45,6 @@
extra_stylesheets = []

update_q = queue.PriorityQueue()

notification_q = queue.Queue()

app = Flask(__name__,
@@ -97,7 +97,7 @@ def _jinja2_filter_datetime(watch_obj, format="%Y-%m-%d %H:%M:%S"):
# Worker thread tells us which UUID it is currently processing.
for t in running_update_threads:
if t.current_uuid == watch_obj['uuid']:
return '<span class="loader"></span><span> Checking now</span>'
return '<span class="spinner"></span><span> Checking now</span>'

if watch_obj['last_checked'] == 0:
return 'Not yet'
@@ -525,6 +525,7 @@ def get_current_checksum_include_ignore_text(uuid):

def edit_page(uuid):
from changedetectionio import forms
from changedetectionio.blueprint.browser_steps.browser_steps import browser_step_ui_config

using_default_check_time = True
# More for testing, possible to return the first/only
@@ -558,6 +559,8 @@ def edit_page(uuid):
data=default,
)

# form.browser_steps[0] can be assumed that we 'goto url' first

if datastore.proxy_list is None:
# @todo - Couldn't get setattr() etc dynamic addition working, so remove it instead
del form.proxy
@@ -650,6 +653,7 @@ def edit_page(uuid):
watch.get('fetch_backend', None) is None and system_uses_webdriver) else False

output = render_template("edit.html",
browser_steps_config=browser_step_ui_config,
current_base_url=datastore.data['settings']['application']['base_url'],
emailprefix=os.getenv('NOTIFICATION_MAIL_BUTTON_PREFIX', False),
form=form,
@@ -661,7 +665,6 @@ def edit_page(uuid):
settings_application=datastore.data['settings']['application'],
using_global_webdriver_wait=default['webdriver_delay'] is None,
uuid=uuid,
visualselector_data_is_ready=visualselector_data_is_ready,
visualselector_enabled=visualselector_enabled,
watch=watch
)
@@ -1190,7 +1193,6 @@ def form_watch_checknow():
else:
# No tag, no uuid, add everything.
for watch_uuid, watch in datastore.data['watching'].items():

if watch_uuid not in running_uuids and not datastore.data['watching'][watch_uuid]['paused']:
update_q.put((1, watch_uuid))
i += 1
@@ -1308,9 +1310,11 @@ def form_share_put_watch():
# paste in etc
return redirect(url_for('index'))

import changedetectionio.blueprint.browser_steps as browser_steps
app.register_blueprint(browser_steps.construct_blueprint(datastore), url_prefix='/browser-steps')

# @todo handle ctrl break
ticker_thread = threading.Thread(target=ticker_thread_check_time_launch_checks).start()

threading.Thread(target=notification_runner).start()

# Check for new release version, but not when running in test/build or pytest
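
Note on the `form_watch_checknow` hunk above: when no tag or UUID is given, every watch that is neither paused nor already being checked is put on `update_q` with priority `1`. The loop can be sketched in isolation with only the standard library. The function name `queue_all_watches` and the plain-dict stand-in for the datastore are assumptions made for illustration.

```python
import queue
from typing import Dict, Iterable


def queue_all_watches(
    watching: Dict[str, dict],
    running_uuids: Iterable[str],
    update_q: "queue.PriorityQueue",
) -> int:
    """Queue every watch that is not paused and not already running."""
    running = set(running_uuids)
    queued = 0
    for watch_uuid, watch in watching.items():
        if watch_uuid not in running and not watch["paused"]:
            # Same (priority, uuid) tuple shape as in the diff.
            update_q.put((1, watch_uuid))
            queued += 1
    return queued


# Stand-in data: "b" is paused and "c" is already being checked.
watching = {"a": {"paused": False}, "b": {"paused": True}, "c": {"paused": False}}
update_q = queue.PriorityQueue()
queued_count = queue_all_watches(watching, ["c"], update_q)
```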
Empty file.
226 changes: 226 additions & 0 deletions changedetectionio/blueprint/browser_steps/__init__.py
@@ -0,0 +1,226 @@

# HORRIBLE HACK BUT WORKS :-) PR anyone?
#
# Why?
# `browsersteps_playwright_browser_interface.chromium.connect_over_cdp()` will only run once without async()
# - this flask app is not async()
# - browserless has a single timeout/keepalive which applies to the session made at .connect_over_cdp()
#
# So it means that we must unfortunately for now just keep a single timer since .connect_over_cdp() was run
# and know when that reaches timeout/keepalive :( when that time is up, restart the connection and tell the user
# that their time is up, insert another coin. (reload)
#
# Bigger picture
# - It's horrible that we have this click+wait deal, some nice socket.io solution using something similar
# to what the browserless debug UI already gives us would be smarter..
#
# OR
# - Some API call that should be hacked into browserless or playwright that we can "/api/bump-keepalive/{session_id}/60"
# So we can tell it that we need more time (run this on each action)
#
# OR
# - use multiprocessing to bump this over to its own process and add some transport layer (queue/pipes)




from distutils.util import strtobool
from flask import Blueprint, request, make_response
from flask_login import login_required
import os
import logging
from changedetectionio.store import ChangeDetectionStore

browsersteps_live_ui_o = {}
browsersteps_playwright_browser_interface = None
browsersteps_playwright_browser_interface_start_time = None
browsersteps_playwright_browser_interface_browser = None
browsersteps_playwright_browser_interface_end_time = None


def cleanup_playwright_session():
print("Cleaning up old playwright session because time was up")
global browsersteps_playwright_browser_interface
global browsersteps_live_ui_o
global browsersteps_playwright_browser_interface_browser
global browsersteps_playwright_browser_interface
global browsersteps_playwright_browser_interface_start_time
global browsersteps_playwright_browser_interface_end_time

import psutil

current_process = psutil.Process()
children = current_process.children(recursive=True)
for child in children:
print (child)
print('Child pid is {}'.format(child.pid))

# .stop() hangs sometimes if its called when there are no children to process
# but how do we know this is our child? dunno
if children:
browsersteps_playwright_browser_interface.stop()

browsersteps_live_ui_o = {}
browsersteps_playwright_browser_interface = None
browsersteps_playwright_browser_interface_start_time = None
browsersteps_playwright_browser_interface_browser = None
browsersteps_playwright_browser_interface_end_time = None
print ("Cleaning up old playwright session because time was up - done")

def construct_blueprint(datastore: ChangeDetectionStore):

browser_steps_blueprint = Blueprint('browser_steps', __name__, template_folder="templates")

@login_required
@browser_steps_blueprint.route("/browsersteps_update", methods=['GET', 'POST'])
def browsersteps_ui_update():
import base64
import playwright._impl._api_types
import time

from changedetectionio.blueprint.browser_steps import browser_steps

global browsersteps_live_ui_o, browsersteps_playwright_browser_interface_end_time
global browsersteps_playwright_browser_interface_browser
global browsersteps_playwright_browser_interface
global browsersteps_playwright_browser_interface_start_time

step_n = None
remaining =0
uuid = request.args.get('uuid')

browsersteps_session_id = request.args.get('browsersteps_session_id')

if not browsersteps_session_id:
return make_response('No browsersteps_session_id specified', 500)

# Because we don't "really" run in a context manager ( we make the playwright interface global/long-living )
# We need to manage the shutdown when the time is up
if browsersteps_playwright_browser_interface_end_time:
remaining = browsersteps_playwright_browser_interface_end_time-time.time()
if browsersteps_playwright_browser_interface_end_time and remaining <= 0:


cleanup_playwright_session()

return make_response('Browser session expired, please reload the Browser Steps interface', 500)


# Actions - step/apply/etc, do the thing and return state
if request.method == 'POST':
# @todo - should always be an existing session
step_operation = request.form.get('operation')
step_selector = request.form.get('selector')
step_optional_value = request.form.get('optional_value')
step_n = int(request.form.get('step_n'))
is_last_step = strtobool(request.form.get('is_last_step'))

if step_operation == 'Goto site':
step_operation = 'goto_url'
step_optional_value = None
step_selector = datastore.data['watching'][uuid].get('url')

# @todo try.. accept.. nice errors not popups..
try:

this_session = browsersteps_live_ui_o.get(browsersteps_session_id)
if not this_session:
print("Browser exited")
return make_response('Browser session ran out of time :( Please reload this page.', 401)

this_session.call_action(action_name=step_operation,
selector=step_selector,
optional_value=step_optional_value)
except playwright._impl._api_types.TimeoutError as e:
print("Element wasnt found :-(", step_operation)
return make_response("Element was not found on page", 401)

except playwright._impl._api_types.Error as e:
# Browser/playwright level error
print("Browser error - got playwright._impl._api_types.Error, try reloading the session/browser")
print (str(e))

# Try to find something of value to give back to the user
for l in str(e).splitlines():
if 'DOMException' in l:
return make_response(l, 401)

return make_response('Browser session ran out of time :( Please reload this page.', 401)

# Get visual selector ready/update its data (also use the current filter info from the page?)
# When the last 'apply' button was pressed
# @todo this adds overhead because the xpath selection is happening twice
u = this_session.page.url
if is_last_step and u:
(screenshot, xpath_data) = this_session.request_visualselector_data()
datastore.save_screenshot(watch_uuid=uuid, screenshot=screenshot)
datastore.save_xpath_data(watch_uuid=uuid, data=xpath_data)

# Setup interface
if request.method == 'GET':

if not browsersteps_playwright_browser_interface:
print("Starting connection with playwright")
logging.debug("browser_steps.py connecting")
from playwright.sync_api import sync_playwright

browsersteps_playwright_browser_interface = sync_playwright().start()


time.sleep(1)
# At 20 minutes, some other variable is closing it
# @todo find out what it is and set it
seconds_keepalive = int(os.getenv('BROWSERSTEPS_MINUTES_KEEPALIVE', 10)) * 60

# keep it alive for 10 seconds more than we advertise, sometimes it helps to keep it shutting down cleanly
keepalive = "&timeout={}".format(((seconds_keepalive+3) * 1000))
try:
browsersteps_playwright_browser_interface_browser = browsersteps_playwright_browser_interface.chromium.connect_over_cdp(
os.getenv('PLAYWRIGHT_DRIVER_URL', '') + keepalive)
except Exception as e:
if 'ECONNREFUSED' in str(e):
return make_response('Unable to start the Playwright session properly, is it running?', 401)

browsersteps_playwright_browser_interface_end_time = time.time() + (seconds_keepalive-3)
print("Starting connection with playwright - done")

if not browsersteps_live_ui_o.get(browsersteps_session_id):
# Boot up a new session
proxy_id = datastore.get_preferred_proxy_for_watch(uuid=uuid)
proxy = None
if proxy_id:
proxy_url = datastore.proxy_list.get(proxy_id).get('url')
if proxy_url:
proxy = {'server': proxy_url}
print("Browser Steps: UUID {} Using proxy {}".format(uuid, proxy_url))

# Begin the new "Playwright Context" that re-uses the playwright interface
# Each session is a "Playwright Context" as a list, that uses the playwright interface
browsersteps_live_ui_o[browsersteps_session_id] = browser_steps.browsersteps_live_ui(
playwright_browser=browsersteps_playwright_browser_interface_browser,
proxy=proxy)
this_session = browsersteps_live_ui_o[browsersteps_session_id]

if not this_session.page:
cleanup_playwright_session()
return make_response('Browser session ran out of time :( Please reload this page.', 401)

try:
state = this_session.get_current_state()
except playwright._impl._api_types.Error as e:
return make_response("Browser session ran out of time :( Please reload this page."+str(e), 401)

p = {'screenshot': "data:image/png;base64,{}".format(
base64.b64encode(state[0]).decode('ascii')),
'xpath_data': state[1],
'session_age_start': this_session.age_start,
'browser_time_remaining': round(remaining)
}


# @todo BSON/binary JSON, faster xfer, OR pick it off the disk
return p

return browser_steps_blueprint
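
Note on the keepalive handling in this file: the header comment explains that a single timer runs from the moment `.connect_over_cdp()` is called, and the GET branch derives two numbers from `BROWSERSTEPS_MINUTES_KEEPALIVE`. One is a `timeout` in milliseconds appended to the driver URL, the other is the wall-clock time after which the UI treats the session as expired. The arithmetic can be sketched on its own as below. The function names `session_window` and `seconds_remaining` are hypothetical. The inline comment in the diff mentions 10 seconds of slack while the code uses 3, and this sketch follows the code.

```python
import time
from typing import Mapping, Optional, Tuple


def session_window(
    env: Mapping[str, str], now: Optional[float] = None
) -> Tuple[int, float]:
    """Return (connect_timeout_ms, advertised_end_time) for a new session."""
    if now is None:
        now = time.time()
    # Default of 10 minutes, as in the diff.
    seconds_keepalive = int(env.get("BROWSERSTEPS_MINUTES_KEEPALIVE", 10)) * 60
    # The remote browser is asked to stay up 3 seconds longer than the keepalive...
    connect_timeout_ms = (seconds_keepalive + 3) * 1000
    # ...while the UI deadline falls 3 seconds before it.
    end_time = now + (seconds_keepalive - 3)
    return connect_timeout_ms, end_time


def seconds_remaining(end_time: float, now: float) -> float:
    """Zero or negative means the session should be cleaned up."""
    return end_time - now


timeout_ms, end_time = session_window({}, now=1000.0)
```

Passing `now` explicitly keeps the calculation deterministic for testing; the blueprint itself calls `time.time()` directly.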