Description
Checklist before reporting
- I have searched for similar issues and didn't find a duplicate.
- I have updated to the latest version of pydoll to verify the issue still exists.
pydoll Version
2.12.0
Python Version
3.12.10
Operating System
Windows
Bug Description
When connecting to a remote Chrome instance using Chrome.connect(ws_url) (browser-level connection as described in the Remote Connections docs) and then working with an iframe via tab.get_frame(iframe_element), any subsequent call on the iframe Tab (for example await iframe_tab.page_source) fails.
Internally, the iframe tab's ConnectionHandler tries to establish a new WebSocket connection using a URL that contains :None as the port. This leads to a ValueError: Port could not be cast to integer value as 'None' coming from websockets.uri.parse_uri.
This only happens on the iframe Tab created by get_frame(). The top level Tab returned by Chrome.connect(ws_url) works correctly.
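The error message can be reproduced without Pydoll or a running browser. As far as I can tell, websockets relies on the standard library's URL parsing, so the minimal sketch below uses only urllib.parse. The URL shape (ws://host:port/devtools/page/<id>) and the target id ABC123 are assumptions for illustration, not necessarily what Pydoll builds internally.

```python
from urllib.parse import urlsplit

# Hypothetical URLs: the exact format Pydoll builds is an assumption.
good_url = "ws://127.0.0.1:54321/devtools/page/ABC123"
# Formatting a port that was never set (None) into the URL yields ":None".
bad_url = f"ws://127.0.0.1:{None}/devtools/page/ABC123"


def describe_port(url: str) -> str:
    """Return the parsed port, or the error raised while reading it."""
    try:
        return f"port={urlsplit(url).port}"
    except ValueError as exc:
        return f"ValueError: {exc}"


print(describe_port(good_url))
print(describe_port(bad_url))
```

The second call fails only when the port is read, which matches the point where the iframe Tab first tries to open its own connection.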
Steps to Reproduce
- Start a Chrome-based browser with remote debugging enabled. Windows example:
"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=54321
- From Python, query http://127.0.0.1:54321/json/version and read the webSocketDebuggerUrl field.
- Use Pydoll to connect to the remote browser with Chrome.connect(ws_url), as per the Remote Connections documentation.
- Navigate to any page that contains an iframe.
- Locate the iframe element with a CSS selector.
- Call iframe_tab = await tab.get_frame(element).
- Call await iframe_tab.page_source (or any other method that forces the iframe Tab to establish its own CDP connection).
- Observe that a ValueError is raised from websockets.uri.parse_uri, complaining about port 'None'.
Code Example
import asyncio

import aiohttp
from pydoll.browser.chromium import Chrome

SOLVECAPTCHA_RECAPTCHA2_DEMO = "https://www.google.com/recaptcha/api2/demo"


async def connect_remote_chromium_get_ws_url():
    """Ask the remote browser for its browser-level WebSocket URL."""
    port = 54321  # must match --remote-debugging-port
    async with aiohttp.ClientSession() as session:
        url = f"http://127.0.0.1:{port}/json/version"
        async with session.get(url) as response:
            data = await response.json()
            ws_url = data["webSocketDebuggerUrl"]
            print("Server info:")
            print(f" Browser: {data.get('Browser')}")
            print(f" Protocol: {data.get('Protocol-Version')}")
            print(f" WebSocket: {ws_url}")
            return ws_url


async def solve_with_iframe(tab):
    recaptcha2_iframe_css = "iframe[title='reCAPTCHA']"
    recaptcha_iframe_element = await tab.query(recaptcha2_iframe_css, timeout=10)
    iframe_tab = await tab.get_frame(recaptcha_iframe_element)
    page_source = await iframe_tab.page_source  # This line triggers the bug
    print(page_source)


async def main():
    ws_url = await connect_remote_chromium_get_ws_url()
    chrome = Chrome()
    tab = await chrome.connect(ws_url)
    print("\n[SUCCESS] Connected to remote Chrome server!")
    await tab.go_to(SOLVECAPTCHA_RECAPTCHA2_DEMO)
    await solve_with_iframe(tab)
    await chrome.close()


if __name__ == "__main__":
    asyncio.run(main())
Expected Behavior
According to the IFrames documentation, tab.get_frame(iframe_element) should return a Tab instance that can be used like any other tab, including calling find, query, execute_script, page_source, and so on, with its own CDP target and separate WebSocket connection.
So in this example, I would expect:
- iframe_tab = await tab.get_frame(recaptcha_iframe_element) to succeed.
- await iframe_tab.page_source to return the HTML source of the iframe document without errors.
- Subsequent operations like await iframe_tab.find(id="recaptcha-anchor") to also work.
Actual Behavior
iframe_tab = await tab.get_frame(recaptcha_iframe_element) returns an object without raising any error.
On the next line, when I run page_source = await iframe_tab.page_source, Pydoll attempts to establish a WebSocket connection for the iframe tab and fails with a ValueError, because the URL it passes to websockets seems to contain :None as the port.
Relevant Log Output
Additional Context
Workaround:
Setting the connection port on the iframe tab's connection handler manually, for example:
iframe_tab._connection_handler._connection_port = 54321
before calling await iframe_tab.page_source fixes the issue, thus confirming that the missing port is the direct cause.
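To avoid hardcoding the port in the workaround, it can be read from the ws_url that was already fetched. This is a minimal sketch using only the standard library. The helper name port_from_ws_url and the id in the sample URL are made up for illustration, and _connection_port is a private attribute that may change between Pydoll versions.

```python
from urllib.parse import urlsplit


def port_from_ws_url(ws_url: str) -> int:
    """Extract the debugging port from a webSocketDebuggerUrl."""
    port = urlsplit(ws_url).port
    if port is None:
        raise ValueError(f"No explicit port in {ws_url!r}")
    return port


# Example browser-level URL; the trailing id is made up.
ws_url = "ws://127.0.0.1:54321/devtools/browser/0000-example"
print(port_from_ws_url(ws_url))

# Inside the reproduction script, the workaround would then read:
# iframe_tab._connection_handler._connection_port = port_from_ws_url(ws_url)
```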
Request:
It would be great if the iframe Tab created by get_frame() could reuse the necessary connection information from the parent Tab or from the original ws_url passed to Chrome.connect, so that iframe tabs work seamlessly in the remote connection scenario, similar to how they work when you start the browser locally via browser.start().
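One possible shape for this, sketched under assumptions: Chrome exposes page-level targets at /devtools/page/<target id> on the same host and port as the browser-level endpoint, so the address for the iframe's target could be derived from the original ws_url. The function name page_ws_url and the sample ids are hypothetical, and I have not checked how Pydoll resolves target addresses internally.

```python
from urllib.parse import urlsplit, urlunsplit


def page_ws_url(browser_ws_url: str, target_id: str) -> str:
    """Build a page-level WebSocket URL on the same host and port.

    Hypothetical helper: keeps scheme and netloc from the browser-level
    URL and swaps the path for the page target's path.
    """
    parts = urlsplit(browser_ws_url)
    path = f"/devtools/page/{target_id}"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# Sample values; both ids are made up.
browser_url = "ws://127.0.0.1:54321/devtools/browser/0000-example"
print(page_ws_url(browser_url, "FRAME-TARGET-ID"))
```

Reusing the host as well as the port would also cover browsers that are not running on localhost.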