Add JS Rendering of content. xpath and xpathWaitTimeout parameters. #943

Open
wants to merge 2 commits into
base: master
7 changes: 7 additions & 0 deletions README.md
@@ -186,6 +186,8 @@ session. When you no longer need to use a session you should make sure to close
| cookies | Optional. Will be used by the headless browser. Eg: `"cookies": [{"name": "cookie1", "value": "value1"}, {"name": "cookie2", "value": "value2"}]`. |
| returnOnlyCookies | Optional, default false. Only returns the cookies. Response data, headers and other parts of the response are removed. |
| proxy | Optional, default disabled. Eg: `"proxy": {"url": "http://127.0.0.1:8888"}`. You must include the proxy schema in the URL: `http://`, `socks4://` or `socks5://`. Authorization (username/password) is not supported. (When the `session` parameter is set, the proxy is ignored; a session specific proxy can be set in `sessions.create`.) |
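| xpath | Optional, default disabled. XPath selector for an element that only exists after JS rendering. When set, FlareSolverr waits until a matching element is present before returning the page HTML. |
| xpathWaitTimeout | Optional, default 60000. Max time in milliseconds to wait for the `xpath` selector to match. Only used when `xpath` is set. |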

> **Warning**
> If you want to use Cloudflare clearance cookie in your scripts, make sure you use the FlareSolverr User-Agent too. If they don't match you will see the challenge.
@@ -257,6 +259,11 @@ This is the same as `request.get` but it takes one more param:
|-----------|--------------------------------------------------------------------------|
| postData | Must be a string with `application/x-www-form-urlencoded`. Eg: `a=b&c=d` |

## JS Rendering
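If you want the HTML as it looks after JS rendering, add the `xpath` parameter to the request. Set it to an XPath selector that matches the content you want; the response is returned only once a matching element is present in the page.
You can also set the `xpathWaitTimeout` parameter (in milliseconds) to control how long the browser waits for that content. If the selector does not match in time, the request fails with a timeout error.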
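
As an illustration, a client request using the two new parameters might look like the sketch below. It assumes FlareSolverr is listening on its usual local endpoint `http://localhost:8191/v1` and that the rendered HTML comes back under `solution.response`; the helper names `build_payload` and `fetch_rendered_html`, the target URL, and the selector are hypothetical and only serve the example.

```python
import json
import urllib.request

# Assumption: FlareSolverr runs locally on its usual port.
FLARESOLVERR_URL = "http://localhost:8191/v1"


def build_payload(url, xpath, xpath_wait_timeout_ms=None, max_timeout_ms=60000):
    """Build a request.get payload that asks for JS-rendered content.

    `xpath` and `xpathWaitTimeout` are the parameters added by this PR.
    """
    payload = {
        "cmd": "request.get",
        "url": url,
        "maxTimeout": max_timeout_ms,
        "xpath": xpath,
    }
    # Leave the key out to let the server apply its own default.
    if xpath_wait_timeout_ms is not None:
        payload["xpathWaitTimeout"] = xpath_wait_timeout_ms
    return payload


def fetch_rendered_html(payload, endpoint=FLARESOLVERR_URL):
    """POST the payload and return the HTML from the solution (needs a running server)."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["solution"]["response"]
```

For example, `fetch_rendered_html(build_payload("https://example.com/", "//div[@id='content']", 10000))` would wait up to 10 seconds for the `div` to appear before returning the page source.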


## Environment variables

| Name | Default | Notes |
2 changes: 2 additions & 0 deletions src/dtos.py
@@ -36,6 +36,8 @@ class V1RequestBase(object):
    session_ttl_minutes: int = None
    headers: list = None  # deprecated v2.0.0, not used
    userAgent: str = None  # deprecated v2.0.0, not used
    xpath: str = None
    xpathWaitTimeout: int = None

    # V1Request
    url: str = None
8 changes: 8 additions & 0 deletions src/flaresolverr_service.py
@@ -121,6 +121,8 @@ def _controller_v1_handler(req: V1RequestBase) -> V1ResponseBase:
    # set default values
    if req.maxTimeout is None or req.maxTimeout < 1:
        req.maxTimeout = 60000
    if req.xpathWaitTimeout is None or req.xpathWaitTimeout < 1:
        req.xpathWaitTimeout = 60000

    # execute the command
    res: V1ResponseBase
@@ -413,6 +415,12 @@ def _evil_logic(req: V1RequestBase, driver: WebDriver, method: str) -> Challenge

    if not req.returnOnlyCookies:
        challenge_res.headers = {}  # todo: fix, selenium not provides this info
        if req.xpath:
            # wait for the JS-rendered element; xpathWaitTimeout is in ms, WebDriverWait takes seconds
            try:
                WebDriverWait(driver, req.xpathWaitTimeout / 1000)\
                    .until(presence_of_element_located((By.XPATH, req.xpath)))
            except TimeoutException:
                raise Exception('JS render timeout. Specified selector was not found in time.')
        challenge_res.response = driver.page_source

    res.result = challenge_res
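
The wait added in this hunk follows a poll-until-deadline pattern. The sketch below shows that pattern without Selenium, so it can be run on its own. It is not Selenium's implementation, and `normalize_timeout_ms`, `wait_until`, and `RenderTimeout` are hypothetical names introduced for illustration. The defaulting rule mirrors the one in `_controller_v1_handler`.

```python
import time


class RenderTimeout(Exception):
    """Raised when the condition is not met in time (hypothetical name)."""


def normalize_timeout_ms(value, default_ms=60000):
    """Apply the same rule as the handler: missing or < 1 falls back to the default."""
    return default_ms if value is None or value < 1 else value


def wait_until(condition, timeout_ms, poll_interval_s=0.05):
    """Call `condition` repeatedly until it returns a truthy value or time runs out.

    The timeout is given in milliseconds, as in the request, and converted
    to seconds here, the same conversion the hunk does for WebDriverWait.
    """
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise RenderTimeout(
                "JS render timeout. Specified selector was not found in time."
            )
        time.sleep(poll_interval_s)
```

In the real code the condition is `presence_of_element_located((By.XPATH, req.xpath))` evaluated against the driver, so the page source is read only after the element exists or the wait has failed.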