WiP: Fix looping Cloudflare challenge, Resolves #1036 #1163

Draft: wants to merge 2 commits into master

ilike2burnthing
Contributor

Thanks to @juanfrilla for #1036 (comment).

Unfortunately, currently this only works on Windows, and the looping challenges return if using proxies or VPNs.

@garfield69
Contributor

garfield69 commented Apr 21, 2024

FWIW
My Win10 machine is on Chrome 124, and I don't use a VPN or proxy.
I've tested this PR (running from source with Python), and it solves the challenge for trupornolabs, riperam, marinetracker, devil-torrents and 52BT, which were the indexers giving me issues previously.
I also tested against most of the other Cloudflare-protected indexers that were previously working for me, and they continue to work with this PR.
Some indexers, however, continue to fail: leporno still returns the invalid cookies error, and ext-torrents now fails on ext.to but works for the other 2 alternate domains.

But after each solve there remains a Chrome subtask that spins up to 15% CPU, and I have to kill these off manually.
Should I test using this PR's Win10 exe?
[edit] Oh wait, there isn't one.

@juanfrilla

juanfrilla commented Apr 21, 2024

Another thing that I've noticed is that in the user-agent headless replacement:

                self.execute_cdp_cmd(
                    "Network.setUserAgentOverride",
                    {
                        "userAgent": self.execute_script(
                            "return navigator.userAgent"
                        ).replace("Headless", "")
                    },
                )

I don't know why, but if I hardcode the user-agent using the exact one my computer has, like this:

user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
options.add_argument(f"--user-agent={user_agent}")

it bypasses Cloudflare, but if I set it automatically, like you have it on line 533 of src/undetected_chromedriver/__init__.py, it does not work.

So an alternative could be to set up a driver only to get the user agent:

def get_user_agent(driver):
    return driver.execute_script("return navigator.userAgent;").replace("Headless", "")

And then pass the user-agent to the definitive driver.
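
A minimal sketch of that two-step idea, assuming the user-agent string has already been read from the throwaway driver. The helper names strip_headless and user_agent_argument are hypothetical and not part of undetected_chromedriver, and the sample string only illustrates what a headless Chrome 123 on macOS might report:

```python
def strip_headless(user_agent: str) -> str:
    """Remove the 'Headless' marker that headless Chrome puts in its UA."""
    return user_agent.replace("Headless", "")


def user_agent_argument(user_agent: str) -> str:
    """Build the Chrome command-line flag that pins the cleaned user agent."""
    return f"--user-agent={strip_headless(user_agent)}"


# Assumed sample value, as the probe driver might return it.
probe_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36"
)
flag = user_agent_argument(probe_ua)
```

The resulting flag could then be handed to options.add_argument(flag) when creating the definitive driver, the same way as the hardcoded example above.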

PS: I can only tell you what I've discovered, to see if we can work towards a solution, because I'm having trouble getting the project installed/set up 😅

@m33ts4k0z

m33ts4k0z commented Apr 21, 2024

Hello. I just wanted to let you know that this didn't bypass the challenge on arab-torrents.net; it shows the same internal server error. I am not using a VPN or proxies, and I don't have a datacenter IP. Let me know if you need any info to troubleshoot this further.

Update: I wasn't actually using this branch. It worked fine after I switched to it. Thanks

@ilike2burnthing
Contributor Author

ilike2burnthing commented Apr 21, 2024

@garfield69 yeah, this seems to be an issue with Chrome v124. You can revert to v123 in the meantime if it's easier - #1161

Alternatively, build your own binaries, which will use Chromium v123:

python src/build_package.py

@ilike2burnthing
Contributor Author

@m33ts4k0z were you doing this on Windows?

@m33ts4k0z

@m33ts4k0z were you doing this on Windows?

Yes on a Windows 11 VM on Unraid but it did work in the end. I updated my first post here with the cause.

@garfield69
Contributor

Alternatively, build your own binaries, which will use Chromium v123:

python src/build_package.py

Oh cool, did not know I could build on windows.
Built successfully and tested. Much better, no left over chrome tasks chewing CPU anymore :-)

@ilike2burnthing
Contributor Author

ilike2burnthing commented Apr 22, 2024

@juanfrilla sorry for the delay in replying, been busy and only got to a few quick ones on my phone.

I'll have a look at the UA idea when I next get a chance, thanks.

Assuming you're following the run from source instructions, what issue are you having? https://github.com/FlareSolverr/FlareSolverr#from-source-code

@juanfrilla

@ilike2burnthing my main problem is that I cannot install Xvfb on macOS

@ilike2burnthing
Contributor Author

Tried XQuartz?

@juanfrilla

juanfrilla commented Apr 22, 2024

Tried XQuartz?

yessir now the project is set up, let's see what I can fix

@21hsmw
Contributor

21hsmw commented May 1, 2024

What exactly is left to do on this to get it merged? I tried to work it out from the comments here and some other issues, but I can't figure out the current status. It seems to have been stale for quite some time, so what's needed?

@ilike2burnthing
Contributor Author

Unfortunately, currently this only works on Windows, and the looping challenges return if using proxies or VPNs.

I'll have a look at the UA idea when I next get a chance, thanks.

@21hsmw
Contributor

21hsmw commented May 1, 2024

Unfortunately, currently this only works on Windows, and the looping challenges return if using proxies or VPNs.

I'll have a look at the UA idea when I next get a chance, thanks.

Well, I made my own implementation of this "new tab" idea and I was able to make it work with every website I could (ext.to, www3.yggtorrent.cool, dodi-repacks.site, hd-torrents.me/login.php, nhentai.net) on my Linux system using a VPN / socks5 proxy and also with my container image on my own remote Linux server, which was blocked by cloudflare too.
Unfortunately I can't test on Windows, so if someone can test that and report back please do.

Public image with my edits: 21hsmw/flaresolverr:fixlooping
Code here: 21hsmw@da6cc9d

@ilike2burnthing
Contributor Author

That's working 95% of the time on Windows for me, even with a proxy, but failing 95% of the time on Docker. Usual error:

Error: Error solving the challenge. 'NoneType' object has no attribute 'startswith'

Seems it's related to get_correct_window and trying to get driver.current_url. Adding some extra logging shows that the URL is returned as None. Adding some additional sleeps then shows the correct URL, but I'm still getting challenge loops or crashes.
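
Rather than fixed sleeps, the None URL could be handled by polling until a value shows up and guarding the startswith call. This is only a sketch under assumptions: wait_for_url and url_has_prefix are hypothetical helpers, not the code in get_correct_window, and the getter would be something like lambda: driver.current_url:

```python
import time
from typing import Callable, Optional


def wait_for_url(get_url: Callable[[], Optional[str]],
                 timeout: float = 5.0,
                 interval: float = 0.1) -> Optional[str]:
    """Poll get_url until it returns a non-None value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        url = get_url()
        if url is not None:
            return url
        if time.monotonic() >= deadline:
            return None  # caller decides how to handle a window with no URL
        time.sleep(interval)


def url_has_prefix(url: Optional[str], prefix: str) -> bool:
    """None-safe replacement for url.startswith(prefix)."""
    return url is not None and url.startswith(prefix)
```

This would avoid the 'NoneType' object has no attribute 'startswith' error itself, but it would not by itself explain the challenge loops that remain after the URL appears.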

@21hsmw
Contributor

21hsmw commented May 2, 2024

That's working 95% of the time on Windows for me, even with a proxy, but failing 95% of the time on Docker.

When you say it fails on Docker, is it still on Windows or Linux?

I got this error on Linux while doing my implementation, but have not been able to replicate it since. For the looping challenges, it seems to be a timing issue. Playing with the timer values can make it work in some cases, but it's not easy to know what works for everyone since it seems to take network latency into account. For example, if I use a proxy close to my location, it works 100% of the time with the sites I listed earlier, but if I use a proxy very far from me, it works 50% of the time.
Can you try to increase the timers to something like 6, 8 or more?

@ilike2burnthing
Contributor Author

Linux.

I'll play around with timings again (I did a bunch yesterday), see if I can get something that works both on my Docker and Windows.

@21hsmw
Contributor

21hsmw commented May 2, 2024

Linux.

Strange then. I'm able to solve the challenges of all sites I try on my Debian and Fedora systems with different VPNs/Proxies with and without Docker involved.
Can you share an example of one of your tests with debug enabled?

Here's an example with dodi-repacks.site using the docker image I shared previously:
https://pastebin.com/nBramRXq

@aevrard

This comment was marked as off-topic.

@zenderzender

Unfortunately, currently this only works on Windows, and the looping challenges return if using proxies or VPNs.

I'll have a look at the UA idea when I next get a chance, thanks.

Well, I made my own implementation of this "new tab" idea and I was able to make it work with every website I could (ext.to, www3.yggtorrent.cool, dodi-repacks.site, hd-torrents.me/login.php, nhentai.net) on my Linux system using a VPN / socks5 proxy and also with my container image on my own remote Linux server, which was blocked by cloudflare too. Unfortunately I can't test on Windows, so if someone can test that and report back please do.

Public image with my edits: 21hsmw/flaresolverr:fixlooping Code here: 21hsmw@da6cc9d

Thanks for your workaround @21hsmw
Here is a temporary image for anybody needing an ARM build :)
zender/flaresolverr-fixed:arm

Working with LANG=fr-FR

@aevrard the solution you provide will kill the killswitch if you're using something like gluetun...

@LoicBrison

Thanks @21hsmw !
Works for YGG with YGGCookie and YGGtorrent; LANG=en_US

@Vrozaksen

Vrozaksen commented May 3, 2024

21hsmw/flaresolverr:fixlooping

Worked for me on whatbox.ca

services:
     flaresolverr:
         image: 21hsmw/flaresolverr:fixlooping
         environment:
           - LOG_LEVEL=${LOG_LEVEL:-info}
           - LOG_HTML=${LOG_HTML:-false}
           - CAPTCHA_SOLVER=${CAPTCHA_SOLVER:-none}
           - TZ=UTC
           - PORT=25000
           - HOST=127.0.0.1
         network_mode: host
         pull_policy: always
         restart: unless-stopped

@juanfrilla

juanfrilla commented May 3, 2024

Unfortunately, currently this only works on Windows, and the looping challenges return if using proxies or VPNs.

I'll have a look at the UA idea when I next get a chance, thanks.

Well, I made my own implementation of this "new tab" idea and I was able to make it work with every website I could (ext.to, www3.yggtorrent.cool, dodi-repacks.site, hd-torrents.me/login.php, nhentai.net) on my Linux system using a VPN / socks5 proxy and also with my container image on my own remote Linux server, which was blocked by cloudflare too. Unfortunately I can't test on Windows, so if someone can test that and report back please do.

Public image with my edits: 21hsmw/flaresolverr:fixlooping Code here: 21hsmw@da6cc9d

Replacing the base image in the Dockerfile with this:
python:3.11-slim-bullseye works perfectly locally on macOS (M2), with and without proxies. I tested it against my target URL, "https://www.icj-cij.org/sites/default/files/case-related/187/187-20231215-ord-01-00-en.pdf", and I can get the cf_clearance cookie.

I also tested on a CentOS server with the previous image (python:3.11-slim-bookworm) and it doesn't work.

@21hsmw
Contributor

21hsmw commented Jun 10, 2024

I have set up Flaresolverr, Jackett and Prowlarr in containers and I have tested the last 2 sites you talked about:

I set up Flaresolverr in both Prowlarr and Jackett, added the indexers, created an account on seatracker to test it out, and was able to get both to work with Flaresolverr using my image on docker hub.

Jackett: [screenshot]

Prowlarr: [screenshot]

I tried a few searches; both Prowlarr and Jackett work, and I'm able to download torrent files or get the magnets.
The IP I used for the test is getting the Cloudflare verification page, so Flaresolverr was used for both sites.

I also tried https://ilcorsaroblu.org with Jackett and Prowlarr and both worked. It was slow, but eventually it worked.
It's probably a problem with how fast the CPU can process the nodriver requests. I'll take another look at the code later.

Do you have another system running linux (live system can also be tried) with a different CPU that you can try the stack on?

@ilike2burnthing
Contributor Author

@21hsmw can you try those with an HTTP proxy enabled in Jackett?

@21hsmw
Contributor

21hsmw commented Jun 10, 2024

@21hsmw can you try those with an HTTP proxy enabled in Jackett?

I found a working HTTP proxy online, set it up in Jackett and verified in the flaresolverr logs that it was being used in the incoming command. I was then able to go through the cf challenge and search for torrents for the 3 websites.

@ilike2burnthing
Contributor Author

ilike2burnthing commented Jun 10, 2024

I'll try on Windows tomorrow asap. If I still can't track down the issue, I'll fire up a live Ubuntu disk.

@Gamegenie13

Last week, 21hsmw/flaresolverr:nodriver worked 90% of the time on my Debian ARM machine (without VPN or proxy), but it hasn't worked since the update from 4-5 days ago.
I've just tried the latest update, but FlareSolverr just gets stuck when a challenge is detected (I've set the env var DRIVER=nodriver):

2024-06-11 06:40:52 DEBUG    ReqId 281472979431840 New instance of chromium has been created to perform the request
2024-06-11 06:40:52 DEBUG    ReqId 281472979431840 Navigating to... https://www.ygg.re/engine/search?do=search&order=desc&sort=publish_date&category=all
2024-06-11 06:40:57 INFO     ReqId 281472979431840 Challenge detected. Title found: Just a moment...
2024-06-11 06:40:57 DEBUG    ReqId 281472979431840 Waiting for title (attempt 1): Just a moment...
2024-06-11 06:40:59 DEBUG    ReqId 281472979431840 Timeout waiting for selector
2024-06-11 06:40:59 DEBUG    ReqId 281472979431840 Trying to find the closest Cloudflare clickable element...
2024-06-11 06:41:01 DEBUG    ReqId 281472979431840 mouse move to location 654.00, 288.50 where <iframe src="https://challenges.cloudflare.com/cdn-cgi/challenge-platform/h/g/turnstile/if/ov2/av0/rcv0/0/o2qp0/0x4AAAAAAADnPIDROrmt1Wwj/light/normal" allow="cross-origin-isolated; fullscreen; autoplay" sandbox="allow-same-origin allow-scripts allow-popups" id="cf-chl-widget-o2qp0" tabindex="0" title="Widget containing a Cloudflare security challenge" style="border: none; overflow: hidden; width: 300px; height: 65px;"></iframe> is located

And when I try again I get this:

2024-06-11 06:57:39 ERROR    ReqId 281472845955488 Error creating Chrome Browser: 
                ---------------------
                Failed to connect to browser
                ---------------------
                One of the causes could be when you are running as root.
                In that case you need to pass no_sandbox=True 
                
2024-06-11 06:57:39 ERROR    ReqId 281472845955488 Error: Error solving the challenge. cannot access local variable 'driver' where it is not associated with a value
2024-06-11 06:57:39 DEBUG    ReqId 281472845955488 Response => POST /v1 body: {'status': 'error', 'message': "Error: Error solving the challenge. cannot access local variable 'driver' where it is not associated with a value", 'startTimestamp': 1718089056909, 'endTimestamp': 1718089059773, 'version': '3.4.0'}

Hope this helps your investigations!
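
The second error ("cannot access local variable 'driver' where it is not associated with a value") looks like cleanup code touching driver after browser creation already failed, though that is a guess from the message alone. A minimal sketch of the usual guard, with hypothetical names (run_with_browser, BrowserStartError) that are not taken from the branch:

```python
class BrowserStartError(RuntimeError):
    """Stand-in for whatever is raised when the browser cannot be started."""


def run_with_browser(create_browser, action):
    """Run action(driver) and always clean up, even if creation fails."""
    driver = None  # bound before the try, so the finally block can test it
    try:
        driver = create_browser()
        return action(driver)
    finally:
        # Only stop a browser that was actually created.
        if driver is not None:
            driver.stop()
```

With this shape the original "Failed to connect to browser" error would reach the caller instead of being replaced by the unbound-variable error.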

@juanfrilla

juanfrilla commented Jun 11, 2024

Since I'm accessing PDF URLs, a problem I'm facing is that sometimes the "No space left on device" message appears and it stops working until free space is available. How can I automatically remove temporary files, or how can I avoid downloading anything, since I don't need it?

@21hsmw
Contributor

21hsmw commented Jun 11, 2024

Last week, 21hsmw/flaresolverr:nodriver worked 90% of the time on my Debian ARM machine (without VPN or proxy), but it hasn't worked since the update from 4-5 days ago. I've just tried the latest update, but FlareSolverr just gets stuck when a challenge is detected (I've set the env var DRIVER=nodriver)

That's probably because I changed to reusing the nodes. Instead of being re-created like before, they are taken at a certain point in time, which could lead to missing elements. I'll see what I can do about that.

Since I'm accessing PDF URLs, a problem I'm facing is that sometimes the "No space left on device" message appears and it stops working until free space is available. How can I automatically remove temporary files, or how can I avoid downloading anything, since I don't need it?

Nodriver deletes all user data directories when it exits, which in our case is when flaresolverr is completely stopped, so that might explain why you are getting this.
I have added a cleanup for this on my local branch, which I will push to github soon.
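
For illustration only, a per-request cleanup could look something like the sketch below. It is not the cleanup from my local branch; temporary_profile is a hypothetical helper, and it assumes the browser can be pointed at a user data directory that we create ourselves:

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_profile(prefix: str = "uc_"):
    """Create a throwaway user data directory and delete it on exit."""
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        # Remove the profile and anything downloaded into it, even after an error.
        shutil.rmtree(path, ignore_errors=True)
```

Each request would then run inside "with temporary_profile() as profile:", so downloaded PDFs and cache files are removed as soon as the request finishes rather than when FlareSolverr stops.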

@today2004

Over the past few days, I have spent some time implementing parts of flaresolverr with nodriver. According to my own tests, it works on Linux and Windows, including headless on Windows. If you have time, could you build and test my nodriver-support branch? There is still some work to do, but it should work as is, and I'm curious whether you can pass the challenges with request.get.

Thanks for your work, it got me through the challenge. But I noticed a few issues with cookies. There are some errors in nodriver's driver.cookies.set_all function: comment out cookies = await connection.send(cdp.storage.get_cookies()). Also, set_all accepts List[cdp.network.CookieParam], so you actually have to build the CookieParam list by hand, and on my machine I can't set a cookie for the current page correctly without specifying the domain for the cookie. Below is the code I modified; it may still have some errors. root_domain is extracted from the domain name using a regular expression.

        await driver.cookies.clear()
        root_domain = plugin.extract_root_domain(req.url)
        cookies = []
        for cookie in req.cookies:
            if cookie['name'] == 'cf_clearance':
                domain = f".{root_domain}"
            else:
                domain = root_domain
            cookies.append(cdp.network.CookieParam(
                name=cookie['name'], value=cookie['value'],
                path='/', domain=domain))

        await driver.cookies.set_all(cookies)

I also noticed this issue with the function I am currently using, .cookies.load(file).
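
plugin.extract_root_domain is not shown above, so here is a rough stand-in for what such a helper might do. It is an assumption, not the code actually used: it relies on urllib.parse instead of a regular expression and naively keeps the last two labels of the host, which is wrong for suffixes such as co.uk and for bare IP addresses:

```python
from urllib.parse import urlsplit


def extract_root_domain(url: str) -> str:
    """Naively return the last two labels of the URL's host name."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def cookie_domain(name: str, root_domain: str) -> str:
    """Mirror the rule above: cf_clearance gets a leading dot, others do not."""
    return f".{root_domain}" if name == "cf_clearance" else root_domain
```

A public suffix list would be needed for a version that handles every domain correctly.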

@ilike2burnthing
Contributor Author

Getting errors and a few zombie Chrome processes on start (though it still works):

C:\Users\WDAGUtilityAccount\Downloads\FlareSolverr-nodriver-support>python src/flaresolverr.py
C:\Users\WDAGUtilityAccount\Downloads\FlareSolverr-nodriver-support\src\utils.py:377: SyntaxWarning: invalid escape sequence '\d'
  pattern = '\d+\.\d+\.\d+\.\d+'
2024-06-14 04:37:35 INFO     ReqId 6308 FlareSolverr 3.4.0
2024-06-14 04:37:35 DEBUG    ReqId 6308 Debug log enabled
2024-06-14 04:37:35 DEBUG    ReqId 6308 Using proactor: IocpProactor
2024-06-14 04:37:35 INFO     ReqId 6308 Testing web browser installation...
2024-06-14 04:37:35 INFO     ReqId 6308 Platform: Windows-10-10.0.19041-SP0
2024-06-14 04:37:35 INFO     ReqId 6308 Chrome / Chromium path: C:\Program Files\Google\Chrome\Application\chrome.exe
2024-06-14 04:37:35 INFO     ReqId 6308 Chrome / Chromium major version: 126
2024-06-14 04:37:35 INFO     ReqId 6308 Launching web browser...
2024-06-14 04:37:35 INFO     ReqId 6308 Launching web browser...
2024-06-14 04:37:35 DEBUG    ReqId 6308 Launching web browser with nodriver...
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Program Files\Google/Chrome/Application\chrome.exe is a valid candidate...
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Program Files\Google/Chrome Beta/Application\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Program Files\Google/Chrome Canary/Application\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Program Files\chrome\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Program Files (x86)\Google/Chrome/Application\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Program Files (x86)\Google/Chrome Beta/Application\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Program Files (x86)\Google/Chrome Canary/Application\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Program Files (x86)\chrome\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Users\WDAGUtilityAccount\AppData\Local\Google/Chrome/Application\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Users\WDAGUtilityAccount\AppData\Local\Google/Chrome Beta/Application\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Users\WDAGUtilityAccount\AppData\Local\Google/Chrome Canary/Application\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Users\WDAGUtilityAccount\AppData\Local\chrome\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Program Files\Google/Chrome/Application\chrome.exe is a valid candidate...
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Program Files\Google/Chrome Beta/Application\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Program Files\Google/Chrome Canary/Application\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Program Files\chrome\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Users\WDAGUtilityAccount\Downloads\FlareSolverr-nodriver-support\src\Google/Chrome/Application\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Users\WDAGUtilityAccount\Downloads\FlareSolverr-nodriver-support\src\Google/Chrome Beta/Application\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Users\WDAGUtilityAccount\Downloads\FlareSolverr-nodriver-support\src\Google/Chrome Canary/Application\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:35 DEBUG    ReqId 6308 C:\Users\WDAGUtilityAccount\Downloads\FlareSolverr-nodriver-support\src\chrome\chrome.exe is not a valid candidate because don't exist or not executable
2024-06-14 04:37:41 DEBUG    ReqId 6308 Removed Browser user data directory C:\Users\WDAGUtilityAccount\AppData\Local\Temp\uc_dxormjqz
2024-06-14 04:37:41 INFO     ReqId 6308 FlareSolverr User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36
2024-06-14 04:37:41 INFO     ReqId 6308 Test successful!
2024-06-14 04:37:41 DEBUG    ReqId 6308 Using proactor: IocpProactor
2024-06-14 04:37:41 INFO     ReqId 6308 Serving on http://0.0.0.0:8191
Exception ignored in: <function BaseSubprocessTransport.__del__ at 0x000001FAADEEEA20>
Traceback (most recent call last):
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_subprocess.py", line 125, in __del__
    _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
                               ^^^^^^^^
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_subprocess.py", line 70, in __repr__
    info.append(f'stdin={stdin.pipe}')
                        ^^^^^^^^^^^^
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\proactor_events.py", line 80, in __repr__
    info.append(f'fd={self._sock.fileno()}')
                      ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\windows_utils.py", line 102, in fileno
    raise ValueError("I/O operation on closed pipe")
ValueError: I/O operation on closed pipe
Exception ignored in: <function _ProactorBasePipeTransport.__del__ at 0x000001FAADF18220>
Traceback (most recent call last):
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\proactor_events.py", line 116, in __del__
    _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
                               ^^^^^^^^
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\proactor_events.py", line 80, in __repr__
    info.append(f'fd={self._sock.fileno()}')
                      ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\windows_utils.py", line 102, in fileno
    raise ValueError("I/O operation on closed pipe")
ValueError: I/O operation on closed pipe
Exception ignored in: <function _ProactorBasePipeTransport.__del__ at 0x000001FAADF18220>
Traceback (most recent call last):
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\proactor_events.py", line 116, in __del__
    _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
                               ^^^^^^^^
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\proactor_events.py", line 80, in __repr__
    info.append(f'fd={self._sock.fileno()}')
                      ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\windows_utils.py", line 102, in fileno
    raise ValueError("I/O operation on closed pipe")
ValueError: I/O operation on closed pipe
Exception ignored in: <function _ProactorBasePipeTransport.__del__ at 0x000001FAADF18220>
Traceback (most recent call last):
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\proactor_events.py", line 116, in __del__
    _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
                               ^^^^^^^^
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\proactor_events.py", line 80, in __repr__
    info.append(f'fd={self._sock.fileno()}')
                      ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\windows_utils.py", line 102, in fileno
    raise ValueError("I/O operation on closed pipe")
ValueError: I/O operation on closed pipe

Working fine for every link I throw at it, with or without a proxy.
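
On the SyntaxWarning at the top of that log: Python 3.12 warns about '\d' in a normal string literal, and a raw string is the usual fix. A small sketch, assuming the pattern in src/utils.py is meant to pick out a four-part version number (find_version is a hypothetical name, not the function in utils.py):

```python
import re
from typing import Optional

# Raw string: backslashes reach the regex engine untouched, so no warning.
VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+\.\d+")


def find_version(text: str) -> Optional[str]:
    """Return the first four-part dotted version found in text, if any."""
    match = VERSION_PATTERN.search(text)
    return match.group(0) if match else None
```

The matching behaviour stays the same as before; only the string literal changes.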

While I can start and use a session, I'm getting an error when trying to destroy it (it continues to run after):

2024-06-14 06:25:53 INFO     ReqId 6808 Incoming request => POST /v1 body: {'cmd': 'sessions.destroy', 'session': 'test1'}
2024-06-14 06:25:53 ERROR    ReqId 6808 Error: Task <Task pending name='Task-24' coro=<controller_v1_endpoint_nd() running at C:\Users\WDAGUtilityAccount\Downloads\FlareSolverr-nodriver-support\src\flaresolverr_service_nd.py:78> cb=[_run_until_complete_cb() at C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_events.py:182]> got Future <Task pending name='Task-10' coro=<WebSocketCommonProtocol.transfer_data() running at C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\site-packages\websockets\legacy\protocol.py:959> wait_for=<Future finished result=None> cb=[Task.task_wakeup(), _wait.<locals>._on_completion() at C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python312\Lib\asyncio\tasks.py:534]> attached to a different loop
2024-06-14 06:25:53 DEBUG    ReqId 6808 Response => POST /v1 body: {'status': 'error', 'message': "Error: Task <Task pending name='Task-24' coro=<controller_v1_endpoint_nd() running at C:\\Users\\WDAGUtilityAccount\\Downloads\\FlareSolverr-nodriver-support\\src\\flaresolverr_service_nd.py:78> cb=[_run_until_complete_cb() at C:\\Users\\WDAGUtilityAccount\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\asyncio\\base_events.py:182]> got Future <Task pending name='Task-10' coro=<WebSocketCommonProtocol.transfer_data() running at C:\\Users\\WDAGUtilityAccount\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\websockets\\legacy\\protocol.py:959> wait_for=<Future finished result=None> cb=[Task.task_wakeup(), _wait.<locals>._on_completion() at C:\\Users\\WDAGUtilityAccount\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\asyncio\\tasks.py:534]> attached to a different loop", 'startTimestamp': 1718342753258, 'endTimestamp': 1718342753258, 'version': '3.4.0'}
2024-06-14 06:25:53 INFO     ReqId 6808 Response in 0.0 s
2024-06-14 06:25:53 DEBUG    ReqId 6808 Using proactor: IocpProactor
2024-06-14 06:25:53 INFO     ReqId 6808 127.0.0.1 POST http://localhost:8191/v1 500 Internal Server Error
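
"attached to a different loop" generally means a coroutine is awaiting a task or future that was created on another event loop, which would fit a session whose websocket task outlives the request that created it. One possible approach, sketched here with a hypothetical LoopThread class that is not in the branch, is to keep a single long-lived loop in a background thread and submit every request's coroutine to it:

```python
import asyncio
import threading


class LoopThread:
    """Own one event loop for the whole process and run all coroutines on it."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout=None):
        """Submit a coroutine from any thread and block until it finishes."""
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return future.result(timeout)

    def close(self):
        """Stop the loop, wait for the thread, then release the loop."""
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()
```

With this, a task started while handling sessions.create and awaited later while handling sessions.destroy would belong to the same loop. Whether that fits how the HTTP server dispatches requests here is something I have not checked.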

I'll go back and try Docker on my NAS again, but failing that I'll get a live system going.

@ilike2burnthing
Contributor Author

Well I eventually got a couple of successive successful runs with ilcorsaroblu.org, but with high memory usage from the Python process persisting after.

The other two unfortunately continue to return invalid cookies (but work fine on the current release).

@zxsleebu

zxsleebu commented Jun 14, 2024

i found a fix for the memory leak. ultrafunkamsterdam/undetected-chromedriver#1851 (comment)

@21hsmw
Contributor

21hsmw commented Jun 14, 2024

@ilike2burnthing Thanks, I'll try to see what I can do about the errors you're getting. The first one was fixed on my end the last time I worked on it, so it needs a more global way to fix it. I'll also check the sessions and chromium processes that are still running.

i found a fix for the memory leak. ultrafunkamsterdam/undetected-chromedriver#1851 (comment)

I tried it, but it still creates memory leaks when a website has a lot of elements while using query_selector.

@zxsleebu

zxsleebu commented Jun 14, 2024

I tried it, but it still creates memory leaks when a website has a lot of elements while using query_selector.

please send me an example code. maybe there's something more i could do. my approach should fix memory leaks which happened when commands were being sent from nodriver to devtools. i don't think that there is any other memory leak. what you're experiencing is probably just default memory load. every function which returns an element loads page content to memory each time you're calling it. wait_for does it every 0.1s. there always will be some memory load. previously due to a bug the content wasn't able to unload from memory which caused a heavy leak.

@21hsmw
Contributor

21hsmw commented Jun 14, 2024

I tried it, but it still creates memory leaks when a website has a lot of elements while using query_selector.

please send me an example code. maybe there's something more i could do

Using Flaresolverr as an example, if you iterate selectors based on the selectors in "CHALLENGE_SELECTORS" with nodriver query_selector, you will see that the memory fills up and then goes down a bit after the instance is closed, but not to the original state. So if you continue, it goes up and up until the system kills the process. The only way I have currently found to stop this memory leak issue and to speed up the queries is to reuse the node (await tab.query_selector(selector=selector, _node=doc)) instead of running doc: cdp.dom.Node = await self.send(cdp.dom.get_document(-1, True)) on every request.

If you remove all _node for all query_selector lines and try flaresolverr on https://ilcorsaroblu.org/index.php?page=torrents&category=0&options=0&active=0&order=3&by=2, it will be very slow and the memory will fill up. When it closes the browser instance, the memory will be higher than before.

@21hsmw
Contributor

21hsmw commented Jun 14, 2024

@ilike2burnthing I pushed some changes in my repo and on docker hub, let me know if you still get the errors you were getting.
As for cookies, I'm able to reuse the cf_clearance cookie on both Linux and Windows. From your logs I would guess that it is using Google Chrome, do you have any extensions or specific modifications on it that might change the browser fingerprint?

Labels: help wanted, needs investigation