
Network scan of some subnets times out on Ubuntu 22.04.2 #130

Open
martyfay1 opened this issue Apr 26, 2024 · 22 comments

Comments

@martyfay1

martyfay1 commented Apr 26, 2024

Describe the bug
When making a network from a subnet and then calling network.scan() on certain subnets, the scan times out and fails because of a single IP present in the subnet. When that IP is excluded (by using MinerNetwork.from_list() and supplying all of the IPs except that one), the scan works fine. The odd thing is that on macOS 13.6.6 it works fine even with that IP included.
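
For reference, here is a minimal sketch of the two code paths described above (the subnet and IPs are placeholders, and MinerNetwork.from_subnet is my assumption for the subnet-based constructor):

    import asyncio

    from pyasic.network import MinerNetwork

    SUBNET = "10.0.1.0/24"  # placeholder subnet
    BAD_IP = "10.0.1.55"    # placeholder for the unresponsive miner

    async def main():
        # Times out on Ubuntu when BAD_IP is present in the subnet.
        network = MinerNetwork.from_subnet(SUBNET)
        await network.scan()

        # Works: the same hosts minus the bad one, via from_list().
        ips = [f"10.0.1.{i}" for i in range(1, 255) if f"10.0.1.{i}" != BAD_IP]
        network = MinerNetwork.from_list(ips)
        await network.scan()

    asyncio.run(main())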

I've traced the bug to the _get_miner_web() call, and more specifically the _web_ping() call with the url f"http://{ip}/". When examining the logs from the httpx.AsyncClient.get() call, the following shows up:

[DEBUG][04/26/24 14:06:01](httpcore.connection) - connect_tcp.started host=<IP> port=80 local_address=None timeout=5.0 socket_options=[(1, 13, b'\x01\x00\x00\x00\xe8\x03\x00\x00')]
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:01](httpcore.connection) - connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x104cf6480>
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:01](httpcore.http11) - send_request_headers.started request=<Request [b'GET']>
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:01](httpcore.http11) - send_request_headers.complete
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:01](httpcore.http11) - send_request_body.started request=<Request [b'GET']>
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:01](httpcore.http11) - send_request_body.complete
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:01](httpcore.http11) - receive_response_headers.started request=<Request [b'GET']>
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:06](httpcore.http11) - receive_response_headers.failed exception=ReadTimeout(TimeoutError())
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:06](httpcore.http11) - response_closed.started
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:06](httpcore.http11) - response_closed.complete

It hangs on the response_closed.started line for around 100 seconds before failing. On macOS, the same GET request's response closes without timing out the scan.

The only difference I see is that on macOS, the same logs show socket_options=[(65535, 128, b'\x01\x00\x00\x00\xe8\x03\x00\x00')].
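
For what it's worth, those socket_options entries are (level, optname, value) triples handed to setsockopt(). On Linux (1, 13) is (SOL_SOCKET, SO_LINGER), and on macOS (65535, 128) is the same pair, so both platforms are setting the same linger option; the value decodes as a struct linger (little-endian assumed):

    import struct

    # struct linger is two C ints: (l_onoff, l_linger)
    onoff, linger = struct.unpack("ii", b"\x01\x00\x00\x00\xe8\x03\x00\x00")
    print(onoff, linger)  # 1 1000: linger enabled, 1000 (seconds on Linux)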

Expected behavior
The HTTP GET request should fail without disrupting the scan.

Desktop (please complete the following information):

  • OS: macOS 13.6.6 (working)
  • OS: Ubuntu 22.04.2 LTS (not working)
@b-rowan
Collaborator

b-rowan commented Apr 26, 2024

This is a very detailed report, thank you. This will likely take some digging. @jameshilliard, any initial thoughts?

@b-rowan
Collaborator

b-rowan commented Apr 26, 2024

@martyfay1 would you mind sharing what that IP is?

@UpstreamData
Owner

I've made some changes on the dev_scan_fail branch. Do you want to see if that fixes anything?

@martyfay1
Author

I've tried with that branch and it still fails. I'll get back to you on what that IP is.

@martyfay1
Author

This has happened on 2 subnets. In both cases, the IP corresponds to a miner in a bad state; nothing on it is responsive. The miners are S19j Pros running Bitmain firmware dated Mon Dec 26 17:19:30 CST 2022.

@b-rowan
Collaborator

b-rowan commented Apr 26, 2024

OK, interesting. That makes a lot more sense. I had thought maybe the IP address itself was somehow killing the request, but an unresponsive miner is a very different situation.

It's likely just not handling a read timeout properly.

@martyfay1
Author

Do you have any suggestions for how I can keep this from delaying the whole scan when it runs on Linux? I don't want one bad miner to potentially hurt scan performance...

@b-rowan
Collaborator

b-rowan commented Apr 29, 2024

I don't have any ideas yet; we're testing some things. I'm sure it can be fixed here in the library, but I first have to figure out what is actually freezing for that long and why.

@b-rowan
Collaborator

b-rowan commented Apr 29, 2024

Seems like it's hanging somewhere in this function in httpcore:

    async def _response_closed(self) -> None:
        async with self._state_lock:
            if (
                self._h11_state.our_state is h11.DONE
                and self._h11_state.their_state is h11.DONE
            ):
                # Clean end of the request/response cycle: keep the
                # connection around for reuse.
                self._state = HTTPConnectionState.IDLE
                self._h11_state.start_next_cycle()
                if self._keepalive_expiry is not None:
                    now = time.monotonic()
                    self._expire_at = now + self._keepalive_expiry
            else:
                # Anything else (e.g. a timed-out read) tears down the
                # underlying stream; this is the branch that hangs here.
                await self.aclose()

Any chance you can run a debug Python handler and see if you can find where it gets stuck?

GitHub permalink: https://github.com/encode/httpcore/blob/39b756f07d40fab4eccb7c0dd3d7d369f1d2b4e8/httpcore/_async/http11.py#L245-L257
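
If it helps, asyncio's debug mode logs any callback that blocks the event loop, which should point at the slow frame without needing an interactive debugger (generic sketch, not pyasic-specific):

    import asyncio
    import logging

    logging.basicConfig(level=logging.DEBUG)

    async def main():
        ...  # run the scan here

    # With debug=True, asyncio logs "Executing <Handle ...> took N seconds"
    # for any callback that blocks the loop past loop.slow_callback_duration.
    asyncio.run(main(), debug=True)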

@martyfay1
Author

I agree that it's getting stuck in that function. I'm having a hard time figuring out how to debug it, since it's running on a server. I ran it with python -m trace --trace scan.py and I think it's getting stuck on the last line there. Sorry if that's not much help. If you have any links on debugging Python from the command line on a server, let me know.

@b-rowan
Collaborator

b-rowan commented Apr 29, 2024

@martyfay1
Author

Hey, so I spent a little more time working with pdb. It hangs after the await sleep(0) call at site-packages/anyio/_backends/_asyncio.py(1192), because Executing <Handle _SelectorSocketTransport._call_connection_lost(None) created at /usr/lib/python3.12/asyncio/selector_events.py:875> took 94.301 seconds. That's a call to this, which calls this. I am suspicious of the call to self._sock.close(), as I've seen the code stopped here before, after I'd cancelled it while it was hanging.
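
A cheap way to confirm where the process sits during one of those ~94-second stalls, even on a headless server, is a periodic stack dump from the stdlib (generic sketch):

    import faulthandler
    import sys

    # Dump every thread's stack to stderr every 30 seconds; if the event
    # loop is blocked inside self._sock.close(), that frame will show up.
    faulthandler.dump_traceback_later(30, repeat=True, file=sys.stderr)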

@b-rowan
Collaborator

b-rowan commented Apr 29, 2024

Can you try this at the start of the program? Curious if killing the connections with SO_LINGER will help.

pyasic.settings.update("socket_linger_time", 0)
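
For context, a linger time of 0 is the classic abortive close; at the plain-socket level (not pyasic internals) the setting amounts to:

    import socket
    import struct

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # l_onoff=1, l_linger=0: close() sends an RST immediately instead of
    # lingering through the normal FIN teardown.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))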

@martyfay1
Author

Set the linger to 0, same problem still.

@b-rowan
Collaborator

b-rowan commented Apr 30, 2024

I'm stumped on this one. You might just want to play around with the _web_ping method and see if you can get it to stop happening?

@b-rowan
Collaborator

b-rowan commented May 13, 2024

Added an asyncio.wait_for() in the web ping method. Can you see if the latest version of the dev_scan_fail branch fixes the issue?
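
Roughly, the guard looks like this (a sketch only; pyasic's real _web_ping signature and timeout values may differ):

    import asyncio

    import httpx

    PING_TIMEOUT = 5  # assumed value

    async def _web_ping(ip: str) -> bool:
        async def _ping() -> bool:
            try:
                async with httpx.AsyncClient(timeout=PING_TIMEOUT) as client:
                    await client.get(f"http://{ip}/")
                return True
            except httpx.HTTPError:
                return False

        try:
            # Hard upper bound so one dead host can't stall the whole scan,
            # even if the connection teardown itself hangs.
            return await asyncio.wait_for(_ping(), timeout=PING_TIMEOUT * 2)
        except asyncio.TimeoutError:
            return False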

@martyfay1
Author

The scan still takes ~100 seconds longer than it should. I've tried many things, like setting timeouts on the GET request, and what seems to happen is that the request knows it needs to time out after a certain time, say 5 seconds, and starts closing then, but the actual closing of the socket takes ~100 seconds. When making a GET request from the command line, i.e. curl -m 10 -v <ip>, the timeout works great and outputs:

curl -m 10 -v <ip>
*   Trying <ip>...
* Connected to <ip> (<ip>) port 80 (#0)
> GET / HTTP/1.1
> Host: <ip>
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Operation timed out after 10001 milliseconds with 0 bytes received
* Closing connection 0
curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received

Without the -m 10 argument, curl also takes ~100 seconds to fail, when the connection is reset by the peer. Not sure if this helps at all, but I couldn't figure it out from here.

curl -v <ip>
*   Trying <ip>...
* Connected to <ip> (<ip>) port 80 (#0)
> GET / HTTP/1.1
> Host: <ip>
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer

@b-rowan
Collaborator

b-rowan commented May 15, 2024

The last thing I can think of here, before I take this to httpx, is that it's possibly an issue with an older version of Python. Have you tried on the Mac with the same Python version you have on the Ubuntu device?

@martyfay1
Author

I just made sure: it's 3.12.3 on the Mac, and I upgraded from 3.12.2 to 3.12.3 on Ubuntu and am still experiencing the same issue.

@b-rowan
Collaborator

b-rowan commented May 16, 2024

Started a discussion in httpx, we shall see.

encode/httpx#3202

@ph4n70m1984

I had a similar problem. I solved it by editing the defaults in httpx's config:
https://github.com/encode/httpx/blob/master/httpx/_config.py#L370

  • DEFAULT_TIMEOUT_CONFIG
  • DEFAULT_LIMITS
  • DEFAULT_MAX_REDIRECTS
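
For anyone else hitting this, those defaults can also be overridden per-client without patching httpx's source (standard httpx keyword arguments; the values here are illustrative):

    import httpx

    client = httpx.AsyncClient(
        timeout=httpx.Timeout(5.0, connect=2.0),   # instead of DEFAULT_TIMEOUT_CONFIG
        limits=httpx.Limits(max_connections=100),  # instead of DEFAULT_LIMITS
        max_redirects=5,                           # instead of DEFAULT_MAX_REDIRECTS
    )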

@ph4n70m1984

ph4n70m1984 commented May 24, 2024

The bug is observed before version 0.40.5.

I am using the nmap scanner, bypassing the built-in scanner.
