Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lpstat hangs with stopped cupsd on Solaris #5833

Closed
l1gi opened this issue Sep 25, 2020 · 4 comments
Closed

lpstat hangs with stopped cupsd on Solaris #5833

l1gi opened this issue Sep 25, 2020 · 4 comments

Comments

@l1gi
Copy link

l1gi commented Sep 25, 2020

There is an issue with cups client time out on Solaris. If I stop cupsd or use -h localhost:632 for lpstat it does not fail even if there is no daemon listening and hangs.

$ lpstat -h localhost:632

I have rebuilt the cups with debug enabled and tried again to generate more debug info. This is the interesting part of it:

T001 07:37:07.242  httpConnect2(host="localhost", port=632, addrlist=0, family=0, encryption=0, blocking=1, msec=30000, cancel=0)
T001 07:37:07.242  http_create(host="localhost", port=632, addrlist=0, family=0, encryption=0, blocking=1, mode=0)
T001 07:37:07.243  httpAddrGetList(hostname="localhost", family=AF_UNSPEC, service="632")
T001 07:37:07.243  httpReconnect2(http=701cd5790, msec=30000, cancel=0)
T001 07:37:07.243  httpAddrString(addr=701cd5558, s=7fd24f9417f0, slen=256)
T001 07:37:07.243  httpAddrString: returning "[v1.::1]"...
T001 07:37:07.243  httpReconnect2: Address [v1.::1]:632
T001 07:37:07.243  httpAddrString(addr=701cd5678, s=7fd24f9417f0, slen=256)
T001 07:37:07.243  httpAddrString: returning "127.0.0.1"...
T001 07:37:07.243  httpReconnect2: Address 127.0.0.1:632
T001 07:37:07.243  httpAddrConnect2(addrlist=701cd5550, sock=701cd5790, msec=30000, cancel=0)
T001 07:37:07.243  httpAddrString(addr=701cd5558, s=7fd24f941670, slen=256)
T001 07:37:07.243  httpAddrString: returning "[v1.::1]"...
T001 07:37:07.243  httpAddrConnect2: Trying [v1.::1]:632...
T001 07:37:07.243  httpAddrConnect2: Setting non-blocking connect()
T001 07:37:07.243  httpAddrConnect2: Finishing async connect()
T001 07:37:07.244  httpAddrConnect2: poll() returned 1 (150)
T001 07:37:07.244  pfds[0].revents=5
T001 07:37:07.244  httpReconnect2: New socket=5
T001 07:37:07.244  httpAddrString(addr=701cd5558, s=7fd24f9417f0, slen=256)
T001 07:37:07.244  httpAddrString: returning "[v1.::1]"...
T001 07:37:07.244  httpReconnect2: Connected to [v1.::1]:632...
T001 07:37:07.244  cups_enum_dests(flags=0, msec=250, cancel=0, type=0, mask=0, cb=7fca01a55cb0, user_data=7fd24f941b00)

Even if there is nothing listening on [::1]:632, it thinks it is connected and continues trying with re-connect. Could you look into it and guess what is wrong there, please?

Complete debug log attached.

@l1gi
Copy link
Author

l1gi commented Sep 25, 2020

cups.log

lpstat was terminated by ^C.

@l1gi
Copy link
Author

l1gi commented Sep 29, 2020

This is an interresting part of truss:

so_socket(PF_INET6, SOCK_STREAM, IPPROTO_IP, NULL, SOV_XPG4_2) = 4
setsockopt(4, SOL_SOCKET, SO_REUSEADDR, 0x7FF8F8CCB678, 4, SOV_DEFAULT) = 0
setsockopt(4, SOL_SOCKET, SO_REUSEPORT, 0x7FF8F8CCB678, 4, SOV_DEFAULT) = 0
setsockopt(4, SOL_SOCKET, SO_NOSIGPIPE, 0x7FF8F8CCB678, 4, SOV_DEFAULT) = 0
setsockopt(4, tcp, TCP_NODELAY, 0x7FF8F8CCB678, 4, SOV_DEFAULT) = 0
fcntl(4, F_SETFD, 0x00000001) = 0
fcntl(4, F_GETFL) = 2
fcntl(4, F_SETFL, FWRITE|FNONBLOCK) = 0
connect(4, 0x71FDC20E8, 32, SOV_XPG4_2) Err#150 EINPROGRESS
fcntl(4, F_SETFL, FWRITE) = 0
pollsys(0x7FF8F8CCBB30, 1, 0x7FF8F8CCB620, 0x00000000) = 1
getpeername(4, 0x7FF8F8CCBE50, 0x7FF8F8CCB67C, SOV_DEFAULT) Err#134 ENOTCONN
uname(0x7FF8F8CCB8A0) = 1
send(4, " P O S T / H T T P /".., 203, 0) Err#32 EPIPE
send(4, " P O S T / H T T P /".., 203, 0) Err#32 EPIPE
close(4) = 0

Firstly I don't understand why httpConnect2 is called with blocking=1 and the argument is not used in the underlying functions while they do unconditional nonblocking connect.

If this is an intention, than there is a missing mechanism to check the state of the socket before it is being polled. Maybe getsockopt() is what would 'workaround' the issue though I am in doubt that is the right fix here.

I assume it works accidentally on linux as the non blocking connec() operation finishes sooner that poll() is called. Still the check should be there as far as I understand.

Could you advise, please?

@l1gi
Copy link
Author

l1gi commented Sep 30, 2020

04-nonblock_connect.txt

Here is the patch which consider a socket on which getsockopt() fails as non connected on Solaris. Though I still don't understand some details of httpAddrConnect2() behavior this change makes Solaris lpstat -t fail in the same manner as on linux:

$ lpstat -t
scheduler is not running
no system default destination
lpstat: Bad file number
lpstat: Bad file number
lpstat: Bad file number
lpstat: Bad file number
lpstat: Bad file number
$

Could I ask you to rework the patch and integrate it in current cups, please?

@michaelrsweet
Copy link
Collaborator

Please report this to the OpenPrinting CUPS project.

The code in httpConnect2 is there to support multi-homing. That is, when a server (or printer) is available on multiple addresses, the client may not be able to determine which address is best. So httpConnect2 uses a non-blocking connect to try each address in parallel, stopping only once a connection is made. It sounds like the Solaris networking stack isn't propagating the error as it should, which is why your proposed patch fixes things on Solaris. We'll need to test this on multiple platforms to ensure this doesn't cause a regression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants