Need bidirectional communication channel between netavark and aardvark #338

Open
Luap99 opened this issue Jun 8, 2023 · 8 comments
Comments

@Luap99
Member

Luap99 commented Jun 8, 2023

Right now our DNS startup is very flaky, causing many flakes in CI that are only solved by using retries. This is bad, and retrying is often not what users are doing. Sending signals is just not reliable: netavark sends the signal on an update but then never waits for aardvark-dns to actually update the names and be ready to respond to the new name. The same goes for error handling: aardvark-dns logs its errors to journald, but there is currently no way to get such an error back to netavark and thus podman. A common problem is that port 53 is already bound, which leaves aardvark-dns up and running but unable to serve any DNS.

There are a lot of DNS-related issues on the podman issue tracker, and most of them are not really possible to debug. IMO we have to address this situation.

Of course one important caveat is that we must stay backwards compatible. I am creating this issue to have a discussion about it so we can find a good solution.

cc @baude @mheon @flouthoc

@mheon
Member

mheon commented Jun 8, 2023

Are you thinking something like a unix socket, where we could pass requests from NV to AV and receive a response when the change was fully implemented?

@Luap99
Member Author

Luap99 commented Jun 8, 2023

Yes, I just want something where we can make sure netavark won't return until aardvark-dns is ready, and if there was an error we should get it back.
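
As an illustration of the request/response idea discussed above, here is a minimal sketch of a Unix socket exchange in which the caller blocks until the server reports success or an error. It is written in Python purely for brevity (netavark and aardvark-dns are Rust). `recv_line`, `serve_once`, `request_update`, the `ok`/`error` replies, and the request text are all assumptions made for this example, not an existing interface of either project.

```python
import socket


def recv_line(sock):
    """Read bytes until a newline or EOF and return the decoded line."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(1024)
        if not chunk:
            break
        data += chunk
    return data.decode("utf-8").strip()


def serve_once(listener, apply_update):
    """Accept one connection, apply the requested update, then reply.

    The reply is sent only after apply_update returns or raises, so the
    client learns about completion and about failures.
    """
    conn, _ = listener.accept()
    with conn:
        request = recv_line(conn)
        try:
            apply_update(request)
            reply = "ok\n"
        except Exception as err:
            reply = f"error {err}\n"
        conn.sendall(reply.encode("utf-8"))


def request_update(path, request, timeout=5.0):
    """Send one request line and block until a reply arrives or the timeout expires."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(timeout)
        client.connect(path)
        client.sendall((request + "\n").encode("utf-8"))
        reply = recv_line(client)
    if reply != "ok":
        raise RuntimeError(reply or "no reply from server")
    return reply
```

In this sketch the caller of `request_update` plays the role of netavark: it does not return until the other side has acknowledged the change, and a failure on the server side surfaces as an exception carrying the server's message instead of being visible only in the journal.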

@mheon
Member

mheon commented Jun 8, 2023

I would prefer not to drag in a full REST API, so I wonder if we can't do something a little lighter (protobuf, maybe? Does that have good Rust bindings?)

The idea in general seems sound, and could serve to enable additional features in the future (we've talked about having Aardvark listen for DBus events and launch Netavark when the firewall reloads, and bidirectional comms could be useful for that)

@Luap99
Member Author

Luap99 commented Jun 8, 2023

I don't care about the protocol. Protobuf would work; we already use it for the dhcp proxy, so it is not a new dependency for netavark.
But honestly I think that right now a simple string-based API would be enough, assuming we keep the current way of writing entries to file.
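
To make the "simple string-based API" idea concrete, here is one possible shape for such messages: one request per line, answered by either `ok` or `error <message>`. This is a sketch under assumptions, written in Python for brevity. The verb `update`, the reply keywords, and the helper names `encode_request`, `encode_reply`, and `parse_reply` are hypothetical and do not come from netavark or aardvark-dns.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    ok: bool
    message: str = ""


def encode_request(verb, network):
    """Build one request line such as 'update podman1\\n' (verb names are hypothetical)."""
    if not verb or not network:
        raise ValueError("verb and network must be non-empty")
    if any(ch.isspace() for ch in verb + network):
        raise ValueError("verb and network must not contain whitespace")
    return f"{verb} {network}\n"


def encode_reply(reply):
    """Serialize a reply as 'ok' or 'error <message>' on a single line."""
    if reply.ok:
        return "ok\n"
    # Keep the message on one line so it cannot break the line framing.
    message = reply.message.replace("\n", " ")
    return f"error {message}\n"


def parse_reply(line):
    """Parse 'ok' or 'error <message>'; anything else is a protocol error."""
    line = line.rstrip("\n")
    if line == "ok":
        return Reply(ok=True)
    status, sep, message = line.partition(" ")
    if status == "error" and sep:
        return Reply(ok=False, message=message)
    raise ValueError(f"malformed reply: {line!r}")
```

Because the entries themselves would still be written to file as they are today, the messages only need to carry "what changed" and "did it work", which is why a line of text can be sufficient here.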

@edsantiago
Collaborator

xref: containers/podman#18325

@edsantiago
Collaborator

xref: containers/podman#16272

@Luap99
Member Author

Luap99 commented Jun 8, 2023

I didn't even bother linking issues. I could probably link 20+ issues from the podman repo that may not be fixed by this but could at least be diagnosed by the users.

A common error is having something else listening on port 53.

$ sudo nc -u -l 53
$ sudo podman run --network podman1 --rm alpine nslookup google.com
nslookup: write to '10.89.0.1': Connection refused
;; connection timed out; no servers could be reached

That is what the user sees: DNS is not working, but they don't know why.

The only real clue is in the journal but most people will never check that:

aardvark-dns[34502]: Unable to start server unable to start CoreDns server: Address already in use (os error 98)

The goal here would be to have the podman run command error out with the aardvark error.
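
A minimal sketch of that goal, in Python for brevity: the bind attempt returns its failure to the caller as a value, so whoever started the server can show it to the user rather than leaving it in a log. `try_bind_udp` is a hypothetical helper for this example and not code from aardvark-dns.

```python
import errno
import socket


def try_bind_udp(address, port):
    """Try to bind a UDP socket.

    Returns (socket, None) on success or (None, error message) on failure,
    so the caller can pass the message on instead of only logging it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((address, port))
    except OSError as err:
        sock.close()
        name = errno.errorcode.get(err.errno, "unknown")
        return None, f"cannot bind {address}:{port}: {err.strerror} ({name})"
    return sock, None
```

With a listener already occupying the port, as in the `nc` reproduction above, the second element of the returned tuple holds the "Address already in use" text that a parent process could then print as the reason the command failed.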

@baude
Member

baude commented Jun 29, 2023

@Luap99 do you want to self-assign this or prefer to wait until your workload lessens?
