Recurring SIGSEGV #64

Closed
andbuitra opened this issue Feb 13, 2021 · 27 comments

@andbuitra

andbuitra commented Feb 13, 2021

Hello,

We deployed dnsbl_exporter on a CentOS 7 machine as a systemd service. It currently goes offline fairly often, complaining about memory (either OOM or SIGSEGV). This is the error:

feb 13 04:31:34 monitor2-co dnsbl_exporter[685]: panic: runtime error: invalid memory address or nil pointer dereference
feb 13 04:31:34 monitor2-co dnsbl_exporter[685]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4640e7]
feb 13 04:31:34 monitor2-co dnsbl_exporter[685]: goroutine 316739 [running]:
feb 13 04:31:34 monitor2-co dnsbl_exporter[685]: github.com/luzilla/dnsbl_exporter/collector.(*Rbl).lookup(0xc000330510, 0xc00019a120, 0x12, 0xc00019a1e0, 0x1d, 0x1, 0x1, 0x0)
feb 13 04:31:34 monitor2-co dnsbl_exporter[685]: /home/runner/work/dnsbl_exporter/dnsbl_exporter/collector/rbl.go:147 +0x3bd
feb 13 04:31:34 monitor2-co dnsbl_exporter[685]: github.com/luzilla/dnsbl_exporter/collector.(*Rbl).Update.func1(0xc0002da1b0, 0xc000330510, 0xc00019a120, 0x12, 0xc00019a1e0, 0x1d)
feb 13 04:31:34 monitor2-co dnsbl_exporter[685]: /home/runner/work/dnsbl_exporter/dnsbl_exporter/collector/rbl.go:166 +0x113
feb 13 04:31:34 monitor2-co dnsbl_exporter[685]: created by github.com/luzilla/dnsbl_exporter/collector.(*Rbl).Update
feb 13 04:31:34 monitor2-co dnsbl_exporter[685]: /home/runner/work/dnsbl_exporter/dnsbl_exporter/collector/rbl.go:161 +0xf0
feb 13 04:31:34 monitor2-co systemd[1]: dnsbl_exporter.service: main process exited, code=exited, status=2/INVALIDARGUMENT
feb 13 04:31:34 monitor2-co systemd[1]: Unit dnsbl_exporter.service entered failed state.
feb 13 04:31:34 monitor2-co systemd[1]: dnsbl_exporter.service failed.

There's plenty of memory available (more than 6 GB), so this shouldn't be an issue. So far I've resorted to configuring automatic restarts for the systemd unit. If relevant, the log also shows plenty of these:

feb 13 04:17:35 monitor2-co dnsbl_exporter[685]: time="2021-02-13T04:17:35-05:00" level=error
feb 13 04:17:39 monitor2-co dnsbl_exporter[685]: time="2021-02-13T04:17:39-05:00" level=error
feb 13 04:19:35 monitor2-co dnsbl_exporter[685]: time="2021-02-13T04:19:35-05:00" level=error
feb 13 04:19:35 monitor2-co dnsbl_exporter[685]: time="2021-02-13T04:19:35-05:00" level=error
feb 13 04:25:36 monitor2-co dnsbl_exporter[685]: time="2021-02-13T04:25:36-05:00" level=error
feb 13 04:25:36 monitor2-co dnsbl_exporter[685]: time="2021-02-13T04:25:36-05:00" level=error
feb 13 04:29:36 monitor2-co dnsbl_exporter[685]: time="2021-02-13T04:29:36-05:00" level=error
feb 13 04:29:36 monitor2-co dnsbl_exporter[685]: time="2021-02-13T04:29:36-05:00" level=error

There's nothing too special about our config. The only thing is that we load the RBLs and targets (using the proper args with absolute paths) from a folder that is linked to a git repo.

@till
Contributor

till commented Feb 13, 2021

@andbuitra Thanks for reporting, can you share config and version? I'll try to reproduce. Sounds like something is missing.

@till till self-assigned this Feb 13, 2021
@till till added the bug Something isn't working label Feb 13, 2021
@till till added this to Backlog in Open Source via automation Feb 13, 2021
@till till added this to the 1.0.0 milestone Feb 13, 2021
@andbuitra
Author

andbuitra commented Feb 15, 2021

The configuration is pretty straightforward:

[Unit]
Description=DNSBL Exporter
StartLimitBurst=5


[Service]
User=root
ExecStart=/root/prometheus-monitoring/dnsbl_exporter/dnsbl_exporter --config.dns-resolver [REDACTED] --config.rbls /root/prometheus-monitoring/config-files/dnsbl_exporter/rbls.ini --config.targets /root/prometheus-monitoring/config-files/dnsbl_exporter/targets.ini
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=default.target

The version used is the latest release:

./dnsbl_exporter --version
dnsbl-exporter version 0.4.3

@till
Contributor

till commented Feb 16, 2021

@andbuitra Sorry, I meant rbls.ini and possibly targets.ini. I am assuming something is missing, and I don't handle input correctly.

@andbuitra
Author

Hello

The rbls.ini is as follows:

[rbl]
server=cbl.abuseat.org
server=bl.deadbeef.com
server=spamtrap.drbl.drand.net
server=spamsources.fabel.dk
server=0spam.fusionzero.com
server=mail-abuse.blacklist.jippg.org
server=dyna.spamrats.com
server=noptr.spamrats.com
server=spam.spamrats.com
server=dnsbl.sorbs.net
server=spam.dnsbl.sorbs.net
server=bl.spamcop.net
server=pbl.spamhaus.org
server=sbl.spamhaus.org
server=xbl.spamhaus.org
server=ubl.unsubscore.com
server=dnsbl-1.uceprotect.net
server=dnsbl-2.uceprotect.net
server=dnsbl-3.uceprotect.net
server=db.wpbl.info
server=access.redhawk.org
server=sbl-xbl.spamhaus.org
server=b.barracudacentral.org
server=dul.dnsbl.sorbs.net
server=http.dnsbl.sorbs.net
server=l1.spews.dnsbl.sorbs.net
server=l2.spews.dnsbl.sorbs.net
server=misc.dnsbl.sorbs.net
server=postmaster.rfc-ignorant.org
server=rbl.spamlab.com
server=rbl.suresupport.com
server=relays.bl.kunden.de
server=smtp.dnsbl.sorbs.net
server=socks.dnsbl.sorbs.net
server=zen.spamhaus.org
server=zombie.dnsbl.sorbs.net
server=truncate.gbudb.net

The targets file follows this pattern:

[targets]
server=smtp.example1.com
server=smtp.example2.com
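
Both files above repeat the same `server=` key under a single section header. The thread does not show how the exporter reads them, so the following is only a minimal sketch of one way to collect such repeated keys using the Go standard library. The function name `parseServers` and the comment-prefix handling are assumptions made for illustration, not the exporter's code.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseServers (hypothetical helper) collects every "server=" value found
// under the given section header, e.g. section "rbl" matches "[rbl]".
func parseServers(content, section string) []string {
	var servers []string
	inSection := false
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";"):
			// Skip blank lines and comment lines.
			continue
		case strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]"):
			// A new section starts; remember whether it is the one we want.
			inSection = line[1:len(line)-1] == section
		case inSection:
			key, value, found := strings.Cut(line, "=")
			value = strings.TrimSpace(value)
			if found && strings.TrimSpace(key) == "server" && value != "" {
				servers = append(servers, value)
			}
		}
	}
	return servers
}

func main() {
	const rbls = "[rbl]\nserver=zen.spamhaus.org\nserver=bl.spamcop.net\n"
	fmt.Println(parseServers(rbls, "rbl"))
}
```

Empty or malformed `server=` lines are dropped rather than passed on, which is one way to keep bad input from reaching the lookup code.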

@till
Contributor

till commented Feb 19, 2021

@andbuitra I'll check it on the weekend 🙏🏼

@till
Contributor

till commented Feb 21, 2021

@andbuitra I haven't made much progress. Can you add --log.debug to your systemd unit and see if it uncovers anything? It's a bit noisy, but it would help.

My guess is that it's something inside the RBL requesting and response parsing. Or maybe even in a dependency.
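
For context on what "RBL requesting" involves: by the usual DNSBL convention, an IPv4 address is checked by reversing its octets and prepending them to the blocklist zone. The sketch below illustrates that convention only. The function name `queryName` is hypothetical, and this is not taken from the exporter or its dependency.

```go
package main

import (
	"fmt"
	"net"
)

// queryName (hypothetical helper) builds the DNS name used to check an IPv4
// address against a DNSBL zone: the octets are reversed and the zone appended.
func queryName(ip net.IP, zone string) (string, error) {
	v4 := ip.To4() // To4 returns nil for a nil IP or a non-IPv4 address
	if v4 == nil {
		return "", fmt.Errorf("not an IPv4 address: %v", ip)
	}
	return fmt.Sprintf("%d.%d.%d.%d.%s", v4[3], v4[2], v4[1], v4[0], zone), nil
}

func main() {
	// 127.0.0.2 is the conventional DNSBL test address.
	name, err := queryName(net.ParseIP("127.0.0.2"), "zen.spamhaus.org")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name) // prints "2.0.0.127.zen.spamhaus.org"
}
```

Note that the function returns an error for a nil address instead of indexing into it.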

@till
Contributor

till commented Feb 21, 2021

@andbuitra
Author

@till I completely forgot about this. I will test it in the next couple of days. Thank you!

@till
Contributor

till commented Mar 6, 2021

Yeah, let me know how it goes. I think I'll wait a bit until I merge the updated dependency again. Trying to think what else can be done to track this.

@till
Contributor

till commented Mar 6, 2021

Btw, if you happen to narrow it down to a host/RBL combo, I can write a test confirming it against the upstream dependency and see about fixing it there.

@till
Contributor

till commented Mar 31, 2021

@andbuitra friendly ping. Did you have a chance to take a look?

@till
Contributor

till commented Apr 5, 2021

@andbuitra Do you see this happening still? I am currently prepping for a 0.5.0 release.

Btw, I'd like to include service files. Do you feel like contributing yours? With location a la man here would be preferred.

@andbuitra
Author

@till Apologies, I was on vacation. I haven't been able to test the package yet, but I will now. My systemd unit is simple and loads the config file from a local git repo; the unit is located at /etc/systemd/system/dnsbl_exporter.service, but I have seen other apps like MariaDB putting theirs in /usr/lib/... and then referencing them. Maybe there's a freedesktop standard for it. The restart clause was added to mitigate the original issue.

I will test the release 0.4.4 and let you know if the issue happens again.

@till
Contributor

till commented Apr 7, 2021

Here is a 0.4.4-next:
dnsbl-exporter-linux-amd64-0.4.4-next.zip

If you want to build it yourself, you'll need goreleaser and a clone of this repo: make build.

@till
Contributor

till commented Apr 8, 2021

I kinda just spotted something else.

Sometimes parsing IPs seems to fail. I'm not sure why, but if the result is nil, the code panics.

till added a commit that referenced this issue Apr 8, 2021
If the IP address is neither v4 or v6, IP.Parse() returns "nil".

Related: #64
@till
Contributor

till commented Apr 8, 2021

So, I can't figure out why this happens to begin with, but the exporter should no longer panic; instead it gives you a log message with the "string" that it can't identify as an IP (v4 or v6).
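
The kind of guard described here can be sketched with the standard library's `net.ParseIP`, which returns nil when its input is not a valid textual IPv4 or IPv6 address. The commit itself is not shown in this thread, so the helper name `parseTarget` and the error wording are assumptions. This is an illustration of the nil check, not the actual fix.

```go
package main

import (
	"fmt"
	"net"
)

// parseTarget (hypothetical helper) turns the nil result of net.ParseIP into
// an error, so callers never dereference or index a nil address.
func parseTarget(s string) (net.IP, error) {
	ip := net.ParseIP(s)
	if ip == nil {
		return nil, fmt.Errorf("%q is neither an IPv4 nor an IPv6 address", s)
	}
	return ip, nil
}

func main() {
	for _, s := range []string{"192.0.2.1", "2001:db8::1", "", "smtp.example1.com"} {
		ip, err := parseTarget(s)
		if err != nil {
			// Log and move on instead of panicking.
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Println("ok:", ip)
	}
}
```

Logging the offending string, as the comment above describes, also makes it possible to find out later which input triggered the problem.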

till added a commit that referenced this issue Apr 8, 2021
If the IP address is neither v4 or v6, IP.Parse() returns "nil".

Related: #64
@till
Contributor

till commented Apr 8, 2021

Latest main branch:

dnsbl_exporter_0.4.4-next_Darwin_arm64.tar.gz
dnsbl_exporter_0.4.4-next_Darwin_x86_64.tar.gz

I think this contains an actual fix. So it was not in a dependency, but my use of Go. If I don't hear back from you, I'll release 0.5.0 towards the weekend.

@andbuitra
Author

@till I was deploying this but I see this is the Darwin binary and we run Linux on the server. Could you build that latest version for linux x86? I will remove the restarts on my systemd unit so it won't fix itself automatically.

@till
Contributor

till commented Apr 9, 2021

@andbuitra
Author

I have installed it now and so far so good. I will report back if the issue shows up again

@andbuitra
Author

@till No crashes as of now. I believe that panic was causing the binary to stop. It's been working normally for more than 12 hours without needing to reboot
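
In Go, an unrecovered panic in any goroutine terminates the whole process, which matches the stack trace at the top of this issue (the failing lookup ran in a goroutine started by Update). The fix discussed in this thread was a nil check. Purely as a complementary sketch, and not something the exporter is known to do, a worker goroutine can also recover and report the panic. The names `runAll` and `result` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// runAll (hypothetical helper) runs each function in its own goroutine and
// converts any panic into an error instead of letting it end the process.
func runAll(fns ...func()) []error {
	errs := make(chan error, len(fns)) // buffered so no goroutine blocks on send
	var wg sync.WaitGroup
	for _, fn := range fns {
		wg.Add(1)
		go func(fn func()) {
			defer wg.Done()
			defer func() {
				if r := recover(); r != nil {
					errs <- fmt.Errorf("recovered from panic: %v", r)
				}
			}()
			fn()
		}(fn)
	}
	wg.Wait()
	close(errs)
	var out []error
	for err := range errs {
		out = append(out, err)
	}
	return out
}

// result stands in for whatever a lookup would return.
type result struct{ listed bool }

func main() {
	var missing *result // nil on purpose
	errs := runAll(
		func() { fmt.Println(missing.listed) }, // nil pointer dereference
		func() { fmt.Println("healthy lookup finished") },
	)
	for _, err := range errs {
		fmt.Println(err)
	}
}
```

Recovering hides the symptom rather than the cause, so it works best alongside an explicit nil check and a log line, not as a replacement for them.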

@till
Contributor

till commented Apr 10, 2021

@andbuitra Thanks for letting me know.

You catch anything in the logs? I am curious what kind of "ip" caused this.

@andbuitra
Author

Nothing special shows up. The only error is level=error msg="read udp 127.0.0.1:37474->:0: read: connection refused", which shows up multiple times every minute, but I don't really know what it's about since the monitor works fine (as in, metrics show up correctly).

@till
Contributor

till commented Apr 10, 2021

Maybe you filter UDP? DNS uses both (TCP and UDP). If you have a local resolver, it should respond to both.
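
One way to check whether a resolver answers over both transports is to force each one in turn. The sketch below uses the standard library's `net.Resolver` with a custom `Dial`. It is a standalone diagnostic and does not reflect how the exporter talks to its resolver. The helper name `newResolver` and the address `127.0.0.1:53` are placeholders chosen for illustration.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newResolver (hypothetical helper) returns a resolver that sends every query
// to addr over the given network ("udp" or "tcp").
func newResolver(network, addr string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use Go's built-in resolver so that Dial is honoured
		Dial: func(ctx context.Context, _, _ string) (net.Conn, error) {
			// Ignore the network and address Go proposes; use ours instead.
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, addr)
		},
	}
}

func main() {
	const resolverAddr = "127.0.0.1:53" // assumption: replace with your resolver
	for _, network := range []string{"udp", "tcp"} {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		_, err := newResolver(network, resolverAddr).LookupHost(ctx, "example.com")
		cancel()
		fmt.Printf("%s: err=%v\n", network, err)
	}
}
```

If one transport reports an error while the other succeeds, filtering between the host and the resolver is a plausible explanation.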

@andbuitra
Author

It could be filtered by an upstream firewall. The resolver is in a public network and it's used throughout the infrastructure. dnsbl operates on a server behind a firewall using it as a gateway, so that could be the reason. However, it's a non-issue since the exporter is working just fine.

The exporter has been running for more than three days now with no issues. I believe this issue can be closed.

@till
Contributor

till commented Apr 12, 2021

Ok, good to know! I'll close when I cut a release. I am trying to finish #84 first! :) Thanks again for your time and patience.

@till till mentioned this issue Apr 17, 2021
@till
Contributor

till commented Apr 17, 2021

@andbuitra I finally released 0.5.0, thanks again for your help and patience. I put your unit into #86. If you have time to contrib a more general unit file, let me know.

@till till closed this as completed Apr 17, 2021
Open Source automation moved this from Backlog to Done Apr 17, 2021