dsc stopped writing reports with 2.3.0 #111

Closed
McStork opened this issue Jan 3, 2017 · 18 comments

@McStork
Contributor

McStork commented Jan 3, 2017

Hello and happy new year!

We installed dsc 2.3.0 on two servers shortly after the release. Three days later, the daemon on one of the servers stopped writing reports, although the dsc process kept running. The same thing happened on the second server 10 days after the installation of dsc 2.3.0.
As far as we can tell, nothing was logged when this happened.

Today we rolled back to 2.1.1, the version previously installed on these servers.

@jelu
Member

jelu commented Jan 3, 2017

Thanks for the report. There seem to be issues with the threading, signaling and forking. I am working on it, but it is hard to say when a fix will be ready for testing.

What distribution are you running?

You can try running with the -T flag, which disables the use of threads.

@McStork
Contributor Author

McStork commented Jan 3, 2017

Thanks for the quick reply :-).
We are running Debian Jessie x64 on both servers.

@jelu
Member

jelu commented Jan 3, 2017

Can you give me the specs of the servers and the QPS you're receiving?

@McStork
Contributor Author

McStork commented Jan 3, 2017

The servers receive around 2 to 3 QPS.
They are LXC containers with 4 CPUs @ 2.80 GHz and 4 GB RAM each.

@jelu
Member

jelu commented Jan 5, 2017

Can you describe how it stopped? What did you see in the ps output? Did you run strace?

@McStork
Contributor Author

McStork commented Jan 5, 2017

ps showed that the process was still running even though reports stopped being written. We did not run strace :-(.

@jelu
Member

jelu commented Jan 5, 2017

What was the last file in the data directory? An ...xml.XXX?

@McStork
Contributor Author

McStork commented Jan 5, 2017

No, the last written file had the proper .xml / .json extension.

@jelu
Member

jelu commented Jan 5, 2017

If possible, could you set up another container, run the develop branch in it and generate some QPS against it?

The issue seems to be related to threads combined with the fact that the process forks when writing the files, which on rare occasions makes libraries lock up. I was lucky enough to catch this while stracing, and I saw a lockup in the NSS library used to get the IP protocol name.
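
To illustrate the fork-with-threads hazard described here, this is a minimal Python sketch for POSIX systems. It is not dsc's C code, and the function name is made up for illustration. A helper thread holds a lock at the moment of the fork; the child contains only the forking thread, so it finds the lock still held with nobody left to release it. A lockup inside a library such as NSS would presumably follow the same pattern, with the lock living inside the library.

```python
import os
import threading
import warnings


def child_can_take_lock_after_fork() -> bool:
    """Fork while another thread holds a lock; report whether the child can take it."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder() -> None:
        with lock:
            held.set()      # tell the main thread the lock is now held
            release.wait()  # keep holding it until after the fork

    worker = threading.Thread(target=holder)
    worker.start()
    held.wait()

    with warnings.catch_warnings():
        # Python 3.12+ warns about forking while threads are running;
        # silenced here because that is exactly what this demo does.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: only the forking thread was copied, so the holder thread
        # does not exist here. A blocking acquire would hang forever, so
        # probe without blocking and report the result via the exit code.
        got_it = lock.acquire(blocking=False)
        os._exit(0 if got_it else 1)

    _, status = os.waitpid(pid, 0)
    release.set()
    worker.join()
    return os.WEXITSTATUS(status) == 0
```

Calling `child_can_take_lock_after_fork()` shows whether the child was able to take the lock. In the child the copied lock stays held, which is why a blocking call at that point would hang silently while the process still shows up in ps.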

The latest commits to develop ensure that signal handlers are correct after fork and that thread-safe libc functions are used.

I am also setting up dsc on all our build VMs to continuously run the latest build of the develop branch. I will generate 10 QPS against them and monitor file creation.
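
A minimal sketch of that kind of file-creation monitoring, in Python. The function names, the .xml/.json suffix filter and the 180-second threshold are assumptions for illustration; they are not part of dsc or of the test setup mentioned here.

```python
import time
from pathlib import Path
from typing import Optional


def newest_report_age(data_dir: str, now: Optional[float] = None) -> Optional[float]:
    """Return seconds since the newest .xml/.json file in data_dir was modified.

    Returns None when the directory contains no such file.
    """
    now = time.time() if now is None else now
    mtimes = [
        entry.stat().st_mtime
        for entry in Path(data_dir).iterdir()
        if entry.is_file() and entry.suffix in (".xml", ".json")
    ]
    if not mtimes:
        return None
    return now - max(mtimes)


def reports_stalled(
    data_dir: str,
    max_age_seconds: float = 180.0,  # assumed threshold, tune to the output interval
    now: Optional[float] = None,
) -> bool:
    """True when no report exists or the newest one is older than the threshold."""
    age = newest_report_age(data_dir, now)
    return age is None or age > max_age_seconds
```

Run from cron or a monitoring agent against the data directory, `reports_stalled(...)` would flag the situation from this issue, where the process keeps running but no new reports appear.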

@jelu
Member

jelu commented Jan 6, 2017

If you have not already, please read my announcement regarding this:
https://lists.dns-oarc.net/pipermail/dsc/2017-January/000361.html

I have now set up some tests for dsc. If you have ideas for other tests, please let me know:
https://dev.dns-oarc.net/jenkins/view/dsctest/

@jelu
Member

jelu commented Jan 13, 2017

Are you able to test the latest develop branch?

@McStork
Contributor Author

McStork commented Jan 13, 2017

Hi @jelu. We haven't tested the develop branch yet. But we might be able to do so next week.

@McStork
Contributor Author

McStork commented Jan 16, 2017

We now have it running on a server.

@jelu
Member

jelu commented Jan 16, 2017 via email

@McStork
Contributor Author

McStork commented Jan 16, 2017

fa191b0

@jelu
Member

jelu commented Jan 20, 2017

Do you have any issues to report with the develop branch?

@McStork
Contributor Author

McStork commented Jan 20, 2017

We have had no issues with the develop branch so far.

jelu added a commit to jelu/dsc that referenced this issue Jan 23, 2017
jelu added a commit to jelu/dsc that referenced this issue Jan 24, 2017
jelu added a commit to jelu/dsc that referenced this issue Jan 26, 2017
jelu closed this as completed in b0d2374 Jan 27, 2017

@McStork
Contributor Author

McStork commented Jan 27, 2017

Thanks @jelu! Good job. :-)
