Attempt to fix broken installation on slow hardware / VPS #172

Psycojoker · 2016-08-08T06:08:48Z

Hello,

As reported here or
here, YunoHost post install fails on
slow hardware/VPS because slapd is too slow to restart itself after its
regen-conf.

This patch is an attempt to fix this but I don't have a good testing
environment (my vagrant is too fast for that). Maybe testing that it's possible
to run something using the admin user could be a better test but I don't see
how to do it easily.

A workaround would be to use my patch to run this kind of operation using
root instead of admin, but this is a workaround, not a real fix (and this bug
could still cause other problems).

Cheers,

endorama · 2016-08-27T00:19:20Z

Hello, thank you for this project. I wanted to try it out by installing it on a Raspberry Pi model 2 and I'm experiencing this problem.
I tried this patch, but it doesn't seem to work: the same error is displayed.

For clarity, I applied this patch to the file /usr/share/yunohost/hooks/conf_regen/06-slapd, then performed the post install again. The same error shows up. Am I missing something?

@Psycojoker regarding testing this, I believe you could make your vagrant "slower" by setting the VM to 1 CPU and reducing the Execution Cap. From the VirtualBox docs: "This setting limits the amount of time a host CPU spends to emulate a virtual CPU. The default setting is 100% meaning that there is no limitation. A setting of 50% implies a single virtual CPU can use up to 50% of a single host CPU. Note that limiting the execution time of the virtual CPUs may induce guest timing problems."
I don't know if this will be enough.
(I'm assuming you are using VirtualBox, but the same should be available for VMware.)
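As a rough illustration of the throttling idea above, here is a minimal Python sketch that only builds the `VBoxManage modifyvm` command line for one CPU and a reduced execution cap. The function name, the VM name and the 50% value are assumptions made for the example. As far as I know, `--cpus` and `--cpuexecutioncap` are the option names used by VirtualBox releases of that period, and `modifyvm` expects the VM to be powered off.

```python
def vbox_throttle_argv(vm_name, cpus=1, execution_cap=50):
    """Build the VBoxManage argv that slows a VM down.

    `vm_name` is whatever name VirtualBox knows the VM by (hypothetical here).
    `execution_cap` is the percentage of a host CPU each virtual CPU may use.
    """
    if cpus < 1:
        raise ValueError("a VM needs at least one virtual CPU")
    if not 1 <= execution_cap <= 100:
        raise ValueError("execution cap must be between 1 and 100")
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--cpus", str(cpus),
        "--cpuexecutioncap", str(execution_cap),
    ]


if __name__ == "__main__":
    # Only prints the command; run it yourself while the VM is powered off.
    print(" ".join(vbox_throttle_argv("yunohost-test")))
```

The command is returned as a list so it can be passed to `subprocess.check_call` unchanged once the VM name is known.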

Here is /var/log/yunohost/yunohost-cli.log

Psycojoker · 2016-08-27T02:18:10Z

@endorama thanks for having tested this patch, that's valuable input :)

I thought about slowing down the vagrant box too but never took the time to actually look into how to do that (I'm not really into virtualisation and that kind of stuff).

I'll probably switch my test to something like "wait until you are able to log in as the admin user", which is the actual cause of failure here.
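A minimal sketch of what such a "wait until the admin user is usable" test could look like, written in Python 3 rather than as the shell hook itself. It only checks that the name resolves through `pwd.getpwnam`, which is weaker than actually running something as admin. The helper names `user_exists` and `wait_until`, and the timeout values, are assumptions made for the example.

```python
import pwd
import time


def user_exists(name):
    """Return True if `name` resolves through NSS (local files, LDAP, ...)."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def wait_until(probe, timeout=30.0, interval=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Call `probe` until it returns True or `timeout` seconds have elapsed.

    `sleep` and `clock` are injectable so the loop can be tested without
    real delays.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    if not wait_until(lambda: user_exists("admin")):
        raise SystemExit("admin user still unknown after waiting")
```

The probe is passed in as a callable so a stricter check (for instance one that really runs a command as admin) can be swapped in later.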

But ... hmm...

ALREADY_EXISTS: {'desc': 'Already exists'}

This error is weird, I've never seen it before. I don't know what to think about it right now.

a1ex4 · 2016-08-27T10:18:32Z

@endorama Could you try this image I made for the RPi 2 ? I used it again yesterday and did not run into any problem. :)

endorama · 2016-08-27T10:28:18Z

Thank you, I'll try it as soon as possible and report here.

Psycojoker · 2016-08-27T15:57:22Z

@likeitneverwentaway hey :)

That's great news! Could you publish somewhere the script/way you generated those images? We are still looking for someone to join us and handle the RPi images, since the people who used to do that aren't around anymore :/

endorama · 2016-08-27T22:19:18Z

I can confirm it's working as expected! @likeitneverwentaway thank you.

Psycojoker · 2016-09-04T07:54:27Z

@endorama I've pushed a new version that this time tries to wait for the admin user to be accessible. Are you still able to try it? That would be great!

@likeitneverwentaway my request still holds ;)

a1ex4 · 2016-09-04T11:57:39Z

@Psycojoker Sorry :) My workflow was as follows:

install latest raspbian lite
Yunohost installation : I used a mix of the official guide and this one until I managed to get everything working for the main script and the post install. Sorry for being vague but I'm pretty sure everything I used was from these guides.
For the first boot script I used that one, I'm not entirely sure I edited it though... Checking it on my image before the first boot might be worth something.

I also edited and translated the official guide with my additions. Keep in mind that I only have a RPi 2 to play with, and apparently my image does not work flawlessly on the 3, I'm pretty sure this is because of the packages installation before running the main script, things differ here if you have a 3.

If I remember correctly, when I'm here the first command updates the metronome package with this version from Jerome, maybe this happens only on the rpi 2 ? I'm pretty sure it all comes down to this package for the other boards.

I'll be happy to maintain the Raspberry image! Well, at least for the 2; all the feedback I received was good. I think officially publishing this image (maybe an announcement ?) would be good for feedback and for the project. When should the image be updated ? With each new release of Raspbian, or each major release of YunoHost ?

Psycojoker · 2016-09-07T15:47:47Z

@likeitneverwentaway thanks a lot for your answer and your work, that looks really cool :)

I'm going to talk about that with the other active people in YunoHost to try to find someone who is better suited for that than me (I'm more into python dev). If you want, you can join us on the xmpp chatroom, it's where most of us hang out: dev@conference.yunohost.org

I would be very happy to see at least a rpi2 image maintained again :)

endorama · 2016-09-07T21:22:02Z

@Psycojoker I tried the latest master this evening, on a clean raspbian image, but the same error appeared.

Here is the installation log: http://pastebin.com/kZGhc0XF
yunohost-cli.log is empty
yunohost-api.log has http logs + the "missing admin user error"

alexAubin · 2016-11-25T05:31:44Z

Is there any update on this ? This looks like an important issue to fix...

I can't find the Unknown 'admin' user in the logs of your last comment, @endorama, instead it seems to be a failed grep on /etc/ldap/slapd.conf ?

After investigating the issue on my side, I think we actually need to call the hook right after (or inside) tools_ldapinit(). As this issue points out, the bug can also occur here (regen_conf for ssl before creating the CA) and will require the admin user to exist (somewhere inside the black magic of the pre-callback weird stuff) - but the regen_conf for ldap will only occur later.

This TODO actually points to this as well 😛 (though the last words are swapped I think).

alexAubin · 2016-11-25T05:38:05Z

If we really want to be paranoid, we can also add a check somewhere after tools_ldapinit() that the admin user exists. Something like :

import pwd

try:
    # Raises KeyError when "admin" cannot be resolved through NSS.
    pwd.getpwnam("admin")
except KeyError:
    raise MoulinetteError(...)

Psycojoker · 2016-11-25T13:39:59Z

Yes, this is an important issue but since I don't have any dev environment to test it I've stopped working on it (that and lack of free time :/).

This isn't a part of the code I'm familiar with, do you think you can fix this?

alexAubin · 2016-11-25T15:24:55Z

Yes, I'll have a look ASAP (though I'm no more familiar with this part of the code than you 😛), but I'll probably be too busy until tomorrow night.

Psycojoker · 2016-11-28T01:11:32Z

Thanks to the work of @alexAubin we now have a solution: 504baefd87a4

alexAubin · 2016-11-28T02:42:26Z

Actually, looking carefully at these logs posted on issue 463, the fix you proposed here might really be needed.

If you look at the log, you see that at the beginning of post-install, user admin really does exist. Otherwise conf_regen/02-ssl would have crashed as in here, which is what fix #191 addresses. (You even see the Creating directory '/home/admin'.)

But later, conf_regen/06-slapd is called, and once it's done, execution moves on to conf_regen/09-nslcd, which then crashes with ... sudo: unknown user: admin !

So we really do need this fix too. I don't know if the wait loop is a good solution, but that should do the trick. Maybe it can be improved by also running nscd -i passwd like in #191. Maybe inside the loop ?
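To make the "inside the loop" idea concrete, here is a minimal Python sketch, not the actual hook (which is a shell script): each attempt first asks nscd to invalidate its passwd cache, then looks the user up again. The function names, attempt count and delay are assumptions for the example. `nscd -i passwd` normally needs root, and the sketch simply carries on if nscd is not installed.

```python
import pwd
import subprocess
import time


def invalidate_passwd_cache(run=subprocess.call):
    """Ask nscd to drop its cached passwd entries; return True on success."""
    try:
        return run(["nscd", "-i", "passwd"]) == 0
    except OSError:
        # nscd is not installed, so there is no cache to flush.
        return False


def wait_for_user(name, attempts=20, delay=0.5,
                  invalidate=invalidate_passwd_cache,
                  lookup=pwd.getpwnam, sleep=time.sleep):
    """Flush the cache and retry the lookup, up to `attempts` times."""
    for attempt in range(attempts):
        invalidate()
        try:
            lookup(name)
            return True
        except KeyError:
            # Unknown for now: pause before the next attempt, if any.
            if attempt < attempts - 1:
                sleep(delay)
    return False
```

Flushing before every lookup costs one extra process per attempt, which seems acceptable for something that only runs during regen-conf.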

Psycojoker · 2016-11-28T11:44:47Z

I would expect this to be a caching problem again, so yes, we probably need to put sudo nscd -i passwd in those files too.

opi · 2016-11-28T14:22:03Z

Note that this "nscd" trick is already there for user_create and user_delete method. Source: https://github.com/YunoHost/yunohost/blob/unstable/src/yunohost/user.py#L191

alexAubin · 2016-12-03T13:13:52Z

We should also push this, it's also a critical issue (like #191). I'll test it as soon as I have some time. We need to decide if we do a nscd -i passwd somewhere.

julienmalik · 2016-12-03T16:35:58Z

@alexAubin @Psycojoker I'm not sure I follow this one. You added the small decision label but you are thinking about changing it to the nscd -i passwd trick.

You both are more aware of this stuff because you debugged it, so I'm following you.
From what I understood, this nscd cache flush does not cause problems (but instead solves some), so why not put it everywhere we touch the users/admin entries in LDAP ?
I would not hesitate too much. Since it seems quite hard to test/reproduce, I would be OK to ship the regenconf helpers in the next release with a few more well-placed nscd -i passwd calls and see if we still get this bug in the user reports, or if that killed it.

Psycojoker · 2016-12-03T17:23:32Z

@julienmalik well, I'm personally not fully convinced that this patch is still needed but @alexAubin thinks so. I haven't taken the time to fully think about it so I'm trusting him on this one.

I would also be in favor of going "nscd -i passwd ALL THE THINGS", I just haven't taken the time to do so.

alexAubin · 2016-12-05T04:22:41Z

TL;DR : fix works ! But we should address issue #656 which is probably the root cause.

So, I've been able to reproduce and pinpoint the issue observed in the log. It's been a long journey 😄 and I learned some stuff, so here's what I did, in case that's of any interest.

What I did was prefix the post-install command with nice, a tool that runs a command with a higher (or lower) CPU scheduling priority. I also launched other CPU-hungry processes just to keep the CPU really busy :

sudo apt-get install mathomatic-primes --yes
nice -n -5 matho-primes 0 9999999999 > /dev/null &
nice -n -5 matho-primes 0 9999999999 > /dev/null &
nice -n -5 matho-primes 0 9999999999 > /dev/null &
nice -n -5 matho-primes 0 9999999999 > /dev/null &
nice -n -17 yunohost tools postinstall -d yunohostdev123.netlib.re -p yunohost --ignore-dyndns --debug

That way, the post-install would go super-fast while other processes (and in particular the ldap restart) would go slower, which I expected to simulate "slow hardware", though I'm still not sure it really does. I encountered the following message in the lines after the slapd force-reload in 06-slapd (using this PR's branch) :

61585 INFO + sudo su admin -c ''
61592 WARNING sudo: ldap_sasl_bind_s(): Can't contact LDAP server

So it looks like it cannot contact the LDAP server (i.e. still 'rebooting' ?). On the next try (I put a delay of 0.01 s), sudo was working fine. Not really what I wanted to obtain : in the logs of the actual issue, admin was reported to be unknown several times. How did that happen ? I played around a bit more, in particular trying to invalidate nscd's cache with the famous nscd -i passwd but couldn't get admin to be reported as unknown...

Then I wondered, what if nscd wasn't started at all ? I added a service nscd stop before launching the postinstall, and here it goes ! Everything went fine, up to the post_regen_conf where every hook was crashing with unknown user : admin ! You don't even need a slow hardware for this to happen.

So my best guess is that it's related to issue #656 : nscd isn't in Yunohost's dependencies. On most debian setups, we got lucky nscd is there somehow (it's only in the "Recommends" of nslcd, as found by @opi) - but maybe on some particular hardware or image, nscd isn't there by default.

Good news is : the currently proposed fix properly works around this ! (It displays a funny WARNING sudo: unknown uid 1007: who are you? (because 06-slapd itself is actually running as admin - which is unknown lol !) the first time, then realizes admin truly exists because slapd is fully up I guess). But we could properly avoid triggering this situation by making sure nscd is installed, and running.

Psycojoker · 2016-12-05T18:42:56Z

But we could properly avoid triggering this situation by making sure nscd is installed, and running.

This makes me wonder if we shouldn't do a precheck before running certain things, to verify that everything is running as expected (like a series of asserts in a programming language like Eiffel). For example we should check that ldap/nscd/nslcd etc... are running before starting a hook_exec.

Thanks a lot for the tests.

alexAubin · 2016-12-05T18:47:34Z

This makes me wonder if we shouldn't do a precheck before running certain things, to verify that everything is running as expected (like a series of asserts in a programming language like Eiffel). For example we should check that ldap/nscd/nslcd etc... are running before starting a hook_exec.

Agreed, I was thinking about this and would be really in favor of doing this. Maybe not for every command, but at least for the postinstall which is a quite critical part. We could open a dedicated ticket on Redmine.
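A minimal sketch of what such a precheck could look like in Python, assuming the services are systemd units named slapd, nscd and nslcd and that `systemctl is-active --quiet` is available. The function names are hypothetical and this is not existing YunoHost code; it raises a plain `RuntimeError` where the real thing would presumably raise a MoulinetteError.

```python
import subprocess

# Assumed unit names for the "ldap/nscd/nslcd" services mentioned above.
REQUIRED_SERVICES = ("slapd", "nscd", "nslcd")


def service_is_active(name, run=subprocess.call):
    """Return True if systemd reports the unit as active."""
    try:
        return run(["systemctl", "is-active", "--quiet", name]) == 0
    except OSError:
        # systemctl itself is missing, so treat the service as not running.
        return False


def inactive_services(names=REQUIRED_SERVICES, is_active=service_is_active):
    """List the required services that are not currently running."""
    return [name for name in names if not is_active(name)]


def precheck(names=REQUIRED_SERVICES, is_active=service_is_active):
    """Fail early, before any hook runs, if a required service is down."""
    missing = inactive_services(names, is_active)
    if missing:
        raise RuntimeError("services not running: " + ", ".join(missing))
```

Calling `precheck()` at the start of the postinstall would turn the confusing "unknown user: admin" failures into an explicit message naming the service that is down.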

Psycojoker · 2016-12-11T11:33:46Z

We need another opinion on this one.

opi

( untested, but trusted )

[fix] wait for admin user to be available after a slapd regen-conf, this fix install on slow hardware/vps

9a66a00

Psycojoker force-pushed the fix_slapd_regenconf_on_slowhardware branch from 8d698f7 to 9a66a00 Compare September 4, 2016 07:57

alexAubin mentioned this pull request Nov 28, 2016

[fix] Fix ldap caching on postinstall, which might cause 'Unknown admin user' #191

Merged

Psycojoker added the small decision label Dec 2, 2016

alexAubin added the opinion needed label Dec 3, 2016

alexAubin approved these changes Dec 5, 2016

View reviewed changes

alexAubin added the important label Dec 7, 2016

alexAubin mentioned this pull request Dec 7, 2016

[fix] Add missing dependency to nscd package #656 #203

Merged

M5oul added this to the 2.5.x milestone Dec 11, 2016

opi approved these changes Dec 11, 2016

View reviewed changes

alexAubin merged commit 47ac089 into unstable Dec 11, 2016

alexAubin deleted the fix_slapd_regenconf_on_slowhardware branch December 11, 2016 23:21

M5oul pushed a commit that referenced this pull request Nov 18, 2017

switched to grabbing the agreement url from /directory, addresses #145,

c4940d2

#148, #172, #189
