Attempt to fix broken installation on slow hardware / VPS #172
Hello, thank you for this project. I wanted to try it out by installing it on a Raspberry Pi model 2, and I'm experiencing this problem. For clarity, I applied this patch to the file. @Psycojoker, regarding testing this: I believe you could make your Vagrant box "slower" by setting the VM to 1 CPU and reducing the Execution Cap. From the VirtualBox docs: This setting limits the amount of time a host CPU spends to emulate a virtual CPU. The default setting is 100% meaning that there is no limitation. A setting of 50% implies a single virtual CPU can use up to 50% of a single host CPU. Note that limiting the execution time of the virtual CPUs may induce guest timing problems. Here is |
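For illustration, the "1 CPU plus reduced Execution Cap" idea could be scripted roughly as below. This is only a sketch under assumptions: the helper name `slow_vm_command` and the VM name are hypothetical, and the `--cpus` / `--cpuexecutioncap` spelling follows the `VBoxManage modifyvm` syntax of older VirtualBox manuals, so it should be checked against the installed version's help output. The function only builds the command; it does not run it.

```python
def slow_vm_command(vm_name, cpus=1, execution_cap=50):
    """Build a VBoxManage command that throttles a VM (hypothetical helper).

    Assumption: the flag names match the installed VirtualBox release.
    """
    if not 1 <= execution_cap <= 100:
        raise ValueError("execution_cap must be a percentage between 1 and 100")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--cpus", str(cpus),
        "--cpuexecutioncap", str(execution_cap),
    ]


if __name__ == "__main__":
    # "yunohost-test" is a placeholder VM name, not one from this thread.
    print(" ".join(slow_vm_command("yunohost-test", cpus=1, execution_cap=30)))
```

The resulting list can be passed to `subprocess.check_call` while the VM is powered off. Whether a throttled VM really reproduces the slow slapd restart is untested here.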
@endorama thanks for testing this patch, that's valuable input :) I thought about slowing down the Vagrant box too but never took the time to actually look into how to do that (I'm not really into virtualisation and that kind of stuff). I'll probably switch my test to something like "wait until you are able to log in as the admin user", which is the actual cause of failure here. But ... hmm...
This error is weird, I've never seen it before. I don't know what to make of it right now. |
@endorama Could you try this image I made for the RPi 2? I used it again yesterday and did not run into any problems. :) |
Thank you, I'll try it as soon as possible and report here.
|
@likeitneverwentaway hey :) That's great news! Could you publish somewhere the script or procedure you used to generate those images? We are still looking for someone to join us and handle the RPi images, since the people who used to do that aren't around anymore :/ |
I can confirm it is working as expected! @likeitneverwentaway thank you. |
…his fix install on slow hardware/vps
@endorama I've pushed a new version that this time tries to wait for the admin user to be accessible. Are you still able to try it? That would be great! @likeitneverwentaway my request still holds ;) |
8d698f7 to 9a66a00
@Psycojoker Sorry :) My workflow was as follows:
I also edited and translated the official guide with my additions. Keep in mind that I only have an RPi 2 to play with, and apparently my image does not work flawlessly on the 3. I'm pretty sure this is because of the packages installed before running the main script; things differ here if you have a 3. If I remember correctly, when I'm here the first command updates the metronome package with this version from Jerome, maybe this happens only on the RPi 2? I'm pretty sure it all comes down to this package for the other boards. I'll be happy to maintain the Raspberry Pi image! Well, at least for the 2; all the feedback I received was good. I think officially publishing this image (maybe with an announcement?) would be good for feedback and for the project. As for updating the image, when should it be done? With each new release of Raspbian, or each major release of YunoHost? |
@likeitneverwentaway thanks a lot for your answer and your work, that looks really cool :) I'm going to talk about that with the other active people in YunoHost to try to find someone who is better suited for that than me (I'm more into Python dev). If you want, you can join us on the XMPP chatroom, it's where most of us hang out: dev@conference.yunohost.org. I would be very happy to see at least an RPi 2 image maintained again :) |
@Psycojoker I tried the latest master this evening, on a clean raspbian image, but the same error appeared. Here is the installation log: http://pastebin.com/kZGhc0XF |
Is there any update on this ? This looks like an important issue to fix... I can't find the After investigating the issue on my side, I think we actually need to call the hook right after (or inside) This TODO actually points to this as well 😛 (though the last words are swapped I think). |
If we really want to be paranoid, we can also add a check somewhere after
try:
    pwd.getpwnam("admin")
except KeyError:
    raise MoulinetteError(...)
|
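A minimal sketch of how that check could be combined with the "wait for the admin user to be accessible" approach from this PR. It is not the code in the patch: the function name `wait_for_user`, the retry count and the delay are assumptions made for illustration, and the `lookup` and `sleep` parameters exist only so the loop can be exercised without a real LDAP backend.

```python
import pwd
import time


def wait_for_user(username, attempts=10, delay=1.0,
                  lookup=pwd.getpwnam, sleep=time.sleep):
    """Poll until `username` resolves, or give up after `attempts` tries.

    Returns True as soon as the lookup succeeds, False otherwise.
    """
    for attempt in range(attempts):
        try:
            lookup(username)
            return True
        except KeyError:
            # The user is not resolvable yet (for example because the
            # directory service is still restarting), so wait and retry,
            # except after the final attempt.
            if attempt < attempts - 1:
                sleep(delay)
    return False


if __name__ == "__main__":
    # On a machine without an "admin" account this prints False
    # after roughly two seconds.
    print(wait_for_user("admin", attempts=3, delay=1.0))
```

When the function returns False, the caller can raise the error from the snippet above instead of continuing with a half-configured system.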
Yes, this is an important issue but since I don't have any dev environment to test it I've stopped working on it (that and lack of free time :/). This isn't a part of the code I'm familiar with, do you think you can fix this? |
Yes, I'll have a look ASAP (though I'm no more familiar with this part of the code than you are 😛), but I'll probably be too busy until tomorrow night. |
Thanks to the work of @alexAubin we now have a solution: 504baefd87a4 |
Actually, looking carefully at these logs posted on issue 463, the fix you proposed here might really be needed. If you look at the log, you see that at the beginning of post-install, the admin user really does exist. Otherwise But later, So we really do need this fix too. I don't know if the wait loop is a good solution, but that should do the trick. Maybe it can also be improved with a |
I would expect this to be a caching problem again, so yes, we probably need to put |
Note that this "nscd" trick is already there for |
We should also push this, it's also a critical issue (like #191). I'll test it as soon as I have some time. We need to decide if we do a |
@alexAubin @Psycojoker I'm not sure I follow this one. You added the small decision label but you are thinking about changing it to the You both are more familiar with this stuff because you debugged it, so I'm following you. |
@julienmalik well, I'm personally not fully convinced that this patch is still needed, but @alexAubin thinks so. I haven't taken the time to think it through, so I'm trusting him on this one. I would also be in favor of going " |
TL;DR: the fix works! But we should address issue #656, which is probably the root cause. So, I've been able to reproduce and pinpoint the issue observed in the log. It's been a long journey 😄 and I learned some stuff, so here's what I did, in case it's of any interest. What I did was prefix the post-install command with
That way, the post-install would go super-fast while other processes (in particular the LDAP restart) would go slower, which I expected to simulate "slow hardware", though I'm still not sure it really does. I encountered the following message in the lines after the
So it looks like it cannot contact the LDAP server (i.e. still 'rebooting'?). On the next try (I put a delay of 0.01 s), sudo was working fine. Not really what I wanted to obtain: in the logs of the actual issue, admin was reported to be unknown several times. How did that happen? I played around a bit more, in particular trying to invalidate nscd's cache with the famous Then I wondered, what if So my best guess is that it's related to issue #656: nscd isn't in YunoHost's dependencies. On most Debian setups, we got lucky and nscd is there somehow (it's only in the "Recommends" of nslcd, as found by @opi), but maybe on some particular hardware or image, nscd isn't there by default. The good news is that the currently proposed fix properly works around this! (It displays a funny |
This makes me wonder whether, before running certain things, we should do a precheck that everything is running as expected (like a series of asserts in programming languages such as Eiffel). For example, we should check that ldap/nscd/nslcd etc. are running before starting a Thanks a lot for the tests. |
Agreed, I was thinking about this and would be really in favor of doing this. Maybe not for every command, but at least for the postinstall which is a quite critical part. We could open a dedicated ticket on Redmine. |
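To make the precheck idea concrete, here is a rough sketch and not a proposal for the actual implementation. The names `precheck` and `is_active` are hypothetical, the service list is just the one mentioned in this thread, and the default check assumes a systemd host where `systemctl is-active --quiet` exits with 0 for a running unit.

```python
import subprocess


def is_active(service):
    """Assumption: systemd is available; exit code 0 means the unit is running."""
    try:
        return subprocess.call(["systemctl", "is-active", "--quiet", service]) == 0
    except OSError:
        # systemctl itself is missing, so the service cannot be confirmed.
        return False


def precheck(services, check=is_active):
    """Return the services from `services` that are NOT running."""
    return [name for name in services if not check(name)]


if __name__ == "__main__":
    missing = precheck(["slapd", "nscd", "nslcd"])
    if missing:
        print("not running: " + ", ".join(missing))
    else:
        print("all required services are running")
```

The postinstall could refuse to start, or wait and retry, when the returned list is not empty. Passing the check as a parameter keeps the logic testable on machines without these services.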
We need another opinion on this one. |
( untested, but trusted )
Hello,
As reported here or
here, YunoHost post install fails on
slow hardware/vps because slapd is too slow to restart itself after its
regen-conf.
This patch is an attempt to fix this but I don't have a good testing
environment (my vagrant is too fast for that). Maybe testing that it's possible
to run something using the admin user could be a better test but I don't see
how to do it easily.
A workaround would be to use my patch to run this kind of operation as
root instead of admin, but this is a workaround, not a real fix (and this bug
could still cause other problems).
Cheers,