Failed to upgrade #65

Closed
isAAAc opened this issue Apr 1, 2019 · 21 comments

@isAAAc

isAAAc commented Apr 1, 2019

Hi, etherpad_mypads_ynh fails when I try to upgrade it.
Here are two logs:
https://paste.yunohost.org/raw/rucoqajeji
https://paste.yunohost.org/raw/dihilivuxa

I don't understand what is happening.

Feel free to ask for more details.
Thanks for your help.

@kay0u
Member

kay0u commented Apr 1, 2019

Hi, thank you for opening this one!

I think the problem comes from:

systemctl reload fail2ban

Found in logs:

2019-04-01 10:10:22,956: WARNING - Job for fail2ban.service failed.
2019-04-01 10:10:22,956: DEBUG - + local exit_code=1
2019-04-01 10:10:22,956: WARNING - See "systemctl status fail2ban.service" and "journalctl -xe" for details.

Can you give us the result of this command please?

sudo journalctl -u fail2ban -n10

In any case, I think a quick fix on our side would be to replace this:
systemctl reload fail2ban
with this:
systemctl reload-or-restart fail2ban
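
A minimal sketch of that replacement, for illustration only. The function name reload_service_safely is hypothetical and is not a YunoHost helper; it wraps systemctl reload-or-restart and prints the last journal lines when the job fails:

```shell
#!/bin/sh
# Hypothetical wrapper (not part of YunoHost): reload the unit if it supports
# reloading, otherwise restart it; a stopped unit is simply started.
reload_service_safely() {
    unit="$1"
    if ! systemctl reload-or-restart "$unit"; then
        # On failure, show the most recent journal entries for diagnosis.
        journalctl -u "$unit" -n 10 --no-pager >&2
        return 1
    fi
}

# Example: reload_service_safely fail2ban
```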

@isAAAc
Author

isAAAc commented Apr 1, 2019

Hi @kay0u ;)
here are the results of the requested command:

journalctl -u fail2ban -n10

# journalctl -u fail2ban -n10
-- Logs begin at Mon 2019-04-01 10:23:55 CEST, end at Mon 2019-04-01 11:36:55 CEST. --
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: State 'stop-sigterm' timed out. Killing.
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: Killing process 13174 (fail2ban-server) with signal SIGKILL.
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: Main process exited, code=killed, status=9/KILL
avril 01 10:41:40 krashboyz systemd[1]: Stopped Fail2Ban Service.
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: Unit entered failed state.
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: Failed with result 'timeout'.
avril 01 10:41:40 krashboyz systemd[1]: Starting Fail2Ban Service...
avril 01 10:41:40 krashboyz fail2ban-client[24549]: 2019-04-01 10:41:40,993 fail2ban.server         [24551]: INFO    Starting Fail2ban v0.9.6
avril 01 10:41:40 krashboyz fail2ban-client[24549]: 2019-04-01 10:41:40,995 fail2ban.server         [24551]: INFO    Starting in daemon mode
avril 01 10:41:42 krashboyz systemd[1]: Started Fail2Ban Service.

The app rolled back to the previous version and is available.

Where do you think I should replace systemctl reload fail2ban with systemctl reload-or-restart fail2ban?

Does the Failed with result 'timeout'. perhaps come from my ban list being too large?

@isAAAc
Author

isAAAc commented Apr 1, 2019

For information, fail2ban is running without any action on my side:

# service fail2ban status
● fail2ban.service - Fail2Ban Service
   Loaded: loaded (/lib/systemd/system/fail2ban.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-04-01 10:41:42 CEST; 1h 1min ago
     Docs: man:fail2ban(1)
  Process: 21513 ExecStop=/usr/bin/fail2ban-client stop (code=exited, status=255)
  Process: 24549 ExecStart=/usr/bin/fail2ban-client -x start (code=exited, status=0/SUCCESS)
 Main PID: 24553 (fail2ban-server)
    Tasks: 27 (limit: 4915)
   Memory: 100.5M
      CPU: 19min 9.426s
   CGroup: /system.slice/fail2ban.service
           └─24553 /usr/bin/python3 /usr/bin/fail2ban-server -s /var/run/fail2ban/fail2ban.sock -p /var/run/fail2ban/fail2ban.pid -x -b

avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: Main process exited, code=killed, status=9/KILL
avril 01 10:41:40 krashboyz systemd[1]: Stopped Fail2Ban Service.
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: Unit entered failed state.
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: Failed with result 'timeout'.
avril 01 10:41:40 krashboyz systemd[1]: Starting Fail2Ban Service...
avril 01 10:41:40 krashboyz fail2ban-client[24549]: 2019-04-01 10:41:40,993 fail2ban.server         [24551]: INFO    Starting Fail2ban v0.9.6
avril 01 10:41:40 krashboyz fail2ban-client[24549]: 2019-04-01 10:41:40,995 fail2ban.server         [24551]: INFO    Starting in daemon mode
avril 01 10:41:42 krashboyz systemd[1]: Started Fail2Ban Service.

@kay0u
Member

kay0u commented Apr 1, 2019

Where do you think I should replace systemctl reload fail2ban with systemctl reload-or-restart fail2ban?

Nowhere, it was on our side :-), at this place:

systemctl reload fail2ban

Does the Failed with result 'timeout'. perhaps come from my ban list being too large?

I don't know, but if it does, we should handle this case anyway.

@isAAAc
Author

isAAAc commented Apr 1, 2019

yep our != your, it's not my day ^^

@maniackcrudelis
Contributor

Nowhere, it was on our side :-), at this place:

https://github.com/YunoHost/yunohost/blob/stretch-testing/data/helpers.d/backend#L421

Already fixed in the upcoming testing version.
And more generally for ynh_systemd_action: https://github.com/YunoHost/yunohost/blob/stretch-testing/data/helpers.d/system#L112-L113

Anyway, I don't think the problem was the reload itself; more probably it was the timeout that killed the service.

@maniackcrudelis
Contributor

Also, this is not the log of the crash!
@isAAAc, your log states that the crash happened at 10:31 this morning:

2019-04-01 10:31:33,187: WARNING - Job for fail2ban.service failed.

Fail2ban's log is about a crash at 10:41:

avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: State 'stop-sigterm' timed out. Killing.

@isAAAc
Author

isAAAc commented Apr 1, 2019

Hmm,
@maniackcrudelis, do you want me to reproduce the whole upgrade problem with the following?
yunohost tools update && yunohost tools upgrade && yunohost app upgrade && service fail2ban status

@maniackcrudelis
Contributor

No, just remove -n10 from journalctl -u fail2ban -n10 and scroll to 10:31 this morning.
It may be the same error, but it would be interesting to know exactly what happened at that time.
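
As an alternative to scrolling, journalctl can restrict its output to a time window with --since and --until. A minimal sketch; the function name journal_window and the ten-minute window in the example are illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical helper: print a unit's journal entries between two timestamps.
journal_window() {
    unit="$1"
    start="$2"
    end="$3"
    journalctl -u "$unit" --since "$start" --until "$end" --no-pager
}

# Example: journal_window fail2ban "2019-04-01 10:25:00" "2019-04-01 10:35:00"
```

This only helps if the journal still holds entries for that period.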

@isAAAc
Author

isAAAc commented Apr 1, 2019

damn :/

root@krashboyz:/home/isaaac# journalctl -u fail2ban
-- No entries --

An extract of /var/log/fail2ban.log is here: https://krashboyz.org/zerobin/?7870be00a8e608ec#HpNFrfGQCtBDOE6TPxiAbjZh8mvZYy8U9eT6Z1h59fM=

I think fail2ban was reloading and fetching the banned IPs.
With that many IPs, it may have taken longer than the "wait" allows before the timeout status is declared (?)

@maniackcrudelis
Contributor

!!!
I guess your SSH port is still 22; maybe you should change it so that fewer bots end up banned by your fail2ban.
Anyway, your fail2ban was indeed quite busy, but I also see errors that I think are not related to etherpad.

The first thing would be to retry the upgrade. If the failure was only due to fail2ban being busy, it could work this time.
At the same time, you could run tail -f on the fail2ban log in another terminal, so you'll see if something happens.

@isAAAc
Author

isAAAc commented Apr 1, 2019

When the upgrade reaches Info: [################....] > Reconfigure fail2ban,
tail -f /var/log/fail2ban.log still shows unban actions:

2019-04-01 13:31:29,895 fail2ban.actions        [24553]: NOTICE  [sshd] Unban 104.236.78.228
2019-04-01 13:31:31,646 fail2ban.actions        [24553]: NOTICE  [sshd] Unban 104.237.230.211
2019-04-01 13:31:32,944 fail2ban.actions        [24553]: NOTICE  [sshd] Unban 104.238.92.100
2019-04-01 13:31:34,124 fail2ban.actions        [24553]: NOTICE  [sshd] Unban 104.239.173.150
2019-04-01 13:31:34,905 fail2ban.actions        [24553]: NOTICE  [sshd] Unban 104.248.11.46

Perhaps we should flush all the banned IPs before restarting fail2ban, or before the whole upgrade?
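
To get a rough idea of how many unban actions fail2ban has logged, the Unban lines in the log excerpt above can be counted. A minimal sketch; the function name count_unbans is hypothetical, and the match string is taken from the log lines shown:

```shell
#!/bin/sh
# Hypothetical helper: count the "Unban" actions recorded in a fail2ban log.
# Defaults to /var/log/fail2ban.log, the file being tailed in this thread.
count_unbans() {
    log="${1:-/var/log/fail2ban.log}"
    # -F matches the fixed string, -c prints the number of matching lines.
    grep -cF '] Unban ' "$log"
}

# Example: count_unbans /var/log/fail2ban.log
```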

@maniackcrudelis
Contributor

Unbanning all IPs is probably an internal step that fail2ban performs before stopping.
Did the upgrade fail?

@isAAAc
Author

isAAAc commented Apr 1, 2019

Did the upgrade fail?

yep

@maniackcrudelis
Contributor

Could you provide the full log, captured with tail -f ?

@isAAAc
Author

isAAAc commented Apr 1, 2019

Yes, sure.
It is still running; I'll send it ASAP (going to eat for now).

@isAAAc
Author

isAAAc commented Apr 1, 2019

The full log from tail -f /var/log/fail2ban.log: https://krashboyz.org/zerobin/?ee4f8ff93bc7a403#MLFPGGEw39+N1A0H5Ksp8Ine/xc9tZl4V4fx7y/E+qY=

the output of the cli upgrade : https://krashboyz.org/zerobin/?51d32205b1964b61#EK2BvpnSYSdVmhn6eSffosL6c9RXXHEXL5lyQh4mNZ4=

the log of the first fail during this operation :
https://paste.yunohost.org/raw/egocohuven

the second one :
https://paste.yunohost.org/raw/oyegovirap

the service fail2ban status (right now)

root@krashboyz:/home/isaaac# service fail2ban status
● fail2ban.service - Fail2Ban Service
   Loaded: loaded (/lib/systemd/system/fail2ban.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-04-01 14:01:59 CEST; 23min ago
     Docs: man:fail2ban(1)
  Process: 4247 ExecStop=/usr/bin/fail2ban-client stop (code=exited, status=255)
  Process: 7543 ExecStart=/usr/bin/fail2ban-client -x start (code=exited, status=0/SUCCESS)
 Main PID: 7550 (fail2ban-server)
    Tasks: 27 (limit: 4915)
   Memory: 81.9M
      CPU: 9min 18.936s
   CGroup: /system.slice/fail2ban.service
           └─7550 /usr/bin/python3 /usr/bin/fail2ban-server -s /var/run/fail2ban/fail2ban.sock -p /var/run/fail2ban/fail2ban.pid -x -b

avril 01 14:01:57 krashboyz systemd[1]: fail2ban.service: Main process exited, code=killed, status=9/KILL
avril 01 14:01:57 krashboyz systemd[1]: Stopped Fail2Ban Service.
avril 01 14:01:57 krashboyz systemd[1]: fail2ban.service: Unit entered failed state.
avril 01 14:01:57 krashboyz systemd[1]: fail2ban.service: Failed with result 'timeout'.
avril 01 14:01:57 krashboyz systemd[1]: Starting Fail2Ban Service...
avril 01 14:01:58 krashboyz fail2ban-client[7543]: 2019-04-01 14:01:58,077 fail2ban.server         [7548]: INFO    Starting Fail2ban v0.9.6
avril 01 14:01:58 krashboyz fail2ban-client[7543]: 2019-04-01 14:01:58,078 fail2ban.server         [7548]: INFO    Starting in daemon mode
avril 01 14:01:59 krashboyz systemd[1]: Started Fail2Ban Service.

@isAAAc
Author

isAAAc commented Apr 1, 2019

Do you want me to remove all the IPs banned by fail2ban, relaunch fail2ban, and start the upgrade again?

@maniackcrudelis
Contributor

Yes, please try that.
I suspect that the reload takes too long to execute for you. It took 4s only to unban before reloading.

Maybe for this specific service, in this helper, we should stop, then start, to be sure it has all the time it needs.
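
Since the journal shows the unit being killed when 'stop-sigterm' timed out, another way to give it more time would be to raise systemd's stop timeout with a drop-in. This is a sketch under assumptions, not what the helper does: the function name, the file name stop-timeout.conf, and the 600-second default are all illustrative.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that raises the stop timeout.
# For fail2ban the directory would normally be
# /etc/systemd/system/fail2ban.service.d (root required).
write_stop_timeout_dropin() {
    dropin_dir="$1"
    seconds="${2:-600}"   # arbitrary illustrative default
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/stop-timeout.conf" <<EOF
[Service]
TimeoutStopSec=$seconds
EOF
}

# Example (as root), then reload systemd so it picks up the drop-in:
# write_stop_timeout_dropin /etc/systemd/system/fail2ban.service.d 600
# systemctl daemon-reload
```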

@isAAAc
Author

isAAAc commented Apr 1, 2019

ok, it worked,

# service fail2ban stop
# cd /var/lib/fail2ban
# sqlite3 fail2ban.sqlite3
sqlite> DELETE FROM bans ;
sqlite> .quit
# service fail2ban start

Then I ran: yunohost tools update && yunohost tools upgrade && yunohost app upgrade

The upgrade is OK.

Perhaps we should flush fail2ban.sqlite3 as the first step of the upgrade process?
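
To judge whether the ban list is large enough to matter before flushing anything, the stored bans can be counted first. A minimal sketch; the function name is hypothetical, while the database path and the bans table are the ones used in the commands above:

```shell
#!/bin/sh
# Hypothetical helper: print how many bans are stored in fail2ban's database.
# This is read-only; it does not delete anything.
count_fail2ban_bans() {
    db="${1:-/var/lib/fail2ban/fail2ban.sqlite3}"
    sqlite3 "$db" "SELECT COUNT(*) FROM bans;"
}

# Example (as root): count_fail2ban_bans
```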

@maniackcrudelis
Contributor

I'd rather give fail2ban the time to do its job. An app upgrade shouldn't remove banned IPs from fail2ban.

Anyway, thanks for this bug report; we now know that this can happen with fail2ban.
