
tmp.mount restart handler will fail if /tmp is in use #46

Closed
erpadmin opened this issue Oct 5, 2017 · 16 comments

@erpadmin
Contributor

erpadmin commented Oct 5, 2017

Need to determine how to approach an in-use /tmp mount. I don't think we should hard-fail the playbook for this condition, but we should certainly notify.

RUNNING HANDLER [RHEL7-CIS : systemd restart tmp.mount] *************************************************************************************
fatal: [someserver.somewhere.com]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to reload service tmp.mount: Job for tmp.mount failed. See \"systemctl status tmp.mount\" and \"journalctl -xe\" for details.\n"}
# systemctl status tmp.mount
● tmp.mount - Temporary Directory
   Loaded: loaded (/etc/systemd/system/tmp.mount; enabled; vendor preset: disabled)
   Active: active (mounted) (Result: exit-code) since Thu 2017-10-05 09:34:33 EDT; 7s ago
    Where: /tmp
     What: /dev/mapper/rootvg-tmplv
     Docs: man:hier(7)
           http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems
  Process: 4911 ExecRemount=/bin/mount tmpfs /tmp -o remount,mode=1777,strictatime,noexec,nodev,nosuid -t tmpfs (code=exited, status=32)
  Process: 4927 ExecUnmount=/bin/umount /tmp (code=exited, status=32)

Oct 05 09:33:05 someserver.somewhere.com mount[4911]: mount: /tmp not mounted or bad option
Oct 05 09:33:05 someserver.somewhere.com mount[4911]: In some cases useful info is found in syslog - try
Oct 05 09:33:05 someserver.somewhere.com mount[4911]: dmesg | tail or so.
Oct 05 09:33:05 someserver.somewhere.com systemd[1]: tmp.mount mount process exited, code=exited status=32
Oct 05 09:33:05 someserver.somewhere.com systemd[1]: Reload failed for Temporary Directory.
Oct 05 09:34:33 someserver.somewhere.com systemd[1]: Unmounting Temporary Directory...
Oct 05 09:34:33 someserver.somewhere.com umount[4927]: umount: /tmp: target is busy.
Oct 05 09:34:33 someserver.somewhere.com umount[4927]: (In some cases useful info about processes that use
Oct 05 09:34:33 someserver.somewhere.com umount[4927]: the device is found by lsof(8) or fuser(1))
Oct 05 09:34:33 someserver.somewhere.com systemd[1]: tmp.mount mount process exited, code=exited status=32
[root@someserver service]# lsof /tmp
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
firewalld  975 root  DEL    REG  253,6           136 /tmp/ffikXVvEQ
firewalld  975 root    8u   REG  253,6     4096  136 /tmp/ffikXVvEQ (deleted)
tuned     1281 root  DEL    REG  253,6           137 /tmp/ffiau2RGq
tuned     1281 root    7u   REG  253,6     4096  137 /tmp/ffiau2RGq (deleted)
[root@someserver service]# systemctl stop firewalld
[root@someserver service]# systemctl stop tuned
[root@someserver service]# systemctl restart tmp.mount
[root@someserver service]# systemctl start firewalld
[root@someserver service]# systemctl start tuned
@shepdelacreme
Contributor

I've been thinking about this. We could put a meta: flush_handlers after this task:

- name: "SCORED | 1.1.3 | PATCH | Ensure nodev option set on /tmp partition\n
         SCORED | 1.1.4 | PATCH | Ensure nosuid option set on /tmp partition\n
         SCORED | 1.1.5 | PATCH | Ensure noexec option set on /tmp partition\n
         | drop custom tmp.mount"
  copy:
      src: etc/systemd/system/tmp.mount
      dest: /etc/systemd/system/tmp.mount
      owner: root
      group: root
      mode: 0644
  notify: systemd restart tmp.mount
  tags:
      - level1
      - scored
      - patch
      - rule_1.1.3
      - rule_1.1.4
      - rule_1.1.5

or we could just put an ignore_errors: yes on the systemd restart tmp.mount handler.

Flushing the handlers early will probably catch most instances of this, but it isn't guaranteed. Ignoring the failure is not a great option either, but I'm not sure what the other options are without jumping through a bunch of hoops to find what is using /tmp and then stopping those processes.
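For reference, a minimal sketch of the two options being discussed (the handler name matches the notify: systemd restart tmp.mount above; the exact handler body is illustrative, not necessarily what the role currently ships):

```yaml
# Option 1: flush handlers right after the copy task, so the remount
# runs before later tasks have a chance to open files under /tmp.
- meta: flush_handlers

# Option 2: let the handler fail softly instead of aborting the play.
# Goes in handlers/main.yml; name must match the notify in the task.
- name: systemd restart tmp.mount
  systemd:
      daemon_reload: yes
      name: tmp.mount
      state: restarted
  ignore_errors: yes
```

Neither is airtight: option 1 only narrows the window in which something can grab /tmp, and option 2 silently leaves the old mount options active until the next reboot.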

@bbaassssiiee
Member

Please make /tmp changing operations optional.

@shepdelacreme
Contributor

Making it optional would work, but it would not solve the issue with the systemd reload. Optional execution is an open issue: #26

@samdoran

samdoran commented Nov 7, 2017

Another option may be to just change the remote_tmp directory to /var/tmp and avoid working out of /tmp.

@erpadmin
Contributor Author

erpadmin commented Nov 8, 2017

What about registering an lsof count, separating the notify portion into the next rule, and only notifying when the count < 1?
Perhaps also register whether a change was even made to /tmp as an extra condition?

@erpadmin
Contributor Author

erpadmin commented Mar 8, 2018

What about adding the suggested ignore_errors: yes to the systemd restart tmp.mount handler, and then generating a task with a warning message when the lsof count on /tmp is > 1?
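A rough sketch of that idea (task names, the registered variable, and the warning text are all illustrative):

```yaml
# Count open handles under /tmp. The pipeline's exit status is wc's,
# so an empty lsof result still yields "0" rather than a failed task.
- name: check whether /tmp is in use
  shell: lsof /tmp | wc -l
  register: tmp_lsof_count
  changed_when: false

# Warn instead of failing when something is holding /tmp open.
# lsof prints a header line when it finds anything, hence > 1.
- name: warn when /tmp is busy
  debug:
      msg: "WARNING: /tmp is in use, so tmp.mount was not restarted. A reboot is required for the new mount options to take effect."
  when: tmp_lsof_count.stdout | int > 1
```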

@marcjay

marcjay commented Aug 21, 2018

What's the preferred solution for this? I ran into this issue, and it seems like an easy fix if there's agreement.

@sambanks
Contributor

We change remote_tmp; that's my preference.
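That change is an ansible.cfg setting on the controller; something like this (the exact path is an example, any writable location outside /tmp works):

```ini
; ansible.cfg - keep Ansible's working files off the managed host's /tmp
[defaults]
remote_tmp = /var/tmp/.ansible
```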

@sambanks
Contributor

We use the pwd of the provisioning user for Ansible, pip, Docker, etc., so it can all be cleaned up by removing the user at the end of provisioning.

@marcjay

marcjay commented Aug 21, 2018

Thanks @sambanks, I tried that ansible.cfg change, but unfortunately it didn't work for me. It doesn't seem to be temporary Ansible files that are keeping /tmp busy in my case (lsof /tmp returns nothing). I'm going with the ignore_errors: yes approach in a private fork.

@sambanks
Contributor

sambanks commented Aug 22, 2018 via email

@jmcshane

@sambanks that ansible.cfg change didn't work for us either. It seems like we have some files from firewalld and tuned in there. Was there something you did to move those tmp files to a different location?

@sambanks
Contributor

@jmcshane we had to track down the processes one by one. Does lsof | grep tmp show anything before you run ansible, or are those files created during the ansible run?

If it's only during the ansible run, then it's probably python writing to tmp. To work around this you need to set the TMPDIR environment variable before calling ansible.
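Concretely, that looks like the following (paths are examples; Python's tempfile module checks TMPDIR first, so anything spawned by the run that uses tempfile follows it):

```shell
# Export TMPDIR before invoking ansible so Python-side temp files land
# in /var/tmp instead of /tmp, then run ansible-playbook as usual.
export TMPDIR=/var/tmp

# Verify the setting is being honored:
python3 -c 'import tempfile; print(tempfile.gettempdir())'
# prints /var/tmp
```

Note this only covers files created during the run; anything already holding /tmp open before Ansible starts still has to be tracked down separately.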

@erpadmin
Contributor Author

Will you accept a PR to simply remove the line notify: systemd restart tmp.mount?

Above, people are discussing a use case where Ansible is contributing to the problem; however, my experience as depicted in the OP is that files are in use before Ansible even runs.

My current view is that a detached reboot is needed to ensure all changes are completely in effect, and the boot process needs to be tested before it becomes a surprise issue anyway. With that stated, maybe we could avoid the issue altogether? An alternative could be a variable switch for notify remounts, defaulting to true to maintain current functionality.
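The variable-switch alternative could look something like this (the variable name is purely illustrative, not part of the role today):

```yaml
# defaults/main.yml - opt-out switch, defaulting to current behavior
rhel7cis_tmp_mount_restart: true

# handlers/main.yml - the handler becomes a no-op when switched off,
# leaving the new tmp.mount unit in place for the next reboot.
- name: systemd restart tmp.mount
  systemd:
      daemon_reload: yes
      name: tmp.mount
      state: restarted
  when: rhel7cis_tmp_mount_restart | bool
```

Putting the condition on the handler rather than the task keeps the unit file deployment itself unconditional.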

austindimmer referenced this issue in austindimmer/Ubuntu1804-CIS Mar 12, 2019
Observing the following build failure when running in AWS:
AWS AMI Builder - CIS: fatal: [127.0.0.1]: FAILED! => {"changed": false, "msg": "Error loading unit file 'tmp.mount': org.freedesktop.DBus.Error.InconsistentMessage \"Bad message\""}

Potential solutions found on 
https://github.com/MindPointGroup/RHEL7-CIS/issues/46
@bbaassssiiee
Member

I notice this in the logs: mount[27619]: mount: /tmp not mounted or bad option.
/etc/fstab has default in the options; it should be removed.
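If that reading is right, the fstab entry has the invalid option default (the mount keyword is defaults), which the remount rejects as a bad option. An illustrative before/after, reusing the device from the OP (the filesystem type and exact option list here are assumptions):

```
# broken - "default" is not a valid mount option:
/dev/mapper/rootvg-tmplv  /tmp  ext4  default,nodev,nosuid,noexec   0 0

# corrected:
/dev/mapper/rootvg-tmplv  /tmp  ext4  defaults,nodev,nosuid,noexec  0 0
```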

@georgenalen
Contributor

Hello,
I wanted to reach out and let you know that this issue is being closed. We have re-worked the role and want to start with a fresh issues list for this latest version. There was a post in the Ansible-Lockdown Google group (https://groups.google.com/g/ansible-lockdown) with the details of the changes that are coming. Please check out the thread titled RHEL 7 CIS and STIG Changes for all of the details; I also have the message pasted below.
Please use the latest version and open issue tickets as you find them; it is the best way for us to improve the role for everyone. Thank you for being part of the community and providing awareness of problems or advice on improvements. Reporting is a huge part of improving this project.


Hello,
Thank you to everyone in the Ansible-Lockdown community who has contributed to RHEL7 STIG/CIS. Our team at MindPoint Group has been working with the entirety of the Ansible-Lockdown project, and we have some significant updates for both RHEL 7 STIG and CIS. With these updates, some larger changes have been made. I have these changes/updates outlined below.
Testing:

  1. CI/CD - We have implemented some automated testing pipelines to test pull requests into the devel and main branches. With the current workflow, the community will PR into the devel branch (never the main branch) for review by the administrators. When your PR is created, the first check will remain the DCO check. The second check is a functional testing pipeline that will automatically perform a functional test of the branch the PR is initiated from. Once both tests pass, someone from the Administrator Team will review the changes and merge them into the devel branch. From there, an additional review is completed before the devel branch is merged into the main branch. Only the Admin Team will perform PRs/merges into the main branch. There is also an automated pipeline for PRs from devel to main. Please do not edit the .github/workflows files since those are used as part of the pipeline.
  2. Compliance Checking – MindPoint Group has been working to create our own compliance audit scan tool. The tool uses a goss framework executable to run custom checks that we have created. The goal is to provide a more thorough check for control compliance and decrease the number of false positives/negatives. For example, it will check the configuration file related to the control as well as checking whether that configuration is active. With a smarter scan, we can hopefully identify attempts to trick scanners as well (for example, stacking a parameter in a config file where the first instance is enabled and the second disabled; most audit tools search for the first instance, but the application might look for the last instance of the parameter, thus making the scanning tool think it's enabled). In testing, we have found that our audit scan runs significantly faster than other audit tools, reducing audit times. Our audit tool and profiles will have their own repositories in the Ansible-Lockdown org, but within the remediation role there will be an integrated way to incorporate the audit. Keep an eye out for the audit tools as they are released. We plan on developing a goss audit profile for each current remediation role. Going forward, we plan to release a remediation role and goss audit tool profile simultaneously.
    Role Updates:
  3. RHEL 7 STIG/CIS – We have re-written much of the RHEL 7 STIG and CIS roles to increase clarity and readability and address some functionality items. We performed these updates while creating our goss testing framework for each of these roles. We plan on pushing our update to the devel and main branches. We will move the current devel and master branches to a devel_stable_ and master_stable_ branch in the respective repositories. Accordingly, community members who rely on the current version can still use that version going forward; this process will not remove what is currently there. The latest versions of the roles have also been updated to comply with the latest benchmarks.
  4. Role Architecture – All roles will change with regard to the structure in the tasks folder. Taking CIS as an example, there will be a folder per section and yaml files for each sub-section. For example control 1.2.1 in CIS will be located in RHEL7-CIS/tasks/section_1/cis_1.2.x.yml. The cis_1.2.x.yml file will contain all controls related to section 1.2.x. This will hopefully make updates to roles a bit easier with less risk. This matches the architecture of our audit tool, creating consistency across remediation and audit platforms. The end goal is to repeat this architecture (the best we can) on STIG roles, but we are starting with CIS.
  5. Existing PRs and Issues – With all of these changes comes the task of cleaning up existing PR’s and issues. Our plan is to close all of the existing PR’s and issues because of the re-work. Our team is growing and should be able to stay on top of the new issues and PRs as they come in.
    Again, I would like to thank the community for your involvement in this project. The input and work from the community has contributed significantly to the success of this project. Please keep an eye out for these changes, which will be rolling out in the coming weeks.
