Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process: add possiblity to hide specific output to call function #92

Merged
merged 9 commits into from
Nov 27, 2023

Conversation

jasugun
Copy link
Contributor

@jasugun jasugun commented Nov 9, 2023

Handy when we need to hide stuff in CI for example.

nimp/sys/process.py Outdated Show resolved Hide resolved
nimp/sys/process.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ftith ftith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR works but could be interesting to use SensitiveDataFilter(logging.Filter) instead

@jasugun
Copy link
Contributor Author

jasugun commented Nov 10, 2023

The PR works but could be interesting to use SensitiveDataFilter(logging.Filter) instead

The thing is that the captured output (std threads) does not go through our logger. It's a huge change that I'd rather avoid for now. What do you think?

@tdesveaux
Copy link
Member

The PR works but could be interesting to use SensitiveDataFilter(logging.Filter) instead

The thing is that the captured output (std threads) does not go through our logger. It's a huge change that I'd rather avoid for now. What do you think?

Does it not? https://github.com/dontnod/nimp/pull/92/files#diff-5a44de719039a4548dedb99a3a5d387b5447e1114b1e2bc3b7e87550ee9bcdfdR143

@ftith
Copy link
Contributor

ftith commented Nov 10, 2023

I confirm that we are already using a logger for the actual command (but not for the dry-run logging) so it's possible if use the same logger for the dry run as well

Here is an example of Filter class:

class SensitiveDataFilter(logging.Filter):
    def __init__(self, sensitive_data):
        self._pattern = re.compile(f'({"|".join(re.escape(el) for el in sensitive_data)})')
        self._hidden_str = '*****'
    def filter(self, record):
        record.args = tuple(self._pattern.sub(self._hidden_str, child_arg) for child_arg in record.args)
        return True

You can then apply it to your logger:

    logger = logging.getLogger('child_processes')
    logger.addFilter(SensitiveDataFilter(hide_output_specific))

Be careful of nested args though, and convert list to str (otherwise the substitution will be broken):
logger.info('%s "%s" in "%s"', '[DRY-RUN]' if dry_run else 'Running', str(command), os.path.abspath(cwd))

@jasugun
Copy link
Contributor Author

jasugun commented Nov 10, 2023

You're right this is going through the logger. And looking at the python logging code it's not too hard to subclass.
Françoise I watched your comment just now, from what I see I might have done something like you.

@@ -109,7 +116,9 @@ def _output_worker(index, decoding_format):
return
force_ascii = locale.getpreferredencoding().lower() != 'utf-8'
while process is not None:
logger = logging.getLogger('child_processes')
logger_child_processes = logging.getLogger('child_processes')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason why you need 2 different loggers? they are in the same fct and you apply the same filter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried using one single logger for the whole function instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if is relevant to have all the logs in the same logger as the call_logger actually. I think it wouls be insteresting to keep the distinction between the "child_process" and the call_logger. For instance, the difference between the keep alive and the call to unreal:

[2023.11.13-11.53.58:518][  0]LogGatherTextFromAssetsCommandlet: Display: [  9.62%] Loading package: '/Game/Dialogue/P01/R02/Settler/M07/STG/DLG_R02_Settler_M07_A_MagalieAlive_STG_SCRIPT2'...
2023-11-13 11:53:58,635 [INFO] Keepalive for ..\..\.\UE\Engine\Binaries\Win64\UnrealEditor.exe

def call(command, cwd='.', heartbeat=0, stdin=None, encoding='utf-8',
capture_output=False, capture_debug=False, hide_output=False,
capture_output=False, capture_debug=False, hide_output=False, hide_output_specific=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too sure about the hide_output_specific naming.
It's not really obvious what it does.

Also, it's not really something that's specific to the process.call function, maybe it should be some helper that can be used in a more global way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a small test and it's a bit annoying though:

import logging
import re


logging.basicConfig(level=logging.INFO)

class SensitiveDataFilter(logging.Filter):
    def __init__(self, *args):
        super().__init__()
        self.pattern = re.compile(rf"({'|'.join(args)})")

    def filter(self, record):
        record.msg = self.pattern.sub("*****", record.msg)
        record.args = tuple(self.pattern.sub("****", a) for a in record.args)
        return super().filter(record)


filter_ = SensitiveDataFilter('toto', 'tata')

logger = logging.getLogger()
logger.addFilter(filter_)

logger.info('logger: toto %s, %s, %s', 'tata', 'tatato', 'tata:toto')
logging.info('logging: toto %s, %s, %s', 'tata', 'tatato', 'tata:toto')

test_logger = logging.getLogger('test')
test_logger.info('test_logger: toto %s, %s, %s', 'tata', 'tatato', 'tata:toto')
INFO:root:logger: ***** ****, ****to, ****:****
INFO:root:logging: ***** ****, ****to, ****:****   
INFO:test:test_logger: toto tata, tatato, tata:toto

new logger do not inherit filter from root logger

Copy link
Contributor Author

@jasugun jasugun Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdesveaux @ftith I added convenience context manager and filters to more flexibly filter nimp logs; could you please take a look at it?

Example use:

from nimp.sys.logging import FilteredLogging
from nimp.sys.logging import SensitiveDataFilter

<...>

with FilteredLogging(SensitiveDataFilter('string_to_hide_1', 'string_to_hide_2')) as filter_logger:
            if nimp.sys.process.call(command, dry_run=env.dry_run) != 0:  # call ouput will hide sensitive info
                    filter_logger.info('This will hide string_to_hide_1')  # this will also hide stuff

nimp/sys/process.py Outdated Show resolved Hide resolved
nimp/sys/process.py Outdated Show resolved Hide resolved
@jasugun jasugun force-pushed the lca/process/call_hide_specific branch from 13ed4ed to 9f0ac0e Compare November 22, 2023 16:28
Also add a conveninec filter for hiding stuff.

Typical use:
with FilteredLogging(SensitiveDataFilter('string_to_hide_1', 'string_to_hide_2')) as filter_logger:
	if nimp.sys.process.call(bpt_command, dry_run=env.dry_run) != 0:
	        # call ouput will hide sensitive info
		filter_logger.info('This will hide string_to_hide_1')
@jasugun
Copy link
Contributor Author

jasugun commented Nov 27, 2023

@tdesveaux @tdesveaux
Could you please take a look a this whole new way to filter our logging please?

""" Custom filter to hide specific information from logging stream """
def __init__(self, *args):
super().__init__()
self.pattern = re.compile(rf"({'|'.join(args)})")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should escape args just in case re.escape(a) for a in args

if isinstance(arg, list):
record_args.append([self.pattern.sub(hide_string, str(a)) for a in arg])
elif isinstance(arg, str):
record_args.append(self.pattern.sub(hide_string, str(arg)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the str(arg) since you already check that it's an instance of str?

@ftith
Copy link
Contributor

ftith commented Nov 27, 2023

I'm starting to losing tracks and missing the way to test correctly the PR. Maybe we can talk about it together tomorrow.
A reminder just in case:

Note It is strongly advised that you do not log to the root logger in your library. Instead, use a logger with a unique and easily identifiable name, such as the name for your library’s top-level package or module. Logging to the root logger will make it difficult or impossible for the application developer to configure the logging verbosity or handlers of your library as they wish.

Source: https://docs.python.org/3/howto/logging.html#configuring-logging-for-a-library

and also that summary handler is using logging.getLogger('child_processes')

@jasugun jasugun merged commit 935509a into dev Nov 27, 2023
2 checks passed
@jasugun
Copy link
Contributor Author

jasugun commented Nov 28, 2023

@ftith good catch I'll look into that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants