Improve SKLL logging #380
Conversation
- Default logging level is now INFO instead of WARNING.
- Verbose now means that DEBUG messages are included in the logging output.
- Pass through the log level to `run_configuration()`.
- Add a new function that creates a logger that includes a file handler as well as console logging.
- Create a new experiment-level log file named `<experiment_name>.log` that will contain all of the configuration and experiment-level logging messages.
- The job-level log file that is already created now contains all of the info messages and warnings that were only printed out before.
- Use an exception instead since we don't want to pass loggers at this level.
@dan-blanchard I tagged you on this review because you are probably more of an expert at logging than I am, so I would really like your input.
doc/run_experiment.rst
Outdated
in the configuration file), SKLL also produces a single, top level "experiment"
log file with only ``EXPERIMENT`` as the prefix. While the job-level log files
contain messages that pertain to the specific characteristics of the job, the
experiment-level log file will contains logging messages that pertain to the
contains --> contain
doc/run_experiment.rst
Outdated
TIMESTAMP - LEVEL - MSG

where ``TIMESTAMP`` refers to the exact time when the messages was logged,
messages -> message
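The log-line layout being discussed maps directly onto a standard `logging.Formatter` string. A small sketch of that layout (the exact format string SKLL uses may differ):

```python
import logging
import sys

# Illustrative sketch of the "TIMESTAMP - LEVEL - MSG" layout described
# in the docs above; the exact format string SKLL uses may differ.
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(formatter)
demo_logger = logging.getLogger('format_demo')
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)
demo_logger.info('loading featureset')
```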
skll/data/readers.py
Outdated
```
@@ -695,19 +696,24 @@ def safe_float(text, replace_dict=None):
                   floats. Anything not in the mapping will be kept the
                   same.
    :type replace_dict: dict from str to str
    :param text: The Logger instance to use to log messages. Used instead of
```
The documentation here and below should have `logger` instead of `text` as the parameter name.
skll/utilities/run_experiment.py
Outdated
```
created and run on a DRMAA-compatible cluster.",
                        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
                        conflict_handler='resolve')
parser = argparse.ArgumentParser(description='Runs the scikit-learn '
```
Nitpick: To save on space, it might be slightly better to just import `ArgumentParser`/`ArgumentDefaultsHelpFormatter` directly.
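The suggestion sketched out (the description string is abbreviated here, and the `-v` flag mirrors the verbose behavior this PR describes):

```python
from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser

# Importing the two names directly keeps the setup lines shorter than
# spelling out argparse.ArgumentParser / argparse.ArgumentDefaultsHelpFormatter.
parser = ArgumentParser(description='Runs the scikit-learn experiment.',
                        formatter_class=ArgumentDefaultsHelpFormatter,
                        conflict_handler='resolve')
parser.add_argument('-v', '--verbose', action='store_true',
                    help='Include DEBUG messages in the logging output.')
```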
This looks good to me. I ran `run_experiment` with and without gridmap using a config that had multiple learners and multiple objectives, and the logging looks good to me. It's convenient to have the gridmap job output in the logs directory, even if it is a bit noisy.
```
@@ -143,7 +144,7 @@ def _write_summary_file(result_json_paths, output_file, ablation=0):
    learner_result_dicts = []
    # Map from feature set names to all features in them
    all_features = defaultdict(set)
-   logger = logging.getLogger(__name__)
+   logger = get_skll_logger('experiment')
```
Will this (hard-coding the name of the logger to be `experiment`) cause any conflicts if multiple experiments are being called on the same machine at the same time? Or would those always be in separate python processes?
From the Logging Cookbook:

> Multiple calls to logging.getLogger('someLogger') return a reference to the same logger object. This is true not only within the same module, but also across modules as long as it is in the same Python interpreter process.

So, I don't think there should be any conflicts as long as you are using multiple `run_experiment` calls.
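The Cookbook behavior quoted above is easy to verify in a single process:

```python
import logging

# Two lookups by the same name yield the exact same Logger object, which
# is why a hard-coded 'experiment' name is safe within one process.
first = logging.getLogger('someLogger')
second = logging.getLogger('someLogger')
print(first is second)  # True
```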
👍
The goal of this PR is to improve how logging works in SKLL. The current situation is that many, if not all, of the warning messages are logged to the console but never stored in any of the log files. This is not great since a lot of people may not pay attention to the console messages.
In an ideal situation, the solution would be quite simple. The way that `logging` works is that `logging.getLogger(<name>)` always returns the same logger instance no matter where that call is made, as long as the call is within the same Python interpreter process. However, the problem is that SKLL jobs can also run on other machines in completely different Python processes.

The solution in this PR works as follows:

1. I create a function called `get_skll_logger()` that returns a `Logger` instance that can log messages both to `sys.stderr` as well as to a specified file. Repeated invocations of `get_skll_logger()` with the same name (and the same file path) will return the same `Logger` instance.
2. `get_skll_logger()` is first called from `config.py` to create an experiment-level logger where messages that are relevant to configuration parsing are logged. It is called again from `experiments.py` at the top level to log messages (to the same `Logger` instance) about the whole experiment, not the individual jobs. This logger writes to the console as well as to a file named `<experiment_name>.log` that is automatically created in the `log` directory.
3. `get_skll_logger()` is called a third time from inside `_classify_featureset()`. This is a second `Logger` instance at the individual job level (where job = featureset + learner + objective). All of the messages from `learner.py`, `readers.py`, and `writers.py` are logged by this logger. This logger writes to the console as well as to the files named `<job_name>.log` that SKLL already creates in the `log` directory.

So, at the end of the experiment, there is (a) `<experiment_name>.log` containing messages pertaining to configuration parsing as well as the top-level experiment, and (b) multiple `<job_name>.log` files containing messages specific to each job.

Notes:
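A minimal sketch of what such a factory could look like (the name `get_skll_logger()` and its contract come from this PR's description; the body below is my own illustrative assumption, not SKLL's actual implementation):

```python
import logging
import sys

def get_skll_logger(name, filepath=None, log_level=logging.INFO):
    """Return a logger that writes to stderr and, optionally, to a file.

    Hypothetical sketch: repeated calls with the same name return the
    same Logger instance, so handlers are only attached once.
    """
    logger = logging.getLogger(name)
    logger.setLevel(log_level)
    if not logger.handlers:  # attach handlers only on the first call
        fmt = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
        console = logging.StreamHandler(sys.stderr)
        console.setFormatter(fmt)
        logger.addHandler(console)
        if filepath:
            file_handler = logging.FileHandler(filepath)
            file_handler.setFormatter(fmt)
            logger.addHandler(file_handler)
    return logger
```

Because `logging.getLogger(name)` caches loggers by name, checking `logger.handlers` before attaching anything prevents duplicate console and file handlers on repeated calls.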
- To get `Learner`, `Reader`, and `Writer` to log to the same job log file, I had to modify them to accept loggers as keyword arguments.
- Although the console output shows the logger names (`"experiment"` for the main logger and `<job_name>` for the job logger(s)), the log files themselves just show the timestamps, the logging level, and the actual message. This is because the names of the files themselves tell you what is what.
- I also had to pass in logger arguments to some top-level functions for things to work properly, and I made one of the static methods a regular method instead.
- I changed the default logging level for SKLL to `INFO` rather than `WARNING`, since the INFO messages are actually quite useful. The `-v` flag for `run_experiment` changes the logging level to include DEBUG messages as well.
- I ran several tests both on and off the grid, and the log files are created properly and with the right messages. However, I encourage you to run some tests (e.g., the SKLL examples) yourself.
- I added a couple of tests that look at the new log files.
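To illustrate the logger-as-keyword-argument pattern mentioned in the notes, here is a hypothetical reader class (not SKLL's actual `Reader`) that falls back to a module-level logger when no job logger is passed in:

```python
import logging

class ExampleReader:
    """Hypothetical reader showing the logger-as-keyword-argument pattern."""

    def __init__(self, path, logger=None):
        self.path = path
        # Fall back to a module-level logger when no job logger is supplied,
        # so the class still works outside of a SKLL-style job.
        self.logger = logger if logger is not None else logging.getLogger(__name__)

    def read(self):
        self.logger.info('Reading %s', self.path)
        return []
```

Passing the job-level logger this way means every message from the reader lands in that job's `<job_name>.log` file rather than in some unrelated module-level log.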