
Improve SKLL logging #380

Merged
merged 20 commits from improve-logging into master on Nov 8, 2017
Conversation

@desilinguist (Member) commented Oct 31, 2017

The goal of this PR is to improve how logging works in SKLL. Currently, many, if not all, of the warning messages are logged to the console but never stored in any of the log files. This is not ideal, since many users may not pay attention to the console messages.

In an ideal situation, the solution would be quite simple. The way that logging works is that logging.getLogger(<name>) always returns the same logger instance no matter where that call is made as long as the call is within the same Python interpreter process. However, the problem is that SKLL jobs can also run on other machines in completely different Python processes.

The solution in this PR works as follows:

  1. I create a function called get_skll_logger() that returns a Logger instance that can log messages both to sys.stderr and to a specified file. Repeated invocations of get_skll_logger() with the same name (and the same file path) return the same Logger instance. (A rough sketch of such a helper appears after this list.)

  2. get_skll_logger() is first called from config.py to create an experiment-level logger where messages that are relevant to configuration parsing are logged. It is called again from experiments.py at the top level to log messages (to the same Logger instance) about the whole experiment (not the individual jobs). This logger writes to the console as well as to a file named <experiment_name>.log that is automatically created in the log directory.

  3. get_skll_logger() is called a third time from inside _classify_featureset(). This is a second Logger instance that is at the individual job level (where job = featureset + learner + objective). All of the messages from learner.py, readers.py, and writers.py are logged by this logger. This logger writes to the console as well as to the files named <job_name>.log that SKLL already creates in the log directory.

  4. So, at the end of the experiment, there is (a) <experiment_name>.log containing messages pertaining to configuration parsing as well as the top-level experiment, and (b) multiple <job_name>.log files containing messages specific to each job.
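This is not the actual implementation in this PR, but a minimal sketch of what such a helper could look like (the `filepath` and `log_level` parameter names are assumptions):

```python
import logging
import os
import sys


def get_skll_logger(name, filepath=None, log_level=logging.INFO):
    """Return a Logger that writes to sys.stderr and, optionally, to a file.

    Because ``logging.getLogger(name)`` always returns the same instance for
    a given name, we only attach handlers that are not already present.
    """
    logger = logging.getLogger(name)
    logger.setLevel(log_level)
    formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')

    # add a console (stderr) handler only once; note that FileHandler
    # subclasses StreamHandler, so it has to be excluded explicitly
    has_console = any(isinstance(h, logging.StreamHandler) and
                      not isinstance(h, logging.FileHandler)
                      for h in logger.handlers)
    if not has_console:
        console_handler = logging.StreamHandler(sys.stderr)
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)

    # add a file handler for this particular path only once
    if filepath is not None:
        abs_path = os.path.abspath(filepath)
        has_file = any(isinstance(h, logging.FileHandler) and
                       h.baseFilename == abs_path
                       for h in logger.handlers)
        if not has_file:
            file_handler = logging.FileHandler(filepath)
            file_handler.setFormatter(formatter)
            logger.addHandler(file_handler)

    return logger
```

Under this sketch, calling the helper from config.py and then again from experiments.py with the same name and file path would hand back the same logger without duplicating handlers.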

Notes:

  • To get Learner, Reader, and Writer to log to the same job log file, I had to modify them to accept loggers as keyword arguments (a toy sketch follows these notes).

  • Although the console output shows the logger names ("experiment" for the main logger and <job_name> for the job logger(s)), the log files themselves only show the timestamp, the logging level, and the actual message, since the file names already tell you which log is which.

  • I also had to pass logger arguments into some top-level functions for things to work properly, and I converted one of the static methods into a regular method.

  • I changed the default logging level for SKLL to INFO rather than WARNING since the INFO messages are actually quite useful. The -v flag for run_experiment changes the logging level to include DEBUG messages as well.

  • I ran several tests both on and off the grid and the log files are created properly and with the right messages. However, I encourage you to run some tests (e.g., the SKLL examples) yourself.

  • I added a couple of tests that look at the new log files.
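To illustrate the keyword-argument approach from the first note above, here is a toy example rather than the real Reader API:

```python
import logging


class Reader(object):
    """Toy illustration of accepting an optional ``logger`` keyword argument."""

    def __init__(self, path, logger=None):
        self.path = path
        # fall back to a module-level logger if the caller does not pass one in
        self.logger = logger if logger is not None else logging.getLogger(__name__)

    def read(self):
        self.logger.info("Loading examples from %s", self.path)
        # ... actual parsing code would go here ...


# inside _classify_featureset(), the job-level logger would then be handed in,
# e.g. (hypothetical call):
# reader = Reader(train_path, logger=job_logger)
```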

- Default logging level is now INFO instead of WARNING.
- Verbose now means that DEBUG messages are included in the logging output.
- Pass through the log level to `run_configuration()`.
- Add a new function that creates a logger that includes a file handler as well as console logging.
- Create a new experiment-level log file named `<experiment_name>.log` that will contain all of the configuration and experiment level logging messages.
- The job-level log file that is already created now contains all of the info messages and warnings that were previously only printed to the console.
- Use an exception instead since we don't want to pass loggers at this level.
@desilinguist desilinguist added this to the 1.5 milestone Oct 31, 2017
@desilinguist desilinguist self-assigned this Oct 31, 2017
@desilinguist (Member, Author):

@dan-blanchard I tagged you on this review because you are probably more of an expert at logging than I am, so I would really like your input.

@coveralls commented Oct 31, 2017

Coverage Status

Coverage decreased (-0.003%) to 92.009% when pulling 4f74ad6 on improve-logging into 28a80b1 on master.

in the configuration file), SKLL also produces a single, top level "experiment"
log file with only ``EXPERIMENT`` as the prefix. While the job-level log files
contain messages that pertain to the specific characteristics of the job, the
experiment-level log file will contains logging messages that pertain to the
Contributor:

contains --> contain


TIMESTAMP - LEVEL - MSG

where ``TIMESTAMP`` refers to the exact time when the messages was logged,
Contributor:

messages -> message

@@ -695,19 +696,24 @@ def safe_float(text, replace_dict=None):
floats. Anything not in the mapping will be kept the
same.
:type replace_dict: dict from str to str
:param text: The Logger instance to use to log messages. Used instead of
Contributor:

The documentation here and below should have logger instead of text as the parameter name.

@coveralls commented Oct 31, 2017

Coverage Status

Coverage increased (+0.05%) to 92.064% when pulling a00b2a1 on improve-logging into 28a80b1 on master.

created and run on a DRMAA-compatible cluster.",
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
conflict_handler='resolve')
parser = argparse.ArgumentParser(description='Runs the scikit-learn '
Contributor:

Nitpick: To save on space, it might be slightly better to just import ArgumentParser/ArgumentDefaultsHelpFormatter directly.
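What the nitpick suggests would look roughly like this (the description string is abbreviated here, not the full text from the diff):

```python
from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser

parser = ArgumentParser(description='Runs the scikit-learn ...',
                        formatter_class=ArgumentDefaultsHelpFormatter,
                        conflict_handler='resolve')
```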

@aoifecahill (Collaborator) left a comment:

This looks good to me. I ran run_experiment with and without gridmap using a config that had multiple learners and multiple objectives and the logging looks good to me. It's convenient to have the gridmap job output in the logs directory, even if it is a bit noisy.

@@ -143,7 +144,7 @@ def _write_summary_file(result_json_paths, output_file, ablation=0):
learner_result_dicts = []
# Map from feature set names to all features in them
all_features = defaultdict(set)
logger = logging.getLogger(__name__)
logger = get_skll_logger('experiment')
Collaborator:

will this (hard coding the name of the logger to be experiment) cause any conflicts if multiple experiments are being called on the same machine at the same time? Or would those always be in separate python processes?

@desilinguist (Member, Author):

From the Logging Cookbook:

Multiple calls to logging.getLogger('someLogger') return a reference to the same logger object. This is true not only within the same module, but also across modules as long as it is in the same Python interpreter process.

So, I don't think there should be any conflicts as long as you are using multiple run_experiment calls.
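A quick way to see the Cookbook point in action (a trivial demo, not SKLL code):

```python
import logging

a = logging.getLogger('someLogger')
b = logging.getLogger('someLogger')
print(a is b)  # True: the same Logger object within one interpreter process
```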

Collaborator:

👍
