Rework scan-archive multiprocessing scan
This is a major rework of the code in s01scan_archive.py to make it
more readable, easier to maintain, and more efficient when scanning
files and directories.  (It was initiated to correct some
multiprocessing-related bugs and odd behaviour on CentOS 6.5.)

It uses multiprocessing.Pool objects to distribute the scan of the
archive directories among multiple child processes, if the -t/--threads
option was used to indicate the number of concurrent processes to use.

The logging system was also slightly changed in s01scan_archive.py and
scripts/msnoise.py.  A logger instance is now set up through the
api.get_logger() function (and a new instance is created in each child
process so that the process id appears in the log record format).
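
The dispatch described in the commit message can be sketched as follows. This is a minimal illustration, not the code from s01scan_archive.py: the names `scan_directory`, `scan_archive` and `nproc` are hypothetical, and the per-directory work (counting files) stands in for the real scan. It assumes that a pool is only created when more than one process is requested and that the scan otherwise runs in the calling process.

```python
import multiprocessing
import os


def scan_directory(path):
    """Hypothetical worker: count the regular files directly inside path."""
    count = sum(1 for name in os.listdir(path)
                if os.path.isfile(os.path.join(path, name)))
    return path, count


def scan_archive(directories, nproc=1):
    """Scan every directory, using child processes only when nproc > 1."""
    if nproc > 1:
        # One task per directory, distributed among nproc child processes.
        pool = multiprocessing.Pool(processes=nproc)
        try:
            results = pool.map(scan_directory, directories)
        finally:
            pool.close()
            pool.join()
    else:
        # No concurrency requested: scan in the current process.
        results = [scan_directory(d) for d in directories]
    return dict(results)
```

The worker is a module-level function so that it can be pickled and sent to the child processes. On platforms where child processes are spawned rather than forked, the call to `scan_archive` with `nproc > 1` should sit under an `if __name__ == "__main__":` guard.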
Xavier Béguin committed Dec 18, 2018
1 parent e150eb7 commit a133a7b
Showing 5 changed files with 547 additions and 238 deletions.
4 changes: 3 additions & 1 deletion msnoise/__init__.py
@@ -11,6 +11,8 @@
class MSNoiseError(Exception):
    pass


class DBConfigNotFoundError(MSNoiseError):
    pass

class FatalError(MSNoiseError):
    pass
24 changes: 24 additions & 0 deletions msnoise/api.py
@@ -11,6 +11,7 @@
import pickle as cPickle
import math
import pkg_resources
import sys

from sqlalchemy import create_engine, func
from sqlalchemy.orm import sessionmaker
@@ -33,6 +34,29 @@
from .msnoise_table_def import Filter, Job, Station, Config, DataAvailability


def get_logger(name, loglevel=None, with_pid=False):
    """
    Returns the current configured logger or configure a new one.
    """
    if with_pid:
        log_fmt='%(asctime)s msnoise [pid %(process)d] '\
                '[%(levelname)s] %(message)s'
    else:
        log_fmt='%(asctime)s msnoise [%(levelname)s] %(message)s'
    logger = logging.getLogger(name)
    # Remove any inherited StreamHandler to avoid duplicate lines
    for h in logger.handlers:
        if isinstance(h, logging.StreamHandler):
            logger.removeHandler(h)
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
            logging.Formatter(fmt=log_fmt, datefmt='%Y-%m-%d %H:%M:%S'))
    logger.addHandler(handler)
    logger.setLevel(loglevel)
    logger.propagate = False
    return logger
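
A standalone variant of this function may be useful for experimenting with the per-process log format. The sketch below is not part of the commit: `make_logger` is a hypothetical name, and it differs from the hunk above in two assumed ways. It iterates over a copy of `logger.handlers`, because removing items from a list while iterating over it can skip the next item, and it only calls `setLevel()` when a level is given, because `setLevel(None)` raises a `TypeError`.

```python
import logging
import sys


def make_logger(name, loglevel=None, with_pid=False):
    """Return the logger called name with a single stderr handler attached."""
    if with_pid:
        # Include the process id so records from child processes can be told apart.
        log_fmt = ('%(asctime)s msnoise [pid %(process)d] '
                   '[%(levelname)s] %(message)s')
    else:
        log_fmt = '%(asctime)s msnoise [%(levelname)s] %(message)s'
    logger = logging.getLogger(name)
    # Iterate over a copy: removeHandler() mutates logger.handlers.
    for h in list(logger.handlers):
        if isinstance(h, logging.StreamHandler):
            logger.removeHandler(h)
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter(fmt=log_fmt, datefmt='%Y-%m-%d %H:%M:%S'))
    logger.addHandler(handler)
    if loglevel is not None:
        # Leave the level untouched when none is given.
        logger.setLevel(loglevel)
    logger.propagate = False
    return logger
```

Calling `make_logger('msnoise', logging.INFO, with_pid=True)` at the start of a worker function would give each child process its own handler with the pid in every record. Note that `logging.FileHandler` is a subclass of `logging.StreamHandler`, so file handlers attached to the same logger are removed as well.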


def get_tech():
    """Returns the current DB technology used (reads from the db.ini file)