Simplify logging via custom context manager text writer
- python standard library's logging package doesn't work for this use case
  - use case:
    - log output from multiple ListCreator instances running on different threads to DIFFERENT files
      - tried running 2 instances of ListCreator on 2 different threads
      - program scraping logic executes properly
      - HOWEVER, content of log files is scrambled
        - instead of thread 1 writing only to file 1 and thread 2 writing only to file 2:
          - thread 1 writes to both file 1 AND file 2
          - thread 2 writes to both file 2 AND file 1
          - the pattern of a specific thread writing to different files was not readily obvious
            - it DID seem like the first thread wrote only to its specified file UNTIL the second thread was created, after which the threads mixed their output
      - the logging package is documented as thread safe and usable from multiple threads without additional configuration
        - however, no variation of attribute changes, object creation, or logging.basicConfig() modification seemed to avoid the problem of writing to multiple files instead of the ONE designated file for that specific thread
        - the problem might be that the logging package makes logging state GLOBAL so that all modules in a python package can log to the same handlers (?)
          - logging state may be global to a single python process (?)
          - this would explain why both threads wrote to both open files instead of only the file specified for their own thread
          - ESSENTIALLY
            - the logging from thread 1 initially sets the output to be file 1
            - logging from thread 1 writes to only file 1
            - thread 2 is created and the logging package sets thread 2 to write to file 2
              - HOWEVER, since logging is global, thread 2 is essentially ADDING file 2 to the "list of logging output locations," therefore
                - thread 2 writes to file 2 AND file 1
                - thread 1 writes to file 1 AND file 2
                  - BECAUSE: logging package NOW takes anything that should be logged and logs it to EVERY LOCATION that is added to the "list of logging output locations" using the .addHandler() logging method
        - the logging package's thread-safety claim makes sense for a multi-threaded application that needs to log ALL info into the SAME file(s) EVEN with the application running on multiple threads
        - however, this use case is different: the goal is NOT to write the info to the same file(s) across different threads, but to write only to the SPECIFIC file designated by each specific thread
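        - the handler accumulation described above can be reproduced even without threads, since .addHandler() attaches handlers to the shared root logger (minimal sketch, not code from this repository):

```python
import logging
import os
import tempfile

# two FileHandlers added to the SAME global root logger: every record
# is emitted to EVERY attached handler, not just the most recent one
log = logging.getLogger()
log.setLevel(logging.INFO)

fd_1, file_1 = tempfile.mkstemp(suffix='.log')
os.close(fd_1)
fd_2, file_2 = tempfile.mkstemp(suffix='.log')
os.close(fd_2)

log.addHandler(logging.FileHandler(file_1))  # "thread 1" designates file 1
log.info('message 1')                        # written to file 1 only (so far)

log.addHandler(logging.FileHandler(file_2))  # "thread 2" ADDS file 2 to the handler list
log.info('message 2')                        # written to file 1 AND file 2

for handler in list(log.handlers):           # close and detach handlers so later runs start fresh
    handler.close()
    log.removeHandler(handler)

with open(file_1) as f_1, open(file_2) as f_2:
    contents_1, contents_2 = f_1.read(), f_2.read()

print('message 2' in contents_1)  # True: file 1 also received the second message
print('message 1' in contents_2)  # False: file 2 only existed for the second message
```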
- workaround: provide the output location to any function that logged any information
  - used @contextlib.contextmanager (to avoid manually defining the __enter__() and __exit__() methods) so a FUNCTION can be used like a file-like resource with the "with" keyword
    - trying to run

```
with open(filename, 'a', encoding='utf-8') if log_file else sys.stdout as logging_output_location:
    logging_output_location.writelines(content)
```

    - did not seem like best practice, since the with statement closes sys.stdout on exit (when log_file is falsy), so trying to run these lines more than once resulted in:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file.
```

    - this might be okay in typical use cases if the user exits the python interpreter after running the ListCreator().create_list_for() method ONCE
    - but might cause problems if the user tries to run the method again OR tries to print() anything in the same python process (without restarting the interpreter), since sys.stdout is now closed
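    - this failure mode is easy to reproduce with any text stream standing in for sys.stdout (sketch, not code from this repository):

```python
import io

stream = io.StringIO()  # stand-in for sys.stdout

with stream as logging_output_location:           # the with statement CLOSES the stream on exit
    logging_output_location.write('first run\n')  # works the first time

try:
    stream.write('second run\n')  # any later write (or print(), if the stream were sys.stdout)
except ValueError as error:       # fails because the stream is now closed
    print(type(error).__name__)   # ValueError
```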
  - also tried passing the already-open sys.stdout stream to open(), which expects a path rather than a file object:

```
with open(sys.stdout) as f:
     f.write('some text')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected str, bytes or os.PathLike object, not _io.TextIOWrapper
```

  - tried to "yield" a file object from a function without the @contextlib.contextmanager decorator:

```
def stdout_writer():
    yield sys.stdout

def file_writer(filename):
    with open(filename, 'a', encoding='utf-8') as file:
        yield file

with file_writer(filename) if some_truthy_value else stdout_writer() as out:
    out.write('haha')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: __enter__
```

  - decorating both functions with @contextlib.contextmanager addressed AttributeError: __enter__

```
@contextlib.contextmanager
def stdout_writer():
    yield sys.stdout

@contextlib.contextmanager
def file_writer(filename):
    with open(filename, 'a', encoding='utf-8') as file:
        yield file

with file_writer(filename) if True else stdout_writer() as out:
    out.write('haha')

with file_writer(filename) if False else stdout_writer() as out:
    out.write('haha')
haha4
```

  - as seen above, the haha4 output is the string 'haha' written to sys.stdout (with no trailing newline) immediately followed by the interactive interpreter echoing the return value of .write(content), which is always the NUMBER of characters written
    - the .writelines(content) method returns None, so no count is displayed to the user
    - HOWEVER, the .writelines(content) method also does NOT add a newline after each line it writes, so '\n' needed to be manually added to all logging statements
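    - the difference between the two methods can be checked on any text stream (sketch, not code from this repository):

```python
import io

stream = io.StringIO()  # any text stream behaves the same way

characters_written = stream.write('haha')  # .write() returns the number of characters written
print(characters_written)                  # 4

result = stream.writelines(['line a', 'line b'])  # .writelines() returns None and adds
print(result)                                     # None    ...NO newlines between items

print(stream.getvalue())  # hahaline aline b
```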

- resources:
  - logging package
    - https://realpython.com/python-logging/#other-configuration-methods
    - https://stackoverflow.com/questions/11232230/logging-to-two-files-with-different-settings/11233293
    - https://stackoverflow.com/questions/15199816/python-logging-multiple-files-using-the-same-logger
    - https://stackoverflow.com/questions/24816456/python-logging-wont-shutdown
    - https://docs.python.org/3/library/logging.html
    - https://docs.python.org/3/howto/logging-cookbook.html#:~:text=Although%20logging%20is%20thread-safe,across%20multiple%20processes%20in%20Python.
    - https://docs.python.org/3/howto/logging-cookbook.html#multiple-handlers-and-formatters
    - https://docs.python.org/3/library/logging.html#logging.getLogger
  - sys.stdout, print(), and formatting strings
    - https://stackoverflow.com/questions/3263672/the-difference-between-sys-stdout-write-and-print
    - https://stackoverflow.com/questions/493386/how-to-print-without-newline-or-space?rq=1
    - https://stackoverflow.com/questions/12377473/python-write-versus-writelines-and-concatenated-strings
  - time package
    - https://docs.python.org/3/library/time.html#time.strftime
    - https://en.wikipedia.org/wiki/ISO_8601
  - contextlib package
    - https://dbader.org/blog/python-context-managers-and-with-statement
    - https://docs.python.org/3/library/contextlib.html
    - https://pymotw.com/2/contextlib/
    - https://www.geeksforgeeks.org/context-manager-using-contextmanager-decorator/
shailshouryya committed Dec 28, 2020
1 parent 7134c3e commit 82a0129
Showing 10 changed files with 146 additions and 169 deletions.
35 changes: 12 additions & 23 deletions python/dev/execute.py
@@ -1,6 +1,6 @@
import sys
import time
import logging
import contextlib

import selenium
from selenium import webdriver
@@ -139,6 +139,15 @@ def show_user_how_to_set_up_selenium():
common_message.display_dependency_setup_instructions(user_driver, user_os)


@contextlib.contextmanager
def yield_file_writer(log_file):
with open (log_file, 'a', encoding='utf-8') as output_location:
yield output_location

@contextlib.contextmanager
def yield_stdout_writer():
yield sys.stdout


user_os = determine_user_os()
url, seleniumdriver = check_user_input()
@@ -163,28 +172,8 @@ def show_user_how_to_set_up_selenium():
driver.set_window_size(780, 800)
driver.set_window_position(0, 0)
file_name = determine_file_name()
if log_file:
log = logging.getLogger()
# logging level (in order of increasing severity) can be:
# - DEBUG (prints everything python does)
# - INFO
# - WARNING
# - ERROR
# - CRITICAL
log.setLevel(logging.INFO)
handler = logging.FileHandler(log_file)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter(fmt='%(asctime)s %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
handler.setFormatter(formatter)
log.addHandler(handler)
else:
logging.basicConfig(format='%(asctime)s %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S', level=logging.INFO)
logging.info('*' * 100)
logging.info(f'Starting new job at {time.asctime()}')
program.determine_action(url, driver, scroll_pause_time, reverse_chronological, file_name, txt, csv, markdown)
if log_file:
# log.removeHandler(handler) # if you don't provide the log_file argument, program ALWAYS prints DEBUG info to console
log.handlers.clear() # if you don't provide the log_file argument, program prints DEBUG info to console the first time after providing log_file argument (but not second or subsequent times)
with yield_file_writer(log_file) if log_file else yield_stdout_writer() as logging_output_location:
program.determine_action(url, driver, scroll_pause_time, reverse_chronological, file_name, txt, csv, markdown, logging_output_location)
program_end = time.perf_counter()
total_time = program_end - program_start
print(f'This program took {total_time} seconds to complete.\n')
55 changes: 27 additions & 28 deletions python/dev/file/create_file.py
@@ -1,5 +1,4 @@
import functools
import logging
import time
import csv
import os
@@ -11,40 +10,40 @@
NEWLINE = '\n'


def scroll_down(current_elements_count, driver, scroll_pause_time):
def scroll_down(current_elements_count, driver, scroll_pause_time, logging_output_location):
driver.execute_script('window.scrollBy(0, 50000);')
time.sleep(scroll_pause_time)
new_elements_count = driver.execute_script('return document.querySelectorAll("ytd-grid-video-renderer").length')
logging.info(f'Found {new_elements_count} videos...')
logging_output_location.writelines(f'Found {new_elements_count} videos...\n')
if new_elements_count == current_elements_count:
# wait scroll_pause_time seconds and check again to verify you really did reach the end of the page, and there wasn't a buffer loading period
logging.info(common_message.no_new_videos_found(scroll_pause_time * 2))
logging_output_location.writelines(common_message.no_new_videos_found(scroll_pause_time * 2))
time.sleep(scroll_pause_time * 2)
new_elements_count = driver.execute_script('return document.querySelectorAll("ytd-grid-video-renderer").length')
if new_elements_count == current_elements_count:
logging.info('Reached end of page!')
logging_output_location.writelines('Reached end of page!\n')
return new_elements_count


def save_elements_to_list(driver, start_time, scroll_pause_time, url):
def save_elements_to_list(driver, start_time, scroll_pause_time, url, logging_output_location):
elements = driver.find_elements_by_xpath('//*[@id="video-title"]')
end_time = time.perf_counter()
total_time = end_time - start_time - scroll_pause_time # subtract scroll_pause_time to account for the extra waiting time to verify end of page
logging.info(f'It took {total_time} seconds to find all {len(elements)} videos from {url}{NEWLINE}')
logging_output_location.writelines(f'It took {total_time} seconds to find all {len(elements)} videos from {url}{NEWLINE}\n')
return elements


def scroll_to_bottom(url, driver, scroll_pause_time):
def scroll_to_bottom(url, driver, scroll_pause_time, logging_output_location):
start_time = time.perf_counter() # timer stops in save_elements_to_list() function
driver.get(url)
current_elements_count = driver.execute_script('return document.querySelectorAll("ytd-grid-video-renderer").length')
while True:
new_elements_count = scroll_down(current_elements_count, driver, scroll_pause_time)
new_elements_count = scroll_down(current_elements_count, driver, scroll_pause_time, logging_output_location)
if new_elements_count == current_elements_count:
break
else:
current_elements_count = new_elements_count
return save_elements_to_list(driver, start_time, scroll_pause_time, url)
return save_elements_to_list(driver, start_time, scroll_pause_time, url, logging_output_location)


def time_writer_function(writer_function):
@@ -53,18 +52,18 @@ def wrapper_timer(*args, **kwargs):
start_time = time.perf_counter()
extension = writer_function.__name__.split('_')[-1]
timestamp = kwargs.get('timestamp', 'undeteremined_start_time')
logging.info(f'Opening a temp {extension} file and writing video information to the file....')
file_name, videos_written = writer_function(*args, **kwargs) # writer_function() writes to temp_{file_name}
file_name, videos_written, logging_output_location = writer_function(*args, **kwargs) # writer_function() writes to temp_{file_name}
logging_output_location.writelines(f'Opening a temp {extension} file and writing video information to the file....\n')
end_time = time.perf_counter()
total_time = end_time - start_time
temp_file = f'temp_{file_name}_{timestamp}.{extension}' # determine temp_{file_name} for wrapper_timer() scope
final_file = f'{file_name}.{extension}'
logging.info(f'Finished writing to'.ljust(38) + f'{temp_file}')
logging.info(f'{videos_written} videos written to'.ljust(38) + f'{temp_file}')
logging.info(f'Closing'.ljust(38) + f'{temp_file}')
logging_output_location.writelines(f'Finished writing to'.ljust(38) + f'{temp_file}\n')
logging_output_location.writelines(f'{videos_written} videos written to'.ljust(38) + f'{temp_file}\n')
logging_output_location.writelines(f'Closing'.ljust(38) + f'{temp_file}\n')
os.replace(temp_file, final_file) # rename temp_{file_name} to {file_name}.{extension} here AFTER everything else finishes to ensure atomicity
logging.info(f'Successfully completed write, renamed {temp_file} to {final_file}')
logging.info(f'It took {total_time} seconds to write all {videos_written} videos to {final_file}{NEWLINE}')
logging_output_location.writelines(f'Successfully completed write, renamed {temp_file} to {final_file}\n')
logging_output_location.writelines(f'It took {total_time} seconds to write all {videos_written} videos to {final_file}{NEWLINE}\n')
return wrapper_timer


@@ -80,34 +79,34 @@ def prepare_output(list_of_videos, reverse_chronological):
return total_videos, total_writes, video_number, incrementer


def txt_writer(file, markdown_formatting, reverse_chronological, list_of_videos, spacing, video_number, incrementer, total_writes):
def txt_writer(file, markdown_formatting, reverse_chronological, list_of_videos, spacing, video_number, incrementer, total_writes, logging_output_location):
for selenium_element in list_of_videos if reverse_chronological else list_of_videos[::-1]:
video_number, total_writes = write.txt_entry(file, markdown_formatting, selenium_element, NEWLINE, spacing, video_number, incrementer, total_writes)
if total_writes % 250 == 0:
logging.info(f'{total_writes} videos written to {file.name}...')
logging_output_location.writelines(f'{total_writes} videos written to {file.name}...\n')

@time_writer_function
def write_to_txt(list_of_videos, file_name, reverse_chronological, timestamp):
def write_to_txt(list_of_videos, file_name, reverse_chronological, logging_output_location, timestamp):
total_videos, total_writes, video_number, incrementer = prepare_output(list_of_videos, reverse_chronological)
markdown_formatting = False
spacing = f'{NEWLINE}' + ' '*4
with open(f'temp_{file_name}_{timestamp}.txt', 'w', encoding='utf-8') as txt_file:
txt_writer(txt_file, markdown_formatting, reverse_chronological, list_of_videos, spacing, video_number, incrementer, total_writes)
return file_name, total_videos
txt_writer(txt_file, markdown_formatting, reverse_chronological, list_of_videos, spacing, video_number, incrementer, total_writes, logging_output_location)
return file_name, total_videos, logging_output_location


@time_writer_function
def write_to_md(list_of_videos, file_name, reverse_chronological, timestamp):
def write_to_md(list_of_videos, file_name, reverse_chronological, logging_output_location, timestamp):
total_videos, total_writes, video_number, incrementer = prepare_output(list_of_videos, reverse_chronological)
markdown_formatting = True
spacing = f'{NEWLINE}' + '- ' + f'{NEWLINE}'
with open(f'temp_{file_name}_{timestamp}.md', 'w', encoding='utf-8') as md_file:
txt_writer(md_file, markdown_formatting, reverse_chronological, list_of_videos, spacing, video_number, incrementer, total_writes)
return file_name, total_videos
txt_writer(md_file, markdown_formatting, reverse_chronological, list_of_videos, spacing, video_number, incrementer, total_writes, logging_output_location)
return file_name, total_videos, logging_output_location


@time_writer_function
def write_to_csv(list_of_videos, file_name, reverse_chronological, timestamp):
def write_to_csv(list_of_videos, file_name, reverse_chronological, logging_output_location, timestamp):
total_videos, total_writes, video_number, incrementer = prepare_output(list_of_videos, reverse_chronological)
with open(f'temp_{file_name}_{timestamp}.csv', 'w', newline='', encoding='utf-8') as csv_file:
fieldnames = ['Video Number', 'Video Title', 'Video URL', 'Watched?', 'Watch again later?', 'Notes']
@@ -116,5 +115,5 @@ def write_to_csv(list_of_videos, file_name, reverse_chronological, timestamp):
for selenium_element in list_of_videos if reverse_chronological else list_of_videos[::-1]:
video_number, total_writes = write.csv_entry(writer, selenium_element, video_number, incrementer, total_writes)
if total_writes % 250 == 0:
logging.info(f'{total_writes} videos written to {csv_file.name}...')
return file_name, total_videos
logging_output_location.writelines(f'{total_writes} videos written to {csv_file.name}...\n')
return file_name, total_videos, logging_output_location
