0.5.0: Add bugfixes, improve logging + documentation, support threading

@shailshouryya shailshouryya released this 05 Jan 11:22
a88ec79
  • compare changes to previous version
  • major change: output file names now include _reverse_chronological or _chronological
    • e.g.
      • MyChannel_reverse_chronological_videos_list.txt
      • MyChannel_chronological_videos_list.txt
    • compare with the previous naming convention of
      • MyChannel_videos_list.txt - used REGARDLESS of whether the file was in reverse chronological order or chronological order
addresses issues:
  • 'gbk' codec can't decode byte 0x80
    • #3
    • PR #5
      • commit 1fcf32a: Explicitly use utf-8 for encoding and decoding all files
      • commit 4c4cc5a: Specify name,operation,encoding params for all file IO
  • Can lc.create_list_for() return the csv file name?
    • #4
    • PR #6
      • commit a30be66: Return file name after lc.create_list_for() finishes
  • Temporal filename "yt_videos_list_temp.txt"
    • #7
    • #8
      • commit 5d7bfad: Name temp files using channel (for file creation)
      • commit c781a2f: Name temp files using channel (for file updates)
      • commit 3772d57: Indicate type of temp file being written
    • append time to end of temp file name
      • initially appended UNIX time
        • commit d0f76e2: Append UNIX timestamp to temporary file name
        • commit 110912d: Replace the dot in timestamp with a dash (should have been included in commit above)
      • then changed appended time from UNIX time to ISO 8601 datetime format to increase readability
        • commit a349c5b: Append ISO 8601 datetime instead of UNIX time to temp file name
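The file-handling fixes above (explicit utf-8 for all file IO, and temp files named per channel with an ISO 8601 datetime) can be sketched together. This is a minimal sketch and not the package's implementation: temp_file_name() and write_temp_file() are hypothetical helpers, the "temp_{channel}_{file_type}_{timestamp}" layout is invented for illustration, and replacing ":" and "." with "-" in the timestamp is an assumption made here so the name is valid on Windows too.

```python
import datetime
import os

def temp_file_name(channel, file_type, now=None):
    """Build a per-channel temp file name (hypothetical layout)."""
    now = now or datetime.datetime.now()
    # ISO 8601 is easier to read than a UNIX timestamp; ":" and "." are
    # swapped for "-" here (an assumption) to keep the name portable.
    timestamp = now.isoformat().replace(':', '-').replace('.', '-')
    return f'temp_{channel}_{file_type}_{timestamp}.txt'

def write_temp_file(directory, channel, lines):
    """Write lines to a temp file, naming the file, mode, and encoding explicitly."""
    path = os.path.join(directory, temp_file_name(channel, 'txt'))
    # Without encoding='utf-8', open() falls back to the locale's preferred
    # encoding (for example gbk), which is where codec errors can come from.
    with open(file=path, mode='w', encoding='utf-8') as temp_file:
        temp_file.writelines(f'{line}\n' for line in lines)
    return path
```

Because each name carries both the channel and the time, two runs for different channels (or two runs for the same channel started at different moments) no longer share a single yt_videos_list_temp.txt.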
inserts "_reverse_chronological" or "_chronological" into the file name:
  • commit 8cf2e15: Append (reverse_)?chronological to file name
  • commit 3285d93: Update location for testing file paths
  • commit dc38e2d: Modify output file naming
  • commit 90212e8: Simplify file suffix creation
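As a rough illustration of the naming scheme, the suffix can be derived from the reverse_chronological flag. The helper name output_file_stem() is hypothetical; only the resulting names (MyChannel_reverse_chronological_videos_list and MyChannel_chronological_videos_list) come from these release notes.

```python
def output_file_stem(channel, reverse_chronological):
    """Return the output file name without its extension (hypothetical helper)."""
    suffix = 'reverse_chronological' if reverse_chronological else 'chronological'
    return f'{channel}_{suffix}_videos_list'

# The extension is added separately for each output format.
txt_name = output_file_stem('MyChannel', True) + '.txt'
csv_name = output_file_stem('MyChannel', False) + '.csv'
```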
significantly improves logging:
  • vertically align similar messages to facilitate quick comparisons between related messages
    • commit c877927: Vertically align logging output
    • commit e7458d0: Make logging messages more visible
    • commit 3487888: Right pad all testing log messages with ">"
    • commit 60cbee3: Log thread being created (during testing)
    • commit 18146c4: Rejustify program logging messages
    • commit 296fc4d: log program info with custom logger helper module (↑ DRY)
    • commit ab18840: Log "PROGRAM COMPLETE" instead of "PROGRAM COMPLETED"
  • log datetime for every event
    • commit b00b088: Print datetime during testing
    • commit e6101b8: Log datetime while running program
  • LOG ALL information to corresponding LOG FILE for channel
    • log file naming
      • commit 257e9e9: Name log file using ISO 8601 datetime
      • commit f53e6af: Name log file using output file name
  • general logging
    • commit 2b6bb4f: Enable optional logging to user specified log_file
    • commit b1e784a: Log test output to "{suffix}.log" (testing)
    • commit 2b63e9c: Enable INFO level logging by default
    • commit 82a0129: Simplify logging via custom context manager text writer (EXTREMELY detailed!)
      • commit 3b6e3fc: Pass logging_output_location to txt_writer()
    • commit 6513697: Log program start & end messages instead of printing to console
    • commit add1f35: Log name of driver during testing
    • commit 21e6bde: Add testing info to log files during tests
    • commit 9a41424: Enable logging to multiple files during testing
    • commit 8bb1008: Simplify testing info logging
    • commit 99d7be0: Add "*" 200x when test starts to clearly divide log file
    • commit ca7f4c9: Log thread name when new thread created (during testing)
    • commit 97ccc33: Log ">>>STARTING PROGRAM<<<"
    • commit 2e20d8a: Log ">>>PROGRAM COMPLETED<<<"
    • commit 8a3e4f2: Log write & file renaming successes separately
    • commit dab5ecf: Move create_file.py & update_file.py decorator code → log_extraction_information()
    • commit fb83118: Always log to log file but allow console logging muting
    • commit 56bc309: Log "video" if 1 new video found, otherwise log "videos"
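Several of the logging changes above (a datetime on every event, always writing to the log file while allowing the console to be muted, and "video" versus "videos") fit in a few lines. The following is only a sketch under assumptions: log() and new_videos_message() are hypothetical names, the message wording is invented, and the package's own context-manager-based writer from commit 82a0129 is not reproduced here.

```python
import datetime
import sys

def log(message, log_file, console=sys.stdout):
    """Write a timestamped message to log_file and, unless muted, to the console."""
    line = f'{datetime.datetime.now().isoformat()}: {message}\n'
    log_file.write(line)         # the log file always receives the message
    if console is not None:      # passing console=None mutes console output
        console.write(line)
    return line

def new_videos_message(count):
    """Use the singular noun only when exactly one new video was found."""
    noun = 'video' if count == 1 else 'videos'
    return f'Found {count} new {noun}'
```

Calling log(new_videos_message(1), log_file, console=None) would record the event in the file only.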
interesting logging (python standard library package) bug and workaround:
  • commit 82a0129: Simplify logging via custom context manager text writer (also mentioned above, EXTREMELY detailed!)
multi-threading bug (very detailed explanations) and workaround (just avoid using global variables):
  • only occurs when
    • scraping the same channel on 2 threads with reverse_chronological set to True on one thread and False on the other thread
    • and starting both threads WITHIN a few tenths of a second of each other
    • WITH pre-existing reverse_chronological and chronological files that contain DIFFERENT numbers of videos
  • commit 97d928f: Test pre-existing csv, txt, md files first
  • commit 7bf88c1: Modify partial chronological files (catch bug more frequently)
  • commit e787c3a: Return visited videos sets instead of creating global variables
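The workaround named in commit e787c3a is to return the visited-videos sets rather than keep them in global variables. The sketch below shows that general pattern only; load_visited_videos() and the sample data are hypothetical, and it does not reproduce the package's actual code or the original race.

```python
from concurrent.futures import ThreadPoolExecutor

def load_visited_videos(existing_lines):
    """Return a new set of visited videos; no module-level state is touched."""
    visited = set()
    for line in existing_lines:
        entry = line.strip()
        if entry:
            visited.add(entry)
    return visited

# Two pre-existing files with a different number of videos (sample data).
reverse_chronological_lines = ['video_3\n', 'video_2\n', 'video_1\n']
chronological_lines = ['video_1\n', 'video_2\n']

with ThreadPoolExecutor(max_workers=2) as pool:
    newest_first = pool.submit(load_visited_videos, reverse_chronological_lines)
    oldest_first = pool.submit(load_visited_videos, chronological_lines)

# Each thread hands back its own set, so neither can overwrite the other's.
reverse_visited = newest_first.result()
chronological_visited = oldest_first.result()
```

With a shared global set, whichever thread assigned last would win and the other thread could compare against the wrong file's contents; returning a local set removes that shared state.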
other multi-threading bugs/challenges/changes:
  • commit 3b78b0a: Delete only relevant files before testing
  • commit e2e1ae9: Avoid starting new thread after last test case
  • commit 7be64c3: Explicitly check which thread ends first
  • commit 5aafa78: Simplify threading logic for tests
  • commit 930b59c: Avoid multi-threading for safaridriver
  • commit 4da6186: Remove debugging print statements ("previous commit" refers to the commit above)
  • commit fcd744d: Verify variable exists before printing message
  • commit 9c2529d: Ensure threads finish before proceeding
  • commit 27cc6a9: Make thread checks more robust
removes deprecated create_list_for() arguments:
  • commit 6bbac49: Remove deprecated create_list_for() arguments
creates custom threading.Thread subclass to store result of thread during testing:
  • commit 8fc6270: Add custom class to store thread result
  • commit f1d58f6: Make ThreadWithResult attribute names more descriptive
  • commit b10480b: Add ThreadWithResult class docstring (test_shared.py)
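A plain threading.Thread discards the return value of its target, which is why a subclass is useful when a test needs the value returned by the threaded call. The following is a minimal sketch of the idea, not the class from test_shared.py: the attribute names and the fake_create_list_for() stand-in are assumptions.

```python
import threading

class ThreadWithResult(threading.Thread):
    """Thread that stores the return value of its target in self.result."""

    def __init__(self, target, args=(), kwargs=None, name=None):
        super().__init__(name=name)
        self.function = target
        self.function_args = args
        self.function_kwargs = kwargs or {}
        self.result = None  # stays None until run() has finished

    def run(self):
        self.result = self.function(*self.function_args, **self.function_kwargs)

def fake_create_list_for(channel, reverse_chronological):
    """Hypothetical stand-in for a call that returns an output file name."""
    suffix = 'reverse_chronological' if reverse_chronological else 'chronological'
    return f'{channel}_{suffix}_videos_list'

thread = ThreadWithResult(target=fake_create_list_for, args=('MyChannel', True))
thread.start()
thread.join()  # result is only meaningful after the thread has finished
file_name = thread.result
```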
points future drivers to newest available driver:
  • commit fd8ad48: Point future drivers to newest available driver ("next commit" refers to commit below)
  • commit fd878f3: Indicate failed update may be due to new driver version
creates json file with all download commands:
  • commit e0569f2: Create json file for download commands
    • previously the project only provided pseudo json in the yt_videos_list/docs/dependencies_pseudo_json.txt file
fixes inability to update package due to testing module dependency on package submodule:
  • started with
    • commit 6fa0deb: Run "pip" on Windows and "pip3" on Unix
      • following commit 8c73de6: Make PATH_SLASH a global variable
  • addressed with
    • commit 9550ca5: Update local package without yt_videos_list submodule function
    • commit 878fb67: Remove duplicate import (test_cross_platform_drivers.py) (since function now imported from tests.determine module)
    • commit 0204dd2: Run pip install directly from test script
    • commit 829a1ae: Update local package if python test module called directly
Benchmarking
# without yt_videos_list submodule function
for i in {1..10}; do (time (for i in {1..100}; do python3 minifier.py; done)); done

real	0m8.261s
user	0m5.433s
sys	0m2.259s

real	0m8.288s
user	0m5.429s
sys	0m2.247s

real	0m8.022s
user	0m5.272s
sys	0m2.164s

real	0m7.989s
user	0m5.266s
sys	0m2.165s

real	0m7.984s
user	0m5.253s
sys	0m2.163s

real	0m8.009s
user	0m5.268s
sys	0m2.164s

real	0m8.047s
user	0m5.269s
sys	0m2.175s

real	0m8.068s
user	0m5.242s
sys	0m2.182s

real	0m8.030s
user	0m5.289s
sys	0m2.164s

real	0m8.046s
user	0m5.284s
sys	0m2.176s
# with yt_videos_list submodule function
for i in {1..10}; do (time (for i in {1..100}; do python3 minifier.py; done)); done

real	1m28.987s
user	0m42.470s
sys	0m41.508s

real	1m28.921s
user	0m42.508s
sys	0m41.411s

real	1m28.753s
user	0m42.436s
sys	0m41.378s

real	1m29.467s
user	0m42.700s
sys	0m41.732s

real	1m28.672s
user	0m42.286s
sys	0m41.406s

real	1m28.415s
user	0m42.297s
sys	0m41.202s

real	1m28.629s
user	0m42.360s
sys	0m41.244s

real	1m29.088s
user	0m42.587s
sys	0m41.527s

real	1m29.392s
user	0m42.644s
sys	0m41.637s

real	1m29.345s
user	0m42.657s
sys	0m41.643s
# without yt_videos_list submodule function again
for i in {1..10}; do (time (for i in {1..100}; do python3 minifier.py; done)); done

real	0m8.488s
user	0m5.585s
sys	0m2.308s

real	0m8.293s
user	0m5.497s
sys	0m2.251s

real	0m8.115s
user	0m5.396s
sys	0m2.188s

real	0m8.116s
user	0m5.396s
sys	0m2.179s

real	0m8.145s
user	0m5.395s
sys	0m2.198s

real	0m8.066s
user	0m5.367s
sys	0m2.170s

real	0m8.042s
user	0m5.340s
sys	0m2.162s

real	0m8.029s
user	0m5.329s
sys	0m2.159s

real	0m8.170s
user	0m5.420s
sys	0m2.195s

real	0m8.154s
user	0m5.426s
sys	0m2.190s
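The "real" timings above can be averaged to compare the two setups. This sketch simply copies the thirty "real" values from the output above and computes means; the helper names are hypothetical. Each timing covers 100 runs of minifier.py, so the result works out to roughly 8 seconds versus roughly 89 seconds per 100 runs, a slowdown of about 11x with the submodule function.

```python
from statistics import mean

def to_seconds(real_time):
    """Convert a bash `time` value such as '1m28.987s' to seconds."""
    minutes, seconds = real_time.rstrip('s').split('m')
    return int(minutes) * 60 + float(seconds)

def mean_seconds(real_times):
    return mean(to_seconds(value) for value in real_times)

without_submodule = [
    '0m8.261s', '0m8.288s', '0m8.022s', '0m7.989s', '0m7.984s',
    '0m8.009s', '0m8.047s', '0m8.068s', '0m8.030s', '0m8.046s',
]
with_submodule = [
    '1m28.987s', '1m28.921s', '1m28.753s', '1m29.467s', '1m28.672s',
    '1m28.415s', '1m28.629s', '1m29.088s', '1m29.392s', '1m29.345s',
]
without_submodule_again = [
    '0m8.488s', '0m8.293s', '0m8.115s', '0m8.116s', '0m8.145s',
    '0m8.066s', '0m8.042s', '0m8.029s', '0m8.170s', '0m8.154s',
]

baseline = mean_seconds(without_submodule + without_submodule_again)
with_function = mean_seconds(with_submodule)
slowdown = with_function / baseline

print(f'without: {baseline:.2f}s, with: {with_function:.2f}s, slowdown: {slowdown:.1f}x')
```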
other interesting bugs:
  • commit 1cdd8f5: Revert "Make command consistent with other unix commands"
    • commit 6783c40: Make command consistent with other unix commands
      • following commit 76c066f: Move repeated commands into helper functions
  • addressed in
    • commit d762b00: Remove rm /usr/local/bin/sha512_sum command (bravedriver)
    • commit e32f69f: Remove sha512 removal command for Windows bravedriver too
not a bug, but best practice:
  • commit 30e9701: Make global varaibles local