Skip zero-byte files in scanner.py during directory scans #6

@chigwell

Description

User Story
As a software developer using FolderScanner,
I want the scanner to skip reading zero-byte files
so that large directory scans consume fewer system resources.

Background
The current implementation of scan_directory in scanner.py reads every file that is not excluded by the ignore patterns, including empty (0-byte) files. This causes unnecessary I/O, particularly in large scans of directories that contain many temporary or placeholder files. For example, the loop:

for file in files:
    file_path = os.path.join(root, file)
    if spec.match_file(file_path):
        continue
    # ... file read occurs regardless of size

wastes CPU cycles and I/O bandwidth opening and reading files that contain no data.

Acceptance Criteria

  • Modify scanner.py to check file size before reading:
    • Add if os.path.getsize(file_path) == 0: continue after the ignore-pattern check.
  • Ensure skipped files:
    • Do not trigger "Error reading..." messages.
    • Are excluded from the yielded file_chunk results.
  • Validation steps:
    1. Create test directories with mixed empty/non-empty files.
    2. Verify zero-byte files are never opened (add debug logging if needed).
    3. Confirm scan duration improves in environments with many empty files.
  • Update unit tests to validate this optimization.
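
The criteria above could be sketched roughly as follows. This is not the actual scan_directory from scanner.py, whose full body is not shown in this issue. The function name scan_nonempty_files, the is_ignored callable (standing in for spec.match_file), the (path, text) return shape, and the choice to silently skip files whose size cannot be read are all assumptions made for illustration.

```python
import os
from typing import Callable, Iterator, Tuple


def scan_nonempty_files(
    directory: str,
    is_ignored: Callable[[str], bool] = lambda path: False,
) -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for each non-ignored, non-empty file under directory.

    Hypothetical stand-in for scan_directory; is_ignored plays the role of
    spec.match_file in the loop quoted in the Background section.
    """
    for root, _dirs, files in os.walk(directory):
        for file in files:
            file_path = os.path.join(root, file)
            if is_ignored(file_path):
                continue
            try:
                # The size check runs before any open(), so zero-byte
                # files are never opened or read.
                if os.path.getsize(file_path) == 0:
                    continue
            except OSError:
                # Assumption: a broken symlink or a file removed mid-scan
                # is skipped without an error message.
                continue
            try:
                with open(file_path, "r", encoding="utf-8") as handle:
                    yield file_path, handle.read()
            except (OSError, UnicodeDecodeError) as exc:
                print(f"Error reading {file_path}: {exc}")
```

A unit test along the lines of the validation steps might create a temporary directory holding one empty and one non-empty file, then assert that only the non-empty file is yielded. Note that os.path.getsize follows symlinks and raises OSError when the target is missing, which is why the check is wrapped in its own try block here.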

Metadata

Assignees: No one assigned
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests