Description
Environment
SOMEF version: 0.9.13
Python: 3.11.14
OS: Ubuntu (GitHub Actions ubuntu-latest)
Summary
For certain README files, header_analysis.py crashes with an IndexError inside mardown_parser.extract_content_per_header. The error is then obscured by a secondary crash while the exception is being logged (a TypeError raised in logging/__init__.py), which makes it very hard to diagnose.
Error (from stderr)
--- Logging error ---
Traceback (most recent call last):
  File ".../somef/header_analysis.py", line 212, in extract_categories
    data, none_header_content = extract_header_content(repo_data)
  File ".../somef/header_analysis.py", line 103, in extract_header_content
    content, none_header_content = mardown_parser.extract_content_per_header(text, headers)
  File ".../somef/parser/mardown_parser.py", line 54, in extract_content_per_header
    top = keys[0]
          ~~~~^^^
IndexError: list index out of range

During handling of the above exception, another exception occurred:

  File ".../logging/__init__.py", line 377, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Root cause
mardown_parser.extract_content_per_header (line 54) does:
top = keys[0]
without first checking whether keys is empty. When a README has headers that produce no parseable content (e.g. headers followed only by badge/image lines, or a very short single-line README), keys is an empty list and the index access raises IndexError.
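The TypeError in the logging handler is a secondary issue. "not all arguments converted during string formatting" typically means the logging call received more arguments than its message has %-style placeholders (for example, the exception passed as an extra argument without a matching %s). The record then fails to format, so the real traceback never reaches normal log output and only appears on stderr under "--- Logging error ---".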
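One common way to get this exact TypeError is sketched below. It assumes the exception is passed to the logging call as an extra positional argument with no matching placeholder; the actual logging call in header_analysis.py is not shown in the traceback, so this is an assumption rather than a confirmed diagnosis. `render` is a hypothetical helper that formats a record the same way the logging module does at emit time.

```python
import logging


def render(msg, *args):
    """Hypothetical helper: build a LogRecord and format its message."""
    record = logging.LogRecord("demo", logging.ERROR, "demo.py", 0, msg, args, None)
    # getMessage() evaluates `msg % args` whenever args is non-empty.
    return record.getMessage()


err = IndexError("list index out of range")

# Extra argument, no placeholder: formatting the record fails.
try:
    render("Error while extracting headers", err)
    failure = None
except TypeError as exc:
    failure = str(exc)

# A matching %s placeholder formats cleanly.
safe = render("Error while extracting headers: %s", err)

print(failure)
print(safe)
```

If this turns out to be the cause, adding the placeholder (or using `logger.exception(...)` inside the `except` block, which also records the traceback) would keep the original IndexError visible in the log.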
Suggested fix
mardown_parser.py line ~53
if not keys:
    return {}, none_header_content  # or whatever the appropriate empty return is
top = keys[0]
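The guard can be exercised in isolation with a simplified stand-in. The function below is not the real SOMEF implementation: its name, the substring-based header matching, and the choice to return the whole text as header-less content are all assumptions made for illustration. It shows only the early return on an empty keys list and a regression-style check for it.

```python
def extract_content_per_header_sketch(text, headers):
    """Simplified stand-in for illustration; NOT the real SOMEF function."""
    # Assumption: keep only headers that occur in the text, in document order.
    keys = sorted((h for h in headers if h in text), key=text.index)
    if not keys:
        # Nothing parseable: empty mapping, all text counts as header-less content.
        return {}, text
    top = keys[0]
    none_header_content = text[: text.index(top)]
    content = {}
    for i, key in enumerate(keys):
        start = text.index(key) + len(key)
        end = text.index(keys[i + 1]) if i + 1 < len(keys) else len(text)
        content[key] = text[start:end].strip()
    return content, none_header_content


# Inputs that would previously have hit keys[0] on an empty list.
empty_result = extract_content_per_header_sketch("just a badge line", [])

readme = "intro\n# Install\npip install x\n# Usage\nrun it"
content, rest = extract_content_per_header_sketch(readme, ["# Install", "# Usage"])
```

A test of this shape in the SOMEF test suite, fed with a single-line or badge-only README, would catch a regression if the guard were removed.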