Fixed Extraction when Meta tag has an empty content #545
Hey @adbar,
I ran into a few cases where the meta tags below have empty content, and when that happens the extraction stops working.
I checked, and the line below tests for None before trying to get the correct title, but that branch is never reached because the title is ' ' at this point.
trafilatura/trafilatura/metadata.py
Line 507 in fb3e174
Even the JSON parsing checks for None, but the current title is ' ':
trafilatura/trafilatura/json_metadata.py
Line 109 in fb3e174
This function then converts the title ' ' to None:
trafilatura/trafilatura/metadata.py
Line 573 in fb3e174
And then it will fail here:
trafilatura/trafilatura/core.py
Line 929 in fb3e174
This is an example of the problem:
Example
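To make the chain above easier to follow, here is a minimal, self-contained sketch of the failure mode. It is not trafilatura's actual code: `normalize`, `pick_title_buggy`, and `pick_title_fixed` are hypothetical names used only for illustration, and the "fixed" variant shows one possible way to handle the case, assuming a whitespace-only title should be treated like a missing one.

```python
from typing import Optional


def normalize(value: Optional[str]) -> Optional[str]:
    """Hypothetical cleanup step: strip whitespace and map empty strings to None."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def pick_title_buggy(meta_title: Optional[str], fallback: str) -> Optional[str]:
    # Only None triggers the fallback, so a whitespace-only title such as ' ' is kept...
    title = meta_title if meta_title is not None else fallback
    # ...and the later cleanup turns it into None, leaving no title at all.
    return normalize(title)


def pick_title_fixed(meta_title: Optional[str], fallback: str) -> Optional[str]:
    # Normalize first, so ' ' counts as missing and the fallback is used.
    title = normalize(meta_title)
    if title is None:
        title = fallback
    return normalize(title)


if __name__ == "__main__":
    print(pick_title_buggy(" ", "Page title"))
    print(pick_title_fixed(" ", "Page title"))
```

With an empty meta tag the first variant ends up with no title, while the second falls back to the other source; any code downstream that assumes a string title would only be safe with the second behavior.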
I ran the tests and comparison_small.py, and the results appear to be the same.
Thanks.