How to handle html from the `Author` field? #290

chuanqisun · 2021-05-08T02:45:35Z

Before submitting your issue, please make sure these boxes are checked. Thank you!

Review the compressed example.
I tried but the URL is broken.
FeedParser@2.2.10
Node@14.16.1
Problem feed: https://alistapart.com/main/feed/

Problem feed meta:

In the feed item, the author field contains HTML:

The parser strips the entire `<a>` tag from the `author` property in the output.

The `rss:author` property has some additional information, but I think it's difficult to write generalized extraction logic, as the structure can differ from feed to feed.

I wonder if there is an easy way to just get the plaintext within the Author field by Preston So.

Thanks!

danmactough · 2021-05-08T20:39:36Z

That feed is not valid: https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Falistapart.com%2Fmain%2Ffeed%2F. This is a sad but common problem when parsing feeds. Feedparser doesn't have an opinion about how you should handle invalid feeds -- everyone kind of needs to figure that out for themselves, given the goals of the project they're working on.

> I wonder if there is an easy way to just get the plaintext within the Author field by Preston So

For this specific workaround, the `#` property contains the plain text parts of the original feed item. So, you would need to recursively parse the `rss:author` property to pull out the `#` properties, then join them together with a space.
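A minimal sketch of that recursive walk might look like the following. The helper names `collectText` and `authorText` are made up for illustration, and the `sample` object is only a guess at what `item['rss:author']` could look like for this feed (text under `#`, attributes under `@`, child elements under their tag name); inspect the real value before relying on it.

```javascript
// Recursively collect the strings stored under '#' keys of a parsed node.
// Assumption: text lives under '#' and attributes under '@'.
function collectText(node) {
  if (node === null || typeof node !== 'object') return [];
  if (Array.isArray(node)) return node.flatMap((child) => collectText(child));

  const parts = [];
  // Take this node's own text first, if it has any.
  if (typeof node['#'] === 'string' && node['#'].trim() !== '') {
    parts.push(node['#'].trim());
  }
  // Then descend into child elements, skipping text and attributes.
  for (const [key, value] of Object.entries(node)) {
    if (key === '#' || key === '@') continue;
    parts.push(...collectText(value));
  }
  return parts;
}

// Join the collected pieces with a space, as suggested above.
function authorText(rssAuthor) {
  return collectText(rssAuthor).join(' ');
}

// Hypothetical shape for <author>by <a href="...">Preston So</a></author>
const sample = {
  '@': {},
  '#': 'by',
  a: { '@': { href: 'https://example.com/author/' }, '#': 'Preston So' },
};

console.log(authorText(sample)); // by Preston So
```

In a feedparser `readable` handler this would be called as `authorText(item['rss:author'])`. Note that the walk emits a node's own text before its children, so the order of mixed text and child elements may not match the original markup exactly.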

danmactough closed this as completed May 8, 2021
