Accomodate special case : Title attribute missing from rss #17

BBArikL · 2021-11-18T03:17:55Z

When parsing an RSS file, an item may not have a title property. I added a check for the title so that it does not crash any user's application. When no title is found in the RSS, the "title" property is set to an empty string.
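A minimal sketch of the fallback described above, using the standard library's `xml.etree.ElementTree` instead of the project's own parser (an assumption; the PR's actual code is not shown here). `findtext` returns its `default` when the child element is absent, so a missing `<title>` becomes an empty string instead of raising. `SAMPLE_RSS` and `parse_items` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed: the first <item> has no <title>.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example</title>
  <item><link>https://example.com/1</link></item>
  <item><title>Second post</title><link>https://example.com/2</link></item>
</channel></rss>"""


def parse_items(xml: str) -> list[dict[str, str]]:
    """Return one dict per <item>, using "" for a missing <title>."""
    channel = ET.fromstring(xml).find("channel")
    items = []
    for item in channel.iter("item"):
        items.append(
            {
                # findtext() returns the default when the element is absent,
                # so a title-less item does not crash the parser.
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
            }
        )
    return items


if __name__ == "__main__":
    for entry in parse_items(SAMPLE_RSS):
        print(entry)
```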

dhvcc · 2021-11-18T08:10:06Z

Hi, thanks for your contribution! The reason there are no checks is that the title, description, and link fields are required by the RSS specification.
On the other hand, it may be useful to implement an "optional" mode where no field is mandatory.
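One way the "optional" mode suggested above might work is a hypothetical `strict` flag: in strict mode a missing required field raises, otherwise it falls back to an empty string. `REQUIRED_FIELDS`, `MissingFieldError`, and `extract_fields` are illustrative names, not rss_parser's API, and `xml.etree.ElementTree` stands in for the library's own parsing. For reference, RSS 2.0 makes title, link, and description mandatory on `<channel>`, while for `<item>` it only requires at least one of title or description; the strict field list below follows the comment above rather than the spec.

```python
import xml.etree.ElementTree as ET

# Fields treated as mandatory in strict mode (following the comment above).
REQUIRED_FIELDS = ("title", "link", "description")


class MissingFieldError(ValueError):
    """Raised in strict mode when an <item> lacks a required field."""


def extract_fields(item: ET.Element, strict: bool = True) -> dict[str, str]:
    """Hypothetical helper: read REQUIRED_FIELDS from one <item> element.

    strict=True  -> raise MissingFieldError if a field is absent.
    strict=False -> "optional" mode: absent fields become "".
    """
    result = {}
    for name in REQUIRED_FIELDS:
        child = item.find(name)
        if child is None:
            if strict:
                raise MissingFieldError(f"<item> is missing <{name}>")
            result[name] = ""
        else:
            # .text is None for an empty element such as <title/>.
            result[name] = child.text or ""
    return result
```

Keeping strict mode as the default preserves the current behaviour for standard feeds, while callers with non-standard feeds can opt in to the lenient path.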

BBArikL · 2021-11-18T17:25:01Z

Oh yes, definitely! It would help parse non-standard RSS files like this one. While it seems to be standard for the most part, the first item made my program crash during parsing because it had no title.

dhvcc · 2021-11-19T11:16:45Z

Nice! Would you want to create a separate PR or force-push this one to update the logic and add this feature?

BBArikL · 2021-11-19T21:08:57Z

I think I'll update this PR when I have time. 👍

Thewildweb · 2021-12-12T11:30:21Z

@BBArikL Have you gotten around to this? Otherwise I could implement it this coming week.

BBArikL · 2021-12-12T14:18:29Z

Hello! I have been quite busy with life since last time. I should be much freer after this Thursday, and I'll try to update the branch sometime next weekend. Thank you for reminding me; I'll add a reminder so I don't forget :)

BBArikL · 2021-12-18T23:58:55Z

@dhvcc @Thewildweb Here is the commit I promised! I added some whitespace to help with the code review and added the ability to scrape additional entries from the RSS feed. Let me know what you think!

BBArikL · 2021-12-19T00:06:17Z

To add a field, call parse() with entries set to a list of the fields the scraper should look for.
For example, for an 'author' field:

parser = Parser(xml=someRSSSite)

feed = parser.parse(entries=["author"])

Then you can retrieve the value (for example, for the first item) by calling:

item = feed[0]

author = item.other["author"]

The variable author now contains the author value from the RSS feed, or an empty string if no value was set.
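A rough sketch of how an `entries` option like the one described above might collect extra fields into a per-item `other` dict, defaulting to an empty string. This is not the code from the PR: `FeedItem`, `parse_feed`, `SAMPLE`, and the use of `xml.etree.ElementTree` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeedItem:
    """Hypothetical item type: fixed fields plus an `other` dict."""

    title: str
    link: str
    other: dict[str, str] = field(default_factory=dict)


def parse_feed(xml: str, entries: Optional[list[str]] = None) -> list[FeedItem]:
    """Parse every <item>; each name in `entries` is looked up as a child tag."""
    items = []
    for item in ET.fromstring(xml).iter("item"):
        # Missing extra fields fall back to "" just like the core fields.
        other = {name: item.findtext(name, default="") for name in (entries or [])}
        items.append(
            FeedItem(
                title=item.findtext("title", default=""),
                link=item.findtext("link", default=""),
                other=other,
            )
        )
    return items


# Hypothetical feed: only the first item carries an <author>.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>First</title><link>https://example.com/1</link><author>jane@example.com (Jane)</author></item>
<item><title>Second</title><link>https://example.com/2</link></item>
</channel></rss>"""

if __name__ == "__main__":
    feed = parse_feed(SAMPLE, entries=["author"])
    print(feed[0].other["author"])
    print(repr(feed[1].other["author"]))
```

Storing the extra fields in a separate dict keeps the fixed fields typed while letting callers request arbitrary tags without changing the item class.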

dhvcc · 2021-12-28T21:00:48Z

Hi @BBArikL @Thewildweb
Sorry for not following up on this; I've had a lot of work. I'll try to review and sort this out before New Year's. Happy holidays!

dhvcc

Looks good to me, but I think there are a couple of points that need to be considered.

dhvcc · 2021-12-31T08:36:17Z

rss_parser/_parser.py

-                "publish_date": getattr(item.pubDate, "text", ""),
-                "category": getattr(item.category, "text", ""),
-                "description": description_soup.text,
+                "title": getattr(getattr(item, "title", ""), "text", ""),


Maybe we need to do something about those double getattrs. Perhaps move them into a separate function?
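The commit history below shows these getters ending up in a `get_text()` function, but its real signature isn't shown in this thread, so the following is only a guess at the shape of such a helper. It relies on BeautifulSoup returning `None` for a missing child tag accessed as an attribute (e.g. `item.title`), so a single `getattr(..., "text", default)` on that result covers both the missing-tag and present-tag cases. `SimpleNamespace` objects stand in for parsed tags so the example runs without bs4.

```python
from types import SimpleNamespace
from typing import Any


def get_text(item: Any, attribute: str, default: str = "") -> str:
    """Return item.<attribute>.text, or `default` if either level is missing.

    Hypothetical replacement for
    getattr(getattr(item, attribute, ""), "text", "").
    """
    child = getattr(item, attribute, None)
    # A missing child (None) has no .text, so the default is returned.
    return getattr(child, "text", default)


# Stand-ins for parsed tags: one item has a <title>, the other does not.
with_title = SimpleNamespace(title=SimpleNamespace(text="Hello"))
without_title = SimpleNamespace(title=None)  # BeautifulSoup-style missing tag

print(get_text(with_title, "title"))  # Hello
print(repr(get_text(without_title, "title")))  # ''
print(repr(get_text(SimpleNamespace(), "title")))  # '' (attribute absent)
```

Call sites then shrink to something like `"title": get_text(item, "title")`, and the fallback value lives in one place.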

BBArikL · 2022-01-06

Right, I'll do that!

dhvcc · 2021-12-31T11:30:53Z

rss_parser/_parser.py

-        default: str,
-        item_dict: Optional[str] = None,
-        default_dict: Optional[str] = None,
+            item: object,


I think black uses 4 spaces here, not 8; let's try not to cause any extra diffs.

BBArikL · 2022-01-06

Oh, I didn't see that I had added extra spaces. I'll fix this in the next commit.

dhvcc · 2022-01-11T08:03:05Z

@BBArikL please fix linting issues

BBArikL · 2022-01-11T19:17:55Z

That last commit should fix the linting issues 😅. It's my first time working with strict code checkers.

BBArikL added 2 commits November 17, 2021 21:18

Custom changes to accomodate for special cases

6894e28

Fixed breaking change

ede9b1d

dhvcc mentioned this pull request Dec 11, 2021

Crash when there is no description_soup #18

Closed

Added checks for each item's rss field and added functionality to have additional entries.

7105a57

dhvcc reviewed Dec 31, 2021

Gettars moved to get_text() and deleted not needed spaces.

bc45eca

Fix linting issues

2b6516e

dhvcc approved these changes Jan 13, 2022

dhvcc merged commit f48f72e into dhvcc:master Jan 13, 2022
