## RSS

RSS feeds are available on many different sites. Most blogs and news websites publish them, for example [TechCrunch](https://techcrunch.com/rssfeeds/), [New York Times](http://www.nytimes.com/services/xml/rss/index.html), [Vox](https://www.vox.com/rss/index.xml), [Naval's Podcast](https://nav.al/podcast/feed) and [NPR](https://help.npr.org/customer/portal/articles/2094175-where-can-i-find-npr-rss-feeds-). Even iTunes delivers podcasts by [aggregating RSS feeds](http://itunespartner.apple.com/en/podcasts/faq) from content creators.

Today we are going to take a look at the [BuzzFeed Book Article Feed](https://www.buzzfeed.com/books). We will use the Python package [FeedParser](https://pypi.python.org/pypi/feedparser) to download the RSS feed and break it down into its parts.

In [1]:
import feedparser
import pandas as pd

In [2]:
RSS_URL = "https://www.buzzfeed.com/books.xml"

In [3]:
feed = feedparser.parse(RSS_URL)

"parse" is the primary function in FeedParser. The returned object is dictionary like and can be handled similarly to a dictionary. For example, we can look at the keys it contains and what type of items those keys are.

In [4]:
type(feed)

feedparser.FeedParserDict

In [5]:
feed.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'href', 'status', 'encoding', 'version', 'namespaces'])

In [6]:
type(feed.feed)

feedparser.FeedParserDict
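To make "dictionary-like" more concrete, here is a toy sketch of a dict subclass that also allows attribute access. `AttrDict` is a hypothetical stand-in written for illustration, not FeedParserDict's actual implementation, and the sample data is invented. It only mimics the behaviour that key access and attribute access reach the same value.

```python
class AttrDict(dict):
    """Toy stand-in (hypothetical): a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods
        # such as .keys() and .get() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Invented sample data shaped like a parsed feed.
toy_feed = AttrDict(version="rss20", entries=[AttrDict(title="First post")])

print(toy_feed["version"])        # key access
print(toy_feed.version)           # attribute access, same value
print(toy_feed.entries[0].title)  # nested objects behave the same way
print(sorted(toy_feed.keys()))    # ordinary dict methods still work
```

With the real parsed feed, `feed["entries"]` and `feed.entries` can be used interchangeably in the same way.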

We will look at some, but not all, of the data stored in the feed. For more information about the keys, see the [documentation](http://pythonhosted.org/feedparser/).

We can use the version to check which type of feed we have.

In [7]:
feed.version

'rss20'
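If your code needs to branch on the feed type, one option is to reduce the version string to a coarse family. The helper below is a sketch: `feed_family` is a hypothetical name, and it assumes the version strings follow the documented pattern of starting with `rss` or `atom`, with an empty string when the format is unknown.

```python
def feed_family(version):
    """Return a coarse feed family for a feedparser-style version string."""
    if not version:
        return "unknown"  # an empty string means the format was not recognised
    if version.startswith("rss"):
        return "RSS"
    if version.startswith("atom"):
        return "Atom"
    return "other"


print(feed_family("rss20"))   # the value returned for our feed
print(feed_family("atom10"))
print(feed_family(""))
```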

In [8]:
type(feed.bozo)

int

Bozo is an interesting key to know about if you are going to parse an RSS feed in code. FeedParser sets the bozo bit when it detects that a feed is not well-formed. (FeedParser will still parse a feed that is not well-formed.) You can use the bozo bit to drive error handling or simply to print a warning.

In [9]:
if feed.bozo == 0:
    print("Well done, you have a well-formed feed!")
else:
    print("Potential trouble ahead.")

Well done, you have a well-formed feed!
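For slightly richer error handling, the check can be wrapped in a small helper that also reports why the bozo bit was set. This is a sketch under assumptions: `feed_health` is a hypothetical name, and plain dicts with invented values stand in for parsed feeds so the example runs without a network connection. FeedParser records the triggering exception under the `bozo_exception` key when the bozo bit is set.

```python
def feed_health(parsed):
    """Summarise the bozo bit of a parsed feed (any dict-like object)."""
    if not parsed.get("bozo"):
        return "well-formed"
    # Fall back to a generic note if no exception was recorded.
    reason = parsed.get("bozo_exception", "no details recorded")
    return f"not well-formed: {reason}"


# Invented stand-ins for the result of feedparser.parse().
good = {"bozo": 0, "entries": []}
bad = {"bozo": 1, "bozo_exception": "mismatched tag", "entries": []}

print(feed_health(good))
print(feed_health(bad))
```

Because the parsed feed is dictionary-like, the same function should accept the `feed` object from above as well.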


We can look at some of the feed elements through the feed attribute.

In [10]:
feed.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'language', 'rights', 'rights_detail', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'authors', 'author', 'author_detail', 'publisher', 'publisher_detail', 'image'])

In [11]:
print(feed.feed.title)
print(feed.feed.title_detail)
print(feed.feed.links)
print(feed.feed.language)
print(feed.feed.updated)

BuzzFeed - Books
{'type': 'text/plain', 'language': None, 'base': 'https://www.buzzfeed.com/books.xml', 'value': 'BuzzFeed - Books'}
[{'rel': 'alternate', 'type': 'text/html', 'href': 'https://www.buzzfeed.com/books'}, {'href': 'https://www.buzzfeed.com/books.xml', 'rel': 'self', 'type': 'application/atom+xml'}]
en
Thu, 14 May 2020 09:44:33 +0000
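The `links` value is a list of dictionaries, each tagged with a `rel` that says what the link is for. A minimal sketch of picking one out by its `rel` follows; `find_link` is a hypothetical helper name, and the list is copied from the output above.

```python
def find_link(links, rel):
    """Return the href of the first link with the given rel, or None."""
    for link in links:
        if link.get("rel") == rel:
            return link.get("href")
    return None


# Copied from the feed.feed.links output above.
links = [
    {"rel": "alternate", "type": "text/html", "href": "https://www.buzzfeed.com/books"},
    {"href": "https://www.buzzfeed.com/books.xml", "rel": "self", "type": "application/atom+xml"},
]

print(find_link(links, "alternate"))  # the human-readable page
print(find_link(links, "self"))       # the feed's own URL
```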


The [reference section](http://pythonhosted.org/feedparser/reference.html) of the feedparser documentation lists all the information that can appear in a feed. [Annotated Examples](http://pythonhosted.org/feedparser/annotated-examples.html) are also provided. Note, however, the caution the documentation gives:

"Caution: Even though many of these elements are required according to the specification, real-world feeds may be missing any element. If an element is not present in the feed, it will not be present in the parsed results. You should not rely on any particular element being present."

For example, our feed is RSS 2.0. One of the elements available in this version is the published date.
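In line with that caution, one defensive pattern is to read optional elements with `.get()` and handle their absence explicitly. The sketch below applies it to the published date. `published_datetime` is a hypothetical helper, plain dicts with invented values stand in for feed entries, and the date string is assumed to be in the RFC 822 style that RSS 2.0 uses.

```python
from email.utils import parsedate_to_datetime


def published_datetime(entry):
    """Return the entry's published date as an aware datetime, or None."""
    raw = entry.get("published")
    if raw is None:
        return None  # the element is simply missing from this entry
    try:
        return parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return None  # present, but not a parseable date


# Invented stand-ins for feed entries.
dated = {"title": "Has a date", "published": "Thu, 14 May 2020 01:29:39 -0400"}
undated = {"title": "No date"}

print(published_datetime(dated))
print(published_datetime(undated))
```

Returning `None` in both failure cases keeps the calling code simple; raising an exception instead would be a reasonable choice if a missing date should stop processing.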

The data we are looking for are contained in the entries. Given the feed we are working with, how many entries do you think we have?

In [12]:
len(feed.entries)

100

The items in entries are stored as a list.

In [13]:
type(feed.entries)

list

In [14]:
feed.entries[0].title

'Everyone Is Either Lydia, Cassie Or Emily From "Finding Cassie Crazy" — Which One Are You?'

In [15]:
# enumerate supplies the index, so no manual counter is needed
for i, entry in enumerate(feed.entries):
    print(i, entry.title)

0 Everyone Is Either Lydia, Cassie Or Emily From "Finding Cassie Crazy" — Which One Are You?
1 Someone Pointed Out How Many Times Edward Cullen Chuckled In The First Twilight Book And It's A Lot
2 Only Someone Who Has Read "Harry Potter And The Prisoner Of Azkaban" At Least 3 Times Can Pass This Quiz
3 15 Things All Shakespeare Characters Knoweth To Be True
4 Did Shakespeare Create These Common Phrases — True Or False?
5 17 Historical Fiction Books That Will Immerse You In A Different Era
6 Author Emily Giffin Shared A Negative Rant About Meghan Markle And Now She's Being Called Out For Racism And Mom-Shaming
7 Only 17% Of People Can Get An "A" On This Eighth-Grade Vocab Quiz
8 Let's See If You Would Choose To Save The Same "Harry Potter" Characters As Everyone Else
9 20 Unpopular Opinion Polls About "Harry Potter" That Might Get You Heated
10 How Do Your YA Book Vs. Movie Opinions Stack Up Against Everyone Else's?
11 32 Book Adaptations You Can Stream On Netflix Right Now
12 19 Books 

Given that information, what is something we can do with this data? Why not make it a dataframe?

In [16]:
df = pd.DataFrame(feed.entries)

In [17]:
df.head()

|  | guidislink | href | id | link | links | media_thumbnail | published | published_parsed | summary | summary_detail | title | title_detail |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False |  | https://www.buzzfeed.com/clareaston/are-you-ly... | https://www.buzzfeed.com/clareaston/are-you-ly... | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `[{'url': 'https://img.buzzfeed.com/buzzfeed-st...` | Thu, 14 May 2020 01:29:39 -0400 | `(2020, 5, 14, 5, 29, 39, 3, 135, 0)` | `<h1>The Brookfield/Ashbury war is starting up ...` | `{'type': 'text/html', 'language': None, 'base'...` | Everyone Is Either Lydia, Cassie Or Emily From... | `{'type': 'text/plain', 'language': None, 'base...` |
| 1 | False |  | https://www.buzzfeed.com/farrahpenn/someone-on... | https://www.buzzfeed.com/farrahpenn/someone-on... | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `[{'url': 'https://img.buzzfeed.com/buzzfeed-st...` | Thu, 14 May 2020 00:25:33 -0400 | `(2020, 5, 14, 4, 25, 33, 3, 135, 0)` | `<h1>"He chucked blackly."</h1><p><img src="htt...` | `{'type': 'text/html', 'language': None, 'base'...` | Someone Pointed Out How Many Times Edward Cull... | `{'type': 'text/plain', 'language': None, 'base...` |
| 2 | False |  | https://www.buzzfeed.com/noradominick/harry-po... | https://www.buzzfeed.com/noradominick/harry-po... | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `[{'url': 'https://img.buzzfeed.com/buzzfeed-st...` | Thu, 14 May 2020 01:25:27 -0400 | `(2020, 5, 14, 5, 25, 27, 3, 135, 0)` | `<h1>"I solemnly swear that I am up to no good....` | `{'type': 'text/html', 'language': None, 'base'...` | Only Someone Who Has Read "Harry Potter And Th... | `{'type': 'text/plain', 'language': None, 'base...` |
| 3 | False |  | https://www.buzzfeed.com/hanifahrahman/do-u-bi... | https://www.buzzfeed.com/hanifahrahman/do-u-bi... | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `[{'url': 'https://img.buzzfeed.com/buzzfeed-st...` | Tue, 12 May 2020 12:34:41 -0400 | `(2020, 5, 12, 16, 34, 41, 1, 133, 0)` | `<h1>If thou knows, thou knows.</h1><p><img src...` | `{'type': 'text/html', 'language': None, 'base'...` | 15 Things All Shakespeare Characters Knoweth T... | `{'type': 'text/plain', 'language': None, 'base...` |
| 4 | False |  | https://www.buzzfeed.com/crystalro/shakespeare... | https://www.buzzfeed.com/crystalro/shakespeare... | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `[{'url': 'https://img.buzzfeed.com/buzzfeed-st...` | Wed, 13 May 2020 00:38:11 -0400 | `(2020, 5, 13, 4, 38, 11, 2, 134, 0)` | `<h1>To guess or not to guess, that is the ques...` | `{'type': 'text/html', 'language': None, 'base'...` | Did Shakespeare Create These Common Phrases — ... | `{'type': 'text/plain', 'language': None, 'base...` |
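The full dataframe carries many nested columns, so a common next step is to keep only a few fields and convert the published date to a real timestamp. The sketch below shows one way to do that. The two rows are invented stand-ins for `feed.entries` (the second deliberately lacks a `published` value), the column selection is only an example, and the date format string assumes the RFC 822 style seen in the output above.

```python
import pandas as pd

# Invented rows shaped like feed.entries, trimmed to a few fields.
entries = [
    {"title": "First", "link": "https://example.com/1",
     "published": "Thu, 14 May 2020 01:29:39 -0400"},
    {"title": "Second", "link": "https://example.com/2"},
]

columns = ["title", "link", "published"]

# .get() returns None for missing elements instead of raising KeyError.
rows = [{col: entry.get(col) for col in columns} for entry in entries]
tidy = pd.DataFrame(rows, columns=columns)

# Convert the date strings to timezone-aware timestamps in UTC;
# missing values become NaT.
tidy["published"] = pd.to_datetime(
    tidy["published"], format="%a, %d %b %Y %H:%M:%S %z", utc=True
)

print(tidy)
```

Replacing `entries` with `feed.entries` should give the same shape of result for the real feed, since each entry supports `.get()` like a dictionary.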
