## RSS

You can find RSS feeds on many different sites. The [Library of Congress](https://www.loc.gov/rss/) has a lot. Most blogs and news websites have them, for example [TechCrunch](https://techcrunch.com/rssfeeds/), the [New York Times](http://www.nytimes.com/services/xml/rss/index.html), and [NPR](https://help.npr.org/customer/portal/articles/2094175-where-can-i-find-npr-rss-feeds-). The [DC Public Library](http://www.dclibrary.org/) even gives you an RSS feed of your [catalog searches](https://catalog.dclibrary.org/client/rss/hitlist/dcpl/qu=python). iTunes delivers podcasts by [aggregating RSS feeds](http://itunespartner.apple.com/en/podcasts/faq) from content creators.

Today we are going to take a look at a Netflix feed. The lesson refers to the [Netflix Top 100 DVDs](https://dvd.netflix.com/RSSFeeds), but the code below reads the unofficial New On Netflix USA feed instead. We will use the Python package [FeedParser](https://pypi.python.org/pypi/feedparser) to work with the RSS feed. FeedParser will allow us to deconstruct the data in the feed.

In [1]:
import feedparser
import pandas as pd

In [38]:
RSS_URL = "https://usa.newonnetflix.info/feed"  # "http://dvd.netflix.com/Top100RSS"

In [39]:
feed = feedparser.parse(RSS_URL)

In [40]:
type(feed)

feedparser.util.FeedParserDict

"parse" is the primary function in FeedParser. The returned object is dictionary-like and can be handled much like a dictionary. For example, we can look at the keys it contains and what types of values those keys hold.

In [41]:
feed.keys()

dict_keys(['bozo', 'entries', 'feed', 'headers', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

In [42]:
type(feed.bozo)

bool

In [43]:
type(feed.feed)

feedparser.util.FeedParserDict

We will look at some, but not all, of the data stored in the feed. For more information about the keys, see the [documentation](http://pythonhosted.org/feedparser/).

We can use the version to check which type of feed we have.

In [44]:
feed.version

'rss20'

Bozo is an interesting key to know about if you are going to parse an RSS feed in code. FeedParser sets the bozo bit when it detects that a feed is not well-formed. (FeedParser will still parse a feed that is not well-formed.) You can use the bozo bit to drive error handling or just to print a simple warning.

In [45]:
# bozo is falsy when the feed parsed cleanly
if not feed.bozo:
    print("Well done, you have a well-formed feed!")
else:
    print("Potential trouble ahead.")

Well done, you have a well-formed feed!


We can look at some of the feed elements through the feed attribute.

In [46]:
feed.feed.keys()

dict_keys(['webfeeds_analytics', 'title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'language', 'published', 'published_parsed', 'updated', 'updated_parsed', 'authors', 'author', 'author_detail', 'publisher', 'publisher_detail'])

In [47]:
print(feed.feed.title)
print(feed.feed.link)
print(feed.feed.description)

New On Netflix USA
https://usa.newonnetflix.info
RSS feed for new additions over the last 5 days to Netflix USA (100% unofficial!). A project by MaFt.co.uk


The [reference section](http://pythonhosted.org/feedparser/reference.html) of the feedparser documentation shows us all the information that can be in a feed. [Annotated Examples](http://pythonhosted.org/feedparser/annotated-examples.html) are also provided. But note the caution the documentation gives:

"Caution: Even though many of these elements are required according to the specification, real-world feeds may be missing any element. If an element is not present in the feed, it will not be present in the parsed results. You should not rely on any particular element being present."

For example, our feed is RSS 2.0. One of the elements available in this version is the published date.

In [48]:
feed.feed.published

'Thu, 07 Oct 2021 11:07:08 -0400'

The call returns a date, so our feed does provide 'published'. Had the element been missing, this attribute access would have raised an AttributeError instead.

As with [standard Python dictionaries](https://docs.python.org/3.5/library/stdtypes.html#dict), we can use the "get" method to look up a key and fall back to a default value when the key does not exist. This is useful if we are writing code.

In [49]:
feed.feed.get('published', 'N/A')

'Thu, 07 Oct 2021 11:07:08 -0400'

The data we are looking for are contained in the entries. Given the feed we are working with, how many entries do you think we have?

In [50]:
len(feed.entries)

23

The items in entries are stored as a list.

In [51]:
type(feed.entries)

list

In [52]:
feed.entries[0].title

'7th Oct: Thalaivii (2021), 2hr 28m [TV-PG] (6/10)'

In [53]:
# enumerate() supplies the index, so no manual counter is needed
for i, entry in enumerate(feed.entries):
    print(i, entry.title)

0 7th Oct: Thalaivii (2021), 2hr 28m [TV-PG] (6/10)
1 7th Oct: Encounters (2019), 1 Season [TV-PG] (6/10)
2 7th Oct: Making The Billion Dollar Code (2021), 28m [TV-PG] (6/10)
3 7th Oct: The Ingenuity of the Househusband (2021), 1 Season [TV-G] - New Episodes (6.35/10)
4 7th Oct: The Billion Dollar Code (2021), 1 Season [TV-MA] (6/10)
5 7th Oct: The Way of the Househusband (2021), 1 Season [TV-MA] - New Episodes (6.65/10)
6 7th Oct: Sexy Beasts (2021), 2 Seasons [TV-14] - New Episodes (5.3/10)
7 6th Oct: Ella Fitzgerald: Just One of Those Things (2019), 1hr 29m [TV-14] (6.4/10)
8 6th Oct: Bad Sport (2021), 1 Volume [TV-MA] (6/10)
9 6th Oct: Baking Impossible (2021), 1 Season [TV-PG] (6/10)
10 6th Oct: The Five Juanas (2021), 1 Season [TV-MA] (6/10)
11 6th Oct: Love Is Blind: Brazil (2021), 1 Season [TV-MA] (6/10)
12 6th Oct: There's Someone Inside Your House (2021), 1hr 36m [TV-MA] (6/10)
13 6th Oct: The Blacklist (2021), 8 Seasons [TV-14] - New Episodes (7/10)
14 5th Oct: Remember You 

Given that information, what is something we can do with this data? Why not make it a dataframe?

In [54]:
df = pd.DataFrame(feed.entries)

In [55]:
df.head()

| | title | title_detail | links | link | summary | summary_detail | published | published_parsed | id | guidislink |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7th Oct: Thalaivii (2021), 2hr 28m [TV-PG] (6/10) | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://usa.newonnetflix.info/info/81220676 | This biopic charts the life of actor-turned-ch... | {'type': 'text/html', 'language': None, 'base'... | Thu, 07 Oct 2021 11:07:08 -0400 | (2021, 10, 7, 15, 7, 8, 3, 280, 0) | https://usa.newonnetflix.info/info/81220676 | False |
| 1 | 7th Oct: Encounters (2019), 1 Season [TV-PG] (... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://usa.newonnetflix.info/info/81453806 | To please her mother, a woman sets out to find... | {'type': 'text/html', 'language': None, 'base'... | Thu, 07 Oct 2021 01:07:09 -0400 | (2021, 10, 7, 5, 7, 9, 3, 280, 0) | https://usa.newonnetflix.info/info/81453806 | False |
| 2 | 7th Oct: Making The Billion Dollar Code (2021)... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://usa.newonnetflix.info/info/81503864 | In this featurette, ART + COM members join the... | {'type': 'text/html', 'language': None, 'base'... | Thu, 07 Oct 2021 01:07:09 -0400 | (2021, 10, 7, 5, 7, 9, 3, 280, 0) | https://usa.newonnetflix.info/info/81503864 | False |
| 3 | 7th Oct: The Ingenuity of the Househusband (20... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://usa.newonnetflix.info/info/81389585 | [New Episodes] A tough guy with a knack for ho... | {'type': 'text/html', 'language': None, 'base'... | Wed, 06 Oct 2021 22:15:46 -0400 | (2021, 10, 7, 2, 15, 46, 3, 280, 0) | https://usa.newonnetflix.info/info/81389585 | False |
| 4 | 7th Oct: The Billion Dollar Code (2021), 1 Sea... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://usa.newonnetflix.info/info/81074012 | In 1990s Berlin, an artist and a hacker invent... | {'type': 'text/html', 'language': None, 'base'... | Wed, 06 Oct 2021 22:07:27 -0400 | (2021, 10, 7, 2, 7, 27, 3, 280, 0) | https://usa.newonnetflix.info/info/81074012 | False |
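
Several of these columns hold nested dictionaries or lists, which are awkward to read in a table. One possible next step, sketched below, is to keep only the flat columns and turn the date strings into real datetimes. To stay runnable offline, the example uses a hand-written list of two dictionaries (values copied from the output above) in place of `feed.entries`. The names `entries` and `tidy` are only illustrative.

```python
from email.utils import parsedate_to_datetime

import pandas as pd

# Stand-in for feed.entries: two rows copied from the output above.
entries = [
    {
        "title": "7th Oct: Thalaivii (2021), 2hr 28m [TV-PG] (6/10)",
        "link": "https://usa.newonnetflix.info/info/81220676",
        "published": "Thu, 07 Oct 2021 11:07:08 -0400",
        "guidislink": False,
    },
    {
        "title": "7th Oct: Encounters (2019), 1 Season [TV-PG] (6/10)",
        "link": "https://usa.newonnetflix.info/info/81453806",
        "published": "Thu, 07 Oct 2021 01:07:09 -0400",
        "guidislink": False,
    },
]

df = pd.DataFrame(entries)

# Keep only the columns that read well in a table.
tidy = df[["title", "link", "published"]].copy()

# Parse the RFC 822 style date strings into timezone-aware datetimes.
tidy["published"] = tidy["published"].map(parsedate_to_datetime)
print(tidy)
```

The standard library's `parsedate_to_datetime` is used here because RSS dates follow the email date format. `pd.to_datetime` may work as well, but that is not assumed in this sketch.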


Challenge: write code to create a dataframe of the top 10 movies from the Netflix Top 100 DVDs and iTunes. Check to see if your feed is well-formed. Compile the name of the feed as the source, the published date, the movie ranking in the list, the movie title, a link to the movie, and the summary. If the published date does not exist in the feed, use the current date. Save your dataframe as a CSV. Here is a link to one [possible solution](./rss_challenge.py).