## RSS

You can find RSS feeds on many different sites. The [Library of Congress](https://www.loc.gov/rss/) publishes many. Most blogs and news websites offer them, for example [TechCrunch](https://techcrunch.com/rssfeeds/), the [New York Times](http://www.nytimes.com/services/xml/rss/index.html), and [NPR](https://help.npr.org/customer/portal/articles/2094175-where-can-i-find-npr-rss-feeds-). The [DC Public Library](http://www.dclibrary.org/) even gives you an RSS feed of your [catalog searches](https://catalog.dclibrary.org/client/rss/hitlist/dcpl/qu=python). iTunes delivers podcasts by [aggregating RSS feeds](http://itunespartner.apple.com/en/podcasts/faq) from content creators.

Today we are going to take a look at a Netflix feed. The [Netflix Top 100 DVDs](https://dvd.netflix.com/RSSFeeds) feed is one option; the cells below use the unofficial New On Netflix USA feed. We will use the Python package [FeedParser](https://pypi.python.org/pypi/feedparser) to work with the RSS feed. FeedParser will allow us to deconstruct the data in the feed.

In [1]:
import feedparser
import pandas as pd

In [2]:
RSS_URL = "https://usa.newonnetflix.info/feed"

In [3]:
feed = feedparser.parse(RSS_URL)

In [4]:
type(feed)

feedparser.util.FeedParserDict

"parse" is the primary function in FeedParser. The returned object is dictionary-like and can be handled much like a dictionary. For example, we can look at the keys it contains and the type of value stored under each key.

In [5]:
feed.keys()

dict_keys(['bozo', 'entries', 'feed', 'headers', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

In [6]:
type(feed.bozo)

bool

In [7]:
type(feed.feed)

feedparser.util.FeedParserDict

We will look at some, but not all, of the data stored in the feed. For more information about the keys, see the [documentation](http://pythonhosted.org/feedparser/).

We can use the version to check which type of feed we have.

In [8]:
feed.version

'rss20'

Bozo is an interesting key to know about if you are going to parse an RSS feed in code. FeedParser sets the bozo bit when it detects that a feed is not well-formed. (FeedParser will still try to parse a feed that is not well-formed.) You can use the bozo bit to drive error handling or just to print a simple warning.

In [9]:
if feed.bozo == 0:
    print("Well done, you have a well-formed feed!")
else:
    print("Potential trouble ahead.")

Well done, you have a well-formed feed!


We can look at some of the feed elements through the feed attribute.

In [10]:
feed.feed.keys()

dict_keys(['webfeeds_analytics', 'title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'language', 'published', 'published_parsed', 'updated', 'updated_parsed', 'authors', 'author', 'author_detail', 'publisher', 'publisher_detail'])

In [11]:
print(feed.feed.title)
print(feed.feed.link)
print(feed.feed.description)

New On Netflix USA
https://usa.newonnetflix.info
RSS feed for new additions over the last 5 days to Netflix USA (100% unofficial!). A project by MaFt.co.uk


The [reference section](http://pythonhosted.org/feedparser/reference.html) of the feedparser documentation shows us all the information that can be in a feed. [Annotated Examples](http://pythonhosted.org/feedparser/annotated-examples.html) are also provided. But note the caution it gives:

"Caution: Even though many of these elements are required according to the specification, real-world feeds may be missing any element. If an element is not present in the feed, it will not be present in the parsed results. You should not rely on any particular element being present."

For example, our feed is RSS 2.0. One of the elements available in this version is the published date.

In [17]:
feed.feed.published

'Wed, 18 Aug 2021 01:07:08 -0400'

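This feed provides 'published', so the lookup succeeds. If the element were missing, attribute access would raise an AttributeError instead of returning a value.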

As with [standard Python dictionaries](https://docs.python.org/3.5/library/stdtypes.html#dict), we can use the "get" method to look up a key and return a default value when the key does not exist. This is useful if we are writing code.

In [15]:
feed.feed.get('published', 'N/A')

'Wed, 18 Aug 2021 01:07:08 -0400'
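
The same "get" pattern can supply a fallback date when a feed has no published element. The sketch below uses plain dictionaries as stand-ins for `feed.feed`, and `published_or_now` is a hypothetical helper name. It assumes the date string follows the RFC 2822 style shown above, which the standard library can parse.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def published_or_now(feed_meta):
    """Return the published date as a datetime, or the current UTC time."""
    raw = feed_meta.get("published")
    if raw is None:
        # Element missing from the feed: fall back to the current date.
        return datetime.now(timezone.utc)
    try:
        return parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Unparseable date string: fall back as well.
        return datetime.now(timezone.utc)


# Plain dicts standing in for feed.feed in this illustration.
with_date = {"published": "Wed, 18 Aug 2021 01:07:08 -0400"}
without_date = {}

print(published_or_now(with_date).isoformat())
print(published_or_now(without_date).isoformat())
```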

The data we are looking for are contained in the entries. Given the feed we are working with, how many entries do you think we have?

In [18]:
len(feed.entries)

26

The items in entries are stored as a list.

In [19]:
type(feed.entries)

list

In [20]:
feed.entries[0].title

'18th Aug: Nneka The Pretty Serpent (2020), 2hr 20m [TV-14] (6/10)'

In [None]:
for i, entry in enumerate(feed.entries):
    print(i, entry.title)

Given that information, what is something we can do with this data? Why not make it a dataframe?

In [22]:
df = pd.DataFrame(feed.entries)

In [23]:
df.head()

Unnamed: 0,title,title_detail,links,link,summary,summary_detail,published,published_parsed,id,guidislink
0,"18th Aug: Nneka The Pretty Serpent (2020), 2hr...","{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",https://usa.newonnetflix.info/info/81435734,Years after the mysterious murder of her paren...,"{'type': 'text/html', 'language': None, 'base'...","Wed, 18 Aug 2021 01:07:08 -0400","(2021, 8, 18, 5, 7, 8, 2, 230, 0)",https://usa.newonnetflix.info/info/81435734,False
1,"18th Aug: Pahuna (2018), 1hr 22m [TV-PG] - Str...","{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",https://usa.newonnetflix.info/info/81045052,[Streaming Again] Fleeing unrest in their nati...,"{'type': 'text/html', 'language': None, 'base'...","Tue, 17 Aug 2021 22:07:19 -0400","(2021, 8, 18, 2, 7, 19, 2, 230, 0)",https://usa.newonnetflix.info/info/81045052,False
2,"18th Aug: Black Island (2021), 1hr 44m [TV-MA]...","{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",https://usa.newonnetflix.info/info/81170838,The dark secrets of a seemingly peaceful islan...,"{'type': 'text/html', 'language': None, 'base'...","Tue, 17 Aug 2021 22:07:08 -0400","(2021, 8, 18, 2, 7, 8, 2, 230, 0)",https://usa.newonnetflix.info/info/81170838,False
3,"18th Aug: The Defeated (2020), 1 Season [TV-MA...","{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",https://usa.newonnetflix.info/info/81424068,"In 1946 Berlin, an American cop searches for h...","{'type': 'text/html', 'language': None, 'base'...","Tue, 17 Aug 2021 22:07:08 -0400","(2021, 8, 18, 2, 7, 8, 2, 230, 0)",https://usa.newonnetflix.info/info/81424068,False
4,18th Aug: Memories of a Murderer: The Nilsen T...,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",https://usa.newonnetflix.info/info/81097791,Serial killer Dennis Nilsen narrates his life ...,"{'type': 'text/html', 'language': None, 'base'...","Tue, 17 Aug 2021 22:07:08 -0400","(2021, 8, 18, 2, 7, 8, 2, 230, 0)",https://usa.newonnetflix.info/info/81097791,False


Challenge: write code to create a dataframe of the top 10 movies from the Netflix Top 100 DVDs and iTunes. Check to see if your feed is well-formed. Compile the name of the feed as the source, the published date, the movie ranking in the list, the movie title, a link to the movie, and the summary. If the published date does not exist in the feed, use the current date. Save your dataframe as a CSV. Here is a link to one [possible solution](./rss_challenge.py).

In [24]:
if feed.bozo == 0:
    print("well formed")
else:
    print(":(")

well formed


In [25]:
print(feed.entries)

[{'title': '18th Aug: Nneka The Pretty Serpent (2020), 2hr 20m [TV-14] (6/10)', 'title_detail': {'type': 'text/plain', 'language': None, 'base': 'https://usa.newonnetflix.info/feed', 'value': '18th Aug: Nneka The Pretty Serpent (2020), 2hr 20m [TV-14] (6/10)'}, 'links': [{'rel': 'alternate', 'type': 'text/html', 'href': 'https://usa.newonnetflix.info/info/81435734'}], 'link': 'https://usa.newonnetflix.info/info/81435734', 'summary': 'Years after the mysterious murder of her parents, a traumatized woman gains supernatural powers that aid in her quest for revenge against the killers.<br /><img src="https://occ-0-3419-3418.1.nflxso.net/dnm/api/v6/7e0PTVDdJ65eumyzagWiJKiw6MU/AAAABT97gR7HXuq53m2BTmHPnYTJ46_MWIbJvLKSPtMDO55EJmAma6C5mLqixkG4eGlWBvM9OctGwy4gF49l4OOxdLuGEw6a.jpg?r=5a5" />', 'summary_detail': {'type': 'text/html', 'language': None, 'base': 'https://usa.newonnetflix.info/feed', 'value': 'Years after the mysterious murder of her parents, a traumatized woman gains supernatural powe

In [35]:
print(feed.entries[0])

{'title': '18th Aug: Nneka The Pretty Serpent (2020), 2hr 20m [TV-14] (6/10)', 'title_detail': {'type': 'text/plain', 'language': None, 'base': 'https://usa.newonnetflix.info/feed', 'value': '18th Aug: Nneka The Pretty Serpent (2020), 2hr 20m [TV-14] (6/10)'}, 'links': [{'rel': 'alternate', 'type': 'text/html', 'href': 'https://usa.newonnetflix.info/info/81435734'}], 'link': 'https://usa.newonnetflix.info/info/81435734', 'summary': 'Years after the mysterious murder of her parents, a traumatized woman gains supernatural powers that aid in her quest for revenge against the killers.<br /><img src="https://occ-0-3419-3418.1.nflxso.net/dnm/api/v6/7e0PTVDdJ65eumyzagWiJKiw6MU/AAAABT97gR7HXuq53m2BTmHPnYTJ46_MWIbJvLKSPtMDO55EJmAma6C5mLqixkG4eGlWBvM9OctGwy4gF49l4OOxdLuGEw6a.jpg?r=5a5" />', 'summary_detail': {'type': 'text/html', 'language': None, 'base': 'https://usa.newonnetflix.info/feed', 'value': 'Years after the mysterious murder of her parents, a traumatized woman gains supernatural power
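
The summary values in this feed carry HTML markup (a `<br />` and an `<img>` tag) after the text. If only the text is wanted in the dataframe, one option is to strip the tags with the standard library. This is a rough sketch; `strip_html` is a hypothetical helper, and the sample string is shortened and uses a placeholder image URL.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only the text content of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    """Return the text of an HTML fragment with all tags removed."""
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts).strip()


sample_summary = 'A traumatized woman gains supernatural powers.<br /><img src="https://example.com/poster.jpg" />'
print(strip_html(sample_summary))
```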

In [37]:
_keys=['id', 'published', 'title', 'link', 'summary']

In [49]:
# Keep one dictionary per entry, restricted to the keys in _keys.
# entry.get(k) returns None when a key is missing from an entry.
_df_list = [{k: entry.get(k) for k in _keys} for entry in feed.entries]

In [50]:
_df = pd.DataFrame(_df_list)

In [51]:
_df.shape

(26, 5)

In [56]:
_df.to_csv(r'C:\Users\maggi\01Jupyter\pycon_2017-master\rss_challenge_output.csv')

In [57]:
import os
print(os.path.exists(r'C:\Users\maggi\01Jupyter\pycon_2017-master\rss_challenge_output.csv'))

True
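
The path above only exists on one machine. A more portable variant, sketched here under the assumption that a temporary directory is an acceptable place for the output, builds the path with `pathlib`. The rows are made-up stand-ins for the challenge dataframe. By default `to_csv` also writes the index as an unnamed first column; passing `index=False` leaves it out.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up rows standing in for the challenge dataframe.
rows = [
    {"title": "Example A", "link": "https://example.com/a"},
    {"title": "Example B", "link": "https://example.com/b"},
]

# Build the output path from parts instead of hard-coding a machine-specific string.
out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / "rss_challenge_output.csv"

# index=False keeps the row index out of the file.
pd.DataFrame(rows).to_csv(out_path, index=False)

# Read the file back to confirm what was written.
round_trip = pd.read_csv(out_path)
print(out_path.exists())
print(list(round_trip.columns))
```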
