In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

# RSS

You can find RSS feeds on many different sites. The [Library of Congress](https://www.loc.gov/rss/) publishes many of them. Most blogs and news websites have them, for example [Tech Crunch](https://techcrunch.com/rssfeeds/), [New York Times](http://www.nytimes.com/services/xml/rss/index.html), and [NPR](https://help.npr.org/customer/portal/articles/2094175-where-can-i-find-npr-rss-feeds-). The [DC Public Library](http://www.dclibrary.org/) even gives you an RSS feed of your [catalog searches](https://catalog.dclibrary.org/client/rss/hitlist/dcpl/qu=python). iTunes delivers podcasts by [aggregating RSS feeds](http://itunespartner.apple.com/en/podcasts/faq) from content creators.

The original version of this notebook used the [Netflix Top 100 DVDs](https://dvd.netflix.com/RSSFeeds) feed, which no longer exists, so today we will look at the New York Times Science feed instead. We will use the Python package [FeedParser](https://pypi.python.org/pypi/feedparser) to work with the RSS feed. FeedParser lets us break the feed down into its individual data elements.

### NYTimes feeds
https://blog.feedspot.com/nytimes_rss_feeds/

In [2]:
import feedparser
import pandas as pd

In [3]:
# next line is original, which no longer exists
#RSS_URL = "http://dvd.netflix.com/Top100RSS"

RSS_URL = "https://rss.nytimes.com/services/xml/rss/nyt/Science.xml"

In [4]:
feed = feedparser.parse(RSS_URL)

In [5]:
type(feed)

feedparser.FeedParserDict

"parse" is the primary function in FeedParser. The object it returns is dictionary-like and can be handled much like a dictionary. For example, we can look at the keys it contains and the type of value stored under each key.

In [6]:
feed.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'href', 'status', 'encoding', 'version', 'namespaces'])

In [7]:
type(feed.bozo)

int

In [8]:
type(feed.feed)

feedparser.FeedParserDict

We will look at some, but not all, of the data stored in the feed. For more information about the keys, see the [documentation](http://pythonhosted.org/feedparser/).

We can use the version to check which type of feed we have.

In [9]:
feed.version

'rss20'
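The version string can also drive a simple branch in code. The sketch below groups version strings into broad families; `feed_family` is a hypothetical helper name, and it assumes FeedParser's convention of version strings that start with `rss` or `atom`, with an empty string when the format could not be determined.

```python
def feed_family(version):
    """Map a feedparser version string to a broad feed family."""
    if version.startswith('rss'):
        return 'RSS'
    if version.startswith('atom'):
        return 'Atom'
    if version == 'cdf':
        return 'CDF'
    # feedparser reports an empty string when it cannot tell the format.
    return 'unknown'


print(feed_family('rss20'))
```

With a parsed feed, this would be called as `feed_family(feed.version)`.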

Bozo is an interesting key to know about if you are going to parse an RSS feed in code. FeedParser sets the bozo bit when it detects that a feed is not well-formed. (FeedParser will still parse the feed if it is not well-formed.) You can use the bozo bit to trigger error handling or just print a simple warning.

In [10]:
if feed.bozo == 0:
    print("Well done, you have a well-formed feed!")
else:
    print("Potential trouble ahead.")

Well done, you have a well-formed feed!
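When the bozo bit is set, FeedParser also stores the underlying exception under the `bozo_exception` key, which makes a more informative warning possible. The following is a minimal sketch: `feed_health` is a hypothetical helper name, and the plain dictionaries are stand-ins for the result of `feedparser.parse` so that the example runs without a network connection.

```python
def feed_health(parsed):
    """Return a short status message for a feedparser-style result.

    `parsed` is any dictionary-like object with a 'bozo' key, such as
    the value returned by feedparser.parse().
    """
    if not parsed.get('bozo'):
        return 'well-formed'
    # feedparser stores the parsing error under 'bozo_exception'.
    reason = parsed.get('bozo_exception', 'no details available')
    return f'not well-formed: {reason}'


# Plain dictionaries used as stand-ins for parsed feeds (assumption).
print(feed_health({'bozo': 0}))
print(feed_health({'bozo': 1, 'bozo_exception': 'mismatched tag'}))
```

With the feed from this notebook, the call would be `feed_health(feed)`.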


### We can look at some of the feed elements through the feed attribute.

In [11]:
feed.feed

{'title': 'NYT > Science',
 'title_detail': {'type': 'text/plain',
  'language': None,
  'base': 'https://rss.nytimes.com/services/xml/rss/nyt/Science.xml',
  'value': 'NYT > Science'},
 'links': [{'rel': 'alternate',
   'type': 'text/html',
   'href': 'https://www.nytimes.com/section/science'},
  {'href': 'https://rss.nytimes.com/services/xml/rss/nyt/Science.xml',
   'rel': 'self',
   'type': 'application/rss+xml'}],
 'link': 'https://www.nytimes.com/section/science',
 'subtitle': '',
 'subtitle_detail': {'type': 'text/html',
  'language': None,
  'base': 'https://rss.nytimes.com/services/xml/rss/nyt/Science.xml',
  'value': ''},
 'language': 'en-us',
 'rights': 'Copyright 2020 The New York Times Company',
 'rights_detail': {'type': 'text/plain',
  'language': None,
  'base': 'https://rss.nytimes.com/services/xml/rss/nyt/Science.xml',
  'value': 'Copyright 2020 The New York Times Company'},
 'updated': 'Thu, 10 Sep 2020 11:46:06 +0000',
 'updated_parsed': time.struct_time(tm_year=2020

In [12]:
print(feed.feed.title)
print(feed.feed.link)
print(feed.feed.links[1].href)
print(feed.feed.updated)

NYT > Science
https://www.nytimes.com/section/science
https://rss.nytimes.com/services/xml/rss/nyt/Science.xml
Thu, 10 Sep 2020 11:46:06 +0000


The [reference section](http://pythonhosted.org/feedparser/reference.html) of the feedparser documentation shows all the information that can be in a feed. [Annotated Examples](http://pythonhosted.org/feedparser/annotated-examples.html) are also provided. But note the caution it gives:

"Caution: Even though many of these elements are required according to the specification, real-world feeds may be missing any element. If an element is not present in the feed, it will not be present in the parsed results. You should not rely on any particular element being present."

For example, our feed is RSS 2.0. One of the elements available in this version is the published date. Another is contributors.

In [13]:
feed.feed.published

'Thu, 10 Sep 2020 11:46:06 +0000'

In [14]:
feed.feed.contributors

AttributeError: object has no attribute 'contributors'

The error shows that our feed does not include 'contributors'.

As with [standard Python dictionaries](https://docs.python.org/3.5/library/stdtypes.html#dict), we can use the "get" method to check whether a key exists and to supply a default value when it does not. This is useful in code that must keep running when an element is missing.

In [15]:
feed.feed.get('contributors', 'N/A')

'N/A'

### The data we are looking for are contained in the entries.
Given the feed we are working with, how many entries do you think we have?

In [16]:
len(feed.entries)

36

The items in entries are stored as a list.

In [17]:
type(feed.entries)

list

In [18]:
feed.entries[0]

{'title': 'These Hummingbirds Take Extreme Naps. Some May Even Hibernate.',
 'title_detail': {'type': 'text/plain',
  'language': None,
  'base': 'https://rss.nytimes.com/services/xml/rss/nyt/Science.xml',
  'value': 'These Hummingbirds Take Extreme Naps. Some May Even Hibernate.'},
 'links': [{'rel': 'alternate',
   'type': 'text/html',
   'href': 'https://www.nytimes.com/2020/09/08/science/hummingbirds-torpor-hibernation.html'},
  {'href': 'https://www.nytimes.com/2020/09/08/science/hummingbirds-torpor-hibernation.html',
   'rel': 'standout',
   'type': 'text/html'}],
 'link': 'https://www.nytimes.com/2020/09/08/science/hummingbirds-torpor-hibernation.html',
 'id': 'https://www.nytimes.com/2020/09/08/science/hummingbirds-torpor-hibernation.html',
 'guidislink': False,
 'summary': 'To adapt to life in the Andes Mountains, some South American species go into exceptionally deep torpor to save energy.',
 'summary_detail': {'type': 'text/html',
  'language': None,
  'base': 'https://rss.n

In [51]:
feed.entries[0].title
feed.entries[0].summary
feed.entries[0].authors[0].name
feed.entries[0].published

'These Hummingbirds Take Extreme Naps. Some May Even Hibernate.'

'To adapt to life in the Andes Mountains, some South American species go into exceptionally deep torpor to save energy.'

'Veronique Greenwood'

'Tue, 08 Sep 2020 23:01:21 +0000'

In [43]:
feed.entries[11].title
feed.entries[11].summary
#feed.entries[11].authors[0].name

temp = feed.entries[11].get('authors', 'unknown')
if temp != 'unknown':
    author = temp[0].name
else:
    author = 'unknown'
author

'Covid-19 News: Live Updates'

'The measure, which Democrats see as inadequate, is likely to fail.'

'unknown'

In [40]:
# entry.authors raises AttributeError for any entry without an 'authors' key
for i, entry in enumerate(feed.entries):
    print(i, entry.title, entry.authors[0].name)

0 These Hummingbirds Take Extreme Naps. Some May Even Hibernate. Veronique Greenwood
1 When These Sea Anemones Eat, It Goes Straight to Their Arms Cara Giaimo
2 Old Male Elephants: Don’t Count Them Out Rachel Nuwer
3 How a Praying Mantis Says ‘Boo!’ Cara Giaimo
4 Up Is Down in This Fun Physics Experiment Kenneth Chang
5 A Turtle With a Permanent Smile Was Brought Back From Near Extinction Rachel Nuwer
6 These Black Holes Shouldn’t Exist, but There They Are Dennis Overbye
7 Melting Glaciers Are Filling Unstable Lakes. And They’re Growing. Katherine Kornei
8 DIY Coronavirus Vaccines? These Scientists Are Giving Themselves Their Own Heather Murphy
9 Inquiry Begins Into AstraZeneca's Vaccine Trial Katherine J. Wu
10 Wildfires Are Worsening. The Way We Manage Them Isn’t Keeping Pace. Brad Plumer and John Schwartz


AttributeError: object has no attribute 'authors'
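The loop stops at entry 11 because that entry has no 'authors' key. One way to keep the loop running is to look the author up with "get" and fall back to a default. The sketch below uses a hypothetical helper, `author_name`, and two plain dictionaries as stand-ins for feed entries; the same function should work on real entries, since they are dictionary-like.

```python
def author_name(entry, default='unknown'):
    """Return the first author's name, or `default` if none is listed."""
    authors = entry.get('authors')
    if not authors:
        return default
    return authors[0].get('name', default)


# Stand-in entries (assumption): one with an author, one without.
entries = [
    {'title': 'These Hummingbirds Take Extreme Naps. Some May Even Hibernate.',
     'authors': [{'name': 'Veronique Greenwood'}]},
    {'title': 'Covid-19 News: Live Updates'},
]

for i, entry in enumerate(entries):
    print(i, entry['title'], author_name(entry))
```

Replacing `entries` with `feed.entries` would print every entry in the feed, with 'unknown' for those that list no author.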

### Given that information, what is something we can do with this data? Why not make it a dataframe?

In [21]:
df = pd.DataFrame(feed.entries)

In [22]:
df.columns
df.shape

Index(['title', 'title_detail', 'links', 'link', 'id', 'guidislink', 'summary',
       'summary_detail', 'authors', 'author', 'author_detail', 'published',
       'published_parsed', 'tags', 'media_content', 'media_credit', 'credit',
       'content'],
      dtype='object')

(36, 18)

In [23]:
df.head()

Unnamed: 0,title,title_detail,links,link,id,guidislink,summary,summary_detail,authors,author,author_detail,published,published_parsed,tags,media_content,media_credit,credit,content
0,These Hummingbirds Take Extreme Naps. Some May...,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",https://www.nytimes.com/2020/09/08/science/hum...,https://www.nytimes.com/2020/09/08/science/hum...,False,"To adapt to life in the Andes Mountains, some ...","{'type': 'text/html', 'language': None, 'base'...",[{'name': 'Veronique Greenwood'}],Veronique Greenwood,{'name': 'Veronique Greenwood'},"Tue, 08 Sep 2020 23:01:21 +0000","(2020, 9, 8, 23, 1, 21, 1, 252, 0)","[{'term': 'Hummingbirds', 'scheme': 'http://ww...","[{'height': '151', 'medium': 'image', 'url': '...",[{'content': 'Gabbro/Alamy'}],Gabbro/Alamy,"[{'type': 'text/plain', 'language': None, 'bas..."
1,"When These Sea Anemones Eat, It Goes Straight ...","{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",https://www.nytimes.com/2020/09/05/science/sea...,https://www.nytimes.com/2020/09/05/science/sea...,False,They’re the first animals known to turn food i...,"{'type': 'text/html', 'language': None, 'base'...",[{'name': 'Cara Giaimo'}],Cara Giaimo,{'name': 'Cara Giaimo'},"Sat, 05 Sep 2020 20:39:36 +0000","(2020, 9, 5, 20, 39, 36, 5, 249, 0)","[{'term': 'Fish and Other Marine Life', 'schem...","[{'height': '151', 'medium': 'image', 'url': '...",[{'content': 'Anniek Stokkermans/Embl'}],Anniek Stokkermans/Embl,"[{'type': 'text/plain', 'language': None, 'bas..."
2,Old Male Elephants: Don’t Count Them Out,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",https://www.nytimes.com/2020/09/04/science/mal...,https://www.nytimes.com/2020/09/04/science/mal...,False,New research challenges the assumption that bu...,"{'type': 'text/html', 'language': None, 'base'...",[{'name': 'Rachel Nuwer'}],Rachel Nuwer,{'name': 'Rachel Nuwer'},"Fri, 04 Sep 2020 21:41:57 +0000","(2020, 9, 4, 21, 41, 57, 4, 248, 0)","[{'term': 'Elephants', 'scheme': 'http://www.n...","[{'height': '151', 'medium': 'image', 'url': '...",[{'content': 'Connie Allen'}],Connie Allen,"[{'type': 'text/plain', 'language': None, 'bas..."
3,How a Praying Mantis Says ‘Boo!’,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",https://www.nytimes.com/2020/09/03/science/pra...,https://www.nytimes.com/2020/09/03/science/pra...,False,A study of startle displays hints at why provo...,"{'type': 'text/html', 'language': None, 'base'...",[{'name': 'Cara Giaimo'}],Cara Giaimo,{'name': 'Cara Giaimo'},"Thu, 03 Sep 2020 16:11:00 +0000","(2020, 9, 3, 16, 11, 0, 3, 247, 0)","[{'term': 'Praying Mantis', 'scheme': 'http://...","[{'height': '151', 'medium': 'image', 'url': '...",[{'content': 'Dave Hunt/Alamy'}],Dave Hunt/Alamy,"[{'type': 'text/html', 'language': None, 'base..."
4,Up Is Down in This Fun Physics Experiment,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",https://www.nytimes.com/2020/09/03/science/flo...,https://www.nytimes.com/2020/09/03/science/flo...,False,"The liquid levitates, and a boat floats along ...","{'type': 'text/html', 'language': None, 'base'...",[{'name': 'Kenneth Chang'}],Kenneth Chang,{'name': 'Kenneth Chang'},"Fri, 04 Sep 2020 02:26:21 +0000","(2020, 9, 4, 2, 26, 21, 4, 248, 0)","[{'term': 'Physics', 'scheme': 'http://www.nyt...",,,,
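Several columns in this dataframe, such as 'tags' and 'authors', hold lists of dictionaries rather than plain values, which is awkward in a table or a CSV file. One option is to flatten them into strings before building the dataframe. The sketch below joins the 'term' of each tag; `tag_terms` is a hypothetical helper, and the stand-in entry uses made-up tag values apart from 'Hummingbirds', which appears in the output above.

```python
def tag_terms(entry, sep='; '):
    """Join the 'term' of every tag on an entry into one string."""
    tags = entry.get('tags') or []
    return sep.join(tag['term'] for tag in tags if tag.get('term'))


# Stand-in entry (assumption): the second tag is invented for illustration.
entry = {'tags': [{'term': 'Hummingbirds'}, {'term': 'Birds'}]}
print(tag_terms(entry))

# Entries without tags produce an empty string instead of an error.
print(repr(tag_terms({})))
```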


### Challenge
##### next paragraph is from original notebook, which used a feed no longer available
Write code to create a dataframe of the top 10 movies from the Netflix Top 100 DVDs and iTunes. Check to see if your feed is well-formed. Compile the name of the feed as the source, the published date, the movie ranking in the list, the movie title, a link to the movie, and the summary. If the published date does not exist in the feed, use the current date. Save your dataframe as a CSV file. Here is a link to one [possible solution](./rss_challenge.py).
   
##### this paragraph is what will be done with working feed in this notebook
For the CSV file, we want these columns: feed date, article date, NYTimes section, title, author, link, summary.

##### figure out how to get individual data values

In [None]:
print(feed.feed.title)
print(feed.feed.link)
print(feed.feed.links[1].href)
print(feed.feed.updated)

In [30]:
section = feed.feed.title.replace('NYT' , 'NY Times').replace(' >', '')
section

'NY Times Science'

In [31]:
date = feed.feed.updated
date

'Thu, 10 Sep 2020 11:46:06 +0000'
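The dates in the feed are strings in the RFC 822 style, which do not sort chronologically as text. FeedParser also exposes parsed versions such as `updated_parsed` and `published_parsed`. As an alternative sketch that relies only on the standard library, the string can be converted to a `datetime` and written in ISO 8601 form; `to_iso` is a hypothetical helper name.

```python
from email.utils import parsedate_to_datetime


def to_iso(rfc822_date):
    """Convert an RFC 822 style date string to an ISO 8601 string."""
    return parsedate_to_datetime(rfc822_date).isoformat()


print(to_iso('Thu, 10 Sep 2020 11:46:06 +0000'))
# prints 2020-09-10T11:46:06+00:00
```

ISO 8601 strings with the same UTC offset sort in date order, which can be convenient in a CSV file.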

In [53]:
published = feed.entries[0].published
title = feed.entries[0].title
link = feed.entries[0].link
author = feed.entries[0].authors[0].name
summary = feed.entries[0].summary
published
title
link
author
summary

'Tue, 08 Sep 2020 23:01:21 +0000'

'These Hummingbirds Take Extreme Naps. Some May Even Hibernate.'

'https://www.nytimes.com/2020/09/08/science/hummingbirds-torpor-hibernation.html'

'Veronique Greenwood'

'To adapt to life in the Andes Mountains, some South American species go into exceptionally deep torpor to save energy.'

##### functions

In [56]:
def make_df_from_feed(rss):
    """Parse the RSS feed at `rss` and return a dataframe with one row per entry."""
    feed = feedparser.parse(rss)

    section = feed.feed.title.replace('NYT', 'NY Times').replace(' >', '')
    feed_date = feed.feed.updated

    df_cols = ['section', 'feed date', 'article date', 'title', 'link', 'author', 'summary']
    rows = []

    for entry in feed.entries:
        # Some entries have no 'authors' key, so fall back to 'unknown'.
        authors = entry.get('authors')
        author = authors[0].get('name', 'unknown') if authors else 'unknown'

        rows.append([section, feed_date, entry.published, entry.title,
                     entry.link, author, entry.summary])

    # DataFrame.append was removed in pandas 2.0, so collect the rows in a
    # list and build the dataframe once at the end.
    return pd.DataFrame(rows, columns=df_cols)

In [49]:
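def make_csv_from_feed(rss):
    """Build the dataframe for the feed at `rss` and save it as a CSV file."""
    df_nytimes = make_df_from_feed(rss)

    # The output file name is fixed, so each run overwrites the previous file.
    df_nytimes.to_csv('nytimes_science_feed.csv', index=False)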

##### DO IT

In [57]:
RSS_URL = "https://rss.nytimes.com/services/xml/rss/nyt/Science.xml"

dt_nytimes = make_df_from_feed(RSS_URL)

In [58]:
dt_nytimes.shape
dt_nytimes.columns
dt_nytimes

(36, 7)

Index(['section', 'feed date', 'article date', 'title', 'link', 'author',
       'summary'],
      dtype='object')

Unnamed: 0,section,feed date,article date,title,link,author,summary
0,NY Times Science,"Thu, 10 Sep 2020 14:03:54 +0000","Tue, 08 Sep 2020 23:01:22 +0000",These Hummingbirds Take Extreme Naps. Some May...,https://www.nytimes.com/2020/09/08/science/hum...,Veronique Greenwood,"To adapt to life in the Andes Mountains, some ..."
1,NY Times Science,"Thu, 10 Sep 2020 14:03:54 +0000","Sat, 05 Sep 2020 20:39:36 +0000","When These Sea Anemones Eat, It Goes Straight ...",https://www.nytimes.com/2020/09/05/science/sea...,Cara Giaimo,They’re the first animals known to turn food i...
2,NY Times Science,"Thu, 10 Sep 2020 14:03:54 +0000","Fri, 04 Sep 2020 21:41:57 +0000",Old Male Elephants: Don’t Count Them Out,https://www.nytimes.com/2020/09/04/science/mal...,Rachel Nuwer,New research challenges the assumption that bu...
3,NY Times Science,"Thu, 10 Sep 2020 14:03:54 +0000","Thu, 03 Sep 2020 16:11:00 +0000",How a Praying Mantis Says ‘Boo!’,https://www.nytimes.com/2020/09/03/science/pra...,Cara Giaimo,A study of startle displays hints at why provo...
4,NY Times Science,"Thu, 10 Sep 2020 14:03:54 +0000","Fri, 04 Sep 2020 02:26:21 +0000",Up Is Down in This Fun Physics Experiment,https://www.nytimes.com/2020/09/03/science/flo...,Kenneth Chang,"The liquid levitates, and a boat floats along ..."
5,NY Times Science,"Thu, 10 Sep 2020 14:03:54 +0000","Thu, 03 Sep 2020 09:00:30 +0000",A Turtle With a Permanent Smile Was Brought Ba...,https://www.nytimes.com/2020/09/03/science/bur...,Rachel Nuwer,Scientists have rebuilt the population of Burm...
6,NY Times Science,"Thu, 10 Sep 2020 14:03:54 +0000","Thu, 03 Sep 2020 19:35:40 +0000","These Black Holes Shouldn’t Exist, but There T...",https://www.nytimes.com/2020/09/02/science/bla...,Dennis Overbye,"On the far side of the universe, a collision o..."
7,NY Times Science,"Thu, 10 Sep 2020 14:03:54 +0000","Wed, 02 Sep 2020 09:00:32 +0000",Melting Glaciers Are Filling Unstable Lakes. A...,https://www.nytimes.com/2020/09/02/science/glo...,Katherine Kornei,A census of the world’s glacial lakes shows th...
8,NY Times Science,"Thu, 10 Sep 2020 14:03:54 +0000","Tue, 08 Sep 2020 22:39:46 +0000",DIY Coronavirus Vaccines? These Scientists Are...,https://www.nytimes.com/2020/09/01/science/cov...,Heather Murphy,"Impatient for a coronavirus vaccine, dozens of..."
9,NY Times Science,"Thu, 10 Sep 2020 14:03:54 +0000","Thu, 10 Sep 2020 13:50:09 +0000",Inquiry Begins Into AstraZeneca's Coronavirus ...,https://www.nytimes.com/2020/09/10/health/covi...,Katherine J. Wu,A participant in the company’s late-stage coro...


In [59]:
make_csv_from_feed(RSS_URL)