# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [2]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [4]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [6]:
feedBurner = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [12]:
feedBurner.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [19]:
feedBurner['feed'].keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [18]:
print("Title: " + feedBurner['feed']['title'])
print("Subtitle: " + feedBurner['feed']['subtitle'])
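# This feed's 'feed' component has no 'author' key (see the key list in exercise 3),
# so the lookup below would raise a KeyError and is left disabled.
# print("Author: " + feedBurner['feed']['author'])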
print("Link: " + feedBurner['feed']['link'])

Title: Radar
Subtitle: Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology
Link: https://www.oreilly.com/radar
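
The author lookup fails here only because this particular feed does not publish an `author` key. As a minimal sketch of one way to print all four fields without hard-coding which keys exist, the example below uses `dict.get` with a fallback. The `feed` dictionary is a hypothetical stand-in for `feedBurner['feed']` (values copied from the output above), and `describe_feed` and the `'not provided'` placeholder are names assumed for illustration only.

```python
# Hypothetical stand-in for feedBurner['feed']; note there is no 'author' key.
feed = {
    'title': 'Radar',
    'subtitle': 'Now, next, and beyond: Tracking need-to-know trends '
                'at the intersection of business and technology',
    'link': 'https://www.oreilly.com/radar',
}


def describe_feed(feed):
    """Return one 'Label: value' line per field, tolerating missing keys."""
    fields = [('Title', 'title'), ('Subtitle', 'subtitle'),
              ('Author', 'author'), ('Link', 'link')]
    # .get() returns the fallback instead of raising KeyError
    return [f"{label}: {feed.get(key, 'not provided')}" for label, key in fields]


feed_lines = describe_feed(feed)
print("\n".join(feed_lines))
```

Because the parsed feed object behaves like a dictionary, the same `.get` pattern should work if `feed` is replaced with `feedBurner['feed']`.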


### 5. Count the number of entries that are contained in this RSS feed.

In [26]:
len(feedBurner['entries'])

60

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [28]:
feedBurner['entries'][0].keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'comments', 'published', 'published_parsed', 'authors', 'author', 'author_detail', 'tags', 'id', 'guidislink', 'summary', 'summary_detail', 'content', 'wfw_commentrss', 'slash_comments', 'feedburner_origlink'])

### 7. Extract a list of entry titles.

In [30]:
for entry in feedBurner['entries']:
    print(entry['title'])

Four short links: 23 January 2020
Four short links: 22 January 2020
Four short links: 21 January 2020
Four short links: 20 January 2020
Four short links: 17 January 2020
Four short links: 16 January 2020
Reinforcement learning for the real world
Four short links: 15 January 2020
Four short links: 14 January 2020
Where programming languages are headed in 2020
Four short links: 13 January 2020
Four short links: 10 January 2020
Radar trends to watch: January 2020
Four short links: 9 January 2020
Four short links: 8 January 2020
9 additional books for the Next Economy
8 AI trends we’re watching in 2020
Four short links: 7 January 2020
Rethinking programming
Four short links: 6 January 2020
Four short links: 3 January 2020
Four short links: 2 January 2020
10+ books for the Next Economy
Four short links: 1 January 2020
Four short links: 31 December 2019
Four short links: 30 December 2019
Four short links: 27 December 2019
Four short links: 26 December 2019
Four short links: 25 December 2019


### 8. Calculate the percentage of "Four short links" entry titles.

In [43]:
fourShort = 0

for entry in feedBurner['entries']:
    if entry['title'].startswith("Four short links:"):
        fourShort += 1

print(fourShort/len(feedBurner['entries'])*100)

75.0
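
The same calculation can be written more compactly by summing booleans, since `True` counts as 1. This is only a sketch: `sample_titles` is a small hypothetical list (titles borrowed from the output of exercise 7) standing in for the real entries, and `prefix_share` is a helper name assumed for illustration.

```python
# Hypothetical sample standing in for [entry['title'] for entry in feedBurner['entries']]
sample_titles = [
    'Four short links: 23 January 2020',
    'Four short links: 22 January 2020',
    'Reinforcement learning for the real world',
    'Four short links: 21 January 2020',
]


def prefix_share(titles, prefix='Four short links:'):
    """Percentage of titles that start with the given prefix."""
    if not titles:
        return 0.0  # avoid dividing by zero on an empty feed
    matches = sum(title.startswith(prefix) for title in titles)
    return matches / len(titles) * 100


share = prefix_share(sample_titles)
print(f"{share:.1f}%")
```

Guarding against an empty list matters in practice, because a feed that fails to download yields zero entries.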


### 9. Create a Pandas data frame from the feed's entries.

In [44]:
import pandas as pd

In [51]:
dataTwo = pd.DataFrame(feedBurner['entries'])
dataTwo.head()

| | title | title_detail | links | link | comments | published | published_parsed | authors | author | author_detail | tags | id | guidislink | summary | summary_detail | content | wfw_commentrss | slash_comments | feedburner_origlink |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Four short links: 23 January 2020 | `{'type': 'text/plain', 'language': None, 'base...` | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `http://feedproxy.google.com/~r/oreilly/radar/a...` | `https://www.oreilly.com/radar/four-short-links...` | Thu, 23 Jan 2020 11:00:00 +0000 | (2020, 1, 23, 11, 0, 0, 3, 23, 0) | `[{'name': 'Nat Torkington'}]` | Nat Torkington | `{'name': 'Nat Torkington'}` | `[{'term': 'Four Short Links', 'scheme': None, ...` | `https://www.oreilly.com/radar/?p=11539` | False | `The Business Case for Formal Methods &#8212; a...` | `{'type': 'text/html', 'language': None, 'base'...` | `[{'type': 'text/html', 'language': None, 'base...` | `https://www.oreilly.com/radar/four-short-links...` | 0 | `https://www.oreilly.com/radar/four-short-links...` |
| 1 | Four short links: 22 January 2020 | `{'type': 'text/plain', 'language': None, 'base...` | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `http://feedproxy.google.com/~r/oreilly/radar/a...` | `https://www.oreilly.com/radar/four-short-links...` | Wed, 22 Jan 2020 05:01:00 +0000 | (2020, 1, 22, 5, 1, 0, 2, 22, 0) | `[{'name': 'Nat Torkington'}]` | Nat Torkington | `{'name': 'Nat Torkington'}` | `[{'term': 'Four Short Links', 'scheme': None, ...` | `https://www.oreilly.com/radar/?p=11535` | False | `Elements of Scheduling &#8212; notable for sev...` | `{'type': 'text/html', 'language': None, 'base'...` | `[{'type': 'text/html', 'language': None, 'base...` | `https://www.oreilly.com/radar/four-short-links...` | 0 | `https://www.oreilly.com/radar/four-short-links...` |
| 2 | Four short links: 21 January 2020 | `{'type': 'text/plain', 'language': None, 'base...` | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `http://feedproxy.google.com/~r/oreilly/radar/a...` | `https://www.oreilly.com/radar/four-short-links...` | Tue, 21 Jan 2020 05:01:00 +0000 | (2020, 1, 21, 5, 1, 0, 1, 21, 0) | `[{'name': 'Nat Torkington'}]` | Nat Torkington | `{'name': 'Nat Torkington'}` | `[{'term': 'Four Short Links', 'scheme': None, ...` | `https://www.oreilly.com/radar/?p=11531` | False | `Cytoscape &#8212; an open source software plat...` | `{'type': 'text/html', 'language': None, 'base'...` | `[{'type': 'text/html', 'language': None, 'base...` | `https://www.oreilly.com/radar/four-short-links...` | 0 | `https://www.oreilly.com/radar/four-short-links...` |
| 3 | Four short links: 20 January 2020 | `{'type': 'text/plain', 'language': None, 'base...` | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `http://feedproxy.google.com/~r/oreilly/radar/a...` | `https://www.oreilly.com/radar/four-short-links...` | Mon, 20 Jan 2020 05:01:00 +0000 | (2020, 1, 20, 5, 1, 0, 0, 20, 0) | `[{'name': 'Nat Torkington'}]` | Nat Torkington | `{'name': 'Nat Torkington'}` | `[{'term': 'Four Short Links', 'scheme': None, ...` | `https://www.oreilly.com/radar/?p=11525` | False | `AR Contact Lens &#8212; The path ahead is not ...` | `{'type': 'text/html', 'language': None, 'base'...` | `[{'type': 'text/html', 'language': None, 'base...` | `https://www.oreilly.com/radar/four-short-links...` | 0 | `https://www.oreilly.com/radar/four-short-links...` |
| 4 | Four short links: 17 January 2020 | `{'type': 'text/plain', 'language': None, 'base...` | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `http://feedproxy.google.com/~r/oreilly/radar/a...` | `https://www.oreilly.com/radar/four-short-links...` | Fri, 17 Jan 2020 05:01:00 +0000 | (2020, 1, 17, 5, 1, 0, 4, 17, 0) | `[{'name': 'Nat Torkington'}]` | Nat Torkington | `{'name': 'Nat Torkington'}` | `[{'term': 'Four Short Links', 'scheme': None, ...` | `https://www.oreilly.com/radar/?p=11519` | False | `cursedfs &#8212; Make a disk image formatted w...` | `{'type': 'text/html', 'language': None, 'base'...` | `[{'type': 'text/html', 'language': None, 'base...` | `https://www.oreilly.com/radar/four-short-links...` | 0 | `https://www.oreilly.com/radar/four-short-links...` |


### 10. Count the number of entries per author and sort them in descending order.
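
No solution is recorded for this exercise, so the following is only one possible approach. It relies on pandas `Series.value_counts()`, which counts occurrences and sorts them in descending order by default. The `sample_entries` list is hypothetical data standing in for `feedBurner['entries']`; the "Hypothetical Author" names are invented for illustration and do not come from the feed.

```python
import pandas as pd

# Hypothetical entries; only 'Nat Torkington' appears in the real output above.
sample_entries = [
    {'title': 'Four short links: 23 January 2020', 'author': 'Nat Torkington'},
    {'title': 'Four short links: 22 January 2020', 'author': 'Nat Torkington'},
    {'title': 'Four short links: 21 January 2020', 'author': 'Nat Torkington'},
    {'title': 'Reinforcement learning for the real world', 'author': 'Hypothetical Author A'},
    {'title': 'Rethinking programming', 'author': 'Hypothetical Author A'},
    {'title': 'Radar trends to watch: January 2020', 'author': 'Hypothetical Author B'},
]

df = pd.DataFrame(sample_entries)

# value_counts() returns a Series indexed by author, largest count first
author_counts = df['author'].value_counts()
print(author_counts)
```

With the data frame from exercise 9, the equivalent call would be `dataTwo['author'].value_counts()`.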

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).
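
No solution is recorded for this exercise either. A minimal sketch, assuming the new column is called `title_length` (a name chosen here, not given in the lab): compute the length with the `.str.len()` accessor, select the three requested columns, and sort with `ascending=False`. The sample rows are hypothetical, and the "Hypothetical Author" values are placeholders.

```python
import pandas as pd

# Hypothetical rows standing in for the data frame built in exercise 9
df = pd.DataFrame([
    {'title': 'Rethinking programming', 'author': 'Hypothetical Author A'},
    {'title': 'Four short links: 23 January 2020', 'author': 'Nat Torkington'},
    {'title': 'Reinforcement learning for the real world', 'author': 'Hypothetical Author B'},
])

# Number of characters in each title
df['title_length'] = df['title'].str.len()

# Keep only the requested columns, longest title first
result = (
    df[['title', 'author', 'title_length']]
    .sort_values('title_length', ascending=False)
    .reset_index(drop=True)
)
print(result)
```

Applied to the lab's data, the same three statements would operate on `dataTwo` instead of `df`.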

### 12. Create a list of entry titles whose summary includes the phrase "machine learning."
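
This exercise also has no recorded solution. One way to approach it is a list comprehension with a case-insensitive substring test, sketched below. The `sample_entries` list and its summary text are hypothetical (the real summaries are truncated in the output above), and `titles_mentioning` is a helper name assumed for illustration.

```python
# Hypothetical entries; the summary strings are invented for this example.
sample_entries = [
    {'title': 'Reinforcement learning for the real world',
     'summary': 'How Machine Learning systems behave outside the lab.'},
    {'title': 'Rethinking programming',
     'summary': 'Programming is changing as tools improve.'},
    {'title': '8 AI trends we’re watching in 2020',
     'summary': 'Advances in machine learning and automation.'},
    {'title': 'Radar trends to watch: January 2020'},  # entry with no summary
]


def titles_mentioning(entries, phrase):
    """Titles of entries whose summary contains phrase, ignoring case."""
    needle = phrase.lower()
    return [
        entry['title']
        for entry in entries
        # .get() tolerates entries that lack a summary
        if needle in entry.get('summary', '').lower()
    ]


ml_titles = titles_mentioning(sample_entries, 'machine learning')
print(ml_titles)
```

Lowercasing both sides is a design choice: it treats "Machine Learning" and "machine learning" as the same phrase, which a plain `in` test would not.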