# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [3]:
feedBurner = feedparser.parse(url)
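`feedparser.parse` does not raise on a malformed or unreachable feed. Instead it sets the `bozo` flag on the result, stores the error in `bozo_exception`, and includes `status` only when the feed was fetched over HTTP. A minimal guard is sketched below. The helper name `check_feed` is hypothetical, and plain dicts stand in for parse results, since feedparser's result object is a dict subclass. Some `bozo` conditions are recoverable, so raising on every one is a deliberately strict choice.

```python
def check_feed(result):
    """Raise ValueError if a feedparser-style result looks unusable (hypothetical helper).

    feedparser results are dict subclasses, so .get() works on them directly.
    """
    if result.get('bozo'):
        # bozo_exception holds the parsing or network error, when present
        raise ValueError(f"feed problem: {result.get('bozo_exception')!r}")
    status = result.get('status')  # absent when nothing was fetched over HTTP
    if status is not None and status >= 400:
        raise ValueError(f"HTTP status {status}")
    return result

# Usage with the lab's feed (requires network access):
# feedBurner = check_feed(feedparser.parse(url))
```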

### 2. Obtain a list of components (keys) that are available for this feed.

In [4]:
feedBurner.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [5]:
feedBurner['feed'].keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [6]:
print("Title: " + feedBurner['feed']['title'])
print("Subtitle: " + feedBurner['feed']['subtitle'])
# This feed has no feed-level 'author' key (see the keys listed above), so fall back to 'N/A'
print("Author: " + feedBurner['feed'].get('author', 'N/A'))
print("Link: " + feedBurner['feed']['link'])

Title: Radar
Subtitle: Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology
Author: N/A
Link: https://www.oreilly.com/radar
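Not every feed supplies every field. For example, the feed-level keys listed above include no `author`. When printing several optional fields, it can be simpler to loop over the field names and use `.get()` with a fallback. In the sketch below, the `describe_feed` name and the `'N/A'` placeholder are illustrative choices, not part of feedparser:

```python
def describe_feed(feed, fields=('title', 'subtitle', 'author', 'link'), missing='N/A'):
    """Return 'Label: value' lines for the given fields (hypothetical helper)."""
    # str.capitalize turns 'title' into 'Title'; absent keys fall back to `missing`
    return [f"{name.capitalize()}: {feed.get(name, missing)}" for name in fields]

# With the lab's data:
# print('\n'.join(describe_feed(feedBurner['feed'])))
```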


### 5. Count the number of entries that are contained in this RSS feed.

In [7]:
len(feedBurner['entries'])

60

### 6. Obtain a list of components (keys) available for an entry.

*Hint: index into the `entries` list first (for example, take entry `0`), then request that entry's keys.*

In [8]:
feedBurner['entries'][0].keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'comments', 'published', 'published_parsed', 'authors', 'author', 'author_detail', 'tags', 'id', 'guidislink', 'summary', 'summary_detail', 'content', 'wfw_commentrss', 'slash_comments', 'feedburner_origlink'])
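Entry `0` is only a sample. Feed entries can carry different optional elements, so the key set may vary from entry to entry. Intersecting and uniting the per-entry key sets shows which keys every entry shares and which appear only on some. The sketch below uses a hypothetical `key_summary` helper, with plain dicts standing in for `feedBurner['entries']`:

```python
def key_summary(entries):
    """Split entry keys into (present in every entry, present in only some)."""
    key_sets = [set(entry.keys()) for entry in entries]
    if not key_sets:
        return set(), set()
    common = set.intersection(*key_sets)
    partial = set.union(*key_sets) - common
    return common, partial

# With the lab's data:
# common, partial = key_summary(feedBurner['entries'])
```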

### 7. Extract a list of entry titles.

In [9]:
for entry in feedBurner['entries']:
    print(entry['title'])

Four short links: 23 January 2020
Four short links: 22 January 2020
Four short links: 21 January 2020
Four short links: 20 January 2020
Four short links: 17 January 2020
Four short links: 16 January 2020
Reinforcement learning for the real world
Four short links: 15 January 2020
Four short links: 14 January 2020
Where programming languages are headed in 2020
Four short links: 13 January 2020
Four short links: 10 January 2020
Radar trends to watch: January 2020
Four short links: 9 January 2020
Four short links: 8 January 2020
9 additional books for the Next Economy
8 AI trends we’re watching in 2020
Four short links: 7 January 2020
Rethinking programming
Four short links: 6 January 2020
Four short links: 3 January 2020
Four short links: 2 January 2020
10+ books for the Next Economy
Four short links: 1 January 2020
Four short links: 31 December 2019
Four short links: 30 December 2019
Four short links: 27 December 2019
Four short links: 26 December 2019
Four short links: 25 December 2019
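The loop above prints each title. To keep the titles as an actual list for later use, a list comprehension performs the same walk and collects the values. It is shown here on two sample entries standing in for `feedBurner['entries']`:

```python
entries = [  # stand-ins for feedBurner['entries']
    {'title': 'Four short links: 23 January 2020'},
    {'title': 'Reinforcement learning for the real world'},
]

# Same traversal as the loop above, but the titles are collected instead of printed
titles = [entry['title'] for entry in entries]
print(titles)

# With the lab's data:
# titles = [entry['title'] for entry in feedBurner['entries']]
```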


### 8. Calculate the percentage of "Four short links" entry titles.

In [10]:
fourShort = 0

for entry in feedBurner['entries']:
    if entry['title'].startswith("Four short links:"):
        fourShort += 1

print(fourShort/len(feedBurner['entries'])*100)

75.0
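Because `True` counts as 1 in Python, the counter and loop can be folded into a single `sum()` over a generator. Guarding against an empty list avoids a `ZeroDivisionError`. The sketch below uses a hypothetical `share_with_prefix` helper:

```python
def share_with_prefix(titles, prefix="Four short links:"):
    """Percentage of titles that start with `prefix` (0.0 for an empty list)."""
    if not titles:
        return 0.0
    matches = sum(title.startswith(prefix) for title in titles)  # booleans add up as 0/1
    return matches / len(titles) * 100

# With the lab's data:
# share_with_prefix([entry['title'] for entry in feedBurner['entries']])
```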


### 9. Create a Pandas data frame from the feed's entries.

In [11]:
import pandas as pd

In [12]:
# pd.json_normalize already returns a DataFrame; the old pandas.io.json import path has been deprecated since pandas 1.0
data = pd.json_normalize(feedBurner['entries'])
data.head()

|  | title | links | link | comments | published | published_parsed | authors | author | tags | id | ... | feedburner_origlink | title_detail.type | title_detail.language | title_detail.base | title_detail.value | author_detail.name | summary_detail.type | summary_detail.language | summary_detail.base | summary_detail.value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Four short links: 23 January 2020 | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/four-short-links... | Thu, 23 Jan 2020 11:00:00 +0000 | (2020, 1, 23, 11, 0, 0, 3, 23, 0) | [{'name': 'Nat Torkington'}] | Nat Torkington | [{'term': 'Four Short Links', 'scheme': None, ... | https://www.oreilly.com/radar/?p=11539 | ... | https://www.oreilly.com/radar/four-short-links... | text/plain |  | http://feeds.feedburner.com/oreilly/radar/atom | Four short links: 23 January 2020 | Nat Torkington | text/html |  | http://feeds.feedburner.com/oreilly/radar/atom | The Business Case for Formal Methods &#8212; a... |
| 1 | Four short links: 22 January 2020 | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/four-short-links... | Wed, 22 Jan 2020 05:01:00 +0000 | (2020, 1, 22, 5, 1, 0, 2, 22, 0) | [{'name': 'Nat Torkington'}] | Nat Torkington | [{'term': 'Four Short Links', 'scheme': None, ... | https://www.oreilly.com/radar/?p=11535 | ... | https://www.oreilly.com/radar/four-short-links... | text/plain |  | http://feeds.feedburner.com/oreilly/radar/atom | Four short links: 22 January 2020 | Nat Torkington | text/html |  | http://feeds.feedburner.com/oreilly/radar/atom | Elements of Scheduling &#8212; notable for sev... |
| 2 | Four short links: 21 January 2020 | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/four-short-links... | Tue, 21 Jan 2020 05:01:00 +0000 | (2020, 1, 21, 5, 1, 0, 1, 21, 0) | [{'name': 'Nat Torkington'}] | Nat Torkington | [{'term': 'Four Short Links', 'scheme': None, ... | https://www.oreilly.com/radar/?p=11531 | ... | https://www.oreilly.com/radar/four-short-links... | text/plain |  | http://feeds.feedburner.com/oreilly/radar/atom | Four short links: 21 January 2020 | Nat Torkington | text/html |  | http://feeds.feedburner.com/oreilly/radar/atom | Cytoscape &#8212; an open source software plat... |
| 3 | Four short links: 20 January 2020 | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/four-short-links... | Mon, 20 Jan 2020 05:01:00 +0000 | (2020, 1, 20, 5, 1, 0, 0, 20, 0) | [{'name': 'Nat Torkington'}] | Nat Torkington | [{'term': 'Four Short Links', 'scheme': None, ... | https://www.oreilly.com/radar/?p=11525 | ... | https://www.oreilly.com/radar/four-short-links... | text/plain |  | http://feeds.feedburner.com/oreilly/radar/atom | Four short links: 20 January 2020 | Nat Torkington | text/html |  | http://feeds.feedburner.com/oreilly/radar/atom | AR Contact Lens &#8212; The path ahead is not ... |
| 4 | Four short links: 17 January 2020 | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/four-short-links... | Fri, 17 Jan 2020 05:01:00 +0000 | (2020, 1, 17, 5, 1, 0, 4, 17, 0) | [{'name': 'Nat Torkington'}] | Nat Torkington | [{'term': 'Four Short Links', 'scheme': None, ... | https://www.oreilly.com/radar/?p=11519 | ... | https://www.oreilly.com/radar/four-short-links... | text/plain |  | http://feeds.feedburner.com/oreilly/radar/atom | Four short links: 17 January 2020 | Nat Torkington | text/html |  | http://feeds.feedburner.com/oreilly/radar/atom | cursedfs &#8212; Make a disk image formatted w... |
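As the column names above show, `json_normalize` flattens nested dictionaries into dotted columns (`title_detail.type`, `summary_detail.value`, and so on). Lists such as `links`, `authors`, and `tags` are left as Python objects inside single cells. The quick check below uses one hypothetical entry shaped like a feedparser entry and assumes pandas 1.0 or later for `pd.json_normalize`:

```python
import pandas as pd

# Hypothetical entry: one nested dict (title_detail) and one list of dicts (tags)
entry = {
    'title': 'Example',
    'title_detail': {'type': 'text/plain', 'value': 'Example'},
    'tags': [{'term': 'AI & ML'}],
}

df = pd.json_normalize([entry])
print(sorted(df.columns))  # nested dict keys appear as 'title_detail.type' and 'title_detail.value'
print(df.loc[0, 'tags'])   # the list is stored unchanged in a single cell
```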


### 10. Count the number of entries per author and sort them in descending order.

In [13]:
authors = data.groupby('author', as_index=False).agg({'title':'count'})
authors.columns = ['author', 'entries']
authors.sort_values('entries', ascending=False)

|  | author | entries |
|---|---|---|
| 5 | Nat Torkington | 45 |
| 3 | Mike Loukides | 4 |
| 2 | Jenn Webb | 3 |
| 0 |  | 2 |
| 1 | Alison McCauley | 1 |
| 4 | Mike Loukides and Ben Lorica | 1 |
| 6 | Pamela Rucker | 1 |
| 7 | Patrick Hall and Andrew Burt | 1 |
| 8 | Roger Magoulas | 1 |
| 9 | Zan McQuade and Amanda Quinn | 1 |
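`Series.value_counts()` produces the same per-author tally, already sorted in descending order of count. An empty author string is counted as its own group, which likely explains the blank-author row with 2 entries above. The sketch below uses hypothetical author names in place of the real `data` frame:

```python
import pandas as pd

# Hypothetical stand-in for the `data` frame built from the feed entries
data = pd.DataFrame({'author': ['Author A', 'Author B', 'Author A', '', 'Author A']})

# value_counts sorts by count, descending; rename_axis/reset_index turn it into a two-column frame
authors = (
    data['author']
    .value_counts()
    .rename_axis('author')
    .reset_index(name='entries')
)
print(authors)
```

If missing authors were stored as `NaN` rather than as empty strings, both this approach and `groupby` would drop them by default. Passing `value_counts(dropna=False)` keeps them.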


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [14]:
data["title_length"] = data['title'].str.len()

In [15]:
# Keep only the requested columns, longest titles first
data[['title', 'author', 'title_length']].sort_values("title_length", ascending=False)

|  | title | author | title_length |
|---|---|---|---|
| 36 | 5 industries that demonstrate how blockchains ... | Alison McCauley | 63 |
| 39 | Why you should care about debugging machine le... | Patrick Hall and Andrew Burt | 59 |
| 56 | Moving AI and ML from research into production | Jenn Webb | 46 |
| 9 | Where programming languages are headed in 2020 | Zan McQuade and Amanda Quinn | 46 |
| 33 | AI is computer science disguised as hard work | Jenn Webb | 45 |
| 6 | Reinforcement learning for the real world | Jenn Webb | 41 |
| 49 | Use your people as competitive advantage | Pamela Rucker | 40 |
| 15 | 9 additional books for the Next Economy |  | 39 |
| 46 | Radar trends to watch: December 2019 | Mike Loukides | 36 |
| 12 | Radar trends to watch: January 2020 | Mike Loukides | 35 |
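When only the top few rows are needed, `DataFrame.nlargest` selects them by a column in one call, without a full sort followed by slicing. The sketch below uses hypothetical titles and authors with this exercise's column names:

```python
import pandas as pd

# Hypothetical subset of feed entries
data = pd.DataFrame({
    'title': ['Short', 'A much longer title', 'Medium title'],
    'author': ['Author A', 'Author B', 'Author C'],
})
data['title_length'] = data['title'].str.len()

# The two longest titles, keeping only the requested columns
top = data.nlargest(2, 'title_length')[['title', 'author', 'title_length']]
print(top)
```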


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."
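The task asks for entry *titles*, whereas filtering `data['summary']` on its own yields the summaries themselves. One way to keep the title of each matching entry is to test the summary and collect the title. The sketch below runs over hypothetical entries; the same comprehension applies unchanged to `feedBurner['entries']`:

```python
entries = [  # hypothetical stand-ins for feedBurner['entries']
    {'title': 'Debugging models', 'summary': 'Why Machine Learning models fail'},
    {'title': 'Scheduling', 'summary': 'Elements of Scheduling'},
    {'title': 'No summary'},
]

phrase = 'machine learning'
# Case-insensitive match on the summary; an entry without a summary is treated as empty text
ml_titles = [e['title'] for e in entries if phrase in e.get('summary', '').lower()]
print(ml_titles)

# A pandas equivalent on the `data` frame (na=False treats missing summaries as non-matches):
# data.loc[data['summary'].str.contains(phrase, case=False, regex=False, na=False), 'title'].tolist()
```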

In [16]:
listMachineLearning = [content for content in data['summary'] if "machine learning" in content.lower()]
listMachineLearning

['Simulated Customer &#8212; The site will randomly generate one of 40 different [sales] objections, and give you 20 seconds to answer it. From Shallow to Deep Interactions Between Knowledge Representation, Reasoning, and Machine Learning &#8212; This paper proposes a tentative and original survey of meeting points between knowledge representation and reasoning (KRR) and machine learning [&#8230;]',
 'For all the excitement about machine learning (ML), there are serious impediments to its widespread adoption. Not least is the broadening realization that ML models can fail. And that’s why model debugging, the art and science of understanding and fixing problems in ML models, is so critical to the future of ML. Without being able [&#8230;]',
 'Roughly a year ago, we wrote “What machine learning means for software development.” In that article, we talked about Andrej Karpathy’s concept of Software 2.0. Karpathy argues that we’re at the beginning of a profound change in the way software is