# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [2]:
!pip install feedparser

Defaulting to user installation because normal site-packages is not writeable
Collecting feedparser
  Downloading feedparser-6.0.8-py3-none-any.whl (81 kB)
Collecting sgmllib3k
  Downloading sgmllib3k-1.0.0.tar.gz (5.8 kB)
Using legacy 'setup.py install' for sgmllib3k, since package 'wheel' is not installed.
Installing collected packages: sgmllib3k, feedparser
    Running setup.py install for sgmllib3k ... done
Successfully installed feedparser-6.0.8 sgmllib3k-1.0.0


In [3]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [4]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [9]:
d = feedparser.parse(url)
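`feedparser.parse` does not raise an exception on a malformed feed. Instead it sets the `bozo` key (one of the top-level keys listed in exercise 2) and, when a cause is known, stores it under `bozo_exception`. Below is a minimal sketch of a check one might run right after parsing. It assumes only that the parse result behaves like a dict, which feedparser's result object does; the helper name `parse_warning` is hypothetical.

```python
def parse_warning(result):
    """Return a warning string if the parser flagged the feed as malformed, else None.

    `result` is assumed to be the dict-like object returned by feedparser.parse().
    """
    if result.get("bozo"):
        # feedparser keeps the underlying error (if any) under 'bozo_exception'
        exc = result.get("bozo_exception")
        return f"feed is not well-formed: {exc!r}"
    return None


# With the feed parsed above this would be: parse_warning(d)
print(parse_warning({"bozo": False}))  # None
print(parse_warning({"bozo": True, "bozo_exception": ValueError("bad XML")}))
# feed is not well-formed: ValueError('bad XML')
```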


### 2. Obtain a list of components (keys) that are available for this feed.

In [10]:
d.keys()

dict_keys(['bozo', 'entries', 'feed', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [78]:
# Keys of a single entry (entry 15), not of the feed; the feed-level keys are listed below
d.entries[15].keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'comments', 'published', 'published_parsed', 'authors', 'author', 'author_detail', 'tags', 'id', 'guidislink', 'summary', 'summary_detail', 'content', 'wfw_commentrss', 'slash_comments', 'feedburner_origlink'])

In [184]:
d.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [124]:
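title = d.feed.title
subtitle = d.feed.subtitle
# d.feed has no 'author' key (see the keys in exercise 3), so the first entry's author is used instead
author = d.entries[0].author
link = d.feed.link
print(f'feed: {title}\nsubtitle: {subtitle}\nauthor: {author}\nlink: {link}')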


feed: Radar 
subtitle: Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology 
author: Mike Loukides 
link: https://www.oreilly.com/radar
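The feed-level keys printed in exercise 3 include no `author`, which is why the cell above falls back to the first entry's author. Because feedparser's result objects behave like dicts, `.get()` with a default avoids a lookup error when a field is absent. A minimal sketch; the helper `describe_feed` and the `'n/a'` placeholder are illustrative assumptions, and the toy input below only mirrors values already shown in the output above.

```python
def describe_feed(feed, entries, default="n/a"):
    """Build a summary of a parsed feed, tolerating missing fields.

    `feed` and each item of `entries` are assumed to be dict-like,
    as feedparser's d.feed and the items of d.entries are.
    """
    # Prefer a feed-level author; otherwise fall back to the first entry's author.
    author = feed.get("author") or (entries[0].get("author") if entries else None)
    return {
        "feed": feed.get("title", default),
        "subtitle": feed.get("subtitle", default),
        "author": author or default,
        "link": feed.get("link", default),
    }


# With the real data this would be: describe_feed(d.feed, d.entries)
info = describe_feed({"title": "Radar", "link": "https://www.oreilly.com/radar"},
                     [{"author": "Mike Loukides"}])
print(info)
```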


### 5. Count the number of entries that are contained in this RSS feed.

In [49]:
len(d.entries)

60

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [142]:
components = list(d.entries[0].keys())
components


['title',
 'title_detail',
 'links',
 'link',
 'comments',
 'published',
 'published_parsed',
 'authors',
 'author',
 'author_detail',
 'tags',
 'id',
 'guidislink',
 'summary',
 'summary_detail',
 'content',
 'wfw_commentrss',
 'slash_comments',
 'feedburner_origlink']
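Indexing a single entry, as the hint suggests, shows that entry's keys only; entries in the same feed are not guaranteed to carry identical keys. A minimal sketch, assuming dict-like entries, that separates the keys shared by every entry from those present in only some of them. `key_coverage` is a hypothetical helper and the sample is toy data.

```python
def key_coverage(entries):
    """Return (keys present in every entry, keys present in only some entries)."""
    key_sets = [set(entry.keys()) for entry in entries]
    if not key_sets:
        return set(), set()
    common = set.intersection(*key_sets)
    partial = set.union(*key_sets) - common
    return common, partial


# With the real data this would be: key_coverage(d.entries)
sample = [{"title": "A", "author": "X"}, {"title": "B"}]
common, partial = key_coverage(sample)
print(sorted(common), sorted(partial))  # ['title'] ['author']
```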

### 7. Extract a list of entry titles.

In [143]:
# Collect the title of every entry in the feed
titles = [entry.title for entry in d.entries]
titles

['Radar trends to watch: July 2021',
 'Hand Labeling Considered Harmful',
 'Two economies. Two sets of rules.',
 'Communal Computing',
 'Code as Infrastructure',
 'Radar trends to watch: June 2021',
 'AI Powered Misinformation and Manipulation at Scale #GPT-3',
 'DeepCheapFakes',
 'Radar trends to watch: May 2021',
 'Checking Jeff Bezos’s Math',
 'AI Adoption in the Enterprise 2021',
 'NFTs: Owning Digital Art',
 'Radar trends to watch: April 2021',
 'InfoTribes, Reality Brokers',
 'The End of Silicon Valley as We Know It?',
 'The Next Generation of AI',
 'Radar trends to watch: March 2021',
 'Product Management for AI',
 '5 things on our data and AI radar for 2021',
 '5 infrastructure and operations trends to watch in 2021',
 'The Wrong Question',
 'Radar trends to watch: February 2021',
 'Where Programming, Ops, AI, and the Cloud are Headed in 2021',
 'Seven Legal Questions for Data Scientists',
 'Patterns',
 'Radar trends to watch: January 2021',
 'Four short links: 14 Dec 2020',
 ...

### 8. Calculate the percentage of "Four short links" entry titles.

In [154]:
# Keep only the titles that belong to the "Four short links" series
short_links = [t for t in titles if t.startswith('Four short links')]

p = len(short_links) / len(titles)

print('Percentage: ',str(round(p*100,4)) + ' ' + '%')

Percentage:  38.3333 %
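The same calculation can be wrapped in a small reusable function with a guard against an empty list, which would otherwise raise `ZeroDivisionError`. A minimal sketch; the name `prefix_share` is hypothetical and the sample titles are toy data.

```python
def prefix_share(titles, prefix):
    """Percentage of titles that start with `prefix` (0.0 for an empty list)."""
    if not titles:
        return 0.0
    matches = sum(1 for t in titles if t.startswith(prefix))
    return 100 * matches / len(titles)


# With the real data this would be: prefix_share(titles, 'Four short links')
sample = ["Four short links: a", "Patterns", "Four short links: b", "DeepCheapFakes"]
print(prefix_share(sample, "Four short links"))  # 50.0
```

For the feed above, 23 matching titles out of 60 give roughly 38.3333, matching the printed result.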


### 9. Create a Pandas data frame from the feed's entries.

In [155]:
import pandas as pd

In [157]:
df = pd.json_normalize(d.entries)
df

,title,links,link,comments,published,published_parsed,authors,author,tags,id,...,feedburner_origlink,title_detail.type,title_detail.language,title_detail.base,title_detail.value,author_detail.name,summary_detail.type,summary_detail.language,summary_detail.base,summary_detail.value
0,Radar trends to watch: July 2021,"[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/radar-trends-to-...,"Tue, 06 Jul 2021 17:12:56 +0000","(2021, 7, 6, 17, 12, 56, 1, 187, 0)",[{'name': 'Mike Loukides'}],Mike Loukides,"[{'term': 'Radar Trends', 'scheme': None, 'lab...",https://www.oreilly.com/radar/?p=13856,...,https://www.oreilly.com/radar/radar-trends-to-...,text/plain,,http://feeds.feedburner.com/oreilly/radar/atom,Radar trends to watch: July 2021,Mike Loukides,text/html,,http://feeds.feedburner.com/oreilly/radar/atom,Certainly the biggest news of the past month h...
1,Hand Labeling Considered Harmful,"[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/arguments-agains...,"Wed, 23 Jun 2021 12:34:40 +0000","(2021, 6, 23, 12, 34, 40, 2, 174, 0)",[{'name': 'Shayan Mohanty and Hugo Bowne-Ander...,Shayan Mohanty and Hugo Bowne-Anderson,"[{'term': 'Artificial Intelligence', 'scheme':...",https://www.oreilly.com/radar/?p=13825,...,https://www.oreilly.com/radar/arguments-agains...,text/plain,,http://feeds.feedburner.com/oreilly/radar/atom,Hand Labeling Considered Harmful,Shayan Mohanty and Hugo Bowne-Anderson,text/html,,http://feeds.feedburner.com/oreilly/radar/atom,We are traveling through the era of Software 2...
2,Two economies. Two sets of rules.,"[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/two-economies-tw...,"Tue, 22 Jun 2021 13:07:19 +0000","(2021, 6, 22, 13, 7, 19, 1, 173, 0)",[{'name': 'Tim O’Reilly'}],Tim O’Reilly,"[{'term': 'Bubbles', 'scheme': None, 'label': ...",https://www.oreilly.com/radar/?p=13820,...,https://www.oreilly.com/radar/two-economies-tw...,text/plain,,http://feeds.feedburner.com/oreilly/radar/atom,Two economies. Two sets of rules.,Tim O’Reilly,text/html,,http://feeds.feedburner.com/oreilly/radar/atom,"At one point early this year, Elon Musk briefl..."
3,Communal Computing,"[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/communal-computi...,"Tue, 15 Jun 2021 11:27:36 +0000","(2021, 6, 15, 11, 27, 36, 1, 166, 0)",[{'name': 'Chris Butler'}],Chris Butler,"[{'term': 'AI & ML', 'scheme': None, 'label': ...",https://www.oreilly.com/radar/?p=13812,...,https://www.oreilly.com/radar/communal-computing/,text/plain,,http://feeds.feedburner.com/oreilly/radar/atom,Communal Computing,Chris Butler,text/html,,http://feeds.feedburner.com/oreilly/radar/atom,Home assistants and smart displays are being s...
4,Code as Infrastructure,"[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/code-as-infrastr...,"Tue, 08 Jun 2021 13:22:32 +0000","(2021, 6, 8, 13, 22, 32, 1, 159, 0)",[{'name': 'Mike Loukides'}],Mike Loukides,"[{'term': 'Infrastructure', 'scheme': None, 'l...",https://www.oreilly.com/radar/?p=13808,...,https://www.oreilly.com/radar/code-as-infrastr...,text/plain,,http://feeds.feedburner.com/oreilly/radar/atom,Code as Infrastructure,Mike Loukides,text/html,,http://feeds.feedburner.com/oreilly/radar/atom,"A few months ago, I was asked if there were an..."
5,Radar trends to watch: June 2021,"[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/radar-trends-to-...,"Tue, 01 Jun 2021 13:45:05 +0000","(2021, 6, 1, 13, 45, 5, 1, 152, 0)",[{'name': 'Mike Loukides'}],Mike Loukides,"[{'term': 'Radar Trends', 'scheme': None, 'lab...",https://www.oreilly.com/radar/?p=13803,...,https://www.oreilly.com/radar/radar-trends-to-...,text/plain,,http://feeds.feedburner.com/oreilly/radar/atom,Radar trends to watch: June 2021,Mike Loukides,text/html,,http://feeds.feedburner.com/oreilly/radar/atom,"The most fascinating idea this month is POET, ..."
6,AI Powered Misinformation and Manipulation at ...,"[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/ai-powered-misin...,"Tue, 25 May 2021 14:14:49 +0000","(2021, 5, 25, 14, 14, 49, 1, 145, 0)",[{'name': 'Nitesh Dhanjani'}],Nitesh Dhanjani,"[{'term': 'AI & ML', 'scheme': None, 'label': ...",https://www.oreilly.com/radar/?p=13789,...,https://www.oreilly.com/radar/ai-powered-misin...,text/plain,,http://feeds.feedburner.com/oreilly/radar/atom,AI Powered Misinformation and Manipulation at ...,Nitesh Dhanjani,text/html,,http://feeds.feedburner.com/oreilly/radar/atom,OpenAI’s text generating system GPT-3 has capt...
7,DeepCheapFakes,"[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/deepcheapfakes/#...,"Tue, 11 May 2021 11:58:37 +0000","(2021, 5, 11, 11, 58, 37, 1, 131, 0)",[{'name': 'Mike Loukides'}],Mike Loukides,"[{'term': 'Artificial Intelligence', 'scheme':...",https://www.oreilly.com/radar/?p=13768,...,https://www.oreilly.com/radar/deepcheapfakes/,text/plain,,http://feeds.feedburner.com/oreilly/radar/atom,DeepCheapFakes,Mike Loukides,text/html,,http://feeds.feedburner.com/oreilly/radar/atom,"Back in 2019, Ben Lorica and I wrote about de..."
8,Radar trends to watch: May 2021,"[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/radar-trends-to-...,"Mon, 03 May 2021 14:05:40 +0000","(2021, 5, 3, 14, 5, 40, 0, 123, 0)",[{'name': 'Mike Loukides'}],Mike Loukides,"[{'term': 'Radar Trends', 'scheme': None, 'lab...",https://www.oreilly.com/radar/?p=13755,...,https://www.oreilly.com/radar/radar-trends-to-...,text/plain,,http://feeds.feedburner.com/oreilly/radar/atom,Radar trends to watch: May 2021,Mike Loukides,text/html,,http://feeds.feedburner.com/oreilly/radar/atom,We’ll start with a moment of silence. RIP Dan ...
9,Checking Jeff Bezos’s Math,"[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/checking-jeff-be...,"Fri, 23 Apr 2021 20:43:28 +0000","(2021, 4, 23, 20, 43, 28, 4, 113, 0)",[{'name': 'Tim O’Reilly'}],Tim O’Reilly,"[{'term': 'Business', 'scheme': None, 'label':...",https://www.oreilly.com/radar/?p=13748,...,https://www.oreilly.com/radar/checking-jeff-be...,text/plain,,http://feeds.feedburner.com/oreilly/radar/atom,Checking Jeff Bezos’s Math,Tim O’Reilly,text/html,,http://feeds.feedburner.com/oreilly/radar/atom,“If you want to be successful in business (in ...
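`pd.json_normalize` flattens nested dictionaries into dotted column names, which is where columns such as `title_detail.type` and `summary_detail.value` come from, while list-valued fields such as `links`, `authors`, and `tags` stay in a single column. Below is a minimal standard-library sketch of that flattening rule for one record. It is an illustration, not pandas' actual implementation (for instance, column order may differ), and `flatten` is a hypothetical helper.

```python
def flatten(record, parent="", sep="."):
    """Flatten nested dicts into a single level with dotted keys.

    Lists and other non-dict values are kept as-is, mirroring how
    pd.json_normalize treats fields like 'links' and 'tags' by default.
    """
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested dicts, prefixing their keys with the parent name
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


entry = {
    "title": "Patterns",
    "title_detail": {"type": "text/plain", "value": "Patterns"},
    "links": [{"rel": "alternate", "type": "text/html"}],
}
print(flatten(entry))
```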


### 10. Count the number of entries per author and sort them in descending order.

In [180]:
df.groupby('author')['title'].count().sort_values(ascending=False)

author
Nat Torkington                                    23
Mike Loukides                                     22
                                                   4
Tim O’Reilly                                       3
Chris Butler                                       1
Hugo Bowne-Anderson                                1
Justin Norman and Mike Loukides                    1
Kevlin Henney                                      1
Nitesh Dhanjani                                    1
Patrick Hall and Ayoub Ouederni                    1
Q Ethan McCallum, Chris Butler and Shane Glynn     1
Shayan Mohanty and Hugo Bowne-Anderson             1
Name: title, dtype: int64
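The same tally can be produced without pandas using `collections.Counter`, whose `most_common()` returns (author, count) pairs in descending order of count. A minimal sketch, assuming dict-like entries; it labels entries with a missing or empty author as `'(no author)'` (an illustrative choice) rather than leaving the label blank. As in the pandas version, a co-authored string such as "Shayan Mohanty and Hugo Bowne-Anderson" counts as one author. The function name and sample are hypothetical.

```python
from collections import Counter


def count_by_author(entries, missing="(no author)"):
    """Count entries per author string, most frequent first."""
    counts = Counter(entry.get("author") or missing for entry in entries)
    return counts.most_common()


# With the real data this would be: count_by_author(d.entries)
sample = [{"author": "A"}, {"author": "B"}, {"author": "A"}, {"author": ""}]
print(count_by_author(sample))  # [('A', 2), ('B', 1), ('(no author)', 1)]
```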

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [175]:
df['length_title'] = df['title'].str.len()
dg = df[['title','author','length_title']].sort_values(by='length_title',ascending=False)
dg.reset_index(drop=True).head(10)

,title,author,length_title
0,"Where Programming, Ops, AI, and the Cloud are ...",Mike Loukides,60
1,AI Powered Misinformation and Manipulation at ...,Nitesh Dhanjani,58
2,5 infrastructure and operations trends to watc...,,55
3,O’Reilly’s top 20 live online training courses...,,54
4,5 things on our data and AI radar for 2021,,42
5,Seven Legal Questions for Data Scientists,Patrick Hall and Ayoub Ouederni,41
6,The End of Silicon Valley as We Know It?,Tim O’Reilly,40
7,AI Product Management After Deployment,Justin Norman and Mike Loukides,38
8,Radar trends to watch: November 2020,Mike Loukides,36
9,Radar trends to watch: December 2020,Mike Loukides,36
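For comparison, here is a plain-Python sketch of the same ranking using `list.sort` with a `key` function. Python's sort is stable, so titles of equal length keep their original order. `rank_by_title_length` is a hypothetical name, entries are assumed to be dict-like, and the sample is toy data.

```python
def rank_by_title_length(entries, top=10):
    """Return (title, author, title length) tuples, longest title first."""
    rows = [(e.get("title", ""), e.get("author", ""), len(e.get("title", "")))
            for e in entries]
    # Sort on the length field only; ties keep their input order.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top]


# With the real data this would be: rank_by_title_length(d.entries)
sample = [{"title": "Short", "author": "A"},
          {"title": "A much longer title", "author": "B"},
          {"title": "Mid-size", "author": "C"}]
for row in rank_by_title_length(sample):
    print(row)
```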


### 12. Create a list of entry titles whose summary includes the phrase "machine learning".

In [183]:
# Case-sensitive match on each entry's summary text
list_ml = [entry.title for entry in d.entries if 'machine learning' in entry.summary]
list_ml

['Hand Labeling Considered Harmful',
 'Radar trends to watch: April 2021',
 'Seven Legal Questions for Data Scientists']
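The check above is case-sensitive, so a summary containing "Machine Learning" would not match. A minimal sketch of a case-insensitive variant using `str.casefold()`; the helper name `titles_mentioning` is hypothetical, entries are assumed to be dict-like, and it does not strip the HTML markup that summaries may contain.

```python
def titles_mentioning(entries, phrase):
    """Titles of entries whose summary contains `phrase`, ignoring case."""
    needle = phrase.casefold()
    return [e.get("title", "") for e in entries
            if needle in e.get("summary", "").casefold()]


# With the real data this would be: titles_mentioning(d.entries, 'machine learning')
sample = [{"title": "One", "summary": "<p>Machine Learning at scale</p>"},
          {"title": "Two", "summary": "nothing relevant here"},
          {"title": "Three", "summary": "applied machine learning"}]
print(titles_mentioning(sample, "machine learning"))  # ['One', 'Three']
```

Because the match ignores case, it can only return the same titles as the original cell or additional ones, never fewer.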