# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [3]:
feedburner = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [4]:
feedburner.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])
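
Several of these top-level keys describe the parse itself rather than the feed content. The sketch below is one way to turn them into a quick health report. It assumes the parsed result behaves like a dictionary (as the `keys()` call above suggests) and that a truthy `bozo` value means the feed was not well-formed. `summarize_parse` is a hypothetical helper name, and a plain dict with made-up values stands in for the real result so the snippet runs offline.

```python
def summarize_parse(parsed):
    """Return a short health report for a feedparser-style result.

    `parsed` is assumed to be dict-like, e.g. the object returned by
    feedparser.parse(); keys that are missing fall back to defaults.
    """
    return {
        # bozo is assumed to be truthy when the feed was not well-formed
        "well_formed": not parsed.get("bozo", False),
        "status": parsed.get("status"),
        "version": parsed.get("version"),
        "entry_count": len(parsed.get("entries", [])),
    }


# Stand-in for a parsed feed (hypothetical values, not real output)
sample = {"bozo": 0, "status": 200, "version": "atom10", "entries": [{}, {}]}
report = summarize_parse(sample)
print(report)
```

With the real object, the call would be `summarize_parse(feedburner)`.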

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [5]:
feedburner.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [33]:
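title = feedburner.feed.title
subtitle = feedburner.feed.subtitle
# The feed component has no 'author' key (see step 3), so the author of the
# first entry is used instead. The link printed below is likewise the first
# entry's link; the feed-level link is available as feedburner.feed.link.
author = feedburner.entries[0]['author']
link = feedburner.entries[0]['link']
print(title)
print(subtitle)
print(author)
print(link)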

Radar
Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology
Ben Lorica and Mike Loukides
http://feedproxy.google.com/~r/oreilly/radar/atom/~3/9PQAmHRa6oA/
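
The feed-level keys listed in step 3 do not include `author`, so attribute access such as `feedburner.feed.author` would fail for this feed. A minimal sketch of a fallback lookup follows. It assumes the parsed result and its entries support dictionary-style access, `feed_or_first_entry` is a hypothetical helper name, and the sample data (including the example.com URLs) is invented for illustration.

```python
def feed_or_first_entry(parsed, key, default=None):
    """Look up `key` on the feed, then on the first entry, else `default`."""
    feed = parsed.get("feed", {})
    if key in feed:
        return feed[key]
    entries = parsed.get("entries", [])
    if entries and key in entries[0]:
        return entries[0][key]
    return default


# Plain-dict stand-in for a parsed feed (hypothetical values)
sample = {
    "feed": {"title": "Radar", "link": "https://example.com/feed"},
    "entries": [
        {"author": "Ben Lorica and Mike Loukides",
         "link": "https://example.com/post"},
    ],
}

print(feed_or_first_entry(sample, "author"))  # found on the first entry
print(feed_or_first_entry(sample, "link"))    # found on the feed itself
print(feed_or_first_entry(sample, "rights", default="n/a"))
```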


### 5. Count the number of entries that are contained in this RSS feed.

In [34]:
len(feedburner.entries)

18

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [35]:
components = list(feedburner.entries[0].keys())
components

['title',
 'title_detail',
 'links',
 'link',
 'comments',
 'published',
 'published_parsed',
 'authors',
 'author',
 'author_detail',
 'tags',
 'id',
 'guidislink',
 'summary',
 'summary_detail',
 'content',
 'wfw_commentrss',
 'slash_comments',
 'feedburner_origlink']

### 7. Extract a list of entry titles.

In [44]:
entry_titles = [entry.title for entry in feedburner.entries]
entry_titles

['A world of deepfakes',
 'Radar trends to watch: November 2019',
 'Four short links: 7 November 2019',
 'Highlights from the O’Reilly Velocity Conference in Berlin 2019',
 'Highlights from the O’Reilly Software Architecture Conference in Berlin 2019',
 'Modern machine learning architectures: Data and hardware and platform, oh my',
 'The new norms of cloud native',
 'Observability: Understanding production through your customers’ eyes',
 'Secure reliable systems',
 'My love letter to computer science is very short and I also forgot to mail it',
 'Everything is a little bit broken; or, The illusion of control',
 'Four short links: 6 November 2019',
 'It’s important to cultivate your organization’s collective genius',
 'Four short links: 5 November 2019',
 'Four short links: 4 November 2019',
 'Quantum computing’s potential is still far off, but quantum supremacy shows we’re on the right track',
 'Four short links: 1 November 2019',
 'Highlights from TensorFlow World in Santa Clara, California 2019']

### 8. Calculate the percentage of "Four short links" entry titles.

In [53]:
four_short_links = [e for e in entry_titles if 'Four short links' in e]
percentage = len(four_short_links)/len(entry_titles)*100
print(round(percentage, 2), '%')

27.78 %
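
The result above corresponds to 5 matching titles out of 18 (5 / 18 × 100 ≈ 27.78). The sketch below wraps the same calculation in a function that ignores case and guards against an empty list. `percent_matching` is a hypothetical helper name, and the sample titles (including the lowercase variant) are illustrative rather than taken from the feed.

```python
def percent_matching(titles, phrase):
    """Percentage of titles that contain `phrase`, ignoring case."""
    if not titles:
        return 0.0  # avoid dividing by zero on an empty feed
    needle = phrase.casefold()
    hits = sum(needle in title.casefold() for title in titles)
    return hits / len(titles) * 100


# Illustrative titles; the third one is deliberately lowercase
sample_titles = [
    "Four short links: 7 November 2019",
    "A world of deepfakes",
    "four short links: 6 November 2019",
    "Secure reliable systems",
]

print(f"{percent_matching(sample_titles, 'Four short links'):.2f}%")
```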


### 9. Create a Pandas data frame from the feed's entries.

In [51]:
import pandas as pd

In [56]:
entries = feedburner.entries
df = pd.DataFrame(entries)
df

Unnamed: 0,title,title_detail,links,link,comments,published,published_parsed,authors,author,author_detail,tags,id,guidislink,summary,summary_detail,content,wfw_commentrss,slash_comments,feedburner_origlink
0,A world of deepfakes,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/a-world-of-deepf...,"Thu, 07 Nov 2019 14:00:50 +0000","(2019, 11, 7, 14, 0, 50, 3, 311, 0)",[{'name': 'Ben Lorica and Mike Loukides'}],Ben Lorica and Mike Loukides,{'name': 'Ben Lorica and Mike Loukides'},"[{'term': 'AI & ML', 'scheme': None, 'label': ...",https://www.oreilly.com/radar/?p=10617,False,Deepfakes have been very much in the news for ...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/a-world-of-deepf...,0,https://www.oreilly.com/radar/a-world-of-deepf...
1,Radar trends to watch: November 2019,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/radar-trends-to-...,"Thu, 07 Nov 2019 05:10:11 +0000","(2019, 11, 7, 5, 10, 11, 3, 311, 0)",[{'name': 'Mike Loukides'}],Mike Loukides,{'name': 'Mike Loukides'},"[{'term': 'Radar Trends', 'scheme': None, 'lab...",https://www.oreilly.com/radar/?p=10365,False,5G trends 5G networks get so much commentary t...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/radar-trends-to-...,0,https://www.oreilly.com/radar/radar-trends-to-...
2,Four short links: 7 November 2019,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Thu, 07 Nov 2019 05:01:24 +0000","(2019, 11, 7, 5, 1, 24, 3, 311, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=10731,False,DNS Wars &#8212; But perhaps the position make...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
3,Highlights from the O’Reilly Velocity Conferen...,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/highlights-from-...,"Wed, 06 Nov 2019 18:45:07 +0000","(2019, 11, 6, 18, 45, 7, 2, 310, 0)",[{'name': 'Mac Slocum'}],Mac Slocum,{'name': 'Mac Slocum'},"[{'term': 'Next Architecture', 'scheme': None,...",https://www.oreilly.com/radar/?p=10577,False,People from across the cloud native and distri...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/highlights-from-...,0,https://www.oreilly.com/radar/highlights-from-...
4,Highlights from the O’Reilly Software Architec...,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/highlights-from-...,"Wed, 06 Nov 2019 18:40:44 +0000","(2019, 11, 6, 18, 40, 44, 2, 310, 0)",[{'name': 'Mac Slocum'}],Mac Slocum,{'name': 'Mac Slocum'},"[{'term': 'Next Architecture', 'scheme': None,...",https://www.oreilly.com/radar/?p=10569,False,Experts from across the software architecture ...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/highlights-from-...,0,https://www.oreilly.com/radar/highlights-from-...
5,Modern machine learning architectures: Data an...,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/modern-machine-l...,"Wed, 06 Nov 2019 18:40:13 +0000","(2019, 11, 6, 18, 40, 13, 2, 310, 0)",[{'name': 'Brian Sletten'}],Brian Sletten,{'name': 'Brian Sletten'},"[{'term': 'Next Architecture', 'scheme': None,...",https://www.oreilly.com/radar/?p=10473,False,This is a keynote highlight from the O&#8217;R...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/modern-machine-l...,0,https://www.oreilly.com/radar/modern-machine-l...
6,The new norms of cloud native,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/the-new-norms-of...,"Wed, 06 Nov 2019 18:40:10 +0000","(2019, 11, 6, 18, 40, 10, 2, 310, 0)",[{'name': 'Cheryl Hung'}],Cheryl Hung,{'name': 'Cheryl Hung'},"[{'term': 'Next Architecture', 'scheme': None,...",https://www.oreilly.com/radar/?p=10484,False,This is a keynote highlight from the O&#8217;R...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/the-new-norms-of...,0,https://www.oreilly.com/radar/the-new-norms-of...
7,Observability: Understanding production throug...,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/observability-un...,"Wed, 06 Nov 2019 18:30:32 +0000","(2019, 11, 6, 18, 30, 32, 2, 310, 0)",[{'name': 'Christine Yen'}],Christine Yen,{'name': 'Christine Yen'},"[{'term': 'Next Architecture', 'scheme': None,...",https://www.oreilly.com/radar/?p=10517,False,This is a keynote highlight from the O&#8217;R...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/observability-un...,0,https://www.oreilly.com/radar/observability-un...
8,Secure reliable systems,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/secure-reliable-...,"Wed, 06 Nov 2019 18:30:29 +0000","(2019, 11, 6, 18, 30, 29, 2, 310, 0)",[{'name': 'Ana Oprea'}],Ana Oprea,{'name': 'Ana Oprea'},"[{'term': 'Next Architecture', 'scheme': None,...",https://www.oreilly.com/radar/?p=10524,False,This is a keynote highlight from the O&#8217;R...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/secure-reliable-...,0,https://www.oreilly.com/radar/secure-reliable-...
9,My love letter to computer science is very sho...,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/my-love-letter-t...,"Wed, 06 Nov 2019 18:30:20 +0000","(2019, 11, 6, 18, 30, 20, 2, 310, 0)",[{'name': 'James Mickens'}],James Mickens,{'name': 'James Mickens'},"[{'term': 'Next Architecture', 'scheme': None,...",https://www.oreilly.com/radar/?p=10510,False,This is a keynote highlight from the O&#8217;R...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/my-love-letter-t...,0,https://www.oreilly.com/radar/my-love-letter-t...
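
The `published_parsed` column holds 9-field time tuples, which are awkward to sort or filter on directly. One option is to convert them to timezone-aware `datetime` objects. The sketch below assumes the tuples are expressed in UTC, and `to_utc_datetime` is a hypothetical helper name. The sample tuple is the one shown for the first row above.

```python
from datetime import datetime, timezone


def to_utc_datetime(parsed_time):
    """Convert a 9-field time tuple (assumed UTC) to an aware datetime."""
    # Only the first six fields (year .. second) are needed
    return datetime(*parsed_time[:6], tzinfo=timezone.utc)


published_parsed = (2019, 11, 7, 14, 0, 50, 3, 311, 0)
dt = to_utc_datetime(published_parsed)
print(dt.isoformat())  # 2019-11-07T14:00:50+00:00
```

Applied to the data frame, this could look like `df['published_parsed'].apply(to_utc_datetime)`.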


### 10. Count the number of entries per author and sort them in descending order.

In [58]:
authors = df.groupby('author', as_index=False).agg({'title':'count'})
authors.columns = ['author', 'entries']
authors.sort_values('entries', ascending=False)

| | author | entries |
|---|---|---|
| 10 | Nat Torkington | 5 |
| 8 | Mac Slocum | 3 |
| 9 | Mike Loukides | 2 |
| 0 | Ana Oprea | 1 |
| 1 | Ben Lorica and Mike Loukides | 1 |
| 2 | Brian Sletten | 1 |
| 3 | Cheryl Hung | 1 |
| 4 | Christine Yen | 1 |
| 5 | Heidi Waterhouse | 1 |
| 6 | James Mickens | 1 |
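
The same count can be produced without a data frame. This sketch uses `collections.Counter` from the standard library as an alternative to the `groupby` approach above. It assumes each entry supports dictionary-style `.get()`, `entries_per_author` is a hypothetical helper name, and the sample entries are stand-ins rather than the real feed.

```python
from collections import Counter


def entries_per_author(entries):
    """Return (author, count) pairs, most prolific author first."""
    counts = Counter(entry.get("author", "unknown") for entry in entries)
    return counts.most_common()


# Stand-in entries; the last one has no author on purpose
sample_entries = [
    {"author": "Nat Torkington"},
    {"author": "Mac Slocum"},
    {"author": "Nat Torkington"},
    {"title": "An entry without an author"},
]

for author, count in entries_per_author(sample_entries):
    print(author, count)
```

With the real feed, the call would be `entries_per_author(feedburner.entries)`.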


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [59]:
df['title_length'] = df['title'].apply(len)
df[['title', 'author', 'title_length']].sort_values('title_length', ascending=False)

| | title | author | title_length |
|---|---|---|---|
| 15 | Quantum computing’s potential is still far off... | Mike Loukides | 100 |
| 9 | My love letter to computer science is very sho... | James Mickens | 77 |
| 4 | Highlights from the O’Reilly Software Architec... | Mac Slocum | 76 |
| 5 | Modern machine learning architectures: Data an... | Brian Sletten | 76 |
| 7 | Observability: Understanding production throug... | Christine Yen | 68 |
| 12 | It’s important to cultivate your organization’... | Jenn Webb | 65 |
| 17 | Highlights from TensorFlow World in Santa Clar... | Mac Slocum | 64 |
| 3 | Highlights from the O’Reilly Velocity Conferen... | Mac Slocum | 63 |
| 10 | Everything is a little bit broken; or, The ill... | Heidi Waterhouse | 62 |
| 1 | Radar trends to watch: November 2019 | Mike Loukides | 36 |


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [83]:
text = "machine learning"
# Keep the title of every entry whose summary contains the phrase
machine_learning = [entry.title for entry in feedburner.entries if text in entry.summary]
machine_learning

['Highlights from the O’Reilly Software Architecture Conference in Berlin 2019',
 'Highlights from TensorFlow World in Santa Clara, California 2019']
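
The match above is case-sensitive, and the summaries shown in step 9 contain HTML entities such as `&#8217;`. A more forgiving search might decode the entities and ignore case, as in the sketch below. `titles_mentioning` is a hypothetical helper name, the entries are assumed to support dictionary-style access, and the sample entries are invented for illustration.

```python
import html


def titles_mentioning(entries, phrase):
    """Titles of entries whose summary contains `phrase`, ignoring case."""
    needle = phrase.casefold()
    return [
        entry["title"]
        for entry in entries
        # Decode entities such as &#8217; before comparing
        if needle in html.unescape(entry.get("summary", "")).casefold()
    ]


# Stand-in entries (hypothetical titles and summaries)
sample_entries = [
    {"title": "A", "summary": "Machine Learning at scale"},
    {"title": "B", "summary": "A keynote from the O&#8217;Reilly conference"},
    {"title": "C", "summary": "machine learning in production"},
]

print(titles_mentioning(sample_entries, "machine learning"))
print(titles_mentioning(sample_entries, "O’Reilly"))
```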