# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [2]:
!pip install feedparser
import feedparser



### 1. Use feedparser to parse the following RSS feed URL.

In [3]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [24]:
d = feedparser.parse(url)
print(d)



### 2. Obtain a list of components (keys) that are available for this feed.

In [5]:
list(d)

['feed',
 'entries',
 'bozo',
 'headers',
 'updated',
 'updated_parsed',
 'href',
 'status',
 'encoding',
 'version',
 'namespaces']
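Two of the keys listed above, `bozo` and `status`, are useful for a quick sanity check before working with the entries. The following is a minimal sketch, not part of the lab solution: the helper name `find_feed_problems` and the plain dictionaries are hypothetical stand-ins for the object returned by `feedparser.parse`, and treating any status of 400 or above as an error is an assumption.

```python
def find_feed_problems(parsed):
    """Return a list of human-readable problems found in a parsed feed result."""
    problems = []
    # feedparser sets 'bozo' to a truthy value when the feed is not well-formed.
    if parsed.get("bozo"):
        problems.append(f"feed is not well-formed: {parsed.get('bozo_exception')}")
    # 'status' is only present when the feed was fetched over HTTP.
    status = parsed.get("status")
    if status is not None and status >= 400:
        problems.append(f"HTTP error status {status}")
    if not parsed.get("entries"):
        problems.append("no entries found")
    return problems


# Hypothetical stand-ins for parsed results (not real feed data).
healthy = {"bozo": False, "status": 200, "entries": [{"title": "Example entry"}]}
broken = {"bozo": True, "bozo_exception": "mismatched tag", "status": 404, "entries": []}

print(find_feed_problems(healthy))
print(find_feed_problems(broken))
```

Because the parsed result supports dictionary-style access, the same helper should also accept `d` directly.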

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [37]:
list(d.feed)

['title',
 'title_detail',
 'links',
 'link',
 'subtitle',
 'subtitle_detail',
 'updated',
 'updated_parsed',
 'language',
 'sy_updateperiod',
 'sy_updatefrequency',
 'generator_detail',
 'generator',
 'feedburner_info',
 'geo_lat',
 'geo_long',
 'feedburner_emailserviceid',
 'feedburner_feedburnerhostname']

### 4. Extract and print the feed title, subtitle, author, and link.

In [8]:
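print(d.feed.title)
print(d.feed.subtitle)
# This feed has no feed-level 'author' key (see the keys listed in exercise 3), so no author is printed.
print(d.feed.link)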

Radar
Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology
https://www.oreilly.com/radar
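The exercise asks for an author, but the feed keys listed in exercise 3 do not include one. A defensive lookup with `.get()` avoids an error when an optional field is missing. This is a minimal sketch: the helper `describe_feed`, the placeholder text, and the plain dictionary standing in for `d.feed` are illustrative assumptions; the values are copied from the output above.

```python
# Stand-in for d.feed, holding only the fields printed above.
feed = {
    "title": "Radar",
    "subtitle": (
        "Now, next, and beyond: Tracking need-to-know trends "
        "at the intersection of business and technology"
    ),
    "link": "https://www.oreilly.com/radar",
}


def describe_feed(feed, fields=("title", "subtitle", "author", "link")):
    """Look up each field, substituting a placeholder when it is absent."""
    return {field: feed.get(field, "(not provided)") for field in fields}


summary = describe_feed(feed)
for field, value in summary.items():
    print(f"{field}: {value}")
```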


### 5. Count the number of entries that are contained in this RSS feed.

In [35]:
print(len(d.entries))

18


### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [38]:
list(d.entries[0])

['title',
 'title_detail',
 'links',
 'link',
 'comments',
 'published',
 'published_parsed',
 'authors',
 'author',
 'author_detail',
 'tags',
 'id',
 'guidislink',
 'summary',
 'summary_detail',
 'content',
 'wfw_commentrss',
 'slash_comments',
 'feedburner_origlink']
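The `published_parsed` key holds a nine-field time tuple rather than a string, which makes date arithmetic easier once it is converted. The sketch below assumes, as the feedparser documentation describes, that these tuples are expressed in UTC. The helper name `to_datetime` is hypothetical, and the tuple is the first entry's value from this feed.

```python
import time
from datetime import datetime, timezone


def to_datetime(parsed_time):
    """Convert a nine-field time tuple to a timezone-aware datetime, assuming UTC."""
    # The first six fields are year, month, day, hour, minute, second.
    return datetime(*parsed_time[:6], tzinfo=timezone.utc)


# Stand-in for d.entries[0].published_parsed.
published_parsed = time.struct_time((2019, 12, 12, 11, 0, 0, 3, 346, 0))
published = to_datetime(published_parsed)
print(published.isoformat())
```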

### 7. Extract a list of entry titles.

In [48]:
# Iterate over the entries directly instead of hard-coding the entry count.
entry_t = [entry.title for entry in d.entries]

### 8. Calculate the percentage of "Four short links" entry titles.

In [51]:
entry_t2 = [title for title in entry_t if title.startswith('Four short links')]
print(len(entry_t2)/len(entry_t)*100)


66.66666666666666
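The same calculation can be wrapped in a small function that also handles an empty list, which would otherwise raise `ZeroDivisionError`. This is a sketch only: the function name `share_starting_with` is hypothetical, and the four titles are a sample taken from this feed's entries, not the full list of 18.

```python
def share_starting_with(titles, prefix):
    """Return the percentage of titles that start with the given prefix."""
    if not titles:
        return 0.0  # avoid dividing by zero when the feed has no entries
    matches = sum(1 for title in titles if title.startswith(prefix))
    return 100 * matches / len(titles)


# A four-title sample from the feed (assumption: used here only for illustration).
sample_titles = [
    "Four short links: 12 December 2019",
    "Four short links: 11 December 2019",
    "The road to Software 2.0",
    "Four short links: 10 December 2019",
]

share = share_starting_with(sample_titles, "Four short links")
print(share)
```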


### 9. Create a Pandas data frame from the feed's entries.

In [52]:
import pandas as pd

In [57]:
df = pd.DataFrame(d.entries)
df.head()

| | title | title_detail | links | link | comments | published | published_parsed | authors | author | author_detail | tags | id | guidislink | summary | summary_detail | content | wfw_commentrss | slash_comments | feedburner_origlink |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Why you should care about debugging machine le... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/why-you-should-c... | Thu, 12 Dec 2019 11:00:00 +0000 | (2019, 12, 12, 11, 0, 0, 3, 346, 0) | [{'name': 'Patrick Hall and Andrew Burt'}] | Patrick Hall and Andrew Burt | {'name': 'Patrick Hall and Andrew Burt'} | [{'term': 'AI & ML', 'scheme': None, 'label': ... | https://www.oreilly.com/radar/?p=11197 | False | For all the excitement about machine learning ... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/why-you-should-c... | 0 | https://www.oreilly.com/radar/why-you-should-c... |
| 1 | Four short links: 12 December 2019 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/four-short-links... | Thu, 12 Dec 2019 05:01:00 +0000 | (2019, 12, 12, 5, 1, 0, 3, 346, 0) | [{'name': 'Nat Torkington'}] | Nat Torkington | {'name': 'Nat Torkington'} | [{'term': 'Four Short Links', 'scheme': None, ... | https://www.oreilly.com/radar/?p=11251 | False | Social Science One Advisory Group Fingers Face... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... | 0 | https://www.oreilly.com/radar/four-short-links... |
| 2 | Four short links: 11 December 2019 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/four-short-links... | Wed, 11 Dec 2019 05:01:00 +0000 | (2019, 12, 11, 5, 1, 0, 2, 345, 0) | [{'name': 'Nat Torkington'}] | Nat Torkington | {'name': 'Nat Torkington'} | [{'term': 'Four Short Links', 'scheme': None, ... | https://www.oreilly.com/radar/?p=11222 | False | disaster.radio &#8212; a disaster-resilient co... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... | 0 | https://www.oreilly.com/radar/four-short-links... |
| 3 | The road to Software 2.0 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/the-road-to-soft... | Tue, 10 Dec 2019 11:00:00 +0000 | (2019, 12, 10, 11, 0, 0, 1, 344, 0) | [{'name': 'Mike Loukides and Ben Lorica'}] | Mike Loukides and Ben Lorica | {'name': 'Mike Loukides and Ben Lorica'} | [{'term': 'AI & ML', 'scheme': None, 'label': ... | https://www.oreilly.com/radar/?p=11155 | False | Roughly a year ago, we wrote “What machine lea... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/the-road-to-soft... | 0 | https://www.oreilly.com/radar/the-road-to-soft... |
| 4 | Four short links: 10 December 2019 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/four-short-links... | Tue, 10 Dec 2019 05:01:00 +0000 | (2019, 12, 10, 5, 1, 0, 1, 344, 0) | [{'name': 'Nat Torkington'}] | Nat Torkington | {'name': 'Nat Torkington'} | [{'term': 'Four Short Links', 'scheme': None, ... | https://www.oreilly.com/radar/?p=11192 | False | The Hidden Worries of Facial Recognition Techn... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... | 0 | https://www.oreilly.com/radar/four-short-links... |


### 10. Count the number of entries per author and sort them in descending order.

In [62]:
df['author'].value_counts().sort_values(ascending=False)

Nat Torkington                  12
Mike Loukides                    2
Mike Loukides and Ben Lorica     1
Jenn Webb                        1
Patrick Hall and Andrew Burt     1
Pamela Rucker                    1
Name: author, dtype: int64
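For comparison, the same tally can be produced without pandas using `collections.Counter` from the standard library. This is a sketch under stated assumptions: the list below holds only the authors of the first five entries shown in exercise 9, so the counts differ from the full result above.

```python
from collections import Counter

# Authors of the first five entries (a sample, not the full feed).
sample_authors = [
    "Patrick Hall and Andrew Burt",
    "Nat Torkington",
    "Nat Torkington",
    "Mike Loukides and Ben Lorica",
    "Nat Torkington",
]

author_counts = Counter(sample_authors)

# most_common() returns (author, count) pairs sorted by count, highest first.
for author, count in author_counts.most_common():
    print(f"{author}: {count}")
```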

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [69]:
# Use a descriptive column name and the vectorized string accessor.
df['title_length'] = df['title'].str.len()
df[['title', 'author', 'title_length']].sort_values(by='title_length', ascending=False)

| | title | author | title_length |
| --- | --- | --- | --- |
| 0 | Why you should care about debugging machine le... | Patrick Hall and Andrew Burt | 59 |
| 17 | Moving AI and ML from research into production | Jenn Webb | 46 |
| 10 | Use your people as competitive advantage | Pamela Rucker | 40 |
| 7 | Radar trends to watch: December 2019 | Mike Loukides | 36 |
| 2 | Four short links: 11 December 2019 | Nat Torkington | 34 |
| 4 | Four short links: 10 December 2019 | Nat Torkington | 34 |
| 1 | Four short links: 12 December 2019 | Nat Torkington | 34 |
| 14 | Four short links: 29 November 2019 | Nat Torkington | 34 |
| 15 | Four short links: 28 November 2019 | Nat Torkington | 34 |
| 16 | Four short links: 27 November 2019 | Nat Torkington | 34 |
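The ranking above can be cross-checked on a few titles with plain Python. This is a small sketch, not the lab's method: the four titles are copied from the output above, and the variable names are illustrative.

```python
# Four untruncated titles taken from the output above.
sample_titles = [
    "Four short links: 11 December 2019",
    "Radar trends to watch: December 2019",
    "Moving AI and ML from research into production",
    "Use your people as competitive advantage",
]

# Pair each title with its character count, then sort longest first.
ranked = sorted(
    ((title, len(title)) for title in sample_titles),
    key=lambda pair: pair[1],
    reverse=True,
)

for title, length in ranked:
    print(length, title)
```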


### 12. Create a list of entry titles whose summary includes the phrase "machine learning".

In [75]:
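# Literal, case-sensitive substring match; entries without a summary are treated as non-matches.
ml_list = df.loc[df['summary'].str.contains('machine learning', regex=False, na=False), 'title'].tolist()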
display(ml_list)

['Why you should care about debugging machine learning models',
 'The road to Software 2.0',
 'Moving AI and ML from research into production']
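The filter above is case-sensitive, so a summary that says "Machine Learning" would be missed. A case-insensitive variant can be sketched as follows. The helper name `titles_mentioning` is hypothetical, and the three records are invented sample data, not entries from this feed.

```python
def titles_mentioning(entries, phrase):
    """Return titles of entries whose summary contains the phrase, ignoring case."""
    needle = phrase.casefold()
    return [
        entry["title"]
        for entry in entries
        # A missing summary is treated as an empty string.
        if needle in entry.get("summary", "").casefold()
    ]


# Invented sample records standing in for d.entries.
sample_entries = [
    {"title": "Sample entry A", "summary": "Notes on Machine Learning in production."},
    {"title": "Sample entry B", "summary": "A roundup of four short links."},
    {"title": "Sample entry C", "summary": "Debugging machine learning models."},
    {"title": "Sample entry D"},
]

ml_titles = titles_mentioning(sample_entries, "machine learning")
print(ml_titles)
```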