# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [6]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [8]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [13]:
rss = feedparser.parse(url)
rss

{'feed': {'title': 'Radar',
  'title_detail': {'type': 'text/plain',
   'language': None,
   'base': 'http://feeds.feedburner.com/oreilly/radar/atom',
   'value': 'Radar'},
  'links': [{'rel': 'alternate',
    'type': 'text/html',
    'href': 'https://www.oreilly.com/radar'},
   {'rel': 'self',
    'type': 'application/rss+xml',
    'href': 'http://feeds.feedburner.com/oreilly/radar/atom'},
   {'rel': 'hub',
    'href': 'http://pubsubhubbub.appspot.com/',
    'type': 'text/html'}],
  'link': 'https://www.oreilly.com/radar',
  'subtitle': 'Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology',
  'subtitle_detail': {'type': 'text/html',
   'language': None,
   'base': 'http://feeds.feedburner.com/oreilly/radar/atom',
   'value': 'Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology'},
  'updated': 'Wed, 13 Nov 2019 12:06:03 +0000',
  'updated_parsed': time.struct_time(tm_year=2019, tm_mon=11, tm_m

### 2. Obtain a list of components (keys) that are available for this feed.

In [11]:
rss.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [12]:
rss.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [15]:
# The feed-level keys listed in exercise 3 include no 'author', so only three fields are printed
print(rss.feed.title)
print()
print(rss.feed.subtitle)
print()
print(rss.feed.link)

Radar

Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology

https://www.oreilly.com/radar
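
Exercise 4 asks for an author, but a feed does not always carry one. A minimal sketch of reading optional fields safely is shown here, using a plain dict with sample values as a stand-in for `rss.feed` and a hypothetical helper named `describe`. feedparser results are dictionary-like, so `.get()` with a default should work the same way on the real object, though that is worth confirming against your installed version.

```python
# Plain dict standing in for rss.feed (sample values, not fetched from the feed)
feed = {"title": "Radar", "link": "https://www.oreilly.com/radar"}


def describe(feed, fields=("title", "subtitle", "author", "link")):
    """Return one 'name: value' line per field, with a placeholder for missing keys."""
    return [f"{name}: {feed.get(name, 'not provided')}" for name in fields]


lines = describe(feed)
for line in lines:
    print(line)
```

Using `.get()` avoids an exception when a key such as `author` is absent.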


### 5. Count the number of entries that are contained in this RSS feed.

In [20]:
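# Print the number of entries, then display the entries themselves for inspection
print(len(rss.entries))
rss.entries

18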
[{'title': 'Four short links: 13 November 2019',
  'title_detail': {'type': 'text/plain',
   'language': None,
   'base': 'http://feeds.feedburner.com/oreilly/radar/atom',
   'value': 'Four short links: 13 November 2019'},
  'links': [{'rel': 'alternate',
    'type': 'text/html',
    'href': 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/ID1RcQfYgOk/'}],
  'link': 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/ID1RcQfYgOk/',
  'comments': 'https://www.oreilly.com/radar/four-short-links-13-november-2019/#respond',
  'published': 'Wed, 13 Nov 2019 05:01:22 +0000',
  'published_parsed': time.struct_time(tm_year=2019, tm_mon=11, tm_mday=13, tm_hour=5, tm_min=1, tm_sec=22, tm_wday=2, tm_yday=317, tm_isdst=0),
  'authors': [{'name': 'Nat Torkington'}],
  'author': 'Nat Torkington',
  'author_detail': {'name': 'Nat Torkington'},
  'tags': [{'term': 'Four Short Links', 'scheme': None, 'label': None},
   {'term': 'Signals', 'scheme': None, 'label': None}],
  'id': 'https://www.ore

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*
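
The hint matters because the entries are held in a list, and a list has no `.keys()` method; only the individual entries do. A small sketch with made-up entries (plain dicts standing in for feedparser entries) illustrates the difference:

```python
# Hypothetical entries: plain dicts standing in for items of rss.entries
entries = [
    {"title": "First post", "author": "A. Writer", "summary": "..."},
    {"title": "Second post", "summary": "..."},
]

# The list itself has no keys; index one entry first
print(hasattr(entries, "keys"))

first_keys = list(entries[0].keys())
print(first_keys)

# Entries need not share the same keys, so it can help to see which are common to all
common_keys = set(entries[0]).intersection(*entries[1:])
print(sorted(common_keys))
```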

In [24]:
list(rss.entries[0].keys())

['title',
 'title_detail',
 'links',
 'link',
 'comments',
 'published',
 'published_parsed',
 'authors',
 'author',
 'author_detail',
 'tags',
 'id',
 'guidislink',
 'summary',
 'summary_detail',
 'content',
 'wfw_commentrss',
 'slash_comments',
 'feedburner_origlink']

### 7. Extract a list of entry titles.

In [37]:
titles = [entry.title for entry in rss.entries]
titles

['Four short links: 13 November 2019',
 'O’Reilly serverless survey 2019: Concerns, what works, and what to expect',
 'Improving the military UX',
 'Four short links: 12 November 2019',
 'Four short links: 11 November 2019',
 'Bitcoin and the disruption of monetary oppression',
 'Four short links: 8 November 2019',
 'Highlights from the O’Reilly Software Architecture Conference in Berlin 2019',
 'Highlights from the O’Reilly Velocity Conference in Berlin 2019',
 'From the trenches: Patrick Kua',
 '5 things Go taught me about open source?',
 'Building high-performing engineering teams, one pixel at a time',
 'How to deploy infrastructure in just 13.8 billion years',
 'Controlled chaos: The inevitable marriage of DevOps and security',
 'The ultimate guide to complicated systems',
 'Cognitive biases in the architect’s life',
 'The three-headed dog: Architecture, process, structure',
 'A world of deepfakes']

### 8. Calculate the percentage of "Four short links" entry titles.

In [40]:
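# Count the titles containing the phrase, then express that count as a percentage of all titles
count = sum("Four short links" in title for title in titles)
round(100 * count / len(titles), 2)

22.22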
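
A percentage is the number of matching titles divided by the total number of titles, multiplied by 100. The sketch below wraps that arithmetic in a hypothetical helper, `percent_matching`, and runs it on a shortened sample list rather than the live feed:

```python
def percent_matching(titles, phrase):
    """Return the percentage of titles that contain the phrase."""
    if not titles:
        return 0.0  # avoid dividing by zero on an empty feed
    matches = sum(phrase in title for title in titles)
    return 100 * matches / len(titles)


# Shortened sample taken from the titles listed earlier
sample_titles = [
    "Four short links: 13 November 2019",
    "Improving the military UX",
    "A world of deepfakes",
    "From the trenches: Patrick Kua",
]

pct = percent_matching(sample_titles, "Four short links")
print(f"{pct:.1f}%")
```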

### 9. Create a Pandas data frame from the feed's entries.

In [50]:
import pandas as pd

In [57]:
df=pd.DataFrame(rss.entries)


### 10. Count the number of entries per author and sort them in descending order.

In [62]:
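# Count titles per author, name the count column 'entries', then sort by it
author_counts = df.groupby('author', as_index=False).agg({'title': 'count'})
author_counts = author_counts.rename(columns={'title': 'entries'})
author_counts.sort_values('entries', ascending=False)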
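
Counting entries per author and sorting in descending order can also be sketched without pandas, using `collections.Counter` from the standard library. The author list below is a made-up sample (the names appear in this feed, but the counts are illustrative only):

```python
from collections import Counter

# Hypothetical sample of the 'author' value from each entry
sample_authors = [
    "Nat Torkington",
    "Mac Slocum",
    "Nat Torkington",
    "Kelly Shortridge",
    "Nat Torkington",
    "Mac Slocum",
]

# most_common() returns (author, count) pairs sorted by count, highest first
author_counts = Counter(sample_authors).most_common()
for author, n_entries in author_counts:
    print(f"{author}: {n_entries}")
```

With a real data frame, the same list could be obtained from the `author` column before counting.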

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [63]:
df['title_length'] = df['title'].apply(len)
df[['title', 'author', 'title_length']].sort_values('title_length', ascending=False).head()

| | title | author | title_length |
|---|---|---|---|
| 7 | Highlights from the O’Reilly Software Architec... | Mac Slocum | 76 |
| 1 | O’Reilly serverless survey 2019: Concerns, wha... | Roger Magoulas and Chris Guzikowski | 73 |
| 13 | Controlled chaos: The inevitable marriage of D... | Kelly Shortridge | 64 |
| 8 | Highlights from the O’Reilly Velocity Conferen... | Mac Slocum | 63 |
| 11 | Building high-performing engineering teams, on... | Lena Reinhard | 63 |


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [68]:
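# `"machine learning" in df.title` checks the index labels and yields a single False; test each summary instead
df.loc[df['summary'].str.contains('machine learning', case=False, na=False), 'title'].tolist()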
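
The filter in this exercise is a per-entry test: keep a title only if that entry's summary contains the phrase. A plain-Python sketch of that logic follows, using made-up entries and a hypothetical helper named `titles_mentioning`; it assumes a case-insensitive match is wanted and that some entries may lack a summary.

```python
# Hypothetical entries: plain dicts standing in for items of rss.entries
sample_entries = [
    {"title": "Post A", "summary": "An introduction to Machine Learning pipelines."},
    {"title": "Post B", "summary": "Notes on serverless adoption."},
    {"title": "Post C"},  # no summary at all
]


def titles_mentioning(entries, phrase):
    """Return titles of entries whose summary contains the phrase, ignoring case."""
    phrase = phrase.lower()
    return [
        entry["title"]
        for entry in entries
        if phrase in entry.get("summary", "").lower()
    ]


matches = titles_mentioning(sample_entries, "machine learning")
print(matches)
```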