# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [3]:
feedburner = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [4]:
key_list = list(feedburner.keys())
print(key_list)

['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces']


### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [5]:
feed_key_list = list(feedburner.feed.keys())
print(feed_key_list)

['title', 'title_detail', 'id', 'guidislink', 'link', 'updated', 'updated_parsed', 'subtitle', 'subtitle_detail', 'links', 'authors', 'author_detail', 'author', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname']


### 4. Extract and print the feed title, subtitle, author, and link.

In [6]:
title = feedburner.feed.title
subtitle = feedburner.feed.subtitle
author = feedburner.feed.author
link = feedburner.feed.link

In [7]:
print('Title: {}. \nSubtitle: {}. \nAuthor: {}. \nLink: {}.'.format(title, subtitle, author, link))

Title: All - O'Reilly Media. 
Subtitle: All of our Ideas and Learning material from all of our topics.. 
Author: O'Reilly Media. 
Link: https://www.oreilly.com.


### 5. Count the number of entries that are contained in this RSS feed.

In [8]:
len(feedburner.entries)

60

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index into the entries list first, then request the keys of a single entry.*

In [9]:
entries_key_list = list(feedburner.entries[0].keys())
print(entries_key_list)

['title', 'title_detail', 'updated', 'updated_parsed', 'id', 'guidislink', 'link', 'content', 'summary', 'links', 'authors', 'author_detail', 'author', 'feedburner_origlink']


### 7. Extract a list of entry titles.

In [10]:
titles = [entry['title'] for entry in feedburner.entries]

### 8. Calculate the percentage of "Four short links" entry titles.

In [11]:
four_short_entries = [title for title in titles if "Four short links" in title]
percentage = 100 * len(four_short_entries) / len(titles)

In [12]:
print("{}%".format(percentage))

63.333333333333336%
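
The raw float above is hard to read, and the substring test is case-sensitive. As a hedged alternative sketch (the `sample_titles` list is hypothetical and merely stands in for the `titles` list from exercise 7), the share can be computed with a case-insensitive prefix match and printed with a percent format specifier:

```python
# Hypothetical titles, standing in for the list built in exercise 7.
sample_titles = [
    'Four short links: example A',
    'Rethinking informed consent',
    'Four short links: example B',
    'four short links: example C',
]

# Compare in lowercase so differently capitalized titles still match.
matches = [t for t in sample_titles if t.lower().startswith('four short links')]
share = len(matches) / len(sample_titles)

# The '.1%' format spec multiplies by 100 and rounds to one decimal place.
print(f'{share:.1%}')
```

Using `startswith` rather than `in` is a design choice: it counts only titles that begin with the phrase, which may or may not be what the exercise intends.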


### 9. Create a Pandas data frame from the feed's entries.

In [13]:
import pandas as pd

In [14]:
entries = pd.DataFrame(feedburner.entries)

### 10. Count the number of entries per author and sort them in descending order.

In [15]:
num_entries = entries.groupby(['author']).size().sort_values(ascending=False)
num_entries

author
Nat Torkington                 38
Ben Lorica                      7
Mike Loukides                   2
Tyler Ortman, Jeff Bleiel       1
Tim O'Reilly                    1
Roger Magoulas, Andy Oram       1
Nikki McDonald, John Devins     1
Liam Li, Ameet Talwalkar        1
Laurent Gil                     1
Jennifer Pollock                1
Jenn Webb                       1
Deepak Kanungo                  1
Ben Lorica, Paco Nathan         1
Ben Lorica, Mike Loukides       1
dtype: int64
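
Note that the grouping above treats a joint byline such as "Ben Lorica, Mike Loukides" as its own author. If per-person counts are wanted instead, one possible approach is to split each byline on commas before counting. This is a sketch under assumptions, not the lab's expected answer: `sample_authors` is a hypothetical list in the same "Name, Name" style as the output above, and it assumes no individual name contains a comma.

```python
from collections import Counter

# Hypothetical bylines in the same "Name, Name" style as the output above.
sample_authors = [
    'Ben Lorica',
    'Ben Lorica, Paco Nathan',
    'Ben Lorica, Mike Loukides',
    'Mike Loukides',
]

per_author = Counter()
for byline in sample_authors:
    # Split the byline on commas and credit each named author once.
    for name in byline.split(','):
        per_author[name.strip()] += 1

# most_common() returns (author, count) pairs sorted by count, descending.
for name, count in per_author.most_common():
    print(name, count)
```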

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [16]:
entries['title_length'] = entries['title'].str.len()
len_entries = entries[['author', 'title', 'title_length']].sort_values('title_length', ascending=False)
len_entries

| | author | title | title_length |
|---|---|---|---|
| 17 | Deepak Kanungo | The trinity of errors in financial models: An ... | 96 |
| 2 | Ben Lorica | Artificial intelligence and machine learning a... | 76 |
| 42 | | 250+ live online training courses opened for J... | 73 |
| 6 | Ben Lorica | Using machine learning and analytics to attrac... | 68 |
| 9 | Ben Lorica, Paco Nathan | How companies are building sustainable AI and ... | 60 |
| 56 | | 10 top AWS resources on O’Reilly’s online lear... | 59 |
| 24 | Tyler Ortman, Jeff Bleiel | What lies ahead for Python, Java, Go, C#, Kotl... | 58 |
| 19 | Nikki McDonald, John Devins | 9 trends to watch in systems engineering and o... | 55 |
| 39 | Ben Lorica | In the age of AI, fundamental value resides in... | 51 |
| 22 | Ben Lorica | How machine learning impacts information security | 49 |


### 12. Create a list of entry titles whose summary includes the phrase "machine learning".

In [17]:
ml_titles = entries[entries['summary'].str.contains('machine learning')]
ml_titles_list = list(ml_titles['title'])
ml_titles_list

['3 emerging trends tech leaders should watch',
 'Artificial intelligence and machine learning adoption in European enterprise',
 'Reinforcement learning for the birds',
 'Using machine learning and analytics to attract and retain employees',
 'How companies are building sustainable AI and ML initiatives',
 'Rethinking informed consent',
 '7 web dev trends on our radar',
 'The trinity of errors in financial models: An introductory analysis using TensorFlow Probability',
 '9 trends to watch in systems engineering and operations',
 'Four short links: 18 January 2019',
 'How machine learning impacts information security',
 'What lies ahead for Python, Java, Go, C#, Kotlin, and Rust',
 'AI brings speed to security',
 'Overcoming barriers to AI adoption',
 'Four short links: 15 January 2019',
 '9 AI trends on our radar',
 'Gradually, then suddenly',
 '7 data trends on our radar',
 '250+ live online training courses opened for January, February, and March',
 'Four short links: 28 December 20