# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
%pip install feedparser

Note: you may need to restart the kernel to use updated packages.


In [2]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [3]:
feeds = feedparser.parse('http://feeds.feedburner.com/oreilly/radar/atom')

In [4]:
type(feeds)

feedparser.util.FeedParserDict

### 2. Obtain a list of components (keys) that are available for this feed.

In [5]:
feeds.keys()

dict_keys(['bozo', 'entries', 'feed', 'headers', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])
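
Two of the keys listed above, `bozo` and `status`, are worth checking before relying on the rest of the result. In feedparser, `bozo` is set when the feed was not well-formed, and `status` holds the HTTP status code when the feed was fetched over HTTP. A minimal sketch, assuming a hypothetical helper `feed_looks_usable` and a plain dict standing in for the parsed result:

```python
def feed_looks_usable(parsed):
    """Return True if the result has no parse error and a 2xx/3xx status.

    `parsed` is any dict-like object; FeedParserDict behaves like a dict.
    This helper is hypothetical and not part of feedparser.
    """
    # bozo is truthy when feedparser detected a malformed feed
    if parsed.get("bozo"):
        return False
    # status is absent for local files or strings, so default to 200
    status = parsed.get("status", 200)
    return 200 <= status < 400


# Stand-in for the object returned by feedparser.parse (values are illustrative)
sample = {"bozo": False, "status": 200, "entries": []}
print(feed_looks_usable(sample))
```

In the notebook, the same check would be `feed_looks_usable(feeds)`. A feed flagged by `bozo` may still be partly parsed, so treating it as unusable is a deliberately strict choice.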

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [6]:
feeds['feed'].keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [14]:
print("Title:", feeds["feed"]["title"])
print("Subtitle:", feeds["feed"]["subtitle"])
# The feed-level keys listed in exercise 3 include no 'author', so it is omitted here
print("Link:", feeds["feed"]["link"])

Title: Radar
Subtitle: Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology
Link: https://www.oreilly.com/radar
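
The feed-level keys listed in exercise 3 do not include `author`, so that field is not available for this feed. A minimal sketch of a tolerant lookup, using a plain dict with the values printed above as a stand-in for `feeds["feed"]`; the helper name `feed_summary` and the "N/A" placeholder are assumptions:

```python
def feed_summary(feed_meta, fields=("title", "subtitle", "author", "link")):
    """Collect the requested fields, substituting "N/A" for missing ones."""
    return {field: feed_meta.get(field, "N/A") for field in fields}


# Stand-in for feeds["feed"], filled with the values shown in the output above
feed_meta = {
    "title": "Radar",
    "subtitle": (
        "Now, next, and beyond: Tracking need-to-know trends "
        "at the intersection of business and technology"
    ),
    "link": "https://www.oreilly.com/radar",
}

for name, value in feed_summary(feed_meta).items():
    print(f"{name.capitalize()}: {value}")
```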


### 5. Count the number of entries that are contained in this RSS feed.

In [19]:
len(feeds['entries'])

15

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [23]:
entry = feeds.entries[0]
keys = list(entry.keys())
print("Components available for the entry:", keys)

Components available for the entry: ['title', 'title_detail', 'links', 'link', 'comments', 'published', 'published_parsed', 'authors', 'author', 'author_detail', 'tags', 'id', 'guidislink', 'summary', 'summary_detail', 'content', 'wfw_commentrss', 'slash_comments']
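
The keys above come from the first entry only; other entries in a feed may carry a different set. A minimal sketch for finding the keys shared by every entry, using hypothetical entry dicts in place of `feeds.entries` (the helper name `common_keys` is an assumption):

```python
def common_keys(entries):
    """Return the sorted keys present in every entry (empty list if no entries)."""
    if not entries:
        return []
    shared = set(entries[0].keys())
    for entry in entries[1:]:
        # Keep only the keys that this entry also has
        shared &= set(entry.keys())
    return sorted(shared)


# Hypothetical entries; the second one lacks an author
entries = [
    {"title": "A", "link": "https://example.com/a", "author": "X"},
    {"title": "B", "link": "https://example.com/b"},
]
print(common_keys(entries))  # ['link', 'title']
```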


### 7. Extract a list of entry titles.

In [34]:
# Collect the title of every entry in the feed
entry_titles = [entry.title for entry in feeds.entries]
print("List of entry titles:", entry_titles)

List of entry titles: ['Automating the Automators: Shift Change in the Robot Factory', 'Digesting 2022', 'Radar Trends to Watch: January 2023', 'What Does Copyright Say about Generative Models?', 'Radar Trends to Watch: December 2022', 'AI’s ‘SolarWinds Moment’ Will Occur; It’s Just a Matter of When', 'Technical Health Isn’t Optional', 'Healthy Data', 'Formal Informal Languages', 'Radar Trends to Watch: November 2022', 'What We Learned Auditing Sophisticated AI for Bias', 'The Collaborative Metaverse', 'What Is Hyperautomation?', 'Radar Trends to Watch: October 2022', 'The Problem with Intelligence']


### 8. Calculate the percentage of "Four short links" entry titles.

In [None]:
# I was told this exercise did not need to be done
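
One possible approach to this exercise, shown as a sketch rather than a reference solution: count the titles that start with the phrase, ignoring case, and divide by the total. The helper `percent_matching` and the sample titles are hypothetical; in the notebook, the titles taken from `feeds.entries` would be passed instead.

```python
def percent_matching(titles, prefix="four short links"):
    """Return the percentage of titles that start with `prefix`, ignoring case."""
    if not titles:
        return 0.0  # avoid division by zero on an empty feed
    matches = sum(1 for t in titles if t.lower().startswith(prefix.lower()))
    return 100 * matches / len(titles)


# Hypothetical titles, two of which start with the phrase
sample_titles = [
    "Four short links: 1 January 2020",
    "Healthy Data",
    "Four Short Links: 2 January 2020",
    "Digesting 2022",
]
print(f"{percent_matching(sample_titles):.1f}%")  # 50.0%
```

None of the 15 titles listed in exercise 7 begins with this phrase, so for that snapshot of the feed the result would be 0%.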

### 9. Create a Pandas data frame from the feed's entries.

In [35]:
import pandas as pd

In [46]:
jelly = pd.DataFrame(feeds['entries'])

### 10. Count the number of entries per author and sort them in descending order.

In [52]:
jelly.author.value_counts()

Mike Loukides    12
Q McCallum        1
Mike Barlow       1
Patrick Hall      1
Name: author, dtype: int64
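
The same count can be cross-checked without pandas using `collections.Counter`, whose `most_common()` method returns pairs sorted by descending count. A sketch with a hypothetical author list shaped like the output above:

```python
from collections import Counter

# Hypothetical list mirroring the counts shown above; in the notebook this
# would be built as [entry.author for entry in feeds.entries]
authors = ["Mike Loukides"] * 12 + ["Q McCallum", "Mike Barlow", "Patrick Hall"]

counts = Counter(authors).most_common()
for author, n in counts:
    print(f"{author}: {n}")
```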

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [56]:
jelly['title_length'] = jelly['title'].str.len()
# Sort by title length, longest first, and keep only the requested columns
sorted_jelly = jelly.sort_values(by='title_length', ascending=False)
print(sorted_jelly[['title', 'author', 'title_length']])

### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [58]:
entry_titles = []
for entry in feeds.entries:
    if "machine learning" in entry.summary:
        entry_titles.append(entry.title)
print("List of entry titles containing the phrase 'machine learning':")
print(entry_titles)

List of entry titles containing the phrase 'machine learning':
['Radar Trends to Watch: October 2022']
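
The `in` test above is case-sensitive, so a summary containing "Machine Learning" would be missed, and an entry without a summary would raise an error. A minimal sketch of a more tolerant filter, using hypothetical entry dicts; the helper name `titles_mentioning` is an assumption:

```python
def titles_mentioning(entries, phrase):
    """Return titles of entries whose summary contains `phrase`, ignoring case."""
    needle = phrase.lower()
    return [
        entry.get("title", "")
        for entry in entries
        # A missing summary is treated as an empty string
        if needle in entry.get("summary", "").lower()
    ]


# Hypothetical entries standing in for feeds.entries
entries = [
    {"title": "First", "summary": "Advances in Machine Learning this month."},
    {"title": "Second", "summary": "Nothing relevant here."},
    {"title": "Third"},  # no summary at all
]
print(titles_mentioning(entries, "machine learning"))  # ['First']
```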
