# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [2]:
!pip install feedparser



In [4]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [5]:
life = feedparser.parse('https://lifehacker.com/rss')

### 2. Obtain a list of components (keys) that are available for this feed.

In [6]:
life.keys()

dict_keys(['bozo', 'entries', 'feed', 'headers', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [7]:
life.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'language'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [8]:
print(life.feed.title)
print(' ')
title = life.feed.get('title')
print("Feed Title:", title)

Lifehacker
 
Feed Title: Lifehacker


In [10]:
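print(life.feed.title)
print(' ')
print(life.feed.subtitle)
print(' ')
# NOTE: this feed exposes no 'author' key (see the keys listed in step 3),
# so title_detail is printed in its place.
print(life.feed.title_detail)
print(' ')
print(life.feed.link)
print(' ')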

Lifehacker
 
Do everything better
 
{'type': 'text/plain', 'language': None, 'base': 'https://lifehacker.com/rss', 'value': 'Lifehacker'}
 
https://lifehacker.com
 


### 5. Count the number of entries that are contained in this RSS feed.

In [11]:
entries_count = len(life.entries)
print(entries_count)

51


### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index into the list of entries before requesting the keys.*

In [12]:
life.entries[0].keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'summary', 'summary_detail', 'tags', 'published', 'published_parsed', 'id', 'guidislink', 'authors', 'author', 'author_detail', 'media_thumbnail', 'href'])

In [13]:
entry_index = 0  # Replace with the index of the entry you want to retrieve
entry = life.entries[entry_index]

In [14]:
components = entry.keys()
print(list(components))

['title', 'title_detail', 'links', 'link', 'summary', 'summary_detail', 'tags', 'published', 'published_parsed', 'id', 'guidislink', 'authors', 'author', 'author_detail', 'media_thumbnail', 'href']


### 7. Extract a list of entry titles.

In [15]:
titles = [entry.title for entry in life.entries]
print(titles)

['Just Go Buy the Weird Padded Cycling Shorts', 'People Are Deadheading Their Flowers All Wrong', 'Turn a Peach Into a Sweet and Summery Pasta Sauce', '10 Celebrity Gossip Podcasts That Spill the Tea', 'When to Be Honest With Your Partner, and When a White Lie is OK', 'Turn Your Overcooked Steak Into Something Better', "Ask Yourself These Questions Before Saying 'No' at Work", 'Why You Need to Check Local Laws Before Booking Your Next Airbnb', "What's New on Prime Video and Freevee in August 2023", 'How to Return Nearly Anything Without a Receipt', 'This Apple Pen Alternative Is $36 Right Now', "What's New on Max in August 2023", 'Make a Sweeter Caprese Salad With Peaches and Plums', 'This Costco Membership Comes With a $45 Gift Card and $40 Off One Purchase', 'Make Summer Cooking Tolerable With These Tools', 'Why the Hell Did We Ever Stop Wearing Sweatbands?', 'Trader Joe’s Falafel Might Contain Rocks, Too', 'Save Money With These Wifi-Enabled Smart Watering Devices', 'Keep Your Dog C

### 8. Calculate the percentage of "Four short links" entry titles.

In [16]:
total_titles = len(life.entries)

# This notebook searches for "summer" in place of the "Four short links"
# phrase named in the exercise.
# Count the entry titles that contain "summer", ignoring case
summer_count = sum('summer' in entry['title'].lower() for entry in life.entries)

# Calculate the percentage of entry titles that contain "summer"
percentage = (summer_count / total_titles) * 100

print(f"Percentage of 'summer' or 'Summer' entry titles: {percentage:.2f}%")

Percentage of 'summer' or 'Summer' entry titles: 7.84%
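
The same calculation can be wrapped in a small reusable function. This is a sketch only: `keyword_share` is a hypothetical helper name, and the sample list reuses four titles from the output of step 7.

```python
def keyword_share(titles, keyword):
    """Return the percentage of titles containing keyword, ignoring case."""
    if not titles:
        return 0.0  # avoid dividing by zero on an empty feed
    needle = keyword.lower()
    matches = sum(needle in title.lower() for title in titles)
    return matches / len(titles) * 100


sample_titles = [
    "Make Summer Cooking Tolerable With These Tools",
    "Turn a Peach Into a Sweet and Summery Pasta Sauce",
    "How to Return Nearly Anything Without a Receipt",
    "What's New on Max in August 2023",
]
print(f"{keyword_share(sample_titles, 'summer'):.2f}%")  # prints 50.00%
```

Because this is a substring match, "Summery" counts as a hit as well as "Summer".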


### 9. Create a Pandas data frame from the feed's entries.

In [17]:
import pandas as pd

data = pd.DataFrame(life.entries)
data.head()

| | title | title_detail | links | link | summary | summary_detail | tags | published | published_parsed | id | guidislink | authors | author | author_detail | media_thumbnail | href |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Just Go Buy the Weird Padded Cycling Shorts | `{'type': 'text/plain', 'language': None, 'base...` | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `https://lifehacker.com/just-go-buy-the-weird-p...` | `<img class="type:primaryImage" src="https://i....` | `{'type': 'text/html', 'language': None, 'base'...` | `[{'term': 'cycling', 'scheme': None, 'label': ...` | Tue, 01 Aug 2023 15:00:00 +0000 | (2023, 8, 1, 15, 0, 0, 1, 213, 0) | 1850688084 | False | `[{'name': 'Beth Skwarecki'}]` | Beth Skwarecki | `{'name': 'Beth Skwarecki'}` | `[{'url': 'https://i.kinja-img.com/gawker-media...` | |
| 1 | People Are Deadheading Their Flowers All Wrong | `{'type': 'text/plain', 'language': None, 'base...` | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `https://lifehacker.com/people-are-deadheading-...` | `<img class="type:primaryImage" src="https://i....` | `{'type': 'text/html', 'language': None, 'base'...` | `[{'term': 'deadheading', 'scheme': None, 'labe...` | Tue, 01 Aug 2023 14:30:00 +0000 | (2023, 8, 1, 14, 30, 0, 1, 213, 0) | 1850690797 | False | `[{'name': 'Amanda Blum'}]` | Amanda Blum | `{'name': 'Amanda Blum'}` | `[{'url': 'https://i.kinja-img.com/gawker-media...` | |
| 2 | Turn a Peach Into a Sweet and Summery Pasta Sauce | `{'type': 'text/plain', 'language': None, 'base...` | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `https://lifehacker.com/turn-a-peach-into-a-swe...` | `<img class="type:primaryImage" src="https://i....` | `{'type': 'text/html', 'language': None, 'base'...` | `[{'term': 'pasta', 'scheme': None, 'label': No...` | Tue, 01 Aug 2023 14:00:00 +0000 | (2023, 8, 1, 14, 0, 0, 1, 213, 0) | 1850693637 | False | `[{'name': 'Claire Lower'}]` | Claire Lower | `{'name': 'Claire Lower'}` | `[{'url': 'https://i.kinja-img.com/gawker-media...` | |
| 3 | 10 Celebrity Gossip Podcasts That Spill the Tea | `{'type': 'text/plain', 'language': None, 'base...` | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `https://lifehacker.com/best-celebrity-gossip-p...` | `<img class="type:primaryImage" src="https://i....` | `{'type': 'text/html', 'language': None, 'base'...` | `[{'term': 'celebrity', 'scheme': None, 'label'...` | Tue, 01 Aug 2023 13:30:00 +0000 | (2023, 8, 1, 13, 30, 0, 1, 213, 0) | 1850689823 | False | `[{'name': 'Lauren Passell'}]` | Lauren Passell | `{'name': 'Lauren Passell'}` | `[{'url': 'https://i.kinja-img.com/gawker-media...` | |
| 4 | When to Be Honest With Your Partner, and When ... | `{'type': 'text/plain', 'language': None, 'base...` | `[{'rel': 'alternate', 'type': 'text/html', 'hr...` | `https://lifehacker.com/when-to-be-honest-with-...` | `<img class="type:primaryImage" src="https://i....` | `{'type': 'text/html', 'language': None, 'base'...` | `[{'term': 'emotional safety', 'scheme': None, ...` | Tue, 01 Aug 2023 13:00:00 +0000 | (2023, 8, 1, 13, 0, 0, 1, 213, 0) | 1850687382 | False | `[{'name': 'Brianne Hogan'}]` | Brianne Hogan | `{'name': 'Brianne Hogan'}` | `[{'url': 'https://i.kinja-img.com/gawker-media...` | |


### 10. Count the number of entries per author and sort them in descending order.

In [18]:
authors = data.groupby('author', as_index=False).agg({'title':'count'})
authors.columns = ['author', 'entries']
authors.sort_values('entries', ascending=False)

| | author | entries |
|---|---|---|
| 11 | Jake Peterson | 7 |
| 8 | Elizabeth Yuko | 6 |
| 12 | Jessica Kanzler | 4 |
| 21 | Stephen Johnson | 3 |
| 6 | Claire Lower | 3 |
| 7 | Daniel Oropeza | 3 |
| 1 | Amanda Blum | 3 |
| 3 | Beth Skwarecki | 2 |
| 4 | Brendan Hesse | 2 |
| 19 | Pranay Parab | 2 |
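
A shorter route to the same per-author counts is `Series.value_counts()`, which sorts in descending order by default. The sketch below uses a small made-up frame (`sample` and its author names are hypothetical stand-ins for the `data` frame built from the feed).

```python
import pandas as pd

# Hypothetical stand-in for the `data` frame built from life.entries
sample = pd.DataFrame({
    'author': ['Ann', 'Bob', 'Ann', 'Cy', 'Ann', 'Bob'],
    'title': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# value_counts() counts rows per author and sorts descending by default
counts = sample['author'].value_counts()
print(counts)
```

The `groupby` version above is still the better fit when several aggregations are needed at once; `value_counts()` is simply more compact for a single count.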


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [19]:
data['title_length'] = data['title'].apply(len)
data[['title', 'author', 'title_length']].sort_values('title_length', ascending=False).head()

| | title | author | title_length |
|---|---|---|---|
| 13 | This Costco Membership Comes With a $45 Gift C... | Daniel Oropeza | 74 |
| 41 | These Are the Best Cheap Hotel Chains in the U... | Elizabeth Yuko | 70 |
| 36 | The Difference Between Unplugging and Rechargi... | Elizabeth Yuko | 69 |
| 39 | You Can Download Instructions for More Than 6,... | Elizabeth Yuko | 68 |
| 47 | Why Kids Stop Reading for Fun by Age 9 (and Wh... | Michelle Woo | 64 |


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [32]:
# Use .get() so an entry without a summary does not raise an error
ml_titles = [entry['title'] for entry in life.entries if 'machine learning' in entry.get('summary', '').lower()]

print(ml_titles)

[]
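
The empty list simply means no summary in this feed mentions the phrase. To check that the filter itself works, it can be run on made-up data. In this sketch `titles_mentioning` is a hypothetical helper and plain dicts stand in for feedparser entries, which also support `.get()`.

```python
def titles_mentioning(entries, phrase):
    """Return titles of entries whose summary contains phrase, ignoring case."""
    needle = phrase.lower()
    return [
        entry.get('title', '')
        for entry in entries
        if needle in entry.get('summary', '').lower()
    ]


# Hypothetical entries; the last one has no summary at all
sample_entries = [
    {'title': 'Intro to ML', 'summary': 'A Machine Learning primer.'},
    {'title': 'Peach pasta', 'summary': 'A summery sauce.'},
    {'title': 'No summary here'},
]
print(titles_mentioning(sample_entries, 'machine learning'))  # prints ['Intro to ML']
```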
