# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

### 1. Use feedparser to parse the following RSS feed URL.

In [1]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [2]:
import feedparser

In [3]:
radar = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [4]:
radar.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [5]:
radar.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [6]:
titles = [entry.title for entry in radar.entries]
print(titles)

['DeepCheapFakes', 'Radar trends to watch: May 2021', 'Checking Jeff Bezos’s Math', 'AI Adoption in the Enterprise 2021', 'NFTs: Owning Digital Art', 'Radar trends to watch: April 2021', 'InfoTribes, Reality Brokers', 'The End of Silicon Valley as We Know It?', 'The Next Generation of AI', 'Radar trends to watch: March 2021', 'Product Management for AI', '5 things on our data and AI radar for 2021', '5 infrastructure and operations trends to watch in 2021', 'The Wrong Question', 'Radar trends to watch: February 2021', 'Where Programming, Ops, AI, and the Cloud are Headed in 2021', 'Seven Legal Questions for Data Scientists', 'Patterns', 'Radar trends to watch: January 2021', 'Four short links: 14 Dec 2020', 'Four short links: 8 Dec 2020', 'O’Reilly’s top 20 live online training courses of 2020', 'What is functional programming?', 'Four short links: 4 Dec 2020', 'Four short links: 1 Dec 2020', 'Radar trends to watch: December 2020', 'Four short links: 27 Nov 2020', 'Four short links: 24

In [8]:
subtitles = [entry.subtitle for entry in radar.entries]
print(subtitles)

AttributeError: object has no attribute 'subtitle'

In [9]:
authors = [entry.author for entry in radar.entries]
print(authors)

['Mike Loukides', 'Mike Loukides', 'Tim O’Reilly', 'Mike Loukides', 'Mike Loukides', 'Mike Loukides', 'Hugo Bowne-Anderson', 'Tim O’Reilly', 'Mike Loukides', 'Mike Loukides', 'Mike Loukides', '', '', 'Mike Loukides', 'Mike Loukides', 'Mike Loukides', 'Patrick Hall and Ayoub Ouederni', 'Mike Loukides', 'Mike Loukides', 'Nat Torkington', 'Nat Torkington', '', 'Mike Loukides', 'Nat Torkington', 'Nat Torkington', 'Mike Loukides', 'Nat Torkington', 'Nat Torkington', 'Nat Torkington', 'Kevlin Henney', 'Nat Torkington', 'Nat Torkington', 'Mike Loukides', 'Nat Torkington', 'Nat Torkington', 'Nat Torkington', 'Mike Loukides', 'Nat Torkington', 'Nat Torkington', 'Q Ethan McCallum, Chris Butler and Shane Glynn', 'Nat Torkington', '', 'Nat Torkington', 'Justin Norman and Mike Loukides', 'Nat Torkington', 'Mike Loukides', 'Nat Torkington', 'Nat Torkington', 'Mike Loukides', 'Nat Torkington', 'Nat Torkington', 'Nat Torkington', 'Nat Torkington', 'Alex Castrounis', 'Nat Torkington', 'Nat Torkington',

In [10]:
links = [entry.link for entry in radar.entries]
print(links)

['http://feedproxy.google.com/~r/oreilly/radar/atom/~3/1pXDQUvNgxI/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/VnJwo4IfcPs/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/XFlyN0Aa4no/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/Zpd80eXo270/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/Rjydi0W5YiY/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/5iI6zBWTNe8/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/Ej4QyBpO27A/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/dHbA6ByfR5w/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/kzdPxx3DXng/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/vZgi_on1Eu8/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/ruAGAmisS_E/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/PNgA-4GmKv4/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/7xc4nTkzPNs/', 'http://feedproxy.google.com/~r/oreilly/radar/atom/~3/NsfBPf08fpI/', 'http://feedproxy.google.com/~r/o

### 5. Count the number of entries that are contained in this RSS feed.

In [11]:
len(radar['entries'])

60

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [12]:
radar['entries'][0].keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'comments', 'published', 'published_parsed', 'authors', 'author', 'author_detail', 'tags', 'id', 'guidislink', 'summary', 'summary_detail', 'content', 'wfw_commentrss', 'slash_comments', 'feedburner_origlink'])

### 7. Extract a list of entry titles.

In [13]:
titles = [entry.title for entry in radar.entries]
print(titles)

['DeepCheapFakes', 'Radar trends to watch: May 2021', 'Checking Jeff Bezos’s Math', 'AI Adoption in the Enterprise 2021', 'NFTs: Owning Digital Art', 'Radar trends to watch: April 2021', 'InfoTribes, Reality Brokers', 'The End of Silicon Valley as We Know It?', 'The Next Generation of AI', 'Radar trends to watch: March 2021', 'Product Management for AI', '5 things on our data and AI radar for 2021', '5 infrastructure and operations trends to watch in 2021', 'The Wrong Question', 'Radar trends to watch: February 2021', 'Where Programming, Ops, AI, and the Cloud are Headed in 2021', 'Seven Legal Questions for Data Scientists', 'Patterns', 'Radar trends to watch: January 2021', 'Four short links: 14 Dec 2020', 'Four short links: 8 Dec 2020', 'O’Reilly’s top 20 live online training courses of 2020', 'What is functional programming?', 'Four short links: 4 Dec 2020', 'Four short links: 1 Dec 2020', 'Radar trends to watch: December 2020', 'Four short links: 27 Nov 2020', 'Four short links: 24

### 8. Calculate the percentage of "Four short links" entry titles.

In [14]:
radar['entries'][-4]['title']

'Pair Programming with AI'
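
The cell above only looks up a single title, so it does not yet answer the question. One possible way to compute the percentage is sketched below, using a short hand-made list (`sample_titles` is a hypothetical stand-in for the `titles` list from exercise 7) and assuming that a "Four short links" entry is one whose title starts with that phrase.

```python
# Hypothetical stand-in for the `titles` list built in exercise 7.
sample_titles = [
    "Radar trends to watch: May 2021",
    "Four short links: 14 Dec 2020",
    "Four short links: 8 Dec 2020",
    "What is functional programming?",
]

prefix = "Four short links"

# Keep only the titles that begin with the prefix.
matches = [title for title in sample_titles if title.startswith(prefix)]

# Share of matching titles, expressed as a percentage of all titles.
percentage = 100 * len(matches) / len(sample_titles)
print(f"{percentage:.1f}% of titles start with '{prefix}'")
```

Swapping `sample_titles` for `titles` would apply the same calculation to the real feed.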

### 9. Create a Pandas data frame from the feed's entries.

In [15]:
import pandas as pd

In [16]:
df = pd.DataFrame(radar.entries)
df.head()

Unnamed: 0,title,title_detail,links,link,comments,published,published_parsed,authors,author,author_detail,tags,id,guidislink,summary,summary_detail,content,wfw_commentrss,slash_comments,feedburner_origlink
0,DeepCheapFakes,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/deepcheapfakes/#...,"Tue, 11 May 2021 11:58:37 +0000","(2021, 5, 11, 11, 58, 37, 1, 131, 0)",[{'name': 'Mike Loukides'}],Mike Loukides,{'name': 'Mike Loukides'},"[{'term': 'Artificial Intelligence', 'scheme':...",https://www.oreilly.com/radar/?p=13768,False,"Back in 2019, Ben Lorica and I wrote about de...","{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/deepcheapfakes/f...,0,https://www.oreilly.com/radar/deepcheapfakes/
1,Radar trends to watch: May 2021,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/radar-trends-to-...,"Mon, 03 May 2021 14:05:40 +0000","(2021, 5, 3, 14, 5, 40, 0, 123, 0)",[{'name': 'Mike Loukides'}],Mike Loukides,{'name': 'Mike Loukides'},"[{'term': 'Radar Trends', 'scheme': None, 'lab...",https://www.oreilly.com/radar/?p=13755,False,We’ll start with a moment of silence. RIP Dan ...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/radar-trends-to-...,0,https://www.oreilly.com/radar/radar-trends-to-...
2,Checking Jeff Bezos’s Math,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/checking-jeff-be...,"Fri, 23 Apr 2021 20:43:28 +0000","(2021, 4, 23, 20, 43, 28, 4, 113, 0)",[{'name': 'Tim O’Reilly'}],Tim O’Reilly,{'name': 'Tim O’Reilly'},"[{'term': 'Business', 'scheme': None, 'label':...",https://www.oreilly.com/radar/?p=13748,False,“If you want to be successful in business (in ...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/checking-jeff-be...,0,https://www.oreilly.com/radar/checking-jeff-be...
3,AI Adoption in the Enterprise 2021,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/ai-adoption-in-t...,"Mon, 19 Apr 2021 12:20:38 +0000","(2021, 4, 19, 12, 20, 38, 0, 109, 0)",[{'name': 'Mike Loukides'}],Mike Loukides,{'name': 'Mike Loukides'},"[{'term': 'AI & ML', 'scheme': None, 'label': ...",https://www.oreilly.com/radar/?p=13720,False,"During the first weeks of February, we asked r...","{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/ai-adoption-in-t...,0,https://www.oreilly.com/radar/ai-adoption-in-t...
4,NFTs: Owning Digital Art,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/nfts-owning-digi...,"Tue, 06 Apr 2021 18:43:26 +0000","(2021, 4, 6, 18, 43, 26, 1, 96, 0)",[{'name': 'Mike Loukides'}],Mike Loukides,{'name': 'Mike Loukides'},"[{'term': 'Building a data culture', 'scheme':...",https://www.oreilly.com/radar/?p=13713,False,It would be hard to miss the commotion around ...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/nfts-owning-digi...,0,https://www.oreilly.com/radar/nfts-owning-digi...


### 10. Count the number of entries per author and sort them in descending order.

In [17]:
df['authors'].value_counts()

TypeError: unhashable type: 'list'

Exception ignored in: 'pandas._libs.index.IndexEngine._call_map_locations'
Traceback (most recent call last):
  File "pandas/_libs/hashtable_class_helper.pxi", line 4588, in pandas._libs.hashtable.PyObjectHashTable.map_locations
TypeError: unhashable type: 'list'


[{'name': 'Nat Torkington'}]                                    27
[{'name': 'Mike Loukides'}]                                     21
[{}]                                                             4
[{'name': 'Tim O’Reilly'}]                                       2
[{'name': 'Alex Castrounis'}]                                    1
[{'name': 'Justin Norman and Mike Loukides'}]                    1
[{'name': 'Patrick Hall and Ayoub Ouederni'}]                    1
[{'name': 'Q Ethan McCallum, Chris Butler and Shane Glynn'}]     1
[{'name': 'Hugo Bowne-Anderson'}]                                1
[{'name': 'Kevlin Henney'}]                                      1
Name: authors, dtype: int64

In [18]:
df.sort_values(by=['authors'])

TypeError: '<' not supported between instances of 'FeedParserDict' and 'FeedParserDict'
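
Both errors above come from the `authors` column, whose cells are lists of dictionaries: lists cannot be hashed for counting, and the dictionaries cannot be compared for sorting. The `author` column holds plain strings, so counting on it avoids both problems. A minimal sketch, using a small hypothetical frame (`sample_df`) in place of `df`:

```python
import pandas as pd

# Hypothetical stand-in for the `author` column of `df`.
sample_df = pd.DataFrame(
    {
        "author": [
            "Nat Torkington",
            "Mike Loukides",
            "Nat Torkington",
            "",
            "Nat Torkington",
            "Mike Loukides",
        ]
    }
)

# Label empty author strings, then count.
# value_counts() sorts by count in descending order by default.
author_counts = (
    sample_df["author"].replace("", "(no author)").value_counts()
)
print(author_counts)
```

On the real data, `df['author'].value_counts()` should give the per-author counts already sorted from most to fewest entries.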

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [19]:
df['title_length'] = df['title'].apply(len)
df[['title', 'author', 'title_length']].sort_values('title_length', ascending=False).head()

| | title | author | title_length |
|---|---|---|---|
| 15 | Where Programming, Ops, AI, and the Cloud are ... | Mike Loukides | 60 |
| 12 | 5 infrastructure and operations trends to watc... | | 55 |
| 21 | O’Reilly’s top 20 live online training courses... | | 54 |
| 11 | 5 things on our data and AI radar for 2021 | | 42 |
| 16 | Seven Legal Questions for Data Scientists | Patrick Hall and Ayoub Ouederni | 41 |


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [20]:
df.title = df.title.astype(str)

In [21]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 0 to 59
Data columns (total 20 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   title                60 non-null     object
 1   title_detail         60 non-null     object
 2   links                60 non-null     object
 3   link                 60 non-null     object
 4   comments             60 non-null     object
 5   published            60 non-null     object
 6   published_parsed     60 non-null     object
 7   authors              60 non-null     object
 8   author               60 non-null     object
 9   author_detail        56 non-null     object
 10  tags                 59 non-null     object
 11  id                   60 non-null     object
 12  guidislink           60 non-null     bool  
 13  summary              60 non-null     object
 14  summary_detail       60 non-null     object
 15  content              60 non-null     object
 16  wfw_commen

In [22]:
df.title.contains('machine learning')  # I don't know how to do this exercise...

AttributeError: 'Series' object has no attribute 'contains'
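
The error arises because string methods on a pandas Series are reached through the `.str` accessor, and the exercise asks about the `summary` column rather than `title`. A sketch of one possible approach follows, using a small hypothetical frame (`sample_df`, with made-up titles and summaries) in place of `df`. Matching case-insensitively is an assumption here; the exercise does not say whether case matters.

```python
import pandas as pd

# Hypothetical stand-in for the `title` and `summary` columns of `df`.
sample_df = pd.DataFrame(
    {
        "title": ["Post A", "Post B", "Post C"],
        "summary": [
            "Machine learning in production.",
            "Notes on design patterns.",
            "Why machine learning projects fail.",
        ],
    }
)

# Boolean mask: True where the summary contains the phrase.
# case=False ignores capitalization, regex=False treats the phrase
# as a literal string, and na=False counts missing summaries as no match.
mask = sample_df["summary"].str.contains(
    "machine learning", case=False, regex=False, na=False
)

# Select the titles of the matching rows and convert them to a list.
ml_titles = sample_df.loc[mask, "title"].tolist()
print(ml_titles)
```

Replacing `sample_df` with `df` would apply the same filter to the feed entries.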