# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [2]:
%pip install feedparser

Successfully installed feedparser-6.0.10 sgmllib3k-1.0.0
Note: you may need to restart the kernel to use updated packages.

In [1]:
import feedparser as fp
import pandas as pd


### 1. Use feedparser to parse the following RSS feed URL.

In [3]:
nassa=fp.parse('https://www.nasa.gov/rss/dyn/breaking_news.rss')

In [4]:
type(nassa)

feedparser.util.FeedParserDict

### 2. Obtain a list of components (keys) that are available for this feed.

In [5]:
nassa.keys()

dict_keys(['bozo', 'entries', 'feed', 'headers', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [6]:
nassa['feed'].keys()

dict_keys(['language', 'title', 'title_detail', 'subtitle', 'subtitle_detail', 'links', 'link', 'authors', 'author', 'author_detail', 'publisher', 'publisher_detail', 'docs'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [7]:
nassa['feed']['title'],nassa['feed']['subtitle'],nassa['feed']['link']

('NASA Breaking News',
 'A RSS news feed containing the latest NASA news articles and press releases.',
 'http://www.nasa.gov/')
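
The cell above omits the author that the exercise asks for. A minimal sketch of pulling all four fields, assuming a dict-like feed object such as `nassa['feed']`. The helper name `feed_summary` is hypothetical, and the sample dictionary reuses only the three values printed above, so the author comes back as `None`.

```python
def feed_summary(feed):
    """Return (title, subtitle, author, link) from a dict-like feed object."""
    fields = ('title', 'subtitle', 'author', 'link')
    # .get() yields None for any field the feed does not provide,
    # instead of raising KeyError.
    return tuple(feed.get(field) for field in fields)


# Stand-in for nassa['feed'], limited to the values shown in the output above.
sample_feed = {
    'title': 'NASA Breaking News',
    'subtitle': 'A RSS news feed containing the latest NASA news articles and press releases.',
    'link': 'http://www.nasa.gov/',
}
print(feed_summary(sample_feed))
```

With the parsed feed, this would be called as `feed_summary(nassa['feed'])`.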

### 5. Count the number of entries that are contained in this RSS feed.

In [8]:
len(nassa['entries'])

10

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [9]:
nassa['entries'][0]

{'title': 'NASA Welcomes Czech Foreign Minister for Artemis Accords Signing',
 'title_detail': {'type': 'text/plain',
  'language': 'en',
  'base': 'http://www.nasa.gov/',
  'value': 'NASA Welcomes Czech Foreign Minister for Artemis Accords Signing'},
 'links': [{'rel': 'alternate',
   'type': 'text/html',
   'href': 'http://www.nasa.gov/press-release/nasa-welcomes-czech-foreign-minister-for-artemis-accords-signing'},
  {'length': '2620728',
   'type': 'image/jpeg',
   'href': 'http://www.nasa.gov/sites/default/files/styles/1x1_cardfeed/public/thumbnails/image/nhq202305030001.jpeg?itok=UafnYVHD',
   'rel': 'enclosure'}],
 'link': 'http://www.nasa.gov/press-release/nasa-welcomes-czech-foreign-minister-for-artemis-accords-signing',
 'summary': 'During a ceremony at NASA Headquarters in Washington Wednesday, the Czech Republic became the 24th country to sign the Artemis Accords. NASA Administrator Bill Nelson participated in the signing ceremony for the agency and Foreign Minister Jan Lip
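
The output above shows the whole first entry rather than its keys. A minimal sketch of listing the keys, assuming `entries` is a list of dict-like objects such as `nassa['entries']`. The helper name `entry_keys` is hypothetical, and the sample entry reuses only two fields from the output above.

```python
def entry_keys(entries, index=0):
    """Return the keys of one entry, or an empty list if there are no entries."""
    if not entries:
        return []
    return list(entries[index].keys())


# Small stand-in for nassa['entries'], built from fields shown in the output above.
sample_entries = [{
    'title': 'NASA Welcomes Czech Foreign Minister for Artemis Accords Signing',
    'link': 'http://www.nasa.gov/press-release/nasa-welcomes-czech-foreign-minister-for-artemis-accords-signing',
}]
print(entry_keys(sample_entries))  # ['title', 'link']
```

With the parsed feed, the equivalent one-liner would be `nassa['entries'][0].keys()`.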

### 7. Extract a list of entry titles.

In [10]:
lista=[e['title'] for e in nassa['entries']]
lista

['NASA Welcomes Czech Foreign Minister for Artemis Accords Signing',
 'NASA Sets Coverage for Dragon Port Relocation on Space Station',
 'Entrepreneurs to Pitch Ideas for Future in NASA ‘Space Tank’',
 'NASA Updates Coverage of Roscosmos Spacewalks at Space Station',
 'NASA Sets Coverage for Czech Republic Artemis Accords Signing Ceremony',
 'NASA Experts Available for Interviews About Sea and Sky Campaign',
 'NASA Announces Winners of 2023 Human Exploration Rover Challenge',
 'NASA Selects 12 Companies to Collaborate on Key Technology Development',
 'NASA Announces Student Winners of Power to Explore Challenge',
 'NASA, Rocket Lab Set Coverage for Tropical Cyclones Mission']

### 8. Calculate the percentage of "Four short links" entry titles.
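
No answer cell is given for this exercise. One way to compute the share is sketched here, assuming a title counts when it contains the phrase, ignoring case. The helper name `phrase_percentage` is hypothetical; the titles are the ones returned in exercise 7.

```python
def phrase_percentage(titles, phrase):
    """Percentage of titles that contain `phrase`, ignoring case."""
    if not titles:
        return 0.0
    needle = phrase.lower()
    matches = sum(1 for title in titles if needle in title.lower())
    return 100 * matches / len(titles)


# Titles copied from the exercise 7 output.
titles = [
    "NASA Welcomes Czech Foreign Minister for Artemis Accords Signing",
    "NASA Sets Coverage for Dragon Port Relocation on Space Station",
    "Entrepreneurs to Pitch Ideas for Future in NASA ‘Space Tank’",
    "NASA Updates Coverage of Roscosmos Spacewalks at Space Station",
    "NASA Sets Coverage for Czech Republic Artemis Accords Signing Ceremony",
    "NASA Experts Available for Interviews About Sea and Sky Campaign",
    "NASA Announces Winners of 2023 Human Exploration Rover Challenge",
    "NASA Selects 12 Companies to Collaborate on Key Technology Development",
    "NASA Announces Student Winners of Power to Explore Challenge",
    "NASA, Rocket Lab Set Coverage for Tropical Cyclones Mission",
]
print(phrase_percentage(titles, 'Four short links'))  # 0.0
print(phrase_percentage(titles, 'Artemis Accords'))   # 20.0
```

None of the NASA titles contains "Four short links", so the result for this feed is 0%.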

### 9. Create a Pandas data frame from the feed's entries.

In [21]:
df=pd.DataFrame(nassa['entries']).drop(columns=['title_detail','links','summary_detail','guidislink'])
df

| | title | link | summary | id | published | published_parsed | source | dc_identifier |
|---|---|---|---|---|---|---|---|---|
| 0 | NASA Welcomes Czech Foreign Minister for Artem... | http://www.nasa.gov/press-release/nasa-welcome... | During a ceremony at NASA Headquarters in Wash... | http://www.nasa.gov/press-release/nasa-welcome... | Wed, 03 May 2023 10:36 EDT | (2023, 5, 3, 14, 36, 0, 2, 123, 0) | {'href': 'http://www.nasa.gov/rss/dyn/breaking... | 487001 |
| 1 | NASA Sets Coverage for Dragon Port Relocation ... | http://www.nasa.gov/press-release/nasa-sets-co... | Four crew members aboard the International Spa... | http://www.nasa.gov/press-release/nasa-sets-co... | Mon, 01 May 2023 15:06 EDT | (2023, 5, 1, 19, 6, 0, 0, 121, 0) | {'href': 'http://www.nasa.gov/rss/dyn/breaking... | 486960 |
| 2 | Entrepreneurs to Pitch Ideas for Future in NAS... | http://www.nasa.gov/press-release/entrepreneur... | College students in NASA’s Minority University... | http://www.nasa.gov/press-release/entrepreneur... | Thu, 27 Apr 2023 14:02 EDT | (2023, 4, 27, 18, 2, 0, 3, 117, 0) | {'href': 'http://www.nasa.gov/rss/dyn/breaking... | 486922 |
| 3 | NASA Updates Coverage of Roscosmos Spacewalks ... | http://www.nasa.gov/press-release/nasa-updates... | NASA will provide live coverage as two Roscosm... | http://www.nasa.gov/press-release/nasa-updates... | Wed, 26 Apr 2023 14:59 EDT | (2023, 4, 26, 18, 59, 0, 2, 116, 0) | {'href': 'http://www.nasa.gov/rss/dyn/breaking... | 486897 |
| 4 | NASA Sets Coverage for Czech Republic Artemis ... | http://www.nasa.gov/press-release/nasa-sets-co... | The Czech Republic is expected to sign the Art... | http://www.nasa.gov/press-release/nasa-sets-co... | Wed, 26 Apr 2023 14:49 EDT | (2023, 4, 26, 18, 49, 0, 2, 116, 0) | {'href': 'http://www.nasa.gov/rss/dyn/breaking... | 486898 |
| 5 | NASA Experts Available for Interviews About Se... | http://www.nasa.gov/press-release/nasa-experts... | This spring, NASA’s S-MODE (Sub-Mesoscale Ocea... | http://www.nasa.gov/press-release/nasa-experts... | Wed, 26 Apr 2023 14:09 EDT | (2023, 4, 26, 18, 9, 0, 2, 116, 0) | {'href': 'http://www.nasa.gov/rss/dyn/breaking... | 486894 |
| 6 | NASA Announces Winners of 2023 Human Explorati... | http://www.nasa.gov/press-release/nasa-announc... | NASA has announced the winners of the 2023 Hum... | http://www.nasa.gov/press-release/nasa-announc... | Wed, 26 Apr 2023 09:04 EDT | (2023, 4, 26, 13, 4, 0, 2, 116, 0) | {'href': 'http://www.nasa.gov/rss/dyn/breaking... | 486884 |
| 7 | NASA Selects 12 Companies to Collaborate on Ke... | http://www.nasa.gov/press-release/nasa-selects... | NASA has selected 16 proposals from 12 compani... | http://www.nasa.gov/press-release/nasa-selects... | Tue, 25 Apr 2023 14:31 EDT | (2023, 4, 25, 18, 31, 0, 1, 115, 0) | {'href': 'http://www.nasa.gov/rss/dyn/breaking... | 486867 |
| 8 | NASA Announces Student Winners of Power to Exp... | http://www.nasa.gov/press-release/nasa-announc... | NASA selected three winners out of nine finali... | http://www.nasa.gov/press-release/nasa-announc... | Tue, 25 Apr 2023 10:30 EDT | (2023, 4, 25, 14, 30, 0, 1, 115, 0) | {'href': 'http://www.nasa.gov/rss/dyn/breaking... | 486862 |
| 9 | NASA, Rocket Lab Set Coverage for Tropical Cyc... | http://www.nasa.gov/press-release/nasa-rocket-... | NASA and Rocket Lab are targeting 9 p.m. EDT, ... | http://www.nasa.gov/press-release/nasa-rocket-... | Mon, 24 Apr 2023 15:18 EDT | (2023, 4, 24, 19, 18, 0, 0, 114, 0) | {'href': 'http://www.nasa.gov/rss/dyn/breaking... | 486848 |


### 10. Count the number of entries per author and sort them in descending order.

In [20]:
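# No 'author' key appears among the entry fields in exercise 9, so this sorts the dc_identifier values instead.
sorted([e['dc_identifier'] for e in nassa['entries']])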

['486848',
 '486862',
 '486867',
 '486884',
 '486894',
 '486897',
 '486898',
 '486922',
 '486960',
 '487001']
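
The cell above sorts `dc_identifier` values rather than counting authors, and the data frame columns in exercise 9 show no author field for these entries. A minimal sketch of the requested count, assuming dict-like entries. The helper name `entries_per_author`, the `'unknown'` placeholder, and the sample authors are all hypothetical.

```python
from collections import Counter


def entries_per_author(entries, missing='unknown'):
    """Count entries per author, most frequent first."""
    counts = Counter(entry.get('author', missing) for entry in entries)
    # most_common() returns (author, count) pairs sorted by count, descending.
    return counts.most_common()


# Hypothetical entries; the real feed's entries may carry no 'author' key at all.
sample_entries = [
    {'title': 'First', 'author': 'Author A'},
    {'title': 'Second', 'author': 'Author B'},
    {'title': 'Third', 'author': 'Author A'},
    {'title': 'Fourth'},
]
print(entries_per_author(sample_entries))
# [('Author A', 2), ('Author B', 1), ('unknown', 1)]
```

Applied to `nassa['entries']`, every entry without an author would fall into the single `'unknown'` bucket.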

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [29]:
df['length']=df['title'].apply(len)
# Sort longest titles first; kind='stable' keeps tied titles in their original order.
# The entries in this feed have no author column, so only title and length are returned.
sol=df[['title','length']].sort_values('length',ascending=False,kind='stable')
sol

| | title | length |
|---|---|---|
| 4 | NASA Sets Coverage for Czech Republic Artemis ... | 70 |
| 7 | NASA Selects 12 Companies to Collaborate on Ke... | 70 |
| 0 | NASA Welcomes Czech Foreign Minister for Artem... | 64 |
| 5 | NASA Experts Available for Interviews About Se... | 64 |
| 6 | NASA Announces Winners of 2023 Human Explorati... | 64 |
| 1 | NASA Sets Coverage for Dragon Port Relocation ... | 62 |
| 3 | NASA Updates Coverage of Roscosmos Spacewalks ... | 62 |
| 2 | Entrepreneurs to Pitch Ideas for Future in NAS... | 60 |
| 8 | NASA Announces Student Winners of Power to Exp... | 60 |
| 9 | NASA, Rocket Lab Set Coverage for Tropical Cyc... | 59 |


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [46]:
pd.DataFrame(nassa['entries'])['summary_detail'].apply(lambda x : x['value'] )

0    During a ceremony at NASA Headquarters in Wash...
1    Four crew members aboard the International Spa...
2    College students in NASA’s Minority University...
3    NASA will provide live coverage as two Roscosm...
4    The Czech Republic is expected to sign the Art...
5    This spring, NASA’s S-MODE (Sub-Mesoscale Ocea...
6    NASA has announced the winners of the 2023 Hum...
7    NASA has selected 16 proposals from 12 compani...
8    NASA selected three winners out of nine finali...
9    NASA and Rocket Lab are targeting 9 p.m. EDT, ...
Name: summary_detail, dtype: object

In [41]:
pd.DataFrame(nassa['entries'])['summary_detail'].apply(lambda x :x if 'machine learning' in x['value'] else 'none')

0    none
1    none
2    none
3    none
4    none
5    none
6    none
7    none
8    none
9    none
Name: summary_detail, dtype: object
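
The cells above inspect the summaries but stop short of the requested list of titles. A sketch of one way to build it, assuming dict-like entries with `title` and `summary` keys and a case-insensitive match. The helper name `titles_mentioning` and the sample entries are hypothetical.

```python
def titles_mentioning(entries, phrase):
    """Titles of entries whose summary contains `phrase`, ignoring case."""
    needle = phrase.lower()
    return [entry['title'] for entry in entries
            if needle in entry.get('summary', '').lower()]


# Hypothetical entries used only to exercise the helper.
sample_entries = [
    {'title': 'Entry one', 'summary': 'A study that applies machine learning to imagery.'},
    {'title': 'Entry two', 'summary': 'Coverage of an upcoming launch.'},
    {'title': 'Entry three'},  # no summary at all
]
print(titles_mentioning(sample_entries, 'machine learning'))  # ['Entry one']
```

Called as `titles_mentioning(nassa['entries'], 'machine learning')`, this would be expected to return an empty list for the feed snapshot above, consistent with the all-`none` output of the case-sensitive check, unless a summary spells the phrase with different capitalization.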