
# Really Simple Syndication (RSS)

AKA: Rich Site Summary; [RDF](https://en.wikipedia.org/wiki/Resource_Description_Framework) Site Summary

**From Wikipedia**  
RSS is a type of web feed which allows users to access updates to online content in a **standardized, computer-readable format**. 
These feeds can, for example, allow a user to keep track of many different websites in a single news aggregator. 
The news aggregator will automatically check the RSS feed for new content, allowing the content to be automatically passed from website to website or from website to user. 
This passing of content is called web syndication. 
Websites usually use RSS feeds to publish frequently updated information, such as blog entries, news headlines, audio, and video. An RSS document (called a "feed", "web feed", or "channel") includes full or summarized text, plus metadata such as the publishing date and the author's name.
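
To make "standardized, computer-readable format" concrete, here is a minimal sketch that reads a tiny RSS 2.0 document using only the Python standard library. The feed text, the `example.com` URLs, and the `read_items` helper are made up for illustration; real feeds usually carry more fields.

```python
import xml.etree.ElementTree as ET

# A hand-written, minimal RSS 2.0 document (illustrative data only).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>Posts from an example blog</description>
    <item>
      <title>First post</title>
      <link>https://example.com/first-post</link>
      <pubDate>Thu, 14 Jun 2018 18:38:25 +0000</pubDate>
      <description>A short summary of the first post.</description>
    </item>
  </channel>
</rss>
"""


def read_items(rss_text):
    """Return one dict per <item>, mapping each child tag to its text."""
    channel = ET.fromstring(rss_text).find("channel")
    return [
        {child.tag: child.text for child in item}
        for item in channel.findall("item")
    ]


items = read_items(SAMPLE_RSS)
print(items[0]["title"], "|", items[0]["pubDate"])
```

Each `<item>` carries the content (title, description) alongside its metadata (link, publication date), which is what lets an aggregator process any feed the same way.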

### Reddit exposes its posts as RSS feeds

Click the link below to see the feed of new Reddit posts in raw form (XML):
 * https://www.reddit.com/new/.rss?sort=new
 
This is an example of a subreddit RSS feed:
 * https://www.reddit.com/r/datascience/.rss?sort=new
 
**In both cases the pattern is the same: after the URL's last slash, "/", we append**  
  
`.rss?sort=new`
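
As a small sketch of that pattern, the helper below turns a Reddit page URL into its feed URL. The function name `to_feed_url` is hypothetical, and the code only encodes the suffix shown above; it does not check that the resulting feed exists.

```python
FEED_SUFFIX = ".rss?sort=new"


def to_feed_url(page_url):
    """Append the feed suffix after the URL's last slash (hypothetical helper)."""
    # Make sure the URL ends with a slash before adding the suffix.
    if not page_url.endswith("/"):
        page_url += "/"
    return page_url + FEED_SUFFIX


print(to_feed_url("https://www.reddit.com/new/"))
print(to_feed_url("https://www.reddit.com/r/datascience"))
```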


**The wall of character data you see should scream: _Parse Me with Python!_**

#### RSS Feed Content

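The RSS feed is structured as a list of items, each of which can be expected to roughly follow the structure below. 
Note: the XML has already been parsed, and the item is rendered here as a Python dictionary. It looks like JSON but is not strict JSON (note the single quotes, `True`, `None`, and `time.struct_time`).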


---

```python
{
	'guidislink': True, 
	'author_detail': {
		'href': 'https://www.reddit.com/user/cryoskyd', 
		'name': '/u/cryoskyd'
		}, 
	'links': [
		{
			'rel': 'alternate', 
			'type': 'text/html', 
			'href': 'https://www.reddit.com/r/SkydTech/comments/8r47kj/dealmaster_get_a_15inch_dell_laptop_with_an/'
		}], 
	'href': 'https://www.reddit.com/user/cryoskyd', 
	'updated_parsed': time.struct_time(tm_year=2018, tm_mon=6, tm_mday=14, tm_hour=18, tm_min=38, tm_sec=25, tm_wday=3, tm_yday=165, tm_isdst=0), 
	'authors': [
		{'href': 'https://www.reddit.com/user/cryoskyd', 
		'name': '/u/cryoskyd'
		}
		], 
	'tags': [
		{'label': 'r/SkydTech', 
		'scheme': None, 
		'term': 'SkydTech'
		}], 
	'title_detail': {
		'type': 'text/plain', 
		'base': 'https://www.reddit.com/new/.rss?sort=new', 
		'value': 'Dealmaster: Get a 15-inch Dell laptop with an 8th-gen Core i7 for $580', 
		'language': None
		}, 
	'summary': '<table> <tr><td> <a href="https://www.reddit.com/r/SkydTech/comments/8r47kj/dealmaster_get_a_15inch_dell_laptop_with_an/"> <img src="https://b.thumbs.redditmedia.com/ioyXj08RjCyhRbNWiPQfcjrQMlHTG4Ec-LrYJ6MB0kI.jpg" alt="Dealmaster: Get a 15-inch Dell laptop with an 8th-gen Core i7 for $580" title="Dealmaster: Get a 15-inch Dell laptop with an 8th-gen Core i7 for $580" /> </a> </td><td> &#32; submitted by &#32; <a href="https://www.reddit.com/user/cryoskyd"> /u/cryoskyd </a> &#32; to &#32; <a href="https://www.reddit.com/r/SkydTech/"> r/SkydTech </a> <br/> <span><a href="https://arstechnica.com/staff/2018/06/dealmaster-get-a-15-inch-dell-laptop-with-an-8th-gen-core-i7-for-580/">[link]</a></span> &#32; <span><a href="https://www.reddit.com/r/SkydTech/comments/8r47kj/dealmaster_get_a_15inch_dell_laptop_with_an/">[comments]</a></span> </td></tr></table>', 
	'id': 'https://www.reddit.com/new/t3_8r47kj', 
	'updated': '2018-06-14T18:38:25+00:00', 
	'content': [
		{
			'type': 'text/html', 
			'base': 'https://www.reddit.com/new/.rss?sort=new', 
			'value': '<table> <tr><td> <a href="https://www.reddit.com/r/SkydTech/comments/8r47kj/dealmaster_get_a_15inch_dell_laptop_with_an/"> <img src="https://b.thumbs.redditmedia.com/ioyXj08RjCyhRbNWiPQfcjrQMlHTG4Ec-LrYJ6MB0kI.jpg" alt="Dealmaster: Get a 15-inch Dell laptop with an 8th-gen Core i7 for $580" title="Dealmaster: Get a 15-inch Dell laptop with an 8th-gen Core i7 for $580" /> </a> </td><td> &#32; submitted by &#32; <a href="https://www.reddit.com/user/cryoskyd"> /u/cryoskyd </a> &#32; to &#32; <a href="https://www.reddit.com/r/SkydTech/"> r/SkydTech </a> <br/> <span><a href="https://arstechnica.com/staff/2018/06/dealmaster-get-a-15-inch-dell-laptop-with-an-8th-gen-core-i7-for-580/">[link]</a></span> &#32; <span><a href="https://www.reddit.com/r/SkydTech/comments/8r47kj/dealmaster_get_a_15inch_dell_laptop_with_an/">[comments]</a></span> </td></tr></table>', 
			'language': None
		}], 
	'link': 'https://www.reddit.com/r/SkydTech/comments/8r47kj/dealmaster_get_a_15inch_dell_laptop_with_an/', 
	'author': '/u/cryoskyd', 
	'title': 'Dealmaster: Get a 15-inch Dell laptop with an 8th-gen Core i7 for $580'
}
```

---
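
Given an item shaped like the one above, a few fields are usually enough for downstream work. The sketch below flattens an item into a small record; `summarize_item` is a hypothetical helper, and `sample_item` is a trimmed copy of the example item rather than live data. It assumes `updated_parsed` is expressed in UTC, which matches the `+00:00` offset in the item's `updated` string.

```python
import calendar
import time
from datetime import datetime, timezone

# A trimmed copy of the example item, keeping only the fields used below.
sample_item = {
    "title": "Dealmaster: Get a 15-inch Dell laptop with an 8th-gen Core i7 for $580",
    "link": "https://www.reddit.com/r/SkydTech/comments/8r47kj/dealmaster_get_a_15inch_dell_laptop_with_an/",
    "author": "/u/cryoskyd",
    "tags": [{"label": "r/SkydTech", "scheme": None, "term": "SkydTech"}],
    "updated_parsed": time.struct_time((2018, 6, 14, 18, 38, 25, 3, 165, 0)),
}


def summarize_item(item):
    """Flatten one feed item into a small record (hypothetical helper)."""
    # Treat updated_parsed as UTC, so timegm (not mktime) is the right inverse.
    timestamp = calendar.timegm(item["updated_parsed"])
    return {
        "title": item["title"],
        "link": item["link"],
        "author": item.get("author", ""),
        "subreddits": [tag["term"] for tag in item.get("tags", [])],
        "updated": datetime.fromtimestamp(timestamp, tz=timezone.utc),
    }


record = summarize_item(sample_item)
print(record["updated"].isoformat(), record["subreddits"])
```

Using `.get()` for `author` and `tags` keeps the helper from failing on items that lack those keys, since feeds are only loosely uniform.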

### Example Code:

The example code below fetches the newest Reddit posts from the RSS feed, then prints the first one.

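```python
import feedparser
import pprint as pp

# URL of the RSS feed to read (newest posts across Reddit)
a_reddit_rss_url = 'https://www.reddit.com/new/.rss?sort=new'

# Fetch the feed over the network and parse it
feed = feedparser.parse(a_reddit_rss_url)

# feedparser sets the "bozo" flag when it hit a problem reading or parsing the feed
if feed['bozo']:
    print("Error Reading/Parsing Feed XML Data")
else:
    # Pretty-print only the first item, then stop
    for item in feed["items"]:
        pp.pprint(item)
        break
```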
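
The printed item stores its `summary` as an HTML fragment, so a second parsing step is needed to reach the outbound `[link]` and `[comments]` URLs. Below is a minimal sketch using the standard library's `html.parser`; the `LinkCollector` class is a hypothetical helper, and `summary_html` is a shortened copy of the sample item's summary rather than a live response.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> element in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Shortened version of the 'summary' value from the sample item.
summary_html = (
    '<span><a href="https://arstechnica.com/staff/2018/06/'
    'dealmaster-get-a-15-inch-dell-laptop-with-an-8th-gen-core-i7-for-580/">'
    '[link]</a></span> &#32; '
    '<span><a href="https://www.reddit.com/r/SkydTech/comments/8r47kj/'
    'dealmaster_get_a_15inch_dell_laptop_with_an/">[comments]</a></span>'
)

collector = LinkCollector()
collector.feed(summary_html)
collector.close()

links = dict(collector.links)
print(links["[link]"])
print(links["[comments]"])
```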