Added basic reddit scraper #8

Closed
wants to merge 1 commit into from

2 participants

@vrokolos

I use this as a separate reddit-only plugin after copying everything over to another folder.

This scraper fetches every JPG and PNG linked from reddit's front page (the URLs in the source code include my personal feed code).
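The image filter described above can be sketched on its own. This is a minimal sketch, not the code from the diff: `is_image_link` and `IMAGE_EXTENSIONS` are hypothetical names, and accepting `.jpeg`, upper-case extensions, and links with a query string are assumptions that go beyond the plain `.jpg`/`.png` suffix check in the PR.

```python
try:
    from urlparse import urlparse        # Python 2, as used by the plugin
except ImportError:
    from urllib.parse import urlparse    # Python 3

# Hypothetical constant; the PR itself only checks '.jpg' and '.png'.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def is_image_link(link):
    """Return True if the URL's path ends in a known image extension."""
    # Look only at the path so a query string does not hide the extension.
    path = urlparse(link).path.lower()
    return path.endswith(IMAGE_EXTENSIONS)
```

Checking the parsed path rather than the raw string means a link such as `http://i.imgur.com/abc.JPG?1` would still count as a picture.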

You should make it so that when the description is empty, the transparent description panel at the bottom isn't shown; it darkens the picture for no reason.

Anyway, you could add more configuration options for this, such as subreddits or the personal feed code found in reddit's preferences.
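One way such options might feed into the album URLs is a small builder function. This is only a sketch: `build_album_url` and its parameters are hypothetical, the `/r/<subreddit>` path is an assumption about reddit's URL layout, and the `feed`/`user` query parameters simply mirror the hard-coded URLs in the diff.

```python
try:
    from urllib import urlencode          # Python 2, as used by the plugin
except ImportError:
    from urllib.parse import urlencode    # Python 3

BASE_URL = 'http://www.reddit.com'


def build_album_url(subreddit=None, period=None, feed=None, user=None):
    """Build a JSON listing URL shaped like the hard-coded ones.

    period is a 't' value such as 'week', 'month', 'year' or 'all';
    None selects the default (hot) listing.
    """
    path = ''
    if subreddit:
        path += '/r/' + subreddit         # assumed subreddit path layout
    params = []
    if period:
        path += '/top'
        params += [('sort', 'top'), ('t', period)]
    if feed and user:
        params += [('feed', feed), ('user', user)]
    url = BASE_URL + path + '/.json'
    if params:
        url += '?' + urlencode(params)
    return url
```

For example, `build_album_url(period='week', feed='abc', user='me')` gives a URL of the same form as the "week" album. A URL built without any parameters has no query string, so the paging code would need `?after=` rather than `&after=` in that case.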

Also, I changed the theme a little in my personal version so that the picture begins right after the title panel and not behind it.

Just writing all this info because I'm not seeing myself extending this or generating a valid distributable plugin and will only use it for me personally. If anyone wants to fix all that stuff you're welcome!

Also, you might want to look into a system that fetches the next pages on the fly when you reach the end of the current picture buffer. Reddit has an almost endless number of pages, and I just fetch 3 of them here because I couldn't find another solution.
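On-the-fly paging of this kind could be sketched with a generator that requests the next page only when the current one is used up. This is a sketch under assumptions: `iter_photos` is a hypothetical name, and `fetch_page(url)` stands for any callable returning `(photos, next_url)` with `next_url` set to `None` on the last page. How it would hook into the add-on's picture buffer is not covered here.

```python
def iter_photos(first_url, fetch_page, max_pages=None):
    """Yield photos one at a time, fetching pages lazily.

    fetch_page is a hypothetical callable: fetch_page(url) must return
    (photos, next_url), where next_url is None once there are no more pages.
    """
    url = first_url
    pages = 0
    while url is not None and (max_pages is None or pages < max_pages):
        # The request happens only when the consumer asks for more photos.
        photos, url = fetch_page(url)
        pages += 1
        for photo in photos:
            yield photo
```

A slideshow could call `next()` on the generator each time it advances, so the second page is requested only after the last picture of the first page has been shown. `max_pages` keeps an optional upper bound like the fixed three pages in the PR.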

@dersphere
Owner

Hi,

There are very good picture sources out there, and reddit is surely one of them. But I see The Big Pictures as a photojournalism add-on, and reddit is not photojournalism ;)

Sorry, but I won't add this.

Maybe I will develop a reddit picture plugin in the future, or maybe @jbeluch wants to do that?

@dersphere closed this

@vrokolos

I understand, and that's the reason I use this as a standalone reddit-only plugin. Maybe notify jbeluch about this or something :) I just wanted to post the source somewhere before it fades away on my hard drive, where it is for my personal use only.

Commits on Jan 28, 2013
  1. @vrokolos

    Added Reddit picture scrapper

    vrokolos authored
Showing with 80 additions and 0 deletions.
  1. +80 −0 resources/lib/scrapers/7_reddit.py
80 resources/lib/scrapers/7_reddit.py
@@ -0,0 +1,80 @@
+import time
+import urllib2
+import simplejson as json
+import xbmc
+import re
+from scraper import ScraperPlugin
+
+
+class Scraper(ScraperPlugin):
+
+    _title = 'Reddit'
+
+    def _get_albums(self):
+        self.albums = []
+        self.albums.append({'title': "hot",
+                            'album_id': 1,
+                            'pic': "http://blogs-images.forbes.com/gregvoakes/files/2012/06/reddit-logo.jpeg",
+                            'description': "HOT",
+                            'album_url': "http://www.reddit.com/.json?feed=b37a5c83510ebd741dd3b290939af9a2a7aa45cc&user=Vrokolos"})
+        self.albums.append({'title': "week",
+                            'album_id': 1,
+                            'pic': "http://blogs-images.forbes.com/gregvoakes/files/2012/06/reddit-logo.jpeg",
+                            'description': "HOT",
+                            'album_url': "http://www.reddit.com/top/.json?sort=top&t=week&feed=b37a5c83510ebd741dd3b290939af9a2a7aa45cc&user=Vrokolos"})
+        self.albums.append({'title': "month",
+                            'album_id': 1,
+                            'pic': "http://blogs-images.forbes.com/gregvoakes/files/2012/06/reddit-logo.jpeg",
+                            'description': "HOT",
+                            'album_url': "http://www.reddit.com/top/.json?sort=top&t=month&feed=b37a5c83510ebd741dd3b290939af9a2a7aa45cc&user=Vrokolos"})
+        self.albums.append({'title': "year",
+                            'album_id': 1,
+                            'pic': "http://blogs-images.forbes.com/gregvoakes/files/2012/06/reddit-logo.jpeg",
+                            'description': "HOT",
+                            'album_url': "http://www.reddit.com/top/.json?sort=top&t=year&feed=b37a5c83510ebd741dd3b290939af9a2a7aa45cc&user=Vrokolos"})
+        self.albums.append({'title': "all",
+                            'album_id': 1,
+                            'pic': "http://blogs-images.forbes.com/gregvoakes/files/2012/06/reddit-logo.jpeg",
+                            'description': "HOT",
+                            'album_url': "http://www.reddit.com/top/.json?sort=top&t=all&feed=b37a5c83510ebd741dd3b290939af9a2a7aa45cc&user=Vrokolos"})
+        return self.albums
+
+    def _get_photos(self, album_url):
+        self.photos = []
+        realalbumurl = album_url
+        for x in range(0, 3):  # fetch at most three pages
+            try:
+                photos2, album_url = self.__get_photo_page(album_url, realalbumurl)
+            except Exception:  # retry the same page once
+                photos2, album_url = self.__get_photo_page(album_url, realalbumurl)
+            self.photos.extend(photos2)
+            if album_url is None:  # no further pages
+                break
+            time.sleep(1)
+        return self.photos
+
+    def __get_photo_page(self, album_url, realalbumurl):
+        page_photos = []
+        next_page_url = None
+        response = urllib2.urlopen(album_url).read()
+        album_title = album_url
+        photos1 = json.loads(response)
+        for photo in photos1['data']['children']:
+            link = photo['data']['url']
+            if link.endswith('.jpg') or link.endswith('.png'):
+                description = ''
+                d = photo['data']['title']
+                title = d.encode('ascii', 'ignore')
+                page_photos.append({'title': title,
+                                    'album_title': album_title,
+                                    'photo_id': link,
+                                    'pic': link,
+                                    'description': description,
+                                    'album_url': album_url})
+        s = photos1['data'].get('after')  # null on the last page
+        if s:
+            next_page_url = realalbumurl + '&after=' + s
+        return page_photos, next_page_url
+
+def register(id):
+    return Scraper(id)