jamiew / tumblr-dashboard-rss

Mechanize scraper to generate an RSS feed of your Tumblr dashboard

http://jamiedubs.com/rss-feed-of-your-tumblr-dashboard

updated to work with tumblr v5 
jamiew (author)
Sat Mar 07 15:44:23 -0800 2009
commit  f0d25b4791be53e5332c15592001b54a8b604079
tree    c0e0e63a06d36e96b9592db3776c87cbe002793c
parent  98069dbeeaf0ad5040918067e17b3f5022fec7b1
tumblr-dashboard-rss / tumblr-dashboard-rss.rb
100755 109 lines (88 sloc) 3.152 kb
#!/usr/bin/env ruby
#
# tumblr dashboard => RSS generator
# fingers crossed they'll add their own soon <3
#
# code: http://github.com/jamiew/tumblr-dashboard-rss
#
# @author Jamie Wilkinson
# @email jamie@internetfamo.us
# @website http://jamiedubs.com
#
 
require 'rubygems'
gem 'mechanize', '>=0.9.0' # new version w/ nokogiri pls!
require 'mechanize'
require 'rss/maker'
 
## config
email = "you@example.com"
password = "secret"
pages = 3
 
 
 
## mixins
class String
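  # Remove every HTML tag whose name is not in the `allowed` list.
  # (A character class like [^(a|b)] excludes single characters, not whole tag names,
  # so the tag name is captured and checked against the list instead.)
  def strip_html(allowed = ['a','img','p','br','i','b','u','ul','li'])
    str = self.strip
    str.gsub(/<\/?\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>/) { |tag| allowed.include?($1.downcase) ? tag : '' }
  end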
end
 
## start
 
# freak out if you haven't set this up yet
raise "You need to set your email & password! Edit this file" if email == 'you@example.com' || password == 'secret'
 
# login to tumblr
# TODO: load & save cookies
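# NOTE: Mechanize 1.0+ renamed this class to plain `Mechanize`; the WWW:: namespace belongs to older releases
agent = WWW::Mechanize.new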
agent.user_agent = "#{email} :: Tumblr Dashboard RSS <http://github.com/jamiew/tumblr-dashboard-rss>"
 
page = agent.get("http://www.tumblr.com/login")
form = page.form_with(:action => '/login')
form.email = email
form.password = password
agent.submit(form)
 
## go back to dashboard
posts = []
(1..pages).each { |i|
  suffix = (i == 1 ? '' : i.to_s) # page 1 is /dashboard/, later pages are /dashboard/2, /dashboard/3, ...
  STDERR.puts "getting page #{i}..."
  page = agent.get("http://www.tumblr.com/dashboard/#{suffix}")
  # NOTE: 1st post on 1st page isn't a real post
 
  # hmm. Nokogiri doesn't seem to be having a good time with li.post (just returns the first)
  # fortunately we have li.not_mine; FIXME TODO
  posts += (page/'#posts li.not_mine').to_a
  sleep 2 # pause between requests
}
 
## generate RSS
content = RSS::Maker.make("2.0") { |m|
  m.channel.title = "tumblr dashboard for #{email}"
  m.channel.link = "http://www.tumblr.com/dashboard"
  m.channel.description = "Latest from yr Tumblr Dashboard"
  # m.items.do_sort = true # sort items by date
  
  author = "WHO DAT NINJA" #temp
  posts.each { |post|
 
    # basic post info
    kind = post['class'].gsub('post','').gsub('is_mine','').split(' ').first
    title = (post/'.post_title a').first.content.strip_html([]) rescue kind
    author = (post/'.post_info a').first.content unless (post/'.post_info a').first.nil? # carry over previous author
    link = (post/'a').last.attributes['href']
    
    # delete things we don't want and extract the remaining stuff as 'content'
    (post/'.post_title').remove
    (post/'.post_info').remove
    (post/'.post_controls').remove
    (post/'table').remove
    (post/'.so_ie_doesnt_treat_this_as_inline').remove
    body = post.to_s.strip # named `body` so it doesn't reuse the outer `content` variable (the feed)
    
    # STDERR.puts "#{kind} post by #{author}, #{title} => #{link}"
    # STDERR.puts "#{body.inspect}"
    # STDERR.puts "---"
 
    item = m.items.new_item
    item.title = title
    item.link = link # the last <a> found in the post
    item.description = body # crude: raw post markup, should strip some stuff
    # item.date = Time.now # they don't give us a time
  }
}
 
## write to disk
# destination = "tumblr-dashboard-rss.xml"
# File.open(destination,"w") { |f|
# f.write(content)
# }
 
## output now
# if this runs as a CGI script, a blank line must separate the header from the body
puts "Content-Type: application/rss+xml\n\n"
puts content
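
The paging loop and the class-attribute parsing in the script can be tried without a Tumblr login. Below is a minimal sketch in plain Ruby. The helper names `dashboard_url` and `post_kind` are hypothetical (they are not in the original script), and the sample class string is made up for illustration.

```ruby
# Hypothetical helpers, not part of the original script.

# Page 1 lives at /dashboard/, later pages at /dashboard/2, /dashboard/3, ...
def dashboard_url(page_number)
  suffix = page_number == 1 ? '' : page_number.to_s
  "http://www.tumblr.com/dashboard/#{suffix}"
end

# Same approach as the script: drop the 'post' and 'is_mine' markers
# from the class attribute and take the first class name that remains.
def post_kind(class_attr)
  class_attr.gsub('post', '').gsub('is_mine', '').split(' ').first
end

puts dashboard_url(1)                    # http://www.tumblr.com/dashboard/
puts dashboard_url(2)                    # http://www.tumblr.com/dashboard/2
puts post_kind('post photo not_mine')    # photo (made-up class string)
```

Note that `gsub('post', '')` removes the substring anywhere in the attribute, so a class name that merely contains "post" would be altered too. Whether that matters depends on Tumblr's actual markup.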
 
 
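
The feed-building step can also be exercised on its own, with no scraping involved. The sketch below feeds `RSS::Maker` from an array of hashes. `SAMPLE_POSTS` and `build_feed` are hypothetical names and the post data is invented. It assumes the `rss` library is available (it ships with Ruby, as a bundled gem in recent versions).

```ruby
require 'rss'

# Hypothetical sample data standing in for scraped dashboard posts.
SAMPLE_POSTS = [
  { :title => 'photo by alice', :link => 'http://example.com/post/1', :body => '<p>hello</p>' },
  { :title => 'quote by bob',   :link => 'http://example.com/post/2', :body => '<p>world</p>' }
]

# Build an RSS 2.0 feed object. The channel needs title, link, and
# description set, otherwise RSS::Maker raises an error.
def build_feed(posts, email)
  RSS::Maker.make('2.0') do |m|
    m.channel.title = "tumblr dashboard for #{email}"
    m.channel.link = 'http://www.tumblr.com/dashboard'
    m.channel.description = 'Latest from your Tumblr Dashboard'

    posts.each do |post|
      item = m.items.new_item
      item.title = post[:title]
      item.link = post[:link]
      item.description = post[:body]
    end
  end
end

feed = build_feed(SAMPLE_POSTS, 'you@example.com')
puts feed.to_s
```

Keeping the feed construction separate from the scraping makes it possible to check the XML output before pointing the script at a real account.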