This project crawls and rips entire Tumblr image blogs into a local cache, and provides a flexible interface for navigating the stored images.
It is something of a re-implementation of the Tumblr Collage Chrome extension, which I felt became too unstable on larger image sites. Because the original extension runs entirely in the browser, it wasn't really possible to implement things I wanted, like saving your current location in a large collection of images.
This is also just a little demo project I built to experiment with a number of technologies I had my eye on for a while, so don't expect much in the way of support.
It consists of :
- CouchDB database, with all images as attachments.
- Elasticsearch used to index and query the data. (Indexes the couch documents via the couchdb river)
- Node.js based proxy that is mostly a straight pass-through to ES/Couch.
- CouchApp to host the UI. (my first)
- node.io based scraping back-end, allowing multi-threaded mirroring of sites. (my first)
- Angular based front end (my first)
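As a rough illustration of the "straight pass-through" proxy in the list above, here is a minimal sketch using only Node's core `http` module. It is not the code in `index.js`: the `/es` and `/couch` route prefixes and the `pickTarget` and `createProxy` names are assumptions made up for this example. Only the backend ports come from this README.

```javascript
// Sketch of a pass-through proxy. The /es and /couch prefixes are
// hypothetical; the ports are the ones this README lists.
const http = require('http');

const TARGETS = {
  '/es': { host: 'localhost', port: 9200 },    // Elasticsearch
  '/couch': { host: 'localhost', port: 5984 }  // CouchDB
};

// Map an incoming URL to a backend and the path to request there.
// Returns null when no prefix matches.
function pickTarget(url) {
  for (const prefix of Object.keys(TARGETS)) {
    if (url === prefix || url.startsWith(prefix + '/')) {
      const t = TARGETS[prefix];
      return { host: t.host, port: t.port, path: url.slice(prefix.length) || '/' };
    }
  }
  return null;
}

// Build (but do not start) the proxy server.
function createProxy() {
  return http.createServer((req, res) => {
    const target = pickTarget(req.url);
    if (!target) {
      res.writeHead(404);
      res.end('no backend for ' + req.url);
      return;
    }
    // Forward method, headers and body unchanged to the chosen backend.
    const upstream = http.request({
      host: target.host,
      port: target.port,
      path: target.path,
      method: req.method,
      headers: req.headers
    }, (upRes) => {
      // Relay the backend's status, headers and body to the client.
      res.writeHead(upRes.statusCode, upRes.headers);
      upRes.pipe(res);
    });
    upstream.on('error', () => {
      if (!res.headersSent) res.writeHead(502);
      res.end('backend unavailable');
    });
    req.pipe(upstream);
  });
}
```

Calling `createProxy().listen(5000)` would start it. With these assumed prefixes, a request to `/es/_search` would be forwarded to `localhost:9200/_search`.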
It also uses :
- Yeoman used for the couchapp generator (my first time using)
- Bower used for client-side dependencies, as I'm usually a browserify guy (my first time)
- Grunt used for complex build processes (my first time using for non-trivial things)
- Masonry based cascading grid layout (my first time using)
- ngInfiniteScroll for infinite scrolling (my first time using)
Requirements:
- couchdb running on localhost:5984
- elasticsearch running on localhost:9200
- node.js + npm
    # how to get said requirements running :

    # get the osx command line tools first
    # either from developer.apple.com/download, or from xcode.

    # install homebrew
    ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"

    # install elastic search
    brew install elasticsearch
    plugin -install elasticsearch/elasticsearch-river-couchdb/1.2.0

    # install couch db from the package on couchdb.apache.org

    # install nvm for node
    touch ~/.profile
    curl https://raw.github.com/creationix/nvm/master/install.sh | sh
    source ~/.nvm/nvm.sh

    # install node
    nvm install 0.10
    nvm use 0.10
    npm install -g grunt-cli bower node.io

    # clone this repo
    git clone 'whatever'
    cd browsr

    # main app
    npm install

    # couch app
    cd couch-app
    npm install
    bower install
    sh install.sh

    # builds and pushes the app to couchdb
    grunt

    # run the main node app
    cd ..
    node index.js
How to use:
application should now be listening on localhost:5000
to download blogs, either use the command line :
    cd $browsrDir
    node.io tumblr sitename [start] [amount] [perbatch]
    # ex: node.io tumblr jl8comic 1 200 10
this is also mounted on localhost, so you can access the following to do the same :
I use a bookmarklet to do that.
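To make the `[start] [amount] [perbatch]` arguments of the command above more concrete, here is a small sketch of how such a download could be split into batches. The interpretation is an assumption, not taken from the scraper's code: `start` is read as the first post number, `amount` as the total number of posts to fetch, and `perbatch` as the number of posts per request. `planBatches` is a hypothetical helper name.

```javascript
// Hypothetical helper: split a download into fixed-size batches.
// Assumed meaning of the arguments (not confirmed by the scraper itself):
//   start    - number of the first post to fetch
//   amount   - total number of posts to fetch
//   perBatch - number of posts requested per batch
function planBatches(start, amount, perBatch) {
  if (perBatch <= 0) throw new RangeError('perBatch must be positive');
  const batches = [];
  const end = start + amount; // exclusive upper bound
  for (let offset = start; offset < end; offset += perBatch) {
    // The last batch may be smaller than perBatch.
    batches.push({ offset: offset, limit: Math.min(perBatch, end - offset) });
  }
  return batches;
}
```

Under that reading, the example `node.io tumblr jl8comic 1 200 10` corresponds to `planBatches(1, 200, 10)`, which yields 20 batches of 10 posts each.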
To do:
- Better initiation of download tasks, rather than cli or url hacking
- Show completion status of download tasks
- Store download task history for future re-init.
- Schedule regular fetching of specific tasks.
- Configure right-click-action to have 'favorite' and 'tag' modes.
- Figure out how to set a proper 'lastSeen' tag on multiple images at once.
- Integrate with tumblr api through the passport-tumblr auth strategy. Should sync likes and follows.