caracol - a recommendation engine for longform articles
Caracol is a tool for discovering articles from around the web that you'll want to read. Users clip notable articles with the caracol bookmarklet and indicate whether they liked or disliked each one. Caracol processes each user's clippings and votes, then uses latent semantic analysis to recommend articles clipped by other users that the user might like.
To get started working on caracol:
Clone the repository on your development machine.
If you are running Ubuntu 12.04 LTS, run setup.sh from the root caracol directory:
If you are on Mac OS X, consult setup.sh for the system and Python dependencies:
- Python 2.7
- Beautiful Soup
Install node.js dependencies using npm (which will install front-end dependencies using bower). From the root caracol directory:
Test launching locally:
From the root caracol directory:
In another Terminal window:
If the previous two commands don't return any errors, you have successfully launched caracol! Hack away!
- Twitter Bootstrap
Because the machine learning side of caracol is written in Python, we had to decide how our node server would communicate with it. If Python ran on the same machine as the node server, we could invoke the Python code as a child process of node. That probably wouldn't scale, though: latent semantic analysis in Python can consume a lot of CPU and memory, which could keep the node server from responding to requests quickly. So we've implemented a solution that runs the machine learning on a remote machine that communicates with our node server. Under the hood that communication is TCP sockets, but rather than work with sockets directly and deal with messy problems like dropped connections, we use the ZeroMQ messaging framework. More specifically, we use ZeroRPC, a library for Python and node.js built on top of ZeroMQ that enables remote procedure calls. Setup was simple and it has worked beautifully. More on our experience using ZeroRPC here.
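To make the arrangement described above concrete, here is a minimal sketch of what the Python side of such a setup might look like. It is not caracol's actual code: the class name `RecommenderService`, the `rank` method, the `serve` helper, and the port `4242` are all hypothetical, and the word-overlap scoring is only a stand-in for latent semantic analysis. The ZeroRPC calls used (`zerorpc.Server`, `bind`, `run`) come from the library's documented Python API.

```python
# Hypothetical sketch: the class, method, and endpoint names are illustrative
# and are not taken from the caracol codebase.


class RecommenderService(object):
    """Public methods on this object become remotely callable once served."""

    def rank(self, liked_text, candidates, limit=3):
        # Stand-in scoring by word overlap (Jaccard similarity); a real
        # service would use latent semantic analysis here instead.
        liked = set(liked_text.lower().split())
        scored = []
        for title in candidates:
            words = set(title.lower().split())
            union = liked | words
            score = len(liked & words) / float(len(union)) if union else 0.0
            scored.append((score, title))
        # Highest score first; ties broken alphabetically for stable output.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [title for _, title in scored[:limit]]


def serve(endpoint="tcp://0.0.0.0:4242"):
    # Imported here so the class above can be used without zerorpc installed.
    import zerorpc

    server = zerorpc.Server(RecommenderService())
    server.bind(endpoint)
    server.run()  # blocks, answering calls from remote clients


if __name__ == "__main__":
    serve()
```

With a server like this running on the machine learning host, the node server would connect to the same endpoint through the zerorpc package for node.js and invoke `rank` by name, so the expensive computation never runs in the node process.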
- Adam Witzel
- Rick Cerf email@example.com
- Michael Munson
- Ian Hinsdale
Copyright (c) 2013 Adam Witzel, Rick Cerf, Michael Munson, Ian Hinsdale
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.