caracol - a recommendation engine for longform articles

Caracol is a tool for discovering articles from around the web that you'll want to read. Users clip notable articles with the caracol bookmarklet and indicate whether they liked or disliked each one. Caracol processes each user's clippings and votes and uses latent semantic analysis to recommend articles, clipped by other users, that the user might like.
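To give a feel for what latent semantic analysis does, here is a minimal sketch using numpy (one of the listed dependencies). It is not caracol's actual pipeline: the function names `lsa_document_vectors` and `cosine` are hypothetical, the tokenizer is a plain whitespace split, and raw word counts stand in for whatever term weighting the real code uses. It builds a term-document matrix, keeps the top `k` singular directions, and compares documents in that reduced space.

```python
import numpy as np


def lsa_document_vectors(docs, k=2):
    """Return one k-dimensional latent vector per document (rows)."""
    # Vocabulary from a naive whitespace tokenizer (an assumption of this sketch).
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}

    # Term-document matrix of raw counts: rows are terms, columns are documents.
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            counts[index[word], j] += 1.0

    # Truncated SVD: keep only the k largest singular values.
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    return np.diag(s[:k]).dot(vt[:k, :]).T


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a.dot(b) / denom) if denom else 0.0
```

With vectors like these, articles whose latent vectors are close to ones a user liked become candidates for recommendation.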

##Deployment / Development

To get started working on caracol:

Clone the repository on your development machine.
Install dependencies:
1. If you are running Ubuntu 12.04 LTS, run setup.sh from the root caracol directory:
```
 sh setup.sh
```
2. If you are on Mac OS X, consult setup.sh for the system and Python dependencies:
  - node.js
  - PostgreSQL
  - ZeroMQ
  - libevent
  - Python 2.7
    - psycopg2
    - Beautiful Soup
    - lxml
    - numpy
    - pyyaml
    - nltk
    - pyzmq
    - ZeroRPC
3. Install node.js dependencies using npm (which will install front-end dependencies using bower). From the root caracol directory:
```
 npm install
```
Test launching locally:
1. From the root caracol directory:
```
 python python/server.py
```
2. In another Terminal window:
```
 node server.js
```
3. If the previous two commands don't return any errors, you have successfully launched caracol! Hack away!

##The Stack

Core architecture:

Front-end snazz:

- Twitter Bootstrap
- Topcoat
- Stylus
- Jade

Back-end utilities:

- ZeroRPC
- Bookshelf.js
- Grunt
- Bower

Testing:

- Mocha

##Challenges

Because the machine learning side of caracol is written in Python, we had to decide how our node server would communicate with Python. If Python were running on the same machine as the node server, we could invoke the Python functionality as a child process of node. But that probably wouldn't scale: latent semantic analysis in Python can consume a lot of CPU and memory, which could limit the node server's ability to respond to requests quickly.

So we've implemented a solution that lets the machine learning run on a remote machine that communicates with our node server. That solution is essentially TCP sockets, but rather than work with sockets directly and deal with messy problems like what to do when a connection drops, we're using the ZeroMQ messaging framework. More specifically, we're using ZeroRPC, a library for Python and node.js built on top of ZeroMQ that enables remote procedure calls. Setup was simple and it has worked beautifully. More on our experience using ZeroRPC here.
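As a rough illustration of the Python half of this arrangement, the sketch below exposes a plain Python object over ZeroRPC. The class `RecommenderService`, its `recommend` method, the shape of the `scores` data, and the port `4242` are all hypothetical and are not taken from caracol's `python/server.py`. Only `zerorpc.Server`, `bind`, and `run` are the library's own calls.

```python
class RecommenderService(object):
    """Hypothetical service object; ZeroRPC exposes its public methods."""

    def __init__(self, scores):
        # scores: {user_id: {article_id: predicted_score}} (made-up data shape)
        self._scores = scores

    def recommend(self, user_id, count):
        """Return up to `count` article ids for the user, best score first."""
        ranked = sorted(self._scores.get(user_id, {}).items(),
                        key=lambda pair: pair[1], reverse=True)
        return [article_id for article_id, _ in ranked[:count]]


def serve(service, endpoint="tcp://0.0.0.0:4242"):
    """Expose `service` over ZeroRPC; blocks until the process is stopped."""
    import zerorpc  # imported here so the class above is usable without it
    server = zerorpc.Server(service)
    server.bind(endpoint)
    server.run()
```

Calling `serve(RecommenderService(scores))` on the machine learning host would start the listener. On the node side, a `zerorpc` client connected to the same endpoint could then call `client.invoke("recommend", userId, count, callback)`, assuming the node server uses the zerorpc npm package.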

##Team

Adam Witzel
- Personal site: http://adamwitzel.com
- Email: adam.witzel@gmail.com
- Github: https://github.com/adwitz
Rick Cerf
- Personal site: http://www.rcerf.com
- Email: rickcerf@gmail.com
- Github: https://github.com/rcerf
Michael Munson
- Email: michaelmunson1@gmail.com
- Github: https://github.com/michaelmunson1
Ian Hinsdale
- Personal site: http://ianhinsdale.com
- Email: ihinsdale@gmail.com
- Github: https://github.com/ihinsdale

##License

The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
