GitHub - albarrentine/bebop: Solr object mapping and schema generation without all the XML

albarrentine / bebop Public

Notifications You must be signed in to change notification settings
Fork 0
Star 5

Solr object mapping and schema generation without all the XML

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 130 Commits
bebop		bebop
test_bebop		test_bebop
.gitignore		.gitignore
LICENSE		LICENSE
README		README
requirements.txt		requirements.txt
setup.py		setup.py

Repository files navigation

                                          .-'''-.                           
                                         '   _    \                         
/|              __.....__     /|       /   /` '.   \_________   _...._      
||          .-''         '.   ||      .   |     \  '\        |.'      '-.   
||         /     .-''"'-.  `. ||      |   '      |  '\        .'```'.    '. 
||  __    /     /________\   \||  __  \    \     / /  \      |       \     \
||/'__ '. |                  |||/'__ '.`.   ` ..' /    |     |        |    |
|:/`  '. '\    .-------------'|:/`  '. '  '-...-'`     |      \      /    . 
||     | | \    '-.____...---.||     | |               |     |\`'-.-'   .'  
||\    / '  `.             .' ||\    / '               |     | '-....-'`    
|/\'..' /     `''-...... -'   |/\'..' /               .'     '.             
'  `'-'`                      '  `'-'`              '-----------'           

----------------------------------------------------------------------------

Bebop is a Solr abstraction library written for the busy developer.

pysolr is great, and is in fact a dependency for this library, so not trying to duplicate effort on that front. I'd say pysolr is to MySQLdb as bebop is to SQLAlchemy. Bebop is designed to be a full-fledged ORM (OIM? Object-Index-Mapper?) for translating between your domain models and your Solr Index.

Similar projects include Django's haystack, although that is not specifically designed with Solr in mind and cannot generate Solr's various XML schemas nor produce its queries in as tailored a manner as can bebop. I'm not particularly concerned with Whoosh and Xapian and Sphinx.

Despite the dynamic duo reference, bebop is in no way dependent on rocksteady and is agnostic as to which domain models you're trying to create.

I've had bad experiences with trying to do a Solr import all in one SQL query in the past. I'll try to provide some examples with server-side cursors to show what a chunked import looks like.

TODO: USAGE EXAMPLES!!!!

Some dependency stuff:

1) The schema generation stuff uses lxml. It's a dependable and super-fast library, but requires C compilation which is always a bitch, so make sure you have the following packages installed (using Debian/Ubuntu names because that's what I know): libxml2, libxslt1, libxml2-dev libxslt1-dev

2) pysolr is currently the underlying driver for this project. That's the kind of thing that might benefit from a C implementation, although I suspect that if you need that at the "driver" level you're probably doing something wrong.

Running the tests:

cd bebop
nosetests ./test

On generated schemas:

I've debated how much deploy stuff to put in this lib, and have decided to stay out of it wherever possible. I'm not going to touch setting up Jetty/Tomcat/Resin and where you put your data vs. schema etc. In the simplest case, just copy the example stuff from your Solr distro to /opt/solr and replace the "solr" directory with what bebop generates for you. Then you should be able to start your servlet container (e.g. "java -jar start.jar" in the case of Jetty) pretty seamlessly.

Word.