Cascading scheme for Solr


This is a Cascading scheme for Solr.

It lets you easily add a Tap to a workflow that generates a Lucene index using Solr. The resulting index will have N shards when built with N reducers; call the scheme's setNumSinkParts() to control the number of reducers, and thus the number of shards.
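A minimal sketch of wiring this into a flow (the field names, paths, and shard count below are placeholders for illustration, not values from this project):

    // Hypothetical sketch: sink tuples into a sharded Solr index.
    Fields fields = new Fields("id", "name");
    Scheme scheme = new SolrScheme(fields, "src/test/resources/solr-home/collection1");
    scheme.setNumSinkParts(4); // 4 reducers => the index is built as 4 shards

    Tap solrSink = new Hfs(scheme, "output/solr-index", SinkMode.REPLACE);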

Indexes are built locally on each task node's disk, using embedded Solr. Once an index shard has been built (and optionally optimized), it is copied to the target location (HDFS, S3, etc.) specified by the Tap. This improves index-build performance, especially when you deploy multiple shards and can therefore build with many reducers.

You can find the latest version of the jar at Conjars.

To add this jar to your project via Maven, first add the Conjars repository to your pom.xml:

    <repository>
        <id>conjars.org</id>
        <name>Cascading repository</name>
        <url>http://conjars.org/repo</url>
    </repository>
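You will also need to declare the dependency itself. The coordinates below are an assumption for illustration; check Conjars for the authoritative groupId, artifactId, and latest version:

    <dependency>
        <groupId>com.scaleunlimited</groupId>
        <artifactId>cascading.solr</artifactId>
        <version><!-- see Conjars for the latest version --></version>
    </dependency>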

An example of using this SolrScheme (taken from the project's test code) looks like:

    private static final String TEST_DIR = "build/test/SolrSchemeHadoopTest/";
    private static final String SOLR_HOME_DIR = "src/test/resources/solr-home-4.1/"; 
    protected static final String SOLR_CORE_DIR = SOLR_HOME_DIR + "collection1"; 

    public void testSimpleIndexing() throws Exception {
        final Fields testFields = new Fields("id", "name", "price", "cat", "inStock", "image");

        final String in = TEST_DIR + "testSimpleIndexing/in";
        final String out = TEST_DIR + "testSimpleIndexing/out";

        byte[] imageData = new byte[] {0, 1, 2, 3, 5};

        // Create some data
        Tap source = new Hfs(new SequenceFile(testFields), in, SinkMode.REPLACE);
        TupleEntryCollector write = source.openForWrite(makeFlowProcess());

        BytesWritable bw = new BytesWritable(imageData);
        bw.setCapacity(imageData.length + 10);

        // Add one tuple per record, with a value for each of the six fields.
        // Values other than the names and categories from the original snippet
        // are illustrative placeholders.
        Tuple t = new Tuple();
        t.add(1);                                        // id
        t.add("TurboWriter 2.3");                        // name
        t.add(395.50f);                                  // price
        t.add(new Tuple("wordprocessor", "Japanese"));   // cat (multi-valued)
        t.add(true);                                     // inStock
        t.add(bw);                                       // image
        write.add(t);

        t = new Tuple();
        t.add(2);
        t.add("Shasta 1.0");
        t.add(95.00f);
        t.add("Chinese");
        t.add(false);
        t.add(bw);
        write.add(t);

        write.close();

        // Now read the tuples back from the source Tap, and write them to a Solr index.
        Pipe writePipe = new Pipe("tuples to Solr");

        Scheme scheme = new SolrScheme(testFields, SOLR_CORE_DIR);
        Tap solrSink = new Hfs(scheme, out, SinkMode.REPLACE);
        Flow flow = makeFlowConnector().connect(source, solrSink, writePipe);
        flow.complete();

        // Open up the Solr index, and do some searches. With a single reducer,
        // the index shard lands in the part-00000 sub-directory.
        // "solr.data.dir" is the standard Solr system property for the data
        // directory (assumed here; the original property name was lost).
        System.setProperty("solr.data.dir", out + "/part-00000");

        CoreContainer coreContainer = new CoreContainer(SOLR_HOME_DIR);
        SolrServer solrServer = new EmbeddedSolrServer(coreContainer, "collection1");

        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set(CommonParams.Q, "turbowriter");

        QueryResponse res = solrServer.query(params);
        assertEquals(1, res.getResults().size());
        byte[] storedImageData = (byte[])res.getResults().get(0).getFieldValue("image");
        // assertEquals on arrays compares references; compare contents instead.
        assertTrue(Arrays.equals(imageData, storedImageData));

        params.set(CommonParams.Q, "cat:Japanese");
        res = solrServer.query(params);
        assertEquals(1, res.getResults().size());

        params.set(CommonParams.Q, "cat:Chinese");
        res = solrServer.query(params);
        assertEquals(1, res.getResults().size());
        storedImageData = (byte[])res.getResults().get(0).getFieldValue("image");
        assertTrue(Arrays.equals(imageData, storedImageData));

        params.set(CommonParams.Q, "bogus");
        res = solrServer.query(params);
        assertEquals(0, res.getResults().size());
    }