How To Install Apache Solr on Red Hat

digisus edited this page Jun 28, 2013 · 1 revision

The following instructions show how to install and configure the Apache Solr search platform for CKAN on Red Hat Linux. The standard documentation for setting up Solr when installing CKAN on Ubuntu 10.04 uses the Java JDK and Jetty to run Solr. The instructions below use the Java JDK with Tomcat instead.

CKAN uses customized schema files that take its specific search needs into account. The different versions of the Solr schema file are found in the ckan/ckan/config/solr directory of the CKAN codebase. Solr can also be set up with multiple cores to support several configurations and indexes on the same instance. This is especially useful when you want applications other than CKAN, or different CKAN versions, to share one Solr instance. The instructions here set up two Solr cores, one for schema-1.3.xml and one for schema-1.4.xml (the latest version at the time of writing). As new CKAN schemas are introduced (e.g., schema-1.5), you will want to modify these instructions accordingly.

Install latest Java JDK

yum remove java
yum install java-1.6.0-openjdk
yum install java-1.6.0-openjdk-devel

Install Tomcat6

sudo yum install -y tomcat6 

Install Solr

Install Solr 1.4.1 and create a directory at /data/solr which will contain the configuration and data for your Solr cores. Tomcat needs to be the owner of /data/solr and its contents.

cd /usr/src/
curl http://mirror.lividpenguin.com/pub/apache/lucene/solr/1.4.1/apache-solr-1.4.1.tgz | tar xfz -
mkdir -p /data/solr
cp -R apache-solr-1.4.1/example/solr/* /data/solr
cp apache-solr-1.4.1/dist/apache-solr-1.4.1.war /data/solr/solr.war
chown -R tomcat /data/solr/
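
Before moving on, it can help to confirm that the copy steps above produced the files the later sections rely on. The following is a minimal sketch, not part of Solr or CKAN: check_solr_home is a hypothetical helper name, and the list of files is an assumption based on the commands above and the configuration files named later on this page.

```shell
# Report any file the later steps rely on that is missing from a Solr home.
# check_solr_home is a hypothetical helper, not a Solr or Tomcat command.
# The function body runs in a subshell, so its variables stay local.
check_solr_home() (
    solr_home=$1
    missing=0
    for f in solr.war conf/schema.xml conf/solrconfig.xml; do
        if [ ! -f "$solr_home/$f" ]; then
            echo "missing: $solr_home/$f" >&2
            missing=1
        fi
    done
    exit "$missing"
)

# Example: check_solr_home /data/solr && echo "Solr home looks complete"
```

The function returns a non-zero status when anything is missing, so it can gate the next step in a script.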

Set up Tomcat for Solr

Create a solr.xml file for Catalina localhost.

vi /etc/tomcat6/Catalina/localhost/solr.xml

The contents of the solr.xml file should be:

<Context docBase="/data/solr/solr.war" debug="0" privileged="true" allowLinking="true" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/data/solr" override="true" />
</Context>
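
If you prefer not to edit the file by hand, the same descriptor can be written from a script. This is a sketch under stated assumptions: write_solr_context is a hypothetical helper name, and the paths in the example are the ones used on this page. XML element names are case-sensitive, so the closing tag is written as </Context> to match the opening tag.

```shell
# Write a Tomcat context descriptor for Solr without opening an editor.
# write_solr_context is a hypothetical helper; its arguments are:
#   $1 = output file, $2 = path to solr.war, $3 = Solr home directory
write_solr_context() {
    cat > "$1" <<EOF
<Context docBase="$2" debug="0" privileged="true" allowLinking="true" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="$3" override="true" />
</Context>
EOF
}

# Example (run as root):
# write_solr_context /etc/tomcat6/Catalina/localhost/solr.xml /data/solr/solr.war /data/solr
```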

Fix Red Hat compatibility issues

mkdir -p /usr/share/tomcat6/common/endorsed
cd /usr/share/tomcat6/common/endorsed/
ln -s /usr/share/java/xalan-j2.jar xalan-j2.jar 

Start Tomcat

/etc/init.d/tomcat6 start
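
Tomcat may need a few seconds to deploy the WAR, so a request made immediately after starting it can fail. The sketch below polls a URL until it answers. It assumes curl is installed and that Tomcat listens on port 8080 (the port used in the solr_url example later on this page); wait_for_http is a hypothetical helper name.

```shell
# Poll a URL until it returns a successful HTTP response or the attempts run out.
# wait_for_http is a hypothetical helper; its arguments are:
#   $1 = URL, $2 = number of attempts (default 10), $3 = seconds between attempts (default 3)
wait_for_http() (
    url=$1
    tries=${2:-10}
    delay=${3:-3}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -s: quiet, -f: treat HTTP errors (>= 400) as failure, -m 5: give up after 5 seconds
        if curl -s -f -m 5 -o /dev/null "$url"; then
            echo "up: $url"
            exit 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "no response from $url after $tries attempts" >&2
    exit 1
)

# Example: wait_for_http http://localhost:8080/solr/
```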

Add multiple core support and set up directories

mkdir -p /data/solr/cores
cp /usr/src/apache-solr-1.4.1/example/multicore/solr.xml /data/solr/
mkdir -p /data/solr/cores/core0/conf
mkdir -p /data/solr/cores/core1/conf

This default setup will use the following locations in your file system:

  • /data/solr: Solr home, with a symlink pointing to the configuration dir in /etc.
  • /data/solr/conf: Solr configuration files. The most important ones are schema.xml and solrconfig.xml.
  • /data/solr/cores: the Solr cores for schema-1.3.xml and schema-1.4.xml
  • /data/solr/cores/core0: for the latest CKAN instance, using schema-1.4.xml
  • /data/solr/cores/core1: for a CKAN instance using schema-1.3.xml

Set up /var/lib directories for each Solr core

Each core needs to be configured with its own data directory. This is really important to prevent conflicts between cores. The following commands create the data directories for core0 and core1:

mkdir -p /var/lib/solr/data/core0
chown tomcat /var/lib/solr/data/core0
chgrp tomcat /var/lib/solr/data/core0

mkdir -p /var/lib/solr/data/core1
chown tomcat /var/lib/solr/data/core1
chgrp tomcat /var/lib/solr/data/core1
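
The same commands repeat for every core, so they can be expressed as a loop when you add more cores later. This is a minimal sketch: make_core_data_dirs is a hypothetical helper name, and it combines the chown and chgrp steps above into a single chown owner:group call.

```shell
# Create one data directory per core and hand it to the given user and group.
# make_core_data_dirs is a hypothetical helper; its arguments are:
#   $1 = base directory, $2 = owner, $3 = group, remaining arguments = core names
make_core_data_dirs() (
    base=$1
    owner=$2
    group=$3
    shift 3
    for core in "$@"; do
        mkdir -p "$base/$core" || exit 1
        chown "$owner:$group" "$base/$core" || exit 1
    done
)

# Example (run as root): make_core_data_dirs /var/lib/solr/data tomcat tomcat core0 core1
```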

Copy configuration data for each Solr core

Populate the core directories by copying /data/solr/conf into each subdirectory of /data/solr/cores, and then add a data directory.

for d in /data/solr/cores/*/; do cp -r /data/solr/conf "$d"; done
for d in /data/solr/cores/*/; do mkdir "${d}data"; done
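
The two loops above can also be wrapped in a function that takes the Solr home as a parameter, which makes it easy to try out on a scratch directory first. A sketch under that assumption; populate_cores is a hypothetical helper name.

```shell
# Copy the shared conf directory into every core and add a data directory to each.
# populate_cores is a hypothetical helper; $1 is the Solr home (e.g. /data/solr).
populate_cores() (
    solr_home=$1
    for core_dir in "$solr_home"/cores/*/; do
        # If there are no cores, the glob stays unexpanded; skip it.
        [ -d "$core_dir" ] || continue
        cp -R "$solr_home/conf" "$core_dir" || exit 1
        mkdir -p "${core_dir}data" || exit 1
    done
)

# Example: populate_cores /data/solr
```

Using a glob instead of parsing the output of ls keeps directory names with unusual characters intact.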

Overwrite schema.xml files for each Solr core with CKAN schemas

Copy CKAN schemas to replace the schema.xml files for each core. The lines below set up versions 1.3 and 1.4. To set up other schema versions, you would modify the instructions below accordingly:

cp ~/pyenv/src/ckan/ckan/config/solr/schema-1.3.xml /data/solr/cores/core1/conf/schema.xml
cp ~/pyenv/src/ckan/ckan/config/solr/schema-1.4.xml /data/solr/cores/core0/conf/schema.xml

Change ownership and group of the schema.xml files:

chown tomcat /data/solr/cores/core1/conf/schema.xml
chgrp tomcat /data/solr/cores/core1/conf/schema.xml
chown tomcat /data/solr/cores/core0/conf/schema.xml
chgrp tomcat /data/solr/cores/core0/conf/schema.xml

Restart Tomcat:

/etc/init.d/tomcat6 restart

Edit solr.xml to define the available cores

vi /data/solr/solr.xml

Replace the <cores> element in the file with:

<cores adminPath="/admin/cores">
  <core name="ckan-schema-1.4" instanceDir="cores/core0">
    <property name="dataDir" value="/data/solr/cores/core0" />
  </core>
  <core name="ckan-schema-1.3" instanceDir="cores/core1">
    <property name="dataDir" value="/data/solr/cores/core1" />
  </core>
</cores>

Take note of each dataDir value and make sure it points to the right directory. In the XML above, the dataDir property places the data for CKAN schema 1.4 in /data/solr/cores/core0 and the data for CKAN schema 1.3 in /data/solr/cores/core1.
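
One way to check the paths is to extract every dataDir value from solr.xml and test that each directory exists. The sketch below assumes the name and value attributes appear on one line in that order, as in the XML above, and that the paths contain no spaces; list_data_dirs and check_data_dirs are hypothetical helper names.

```shell
# Print the value of every dataDir property found in a solr.xml file.
# Assumes name="dataDir" comes before value="..." on the same line.
list_data_dirs() {
    sed -n 's/.*name="dataDir"[[:space:]]*value="\([^"]*\)".*/\1/p' "$1"
}

# Report whether each declared data directory exists; non-zero status if any is missing.
# The unquoted command substitution splits on whitespace, hence the no-spaces assumption.
check_data_dirs() (
    status=0
    for dir in $(list_data_dirs "$1"); do
        if [ -d "$dir" ]; then
            echo "ok: $dir"
        else
            echo "missing: $dir" >&2
            status=1
        fi
    done
    exit "$status"
)

# Example: check_data_dirs /data/solr/solr.xml
```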

Configure the core data directory

Now configure each core to use the data directory you have created. Edit /data/solr/cores/core0/conf/solrconfig.xml and /data/solr/cores/core1/conf/solrconfig.xml and change the <dataDir></dataDir> element so that it reads the dataDir property defined in solr.xml:

<dataDir>${dataDir}</dataDir>

Set permissions:

chown -R tomcat /data/solr/

Restart Tomcat:

/etc/init.d/tomcat6 restart
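
The dataDir edit described above can be scripted instead of made by hand in each solrconfig.xml. This is a sketch only: set_datadir_placeholder is a hypothetical helper name, and it assumes the existing <dataDir> element sits on a single line of the file. Run it before the permission and restart steps.

```shell
# Replace the contents of the <dataDir> element in a solrconfig.xml with the
# ${dataDir} placeholder, so the value is taken from the property in solr.xml.
# set_datadir_placeholder is a hypothetical helper; $1 is the solrconfig.xml path.
set_datadir_placeholder() (
    file=$1
    tmp=$(mktemp) || exit 1
    # Single quotes keep ${dataDir} literal; the shell must not expand it.
    sed 's|<dataDir>.*</dataDir>|<dataDir>${dataDir}</dataDir>|' "$file" > "$tmp" || exit 1
    # Write back through the original file so its owner and permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
)

# Example:
# for core in core0 core1; do
#     set_datadir_placeholder "/data/solr/cores/$core/conf/solrconfig.xml"
# done
```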

Edit development.ini

vi ~/pyenv/src/ckan/development.ini

Uncomment solr_url and set it to the correct URL to enable Solr support, e.g.,

solr_url = http://ec2-23-20-243-151.compute-1.amazonaws.com:8080/solr/ckan-schema-1.4

If you have disabled Solr by setting ckan.simple_search = 1, you should comment out that line:

# ckan.simple_search = 1
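
Both changes to development.ini can also be applied with sed. The sketch below is an illustration, not part of CKAN: enable_solr is a hypothetical helper name, the localhost URL in the example is a placeholder for your own host, and the URL must not contain the characters | or &, which are special in the sed expression.

```shell
# Uncomment and set solr_url, and comment out ckan.simple_search = 1 if present.
# enable_solr is a hypothetical helper; $1 = path to the .ini file, $2 = Solr core URL.
enable_solr() (
    ini=$1
    url=$2
    tmp=$(mktemp) || exit 1
    sed -e "s|^#*[[:space:]]*solr_url[[:space:]]*=.*|solr_url = $url|" \
        -e 's|^ckan\.simple_search[[:space:]]*=[[:space:]]*1|# &|' \
        "$ini" > "$tmp" || exit 1
    # Write back through the original file so its owner and permissions are kept.
    cat "$tmp" > "$ini"
    rm -f "$tmp"
)

# Example:
# enable_solr ~/pyenv/src/ckan/development.ini http://localhost:8080/solr/ckan-schema-1.4
```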

Start the engine

If you are adding Solr to an already-existing CKAN installation, you should now activate the python environment and restart the web server. Otherwise, if you are installing Solr as part of the setup of a new CKAN installation, proceed by creating a CKAN config file and complete the rest of the installation process.

cd ~
. pyenv/bin/activate
cd pyenv/src/ckan
paster serve development.ini

Additional information