Experiments using Eclipse Luna

Jack Park edited this page Feb 2, 2015 · 6 revisions

I will accumulate notes on building YodaQA in Eclipse (on a Windows box).

  • Start by adding Gradle to Eclipse using the update site linked here: https://github.com/spring-projects/eclipse-integration-gradle/ (I chose the latest release).

  • Create a new Java project, and set its root directory to the location where YodaQA was pulled by git.

  • Right click on the project, navigate to Configure, and change to a Gradle project. Building will take a while; the system creates a /build directory and downloads all the dependencies.

That is where the work is to date. I plan to edit this as I proceed.

Known to-do items:

  • The class cz.brmlab.yodaqa.pipeline.YodaQA has a hard-coded reference to a specific Solr server on line 49; that line needs to point at, for example, a Solr instance on localhost or some other local server. In the long run, it will be useful to create some sort of properties file for YodaQA where those hard-coded values can be configured.

  • It remains to be discovered how to use gradlew.bat inside Eclipse for the various runtime chores.

20150202 Update -- It's Running!
I got it running now by doing the following:

  • Update the gist https://gist.github.com/KnowledgeGarden/90cecd04d0de14809253 to properly wrap imported Wikipedia documents, followed by reimporting enwiki-text.xml
  • Start over with a fresh download of the recently-updated YodaQA repo
  • Modify just one file, YodaQA, to use "localhost" for Solr
  • Import enwiki-text.xml into Solr: Start with Solr 4.6.0. Install the repo's "/enwiki" directory as <solrinstalldirectory>/example/enwiki and add enwiki-text.xml to that directory. I created a batch file which uses the command line suggested by the YodaQA readme, but gives Solr a 4 GB heap; then start the import with the browser URL given in the readme. That took just over an hour.
  • gradlew.bat build to build the system
  • gradlew.bat run -q to run it and start asking questions

20150130 Update
Very interesting issues.

  • At the moment, bringing up Solr 4.5.1 has been problematic, so I finally downloaded 3.6.0 and am using the repo's /enwiki/ directory in its entirety.
  • There's a kind of ambiguity going on. The repo's /enwiki/ directory includes solr.xml, as does /example/solr/. Couple that with the belief that the directive -Dsolr.solr.home=enwiki should indicate that /enwiki goes in /example/solr/enwiki, and not elsewhere, and there are some problems. First, removing solr.xml from /solr changes nothing: the system still will not boot, and it cannot make a core.
  • Remove solr.xml from /solr/enwiki/ and you now have problems finding solrconfig.xml: SEVERE: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'enwiki.\conf/'. It's looking in enwiki/conf rather than enwiki/collection1/conf.
  • Suppose we take the idea that solr.xml in /enwiki means that directory is a root itself. Just as /solr/solr.xml exists, we now have /enwiki/solr.xml; so why not move /enwiki outside /solr and try again? That doesn't work. So, for experiment's sake, we try -Dsolr.enwiki.home=enwiki, and that boots, but it will not perform the dataimport, even with the giant enwiki-text.xml file where it belongs. I get a "not found" error.
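
When chasing this kind of "which directory is Solr actually reading" problem, it can help to list what is really present under a candidate Solr home before booting. The following is a minimal diagnostic sketch using only the JDK; the class name SolrHomeCheck is hypothetical, and the three relative paths are simply the ones discussed in the notes above, not an authoritative description of what any Solr version requires.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic: reports which config files exist under a candidate Solr home. */
final class SolrHomeCheck {
    private SolrHomeCheck() {}

    /** Relative paths mentioned in the notes; "collection1" is the core directory in the repo's /enwiki. */
    static final String[] CANDIDATES = {
        "solr.xml",
        "conf/solrconfig.xml",
        "collection1/conf/solrconfig.xml"
    };

    /** Returns the candidate paths that exist as regular files under solrHome, in CANDIDATES order. */
    static List<String> present(Path solrHome) {
        List<String> found = new ArrayList<>();
        for (String rel : CANDIDATES) {
            if (Files.isRegularFile(solrHome.resolve(rel))) {
                found.add(rel);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Default to "enwiki" relative to the current directory (an assumption for illustration).
        Path home = Paths.get(args.length > 0 ? args[0] : "enwiki").toAbsolutePath();
        System.out.println("Checking " + home);
        List<String> found = present(home);
        for (String rel : CANDIDATES) {
            System.out.println((found.contains(rel) ? "  found    " : "  missing  ") + rel);
        }
    }
}
```

Running it once against each location tried above (inside /example/solr and outside it) shows at a glance whether solrconfig.xml sits under conf or under collection1/conf for that layout.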

There remains much work to do...

20150128 Update
I was able to use Cygwin to create a directory enwiki-text and fill it with bz2 files from a Wikipedia dump. But I was unable to coax the script extracted2xml.sh to work, so I wrote a tiny Java program that did the trick. Its listing is here: https://gist.github.com/KnowledgeGarden/90cecd04d0de14809253

Next up: see if Solr will read the monster file it wrote.

20150127 Update
gradlew.bat --gui boots a nice user interface.

What I am learning is that it's possible to use Eclipse, configured as a Gradle project, to edit files and so forth, but not to build and exercise YodaQA; leave that to gradlew.bat. I am also learning that, to rebuild the project, the "clean" command may not be enough: I ended up searching the entire Eclipse build for places where source code created by Gradle might exist and removing it. Following that, the system seems to build and perform as suggested in the main readme.

At this time, I am building the English Wikipedia files per instructions. Since this is a Windows project, thus far, Cygwin is serving well.

Also, I made the shift from Solr 3.6 to Solr 4.5.1, which required changes to build.gradle, and which required two additional changes:

  • Recent Solr releases depend on the Restlet library, which is not available in Maven repositories other than Restlet's own. To work around that, I added mavenLocal() to the repositories in build.gradle.
  • In cz.brmlab.yodaqa.provider.solr.Solr.java, the method createEmbeddedSolrServer makes use of CoreContainer, which has changed: there is no longer an Initializer class. The solution so far has been to use the constructor new CoreContainer(). This is only partially tested: it compiles and runs, but without data there is no way to know whether it works. It is not at all clear that this is a solution, since it points to the Solr installation, which cannot be correct for remote installations. I expect to continue exploring this issue.

One additional change:

  • The class cz.brmlab.yodaqa.pipeline.YodaQA has a hard-wired URL for Solr. That must be changed to suit particular installations. For now, I changed it to

It will be useful to create some kind of runtime properties file which includes the URL for Solr. This might turn out to be non-trivial if SolrCloud is used.
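
A minimal sketch of what such a properties lookup could look like, using only java.util.Properties from the JDK. Everything named here is an assumption for illustration: the class SolrSettings, the file name yodaqa.properties, the key solr.url, and the fallback http://localhost:8983/solr/ are not defined anywhere in YodaQA, and a SolrCloud setup would likely need more than a single URL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical helper: reads the Solr URL from a properties file instead of a hard-coded string. */
final class SolrSettings {
    // Assumed names; YodaQA itself does not define them.
    static final String KEY = "solr.url";
    static final String DEFAULT_URL = "http://localhost:8983/solr/";

    private SolrSettings() {}

    /** Loads the file if it exists; a missing file yields empty properties. */
    static Properties load(Path file) {
        Properties props = new Properties();
        if (Files.isRegularFile(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }

    /** Returns the configured URL, or DEFAULT_URL when the key is absent. */
    static String solrUrl(Properties props) {
        return props.getProperty(KEY, DEFAULT_URL).trim();
    }

    public static void main(String[] args) {
        Path file = Paths.get(args.length > 0 ? args[0] : "yodaqa.properties");
        System.out.println("Solr URL: " + solrUrl(load(file)));
    }
}
```

With a file containing the single line solr.url=http://localhost:8983/solr/, the pipeline class could call SolrSettings.solrUrl(SolrSettings.load(...)) where the hard-wired URL is today; with no file present, the localhost default is used.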

20150126 Update
gradlew.bat build calls for the JDK's javac compiler, whereas Eclipse uses its own internal compiler, which Gradle knows nothing about. I am finding that one cannot tell Eclipse to use the JDK, since it will then rely on the JRE found in the JDK install directory. It appears that gradlew.bat must be run from a console outside Eclipse.
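
One untested idea for keeping the wrapper at arm's length while still starting it from a Java program is to spawn gradlew.bat as an external process with JAVA_HOME pointed at the JDK. The sketch below uses only java.lang.ProcessBuilder; the class name GradlewLauncher and its argument order are hypothetical, and whether launching it from within Eclipse avoids the compiler problem described above is an assumption, not something verified here.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical launcher: runs the Gradle wrapper on Windows as an external process. */
final class GradlewLauncher {
    private GradlewLauncher() {}

    /** Builds the command line: cmd /c gradlew.bat followed by the Gradle arguments. */
    static List<String> command(String... gradleArgs) {
        List<String> cmd = new ArrayList<>(Arrays.asList("cmd", "/c", "gradlew.bat"));
        cmd.addAll(Arrays.asList(gradleArgs));
        return cmd;
    }

    /** Runs the wrapper in projectDir with JAVA_HOME set to jdkHome and returns its exit code. */
    static int run(File projectDir, String jdkHome, String... gradleArgs)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(gradleArgs));
        pb.directory(projectDir);                   // directory containing gradlew.bat
        pb.environment().put("JAVA_HOME", jdkHome); // the wrapper script consults JAVA_HOME
        pb.inheritIO();                             // pass the child's output through to this process
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: GradlewLauncher <projectDir> <jdkHome> <gradle args...>");
            System.exit(2);
        }
        String[] gradleArgs = Arrays.copyOfRange(args, 2, args.length);
        System.exit(run(new File(args[0]), args[1], gradleArgs));
    }
}
```

For example, passing the YodaQA checkout directory, the JDK install directory, and then build (or run -q) mirrors the gradlew.bat commands used elsewhere in these notes.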