The main - so far, only - repository for the SmileWide project.

Welcome to SMILE-WIDE

SMILE-WIDE is a Bayesian network library. Initially, SMILE-WIDE is a version of the well-known SMILE library, augmented With Integrated Distributed Execution, which allows it to run on very large datasets. As SMILE-WIDE is developed, BigData-specific capabilities will surpass the standard Bayesian network interfaces.

Programmer-facing, SMILE-WIDE is a .jar library which you can include in your software. User-facing, it is also integrated into Hive as a UDF to provide posterior probabilities of missing values, given the observed values for each instance.

SMILE-WIDE is written in Java, using the underlying SMILE library, which is written in C++. It uses Hadoop for inference on large data.

Contents

Please contact the authors with any questions or problems:

How to build the software

SMILE-WIDE is an Eclipse project configured to use Maven. All external dependencies are pulled from the appropriate Maven repositories. The code can be built from the IDE or directly from the command line. The basic build can be started with the following command:

mvn clean package

This creates two jars in the target directory and copies the appropriate native library to the target/lib directory.

The binary files are:

  • smile-wide-0.0.1-SNAPSHOT.jar
    • Contains the SMILE-WIDE code
  • smile-wide-0.0.1-SNAPSHOT-job.jar
    • Contains the SMILE-WIDE code and the core SMILE jar in its lib subdirectory. This makes running SMILE-WIDE-based Hadoop jobs easier, because Hadoop automatically adds the SMILE jar to the classpath on the machines in the cluster.
  • libjsmile.so, libjsmile.jnilib or jsmile.dll
    • JNI library containing the C++ SMILE code
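
As a rough illustration of which native library file to expect on a given machine, the sketch below maps the output of `uname -s` to one of the three file names listed above. The OS-to-file mapping follows the usual platform conventions and is an assumption, as is the helper name `native_lib_name`; the build itself may decide differently.

```shell
# Map an OS name (as printed by `uname -s`) to the JNI library file name.
# ASSUMPTION: the mapping follows common platform conventions; it is not
# taken from the SMILE-WIDE build configuration.
native_lib_name() {
  case "$1" in
    Linux*)               echo "libjsmile.so" ;;
    Darwin*)              echo "libjsmile.jnilib" ;;
    CYGWIN*|MINGW*|MSYS*) echo "jsmile.dll" ;;
    *)                    echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

# Report whether the library for this machine is present after a build.
if lib="$(native_lib_name "$(uname -s)")"; then
  if [ -f "target/lib/$lib" ]; then
    echo "found target/lib/$lib"
  else
    echo "missing target/lib/$lib"
  fi
fi
```

Run from the project root after `mvn clean package`, this prints whether the expected file is in target/lib.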

It's possible to build for a platform different from the one running Maven by overriding the smile.native.platform variable. For example, when building for Hadoop on a 64-bit Linux cluster with Maven or Eclipse running on OS X, the command should be extended to:

mvn clean package -Dsmile.native.platform=linux64 -Dmaven.test.skip=true

How to run an example SMILE-WIDE Hadoop job

The example below executes a Hadoop job loop which learns the parameters of probability distributions for the kiva.xdsl network. Note that the jar file contains the SMILE jar in its lib directory. However, the native library must be explicitly added to the job's distributed cache with Hadoop's -files option. Additionally, since EM requires access to SMILE functionality locally, the .so file should be copied to the $HADOOP_BIN/native directory.

hadoop jar smile-wide-0.0.1-SNAPSHOT-job.jar smile.wide.algorithms.em.RunDistributedEM \
  -files em-tmp.xdsl,libjsmile.so \
  -D mapred.max.split.size=250000 -D mapred.reduce.tasks=12 \
  em.initial.netfile=kiva.xdsl em.work.netfile=em-tmp.xdsl \
  em.data.file=pitt/kiva500k.txt em.stat.file=pitt/em-out \
  em.separator=9 em.local.stat.file=em-local.txt

The file kiva.xdsl is located in the project's input directory; pitt/kiva500k.txt is in the compute cluster's HDFS.

The output of the job is the local file named em-tmp.xdsl, containing the modified kiva.xdsl network with learned parameters.

How to test Hive integration

To test the Hive UDFs, execute the normal Maven package build followed by runscripts/hivePosteriors.sh. This creates the target/hive-test directory, containing all the files required for the UDF test. The command to run the test is:

hive -f hivePosteriors.q

Hive will import a small data file and perform four queries, each calling into SMILE-WIDE UDFs.

Problems and Solutions

java.lang.UnsatisfiedLinkError: no jsmile in java.library.path

This exception is caused by a missing native library. The platform-specific library is placed in target/lib during the Maven build, but Hadoop and Hive must be made aware of its existence. This is done with Hadoop's -files option or Hive's 'ADD FILE'. Some SMILE-WIDE algorithms contain a nontrivial local component running within Hadoop's client JVM. In such cases the shared library should also be added to the $HADOOP_BIN/native directory.
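
For the client-side case, copying the library into place could be sketched as follows. The helper name `stage_native_lib` and its two arguments are hypothetical, and the script assumes HADOOP_BIN is set in the environment and that it is run from the project root after the Maven build.

```shell
# Copy the JNI library produced by the Maven build into a target directory.
# Usage: stage_native_lib <source-dir> <dest-dir>
# ASSUMPTION: stage_native_lib is a hypothetical helper, not part of SMILE-WIDE.
stage_native_lib() {
  src_dir="$1"
  dest_dir="$2"
  # Try each of the platform-specific file names the build may produce.
  for name in libjsmile.so libjsmile.jnilib jsmile.dll; do
    if [ -f "$src_dir/$name" ]; then
      mkdir -p "$dest_dir" && cp "$src_dir/$name" "$dest_dir/" || return 1
      echo "copied $name to $dest_dir"
      return 0
    fi
  done
  echo "no jsmile native library found in $src_dir" >&2
  return 1
}

# Only attempt the copy when HADOOP_BIN is set.
if [ -n "${HADOOP_BIN:-}" ]; then
  stage_native_lib target/lib "$HADOOP_BIN/native" || echo "staging failed" >&2
fi
```

This only covers the local client JVM; the library still has to reach the cluster nodes through -files or 'ADD FILE' as described above.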

How to generate Javadoc API documentation

The SMILE-WIDE API Javadoc documentation can be generated from the command line. With 'javadoc' on the path, issue the following command:

javadoc @options.javadoc.text

This will generate HTML documentation in the 'javadocs' directory.
