Connecting To Snappy Cluster Through Java Spark API Fails #264

Closed
shahamit opened this issue May 24, 2016 · 6 comments
@shahamit

I am trying out a simple hello-world program that connects to the Snappy cluster using the Java Spark API. It fails with the error `Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class com.gemstone.gemfire.internal.shared.NativeCallsJNAImpl$WinNativeCalls$Kernel32`. The application works when I run it against a standalone Spark cluster, i.e. with the master set to local.

Below are the code and the exception trace:

```java
import java.io.Serializable;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SnappyDataLoader implements Serializable {

    private final String INPUT_FILE_PATH = "";
    private Logger logger = LoggerFactory.getLogger(SnappyDataLoader.class);

    public static void main(String[] args) {
        new SnappyDataLoader().executeDataLoad();
    }

    private void executeDataLoad() {
        System.setProperty("hadoop.home.dir", "E:\\HadoopWinUtils\\");

        JavaSparkContext sc = getSparkContext();

        // Read the input file, skip the header line and map the remaining lines to TransactionRecord objects.
        JavaRDD<String> inputFile = sc.textFile(INPUT_FILE_PATH);
        String header = inputFile.first();
        logger.info("{}", header);
        JavaRDD<TransactionRecord> recordRDD = inputFile.filter(line -> !(line.equals(header))).map(
                line -> {
                    String[] fields = line.split(",");
                    logger.info("{}", fields);
                    return new TransactionRecord(Long.parseLong(fields[0]), fields[1]);
                });

        logger.info("{}", recordRDD.first());
    }

    private JavaSparkContext getSparkContext() {
        // Point the Spark master at the SnappyData cluster and put the SnappyData jars on the classpath.
        SparkConf sparkConf = new SparkConf();
        String snappyJarLocation = "E:\\snappydata-0.3.0-PREVIEW-bin\\lib\\";
        sparkConf.setAppName("mySnappyApp")
                .setMaster("snappydata://<leader-server>:10334")
                .set("jobserver.enabled", "true")
                .set("snappydata.store.locators", "<leader-server>:10334")
                .set("spark.ui.port", "4040")
                .set("spark.driver.extraClassPath", snappyJarLocation)
                .set("spark.executor.extraClassPath", snappyJarLocation);
        return new JavaSparkContext(sparkConf);
    }
}
```

The exception trace is:

```
Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class com.gemstone.gemfire.internal.shared.NativeCallsJNAImpl$WinNativeCalls$Kernel32
    at com.gemstone.gemfire.internal.shared.NativeCallsJNAImpl$WinNativeCalls.getEnvironment(NativeCallsJNAImpl.java:1233)
    at com.gemstone.gemfire.internal.shared.ClientSharedUtils.initLog4J(ClientSharedUtils.java:1111)
    at com.gemstone.gemfire.internal.GFToSlf4jBridge.getLogger(GFToSlf4jBridge.java:104)
    at com.gemstone.gemfire.internal.GFToSlf4jBridge.put(GFToSlf4jBridge.java:61)
    at com.gemstone.gemfire.internal.LogWriterImpl.info(LogWriterImpl.java:717)
    at com.gemstone.gemfire.internal.LogWriterImpl.info(LogWriterImpl.java:725)
    at com.pivotal.gemfirexd.internal.impl.services.stream.GfxdHeaderPrintWriterImpl.write(GfxdHeaderPrintWriterImpl.java:122)
    at java.io.PrintWriter.write(PrintWriter.java:473)
    at java.io.PrintWriter.print(PrintWriter.java:603)
    at com.pivotal.gemfirexd.internal.iapi.services.context.ContextManager.flushErrorString(ContextManager.java:719)
    at com.pivotal.gemfirexd.internal.iapi.services.context.ContextManager.cleanupOnError(ContextManager.java:544)
    at com.pivotal.gemfirexd.internal.impl.jdbc.TransactionResourceImpl.cleanupOnError(TransactionResourceImpl.java:916)
    at com.pivotal.gemfirexd.internal.impl.jdbc.EmbedConnection.<init>(EmbedConnection.java:700)
    at com.pivotal.gemfirexd.internal.impl.jdbc.EmbedConnection30.<init>(EmbedConnection30.java:94)
    at com.pivotal.gemfirexd.internal.impl.jdbc.EmbedConnection40.<init>(EmbedConnection40.java:75)
    at com.pivotal.gemfirexd.internal.jdbc.Driver40.getNewEmbedConnection(Driver40.java:95)
    at com.pivotal.gemfirexd.internal.jdbc.InternalDriver.connect(InternalDriver.java:351)
    at com.pivotal.gemfirexd.internal.jdbc.InternalDriver.connect(InternalDriver.java:219)
    at com.pivotal.gemfirexd.internal.jdbc.InternalDriver.connect(InternalDriver.java:195)
    at com.pivotal.gemfirexd.internal.jdbc.AutoloadedDriver.connect(AutoloadedDriver.java:141)
    at com.pivotal.gemfirexd.internal.engine.fabricservice.FabricServiceImpl.startImpl(FabricServiceImpl.java:294)
    at com.pivotal.gemfirexd.internal.engine.fabricservice.FabricServerImpl.start(FabricServerImpl.java:60)
    at io.snappydata.impl.LeadImpl.internalStart(LeadImpl.scala:159)
    at io.snappydata.impl.LeadImpl$.invokeLeadStart(LeadImpl.scala:356)
    at org.apache.spark.scheduler.cluster.SnappyEmbeddedModeClusterManager.initialize(SnappyEmbeddedModeClusterManager.scala:87)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2761)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:540)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
```

The complete logs are shared here.

These are similar to the steps detailed in the documentation. What could I be missing?

@hbhanawat
Contributor

You are setting the lead as the master, which is not correct. The lead is a driver node and already has a SparkContext; you can use that SparkContext by submitting jobs to the lead node. I would recommend that you go through the deployment topologies of Snappy: http://snappydatainc.github.io/snappydata/deployment/. If you are trying to create an "Application managed Spark driver and context", you will have to set the master to the locator URL, and your lead node should not be running (see the sketch below). If you are trying to create a "Split cluster mode", your master has to be the Spark standalone cluster master.
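
For reference, a minimal sketch in Java of the first option ("Application managed Spark driver and context"). This is an illustration only, not the documented API: the `snappydata://` master scheme and port 10334 are taken from the code above, and `<locator-host>` and the class name are placeholders.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class LocatorMasterExample {
    public static void main(String[] args) {
        // Assumption: the master points at the locator (not the lead), and the lead node is not running.
        // <locator-host> is a placeholder for the locator's hostname; 10334 is the locator port used above.
        SparkConf conf = new SparkConf()
                .setAppName("mySnappyApp")
                .setMaster("snappydata://<locator-host>:10334")
                .set("snappydata.store.locators", "<locator-host>:10334");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... run the application's RDD/DataFrame logic here ...
        sc.stop();
    }
}
```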

@shahamit
Author

OK, so in a unified Snappy configuration the SparkContext is created and managed by the lead node. If I want to execute my Spark applications on this configuration, I should execute them as jobs by implementing the SnappySQLJob interface as explained here. Let me know if my understanding is incorrect.

As per your suggestion above, I stopped the lead node, so I now have just one locator node and two datastore nodes. My sample application still fails with the same error. You can have a look at the detailed console logs here.

@hbhanawat
Contributor

> I should execute them as jobs by implementing the SnappySQLJob interface as explained here. Let me know if my understanding is incorrect.

No, your understanding is correct.

> My sample application still fails with the same error.

Are you running it on Windows? We have not tested SnappyData on Windows yet: http://snappydatainc.github.io/snappydata/#download-binary-distribution. It looks like an exception is thrown while logging the actual error. Would it be possible for you to try it on a Linux box?

@shahamit
Author

My Snappy cluster is running on Linux servers. I am executing my Spark application from an IDE running on a Windows box. Would that cause an issue?

@hbhanawat
Contributor

> Would that cause an issue?

I don't know, as we have not tested it on Windows yet. I would recommend running it on Linux.

@sumwale
Contributor

sumwale commented Sep 19, 2017

@shahamit Closing this issue since we do not support Windows yet.

@sumwale sumwale closed this as completed Sep 19, 2017