
IGNITE-4526 added SharedRDD example #1403

Open · wants to merge 6 commits into base: master

Conversation

@manishatGit (Contributor) commented Jan 8, 2017:

Added a basic SharedRDD example.

@@ -44,6 +44,24 @@
</dependency>

<dependency>
<groupId>org.apache.spark</groupId>

@dmagda (Contributor) commented Jan 10, 2017:

All Spark 2.11 dependencies have to be placed under the "scala" profile below in the XML, while Spark 2.10 dependencies go under the "scala-2.10" profile.

</dependency>

<dependency>
<groupId>org.apache.spark</groupId>

@dmagda (Contributor) commented Jan 10, 2017:

Same as above.


<dependency>
<groupId>org.apache.ignite</groupId>
<artifactId>ignite-spark</artifactId>

@dmagda (Contributor) commented Jan 10, 2017:

This module belongs to the "scala" profile, while for "scala-2.10" we need to add ignite-spark-2.10 instead.
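For illustration, a sketch of how the two profiles could look after moving these modules (profile ids and artifact ids are taken from the comments in this thread; everything else mirrors the existing pom):

    <profile>
        <id>scala</id>
        <dependencies>
            <!-- Spark 2.11 dependencies go here as well. -->
            <dependency>
                <groupId>org.apache.ignite</groupId>
                <artifactId>ignite-spark</artifactId>
                <version>${project.version}</version>
            </dependency>
        </dependencies>
    </profile>

    <profile>
        <id>scala-2.10</id>
        <dependencies>
            <!-- Spark 2.10 dependencies go here as well. -->
            <dependency>
                <groupId>org.apache.ignite</groupId>
                <artifactId>ignite-spark-2.10</artifactId>
                <version>${project.version}</version>
            </dependency>
        </dependencies>
    </profile>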

val igniteContext = new IgniteContext(sparkContext, () => new IgniteConfiguration(),false)

/** Creates an Ignite RDD of Type (Int,Int) Integer Pair */
val sharedRDD: IgniteRDD[Int, Int] = igniteContext.fromCache("partitioned")

@dmagda (Contributor) commented Jan 10, 2017:

Let's rename "partitioned" to "sharedRdd".
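After the rename, the lookup would read:

    val sharedRDD: IgniteRDD[Int, Int] = igniteContext.fromCache("sharedRdd")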

* shares it with multiple spark workers. The goal of this particular
* example is to provide the simplest code example of this logic.
* <p/>
* Remote nodes should always be started with special configuration file which

@dmagda (Contributor) commented Jan 10, 2017:

I don't think that we need to mention remote nodes here, since the example starts embedded Ignite nodes for every Spark worker. So I would remove the remote-nodes-related paragraphs from the doc and elaborate more on the Ignite "standalone" flag, which is set to "false" for the purpose of this example.
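For reference, the third IgniteContext argument in the example code is that "standalone" flag:

    // standalone = false: an embedded Ignite node is started inside every Spark
    // worker process instead of connecting to a standalone Ignite cluster.
    val igniteContext = new IgniteContext(sparkContext, () => new IgniteConfiguration(), false)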

* start node with `examples/config/example-ignite.xml` configuration.
*/
object ScalarSharedRDDExample extends App {

@dmagda (Contributor) commented Jan 10, 2017:

No space here.

/** Spark context */
implicit val sparkContext = new SparkContext(conf)


@dmagda (Contributor) commented Jan 10, 2017:

Remove one extra line.



/** Creates Ignite context with default configuration */
val igniteContext = new IgniteContext(sparkContext, () => new IgniteConfiguration(),false)

@dmagda (Contributor) commented Jan 10, 2017:

Let's define a configuration for the cache (the shared RDD) explicitly, showing that this is feasible. We can modify some cache parameters, like the cache name or atomicity mode, and set up indexes for SQL queries.

As for the indexes, take a look here
https://apacheignite.readme.io/docs/indexes#queryentity-based-configuration

In your scenario both "keyType" and "valueType" will refer to Integer.
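A sketch of what that could look like in the example's Scala code, assuming the cache is renamed to "sharedRdd" as suggested earlier (the exact parameter choices are illustrative):

    import java.lang.{Integer => JInt}
    import org.apache.ignite.cache.{CacheAtomicityMode, QueryEntity}
    import org.apache.ignite.configuration.CacheConfiguration

    // QueryEntity-based index configuration; keyType and valueType both refer to Integer.
    val entity = new QueryEntity(classOf[JInt].getName, classOf[JInt].getName)

    // Explicit cache configuration: custom name, atomicity mode and the query entity.
    val cacheCfg = new CacheConfiguration[Int, Int]("sharedRdd")
    cacheCfg.setAtomicityMode(CacheAtomicityMode.ATOMIC)
    cacheCfg.setQueryEntities(java.util.Collections.singletonList(entity))

    // fromCache also accepts a CacheConfiguration instead of a plain cache name.
    val sharedRDD: IgniteRDD[Int, Int] = igniteContext.fromCache(cacheCfg)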


transformedValues.take(5).foreach(println)

igniteContext.close(true)

@dmagda (Contributor) commented Jan 10, 2017:

I would also demonstrate (see the sketch after this list):

  • some Spark native transformations
  • Ignite SQL queries
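A sketch of both, assuming the Integer-typed shared RDD and the QueryEntity configuration from the earlier comment (the SQL table name "Integer" follows from the value type):

    // Spark native transformations: plain RDD operations over the IgniteRDD.
    val transformedValues = sharedRDD.filter(_._2 > 10).map { case (k, v) => (k, v * 2) }
    transformedValues.take(5).foreach(println)

    // Ignite SQL query executed through the IgniteRDD API; returns a DataFrame.
    val result = sharedRDD.sql("SELECT _key, _val FROM Integer WHERE _val < ?", 100)
    result.show(5)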
This file demonstrates how to configure cache using Spring. Provided cache
will be created on node startup.
Use this configuration file when running HTTP REST examples (see 'examples/rest' folder).

@dmagda (Contributor) commented Jan 18, 2017:

This line looks wrong, since you're mentioning the HTTP REST examples.

Use this configuration file when running HTTP REST examples (see 'examples/rest' folder).
When starting a standalone node, you need to execute the following command:
{IGNITE_HOME}/bin/ignite.{bat|sh} examples/config/example-cache.xml

@dmagda (Contributor) commented Jan 18, 2017:

You're passing the wrong configuration name here. It needs to be example-shared-rdd.xml.

{IGNITE_HOME}/bin/ignite.{bat|sh} examples/config/example-cache.xml
When starting Ignite from Java IDE, pass path to this file to Ignition:
Ignition.start("examples/config/example-cache.xml");

@dmagda (Contributor) commented Jan 18, 2017:

The same as above.
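I.e., the corrected call would be:

    Ignition.start("examples/config/example-shared-rdd.xml");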

</dependency>

<dependency>
<groupId>org.jboss.netty</groupId>

@dmagda (Contributor) commented Jan 18, 2017:

Why do we need this?

@manishatGit (Author, Contributor) commented Jan 19, 2017:

Apache Spark uses Netty for remote communication. Without this dependency, Spark was failing with the problem shown in the attached stack trace; the latest version of JBoss Netty resolved it.
netty_stack_trace.txt

@dmagda (Contributor) commented Jan 19, 2017:

Understood, thanks for the explanation.

@@ -138,6 +138,18 @@
</exclusion>
</exclusions>
</dependency>

<dependency>

@dmagda (Contributor) commented Jan 18, 2017:

Please add the following dependency

    <dependency>
        <groupId>org.apache.ignite</groupId>
        <artifactId>ignite-spark-2.10</artifactId>
        <version>${project.version}</version>
    </dependency>

to the scala-2.10 profile below.

* example is to provide the simplest code example of this logic.
* <p/>
* This example will start Ignite in embedded mode and will start an
* IgniteContext on each Spark worker node. It can also be also

@dmagda (Contributor) commented Jan 18, 2017:

Redundant "also" word in this line.
