Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZEPPELIN-1604] Add Neo4j interpreter and Network visualization #1582

Closed
wants to merge 115 commits into from

Conversation

@conker84
Copy link
Contributor

conker84 commented Nov 2, 2016

What is this PR for?

This contribution would to introduce Neo4j Cypher intepreter and the new network visualization;
at the same time would provide base APIs that enable other graph databases (or graph framworks such as GraphX or Giraph).

What type of PR is it?

[Feature]

Todos

  • - Create the network visualization
  • - Provide base APIs to manage graph results (under: org.apache.zeppelin.interpreter.graph.*)
  • - Create the Neo4j intepreter

What is the Jira issue?

[ZEPPELIN-1604]

How should this be tested?

Donwload and execute Neo4j v3.x, you can also pull a Docker image.

In order to execute test cases, if you are running Java 7, you need to also provide an environment variable telling the tests where to find Java 8, because Neo4j-the-database needs it to run.

export NEO4J_JAVA=<path/to/java/home>

Use this statement to create a dummy dataset

%neo4j
UNWIND range(1,100) as id
CREATE (p:Person {id:id, name: "Name " + id, age: id % 3}) WITH collect(p) as people
UNWIND people as p1
UNWIND range(1,10) as friend
WITH p1, people[(p1.id + friend) % size(people)] as p2
CREATE (p1)-[:KNOWS {years: abs(p2.id - p2.id)}]->(p2)

Then you can write some simple queries like:

%neo4j
MATCH (p:Person)-[r:KNOWS]-(p1:Person)
RETURN p, r, p1
LIMIT 10;
%neo4j
MATCH (p:Person)-[r:KNOWS]-(p1:Person)
RETURN p.id AS ID_A, p.name AS NAME_A, r.years AS YEARS, p1.id AS ID_B, p1.name AS NAME_B
LIMIT 20;

Screenshots

Simple cypher query network visualization

Simple cypher query tabular visualization

Video

localhost_9000___-google-chrome-01_11_2016-15_44_24

Questions:

  • Does the licenses files need update? Yes
  • Is there breaking changes for older versions? No
  • Does this needs documentation? Yes. I have create the neo4j.md file in /docs/intepreter folder
conker84 added 10 commits Oct 26, 2016
Move the GraphResult to zeppelin-intepreter package in order to use as
base api for other graph databases.
Added neo4.md to the documentation.
Prevented the nodes overlap.

<dependency>
<groupId>org.neo4j.driver</groupId>
<artifactId>neo4j-java-driver</artifactId>

This comment has been minimized.

Copy link
@felixcheung

felixcheung Nov 3, 2016

Member

what are the license for these? could you add to the LICENSE file?

This comment has been minimized.

Copy link
@conker84

conker84 Nov 3, 2016

Author Contributor

@felixcheung this dependency has Apache 2 license.
There is another one

<dependency>
  <groupId>org.neo4j.test</groupId>
  <artifactId>neo4j-harness</artifactId>
  <version>${neo4j.version}</version>
  <scope>test</scope>
</dependency>

which is listed both AGPL 3.0 and GPL 3.0, which of two have i to include in LICENSE file?

Another question, also must i include the license for sigma.js right?
Thanks for the support.

This comment has been minimized.

Copy link
@Leemoonsoo

Leemoonsoo Nov 3, 2016

Member

While org.neo4j.test:neo4j-harness is 'test' scope and will not bundled into the binary package, we don't need include it in zeppelin-distribution/src/bin_license/LICENSE file.

Right, sigma.js is going to be bundled in binary package and we need to include it in zeppelin-distribution/src/bin_license/LICENSE

This comment has been minimized.

Copy link
@conker84

conker84 Nov 3, 2016

Author Contributor

Done!

@felixcheung
Copy link
Member

felixcheung commented Nov 3, 2016

Thank you for the great contribution. Could you check https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/development/writingzeppelininterpreter.html#contributing-a-new-interpreter-to-zeppelin-releases

Specifically,

  • add a link to your documentation in the navigation menu (docs/_includes/themes/zeppelin/_navigation.html).
  • licenses of the transitive closure of all dependencies are list in license
@@ -33,7 +33,8 @@
"handsontable": "~0.24.2",
"moment-duration-format": "^1.3.0",
"select2": "^4.0.3",
"github-markdown-css": "^2.4.0"
"github-markdown-css": "^2.4.0",
"sigma.js": "https://github.com/jacomyal/sigma.js.git#master"

This comment has been minimized.

Copy link
@Leemoonsoo

Leemoonsoo Nov 3, 2016

Member

Can we use a specific tag instead of master?

This comment has been minimized.

Copy link
@conker84

conker84 Nov 3, 2016

Author Contributor

Done!

@@ -46,7 +46,8 @@
TABLE,
IMG,
SVG,
NULL
NULL,
NETWORK

This comment has been minimized.

Copy link
@Leemoonsoo

Leemoonsoo Nov 4, 2016

Member

Adding a new interpreter result type means adding a new DisplaySystem

And every display system should accessible from interpreter's output. for example, Table DisplaySystem,

%spark
print(s"""%table
key\tvalue
sun\t100
moon\t20""")
%sh echo -e "%table key\tvalue\nsun\t100\nmoon\t20"

Any output of interpreter is serialized as a string into InterpreterResult.msg and if %table is detected, it is deserialized in a front-end side and rendered

Can we do the same with NETWORK?

Or another approach could be remove 'NETWORK' result type, describe node/edge information some how in TABLE format. And let loadNetworkData() parse and construct network.

What do you think?

This comment has been minimized.

Copy link
@conker84

conker84 Nov 4, 2016

Author Contributor

Hi,
i updated the code base in order to manage simple graphs like this:

%spark
print(s"""
%network {
    "nodes" : [
        {"id" : 1},
        {"id" : 2}
    ],
    "edges" : [{"source" : 2, "target" : 1, "id" : 1}]
}
""")

neo4-simple-graph

I prefer to manage the graphs via json format because the goal of this contribution is to manage Property Graphs, not just simple graphs. Is this a viable way?
Thanks for the feedback!

This comment has been minimized.

Copy link
@jexp
class="btn-group"
style="margin-bottom: 10px;">

<button type="button" class="btn btn-default btn-sm"
ng-if="paragraph.result.type == 'TABLE'"
ng-if="paragraph.result.type == 'TABLE' || paragraph.result.type == 'NETWORK'"

This comment has been minimized.

Copy link
@Leemoonsoo

Leemoonsoo Nov 4, 2016

Member

I think all other types of visualizations - table, bar, line, area, scatter shouldn't be selectable when result type is 'NETWORK' while they can not display it.

This comment has been minimized.

Copy link
@conker84

conker84 Nov 4, 2016

Author Contributor

The goal of this PR is to full advantage the property graph by the zeppelin platfom.

%spark
print(s"""
%network {
    "nodes" : [
        {"id" : 1, "label" : "User", "data" : { "fullname" : "Andrea Santurbano" }},
        {"id" : 2, "label" : "User", "data" : { "fullname" : "Moon soo Lee" }}
    ],
    "edges" : [{"source" : 2, "target" : 1, "id" : 1, "label" : "HELPS", "data" : { "project" : "Zeppelin", "githubUrl": "https://github.com//pull/1582" } }]
}
""")

neo4-property-graph
This type of graph can be displayed in other ways like tabular, donut, etc...
neo4j-donut-chart
So i think the other visualizations must stay selectable.
Thanks for the feedback!

This comment has been minimized.

Copy link
@felixcheung

felixcheung Nov 4, 2016

Member

Cool! could you add examples/info on this to DisplaySystem?

This comment has been minimized.

Copy link
@conker84

conker84 Nov 8, 2016

Author Contributor

Done!

conker84 added 2 commits Nov 4, 2016
Updated the documentation.
Updated documentation
Fixed perfomance issues
@Leemoonsoo
Copy link
Member

Leemoonsoo commented Nov 9, 2016

Front-end visualization code has been refactored from #1529. I think that's the biggest conflicts that this branch have right now.

Could you merge master branch to this branch and resolve conflict?
Basically you'll need to move network visualization into zeppelin-web/src/app/visualization/builtins like other built-in visualizations.
Let me know if you need any help on it. I'd like to help.

@conker84
Copy link
Contributor Author

conker84 commented Nov 11, 2016

@Leemoonsoo i'll study the code this we. I'll let you know if i need help! Thanks!!!

@felixcheung
Copy link
Member

felixcheung commented Feb 27, 2017

Let's go with (c) & (b) and then (a)?

@Leemoonsoo
Copy link
Member

Leemoonsoo commented Feb 28, 2017

Sorry for late response.
Sounds like a plan!

@conker84
Copy link
Contributor Author

conker84 commented Feb 28, 2017

Perfect just to clarify what i have to do:

  1. Close this PR
  2. Open a new PR (and the related jira issue) for (c) & (b) (i have to made a new fork?)

I'm right?

@Leemoonsoo
Copy link
Member

Leemoonsoo commented Feb 28, 2017

Right!

@jexp
Copy link

jexp commented Mar 4, 2017

@conker84 wow what a long story. Do you need any help from us (Neo4j) or @felixcheung @Leemoonsoo is there anything we can do?

@conker84
Copy link
Contributor Author

conker84 commented Mar 5, 2017

@jexp any help is appreciated!

@conker84 conker84 closed this Mar 5, 2017
@felixcheung
Copy link
Member

felixcheung commented Mar 5, 2017

@jexp it would be great if you could review the neo4j parts in this PR.
@conker84 would you be opening a new PR? please link to this one (type #1582) so I could help review and get this in soon

@conker84
Copy link
Contributor Author

conker84 commented Mar 7, 2017

@felixcheung @Leemoonsoo i'm ready to make a new PR but my travis fails in one job
Can you help me to understand why?

@conker84
Copy link
Contributor Author

conker84 commented Mar 9, 2017

Hi guys any news?

@1ambda
Copy link
Member

1ambda commented Mar 9, 2017

Zeppelin start �[60G[�[1;32m  OK  �[0;39m]
< HTTP/1.1 200 OK
�[31mException encountered when invoking run on a nested suite - The code passed to eventually never returned normally. Attempted 832 times over 3.002551618033333 minutes. Last failure message: AbstractFunctionalSuite.this.find("welcome")(AbstractFunctionalSuite.this.webDriver).isDefined was false. *** ABORTED ***�[0m
�[31m  Exception encountered when invoking run on a nested suite - The code passed to eventually never returned normally. Attempted 832 times over 3.002551618033333 minutes. Last failure message: AbstractFunctionalSuite.this.find("welcome")(AbstractFunctionalSuite.this.webDriver).isDefined was false. (AbstractFunctionalSuite.scala:44)�[0m
�[36mRun completed in 3 minutes, 18 seconds.�[0m
�[36mTotal number of tests run: 0�[0m
�[36mSuites: completed 1, aborted 1�[0m
�[36mTests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0�[0m
�[31m*** 1 SUITE ABORTED ***�[0m
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test (test) on project zeppelin-server: There are test failures -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :zeppelin-server

I think this profile failed due to one of flaky tests. So, just restarting the profile would work.

@conker84
Copy link
Contributor Author

conker84 commented Mar 10, 2017

I can't understand if it is related to something made by me becouse i haven't touched the zeppelin-server

@1ambda
Copy link
Member

1ambda commented Mar 11, 2017

Then, clicking restart button in the failed job would work. It'a flaky test.

<name>Zeppelin: Neo4j interpreter</name>

<properties>
<neo4j.driver.version>1.0.4</neo4j.driver.version>

This comment has been minimized.

Copy link
@jexp

jexp Mar 12, 2017

perhaps update to 1.2.0


<properties>
<neo4j.driver.version>1.0.4</neo4j.driver.version>
<test.neo4j.kernel.version>3.0.4</test.neo4j.kernel.version>

This comment has been minimized.

Copy link
@jexp

jexp Mar 12, 2017

update to 3.1.2

}
List<String> line = new ArrayList<>();
for (String col : cols) {
Object value = record.get(col);

This comment has been minimized.

Copy link
@jexp

jexp Mar 12, 2017

Use Value instead.


private void setResultValue(Object value, Set<Node> nodes, Set<Relationship> relationships,
List<String> line) {
if (value instanceof NodeValue

This comment has been minimized.

This comment has been minimized.

Copy link
@jexp

jexp Mar 12, 2017

Don't rely on internal types or instanceof checks.

(PathValue) value : (PathValue) ((InternalPath) value).asValue();
nodes.addAll(Iterables.asList(pathVal.asPath().nodes()));
relationships.addAll(Iterables.asList(pathVal.asPath().relationships()));
} else if (value instanceof ListValue) {

This comment has been minimized.

Copy link
@jexp

jexp Mar 12, 2017

Also handle Map and map values.

setResultValue(val, nodes, relationships, line);
}
} else {
line.add(String.valueOf(value));

This comment has been minimized.

Copy link
@jexp

jexp Mar 12, 2017

value.asString()

}
}

private StatementResult execute(Session session, String cypherQuery,

This comment has been minimized.

Copy link
@jexp

jexp Mar 12, 2017

From Neo4j 3.1 also $param is valid parameter syntax.


@Override
public void cancel(InterpreterContext context) {
}

This comment has been minimized.

Copy link
@jexp

jexp Mar 12, 2017

you could abort a running transaction, e.g. with session.reset()

Map<String, String> graphLabels) {
Set<String> labels = new LinkedHashSet<>();
String firstLabel = null;
for (String label : n.labels()) {

This comment has been minimized.

Copy link
@jexp

jexp Mar 12, 2017

Should they probably be sorted to assure consistency ?

public static final String NEO4J_SERVER_URL = "neo4j.url";
public static final String NEO4J_SERVER_USER = "neo4j.user";
public static final String NEO4J_SERVER_PASSWORD = "neo4j.password";
public static final String NEO4J_MAX_CONCURRENCY = "neo4j.max.concurrency";

This comment has been minimized.

Copy link
@jexp

jexp Mar 12, 2017

should this also be used for neo4j-driver session pool size?

@jexp
Copy link

jexp commented Mar 12, 2017

@conker84 @felixcheung I looked at the java-driver Neo4j bits and made some comments.

In general it looks great and I really like the ideas.

If you need anything else from me to make this PR happen, please let me know.

@conker84
Copy link
Contributor Author

conker84 commented Mar 12, 2017

@jexp thanks for the hints i will integrate them and when i'll be ready i'll ask you another review!

@conker84 conker84 mentioned this pull request Mar 12, 2017
2 of 2 tasks complete
@conker84
Copy link
Contributor Author

conker84 commented Mar 12, 2017

asfgit pushed a commit that referenced this pull request Jun 9, 2017
### What is this PR for?
This issue is about a new network visualization that can leverage the Property Graph Model (https://github.com/tinkerpop/gremlin/wiki/Defining-a-Property-Graph), but also simple graphs in order to provide a set of base apis that can be used by graph dbs (like Neo4j) or graph processing frameworks (like GraphX or Giraph).

### What type of PR is it?
[Feature]
Is related to the #1582

### Todos
* [x] - Added the intepreter apis to manage graphs (under the pakage **org.apache.zeppelin.interpreter.graph**)
* [x] - Added the frontend apis to manage graphs (via d3js)

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-2222

### How should this be tested?
You can download [this notebook](https://gist.github.com/conker84/9574127c2389d08164423894aa93b67f) to test the PR

### Screenshots (if appropriate)
![zeppelin-pr-screen](https://cloud.githubusercontent.com/assets/1833335/23830683/b883e916-0710-11e7-980d-c8ab6bf6d26b.PNG)

### Video
![zeppelin](https://cloud.githubusercontent.com/assets/1833335/23902630/1e121a0c-08c2-11e7-9f28-134866dba077.gif)

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? Yes

Author: conker84 <santand@gmail.com>

Closes #2125 from conker84/master and squashes the following commits:

b6062a0 [conker84] Removed package org.apache.zeppelin.interpreter.graph
e98ca7a [conker84] Comments of review 14/03/2017
b31b7b7 [conker84] Rebase of 07/04/2017
3257bea [conker84] Rebase 30/4
6e74eb9 [conker84] Rebase 30/04
@conker84
Copy link
Contributor Author

conker84 commented Jul 10, 2017

Hi guys,
I added the #2478 for the last part of the original PR.

@conker84 conker84 mentioned this pull request Aug 9, 2017
2 of 2 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.