Add Java tutorial #2320

Merged
merged 3 commits into from
Dec 27, 2022

@zhanglei1949 (Collaborator) commented Dec 8, 2022

What do these changes do?

Add Java tutorials for GRAPE-jdk.

Related issue number

Fix #2292

@zhanglei1949 marked this pull request as draft December 8, 2022 08:30
@zhanglei1949 marked this pull request as ready for review December 12, 2022 07:53
@zhanglei1949 changed the title from "[WIP] Add Java tutorial" to "Add Java tutorial" Dec 12, 2022
Follow the tutorials below to try out GraphScope Java.

- [Querying graph via GraphScope JavaSDK](https://graphscope.io/docs/latest/java-tutorial-0-pie.html)
Collaborator:

  • Please use relative URLs; otherwise these files cannot be found in the preview or in other deployments (e.g., an intranet deployment in the future).
  • Please use underscores (_) rather than hyphens (-) consistently in filenames.

@yecol (Collaborator) commented Dec 12, 2022

You may want to check the preview on https://alibaba-graphscope-build-pr-2320.surge.sh/


## Try some example Giraph apps

We provide some example Giraph algorithms, e.g., SSSP and PageRank, in [grape-demo.jar](https://graphscope.oss-cn-beijing.aliyuncs.com/jar/grape-demo-0.18.0-shaded.jar).
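For readers new to the vertex-centric model that Giraph programs follow, here is a minimal pure-Python sketch of how an SSSP computation proceeds superstep by superstep. It is not the code shipped in grape-demo.jar and uses neither the Giraph nor the GraphScope API; the function name `pregel_sssp` and the `(src, dst, weight)` edge-tuple format are assumptions made for illustration.

```python
import math
from collections import defaultdict


def pregel_sssp(edges, source, max_supersteps=100):
    """Vertex-centric SSSP sketch (hypothetical helper, not the grape-demo code).

    edges: iterable of (src, dst, weight) tuples of a directed graph.
    Returns {vertex: distance from source}, with math.inf for unreachable vertices.
    """
    out_edges = defaultdict(list)
    vertices = {source}
    for src, dst, w in edges:
        out_edges[src].append((dst, w))
        vertices.update((src, dst))

    dist = {v: math.inf for v in vertices}
    # Superstep 0: seed the source vertex with a message carrying distance 0.
    inbox = {source: [0]}
    for _ in range(max_supersteps):
        if not inbox:  # no messages in flight: every vertex stays halted
            break
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            candidate = min(messages)  # min-combine the incoming distances
            if candidate < dist[v]:
                dist[v] = candidate
                # Only a vertex whose distance improved notifies its neighbors.
                for dst, w in out_edges[v]:
                    outbox[dst].append(candidate + w)
        inbox = outbox
    return dist
```

Each loop iteration plays the role of one superstep: a vertex only does work when it receives messages and only sends messages when its distance improves, which is why the computation stops once no messages remain.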
Collaborator:

Do we provide this in the central repo?

@zhanglei1949 (Collaborator Author), Dec 13, 2022:

Not yet. grape-demo.jar is not published to the Maven Central repository, since users are not expected to use it in their own projects.

Collaborator:

OK

</dependency>
```

To address dependency issues when packaging the jar, you should package all your
Collaborator:

You may want to give more details about packaging, maybe step by step?

Collaborator Author:

OK.
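The thread above asks for step-by-step packaging guidance. Independently of the build tool, one quick way to confirm that a packaged (shaded) jar really bundles your dependencies is to look inside it, since a jar is a zip archive. The helper `missing_classes` below is hypothetical, and the class names used with it are placeholders.

```python
import zipfile


def missing_classes(jar_path, required_classes):
    """Return the fully qualified class names in `required_classes`
    that are NOT packaged in the jar at `jar_path` (hypothetical helper).

    A jar is a zip archive, so a class such as com.example.Foo is stored
    as the entry "com/example/Foo.class".
    """
    with zipfile.ZipFile(jar_path) as jar:
        entries = set(jar.namelist())
    return [
        name
        for name in required_classes
        if name.replace(".", "/") + ".class" not in entries
    ]
```

For example, `missing_classes("target/my-app-shaded.jar", ["com.example.MyComputation"])` returns an empty list when the placeholder class is present in the jar.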


After the computation, you can retrieve the results stored in the context via [`Context`](https://graphscope.io/docs/reference/context.html#context).

## GraphScope JavaSDK with GitHub Template
Collaborator:

In addition to this file, is it possible to provide a .ipynb notebook for this doc?

Collaborator Author:

OK.


You can implement your algorithm against Giraph's original API. Although almost all APIs are supported, Giraph-on-GraphScope has some limitations:

- The graph modification API is currently not supported.
Collaborator:

Can you provide an example? Take a Java file from the examples in the Giraph repo, build it, and run it on GS.

Collaborator Author:

You mean starting from the examples in the Giraph repo and showing users how to build and run them on GS?

Collaborator:

Exactly!

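import graphscope

# Launch a session on the local machine (hosts mode); pass cluster_type='k8s'
# instead to launch the session in a Kubernetes cluster.
sess = graphscope.session(cluster_type='hosts')

# "path/to/grape-demo.jar" is a local path; the coordinator distributes the jar to the cluster.
sess.add_lib("path/to/grape-demo.jar")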
Collaborator:

I just wonder: is this a local path or a path on the cluster?

Collaborator Author:

It is a local path. The coordinator distributes the jar to the cluster.

Collaborator:

Sure, maybe state that explicitly in the doc.


## Run example GraphX apps

Several GraphX algorithms are also contained in [grape-demo.jar](https://graphscope.oss-cn-beijing.aliyuncs.com/jar/grape-demo-0.18.0-shaded.jar).
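When trying the GraphX examples on a small dataset, a plain single-machine BFS can serve as a reference for checking hop distances. This is a minimal sketch of level-order BFS, not the GraphX `BFSTest` implementation in grape-demo.jar; the function name `bfs_levels` and the `(src, dst)` edge-pair format are assumptions for illustration.

```python
from collections import defaultdict, deque


def bfs_levels(edges, source):
    """Return {vertex: hop count from source} for every vertex reachable
    from `source` (hypothetical reference helper).

    edges: iterable of (src, dst) pairs of a directed graph.
    """
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)

    levels = {source: 0}
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        for nbr in adj[v]:
            # The first visit in FIFO order yields the fewest hops.
            if nbr not in levels:
                levels[nbr] = levels[v] + 1
                frontier.append(nbr)
    return levels
```

Vertices that cannot be reached from the source are simply absent from the result, which keeps the comparison with a distributed run focused on the reachable part of the graph.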
Collaborator:

I just wonder whether this is the Java way. Maybe a centrally managed jar that users import into their pom would be better?

@zhanglei1949 (Collaborator Author), Dec 13, 2022:

From my point of view, the jar we provide contains example algorithms; it is not an SDK or a library to depend on.
If users included grape-demo.jar as a dependency, they would still need to create a new project and compile it to get a packaged jar.

Collaborator:

I am convinced.


First, you also need to prepare the dataset.
```bash
git clone -b master --single-branch --depth=1 https://github.com/7br/gstest.git ${GS_TEST_DIR}
```
Collaborator:

Since we already have a "demo" jar, I suggest we include the small test data in the demo to simplify the whole process.

Collaborator Author:

Agreed. But users still need to be aware of downloading data. What about letting users download just the one file we need? For example:

wget https://raw.githubusercontent.com/GraphScope/gstest/master/p2p-31.e

Collaborator:

I think it would be better.
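Once `p2p-31.e` has been downloaded (for example with the `wget` command in the comment above), a small parser makes it easy to inspect the data or feed it into a local reference computation. The line format assumed here, whitespace-separated `src dst [weight]` with `#` comment lines, is an assumption about the file rather than a documented specification, and `parse_edge_file` is a hypothetical helper.

```python
def parse_edge_file(lines):
    """Parse an edge list into (src, dst, weight) tuples (hypothetical helper).

    Assumed format: one edge per line, whitespace-separated `src dst [weight]`
    with integer vertex ids; blank lines and lines starting with '#' are
    skipped, and a missing weight defaults to 1.0.
    """
    edges = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        src, dst = int(fields[0]), int(fields[1])
        weight = float(fields[2]) if len(fields) > 2 else 1.0
        edges.append((src, dst, weight))
    return edges
```

Because the function accepts any iterable of strings, it works on an open file handle, e.g. `with open("p2p-31.e") as f: edges = parse_edge_file(f)`, as well as on an in-memory list for quick tests.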


### Prepare Environment

The GraphX-GraphScope integration depends on several environment variables. We provide a shell script; users should `source` it before submitting a GraphX job.
Collaborator:

Please organize this code instead of relying on users to copy and paste it. For example, use graphscope/cli to generate a .graphscope_4spark.env file, and then ask users to source that single file.

@yecol (Collaborator), Dec 13, 2022

Collaborator Author:

No prob.
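The review above suggests generating a single `.graphscope_4spark.env` file instead of asking users to copy and paste exports. A minimal sketch of rendering such a file from Python follows. The variable names and the `EXTRA_CP` layout come from this tutorial, while the function `render_spark_env` and the example paths are hypothetical; this does not describe how the GraphScope CLI actually produces the file.

```python
import shlex


def render_spark_env(graphscope_home, grape_runtime_jar, graphx_sdk_jar):
    """Render the text of a hypothetical `.graphscope_4spark.env` file.

    EXTRA_CP mirrors the tutorial's
    ${GRAPHSCOPE_HOME}/lib/:${GRAPE_RUNTIME_JAR}:${GRAPHX_SDK_JAR}.
    """
    extra_cp = ":".join([f"{graphscope_home}/lib/", grape_runtime_jar, graphx_sdk_jar])
    variables = {
        "GRAPHSCOPE_HOME": graphscope_home,
        "GRAPE_RUNTIME_JAR": grape_runtime_jar,
        "GRAPHX_SDK_JAR": graphx_sdk_jar,
        "EXTRA_CP": extra_cp,
    }
    # shlex.quote protects values containing spaces or shell metacharacters.
    return "".join(f"export {name}={shlex.quote(value)}\n" for name, value in variables.items())
```

Writing the returned string to `.graphscope_4spark.env` would let users run `source .graphscope_4spark.env` once before submitting a GraphX job.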

### Submit to Spark

```bash
export EXTRA_CP=${GRAPHSCOPE_HOME}/lib/:${GRAPE_RUNTIME_JAR}:${GRAPHX_SDK_JAR}
```

Collaborator:

What is EXTRA_CP?

Collaborator Author:

We will export this environment variable from the CLI script.

```bash
--class com.alibaba.graphscope.example.graphx.BFSTest ${GRAPE_DEMO_JAR} ${GS_TEST_DIR}/p2p-31.e 2 1
```

Remember to replace placeholders such as `${master_url}` and `${num_executors}` with
Collaborator:

Too complicated for users.

Collaborator Author:

Made it simple
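To make the submission less error-prone, the placeholders can be filled in programmatically. The sketch below assembles the `spark-submit` arguments as a list. `--master`, `--num-executors`, `--conf`, and `--class` are standard `spark-submit` options, but forwarding `EXTRA_CP` through `spark.driver.extraClassPath` and `spark.executor.extraClassPath` is an assumption of this sketch, and `build_spark_submit` is a hypothetical helper. The trailing application arguments are passed through verbatim, without assuming their meaning.

```python
def build_spark_submit(master_url, num_executors, extra_cp, main_class, app_jar, app_args):
    """Assemble a spark-submit argument list with the placeholders filled in.

    Hypothetical helper: passing EXTRA_CP via spark.driver.extraClassPath and
    spark.executor.extraClassPath is an assumption, not the tutorial's method.
    """
    return [
        "./bin/spark-submit", "--verbose",
        "--master", master_url,
        "--num-executors", str(num_executors),
        "--conf", f"spark.driver.extraClassPath={extra_cp}",
        "--conf", f"spark.executor.extraClassPath={extra_cp}",
        "--class", main_class,
        app_jar,
        # Application arguments are forwarded unchanged, converted to strings.
        *[str(arg) for arg in app_args],
    ]
```

`shlex.join(cmd)` (Python 3.8+) turns the list into one line that can be pasted into a shell; building a list rather than a string avoids quoting mistakes when paths contain spaces.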


### Prepare Environment

The GraphX-GraphScope integration depends on several environment variables. We provide a shell script; users should `source` it before submitting a GraphX job.
Collaborator:

Please double-check and unify all occurrences of graphx and Graphx to GraphX.

Collaborator Author:

Checked.

```bash
export EXTRA_CP=${GRAPHSCOPE_HOME}/lib/:${GRAPE_RUNTIME_JAR}:${GRAPHX_SDK_JAR}

./bin/spark-submit --verbose \
```

Collaborator:

Is it possible to generate this command so users can copy and paste it?

Collaborator Author:

Maybe I will export this as an environment variable in the CLI.
@yecol (Collaborator) commented Dec 14, 2022

I also noticed a file named grape_jvm_opt. Are users supposed to be aware of this file?

@zhanglei1949 (Collaborator Author) commented

> I also noticed a file named grape_jvm_opt. Are users supposed to be aware of this file?

With this PR, the JVM options specified in grape_jvm_opt are loaded automatically, so users don't need to be aware of this file.

@zhanglei1949 force-pushed the java-tutorial branch 3 times, most recently from 273c9ee to 1a28ae0 on December 19, 2022 04:14
@zhanglei1949 force-pushed the java-tutorial branch 2 times, most recently from 2ad3255 to 8b34a40 on December 20, 2022 03:16
@zhanglei1949 force-pushed the java-tutorial branch 3 times, most recently from dafc148 to 1acb803 on December 27, 2022 03:22
update doc

s

project to simple

f

update doc

graphx naming

update tutorials

mod graphx doc
github-actions bot (Contributor) commented Dec 27, 2022

😭 Deploy PR Preview ee5185e failed. Build logs

🤖 By surge-preview

@zhanglei1949 deleted the java-tutorial branch September 27, 2023 06:02
Labels: None yet
Projects: None yet
2 participants