JVM Package New RoadMap #935

Closed
tqchen opened this Issue Mar 8, 2016 · 32 comments

Member

tqchen commented Mar 8, 2016

Now that #884 is finished, we will be proposing a new roadmap, where more contributors can get involved. Please reply to this issue to add things if you have more thoughts.

  • More robust Maven packaging: use Maven packaging scripts to copy resources and create the native lib; remove the create_jni script
    • Maybe @javelinjs has some experience with this from the mxnet-scala project
  • Integration with the Spark DataFrame and pipeline API
  • Integration with the FlinkML API
  • Support for Google's Dataflow as a runtime
  • Tutorial on using the Spark/Flink API from scratch
  • Instructions on the best performance settings for executors and cores
  • Documentation on the API
  • Developer documentation and a blog post on how it works
  • Cross-application integration (via something like Tachyon)
  • External memory support
  • Upload to Maven

tqchen referenced this issue on Mar 8, 2016: RoadMap #873 (closed, 1 of 4 tasks complete)
Member

yzhliu commented Mar 8, 2016

Awesome. Seems that mxnet can learn from the design and implementation here.

Member

tqchen commented Mar 8, 2016

@javelinjs Currently we solve the problem of embedding rabit allreduce jobs. We need to think a bit about how to handle server jobs, which involves starting containers that do not take data. Maybe @CodingCat will also have some thoughts on this.

Member

yzhliu commented Mar 8, 2016

About the Maven packaging, you could refer to https://github.com/dmlc/mxnet/tree/master/scala-package. I suggest making the native lib an independent module under the jvm project. And here's a great reference for how to set up the JNI compile procedure in Maven: http://www.tricoder.net/blog/?p=197

I put the native lib into the assembly jar and load it using https://github.com/dmlc/mxnet/blob/master/scala-package/core/src/main/scala/ml/dmlc/mxnet/util/NativeLibraryLoader.scala
It was mainly inspired by https://github.com/mikiobraun/jblas/blob/master/src/main/java/org/jblas/util/LibraryLoader.java , though I'm not sure whether there is any license problem.
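
A minimal Java sketch of the load-from-jar approach described above. It is not the actual mxnet or jblas loader: it only assumes that the native library is bundled as a classpath resource, copies it to a temporary file, and passes that file to `System.load`. The class name `JarNativeLoader` and the resource path in the usage note are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not the mxnet or jblas implementation.
final class JarNativeLoader {
    private JarNativeLoader() {}

    // Returns the file extension of the resource (including the dot), or "" if none.
    static String suffixOf(String resourcePath) {
        int dot = resourcePath.lastIndexOf('.');
        int slash = resourcePath.lastIndexOf('/');
        return dot > slash ? resourcePath.substring(dot) : "";
    }

    // Copies the stream into a fresh temporary file and returns its path.
    static Path extractToTemp(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("native-", suffix);
        tmp.toFile().deleteOnExit();
        // The temp file already exists, so it must be replaced.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Finds the resource on the classpath, extracts it, and loads it as a native library.
    static void loadFromJar(String resourcePath) throws IOException {
        try (InputStream in = JarNativeLoader.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new UnsatisfiedLinkError("native library not found in jar: " + resourcePath);
            }
            Path tmp = extractToTemp(in, suffixOf(resourcePath));
            System.load(tmp.toAbsolutePath().toString());
        }
    }
}
```

For example, `JarNativeLoader.loadFromJar("/lib/native/libexample.so")` would load a library bundled at that (hypothetical) path inside the assembly jar. The extraction step is there because `System.load` expects an absolute path on the file system, so a resource inside a jar cannot be loaded directly.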

rotationsymmetry commented Mar 8, 2016

I can work on the integration with the Spark DataFrame and pipeline API.

Member

CodingCat commented Mar 8, 2016

Member

CodingCat commented Mar 15, 2016

OK, with the post here http://dmlc.ml/2016/03/14/xgboost4j-portable-distributed-xgboost-in-spark-flink-and-dataflow.html we have finished the first stage of the roadmap.

I will start working on the DataFrame and pipeline integration tomorrow.

Member

CodingCat commented Mar 15, 2016

Hmm... I glanced at it this afternoon. I'm sending the email about our xgboost4j to the Spark user list tomorrow morning.

futurely commented Mar 16, 2016

"Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends like Apache Spark, Apache Flink, and Google Cloud Dataflow."

Member

CodingCat commented Mar 16, 2016

@futurely thanks for the information. Yes, Beam is of course on our radar as we consider future versions of XGBoost4J.

maropu commented Apr 27, 2016

Any plan to upload xgboost4j to the Maven repository? We are planning to integrate xgboost into our product, hivemall, but its absence from Maven is a barrier to the integration. Thanks in advance!

Member

CodingCat commented Apr 27, 2016

@maropu uploading to Maven would be convenient for users, but it is not in the near-term plan, considering the other things on the table and the limited number of people contributing here...

maropu commented Apr 28, 2016

@CodingCat Ah, I see. Any opportunity to collaborate on the task? We do need this to support xgboost in our product.

Member

CodingCat commented Apr 28, 2016

Sure, you are welcome to contribute; you can just create a PR in this repo.

maropu commented Apr 28, 2016

Okay, I'll look over the related code. Many thanks!

Member

CodingCat commented Apr 28, 2016

I think the major issue is how to include the native .so files targeting different platforms in the same jar and make the Java code pick exactly the right one. One good reference is https://github.com/facebook/rocksdb, which I once wanted to look into but was quickly distracted by other things.

maropu commented Apr 28, 2016

One great example is snappy-java, which is widely used in many distributed systems such as Spark and Hadoop. It checks the platform type at load time and then loads the corresponding shared binary included in the package. See here.
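
The platform check described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not snappy-java's actual code: the `/lib/<os>/<arch>/` directory layout, the class name `NativePlatform`, and the library base name `xgboost4j` are all hypothetical.

```java
import java.util.Locale;

// Hypothetical naming scheme for illustration; snappy-java's real layout may differ.
final class NativePlatform {
    private NativePlatform() {}

    // Maps the JVM's os.name value to a short directory name.
    static String normalizeOs(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("linux")) return "linux";
        if (os.contains("mac") || os.contains("darwin")) return "macos";
        if (os.contains("windows")) return "windows";
        throw new UnsupportedOperationException("unsupported OS: " + osName);
    }

    // Maps the JVM's os.arch value to a canonical architecture name.
    static String normalizeArch(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) return "x86_64";
        if (arch.equals("aarch64") || arch.equals("arm64")) return "aarch64";
        throw new UnsupportedOperationException("unsupported architecture: " + osArch);
    }

    // Builds the platform-specific file name of the shared library.
    static String libraryFileName(String os, String baseName) {
        switch (os) {
            case "linux":   return "lib" + baseName + ".so";
            case "macos":   return "lib" + baseName + ".dylib";
            case "windows": return baseName + ".dll";
            default: throw new IllegalArgumentException("unknown OS key: " + os);
        }
    }

    // Returns the classpath location of the binary matching the given platform.
    static String resourcePath(String osName, String osArch, String baseName) {
        String os = normalizeOs(osName);
        return "/lib/" + os + "/" + normalizeArch(osArch) + "/" + libraryFileName(os, baseName);
    }
}
```

At load time the caller would pass the running JVM's properties, e.g. `NativePlatform.resourcePath(System.getProperty("os.name"), System.getProperty("os.arch"), "xgboost4j")`, and then extract and load the resource found at that path. Failing fast on an unknown platform gives a clearer error than a later `UnsatisfiedLinkError`.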

nicornk commented Jul 11, 2016

@CodingCat Any progress on the task 'dataframe and pipeline integration'?

Member

CodingCat commented Jul 11, 2016

@nicornk, sorry, no. I am busy with other things, and others are welcome to contribute to this.

Contributor

tanwanirahul commented Jul 16, 2016

@CodingCat @tqchen @rotationsymmetry Is work on integration with spark dataframes and pipeline API in progress? If not, I would like to start working on this. If it has already started, can you please share the initial design draft?

Member

CodingCat commented Jul 16, 2016

No, feel free to start

dirceusemighini commented Sep 8, 2016

Hi @tanwanirahul, are you doing the integration with SparkPipeline?
What kind of integration with Spark Pipeline is planned to be done?

Member

CodingCat commented Sep 8, 2016

I will post the initial version within the week

Contributor

tanwanirahul commented Sep 9, 2016

@dirceusemighini @CodingCat I had started working on this before, but couldn't push it to the finish line due to other priorities. Give me until Tuesday and I should be able to push an initial working version.

qqilihq commented Oct 9, 2016

@maropu Did you make any progress concerning the Maven repository? I'm having exactly the same issue: not being able to access XGBoost from some public repository blocks integration into one of my projects.

maropu commented Oct 10, 2016

@qqilihq No progress, though we need to do something before the next release of our product. If you are interested in this issue, please check myui/hivemall#370 and leave some comments there.

vectorijk commented Oct 11, 2016

@futurely regarding Google's Dataflow, I also vote for Apache Beam.

Contributor

alexeygrigorev commented Nov 23, 2016

Is there an issue for publishing xgboost to Maven Central? A quick search didn't find anything. I have some ideas and would like to share them, and eventually maybe help with publishing it.

Member

CodingCat commented Nov 23, 2016

The only issue is how to include native libs for various platforms in the jar and make the program locate them accurately. I haven't had a chance to look at the solutions.

Contributor

alexeygrigorev commented Nov 23, 2016

@CodingCat yes, I understand. I have some ideas on how to do it; e.g. it should be possible to follow the same approach as MTJ (https://github.com/fommil/matrix-toolkits-java). Should I create an issue where we can discuss it?

Member

CodingCat commented Nov 23, 2016

sure, an issue or PR is welcome

kapild commented Jan 5, 2017

Also, we don't have cross-validation support (in either Java or Python) for ranking-related tasks ("objective:rank:pairwise"). Can we add that to the roadmap?
