[DRAFT] [WIP] Pyspark api wrapper #5658
Out of curiosity, what's the difficulty of basing PySpark support on the Python package instead of the JVM packages?
There is no difficulty, except that you need to translate all of the existing xgboost4j-spark code to Python.
Unfortunately, I'm unable to import from sparkxgb after following those steps on 1.0.0.
I'm running this in Jupyter, not sure if that matters. Until this is packaged officially, is there a workaround? I looked at some other threads but none seemed to work. Is there a zip file for 1.0.0 I can download? (I saw one for 0.0.9/0.0.8.)
from pyspark.ml.util import JavaMLWritable
from pyspark.ml.wrapper import JavaModel, JavaEstimator

from sparkxgb.util import XGBoostReadable
It appears that this import was left in unintentionally? I'm assuming that's causing Ben's import issue (as you define the class later on as well).
With this change, is the new import simply from xgboost.spark import XGBoostRegressor rather than from sparkxgb import XGBoostRegressor?
Edit: I just saw Ben was importing from sparkxgb and not from xgboost.spark, so if I understand this correctly, both issues were contributing.
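For anyone hitting the same import confusion, here is a minimal sketch for checking which of the two module paths mentioned above can actually be located in the current environment. The helper name first_importable is hypothetical and not part of this PR; which path exists depends on the build you installed.

```python
import importlib.util


def first_importable(candidates):
    """Return the first module name in `candidates` that can be located, else None."""
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            # Raised when the parent package of a dotted name is missing,
            # or when an already-loaded module has no usable spec.
            continue
    return None


# Module paths discussed in this thread (assumption: at most one is installed).
CANDIDATES = ["xgboost.spark", "sparkxgb"]

found = first_importable(CANDIDATES)
print(found if found else "neither xgboost.spark nor sparkxgb is importable")
```

If this prints neither name inside Jupyter, the kernel is probably not seeing the package at all, which is a separate problem from the stray import noted in the review comment.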
We use a modified version of sparkxgb; it works on Spark 3.0 and xgboost master now. Let me know if it can help. Steps:
Any updates on this issue?
@WeichenXu123 Sorry if this is the wrong place to inquire, but do you know what the plan is for the PySpark API wrapper? Is this PR still the latest effort?
@WeichenXu123 @CodingCat Still interested in this PR (or PySpark xgboost4j support broadly speaking). Any progress on this front?
This is the one we used.
@WeichenXu123
No. We have
Hey @WeichenXu123, I don't think you finished your sentence? "We have"?
Hello, I have seen that you can run xgboost on Spark 3. I use pyspark 3.1.2 and have failed many times to get xgboost working. I downloaded the file sparkxgb-1.24.zip, but where can I find the two jars named xgboost4j_2.12-1.3.0-SNAPSHOT.jar and xgboost4j-spark_2.12-1.3.0-SNAPSHOT.jar?
This is the latest one we used. We only tested it on Spark 3.0. Not sure if it works on Spark 3.1.
One jar uses Gazelle's Arrow native Parquet reader to read the data; the other jar transforms the Arrow data format into an xgboost DMatrix directly. They are all outdated. We no longer update them since no one uses them.
This takes over #4656; it is still WIP.