Empty paths written when persisting SparkXGBClassifier models #9446
Comments
Hi, if you are loading the model you should use […]. Other than this, I can't reproduce the error:
Thank you @trivialfis
I dug around a bit in the sources, and it appears that the trained model has no param 'xgb_model', so when we go to retrieve the init_booster with .getOrDefault it returns None:
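For context, getOrDefault on a Spark Params object falls back to the param's declared default when no value was explicitly set, and for 'xgb_model' that default is None. A minimal stdlib analogy of that lookup order (the names and dicts here are illustrative, not the pyspark implementation):

```python
# Illustrative analogy only: Params.getOrDefault returns the explicitly
# set value if present, otherwise the param's declared default.
defaults = {"xgb_model": None}  # default for the training-continuation param
explicit = {}                   # nothing was set on the trained model

def get_or_default(name):
    """Return the explicitly set value, else the declared default."""
    if name in explicit:
        return explicit[name]
    return defaults[name]

init_booster = get_or_default("xgb_model")
print(init_booster)  # no booster was supplied for training continuation
```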
That's for training continuation; I don't see the issue with this. Could you please share a reproducible script?
Here's a notebook that reproduces the result:
I converted the notebook to a script and replaced the […] with:

```python
import os

for root, subdirs, files in os.walk(xgb_classifier_model_path):
    for f in files:
        print(os.path.join(root, f))
```

Here's the output print:
My environment:
Oooh interesting, I'm using only pySpark 3.2 and 3.3 so far. Let me see about upgrading to 3.4. |
In hindsight, it might be worth noting in the docs on persisting the model that if the Spark catalog is configured for S3, you should use an S3 path instead of a local path to persist and load models.
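Spark resolves the target filesystem from the path's URI scheme, so an unqualified path is resolved against whatever filesystem the catalog is configured to use by default. A quick sketch of how the schemes differ (the paths below are made up for illustration):

```python
from urllib.parse import urlparse

# Hypothetical paths, for illustration only.
paths = [
    "/tmp/xgb_classifier_model",             # no scheme: resolved by Spark's default FS
    "file:///tmp/xgb_classifier_model",      # explicitly the local filesystem
    "s3a://my-bucket/xgb_classifier_model",  # explicitly S3
]
for p in paths:
    scheme = urlparse(p).scheme or "(default fs)"
    print(f"{scheme:12} <- {p}")
```

This is why saving to a bare local path on an S3-backed setup can leave the local directory effectively empty while the data lands elsewhere.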
Thank you for sharing. Would you like to open a PR for the doc?
Following the documentation for model persistence with SparkXGBClassifier, saving the model results in each metadata/model subdirectory under the given path containing only an empty file named _SUCCESS. Attempts to load the model from the given path result in:

ValueError: RDD is empty

Running xgboost 2.0.0.dev on PySpark 3.3 / Python 3.8.
See also: https://stackoverflow.com/questions/75370396/saving-sparkxgboost-model-yields-empty-directory/76860032#76860032
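The reported symptom is a tree where every subdirectory holds only an empty _SUCCESS marker and no part files. A stdlib-only sketch that builds a mock directory with that layout and walks it the same way as the os.walk snippet earlier in the thread (the layout is an assumption based on the description, not actual saver output):

```python
import os
import tempfile

# Build a mock of the reported on-disk layout: each subdirectory holds
# only an empty _SUCCESS marker file and no part files with data.
root = tempfile.mkdtemp()
for sub in ("metadata", "model"):
    d = os.path.join(root, sub)
    os.makedirs(d)
    open(os.path.join(d, "_SUCCESS"), "w").close()  # empty marker file

# Walk the tree as in the reproduction script.
listed = []
for r, _subdirs, files in os.walk(root):
    for f in files:
        listed.append(os.path.join(r, f))
        print(os.path.join(r, f))

# Every file present is an empty _SUCCESS marker: the data went elsewhere.
assert all(f.endswith("_SUCCESS") for f in listed)
```

Loading from such a directory fails because the saved RDD partitions are absent, which matches the "ValueError: RDD is empty" above.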