[pyspark] Document model format. #8186

trivialfis · 2022-08-21T08:01:54Z

https://discuss.xgboost.ai/t/save-pyspark-model-load-it-on-other-language/2894

DavidMaister · 2022-09-01T15:39:29Z

Hey, I managed to solve this, so I'll share how I did it.

When SparkXGBClassifier saves the model, it writes a directory of parquet files.
Read that folder of parquet files in PySpark and write its contents out as JSON.
Then, in plain Python, load that JSON and convert it, because XGBoost cannot load the original format directly.

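import json

# The PySpark dump is a dict with a single field whose value is the XGBoost
# model serialized as a JSON string; unwrap it into a plain JSON model file.
with open('pyspark_raw_json.json', encoding='utf-8') as f:
    pyspark_raw_json = json.load(f)
with open('pyspark_json_converted.json', 'w', encoding='utf-8') as f:
    json.dump(json.loads(list(pyspark_raw_json.values())[0]), f)

Then I am able to load the model in XGBoost:

import xgboost as xgb

loaded_model = xgb.XGBClassifier()
loaded_model.load_model('pyspark_json_converted.json')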
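The parquet-to-JSON step described above is not shown as code in this comment. Below is a minimal sketch of the whole workaround. It assumes, as the conversion code implies, that the parquet folder holds a single row with a single column containing the model as a JSON string. The function names `unwrap_model_json` and `export_spark_model` are hypothetical, and so is the parquet path you pass in. PySpark is imported lazily so the unwrap helper works without it.

```python
import json


def unwrap_model_json(row: dict) -> dict:
    """Return the XGBoost model stored as a JSON string in a one-column row.

    Assumption: the row has exactly one field, holding the serialized model.
    """
    if len(row) != 1:
        raise ValueError(f"expected exactly one column, got {list(row)}")
    (payload,) = row.values()
    return json.loads(payload)


def export_spark_model(parquet_dir: str, out_path: str) -> None:
    """Read the parquet folder saved by the Spark model and write plain XGBoost JSON."""
    # Lazy import: only this function needs PySpark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Assumption: the folder contains a single row holding the model string.
    row = spark.read.parquet(parquet_dir).first()
    if row is None:
        raise ValueError(f"no rows found in {parquet_dir}")
    model = unwrap_model_json(row.asDict())
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(model, f)
```

Usage, with a hypothetical path: `export_spark_model('/path/to/parquet/folder', 'pyspark_json_converted.json')`. After that, the output file can be passed to `XGBClassifier.load_model`. Requiring exactly one column makes the unwrap fail loudly rather than silently picking the wrong field if the saved layout differs.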

wbo4958 · 2022-09-02T06:12:15Z

I think we need to change "save" so that it writes a model XGBoost can load directly. Let me resolve this.

trivialfis added the doc label Aug 21, 2022

wbo4958 mentioned this issue Sep 2, 2022

[pyspark] make the model saved by pyspark compatible #8219

Merged

trivialfis closed this as completed Oct 10, 2022
