
Saving models? #14

Closed
xloffree opened this issue Sep 28, 2022 · 16 comments
@xloffree

Hi,

Is there a way to save models generated in PIML so that I do not need to run the program and train the model each time?
Also, what is the best way to export results? Is there any way to export results such that the widgets are still interactive? I have just been saving the notebook as an html file in order to share results.

Thank you

@ZebinYang
Collaborator

Hi @xloffree,

First of all, you may save a fitted model in PiML using the following approach.

import dill

clf = exp.get_model("GAM").estimator 

with open('name_model.pkl', 'wb') as file:
    dill.dump(clf, file)

with open('name_model.pkl', 'rb') as file:
    clf_load = dill.load(file)

train_x = exp.get_model("GAM").get_data(train=True)[0]
clf_load.predict(train_x)
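For readers without a PiML session handy, the same save/load round-trip can be sketched with a stand-in model (`FittedModel` below is a hypothetical placeholder, not PiML API; `pickle` is used here, and `dill` exposes the identical `dump`/`load` interface):

```python
import pickle

# Stand-in for a fitted estimator: any picklable object with a
# predict method follows the same save/load pattern as above.
class FittedModel:
    def __init__(self, coef):
        self.coef = coef

    def predict(self, rows):
        return [sum(c * x for c, x in zip(self.coef, row)) for row in rows]

clf = FittedModel(coef=[0.5, 2.0])

# Save the fitted model to disk.
with open("name_model.pkl", "wb") as file:
    pickle.dump(clf, file)

# Load it back in a later session -- no retraining needed.
with open("name_model.pkl", "rb") as file:
    clf_load = pickle.load(file)

print(clf_load.predict([[1.0, 1.0], [2.0, 0.0]]))  # [2.5, 1.0]
```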

You may also register the loaded model into PiML using the demo at https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_ExternalModels.ipynb#scrollTo=7WGJ8PzutkLh, "Scenario 2: Register external fitted models with dataset".

Second, all the interactive panels in PiML depend on a live Python runtime. We currently have no functionality to export interactive results; the best option is to save the notebook as static HTML: a) click "Widgets -> Save Notebook Widget State"; b) export it via "File -> Download as -> HTML (.html)".

@xloffree
Author

Hi, thank you for your help with this. Has this solution worked for you with PiML? When I try this solution, this line:
with open('name_model.pkl', 'rb') as file: clf_load = dill.load(file)

results in a recursion error every time. I tried raising the recursion limit, but even at 10,000,000 I still hit the error, and raising it further just crashes the kernel.

The error is as follows:


RecursionError Traceback (most recent call last)
Cell In [39], line 2
1 with open('name_model.pkl', 'rb') as file:
----> 2 clf_load = dill.load(file)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:272, in load(file, ignore, **kwds)
266 def load(file, ignore=None, **kwds):
267 """
268 Unpickle an object from a file.
269
270 See :func:loads for keyword arguments.
271 """
--> 272 return Unpickler(file, ignore=ignore, **kwds).load()

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:419, in Unpickler.load(self)
418 def load(self): #NOTE: if settings change, need to update attributes
--> 419 obj = StockUnpickler.load(self)
420 if type(obj).__module__ == getattr(_main_module, '__name__', '__main__'):
421 if not self._ignore:
422 # point obj class to main

File piml/models/glm.py:32, in piml.models.glm.GLMRegressor.__getattr__()

File piml/models/glm.py:32, in piml.models.glm.GLMRegressor.__getattr__()

[... skipping similar frames: piml.models.glm.GLMRegressor.__getattr__ at line 32 (9999967 times)]

File piml/models/glm.py:32, in piml.models.glm.GLMRegressor.__getattr__()

RecursionError: maximum recursion depth exceeded while calling a Python object

Any help with this would be very appreciated. When using PiML for research purposes, being able to save a trained model is essential for reproducibility. Thank you!

@ZebinYang
Collaborator

Hi @xloffree,

For GLM, you may use the following code to do model saving,

import dill

clf = exp.get_model("GLM").estimator.__model__ 

with open('name_model.pkl', 'wb') as file:
    dill.dump(clf, file)

with open('name_model.pkl', 'rb') as file:
    clf_load = dill.load(file)

train_x = exp.get_model("GLM").get_data(train=True)[0]
clf_load.predict(train_x)

@xloffree
Author

Thank you very much. This works. How can I use this trained model to predict on other datasets? Is this functionality exclusively part of PiML or is there third party documentation I can view for more background on how this code works?

Thank you

@ZebinYang
Collaborator

Thank you very much. This works. How can I use this trained model to predict on other datasets? Is this functionality exclusively part of PiML or is there third party documentation I can view for more background on how this code works?

Thank you

If you have another dataset with the same set of input features, you can use this model to get predictions. Assuming the new data's covariates "X" are on the raw scale (no preprocessing applied), you can get predictions from the fitted model in PiML as follows.

clf = exp.get_model("GLM").estimator 
xx = exp.get_data(x=X)
clf.predict(xx)
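The flow here is: raw covariates -> the experiment's preprocessing -> the estimator. A minimal stand-in sketch of that flow (`preprocess` and `FittedModel` below are hypothetical placeholders mimicking `exp.get_data` and the estimator; they are not PiML API):

```python
def preprocess(X, scale=0.5):
    # Stand-in for exp.get_data(x=X): apply the same transform
    # used at training time (here, a toy rescaling).
    return [[v * scale for v in row] for row in X]

class FittedModel:
    # Stand-in estimator: predicts the row sum of the transformed inputs.
    def predict(self, rows):
        return [sum(row) for row in rows]

X = [[1.0, 2.0], [3.0, 4.0]]   # raw covariates, one row per sample
xx = preprocess(X)             # transformed covariates
preds = FittedModel().predict(xx)
print(preds)  # [1.5, 3.5]
```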

@xloffree
Author

What datatype should X be here? Is it a dataframe that includes all of the data for all of the predictors?

Thanks

@xloffree
Author

Hi,

Is different code required to save each different type of built-in model in PiML? It seems whenever I try to save a different model, I run into a new error. Is there somewhere where I can see the code for how to save each different type of model?
Thank you

@ZebinYang
Collaborator

What datatype should X be here? Is it a dataframe that includes all of the data for all of the predictors?

X is a numpy array of the selected features. It should be in the same format as the uploaded raw data, without any preprocessing.

@ZebinYang
Collaborator

Hi,

Is different code required to save each different type of built-in model in PiML? It seems whenever I try to save a different model, I run into a new error. Is there somewhere where I can see the code for how to save each different type of model? Thank you

For the GLMRegressor model, use

import dill

clf = exp.get_model("GLM").estimator.__model__ 

with open('name_model.pkl', 'wb') as file:
    dill.dump(clf, file)

with open('name_model.pkl', 'rb') as file:
    clf_load = dill.load(file)

train_x = exp.get_model("GLM").get_data(train=True)[0]
clf_load.predict(train_x)

For all the remaining models, you can use

import dill

clf = exp.get_model("GAM").estimator 

with open('name_model.pkl', 'wb') as file:
    dill.dump(clf, file)

with open('name_model.pkl', 'rb') as file:
    clf_load = dill.load(file)

train_x = exp.get_model("GAM").get_data(train=True)[0]
clf_load.predict(train_x)

BTW, we will provide a unified API for model saving in the next release.
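Until that unified API lands, the two branches above can be folded into one small helper. This is my own sketch, not PiML API (`extract_picklable` is a hypothetical name), and it assumes, per the snippets above, that the GLM estimator keeps its picklable core under `__model__` while every other estimator pickles directly:

```python
def extract_picklable(name, estimator):
    """Return the object to hand to dill.dump for a PiML model.

    Assumption (from the snippets above): the "GLM" estimator stores
    its picklable core under ``__model__``; other estimators are
    dumped as-is.
    """
    if name == "GLM":
        return estimator.__model__
    return estimator

# Stand-in estimators to show the dispatch (hypothetical, not PiML):
class GLMEstimator:
    __model__ = "glm-core"

class GAMEstimator:
    pass

gam = GAMEstimator()
print(extract_picklable("GLM", GLMEstimator()))  # glm-core
print(extract_picklable("GAM", gam) is gam)      # True
```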

@xloffree
Author

xloffree commented Jan 6, 2023

clf = exp.get_model("GLM").estimator
xx = exp.get_data(x=X)
clf.predict(xx)

I still do not understand what this means. I have tried to pass df and df.columns as X but it does not work. Do you have an example of what X should be?

Thank you

@xloffree
Author

xloffree commented Jan 6, 2023

Would it be possible for us to discuss PiML over a zoom meeting? That might be more efficient than messages on this page.

@ZebinYang
Collaborator

ZebinYang commented Jan 6, 2023

Hi, here X is just an n*p numpy array, where n is the sample size and p is the number of predictors (excluding unselected features and the response feature).

For instance, assume the raw data is a pd.DataFrame as follows,

season yr mnth hr holiday weekday workingday weathersit temp atemp hum windspeed cnt
1.0 0.0 1.0 0.0 0.0 6.0 0.0 1.0 0.24 0.2879 0.81 0.0000 16.0
1.0 0.0 1.0 1.0 0.0 6.0 0.0 1.0 0.22 0.2727 0.80 0.0000 40.0
1.0 0.0 1.0 2.0 0.0 6.0 0.0 1.0 0.22 0.2727 0.80 0.0000 32.0
1.0 0.0 1.0 3.0 0.0 6.0 0.0 1.0 0.24 0.2879 0.75 0.0000 13.0
1.0 0.0 1.0 4.0 0.0 6.0 0.0 1.0 0.24 0.2879 0.75 0.0000 1.0
... ... ... ... ... ... ... ... ... ... ... ... ...
1.0 1.0 12.0 19.0 0.0 1.0 1.0 2.0 0.26 0.2576 0.60 0.1642 119.0
1.0 1.0 12.0 20.0 0.0 1.0 1.0 2.0 0.26 0.2576 0.60 0.1642 89.0
1.0 1.0 12.0 21.0 0.0 1.0 1.0 1.0 0.26 0.2576 0.60 0.1642 90.0
1.0 1.0 12.0 22.0 0.0 1.0 1.0 1.0 0.26 0.2727 0.56 0.1343 61.0
1.0 1.0 12.0 23.0 0.0 1.0 1.0 1.0 0.26 0.2727 0.65 0.1343 49.0


Suppose you selected season, yr, mnth, and hr as the covariates (in exp.data_summary and exp.feature_select), and cnt as the response (in exp.data_prepare).

Then X should be a np.array that looks like:

season yr mnth hr
1.0 0.0 1.0 0.0
1.0 0.0 1.0 1.0
1.0 0.0 1.0 2.0
1.0 0.0 1.0 3.0
1.0 0.0 1.0 4.0
... ... ... ...
1.0 1.0 12.0 19.0
1.0 1.0 12.0 20.0
1.0 1.0 12.0 21.0
1.0 1.0 12.0 22.0
1.0 1.0 12.0 23.0

For example, X could be the selected covariates of the loaded data:

X = exp.dataset.x
clf = exp.get_model("GLM").estimator
xx = exp.get_data(x=X)
clf.predict(xx)
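To make the expected n*p shape of X concrete outside a PiML session, here is a stand-alone sketch with plain Python lists standing in for the numpy array (`np.array(X)` would give the same layout); the column names mirror the bike-sharing example above:

```python
# Raw data rows keyed by column name (mirroring the DataFrame above).
raw = [
    {"season": 1.0, "yr": 0.0, "mnth": 1.0, "hr": 0.0, "cnt": 16.0},
    {"season": 1.0, "yr": 0.0, "mnth": 1.0, "hr": 1.0, "cnt": 40.0},
    {"season": 1.0, "yr": 0.0, "mnth": 1.0, "hr": 2.0, "cnt": 32.0},
]

# Keep only the selected covariates, in order; drop the response "cnt".
selected = ["season", "yr", "mnth", "hr"]
X = [[row[c] for c in selected] for row in raw]

print(len(X), len(X[0]))  # 3 4  -> n=3 samples, p=4 selected features
```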

Hope that helps.

@xloffree
Author

Saving a model works for GLM and GAM. After that, none of the other models will save; they all raise an error:


PicklingError Traceback (most recent call last)
Cell In [18], line 4
1 clf = exp.get_model("GAMI-Net").estimator
3 with open('LVS_GAMI-Net.pkl', 'wb') as file:
----> 4 dill.dump(clf, file)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:235, in dump(obj, file, protocol, byref, fmode, recurse, **kwds)
233 _kwds = kwds.copy()
234 _kwds.update(dict(byref=byref, fmode=fmode, recurse=recurse))
--> 235 Pickler(file, protocol, **_kwds).dump(obj)
236 return

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:394, in Pickler.dump(self, obj)
392 def dump(self, obj): #NOTE: if settings change, need to update attributes
393 logger.trace_setup(self)
--> 394 StockPickler.dump(self, obj)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:487, in _Pickler.dump(self, obj)
485 if self.proto >= 4:
486 self.framer.start_framing()
--> 487 self.save(obj)
488 self.write(STOP)
489 self.framer.end_framing()

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:388, in Pickler.save(self, obj, save_persistent_id)
386 msg = "Can't pickle %s: attribute lookup builtins.generator failed" % GeneratorType
387 raise PicklingError(msg)
--> 388 StockPickler.save(self, obj, save_persistent_id)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:603, in _Pickler.save(self, obj, save_persistent_id)
599 raise PicklingError("Tuple returned by %s must have "
600 "two to six elements" % reduce)
602 # Save the reduce() output and finally memoize the object
--> 603 self.save_reduce(obj=obj, *rv)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:717, in _Pickler.save_reduce(self, func, args, state, listitems, dictitems, state_setter, obj)
715 if state is not None:
716 if state_setter is None:
--> 717 save(state)
718 write(BUILD)
719 else:
720 # If a state_setter is specified, call it instead of load_build
721 # to update obj's with its previous state.
722 # First, push state_setter and its tuple of expected arguments
723 # (obj, state) onto the stack.

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:388, in Pickler.save(self, obj, save_persistent_id)
386 msg = "Can't pickle %s: attribute lookup builtins.generator failed" % GeneratorType
387 raise PicklingError(msg)
--> 388 StockPickler.save(self, obj, save_persistent_id)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:560, in _Pickler.save(self, obj, save_persistent_id)
558 f = self.dispatch.get(t)
559 if f is not None:
--> 560 f(self, obj) # Call unbound method with explicit self
561 return
563 # Check private dispatch table if any, or else
564 # copyreg.dispatch_table

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:1186, in save_module_dict(pickler, obj)
1183 if is_dill(pickler, child=False) and pickler._session:
1184 # we only care about session the first pass thru
1185 pickler._first_pass = False
-> 1186 StockPickler.save_dict(pickler, obj)
1187 logger.trace(pickler, "# D2")
1188 return

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:971, in _Pickler.save_dict(self, obj)
968 self.write(MARK + DICT)
970 self.memoize(obj)
--> 971 self._batch_setitems(obj.items())

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:997, in _Pickler._batch_setitems(self, items)
995 for k, v in tmp:
996 save(k)
--> 997 save(v)
998 write(SETITEMS)
999 elif n:

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/site-packages/dill/_dill.py:388, in Pickler.save(self, obj, save_persistent_id)
386 msg = "Can't pickle %s: attribute lookup builtins.generator failed" % GeneratorType
387 raise PicklingError(msg)
--> 388 StockPickler.save(self, obj, save_persistent_id)

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:589, in _Pickler.save(self, obj, save_persistent_id)
587 # Check for string returned by reduce(), meaning "save as global"
588 if isinstance(rv, str):
--> 589 self.save_global(obj, rv)
590 return
592 # Assert that reduce() returned a tuple

File /n/holylfs05/LABS/liang_lab_l3/Lab/piml_py39_shared/lib/python3.9/pickle.py:1070, in _Pickler.save_global(self, obj, name)
1068 obj2, parent = _getattribute(module, name)
1069 except (ImportError, KeyError, AttributeError):
-> 1070 raise PicklingError(
1071 "Can't pickle %r: it's not found as %s.%s" %
1072 (obj, module_name, name)) from None
1073 else:
1074 if obj2 is not obj:

PicklingError: Can't pickle <cyfunction Model.register_model.<locals>.sklearn_is_fitted at 0x2ae1c932aee0>: it's not found as piml.workflow.base.Model.register_model.<locals>.sklearn_is_fitted

@ZebinYang
Collaborator

Thanks for reporting this issue.

You may use the following scripts to save and load a fitted model except for GLMRegressor.

import dill

clf = exp.get_model("GAM").estimator 
clf.__sklearn_is_fitted__ = lambda: True

with open('name_model.pkl', 'wb') as file:
    dill.dump(clf, file)

with open('name_model.pkl', 'rb') as file:
    clf_load = dill.load(file)

train_x = exp.get_model("GAM").get_data(train=True)[0]
clf_load.predict(train_x)

@xloffree
Author

Thanks. I am able to save every model as a .pkl file now. How can I easily load the model and view its interpretability metrics within PiML? For example, if I have an EBM model saved as a .pkl, and I want to view the results of exp.model_interpret(), how can I do this without retraining?

Thank you

@ZebinYang
Collaborator

@xloffree,

You can do the following to register it into the PiML workflow:

pipeline = exp.make_pipeline(model=clf_load)
exp.register(pipeline, "loaded_model")
exp.model_interpret()

Note that in this case, you need to do data loading, summary, and preparation first, so that all the data are available.
An alternative way is to specify the required data information in exp.register. You can find the details in the docs of exp.register function, and the example usage in https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_ExternalModels.ipynb.
