Models don't accept model_name, saving_path #136

Closed
rmitsch opened this issue Jun 8, 2020 · 23 comments

@rmitsch

rmitsch commented Jun 8, 2020

Describe the bug

Models don't accept model_name, saving_path as initialization arguments.

What is the current behavior?

See above.

If the current behavior is a bug, please provide the steps to reproduce.

clf: TabNetClassifier = TabNetClassifier(saving_path="/home/user123/dev/", device_name="cpu")

Expected behavior

Models should accept model_name, saving_path as initialization arguments as specified in the documentation.


Additional context

On a related note: how can models be persisted? The mentioned init parameters strongly suggest that it is possible, but I couldn't find any information on this, either in the documentation or in the code.

@rmitsch rmitsch added the bug Something isn't working label Jun 8, 2020
@Optimox
Collaborator

Optimox commented Jun 8, 2020

hey @rmitsch,

Thanks for creating this issue. model_name and saving_path are actually deprecated; we should remove them and update the README.

Saving a TabNet model follows the same rules as saving a PyTorch or XGBoost model.
Either you save it with pickle, just like an XGBoost model, or you use the PyTorch-specific save methods: https://pytorch.org/tutorials/beginner/saving_loading_models.html

The approach I would recommend:

  • torch.save(clf_tabnet.network.state_dict(), PATH) to save your model clf_tabnet
  • when you want to use it later: you'll need to redefine your TabNet model with the same params, clf_tabnet = TabNetClassifier(**your_params), and then clf_tabnet.network.load_state_dict(torch.load(PATH))

@Optimox Optimox added documentation Improvements or additions to documentation and removed bug Something isn't working labels Jun 8, 2020
@rmitsch
Author

rmitsch commented Jun 9, 2020

Thanks for the quick response!
Your recommended approach to saving the model unfortunately yields AttributeError: 'TabNetClassifier' object has no attribute 'network'.

@Optimox
Collaborator

Optimox commented Jun 9, 2020

hello @rmitsch,

Actually you are right, what I said does not work because the network is instantiated only after a fit, which is not very useful in that case (we might change that behaviour in the future).

Try this instead; it should work:

import pickle

# Save the model wherever you want
with open("./AMODEL.pkl", 'wb') as model_file:
    pickle.dump(clf, model_file)

# Load the model later to make prediction
with open("./AMODEL.pkl", 'rb') as model_file:
    new_clf = pickle.load(model_file)
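
The reloaded new_clf behaves like the original classifier, so inference should work directly on it, for example (X_test here is just a placeholder for your own feature matrix):

# Use the reloaded classifier for predictions
preds = new_clf.predict(X_test)
probas = new_clf.predict_proba(X_test)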

@eduardocarvp
Collaborator

Indeed, @Optimox, I have noticed that, and I probably even have the change locally where I instantiate the network in the class __init__(). I think it is better that way.
I'm willing to work on this and I can also fix the model_name/saving_path on the way; it should be simple.

@Optimox
Collaborator

Optimox commented Jun 9, 2020

@eduardocarvp I think the problem with this is that before the fit we do not know either input_dim or output_dim; it's nice to have these computed automatically, so I'm not sure how to bypass this.

I think the best way would probably be to have a method set_network (or something better named) that would be called automatically during fit but could also be called manually in order to instantiate everything.
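
Something along these lines, just as a rough sketch (the exact attribute names read from the classifier and the TabNet constructor arguments would need checking, so treat this as an assumption rather than the actual API):

import torch
from pytorch_tabnet.tab_network import TabNet

def set_network(clf, input_dim, output_dim):
    # Hypothetical helper: build the underlying TabNet module up front so
    # that clf.network exists (and can be saved/loaded) before any fit().
    # The attributes read from clf below are assumptions about its internals.
    use_cuda = clf.device_name in ("auto", "cuda") and torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    clf.input_dim = input_dim
    clf.output_dim = output_dim
    clf.network = TabNet(input_dim=input_dim,
                         output_dim=output_dim,
                         n_d=clf.n_d,
                         n_a=clf.n_a).to(device)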

As for saving, I don't know if we want to package something or just give a few methods on how to save and reuse a TabNet model.

@rmitsch
Author

rmitsch commented Jun 9, 2020

@Optimox Plain old pickling worked, thanks!
My two cents on whether to offer functionality to save and load models: IMO that would be reasonable, even if it's just a very simple wrapper - that way I, as a user, don't have to worry about whether to use PyTorch's save, pickle, etc.
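
Even something as thin as the following would already help (purely hypothetical names, it just wraps the pickle approach from above):

import pickle

def save_model(clf, path):
    # Hypothetical convenience wrapper around the pickle approach above
    with open(path, "wb") as model_file:
        pickle.dump(clf, model_file)

def load_model(path):
    # Counterpart that returns the unpickled classifier
    with open(path, "rb") as model_file:
        return pickle.load(model_file)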

@xywust2014

xywust2014 commented Jun 10, 2020

hello @Optimox

> Actually you are right, what I said does not work because the network is instantiated only after a fit, which is not very useful in that case (we might change that behaviour in the future).
>
> Try this instead; it should work:
>
> import pickle
>
> # Save the model wherever you want
> with open("./AMODEL.pkl", 'wb') as model_file:
>     pickle.dump(clf, model_file)
>
> # Load the model later to make prediction
> with open("./AMODEL.pkl", 'rb') as model_file:
>     new_clf = pickle.load(model_file)

I tried this method, but it gives me an error: PicklingError: Can't pickle <class 'pytorch_tabnet.tab_model.TabNetClassifier'>: it's not the same object as pytorch_tabnet.tab_model.TabNetClassifier. Could you take a look at what might be the reason? Thanks!

It looks like I can use this to save the model. However, after I call the .fit method on clf with the training dataset, the above-mentioned error occurs.

@Optimox
Collaborator

Optimox commented Jun 11, 2020

hello @xywust2014, could you please clarify a bit when the error occurs?
Sharing some code would help us too, but from your error message it looks like you might be missing parentheses; what you want to save is your clf defined like this: clf = pytorch_tabnet.tab_model.TabNetClassifier().

@xywust2014

> hello @xywust2014, could you please clarify a bit when the error occurs?
> Sharing some code would help us too, but from your error message it looks like you might be missing parentheses; what you want to save is your clf defined like this: clf = pytorch_tabnet.tab_model.TabNetClassifier().

Thanks a lot for the help. Here is the code.

clf = TabNetClassifier(
    **best_hyperparams, 
    optimizer_fn=torch.optim.Adam,
    scheduler_params = {"gamma": 0.95,
                     "step_size": 20},
    scheduler_fn=torch.optim.lr_scheduler.StepLR, epsilon=1e-15,
    device_name = 'auto'
)

max_epochs = 100
clf.fit(X_train = train_x, y_train = train_y , 
        X_valid = test_x, y_valid = test_y, 
        max_epochs = max_epochs, patience = 1, 
        batch_size = 512, virtual_batch_size = 256
        )

import pickle 
with open("./AMODEL.pkl","wb") as model_file:
    pickle.dump(clf, model_file)

with open("./AMODEL.pkl","wb") as model_file:
    new_clf = pickle.load(model_file)

@Optimox
Collaborator

Optimox commented Jun 11, 2020

Hmm, could you try reading the file instead of writing in the second with open statement:

instead of this

with open("./AMODEL.pkl","wb") as model_file:
    new_clf = pickle.load(model_file)

try this

with open("./AMODEL.pkl","rb") as model_file:
    new_clf = pickle.load(model_file)

@xywust2014

xywust2014 commented Jun 11, 2020

> Hmm, could you try reading the file instead of writing in the second with open statement:
>
> instead of this
>
> with open("./AMODEL.pkl","wb") as model_file:
>     new_clf = pickle.load(model_file)
>
> try this
>
> with open("./AMODEL.pkl","rb") as model_file:
>     new_clf = pickle.load(model_file)

Thanks a lot! :) I will try that.
Well, unfortunately, even after I changed that, it still gives me the same error.

@athewsey
Contributor

Just chipping in to say - pickle dump/load is working for me... But I've noticed that it means it's not possible to e.g. train the model on a CUDA-enabled machine but then deploy for inference on a CPU-only environment. I suspect there might also be some constraints about porting the model between Python versions or other environment changes?

I'd advocate for either of these, if possible:

  • Adding dedicated load/save methods to this library's API and working on their flexibility, or
  • Adding a (documented) way to use PyTorch's load/save methods

...Not sure if it belongs in a separate enhancement Issue or is OK to tackle here though!

@Optimox
Collaborator

Optimox commented Jun 14, 2020

@athewsey
I guess improving things so that the pickle option is not the only one would allow easier porting across different environments; pickle has this inherent flaw.

However, have you tried explicitly switching to CPU for inference after loading your model by doing clf.device_name = "cpu"?

@athewsey
Contributor

Thanks for the quick response @Optimox! ...But I'm afraid I don't think it'll work 😔

The error below is thrown on pickle.load("classifier.pkl"), so it's not possible to unpickle first and then check and override:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I also tried unpickling the trained file on a GPU-enabled instance (which showed device_name = "auto", by the way), setting the prop to cpu, then re-pickling to classifier.cpu.pkl and seeing if that would load in the CPU-only environment. It raised the same error even when trying a range of different updates, e.g.:

model.device_name = "cpu"
model.device = torch.device(model.device_name)
model.network.device = model.device
model.network.to(model.network.device)
model.network.cpu()
# Still won't unpickle on a non-CUDA env
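
One route that might work instead (untested on my side) would be to avoid pickling the classifier altogether and only move the weights, along the lines of the torch error message. The catch, as discussed above, is that network only exists after a fit, so the CPU-side classifier would need a short dummy fit first. your_params, the dummy arrays and the file path below are placeholders, not a verified recipe:

import torch
from pytorch_tabnet.tab_model import TabNetClassifier

# On the GPU machine: save only the network weights
torch.save(clf.network.state_dict(), "tabnet_weights.pt")

# On the CPU-only machine: rebuild the classifier with the same parameters,
# run a minimal fit so that clf.network gets created (the dummy data must
# have the same number of features and classes as the original training
# data), then load the weights onto the CPU explicitly via map_location.
clf = TabNetClassifier(**your_params, device_name="cpu")
clf.fit(X_train=X_dummy, y_train=y_dummy,
        X_valid=X_dummy, y_valid=y_dummy,
        max_epochs=1)
state_dict = torch.load("tabnet_weights.pt", map_location=torch.device("cpu"))
clf.network.load_state_dict(state_dict)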

@Optimox
Collaborator

Optimox commented Jun 15, 2020

Yep, it was worth a try. I think @eduardocarvp will come up with a better solution pretty soon.

@xywust2014

> Just chipping in to say - pickle dump/load is working for me... But I've noticed that it means it's not possible to e.g. train the model on a CUDA-enabled machine but then deploy for inference on a CPU-only environment. I suspect there might also be some constraints about porting the model between Python versions or other environment changes?
>
> I'd advocate for either of these, if possible:
>
>   • Adding dedicated load/save methods to this library's API and working on their flexibility, or
>   • Adding a (documented) way to use PyTorch's load/save methods
>
> ...Not sure if it belongs in a separate enhancement Issue or is OK to tackle here though!

Thanks for the response. It looks like I can use the pickle method to save & load the model clf before fitting on the training dataset. However, after I call the .fit method on clf with the training dataset, the following error occurs:

PicklingError: Can't pickle <class 'pytorch_tabnet.tab_network.TabNet'>: it's not the same object as pytorch_tabnet.tab_network.TabNet

@Optimox
Collaborator

Optimox commented Jun 15, 2020

@xywust2014 could you share a minimal code sample to reproduce your error?

Because the pickling option seems to be working as long as you stay in the same environment. Without a reproducible bug we can't help you.

@xywust2014

xywust2014 commented Jun 15, 2020

> @xywust2014 could you share a minimal code sample to reproduce your error?
>
> Because the pickling option seems to be working as long as you stay in the same environment. Without a reproducible bug we can't help you.

Thanks, Optimox. I am using Python 3.7 in Spyder. This machine has a CUDA environment, but for the code below I chose device_name = 'cpu'. (I set this to 'auto' before, which generates the same error.)

If I skip the clf.fit() call, there are no errors.
However, if I run it, the error occurs.

best_hyperparams = {"clip_value": 4.0, "gamma": 0.6666666666666666, "lr": 0.23, 
                    "momentum": 0.45, "n_a": 8, "n_d": 48, "n_independent": 6, "n_shared": 2}

clf = TabNetClassifier(
    **best_hyperparams, 
    optimizer_fn=torch.optim.Adam,
    scheduler_params = {"gamma": 0.95,
                     "step_size": 20},
    scheduler_fn=torch.optim.lr_scheduler.StepLR, epsilon=1e-15,
    device_name = 'cpu')

max_epochs = 100
clf.fit(X_train = train_x, y_train = train_y , 
        X_valid = test_x, y_valid = test_y, 
        max_epochs = max_epochs, patience = 2, 
        batch_size = 1024, virtual_batch_size = 256
        )

###################### Save the trained model ####################
#joblib.dump(clf, filename = "Model"+"\\" + model_n +'.plk')
#torch.save(clf.network.state_dict(),"C:\DataScientist\HC\"  + "Model"+"\\" + model_n)
import pickle 
with open("./AMODEL.pkl","wb") as model_file:
    pickle.dump(clf, model_file)

with open("./AMODEL.pkl","rb") as model_file:
    new_clf = pickle.load(model_file)

Thanks guys for the help.

@Optimox
Collaborator

Optimox commented Jun 15, 2020

Well, I just ran this code on my local machine on census income and it runs without problems...

A few notes though: if you are performing hyperparameter tuning, you could refer to the README on the front page of the repo to see "typical" values from the research paper.

  • momentum seems a bit high
  • clip_value is for gradient clipping; not sure it's worth searching over this, but why not
  • typically n_d = n_a works well, so this could lower your search space
  • gamma is mathematically supposed to be greater than 1; values below 1 would make the masks behave a bit strangely, I guess. Maybe searching between 1 and 2 would yield better results.

I know this does not solve your problem, but everything is running just fine on my computer... I'm not sure where this comes from.

@xywust2014

> Well, I just ran this code on my local machine on census income and it runs without problems...
>
> A few notes though: if you are performing hyperparameter tuning, you could refer to the README on the front page of the repo to see "typical" values from the research paper.
>
>   • momentum seems a bit high
>   • clip_value is for gradient clipping; not sure it's worth searching over this, but why not
>   • typically n_d = n_a works well, so this could lower your search space
>   • gamma is mathematically supposed to be greater than 1; values below 1 would make the masks behave a bit strangely, I guess. Maybe searching between 1 and 2 would yield better results.
>
> I know this does not solve your problem, but everything is running just fine on my computer... I'm not sure where this comes from.

Thanks a lot, Optimox, for the advice.

@DoDzilla-ai

DoDzilla-ai commented Jun 22, 2020

> hey @rmitsch,
>
> Thanks for creating this issue. model_name and saving_path are actually deprecated; we should remove them and update the README.
>
> Saving a TabNet model follows the same rules as saving a PyTorch or XGBoost model.
> Either you save it with pickle, just like an XGBoost model, or you use the PyTorch-specific save methods: https://pytorch.org/tutorials/beginner/saving_loading_models.html
>
> The approach I would recommend:
>
>   • torch.save(clf_tabnet.network.state_dict(), PATH) to save your model clf_tabnet
>   • when you want to use it later: you'll need to redefine your TabNet model with the same params, clf_tabnet = TabNetClassifier(**your_params), and then clf_tabnet.network.load_state_dict(torch.load(PATH))

Models also don't accept mask_type. Is this deprecated as well?

PS: Btw. lr (learning rate) is not documented in the readme. Just saying...

@Optimox
Collaborator

Optimox commented Jun 22, 2020

@DoDzilla-ai

Hello, well you are actually looking at the develop branch README (maybe we should find a way of defaulting to the master branch), so mask_type is actually a new feature, not a deprecated one; but if you installed the code from pip then you are using the master branch code, which does not accept mask_type.

The same thing is happening with lr; we changed this recently in order to give more flexibility to end users.

The development branch always has some advanced features that the master branch does not have yet; they will match at the next release in the coming weeks. In the meantime, please refer to the master branch README for the current documentation.

@Optimox
Collaborator

Optimox commented Jul 3, 2020

Release 1.2 should solve all these problems; feel free to open a new issue if you encounter another problem.

@Optimox Optimox closed this as completed Jul 3, 2020