`serializable` and `restore!` should be "safe" to use any time #965

ericphanson · 2022-09-23T17:15:32Z

I think it would be helpful if one could call `serializable` and `restore!` on unfitted models, as well as multiple times in a row. Why? It makes it much easier to, e.g., save a model, load it, and then save it again, so that models are "safe" to save at any time.

MWE of the issues:

julia> using MLJ, XGBoost, MLJXGBoostInterface

julia> using MLJBase: serializable, restore!

julia> X = (; feat1 = rand(5), feat2 = rand(5))
(feat1 = [0.531947922668447, 0.5222942738232238, 0.656198624148056, 0.636834478615594, 0.9053874923035123],
 feat2 = [0.4851505236564363, 0.7617592421013468, 0.049114536456129954, 0.713100741015396, 0.8642362626706538],)

julia> y = categorical(rand(Bool, 5))
5-element CategoricalArrays.CategoricalArray{Bool,1,UInt32}:
 false
 false
 false
 true
 true

julia> mach = machine(XGBoostClassifier(), X, y)
untrained Machine; caches model-specific representations of data
  model: XGBoostClassifier(num_round = 100, …)
  args:
    1:	Source @746 ⏎ Table{AbstractVector{Continuous}}
    2:	Source @662 ⏎ AbstractVector{Multiclass{2}}


julia> serializable(mach)
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] serializable(mach::Machine{XGBoostClassifier, true})
   @ MLJBase ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:923
 [2] top-level scope
   @ REPL[38]:1

julia> restore!(mach)
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getproperty
   @ ./Base.jl:38 [inlined]
 [2] restore!(mach::Machine{XGBoostClassifier, true})
   @ MLJBase ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:944
 [3] top-level scope
   @ REPL[39]:1

julia> fit!(mach)
[ Info: Training machine(XGBoostClassifier(num_round = 100, …), …).
[1]	train-logloss:0.68703596591949467
...
trained Machine; caches model-specific representations of data
  model: XGBoostClassifier(num_round = 100, …)
  args:
    1:	Source @746 ⏎ Table{AbstractVector{Continuous}}
    2:	Source @662 ⏎ AbstractVector{Multiclass{2}}


julia> s_mach = serializable(mach)
serializable Machine
  model: XGBoostClassifier(num_round = 100, …)
  args:


julia> serializable(s_mach)
ERROR: MethodError: no method matching machine(::XGBoostClassifier; cache=true)
Closest candidates are:
  machine(::MLJModelInterface.Model, ::AbstractNode, ::AbstractNode...; kwargs...) at ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:388
  machine(::MLJModelInterface.Model, ::Any, ::AbstractNode, ::AbstractNode...; kwargs...) at ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:370
  machine(::MLJModelInterface.Model, ::AbstractNode, ::Any, ::Any...; kwargs...) at ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:374
  ...
Stacktrace:
 [1] serializable(mach::Machine{XGBoostClassifier, true})
   @ MLJBase ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:909
 [2] top-level scope
   @ REPL[42]:1

julia> restored = restore!(s_mach)
trained Machine; caches model-specific representations of data
  model: XGBoostClassifier(num_round = 100, …)
  args:


julia> restored = restore!(s_mach)
ERROR: MethodError: no method matching write(::IOStream, ::Booster)
Closest candidates are:
  write(::IO, ::Any) at io.jl:672
  write(::IO, ::Any, ::Any...) at io.jl:673
  write(::IO, ::Union{SubString{String}, String}) at strings/io.jl:244
  ...
Stacktrace:
 [1] write(io::IOStream, x::Booster)
   @ Base ./io.jl:672
 [2] booster(persistent::Booster)
   @ MLJXGBoostInterface ~/.julia/packages/MLJXGBoostInterface/6E4I4/src/MLJXGBoostInterface.jl:729
 [3] restore(#unused#::XGBoostClassifier, serializable_fitresult::Tuple{Booster, CategoricalArrays.CategoricalValue{Bool, UInt32}})
   @ MLJXGBoostInterface ~/.julia/packages/MLJXGBoostInterface/6E4I4/src/MLJXGBoostInterface.jl:761
 [4] restore!(mach::Machine{XGBoostClassifier, true})
   @ MLJBase ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:944
 [5] top-level scope
   @ REPL[43]:1

As a workaround, I am checking-before-calling with

    if !isempty(model.machine.args)
        machine = serializable(model.machine)
    else
        machine = model.machine
    end

and

    if model.machine.state == -1
        machine = restore!(model.machine)
    else
        machine = model.machine
    end

ablaom · 2022-09-26T19:48:09Z

@ericphanson Thanks for reporting.

I get the idea that `restore!` and `serializable` should be no-ops if they have already been performed. I don't quite understand the need for serialising a machine with no learned parameters. That will remove the data, so the restored object will be useless, right? You can't train it and you can't predict/transform on it. And presently, there is no way to add data to a machine that doesn't have any. Am I missing something here?

(However, `serializable` on an untrained machine should give an informative error, so it looks like there is a bug there.)

Perhaps you mean that the data should be preserved in the untrained case?

ericphanson · 2022-09-27T05:17:58Z

> You can't train it and you can't predict/transform on it. And presently, there is no way to add data to a machine that doesn't have any. Am I missing something here?

Ah ok, I’m probably too used to neural nets, where you can predict with untrained models (using the randomly initialized weights); it will just be a very bad prediction. In that case, the use is twofold: (1) to write tests of code that saves and loads models (you don’t need to have a trained one around to test it), and (2) to save checkpoints starting from the very beginning, so you can track validation loss from the start.

In this case maybe it makes less sense, in which case an informative error sounds good.

ablaom · 2022-09-27T20:50:46Z

Ah, I see now where you were coming from.

> In this case maybe it makes less sense, in which case an informative error sounds good.

That's my feeling also.

The following are now addressed at JuliaAI/MLJBase.jl@ec977aa:

- Throw an informative error if `serializable` is called on an untrained machine.
- Allow `serializable` and `restore!` to be called again with no effect.
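
For readers skimming the thread, the two behaviours agreed on above can be illustrated with a toy model. This is only a sketch of the semantics, not the code in the linked commit: `ToyMachine`, `toy_serializable`, and `toy_restore!` are hypothetical names, and the state convention (0 = untrained, 1 = trained, -1 = serializable) is an assumption, with the `-1` borrowed from the workaround earlier in the thread.

```julia
# Toy stand-in for a machine; NOT MLJBase's Machine type (hypothetical).
mutable struct ToyMachine
    state::Int                           # assumed: 0 untrained, 1 trained, -1 serializable
    fitresult::Union{Nothing,String}     # stands in for the learned parameters
end

# Informative error on an untrained machine; no-op if already serializable.
function toy_serializable(mach::ToyMachine)
    mach.state == 0 && throw(ArgumentError(
        "cannot serialize an untrained machine; train it first"))
    mach.state == -1 && return mach      # already serializable: return unchanged
    # Otherwise build a new machine holding a "persistent" form of the fitresult.
    return ToyMachine(-1, "persistent:" * mach.fitresult)
end

# Convert back in place; no-op unless the machine is in the serializable state.
function toy_restore!(mach::ToyMachine)
    mach.state == -1 || return mach      # nothing to restore
    mach.fitresult = replace(mach.fitresult, "persistent:" => ""; count = 1)
    mach.state = 1
    return mach
end
```

With this shape, calling either function a second time simply hands back the same object, which is what makes a save, load, save-again round trip safe without the caller checking state first.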

ericphanson · 2022-09-28T23:34:04Z

thank you!

ericphanson mentioned this issue Sep 26, 2022

Issue in serializable JuliaAI/MLJBase.jl#843

Closed

ablaom closed this as completed Sep 27, 2022
