Skip to content

FIX: sklearn clone() returns empty terms list (#340)#474

Open
VanshKharb wants to merge 4 commits into
dswah:mainfrom
VanshKharb:fix/sklearn-clone-no-terms
Open

FIX: sklearn clone() returns empty terms list (#340)#474
VanshKharb wants to merge 4 commits into
dswah:mainfrom
VanshKharb:fix/sklearn-clone-no-terms

Conversation

@VanshKharb
Copy link
Copy Markdown

Fix sklearn.clone() returning empty terms list on fitted GAMs (#340)

Description

This PR resolves an issue where calling sklearn.base.clone(gam) on a fitted GAM estimator results in an identical estimator but with a missing terms list (gam.terms == ""), breaking Scikit-Learn workflows (like grid search and cross-validation) that rely on cloning properly passing initialization arguments.

Root cause:
GAM.fit() mutates string configurations like terms='auto' into TermList objects in-place. Because sklearn.clone() queries get_params(deep=False), it received the mutated state rather than original initialization parameters, and then additionally tried to deep clone the TermList dynamically because it inherits Core.get_params(), resulting in parameter destruction.

Changes:

  • Store Initial Values: GAM.fit() now stores _init_distribution, _init_link, _init_callbacks, and _init_terms before applying data-dependent validation.
  • Selective get_params() Override: GAM.get_params(deep=False) now explicitly restores the original instantiation parameters so clone() receives correct input, cleanly segregating scikit-learn requirements from pyGAM's internal keep_best deep=True routines.
  • Bypass duck-typing: Custom __sklearn_clone__ implemented on Term and TermList to explicitly force copy.deepcopy and prevent sklearn.clone from attempting to reconstruct them as empty nested estimators.
  • Internal Optimization: Removed redundant, harmful set_params() calls in _bootstrap_samples_of_smoothing and gridsearch that broke identically.
  • Testing: Comprehensive regression test test_sklearn_clone_preserves_terms guarantees functionality safely across LinearGAM/LogisticGAM and customized parameters.

- Save original initialization params before `GAM.fit()` mutates them
- Override `get_params(deep=False)` to restore original params for `sklearn.clone()`
- Add `__sklearn_clone__` to `Term` and `TermList` to prevent empty list duck-typing
- Remove redundant `gam.set_params` calls in internal bootstrap/gridsearch
- Add regression tests for cloning fitted GAMs with custom terms
@VanshKharb VanshKharb force-pushed the fix/sklearn-clone-no-terms branch from d7015cb to 92a4f16 Compare March 3, 2026 02:15
pankajbaid567 added a commit to pankajbaid567/pyGAM that referenced this pull request Mar 12, 2026
pankajbaid567 added a commit to pankajbaid567/pyGAM that referenced this pull request Mar 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant