Transformers, new features, transfer learning by stewarthe6 · Pull Request #424 · ATOMScience-org/AMPL

stewarthe6 · 2026-02-25T21:46:16Z

This is a large pull request with 3 new features.

Additional functionality and features for transfer learning or using a previously trained AMPL model as a feature encoder.
Two new feature sets that scale RDKit and Mordred features.
A new feature that lets you fit transforms on larger or unlabeled datasets and then apply them.
Removed deprecated UMAP feature transformer.
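
As a rough illustration of fitting transforms on a larger unlabeled dataset and then applying them to a smaller labeled one, the sketch below uses plain scikit-learn rather than AMPL's own wrapper classes. The array names and the choice of a RobustScaler followed by a PowerTransformer are assumptions made for this example; the API added in this PR may look different.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, RobustScaler

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a large unlabeled feature matrix and a small labeled one.
unlabeled_features = rng.lognormal(mean=0.0, sigma=1.0, size=(5000, 3))
labeled_features = rng.lognormal(mean=0.0, sigma=1.0, size=(50, 3))

# Scale robustly first, then make each feature more Gaussian-like.
feature_pipeline = Pipeline([
    ("robust", RobustScaler()),
    ("power", PowerTransformer(method="yeo-johnson")),
])

# Fit on the large unlabeled set only; the labeled set is just transformed.
feature_pipeline.fit(unlabeled_features)
labeled_transformed = feature_pipeline.transform(labeled_features)
```

The point of the design is that the scaling statistics come from the larger dataset, so they are less sensitive to the small labeled sample.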

…th RobustScaler and PowerTransformer. Updated documentation in related sections. Added functions to ModelFileReader to read out transformer specific parameters. Changed models that test RobustScaler and PowerTransformer to use RF to speed up the training

codecov · 2026-02-26T01:11:30Z

Codecov Report

❌ Patch coverage is 98.30508% with 6 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
atomsci/ddm/pipeline/perf_plots.py	93.61%	6 Missing ⚠️

@@            Coverage Diff             @@
##            1.8.0     #424      +/-   ##
==========================================
+ Coverage   49.69%   51.61%   +1.91%     
==========================================
  Files          37       38       +1     
  Lines       11717    11982     +265     
==========================================
+ Hits         5823     6184     +361     
+ Misses       5894     5798      -96

Flag	Coverage Δ
unittests	`51.61% <98.30%> (+1.91%)`	⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
atomsci/ddm/pipeline/compare_models.py	`41.08% <ø> (-0.06%)`	⬇️
atomsci/ddm/pipeline/featurization.py	`67.59% <100.00%> (+3.15%)`	⬆️
atomsci/ddm/pipeline/model_datasets.py	`68.35% <100.00%> (+0.26%)`	⬆️
atomsci/ddm/pipeline/model_tracker.py	`17.45% <ø> (ø)`
atomsci/ddm/pipeline/model_wrapper.py	`68.85% <100.00%> (+0.39%)`	⬆️
atomsci/ddm/pipeline/parameter_parser.py	`92.60% <100.00%> (+0.07%)`	⬆️
atomsci/ddm/pipeline/transformations.py	`70.69% <100.00%> (+12.82%)`	⬆️
atomsci/ddm/utils/generate_transformers.py	`100.00% <100.00%> (ø)`
atomsci/ddm/utils/hyperparam_search_wrapper.py	`29.30% <ø> (ø)`
atomsci/ddm/utils/model_file_reader.py	`70.27% <100.00%> (+4.11%)`	⬆️
... and 1 more

... and 2 files with indirect coverage changes

- Tests that the heavyatom_col parameter is used correctly, including cases where there is no heavyatom_col.
- Tests that NotImplementedError is raised correctly when there is no feature count or no way to featurize data.
- Tests that the Identity feature transforms are returned correctly, and that an error is raised if an unrecognized feature transform is used.

…k and multitask models are trained using the same dataset, the scaled_descriptors copy of the featurized file will only contain response_columns for the single task models, and not all columns for the multitask models. This does not cause an issue during training, but when making predictions, the '_actual' columns won't exist. This causes the function to crash. This patch looks in the original dataset_key csv and finds the response columns and merges them into the scaled_descriptors file.
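
The merge step described in this commit message can be sketched with pandas. This is only an illustration under stated assumptions: the column names, the `compound_id` key, and the helper `add_missing_response_columns` are hypothetical and are not taken from the AMPL code.

```python
import pandas as pd

# Hypothetical original dataset containing the response columns for every task.
dataset = pd.DataFrame({
    "compound_id": ["c1", "c2", "c3"],
    "smiles": ["CCO", "CCN", "CCC"],
    "task_a": [1.0, 2.0, 3.0],
    "task_b": [0.1, 0.2, 0.3],
})

# Hypothetical scaled_descriptors copy written by a single-task run: task_b is missing.
scaled_descriptors = pd.DataFrame({
    "compound_id": ["c1", "c2", "c3"],
    "desc_1": [0.5, -0.2, 1.1],
    "task_a": [1.0, 2.0, 3.0],
})


def add_missing_response_columns(featurized, original, id_col, response_cols):
    """Merge any response columns absent from `featurized` in from `original`."""
    missing = [col for col in response_cols if col not in featurized.columns]
    if not missing:
        return featurized
    # Left join keeps every featurized row and adds only the missing columns.
    return featurized.merge(original[[id_col] + missing], on=id_col, how="left")


merged = add_missing_response_columns(
    scaled_descriptors, dataset, "compound_id", ["task_a", "task_b"]
)
```

With this toy data, `merged` carries both `task_a` and `task_b`, so code that later looks up per-task actual values has the columns it needs.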

stewarthe6 and others added 30 commits January 21, 2025 10:10

New sklearn transformer wrapper

78b3c37

This should be an else if since it is an else if in the init and in u…

f35e3fb

…ntransform

Remove UMAP transformer since it is deprecated

458e188

deleted large chunk of commented code

2479de0

Inserted Sklearn RobustScaler and PowerTransformer into the pipeline …

9e8dfce

…and added tests

Fixed argument name. Changed from robustscaler_with_scaling to robust…

53d3f74

…scaler_unit_variance

Updated test to use mordred_filtered features. Added arguments to def…

8feea36

…ault true and default false lists

feature_transformer_type should be feature_transform_type

1eda346

unit_variance parameter must be a tuple

fd22692

Convert quartiles

28cd0e0

Updated rdkit to use AvgIpc for a numerically better behaved feature

77c8b26

set default for PowerTransformer method

2551e38

Merge branch '1.7.0' into feature_sklearn_transformer

225c00d

Update descriptor_sets_sources_by_descr_type.csv

fa3ea05

Ipc should not be changed to AvgIpc like this because it would break all rdkit_raw models.

rdkit_scaled by heavy atom count is working

01b4c28

rdkit_raw has the Ipc feature which causes RobustTransformer to retur…

94326bc

…n nans

Set add_indicator to True so inverse transform works. Also cleaned up…

46fe7a1

… unused imports

Merged in sklearn transformer branch. Added mordred_filtered_scaled f…

2592636

…eatures

Added json needed for the delaney_panel test.

cdb6f98

default_true and default_false were being overwritten. Now calls set_…

da690e6

…default function for all sklearn parameters

Merge branch 'feature_sklearn_transformer' into feat_scaled_rdkit_mor…

216e02b

…dred

Refactor so we wrap sklearn Pipelines instead of transformers to make…

8634988

… it more generalizable. Fixed tests. Fixed bug where the imputer_strategy parameter was not used

Merge branch 'feature_sklearn_transformer' into feat_scaled_rdkit_mor…

c748d0a

…dred

Updated rdkit_scaled and mordred_filtered_scaled. Added option to hav…

2969378

…e log scale features

Updated input sanity check for SklearnPipelineWrapper. Removed 'add_i…

e0af6a2

…ndicator' flag because that changed the number of features and crashed.

Updated transformer test to expect NotImplementedError when untransfo…

8d98165

…rming using SklearnPipelineWrapper

Merge branch '1.7.0' into feature_sklearn_transformer

35eb9e9

Merge branch 'feature_sklearn_transformer' into feat_scaled_rdkit_mor…

c3ae805

…dred

Functions for saving and loading feature transformers

28f6f28
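
For readers unfamiliar with persisting fitted transformers, as the last commit above does, here is a minimal sketch that uses the standard library `pickle` module with a scikit-learn scaler. It is not the AMPL implementation; the data and variable names are made up for the example.

```python
import pickle

import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical feature matrix used to fit the transformer.
features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 400.0]])
scaler = RobustScaler().fit(features)

# Serialize the fitted transformer to bytes and restore it again.
serialized = pickle.dumps(scaler)
restored = pickle.loads(serialized)

# The restored transformer should reproduce the original output.
original_out = scaler.transform(features)
restored_out = restored.transform(features)
```

Pickled scikit-learn objects are generally only safe to load with the same scikit-learn version that wrote them, which is one reason to test the save and load round trip explicitly.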

stewarthe6 added 8 commits June 17, 2025 09:38

Merge branch 'feat_scaled_rdkit_mordred' of github.com:ATOMScience-or…

b2bd80f

…g/AMPL into feat_scaled_rdkit_mordred

Merge branch '1.8.0' into feat_scaled_rdkit_mordred

c99d46c

updated clean function in test_transformer_generation. Fixed referenc…

6f50d9f

…e to transformers_x

Added tests for, testing if loaded transformers will work on the new …

9c564e4

…model, if transformers are saved and loaded correctly, and if transform_dataset_key_config is saved correctly

Removed transformer_dataset_key_configs from parameters since we neve…

0cf3529

…r want to set that manually. Instead added a check when saving metadata to see if the parameters object has that attribute

removed unused import

a99b0f6

added utility functions to filter compounds out based on features as …

1086a93

…well as infill nan or extremely large values

Merge branch '1.8.0' into feat_scaled_rdkit_mordred

6cd5746
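
The "infill nan or extremely large values" commit above could look roughly like the following NumPy sketch. The helper name `zero_out_nonfinite`, the fill value of zero, and the `max_abs` cutoff are assumptions for illustration and do not describe the utility functions actually added in this PR.

```python
import numpy as np


def zero_out_nonfinite(features, max_abs=1e10):
    """Hypothetical helper: return a copy with NaN, inf, and huge values set to 0."""
    cleaned = np.array(features, dtype=float)  # np.array copies, so the input is untouched
    bad = ~np.isfinite(cleaned) | (np.abs(cleaned) > max_abs)
    cleaned[bad] = 0.0
    return cleaned


raw = np.array([[1.0, np.nan], [np.inf, 2.0], [-np.inf, 1e30]])
cleaned = zero_out_nonfinite(raw)
```

Returning a copy rather than modifying the input in place keeps the original feature values available for filtering compounds later.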

stewarthe6 added 10 commits March 4, 2026 21:14

Tests for methods that filter out compounds by large feature values o…

cea1297

…r molecular weight

Added a test case that has nan or inf features. Added a test case for…

8d47b93

… colliding split_uuids

Thoroughly tested zero_out_inf_nan using ChatGPT. Tested create_featu…

3be943a

…re_transformers when transformers is None. This does not test the pipeline with no transformers, just that the function returns correctly

Add new json to test Identity transformer

db457b6

Test that taking the log of a negative value feature throws the corre…

1715354

…ct error

Merging in 1.8.0

51bd5cc

Added a variance threshold that removes any features that are constan…

9bd003a

…t. Without this step, PowerTransformer fails

Adding rdkit_raw features for MRP3 dataset to speed up balancing tran…

6c6325b

…sformer tests
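
The variance-threshold commit above reports that PowerTransformer fails when constant features are left in. A minimal scikit-learn sketch of removing constant columns before the power transform is shown below; the data is synthetic and the step names are hypothetical, so this shows the general idea rather than the AMPL pipeline itself.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)

# Synthetic features: the middle column is constant.
features = np.column_stack([
    rng.lognormal(size=200),
    np.full(200, 3.0),
    rng.normal(size=200),
])

pipeline = Pipeline([
    ("drop_constant", VarianceThreshold(threshold=0.0)),  # removes zero-variance columns
    ("power", PowerTransformer(method="yeo-johnson")),
])

transformed = pipeline.fit_transform(features)
kept = pipeline.named_steps["drop_constant"].get_support()
```

Here `kept` is a boolean mask over the input columns, which is useful for mapping transformed columns back to their original feature names.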

mauvais2 merged commit 5eef520 into 1.8.0 Apr 22, 2026
8 checks passed

mauvais2 mentioned this pull request Apr 23, 2026

added new params for #434 #435

Merged

mauvais2 merged 48 commits into 1.8.0 from feat_scaled_rdkit_mordred

2 participants
