Update feature imputation strategy #201

bfhealy · 2022-12-20T16:00:48Z

Missing features are currently zero-imputed for training and mean-imputed for inference. Aside from this inconsistency, there are better (albeit more costly) ways to perform imputation. The current plan is to:

Exclude the AllWISE W3 and W4 magnitude errors, which are missing from >75% of the training sample
Impute a value of zero for missing mean_ztf_alert_braai
Impute the median for missing magnitude errors (mainly PS1)
Use regression (e.g. KNN imputation) to impute missing magnitudes
Use regression (potentially on a class-by-class basis) to impute missing Gaia EDR3 parallaxes
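
For illustration, the first four steps of the plan could be sketched roughly as below, using pandas and sklearn's KNNImputer. This is only a sketch under assumptions, not the implementation in the repository: the helper `impute_features` and the column names `AllWISE__w3sigmpro` and `PS1_DR1__gMeanPSFMagErr` are hypothetical, and the class-by-class regression for Gaia EDR3 parallaxes is not covered.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def impute_features(df, drop_cols, zero_cols, median_cols, knn_cols, n_neighbors=5):
    """Hypothetical helper: apply one imputation rule per group of columns."""
    # Exclude features that are missing for most of the sample.
    out = df.drop(columns=[c for c in drop_cols if c in df.columns])
    # Impute zero for the listed columns.
    out[zero_cols] = out[zero_cols].fillna(0.0)
    # Impute the per-column median for the listed columns.
    out[median_cols] = out[median_cols].fillna(out[median_cols].median())
    # Impute the remaining columns from the nearest neighbours.
    imputer = KNNImputer(n_neighbors=n_neighbors)
    out[knn_cols] = imputer.fit_transform(out[knn_cols])
    return out


# Toy data; the error-column names are assumptions made for this example.
toy = pd.DataFrame(
    {
        "AllWISE__w3sigmpro": [np.nan, np.nan, 0.3, np.nan],
        "mean_ztf_alert_braai": [0.9, np.nan, 0.8, 0.95],
        "PS1_DR1__gMeanPSFMagErr": [0.01, 0.03, np.nan, 0.02],
        "PS1_DR1__gMeanPSFMag": [18.0, 19.0, np.nan, 20.0],
        "PS1_DR1__rMeanPSFMag": [17.5, 18.5, 18.6, 19.5],
    }
)

imputed = impute_features(
    toy,
    drop_cols=["AllWISE__w3sigmpro"],
    zero_cols=["mean_ztf_alert_braai"],
    median_cols=["PS1_DR1__gMeanPSFMagErr"],
    knn_cols=["PS1_DR1__gMeanPSFMag", "PS1_DR1__rMeanPSFMag"],
    n_neighbors=2,
)
print(imputed)
```

Applying the same function (and, ideally, the same fitted statistics) to both the training and inference samples would address the inconsistency described above.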

bfhealy · 2023-01-03T18:34:18Z

Using sklearn's KNNImputer (n_neighbors=5) takes ~2-3 minutes to run on 100,000 sources for the following feature subset:

['Gaia_EDR3__phot_bp_mean_mag',
 'Gaia_EDR3__phot_rp_mean_mag',
 'Gaia_EDR3__parallax',
 'PS1_DR1__gMeanPSFMag',
 'PS1_DR1__rMeanPSFMag',
 'PS1_DR1__iMeanPSFMag',
 'PS1_DR1__zMeanPSFMag',
 'PS1_DR1__yMeanPSFMag',
 'AllWISE__w1mpro',
 'AllWISE__w2mpro',
 'AllWISE__w3mpro',
 'AllWISE__w4mpro']

This is significantly slower than inference (~10 s) for the same number of sources, but it remains important to change our imputation strategy from its current state.
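
A rough way to reproduce this kind of timing on synthetic data is sketched below. The array shape, missing fraction, and random values are assumptions chosen so the example runs quickly; they are not the actual catalogue, and the measured time will not match the figure quoted above.

```python
import time

import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)

# Synthetic stand-in for the 12-feature subset (values are made up).
n_sources, n_features = 2000, 12
X = rng.normal(loc=18.0, scale=1.5, size=(n_sources, n_features))

# Randomly mark ~20% of entries as missing (assumed fraction).
mask = rng.random(X.shape) < 0.2
X[mask] = np.nan

imputer = KNNImputer(n_neighbors=5)

start = time.perf_counter()
X_imputed = imputer.fit_transform(X)
elapsed = time.perf_counter() - start
print(f"Imputed {n_sources} sources in {elapsed:.2f} s")

# The fitted imputer can be reused on new sources, so that training and
# inference share one imputation strategy.
X_new = rng.normal(loc=18.0, scale=1.5, size=(10, n_features))
X_new[0, 0] = np.nan
X_new_imputed = imputer.transform(X_new)
```

KNNImputer compares each incomplete source against the fitted sample, so the runtime grows quickly with the number of sources; timing a small subset first gives a feel for the cost before running on the full 100,000.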

bfhealy added the enhancement (New feature or request) label Dec 20, 2022

bfhealy mentioned this issue Dec 20, 2022

EPIC Scope catalogue #54

Open

48 tasks

bfhealy linked a pull request Jan 4, 2023 that will close this issue

Consistent/custom imputation strategy #207

Merged

bfhealy closed this as completed in #207 Jan 27, 2023
