Data scaling: row vs. column and naming of methods #482

elcorto · 2023-09-12T11:49:57Z

The DataScaler docs say (from mala/datahandling/data_scaler.py):


    Parameters
    ----------
    typestring :  string
        Specifies how scaling should be performed.
        Options:

        - "None": No normalization is applied.
        - "standard": Standardization (Scale to mean 0,
          standard deviation 1)
        - "normal": Min-Max scaling (Scale to be in range 0...1)
        - "feature-wise-standard": Row Standardization (Scale to mean 0,
          standard deviation 1)
        - "feature-wise-normal": Row Min-Max scaling (Scale to be in range
          0...1)

The "Row" in the feature-wise-* descriptions suggests that the inputs $X$ (e.g. bispectrum descriptors) and outputs $Y$ (LDOS) have shape (n_features, n_samples). However, the data in https://github.com/mala-project/test-data has shape (18, 18, 27, 94) ($X$) and (18, 18, 27, 11) ($Y$) and is reshaped to (18*18*27, n_features) in test/scaling_test.py. Further, the DataScaler code calls torch.mean(unscaled, 0, ...), which reduces over axis 0 (the samples) and so yields one value per column, i.e. per feature. So the feature-wise-* modes scale columns, not rows.
The docs should also state clearly that the "normal" and "standard" modes operate on the whole array. Users may otherwise assume that they operate along the axis opposite to the one used by the feature-wise-* modes.
Maybe "normal" should be renamed to something like "minmax", since "normalization" usually refers to something different.
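
To make the distinction concrete, here is a minimal sketch of the two behaviours described above. It uses NumPy rather than torch and does not call DataScaler. The array sizes, the random data and all variable names are made up for illustration, and it is not meant to reproduce MALA's exact numerics (for instance, NumPy's default standard deviation is the population one).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for one descriptor snapshot: a small 3x3x2 grid with
# 4 features, mimicking the (18, 18, 27, 94) layout on a smaller scale.
# Each feature gets a different offset and spread on purpose.
snapshot = rng.normal(
    loc=[0.0, 10.0, -5.0, 100.0],
    scale=[1.0, 2.0, 0.5, 20.0],
    size=(3, 3, 2, 4),
)

# Flatten the grid axes: rows are samples (grid points), columns are features.
X = snapshot.reshape(-1, snapshot.shape[-1])  # shape (18, 4)

# Whole-array standardization: a single mean and std over all entries.
X_standard = (X - X.mean()) / X.std()

# Feature-wise standardization: reduce over axis 0 (samples),
# which gives one mean and one std per column (feature).
X_fw_standard = (X - X.mean(axis=0, keepdims=True)) / X.std(axis=0, keepdims=True)

# Whole-array min-max scaling: a single min and max over all entries.
X_minmax = (X - X.min()) / (X.max() - X.min())

# Feature-wise min-max scaling: one min and one max per column (feature).
col_min = X.min(axis=0, keepdims=True)
col_max = X.max(axis=0, keepdims=True)
X_fw_minmax = (X - col_min) / (col_max - col_min)

print("column means, whole-array:", X_standard.mean(axis=0))
print("column means, feature-wise:", X_fw_standard.mean(axis=0))
```

With whole-array scaling the individual columns keep their relative offsets, so their means are not zero; only the feature-wise variants bring every column to mean 0 and standard deviation 1 (or to the range 0...1).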


elcorto added the documentation label Sep 12, 2023

elcorto mentioned this issue Sep 12, 2023

Data scaling: API and documentation #483

Open
