Data scaling: row vs. column and naming of methods #482

Open
elcorto opened this issue Sep 12, 2023 · 0 comments
Labels
documentation Improvements or additions to documentation

elcorto commented Sep 12, 2023

The DataScaler docs (from mala/datahandling/data_scaler.py) say:


    Parameters
    ----------
    typestring :  string
        Specifies how scaling should be performed.
        Options:

        - "None": No normalization is applied.
        - "standard": Standardization (Scale to mean 0,
          standard deviation 1)
        - "normal": Min-Max scaling (Scale to be in range 0...1)
        - "feature-wise-standard": Row Standardization (Scale to mean 0,
          standard deviation 1)
        - "feature-wise-normal": Row Min-Max scaling (Scale to be in range
          0...1)

  1. The "Row" in the feature-wise-* descriptions suggests that inputs $X$ (e.g. bispectrum descriptors) and outputs $Y$ (LDOS) have shape (n_features, n_samples). However, the data in https://github.com/mala-project/test-data has shape (18, 18, 27, 94) ($X$) and (18, 18, 27, 11) ($Y$), and test/scaling_test.py reshapes it to (18*18*27, n_features), i.e. (n_samples, n_features). Further, the DataScaler code calls torch.mean(unscaled, 0, ...), which reduces over axis 0 (samples) and therefore yields one value per column (feature). So the feature-wise-* modes scale column-wise, not row-wise.

  2. It should be clearly stated that the "normal" and "standard" modes compute their statistics over the whole array. Otherwise users may assume that these modes operate along the opposite axis of the one that the feature-wise-* modes use.

  3. Maybe "normal" should be renamed to something like "minmax", since "Normalization" is usually something different.
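
To illustrate points 1 and 2, here is a minimal sketch of the two behaviours in plain PyTorch. It does not call the DataScaler class; the toy array, the variable names and the use of torch.std with its default (unbiased) estimator are assumptions made for illustration and may differ from what MALA does internally.

```python
import torch

# Shape check only: snapshot-like data (18, 18, 27, 94) flattened to
# (n_samples, n_features), as described for test/scaling_test.py.
snapshot = torch.zeros(18, 18, 27, 94)
flat = snapshot.reshape(-1, 94)
n_samples, n_features = flat.shape  # 18*18*27 samples, 94 features

# Toy data (made up): 3 samples (rows), 2 features (columns).
x = torch.tensor([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 60.0]])

# "feature-wise" behaviour: reduce over dim 0 (samples),
# which gives one statistic per column (feature).
col_mean = torch.mean(x, 0, keepdim=True)        # shape (1, 2)
col_std = torch.std(x, 0, keepdim=True)          # shape (1, 2)
x_fw_standard = (x - col_mean) / col_std

col_min = torch.min(x, 0, keepdim=True).values   # shape (1, 2)
col_max = torch.max(x, 0, keepdim=True).values   # shape (1, 2)
x_fw_minmax = (x - col_min) / (col_max - col_min)

# Whole-array behaviour: a single scalar statistic for all entries.
x_standard = (x - torch.mean(x)) / torch.std(x)
x_minmax = (x - torch.min(x)) / (torch.max(x) - torch.min(x))
```

With column-wise min-max scaling every feature spans the full range 0...1, whereas with whole-array scaling only the globally smallest and largest entries reach 0 and 1, so a feature with small values (the first column here) stays compressed near 0.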
