Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make standardization of input optional in mlab.PCA #2752

Merged
merged 1 commit into from Jan 30, 2014
Merged

Make standardization of input optional in mlab.PCA #2752

merged 1 commit into from Jan 30, 2014

Conversation

aszilagyi
Copy link
Contributor

In principal component analysis, standardization of input data should be a user choice, depending on the problem at hand. The current version always standardizes, without offering a choice to the user. Therefore, I added an option to the PCA class where the user can specify whether the input data are to be standardized before performing the PCA. When standardize is set to False, only centering is performed, without dividing by sigma. The default is to standardize, to remain compatible with the current version.

In principal component analysis, standardization of input data should be a user choice, depending on the problem at hand. The current version always standardizes, without offering a choice to the user. Therefore, I added an option to the PCA class where the user can specify whether the input data are to be standardized before performing the PCA. When *standardize* is set to False, only centering is performed, without dividing by sigma. The default is to standardize, to remain compatible with the current version.
@tacaswell
Copy link
Member

This PR looks good 👍 to merge, but why does matplotlib have PCA code in it? Shouldn't this be handled in numpy or scipy?

@efiring
Copy link
Member

efiring commented Jan 23, 2014

@tacaswell Yes, PCA is the sort of thing that really doesn't belong in mpl. Same for spectral analysis; scipy would be the natural home. The contrary argument would be similar to the argument made for adding a statistics module to the Python standard library, which is that having such basic functionality readily available is a major convenience.

@WeatherGod
Copy link
Member

To provide a bit of historical context. The mlab module became a sort of
'stable" API that users could use during the whole churn of the
Numeric/NumPy transition. Also, I think mlab predates SciPy (I am not sure
about that, though). It also helped users who were transitioning over from
MatLab as these are tools that come standard with MatLab.

As for the question about adding a standardization option to the PCA
function, I would defer to whatever MatLab has available for its pca
function, as that is the primary reason for its existence.

@aszilagyi
Copy link
Contributor Author

Well, the matlab pca function does not have a standardize option, but it
does NOT standardize the data. It also has lots of other options. So it is
quite different from the mpl solution overall. BTW I think standardization
should not be the default; I only made it default so that existing code
does not produce different results with a new mpl version.

@efiring
Copy link
Member

efiring commented Jan 23, 2014

I agree that standardization is a poor default; if matlab does not even have a standardize option, it is an especially bad default.
@WeatherGod, actually the stable API for numpy/numarray/Numeric was numerix. I think mlab has always been more for matlab-like functions that require substantial computation apart from plotting.

tacaswell added a commit that referenced this pull request Jan 30, 2014
Make standardization of input optional in mlab.PCA
@tacaswell tacaswell merged commit 0511450 into matplotlib:master Jan 30, 2014
@tacaswell
Copy link
Member

Merged this as there was no protest and the failing tests are unrelated.

@aszilagyi aszilagyi deleted the patch-1 branch May 13, 2019 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants