Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed topographic_error() and quantization_error() #55

Merged
merged 7 commits into from Dec 12, 2019
Merged

Fixed topographic_error() and quantization_error() #55

merged 7 commits into from Dec 12, 2019

Conversation

wei-zhang-thz
Copy link
Contributor

Problems:

  • The previous topographic_error() method is incorrect. bmu_1 and bmu_2 are not the coordinates of the best two matching units.
  • The previous topographic_error() and quantization_error() uses explicit for-loops, which is very slow.

Fixes:

  • Fixed incorrect implementation of topographic_error() method.
  • Changed the topographic_error() and quantization_error() methods with vectorized implementation.

… the topographic_error() and quantization_error() methods with vectorized implementation.
@JustGlowing
Copy link
Owner

Hi there, thanks for your submission. I'll have a look next week.

@JustGlowing
Copy link
Owner

In the meanwhile, please make sure that the methods you are introducing are unit tested and make sure that pycodestyle doesn't flag any line.

@wei-zhang-thz
Copy link
Contributor Author

Thanks. The original commit only works for numpy.ndarray type input. I fixed this problem and now it passes the unit test.

@JustGlowing
Copy link
Owner

That's amazing!

Can you report there speedup achieved on the quantization error?

Also, please don't disable any of the pylint checks.

@wei-zhang-thz
Copy link
Contributor Author

The speed-up is very significant. On my PC, it's about 50 times faster. When the data is 300-by-50000, original methods take several minutes, while the new methods take seconds to complete.

The reason that I disabled the pylint check is because I encountered some issue similar to this: pylint-dev/pylint#2061. Not sure how to resolve it. Do you have suggestions on that?

@JustGlowing
Copy link
Owner

JustGlowing commented Dec 7, 2019 via email

@wei-zhang-thz
Copy link
Contributor Author

Please checkout the new version. Thanks!

minisom.py Outdated Show resolved Hide resolved
minisom.py Outdated Show resolved Hide resolved
minisom.py Outdated Show resolved Hide resolved
minisom.py Outdated Show resolved Hide resolved
minisom.py Outdated Show resolved Hide resolved
minisom.py Outdated Show resolved Hide resolved
@JustGlowing
Copy link
Owner

hi @wei-zhang-thz , I left my code review as promised. Let me know once you manage to have a look.

@JustGlowing
Copy link
Owner

JustGlowing commented Dec 9, 2019

Also, if you merge the new version of master in your code you'll find the test mentioned in a comment above.

@wei-zhang-thz
Copy link
Contributor Author

Thanks for the review! I will resolve the problems in one or two days.

@wei-zhang-thz
Copy link
Contributor Author

Please check the new version.

Your suggestion on using an alternative way to implement _distance_from_weights() is indeed very simple (but has some problem). A correct implementation would be something like: norm(input_data[:, :, None] - weights_flat.T[None, :, :], axis=1). However, this implementation does not work well for big data because of memory consumption.

Assume m is number of data entries, n is number of nodes, and k is data dimension, then the runtime analysis of the two implementations of _distance_from_weights() are:
Time: O[mnk] for both versions
Space: my version: O[mk + nk + mn] your version: O[mnk]
You can see the big difference in the space complexity.

minisom.py Outdated Show resolved Hide resolved
minisom.py Outdated Show resolved Hide resolved
@JustGlowing
Copy link
Owner

pycodestyle gives flags some lines now:

$ pycodestyle minisom.py                                                  
minisom.py:401:1: W293 blank line contains whitespace
minisom.py:404:80: E501 line too long (80 > 79 characters)
minisom.py:553:80: E501 line too long (130 > 79 characters)
minisom.py:624:19: E702 multiple statements on one line (semicolon)

minisom.py Outdated Show resolved Hide resolved
@wei-zhang-thz
Copy link
Contributor Author

Please check the new version. Thanks!

distances = self.som._distance_from_weights(data)
for i in range(len(data)):
for j in range(len(weights)):
assert(distances[i][j] == norm(data[i] - weights[j]))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a great test 👍

@JustGlowing
Copy link
Owner

I'm going to merge this now. 🎉 Thanks for your amazing contribution. You'll be mentioned in the notes of the next release and on a twitter announcement. Let me know if you have a twitter account.

@JustGlowing JustGlowing merged commit 0f997a9 into JustGlowing:master Dec 12, 2019
@wei-zhang-thz
Copy link
Contributor Author

Thank you! It's a great experience and I learned a lot from you. @835Aloha is my twitter account.

Manciukic pushed a commit to Manciukic/xpysom that referenced this pull request Jun 28, 2020
Fixed topographic_error() and quantization_error()
@JustGlowing
Copy link
Owner

hi there, I'm thinking about changing the license of Minisom to MIT (or GPL). This will allow Minisom to have a paper on the Journal of Open Source software. This WILL NOT affect the ownership of your contribution and does not imply that I will profit from Minisom. Your contribution was very welcome and I'd like to have your approval.

@wei-zhang-thz
Copy link
Contributor Author

wei-zhang-thz commented Jul 23, 2020 via email

JustGlowing added a commit that referenced this pull request Apr 26, 2021
Fixed topographic_error() and quantization_error()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants