Fixed topographic_error() and quantization_error() #55

wei-zhang-thz · 2019-12-06T20:29:56Z

Problems:

The previous topographic_error() method is incorrect: bmu_1 and bmu_2 are not the coordinates of the two best matching units.
The previous topographic_error() and quantization_error() methods use explicit for-loops, which are very slow.

Fixes:

Fixed the incorrect implementation of the topographic_error() method.
Reimplemented the topographic_error() and quantization_error() methods with vectorized code.
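
As an illustration of the fix described above, here is a minimal sketch of a vectorized topographic error. It is not the code from this PR. It assumes the weights are stored as a (rows, cols, dim) array on a rectangular grid and that two units count as adjacent when their grid distance is at most 1.42; the name `topographic_error_sketch` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def topographic_error_sketch(data, weights, threshold=1.42):
    """Fraction of samples whose two best matching units are not adjacent.

    data: (m, k) array; weights: (rows, cols, k) array.
    threshold is an assumed cutoff for "adjacent" on a rectangular grid.
    """
    rows, cols, dim = weights.shape
    weights_flat = weights.reshape(-1, dim)

    # Squared distances via ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2.
    sq_dist = (
        (data ** 2).sum(axis=1, keepdims=True)
        - 2.0 * data @ weights_flat.T
        + (weights_flat ** 2).sum(axis=1)
    )

    # Flat indices of the best and second-best unit for every sample.
    best_two = np.argsort(sq_dist, axis=1)[:, :2]

    # Convert flat indices to grid coordinates (one array per grid axis).
    coords = np.unravel_index(best_two, (rows, cols))
    unit_rows = coords[0]
    unit_cols = coords[1]

    # Grid distance between the best and second-best unit of each sample.
    grid_dist = np.hypot(
        unit_rows[:, 0] - unit_rows[:, 1],
        unit_cols[:, 0] - unit_cols[:, 1],
    )
    return float((grid_dist > threshold).mean())
```

The key point is that the two best matching units are converted to grid coordinates before their separation is measured, so the result depends on where the units sit on the map rather than on their flat indices.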


JustGlowing · 2019-12-06T20:34:24Z

Hi there, thanks for your submission. I'll have a look next week.

JustGlowing · 2019-12-06T20:43:41Z

In the meantime, please make sure that the methods you are introducing are unit tested and that pycodestyle doesn't flag any lines.

wei-zhang-thz · 2019-12-06T21:38:56Z

Thanks. The original commit only works for numpy.ndarray type input. I fixed this problem and now it passes the unit test.

JustGlowing · 2019-12-07T05:40:26Z

That's amazing!

Can you report the speedup achieved on the quantization error?

Also, please don't disable any of the pylint checks.

wei-zhang-thz · 2019-12-07T07:44:18Z

The speed-up is very significant. On my PC, it's about 50 times faster. When the data is 300-by-50000, original methods take several minutes, while the new methods take seconds to complete.

I disabled the pylint check because I ran into an issue similar to this one: pylint-dev/pylint#2061. I'm not sure how to resolve it. Do you have any suggestions?

JustGlowing · 2019-12-07T07:48:05Z

I would just avoid unpacking the two elements in a single line.


wei-zhang-thz · 2019-12-07T18:05:22Z

Please check out the new version. Thanks!


JustGlowing · 2019-12-09T09:47:34Z

Hi @wei-zhang-thz, I left my code review as promised. Let me know once you've had a look.

JustGlowing · 2019-12-09T10:24:40Z

Also, if you merge the new version of master into your code, you'll find the test mentioned in a comment above.

wei-zhang-thz · 2019-12-10T09:58:00Z

Thanks for the review! I will resolve the problems in one or two days.

…ts() method.

wei-zhang-thz · 2019-12-11T08:30:22Z

Please check the new version.

Your suggestion to use an alternative implementation of _distance_from_weights() is indeed very simple, but it has a problem. A correct implementation would be something like: norm(input_data[:, :, None] - weights_flat.T[None, :, :], axis=1). However, this implementation does not work well for large data sets because of its memory consumption.

Let m be the number of data entries, n the number of nodes, and k the data dimension. The complexity of the two implementations of _distance_from_weights() is then:
Time: O(mnk) for both versions
Space: O(mk + nk + mn) for my version, O(mnk) for your version
The space complexity is where the two versions differ substantially.
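
To make the space comparison above concrete, here is a small sketch of the two approaches. The broadcast version follows the expression quoted in this comment. The lower-memory version is an assumption: it uses the expansion ||x - w||^2 = ||x||^2 - 2 x·w + ||w||^2, which may differ from the code in this PR. The function names are hypothetical.

```python
import numpy as np


def distances_broadcast(input_data, weights_flat):
    """Builds an (m, k, n) intermediate array: O(mnk) extra memory."""
    diff = input_data[:, :, None] - weights_flat.T[None, :, :]
    return np.linalg.norm(diff, axis=1)


def distances_expanded(input_data, weights_flat):
    """Only allocates (m, n)-sized arrays beyond the inputs."""
    data_sq = (input_data ** 2).sum(axis=1, keepdims=True)       # (m, 1)
    weights_sq = (weights_flat ** 2).sum(axis=1, keepdims=True)  # (n, 1)
    cross = input_data @ weights_flat.T                          # (m, n)
    sq = data_sq - 2.0 * cross + weights_sq.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(sq, 0.0))


rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))     # m = 200, k = 5
weights = rng.normal(size=(36, 5))   # n = 36, e.g. a flattened 6x6 map
same = np.allclose(
    distances_broadcast(data, weights),
    distances_expanded(data, weights),
)
```

As a rough illustration, with m = 50,000, k = 300 and a hypothetical n = 100 nodes, the (m, k, n) intermediate alone would hold 1.5 billion float64 values (about 12 GB), while the (m, n) result holds 5 million values (about 40 MB).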


JustGlowing · 2019-12-11T09:42:21Z

pycodestyle flags some lines now:

$ pycodestyle minisom.py
minisom.py:401:1: W293 blank line contains whitespace
minisom.py:404:80: E501 line too long (80 > 79 characters)
minisom.py:553:80: E501 line too long (130 > 79 characters)
minisom.py:624:19: E702 multiple statements on one line (semicolon)


wei-zhang-thz · 2019-12-12T01:44:41Z

Please check the new version. Thanks!

JustGlowing · 2019-12-12T08:18:27Z

minisom.py

+        distances = self.som._distance_from_weights(data)
+        for i in range(len(data)):
+            for j in range(len(weights)):
+                assert(distances[i][j] == norm(data[i] - weights[j]))


this is a great test 👍

JustGlowing · 2019-12-12T08:25:12Z

I'm going to merge this now. 🎉 Thanks for your amazing contribution. You'll be mentioned in the notes of the next release and on a twitter announcement. Let me know if you have a twitter account.

wei-zhang-thz · 2019-12-12T08:46:42Z

Thank you! It's a great experience and I learned a lot from you. @835Aloha is my twitter account.


JustGlowing · 2020-07-23T16:46:50Z

Hi there, I'm thinking about changing the license of Minisom to MIT (or GPL). This will allow Minisom to have a paper in the Journal of Open Source Software. This WILL NOT affect the ownership of your contribution and does not imply that I will profit from Minisom. Your contribution was very welcome and I'd like to have your approval.

wei-zhang-thz · 2020-07-23T16:58:17Z

Hello, thanks for letting me know. I have no problem with the license change.



Fixed incorrect implementation of topographic_error() method. Changed…

be60efb

… the topographic_error() and quantization_error() methods with vectorized implementation.

Modified to pass the unit test and style check.

b3d86df

Fixed pylint problem.

d304ed5

JustGlowing reviewed Dec 9, 2019
