
Emitting log file causes Python kernel crashing #53

Closed
mailology opened this issue Jun 4, 2021 · 3 comments

Comments

@mailology

When testing the Python code on the MNIST data with the naive PAM algorithm, adding `verbosity = 1` crashes the kernel. In particular, the following code causes the kernel to crash:

```python
import pandas as pd
from sklearn.manifold import TSNE
from banditpam import KMedoids

X = pd.read_csv('data/MNIST-1k.csv', sep=' ', header=None).to_numpy()
X_tsne = TSNE(n_components=2).fit_transform(X)

kmed = KMedoids(n_medoids=10, algorithm="naive", verbosity=1)
kmed.fit(X, 'L2', 10, "naive_v1_mnist_log")
```

The above code runs properly if `verbosity = 1` is removed. If we change the algorithm to "BanditPAM", `verbosity = 1` causes no issue and the log file is generated properly.

@motiwari
Owner

motiwari commented Jul 3, 2021

Nice find, @mailology! Think you can take a look?

@mailology
Author

The crash happens because we forgot to update the swap loss, `logHelper.loss_swap`, in the swap step of the naive algorithm. We also forgot to update the number of swaps, stored in the variable `steps`. I have fixed both and rerun the same example as above:
[Screenshot, 2021-07-08 at 12:22:48 PM]

It works now and emits the following log file:

Built:891,392,354,714,23,805,527,777,251,972
Swapped:694,168,306,714,324,959,527,800,251,737
Num Swaps: 10
Final Loss: 7.44375
Build Logstring:
		:compute_exactly
		:loss
		:p
		:sigma
Swap Logstring:
		:compute_exactly
		:loss
				0: 7.52346
				1: 7.50876
				2: 7.49285
				3: 7.48046
				4: 7.47164
				5: 7.46393
				6: 7.45825
				7: 7.44646
				8: 7.44375
				9: 7.44375
		:p
		:sigma
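The fix described above (recording the swap loss on every iteration and counting swaps in `steps`) can be sketched in pure Python as follows. This is a hypothetical illustration, not BanditPAM's actual C++ implementation: `LogHelper`, `naive_swap`, `l2`, and `total_loss` are names invented here, and the loss is assumed to be the mean L2 distance from each point to its nearest medoid.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_loss(data: Sequence[Sequence[float]], medoids: Sequence[int]) -> float:
    """Assumed loss: mean distance from each point to its nearest medoid."""
    return sum(min(l2(p, data[m]) for m in medoids) for p in data) / len(data)


@dataclass
class LogHelper:
    """Hypothetical stand-in for the logging helper discussed in this issue."""
    loss_swap: List[float] = field(default_factory=list)


def naive_swap(
    data: Sequence[Sequence[float]],
    medoids: List[int],
    log_helper: LogHelper,
    max_iter: int = 100,
) -> Tuple[List[int], int]:
    """Exhaustive PAM SWAP step that logs the loss and counts iterations."""
    medoids = list(medoids)
    steps = 0
    for _ in range(max_iter):
        best_loss = total_loss(data, medoids)
        best_swap = None
        # Try replacing every medoid with every non-medoid point.
        for i in range(len(medoids)):
            for candidate in range(len(data)):
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                loss = total_loss(data, trial)
                if loss < best_loss:
                    best_loss, best_swap = loss, (i, candidate)
        steps += 1                              # update the swap counter
        log_helper.loss_swap.append(best_loss)  # update the swap loss log
        if best_swap is None:                   # no improving swap: converged
            break
        i, candidate = best_swap
        medoids[i] = candidate
    return medoids, steps


log = LogHelper()
medoids, steps = naive_swap([[0.0], [1.0], [10.0], [11.0]], [0, 1], log)
print(medoids, steps, log.loss_swap)  # [2, 1] 2 [0.5, 0.5]
```

Because the loss is appended before the convergence check, the last iteration (where no improving swap exists) also gets an entry, so `steps` always equals `len(log.loss_swap)` in this sketch.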

Since the build step is greedy and we go through all possible iterations, do we still want to log any loss information there?

@motiwari
Owner

motiwari commented Jul 8, 2021

Love it!

And yes, we should fill out all of those fields, including the ones for the build step and the `compute_exactly`, `p`, `sigma`, etc. fields, as in the prior log files: https://github.com/motiwari/BanditPAM-python/blob/master/profiles/MNIST_L2_k10_paper/L-ucb-True-BS-v-0-k-10-N-1000-s-42-d-MNIST-m-L2-w-
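Filling in the build fields could follow the same pattern as the swap fix: record the loss after each greedy BUILD addition. The sketch below is hypothetical (`naive_build`, `format_loss_section`, `l2`, and `total_loss` are illustrative names, and the loss is again assumed to be the mean L2 distance to the nearest medoid). Its output layout imitates the `:loss` entries in the log file above, but it is not BanditPAM's log writer and does not attempt the `compute_exactly`, `p`, or `sigma` fields.

```python
from typing import List, Sequence, Tuple


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_loss(data: Sequence[Sequence[float]], medoids: Sequence[int]) -> float:
    """Assumed loss: mean distance from each point to its nearest medoid."""
    return sum(min(l2(p, data[m]) for m in medoids) for p in data) / len(data)


def naive_build(data: Sequence[Sequence[float]], k: int) -> Tuple[List[int], List[float]]:
    """Greedy PAM BUILD step that records the loss after each added medoid."""
    medoids: List[int] = []
    loss_build: List[float] = []
    for _ in range(k):
        candidates = [c for c in range(len(data)) if c not in medoids]
        # Greedily add the point that lowers the loss the most (first wins on ties).
        best = min(candidates, key=lambda c: total_loss(data, medoids + [c]))
        medoids.append(best)
        loss_build.append(total_loss(data, medoids))
    return medoids, loss_build


def format_loss_section(title: str, losses: Sequence[float]) -> str:
    """Render losses in a layout imitating the log file in this thread."""
    lines = [f"{title} Logstring:", "\t\t:loss"]
    lines += [f"\t\t\t\t{i}: {loss:g}" for i, loss in enumerate(losses)]
    return "\n".join(lines)


medoids, loss_build = naive_build([[0.0], [1.0], [10.0], [11.0]], k=2)
print(medoids, loss_build)  # [1, 2] [5.0, 0.5]
print(format_loss_section("Build", loss_build))
```

Even though BUILD is greedy, logging one loss per added medoid shows how much each step contributes, which parallels the per-iteration swap losses.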

motiwari added this to the 7-16-21 Milestone milestone on Jul 9, 2021
motiwari added a commit that referenced this issue Jul 16, 2021
complete the logfile for naive algorithm. Fixes #53