Skip to content

[SYSTEMDS-200] Gaussian Mixture Model builtin#976

Closed
Shafaq-Siddiqi wants to merge 4 commits into
apache:masterfrom
Shafaq-Siddiqi:GMM
Closed

[SYSTEMDS-200] Gaussian Mixture Model builtin#976
Shafaq-Siddiqi wants to merge 4 commits into
apache:masterfrom
Shafaq-Siddiqi:GMM

Conversation

@Shafaq-Siddiqi
Copy link
Copy Markdown
Contributor

New Builtin for Gaussian Mixture Model with four covariance types namely VVV, EEE, diag and spherical (please see the script header for more details).

Spark tests for using kmeans as initialization function are commented out due to a runtime exception in kmeans execution on the following instruction.

SPARK°rexpand°cast=true°max=_Var872°ignore=false°dir=cols°target=_mVar884°_mVar885·MATRIX·FP64

@Shafaq-Siddiqi Shafaq-Siddiqi changed the title [WIP]: [SYSTEMDS-200] Gaussian Mixture Model builtin [SYSTEMDS-200] Gaussian Mixture Model builtin Jun 18, 2020
private void runGMMTest(int G_mixtures, String model, String init_param, int iter, double reg, boolean rewrite,
LopProperties.ExecType instType) {
Types.ExecMode platformOld = setExecMode(instType);
OptimizerUtils.ALLOW_ALGEBRAIC_SIMPLIFICATION = rewrite;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark test passed after setting OptimizerUtils.ALLOW_ALGEBRAIC_SIMPLIFICATION to false.

Copy link
Copy Markdown
Contributor

@Baunsgaard Baunsgaard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Shafaq-Siddiqi,

I think this looks awesome, 🥇
Unfortunately I'm not that well versed in the internals of GMM so i hope the math is in order, but looking at the internal operations i really look forward to try it out on the compressed representation!

I have some minor comments and questions you that would be great if you could answer!

On the less important side, the script gmm.dml as it is now does not follow what i thought to be the coding guidelines.

  1. Variable inputs names should be CamelCase.
  2. Indentation should be tabs not double space.

I'm fine if you don't want to change it, but i really value if we could aim for some consistency between our scripts. I can also change my formatting to the settings you use.

I also can't help but notice that there are no doc entry added for this new operation, that would also be great!

Comment thread src/test/java/org/apache/sysds/test/functions/builtin/BuiltinGMMTest.java Outdated
Comment thread src/test/java/org/apache/sysds/test/functions/builtin/BuiltinGMMTest.java Outdated
Comment thread src/test/scripts/functions/transform/input/iris/iris.csv
Comment thread scripts/builtin/gmm.dml
Comment thread scripts/builtin/gmm.dml
{
resp = Rand(rows = nrow(X), cols=n_components)
resp = resp/rowSums(resp)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what other initialization methods could be applied and to what benefits?

Comment thread scripts/builtin/gmm.dml
Comment thread scripts/builtin/gmm.dml
else if (model == "VII")
cov_param = n_components
else
stop("invalid model expecting any of [VVV,EEE,VVI,VII], found "+model)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the previous validation step's stop, not catch this?

Comment thread scripts/builtin/gmm.dml
# NAME TYPE DEFAULT MEANING
# ---------------------------------------------------------------------------------------------
# X Double --- Matrix X
# n_components Integer 3 Number of n_components in the Gaussian mixture model
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default n_components value is 1 in line 55.

@Shafaq-Siddiqi
Copy link
Copy Markdown
Contributor Author

Merged into Master.

@Shafaq-Siddiqi Shafaq-Siddiqi deleted the GMM branch August 5, 2020 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants