Commit
version 1.1.0
mlampros authored and cran-robot committed Jan 17, 2018
1 parent c0e11aa commit d73c90b
Showing 10 changed files with 81 additions and 40 deletions.
8 changes: 4 additions & 4 deletions DESCRIPTION
@@ -2,8 +2,8 @@ Package: ClusterR
Type: Package
Title: Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans and
K-Medoids Clustering
-Version: 1.0.9
-Date: 2017-11-30
+Version: 1.1.0
+Date: 2018-01-16
Author: Lampros Mouselimis <mouselimislampros@gmail.com>
Maintainer: Lampros Mouselimis <mouselimislampros@gmail.com>
BugReports: https://github.com/mlampros/ClusterR/issues
@@ -22,6 +22,6 @@ Suggests: testthat, covr, knitr, rmarkdown
VignetteBuilder: knitr
RoxygenNote: 6.0.1
NeedsCompilation: yes
-Packaged: 2017-11-30 12:02:51 UTC; lampros
+Packaged: 2018-01-16 15:22:47 UTC; lampros
Repository: CRAN
-Date/Publication: 2017-11-30 13:00:05 UTC
+Date/Publication: 2018-01-17 18:01:58 UTC
18 changes: 9 additions & 9 deletions MD5
@@ -1,23 +1,23 @@
-4ba5c0512cea5b202d6aa4865b8d3d48 *DESCRIPTION
+ace8b8631582c6ecac17bdd2655da786 *DESCRIPTION
c9c69a326502c31a65128654c3c08cb5 *LICENSE
1cf3e5d5262851d34692258a58312008 *NAMESPACE
-76a37150f8b9bd10d49f44cebc4a1570 *NEWS.md
+a526b34b4bf940e7fec070072b6e2d0c *NEWS.md
19f3ce55803fd8c849bdfa8fd3cc831b *R/RcppExports.R
-afcc201781a94cc54334bf81ad399e07 *R/clustering_functions.R
+7baeb6643966f872dd5ee7f39fc03149 *R/clustering_functions.R
ede10384907019d8669da63646557e31 *README.md
-c752579b23f60b5b51aa1e39e309fffc *build/vignette.rds
+e037a0892dccbc8c11badfea2e787664 *build/vignette.rds
8b6687f3bb9c58cd74ae67670872215d *data/dietary_survey_IBS.rda
aa58963ebd13c4c91edf809ca4efc5d4 *data/mushroom.rda
a9e3dfd8650ed7d2d0a91d3880a67f7b *data/soybean.rda
128fe5f74b8c6787e9c378b61bd8d423 *inst/doc/the_clusterR_package.R
86f33097b23d7a5acc0c83647a19d865 *inst/doc/the_clusterR_package.Rmd
-bee08e59baf7f8da5a698c0607cae625 *inst/doc/the_clusterR_package.html
+2891d5475073f815afc44c8b859ed5a1 *inst/doc/the_clusterR_package.html
94cd0d57e5b44e47a3746a7979a4088f *man/Clara_Medoids.Rd
c4f35a38b1a4271caf0a29ed616c45ba *man/Cluster_Medoids.Rd
853d94cb9bf0db952fee7ce47788baa5 *man/GMM.Rd
add088ce3a97143c6a7920c2a5d04df4 *man/KMeans_arma.Rd
bb8de9fbba458b06aff94061d2d3765e *man/KMeans_rcpp.Rd
-66152d79ea97f9fc524403e8a8f4acc6 *man/MiniBatchKmeans.Rd
+2219970918b3d69ad736f506c7b356b1 *man/MiniBatchKmeans.Rd
e0427f0525c0407bbdcb48ca5dc8f7ca *man/Optimal_Clusters_GMM.Rd
a0641377964d9bb3e2045614b50ffbd5 *man/Optimal_Clusters_KMeans.Rd
c84b2645d14b6c057736ea9015fcc582 *man/Optimal_Clusters_Medoids.Rd
@@ -38,11 +38,11 @@ cd805bfc471c3894870346627bb3b047 *man/predict_Medoids.Rd
6c290a2ae91a58e535ef1adac77e8e57 *man/tryCatch_GMM.Rd
9e21127499fb7ce22636287667ea3b34 *man/tryCatch_KMEANS_arma.Rd
62231fa9f6be487bdbf9f95625d39c0b *man/tryCatch_optimal_clust_GMM.Rd
-16d006bf40b078243e18207776416cf7 *src/Makevars
-18579fb24623b1813f3972b33fe55dac *src/Makevars.win
+c0b0887d0e21dba427791387a69357a4 *src/Makevars
+3924f33984346427452b9a43de77a4c8 *src/Makevars.win
ba1dafe01a73b757ea17e6c04dee7b39 *src/RcppExports.cpp
8be2c73e7a0e14a4500c68d9444cc5cd *src/init.c
-0ba1c6644b1357bcce0fcb3aeb604886 *src/kmeans_miniBatchKmeans_GMM_Medoids.cpp
+02132d8c61a2e864cbe93d557dd70829 *src/kmeans_miniBatchKmeans_GMM_Medoids.cpp
580cce096456bdd88fc875489a33be99 *src/utils_rcpp.cpp
76f46da484cec9e5763f21f16c309c50 *src/utils_rcpp.h
3243b3e7b85ca7953f191679629429ec *tests/testthat.R
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,4 +1,10 @@

+## ClusterR 1.1.0
+
+* I added the *-DARMA_64BIT_WORD* flag in the Makevars file to allow the package to process big datasets.
+* I modified the *kmeans_miniBatchKmeans_GMM_Medoids.cpp* file, especially all *Rcpp::List::create()* calls, to address the clang-ASAN errors.
+
+
## ClusterR 1.0.9

* I modified the *Optimal_Clusters_KMeans* function to return a vector with the *distortion_fK* values if criterion is *distortion_fK* (instead of the *WCSSE* values).
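The 1.0.9 entry mentions the *distortion_fK* criterion. Below is a minimal sketch of turning per-K distortions into f(K) values, assuming the definition from Pham, Dimov and Nguyen (2005), "Selection of K in K-means clustering". `distortion_fk` is a hypothetical name, and this is not necessarily how `Optimal_Clusters_KMeans` computes the values internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper. S[k - 1] is the total within-cluster distortion S_k of a
// k-means run with k clusters (k = 1..K); n_dims is the number of features N_d.
// Returns f(k) for k = 1..K following Pham, Dimov and Nguyen (2005).
std::vector<double> distortion_fk(const std::vector<double>& S, std::size_t n_dims) {
    assert(n_dims > 1);                        // the weight formula assumes N_d > 1
    std::vector<double> f(S.size(), 1.0);      // f(1) = 1 by definition
    double alpha = 0.0;
    for (std::size_t i = 1; i < S.size(); ++i) {   // i = k - 1, so k >= 2
        // alpha_2 = 1 - 3 / (4 N_d); alpha_k = alpha_{k-1} + (1 - alpha_{k-1}) / 6 for k > 2
        alpha = (i == 1) ? 1.0 - 3.0 / (4.0 * static_cast<double>(n_dims))
                         : alpha + (1.0 - alpha) / 6.0;
        // f(k) = S_k / (alpha_k * S_{k-1}), or 1 when S_{k-1} == 0
        f[i] = (S[i - 1] == 0.0) ? 1.0 : S[i] / (alpha * S[i - 1]);
    }
    return f;
}
```

Roughly speaking, a lower f(K) means that going from K - 1 to K clusters reduced the distortion more than the weighting factor alone would predict, so those K values are the better-supported choices.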
2 changes: 1 addition & 1 deletion R/clustering_functions.R
@@ -811,7 +811,7 @@ Optimal_Clusters_KMeans = function(data, max_clusters, criterion = "variance_exp
#' \strong{random} : random selection of data rows as initial centroids
#'
#' @references
-#' http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf
+#' http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf, https://github.com/siddharth-agrawal/Mini-Batch-K-Means
#' @export
#' @examples
#'
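The references point to Sculley's mini-batch k-means paper. Below is a minimal sketch of one iteration of the update described there, using per-centroid counts `v[c]` and a learning rate of `1 / v[c]`. It assumes dense data stored as `std::vector<std::vector<double>>`. `nearest` and `minibatch_step` are hypothetical names, and ClusterR's `MiniBatchKmeans` adds initializers, early stopping and other options that are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Point = std::vector<double>;

// Index of the centroid closest to x under squared Euclidean distance.
std::size_t nearest(const std::vector<Point>& C, const Point& x) {
    std::size_t best = 0;
    double best_d = std::numeric_limits<double>::max();
    for (std::size_t j = 0; j < C.size(); ++j) {
        double d = 0.0;
        for (std::size_t p = 0; p < x.size(); ++p) {
            const double diff = C[j][p] - x[p];
            d += diff * diff;
        }
        if (d < best_d) { best_d = d; best = j; }
    }
    return best;
}

// One mini-batch iteration: sample `batch` rows of X, cache each sample's
// nearest centroid, then move that centroid toward the sample with the
// per-centroid learning rate eta = 1 / v[c].
void minibatch_step(std::vector<Point>& C, std::vector<std::size_t>& v,
                    const std::vector<Point>& X, std::size_t batch, std::mt19937& rng) {
    assert(!X.empty() && !C.empty() && v.size() == C.size());
    std::uniform_int_distribution<std::size_t> pick(0, X.size() - 1);
    std::vector<std::size_t> M(batch), d(batch);
    for (std::size_t i = 0; i < batch; ++i) M[i] = pick(rng);
    for (std::size_t i = 0; i < batch; ++i) d[i] = nearest(C, X[M[i]]);  // assignments use pre-update centroids
    for (std::size_t i = 0; i < batch; ++i) {
        const std::size_t c = d[i];
        v[c] += 1;
        const double eta = 1.0 / static_cast<double>(v[c]);
        for (std::size_t p = 0; p < C[c].size(); ++p)
            C[c][p] = (1.0 - eta) * C[c][p] + eta * X[M[i]][p];
    }
}
```

Because the step size is `1 / v[c]`, each centroid ends up as the running mean of every sample assigned to it. Its updates therefore shrink as it accumulates assignments.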
Binary file modified build/vignette.rds
12 changes: 6 additions & 6 deletions inst/doc/the_clusterR_package.html
@@ -12,7 +12,7 @@

<meta name="author" content="Lampros Mouselimis" />

-<meta name="date" content="2017-11-30" />
+<meta name="date" content="2018-01-16" />

<title>Functionality of the ClusterR package</title>

@@ -70,7 +70,7 @@

<h1 class="title toc-ignore">Functionality of the ClusterR package</h1>
<h4 class="author"><em>Lampros Mouselimis</em></h4>
-<h4 class="date"><em>2017-11-30</em></h4>
+<h4 class="date"><em>2018-01-16</em></h4>



@@ -258,7 +258,7 @@ <h5>Mini-batch-kmeans</h5>
t =<span class="st"> </span>end <span class="op">-</span><span class="st"> </span>start

<span class="kw">cat</span>(<span class="st">'time to complete :'</span>, t, <span class="kw">attributes</span>(t)<span class="op">$</span>units, <span class="st">'</span><span class="ch">\n</span><span class="st">'</span>)</code></pre></div>
-<pre><code>## time to complete : 2.275446 secs</code></pre>
+<pre><code>## time to complete : 2.275485 secs</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">getcent_init =<span class="st"> </span>km_init<span class="op">$</span>centroids

getclust_init =<span class="st"> </span>km_init<span class="op">$</span>clusters
@@ -286,7 +286,7 @@ <h5>Mini-batch-kmeans</h5>
t =<span class="st"> </span>end <span class="op">-</span><span class="st"> </span>start

<span class="kw">cat</span>(<span class="st">'time to complete :'</span>, t, <span class="kw">attributes</span>(t)<span class="op">$</span>units, <span class="st">'</span><span class="ch">\n</span><span class="st">'</span>)</code></pre></div>
-<pre><code>## time to complete : 0.8078771 secs</code></pre>
+<pre><code>## time to complete : 0.8378835 secs</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">getcent_mb =<span class="st"> </span>km_mb<span class="op">$</span>centroids

new_im_mb =<span class="st"> </span>getcent_mb[pr_mb, ] <span class="co"># each observation is associated with the nearby centroid</span>
@@ -490,7 +490,7 @@ <h5>K-Medoids</h5>
t =<span class="st"> </span>end <span class="op">-</span><span class="st"> </span>start

<span class="kw">cat</span>(<span class="st">'time to complete :'</span>, t, <span class="kw">attributes</span>(t)<span class="op">$</span>units, <span class="st">'</span><span class="ch">\n</span><span class="st">'</span>)</code></pre></div>
-<pre><code>## time to complete : 2.711706 secs</code></pre>
+<pre><code>## time to complete : 3.194762 secs</code></pre>
<p><br></p>
<table>
<caption>hamming-Clara-Medoids</caption>
@@ -522,7 +522,7 @@ <h5>K-Medoids</h5>
t =<span class="st"> </span>end <span class="op">-</span><span class="st"> </span>start

<span class="kw">cat</span>(<span class="st">'time to complete :'</span>, t, <span class="kw">attributes</span>(t)<span class="op">$</span>units, <span class="st">'</span><span class="ch">\n</span><span class="st">'</span>)</code></pre></div>
-<pre><code>## time to complete : 12.99718 secs</code></pre>
+<pre><code>## time to complete : 16.61397 secs</code></pre>
<p><br></p>
<table>
<caption>hamming-Cluster-Medoids</caption>
2 changes: 1 addition & 1 deletion man/MiniBatchKmeans.Rd


2 changes: 1 addition & 1 deletion src/Makevars
@@ -1,4 +1,4 @@
-PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS)
+PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) -DARMA_64BIT_WORD
PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) $(SHLIB_OPENMP_CXXFLAGS)
CXX_STD = CXX11
PKG_CPPFLAGS = -I../inst/include/
2 changes: 1 addition & 1 deletion src/Makevars.win
@@ -1,4 +1,4 @@
-PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS)
+PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) -DARMA_64BIT_WORD
PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) $(SHLIB_OPENMP_CXXFLAGS) -mthreads
CXX_STD = CXX11
PKG_CPPFLAGS = -I../inst/include/
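The `-DARMA_64BIT_WORD` compiler flag defines the `ARMA_64BIT_WORD` macro. The Armadillo documentation describes this macro as switching to 64-bit integers so that matrices can hold more than roughly 4 billion elements. RcppArmadillo is understood to use 32-bit words otherwise, but treat that as an assumption to check against the installed version. Below is a minimal sketch of the arithmetic behind the limit. `fits_in_32bit_index` is a hypothetical helper, not an Armadillo function.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: can an n_rows x n_cols matrix be indexed with a 32-bit
// unsigned word, i.e. is n_rows * n_cols <= 2^32 - 1?
bool fits_in_32bit_index(std::uint64_t n_rows, std::uint64_t n_cols) {
    const std::uint64_t max32 = std::numeric_limits<std::uint32_t>::max();  // 4294967295
    // Bounding each factor keeps the product below 2^64, so it cannot overflow.
    assert(n_rows <= max32 && n_cols <= max32);
    return n_rows * n_cols <= max32;
}
```

For example, a 100,000 x 50,000 matrix has 5 billion elements, about 40 GB as doubles. It fails this check and would need the 64-bit setting.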
69 changes: 52 additions & 17 deletions src/kmeans_miniBatchKmeans_GMM_Medoids.cpp
@@ -546,8 +546,10 @@ Rcpp::List evaluation_rcpp(arma::mat& data, arma::vec CLUSTER, bool silhouette =
Rcpp::List in_cluster_dist = INTRA_CLUSTER_DISS(data, tmp_clust);

if (!silhouette) {
+
+arma::rowvec befout_CLUSTER = arma::conv_to< arma::rowvec >::from(CLUSTER);

-return(Rcpp::List::create(Rcpp::Named("clusters") = arma::conv_to< arma::rowvec >::from(CLUSTER),
+return(Rcpp::List::create(Rcpp::Named("clusters") = befout_CLUSTER,

Rcpp::Named("cluster_indices") = tmp_clust, // use the data indices, otherwise difficult to match clusters with silhouette or dissimilarity coefficients

@@ -557,8 +559,10 @@ Rcpp::List evaluation_rcpp(arma::mat& data, arma::vec CLUSTER, bool silhouette =
else {

Rcpp::List silhouet_out = SILHOUETTE_metric(data, CLUSTER, tmp_clust, in_cluster_dist);
+
+arma::rowvec befout_CLUSTER = arma::conv_to< arma::rowvec >::from(CLUSTER);

-return(Rcpp::List::create(Rcpp::Named("clusters") = arma::conv_to< arma::rowvec >::from(CLUSTER),
+return(Rcpp::List::create(Rcpp::Named("clusters") = befout_CLUSTER,

Rcpp::Named("cluster_indices") = tmp_clust, // use the data indices, otherwise difficult to match clusters with silhouette or dissimilarity coefficients

@@ -949,12 +953,20 @@ Rcpp::List GMM_arma(arma::mat& data, int gaussian_comps, std::string dist_mode,
double n = timer.toc();

if (verbose) { Rcpp::Rcout << "\ntime to complete : " << n << "\n" << std::endl; }
+
+arma::mat model_means = model.means.t();
+
+arma::mat model_dcovs = model.dcovs.t();
+
+arma::rowvec model_hefts = arma::conv_to< arma::rowvec >::from(model.hefts.t());
+
+double model_avg_log_p = model.avg_log_p(data.t(), gaussian_comps - 1);

-return Rcpp::List::create( Rcpp::Named("centroids") = model.means.t(), Rcpp::Named("covariance_matrices") = model.dcovs.t(), // each row of the 'covariance_matrices' is a different covariance matrix, use diag() to build each square diagonal matrix
+return Rcpp::List::create( Rcpp::Named("centroids") = model_means, Rcpp::Named("covariance_matrices") = model_dcovs, // each row of the 'covariance_matrices' is a different covariance matrix, use diag() to build each square diagonal matrix

-Rcpp::Named("weights") = model.hefts.t(), Rcpp::Named("Log_likelihood_raw") = loglik,
+Rcpp::Named("weights") = model_hefts, Rcpp::Named("Log_likelihood_raw") = loglik,

-Rcpp::Named("avg_Log_likelihood_DATA") = model.avg_log_p(data.t(), gaussian_comps - 1) );
+Rcpp::Named("avg_Log_likelihood_DATA") = model_avg_log_p );
}


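Each hunk in this file follows one pattern: every value is evaluated into a named local (`model_means`, `model_dcovs`, `model_hefts`, `model_avg_log_p`, and so on) before `Rcpp::List::create()` is called. The NEWS entry links this change to clang-ASAN errors but does not state the root cause. One plausible motivation, offered only as an assumption, is that `.t()` on an Armadillo matrix returns a lightweight delayed expression that refers to its operand instead of owning data. Evaluating it into an owning `arma::mat` first removes any dependence on temporaries while the list is being built. The sketch below mimics that pattern with hypothetical stand-ins (`Matrix`, `TransposeView`, `build_result`) instead of Armadillo and Rcpp.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an owning, column-major matrix such as arma::mat.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    double at(std::size_t r, std::size_t c) const { return data[c * rows + r]; }
};

// Hypothetical lazy expression, similar in spirit to the object returned by
// `.t()` in an expression-template library: it stores a reference, not data.
struct TransposeView {
    const Matrix& src;
    Matrix eval() const {
        Matrix out{src.cols, src.rows, std::vector<double>(src.data.size())};
        for (std::size_t r = 0; r < out.rows; ++r)
            for (std::size_t c = 0; c < out.cols; ++c)
                out.data[c * out.rows + r] = src.at(c, r);
        return out;
    }
};

TransposeView transpose(const Matrix& m) { return TransposeView{m}; }

// Hypothetical result container standing in for Rcpp::List: it only holds
// owning Matrix values, so every lazy expression must be evaluated first.
using Result = std::map<std::string, Matrix>;

Result build_result(const Matrix& means) {
    assert(means.data.size() == means.rows * means.cols);
    // Evaluate into a named, owning local first (cf. `arma::mat model_means = model.means.t();`).
    Matrix model_means = transpose(means).eval();
    Result out;
    out.emplace("centroids", model_means);  // copies owning data; no reference escapes
    return out;
}
```

The extra copy per entry is the price of having the container own all of its data.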
@@ -2261,7 +2273,18 @@ Rcpp::List ClusterMedoids(arma::mat& data, int clusters, std::string method, dou
}
}

-if (clusters > 1) { silh_lst = silhouette_matrix(data, end_indices_vec + 1, end_cost_vec, threads); }
+arma::mat befout_silhouette_matrix;
+
+arma::mat befout_clustering_stats;
+
+if (clusters > 1) {
+
+silh_lst = silhouette_matrix(data, end_indices_vec + 1, end_cost_vec, threads);
+
+befout_silhouette_matrix = Rcpp::as<arma::mat> (silh_lst[0]);
+
+befout_clustering_stats = Rcpp::as<arma::mat> (silh_lst[1]);
+}

arma::mat fuz_out;

@@ -2287,11 +2310,13 @@ Rcpp::List ClusterMedoids(arma::mat& data, int clusters, std::string method, dou
}
}

-return Rcpp::List::create(Rcpp::Named("medoids") = end_idxs, Rcpp::Named("cost") = arma::accu(end_cost_vec), Rcpp::Named("dissimilarity_matrix") = data,
+double end_cost_vec_scalar = arma::accu(end_cost_vec);
+
+return Rcpp::List::create(Rcpp::Named("medoids") = end_idxs, Rcpp::Named("cost") = end_cost_vec_scalar, Rcpp::Named("dissimilarity_matrix") = data,

-Rcpp::Named("clusters") = end_indices_vec, Rcpp::Named("end_cost_vec") = end_cost_vec, Rcpp::Named("silhouette_matrix") = silh_lst[0],
+Rcpp::Named("clusters") = end_indices_vec, Rcpp::Named("end_cost_vec") = end_cost_vec, Rcpp::Named("silhouette_matrix") = befout_silhouette_matrix,

-Rcpp::Named("fuzzy_probs") = fuz_out, Rcpp::Named("clustering_stats") = silh_lst[1], Rcpp::Named("flag_dissim_mat") = flag_dissim_mat);
+Rcpp::Named("fuzzy_probs") = fuz_out, Rcpp::Named("clustering_stats") = befout_clustering_stats, Rcpp::Named("flag_dissim_mat") = flag_dissim_mat);
}


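In the lines shown, the old `ClusterMedoids` return statement read `silh_lst[0]` and `silh_lst[1]` unconditionally, while `silh_lst` is assigned only inside `if (clusters > 1)`. The new code declares empty `befout_silhouette_matrix` and `befout_clustering_stats` matrices up front and fills them only in that branch. As a result, the returned entries are defined (and empty) when `clusters == 1`. Below is a minimal sketch of that guard using standard containers. `SilhouetteOut`, `compute_silhouette` (a zero-filled placeholder) and `medoid_outputs` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Hypothetical stand-in for the two entries returned by silhouette_matrix().
struct SilhouetteOut { Mat silhouette; Mat stats; };

// Hypothetical placeholder: returns zero-filled matrices of an arbitrary shape.
// Like the real computation, it is only meaningful for more than one cluster.
SilhouetteOut compute_silhouette(std::size_t clusters) {
    assert(clusters > 1);
    return SilhouetteOut{Mat(clusters, std::vector<double>(1, 0.0)),
                         Mat(clusters, std::vector<double>(2, 0.0))};
}

struct MedoidOutputs { Mat silhouette_matrix; Mat clustering_stats; };

MedoidOutputs medoid_outputs(std::size_t clusters) {
    // Owning, empty results declared first (cf. `arma::mat befout_silhouette_matrix;`).
    Mat silhouette_matrix, clustering_stats;
    if (clusters > 1) {
        SilhouetteOut s = compute_silhouette(clusters);
        silhouette_matrix = s.silhouette;
        clustering_stats = s.stats;
    }
    // Only the locals are returned, so nothing is read from a result that was never computed.
    return MedoidOutputs{silhouette_matrix, clustering_stats};
}
```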
@@ -2550,14 +2575,20 @@ Rcpp::List ClaraMedoids(arma::mat& data, int clusters, std::string method, int s

bst_sample_silh_mat = Rcpp::as<arma::mat> (bst_lst[5]);
}
+
+arma::rowvec clr_split_out_rowvec = arma::conv_to< arma::rowvec >::from(clr_split_out);
+
+arma::mat fuz_and_stats_mt = Rcpp::as<arma::mat> (fuz_and_stats[1]);
+
+fuz_st_mat = fuz_st_mat.t();

return Rcpp::List::create(Rcpp::Named("medoids") = subs_meds, Rcpp::Named("bst_dissimilarity") = dism, Rcpp::Named("medoid_indices") = out_medoid,

-Rcpp::Named("sample_indices") = arma::conv_to< arma::rowvec >::from(clr_split_out), Rcpp::Named("clusters") = hard_clust,
-
-Rcpp::Named("bst_sample_silhouette_matrix") = bst_sample_silh_mat, Rcpp::Named("fuzzy_probs") = fuz_and_stats[1],
+Rcpp::Named("sample_indices") = clr_split_out_rowvec, Rcpp::Named("clusters") = hard_clust,

-Rcpp::Named("clustering_stats") = fuz_st_mat.t(), Rcpp::Named("bst_sample_dissimilarity_matrix") = bst_sample_dissm_mat);
+Rcpp::Named("bst_sample_silhouette_matrix") = bst_sample_silh_mat, Rcpp::Named("fuzzy_probs") = fuz_and_stats_mt,
+
+Rcpp::Named("clustering_stats") = fuz_st_mat, Rcpp::Named("bst_sample_dissimilarity_matrix") = bst_sample_dissm_mat);
}


@@ -2645,18 +2676,22 @@ Rcpp::List split_rcpp_lst(Rcpp::List lst) {
}

double avg_intr_dis = arma::accu(intra_clust_disml);
+
+double intra_clust_disml_scalar = avg_intr_dis; // return this value

-double avg_width_silh = arma::accu(silhouet);
+double avg_width_silh = arma::accu(silhouet);

-avg_intr_dis /= silh_mat.n_rows;
+avg_intr_dis /= silh_mat.n_rows; // modify this value

avg_width_silh /= silh_mat.n_rows;
+
+bool silh_plot_boolean = true;

return Rcpp::List::create(Rcpp::Named("avg_intra_clust_dissimilarity") = avg_intr_dis,

-Rcpp::Named("sum_intra_dissim") = arma::accu(intra_clust_disml), Rcpp::Named("avg_width_silhouette") = avg_width_silh,
+Rcpp::Named("sum_intra_dissim") = intra_clust_disml_scalar, Rcpp::Named("avg_width_silhouette") = avg_width_silh,

-Rcpp::Named("list_intra_dissm") = intra_dissm, Rcpp::Named("list_silhouette") = silhouet_lst, Rcpp::Named("silhouette_plot") = true);
+Rcpp::Named("list_intra_dissm") = intra_dissm, Rcpp::Named("list_silhouette") = silhouet_lst, Rcpp::Named("silhouette_plot") = silh_plot_boolean);
}


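`split_rcpp_lst` reports `avg_width_silhouette` as the sum of the per-observation silhouette widths divided by `silh_mat.n_rows`, that is, their mean. Below is a minimal sketch of computing such widths from a full dissimilarity matrix. It assumes the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean dissimilarity of i to the other members of its cluster and b(i) is the smallest mean dissimilarity of i to another cluster. Singletons get s(i) = 0. `silhouette_widths` and `average_width` are hypothetical helpers, and ClusterR's own silhouette code may differ in details.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// D: symmetric n x n dissimilarity matrix; labels[i] in [0, k) is the cluster of row i.
std::vector<double> silhouette_widths(const std::vector<std::vector<double>>& D,
                                      const std::vector<int>& labels, int k) {
    const std::size_t n = labels.size();
    assert(D.size() == n && k > 1);
    std::vector<double> s(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<double> sum(k, 0.0);   // summed dissimilarity from i to each cluster
        std::vector<std::size_t> cnt(k);   // members of each cluster, excluding i
        for (std::size_t j = 0; j < n; ++j) {
            if (j == i) continue;
            sum[labels[j]] += D[i][j];
            cnt[labels[j]] += 1;
        }
        const int own = labels[i];
        if (cnt[own] == 0) continue;                        // singleton cluster: s(i) = 0
        const double a = sum[own] / cnt[own];               // mean dissimilarity within own cluster
        double b = std::numeric_limits<double>::max();
        for (int c = 0; c < k; ++c)                         // closest other non-empty cluster
            if (c != own && cnt[c] > 0) b = std::min(b, sum[c] / cnt[c]);
        if (b == std::numeric_limits<double>::max()) continue;  // no other cluster present
        const double m = std::max(a, b);
        s[i] = (m > 0.0) ? (b - a) / m : 0.0;
    }
    return s;
}

// Mean silhouette width, matching `accu(silhouet) / n_rows` in the hunk above.
double average_width(const std::vector<double>& s) {
    assert(!s.empty());
    double total = 0.0;
    for (double v : s) total += v;
    return total / static_cast<double>(s.size());
}
```

Widths near 1 indicate well-separated observations. Values near 0 lie between clusters, and negative values suggest a likely misassignment.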