Merged

Dev #21

88 commits
5cae88f
Added unit test and created a easy opticluster function
GregJohnsonJr Apr 24, 2024
37ec06d
Release v0.0.1 (#1)
GregJohnsonJr Apr 29, 2024
931b70b
Add cpp test (#3)
GregJohnsonJr May 17, 2024
9efe436
RMD Check is able to run successfully!
GregJohnsonJr May 31, 2024
678d5a3
Correcting the paths of my cpp files, should fix the action errors.
GregJohnsonJr May 31, 2024
2c99c18
Update to the cluster command test fixture
GregJohnsonJr May 31, 2024
8f3cbc1
Modifying the test for opticluster
GregJohnsonJr Jun 3, 2024
626e70a
Ensuring everything works with c++11
GregJohnsonJr Jun 3, 2024
5b1bdb0
Removing code issues from cluster command
GregJohnsonJr Jun 3, 2024
e35e710
Adding the build ignore
GregJohnsonJr Jun 3, 2024
11a41dd
Founds some issue where I am using c++ 17 syntax and not 11.
GregJohnsonJr Jun 5, 2024
8fcff5d
Github action fixes, needed to update syntax towards cpp 11
GregJohnsonJr Jun 6, 2024
6eb79ec
Modified the testing structure by removing the "Opticluster returns p…
GregJohnsonJr Jun 6, 2024
50c3a7c
Fix cluster unit test (#5)
GregJohnsonJr Jun 10, 2024
b717404
Printing out the metrics after you perform a cluster and added a true…
GregJohnsonJr Jun 10, 2024
77ebc1c
Release polish (#6)
GregJohnsonJr Jun 14, 2024
58a4056
Added a depends for lazy-loading and other R related issues.
GregJohnsonJr Jun 14, 2024
d958121
More cluster features (#7)
GregJohnsonJr Jul 12, 2024
12aaa2e
Merge branch 'master' into dev
GregJohnsonJr Jul 12, 2024
2682ec5
The fix for github actions.
GregJohnsonJr Jul 12, 2024
b7a77be
Change to the include file.
GregJohnsonJr Jul 12, 2024
869656e
Removing srand from Utils, going to attempt to set seeds inside of R.
GregJohnsonJr Jul 12, 2024
fc8b722
Fix for race condition issue.
GregJohnsonJr Jul 15, 2024
8603183
Fix for RCMD check warnings
GregJohnsonJr Jul 15, 2024
79fb369
The fix for the windows version of RMD Check!
GregJohnsonJr Jul 16, 2024
1a4256b
Adding dependency for time.
GregJohnsonJr Jul 16, 2024
8e0ae22
Make shared (#9)
GregJohnsonJr Aug 30, 2024
7564d48
Forgot a unit test. (#10)
GregJohnsonJr Sep 3, 2024
3bd7dea
Fix results (#11)
GregJohnsonJr Sep 10, 2024
2be797e
Removing and fixing check issues.
GregJohnsonJr Sep 10, 2024
a445421
Fix compilation warnings (#12)
GregJohnsonJr Sep 11, 2024
e7d8625
Fix for negative index value
GregJohnsonJr Sep 11, 2024
ad47beb
Cleaning up build notes.
GregJohnsonJr Sep 11, 2024
ba93c19
Merge branch 'master' into dev
GregJohnsonJr Sep 11, 2024
e6a4c9f
lintr fixes
GregJohnsonJr Sep 11, 2024
386a0c7
Fix for lintr
GregJohnsonJr Sep 11, 2024
7ea9c0b
Read phylip files (#14)
GregJohnsonJr Sep 12, 2024
fa25af7
Initial push
GregJohnsonJr Sep 12, 2024
25a357d
Adding r documentation about mothur and clustur
GregJohnsonJr Sep 12, 2024
e6f00a8
Added functionality for column distance file reading!
GregJohnsonJr Sep 13, 2024
6f830c6
Column distance files work!
GregJohnsonJr Sep 13, 2024
3d8015f
Adding read column feature (#15)
GregJohnsonJr Sep 16, 2024
42162e1
Documentation (#16)
GregJohnsonJr Sep 16, 2024
d7dc294
Fix for opticluster clustering.
GregJohnsonJr Sep 16, 2024
37cdb7e
Fixing up the documentation
GregJohnsonJr Sep 16, 2024
48a0f38
I am getting the same number of bins!
GregJohnsonJr Sep 16, 2024
4c63f8c
example data
GregJohnsonJr Sep 17, 2024
58a7e8e
Fix for test error
GregJohnsonJr Sep 17, 2024
0d3e798
Testing values to RMD file
GregJohnsonJr Sep 17, 2024
3bddc53
Small changes
GregJohnsonJr Sep 18, 2024
d486604
Added sorting by bin size to cluster output and fixed the clustering …
GregJohnsonJr Sep 18, 2024
d89afb5
Modification to the test!
GregJohnsonJr Sep 18, 2024
60d16f5
Updates to test file
GregJohnsonJr Sep 18, 2024
acfbc9a
Cleaning up test
GregJohnsonJr Sep 18, 2024
585736e
Small change
GregJohnsonJr Sep 18, 2024
1623601
Method to check if each cluster exist in the dataframe
GregJohnsonJr Sep 21, 2024
d08a209
Using content paths instead of absolutes
GregJohnsonJr Sep 21, 2024
a745510
Create 96_sq_column_results_mac.list
GregJohnsonJr Sep 23, 2024
5af269d
Pushing results for different operating systems
GregJohnsonJr Sep 23, 2024
3cb0ba4
Updating documentation
GregJohnsonJr Sep 24, 2024
7c694e3
Added inst folders
GregJohnsonJr Sep 24, 2024
0b5a145
Update Cluster.R
GregJohnsonJr Sep 24, 2024
7eec4fb
Pushing the temporary fix!
GregJohnsonJr Sep 24, 2024
63d515e
Pushing spare_matrix data file
GregJohnsonJr Sep 24, 2024
6a5df20
Squashed commit of the following:
GregJohnsonJr Sep 24, 2024
a714b7d
Creating vignettes
GregJohnsonJr Sep 25, 2024
4a4fcfa
Created base pkgdown structure
GregJohnsonJr Sep 25, 2024
05c8a07
Base structure of documentation and website
GregJohnsonJr Sep 25, 2024
8c648bd
Small optimzation to clustur
GregJohnsonJr Sep 25, 2024
e87319e
Fixing unit test
GregJohnsonJr Sep 25, 2024
1251c3c
Removing comments
GregJohnsonJr Sep 25, 2024
2c642f3
Changed the name of the package to clustur
GregJohnsonJr Sep 25, 2024
3f04170
Removing unneeded data and fixing issue to validate count_table
GregJohnsonJr Sep 25, 2024
3f2d399
Fixing check errors.
GregJohnsonJr Sep 25, 2024
10c2ad9
Consistent randomization (#17)
GregJohnsonJr Sep 26, 2024
45ba179
Consistent randomization (#18)
GregJohnsonJr Sep 26, 2024
6d30db8
Squashed commit of the following:
GregJohnsonJr Sep 26, 2024
00b5013
Merge branch 'Documentation' into dev
GregJohnsonJr Sep 26, 2024
439668a
Documentation (#16) (#19) (#20)
GregJohnsonJr Sep 26, 2024
e2e52c4
Removing old vignette
GregJohnsonJr Sep 26, 2024
ea4d749
Adding additional documentation
GregJohnsonJr Sep 29, 2024
d62037b
Adding links (#22)
GregJohnsonJr Sep 30, 2024
012fc12
Moving RDS file
GregJohnsonJr Sep 30, 2024
3471ba2
Small changes to test
GregJohnsonJr Sep 30, 2024
7501509
Merge branch 'master' into dev
GregJohnsonJr Sep 30, 2024
c1cab5b
Adding a vignette, fixed the test that were failing, and removed old …
GregJohnsonJr Sep 30, 2024
a9e8095
Small change to test
GregJohnsonJr Sep 30, 2024
69b13c4
Pushing lintr fixes
GregJohnsonJr Sep 30, 2024
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -12,3 +12,4 @@
 ^pkgdown$
 ^vignettes/articles$
 ^\.vscode$
+^\.lintr$
5 changes: 5 additions & 0 deletions .lintr
@@ -0,0 +1,5 @@
+linters: linters_with_defaults() # see vignette("lintr")
+encoding: "UTF-8"
+exclusions: list(
+"vignettes/articles/Using-clustur.Rmd"
+)
176 changes: 90 additions & 86 deletions R/Cluster.R
@@ -7,42 +7,47 @@
 #' @param shuffle a boolean to determine whether or
 #' not you want to shuffle the data before you cluster
 #' @param simularity_matrix are you using a simularity matrix or distance matrix
-#' @param random_seed you can set your own random seed for consistent results, if not it will be set to 123
-#' @param ... Either your phylip file or column file path, or a sparse distance matrix
+#' @param random_seed you can set your own random
+#' seed for consistent results, if not it will be set to 123
+#' @param ... Either your phylip file or column file path,
+#' or a sparse distance matrix
 #' @description
-#' You must specfiy the type of matrix you are inputting to cluster your object and we support three types:
+#' You must specfiy the type of matrix you are inputting
+#' to cluster your object and we support three types:
 #' the path to your phylip and column distance file, or a sparse matrix.
-#'
+#'
 #' @examples
 #' # Using a sparse matrix
 #' i_values <- as.integer(1:100)
 #' j_values <- as.integer(sample(1:100, 100, TRUE))
 #' x_values <- as.numeric(runif(100, 0, 1))
-#' s_matrix <- Matrix::spMatrix(nrow=max(i_values),
-#' ncol=max(i_values),
-#' i=i_values,
-#' j=j_values,
+#' s_matrix <- Matrix::spMatrix(nrow=max(i_values),
+#' ncol=max(i_values),
+#' i=i_values,
+#' j=j_values,
 #' x=x_values)
-#'
+#'
 #' # Creating a count table using the sparse matrix
-#' count_table_sparse <- data.frame(sequence=as.character(i_values),
+#' count_table_sparse <- data.frame(sequence=as.character(i_values),
 #' total=rep(1,times=100))
-#'
-#' cluster_results <- opti_cluster(cutoff=0.2,
+#'
+#' cluster_results <- opti_cluster(cutoff=0.2,
 #' count_table = count_table_sparse,
 #' sparse_matrix=s_matrix)
-#'
+#'
 #' # With a column file
 #' count_table <- read.delim(example_path("amazon1.count_table"))
-#' amazon_data_column <- opti_cluster(column_path=example_path("96_sq_column_amazon.dist"),
+#' amazon_data_column <- opti_cluster(column_path=
+#' example_path("96_sq_column_amazon.dist"),
 #' count_table = count_table, cutoff = 0.2)
 #' # With a phylip file
 #' count_table <- read.delim(example_path("amazon1.count_table"))
-#' amazon_data_phylip <- opti_cluster(phylip_path=example_path("98_sq_phylip_amazon.dist"),
+#' amazon_data_phylip <- opti_cluster(phylip_path=
+#' example_path("98_sq_phylip_amazon.dist"),
 #' count_table = count_table, cutoff = 0.2)
-#'
-#'
-#'
+#'
+#'
+#'
 #' @return A data.frame of the cluster and cluster metrics.
 opti_cluster <- function(cutoff, count_table,
 iterations = 100, shuffle = TRUE,
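The examples above pass `opti_cluster` either a distance file path or a sparse matrix assembled from `(i, j, x)` triplets. To make the link between the two concrete, here is a rough C++ sketch (written to C++11, which the commit history says the package targets) that turns column-format distances into such triplets. It assumes each line is `name1 name2 distance` separated by whitespace, as in mothur's column files, gives every sequence name a 0-based index, and keeps only pairs at or below the cutoff; whether the package compares with `<=` or `<` is not shown in this diff and is an assumption. `Triplet`, `indexOf`, and `readColumnTriplets` are hypothetical names, not the package's `OptiClusterColumnDist`.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical (i, j, x) entry, analogous to the i/j/x slots of a sparse matrix.
struct Triplet {
    int i;
    int j;
    double x;
};

// Return the index already assigned to `name`, or assign the next free one.
int indexOf(std::unordered_map<std::string, int>& index, const std::string& name) {
    auto it = index.find(name);
    if (it != index.end()) {
        return it->second;
    }
    const int next = static_cast<int>(index.size());
    index[name] = next;
    return next;
}

// Sketch: read "name1 name2 distance" lines and keep pairs with distance <= cutoff.
std::vector<Triplet> readColumnTriplets(std::istream& in, double cutoff,
                                        std::unordered_map<std::string, int>& index) {
    std::vector<Triplet> triplets;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string first;
        std::string second;
        double distance = 0.0;
        if (!(fields >> first >> second >> distance)) {
            continue;  // skip blank or malformed lines
        }
        // Index both names before filtering so no sequence is dropped.
        const int i = indexOf(index, first);
        const int j = indexOf(index, second);
        if (distance <= cutoff) {
            Triplet t = {i, j, distance};
            triplets.push_back(t);
        }
    }
    return triplets;
}
```

Names are indexed before the cutoff test, so a sequence whose distances all exceed the cutoff still receives an index and can later appear as a singleton rather than vanishing.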
@@ -51,16 +56,15 @@ opti_cluster <- function(cutoff, count_table,
 list_params <- list(...)
 params <- names(list_params)
 cluster_dfs <- list()
-if("phylip_path" %in% params &&
-"column_path" %in% params &&
-"sparse_matrix" %in% params){
 
+if ("phylip_path" %in% params &&
+"column_path" %in% params &&
+"sparse_matrix" %in% params) {
 stop("You cannot use all three input paramters at once.
 Use either phylip_path, column_path, or sparse_matrix.")
 }
 set.seed(random_seed)
-if("sparse_matrix" %in% params)
-{
+if ("sparse_matrix" %in% params) {
 sparse_matrix <- list_params$sparse_matrix
 cluster_dfs <- MatrixToOpiMatrixCluster(
 sparse_matrix@i,
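The guard above stops only when all three inputs are supplied together. If exactly two are passed (say `phylip_path` and `column_path`), the `if`/`else if` chain silently uses the first match in the order `sparse_matrix`, `phylip_path`, `column_path`. A stricter rule is to require exactly one input source. The C++ sketch below expresses that rule over the set of supplied argument names; `selectInputSource` is a hypothetical helper for illustration and is not part of clustur.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Sketch: return the single input source named in `params`, or throw if
// zero or more than one of the recognised sources was supplied.
std::string selectInputSource(const std::set<std::string>& params) {
    const char* const sources[] = {"sparse_matrix", "phylip_path", "column_path"};
    std::string chosen;
    int supplied = 0;
    for (const char* name : sources) {
        if (params.count(name) > 0) {
            ++supplied;
            chosen = name;
        }
    }
    if (supplied != 1) {
        throw std::invalid_argument(
            "Supply exactly one of sparse_matrix, phylip_path, or column_path.");
    }
    return chosen;
}
```

Counting the supplied sources rejects both the "none" and the "more than one" cases with a single check, instead of listing every invalid combination.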
@@ -72,8 +76,7 @@ opti_cluster <- function(cutoff, count_table,
 shuffle,
 simularity_matrix
 )
-}
-else if("phylip_path" %in% params) {
+} else if ("phylip_path" %in% params) {
 phylip_path <- list_params$phylip_path
 cluster_dfs <- OptiClusterPhylip(
 phylip_path,
@@ -83,8 +86,7 @@ opti_cluster <- function(cutoff, count_table,
 shuffle,
 simularity_matrix
 )
-}
-else if("column_path" %in% params) {
+} else if ("column_path" %in% params) {
 column_path <- list_params$column_path
 cluster_dfs <- OptiClusterColumnDist(
 column_path,
@@ -94,30 +96,29 @@ opti_cluster <- function(cutoff, count_table,
 shuffle,
 simularity_matrix
 )
-}
-else {
+} else {
 stop("The parameters should include either a sparse_matrix,
 phylip_path, column_path")
 }
-cluster_dfs[[4]]$comma_count <- sapply(cluster_dfs[[4]]$bins, function(x){
-ls <- gregexpr(",", x, fixed=TRUE)[[1]]
-if(ls[[1]] == -1){
+cluster_dfs[[4]]$comma_count <- sapply(cluster_dfs[[4]]$bins, function(x) {
+ls <- gregexpr(",", x, fixed = TRUE)[[1]]
+if (ls[[1]] == -1) {
 return(0)
-}
-else{
+} else {
 return(length(ls))
 }
 })
-cluster_dfs[[4]] <- cluster_dfs[[4]][order(cluster_dfs[[4]]$comma_count, decreasing = T), ]
-cluster_dfs[[4]] <- cluster_dfs[[4]][,1:3]
+cluster_dfs[[4]] <- cluster_dfs[[4]][order(cluster_dfs[[4]]$comma_count,
+decreasing = TRUE), ]
+cluster_dfs[[4]] <- cluster_dfs[[4]][, 1:3]
 opticluster_data <- list(
 abundance = cluster_dfs[[1]],
 cluster = cluster_dfs[[4]],
 cluster_metrics = cluster_dfs[[3]],
 other_cluster_metrics = cluster_dfs[[2]]
 )
 return(opticluster_data)
-}
+}
 
 #' Cluster Description
 #'
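`opti_cluster` orders its cluster table by bin size: each `bins` entry is a comma-separated list of sequence names, so the comma count (0 when `gregexpr` returns `-1` for no match) equals the member count minus one. `order(..., decreasing = TRUE)` then puts the largest bins first, and `[, 1:3]` presumably keeps the original three columns so the temporary `comma_count` column is dropped. A minimal C++ sketch of the same idea, assuming bins are plain comma-separated strings; `Bin`, `commaCount`, and `sortBinsBySize` are hypothetical names. It uses `std::stable_sort` so equally sized bins keep their input order, which is a choice made for the sketch, not a claim about how R's `order()` breaks ties.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical bin record: a label plus its comma-separated member names.
struct Bin {
    std::string label;
    std::string members;  // e.g. "seq1,seq4,seq9"
};

// A bin with k members has k - 1 commas.
long commaCount(const std::string& members) {
    return static_cast<long>(std::count(members.begin(), members.end(), ','));
}

// Sort bins largest first; stable_sort keeps ties in their original order.
void sortBinsBySize(std::vector<Bin>& bins) {
    std::stable_sort(bins.begin(), bins.end(), [](const Bin& a, const Bin& b) {
        return commaCount(a.members) > commaCount(b.members);
    });
}
```

Counting separators avoids splitting each string into a vector of names just to measure its length.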
@@ -129,99 +130,100 @@ opti_cluster <- function(cutoff, count_table,
 #' furthest, nearest, average, weighted.
 #' @param count_table A table of names and the given abundance per group.
 #' @param simularity_matrix are you using a simularity matrix or distance matrix
-#' @param random_seed you can set your own random seed for consistent results, if not it will be set to 123
-#' @param ... Either your phylip file or column file path, or a sparse distance matrix
+#' @param random_seed you can set your own random seed
+#' for consistent results, if not it will be set to 123
+#' @param ... Either your phylip file or column file path,
+#' or a sparse distance matrix
 #' @description
-#' You must specfiy the type of matrix you are inputting to cluster your object and we support three types:
+#' You must specfiy the type of matrix you are inputting
+#' to cluster your object and we support three types:
 #' the path to your phylip and column distance file, or a sparse matrix.
 #' @return A string of the given cluster.
-#'
+#'
 #' @examples
 #' # Using a sparse matrix
 #' i_values <- as.integer(1:100)
 #' j_values <- as.integer(sample(1:100, 100, TRUE))
 #' x_values <- as.numeric(runif(100, 0, 1))
-#' s_matrix <- Matrix::spMatrix(nrow=max(i_values),
-#' ncol=max(i_values),
-#' i=i_values,
-#' j=j_values,
+#' s_matrix <- Matrix::spMatrix(nrow=max(i_values),
+#' ncol=max(i_values),
+#' i=i_values,
+#' j=j_values,
 #' x=x_values)
-#'
+#'
 #' # Creating a count table using the sparse matrix
-#' count_table_sparse <- data.frame(sequence=as.character(i_values),
+#' count_table_sparse <- data.frame(sequence=as.character(i_values),
 #' total=rep(1,times=100))
 #' # furthest method
-#' cluster_results <- cluster(cutoff=0.2, count_table = count_table_sparse,
+#' cluster_results <- cluster(cutoff=0.2, count_table = count_table_sparse,
 #' sparse_matrix=s_matrix, method="furthest")
-#'
+#'
 #' # With a phylip file and nearest methods
 #' count_table <- read.delim(example_path("amazon1.count_table"))
-#' amazon_data_phylip <- cluster(phylip_path=example_path("98_sq_phylip_amazon.dist"),
-#' count_table = count_table, method="nearest", cutoff = 0.2)
-#'
-#' # With a column file and average methods
-#' amazon_data_column <- cluster(column_path=example_path("96_sq_column_amazon.dist"),
-#' count_table = count_table, method="average", cutoff = 0.2)
-#'
+#' amazon_data_phylip <- cluster(phylip_path=
+#' example_path("98_sq_phylip_amazon.dist"),
+#' count_table = count_table, method="nearest", cutoff = 0.2)
+#'
+#' # With a column file and average methods
+#' amazon_data_column <- cluster(column_path=
+#' example_path("96_sq_column_amazon.dist"),
+#' count_table = count_table, method="average", cutoff = 0.2)
+#'
 #' # Weighted method
-#' amazon_data_column <- cluster(column_path=example_path("96_sq_column_amazon.dist"),
-#' count_table = count_table, method="weighted", cutoff = 0.2)
-#'
-#'
-cluster <- function(cutoff, method,
-count_table, simularity_matrix = FALSE, random_seed = 123, ...) {
+#' amazon_data_column <- cluster(column_path=
+#' example_path("96_sq_column_amazon.dist"),
+#' count_table = count_table, method="weighted", cutoff = 0.2)
+#'
+#'
+cluster <- function(cutoff, method, count_table,
+simularity_matrix = FALSE, random_seed = 123, ...) {
 list_params <- list(...)
 params <- names(list_params)
 cluster_dfs <- list()
-if("phylip_path" %in% params &&
-"column_path" %in% params &&
-"sparse_matrix" %in% params){
+if ("phylip_path" %in% params &&
+"column_path" %in% params &&
+"sparse_matrix" %in% params) {
 stop("You cannot use all three input paramters at once.
 Use either phylip_path, column_path, or sparse_matrix.")
 }
 set.seed(random_seed)
-if("sparse_matrix" %in% params)
-{
+if ("sparse_matrix" %in% params) {
 sparse_matrix <- list_params$sparse_matrix
 cluster_dfs <- ClassicCluster(
 sparse_matrix@i, sparse_matrix@j,
 sparse_matrix@x, cutoff, method,
 validate_count_table(count_table),
 simularity_matrix
 )
-}
-else if("phylip_path" %in% params) {
+} else if ("phylip_path" %in% params) {
 phylip_path <- list_params$phylip_path
 cluster_dfs <- ClusterWithPhylip(
 phylip_path, cutoff, method,
 validate_count_table(count_table),
 simularity_matrix
 )
-}
-else if("column_path" %in% params) {
+} else if ("column_path" %in% params) {
 column_path <- list_params$column_path
 cluster_dfs <- ClusterWithColumn(
 column_path, cutoff, method,
 validate_count_table(count_table),
 simularity_matrix
 )
-}
-else {
+} else {
 stop("The parameters should include either a sparse_matrix,
 phylip_path, column_path")
 }
-
-cluster_dfs[[2]]$comma_count <- sapply(cluster_dfs[[2]]$bins, function(x){
-ls <- gregexpr(",", x, fixed=TRUE)[[1]]
-if(ls[[1]] == -1){
+cluster_dfs[[2]]$comma_count <- sapply(cluster_dfs[[2]]$bins, function(x) {
+ls <- gregexpr(",", x, fixed = TRUE)[[1]]
+if (ls[[1]] == -1) {
 return(0)
-}
-else{
+} else {
 return(length(ls))
 }
 })
-cluster_dfs[[2]] <- cluster_dfs[[2]][order(cluster_dfs[[2]]$comma_count, decreasing = T), ]
-cluster_dfs[[2]] <- cluster_dfs[[2]][,1:3]
+cluster_dfs[[2]] <- cluster_dfs[[2]][order(cluster_dfs[[2]]$comma_count,
+decreasing = TRUE), ]
+cluster_dfs[[2]] <- cluster_dfs[[2]][, 1:3]
 
 return(list(
 abundance = cluster_dfs[[1]],
@@ -243,14 +245,16 @@ validate_count_table <- function(count_table_df) {
 
 
 #' Example Path
-#'
+#'
 #' @export
-#' This function was created as a helper function to generate file paths to our internal data. You are able to access this function if you want to follow along with the example.
+#' This function was created as a helper function to generate file paths to our
+#' internal data. You are able to access this function if you
+#' want to follow along with the example.
 #' @param file The data of the path you are looking to find.
 #' @examples
 #' # This will return the path to our example file
 #' example_path("98_sq_phylip_amazon.dist")
-#'
+#'
 #' @return the path inside of the package of the file.
 example_path <- function(file = NULL) {
 path <- ""
@@ -260,4 +264,4 @@ example_path <- function(file = NULL) {
 path <- system.file("extdata", file, package = "clustur", mustWork = TRUE)
 }
 return(path)
-}
+}
17 changes: 0 additions & 17 deletions man/validate_count_table.Rd

This file was deleted.

4 changes: 0 additions & 4 deletions src/Utils.cpp
@@ -9,10 +9,6 @@
 #include <sstream>
 #include <unordered_set>
 
-Utils::Utils() {
-constexpr long long seed = 19760620;
-mersenne_twister_engine.seed(seed);
-}
 
 void Utils::mothurRandomShuffle(std::vector<int>& randomize){
 Rcpp::IntegerVector randomValues = Rcpp::wrap(randomize);
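This hunk deletes the `Utils` constructor that seeded `mersenne_twister_engine` with the fixed value 19760620, consistent with the commit "Removing srand from Utils, going to attempt to set seeds inside of R" and with the `set.seed(random_seed)` calls now made in `opti_cluster` and `cluster`. The reason for a single explicit seed is reproducibility: the same seed yields the same random sequence and therefore the same shuffle. The sketch below shows this with a standalone `std::mt19937`; the real member's engine type is not visible in the diff, so that choice is an assumption, and `seededShuffle` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Shuffle 0..n-1 using an engine seeded explicitly by the caller.
std::vector<int> seededShuffle(std::size_t n, unsigned int seed) {
    std::vector<int> values(n);
    std::iota(values.begin(), values.end(), 0);  // 0, 1, ..., n-1
    std::mt19937 engine(seed);                   // same seed, same engine state
    std::shuffle(values.begin(), values.end(), engine);
    return values;
}
```

Calling `seededShuffle(10, 19760620u)` twice in the same build returns the same permutation. The C++ standard does not fix the exact algorithm behind `std::shuffle`, so different standard library implementations may produce different orders from the same seed; that kind of library difference is one way identical seeds can still give platform-dependent results.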
2 changes: 1 addition & 1 deletion src/test-matrix_adapter.cpp
@@ -59,7 +59,7 @@ context("MatrixAdapter Test") {
 }
 test_that("Matrix Adapter can create proper square matrices from distance matrices") {
 MatrixAdapterTestFixture fixture;
-bool result = fixture.TestDistanceMatrixToSquareMatrix(5);
+bool result = fixture.TestDistanceMatrixToSquareMatrix(6);
 expect_true(result);
 result = fixture.TestDistanceMatrixToSquareMatrix(0);
 expect_false(result);
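The only change to this test is the size passed to `TestDistanceMatrixToSquareMatrix` (5 becomes 6); the fixture itself is not part of this diff. As a hypothetical illustration of the technique the test name describes, the C++ sketch below expands the strict lower triangle of an n x n distance matrix, stored row by row in the order used by lower-triangular PHYLIP files, into a full symmetric matrix with a zero diagonal. Rejecting `n == 0` mirrors the `expect_false` check on 0, but that mapping and the name `lowerTriangleToSquare` are assumptions, not the fixture's actual behavior.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical: expand the strict lower triangle of an n x n distance matrix,
// stored row by row as d(1,0), d(2,0), d(2,1), ..., into a full square matrix.
// Returns false for n == 0 or when the input length is not n*(n-1)/2.
bool lowerTriangleToSquare(const std::vector<double>& lower, std::size_t n,
                           std::vector<std::vector<double>>& square) {
    if (n == 0 || lower.size() != n * (n - 1) / 2) {
        return false;
    }
    square.assign(n, std::vector<double>(n, 0.0));  // zero diagonal
    std::size_t k = 0;
    for (std::size_t row = 1; row < n; ++row) {
        for (std::size_t col = 0; col < row; ++col) {
            square[row][col] = lower[k];
            square[col][row] = lower[k];  // distances are symmetric
            ++k;
        }
    }
    return true;
}
```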