Improving SINGE compatibility #33
Hi Atul,
Thanks.
Thanks. We will update this in our paper. We did a parameter search to optimise the AUPRC or early precision, ranked SINGE with these parameter values, and then used the results in the ensemble. So there was only one level of ensembling. @adyprat Please correct me if I am wrong.
So if I understand correctly, you chose the best-performing hyperparameters for SINGE and then applied the ensemble approach?
Hi @atuldeshpande, Thank you for getting back to us. I'm currently trying to integrate the latest version of SINGE into BEELINE. However, I ran into some issues getting the Docker image working. Could you please help me with that? Thanks.
@adyprat thanks for looking at the work-in-progress Docker image. We can help with troubleshooting in your new issue gitter-lab/SINGE#41. Thanks for your willingness to help propagate the SINGE name change as well. The name change generates confusion but was in response to a specific reviewer request.
Hi @atuldeshpande, To give you some more detail, we used two different values for lambda: {0, 0.01}; five different dT-numlags combinations: {(3, 5), (5, 9), (9, 5), (5, 15), (15, 5)}; and four kernel-width values: {0.5, 1, 2, 4}. We also varied both the prob-zero-removal and the prob-remove-sample parameters in {0, 0.2} and the number of replicates in {2, 6}. We chose these parameter values because our initial exploratory analysis showed that increasing lambda beyond 0.01 to the other values used in the SINGE paper (0.02, 0.05, 1) often resulted in decreased performance or very few or no output edges. We did not see any improvement from increasing the number of replicates to 10 either. These choices resulted in a total of 320 different hyperparameter sets.

We then ran SINGE with each parameter set to find the one that achieved the highest median AUPRC, with the median computed for the Linear synthetic network across the 10 simulated datasets we created for each number of cells (100, 200, 500, 2000, and 5000). Thus, for each number of cells, we obtained one best parameter set. We used these parameter sets for the datasets with the corresponding number of cells for every other synthetic network. We repeated this process for the 10 datasets without dropouts obtained from each of the four curated models, obtaining one best parameter set per curated model. Thus, for every synthetic network and number of cells pair and for every curated model, we had one SINGE output network, which we combined in ensembles with the output networks from every other algorithm. We do understand that your approach using the hyperparameter ensemble (Modified Borda Aggregation) is different. We used our approach for the following reasons:
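For concreteness, the grid described above can be sketched in Python. The names and the selection helper below are illustrative only, not the actual BEELINE code:

```python
from itertools import product
from statistics import median

# The hyperparameter grid described above: 2 * 5 * 4 * 2 * 2 * 2 = 320 sets.
grid = list(product(
    [0, 0.01],                                    # lambda
    [(3, 5), (5, 9), (9, 5), (5, 15), (15, 5)],   # (dT, num-lags)
    [0.5, 1, 2, 4],                               # kernel-width
    [0, 0.2],                                     # prob-zero-removal
    [0, 0.2],                                     # prob-remove-sample
    [2, 6],                                       # number of replicates
))
assert len(grid) == 320  # matches the 320 hyperparameter sets above

def best_params(auprc_by_params):
    """Pick the parameter set with the highest median AUPRC across the
    10 simulated datasets (hypothetical helper, for illustration)."""
    return max(auprc_by_params, key=lambda p: median(auprc_by_params[p]))
```

One such search would be run per number of cells (and per curated model), yielding one best parameter set each.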
That said, we did investigate whether an ensemble built on the outputs from each of the 12 GRN inference methods would result in a higher AUPRC. We did not find that these ensembles improve on the best-performing algorithm in a systematic way here either. Please note that the AUPRC results (Figure 3 of the BEELINE paper) have changed since our bioRxiv preprint, as we have modified our BoolODE simulations since then. I hope this reply clarifies why we chose the best-performing hyperparameter set as opposed to computing ensembles over hyperparameters.
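As a rough illustration of the kind of rank aggregation being discussed (not the exact Modified Borda procedure from either paper), a Borda-style combination of ranked edge lists could look like:

```python
def borda_aggregate(rankings):
    """Borda-style aggregation sketch: each method's ranked edge list
    (best edge first) contributes (n - rank) points per edge; edges absent
    from a list get no points from that method. Illustrative only."""
    scores = {}
    for ranked in rankings:
        n = len(ranked)
        for rank, edge in enumerate(ranked):
            scores[edge] = scores.get(edge, 0) + (n - rank)
    # Return edges sorted by total points, highest first.
    return sorted(scores, key=scores.get, reverse=True)
```

Whether such an ensemble beats the single best method is exactly the empirical question examined above.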
Hi @adyprat, thanks for the detailed reply.
The blue box plots are from the same datasets that we used for the optimal hyperparameter search.
@adyprat @tmmurali we recently released version 0.4.0 of SINGE, which is the first version with a complete Docker image. I wanted to resume our discussion of whether we should try to more fully integrate SINGE with BEELINE. The primary question is whether you are interested in using the Docker image we build. I envision that your […] If that makes sense to you, we could discuss what other parts, like […]
Please see bace690 for the integration of the SINGE Dockerfile and SINGE.sh executable into BEELINE. Because the inputs to SINGE.sh for gene expression and pseudotime data are MATLAB mat files, BEELINE still needs the equivalent of the previous executable in runSINGE to convert a CSV count matrix to mat files before calling SINGE.sh. It might be more convenient if SINGE.sh were to alternatively support input for X, ptime, and gene_list in a different format such as CSV.
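A minimal sketch of that kind of conversion step, assuming a genes-by-cells expression CSV and one pseudotime value per cell (file layouts and variable names here are illustrative, and scipy writes v5/v7-era mat files rather than v7.3):

```python
import numpy as np
import pandas as pd
from scipy.io import savemat

def csv_to_singe_mat(expr_csv, ptime_csv, out_mat):
    """Convert a genes-by-cells count matrix CSV and a pseudotime CSV into
    a mat file holding the X, ptime, and gene_list variables mentioned
    above. A sketch only, not the actual runSINGE code."""
    expr = pd.read_csv(expr_csv, index_col=0)    # rows: genes, cols: cells
    ptime = pd.read_csv(ptime_csv, index_col=0)  # one value per cell
    savemat(out_mat, {
        "X": expr.values.astype(float),
        "ptime": ptime.values.ravel().astype(float),
        "gene_list": np.array(expr.index, dtype=object),
    })
```

Supporting CSV directly in SINGE.sh would make this extra step unnecessary.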
@ktakers thank you for taking the time to update the SINGE version and BEELINE wrapper code. I can see it was not easy to rewrite the wrappers. I finally reviewed bace690 and ba00892 and have a few questions and comments:
We agree. We have an open issue for that (gitter-lab/SINGE#11) but no timeline for resolving it.
@agitter, thank you for following up and reviewing the commits.
Thank you, I updated the Dockerfile to point to v0.4.1: 487d25e.
At this line https://github.com/gitter-lab/SINGE/blob/master/code/iLasso_for_SINGE.m#L56, values of m.fullKp are written into a copy of the input mat file before any variable declaration for fullKp, and MATLAB should infer that fullKp is a struct array. This works as expected when using version 7.3 input mat files written by MATLAB, which support partial writes of each value without loading fullKp into memory (https://www.mathworks.com/help/matlab/import_export/load-parts-of-variables-from-mat-files.html).

As I understand it, the issue I ran into when using a version 7 input mat file written by Octave (which cannot write version 7.3 mat files) is that MATLAB first loads the fullKp variable into memory, and because that variable hasn't been declared yet MATLAB initializes it, but at that point MATLAB fails to correctly infer that the type should be a struct array. The actual error I get is that the Kp2 struct cannot be assigned to a value in m.fullKp; the error message begins:

"Error using iLasso_for_SINGE (line 56)"

The workaround I implemented in that commit was to initialize the fullKp variable as a struct array in the input mat file, which appears to work, but it is obviously not ideal to initialize a specific internal variable that could change in another SINGE version. I think this might be addressed in SINGE, to support a version 7 mat file written by Octave, by either initializing fullKp as a struct array before that write, loading and saving the input file as a version 7.3 mat file instead of directly copying it to a temporary file when parsing the input, or otherwise writing that variable somewhere other than the mat file.
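For reference, pre-declaring fullKp as a struct array in the input mat file can also be sketched from Python with scipy; the field name "Kp2" is a placeholder here, since the real fields are internal to SINGE, and this is not the actual workaround code:

```python
import numpy as np
from scipy.io import loadmat, savemat

def init_fullKp(mat_path, n_entries):
    """Pre-declare fullKp as a 1-by-n struct array in the input mat file,
    so MATLAB sees a struct array type instead of inferring one during
    the partial writes described above. Sketch only; "Kp2" is a
    placeholder field name."""
    data = loadmat(mat_path)  # scipy's savemat skips the __header__ keys
    data["fullKp"] = np.zeros((1, n_entries), dtype=[("Kp2", object)])
    savemat(mat_path, data)
```

Fixing this inside SINGE itself, as suggested above, would avoid depending on an internal variable name.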
@ktakers thanks for the additional details about the mat file v7 issues. We opened gitter-lab/SINGE#59 and have an idea on how to fix that so Octave can be used to generate the SINGE input files.
@ktakers thanks for reporting the issues with the mat file v7 formatted inputs and file separators in the output directory. The SINGE v0.5.0 release fixes those bugs. We're also starting to implement support for running SINGE with plain text inputs instead of mat files, but that didn't make it into this release.
@agitter Thank you for the updated version. I will plan to add the new 0.5.0 release to BEELINE, remove the workaround that I had implemented for the mat file v7 issue when running SINGE in BEELINE, and update this issue. |
Hi All,
Thanks for using SINGE (rebranded from SCINGE) in this paper. We read your paper recently, and some of the benchmarking figures are visually very appealing. We have been updating SINGE quite a lot in the past couple of months and wanted to bring you up to date.
In the recent versions, we have also added guidance about parameter combinations and ensembling. This includes an updated default_hyperparameters.txt file, so that a new user's first run of SINGE replicates the hyperparameters used in our paper by default. For users who would like to generate their own hyperparameter files, we have added a few scripts to do so in the scripts folder.
Inspired by your workflow, as well as dynverse, we are building our own Docker image, which is nearing completion. We'd like to coordinate so that BEELINE users can have access to the latest version of SINGE as we continue our development.
By the way, one key attribute of SINGE is the aggregation stage on the ensemble generated. Do you think providing the aggregated scores in a matrix form (in addition to the ranked lists) would make it easier for the users to evaluate SINGE using BEELINE?
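Converting a ranked edge list into such a matrix could be as simple as a pivot; this sketch uses pandas, and the column names are illustrative rather than SINGE's or BEELINE's actual output schema:

```python
import pandas as pd

def ranked_edges_to_matrix(edges):
    """Turn a ranked edge list [(regulator, target, score), ...] into a
    dense regulator-by-target score matrix, one possible matrix form for
    the aggregated scores mentioned above. Missing edges get score 0."""
    df = pd.DataFrame(edges, columns=["Gene1", "Gene2", "Score"])
    return df.pivot_table(index="Gene1", columns="Gene2",
                          values="Score", fill_value=0.0)
```

Either form (matrix or ranked list) carries the same information, so the question is mainly which one BEELINE's evaluation code consumes more easily.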
(I realize that the last couple of points may also be pertinent to #15 )