
2. Software Usage

Haoliang Xue edited this page Mar 17, 2024 · 5 revisions

General usage

Usage within container

We recommend running KaMRaT within an Apptainer (formerly Singularity) container:

apptainer exec -B /bind_src:/bind_des kamrat <CMD> [options] /path/from/{bind_des}/to/input/kmer/table 
    # <CMD> can be one of index, filter, mask, merge, score, query
    # replace "apptainer" with "singularity" if KaMRaT was built with Singularity

The -B option binds a host directory (/bind_src) to a path inside the container (/bind_des). See the Apptainer help for details:

apptainer exec -h

Usage after building from source

If built from source, KaMRaT can be run as:

/path/to/KaMRaT/kamrat/bin/in/app/directory <CMD> [options] /path/to/input/kmer/table 
    # <CMD> can be one of index, filter, mask, merge, score, query

Usage by operations

The following sections assume that KaMRaT is run within an Apptainer container.

For the two alternative setups:

  • to run KaMRaT within a Singularity container, simply replace the keyword apptainer with singularity;
  • to run KaMRaT after building from source, replace the leading apptainer exec -B /bind_src:/bind_des with the path to the KaMRaT binary file (in the app/ folder).

KaMRaT helper

KaMRaT's top-level help message is displayed by any of these commands:

apptainer exec kamrat
apptainer exec kamrat -h
apptainer exec kamrat -help

The help message of each KaMRaT module is displayed by any of these commands:

apptainer exec kamrat <CMD>
apptainer exec kamrat <CMD> -h
apptainer exec kamrat <CMD> -help
    # <CMD> can be one of index, filter, mask, merge, score, query

KaMRaT index: index feature count table on disk

[USAGE]    kamrat index -intab STR -outdir STR [-klen INT -unstrand -nfbase INT -nffile STR]

[OPTION]   -h, -help      Print the helper
           -intab STR     Input table for index, mandatory
           -outdir STR    Output index directory, mandatory
           -klen INT      k-mer length, mandatory if features are k-mers
                              if present, indexation will be switched to k-mer mode
           -unstrand      Unstranded mode, indexation with canonical k-mers
                              if present, indexation will be switched to k-mer mode
           -nfbase INT    Base for calculating normalization factor, not compatible with -nffile STR
                              normCount_ij <- INT * rawCount_ij / sum_i{rawCount_ij}
                              if not provided, input counts will not be normalized
           -nffile STR    File for loading normalization factor, not compatible with -nfbase INT
                              a tab-separated row of normalization factors, in the same sample order as the table header
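As a minimal sketch of the -nfbase formula above (in Python, assuming the count table is held in memory as a list of feature rows with one column per sample; normalize_counts is a hypothetical helper, not part of KaMRaT), each sample column is rescaled so that its counts sum to INT:

```python
def normalize_counts(raw, nfbase):
    """Hypothetical helper: rescale each sample (column) so its counts sum to nfbase.

    normCount_ij = nfbase * rawCount_ij / sum_i rawCount_ij,
    with i indexing features (rows) and j indexing samples (columns).
    """
    n_samples = len(raw[0])
    col_sums = [sum(row[j] for row in raw) for j in range(n_samples)]
    # Assumption: an all-zero sample stays at 0 instead of dividing by zero.
    return [
        [nfbase * row[j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n_samples)]
        for row in raw
    ]


raw = [[10, 0], [30, 5], [60, 15]]  # 3 features x 2 samples
print(normalize_counts(raw, 100))   # [[10.0, 0.0], [30.0, 25.0], [60.0, 75.0]]
```

After this scaling every sample sums to the same total, which makes counts comparable across samples sequenced at different depths.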

KaMRaT filter: filter feature by expression level

[USAGE]    kamrat filter -idxdir STR -design STR [-upmin INT1:INT2 -downmax INT1:INT2 -reverse -outpath STR -withcounts]

[OPTION]    -h,-help              Print the helper
            -idxdir STR           Indexing folder by KaMRaT index, mandatory
            -design STR           Path to filter design file, a table of two columns, mandatory
                                      the first column indicates sample names
                                      the second column should be either UP or DOWN (capital letters)
                                          samples with UP will be considered as up-regulated samples
                                          samples with DOWN will be considered as down-regulated samples
                                          samples not given will be neutral (not considered for filter)
                                          samples can also be all UP or all DOWN
            -upmin INT1:INT2      Up feature lower bound, [1:1, meaning no filter]
                                      output features counting >= INT1 in >= INT2 UP-samples
            -downmax INT1:INT2    Down feature upper bound [inf:1, meaning no filter]
                                      output features counting <= INT1 in >= INT2 DOWN-samples
            -reverse              Reverse filter, to remove eligible features [false]
            -outpath STR          Path to results after filter
                                      if not provided, output to screen
            -withcounts           Output sample count vectors [false]
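The -upmin and -downmax rules can be read as the following Python sketch. It assumes that a feature must satisfy both the UP and the DOWN condition to be kept, that a condition is trivially satisfied when its sample group is empty (all-UP or all-DOWN designs), and that -reverse simply negates the decision; the helper states none of this explicitly, and passes_filter is a hypothetical name.

```python
import math


def passes_filter(counts, design, up_min=(1, 1), down_max=(math.inf, 1), reverse=False):
    """Hypothetical helper: decide whether one feature is output by the filter.

    counts: {sample: count}; design: {sample: "UP" or "DOWN"}.
    Samples missing from the design are neutral and ignored.
    """
    up = [c for s, c in counts.items() if design.get(s) == "UP"]
    down = [c for s, c in counts.items() if design.get(s) == "DOWN"]
    # -upmin INT1:INT2 -> count >= INT1 in at least INT2 UP samples
    up_ok = not up or sum(c >= up_min[0] for c in up) >= up_min[1]
    # -downmax INT1:INT2 -> count <= INT1 in at least INT2 DOWN samples
    down_ok = not down or sum(c <= down_max[0] for c in down) >= down_max[1]
    keep = up_ok and down_ok  # assumption: both conditions must hold
    return not keep if reverse else keep


design = {"s1": "UP", "s2": "UP", "s3": "DOWN", "s4": "DOWN"}
counts = {"s1": 12, "s2": 3, "s3": 0, "s4": 1, "s5": 50}  # s5 is neutral
print(passes_filter(counts, design, up_min=(10, 1), down_max=(1, 2)))  # True
print(passes_filter(counts, design, up_min=(10, 2)))                   # False
```

In the second call only s1 reaches 10 counts, so the "at least 2 UP samples" requirement fails.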

KaMRaT mask: mask k-mers from matrix

[USAGE]    kamrat mask -idxdir STR -fasta STR [-reverse -outpath STR -withcounts]

[OPTION]    -h,-help         Print the helper
            -idxdir STR      Indexing folder by KaMRaT index, mandatory
            -fasta STR       Sequence fasta file as the mask, mandatory
            -reverse         Reverse mask, to select the k-mers in sequence fasta file [false]
            -outpath STR     Path to mask results
                                 if not provided, output to screen
            -withcounts      Output sample count vectors [false]
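A minimal Python sketch of the masking logic, assuming that the FASTA sequences are decomposed into k-mers of the indexed length k, that sequences are uppercase ACGT, and that canonical k-mers are compared when the index was built with -unstrand (all assumptions; mask_kmers and canonical are hypothetical helpers):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Hypothetical helper: smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def mask_kmers(kmers, mask_seqs, k, unstranded=False, reverse=False):
    """Hypothetical helper: drop k-mers found in mask_seqs, or keep only those if reverse=True."""
    norm = canonical if unstranded else (lambda s: s)
    # Every k-mer of every mask sequence (assumption: same k as the index)
    masked = {norm(seq[i:i + k]) for seq in mask_seqs for i in range(len(seq) - k + 1)}
    return [km for km in kmers if (norm(km) in masked) == reverse]


kmers = ["ACGTA", "CCCCC", "GGGGG", "TTTTT"]
print(mask_kmers(kmers, ["ACGTACC"], k=5))                 # ['CCCCC', 'GGGGG', 'TTTTT']
print(mask_kmers(kmers, ["ACGTACC"], k=5, reverse=True))   # ['ACGTA']
print(mask_kmers(kmers, ["CCCCC"], k=5, unstranded=True))  # ['ACGTA', 'TTTTT']
```

In the last call GGGGG is removed as well, because it is the reverse complement of the masked CCCCC.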

KaMRaT merge: extend k-mers into contigs

[USAGE]    kamrat merge -idxdir STR -overlap MAX-MIN [-with STR1[:STR2] -interv STR[:FLOAT] -min-nbkmer INT -outpath STR -withcounts STR]

[OPTION]    -h,-help               Print the helper
            -idxdir STR            Indexing folder by KaMRaT index, mandatory
            -overlap MAX-MIN       Overlap range for extension, by default: from (k-1) to ⌊k/2⌋
                                       MIN and MAX are integers, MIN <= MAX < k-mer length
            -with STR1[:STR2]      File indicating k-mers to be extended (STR1) and rep-mode (STR2)
                                       if not provided, all indexed k-mers are used for extension
                                       in the file STR1, a supplementary column of rep-value can be provided
                                       STR2 can be one of {min, minabs, max, maxabs} [min]
            -interv STR[:FLOAT]    Intervention method for extension [pearson:0.20]
                                       can be one of {none, pearson, spearman, mac}
                                       the threshold may follow a ':' symbol
            -min-nbkmer INT        Minimal length of extended contigs, in number of k-mers [0]
            -outpath STR           Path to extension results
                                       if not provided, output to screen
            -withcounts STR        Output sample count vectors, STR can be one of [mean, median]
                                       if not provided, output without count vector
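To illustrate what the -overlap MAX-MIN range controls, here is a simplified Python sketch that joins two sequences on their longest suffix-prefix overlap within the range. It ignores the -interv check and any other rule of KaMRaT merge, so it is not KaMRaT's actual algorithm; default_overlap_range and try_extend are hypothetical names.

```python
def default_overlap_range(k):
    """Hypothetical helper: default from the helper text, MAX = k - 1, MIN = floor(k / 2)."""
    return k - 1, k // 2


def try_extend(left, right, max_ov, min_ov):
    """Hypothetical helper: left extended by right, or None if no overlap in [min_ov, max_ov] matches."""
    # Try the longest overlap first; an overlap below 1 would be meaningless.
    for ov in range(max_ov, max(min_ov, 1) - 1, -1):
        if left[-ov:] == right[:ov]:
            return left + right[ov:]
    return None


max_ov, min_ov = default_overlap_range(5)            # (4, 2)
print(try_extend("ACGTA", "CGTAC", max_ov, min_ov))  # ACGTAC   (overlap of 4)
print(try_extend("ACGTA", "TAGGG", max_ov, min_ov))  # ACGTAGGG (overlap of 2)
print(try_extend("ACGTA", "GGGGG", max_ov, min_ov))  # None
```

Lowering MIN admits shorter, less specific overlaps; the default floor of k/2 keeps at least half of a k-mer shared between joined sequences.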

Three intervention methods are available:

  • pearson: Pearson distance, i.e., 0.5 * [1 - pearson.correlation(x, y)]
  • spearman: Spearman distance, i.e., 0.5 * [1 - spearman.correlation(x, y)]
  • mac: mean absolute contrast, as described in [Nguyen, H. T., et al., 2021]

The threshold on these distances lies in [0, 1], where 0 is the strictest setting and 1 the most permissive one (equivalent to none).
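The Pearson and Spearman distances above can be computed as in this Python sketch, where x and y are the count vectors (one value per sample) of the two sequences being joined, and Spearman correlation is the Pearson correlation of average ranks. The sketch assumes an extension is accepted when the distance is at most the threshold (the helper does not say whether the comparison is strict); can_extend and the other function names are hypothetical. MAC is left out because its exact formula is given in the cited article.

```python
import math


def pearson_r(x, y):
    """Pearson correlation (undefined for constant vectors)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(v):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_dist(x, y):
    return 0.5 * (1 - pearson_r(x, y))


def spearman_dist(x, y):
    return 0.5 * (1 - pearson_r(average_ranks(x), average_ranks(y)))


def can_extend(x, y, threshold=0.20, distance=pearson_dist):
    """Hypothetical helper: assumption, accept when distance <= threshold."""
    return distance(x, y) <= threshold


print(can_extend([1, 2, 3, 4], [2, 4, 6, 8]))  # True  (r = 1, distance ~ 0)
print(can_extend([1, 2, 3, 4], [8, 6, 4, 2]))  # False (r = -1, distance ~ 1)
```

Spearman distance is insensitive to non-linear but monotonic count relationships: [1, 2, 3, 4] and [1, 4, 9, 16] have identical ranks and thus a Spearman distance of about 0.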

KaMRaT score: score features by classification performance, statistical significance, correlation, or variability

[USAGE]    kamrat score -idxdir STR -scoreby STR -design STR [-with STR1[:STR2] -seltop NUM -outpath STR -withcounts]

[OPTION]    -h,-help             Print the helper
            -idxdir STR          Indexing folder by KaMRaT index, mandatory
            -scoreby STR         Scoring method, mandatory, can be one of: 
                                     classification (binary sample labels given by design file)
                                         ttest.padj      adjusted p-value of t-test between conditions
                                         ttest.pi        π-value of t-test between conditions
                                         snr             signal-to-noise ratio between conditions
                                         lr:nfold        accuracy by logistic regression classifier
                                     classification (binary or multiple sample labels given by design file)
                                         dids            DIDS score
                                         bayes:nfold     accuracy by naive Bayes classifier
                                     correlation evaluation (continuous sample labels given by design file)
                                         pearson         Pearson correlation with the continuous sample condition
                                         spearman        Spearman correlation with the continuous sample condition
                                     unsupervised evaluation (no design file required)
                                         sd              standard deviation
                                         rsd1            standard deviation adjusted by mean
                                         rsd2            standard deviation adjusted by min
                                         rsd3            standard deviation adjusted by median
                                         entropy         entropy of sample counts + 1
            -design STR          Path to file indicating sample-condition design, mandatory unless using sd, rsd1, rsd2, rsd3, entropy
                                     without header line, each row can be either: 
                                         sample name, sample condition
                                         sample name, sample condition, sample batch (only for lr and bayes)
            -with STR1[:STR2]    File indicating features to score (STR1) and counting mode (STR2)
                                     if not provided, all indexed features are used for scoring
                                     STR2 can be one of [rep, mean, median]
            -seltop NUM          Select top scored features
                                     if NUM > 1, number of top features to select (should be integer)
                                     if 0 < NUM <= 1, ratio of top features to select
                                     if absent or NUM <= 0, output all features
            -outpath STR         Path to scoring result
                                     if not provided, output to screen
            -withcounts          Output sample count vectors [false]

[NOTE]      For scoring methods lr and bayes, a univariate cross-validation fold number (nfold) can be provided
                if nfold = 0, leave-one-out cross-validation
                if nfold = 1, without cross-validation, training and testing on the whole dataset
                if nfold > 1, n-fold cross-validation
            For t-test scoring methods, a transformation log2(x + 1) is applied to sample counts
            For SVM scoring, sample counts standardization is applied feature by feature
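The -seltop rule can be sketched in Python as follows, assuming the features are already sorted from best to worst score. The helper does not state how a ratio is converted into a feature count; rounding up is an assumption here, and select_top is a hypothetical name.

```python
import math


def select_top(sorted_features, seltop=None):
    """Hypothetical helper applying the -seltop NUM rule to best-first features."""
    n = len(sorted_features)
    if seltop is None or seltop <= 0:
        return list(sorted_features)                   # absent or NUM <= 0: all features
    if seltop > 1:
        return list(sorted_features[:int(seltop)])     # NUM > 1: number of features
    # 0 < NUM <= 1: ratio of features (assumption: rounded up)
    return list(sorted_features[:math.ceil(seltop * n)])


features = ["f1", "f2", "f3", "f4", "f5"]
print(select_top(features, 2))    # ['f1', 'f2']
print(select_top(features, 0.5))  # ['f1', 'f2', 'f3']
print(select_top(features))       # ['f1', 'f2', 'f3', 'f4', 'f5']
```

Note that NUM = 1 falls into the ratio branch and therefore keeps every feature rather than only the top one.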

For a detailed description of some scoring methods, please refer to the supplementary document of our article.
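For orientation only, a few of the simpler scores can be written with common textbook definitions, as in this Python sketch. It assumes the sample standard deviation (n - 1 denominator), reads "adjusted by" as "divided by", and uses SNR = (mean_A - mean_B) / (sd_A + sd_B); these are assumptions, the function names are hypothetical, and the supplementary document remains the reference for KaMRaT's exact formulas.

```python
import statistics


def sd(x):
    return statistics.stdev(x)                         # assumption: sample SD (n - 1)


def rsd1(x):
    return statistics.stdev(x) / statistics.mean(x)    # assumption: SD divided by mean


def rsd2(x):
    return statistics.stdev(x) / min(x)                # assumption: SD divided by min (undefined if min is 0)


def rsd3(x):
    return statistics.stdev(x) / statistics.median(x)  # assumption: SD divided by median


def snr(x, labels, cond_a, cond_b):
    """Hypothetical helper: signal-to-noise ratio between two design conditions."""
    a = [v for v, lab in zip(x, labels) if lab == cond_a]
    b = [v for v, lab in zip(x, labels) if lab == cond_b]
    return (statistics.mean(a) - statistics.mean(b)) / (statistics.stdev(a) + statistics.stdev(b))


counts = [10, 12, 2, 4]
labels = ["A", "A", "B", "B"]
print(round(snr(counts, labels, "A", "B"), 3))  # 2.828
print(round(rsd1(counts), 3))                   # 0.68
```

Dividing the standard deviation by a location statistic (rsd1 to rsd3) keeps highly expressed features from dominating the ranking purely because of their scale.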

KaMRaT score has an alias, KaMRaT rank, which shares the same usage as described above. Please prefer the name "score" over "rank": the alias only ensures compatibility with previous projects and may be deprecated in a future release.

KaMRaT query: query sequences

[USAGE]    kamrat query -idxdir STR -fasta STR -toquery STR [-withabsent -outpath STR]

[OPTION]    -h,-help         Print the helper
            -idxdir STR      Indexing folder by KaMRaT index, mandatory
            -fasta STR       Sequence fasta file, mandatory
            -toquery STR     Query method, mandatory, can be one of:
                                 mean        mean count among all composite k-mers for each sample
                                 median      median count among all composite k-mers for each sample
            -withabsent      Also output absent queries (all-zero count vector) [false]
            -outpath STR     Path to query results
                                 if not provided, output to screen
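The two -toquery modes can be sketched in Python as below: each query sequence is cut into its composite k-mers and, sample by sample, their counts are averaged (mean) or their median is taken. The dictionary standing in for the index, the rule that k-mers missing from the index count as 0, and the absence of canonicalization are all assumptions; query_sequence is a hypothetical name.

```python
import statistics


def query_sequence(seq, index, k, n_samples, mode="mean"):
    """Hypothetical helper: aggregate per-sample counts of the k-mers composing seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("query sequence is shorter than k")
    # Assumption: k-mers absent from the index contribute a count of 0.
    vectors = [index.get(km, [0] * n_samples) for km in kmers]
    agg = statistics.mean if mode == "mean" else statistics.median
    return [agg([v[j] for v in vectors]) for j in range(n_samples)]


index = {"ACGT": [4, 0], "CGTA": [2, 6], "GTAC": [9, 3]}  # k = 4, 2 samples
print(query_sequence("ACGTAC", index, 4, 2, "mean"))    # [5, 3]
print(query_sequence("ACGTAC", index, 4, 2, "median"))  # [4, 3]
```

The median is less sensitive than the mean to a single composite k-mer with an outlying count, as the first sample shows (9 pulls the mean up to 5, while the median stays at 4).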