This script converts the output files generated by featureCounts into a simpler (and smaller) counts matrix.
usage: process-featurecounts [-h] [-V] [-v {error,warning,info,debug}] [-r <re>] [-s <re>] [-e] [-i <n>] [-k <n,[n]>] <file>
Reformat featureCounts output files
positional arguments:
  <file>                input featureCounts file

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -v {error,warning,info,debug}, --verbose {error,warning,info,debug}
                        Set logging level (default debug)
  -r <re>, --id-regex <re>
                        row ID regular expression to use (default
                        "^([^.]+)\.\d+(.*)$")
  -s <re>, --sample-regex <re>
                        sample name regular expression to use (default
                        "^(.*)$")
  -e, --include-header  include header comments
  -i <n>, --id-col <n>  gene ID column (default 1)
  -k <n,[n]>, --skip <n,[n]>
                        comma-separated columns to skip (default 2,3,4,5,6)
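
To make the effect of the --id-col and --skip defaults concrete, here is a minimal Python sketch of the column selection. It is an illustration only, not the script's actual code: the function name simplify_rows and the example rows are made up, and it assumes tab-separated input with 1-based column numbers and '#' comment lines, as in standard featureCounts output.

```python
def simplify_rows(lines, id_col=1, skip=(2, 3, 4, 5, 6)):
    """Yield [row_id, count, ...] rows from featureCounts-style lines.

    Hypothetical helper: column numbers are assumed to be 1-based and
    comment lines starting with '#' are dropped (i.e. no -e flag).
    """
    drop = set(skip) | {id_col}
    for line in lines:
        if line.startswith("#"):
            continue  # header comment line
        fields = line.rstrip("\n").split("\t")
        row_id = fields[id_col - 1]
        counts = [v for i, v in enumerate(fields, start=1) if i not in drop]
        yield [row_id] + counts


# Made-up example input in the featureCounts layout
example = [
    "# Program:featureCounts (header comment)\n",
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample1.bam\tsample2.bam\n",
    "ENSG00000223972.5\tchr1\t11869\t14409\t+\t1735\t0\t3\n",
]

for row in simplify_rows(example):
    print("\t".join(row))
```

With the defaults, the Chr, Start, End, Strand and Length columns are dropped, leaving the gene ID followed by one count column per sample.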
process-featurecounts trims the header sample names and the gene IDs using the regular expressions given by --sample-regex and --id-regex, respectively. After matching, all captured groups are concatenated to yield the output.
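
The capture-group concatenation described above can be sketched in Python as follows. This is a hedged illustration rather than the script's own implementation: trim_label is a hypothetical name, the example labels are made up, and passing non-matching labels through unchanged is an assumption (the script may handle that case differently).

```python
import re

# Default patterns as listed in the help text
ID_REGEX = r"^([^.]+)\.\d+(.*)$"
SAMPLE_REGEX = r"^(.*)$"


def trim_label(label, pattern):
    """Return the concatenation of all groups captured by `pattern`.

    Assumption: a label that does not match is returned unchanged.
    """
    match = re.match(pattern, label)
    if match is None:
        return label
    # Groups that did not participate in the match are None; treat as empty
    return "".join(group or "" for group in match.groups())


print(trim_label("ENSG00000223972.5", ID_REGEX))
print(trim_label("ENSG00000182378.14_PAR_Y", ID_REGEX))
print(trim_label("aligned/sample1.bam", r"^.*/([^/]+)\.bam$"))
```

With the default ID pattern, the version suffix (".5", ".14") is removed while any text after it is kept. The default sample pattern captures the whole name, so sample names are left as they are unless a custom pattern such as the last one is supplied.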
Installation should be as simple as:
git clone https://github.com/alastair-droop/process-featurecounts.git
cd process-featurecounts
python setup.py install