In [1]:
%load_ext memory_profiler

## 1. Preparation of individual input files in `data` folder
After processing the data with Cell Ranger ARC, you need the following files:
* RNA read count matrix ("Filtered feature barcode matrix MEX (DIR)")
* ATAC reads ("ATAC Position-sorted alignments (BAM)")
* Optional (or from other software): cell clustering (inside "Secondary analysis outputs (DIR)")


In [2]:
# Add the dictys conda environment's bin folder to PATH
import os
os.environ['PATH'] = os.path.expanduser('~/.conda/envs/dictys/bin') + ':' + os.environ['PATH']
print(os.getenv('PATH'))

/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/.conda/envs/dictys/bin:/opt/gridware/apps/gcc/R/4.3.2/bin:/opt/gridware/compilers/gcc/11.2.0/bin:/opt/gridware2/apps/binapps/anaconda3/2019.07/bin:/opt/gridware2/apps/binapps/anaconda3/2019.07/condabin:/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/.local/bin:/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin


In [3]:
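# Remove the CPU usage limit set by some Jupyter versions
import os
os.environ['KMP_AFFINITY'] = ''
# Create the input data folder used by the cells below (they all `cd ./data`)
current_path = os.getcwd()
os.makedirs(os.path.join(current_path, 'data'), exist_ok=True)
print(current_path)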

/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys


### expression.tsv.gz
Read count matrix of RNA-profiled cells in compressed tsv format. Downloaded and converted from "Filtered feature barcode matrix MEX (DIR)".

In [11]:
%%bash
cd ./data
# Download expression data in mtx.gz format
wget -q -o /dev/null https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_filtered_feature_bc_matrix.tar.gz
tar xf pbmc_granulocyte_sorted_10k_filtered_feature_bc_matrix.tar.gz
rm pbmc_granulocyte_sorted_10k_filtered_feature_bc_matrix.tar.gz
# Convert from mtx.gz to tsv.gz format using helper script `expression_mtx.py`.
dictys_helper expression_mtx.py filtered_feature_bc_matrix expression.tsv.gz
rm -Rf filtered_feature_bc_matrix
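Once `expression.tsv.gz` exists, a quick sanity check can count genes and cells without loading the whole matrix into memory. This is a minimal sketch: `summarize_expression` is a hypothetical helper, not part of dictys, and it assumes genes are in rows and cells in columns, with a header row of cell names (optionally preceded by an empty index field).

```python
import gzip


def summarize_expression(path):
    """Return (n_genes, n_cells) for a genes-by-cells expression.tsv.gz.

    Assumed layout (an assumption, not confirmed by dictys docs): a header row of
    cell names, optionally preceded by an empty index field, then one row per gene:
    <gene name> <tab> <count for cell 1> <tab> <count for cell 2> ...
    """
    with gzip.open(path, "rt") as f:
        header = f.readline().rstrip("\n").split("\t")
        n_genes = 0
        width = None  # number of tab-separated fields in each gene row
        for lineno, line in enumerate(f, start=2):
            n_fields = line.count("\t") + 1
            if width is None:
                width = n_fields
            elif n_fields != width:
                raise ValueError(f"line {lineno}: {n_fields} fields, expected {width}")
            n_genes += 1
    # Gene rows carry one name field plus one count per cell
    n_cells = width - 1 if width is not None else len(header)
    if len(header) not in (n_cells, n_cells + 1):
        raise ValueError("header length does not match the gene rows")
    return n_genes, n_cells
```

If the layout assumption holds, `summarize_expression("data/expression.tsv.gz")` should agree with the gene and cell counts that `makefile_check.py` reports in section 2.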




### bams
This folder contains one BAM file for each cell with a chromatin accessibility measurement; each file should be named after its cell. Downloaded and converted from "ATAC Position-sorted alignments (BAM)".


In [12]:
%%bash
set -eo pipefail
cd ./data
# Download chromatin accessibility reads in bam format
wget --debug --progress=bar:force:noscroll -O bams.bam https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_atac_possorted_bam.bam


Setting --progress (progress) to bar:force:noscroll
Setting --output-document (outputdocument) to bams.bam
DEBUG output created by Wget 1.21.4 on linux-gnu.

Reading HSTS entries from /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/.wget-hsts
URI encoding = ‘UTF-8’
--2024-07-18 23:47:43--  https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_atac_possorted_bam.bam
Resolving cf.10xgenomics.com (cf.10xgenomics.com)... 104.18.1.173, 104.18.0.173, 2606:4700::6812:1ad, ...
Caching cf.10xgenomics.com => 104.18.1.173 104.18.0.173 2606:4700::6812:1ad 2606:4700::6812:ad
Connecting to cf.10xgenomics.com (cf.10xgenomics.com)|104.18.1.173|:443... connected.
Created socket 4.
Releasing 0x000055ae7779bcf0 (new refcount 1).
Initiating SSL handshake.
Handshake successful; connected socket 4 to SSL handle 0x000055ae7779d830
certificate:
  subject: CN=10xgenomics.com
  issuer:  CN=E1,O=Let's Encrypt,C=US
X509 certificate successfully

In [16]:
%%bash
set -eo pipefail
cd ./data
dictys_helper split_bam.sh bams.bam bams --section "CB:Z:" --ref_expression expression.tsv.gz
rm bams.bam
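`split_bam.sh` is called with `--section "CB:Z:"`, the SAM tag in which Cell Ranger ARC records the corrected cell barcode of each read. The grouping idea can be sketched on SAM text lines as below. This is not the helper's implementation (which works on BAM files and must also write a header into each per-cell file), and treating `--ref_expression` as a list of cells to keep (the `keep` parameter) is an assumption.

```python
from collections import defaultdict


def split_sam_by_barcode(sam_lines, keep=None, tag="CB:Z:"):
    """Group SAM alignment lines by cell barcode (hypothetical helper).

    sam_lines: iterable of SAM text lines; header lines starting with '@' are skipped.
    keep: optional set of barcodes to retain (assumed role of --ref_expression).
    Returns {barcode: [alignment lines]}; reads without the tag are dropped.
    """
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM has 11 mandatory columns; optional tags start at index 11
        for field in fields[11:]:
            if field.startswith(tag):
                barcode = field[len(tag):]
                if keep is None or barcode in keep:
                    groups[barcode].append(line)
                break
    return dict(groups)
```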

### subsets & subsets.txt
* subsets.txt: Names of cell subsets, one per line. A separate gene regulatory network (GRN) is reconstructed for each subset.
* subsets: Folder containing one subfolder for each cell subset listed in `subsets.txt`. Each subfolder contains two files:
    - names_rna.txt: Names of cells that belong to this subset and have a transcriptome measurement
    - names_atac.txt: Names of cells that belong to this subset and have a chromatin accessibility measurement

For joint measurements of RNA and ATAC, these two files should be identical in every subfolder.
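The layout described above can be checked with a short sketch before running inference. `check_subsets` is a hypothetical helper: it only tests that every subset listed in `subsets.txt` has both name files and, for joint data, that the two files list the same cells in the same order (requiring identical order is an assumption; dictys may not care about order).

```python
from pathlib import Path


def check_subsets(data_dir, joint=True):
    """Return a list of problems found in subsets.txt and the subsets/ folder."""
    data_dir = Path(data_dir)
    problems = []
    names = (data_dir / "subsets.txt").read_text().split()
    for name in names:
        folder = data_dir / "subsets" / name
        rna = folder / "names_rna.txt"
        atac = folder / "names_atac.txt"
        missing = [p.name for p in (rna, atac) if not p.is_file()]
        if missing:
            problems.append(f"{name}: missing {', '.join(missing)}")
            continue
        # Joint RNA+ATAC profiling: both files should name the same cells
        if joint and rna.read_text().split() != atac.read_text().split():
            problems.append(f"{name}: names_rna.txt and names_atac.txt differ")
    return problems
```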


In [17]:
%%bash
cd ./data
wget -q -o /dev/null https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_analysis.tar.gz
#Extract cell names for each cluster
tar xf pbmc_granulocyte_sorted_10k_analysis.tar.gz 
mv analysis/clustering/gex/graphclust/clusters.csv clusters.csv
rm -Rf pbmc_granulocyte_sorted_10k_analysis.tar.gz analysis


Finally, reformat clusters.csv for input:

In [18]:
%%bash
cd ./data
subsets="$(tail -n +2 clusters.csv | awk -F , '{print $2}' | sort -u)"
echo "$subsets" | awk '{print "Subset"$1}' > subsets.txt
for x in $subsets; do
	mkdir -p "subsets/Subset$x"
	grep ",$x"'$' clusters.csv | awk -F , '{print $1}' > "subsets/Subset$x/names_rna.txt"
	# RNA and ATAC barcodes are the same for joint quantifications
	cp "subsets/Subset$x/names_rna.txt" "subsets/Subset$x/names_atac.txt"
done
rm clusters.csv
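For readers who prefer Python, the bash cell above can be sketched as follows. `clusters_to_subsets` is a hypothetical helper; like the bash version, it assumes `clusters.csv` has a header row followed by `barcode,cluster` rows, and it sorts cluster IDs as text, which gives the `Subset1, Subset10, Subset11, ...` order seen in the inference log.

```python
import csv
from collections import defaultdict
from pathlib import Path


def clusters_to_subsets(clusters_csv, data_dir):
    """Write subsets.txt and subsets/Subset<k>/names_{rna,atac}.txt from clusters.csv."""
    data_dir = Path(data_dir)
    members = defaultdict(list)  # cluster ID -> barcodes, in file order
    with open(clusters_csv, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row (Barcode,Cluster)
        for row in reader:
            if not row:
                continue
            barcode, cluster = row[0], row[1]
            members[cluster].append(barcode)
    # Text sort, as `sort -u` produces for these IDs: "1", "10", "11", ..., "2"
    clusters = sorted(members)
    (data_dir / "subsets.txt").write_text("".join(f"Subset{c}\n" for c in clusters))
    for c in clusters:
        folder = data_dir / "subsets" / f"Subset{c}"
        folder.mkdir(parents=True, exist_ok=True)
        text = "".join(f"{b}\n" for b in members[c])
        (folder / "names_rna.txt").write_text(text)
        # RNA and ATAC barcodes are the same for joint quantifications
        (folder / "names_atac.txt").write_text(text)
    return clusters
```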


### Other files
Motifs, reference genome, and gene transcription start sites (TSS).

In [21]:
%%bash
cd ./data

# Motifs (file motifs.motif)
# from HOCOMOCO (https://hocomoco11.autosome.org/)
wget -q -o /dev/null -O motifs.motif 'https://hocomoco11.autosome.org/final_bundle/hocomoco11/full/HUMAN/mono/HOCOMOCOv11_full_HUMAN_mono_homer_format_0.0001.motif'






In [24]:
!head -n 18 ./data/motifs.motif

>dKhGCGTGh	AHR_HUMAN.H11MO.0.B	3.3775000000000004
0.262728374765856	0.1227600511842322	0.362725638699551	0.25178593535036087
0.07633328991810645	0.08258130543118362	0.22593295481662123	0.6151524498340887
0.14450570038747923	0.28392173880411337	0.13815442099009081	0.4334181398183167
0.023935814057894068	0.016203821748029118	0.9253278681170539	0.03453249607702277
0.007919544273173793	0.953597675415874	0.017308392078009837	0.021174388232942286
0.02956192959210962	0.012890110758086997	0.9474192747166682	0.010128684933135217
0.007919544273173797	0.029561929592109615	0.012337825593096645	0.9501807005416201
0.007919544273173793	0.007919544273173793	0.9762413671804787	0.007919544273173793
0.27886589130660366	0.4285328543459993	0.10955683916661985	0.18304441518077724
>hnnGGWWnddWWGGdbWh	AIRE_HUMAN.H11MO.0.C	5.64711
0.38551919443239085	0.2604245534178759	0.1353299124033618	0.21872633974637148
0.18745267949274294	0.18745267949274294	0.14575446582123766	0.4793401751932764
0.1457544658
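As the output shows, each HOMER-format motif starts with a `>` header holding the consensus sequence, the motif name and a log-odds detection threshold (tab-separated), followed by one row of A/C/G/T probabilities per position. A minimal parser, sketched here as a hypothetical helper, makes it easy to count motifs or check that each row sums to 1:

```python
def parse_homer_motifs(text):
    """Parse HOMER-format motif text into a list of dicts (hypothetical helper)."""
    motifs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: consensus, name, log-odds threshold (extra fields ignored)
            fields = line[1:].split("\t")
            motifs.append({
                "consensus": fields[0],
                "name": fields[1],
                "threshold": float(fields[2]),
                "matrix": [],
            })
        else:
            if not motifs:
                raise ValueError("matrix row found before the first '>' header")
            # One position: probabilities of A, C, G, T
            motifs[-1]["matrix"].append([float(x) for x in line.split()])
    return motifs
```

Applied to `data/motifs.motif`, the number of parsed motifs should match the count reported by `makefile_check.py` in section 2.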

In [23]:
%%bash
cd ./data
# Reference genome (folder genome)
# Download genome from HOMER
dictys_helper genome_homer.sh hg38 genome

Process is interrupted.


In [30]:
%%bash
ls -h1s ./data/genome | head

total 5.3G
113K annotations
153K chrom.sizes
3.7G genome.fa
4.2M hg38.aug
 51M hg38.basic.annotation
822M hg38.full.annotation
641K hg38.miRNA
616M hg38.repeats
 32M hg38.rna


In [29]:
%%bash
cd ./data
# Bed file for TSS (file gene.bed)
# Download the GTF file from Ensembl
wget -q -o /dev/null -O gene.gtf.gz http://ftp.ensembl.org/pub/release-107/gtf/homo_sapiens/Homo_sapiens.GRCh38.107.gtf.gz
gunzip gene.gtf.gz
# Convert to bed
dictys_helper gene_gtf.sh gene.gtf gene.bed
rm gene.gtf

In [31]:
!head ./data/gene.bed

chr1	11869	14409	DDX11L1	.	+
chr1	14404	29570	WASH7P	.	-
chr1	17369	17436	MIR6859-1	.	-
chr1	29554	31109	MIR1302-2HG	.	+
chr1	30366	30503	MIR1302-2	.	+
chr1	34554	36081	FAM138A	.	-
chr1	52473	53312	OR4G4P	.	+
chr1	57598	64116	OR4G11P	.	+
chr1	65419	71585	OR4F5	.	+
chr1	131025	134836	CICP27	.	+
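The sample output suggests the shape of `gene.bed`: one row per gene with a `chr`-prefixed chromosome, start, end, gene name, `.` and strand, with coordinates copied unchanged from the GTF (Ensembl places DDX11L1 at 11869 to 14409). The sketch below reproduces that shape from Ensembl `gene` records; it is not the code of `gene_gtf.sh`, and skipping genes without a `gene_name` attribute is an assumption. The TSS is the start on the `+` strand and the end on the `-` strand.

```python
import re


def gtf_genes_to_bed(gtf_lines, chrom_prefix="chr"):
    """Convert Ensembl GTF 'gene' records into gene.bed-style tuples (hypothetical helper).

    Output: (chrom, start, end, gene_name, '.', strand), coordinates copied as-is.
    """
    rows = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match is None:
            continue  # assumption: unnamed genes are skipped
        chrom = fields[0] if fields[0].startswith(chrom_prefix) else chrom_prefix + fields[0]
        rows.append((chrom, int(fields[3]), int(fields[4]), match.group(1), ".", fields[6]))
    return rows


def tss(row):
    """Transcription start site of a gene.bed row: start on '+', end on '-'."""
    _chrom, start, end, _name, _score, strand = row
    return start if strand == "+" else end
```

Note that naive prefixing turns Ensembl's `MT` into `chrMT`, whereas UCSC-style hg38 genomes name the mitochondrial chromosome `chrM`.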


### Clean up

In [32]:
!rm -Rf ./data/filtered_feature_bc_matrix
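At this point `data` should hold every input described in this section. A presence-only check is sketched below (hypothetical helper); it does not inspect file contents, which is what `makefile_check.py` does when validating the input data.

```python
from pathlib import Path

# Inputs prepared in section 1
EXPECTED_FILES = ["expression.tsv.gz", "subsets.txt", "motifs.motif", "gene.bed"]
EXPECTED_DIRS = ["bams", "subsets", "genome"]


def missing_inputs(data_dir="data"):
    """Return expected inputs absent from data_dir (folders get a trailing '/')."""
    data_dir = Path(data_dir)
    missing = [f for f in EXPECTED_FILES if not (data_dir / f).is_file()]
    missing += [d + "/" for d in EXPECTED_DIRS if not (data_dir / d).is_dir()]
    return missing
```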

## 2. Network inference configuration
### Generate configurations
Adjust these configurations for your own machine and dataset.

In [4]:
pwd

'/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys'

In [5]:
%%bash
# Generate configuration template
rm -Rf ./makefiles
mkdir ./makefiles
cd ./makefiles
dictys_helper makefile_template.sh common.mk config.mk env_none.mk static.mk

# Update configurations, such as:
# DEVICE: pytorch device, e.g. cpu, cuda:0. If you do not have a GPU, use 'cpu' and expect LONG computing time.
# GENOME_MACS2: effective genome size for macs2. See https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html
# JOINT: whether dataset is joint profiling of RNA and ATAC.
# Other configurations include quality control thresholds, the number of threads per job, the number of hidden confounders, etc.
# These are described in the full-multiome tutorial.
dictys_helper makefile_update.py ../makefiles/config.mk '{"DEVICE": "cpu", "GENOME_MACS2": "hs", "JOINT": "1"}'
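`config.mk` consists of plain `KEY=value` makefile assignments, and `makefile_update.py` receives the new values as a JSON object. The update can be sketched as follows; `update_makefile_vars` is a hypothetical stand-in, not the helper's actual logic. It only recognizes simple `KEY=value` lines (not `:=` or `?=`), and appending keys that are not already present is an assumption.

```python
import re

# Variable name at the start of a simple `KEY=value` makefile assignment
ASSIGNMENT = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*=")


def update_makefile_vars(text, updates):
    """Return makefile text with KEY=value lines set from `updates`.

    Every matching assignment is rewritten; comments and other lines are kept;
    keys not found in the text are appended at the end.
    """
    seen = set()
    out = []
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if match and match.group(1) in updates:
            key = match.group(1)
            line = f"{key}={updates[key]}"
            seen.add(key)
        out.append(line)
    out.extend(f"{key}={value}" for key, value in updates.items() if key not in seen)
    return "\n".join(out) + "\n"
```

The JSON string from the cell above can be passed through `json.loads` to build `updates`.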

In [6]:
!cat ./makefiles/config.mk

# Lingfei Wang, 2022. All rights reserved.
#This file contains parameters for whole run and individual steps to be edited for your dataset
#This file should be edited to configure the run
#This file should NOT be directly used for any run with `makefile -f` 

############################################################
# Run environment settings
############################################################
#Which environment to use, corresponding to env_$(ENVMODE).mk file
ENVMODE=none
#Maximum number of CPU threads for each job
#This is only nominated and passed through to other softwares without any guarantee.
NTH=4
#Device name for pyro/pytorch
#Note: cuda devices other than cuda:0 could be incompatible with singularity environment
DEVICE=cpu

############################################################
# Dataset settings
############################################################

#Genome size for Macs2, accept shortcuts like mm & hs
GENOME_MACS2=hs
#Whether d

### Validate input data

In [7]:
%%bash
/usr/bin/time -v cd . &&  /usr/bin/time -v dictys_helper makefile_check.py

	Command being timed: "cd ."
	User time (seconds): 0.00
	System time (seconds): 0.00
	Percent of CPU this job got: 33%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 3328
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 189
	Voluntary context switches: 2
	Involuntary context switches: 0
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0


Joint profile: True
Found 11909 cells with RNA profile
Found 24036 genes with RNA profile
Found 11909 cells with ATAC profile
Found 769 motifs
Found 678 TFs
Found 461 TFs in current dataset
Missing 217 TFs in current dataset: ANDR,AP2A,AP2B,AP2C,AP2D,ARI3A,ARI5B,ATF6A,BARH1,BARH2,BC11A,BHA15,BHE22,BHE23,BHE40,BHE41,BMAL1,BRAC,BSH,COE1,COT1,COT2,CR3L1,CR3L2,ERR1,ERR2,ERR3,EVI1,GCR,HEN1,HMBX1,HME1,HME2,HNF6,HTF4,HXA1,HXA10,HXA11,HXA13,HXA2,HXA5,HXA7,HXA9,HXB1,HXB13,HXB2,HXB3,HXB4,HXB6,HXB7,HXB8,HXC10,HXC11,HXC12,HXC13,HXC6,HXC8,HXC9,HXD10,HXD11,HXD12,HXD13,HXD3,HXD4,HXD8,HXD9,ITF2,KAISO,MCR,MGAP,MLXPL,MYBA,MYBB,NDF1,NDF2,NF2L1,NF2L2,NFAC1,NFAC2,NFAC3,NFAC4,NGN2,NKX21,NKX22,NKX23,NKX25,NKX28,NKX31,NKX32,NKX61,NKX62,ONEC2,ONEC3,OZF,P53,P5F1B,P63,P73,PEBB,PHX2A,PHX2B,PIT1,PKNX1,PLAL1,PO2F1,PO2F2,PO2F3,PO3F1,PO3F2,PO3F3,PO3F4,PO4F1,PO4F2,PO4F3,PO5F1,PO6F1,PO6F2,PRD14,PRGR,RHXF1,RORG,RX,SMCA1,SMCA5,SRBP1,SRBP2,STA5A,STA5B,STF1,SUH,TF2LX,TF65,TF7L1,TF7L2,TFE2,THA,THA11,THB,TWST1,TYY1,TYY2,UBIP
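HOCOMOCO v11 names each motif after the UniProt entry name of its TF (for example `AHR_HUMAN.H11MO.0.B`), while `expression.tsv.gz` uses gene symbols. Where the two differ (ANDR vs AR, P53 vs TP53, GCR vs NR3C1), a name-based match fails, which plausibly explains many of the 217 "missing" TFs listed above. The sketch below assumes TFs are matched by the motif-name prefix; dictys's exact rule is not shown here, although 769 motifs collapsing to 678 TFs is consistent with this kind of deduplication. Both helpers are hypothetical.

```python
def motif_tf_names(motif_names):
    """Unique TF identifiers from HOCOMOCO v11 motif names, e.g. 'AHR_HUMAN.H11MO.0.B' -> 'AHR'."""
    return sorted({name.split("_", 1)[0] for name in motif_names})


def split_by_presence(tf_names, genes):
    """Split TF identifiers into (found, missing) with respect to a list of gene symbols."""
    gene_set = set(genes)
    found = [tf for tf in tf_names if tf in gene_set]
    missing = [tf for tf in tf_names if tf not in gene_set]
    return found, missing
```

Mapping the missing identifiers to gene symbols would need a UniProt-to-HGNC table, which this sketch does not include.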

	Command being timed: "dictys_helper makefile_check.py"
	User time (seconds): 20.91
	System time (seconds): 3.98
	Percent of CPU this job got: 53%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:46.11
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 7925516
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 2119390
	Voluntary context switches: 69043
	Involuntary context switches: 561
	Swaps: 0
	File system inputs: 6440144
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
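When benchmarking runs like this one, each `/usr/bin/time -v` report can be reduced to a few numbers. The sketch below (hypothetical helpers) extracts the command, wall-clock time, peak resident memory and exit status from captured GNU time output; the second report above, for instance, records a peak RSS of 7925516 kB, about 7.6 GiB.

```python
import re


def parse_gnu_time(report):
    """Wall-clock seconds, peak RSS in GiB and exit status from one `time -v` report."""
    wall = re.search(r"Elapsed \(wall clock\) time \(h:mm:ss or m:ss\): ([\d:.]+)", report)
    rss = re.search(r"Maximum resident set size \(kbytes\): (\d+)", report)
    status = re.search(r"Exit status: (\d+)", report)
    if not (wall and rss and status):
        raise ValueError("not a GNU time -v report")
    # "h:mm:ss" or "m:ss.ss" -> seconds
    seconds = 0.0
    for part in wall.group(1).split(":"):
        seconds = seconds * 60 + float(part)
    return {
        "wall_seconds": seconds,
        "max_rss_gib": int(rss.group(1)) / 1024**2,  # kB -> GiB
        "exit_status": int(status.group(1)),
    }


def parse_all(text):
    """Parse every report in text that holds several concatenated `time -v` outputs."""
    runs = []
    for chunk in text.split("Command being timed:")[1:]:
        run = parse_gnu_time(chunk)
        run["command"] = chunk.split("\n", 1)[0].strip().strip('"')
        runs.append(run)
    return runs
```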


## 3. Network inference
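Peak calling during network inference goes through MACS2, and the command line recorded in the log uses `--nomodel --shift -75 --extsize 150`. Per the MACS2 documentation, `--shift` moves each read's 5' end along its 5' to 3' direction (a negative value moves it the other way), and `--extsize` then extends it into a fragment in the 5' to 3' direction, so each pileup fragment ends up centered on the read's 5' end, roughly the Tn5 insertion site. A sketch of that arithmetic (hypothetical helper, ignoring MACS2's internal off-by-one conventions):

```python
def macs2_pileup_interval(five_prime, strand, shift=-75, extsize=150):
    """Approximate half-open [start, end) interval piled up for one read.

    five_prime: genomic coordinate of the read's 5' end.
    The 5' end is moved by `shift` in the read's 5'->3' direction, then
    extended by `extsize` bp in that same direction.
    """
    if strand == "+":
        # 5'->3' runs toward increasing coordinates
        start = five_prime + shift
        return start, start + extsize
    if strand == "-":
        # 5'->3' runs toward decreasing coordinates
        end = five_prime - shift
        return end - extsize, end
    raise ValueError("strand must be '+' or '-'")
```

With the logged settings, a read whose 5' end is at position 1000 yields (925, 1075) on either strand.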

In [8]:
import os
print(os.cpu_count())

32
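`os.cpu_count()` reports every logical CPU on the node, which on a shared cluster can exceed what the scheduler allocated; on Linux, `os.sched_getaffinity(0)` returns the CPUs this process is actually allowed to run on. The sketch below also sizes the number of concurrent jobs so that jobs times threads per job stays within that budget. It assumes that `-j` in `network_inference.sh` counts concurrent jobs and that each job uses up to `NTH` threads (4 in `config.mk`); both helpers are hypothetical.

```python
import os


def usable_cpus():
    """CPUs this process may run on (Linux affinity mask), else os.cpu_count()."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def parallel_jobs(n_cpus, threads_per_job):
    """Largest job count with jobs * threads_per_job <= n_cpus, but at least 1."""
    if threads_per_job < 1:
        raise ValueError("threads_per_job must be >= 1")
    return max(1, n_cpus // threads_per_job)
```

For example, `parallel_jobs(usable_cpus(), 4)` would be a conservative candidate for `-j` under these assumptions.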


In [9]:
%memit

peak memory: 65.46 MiB, increment: 0.25 MiB


In [10]:
%%bash
/usr/bin/time -v cd .; 
/usr/bin/time -v dictys_helper network_inference.sh -j 32 -J 1 static 

	Command being timed: "cd ."
	User time (seconds): 0.00
	System time (seconds): 0.00
	Percent of CPU this job got: 100%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 3328
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 188
	Voluntary context switches: 1
	Involuntary context switches: 0
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0


mkdir -p tmp_static/Subset1/
cp data/subsets/Subset1/names_rna.txt tmp_static/Subset1/names_rna.txt
mkdir -p tmp_static/Subset10/
mkdir -p tmp_static/Subset11/
cp data/subsets/Subset10/names_rna.txt tmp_static/Subset10/names_rna.txt
cp data/subsets/Subset11/names_rna.txt tmp_static/Subset11/names_rna.txt
mkdir -p tmp_static/Subset12/
cp data/subsets/Subset12/names_rna.txt tmp_static/Subset12/names_rna.txt
mkdir -p tmp_static/Subset13/
cp data/subsets/Subset13/names_rna.txt tmp_static/Subset13/names_rna.txt
mkdir -p tmp_static/Subset14/
cp data/subsets/Subset14/names_rna.txt tmp_static/Subset14/names_rna.txt
mkdir -p tmp_static/Subset15/
cp data/subsets/Subset15/names_rna.txt tmp_static/Subset15/names_rna.txt
mkdir -p tmp_static/Subset16/
mkdir -p tmp_static/Subset17/
cp data/subsets/Subset16/names_rna.txt tmp_static/Subset16/names_rna.txt
cp data/subsets/Subset17/names_rna.txt tmp_static/Subset17/names_rna.txt
mkdir -p tmp_static/Subset2/
mkdir -p tmp_static/Subset3/
cp data/subsets/Su

OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  preproc qc_reads  tmp_static/Subset13/expression0.tsv.gz tmp_static/Subset13/expression.tsv.gz 50 10 0 200 100 0
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  preproc qc_reads  tmp_static/Subset10/expression0.tsv.gz tmp_static/Subset10/expression.tsv.gz 50 10 0 200 100 0
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  preproc qc_reads  tmp_static/Subset14/expression0.tsv.gz tmp_static/Subset14/expression.tsv.gz 50 10 0 200 100 0
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  chromatin macs2 --nth 4 tmp_static/Subset17/names_atac.txt data/bams tmp_static/S

OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  preproc selects_atac  tmp_static/Subset4/expression.tsv.gz tmp_static/Subset4/names_atac0.txt tmp_static/Subset4/names_atac.txt
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  chromatin macs2 --nth 4 tmp_static/Subset9/names_atac.txt data/bams tmp_static/Subset9/reads.bam tmp_static/Subset9/reads.bai tmp_static/Subset9/peaks.bed hs
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  chromatin macs2 --nth 4 tmp_static/Subset7/names_atac.txt data/bams tmp_static/Subset7/reads.bam tmp_static/Subset7/reads.bai tmp_static/Subset7/peaks.bed hs
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREA

# ARGUMENTS LIST:
# name = 04
# format = AUTO
# ChIP-seq file = ['/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/tmp_static/Subset16/reads.bam']
# control file = None
# effective genome size = 2.70e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# The maximum gap between significant sites is assigned as the read length/tag size.
# The minimum length of peaks is assigned as the predicted fragment length "d".
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 10000 bps
# Broad region calling is off
# Paired-End mode is off
# Searching for subpeak summits is on
 
INFO  @ Fri, 02 Aug 2024 12:25:22: #1 read tag files... 
INFO  @ Fri, 02 Aug 2024 12:25:22: #1 read treatment tags... 
DEBUG @ Fri, 02 Aug 2024 12:25:22: Testing format BAM 
INFO  @ Fri, 02 Aug 2024 12:25:22: Detected format is: BAM 
INFO  @ Fri, 02 Aug 2024 12:25:22: * Input file is gzipped. 
INFO  @ Fri, 02 Aug 2024 12:25

INFO  @ Fri, 02 Aug 2024 12:28:44: #1 tag size = 49.0 
INFO  @ Fri, 02 Aug 2024 12:28:44: #1  total tags in treatment: 11955586 
INFO  @ Fri, 02 Aug 2024 12:28:44: #1 finished! 
INFO  @ Fri, 02 Aug 2024 12:28:44: #2 Build Peak Model... 
INFO  @ Fri, 02 Aug 2024 12:28:44: #2 Skipped... 
INFO  @ Fri, 02 Aug 2024 12:28:44: #2 Use 150 as fragment length 
INFO  @ Fri, 02 Aug 2024 12:28:44: #2 Sequencing ends will be shifted towards 5' by 75 bp(s) 
INFO  @ Fri, 02 Aug 2024 12:28:44: #3 Call peaks... 
INFO  @ Fri, 02 Aug 2024 12:28:44: #3 Going to call summits inside each peak ... 
INFO  @ Fri, 02 Aug 2024 12:28:44: #3 Pre-compute pvalue-qvalue table... 
DEBUG @ Fri, 02 Aug 2024 12:28:44: Start to calculate pvalue stat... 
DEBUG @ Fri, 02 Aug 2024 12:28:59: access pq hash for 21232508 times 
INFO  @ Fri, 02 Aug 2024 12:28:59: #3 Call peaks for each chromosome... 
INFO  @ Fri, 02 Aug 2024 12:29:25: #4 Write output xls file... 04_peaks.xls 
INFO  @ Fri, 02 Aug 2024 12:29:25: #4 Write peak in na

[bam_sort_core] merging from 4 files and 4 in-memory blocks...
INFO  @ Fri, 02 Aug 2024 12:34:04: 
# Command line: callpeak -t /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/tmp_static/Subset14/reads.bam -n 04 -g hs --nomodel --shift -75 --extsize 150 --keep-dup all --verbose 4 --call-summits -q 0.05
# ARGUMENTS LIST:
# name = 04
# format = AUTO
# ChIP-seq file = ['/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/tmp_static/Subset14/reads.bam']
# control file = None
# effective genome size = 2.70e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# The maximum gap between significant sites is assigned as the read length/tag size.
# The minimum length of peaks is assigned as the predicted fragment length "d".
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 10000 bps
# Broad region calling is off
# Paired-End mode is off
# Searching for subpeak summit

# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 10000 bps
# Broad region calling is off
# Paired-End mode is off
# Searching for subpeak summits is on
 
INFO  @ Fri, 02 Aug 2024 12:36:51: #1 read tag files... 
INFO  @ Fri, 02 Aug 2024 12:36:51: #1 read treatment tags... 
DEBUG @ Fri, 02 Aug 2024 12:36:51: Testing format BAM 
INFO  @ Fri, 02 Aug 2024 12:36:51: Detected format is: BAM 
INFO  @ Fri, 02 Aug 2024 12:36:51: * Input file is gzipped. 
INFO  @ Fri, 02 Aug 2024 12:36:55:  1000000 
INFO  @ Fri, 02 Aug 2024 12:36:58:  2000000 
INFO  @ Fri, 02 Aug 2024 12:37:01:  3000000 
INFO  @ Fri, 02 Aug 2024 12:37:04:  4000000 
INFO  @ Fri, 02 Aug 2024 12:37:07:  5000000 
INFO  @ Fri, 02 Aug 2024 12:37:10:  6000000 
INFO  @ Fri, 02 Aug 2024 12:37:14:  7000000 
INFO  @ Fri, 02 Aug 2024 12:37:17:  8000000 
INFO  @ Fri, 02 Aug 2024 12:37:20:  9000000 
INFO  @ Fri, 02 Aug 2024 12:37:23:  10000000 
INFO  @ Fri, 02 Aug 2024 12:37:26:  11000000 
IN

INFO  @ Fri, 02 Aug 2024 12:38:20:  2000000 
INFO  @ Fri, 02 Aug 2024 12:38:23:  3000000 
INFO  @ Fri, 02 Aug 2024 12:38:26:  4000000 
INFO  @ Fri, 02 Aug 2024 12:38:29:  5000000 
INFO  @ Fri, 02 Aug 2024 12:38:33:  6000000 
INFO  @ Fri, 02 Aug 2024 12:38:36:  7000000 
INFO  @ Fri, 02 Aug 2024 12:38:39:  8000000 
INFO  @ Fri, 02 Aug 2024 12:38:43:  9000000 
INFO  @ Fri, 02 Aug 2024 12:38:46:  10000000 
INFO  @ Fri, 02 Aug 2024 12:38:49:  11000000 
INFO  @ Fri, 02 Aug 2024 12:38:53:  12000000 
INFO  @ Fri, 02 Aug 2024 12:38:57:  13000000 
INFO  @ Fri, 02 Aug 2024 12:39:01:  14000000 
INFO  @ Fri, 02 Aug 2024 12:39:04:  15000000 
INFO  @ Fri, 02 Aug 2024 12:39:08:  16000000 
INFO  @ Fri, 02 Aug 2024 12:39:12:  17000000 
INFO  @ Fri, 02 Aug 2024 12:39:15:  18000000 
INFO  @ Fri, 02 Aug 2024 12:39:19:  19000000 
INFO  @ Fri, 02 Aug 2024 12:39:23:  20000000 
INFO  @ Fri, 02 Aug 2024 12:39:27:  21000000 
INFO  @ Fri, 02 Aug 2024 12:39:31:  22000000 
INFO  @ Fri, 02 Aug 2024 12:39:33: 2250800

INFO  @ Fri, 02 Aug 2024 12:39:14:  10000000 
INFO  @ Fri, 02 Aug 2024 12:39:18:  11000000 
INFO  @ Fri, 02 Aug 2024 12:39:22:  12000000 
INFO  @ Fri, 02 Aug 2024 12:39:25:  13000000 
INFO  @ Fri, 02 Aug 2024 12:39:29:  14000000 
INFO  @ Fri, 02 Aug 2024 12:39:33:  15000000 
INFO  @ Fri, 02 Aug 2024 12:39:37:  16000000 
INFO  @ Fri, 02 Aug 2024 12:39:40:  17000000 
INFO  @ Fri, 02 Aug 2024 12:39:45:  18000000 
INFO  @ Fri, 02 Aug 2024 12:39:49:  19000000 
INFO  @ Fri, 02 Aug 2024 12:39:54:  20000000 
INFO  @ Fri, 02 Aug 2024 12:39:58:  21000000 
INFO  @ Fri, 02 Aug 2024 12:40:02:  22000000 
INFO  @ Fri, 02 Aug 2024 12:40:08:  23000000 
INFO  @ Fri, 02 Aug 2024 12:40:12:  24000000 
INFO  @ Fri, 02 Aug 2024 12:40:16:  25000000 
INFO  @ Fri, 02 Aug 2024 12:40:21:  26000000 
INFO  @ Fri, 02 Aug 2024 12:40:22: 26390825 reads have been read. 
INFO  @ Fri, 02 Aug 2024 12:40:22: #1 tag size is determined as 49 bps 
INFO  @ Fri, 02 Aug 2024 12:40:22: #1 tag size = 49.0 
INFO  @ Fri, 02 Aug 2024

INFO  @ Fri, 02 Aug 2024 12:40:53:  17000000 
INFO  @ Fri, 02 Aug 2024 12:40:57:  18000000 
INFO  @ Fri, 02 Aug 2024 12:41:01:  19000000 
INFO  @ Fri, 02 Aug 2024 12:41:06:  20000000 
INFO  @ Fri, 02 Aug 2024 12:41:11:  21000000 
INFO  @ Fri, 02 Aug 2024 12:41:16:  22000000 
INFO  @ Fri, 02 Aug 2024 12:41:21:  23000000 
INFO  @ Fri, 02 Aug 2024 12:41:26:  24000000 
INFO  @ Fri, 02 Aug 2024 12:41:31:  25000000 
INFO  @ Fri, 02 Aug 2024 12:41:35:  26000000 
INFO  @ Fri, 02 Aug 2024 12:41:38:  27000000 
INFO  @ Fri, 02 Aug 2024 12:41:42:  28000000 
INFO  @ Fri, 02 Aug 2024 12:41:43: 28263367 reads have been read. 
INFO  @ Fri, 02 Aug 2024 12:41:43: #1 tag size is determined as 49 bps 
INFO  @ Fri, 02 Aug 2024 12:41:43: #1 tag size = 49.0 
INFO  @ Fri, 02 Aug 2024 12:41:43: #1  total tags in treatment: 28263367 
INFO  @ Fri, 02 Aug 2024 12:41:43: #1 finished! 
INFO  @ Fri, 02 Aug 2024 12:41:43: #2 Build Peak Model... 
INFO  @ Fri, 02 Aug 2024 12:41:43: #2 Skipped... 
INFO  @ Fri, 02 Aug 20

	Using actual sizes of regions (-size given)
	Fragment size set to given
	Will use repeat masked sequences
	Will find motif(s) in /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/motifs.motif
	Using Custom Genome
	Peak/BED file conversion summary:
		BED/Header formatted lines: 6747
		peakfile formatted lines: 0

	Peak File Statistics:
		Total Peaks: 6747
		Redundant Peak IDs: 0
		Peaks lacking information: 0 (need at least 5 columns per peak)
		Peaks with misformatted coordinates: 0 (should be integer)
		Peaks with misformatted strand: 0 (should be either +/- or 0/1)

	Peak file looks good!

	Background fragment size set to 21 (avg size of targets)
	Background files for 21 bp fragments found.
	Custom genome sequence directory: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome

	Extracting sequences from file: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa

	Custom genome sequence directory: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome

	Extracting sequences from file: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa
	Looking for peak sequences in a single file (/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa)
	Extracting 626 sequences from chr1
	Extracting 259 sequences from chr10
	Extracting 387 sequences from chr11
	Extracting 336 sequences from chr12
	Extracting 123 sequences from chr13
	Extracting 212 sequences from chr14
	Extracting 197 sequences from chr15
	Extracting 330 sequences from chr16
	Extracting 463 sequences from chr17
	Extracting 101 sequences from chr18
	Extracting 642 sequences from chr19
	Extracting 440 sequences from chr2
	Extracting 187 sequences from chr20
	Extracting 118 sequences from chr21
	Extracting 180 sequences from chr22
	Extracting 319 sequences from chr3

INFO  @ Fri, 02 Aug 2024 13:06:36: #3 Going to call summits inside each peak ... 
INFO  @ Fri, 02 Aug 2024 13:06:36: #3 Pre-compute pvalue-qvalue table... 
DEBUG @ Fri, 02 Aug 2024 13:06:36: Start to calculate pvalue stat... 
DEBUG @ Fri, 02 Aug 2024 13:10:27: access pq hash for 115417840 times 
INFO  @ Fri, 02 Aug 2024 13:10:27: #3 Call peaks for each chromosome... 
INFO  @ Fri, 02 Aug 2024 13:13:27: #4 Write output xls file... 04_peaks.xls 
INFO  @ Fri, 02 Aug 2024 13:13:28: #4 Write peak in narrowPeak format file... 04_peaks.narrowPeak 
INFO  @ Fri, 02 Aug 2024 13:13:29: #4 Write summits bed file... 04_summits.bed 
INFO  @ Fri, 02 Aug 2024 13:13:29: Done! 

OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  chromatin wellington --nth 4 tmp_static/Subset1/reads.bam tmp_static/Subset1/reads.bai tmp_static/Subset1/peaks.bed tmp_static/Subset1/footprints.bed
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_T

	Extracting 2520 sequences from chr1
	Extracting 920 sequences from chr10
	Extracting 1314 sequences from chr11
	Extracting 1209 sequences from chr12
	Extracting 438 sequences from chr13
	Extracting 842 sequences from chr14
	Extracting 714 sequences from chr15
	Extracting 1152 sequences from chr16
	Extracting 1572 sequences from chr17
	Extracting 339 sequences from chr18
	Extracting 1983 sequences from chr19
	Extracting 1517 sequences from chr2
	Extracting 598 sequences from chr20
	Extracting 380 sequences from chr21
	Extracting 618 sequences from chr22
	Extracting 1217 sequences from chr3
	Extracting 786 sequences from chr4
	Extracting 1030 sequences from chr5
	Extracting 1330 sequences from chr6
	Extracting 1074 sequences from chr7
	Extracting 799 sequences from chr8
	Extracting 899 sequences from chr9
	Extracting 477 sequences from chrX
	Extracting 2 sequences from chrY


	Reading input files...
	23730 total sequences read
	769 motifs loaded
	Finding instances of 769 motif(s)
	|0%  

	Extracting 890 sequences from chr8
	Extracting 988 sequences from chr9
	Extracting 597 sequences from chrX


	Reading input files...
	24966 total sequences read
	769 motifs loaded
	Finding instances of 769 motif(s)
	|0%                                    50%                                  100%|
	Cleaning up tmp files...


	Position file = 14-reform-split/aaaab
	Genome = /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome
	Output Directory = 15-motifscan/aaaab
	Using actual sizes of regions (-size given)
	Fragment size set to given
	Will use repeat masked sequences
	Will find motif(s) in /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/motifs.motif
	Using Custom Genome
	Peak/BED file conversion summary:
		BED/Header formatted lines: 25001
		peakfile formatted lines: 0

	Peak File Statistics:
		Total Peaks: 25001
		Redundant Peak IDs: 0
		Peaks lacking information: 0 (need at least 5 columns per peak)
		Peaks 

	Using Custom Genome
	Peak/BED file conversion summary:
		BED/Header formatted lines: 25001
		peakfile formatted lines: 0

	Peak File Statistics:
		Total Peaks: 25001
		Redundant Peak IDs: 0
		Peaks lacking information: 0 (need at least 5 columns per peak)
		Peaks with misformatted coordinates: 0 (should be integer)
		Peaks with misformatted strand: 0 (should be either +/- or 0/1)

	Peak file looks good!

	Background fragment size set to 18 (avg size of targets)
	Background files for 18 bp fragments found.
	Custom genome sequence directory: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome

	Extracting sequences from file: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa
	Looking for peak sequences in a single file (/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa)
	Extracting 2602 sequences from chr1
	Extracting 1005 sequences from chr10
	

	Looking for peak sequences in a single file (/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa)
	Extracting 2610 sequences from chr1
	Extracting 1012 sequences from chr10
	Extracting 1301 sequences from chr11
	Extracting 1238 sequences from chr12
	Extracting 492 sequences from chr13
	Extracting 918 sequences from chr14
	Extracting 821 sequences from chr15
	Extracting 1148 sequences from chr16
	Extracting 1670 sequences from chr17
	Extracting 378 sequences from chr18
	Extracting 1773 sequences from chr19
	Extracting 1687 sequences from chr2
	Extracting 663 sequences from chr20
	Extracting 316 sequences from chr21
	Extracting 626 sequences from chr22
	Extracting 1310 sequences from chr3
	Extracting 786 sequences from chr4
	Extracting 1120 sequences from chr5
	Extracting 1366 sequences from chr6
	Extracting 1258 sequences from chr7
	Extracting 836 sequences from chr8
	Extracting 996 sequences from chr9
	Extracting 643 sequences from chr

	Background files for 18 bp fragments found.
	Custom genome sequence directory: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome

	Extracting sequences from file: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa
	Looking for peak sequences in a single file (/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa)
	Extracting 2595 sequences from chr1
	Extracting 1039 sequences from chr10
	Extracting 1278 sequences from chr11
	Extracting 1223 sequences from chr12
	Extracting 507 sequences from chr13
	Extracting 857 sequences from chr14
	Extracting 819 sequences from chr15
	Extracting 1143 sequences from chr16
	Extracting 1658 sequences from chr17
	Extracting 322 sequences from chr18
	Extracting 1883 sequences from chr19
	Extracting 1621 sequences from chr2
	Extracting 631 sequences from chr20
	Extracting 381 sequences from chr21
	Extracting 622 seq

Reading BED File...
Calculating footprints...
Waiting for the last 30 jobs to finish...

OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  chromatin homer --nth 4 tmp_static/Subset7/footprints.bed data/motifs.motif data/genome tmp_static/Subset7/expression.tsv.gz tmp_static/Subset7/motifs.bed tmp_static/Subset7/wellington.tsv.gz tmp_static/Subset7/homer.tsv.gz

	Position file = 14-reform-split/aaaab
	Genome = /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome
	Output Directory = 15-motifscan/aaaab
	Using actual sizes of regions (-size given)
	Fragment size set to given
	Will use repeat masked sequences
	Will find motif(s) in /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/motifs.motif
	Using Custom Genome
	Peak/BED file conversion summary:
		BED/Header formatted lines: 25001
		peakfile formatted lines: 0

	Peak File

	Fragment size set to given
	Will use repeat masked sequences
	Will find motif(s) in /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/motifs.motif
	Using Custom Genome
	Peak/BED file conversion summary:
		BED/Header formatted lines: 25001
		peakfile formatted lines: 0

	Peak File Statistics:
		Total Peaks: 25001
		Redundant Peak IDs: 0
		Peaks lacking information: 0 (need at least 5 columns per peak)
		Peaks with misformatted coordinates: 0 (should be integer)
		Peaks with misformatted strand: 0 (should be either +/- or 0/1)

	Peak file looks good!

	Background fragment size set to 18 (avg size of targets)
	Background files for 18 bp fragments found.
	Custom genome sequence directory: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome

	Extracting sequences from file: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa
	Looking for peak sequences in a single fi


	Extracting sequences from file: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa
	Looking for peak sequences in a single file (/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa)
	Extracting 2655 sequences from chr1
	Extracting 963 sequences from chr10
	Extracting 1307 sequences from chr11
	Extracting 1288 sequences from chr12
	Extracting 453 sequences from chr13
	Extracting 899 sequences from chr14
	Extracting 648 sequences from chr15
	Extracting 1213 sequences from chr16
	Extracting 1632 sequences from chr17
	Extracting 309 sequences from chr18
	Extracting 2428 sequences from chr19
	Extracting 1555 sequences from chr2
	Extracting 595 sequences from chr20
	Extracting 464 sequences from chr21
	Extracting 617 sequences from chr22
	Extracting 1254 sequences from chr3
	Extracting 797 sequences from chr4
	Extracting 1001 sequences from chr5
	Extracting 1264 sequences from chr6
	Extract

		Peaks with misformatted strand: 0 (should be either +/- or 0/1)

	Peak file looks good!

	Background fragment size set to 20 (avg size of targets)
	Background files for 20 bp fragments found.
	Custom genome sequence directory: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome

	Extracting sequences from file: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa
	Looking for peak sequences in a single file (/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa)
	Extracting 2479 sequences from chr1
	Extracting 1016 sequences from chr10
	Extracting 1300 sequences from chr11
	Extracting 1248 sequences from chr12
	Extracting 420 sequences from chr13
	Extracting 847 sequences from chr14
	Extracting 709 sequences from chr15
	Extracting 1210 sequences from chr16
	Extracting 1716 sequences from chr17
	Extracting 384 sequences from chr18
	Extracting 2319 se

	Extracting 1329 sequences from chr11
	Extracting 1284 sequences from chr12
	Extracting 481 sequences from chr13
	Extracting 832 sequences from chr14
	Extracting 752 sequences from chr15
	Extracting 1050 sequences from chr16
	Extracting 1510 sequences from chr17
	Extracting 392 sequences from chr18
	Extracting 1533 sequences from chr19
	Extracting 1776 sequences from chr2
	Extracting 690 sequences from chr20
	Extracting 378 sequences from chr21
	Extracting 584 sequences from chr22
	Extracting 1385 sequences from chr3
	Extracting 858 sequences from chr4
	Extracting 1163 sequences from chr5
	Extracting 1464 sequences from chr6
	Extracting 1219 sequences from chr7
	Extracting 934 sequences from chr8
	Extracting 1104 sequences from chr9
	Extracting 624 sequences from chrX


	Reading input files...
	24957 total sequences read
	769 motifs loaded
	Finding instances of 769 motif(s)
	|0%                                    50%                                  100%|
	Cleaning up tmp files...

OPE

	Extracting 1720 sequences from chr2
	Extracting 680 sequences from chr20
	Extracting 341 sequences from chr21
	Extracting 661 sequences from chr22
	Extracting 1296 sequences from chr3
	Extracting 858 sequences from chr4
	Extracting 1073 sequences from chr5
	Extracting 1407 sequences from chr6
	Extracting 1137 sequences from chr7
	Extracting 861 sequences from chr8
	Extracting 941 sequences from chr9
	Extracting 636 sequences from chrX


	Reading input files...
	24964 total sequences read
	769 motifs loaded
	Finding instances of 769 motif(s)
	|0%                                    50%                                  100%|
	Cleaning up tmp files...


	Position file = 14-reform-split/aaaab
	Genome = /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome
	Output Directory = 15-motifscan/aaaab
	Using actual sizes of regions (-size given)
	Fragment size set to given
	Will use repeat masked sequences
	Will find motif(s) in /mnt/iusers01/fatpou01/bmh01/ms

	Extracting 1288 sequences from chr3
	Extracting 793 sequences from chr4
	Extracting 1084 sequences from chr5
	Extracting 1385 sequences from chr6
	Extracting 1160 sequences from chr7
	Extracting 873 sequences from chr8
	Extracting 951 sequences from chr9
	Extracting 620 sequences from chrX


	Reading input files...
	24922 total sequences read
	769 motifs loaded
	Finding instances of 769 motif(s)
	|0%                                    50%                                  100%|
	Cleaning up tmp files...


	Position file = 14-reform-split/aaaac
	Genome = /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome
	Output Directory = 15-motifscan/aaaac
	Using actual sizes of regions (-size given)
	Fragment size set to given
	Will use repeat masked sequences
	Will find motif(s) in /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/motifs.motif
	Using Custom Genome
	Peak/BED file conversion summary:
		BED/Header formatted li


OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  chromatin binding  tmp_static/Subset6/wellington.tsv.gz tmp_static/Subset6/homer.tsv.gz tmp_static/Subset6/binding.tsv.gz
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  chromatin tssdist  tmp_static/Subset6/expression.tsv.gz tmp_static/Subset6/wellington.tsv.gz data/gene.bed tmp_static/Subset6/tssdist.tsv.gz
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  chromatin binlinking  tmp_static/Subset4/linking.tsv.gz tmp_static/Subset4/binlinking.tsv.gz 20
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  network reconstruct --device cpu --nth 4 tmp_static/Subset4/

	Output Directory = 15-motifscan/aaaab
	Using actual sizes of regions (-size given)
	Fragment size set to given
	Will use repeat masked sequences
	Will find motif(s) in /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/motifs.motif
	Using Custom Genome
	Peak/BED file conversion summary:
		BED/Header formatted lines: 25001
		peakfile formatted lines: 0

	Peak File Statistics:
		Total Peaks: 25001
		Redundant Peak IDs: 0
		Peaks lacking information: 0 (need at least 5 columns per peak)
		Peaks with misformatted coordinates: 0 (should be integer)
		Peaks with misformatted strand: 0 (should be either +/- or 0/1)

	Peak file looks good!

	Background fragment size set to 18 (avg size of targets)
	Background files for 18 bp fragments found.
	Custom genome sequence directory: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome

	Extracting sequences from file: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89

	Background files for 18 bp fragments found.
	Custom genome sequence directory: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome

	Extracting sequences from file: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa
	Looking for peak sequences in a single file (/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa)
	Extracting 2614 sequences from chr1
	Extracting 1008 sequences from chr10
	Extracting 1322 sequences from chr11
	Extracting 1214 sequences from chr12
	Extracting 448 sequences from chr13
	Extracting 910 sequences from chr14
	Extracting 777 sequences from chr15
	Extracting 1193 sequences from chr16
	Extracting 1684 sequences from chr17
	Extracting 364 sequences from chr18
	Extracting 2022 sequences from chr19
	Extracting 1699 sequences from chr2
	Extracting 620 sequences from chr20
	Extracting 411 sequences from chr21
	Extracting 655 seq

OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  network reconstruct --device cpu --nth 4 tmp_static/Subset3/expression.tsv.gz tmp_static/Subset3/binlinking.tsv.gz tmp_static/Subset3/net_weight.tsv.gz tmp_static/Subset3/net_meanvar.tsv.gz tmp_static/Subset3/net_covfactor.tsv.gz tmp_static/Subset3/net_loss.tsv.gz tmp_static/Subset3/net_stats.tsv.gz
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  chromatin linking  tmp_static/Subset9/binding.tsv.gz tmp_static/Subset9/tssdist.tsv.gz tmp_static/Subset9/linking.tsv.gz
Reading BED File...
Calculating footprints...
Waiting for the last 30 jobs to finish...

OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  chromatin homer --nth 4 tmp_static/Subset14/footprints

	Extracting 2567 sequences from chr1
	Extracting 1052 sequences from chr10
	Extracting 1290 sequences from chr11
	Extracting 1301 sequences from chr12
	Extracting 482 sequences from chr13
	Extracting 855 sequences from chr14
	Extracting 724 sequences from chr15
	Extracting 1182 sequences from chr16
	Extracting 1676 sequences from chr17
	Extracting 392 sequences from chr18
	Extracting 1857 sequences from chr19
	Extracting 1705 sequences from chr2
	Extracting 706 sequences from chr20
	Extracting 380 sequences from chr21
	Extracting 616 sequences from chr22
	Extracting 1399 sequences from chr3
	Extracting 847 sequences from chr4
	Extracting 1033 sequences from chr5
	Extracting 1323 sequences from chr6
	Extracting 1157 sequences from chr7
	Extracting 815 sequences from chr8
	Extracting 1080 sequences from chr9
	Extracting 504 sequences from chrX


	Reading input files...
	24943 total sequences read
	769 motifs loaded
	Finding instances of 769 motif(s)
	|0%                                  

	Output Directory = 15-motifscan/aaaab
	Using actual sizes of regions (-size given)
	Fragment size set to given
	Will use repeat masked sequences
	Will find motif(s) in /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/motifs.motif
	Using Custom Genome
	Peak/BED file conversion summary:
		BED/Header formatted lines: 25001
		peakfile formatted lines: 0

	Peak File Statistics:
		Total Peaks: 25001
		Redundant Peak IDs: 0
		Peaks lacking information: 0 (need at least 5 columns per peak)
		Peaks with misformatted coordinates: 0 (should be integer)
		Peaks with misformatted strand: 0 (should be either +/- or 0/1)

	Peak file looks good!

	Background fragment size set to 18 (avg size of targets)
	Background files for 18 bp fragments found.
	Custom genome sequence directory: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome

	Extracting sequences from file: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89

OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  chromatin binlinking  tmp_static/Subset12/linking.tsv.gz tmp_static/Subset12/binlinking.tsv.gz 20
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  network reconstruct --device cpu --nth 4 tmp_static/Subset12/expression.tsv.gz tmp_static/Subset12/binlinking.tsv.gz tmp_static/Subset12/net_weight.tsv.gz tmp_static/Subset12/net_meanvar.tsv.gz tmp_static/Subset12/net_covfactor.tsv.gz tmp_static/Subset12/net_loss.tsv.gz tmp_static/Subset12/net_stats.tsv.gz
Reading BED File...
Calculating footprints...
Waiting for the last 30 jobs to finish...

OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  chromatin homer --nth 4 tmp_static/Subset2/footprints.bed data/motifs.

	Extracting 1163 sequences from chr7
	Extracting 855 sequences from chr8
	Extracting 996 sequences from chr9
	Extracting 545 sequences from chrX


	Reading input files...
	24923 total sequences read
	769 motifs loaded
	Finding instances of 769 motif(s)
	|0%                                    50%                                  100%|
	Cleaning up tmp files...


	Position file = 14-reform-split/aaaad
	Genome = /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome
	Output Directory = 15-motifscan/aaaad
	Using actual sizes of regions (-size given)
	Fragment size set to given
	Will use repeat masked sequences
	Will find motif(s) in /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/motifs.motif
	Using Custom Genome
	Peak/BED file conversion summary:
		BED/Header formatted lines: 24997
		peakfile formatted lines: 0

	Peak File Statistics:
		Total Peaks: 24997
		Redundant Peak IDs: 0
		Peaks lacking information: 0 (need 

	Extracting 2478 sequences from chr1
	Extracting 1047 sequences from chr10
	Extracting 1315 sequences from chr11
	Extracting 1316 sequences from chr12
	Extracting 496 sequences from chr13
	Extracting 818 sequences from chr14
	Extracting 766 sequences from chr15
	Extracting 1073 sequences from chr16
	Extracting 1648 sequences from chr17
	Extracting 385 sequences from chr18
	Extracting 1803 sequences from chr19
	Extracting 1737 sequences from chr2
	Extracting 740 sequences from chr20
	Extracting 347 sequences from chr21
	Extracting 619 sequences from chr22
	Extracting 1391 sequences from chr3
	Extracting 808 sequences from chr4
	Extracting 1111 sequences from chr5
	Extracting 1418 sequences from chr6
	Extracting 1182 sequences from chr7
	Extracting 886 sequences from chr8
	Extracting 1000 sequences from chr9
	Extracting 567 sequences from chrX


	Reading input files...
	24951 total sequences read
	769 motifs loaded
	Finding instances of 769 motif(s)
	|0%                                  


	Position file = 14-reform-split/aaaac
	Genome = /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome
	Output Directory = 15-motifscan/aaaac
	Using actual sizes of regions (-size given)
	Fragment size set to given
	Will use repeat masked sequences
	Will find motif(s) in /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/motifs.motif
	Using Custom Genome
	Peak/BED file conversion summary:
		BED/Header formatted lines: 25001
		peakfile formatted lines: 0

	Peak File Statistics:
		Total Peaks: 25001
		Redundant Peak IDs: 0
		Peaks lacking information: 0 (need at least 5 columns per peak)
		Peaks with misformatted coordinates: 0 (should be integer)
		Peaks with misformatted strand: 0 (should be either +/- or 0/1)

	Peak file looks good!

	Background fragment size set to 19 (avg size of targets)
	Background files for 19 bp fragments found.
	Custom genome sequence directory: /mnt/iusers01/fatpou01/bmh01/msc-healthdatas

		Peaks with misformatted coordinates: 0 (should be integer)
		Peaks with misformatted strand: 0 (should be either +/- or 0/1)

	Peak file looks good!

	Background fragment size set to 20 (avg size of targets)
	Background files for 20 bp fragments found.
	Custom genome sequence directory: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome

	Extracting sequences from file: /mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa
	Looking for peak sequences in a single file (/mnt/iusers01/fatpou01/bmh01/msc-healthdatasci-2023-2024/z89953zj/models/dictys/data/genome/genome.fa)
	Extracting 2653 sequences from chr1
	Extracting 989 sequences from chr10
	Extracting 1289 sequences from chr11
	Extracting 1237 sequences from chr12
	Extracting 392 sequences from chr13
	Extracting 796 sequences from chr14
	Extracting 659 sequences from chr15
	Extracting 1209 sequences from chr16
	Extracting 1838 sequences from ch

OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  network normalize --nth 4 tmp_static/Subset8/net_weight.tsv.gz tmp_static/Subset8/net_meanvar.tsv.gz tmp_static/Subset8/net_covfactor.tsv.gz tmp_static/Subset8/net_nweight.tsv.gz
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  network indirect --nth 4 --fi_meanvar tmp_static/Subset8/net_meanvar.tsv.gz tmp_static/Subset8/net_weight.tsv.gz tmp_static/Subset8/net_covfactor.tsv.gz tmp_static/Subset8/net_iweight.tsv.gz
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  network normalize --nth 4 tmp_static/Subset8/net_iweight.tsv.gz tmp_static/Subset8/net_meanvar.tsv.gz tmp_static/Subset8/net_covfactor.tsv.gz tmp_static/Subset8/net_inweight.tsv.gz
OPENBLAS_NUM_T

OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  network indirect --nth 4 --fi_meanvar tmp_static/Subset14/net_meanvar.tsv.gz tmp_static/Subset14/net_weight.tsv.gz tmp_static/Subset14/net_covfactor.tsv.gz tmp_static/Subset14/net_iweight.tsv.gz
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  network normalize --nth 4 tmp_static/Subset14/net_iweight.tsv.gz tmp_static/Subset14/net_meanvar.tsv.gz tmp_static/Subset14/net_covfactor.tsv.gz tmp_static/Subset14/net_inweight.tsv.gz
OPENBLAS_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 MKL_NUM_THREADS=1 OPENBLAS_MAX_THREADS=1 NUMEXPR_MAX_THREADS=1 MKL_MAX_THREADS=1 python3 -m dictys  network normalize --nth 4 tmp_static/Subset5/net_weight.tsv.gz tmp_static/Subset5/net_meanvar.tsv.gz tmp_static/Subset5/net_covfactor.tsv.gz tmp_static/Subset5/net_nweight.tsv.gz
OPENBL

	Command being timed: "dictys_helper network_inference.sh -j 32 -J 1 static"
	User time (seconds): 1888306.40
	System time (seconds): 60382.78
	Percent of CPU this job got: 2316%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 23:22:18
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 6561684
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2551
	Minor (reclaiming a frame) page faults: 3570822734
	Voluntary context switches: 370192451
	Involuntary context switches: 223650176
	Swaps: 0
	File system inputs: 268634144
	File system outputs: 322552624
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
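The run finished with exit status 0. The timing block above is GNU `time -v` output for `dictys_helper network_inference.sh -j 32 -J 1 static`, and its numbers can be checked against each other:

- "Percent of CPU" is (user + system time) / elapsed time. Here that is (1,888,306.40 + 60,382.78) s / 84,138 s ≈ 23.2, so roughly 23 cores were busy on average over the ~23.4-hour wall-clock run.
- "Maximum resident set size" (6,561,684 kbytes) is about 6.26 GiB, assuming 1024-byte kilobytes. Under `getrusage` semantics this is probably the peak of the single largest process, not the combined memory of all parallel jobs.
- The HOMER excerpts can also be sanity-checked. Within one uninterleaved job, the per-chromosome "Extracting N sequences" lines should add up to the "total sequences read" line. For example, the chr1 to chrX counts in the job that reports `24951 total sequences read` sum to exactly 24951.

The sketch below makes these checks reproducible. The helper names are hypothetical and not part of dictys or HOMER. The count check only works on a complete, non-interleaved job log, so it will fail on most of the truncated excerpts above.

```python
import re


def parse_elapsed(text: str) -> float:
    """Convert GNU time's elapsed field ("h:mm:ss" or "m:ss[.ss]") to seconds."""
    seconds = 0.0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_percent(user_s: float, sys_s: float, elapsed_s: float) -> float:
    """Average CPU utilisation in percent: 100 x (user + system) / wall clock."""
    return 100.0 * (user_s + sys_s) / elapsed_s


def check_homer_counts(log: str) -> bool:
    """Hypothetical check: per-chromosome 'Extracting N sequences from chrK'
    lines must sum to the 'N total sequences read' line of the same HOMER job."""
    per_chrom = [int(n) for n in re.findall(r"Extracting (\d+) sequences from chr\w+", log)]
    total = re.search(r"(\d+) total sequences read", log)
    return total is not None and sum(per_chrom) == int(total.group(1))


if __name__ == "__main__":
    elapsed = parse_elapsed("23:22:18")
    print(elapsed)                                             # 84138.0
    print(round(cpu_percent(1888306.40, 60382.78, elapsed)))   # 2316
    print(round(6561684 / 1024**2, 2))                         # 6.26 (GiB, max RSS)
```

Recomputing the CPU percentage from the raw times reproduces the reported 2316%. This confirms that the job was CPU-bound for most of its runtime rather than waiting on I/O.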
