# Mtb Network Inference

To infer networks from transcriptional data, you need access to a command line with docker installed.

## Obtaining docker images

### Building a single image

All docker images used here can be built from the dockerfiles supplied. To build one, run a command like the following from the repository root:

```
docker build -t clr 1_network-inference/docker/clr
```

See `docker build --help` for more details on using this command.

### Building all images

All images can be built with a single loop, run from the repository root:

```
for dir in 1_network-inference/docker/*; do docker build -t "$(basename "$dir")" "$dir"; done
```

### Pulling images from Dockerhub

Alternatively, most images can be pulled from the `ethanbustadscri` account on Dockerhub. To do this, run a command like:

```
docker pull ethanbustadscri/clr:0.1 && docker tag ethanbustadscri/clr:0.1 clr
```

This fetches the pre-built image and tags it with the short name that the commands below expect. See `docker pull --help` and `docker tag --help` for more details on these commands.

The ARACNe image is notably missing from Dockerhub, as its license does not allow redistribution.
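
To fetch several images in one go, the pull-and-tag step can be wrapped in a small loop. The following is only a sketch: the image list and the assumption that every image is published under the `0.1` tag are illustrative and not confirmed here, and the helper name `pull_and_tag` is hypothetical. By default it prints the commands instead of running them.

```shell
# Pull pre-built images and re-tag them with the short names used below.
# ASSUMPTIONS: the image list and the shared "0.1" tag are illustrative only;
# check the Dockerhub account for the images and tags actually published.
ACCOUNT="ethanbustadscri"
TAG="0.1"
IMAGES="clr genie3 elasticnet"

pull_and_tag() {
    # $1: short image name, e.g. clr
    remote="${ACCOUNT}/$1:${TAG}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default): only show what would be executed.
        echo "docker pull ${remote} && docker tag ${remote} $1"
    else
        docker pull "${remote}" && docker tag "${remote}" "$1"
    fi
}

for image in ${IMAGES}; do
    pull_and_tag "${image}"
done
```

Set `DRY_RUN=0` to execute the commands. The ARACNe image still has to be built locally.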

## Running docker images

Once the docker images for the inference methods are available, each one can be run to infer a regulatory network from the transcriptional data aggregated previously.

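Inference must be executed once for each dataset. Hyperparameters can be adjusted as desired; those shown here are the ones used in our investigation. The `--volume ..:/root/mount` option mounts the parent of the current directory, so the commands below assume they are run from a directory directly under the repository root, such as `1_network-inference`.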

### GSE59086 dataset

#### ARACNe

```
docker run -it --rm \
    --volume ..:/root/mount \
    aracne \
        /root/mount/0_transcriptome-aggregation/out/GSE59086_pysnail.tsv \
        /root/mount/1_network-inference/in/mtb_tfs_214_Mycobrowser.txt \
        /root/mount/1_network-inference/out/aracne_GSE59086.txt \
        y \
        1E-6 \
        6 \
        100
```

This docker image does not currently provide help text. It accepts the following positional parameters, in this order (the command above supplies the first seven and leaves `memory_gbs` at its default):
- `expression_file`: tab-delimited text file containing expression information
- `regulators_file`: newline-delimited text file containing the list of transcription factors to use for inference
- `out_file`: the location where results will be written, as a tab-delimited text file
- `transpose`: whether to transpose the `expression_file` before passing it to ARACNe proper. ARACNe expects files in the form of genes × samples, but a file in the form of samples × genes can be passed if `transpose` is `y` or `yes`; any other value is interpreted as `no` (optional, defaults to `n`)
- `p_value`: the p-value cutoff used by ARACNe, see https://github.com/califano-lab/ARACNe-AP#parameters (optional, defaults to 1 × 10<sup>-8</sup>)
- `threads`: the number of CPU cores to use for processing, see https://github.com/califano-lab/ARACNe-AP#parameters (optional, defaults to 4)
- `bootstraps`: the number of random bootstrap networks to generate before consolidation, see https://github.com/califano-lab/ARACNe-AP#parameters (optional, defaults to 10, which is likely insufficient)
- `memory_gbs`: the number of gigabytes of memory to supply to the ARACNe process (optional, defaults to 12)
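
Because these parameters are positional, changing a later one such as `memory_gbs` presumably requires supplying every parameter before it. The sketch below assembles the argument list from named variables so the order stays visible. The helper name `aracne_command` and the `MOUNT_ROOT` variable are hypothetical, the defaults are copied from the list above, and the value `16` in the example call is purely illustrative. The function only prints the command.

```shell
# Assemble the ARACNe positional arguments from named variables.
# ASSUMPTIONS: aracne_command and MOUNT_ROOT are hypothetical names;
# the defaults mirror the parameter list above.
aracne_command() (
    expression_file="$1"
    regulators_file="$2"
    out_file="$3"
    transpose="${4:-n}"
    p_value="${5:-1E-8}"
    threads="${6:-4}"
    bootstraps="${7:-10}"
    memory_gbs="${8:-12}"
    # Print the full command; nothing is executed here.
    echo "docker run -it --rm --volume ${MOUNT_ROOT:-..}:/root/mount aracne" \
        "${expression_file}" "${regulators_file}" "${out_file}" \
        "${transpose}" "${p_value}" "${threads}" "${bootstraps}" "${memory_gbs}"
)

# Example: same settings as above, but with an illustrative 16 GB of memory.
aracne_command \
    /root/mount/0_transcriptome-aggregation/out/GSE59086_pysnail.tsv \
    /root/mount/1_network-inference/in/mtb_tfs_214_Mycobrowser.txt \
    /root/mount/1_network-inference/out/aracne_GSE59086.txt \
    y 1E-6 6 100 16
```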

#### CLR

```
docker run -it --rm \
    --volume ..:/root/mount \
    clr \
        /root/mount/0_transcriptome-aggregation/out/GSE59086_pysnail.tsv \
        /root/mount/1_network-inference/in/mtb_tfs_214_Mycobrowser.txt \
        /root/mount/1_network-inference/out/clr_GSE59086.txt \
        --rows=samples
```

More usage details can be found using:

```
docker run -it --rm clr --help
```
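
Since inference has to be run once per dataset, the CLR invocation above can be generated in a loop. This is a minimal sketch under stated assumptions: `DATASETS`, `MOUNT_ROOT`, and `clr_command` are hypothetical names, and the `<dataset>_pysnail.tsv` naming pattern is extrapolated from the GSE59086 example. It resolves the repository root to an absolute path, which is always accepted as a bind-mount source, and prints one command per dataset.

```shell
# Print one CLR inference command per dataset.
# ASSUMPTIONS: DATASETS and the <dataset>_pysnail.tsv naming pattern are
# extrapolated from the GSE59086 example; adjust them to the files you have.
DATASETS="${DATASETS:-GSE59086}"
# Absolute path of the parent directory (the repository root when run from
# 1_network-inference).
MOUNT_ROOT="${MOUNT_ROOT:-$(cd .. && pwd)}"

clr_command() {
    # $1: dataset identifier, e.g. GSE59086
    printf 'docker run --rm --volume %s:/root/mount clr' "${MOUNT_ROOT}"
    printf ' /root/mount/0_transcriptome-aggregation/out/%s_pysnail.tsv' "$1"
    printf ' /root/mount/1_network-inference/in/mtb_tfs_214_Mycobrowser.txt'
    printf ' /root/mount/1_network-inference/out/clr_%s.txt' "$1"
    printf ' --rows=samples\n'
}

for dataset in ${DATASETS}; do
    clr_command "${dataset}"
done
```

Review the printed commands, then run them by piping the output to `sh` (this assumes none of the paths contain spaces). The `-it` flags are omitted because no terminal is attached when commands are piped.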

#### GENIE3

```
docker run -it --rm \
    --volume ..:/root/mount \
    genie3 \
        /root/mount/0_transcriptome-aggregation/out/GSE59086_pysnail.tsv \
        /root/mount/1_network-inference/in/mtb_tfs_214_Mycobrowser.txt \
        /root/mount/1_network-inference/out/genie3_GSE59086.txt \
        --rows=samples
```

More usage details can be found using:

```
docker run -it --rm genie3 --help
```

#### Elastic net

```
docker run -it --rm \
    --volume ..:/root/mount \
    elasticnet \
        /root/mount/0_transcriptome-aggregation/out/GSE59086_pysnail.tsv \
        /root/mount/1_network-inference/in/mtb_tfs_214_Mycobrowser.txt \
        /root/mount/1_network-inference/out/elasticnet_GSE59086.txt \
        --n_cores=6 \
        -vv
```

More usage details can be found using:

```
docker run -it --rm elasticnet --help
```
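
The examples in this document hard-code six cores (`--n_cores=6` here, and the `threads` and `--num_cores` settings of other methods). On a machine with a different core count, the value can be derived instead. The sketch below is one possible approach, not part of the tooling: the helper name `pick_cores` is hypothetical, and it assumes `getconf _NPROCESSORS_ONLN` is available, falling back to a single core otherwise.

```shell
# Pick a core count: all online cores minus one, but never fewer than one.
# ASSUMPTION: `getconf _NPROCESSORS_ONLN` is available (typical on Linux and
# macOS); if it is not, fall back to a single core.
pick_cores() {
    total="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
    if [ "${total}" -gt 1 ] 2>/dev/null; then
        echo $((total - 1))
    else
        echo 1
    fi
}

# Print the flag as it would be passed to the elastic net image.
echo "--n_cores=$(pick_cores)"
```

Leaving one core free keeps the host responsive during long runs; whether that trade-off is worthwhile depends on the machine.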

#### cMonkey2

```
docker run -it --rm \
    --volume ..:/root/mount \
    cmonkey2 \
        --organism mtu \
        --out /root/mount/cmonkey2/mtu \
        --num_cores 6 \
        --rsat_base_url http://networks.systemsbiology.net/rsat \
        --rsat_organism Mycobacterium_tuberculosis_H37Rv \
        --debug \
        /root/mount/0_transcriptome-aggregation/out/GSE59086_pysnail.tsv

# now transform the cMonkey2 output into the desired output
docker run -it --rm \
    --volume ..:/root/mount \
    --entrypoint /root/cmonkey_process.sh \
    cmonkey2 \
        /root/mount/cmonkey2/mtu/cmonkey_run.db \
        /root/mount/1_network-inference/in/mtb_tfs_214_Mycobrowser.txt \
        /root/mount/1_network-inference/out/cmonkey2_GSE59086.txt
```

#### iModulon

iModulon is by far the longest-running inference method used in this investigation. To finish in a more reasonable amount of time, processing is split into multiple jobs that can be run in parallel. Even so, with the `--tolerance` and `--iterations` parameters used here, inference takes days to complete.

```
docker run -it --rm \
    --volume ..:/root/mount \
    imodulon \
        --expression_file /root/mount/mtb_expression_master-20240215.tsv \
        --tolerance 1e-7 \
        --out_dir /root/mount/imodulon3 \
        --iterations 100 \
        --dim_end 280 \
        --dim_step 20 \
        --finalize False

docker run -it --rm \
    --volume ..:/root/mount \
    imodulon \
        --expression_file /root/mount/mtb_expression_master-20240215.tsv \
        --tolerance 1e-7 \
        --out_dir /root/mount/imodulon3 \
        --iterations 100 \
        --dim_begin 280 \
        --dim_end 400 \
        --dim_step 20 \
        --finalize False

docker run -it --rm \
    --volume ..:/root/mount \
    imodulon \
        --expression_file /root/mount/mtb_expression_master-20240215.tsv \
        --tolerance 1e-7 \
        --out_dir /root/mount/imodulon3 \
        --iterations 100 \
        --dim_begin 400 \
        --dim_end 420 \
        --dim_step 20 \
        --finalize False

docker run -it --rm \
    --volume ..:/root/mount \
    imodulon \
        --expression_file /root/mount/mtb_expression_master-20240215.tsv \
        --tolerance 1e-7 \
        --out_dir /root/mount/imodulon3 \
        --iterations 100 \
        --dim_begin 420 \
        --dim_end 440 \
        --dim_step 20 \
        --finalize False

# now consolidate all results together
docker run -it --rm \
    --volume ..:/root/mount \
    imodulon \
        --expression_file /root/mount/mtb_expression_master-20240215.tsv \
        --tolerance 1e-7 \
        --out_dir /root/mount/imodulon3 \
        --finalize only

# now transform the iModulon output into the desired output
docker run -it --rm \
    --volume ..:/root/mount \
    --entrypoint /root/imodulon_process.sh \
    imodulon \
        /root/mount/imodulon3/M.csv \
        /root/mount/1_network-inference/in/mtb_tfs_214_Mycobrowser.txt \
        /root/mount/1_network-inference/out/imodulon_GSE59086.txt
```
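
The four partial jobs above differ only in their dimension range, so they can be generated from a list of boundaries. This is a sketch rather than part of the tooling: `BOUNDARIES` and `imodulon_jobs` are hypothetical names, and the boundary values are taken from the commands above. The first job omits `--dim_begin`, as in the original, and each later job starts where the previous one ended. The function only prints the commands.

```shell
# Print one iModulon command per dimension range.
# ASSUMPTION: BOUNDARIES lists the --dim_end value of each job; every job
# after the first starts where the previous one ended.
BOUNDARIES="${BOUNDARIES:-280 400 420 440}"

imodulon_jobs() {
    begin=""
    for end in ${BOUNDARIES}; do
        printf 'docker run --rm --volume ..:/root/mount imodulon'
        printf ' --expression_file /root/mount/mtb_expression_master-20240215.tsv'
        printf ' --tolerance 1e-7 --out_dir /root/mount/imodulon3 --iterations 100'
        if [ -n "${begin}" ]; then
            # The first job has no explicit lower bound.
            printf ' --dim_begin %s' "${begin}"
        fi
        printf ' --dim_end %s --dim_step 20 --finalize False\n' "${end}"
        begin="${end}"
    done
}

imodulon_jobs
```

Each printed line can be started in its own terminal or handed to a job scheduler. The `-it` flags are left out so the jobs do not need an attached terminal. The consolidation and transformation steps shown above still have to be run once all partial jobs have finished.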

More usage details can be found using:

```
docker run -it --rm imodulon --help
```

and

```
docker run -it --rm --entrypoint /root/imodulon_process.sh imodulon --help
```