# NetSeed: Get tool, run tool, get data

NetSeed [Carr and Borenstein, 2012] is a seed-searching tool based on graph analysis, developed by the [Borenstein Lab](https://borensteinlab.sites.tau.ac.il/) and usually available [here](https://borensteinlab.sites.tau.ac.il/items-1/netseed). The download link for the Perl version is no longer available.

We downloaded the tool on 2022-01-19; it is available in the N2PComp archive: N2Pcomp/NetSeedPerl

## BEFORE STARTING

For the paper, NetSeed was downloaded and integrated into a third-party tool, N2PComp, to make benchmarking easier. N2PComp launches NetSeed (and precursor), then generates the result files.

N2PComp is available at: [https://doi.org/10.57745/OS1JND](https://doi.org/10.57745/OS1JND)

After downloading and unzipping the package, go to analyses/tools/N2PComp

Once N2PComp is downloaded, install it in a conda environment (for the cluster scripts, we advise naming it s2lp)

**STEPS**
- In the console, go to the root of the N2PComp directory: `cd [downloaded N2PComp directory]`
- Install N2PComp using `pip install .`



For N2PComp to work, precursor must also be installed.

Go to analyses/tools/precursor/

Once precursor is downloaded, install it in the same conda environment

**STEPS**
- In the console, go to the root of the precursor directory: `cd [downloaded precursor directory]`
- Install precursor using `pip install .`
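
After both installs, a quick check that the packages are visible in the active environment can save a failed run later. The sketch below is only a convenience helper, not part of N2PComp: `n2pcomp` is the module name launched with `python -m n2pcomp` further down in this notebook, while `precursor` as an importable module name is an assumption based on the package name.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in module_names if find_spec(name) is None]


# "n2pcomp" is the module launched later with `python -m n2pcomp`;
# "precursor" is an assumed module name, adjust it if the package differs.
missing = missing_modules(["n2pcomp", "precursor"])
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("N2PComp and precursor are importable.")
```

Run it from the same conda environment that will execute the notebook.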



N2PComp executes NetSeed and, from the NetSeed results, produces result files partly compatible with Seed2LP by performing a Cartesian product of the sets that compose each seed result.
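
To picture the product of sets mentioned above, here is a minimal sketch that reads it as a Cartesian product: one metabolite is picked from each set, and every combination is one seed solution. This is an assumption about what N2PComp does, not its actual code. The parsing (one set per line, members separated by commas) and the function name `expand_seed_sets` are hypothetical, and the metabolite names are illustrative values in the style of NetSeed output.

```python
from itertools import islice, product


def expand_seed_sets(seed_sets, max_solutions=None):
    """Return seed solutions built by picking one metabolite from each set.

    max_solutions caps the enumeration (None means no cap).
    """
    combinations = product(*seed_sets)
    return [list(combo) for combo in islice(combinations, max_solutions)]


# Illustrative NetSeed-like output: one set per line, comma-separated members
raw_output = """M_fum_e
M_fru_e
M_o2_e,M_o2_c"""

seed_sets = [line.split(",") for line in raw_output.splitlines() if line.strip()]
solutions = expand_seed_sets(seed_sets, max_solutions=30)
for solution in solutions:
    print(solution)
```

With two interchangeable members in the last set, the three sets expand into two solutions that differ only in the oxygen metabolite.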

## **WARNING**
This notebook runs NetSeed through N2PComp with default parameters, stopping after 10 min or once 30 solutions have been obtained.

In the paper, the time limit is set to 45 min and the limit on the number of solutions to 1000.
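
The two sets of limits can be kept side by side so that switching from the notebook run to the paper settings is a one-word change. This is a small sketch: `-nbs` and `-tl` are the flags used in the launch cell of this notebook, while the preset names, the helper `limit_arguments`, and the reading of `-tl` as minutes are assumptions.

```python
# Limits quoted in this notebook; times are assumed to be in minutes.
LIMIT_PRESETS = {
    "notebook": {"time_limit": 10, "number_solution": 30},
    "paper": {"time_limit": 45, "number_solution": 1000},
}


def limit_arguments(preset):
    """Build the -nbs / -tl arguments for `n2pcomp run` from a preset name."""
    limits = LIMIT_PRESETS[preset]
    return ["-nbs", str(limits["number_solution"]), "-tl", str(limits["time_limit"])]


print(" ".join(limit_arguments("notebook")))
print(" ".join(limit_arguments("paper")))
```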

To keep the running time short, the notebook copies the normalised e_coli_core model into a dedicated SBML directory, whose path you can change.

> Note:
>
> - All result files from NetSeed search are available: [https://doi.org/10.57745/OS1JND](https://doi.org/10.57745/OS1JND)
>    - go to analyses/results/netseed
> - All formatted result files from NetSeed search are available: [https://doi.org/10.57745/OS1JND](https://doi.org/10.57745/OS1JND)
>    - go to analyses/results/netseed_formated_results

Because of the way temporary files are managed, it is best **not** to parallelise the NetSeed search through N2PComp.

## Requirements
The network must be normalised before starting. See [04_normalise_network.ipynb](04_normalise_network.ipynb)

N2PComp and precursor have to be installed in the conda environment:

- Copy the downloaded N2PComp (which includes the NetSeedPerl package at the root of the directory; the "NetSeedPerl" directory must be present)
- Go to the root of the directory and install N2PComp using `pip install .`
- Copy the downloaded precursor
- Go to the root of the directory and install precursor using `pip install .`

Seed2LP has to be installed.

> Advice:
> 
> Use a conda env called s2lp

## **Slurm-based cluster**: Reproducing paper data
Slurm-based scripts for the cluster are available:
- If needed, launch
    - [04_job_run_sbml_normalisation.sh](../../scripts/plafrim_cluster/04_job_run_sbml_normalisation.sh): `sbatch 04_job_run_sbml_normalisation.sh`
    - or copy your local files to your cluster
- Set the **_source_** variable to the path of your conda environment (with tools-comparison installed) in [05_1_job_run_n2pcomp_netseed.sh](../../scripts/plafrim_cluster/05_1_job_run_n2pcomp_netseed.sh) and [05_2_job_format_netseed_results.sh](../../scripts/plafrim_cluster/05_2_job_format_netseed_results.sh)
- Launch [05_1_job_run_n2pcomp_netseed.sh](../../scripts/plafrim_cluster/05_1_job_run_n2pcomp_netseed.sh): `sbatch 05_1_job_run_n2pcomp_netseed.sh`
- When the job is done, launch [05_2_job_format_netseed_results.sh](../../scripts/plafrim_cluster/05_2_job_format_netseed_results.sh): `sbatch 05_2_job_format_netseed_results.sh`
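
Instead of waiting for the first job by hand, the two submissions can be chained with a Slurm job dependency. The sketch below is one possible way to do it and is not part of the provided scripts: `--parsable` and `--dependency=afterok:<jobid>` are standard `sbatch` options, while the helper names are hypothetical and the code assumes it is run from the directory containing the two scripts.

```python
import subprocess


def sbatch_command(script, depends_on=None):
    """Build the sbatch command line, optionally waiting for another job to succeed."""
    command = ["sbatch", "--parsable"]
    if depends_on is not None:
        command.append(f"--dependency=afterok:{depends_on}")
    command.append(script)
    return command


def submit(script, depends_on=None):
    """Submit a script and return the job id printed by `sbatch --parsable`."""
    completed = subprocess.run(
        sbatch_command(script, depends_on),
        check=True, capture_output=True, text=True,
    )
    # --parsable prints "jobid" or "jobid;cluster"
    return completed.stdout.strip().split(";")[0]


def submit_netseed_chain():
    """Submit the NetSeed run, then the formatting job once the run has succeeded."""
    run_id = submit("05_1_job_run_n2pcomp_netseed.sh")
    format_id = submit("05_2_job_format_netseed_results.sh", depends_on=run_id)
    return run_id, format_id
```

Calling `submit_netseed_chain()` on the cluster login node queues both jobs at once; the formatting job only starts if the NetSeed job exits successfully.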

## **LAUNCH**

### Variables to change (if wanted)

In [1]:
analyse_dir = "../../analyses"
data_dir  = f"{analyse_dir}/data/"
result_dir=f"{analyse_dir}/results"
n2pcomp_dir=f"{analyse_dir}/tools/N2PComp"
#n2pcomp_dir="../../../N2PComp"

time_limit = 10 # time limit, in minutes
number_solution = 30 # maximum number of solutions

### Execute

In [2]:
from os import path, makedirs, listdir
from shutil import copyfile

In [None]:
sbml_dir = f"{data_dir}/bigg/sbml"
norm_sbml_dir=f"{data_dir}/sbml_corrected"
netseed_result_dir=f"{result_dir}/netseed"
netseed_form_result_dir=f"{result_dir}/netseed_formated_results"
objecive_dir = f"{data_dir}/objective"

e_coli_norm_dir =f"{data_dir}/sbml_corrected_e_coli_core"
e_coli_dir =f"{data_dir}/bigg/sbml_e_coli_core"

In [3]:
if not path.isdir(e_coli_norm_dir):
    makedirs(e_coli_norm_dir)

copyfile(path.join(norm_sbml_dir, "e_coli_core.xml"), path.join(e_coli_norm_dir, "e_coli_core.xml"))

if not path.isdir(netseed_result_dir):
    makedirs(netseed_result_dir)
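
The "create the directory if it is missing, then copy" pattern used in this cell can also be written with `makedirs(..., exist_ok=True)`, which removes the explicit `path.isdir` test. This is a minimal sketch on a throwaway directory: the helper name `copy_model` is hypothetical and the file written here is a placeholder, not a real model.

```python
import tempfile
from os import makedirs, path
from shutil import copyfile


def copy_model(source_dir, target_dir, filename):
    """Copy one SBML file into target_dir, creating the directory if needed."""
    makedirs(target_dir, exist_ok=True)  # no error if the directory already exists
    return copyfile(path.join(source_dir, filename), path.join(target_dir, filename))


# Demonstration in a temporary directory with a placeholder file
with tempfile.TemporaryDirectory() as tmp:
    source = path.join(tmp, "sbml_corrected")
    makedirs(source)
    with open(path.join(source, "e_coli_core.xml"), "w") as handle:
        handle.write("<sbml/>")
    copied = copy_model(source, path.join(tmp, "sbml_corrected_e_coli_core"), "e_coli_core.xml")
    copied_exists = path.isfile(copied)
    print(copied_exists)
```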
    

In [3]:
def run_netseed(normalised_sbml_dir, source_sbml_dir:str, out_dir:str):
    conf_file=path.join(n2pcomp_dir,"config_netseed.yaml")
    
    for filename in listdir(normalised_sbml_dir):
        species = f'{path.splitext(path.basename(filename))[0]}'
        sbml_normalised_path = path.join(normalised_sbml_dir,filename)
        sbml_path = path.join(source_sbml_dir,filename)
        species_result_dir=path.join(out_dir,species)
        if not path.isdir(species_result_dir):
            makedirs(species_result_dir)

        # NetSeed has to be run on the normalised SBML file
        netseed_command=f"-m n2pcomp run {sbml_normalised_path} --output {species_result_dir} -c {conf_file} -nbs {number_solution} -tl {time_limit}"

        !python {netseed_command}

        # Finish formatting the results
        file = "../../scripts/05_format_results.py"
        objective_path = path.join(objecive_dir,f"{species}_target.txt")
        netseed_result_file=path.join(species_result_dir,"results.json")
        form_result_path=path.join(netseed_form_result_dir,species)
        if not path.isdir(form_result_path):
            makedirs(form_result_path)
        netseed_form_result_file=path.join(form_result_path,f"{species}_netseed_results.json")
        with open(objective_path) as f:
            # strip the trailing newline so it does not break the shell command below
            objective = f.readline().strip()
        
        format_command=f"{file} {netseed_result_file} {species} {objective} {netseed_form_result_file} NETSEED"
        !python {format_command}

        # Execute check_flux with Seed2LP on the original file, because Seed2LP needs the complete
        # network to find the import reactions that have been shut down. Seed2LP always normalises
        # the network before calculating fluxes.
        flux_command=f"flux {sbml_path} {netseed_form_result_file} {form_result_path}"
        !seed2lp {flux_command}
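
The `!` lines above only work inside IPython. Outside a notebook, the same commands can be passed to `subprocess.run` as argument lists, which also avoids shell quoting problems with paths. The sketch below assumes nothing beyond the flags already used in `run_netseed`; the helper names and the example paths are hypothetical, and the commands are only printed unless `dry_run` is set to `False`.

```python
import subprocess
import sys


def n2pcomp_run_args(sbml_normalised_path, species_result_dir, conf_file,
                     number_solution, time_limit):
    """Argument list equivalent to the `!python -m n2pcomp run ...` line."""
    return [sys.executable, "-m", "n2pcomp", "run", sbml_normalised_path,
            "--output", species_result_dir, "-c", conf_file,
            "-nbs", str(number_solution), "-tl", str(time_limit)]


def seed2lp_flux_args(sbml_path, formatted_result_file, out_dir):
    """Argument list equivalent to the `!seed2lp flux ...` line."""
    return ["seed2lp", "flux", sbml_path, formatted_result_file, out_dir]


def run_step(args, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(args))
    if not dry_run:
        subprocess.run(args, check=True)  # raises CalledProcessError on failure


# Hypothetical paths, shown as a dry run
run_step(n2pcomp_run_args("sbml_corrected/e_coli_core.xml", "results/netseed/e_coli_core",
                          "N2PComp/config_netseed.yaml", 30, 10))
run_step(seed2lp_flux_args("bigg/sbml/e_coli_core.xml",
                           "formatted/e_coli_core_netseed_results.json", "formatted"))
```

With `check=True`, a failing step stops the loop instead of silently continuing to the formatting and flux steps.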

In [10]:
run_netseed(e_coli_norm_dir, e_coli_dir, netseed_result_dir)

Directory ../../results/netseed/e_coli_core/ already exists. Overwritting directory...
 Netseed Directory given : /home/cghassem/miniconda3/envs/test/lib/python3.10/site-packages/n2pcomp/NetSeedPerl
/home/cghassem/miniconda3/envs/test/lib/python3.10/site-packages/n2pcomp/netseed.py
NETSEED OUTPUT:  M_fum_e
M_fru_e
M_gln__L_e
M_mal__L_e
M_o2_e,M_o2_c
M_glc__D_e

NETSEED ERR:  NO ERROR
NETSEED ENUMERATION 

                       _   ___    _   
  ___   ___   ___   __| | |_  \  | | _ __  
 / __| / _ \ / _ \ / _` |   ) |  | || '_ \ 
 \__ \|  __/|  __/| (_| |  / /_  | || |_) |
 |___/ \___| \___| \__,_| |____| |_|| .__/    
                                    |_|         

Network name: e_coli_core
____________________________________________

############################################
############################################
                 CHECK FLUX
############################################
#####################################

### **List of output files**

The netseed_formated_results directory contains:
- a logs directory: one log per SBML file, showing the modifications made to the original metabolic network.
- a file *_results.json containing the formatted results compatible with Seed2LP. This file is used to launch the check-flux mode of Seed2LP.
- a file *_fluxes_from_result.tsv: the output of Seed2LP's check-flux mode, which can be run from an s2lp file or from a result file formatted to be compatible with Seed2LP result files.
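
To inspect these files from Python, a generic loader is enough as a starting point. The sketch below makes no assumption about the internal structure of the JSON file and reads the TSV file with whatever header it has. The function names are hypothetical, and the demonstration table (file name, column names, values) is invented placeholder data, not real Seed2LP output.

```python
import csv
import json
import tempfile
from os import path


def load_formatted_results(json_path):
    """Load a *_results.json file; its inner structure is not assumed here."""
    with open(json_path) as handle:
        return json.load(handle)


def load_flux_table(tsv_path):
    """Read a *_fluxes_from_result.tsv file as a list of dicts keyed by its header."""
    with open(tsv_path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Demonstration on a placeholder table; the column names are invented
with tempfile.TemporaryDirectory() as tmp:
    demo_tsv = path.join(tmp, "e_coli_core_fluxes_from_result.tsv")
    with open(demo_tsv, "w", newline="") as handle:
        handle.write("solution\tflux\nexample_1\t0.5\n")
    rows = load_flux_table(demo_tsv)
    print(rows)
```

All values come back as strings from `csv.DictReader`, so numeric columns need an explicit conversion before any comparison.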