# Precursor: Get tool, run tool, get data

Precursor is a seed-searching tool that uses a different network expansion definition than Seed2LP. It is not a released tool, but its code is freely available on [GitHub](https://github.com/bioasp/precursor). We made some changes to make the tool compatible with our benchmark; the modified code is available in the archive associated with the paper.

## BEFORE STARTING

As in the [Netseed notebook](05_run_netseed.ipynb), N2PComp is used to launch precursor and then generate the output result files.

N2PComp is available at: [https://doi.org/10.57745/OS1JND](https://doi.org/10.57745/OS1JND)

After downloading and unzipping the package, go to analyses/tools/N2PComp


### IF N2PComp is not already installed, follow these steps
Once N2PComp is downloaded, install it in a conda environment (for the cluster scripts, we advise calling it s2lp)

**STEPS**
- In the console, go to the root of the N2PComp directory: `cd [downloaded N2PComp directory]`
- Install N2PComp using `pip install .`



For N2PComp to work, precursor must also be installed.

Go to analyses/tools/precursor/

Once precursor is downloaded, install it in the same conda environment

**STEPS**
- In the console, go to the root of the precursor directory: `cd [downloaded precursor directory]`
- Install precursor using `pip install .`



N2PComp will run precursor and, from the precursor results, produce result files partly compatible with Seed2LP by taking the Cartesian product of the sets that make up each seed result.
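
As a rough illustration of the set-product step described above, the sketch below expands one result made of several groups of metabolites into explicit seed sets with `itertools.product`. The `expand_seed_sets` helper and the metabolite names are hypothetical, and the sketch assumes that the product means picking one element from each group; it is not N2PComp's actual code.

```python
from itertools import product


def expand_seed_sets(groups):
    """Return every seed set obtained by picking one metabolite per group."""
    return [frozenset(choice) for choice in product(*groups)]


# Hypothetical result: two groups of interchangeable metabolites.
groups = [["M_a_e", "M_b_e"], ["M_c_e"]]
for seeds in expand_seed_sets(groups):
    print(sorted(seeds))
```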

## **WARNING**
This notebook runs precursor through N2PComp in Full Network and Target modes, limited to 10 min and 30 solutions.

In the paper, the time limit is set to 45 min and the number of solutions is limited to 1000.

To avoid a long running time in the notebook, the notebook copies the normalised e_coli_core into an sbml directory whose path you can change.

> Note:
>
> - All result files from the precursor search are available: [https://doi.org/10.57745/OS1JND](https://doi.org/10.57745/OS1JND)
>    - go to analyses/results/precursor
> - All formatted result files from the precursor search are available: [https://doi.org/10.57745/OS1JND](https://doi.org/10.57745/OS1JND)
>    - go to analyses/results/precursor_formated_results

For temporary file management reasons, it is best **not** to parallelise the precursor search through N2PComp.

## Requirements
The network must be normalised before starting. See [04_normalise_network.ipynb](04_normalise_network.ipynb)

N2PComp and precursor have to be installed in the conda environment:

- Copy the downloaded N2PComp (which includes the NetSeedPerl package at the root of the directory; the "NetSeedPerl" directory must be present)
- Go to the root of the directory and install N2PComp using `pip install .`
- Copy the downloaded precursor
- Go to the root of the directory and install precursor using `pip install .`

Seed2LP has to be installed.
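
Before launching, it can help to confirm that the tools are visible from the active environment. The sketch below is a minimal check under two assumptions taken from the commands used later in this notebook: N2PComp is importable as the `n2pcomp` module (it is run with `python -m n2pcomp`) and Seed2LP exposes a `seed2lp` executable. The `missing_tools` helper is hypothetical, and precursor is not checked because its import name is not stated here.

```python
from importlib.util import find_spec
from shutil import which


def missing_tools(modules=("n2pcomp",), executables=("seed2lp",)):
    """Return the names of Python modules and CLI tools that cannot be found."""
    missing = [name for name in modules if find_spec(name) is None]
    missing += [name for name in executables if which(name) is None]
    return missing


print(missing_tools() or "All required tools were found.")
```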

> Advice:
> 
> Use a conda env called s2lp

## **Slurm-based cluster**: Reproducing paper data
Slurm-based scripts for cluster are available:
- Launch if needed 
    - [04_job_run_sbml_normalisation.sh](../../scripts/plafrim_cluster/04_job_run_sbml_normalisation.sh): `sbatch 04_job_run_sbml_normalisation.sh`
    - or copy your local files to your cluster
- Change the **_source_** variable to the path of your conda environment with tools-comparison installed, in the files [06_1_job_run_n2pcomp_precursor.sh](../../scripts/plafrim_cluster/06_1_job_run_n2pcomp_precursor.sh) and [06_2_job_format_precursor_results.sh](../../scripts/plafrim_cluster/06_2_job_format_precursor_results.sh)
- Launch [06_1_job_run_n2pcomp_precursor.sh](../../scripts/plafrim_cluster/06_1_job_run_n2pcomp_precursor.sh): `sbatch 06_1_job_run_n2pcomp_precursor.sh`
- When the job is done, you can launch [06_2_job_format_precursor_results.sh](../../scripts/plafrim_cluster/06_2_job_format_precursor_results.sh): `sbatch 06_2_job_format_precursor_results.sh`
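
The two jobs above can also be chained so that the formatting job starts only once the run job has succeeded. The sketch below is one possible way to do this from Python, using Slurm's standard `--parsable` and `--dependency=afterok:<jobid>` options; the `sbatch_command` and `submit` helpers are hypothetical and are not part of the provided scripts.

```python
import subprocess


def sbatch_command(script, after_job_id=None):
    """Build an sbatch command; optionally wait for another job to succeed."""
    cmd = ["sbatch", "--parsable"]
    if after_job_id is not None:
        cmd.append(f"--dependency=afterok:{after_job_id}")
    cmd.append(script)
    return cmd


def submit(script, after_job_id=None):
    """Submit the script and return the job id printed by `sbatch --parsable`."""
    result = subprocess.run(
        sbatch_command(script, after_job_id),
        check=True, capture_output=True, text=True,
    )
    # --parsable prints "jobid" or "jobid;cluster"
    return result.stdout.strip().split(";")[0]


# On the cluster (not executed here):
# run_id = submit("06_1_job_run_n2pcomp_precursor.sh")
# submit("06_2_job_format_precursor_results.sh", after_job_id=run_id)
```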

## **LAUNCH**

### Variables to change (if wanted)

In [1]:
analyse_dir = "../../analyses"
data_dir  = f"{analyse_dir}/data/"
result_dir=f"{analyse_dir}/results"
n2pcomp_dir=f"{analyse_dir}/tools/N2PComp"

time_limit = 10 # time limit in minutes (45 in the paper)
number_solution = 30 # maximum number of solutions (1000 in the paper)

### Execute

In [2]:
from os import path, makedirs, listdir
from shutil import copyfile

In [None]:
sbml_dir = f"{data_dir}/bigg/sbml"
norm_sbml_dir=f"{data_dir}/sbml_corrected"
precursor_result_dir=f"{result_dir}/precursor"
precursor_form_result_dir=f"{result_dir}/precursor_formated_results"
objective_dir = f"{data_dir}/objective"
target_dir=f"{data_dir}/target"

e_coli_norm_dir =f"{data_dir}/sbml_corrected_e_coli_core"
e_coli_dir =f"{data_dir}/bigg/sbml_e_coli_core"

In [3]:
if not path.isdir(e_coli_norm_dir):
    makedirs(e_coli_norm_dir)

copyfile(path.join(norm_sbml_dir, "e_coli_core.xml"), path.join(e_coli_norm_dir, "e_coli_core.xml"))

if not path.isdir(precursor_result_dir):
    makedirs(precursor_result_dir)

In [4]:
def run_precursor(normalised_sbml_dir, source_sbml_dir:str, out_dir:str):
    conf_file=path.join(n2pcomp_dir,"config_precursor.yaml")

    
    for filename in listdir(normalised_sbml_dir):
        species = f'{path.splitext(path.basename(filename))[0]}'
        sbml_normalised_path = path.join(normalised_sbml_dir,filename)
        sbml_path = path.join(source_sbml_dir,filename)

        species_result_dir=path.join(out_dir,species)
        if not path.isdir(species_result_dir):
            makedirs(species_result_dir)
        result_fn_dir=path.join(species_result_dir,"full_network")
        if not path.isdir(result_fn_dir):
            makedirs(result_fn_dir)
        result_tgt_dir=path.join(species_result_dir,"target")
        if not path.isdir(result_tgt_dir):
            makedirs(result_tgt_dir)

        objective_path = path.join(objective_dir,f"{species}_target.txt")
        with open(objective_path) as f:
            # Strip the trailing newline so it does not break the shell commands below
            objective = f.readline().strip()

        # Get targeted metabolites
        # Precursor has a target mode and needs a target file listing the targeted metabolites
        target_dir=path.join(data_dir,"target")
        if not path.isdir(target_dir):
            makedirs(target_dir)
        target_file=path.join(target_dir,f"{species}_targets.txt")
        # Generate the target file only when it is missing for this species
        if not path.isfile(target_file):
            target_command=f"objective_targets {sbml_normalised_path} {target_dir} -o {objective}"
            !seed2lp {target_command}

        # Precursor has to be run with the normalised sbml
        precursor_command_tgt=f"-m n2pcomp run {sbml_normalised_path} --output {result_tgt_dir} -c {conf_file} -nbs {number_solution} -tl {time_limit} -t {target_file}"
        !python {precursor_command_tgt}

        precursor_command_fn=f"-m n2pcomp run {sbml_normalised_path} --output {result_fn_dir} -c {conf_file} -nbs {number_solution} -tl {time_limit}"
        !python {precursor_command_fn}

        # Finish formatting results
        file = "../../scripts/05_format_results.py"

        precursor_tgt_result_file=path.join(result_tgt_dir,"results.json")
        precursor_fn_result_file=path.join(result_fn_dir,"results.json")

        form_result_path=path.join(precursor_form_result_dir,species)
        if not path.isdir(form_result_path):
            makedirs(form_result_path)

        precursor_form_tgt_result_file=path.join(form_result_path,f"{species}_precursor_tgt_results.json")
        precursor_form_fn_result_file=path.join(form_result_path,f"{species}_precursor_fn_results.json")
        

        format_tgt_command=f"{file} {precursor_tgt_result_file} {species} {objective} {precursor_form_tgt_result_file} PRECURSOR"
        !python {format_tgt_command}

        format_fn_command=f"{file} {precursor_fn_result_file} {species} {objective} {precursor_form_fn_result_file} PRECURSOR"
        !python {format_fn_command}

        # Execute check_flux with Seed2LP on the original sbml because seed2lp needs the complete
        # network to find the import reactions that have been shut down. Seed2lp will always normalise
        # the network before calculating flux.
        flux_tgt_command=f"flux {sbml_path} {precursor_form_tgt_result_file} {form_result_path}"
        !seed2lp {flux_tgt_command}

        flux_fn_command=f"flux {sbml_path} {precursor_form_fn_result_file} {form_result_path}"
        !seed2lp {flux_fn_command}
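
Outside a notebook, where the `!python` shell syntax is not available, the same N2PComp calls could be built as argument lists and run with `subprocess`. The sketch below reuses only the options that appear in the function above (`run`, `--output`, `-c`, `-nbs`, `-tl`, `-t`); the `build_n2pcomp_args` helper and the example paths are hypothetical.

```python
import subprocess
import sys


def build_n2pcomp_args(sbml, out_dir, conf_file, nb_solutions, time_limit, target_file=None):
    """Build the `python -m n2pcomp run ...` argument list used in this notebook."""
    args = [
        sys.executable, "-m", "n2pcomp", "run", sbml,
        "--output", out_dir,
        "-c", conf_file,
        "-nbs", str(nb_solutions),
        "-tl", str(time_limit),
    ]
    # Target mode only: pass the file listing the targeted metabolites
    if target_file is not None:
        args += ["-t", target_file]
    return args


# Example (not executed here), Full Network mode:
# subprocess.run(build_n2pcomp_args("e_coli_core.xml", "out/full_network",
#                                   "config_precursor.yaml", 30, 10), check=True)
```

Passing a list instead of a formatted string avoids problems with spaces or stray newlines in paths and objective names.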

In [6]:
run_precursor(e_coli_norm_dir, e_coli_dir, precursor_result_dir)

Directory ../../results/precursor/e_coli_core/target/ already exists. Overwritting directory...
 Netseed Directory given : /home/cghassem/miniconda3/envs/test/lib/python3.10/site-packages/n2pcomp/NetSeedPerl
Reading network from  ../../data/sbml_corrected_e_coli_core/e_coli_core.xml ...done.
Reading inputs from  ../../results/precursor/e_coli_core/target/inputs/precursor_input.xml ...
done.
    16 target metabolites.
    0 possible seed metabolites.

Autocompute possible seeds ...done.
    17 possible seeds found.

Testing targets for producibility ...done.
    16  targets can be produced:
    "M_gln__L_c"
    "M_h2o_c"
    "M_r5p_c"
    "M_atp_c"
    "M_e4p_c"
    "M_nadph_c"
    "M_3pg_c"
    "M_oaa_c"
    "M_g6p_c"
    "M_pep_c"
    "M_glu__L_c"
    "M_g3p_c"
    "M_nad_c"
    "M_f6p_c"
    "M_accoa_c"
    "M_pyr_c"

Compute subset minimal precursor sets ...done.
17  subset minimal precursor sets found.
Solution 1: "M_h2o_e"  
Solution 2: "M_nh4_e"  
Solution 3: "M_ac_e"  
Solution 
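
If the console output above needs to be inspected programmatically, the `Solution N: ...` lines can be collected with a small parser. The sketch below is based only on the line format visible in this log and assumes that every metabolite of a solution is written between double quotes; the `parse_solutions` helper is hypothetical and is not part of precursor or N2PComp.

```python
import re

SOLUTION_RE = re.compile(r"^Solution\s+(\d+):\s*(.*)$")


def parse_solutions(log_text):
    """Map each solution number to the list of quoted metabolite ids on its line."""
    solutions = {}
    for line in log_text.splitlines():
        match = SOLUTION_RE.match(line.strip())
        if match:
            solutions[int(match.group(1))] = re.findall(r'"([^"]+)"', match.group(2))
    return solutions


log = 'Solution 1: "M_h2o_e"  \nSolution 2: "M_nh4_e"  \n'
print(parse_solutions(log))
```

Incomplete lines, such as a `Solution` line cut off before its number, are ignored.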

### **List of output files**

The precursor_formated_results directory contains:
- A logs directory: one log file per SBML metabolic network, showing the modifications performed on the original file.
- 2 files *_results.json containing the formatted results compatible with Seed2LP. These files are used to launch the check-flux mode of Seed2LP.
- 2 files *_fluxes_from_result.tsv: the output of Seed2LP's check-flux mode when it is run from an s2lp file or from a result file formatted like a Seed2LP result file.
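
To check that a run produced the files listed above, the species directory can be scanned with `pathlib`. The sketch below is a minimal example that relies only on the two file name patterns given in the list; the `list_formatted_outputs` helper is hypothetical, and the path in the usage comment assumes the default `result_dir` of this notebook.

```python
from pathlib import Path


def list_formatted_outputs(species_dir):
    """Group the formatted result files and flux files found in one species directory."""
    species_dir = Path(species_dir)
    return {
        "results": sorted(p.name for p in species_dir.glob("*_results.json")),
        "fluxes": sorted(p.name for p in species_dir.glob("*_fluxes_from_result.tsv")),
    }


# Example (not executed here):
# list_formatted_outputs("../../analyses/results/precursor_formated_results/e_coli_core")
```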