# Seed2LP: Run

This notebook explains how to run seed2lp. It must be run **AFTER** retrieving the BiGG SBML files (see notebook [01_get_sbml_BiGG.ipynb](./01_get_sbml_BiGG.ipynb)) and getting the objectives (see notebook [02_get_objectives.ipynb](./02_get_objectives.ipynb)).

> Note:
>
> The Seed2lp (seed searching and flux) result files are available: [https://doi.org/10.57745/OS1JND](https://doi.org/10.57745/OS1JND)
>
> After downloading and unzipping the package, go to:
> - Reasoning results and flux: analyses/results/s2lp_reasoning 
> - Filter, Guess&Check and Guess&Check with diversity (hybrid cobra): analyses/results/s2lp_hyb_cobra 

## **WARNING**
This notebook runs Seed2LP on e_coli_core from BiGG in the target and full network seed searching modes, using all methods (*Reasoning*, *Hybrid-Filter*, *Hybrid-GC*, *Hybrid-GC<sub>Div</sub>*, *Hybrid-lpx*), with no accumulation allowed, NE seed inference, and both the subset minimal and minimize optimisations. Seed2LP is set to find 30 solutions and stops if it exceeds 10 min.

In the paper, the time limit is set to 45 min and the number of solutions is limited to 1000. Seed2LP was run on all 107 networks, with and without accumulation, using the subset minimal and minimize optimisations, and running each method (*Reasoning*, *Hybrid-Filter*, *Hybrid-GC*, *Hybrid-GC<sub>Div</sub>*, *Hybrid-lpx* and *FBA*) separately instead of passing "all" as the argument value. Practical reasons explain this choice: benchmarks were run on a computing cluster, launching tasks in parallel, and using computing nodes in "exclusive" mode for *Hybrid-lpx* and *FBA* to get enough memory to calculate fluxes.

To avoid a long run time within the notebook, the notebook copies e_coli_core into a separate SBML directory, at a path that you can change.

## Requirements
The *seed2lp* module is required.

> Advice:
> 
> Use a conda environment named `s2lp` with Python 3.10 for the PlaFRIM cluster scripts

In [None]:
!pip install seed2lp
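
Since the rest of the notebook calls the `seed2lp` command through the shell, it can help to confirm that the command is visible from the active environment before going further. The following is a minimal sketch using only the standard library; the helper name `seed2lp_available` is hypothetical and not part of seed2lp.

```python
import shutil


def seed2lp_available(executable: str = "seed2lp") -> bool:
    """Return True if `executable` can be found on the current PATH."""
    return shutil.which(executable) is not None


if not seed2lp_available():
    # Typical cause: the conda environment holding seed2lp is not activated
    print("seed2lp was not found on PATH; check the active environment")
```

If the message is printed, activate the environment where `pip install seed2lp` was run and restart the notebook kernel.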

## **Slurm-based cluster**: Reproducing paper data
Slurm scripts that run all networks on a cluster with the paper settings (45 min time limit, 1000 solutions) are available:
- Launch if needed 
    - [01_job_retrieve_bigg_sbml.sh](../../scripts/plafrim_cluster/01_job_retrieve_bigg_sbml.sh): `sbatch 01_job_retrieve_bigg_sbml.sh`
    - [02_job_get_objective.sh](../../scripts/plafrim_cluster/02_job_get_objective.sh): `sbatch 02_job_get_objective.sh`
    - or copy your local files to your cluster
- Change the **_source_** variable to the path of your conda environment with seed2lp installed, in the files:
    - [03_01_execute_workflow_search.sh](../../scripts/plafrim_cluster/03_01_execute_workflow_search.sh)
    - [03_02_execute_workflow_search_exclusive.sh](../../scripts/plafrim_cluster/03_02_execute_workflow_search_exclusive.sh)
    - [03_01_job_run_s2lp.sh](../../scripts/plafrim_cluster/03_01_job_run_s2lp.sh) 
    - [03_02_job_run_s2lp_exclusive.sh](../../scripts/plafrim_cluster/03_02_job_run_s2lp_exclusive.sh) 
- Launch:
    - [03_01_execute_workflow_search.sh](../../scripts/plafrim_cluster/03_01_execute_workflow_search.sh): `sbatch 03_01_execute_workflow_search.sh`
    - [03_02_execute_workflow_search_exclusive.sh](../../scripts/plafrim_cluster/03_02_execute_workflow_search_exclusive.sh): `sbatch 03_02_execute_workflow_search_exclusive.sh`

## **LAUNCH**

### Variables to change (optional)

In [10]:
analyse_dir = "../../analyses"
data_dir = f"{analyse_dir}/data"
result_dir = f"{analyse_dir}/results"
temp_dir = "../../tmp/"


time_limit = 10  # time limit, in minutes
number_solution = 30  # maximum number of solutions

### Execute

In [11]:
from os import path, makedirs, listdir
from shutil import copyfile

In [None]:
sbml_dir = f"{data_dir}/bigg/sbml"
e_coli_dir = f"{data_dir}/bigg/sbml_e_coli_core"
result_dir = f"{result_dir}/s2lp"
objective_dir = f"{data_dir}/objective"

In [12]:
if not path.isdir(e_coli_dir):
    makedirs(e_coli_dir)
    copyfile(path.join(sbml_dir, "e_coli_core.xml"), path.join(e_coli_dir, "e_coli_core.xml"))

'../../data/bigg/sbml_e_coli_core/e_coli_core.xml'
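
The run function defined further down opens a `<species>_target.txt` file in the objective directory for every SBML file it processes, so a missing objective file would interrupt the run. As a hedged sketch, a pre-flight check could look like the following; the helper name `missing_objectives` is hypothetical, and only the `_target.txt` naming comes from this notebook.

```python
from os import listdir, path


def missing_objectives(in_dir: str, objective_dir: str) -> list[str]:
    """List species in `in_dir` without a '<species>_target.txt' in `objective_dir`."""
    missing = []
    for filename in sorted(listdir(in_dir)):
        # Species name is the SBML file name without its extension
        species = path.splitext(filename)[0]
        target_file = path.join(objective_dir, f"{species}_target.txt")
        if not path.isfile(target_file):
            missing.append(species)
    return missing
```

Calling `missing_objectives(e_coli_dir, objective_dir)` should return an empty list before launching; any name it returns points to an objective that still has to be generated with the previous notebook.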

This function will execute seed2lp for e_coli_core:
- Target and Full Network
- subset minimal and minimize
- *Reasoning*, *Hybrid-Filter*, *Hybrid-GC*, *Hybrid-GC<sub>Div</sub>* and *Hybrid-lpx*
- no accumulation
- maximisation (of flux in Objective reaction)
- Limitations: 30 solutions and 10 min

Also, it will check the flux for each solution and write it into files.

In [5]:
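def run_s2lp(in_dir: str):
    """Run seed2lp in target and full network modes for every file in in_dir."""
    for filename in listdir(in_dir):
        species = path.splitext(path.basename(filename))[0]
        sbml_path = path.join(in_dir, filename)
        objective_path = path.join(objective_dir, f"{species}_target.txt")
        # Read the objective reaction from the first line of the target file,
        # dropping the trailing newline so it does not end up in the command
        with open(objective_path) as f:
            objective = f.readline().strip()
        result_path = path.join(result_dir, species)

        # Mode-specific option: target file for "target", objective for "full"
        run = {"target": f"-tf {objective_path}",
               "full": f"-o {objective}"}

        for key, value in run.items():
            command = f"{key} {sbml_path} {result_path} --temp {temp_dir} -tl {time_limit} -nbs {number_solution} -cf -max {value}"
            !seed2lp {command}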

The execution might take more than 30 min, because of the search for minimal solutions and because of the *Hybrid-lpx* mode (which requires a lot of time to calculate fluxes).
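
The `!seed2lp {command}` shell escape only works inside IPython or Jupyter. As a sketch of how the same two calls could be issued from a plain Python script, the arguments can be assembled as lists and passed to `subprocess.run`. The helper names `build_commands` and `run_commands` are hypothetical; the flags are exactly the ones used in `run_s2lp` above, and nothing else about the seed2lp command line is assumed.

```python
import subprocess


def build_commands(sbml_path, result_path, objective_path, objective,
                   temp_dir, time_limit, number_solution):
    """Return {mode: argument list} mirroring the two calls made by run_s2lp."""
    common = [sbml_path, result_path, "--temp", temp_dir,
              "-tl", str(time_limit), "-nbs", str(number_solution),
              "-cf", "-max"]
    return {
        "target": ["seed2lp", "target", *common, "-tf", objective_path],
        "full": ["seed2lp", "full", *common, "-o", objective],
    }


def run_commands(commands):
    """Run each command in turn and report its exit code."""
    for mode, args in commands.items():
        # check=False: continue with the next mode even if one call fails
        completed = subprocess.run(args, check=False)
        print(f"{mode}: exit code {completed.returncode}")
```

Passing arguments as a list avoids shell quoting problems when a path contains spaces.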

In [6]:
# "e_coli_dir" can be replaced by "sbml_dir" to run on all files,
# but this is not recommended within the notebook

run_s2lp(e_coli_dir)

[0;96m[1m           
                       _   ___    _   
  ___   ___   ___   __| | |_  \  | | _ __  
 / __| / _ \ / _ \ / _` |   ) |  | || '_ \ 
 \__ \|  __/|  __/| (_| |  / /_  | || |_) |
 |___/ \___| \___| \__,_| |____| |_|| .__/    
                                    |_|         
Network name: e_coli_core

____________________________________________

                  TARGETS                   
          FOR TARGET MODE AND FBA           
____________________________________________

Targets set:
    Reactant of objective reaction
    from target file


____________________________________________

                  OBJECTVE                  
                 FOR HYBRID                 
____________________________________________

Objective set:
    Objective reaction from target file


Objective : R_BIOMASS_Ecoli_core_w_GAM



____________________________________________

                  NETWORK                   
____________________________________________

I

### **List of output files**

In the result directory (initially "../../analyses/results/s2lp") you will find four files.

Seed2lp results files:
- e_coli_core_rm_rxn_tgt_taf_all_max_no_accu_results.json -> Target
- e_coli_core_rm_rxn_fn_all_max_no_accu_results.json -> Full Network

Fluxes files:
- e_coli_core_rm_rxn_tgt_taf_all_max_no_accu_fluxes.tsv -> Target
- e_coli_core_rm_rxn_fn_all_max_no_accu_fluxes.tsv -> Full Network

> Note:
>
> You will find log files in [result_directory]/e_coli_core/logs
>
> Example: ../../analyses/results/s2lp/e_coli_core/logs
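
To get a quick overview of what a run produced, the output files listed above can be inspected with the standard library. The sketch below makes no assumption about the internal structure of the seed2lp JSON or TSV files: it only reports the top-level keys of each `*_results.json` and the number of rows of each `*_fluxes.tsv`. The helper name `summarize_outputs` is hypothetical.

```python
import csv
import json
from pathlib import Path


def summarize_outputs(result_dir):
    """Map each results/fluxes file name under result_dir to a short summary."""
    summary = {}
    root = Path(result_dir)
    for json_file in sorted(root.rglob("*_results.json")):
        with open(json_file) as handle:
            data = json.load(handle)
        # Report top-level keys only; the JSON schema is not assumed here
        summary[json_file.name] = sorted(data) if isinstance(data, dict) else []
    for tsv_file in sorted(root.rglob("*_fluxes.tsv")):
        with open(tsv_file, newline="") as handle:
            rows = list(csv.reader(handle, delimiter="\t"))
        # Total number of rows, including any header line
        summary[tsv_file.name] = len(rows)
    return summary
```

For example, `summarize_outputs(result_dir)` searches the result directory recursively, so it also picks up files stored in per-species subdirectories.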