# Automatic atom mapping - example

This notebook demonstrates how the `atom_mapping` module works in practice. Its purpose is to reduce the workload of preparing input data for metabolic flux analysis (MFA) in INCA.

The only input required is a COBRA model that contains all reaction data and, most importantly, references to metabolite structures in the KEGG Compound, HMDB or ChEBI databases, or an InChI key.
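Before running the workflow, it can help to check which metabolites actually carry one of these references. The sketch below works on the metabolite entries of a COBRA JSON model loaded as a plain dictionary. The annotation keys (`inchi_key`, `kegg.compound`, `hmdb`, `chebi`) follow BiGG/identifiers.org-style naming and are an assumption, as is the helper name `metabolites_without_structure_refs`; it is not part of BFAIR.

```python
import json

# Assumed annotation keys (BiGG / identifiers.org style); adjust to your model.
STRUCTURE_KEYS = ("inchi_key", "kegg.compound", "hmdb", "chebi")


def metabolites_without_structure_refs(model_json):
    """Return IDs of metabolites that have none of the assumed structure references.

    `model_json` is a COBRA JSON model parsed into a dict (e.g. with json.load).
    Annotation values may be a single string or a list of strings; empty
    strings and empty lists both count as "no reference".
    """
    missing = []
    for met in model_json.get("metabolites", []):
        annotation = met.get("annotation", {})
        if not any(annotation.get(key) for key in STRUCTURE_KEYS):
            missing.append(met["id"])
    return missing


# Hypothetical two-metabolite model for illustration.
example = json.loads("""
{"metabolites": [
  {"id": "glc__D_e", "annotation": {"kegg.compound": ["C00031"]}},
  {"id": "unknown_c", "annotation": {}}
]}
""")
print(metabolites_without_structure_refs(example))  # ['unknown_c']
```

Metabolites reported by such a check are the ones for which no structure can be looked up, so fixing their annotations first avoids gaps later in the workflow.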

#### First, import the required modules:

```python
from BFAIR import atom_mapping
from BFAIR.mfa.INCA import INCA_input_parser
```
from BFAIR.mfa.INCA import INCA_input_parser

#### Prepare input dataframes

```python
# Parse the COBRA model into the model object and the reaction/metabolite dataframes
model, reaction_data, metabolite_data = INCA_input_parser.parse_cobra_model(
    'data/atom_mapping_Data/e_coli_core.json',
    'e_coli_core',
    '2021-07-15',
)
```

The workflow continues by fetching all available metabolite structures and storing them in Molfile format.

#### Initialise MolfileDownloader and fetch the structures

```python
downloader = atom_mapping.MolfileDownloader(metabolite_data)
downloader.generate_molfile_database()
```

```
Fetching metabolite structures...
Successfully fetched 72/72 metabolites
```


Note: `MolfileDownloader` takes an optional second argument that lets the user specify which databases to search first. By default, the 'InChI key -> InChI string -> structure' route is tried first, and the databases are searched after that. See the documentation for more information.
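The search order described in the note is an ordered fallback: each source is tried in turn and the first structure found is kept. A minimal sketch of that idea follows. The source names and lookup functions are hypothetical stand-ins (real lookups would query an InChI resolver or KEGG/HMDB/ChEBI), and this is not BFAIR's implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A lookup takes a metabolite ID and returns a Molfile string, or None on failure.
Lookup = Callable[[str], Optional[str]]


def fetch_with_fallback(met_id: str,
                        sources: Dict[str, Lookup],
                        preference: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Try sources in `preference` order; return (source_name, molfile) of the first hit."""
    for name in preference:
        lookup = sources.get(name)
        if lookup is None:
            continue  # unknown source name: skip it
        try:
            molfile = lookup(met_id)
        except Exception:
            molfile = None  # treat network or parsing errors as a miss
        if molfile:
            return name, molfile
    return None, None  # no source had a structure


# Hypothetical lookups: only "kegg" knows this metabolite.
sources = {
    "inchi": lambda met_id: None,
    "kegg": lambda met_id: "dummy molfile for " + met_id,
    "chebi": lambda met_id: "should not be reached",
}
print(fetch_with_fallback("pyr_c", sources, ["inchi", "kegg", "chebi"]))
# ('kegg', 'dummy molfile for pyr_c')
```

Changing the `preference` list is all it takes to change the search order, which is the kind of control the optional argument described above provides.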

#### Write reactions in RXN format

Here, the fetched compound structure files are used to write each reaction in RXN format, with the `reaction_data` dataframe as the reference.

```python
atom_mapping.write_rxn_files(reaction_data)
```

```
Excluded BIOMASS_Ecoli_core_w_GAM reaction from mapping
Generated 94/95
```
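For reference, an MDL RXN (V2000) file is a short header followed by one `$MOL` block per component. The sketch below assembles one from Molfile strings, following the CTfile layout (`$RXN`, name line, program/user line, comment line, reactant/product counts, then the molecules). It is not BFAIR's writer, and representing a stoichiometric coefficient by repeating a molecule is an assumption made here for illustration.

```python
from typing import List


def build_rxn(name: str, reactants: List[str], products: List[str],
              comment: str = "") -> str:
    """Assemble an MDL RXN (V2000) string from complete Molfile blocks.

    A coefficient of 2 is expressed here by listing the same Molfile twice
    (an assumption of this sketch).
    """
    lines = [
        "$RXN",
        name,
        "",                                        # program/user/date line, left blank
        comment,
        f"{len(reactants):>3}{len(products):>3}",  # counts line: rrrppp
    ]
    for molfile in reactants + products:
        lines.append("$MOL")
        lines.append(molfile.rstrip("\n"))
    return "\n".join(lines) + "\n"


# Minimal V2000 Molfile for methane (heavy atom only), used as a placeholder.
METHANE = """CH4
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
"""

rxn = build_rxn("demo", [METHANE], [METHANE], comment="placeholder reaction")
print(rxn)
```

Every reactant Molfile comes before every product Molfile, which is why the counts line must be right: a reader uses it to tell where the reactants end.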


#### Run RDT to obtain atom mappings

The RDT (Reaction Decoder Tool) algorithm is downloaded into the working directory and deleted once the function has finished.

**NOTE**: Java is required to run the algorithm; please make sure it is installed on your computer.
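A quick pre-flight check is to confirm that a `java` executable is on `PATH` before starting the mapping step. The sketch below uses only Python's standard library. It is a suggestion, not something `atom_mapping` is documented to do.

```python
import shutil
import subprocess


def java_version():
    """Return the `java -version` banner, or None if no java executable is on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return result.stderr.strip() or result.stdout.strip()


version = java_version()
print(version if version else "Java not found: install a JRE before running RDT.")
```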

```python
atom_mapping.obtain_atom_mappings(max_time=20)  # time limit for mapping a single reaction
```

```
Mapping reactions...
Reactions mapped in total: 85/94
```
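The `max_time` argument bounds how long a single reaction may take. One common way to enforce such a per-reaction limit on an external program is `subprocess.run(..., timeout=...)`, which kills the child process and raises `subprocess.TimeoutExpired`. The sketch below uses that mechanism to count successful runs. The commands are stand-ins (Python one-liners rather than the RDT command line), `timeout` is in seconds, and whether BFAIR uses the same mechanism or unit is an assumption.

```python
import subprocess
import sys
from typing import List, Sequence


def run_with_time_limit(commands: Sequence[List[str]], max_time: float) -> int:
    """Run each command with a per-command timeout in seconds; return how many succeeded."""
    succeeded = 0
    for cmd in commands:
        try:
            subprocess.run(cmd, timeout=max_time, check=True, capture_output=True)
        except subprocess.TimeoutExpired:
            continue  # too slow: the child is killed and the job counts as a failure
        except subprocess.CalledProcessError:
            continue  # non-zero exit status: also a failure
        succeeded += 1
    return succeeded


# Stand-in jobs: two finish immediately, one sleeps past the limit.
jobs = [
    [sys.executable, "-c", "pass"],
    [sys.executable, "-c", "import time; time.sleep(5)"],
    [sys.executable, "-c", "print('ok')"],
]
done = run_with_time_limit(jobs, max_time=2)
print(f"Reactions mapped in total: {done}/{len(jobs)}")
```

A per-job timeout keeps one pathological reaction from stalling the whole batch; the cost is that slow but valid mappings are dropped and show up as a lower mapped count.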


#### Parse the generated mappings into a format suitable for INCA

```python
reaction_mapping_df = atom_mapping.parse_reaction_mappings()
met_mapping_df = atom_mapping.parse_metabolite_mappings()
```
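To illustrate what this parsing step has to do: RDT can report a mapped reaction as a reaction SMILES in which atoms carry map numbers (for example `[CH3:1]`), whereas INCA describes carbon transitions with letter strings such as `PYR (abc) -> ACALD (ab) + CO2 (c)`. The sketch below converts between the two for carbons only. It assumes one metabolite per dot-separated SMILES component and at most 26 carbons per reaction; the function names are hypothetical, and this is not the output format of `parse_reaction_mappings()`.

```python
import re
from string import ascii_lowercase
from typing import List

# Bracketed carbon atoms with a map number, e.g. "[CH3:1]", "[C@@H:4]", "[cH:7]".
# The lookahead rejects two-letter elements such as Cl, Ca, Co and Cu.
CARBON_WITH_MAP = re.compile(r"\[\d*[Cc](?![a-z])[^\]]*:(\d+)\]")


def carbon_map_numbers(smiles: str) -> List[int]:
    """Map numbers of the carbon atoms in one SMILES component, in order of appearance."""
    return [int(n) for n in CARBON_WITH_MAP.findall(smiles)]


def to_inca_equation(mapped_rxn: str, reactant_names: List[str],
                     product_names: List[str]) -> str:
    """Turn an atom-mapped reaction SMILES into an INCA-style carbon transition string."""
    left, right = mapped_rxn.split(">>")
    reactants, products = left.split("."), right.split(".")

    # Give each reactant carbon the next free letter, in order of appearance.
    letters = {}
    for smiles in reactants:
        for num in carbon_map_numbers(smiles):
            if len(letters) == len(ascii_lowercase):
                raise ValueError("more than 26 carbons: single letters run out")
            letters[num] = ascii_lowercase[len(letters)]

    def side(components: List[str], names: List[str]) -> str:
        # Product carbons reuse the reactant letters through their shared map numbers.
        return " + ".join(
            f"{name} ({''.join(letters[n] for n in carbon_map_numbers(smi))})"
            for name, smi in zip(names, components)
        )

    return f"{side(reactants, reactant_names)} -> {side(products, product_names)}"


# Pyruvate decarboxylation, hand-mapped for illustration.
rxn = ("[CH3:1][C:2](=[O:3])[C:4](=[O:5])[O-:6]"
       ">>[CH3:1][CH:2]=[O:3].[O:5]=[C:4]=[O:6]")
print(to_inca_equation(rxn, ["PYR"], ["ACALD", "CO2"]))
# PYR (abc) -> ACALD (ab) + CO2 (c)
```

Because the letters are keyed by map number, the carbon that leaves pyruvate as CO2 keeps its letter `c`, which is exactly the information INCA needs to trace labels.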



#### Generate CSV output of mapping data

```python
atom_mapping.generate_INCA_mapping_input(reaction_mapping_df, met_mapping_df)
```

The generated CSV files can be used as atom mapping inputs in the general MFA workflow.

#### Clear the working directory of generated output (optional)

```python
atom_mapping.clean_output()
```