# LPD to RS to WP Mapping

### Methods
<ol>
    <li>Load the LPD to RS mapping from the supplementary information of Henson et al. (2018)</li>
    <li>Load the text file that contains gene information about the Broad Institute's R. opacus genome assembly</li>
    <li>Create a dictionary with RS gene IDs as keys and WP gene IDs as values</li>
    <li>Add the WP gene IDs to the gene mapping data frame</li>
    <li>Save the data frame as a .csv file</li>
</ol>

### Imports

In [None]:
import pandas as pd

### Load LPD to RS data from the supplement of Henson et al. (2018)
[Multi-omic elucidation of aromatic catabolism in adaptively evolved Rhodococcus opacus](https://www.sciencedirect.com/science/article/pii/S1096717618300910?via%3Dihub)

In [None]:
LPD_to_RS_df = pd.read_csv('LPD_to_RS_gene_mappping.csv')
# Remove the unneeded trailing '\t' from the LPD gene IDs
LPD_to_RS_df['Gene ID (LPD)'] = [val.split('\t')[0] for val in LPD_to_RS_df['Gene ID (LPD)']]
LPD_to_RS_df.head()
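
The tab cleanup above can also be written with pandas' vectorized string methods, which tolerate missing values where a plain `val.split` would raise an error. This is a minimal sketch on a toy frame; the gene IDs are invented placeholders, not values from the supplement.

```python
import pandas as pd

# Toy frame with invented IDs; the real data comes from the supplementary CSV.
toy_df = pd.DataFrame({
    'Gene ID (LPD)': ['LPD00001\t', 'LPD00002\t'],
    'Gene ID (RS)': ['PD630_RS00005', 'PD630_RS00010'],
})

# Keep only the text before the first tab in each entry.
toy_df['Gene ID (LPD)'] = toy_df['Gene ID (LPD)'].str.split('\t').str[0]
print(toy_df['Gene ID (LPD)'].tolist())
```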


### Load the R. opacus PD630 GTF file as a text file
The GTF file can be downloaded from the [R. opacus NCBI assembly page](https://www.ncbi.nlm.nih.gov/assembly/GCF_000234335.1).

In [None]:
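# Use a context manager so the file is closed after reading
with open("rhodococcus_opacus_pd630_gtf.txt", "r") as f:
    full_text = f.read()
# Preview the first 1000 characters
full_text[:1000]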

### Create dictionary to map RS gene IDs to WP gene IDs

In [None]:
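# Split the file into chunks, each starting right after a 'gene_id' key
gene_list = full_text.split('gene_id')
gene_dictionary = {}
for gene in gene_list:
    # Only keep chunks that carry both an RS locus tag and a WP accession
    if 'OPAG_RS' in gene and 'RefSeq:WP_' in gene:
        # Swap the 'OPAG_' prefix for 'PD630_' to match the IDs in LPD_to_RS_df
        RS_name = 'PD630_' + gene.split('OPAG_')[1].split('"')[0]
        # Take the accession that follows 'RefSeq:' up to the closing quote
        WP_name = gene.split('RefSeq:')[1].split('"')[0]
        gene_dictionary[RS_name] = WP_name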
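
To make the string splitting above easier to follow, here is a minimal sketch that wraps the same logic in a function and runs it on two invented GTF-style lines. The function name `build_rs_to_wp` and the sample attributes (locus tags, `db_xref` values) are assumptions for illustration only; the attribute layout of the real file may differ.

```python
def build_rs_to_wp(gtf_text):
    """Map 'PD630_RS...' IDs to 'WP_...' accessions using the same split logic."""
    mapping = {}
    # Each chunk starts right after a 'gene_id' attribute key.
    for chunk in gtf_text.split('gene_id'):
        if 'OPAG_RS' in chunk and 'RefSeq:WP_' in chunk:
            # Text between 'OPAG_' and the next double quote, e.g. 'RS00005'.
            rs_name = 'PD630_' + chunk.split('OPAG_')[1].split('"')[0]
            # Text between 'RefSeq:' and the next double quote.
            wp_name = chunk.split('RefSeq:')[1].split('"')[0]
            mapping[rs_name] = wp_name
    return mapping


# Invented sample lines (hypothetical IDs), not taken from the real assembly.
sample_gtf = (
    'contig1\tRefSeq\tCDS\t1\t900\t.\t+\t0\t'
    'gene_id "OPAG_RS00005"; db_xref "RefSeq:WP_000000001.1";\n'
    'contig1\tRefSeq\tCDS\t1000\t1500\t.\t-\t0\t'
    'gene_id "OPAG_RS00010"; note "no protein accession";\n'
)

rs_to_wp = build_rs_to_wp(sample_gtf)
print(rs_to_wp)  # {'PD630_RS00005': 'WP_000000001.1'}
```

The second sample line has no `RefSeq:WP_` entry, so it is skipped, which mirrors how genes without a WP accession are left out of the dictionary.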

### Add WP Annotations to the gene mapping dataframe

In [None]:
# Look up each RS ID; genes without a WP match are labeled 'none'
WP_annotations = [gene_dictionary.get(RS_name, 'none') for RS_name in LPD_to_RS_df['Gene ID (RS)']]
LPD_to_RS_df.insert(2, 'Gene ID (WP)', WP_annotations)

LPD_to_RS_df
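
Because unmatched genes are labeled `'none'`, it can be useful to count how many RS IDs found no WP accession. The following is a minimal sketch on an invented frame and dictionary (all IDs are placeholders), using `Series.map` as one possible alternative to the list comprehension.

```python
import pandas as pd

# Invented placeholder data standing in for LPD_to_RS_df and gene_dictionary.
toy_df = pd.DataFrame({
    'Gene ID (LPD)': ['LPD00001', 'LPD00002', 'LPD00003'],
    'Gene ID (RS)': ['PD630_RS00005', 'PD630_RS00010', 'PD630_RS00015'],
})
toy_dictionary = {
    'PD630_RS00005': 'WP_000000001.1',
    'PD630_RS00015': 'WP_000000002.1',
}

# Series.map returns NaN for keys missing from the dictionary;
# fillna then applies the same 'none' label used above.
wp_column = toy_df['Gene ID (RS)'].map(toy_dictionary).fillna('none')
toy_df.insert(2, 'Gene ID (WP)', wp_column)

# Count the rows that did not receive a WP accession.
n_unmapped = int((toy_df['Gene ID (WP)'] == 'none').sum())
print(f'{n_unmapped} of {len(toy_df)} genes have no WP ID')
```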

### Save gene mapping dataframe as a csv

In [None]:
LPD_to_RS_df.to_csv('LPD_RS_WP_gene_mapping.csv')