# Staphylococcus aureus
## Data

In [1]:
import pandas as pd
import numpy as np

### Genomic Features

The table below contains a list of genomic features, including coding DNA.

Each feature is uniquely identified by its BRC ID and is associated with a protein family, referred to as a PATRIC genus-specific family (PLfam).

In [2]:
saureus_features = pd.read_csv('saureus_genome_features.csv')

In [3]:
saureus_features.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10999 entries, 0 to 10998
Data columns (total 21 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   Genome                                   10999 non-null  object 
 1   Genome ID                                10999 non-null  float64
 2   Accession                                10999 non-null  object 
 3   BRC ID                                   10999 non-null  object 
 4   RefSeq Locus Tag                         10703 non-null  object 
 5   Alt Locus Tag                            5488 non-null   object 
 6   Feature ID                               10999 non-null  object 
 7   Annotation                               10999 non-null  object 
 8   Feature Type                             10999 non-null  object 
 9   Start                                    10999 non-null  int64  
 10  End                                      10999

From this table, we extract the columns needed to map features to the protein families referred to by Nguyen et al.:

In [4]:
plf = saureus_features[['BRC ID', 'PATRIC genus-specific families (PLfams)']].astype("string")
plf.columns = ['BRC_ID', 'PLFam']
plf.set_index('BRC_ID', inplace = True)
plf.head()

BRC_ID,PLFam
fig|1241616.6.peg.978,PLF_1279_00000947
fig|1241616.6.peg.979,PLF_1279_00001869
fig|1241616.6.peg.980,PLF_1279_00000303
fig|1241616.6.peg.981,PLF_1279_00000735
fig|1241616.6.peg.982,PLF_1279_00000362
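The mapping from feature to family relies on each BRC ID appearing only once in the index. A minimal sketch of how that assumption could be checked, using a small made-up table (`toy_plf` and its IDs are illustrative and are not taken from `saureus_genome_features.csv`):

```python
import pandas as pd

# Hypothetical toy table standing in for `plf`; the values are illustrative only.
toy_plf = pd.DataFrame(
    {"PLFam": ["PLF_0001", "PLF_0002", "PLF_0001"]},
    index=pd.Index(
        ["fig|1.1.peg.1", "fig|1.1.peg.2", "fig|1.1.peg.3"], name="BRC_ID"
    ),
)

# A unique index means that a lookup by BRC ID returns exactly one family.
ids_are_unique = toy_plf.index.is_unique

# Several features may share one family, so the reverse direction is one-to-many.
features_per_family = toy_plf.groupby("PLFam").size()

print(ids_are_unique)
print(features_per_family.to_dict())
```

If `ids_are_unique` were `False` on the real table, a lookup such as `plf.loc[brc_id]` could return several rows, and the substitution step later in this notebook would need to handle that case.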


### Protein Interaction Network

The table below contains pairs of proteins that interact with each other in the Staphylococcus aureus protein network, identified by their BRC IDs.

In [5]:
saureus_ppi = pd.read_csv('saureus_ppi_patric.csv')
saureus_ppi = saureus_ppi[['Interactor A ID', 'Interactor B ID']].astype("string")
saureus_ppi.columns = ['Interactor_A_ID', 'Interactor_B_ID']
saureus_ppi.head()

Unnamed: 0,Interactor_A_ID,Interactor_B_ID
0,fig|93061.5.peg.452,fig|93061.5.peg.713
1,fig|93061.5.peg.1920,fig|93061.5.peg.1921
2,fig|93061.5.peg.111,fig|93061.5.peg.119
3,fig|93061.5.peg.112,fig|93061.5.peg.121
4,fig|93061.5.peg.1069,fig|93061.5.peg.1071


### Specialty Genes

The table containing specialty genes relates several genomic features to a relevant property:
 - Essential gene
 - Antibiotic resistance
 - Virulence factor
 - Human homolog
 - Drug target
 - Transporter
 
We are particularly interested in properties associated with antibiotic resistance. Besides genes directly related to antibiotic resistance, there may also be a causal relation between virulence factors and bacterial resistance.

In [6]:
sa_specialty_genes = pd.read_csv('saureus_specialty_genes.csv')
sa_specialty_genes = sa_specialty_genes[['BRC ID', 'Property']]
sa_specialty_genes.columns = ['BRC_ID', 'Property']
sa_specialty_genes.set_index('BRC_ID', inplace = True)
sa_specialty_genes.Property.unique()

array(['Antibiotic Resistance', 'Essential Gene', 'Virulence Factor',
       'Human Homolog', 'Drug Target', 'Transporter'], dtype=object)

### Conserved Genes used for prediction in Nguyen et al. 2020

The next table lists the protein families of 10 experiments (each with 100 non-overlapping protein families), selected from a set of conserved genes and used in the paper by Nguyen et al.

Each protein family has a feature importance value derived from XGBoost, which measures how much that family contributes to classifying a strain as resistant or susceptible.

In [7]:
sa_feature_importance = pd.read_excel('saureus_feature_importance.xlsx')

In [8]:
sa_feature_importance

Unnamed: 0,Protein Family ID,Model,Total Feature Importance,Annotation
0,PLF_1279_00001080,1,162.412577,hypothetical protein
1,PLF_1279_00001505,1,81.039855,ABC transporter-like sensor ATP-binding protei...
2,PLF_1279_00001583,1,67.782436,Polysaccharide intercellular adhesin (PIA) bio...
3,PLF_1279_00001118,1,60.701992,"Nickel ABC transporter, substrate-binding prot..."
4,PLF_1279_00001691,1,54.623888,Activator of the mannose operon (transcription...
...,...,...,...,...
995,PLF_1279_00007034,10,0.000000,Cold shock protein of CSP family
996,PLF_1279_00001353,10,0.000000,UPF0398 protein YpsA
997,PLF_1279_00000861,10,0.000000,LSU ribosomal protein L15p (L27Ae)
998,PLF_1279_00000601,10,0.000000,LSU ribosomal protein L30p (L7e)


Since the information used in the paper by Nguyen et al. is given in terms of protein families, we need to associate every feature with its corresponding protein family.

Let's check whether every genome feature in the PPI has an associated PATRIC local family (PLFam):

In [9]:
saureus_ppi['Interactor_A_ID'].isin(plf.index)[saureus_ppi['Interactor_A_ID'].isin(plf.index)==False]

Series([], Name: Interactor_A_ID, dtype: bool)

In [10]:
saureus_ppi['Interactor_B_ID'].isin(plf.index)[saureus_ppi['Interactor_B_ID'].isin(plf.index)==False]

2085    False
Name: Interactor_B_ID, dtype: bool

There is no PLFam associated with the feature fig|93061.5.peg.894 (row 2085, interactor B).
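The two checks above return the boolean mask entries rather than the IDs themselves. A minimal sketch of a helper that lists the unmapped IDs directly, on made-up data (`unmapped`, `toy_ppi` and `known_features` are hypothetical names introduced for illustration):

```python
import pandas as pd

# Hypothetical toy data; the IDs are illustrative only.
toy_ppi = pd.DataFrame({
    "Interactor_A_ID": ["f1", "f2", "f3"],
    "Interactor_B_ID": ["f2", "f9", "f1"],
})
known_features = pd.Index(["f1", "f2", "f3"])


def unmapped(ppi, column, known):
    """Return the entries of `column` whose ID is absent from `known`."""
    mask = ~ppi[column].isin(known)
    return ppi.loc[mask, column]


missing_a = unmapped(toy_ppi, "Interactor_A_ID", known_features)
missing_b = unmapped(toy_ppi, "Interactor_B_ID", known_features)
print(missing_b)
```

The returned Series keeps the original row labels, so the row to drop and the offending ID are visible in one step.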

Before discarding this feature, let's also check whether it has any relevant property:

In [11]:
sa_specialty_genes.loc[sa_specialty_genes.index == saureus_ppi['Interactor_B_ID'].loc[2085]]

BRC_ID,Property


No property is associated with this feature, so it can be excluded:

In [12]:
saureus_ppi.drop(2085, axis = 0, inplace = True)
saureus_ppi.reset_index(drop=True, inplace=True)

Now every feature in the PPI can be mapped to a PATRIC local family.

### Writing PPI in terms of PLFams for conserved genes

Creating a new PPI in which each feature that belongs to a conserved gene is replaced by its PATRIC local family:

In [13]:
# Work on a copy so that `saureus_ppi` keeps the original feature IDs
saureus_ppi_plfams = saureus_ppi.copy()

# Replace interactor A by its PLFam when that family is one of the conserved ones
for i in range(len(saureus_ppi['Interactor_A_ID'])):
    if plf.loc[saureus_ppi['Interactor_A_ID'][i]].isin(sa_feature_importance['Protein Family ID']).item():
        saureus_ppi_plfams.at[i, 'Interactor_A_ID'] = plf.loc[saureus_ppi['Interactor_A_ID'][i]].PLFam

# Same substitution for interactor B
for i in range(len(saureus_ppi['Interactor_B_ID'])):
    if plf.loc[saureus_ppi['Interactor_B_ID'][i]].isin(sa_feature_importance['Protein Family ID']).item():
        saureus_ppi_plfams.at[i, 'Interactor_B_ID'] = plf.loc[saureus_ppi['Interactor_B_ID'][i]].PLFam

# Different features of the same family can now produce identical edges
saureus_ppi_plfams.drop_duplicates(subset=None, keep='first', inplace=True)
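The row-by-row substitution above can also be written without an explicit loop. A minimal sketch on made-up data, assuming each feature maps to exactly one family (`to_families`, `toy_plf`, `conserved` and `toy_ppi` are hypothetical names, not part of the original analysis):

```python
import pandas as pd

# Hypothetical toy data: feature -> family lookup and the set of conserved families.
toy_plf = pd.Series({"f1": "PLF_A", "f2": "PLF_B", "f3": "PLF_C"})
conserved = {"PLF_A", "PLF_C"}
toy_ppi = pd.DataFrame({
    "Interactor_A_ID": ["f1", "f2", "f1"],
    "Interactor_B_ID": ["f3", "f3", "f3"],
})


def to_families(ids, feature_to_family, keep):
    """Replace an ID by its family when the family is in `keep`; otherwise keep the ID."""
    family = ids.map(feature_to_family)
    # `where` keeps the original ID wherever the condition is True.
    return ids.where(~family.isin(keep), family)


toy_ppi_plfams = toy_ppi.copy()
for col in ["Interactor_A_ID", "Interactor_B_ID"]:
    toy_ppi_plfams[col] = to_families(toy_ppi[col], toy_plf, conserved)

# Identical edges can appear after the substitution, so drop them.
toy_ppi_plfams = toy_ppi_plfams.drop_duplicates().reset_index(drop=True)
print(toy_ppi_plfams)
```

In this toy case `f2` stays a feature ID because `PLF_B` is not conserved, and the first and third rows collapse into a single edge.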

### Resistance Genes in PPI

In [14]:
sa_specialty_genes[sa_specialty_genes.Property == 'Antibiotic Resistance']

BRC_ID,Property
fig|1413510.3.peg.2169,Antibiotic Resistance
fig|93061.5.peg.1154,Antibiotic Resistance
fig|93061.5.peg.2089,Antibiotic Resistance
fig|93061.5.peg.842,Antibiotic Resistance
fig|158879.11.peg.1813,Antibiotic Resistance
...,...
fig|158879.11.peg.2331,Antibiotic Resistance
fig|1241616.6.peg.1396,Antibiotic Resistance
fig|158879.11.peg.647,Antibiotic Resistance
fig|158879.11.peg.2107,Antibiotic Resistance


In [15]:
resistance_genes = sa_specialty_genes.loc[sa_specialty_genes.Property == 'Antibiotic Resistance'].reset_index()

In [16]:
resistance_genes

Unnamed: 0,BRC_ID,Property
0,fig|1413510.3.peg.2169,Antibiotic Resistance
1,fig|93061.5.peg.1154,Antibiotic Resistance
2,fig|93061.5.peg.2089,Antibiotic Resistance
3,fig|93061.5.peg.842,Antibiotic Resistance
4,fig|158879.11.peg.1813,Antibiotic Resistance
...,...,...
264,fig|158879.11.peg.2331,Antibiotic Resistance
265,fig|1241616.6.peg.1396,Antibiotic Resistance
266,fig|158879.11.peg.647,Antibiotic Resistance
267,fig|158879.11.peg.2107,Antibiotic Resistance


We need to find which genes related to antibiotic resistance are in the PPI:

In [17]:
resistance_genes_ppi_A = resistance_genes[resistance_genes['BRC_ID'].isin(saureus_ppi_plfams['Interactor_A_ID'])]['BRC_ID']
resistance_genes_ppi_B = resistance_genes[resistance_genes['BRC_ID'].isin(saureus_ppi_plfams['Interactor_B_ID'])]['BRC_ID']

resistance_genes_ppi = pd.DataFrame(pd.concat([resistance_genes_ppi_A, resistance_genes_ppi_B], axis = 0))
resistance_genes_ppi.reset_index(drop=True, inplace=True)

In [18]:
resistance_genes_ppi

Unnamed: 0,BRC_ID
0,fig|93061.5.peg.2089
1,fig|93061.5.peg.842
2,fig|93061.5.peg.2384
3,fig|93061.5.peg.2139
4,fig|93061.5.peg.1243
...,...
92,fig|93061.5.peg.88
93,fig|93061.5.peg.1310
94,fig|93061.5.peg.471
95,fig|93061.5.peg.2118


### Conserved Genes used for prediction in Nguyen et al. 2020 in PPI

We also need to find which conserved genes from the 10 experiments (each with 100 non-overlapping protein families) used for prediction in the paper are in the PPI:

In [19]:
sa_conserved_ppi_A = sa_feature_importance[sa_feature_importance['Protein Family ID'].isin(saureus_ppi_plfams['Interactor_A_ID'])]['Protein Family ID']
sa_conserved_ppi_B = sa_feature_importance[sa_feature_importance['Protein Family ID'].isin(saureus_ppi_plfams['Interactor_B_ID'])]['Protein Family ID']

sa_conserved_ppi = pd.DataFrame(pd.concat([sa_conserved_ppi_A, sa_conserved_ppi_B], axis = 0).drop_duplicates())

In [20]:
sa_conserved_ppi

Unnamed: 0,Protein Family ID
2,PLF_1279_00001583
3,PLF_1279_00001118
5,PLF_1279_00001741
6,PLF_1279_00001743
8,PLF_1279_00000675
...,...
974,PLF_1279_00001003
982,PLF_1279_00002144
988,PLF_1279_00001063
990,PLF_1279_00001416


In [21]:
#### Virulence Factors in PPI

#virulence_genes = sa_specialty_genes.loc[sa_specialty_genes.Property == 'Virulence Factor'].reset_index()

# We need to find which genes related to virulence are in the PPI:


#virulence_genes_ppi_A = virulence_genes[virulence_genes['BRC_ID'].isin(saureus_ppi['Interactor_A_ID'])]
#virulence_genes_ppi_B = virulence_genes[virulence_genes['BRC_ID'].isin(saureus_ppi['Interactor_B_ID'])]

#virulence_genes_ppi = pd.concat([virulence_genes_ppi_A, virulence_genes_ppi_B], axis = 0)

#virulence_genes_ppi.reset_index(drop=True, inplace=True)

# NetworkX

In [22]:
import networkx as nx
import scipy

## Statistics

In [23]:
ppi_info = pd.DataFrame(columns = ['Conserved Gene', 'Shortest Path to an AMR gene (length)',])

ppi_info['Conserved Gene'] = sa_conserved_ppi.reset_index(drop = True)['Protein Family ID']

#### For each conserved gene with a path to an AMR gene, what is the length of this path?

In [24]:
ppi_graph_plfams = nx.from_pandas_edgelist(saureus_ppi_plfams, 'Interactor_A_ID', 'Interactor_B_ID')

idx = 0
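for i in sa_conserved_ppi['Protein Family ID']:
    # Shortest-path lengths from family i to every reachable AMR gene
    lengths = []
    for j in resistance_genes_ppi['BRC_ID']:
        if nx.has_path(ppi_graph_plfams, i, j):
            lengths.append(nx.shortest_path_length(ppi_graph_plfams, i, j))
    if lengths:
        # .at writes into ppi_info itself; chained indexing may write to a temporary copy
        ppi_info.at[idx, 'Shortest Path to an AMR gene (length)'] = min(lengths)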
        
    idx += 1
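The nested loop above computes, for each conserved family, the minimum shortest-path length over all AMR genes. On an unweighted graph the same quantity can be obtained with a single breadth-first search started from all AMR genes at once. A minimal standard-library sketch on a made-up graph (`distance_to_nearest` and the node names are hypothetical, and this is not the code that produced the results in this notebook):

```python
from collections import deque


def distance_to_nearest(edges, sources):
    """Multi-source BFS: distance from every reachable node to its nearest source."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # All sources start at distance 0; sources absent from the graph are ignored.
    distance = {s: 0 for s in sources if s in adjacency}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Hypothetical toy network: two AMR genes and one disconnected pair (p4, p5).
toy_edges = [("amr1", "p1"), ("p1", "p2"), ("p2", "p3"), ("amr2", "p3"), ("p4", "p5")]
toy_distance = distance_to_nearest(toy_edges, ["amr1", "amr2"])
print(toy_distance)
```

Nodes with no path to any source, such as `p4` and `p5` here, are simply absent from the result, which corresponds to the missing values left in `ppi_info`.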

In [25]:
ppi_info['Feature Score'] = sa_feature_importance[sa_feature_importance['Protein Family ID'].isin(sa_conserved_ppi['Protein Family ID'])]['Total Feature Importance'].reset_index(drop = True)
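The assignment above pairs scores with families by row position, which is only correct if both tables list the families in the same order. A minimal sketch of pairing them by family ID instead, on made-up data and assuming each family ID appears once in the importance table (`toy_importance`, `toy_info` and `score_by_family` are hypothetical names):

```python
import pandas as pd

# Hypothetical toy tables; the values are illustrative only.
toy_importance = pd.DataFrame({
    "Protein Family ID": ["PLF_A", "PLF_B", "PLF_C"],
    "Total Feature Importance": [9.0, 5.0, 1.0],
})
# The families appear here in a different order than in the importance table.
toy_info = pd.DataFrame({"Conserved Gene": ["PLF_C", "PLF_A"]})

# Look each family up by ID, so the row order of the two tables does not matter.
score_by_family = toy_importance.set_index("Protein Family ID")["Total Feature Importance"]
toy_info["Feature Score"] = toy_info["Conserved Gene"].map(score_by_family)
print(toy_info)
```

Families missing from the importance table would receive `NaN`, which makes any mismatch visible instead of silently shifting the scores.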

In [26]:
print(ppi_info.groupby(['Shortest Path to an AMR gene (length)']).size().reset_index(name='Count'))

   Shortest Path to an AMR gene (length)  Count
0                                      1    103
1                                      2    300
2                                      3    200
3                                      4     67
4                                      5     10
5                                      6      7


In [27]:
ppi_info

Unnamed: 0,Conserved Gene,Shortest Path to an AMR gene (length),Feature Score
0,PLF_1279_00001583,2,81.039855
1,PLF_1279_00001118,2,67.782436
2,PLF_1279_00001741,2,60.701992
3,PLF_1279_00001743,3,54.623888
4,PLF_1279_00000675,3,51.659804
...,...,...,...
753,PLF_1279_00001003,3,0.000000
754,PLF_1279_00002144,,0.000000
755,PLF_1279_00001063,3,0.000000
756,PLF_1279_00001416,3,0.000000


Removing genes with no path to an AMR gene:

In [28]:
ppi_info = ppi_info[~ppi_info[['Shortest Path to an AMR gene (length)', 'Feature Score']].isnull().any(axis = 1)]

#### What is the correlation between the feature score and the length of the path?

In [30]:
ppi_info['Shortest Path to an AMR gene (length)'].astype('int').corr(ppi_info['Feature Score'].astype('float64'))

0.04841244141598265
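By default, `Series.corr` computes Pearson's correlation coefficient. For reference, with $x_i$ the path length and $y_i$ the feature score of the $i$-th conserved gene, and $\bar{x}$, $\bar{y}$ their means, it is:

$$
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}
$$

The coefficient ranges from $-1$ to $1$, so a value of about $0.048$ indicates almost no linear association between the two quantities.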

# Running the model with different groups of genes

Since the correlation did not give a satisfactory result, we will instead look at the model performance using groups of genes built according to their path length to an AMR gene.

However, we cannot combine different replicates, because each replicate describes the strains with a different set of protein families. Grouping families across replicates would therefore produce samples with different features, whereas the model needs the same features for every strain.

## Constructing sets of genes according to the path length

From now on, we will consider the set of strains described by 500 conserved genes.

Taking protein families from this experiment set:

In [31]:
import os
from os import listdir

plf_500 = []

datadir = 'E:/User/bruna.fistarol/Documents/GitHub/Nguyen_et_al_2020/Staphylococcus/fasta.500.0'
for strain in listdir(datadir):
    with open(os.path.join(datadir, strain), 'r') as sequences:
        for line in sequences:
            # FASTA header lines start with '>' and carry the protein family ID
            if line.startswith('>'):
                plf_500.append(line[1:].strip())
                
plf_500 = pd.DataFrame(np.unique(plf_500))
plf_500.columns = ['Protein Family ID']

In [32]:
plf_500

Unnamed: 0,Protein Family ID
0,PLF_1279_00000015
1,PLF_1279_00000025
2,PLF_1279_00000070
3,PLF_1279_00000084
4,PLF_1279_00000090
...,...
495,PLF_1279_00125198
496,PLF_1279_00125895
497,PLF_1279_00126161
498,PLF_1279_00126203


Again, since the information used in the paper by Nguyen et al. is given in terms of protein families, we need to associate every feature with its corresponding protein family.

Constructing the PPI in terms of protein families for conserved genes:

In [33]:
saureus_ppi = pd.read_csv('saureus_ppi_patric.csv')
saureus_ppi = saureus_ppi[['Interactor A ID', 'Interactor B ID']].astype("string")
saureus_ppi.columns = ['Interactor_A_ID', 'Interactor_B_ID']
saureus_ppi.drop(2085, axis = 0, inplace = True)
saureus_ppi.reset_index(drop=True, inplace=True)

# Work on a copy so that `saureus_ppi` keeps the original feature IDs
saureus_ppi_plfams2 = saureus_ppi.copy()

# Replace interactor A by its PLFam when that family is among the 500 conserved ones
for i in range(len(saureus_ppi['Interactor_A_ID'])):
    if plf.loc[saureus_ppi['Interactor_A_ID'][i]].isin(plf_500['Protein Family ID']).item():
        saureus_ppi_plfams2.at[i, 'Interactor_A_ID'] = plf.loc[saureus_ppi['Interactor_A_ID'][i]].PLFam

# Same substitution for interactor B
for i in range(len(saureus_ppi['Interactor_B_ID'])):
    if plf.loc[saureus_ppi['Interactor_B_ID'][i]].isin(plf_500['Protein Family ID']).item():
        saureus_ppi_plfams2.at[i, 'Interactor_B_ID'] = plf.loc[saureus_ppi['Interactor_B_ID'][i]].PLFam

# Different features of the same family can now produce identical edges
saureus_ppi_plfams2.drop_duplicates(subset=None, keep='first', inplace=True)

Checking which protein families are in the PPI:

In [34]:
sa_conserved_ppi_A = plf_500[plf_500['Protein Family ID'].isin(saureus_ppi_plfams2['Interactor_A_ID'])]['Protein Family ID']
sa_conserved_ppi_B = plf_500[plf_500['Protein Family ID'].isin(saureus_ppi_plfams2['Interactor_B_ID'])]['Protein Family ID']

sa_conserved_ppi2 = pd.DataFrame(pd.concat([sa_conserved_ppi_A, sa_conserved_ppi_B], axis = 0).drop_duplicates())

In [35]:
sa_conserved_ppi2

Unnamed: 0,Protein Family ID
0,PLF_1279_00000015
3,PLF_1279_00000084
5,PLF_1279_00000113
6,PLF_1279_00000121
8,PLF_1279_00000138
...,...
479,PLF_1279_00022461
485,PLF_1279_00030959
490,PLF_1279_00088943
494,PLF_1279_00122740


Checking which resistance genes are in the PPI:

In [36]:
resistance_genes_ppi_A = resistance_genes[resistance_genes['BRC_ID'].isin(saureus_ppi_plfams2['Interactor_A_ID'])]['BRC_ID']
resistance_genes_ppi_B = resistance_genes[resistance_genes['BRC_ID'].isin(saureus_ppi_plfams2['Interactor_B_ID'])]['BRC_ID']

resistance_genes_ppi2 = pd.DataFrame(pd.concat([resistance_genes_ppi_A, resistance_genes_ppi_B], axis = 0))
resistance_genes_ppi2.reset_index(drop=True, inplace=True)

In [37]:
resistance_genes_ppi2

Unnamed: 0,BRC_ID
0,fig|93061.5.peg.2089
1,fig|93061.5.peg.842
2,fig|93061.5.peg.2384
3,fig|93061.5.peg.2139
4,fig|93061.5.peg.1243
...,...
96,fig|93061.5.peg.1310
97,fig|93061.5.peg.471
98,fig|93061.5.peg.2118
99,fig|93061.5.peg.287


Calculating the distance to an AMR gene:

In [38]:
ppi_info2 = pd.DataFrame(columns = ['Conserved Gene', 'Shortest Path to an AMR gene (length)',])

ppi_info2['Conserved Gene'] = sa_conserved_ppi2.reset_index(drop = True)['Protein Family ID']

In [39]:
ppi_graph_plfams2 = nx.from_pandas_edgelist(saureus_ppi_plfams2, 'Interactor_A_ID', 'Interactor_B_ID')

idx = 0
for i in sa_conserved_ppi2['Protein Family ID']:
    lengths = []
    for j in resistance_genes_ppi2['BRC_ID']:
        if nx.has_path(ppi_graph_plfams2, i, j):
            lengths.append(nx.shortest_path_length(ppi_graph_plfams2, i, j))
    if lengths:
        # .at writes into ppi_info2 itself; chained indexing may write to a temporary copy
        ppi_info2.at[idx, 'Shortest Path to an AMR gene (length)'] = min(lengths)
        
    idx += 1

In [40]:
print(ppi_info2.groupby(['Shortest Path to an AMR gene (length)']).size().reset_index(name='Count'))

   Shortest Path to an AMR gene (length)  Count
0                                      1     55
1                                      2    152
2                                      3     87
3                                      4     39
4                                      5      6
5                                      6      1
6                                      7      1


To evaluate the model from Nguyen et al. on genes grouped by their path length to an AMR gene, we need to separate these families by path length. Let's construct the groups:

In [41]:
for i in ppi_info2['Shortest Path to an AMR gene (length)'].unique():
    globals()[f'plf_length_{i}'] = ppi_info[ppi_info['Shortest Path to an AMR gene (length)'] == i]['Conserved Gene']
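The cell above stores one Series per path length in a dynamically named global variable. A minimal sketch of an alternative, assuming a plain dictionary keyed by path length is acceptable for the later steps (`families_by_length` and the toy table are hypothetical):

```python
import pandas as pd

# Hypothetical toy table; None marks a family with no path to an AMR gene.
toy_info = pd.DataFrame({
    "Conserved Gene": ["PLF_A", "PLF_B", "PLF_C", "PLF_D"],
    "Shortest Path to an AMR gene (length)": [1, 2, 1, None],
})

# groupby skips missing keys by default, so unreachable families are left out.
families_by_length = {
    int(length): set(group["Conserved Gene"])
    for length, group in toy_info.groupby("Shortest Path to an AMR gene (length)")
}
print(families_by_length)
```

A dictionary keeps all groups in one object that can be iterated over or passed to a function, and the sets give constant-time membership tests when filtering FASTA records.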

The data used are available from the PATRIC FTP server (ftp://ftp.patricbrc.org/datasets/), in the Nguyen_et_al_2020.tar.gz archive.



In [None]:
mydir = 'E:/User/bruna.fistarol/Documents/GitHub/Fistarol_2022/Staphylococcus'
datadir = 'E:/User/bruna.fistarol/Documents/GitHub/Nguyen_et_al_2020/Staphylococcus'

import os
from os import listdir

# Source FASTA files: one file per strain, one record per protein family
sample = os.path.join(datadir, 'fasta.500.0')

for i in ppi_info2['Shortest Path to an AMR gene (length)'].unique():
    # One output directory per path length
    path = os.path.join(mydir, f'length.{i}')
    os.mkdir(path)

    # Families at this path length, as a set for fast membership tests
    wanted = set(globals()[f'plf_length_{i}'])

    for strain in listdir(sample):
        with open(os.path.join(path, strain), 'a') as mystrain:
            with open(os.path.join(sample, strain), 'r') as sequences:
                family = None  # family ID of the record being read
                seq = ''
                for line in sequences:
                    if line[0] == '>':
                        # A new header closes the previous record: write it if wanted
                        if family in wanted:
                            mystrain.write('>' + family + '\n')
                            mystrain.write(seq)
                        family = line[1:].rstrip('\n')
                        seq = ''
                    else:
                        seq += line
                # Write the last record of the file
                if family in wanted:
                    mystrain.write('>' + family + '\n')
                    mystrain.write(seq)
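The filtering step above reads each strain file record by record and keeps only the wanted families. A minimal sketch of that logic split into two small functions, run on in-memory text instead of files (`read_fasta`, `filter_fasta` and the toy sequences are hypothetical):

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence_text) for each record of a FASTA stream."""
    header, chunks = None, []
    for line in handle:
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    # The last record is only complete once the stream ends.
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(source, target, wanted):
    """Copy the records whose identifier is in `wanted`; return how many were kept."""
    kept = 0
    for header, sequence in read_fasta(source):
        if header in wanted:
            target.write(f">{header}\n{sequence}")
            kept += 1
    return kept


# Hypothetical toy input with three records.
toy_source = io.StringIO(">PLF_1\nATGC\n>PLF_2\nTTAA\n>PLF_3\nCCGG\n")
toy_target = io.StringIO()
toy_kept = filter_fasta(toy_source, toy_target, {"PLF_1", "PLF_3"})
print(toy_kept)
print(toy_target.getvalue())
```

Separating the parsing from the filtering means the same reader could be reused for counting genes per strain or for sampling records.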

In [42]:
for i in ppi_info2['Shortest Path to an AMR gene (length)'].unique():
    globals()[f'len_{i}'] = []
    for strain in listdir(f'E:/User/bruna.fistarol/Documents/GitHub/Fistarol_2022/Staphylococcus/length.{i}'):
        with open(os.path.join(f'E:/User/bruna.fistarol/Documents/GitHub/Fistarol_2022/Staphylococcus/length.{i}', strain), 'r') as sequence:
            genes = 0
            for line in sequence:
                if line[0] == '>':
                    genes += 1
            globals()[f'len_{i}'].append(genes)

For each path length to an AMR gene, the mean number of genes per strain is:

In [43]:
for i in ppi_info2['Shortest Path to an AMR gene (length)'].unique():
    print(i, np.mean(globals()[f'len_{i}']))

1 36.018927444794954
3 69.05520504731861
nan 0.0
2 105.08832807570978
4 28.074132492113566
5 2.001577287066246
7 0.0
6 1.0


At this point, the new data layout could already be used to run the model. However, it makes more sense to use the same number of genes in every group, because using more genes brings the input closer to the whole genome sequence.

We take 25 genes per strain for path lengths 1, 2, 3 and 4. The results in the paper are also derived from groups of 25 genes, so the two sets of results can be compared.

In [None]:
import random
mydir = 'E:/User/bruna.fistarol/Documents/GitHub/Fistarol_2022/Staphylococcus'
os.mkdir(os.path.join(mydir, '25genes'))
dir_25genes = 'E:/User/bruna.fistarol/Documents/GitHub/Fistarol_2022/Staphylococcus/25genes'

for j in [0, 1, 2, 3, 4]:
    
    rand_idx = [sorted(random.sample(range(1,36), 25)), 
                sorted(random.sample(range(1,105), 25)), 
                sorted(random.sample(range(1,69), 25)), 
                sorted(random.sample(range(1,28), 25))]
    
    for i in [1, 2, 3, 4]:   

        path = f'E:/User/bruna.fistarol/Documents/GitHub/Fistarol_2022/Staphylococcus/length.{i}'
        mydir = os.path.join(dir_25genes, f'length.{i}.{j}')
        os.mkdir(mydir)

        for strain in listdir(path):
            with open(os.path.join(mydir, strain), 'a') as mystrain:
                with open(os.path.join(path, strain), 'r') as sequences:
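                    c = 0  # 1-based position of the current record in the file
                    header = None
                    seq = ''
                    for line in sequences:
                        if line[0] == '>':
                            # A new header closes the previous record: write it if its position was sampled
                            if header is not None and c in rand_idx[i-1]:
                                mystrain.write(header)
                                mystrain.write(seq)
                            header = line
                            seq = ''
                            c += 1
                        else:
                            seq += line
                    # Write the last record of the file
                    if header is not None and c in rand_idx[i-1]:
                        mystrain.write(header)
                        mystrain.write(seq)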
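The cell above selects records by their position in each strain file. If strains differ in which families are present, the same position may refer to different families in different strains. A minimal sketch of an alternative, assuming the goal is to keep the same family IDs in every strain: draw the IDs once per replicate with a fixed seed and then filter by ID (`sample_families` and the toy IDs are hypothetical):

```python
import random


def sample_families(families, k, seed):
    """Draw k family IDs reproducibly; sorting first makes the draw independent of set order."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(families), k))


# Hypothetical pool of 40 family IDs for one path-length group.
toy_families = {f"PLF_{n:03d}" for n in range(40)}

# Five replicates of 25 families each, one seed per replicate.
replicates = [sample_families(toy_families, 25, seed) for seed in range(5)]
print([len(r) for r in replicates])
```

Because the seed is fixed, rerunning the sketch gives the same replicates, and each replicate can be used as the set of wanted IDs when writing the per-strain files.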