## Comparison with correlation analysis


A community member can influence the production of a compound in several ways, and each scenario is either positive or negative for that compound's production.


Positive:
- Member produces precursor 
- Member produces compound

Negative:
- Member consumes precursor
- Member consumes compound


We want to check whether these scenarios let us match the correlation results from our project partners.

We need to define:
1. What is a precursor?
2. How will the magnitude of the positive/negative interaction be determined?
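
One possible way to make the second question concrete is sketched below. It assumes (the notebook does not establish this) that the magnitude of an interaction can be approximated by the frequency-weighted mass rate, i.e. the `mass_rate*frequency` column of the SteadierCom result tables, counted as positive for production and negative for consumption. The helper name `signed_contribution` and the toy data are hypothetical.

```python
import pandas as pd

def signed_contribution(flows, compound, weight_col="mass_rate*frequency"):
    """Net frequency-weighted flux of `compound` per member (hypothetical helper).

    Positive values mark net producers, negative values net consumers.
    """
    sub = flows[flows["compound"] == compound]
    produced = sub[sub["donor"] != "environment"].groupby("donor")[weight_col].sum()
    consumed = sub[sub["receiver"] != "environment"].groupby("receiver")[weight_col].sum()
    # Align on member name; a member missing on one side contributes 0 there.
    return produced.sub(consumed, fill_value=0.0).sort_values(ascending=False)

# Made-up exchange table with the same column names as the SteadierCom output.
toy = pd.DataFrame({
    "donor": ["A", "A", "B", "environment"],
    "receiver": ["B", "environment", "C", "C"],
    "compound": ["M_hxa_e"] * 4,
    "mass_rate*frequency": [0.2, 0.1, 0.05, 0.3],
})
scores = signed_contribution(toy, "M_hxa_e")
print(scores)
```

The same function could be applied to each precursor once the precursor set is fixed, which would give the "produces precursor" and "consumes precursor" scenarios a sign and a size as well.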

In [91]:
import sys
sys.path.append('../functions')
sys.path.append('../functions_steadiercom/')

import general_functions as general_func
import steadiercom_samples_plotting

In [52]:
import numpy as np
import pandas as pd

In [10]:
all_mags_paper = general_func.read_allmags_data()

### SteadierCom samples

In [92]:
SC1_C = pd.read_csv("../output/steadiercom_sample_0.1.3/results/results_99_SC1_C.tsv",sep="\t")
SC2_C = pd.read_csv("../output/steadiercom_sample_0.1.3/results/results_99_SC2_C.tsv",sep="\t")
SC1_X = pd.read_csv("../output/steadiercom_sample_0.1.3/results/results_99_SC1_X.tsv",sep="\t")

In [93]:
steadier_sample = pd.concat([SC1_C,SC2_C,SC1_X])

In [95]:
steadier_sample = steadiercom_samples_plotting.preprocessing_func(steadier_sample)

## Correlation

In [96]:
compounds = ["co2","h2","ac","etoh","ppa","but","lac__L","for","ppoh","hxa"]


In [97]:
def mag2genus(steadiercom_sample, all_mags_paper):
    """Map each MAG in the SteadierCom results to its genus ("f_<Family>" if no genus is assigned)."""
    # Members of this dataset: every donor or receiver that is not the environment
    donors = steadiercom_sample.loc[steadiercom_sample.donor != "environment", "donor"]
    receivers = steadiercom_sample.loc[steadiercom_sample.receiver != "environment", "receiver"]
    MAGs_steady_com = set(donors) | set(receivers)

    # Keep MAGs with new_coverage > 1 and fall back to "f_<Family>" when Genus is missing
    all_mags_paper_99 = all_mags_paper[all_mags_paper.new_coverage > 1].copy()
    all_mags_paper_99["Genus"] = all_mags_paper_99.apply(
        lambda x: x.Genus if isinstance(x.Genus, str) else "f_" + x.Family, axis=1
    )

    # Group the dataset's MAGs by genus, then invert to a MAG -> genus lookup
    genus_groups = all_mags_paper_99[all_mags_paper_99.index.isin(MAGs_steady_com)].groupby("Genus").groups
    mag2genus_dict = {mag: genus for genus, mags in genus_groups.items() for mag in mags}

    return genus_groups, mag2genus_dict, MAGs_steady_com

In [98]:
genus_groups,mag2genus_dict,MAGs_steady_com = mag2genus(steadier_sample,all_mags_paper)
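
To illustrate what `mag2genus` returns, the sketch below reproduces its two core steps on a made-up metadata table: the fallback to `f_<Family>` when no genus is assigned, and the inversion of `groupby(...).groups` into a MAG-to-genus lookup. The table `toy_mags` and its values are hypothetical and only mirror the `Genus` and `Family` columns used above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the MAG metadata table (index = MAG id); values are made up.
toy_mags = pd.DataFrame(
    {"Genus": ["Bacteroides", "Bacteroides", np.nan],
     "Family": ["Bacteroidaceae", "Bacteroidaceae", "CAG-74"]},
    index=["bin.1", "bin.2", "bin.3"],
)

# Fall back to "f_<Family>" when no genus is assigned.
genus = toy_mags["Genus"].where(toy_mags["Genus"].notna(), "f_" + toy_mags["Family"])

# groupby(...).groups maps each genus to the index labels (MAG ids) in that group.
groups = toy_mags.groupby(genus).groups

# Invert the grouping to get a MAG -> genus dictionary.
toy_mag2genus = {mag: g for g, mags in groups.items() for mag in mags}
print(toy_mag2genus)
```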

In [99]:
mag2genus_dict

{'CH15-bin.10': 'Aminivibrio',
 'CH15-bin.1': 'Bacteroides',
 'CH7-bin.13': 'Bacteroides',
 'CH13-bin.4': 'Bacteroides',
 'CH8-bin.16': 'Bacteroides',
 'CH15-bin.2': 'Cloacimonas',
 'CH3-bin.2': 'Clostridium',
 'CH3-bin.1': 'Clostridium_S',
 'CH15-bin.17': 'Cupidesulfovibrio',
 'CH13-bin.1': 'Cupidesulfovibrio',
 'CH8-bin.21': 'DMER64',
 'CH15-bin.23': 'DTFZ01',
 'CH15-bin.0': 'DUOS01',
 'CH13-bin.17': 'DUOS01',
 'CH8-bin.5': 'DUOS01',
 'CH15-bin.15': 'Desulfobulbus',
 'CH7-bin.18': 'Desulfobulbus',
 'CH15-bin.8': 'Desulfocurvibacter',
 'CH8-bin.8': 'Desulfocurvibacter',
 'CH7-bin.12': 'Desulfomicrobium',
 'CH14-bin.4': 'Desulfovibrio',
 'CH15-bin.6': 'Desulfovibrio',
 'CH13-bin.11': 'Desulfovibrio',
 'CH13-bin.12': 'Fibro-01',
 'CH8-bin.22': 'Fibro-01',
 'CH7-bin.23': 'Halodesulfovibrio',
 'CH7-bin.16': 'Humidesulfovibrio',
 'CH14-bin.1': 'Lacrimispora',
 'CH15-bin.16': 'Lacrimispora',
 'CH1-bin.6': 'Lacrimispora',
 'CH1-bin.8': 'Lacrimispora',
 'CH13-bin.14': 'Lacrimispora',
 'CH15-b

## Caproate as an example

In [156]:

steadier_sample["genus_donor"] = steadier_sample.apply(lambda x:mag2genus_dict[x.donor] if x.donor!="environment" else np.nan,axis=1)
steadier_sample["genus_receiver"] = steadier_sample.apply(lambda x:mag2genus_dict[x.receiver] if x.receiver!="environment" else np.nan,axis=1)
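
As a possible alternative to the row-wise `apply`, the same annotation can be sketched with `Series.map`, which returns NaN for any key missing from the dictionary, so `"environment"` needs no explicit branch. The toy frame and lookup below are hypothetical.

```python
import pandas as pd

# Made-up exchanges and lookup, mirroring the donor/receiver columns above.
toy_flows = pd.DataFrame({
    "donor": ["environment", "bin.1"],
    "receiver": ["bin.1", "bin.2"],
})
toy_lookup = {"bin.1": "Lacrimispora", "bin.2": "Sphaerochaeta"}

# Series.map with a dict yields NaN for keys that are absent from the dict,
# so "environment" becomes NaN automatically.
toy_flows["genus_donor"] = toy_flows["donor"].map(toy_lookup)
toy_flows["genus_receiver"] = toy_flows["receiver"].map(toy_lookup)
print(toy_flows)
```

One difference to keep in mind: the `apply` version raises a `KeyError` for a MAG that is missing from the dictionary, whereas `map` silently produces NaN, so the `apply` version is the stricter check.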


In [157]:
steadier_sample

Unnamed: 0,donor,receiver,compound,mass_rate,rate,frequency,community,medium,mass_rate*frequency,super_class,genus_donor,genus_receiver
0,environment,CH13-bin.12,M_cell4_e,1.505983e-02,2.259281e-02,1.00,CD_A,SC1_C,1.505983e-02,oligosaccharides,,Fibro-01
1,environment,CH13-bin.25,M_cellb_e,1.088664e-02,3.180480e-02,0.76,CD_A,SC1_C,8.273849e-03,oligosaccharides,,Sphaerochaeta
2,CH13-bin.14,CH13-bin.25,M_glc__D_e,1.085714e-02,6.026541e-02,0.25,CD_A,SC1_C,2.714285e-03,simple sugars,Lacrimispora,Sphaerochaeta
3,environment,CH15-bin.0,M_cell5_e,1.077655e-02,1.300391e-02,0.95,CD_P,SC1_C,1.023772e-02,oligosaccharides,,DUOS01
4,CH13-bin.17,CH13-bin.25,M_glc__D_e,1.052828e-02,5.843997e-02,0.08,CD_A,SC1_C,8.422621e-04,simple sugars,DUOS01,Sphaerochaeta
...,...,...,...,...,...,...,...,...,...,...,...,...
80,environment,CH14-bin.2,M_nac_e,5.909062e-07,4.839475e-06,1.00,CD_X,SC1_X,5.909062e-07,B-vitamins,,Robinsoniella
81,environment,CH3-bin.2,M_thm_e,2.841219e-07,1.070728e-06,1.00,M_X,SC1_X,2.841219e-07,nucleotides and derivatives,,Clostridium
82,environment,CH14-bin.2,M_pnto__R_e,2.670388e-07,1.223678e-06,1.00,CD_X,SC1_X,2.670388e-07,B-vitamins,,Robinsoniella
83,CH14-bin.2,CH14-bin.4,M_h2s_e,3.392386e-08,9.953950e-07,0.04,CD_X,SC1_X,1.356954e-09,gases,Robinsoniella,Desulfovibrio


**Find who is consuming Caproate**

In [158]:
steadier_sample_consuming_compound =  steadier_sample[ (steadier_sample.compound == "M_hxa_e") & (steadier_sample.receiver!="environment")]

In [165]:
receiver_compound = steadier_sample_consuming_compound.receiver.unique()
receiver_compound

array(['CH13-bin.0', 'CH15-bin.7', 'CH7-bin.13'], dtype=object)

**Find who is producing Caproate**

In [166]:
steadier_sample_producing_compound = steadier_sample[(steadier_sample.compound == "M_hxa_e") & (steadier_sample.donor != "environment")]

In [167]:
steadier_sample_producing_compound.groupby("genus_donor").count()["donor"]

genus_donor
DTFZ01            2
Desulfobulbus     1
Lacrimispora      2
Lentimicrobium    1
UBA2174           1
Name: donor, dtype: int64

In [168]:
donors_compound_all = steadier_sample_producing_compound.donor.unique()
donors_compound_all

array(['CH13-bin.14', 'CH15-bin.23', 'CH7-bin.18', 'CH7-bin.17',
       'CH7-bin.9'], dtype=object)
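
The two lookups above (who consumes and who produces a compound) could be wrapped in one small helper so that they can be repeated for the other compounds of interest. The sketch below is hypothetical: `members_exchanging` is not part of the project code, and the toy table only mirrors the `donor`, `receiver` and `compound` columns.

```python
import pandas as pd

def members_exchanging(flows, compound):
    """Return (producers, consumers) of `compound`, excluding the environment."""
    sub = flows[flows["compound"] == compound]
    producers = sorted(set(sub["donor"]) - {"environment"})
    consumers = sorted(set(sub["receiver"]) - {"environment"})
    return producers, consumers

# Made-up exchanges: bin.1 and bin.2 secrete caproate, bin.3 takes it up.
toy = pd.DataFrame({
    "donor": ["bin.1", "bin.1", "bin.2", "environment"],
    "receiver": ["bin.3", "environment", "bin.3", "bin.1"],
    "compound": ["M_hxa_e", "M_hxa_e", "M_hxa_e", "M_glc__D_e"],
})
producers, consumers = members_exchanging(toy, "M_hxa_e")
print(producers, consumers)
```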

**Find which precursors they use**

In [169]:
steadier_sample_precursors = steadier_sample[steadier_sample.receiver.isin(donors_compound_all)]

steadier_sample_precursors = steadier_sample_precursors[steadier_sample_precursors.super_class.isin(["oligosaccharides","simple sugars","carbohydrate derivatives","carboxylic acids and anions","alcohols and aldehydes"])]
steadier_sample_precursors

Unnamed: 0,donor,receiver,compound,mass_rate,rate,frequency,community,medium,mass_rate*frequency,super_class,genus_donor,genus_receiver
8,environment,CH13-bin.14,M_cellb_e,7.520236e-03,2.197001e-02,0.48,CD_A,SC1_C,3.609713e-03,oligosaccharides,,Lacrimispora
12,environment,CH15-bin.23,M_cellb_e,5.861117e-03,1.712297e-02,0.31,CD_P,SC1_C,1.816946e-03,oligosaccharides,,DTFZ01
17,environment,CH15-bin.23,M_cell4_e,5.373626e-03,8.061535e-03,0.21,CD_P,SC1_C,1.128461e-03,oligosaccharides,,DTFZ01
20,environment,CH15-bin.23,M_cell5_e,4.803183e-03,5.795933e-03,0.77,CD_P,SC1_C,3.698451e-03,oligosaccharides,,DTFZ01
23,environment,CH13-bin.14,M_cell5_e,4.281572e-03,5.166512e-03,0.98,CD_A,SC1_C,4.195941e-03,oligosaccharides,,Lacrimispora
...,...,...,...,...,...,...,...,...,...,...,...,...
1194,CH7-bin.9,CH7-bin.18,M_glc__D_e,7.235152e-07,4.016062e-06,0.01,CM_P,SC2_C,7.235152e-09,simple sugars,Lentimicrobium,Desulfobulbus
1421,CH7-bin.12,CH7-bin.18,M_meoh_e,4.018524e-08,1.254154e-06,0.03,CM_P,SC2_C,1.205557e-09,alcohols and aldehydes,Desulfomicrobium,Desulfobulbus
1445,environment,CH7-bin.17,M_glyclt_e,2.398899e-08,3.196686e-07,0.03,CM_P,SC2_C,7.196697e-10,alcohols and aldehydes,,UBA2174
1479,CH7-bin.4,CH7-bin.18,M_meoh_e,8.803259e-09,2.747438e-07,0.02,CM_P,SC2_C,1.760652e-10,alcohols and aldehydes,Solidesulfovibrio,Desulfobulbus


In [170]:
precursors = steadier_sample_precursors.compound.unique()
precursors

array(['M_cellb_e', 'M_cell4_e', 'M_cell5_e', 'M_cell3_e', 'M_acald_e',
       'M_lac__L_e', 'M_mal__L_e', 'M_succ_e', 'M_rib__D_e', 'M_fum_e',
       'M_glc__D_e', 'M_actn__R_e', 'M_ocdcea_e', 'M_gal_e', 'M_quin_e',
       'M_glyclt_e', 'M_cit_e', 'M_meoh_e'], dtype=object)
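
The working definition used here, that a precursor is any compound from a carbon-source super class taken up by a caproate producer, can be written as a reusable filter. This is a sketch under that assumption only: `precursor_uptake` and `CARBON_CLASSES` are hypothetical names, and the toy data is made up.

```python
import pandas as pd

# Super classes treated as carbon sources in the cell above.
CARBON_CLASSES = [
    "oligosaccharides", "simple sugars", "carbohydrate derivatives",
    "carboxylic acids and anions", "alcohols and aldehydes",
]

def precursor_uptake(flows, producers, classes=CARBON_CLASSES):
    """Rows where one of `producers` takes up a compound from `classes`."""
    mask = flows["receiver"].isin(producers) & flows["super_class"].isin(classes)
    return flows[mask]

# Made-up uptake rows for a single producer, bin.1.
toy = pd.DataFrame({
    "donor": ["environment", "bin.2", "environment"],
    "receiver": ["bin.1", "bin.1", "bin.1"],
    "compound": ["M_cellb_e", "M_acald_e", "M_thm_e"],
    "super_class": ["oligosaccharides", "alcohols and aldehydes",
                    "nucleotides and derivatives"],
})
toy_precursors = precursor_uptake(toy, ["bin.1"])["compound"].unique().tolist()
print(toy_precursors)
```

Note that this filter selects by super class only; it does not check that a compound is metabolically upstream of caproate, so the resulting list is a candidate set rather than a confirmed set of precursors.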

**Who is producing these precursors**

In [171]:
steadier_sample_precursors[steadier_sample_precursors.donor!="environment"]

Unnamed: 0,donor,receiver,compound,mass_rate,rate,frequency,community,medium,mass_rate*frequency,super_class,genus_donor,genus_receiver
56,CH15-bin.0,CH15-bin.23,M_acald_e,1.816542e-03,4.123595e-02,0.53,CD_P,SC1_C,9.627675e-04,alcohols and aldehydes,DUOS01,DTFZ01
65,CH13-bin.12,CH13-bin.14,M_acald_e,1.561566e-03,3.544793e-02,0.08,CD_A,SC1_C,1.249253e-04,alcohols and aldehydes,Fibro-01,Lacrimispora
98,CH13-bin.4,CH13-bin.14,M_acald_e,1.166642e-03,2.648305e-02,0.08,CD_A,SC1_C,9.333134e-05,alcohols and aldehydes,Bacteroides,Lacrimispora
169,CH13-bin.11,CH13-bin.14,M_acald_e,5.454641e-04,1.238217e-02,0.01,CD_A,SC1_C,5.454641e-06,alcohols and aldehydes,Desulfovibrio,Lacrimispora
183,CH15-bin.7,CH15-bin.23,M_acald_e,4.833223e-04,1.097153e-02,0.04,CD_P,SC1_C,1.933289e-05,alcohols and aldehydes,Sphaerochaeta,DTFZ01
...,...,...,...,...,...,...,...,...,...,...,...,...
1087,CH7-bin.17,CH7-bin.18,M_acald_e,1.590037e-06,3.609422e-05,0.77,CM_P,SC2_C,1.224329e-06,alcohols and aldehydes,UBA2174,Desulfobulbus
1176,CH7-bin.4,CH7-bin.17,M_cit_e,8.464844e-07,4.476397e-06,0.03,CM_P,SC2_C,2.539453e-08,carboxylic acids and anions,Solidesulfovibrio,UBA2174
1194,CH7-bin.9,CH7-bin.18,M_glc__D_e,7.235152e-07,4.016062e-06,0.01,CM_P,SC2_C,7.235152e-09,simple sugars,Lentimicrobium,Desulfobulbus
1421,CH7-bin.12,CH7-bin.18,M_meoh_e,4.018524e-08,1.254154e-06,0.03,CM_P,SC2_C,1.205557e-09,alcohols and aldehydes,Desulfomicrobium,Desulfobulbus


**Who is consuming the precursors**

In [172]:
steadier_sample[ (~steadier_sample.receiver.isin(donors_compound_all)) & (steadier_sample.receiver!="environment") & (steadier_sample.compound.isin(precursors))]

Unnamed: 0,donor,receiver,compound,mass_rate,rate,frequency,community,medium,mass_rate*frequency,super_class,genus_donor,genus_receiver
0,environment,CH13-bin.12,M_cell4_e,0.015060,0.022593,1.00,CD_A,SC1_C,0.015060,oligosaccharides,,Fibro-01
1,environment,CH13-bin.25,M_cellb_e,0.010887,0.031805,0.76,CD_A,SC1_C,0.008274,oligosaccharides,,Sphaerochaeta
2,CH13-bin.14,CH13-bin.25,M_glc__D_e,0.010857,0.060265,0.25,CD_A,SC1_C,0.002714,simple sugars,Lacrimispora,Sphaerochaeta
3,environment,CH15-bin.0,M_cell5_e,0.010777,0.013004,0.95,CD_P,SC1_C,0.010238,oligosaccharides,,DUOS01
4,CH13-bin.17,CH13-bin.25,M_glc__D_e,0.010528,0.058440,0.08,CD_A,SC1_C,0.000842,simple sugars,DUOS01,Sphaerochaeta
...,...,...,...,...,...,...,...,...,...,...,...,...
31,CH3-bin.2,CH3-bin.0,M_acald_e,0.000941,0.021350,1.00,M_X,SC1_X,0.000941,alcohols and aldehydes,Clostridium,Sporolactobacillus
33,CH14-bin.1,CH14-bin.4,M_acald_e,0.000846,0.019205,0.80,CD_X,SC1_X,0.000677,alcohols and aldehydes,Lacrimispora,Desulfovibrio
35,CH14-bin.2,CH14-bin.4,M_acald_e,0.000837,0.018991,0.28,CD_X,SC1_X,0.000234,alcohols and aldehydes,Robinsoniella,Desulfovibrio
38,CH14-bin.2,CH14-bin.1,M_mal__L_e,0.000495,0.003748,0.14,CD_X,SC1_X,0.000069,carboxylic acids and anions,Robinsoniella,Lacrimispora
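
The filter for competing precursor consumers is used more than once, so it may be convenient to build the boolean mask in one place. The sketch below is a hypothetical helper (`competing_uptake`) run on made-up data with the same column names; it also shows the per-genus count via `groupby(...).size()`.

```python
import pandas as pd

def competing_uptake(flows, producers, precursors):
    """Uptake of precursor compounds by members that do not produce the target."""
    mask = (
        ~flows["receiver"].isin(producers)
        & (flows["receiver"] != "environment")
        & flows["compound"].isin(precursors)
    )
    return flows[mask]

# Made-up exchanges: bin.1 is the producer, bin.2 and bin.3 compete for its precursors.
toy = pd.DataFrame({
    "donor": ["environment", "environment", "bin.1", "bin.2"],
    "receiver": ["bin.1", "bin.2", "environment", "bin.3"],
    "compound": ["M_cellb_e", "M_cellb_e", "M_cellb_e", "M_acald_e"],
    "genus_receiver": ["Lacrimispora", "Sphaerochaeta", None, "Desulfovibrio"],
})
competitors = competing_uptake(toy, producers=["bin.1"],
                               precursors=["M_cellb_e", "M_acald_e"])
# Number of competing uptake rows per receiving genus.
counts = competitors.groupby("genus_receiver").size().sort_values()
print(counts)
```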


In [173]:
steadier_sample[ (~steadier_sample.receiver.isin(donors_compound_all)) & (steadier_sample.receiver!="environment") & (steadier_sample.compound.isin(precursors))].groupby("genus_receiver").count()["receiver"].sort_values()

genus_receiver
Robinsoniella            1
Clostridium              1
Sporolactobacillus       1
Clostridium_S            2
Microbacter              8
f_Ethanoligenenaceae     8
UBA7706                  8
Mobilitalea             10
Spiro-10                11
Limiplasma              13
Thiomonas               13
UBA6814                 14
Thermodesulfobium       14
Trichococcus            15
Desulfomicrobium        15
Lentimicrobium          16
Bacteroides             16
f_CAG-74                17
Fibro-01                17
Halodesulfovibrio       18
DMER64                  19
Oscillibacter           19
Cloacimonas             21
Proteiniphilum          23
Humidesulfovibrio       24
Syntrophotalea          24
Aminivibrio             25
f_Sphaerochaetaceae     25
Desulfocurvibacter      26
Solidesulfovibrio       28
Desulfobulbus           30
DUOS01                  32
Desulfovibrio           47
Verruco-01              47
Cupidesulfovibrio       49
Lacrimispora            57
Sphaerochaeta