# Correlations between transcriptome and microbiome



## Datasets

 * RNAseq matrix with TPM values
 * Feature table with ASV counts

### Associations between RNAseq and 16S data

To correlate data from the Kremling and Wallace papers, samples must be matched on the day and time of sampling and on genotype; only runs that agree on all three can be paired.

 * Run information for [16S data]()
 * Run information for [RNAseq data]()

For the RNAseq data, this information can be extracted from the `LibraryName` field.

For Dr. Wallace's manuscript, a file available on [FigShare](https://doi.org/10.6084/m9.figshare.5886769.v2) makes it possible to connect plots (and therefore genotypes) to sequencing runs:

 * `0_plate_key.txt` associates genotypes with plots (the NCBI run metadata include the plot as part of the identifier)
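The matching requirement described above amounts to a join on a composite key. The following is a minimal sketch. It assumes each run has already been reduced to a record with hypothetical `run`, `day`, `time` and `genotype` fields. These field names and the function name `match_runs` are illustrative only and are not taken from either paper's metadata.

```python
from collections import defaultdict


def match_runs(rnaseq_runs, s16_runs):
    """Pair RNAseq and 16S runs whose (day, time, genotype) key is identical.

    Each input is a list of dicts with the hypothetical keys 'run', 'day',
    'time' and 'genotype'. Returns a list of (rnaseq_run, s16_run) tuples.
    """
    # Index the 16S runs by the composite key; several runs may share a key
    by_key = defaultdict(list)
    for rec in s16_runs:
        by_key[(rec['day'], rec['time'], rec['genotype'])].append(rec['run'])

    pairs = []
    for rec in rnaseq_runs:
        key = (rec['day'], rec['time'], rec['genotype'])
        # RNAseq runs with no 16S partner are simply dropped
        for s16_run in by_key.get(key, []):
            pairs.append((rec['run'], s16_run))
    return pairs
```

A dictionary keyed on the tuple avoids comparing every RNAseq run against every 16S run. The code keeps every combination when several runs share a key, so replicates should be collapsed or filtered beforehand if one-to-one pairs are needed.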

In [1]:
plate_key_file = '/home/rsantos/Repositories/maize_microbiome_transcriptomics/16S_wallace2018/0_plate_key.txt'
sra_run_table = '/home/rsantos/Repositories/maize_microbiome_transcriptomics/16S_wallace2018/SraRunInfo_Wallace_etal_2018.csv'

dict_wallace_2018 = {}

In [22]:
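with open(plate_key_file, 'r') as file:

    _ = file.readline()  # skip the header line

    for line in file:
        # Tab-separated row; column 3 (index 2) is printed as-is
        fields = line.strip().split('\t')
        # Column 4 (index 3) holds an underscore-separated value, printed as two parts
        fields2 = fields[3].split('_')
        print(f'{fields[2]}\t{fields2[0]}\t{fields2[1]}')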

14A0005	LMAD	8
14A0007	LMAD	8
14A0009	LMAD	8
14A0011	LMAD	8
14A0013	LMAD	8
14A0015	LMAD	8
14A0017	LMAD	8
14A0019	LMAD	8
14A0021	LMAD	8
14A0023	LMAD	8
14A0025	LMAD	8
14A0027	LMAD	8
14A0030	LMAD	8
14A0031	LMAD	8
14A0033	LMAD	8
14A0035	LMAD	8
14A0037	LMAD	8
14A0039	LMAD	8
14A0043	LMAD	8
14A0045	LMAD	8
14A0047	LMAD	8
14A0049	LMAD	8
14A0051	LMAD	8
14A0053	LMAD	8
14A0055	LMAD	8
14A0057	LMAD	8
14A0059	LMAD	8
14A0061	LMAD	8
14A0063	LMAD	8
14A0065	LMAD	8
14A0067	LMAD	8
14A0069	LMAD	8
14A0071	LMAD	8
14A0073	LMAD	8
14A0075	LMAD	8
14A0077	LMAD	8
14A0081	LMAD	8
14A0083	LMAD	8
14ABLANK	LMAD	8
14A0085	LMAD	8
14A0089	LMAD	8
14A0091	LMAD	8
14A0095	LMAD	8
14A0097	LMAD	8
14A0099	LMAD	8
14A0101	LMAD	8
14A0103	LMAD	8
14A0105	LMAD	8
14A0111	LMAD	8
14A0113	LMAD	8
14ABLANK	LMAD	8
14A0114	LMAD	8
14A0115	LMAD	8
14A0117	LMAD	8
14A0119	LMAD	8
14A0121	LMAD	8
14A0123	LMAD	8
14A0125	LMAD	8
14A0127	LMAD	8
14A0129	LMAD	8
14A0131	LMAD	8
14A0133	LMAD	8
14A0135	LMAD	8
14A0139	LMAD	8
14A0141	LMAD	8
14A0143	LMAD	8
14A0145	
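The cell above only prints the parsed fields, so `dict_wallace_2018` (defined in `In [1]`) is still empty. The following is a minimal sketch of filling such a dictionary. It assumes the same tab-separated layout the cell above relies on: an identifier in the third column and an underscore-separated value in the fourth. The printed output shows `14ABLANK` more than once, so the sketch skips blank entries, which would otherwise silently overwrite each other. It raises an error on any other duplicate. The function name `parse_plate_key` is hypothetical.

```python
def parse_plate_key(lines):
    """Build {identifier: (prefix, suffix)} from plate-key lines.

    Assumes a header line followed by tab-separated rows in which column 3
    (index 2) holds the identifier and column 4 (index 3) holds a value of
    the form 'PREFIX_SUFFIX', as in the notebook cell above.
    """
    mapping = {}
    rows = iter(lines)
    next(rows, None)  # skip the header line
    for line in rows:
        fields = line.rstrip('\r\n').split('\t')
        if len(fields) < 4:
            continue  # malformed or truncated row
        identifier = fields[2]
        if 'BLANK' in identifier:
            continue  # blank controls are not unique, so leave them out
        prefix, _, suffix = fields[3].partition('_')
        if identifier in mapping:
            raise ValueError(f'duplicate identifier: {identifier}')
        mapping[identifier] = (prefix, suffix)
    return mapping
```

Usage: `with open(plate_key_file) as fh: dict_wallace_2018 = parse_plate_key(fh)`.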

In [None]:
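with open(sra_run_table, 'r') as file:

    _ = file.readline()  # skip the CSV header

    for line in file:
        # TODO continue editing
        pass  # placeholder so the cell is valid Python until the loop body is written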
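The run table is a comma-separated file, so the standard `csv` module is safer than splitting lines by hand, because quoted fields may contain commas. The following is a minimal sketch. It assumes the table has `Run` and `LibraryName` columns, as standard NCBI SRA RunInfo exports do. Check the header of `SraRunInfo_Wallace_etal_2018.csv` before relying on this. The function name `read_run_table` is hypothetical.

```python
import csv


def read_run_table(handle, key='Run', value='LibraryName'):
    """Map one column of an SRA run table to another (by default Run -> LibraryName).

    `handle` is any iterable of CSV lines, such as an open file.
    """
    reader = csv.DictReader(handle)
    # fieldnames is read from the first line; None if the input is empty
    missing = {key, value} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f'columns not found: {sorted(missing)}')
    # Skip rows whose key cell is empty
    return {row[key]: row[value] for row in reader if row.get(key)}
```

Usage: `with open(sra_run_table, newline='') as fh: runs = read_run_table(fh)`. The resulting mapping can then be combined with the plate-key dictionary once the plot is parsed out of the run identifiers.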

## Actual computation of correlations

At the time of writing, at least two tools look promising for computing large correlation matrices:

 * Deep Graph
 * CorALS
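Both tools compute correlations between columns (features) across rows (samples). As a baseline for checking either tool, the following plain NumPy sketch standardizes each column, after which a single matrix product gives all pairwise Pearson correlations. The cross-dataset function covers the case relevant here: genes versus ASVs on matched samples. Both functions are our own illustration and are not part of the Deep Graph or CorALS APIs.

```python
import numpy as np


def _standardize(M):
    """Center each column and scale it to unit (population) standard deviation."""
    centered = M - M.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = np.nan  # constant columns have an undefined correlation
    return centered / std


def pearson_full(X):
    """Pearson correlation between all columns of X, shape (samples, features)."""
    Z = _standardize(np.asarray(X, dtype=float))
    return Z.T @ Z / Z.shape[0]


def pearson_cross(X, Y):
    """Pearson correlation between each column of X and each column of Y.

    Rows of X and Y must refer to the same samples, in the same order.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError('X and Y must have the same number of rows (samples)')
    return _standardize(X).T @ _standardize(Y) / X.shape[0]
```

If `cor_full` computes Pearson correlation, `pearson_full` should agree with it up to floating-point error. Comparing the two on a small slice of the random `X` generated below is a cheap sanity check.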

In [4]:
from corals.threads import set_threads_for_external_libraries
set_threads_for_external_libraries(n_threads=1)

In [5]:
import numpy as np

In [6]:
n_features = 20000
n_samples = 50
X = np.random.random((n_samples, n_features))

In [10]:
X.shape

(50, 20000)

In [11]:
# runtime: ~2 sec
from corals.correlation.full.default import cor_full
cor_values = cor_full(X)

In [13]:
cor_values.shape

(20000, 20000)
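The full result grows quadratically with the number of features. A 20000 × 20000 float64 matrix takes 20000² × 8 bytes = 3.2 × 10⁹ bytes, about 3.2 GB. If only strong associations matter, one option is to process columns in blocks and keep only the pairs whose absolute correlation passes a cutoff. The following plain NumPy sketch does this. It is a swapped-in technique rather than CorALS functionality, and dedicated routines in CorALS may be preferable in practice. The default `threshold` and `block_size` are arbitrary illustrative values, and the function name `strong_pairs` is hypothetical.

```python
import numpy as np


def strong_pairs(X, threshold=0.8, block_size=1000):
    """Return arrays (i, j, r) for feature pairs i < j with |r| >= threshold.

    X has shape (samples, features). Correlations are computed one block of
    columns at a time, so only a (features x block_size) slice is held in
    memory instead of the full (features x features) matrix.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    Z = X - X.mean(axis=0)
    std = Z.std(axis=0)
    std[std == 0] = np.nan  # constant features become NaN and never pass
    Z /= std

    rows, cols, vals = [], [], []
    for start in range(0, n_features, block_size):
        stop = min(start + block_size, n_features)
        # Correlations of every feature with features start..stop-1
        block = Z.T @ Z[:, start:stop] / n_samples
        i, j = np.nonzero(np.abs(block) >= threshold)  # NaN compares False
        j = j + start
        keep = i < j  # upper triangle only: drops the diagonal and mirrored pairs
        ik, jk = i[keep], j[keep]
        rows.append(ik)
        cols.append(jk)
        vals.append(block[ik, jk - start])
    return np.concatenate(rows), np.concatenate(cols), np.concatenate(vals)
```

With the default `block_size=1000` and 20000 features, each block is a 20000 × 1000 float64 array of roughly 160 MB, compared with 3.2 GB for the full matrix.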