# Engraftment in FMT patients

Fecal transplant results in both temporary and durable engraftment of donor bacteria. Tracking bacterial sources using 16S amplicon data (a) and metagenomic species (b) identifies bacteria transferred across hosts as well as the invasion of environmental bacteria. Tracking strains through (c) the flexible genome or (d) SNVs with StrainFinder reveals similar dynamics at strain resolution. Panels a–d plot, for each patient over time, the frequency of ASVs, species, or strains attributable to each source. (e) Deconvolving the ecological dynamics with StrainFinder yields examples of donor strains that engrafted durably, engrafted temporarily, or failed to engraft. (f) Distribution of donor strains that did not engraft, engrafted temporarily, or engrafted durably in each FMT patient. (g) Distribution of patient strains that persisted uniformly, temporarily decreased in abundance but persisted, or were lost in each FMT patient.
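The categories in panels (f) and (g) amount to classifying each strain's frequency trajectory. The notebook does not state the exact rules in this section, so the sketch below is only one plausible scheme: the detection cutoff (`DETECTION`), the drop criterion (`DROP_FRACTION`), and the helper names `classify_donor_strain` and `classify_patient_strain` are hypothetical choices for illustration, not the analysis used for the figure.

```python
import pandas as pd

# Hypothetical cutoffs, chosen for illustration only.
DETECTION = 0.01     # a strain counts as present above this frequency
DROP_FRACTION = 0.5  # a "decrease" means falling below half the baseline frequency


def classify_donor_strain(post_fmt, detection=DETECTION):
    """Classify a donor strain's trajectory in a recipient (panel f categories).

    post_fmt: frequencies of the donor strain in the recipient's
    post-FMT samples, in chronological order.
    """
    present = post_fmt > detection
    if not present.any():
        return 'not engrafted'
    # Still detectable at the last timepoint -> durable; otherwise temporary.
    return 'durable' if present.iloc[-1] else 'temporary'


def classify_patient_strain(baseline, post_fmt, detection=DETECTION,
                            drop_fraction=DROP_FRACTION):
    """Classify a pre-FMT patient strain's trajectory (panel g categories).

    baseline: the strain's frequency before FMT.
    post_fmt: its frequencies after FMT, in chronological order.
    """
    if post_fmt.iloc[-1] <= detection:
        return 'lost'
    if (post_fmt < drop_fraction * baseline).any():
        return 'temporarily decreased'
    return 'persisted'


# Toy donor-strain trajectories (made-up numbers) in one recipient.
donor = {
    'strain_A': pd.Series([0.00, 0.20, 0.15]),
    'strain_B': pd.Series([0.30, 0.05, 0.00]),
    'strain_C': pd.Series([0.00, 0.00, 0.00]),
}
donor_calls = pd.Series({k: classify_donor_strain(v) for k, v in donor.items()})
print(donor_calls.value_counts())
```

Counting the calls per patient with `value_counts()` gives the kind of per-patient distribution shown in (f); applying `classify_patient_strain` to the recipient's own pre-FMT strains gives the analogue of (g).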

In [1]:
import pandas as pd, seaborn as sns, numpy as np, matplotlib.pyplot as plt
import scipy.stats as stats
import itertools, os, random

sns.set_style('white')
%matplotlib inline

In [2]:
#read in metadata
md = pd.read_csv('/Users/ndchu/Documents/uc_fmt/metadata/metadata_stool_dna_NV022018.tsv',
                 sep='\t', dtype={'patient': 'object',
                                  'fmt': 'object'})

## Source tracking for the 16S and MGX data is in the "abx_recovery" file.

## Flexible genome

In [None]:
#set up to read in data tables
ncbi_dir = '/Users/ndchu/Documents/uc_fmt/metagenomics/strains_flexible/ncbi_ref_flex/'

fp_dir = '/Users/ndchu/Documents/uc_fmt/metagenomics/strains_flexible/ncbi_ref_flex/faecalibacterium_prausnitzii_GCF_000162015/k_len'
fp_files = os.listdir(fp_dir)
os.chdir(fp_dir)

In [None]:
ncbi_dir = '/Users/ndchu/Documents/uc_fmt/metagenomics/strains_flexible/ncbi_ref_flex/'

#get lists of genomes with their corresponding data directories
ncbi_genomes = [x for x in os.listdir(ncbi_dir) if not x.startswith('.')]
ncbi_genomes_data = [[x, ncbi_dir + x + '/k_len/'] for x in ncbi_genomes]
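The earlier cell calls `os.chdir`, which changes the working directory for every later cell. A minimal alternative sketch, assuming the same `<genome>/k_len/` layout, builds the genome list with `pathlib` instead; the helper name `genome_data_dirs` is hypothetical.

```python
import tempfile
from pathlib import Path


def genome_data_dirs(ncbi_dir):
    """Return [[genome, path_to_k_len_dir], ...] for each non-hidden entry of ncbi_dir.

    Mirrors the list comprehension above without os.chdir, and sorts
    entries so the order does not depend on the filesystem.
    """
    root = Path(ncbi_dir)
    return [[p.name, str(p / 'k_len')]
            for p in sorted(root.iterdir())
            if not p.name.startswith('.')]


# Demo on a throwaway directory tree (genome names are made up).
with tempfile.TemporaryDirectory() as tmp:
    for name in ['genome_b', 'genome_a']:
        (Path(tmp) / name / 'k_len').mkdir(parents=True)
    (Path(tmp) / '.DS_Store').touch()  # hidden file, skipped like in the original filter
    dirs = genome_data_dirs(tmp)
    print([g for g, _ in dirs])  # ['genome_a', 'genome_b']
```

Unlike `ncbi_dir + x + '/k_len/'`, the returned paths have no trailing slash, so file names should be joined with `os.path.join` or the `/` operator rather than string concatenation.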

In [None]:
#now get dataframes for every genome for plot #1
#structure is [[genome, df], [genome, df]]
#note: read_flex_tables is not defined in this section and must be defined before this cell runs
ncbi_data_dfs = [[x, read_flex_tables(y, md)] for x, y in ncbi_genomes_data]

In [None]:
#identify strain matches with the induction sample

#patients with their donor induction samples
induction_pairs = [['001', '0044-0076'],
                   ['007', '0044-0115'],
                   ['008', '0073-0028'],
                   ['010', '0044-0157'],
                   ['011', '0073-0028'],
                   ['014', '0044-0169']]

#We'll also have a set of placebo controls for comparison
controls = [['004', '0044-0076'],
            ['005', '0044-0115'],
            ['006', '0073-0028'],
            ['009', '0044-0157'],
            ['012', '0073-0028'],
            ['013', '0044-0169']]


#note: strain_match_wrap is not defined in this section and must be defined before this cell runs
ncbi_induc_match_dfs = [[x, strain_match_wrap(y, induction_pairs)] for x, y in ncbi_data_dfs]
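Because `strain_match_wrap` is not shown here, the sketch below does not reproduce it. It illustrates one plausible way, an assumption rather than the notebook's method, to score how closely a patient's flexible genome matches the paired donor induction sample: Jaccard similarity between boolean gene-presence vectors. Running the same scoring on the placebo `controls`, which are paired with the same donors, gives a baseline for how much overlap to expect without transplantation. The table layout (genes × samples), the `sample_patient` mapping, and the helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def jaccard(a, b):
    """Jaccard similarity between two boolean gene-presence Series."""
    union = (a | b).sum()
    return float((a & b).sum() / union) if union else np.nan


def donor_match_scores(presence, sample_patient, pairs):
    """Score every sample of each patient against its paired donor sample.

    presence: boolean DataFrame, rows = flexible-genome genes, columns = sample IDs.
    sample_patient: hypothetical mapping of sample ID -> patient ID.
    pairs: [[patient, donor_sample], ...], shaped like induction_pairs.
    """
    rows = []
    for patient, donor_sample in pairs:
        if donor_sample not in presence.columns:
            continue
        for sample, pt in sample_patient.items():
            if pt == patient and sample in presence.columns:
                rows.append({'patient': patient, 'donor_sample': donor_sample,
                             'sample': sample,
                             'jaccard': jaccard(presence[sample], presence[donor_sample])})
    return pd.DataFrame(rows)


# Toy presence/absence table (made-up genes and samples).
presence = pd.DataFrame({'D1': [True, True, False, True],
                         'P1': [True, True, False, False],
                         'C1': [False, False, True, False]},
                        index=['g1', 'g2', 'g3', 'g4'])
sample_patient = {'P1': '001', 'C1': '004'}

fmt_scores = donor_match_scores(presence, sample_patient, [['001', 'D1']])
ctrl_scores = donor_match_scores(presence, sample_patient, [['004', 'D1']])
print(fmt_scores, ctrl_scores, sep='\n')
```

In this toy case the FMT recipient shares two of the three genes present in either sample (Jaccard 2/3), while the placebo control shares none; on real data, the control scores indicate what level of overlap should not be read as engraftment.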

### Iden