# Processing the transporter data

## Overview

This notebook fetches transporter cluster information from the [transporter](https://github.com/johnne/transporters) GitHub repository and metaomic gene abundances and annotations from [figshare](https://figshare.com/s/6e05aa0ea8353098a503).

In [1]:
import pandas as pd
import os
import numpy as np
import urllib.request
import hashlib

In [2]:
def sha256sum(filename):
    h  = hashlib.sha256()
    b  = bytearray(128*1024)
    mv = memoryview(b)
    with open(filename, 'rb', buffering=0) as f:
        for n in iter(lambda : f.readinto(mv), 0):
            h.update(mv[:n])
    return h.hexdigest()

In [3]:
def file2df(f, drop=None, axis=1, rename=None):
    df = pd.read_csv(f, index_col=0, sep="\t", header=0)
    if drop:
        df.drop(drop, axis=axis, inplace=True)
    if rename:
        if axis == 1:
            df.rename(columns=rename, inplace=True)
        elif axis == 0:
            df.rename(index=rename, inplace=True)
    return df

## Set up the metaomic data

### Download data from figshare

Download the abundance data of ORFs in the co-assembly, as well as tables containing taxonomic information.

In [4]:
# Define data files
data_files = {'data/mg/all_genes.raw_counts.taxonomy.tsv.gz': {'url': 'https://ndownloader.figshare.com/files/15168053', 'sha256': '700b83f864791ba801a5912f2673d2e3c09f0e70cf8a0ee685489f705fa75dbc'},
              'data/mg/all_genes.raw_counts.tsv.gz': {'url': 'https://ndownloader.figshare.com/files/15168047', 'sha256': '4d532c1f2126028cef6be531fb39802d1d31a27a5e5abba480782607eb419f4f'},
              'data/mg/all_genes.tpm.taxonomy.tsv.gz': {'url': 'https://ndownloader.figshare.com/files/15168017', 'sha256': '9f4b29218009c75969d312b0d62243459699fde548efc03545d30e2f262f19e1'},
              'data/mg/all_genes.tpm.tsv.gz': {'url': 'https://ndownloader.figshare.com/files/15168011', 'sha256': 'f38aedf2151277d88f6f9fe92af64503baa9ed146ad607b050fb48a488a9a8d8'},
              'data/mt/all_genes.tpm.tsv.gz': {'url': 'https://ndownloader.figshare.com/files/15168020', 'sha256': '881a73bcc670f74b567b3875c65c64fd0097ba8b18eabc1ddf159ae39699ac86'},
              'data/mt/all_genes.tpm.taxonomy.tsv.gz': {'url': 'https://ndownloader.figshare.com/files/15168023', 'sha256': '89bb9ab8fd34e29df486961147875d3f2aa613b993c88e13694863322475af71'},
              'data/mt/all_genes.raw_counts.tsv.gz': {'url': 'https://ndownloader.figshare.com/files/15168026', 'sha256': 'c32ca173e359369f5558b695e1d3108105e59dcc6b3478c7669e32ea3d93825a'},
              'data/mt/all_genes.raw_counts.taxonomy.tsv.gz': {'url': 'https://ndownloader.figshare.com/files/15168035', 'sha256': '77efa5d4a1cd22cbd367b41981553ec489a85598ee7328f778c5497976278a04'}}

In [5]:
for f, d in data_files.items():
    os.makedirs(os.path.dirname(f), exist_ok=True)
    download = False
    if os.path.exists(f):
        if sha256sum(f) == d['sha256']:
            print("File {} exists".format(f))
            continue
        else:
            print("File {} has wrong hash. Re-downloading".format(f))
            download = True
    else:
        download = True
    if download:
        url = d['url']
        print("Downloading file {} from {}".format(f, url))
        urllib.request.urlretrieve(url, f)
        if sha256sum(f) == d['sha256']:
            print("{} OK".format(f))
        else:
            print("{} FAILED. Please try re-downloading.".format(f))

Downloading file data/mg/all_genes.raw_counts.taxonomy.tsv.gz from https://ndownloader.figshare.com/files/15168053
data/mg/all_genes.raw_counts.taxonomy.tsv.gz OK
Downloading file data/mg/all_genes.raw_counts.tsv.gz from https://ndownloader.figshare.com/files/15168047
data/mg/all_genes.raw_counts.tsv.gz OK
Downloading file data/mg/all_genes.tpm.taxonomy.tsv.gz from https://ndownloader.figshare.com/files/15168017
data/mg/all_genes.tpm.taxonomy.tsv.gz OK
Downloading file data/mg/all_genes.tpm.tsv.gz from https://ndownloader.figshare.com/files/15168011
data/mg/all_genes.tpm.tsv.gz OK
Downloading file data/mt/all_genes.tpm.tsv.gz from https://ndownloader.figshare.com/files/15168020
data/mt/all_genes.tpm.tsv.gz OK
Downloading file data/mt/all_genes.tpm.taxonomy.tsv.gz from https://ndownloader.figshare.com/files/15168023
data/mt/all_genes.tpm.taxonomy.tsv.gz OK
Downloading file data/mt/all_genes.raw_counts.tsv.gz from https://ndownloader.figshare.com/files/15168026
data/mt/all_genes.raw_coun

Download the environmental data.

In [6]:
urllib.request.urlretrieve("https://ndownloader.figshare.com/files/15175808", "data/LMO.time.series.metadata.csv")

('data/LMO.time.series.metadata.csv', <http.client.HTTPMessage at 0x10d063f98>)

Download TIGRFAM annotations for ORFs. These are fetched directly from the [Alneberg et al. 2018](https://doi.org/10.6084/m9.figshare.c.3831631.v1) collection.

In [7]:
os.makedirs("data/annotations", exist_ok=True)
urllib.request.urlretrieve("https://ndownloader.figshare.com/files/9448027", "data/annotations/all.TIGRFAM.standardized.tsv.gz")

('data/annotations/all.TIGRFAM.standardized.tsv.gz',
 <http.client.HTTPMessage at 0x10d0895f8>)

### Retrieve transporter information

Protein families associated with transporter functions have been identified using the [transporters](https://github.com/johnne/transporters) repository. Transporter protein families are clustered by cross-referencing reviewed entries in the UniProt database (see the GitHub transporter [wiki](https://github.com/johnne/transporters/wiki) for details). Here we use the transporter clustering created with the `2017_12` UniProt version.

In [8]:
uniprot_ver = "2017_12"

In [9]:
transdef = pd.read_csv("https://raw.githubusercontent.com/ChristoferLNU/transporters/master/results/transport-clusters.{}.tab".format(uniprot_ver), 
                       header=None, sep="\t", names=["transporter","fam"])
print("{} transporters, {} protein families".format(len(transdef.transporter.unique()), len(transdef.fam)))

1076 transporters, 1403 protein families


We limit transporters to the ones with at least one TIGRFAM entry.

In [10]:
transdef = transdef.loc[transdef.fam.str.contains("TIGR")]
print("{} remaining transporters, {} TIGRFAMs".format(len(transdef.transporter.unique()), len(transdef.fam)))

406 remaining transporters, 458 TIGRFAMs


### TIGRFAM annotations

Here we load the TIGRFAM annotations for ORFs in the metagenomic co-assembly.

In [11]:
tigrfams = pd.read_csv("data/annotations/all.TIGRFAM.standardized.tsv.gz", usecols=[0,1],names=["gene_id","fam"],header=0,sep="\t")
tigrfams.head(10)

,gene_id,fam
0,k99_10000020_1,TIGR00214
1,k99_10000020_2,TIGR00510
2,k99_10000077_5,TIGR01473
3,k99_1000008_1,TIGR00200
4,k99_10000154_2,TIGR00049
5,k99_10000155_3,TIGR01904
6,k99_1000015_1,TIGR01941
7,k99_10000270_10,TIGR01063
8,k99_10000270_12,TIGR00181
9,k99_10000270_15,TIGR00696


### Merge with annotations

The annotation table is then merged with the transporter definitions.

In [12]:
gene_trans = pd.merge(tigrfams, transdef, left_on="fam", right_on="fam")
print(" {} open reading frames, {} transporters, {} TIGRFAMs".format(len(gene_trans.gene_id.unique()), len(gene_trans.transporter.unique()), len(gene_trans.fam.unique())))

 66029 open reading frames, 275 transporters, 314 TIGRFAMs
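
To make the effect of the merge concrete, here is a minimal sketch on toy data. The gene ids (`orf_1` etc.) and the `toy_*` names are hypothetical and only illustrate the join. `pd.merge` performs an inner join by default, so ORFs whose TIGRFAM is not part of any transporter cluster drop out of the result.

```python
import pandas as pd

# Hypothetical toy annotations: three ORFs, each with one TIGRFAM
toy_tigrfams = pd.DataFrame({
    "gene_id": ["orf_1", "orf_2", "orf_3"],
    "fam": ["TIGR00975", "TIGR01189", "TIGR00214"],
})
# Toy transporter definitions covering only two of the families
toy_transdef = pd.DataFrame({
    "transporter": ["T3", "T13"],
    "fam": ["TIGR00975", "TIGR01189"],
})

# Inner join on the shared "fam" column (the pd.merge default)
toy_gene_trans = pd.merge(toy_tigrfams, toy_transdef, on="fam")
print(toy_gene_trans)
```

In this toy case `orf_3` is absent from the merged table because its family has no entry in `toy_transdef`.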


In [13]:
gene_trans.sample(10)

,gene_id,fam,transporter
54543,k99_25301012_1,TIGR00801,T94
2836,k99_10471890_3,TIGR04183,T328
61628,k99_10909617_6,TIGR01065,T113
10934,k99_28461483_3,TIGR04183,T328
27017,k99_27806104_1,TIGR01129,T17
20308,k99_14338128_1,TIGR00813,T7
57630,k99_39081650_5,TIGR00975,T3
34148,k99_8186983_141,TIGR01189,T13
47709,k99_518338_2,TIGR01190,T13
22673,k99_30563785_2,TIGR04409,T122


In [14]:
gene_trans.set_index("gene_id", inplace=True)

### Merge with abundances

#### Metagenomes

The metagenomic time-series has some dubious samples that may have been mis-labeled.

In [15]:
dubious = ["120507","120521","120910","121123"]

Read abundance tables for metagenomic samples.

In [16]:
mg_cov = file2df("data/mg/all_genes.tpm.tsv.gz", drop=dubious+["gene_length"])
mg_raw = file2df("data/mg/all_genes.raw_counts.tsv.gz", drop=dubious+["gene_length"])

Read abundance tables with taxonomic info.

In [17]:
mg_taxcov = file2df("data/mg/all_genes.tpm.taxonomy.tsv.gz")
mg_taxraw = file2df("data/mg/all_genes.raw_counts.taxonomy.tsv.gz")

Merge with transporters table.

In [18]:
mg_transcov = pd.merge(gene_trans, mg_taxcov, left_index=True, right_index=True)
mg_transraw = pd.merge(gene_trans, mg_taxraw, left_index=True, right_index=True)

Store total raw counts per sample.

In [19]:
os.makedirs("results/mg", exist_ok=True)
mg_raw_tot = mg_raw.loc[mg_raw.index.str.match("^k.+")].sum()
mg_raw_tot = pd.DataFrame(mg_raw_tot,columns=["total_counts"])
mg_raw_tot.to_csv("results/mg/all_genes.total_counts.tsv", sep="\t")
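
As an illustration of the per-sample totals, the sketch below sums a toy count table after keeping only rows whose index starts with `k`. The row `other_row` is a hypothetical stand-in for any non-gene row; whether such rows exist in the real files is an assumption. `Series.to_frame` is used here in place of the `pd.DataFrame(...)` constructor above and should give the same single-column table.

```python
import pandas as pd

# Hypothetical raw counts: two gene rows plus one row that is not a gene
toy_raw = pd.DataFrame(
    {"120314": [10, 5, 99], "120322": [0, 7, 42]},
    index=["k99_1_1", "k99_2_1", "other_row"],
)

# Keep rows whose index matches the gene id pattern, then sum per sample
toy_tot = toy_raw.loc[toy_raw.index.str.match("^k.+")].sum()
toy_tot = toy_tot.to_frame("total_counts")
print(toy_tot)
```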

#### Metatranscriptomes

The metatranscriptomic time-series needs to have the sample_ids renamed to sample dates.

In [20]:
mt_sample_names = {"P1456_101":"120516", "P1456_102":"120613", "P1456_103":"120712", 
                   "P1456_104":"120813", "P1456_105":"120927", "P1456_106":"121024", 
                   "P1456_107":"121220", "P1456_108":"130123", "P1456_109":"130226", 
                   "P1456_110":"130403", "P1456_111":"130416", "P1456_112":"130422", 
                   "P3764_101":"130507", "P3764_102":"130605", "P3764_103":"130705", 
                   "P3764_104":"130815", "P3764_105":"130905", "P3764_106":"131003", 
                   "P3764_112":"140408", "P3764_113":"140506", "P3764_114":"140604", 
                   "P3764_115":"140709", "P3764_116":"140820", "P3764_117":"140916", 
                   "P3764_118":"141013"}

In [21]:
mt_cov = file2df("data/mt/all_genes.tpm.tsv.gz", drop="gene_length", rename=mt_sample_names)
mt_raw = file2df("data/mt/all_genes.raw_counts.tsv.gz", drop="gene_length", rename=mt_sample_names)
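
The following sketch mimics what `file2df` does with `drop` and `rename` on a small in-memory table. The table contents and the `toy_*` names are made up for illustration; only the two sample-id mappings are taken from the dictionary above.

```python
import io
import pandas as pd

# Hypothetical two-gene table in the same tab-separated layout
toy_tsv = (
    "gene_id\tgene_length\tP1456_101\tP1456_102\n"
    "k99_1_1\t300\t1.5\t0.0\n"
    "k99_2_1\t900\t2.0\t4.0\n"
)
toy_names = {"P1456_101": "120516", "P1456_102": "120613"}

toy = pd.read_csv(io.StringIO(toy_tsv), index_col=0, sep="\t", header=0)
# Drop the length column, then rename sample ids to sample dates
toy = toy.drop(columns="gene_length").rename(columns=toy_names)
print(list(toy.columns))
```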

Read the files with taxonomic annotations as well.

In [22]:
mt_taxcov = file2df("data/mt/all_genes.tpm.taxonomy.tsv.gz")
mt_taxraw = file2df("data/mt/all_genes.raw_counts.taxonomy.tsv.gz")

Merge with transporters table.

In [23]:
mt_transcov = pd.merge(gene_trans, mt_taxcov, left_index=True, right_index=True)
mt_transraw = pd.merge(gene_trans, mt_taxraw, left_index=True, right_index=True)

Store total raw counts per sample.

In [24]:
# Make sure the output directory exists before writing
os.makedirs("results/mt", exist_ok=True)
mt_raw_tot = mt_raw.loc[mt_raw.index.str.match("^k.+")].sum()
mt_raw_tot = pd.DataFrame(mt_raw_tot,columns=["total_counts"])
mt_raw_tot.to_csv("results/mt/all_genes.total_counts.tsv", sep="\t")

## Calculate total transporter abundance

Transporter abundances are calculated from the normalized TPM values. The DESeq2 package, however, requires raw counts, so for that purpose summed raw counts are calculated for one representative protein family per transporter cluster.

In [25]:
def get_representatives(df):
    '''Finds representative families for each transporter based on highest mean'''
    df_mean = df.groupby(["fam","transporter"]).sum().mean(axis=1).reset_index()
    df_mean.sort_values(0,ascending=False,inplace=True)
    df_mean.index = list(range(0,len(df_mean)))
    reps = {}
    for i in df_mean.index:
        fam = df_mean.loc[i,"fam"]
        t = df_mean.loc[i,"transporter"]
        if t in reps.keys():
            continue
        reps[t] = fam
    return reps
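
The selection rule can be illustrated on toy data: for each transporter, keep the family with the highest mean abundance across samples. This sketch uses `sort_values` followed by `drop_duplicates` rather than the explicit loop in `get_representatives`; it is an alternative formulation under the assumption that all sample columns are numeric. Family and sample names are hypothetical.

```python
import pandas as pd

# Hypothetical per-family TPM sums for two samples
toy_fam_sum = pd.DataFrame({
    "fam": ["TIGR_A", "TIGR_B", "TIGR_C"],
    "transporter": ["T1", "T1", "T2"],
    "s1": [10.0, 30.0, 5.0],
    "s2": [20.0, 50.0, 1.0],
})

# Mean abundance of each family across samples
fam_mean = (
    toy_fam_sum.set_index(["fam", "transporter"])
    .mean(axis=1)
    .rename("mean_tpm")
    .reset_index()
)
# Sort by mean (highest first) and keep the first family per transporter
toy_reps = (
    fam_mean.sort_values("mean_tpm", ascending=False)
    .drop_duplicates("transporter")
    .set_index("transporter")["fam"]
    .to_dict()
)
print(toy_reps)
```

Here `T1` has two candidate families and the one with the larger mean (`TIGR_B`) is chosen.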

Sum to protein family.

In [26]:
mg_fam_sum = mg_transcov.groupby(["fam","transporter"]).sum().reset_index()
# Get representative families for each transporter cluster (for use with DESeq2)
mg_reps = get_representatives(mg_fam_sum)
mg_reps = pd.DataFrame(data=mg_reps,index=["fam"]).T

In [27]:
mt_fam_sum = mt_transcov.groupby(["fam","transporter"]).sum().reset_index()
# Get representative families for each transporter cluster (for use with DESeq2)
mt_reps = get_representatives(mt_fam_sum)
mt_reps = pd.DataFrame(data=mt_reps,index=["fam"]).T

Group by transporter and calculate means.

In [28]:
mg_trans = mg_fam_sum.groupby("transporter").mean()
mg_trans_percent = mg_trans.div(mg_trans.sum())*100
mg_trans.to_csv("results/mg/all_trans.tpm.tsv", sep="\t")
mg_trans_percent.to_csv("results/mg/all_trans.tpm.percent.tsv", sep="\t")

In [29]:
mt_trans = mt_fam_sum.groupby("transporter").mean()
mt_trans_percent = mt_trans.div(mt_trans.sum())*100
mt_trans.to_csv("results/mt/all_trans.tpm.tsv", sep="\t")
mt_trans_percent.to_csv("results/mt/all_trans.tpm.percent.tsv", sep="\t")
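
A small worked example of the two steps above, on made-up numbers: family sums are averaged per transporter, and each sample column is then rescaled so that the transporters sum to 100%. The `fam` column is dropped explicitly here so that only numeric columns are averaged; all names are hypothetical.

```python
import pandas as pd

# Hypothetical per-family TPM sums for two samples
toy_fam_sum = pd.DataFrame({
    "fam": ["TIGR_A", "TIGR_B", "TIGR_C"],
    "transporter": ["T1", "T1", "T2"],
    "s1": [10.0, 30.0, 60.0],
    "s2": [20.0, 40.0, 10.0],
})

# Mean over the families that belong to each transporter
toy_trans = toy_fam_sum.drop(columns="fam").groupby("transporter").mean()
# Percent of total transporter abundance within each sample
toy_percent = toy_trans.div(toy_trans.sum()) * 100
print(toy_percent)
```

In sample `s1`, `T1` has a mean of 20 and `T2` a mean of 60, so they account for 25% and 75% of the total.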

Calculate transporter maximum (in % of total transporters) across all samples.

In [30]:
mg_trans_percent_max = mg_trans_percent.max(axis=1)
mt_trans_percent_max = mt_trans_percent.max(axis=1)

Output the number of transporters that pass the maximum-abundance filter.

In [31]:
print("{} transporters with max% >= 0.5 in the mg-samples".format(len(mg_trans_percent_max.loc[mg_trans_percent_max>=0.5])))

81 transporters with max% >= 0.5 in the mg-samples


In [32]:
print("{} transporters with max% >= 0.5 in the mt-samples".format(len(mt_trans_percent_max.loc[mt_trans_percent_max>=0.5])))

85 transporters with max% >= 0.5 in the mt-samples
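
The filter counted above can be sketched on a toy percent table: take the row-wise maximum over samples and keep transporters that reach 0.5% in at least one sample. The values and names are hypothetical.

```python
import pandas as pd

# Hypothetical percent-of-total table (each sample column sums to 100)
toy_percent = pd.DataFrame(
    {"s1": [0.2, 0.6, 99.2], "s2": [0.4, 0.1, 99.5]},
    index=["T1", "T2", "T3"],
)

# Maximum share of each transporter across all samples
toy_max = toy_percent.max(axis=1)
# Keep transporters reaching at least 0.5% in one or more samples
kept = toy_max.loc[toy_max >= 0.5].index.tolist()
print(kept)
```

`T1` never exceeds 0.4% and is dropped, while `T2` passes because of its 0.6% share in `s1`.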


Write raw counts for representative protein families.

In [33]:
mg_reps_raw = pd.merge(mg_reps,mg_transraw,left_on="fam",right_on="fam")
mg_reps_raw_sum = mg_reps_raw.groupby("transporter").sum()
mg_reps_raw_sum.to_csv("results/mg/rep_trans.raw_counts.tsv", sep="\t")

In [34]:
mt_reps_raw = pd.merge(mt_reps,mt_transraw,left_on="fam",right_on="fam")
mt_reps_raw_sum = mt_reps_raw.groupby("transporter").sum()
mt_reps_raw_sum.to_csv("results/mt/rep_trans.raw_counts.tsv", sep="\t")

### Calculate transporter abundances for bacteria

Metagenome

In [35]:
# Get genes classified as bacteria but not cyanobacteria
mg_transcov_bac = mg_transcov.loc[(mg_transcov.superkingdom=="Bacteria")&(mg_transcov.phylum!="Cyanobacteria")]
# Calculate sum of protein families 
mg_transcov_bac_fam = mg_transcov_bac.groupby(["fam","transporter"]).sum().reset_index()
# Calculate mean of transporters
mg_trans_bac = mg_transcov_bac_fam.groupby("transporter").mean()
mg_trans_bac.to_csv("results/mg/bac_trans.tpm.tsv", sep="\t")

Metatranscriptome

In [36]:
# Get genes classified as bacteria but not cyanobacteria
mt_transcov_bac = mt_transcov.loc[(mt_transcov.superkingdom=="Bacteria")&(mt_transcov.phylum!="Cyanobacteria")]
# Calculate sum of protein families 
mt_transcov_bac_fam = mt_transcov_bac.groupby(["fam","transporter"]).sum().reset_index()
# Calculate mean of transporters
mt_trans_bac = mt_transcov_bac_fam.groupby("transporter").mean()
mt_trans_bac.to_csv("results/mt/bac_trans.tpm.tsv", sep="\t")

## Selected transporters

A subset of 58 transporters was selected for this study, based on abundance in the dataset (>=0.5% of total transporter abundance in at least one sample) and on their putative substrates. They were classified manually using TIGRFAM roles and Gene Ontology mappings.

The curated table is available in the [GitHub repository](https://github.com/ChristoferLNU/transporters/blob/master/article/selected_transporters_classified.tab).

In [37]:
transinfo = pd.read_csv("https://raw.githubusercontent.com/ChristoferLNU/transporters/master/article/selected_transporters_classified.tab", index_col=0, sep="\t")
transinfo.head()

transporter,substrate_category,type,name,abbreviation
T1068,AA peptide + (NH4+),2a,cyclic peptide transporter,AA-PEP
T534,AA peptide + (NH4+),2a,lao: LAO/AO transport,AA-PEP
T52,AA peptide + (NH4+),2a,livcs: branched-chain amino acid transport,AA-PEP
T37,AA peptide + (NH4+),2a,potA: polyamine ABC transporter,AA-PEP
T42,AA peptide + (NH4+),3a,proV: glycine betaine/L-proline,AA-PEP


Limit the transporter definitions to the selected transporters.

In [38]:
transdef_select = transdef.loc[transdef.transporter.isin(transinfo.index)]
print("{} transporters remaining, comprising {} TIGRFAMS".format(len(transdef_select.transporter.unique()), len(transdef_select.fam.unique())))

65 transporters remaining, comprising 99 TIGRFAMS


Add substrate categories to the dataframes.

In [39]:
mg_trans_select = pd.merge(transinfo.loc[transdef_select.transporter.unique()],mg_trans,left_index=True,right_index=True)
mg_trans_select.head()

transporter,substrate_category,type,name,abbreviation,120314,120322,120328,120403,120416,120419,...,120920,120924,121001,121004,121015,121022,121028,121105,121128,121220
T3,Phosphate,3a,pst: phosphate ABC transporter,PI,111.665663,99.196607,47.641896,62.518603,82.239967,72.780571,...,140.473317,113.372772,159.11715,143.642418,152.468953,123.888681,94.326745,124.773051,124.276387,110.695318
T5,Carbohydrate,2a,dctM: TRAP transporter,CARB,268.636821,335.497604,102.243974,124.497542,175.458081,213.436623,...,233.211703,161.741922,240.296418,246.197774,177.397563,171.127173,103.973296,222.06955,165.948567,194.395479
T7,AA peptide + (NH4+),2a,sodium/proline symporter,AA-PEP,78.060674,91.449403,49.805555,64.459425,102.802106,111.323483,...,122.235794,86.714492,133.72689,124.082911,104.086574,98.715917,66.604495,133.018817,112.963159,133.283962
T12,Cations,2a,cation transport protein,CAT,15.96368,49.550619,18.22584,17.619883,31.228516,48.449218,...,33.415355,38.028089,39.377017,34.866484,33.009017,26.43726,17.679483,49.282565,34.301188,35.723674
T15,Other,3d,rnf: electron transport complex,OT,13.125817,23.449094,8.327092,11.660654,17.50477,23.729526,...,20.592712,13.716936,23.640316,22.444391,21.12874,16.323116,8.072915,21.017875,18.769937,22.16029


In [40]:
# Mean abundances of transporters for selected transporters
mg_trans_select = pd.merge(transinfo.loc[transdef_select.transporter.unique()],mg_trans,left_index=True,right_index=True)
mg_trans_select.to_csv("results/mg/select_trans.tpm.tsv", sep="\t")
# Mean abundances of transporters for bacteria and selected transporters
mg_trans_bac_select = pd.merge(transinfo.loc[transdef_select.transporter.unique()],mg_trans_bac,left_index=True,right_index=True)
mg_trans_bac_select.to_csv("results/mg/bac_select_trans.tpm.tsv", sep="\t")
# TPM values per gene for genes matching selected transporters
mg_transcov_select = pd.merge(transinfo.loc[transdef_select.transporter.unique()],mg_transcov,left_index=True,right_on="transporter")
mg_transcov_select.to_csv("results/mg/select_trans_genes.tpm.tsv", sep="\t")
# TPM values per gene for bacterial genes matching selected transporters
mg_transcov_bac_select = pd.merge(transinfo.loc[transdef_select.transporter.unique()],mg_transcov_bac,left_index=True,right_on="transporter")
mg_transcov_bac_select.to_csv("results/mg/bac_select_trans_genes.tpm.tsv", sep="\t")

Metatranscriptomes

In [41]:
# Mean abundances of transporters for selected transporters
mt_trans_select = pd.merge(transinfo.loc[transdef_select.transporter.unique()],mt_trans,left_index=True,right_index=True)
mt_trans_select.to_csv("results/mt/select_trans.tpm.tsv", sep="\t")
# Mean abundances of transporters for bacteria and selected transporters
mt_trans_bac_select = pd.merge(transinfo.loc[transdef_select.transporter.unique()],mt_trans_bac,left_index=True,right_index=True)
mt_trans_bac_select.to_csv("results/mt/bac_select_trans.tpm.tsv", sep="\t")
# TPM values per gene for genes matching selected transporters
mt_transcov_select = pd.merge(transinfo.loc[transdef_select.transporter.unique()],mt_transcov,left_index=True,right_on="transporter")
mt_transcov_select.to_csv("results/mt/select_trans_genes.tpm.tsv", sep="\t")
# TPM values per gene for bacterial genes matching selected transporters
mt_transcov_bac_select = pd.merge(transinfo.loc[transdef_select.transporter.unique()],mt_transcov_bac,left_index=True,right_on="transporter")
mt_transcov_bac_select.to_csv("results/mt/bac_select_trans_genes.tpm.tsv", sep="\t")

#### Transporter type and substrate summary

Generate count summary across transporter type and substrate category.

In [42]:
# Group by and count type and substrate category
type_counts = transinfo.groupby(["type","substrate_category"]).count().reset_index().iloc[:,[0,1,2]]
SUM = transinfo.groupby("type").count().iloc[:,0]
SUM

type
1a     5
1b     2
2a    37
2c     1
3a    24
3d     1
4a     1
4b     1
9a     1
Name: substrate_category, dtype: int64

In [43]:
# Group by and count type and substrate category
type_counts = transinfo.groupby(["type","substrate_category"]).count().reset_index().iloc[:,[0,1,2]]
# Calculate total type sum
SUM = transinfo.groupby("type").count().iloc[:,0]
# Calculate total substrate category sum
colsum = transinfo.groupby("substrate_category").count().iloc[:,0]
colsum.name = "SUM"
colsum = pd.DataFrame(colsum).T
colsum = colsum.assign(SUM=SUM.sum())
# Pivot count table
type_counts.columns = ["type","substrate_category","counts"]
type_counts = pd.pivot_table(type_counts, index=["type"], columns=["substrate_category"])
type_counts.fillna(0, inplace=True)
type_counts = type_counts["counts"]
# Add row sums
type_counts = type_counts.assign(SUM=SUM)
# Add col sums
type_counts = pd.concat([type_counts,colsum])
# Convert to integer
type_counts = type_counts.astype(int)
type_counts.to_csv("results/transporter_type_table.tsv", sep="\t")
type_counts

substrate_category,AA peptide + (NH4+),Anions,Carbohydrate,Cations,Metal,Nitrate,Nucleoside,Other,Phosphate,Phosphonate,Urea,SUM
1a,0,0,0,0,2,1,0,2,0,0,0,5
1b,0,0,0,1,0,0,0,1,0,0,0,2
2a,8,2,9,6,4,0,3,4,1,0,0,37
2c,0,0,0,1,0,0,0,0,0,0,0,1
3a,1,1,0,1,3,2,0,4,1,5,6,24
3d,0,0,0,0,0,0,0,1,0,0,0,1
4a,0,0,1,0,0,0,0,0,0,0,0,1
4b,0,0,0,0,0,0,1,0,0,0,0,1
9a,0,0,0,1,0,0,0,0,0,0,0,1
SUM,9,3,10,10,9,3,4,12,2,5,6,73
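
For comparison, a similar type-by-substrate count table with row and column totals can be produced with `pd.crosstab`. This is a sketch of an alternative, not the method used above; the toy table reuses three rows shown earlier and adds one hypothetical entry (`T99`).

```python
import pandas as pd

# Toy transporter info; T99 is a hypothetical entry added for illustration
toy_info = pd.DataFrame(
    {
        "type": ["2a", "2a", "3a", "1a"],
        "substrate_category": ["Carbohydrate", "Cations", "Phosphate", "Metal"],
    },
    index=["T5", "T12", "T3", "T99"],
)

# Count transporters per type and substrate; margins add the SUM row/column
toy_counts = pd.crosstab(
    toy_info["type"],
    toy_info["substrate_category"],
    margins=True,
    margins_name="SUM",
)
print(toy_counts)
```

Because `crosstab` fills missing combinations with zero counts, no separate `fillna` or integer conversion step is needed in this variant.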


Show fraction of the different transporter types.

In [44]:
type_sums = type_counts.sum(axis=1).drop("SUM")
round(type_sums.div(type_sums.sum()),2)

1a    0.07
1b    0.03
2a    0.51
2c    0.01
3a    0.33
3d    0.01
4a    0.01
4b    0.01
9a    0.01
dtype: float64