## Script to turn the count files into one file

The goal here is to sort the rows of each file by contig name (`target_id`) so that the contigs are in the same order in every file. Then I need to combine the files into one table that contains just the counts.

The `eff_counts` column is the one that I will extract and combine into one file. According to the online eXpress manual, this is the column recommended for count-based differential expression analyses. Since that is what we are interested in, we will use this column.
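As a minimal illustration of that column selection, the sketch below reads a tiny, hypothetical eXpress-style table from memory and keeps only `target_id` and `eff_counts`, sorted by contig. The contig names and values are made up, and the column set is cut down; real eXpress output files contain more columns.

```python
import io

import pandas as pd

# Hypothetical, cut-down eXpress-style output (real files have more columns).
toy_xprs = io.StringIO(
    "bundle_id\ttarget_id\test_counts\teff_counts\n"
    "1\tcontig_2\t10\t12.5\n"
    "2\tcontig_1\t4\t4.25\n"
)

data = pd.read_csv(toy_xprs, sep="\t")
# Sort by contig ID, then keep only the contig ID and the effective counts.
counts = data.sort_values("target_id")[["target_id", "eff_counts"]]
print(counts.to_string(index=False))
```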

I wrote this as a for loop, but it could easily be turned into a Python script for use with future eXpress output.
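A minimal sketch of that script form, assuming the same input layout as the loop in this notebook (tab-delimited eXpress output with `target_id` and `eff_counts` columns). The function names `read_eff_counts` and `combine_express_counts` are hypothetical. The sketch also makes a few choices of its own that the original loop does not: it replaces the first/else logic with `functools.reduce`, sorts the file list so the column order is reproducible, and sorts the final table by `target_id`.

```python
import glob
from functools import reduce

import pandas as pd


def read_eff_counts(path, column="eff_counts"):
    """Return target_id plus one count column, renamed to the file name."""
    df = pd.read_csv(path, sep="\t")
    return df[["target_id", column]].rename(columns={column: path})


def combine_express_counts(pattern, column="eff_counts"):
    """Outer-merge the chosen count column from every file matching `pattern`."""
    paths = sorted(glob.glob(pattern))  # sorted so the column order is reproducible
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    frames = [read_eff_counts(p, column) for p in paths]
    # Merge the per-file tables pairwise on the contig ID, keeping every contig.
    merged = reduce(
        lambda left, right: pd.merge(left, right, how="outer", on="target_id"),
        frames,
    )
    return merged.sort_values("target_id").reset_index(drop=True)
```

For example, `combine_express_counts('Mfav*.xprs').to_csv('combined_counts.tab', sep='\t', index=False)` would write one count column per input file; the output file name here is only a placeholder.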

```python
import glob

import pandas as pd
```

```python
# Match only the eXpress output files (*.xprs); the pattern 'Mfav*' alone would also
# pick up the combined Mfav_counts_all_*.tab files written below if the notebook is rerun.
allfiles = glob.glob('Mfav*.xprs')  # glob returns the files in arbitrary order

first = True  # True until the first file has been read
for x in allfiles:
    print(x)
    data = pd.read_csv(x, sep="\t")  # read one eXpress output file
    datasort = data.sort_values('target_id')  # sort the file by the target ID (contig)
    cols = datasort[['target_id', 'eff_counts']]  # keep only the contig ID and the counts used for differential expression
    cols = cols.rename(columns={'eff_counts': x})  # name the count column after the file so each sample is identifiable
    if first:
        colsrename = cols  # the first file starts the combined table
        first = False  # every later file goes through the merge below
    else:
        colsrename = pd.merge(colsrename, cols, how="outer", on="target_id")  # add this file's counts as a new column; colsrename is the output
```

```text
Mfav_DD_euk_1_counts.xprs
Mfav_HH_euk_33_counts.xprs
Mfav_DD_euk_31_counts.xprs
Mfav_HH_euk_3_counts.xprs
```
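The loop sorts each file by `target_id` before merging, but `pd.merge(..., on="target_id")` matches rows by contig ID itself, so the rows line up even without that sort. With `how="outer"`, a contig that is missing from one file is still kept and gets `NaN` in that file's column. If every sample was quantified against the same reference, the contig sets should be identical and this should not happen; the toy tables below are hypothetical and only show the behavior. Filling the gaps with 0 is an assumption of this sketch, not something the original notebook does.

```python
import pandas as pd

# Two hypothetical per-sample tables whose rows are in different orders and
# whose contig sets only partly overlap.
sample_a = pd.DataFrame({"target_id": ["c2", "c1"], "A": [2.0, 1.0]})
sample_b = pd.DataFrame({"target_id": ["c1", "c3"], "B": [3.0, 5.0]})

# merge() matches rows on target_id, so input row order does not matter.
merged = pd.merge(sample_a, sample_b, how="outer", on="target_id")
merged = merged.sort_values("target_id").reset_index(drop=True)

# A contig absent from a sample comes back as NaN; here it is assumed to mean 0 counts.
filled = merged.fillna(0)
print(filled)
```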


```python
# Write the combined table (colsrename) from the loop to tab-delimited files.
colsrename.to_csv('Mfav_counts_all_filttxm.tab', sep="\t", float_format='%.f')  # write the counts with no decimals, because edgeR expects integer counts
colsrename.to_csv('Mfav_counts_all_filttxm_decimals.tab', sep="\t")  # write the same table, keeping the decimal counts
```
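`float_format='%.f'` only changes how the counts are written to the file; the DataFrame in memory still holds decimals, and missing values are written as empty fields. A minimal alternative sketch, using a hypothetical two-sample table, rounds the counts and stores them as integers before writing, which makes the integer conversion explicit. A `StringIO` buffer stands in for a real output file.

```python
import io

import pandas as pd

# Hypothetical merged eff_counts table: one column per sample file.
counts = pd.DataFrame(
    {
        "target_id": ["c1", "c2"],
        "sampleA.xprs": [4.8, 12.3],
        "sampleB.xprs": [0.2, 7.0],
    }
)

sample_cols = counts.columns.drop("target_id")
# round() only touches numeric columns; astype() then stores them as integers.
int_counts = counts.round().astype({col: "int64" for col in sample_cols})

# Write tab-delimited output without the DataFrame index.
buf = io.StringIO()
int_counts.to_csv(buf, sep="\t", index=False)
print(buf.getvalue())
```

Note that `astype("int64")` raises an error if any count is `NaN`, so gaps left by the outer merge would need to be filled (for example with `fillna(0)`) first.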