## Filtering Significantly Differentially Expressed Genes (adjusted p-value <= 0.05) from Differential Gene Expression (DEG) Data
Differential gene expression was analysed with DESeq2 in R. The genes were then filtered by adjusted p-value: a gene was considered significantly differentially expressed if its adjusted p-value was less than or equal to 0.05 (padj <= 0.05).

#### Importing Libraries and the File of All Differentially Expressed Genes

In [60]:
import pandas as pd
import numpy as np
## Import the DESeq2 results file
data_path = "/home/pinky/Desktop/tif3_temperature/wild-15◦C_vs_tif3_delta_mutant-15◦C.csv"
data_frame = pd.read_csv(data_path)  # read the CSV file with pandas
data_frame.head()

Unnamed: 0.1,Unnamed: 0,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj
0,YPL071C_mRNA,114.952821,0.135941,0.307636,0.44189,0.658569,0.829997
1,YLL050C_mRNA,1869.780746,-0.057613,0.135714,-0.424521,0.671186,0.838378
2,YMR172W_mRNA,404.28574,0.019447,0.193974,0.100257,0.92014,0.966445
3,YOR185C_mRNA,229.496258,0.403602,0.235065,1.716976,0.085984,0.266855
4,YLL032C_mRNA,282.417825,0.612284,0.228851,2.675469,0.007462,0.050417


Removing the rows with NaN values in the padj column

In [61]:
data_frame.dropna(subset = ["padj"], inplace=True)
data_frame


Unnamed: 0.1,Unnamed: 0,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj
0,YPL071C_mRNA,114.952821,0.135941,0.307636,0.441890,0.658569,0.829997
1,YLL050C_mRNA,1869.780746,-0.057613,0.135714,-0.424521,0.671186,0.838378
2,YMR172W_mRNA,404.285740,0.019447,0.193974,0.100257,0.920140,0.966445
3,YOR185C_mRNA,229.496258,0.403602,0.235065,1.716976,0.085984,0.266855
4,YLL032C_mRNA,282.417825,0.612284,0.228851,2.675469,0.007462,0.050417
...,...,...,...,...,...,...,...
6605,YFL056C,344.546702,-0.443656,0.188790,-2.349998,0.018774,0.097005
6607,YIL175W,25.336502,-0.160265,0.629084,-0.254759,0.798909,0.903411
6608,YLL016W,298.235983,-0.085343,0.215970,-0.395158,0.692726,0.847250
6609,YLL017W,17.282972,-0.821180,0.728146,-1.127767,0.259418,0.507745


In [62]:
data_frame.to_csv("nan_remove_wild-15◦C_vs_tif3_delta_mutant-15◦C.csv")  # save the table, with NaN padj rows removed, as a CSV file

Reading the newly saved CSV file

In [63]:
data_frame_new=pd.read_csv('nan_remove_wild-15◦C_vs_tif3_delta_mutant-15◦C.csv',index_col=0)
data_frame_new


Unnamed: 0,Unnamed: 0.1,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj
0,YPL071C_mRNA,114.952821,0.135941,0.307636,0.441890,0.658569,0.829997
1,YLL050C_mRNA,1869.780746,-0.057613,0.135714,-0.424521,0.671186,0.838378
2,YMR172W_mRNA,404.285740,0.019447,0.193974,0.100257,0.920140,0.966445
3,YOR185C_mRNA,229.496258,0.403602,0.235065,1.716976,0.085984,0.266855
4,YLL032C_mRNA,282.417825,0.612284,0.228851,2.675469,0.007462,0.050417
...,...,...,...,...,...,...,...
6605,YFL056C,344.546702,-0.443656,0.188790,-2.349998,0.018774,0.097005
6607,YIL175W,25.336502,-0.160265,0.629084,-0.254759,0.798909,0.903411
6608,YLL016W,298.235983,-0.085343,0.215970,-0.395158,0.692726,0.847250
6609,YLL017W,17.282972,-0.821180,0.728146,-1.127767,0.259418,0.507745
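
The `Unnamed: 0` and `Unnamed: 0.1` columns in the outputs above arise because `to_csv` writes the row index as an extra, unnamed first column by default, and a later `read_csv` without `index_col` reads it back as a column named `Unnamed: 0`. The sketch below shows one way to avoid this. It assumes a small made-up table in place of the real DESeq2 results (gene IDs and values are illustrative only) and uses in-memory buffers instead of files on disk.

```python
import io
import pandas as pd

# Toy stand-in for the DESeq2 results table (values are illustrative only)
toy = pd.DataFrame({
    "gene": ["geneA", "geneB", "geneC"],
    "log2FoldChange": [0.5, -1.2, 0.1],
    "padj": [0.01, 0.2, 0.04],
})

# Default behaviour: the row index is written as an unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reread_with_index = pd.read_csv(with_index)

# index=False leaves the row index out of the file
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reread_without_index = pd.read_csv(without_index)

print(list(reread_with_index.columns))
print(list(reread_without_index.columns))
```

Whether to keep the index depends on whether the original row numbers are needed later; in this notebook they end up as the `Serial Number` column.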


##### Filtering the genes whose adjusted p-values are less than or equal to 0.05

In [64]:
data_frame_filtered = data_frame_new[data_frame_new['padj'] <= 0.05]
data_frame_filtered

#print(data_frame_filtered.shape)
# Save the significant genes (padj <= 0.05) as a CSV file
data_frame_filtered.to_csv("significant_nan_remove_wild-15◦C_vs_tif3_delta_mutant-15◦C.csv")
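
The NaN removal and the significance filter can also be applied in memory, without the intermediate CSV files. The sketch below is one possible way to do this. `filter_significant` and its `alpha` parameter are hypothetical names that are not part of the original notebook, and the table is made up. Comparisons with NaN evaluate to False in pandas, so `padj <= 0.05` on its own already excludes rows with a missing `padj`; the explicit `dropna` only makes the intent visible.

```python
import numpy as np
import pandas as pd

def filter_significant(df, alpha=0.05):
    """Return rows whose adjusted p-value is present and <= alpha.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    without_nan = df.dropna(subset=["padj"])          # drop rows with missing padj
    return without_nan[without_nan["padj"] <= alpha]  # keep significant rows

# Toy table; gene IDs and values are illustrative only
toy = pd.DataFrame({
    "gene": ["geneA", "geneB", "geneC", "geneD"],
    "log2FoldChange": [0.7, -0.1, -1.4, 0.3],
    "padj": [0.001, 0.83, 0.05, np.nan],
})

significant = filter_significant(toy)
print(significant)
```

Because `dropna` is called without `inplace=True`, the input table is left unchanged and the function returns a new DataFrame.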

In [65]:
c_d=pd.read_csv("significant_nan_remove_wild-15◦C_vs_tif3_delta_mutant-15◦C.csv")
## Rename the columns (the first column holds the row index carried over from the unfiltered table)
c_d.columns=['Serial Number','Gene Name','BaseMean','log2FoldChange','lfcSE','stat','pvalue','padj']
c_d

Unnamed: 0,Serial Number,Gene Name,BaseMean,log2FoldChange,lfcSE,stat,pvalue,padj
0,10,YKL103C_mRNA,485.404117,0.681498,0.170591,3.994917,6.471704e-05,1.270169e-03
1,22,YCR031C_mRNA,5906.109763,-0.740065,0.157131,-4.709862,2.478846e-06,8.325503e-05
2,37,YLR437C_mRNA,176.909947,-0.793857,0.237481,-3.342820,8.293153e-04,1.005502e-02
3,47,YHR094C_mRNA,5754.374734,-0.453276,0.136312,-3.325285,8.832828e-04,1.055604e-02
4,59,YOL119C_mRNA,1028.019551,-1.440296,0.156878,-9.180977,4.271993e-20,3.022937e-17
...,...,...,...,...,...,...,...,...
858,6573,YER055C_mRNA,2831.583288,-0.391483,0.122855,-3.186551,1.439799e-03,1.529852e-02
859,6574,YKL008C_mRNA,1232.262305,-0.547066,0.150291,-3.640051,2.725840e-04,4.002464e-03
860,6577,YMR121C_mRNA,452.143884,-0.636233,0.176943,-3.595691,3.235321e-04,4.578018e-03
861,6578,YER091C_mRNA,2606.995541,1.527204,0.160798,9.497669,2.146408e-21,2.072551e-18


In [66]:
# Save the significantly differentially expressed genes, with the renamed columns, as a CSV file

c_d.to_csv("Tif3_significant_nan_remove_wild-15◦C_vs_tif3_delta_mutant-15◦C.csv")

Displaying the data as a plain-text table

In [53]:
from tabulate import tabulate
print(tabulate(c_d))

---  ----  --------------  -----------  ---------  --------  ---------  ------------  ------------
  0    10  YKL103C_mRNA       485.404    0.681498  0.170591    3.99492  6.4717e-05    0.00127017
  1    22  YCR031C_mRNA      5906.11    -0.740065  0.157131   -4.70986  2.47885e-06   8.3255e-05
  2    37  YLR437C_mRNA       176.91    -0.793857  0.237481   -3.34282  0.000829315   0.010055
  3    47  YHR094C_mRNA      5754.37    -0.453276  0.136312   -3.32528  0.000883283   0.010556
  4    59  YOL119C_mRNA      1028.02    -1.4403    0.156878   -9.18098  4.27199e-20   3.02294e-17
  5    89  YER056C-A_mRNA    2311.86    -0.932447  0.149259   -6.24719  4.1791e-10    4.44049e-08
  6    96  YPL197C_mRNA      3143.78    -0.554685  0.128753   -4.30812  1.64648e-05   0.000411199
  7    97  YBR031W_mRNA     18224.8     -1.04245   0.15853    -6.57577  4.84025e-11   6.57824e-09
  8   103  YLR075W_mRNA     22980.9     -0.811909  0.123597   -6.569    5.06554e-11   6.72796e-09
  9   104  YML088W_mRNA    

##### Counting the upregulated and downregulated genes
** A differentially expressed gene with a positive log2FoldChange value is considered upregulated.

** A differentially expressed gene with a negative log2FoldChange value is considered downregulated.

In [54]:
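import csv

counter = 0  # number of genes with a negative log2FoldChange
length = 0   # number of data rows (header excluded)
with open("significant_nan_remove_wild-15◦C_vs_tif3_delta_mutant-15◦C.csv", "r") as handle:
    reader = csv.reader(handle, delimiter=',')
    next(reader)  # skip the header row
    for line in reader:
        length += 1
        # Column 3 is log2FoldChange; a leading '-' marks a negative value
        if line[3].startswith('-'):
            counter += 1

print(counter,'Total negative value')
print(length-counter,'Total positive value')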

487 Total negative value
376 Total positive value
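
The same counts can be obtained directly from the `log2FoldChange` column with pandas, comparing the numbers instead of checking for a leading minus sign. This is a minimal sketch on made-up data (the toy values are not taken from the real results). With the real table, the `c_d` DataFrame from the earlier cell could presumably be used in place of `toy`.

```python
import pandas as pd

# Toy table; gene IDs and values are illustrative only
toy = pd.DataFrame({
    "Gene Name": ["geneA", "geneB", "geneC", "geneD", "geneE"],
    "log2FoldChange": [0.68, -0.74, -0.79, 1.53, -1.44],
})

# Boolean masks sum to the number of True entries
n_up = int((toy["log2FoldChange"] > 0).sum())    # positive fold change
n_down = int((toy["log2FoldChange"] < 0).sum())  # negative fold change

print(n_down, "downregulated genes")
print(n_up, "upregulated genes")
```

With strict comparisons, a gene whose log2FoldChange is exactly zero is counted in neither group, whereas the string-based loop above would count it as positive.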
