
In [139]:
# No extra packages are needed: read.csv() ships with base R (utils)

# Read the data from the CSV file
data <- read.csv("/Diff_genes_heatmap_NSIP-IPF.csv")

# Column names for log2 fold change and adjusted p-value.
# read.csv() replaces spaces in headers with dots, so use the dotted names.
log2_fold_change_col <- "log2.Fold.Change"  # Replace with your column name
adj_pval_col <- "Adj.Pval"                  # Replace with your column name

# Define the p-value cutoff
p_cutoff <- 0.05

# Filter for genes with a significant adjusted p-value.
# which() skips rows whose p-value is NA instead of returning all-NA rows.
filtered_data <- data[which(data[, adj_pval_col] < p_cutoff), ]
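
By default `read.csv()` uses `check.names = TRUE`, which rewrites headers that are not valid R names, and a plain logical subset turns `NA` comparisons into all-`NA` rows. A minimal sketch of both behaviours on an inline toy table; the gene IDs and values are invented for illustration and are not taken from the real file:

```r
# Toy CSV text standing in for the real file; all values are illustrative only.
csv_text <- "Ensembl.ID,log2 Fold Change,Adj.Pval
geneA,2.5,0.01
geneB,-1.2,0.20
geneC,3.1,NA"

toy <- read.csv(text = csv_text)

# The header with spaces is rewritten with dots by check.names = TRUE
print(names(toy))

adj_pval_col <- "Adj.Pval"
p_cutoff <- 0.05

# which() keeps only rows where the comparison is TRUE, dropping NA results
toy_filtered <- toy[which(toy[[adj_pval_col]] < p_cutoff), ]
print(toy_filtered)
```

Only `geneA` survives here: `geneB` fails the cutoff and `geneC` has no adjusted p-value to compare.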



In [140]:

# Print the filtered data to the console
print(filtered_data)

# Save the filtered data to a new CSV file
write.csv(filtered_data, "/content/filtered_data.csv", row.names = FALSE)


     Regulation      Ensembl.ID log2.Fold.Change Adj.Pval
1            Up ENSG00000229807        10.521848 2.97e-06
2            Up ENSG00000134184         9.417209 3.43e-02
3            Up ENSG00000270641         8.334585 2.98e-03
4            Up ENSG00000132972         7.711860 2.54e-02
5            Up      AL645929.2         7.162614 3.84e-02
6            Up ENSG00000115361         7.131530 1.90e-02
7            Up ENSG00000178662         7.092240 2.87e-02
8            Up ENSG00000251611         7.067953 2.14e-02
9            Up ENSG00000275624         6.938350 3.32e-02
10           Up ENSG00000139767         6.866760 4.22e-02
11           Up ENSG00000178031         6.772988 4.77e-02
12           Up ENSG00000232192         6.771753 4.21e-02
16           Up ENSG00000124440         6.599903 3.84e-02
18           Up      AP001993.1         6.582991 4.22e-02
20           Up ENSG00000111218         6.556045 2.37e-02
21           Up      AC106897.1         6.400596 4.97e-02
25           U

***Data Analysis***

In [141]:
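# Identify the most significant gene: the row with the smallest adjusted p-value
# (the table is ordered by fold change, so row position alone is not a guarantee)
most_significant_gene_row <- filtered_data[which.min(filtered_data$Adj.Pval), ]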
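
Row position alone does not guarantee significance ordering: the table printed above appears to be sorted by fold change rather than by adjusted p-value. A minimal sketch of ranking explicitly by `Adj.Pval` follows. The toy data frame and the `top_n` parameter are hypothetical and only illustrate the idea:

```r
# Toy table with made-up gene symbols; not the real results.
toy <- data.frame(
  Symbol = c("G1", "G2", "G3", "G4"),
  log2.Fold.Change = c(9.4, 10.5, -3.2, 7.7),
  Adj.Pval = c(3.43e-02, 2.97e-06, 1.0e-03, 2.54e-02),
  stringsAsFactors = FALSE
)

top_n <- 2  # hypothetical: how many top-ranked genes to keep

# order() sorts ascending, so the smallest adjusted p-values come first
ranked <- toy[order(toy$Adj.Pval), ]
top_hits <- head(ranked, top_n)
print(top_hits)
```

Sorting first and then taking `head()` generalises the single-gene lookup to a shortlist of any size.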


In [142]:
print(nrow(filtered_data))

[1] 450
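
The total of 450 does not show how the significant genes split by direction. A small sketch using `table()` on a `Regulation` column; the toy values are invented, and the label `"Down"` is an assumption, since only `"Up"` is visible in the printed output:

```r
# Toy Regulation column; the "Down" label is assumed, not confirmed by the output.
toy <- data.frame(
  Regulation = c("Up", "Up", "Down", "Up", "Down"),
  stringsAsFactors = FALSE
)

# table() counts how many rows carry each label
counts <- table(toy$Regulation)
print(counts)
```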


In [143]:
print(summary(filtered_data))

  Regulation         Ensembl.ID        log2.Fold.Change      Adj.Pval        
 Length:450         Length:450         Min.   :-5.26235   Min.   :2.970e-06  
 Class :character   Class :character   1st Qu.:-2.73285   1st Qu.:3.190e-02  
 Mode  :character   Mode  :character   Median :-1.81378   Median :4.210e-02  
                                       Mean   : 0.02149   Mean   :3.736e-02  
                                       3rd Qu.: 3.77133   3rd Qu.:4.560e-02  
                                       Max.   :10.52185   Max.   :4.990e-02  
    Symbol              Chr                Type              NSIP_73      
 Length:450         Length:450         Length:450         Min.   : 2.000  
 Class :character   Class :character   Class :character   1st Qu.: 3.844  
 Mode  :character   Mode  :character   Mode  :character   Median : 5.030  
                                                          Mean   : 5.826  
                                                          3rd Qu.: 7.514  
    

In [144]:
# Access values using the $ operator (column names as produced by read.csv)
log2_fold_change <- most_significant_gene_row$log2.Fold.Change
adjusted_pval <- most_significant_gene_row$Adj.Pval
gene_symbol <- most_significant_gene_row$Symbol

# Warn about missing values; || is the scalar OR intended for if() conditions
if (is.na(log2_fold_change) || is.na(adjusted_pval)) {
  cat("Warning: Missing values encountered in the most significant gene.\n")
}

# Report the gene either way (missing values are printed as NA)
cat("Potential Biomarker:", gene_symbol, "\n")
cat("Log2 Fold Change:", log2_fold_change, "\n")
cat("Adjusted P-value:", adjusted_pval, "\n")

Potential Biomarker: XIST 
Log2 Fold Change: 10.52185 
Adjusted P-value: 2.97e-06 
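
The setup cell names a fold-change column but the filter uses only the adjusted p-value. A hedged sketch of shortlisting on both criteria at once follows. The toy data and the `lfc_cutoff` threshold are assumptions made for illustration, not values from the notebook:

```r
# Toy table with made-up gene symbols; not the real results.
toy <- data.frame(
  Symbol = c("G1", "G2", "G3", "G4"),
  log2.Fold.Change = c(10.5, 0.4, -2.7, 3.8),
  Adj.Pval = c(2.97e-06, 0.01, 0.03, 0.20),
  stringsAsFactors = FALSE
)

p_cutoff <- 0.05
lfc_cutoff <- 1  # hypothetical minimum absolute log2 fold change

# Keep genes that are both significant and change by at least lfc_cutoff
keep <- which(toy$Adj.Pval < p_cutoff & abs(toy$log2.Fold.Change) >= lfc_cutoff)
candidates <- toy[keep, ]

# Rank the shortlist by adjusted p-value, most significant first
candidates <- candidates[order(candidates$Adj.Pval), ]
print(candidates$Symbol)
```

Using `abs()` keeps strongly down-regulated genes such as `G3` in the shortlist alongside up-regulated ones.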
