## ChemProp2
Authors: Abzer Kelminal (abzer.shah@uni-tuebingen.de) <br>
Edited by:  <br>
Input file format: .txt files <br>
Outputs: .csv files  <br>
Dependencies: library(ggplot2)

### About Input files:
- **Feature_file** is obtained by performing Feature-Based Molecular Networking (FBMN) on the data using MZmine.
- **Nw_edge file** is an output of GNPS. Its columns 'Feature_ID_1' and 'Feature_ID_2' list pairs of feature IDs that are similar (not identical).
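
To make the expected layout concrete, the sketch below builds toy versions of both tables. All values, the sample names (`S1` to `S4`) and the names of the two descriptor columns are invented for illustration; only `Feature_ID_1` and `Feature_ID_2` come from the description above.

```r
# Toy feature table: one row per feature ID, two descriptor columns
# (m/z and retention time), then one intensity column per sample.
# All values are invented.
Feature_toy <- data.frame(
  row.m.z            = c(201.1, 215.1, 233.2),
  row.retention.time = c(1.2, 1.4, 2.0),
  S1 = c(10, 90, 5),
  S2 = c(20, 70, 5),
  S3 = c(40, 40, 6),
  S4 = c(80, 10, 5),
  row.names = c("1", "2", "3")
)

# Toy edge table: each row links two similar (not identical) features.
Edge_toy <- data.frame(Feature_ID_1 = c(1, 2), Feature_ID_2 = c(2, 3))

# Every feature ID used in an edge should exist in the feature table.
all_ids <- c(Edge_toy$Feature_ID_1, Edge_toy$Feature_ID_2)
ids_ok <- all(as.character(all_ids) %in% rownames(Feature_toy))
print(ids_ok)
```

A check like `ids_ok` is a cheap way to catch mismatched files before any scores are computed.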

In [None]:
# Set the working directory to the folder that contains the input files
setwd('Downloads/ChemProp2_Test') # Example path
#install.packages('ggplot2') # Install the package if it is not present
library(ggplot2)

Feature_file <- read.table("feature_table_ChemProp2.txt", sep = "\t", header = TRUE, row.names = 1) # With 'row.names = 1', the 1st column 'ID' becomes the row names
Meta_File <- read.table("metadata_ChemProp2.txt", sep = "\t", header = TRUE, row.names = 1)
Nw_edge <- read.table("Network_Edges_ChemProp2.txt", sep = "\t", header = TRUE)

In [None]:
# head() returns the first rows (6 by default) of each table, which gives an idea of its content.
head(Feature_file)
head(Meta_File)
head(Nw_edge)

In [None]:
# If the metadata are stored column-wise (requires library(dplyr) for %>% and select):
#Meta_Data <- Meta_File %>% select(contains(readline('Enter the MetaData Name:')))
# If the metadata are stored row-wise (one row per metadata attribute):
Meta_Data <- Meta_File[(readline('Enter the MetaData Name:')),]
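
`readline()` only waits for input in an interactive session; in a non-interactive run it returns an empty string. A minimal sketch of the same row-wise selection, with the name held in a variable instead, could look like this. The toy metadata table and the attribute names `Time` and `Batch` are assumptions for illustration.

```r
# Toy row-wise metadata: one row per attribute, one column per sample.
# All values are invented.
Meta_toy <- data.frame(
  S1 = c(0, 1), S2 = c(1, 1), S3 = c(2, 2), S4 = c(3, 2),
  row.names = c("Time", "Batch")
)

meta_name <- "Time"                       # hypothetical attribute name
stopifnot(meta_name %in% rownames(Meta_toy))

# Same kind of selection as with readline(), without a prompt.
# The result is a data frame with a single row.
Meta_sel <- Meta_toy[meta_name, ]
print(Meta_sel)
```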

The code below adds a *Chemical Proportionality (ChemProp2) score* column to the Nw_edge table, along with two further columns: the absolute value and the sign of each ChemProp2 score.
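
As a small worked illustration of the score for a single pair of features, assuming a numeric metadata variable such as time (all numbers below are invented):

```r
# Metadata variable and two feature intensities across four samples.
time_pts  <- c(0, 1, 2, 3)
feature_1 <- c(80, 60, 30, 10)   # decreases over time
feature_2 <- c(5, 25, 60, 85)    # increases over time

# Pearson correlation of each feature with the metadata variable
r1 <- cor(time_pts, feature_1, method = "pearson")
r2 <- cor(time_pts, feature_2, method = "pearson")

# Score as computed in this notebook: (r of feature 2 - r of feature 1) / 2
score <- (r2 - r1) / 2
print(c(r1 = r1, r2 = r2, score = score))
```

Because each Pearson coefficient lies in [-1, 1], the score also lies in [-1, 1]. In this toy case the score is positive: feature 1 falls while feature 2 rises along the metadata variable.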

In [None]:
ChemProp2 <- c()

for (i in 1:NROW(Nw_edge)) {
  
  # Rows of Feature_file are named by feature ID. Pick the rows for the two features of edge i.
  x <- subset(Feature_file, rownames(Feature_file) == Nw_edge$Feature_ID_1[i])
  x <- rbind(x, subset(Feature_file, rownames(Feature_file) == Nw_edge$Feature_ID_2[i]))
  # x now holds Feature ID 1 and Feature ID 2, which are similar according to the Nw_edge file.
  x <- x[, c(-1:-2)] # Remove the first two columns (row m/z and row retention time)
  A <- colnames(x)
  B <- colnames(Meta_Data)
  stopifnot(all(B %in% A)) # Every column name of the metadata must also be a column name of x
  reorder_id <- match(B, A) # Position of each metadata column name (B) within the column names of x (A)
  reordered_x <- x[reorder_id] # Reorder the columns of x to follow the metadata
  reordered_x <- rbind(Meta_Data[1, ], reordered_x) # The column order now matches, so the rows can be combined
  
  reordered_x <- data.frame(t(reordered_x)) # Transpose: 3 columns = metadata variable (e.g. time), Feature ID 1, Feature ID 2
  
  corr_result <- cor(reordered_x, method = "pearson") # Pearson correlation matrix of the 3 columns
  ChemProp_score <- (corr_result[1, 3] - corr_result[1, 2]) / 2 # ChemProp2 score: (Pearson(metadata, Feature ID 2) - Pearson(metadata, Feature ID 1)) / 2
  
  ChemProp2 <- rbind(ChemProp2, ChemProp_score)
}
 
rownames(ChemProp2) <- NULL
colnames(ChemProp2) <- 'ChemProp2'

Nw_edge_new <- cbind(Nw_edge, ChemProp2)
Nw_edge_new <- Nw_edge_new[order(Nw_edge_new$ChemProp2, decreasing = TRUE), ] # Sort Nw_edge_new by decreasing ChemProp2 score
Abs_ChemProp2 <- abs(Nw_edge_new$ChemProp2)
Sign_ChemProp2 <- sign(Nw_edge_new$ChemProp2)

ChemProp2_file <- cbind(Nw_edge_new,Abs_ChemProp2,Sign_ChemProp2)
write.csv(ChemProp2_file, 'With_ChemProp2_score.csv')
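
The loop above grows `ChemProp2` with `rbind` on every iteration. One possible alternative, sketched here under stated assumptions, correlates every feature with the metadata variable once and then looks up both ends of each edge. `chemprop2_scores` is a hypothetical helper that is not part of this notebook. It assumes the two descriptor columns have already been removed and that the metadata are given as a named numeric vector. The toy values are invented.

```r
# features: numeric data frame, rows = feature IDs, columns = samples
# meta:     named numeric vector of the metadata variable, names = samples
# edges:    data frame with columns Feature_ID_1 and Feature_ID_2
chemprop2_scores <- function(features, meta, edges) {
  features <- features[, names(meta), drop = FALSE]   # align sample order
  # Pearson correlation of every feature with the metadata variable
  r <- apply(as.matrix(features), 1,
             function(v) cor(meta, v, method = "pearson"))
  r1 <- r[as.character(edges$Feature_ID_1)]
  r2 <- r[as.character(edges$Feature_ID_2)]
  unname((r2 - r1) / 2)
}

# Toy data
features <- data.frame(
  S1 = c(80, 5, 10), S2 = c(60, 25, 20), S3 = c(30, 60, 30), S4 = c(10, 85, 40),
  row.names = c("1", "2", "3")
)
meta  <- c(S1 = 0, S2 = 1, S3 = 2, S4 = 3)
edges <- data.frame(Feature_ID_1 = c(1, 2), Feature_ID_2 = c(2, 3))

scores <- chemprop2_scores(features, meta, edges)
print(scores)
```

On real data the result could be attached with `cbind(Nw_edge, ChemProp2 = scores)`, provided every feature ID in the edge table is present in the feature table.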

To compare the distribution of the ChemProp2 scores with that of normally distributed random numbers:

In [None]:
histplot1 <- data.frame(Length = ChemProp2_file$ChemProp2)
histplot2 <- data.frame(Length = rnorm(nrow(histplot1)))

histplot1$Type <- 'Sample-Data'
histplot2$Type <- 'Random Values'
HistPlot <- rbind(histplot1,histplot2)

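# after_stat(density) replaces the ..density.. notation, which is deprecated since ggplot2 3.4.0
ggplot(HistPlot, aes(Length, fill = Type)) +
  geom_histogram(alpha = 0.5, aes(y = after_stat(density)), position = readline('Enter dodge or identity:'))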