# GCB5350: Debugging Code

In this module, you will practice looking at, identifying, and correcting code that is *buggy*.

Check out the code below. I have divided the code into cells (blocks) for you to dissect sequentially, but the ultimate goal is code that is fixed such that it "works" as the authors intended.

The data files referenced in the code are provided to you in the working directory. _**Before trying to fix the code**_, you should look at the data files and understand what they contain. GExp\_snippet.txt contains gene expression values for 3 genes across 6 samples. Loc\_snippet.csv contains genomic coordinates for these genes. The goal of this code \(when operating correctly\) is to merge these two tables and replace each expression value with its [standard score \(or z\-score\)](https://en.wikipedia.org/wiki/Standard_score) across samples.
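As a reminder of what a standard score is, the definition can be written out as below. The symbols are notation introduced here for illustration and do not appear in the code: $x_{gs}$ is the expression of gene $g$ in sample $s$, $\bar{x}_g$ is the mean of gene $g$ across samples, and $s_g$ is the sample standard deviation of gene $g$ across samples.

$$
z_{gs} = \frac{x_{gs} - \bar{x}_g}{s_g}
$$

Each gene is standardized using its own mean and standard deviation, so every row of the final table is scaled independently of the others.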

Here is the code:

```r
### BLOCK ONE - Read both files and merge them
GExpr <- read.table(file=Gexp_snippet.csv, header=T)
Loc <- read.table(file=Loc_snippet.txt, sep="\t")
z <- left_join(GExpr,Loc, by="geneid") %>%
  relocate(chr,pos,.after=geneid) # This line is correct - see https://dplyr.tidyverse.org/reference/relocate.html 

### BLOCK TWO - calculate the mean and standard deviation of each gene across samples
# c_across() lets you calculate summaries (like mean, sd, etc) across columns rather than across rows
# See https://dplyr.tidyverse.org/reference/c_across.html for more info and examples
z %>% 
  rowwise(geneID) %>% 
  mutate(ave = mean(c_across(starts_with("GTEX"))),na.rm=T) %>%
  mutate(sd = sd(c_across(starts_with("GTEX")))) %>%
  relocate(ave,sd.after=pos)

### BLOCK THREE - replace expression values with standard scores (make sure you understand how to calculate a z-score)
# Hint: make sure this for loop only loops across the expression values, not other values
for (i in 1:length(x[,1])) {
  for(j in 1:length(x[i,])) {
    this_ave = x[1,]$ave
    this_sd = x[i,]$sd
    x[i,j] = x[i,j] - this_ave / this_sd
  }
}

### BLOCK FOUR - calculate the mean and standard deviation of z-scores across samples
x %>% 
  rowwise(geneID) %>% 
  mutate(ave = mean(c_across(starts_with("GTEX")),na.rm=T) %>%
  mutate(sd_std = sd(c_across(starts_with("GTEX")))) %>%
  relocate(ave,sd.after=pos)
```

**Q1.** First, look over the code in its entirety. In your own words, what is the intent of the code here: what is the goal / objective that the author would like the code to achieve?

**Q2.** Start with BLOCK ONE. In human terms, describe each problem that you see:

1. 
2. 
...

**Q3.** REVISE the code for BLOCK ONE so that it corrects all of the problems that you described above.

**Provide and execute your code below.**

**Q4.** Turn to BLOCK TWO. In human terms, describe each problem that you see:

**Q5.** REVISE the code for BLOCK TWO so that it corrects all of the problems that you described above.

**Provide and execute your code below.**

**Q6.** Turn to BLOCK THREE. In human terms, describe each problem that you see:

**Q7.** REVISE the code for BLOCK THREE so that it corrects all of the problems that you described above.

**Provide and execute your code below.**

**Q8.** Turn to BLOCK FOUR. In human terms, describe each problem that you see:

**Q9.** REVISE the code for BLOCK FOUR so that it corrects all of the problems that you described above.

**Provide and execute your code below.**

**Q10.** OK, pull it all together now: Aggregate all of your 'repaired' code!

**Provide and execute your code below.**

In practice, you wouldn't have this -- but I'm sharing an "answer" with you so that you can cross-compare!

| geneid   | chr | pos     | ave_std   | sd_std | ave   | sd    | GTEX.A01 | GTEX.A02 | GTEX.A03 | GTEX.A04 | GTEX.A05 | GTEX.A06 |
|----------|-----|---------|-----------|--------|-------|-------|----------|----------|----------|----------|----------|----------|
| ENSG0001 | 11  | 1023832 | -6.66E-17 | 1      | -1.2  | 2.22  | 1.03     | -0.45    | 1.12     | NA       | -0.855   | -0.855   |
| ENSG0002 | 17  | 199299  | -1.11E-16 | 1      | -1.42 | 2.31  | -0.643   | -0.643   | -0.643   | -0.643   | 1.44     | 1.13     |
| ENSG0003 | 22  | 111238  | -3.89E-17 | 1      | 1.26  | 0.207 | -1.25    | -0.772   | NA       | 0.193    | 0.675    | 1.16     |
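One way to read the `ave_std` and `sd_std` columns: once a row has been converted to z-scores, its mean should be 0 (up to floating-point rounding, which is why values like -6.66E-17 show up) and its standard deviation should be 1. Here is a minimal sketch of that sanity check on a made-up vector. The numbers and the variable names `x` and `z` are hypothetical and are not taken from the data files.

```r
# Hypothetical expression values for a single gene; NA marks a missing sample
x <- c(2.0, 4.0, NA, 6.0)

# z-score: subtract the mean, then divide by the standard deviation.
# na.rm = TRUE tells mean() and sd() to skip the missing sample.
z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

print(z)                      # the NA stays NA; the other values are standardized
print(mean(z, na.rm = TRUE))  # should be 0, or within rounding error of 0
print(sd(z, na.rm = TRUE))    # should be 1
```

If your own repaired code produces z-scores whose row means are far from 0 or whose row standard deviations are not 1, that is a good sign a bug is still hiding somewhere.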