## 1. Types of errors
### 1.1 Syntax error

In [2]:
%%bash
ecchoo "Hello world!!" # Typo

In [None]:
%%bash
echo "apple" > 123.txt
awk '{if ($1 ~ /a/ print $0}' 123.txt # Missing )

In [5]:
%%bash
for i in $(seq 1 10)
echo $i
done # Missing do
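
A syntax error is reported before any line of the program runs, so it can be checked without executing anything. The sketch below shows this in Python rather than bash, using the built-in `compile`; the helper name `has_valid_syntax` is hypothetical. (For shell scripts, `bash -n script.sh` performs a similar parse-only check.)

```python
def has_valid_syntax(source):
    """Return True if `source` parses as Python; nothing is executed."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False

good = "for i in range(3):\n    print(i)\n"
bad = "for i in range(3)\n    print(i)\n"  # missing colon after range(3)

print(has_valid_syntax(good))
print(has_valid_syntax(bad))
```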

### 1.2 Runtime error

In [9]:
%%bash
cd /Users/yolandatiao/Documents/0_Bioinformatics2017/2018_Bioinformatics/Applied-Bioinformatics-HW-Yolanda/ChIP-seq.QA.3
cat fruits.txt
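
A runtime error only appears when the offending line actually executes, here because the file is not there. A minimal Python sketch of the same situation, assuming a hypothetical file name that does not exist in the working directory:

```python
from pathlib import Path

def read_text_or_none(path):
    """Return the file contents, or None if the file does not exist."""
    try:
        return Path(path).read_text()
    except FileNotFoundError as err:
        # The code is syntactically valid; it fails only at run time.
        print(f"Runtime error caught: {err}")
        return None

# Hypothetical file name, assumed to be absent
content = read_text_or_none("fruits_that_do_not_exist.txt")
```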

### 1.3 Logic errors

In [20]:
%%bash
# Try to count the odd numbers in a list
odd=0
for i in 1 1 2 2 3 3
do
  if (($(($i%2))==0 )) # Found even numbers instead of odd numbers
    then
    odd=$(expr $odd + 1)
    fi
done
echo $odd

2
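
A logic error produces no error message, so the only way to notice it is to compare the result with an answer worked out by hand. One possible Python version of the same count, with the condition corrected and a check added (the function name `count_odd` is an assumption for illustration):

```python
def count_odd(numbers):
    """Count the odd values in `numbers`."""
    odd = 0
    for n in numbers:
        if n % 2 == 1:  # the buggy version tested == 0, which counts even numbers
            odd += 1
    return odd

# Four of the six values (1, 1, 3, 3) are odd.
assert count_odd([1, 1, 2, 2, 3, 3]) == 4
print(count_odd([1, 1, 2, 2, 3, 3]))
```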


## 2. Common mistakes

### 2.1 Typo (like 1.1)
### 2.2 Directory error

In [28]:
%%bash
cd /Users/yolandatiao/Documents/0_Bioinformatics2017/2018_Bioinformatics/Applied-Bioinformatics-HW-Yolanda/ChIP-seq.QA.3
ls -R

dir1
dir2
dir3

./dir1:
123.txt

./dir2:
234.txt

./dir3:
345.txt


In [27]:
%%bash
cd /Users/yolandatiao/Documents/0_Bioinformatics2017/2018_Bioinformatics/Applied-Bioinformatics-HW-Yolanda/ChIP-seq.QA.3
cd dir1
cat 345.txt # 345.txt is not in dir1
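
Checking where a file really is before using it avoids this kind of error. The sketch below rebuilds the directory layout listed above inside a temporary directory (an assumption made so the example is self-contained) and searches for the file instead of guessing:

```python
import tempfile
from pathlib import Path

# Recreate the layout shown by `ls -R` in a temporary directory.
root = Path(tempfile.mkdtemp())
for folder, name in [("dir1", "123.txt"), ("dir2", "234.txt"), ("dir3", "345.txt")]:
    (root / folder).mkdir()
    (root / folder / name).write_text("example\n")

# Check before opening: the file is not in dir1.
wanted = root / "dir1" / "345.txt"
print(wanted.exists())

# Search every subdirectory for the file.
matches = sorted(p.relative_to(root).as_posix() for p in root.rglob("345.txt"))
print(matches)
```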

### 2.3 Missing or extra { [ ( ; , " (like 1.1)
### 2.4 Structure error

In [30]:
i = 1
while i < 6:
  print(i)  # i is never incremented, so this loop never ends
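
For comparison, a version of the same loop that terminates. The list `seen` is added here only to record what the loop visits:

```python
i = 1
seen = []
while i < 6:
    seen.append(i)
    i += 1  # updating i lets the condition `i < 6` eventually become false
print(seen)
```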

### 2.5 Off-by-one error
##### In bash, array indices start at 0

In [37]:
%%bash
fruits_arr=("apple" "banana" "cranberry" "durian")
for i in $(seq 1 ${#fruits_arr[@]})
do
  echo $i
  echo ${fruits_arr[i]}
done

echo ${fruits_arr[0]}

1
banana
2
cranberry
3
durian
4

apple


##### In Python, list indices start at 0

In [12]:
fruits_list = ["apple","banana","cranberry","durian"]
for i in range(1, len(fruits_list)):
    print(fruits_list[i])

print(fruits_list[0])

banana
cranberry
durian
apple
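
One way to sidestep the off-by-one error above is to avoid hand-written index arithmetic. A short sketch using the same list:

```python
fruits_list = ["apple", "banana", "cranberry", "durian"]

# Iterate over the elements directly: there is no index to get wrong.
for fruit in fruits_list:
    print(fruit)

# When the position is needed, enumerate yields 0-based indices.
indexed = list(enumerate(fruits_list))
print(indexed[0])

# range(len(...)) starts at 0 and stops before len(...).
indices = list(range(len(fruits_list)))
print(indices)
```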


##### In R, vector indices start at 1

In [9]:
%load_ext rpy2.ipython

The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython


In [15]:
%%R
fruits.vec = c("apple","banana","cranberry","durian")
for (i in c(1:length(fruits.vec))){
    print(fruits.vec[i])
}

[1] "apple"
[1] "banana"
[1] "cranberry"
[1] "durian"


### 2.6 Names contain special characters

In [48]:
%%bash
ls
cd my fruits # The space splits the directory name into two arguments

dir1
dir2
dir3
gene_names.csv
my fruits
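
A name containing a space has to reach the command as a single argument. The sketch below, assuming a temporary directory created only for illustration, shows that Python handles such a name as one string and that `shlex.quote` produces a form that is safe to paste into a shell command:

```python
import shlex
import tempfile
from pathlib import Path

# Create a directory whose name contains a space.
parent = Path(tempfile.mkdtemp())
spaced = parent / "my fruits"
spaced.mkdir()

# Python passes the name as one string, so the space causes no trouble.
print(spaced.is_dir())

# For a shell command, the name must be quoted to stay a single argument.
command = "cd " + shlex.quote(spaced.name)
print(command)
```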




## 3. How to avoid bugs
### 3.1 Practice, practice, practice

### 3.2 Use a good text editor that works for you (I use Sublime Text / Spyder / RStudio)
* Syntax highlighting
* Autocompletion

### 3.3 Good coding style

#### 3.3.1 Good readability is important to yourself and others
##### Do you want to read this:
![Bad example](https://cavanzyl.files.wordpress.com/2016/05/faa03c93e060f132991f5660e5a2978c.png?w=444)
##### Or this:
![Good example](http://cppbetterexplained.com/wp-content/uploads/2015/01/good-commenting-example.jpg)

#### 3.3.2 Comment code well
#### 3.3.3 Update the comments when you update the code
##### This is exactly true:
![Forgot to comment](https://i.redd.it/54ss55ix0vwy.jpg)

#### 3.3.4 Consistent naming format
##### Bad

In [25]:
%%R
fs = c("apple","banana","cranberry","durian")
pc = c(10, 20, 30, 40)
my_DF.1 = data.frame(names=fs, prices=pc)
head(my_DF.1)
# Who will know what my_DF.1 is after 100 lines of code?
# Will you remember that you used uppercase DF and lowercase my?
# Will you remember that you used both _ and . in the name?

      names prices
1     apple     10
2    banana     20
3 cranberry     30
4    durian     40


##### Good

In [27]:
%%R
fruit.names.vec = c("apple","banana","cranberry","durian")
fruit.prices.vec = c(10, 20, 30, 40)
fruits.df = data.frame(names=fruit.names.vec, prices=fruit.prices.vec)
head(fruits.df)
# Everything has similar structure, easy to recall

      names prices
1     apple     10
2    banana     20
3 cranberry     30
4    durian     40


#### 3.3.5 Simplicity
##### Would you still understand how this works after 2 days?
![nested](http://firstclassthoughts.co.uk/Articles/Readability/img/i.imgur.com_BtjZedW.jpg)

#### 3.3.6 Modularity: separate code into self-contained, independent pieces
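
As a small illustration of modularity, the sketch below splits one task into two independent functions that can each be tested and reused on their own. The function names and the `name,price` input format are assumptions made for this example:

```python
def parse_prices(lines):
    """Turn 'name,price' lines into a dict of name -> price."""
    prices = {}
    for line in lines:
        name, price = line.strip().split(",")
        prices[name] = float(price)
    return prices

def most_expensive(prices):
    """Return the name with the highest price."""
    return max(prices, key=prices.get)

lines = ["apple,10", "banana,20", "cranberry,30", "durian,40"]
prices = parse_prices(lines)
print(most_expensive(prices))
```

Because parsing and analysis are separate, a bug in one can be located without reading the other.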

### 3.4 Test
#### 3.4.1 Test block by block as you write

#### Example: Define a function that returns all the gene names that contain "Tbx"

##### Wrong

In [52]:
%%R
setwd("/Users/yolandatiao/Documents/0_Bioinformatics2017/2018_Bioinformatics/Applied-Bioinformatics-HW-Yolanda/ChIP-seq.QA.3")
# Define function
MatchGn <- function(str.x, vec.x){
  out.vec <- c()
  for (i in vec.x){
    if (grepl(str.x, i)){
      out.vec <- c(out.vec, i)
    }
  }
  return(out.vec)
}
# Run
gn.all.vec <- read.csv("gene_names.csv")
print(MatchGn("Tbx", gn.all.vec$gene_names))

 [1] "Tbx1"   "Tbx10"  "Tbx15"  "Tbx18"  "Tbx19"  "Tbx2"   "Tbx20"  "Tbx21" 
 [9] "Tbx22"  "Tbx3"   "Tbx4"   "Tbx5"   "Tbx6"   "Tbxa2r" "Tbxas1"


##### Right

In [61]:
%%R
# First make the matching work on a small test vector
str.x <- "Tbx"
vec.x <- c("Abc", "Tau","Tbx1", "tBx", "Tb10", "Tbx2","Uau")
out.vec <- c()
for (i in vec.x){
    if (grepl(str.x, i)){
        out.vec <- c(out.vec, i)
    }   
}
print(out.vec)

[1] "Tbx1" "Tbx2"


In [63]:
%%R
# Package into function
MatchGn <- function(str.x, vec.x){
  out.vec <- c()
  for (i in vec.x){
    if (grepl(str.x, i)){
      out.vec <- c(out.vec, i)
    }
  }
  return(out.vec)
}

print(MatchGn("au", c("Abc", "Tau","Tbx1", "tBx", "Tb10", "Tbx2","Uau")))

[1] "Tau" "Uau"


In [64]:
%%R
# Use it
gn.all.vec <- read.csv("gene_names.csv")
print(MatchGn("Tbx", gn.all.vec$gene_names))

 [1] "Tbx1"   "Tbx10"  "Tbx15"  "Tbx18"  "Tbx19"  "Tbx2"   "Tbx20"  "Tbx21" 
 [9] "Tbx22"  "Tbx3"   "Tbx4"   "Tbx5"   "Tbx6"   "Tbxa2r" "Tbxas1"
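
The same block-by-block approach can be sketched in Python. The function name `match_genes` is hypothetical, and it uses plain substring matching, whereas R's `grepl` treats the pattern as a regular expression; for a literal pattern like "Tbx" the two agree.

```python
def match_genes(pattern, names):
    """Return the names that contain `pattern` (case-sensitive)."""
    return [name for name in names if pattern in name]

# Step 1: test on a small input whose answer is known by hand.
small = ["Abc", "Tau", "Tbx1", "tBx", "Tb10", "Tbx2", "Uau"]
assert match_genes("Tbx", small) == ["Tbx1", "Tbx2"]
assert match_genes("au", small) == ["Tau", "Uau"]

# Step 2: test an edge case before trusting the function on real data.
assert match_genes("Tbx", []) == []
print("all checks passed")
```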


### 3.5 No need to reinvent the wheel when something tested is already available

#### Search online first; you may find an existing, tested solution
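
For instance, the hand-written matching loop from the gene-name example can be replaced by tools that already exist. A minimal Python sketch using the standard `re` module and the built-in `filter`, with a small made-up name list (in R, `grep("Tbx", vec, value=TRUE)` plays a similar role):

```python
import re

names = ["Abc", "Tau", "Tbx1", "tBx", "Tb10", "Tbx2", "Uau"]

# Compile the pattern once; `search` finds it anywhere in a name.
pattern = re.compile("Tbx")

# filter keeps the names for which pattern.search returns a match.
matched = list(filter(pattern.search, names))
print(matched)
```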