# Genomic Command-Line Exercises

This notebook contains exercises on command-line tools for genomic data science. Each exercise states a question, followed by the commands that answer it and their output.
  

In [1]:
!cp -r /kaggle/input/plants/Plants /kaggle/working/

In [2]:
%%bash
cd /kaggle/working/Plants
ls -l

total 52
drwxr-xr-x 3 root root 4096 Mar 24 17:14 apple
-rw-r--r-- 1 root root   60 Mar 24 17:14 applegenes
-rw-r--r-- 1 root root  191 Mar 24 17:14 months
-rw-r--r-- 1 root root  191 Mar 24 17:14 months.test
-rw-r--r-- 1 root root  139 Mar 24 17:14 orchard
-rw-r--r-- 1 root root   98 Mar 24 17:14 orchard.1
drwxr-xr-x 2 root root 4096 Mar 24 17:14 peach
-rw-r--r-- 1 root root   39 Mar 24 17:14 peachgenes
drwxr-xr-x 2 root root 4096 Mar 24 17:14 pear
-rw-r--r-- 1 root root   43 Mar 24 17:14 peargenes
drwxr-xr-x 2 root root 4096 Mar 24 17:14 sample
drwxr-xr-x 4 root root 4096 Mar 24 17:14 sandbox
-rw-r--r-- 1 root root   78 Mar 24 17:14 seasons


## **Exercises**  



### 1. How many chromosomes are there in the Apple genome?  

In [3]:
%%bash

cd /kaggle/working/Plants/apple

grep ">" apple.genome

echo ""

grep ">" apple.genome | wc -l

# or

echo ""

grep -c ">" apple.genome

>chr1
>chr2
>chr3
>chr4
>chr5

5

5
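The cell above matches `">"` anywhere on a line. A slightly stricter, reusable version anchors the match to the start of the line with `^`, so only FASTA header lines are counted. The helper names `count_fasta_records` and `fasta_ids` are hypothetical and not part of the course material; this is a sketch, not the only way to do it.

```shell
# count_fasta_records (hypothetical helper): counts FASTA header lines,
# i.e. lines that START with ">". With several files, grep -c prints one
# "file:count" line per file.
count_fasta_records() {
    grep -c '^>' "$@"
}

# fasta_ids (hypothetical helper): prints each record name without the ">".
fasta_ids() {
    grep '^>' "$1" | cut -c2-
}

# Example (define and call in the same %%bash cell, from Plants/apple):
#   count_fasta_records apple.genome
#   fasta_ids apple.genome
```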


### 2a. How many genes are there in Apple?  
 

In [4]:
%%bash 

cd /kaggle/working/Plants/apple
cat apple.genes

Smell	Smell-1	chr1	100	1000	+	(100-150,400-500,900-1000)
Smell	Smell-2	chr1	100	1000	+	(100-150,900-1000)
Size	Size-1	chr1	1500	2000	+	(1500-1750,1800-1850,	1900-2000)
Size	Size-2	chr1	1500	2000	+	(1500-2000)
Color	Color-1	chr2	3400	4200	-	(3400-3600,4000-4200)
Color	Color-2	chr2	3400	4200	-	(3400-3550,3800-3900,4000-4200)
Color	Color-3	chr2	3400	4200	-	(3400-3600,3800-3900,4100-4200)
appl4	appl4-1	chr3	50	800	+	(50-400,700-80)
appl5	appl5-1	chr3	1100	1500	+	(1100-1200,1450-1500)
Taste	Taste-1	chr3	2000	2600	-	(2000-2300,2500-2600)
Taste	Taste-2	chr3	2000	2600	-	(2000-2050,2200-2300,2500-2600)
Shape	Shape-1	chr4	3100	3700	-	(3100-3500,3600-3700)
Shape	Shape-2	chr4	3100	3700	-	(3100-3200,3400-3500,3600-3700)
appl8	appl8-1	chr4	4500	4800	+	(4500-4800)
appl9	appl9-1	chr5	200	1000	-	(200-450,550-650,800-1000)
appl10	appl10-1	chr5	1100	1700	+	(1100-1400,1500-1700)


In [5]:
%%bash

cd /kaggle/working/Plants/apple

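# uniq removes only ADJACENT duplicates; this works because apple.genes
# lists all transcripts of a gene on consecutive lines
cut -f1 apple.genes | uniq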

echo ""

cut -f1 apple.genes | uniq | wc -l

# or 

echo ""

cut -f1 apple.genes | sort -u | wc -l

Smell
Size
Color
appl4
appl5
Taste
Shape
appl8
appl9
appl10

10

10
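Both counts agree here only because each gene's transcripts sit on consecutive lines of `apple.genes`. `uniq` collapses adjacent duplicates, so on unordered input `uniq | wc -l` counts runs of identical names rather than distinct genes. The sketch below makes the difference visible. `count_genes` and `count_gene_runs` are hypothetical helper names.

```shell
# count_genes (hypothetical helper): distinct gene names in column 1 of a
# tab-separated .genes file, independent of line order.
count_genes() {
    cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# count_gene_runs (hypothetical helper): what "uniq | wc -l" really measures,
# the number of runs of identical adjacent names. It equals count_genes only
# when the input is grouped by gene.
count_gene_runs() {
    cut -f1 "$1" | uniq | wc -l | tr -d ' '
}

# Example (same %%bash cell): count_genes apple.genes
```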


### 2b. How many transcripts are there in Apple?  


In [6]:
%%bash

cd /kaggle/working/Plants/apple

cut -f2 apple.genes

echo ""

cut -f2 apple.genes | sort -u | wc -l

Smell-1
Smell-2
Size-1
Size-2
Color-1
Color-2
Color-3
appl4-1
appl5-1
Taste-1
Taste-2
Shape-1
Shape-2
appl8-1
appl9-1
appl10-1

16
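To repeat exercises 2a and 2b on the peach and pear files, a single `awk` pass can report both counts per file. This is a sketch: `summarize_genes` is a hypothetical helper name, and it counts *distinct* names in columns 1 and 2. Distinct names and line counts can disagree. For example, in `pear/pear.genes` two different Color transcripts are both named `Color-1` (see the output of exercise 8).

```shell
# summarize_genes (hypothetical helper): for each tab-separated .genes file,
# prints "<file> genes=<n> transcripts=<m>", where n and m are the numbers of
# distinct values in column 1 (gene) and column 2 (transcript).
summarize_genes() {
    for f in "$@"; do
        awk -F'\t' -v name="$f" '
            !($1 in genes) { genes[$1] = 1; ng++ }
            !($2 in tx)    { tx[$2] = 1;    nt++ }
            END { printf "%s genes=%d transcripts=%d\n", name, ng, nt }
        ' "$f"
    done
}

# Example (same %%bash cell, from Plants):
#   summarize_genes apple/apple.genes peach/peach.genes pear/pear.genes
```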


### 3a. How many genes have a single variant?  


In [7]:
%%bash

cd /kaggle/working/Plants/apple

cut -f1 apple.genes | uniq -c


      2 Smell
      2 Size
      3 Color
      1 appl4
      1 appl5
      2 Taste
      2 Shape
      1 appl8
      1 appl9
      1 appl10


In [8]:
%%bash 

cd /kaggle/working/Plants/apple

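# Keep genes whose transcript count (field 1 of uniq -c) is exactly 1;
# grep "1" would also match any gene name that contains the digit 1
cut -f1 apple.genes | uniq -c | awk '$1 == 1'

echo ""

cut -f1 apple.genes | uniq -c | awk '$1 == 1' | wc -l

# or

echo ""

cut -f1 apple.genes | uniq -c | awk '$1 == 1 {n++} END {print n}'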

      1 appl4
      1 appl5
      1 appl8
      1 appl9
      1 appl10

5

5


### 3b. How many genes have multiple variants?

In [9]:
%%bash

cd /kaggle/working/Plants/apple

# Keep genes with more than one transcript (compare the count, not the text)
cut -f1 apple.genes | uniq -c | awk '$1 > 1'

echo ""

cut -f1 apple.genes | uniq -c | awk '$1 > 1' | wc -l

      2 Smell
      2 Size
      3 Color
      2 Taste
      2 Shape

5
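Exercises 3a and 3b can be answered in one pass by tallying transcripts per gene in `awk` and then classifying the totals. Because the test is numeric, a gene whose *name* contains a 1 (such as `appl10`) cannot be misclassified, and the input does not need to be grouped. `count_variant_classes` is a hypothetical helper name. This is a sketch under the assumption that column 1 holds the gene name, as in `apple.genes`.

```shell
# count_variant_classes (hypothetical helper): tallies lines (transcripts)
# per gene in column 1 and prints "single=<n> multiple=<m>".
count_variant_classes() {
    awk -F'\t' '
        { n[$1]++ }
        END {
            for (g in n) {
                if (n[g] == 1) single++
                else multiple++
            }
            printf "single=%d multiple=%d\n", single, multiple
        }
    ' "$1"
}

# Example (same %%bash cell, from Plants/apple): count_variant_classes apple.genes
```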


### 4. How many genes are there on each of the ‘+’ and ‘-’ strands?


In [10]:
%%bash

cd /kaggle/working/Plants/apple

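cut -f6 apple.genes

echo ""

# Note: these count transcript lines per strand, not distinct genes
cut -f6 apple.genes | grep -c "+"

echo ""

# "--" ends option parsing so that "-" is read as the pattern
cut -f6 apple.genes | grep -c -- "-"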

+
+
+
+
-
-
-
+
+
-
-
-
-
+
-
+

8

8
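The two 8s above add up to 16, the transcript count from 2b, so they count transcripts per strand. To count *genes* per strand, one option is to reduce the file to distinct (gene, strand) pairs first, the same trick exercise 6 uses for chromosomes. The sketch assumes all transcripts of a gene share one strand; a gene with transcripts on both strands would be counted once on each. `genes_per_strand` is a hypothetical helper name. Going by the listing in 2a, it should report 6 genes on `+` and 4 on `-` for `apple.genes`.

```shell
# genes_per_strand (hypothetical helper): counts distinct genes (column 1)
# per strand (column 6) in a tab-separated .genes file.
genes_per_strand() {
    # Keep one line per (gene, strand) pair, then tally the strand column.
    cut -f1,6 "$1" | sort -u | cut -f2 | sort | uniq -c
}

# Example (same %%bash cell, from Plants/apple): genes_per_strand apple.genes
```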


### 5. How many transcripts are there on each chromosome?


In [11]:
%%bash

cd /kaggle/working/Plants/apple

cut -f3 apple.genes | sort | uniq -c

      4 chr1
      3 chr2
      4 chr3
      3 chr4
      2 chr5


### 6. How many genes are there on each chromosome?


In [12]:
%%bash

cd /kaggle/working/Plants/apple

# Reduce to unique (gene, chromosome) pairs, then count genes per chromosome
cut -f1,3 apple.genes | sort -u | cut -f2 | sort | uniq -c

      2 chr1
      1 chr2
      3 chr3
      2 chr4
      2 chr5
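Exercises 5 and 6 can also be combined into one table. The `awk` sketch below counts lines per chromosome (column 3) for transcripts and remembers each (chromosome, gene) pair it has seen for genes. On `apple.genes` it should reproduce the two outputs above side by side. `per_chromosome_table` is a hypothetical helper name. The column layout is an assumption taken from `apple.genes`; note that `pear/pear.genes` may not follow it exactly (its Color lines in exercise 8 show the strand in column 4).

```shell
# per_chromosome_table (hypothetical helper): prints
# "<chromosome>\t<transcripts>\t<genes>" for each value of column 3,
# where genes counts distinct values of column 1 on that chromosome.
per_chromosome_table() {
    awk -F'\t' '
        { tx[$3]++ }
        !(($3, $1) in seen) { seen[$3, $1] = 1; genes[$3]++ }
        END { for (c in tx) printf "%s\t%d\t%d\n", c, tx[c], genes[c] }
    ' "$1" | sort
}

# Example (same %%bash cell, from Plants/apple): per_chromosome_table apple.genes
```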


### 7. What plant systems contain a Smell gene?


In [13]:
%%bash

cd /kaggle/working/Plants

grep Smell */*.genes

apple/apple.genes:Smell	Smell-1	chr1	100	1000	+	(100-150,400-500,900-1000)
apple/apple.genes:Smell	Smell-2	chr1	100	1000	+	(100-150,900-1000)
peach/peach.genes:Shape	Smell-1	scaffold_1	200	1500	+	(200-350,1300-1500)
peach/peach.genes:Smell	Smell-1	scaffold_2	3000	4000	-	(300-3100,3300-3400,3800-4000)
peach/peach.genes:Smell	Smell-2	scaffold_2	3000	4000	-	(300-3100,3800-4000)
peach/peach.genes:Smell	Smell-3	scaffold_2	3000	4000	-	(300-3100,3300-3400,3600-3700,3900-4000)


### 8. What plant systems contain a Color gene?


In [14]:
%%bash

cd /kaggle/working/Plants

grep Color */*.genes

apple/apple.genes:Color	Color-1	chr2	3400	4200	-	(3400-3600,4000-4200)
apple/apple.genes:Color	Color-2	chr2	3400	4200	-	(3400-3550,3800-3900,4000-4200)
apple/apple.genes:Color	Color-3	chr2	3400	4200	-	(3400-3600,3800-3900,4100-4200)
peach/peach.genes:Color	Color-1	scaffold_5	2500	3000	+	(2500-2700,2800-3000)
peach/peach.genes:Color	Color-2	scaffold_5	2500	3000	+	(2500-2700,2900-3000)
pear/pear.genes:Color	Color-1	scaffold00001	+	1000	2500	(1000-1200,2000-2500)
pear/pear.genes:Color	Color-1	scaffold00001	+	1000	2500	(1000-1200,1400-1600,2000-2500)
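A plain `grep Smell` matches the word in any column. In exercise 7 it also returns a peach line whose gene (column 1) is `Shape` and whose transcript happens to be named `Smell-1`. A field-exact test avoids that and prints each file only once, which directly answers "which plant systems". `files_with_gene` is a hypothetical helper name. Going by the outputs above, `files_with_gene Smell */*.genes` run from `Plants` should list `apple/apple.genes` and `peach/peach.genes`.

```shell
# files_with_gene (hypothetical helper). Usage: files_with_gene GENE FILE...
# Prints each file in which column 1 (gene name) equals GENE exactly,
# once per file, in argument order.
files_with_gene() {
    _gene=$1
    shift
    awk -F'\t' -v g="$_gene" '
        $1 == g && !(FILENAME in seen) { seen[FILENAME] = 1; print FILENAME }
    ' "$@"
}

# Example (same %%bash cell, from Plants): files_with_gene Color */*.genes
```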


### 9. What genes are in common between Apple and Pear? 


In [15]:
%%bash

cd /kaggle/working/Plants

cut -f1 apple/apple.genes | sort -u > applegenes

cut -f1 pear/pear.genes | sort -u > peargenes

comm -1 -2 applegenes peargenes

echo ""

comm -1 -2 applegenes peargenes | wc -l


Color
Shape
Size
Taste

4


In [16]:
%%bash

cd /kaggle/working/Plants

cat applegenes peargenes | sort | uniq -c | grep " 2 "

echo ""

cat applegenes peargenes | sort | uniq -c | grep " 2 " | wc -l


      2 Color
      2 Shape
      2 Size
      2 Taste

4


### 10. What genes are specific to each of Apple and Pear?

In [17]:
%%bash

cd /kaggle/working/Plants

comm -2 -3 applegenes peargenes

echo ""

comm -1 -3 applegenes peargenes

appl10
appl4
appl5
appl8
appl9
Smell

pyr1
pyr2
pyr3
pyr4
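The `comm` flags suppress columns. Column 1 holds lines only in the first file, column 2 lines only in the second, and column 3 lines in both, so `-1 -2` keeps the shared genes, `-2 -3` those only in the first file, and `-1 -3` those only in the second. Both inputs must be sorted, which is why the gene lists are built with `sort -u`. The sketch below wraps exercises 9 and 10 into one count summary. `compare_gene_sets` is a hypothetical helper name. Based on the outputs above, Apple vs. Pear should give `only_A=6 only_B=4 both=4`.

```shell
# compare_gene_sets (hypothetical helper). Usage: compare_gene_sets A.genes B.genes
# Prints how many distinct gene names (column 1) occur only in A, only in B,
# and in both.
compare_gene_sets() {
    _a=$(mktemp)
    _b=$(mktemp)
    cut -f1 "$1" | sort -u > "$_a"
    cut -f1 "$2" | sort -u > "$_b"
    printf 'only_A=%s only_B=%s both=%s\n' \
        "$(comm -2 -3 "$_a" "$_b" | wc -l | tr -d ' ')" \
        "$(comm -1 -3 "$_a" "$_b" | wc -l | tr -d ' ')" \
        "$(comm -1 -2 "$_a" "$_b" | wc -l | tr -d ' ')"
    rm -f "$_a" "$_b"
}

# Example (same %%bash cell, from Plants):
#   compare_gene_sets apple/apple.genes pear/pear.genes
```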


### 11. How many genes are common to all three plant systems?


In [18]:
%%bash

cd /kaggle/working/Plants

cut -f1 peach/peach.genes | sort -u > peachgenes

cat applegenes peargenes peachgenes | sort | uniq -c | grep " 3 "

      3 Color
      3 Shape
      3 Size
      3 Taste
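The `uniq -c` trick in exercise 11 generalizes to any number of files. If every file is first reduced to a unique gene list, a gene reaches a count of N only when it appears in all N files. The sketch below takes N from the number of arguments instead of hard-coding `" 3 "`. `genes_in_all` is a hypothetical helper name. Run from `Plants` on the three `.genes` files, it should print the same four genes as the cell above.

```shell
# genes_in_all (hypothetical helper). Usage: genes_in_all FILE...
# Prints gene names (column 1) that occur in every .genes file given.
genes_in_all() {
    _n=$#
    for f in "$@"; do
        # One line per distinct gene per file, so duplicates within a file
        # cannot inflate the count.
        cut -f1 "$f" | sort -u
    done | sort | uniq -c | awk -v n="$_n" '$1 == n { print $2 }'
}

# Example (same %%bash cell, from Plants):
#   genes_in_all apple/apple.genes pear/pear.genes peach/peach.genes
```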
