# Motivations

- minimum-typing programming
- "Perl regexes have become a de facto standard, having a rich and powerful set of atomic expressions."




# Prepare files


## Download the annotation file
`curl http://downloads.yeastgenome.org/curation/chromosomal_feature/SGD_features.tab > SGD_features.tab`

## Get the header
Run `vi SGD_features.header` and enter the column descriptions, one per line, in column order:

```
primary saccharomyces genome database id (sgdid) (mandatory)
feature type (mandatory)
feature qualifier (optional)
feature name (optional)
standard gene name (optional)
alias (optional, multiples separated by |)
parent feature name (optional)
secondary sgdid (optional, multiples separated by |)
chromosome (optional)
start_coordinate (optional)
stop_coordinate (optional)
strand (optional)
genetic position (optional)
coordinate version (optional)
sequence version (optional)
description (optional)
```
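Before using the header to label columns, it is worth checking that it has exactly one line per tab-separated column of the table. A minimal sketch: `check_header` is a hypothetical helper name, and it assumes the header file has one description per line and ends with a newline.

```bash
# Hypothetical helper: compare the number of lines in HEADER with the
# distinct tab-separated field counts found in TABLE.
check_header() {
  local header=$1 table=$2
  local n_names n_counts
  n_names=$(wc -l < "$header" | tr -d ' ')
  # Every distinct NF value across all rows, joined with commas
  n_counts=$(awk -F '\t' '{print NF}' "$table" | sort -un | paste -sd, -)
  if [ "$n_counts" = "$n_names" ]; then
    echo "OK: $n_names columns"
  else
    echo "MISMATCH: header has $n_names names, table rows have $n_counts fields"
    return 1
  fi
}
```

Usage: `check_header SGD_features.header SGD_features.tab`.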



## Short-hand Command lines

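Run `vi ~/.bash_profile` and define the short-hand aliases used throughout these notes, in bash syntax (`les` is assumed here to mean `less -S`, a pager that chops long lines instead of wrapping them):

`alias ll='ls -l'`

`alias les='less -S'`

Reload the profile with `source ~/.bash_profile`.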

## Loop

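```bash
# Create Chr1, Chr2, Chr3 and copy the header file into each
for chr in $(seq 1 3)
do
    mkdir -p "Chr$chr"
    cp SGD_features.header "Chr$chr"/
done

# Remove them again (an explicit list works just as well as seq)
for chr in 1 2 3
do
    rm -r "Chr$chr"
done
```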
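The same loop pattern can also fill each `Chr$chr` directory with that chromosome's rows. A sketch under assumptions: `split_by_chr` and the output name `features.tab` are hypothetical, and column 9 of `SGD_features.tab` (the "chromosome" entry of the header) is assumed to hold the bare chromosome number, e.g. `1`.

```bash
# Hypothetical helper: for each chromosome number given after TABLE,
# create ChrN/ and write the rows whose 9th column equals N to ChrN/features.tab.
split_by_chr() {
  local table=$1
  shift
  local chr
  for chr in "$@"; do
    mkdir -p "Chr$chr"
    # A bare pattern with no action prints the matching rows
    awk -F '\t' -v c="$chr" '$9 == c' "$table" > "Chr$chr/features.tab"
  done
}
```

Usage: `split_by_chr SGD_features.tab $(seq 1 3)`.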


## Make a combo table

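Number each column name, prefix it with `## `, and stack the result on top of the table (`cat -` reads the piped header first, then the file):

`awk '{print "## "NR". "$0}' SGD_features.header | cat - SGD_features.tab | les`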
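The `## ` lines are meant for reading only. To get a conventional TSV whose first row is the header, the header lines can instead be joined with tabs. A sketch; `add_tsv_header` is a hypothetical helper name.

```bash
# Hypothetical helper: join the one-name-per-line HEADER into a single
# tab-separated row and print it followed by TABLE.
add_tsv_header() {
  paste -sd '\t' "$1" | cat - "$2"
}
```

Usage: `add_tsv_header SGD_features.header SGD_features.tab | les`.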



## Awk

### Predefined Variables

- **NR** - Number of the current input record (line), i.e. a running count of the lines read so far

- **NF** - Number of fields in the current record (`$NF` is the last field)

- **FILENAME** - Name of the current input file

- **FS** - "Field Separator" used to split each input line into fields (default: runs of spaces and tabs). Assign another value to FS, or pass `-F`, to change it.

- **RS** - "Record Separator" delimiting records; by default records are lines separated by a newline.

- **OFS** - "Output Field Separator" placed between fields by `print` (default: a single space).

- **ORS** - "Output Record Separator" appended after each `print` (default: a newline).
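A small sketch showing several of these variables at once: it splits on tabs via `FS`, joins the output with commas via `OFS`, and prints the file name, record number, field count and last field of each line. `awk_vars_demo` is a hypothetical name.

```bash
# Hypothetical helper: print FILENAME, NR, NF and $NF for every input line.
awk_vars_demo() {
  awk 'BEGIN { FS = "\t"; OFS = "," } { print FILENAME, NR, NF, $NF }' "$@"
}
```

Usage: `awk_vars_demo SGD_features.tab | les`. With several files, NR keeps counting across them.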



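```bash

# NF with the default FS (runs of spaces/tabs): words in free-text columns are counted too
awk '{print NF}' SGD_features.tab | les

# NF when splitting on tabs only: the real number of columns per row
awk -F '\t' '{print NF}' SGD_features.tab | les

## Setting conditions on fields

# Rows whose start coordinate (column 10) lies strictly between 1000 and 5000
awk -F '\t' '$10 > 1000 && $10 < 5000' SGD_features.tab | cut -f10 | les

# Number of rows per feature type (column 2)
cut -f2 SGD_features.tab | sort | uniq -c | les

grep pseudogene SGD_features.tab | les

grep X_element SGD_features.tab | les

# Same, but exclude lines that also match X_element_comb
grep X_element SGD_features.tab | grep -v X_element_comb | les

grep centromere SGD_features.tab | les

# Only rows whose feature type (column 2) ends in "centromere"
awk -F '\t' '$2 ~ /centromere$/' SGD_features.tab | les

# Sum of |stop - start| over CDS rows starting between 1000 and 5000:
# filter, keep CDS rows, print stop - start, strip minus signs (Crick-strand
# features are listed with start > stop), join with "+", and let bc add them up
awk -F '\t' '$10 > 1000 && $10 < 5000' SGD_features.tab | awk -F '\t' '$2 == "CDS"' | awk -F '\t' '{print $11 - $10}' | sed 's/-//g' | paste -sd+ - | bc

```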
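The last pipeline chains three awk calls, sed, paste and bc. The same total can be computed in a single awk pass. A sketch, with `sum_cds_span` as a hypothetical name; like the pipeline, it sums `|stop - start|` (not `|stop - start| + 1`) over CDS rows whose start lies strictly between 1000 and 5000.

```bash
# Hypothetical helper: one-pass version of the CDS sum.
sum_cds_span() {
  awk -F '\t' '
    $2 == "CDS" && $10 > 1000 && $10 < 5000 {
      d = $11 - $10
      if (d < 0) d = -d       # Crick-strand features have start > stop
      total += d
    }
    END { print total + 0 }   # +0 prints 0 instead of an empty line when nothing matches
  ' "$1"
}
```

Usage: `sum_cds_span SGD_features.tab`. Taking the absolute value inside awk also avoids relying on `sed 's/-//g'`, which would strip any hyphen, not just a leading minus sign.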




## Sed

```bash
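## substitute: replace every "feature" with "FEATURE" (g = all matches on a line)
sed 's/feature/FEATURE/g' SGD_features.header | les

## delete: lines 1-2, then every line matching /alias/
sed '1,2d' SGD_features.header | les
sed '/alias/d' SGD_features.header | les -N

## print: -n suppresses automatic printing, so only the p'd lines appear
sed -n '/alias/p' SGD_features.header | les -N
sed -n '1,2p' SGD_features.header | les -N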

```
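sed can also do the column-name cleanup that the "Extend the Pipe" section does with perl: drop the parenthesised notes, then turn the remaining spaces into underscores. A sketch; `header_to_names` is a hypothetical name, and `-E` (extended regexes, accepted by GNU and BSD sed) is assumed available. In an extended regex, `\(` matches a literal parenthesis.

```bash
# Hypothetical helper: one column name per header line.
header_to_names() {
  # " \(.+\)" is greedy, so it removes everything from the first " (" to the last ")"
  sed -E 's/ \(.+\)//; s/ /_/g' "$1"
}
```

Usage: `header_to_names SGD_features.header | les`.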





## Regex

Quick reference for Perl regular expressions: https://www.cs.tut.fi/~jkorpela/perl/regexp.html

```bash

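# Rows whose SGDID number (after "S" and the leading zeros) exceeds 3000;
# $1 is only compared when the match succeeded
perl -ne 'print if /^S0+(\d+)/ && $1 > 3000' SGD_features.tab | les

# Rows whose feature name (column 4, $F[3]) ends in three capital letters followed by three digits
perl -F'\t' -ane 'print if $F[3] =~ /[A-Z]{3}[0-9]{3}$/' SGD_features.tab | cut -f4 | les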

```





## Extend the Pipe with process substitution `<()`

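```bash
# Turn each header line into a column name: drop the " (...)" notes, spaces -> underscores
perl -pe 's/ \(.+\)//g; s/ /_/g' SGD_features.header | les

# Show the first row vertically, next to the column names.
# <( ... ) hands paste the perl output as if it were a file; "-" is the piped stdin.
head -1 SGD_features.tab | tr '\t' '\n' | paste <(perl -pe 's/ \(.+\)//g; s/ /_/g' SGD_features.header) - | les
```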
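Process substitution also helps with tools that insist on file arguments, such as `comm`, which compares two sorted files. A sketch; `types_only_in_first` is a hypothetical name that lists the feature types (column 2) present in the first table but absent from the second.

```bash
# Hypothetical helper: feature types found only in the first table.
# comm -23 hides lines unique to the second input and lines common to both.
types_only_in_first() {
  comm -23 <(cut -f2 "$1" | sort -u) <(cut -f2 "$2" | sort -u)
}
```

Usage, with two hypothetical releases of the file: `types_only_in_first new.tab old.tab`.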


## For fun

`ssh hongru@210.75.224.141`

`cowsay 'go go go'`

`cowsay -s 'go go go'`

`sl`

`cmatrix`

