# Some very basic things to do in bash

The command line:

![image-2.png](attachment:image-2.png)

Syntax: `command_name -options arguments`

The command name, options and arguments are separated by **spaces**, so make sure to place them correctly.

A command usually ends with a newline (pressing Enter) or a semicolon `;`.
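Here is a minimal sketch of this syntax. It assumes a Unix-like system where the directory `/tmp` exists. `ls` is the command name, `-l` is an option (long listing) and `/tmp` is the argument. The semicolon ends the first command so that a second one can follow on the same line.

```shell
# command_name -options argument: list the contents of /tmp in long format
ls -l /tmp

# Two commands on one line, separated by a semicolon
echo "first command"; echo "second command"
```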

Here, commands are shown as `code` with a grey background. You can try them on the command line.

You interact with a **hierarchical file system**: /directory/directory/file


## Simple navigation on the command line
`cd ~/appladmix/notebooks/1_basics/data` - change directory

`cd ..` - go one directory upwards

`cd data` - go into the directory called "data"

`ls` - list files

`mkdir newd` - make directory

`cp fileA.txt fileB.txt` - copy file

Caution: if the target file already exists, it is overwritten irreversibly!

`mv fileB.txt fileC.txt` - move or rename file (this also overwrites an existing target without asking)

`rm fileC.txt` - remove file

Caution: this is also irreversible!

`pwd` - print working directory
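The commands above can be tried safely in a short sketch. It works inside a throwaway directory created by `mktemp -d`, so none of your own files are touched. The names `newd` and `fileA.txt` are only examples. The `>` used to create the first file is explained further below under "Input and output".

```shell
# Work in a fresh temporary directory so no existing files are touched
cd "$(mktemp -d)"
pwd                            # print where we are
mkdir newd                     # make a directory
cd newd                        # go into it
echo "some text" > fileA.txt   # create a small example file
cp fileA.txt fileB.txt         # copy: fileB.txt now has the same content
mv fileB.txt fileC.txt         # rename: fileB.txt is gone, fileC.txt exists
ls                             # lists fileA.txt and fileC.txt
rm fileC.txt                   # remove fileC.txt (there is no undo!)
cd ..                          # go one directory upwards
ls                             # lists newd
```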


## Tricks
The wildcard `*`: the asterisk matches any sequence of characters (including none)

`ls file*`
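This small sketch shows which names `file*` matches. It runs in a throwaway directory from `mktemp -d` and uses hypothetical empty files created with `touch`.

```shell
cd "$(mktemp -d)"
# Create three empty example files
touch file1.txt file2.txt other.txt
ls file*    # file1.txt and file2.txt: * stands for "1.txt" and "2.txt"
ls *.txt    # all three files
```

Note that the shell expands `*` before `ls` runs, so `ls` only ever sees the list of matching file names.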

Autocomplete: press the TAB key to complete file, directory and command names. If what you have typed so far is unique, bash fills in the rest.

Store information in variables:

`stuff="weird random stuff"` creates a variable called `stuff` that holds this text (no spaces around `=`!). Its value is accessed with a dollar sign: `$stuff`

Now try `$stuff` on its own, and then together with the command `echo`: `echo $stuff`
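Here is a minimal sketch of the variable rules. The quoting lines go slightly beyond the text above, but they show standard bash behaviour that you will run into quickly.

```shell
# No spaces around "=": stuff = "x" would try to run a command called "stuff"
stuff="weird random stuff"
echo $stuff                  # prints: weird random stuff
echo "$stuff"                # double quotes keep the value together as one argument
greeting="Hello, $stuff"     # variables are expanded inside double quotes
echo "$greeting"             # prints: Hello, weird random stuff
echo '$stuff'                # single quotes: no expansion, prints the literal text $stuff
```

Typing `$stuff` alone makes bash try to run its content as a command (here `weird`), which fails with "command not found".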

Get help: most commands accept `--help` (some also `-h`); `man command_name` opens the full manual page

**use Google!**

*(be cautious about ChatGPT)*


## Looking at files
`less fileA.txt` – view the file page by page; exit with the `q` key

`cat fileA.txt` – concatenate: print the whole file to the command line (with several file names, one after another)

`wc fileA.txt` – word count: number of lines, words and bytes in the file

`wc -l fileA.txt` – line count of file

`head fileA.txt` – print first 10 lines of file

`head -n 3 fileA.txt` – print first 3 lines

`tail fileA.txt` – print last 10 lines of file

`tail -n 3 fileA.txt` – print last 3 lines of file
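This sketch tries these commands on a hypothetical file `numbers.txt` that holds the numbers 1 to 12, one per line. The file is created with `seq` in a throwaway directory. `less` is left out because it is interactive.

```shell
cd "$(mktemp -d)"
seq 1 12 > numbers.txt   # hypothetical example file: 12 lines
wc numbers.txt           # lines, words, bytes and the file name
wc -l numbers.txt        # only the line count: 12
head -n 3 numbers.txt    # 1, 2, 3
tail -n 3 numbers.txt    # 10, 11, 12
cat numbers.txt          # the whole file
```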


## Now we grep and cut some things
We grab/grep the lines that match a pattern:

`grep "apple" fileA.txt`

`grep -v "apple" fileA.txt`

`grep -c "apple" fileA.txt`

What is happening?
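To check your answer, here is a sketch on a small hypothetical file `fruits.txt`, created in a throwaway directory. Its content is made up, so your `fileA.txt` may look different.

```shell
cd "$(mktemp -d)"
# Hypothetical data: number, fruit, count (separated by single spaces)
printf '1 apple 10\n2 pear 5\n3 apple 7\n4 plum 3\n' > fruits.txt
grep "apple" fruits.txt      # lines that contain "apple": lines 1 and 3
grep -v "apple" fruits.txt   # -v inverts the match: lines 2 and 4
grep -c "apple" fruits.txt   # -c counts matching lines instead of printing them: 2
```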

We cut:

`cut -f2-3 fileA.txt`

`cut -f2-3 -d " " fileA.txt`

And here?

`grep -v "apple" fileA.txt | cut -f2-3 > fileD.txt`

And what is this?
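A sketch with two versions of the same hypothetical data helps to compare the commands. `fruits.tsv` separates columns with tabs, while `fruits.txt` separates them with spaces. Without `-d`, `cut` splits at TAB characters. Lines that contain no TAB are passed through unchanged.

```shell
cd "$(mktemp -d)"
# Hypothetical tab-separated and space-separated versions of the same data
printf '1\tapple\t10\n2\tpear\t5\n3\tapple\t7\n' > fruits.tsv
printf '1 apple 10\n2 pear 5\n3 apple 7\n' > fruits.txt

cut -f2-3 fruits.tsv          # columns 2 to 3 (split at TAB)
cut -f2-3 fruits.txt          # no TAB in these lines: printed unchanged
cut -f2-3 -d " " fruits.txt   # -d " " splits at spaces: columns 2 to 3

# Pipe: grep's output becomes cut's input; > writes the result into a file
grep -v "apple" fruits.tsv | cut -f2-3 > result.txt
cat result.txt                # pear<TAB>5
```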


## We sort and pipe some stuff

We sort:

`sort fileA.txt`

`sort -n fileA.txt`

`uniq fileA.txt`

`sort fileA.txt | uniq`

What is happening there?
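Here is a sketch with a hypothetical list of numbers that makes the differences visible. Plain `sort` compares text character by character. `sort -n` compares numbers. `uniq` only removes duplicates that sit directly next to each other.

```shell
cd "$(mktemp -d)"
printf '10\n9\n10\n100\n9\n' > nums.txt
sort nums.txt           # text order: 10 10 100 9 9 ("1" comes before "9")
sort -n nums.txt        # numeric order: 9 9 10 10 100
uniq nums.txt           # no two equal lines are adjacent, so nothing is removed
sort nums.txt | uniq    # sort first, then remove duplicates: 10 100 9
```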

### Input and output

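*Standard input*: the data stream a program reads from. By default this is the keyboard, but it can come from a file (`< fileA.txt`) or from the output of another program (via a pipe `|`).

*Standard output*: the data stream a program writes its results to. By default this is the screen, but it can be written into a file (`> fileD.txt`) or passed on to another program (via a pipe `|`).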

`cat fileA.txt | grep -v "apple"`

`cat fileA.txt | grep -v "apple" > fileD.txt`

`cat fileA.txt | grep -v "apple" | cut -f2-3 > fileF.txt`

![image.png](attachment:image.png)
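A sketch comparing the ways of feeding input to `grep` and saving its output. It uses a hypothetical file `fruits.txt` in a throwaway directory.

```shell
cd "$(mktemp -d)"
printf '1 apple 10\n2 pear 5\n3 plum 3\n' > fruits.txt
grep -v "apple" < fruits.txt             # < connects the file to grep's standard input
cat fruits.txt | grep -v "apple"         # same result: cat's output is piped into grep
grep -v "apple" fruits.txt > fileD.txt   # > writes standard output into a file (overwrites it!)
wc -l < fileD.txt                        # 2 (no file name, because wc reads standard input)
```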

## Loops 
`for num in 1 2 3; do echo $num; done`

If you need to do the same task more than once, use a loop!

Of course, you could simply type 1 2 3 yourself, but not every number from 1 to 5984. Bash's brace expansion `{1..5984}` generates such a list for you.
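A sketch of two common loop patterns in bash. The first loops over a number range generated by brace expansion. The second loops over files selected with a wildcard. The `sample_*.txt` files are hypothetical and are created in a throwaway directory.

```shell
cd "$(mktemp -d)"
# Brace expansion: the same as typing 1 2 3 4 5
for num in {1..5}; do echo "$num"; done

# Create three hypothetical files, then count the lines of each one
for name in a b c; do echo "line of $name" > "sample_$name.txt"; done
for f in sample_*.txt; do
  wc -l "$f"   # prints 1 and the file name for each file
done
```

Quoting `"$f"` keeps file names that contain spaces in one piece.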


## Regular expressions
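A regular expression is a sequence of characters that specifies a search pattern.

Regular expressions are used, for example, in `grep`, but also in many other tools.

Common special characters:

`|` = or (with `grep -E`; note: this is not a pipe here!)

`.` = any single character

`^` = start of the line

`$` = end of the line

`[ ]` = a set or range of allowed characters, e.g. any digit `[0-9]`, some letters `[acde]`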

Examples

`grep -E "apple|pear" fileA.txt` = any line with "apple" or "pear"

`grep -E "[aoi]pple" fileA.txt` = any line with "apple", "opple" or "ipple"

`grep -E ".pple" fileA.txt` = any line with any character followed by "pple"

`grep -E "^3" fileA.txt` = any line starting with "3"
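A sketch of these patterns on a hypothetical file. The comments list which lines match. Note that a pattern matches anywhere in the line unless it is anchored with `^` or `$`, so "snapple" also contains "apple".

```shell
cd "$(mktemp -d)"
printf '1 apple 10\n2 pear 5\n3 opple 7\n4 snapple 2\n30 plum 3\n' > fruits.txt
grep -E "apple|pear" fruits.txt   # lines 1, 2 and 4
grep -E "[aoi]pple" fruits.txt    # lines 1, 3 and 4
grep -E ".pple" fruits.txt        # lines 1, 3 and 4
grep -E "^3" fruits.txt           # lines starting with 3: "3 opple 7" and "30 plum 3"
grep -E "3$" fruits.txt           # lines ending with 3: "30 plum 3"
grep -E "^[0-9] " fruits.txt      # a single digit and a space at the start: lines 1 to 4
```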



## Compression

Compression is important for large files, but compressed files often cannot be processed with the same commands.

Common extensions are `.gz` (a gzip-compressed file), `.tar` (a tar archive that bundles several files or a whole directory, not compressed by itself) and `.tar.gz` (a gzip-compressed tar archive).

A file with such an extension is not a plain text file, and some programs cannot read it!

`gzip fileE.txt`

`cat fileE.txt.gz`

What happened?

`zcat fileE.txt.gz`

`gunzip fileE.txt.gz`
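A round-trip sketch on a hypothetical file `fileE.txt` with 1000 lines, in a throwaway directory. `gzip` replaces the original file with the `.gz` version, and `gunzip` reverses this.

```shell
cd "$(mktemp -d)"
seq 1 1000 > fileE.txt           # hypothetical example file
gzip fileE.txt                   # replaces fileE.txt by the compressed fileE.txt.gz
ls                               # only fileE.txt.gz is left
zcat fileE.txt.gz | head -n 3    # read the compressed file without unpacking it
                                 # (gzip -dc fileE.txt.gz does the same)
gunzip fileE.txt.gz              # restores fileE.txt and removes the .gz file
wc -l fileE.txt                  # 1000 lines, as before
```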

Caution: just adding this extension to a file name does not compress the file. Only a compression program such as `gzip` does.

**DO NOT DO THIS:**

`mv fileF.txt fileF.txt.gz`

`cat fileF.txt.gz`

**REALLY - <u>NEVER EVER</u> JUST SAVE A FILE WITH THIS EXTENSION WITHOUT USING A GZIP PROGRAM**

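Directory archives (`.tar.gz`) are unpacked with somewhat cryptic options:

`tar -zxvf fileA.tar.gz`

Here `z` = decompress with gzip, `x` = extract, `v` = verbose (list the files), `f` = the archive file name follows.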
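The reverse direction, creating an archive, uses `c` (create) instead of `x` (extract). This sketch packs a hypothetical directory `project` in a throwaway location, deletes it, and restores it from the archive.

```shell
cd "$(mktemp -d)"
mkdir project
echo "hello" > project/notes.txt
tar -czvf project.tar.gz project   # c = create, z = gzip, v = list files, f = archive name
rm -r project                      # delete the original directory
tar -zxvf project.tar.gz           # x = extract: the directory comes back
cat project/notes.txt              # hello
```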



## More of this stuff: awk
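awk is a small programming language that reads a file line by line and splits each line into fields (columns). This makes it useful for filtering even complex tabular data, for example genotype data where each line is one position in the genome.

Here we only print the third column (`$3`). By default, awk splits fields at spaces and tabs:

`awk '{ print $3 }' fileA.txt`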

More complex:

`awk '$2=="apple" { print $3 }' fileA.txt`

`awk 'length($2)==4 { print $2,$3 }' fileA.txt`
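A sketch of these awk commands on a hypothetical space-separated file. The last line goes one step further: it shows an `END` block, which runs once after all lines are read. Here it sums column 3 over the "apple" lines.

```shell
cd "$(mktemp -d)"
printf '1 apple 10\n2 pear 5\n3 apple 7\n4 plum 3\n' > fruits.txt
awk '{ print $3 }' fruits.txt                    # 10, 5, 7, 3
awk '$2=="apple" { print $3 }' fruits.txt        # only apple lines: 10, 7
awk 'length($2)==4 { print $2,$3 }' fruits.txt   # 4-letter fruits: "pear 5", "plum 3"
# Pattern plus END block: add up column 3 of the apple lines, print once at the end
awk '$2=="apple" { sum += $3 } END { print sum }' fruits.txt   # 17
```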


# So far, the command line with bash. How about R?