# Unix Basics

The command line lets you type commands to create folders, delete and copy files, and extract information from files.

By the end of this notebook, you will have:

* Opened (and, if needed, installed) the command line on your personal laptop
* Created files and folders, and looked around a directory, using just the keyboard
* Used command-line tools like `sed`, `awk`, and `grep` to find and replace text, fetch columns, and find words in files

* * *

## Opening the command line

* **From Mac OS**
    * Open the Applications folder, then Utilities, and launch Terminal
* **From a Linux machine**
    * Open Applications, then Accessories, and launch Terminal
* **From Windows**
    * Determine whether you have a 32-bit or 64-bit version of Windows:
        * https://support.microsoft.com/en-us/kb/827218
    * Download the Cygwin installer:
        * http://cygwin.com/install.html
    * Run `setup-x86.exe` if you have 32-bit Windows; otherwise run `setup-x86_64.exe`
    * Install Cygwin and double-click on Cygwin Terminal


## Getting Started: Navigating your folders and files
Every terminal session starts in your home directory. Print your "present working directory" with:
* `pwd`

Your default home folder (also called `$HOME`) is represented by the character alias `~` (tilde)
* `echo ~`

Change directory  
* `cd ~/Desktop`  

List all the files in the present working directory using  
* `ls`  
* `ls .`
    
Look up the arguments a Unix command accepts in its manual page (press `q` to quit)  
* `man ls`  
    
Creating a folder  
* `mkdir data`  
* `mkdir software`  
    
Change directory into `data` or `software`. Use tab completion to finish a name, or the Up and Down arrow keys to recall earlier commands. `[TAB]` means to press the tab key on your keyboard, not to write out the characters.
* `cd da[TAB]`  

Move up one level, to the parent directory (`cd /` goes to the root directory, and `cd` on its own returns home):
* `cd ..`  
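
To see how `cd`, `cd ..`, and `cd /` differ, here is a minimal sketch run inside a throwaway directory. The folder names `project` and `data` are made up for the demonstration.

```shell
# Work in a throwaway directory so nothing in your home folder is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir -p project/data            # hypothetical folder names for this demo
cd project/data
inside=$(basename "$PWD")        # name of the folder we are in now

cd ..                            # up one level, to the parent directory
parent=$(basename "$PWD")

cd /                             # the root directory, the top of the file system
root=$PWD

echo "inside=$inside parent=$parent root=$root"
rm -r "$workdir"                 # clean up the throwaway directory
```

`basename` strips the leading path so only the last folder name is shown.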

Create an empty file
* `touch emptyfile.txt`  

Write some text in it
* `echo "hello world" > emptyfile.txt`

Look at the contents of the file with `cat`

```
cat emptyfile.txt
```

Append to your file with `>>`

```
echo "I love bioinformatics" >> emptyfile.txt
```

### Exercise: look at the file

Count the number of lines with `wc -l`

```
wc -l emptyfile.txt
```
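
The difference between `>` and `>>` is easy to get wrong, so here is a small sketch that counts lines after each step. The file name `notes.txt` is hypothetical, and everything happens in a temporary directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "first" > notes.txt          # > creates the file (or overwrites it)
echo "second" >> notes.txt        # >> adds a line at the end
after_append=$(wc -l < notes.txt) # reading from < prints the count without the file name

echo "replaced" > notes.txt       # > again: the two earlier lines are gone
after_overwrite=$(wc -l < notes.txt)

echo "lines after append: $after_append"
echo "lines after overwrite: $after_overwrite"
cat notes.txt

cd / && rm -r "$workdir"          # clean up
```

Using `>` on a file you meant to append to silently discards its old contents, which is why `>>` is the safer habit when adding to existing results.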



Move or rename a file
* `mv emptyfile.txt notempty.txt`  

Copy a file
* `cp notempty.txt deleteme.txt`  

Delete a file
* `rm deleteme.txt`  

Create a pointer (symlink) to a file
* `ln -s notempty.txt pointer`  
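
A symlink stores only the path of its target, not a copy of the data. The sketch below, run in a temporary directory, reads a file through a link and then shows what happens when the target is removed.

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "hello world" > notempty.txt
ln -s notempty.txt pointer       # pointer -> notempty.txt

ls -l pointer                    # the listing shows where the link points
via_link=$(cat pointer)          # reading the link reads the target file

rm notempty.txt                  # remove the target, leaving the link behind
if [ -e pointer ]; then          # -e follows the link, so this is now false
    link_state="ok"
else
    link_state="dangling"
fi
echo "read through link: $via_link; link is now: $link_state"

cd / && rm -r "$workdir"         # clean up
```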

## File Manipulation: Getting some data from UCSC's Table Browser

Go to the [UCSC Table browser](http://genome.ucsc.edu/cgi-bin/hgTables), choose "position" to restrict the query to a single chromosome (chr10+), and save the knownGene table with "all fields from selected table" (should be the default) as `knownGene.txt`.

### Exercise
Move `knownGene.txt` to Desktop. What is the command?


    
(*optional*) Secure copy knownGene.txt to TSCC.
* `scp Desktop/knownGene.txt  ewyeo@tscc-login2.sdsc.edu:.`  
    
### Exercise 

`less` and `more` are other commands (besides `cat`) you can use to look at the contents of files. How are they different?
    
See what's in the first n lines (in this case 10)

```
head -n 10 knownGene.txt
```


How many lines are in the file?
* `wc -l knownGene.txt`  
    
Check the count another way using a pipe (`|`), which sends the output of one command to the next command as input
* `less knownGene.txt | wc -l`
* `wc -l knownGene.txt`  
    
What's in the last n lines?
* `tail -n 10 knownGene.txt`  
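
If you don't have `knownGene.txt` at hand, the same commands can be tried on a generated file. This sketch assumes `seq` is available (it is on most Linux and Mac systems) and uses a made-up file name, `numbers.txt`.

```shell
workdir=$(mktemp -d)
cd "$workdir"

seq 1 100 > numbers.txt                       # one number per line, 1 to 100

total=$(wc -l < numbers.txt)                  # count lines directly
head_count=$(head -n 10 numbers.txt | wc -l)  # pipe: count what head printed
last_line=$(tail -n 1 numbers.txt)
# Chain two pipes: the first 10 lines, then the last 3 of those, joined by spaces.
middle=$(head -n 10 numbers.txt | tail -n 3 | paste -s -d ' ' -)

echo "total=$total head_count=$head_count last_line=$last_line middle=$middle"

cd / && rm -r "$workdir"                      # clean up
```

Combining `head` and `tail` this way is a common trick for pulling out a slice from the middle of a file.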
    
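Extract specific columns with `cut`, then join columns side by side with `paste`
* `cut -f 1 knownGene.txt > column1.txt`  
* `cut -f 2 knownGene.txt > column2.txt`  
* `paste column1.txt column2.txt > 2columns.txt`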
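
As a rough illustration of how `cut` and `paste` fit together, the sketch below builds a tiny tab-separated table. The rows are invented for the example and are not real UCSC data.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up three-column, tab-separated table (not real UCSC data).
printf 'geneA\tchr10\t+\n'  > toy.txt
printf 'geneB\tchr10\t-\n' >> toy.txt

cut -f 1 toy.txt > column1.txt                # first column only
cut -f 3 toy.txt > column2.txt                # third column only
paste column1.txt column2.txt > 2columns.txt  # glue them side by side with a tab

cat 2columns.txt
first_row=$(head -n 1 2columns.txt)

cd / && rm -r "$workdir"                      # clean up
```

`cut -f 1,3 toy.txt` would produce the same two columns in one step; `paste` becomes useful when the columns come from different files.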

How many genes have 3 exons?
* `grep -c 'REGEXSEARCHTERM' target.txt`  
    
How many genes have each exon count, from 1 up to the maximum?
* `sort | uniq -c`  
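
The two command templates above can be tried on toy data first. In this sketch the table is invented and the exon count sits in column 2; in `knownGene.txt` it is likely a different column (column 8 in the standard knownGene layout, but check your header line).

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up table: gene name, then exon count (tab-separated).
printf 'g1\t3\ng2\t13\ng3\t3\ng4\t1\n' > toy_exons.txt

# Count genes with exactly 3 exons. The anchors ^ and $ stop "13" from matching.
three=$(cut -f 2 toy_exons.txt | grep -c '^3$')

# Tally how many genes have each exon count. uniq -c only merges adjacent
# duplicates, so the values must be sorted first.
cut -f 2 toy_exons.txt | sort -n | uniq -c

distinct=$(cut -f 2 toy_exons.txt | sort -n | uniq | wc -l)
echo "genes with 3 exons: $three; distinct exon counts: $distinct"

cd / && rm -r "$workdir"         # clean up
```

Without the `^` and `$` anchors, `grep -c '3'` would also count the gene with 13 exons.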
    

## Deleting files and file permissions
Which user are you logged in as?
* `whoami`

What groups is that user associated with?
* `groups`

What is the ownership status of all files in my current directory?
* `ls -lrt`
* [Interpreting output](http://linuxcommand.org/lts0070.php)  

Changing permissions
* `chmod 775`

The three digits indicate which set of users is affected:
* First digit = Owner  
* Second digit = Group  
* Third digit = All other users  

Each digit encodes permissions as a sum of [octal numbers](https://en.wikipedia.org/wiki/Chmod#Octal_modes): read = 4, write = 2, execute = 1. For example, read + execute = 4 + 1 = 5. 775 and 755 are the most common settings: you, the owner, can do everything to your files; the group can either do everything (775) or only read and execute (755); and everyone else ("all" or "world") can read and execute your programs but not overwrite them.

| # | Permission | rwx |
|---|---|---|
| 7 | read, write and execute | rwx |
| 6 | read and write | rw- |
| 5 | read and execute | r-x |
| 4 | read only | r-- |
| 3 | write and execute | -wx |
| 2 | write only | -w- |
| 1 | execute only | --x |
| 0 | none | --- |
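
The digit arithmetic can be checked directly in the shell. This sketch builds a mode from its parts, applies it to a hypothetical file `demo.txt` in a temporary directory, and reads the result back from `ls -l`.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch demo.txt

# Each digit is a sum of read (4), write (2) and execute (1).
owner=$((4 + 2 + 1))   # rwx
group=$((4 + 1))       # r-x
other=$((4))           # r--
mode="$owner$group$other"

chmod "$mode" demo.txt
# The first 10 characters of `ls -l` are the file type plus the three rwx triplets.
perms=$(ls -l demo.txt | cut -c 1-10)
echo "mode $mode shows up as $perms"

cd / && rm -r "$workdir"   # clean up
```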

Changing Files Recursively
* `chmod -R 777 Directory/`
* `chmod -R o-rwx ~/`

Changing executable nature of files
* `chmod +x`
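
A short sketch of what `chmod +x` changes in practice: a script cannot be run directly until its execute bit is set. The script `hello.sh` is a made-up example created on the spot.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A tiny hypothetical script, written with printf so the example is self-contained.
printf '#!/bin/sh\necho "it runs"\n' > hello.sh
chmod 644 hello.sh                      # start from rw-r--r-- (no execute bit)

if [ -x hello.sh ]; then before="yes"; else before="no"; fi
chmod +x hello.sh                       # add execute permission
if [ -x hello.sh ]; then after="yes"; else after="no"; fi

echo "executable before: $before, after: $after"
./hello.sh                              # now the script can be run directly

cd / && rm -r "$workdir"                # clean up
```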

Scratch maintenance occurs every 90 days. To refresh the timestamps of everything in a scratch directory you still need:
* `cd important_scratch_dir`
* `find . | xargs touch`
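
To check which files would look old before and after touching them, something like the following may help. It assumes the cleanup is keyed to modification time, which is a guess; the actual policy of your cluster may use a different timestamp. The file names are invented, and `find ... -exec touch {} +` is swapped in for the `xargs` form because it also copes with spaces in file names.

```shell
workdir=$(mktemp -d)
cd "$workdir"

touch fresh.txt
touch -t 200001010000 stale.txt         # backdate to 1 Jan 2000 for the demo

# Count files last modified more than 90 days ago.
old_before=$(find . -type f -mtime +90 | wc -l)

# Refresh every file's timestamp.
find . -type f -exec touch {} +
old_after=$(find . -type f -mtime +90 | wc -l)

echo "old files before: $old_before, after: $old_after"

cd / && rm -r "$workdir"                # clean up
```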

## Introduction to `awk`

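`awk` is a command-line tool to process text files line by line, splitting each line into fields (`$1`, `$2`, ...) that you can print, test, and do arithmetic on.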

Print every line (another way to do what `cat` does)

```
awk -F "\t" '{print;}' knownGene.txt
```
 
What if we only wanted one column?

```
awk -F "\t" '{print $8;}' knownGene.txt  | head
```

What if we wanted the length of genes?

```
awk -F "\t" '{ len = $5-$4;} {print len;}' knownGene.txt | head
```

Length of all genes summed?
```
awk -F "\t" '{ len = $5-$4;} {tot = tot + len;} END {print tot;}' knownGene.txt | head
```
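
To confirm the arithmetic by hand, the same two commands can be run on a table small enough to add up yourself. The rows below are invented; only the field positions (start in column 4, end in column 5) follow the commands above.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up rows: name, chrom, strand, start, end (tab-separated).
printf 'g1\tchr1\t+\t100\t350\n'    > toy_genes.txt
printf 'g2\tchr10\t-\t1000\t1600\n' >> toy_genes.txt
printf 'g3\tchr1\t+\t50\t90\n'      >> toy_genes.txt

# Per-row length: the block runs once for every line.
awk -F "\t" '{ len = $5 - $4; print $1, len }' toy_genes.txt

# Running total: END runs once, after the last line has been read.
total=$(awk -F "\t" '{ tot += $5 - $4 } END { print tot }' toy_genes.txt)
echo "total length: $total"

cd / && rm -r "$workdir"               # clean up
```

The lengths are 250, 600, and 40, so the total printed is 890. `tot += x` is shorthand for `tot = tot + x`.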

Don't process the header line (introduction to conditionals)
```
awk -F "\t" '{
    if (FNR == 1) {
        next;
    }
    tot = tot + $5 - $4;
}
END { print tot; }' knownGene.txt
```
 
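What if you only want the total length of genes on chromosome 1? (If your file has no `chr1` rows, this prints an empty line; swap in the chromosome you downloaded.)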
```
awk -F "\t" '{
    if (FNR == 1) {
        next;
    }
    chr = $2;
    if (chr == "chr1") {
        tot = tot + $5 - $4;
    }
}
END { print tot; }' knownGene.txt
```
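
The `if` statements can also be written as an `awk` pattern placed in front of the block, which many people find easier to read. The sketch below uses an invented table with a header line to show this alternative form; it is one possible rewrite, not the only way to do it.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up table with a header line: name, chrom, strand, start, end.
printf '#name\tchrom\tstrand\ttxStart\ttxEnd\n' > toy_genes.txt
printf 'g1\tchr1\t+\t100\t350\n'    >> toy_genes.txt
printf 'g2\tchr10\t-\t1000\t1600\n' >> toy_genes.txt
printf 'g3\tchr1\t+\t50\t90\n'      >> toy_genes.txt

# Pattern form: the block runs only for rows where the condition before it
# is true, so no explicit if/next is needed.
chr1_total=$(awk -F "\t" 'FNR > 1 && $2 == "chr1" { tot += $5 - $4 } END { print tot }' toy_genes.txt)
echo "chr1 total: $chr1_total"

cd / && rm -r "$workdir"               # clean up
```

Because `==` compares whole strings, the `chr10` row is not counted as `chr1`; the result here is 250 + 40 = 290.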