<a href="https://colab.research.google.com/github/rzl-ds/gu511/blob/master/003_linux_2.ipynb" target="_parent">
    <img src="https://colab.research.google.com/assets/colab-badge.svg"/>
</a>

# linux crash course (pt 2)

## common commands and tools

finally -- let's get into the real crash course: the commands!

this is not meant to be an all-encompassing tour or even an in-depth walkthrough of these commands. rather, this is a *smorgasbord* of commands you **will** encounter and use, with some categorical information to help you make sense of this unfortunately terse ecosystem

### help and information

there are a couple of built-in help and information facilities for every `linux` command. they mostly require knowing the name of a command first (an unavoidable problem, I think). I'll use `ls` and `python3` as my test commands, but the following should work for most `linux` commands.

##### `man`

the `man` command (short for "manual") is one of the most useful `linux` commands.

`man` (followed by some other command name) will open up a somewhat-standard-formatted manual document with the following information about any command

1. name
2. synopsis
    1. a standard-format description of how to invoke the command
    2. example: `ls [OPTION]... [FILE]...`
    3. anything in `[]` characters is *optional*
3. description
    1. begins with a short paragraph explaining what the program does
4. parameterization
    1. lists all the flags (ways of passing in parameters)
    2. may list configuration files
    3. may list meaningful environment variables
5. trailing information
    + a collection of items such as author, copyright, licensing, and further information

let's look at an example:

```bash
man python
```

*note: exit this viewer by pressing the `q` key*

In [None]:
%%bash
man python

##### command flags

the bulk of what we are looking at `man`ual entries for is the various command line flags we can use to parameterize the command.

to repeat from a prior lecture, a flag is a string starting with a "`-`". generally, there are two types of flags:

1. one dash followed by one character (`-h`)
2. two dashes followed by a spaceless word (`--help`)

usually they come in pairs where one is the full word and the other is the leading character.

sometimes just the existence of a flag signals that some action should be taken -- e.g. `-h` by itself will signal "I need help"

other times it is expected that a value will be provided after the flag, either following an `=` sign (`-f=myfval`) or separated by a space (`-f myfval`), depending on who wrote the program

in practice, there are plenty of little nuances and tricks.

for example, often the single-letter flags can be put together. supposing `-a`, `-b`, and `-c` were all valid flags, some programs will allow you to save time by writing `-abc` instead of `-a -b -c`.

we do this with `ls -alh`.

convenient, but confusing.

*note: also, `java` makes everything different, because it's `java`. `java` command line parameters are passed in as **single** dashes with **full word** strings.*

generally speaking, if you pass every flag as a separate string you won't go wrong. I prefer the full words for clarity's sake. treat everything beyond that as an optimization -- pick it up when you need it, not now.
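to make the "combined flags" idea concrete, here is a minimal sketch that builds a throwaway directory and compares `ls -a -l` against `ls -al`. the directory and file names are made up for the illustration, and it assumes `mktemp` is available (it is on most linux systems).

```bash
# build a scratch directory with one visible and one hidden file
demo_dir=$(mktemp -d)
touch "$demo_dir/visible.txt" "$demo_dir/.hidden"

# the same request, written two ways
separate=$(ls -a -l "$demo_dir")
combined=$(ls -al "$demo_dir")

# compare the two listings
if [ "$separate" = "$combined" ]; then
    flags_agree="yes"
else
    flags_agree="no"
fi
echo "separate and combined flags give the same listing? $flags_agree"

# clean up the scratch directory
rm -r "$demo_dir"
```

if a program ever surprises you with combined flags, falling back to the separate form is always a safe bet.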

##### `-h` and `--help`

the `-h` and `--help` flags are specific flags dedicated to providing help in the form of summary information. these are often focused on that top section of the `man`ual entry for a command rather than the full details. try out:

```bash
python3 -h
```

In [None]:
%%bash
python3 -h

##### `which`

the `which` command will tell you the full path of the program which will be executed by a single command, or print nothing if no such command is found. this becomes particularly useful when you have more than one installation of a program and confusion about which is being used (*cf.* `python` distributions).

```bash
which python3
```

In [None]:
%%bash
which python

note that if a program doesn't exist (or if its file isn't found anywhere along the `$PATH` variable), `which` will print nothing.

```sh
which not_a_program
```
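because `which` prints nothing for a missing program, scripts usually look at its *exit status* instead of its output. a small sketch of that pattern (the variable name is just for illustration):

```bash
# "which" succeeds (exit status 0) when it finds the program and fails otherwise,
# so it can be used directly as the condition of an "if"
if which not_a_program > /dev/null 2>&1; then
    missing_found="yes"
else
    missing_found="no"
fi
echo "not_a_program found? $missing_found"
```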

##### `whatis`

the `whatis` command provides a very short description of any command.

```bash
whatis python3
```

In [None]:
%%bash
whatis python

**<div align="center">mini exercise</div>**

1. pull up the manual entry for `ssh`
2. scroll to the bottom (down arrow or `d`) and back to the top (up arrow or `u`)
3. search for the phrase `private key`
    1. press `/` to enter the "search" mode
    2. type `private key` and press `enter`
    3. to scroll through hits, press `n` (for `n`ext), or capital `N` to step backward to the previous hit
4. quit (`q`)

### navigation and file movement

for most users, not having access to a point-and-click file explorer is the first major hurdle when getting used to linux. there are just a few commands needed to move around the file system and manipulate files, and learning them is critical

##### `pwd`

the `pwd` command (print working directory) will print to the screen the "working" directory -- the directory your session is "in". all *relative* paths will be relative to *this* path

In [None]:
%%bash
pwd

##### `ls`

we've already used this command many times, so you likely already know: to list the files in your current working directory you use the `ls` command.

the default behavior of `ls` is to print only filenames in a list or tabular format:

```bash
ls ~
```

In [None]:
%%bash
ls ~

##### `ls` (cont.)

the default behavior is useful in some instances (especially in scripting), but usually you want more information than just file names. Because of this, I usually invoke `ls` with at least three flags:

1. `-a`: print all files, including hidden files
2. `-l`: print the files in a detailed list format, not just names in a table
3. `-h`: print file sizes in a human-readable format

it is fairly typical for people to create an alias such as `alias ll='ls -alh'` or `alias ll='ls -alF'`. for our ubuntu servers we have a similar alias (swapping `h` for `F`, which appends an indicator character to the names of directories, symlinks, and executables).

```bash
ll ~
```

In [None]:
%%bash
ls -alh ~

##### `cd`

`cd` (short for `c`hange `d`irectory) will do just that -- change the directory you are in. you can pass in relative or absolute paths.

passing no argument will change your working directory to your home directory.

In [None]:
%%bash
cd /tmp
echo "after 'cd /tmp' we are in:"
pwd

cd
echo "after 'cd' with no argument, we are in:"
pwd
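two more `cd` tricks that come up constantly: `..` means "the directory above this one", and `cd -` jumps back to wherever you were before the last `cd`. a quick sketch, assuming `/tmp` exists (it does on essentially every linux machine):

```bash
cd /tmp

# ".." is a relative path meaning "one level up"; from /tmp that is the root directory
cd ..
parent=$(pwd)
echo "after 'cd ..' we are in: $parent"

# "cd -" returns to the previous directory (it also prints it, which we silence here)
cd - > /dev/null
previous=$(pwd)
echo "after 'cd -' we are back in: $previous"
```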

##### `touch`

this command will "touch" a file, which does two things:

1. creates it if it doesn't exist
2. updates the "last updated" timestamp of the file to be right now

this command is mostly useful for creating empty files just to have a filename (when starting a git repo with an empty `README` file, or when you're writing a large tutorial of linux commands, for example).

```bash
touch ~/testfile
ll ~/testfile
```

In [None]:
%%bash
touch ~/testfile
ls -alh ~/testfile

##### `cp`

copy a file from one path to another using `cp`

```bash
cp [file that exists] [file I want to exist]
```

In [None]:
%%bash
cp ~/testfile ~/testfile.bak
ls -alh ~/testfile*

##### `rm`

remove a file with `rm`

```bash
rm ~/testfile.bak
```

In [None]:
%%bash
echo "before 'rm' command:"
ls -alh ~/testfile*
rm ~/testfile.bak
echo
echo "after 'rm' command:"
echo
ls -alh ~/testfile*

##### `mv`

move (or rename) a file with `mv`. the end result is as if you copied the file and then removed the original

```bash
mv [current file name] [new file name]
```

In [None]:
%%bash
echo "before 'mv' command:"
ls -alh ~/testfile*
mv ~/testfile ~/testfile.newname
echo "after 'mv' command:"
ls -alh ~/testfile*

# just cleaning up before next time
rm ~/testfile.newname
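if you want to convince yourself that `mv` leaves nothing behind under the old name, here is a small sketch that works in a scratch directory (file names are made up for the example):

```bash
work_dir=$(mktemp -d)
echo "some content" > "$work_dir/original.txt"

mv "$work_dir/original.txt" "$work_dir/renamed.txt"

# the old name should no longer exist...
if [ ! -e "$work_dir/original.txt" ]; then
    old_gone="yes"
else
    old_gone="no"
fi

# ...and the contents should have travelled with the file
moved_content=$(cat "$work_dir/renamed.txt")
echo "old name gone? $old_gone -- new file contains: $moved_content"

rm -r "$work_dir"
```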

##### `ln`

create a "link" (a shortcut) to a file or directory. Generally, you want to create "symbolic" links (flag `-s`).

```bash
ln -s [thing the shortcut points to] [shortcut name]
```

In [None]:
%%bash
ln -s /tmp ~/tmpshortcut
echo "look at the shortcut pointing to /tmp:"
ls -alh ~/tm*

rm ~/tmpshortcut
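to ask a symbolic link where it points without squinting at `ls` output, the `readlink` command (not covered above, but available on most systems) prints the target. a minimal sketch using a scratch directory:

```bash
link_dir=$(mktemp -d)

# same argument order as above: the real thing first, the shortcut name second
ln -s /tmp "$link_dir/tmpshortcut"

# readlink prints the path stored inside the link
link_target=$(readlink "$link_dir/tmpshortcut")
echo "tmpshortcut points to: $link_target"

# removing the link does not touch /tmp itself
rm "$link_dir/tmpshortcut"
rmdir "$link_dir"
```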

##### `mkdir`

make a new directory with `mkdir`. note: the `-p` (parents) flag will create all of the pieces of the path if they haven't been created before (good for creating folders several levels deep), and also will not throw an error if the directories already exist.

```bash
mkdir -p ~/code
```

In [None]:
%%bash
mkdir -p ~/code
ls -alh ~/code/
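both halves of the `-p` behavior can be seen in a short sketch (the nested path `a/b/c` is just an example):

```bash
base_dir=$(mktemp -d)

# creates a, a/b, and a/b/c in one go
mkdir -p "$base_dir/a/b/c"

# running it again is harmless with -p
if mkdir -p "$base_dir/a/b/c"; then
    with_p="ok"
else
    with_p="error"
fi

# without -p, an existing directory is an error
if mkdir "$base_dir/a/b/c" 2> /dev/null; then
    without_p="ok"
else
    without_p="error"
fi

echo "second mkdir with -p: $with_p, without -p: $without_p"
rm -r "$base_dir"
```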

**<div align="center">mini exercise</div>**

1. `cd` into your home directory
2. create a `code` directory with `mkdir -p ~/code`
3. `cd` into `~/code`
4. verify that was successful with `pwd`
5. list the contents of this (empty) directory with `ll`
6. create a directory for a project `~/code/myproj`
7. create an empty readme file in that directory with `touch ~/code/myproj/README.md`
8. create a second project `~/code/myotherproj`
9. copy the first `README` into the new project directory

### editing and viewing files

no matter what your preference is for editor on your laptop, there will be times where you *must* use terminal editors -- and you should! `vim` and `emacs` are two of the most full-featured and best supported text editors ever made!

there are about twelveteen million articles on which text editor is best. in particular, there is a [nerd culture war](https://en.wikipedia.org/wiki/Editor_war) between `vim` and `emacs` as best editor. I was raised in an `emacs` home. there are, presumably, people who like `nano`. there are pros and cons to all of them.

[Just pick one and learn it!](https://xkcd.com/378/)

##### `vi` and `vim`

this is the oldest of 'em all. `vi` (for "visual") or `vim` (vi improved) is one of the first ever terminal-based editors.

the first thing to know about `vi` is that it is highly optimized for *performing editing actions* and not necessarily for *text entry*.

the idea is that users will wish to do things like cut, paste, delete characters or words, or move around documents just as often as they want to actually type out characters (and probably more).

normally, the action of moving up several paragraphs, copying and pasting a word, and moving back to the bottom may take considerably longer than typing. this time debt is optimized away with an extensive list of single-character shortcuts.

as a result, there are two *modes* within the `vi` editor, and you need to toggle between them to do different things:

1. "normal" mode, where keystrokes are commands that *do* things, and
2. "insert" mode, where keystrokes are literally printed to the document

when you enter `vi` you are in the *normal* mode, so you can't just start typing without accidentally executing a million strange commands.

to move from normal mode to insert mode, you need to type the `i` key. you know, to `i`nsert.

to move from the insert mode to the normal mode, you press the `ESC` key.

most importantly, when you've landed in vi and just want to leave, [ask stack overflow how to exit `vi`](https://stackoverflow.blog/2017/05/23/stack-overflow-helping-one-million-developers-exit-vim/) and they will tell you:

1. be in normal mode (so hammer `ESC` for a bit)
2. press `:`
3. then if you want to
    1. save changes (write to file) and exit: `wq`
    2. just quit: `q`
    3. quit and discard changes: `q!`

**<div align="center">mini exercise</div>**

1. open and exit vim

##### `emacs`

`emacs` (short for *E*diting *MAC*ro*S*) is the primary competitor to `vim`. the best way to describe `emacs` is via a popular backhanded compliment:

> `emacs` is a great operating system, lacking only a decent editor

`emacs` itself is actually a shell for a particular programming language (`lisp`), and as such it can do arbitrarily complicated things. fortunately, there are armies of dedicated hobbyists to make these awful monstrosities:

1. [open your ipython notebooks in emacs](https://github.com/millejoh/emacs-ipython-notebook)
2. [use emacs to browse the web](https://www.emacswiki.org/emacs/eww)
3. [play chess](https://github.com/jwiegley/emacs-chess)
4. [put on a holiday fireplace](https://github.com/johanvts/emacs-fireplace/)
5. [nyan cat mode](https://github.com/wasamasa/zone-nyan)
6. [make sounds like a typewriter](https://github.com/rbanffy/selectric-mode)

who even are these people? bless them.

of course, for every crazy, silly `emacs` package there are hundreds of useful ones.

the process of editing becomes *much* more reliant on chains of keyboard shortcuts and modifier keys. are you aware that in addition to `alt` and `ctrl` there is a `hyper` key? just because no keyboard has it doesn't mean it doesn't exist -- you just have to *BELIEVE*!

you will also (silently) activate context-dependent modes for different file types, which opens up different sets of commands that are specific to that context/file type.

generally speaking, though, you probably only *really* need to know a few things:

1. open a file by pressing `ctrl + x` and then `ctrl + f`
2. save changes to the file by pressing `ctrl + x` and then `ctrl + s` (`ctrl + x` and then `ctrl + w` is "save as": it writes to a new file name)
3. exit by pressing `ctrl + x` and then `ctrl + c`

**<div align="center">mini exercise</div>**

1. open and exit emacs

yeah, sorry, that's a trick question -- it's usually not installed by default on `linux` systems, because it's so much larger than `vim` or `nano`


##### `nano`

`nano` is a great starter option for editors. It uses a small handful of `ctrl`- and `alt`-modified key sequences (like emacs), but is much more like a standard press-arrow-keys-and-type editor. plus, the simple commands are listed at the bottom of the window at all times.

+ the `^` character stands for the `ctrl` modifier, which is the `ctrl` key
+ the `M` character stands for the `meta` modifier, which is the `alt` key or `esc` key (and often both)

<br><div align="center"><img src="http://drive.google.com/uc?export=view&id=18U31Wd1q839QPqyPCqDg2UAq3qfPT2hy"></div>

**<div align="center">mini exercise</div>**

1. open and exit (`ctrl + x`) nano

##### `less`

the `less` command (a play on an older command called `more`, which displayed "more" of a file) is a file *viewer*, not an editor. there are [a couple of useful navigation commands](https://en.wikipedia.org/wiki/Less_%28Unix%29#Frequently_used_commands) you can use within the `less` program (mostly stolen from `vi`). The ones I end up using every time:

+ quit: `q`
+ search for text: `/`
    + once matches have been found, cycle *forward* with: `n`
    + cycle *backward* with: `N`
+ scroll up (half a page): `u`
+ scroll down (half a page): `d`

```bash
less ~/.bashrc
```

note: this doesn't require you to load the whole file, which may make it better for viewing large files than the editors above (see `head` below, too)

**<div align="center">mini exercise</div>**

1. open and exit less (must supply filename, so `less ~/.bashrc`)
2. scroll up (up arrow and `u`) and down (down arrow and `d`)
3. search (`/`) for `alias`
4. quit (`q`)

##### `cat`

`cat` (for concatenate) *prints* files to `stdout` (and thus the terminal). because it prints the *entire* file, this will be problematic for long files. personally, I never use `cat` -- I always use `less`. that's a matter of preference, though. `cat` often is useful in shell scripting

```bash
cat ~/.bashrc
```

In [None]:
%%bash
cat ~/.bashrc

##### `echo`

`echo` will simply take whatever follows it on the line and print it to `stdout` (and therefore the terminal). this is often useful for resolving environment variables (as discussed above).

```bash
echo "user = $USER"
```

In [None]:
%%bash
echo "user = $USER"
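one gotcha worth seeing once: variables are only resolved inside *double* quotes. single quotes print the text exactly as typed. a tiny sketch with a made-up variable:

```bash
# a variable defined just for this example
course="gu511"

# double quotes: $course is replaced by its value
double_quoted=$(echo "course = $course")

# single quotes: $course is left alone
single_quoted=$(echo 'course = $course')

echo "$double_quoted"
echo "$single_quoted"
```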

##### `head`

the `head` command will display the first few lines (by default, 10) of a file (similar to head in the `R` or `python` dataframe contexts -- I wonder where they got the name...)

the flag `-n` modifies the number of records printed.

```bash
head -n 20 ~/.bashrc
```

In [None]:
%%bash
head -n 20 ~/.bashrc

##### `tail`

as the name implies, `tail` is the other half of `head` -- it prints the last few lines (default, 10) of a file.

again, the flag `-n` modifies the number of records printed. 

`tail` also has the flag `-f` for *following* a file. this means that the last `N` rows are printed, but we stay in the viewer process and update live as new lines are written. this can be perfect for watching logs. press `ctrl + c` to quit the "following" process.

```bash
tail -n 20 ~/.bashrc
```

In [None]:
%%bash
tail -n 20 ~/.bashrc
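since everyone's `~/.bashrc` is different, here is a sketch of `head` and `tail` on a file whose contents we know: the numbers 1 through 100, one per line, generated with `seq` (a standard utility, though not one covered above).

```bash
numbers_file=$(mktemp)
seq 1 100 > "$numbers_file"

# first three lines: 1, 2, 3
first_three=$(head -n 3 "$numbers_file")

# last two lines: 99, 100
last_two=$(tail -n 2 "$numbers_file")

echo "head -n 3:"
echo "$first_three"
echo "tail -n 2:"
echo "$last_two"

rm "$numbers_file"
```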

### filesystem info

you may have picked up on this by now, but the linux world is a bit more focused on files and the filesystem. Mac and Windows OS's abstract away a bit of this complexity from users, but linux does not.

There are some commonly occurring commands which deal directly with the file system that you should know about.

##### `file`

this command can be used to figure out roughly what *type* of file a given file is. under the hood, it is performing hundreds of different checks (filesystem checks, known "magic number" byte patterns, and text heuristics) to see if there are any telltale sequences of characters, and making a best guess -- so this is by no means the final story.

```bash
file ~/.bashrc
```

In [None]:
%%bash
file ~/.bashrc

##### `df`

the `df` command (for "disk free") lists the free (and used) space on all available mounted file systems (separate partitions, separate drives, system-use drives). my most common use case is just to see how much free space is available anywhere on the machine:

```bash
df -h
```

but sometimes you only care about the filesystem you are currently working in (usually `/`, but not always):

```bash
df -h .
```

In [None]:
%%bash
df -h

##### `du` (advanced)

the `du` command (short for "Disk Usage") reports the disk space used by every directory (and, with the `-a` flag, every file) under a provided directory. By default, it will look at the current working directory.

let's use this command as a means of exploring something we talked about above: `--help` flags.

start by doing the simplest thing: execute `du` from your home directory.

```bash
du -ah ~
```

*note: the `a` and `h` flags are doing the same things here as they did for the `ls` command: including hidden items and printing file sizes with human readable units.*

In [None]:
%%bash
#du -ah ~

this likely produced a small number of files and the sizes of each. note that it is ordered in a nested way such that every sub-directory is immediately followed by the directory it is in, and the last record is the top level (and has a size that is the sum of all its children).

let's try that same command on a much bigger directory and see what happens:

```bash
du -ah /etc
```

what do you *think* will happen?

In [None]:
%%bash
du -ah /etc/

with so many items printed out, it would be nice if we could limit them -- especially the items that are several levels deep in the tree. let's see if that's possible:

```bash
du --help
```

In [None]:
%%bash
# linux only
#du --help

Let's try out `--summarize`

```bash
du -h --summarize /etc
```

In [None]:
%%bash
# linux only
#du -h --summarize /etc

that's pretty nice. What if I still wanted to know at least the size of the items in the directory, and the size of each sub-directory, but not further? there we could use the `-d` / `--max-depth` flag to specify we are interested in a maximum depth of 1:

```bash
du -h --max-depth 1 /etc
```

In [None]:
%%bash
# linux only
#du -h --max-depth 1 /etc

so why spend time talking about this command?

well, if you've never written a program which accidentally produced a dataset that was too large for your file system, you should really give it a try. and when you do, this set of commands will be fairly invaluable in determining which files are the real offenders and getting rid of them asap.
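as a rough sketch of that "find the offender" workflow: pipe `du` into `sort -n` so the biggest items end up at the bottom. everything here (the scratch directory, the `small` and `big` folders, the file names) is invented for the demo, and it uses `-k` (sizes in kilobytes) instead of `-h` so the numbers sort cleanly.

```bash
scratch=$(mktemp -d)
mkdir -p "$scratch/small" "$scratch/big"

# fill the two folders with random bytes: 1 KiB and 2 MiB
head -c 1024 /dev/urandom > "$scratch/small/tiny.dat"
head -c 2097152 /dev/urandom > "$scratch/big/huge.dat"

# sizes in kilobytes (-k), one level deep (-d 1), smallest first
report=$(du -k -d 1 "$scratch" | sort -n)
echo "$report"

# the last line is the total for the top-level directory itself,
# so the line just before it is the largest sub-directory
biggest_subdir=$(echo "$report" | tail -n 2 | head -n 1 | cut -f 2)
echo "largest sub-directory: $biggest_subdir"

rm -r "$scratch"
```

on a real machine you would point this at something like your home directory and then repeat it one level down inside whichever directory comes out on top.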

##### `tree`

one last file command, and this one is the best of the bunch -- you simply cannot live without it. It's called `tree` and it prints out the directory contents (*a la* `ls`), but in a graphical "tree"-like way such that the relationship between directories, sub-directories and files is visually obvious. It's a life-saver.

it will expand to all depths by default (like `du` above), so let's limit it to only 1 level deep first using the `-L` flag. we'll also want to see all files, so `-a` should be included as well.

```bash
tree -a -L 1 ~
```

In [None]:
%%bash
# linux only-ish
#tree -a -L 1 ~

wat

```
The program 'tree' is currently not installed. You can install it by typing:
sudo apt install tree
```

### packages

we simply *have* to have `tree` installed, but before we go fire off that command in the error message, let's talk for a second about packages.

packages are the linux world standard for installable software. they are basically just compressed (think "zipped", but technically don't think that) directories of all of the files needed to run an application, plus some other files needed to create or install that software and get it to "play nice" with the rest of your operating system.

unlike MSIs or DMGs in the windows and mac world, these are not downloadable programs which install software, but rather sets of files that *one* unified program uses to install software.

for you as a user, you will generally

1. want to install some software
2. know the name of that software
3. ...?

this is where *package managers* come in. a package manager is a program which will

1. find packages of applications (by name, typically)
2. resolve the *dependencies* of that package (look up all of the *other* software that you might need to download in order to have that software work)
3. download the package files and the dependency package files
4. perform any of the installation instructions or configurations in those package files
5. make sure that all parts of the system that "need to know" about new packages are informed

`R` and `python` both come with package managers, and you've used them to install packages for those languages.

each `linux` distribution has a package manager (or two) for doing this for linux packages

there are a handful of different package managers in linux world, but they are usually one-to-one with distributions (operating system variants):

1. `apt` (for "Advanced Package Tool")
    1. the primary package manager in modern debian (including Ubuntu) distributions
2. `apt-get` 
    1. is part of the same project as `apt`, but is the older, lower-level interface (`apt` is the friendlier choice for interactive use; `apt-get` is still common in scripts)
3. `yum` (for "Yellowdog Updater Modified". seriously)
    1. wrapper around `rpm` (below)
    2. primary package manager for Red Hat (RHEL), CentOS, and Fedora (remember those from `ec2`?)
4. `rpm` (for "Redhat Package Manager")
    1. basic package manager for Red Hat
5. downloading files and installing them yourself (usually via `make` and `make install`)
    1. this is possible and obviously a bit more advanced, but sometimes it is useful to be able to install what *you* want instead of what the package maintainer will allow you to install (which can lag behind development by years at times)

enough yakking, let's install `tree` already:

```bash
sudo apt install tree
```

one way we can check that tree is installed is just to run the command again:

```bash
tree -a -L 1 ~
```

finally, one last package to install:

```bash
sudo apt install sl
```

and check:

```bash
sl
```
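running the program is one way to check that an install worked. in a script, a gentler check is to ask the shell whether it can find the command at all. below is a sketch of a small helper for that; the function name `check_installed` is made up here, while `command -v` is a standard shell builtin that succeeds only if the command exists.

```bash
# print whether a command is available, and suggest an install line if not
check_installed() {
    if command -v "$1" > /dev/null 2>&1; then
        echo "$1 is installed"
    else
        echo "$1 is missing; try: sudo apt install $1"
    fi
}

check_installed ls
check_installed not_a_program
```

note that the suggested `apt` line is only a guess: the package name does not always match the command name.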

### process information

you may have experience with the windows "task manager" or the mac "activity monitor", and if so you know how helpful they can be. there are gui versions of those same utilities in linux world, but they are less standard and often require a bit more configuration or user knowledge.

the tools that *are* standard tend to be a little more low-level and also a bit more single-use / niche (in following with the DOTADIW philosophy).

knowing these commands is often essential to doing any sort of debugging of system performance. that being said, you will use these much less as a linux *user* than as an admin.

##### `top` and `htop`

both `top` (Table of Processes) and `htop` (Hisham (author's name) TOP) are programs used to list out all the currently running processes (think of the "processes" tab on task or activity monitor).

both open in an interactive terminal window and can be exited (like `less`) with the `q` key.

try both!

```bash
top
```

note: you will likely have to install htop with `sudo apt install htop`

```bash
htop
```

as far as I can tell, the one and only reason to use `top` instead of `htop` is that `htop` isn't installed and you can't install it

##### `ps`

short for "Process Status", this command prints out some summary information about all running processes. unlike `top` and `htop`, this is a snapshot program, so you do not see updates.

the default behavior is to print out only the running commands initiated by the current user (*e.g.* `ubuntu`), and the following values:

1. PID: the process id, an integer which uniquely identifies that process among all running processes on the system
2. TTY: a value identifying the terminal in which that command is running (may be none for graphical or background processes)
3. TIME: the cumulative CPU time the command has used so far (not the wall-clock time since it started)
4. CMD: the actual executed command

```bash
ps
```

In [None]:
%%bash
# linux only-ish
#ps

it is fairly common to modify this command with the `aux` flags, which will

1. `a`: list commands run by all users
2. `u`: add a column listing the user
3. `x`: include processes that weren't started in a terminal

```bash
ps -aux
```

In [None]:
%%bash
# linux only-ish
#ps -aux

##### `kill` and `pkill`

sometimes you have a long-running process that you realize was a mistake, or has become unresponsive (*aka* a zombie process). it is nice to be able to kill these processes, but often difficult -- especially when there is no gui interface with an X button.

the command to kill a process is, appropriately, `kill`, and it takes that unique process identifier we just saw via `ps`.

```bash
kill [the PID goes here]
```

the common workflow is to run `ps`, look up the PID (this is easier done via `ps` than `[h]top` because `ps` is a static snapshot), and run kill.

under the hood, the `kill` command is sending a *signal* to the running process. there are several different signals that all effectively mean "stop this process", each identified by a name and a number, and they differ in how forcefully they stop it (the numbers are just identifiers, not a ranking). the only two you really need to know are

1. `SIGTERM`, signal number 15
    1. default
    2. this requests the process be "terminated"
    3. graceful: will try and do useful cleanup before quitting (if the process supports such a thing)
    4. not guaranteed to work
2. `SIGKILL`, signal number 9
    1. the "just do it" option
    2. not graceful: kills the process immediately and without cleanup
an alternative to `kill` is to run `pkill`, which takes a *name* instead of a PID and will kill any process running where that name is in the CMD of that process.

this can be dangerous: for example, it is very possible that you might have several `python` scripts running, and only one becomes a zombie. In that case, you will not want to `pkill python`, and will *have* to look up the correct process id and use plain `kill` to stop it.

note: you can also kill a process from within `htop` by selecting the process with the keyboard, pressing `k`, and selecting the signal to send (basically: `15 SIGTERM` or `9 SIGKILL`)
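here is a self-contained sketch of the kill workflow, using `sleep 300` as a stand-in for a runaway process. it leans on two shell features not covered above, so treat them as assumptions of the example: `$!` (the PID of the most recent background job) and `wait` (which reports how that job ended). a process stopped by a signal reports 128 plus the signal number.

```bash
# start a stand-in "long-running process" in the background
sleep 300 &
term_pid=$!

# no signal given, so kill sends SIGTERM (15)
kill "$term_pid"
term_status=0
wait "$term_pid" 2> /dev/null || term_status=$?

# same again, but with the "just do it" signal, SIGKILL (9)
sleep 300 &
kill_pid=$!
kill -9 "$kill_pid"
kill_status=0
wait "$kill_pid" 2> /dev/null || kill_status=$?

echo "after SIGTERM: $term_status, after SIGKILL: $kill_status"
# prints: after SIGTERM: 143, after SIGKILL: 137
```

143 is 128 + 15 and 137 is 128 + 9, which is a handy way to tell after the fact which signal ended a process.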

**<div align="center">mini exercise</div>**

1. start a long-running process with `less ~/.bashrc &` (don't forget the `&`)
    1. the ampersand tells the command line to run the process "in the background", *i.e.* not directly in the current terminal
2. kill that command with `ps` or `htop`
    1. `ps`
        1. run `ps -aux`, and then find the `pid` (second column) for that `less` command
        2. kill that less command with `kill [PID NUMBER]`
        3. run `ps -aux` and make sure the command is gone (killed) now
        4. if it wasn't, try `kill -9 [PID NUMBER]` instead
    2. `htop`
        1. run `htop`
        2. search for the program (`/`)
        3. press `k` to kill it, and select the kill code (first 15, then 9)

##### `shutdown` and `reboot`

you know what these commands will do by the name: they will `shutdown` or `reboot` the computer.

technically, `reboot` is a special instance of `shutdown`: `shutdown -r`.

you can test these if you want. I won't ;)

### utilities

the following commands are a grab bag of utilities I use regularly for various purposes. your mileage may vary

##### `history`

this prints out all of the commands you have recently executed.

```bash
history
```

In [None]:
%%bash
#history 20

note: if you press `ctrl + r` you will be able to type and search backward (a "reverse search") through the commands you previously entered (those in your history) for the most recent one containing what you typed. press `ctrl + r` again to step to older matches.

try it out -- with an empty terminal line, enter `ctrl + r` and then type `apt ` and see what happens.

##### `date`

in the spirit of DOTADIW, `date` is an excellent date utility. the default behavior is this human readable format:

In [None]:
%%bash
#date

it is possible to print out a large number of strings representing different formats and arrangements of time values (current and relative) as desired. 

Let's try out two examples:

```bash
# print out the Y, M, D, and then H, M, and S as a timestamp with 
# periods separating the date from the time characters
date +%Y%m%d.%H%M%S
```

In [None]:
%%bash
#date +%Y%m%d.%H%M%S

and now the same timestamp, but last Friday and then 20 days ago

```bash
date --date="last Friday" +%Y%m%d.%H%M%S
date --date="20 days ago" +%Y%m%d.%H%M%S
```

In [None]:
%%bash
#date --date="last Friday" +%Y%m%d.%H%M%S
#date --date="20 days ago" +%Y%m%d.%H%M%S
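a common reason to want a compact timestamp like this is to build file names that sort chronologically. a minimal sketch (the `notes.` prefix and `.bak` suffix are made up for the example):

```bash
# capture the current time once, so every name built from it agrees
stamp=$(date +%Y%m%d.%H%M%S)

# a hypothetical backup file name built from the timestamp
backup_name="notes.$stamp.bak"
echo "$backup_name"
```

the `$( ... )` syntax runs the command inside it and substitutes its output, which is what lets us store the timestamp in a variable.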

##### `wc`

`wc` is short for "word count", and it does just what you expect -- counts words in strings or files. it has the ability to count bytes, characters, and lines, and to report the maximum line length as well.

in all honesty, while it is *nice* to know the number of words in a file, I use this more often to count the number of *lines* in a file, or *files* in a directory.

if you print the files as a list (`ls -l`) and use `wc -l` to count the lines, you will have the number of files plus one: `ls -l` starts with a `total` summary line. (add `-a` and you also count the hidden files, including `.` and `..`.)

```bash
ls -l /etc | wc -l
```

In [None]:
%%bash
ls -l /etc/ | wc -l
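to see exactly what is being counted, here is a sketch on a scratch directory holding three (made-up) files. with `ls -l` the count includes the leading `total` line; with plain `ls` piped into `wc`, you get one line per file and nothing else.

```bash
count_dir=$(mktemp -d)
touch "$count_dir/a.txt" "$count_dir/b.txt" "$count_dir/c.txt"

# "tr -d ' '" strips the padding some versions of wc put around the number
long_lines=$(ls -l "$count_dir" | wc -l | tr -d ' ')
plain_lines=$(ls "$count_dir" | wc -l | tr -d ' ')

echo "ls -l lines: $long_lines, ls lines: $plain_lines"
# prints: ls -l lines: 4, ls lines: 3

rm -r "$count_dir"
```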

##### `dirname` and `basename`

every path in linux can be described as a full directory name (the list of all directories between the file and root) and the filename. If you consider directories themselves to be named by their final path component (generalizing "filename" to "basename"), every path can be split into a "directory name" and a "base name". The commands `dirname` and `basename` will split any path into those two components

```bash
dirname /this/is/a/test/path/to/file.txt
basename /this/is/a/test/path/to/file.txt
```

*note: this is parsing strings based on path name rules, not looking at actual paths on the actual file system*

In [None]:
%%bash
dirname /this/is/a/test/path/to/file.txt
basename /this/is/a/test/path/to/file.txt
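in scripts these two are usually captured into variables. the sketch below does that, and also shows that `basename` accepts an optional second argument: a suffix to strip from the result.

```bash
full_path=/this/is/a/test/path/to/file.txt

dir_part=$(dirname "$full_path")        # everything before the last "/"
base_part=$(basename "$full_path")      # everything after the last "/"
stem=$(basename "$full_path" .txt)      # same, with the ".txt" suffix removed

# gluing the two halves back together recovers the original path
rejoined="$dir_part/$base_part"

echo "dirname:  $dir_part"
echo "basename: $base_part"
echo "stem:     $stem"
echo "rejoined: $rejoined"
```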

##### `grep`

as I've said about a couple different linux features and utilities now, you could take an entire class on using `grep` (short for "Globally search a Regular Expression and Print" -- catchy). 

the purpose of `grep` is to provide a fast and flexible way of performing generalized text searches (*i.e.* regular expression searches). often we may want to find all of the files in which we referenced a certain variable (*e.g.*, we want to change the variable `LogisticRegression` to `NeuralNet`, because we're feeling *spicy*), or find all instances of a known typo in a single file. this is the primary use case for `grep`.

as an example, let's suppose we want to see all of the aliases we created in our `bash` profile. We could perform a case-insensitive search (`-i`) and print off all the line numbers (`-n`) for all of the files matching the `glob` pattern `~/.bash*`:

```bash
grep -ni alias ~/.bash*
```

In [None]:
%%bash
grep -ni alias ~/.bash*
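your own `~/.bash*` files may or may not contain aliases, so here is the same search as a sketch against a small file with known (made-up) contents. it also uses `-c`, which makes `grep` print a count of matching lines instead of the lines themselves.

```bash
sample_file=$(mktemp)
printf 'alias ll="ls -alF"\nexport EDITOR=vim\nALIAS la="ls -A"\n' > "$sample_file"

# -n prefixes each hit with its line number; -i ignores case, so "ALIAS" matches too
matches=$(grep -ni alias "$sample_file")
echo "$matches"

# -c counts matching lines
match_count=$(grep -ci alias "$sample_file")
echo "number of matching lines: $match_count"

rm "$sample_file"
```

lines 1 and 3 match and line 2 does not. when you search several files at once (as with the `~/.bash*` glob), `grep` also prefixes each hit with the file name.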

**<div align="center">mini exercise</div>**

1. find all the running commands which were executed out of `/usr/sbin`
    1. `ps -aef | grep "/usr/sbin"`
    2. more common to find `python` or `jupyter` sessions and shells, for example
2. find all the times you used the `python` command lately
    1. `history | grep python`

##### `diff`

this utility compares two files to find the differences (hence, `diff`). this is mostly pointless for unrelated files; it is used mostly to compare different iterations / versions of the same file. a big part of working with `git` (next lecture) is looking at `diff`-style output between previous and current versions of code files.

just to get an example of what diff output looks and feels like, let's do the following:

```bash
echo "hello world my name is zach" > ~/test.txt
echo "hello world my awesome name is zach" > ~/test.2.txt
diff ~/test.txt ~/test.2.txt
```

In [None]:
%%bash
echo "hello world my name is zach" > ~/test.txt
echo "hello world my awesome name is zach" > ~/test.2.txt
diff ~/test.txt ~/test.2.txt

`diff` finds lines which disagree (or were added / removed) and prints out the two different versions. it first prints a short change descriptor (here `1c1`: line 1 of the first file was changed into line 1 of the second). the lines as they appear in the "first" file (to the left in the command line statement) get a `<` character in front of them. they are separated from the "second" file's lines (to the right in the command line statement) by a `---` row, and the second / right file's lines are led by a `>` character
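
for comparison, here is a small sketch of the "unified" format (`diff -u`), which is the style of output `git diff` shows. it uses the same two sentences as above, written to a throwaway directory instead of your home directory; the variable names are just for illustration.

```bash
DEMO_DIR=$(mktemp -d)
echo "hello world my name is zach" > "$DEMO_DIR/test.txt"
echo "hello world my awesome name is zach" > "$DEMO_DIR/test.2.txt"

# -u: "unified" format. lines only in the first file start with "-",
# lines only in the second file start with "+"
# diff exits with a non-0 status when the files differ, so "|| true"
# keeps a strict script from stopping here
UNIFIED=$(diff -u "$DEMO_DIR/test.txt" "$DEMO_DIR/test.2.txt" || true)
echo "$UNIFIED"

rm -r "$DEMO_DIR"
```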

##### `tar` and `unzip`

you may not often think about Tape ARchives (`tar`) as a thing that happens, but in a lot of fields archiving is actually a legal compliance obligation (e.g. pretty much the entire financial sector, much of the government). that being said, you don't actually have to create tape archives -- but you will *very* often use the same tool to bundle a collection of files into a single archive (and usually compress it) before sending it to other folks, or to unpack archives sent to you.

in the windows and mac world, you are probably used to finding `.zip` files, and you are familiar with decompressing ("extracting") these archive files into several files and folders. you will soon also become familiar with `tar` and `tar.gz` file extensions -- these are archives in the linux world, and the command you use to compress or decompress files is `tar`.

because of the ubiquity of `zip` files, the `unzip` command is also available on most linux distributions for decompressing `zip` archives.
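
a minimal sketch of the `tar` round trip (create, list, extract). the directory and file names are made up and everything happens in a throwaway directory.

```bash
# throwaway directory with two made-up files to archive
WORK_DIR=$(mktemp -d)
mkdir "$WORK_DIR/my_project"
echo "a,b,c" > "$WORK_DIR/my_project/data.csv"
echo "notes" > "$WORK_DIR/my_project/readme.txt"

# c: create, z: compress with gzip, f: the archive file name comes next
# -C: change into this directory first, so paths in the archive are relative
tar -czf "$WORK_DIR/my_project.tar.gz" -C "$WORK_DIR" my_project

# t: list the contents without extracting anything
tar -tzf "$WORK_DIR/my_project.tar.gz"

# x: extract, here into a separate directory
mkdir "$WORK_DIR/extracted"
tar -xzf "$WORK_DIR/my_project.tar.gz" -C "$WORK_DIR/extracted"
cat "$WORK_DIR/extracted/my_project/data.csv"
```

for a `zip` archive someone sends you, the rough equivalent of the last step is `unzip <archive name>.zip`.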

##### `locate` and `find`

both `locate` and `find` can be used to find files by file names, but `locate` is much faster and simpler for general use. `find` is more useful for collecting names which follow certain patterns, and that in turn is useful for scripting purposes.

until you find yourself reading about `find` on stack overflow posts, you should default to always using `locate`.

```bash
locate .bashrc
```

In [None]:
%%bash
# linux only-ish
#locate .bashrc

##### `sort` and `uniq` (advanced)

these two commands will take in lists of strings and sort them (`sort`, obviously) and remove duplicates / reduce them to a list of unique values (`uniq`). As with `find`, these are usually more useful for scripting purposes, though `sort` finds some general application.
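
a quick sketch with a made-up list of words. one thing worth knowing: `uniq` only collapses duplicates that are *next to each other*, which is why it is almost always used right after `sort`.

```bash
# a made-up list with duplicates, one item per line
FRUITS=$(printf '%s\n' banana apple cherry apple banana apple)

# uniq only collapses *adjacent* duplicates, so sort first
UNIQUE=$(echo "$FRUITS" | sort | uniq)
echo "$UNIQUE"

# uniq -c prefixes each value with its count;
# sort -rn then sorts numerically, largest count first
COUNTS=$(echo "$FRUITS" | sort | uniq -c | sort -rn)
echo "$COUNTS"
```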

##### `xargs` (advanced)

this command is a bit advanced, but let's quickly discuss *what* it does. As we mentioned in the philosophy section, each linux command is meant to act as a filter on an input stream. for historic reasons this was not always implemented in the same way: sometimes commands were built to take in items *only* as command line arguments (instead of reading standard in, so pipes don't work with them), or have a capped number of allowed arguments.

`xargs` was created to solve many of these problems, and to help make sure that even "broken" or "old" processes will follow the linux philosophy.

basically, `xargs` will read a list of items from standard in and run a single command on them, building as many executions of that command as it takes to cover every item in the list.

as a side effect, the ability to chunk up list items into smaller groups allows for multi-threading and parallelization via `xargs`. this is a dirty hack, but [a pretty common one](https://github.com/RZachLamberty/zshell/blob/master/hydra-curl-nofrills.sh)
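
to make that concrete, here is a small sketch using `echo` as a stand-in for a command that only accepts command line arguments. the file names are made up and nothing is actually read from disk.

```bash
# -n 1: hand the command one item at a time, so echo runs three times
ONE_AT_A_TIME=$(printf '%s\n' a.txt b.txt c.txt | xargs -n 1 echo "processing")
echo "$ONE_AT_A_TIME"

# without -n, xargs packs as many items as it can into a single execution
ALL_AT_ONCE=$(printf '%s\n' a.txt b.txt c.txt | xargs echo "processing")
echo "$ALL_AT_ONCE"
```

the parallelization trick mentioned above usually relies on the `-P <n>` flag (available in the GNU and BSD versions of `xargs`), which runs up to `n` of those executions at the same time.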

##### `nohup` (advanced)

one thing that is perhaps obvious by now: what if the command you want to run takes a *long* time and you just kicked it off? 

what does your terminal look like now?

Can you do anything?

what happens if you close the terminal window?

as it so happens, if you start a process in a terminal window, and then close that terminal window, the process you started will generally be killed. this is obviously less than desirable. you could leave terminals open until the processes finish, but what about *permanent* processes?

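one option is `nohup` ("no hang up"), which tells the command that follows to ignore the "hang up" signal that is sent when the terminal closes. it is usually combined with a trailing `&`, which is what actually runs the command "in the background". unless you redirect it yourself, the output of that process (*e.g.* words printed to `stdout`) will be written to a file `nohup.out`, so you should be able to come back at your leisure.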

this is good enough for most things.
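
a common pattern, sketched below, is `nohup` plus an output redirect plus a trailing `&`. the one-second "job" and the log file are stand-ins I made up for the demo; in real life the job would be your long-running script.

```bash
# a throwaway file to hold the job's output
LOG_FILE=$(mktemp)

# nohup: keep running even if the terminal closes
# > ... 2>&1: send output to a file we choose (instead of nohup.out)
# &: run in the background so we get our prompt back right away
nohup sh -c 'sleep 1; echo "job finished"' > "$LOG_FILE" 2>&1 &

# $! holds the process id of the most recent background job
JOB_PID=$!
echo "started job $JOB_PID"

# only for this demo: wait for the job so we can look at its output
wait "$JOB_PID"
cat "$LOG_FILE"
```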

##### `screen` (advanced)

`nohup` is great, but I usually use `screen`. `screen` is a way of creating "pseudo-terminals" that you can access as you wish, but that you do not need to remain logged into. you can think of it as putting your command "in the background", except that there is a real terminal that runs that program and "stays open", and you can re-connect to that terminal if you want.

I pretty much exclusively use `screen` to run my long-running processes (like web apps or persistent API calls / web scrapers). We may get to using `screen` in the future, but for now I will just mention it so you know about this super useful package.

##### `cron` and `crontab`

`cron` is named after the greek word for time, $\chi\rho\omicron\nu\omicron\varsigma$ (*chronos*), and is the *de facto* method for scheduling executions of processes and jobs. many routine operating system jobs are scheduled using this utility, so you absolutely should use it for your scheduling unless you *really* need something more advanced.

basically, a `cron` entry is an execution time pattern and a `bash` command that you want to execute. To avoid writing it for the 18 billionth time, I'll refer to [the corresponding section on wikipedia](https://en.wikipedia.org/wiki/Cron#Overview).

to add your own entries to `cron` you will use the `crontab` utility with the `-e` (edit) flag. this will open an editor, and all you will need to do is add a line with the time pattern and the command
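
to give a feel for what such a line looks like, here is a sketch of one entry. the script and log file paths are hypothetical; the five time fields follow the standard layout described on the wikipedia page.

```bash
# a crontab line is five time fields followed by the command:
#   minute  hour  day-of-month  month  day-of-week  command
# hypothetical entry: run a made-up script every day at 02:30
# and append everything it prints to a log file
CRON_ENTRY='30 2 * * * /home/ubuntu/my_bash_program.sh >> /home/ubuntu/my_log_file.log 2>&1'

# print the line (the quotes keep the shell from expanding the * characters)
echo "$CRON_ENTRY"
```

the printed line is what you would paste into the editor opened by `crontab -e`; afterwards, `crontab -l` lists your current entries.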

what sorts of advanced things might you need that `cron` can't do?

1. inter-job dependencies (job chains)
1. communication between multiple servers
1. event-based job triggers
1. sub-second resolution for job scheduling

### networking

given that we are working on remote desktops, we will now often be interested in networked communication between computers. there are a couple of commands which feature fairly prominently in these interactions.

although this is explicitly a linux tutorial, many of these actions can be done with dedicated analogous programs in windows, and I will discuss those as well

##### `ssh`

you've already used this -- `ssh` is the basic command for all Secure SHell connections between computers.

1. linux, mac: the command `ssh` often comes pre-installed, but if not it can be installed as part of the `openssh` package
2. windows: the industry standard is [PuTTY](https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html)

##### `scp`

`scp` is short for Secure CoPy. basically, this is just an implementation of the copy command using the `ssh` connection protocol. It relies on all the same technology as `ssh` including configuration files and private / public key exchange.

1. linux, mac, recent windows versions: the `scp` command is a part of the `openssh-client` package, so if you have `ssh` installed you will have `scp` available.
2. windows: the industry standard is [WinSCP](https://winscp.net/eng/download.php)

##### `curl` and `wget`

these two tools are the primary command line tools for downloading materials over the HTTP and HTTPS protocols. They each have many, many features, and I have found myself using both for various purposes. I recommend installing and being open to using both (as opposed to learning one well, as I would recommend for editors).

let's just try the simplest thing we can -- download a single, simple test webpage.

```bash
# curl: download and print to the screen
curl https://www.york.ac.uk/teaching/cws/wws/webpage1.html

# curl: download and write to file
curl -o curl.html https://www.york.ac.uk/teaching/cws/wws/webpage1.html

# wget: default behavior is to write to file "webpage1.html" 
#       (basename of url). let's write to a different file name
#       note this flag is *capital* O
wget -O wget.html https://www.york.ac.uk/teaching/cws/wws/webpage1.html
```

1. linux, mac: `curl`, `wget`
2. windows: [curl](https://curl.haxx.se/download.html#Win64), [wget](http://gnuwin32.sourceforge.net/packages/wget.htm)

##### `ftp`, `lftp`, and `sftp` (advanced)

I mentioned it briefly in a previous lecture, but there are a few protocols (rules for constructing messages and sending them to remote services) that are explicitly dedicated to file transfer. They fall into two camps:

1. ftp: the original, stands for File Transfer Protocol
2. sftp: the later, encrypted option, stands for SSH (or Secure) File Transfer Protocol. despite the name, it is a separate protocol that runs over an `ssh` connection

in linux and mac world, each of these protocols has a command of the same name that implements the command line interface (cli) for that protocol. `lftp` is a general purpose command that provides many useful features in addition to the basic `ftp` and `sftp` commands

in terms of usage, these commands will effectively

1. create a connection using the corresponding protocol
2. create a new interactive session for executing ftp or sftp commands
    1. example of commands: GET, PUT, MV, CP
3. logging and error messages are all handled and displayed as needed

the growing use of S3 as a file storage and sharing utility means that *our* file storage will be done in an entirely different way. that being said, ftp and sftp are still ubiquitous. I have used an FTP or SFTP server on every project I have worked on.

**<div align="center">mini exercise (advanced)</div>**

let's do a quick demo of using one of these protocols (the simpler: ftp).

1. in your browser, open the NOAA CLASS (comprehensive large array-data stewardship system) ftp site for satellite data distribution: ftp://ftp-npp.bou.class.noaa.gov/20170827/
2. in your ec2 terminal try the following:

```bash
ftp ftp-npp.bou.class.noaa.gov
# ignore the error message. if prompted,
# just press enter for the user name and password
# or enter anything you want

# then enter help to see the type of commands available
# some should be familiar (ls, cd, pwd)
ftp> help
```

when you log in to an ftp server, you and all users are (by default) dropped off into a single root directory.

let's try to get our bearings by listing out the contents of this root directory in which we find ourselves. for silly reasons, you [need to turn on "passive" mode](https://serverfault.com/a/450655) before you can do anything useful.

```bash
ftp> pass  # turns on passive mode
ftp> ls
ftp> cd 20180830
ftp> ls
```

1. linux, mac: `ftp`, `lftp`, and `sftp`
2. windows: [winscp](https://winscp.net/eng/index.php) (this is my recommendation for both ftp and sftp), also [filezilla](https://filezilla-project.org/)

##### `ping` (advanced)

often we simply want to know if a server exists and is responsive. the act of sending a single "packet" (a single piece of information) over the internet to ask if "anyone is there" is called "pinging", and `ping` is the command which does it.

let's check on google:

```bash
ping -c 5 www.google.com
```

in addition to the fact that we hear back on all 5 of our "pings", we have some additional info:

+ the ip address we received for "www.google.com"
+ the round trip time (about 8 ms)

1. linux, mac: `ping`
2. windows: `ping` is a built-in `cmd` and `powershell` command

##### `mtr` (advanced)

sometimes we would like some more information about how our packets are travelling from our server to others (cf. complaining to comcast about internet latency). `ping` is nice for demonstrating that we can reach a server, but `mtr` is the standard command for debugging *how* we reached a server (called a "traceroute").

```bash
mtr www.google.com
```

in all honesty, I've never performed a similar operation in windows. there is a built-in command, though:

1. linux, mac: `mtr`
2. windows: `tracert` is a built-in `cmd` and `powershell` command

##### `hostname` (advanced)

this one is pretty simple: print out the name of your system's host. this is human-readable text; on your `ec2` server it is built up from your ip address, but it can be any string the admin desires.

```bash
hostname
```

In [None]:
%%bash
hostname

##### `ifconfig` (advanced)

this command is generally used to simply get your IP address. 

```bash
ifconfig
```

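the address you are looking for will appear in the block for your main network interface (often `eth0`, though the name varies by system) after "`inet addr`" (newer versions print just "`inet`").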

on a windows machine, I would probably just google search "what is my ip" and let google figure it out for me.

## subcommands

when we've typed `<command> -h` or `man <command>` above, we've seen things like

```
usage: command [option] ... [-c cmd | -m mod | file | -] [arg] ...
```

some of the `usage` statements will include a *second* "command", often called a subcommand. for example, `git` (see next lecture):

```
usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           <command> [<args>]
```

in the above, there was an element `<command>` within the usage of the `git` command -- this is suggesting that `git` is a *collection* of commands you could enter, and you have options for what you do with `git`.

this is not the same as a `python` package, but you can make an analogy: you can `import pandas` and then do multiple "pandas-things" with that library. you can use the `git` command to do multiple "git-things" with that program -- things like `init` (for starting a `git` repo) or `status` (for asking about the status of a `git` repo)

when we are using programs with subcommands, there is often a distinction between

+ the flags which affect the top-level command (e.g. flags like `--git-dir` that affect how top-level `git` runs), and
+ the arguments and flags passed to the sub-command (e.g. flags that modify how `git status` runs)

you can generally find out what flags affect the top command by typing

```
<command> -h
```

and the subcommands the same way

```
<command> <subcommand> -h
```

for example

```
git -h
```

and for the `status` subcommand

```
git status -h
```

## using the results of commands

### saving the results to a variable

one last piece of information: it is possible to "use" the results of shell commands within shell scripts, or to save these results as variables (this is called "command substitution"). the syntax for doing this is to take a given command (of arbitrary single-line complexity) and to put it inside of parentheses and lead it with a `$` character:

```bash
$([bash command goes here])
```

a common use case for this is acquiring a timestamp for naming files:

```bash
YYYYMMDD=$(date +%Y%m%d)
```

let's try it out:

In [None]:
%%bash
YYYYMMDD=$(date +%Y%m%d)
echo my_file.$YYYYMMDD.csv

another good use case is using timestamps in log files:

In [None]:
%%bash
MSG="result of some command"
echo "$(date +'%Y-%m-%d %H:%M:%S')    my_bash_program.sh    $MSG"

that command above could be appended to a log file, for example:

```bash
echo "$(date +'%Y-%m-%d %H:%M:%S')    my_bash_program.sh    $MSG" >> my_log_file.log
```
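
the same syntax works with pipelines and can be nested. as a small sketch (the variable names are just for illustration), this reuses the `ls | wc -l` counting trick and the `basename` command from earlier:

```bash
# save the result of a whole pipeline: the number of entries in /etc
N_FILES=$(ls /etc | wc -l)
echo "there are $N_FILES entries in /etc"

# substitutions can be nested: the inner $(pwd) runs first, and its
# result becomes the argument to basename
CURRENT_DIR_NAME=$(basename "$(pwd)")
echo "the current directory is named $CURRENT_DIR_NAME"
```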

### checking if a command was successful

there are a few special variables exposed in bash that are *not* in the `env` list. One of these is `$?`, which holds the exit status code of the previous command.

Every linux command is expected to return `0` if the command was successful and a non-0 value (between 1 and 255) if it was not.

Try the following

```bash
# we know this will work
ls ~/

# the previous command worked; this should be a 0
echo $?

# we know this *shouldn't* work
ls a_directory_that_doesnt_exist

# the previous command didn't work, this should be non-0
echo $?
```

In [None]:
%%bash
# linux only-ish
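
in scripts you rarely `echo $?` by hand; more often you branch on the exit status directly. here is a rough sketch of the usual ways to do that, assuming (as above) that no directory named `a_directory_that_doesnt_exist` exists where you run it. the variable names are just for illustration.

```bash
# "if" runs a command and branches on its exit status (0 means success)
if ls ~/ > /dev/null 2>&1; then
    HOME_CHECK="listing the home directory worked"
else
    HOME_CHECK="listing the home directory failed"
fi
echo "$HOME_CHECK"

# capture the status of a command we expect to fail:
# the part after || only runs when ls returns a non-0 status
STATUS=0
ls a_directory_that_doesnt_exist > /dev/null 2>&1 || STATUS=$?
echo "exit status: $STATUS"

# && runs the next command only if the previous one succeeded,
# || only if it failed
ls a_directory_that_doesnt_exist > /dev/null 2>&1 && RESULT="found it" || RESULT="not found"
echo "$RESULT"
```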

<div align="center">have we become... ***TOO*** powerful?</div>
<div align="center"><img src="https://techviral.net/wp-content/uploads/2015/04/Why-Hackers-Use-Linux.jpg"></div>

# END OF LECTURE

next lecture: [`git`](003_git.ipynb)