# Command Line Basics
---
### Quick Overview

Virtually all programmers and data scientists make extensive use of the terminal, and knowing how to interact with it is a critical skill. Every operating system has a slightly different terminal with different commands and syntax. Because Linux and OS X both descend from, or are closely modeled on, an older operating system called UNIX, their terminals are very similar. Windows isn't based on UNIX, so it has different commands. Linux and OS X are more common for data science work, so we will use Linux.

The terminal we'll be using runs a shell called Bash, which is also a command language. Bash gives us a text-based alternative to the GUI, and once you know the commands, many tasks are faster to do by typing than by clicking. One especially handy Bash feature is tab completion: type the first few characters of a command, file name, or variable, press the Tab key, and Bash fills in the rest (press Tab twice to list the possible matches). The terminal is also where we install tools. On Debian-based Linux distributions such as Ubuntu, the `apt-get` program installs software such as `docker`, `pip`, and even command line games.

In a terminal, the dollar sign (`$`) at the start of the line is the command prompt. Anything we type to the right of it is a shell command, which runs as soon as we press the Enter key.


---
### First Command

The first command we'll learn is `pwd`. This command returns the current directory (folder you're located in). It's an acronym that stands for **print working directory**.

Much of the work in a terminal involves moving between directories, so it helps to check where you are before running other commands.

```shell
pwd
```

---
### Changing Directories

We can use the `cd` command (which stands for **change directory**) to switch directories. For example, typing `cd /` switches to the **root** directory. The root directory, `/`, is the top of the filesystem; every other file and directory lives somewhere beneath it.

```shell
cd /
```

---
### Absolute/Relative Paths

When we typed the forward slash (`/`), the terminal switched to the root directory. Any path that starts with `/` is an **absolute path**, written in relation to the root of the filesystem. **Relative paths**, on the other hand, don't start with a forward slash and are interpreted relative to the directory we're currently in. If we're in the `/home` folder, for example, typing `cd Anton` will move us to `/home/Anton`.

```shell
cd Anton
```
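To see both kinds of path side by side, here is a small sketch. It works inside a throwaway directory from `mktemp -d`, so nothing real is touched. The `base` variable is an illustrative stand-in for `/home`:

```shell
# Scratch directory standing in for /home (illustrative setup, not a real home folder)
base=$(mktemp -d)
mkdir "$base/Anton"

cd "$base"         # absolute path: starts with /
cd Anton           # relative path: resolved against the current directory
pwd                # prints the value of $base followed by /Anton

cd /               # jump to the root directory
cd "$base/Anton"   # an absolute path works no matter where we start
pwd                # prints the same directory again
```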

---
### User

Most popular operating systems have a concept of users. Users have certain permissions within the system, can create their own files, and run their own programs. Users can also restrict other users from accessing their files and running their programs.

We can check which user we're logged in as with the `whoami` command.

```shell
whoami
```

---
### Home Directory

Every user has their own home directory within `/home` where they can add files specific to their username. For example, the home directory for `Anton` is at `/home/Anton`. The tilde (`~`) is a shortcut for referring to the home directory. Typing `cd ~` will automatically take us to the current user's home directory. 

```shell
cd ~
```
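A quick way to check what `~` stands for is to compare it with `$HOME`, an environment variable that holds the same path. Variables are covered later in this guide. This sketch also shows that running `cd` with no argument takes you home as well:

```shell
echo ~         # the shell expands ~ to the current user's home directory
echo "$HOME"   # the same path, read from the HOME environment variable

cd /           # start somewhere else
cd ~           # back to the home directory
pwd

cd /
cd             # cd with no argument also goes to the home directory
pwd
```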

---
### Creating Directories

We can also create files and directories with the terminal. We'll explore how to make a directory first. We can do this with the `mkdir` command. We just have to type `mkdir test` to create a directory called `test`. This will also add a folder icon to the corresponding location in your GUI file browser. 

Note that the rules about absolute and relative paths apply here, and in almost every command that involves paths. If we type `mkdir test`, the shell will make a directory called `test` in the current folder, because it's a relative path. If we type `mkdir /home/Anton/test` instead, the shell will make a folder called `test` inside the `/home/Anton` folder, because it's an absolute path.

```shell
mkdir test
```

---
### Options

Commands have *options* that can modify their behavior. We specify these options by adding them, preceded by one dash, after we invoke the command.

For example, adding the `-v` option, or "flag," after the `mkdir` command will turn on *"verbose"* mode, and print output when it makes the folder.

Because verbose mode reports each action as it happens, it's useful whenever you want confirmation of what a command actually did. As you work through the command line lessons, you may find it helpful to turn on verbose mode while you practice each command.

```shell
mkdir -v test2 # needs a directory name; prints a line confirming test2 was created
```

---
### Reviewing Available Command Options

Most commands will let us use the `--help` flag to see all of their possible options. As with other flags, `--help` comes after the command name. When you use it, you don't need to specify a directory or file. The command recognizes the flag, prints its help text, and exits without doing anything else.

```shell
mkdir --help
```

---
### Listing The Contents

Now that we've made a couple of directories, let's see what's in our home folder. We can use the `ls` command to list all of the files and folders in a directory. If we pass in the `-l` option, `ls` uses a long format. It shows one entry per line, with details such as permissions, owner, size, and modification time.

```shell
ls -l
```

---
### Removing A Directory

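We can use the `rmdir` command to delete a directory. `rmdir` only removes *empty* directories. If the directory still contains files or folders, it refuses and prints an error.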

```shell
rmdir test
```
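Because `rmdir` only deletes empty directories, the following sketch shows one call succeeding and one being refused. It runs in a scratch directory, and the directory names are made up. `touch`, covered in the next section, creates an empty file:

```shell
cd "$(mktemp -d)"          # work in a throwaway directory
mkdir empty_dir full_dir
touch full_dir/note.txt    # put a file inside full_dir

rmdir empty_dir            # succeeds: the directory is empty
rmdir full_dir || echo "rmdir refused: full_dir is not empty"
ls                         # only full_dir is left
```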

---

# Working With Files
---
### Making A File

Now we'll look at files more closely and learn how to interact with them on the command line.

First, we need to **create** a file. While there are several ways to do this, we'll start with the `touch` command. This command will create an empty file with the name we give it. For example, typing `touch file.txt` will create a new file called `file.txt` in the current directory. We can open the file and edit it later on if we want.

We can also use the `touch` command to update a file's access and modification timestamps to the current time without changing its contents. You can read more about the touch command on [Wikipedia](https://en.wikipedia.org/wiki/Touch_(Unix)).

```shell
touch test.txt
```
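The sketch below contrasts the two uses of `touch` described above. It runs in a scratch directory. It uses `touch -t`, which sets a specific timestamp in the format `YYYYMMDDhhmm`, so the change is easy to spot. `echo ... >` writes text to a file and is explained under Redirecting Standard Streams; `cat` prints a file's contents:

```shell
cd "$(mktemp -d)"                 # throwaway directory
touch new.txt                     # new.txt doesn't exist yet, so touch creates it (empty)
echo "keep me" > notes.txt        # a file that already has content
touch -t 202001010000 notes.txt   # existing file: only its timestamps change (to 2020-01-01 00:00)
ls -l                             # notes.txt now shows a 2020 date
cat notes.txt                     # the content is untouched: keep me
```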

---
### Standard Streams

Now that you've created the `test.txt` file, you can add text to it in a few different ways. The first is with the `echo` command, which simply prints whatever you tell it to as output. If you type `echo "Dataquest is awesome"`, it will print `Dataquest is awesome`.

It prints this text into a stream called **standard output**, or `stdout`. Every program starts with three standard streams. It writes normal output to `stdout` and can read input from **standard input** (`stdin`). Whenever a program encounters an error while running, it writes the error message to **standard error** (`stderr`). These standard streams are how programs show us output in the terminal, and how we enter input.

`stdout` and `stderr` usually display on the monitor, while `stdin` is usually the input from the keyboard. In this case, `echo` doesn't read from `stdin` at all. It takes the string we passed as an argument and prints it to `stdout`. By default, we see what it prints to `stdout` because that stream is connected to the monitor.

The interfaces look something like this:

```
                          Stdout
           Stdin          ----->
|Keyboard| ----> |Program|      > |Display|
                          ----->
                          Stderr
```

`stdout`, `stderr`, and `stdin` exist because these standard streams allow the interfaces to be abstract. A program doesn't need to care whether it's getting input from a keyboard, file, or somewhere else. A program also doesn't need to care if it's outputting to the display, a file, or somewhere else. The standard streams allow us to hook various inputs and outputs up to programs without the programs having to concern themselves with what those inputs and outputs are.

```shell
echo "Hello There"
echo Hello There # same output in the console
```

---
### Redirecting Standard Streams

We can redirect standard streams in order to connect them to different sources. For example, we can connect `stdout` to a file. Afterwards, the program the stream is connected to will write to a file instead of the screen.

To redirect, we use the greater than sign (`>`). For example, `echo "Dataquest is awesome" > dataquest.txt` tells the shell to connect `echo`'s `stdout` to the file `dataquest.txt` before running `echo`. As a result, `Dataquest is awesome` is written to `dataquest.txt` instead of the screen. If `dataquest.txt` already exists, `>` overwrites its contents.

```shell
echo "Hello There" > test.txt
```
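Two details are worth seeing in action. First, `>` replaces whatever the file held before. Second, error messages travel on `stderr`, which a plain `>` doesn't capture. This sketch runs in a scratch directory and assumes `missing_file` does not exist. In it, `2>` redirects stream number 2 (`stderr`), and `cat` prints a file's contents:

```shell
cd "$(mktemp -d)"                       # throwaway directory
echo "Hello There" > test.txt           # creates test.txt and writes the text into it
echo "Dataquest is awesome" > test.txt  # > again: the old content is replaced, not appended
cat test.txt                            # prints: Dataquest is awesome

# ls fails here; its error message goes to stderr, so it lands in err.txt, not out.txt
ls missing_file > out.txt 2> err.txt || echo "ls failed, as expected"
cat err.txt                             # shows the "No such file or directory" message
```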

---
### Editing A File

We can also edit a file directly from the terminal, without redirection. While there are a few programs that let us do this, the simplest is called `nano`. **Nano** is a command line text editor that lets us edit and save files directly from the terminal.

To run `nano`, type `nano`, followed by the name of the file you want to edit. For example, `nano test.txt` will open the `test.txt` file for editing.

Once a file is open, we can make whatever changes we want, then hit `ctrl+x` to quit. When we quit, the terminal will prompt us to save our work. Typing `Y` (for yes), then pressing Enter will save all changes.

```shell
nano test.txt
# GNU nano opens test.txt inside the terminal
# Edit (or even delete!) the text in the file
# Press Ctrl+X, then Y and Enter, to save and close (nano shows Control-key shortcuts with a '^', e.g. ^X; they can be entered with the Ctrl key or by pressing the Esc key twice)
# You're back in the shell!
```

---
### Overview Of File Permissions

In Unix, every file and folder has permissions associated with it. These permissions have three scopes:

* `owner` - The user who owns the file or folder (normally the user who created it)
* `group` - Users who belong to the file's group (on Unix systems, users can be placed into groups, and each file is assigned to one group)
* `everyone` - All other users on the system who aren't the owner or in the file's group

Each scope can have any of three permissions (a scope can have multiple permissions at once):

* `read` - The ability to see what's in a file (if defined on a folder, the ability to see what files are in a folder)
* `write` - The ability to modify a file (if defined on a folder, the ability to create, delete, and rename files in the folder)
* `execute` - The ability to run a file (some files are executable, and need this permission to run)

Each permission can be granted or denied to each scope.

You can view the permissions on files and folders using `ls -l`. This command will display the permissions to the left of each file name. Here's an example of what the output looks like:

```shell
ls -l
total 4
-rw-r--r-- 1 dq dq 10 Nov 14 00:08 test.txt
```

In the example above, the permissions for the file `test.txt` are `-rw-r--r--`. There are 10 characters in that string.

```
Breaking down the permissions string
  -      rw-      r--      r--
Ignore   Owner    Group    Everyone
```

Don't worry about the first character for now. Starting at the second character, the permissions are split into three groups: one for the `owner`, one for the `group`, and one for `everyone`.

The `owner` has the permissions `rw-`, which corresponds to characters `2` through `4` in the string. This means that the owner can ***r**ead* and ***w**rite* the file, but not execute it. The first character represents read permissions, the second represents write permissions, and the third execute permissions. The character for **read** is `r`, the character for **write** is `w`, and the character for **execute** is `x`. If a scope doesn't have a permission, it displays as a dash (`-`). If the permissions for the owner were `rwx` instead, the owner would be able to execute as well.

The permissions for `group` are represented by characters `5` through `7` in the string, which corresponds to `r--`. This means that people in the owner's group can only read the file.

The permissions for `everyone` are `r--`, meaning that anyone who has an account on this machine can read the file.

---
### Octal Notation For File Permissions

We just looked at file permissions that looked like this:

`-rw-r--r--`

We call this symbolic notation for permissions, because it expresses each permission as a symbol. The downside to symbolic notation is that if we want to change permissions, it takes a long time to type the changes out. We can do this more quickly by representing permissions with **octal notation**.

**Octal notation** lets us represent the permissions for all three scopes with just `3` digits (plus an optional leading fourth digit), rather than the `10` characters involved in symbolic notation. There are `8` possible combinations of the three permissions `r`, `w`, and `x`, so we can express each scope's combination as a single digit in the octal (base `8`) counting system.

Here are the combinations and their corresponding digits:

* `---` : No permissions; corresponds to `0`
* `--x` : Execute only permission; corresponds to `1`
* `-w-` : Write only permissions; corresponds to `2`
* `-wx` : Write and execute permissions; corresponds to `3`
* `r--` : Read only permissions; corresponds to `4`
* `r-x` : Read and execute permissions; corresponds to `5`
* `rw-` : Read and write permissions; corresponds to `6`
* `rwx` : Read, write, and execute permissions; corresponds to `7`
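The list follows a simple pattern. `r` is worth `4`, `w` is worth `2`, and `x` is worth `1`, and each digit is the sum of the permissions that are switched on. We can check the conversion of `-rw-r--r--` with Bash's `$(( ))` arithmetic; the variable names here are only for illustration:

```shell
owner=$(( 4 + 2 ))     # rw- : read (4) + write (2) = 6
group=$(( 4 ))         # r-- : read only = 4
everyone=$(( 4 ))      # r-- : read only = 4
echo "0$owner$group$everyone"   # prints 0644
echo $(( 4 + 2 + 1 ))           # rwx : prints 7
```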

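We can use this system to convert the permissions string `-rw-r--r--` to `0644`. When reading octal notation, we read the digits from left to right. The last three digits are the permissions for the owner (`6`, `rw-`), the group (`4`, `r--`), and everyone else (`4`, `r--`). The first digit in the sequence sets *special modes* (setuid, setgid, and the sticky bit), and `0` means none are set. For more information on what the first digit actually is, you can visit this article on [setuid](https://en.wikipedia.org/wiki/Setuid).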

We can pull up a file's octal permissions with the `stat` command. Typing `stat test.txt` will show us some information about the file `test.txt`, including the octal permissions.

The `stat` command returns quite a bit of detailed information about the file. Let's focus on the permissions for the moment; we can find them in the `Access` section.

```shell
stat test.txt
  File: test.txt
  Size: 0         	Blocks: 0          IO Block: 4096   regular empty file
Device: 802h/2050d	Inode: 5268524     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/   anton)   Gid: ( 1000/   anton)
Access: 2019-02-19 02:56:32.903224161 +0300
Modify: 2019-02-19 02:56:32.903224161 +0300
Change: 2019-02-19 02:56:32.903224161 +0300
 Birth: -

```

---
### Modifying File Permissions

Now that we understand file permissions, we can modify them using the `chmod` command. If we pass in an octal permissions string and a file name, the command will modify the file to assign it the permissions we specified in the string.

Typing `chmod 0664 test.txt` will give the `owner` read and write permissions, the `group` read and write permissions, and `everyone` else read-only permissions. The first digit is completely optional when granting permissions without setting the user id, group id, or sticky bits. Don't worry about the user id, group id, or sticky bits; they're outside the scope for now.

```shell
chmod 0760 test.txt # owner rwx, group rw-, everyone ---
```
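To confirm that `chmod` did what we expected, we can ask `stat` for just the permissions. This sketch assumes GNU `stat` as found on Linux. There, `-c` takes a format string: `%a` is the octal permissions, `%A` is the symbolic string, and `%n` is the file name. The `stat` on macOS uses different options:

```shell
cd "$(mktemp -d)"              # throwaway directory
touch test.txt
chmod 0760 test.txt
stat -c '%a %A %n' test.txt    # prints: 760 -rwxrw---- test.txt
chmod 644 test.txt             # the leading 0 can be left out
stat -c '%a %A %n' test.txt    # prints: 644 -rw-r--r-- test.txt
```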

---
### Moving Files

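We can move files with the `mv` command. Typing `mv test.txt /tomove` will move the `test.txt` file into the `/tomove` folder. This assumes that `test.txt` is in the current directory and that `/tomove` is an existing directory. If `/tomove` doesn't exist, `mv` instead renames `test.txt` to a file called `/tomove`.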

```shell
mv test.txt /tomove
# mv test.txt tomove (no leading slash) targets ./tomove in the current directory, so it's only the same when the current directory is /
```
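Here is a sketch of the difference between a relative and an absolute destination. It uses a `tomove` folder created inside a scratch directory instead of `/tomove`, because creating `/tomove` would need root permissions. The `work` variable is only for illustration:

```shell
work=$(mktemp -d)            # throwaway directory
cd "$work"
mkdir tomove
touch test.txt other.txt

mv test.txt tomove           # relative destination: means $work/tomove
mv other.txt "$work/tomove"  # absolute destination: the same folder, spelled out in full
ls tomove                    # lists other.txt and test.txt
```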

---
### Copying Files

Instead of moving a file, sometimes you'll want to make a copy, and then move that copy somewhere else. The `cp` command is useful for this. `cp test.txt test2.txt` will copy the `test.txt` file, and create a new file called `test2.txt` containing the contents of `test.txt`.

```shell
cp test.txt test2.txt
```
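Unlike `mv`, `cp` leaves the original in place. Here is a quick sketch in a scratch directory. It uses `cmp`, a standard tool that compares two files byte by byte, to confirm that the copy matches:

```shell
cd "$(mktemp -d)"                # throwaway directory
echo "Hello There" > test.txt
cp test.txt test2.txt
ls                               # both test.txt and test2.txt are listed
cmp test.txt test2.txt && echo "the copy matches the original"
```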

---
### File Extensions

Files typically have extensions like `.txt` and `.csv` that indicate the file type. Operating systems use them to determine the default program to open a file with. On Windows, for instance, a text editor will be the default program for files with the `.txt` extension.

Rather than relying only on extensions, Unix-based operating systems like Linux often work with **media** types, which are also called **MIME** types. The MIME type `application/pdf` indicates that a file is a `pdf`, and the MIME type `image/png` indicates that a file is a `png` image. The *first* part of the MIME type string is *for the type*, such as `application` or `image`, and the *second* part is *for the subtype*, such as `pdf` or `png`.

There are MIME types for nearly every kind of file. Linux doesn't store the MIME type alongside the file. Instead, tools such as the `file` command work it out by inspecting the file's contents, for example the signature bytes at the start of a `png` or `pdf`. As a result, Linux can often determine a file's type and open it properly, even if it doesn't have an extension.

We can rename files and remove extensions whenever we want. In fact, we'll often come across files that don't have extensions, such as `test`.

Specifying a folder as the second argument to `mv` will preserve the file name, and move it into the folder. *If we specify a full path instead, including the file name*, it will move the original file to the new file name, *essentially renaming* it. For example, `mv test.txt test2.txt` will move the file `test.txt` to `test2.txt`. This will basically rename `test.txt`.

```shell
mv test.txt test_no_extension
```
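We can check that the type survives the rename with the `file` command. Its `--mime-type` option reports a MIME type based on the file's contents. This sketch assumes `file` is installed, as it is on most Linux distributions, and runs in a scratch directory:

```shell
cd "$(mktemp -d)"                        # throwaway directory
echo "Hello There" > test.txt
mv test.txt test_no_extension            # rename: the extension is gone
file --mime-type test_no_extension       # prints: test_no_extension: text/plain
```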

---
### Deleting A File

We can delete a file with the `rm` command. Typing `rm test.txt` will remove the `test.txt` file, for example, provided that it's in the current directory.

```shell
rm test_no_extension
```

---
### Bypassing Permissions As The Root User

Unix systems have a special user called the root user. We can run commands as the root user using `sudo`. Adding `sudo` to the beginning of any command will run that command as the root user.

For example, typing `sudo rm test.txt` will delete the `test.txt` file as the root user. This is useful in situations where the current user doesn't have permission to delete the file. The root user has all permissions and access to all files by default.

Before running the command, `sudo` will typically ask for a password. On most Linux systems this is your own user's password. That makes sense, given how much power the root user has.

```shell
sudo rm test2.txt
```

---

# Working With Programs
---

### Setting Variables

Bash is essentially a program that lets us run other programs. It does this by implementing a command language. This language specifies how to type and structure the commands we want to execute.

A command language is a special kind of programming language through which we can control applications and the system as a whole. Just like Python and other programming languages, we can use Bash to create scripts, set variables, and more. Because it's a language, *Bash is far more powerful than a graphical shell*.

For example, we can set variables on the command line by assigning values to them. Variable names can contain letters, numbers, and underscores, but can't start with a number. By convention, environment variables (covered below) and many shell variables are written entirely in uppercase. Bash stores variable values as strings, whether they look like text or numbers. Here are a few examples of how we can set variables on the command line:

```shell
OS=linux

OPERATING_SYSTEM="linux"
```

Both of the variables above, `OS` and `OPERATING_SYSTEM`, will actually end up with the same value. That's because quotes are optional when using strings in Bash, unless the string contains spaces or other characters that have a special meaning to the shell. Bash uses spaces to separate words, so strings that contain them won't work properly if we don't surround them with quotes.

This assignment won't work, for example:

```shell
ANIMAL=Shark with a laser beam on its head
```

This one will:

```shell
ANIMAL="Shark with a laser beam on its head"
```

It's also important to avoid adding stray spaces around the equals sign. For example, this assignment will fail:

```shell
ANIMAL = "Shark with a laser beam on its head"
```
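Both failures make more sense once you see what Bash does with them. In the unquoted version, Bash sets `ANIMAL=Shark` only for a command named `with`. In the spaced version, it tries to run a command named `ANIMAL`. This sketch assumes neither command exists, so both fail with a "command not found" error and leave the variable unchanged:

```shell
ANIMAL="Shark with a laser beam on its head"
echo "$ANIMAL"   # prints the whole phrase

# Unquoted: Bash reads this as "run the command 'with', with ANIMAL=Shark set just for it"
ANIMAL=Shark with a laser beam on its head || echo "unquoted assignment failed"

# Spaces around =: Bash tries to run a command named ANIMAL with two arguments
ANIMAL = "Shark with a laser beam on its head" || echo "spaced assignment failed"

echo "$ANIMAL"   # still prints the whole phrase from the first line
```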

---
### Accessing Variables

In Bash, we can access the value of a variable again after we set it, just like we can with other programming languages like Python. There's one major difference, though -- we need to add a dollar sign to the beginning of the variable name when we try to retrieve its value.

If we create a variable named `FOOD` with the value `Shrimp gumbo`, for example, we'll need to use `$FOOD` when we want to access the value again later. This is because typing `FOOD` at the command prompt will attempt to call the command `FOOD`. It will return an error, because there's no executable command named `FOOD` in `PATH`. We'll discuss `PATH` more in-depth later, but for now: `PATH` is an environment variable that lists the directories where the shell looks for executable programs.

Another difference between Python variables and Bash variables is that when we type `$FOOD` at the command prompt, it will resolve to the value of the variable, or `Shrimp gumbo`. By default, Bash will try to turn this value into a command named `Shrimp`, and then call it. Because there's no executable named `Shrimp` in `PATH`, this will generate an error.

If we want to see the value of a variable named `FOOD`, we'll need to type `echo $FOOD`. Bash expands this to `echo Shrimp gumbo`, which prints `Shrimp gumbo` to `stdout`.

```shell
FOOD="Shrimp gumbo"
echo $FOOD # prints Shrimp gumbo
```

---
### Setting Environment Variables

So far, we've been creating shell variables. We can only access these variables within the Bash shell. 

Another type of variable is an environment variable. We can access these through any program we run from the shell.

We can create [environment variables](https://en.wikipedia.org/wiki/Environment_variable) using the `export` command. For example, `export FOOD="Chicken and waffles"` will create an environment variable called `FOOD`.

```shell
export FOOD="Chicken and waffles"
```
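One way to see the difference is to start a second Bash process with `bash -c`, which runs a command string in a new child shell. In this sketch, `DRINK` is a hypothetical shell variable that isn't exported:

```shell
DRINK="Sweet tea"                  # shell variable: stays inside this shell
export FOOD="Chicken and waffles"  # environment variable: passed on to child programs

# Single quotes stop this shell from expanding the variables; the child shell expands them
bash -c 'echo "FOOD is: $FOOD"'    # prints: FOOD is: Chicken and waffles
bash -c 'echo "DRINK is: $DRINK"'  # prints: DRINK is:   (the child never received DRINK)
```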

---
### Accessing Environment Variables

We can run many programs from Bash, including Python. To run the Python interpreter from the Bash shell, we type `python` at the command prompt.

Once we're inside the Python interpreter, we can access the environment variables with commands that look like this:

```python
import os

print(os.environ["FOOD"])
```

First, we imported the [`os` package](https://docs.python.org/3.5/library/os.html), which is built into the Python standard library. It contains many useful functions for working with the operating system.

Then, we used `os.environ`, a dictionary containing all of the values for the environment variables. We can access any environment variable by specifying it as a key, just like we can with any Python dictionary.

This should give you a feel for the power of environment variables -- we can use them to set configuration in Python scripts and other places. This functionality is useful when configuration is secret (like with access keys), or changing rapidly.

```shell
python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> print(os.environ["FOOD"])
Chicken and waffles
>>> exit()
```

---
### Calling Programs

Previously, we accessed Python by typing `python` in the shell. We can run many programs this way. There's nothing special about a program -- it's just a file somewhere on the system.

We can access any program by typing its full path. Python itself is a program, and on the Linux system used here, the full path for the Python 3 interpreter is `/usr/bin/python3`.

```shell
/usr/bin/python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
```
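Rather than guessing where a program lives, we can ask the shell. The built-in `command -v` prints the full path the shell would run for a given name. The exact paths vary between systems, so treat the ones in the comments as examples:

```shell
command -v ls        # e.g. /usr/bin/ls or /bin/ls
command -v python3 || echo "python3 was not found"   # e.g. /usr/bin/python3

# Running a program by its full path works the same as typing its name
"$(command -v ls)" /
```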

---
### The PATH Variable

Previously, we typed `/usr/bin/python` to access the Python interpreter. If the Python interpreter is at that location, though, how come we can also access it by typing python?

We can do this because of the `PATH` environment variable, which is configured to point to several folders (creating a "shortcut"). We can run any program in any one of these folders just by typing the program's name. Because `/usr/bin` is one of the folders in `PATH` and `python` is in that folder, we can access the python interpreter just by typing `python`, instead of the full path. If we did not have a `PATH` variable, we would have to type in the absolute path to run `python` every time.

Earlier we discussed how to create environment variables. We can update `PATH` the same way, by adding a new directory to its existing value. For example, `export PATH="$HOME/bin:$PATH"` puts `$HOME/bin` in front of the directories already listed, so we can run any executable inside `$HOME/bin` just by typing its name. The change lasts only for the current shell session, unless we add the line to a startup file such as `~/.bashrc`.

```shell
echo $PATH
```
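`PATH` is a single string of directories separated by colons. The sketch below first prints one directory per line: the `|` pipe sends `echo`'s output into `tr`, which swaps each `:` for a newline. It then prepends a scratch folder holding a hypothetical script called `hello_path`, so that typing the script's bare name runs it:

```shell
echo "$PATH" | tr ':' '\n'    # one PATH directory per line

bindir=$(mktemp -d)           # scratch folder for the hypothetical script
printf '#!/bin/sh\necho "found via PATH"\n' > "$bindir/hello_path"
chmod +x "$bindir/hello_path" # programs found through PATH must be executable

export PATH="$bindir:$PATH"   # prepend: this folder is searched first
hello_path                    # prints: found via PATH
```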

---
### Flags

Some of the programs we've been running have arguments, and some don't. When we type `echo $FOOD`, we're passing in the value of the `$FOOD` variable as a positional argument to the `echo` program. This is similar to a function in Python, which has positional and keyword arguments. Programs can have any number of positional arguments, including zero. `python` is an example of a program that doesn't require any positional arguments.

The copy command (`cp`) is an example of a command with two positional arguments -- we need to pass in the file name, as well as the path we want to copy it to.

Programs can also have optional *flags*. These are like keyword arguments in Python, which modify program behavior. For example, if we pass the `-l` flag (for "**l**ong mode") to `ls` (the "**l**i**s**t" command), the command will list the files in the directory in long mode, meaning that it will show more information about them.

```shell
ls -l
```

---
### Combining Flags

We'll often want to specify multiple flags. Most flags have short, single-character names, as well as longer versions of those names. See the [`ls` manual page](http://man7.org/linux/man-pages/man1/ls.1.html) for a closer look at this.

For example, `ls -a`, and `ls --all` do the same thing -- they'll both list all of the files in a directory, rather than hiding files that begin with a dot (`.`). The commands are equivalent.

When we have multiple flags with short, single-character names, we can chain them together to save time. `ls -la` will list **a**ll of the files in **l**ong format; it's equivalent to `ls -a -l`. The order of the `l` and the `a` doesn't matter. While experienced programmers do this all the time, it can be a bit confusing to parse at first.

```shell
ls -al
ls -la # does the same thing
```
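To check that the combined form really matches the separate flags, this sketch lists a scratch directory that contains one ordinary file and one hidden file, and compares the outputs. Both file names are made up:

```shell
cd "$(mktemp -d)"               # throwaway directory
touch visible.txt .hidden.txt
ls                              # prints only visible.txt
ls -a                           # also shows ., .., and .hidden.txt
if [ "$(ls -la)" = "$(ls -a -l)" ] && [ "$(ls -la)" = "$(ls -al)" ]; then
  echo "ls -la, ls -al, and ls -a -l give the same output"
fi
```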

---
### Long Flags

We can specify longer flags with two dashes. One such longer flag for `ls` is `--ignore`. Using `ls --ignore=test.txt` won't include any files named `test.txt` in the output of `ls`.

```shell
ls -al --ignore=.ipython
```
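`--ignore` also accepts shell-style patterns, so it can hide a whole group of files. `--ignore` is a GNU `ls` option, so this sketch assumes Linux. It runs in a scratch directory with made-up file names. The pattern is quoted so that Bash passes it to `ls` unchanged:

```shell
cd "$(mktemp -d)"           # throwaway directory
touch notes.txt test.txt data.csv
ls                          # lists data.csv, notes.txt, and test.txt
ls --ignore=test.txt        # lists data.csv and notes.txt
ls --ignore='*.txt'         # lists only data.csv
```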

---

# Command Line Python Scripting
---

### Introduction To Command Line Python

The interpreter lets us run Python commands and see their results immediately. It's very useful for testing snippets of code quickly, as well as debugging. But it's not a good way to develop Python programs, because the commands aren't saved anywhere.

In order to develop Python programs, we'll need to make files containing Python code. Then we'll be able to use the interpreter to run them from the command line. This way, we can save all of our commands, but still see what's happening.

This is a very common way to develop with Python -- use an IDE or text editor to create Python files, then run them from the command line.

We can make a file that Python can execute on the command line by adding some lines of Python code to a blank file. Here's an example of Python code:

```python
if __name__ == "__main__":

    print("Welcome to a Python script")
```

The code above will print `Welcome to a Python script` when we run it from the command line. To run it, we just need to put those lines into a file, save the file as `file.py`, and then call it with `python file.py`.

This code works because Python automatically sets the `__name__` variable. When a file is run directly as a script (for example, `python file.py`), `__name__` is set to `"__main__"`. When the file is imported from another file, `__name__` is set to the module's name instead. Checking `__name__` therefore tells us whether the file is being run as a script. Note that on the system used here, as on many older Linux systems, the `python` executable launches Python 2; use `python3` to launch Python 3.

In the first example below, `echo`'s `-e` option enables the interpretation of backslash escapes, so `\n` becomes a newline.

```shell
echo -e 'if __name__ == "__main__":\n    print("Welcome to a Python script")' > script.py
python3 script.py
# OR
touch script.py
nano script.py
# type in the script, save it, and exit nano...
python3 script.py
```
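`echo -e` isn't the only option, and not every shell's `echo` supports `-e`. A sketch using `printf`, which interprets `\n` in its format string in any POSIX shell, writes the same two-line script. It runs in a scratch directory, and `cat` prints the file so we can check it:

```shell
cd "$(mktemp -d)"     # throwaway directory
printf 'if __name__ == "__main__":\n    print("Welcome to a Python script")\n' > script.py
cat script.py         # shows the two lines of Python
python3 script.py     # prints: Welcome to a Python script
```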

---
### Installing Packages That Extend Python

Packages are an important way to extend Python's functionality. We've worked with packages like `matplotlib` and `pandas`. The best way to install packages is to use the command line and a program called **pip**. The newest versions of Python include pip by default, so installing Python will automatically give you access to pip.

In order to install a package with pip, we just use `pip install`. `pip install requests` will install the requests package, which developers use to interact with websites and APIs.

```shell
pip install requests
```

---
### Overview Of Virtual Environments

We used the default version of pip to install `requests` for the `python` executable, which on this system is Python version 2.

What if we had wanted to install `requests` for Python 3 instead? Different projects can require different packages and Python versions. This type of version switching can become confusing.

By default, a system has a single `python` executable, and packages and libraries are installed globally for it. This means that every project on the machine has to use the same version of Python, and the same version of every package.

Running different versions of Python side by side takes some workarounds. A common one is to install Python 3 as a separate executable named `python3`, so that both Python 2 (`python`) and Python 3 are available.

A better solution is for each project we write to have its own version of Python, along with its own packages. This way, we don't need to worry that upgrading the version of a package will affect other projects on the system and cause them to stop working.

**Virtual environments**, or virtualenvs, let us do this. We can create a new virtualenv with the `virtualenv` command. Normally we have to install the virtualenv package first in order to access this command.

```shell
sudo apt install virtualenv
```

Typing `virtualenv main` will create a `virtualenv` named main. It will create a folder in the current directory called main that will hold all of the packages we install into the virtual environment.

```shell
virtualenv python2
#Running virtualenv with interpreter /usr/bin/python2
#New python executable in /home/anton/python2/bin/python2
#Also creating executable in /home/anton/python2/bin/python
#Installing setuptools, pkg_resources, pip, wheel...done.
```

---
### Creating A Python 3 virtualenv

By default, `virtualenv` will use the `python` executable when it makes a new virtualenv, which means that it has the same version of Python as the system. In this case, we want to use `python3` for our virtualenv instead. In order to do this, we pass the `-p` flag to the `virtualenv` command, which will allow us to change the Python interpreter that virtualenv uses.

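In this case, we can type `virtualenv -p /usr/bin/python3 python3` to use Python 3 instead of Python 2. Here `-p /usr/bin/python3` selects the interpreter, and the final `python3` is just the name of the new virtualenv folder.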

```shell
virtualenv -p /usr/bin/python3 python3
```

---
### Activating A virtualenv

Once we've created a virtualenv, we can activate it using `source python3/bin/activate` (this assumes that the virtualenv is called `python3`, and that the folder for the virtualenv is in our current directory).

Once we activate a virtualenv, the Python version and packages installed in it will become the default Python version and packages that run when we type `python`.

```shell
source python3/bin/activate
```

---
### Verifying Installed Packages

We can find out which version of Python we're using with `python -V`. We can also look up which packages are currently installed (along with their versions) with `pip freeze`. If we activate a virtualenv, all of the packages, including `pip`, will be from the virtualenv instead of the main system Python executable.

```shell
python -V
pip freeze
```

---
### Importing Saved Functions Into A File

One of the great things about Python is that we can import functions from a package into a file. We can also import functions and classes from one file into another file. This gives us a powerful way to structure larger projects without having to put everything into one file.

We'll experiment with this style of import by writing a function in a file, and then importing it into another file.

If there's a file named `utils.py`, we can import it into another file in the same directory using `import utils`. All of the functions and classes defined in `utils.py` will then be available using dot notation. If there's a function called `keep_time()` in `utils.py`, we can access it with `utils.keep_time()` after importing it.

```shell
nano utils.py
#def print_message():
#        print("Hello from another file!")
# closing and saving the file...
nano script.py
#import utils
#
#if __name__ == "__main__":
#    utils.print_message()
# closing and saving the file...
python script.py
# Hello from another file!
```

```shell
# the same two files, written with echo -e instead of an editor
echo -e 'def print_message():\n    print("Hello from another file!")' > utils.py
echo -e 'import utils\n\nif __name__ == "__main__":\n    utils.print_message()' > script.py
python script.py
```

---
### Accessing Command Line Arguments

We can also pass command line arguments into Python scripts and retrieve them from inside the script through the `sys` module.

Once we import `sys`, the `sys.argv` list lets us retrieve the positional arguments passed into the script. As we learned earlier, positional arguments are the arguments that come after the command name. In `python script.py 82`, the first positional argument is `script.py`, and the second is `82`.

The following code will read input from the command line and print it back out. If the code is in a file named `script.py`, we'd run `python script.py "The text we want to display"` to pass in the text we want to display.

```python
import sys

if __name__ == "__main__":

    print(sys.argv[1])
```

Notice that we printed the second item in the `argv` list (`sys.argv[1]`). The first item, `sys.argv[0]`, is always the name of the script we ran (`script.py`), so the text we actually want to print is the next item.

```shell
echo -e 'import sys\n\nif __name__ == "__main__":\n    print(sys.argv[1])' > script.py
python script.py "Great text"
# Great text
```
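To see the whole `argv` list at once, here is a small sketch. The file name `args_demo.py` is hypothetical, and it assumes a `python3` executable is on your PATH (inside an activated Python 3 virtualenv, plain `python` works too):

```shell
# Write a hypothetical helper script that prints every entry of sys.argv
printf '%s\n' \
  'import sys' \
  '' \
  'if __name__ == "__main__":' \
  '    print("script name:", sys.argv[0])' \
  '    print("argument count:", len(sys.argv) - 1)' \
  '    for i, arg in enumerate(sys.argv[1:], start=1):' \
  '        print(i, arg)' > args_demo.py

# Quoted text stays together as a single argument
python3 args_demo.py "Great text" 82
# script name: args_demo.py
# argument count: 2
# 1 Great text
# 2 82
```

The quotes around `Great text` are consumed by the shell, which is why the script receives it as one argument rather than two.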

---
### Deactivating A virtualenv

To switch a virtualenv off so we can move to a different project, we use the `deactivate` command. It deactivates whichever virtualenv is currently active, so we don't need to pass in its name.

```shell
deactivate
```

---

# Working With Jupyter Console
---

The [Jupyter console](https://github.com/jupyter/jupyter_console), formerly known as IPython, is an enhanced Python interpreter. By typing `python` on the command line you get access to an interactive shell that lets you write and execute Python code. Jupyter console enhances this shell, and adds several niceties that make working with data easier.

Generally, it's useful to use the shell in situations where you need to quickly test some code you're writing. This happens frequently when you're writing data analysis scripts. It can also be used to quickly explore datasets and do basic analysis. Another use case is prototyping code before later saving it to a script file.

The main difference between Jupyter console and Jupyter notebook is that the console works in interactive mode: whenever you type a line of code, it is executed immediately and you see the result. If you want to write medium-length pieces of code or explore a dataset in depth to tell a story, the notebook is better. If you want to test out code you're writing or run quick commands, the console is better.

The Jupyter project is in the midst of rebranding from IPython to Jupyter. Depending on the version of Jupyter you have installed, you can access the console by typing either `jupyter console` or `ipython` at the command line.

```shell
ipython
jupyter console # does the same
```

---
### Getting Help

Jupyter console has a robust built-in help system. You can get help in several ways:

* You can type `?` after starting the console. This will display help about Jupyter. You can exit by typing `q`
* You can type `%quickref`. This is a **magic** that will tell you some useful commands. We'll talk more about Jupyter magics shortly
* If you want information about a variable, just type the name of the variable, followed by `?`. For information on the `super_var` variable, you'd type `super_var?`
* Type `help()` to get access to Python help. This will enable you to get help on all the modules and functions currently available. You can quit by typing `quit`
* If you want to use the Python help system to get information on a variable, type `help(variable_name)`

Getting help lets you see which methods are available on which objects, and helps you better understand the capabilities of Jupyter console.

```shell
ipython
```
```ipython
super_var = 5
super_var? # checking info on super_var...
help(super_var) # checking more info on super_var... 
help() # looking for help...
exit # leaving iPython
```

---
### Persistent Sessions

Just like with Jupyter notebook, Jupyter console starts a kernel session when you first load it. Every time you run code in the console, the variables are stored in the session. Any subsequent lines you execute can access those variables.

This functionality is extremely powerful, and allows you to execute longer scripts line by line and inspect the variables at each step.

```shell
ipython
```
```ipython
super_var = 5
super_var_10 = super_var * 10
exit
```

---
### Jupyter Magics

We've seen `%quickref` Jupyter magic previously. Magics are special Jupyter commands that always start with `%`. They enable you to access Jupyter-specific functionality, without Python executing your commands.

Some useful magics are:

* `%run` -- allows you to run an external Python script. Any variables in the script will be stored in the current kernel session
* `%edit` -- opens a file editor. Any code you type into the editor will be executed by Jupyter when you exit the editor
* `%debug` -- if there's an error in any of your code, running `%debug` afterwards will open an interactive debugger you can use to trace the error
* `%history` -- shows you the last few commands you ran
* `%save` -- saves the last few commands you ran to a file
* `%who` -- prints all the variables in the session
* `%reset` -- resets the session, and removes all stored variables

You can see a full list of magics [here](http://ipython.readthedocs.org/en/stable/interactive/magics.html).

You can use the `%run`, `%who`, and `%debug` magics to iteratively develop scripts with Jupyter console. Have your favorite editor open, and start writing a Python script. In a separate shell, open Jupyter console. As you get to checkpoints in your script where you want to test it out, use the `%run` magic to run the script. Check the values of the variables using the `%who` magic. If you see any errors, debug them with the `%debug` magic. If you want to clear the session, use `%reset`.

```shell
nano el_scripto.py
```
```ipython
# creating a script with one print and one variable...
%run el_scripto.py
# printing a message...
%who
# printing variables...
exit
# leaving ipython...
```

---
### Accessing The Shell

You can run shell commands in Jupyter console by prefixing them with an exclamation point (`!`). Running `!ls` in Jupyter will show the contents of the current directory.

This can be useful when you want to quickly inspect a file or check on the contents of a folder.

```shell
ipython
```
```ipython
!mkdir wow # creating 'wow' directory...
!touch ouch # creating 'ouch' file...
!ls # displaying the contents...
exit # leaving ipython...
```

---
### Pasting In Code

You'll often want to paste code into Jupyter console to see if it runs properly. Because of how Python handles indentation, nested for loops, functions, and if statements can fail if you just copy and paste them in.

In order to paste in code with indents, you'll need to use paste magics:

* `%cpaste` -- opens a special editing area where you can paste in code normally, without whitespace being a problem. You can type `--` alone on a line to exit. After you exit, any code you pasted in will be immediately executed.
* `%paste` -- takes code from your clipboard and runs it in Jupyter. This won't work on a remote system where Jupyter doesn't have access to your clipboard; there you'll see an error such as `ERROR: Getting text from the clipboard on this platform requires Tkinter`

```shell
ipython
```
```ipython
%cpaste # special pasting area opened...
# Ctrl + V
for i in range(10):
    if i < 5:
        print(i)
    else:
        print(i * 2)
--
exit
```

---

# Piping And Redirecting Output
---

### Appending

We already know how to redirect output from a command to a file using `>`:

```shell
echo "This is all a dream..." > dream.txt
```

If the file `dream.txt` exists, the above code will overwrite it with the string `"This is all a dream..."`. If `dream.txt` doesn't exist, it will be created with that string as its content. Either way, the shell redirects the command's standard output into the file.

If we don't want to overwrite `dream.txt`, and we instead want to add to it, we can use `>>`.

The code below will append `"Wake up!"` to the file `dream.txt`. The file will still be created if it didn't exist.

```shell
echo "Wake up!" >> dream.txt
```
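The difference between `>` and `>>` is easy to check by counting lines with `wc -l` after each step. This sketch reuses `dream.txt`; the third line of text is made up for illustration:

```shell
# > truncates (or creates) the file, then writes one line
echo "This is all a dream..." > dream.txt
# >> appends, so the file now has two lines
echo "Wake up!" >> dream.txt
wc -l < dream.txt   # prints 2

# Using > again throws away both earlier lines
echo "Back to sleep" > dream.txt
cat dream.txt       # Back to sleep
```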

---
### Redirecting From A File

We've seen how to redirect from a command to a file. We can also redirect the other way, from a file to a command: with `<`, the shell opens the file and feeds its contents to *the standard input* of the command.

Suppose the file `beer.txt` looks like this:

```
99 bottles of beer on the wall...

Take one down, pass it around, 98 bottles of beer on the wall...
```

The Linux [sort](https://en.wikipedia.org/wiki/Sort) command will sort the **lines** of a file in alphabetical order. If we pass the `-r` flag, the lines will be sorted in reverse order.

The code below feeds `beer.txt` to `sort` through its standard input and prints the file's lines in sorted order.

```shell
sort < beer.txt
```
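Here is a small sketch that recreates the `beer.txt` shown above and sorts it both ways. The `LC_ALL=C` prefix is an addition not in the original: it makes `sort` use plain byte order, so the result doesn't depend on your locale's collation rules:

```shell
# Recreate beer.txt: a line, a blank line, another line
printf '%s\n' \
  '99 bottles of beer on the wall...' \
  '' \
  'Take one down, pass it around, 98 bottles of beer on the wall...' > beer.txt

# The shell opens beer.txt and connects it to sort's standard input
LC_ALL=C sort < beer.txt
# (an empty line)
# 99 bottles of beer on the wall...
# Take one down, pass it around, 98 bottles of beer on the wall...

# -r reverses the order
LC_ALL=C sort -r < beer.txt
```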

---
### The grep Command

Sometimes, we'll want to search through the contents of a set of files to find a specific line of text. We can use the [grep](http://www.gnu.org/software/grep/manual/grep.html) command for this.

```shell
grep "pass" beer.txt
```

The above command will print every line in `beer.txt` that contains the string `pass`, and highlight the match if your terminal is set up for colored grep output.

We can specify multiple files by passing in more arguments:

```shell
grep "beer" beer.txt coffee.txt # shows all lines from either file that contain the string beer
```
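A self-contained sketch with two tiny files (the contents of `coffee.txt` are made up for illustration). When `grep` searches more than one file, it prefixes each match with the file name; `-n` (show line numbers) and `-c` (count matching lines) are standard grep flags:

```shell
printf '%s\n' '99 bottles of beer on the wall...' 'Take one down, pass it around' > beer.txt
printf '%s\n' 'coffee, not beer' 'espresso' > coffee.txt

# One file: matching lines only, with -n adding line numbers
grep -n "pass" beer.txt
# 2:Take one down, pass it around

# Several files: each match is prefixed with its file name
grep "beer" beer.txt coffee.txt
# beer.txt:99 bottles of beer on the wall...
# coffee.txt:coffee, not beer

# -c counts matching lines per file instead of printing them
grep -c "beer" beer.txt coffee.txt
# beer.txt:1
# coffee.txt:1
```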

---
### Special Characters

If we wanted to search for a string in `beer1.txt` and `beer2.txt`, we could use this command:

```shell
grep "beer" beer1.txt beer2.txt
```

Or we could use the [**special character**](http://www.tldp.org/LDP/GNU-Linux-Tools-Summary/html/x11655.htm) `?`:

```shell
grep "beer" beer?.txt
```

`?` is used to represent a single, unknown character.

Another name for special characters is wildcards. We can use the `*` character to match any number of characters, including zero characters:

```shell
grep "beer" beer*.txt
```

We can use wildcards anytime we would otherwise enter a filename.

```shell
ls *.txt
```
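To see how `?` and `*` differ, we can create a few empty files (the names are hypothetical) and let the shell expand the patterns. The shell replaces the pattern with the matching file names before the command runs, so `echo` is enough to show the expansion:

```shell
# Create empty example files (touch leaves existing files' contents alone)
touch beer.txt beer1.txt beer2.txt beer10.txt

# ? matches exactly one character: beer1.txt beer2.txt
echo beer?.txt

# * matches zero or more characters: all four files
echo beer*.txt
```

If a pattern matches nothing, bash passes it to the command unchanged (unless the `nullglob` option is set), so a mistyped wildcard usually shows up as a "No such file or directory" error.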

There are quite a few special characters that bash uses. A full list can be found [here](http://tldp.org/LDP/abs/html/special-chars.html). When you use these characters in a string or a command, and you don't want them to have a special effect, you may have to escape them.

Escaping tells the shell to not treat the character as special, but to treat it as a plain character instead. Here's an example:

```shell
echo ""Get out of here," said Neil Armstrong to the moon people." >> famous_quotes.txt
```

The above command won't work as we intend because the quotes inside the string will be treated as special. But what we want to do is add the quotes into the file.

We use a backslash (`\`) as an escape character -- if you add a backslash before a special character, the special character is treated like plain text.

The command below has the double quotes escaped with a backslash, so it will work as we intend.

```shell
echo "\"Get out of here,\" said Neil Armstrong to the moon people." >> famous_quotes.txt
```
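The same backslash rule applies to other characters that stay special inside double quotes, such as `$`. A quick sketch (the output file name `quotes_demo.txt` is hypothetical):

```shell
# \" inserts a literal double quote; \$ stops variable expansion
echo "\"Get out of here,\" said Neil Armstrong to the moon people." > quotes_demo.txt
echo "It cost \$5" >> quotes_demo.txt
cat quotes_demo.txt
# "Get out of here," said Neil Armstrong to the moon people.
# It cost $5
```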

---
### Piping Output

The pipe character, `|`, allows you to send the standard output from one command to the standard input of another command. This can be very useful for chaining together commands.

For example, let's say we had a file called `logs.txt` with **100000** lines, and we only want to search the last **10** lines for the string `"Error"`. We can use `tail -n 10 logs.txt` to get the last 10 lines of `logs.txt`, and then use the pipe character to chain it with a `grep` command that performs the search:

```shell
tail -n 10 logs.txt | grep "Error"
```

The above command will search the last 10 lines of `logs.txt` for the string `"Error"`.
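Since we don't have the original `logs.txt`, this sketch generates a hypothetical stand-in with `awk`: 100000 lines, where every 7th line contains `Error`. Only the errors that fall in the last 10 lines survive the pipeline:

```shell
# Generate a hypothetical 100000-line logs.txt; every 7th line is an error
awk 'BEGIN {
  for (i = 1; i <= 100000; i++)
    print "line " i ": " (i % 7 == 0 ? "Error" : "ok")
}' > logs.txt

# tail keeps lines 99991-100000; grep keeps the errors among them
tail -n 10 logs.txt | grep "Error"
# line 99995: Error
```

Only one match appears because 99995 is the only multiple of 7 between 99991 and 100000.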

We can also pipe the output of a Python script. Let's say we had this script called `rand.py`:

```python
import random

for i in range(10000):
    print(random.randint(1, 10))
```

The above script uses the [`random`](https://docs.python.org/3/library/random.html) library to generate 10,000 random integers between 1 and 10 (inclusive) and prints each one on its own line of standard output.

This command will run the script, search each line of output for a `9`, and print every line that contains one:

```shell
python rand.py | grep 9
```
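Because `grep` matches substrings, `grep 9` only works cleanly here because no value from 1 to 10 contains a 9 except 9 itself. A sketch, assuming `python3` is on your PATH: adding `-x` (match the whole line) makes that intent explicit, and `-c` counts matches instead of printing them. The counts change on every run, since the numbers are random:

```shell
# Recreate rand.py from above
printf '%s\n' 'import random' '' 'for i in range(10000):' '    print(random.randint(1, 10))' > rand.py

# Count lines that are exactly "9"
python3 rand.py | grep -cx 9

# Contrast: a substring match on "1" also counts every 10
python3 rand.py | grep -c 1
```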

---
### Chaining Commands

If we want to run two commands sequentially, but not pass output between them, we can use `&&` to chain them. Let's say we want to add some content to a file, then print the whole file:

```shell
echo "All the beers are gone" >> beer.txt && cat beer.txt
```

This will first add the string `"All the beers are gone"` to the file `beer.txt`, then print the entire contents of `beer.txt`. The `&&` only runs the second command if the first one succeeds (exits with status `0`). If we instead tried this:

```shell
ec "All the beers are gone" >> beer.txt && cat beer.txt
```

We'd get a `command not found` error, and `cat` would never run, because we typed `ec` instead of `echo`.
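To see the `&&` rule in isolation, we can compare it with `;`, which runs the next command regardless of how the previous one ended. `true` and `false` are standard commands that simply succeed or fail, and `$?` holds the exit status of the last command:

```shell
true  && echo "runs: true exited with status 0"
false && echo "never printed: false exited with status 1"
false ;  echo "always printed: ; ignores the exit status"

# A misspelled command fails with status 127 ("command not found")
ec "All the beers are gone" 2>/dev/null
echo "$?"   # 127
```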

The Linux command `cat` concatenates the contents of one or more files and prints them to standard output; given a single file, it simply displays that file's contents.
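Since `cat` accepts several file names, it can also join files together. A sketch with hypothetical file names:

```shell
printf 'first\n' > part_a.txt
printf 'second\n' > part_b.txt

# Print both files one after the other...
cat part_a.txt part_b.txt
# first
# second

# ...or redirect the concatenation into a new file
cat part_a.txt part_b.txt > joined.txt
```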

---

# Data Cleaning and Exploration Using Csvkit
---

### Csvkit

Now we'll learn about the Csvkit library, which supercharges your workflow by adding 13 new command line tools specifically for working with CSV files. We'll focus on these 5 tools from Csvkit:

* **csvstack**: for stacking rows from multiple CSV files
* **csvlook**: renders CSV in pretty table format
* **csvcut**: for selecting specific columns from a CSV file
* **csvstat**: for calculating descriptive statistics for some or all columns
* **csvgrep**: for filtering tabular data using specific criteria

We'll be using csvkit version 0.9.1 and you can read about the installation procedure in the [documentation](https://csvkit.readthedocs.io/en/0.9.1/install.html). We'll continue to work with the same 3 datasets on housing affordability:

* `Hud_2005.csv`
* `Hud_2007.csv`
* `Hud_2013.csv`

---

### csvstack

To start, let's circle back to the task of merging 3 CSV files into 1 file. We can use [csvstack tool](http://csvkit.readthedocs.io/en/0.9.1/scripts/csvstack.html#description) to consolidate the rows from multiple CSV files and redirect the stdout to a new file:

```shell
csvstack file1.csv file2.csv file3.csv > final.csv
```

As long as every input file has the same header row, the first row in the resulting file will be that header row. After the header row, `final.csv` will contain all of the non-header rows from `file1.csv`, then all of the non-header rows from `file2.csv`, and finally the non-header rows from `file3.csv`. If you don't redirect the stdout of csvstack to a file or pipe it to a tool like `head`, the full output will be rendered in the terminal. That can make your terminal grind to a halt as it tries to display everything, so be careful to avoid it.

The behavior of csvstack can be modified using a few different flags. For example, if you want to be able to trace which file each row in the merged file originated from, you can use the `-g` flag to specify a grouping value for each filename. When stacking the rows from a file, `csvstack` will add the corresponding value in a new column. You can then use the `-n` flag to specify the name of this new column. The following code will create a new column named `origin`, containing the values `1`, `2`, or `3` depending on which file that row originated from:

```shell
csvstack -n origin -g 1,2,3 file1.csv file2.csv file3.csv > final.csv
```

The rows in `final.csv` that originated from `file1.csv` will contain the value `1` in the origin column and those from `file2.csv` will contain the value `2` in the origin column. Let's now use `csvstack` to combine the 3 datasets on U.S. housing affordability.

```shell
csvstack -n year -g 2005,2007,2013 Hud_2005.csv Hud_2007.csv Hud_2013.csv > Combined_hud.csv
head Combined_hud.csv
#year,AGE1,BURDEN,FMR,FMTBEDRMS,FMTBUILT,TOTSAL
#2005,43,0.513,680,'3 3BR','1980-1989',20000
#2005,44,0.223,760,'4 4BR+','1980-1989',71000
#2005,58,0.218,680,'3 3BR','1980-1989',63000
#2005,22,0.217,519,'1 1BR','1980-1989',27040
#2005,48,0.283,600,'1 1BR','1980-1989',14000
#2005,42,0.292,788,'3 3BR','1980-1989',42000
#2005,-9,-9.000,702,'2 2BR','1980-1989',-9
#2005,23,0.145,546,'2 2BR','1980-1989',48000
#2005,51,0.296,680,'3 3BR','1980-1989',58000
wc -l Combined_hud.csv
#154118 Combined_hud.csv
```
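For intuition about what `csvstack -n ... -g ...` does, here is a rough plain-shell approximation using `head`, `tail`, and `sed` instead of csvkit. It is only a sketch under strong assumptions (identical headers, no quoted commas or embedded newlines, tiny made-up sample files) and not a replacement for csvstack, which parses CSV properly:

```shell
# Tiny sample files with identical headers (hypothetical data)
printf 'AGE1,FMR\n43,680\n' > file1.csv
printf 'AGE1,FMR\n51,702\n' > file2.csv
printf 'AGE1,FMR\n22,519\n' > file3.csv

# Header: the new "origin" column, then the shared header row
{ printf 'origin,'; head -n 1 file1.csv; } > stacked.csv

# Body: each file's non-header rows, prefixed with its group value
for g in 1 2 3; do
  tail -n +2 "file$g.csv" | sed "s/^/$g,/" >> stacked.csv
done

cat stacked.csv
# origin,AGE1,FMR
# 1,43,680
# 2,51,702
# 3,22,519
```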

---
### csvlook

While `head` allows you to quickly observe the first few rows in a file, it doesn't attempt to format the rendered output at all. CSV files are tabular, and it's incredibly useful to see that structure; data tools like Pandas and Microsoft Excel take this into account when displaying tabular data. Thankfully, we can use the csvlook tool to display tabular data in the table format we're used to.

The [csvlook tool](http://csvkit.readthedocs.io/en/0.9.1/scripts/csvlook.html) parses CSV formatted data from its stdin and outputs a pretty formatted table representation of that data to its stdout. Let's use csvlook to explore the first few rows from the CSV file we created in the previous section.

```shell
head Combined_hud.csv | csvlook
#|  year | AGE1 |  BURDEN | FMR | FMTBEDRMS | FMTBUILT  | TOTSAL |
#| ----- | ---- | ------- | --- | --------- | --------- | ------ |
#| 2 005 |   43 |  0,513… | 680 | 3 3BR     | 1980-1989 | 20 000 |
#| 2 005 |   44 |  0,223… | 760 | 4 4BR+    | 1980-1989 | 71 000 |
#| 2 005 |   58 |  0,218… | 680 | 3 3BR     | 1980-1989 | 63 000 |
#| 2 005 |   22 |  0,217… | 519 | 1 1BR     | 1980-1989 | 27 040 |
#| 2 005 |   48 |  0,283… | 600 | 1 1BR     | 1980-1989 | 14 000 |
#| 2 005 |   42 |  0,292… | 788 | 3 3BR     | 1980-1989 | 42 000 |
#| 2 005 |   -9 | -9,000… | 702 | 2 2BR     | 1980-1989 |     -9 |
#| 2 005 |   23 |  0,145… | 546 | 2 2BR     | 1980-1989 | 48 000 |
#| 2 005 |   51 |  0,296… | 680 | 3 3BR     | 1980-1989 | 58 000 |
```

---
### csvcut

Csvlook rendered the merged CSV file as a table. Let's now explore individual columns using the `csvcut` tool. Running `csvcut` with just the `-n` flag parses and displays all the columns in a CSV file, along with a unique integer identifier for each column:

```shell
csvcut -n Combined_hud.csv
#1: year
#2: AGE1
#3: BURDEN
#4: FMR
#5: FMTBEDRMS
#6: FMTBUILT
#7: TOTSAL
```

You can use the integer identifier for each column and the `-c` flag to select just a specific column:

```shell
csvcut -c 1 Combined_hud.csv
#outputs all the rows of the year column
```

You want to avoid displaying the entire column, since the output is 154118 lines long and your terminal can grind to a halt trying to display all of it. Instead, you can use `head` to keep just the first few lines of the file and pipe them into `csvcut` to preview the column:

```shell
head -10 Combined_hud.csv | csvcut -c 2
#AGE1
#43
#44
#58
#22
#48
#42
#-9
#23
#51
```

---
### csvstat

Now that we know how to select specific columns, we can select a column and pipe it to the [csvstat tool](http://csvkit.readthedocs.io/en/0.9.1/scripts/csvstat.html#description) to calculate summary statistics for that column:

```shell
csvcut -c 4 Combined_hud.csv | csvstat
```

This calculates a full suite of summary statistics, including:

* max
* min
* sum
* mean
* median
* standard deviation

Depending on the size of the data, calculating the full summary statistics for a column can take a long time, and often you only need one specific statistic. You can use flags like `--max`, `--mean`, and `--nulls` to request individual statistics, which is much faster:

```shell
# Just the max value.

csvcut -c 2 Combined_hud.csv | csvstat --max

# Just the mean value.

csvcut -c 2 Combined_hud.csv | csvstat --mean

# Just the number of null values.

csvcut -c 2 Combined_hud.csv | csvstat --nulls
```

You can see a full list of flags in the documentation. If you want to calculate summary statistics over all the columns in a CSV file, you can pass the file to csvstat directly:

```shell
csvstat --mean Combined_hud.csv
#  1. year: 2008.9044232628457
#  2. AGE1: 46.511215505103266
#  3. BURDEN: 5.303764743668771
#  4. FMR: 1037.1186695822005
#  5. FMTBEDRMS: None
#  6. FMTBUILT: None
#  7. TOTSAL: 44041.841931779105

csvcut -c 2 Combined_hud.csv | csvstat
#  1. AGE1
#        <class 'int'>
#        Nulls: False
#        Min: -9
#        Max: 93
#        Sum: 7168169
#        Mean: 46.511215505103266
#        Median: 48
#        Standard Deviation: 23.04901451351246
#        Unique values: 80
#        5 most frequent values:
#                -9:     11553
#                50:     3208
#                45:     3056
#                40:     3040
#                48:     3006
#
#Row count: 154117
```

---
### csvgrep

`-9` is the most common value in the `AGE1` column, which is problematic since age values have to be greater than `0`. We can use [csvgrep](http://csvkit.readthedocs.io/en/0.9.1/scripts/csvgrep.html) to select all the rows that match a specific pattern and dive a bit deeper. We tell `csvgrep` which columns to search with the `-c` flag (just like with `csvcut`), and specify the pattern with the `-m` flag. The behavior of `csvgrep` can be customized with other flags as well; for example, you can use the `-r` flag to pass in a regular expression as the pattern instead.

Let's display the first rows from `Combined_hud.csv` where the value for the `AGE1` column is `-9`, in a pretty table format (`head -10` keeps the header plus 9 matching rows). This combines several of the tools we've talked about so far, so you can see the real power of using the `csvkit` tools together with other CLI tools:

```shell
csvgrep -c 2 -m -9 Combined_hud.csv | head -10 | csvlook
#|-------+------+--------+------+-----------+-------------+---------|
#|  year | AGE1 | BURDEN | FMR  | FMTBEDRMS | FMTBUILT    | TOTSAL  |
#|-------+------+--------+------+-----------+-------------+---------|
#|  2005 | -9   | -9.000 | 702  | '2 2BR'   | '1980-1989' | -9      |
#|  2005 | -9   | -9.000 | 531  | '1 1BR'   | '1980-1989' | -9      |
#|  2005 | -9   | -9.000 | 1034 | '3 3BR'   | '2000-2009' | -9      |
#|  2005 | -9   | -9.000 | 631  | '1 1BR'   | '1980-1989' | -9      |
#|  2005 | -9   | -9.000 | 712  | '4 4BR+'  | '1990-1999' | -9      |
#|  2005 | -9   | -9.000 | 1006 | '3 3BR'   | '2000-2009' | -9      |
#|  2005 | -9   | -9.000 | 631  | '1 1BR'   | '1980-1989' | -9      |
#|  2005 | -9   | -9.000 | 712  | '3 3BR'   | '2000-2009' | -9      |
#|  2005 | -9   | -9.000 | 1087 | '3 3BR'   | '2000-2009' | -9      |
#|-------+------+--------+------+-----------+-------------+---------|
```

---
### Filtering Out Problematic Rows Using Csvkit

Let's select all rows where the value for `AGE1` isn't `-9` and write just those rows to `positive_ages_only.csv`. To accomplish this, we can redirect the output of `csvgrep` to a file. So far, we've only used `csvgrep` to select rows that match a specific pattern. We need to instead select the rows that don't match a pattern, which we can specify with the [`-i` flag](http://csvkit.readthedocs.io/en/0.9.1/scripts/csvgrep.html).

```shell
csvgrep -c 2 -m -9 -i Combined_hud.csv > positive_ages_only.csv
```

Use `csvkit` whenever you need to quickly transform or explore data from the command line, but remember that it has a few limitations:

* Csvkit is not optimized for speed and struggles to run some commands over larger files

* Csvkit has very limited capabilities for actually editing problematic values in a dataset, since the community behind the library aspired to keep the library small and lightweight

---