# To Jupyter... And Beyond!
**Start the day with students looking at my screen. They will open their own screens in part 2**

## A typical project workflow.

The steps are only loosely chronological. Usually, these are all happening at the same time.

1. **An idea**
    - e.g. "If big black holes are formed from smaller black holes, more massive black holes would spin faster. I think more massive black holes spin faster than less massive black holes."

2. **Exploratory plots in a jupyter notebook:** 
    - Most analyses start in a jupyter notebook, just like the ones we've been using in class. 
    - Yay! The comfort of familiarity. You usually start with the lines you know and love: 
    ```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```
    - Exploration often happens with a small subset of your data to make playing around manageable. 
    - e.g. take only 50 black hole observations (rather than 1000s) and make a quick-and-dirty plot via
    ```python
data = pd.read_csv('black_holes.csv')[:50]
data.plot.scatter(x='mass',y='spin')
```
    - Iteration is a necessary part of this process! Your first idea will rarely be your best one.
    
3. **Develop robust analysis**
    - Take your explorations and try to prove your ideas right or wrong using ~ statistics ~
    - "Hm, mass and spin don't seem correlated, can I show that their non-correlation is statistically significant?"
4. **Automate analysis (outside of the notebook)**
    - This allows you to scale your analysis up to larger datasets, makes it more portable so you can use it on a cluster, and makes it more usable by others.
    
5. **Make pretty plots and write a paper**
    - Yay!


This lesson is to help with step 4. Today, we will teach you what you need to know to run python outside of these class notebooks. As a result, today's notebook will serve as more of an outline than a workbook. We will come here periodically, but most of the class will be following along on your computer as the instructors write code live. Today will also be more interactive, with instructors looking for answers from in-person students. We will cover:
- The terminal
- python in the terminal
- `.py` files
- Writing your own libraries and modules

*But If I leave the comfort of the jupyter notebook, where will I go??*

We know, the world is scary, and computers are even scarier. Luckily, we are here to show you how to write python code elsewhere. The first place we will go is the terminal. After that, we will move on to simple text editors. Then, we will go to jupyter hub, which is basically a jupyter notebook. Not so scary after all!

## The Command Line

The command line is kind of like a much more powerful Finder, but you have to type everything instead of clicking. You can move files, copy them, create them, and rename them, and you can do the same with folders and move between them. In the command line, you can launch any application you want, open any file you want, search for files, search for strings in files, and much more. Think of it as a way to speak more directly to your computer.

The command line is a very powerful interface for your computer, but it is also a universal interface for all Unix-based computers (Macs and PCs running Linux or other Unix operating systems, but not Windows machines). Because of this, the command line is how you will interact with computing clusters/supercomputers. Most clusters don't have a way to control them graphically (i.e. by clicking on things); they just have a command line interface. Even if they do have a graphical interface, it's usually very cumbersome, slow, and limited, so learning command line syntax is the way to go. When you eventually scale up your code to use clusters, you might find it helpful to come back to this notebook.

The languages most commonly used on the command line are `bash` and `zsh`. They are very similar, and for our intents and purposes, they are the same. I'll probably just refer to both of them as `bash`.

### Commands you know and love
Let's start with the usual commands we do every day. Open a new terminal window and type the following:
```
cd <your_path>/intro-programming-2022 
```
`cd` means "change directory". So this command changed our directory from wherever the terminal started to your intro programming directory. Here, the _command_ is `cd` and the _argument_ is the path you would like to navigate to. 
```
git pull
```
Git is a piece of software installed on your computer, like python or Microsoft Word. This line says "run the software called git. Now that you know I'm talking to the software called git, I want you to specifically run the part of the software that pulls modifications from an online repository to my local version."
```
cd notebooks
```
Again, change directories. This time, we move to the `notebooks` directory.
```
jupyter notebook
```
Jupyter is another piece of software installed on your computer. This line says "run the software called jupyter, and specifically run the notebook part of that software." Now, open today's notebook and give a green check ✅ when you're ready.

### New directory-navigation commands
Now for some more commands related to directory navigation. 

It might be useful to have this picture of our directory structure in your head as we proceed. 
```
intro-programming-2022
    |
    - README.md
    - notebooks
        |
        - Day1_Setup.ipynb
        - Day2_Data_and_Storage.ipynb
          ...
    - data
        |
        - Pokemon.txt
        - star_formation_rate_MD.csv
         ...
```
Here, `intro-programming-2022` is _above_ `data`, and `Pokemon.txt` is _below_ data. `data` is _below_ `intro-programming-2022` and _level with_ `notebooks`.

0. Open a new terminal tab or window. Then, navigate to the `notebooks` directory like we do every day. Give a ✅ when you get there or a ❌ if you run into errors.

1. Let's make a new directory called "my_directory."
    ```
    mkdir my_directory
    ```
    Here, the _command_ is called `mkdir` for "make directory" and the _argument_ is the name of the directory. The user supplies the argument, and in this case we decided to pass `my_directory`. 
2. Let's check that we have made the directory by listing everything in our current directory.
    ```
    ls
    ```
    Here, `ls` stands for "list". You should see listed all the contents of the `notebooks` directory, where we currently are. Check that, in addition to all of the notebooks we've used so far, you see `my_directory`.
    - Give a ✅ when you've successfully found `my_directory` in the list of `notebooks`'s contents.
    - Sidenote: You can list files in other directories without navigating into them. Try listing everything in the above directory
    ```
    ls ../
    ```
    Where `..` means "move up one". You could also do `../../` to move up two, and so on. Try listing everything in the `data` directory.
    ```
    ls ../data/
    ```
    - Sidenote: You can list files that have a specific pattern in them using _wildcards_. List all of the jupyter notebook files in the current directory:
    ```
    ls *.ipynb
    ```
    List all of the `.csv` files in the `data` directory:
    ```
    ls ../data/*.csv
    ```
    Give a ✅ when you've done these.
    
    Note: You can use wildcards for many commands in the command line, not just `ls`.
3. Navigate into our new directory
    ```
    cd my_directory
    ```
    Again, `cd` means "change directory."
4. Check that we've navigated there
    ```
    pwd 
    ```
    This command means "print working directory" and the output gives the full path to your current location in the file structure. Give a ✅ if its output makes sense to you, and a ❌ if not.
5. Make a text file in the directory called "my_file.txt" with the words "Command lines are cool" in it. You can do this in many ways, but here is a simple one.
    ```
    echo "command lines are cool" > my_file.txt
    ```
6. Open the file in your default text editor.
```
open my_file.txt
```
Here, `open` is the command and the argument is the filename, in this case `my_file.txt`.

7. Move/rename the file
``` 
mv my_file.txt my_cool_file.txt
```
Here, `mv` means "move". This command moves contents of files to new locations. In this case, the new location was the same directory but inside a file with a different name. Check it worked by running `ls`. Double check by running `open my_cool_file.txt`. Give a ✅ when you've confirmed your change.

7. Delete the file
```
rm my_cool_file.txt
```
Check that it is really gone with an `ls` and an `open`. 
Note: `rm` can't be undone! There is no recycle bin here. Regular backups are important. 

7. Tab completion: Look at Amanda's terminal for this.

Once you get the hang of these commands, they become much, much faster than clicking around in a Finder window. 
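
For anyone who finds it easier to think in python, the same steps can be sketched with the standard-library `pathlib` module. This is only an illustration, not part of the terminal exercise: it works inside a throwaway temporary directory (an assumption made so that nothing in your real `notebooks` folder is touched), and each line is labelled with the shell command it roughly mirrors.

```python
import tempfile
from pathlib import Path

# Work inside a throwaway directory so none of your real files are touched.
with tempfile.TemporaryDirectory() as scratch:
    notebooks = Path(scratch)  # stand-in for the real notebooks directory

    my_directory = notebooks / "my_directory"
    my_directory.mkdir()                                    # mkdir my_directory
    listing = sorted(p.name for p in notebooks.iterdir())   # ls

    my_file = my_directory / "my_file.txt"
    my_file.write_text("command lines are cool\n")          # echo "..." > my_file.txt

    cool_file = my_directory / "my_cool_file.txt"
    my_file.rename(cool_file)                               # mv my_file.txt my_cool_file.txt
    text_files = sorted(p.name for p in my_directory.glob("*.txt"))  # ls *.txt

    cool_file.unlink()                                      # rm my_cool_file.txt
    remaining = list(my_directory.iterdir())                # ls (now empty)

print(listing)     # ['my_directory']
print(text_files)  # ['my_cool_file.txt']
print(remaining)   # []
```

Just like `rm`, `unlink()` does not send anything to a recycle bin.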

### New fancier commands

1. Download stuff from the internet using `curl`
    - `curl` is short for "Client for URLs." Check out the documentation [here](https://curl.se/docs/manpage.html).
    - We will download the "research and development survey" from this website: https://www.stats.govt.nz/large-datasets/csv-files-for-download/
    - do this using `curl` in two ways:
    ```
    curl https://www.stats.govt.nz/assets/Uploads/Research-and-development-survey/Research-and-development-survey-2021/Download-data/Research-and-development-survey-2021-CSV-notes.csv
    ```
    This downloads the file and displays it. But if we want to save the file on our computer, we will use the `-o` option and specify a file. We will save the data to a file called "RnD.csv" in the output directory.
    ```
    curl -o ../output/RnD.csv https://www.stats.govt.nz/assets/Uploads/Research-and-development-survey/Research-and-development-survey-2021/Download-data/Research-and-development-survey-2021-CSV-notes.csv
    ```
    Once you do this, run `ls ../output/` and see if your new file is there.
    
    - Give a ✅ when you see your new file.
1. Find files using `find`
    - Finally! A command named something that makes sense. This is used when I'm looking for a file but I don't know where it is on my computer. Check out the documentation [here](https://linux.die.net/man/1/find).
    - The general form is: `find <starting directory> <matching criteria and actions>`
    - So, if we want to find all `txt` files in the "output" directory, we can do 
    ```
    find ../output -name "*.txt" -print
    ```
    The first argument is the place to look for your files (here, `../output`). After that, it's all optional arguments, which are a lot like keyword arguments in python. The first optional argument is `-name`, which accepts a pattern. This is the pattern we are searching for (here, `"*.txt"`, where we used `*` as a wildcard). The second optional argument is `-print`, which tells `find` to print the output of its search. Try to run the command.
    - Give a ✅ when you've found your files.
    
1. Find contents in files using `grep`
    - Syntax: `grep <string to search for> <options> <starting directory>`
    - Look for all lines in all files that have the string "Huron", starting one level up, and going "recursively" through all directories under our starting directory:
    ```
    grep "Huron" -r ../
    ```
    Here, "Huron" is the string we want to search for, "-r" is telling `grep` to go recursively (meaning go through each sub directory under the starting directory), and "../" means start one directory up from where we are now.
    - **_Coding Check-in_**: Try looking for your own string! Give a ✅ when you're done.
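
To make "recursive" and "wildcard" a little more concrete, here is a rough python sketch of what `find -name` and `grep -r` are doing. The function names `find_by_name` and `grep_recursive` are made up for this illustration, the demo files are invented, and the real commands have many more options (for example, `grep` matches patterns, while this sketch only looks for plain substrings).

```python
import tempfile
from pathlib import Path


def find_by_name(start, pattern):
    """Rough equivalent of: find <start> -name "<pattern>" -print"""
    start = Path(start)
    return sorted(p.relative_to(start).as_posix() for p in start.rglob(pattern))


def grep_recursive(start, text):
    """Rough equivalent of: grep "<text>" -r <start> (plain substring match only)"""
    start = Path(start)
    hits = []
    for path in sorted(start.rglob("*")):      # walk every sub directory
        if not path.is_file():
            continue
        try:
            lines = path.read_text().splitlines()
        except UnicodeDecodeError:
            continue                           # skip files that are not plain text
        for line in lines:
            if text in line:
                hits.append((path.relative_to(start).as_posix(), line))
    return hits


# Build a tiny, made-up directory tree to search through.
with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch)
    (root / "sub").mkdir()
    (root / "lakes.txt").write_text("Superior\nHuron\nErie\n")
    (root / "sub" / "notes.txt").write_text("Lake Huron is big\n")
    (root / "sub" / "data.csv").write_text("a,b\n1,2\n")

    found = find_by_name(root, "*.txt")
    matches = grep_recursive(root, "Huron")

print(found)    # ['lakes.txt', 'sub/notes.txt']
print(matches)  # [('lakes.txt', 'Huron'), ('sub/notes.txt', 'Lake Huron is big')]
```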


It is very useful to keep a small text file (or other set of notes) of useful commands. Unlike `python`, `bash` and `zsh` don't have very intuitive syntax or naming conventions, so it can be hard to memorize their commands. Katie, Maria, and Amanda all keep a short list of useful commands, especially ones with confusing syntax, along with explanations of how to use them.

## Python in The Terminal

Follow these steps:
1. Open a new terminal tab or window
2. Navigate to your working directory, like we do every day. We will work out of the `notebooks` directory.
3. Type `python`
    - Your terminal should display something similar to the following:
    ```
    Python 3.8.5 (default, Sep  4 2020, 02:22:02) 
    [Clang 10.0.0 ] :: Anaconda, Inc. on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 
    ```
    - If it displays `Python 3.X`, where `X` is any value, you are good!
    - If it displays `Python 2.Y` for any value of Y, type `exit()` and press enter. Then type `python3`. Ask a TA if this doesn't work.
    
4. You are now running python interactively! Let's try a few things out to see how interactive python works.
    - Type a print statement with a string as the argument to `print()`. Press enter
    - Type a mathematical expression (e.g. `1+1`) and press enter
    - Notice that inputs are always on lines denoted by `>>>` or `...` and outputs start newlines without these characters.
5. Let's try some more nuanced commands that we are used to using on Jupyter Notebooks.
    - Try some of our usual imports (e.g. `import numpy as np`, then try `import matplotlib.pyplot as plt`)
    - Define a variable, calling it whatever you want and giving it whatever value you want, using any datatype you want. Press enter.
    - Type the variable's name. Press enter.
5. Now let's try some code we've seen before to see what happens in interactive mode.
    - First, I'll copy some things from Day 2, where we had some fun with lists. Start by replicating the following line in your terminal, then follow along with the instructor. Try to predict what will happen with each line before pressing enter.
    ```python
    animals = ['cat', 'dog', 'squid', 'moose', 'falcon']
    ```
    - Now, we will try some things using `pandas`. Remember, to use a package we need to import it first. We haven't imported `pandas` yet, so let's do that now.
    - We're going to load in some data that we used earlier. This data is stored in a csv format. I don't quite remember the syntax for the `pandas` function for this, but we don't want to leave the terminal because that's a lot of work, so let's try the built-in `help()` function. 
    - It works very similarly to using it in jupyter! You can even scroll. However, there are no buttons to close out of the viewing mode once you're in it, so you'll have to type `q` to leave after you've read everything you need.
    - You can re-run the same line without having to re-type the whole thing by using the "up" button on your keyboard.
    - Looking at the docstring, it seems like the first argument is the file name, and all other arguments are optional. The default delimiter is `,`, which is exactly what we want, so we can just go with the default behavior.
    - Load in the file but don't assign it to a variable. You'll have to figure out the path. The file's name is `Pokemon.csv` and it is in the `data` directory. 
    - You should see the column names as well as the first and last 5 rows of data.
    - Now load in the file but this time assign it to a variable. I'll call my variable `data`. Notice how there's no output now.
    - _**Coding check-in:**_ Play around with your dataframe. How many Pokemon have Grass as their first type? What's the mean attack value?
6. How does plotting work?
    - With our nice graphical user interface (GUI) gone, we're stuck with a very basic looking screen that doesn't seem like it has support for the beautiful plots you've learned how to make. What happens when you try to make a plot?
    - We've already imported `matplotlib` and `numpy`, so we have all the libraries we need for a simple plot. 
    - Make an array full of increasing `float` values.
    - Plot the square of that array vs the array itself.
    - You might get taken away from the terminal for seemingly no reason. There is a reason, but just go back to the terminal for now.
    - Run `plt.show()`. A plot should pop up.
    - You must close your plot to have access to the command line again.
7. Functions
    - Write a simple "hello world!" function.
    - You have to provide your own indentation in interactive python. Otherwise you will get the dreaded `IndentationError: expected an indented block`!
    - An empty newline indicates you're done writing the function. 
    - `...` indicates being inside an indented block, `>>>` indicates being outside of it
    - Call your function.
    - Write a function that takes in one argument, applies something to it, and returns a new value.
    - Call your function.
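
If you get stuck, one possible pair of functions looks like the sketch below. The names `hello_world` and `double` are just examples. In the interactive terminal you would type these line by line: the `def` line appears after `>>>`, the indented lines appear after `...`, and you type the four spaces of indentation yourself.

```python
def hello_world():
    # The body must be indented; four spaces is the usual convention.
    print("hello world!")


def double(value):
    # Takes one argument, applies something to it, and returns a new value.
    return value * 2


hello_world()        # prints: hello world!
result = double(21)
print(result)        # prints: 42
```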

If I'm writing a super long function or a complex plotting script and I make a typo, I have to go and redo everything. Even with our newly-beloved "up" key, this is extremely annoying. An obvious solution might be to go back to jupyter notebooks, but there are many situations in which that is not possible. Instead, we will move on to a generally useful tool, `.py` scripts.

9. Leave interactive mode by typing `exit()`. Don't close your terminal window.

## `.py` Files
1. Open up your favorite text editor. If you don't have a favorite, we recommend BBEdit (https://www.barebones.com/products/bbedit/). You probably installed this as part of the setup for the course.
2. Save the file to the same directory as all the notebooks are in (`notebooks`). Use whatever name you want, but make sure there are no spaces and no special characters (besides underscore `_`), and give it a `.py` extension.
3. On the first line, write the shebang. For python, it looks like this: 
```
#!/usr/bin/env python
```
    - When you execute a file from the shell, the shell tries to run the file using the command specified on the shebang line. The # character is used because it defines a comment in most scripting languages (including python), so the shebang line will be ignored by the scripting language by default.
    - Some background on the shebang which is not necessary to know but may be interesting to some: "The shebang line was invented because scripts are not compiled, so they are not executable files, but people still want to "run" them.  The shebang line specifies exactly how to run a script.  In other words, this shebang line says that, when I type in `./my_script.py`, the shell will actually run `/usr/bin/env python my_script.py`" - Sam King
4. Write a script that prints "hello world!" when run.
4. Save your file.
5. Run your simple script in the terminal.
    - Go back to your terminal. 
    - Type `ls`. You should see all the notebooks, plus the new file you just made. If not, you probably got confused about paths. Ask a TA for help.
    - Type `chmod 755 <your-script-name>.py`, replacing "your-script-name" with whatever name you gave your script. This command gives your computer permission to actually *run* your new script, rather than just to read it and write to it. You need to do this once for every new executable file you create (including a copy saved under a new name), but not every time you edit an existing one.
        - if you get a "command not found" error, that's fine. You'll just have to type `python <your-script-name>.py` rather than `./<your-script-name>.py` for the rest of the class
    - Type `./<your-script-name>.py`, again replacing "your-script-name" with whatever name you gave your script. Press enter.
    - Try a different way. Type `python <your-script-name>.py`. It still works! If you forget to type the shebang or forget to give executable permissions, you will have to type `python` before your script name every time. This is annoying and should be avoided.
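
If you are curious what the `755` in `chmod 755` means, the sketch below sets the same permissions from python and prints them in the `rwx` notation that `ls -l` uses. It assumes a Mac or Linux machine, and `hello.py` is a hypothetical script name written into a temporary directory. Each digit covers one group of users (owner, group, everyone else): `7` means read + write + execute, and `5` means read + execute.

```python
import os
import stat
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    script = Path(scratch) / "hello.py"   # hypothetical script name
    script.write_text('#!/usr/bin/env python3\nprint("hello world!")\n')

    os.chmod(script, 0o755)               # same effect as: chmod 755 hello.py
    permissions = stat.filemode(script.stat().st_mode)

print(permissions)   # -rwxr-xr-x
```

The `x` entries are what allow `./<your-script-name>.py` to run the file directly.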

### Putting things together: editing and re-running `.py` files in the terminal
1. Edit your script and run it again.
    - Change the output of your print statement.
    - Add a comment to see if it works the same way as in jupyter notebooks
    - Save it
    - Go to the terminal and type `./<your-script-name>.py`
    - Notice how your changes were picked up.
2. In the same file, write code that makes a plot and run it. 

There's a lot more we can do here, like making scripts that take user-defined arguments (similar to functions, but you set the arguments at runtime). These are cool, but beyond the scope of this class. I recommend looking up how to do this when you get to the point in your research career where you're writing code for others to use. A good built-in package for this is `argparse`: https://docs.python.org/3/howto/argparse.html
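
For the curious, here is a minimal sketch of what a script with runtime arguments could look like using `argparse`. The argument names (`x`, `y`, `--verbose`) and the helper names `build_parser` and `main` are invented for this example. So that the block can be run as-is, the arguments are passed in explicitly as a list; in a real script you would call `main()` with no arguments, and `argparse` would read whatever was typed after the script name in the terminal.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Add two numbers from the command line.")
    parser.add_argument("x", type=float, help="first number")
    parser.add_argument("y", type=float, help="second number")
    parser.add_argument("--verbose", action="store_true",
                        help="print a full sentence instead of just the sum")
    return parser


def main(argv=None):
    # argv=None tells argparse to read the arguments typed in the terminal.
    args = build_parser().parse_args(argv)
    total = args.x + args.y
    if args.verbose:
        print(f"{args.x} + {args.y} = {total}")
    else:
        print(total)
    return total


# Equivalent to typing: ./<your-script-name>.py 5 6 --verbose
main(["5", "6", "--verbose"])   # prints: 5.0 + 6.0 = 11.0
```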

### Using `.py` files to house your own libraries
The real advantage of `.py` files over jupyter notebooks is their _portability_. Because of their simplicity, you can easily send collaborators little snippets of code or long scripts (via github or similar services) in the `.py` format, whereas jupyter notebooks are really just too bulky to be sent around. Collaborating with others on a project is also basically impossible with jupyter notebooks, but is really straightforward with the `.py` file format.

Importantly, the `.py` format allows you to use the same code over and over again in different projects without having to copy and paste that code into all of your notebooks. Imagine if every time you wanted to use a function from `numpy`, you had to copy all 770,945 lines of the source code into your jupyter notebook! That would suck, especially because half of the numpy code is actually in `C`, a completely different programming language that jupyter doesn't know how to deal with. We don't have to do this because `numpy` has been made into a [library](https://librarycarpentry.org/lc-python-intro/06-libraries/index.html). Now we will show you how to use the `.py` format to create your own libraries and [modules](https://www.analyticsvidhya.com/blog/2021/07/working-with-modules-in-python-must-known-fundamentals-for-data-scientists/#What-are-Python-Modules?). (A library is a collection of modules. A module is a `.py` file.)

1. Open a **new** file in BBEdit or your preferred text editor.
2. Add the python shebang.
3. Write a function named `adder` that takes in two numbers as arguments and returns their sum.
```python
#!/usr/bin/env python3
def adder(x,y):
    return x+y
```
4. Save your new file: give it a `.py` extension and put it in the `notebooks` directory. Name it whatever you want, but make sure there are no spaces and no special characters (besides underscore `_`)

5. Go back to your terminal and fire up a new interactive session in your `notebooks` directory by typing `python` (or `python3`, depending on what you had to do before).
6. Try to use your function: type `adder(5,6)`. What output do you get?

7. Just like we do with our other libraries, we need to import our module to use its contents. Try `import <your_script_name>` but replace "`<your_script_name>`" with your actual script name. **Do not include .py**
8. Now try to use your function, similarly to how you use functions from other modules: `<your_script_name>.adder(5,6)`. Again, replace "`<your_script_name>`" with your actual script name. Ask TAs if this step doesn't work.

9. Writing the module name before the function you want to use can be tiresome sometimes, especially when you only want a few functions from your module. Try this instead:
```
>>> from <your_script_name> import adder
>>> adder(5,6)
11
```
Pretty nifty, right?
10. Import your module into this notebook and then try to run `adder` in the cells below.

In [None]:
# import here, replacing ? with your script name
import ?

In [None]:
# use here, replacing the ? as necessary
?.adder(?,?)

11. Look at Amanda's screen to see an example of how you can create your own modules to house code you want to use in multiple projects.
12.  Go back to BBEdit and edit your script to make the `adder()` function print something before returning the output. Save your changes.
```python
#!/usr/bin/env python3
def adder(x,y):
    print("I love coding!")
    return x+y
```
13. Run `adder(5,6)` in the cell below. Did it print anything?
```python
my_module.adder(5,6)
## hmm that doesn't work. maybe:
import my_module
my_module.adder(5,6)
# still doesn't work!
```

In [None]:
# use your function again here

The reason for this behavior is the same reason why interactive python and jupyter notebooks work in the first place.
In order to provide users with an interactive experience, modules remain "alive" even when they are not being actively called. (This is in contrast to compiled languages like C and Fortran, which aren't capable of running interactively in the style we've used in this course.) For example, a numpy array doesn't get erased from one cell of a jupyter notebook to another. Instead, it remains alive in your kernel. The same is true for modules. When you re-import a module, python sees the line and recognizes the module since the module is still alive in the kernel. Because of this, it doesn't re-read the files associated with the module, since it thinks it's already read them. Thus, the changes don't take effect.

Therefore, to get the changes we made, we need to restart the kernel and re-run the cells. 
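
The behavior described above can be reproduced in a small, self-contained sketch. It writes a made-up module called `demo_module` into a temporary directory, imports it, edits the file, and imports it again. It also uses the standard-library function `importlib.reload`, which is not part of today's exercise but is another way to force python to re-read one specific module without restarting anything.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip python's compiled-file cache so this demo depends only on the source file.
sys.dont_write_bytecode = True

# Create a made-up module in a temporary directory and make it importable.
module_dir = Path(tempfile.mkdtemp())
module_file = module_dir / "demo_module.py"
module_file.write_text("def adder(x, y):\n    return x + y\n")
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

import demo_module
first = demo_module.adder(5, 6)

# "Edit" the module on disk, the way you would in your text editor.
module_file.write_text("def adder(x, y):\n    return (x + y) * 100\n")

import demo_module                 # does nothing: the module is already alive
after_reimport = demo_module.adder(5, 6)

importlib.reload(demo_module)      # forces python to re-read the file
after_reload = demo_module.adder(5, 6)

print(first, after_reimport, after_reload)   # 11 11 1100
```

The second `import` returns the old version of `adder`; only the explicit reload picks up the edit.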

14. Restart your kernel.
15. Re-run your import and usage cells. Does your function call cause a print statement now?
16. Restarting your kernel every time can be somewhat annoying. Here's a workaround for notebooks to manually force python to re-read all modules when they are loaded in a second time. 
    1. Run the below cell
    1. Edit your `adder()` function in the text editor to have it print out a different string.
    1. Import your module again, without restarting your kernel. Do you get the revised output or the original one?

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
# edit your function, then
# re-import your module and use it here
import ?
?.adder(5,6)

#### Plotting

1. Add a plotting function called `plot_line()` to your module by writing it into your script. Don't delete the `adder()` function. Your plotting function should be very simple. Have it take in two arguments and plot them against each other.
1. Remember to load whatever libraries you need! You probably want matplotlib at the very least:
```python
import matplotlib.pyplot as plt
```
You might also want seaborn, numpy, or pandas:
```python
import seaborn as sns
import numpy as np
import pandas as pd
```
These go into your external script, not your notebook. If you want to use them again in your notebook, load them separately into your notebook.

1. Use your function in the cells below.
    1. Create two lists of the same length. You could also make two 1-D arrays of the same length instead, if you want.
    1. Reload your module
    1. Call your function and pass your arrays as arguments
    1. Behold your beautiful plot! Then give a ✅
    1. *IF* and only if you can get the above steps to work, and you are just waiting for Amanda to move on to the next step, add some keyword arguments (`**kwargs`) to your function that customize the plot.
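
If you want something to compare against, here is one possible version of the module after adding the plotting function. It is only a sketch: the axis labels and the choice to return the plotted line are assumptions for illustration, and your own version can look quite different. It requires `matplotlib` to be installed.

```python
import matplotlib.pyplot as plt


def adder(x, y):
    return x + y


def plot_line(x, y, **kwargs):
    """Plot y against x; extra keyword arguments are passed straight to plt.plot."""
    lines = plt.plot(x, y, **kwargs)
    plt.xlabel("x")
    plt.ylabel("y")
    return lines


# Example call with two lists of the same length and some optional styling.
example_lines = plot_line([1, 2, 3], [1, 4, 9], color="red", linestyle="--")
```

In a notebook the plot appears under the cell; from the terminal you would still need `plt.show()` to see it.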

In [None]:
# create two lists or 1-D arrays of the same length
x = # fill me in!!
y = # fill me in!!

In [None]:
# reload your module
import ? # replace ? with your module name!

In [None]:
# call your function and pass x and y as arguments


Now we will try changing the inputs to your function and seeing what we get out. Replace the `?` with your module name and then run the following cell.

In [None]:
import numpy as np
x = np.linspace(5,30)
?.plot_line(x, x**2)
?.plot_line(x, 3*x)

See how easy it is to use functions multiple times? You might want to make similar plots in multiple projects, so this is how you would do it.

That's it for this part of the lesson! We will now go over to some slides to conclude the course :)