## Review from the previous lecture

The previous lecture covered basic mathematical operations, variables, and lists. Last lecture also introduced you to conditional statements, loops, and basic plotting using matplotlib. Before moving forward, here is a quick review.

 **Loops** with **dummy variables** are useful to iterate over lists and perform operations on each element.

Conditional statements like **if** and **else** are used to implement more complex logic.

The other kind of loop that you might encounter more often while working in Python is the `for` loop. A for-loop has the dummy variable "built in," in a sense.

The `range` function can be iterated over to produce a regular sequence of numbers.

`range` can be used as `range(end_)`, or as `range(start_, end_)`, or as `range(start_, end_, step_)`. Here's the [official documentation](https://docs.python.org/3/library/functions.html#func-range) as well as an [easier-to-read explainer webpage](https://www.w3schools.com/python/ref_func_range.asp).
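
As a quick illustration of the three call forms, here is a minimal sketch. The variable names are made up for this example, and `list()` is used only so the values can be displayed.

```python
# range(end_): starts at 0 and stops before end_
up_to_five = list(range(5))

# range(start_, end_): starts at start_ and stops before end_
two_to_five = list(range(2, 6))

# range(start_, end_, step_): same, but advances by step_ each time
by_threes = list(range(1, 10, 3))

print(up_to_five)   # [0, 1, 2, 3, 4]
print(two_to_five)  # [2, 3, 4, 5]
print(by_threes)    # [1, 4, 7]
```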

Finally, recall that the **matplotlib** module can be used to plot data.

First, import the module - this is always the first step whenever you're using a python module, but it's easy to forget! Then, we use a magic command that makes the figure appear within the cell.

# Lecture 2 - NumPy, Functions, and Data

Today, we will learn about NumPy, learn how to define our own functions, and learn about handling data in Python.

1. An introduction to **numpy** and **numpy arrays** and a discussion of their usefulness in solving real problems
2. **Functions** and how to define new ones. 
3. Reading in **data** from text and numpy file formats, along with creating your own outputs to be used later
    

## A. Introduction to Numpy Arrays - Initialization and Advanced Indexing

The Python `list` is a fast and flexible built-in data type, but because of its flexibility, it is limited in the operations we can perform on it. A popular scientific computing package called `numpy`, short for "numerical Python", can help by way of its incredibly powerful object type: the `array`.

In [None]:
c = list(range(10))
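
A minimal sketch of turning that list into a numpy array with `np.array`, and of one operation that behaves differently on the two types. The name `c_array` is an assumption made for this example.

```python
import numpy as np

c = list(range(10))
c_array = np.array(c)  # convert the list into a numpy array

print(type(c))        # <class 'list'>
print(type(c_array))  # <class 'numpy.ndarray'>

# Multiplying a list by 2 repeats it; multiplying an array by 2 doubles each element.
print(len(c * 2))     # 20
print(c_array * 2)    # [ 0  2  4  6  8 10 12 14 16 18]
```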

<a id='ranges'></a>
There are a few easier ways to create arrays besides creating a list and turning it into a numpy array. These include:
* `np.arange(start_,stop_,step_)`
* `np.linspace(first_,last_,num_)`

(And the accompanying official documentation pages: [numpy.arange](https://numpy.org/doc/stable/reference/generated/numpy.arange.html), [numpy.linspace](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html#numpy.linspace))

These create arrays of numbers within a range with a fixed step-size between each consecutive number in the array.

`np.arange(start_,stop_,step_)` works just like the `range` function we introduced at the beginning of this lesson! But instead of the mysterious `range` object type, the numpy function returns a nice, neat numpy array.
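
The difference between the two functions is easiest to see side by side. This is a small sketch with illustrative values: `np.arange` takes a step size and excludes the stop value, while `np.linspace` takes a number of points and includes both endpoints by default.

```python
import numpy as np

# Step of 2, stopping before 10
stepped = np.arange(0, 10, 2)

# Five evenly spaced points from 0 to 1, endpoints included
spaced = np.linspace(0, 1, 5)

print(stepped)  # [0 2 4 6 8]
print(spaced)   # [0.   0.25 0.5  0.75 1.  ]
```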

<a id='empty'></a>
Sometimes it is handy to create an array of all constant values, which can then be replaced later with data. This can be done in several ways by using the following commands:

* [`np.zeros(size_)`](https://numpy.org/doc/stable/reference/generated/numpy.zeros.html) To fill the array with zeros
* [`np.ones(size_)`](https://numpy.org/doc/stable/reference/generated/numpy.ones.html) To fill the array with ones
* [`np.empty(size_)`](https://numpy.org/doc/stable/reference/generated/numpy.empty.html) To fill the array with arbitrary values

These create arrays of the given size, filled with zeros, ones, or arbitrary values, depending on your specific needs. Great for initializing an array to store important data in later!
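
Here is a short sketch of the three initializers; the sizes and names are arbitrary choices for this example. Note that the contents of an `np.empty` array are whatever happened to be in memory, so it should be filled before its values are used.

```python
import numpy as np

zeros = np.zeros(4)    # [0. 0. 0. 0.]
ones = np.ones(3)      # [1. 1. 1.]
scratch = np.empty(5)  # five arbitrary values; do not rely on them

# Overwrite the arbitrary contents before using the array
scratch[:] = 7.0

print(zeros)
print(ones)
print(scratch)         # [7. 7. 7. 7. 7.]
```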

<a id='indexing'></a>
We can also assign new values to elements of existing arrays, using the following "square bracket" notation. This is the same as the list indexing we taught you in Lecture #1!
> `array_name[index_number] = value` 

This command will replace whatever value is currently in the position corresponding to ``index_number`` in the array called ``array_name`` with the value stored in ``value``.

Recall that arrays are numbered starting from 0, such that

* Index of first position = 0
* Index of second position = 1
* Index of third position = 2
* etc.


**Negative indexing** is the same as normal indexing, but backward, in the sense that you start with the last element of the array and count back toward the first.  More explicitly, for any array:

* array[-1] = last element of array
* array[-2] = second to last element of the array
* array[-3] = third to last element of the array
* etc
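
The indexing rules above can be tried out on a small array. This is a sketch with made-up values.

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50])

# Replace the element in the second position (index 1)
values[1] = 99

first = values[0]             # index 0 is the first element
last = values[-1]             # index -1 is the last element
second_to_last = values[-2]   # index -2 is the second to last element

print(values)                 # [10 99 30 40 50]
print(first, last, second_to_last)  # 10 50 40
```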

Sometimes it's useful to access more than one element of an array. Let's say that we have an array spanning the range [0,10] (including endpoints), with a step size of 0.1. If you recall, this can be done via the `np.linspace()` or `np.arange()` functions.

<a id='slicing'></a>
In order to get a range of elements in an array, rather than simply a single one, use **array slicing**:

* `array_name[start_index:end_index]` To grab all of the values from `start_index` to `end_index - 1`
* `array_name[:end_index]` To grab all of the values up to `end_index-1`
* `array_name[start_index:]` To grab all of the values from `start_index` and beyond

In this notation, ":" means you want everything between your start and end indices, including the value to the left but excluding the value to the right.
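
As a sketch of slicing, the example below builds the array described earlier (0 to 10 inclusive, step 0.1) with `np.linspace` and takes three slices. The slice boundaries are arbitrary choices for illustration.

```python
import numpy as np

# 101 points from 0 to 10 inclusive, so consecutive points differ by 0.1
x = np.linspace(0, 10, 101)

first_five = x[:5]   # indices 0 through 4
middle = x[10:20]    # indices 10 through 19 (index 20 is excluded)
tail = x[95:]        # index 95 through the end of the array

print(len(first_five), len(middle), len(tail))  # 5 10 6
```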

## B. Functions

### More Numpy functions

The previous section introduces a built-in Python function, `range`, as well as a couple `numpy` module functions, `np.arange` and `np.linspace`, but there are many, many more functions that you'll encounter.
Functions are the most fundamental way to process data in Python; they take some inputs, which they may alter, and they (usually) return a result.

\begin{gather}
function(input) \rightarrow output
\end{gather}

<a id='built-in'></a>
In general, most common mathematical functions like sqrt, log, exp, sin, cos can be found in the numpy module.

For more information, the documentation of many **built-in functions** that can be applied to integers and floats (i.e. single numbers), as well as **numpy arrays**, can be found here: https://docs.scipy.org/doc/numpy/reference/routines.math.html
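
A brief sketch of what "applied to single numbers as well as numpy arrays" means in practice. The input values are illustrative.

```python
import numpy as np

# Applied to a single number, the function returns a single number
root = np.sqrt(16.0)

# Applied to an array, the function acts on every element at once
angles = np.array([0.0, np.pi / 2, np.pi])
sines = np.sin(angles)

print(root)   # 4.0
print(sines)  # approximately [0, 1, 0], up to floating-point rounding
```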

When in need of a mathematical function like one of the above, or even a more complicated one (like the "erf" function, or "Riemann zeta function", if you've ever encountered those), **never** spend time writing it yourself. Nearly every mathematical function one can think of already exists in some well-documented module. Check [numpy](https://numpy.org/doc/stable/reference/) and [scipy](https://docs.scipy.org/doc/scipy/reference/); google search "numpy `<function name>`" or "scipy `<function name>`" and you should find your desired function.

If `numpy` and `scipy` don't have your function, it is likely that someone has posted it on StackExchange or StackOverflow or somewhere.
Google search "python `<function name>`".

Copy+paste from the Internet is a very efficient way to code, and we do it all the time.
If you're worried about things like plagiarism, you should paste the link to the webpage where you found the code as a comment in your notebook.
Then you've cited your source!

### Defining your own functions

Python allows you to define your own functions, too.
**User-defined functions** allow you to clean up your code and apply the same set of operations to multiple variables.
Organizing your code into functions is also a good way to write code that other people will use.

<a id='user-def'></a>
The outline for a function is:
```python
def <function name> (<input variables>):
    <some code here>
    return <output variables>
```

When defining your own functions, you can also use **multiple input variables**.
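
For instance, a function with two inputs might look like the sketch below. The function `kinetic_energy` and its argument names are hypothetical, chosen only to illustrate the syntax.

```python
def kinetic_energy(mass, velocity):
    # Combine the two inputs into a single output: (1/2) * m * v^2
    return 0.5 * mass * velocity**2

energy = kinetic_energy(2.0, 3.0)
print(energy)  # 9.0
```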


<a id='multi-line'></a>
A note about comments: it's good Python etiquette to comment the functions you write.
Even if you never intend to share your code with anyone else, it is extremely useful to write yourself reminders about what your functions do.
You can even include an example (this is pretty common in professional documentation).

One common way to document the functions you write is with a "**multi-line comment**" (formally called a docstring) immediately following the `def` line and indented to match the function body.
A multi-line comment starts and ends with three double quotation marks.

In [None]:
def my_function(arg):
    """
    This is my function. It's for doing this one important thing.
    This function needs one argument, which should be a single number.
    The function will return another number.

    Example:
    >>> my_function(3)
    7
    """
    return arg + 4


## C. Loading And Saving Data Arrays

Up to now, the data that you have handled has been self-defined: you have constructed an array and filled that array with values that you operate on, all in the same code. Often, in scientific programming, this is not the case. One program or piece of equipment creates and stores data, while another loads, operates on, and analyzes it. Thus it is essential to learn the ways that one can **load** and **save** data in Python.

<a id='loadtxt'></a>
The simplest way to **load data from a plain-text file** with numpy is using [`numpy.loadtxt`](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html). At its simplest, the usage is:
* data_array_name = `np.loadtxt('path_to_file')`

  (Note that the path should be listed as a string!)
  
From the documentation page, you can see there are a lot of **optional arguments** that let you specify more precisely how you want to read the data, but at its most basic, the function will work the way we have shown above.
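
Here is a self-contained sketch of `np.loadtxt`. Since no data file is assumed to exist, the example first writes a small hypothetical file (`example_data.txt`, with made-up numbers) into a temporary folder and then loads it. It also shows one optional argument, `usecols`, which selects a single column.

```python
import os
import tempfile

import numpy as np

# Write a small two-column text file to load (hypothetical contents)
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, "example_data.txt")
with open(file_path, "w") as f:
    f.write("# time value\n")  # lines starting with '#' are skipped by default
    f.write("0.0 1.5\n")
    f.write("1.0 2.5\n")
    f.write("2.0 3.5\n")

# Basic usage: the path is passed as a string
data = np.loadtxt(file_path)
print(data.shape)  # (3, 2)

# Optional argument: read only the first column
times = np.loadtxt(file_path, usecols=0)
print(times)       # [0. 1. 2.]
```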

<a id='path'></a>
A file's **path** is its specific location in the file structure of your computer. This is most often defined relative to your current place in the file structure. To specify the folder you are currently working in (your **current working directory**), use a single period followed by a forward slash. To specify a subfolder or file within a folder use the name of the subfolder or file, always following a folder name with a forward slash.

* `./` = Path to the current folder that you are working in.
* `./subfolder1/` = Path to a folder that exists inside of your current folder, named "subfolder1"
* `./subfolder1/myfile` = Path to a file that exists inside of a folder that exists inside of your current folder
* `../` = Path to the folder that contains the folder you are currently working in

<a id='matrix'></a>
By convention, we first specify the row index followed by the column index.
- `array_name[n,:]` is the n-th row, and all columns within that row.
- `array_name[:,n]` is the n-th column, and all rows within that particular column.
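
A small sketch of row and column selection on a 2D array; the array `grid` and its values are made up for illustration.

```python
import numpy as np

# Two rows, three columns
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

second_row = grid[1, :]    # row index 1, all columns
first_column = grid[:, 0]  # all rows, column index 0
single = grid[0, 2]        # row index 0, column index 2

print(second_row)    # [4 5 6]
print(first_column)  # [1 4]
print(single)        # 3
```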

<a id='conditional'></a>
In order to choose only the parts of an array that meet some criteria, you can use **conditional indexing** in place of normal indices. This involves taking a **conditional statement** (more on those later) and testing whether it evaluates to True on each element in the array.

This gives an array of Booleans, which you can use as **logical indices** to select *only* the entries for which the logical statement is `True`. 
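
The two steps, building the Boolean array and then indexing with it, can be sketched as follows. The array `readings` and the threshold of 10 are illustrative assumptions.

```python
import numpy as np

readings = np.array([3, 12, 7, 25, 10])

# Step 1: the condition is tested on every element, giving an array of Booleans
mask = readings > 10
print(mask)   # [False  True False  True False]

# Step 2: use the Booleans as indices to keep only the True entries
large = readings[mask]
print(large)  # [12 25]
```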

<a id='save'></a>
Here we cover two main ways to **save data files** for use again later, one that is Python-specific, and the other a simple text format.

* [`np.save('file_path', array_name)`](https://numpy.org/doc/stable/reference/generated/numpy.save.html) - Creates a `.npy` file (Python readable only!)
* [`np.savetxt('file_path', array_name)`](https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html#numpy.savetxt) - Creates a `plain text` (or `.txt`) file (more generally readable)

The basic syntax is pretty much the same for each. What differs is the type of file that the functions create. Below is an example of how each function can be called to store the same data.

After saving a data file, you can load it up again using `np.loadtxt()` and `np.load()` for .txt and .npy files respectively. 
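
A minimal round-trip sketch: the same array is saved in both formats and then loaded back. The file names and the temporary folder are assumptions made so the example runs anywhere; in practice you would use your own path.

```python
import os
import tempfile

import numpy as np

out_dir = tempfile.mkdtemp()
npy_path = os.path.join(out_dir, "example.npy")
txt_path = os.path.join(out_dir, "example.txt")

array_to_save = np.array([[1.0, 2.0],
                          [3.0, 4.0]])

# Save the same data in both formats
np.save(npy_path, array_to_save)     # binary .npy file
np.savetxt(txt_path, array_to_save)  # plain text file

# Load each file with the matching function
from_npy = np.load(npy_path)
from_txt = np.loadtxt(txt_path)

print(np.array_equal(from_npy, array_to_save))  # True
print(np.allclose(from_txt, array_to_save))     # True
```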

## D. Loading data files automatically

Let's combine what we learned about loops to make our data workflow more efficient. Suppose we have a set of data saved in separate text files that we would like to load automatically. For example, in `./lecture2_data/` you will find files `c1.dat`, `c2.dat`, `c3.dat`, `c4.dat`, `c5.dat`, `c6.dat`. 

Rather than loading each of these files individually, you can use a for (or while) loop, constructing a string at each iteration corresponding to each of these files. 

In Python you can use `+` to concatenate strings together. Here's an example. 

In [None]:
first_string = 'a'
second_string = 'b'
print(first_string + second_string)

ab


You can also cast an integer to a string using the `str` command.

In [None]:
first_string = 'a'
second_string = str(1)
print(first_string + second_string)

a1


You can use these "string manipulation" techniques to automatically generate filenames to quickly save multiple arrays to individual files.
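
Putting these pieces together, here is a sketch of building filenames in a loop to save and then load a set of files. It follows the `c1.dat` through `c6.dat` naming pattern, but it writes stand-in files with made-up contents to a temporary folder rather than assuming `./lecture2_data/` is present.

```python
import os
import tempfile

import numpy as np

data_dir = tempfile.mkdtemp()  # stand-in for a real data folder

# Save six small arrays to c1.dat ... c6.dat (contents are placeholders)
for i in range(1, 7):
    filename = 'c' + str(i) + '.dat'
    np.savetxt(os.path.join(data_dir, filename), np.full(3, float(i)))

# Load them back, building each filename the same way
loaded = []
for i in range(1, 7):
    filename = 'c' + str(i) + '.dat'
    loaded.append(np.loadtxt(os.path.join(data_dir, filename)))

print(len(loaded))  # 6
print(loaded[0])    # [1. 1. 1.]
```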

So, to summarize, not only can you manipulate arrays, but now you can save them and load them. In a way, those are some of the most important skills in scientific computing. Almost everything you'll be doing requires you know this, and now that you've mastered it, you're well on your way to being an expert in computational physics and astronomy!

# Summary/References
<a id='summary'></a>

## Arrays
* Create [arrays](#array) from lists using [`np.array(listname)`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)
* Create [evenly spaced arrays](#ranges) of numbers using:
  * [`np.arange(start, stop, stepsize)`](https://numpy.org/doc/stable/reference/generated/numpy.arange.html)
  * [`np.linspace(start, stop, num_points)`](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html)
* Create ["empty" arrays](#empty) using:
  * [`np.zeros(size)`](https://numpy.org/doc/stable/reference/generated/numpy.zeros.html) for an array of all 0
  * [`np.ones(size)`](https://numpy.org/doc/stable/reference/generated/numpy.ones.html) for an array of all 1
  * [`np.empty(size)`](https://numpy.org/doc/stable/reference/generated/numpy.empty.html) for a quickly generated array of nonspecific values
* [Index](#indexing) and [slice](#slicing) your arrays using brackets, colon notation, and [conditional statements](#conditional), e.g.:
  * `myarray[0]` for the first (zeroth) element
  * `myarray[-1]` for the last element
  * `myarray[:]` for the whole array
  * `myarray[5:]` for everything from index 5 (the sixth element) onward
  * `myarray[:20]` for everything up to but not including index 20
  * `myarray[myarray > 10]` for all values greater than 10 in your array
  * `mymatrix[:, 0]` for all values in the first column of a [matrix](#matrix) (2D array)
  * See [here](https://stackoverflow.com/a/4729334) for another series of explanations of slicing and indexing lists/arrays in Python!
* Determine the [size and shape](#shape) of your array using [`myarray.shape`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.shape.html)


## Functions
* Use (and find) [built-in](#built-in) numpy functions and variables
  * [`np.sin(x)`](https://numpy.org/doc/stable/reference/generated/numpy.sin.html), [`np.cos(x)`](https://numpy.org/doc/stable/reference/generated/numpy.cos.html)
  * [`np.exp(x)`](https://numpy.org/doc/stable/reference/generated/numpy.exp.html), [`np.log(x)`](https://numpy.org/doc/stable/reference/generated/numpy.log.html)
  * [`np.sqrt(x)`](https://numpy.org/doc/stable/reference/generated/numpy.sqrt.html)
  * [`np.pi` (and other constants)](https://numpy.org/doc/stable/reference/constants.html)
* [Create](#user-def) functions using the following format:
```python
def <function name> (input1, input2, ...):
    <some code here>
    return output1, output2, ...
```
* Make [multi-line comments](#multi-line) surrounded by `"""` triple quotes
  * See the first answer at [this page](https://stackoverflow.com/questions/7696924/is-there-a-way-to-create-multiline-comments-in-python) for an example.

## Loading & Saving Data
* Define a [path to a file](#path) with notation like `"./subfolder/filename"`, where `.` means your current directory
  * See [this page](https://docs.oracle.com/javase/tutorial/essential/io/path.html) for a brief description of paths (written for java, but still applicable!)
  * See the [pathlib](https://docs.python.org/3/library/pathlib.html) module for how it's supposed to be done in Python3, (and [this](https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f) beginner-friendly article) if you're interested!
* [Load data](#loadtxt) from a text file with [`np.loadtxt(FilePath)`](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html)
* [Save data](#save) to a text file with [`np.savetxt(FilePath, ArrayName)`](https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html)
* (And do the same with `.npy` [Python specific files](#save) using [`np.load`](https://numpy.org/doc/stable/reference/generated/numpy.load.html) and [`np.save`](https://numpy.org/doc/stable/reference/generated/numpy.save.html) and the same arguments)
* Use lists and loops to load several files with regular naming schemes
  * See final exercise above