# Contents



1.   [**Review**](#Review)
2.   [**Intro to Numpy Arrays**](#intro)
  
  a. Negative Indexing [Exercise](#neg_ex)

  b. Ranges [Exercise](#range_ex)

  c. Indexing & Slicing [Exercises](#slice_ex)
3. [**Functions in Numpy**](#numpy_funcs)

  a. Numpy Trig Function [Exercises](#trig_ex)
4. [**User-Defined Functions**](#user_funcs)

  a. Path Length Function [Exercises](#ex_pathlen)
5. [**Loading and Saving Numpy Arrays**](#load_save)

  a. Time-Series Data [Exercises](#ex_timeseries)
6. [**Loading Data "Automatically"**](#sec_load)
7. [**Summary and Further Resources!**](#summary)

## Review from the previous lecture
<a id='Review'></a>


The previous lecture covered basic mathematical operations, variables, and lists. Last lecture also introduced you to conditional statements, loops, and basic plotting using matplotlib. Before moving forward, here is a quick review.

*Instructor:* First, let's make a list to play with.

In [None]:
ourList = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

 **Loops** with **dummy variables** are useful to iterate over lists and perform operations on each element.

*Instructor:* For example, say we want to print `ourList` with each element multiplied by 10. We can use a **while** loop to do this:

In [None]:
i = 0
while i < len(ourList):
    num = ourList[i] * 10
    print(num)
    i = i+1

Conditional statements like **if** and **else** are used to implement more complex logic.

*Instructor:* What if we wanted to print out all the elements in `ourList` that are smaller than 5? We can use conditional statements to do this:

In [None]:
i = 0
while i < len(ourList):
    num = ourList[i]
    if num < 5:
        print(num)
    else:
        print("I only print numbers less than 5!")
    i = i+1

The other kind of loop that you might encounter more often while working in Python is the `for` loop. A for-loop has the dummy variable "built in," in a sense.

In [None]:
for num in ourList:
    print(num)
    #Edited to print only less than 5 (remove above print statement)
    if num < 5:
        print(num)
    else:
        print("I only print numbers less than 5!")

*Instructor*: Try editing the above code to make it only print the numbers in `ourList` that are less than 5!

*Instructor:* You might find yourself wanting a regular sequence of numbers, like what we wrote into `ourList`, but without handwriting them yourself. You can use the built-in Python `range` function.

The `range` function can be iterated over to produce a regular sequence of numbers.

`range` can be used as `range(end_)`, or as `range(start_, end_)`, or as `range(start_, end_, step_)`. Here's the [official documentation](https://docs.python.org/3/library/functions.html#func-range) as well as an [easier-to-read explainer webpage](https://www.w3schools.com/python/ref_func_range.asp).
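As a quick sketch of the three call forms (the specific numbers are arbitrary and chosen only for illustration), wrapping `range` in `list` shows the sequence each form produces:

```python
# range(end_): starts at 0 and stops before end_
one_arg = list(range(5))
print(one_arg)       # [0, 1, 2, 3, 4]

# range(start_, end_): starts at start_ and stops before end_
two_args = list(range(2, 6))
print(two_args)      # [2, 3, 4, 5]

# range(start_, end_, step_): counts up in jumps of step_
three_args = list(range(1, 10, 3))
print(three_args)    # [1, 4, 7]
```

In every form the end value itself is left out of the sequence.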

*Instructor*: Let's see how our code from above changes if we use range instead of defining the whole list by ourselves!

In [None]:
for num in range(10):
    print(num)

Now try it yourself!

*Instructor:* Try using a for-loop and the `range` function to print even numbers between 10 and 20 (you don't have to include 20).

In [None]:
#Your code goes here
for num in range(10, 20, 2):
    print(num)

Finally, recall that the **matplotlib** module can be used to plot data.

First, import the module - this is always the first step whenever you're using a python module, but it's easy to forget! Then, we use a magic command that makes the figure appear within the cell.

*Instructor:* Say we wanted to plot each element of `ourList` versus the square of each element. 

In [None]:
import matplotlib.pyplot as plt  #importing module into python and naming it plt
%matplotlib inline 

squareList = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

plt.plot(ourList,squareList)
plt.show()

# Lecture 2 - NumPy, Functions, and Data

Today, we will learn about NumPy, learn how to define our own functions, and learn about handling data in Python.

1. An introduction to **numpy** and **numpy arrays** and a discussion of their usefulness in solving real problems
2. **Functions** and how to define new ones. 
3. Reading in **data** from text and numpy file formats, along with creating your own outputs to be used later
    

## A. Introduction to Numpy Arrays - Initialization and Advanced Indexing
<a id='intro'></a>

The Python `list` is a fast and flexible built-in data type, but because of its flexibility, it is limited in the operations we can perform on it. A popular scientific computing package called `numpy`, short for "numerical Python", can help by way of its incredibly powerful object type: the `array`.

*Instructor*: As we learned in the previous lecture, Python is a modular language. We can add tools and functionality as we need them, in the same way that we imported **matplotlib**. Today we'll be learning about a module called `numpy` (short for "numerical Python"). 

*Instructor:* First, let's create a regular old list to play with. **What do you think the code below will do?**

*Instructor*: Print out `c` to find out.

In [None]:
c = list(range(10))

*Instructor*:

**Poll:** What do you think the next block of code will do?

A. Square each element of the list

B. Cross product of c with itself

C. Dot product of c with itself

**D. Give an error**

Let's find out! 

**POLL** -- Wait for the instructor to open polling before running the next block of code!

In [None]:
d = c**2
print(d)

*Instructor:* It doesn't work! Let's take a minute to understand this error message and why regular Python lists don't have this functionality. Next, let's convert our list to a NumPy array and see what happens.

First, import the numpy module. We typically abbreviate it as `np`.

To convert the list to a numpy array, use `np.array()`.
<a id='array'></a>

In [None]:
# Import the numpy module
import numpy as np

In [None]:
# convert the list c to an array
c = np.array(c)

*Instructor*: **What will happen if we try to perform the same operation on our array that previously threw an error with our list?**

Use same poll as before

**POLL** -- Wait for the instructor to open polling before running the next block of code!

In [None]:
d = c**2
print(d)

*Instructor*: **What was the result of the above code?** (Discuss with the students)
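To make the contrast concrete, here is a minimal sketch using a small made-up list (the names `numbers`, `repeated`, and `squared` are just for illustration). It shows that arithmetic operators mean something different for lists than for arrays:

```python
import numpy as np

numbers = [0, 1, 2, 3]          # a plain Python list
as_array = np.array(numbers)    # the same values as a numpy array

# For a list, * means "repeat the list", not "multiply each element"
repeated = numbers * 2
print(repeated)                 # [0, 1, 2, 3, 0, 1, 2, 3]

# For an array, arithmetic is applied to every element separately
squared = as_array ** 2
print(squared)                  # [0 1 4 9]
```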

<a id='ranges'></a>
There are a few easier ways to create arrays besides creating a list and turning it into a numpy array. These include:
* `np.arange(start_,stop_,step_)`
* `np.linspace(first_,last_,num_)`

(And the accompanying official documentation pages: [numpy.arange](https://numpy.org/doc/stable/reference/generated/numpy.arange.html), [numpy.linspace](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html#numpy.linspace))

These create arrays of numbers within a range with a fixed step-size between each consecutive number in the array. You can try these out below.

`np.arange(start_,stop_,step_)` works just like the `range` function we introduced at the beginning of this lesson! But instead of the mysterious `range` object type, the numpy function returns a nice, neat numpy array.

*Instructor*: Numpy has excellent documentation, so let's take a look back at those manual pages before we do our examples. Learning to read documentation is a valuable skill in every programmer's toolbox; if you don't know how to do something, you need to know how to look it up! (Help them read the np.arange documentation page)

In [None]:
np.arange(0, 10, 1)

In [None]:
np.linspace(0, 10, 11)

*Instructor*: **What differences do you notice between the results from running the two previous cells?** (followed by) **How can we make them do the same thing?**
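One possible way to see the difference is sketched below (the variable names are illustrative). The key point is that `np.arange` leaves out its stop value, while `np.linspace` includes its last value by default:

```python
import numpy as np

a = np.arange(0, 10, 1)      # stop value 10 is NOT included: 10 elements
b = np.linspace(0, 10, 11)   # last value 10 IS included: 11 elements

print(len(a), a[-1])         # 10 9
print(len(b), b[-1])         # 11 10.0

# One way to make them match: move arange's stop value past 10
a_matched = np.arange(0, 11, 1)
print(np.allclose(a_matched, b))   # True
```

Note also that `np.arange` returned integers here, while `np.linspace` returned floats.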

<a id='empty'></a>
Sometimes it is handy to create an array of all constant values, which can then be replaced later with data. This can be done in several ways by using the following commands:

* [`np.zeros(size_)`](https://numpy.org/doc/stable/reference/generated/numpy.zeros.html) To fill the array with zeros
* [`np.ones(size_)`](https://numpy.org/doc/stable/reference/generated/numpy.ones.html) To fill the array with ones
* [`np.empty(size_)`](https://numpy.org/doc/stable/reference/generated/numpy.empty.html) To fill the array with arbitrary values

These create arrays of the given size, filled with zeros, ones, or arbitrary values, depending on your specific needs. Great for initializing an array to store important data in later!

*Instructor*: Let's say we want to store 10 numbers in a numpy array for easy access in the future. To ready such an array, we call the function `np.zeros()` with the size of the array as an input variable.

In [None]:
data = np.zeros(10)
print(data)

*Instructor*: Retry the above block of code using `np.ones` or `np.empty`. **What does np.empty do? Is one of these methods better than another for any situation you can think of?**
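A minimal sketch of how the three initializers behave (the size 4 and the fill value 7.0 are arbitrary choices for illustration):

```python
import numpy as np

zeros = np.zeros(4)    # every element is 0.0
ones = np.ones(4)      # every element is 1.0
blank = np.empty(4)    # contents are unpredictable until you assign them

print(zeros, ones)
print(blank.shape)     # the array has the requested size, whatever its values

# np.empty is only safe if every element is overwritten before it is read
blank[:] = 7.0
print(blank)           # [7. 7. 7. 7.]
```

`np.empty` skips the step of filling in values, so it can be slightly faster for large arrays, but `np.zeros` is the safer default when you are not sure every element will be assigned.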

<a id='indexing'></a>
We can also assign new values to elements of existing arrays, using the following "square bracket" notation. This is the same as the list indexing we taught you in Lecture #1!
> `array_name[index_number] = value` 

This command will replace whatever value is currently in the position corresponding to ``index_number`` in the array called ``array_name`` with the value stored in ``value``.

Recall that arrays are numbered starting from 0, such that

* Index of first position = 0
* Index of second position = 1
* Index of third position = 2
* etc.


*Instructor*: Try it out yourself below by changing and printing different indices in our `data` array defined above.

In [None]:
data[0] = 5  # replace 5 with any value you like
print(data[1])

*Instructor*: Let's say you wanted the last element of the array, but you don't recall the size of the array. One of the easiest ways to access that element  is to use negative indexing.

**Negative indexing** is the same as normal indexing, but backward, in the sense that you start with the last element of the array and count backward toward the first.  More explicitly, for any array:

* array[-1] = last element of array
* array[-2] = second to last element of the array
* array[-3] = third to last element of the array
* etc
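As a small sketch with made-up values, a negative index `-k` picks out the same element as the ordinary index `len(array) - k`:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
n = len(arr)

# Counting from the end: -1 is the last element, -2 the one before it
print(arr[-1], arr[n - 1])   # 50 50
print(arr[-2], arr[n - 2])   # 40 40
```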

*Instructor*: Now let's create an array using `np.arange()` with 10 elements, and see if you can access the last element and the second to last element using negative indexing and print out those values.

### **Exercise**: Negative Indexing
<a id='neg_ex'></a>


Create an array with 10 elements using `np.arange()` and print out the last and second-to-last elements using negative indexing!

In [None]:
#Your code goes here
array = np.arange(0, 10, 1)
print(array)

print(array[-1])
print(array[-2])

Sometimes it's useful to access more than one element of an array. Let's say that we have an array spanning the range [0,10] (including endpoints), with a step size of 0.1. If you recall, this can be done via the `np.linspace()` or `np.arange()` functions.

<a id='slicing'></a>
In order to get a range of elements in an array, rather than simply a single one, use **array slicing**:

* `array_name[start_index:end_index]` To grab all of the values from `start_index` to `end_index - 1`
* `array_name[:end_index]` To grab all of the values up to `end_index-1`
* `array_name[start_index:]` To grab all of the values from `start_index` and beyond

In this notation, ":" means you want everything between your start and end indices, including the value to the left but excluding the value to the right.
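The three slicing forms can be sketched on a short array (the values here are made up so that they are easy to tell apart from their indices):

```python
import numpy as np

values = np.arange(10, 70, 10)   # [10 20 30 40 50 60]

middle = values[1:4]    # indices 1, 2, 3 (index 4 is excluded)
front = values[:3]      # indices 0, 1, 2
back = values[3:]       # index 3 through the end

print(middle)           # [20 30 40]
print(front)            # [10 20 30]
print(back)             # [40 50 60]
```

Notice that `values[:3]` and `values[3:]` split the array into two pieces with no overlap and nothing left out.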

*Instructor*: Let's define an array to play with so that we can test this array slicing notation.

### **Exercise**: `np.linspace` and `np.arange`
<a id='range_ex'></a>


Create an array named `x` of values from 0 to 10 (including 10) in steps of 0.1. *(Hint: use `np.arange` or `np.linspace`)*

In [None]:
#Your code here
x = np.linspace(0, 10, 101)
#OR
x = np.arange(0, 10.1, 0.1)


*Instructor*: **Who would like to give an example of how they constructed their array above?** (have the student read out what they did, type it in the code box above, do this for both linspace and arange)

*Instructor*: Based on the descriptions above, for array slicing, **what would each of the following lines of code do?** (Let different students answer, ask whether the bounds are inclusive for each one)

In [None]:
x[1:4]

x[90:]

x[:25]

## **Exercise 1**: Indexing and slicing 
<a id='slice_ex'></a>



So, let's say that you would want everything up to and including the tenth element of the array $x$. How would you do that?

(Remember, the tenth element has an index of 9)

In [None]:
#Your code goes here
x[0:10]
# OR
x[:10]

Now try to select just the first half of the array. 

In [None]:
#Your code goes here
print(x[:50])

Then, pick out the middle sixty elements of the array.

In [None]:
#Your code goes here
print(x[20:80])

Let's try a few more. In the next block of code, perform the following actions on different lines:

* Access all elements of your `x` up to, but not including the 17th element
* Access the last 20 elements of `x`
* Create a new array named `y` that contains the 12th through 38th elements of `x`, including the 38th element


In [None]:
#Your code goes here
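print(x[:16])    # the 17th element has index 16, so the slice stops before it
print(x[-20:])
y = x[11:38]     # the 12th element has index 11; the 38th has index 37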

Finally, how would you get all of the elements in the array using colon notation?

In [None]:
#Your code goes here
x[:]

## B. Functions
<a id='sec_funcs'></a>


### More Numpy functions
<a id='numpy_funcs'></a>

The previous section introduces a built-in Python function, `range`, as well as a couple `numpy` module functions, `np.arange` and `np.linspace`, but there are many, many more functions that you'll encounter.
Functions are the most fundamental way to process data in Python; they take some inputs, which they may alter, and they (usually) return a result.

\begin{gather}
function(input) \rightarrow output
\end{gather}

*Instructor:* You can think about functions like some sort of machine. You read the instructions (documentation), it tells you what to put in, what will happen to what you put in, and what you'll get out. You don't necessarily need to know how it works.

You can use your toaster at home without knowing exactly how the circuitry works or how to build your own toaster, right? It's like a function! You put in bread, and you know that afterwards, your bread will be warm and toasted.

*Instructor:* Let's import `numpy` and try out their square root function. Run the code below.

In [None]:
import numpy as np

print(np.sqrt(25))

*Instructor:* Let's try to use another `numpy` math function. **What's $sin(\pi/2)$?** Print the result below.

You can either try to guess what `numpy` calls this function, or you can Google it.
Both of these are valid approaches that we use all the time.

Numpy defines some useful variables like `pi` and `e`.

### **Exercise:** Numpy trig functions
<a id='trig_ex'></a>


Find a `numpy` function for the mathematical $sin$ function and use it to print the value of $sin(\pi/2)$.

Once you've found $sin$, see if you can find $cos$. *(Hint: try your first guess)*

In [None]:
# the numpy package defines np.pi; it's just the precise value of pi stored as a "float". It's a variable you don't have to set yourself!
print("The value of pi is", np.pi, "and 2pi is", 2*np.pi)

#Your code here
print(np.sin(np.pi/2))

### **Exercise:** Combining Numpy range and trig functions
Use one of the `numpy` array-generating functions from the previous section to create an array of values from $0$ to $4\pi$ in increments of $\pi/8$. Then evaluate the cosine of all the values in this array.

Use the plotting code included in the next cell to see what your array looks like.

What's the **frequency** of your cosine? Try to change the frequency in your expression for the cosine array, then re-plot it. Did it work?

In [None]:
#Your code here
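# 0 to 4pi in increments of pi/8 (np.arange stops just short of 4pi)
x = np.arange(0, 4*np.pi, np.pi/8)
# cosine of 2x: the factor of 2 doubles the frequency (use np.cos(x) for the original)
y = np.cos(2*x)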

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(x, y)

<a id='built-in'></a>
In general, most common mathematical functions like sqrt, log, exp, sin, cos can be found in the numpy module.

For more information, the documentation of many **built-in functions** that can be applied to integers and floats (i.e. single numbers), as well as **numpy arrays**, can be found here: https://numpy.org/doc/stable/reference/routines.math.html
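As a brief sketch of what "applied to single numbers as well as numpy arrays" means in practice (the input values are arbitrary):

```python
import numpy as np

# On a single number, the function returns a single number
single_root = np.sqrt(16)
print(single_root)            # 4.0

# On an array, the same function is applied to each element
values = np.array([1, 4, 9, 16])
roots = np.sqrt(values)
print(roots)                  # [1. 2. 3. 4.]
```

No loop is needed: the function handles every element of the array in one call.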

When in need of a mathematical function like one of the above, or even a more complicated one (like the "erf" function, or "Riemann zeta function", if you've ever encountered those), **never** spend time writing it yourself. Nearly every mathematical function one can think of already exists in some well-documented module. Check [numpy](https://numpy.org/doc/stable/reference/) and [scipy](https://docs.scipy.org/doc/scipy/reference/); google search "numpy `<function name>`" or "scipy `<function name>`" and you should find your desired function.

If `numpy` and `scipy` don't have your function, it is likely that someone has posted it on StackExchange or StackOverflow or somewhere.
Google search "python `<function name>`".

Copy+paste from the Internet is a very efficient way to code, and we do it all the time.
If you're worried about things like plagiarism, you should paste the link to the webpage where you found the code as a comment in your notebook.
Then you've cited your source!

In [None]:
# More numpy functions
print(np.sin(np.pi/2))
print(np.exp(np.pi)-np.pi)
print(np.sum(np.arange(5))) # 0 + 1 + 2 + 3 + 4 = 10 (np.arange(5) stops before 5)

### Defining your own functions
<a id='user_funcs'></a>


Python allows you to define your own functions, too.
**User-defined functions** allow you to clean up your code and apply the same set of operations to multiple variables.
Organizing your code into functions is also a good way to write code that other people will use.

*Instructor:* So far, we have focused on learning **built-in functions** (such as from `numpy` and `matplotlib`), but what about **defining our own**? This allows you to **clean up** your code, and **apply the same set of operations to multiple variables** without having to explicitly write them out every time.

<a id='user-def'></a>
The outline for a function is:
```python
def <function name> (<input variables>):
    <some code here>
    return <output variables>
```

*Instructor*: As an example, let's say we want to define a function that takes the square root of a number. Let's check to make sure the number is not negative first, so that we don't end up with an imaginary answer. 

In [None]:
#Defining a square root function
def sqrt(x):
    if (x < 0):
        print("Your input is negative!")
    else: 
        return x**(0.5)

*Instructor:* Try the function on a positive number

In [None]:
#Your code here
sqrt(4)

*Instructor:* Try it on a negative number

In [None]:
#Your code here
sqrt(-4)

*Instructor:* Now let's see how `numpy` deals with negatives!

In [None]:
#Your code goes here
np.sqrt(-25)

When defining your own functions, you can also use **multiple input variables**.


*Instructor:* For example, if we want to calculate the **length** of a vector $(x,y)$, we can create a function that takes in the components $x$ and $y$ individually. 

In [None]:
def length(x, y):
    """Calculates the length of a 2D vector (x,y) using the Pythagorean theorem."""
    return np.sqrt(x**2 + y**2)

*Instructor:* Call this function on the vector (3,4); we should get 5. 

In [None]:
#Your code goes here
length(3,4)

<a id='multi-line'></a>
A note about that funny looking comment line in that `length` function: it's good Python etiquette to comment the functions you write.
Even if you never intend to share your code with anyone else, it is extremely useful to write yourself reminders about what your functions do.
You can even include an example (this is pretty common in professional documentation).

One common way to document the functions you write is with a "**multi-line comment**" (formally called a **docstring**) immediately following the `def` line.
A multi-line comment starts and ends with three double quotation marks.
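As a small illustration (the function `double` is a made-up example, not part of this lecture's code), Python stores that triple-quoted text together with the function, so it can be read back later:

```python
def double(value):
    """Return twice the input value."""
    return 2 * value

# The text in triple quotes is stored with the function itself
print(double.__doc__)   # Return twice the input value.

# The function still behaves normally
result = double(3)
print(result)           # 6
```

The built-in `help` function displays the same text, for example `help(double)`, which is how the documentation of functions like `np.sqrt` can be read from inside a notebook.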

In [None]:
def my_function(arg):
    """
    This is my function. It's for doing this one important thing.
    This function needs one argument, which should be a single number.
    The function will return another number.

    Example:
    >>> my_function(3)
    7
    """
    return arg + 4


In this lecture, we've learned about numpy arrays, loops, and defining functions. You'll have a chance to test these skills in the following exercises!

## Exercise 2: Define a simple function

Define a function that prints every even-indexed element of an array.

In [None]:
#Your code here
def even(array):
    i = 0
    while i < len(array):  # stop before len(array), the first invalid index
        print(array[i])
        i = i+2

even([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

## Exercise 3a: Path Length Function
<a id='ex_pathlen'></a>


For a given set of points, the **path length** $L$ from $(x_0,y_0)$ to $(x_n,y_n)$ is given by the following expression,
\begin{gather}
L = \sum_{i = 1}^n \sqrt{ \left(x_i - x_{i-1}\right)^2 + \left(y_i - y_{i-1} \right)^2}
\end{gather}

What this quantity represents is the sum of the **lengths** between $(x_{i-1},y_{i-1})$ and $(x_i,y_i)$ for $i$ between 1 and $n$.

Write a function `pathLength` which computes $L$ given two numpy arrays `x_array` and `y_array` as input variables. You'll need this function later on to work on the challenge problem.

In [None]:
def pathLength(x_array,y_array):
    #Your code goes here
    if len(x_array) != len(y_array):
        raise Exception("Vectors do not have the same length")
        
    n = len(x_array)
    i = 1
    L = 0
    while (i < n):
        L = L + length(x_array[i]-x_array[i-1],y_array[i]-y_array[i-1])
        i = i+1
    return L

Test your function on the example below. Your answer should come out to $4\sqrt{2} \approx 5.657$

In [None]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 3, 4, 5])
pathLength(x,y)

## Exercise 3b: Fix my path length function

The instructors attempted Exercise 3a, but not everything went quite right. Try to debug their code!

*Instructor:* Look, I tried that exercise I just gave you, and it didn't go so well for me.
See if you can figure out what's wrong with my function.

In [None]:
# Version as broken code
def pathLength(x_array, y_array, z_array): # too many arguments
    """
    This function returns the path length given the x and y arrays
    It will print out a single number

    It doesn't work right now...
    """
    if len(x_array) == len(y_array): # should be !=
        raise Exception("Vectors do not have the same length")
        
    n = len(x_array)
    i = 1
    L = 0
    while (i < n):
        L = L + length(x_array[i] - x_array[i-1], y_array[i] - y_array[i-1])
        # needs increment condition i = i + 1
    return L

# Test case
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 3, 4, 5])
pathLength(x,y) # Should print approximately 5.657

There are always multiple valid ways to write a function. There are no wrong answers as long as the code works as intended.

Here's that same function written using Numpy functions and advanced indexing.


*Instructor:* Try running this cell. Same answer, right?

In [None]:
import numpy as np

def pathLength(x_array, y_array):
    return np.sum(np.sqrt( (x_array[1:] - x_array[:-1])**2 + (y_array[1:] - y_array[:-1])**2 ))

# Test case
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 3, 4, 5])
pathLength(x,y) # Should print approximately 5.657

*Instructor:* Play around in the code block above and see if you can figure out what those `[:-1]` indexes do.
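One way to see what the slices are doing is to try them on a short array, as in the sketch below (the values are made up). Subtracting the two slices pairs each element with the one before it, which is exactly the $x_i - x_{i-1}$ term in the path length formula:

```python
import numpy as np

x = np.array([1, 3, 6, 8])

all_but_first = x[1:]     # every element except the first
all_but_last = x[:-1]     # every element except the last
print(all_but_first)      # [3 6 8]
print(all_but_last)       # [1 3 6]

# Element-by-element subtraction gives the step between neighbors
steps = all_but_first - all_but_last
print(steps)              # [2 3 2]

# numpy's np.diff function computes the same neighbor differences
print(np.diff(x))         # [2 3 2]
```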

## C. Loading And Saving Data Arrays
<a id='load_save'></a>


Up to now, the data that you have handled has been self-defined: you have constructed an array and filled that array with values that you operate on, all in the same code. Often, in scientific programming, this is not the case. One program or piece of equipment creates and stores data, while another loads, operates on, and analyzes it. Thus it is essential to learn the ways that one can **load** and **save** data in Python.

*Instructor*: 
While there are many ways to import data, and some which are a bit complicated at times, we're going to teach you the most basic, most general, and most useful ways.

First, if you haven't already, import the numpy and matplotlib modules.

In [None]:
import numpy as np
#%matplotlib nbagg
import matplotlib.pyplot as plt


<a id='loadtxt'></a>
The simplest way to **load data from a plain-text file** with numpy is using [`numpy.loadtxt`](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html). At its simplest, the usage is:
* data_array_name = `np.loadtxt('path_to_file')`

  (Note that the path should be listed as a string!)
  
From the documentation page, it seems there are a lot of **optional arguments** that let you specify more precisely how you want to read the data, but at its most basic, the function will work the way we have shown above.

*Instructor*: Now that we have brought it up, let's talk briefly about how to specify the **path** to a file in python (or in general on a terminal).

<a id='path'></a>
A file's **path** is its specific location in the file structure of your computer. This is most often defined relative to your current place in the file structure. To specify the folder you are currently working in (your **current working directory**), use a single period followed by a forward slash. To specify a subfolder or file within a folder use the name of the subfolder or file, always following a folder name with a forward slash.

* `./` = Path to the current folder that you are working in.
* `./subfolder1/` = Path to a folder that exists inside of your current folder, named "subfolder1"
* `./subfolder1/myfile` = Path to a file that exists inside of a folder that exists inside of your current folder
* `../` = Path to the folder that contains the folder you are currently working in

The above works as written for Linux and Mac systems. Windows systems are a little different, which you can [read more about](https://www.howtogeek.com/137096/6-ways-the-linux-file-system-is-different-from-the-windows-file-system/) if you'd like, but the general directory structure lessons remain the same.
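If you would rather not worry about which slash your operating system uses, Python's standard `os` module can assemble paths for you. The sketch below is one possible approach; the folder and file names are hypothetical and used only for illustration:

```python
import os

# Hypothetical folder and file names, used only for illustration
folder = "subfolder1"
filename = "myfile.txt"

# os.path.join inserts the separator appropriate for your operating system
relative_path = os.path.join(".", folder, filename)
print(relative_path)

# os.getcwd() reports the current working directory
print(os.getcwd())

# os.path.abspath expands a relative path into a full one
full_path = os.path.abspath(relative_path)
print(full_path)
```

The printed results depend on your computer, so run it to see where your notebook is actually working from.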

*Instructor*: Now then, let's say we are doing a timing experiment, where we look at the brightness of an object as a function of time. This is actually a very common type of measurement that you may do in research, such as looking for dips in the brightness of stars as a way to detect planets.

Our data file is stored in a text file named `timeseries_data.txt` in the directory `lecture2_data`, which exists as a subfolder of the one we are currently working in.

*Instructor*: Let's load in the data together. **Who wants to try to specify a part of the path to our data file?** (Try to get at least 2 people to specify part of the path)

In [None]:
path = "Your_Path_Here"
timeseriesData = np.loadtxt(path)
###Solution
path = "./lecture2_data/timeseries_data.txt"
timeseriesData = np.loadtxt(path)

<a id='shape'></a>
One handy thing you can do after loading data into a numpy array is to use Python to find the dimensions of the array. This is done by using the [``array.shape``](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.shape.html) attribute (note that it takes no parentheses) like so.

In [None]:
timeseriesData.shape

*Instructor*: Looking at the output from the above cell, **how many rows and how many columns does our time series data have?** (answer: 2 rows, 1000 columns)

This is an example of a **2-dimensional array**, also known as a **matrix**. 

The first row of `timeseriesData` gives the time stamp of when each measurement was taken, while the second row gives the measured value of the brightness at that time.

*Instructor*: For ease of handling this data, we can take each of these rows and create new arrays out of them.  Let's do just that.

Since `timeseriesData` is 2-dimensional, each element has two indices. 

In [None]:
t = timeseriesData[0,:] # this represents the time
signal = timeseriesData[1,:] # this represents the brightness

<a id='matrix'></a>
By convention, we first specify the row index followed by the column index.
- `array_name[n,:]` is the n-th row, and all columns within that row.
- `array_name[:,n]` is the n-th column, and all rows within that particular column.
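The row-then-column convention can be sketched on a tiny 2-dimensional array (the array `grid` and its values are made up for illustration):

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.shape)    # (2, 3), meaning 2 rows and 3 columns

first_row = grid[0, :]       # row with index 0, all columns
second_column = grid[:, 1]   # column with index 1, all rows
single_value = grid[1, 2]    # row index 1, column index 2

print(first_row)       # [1 2 3]
print(second_column)   # [2 5]
print(single_value)    # 6
```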

*Instructor*: **Let's see what the data looks like** using the `plot()` function with `t` as your x-axis and `signal` as your y-axis.

### **Exercise**: Plotting time-series data
<a id='ex_timeseries'></a>


Plot the loaded data with time on the x-axis and the signal on the y-axis


In [None]:
#Your code goes here
plt.plot(t,signal)

*Instructor*: Looking at our data, you see clear spikes that jump well above most of the signal. (I've added these to the data to represent outliers that sometimes appear in raw data and must be dealt with.) In astronomy, relativistic charged particles known as cosmic rays, which do not come from your source, sometimes hit the detector, and we often have to remove their signal.

<a id='conditional'></a>
In order to choose only the parts of an array that meet some criteria, you can use **conditional indexing** in place of normal indices. This involves taking a **conditional statement** (more on those later) and testing whether it evaluates to True on each element in the array.

This gives an array of Booleans, which you can use as **logical indices** to select *only* the entries for which the logical statement is `True`. 
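A minimal sketch of the two steps, using a short made-up array and an arbitrary threshold of 10:

```python
import numpy as np

values = np.array([3, 12, 7, 20, 5])

# Step 1: the comparison is tested on each element, giving True or False
mask = values < 10
print(mask)              # [ True False  True False  True]

# Step 2: using the Boolean array as an index keeps only the True entries
kept = values[mask]
print(kept)              # [3 7 5]
```

The result is shorter than the original array, because the entries that failed the test are dropped rather than replaced.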

*Instructor*: There are some very complex codes that handle cosmic rays, but for our purposes (keeping it easy), we're going to just set a hard cut off of, let's say 15. **Let's do that together in the example below.** (walk them through the example)

In [None]:
# cutOff = <cutoff value here>   (student version: fill in a number and uncomment)
# signalFix = signal[signal < cutOff]
### Solution ###
cutOff = 15.
signalFix = signal[signal < cutOff]

*Instructor*: In this case, the **conditional statement** that we have used is `signal < cutOff`. 

**Conditional indexing** keeps the data that the programmer deems "good" by their specified criteria.

*Instructor:* Print out the `signal < cutOff` array to see exactly what it is you're using as an index. Then play around with the statement and change `<` to `>`. Before you print that out, how do you think the array and its values will change? (e.g. will the array change shape, what will happen to the values?)

In [None]:
print(signal < cutOff)

*Instructor*: We can also do the same for the corresponding time stamps, since `t` and `signal` have the same length. **What conditional statement should we use so that we get the values for `t` that correspond to the values we kept in `signal`?**

In [None]:
# tFix = t[<conditional statement here>]   (student version: fill in and uncomment)
### Solution ###
tFix = t[signal < cutOff]

### **Exercise**: Plotting filtered data

Try to plot the "fixed" data!

In [None]:
#Your code goes here
plt.plot(tFix,signalFix)
plt.show()

*Instructor*: Now that you have your data all cleaned up, it would be nice if we could save it for later and not have to go through the process of cleaning it up every time.  Fear not!  Python has you covered.

First, package your two cleaned up arrays into one again. This can be done simply with the `np.array()` function.

In [None]:
dataFix = np.array([tFix,signalFix]) 

<a id='save'></a>
Here we cover two main ways to **save data files** for use again later, one that is Python-specific, and the other a simple text format.

* [`np.save('file_path', array_name)`](https://numpy.org/doc/stable/reference/generated/numpy.save.html) - Creates a `.npy` file (Python readable only!)
* [`np.savetxt('file_path', array_name)`](https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html#numpy.savetxt) - Creates a `plain text` (or `.txt`) file (more generally readable)

The basic syntax is pretty much the same for each. What differs is the type of file that the functions create. Below is an example of how each function can be called to store the same data.

In [None]:
np.save('./lecture2_data/dataFix.npy',dataFix)
np.savetxt('./lecture2_data/dataFix.txt',dataFix)

After saving a data file, you can load it up again using `np.loadtxt()` and `np.load()` for .txt and .npy files respectively. 

*Instructor*: We used `np.loadtxt()` above, and `np.load` works the same way. So, let's load in the .npy file and see if our data was saved correctly.

In [None]:
data = np.load('./lecture2_data/dataFix.npy')
t = data[0,:]
signal = data[1,:]
plt.plot(t,signal)
plt.show()

### **Exercise**: Load data from a .txt file

See if you can do the same thing, but with the .txt file that we saved.

In [None]:
#Your code goes here
data = np.loadtxt('./lecture2_data/dataFix.txt')
t = data[0,:]
signal = data[1,:]
plt.plot(t,signal)
plt.show()
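
If you want to confirm that both formats preserve the data, a round trip can be sketched as follows. This example uses a made-up array and a temporary folder (both hypothetical, so nothing in `lecture2_data` is touched) and compares what comes back with what was saved.

```python
import os
import tempfile
import numpy as np

# Made-up 2D array: row 0 plays the role of time, row 1 the signal
data_demo = np.array([[0.0, 2.0, 4.0],
                      [0.5, 0.7, 0.2]])

# Work in a temporary folder that is deleted automatically afterwards
with tempfile.TemporaryDirectory() as tmpdir:
    npy_path = os.path.join(tmpdir, 'demo.npy')
    txt_path = os.path.join(tmpdir, 'demo.txt')

    np.save(npy_path, data_demo)       # binary .npy file
    np.savetxt(txt_path, data_demo)    # plain text file

    from_npy = np.load(npy_path)
    from_txt = np.loadtxt(txt_path)

# The .npy copy should match exactly; the text copy is compared with a tolerance
print(np.array_equal(from_npy, data_demo))
print(np.allclose(from_txt, data_demo))
```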

## D. Loading data files automatically
<a id='sec_load'></a>


We can combine what we learned about loops to make our data workflow more efficient. Suppose we have a set of data saved in separate text files that we would like to load automatically. For our example, in `./lecture2_data/` you will find files `c1.dat`, `c2.dat`, `c3.dat`, `c4.dat`, `c5.dat`, `c6.dat`. 

Rather than loading each of these files individually, you can use a for (or while) loop, constructing a string at each iteration corresponding to each of these files. 

In Python you can use `+` to concatenate strings together, as shown below:

*Instructor:* Run the cells below to see what they print out.

In [None]:
first_string = 'a'
second_string = 'b'
print(first_string + second_string)

You can also convert an integer to a string using the `str()` function.

In [None]:
first_string = 'a'
second_string = str(1)
print(first_string + second_string)

## Exercise 4: Load multiple files
<a id='exercise_load'></a>

Your goal in this task is to write some code to load in this set of 6 `.dat` files as numpy arrays. 

We will define an empty list (call it `datalist`) that will store the data. 


In [None]:
datalist = []

Defining a list without any elements may seem like an odd idea, so think of it as a basket with nothing inside it yet. We will use the list's `append()` method to fill it.

Next, we call `np.loadtxt` on a single `.dat` file and add it to `datalist` using the command

> `datalist.append(loadedFile)`

where `loadedFile` is the variable we've assigned the file to after loading it in. 

In [None]:
loadedFile = np.loadtxt('./lecture2_data/c1.dat')
datalist.append(loadedFile)

*Instructor:* Now it's your turn. Can you figure out how to load the rest of the data files into `datalist` automatically?

In the cell below, use a loop of some kind to load the rest of the files and add them to `datalist`.

Hint: The names of the remaining files are `c2.dat`, `c3.dat`, `c4.dat`, `c5.dat`, and `c6.dat`. What is the only thing that changes among these names? Can you think of a way to generate this part separately and combine it with the rest of the string?

In [None]:
# Your code here
i = 2
while i <= 6:
    datalist.append(np.loadtxt('./lecture2_data/c' + str(i) + '.dat'))
    i = i+1
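
The same file names can also be generated with a `for` loop over `range`, which takes care of the counter for you. The sketch below only builds the strings and collects them in a hypothetical list called `filenames`; it does not open any files, so it runs even without the data folder.

```python
# Build the file names only; no files are opened here
filenames = []
for i in range(2, 7):   # i takes the values 2, 3, 4, 5, 6 (7 is excluded)
    filenames.append('./lecture2_data/c' + str(i) + '.dat')

print(filenames)
```

Each of these strings could then be passed to `np.loadtxt` inside the loop, just as in the `while` version.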

This is **just one way** to load and save multiple files; there are lots of different ways, and none of them are right or wrong. Depending on your project and the types of data you will be using, your adviser might teach you different ways to load in your data. Some of us may use the [`glob` module](https://docs.python.org/3/library/glob.html) to accomplish the task above, for example. There are also lots of different file formats besides ASCII (plain human-readable text, such as `.dat` or `.txt`) or Numpy's `.npy`: some of us `pickle` our data, some of us use FITS files more often, and some of us save to `.csv` files. They all have their advantages and disadvantages, and your adviser will tell you why you might use a particular one for your project (and if they don't, you can ask!).
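
As one illustration of the `glob` approach, here is a minimal sketch. It creates three small stand-in files in a temporary folder (the folder, file contents, and variable names are all made up for this example) and then finds them with a wildcard pattern instead of building each name by hand.

```python
import glob
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    # Create three small stand-in files: c1.dat, c2.dat, c3.dat
    for i in range(1, 4):
        np.savetxt(os.path.join(tmpdir, 'c' + str(i) + '.dat'),
                   np.array([i, i * 10.0]))

    # glob returns every path matching the pattern, in no guaranteed order,
    # so we sort the result before loading
    paths = sorted(glob.glob(os.path.join(tmpdir, 'c*.dat')))

    loaded = []
    for p in paths:
        loaded.append(np.loadtxt(p))

print([os.path.basename(p) for p in paths])
print(loaded[0])
```

Note that `sorted` orders strings alphabetically, so a file named `c10.dat` would come before `c2.dat`; with more than nine files you may need a different sorting rule.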

So, to summarize, not only can you manipulate arrays, but now you can save them and load them. In a way, those are some of the most important skills in scientific computing. Almost everything you'll be doing requires you to know this, and now that you've mastered it, you're well on your way to being an expert in computational physics and astronomy!

# **Summary/References**
<a id='summary'></a>

## Arrays
* Create [arrays](#array) from lists using [`np.array(listname)`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)
* Create [evenly spaced arrays](#ranges) of numbers using:
  * [`np.arange(start, stop, stepsize)`](https://numpy.org/doc/stable/reference/generated/numpy.arange.html)
  * [`np.linspace(start, stop, num_points)`](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html)
* Create ["empty" arrays](#empty) using:
  * [`np.zeros(size)`](https://numpy.org/doc/stable/reference/generated/numpy.zeros.html) for an array of all 0
  * [`np.ones(size)`](https://numpy.org/doc/stable/reference/generated/numpy.ones.html) for an array of all 1
  * [`np.empty(size)`](https://numpy.org/doc/stable/reference/generated/numpy.empty.html) for a quickly generated array of nonspecific values
* [Index](#indexing) and [slice](#slicing) your arrays using brackets, colon notation, and [conditional statements](#conditional), e.g.:
  * `myarray[0]` for the first (zeroth) element
  * `myarray[-1]` for the last element
  * `myarray[:]` for the whole array
  * `myarray[5:]` for everything from index 5 onward (that is, starting at the 6th value in the array)
  * `myarray[:20]` for everything up to but not including index 20 (that is, the first 20 values)
  * `myarray[myarray > 10]` for all values greater than 10 in your array
  * `mymatrix[:, 0]` for all values in the first column of a [matrix](#matrix) (2D array)
  * See [here](https://stackoverflow.com/a/4729334) for another series of explanations of slicing and indexing lists/arrays in Python!
* Determine the [size and shape](#shape) of your array using [`myarray.shape`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.shape.html)
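
The items above can be tried out together in one short sketch. The array contents here are made up for illustration; only the NumPy calls come from the list.

```python
import numpy as np

evens = np.arange(0, 10, 2)     # stop value 10 is excluded
grid = np.linspace(0, 1, 5)     # stop value 1 is included
zeros = np.zeros(3)             # three zeros

myarray = np.arange(0, 30)      # integers 0 through 29
print(myarray[0], myarray[-1])                # first and last elements
print(myarray[5:].shape, myarray[:20].shape)  # sizes of the two slices
print(myarray[myarray > 25])                  # conditional indexing

mymatrix = np.array([[1, 2, 3],
                     [4, 5, 6]])
print(mymatrix[:, 0])           # first column
print(mymatrix.shape)           # (rows, columns)
```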


## Functions
* Use (and find) [built-in](#built-in) numpy functions and variables
  * [`np.sin(x)`](https://numpy.org/doc/stable/reference/generated/numpy.sin.html), [`np.cos(x)`](https://numpy.org/doc/stable/reference/generated/numpy.cos.html)
  * [`np.exp(x)`](https://numpy.org/doc/stable/reference/generated/numpy.exp.html), [`np.log(x)`](https://numpy.org/doc/stable/reference/generated/numpy.log.html)
  * [`np.sqrt(x)`](https://numpy.org/doc/stable/reference/generated/numpy.sqrt.html)
  * [`np.pi` (and other constants)](https://numpy.org/doc/stable/reference/constants.html)
* [Create](#user-def) functions using the following format:
```python
def <function name> (input1, input2, ...):
    <some code here>
    return output1, output2, ...
```
* Make [multi-line comments](#multi-line) surrounded by `"""` triple quotes
  * See the first answer at [this page](https://stackoverflow.com/questions/7696924/is-there-a-way-to-create-multiline-comments-in-python) for an example.
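
Putting the function format and the triple-quoted description together, a minimal sketch might look like this. The function name `polar_to_cartesian` and its inputs are hypothetical, chosen only to show two inputs and two outputs.

```python
import numpy as np

def polar_to_cartesian(r, theta):
    """
    Convert polar coordinates to Cartesian coordinates.

    r     : radius (number or array)
    theta : angle in radians (number or array)
    """
    x = r * np.cos(theta)
    y = r * np.sin(theta)
    return x, y

# Two outputs can be unpacked into two variables
x, y = polar_to_cartesian(2.0, np.pi / 2)
print(x, y)
```

Because of floating-point rounding, `x` comes out as a very small number rather than exactly zero.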

## Loading & Saving Data
* Define a [path to a file](#path) with notation like `"./subfolder/filename"`, where `.` means your current directory
  * See [this page](https://docs.oracle.com/javase/tutorial/essential/io/path.html) for a brief description of paths (written for java, but still applicable!)
  * See the [pathlib](https://docs.python.org/3/library/pathlib.html) module for the recommended way to handle paths in Python 3 (and [this](https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f) beginner-friendly article), if you're interested!
* [Load data](#loadtxt) from a text file with [`np.loadtxt(FilePath)`](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html)
* [Save data](#save) to a text file with [`np.savetxt(FilePath, ArrayName)`](https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html)
* (And do the same with `.npy` [Python-specific files](#save) using [`np.load`](https://numpy.org/doc/stable/reference/generated/numpy.load.html) and [`np.save`](https://numpy.org/doc/stable/reference/generated/numpy.save.html) and the same arguments)
  * (AND use things like the [pickle](https://docs.python.org/3/library/pickle.html) and [dill](https://pypi.org/project/dill/) modules, once you want to get *fancy*)
    * Also see these usage examples/tutorials for [pickle](https://www.datacamp.com/community/tutorials/pickle-python-tutorial) and [dill](https://stackoverflow.com/questions/42168420/how-to-dill-pickle-to-file), if you're interested
* Use lists and loops to load several files with regular naming schemes
  * This includes [string concatenation](#sec_load) with the format: `'str1' + 'str2' = 'str1str2'`
  * See [exercise in Section D](#exercise_load)
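
For those curious about the `pathlib` route mentioned above, here is a small sketch of how a path can be built and then handed to `np.savetxt` and `np.loadtxt`. The folder and file names are made up, and a temporary directory is used so that nothing on your machine is overwritten.

```python
import tempfile
from pathlib import Path
import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    # The / operator joins path pieces in a way that works on any operating system
    folder = Path(tmpdir) / 'demo_data'
    folder.mkdir()
    file_path = folder / 'demo.txt'

    # NumPy's text functions accept Path objects as well as strings
    np.savetxt(file_path, np.array([1.0, 2.0, 3.0]))
    reloaded = np.loadtxt(file_path)
    name = file_path.name

print(name)
print(reloaded)
```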