# PHYS 4025/8025: Introduction to python

This notebook is intended to give a (very brief) introduction to using the python programming language for data analysis. It's impossible to cover everything here, so you should expect to continue learning new topics as we work through data analysis exercises over the course of the semester. If you are feeling stuck, don't be afraid to turn to Google -- the internet is a programmer's best friend!

We will cover the following topics:
* Jupyter notebooks
* Python basics: built-in functions, simple data types (strings, ints, floats), containers (tuples, lists, dicts), if statements, for loops
* Writing functions
* Modules
* numpy and matplotlib

There is lots of example code written below, but just reading through it isn't a great way to learn. Instead, you should look at the examples and then try to modify them, extend them, or repeat them with variations. Copy and paste is not always helpful when you are learning a programming language -- typing things out for yourself can be a good way to remember.

Programming can be very frustrating when you are starting out, but a bit of experience goes a long way. If you have done some programming in other languages, then I expect you will be able to quickly adapt to python. If you don't have any previous programming experience, then the skills you learn in this class will surely be useful for the next programming challenge you encounter, even if you are using some other language. 

## Jupyter notebooks

In this class, we will do all of our data analysis using the python programming language in the [Jupyter notebook](http://jupyter.org/) environment. In fact, you are looking at a Jupyter notebook right now. Jupyter is a web application that you access using your web browser. Visually, the notebook looks like a webpage that you can edit. It combines formatted text, images, and code. Under the hood, there is a python interpreter running on the host machine (in the OSC cluster). When you write new code or modify existing code, you can instruct the host machine to run it and then inspect the results, which might include text or numerical output, plots, etc.

Jupyter is a great tool for data analysis because you can combine text that explains your analysis process, the code that actually runs your analysis, and plots that show your results. More traditional methods of publishing (like academic papers) usually describe the process and show results but omit the detailed code, which could make it much harder to duplicate results or find mistakes.

The notebook consists of a series of cells. You can create a new cell by pressing the "+" (plus) button in the toolbar near the top of the screen. There are different types of cells, specifically "Markdown" cells for text or "Code" cells for python code. If you double-click on an existing cell, you can edit it. Try double-clicking on this cell and modifying the text. When you are done with a cell, you can "run" it by clicking on the right-arrow (play) button in the toolbar, or by typing shift-enter. When you run a Markdown (text) cell, all it does is properly format the text. Markdown refers to a simple and intuitive way of writing formatted text. You can learn about it [here](https://daringfireball.net/projects/markdown/syntax), but any text will be fine for our purposes.

Once you have tried modifying this Markdown cell, click on the plus button to create a new cell. By default, the new cell will be a Code cell, but you can change that using the toolbar menu. When you run a Code cell (by clicking the play button or typing shift-enter), the python interpreter will execute any code in that cell. Type the following python code in the new Code cell that you created and then try running it.

    print("Hello world")
    
One nice feature you will notice is that Code cells help you out by automatically closing your parentheses and quotation marks. They also highlight the code, so you can quickly see the difference between the function `print` and the string `"Hello world"`.

## Python basics

In [None]:
# Now let's write some code!
# The first thing to learn about is comments -- python ignores everything from a '#' (pound sign)
# to the end of the line. If you are writing some complicated code, it is extremely helpful to
# leave some comments that explain what is going on.
# Since this is a Jupyter notebook, we also have the option to create a Markdown cell and write text there,
# but a comment is usually the easiest way to make a quick note.

# This cell contains only comments. If you run it, nothing will happen!

### Built-in functions

Above, you created a Code cell and ran `print("Hello world")`. `print` is one of the basic functions built into python. When you use a function, it will typically require one or more inputs. You pass the inputs into the function by putting them in parentheses that follow the function name. For a function that takes multiple inputs, just separate them with commas. Even if a function requires no inputs, you still need to have parentheses (with nothing inside) to tell python that it's supposed to be running a function.

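Other built-in functions follow the same pattern. Below is a short sketch of passing one input or several comma-separated inputs. The functions `max`, `abs`, and `round` are only illustrations and are not used elsewhere in this notebook. It also shows a function called with no inputs, where the empty parentheses are still required.

```python
# max takes several comma-separated inputs and returns the largest one.
largest = max(3, 7, 2)
# abs takes a single input: the absolute value of a number.
distance = abs(-4.5)
# round takes a number and, optionally, a second input: the number of decimal places.
rounded = round(3.14159, 2)
print(largest, distance, rounded)
# With no inputs, print just outputs an empty line, but the parentheses are still needed.
print()
```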
Try running the `print` function with several inputs. Those inputs can be strings, like `"Hello world"`, numbers, or anything else.

In [None]:
# Use the print function to print a bunch of stuff here.


Another built-in function is `len`, which tells you the length of the thing you give it as input. This only works for certain types of inputs -- if you try running `len(7)`, it will give an error because a number doesn't have a particular length. However, you can calculate the length of a string of characters, like `"Hello world"`.

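This short sketch contrasts strings with numbers. `len` counts every character in a string, including spaces. Putting quotes around a digit turns it into a one-character string, which `len` can handle.

```python
# len counts every character, including the space.
n_hello = len("Hello world")
# An empty string has length zero.
n_empty = len("")
# With quotes, "7" is a one-character string, so len works.
# (len(7), without the quotes, raises a TypeError instead.)
n_seven = len("7")
print(n_hello, n_empty, n_seven)
```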
Try using `len` to calculate the length of some character strings (make sure to put quotes around them) and print the results. You can nest the `len` function inside the `print` function so that it prints the numerical result. Or you can take advantage of a neat feature of the Jupyter notebook -- it will automatically print the result of the last line in any Code cell, even if you didn't use a print function. 

In [None]:
# Use the len function with the print function to get the length of some strings.


`len` is an example of a function that returns a value (the length of the string or other input object). You can assign this value to a variable using '=' (equal sign). When you are assigning a value to a variable, the name of the variable goes on the left and the value goes on the right.

In [None]:
# Quick example of variable assignment.
# First, let's assign an integer value to var1
var1 = 12
print(var1)
# Next, let's assign the output of the len function to var2
var2 = len("the number nineteen")
print(var2)


One more super useful function is `help`. This function will display documentation about whatever you give it as an input. If you are having trouble figuring out how to use some new function, try plugging it into `help`.

In [None]:
# Run this cell to learn about the len function.
help(len)


### Strings

We have already started using strings. Python doesn't care whether you use single quotes (`'Hello world'`) or double quotes (`"Hello world"`), so long as you use the same type of quotes to start and end the string.

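The sketch below shows that both kinds of quotes produce the same string. It also shows a practical reason to pick one over the other: the other kind of quote can then appear inside the string. The example strings are arbitrary. The `==` comparison is explained in the if statements section below.

```python
# Single and double quotes create exactly the same string.
single = 'Hello world'
double = "Hello world"
same = single == double
print(same)

# Using one type of quote lets you include the other type inside the string.
contraction = "It's a string with an apostrophe"
quotation = 'She said "hello"'
print(contraction)
print(quotation)
```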
You can combine strings just by adding them together ("+" sign):

In [None]:
# Add together several strings and print the result
print("PHYS" + "4025" + "/" + "8025")


Strings are an example of a more complex object that includes "methods", which are functions that are bound to the object. You can access methods with a dot (or period) '.' following the object. Here are a couple of example methods for strings:

In [None]:
# The split method breaks up a string wherever it finds whitespace (i.e. breaks it into words).
# It returns the separate words as a list of strings. Like other functions, the split method is 
# followed by parentheses that contain the input arguments. But this is an example of a function
# that doesn't need any inputs.
long_string = "Using python to study astrophysics"
print(long_string.split())

# The replace method requires two arguments: the segment of the string to be replaced and the 
# new segment that you want to replace it with.
print(long_string.replace("astrophysics", "astronomy"))

# The format method searches through the string for a set of curly braces '{}' and replaces
# them with the function argument. It also has powerful capabilities to format the newly added
# text.
greeting_string = "My name is {}"
name = "Colin"
print(greeting_string.format(name))

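The comment above mentions that `format` can also control how the inserted text looks. Here is a sketch of a few standard format specifications placed inside the braces: `:.3f` gives three decimal places and `:.2e` gives scientific notation. It also shows several braces filled in order, and `split` given a separator argument. The example values are arbitrary.

```python
value = 3.14159265
# "{:.3f}" rounds the number to 3 digits after the decimal point.
short = "pi is about {:.3f}".format(value)
# "{:.2e}" uses scientific notation with 2 digits after the decimal point.
sci = "{:.2e}".format(6.674e-11)
# Several pairs of braces are filled in order by several arguments.
course = "{} {}/{}".format("PHYS", 4025, 8025)
# split can also break a string at a separator other than whitespace.
fields = "12.0,4.1,8.5".split(",")
print(short)
print(sci)
print(course)
print(fields)
```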

### Numbers

We will be doing lots of calculations, so numbers are going to be important. Python has all of the usual arithmetic operations built in: you can add (`+`), subtract (`-`), multiply (`*`), and divide (`/`) numbers. Like many other programming languages, python distinguishes between integers and floating-point (decimal) numbers. If you write a number like `2`, then it will be interpreted as an integer, but if you write `2.0` it will be interpreted as floating-point.

However, python3 (which we are using) isn't too much of a stickler about this distinction. For example, dividing one integer by another with `/` always gives a floating-point result, even when the answer is a whole number (`4 / 2` gives `2.0`). In many other programming languages (including python2!), calculating `2 / 3` will return `0` because integer division throws away the fractional part. If you want integer division in python3, you can use a double divide sign, i.e. `2 // 3` will return `0`.

To raise a number to a power, python uses a double-multiply symbol `**`. For example, `4**2` equals `16`.


In [None]:
# Try dividing 2 by 3 and printing the result.
# Also try some other basic arithmetic while you're at it.
print(2 / 3)
print(2 // 3)

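A few more cases are easy to trip over. This quick sketch covers true division of two integers that divide evenly, and floor division with a negative number. Note that `//` rounds toward minus infinity, not toward zero. It also shows powers with negative or fractional exponents.

```python
# In python3, "/" always returns a float, even when the answer is a whole number.
whole = 4 / 2
print(whole, type(whole))        # 2.0 <class 'float'>

# "//" rounds the result *down*, which matters for negative numbers.
floor_pos = 7 // 2
floor_neg = -7 // 2
print(floor_pos, floor_neg)      # 3 -4

# "**" raises a number to a power; the exponent can be negative or fractional.
square = 4**2
inverse = 2**-1
root = 9**0.5
print(square, inverse, root)     # 16 0.5 3.0
```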

### Lists

A list is just a container that holds multiple items in order. You can have a list of numbers, a list of strings, or a mixed list that has entries of many different types. You can create a list simply by separating multiple items with commas and putting them in square brackets '[ ]'. 

In [None]:
# Create a list using square brackets.
mixed_list = [12, "a string", 3.14159, "another string"]
print(mixed_list)


You can use square brackets to access a specific entry in a list. The entries are numbered starting with zero. You can also update entries by assigning new values to them, and you can extend the list using the `append` method.

In [None]:
# Print entries 0 and 3.
print(mixed_list[0])
print(mixed_list[3])
# Update entry 3, then print the updated value.
mixed_list[3] = 15
print(mixed_list[3])
# Now, append a new entry.
mixed_list.append("appendix")
print(mixed_list)


We can quickly make a list of numbers using the `range` function. (Strictly speaking, in python3 `range` returns a `range` object, which is slightly different from a list... but we'll ignore that.) If you run `range(100)`, it will give you every number starting from 0 and ending *just before* 100 (so it includes 99 but not 100). You can also specify the start and end points for range: `range(5, 15)` gives you all the numbers from 5 to 15 (but not including 15). Finally, you can also choose to specify the step size: `range(0, 10, 2)` gives you the numbers from 0 up to (but not including) 10 in steps of 2, i.e. 0, 2, 4, 6, 8. Try it out.

In [None]:
# Three different ways to get lists of numbers from the range function.
r0 = range(10)
print(r0)       # This prints the range object.
print(list(r0)) # This converts the range object into a list and then prints values in the list.
print(list(range(3, 10)))
print(list(range(0, 10, 2)))


Now that we can use `range` to quickly generate some long lists, we can learn about slicing. To get a slice of a list, you use the square brackets, just like when we were accessing individual list entries. However, slicing allows you to access many entries at once. It's easiest to explain with some examples.

In [None]:
# Start out with a long list of integers from 0 to 99.
# It's not really necessary to convert this range to a list, 
# but I want to print the actual values below.
long_list = list(range(100))

# Access a single entry.
print(long_list[20])
# Now, access a slice containing entries 20 through 24.
# NOTE: Just like the range function, we don't actually get the value at the end of the slice!
print(long_list[20:25])

# If you want your slice to start at a particular entry and then just continue to the end of the
# list, just leave out the end limit of the slice.
print(long_list[92:])

# Also like the range function, we can select a different step size for our slice.
print(long_list[0:10:3])
# And the step size can be negative, if we want to count backwards. In this case, I left out the
# second value in my slice specification so that it would count all the way back to the start of
# the list.
print(long_list[5::-1])

# All of these examples have been using a list where the list entries are numerically equivalent
# to their positions in the list, but that doesn't have to be the case.
jumbled_list = [8, 2, 6, "ten", 1]
print(jumbled_list[1:4])


### Tuples

Tuples are another type of container in python, similar to lists. The major difference between tuples and lists is that once you create a tuple it can't be modified. So you can access the different entries, but you can't update them and you can't add new items to a tuple. There are reasons why this is useful, but we will generally be using lists (and numpy arrays that behave like lists).

Tuples are created using parentheses rather than square brackets. You still use square brackets to access entries of a tuple, however.

In [None]:
# Quick tuple example.
string_tuple = ("entry0", "entry 1", 2.0, 3)
print(string_tuple[1])


In [None]:
# But this will cause an error if you run it
string_tuple[2] = "some updated value"

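Tuples are also handy for grouping values that belong together. In this short sketch, a tuple packs several values into one object. *Unpacking* then assigns its entries to separate variables in one step. The `min_max` function later in this notebook returns its results this way. The coordinate values here are made up for illustration.

```python
# A tuple holding an (x, y, z) position.
position = (1.5, -2.0, 0.25)
print(position[0])
print(len(position))

# Unpacking: the number of names on the left must match the tuple's length.
x, y, z = position
print(x, y, z)

# Parentheses are optional when creating a tuple; the commas do the work.
pair = 3, 4
print(pair)
```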

### if statements

When we are writing more complicated programs, it is necessary to have conditional statements, like "if X is greater than 10 do this, otherwise do that." First we will need operators to do boolean arithmetic. Boolean arithmetic means that you are doing some calculation where the answer is either `True` or `False`. For example, `2 < 3` is `True` but `2 < 1` is `False`.

Besides the less than `<` operator used above, we can also use greater than `>`, is equal to `==` (note that there are *two* equal signs), less than or equal to `<=`, greater than or equal to `>=`, and not equal to `!=`. We can also combine boolean expressions using `and` and `or`, or invert them using `not`.

Try evaluating some boolean operations.

In [None]:
# Is 2 equal to 3?
print(2 == 3)
# Is 2 *not equal* to 3?
print(2 != 3)

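The `and`, `or`, and `not` operators mentioned above aren't used in that cell, so here is a small sketch of them. The variable name `temperature` and its thresholds are arbitrary choices. The last line shows a chained comparison, which python treats the same as the equivalent `and` expression.

```python
temperature = 25.0
# "and" is True only if both expressions are True.
comfortable = temperature > 18.0 and temperature < 27.0
# "or" is True if at least one expression is True.
extreme = temperature < -10.0 or temperature > 40.0
# "not" flips True to False and vice versa.
not_extreme = not extreme
print(comfortable, extreme, not_extreme)
# A chained comparison, equivalent to the "and" expression above.
chained = 18.0 < temperature < 27.0
print(chained)
```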

Now that we have some boolean operations, we can write an `if` statement. In python, this consists of the word `if`, followed by the boolean expression, and then a colon `:`. If the boolean expression is `True`, then python will evaluate any subsequent lines *that are indented*. Anything that is not in the indented block is not part of the if statement and will be evaluated normally, regardless of whether the boolean expression is `True` or `False`.

Here are some examples.

In [None]:
# The expression is false, so we don't evaluate the indented block.
if 2 > 10:
    print("This text will not be printed.")
print("This line is not indented, so it is printed normally.")

some_number = 2
# This expression is true, so we evaluate the indented block.
if some_number == 2:
    print("This line will be printed.")


More complicated conditional statements can be built using the `elif` (else-if) and `else` commands.

In [None]:
some_number = 4
if some_number > 4:
    print("The number is greater than 4")
elif some_number < 4:
    print("The number is less than 4")
else:
    print("The number must be equal to 4")
    

### for loops

We will frequently want to run the same operations over and over with slight variations. This can be done using a `for` loop. In a `for` loop, you take a sequence of values (like a list) and then just step through them. At each step of the for loop, it will execute any commands found in an indented block.

In [None]:
# Step through each item in a list.
greek_alphabet = ["alpha", "beta", "gamma", "delta", "epsilon"]
for letter in greek_alphabet:
    print(letter)
print("This line is not part of the loop, so only runs once.")


In [None]:
# Often we have some ordered lists and we want to just step through the entry numbers.
# Then the range and len functions are very useful.
array_1 = [2.0, 3.0, 4.0]
array_2 = [7.0, 1.0, 1.0]
# Now let's add these two arrays of numbers.
for i in range(len(array_1)):
    print("{} + {} = {}".format(array_1[i], array_2[i], array_1[i] + array_2[i]))


## Writing functions

Good code should be well structured and logical. This makes it easier to understand and much easier to find bugs. One way that we will structure our code is by writing functions to do certain tasks that will be repeated again and again. 

We have already seen several functions, like `len` and `range`. The key aspect of these functions is that they take some inputs and then return one or more output values.

To write a function in python, we start with the `def` statement (short for "define"), followed by the name of the function, then a list of the input variables in parentheses. The body of the function is another indented block. In that block we use the input variables to carry out our task. At the end of this indented block, there is usually a `return` statement, followed by a list of the output values to return.

In [None]:
# Here is a stupidly simple example of a function.
def multiply(x, y):
    return x * y

# Now, let's try it out.
print(multiply(2, 3))

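Functions you write can also be documented for the `help` function introduced earlier. In this sketch, a string placed on the first line of the function body is stored as the function's *docstring*, and `help` displays it. The cell redefines the `multiply` function from above with a docstring added; its behavior is unchanged.

```python
def multiply(x, y):
    """Return the product of x and y."""
    return x * y

# help now displays the docstring along with the function's inputs.
help(multiply)
# The docstring is also stored on the function itself.
print(multiply.__doc__)
```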

In [None]:
# Here is a less stupid function to take the dot product of two vectors, a and b.
def dot_product(a, b):
    # Check to make sure that a and b have the same length.
    if len(a) != len(b):
        print("ERROR: vectors a and b must have the same length!")
        return None 
        # If the lengths are different, then the function ends here.
    # But if the lengths are the same, we can continue.
    x = 0.0
    for i in range(len(a)):
        # Multiply the corresponding entries of vectors a and b and add it up.
        x = x + a[i] * b[i]
    # After the for loop is finished, we can return the result.
    return x

# Now, let's try it out.
a = [0.5, 1.0, 2.0]
b = [1.0, 1.0, 1.0]
print(dot_product(a, b))
# Example with orthogonal vectors.
a = [1.0, 1.0, -2.0]
b = [0.0, 2.0, 1.0]
print(dot_product(a, b))


In [None]:
# Here is an example of a function that returns two values.
# Finds the minimum and maximum values in an array of numbers.
def min_max(array):
    # Start with first entry of the array.
    amin = array[0]
    amax = array[0]
    # Now, loop through all the entries and update min and max as we go.
    for value in array:
        if value < amin:
            # Update min value.
            amin = value
        if value > amax:
            # Update max value.
            amax = value
    # Done with the loop, return the result.
    return (amin, amax)

# Let's try it out.
array = range(30)
(amin, amax) = min_max(array)
print(amin, amax)

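Writing `min_max` is good practice, but python's built-in `min` and `max` functions do the same job. That makes them a handy way to check results like the ones above. A sketch with an arbitrary list of values:

```python
values = [4.2, -1.0, 7.5, 3.3]
# Built-in min and max scan the whole sequence, like min_max does.
smallest = min(values)
largest = max(values)
print(smallest, largest)
# They also work on range objects.
print(min(range(30)), max(range(30)))
```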

## Modules: numpy and matplotlib

The most important features of python are not the built-in functions, but rather the countless modules that have been written to provide useful functions and object types. For almost any computing challenge, you can find a well designed and well documented module that solves your problem. We are going to focus on two specific modules that will be used heavily in the class.

[`numpy`](http://www.numpy.org/) (short for "numerical python") provides a new container object called the `ndarray`. This is similar to a list, except that all of its entries share a single data type (in this class, almost always numbers) and it can be multi-dimensional. For example, a vector is a one-dimensional array and a matrix is a two-dimensional array. `numpy` contains many other useful features, such as random number generators.

[`matplotlib`](http://matplotlib.org/) contains tools for making plots. We are specifically going to focus on `pyplot`, which is a submodule of `matplotlib` with a simpler interface.

Let's start out with `numpy`. To use this module in your python code, you just `import` it. After you have imported a module, you can access its functions the same way that we accessed methods of string and list objects, using the dot `.` operator. So if you want the `logspace` function in `numpy`, you would use `numpy.logspace`. This could lead to typing `numpy` over and over again, so everyone typically shortens it from `numpy` to `np`, as follows:

In [None]:
import numpy as np
# Now we can access all of the numpy functions and classes using the 'np' shorthand.

# The logspace function creates an array of logarithmically-spaced values.
print(np.logspace(-1, 1, 11))

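The paragraph above mentions that `numpy` includes random number generators. Here is a minimal sketch of drawing random samples. It assumes numpy 1.17 or newer, where `np.random.default_rng` was introduced. The seed 42 and the sample sizes are arbitrary.

```python
import numpy as np

# Create a random number generator; fixing the seed makes the results reproducible.
rng = np.random.default_rng(seed=42)
# Draw 1000 samples from a Gaussian with mean 0 and standard deviation 1.
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
# Draw 5 samples uniformly distributed between 0 and 10.
uniform = rng.uniform(low=0.0, high=10.0, size=5)
print(samples.shape)
print(samples.mean(), samples.std())
print(uniform)
```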

We will use the `numpy` array as our general purpose data container. Let's quickly explore some properties of it.

In [None]:
# We can convert a list of numbers to an array.
list_of_numbers = [12.0, 4.1, 8.5, 9.9, 16.7]
array_of_numbers = np.array(list_of_numbers)

# We can get the shape of the array. This is similar to the len function, except the array can be 
# multi-dimensional (but in this case it only has one dimension).
print(array_of_numbers.shape)


In [None]:
# The arange function is very similar to range, except that it can produce non-integer values.
new_array = np.arange(0.0, 1.0, 0.1)
print(new_array)

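With a non-integer step, floating-point rounding can affect how many values `arange` returns. So when you need a fixed number of points that includes the end value, `np.linspace` is often the safer choice. Here is a small comparison sketch:

```python
import numpy as np

# arange stops *before* the end value, like range: 10 values, 0.0 through 0.9.
stepped = np.arange(0.0, 1.0, 0.1)
# linspace takes the number of points instead of a step, and includes the end value.
spaced = np.linspace(0.0, 1.0, 11)
print(len(stepped), stepped[-1])
print(len(spaced), spaced[-1])
```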

In [None]:
# The zeros function creates an array filled with zeros of any specified shape.

# One-dimensional case.
zeros_1d = np.zeros((10,))
print("1D array:")
print(zeros_1d)

# Two-dimensional case.
zeros_2d = np.zeros((3, 4))
print("2D array:")
print(zeros_2d)

# Three-dimensional case.
zeros_3d = np.zeros((3, 2, 2))
print("3D array:")
print(zeros_3d)


A very nice feature of `numpy` arrays is that we can do arithmetic with them without writing for loops to step through every entry. For example, if you have two arrays that are the same size, then you can add them, subtract them, multiply them, or divide them. All of these operations will automatically work entry by entry. `numpy` also includes trig functions, exponentials, and logarithms, which will work seamlessly on arrays.

In [None]:
# Multiply an array by a number.
a = np.array([1.0, 2.0, 3.0])
print(2 * a)

# Subtract one array from another.
b = np.array([2.0, 3.0, 4.0])
print(b - a)

# Multiply or divide arrays.
print(b * a)
print(b / a)

# Take the square root of an array.
x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
print(np.sqrt(x))

# Exponentiate an array.
print(np.exp(a))

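Entry-by-entry arithmetic can replace many explicit loops. As a sketch, the `dot_product` function written earlier with a `for` loop can be reproduced in one line. You can multiply the arrays and sum the result, or use numpy's `np.dot`. The vectors are the same ones used in the `dot_product` examples.

```python
import numpy as np

a = np.array([0.5, 1.0, 2.0])
b = np.array([1.0, 1.0, 1.0])
# Multiply entry by entry, then add up the products: no for loop needed.
dot_sum = np.sum(a * b)
# numpy also provides a dedicated dot-product function.
dot_np = np.dot(a, b)
print(dot_sum, dot_np)

# The orthogonal vectors from the dot_product example give zero.
orthogonal = np.dot(np.array([1.0, 1.0, -2.0]), np.array([0.0, 2.0, 1.0]))
print(orthogonal)
```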

Since the `numpy` array can be multidimensional, we can slice it along any dimension. You still use square brackets, then select the slice for each dimension separated by commas. If you want to select *all* of the entries along a particular dimension, you can simply use a colon `:` without any start or end value. This sounds a bit complicated, so let's look at an example.

In [None]:
# Create a 4x3 array (two-dimensional).
new_array = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
print(new_array)
print("array shape = {}".format(new_array.shape))

# Select column 0 only.
print("column 0 = {}".format(new_array[:,0]))
# Select row 0 only.
print("row 0 = {}".format(new_array[0,:]))
# Select column 0 starting from row 1 (not row 0).
print("column 0 starting from row 1 = {}".format(new_array[1:,0]))


Now that we have arrays for data containers, we can make plots using `matplotlib`. This is a much better way to inspect data than using `print`!

We are going to import the `matplotlib.pyplot` plotting module, but we will refer to it as `plt` for short. Besides the import, I have added one more line that isn't actually a python command: `%matplotlib inline` is a Jupyter "magic" command that tells the notebook to display any plots that are generated inside the notebook itself. (Otherwise we would have to write our plots to an image file and then open that file for viewing.)

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline


In [None]:
# Now we are ready to do some plotting.
# Generate some values ranging from 0 to 2*pi, in steps of 0.01.
x = np.arange(0, 2*np.pi, 0.01)
# Use the plot function to draw some trig functions.
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
# We can also label the axes of our plot.
plt.xlabel("x")
plt.ylabel("sin or cos")


In [None]:
# You can specify the color of the lines.
plt.plot(x, np.sin(x), color="blue")
plt.plot(x + 1, np.sin(x), color="red")
# Or choose dashed or dotted lines.
plt.plot(x + 2, np.sin(x), color="blue", linestyle="--")
plt.plot(x + 3, np.sin(x), color="red", linestyle=":")
plt.plot(x + 4, np.sin(x), color="green", linestyle="-.")


The `numpy` and `matplotlib` modules are both very deep, with many capabilities. You will pick up a lot more over the course of the semester. Don't be afraid to use the `help` function!