# Introduction to Jupyter Notebooks and Data Structures in Python

This is an example of a Jupyter notebook. Jupyter notebooks are stored in files with the .ipynb extension and consist of cells and the output of code that has been run. There are two types of cells:
1. Markdown cells. These contain structured text and when run, they are replaced by typeset text.
2. Code cells. These contain code (for example Python, R, or Julia) and when run, any assigned variables are stored in memory and any results are printed below the cell.

You can run a cell by hitting Shift + Enter (hold shift and hit enter) while the cell is highlighted. You can also click the play button in the upper right.

Markdown cells like this one can be edited by double clicking on them. Markdown is a mini-typesetting language (like LaTeX) for quickly writing simple structured documents. It supports basic things like headers, lists, *italics*, **bold**, `code blocks`, and links. 

It also supports LaTeX! Here is an inline equation, $r\in(-\infty,\infty)$. Here is a display equation:
$$\frac{dy}{dt} = ry\left(1-\frac{y}{K}\right)$$
and a system
$$
\begin{align}
\frac{dS}{dt} &= -\beta SI + \nu I\\
\frac{dI}{dt} &= \beta SI - \nu I
\end{align}
$$

Here is a [cheat sheet](https://github.com/adam-p/markdown-here/wiki/markdown-cheatsheet) for most of the things you can do in Markdown.

## Data Structures in Python

The goal of this notebook is to give you a quick introduction to the basic data structures in Python. Single numerical and boolean values (True, False) can be assigned to variables in Python with the equals sign, e.g. `x=2` stores the value 2 in the variable x. You can similarly assign letters and words (called "strings", more on this below) to variables, as well as container objects which can hold many values.
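As a small illustration of assignment, the sketch below binds a few single values to names and inspects them with the built-in `type` function. The variable names are made up for this example.

```python
# Assign single values to variables with the equals sign.
x = 2               # an integer
rate = 0.5          # a floating-point number
is_ready = True     # a boolean
greeting = 'hello'  # a string

# type() reports what kind of object a variable currently refers to.
print(type(x), type(rate), type(is_ready), type(greeting))
```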

The ability to choose between many different types of container objects is a key feature of Python and one of the main things that makes it more flexible than MATLAB or R. It is very important that you learn these basic data structures and when to use them.

***Therefore*** *you will have a quiz on the contents of this notebook*. The quiz will be limited to the information below, but you should try to read more about each of the structures mentioned here because we will use them constantly.

[This page in the Python documentation](https://docs.python.org/3/tutorial/introduction.html) is a good place to go for more basic information on the built-in data structures, and you can read more about NumPy arrays in the NumPy documentation. **PLEASE READ** all of Section 3.1.

You can run any code block in this notebook by clicking on it and hitting Shift+Enter. I encourage you to play with this notebook and see what happens when you change this or that - try things, see what happens!

## Mutable vs. Immutable

One extremely important concept in Python is that *everything* you can assign to a variable is classified as either *mutable* or *immutable*. Mutable data types can be changed after they are created without completely overwriting all of the data. Examples include appending to a list or replacing the first-row, first-column number in a 3x3 matrix with a different number. Immutable data types cannot be altered once they are created; the only way to "change" one is to build a new object and assign it to the variable. Trying to alter them in place will produce an error.

You should remember which data types are mutable vs. immutable because it will be important later on.
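To make the distinction concrete, here is a minimal sketch using a list (mutable) and a tuple (immutable), both of which are introduced in the sections below. The names `scores` and `fixed` are made up for this example.

```python
# A list is mutable: it can be changed in place.
scores = [1, 2, 3]
scores[0] = 99          # replace the first element
scores.append(4)        # grow the list
print(scores)           # [99, 2, 3, 4]

# A tuple is immutable: an in-place change raises a TypeError.
fixed = (1, 2, 3)
try:
    fixed[0] = 99
except TypeError as err:
    print('Cannot modify a tuple:', err)

# "Changing" an immutable object really means building a new one
# and pointing the variable name at it.
fixed = (99,) + fixed[1:]
print(fixed)            # (99, 2, 3)
```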

## Lists (also indexing and slicing)

Lists are mutable, ordered container objects in which each element can be any type of object, including another list. The different elements do not have to be of the same type either. This makes lists very versatile. It is also computationally fast to append to a list and to remove the last entry of the list. Lists are denoted with square brackets. You should use a list whenever you will need to add or remove things, making it longer or shorter in the process.

In [23]:
# This is a comment. Anything after the # is ignored by the Python interpreter.
mylist = [] # This is an empty list. Note the square brackets. 
mylist.append(42) # this appends the number 42 to the list
# The dot notation is used to call the append method of the list. 
# Methods are functions that are associated with a particular type of object. There are many methods that can be used with lists, 
#   and they are all accessed using the dot notation and parentheses to call them. 
# Examples include: insert, remove, pop, sort, reverse, etc., but just remember append for now.

mylist.append('to wong foo') # this appends a string
mylist.append([3,5,7]) # this appends a list of numbers
print(mylist) # This will print the entire list below.

[42, 'to wong foo', [3, 5, 7]]
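For a rough idea of how a few of the other list methods named in the comments behave, here is a small sketch on a separate, made-up list called `stack`. Each of these methods modifies the list in place.

```python
stack = [10, 20, 30]
stack.append(40)      # add to the end            -> [10, 20, 30, 40]
last = stack.pop()    # remove and return the last element (40)
stack.insert(0, 5)    # insert 5 at index 0       -> [5, 10, 20, 30]
stack.reverse()       # reverse the list in place -> [30, 20, 10, 5]
stack.sort()          # sort the list in place    -> [5, 10, 20, 30]
print(last, stack)    # 40 [5, 10, 20, 30]
```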


In [26]:
# Because lists are mutable, you can assign to them.
# Indexing in Python starts at 0 and uses square brackets, so mylist[2] is the third element of the list.
mylist[2] = 'thanks for everything!'
# Note that you have to run the cell above this one in order for this cell to work, since the list was created in that cell.
# Jupyter notebooks "remember" cells that have been run previously, even if they were run out of order.
print(mylist)

# Because lists are not an exclusively numeric data type, you cannot perform arithmetic operations on them.
# Instead, addition concatenates lists, and multiplication repeats them, resulting in a new list.
print(mylist*2+[0,0,0,0,0])

# You can use the built-in len function to find the length of a list (and of many other data types).
print(len(mylist))

# You can access any element of the list using its index
print(mylist[1])

[42, 'to wong foo', 'thanks for everything!']
[42, 'to wong foo', 'thanks for everything!', 42, 'to wong foo', 'thanks for everything!', 0, 0, 0, 0, 0]
3
to wong foo


"Slicing" is a powerful way to access a range of elements in a list. The syntax is mylist[start:stop:step], 
where start is the index of the first element you want, stop is the index of the first element you don't want, 
and step is the stride, i.e. how far the index advances between the elements you take (a step of 2 takes every other element).

**LET ME REPEAT THIS:** Slicing includes the "start" index but omits the "stop" index. Think of it as a half-open interval, [start, stop): closed on the left and open on the right.

This is done so that the intervals fit together without overlap when stacked next to each other. If you omit start, it defaults to 0. If you omit stop, it defaults to the length of the list. If you omit step, it defaults to 1.
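The "fit together without overlap" property can be checked directly. The sketch below uses a made-up list `letters` and a split point `k`; the two slices share the index `k` as stop and start, yet no element appears twice.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']
k = 2

head = letters[:k]   # indices 0 through k-1
tail = letters[k:]   # indices k through the end
print(head, tail)    # ['a', 'b'] ['c', 'd', 'e', 'f']

# Stacking the two slices next to each other recovers the original list.
print(head + tail == letters)   # True

# With a step of 1, the number of elements in a slice is stop - start.
print(len(letters[1:4]))        # 3
```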

Negative numbers can also be used to index backward through the list. For instance, -1 corresponds to the last element in the list.

If you use a negative number for the step, it results in stepping backward through the list instead of forward (starting at the start index and ending at the stop index). In this case, if you omit start, it defaults to -1 and if you omit stop, it means to go through to the beginning of the list.

In [10]:
numlist = [0,1,2,3,4,5,6,7,8,9]
print(numlist[1:3]) # This will print the second and third elements of the list, but not the fourth.
print(numlist[0:8:2]) # This will print every other element of the list, starting with the first and ending with the seventh.
print(numlist[1:]) # This will print the list starting with the second element and ending with the last element.
print(numlist[:5]) # This will print the list starting with the first element and ending with the fifth element.
print(numlist[::3]) # This will print every third element of the list, starting with the first element.

[1, 2]
[0, 2, 4, 6]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4]
[0, 3, 6, 9]


In [13]:
print(numlist[::-1]) # This will print the list in reverse order.
print(numlist[:0:-1]) # This will print the list in reverse order, omitting the first element in the list.
print(numlist[-1]) # This will print the last element of the list.
print(numlist[-3:]) # This will print the last three elements of the list.

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
9
[7, 8, 9]


## Tuples

Tuples are just like lists except they are *immutable*. You cannot append to them or change individual values once they are created. Tuples are denoted by parentheses.

Use tuples when you want to ensure that the data inside them cannot accidentally be altered later on, or when you want to pass ordered sequences of objects in and out of functions (more on this later).

In [15]:
mytup = (1,(2,3,4,5),'wow') # This is a tuple. Note the parentheses instead of square brackets.

print(mytup)
print(mytup[1][0]) # This finds the second element of the tuple, which is a tuple, and then finds the first element of that tuple. So that prints the number 2.

(1, (2, 3, 4, 5), 'wow')
2
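As a preview of passing tuples out of functions, here is a minimal sketch. The function `min_and_max` is a hypothetical helper written for this example, not something built into Python.

```python
def min_and_max(values):
    """Return the smallest and largest entries as a tuple."""
    return (min(values), max(values))

result = min_and_max([3, 1, 4, 1, 5])
print(result)         # (1, 5)

# A tuple can be "unpacked" into separate variables in one line.
low, high = result
print(low, high)      # 1 5
```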


## Strings

Strings are immutable and represent text. Addition concatenates two strings while multiplication repeats the string. They can also be indexed into.

In [20]:
spam = 'Have some spam. ' # This is a string. It's an immutable type. Note the space I put at the end.
eggs = "Have some eggs.\n" # You can use double or single quotes for strings.
# Backslash is an escape character, which is used to include special characters in strings. \n is a newline character, which starts a new line when printed.
# If you want to include a backslash in a string, you have to escape it with another backslash, like this: \\
casserole = 'Have some casserole. It\'s delicious!\n' # If you want to include a single quote in a string that is delimited by single quotes, you have to escape it with a backslash.
recipe = """This is a multi-line string. It can span multiple lines without needing to use the newline character.
This is useful for writing long strings, such as recipes or poems, without needing to worry about formatting. 
You can use triple single quotes or triple double quotes for multi-line strings.""" # Triple quotes delimit a multi-line string.

print(spam*2+eggs+casserole+recipe) # Prints the resulting string and starts a new line.
print(spam[0:5] + spam[5:10]) # This will print the first five characters of the string, followed by the next five characters of the string. So it will print "Have some " (with a trailing space).
print(spam[::-1]) # This will print the string in reverse order.

Have some spam. Have some spam. Have some eggs.
Have some casserole. It's delicious!
This is a multi-line string. It can span multiple lines without needing to use the newline character.
This is useful for writing long strings, such as recipes or poems, without needing to worry about formatting. 
You can use triple single quotes or triple double quotes for multi-line strings.
Have some 
 .maps emos evaH
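Because strings are immutable, indexing can read a character but cannot replace one. The sketch below, using a made-up variable `word`, shows the resulting error and the usual workaround of building a new string.

```python
word = 'spam'

try:
    word[0] = 'S'     # strings do not support item assignment
except TypeError as err:
    print('Strings cannot be modified in place:', err)

# Build a new string from pieces of the old one instead.
capitalized = 'S' + word[1:]
print(capitalized)    # Spam

# String methods also return new strings and leave the original untouched.
shouted = word.upper()
print(shouted, word)  # SPAM spam
```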


In [22]:
# You can dynamically update strings in a few different ways, producing a new string. This allows you to include variables in strings, 
#   which is very useful for printing messages that include data.
# One way to do this is with the format method of strings, which uses curly braces as placeholders for the variables you want to include in the string.
call = 'arrg'
the_answer = 42
pirate_call = 'The pirate says {}, {}!!!'.format(the_answer, call) # The placeholders are filled in order, left to right.
print(pirate_call)

# Another, newer way to do this is with f-strings, which are strings prefixed with the letter f in which the variables are written directly inside the curly brace placeholders.
pirate_call2 = f'The pirate says {call}, {the_answer}!!!'
print(pirate_call2)

The pirate says 42, arrg!!!
The pirate says arrg, 42!!!


## Dictionaries (dicts)

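Dictionaries are mutable collections of key-value pairs. They are accessed by key rather than by position (since Python 3.7 they do remember the order in which keys were inserted). Keys can be any hashable object, which in practice means immutable types such as strings, numbers, or tuples, and each one points to a value which can be any object. They are denoted with curly brackets.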

Dictionaries are useful as a container object for storing things like parameter values. You can name each parameter whatever you want, e.g. 'alpha' and 'beta', and have that key point to the numerical value of the parameter.

In [27]:
mydict = {} # This is an empty dict. It's mutable. Note the curly braces.
mydict['color'] = 'blue' # the string 'color' points to the string 'blue'
mydict['number'] = 3 # the string 'number' points to the integer 3
print(mydict)
print(mydict['color'])

{'color': 'blue', 'number': 3}
blue
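As a sketch of the parameter-storage use case described above, the example below keeps two made-up parameters, 'alpha' and 'beta', in a dict called `params`. The values are arbitrary.

```python
params = {'alpha': 0.1, 'beta': 2.5}   # create a dict with initial entries
params['beta'] = 3.0                   # dicts are mutable: overwrite a value

# Check whether a key exists before using it.
print('alpha' in params)               # True
print('gamma' in params)               # False

# get() returns a default instead of raising an error for a missing key.
print(params.get('gamma', 0.0))        # 0.0

# Loop over the key-value pairs.
for name, value in params.items():
    print(f'{name} = {value}')

# Immutable objects such as tuples can also serve as keys.
grid = {(0, 0): 'origin'}
print(grid[(0, 0)])                    # origin
```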


## Numpy Arrays (ndarrays)

NumPy arrays are mutable (but their size is fixed on creation, e.g. 3x3 or 5x6x9), and they are the data structure that you will use the most in this class since they are numerical in nature. However, they are not a built-in Python data type; they are part of the NumPy library. That means that in order to use NumPy arrays, you first have to import the library. This is usually done via the command `import numpy as np`, which allows you to use "np" as a shorthand for "numpy".

NumPy arrays are also known as ndarrays, meaning n-dimensional arrays. A 1-dimensional array behaves like a vector, and a 2-dimensional array like a matrix.

Since numpy arrays are numerical, arithmetic operations are carried out mathematically element-wise. There are tons of methods and functions that operate on numpy arrays, and other libraries utilize numpy arrays as well. We will learn about these later.
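The contrast with lists is easiest to see side by side. In this sketch, `values_list` and `values_array` are made-up names holding the same three numbers.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array([1, 2, 3])

# For a list, * repeats the list.
print(values_list * 2)              # [1, 2, 3, 1, 2, 3]

# For an array, arithmetic is applied to each element.
print(values_array * 2)             # [2 4 6]
print(values_array + values_array)  # [2 4 6]
print(values_array ** 2)            # [1 4 9]
```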

In [None]:
import numpy as np

# There are many ways of creating a numpy array. You can create an array of zeros of a given shape, an array of ones, an array of random numbers, etc.

# The most basic way is to use the array function, which takes a list (or a list of lists) as an argument and returns a numpy array.
A = np.array([[1,2,3],[4,5,6],[7,8,9], [10,11,12]]) # 4x3 array of integers
print(f'A = {A}')
# Unlike with lists, you can use a comma to index into each dimension of a numpy array.
print(A[1,0]) # This will print the element in the second row and first column of the array, which is 4. 
              # Remember that Python is base 0, so the first row is row 0, the second row is row 1, etc.

print(A[:,0]) # This will print the first column of the array, which is [1 4 7 10]. 
              # The colon by itself means "all", so this is saying "print all the rows of the first column".
              # Note that the result is a 1-dimensional array, not a 2-dimensional array with one column: indexing with a single integer drops that axis.
              # A 1-dimensional array has only one axis, so it is neither a row nor a column; NumPy simply prints it horizontally.

A = [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
4
[ 1  4  7 10]


In [32]:
# Arrays have a number of attributes. Attributes are values associated with the array that provide information about it.
#   They aren't functions, so you don't call them with parentheses. Instead, you just access them directly with dot notation.

# Here are a couple attributes that are useful to know about:
print(A.shape) # This will print the shape of the array, which is a tuple of the number of rows and columns. In this case, it will print (4, 3).

x,y = A.shape  # This will unpack the shape of the array into the variables x and y, so x will be the number of rows and y will be the number of columns.
print(f'The shape of A is {x} by {y}.')

print(A.T) # This will print the transpose of the array, which is a 3x4 array where the rows and columns are swapped.

(4, 3)
The shape of A is 4 by 3.
[[ 1  4  7 10]
 [ 2  5  8 11]
 [ 3  6  9 12]]


In [37]:
# As mentioned previously, arithmetic operators such as +, -, *, and / act element-wise on arrays.
B = np.array([[1,2,3],[4,5,6]])
C = np.array([[7,8,9],[10,11,12]])
print('Element-wise multiplication:')
print(B*C)

# Matrix multiplication is done with np.dot instead. When a 1-D array is the
#   right-hand argument, it is treated like a column vector.
v = np.array([1,2,3])
print(np.dot(B,v))

Element-wise multiplication:
[[ 7 16 27]
 [40 55 72]]
[14 32]
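NumPy also provides the `@` operator for matrix multiplication, which gives the same result as `np.dot` for the 1-D and 2-D arrays used here. The sketch below redefines `B` and `v` so that it runs on its own; `product` and `column` are made-up names.

```python
import numpy as np

B = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
v = np.array([1, 2, 3])                # shape (3,)

product = B @ v                        # same result as np.dot(B, v)
print(product)                         # [14 32]
print(product.shape)                   # (2,)

# To get an explicit column vector, reshape the 1-D array to 3 rows, 1 column.
column = v.reshape(3, 1)
print((B @ column).shape)              # (2, 1)

# Shapes must be compatible: (2, 3) times (3, 2) gives (2, 2).
print(B @ B.T)
```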
