# Introduction to Jupyter Notebooks and Data Structures in Python

This is an example of a Jupyter notebook. Jupyter notebooks are stored in files with the .ipynb extension and consist of cells together with the output of any code that has been run. There are two main types of cells:
1. Markdown cells. These contain formatted text; when run, they are rendered as typeset text.
2. Code cells. These contain code (Python, R, Julia, or another language, depending on the kernel); when run, any variables they assign are kept in the kernel's memory, and any printed output (plus the value of the last expression) appears below the cell.

You can run a cell by pressing Shift + Enter (hold Shift and press Enter) while the cell is selected. You can also click the play (Run) button.

Markdown cells like this one can be edited by double-clicking on them. Markdown is a lightweight markup language (a much simpler cousin of LaTeX) for quickly writing simple structured documents. It supports basic things like headers, lists, *italics*, **bold**, `inline code`, and links. Here is a [cheat sheet](https://github.com/adam-p/markdown-here/wiki/markdown-cheatsheet) for most of the things you can do in Markdown.

## Data Structures in Python

The goal of this notebook is to give you a quick introduction to the basic data structures in Python. Single numerical and boolean values (True, False) can be assigned to variables in Python with the equals sign, e.g. `x=2` stores the value 2 in the variable x. You can similarly assign letters and words (called "strings", more on this below) to variables, as well as container objects which can hold many values.
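Here is a minimal sketch of those assignments; the variable names are arbitrary illustrations. The built-in `type()` reports what kind of object each variable holds.

```python
x = 2             # an integer
y = 2.5           # a floating-point number
flag = True       # a boolean (True or False)
word = 'spam'     # a string
nums = [1, 2, 3]  # a list: a container holding several values

# Print the name of each value's type next to the value itself
for value in (x, y, flag, word, nums):
    print(type(value).__name__, value)
```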

The ability to choose between many different types of container objects is a key feature of Python and one of the main things that makes it more flexible than MATLAB or R. It is very important that you learn these basic data structures and when to use them.

**Therefore** you will have a quiz on the contents of this notebook. The quiz will cover only the information below, but you should also read more about each of these structures, because we will use them constantly. [This page in the Python documentation](https://docs.python.org/3/tutorial/introduction.html) is a good place to go for more basic information on the built-in data structures, and you can read more about NumPy arrays in the NumPy documentation.

## Mutable vs. Immutable

One extremely important concept in Python is that *everything* you can assign to a variable in Python is classified as either *mutable* or *immutable*. Mutable data types can be changed in place after they are created, without overwriting all of their data. Examples include appending to a list, or replacing the number in the first row and first column of a 3x3 matrix with a different number. Immutable data types cannot be altered once they are created; to "change" one, you have to create a new object and assign it in place of the old one. Trying to alter an immutable object in place produces an error.

You should remember which data types are mutable vs. immutable because it will be important later on.
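A short sketch of the difference, using a list (mutable) and a tuple and string (both immutable); the variable names are illustrative.

```python
nums = [1, 2, 3]    # list: mutable
nums[0] = 99        # in-place change is allowed
print(nums)         # [99, 2, 3]

point = (1, 2, 3)   # tuple: immutable
try:
    point[0] = 99   # in-place change is not allowed
except TypeError as err:
    print('TypeError:', err)

word = 'spam'       # string: immutable
try:
    word[0] = 'S'
except TypeError as err:
    print('TypeError:', err)

# "Changing" an immutable object really means building a new one
word = 'S' + word[1:]
print(word)         # Spam
```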

## Lists

In [None]:
mylist = [] # This is an empty list. Lists are the most basic mutable type, and 
            #   are what you should use anytime you will need to add or remove things.
            #   They can hold any combination of any type of object, including
            #   other lists.
mylist.append('to wong foo') # this appends a string
mylist.append([3,5,7]) # this appends a list of numbers
print(mylist)

['to wong foo', [3, 5, 7]]
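Since lists are the type to reach for when you need to add or remove things, here is a sketch of a few more built-in list methods (the list contents are arbitrary). Note the difference between `append`, which adds its argument as one element (as with `[3,5,7]` above), and `extend`, which adds each element separately.

```python
items = ['a', 'b', 'c']
items.insert(1, 'x')   # insert 'x' at index 1 -> ['a', 'x', 'b', 'c']
items.remove('b')      # remove the first occurrence of 'b' -> ['a', 'x', 'c']
last = items.pop()     # remove and return the last item ('c')
items.extend([1, 2])   # add each element of another iterable
print(items, last)     # ['a', 'x', 1, 2] c
```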


In [None]:
# Because lists are mutable, you can assign to them and there are methods which 
#   alter them in-place.
mylist[1] = 'is a great movie'
mylist.reverse()
# Addition concatenates lists
print(mylist+[0,0,0,0,0])

['is a great movie', 'to wong foo', 0, 0, 0, 0, 0]


## Tuples

In [None]:
mytup = (1,(2,3),'wow') # Tuples are immutable container types. They are just
                        #   like lists, but they can't be changed once created.
                        #   So: no appending or assignment!

print(mytup)
print(mytup[2][1])
# These are useful if you want to be sure the contents don't change.

(1, (2, 3), 'wow')
o
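One caveat worth knowing: a tuple's immutability applies to *which* objects it holds, not to those objects themselves. In this sketch (names are illustrative), the tuple's slots can't be reassigned, but a mutable list stored inside it can still change.

```python
t = (1, [2, 3])
# t[1] = [4, 5] would raise TypeError: the tuple's slots can't be reassigned
t[1].append(4)   # but the list stored inside the tuple is still mutable
print(t)         # (1, [2, 3, 4])
```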


## Strings

In [None]:
spam = 'Have some spam.' # This is a string. It's an immutable type.
eggs = "Have some eggs." # You can use double or single quotes for strings
                         
print(spam) # Prints the string to the terminal and starts a new line.

Have some spam.


In [None]:
# Remember that Python uses zero-based indexing: the first element is at index 0.
# Like lists and tuples, strings are sequences, so you can index and slice them.
# Slices include the start index and exclude the stop index.
# Also: addition concatenates strings
print(spam[0:5] + spam[:10])

Have Have some 
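Because a slice includes its start and excludes its stop, slices fit together without overlap and `s[a:b]` has exactly `b - a` characters. A small sketch checking this (the split point `k` is arbitrary); negative indices, which count from the end, are also shown.

```python
spam = 'Have some spam.'
k = 5
# Because slices exclude the right endpoint, these two pieces never overlap
print(spam[:k] + spam[k:] == spam)   # True
print(len(spam[2:9]))                # 7, i.e. 9 - 2
print(spam[-1], spam[-5:])           # negative indices count from the end: . spam.
```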


In [None]:
# A third slice number sets the step size; a negative step walks backwards
print(eggs[1:10:2])
print(eggs[::-1])

aesm 
.sgge emos evaH


In [None]:
# You can use the format method of a string to replace each {} with a value
pirate_call = 'The pirate says {}, {}!!!'.format(2+5,'win')
print(pirate_call)

The pirate says 7, win!!!
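Later cells in this notebook use f-strings such as `f'A = {A}'`, which do the same job as `.format` with the expressions written inside the braces. A quick sketch comparing the two (variable names are illustrative), plus a format spec that controls decimal places:

```python
a, b = 2 + 5, 'win'
with_format = 'The pirate says {}, {}!!!'.format(a, b)
with_fstring = f'The pirate says {a}, {b}!!!'   # f-string (Python 3.6+)
print(with_format == with_fstring)   # True
print('{:.3f}'.format(3.14159))      # format spec: 3 decimal places -> 3.142
```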


## Dictionaries

In [None]:
mydict = {} # This is an empty dict. It's mutable.
            #   Dicts are made up of key-value pairs. Keys can be any hashable
            #   (typically immutable) object, and each key points to a value,
            #   which can be any object. Dicts are iterable (techniques covered
            #   later), and since Python 3.7 they keep keys in insertion order,
            #   but you look values up by key, not by position.
mydict['color'] = 'blue' # color points to 'blue'
mydict['number'] = 3 # 'number' points to 3
print(mydict)
print(mydict['color'])

{'color': 'blue', 'number': 3}
blue
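A sketch of a few more everyday dict operations (keys and values are illustrative): iterating over key-value pairs, testing membership, looking up with a default, and why a tuple can be a key but a list cannot.

```python
mydict = {'color': 'blue', 'number': 3}
mydict['shape'] = 'square'        # add a new key-value pair
print(list(mydict))               # keys in insertion order: ['color', 'number', 'shape']
for key, value in mydict.items():
    print(key, '->', value)
print('color' in mydict)          # True: membership tests look at keys
print(mydict.get('size', 'n/a'))  # .get returns a default instead of raising KeyError

mydict[(1, 2)] = 'tuple key'      # a tuple is hashable, so it can be a key
try:
    mydict[[1, 2]] = 'list key'   # a list is mutable (unhashable), so it cannot
except TypeError as err:
    print('TypeError:', err)
```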


## Numpy Arrays

In [None]:
import numpy as np
# (Nearly) everything in NumPy uses the array datatype.
# It is mutable, but has a fixed shape and should only contain a single datatype
#   (usually numbers, and most usually floating point numbers)
# You can create an array out of a list, or a list of lists.
A = np.array([[1,2,3],[4,5,6],[7,8,9]]) # 3x3 array of integers
print(f'A = {A}')
print(A[1,0])
print(A[0,:])

In [None]:
# Arrays have a number of attributes
print(A.shape)
x,y = A.shape
print(f'The shape of A is {x} by {y}.')
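A few other array attributes are worth knowing. This sketch also shows what "a single datatype" means in practice: the `dtype` is fixed, so storing a float into an integer array truncates it. The exact integer dtype printed depends on the platform, and `ints` is an illustrative name.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(A.ndim)    # 2: the number of dimensions
print(A.size)    # 9: the total number of elements
print(A.dtype)   # the one element type shared by every entry (e.g. int64; platform dependent)

# Because the dtype is fixed, a float stored into an integer array is truncated
ints = np.array([1, 2, 3])
ints[0] = 2.9
print(ints)      # [2 2 3]
```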

In [None]:
# There are a number of routines to create arrays.
#  NOTE: An array's size is fixed when it is created. Operations that change the number
#  of elements (e.g., np.append) build a brand-new array and copy the data into it.
#  So, often an array of zeros is created first and then filled in later.
myarray = np.zeros((4,5))
print(myarray)
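A sketch of the fixed-size behavior (array names are illustrative): `np.append` returns a new, longer array and leaves the original alone, `reshape` arranges the same number of elements into a new shape, and the "create zeros, then fill" pattern writes into an existing array in place.

```python
import numpy as np

a = np.zeros(3)
b = np.append(a, 1.0)     # returns a NEW, longer array; a is untouched
print(a.shape, b.shape)   # (3,) (4,)

m = np.arange(6)          # [0 1 2 3 4 5]
r = m.reshape(2, 3)       # same 6 elements, new shape (2, 3)
print(r.shape)            # (2, 3)

# Preallocate, then fill in place
out = np.zeros((2, 3))
for i in range(2):
    for j in range(3):
        out[i, j] = i * 3 + j
print((out == r).all())   # True
```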

In [None]:
# Arrays can be of any dimension
array3 = np.ones((4,3,2))

In [None]:
B = np.random.rand(3,3) # 3x3 array of floating point numbers between 0 and 1
print(f'B = {B}')

In [None]:
# Arithmetic operators (+, -, *, /, **) act elementwise on arrays.
print('Element-wise multiplication:')
print(A*B)
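To make "elementwise" concrete, here is a sketch with the same integer matrix `A` as above: a scalar is combined with every element, and `*` multiplies matching entries rather than computing a matrix product.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(A + 1)   # adds 1 to every element
print(A * A)   # squares each element (NOT a matrix product)
print(A ** 2)  # same result as A * A
print(A / 2)   # elementwise division; the result is a float array
```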

In [None]:
# In Python 3.5+, we also have @, which is matrix multiplication
print('Matrix multiplication:')
print(A@B)
# @ also handles matrix-vector multiplication: a 1-D NumPy array is neither a
#   row nor a column vector, so A@v multiplies A by v directly.
#   np.dot(A,v) gives the same result.
v = np.array([2,1,2])
print(np.dot(A,v))
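A sketch comparing `A @ v` with `np.dot(A, v)` for the integer matrix `A` and the vector `v` used above, and checking the result against the definition of a matrix-vector product (each entry is a row of `A` dotted with `v`):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
v = np.array([2, 1, 2])         # 1-D array: neither a row nor a column
print(A @ v)                    # [10 25 40]
print(np.dot(A, v))             # [10 25 40], the same matrix-vector product
print((A @ v).shape)            # (3,)

# Entry i of the result is the dot product of row i of A with v
manual = np.array([sum(A[i, j] * v[j] for j in range(3)) for i in range(3)])
print((manual == A @ v).all())  # True
```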

In [None]:
# Tons of other routines can be found in the numpy library. The docs are fantastic!!

## Key facts about mutable/immutable objects (assignment)

- If you use assignment (=) to "copy" an *immutable* object, the assignment will do what you expect: both variables share the same object, but because the underlying data cannot be changed, that sharing is harmless.
- If you use assignment to "copy" a *mutable* object, **no copy is made at all**. The new variable is merely another reference to the same data as the original (it is essentially just a link to that data). This is much faster and uses less memory than actually copying, where the data is duplicated into a new area of memory, but it means that if you change the data through one of the variables, you have also changed it for the other one.

In [None]:
my_tuple = (1,2,3) # tuples are immutable collections of objects
my_tuple_cpy = my_tuple
# you can't alter immutable data in place; you can only replace it with a new object
my_tuple = (4,5,6) # this overwrites my_tuple, but not my_tuple_cpy
print('my_tuple = {}'.format(my_tuple))
print('my_tuple_cpy = {}'.format(my_tuple_cpy))

my_tuple = (4, 5, 6)
my_tuple_cpy = (1, 2, 3)


Re-assigning the original variable works the same way with mutable objects, because you are just re-using one of the labels (variable names) for new data. Another label still points to the original data, so it isn't erased from memory.

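As a side note, this is how automatic *garbage collection* works in Python. Any data that becomes orphaned (unreachable) because the last variable pointing to it was removed or reassigned becomes eligible for deletion. In CPython, the standard interpreter, such an object is normally freed as soon as its reference count drops to zero, and a separate garbage collector runs periodically to clean up groups of objects that only refer to each other. Other Python implementations may free the memory later, at a time of their choosing.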
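To watch this happen, here is a sketch using the standard-library `weakref` module, which can observe an object without keeping it alive. `Data` is a hypothetical stand-in class (plain lists can't be weakly referenced), and the immediate freeing shown at the end reflects CPython's reference counting; other interpreters may free the object later.

```python
import weakref

class Data:
    """Hypothetical stand-in class; plain lists can't be weakly referenced."""
    pass

obj = Data()
other = obj                # a second label for the same object
probe = weakref.ref(obj)   # observes the object without keeping it alive

del obj
print(probe() is None)     # False: 'other' still refers to the object

other = None               # the last reference is gone
print(probe() is None)     # True in CPython: freed as soon as no references remain
```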

In [None]:
my_list = [1,2,3] # mutable
my_list_ref = my_list # this now refers to the same object in memory as my_list
my_list = [4,5,6] # a new object is created, and my_list refers to it.
                  #   but my_list_ref still refers to [1,2,3]
print('my_list = {}'.format(my_list))
print('my_list_ref = {}'.format(my_list_ref))

my_list = [4, 5, 6]
my_list_ref = [1, 2, 3]


**But here is the key point:**

In [None]:
my_list = [1,2,3]
my_list_ref = my_list # both my_list and my_list_ref point to the same object in memory
my_list[0] = 0 # so when I alter my_list, my_list_ref is altered too!
print('my_list = {}'.format(my_list))
print('my_list_ref = {}'.format(my_list_ref))

my_list = [0, 2, 3]
my_list_ref = [0, 2, 3]


If you don't want this behavior, you can make an independent copy as follows:

In [None]:
my_list = [1,2,3]
my_list_cpy = list(my_list) # the list function builds a new list with the same elements
my_list[0] = 0 # so when I alter my_list, my_list_cpy remains the same
print('my_list = {}'.format(my_list))
print('my_list_cpy = {}'.format(my_list_cpy))

my_list = [0, 2, 3]
my_list_cpy = [1, 2, 3]


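This works in general: call the corresponding type function on the data you want to copy. E.g., to copy a NumPy array, use np.array(). For NumPy arrays, you can also use the array method .copy(). Note that list() and similar functions make a *shallow* copy: the new container is separate, but any mutable objects nested inside it (such as inner lists) are still shared. For a fully independent *deep* copy of nested data, use copy.deepcopy() from the standard library.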
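The difference between sharing, a shallow copy, and a deep copy only shows up when the data is nested. A sketch with a list of lists (variable names are illustrative; `copy.deepcopy` comes from the standard-library `copy` module):

```python
import copy

nested = [[1, 2], [3, 4]]
alias = nested                 # no copy: another name for the same list
shallow = list(nested)         # new outer list, but the inner lists are shared
deep = copy.deepcopy(nested)   # new outer list AND new inner lists

nested[0][0] = 99              # mutate an inner list
print(alias[0][0])    # 99: same object
print(shallow[0][0])  # 99: shares the inner list
print(deep[0][0])     # 1: fully independent
print(shallow is nested, shallow[0] is nested[0])  # False True
```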

Not sure whether two variables refer to the same object or to separate copies? You can test this with the `is` operator. Python has two different kinds of equality tests: `==` tests whether two objects have equal values (for lists, element by element), while `is` tests whether they are actually the same object in memory.

**Note** that with immutable objects, the difference between the two tests usually doesn't matter: neither object can be changed, so sharing one is harmless. The two tests can still disagree, though. Two equal tuples (or large integers) created separately are typically distinct objects, so `==` gives True while `is` gives False. CPython does reuse some objects, such as small integers, so `is` can appear to work for them, but you should use `==` whenever you want to compare values.
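A small sketch of how `==` and `is` can differ even for immutable objects. The `is` results here reflect CPython's behavior and are not guaranteed by the language; the variable names are illustrative.

```python
a = (1, 2, 3)
b = tuple([1, 2, 3])    # an equal tuple built separately at run time
print(a == b)           # True: same values
print(a is b)           # False in CPython: two distinct tuple objects

n = 1000
m = int('1000')         # an equal integer created at run time
print(n == m, n is m)   # True False in CPython
```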

In [None]:
my_list = [1,2,3]
my_list_cpy = my_list
print(my_list == my_list_cpy)
print(my_list is my_list_cpy)

True
True


In [None]:
my_list = [1,2,3]
my_list_cpy = list(my_list)
print(my_list == my_list_cpy)
print(my_list is my_list_cpy)

True
False


In [None]:
my_tuple = (1,2,3) # tuples are immutable collections of objects
my_tuple_cpy = my_tuple # both names point to the same unchangeable tuple (1,2,3). This poses no issue,
                        #    just as if you had made two variables equal to the number 2.
print(my_tuple == my_tuple_cpy)
print(my_tuple is my_tuple_cpy)

True
True


For built-in Python containers such as lists and tuples, == returns a single True if all of the elements are equal and False otherwise. In NumPy, you instead get a boolean array with the elementwise results. So, if you want to see whether they are all equal, you have to call the .all() method on the resulting array. There is also an .any() method that checks whether *any* of the elements are equal.

In [None]:
import numpy as np
A = np.array([1,2,3])
B = A
print(A == B)
print((A==B).all())
print(A is B)

[ True  True  True]
True
True


In [None]:
B = A.copy()
print((A==B).all())
print(A is B)

True
False
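A sketch of elementwise comparison when the arrays differ (the array values are illustrative). It also shows `np.array_equal`, which returns a single boolean, and why `A == B` can't be used directly in an `if` statement.

```python
import numpy as np

A = np.array([1, 2, 3])
B = np.array([1, 2, 4])
print(A == B)                # [ True  True False]
print((A == B).all())        # False: not every element matches
print((A == B).any())        # True: at least one element matches
print(np.array_equal(A, B))  # False: a single answer (also requires equal shapes)

try:
    if A == B:               # a multi-element boolean array has no single truth value
        pass
except ValueError as err:
    print('ValueError:', err)
```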
