# Intro to (/review of) python 
In the next few lessons, we'll review basic python and programming concepts, as well as go over the fundamentals of how to use some common python packages, such as:
- numpy
- pandas
- matplotlib

## Importing packages
Packages are collections of pre-written code made available for reuse. Packages are convenient because they save you from having to implement every feature and function on your own. The widely-used packages also provide a standard, common set of tools for others to develop with --- allowing interoperability between programs.

There are a few different ways to import a package in python:

The simplest is just to `import {packagename}`. For this lesson, we'll use `numpy` as the example package:
```python
import numpy
```
The functions, classes, and variables of the `numpy` package can then be accessed using "dot" notation: for example, numpy's array-creation function can be accessed with `numpy.array`.

---
A variant of this is to use `import {packagename} as {shortname}`, as in:
```python
import numpy as np
```
This reduces the number of characters you need to type, and can be convenient if the package name is long, or if you need to use many things from the same package. Accessing numpy's `array` function, for example, can be done with `np.array`.

---

If you only need a subset of items from a package, for example, a single class, function, or submodule (a subpackage of the main package), you can use the syntax `from {packagename} import {element}`, as in:
```python
from numpy import array
```
This allows you to use `array` directly, without needing the package prefix and dot notation. (Python still loads the whole numpy package behind the scenes; only the name `array` is added to your namespace.)
For example, if you use this import method, then writing
```python
test_array = array([0])
```
would be equivalent to writing
```python
test_array = numpy.array([0])
test_array = np.array([0])
```
using the previous import styles, respectively.
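
As a quick check that the three import styles refer to the same thing, the sketch below imports numpy each way and compares the results. It assumes numpy is installed; the variable names `a`, `b`, and `c` are made up for the example.

```python
import numpy
import numpy as np
from numpy import array

# All three names point at the very same object
print(numpy.array is np.array)  # True
print(np.array is array)        # True

# So all three calls build the same array
a = numpy.array([0])
b = np.array([0])
c = array([0])
print(a, b, c)
```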

In [2]:
import numpy as np

## Documentation

Packages contain functions, classes, and variables which may be helpful. Crucial to the usability of a package is the documentation (or API reference), which (should) list all of the contents of the package, and how to use them.

To get the built-in help about a function or class, use the `help()` function.

Documentation for most common packages is also usually available online. For example, the documentation for [numpy can be found here](https://docs.scipy.org/doc/numpy/reference/).

In [3]:
help(print)

Help on built-in function print in module builtins:

print(*args, sep=' ', end='\n', file=None, flush=False)
    Prints the values to a stream, or to sys.stdout by default.
    
    sep
      string inserted between values, default a space.
    end
      string appended after the last value, default a newline.
    file
      a file-like object (stream); defaults to the current sys.stdout.
    flush
      whether to forcibly flush the stream.



In [4]:
print('hello','world')

hello world


### Commenting code
In order for your code to be readable to others (or your future self), you should provide comments on your code to explain what you are doing. The comment character in python is `#`, and any text following a `#` symbol will not be interpreted as code by python.

In [5]:
array_of_zeros = np.zeros([3,3]) # this creates a 3x3 array full of zeros
print(array_of_zeros) # the print() function displays the value of the variable on screen

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


## String formatting
You can insert variable values into strings in a few different ways in python. We will highlight the two main ones.

The first, and preferred, method is using `f-strings`, which follow this form:
```python
x = 10
print(f'the value of x is {x}')
```
This should print:
```
the value of x is 10
```
`f-strings` begin with an `f` and are followed by single `'` or double quotes `"`, then contain a string, and use curly braces `{}` with a variable name inside of the curly braces to indicate what variable value should be inserted in there. 

Let's try it! Try changing the value of `x` and rerunning the cell below

In [6]:
x = 10
print(f'the value of x is {x}')

the value of x is 10


You can also include expressions and operations inside the curly braces. For example:

In [7]:
y = 20
print(f'the sum of x={x} and y={y} is x+y={x+y}')

the sum of x=10 and y=20 is x+y=30


There's also a shortcut for printing an expression together with its value: put `=` after the expression inside the curly braces of an `f-string` (this works in Python 3.8 and later).

In [8]:
print(f'the sum of {x=} and {y=} is {x+y=}')

the sum of x=10 and y=20 is x+y=30


The second way of including values is using the `.format()` method of strings. This method does not require you to start the string with an `f` before the opening quotation marks, and still uses the curly braces. You can pass in arguments to `.format()`, and the order in which you pass in the arguments is the order in which the curly braces will be filled.

For example:

In [10]:
my_str = 'the value of x is {} and the value of y is {}'
print(my_str.format(x,y))



the value of x is 10 and the value of y is 20


You can get more flexibility for positioning by putting a name in the curly braces, and passing in arguments as named arguments to `.format()`. Each placeholder is filled using the named argument passed to `.format()`, not a global variable of the same name.

In [None]:
print('the value of x is {a} and it is still {a} no matter what the value of y is, which is {b}'.format(a=x, b=y))

In general, we prefer to use `f-string` style formatting, since it makes it clearer which variable's value is being substituted into the string, but some older documentation may use the `.format()` style.

For more flexibility in how numbers are displayed, such as choosing how many digits to display, alignment, etc., check out this documentation: https://docs.python.org/3/library/string.html#format-string-syntax
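
As a small illustration of the format specifiers described in the documentation linked above, the sketch below shows a few common ones inside f-strings. The variable names and values are made up for the example.

```python
pi_approx = 3.14159265
count = 42
fraction = 0.256

print(f'{pi_approx:.2f}')          # two digits after the decimal point -> 3.14
print(f'{count:>6}')               # right-aligned in a field 6 characters wide
print(f'{count:06d}')              # zero-padded to 6 digits -> 000042
print(f'{fraction:.1%}')           # shown as a percentage -> 25.6%
print('{:.3e}'.format(pi_approx))  # the same specifiers work with .format() -> 3.142e+00
```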

Note that there is an even older way, reminiscent of C-style string formatting using the `%` operator. We will not use that. https://docs.python.org/3/tutorial/inputoutput.html

## Code flow


### Loops
A key part of programming is automating repetitive tasks, such as applying the same operation to a list of inputs. This is achieved using "loops"; most commonly, the `for` loop.

In its simplest form, a python loop iterates over a list, and runs the code within the loop with the variable set equal to the respective element of the list.

In [None]:
idx_list = [0,1,2,3,4,5]
for idx in idx_list: # loop over idx_list, set idx equal to each element sequentially
  print(f'idx is equal to {idx}') # print the current value of idx

Lists are not the only kind of object that can be iterated over (such objects are known as iterables). A special kind of object, called a generator, does not explicitly store every single value in memory, but instead stores the current value and the rule to generate the next value. This usually takes much less memory than explicitly storing every element.

As an analogy, if you wanted to send to your friend the following sequence of numbers: [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59], you could write each number down and send them the entire list. Or you could write "the sequence of numbers starting at 1, increasing by 2, but less than 60"

If the sequence is very long, then the second representation becomes preferable to write, because you don't need to explicitly write out every single element. One commonly used object of this kind in python is `range(start, end, increment)`, which produces the sequence of numbers starting at `start`, increasing by `increment`, and stopping before (__but not including__) `end`. If `increment` is not set, it defaults to 1. (Strictly speaking, `range` is a lazy sequence type rather than a generator, but it likewise computes its values on demand instead of storing them all.)

This is often used in conjunction with iteration:

In [11]:
for idx in range(0,6): #equivalent to the above
  print(f'idx is equal to {idx}') # print the current value of idx

idx is equal to 0
idx is equal to 1
idx is equal to 2
idx is equal to 3
idx is equal to 4
idx is equal to 5


In [12]:
for idx in range(6): # if you only give one argument, it automatically starts from 0
  print(f'idx is currently {idx}') # print the current value of idx

idx is currently 0
idx is currently 1
idx is currently 2
idx is currently 3
idx is currently 4
idx is currently 5
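
The odd-number sequence from the analogy above can be written with the three-argument form of `range`. This sketch also compares memory use with `sys.getsizeof`, which is only a rough indicator (it reports the size of the container object itself, not of the numbers it refers to), and exact byte counts vary by Python version.

```python
import sys

odd_numbers = range(1, 60, 2)   # start at 1, step by 2, stop before 60
print(list(odd_numbers))        # expand into an explicit list to see the values
print(len(odd_numbers))         # 30 values, matching the list in the analogy

# A range stores only start, stop, and step, so its size does not grow
# with the length of the sequence, while the equivalent list's size does.
big_range = range(0, 1_000_000)
big_list = list(big_range)
print(sys.getsizeof(big_range) < sys.getsizeof(big_list))  # True
```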


### Conditionals
Sometimes you want to execute code only if certain conditions are met. The `if`, `elif` (short for else-if), and `else` keywords are used for this purpose

In [None]:
for idx in range(1,6):
  if idx > 2 : # only execute the following indented block of code if idx is greater than 2
    print(f'idx={idx}, which is greater than 2')
  elif idx == 2: # only execute if the above condition isn't met, and also idx==2
  # note that == is used to check for equality; single = is the assignment operator
    print(f'idx={idx} is equal to 2') 
  else: # execute this code if none of the above conditions are met
    print(f'idx={idx} is less than 2')

You can combine different conditions using the keywords `and` and `or`, and negate conditions using `not`

In [None]:
x = 5
y = 10
print(x==5 or y==11) # true because first statement is true
print(x==5 and y==11) # false because not both are true
print(x==5 and not y==11) # true because second condition is negated (flipped)

### List comprehension

You can generate a list from any iterable in a couple of ways. The second way shown below is called list comprehension.

The simplest way is just to call `list()` on the iterable:

In [13]:
list(range(6))

[0, 1, 2, 3, 4, 5]

Another way is using the following syntax:

```python
[x for x in iterable]
```
for example:

In [None]:
list_of_numbers = [x for x in range(6)]
print(list_of_numbers)

However, the list comprehension syntax is actually more powerful than that: it allows for functions to be called within the expression
```python
[expression(x) for x in iterable]
```

In [None]:
list_of_first_five_squares = [x**2 for x in range(6)] # the double star ** expression denotes exponentiation
# hence, the above gives the squares of 0 through 5: zero, followed by the first 5 positive square numbers
print(list_of_first_five_squares)

In fact, the list comprehension syntax is even more powerful: it can also include conditional statements
```python
[expression(x) for x in iterable if condition]
```

In [None]:
list_of_first_few_odd_squares = [x**2 for x in range(10) if np.mod(x,2) == 1] 
print(list_of_first_few_odd_squares)
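
A list comprehension with a condition is shorthand for an explicit loop that appends to a list. The sketch below builds the same list both ways; it uses the plain `%` remainder operator instead of `np.mod` so that it runs without numpy, and the variable names are made up for the example.

```python
# Comprehension form
odd_squares = [x**2 for x in range(10) if x % 2 == 1]

# Equivalent explicit loop
odd_squares_loop = []
for x in range(10):
    if x % 2 == 1:                      # keep only odd x
        odd_squares_loop.append(x**2)   # apply the expression and store it

print(odd_squares)                      # [1, 9, 25, 49, 81]
print(odd_squares == odd_squares_loop)  # True
```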

Now try to print a list of the first few even cubes using list comprehension. 

In [17]:
list_of_first_few_even_cubes = [x**3 for x in range(20) if x % 2 ==0]
print(f' x = {list_of_first_few_even_cubes}')

x = [0, 8, 64, 216, 512, 1000, 1728, 2744, 4096, 5832]


In [18]:
list_of_first_few_even_cubes = [x**3 for x in range(20) if x % 2 == 0]
total_cubes = len(list_of_first_few_even_cubes)
print(f'There are {total_cubes} even cubes, and the first one is {list_of_first_few_even_cubes[0]}')


There are 10 even cubes, and the first one is 0


## Functions
Functions are a way to repeat the same lines of code, potentially with different inputs. If you find yourself writing a lot of repetitive code that shares the same structure, you may want to try and formulate it as a function. Functions are declared using the `def` keyword. In this example, we will write a function that checks if a number is prime.


In [13]:
def is_prime(number):
    if type(number) is int:
        if number < 2: # all negative numbers, 0, and 1 are not prime
            return False 
        sqrt_num = int(np.sqrt(number)) # we only need to check integer factors up to the square root of the number, rounded down (int() truncates toward zero, which rounds positive numbers down)
        for potential_factor in range(2,sqrt_num+1): #range(a,b) iterates from the value a to b-1
            if np.mod(number, potential_factor) == 0: #np.mod() is the modulo (aka remainder) function; thus, if the remainder is zero, then it divides evenly
                return False # if it divides evenly, then it's not prime, then we can return and end the function
        return True #if we get through all of the potential factors and haven't found a factor, then it's prime
    return False
  

In [14]:
is_prime(3)


True
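
To see how the function behaves on different kinds of input, the sketch below repeats `is_prime` in condensed form (so the block runs on its own) and calls it on a few values. Note the effect of the `type(number) is int` check: values that are numerically whole but not of python's built-in `int` type, such as `7.0` or a numpy integer, are reported as not prime.

```python
import numpy as np

def is_prime(number):
    # condensed copy of the function defined above
    if type(number) is int:
        if number < 2:
            return False
        for potential_factor in range(2, int(np.sqrt(number)) + 1):
            if np.mod(number, potential_factor) == 0:
                return False
        return True
    return False

print([n for n in range(20) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
print(is_prime(7.0))               # False: a float, not an int
print(is_prime(np.int64(7)))       # False: a numpy integer is not the built-in int
print(is_prime(int(np.int64(7))))  # True once converted with int()
```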

## Data structures

### Lists
We already looked at one python data structure: the list. Lists are _ordered_ collections of values, denoted with square brackets. 

Lists are _ordered_ in the sense that the order of their elements matters. The list [1,2,3,4] is not the same as [4,3,2,1].

In [None]:
a_list = [2,0,15,5] # square brackets denote a list
another_list = [15,0,5,2]
print(f'a_list = {a_list}; another_list = {another_list}') 

print('is a_list equal to another_list?')
print(a_list == another_list) # print out the truth value of whether a_list is the same as another_list (it shouldn't be, because they have different ordering)

yet_another_list = [2,0,15,5]
print('but it is equal to yet_another_list:')
print(a_list == yet_another_list)

List elements can be any python object, including strings, numbers, and other lists

In [1]:
diverse_list = ['a', False, [0,0,0], 1.0, 10]
print(f'the elements of diverse_list are: {diverse_list}') # an f-string fills in the value directly, so no .format() call is needed
print(f'the data types of the elements are {[type(x) for x in diverse_list]}') # using list comprehension to get the type of each element

the elements of diverse_list are: ['a', False, [0, 0, 0], 1.0, 10]
the data types of the elements are [<class 'str'>, <class 'bool'>, <class 'list'>, <class 'float'>, <class 'int'>]


You can access a specific element of a list using the square bracket notation (this is known as indexing)
```
list_name[idx]
```
Index values can be negative, which start counting from the end. So `list_name[-1]` gives the __last__ element of the list

In [None]:
first_element_of_diverse_list = diverse_list[0] # python starts counting at 0, so the first element is at index 0
print(first_element_of_diverse_list)
last_element_of_diverse_list = diverse_list[-1]
print(last_element_of_diverse_list)

You can "slice" a list using the colon `:` notation
```
list_name[start_idx:end_idx]
```
Note that the slice starts at the start_idx, but __does not include__ the element at end_idx.

If you omit either start_idx or end_idx, it automatically starts at the first element/ends at the last element respectively

In [None]:
print(diverse_list[0:2]) # gets the elements at index 0 and 1
print(diverse_list[:2]) # equivalent to the above
print(diverse_list[2:]) # gets all elements from index 2 to the end
print(diverse_list[:]) # gets all elements

Lists are modifiable: you can append and delete entries, as well as change the values of elements

In [None]:
diverse_list.append('new entry') # add a value to the end
print(f'appended an entry to diverse_list: {diverse_list}')
diverse_list[0] = 'changed entry' # change the value of entry at index 0
print(f'changed an entry of diverse_list: {diverse_list}')
first_entry = diverse_list.pop(0) # remove (and return) the value at element 0
print(f'removed "{first_entry}" from diverse_list: {diverse_list}')
diverse_list.remove('new entry') # you can also remove the first entry with a specific value, in this case, the "new entry"
print(f'removed "new entry" from diverse_list: {diverse_list}')
diverse_list.insert(0,'a') # insert the value 'a' at index 0
print(f'inserted "a" back into diverse_list: {diverse_list}')

### Tuples
Tuples are unchangeable, ordered sequences of elements, grouped with regular parentheses:
```
('a','b','c')
```

In [2]:
a_tuple = ('a','b','c')
print(f'the first element of a_tuple is "{a_tuple[0]}"') # tuples can be indexed like lists

the first element of a_tuple is "a"


In [3]:
a_tuple[0] = 10 # however, unlike lists, you cannot change their values once they are set
# this will throw a TypeError

TypeError: 'tuple' object does not support item assignment

### Dictionaries
Dictionaries are data structures that store _mappings_ from "keys" to respective "values". You can think of them as lookup tables which return a specific value for a given key. For example, an English dictionary (the book) could be stored as a python dictionary, where the "keys" are each of the words in English, and the "values" are the respective definitions.

They are defined using the curly braces, or the `dict()` function:
```
dictionary = {key: value, key2: value2}
dictionary = dict([(key, value),(key2, value2)])
```

Keys can be a variety of data types, including numbers, strings, and tuples. However, they cannot be changeable objects, such as lists or other dictionaries. Values, on the other hand, can be any data type.

Accessing the dictionary values is done with square brackets, using the syntax:
```
dictionary[key] # returns the value associated with key
```

In [5]:
pokemon_types = {'bulbasaur':'grass', 
                 'charmander':'fire', 
                 'squirtle':'water'}
pokemon_types

{'bulbasaur': 'grass', 'charmander': 'fire', 'squirtle': 'water'}

In [4]:
print(pokemon_types['bulbasaur'])

grass

You can add an element to a dictionary, or change an existing one, using the following syntax:
```
dictionary[key] = value
```

In [6]:
pokemon_types['bulbasaur'] = 'grass/poison' #bulbasaur is actually dual typed, so we'll change its entry
pokemon_types['ivysaur'] = 'grass/poison' #let's add an evolution
pokemon_types

{'bulbasaur': 'grass/poison',
 'charmander': 'fire',
 'squirtle': 'water',
 'ivysaur': 'grass/poison'}

You can get all of the keys of a dictionary using the `.keys()` method, and similarly all of the values using the `.values()` method. Both return _view_ objects, which can be looped over or passed to `list()`.

In [7]:
print(pokemon_types.keys())
print(pokemon_types.values())

dict_keys(['bulbasaur', 'charmander', 'squirtle', 'ivysaur'])
dict_values(['grass/poison', 'fire', 'water', 'grass/poison'])


You can use the `.items()` method to get the `(key, value)` pairs as tuples. This is often useful for looping.

In [8]:
for k, v in pokemon_types.items():
  print(f'the type of {k} is {v}')

the type of bulbasaur is grass/poison
the type of charmander is fire
the type of squirtle is water
the type of ivysaur is grass/poison
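
The rule about keys described earlier in this section can be checked directly: a tuple works as a key, while a list raises a `TypeError`. The dictionary contents and the names `grid_labels`, `pokemon_moves`, and `list_key_failed` are made up for illustration.

```python
# Tuples are unchangeable, so they can serve as dictionary keys
grid_labels = {(0, 0): 'origin', (1, 0): 'east'}
print(grid_labels[(1, 0)])   # east

# Lists are changeable, so using one as a key fails
list_key_failed = False
try:
    bad_dict = {[0, 0]: 'origin'}
except TypeError as err:
    list_key_failed = True
    print(f'TypeError: {err}')

# Values have no such restriction: a list is fine as a value
pokemon_moves = {'bulbasaur': ['tackle', 'growl']}
pokemon_moves['bulbasaur'].append('vine whip')
print(pokemon_moves)
```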


# Numpy
Numpy is a package for python which provides various tools to make math and numerical computation much easier. One of its key components is the numpy array, which can represent vectors, matrices, and higher-dimensional grids of values.

The content in this section is adapted from the Python Data Science Handbook, which is [freely available online](https://github.com/jakevdp/PythonDataScienceHandbook)

## numpy arrays
numpy arrays provide the ability to create matrices, which are essentially 2-dimensional lists. (They can also be used to create even higher-dimensional arrays: tensors, etc.)
Unlike python lists, all elements of a numpy array must have the same data type (e.g. numeric, string).

Arrays can be created from python lists:

In [10]:
import numpy as np
print('a vector can be created from a list {}'.format(np.array([1, 4, 2, 5, 3])))
print('a matrix can be created from a list of lists:\n {}'.format(np.array([[1,1,1],[2,2,2],[3,3,3]]))) #\n is the newline character and makes the following text appear on the next line

a vector can be created from a list [1 4 2 5 3]
a matrix can be created from a list of lists:
 [[1 1 1]
 [2 2 2]
 [3 3 3]]
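
Because every element of an array shares one data type, numpy converts mixed inputs to a common type when it builds the array. A brief sketch (the variable names are made up for the example):

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])   # the integers are converted to floats
mixed_text = np.array([1, 'a'])         # everything is converted to strings

print(ints.dtype.kind)       # i (an integer type; the exact width depends on the platform)
print(mixed_numbers.dtype)   # float64
print(mixed_numbers)         # [1.  2.5 3. ]
print(mixed_text)            # ['1' 'a']
```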


There are also a bunch of built-in functions for generating arrays. 

In [7]:
# Create a length-10 integer array filled with zeros
x = np.zeros(10, dtype=int)
# Create a length-20 floating-point array filled with zeros
y = np.zeros(20, dtype=float)
print(x) 
print(y)

[0 0 0 0 0 0 0 0 0 0]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [None]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

In [None]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

In [None]:
# Create an array filled with a linear sequence
# Starting at 0, stepping by 2, and stopping before 20 (20 itself is not included)
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

In [None]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

## Array attributes

In [9]:
x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array
print('x1=',x1)
print('x2=',x2)
print('x3=',x3)

x1= [0 0 1 9 5 4]
x2= [[5 6 0 2]
 [4 7 4 0]
 [2 9 8 3]]
x3= [[[2 7 1 0 3]
  [7 3 5 3 0]
  [3 8 4 7 7]
  [9 6 9 4 7]]

 [[1 2 2 6 9]
  [5 4 3 4 7]
  [4 4 0 8 4]
  [5 9 1 3 5]]

 [[3 0 7 7 0]
  [1 5 1 3 3]
  [9 9 2 5 3]
  [5 6 0 7 1]]]


In [10]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


## Array Indexing
You can index arrays much in the same way that you can index python lists

One-dimensional arrays function just like lists

In [None]:
print('x1 is', x1)
print('first entry is', x1[0]) # first entry of the array
print('second and third entries are', x1[[1,2]])
print('first three entries are', x1[:3]) # slice the array
print('a second set of colons in the slice allow you to set the interval:', x1[::2]) # every other element of x1
print('you can reverse the order using a negative interval:', x1[3::-1]) # count backwards from entry at index 3 to the beginning

Multi-dimensional arrays are indexed using a tuple of indices. The indices for a 2d array are ordered as `(row_idx, col_idx)`

In [None]:
print(x2)
print('the element in the second row, third column is: ', x2[(1,2)])

You can slice multidimensional arrays as well!

If you change the value of an entry in a slice, you change the value in the original object. This is what's known as a "view" of an array. Slices do not return an independent object, but instead can be thought of as just a reference to a subset of elements in the original object.

However, if you do not want this behavior, you can avoid it by making a copy using the `.copy()` function.

In [None]:
print('x2 is originally: \n', x2)
slice_of_x2 = x2[1:,2:] # slices the 2nd row to the end, and 3rd column to the end
copied_slice_of_x2 = x2[1:,2:].copy() # note that slices provide a direct view of the original object, if you want an independent copy, use the .copy()
print('slice_of_x2 is:\n',slice_of_x2)
print('copied_slice_of_x2 is: \n', copied_slice_of_x2)

In [None]:
# let's change the value of slice_of_x2
slice_of_x2[0,0] = 99 # we changed the value of top left element to 99; this corresponds to the element in the 2nd row, 3rd column of x2
print('now x2 is: \n', x2)
print('slice_of_x2 is:\n',slice_of_x2)
print('and copied_slice_of_x2 is:\n',copied_slice_of_x2)

In [None]:
# If you change the value of a copy, it does not affect the original object
copied_slice_of_x2[0,0] = -50
print('copied_slice_of_x2 is: \n', copied_slice_of_x2)
print('x2 is unchanged by this operation:\n', x2)
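
One way to check whether two arrays are a view and its original, rather than independent copies, is `np.shares_memory`. The sketch below uses a small fixed array (instead of the random `x2` above) so that the results are reproducible; the names `grid`, `view_of_grid`, and `copy_of_grid` are made up for the example.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
view_of_grid = grid[1:, 2:]          # a slice: a view into grid
copy_of_grid = grid[1:, 2:].copy()   # an independent copy

print(np.shares_memory(grid, view_of_grid))   # True
print(np.shares_memory(grid, copy_of_grid))   # False

view_of_grid[0, 0] = 99    # writes through to grid[1, 2]
copy_of_grid[0, 0] = -50   # leaves grid untouched
print(grid[1, 2])          # 99
```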

## Reshaping
You can reshape an array using the `.reshape()` function

In [15]:
print('np.arange(12):',np.arange(12))
reshaped = np.arange(12).reshape(3,4)
print('reshaped into a 3x4 array: \n',reshaped)


np.arange(12): [ 0  1  2  3  4  5  6  7  8  9 10 11]
reshaped into a 3x4 array: 
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


You can combine and split arrays in numpy. However, we won't be going too much in depth with that. Check out [this tutorial](https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.02-The-Basics-Of-NumPy-Arrays.ipynb) for more info in that realm.
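
For a quick taste, here is a minimal sketch of combining and splitting with `np.concatenate`, `np.vstack`, and `np.split`. The arrays and variable names are made up for the example, and the linked tutorial covers these functions in more detail.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

joined = np.concatenate([a, b])       # join end to end
stacked = np.vstack([a, b])           # stack as the rows of a 2x3 array
first, rest = np.split(joined, [2])   # split into two pieces before index 2

print(joined)          # [1 2 3 4 5 6]
print(stacked.shape)   # (2, 3)
print(first, rest)     # [1 2] [3 4 5 6]
```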

## Boolean arrays and masking
Boolean data represents True/False values, which can also be expressed as 1 or 0 respectively.

You can compute boolean operations on arrays in numpy

In [None]:
even_entries_bool = np.mod(reshaped,2)==0
print(even_entries_bool)

You can then use those arrays to select the entries which match the criterion. This is known as masking.

In [None]:
reshaped[even_entries_bool]
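
Since the cells above depend on `reshaped`, here is a self-contained sketch of the same idea. It also shows that summing a boolean array counts its `True` entries (because `True` counts as 1), and that masks can be combined element-wise with `&` (and), `|` (or), and `~` (not). The variable names are made up for the example.

```python
import numpy as np

values = np.arange(12).reshape(3, 4)
is_even = np.mod(values, 2) == 0
is_large = values > 6

print(values[is_even])               # [ 0  2  4  6  8 10]
print(np.sum(is_even))               # 6
print(values[is_even & is_large])    # [ 8 10]
print(values[~is_even | is_large])   # odd entries, plus anything above 6
```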

### Exercise
Using the `is_prime()` function we previously wrote, write a function which takes a numpy array and returns a boolean array of the prime entries with the same shape as the input array

In [11]:
import numpy as np
def is_prime(number):
    if type(number) is int:
        if number < 2: # all negative numbers, 0, and 1 are not prime
            return False 
        sqrt_num = int(np.sqrt(number)) # we only need to check integer factors up to the square root of the number, rounded down (int() truncates toward zero, which rounds positive numbers down)
        for potential_factor in range(2,sqrt_num+1): #range(a,b) iterates from the value a to b-1
            if np.mod(number, potential_factor) == 0: #np.mod() is the modulo (aka remainder) function; thus, if the remainder is zero, then it divides evenly
                return False # if it divides evenly, then it's not prime, then we can return and end the function
        return True #if we get through all of the potential factors and haven't found a factor, then it's prime
    return False

In [12]:
def is_prime_array(input_array):
    rows, cols = input_array.shape
    # creating a new array with same shape as input 
    result_array = np.empty((rows, cols), dtype=bool)
    
    for i in range(rows):
        for j in range(cols):
            # store True at this position if the entry is prime, False otherwise
            result_array[i, j] = is_prime(int(input_array[i, j]))
    
    return result_array
    
# test
x = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

result = is_prime_array(x)
print(result)

[[False False  True  True]
 [False  True False  True]
 [False False False  True]]
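
The solution above unpacks `input_array.shape` into `rows, cols`, so it only accepts 2-dimensional input. One possible way to handle any shape, sketched here under the assumption that the entries are whole numbers, is to flatten the array, test each entry, and reshape the results. The name `is_prime_array_any_shape` is made up for this example, and `is_prime` is repeated in condensed form so the block runs on its own.

```python
import numpy as np

def is_prime(number):
    # condensed copy of the function defined above
    if type(number) is int:
        if number < 2:
            return False
        for potential_factor in range(2, int(np.sqrt(number)) + 1):
            if np.mod(number, potential_factor) == 0:
                return False
        return True
    return False

def is_prime_array_any_shape(input_array):
    arr = np.asarray(input_array)
    # int() converts each numpy integer to a built-in int, which is_prime requires
    flags = [is_prime(int(value)) for value in arr.ravel()]
    # rebuild a boolean array with the same shape as the input
    return np.array(flags, dtype=bool).reshape(arr.shape)

print(is_prime_array_any_shape(np.arange(12)))                         # 1-dimensional input
print(is_prime_array_any_shape(np.arange(12).reshape(3, 4)))           # 2-dimensional input
print(is_prime_array_any_shape(np.arange(24).reshape(2, 3, 4)).shape)  # (2, 3, 4)
```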
