
# Intro to (/review of) python 
In this lesson, we'll review basic python and programming concepts, as well as go over the fundamentals of how to use some common python packages, such as:
- numpy
- pandas
- matplotlib

## Importing packages
Packages are collections of pre-written code made available for reuse. In the previous lesson, we installed some necessary packages using the `pip` python package manager. Packages are convenient because they save you from having to implement every feature and function on your own. Widely used packages also provide a standard, common set of tools for others to develop with, which allows interoperability between programs.

There are a few different ways to import a package in python:

The simplest is just to `import {packagename}`. For this lesson, we'll use `numpy` as the example package.
```
import numpy
```
The functions, classes, and variables of the `numpy` package can then be accessed using "dot" notation: for example, the function that creates numpy arrays can be accessed with `numpy.array`.

---
A variant of this is to use `import {packagename} as {shortname}`, as in:
```
import numpy as np
```
This reduces the number of characters you need to type, and can be convenient if the package name is long, or if you need to use many things from the same package. The array-creation function, for example, can then be accessed with `np.array`.

---

If you only need a subset of items from a package, for example, a single class, function, or a submodule (subpackage of the main package), you can use the syntax `from {packagename} import {element}`, as in:
```
from numpy import array
```
This allows you to use `array` directly, without needing the package prefix dot notation. Python still loads the whole numpy package behind the scenes, but only the name `array` is made available in your code.
For example, if you use this import method, then writing
```
test_array = array([0])
```
would be equivalent to writing
```
test_array = numpy.array([0])
test_array = np.array([0])
```
using the previous import styles, respectively.
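As a quick check, the three import styles above all give access to the same underlying object. A minimal sketch (the variable name `same_object` is only for illustration):

```python
import numpy
import numpy as np
from numpy import array

# all three names refer to the very same object, just reached in different ways
same_object = (numpy.array is np.array) and (np.array is array)
print(same_object)
```

Which style to use is mostly a matter of readability; `import numpy as np` is the convention used in the rest of this lesson.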

In [0]:
import numpy as np
import pandas as pd
from matplotlib import pyplot

## Documentation

Packages contain functions, classes, and variables which may be helpful. Crucial to the usability of a package is the documentation (or API reference), which (should) list all of the contents of the package, and how to use them.

To get the built-in help about a function or class, use the `help()` command.

Documentation for most common packages is also usually available online. For example, the documentation for [numpy can be found here](https://docs.scipy.org/doc/numpy/reference/)

In [28]:
help(np.array)

Help on built-in function array in module numpy:

array(...)
    array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
    
    Create an array.
    
    Parameters
    ----------
    object : array_like
        An array, any object exposing the array interface, an object whose
        __array__ method returns an array, or any (nested) sequence.
    dtype : data-type, optional
        The desired data-type for the array.  If not given, then the type will
        be determined as the minimum type required to hold the objects in the
        sequence.  This argument can only be used to 'upcast' the array.  For
        downcasting, use the .astype(t) method.
    copy : bool, optional
        If true (default), then the object is copied.  Otherwise, a copy will
        only be made if __array__ returns a copy, if obj is a nested sequence,
        or if a copy is needed to satisfy any of the other requirements
        (`dtype`, `order`, etc.).
    order : {'K', 'A', 'C', 'F'}
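`help()` also works on functions you write yourself, because it displays the function's docstring (the string placed directly under the `def` line). A small sketch, using a hypothetical function `double` that is not part of the original lesson:

```python
def double(value):
    """Return twice the given value."""
    return 2 * value

help(double)           # prints the function signature and the docstring above
print(double.__doc__)  # the docstring is also stored on the function itself
```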

### Commenting code
In order for your code to be readable to others (or your future self), you should provide comments on your code to explain what you are doing. The comment character in python is `#`, and any text following a `#` symbol will not be interpreted as code by python.

In [29]:
array_of_zeros = np.zeros([3,3]) # this creates a 3x3 array full of zeros
print(array_of_zeros) # the print() function displays the value of the variable on screen

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


## Code flow


### Loops
A key part of programming is automating repetitive tasks, such as applying the same operation to a list of inputs. This is achieved using "loops"; most commonly, the `for` loop.

In its simplest form, a python loop iterates over a list, and runs the code within the loop with the variable set equal to the respective element of the list.

In [30]:
idx_list = [0,1,2,3,4,5]
for idx in idx_list: # loop over idx_list, set idx equal to each element sequentially
  print('idx is equal to {}'.format(idx)) # print the current value of idx

idx is equal to 0
idx is equal to 1
idx is equal to 2
idx is equal to 3
idx is equal to 4
idx is equal to 5


Lists are not the only kinds of objects that can be iterated over (such objects are known as iterables). Some iterables, such as generators, do not explicitly store every single value in memory, but instead store the current value and the rule to generate the next value. This usually takes far less memory than explicitly storing every element.

As an analogy, if you wanted to send to your friend the following sequence of numbers: [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59], you could write each number down and send them the entire list. Or you could write "the sequence of numbers starting at 1, increasing by 2, but less than 60"

If the sequence is very long, then the second representation becomes preferable to write, because you don't need to explicitly write out every single element. One common example in python is `range(start, end, increment)`, which produces the sequence of numbers that starts at `start`, increases by `increment`, and stops before reaching `end` (`end` itself is __not__ included). Strictly speaking, `range` is a lazy sequence rather than a generator, but it behaves the same way in a loop. If `increment` is not set, it defaults to 1.

This is often used in conjunction with iteration:

In [31]:
for idx in range(0,6): #equivalent to the above
  print('idx is equal to {}'.format(idx)) # print the current value of idx

idx is equal to 0
idx is equal to 1
idx is equal to 2
idx is equal to 3
idx is equal to 4
idx is equal to 5


In [32]:
for idx in range(6): # if you only give one argument, it automatically starts from 0
  print('idx is equal to {}'.format(idx)) # print the current value of idx

idx is equal to 0
idx is equal to 1
idx is equal to 2
idx is equal to 3
idx is equal to 4
idx is equal to 5
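The odd-number sequence from the analogy above can be written with the three-argument form of `range`. A minimal sketch (the variable names `odd_numbers` and `countdown` are only for illustration):

```python
odd_numbers = list(range(1, 60, 2))  # start at 1, increase by 2, stop before 60
print(odd_numbers[:5])   # the first five values
print(len(odd_numbers))  # how many values the range produced

countdown = list(range(5, 0, -1))  # a negative increment counts downward
print(countdown)
```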


### Conditionals
Sometimes you want to execute code only if certain conditions are met. The `if`, `elif` (short for else-if), and `else` keywords are used for this purpose.

In [34]:
for idx in range(1,6):
  if idx > 2 : # only execute the following indented block of code if idx is greater than 2
    print('idx={}, which is greater than 2'.format(idx))
  elif idx == 2: # only execute if the above condition isn't met, and also idx==2
  # note that == is used to check for equality; single = is the assignment operator
    print('idx={} is equal to 2'.format(idx)) 
  else: # execute this code if none of the above conditions are met
    print('idx={} is less than 2'.format(idx))

idx=1 is less than 2
idx=2 is equal to 2
idx=3, which is greater than 2
idx=4, which is greater than 2
idx=5, which is greater than 2


You can combine different conditions using the keywords `and` and `or`, and negate conditions using `not`.
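A short sketch of combined conditions, sorting the numbers 1 through 5 into three illustrative lists (the list names are assumptions made for this example):

```python
in_between = []
endpoints = []
odd = []
for idx in range(1, 6):
    if idx > 1 and idx < 5:    # both conditions must be true
        in_between.append(idx)
    if idx == 1 or idx == 5:   # at least one condition must be true
        endpoints.append(idx)
    if not (idx % 2 == 0):     # true when the condition in parentheses is false
        odd.append(idx)

print(in_between)
print(endpoints)
print(odd)
```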

### List comprehension

You can generate a list from any iterable in a couple of ways, including a syntax called list comprehension.

The simplest way is just to call `list()` on the iterable:

In [33]:
list(range(6))

[0, 1, 2, 3, 4, 5]

Another way is using the following syntax:

```
[x for x in iterable]
```
for example:

In [42]:
list_of_numbers = [x for x in range(6)]
print(list_of_numbers)

[0, 1, 2, 3, 4, 5]

However, the list comprehension syntax is actually more powerful than that: it allows for functions to be called within the expression
```
[expression(x) for x in iterable]
```

In [43]:
list_of_squares = [x**2 for x in range(6)] # the double star ** expression denotes exponentiation
# hence, the above gives the squares of 0 through 5 (six values, counting zero)
print(list_of_squares)

[0, 1, 4, 9, 16, 25]

In fact, the list comprehension syntax is even more powerful: it can also include conditional statements
```
[expression(x) for x in iterable if condition]
```

In [46]:
list_of_first_few_odd_squares = [x**2 for x in range(10) if np.mod(x,2) == 1] 
print(list_of_first_few_odd_squares)

[1, 9, 25, 49, 81]
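A list comprehension with a condition is shorthand for a loop that appends to a list. The sketch below builds the same list both ways; it uses the built-in `%` operator in place of `np.mod` so that it runs without numpy:

```python
# explicit loop version
odd_squares_loop = []
for x in range(10):
    if x % 2 == 1:  # keep only odd values of x
        odd_squares_loop.append(x**2)

# equivalent list comprehension
odd_squares_comprehension = [x**2 for x in range(10) if x % 2 == 1]

print(odd_squares_loop == odd_squares_comprehension)
```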


## Functions
Functions are a way to repeat the same lines of code, potentially with different inputs. If you find yourself writing a lot of repetitive code that shares the same structure, you may want to try and formulate it as a function. Functions are declared using the `def` keyword. In this example, we will write a function that checks if a number is prime.


In [0]:
def is_prime(number):
  if number < 2: # 0, 1, and negative numbers are not prime
    return False
  sqrt_num = int(np.sqrt(number)) # we only need to check integer factors up to the square root of the number, rounded down (int() drops the fractional part)
  for potential_factor in range(2,sqrt_num+1): #range(a,b) iterates from the value a to b-1
    if np.mod(number, potential_factor) == 0: #np.mod() is the modulo (aka remainder) function; thus, if the remainder is zero, then it divides evenly
      return False # if it divides evenly, then it's not prime, then we can return and end the function
  return True #if we get through all of the potential factors and haven't found a factor, then it's prime
  

In [36]:
is_prime(101.0)

True
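Once a function is defined, it can be reused wherever it is needed, for example inside a list comprehension. The sketch below repeats a version of `is_prime` so that it runs on its own; this copy includes a guard for numbers below 2, and `primes_below_30` is a name chosen for illustration:

```python
import numpy as np

def is_prime(number):
    if number < 2:  # 0, 1, and negative numbers are not prime
        return False
    sqrt_num = int(np.sqrt(number))  # largest factor we need to check
    for potential_factor in range(2, sqrt_num + 1):
        if number % potential_factor == 0:  # divides evenly, so not prime
            return False
    return True

# call the function once for each number from 0 to 29
primes_below_30 = [n for n in range(30) if is_prime(n)]
print(primes_below_30)
```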

## Data structures

### Lists
We already looked at one python data structure: the list. Lists are _ordered_ collections of values, denoted with square brackets. 

Lists are _ordered_ in the sense that the order of their elements matters. The list [1,2,3,4] is not the same as [4,3,2,1].

In [38]:
a_list = [2,0,15,5] # square brackets denote a list
another_list = [15,0,5,2]
print('a_list = {}; another_list = {}'.format(a_list, another_list)) # the .format() function of strings allows you to plug in the variable values in the respective curly braces {}

print('is a_list equal to another_list?')
print(a_list == another_list) # print out the truth value of whether a_list is the same as another_list (it shouldn't be, because they have different ordering)

yet_another_list = [2,0,15,5]
print('but it is equal to yet_another_list:')
print(a_list == yet_another_list)

a_list = [2, 0, 15, 5]; another_list = [15, 0, 5, 2]
is a_list equal to another_list?
False
but it is equal to yet_another_list:
True


List elements can be any python object, including strings, numbers, and other lists

In [29]:
diverse_list = ['a', False, [0,0,0], 1.0, 10]
print('the elements of diverse_list are: {}'.format(diverse_list))
print('the data types of the elements are {}'.format([type(x) for x in diverse_list])) # using list comprehension to get the type of each element

the elements of diverse_list are: ['a', False, [0, 0, 0], 1.0, 10]
the data types of the elements are [<class 'str'>, <class 'bool'>, <class 'list'>, <class 'float'>, <class 'int'>]


You can access a specific element of a list using the square bracket notation (this is known as indexing):
```
list_name[idx]
```
Index values can be negative, in which case counting starts from the end. So `list_name[-1]` gives the __last__ element of the list.

In [17]:
first_element_of_diverse_list = diverse_list[0] # python starts counting at 0, so the first element is at index 0
print(first_element_of_diverse_list)
last_element_of_diverse_list = diverse_list[-1]
print(last_element_of_diverse_list)

a
10


You can "slice" a list using the colon `:` notation
```
list_name[start_idx:end_idx]
```
Note that the slice starts at the start_idx, but __does not include__ the element at end_idx.

If you omit either start_idx or end_idx, it automatically starts at the first element/ends at the last element respectively

In [18]:
print(diverse_list[0:2]) # gets the elements at index 0 and 1
print(diverse_list[:2]) # equivalent to the above
print(diverse_list[2:]) # gets all elements from index 2 to the end
print(diverse_list[:]) # gets all elements

['a', False]
['a', False]
[[0, 0, 0], 1.0, 10]
['a', False, [0, 0, 0], 1.0, 10]
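Negative index values also work inside slices. A minimal sketch, using a hypothetical list `letters` that is not part of the original lesson:

```python
letters = ['a', 'b', 'c', 'd', 'e']

last_two = letters[-2:]      # from the second-to-last element to the end
all_but_last = letters[:-1]  # everything except the last element
middle = letters[1:-1]       # drop the first and the last element

print(last_two)
print(all_but_last)
print(middle)
```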


Lists are modifiable: you can append and delete entries, as well as change the values of elements

In [30]:
diverse_list.append('new entry') # add a value to the end
print('appended an entry to diverse_list: {}'.format(diverse_list))
diverse_list[0] = 'changed entry' # change the value of entry at index 0
print('changed an entry of diverse_list: {}'.format(diverse_list))
first_entry = diverse_list.pop(0) # remove (and return) the value at element 0
print('removed "{}" from diverse_list: {}'.format(first_entry, diverse_list))
diverse_list.remove('new entry') # you can also remove the first entry with a specific value, in this case, the "new entry"
print('removed "new entry" from diverse_list: {}'.format(diverse_list))
diverse_list.insert(0,'a') # insert the value 'a' at index 0
print('inserted "a" back into diverse_list: {}'.format(diverse_list))

appended an entry to diverse_list: ['a', False, [0, 0, 0], 1.0, 10, 'new entry']
changed an entry of diverse_list: ['changed entry', False, [0, 0, 0], 1.0, 10, 'new entry']
removed "changed entry" from diverse_list: [False, [0, 0, 0], 1.0, 10, 'new entry']
removed "new entry" from diverse_list: [False, [0, 0, 0], 1.0, 10]
inserted "a" back into diverse_list: ['a', False, [0, 0, 0], 1.0, 10]


### Tuples
Tuples are unchangeable, ordered sequences of elements, grouped with regular parentheses:
```
('a','b','c')
```

In [38]:
a_tuple = ('a','b','c')
print('the first element of a_tuple is "{}"'.format(a_tuple[0])) # tuples can be indexed like lists

the first element of a_tuple is "a"


In [36]:
a_tuple[0] = 10 # however, unlike lists, you cannot change their values once they are set

TypeError: 'tuple' object does not support item assignment
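If you need a modified version of a tuple, the usual approach is to build a new one. The sketch below also catches the error with `try`/`except`, which this lesson has not introduced, so treat that part as an illustrative assumption:

```python
a_tuple = ('a', 'b', 'c')

try:
    a_tuple[0] = 10  # not allowed: tuples cannot be changed in place
except TypeError as error:
    error_message = str(error)
    print('could not modify the tuple: {}'.format(error_message))

# build a new tuple from the pieces instead; (10,) is a one-element tuple
new_tuple = (10,) + a_tuple[1:]
print(new_tuple)
```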

### Dictionaries
Dictionaries are data structures that store _mappings_ from "keys" to respective "values". You can think of them as lookup tables which return a specific value for a given key. For example, an english dictionary (the book) could be stored as a python dictionary, where the "keys" are each of the words in english, and the "values" are the respective definitions.

They are defined using the curly braces, or the `dict()` function:
```
dictionary = {key: value, key2: value2}
dictionary = dict([(key, value),(key2, value2)])
```

Keys can be a variety of data types, including numbers, strings, and tuples. However, they cannot be changeable (mutable) objects, such as lists or other dictionaries. Values, on the other hand, can be any data type.
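A small sketch of the key rules, using a hypothetical dictionary `grid_labels` keyed by coordinate tuples (the names and the use of `try`/`except` are assumptions made for illustration):

```python
# tuples are unchangeable, so they are valid keys
grid_labels = {(0, 0): 'origin', (1, 0): 'east'}
print(grid_labels[(1, 0)])

# lists are changeable, so using one as a key raises a TypeError
try:
    bad_labels = {[0, 0]: 'origin'}
except TypeError as error:
    key_error_message = str(error)
    print('invalid key: {}'.format(key_error_message))
```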

Dictionary values are accessed using square brackets, with the syntax:
```
dictionary[key] # returns the value associated with key
```

In [41]:
pokemon_types = {'bulbasaur':'grass', 'charmander':'fire', 'squirtle':'water'}
print(pokemon_types['bulbasaur'])

grass
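Looking up a key that is not in the dictionary with square brackets raises a `KeyError`. A hedged sketch of two common ways to handle this, the `in` keyword and the `.get()` method (neither is covered in the original lesson):

```python
pokemon_types = {'bulbasaur': 'grass', 'charmander': 'fire', 'squirtle': 'water'}

# check whether a key exists before looking it up
has_pikachu = 'pikachu' in pokemon_types
print(has_pikachu)

# .get() returns a fallback value instead of raising an error
pikachu_type = pokemon_types.get('pikachu', 'unknown')
squirtle_type = pokemon_types.get('squirtle', 'unknown')
print(pikachu_type, squirtle_type)
```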


You can add an element to a dictionary, or change an existing one, using the following syntax:
```
dictionary[key] = value
```

In [43]:
pokemon_types['bulbasaur'] = 'grass/poison' #bulbasaur is actually dual typed, so we'll change its entry
pokemon_types['ivysaur'] = 'grass/poison' #let's add an evolution
print(pokemon_types)

{'bulbasaur': 'grass/poison', 'charmander': 'fire', 'squirtle': 'water', 'ivysaur': 'grass/poison'}


You can get all of the keys of a dictionary using the `.keys()` method, and all of the values using the `.values()` method. Both return view objects, which can be looped over or converted to lists with `list()`.

In [56]:
print(pokemon_types.keys())
print(pokemon_types.values())

dict_keys(['bulbasaur', 'charmander', 'squirtle', 'ivysaur'])
dict_values(['grass/poison', 'fire', 'water', 'grass/poison'])


You can use the `.items()` method to get the `(key, value)` pairs as tuples. This is often useful for looping.

In [55]:
for k, v in pokemon_types.items():
  print('the type of {} is {}'.format(k,v))

the type of bulbasaur is grass/poison
the type of charmander is fire
the type of squirtle is water
the type of ivysaur is grass/poison


## Reading files

# Numpy
Numpy is a python package that provides tools to make math and numerical computation much easier. One of its key components is the numpy array, which can represent vectors, matrices, and higher-dimensional grids of numbers.

## numpy arrays
numpy arrays can represent matrices, which are essentially 2-dimensional lists. (They can also represent higher-dimensional arrays: tensors, etc.)
Unlike python lists, all elements of a numpy array must have the same data type (e.g. numeric, string).

Arrays can be created from python lists:

In [84]:
print('a vector can be created from a list {}'.format(np.array([1, 4, 2, 5, 3])))
print('a matrix can be created from a list of lists:\n {}'.format(np.array([[1,1,1],[2,2,2],[3,3,3]]))) #\n is the newline character and makes the following text appear on the next line

a vector can be created from a list [1 4 2 5 3]
a matrix can be created from a list of lists:
 [[1 1 1]
 [2 2 2]
 [3 3 3]]
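Every array records its dimensions and its single data type in the `shape` and `dtype` attributes. A minimal sketch (the variable names `matrix` and `mixed` are only for illustration):

```python
import numpy as np

matrix = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
print(matrix.shape)  # (number of rows, number of columns)
print(matrix.dtype)  # an integer type; the exact size depends on the platform

# mixing integers and floats converts everything to a common float type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# elements of a matrix are indexed with [row, column]
print(matrix[1, 2])
```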


There are also a bunch of built-in functions for generating arrays. 

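A few of numpy's array-generating functions, as a minimal sketch (the selection and the variable names are chosen here for illustration, not taken from the original lesson):

```python
import numpy as np

zeros = np.zeros([2, 3])              # 2x3 array filled with 0.0
ones = np.ones([2, 3])                # 2x3 array filled with 1.0
identity = np.eye(3)                  # 3x3 identity matrix
evens = np.arange(0, 10, 2)           # like range(): start, end (excluded), increment
evenly_spaced = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1, both included

print(zeros)
print(evens)
print(evenly_spaced)
```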