
ISRC Python Workshop: Basics II

__Functions, File I/O and External Libraries__

<hr>

@author: Zhiya Zuo

@email: zhiya-zuo@uiowa.edu

---

### Functions

#### Calling functions

Previously, we made use of many built-in functions to facilitate programming. A function is a block of code that takes input arguments and, optionally, returns values, written for a specific purpose. In Python (and many other languages), a function call looks like the following:

```python
output = function(input_argument)
```

For example:

In [0]:
range(5)

In Python 3, `range` returns a lazy `range` object rather than a list (see [`iterator`](https://stackoverflow.com/questions/25653996/what-is-the-difference-between-list-and-iterator-in-python) for the difference between a list and an iterator), so we convert the output to a `list` to see the values explicitly

In [0]:
list(range(5))
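
To make the lazy behaviour concrete, here is a minimal sketch using only built-ins. The variable names (`r`, `big`, `it`) are illustrative and not part of the original workshop.

```python
r = range(5)
print(r)                      # range(0, 5): only start/stop/step are stored
print(list(r))                # [0, 1, 2, 3, 4]

# A range supports len(), indexing and membership tests without building a list.
print(len(r), r[2], 3 in r)   # 5 2 True

# Even a very long range is created instantly, because no values are stored.
big = range(10**9)
print(len(big), big[-1])      # 1000000000 999999999

# A range can be looped over repeatedly; an iterator made from it is used up once.
it = iter(r)
print(list(it))               # [0, 1, 2, 3, 4]
print(list(it))               # []
```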

As another example:

In [0]:
abs(-3.5)

In many cases we need more sophisticated function calls that take more than one input argument. For example:

In [0]:
list(range(5, 0, -1))

A second example: sorting the keys of a dictionary by their values:

In [0]:
d = {'a': 100, 'c': 50, 'b': 70}
sorted(d)

In [0]:
d

In [0]:
d['a']

In [0]:
sorted(d, key=lambda k: d[k])

#### Lambda functions

Aha, we just saw something different: `lambda`!

Lambda functions are just functions, except that they are anonymous: they are defined without a name. See [here](https://stackoverflow.com/questions/890128/why-are-python-lambdas-useful) for many good discussions. In short, anything you can do with `lambda` can also be done with a regular function. Yet `lambda` is handy because it is lightweight and can be written inline.

The example above is actually a good example of when to use `lambda`:

In [0]:
sorted(d, key=lambda k: d[k])

There is one and only one expression within the `lambda` function. In this case, the input is `k`, a key of the dictionary `d`, and the output is `d[k]`, the value stored in `d` under the key `k`. Therefore we are sorting the dictionary keys by their values instead of by the keys themselves.
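
As a short sketch of the same idea, the block below reuses the dictionary `d` from above and adds two variations that are not in the original text: sorting the `(key, value)` pairs, and sorting in descending order with `reverse=True`.

```python
d = {'a': 100, 'c': 50, 'b': 70}

# Keys ordered by their values (the example from the text).
keys_by_value = sorted(d, key=lambda k: d[k])
print(keys_by_value)      # ['c', 'b', 'a']

# Sorting the (key, value) pairs keeps each value next to its key.
pairs_by_value = sorted(d.items(), key=lambda pair: pair[1])
print(pairs_by_value)     # [('c', 50), ('b', 70), ('a', 100)]

# reverse=True gives descending order.
keys_descending = sorted(d, key=lambda k: d[k], reverse=True)
print(keys_descending)    # ['a', 'b', 'c']
```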

#### Define our own functions

Note that we are not limited to built-in functions. Let's now try to make our own. Before that, we need to be clear on the structure of a function:
```python
def func_name(arg1, arg2, arg3, ...):
    #####################
    # Do something here #
    #####################
    return output
```

\* *`return output` is NOT required*
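
A minimal sketch of what happens when `return` is omitted: the call still works, and its result is `None`. The function names `greet` and `add` are hypothetical and used only for illustration.

```python
def greet(name):
    # No return statement: the function only prints.
    print("Hello, " + name)

def add(a, b):
    # With a return statement, the caller receives the value.
    return a + b

result = greet("world")   # prints: Hello, world
print(result)             # None
print(add(2, 3))          # 5
```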

In the following example, we make use of `sum`, a built-in function to sum up numeric iterables.

In [0]:
def mySum(list_to_sum):
    return sum(list_to_sum)

In [0]:
mySum(range(5))

A more complicated one that does not use the `sum` function.
- Don't remember the for loop? Check it out [here](https://github.com/zhiyzuo/python-tutorial/blob/master/1-Variables-Data_Structures-Control_Logic.ipynb)

In [0]:
def mySumUsingLoop(list_to_sum):
    sum_ = 0 # start from zero so that an empty input also works
    for item in list_to_sum:
        sum_ += item
    return sum_

In [0]:
mySumUsingLoop(range(5))

*The two example functions do not do anything interesting; they just illustrate how to build customized functions.*

Finally, let's see how we can sort a dictionary by values using functions instead of `lambda`

In [0]:
d

In [0]:
def my_key(key):
    return d[key]

In [0]:
sorted(d, key=my_key)

See, `lambda` is more concise than defining a function explicitly

---

### File I/O

This section covers the basics of reading and writing data in native Python

#### Write data to a file

In [0]:
f = open("tmp1.csv", "w") # f is a file handler, while "w" is the mode (w for write)
for item in range(6):
    f.write(str(item))
    # add newline character 
    f.write("\n") 
    # alternatively, we can do:
    # f.write(str(item)+"\n") because we can concat two strings by using `+`
f.close() # close the file handler so that buffered data is written to disk and the file is released

Check out the file we just created, `tmp1.csv`

In [0]:
cat tmp1.csv

Note that without the type conversion from `int` to `str`, `f.write` will raise a `TypeError`.
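
The sketch below reproduces that error on purpose. It writes to a scratch file in a temporary directory (the path and the name `demo.csv` are assumptions for illustration, so the `tmp1.csv` file above is left untouched).

```python
import os
import tempfile

# Hypothetical scratch file, so nothing in the working directory is changed.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

raised = False
f = open(path, "w")
try:
    f.write(3)              # a text-mode file only accepts str
except TypeError as err:
    raised = True
    print("TypeError:", err)
f.write(str(3) + "\n")      # converting to str first works
f.close()

f = open(path, "r")
written = f.read()
f.close()
print(repr(written))        # '3\n'
```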

A more commonly used way:

In [0]:
with open("tmp2.csv", "w") as f: # f is a file handler, while "w" is the mode (w for write)
    for item in range(4):
        f.write(str(item))
        f.write("\n") # add newline character

In [0]:
cat tmp2.csv

No need to close because of `with`.

See more here:
1. https://stackoverflow.com/questions/3012488/what-is-the-python-with-statement-designed-for
2. https://docs.python.org/3/whatsnew/2.6.html#pep-343-the-with-statement
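
A small sketch of what `with` does for us: the file is closed as soon as the block ends, even if an error occurs inside it. The scratch path and the simulated `ValueError` are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

with open(path, "w") as f:
    f.write("0\n")
    print(f.closed)     # False: still open inside the block

print(f.closed)         # True: closed automatically on leaving the block

# The file is also closed when the block is left because of an exception.
try:
    with open(path, "a") as g:
        raise ValueError("simulated failure")
except ValueError:
    pass
print(g.closed)         # True
```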

Occasionally, we need to _append new elements_ instead of _overwriting_ existing files. In this case, we should use `a` mode in our `open` function:

In [0]:
with open("tmp2.csv", "a") as f:
    for item in range(15, 19):
        f.write(str(item)+"\n")

In [0]:
cat tmp2.csv
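
To contrast the two modes side by side, here is a minimal sketch. It uses a scratch file in a temporary directory and a hypothetical helper `read_all`, neither of which appears in the original notebook.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

def read_all(p):
    # Illustrative helper: return the whole file as one string.
    with open(p, "r") as f:
        return f.read()

with open(path, "w") as f:      # "w" creates the file (or empties an existing one)
    f.write("first\n")
with open(path, "a") as f:      # "a" keeps the content and adds to the end
    f.write("second\n")
after_append = read_all(path)
print(repr(after_append))       # 'first\nsecond\n'

with open(path, "w") as f:      # opening with "w" again discards the old content
    f.write("third\n")
after_write = read_all(path)
print(repr(after_write))        # 'third\n'
```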

#### Read data from a file

To read a text file into Python, we use `r` mode (for _read_)

In [0]:
f = open("tmp1.csv", "r") # this time, use read mode
contents = [item for item in f] # list comprehension. This is the same as for-loop but more concise
print(contents)

Usually, we do not like trailing newlines. We can use `strip` to remove them.

In [0]:
contents = [item.strip("\n") for item in contents] # strip the newline
print(contents)

`map` is a function that applies another function to every item, much like a _list comprehension_. See [here](https://stackoverflow.com/questions/10973766/understanding-the-map-function) for more discussions.

In [0]:
int_values = list(map(int, contents)) # map the values into integer type
print(int_values)
f.close() # always remember to close the file handler
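
As a sketch of how `map` and a list comprehension relate, the block below starts from a hard-coded list that stands in for the stripped file contents (an assumption, so that the example runs without any file).

```python
contents = ["0", "1", "2", "3"]   # stand-in for the stripped lines read above

via_map = list(map(int, contents))
via_comprehension = [int(item) for item in contents]
print(via_map == via_comprehension)   # True

# In Python 3, map is lazy: it produces values on demand and can be consumed only once.
lazy = map(int, contents)
first = next(lazy)
rest = list(lazy)
leftover = list(lazy)
print(first, rest, leftover)          # 0 [1, 2, 3] []

# int() ignores surrounding whitespace, so a trailing newline does not break the conversion.
print(int("4\n"))                     # 4
```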

The same, using `with`:

In [0]:
with open("tmp1.csv", "r") as f:
    contents = [item for item in f] # list comprehension. This is the same as for-loop but more concise
    contents = [item.strip("\n") for item in contents] # strip the newline
    print('Before converting to `int`')
    print(contents)
    int_values = list(map(int, contents)) # map the values into integer type
    print('After...')
    print(int_values)

---

### Libraries

Oftentimes, we need either internal or external help for complicated computation tasks. On these occasions, we need to _import libraries_.

#### Built-in libraries

Python provides many built-in packages that save us from re-implementing common and useful functions

We will use __math__ as an example.

In [0]:
import math # use import to load a library

To use functions from a library, write `library_name.function_name`. For example, to calculate a logarithm using a function from the `math` library, we can call `math.log`

In [0]:
x = 3
print("e^x = e^3 = %f"%math.exp(x))
print("log(x) = log(3) = %f"%math.log(x))

You can also import one specific function:

In [0]:
from math import exp # You can import a specific function
print(exp(x)) # This way, you don't need to use math.exp but just exp

Or all:

In [0]:
from math import * # Import all functions

In [0]:
print(exp(x))
print(log(x)) # Before importing math, calling `exp` or `log` will raise errors

Depending on what you want to achieve, you may want to choose between importing a few or all (by `*`) functions within a package.
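
One thing to weigh when making that choice: importing everything with `*` also brings in names that may replace built-ins of the same name. The sketch below illustrates this with `pow` without actually running a star import; the variable `public_names` is introduced only for illustration.

```python
import math

# The built-in pow and math.pow behave differently.
print(pow(2, 3))          # 8    (integers stay integers)
print(math.pow(2, 3))     # 8.0  (always a float)

# `from math import *` would import every public name of the module,
# including `pow`, which would then hide the built-in version.
public_names = [name for name in dir(math) if not name.startswith("_")]
print("pow" in public_names)    # True
print(len(public_names))        # how many names `*` would add (varies by Python version)
```

Importing the module itself, or only the specific functions needed, keeps it obvious where each name comes from.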

#### External libraries

There are times you'll want advanced utility functions that Python itself does not provide. Many useful packages written by other developers fill this gap.

We'll use __numpy__ as an example. (__numpy__, __scipy__, __matplotlib__, and probably __pandas__ will be the most important to you for data analyses.)

Installing packages for Python is easiest with <a href="https://packaging.python.org/installing/" target="_blank">pip</a>:

```bash
~$ pip install numpy scipy pandas
```

If you use Anaconda, I believe all of these are already installed.

Loading external libraries works just the same as loading built-in ones. To use an _alias_ for easier access to a library, we can import it with: `import library_long_name as short_name`. For example:

In [0]:
# After you install numpy, load it
import numpy as np # you can use np instead of numpy to call the functions in numpy package

In [0]:
x = np.array([[1,2,3], [4,5,7]], dtype=float) # create a numpy array object, specify the data type as float (the old `np.float` alias was removed from recent NumPy versions)
print(x)
print(type(x))

We can check the `shape` attribute of the `numpy.ndarray` class to see the dimensions

In [0]:
x.shape

Unlike a `list`, an array uses one single data type for all of its elements, so mixed inputs are converted to a common type

In [0]:
y = np.array([1, 'yes'])
y

In [0]:
y[0], type(y[0])

In [0]:
y_list = [1, 'yes']
y_list[0], type(y_list[0])
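
A minimal sketch of this conversion, assuming `numpy` is installed. The variable names are illustrative, and the exact integer and string widths depend on the platform, so only the kind of data type is checked here.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5])    # int and float together
mixed_text = np.array([1, 'yes'])     # int and str together

print(ints.dtype.kind)            # i: integers
print(mixed_numbers.dtype)        # float64: the int was converted to float
print(mixed_text.dtype.kind)      # U: everything became a (unicode) string
print(mixed_text[0] == '1')       # True: the number 1 is now the text '1'

# A list, in contrast, keeps every element as it was.
y_list = [1, 'yes']
print(type(y_list[0]).__name__)   # int
```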

__Scipy/Numpy__ provide extensive utilities for data manipulation and simple analysis

In [0]:
from scipy.stats import pearsonr, spearmanr # correlation functions

In [0]:
print(pearsonr(x[1, :], x[0, :]))
print(spearmanr(x[1, :], x[0, :]))

__Pandas__ (Python Data Analysis Library) is a great package for data structures: `DataFrame`

If you're familiar with `R`, you will probably like the `pandas.DataFrame` data structure.

In [0]:
import pandas as pd

In [0]:
x

In [0]:
x_df = pd.DataFrame(x)
x_df

Easy import/export

In [0]:
x_df.to_csv('tmp_pd.csv', index=False) # `index=False`: do not write row indices to file

In [0]:
df = pd.read_csv('tmp_pd.csv')

In [0]:
df

---

### Quick Intro to Numpy

Instead of using the native data structures, we use `numpy.ndarray` for data analytics most of the time. While they are not as "flexible" as lists, they are easy to use and have better performance. As Numpy's official documentation states:
> NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of positive integers.

As we were using it just now, the most common alias for `numpy` is `np`:

In [0]:
import numpy as np

#### Create arrays

Depending on the type of analysis we are going to do later, we can choose the most appropriate way to initialize an array.

##### By hand

This is very similar to creating a list of elements manually, except that we wrap the list in `np.array()`.

In [0]:
arr = np.array([1,2,3,8])
arr

In [0]:
arr.shape

Multidimensional arrays: rows are separated by commas

1 by 4: 1 row and 4 columns

In [0]:
arr = np.array([[1,2,3,8]])
arr.shape

In [0]:
arr

3 by 4: 3 rows and 4 columns

In [0]:
arr = np.array([[1,2,3,8], [3,2,3,2], [4,5,0,8]])
arr.shape

In [0]:
arr

##### By functions

There are many special array initialization methods to call:

In [0]:
np.zeros([3,5], dtype=int)

In [0]:
np.ones([3,5])

In [0]:
np.eye(3)

#### Arithmetic operations

The rules are very similar to R: operations are generally element-wise

In [0]:
arr

In [0]:
arr * 6

In [0]:
arr - 5

In [0]:
np.exp(arr)

Note that if we want to perform matrix multiplication, we need to use the `@` operator or the `.dot` method, since `*` still means element-wise computation

In [0]:
arr_2 = np.array([[1], [3], [2], [0]])
arr_2

In [0]:
arr @ arr_2

In [0]:
arr.dot(arr_2)
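
The sketch below spells out the shapes involved, assuming `numpy` is installed. It recreates `arr` and `arr_2` with the same values as above so that it runs on its own; `product` is a name introduced for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])   # shape (3, 4)
arr_2 = np.array([[1], [3], [2], [0]])                        # shape (4, 1)

# (3, 4) @ (4, 1) -> (3, 1): the inner dimensions (4 and 4) must match.
product = arr @ arr_2
print(product.shape)      # (3, 1)
print(product.ravel())    # [13 15 19]

# .dot gives the same result as @ here.
print(np.array_equal(product, arr.dot(arr_2)))   # True

# Element-wise * does not work for these shapes: (3, 4) and (4, 1) cannot be broadcast together.
try:
    arr * arr_2
    elementwise_failed = False
except ValueError as err:
    elementwise_failed = True
    print("ValueError:", err)
```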

##### Operation based on itself

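There are many array methods that calculate statistics of the array itself, optionally along an axis:
- `axis=1` means row-wise: the operation runs along each row (for reductions such as `max`, one result per row)
- `axis=0` means column-wise: the operation runs along each column (one result per column)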
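
A short sketch of the axis convention, assuming `numpy` is installed and using the same values as `arr` above (recreated here so the block is self-contained).

```python
import numpy as np

arr = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])   # 3 rows, 4 columns

print(arr.max())          # 8: no axis means the whole array
print(arr.max(axis=1))    # [8 3 8]: one maximum per row
print(arr.max(axis=0))    # [4 5 3 8]: one maximum per column

print(arr.sum(axis=1))    # [14 10 17]: row totals
print(arr.sum(axis=0))    # [ 8  9  6 18]: column totals

# cumsum keeps the shape; with axis=1 the running total restarts on every row.
print(arr.cumsum(axis=1).tolist())   # [[1, 3, 6, 14], [3, 5, 8, 10], [4, 9, 9, 17]]
```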

In [0]:
arr

In [0]:
arr.max()

In [0]:
arr.max(axis=1)

In [0]:
arr.max(axis=0)

In [0]:
arr.cumsum()

In [0]:
arr.cumsum(axis=1)

#### Indexing and slicing

The most important part is how to index and slice a `np.array`. It is very similar to indexing a `list`, except that we may now need more indices, because most real-life datasets have more than one dimension

##### 1 dimensional case

In [0]:
a1 = np.array([1,2,8,100])
a1

In [0]:
a1[0]

In [0]:
a1[-2]

In [0]:
a1[[0,1,3]]

We can also use boolean values to index
- `True` means we want this element

In [0]:
a1 > 3

In [0]:
a1[a1 > 3]
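
Boolean indexing can be taken a little further. The sketch below assumes `numpy` is installed and shows combining two conditions and assigning through a mask; the names `mask`, `between` and `clipped` are introduced for illustration.

```python
import numpy as np

a1 = np.array([1, 2, 8, 100])

mask = a1 > 3
print(mask.tolist())      # [False, False, True, True]
print(a1[mask])           # [  8 100]
print(mask.sum())         # 2: True counts as 1, so this is the number of matches

# Combine conditions with & (and) or | (or); the parentheses are required.
between = a1[(a1 > 1) & (a1 < 100)]
print(between)            # [2 8]

# A mask can also select the elements to overwrite (done on a copy here).
clipped = a1.copy()
clipped[clipped > 3] = 0
print(clipped)            # [1 2 0 0]
```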

##### 2 dimensional case

In [0]:
arr

Using only one number to index gives a subset of the original multidimensional array, which is also an array

In [0]:
arr[0]

In [0]:
type(arr[0])

Since we have 2 dimensions now, there are 2 indices we can use for indexing the 2 dimensions respectively

In [0]:
arr[0,0]

We can use `:` to indicate everything along that axis

In [0]:
arr[1]

In [0]:
arr[1, :]

In [0]:
arr[1,:] == arr[1]

In [0]:
arr[:, 1]

##### 3 dimensional case

As a final example, we look at a 3d array:

In [0]:
arr_3 = np.random.randint(low=0, high=100, size=24)
arr_3

We can use [`reshape`](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.reshape.html) to manipulate the shape of an array

In [0]:
arr_3 = arr_3.reshape(3,4,2)
arr_3

In [0]:
arr_3[0]

In [0]:
arr_3[:, 3, 1]

In [0]:
arr_3[2, 3, 1]
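
Because `arr_3` is random, its values differ on every run. As a reproducible sketch (assuming `numpy` is installed), the block below uses `np.arange` instead, so that each element equals its position in the flat array; the name `cube` is introduced for illustration.

```python
import numpy as np

cube = np.arange(24).reshape(3, 4, 2)   # 3 blocks, each with 4 rows and 2 columns

print(cube.shape)           # (3, 4, 2)
print(cube[0].shape)        # (4, 2): the first index picks one block
print(cube[0].tolist())     # [[0, 1], [2, 3], [4, 5], [6, 7]]

# Block i, row 3, column 1 sits at flat position i*8 + 3*2 + 1.
print(cube[:, 3, 1])        # [ 7 15 23]
print(cube[2, 3, 1])        # 23
```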