### **MEGC Workshop for Python**

<hr>

Thanks to Zhiya Zuo (https://github.com/zhiyzuo) for providing the original teaching material. 

---

### Variables

When assigning a value, we put the variable name on the left-hand side (LHS) and the value on the right-hand side (RHS). The two sides are joined by an equal sign (`=`), which means assignment, not equality.

In [0]:
x = 3 # integer
y = 3. # floating point number
z = "Hello!" # strings
Z = "Wonderful!" # another string; names are case-sensitive, so `Z` is a different variable from `z`
print(x)
print(y)
print(z)
print(Z)

You can do operations on numeric values as well as strings.

In [0]:
sum_ = x + y # int + float = float
print(sum_)

In [0]:
v = "World!"
sum_string = z + " " + v # concatenate strings
print(sum_string)

Print with formatting using `%`:

In [0]:
print("The sum of x and y is %.2f"%sum_) # %f for floating point number

In [0]:
print("The string `sum_string` is '%s'"%sum_string) # %s for string
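The same output can also be produced with `str.format` or with an f-string (available from Python 3.6). The sketch below is only an illustration; it redefines `sum_` and `sum_string` with the values used above so that it runs on its own.

```python
# Values redefined here so the snippet is self-contained
sum_ = 3 + 3.0
sum_string = "Hello!" + " " + "World!"

old_style = "The sum of x and y is %.2f" % sum_               # % formatting
format_style = "The sum of x and y is {:.2f}".format(sum_)    # str.format
f_string = f"The sum of x and y is {sum_:.2f}"                # f-string

print(old_style)
print(format_style)
print(f_string)
print(f"The string `sum_string` is '{sum_string}'")
```

All three styles accept the same format specification (`.2f` here), so switching between them is mostly a matter of taste and of the Python version you need to support.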

#### Naming convention

There are two commonly used naming styles in programming:

1. __camelCase__
2. __snake_case__ or __lower_case_with_underscore__

All variable, function, and class names must start with a letter or an underscore (\_). Digits are allowed anywhere except the first position.

In [0]:
myStringHere = 'my string'
myStringHere

In [0]:
x = 3 # valid
x_3 = "xyz" # valid

In [0]:
3_x = "456" # invalid: raises a SyntaxError because a name cannot start with a digit

You can choose either camel case or snake case. Just make sure you use one convention consistently within a project.

See more here:

[1] https://www.python.org/dev/peps/pep-0008/#descriptive-naming-styles

[2] https://en.wikipedia.org/wiki/Naming_convention_(programming)

#### Some notes on Strings

To initialize a string variable, you can use either double or single quotes.

In [0]:
store_name = "HyVee"

You can think of strings as a sequence of characters (or a __list__ of characters, see the next section). In this case, indices and bracket notations can be used to access specific ranges of characters.

In [0]:
name_13 = store_name[1:4] # [start, end), end is exclusive; Python starts with 0 NOT 1
print(name_13)

In [0]:
last_letter = store_name[-1] # -1 means the last element
print(last_letter)
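Slices can also omit the start or the end, and they accept an optional third number, the step. A small sketch using the same `store_name` value:

```python
store_name = "HyVee"

first_two = store_name[:2]        # omit start: begin at index 0
from_third = store_name[2:]       # omit end: run through the last character
every_other = store_name[::2]     # the third number is the step
reversed_name = store_name[::-1]  # a step of -1 walks backwards

print(first_two, from_third, every_other, reversed_name)
```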

### Simple Data Structures

#### Numbers

Numbers without a fractional part are ___integers___. In Python, their type is `int`.

In [0]:
x = 3
type(x)

Numbers with a fractional part are floating point numbers. In Python, their type is `float`.

In [0]:
y = 3.0
type(y)

We can apply arithmetic to these numbers. However, one thing we need to be careful about is ___type conversion___. See the example below.

In [0]:
z = 2 * x
type(z)

In [0]:
z = y + x
type(z)
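Division is the case where type conversion most often surprises people. The sketch below reuses `x = 3` and `y = 3.0` and compares `/` with `//` in Python 3.

```python
x = 3
y = 3.0

print(type(x + y))   # int + float is promoted to float
print(type(x / 2))   # `/` returns a float in Python 3, even for two ints
print(type(x // 2))  # `//` is floor division and keeps ints as int
print(x / 2, x // 2)
```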

#### Text/Characters/Strings

In Python, we use the `str` type to store letters, words, and any other characters, as mentioned previously in the notes on strings.

In [0]:
my_word = "see you"
type(my_word)

Unlike numbers, a `str` is a sequence, so we can index and slice its individual characters (and iterate over them):

In [0]:
my_word[0], my_word[2:6]

We can also use `+` to _concatenate_ different strings 

In [0]:
my_word + ' tomorrow'

#### Boolean

Boolean type comes in handy when we need to check conditions. For example:

In [0]:
my_error = 1.6
compare_result = my_error < 0.1
compare_result, type(compare_result)

There are two and only two valid Boolean values: `True` and `False`. We can also think of them as `1` and `0`, respectively.

In [0]:
my_error > 0

When we use Boolean values in arithmetic operations, they are automatically treated as `1` and `0`:

In [0]:
(my_error>0) + 2
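One practical use of this behaviour is counting how many values satisfy a condition. The error values below are made up for illustration.

```python
errors = [1.6, 0.05, 0.3, 0.01]  # made-up values for illustration

flags = [e < 0.1 for e in errors]  # a list of True/False values
n_small = sum(flags)               # True counts as 1, False as 0

print(flags)
print(n_small)
```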

#### Type Conversion

Since variables in Python are dynamically typed, we need to be careful about type conversion.

When two variables share the same data type, there is not much to be worried about:

In [0]:
s1 = "no problem. "
s2 = "talk to you later"
s1 + s2

But be careful when we are mixing variables up:

In [0]:
a = 3 # recall that this is an ____?
b = 2.7 # how about this?
c = a + b # what is the type of `c`?

Adding a string and a number directly raises a `TypeError`. To make it work, we explicitly convert the number to `str`:

In [0]:
s1 + 3 # raises a TypeError: a str and an int cannot be added

In [0]:
s1 + str(3)
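The conversion also works in the other direction: `int()` and `float()` turn text into numbers. A short sketch, with illustrative variable names:

```python
text_int = "3"
text_float = "2.7"

a = int(text_int)       # "3" becomes the int 3
b = float(text_float)   # "2.7" becomes the float 2.7
print(a + b)

# int() refuses text that does not represent a whole number
try:
    int(text_float)
    conversion_failed = False
except ValueError as err:
    conversion_failed = True
    print("Conversion failed:", err)
```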

---

### Data Structures To Store Those in Section 3

In this section, we discuss some ___nonprimitive___ data structures in Python.

We can think of ___nonprimitive___ types as containers that store ___primitive___ data.

#### List

Initialize a list with square brackets. A list can hold anything, even elements of different types.
- note that we use [___string formatting___](https://pyformat.info/) to display strings
- `%i` is a placeholder for `int`
- `%s` for `str`

In [0]:
a_list = [1, 2, 3] # commas separate the elements
print("Length of a_list is: %i"%(len(a_list)))
print("The 3rd element of a_list is: %s" %(a_list[2])) # Remember Python starts with 0
print("The last element of a_list is: %s" %(a_list[-1])) # -1 means the end
print("The sum of a_list is %.2f"%(sum(a_list)))

We can put different types in a list

In [0]:
b_list = [20, True, "good", "good"] 
b_list

Update a list: __pop__, __remove__, __append__, __extend__

In [0]:
print(a_list)
print("Pop %i out of a_list"%a_list.pop(1)) # pop removes and returns the value at the given index
print(a_list)

In [0]:
print("Remove the string good from b_list:")
b_list.remove("good") # remove a specific value (the first one in the list)
print(b_list)

In [0]:
a_list.append(10)
print("After appending a new value, a_list is now: %s"%(str(a_list)))

Merge `a_list` and `b_list`:

In [0]:
a_list.extend(b_list)
print("Merging a_list and b_list: %s"%(str(a_list)))

We can also use `+` to concatenate two lists

In [0]:
a_list + b_list 
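The two approaches differ in one important way: `extend` changes the list in place, while `+` builds a new list and leaves both operands untouched. A sketch with small stand-in lists:

```python
a_list = [1, 3]
b_list = [20, True]

combined = a_list + b_list   # builds a new list
print(a_list)                # a_list is unchanged at this point

result = a_list.extend(b_list)  # modifies a_list in place
print(a_list)
print(result)                   # extend returns None
```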

#### Tuple (A special case of list whose elements cannot be changed)

Initialize a tuple with parentheses. The major difference between a list and a tuple is that you can alter a list but not a tuple.

In [0]:
a_tuple = (1, 2, 3, 10)
print(a_tuple)
print("First element of a_tuple: %i"%a_tuple[0])

You cannot change the values of `a_tuple`; the assignment below raises a `TypeError`:

In [0]:
a_tuple[0] = 5

To create a single-value tuple, you need a trailing comma (`,`):

In [0]:
a_tuple = (1) # parentheses alone do not make a tuple: this is an int
print(type(a_tuple))
b_tuple = (1,) # this would create a tuple type, take note of the comma.
print(type(b_tuple))

#### Dictionary: key-value pairs

Initialize a dict with curly brackets `{}`

In [0]:
d = {} # empty dictionary
d[1] = "1 value" # add a key-value pair using brackets. Keys must be hashable (e.g. int, str, tuple); values can be anything.
print(d)

In [0]:
# Use for loop to add values
for index in range(2, 10):
    d[index] = "%i value"%index
print(d)
print("All the keys: " + str(d.keys()))
print("All the values: " + str(d.values()))

In [0]:
for key in d:
    print("Key is: %i, Value is : %s"%(key, d[key]))
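Two dictionary methods are worth knowing at this point: `items()` yields key-value pairs directly, and `get()` returns a default instead of raising a `KeyError` when a key is missing. A sketch with a two-entry stand-in for `d`:

```python
d = {1: "1 value", 2: "2 value"}

for key, value in d.items():        # iterate over (key, value) pairs
    print("Key is: %i, Value is: %s" % (key, value))

missing = d.get(99, "no such key")  # default value instead of a KeyError
print(missing)
print(99 in d)                      # `in` checks the keys
```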

---

### Control Logic

The following examples cover comparisons, `if-else` statements, `for` loops, and `while` loops.

#### Comparison

Python syntax for comparison is the same as our hand-written convention: 

1. Larger (or equal): `>` (`>=`)
2. Smaller (or equal): `<` (`<=`)
3. Equal to: `==` (__Notice that there are two equal signs__)
4. Not equal to: `!=`

In [0]:
3 == 5 

In [0]:
72 >= 2

In [0]:
store_name

In [0]:
store_name == "HyVee" # Will return a boolean value True or False

IMPORTANT: It is worth noting that comparisons between floating point numbers are tricky.

In [0]:
print(2.2 * 3.0)
2.2 * 3.0 == 6.6

In [0]:
3.3 * 2.0 == 6.6

In [0]:
import math
math.isclose(2.2*3.0,6.6,rel_tol=0.001)

Therefore, be careful whenever you compare floating point numbers for equality; prefer a tolerance-based check such as `math.isclose`.
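To see why the exact comparison fails, it helps to print the stored value with `repr` and to compare it against a tolerance. This is a minimal sketch; the absolute tolerance `1e-9` is an arbitrary choice for illustration.

```python
import math

product = 2.2 * 3.0
print(repr(product))               # shows the digits actually stored
print(product == 6.6)              # exact comparison
print(math.isclose(product, 6.6))  # relative tolerance, default rel_tol=1e-09
print(abs(product - 6.6) < 1e-9)   # manual absolute tolerance
```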

#### If-Else

In [0]:
sum_ 

In [0]:
if sum_ == 0:
    print("sum_ is 0")
elif sum_ < 0:
    print("sum_ is less than 0")
else:
    print("sum_ is above 0 and its value is " + str(sum_)) # Cast sum_ into string type.

Note that you do not have to use `if-else` or `if-elif-...-else`. You can use `if` without other clauses following that.

In [0]:
if sum_ > 5:
    print('sum_ is above 5')

Comparing strings is similar:

In [0]:
store_name = 'Walmart'
#store_name = 'Hyvee'

In [0]:
if 'Wal' in store_name: # `in` checks whether 'Wal' is a substring
    print("The store is Walmart.")
else:
    print("The store is not Walmart. It's " + store_name + ".")

#### For loop: Iterating thru a sequence

In [0]:
for letter in store_name:
    print(letter)

`range()` is a function to create integer sequences:
- Note that in Python 3, `range` returns a lazy `range` object instead of an actual `list`
- See [link](https://stackoverflow.com/questions/44571718/python-3-range-vs-python-2-range)

In [0]:
print("range(5) gives: " + str(list(range(5)))) # By default starts from 0
print("range(1,9) gives: " + str(list(range(1, 9)))) # From 1 to 9-1 (Again the end index is exclusive.)

In [0]:
for index in range(len(store_name)): # length of a sequence
    print("The %ith letter in store_name is: %s"%(index, store_name[index]))
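When both the index and the element are needed, the built-in `enumerate` is a common alternative to `range(len(...))`. A sketch using the same `store_name` value; the `pairs` list is only there to record what the loop saw.

```python
store_name = "Walmart"

pairs = []
for index, letter in enumerate(store_name):  # yields (index, element) pairs
    print("The letter at index %i in store_name is: %s" % (index, letter))
    pairs.append((index, letter))
```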

#### While loop: Keep doing until condition no longer holds.

Use `for` when you know __the exact number of iterations__; use `while` when you __do not (e.g., checking convergence)__.

In [0]:
x = 2

In [0]:
while x < 10:
    print(x)
    x = x + (x-1)
    #x += x-1

#### Notes on `break` and `continue`

`break` means get out of the loop immediately. Any code after the `break` will NOT be executed.

In [0]:
store_name = 'Walmart'

In [0]:
index = 0
while True:
    print(store_name[index])
    index += 1 # a += b means a = a + b
    if store_name[index] == "a":
        print("End at a")
        break # instead of setting flag to False, we can directly break out of the loop
        print("Hello!") # This will NOT be run

`continue` means get to the next iteration of loop. It will __break__ the current iteration and __continue__ to the next.

In [0]:
for letter in store_name:
    if letter == "a":
        continue # skip printing `a`
    else:
        print(letter)

In [0]:
index = 0
while index <= len(store_name)-1:
    print(store_name[index])
    if store_name[index] == "a":
        print("This is an `a`")
        index += 1 # a += b means a = a + b
        continue
    print("Hello!") # This will NOT be run when the letter is `a`
    index += 1 # a += b means a = a + b

### Functions

#### Calling functions

We have already used many built-in functions to make programming easier. A function is a block of code that takes input arguments (and, optionally, returns values) and serves a specific purpose. In Python (and many other languages), a function call looks like this:

```python
output = function(input_argument)
```

For example:

In [0]:
range(5)

Because `range` in Python 3 returns a lazy `range` object rather than a `list` (see this discussion of [lists vs. iterators](https://stackoverflow.com/questions/25653996/what-is-the-difference-between-list-and-iterator-in-python)), we can manually convert the output into a `list` to see its contents explicitly:

In [0]:
list(range(5))

As another example:

In [0]:
abs(-3.5)

In many cases we need more sophisticated function calls that take more than one input argument. For example:

In [0]:
list(range(5, 0, -1))

A second example: sorting a dictionary by its values. Note that a plain `sorted(d)` sorts the keys themselves:

In [0]:
d = {'a': 100, 'c': 50, 'b': 70}
sorted(d)

In [0]:
d

In [0]:
d['a']

In [0]:
sorted(d, key=lambda k: d[k])

#### Lambda functions

Aha, we just saw something different: `lambda`!

Lambda functions are just functions, except that they are anonymous (literally). See [here](https://stackoverflow.com/questions/890128/why-are-python-lambdas-useful) for many good discussions. In short, anything you can do with `lambda` can also be done with a regular function. Yet `lambda` is handy because it is lightweight and anonymous.

The example above is actually a good example of when to use `lambda`:

In [0]:
sorted(d, key=lambda k: d[k])

There is one and only one expression within the `lambda` function. In this case, the input is `k`, a key inside the dictionary `d` and the output is `d[k]`, the value in `d` w.r.t. the key `k`. Therefore we are sorting our dictionary keys by their values instead of the keys themselves.
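The same `key` idea extends to a few common variations: sorting in descending order, or sorting the key-value pairs rather than only the keys. A sketch using the same dictionary:

```python
d = {'a': 100, 'c': 50, 'b': 70}

keys_by_value = sorted(d, key=lambda k: d[k])                # keys, smallest value first
keys_desc = sorted(d, key=lambda k: d[k], reverse=True)      # keys, largest value first
pairs_by_value = sorted(d.items(), key=lambda kv: kv[1])     # (key, value) pairs by value

print(keys_by_value)
print(keys_desc)
print(pairs_by_value)
```

In the last call the lambda receives a `(key, value)` tuple, so `kv[1]` picks out the value.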

#### Define our own functions

Note that we are not limited to built-in functions. Let's now try to make our own. Before that, we need to be clear on the structure of a function:
```python
def func_name(arg1, arg2, arg3, ...):
    #####################
    # Do something here #
    #####################
    return output
```

\* *`return output` is NOT required; a function without a `return` statement returns `None`*

In the following example, we make use of `sum`, a built-in function to sum up numeric iterables.

In [0]:
def mySum(list_to_sum):
    return sum(list_to_sum)

In [0]:
mySum(range(5))

A more complicated one that does not use `sum` function.
- Don't remember the `for` loop? Check out [here](https://github.com/zhiyzuo/python-tutorial/blob/master/1-Variables-Data_Structures-Control_Logic.ipynb)

In [0]:
def mySumUsingLoop(list_to_sum):
    sum_ = list_to_sum[0]
    for item in list_to_sum[1:]:
        sum_ += item
    return sum_

In [0]:
mySumUsingLoop(range(5))

*The two example functions do nothing interesting; they only illustrate how to build custom functions.*
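The loop-based version above indexes its input, so it fails on an empty list and on iterables that do not support indexing. One possible variation, sketched here with a hypothetical function name and a hypothetical `start` parameter, begins from a starting value and simply iterates:

```python
def my_sum_any_iterable(values, start=0):
    """Add up `values`, beginning from `start` (hypothetical helper)."""
    total = start
    for item in values:   # works for lists, ranges, and empty inputs
        total += item
    return total

print(my_sum_any_iterable(range(5)))
print(my_sum_any_iterable([]))
print(my_sum_any_iterable([1.5, 2.5], start=10))
```

The default argument `start=0` means callers can leave it out, which is how optional parameters are usually expressed in Python.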

Finally, let's see how we can sort a dictionary by values using functions instead of `lambda`

In [0]:
d

In [0]:
def my_key(key):
    return d[key]

In [0]:
sorted(d, key=my_key)

See, `lambda` is way simpler than defining a function explicitly

---

### File I/O

This section is about some basics on reading and writing data, in Python native style

#### Write data to a file

In [0]:
f = open("tmp1.csv", "w") # f is a file handle, while "w" is the mode (w for write)
for item in range(6):
    f.write(str(item))
    # add newline character 
    f.write("\n") 
    # alternatively, we can do:
    # f.write(str(item)+"\n") because we can concat two strings by using `+`
f.close() # close the file handle so buffered data is written and the resource is released

Check out the file we just created, `tmp1.csv`:

In [0]:
cat tmp1.csv

Note that without the typecasting from `int` to `str`, an error will be raised.

A more commonly used way:

In [0]:
with open("tmp2.csv", "w") as f: # f is a file handle, while "w" is the mode (w for write)
    for item in range(4):
        f.write(str(item))
        f.write("\n") # add newline character

In [0]:
cat tmp2.csv

No need to close because of `with`.

See more here:
1. https://stackoverflow.com/questions/3012488/what-is-the-python-with-statement-designed-for
2. https://docs.python.org/3/whatsnew/2.6.html#pep-343-the-with-statement
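The effect of `with` can be checked through the file object's `closed` attribute. The sketch below writes to a throwaway path in a temporary directory (the file name is an arbitrary choice) so it does not touch the files created above.

```python
import os
import tempfile

# Throwaway location; the file name is arbitrary
path = os.path.join(tempfile.mkdtemp(), "tmp_demo.csv")

with open(path, "w") as f:
    f.write("0\n")
    closed_inside = f.closed   # the file is still open inside the block

closed_after = f.closed        # closed automatically once the block ends
print(closed_inside, closed_after)
```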

Occasionally, we need to _append new elements_ instead of _overwriting_ existing files. In this case, we should use `a` mode in our `open` function:

In [0]:
with open("tmp2.csv", "a") as f:
    for item in range(15, 19):
        f.write(str(item)+"\n")

In [0]:
cat tmp2.csv

#### Read data from a file

To read a text file into Python, we use `r` mode (for _read_)

In [0]:
f = open("tmp1.csv", "r") # this time, use read mode
contents = [item for item in f] # list comprehension. This is the same as for-loop but more concise
print(contents)

Usually, we do not like trailing newlines. We can use `strip` to remove them.

In [0]:
contents = [item.strip("\n") for item in contents] # strip the newline
print(contents)

`map` does something similar to a _list comprehension_: it applies a function to every element. See [here](https://stackoverflow.com/questions/10973766/understanding-the-map-function) for more discussions.

In [0]:
int_values = list(map(int, contents)) # map the values into integer type
print(int_values)
f.close() # always remember to close the file handle

The same, using `with`:

In [0]:
with open("tmp1.csv", "r") as f:
    contents = [item for item in f] # list comprehension. This is the same as for-loop but more concise
    contents = [item.strip("\n") for item in contents] # strip the newline
    print('Before converting to `int`')
    print(contents)
    int_values = list(map(int, contents)) # map the values into integer type
    print('After...')
    print(int_values)
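The read, strip, and convert steps can be folded into a single list comprehension, because `int()` ignores surrounding whitespace such as the trailing newline. A self-contained sketch that writes its own small file first (the path is a throwaway assumption):

```python
import os
import tempfile

# Write a small file to a throwaway location so the snippet runs on its own
path = os.path.join(tempfile.mkdtemp(), "numbers.csv")
with open(path, "w") as f:
    for item in range(6):
        f.write(str(item) + "\n")

# Read it back: int() tolerates the trailing "\n"; blank lines are skipped
with open(path, "r") as f:
    int_values = [int(line) for line in f if line.strip()]

print(int_values)
```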

---

### Libraries

Oftentimes, we need either internal or external help for complicated computation tasks. On these occasions, we need to _import libraries_.

#### Built-in libraries

Python ships with many built-in packages that save us from reimplementing common, useful functions.

We will use __math__ as an example.

In [0]:
import math # use import to load a library

To use functions from the library, do: `library_name.function_name`. For example, when we want to calculate the logarithm using a function from `math` library, we can do `math.log`

In [0]:
x = 3
print("e^x = e^3 = %f"%math.exp(x))
print("log(x) = log(3) = %f"%math.log(x))

You can also import one specific function:

In [0]:
from math import exp # You can import a specific function
print(exp(x)) # This way, you don't need to use math.exp but just exp

Or all:

In [0]:
from math import * # Import all functions

In [0]:
print(exp(x))
print(log(x)) # Without the import above, calling `exp` or `log` raises a NameError

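Depending on what you want to achieve, you can choose between importing a few functions or all of them (with `*`) from a package. Note that PEP 8 discourages `from module import *`, because it makes it unclear which names end up in your namespace.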

#### External libraries

There are times you'll want some advanced utility functions not provided by Python. There are many useful packages by developers.

We'll use __numpy__ as an example. (__numpy__, __scipy__, __matplotlib__, and probably __pandas__ will be the most important to you for data analyses.)

The easiest way to install Python packages is with [pip](https://packaging.python.org/installing/):

```bash
~$ pip install numpy scipy pandas
```

If you use Anaconda, I believe all of these are already installed.

Loading external libraries works just like loading built-in ones. To use an _alias_ for easier access, we can import a library with `import library_long_name as short_name`. For example:

In [0]:
# After you install numpy, load it
import numpy as np # you can use np instead of numpy to call the functions in numpy package

In [0]:
x = np.array([[1,2,3], [4,5,7]], dtype=float) # create a numpy array object with float data type (the old `np.float` alias was removed in NumPy 1.24)
print(x)
print(type(x))

We can use the `shape` attribute of the `numpy.ndarray` class to check the dimensions:

In [0]:
x.shape

Unlike a `list`, an array uses a single data type for all of its elements, so NumPy converts mixed inputs to a common type:

In [0]:
y = np.array([1, 'yes'])
y

In [0]:
y[0], type(y[0])

In [0]:
y_list = [1, 'yes']
y_list[0], type(y_list[0])
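The `dtype` attribute shows which common type NumPy settled on. The sketch below is illustrative; the exact integer and string widths it prints depend on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5])   # int and float: everything becomes float
mixed_text = np.array([1, 'yes'])    # number and str: everything becomes str

print(ints.dtype, mixed_numbers.dtype, mixed_text.dtype)
print(repr(mixed_text[0]))           # the 1 is now stored as text
```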

__Scipy/Numpy__ provide extensive utilities for data manipulation and simple analyses

In [0]:
from scipy.stats import pearsonr, spearmanr # correlation functions

In [0]:
print(pearsonr(x[1, :], x[0, :]))
print(spearmanr(x[1, :], x[0, :]))

__Pandas__ (Python Data Analysis Library) is a great package for data structures: `DataFrame`

If you're familiar with `R`, you will probably like the `pandas.DataFrame` data structure.

In [0]:
import pandas as pd

In [0]:
x

In [0]:
x_df = pd.DataFrame(x)
x_df

Easy import/export

In [0]:
x_df.to_csv('tmp_pd.csv', index=False) # `index=False`: do not write row indices to file

In [0]:
df = pd.read_csv('tmp_pd.csv')

In [0]:
df

---

### Quick Intro to Numpy

Instead of using the native data structures, we use `numpy.ndarray` for data analytics most of the time. While they are not as "flexible" as lists, they are easy to use and have better performance. As Numpy's official documentation states:
> NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of positive integers.

As we were using it just now, the most common alias for `numpy` is `np`:

In [0]:
import numpy as np

#### Create arrays

Depending on the type of analysis we plan to do later, we can choose the most appropriate way to initialize an array.

##### By hand

This is very similar to creating a list of elements manually, except that we wrap the list in `np.array()`.

In [0]:
arr = np.array([1,2,3,8])
arr

In [0]:
arr.shape

Multidimensional arrays: nested lists, separated by commas

1 by 4: 1 row and 4 columns

In [0]:
arr = np.array([[1,2,3,8]])
arr.shape

In [0]:
arr

3 by 4: 3 rows and 4 columns

In [0]:
arr = np.array([[1,2,3,8], [3,2,3,2], [4,5,0,8]])
arr.shape

In [0]:
arr

##### By functions

There are many special array initialization methods to call:

In [0]:
np.zeros([3,5], dtype=int)

In [0]:
np.ones([3,5])

In [0]:
np.eye(3)

#### Arithmetic operations

The rules are very similar to R: operations are generally element-wise

In [0]:
arr

In [0]:
arr * 6

In [0]:
arr - 5

In [0]:
np.exp(arr)

Note that if we want to perform matrix multiplication, we need to use `@` or the `.dot` method, since `*` still means element-wise computation

In [0]:
arr_2 = np.array([[1], [3], [2], [0]])
arr_2

In [0]:
arr @ arr_2

In [0]:
arr.dot(arr_2)
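Checking shapes is a quick way to confirm which kind of multiplication took place. The sketch below redefines `arr` and `arr_2` with the values used above so that it runs on its own.

```python
import numpy as np

arr = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])  # shape (3, 4)
arr_2 = np.array([[1], [3], [2], [0]])                       # shape (4, 1)

product = arr @ arr_2       # matrix product: (3, 4) @ (4, 1) gives (3, 1)
same_as_dot = np.array_equal(product, arr.dot(arr_2))
elementwise = arr * arr     # same shapes: multiplied entry by entry

print(product.shape, elementwise.shape)
print(same_as_dot)
```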

##### Operation based on itself

There are many array methods that calculate statistics of the array itself along some axis:
- `axis=1` operates within each row (across the columns)
- `axis=0` operates within each column (down the rows)

In [0]:
arr

In [0]:
arr.max()

In [0]:
arr.max(axis=1)

In [0]:
arr.max(axis=0)

In [0]:
arr.cumsum()

In [0]:
arr.cumsum(axis=1)
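A simple way to keep the axes straight is to look at the shape of the result: for a reduction such as `max`, the axis you name is the one that disappears. A sketch with the same `arr` values:

```python
import numpy as np

arr = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])  # shape (3, 4)

row_max = arr.max(axis=1)   # collapses the columns: one value per row
col_max = arr.max(axis=0)   # collapses the rows: one value per column

print(row_max, row_max.shape)
print(col_max, col_max.shape)
```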

#### Indexing and slicing

The most important part is how to index and slice a `np.array`. It is very similar to indexing a `list`, except that we may need more indices, because most real-life datasets have more than one dimension.

##### 1 dimensional case

In [0]:
a1 = np.array([1,2,8,100])
a1

In [0]:
a1[0]

In [0]:
a1[-2]

In [0]:
a1[[0,1,3]]

We can also use boolean values to index
- `True` means we want this element

In [0]:
a1 > 3

In [0]:
a1[a1 > 3]
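Boolean masks can be combined with `&` (and) and `|` (or). Each comparison needs its own parentheses, because these operators bind more tightly than `>` and `<`. A sketch using the same `a1` values:

```python
import numpy as np

a1 = np.array([1, 2, 8, 100])

mask = (a1 > 1) & (a1 < 50)   # element-wise "and" of two conditions
selected = a1[mask]           # keep only the elements where mask is True
count = mask.sum()            # True counts as 1

print(mask)
print(selected, count)
```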

##### 2 dimensional case

In [0]:
arr

Using only one number to index returns a subset of the original multidimensional array, which is also an array

In [0]:
arr[0]

In [0]:
type(arr[0])

Since we have 2 dimensions now, there are 2 indices we can use for indexing the 2 dimensions respectively

In [0]:
arr[0,0]

We can use `:` to indicate everything along that axis

In [0]:
arr[1]

In [0]:
arr[1, :]

In [0]:
arr[1,:] == arr[1]

In [0]:
arr[:, 1]

##### 3 dimensional case

As a final example, we look at a 3d array:

In [0]:
arr_3 = np.random.randint(low=0, high=100, size=24)
arr_3

We can use [`reshape`](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.reshape.html) to manipulate the shape of an array

In [0]:
arr_3 = arr_3.reshape(3,4,2)
arr_3

In [0]:
arr_3[0]

In [0]:
arr_3[:, 3, 1]

In [0]:
arr_3[2, 3, 1]
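Because the array above is random, its printed values differ from run to run. The sketch below swaps in `np.arange(24)` as a deterministic stand-in, so the effect of each index on the result's shape is easy to follow.

```python
import numpy as np

# Deterministic stand-in for the random array: values 0..23
arr_3 = np.arange(24).reshape(3, 4, 2)

first_block = arr_3[0]        # one index: a (4, 2) array remains
column = arr_3[:, 3, 1]       # two fixed indices: a 1-d array of length 3
single = arr_3[2, 3, 1]       # three indices: a single element

print(first_block.shape, column.shape)
print(column, single)
```

Each integer index removes one dimension, while `:` keeps that dimension in the result.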