## Python and Notebook Basics
Python is an open source coding language used for a huge variety of analysis and modelling: spatial, statistical, mathematical...you name it, Python probably can do it! 

## At the end of this lab: save your notebook and upload to Moodle to show that you have completed all sections of the notebook. 

### What's a notebook?
This is a notebook! 

A "notebook" is a file that contains both computer code (e.g. Python) and rich text elements (paragraphs, equations, figures, links, etc.). Notebooks are both human-readable documents containing the analysis description and the results (figures, tables, etc.) and executable documents that can be run to perform data analysis.

Notebook files have the file extension '.ipynb'

Some of the text below is modified from the Introduction to Python Notebooks, which is highly useful and can be found here: https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html. Another very detailed tutorial can be found here: https://jupyter.brynmawr.edu/services/public/dblank/Jupyter%20Notebook%20Users%20Manual.ipynb


The notebook app will open a browser window with your default file location. Starting a new notebook, or navigating to and opening an existing notebook, will start a 'kernel'. A kernel is the engine that runs the code, and you can have several different kernels open in different tabs.

__Note__: don't open the same notebook in two kernels (or tabs)! You only need to start the Jupyter notebook app once. Notebooks are saved automatically every few minutes, overwriting the file, so it's a good idea to make a copy of the original file as a backup.

Notebooks are made up of cells: a cell can hold Markdown (marked-up text, formulas) or executable code (with comments).

To execute the code in a given **cell**, highlight it and hit `Ctrl + Enter`

Here are some tutorials you can go through to familiarize yourself with Python notebooks:
* http://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Notebook%20Basics.html
* https://medium.com/codingthesmartway-com-blog/getting-started-with-jupyter-notebook-for-python-4e7082bd5d46
* https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook


  

### Python Basics
- cell execution 
- assignment
- arithmetic
- print

In [None]:
# assign the value 10 to the variable 'a'
a = 10

In [None]:
b = 20

In [None]:
a + b

In [None]:
a * b

In [None]:
b / a

In [None]:
# assign the string 'Hello World' to the variable 'd'
d = 'Hello World'

In [None]:
print(d)

In [None]:
10/3

### Importing packages

At the start of **EVERY SINGLE PYTHON SCRIPT OR NOTEBOOK** there need to be commands that import the packages used in the code. This is called the *preamble*.

Trigonometry functions are found in the math package. But you can't access them until you've imported the math package - try running the cell below. 

In [20]:
sin(30)

NameError: name 'sin' is not defined

In [21]:
# import all functions from the math package
from math import *  #the wildcard '*' denotes all objects in the package

# import the numpy package as np
import numpy as np  #this allows us to use shorthand for calling this package below (i.e. we can type 'np...' rather than 'numpy...' each time)

In [22]:
sin(30)      

-0.9880316240928618

In [23]:
# accessing functions from packages
np.sin(30)

-0.9880316240928618

This last one is really important: to access the functions within a package, you first import it, and then type the name of the package followed by a dot:

In [24]:
np.arange(0,10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In this course, you'll be given the names of the packages to import at the start of each lab. 
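
Note that `sin(30)` returned about -0.988 rather than 0.5. The trigonometric functions in `math` and `numpy` expect angles in radians, not degrees. A minimal sketch of converting from degrees first (the variable names are illustrative only):

```python
import math

angle_deg = 30
angle_rad = math.radians(angle_deg)  # convert degrees to radians

print(math.sin(30))         # 30 is interpreted as radians
print(math.sin(angle_rad))  # sine of 30 degrees, approximately 0.5
```

`np.radians` and `np.deg2rad` do the same conversion for NumPy arrays.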

# Navigating Jupyter

### Keyboard Shortcuts

There are some commands you will use over and over again in this course. Keyboard shortcuts will help make things easier and quicker for you. The single-letter shortcuts work in command mode (press `Esc` first so that you are not typing inside a cell). A brief list: 

- Ctrl + Enter: Run cell 
- a: insert cell above
- b: insert cell below
- c: copy cell
- v: paste cell
- x: cut cell (removes it; paste it back with v)
- z: undo deletion

To see a complete list of Jupyter keyboard shortcuts, go to 'Help'-> 'Keyboard Shortcuts'

Notebook cells can be *executable* (code) or **markdown**. To change the cell type, use the drop-down menu in the toolbar above and select Markdown or Code. 

In order to edit the content of a cell, it needs to be highlighted. Double-click on the cell to make it editable.

### Autocomplete
A nifty feature of Jupyter is the autocomplete feature. To explore all the functions in different packages, type the package name and use 'tab' to see a list of all possible functions. 




In [None]:
np.

Autocomplete also works for directories and file names in the current directory. Type a quotation mark, start typing the name of the notebook, then hit `Tab`.

# More Python Basics


**OBJECTIVES**: This lab will introduce you to data types and data structures, and get you comfortable working with Jupyter Notebooks.

Sections of this lab include the following:
- Data types
- Data structures
- Indexing and slicing
- Simple functions

## Data Types

The data you'll be working with include numerical data, strings, boolean values, and dates and time. 

| Data Type | Description | 
|-|-|
| `None` | The Python "null" value|
| `str` | String type | 
| `bytes` | Raw binary data (a sequence of bytes)|
| `float` | Double-precision (64-bit) floating point numbers|
| `bool` | A True or False value|
| `int` | Arbitrary precision signed integer | 

### Numeric types
`int` and `float` are the most common numeric data types you will be working with. Integers are **whole** numbers. No decimals. 

In [None]:
ival = 17
type(ival)

In [None]:
ival * 6  

Dividing two integers with `/` always yields a floating-point number, even when the result is a whole number: 

In [None]:
3/2

Floating point numbers are double-precision (64-bit) values. They can also be expressed with scientific notation:

In [None]:
fval = 7.243
fval2 = 6.78e-5

For `fval2`, the `e-5` means $\times 10^{-5}$

In [None]:
type(fval)

In [None]:
print(fval2)
print(format(fval2,'f'))

### Strings

Strings are signified by either single quotes `'` or double quotes `"`.

In [None]:
a = 'one way of writing a string'
b = "another way"

Numeric data types can be converted to strings:

In [None]:
str(fval)

Strings can also be *concatenated*, or joined together:

In [None]:
a = 'this is the first half ... '
b = 'and this is the second half'
a + b

You can also *query* the elements of a string using indexing (square brackets with a number indicating the position; more on this later):

In [None]:
name = 'Joseph'  # avoid naming a variable 'str': that would hide the built-in str() function
name[0]

### Boolean

In Python (and in particular geospatial applications), the *boolean* data type (True or False) is **extremely** useful. `True` is equivalent to a value of 1, and `False` is equivalent to a value of zero. 

In [None]:
a = True
b = False
a == b  # This double equal sign is a logical test asking  "Does a = b"?

In [None]:
a = [0,1,2,False]
sum(a)

In [None]:
a = [0,1,2,True]

In [None]:
sum(a)

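To test multiple conditions, use the "and" (`&`) operator or the "or" (`|`) operator. Each condition has to be enclosed in parentheses (), because `&` and `|` are evaluated before comparisons such as `==`. Python also has the keywords `and` and `or`, which work on single True/False values; `&` and `|` additionally work element by element on NumPy arrays.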

In [None]:
a = 5
b = 9

print((a == 5) & (b == 9))
print((a < 4) | (b > 10))
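
The difference between the keywords `and`/`or` and the operators `&`/`|` shows up once arrays are involved. The sketch below uses a made-up array called `temps` (not part of the lab data) to illustrate both forms:

```python
import numpy as np

a = 5
b = 9

# for single True/False values, the keywords `and` / `or` are the usual choice
print(a == 5 and b == 9)
print(a < 4 or b > 10)

# for NumPy arrays, use & and | to combine conditions element by element
temps = np.array([12, 25, 31, 8])   # made-up example values
mild = (temps > 10) & (temps < 30)
print(mild)
print(temps[mild])   # keep only the elements where the condition is True
```

Using a boolean array to pick out elements, as in the last line, is a common pattern when filtering data.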

### None

`None` is a null value type in Python.  

In [None]:
a = None

Check: is 'a' equal to None?

In [None]:
a is None

In [None]:
b = 5

In [None]:
# is the variable 'b' None?
b is None

*Note:* We will also use the NumPy constant `np.nan` ("not a number") to mark missing data. NumPy has a large number of built-in functions for dealing with `nan` values.

In [None]:
import numpy as np
#autocomplete this
np.nan

# generate 10 random values
a = np.random.randn(10)

# insert an nan 
a[7] = np.nan

print(a) 

In [None]:
# calculate the mean of this array
print(a.mean())


In [None]:
# calculate the mean using nanmean
np.nanmean(a)
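
A few more `nan` behaviours are worth knowing. The following is a small sketch on a made-up array called `values` (hypothetical data, chosen so the results are easy to check by hand):

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan, 4.0])   # made-up data with one missing value

print(np.nan == np.nan)            # False: nan is not equal to anything, even itself
print(np.isnan(values))            # element-wise test for missing values
print(values[~np.isnan(values)])   # keep only the values that are not nan
print(values.mean())               # nan: a single missing value spoils the result
print(np.nansum(values))           # sum that skips nan
```

Because `nan == nan` is False, always test for missing values with `np.isnan` rather than `==`.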

### Casting Between Data Types
As seen above, you can query the type of data you are working with. But you can also convert (or *cast*) between data types using the functions `str`, `float`, and `int`. 

In [None]:
a = '10'
float(a)

In [None]:
int(a)
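
Casting has a few pitfalls. The sketch below (variable names are illustrative) shows why the conversion matters and what happens when it cannot be done:

```python
a = '10'
print(int(a) + 5)        # arithmetic on the converted number
print(a + '5')           # + on two strings concatenates instead

print(int(3.9))          # int() truncates towards zero, it does not round
print(round(3.9))        # round() goes to the nearest integer

# a string with a decimal point must go through float() first
b = '10.5'
print(int(float(b)))

# text that is not a number raises a ValueError
try:
    int('ten')
except ValueError as err:
    print('could not convert:', err)
```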

## Mathematical Operators
| Operator | Description | 
|-|-|
| `+` | Addition|
|`-`| Subtraction|
|`*` | Multiplication|
|`/` | Division|
|`**`| Exponent (to the power of...)|
|`//`| Floor division | 
|`%`| Modulus (returns the remainder)|
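
As a quick illustration of the operators in the table, here is a sketch using the arbitrary values 7 and 2 (deliberately different from the exercise values below):

```python
p = 7
q = 2

print(p / q)    # 3.5  true division always gives a float
print(p // q)   # 3    floor division drops the fractional part
print(p % q)    # 1    remainder after floor division
print(p ** q)   # 49   7 to the power of 2

# // and % fit together: (p // q) * q + (p % q) gives back p
print((p // q) * q + (p % q) == p)
```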

In [None]:
# example: create two variables with values of 6 and 10 and add them together

In [None]:
# example: now divide them

In [None]:
# example: now raise the first to the power of the second

In [None]:
# example: what is the remainder if you divide them (use modulus)? 

## Data Structures

Data structures are how data are organized in Python. 

### Lists

Lists are the most commonly used data structure. A list is a sequence of data that is enclosed in square brackets and data are separated by a comma. Each data point can be accessed by calling its index value.

An empty list is created by assigning `[]` to a variable, and a list can contain a mix of the data types described above. 

In [None]:
a = []

In [None]:
print(type(a))

Sequences of data or strings can be assigned to lists: 

In [None]:
x = ['apple', 'orange']

In [None]:
y = [2, 3, 7, None]

Elements can be added (appended) to the end of a list using the *append* method. 

In [None]:
y.append(x[0])
y

Lists can also be concatenated: 

In [None]:
x + y

and sorted: 

In [None]:
x = [109,9,17,1]
x.sort()
x

A *range* of integers can be created with the `range(N)` function, where N is the number of elements. **By default, ranges start at 0**, so the last element is N-1:

In [None]:
a = range(5)
print(a[0])
print(a[4])
print(a[5])  # IndexError: the valid indices are 0 to 4

The numpy function `np.arange(start, stop, interval)` can also be used to create a range of values:

In [None]:
np.arange(0,100,10)
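
The built-in `range` accepts the same `start, stop, step` arguments. In both cases the stop value itself is never included. A short sketch (wrapping each range in `list()` just to display its contents; the variable names are illustrative):

```python
first_five = list(range(5))            # start defaults to 0
middle = list(range(2, 8))             # from 2 up to, but not including, 8
quarters = list(range(0, 100, 25))     # step of 25
countdown = list(range(5, 0, -1))      # a negative step counts down

print(first_five)
print(middle)
print(quarters)
print(countdown)
```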

### Tuples

A fixed-length sequence of objects. Objects are separated by commas, and can be surrounded by parentheses '(...)'. 

In [None]:
tup = 4,5,6
tup

In [None]:
tup = (4,5,6)
tup

#### Unpacking tuples

*Unpacking* refers to getting objects out of a data structure. For tuples, you list one variable name per element on the left-hand side of the assignment:

In [None]:
a,b,c = tup

In [None]:
b
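
"Fixed-length" also means that a tuple cannot be modified once it is created. The sketch below (the names `point`, `x`, `y`, `z` are illustrative) shows unpacking, a common use of it for swapping variables, and the error raised on modification:

```python
point = (4, 5, 6)

# unpacking: the number of names must match the number of elements
x, y, z = point
print(y)

# swapping two variables uses the same mechanism
x, y = y, x
print(x, y)

# tuples cannot be changed after creation
try:
    point[0] = 99
except TypeError as err:
    print('tuples are immutable:', err)
```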

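### Dictionaries
Another data structure, often encountered when working with data frames (e.g. pandas), is the dictionary. Dictionaries are constructed using curly braces `{...}` and are collections of `key`-`value` pairs. Values are looked up by key rather than by position (since Python 3.7, dictionaries keep the order in which keys were added).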

In [None]:
# note the whitespaces and formatting - your code can (and should) run over multiple lines to be readable
empty_dict = {}

grades = {'John':'A',
          'Emily':'A+',
          'Betty':'B',
          'Mike':'C',
          'Ashley':'A'}

In [None]:
# query the type of data structure 
type(grades)

In [None]:
# access an element of a dictionary
grades['John']

In [None]:
# or use the get function
grades.get('Betty')

You can retrieve all the `keys` or the `values` in the dictionary, and these objects are *iterable* (which means you can go over them in a loop):

In [None]:
grades.keys()

for name in grades.keys():
    print(name + ': ' + (grades[name]))
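
Dictionaries can also be modified after creation, and `.items()` gives keys and values together. A minimal sketch using a shortened copy of the `grades` dictionary (the extra names and grades are made up):

```python
grades = {'John': 'A', 'Emily': 'A+'}   # shortened copy of the dictionary above

grades['Mike'] = 'C'        # add a new key-value pair
grades['John'] = 'A-'       # overwrite an existing value

# .items() yields (key, value) pairs, which can be unpacked in the loop
for name, grade in grades.items():
    print(name + ': ' + grade)

# .get() returns a default instead of raising an error when the key is missing
print(grades.get('Zoe', 'no grade recorded'))
print('Emily' in grades)    # test whether a key exists
```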

### Arrays
Arrays (requires NumPy) are critical structures for storing and working with geospatial datasets. They can be 1-D, 2-D, or multi-dimensional. Mathematical operations can be performed on each element quickly. 


In [None]:
import numpy as np

# generate some random data
data = np.random.randn(2,3)
data

In [None]:
# multiply each element in the array by 10
data*10

In [None]:
# add two arrays together (must have the same shape/dimensions!)
print(data.shape)

data + data
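
Random data makes the results hard to check by eye, so here is a sketch with a small hand-written 2-D array (the name `grid` and its values are made up) showing shape, element access, and whole-array operations:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])   # 2 rows, 3 columns of made-up values

print(grid.shape)         # (rows, columns)
print(grid[0, 2])         # row 0, column 2
print(grid[:, 1])         # every row, column 1
print(grid.mean())        # mean of all elements
print(grid.sum(axis=0))   # column totals
print(grid > 3)           # element-wise comparison gives a boolean array
```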

## Indexing and Slicing

Indexing is a way of accessing individual objects in a data structure. In Python, *indexing* starts from 0. The list `x` below, which has three elements, has apple at index 0, orange at index 1, and banana at index 2. To index a structure, use square brackets `[ ]`. 

In [None]:
x = ['apple','orange','banana']
print(x[0])
print(x[1])

Indexing can also be done in reverse order, with the last element at index -1:

In [None]:
x[-1]

You can also select sections of different data structures with *slicing*, which consists of `start:stop` passed to the index operator `[]`:

In [None]:
seq = [7,2,3,7,5,6,0,1]
seq[1:5]

The number of elements in the slice is equal to `stop - start`:

In [None]:
len(seq[1:5]) # len() is a useful command!

Either the start or the stop can be omitted, and the slice then defaults to the start or the end of the sequence, respectively:

In [None]:
seq[:5]

In [None]:
seq[3:]
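
Slices also accept negative indices and an optional third value, the step (`start:stop:step`). A short sketch on the same sequence (the result names are illustrative):

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

last_three = seq[-3:]      # from the third-last element to the end
every_second = seq[::2]    # step of 2
reversed_seq = seq[::-1]   # a step of -1 gives a reversed copy
part = 'Joseph'[1:4]       # slicing works on strings too

print(last_three)
print(every_second)
print(reversed_seq)
print(part)
```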

Note: indexing by position doesn't work for dictionaries! Values are looked up by key, so the cell below raises a `KeyError`:

In [None]:
grades[1]

## Read data into Python

Data can be read into Python from files, rather than having to be typed in by hand. This is critical for performing data analysis. 

In [25]:
# We can read many types of data. Below we just look at importing data from a .txt file and a .csv file.

import pandas as pd
data = pd.read_fwf("import_data1.txt")  # read_fwf reads "fixed-width formatted" lines and is a good option for
                                        # data stored in aligned columns in a .txt file
data

| | A | B | C |
|-|-|-|-|
| 0 | 1 | 2 | 3.0 |
| 1 | 2 | 4 | NaN |
| 2 | 3 | 6 | 802384.0 |


In [27]:
data2 = pd.read_csv("import_data1.csv")
data2

| | A | B | C |
|-|-|-|-|
| 0 | 1 | 2 | 3.0 |
| 1 | 2 | 4 | NaN |
| 2 | 3 | 6 | 802384.0 |


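If our data file does not have a header row, we need to tell pandas so with `header=None`. (Our example file does have one, so here the column names `A`, `B`, `C` end up as the first row of data.)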

In [31]:
data = pd.read_csv("import_data1.csv", header=None)
data

| | 0 | 1 | 2 |
|-|-|-|-|
| 0 | A | B | C |
| 1 | 1 | 2 | 3 |
| 2 | 2 | 4 | NaN |
| 3 | 3 | 6 | 802384 |


In [30]:
#We can define header values ourselves when we import the data
data = pd.read_csv("import_data1.csv", header=None, names=['Jim','Joe','Brenda'])
data

| | Jim | Joe | Brenda |
|-|-|-|-|
| 0 | A | B | C |
| 1 | 1 | 2 | 3 |
| 2 | 2 | 4 | NaN |
| 3 | 3 | 6 | 802384 |
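
When a file already has a header row and you want your own column names, one option is to pass `header=0` together with `names=`, which replaces the existing header instead of keeping it as a data row. The sketch below reads from an in-memory string (made-up contents that mirror the layout of the example file) so that it runs without any file on disk:

```python
import io
import pandas as pd

# made-up file contents, mirroring the layout of the example file
text = "A,B,C\n1,2,3\n2,4,\n3,6,802384\n"

# default: the first line is used as the column names
df = pd.read_csv(io.StringIO(text))
print(df.columns.tolist())
print(df.isna().sum())      # count the missing values in each column

# header=0 together with names= replaces the existing header row
df2 = pd.read_csv(io.StringIO(text), header=0, names=['Jim', 'Joe', 'Brenda'])
print(df2)
```

With a real file, the file name goes where `io.StringIO(text)` appears.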


# Loops and Functions

## Loops
One task that is often done in geospatial coding is applying the same operation to a list of datasets. A `for` loop does this by repeating a block of code once for each element in a sequence:



In [15]:
for i in range(0,5):
    print(i)

0
1
2
3
4


Another example: say you are working with census data, and you want to loop through each neighbourhood in Vancouver to examine vote results.  

In [16]:
# create list of neighbourhoods
hoods = ['East Vancouver','West Vancouver','North Vancouver','West End','False Creek']

In [17]:
# get the length of the list
N = len(hoods)

# simple loop to print the name of each neighbourhood
for item in hoods:  # what does this line do???
    print(item)

East Vancouver
West Vancouver
North Vancouver
West End
False Creek


You can also iterate using indexing:

In [18]:
# a list of fruit
a = ['banana','apple','cherry','lime']

# length of the list
N = len(a)

# create a sequence with the length of the list and print each element 
for i in range(N):
    print(a[i])

banana
apple
cherry
lime
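
Two built-in helpers often make index-based loops unnecessary: `enumerate` gives the index and the element together, and `zip` steps through two lists in parallel. A sketch with made-up turnout numbers (illustration only, not real data):

```python
hoods = ['East Vancouver', 'West End', 'False Creek']
turnout = [0.52, 0.61, 0.58]   # made-up numbers for illustration only

# enumerate gives the index and the element together
for i, name in enumerate(hoods):
    print(i, name)

# zip walks through two lists in parallel
for name, rate in zip(hoods, turnout):
    print(name + ': ' + str(rate))

# collect results in a new list as the loop runs
lengths = []
for name in hoods:
    lengths.append(len(name))
print(lengths)
```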


## Functions
Functions are the primary and most important way of organizing code. When you import a library into Python, you can access all the functions contained in the library. 

For example: below, we import the numpy package/library as 'np'. Functions within numpy can then be used/accessed using the `np.function()` convention. To get help on a function, use `np.function?`

In [None]:
import numpy as np
np.sin?

If you are writing your own code and you need to repeat the same lines of code more than once, it's probably worth writing your own function. 

Functions are declared with the `def` keyword, and exited (or returned from) using the `return` keyword. Multiple return statements are allowed. *Conditional* statements (e.g. if, else) are used to tell the function what to do. In Python, the white space, indents, and colon (:) are critical - the function won't work without them. 

In [None]:
def my_function(x, y, z = 2):
    return z + x*y

In [None]:
my_function(6,2)

In [None]:
# use an if/else statement for flow control
def my_function2(x, y, z = 2):
    if z > 1: 
        return z * (x + y) 
    else:
        return z / (x + y)

In the examples above, the function is called `my_function` or `my_function2`, `x` and `y` are *positional* arguments, and `z` is a *keyword* argument. To call this function, you need to specify x and y, and z defaults to a value of 2 unless it is specified:

In [None]:
my_function2(5, 2, z = 3)

In [None]:
my_function2(2, 2, -1)

In [None]:
my_function2(10,20) # z defaults to a value of 2

In [None]:
my_function2(3,6,11,2) # too many inputs
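
Arguments can also be passed by name, in any order, and the error from passing too many can be caught and inspected. A sketch that repeats `my_function2` so it runs on its own (the docstring is an addition for illustration):

```python
def my_function2(x, y, z=2):
    """Scale the sum of x and y by z (copy of the function above)."""
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

default_z = my_function2(10, 20)          # z falls back to its default of 2
by_name = my_function2(y=20, x=10, z=3)   # keywords may be given in any order
print(default_z)
print(by_name)

# passing more arguments than the function accepts raises a TypeError
try:
    my_function2(3, 6, 11, 2)
except TypeError as err:
    print('wrong number of arguments:', err)
```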

### Global and local variables

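Variables can be either *global* or *local*. Variables assigned within a function are *local* and are typically destroyed once the function is finished. Variables assigned outside any function are *global* and can be read from inside functions, but relying on them makes code harder to follow, so global variables should mostly be avoided. 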

In [None]:
def my_func():
    temp = [] # assign local variable inside the function
    # this is a loop! 
    for i in range(5):
        temp.append(i)

my_func()
temp  # NameError in a fresh kernel: temp exists only inside my_func

In [None]:
temp = [] # assign global variable outside the function

def my_func():
    # this is a loop
    for i in range(5):
        temp.append(i)

my_func()
temp
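
The usual way to get a result out of a function without a global variable is to `return` it. A minimal sketch (the function name `first_n` is made up for this example):

```python
def first_n(n):
    """Return a new list holding 0, 1, ..., n-1."""
    temp = []              # local: exists only while the function runs
    for i in range(n):
        temp.append(i)
    return temp            # hand the result back to the caller

result = first_n(5)
print(result)
```

Here the caller decides what to call the result, and nothing outside the function is changed as a side effect.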

## End of Lab 1! Information below is included for additional reference; you are not expected to know/apply these additional materials on quizzes/exams. 

### User Input
Occasionally (e.g. for programming practice), you will want to get user inputs. These will generally be strings, which need to be cast/converted if you plan on using them for calculations.

In [19]:
# ask user input
answer = input('What day of the week is it? ')
print('Its ' + str(answer) + ' today!')

What day of the week is it?  Christmas


Its Christmas today!


In [None]:
# ask user input, convert to a float
answer_m = float(input('How old are you (in years)? '))

# hmmm, probably underestimating. Add 15% onto their age answer
true_answer_m = answer_m + 0.15*answer_m

# convert back to a string so it can be concatenated
print('Nice try, you are actually ' + str(true_answer_m) + ' years old.')