# Lab 1: Introduction to programming in Python

This lab will introduce you to some basic programming concepts using the Python language. Python is relatively easy to learn, widely used across industry and academia, and has many useful packages for map-making and data analysis. However, it's important to remember that many other programming languages can be used to accomplish a given task. This is not a Python programming course; instead, we hope you will learn to approach spatial problems from a programmatic perspective and implement solutions to those problems using the Python language.

Especially if you have limited experience with programming, some of these new concepts may be unintuitive at first. One important skill for a programmer is to know where to look for help. Even professional developers have many browser tabs open with code documentation, examples, and troubleshooting forums easily accessible. At the end of this notebook there is a list of [websites where you can go for help](#helpful_cell) if you get stuck.


## Mathematical operations

To begin, let's explore some of the capabilities of Python in this notebook coding environment. Our eventual data analysis tasks will involve a fair number of mathematical operations. Try some of these out by running the code cells below.

### But first... How do I "run code"?

There are a couple of ways to run code in Colab:

* Ctrl-Enter runs the current cell and enters command mode.

* Shift-Enter runs the code and advances to the next cell. 

* Alt-Enter runs the current cell and inserts a new one below.

Alternatively, you can simply press the play button in the upper left corner of the code cell. Go ahead and try each of these methods out.


In [None]:
1 + 1

In [None]:
4 * 5

In [None]:
20 / 4

Here is a list of some mathematical operators in Python. The [modulo operator](https://en.wikipedia.org/wiki/Modulo_operation) (marked with * in the table) returns the remainder after one number is divided by another.

Operation  | Symbol | Example code | Result | 
-----------|:------:|:------------:|:------:|
Addition   |    +   |     `3+4`    |   `7`  |
Subtraction   |    -   |     `5-8`    |   `-3`  |
Multiplication   |    *   |     `8*9`    |   `72`  |
Division   |    /   |     `30/10`    |   `3.0`  |
Exponentiation   |    **   |     `3**2`    |   `9`  |
Modulo*   |    %   |     `11%3`    |   `2`  |
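
A quick check of the last rows of the table, as a minimal sketch (the names `quotient` and `remainder` are illustrative and not part of the lab). In Python 3, `/` always returns a `float`, while `%` returns what is left over after dividing.

```python
# Division with / always produces a float in Python 3, even for whole results
quotient = 30 / 10
print(quotient)              # 3.0

# Modulo returns the remainder: 11 = 3*3 + 2
remainder = 11 % 3
print(remainder)             # 2

# The whole-number part plus the remainder rebuilds the original number
print(3 * 3 + remainder)     # 11
```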

Python follows PEMDAS (Parentheses, Exponents, Multiplication and Division (from left to right), Addition and Subtraction (from left to right)) order of operations. For example:

In [None]:
5 - 3**2

In [None]:
(5 - 3)**2

We may need some more advanced operations for a particular data analysis task:

In [None]:
sqrt(16)

Congratulations, you've encountered your first programming error! The python language has many functions in its [standard library](https://docs.python.org/3/library/), but most of these functions are not loaded by default. Loading all of these functions all of the time would slow down processing time by taking up a significant portion of the available computer memory. One of the packages (sometimes called modules) in the standard library is called `math` and contains, as you might expect, a collection of mathematical functions. We can access these functions by importing the package:

In [None]:
import math

Now let's try the square root function again:

In [None]:
math.sqrt(16)

In order to access a function in a package you have imported, type the package name followed by a `.` and then the function name. In this notebook, if you type `.` and press `tab`, a tooltip helper will appear that lists all available functions in the package. Try this in the cell below by adding a `.` after the word `math` and pressing `tab`. You can scroll through the list using the up and down arrow keys.

In [None]:
math
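
If the tooltip does not appear, the built-in `dir()` function offers another way to see what a package contains. This is a small sketch rather than part of the original lab; the variable name `math_names` is only illustrative.

```python
import math

# dir() returns a list of the names defined in the math package
math_names = dir(math)
print(math_names)
```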

### Exercise 1

Create a code block and calculate the cosine of $\pi$. Find a way to access the constant - do not just use 3.14.

## Variable assignment

Often we will want to save the result of an operation to use later on. We can do this by creating a variable. Variables are names given to objects in Python; they make code easier to write, read, and reuse. The object can be a number, a text phrase, or a whole slew of other things.

The following assigns a name to an integer value.

In [None]:
x = 3

Running the previous code cell does not produce any output. However, we have defined a variable named `x` in our notebook's memory and assigned a value of `3` to that variable. Variables are retained in memory between code blocks within this notebook environment:

In [None]:
x

In [None]:
y = x + 2
y

In [None]:
y**x

Variable names in Python can consist of upper case and lower case letters, digits 0-9, and the underscore character (`_`). Python is a case-sensitive language, so `var1` and `VAR1` are treated as two different variables. One additional constraint is that a variable name may not start with a digit: `var2` is a valid name while `2nd_var` is not.

There are some community-accepted guidelines for 'good' variable names. In general, try to make your variable names descriptive but succinct. For example,

In [None]:
airtemperatureindegreesfahrenheit = 47.2
airtemperatureindegreesfahrenheit


is perfectly valid code and will run without error. Even though this variable is nice and descriptive, a variable name like this is

a) difficult to read quickly; and

b) annoying to type over and over.

Python programmers often use underscores in their variable names to increase readability. A better option for this variable might be:

In [None]:
air_temp_f = 47.2
air_temp_f

### Exercise 2

In Physics 101, the equation for kinetic energy is $KE=\frac{1}{2}mv^2$, where $m$ is the mass of an object and $v$ is its velocity. 

Create a code block and write three lines of code that

1.   Define 2 variables to hold the values of an object's mass and velocity.
2.   Calculate the kinetic energy of a 2.5 kilogram object with a velocity of 3.2 meters per second. Assign this value to another appropriately-named variable.



## Data types

You may have noticed that in the table of operations and some code cells above, the returned result sometimes has a decimal point and other times it does not. What's going on here? Does it matter? The answer is...sometimes.

Almost all programming languages have built-in functionality to differentiate several types of data. The type of data gives an indication to the [interpreter](https://en.wikipedia.org/wiki/Interpreter_(computing)) (i.e. the under-the-hood computer program that executes our Python instructions) how we intend to use the data.

A few commonly-used data types in python:

Type  | Class | Example | 
-----------|:------:|:------------:|
Integer (`int`)  | Numeric   | 103  |  
Real (`float`)   | Numeric   | 0.22 | 
Boolean (`bool`) | Numeric   | `True` and `False` | 
String (`str`)   | Sequence  | `'msu bobcats'`    |  
List (`list`)    | Sequence  | `[1, 2, 3]` or `['a', 'b', 'c']`   |  


### Numeric data

A whole number without any decimal places (i.e. an **integer**) can be assigned the `int` type, while any **real** number with a decimal is the `float` type (short for [floating-point](https://en.wikipedia.org/wiki/Floating-point_arithmetic) number). We can check the type of any object by using the built-in `type()` function:

In [None]:
type(6)

In [None]:
type(3.14)

Note that a decimal point may be added to any integer to represent that number as a float:

In [None]:
type(1.)

The differences between the `int` and `float` data types are more important in other programming languages. Historically, a single integer occupied a much smaller space than a single floating point number in computer memory. Some languages like FORTRAN and C++ require the programmer to declare the type of number at variable creation in order to allocate the necessary memory (note that python automatically determines the type for us). As a result of differences in memory requirements, integer arithmetic was much faster than floating point arithmetic. 

Python allocates approximately the same amount of memory for an integer as a floating point number. On our desktop or laptop computers the computational efficiency is approximately equal for both data types. However, there are some modern cloud computing applications that are very sensitive to the type of numeric data used in calculations. Additionally, we will use some functions that require one numeric type or the other, and it will be important to understand the difference between these types.
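
For the curious, the standard library's `sys.getsizeof()` function reports how many bytes an object occupies. The sketch below assumes a standard CPython interpreter; exact byte counts can differ between Python versions and platforms, so none are quoted here.

```python
import sys

# Memory footprint, in bytes, of one integer and one float
int_size = sys.getsizeof(1)
float_size = sys.getsizeof(1.0)
print('int:', int_size, 'bytes')
print('float:', float_size, 'bytes')
```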

**Boolean** variables refer to the binary `True` or `False`. Remember that python is a case-sensitive language and `false` does not mean the same thing as `False`. Technically these variables are numeric data because python represents `True` as `1` and `False` as `0`. We will explore boolean logic in more detail below.
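
Because `True` and `False` behave like `1` and `0`, they can take part in arithmetic. A short sketch (the variable name `total` is illustrative):

```python
# Booleans are treated as 1 and 0 in arithmetic
total = True + True
print(total)                   # 2

# int() shows the underlying numeric value
print(int(True), int(False))   # 1 0

# Capitalization matters: True is a boolean, while true would be an undefined name
print(type(True))              # <class 'bool'>
```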

### Sequence data - strings



Sequences of alphanumeric characters (including spaces) in python are called **strings** and have the `str` data type. We denote a string by using quotation marks: 

In [None]:
type('hello')

In [None]:
type(2.5)

In [None]:
type('2.5')

Python will accept both single-quoted and double-quoted strings (e.g. `'go bobcats'` and `"go bobcats"`). There is no official recommendation for which to use. Choose one and be consistent. Mixing single- and double-quotes will not work:

In [None]:
type("go bobcats')
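
Having both quote styles is handy when the text itself contains a quotation mark. As a sketch (these example phrases are not from the lab), a string that contains an apostrophe can be wrapped in double quotes, and a string that contains double quotes can be wrapped in single quotes:

```python
# The apostrophe is a single quote, so wrap the string in double quotes
phrase1 = "it's game day"
print(phrase1)

# The text contains double quotes, so wrap the string in single quotes
phrase2 = 'the crowd yelled "go bobcats"'
print(phrase2)
```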

Some programming languages also have a `char` data type that is assigned to a single alphanumeric character. In those languages, strings are composed of two or more characters. Python does not contain the `char` data type and uses the `str` type for alphanumeric sequences of any length (including strings that are 1 or 0 characters long). An example of a 0-character empty string:

In [None]:
type("")

One useful function often used with strings is the `print()` function:

In [None]:
print('hello world')

At first glance this may not seem much different than the cell output we have seen above. But we can combine different functions with the `print()` function:

In [None]:
print('The square root of 9 is 3')

x = 9
print('x has the data type', type(x))
print('The square root of 9 is', math.sqrt(x))

### A word of caution

There are two types of errors in computer programming: syntax errors and logical errors. We have seen some examples of syntax errors already. A syntax error occurs when the interpreter does not understand our instructions; it tells us as much by returning an error message instead of the result we were hoping for (e.g. when we tried to mix single- and double-quotes in a string). Syntax errors are easy to find (since our code won't run) and usually easy to fix (since the error message gives us some indication of what the problem is).

On the other hand, logical errors can be more difficult to detect and fix. These occur when your code runs without an error message, but the code isn't actually doing what you think it's doing. Say we wanted to take a number, add 3 to it, and double the result. If we start with the number 5, we should end up with 16. Then what is wrong with this code?

In [None]:
x = 5
result = x + 3 * 2
print(result)

Our code runs fine, but because we neglected to include parentheses we got the wrong result. The code we wrote does not accomplish the task we set out to do - this is a logical error. Luckily this one is a simple fix:

In [None]:
x = 5
result = (x + 3) * 2
print(result)

Now that we have learned about a few different data types, it's important that we keep track of the types we are assigning and using. Unintentionally mixing data types can lead to unexpected logical errors:

In [None]:
a = 2
b = 4
result = a * b
print('The result is', result, 'which has the data type', type(result))

In [None]:
a = '2'
b = 4
result = a * b
print('The result is', result, 'which has the data type', type(result))

Either of these examples may produce our desired functionality, depending on what it is we're trying to do. The take-home message here is that we need to be careful about keeping track of data types throughout our code. Especially when the code consists of hundreds of lines, it can be very time-consuming to find and correct logical errors.
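
When a number arrives as a string (for example, after being read from a text file), the built-in `int()` and `float()` functions convert it to a numeric type. A minimal sketch, reusing the values from the cells above; the name `a_number` is illustrative:

```python
a = '2'    # a string, not a number
b = 4

# Multiplying a string by an integer repeats the string
print(a * b)                 # 2222

# Convert to int first to get numeric multiplication
a_number = int(a)
result = a_number * b
print('The result is', result, 'which has the data type', type(result))
```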

### Sequence data - lists

So far we have seen data types that refer to a single object in computer memory (i.e. one number or one string). A **list** is a more complex data type that can hold sequences of objects:

In [None]:
my_list = list([20, 40, 60, 80, 100])
print(my_list)

Accessing individual element(s) of a list is called **indexing**. Python is 0-indexed. This means that the first element of a list is in position 0, the second element is in position 1, and so on. The syntax for list indexing involves a number in square brackets after the variable name:

In [None]:
first = my_list[0]
second = my_list[1]
print('The first element is', first)
print('The second element is', second)

Python will raise an error if you attempt to index a position that does not exist, i.e. a number equal to or larger than the length of the list:

In [None]:
tenth = my_list[9]

Indexing using negative numbers will count backwards from the end of the list. The element in position -1 is the last element.

In [None]:
print('The second to last element is', my_list[-2])
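
The built-in `len()` function returns the number of elements in a list, which tells us which positions are valid. A short sketch, assuming the same five-element list defined above (it is redefined here so the cell runs on its own):

```python
my_list = [20, 40, 60, 80, 100]

n = len(my_list)
print('The list has', n, 'elements')

# Valid positive positions run from 0 to n - 1
print('Last element:', my_list[n - 1])     # Last element: 100

# Valid negative positions run from -1 back to -n
print('First element:', my_list[-n])       # First element: 20
```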

You can select a range of elements (also called a slice) using a `:` between two numbers. The slice between two numbers $a$ and $b$ will select all elements starting in position $a$ up to **but not including** $b$. For example:

In [None]:
my_slice = my_list[2:4]
print(my_slice)

We can also index using variables that have numeric values instead of hard-coded numbers...

In [None]:
a = 2
b = 4
slice2 = my_list[a:b]
print(slice2)

...but only if the variables have the `int` data type.

In [None]:
x = 2
y = 4.0
slice3 = my_list[x:y]
print(slice3)

A more common way to construct a list is to simply use square brackets at variable assignment. Also note that the elements of a list do not necessarily need to be the same data type.

In [None]:
list2 = [49, 'cat', -3.5, 'dog', False]
print(type(list2))
print(type(list2[1]))
print(type(list2[2]))

## Methods

Python has a number of useful built-in [list methods](https://www.w3schools.com/python/python_ref_list.asp). We won't go over all of them now, but here are some brief examples:

In [None]:
list_of_lists = [my_list, list2]
print(list_of_lists)
print('list_of_lists has data type', type(list_of_lists))
print('first element of list_of_lists has data type', type(list_of_lists[0]))
print('first element of first element of list_of_lists has data type', type(list_of_lists[0][0]))

list3 = my_list + list2
print('list3:', list3)

letters = ['a', 'b', 'c']
list3.extend(letters)
print('list3:', list3)

list3.remove('cat')
print('list3:', list3)

list3.insert(3, 'giraffe')
print('list3:', list3)

You may have noticed something strange in the code block above. Previously whenever we saw a `.` between two words it indicated a package name before the period and a function name after the period (like `math.sqrt`). This time the object before the period (`list3`) is a variable that we have defined, not a package. What's going on here?

This is the way we access built-in methods. Different data types in Python are associated with certain methods specific to the data type (e.g. the available list methods are different from available [string methods](https://www.w3schools.com/python/python_strings_methods.asp)). When we define a variable in memory, it automatically comes with the methods associated with its data type. You will get an error message if you try to use a method that isn't associated with a particular data type:

In [None]:
test1 = [1, 2, 3]
test2 = ['a', 'b', 'c']
test3 = 'this is a string'

# Use the extend list method
test1.extend(test2)
print(test1)

# Try the same list method from a string
test3.extend(test2)
print(test3)

## Writing readable code - whitespace and comments

A quick note before we continue: you may have noticed various blank lines and whitespace between characters in the code blocks above. For the most part, Python ignores blank lines and extra spaces within a line, which makes it easier for us to organize our code. Readable code is especially important when working in a collaborative environment. Take a look at the following example:

In [None]:
a          =   4
b=7




c =                         3

sum               = a            +b+    c
print   (    'sum:',         sum   ) 

The code runs, but you will drive your colleagues (and yourself) crazy if you write code like this. Especially avoid extra whitespace when using functions, like the `print` function in the example above. The official [Python Style Guidelines](https://www.python.org/dev/peps/pep-0008/) are fairly extensive and cover many topics that we don't need to consider quite yet. In general, keep things consistent and succinct:

In [None]:
a = 2.5
b = 7
c = -5.0
d = 2

result = a*b + c*d
print(result)

It's also good practice to add inline comments to your code. You can do this using the `#` character. Any text following this character on the same line is ignored. These comments will remind you of the purpose of your lines of code when you return to a project after not working on it for a while. Too few comments can leave you confused and frustrated, while too many comments can become distracting. As you code you will learn to strike a balance between the two extremes.

In [None]:
# Define constants
a = 2.5
b = 7
c = -5.0  # need float here because of reasons
d = 2

# Calculate result using formula from Singh et al. (2018)
result = a*b + c*d
print(result)

You can write a multiline comment by enclosing it in triple double-quotes. This is sometimes called a docstring. It's often included at the beginning of a large code block to describe the purpose of the code.

In [None]:
"""
The purpose of this code is to demonstrate a docstring.
In the real world, you might describe the various functions, 
variables, etc. that are important for running the code.

Remember that whatever you put in here will be ignored by the compiler.
"""

# Define constants
a = 2.5
b = 7
c = -5.0  # need float here because of reasons
d = 2

# Calculate result using formula from Singh et al. (2018)
result = a*b + c*d
print(result)

For all labs in this course, practice writing consistent, readable, well-documented code. It is much easier to learn good habits from the beginning than have to unlearn bad habits down the road. Your instructor will provide feedback on your coding style in addition to the functional abilities of your code.

## String manipulation and regular expressions

We work with strings (alphanumeric characters) extensively in data analysis tasks. Names of data columns, axis labels on figures, filenames and directories, and more are all stored as strings in Python. Sometimes we can make an analysis task easier by interacting with or manipulating one or more strings.

Python has many built-in [string methods](https://www.w3schools.com/python/python_ref_string.asp) that we can use for this purpose. Note that strings in python are **immutable**: once we define a string and save it as a variable in memory, we can index the characters just like elements of a list, but the characters of that original string cannot be altered:

In [None]:
my_string = 'testing'
print(my_string[0])
my_string[0] = 'r'  # this line will raise an error

Instead, we need to save the result of our manipulations as a new variable:

In [None]:
my_string = 'testing'
string2 = my_string.replace('t', 'r', 1)
print(my_string, string2)

combined = my_string + string2
print(combined)

combined2 = my_string + ' and ' + string2
print(combined2)

In most analysis tasks we will have to interact with data, which is often stored in files on our local machines or in Google Drive. In this case we often have to deal with filenames that include a long directory path, when all we are actually interested in is the filename itself:

In [None]:
long_fname = '/very/long/path/to/data/files/station1234_data.csv'

One useful method in this scenario is the `split` method, which divides up a string based on a character we specify (and note here that we need to put that character in quotes - it is its own string, after all!):

In [None]:
split_string = long_fname.split('/')
print(split_string)
print(type(split_string))

In this case we know the filename is always the last component of `split_string`, and we can access the last element of the list by indexing on -1:

In [None]:
short_fname = split_string[-1]
print(short_fname)
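
Splitting on `/` works for the path above. As an aside, the standard library's `os.path.basename()` function performs the same extraction in one step; the sketch below is an alternative, not the approach this lab relies on.

```python
import os

long_fname = '/very/long/path/to/data/files/station1234_data.csv'

# basename() returns everything after the final path separator
short_fname = os.path.basename(long_fname)
print(short_fname)     # station1234_data.csv
```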

A more advanced string manipulation topic involves the concept of [regular expressions](https://en.wikipedia.org/wiki/Regular_expression), or regex. Regex provide a way for us to search and match patterns of characters instead of the characters themselves. Let's go back to our long filename for a moment:

In [None]:
print(long_fname)

A very common scenario in spatial data analysis is that you have the same dataset from multiple stations or locations, for example streamflow data at streamgauges from multiple rivers over the same period of time. Perhaps you have a list of 100 filenames in the same format as `long_fname` above, and you need to extract **only** the station number from those long filenames. We already learned a way that we can get at the shorter filename, which we have saved as `short_fname`:

In [None]:
print(short_fname)

But there are some extraneous characters surrounding the station number. In this example with only a single station it would be no problem for us to write some code like `station_id = '1234'`. But we would be very sad if we had to do this for 99 other stations. 

Instead of knowing the exact station number, we can use regex to pull out only the numeric characters `0-9` from a string. The regular expression `re` library is a python standard library, but just like the `math` library it is not loaded automatically. We need to import it before we can use it:

In [None]:
import re

The function we will use here is called `re.findall()`. The function takes two arguments. First we need to specify the pattern we want to match, and then we need to specify the string in which we want to search for that pattern:

In [None]:
# The r prefix makes this a raw string, so Python passes the backslash to re unchanged
station_id = re.findall(r'\d+', long_fname)
print(station_id)

Let's dissect this strange-looking string that is the first argument in the function above:
* `\d` is a special regex character that matches a digit `0-9`
* `+` is called a **greedy** operator. When looking for pattern matches, it will find the first matching character (in this case a digit) and include all subsequent characters until it hits a non-matching character (in this case the underscore character after the four digits in the station ID).
* Also note that we enclosed the regular expression inside quotes - it is a string, after all.
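
To see what the `+` adds, compare the two patterns below on the short filename. This is a minimal sketch; the `r` in front of each pattern marks a raw string, which tells Python to pass the backslash through to `re` unchanged.

```python
import re

short_fname = 'station1234_data.csv'

# \d on its own matches one digit at a time
single_digits = re.findall(r'\d', short_fname)
print(single_digits)   # ['1', '2', '3', '4']

# \d+ groups consecutive digits into a single match
digit_runs = re.findall(r'\d+', short_fname)
print(digit_runs)      # ['1234']
```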



Note that as the name implies, the `re.findall` function will return all instances of a matching pattern in the string in a list. In our example above, the function returned a list with one element. If there are no matching sequences in the string, `re.findall` will return an empty list:

In [None]:
# Raw string (r prefix) so the backslash reaches re unchanged
test = re.findall(r'\d{5}', long_fname)
print(test)
print(type(test))

In this example, `\d` still specifies a digit, while `{5}` tells the function to search for exactly 5 digits in a row. Since there are no sequences of 5 digits in `long_fname` we get an empty list.

### Some helpful tables

Here is a table summarizing some commonly used patterns in regular expressions (for a more extensive list visit [regex101.com](http://www.regex101.com)):

Token        | Description 
-------------|-------------
`'[abc]'`      | A single character of `a`, `b`, or `c` 
`'[^abc]'`     | A single character except `a`, `b`, or `c` 
`'[a-zA-Z]'`   | A single character in the range `a-z` or `A-Z` 
`'.'`          | Any single character
`'\s'`         | Any whitespace character (space, tab, etc.)
`'\S'`         | Any non-whitespace character
`'\d'`         | Any digit
`'\D'`         | Any non-digit
`'\w'`         | Any word character (inc. underscore) - same as `'[a-zA-Z0-9_]'`
`'\W'`         | Any non-word character


Count and positional tokens:

Token        | Description 
-------------|-------------
`a?`         | Zero or one of `a`
`a*`         | Zero or more of `a`
`a+`         | One or more of `a`
`a{3}`       | Exactly 3 of `a`
`a{3,}`      | 3 or more of `a`
`a{3,6}`     | Between 3 and 6 of `a`
`^`          | The start of a line
`$`          | The end of a line



You'll notice above that some characters like `\`, `.`, and `^` provide special instructions to the `re.findall` function. If you want to search a string for those characters specifically, you need to insert a backslash before those characters. This is called **escaping** the character. For example, if you wanted to search a string for the backslash character you would need to pass `r'\\'` in the pattern argument. The `r` prefix marks a raw string, which stops Python from interpreting the backslashes itself before `re` sees them.
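
A short sketch of escaping, using the period as the special character (the example string is the short filename from earlier). The patterns are written as raw strings, with an `r` before the opening quote, so that Python hands the backslash to `re` unchanged.

```python
import re

short_fname = 'station1234_data.csv'

# Unescaped, '.' matches any single character, so every character is returned
any_char = re.findall(r'.', short_fname)
print(len(any_char))   # 20

# Escaped, '\.' matches only a literal period
periods = re.findall(r'\.', short_fname)
print(periods)         # ['.']
```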

Let's take a look at a few examples:

In [None]:
string = 'The Mississippi River datafile is in /data/rivers/station123456_data.csv'

# Patterns are written as raw strings (r'...') so backslashes reach re unchanged

# A '/' followed by 4 word characters
test1 = re.findall(r'/\w{4}', string)
print(test1)

# Any capital letter and the character immediately following it
test2 = re.findall(r'[A-Z].', string)
print(test2)

# All sequences of 1 or 2 consecutive digits
test3 = re.findall(r'\d{1,2}', string)
print(test3)

# The word 'data' with any preceding or following punctuation
# Note that \W does not pick up an underscore
test4 = re.findall(r'\W?data\W?', string)
print(test4)

### Exercise 3

Complete the code block below by using regular expressions and the `re.findall` function.

In [None]:
string = 'Today in Math Class we learned that 0.9999... = 1.'

# Task 1: Find all sequences of characters following a space.


# Task 2: Find all sequences of more than one non-word character.


# Task 3: Find all English words (i.e. all sequences of letters only)


# Task 4: Find all capitalized English words.


# Task 5: Find all sequences of any character followed by a period.


## Functions

So far we have seen several examples of built-in functions that come with Python: `type()`, `print()`, `math.sqrt()`, and `re.findall()`. Some functions take one or more **arguments** - these are the variables inside the parentheses after the function name. For example, `re.findall` takes two arguments: 1) the regex pattern to match and 2) the string in which to search for the pattern.

We can also write our own functions. Let's break down a simple example:

In [None]:
def doubler(x):
    """
    This function doubles any number.

    Arguments
    =========
    x: The number to double

    Returns: int or float (same type as x)
    """
    
    return 2*x



Some important components of this function:
* `def` - this is the necessary syntax to inform the python interpreter that you are defining a function.
* `doubler(x):` - the name of the function, the arguments that the function will accept, and a colon. 
    * Naming functions is like naming any other variable - keep it descriptive but concise. The same rules apply (e.g. can include digits but cannot start with them).
    * Separate multiple arguments with a comma followed by a space.
    * The colon after the closed parenthesis should be the last character before starting the next line.

* Everything after the first line is indented one level. Indentation levels indicate to the interpreter that particular lines of code are separated into logical blocks. All code that is part of the function **must** be indented. We will take a closer look at indentation levels in the next section. The convention in python is to use four spaces for each indentation level.

* A docstring in triple double-quotes at the beginning of the function. Admittedly, this one is probably overkill but serves as a demonstration. This docstring has a brief description of the purpose of the function, the arguments, and what we should expect the function to return.

* `return`: For functions that return something, this will be the last line of the function. Note that a function does not have to return anything.

Let's test it out:

In [None]:
test1 = doubler(5)
test2 = doubler(-1.1)
test3 = doubler('hello')

print(test1)
print(test2)
print(test3)

Not bad - the first two examples do what we expected. However, the docstring specifies that the function doubles any number. If we want to be strict about that aspect of the function, we need to find some way to prevent it from 'doubling' strings. This will hopefully help us avoid logical errors down the road. We will explore some possibilities to address this in the next section.
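
The reason `doubler('hello')` runs at all is that the `*` operator is also defined for sequences, where it means repetition rather than multiplication. A quick sketch of that behavior, separate from the function itself (the variable name `repeated` is illustrative):

```python
# For numbers, * multiplies
print(2 * 5)           # 10

# For strings, * repeats the characters
repeated = 2 * 'hello'
print(repeated)        # hellohello

# Lists are sequences too, so they are repeated in the same way
print(2 * [1, 2])      # [1, 2, 1, 2]
```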

### Exercise 4

Create a code block and write a function that takes two numbers and divides the first number by the second number. Include a docstring. Test your function a few times with different inputs.

## Boolean logic and conditional statements

Conditional statements in Python provide a way to perform certain actions if a condition is met. Python uses the boolean values `True` and `False` for the truth values of conditional statements. Essentially, this allows us to define some condition and then execute some code only if that condition is met. Before we get into conditional statements we need to take a look at **comparative** and **logical** operators.

### Comparative operators

In [None]:
10 > 1

In [None]:
10 < 1

Hopefully these are intuitive. With the `>` and `<` operators, python compares the two values on either side of the operator and returns a value of either `True` or `False`. We can use `>=` and `<=` to similar effect.



In [None]:
print(5 > 5)
print(5 >= 5)

We can use the `==` (equal) and `!=` (not equal) operators to check if two values are equal or not equal. **Note that `==` is different from `=` that we have been using this whole time for variable assignment!**

In [None]:
x = 1 + 1
x == 2

In [None]:
x = 4
y = 5
x != y

### Logical operators

We can combine two comparative operators with the logical operators `and`, `or`, and `not`. The `and` operator will return `True` if both comparisons are true.

In [None]:
print(3 > 2 and 10 > 9)
print(3 > 2 and 10 < 9)
print(3 < 2 and 10 > 9)
print(3 < 2 and 10 < 9)

The `or` operator will return `True` if at least one comparison is true.

In [None]:
print(3 > 2 or 10 > 9)
print(3 > 2 or 10 < 9)
print(3 < 2 or 10 > 9)
print(3 < 2 or 10 < 9)

The `not` operator will reverse the result, i.e. returns `True` if the comparison would normally return `False`.

In [None]:
print(not(3 > 100))
print(not(3 > 2 and 10 < 9))
print(not(3 > 2 or 10 < 9))

### Conditional statements

Now that we know about comparative and logical operators, we can use them as part of conditional statements. Here is an example to get us started:

In [None]:
x = 50

# Conditional statement (evaluates to True or False)
if x > 100: 
    # Run this code if True
    print('x is pretty big!')
else:
    # Run this code if False
    print('x is not so big...')


Let's dissect the example, which is sometimes called an if/else statement:
* We start with the word `if`. This might be the most commonly used conditional statement in python. 
* Immediately after `if` we have some sort of comparative or logical statement that can be evaluated to either `True` or `False`. This statement ends with a colon.
* The line after the colon is indented 4 spaces. This is code that will run only if the statement evaluates to `True`. We can add more lines of code in this section as long as they are also indented the same 4 spaces.
* Next we have the word `else` followed by a colon, and some more indented code underneath. This allows  us to specify code to run if the logical statement evaluates to `False`. Note that it is not necessary to have an `else` component with every `if` statement.
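
As a small sketch of the last point, here is an `if` statement with no `else` component. When the condition is `False`, the indented code is skipped and the program simply carries on. The variable `message` is illustrative.

```python
x = 50
message = 'x was checked'

if x > 100:
    # Skipped here, because 50 > 100 is False
    message = 'x is pretty big!'

# This line is not indented, so it runs regardless of the condition
print(message)         # x was checked
```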

We can chain multiple if/else statements together using the `elif` keyword:

In [None]:
x = 5

# Conditional statement (evaluates to True or False)
if x > 100: 
    # Run this code only if first statement is True.
    print('x is pretty big!')
# If first condition is false, move on to test this statement.
elif x > 10 and x <= 100:
    # Run this code only if second statement is True.
    print('x is not so big...')
else:
    # Run this code if both conditions are False.
    print('x is tiny!')
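
Python tests the branches from top to bottom and runs only the first one whose condition is `True`. The sketch below wraps the same logic in a function so we can try several values. The function name `describe` is hypothetical, and it returns the message instead of printing it. It also uses Python's chained comparison `10 < x <= 100`, which means the same thing as `x > 10 and x <= 100`.

```python
def describe(x):
    # Branches are tested top to bottom; only the first True branch runs
    if x > 100:
        return 'x is pretty big!'
    elif 10 < x <= 100:  # same as: x > 10 and x <= 100
        return 'x is not so big...'
    else:
        return 'x is tiny!'

print(describe(500))
print(describe(50))
print(describe(5))
```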

Let's expand our example by adding some additional logic levels:

In [None]:
x = '50'

# First conditional statement
if type(x) == int or type(x) == float:
    # This code will only run if first condition is True
    # Second conditional statement
    if x > 100: 
        # Run this code if True
        print('x is pretty big!')
    else:
        # Run this code if False
        print('x is not so big...')
else:
    # Run this code if first condition is False
    print('x is not a number!')

Different levels of indentation are also helpful to draw our eyes to the logical structure of the code. We can take this another step further by wrapping our code inside a function (thereby adding another indentation level):

In [None]:
def is_big_number(x):
    """
    A function that tests whether x is a number, 
    and if so, whether it is a big number.

    Arguments
    =========
    - x: variable to test

    Returns: none
    """
    # First conditional statement
    if type(x) == int or type(x) == float:
        # This code will only run if first condition is True
        # Second conditional statement
        if x > 100: 
            # Run this code if True
            print(x, 'is pretty big!')
        else:
            # Run this code if False
            print(x, 'is not so big...')
    else:
        # Run this code if first condition is False
        print('That\'s not a number!')

Remember that running this cell only defines our function and saves it into memory. In order to use it we need to call it and pass various arguments:

In [None]:
is_big_number(50)
is_big_number(500.1)
is_big_number('50')
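
As an aside, Python also has a built-in `isinstance()` function that is commonly used for this kind of type check. The sketch below is an alternative to comparing `type(x)` directly, not the approach used in the rest of this lab, and `is_number` is a hypothetical helper name. One difference to be aware of: `isinstance(True, int)` is `True`, because Python treats `bool` as a kind of `int`.

```python
def is_number(x):
    # isinstance accepts a tuple of types to test against
    return isinstance(x, (int, float))

print(is_number(50))     # True
print(is_number(500.1))  # True
print(is_number('50'))   # False
```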


Let's return to our `doubler` function from earlier, which had some unwanted behavior when we passed in a string instead of a numeric variable. We can use conditional statements to try to catch logical errors:

In [None]:
def doubler(x):
    """
    This function doubles any number.

    Arguments
    =========
    x: The number to double

    Returns: int or float (same type as x)
    """
    # Check for numeric type
    if not(type(x) == int or type(x) == float):
        print('Please pass a number to the doubler!')
    else: # If numeric
        return 2*x


var1 = doubler(9)
var2 = doubler('9')
print(var1)
print(var2)

Note that we were successful in not doubling a string variable, but the variable `var2` was still initialized into memory and currently holds the value `None`. This may or may not be the behavior we want, depending on the situation. We will revisit this concept in a future lab.
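
In the meantime, one simple way to detect this situation is to test for `None` with the `is` operator. The sketch below repeats a compact copy of `doubler` so that the cell runs on its own. The printed messages are made up for illustration.

```python
def doubler(x):
    # Compact copy of the function above so this cell runs on its own
    if not(type(x) == int or type(x) == float):
        print('Please pass a number to the doubler!')
    else:
        return 2*x

var2 = doubler('9')

# A function that finishes without reaching `return` gives back None
if var2 is None:
    print('Nothing was doubled, so var2 is None.')
else:
    print('The doubled value is', var2)
```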

### Exercise 5

Revisit the division function you wrote in an earlier exercise. Use conditional statements to make sure both arguments have numeric types before performing the division (otherwise print out an appropriate message). Also check to make sure you don't try to divide by 0. Test your function a few times with different values, including non-numbers and 0.

## Loops and list comprehension

We've already seen that lists can hold any number of elements. Sometimes we want to apply some function to every element of a list. We can do this using loops. Python has two types of loops: `for` loops and `while` loops.

### For loops

For loops are a fairly common way to programmatically access every element of a list in order. Let's take a look at some examples, starting fairly basic and gradually adding more of the concepts we have learned above.

In [None]:
for i in range(10):
    print(i)

Let's dissect this example:
* The first line consists of the `for` keyword, a variable called `i`, the word `in`, and a new function called `range()`.
    * `for` is necessary to indicate a loop to the interpreter.
    * The variable following `for` can have any name. `i` is often used as a placeholder variable. Feel free to use a different name that is more descriptive if the situation calls for it.
    * `in` always separates the variable from some sort of iterable object (something Python can step through one element at a time). For now, we will stick to the `range` function and predefined lists.
        * The syntax for `range()` is `range(start, stop, step)`. This function produces numbers starting at `start` and moving up to **but not including** `stop`, in increments of `step`. If only a single number is given, the range will be from 0 up to (but not including) that number, incrementing by 1. If two numbers are provided, the range will run from the first up to (but not including) the second, still with a step size of 1.

The first line ends with a colon and the next line is indented 4 spaces. This should look familiar by now.
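
To see the three forms of `range()` described above without writing a loop, we can wrap each one in `list()`, which collects the numbers into a list so they print all at once. The particular numbers here are arbitrary examples.

```python
print(list(range(5)))         # one number: 0 up to (not including) 5
print(list(range(2, 8)))      # two numbers: start and stop
print(list(range(0, 10, 3)))  # three numbers: start, stop, and step
```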

Here is another example using a list and some more descriptive variables:

In [None]:
farm = ['cow', 'pig', 'llama', 'horse', 'chicken']
for animal in farm:
    print(animal)

We are not limited to just printing numbers. We can apply functions we have already created:

In [None]:
for i in range(40, 150, 50):
    is_big_number(i)

It is also very common to combine loops with conditional statements. Can you follow the logic with the modulo operator to see what's going on here? (See the Mathematical Operations section for a modulo refresher.)

In [None]:
for i in range(10):
    if i == 0:
        print(i, 'is zero, which counts as even.')
    elif not i%2:
        print(i, 'is even.')
    else:
        print(i, 'is odd.')
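
The `elif not i%2` line relies on Python treating the number 0 as `False` and any other number as `True` when it appears where a condition is expected. The short sketch below prints each number, its remainder after dividing by 2, and two equivalent ways of testing whether that remainder is zero. The variable name `remainder` is introduced here only for illustration.

```python
for i in range(1, 5):
    remainder = i % 2
    # `not remainder` is True exactly when the remainder is 0
    print(i, remainder, not remainder, remainder == 0)
```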

What about functions other than just printing? It is easy to imagine a situation where we have lists containing data (numbers) that we want to manipulate in some way. How would we double every number in a list? Where do we store the new doubled numbers?

In [None]:
data = [1, 2, 3, 5, 7, 10, 20]
print(data*2) # Not what we want! 

for i in data:
    print(i*2) # Right idea, but we need to store, not print

new_data = [] # Create an empty list
for i in data:
    new_data.append(i*2)

# After the for loop, print out new_data
print(new_data)


### List comprehension

Above, we created an empty list and filled in the values one at a time using the `append` list method. This works, but is not very efficient. Python provides a tool called **list comprehension** as a more compact, and typically faster, route to apply a function to every element of a list:

In [None]:
data = [1, 2, 3, 5, 7, 10, 20]
doubled_data = [i*2 for i in data]
print(doubled_data)

List comprehension is like a more compact version of a `for` loop. Take a look at the syntax:
* Start with an opening square bracket as the list constructor.
* The first thing after the bracket is the expression or function we want to apply to each element (in this case multiplying each element by 2).
    * In a `for` loop, this would be the indented second line after the colon on the first line.
* After our expression, we add what is essentially the first line of a `for` loop, without the colon.
* Don't forget the closing square bracket to complete the list.
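
To make the correspondence concrete, the sketch below writes the same operation both ways, reusing the `farm` list from earlier. It assumes the string method `upper()`, which returns an upper-case copy of a string. The names `loud_farm` and `loud_farm_lc` are made up for illustration.

```python
farm = ['cow', 'pig', 'llama', 'horse', 'chicken']

# for loop version: create an empty list, then append one element at a time
loud_farm = []
for animal in farm:
    loud_farm.append(animal.upper())

# List comprehension version: the expression first, then the `for` part
loud_farm_lc = [animal.upper() for animal in farm]

print(loud_farm_lc)
print(loud_farm == loud_farm_lc)  # both approaches give the same list
```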

List comprehension can handle more complex functions:

In [None]:
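import math
import re

def my_func(x):
    if type(x) == str:
        # Strings: return the first run of non-digit characters
        return re.findall(r'\D+', x)[0]
    elif x > 0:
        # Positive numbers
        return round(math.sqrt(math.exp(3**-x)), 4)
    else:
        # Zero and negative numbers
        return round(math.sqrt(-math.sin(x*math.pi/16)), 4)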

data = [-3, -2, -1, 1, 5, 10, 'abc123', 'xyz456']
result = [my_func(x) for x in data]
print(result)
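
The `[0]` in `my_func` is there because `re.findall` always returns a list of matches, even when there is only one match. A minimal sketch, using one of the strings from `data` and the same `\D+` pattern (one or more non-digit characters):

```python
import re

matches = re.findall(r'\D+', 'abc123')
print(matches)     # a list containing one string
print(matches[0])  # the string itself
```

Keep this difference in mind whenever you want a list of strings rather than a list of lists.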

### Exercise 6

Part 1: In the code block below, use `re.findall` and regular expressions inside a list comprehension to extract only the station numbers from the list of filenames. Note that the station numbers are not uniform in the number of digits they contain. Make sure you get a list of strings, not a list of lists of strings.

In [None]:
filenames = ['path/to/files/station1234.csv',
             'path/to/files/station5033.csv',
             'path/to/files/station11710.csv',
             'path/to/files/station8406098.csv',
             'path/to/files/station496.csv'
             ]



Part 2: Write some code that does the following:
* If a number is divisible by 3, print the number and the word 'Fizz'.
* If a number is divisible by 5, print the number and the word 'Buzz'.
* If a number is divisible by both 3 and 5, print the number and the word 'FizzBuzz'.

Apply your code to the numbers from 1 to 100, counting by 2.


<a name="helpful_cell"></a>
## Where to go for help

If you are having trouble and getting errors when you try to run your code, it is highly likely that someone else has already encountered the same issue. It's important to know that even professional software developers do not know every detail of every function they use off the top of their heads. Often at the end of a coding session you will end up with dozens of tabs in your internet browser as a result of trying to fix errors.

Here are some reputable places to look for help. Check these before you ask your instructor for help, because these websites are the first places your instructor will visit when trying to solve the problem.

* When using Google, it usually helps to begin your search with 'python'. 
    * If you are getting a specific error message, include that too, e.g. 'python TypeError'.
    * You can also use more general search terms like 'python import package'

* Often the first Google results are from Stack Overflow, which is part of the Stack Exchange network. This is a public Q&A site that usually provides good advice.

* You can always check the official python [documentation](https://docs.python.org/3/) and [style guide](https://www.python.org/dev/peps/pep-0008/).

* You may also start to notice a few learning/tutorial websites popping up in your Google searches. [W3 Schools](https://www.w3schools.com/python/) is generally a good one, and [Programiz](https://www.programiz.com/python-programming) may be useful if you prefer video clips as part of the explanation.

* [regex101](http://www.regex101.com) is useful for help and practice with regular expressions.