# Week 1 Review

This notebook contains a review of most topics covered in Week 1 of Create with Code. We will go through a portion of them together based on the "escape room" activity you just completed (yay!), but this notebook will be available to you so that you can go over whatever material you want on your own, as well.

# Computational Thinking

Computational thinking refers to the thought processes involved in expressing solutions as computational steps, or **algorithms**, that can be carried out by a computer.

* Computers take instructions **literally**, so we need to be thorough and **exact**
* Computers lack the ability to read between the lines
* When we are not literal enough, we have to adjust our logic so that the computer does what we want -- this is called *debugging*

# Terminal Commands

The terminal/command prompt has many commands. The most important one for us will be the `ls` command (`!ls` in the code cell).

|Command     |Linux/Mac | Windows|
|------------|----------|--------|
|Display current directory| `pwd` | `cd`|
|Change directory| `cd path`| `cd path`|
|List contents of directory| `ls`| `dir`|
|Copy a file/directory| `cp`| `copy`|
|Move a file/directory| `mv` | `move`|
|Rename a file/directory| `mv`| `ren`|
|Delete a file/directory| `rm` | `del`|
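
Some of the information in the table can also be obtained from inside Python. The following is a minimal sketch, assuming the standard-library `os` module is used as a rough stand-in for `pwd` and `ls`; the variable names are chosen only for illustration.

```python
import os

# Rough Python equivalents of two terminal commands
current_dir = os.getcwd()            # similar to `pwd`
contents = os.listdir(current_dir)   # similar to `ls` (hidden files are included)

print(current_dir)
print(contents)
```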

# Kernels

You can think of a *kernel* as a separate Python session. We can assign a variable in one cell and then access it in a separate cell. The cells are a nice way to divide our code up visually, but everything is connected underneath.

If we delete a cell where a value is assigned to a variable, that value will still be valid until the variable is reassigned or the kernel is reset.

To restart the kernel:

1.   Select the "Runtime" option
2.   Select "Restart Runtime"

# Markdown

*Markdown* is a writing format that makes it easy to type formatted **text**. Cells in Colab can be used for code or to write text in markdown, allowing us to write notes that explain what we are doing.

To create a markdown cell, click on the `+ Text` option in the header.

There are several typesetting options in markdown:

*   For italic text, wrap the text in \* : `*this will be italic*`

*   For bold text, wrap the text in \*\* : `**this will be bold**`

*   You can create a list by starting each line with \* : `* item 1`, `* item 2`

# Data and Types

* Information is represented on computers as a collection of binary digits (0 or 1) known as **bits**
* For example, a collection of 8 bits (known as a byte) can represent $2^8 = 256$ different values
* Most of the data being used by a computer processor is stored in RAM, and modern laptops have around 4-8 GB (32 - 64 billion bits) of RAM
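
The arithmetic in the bullets above can be checked directly in Python. This is a small sketch that assumes 1 GB is taken as one billion bytes; the variable names are illustrative.

```python
# Number of distinct values that one byte (8 bits) can represent
bits_per_byte = 8
values_per_byte = 2 ** bits_per_byte
print(values_per_byte)    # 256

# Approximate number of bits in 4 GB, taking 1 GB as one billion bytes
bytes_in_4_gb = 4 * 10**9
bits_in_4_gb = bytes_in_4_gb * bits_per_byte
print(bits_in_4_gb)       # 32000000000
```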

Instead of dealing with bits and bytes directly, we create *abstractions* of our data. 

In Python, we can find out what type a variable is by using the `type` command. For example:

In [None]:
a = 5
type(a)

In [None]:
a = 2.0
type(a)

In [None]:
a = "Word."
type(a)

The types `int`, `float` and `str` are examples of *primitive*, or built-in, data types.

# Algorithms

An algorithm is a **set of instructions** necessary to complete a task. In order to tell a computer to do something, we must break it down into simpler tasks. This is where designing an algorithm comes into play.

*   An algorithm is **not necessarily unique** -- there may be several ways to accomplish the same task (like driving to school or making a pb&j)
*   Some algorithms may be much better than others in terms of **efficiency** -- this can be the difference between code taking an hour to run or taking 10 seconds to run

Designing and implementing algorithms is the key part of scientific computing (and programming in general).
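
As a rough illustration of two algorithms for the same task, the sketch below adds the integers from 1 to `n` in two ways. The task and the variable names are assumptions chosen for illustration; both approaches give the same answer, but the second needs only a fixed number of operations no matter how large `n` is.

```python
n = 1_000_000

# Algorithm 1: add the integers one at a time (about n additions)
total_by_adding = sum(range(1, n + 1))

# Algorithm 2: use the closed-form formula n(n + 1)/2
total_by_formula = n * (n + 1) // 2

print(total_by_adding == total_by_formula)   # True
```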

# High Level Programming Languages, Compilers, and Interpreters

Python is a **high level programming language**. What does this mean?

Any software that runs on a computer is executed by that computer's central processing unit (CPU). The language that CPUs use is called *machine code*. Machine code supports only the **most basic types and operations**. This is called a *low level* programming language. Low level programming is very difficult and time consuming to do.

When programming in a high level language, we create one or more text files called *source code*. Python script is an example of source code. In order for source code to be executed, it must be translated into equivalent low level machine code. The two most common ways to do this are through *compilers* or *interpreters*.

A **compiler** takes the entire source code and translates it into a low level program known as an executable.

An **interpreter** proceeds line by line, translating one line, executing it and then moving to the next line. Python is an interpreted language.

Many other high level programming languages exist. For example:

* Java
* C/C++
* C#
* Matlab
* Fortran

Each language has its own **syntax** and semantics, but the core ideas of programming in high level languages do not change.

# Object Oriented Programming, Objects, and Classes

Python is an **object oriented programming language**. We can think of the concept of OOP as being based upon modeling data and operations as paired (rather than separate) elements.

A *class* represents a **type of data** that supports certain operations. An *object* is a particular *instance* of a class. For example, you might create a class called `DigitalPicture`. If you have a picture of yourself in Venice, you might call this object `VenicePic`. `VenicePic` is an object of type `DigitalPicture`.

Internally, each *instance* is represented by one or more pieces of data called *attributes* (also called *data members*, *fields*, or *instance variables*). Each object of the same type has the same attributes, but they may have different values. 

For example, a digital picture might have an attribute called `height`, but not all pictures have the same height. 

**Operations supported by an object** are known as *methods*. The methods supported by an object are the **primary way of interacting with that object**. Collectively, attributes and methods of an instance are called its *members*. 

To visualize this, we can use diagrams known as Unified Modeling Language (UML) diagrams. Here is an example of a UML diagram for the `DigitalPicture` class:

![uml](https://github.com/ag12s/CreateWithCodeModules/blob/main/images/picture_uml.png?raw=true)
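
To make the vocabulary concrete, here is a minimal sketch of what a `DigitalPicture` class could look like in Python. The attributes `width` and `height` and the method `rotate` are hypothetical choices for illustration and are not necessarily the members shown in the diagram.

```python
class DigitalPicture:
    """A hypothetical picture class; its members are illustrative only."""

    def __init__(self, width, height):
        # Attributes: each instance stores its own values
        self.width = width
        self.height = height

    def rotate(self):
        # Method: swaps width and height, as a 90-degree rotation would
        self.width, self.height = self.height, self.width


VenicePic = DigitalPicture(800, 600)   # an object (instance) of the class
print(type(VenicePic).__name__)        # DigitalPicture
print(VenicePic.height)                # 600
VenicePic.rotate()
print(VenicePic.height)                # 800
```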
 

# Comments

In almost every programming language, the programmer can leave **comments**. Comments have **no effect on the code output**, but can provide information that may **assist** themselves or anyone else reading their code. 

Comments are incredibly important. Whether you are dealing with long or short programs, it can be very difficult to figure out what is going on without them. 


In Python, comments are denoted by the symbol `#`. When it is used in a line of code, the rest of the line is considered a comment and is **ignored by the Python interpreter**.

There are two common styles. The first is to have a comment on a line by itself to explain what is about to come:

In [None]:
# This is a comment!
a = "This is code!"

The second is an *inline* comment, placed to the right of an existing command:

In [None]:
a = "This is code!" # This is a comment!

# The List Class

The list class is a built-in Python class. Lists store items and are frequently used in Python.

To create an empty list:

In [None]:
newList = list()

This command demonstrates several key principles. 

*   `list()` creates a **new instance of the list class** by calling its *constructor*
*   The **constructor** is responsible for **allocating memory for the initial state** of the object
*   The parentheses following the word `list` indicate that this is an **action** (in this case, creating an empty list)   



The word `newList` is an *identifier*. We can choose almost anything as an identifier, but good programming practices suggest that we choose something **meaningful** and **clear**.

There are a few **rules** to remember when choosing identifiers:
* identifiers must consist of letters, digits or underscore characters
* identifiers cannot begin with a digit
* certain reserved words (e.g. `class`, `for`, `if`) have other meanings in Python and cannot be used as identifiers
* identifiers are all case sensitive (newList is not the same as NewList)
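
These rules can be checked programmatically. The sketch below is one possible way to do so, using the string method `isidentifier` and the standard-library `keyword` module; the candidate names are made up for illustration.

```python
import keyword

candidates = ["newList", "NewList", "new_list2", "2newList", "new-list", "class"]

results = {}
for name in candidates:
    # isidentifier checks the character rules; iskeyword checks reserved words
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, results[name])
```

Only the first three candidates are usable: `2newList` begins with a digit, `new-list` contains a hyphen, and `class` is a reserved word.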


The entire command `newList = list()` is an **assignment statement**. We are assigning the object on the right side of the `=` operator to the identifier on the left side.

`newList` is an object of type `list`. We can see this by using the `type` command:

In [None]:
type(newList)

### Methods

#### Append

We can **add an item** to an existing list with `append()`.

The general syntax for calling a method belonging to an object is as follows:
**object**.**method**(**parameters**)

For example: `newList.append('item1')`

The leftmost identifier names the object that the method acts on. The word after the dot identifies the method to be used, and the value inside the parentheses is called a *parameter*.

In [None]:
newList.append('item1')
newList

Two possible sources of error:


1.   Forgetting to put quotations around a non-variable entry
2.   Passing fewer or more than one entry to `append` at a time



In [None]:
# This will not work because item1 is not a variable we have
newList.append(item1)

In [None]:
# This will not work because we are not passing in anything to append
newList.append()

The `append` method is an example of a *mutator*. Mutators change the internal state of an object. We can continue to add items to the list:

In [None]:
newList.append('item2')
newList.append('item3')

newList

#### Insert

Another method associated with lists is the `insert()` method. Sometimes, the order of a list is important and we don't simply want to add items to the end of the list. 

Suppose we're trying to compile a list of the largest cities in the US. We could start by creating an empty list as we did with `newList`, or we could start with an already populated (but in our case incomplete) list:

In [None]:
largest_cities = ["New York", "Chicago", "Seattle"]

This is an alternative way to construct a list. To **construct an empty list**, it would suffice to call `largest_cities = []`.

As before, we can append cities to the end:

In [None]:
largest_cities.append("Tallahassee")
largest_cities

You'll notice that we're missing Los Angeles in our list. If we append it to the end, our list would be out of order. Instead, we can insert it just after New York using the `insert` method:

In [None]:
largest_cities.insert(1,"Los Angeles")
largest_cities

The number 1 indicates the **position** where we want to insert "Los Angeles". **Python begins indexing at 0**, so the first item in the list is numbered 0, the second is numbered 1, and so on. 

In [None]:
largest_cities.insert(3, "Boston")
largest_cities

#### Remove

Another example of a mutator is the  `remove()` method. Returning to our first list `newList`, let's remove *item2* from the list:

In [None]:
newList.remove('item2')
newList

There are two things to keep in mind with `remove`:


1.   If you have multiples of an item in a list, it will only remove the first one
2.   If you try to remove an item that is not in the list, you will get a `ValueError`
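
Both points can be seen in a short sketch. The list `letters` is a made-up example, and the membership check with `in` is one possible way to avoid the `ValueError`.

```python
letters = ['a', 'b', 'a', 'c']

letters.remove('a')      # only the first 'a' is removed
print(letters)           # ['b', 'a', 'c']

# Checking membership first avoids the ValueError
if 'z' in letters:
    letters.remove('z')
else:
    print("'z' is not in the list")
```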



#### Return Values: Count, Index, and Pop

None of the methods discussed so far (`append`, `insert`, `remove`) provides a **return value**. They all execute an action and then return control back to the interpreter. Some methods that return a value are `count`, `index`, and `pop`.

The `count` method returns the **number of times an item occurs** in the list:

In [None]:
newList.count('item1')

Unlike the `remove` method, if we ask for the count of an object that isn't included in the list, we do not get an error.

In [None]:
newList.count('item2')

When we're working interactively we may want to just display the return value. If we wish to save this information for later use, however, we can do so by using the **assignment operator**:

In [None]:
numItem1 = newList.count('item1')
numItem1

Note that if we later add more of *item1* to the list, the value of numItem1 does not change:

In [None]:
newList.append('item1')
newList.append('item1')
newList

In [None]:
numItem1

But, we can rerun the `count` method to update this value:

In [None]:
numItem1 = newList.count('item1')
numItem1

The `index` method returns the **first location of an object** in a list:

In [None]:
newList

In [None]:
newList.index('item1')

The `index` method returns an error if the object is not in the list:

In [None]:
newList.index('item2')

The `pop` method is a **mutator that also returns a value**:

In [None]:
newList

In [None]:
newList.pop()

In [None]:
newList

The `pop()` method **removes the last object from the list** and **returns it** to the caller. 

`pop` can also take an **index as a parameter**. Then, `pop` removes the object at that index and returns that object to the caller.

In [None]:
newList.pop(0)

In [None]:
newList

### Indexing

To **return a certain entry** from our list, we can pass its index through brackets: `listname[index]`

In [None]:
largest_cities

In [None]:
largest_cities[0]

In [None]:
largest_cities[3]

List indexing can be used to **replace existing items**. Let's say we want to replace "Chicago" with "Jacksonville":

In [None]:
largest_cities[2] = 'Jacksonville'
largest_cities

Python also supports **negative indices**, which are interpreted as counting backward from the end of the list. Thus, the index -1 denotes the last item on the list, -2 the second to last, and so on.

In [None]:
largest_cities[-1]

In [None]:
largest_cities[-2]

Lists also support *slicing*, i.e. **extracting a sublist**. 

Rather than a single index, a slice is defined by a pair of indices separated by a colon. The slice starts at the first of the two indices and continues up to *but not including* the second index. For example:

In [None]:
largest_cities[2:5]

Slicing can also support **step sizes** by inputting a third argument in the slice syntax. For example, with a step size of $k$, the slice begins at the first index, takes each $k$-th item in the list and continues so long as it does not reach or pass the second index.

In [None]:
largest_cities[::2]

# Strings

*Immutable* objects **cannot be modified** after they are created, but can be implemented **more efficiently** than mutable objects. The `str` class ("string") is an example of an immutable object.

The string class is designed to store a sequence of characters and supports many methods that make sense only when dealing with characters.

In [None]:
a = "this is a string"
type(a)

In [None]:
a.capitalize()

Because strings are immutable, calling `a.capitalize()` **returns a new string** but does not change `a`.

In [None]:
a

To initialize strings, we can use either **single quotes** or **double quotes**:

In [None]:
a = 'this is a string'
a

In [None]:
b = "this is a string"
b

A string written with single or double quotes cannot span multiple lines, but we can introduce **new lines** using the symbol `\n`:

In [None]:
a ="this is a string\non multiple lines"
print(a)

The **backslash** character is called an *escape character*. Escape characters allow us to **specify a character that cannot otherwise be expressed** naturally. For example, if we wish to include a quote in a string, we can use `\"`. 

In [None]:
a = "this is a \"quote\" "
print(a)

Strings use **zero-indexing**, just like lists. They also support many of the same **accessors**.

In [None]:
a = "this is a string"
len(a)

In [None]:
a[1]

However, because strings are **immutable**, we cannot do something like:

In [None]:
a[2] = "a"

Like lists, strings support **slicing**.

In [None]:
alphabet = "abcdefghijklmnopqrstuvwxyz"
alphabet[4:10]

To get all values until the **end of a string**, we leave the second entry blank:

In [None]:
alphabet[15:]

The same can be done from the **beginning of a string**:

In [None]:
alphabet[:15]

As with lists, string slicing can also support **step sizes** by inputting a third argument in the slice syntax. For example, with a step size of $k$, the slice begins at the first index, takes each $k$-th character, and continues so long as it does not reach or pass the second index (this also works for lists).

In short, the syntax is: `string[start:end:step size]`

In [None]:
alphabet[9:20:3]

The step size can also be **negative**.

In [None]:
alphabet[17:5:-3]

Taking a step size of `-1` **reverses** the string.

In [None]:
alphabet[::-1]

### Overloading

The `+` and `*` operators are **overloaded** for strings.

In [None]:
a = "This"
a = a + " is a string"
print(a)

In [None]:
print(a*3)

### Summary of String Methods

![string information](https://github.com/lukasbystricky/ISC-3313/blob/master/lectures/chapter2/images/string_information1.png?raw=true)

![string information](https://github.com/lukasbystricky/ISC-3313/blob/master/lectures/chapter2/images/string_information2.png?raw=true)

![new string](https://github.com/lukasbystricky/ISC-3313/blob/master/lectures/chapter2/images/string_new1.png?raw=true)

![new string](https://github.com/lukasbystricky/ISC-3313/blob/master/lectures/chapter2/images/string_new2.png?raw=true)

![string conversion](https://github.com/lukasbystricky/ISC-3313/blob/master/lectures/chapter2/images/string_convert.png?raw=true)

# Numeric Data and Booleans

To create very large or very small floating point numbers, we can use the `e` character to indicate **scientific notation**. For example:

    2e15
This is equivalent to $2\times 10^{15}$. Note that even though $2\times 10^{15}$ is **actually an integer**, Python **stores it as a float**.

In [None]:
a = 2e15
type(a)

Almost all numbers **cannot be stored with perfect precision** in digital form. 

For example, $\sqrt{2}$ and $\pi$ are both *irrational numbers* that require infinite digits to be represented exactly. Instead of doing this, we store irrational numbers as decimal numbers with a finite number of digits. 

This **floating-point representation** is standard in almost every programming language. Typically, this is done with around 16 digits of accuracy.


This is an important concept to understand, and can sometimes lead to **unexpected behavior**, even for numbers which should be perfectly represented with 16 digits.

In [None]:
a = 0.1
b = 0.2
print(a + b)

In [None]:
print(a + b - 0.3)

This is an example of *floating point error*. In scientific computing, it is important to remember that **no numbers are ever represented perfectly** and **small errors like this can occur**. If you are not careful, this can lead to serious problems down the road.

### Numeric Operations

When adding an `int` to another `int`, the result is an `int`. When adding a `float` to a `float` or an `int`, the result is a `float`. The same applies to subtraction or multiplication.

In [None]:
a = 5
b = 6.0
type(a + a)

In [None]:
type(a + b)

Division depends on what version of Python you are using. Python 3 does **true division**, meaning the result is probably exactly as you'd expect (we are using Python 3).

In [None]:
a = 5
b = 4
a/b

To do **powers** of numbers, Python employs the syntax "\*\*".

In [None]:
2**4

In [None]:
8**0.5 # Square root of 8

Python also supports the *modulus* operator (**mod** for short) with the symbol "%". This returns the **remainder** after division.

In [None]:
a = 5
b = 3
a%b

### Type Conversions

Converting one type to another is called *casting*. For example:

In [None]:
a = 4.9
print(int(a))

When casting a `float` to an `int` Python **truncates the decimal** part. To convert a `float` to the **nearest integer**, there is a function called `round`.

In [None]:
b = round(a)
print(b)
type(b)
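
The difference between truncating and rounding is easiest to see with a negative number. The values below are chosen only for illustration.

```python
x = -4.9
print(int(x))      # -4  (the decimal part is dropped, moving toward zero)
print(round(x))    # -5  (nearest integer)

# round also accepts a number of decimal places as a second argument
print(round(3.14159, 2))   # 3.14
```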

`int` and `float` can also be cast to and from **strings** (`str`):

In [None]:
a = "3.5"
b = float(a)
print(b)
type(b)

In [None]:
b = str(b)
print(b)
type(b)

### Boolean Expressions

A boolean is a value that is either `True` or `False`. Python has a built-in type, `bool`, for handling booleans.

Several operators and functions return booleans (e.g. the `in` operator for lists).

In [None]:
groceries = []
groceries.append("bread")

a = "bread" in groceries
a

In [None]:
type(a)

We can also **generate booleans** using the **greater than** or **less than** operators:

In [None]:
a = 6
a > 5

In [None]:
a <= 4

In [None]:
len("bacon") > 3

To test for **exact equality**, we can use the `==` operator.

In [None]:
a = 3
a == 3

This operator should be avoided with floating point numbers. Let's compute $\sqrt{2}$:

In [None]:
a = 2**0.5 #square root of 2

We know that $\sqrt{2}^2 = 2$, right?

In [None]:
print(a**2 == 2)

What happened here? 

Since we're dealing with floating point numbers, $\sqrt{2}$ is **not represented exactly**. We're only storing 16 digits of this irrational number. So when you square it, you're not getting exactly 2, but rather some number that is **very close** to 2.

In [None]:
print(a**2)

To test for equality in cases like this (any computation involving floating point numbers) it's necessary to see if the two numbers are "close enough", or within some very small **tolerance** of each other. 

In [None]:
tol = 1e-15
print(abs(a**2 - 2) < tol)

Here we've set the tolerance to $1\times 10^{-15}$. We then check that $|a^2 - 2|$ is less than this tolerance. 
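
As an alternative to writing the tolerance check by hand, the standard-library `math` module provides `isclose`. This is a brief sketch; the name `root_two` is chosen for illustration, and the default relative tolerance of `1e-09` is used.

```python
import math

root_two = 2**0.5

print(root_two**2 == 2)               # False
print(math.isclose(root_two**2, 2))   # True
print(math.isclose(0.1 + 0.2, 0.3))   # True
```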

There are three **logical operators** that can be **used with booleans**.

1) **`not`** - this returns the opposite of the boolean value

In [None]:
not len("bacon") > 3

2) **`and`** - this operator takes two booleans and returns `True` if they are both `True`, or `False` otherwise

In [None]:
len("bacon") > 3 and 5 > 1

In [None]:
len("bacon") > 3 and 1 > 5

3) **`or`** - this operator takes two booleans and returns `True` if at least one of them is `True`

In [None]:
len("bacon") > 3 or 1 > 5

Note that this is considered an *inclusive or*, meaning that if both booleans are `True` it returns `True`. Occasionally, we may want an *exclusive or*, where we want to know if exactly one of the booleans is `True`. This can be achieved by using the syntax `x != y`.

In [None]:
a = len("bacon") > 3 
b = 5 > 1 
a != b

Consider a **compound boolean expression**:

In [None]:
x = 5
3 < x and x < 8

When the same value (in this case `x`) appears as an operand of the two booleans, these expressions can be **chained** as follows:

In [None]:
3 < x < 8

This is a convenient syntax and can often be easier to read.

Understanding the relationship between **not**, **and**, and **or** is extremely important. Here they are summarized in a truth table:

![truth table](https://github.com/ag12s/CreateWithCodeModules/blob/main/images/truth_table.png?raw=true)
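
A truth table like this can also be generated in Python. The sketch below uses the names `p` and `q` for the two booleans, which are assumptions made for illustration, and prints one row per combination.

```python
rows = []
for p in (True, False):
    for q in (True, False):
        # Columns: p, q, not p, p and q, p or q
        rows.append((p, q, not p, p and q, p or q))

print("p, q, not p, p and q, p or q")
for row in rows:
    print(row)
```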

### Numeric and Boolean Operations Summary

![numeric operations](https://github.com/lukasbystricky/ISC-3313/blob/master/lectures/chapter2/images/numeric_operators1.png?raw=true)

![numeric operators](https://github.com/lukasbystricky/ISC-3313/blob/master/lectures/chapter2/images/numeric_operators2.png?raw=true)

# Data Structures

### The Range Class

Often, we will need a **sequence of integers**. Python has a built-in type called `range` to help construct such sequences. There are 3 different ways to construct one.  

In Python 2, `range` is a function that returns a `list`. In Python 3, `range` is its own type, but you can **convert it to a `list`** if you like by passing it as an argument to the `list` constructor.

`range(stop)` constructs a **sequence of integers starting at 0** and going up to, but **not including**, the integer represented by `stop`.

In [None]:
range(5)

In [None]:
list(range(5))

`range(start, stop)` constructs a sequence of integers **starting at `start`** and going up to but not including `stop`.

In [None]:
range(5,8)

In [None]:
list(range(5,8))

`range(start, stop, step)` constructs a sequence of integers starting at `start` and **advancing by `step`** so long as it does not reach or pass `stop`.

In [None]:
range(5,10,2)

In [None]:
list(range(5,10,2))

Note that `step` can be **negative**:

In [None]:
list(range(10,5,-1))

### Tuples

A *tuple* is used to represent an **immutable sequence of objects**. Immutable objects can be stored more efficiently than mutable objects, but they are more limited.

The primary purpose of a tuple is to **encapsulate multiple pieces of information into a single object**. For example, it is common to represent colors by 3 separate values that describe the intensity of red, green, and blue. We could store all this information separately, but a better way is to store this information in a single tuple.

In [None]:
skyBlue = tuple((136, 207, 236))
type(skyBlue)

As with lists, we can **construct a tuple** without using the **tuple constructor**:

In [None]:
skyBlue = (136, 207, 236)

Tuples **support all of the nonmutating behaviors of lists**, including `count`, `index`, indexing, and slicing.
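
A brief sketch of what this looks like in practice: reading from a tuple works just as it does for a list, while assigning to an entry raises a `TypeError`. The `try`/`except` block is used here only so the cell finishes without stopping on the error.

```python
skyBlue = (136, 207, 236)

print(len(skyBlue))     # 3
print(skyBlue[0])       # 136
print(skyBlue[1:])      # (207, 236)

# Assigning to an entry fails because tuples are immutable
try:
    skyBlue[0] = 0
except TypeError as err:
    print("TypeError:", err)
```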

Tuples can be **concatenated** with the "+" operator:

In [None]:
tuple_1 = ("hello", "hi")
tuple_2 = (0, 255, 0)

print(tuple_1 + tuple_2)

Tuples can also be **converted to a list** by passing them in to the `list` constructor:

In [None]:
tuple_list = list(tuple_2)
print(tuple_list)

Lists can be **converted to tuples** using the `tuple` constructor:

In [None]:
print(tuple(tuple_list))

### Dictionaries

Dictionaries represent a **mapping** from a set of objects known as *keys* to associated objects known as *values*. For example, we may want to store the latitude and longitude of a collection of cities.

- Auckland: (-36.52, 174.45)
- Berlin: (52.30, 13.25)
- Cairo: (30.20, 31.21)
- Havanna: (23.80, -82.23)
- New York: (40.47, -73.58)

Dictionaries provide a convenient way to store this information. Here, the name of the city acts as the **key** and a tuple representing the latitude and longitude is the **corresponding value**.

As with lists and tuples, there are two ways to initialize a dictionary. The first is to use the standard **constructor syntax**:

In [None]:
latLong = dict()
type(latLong)

This results in a new dictionary object that initially contains zero entries.

We can add key-value pairs to our dictionary using an **assignment syntax**:

In [None]:
latLong["Auckland"] = (-36.52,174.45)
latLong["Berlin"] = (52.30, 13.25)
latLong["Cairo"] = (30.20, 31.21)
latLong["Havanna"] = (23.80,-82.23)
latLong["New York"] = (40.47,-73.58)

print(latLong)

Once we've added items to the dictionary, we can **retrieve** them:

In [None]:
latLong["Havanna"]

If we ask for a key that is not in the dictionary, we get an error:

In [None]:
latLong["Tallahassee"]
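
If we are not sure whether a key exists, one possible approach is to test with `in` or to use the dictionary's `get` method, which returns `None` instead of raising an error. The dictionary `coords` below is a separate, shortened example so that `latLong` is left unchanged.

```python
coords = {"Auckland": (-36.52, 174.45), "Berlin": (52.30, 13.25)}

print("Tallahassee" in coords)      # False
print(coords.get("Tallahassee"))    # None
print(coords.get("Berlin"))         # (52.3, 13.25)
```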

Another way to initialize and populate a dictionary is through the **literal form**. The literal form uses **curly braces**, in contrast to the square brackets [ ... ] used by lists or the parentheses ( ... ) used by tuples.

In [None]:
latLong = {"Auckland": (-36.52, 174.45), "Havanna": (23.8, -82.23), "Berlin": (52.3, 13.25),
           "New York": (40.47, -73.58), "Cairo": (30.2, 31.21)}
print(latLong)

The **keys** in a dictionary **must be unique**. If we attempt to add a **duplicate** key, it **updates the value already stored** for that key.

In [None]:
latLong["New York"] = (23.7,-82.23)
print(latLong)

Keys must be **immutable**. So, for example, `int`, `str`, or `tuple` would be acceptable keys, but `list` would not.

**Values**, on the other hand, can be of **any type** (mutable or immutable) and can be **repeated** as often as you like.
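
The restriction on keys can be seen in a short sketch. The dictionary `grid` and its entries are made up for illustration; the `try`/`except` block only keeps the cell from stopping on the error.

```python
grid = {}
grid[(0, 1)] = "tuple key works"

# A list is mutable, so it cannot be used as a key
try:
    grid[[0, 1]] = "list key fails"
except TypeError as err:
    print("TypeError:", err)

print(grid)   # {(0, 1): 'tuple key works'}
```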

### Data Structure Command Summary

![dictionary methods](https://github.com/lukasbystricky/ISC-3313/blob/master/lectures/chapter2/images/dictionary.png?raw=true)

# Functions

Functions are operations that can be **called outside the context of a particular class**. For example, we've already seen the `abs` and `len` functions. Remember that we call `abs(a)`, not `a.abs()`. Python provides many built-in functions.

![functions](https://github.com/lukasbystricky/ISC-3313/blob/master/lectures/chapter2/images/functions.png?raw=true)

In [None]:
pow(2,3)

In [None]:
min(-2,-2.4,9.0)

In [None]:
ord("a")

### Print Function

After performing a computation, we may need to see the result. In an interactive or notebook session, we can simply type the name of the variable to see what it is:

In [None]:
a = 1
a

When **executing scripts**, however, we can no longer do this. Instead, we use the `print` command (in fact, we've already used this several times). The `print` function is used to **print information to the console**:

In [None]:
print(a)
print("a")

Note that the Python 2 syntax, `print a` (no parentheses), does not work in Python 3. The syntax above (**with parentheses**) works in both Python 2 and Python 3.


Of course, `print` can be used to display more useful information. For example, suppose we have a variable called `t` that represents some length of time in seconds. We can use `print` to display not only `t`, but also the accompanying units by calling `print` with **multiple arguments**.

In [None]:
t = 5.6
print(t, "s")

The `print` function **automatically inserts a space** between the two arguments. If we wish to avoid this, we can combine the two arguments into one:

In [None]:
print(str(t) + "s")

Note that we have to **convert `t` to a string** before "adding" (the correct term is *concatenating*) it to `"s"`.

In [None]:
print(t+"s")

The previous command is illegal because we are attempting to add a `float` to a `str`.

Another option is to use **f-string formatting**:

In [None]:
print(f"{t}s")
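
f-strings can also control how a number is displayed. As a small sketch, the format specifier `.2f` below asks for two digits after the decimal point; the surrounding text in the string is just an example.

```python
t = 5.6
print(f"{t}s")                  # 5.6s
print(f"t = {t:.2f} seconds")   # t = 5.60 seconds
```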

# Modules

There are hundreds of other useful functions and classes that have been developed for Python which are not automatically loaded, but instead placed into **specialized libraries** called *modules* that can be **individually loaded as needed**.

If you installed Python using Anaconda, you will already have many modules installed.

### Math Module

The math module provides functions to do **mathematical operations** beyond addition, multiplication, exponentiation, etc. For example, suppose we want to take the cosine of a number. This is **not built-in to Python**, but a function called `cos` in the **math library** does this.

To use the `cos` function, we must import it from the math module. There are three possible ways to do this. Whichever method you use should be done **at the beginning of your code** (this is not necessarily required, but it is the most common placement -- at the very least, you have to do this *before* you use the function):

1) Import the entire module

In [None]:
import math

At this point, we still cannot use the `cos` command directly. We must tell Python exactly **where the function is coming from** using a *qualified name*.

In [None]:
cos(2)

In [None]:
math.cos(2)

2) Specifically importing `cos` from the math library

In [None]:
from math import cos
cos(2)

3) Import everything in the module by using the \* wildcard

In [None]:
from math import *
cos(2)

In [None]:
sqrt(2)

This last option can be attractive, but is generally **discouraged** because different modules may use the same name for different functions. This method of importing imports not only the `cos` function, but all functions from the math library (e.g. `sqrt` and `tan`).
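
To illustrate how names can clash, the standard library has two modules, `math` and `cmath`, that both provide a function called `sqrt`. The sketch below uses qualified names so that it is always clear which one is being called; with wildcard imports from both modules, whichever was imported last would silently replace the other.

```python
import math
import cmath

print(math.sqrt(4))    # 2.0
print(cmath.sqrt(4))   # (2+0j)
```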

### Expressions

It is quite common to **perform several operations** as part of **a single expression**.

In [None]:
a = 18 + 5.5 + 1
a

Behind the scenes, Python adds 18 and 5.5 to get 23.5, it then adds 1 to get 24.5. In this case, the order of the two operations does not matter. In more complicated expressions, the **order can be important**.

In [None]:
a = 18*9**2/4
a

In [None]:
a = 9/4**2*18
a

#### Precedence

When there are two or more operations as part of an expression, we must figure out some way to **determine which operation is performed first**. We say that an operation performed first is given *precedence* over the others.

Mathematical expressions in Python follow **standard algebraic conventions**:

1. Brackets
2. Exponents
3. Division/Multiplication
4. Addition/Subtraction

In the expression `1 + 2 * 3`, the multiplication is done first, followed by the addition. 

In Python, as in algebra, we can **use brackets to prioritize an operation**.

In [None]:
1+2*3

In [None]:
(1+2)*3

Most operations with **equal precedence** are **evaluated left to right**, again to mimic standard algebraic rules.  

One **exception** is **exponents** which are **evaluated right to left**, which again is how we typically think of exponents.

$4^{3^2} = 4^9$

In [None]:
4**3**2

In [None]:
4**(3**2)

In [None]:
(4**3)**2

Precedence rules are **enforced for all data types**. 

In [None]:
"a"*3+"b"

### Calling Functions Within Expressions

**Function calls** have **high precedence**. When multiple function calls are used in the same expression, they are typically **evaluated from left to right**.

In [None]:
person = "George Washington"
person.split()[1]

More complicated expressions are evaluated by **first resolving commands inside parentheses**.

In [None]:
groceries = ["cereal", "milk", "apple"]
groceries.insert(groceries.index("milk") + 1, "eggs")
groceries

Here, we first evaluated `groceries.index("milk")`, then added 1 to it to find the index where we wish to insert "eggs".

# NumPy

The `NumPy` module (http://www.numpy.org/) is an almost indispensable module for scientific computing. It provides objects such as arrays and matrices as well as functions spanning linear algebra, Fourier transforms, and statistics, among numerous other things. 

To start, we must import NumPy. To reduce the amount of typing for ourselves later, we will rename the module `np` when we import it. Using `np` as shorthand for NumPy is a standard convention in Python programming.

In [None]:
import numpy as np

### Arrays

One important class that NumPy provides is the `array` class. An **array** is **similar to a `list`** in that it is a **collection of objects**.

NumPy arrays can be **initialized** in a similar way to lists. Below, we pass a list to the np.array() function to create our array object.

In [None]:
a = np.array([1,2.0,3.2])
print(a)
type(a)

You'll notice the type of `a` is `numpy.ndarray`. NumPy arrays can be **multidimensional**. You can think of a 1D array as a kind of **list** (but not a Python list) and a 2D array as a kind of **grid** or **table** of values. Higher dimensional arrays are certainly possible; you can think of a 3D array as a **stack of grids**. 

![np array](https://github.com/lukasbystricky/ISC-3313/blob/master/lectures/chapter2/images/np_array.jpg?raw=true)
(Image credit: Dalesha Hemrajani)


The attribute `ndim` stores the **number of dimensions** in the array.

In [None]:
a.ndim

Multidimensional arrays can be initialized as an **array of arrays**.

In [None]:
b = np.array([[1, 2, 3.0], [1.2,2.2,2]])
print(b)

In [None]:
print(b.ndim)

The `shape` property tells us **how many rows and columns** are in our array, while the `size` property tells us the **total number of elements** in the array.

In [None]:
print(b.shape)
print(b.size)

`b.shape` is the tuple (2,3), meaning that `b` has 2 rows and 3 columns.

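We could try to **initialize an array** as an **array of arrays** of **different sizes**. Recent versions of NumPy (1.24 and later) raise a `ValueError` for this unless we explicitly pass `dtype=object`:

In [None]:
c = np.array([[1,2],[3,4,5,6]], dtype=object)

With `dtype=object`, this is valid. What is the dimension, size, and shape of `c`?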

In [None]:
print(c.ndim)
print(c.shape)
print(c.size)
print(c)

The way we are initializing `c`, it looks like we are trying to make a multidimensional array with the first row being [1,2] and the second row being [3,4,5,6]. Since the lengths of these two rows are unequal, we cannot make a grid out of them. 

NumPy cannot build a 2D grid from these rows, so instead of making an array of dimension 2, it creates an **array of dimension 1**. Instead of having 6 elements, it only has 2. **Each of the elements is a Python list**.

#### Indexing

Like lists and strings, arrays are **0 indexed**, meaning the **first entry in an array is at index 0**. 

Two-dimensional arrays have rows and columns. The entry at index [0,0] (first row, first column) is located at the upper left hand corner of the array. 

![row map](https://github.com/lukasbystricky/ISC-3313/blob/master/lectures/chapter2/images/row_column.gif?raw=true)

Like lists and strings, arrays support **indexing** and **slicing**.

In [None]:
a = np.array([1,2.0,3.2])
a[0] # first entry in a 

In [None]:
b = np.array([[1, 2, 3.0], [1.2,2.2,2]])
b[1] # second entry in b, each entry is a row

When we have a multidimensional array (or an array of arrays of equal or unequal length) we can **access the element at row i and column j** using the syntax:

In [None]:
b[1][0] # first element in the second row of b

Or, for a rectangular array such as `b`, the equivalent syntax:

In [None]:
b[1,0]

Note that **arrays are mutable**. For example, we can **modify** an element of `b`:

In [None]:
b[0][0] = 8
print(b)

**Slicing** is done in exactly the same way:

In [None]:
a = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
print(a)

In [None]:
print(a[0,0:2]) # first 2 entries of row 0

In [None]:
print(a[0:2,1:3]) # first 2 rows, second and third columns (indices 1 and 2)

The **colon operator** by itself means the **entire row or column**:

In [None]:
print(a[:,1]) # entire second column

#### Operations

Arrays support several familiar operators. For example, you can **multiply** or **divide** them by a number:

In [None]:
a = np.array([1,2])
print(2*a)
print(a/2)

You can also **add** a number to them. 

To be clear: when you add, subtract, multiply, divide, or exponentiate an array by a **single number** (known as a **scalar**), the operation is applied to **every** element of the array.

In [None]:
a = np.array([1,2])
print(a + 1)

You can even **add** two **arrays together**, so long as they are the **same size**.

In [None]:
b = np.array([3,4])
print(a + b)

Suppose we want to **multiply two NumPy arrays**, $a = [a_1,a_2,a_3]$ and $b = [b_1,b_2,b_3]$. How does NumPy do it?

In [None]:
a = np.array([1,2,3])
b = np.array([3,4,5])

print(a*b)

NumPy does what is called *element-wise multiplication*. In other words, `a*b` is equal to $[a_1 b_1, a_2 b_2, a_3 b_3]$.

Now suppose $A$ is a 2D array and $x$ is a 1D array. What is `A*x` in Python? 

In [None]:
x = np.array([1,2])
A = np.array([[3,2], [1,2]])

print(A*x)

It's still an **element-wise multiplication**. `A*x` evaluates to:

$$ \begin{bmatrix} A_{11}x_1 & A_{12}x_2\\ A_{21}x_1 & A_{22} x_2\end{bmatrix}$$

We multiply each column of `A` by the **corresponding entry** of `x`. Note that this is not the matrix-vector product from linear algebra.

Likewise, if we compute `A**2`, we get:

In [None]:
print(A**2)

which is 
$$ \begin{bmatrix} A_{11}^2 & A_{12}^2\\A_{21}^2 & A_{22}^2\end{bmatrix}.$$
Each entry is squared individually, so this is still an element-wise operation.
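If you actually want the matrix-vector product from linear algebra rather than the element-wise product, NumPy provides the `@` operator. The sketch below reuses the same `A` and `x` to contrast the two. It is an aside, and nothing else in this review relies on it.

```python
import numpy as np

x = np.array([1, 2])
A = np.array([[3, 2], [1, 2]])

elementwise = A * x   # each column of A scaled by the matching entry of x
matvec = A @ x        # matrix-vector product: each row of A dotted with x

print(elementwise)
print(matvec)
```

The element-wise result keeps the 2 by 2 shape of `A`, while the matrix-vector product is a 1D array with one entry per row of `A`.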

##### Arange

NumPy provides many useful operations to **generate and manipulate arrays**. 

For example, suppose we want to create an array ranging from 1 to 20. NumPy provides a function `np.arange` that does just that.

In [None]:
a = np.arange(1,21)
print(a)

The arange function works in a similar way to the `range` class we saw earlier. It can take in up to three arguments: a **starting value**, an **end value**, and a **step size**. It **returns a 1D array** that starts at the starting value and keeps adding the step size, stopping before it reaches the end value (as with `range`, the end value itself is meant to be excluded).

Unlike the `range` class, the starting, ending, and step values can be **floats**.


In [None]:
a = np.arange(1.1,2.0,0.1)
print(a)

In [None]:
a = np.arange(1,2.2,0.1)
print(a)

You'll notice that when we use floating point numbers, we **may or may not have the ending value as part of our array**. This is due to **floating point error**.
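Floating point error is not specific to NumPy. A minimal sketch in plain Python shows that decimal values such as 0.1 cannot be stored exactly, which is why a computed endpoint can land just above or just below the value we expect:

```python
import math

total = 0.1 + 0.2
print(total)                     # very close to 0.3, but not exactly 0.3
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: compare with a tolerance instead
```

Because `arange` decides where to stop by comparing floats like these, tiny rounding differences can change whether the last value is included.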

##### Linspace

The ambiguity in `arange` can cause problems. Fortunately, NumPy provides a separate function that **creates an array of a specified length**. The `np.linspace` function takes in a start value, end value, and **number of points**. Unless you are dealing with integers, `linspace` is preferred over `arange` to generate equally spaced arrays.

In [None]:
a = np.linspace(0,2.2,5)
print(a)

The last parameter in `linspace` is the number of desired points in the array. If the **start value** is **greater than** the **end value**, then `linspace` automatically takes **negative step sizes**.

In [None]:
a = np.linspace(5,1,5)
print(a)
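To check that `linspace` really gives the requested number of points and includes both endpoints, we can inspect the result directly. The sketch below also uses the optional `retstep=True` argument, which is not used elsewhere in this notebook, to return the spacing that `linspace` chose:

```python
import numpy as np

a, step = np.linspace(0, 1, 11, retstep=True)

print(len(a))        # number of points, exactly as requested
print(a[0], a[-1])   # both endpoints are included
print(step)          # spacing between neighbouring points
```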

##### Reshape

Suppose we wanted to create the array:
$$ A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 &8&9\\10 & 11& 12\end{bmatrix}$$

What if we use `arange` or `linspace` to create a 1D array with the same data, and then **reshape** into an array with 4 rows and 3 columns? 

NumPy provides the function `np.reshape` that does just that.

In [None]:
a = np.arange(1, 13)
A = np.reshape(a, (4, 3))

print(A)

Note that the second argument to `np.reshape` is a tuple (r,c) where r is the number of rows and c is the number of columns we want the new array to have. 
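A reshape only works when the new shape holds exactly as many elements as the original array. The sketch below shows one shape that fits and one that does not. The `try`/`except` block is not covered in this review; it is only there so that the cell keeps running after the error.

```python
import numpy as np

a = np.arange(1, 13)              # 12 elements

wide = np.reshape(a, (2, 6))      # fine: 2 * 6 == 12
print(wide)

try:
    np.reshape(a, (5, 3))         # 5 * 3 == 15, which does not match 12
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```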

##### Zeros

It can be useful to create an **array of zeros** to fill in with non-zero values later:

$$ A = \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 &0&0\\0 & 0& 0\end{bmatrix}$$

Again, NumPy saves the day with `np.zeros((n,m))`, where `n` is the number of rows and `m` is the number of columns of zeros that you want. We can even make 1D arrays of zeros with this function by doing `np.zeros(n)`.

In [None]:
a = np.zeros(5)
A = np.zeros((4, 3))
print(a,'\n')
print(A)

##### Sum and Mean/Average

We can get the **sum** and **mean**/average of an array using `np.sum(array)` and `np.mean(array)`, respectively. If the arrays are 2D or greater, you can **specify the `axis` over which the sum or mean is taken**. For example with a 2D array, `np.sum(A,axis=0)` would return a sum across the rows (the rows are collapsed, leaving one sum per column), while `np.mean(A,axis=1)` would return the mean across the columns (the columns are collapsed, leaving one mean per row).

![axis](https://i.stack.imgur.com/Z29Nn.jpg)

In [None]:
a = np.arange(1, 13)
B = np.reshape(a, (4, 3))
print(a)

print("Sum  of a:", np.sum(a))
print("Mean of a:", np.mean(a))

print("\n", B)
print("Sum  of B:", np.sum(B))
print("Mean of B:", np.mean(B))
print("\nSum  across B rows:", np.sum(B,axis=0))
print("Mean across B rows:", np.mean(B,axis=0))
print("\nSum  across B columns:", np.sum(B,axis=1))
print("Mean across B columns:", np.mean(B,axis=1))
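The `axis` argument is easy to mix up, so it can help to look at the shape of each result. The following sketch uses the same 4 by 3 array; the names `col_sums` and `row_sums` are just illustrative:

```python
import numpy as np

B = np.reshape(np.arange(1, 13), (4, 3))

col_sums = np.sum(B, axis=0)  # collapses the 4 rows: one value per column
row_sums = np.sum(B, axis=1)  # collapses the 3 columns: one value per row

print(col_sums, col_sums.shape)  # [22 26 30] (3,)
print(row_sums, row_sums.shape)  # [ 6 15 24 33] (4,)
```

A handy rule of thumb: the axis you name is the one that disappears from the shape.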

##### Max and Min

Finding the **largest** and the **smallest values** in an array is also very easy with NumPy! One simply needs to use `np.max(Array)` to find the largest value and `np.min(Array)` to find the smallest value.

For 2D or greater arrays, you can get the maximum/minimum values for a given **axis** of the array, e.g. `np.max(B, axis=0)` would get the maximum values across the rows of `B` (one maximum per column), while `axis=1` would give the maximum values across the columns (one maximum per row).

In [None]:
a = np.arange(1, 13)
B = np.reshape(a, (4, 3))
print(a)
print("Max of a:", np.max(a))
print("Min of a:", np.min(a), "\n")
print(B)
print("Max of B:", np.max(B))
print("Max across B rows:", np.max(B,axis=0))
print("Min across B columns:", np.min(B,axis=1))

##### Where

Suppose we want to take all values of an array that are larger than 2.5 and make them zero. This is doable using `for` loops (we'll learn about these later) or simply doing it by hand.

However, `np.where()` can make this very easy. For our example, you would use `np.where(B < 2.5, B, 0.0)`. This translates to "wherever B is less than 2.5, keep the corresponding value of B; otherwise use 0.0".

In [None]:
B = np.reshape(np.arange(1, 13), (4, 3))
print(B,"\n")
print(np.where(B < 2.5, B, 0.0), "\n")
print(np.where(B > 2.5, 0.0, B), "\n")

##### Stack

The `stack` function in NumPy allows us to **stack together multiple arrays** into a **larger**, **higher dimensional array**. Suppose we had three 1D arrays of length 3 each and we wanted to stack them on top of each other to form a single (3 by 3) 2D array. `np.stack` provides the means to do this.

In [None]:
a = np.array([1,1,1])
b = np.array([2,2,2])
c = np.array([3,3,3])
print("a = ", a)
print("b = ", b)
print("c = ", c)
D = np.stack((a,b,c),axis=0)
print("\nD = \n",D)
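The `axis` argument controls the direction of the stacking. As a quick comparison (the names `rows` and `cols` are illustrative), `axis=0` makes each input array a row, while `axis=1` makes each input array a column:

```python
import numpy as np

a = np.array([1, 1, 1])
b = np.array([2, 2, 2])
c = np.array([3, 3, 3])

rows = np.stack((a, b, c), axis=0)  # a, b, c become the rows
cols = np.stack((a, b, c), axis=1)  # a, b, c become the columns

print(rows)
print(cols)
```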

#### Mathematical Operations

The `math` module only works on numbers -- if we pass in a NumPy array, we will get an error.

NumPy provides its own implementations of many mathematical functions that take in arrays and perform operations on each element.

In [None]:
a = np.array([0,1])
print(np.sin(a))

The fact that the NumPy and `math` modules provide functions with the same names demonstrates why it is a **bad idea to import everything at once from a module**.
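The difference between the two `sin` functions is easy to see side by side. In this sketch, the `try`/`except` block (not covered in this review) is only there to catch the error from `math.sin` so the rest of the cell still runs:

```python
import math
import numpy as np

a = np.array([0, 1])

try:
    math.sin(a)          # math.sin expects a single number
    math_failed = False
except TypeError as err:
    math_failed = True
    print("math.sin failed:", err)

print(np.sin(a))         # np.sin is applied to each element
print(math.sin(1))       # math.sin works fine on a plain number
```

Because both imports use qualified names (`math.sin` and `np.sin`), there is never any doubt about which version is being called.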

#### Arrays vs. Lists

Arrays and lists are **similar** in many ways. Both represent a **collection of objects**. When should you use one over the other? 

Arrays are **mutable**, but they do not support methods such as `pop` or `append`. Once initialized, the **size of an array cannot be easily changed**. If your application needs to **change the size** of a collection, **lists** are the preferred option.
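One common pattern, sketched here with made-up data in a hypothetical `readings` list, is to collect values in a list while the size is still changing and convert to an array once it is final:

```python
import numpy as np

readings = []             # start with an empty list
readings.append(2.5)      # lists can grow one item at a time
readings.append(3.1)
readings.append(4.0)

readings_array = np.array(readings)  # convert once the size is settled
print(readings_array * 2)            # element-wise arithmetic now works
print(np.mean(readings_array))
```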

Another difference between arrays and lists is **how operators are defined**. We saw earlier how we can add two arrays together or multiply them by a number. This behaviour is different from how it is handled with lists:

In [None]:
a_array = np.array([1,2])
b_array = np.array([3,4])

a_list = [1,2]
b_list = [3,4]

print("a_array + b_array:", a_array + b_array)
print("a_list + b_list:  ", a_list + b_list)

print("2*a_array:", 2*a_array)
print("2*a_list: ", 2*a_list)

It is possible to **convert** from a list to an array or vice versa. NumPy arrays provide the method `tolist()` which converts an array to a list.

In [None]:
c = a_array.tolist()
type(c)

NumPy also provides the function `asarray`, which takes a list and returns an array.

In [None]:
d = np.asarray(c)
type(d)

# For Loops

We often need to **repeat a series of steps** for **each item of a sequence**. Such a repetition is called an *iteration* and can be expressed using a **control structure** known as a **for loop**. 

A for loop always begins with the following syntax:

    for identifier in sequence:

This is **followed by a block of code** we call the *body* of the loop. 

For example, given a list of names, we can print each name on its own line using a for loop:

In [None]:
guests = ["Kirk", "Spock", "Bones", "Scotty", "Uhura", "Sulu", "Chekov"]
for person in guests:
    print("Hi", person)

`guests` is the **sequence**, and `person` is called the **identifier** or *loop variable*. A sequence can be any object that represents a sequence of elements, e.g. a list, an array, a range, or a tuple. 

The body itself (in this case, the command `print("Hi", person)`) specifies the commands to be executed each iteration. The **body must be indented**, although the precise amount of indentation is up to the programmer. It is important to be **consistent** though, so we highly recommend using the **tab key**.

#### Whitespace

In [None]:
a = 1
a =        1

The two statements above are equivalent expressions.

In [None]:
a = 1
print(a)

In [None]:
a =        1
print(a)

However, if we tried to do:

In [None]:
a = 1
    a = 1

we get an error. 

The only whitespace that matters in Python is the **indentation** (in other words the whitespace at the **very left of the statements**). Python looks at *blocks* of code. **Lines that are part of the same block must have the same indentation level**. 

A for loop (or any other control structure) is its own block. This means that all code inside a for loop must be at the same indentation level. 

    command outside for loop
    for something:
        command inside for loop
        command inside for loop
        command inside for loop
    command outside for loop

#### Index-Based Loops

The `range` object can be thought of as a **list of integers**. This can be very convenient for iteration. For example, the following code produces a countdown for a rocket launch:

In [None]:
for count in range(10,0,-1):
    print(count)    
print("BLASTOFF")

We can also use `range` to serve as a **sequence of valid indices of a list**. For example, let's say we wanted to number each guest. We could do that using the following code:

In [None]:
for i in range(len(guests)):
    print(str(i+1)+".", guests[i])

The `range` object is essentially a list starting at 0 and going up to, but not including, the length of the guest list. In our case, it is the list [0,1,2,3,4,5,6]. Since we want our displayed numbers to start at 1 and go up to 7 we have to add 1 to i before we print it.

**Index-based looping** is very useful for tasks that require an **explicit knowledge of the position of an element within a list**.
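When you need both the position and the item, Python's built-in `enumerate` function is a common alternative to `range(len(...))`. It is not covered elsewhere in this review, so treat the sketch below as an optional variation on the numbered guest list (the `numbered` list is only there to keep the results):

```python
guests = ["Kirk", "Spock", "Bones", "Scotty", "Uhura", "Sulu", "Chekov"]

numbered = []
for i, person in enumerate(guests):
    # i is the index (starting at 0), person is the item at that index
    line = str(i + 1) + ". " + person
    numbered.append(line)
    print(line)
```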

#### Nested Loops

The body of a loop can contain several statements. It can even **include other loops**. This technique of using one control structure within the body of another is called *nesting*.

Consider a 2D NumPy array. Let's print each number in the array on its own line, along with its row and column:

In [None]:
import numpy as np
A = np.array([[1.5,2.1],[3.1,3.8]])

for r in range(A.shape[0]):
    for c in range(A.shape[1]):
        print("row",r,"column",c,"is",A[r,c])

The loop over `r` is called the *outer loop*. The body of the outer loop contains an *inner loop* over `c`, and inside its body we execute a print command.  

If we look at this code sequentially it is equivalent to:

In [None]:
r = 0
c = 0
print("row",r,"column",c,"is",A[r,c])
c = 1
print("row",r,"column",c,"is",A[r,c])
r = 1
c = 0
print("row",r,"column",c,"is",A[r,c])
c = 1
print("row",r,"column",c,"is",A[r,c])

#### List Comprehension

Python supports a syntax called *list comprehension*, which uses the form:

     result = [expression for identifier in sequence]

This gives us a compact way to build a new list by applying an expression to every element of a sequence.

List comprehensions can be a little faster for a computer to complete than the `for` loop structure we first looked at. However, list comprehensions aren't always the best option for repeated sequences with a complex list of operations needed for each cycle of the sequence.

In [None]:
guests = ["Kirk", "Spock", "Bones", "Scotty", "Uhura", "Sulu", "Chekov"]
guests = [person.lower() for person in guests]
print(guests)
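To see that a list comprehension is just a compact loop, here is a small side-by-side sketch. The names `lowered_loop` and `lowered_comp` are illustrative:

```python
guests = ["Kirk", "Spock", "Bones"]

# Loop version: build the new list one item at a time
lowered_loop = []
for person in guests:
    lowered_loop.append(person.lower())

# List comprehension version: the same work in a single line
lowered_comp = [person.lower() for person in guests]

print(lowered_loop)
print(lowered_loop == lowered_comp)
```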

# Conditional Statements

#### If

An important control structure is the *conditional statement*, more commonly known as an *if statement*. This structure allows us to specify a block of instructions that are to be executed **only when a certain value is true**. 

This is an extremely valuable tool, as it allows the execution to vary depending on values that are not necessarily known until the program is running.

The condition can be an arbitrary **boolean expression**. This may be a single variable of type `bool` or a more complex expression that evaluates to a boolean (actually any expression can be evaluated to a boolean in Python, but this is generally discouraged as it makes code hard to read).

In [None]:
string = "is this a long string?"
if len(string)>20 and "s" in string:
    print("This is a long string, and it contains the letter s!")

In [None]:
string = "how about this?"
if len(string)>20 and "s" in string:
    print("This is a long string, and it contains the letter s!")

In this case, the first string is longer than 20 characters **and** contains the letter s, so the message is printed. The second string also contains the letter s but is shorter than 20 characters, so nothing is printed.

![if statement flow chart](https://github.com/ag12s/CreateWithCodeModules/blob/main/images/if_flowchart.png?raw=true)

As with for loops, the **body** of an if statement can **contain any valid Python code**. This includes other if statements or for loops. For example, we could have a **nested if statement**:

In [None]:
string = "how about this?"
if len(string)>20:
    if "s" in string:
        print("This is a long string, and it contains the letter s!")

#### Else

We use `if` to specify steps that are executed when a given condition is true. An **alternative set of steps** can be expressed as an `else` clause, to be **executed when the condition is false**.

In [None]:
string = "how about this?"
if len(string)>20:
    print("This is a long string!")
else:
    print("This is not a long string :-(")

![if else statement flow chart](https://github.com/ag12s/CreateWithCodeModules/blob/main/images/ifelse_flowchart.png?raw=true)

#### Elif

Sometimes, we may have **more than 2 possible outcomes**. For example:

In [None]:
string = "how about this?"
if len(string)>20:
    print("This is a long string!")
else:
    if (len(string) > 10):
        print("Sort of...")
    else:
        print("This is not a long string :-(")

Here we first check if the string is longer than 20 characters. If it's not, we check if it's longer than 10 characters, and if it's not that either, we print "This is not a long string :-(". 

An alternative syntax is to use the `elif` command:

In [None]:
string = "how about this?"
if len(string)>20:
    print("This is a long string!")
elif (len(string) > 10):
    print("Sort of...")
else:
    print("This is not a long string :-(")

The advantage of this syntax is that it doesn't require us to keep indenting for each new if statement.

#### Break

The `break` command, when **called from inside a loop**, causes an **immediate stop** to the entire loop. The current iteration is interrupted and control skips beyond the end of the loop. Typically, we call `break` if a condition has been met.

In [None]:
for i in range(10,0,-1):
    if i > 5:
        print(i)
    else:
        break
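A typical use of `break` is to stop searching as soon as we find what we are looking for. In this sketch, `found_at` is a hypothetical variable that records the index of the match:

```python
guests = ["Kirk", "Spock", "Bones", "Scotty", "Uhura", "Sulu", "Chekov"]

found_at = None                 # stays None if the name is never found
for i in range(len(guests)):
    if guests[i] == "Scotty":
        found_at = i
        break                   # no need to look at the remaining guests

print("Scotty is at index", found_at)
```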

#### List Comprehension

Recall that this used the syntax:

     result = [expression for identifier in sequence]

Python supports adding an **optional condition** by employing the syntax:

     result = [expression for identifier in sequence if condition]

This expression is evaluated as:

    result = []
    for identifier in sequence:
        if condition:
            result.append(expression)
            
For example:

In [None]:
numbers = [1, 3, 5, 19]

numbers = [2*n for n in numbers if n < 10]
print(numbers)

# While Loops

We may need to express **repetition** even though we **cannot know the precise number of iterations in advance**. To do this we use a *while loop* with the syntax:

    while condition:
        body

As with if statements, the **condition** can be an **arbitrary boolean expression** and the **body** is an **indented block of code**.

![while statement flow chart](https://github.com/ag12s/CreateWithCodeModules/blob/main/images/while_flow.png?raw=true)

We can make **index-based while loops** by employing a **counter**:

In [None]:
guests = ["Kirk", "Spock", "Bones", "Scotty", "Uhura", "Sulu", "Chekov"]

i = 0
while i < len(guests):
    print(str(i + 1) + ".", guests[i])
    i += 1

Note that we must initialize the counter, `i`, *outside* the while loop and manually **increment** it each **iteration**.

#### Infinite Loops

When working with a while loop, the **number of iterations is not explicitly bounded**, but determined based on the **loop condition**. 

This can lead to a serious problem known as an *infinite loop*. In infinite loops the **condition never changes to false**, so the loop keeps repeating itself **indefinitely**.

If you run code in a terminal and encounter an infinite loop, you can exit it by typing control-c. In a notebook, you can interrupt the running cell with its stop button.

To avoid infinite loops, we often add a **counter** to keep track of the **number of iterations**. If we reach a **maximum number of iterations**, the loop **condition becomes false** and the loop **terminates**.

In [None]:
counter = 0
while counter < 10:
  print("Hello")
  counter += 1 # Remember that this is the same as counter = counter + 1
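The counter can also be combined with another condition, so that the loop stops either when the real goal is reached or when the safety limit is hit. A minimal sketch, where `max_iterations` is a hypothetical limit chosen for illustration:

```python
value = 100.0
iterations = 0
max_iterations = 50   # hypothetical safety limit

# Keep halving until value drops to 1.0 or below, but never loop forever
while value > 1.0 and iterations < max_iterations:
    value = value / 2
    iterations += 1

print(value, iterations)
```

Here the loop ends because `value` becomes small enough, long before the safety limit is needed.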

# Making Functions

Functions are the most general purpose **control structure**. Functions are **declared** with the **keyword `def`** (short for define). It is followed by the **name** of the function. **Parentheses** enclose a series of **parameters** that are **passed in by the caller**. If a function does not require any parameters, there must still be **opening and closing parentheses**. Finally an **indented body** contains the code that is executed when the function is called.


    def function_name(parameters):
        body
    

A **function name** follows the **same rules as a variable name**. It must only consist of **letters**, **digits**, and **underscores**, and it **cannot start with a digit**. It also **cannot be a reserved word**, for example "`class`".

We are free to choose the **names of the parameters**. The name we give a parameter is known as a *formal parameter*; it serves as a **placeholder for a piece of information from the caller**, known as the *actual parameter*. We cannot assume that we know the name of the variable used by the caller. In general, formal parameter names should be chosen in such a way as to suggest their meaning.

The **body** of a **function** can be **any valid Python code**. This includes loops and if statements. 

A difference between the body of a function and that of loops and if statements is that any code in the body of a function has **local scope**. This means that **variables inside a function do not interact with variables outside the function**.

**Local scope** is good because it means that we do not have to worry about what variable names are used by the caller and the caller doesn't have to worry about what variable names are used in the function. 
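A short sketch of local scope in action. The function `double` and its variable names are made up for illustration:

```python
def double(x):
    result = 2 * x      # this 'result' exists only inside the function
    return result

result = "unchanged"    # a different variable that happens to share the name
answer = double(21)

print(answer)   # 42
print(result)   # unchanged
```

Calling `double` did not touch the caller's `result`, even though the function uses a variable with the same name.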

An **exception** to the local scope rule is **modules**. Modules that are loaded outside the function can still be used inside the function.

Because of local scope, the **caller cannot access variables defined inside the body of a function**. In order to retrieve any information from a function, the function should have a **return statement**. 

When a return statement is called, the function **ends**. Note that this does not mean that a return statement must be the last statement in a function. Sometimes you may want to have a conditional statement that returns different values to the caller depending on some condition.

Sometimes, you may like for a function to **return multiple values**. In this case, it is customary to **return a tuple** of values.
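Both ideas, returning from inside a conditional and returning several values as a tuple, can be sketched with one small function. The function `describe` is hypothetical:

```python
def describe(number):
    # Early return: the function ends here when the condition is true
    if number < 0:
        return "negative", -number
    # Only reached when the condition above was false
    return "non-negative", number

label, size = describe(-4)   # unpack the returned tuple into two variables
print(label, size)
print(describe(3))
```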

#### Flow of Control

When a function is called, whether it be a built-in function or one that you defined, control passes directly to the body of the function. The **body** of a function **can call other functions**. Once a function finishes, control is passed back to whoever called it.

**Before** the user calls a function, it must be **defined**. For example, the following code will not work because we are trying to call the function `fib` before it is defined:

In [None]:
print(fib(5))  # NameError in a fresh kernel: fib has not been defined yet

def fib(N):
    # create list of Fibonacci numbers, starting with the first two numbers
    terms = [0, 1]

    # loop from 2 to N-1 (giving you N total terms)
    for i in range(2,N):
        # apply recurrence relation
        terms.append(terms[i-1]+terms[i-2])
    return terms
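For comparison, a working arrangement puts the definition first and the call second. The sketch below is one possible variant, assuming we want the function to return the list of numbers rather than print it:

```python
def fib(N):
    # build a list of the first N Fibonacci numbers
    terms = [0, 1]
    for i in range(2, N):
        # each new term is the sum of the two before it
        terms.append(terms[i - 1] + terms[i - 2])
    return terms[:N]   # the slice also handles N smaller than 2

print(fib(5))   # the call comes after the definition, so this works
```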

#### Optional Parameters

We've seen some built-in functions that take in **optional parameters**. We can give our own functions optional parameters, too. For example, consider the following countdown function:

In [None]:
def countdown():
    for c in range(10,0,-1):
        print(c)
        
    print("BLASTOFF")
    
countdown()

Suppose we wanted to **let the user select the starting value**. The starting value of 10 would then be called the **default parameter**.

In [None]:
def countdown(start = 10):
    for c in range(start,0,-1):
        print(c)
        
    print("BLASTOFF")
    
countdown(5)

In [None]:
def countdown(start = 10):
    for c in range(start,0,-1):
        print(c)
        
    print("BLASTOFF")
    
countdown()

We can have **multiple optional parameters**:

In [None]:
def countdown(start = 10, message = "BLASTOFF"):
    for c in range(start,0,-1):
        print(c)
        
    print(message)
    
countdown(5,"YIPPIE!")

The user can specify which parameter(s) to set:

In [None]:
countdown(message="YIPPIE!")

#### Lambda Functions

Functions that are only **one line** can be **defined inline** using **lambda functions**. Lambda functions only make sense when the function is actually something that can be defined in a single line. They **should not contain loops or conditional statements**.
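As a minimal sketch, here is a lambda function next to the equivalent `def` version. The names `square`, `square_def`, and `by_length` are illustrative, and the use of `sorted` with a `key` argument is an extra example that is not covered elsewhere in this review:

```python
# A one-line function written as a lambda
square = lambda x: x**2
print(square(4))

# The equivalent function written with def
def square_def(x):
    return x**2

print(square_def(4))

# Lambdas are often passed directly to other functions, e.g. as a sort key
guests = ["Scotty", "Kirk", "Uhura"]
by_length = sorted(guests, key=lambda name: len(name))
print(by_length)
```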

#### Recursive Functions

There are some problems in mathematics and science where the method involved in solving that problem is done by **applying a set procedure in the middle of the very same procedure**! I know, sounds a bit confusing, doesn't it? A simple example of this is the factorial (!) function in mathematics where a number is multiplied by all of the integers less than it, excluding 0 and negative numbers:

- 0! = 1         = 1
- 1! = 1         = 1
- 2! = 2x1       = 2
- 3! = 3x2x1     = 6
- 4! = 4x3x2x1   = 24
- 5! = 5x4x3x2x1 = 120

Take a moment to think about the above examples, listed in order from 0! through 5!. There seems to be a pattern; in fact, we could rewrite the above list as:

- 0! = 1
- 1! = 1
- 2! = 2x1!
- 3! = 3x2!
- 4! = 4x3!
- 5! = 5x4!

That is, for a given positive integer *N*, its factorial is nothing more than *N* multiplied by the factorial of the integer *N-1*:

- N! = N x (N-1)!

We can exploit this pattern with **recursion**, the act of **using a procedure within itself**. Fortunately, Python supports recursive functions, so we can define our function to call itself as many times as needed to compute the factorial of any given integer.

In [None]:
def factorial_recursive(n):
  if n == 0:
    return 1
  else:
    return n * factorial_recursive(n - 1)

for n in range(6):
  print(factorial_recursive(n))
print("Voila! We have a recursive factorial function!")

# Plotting

To display plots in the notebook, we include the following directive at the top of the notebook (before the plots):

In [None]:
%matplotlib inline

We will import the `matplotlib.pyplot` module. To avoid lots of typing later on, we will call this module `plt`:

In [None]:
import matplotlib.pyplot as plt

#### Plotting Basics

The `plt.plot` command **creates a plot**:

In [None]:
plt.plot([1,2,4,3])
plt.show() 

You'll note that the x-axis ranges from 0 to 3 while the y-axis ranges from 1 to 4. If you provide a **single list** or **array** to the `plot` command, `matplotlib` assumes it is a sequence of y values and automatically generates the x values for you. The default x values start at 0 (the leading index for Python arrays) and have the same length as y. Therefore in this example x gets the values [0,1,2,3]. The plot above connects the points (0,1), (1,2), (2,4) and (3,3).

The plot command can take in an **arbitrary number of arguments**. For example, if you have a specific set of x values, you can input a list (or array) of x value in addition to the y values:

In [None]:
plt.plot([1,2,8,9], [1,2,4,3])
plt.show()

After the data, an **optional argument** to the plot command is a **string** which indicates the **color and type** of the plot.

The color is specified by one of the following characters:
* 'b' - blue (default)
* 'r' - red
* 'g' - green
* 'c' - cyan
* 'y' - yellow
* 'm' - magenta
* 'k' - black
* 'w' - white

To plot the individual data points, you can specify one of the following characters:
* 'o' - points plotted as circles
* 's' - points plotted as squares
* '^' - points plotted as triangles
* '.' - points plotted as dots

The line style can be given as one of:
* '-'  - continuous line (default)
* '--' - dashed line
* ':'  - dotted line

In [None]:
plt.plot([1,2,8,9], [1,2,4,3], "ro")
plt.show()

You may want to make the plots look better or easier to understand. For starters, the last plot had some data points on the edges of the plot. There is a command called `xlim` that takes in a list of 2 numbers: a **lower bound** for x and an **upper bound** for x, and sets the x axis to these values. The command `ylim` does the same thing for the y axis.

In [None]:
plt.plot([1,2,8,9], [1,2,4,3], "g^--")
plt.ylim([0,5])
plt.show()

All good plots should also have **axis labels**. The commands `xlabel` and `ylabel` do just that. The `title` command provides a **title**.

In [None]:
plt.plot([1,2,8,9], [1,2,4,3], "-ro")
plt.xlim([0,10])
plt.ylim([0,5])
plt.xlabel("x values")
plt.ylabel("y values")
plt.title("A simple plot")
plt.show()

Plots can also have **legends**. The command `plt.legend()` creates one. In Matplotlib, there are two ways to populate it:

1. Pass an argument `label` to the plot command
2. Add a list of names directly to the `legend` command

In [None]:
plt.plot([1,2,8,9], [1,2,4,3], "-ro", label="red line with circles")
plt.xlim([0,10])
plt.ylim([0,5])
plt.xlabel("x values")
plt.ylabel("y values")
plt.title("A simple plot")
plt.legend()
plt.show()

In [None]:
plt.plot([1,2,8,9], [1,2,4,3], "-g^")
plt.xlim([0,10])
plt.ylim([0,5])
plt.xlabel("x values")
plt.ylabel("y values")
plt.title("A simple plot")
plt.legend(["green line with triangles"]) # This command takes in a list of names
plt.show()

#### Plotting Functions

Say we wish to plot the function $y(t) = t^2e^{-t^2}$ for $0 \leq t\leq 3$. To do this, we will **generate a number of equally spaced coordinates** in $t$. Then we can **evaluate** $y(t)$ at each of those points. We will use the `linspace` command from the NumPy module.

In [None]:
import numpy as np # Import the NumPy module

N = 20

# Create N equally spaced points on the interval [0,3]
t = np.linspace(0,3,N)
print(t)

To evaluate the function $y(t)$ at all these points, we can plug this array into $y$.

In [None]:
y = t**2*np.exp(-t**2)
print(y)
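To see what "equally spaced" and "plug this array into $y$" mean in practice, here is a short sketch. It checks that the gap between neighboring points is $(3-0)/(N-1)$ and that the one-line array expression gives the same numbers as a loop. The names `step` and `y_loop` are just for illustration.

```python
import numpy as np

N = 20
t = np.linspace(0, 3, N)

# linspace includes both endpoints, so N points give N - 1 equal gaps
step = (3 - 0) / (N - 1)
print(np.allclose(np.diff(t), step))

# Array expression: one line evaluates y at every element of t
y = t**2 * np.exp(-t**2)

# The same values computed one point at a time with a loop
y_loop = np.array([value**2 * np.exp(-value**2) for value in t])
print(np.allclose(y, y_loop))
```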

To plot $y$ vs $t$ we simply call `plot`:

In [None]:
plt.plot(t,y)
plt.xlabel("t")
plt.ylabel("y(t)")
plt.show()

Alternatively, we don't have to create `y` at all. The **function equation can be evaluated directly in the argument list** of the `plot` command:

In [None]:
plt.plot(t, t**2*np.exp(-t**2))
plt.xlabel("t")
plt.ylabel("y(t)")
plt.show()

#### Multiple Plots in the Same Figure

Oftentimes, we will want **multiple plots** on the **same figure**. This can be done by calling successive plot commands. It should be noted that one can generate plots without specifying a color and matplotlib will choose from a default color sequence.

In [None]:
plt.plot(t, t**2*np.exp(-t**2), 'm-')
plt.plot([1,2,8,9], [0.1,0.2,0.4,0.3], "g^--")
plt.xlabel("t")
plt.ylabel("y(t)")
plt.legend(["function","points"])
plt.show()

#### Scatter Plots

In [None]:
x = np.random.random(100)
y = np.random.random(100)
plt.scatter(x, y**2*np.exp(-y**2))

plt.xlabel("t")
plt.ylabel("y(t)")
plt.title("A basic scatter plot of random data")
plt.show()

Some **optional parameters** for `scatter` include `c = an array of values`, which will **color-code the individual scatter points based on the values given to `c`**, and `alpha = some number`, which controls the **transparency** of the scatter points. Additional useful parameters include `s = some number` for the **size** of the scatter points and `marker = some string character` for changing the **style** of scatter point used.

The `scatter` point styles include:
* 'o' - points plotted as circles
* 's' - points plotted as squares
* '^' - points plotted as triangles
* '.' - points plotted as dots

There are also a wide variety of other symbols that can be chosen.


In [None]:
x = np.random.random(250)
y = np.random.random(250)
z = x**2 * np.exp(-y**2)
plt.scatter(y, z, color='r', marker='+')
plt.scatter(x, z, c=z, s=100.0, alpha=0.25)

plt.xlabel("t")
plt.ylabel("y(t)")
plt.title("Transparent colorful points on top of red crosses")
plt.show()
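When points are color-coded with `c`, a color bar helps the reader decode the colors. The sketch below is one way to add it; the seeded random generator, the choice of `x + y` as the color value, and the name `points` are assumptions made for this example.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # seeded so the figure is repeatable
x = rng.random(50)
y = rng.random(50)

plt.figure()
# c sets the color value, s the size, alpha the transparency, marker the shape
points = plt.scatter(x, y, c=x + y, s=40, alpha=0.6, marker="s")
plt.colorbar(points, label="x + y")   # scale that explains the colors
plt.show()
```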

#### Subplots

Sometimes, we wish to have **multiple individual plots in a single figure**. The `subplot` command lets us create a figure with $r$ rows and $c$ columns of plots.

Calling `plt.subplot(r,c,p)` creates a figure (if it doesn't already exist) containing a tiling of $r$ rows of plots and $c$ columns. Any plotting commands you call will be placed in entry $p$. The plots are numbered from 1 to $r\times c$, starting in the top left and proceeding left to right and then top to bottom.

To switch to another plot, call `plt.subplot` again with the new position. 

In [None]:
plt.subplot(2,1,1)
plt.plot(t, t**2*np.exp(-t**2))
plt.title("Top plot")
plt.xlabel("t")
plt.ylabel("y(t)")

plt.subplot(2,1,2)
plt.plot([1,2,8,9], [1,2,4,3], "-ro")
plt.axis([0, 10, 0, 5])
plt.xlabel("x values")
plt.ylabel("y values")
plt.title("Bottom plot")

# tight_layout() adjusts the spacing so the subplot titles and labels do not overlap
plt.tight_layout()
plt.show()
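The numbering of the entry $p$ can be worked out with a little arithmetic. Here is a small sketch for a hypothetical grid of 2 rows and 3 columns (no plotting involved, just the counting rule described above):

```python
rows, cols = 2, 3   # hypothetical grid size

for p in range(1, rows * cols + 1):
    # divmod on the zero-based position gives (row, column), both zero-based
    row, col = divmod(p - 1, cols)
    print(f"plt.subplot({rows},{cols},{p}) -> row {row + 1}, column {col + 1}")

# Zero-based (row, column) pair for every entry p, in order
positions = [divmod(p - 1, cols) for p in range(1, rows * cols + 1)]
```

Entries 1 to 3 fill the top row from left to right, and entries 4 to 6 fill the bottom row.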

#### Skimage, Imread, and Imshow

Skimage is a **library** that contains **image processing** functions. We will use its `io` module here to read images from a URL:

In [None]:
from skimage import io

When one looks at an image, one is essentially looking at a **table of values organized in rows and columns**. These values describe how the pixels of the image should look. For grayscale (black and white) images, each number represents the brightness of one pixel, giving darker and brighter regions with shades of grey in between. For color images, there is typically a set of 3 numbers for each pixel: the amounts of red, green, and blue (RGB) that are mixed together to create the specific color of that pixel. Some image formats add a fourth number (alpha) for transparency.

We can use a nifty function called `imread` from `io` in Skimage to **obtain the table of values for an image** with `io.imread(string with the image location)`. In the code example below, we read in the data that represents the image we pointed the function to. The `shape` of the resulting array is a tuple that corresponds to (height, width, number of channels). If the number of channels is three, our image has RGB values.

In [None]:
image = io.imread("https://github.com/ag12s/CreateWithCodeModules/blob/main/images/binary.jpg?raw=True")
print(image.shape)

We now have the image data from `imread`, but what about **re-visualizing the image**? We can use the `imshow` command from `plt` for that exact purpose!

The easiest thing to do is simply `plt.imshow(table of data)` to reveal the image whose data we previously read with `io.imread()`:

In [None]:
plt.imshow(image)
plt.show()
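To get a feel for the "table of values" idea without downloading anything, we can build tiny images by hand. This is only a sketch with made-up 4 x 6 pixel arrays named `gray` and `rgb`:

```python
import numpy as np

# Grayscale: one brightness value per pixel -> shape (height, width)
gray = np.zeros((4, 6), dtype=np.uint8)

# Color: three values (red, green, blue) per pixel -> shape (height, width, 3)
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[:, :3, 0] = 255   # left half: red channel at full strength
rgb[:, 3:, 2] = 255   # right half: blue channel at full strength

print(gray.shape)
print(rgb.shape)
print(rgb[0, 0], rgb[0, 5])   # top-left pixel and top-right pixel
```

Passing `rgb` to `plt.imshow` should display a red block next to a blue block.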

# Pandas

Let's learn about a Python package called **Pandas**!

To make sure we have it installed, first run a *pip* command:

In [None]:
!pip install pandas

Then, we import the Pandas package (and NumPy, since we will use that, too):

In [None]:
import pandas as pd
import numpy as np

We import Pandas with the nickname "*pd*" because we may have to type the package name frequently while using it. Typing *pd* is just shorter than typing *pandas* every time, so this is something most users do.

#### Creating DataFrames

Let's create a **DataFrame**! To do that, we will first create arrays with some sample data about Marvel Cinematic Universe characters.

In [None]:
# Create an array with the names of characters
data1 = np.array(['Spider-Man','Loki','Hulk','Black Widow','Thor','Nick Fury','Iron Man'])

# Create an array with the number of MCU movies each character appears in
data2 = np.array([6,6,8,8,8,10,11])

# Create an array with the year each character first appeared in a MCU movie
data3 = np.array([2016,2011,2003,2010,2011,2008,2008])

# Create an array with their planet of origin
data4 = np.array(['Earth','Jotunheim','Earth','Earth','Asgard','Earth','Earth'])

In [None]:
# Combine the arrays into a DataFrame and assign category names
mcu_data = {'Name': data1, 'Movies': data2, 'Year': data3, 'Origin': data4}
df = pd.DataFrame(data = mcu_data)
df

#### Displaying DataFrames

Typing *df* on the last line of a cell makes the notebook display the DataFrame as a formatted table. We can also write *print(df)*, but this produces plain text and will look different.

In [None]:
print(df)

To **preview** our DataFrame without outputting the entire thing, we use the command *head()*. This is very useful when we have a large DataFrame!

In [None]:
# Print the first 5 rows of the DataFrame
df.head()

To refer to a specific **column** in the DataFrame, we type the DataFrame's variable name and the column name:

In [None]:
df['Name']

In [None]:
# There is a difference between single and double brackets: single brackets return a Series, double brackets return a DataFrame
df[['Origin']]

We can also reference **multiple columns** at once:

In [None]:
df[['Name','Movies']]
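Here is a quick sketch of the bracket difference, using a small stand-in DataFrame called `demo` (a hypothetical subset of the data above, so that `df` is left alone):

```python
import pandas as pd

# Small stand-in DataFrame (hypothetical subset of the data above)
demo = pd.DataFrame({'Name': ['Loki', 'Thor'], 'Movies': [6, 8]})

single = demo['Name']      # one column name -> Series
double = demo[['Name']]    # list of column names -> DataFrame

print(type(single).__name__, single.shape)
print(type(double).__name__, double.shape)
```

A Series is one-dimensional, while the DataFrame keeps its column header even when it has only one column.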

If the column name is a single word (a valid Python name that does not clash with an existing DataFrame attribute), we can also do the following:

In [None]:
df.Name

The above code returns a Series. If we would rather have a NumPy array, we can do:

In [None]:
df.Name.values

To get all the column names, do *df.columns*. This also allows us to **set or change** the **column names**.

In [None]:
df.columns

And if we want to know the **data type** of each column, do:

In [None]:
df.dtypes

#### Editing and Manipulating DataFrames

In [None]:
# Example changing the column names
df.columns = ['First','second','3rd','fourth']
df.head()

In [None]:
# Changing the names back to how they were
df.columns = ['Name','Movies','Year','Origin']
df.head()

We can get **statistics** from our DataFrame such as the average, minimum, and maximum:

In [None]:
print('Average of numeric columns:\n', df.mean(numeric_only = True), '\n')
print('Largest number of movies:', df['Movies'].max(), '\n')
print('Earliest year:', df['Year'].min(), '\n')

We can also **add data** to our DataFrame after it has already been created:

In [None]:
# Create a new variable to hold the information we want to add
new_mcu = {'Name': 'Captain America', 'Movies': 11, 'Year': 2011, 'Origin': 'Earth'}

# Add the information to our DataFrame (DataFrame.append() was removed in pandas 2.0, so we use pd.concat())
df = pd.concat([df, pd.DataFrame([new_mcu])], ignore_index = True)

# Print the new DataFrame
df

And we can **delete** rows or columns of data:

In [None]:
# Delete the row with index 4 (drop() returns a new DataFrame; df itself is unchanged)
df.drop(4)

In [None]:
# Delete a column (axis = 1 means columns; again, df itself is unchanged)
df.drop('Origin', axis = 1)
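Note that `drop` hands back a new DataFrame rather than changing the original. A minimal sketch with a hypothetical stand-in DataFrame called `demo`:

```python
import pandas as pd

demo = pd.DataFrame({'Name': ['Loki', 'Thor', 'Hulk'], 'Movies': [6, 8, 8]})

without_row = demo.drop(1)                   # new DataFrame without index 1
without_col = demo.drop('Movies', axis = 1)  # new DataFrame without a column

print(demo.shape)          # the original is untouched
print(without_row.shape)
print(without_col.shape)

demo = demo.drop(1)        # assign the result back to keep the change
print(demo.shape)
```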

We can also **fill in missing data**! For example:

In [None]:
# Adding incomplete data to the DataFrame
new_mcu = {'Name': 'Happy Hogan', 'Year': 2008, 'Origin': 'Earth'}

# Add the information to our DataFrame (the missing 'Movies' value becomes NaN)
df = pd.concat([df, pd.DataFrame([new_mcu])], ignore_index = True)

# Print the new DataFrame
df

In [None]:
# Let's use fillna() to replace NaN values with the average (mean) of the other values in the column
# (the filled DataFrame is displayed but not saved, so df itself still contains NaN)
movies_mean = df['Movies'].mean()

df.fillna(value = movies_mean)
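Calling `fillna` with a single value fills every NaN in the whole DataFrame. If we only want to fill one column, one option is to pass a dictionary that maps column names to fill values. A small sketch with a hypothetical stand-in DataFrame called `demo`:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({'Name': ['Loki', 'Thor', 'Happy Hogan'],
                     'Movies': [6, 8, np.nan]})

movies_mean = demo['Movies'].mean()             # mean() skips NaN by default
filled = demo.fillna({'Movies': movies_mean})   # fill only the 'Movies' column
print(filled)
```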

Or, we can **delete rows** with **incomplete data**:

In [None]:
df = df.dropna()

df

What if we want to get rid of **duplicate data**? Let's try an example!

In [None]:
# Let's add a Loki duplicate:
new_mcu = {'Name': 'Loki', 'Movies': 6, 'Year': 2011, 'Origin': 'Jotunheim'}

# Add the information to our DataFrame (DataFrame.append() was removed in pandas 2.0, so we use pd.concat())
df = pd.concat([df, pd.DataFrame([new_mcu])], ignore_index = True)

df

In [None]:
# Right now, we can check how many unique names we have using nunique():
df['Name'].nunique()

In [None]:
# Now, let's drop the duplicate data (prune the variant?!)
df = df.drop_duplicates()

df

Dropping duplicates is a useful tool when, for example, you accidentally append a new row to your DataFrame twice. This way, you do not need to recreate the DataFrame or specify what row you want to drop (using *df.drop()*).
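By default, two rows only count as duplicates when every column matches. If rows should count as duplicates when just some columns match, `drop_duplicates` accepts a `subset` argument. A sketch with a hypothetical stand-in DataFrame called `demo`:

```python
import pandas as pd

# The two Loki rows differ in the 'Movies' column
demo = pd.DataFrame({'Name': ['Loki', 'Thor', 'Loki'],
                     'Movies': [6, 8, 7]})

print(len(demo.drop_duplicates()))                    # compares every column
print(len(demo.drop_duplicates(subset = ['Name'])))   # compares 'Name' only
```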

#### Filtering Data

We can **filter** our DataFrame by specific values or attributes. For example, let's select only the characters from Earth:

In [None]:
df.loc[df['Origin'] == 'Earth']

Or all characters whose first movie came out after 2009:

In [None]:
# Get all first movies more recent than 2009
df.loc[df['Year'] > 2009]

We can also do this without *loc*:

In [None]:
df[df['Year'] > 2009]

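There is also an indexer called *iloc* that works like *loc*, but selects rows by their integer **position** instead of by label or condition! We can use this to make the earlier duplicate example a bit easier, since we can copy an existing row instead of retyping it:

In [None]:
# pd.concat() replaces DataFrame.append(), which was removed in pandas 2.0
pd.concat([df, df.iloc[[0]]])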
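The difference between position and label is easiest to see when the index labels are not 0, 1, 2. In this sketch the stand-in DataFrame `demo` and its labels 10, 20, 30 are made up for illustration:

```python
import pandas as pd

demo = pd.DataFrame({'Name': ['Loki', 'Thor', 'Hulk']}, index = [10, 20, 30])

by_position = demo.iloc[0]     # first row, whatever its label is
by_label = demo.loc[10]        # row whose index label is 10
first_as_frame = demo.iloc[[0]]   # double brackets keep it a DataFrame

print(by_position['Name'], by_label['Name'])
print(first_as_frame.shape)
```

Here `demo.loc[0]` would raise a `KeyError`, because no row carries the label 0.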

To get the **number** of characters whose first movie was after 2009 rather than the list of them, take the first entry of the *shape* of the filtered DataFrame, which is its number of rows:

In [None]:
df[df['Year'] > 2009].shape[0]

And we can have **multiple conditions**! Let's find entries for characters that have first movies before 2010 and no more than 8 movie appearances:

In [None]:
df[( df.Year < 2010 ) & (df.Movies <= 8 )]
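Besides `&` (and), conditions can be combined with `|` (or) and flipped with `~` (not). Each condition needs its own parentheses or its own variable. A sketch with a hypothetical stand-in DataFrame called `demo`:

```python
import pandas as pd

demo = pd.DataFrame({'Name': ['Loki', 'Nick Fury', 'Hulk', 'Captain America'],
                     'Movies': [6, 10, 8, 11],
                     'Year': [2011, 2008, 2003, 2011]})

before_2010 = demo['Year'] < 2010   # one True/False value per row
few_movies = demo['Movies'] <= 8

both = demo[before_2010 & few_movies]         # AND: both conditions hold
either = demo[before_2010 | few_movies]       # OR: at least one holds
neither = demo[~(before_2010 | few_movies)]   # NOT: no condition holds

print(both['Name'].tolist())
print(either['Name'].tolist())
print(neither['Name'].tolist())
```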

#### Sorting Values and Indices

Maybe we want to **sort** the values somehow. Let's try sorting by year:

In [None]:
df = df.sort_values(by = ['Year'])

df

Notice in the DataFrame above that the indices are now **out of order**. To fix this, we can **reset** the **index values**:

In [None]:
df = df.reset_index(drop=True)

df

#### Combining/Adding Information to DataFrames

We can also **combine** DataFrames in a couple different ways. Below, let's try:

1.   **Concatenating** two DataFrames
2.   **Merging** two DataFrames

In [None]:
# To concatenate DataFrames, we need to have at least two DataFrames. Let's make a second MCU DataFrame!

data1 = np.array(['Falcon','Winter Soldier','Pepper Potts'])
data2 = np.array([6,8,7])
data3 = np.array([2014,2011,2008])
data4 = np.array(['Earth','Earth','Earth'])

mcu_data = {'Name': data1, 'Movies': data2, 'Year': data3, 'Origin': data4}
df2 = pd.DataFrame(data = mcu_data)
df2

In [None]:
# Now, we use concat() to combine the two DataFrames (df and df2)
df = pd.concat([df, df2]).reset_index()
df

In [None]:
# What if we have new information to add to our DataFrame for existing characters? Let's make some sample information!

# Create an array with the names of characters
data1 = np.array(['Spider-Man','Loki','Hulk','Black Widow','Thor','Nick Fury','Iron Man','Falcon','Winter Soldier','Pepper Potts'])

# Create an array with whether the character has super strength
data2 = np.array([True,True,True,False,True,False,False,False,True,False])

mcu_data = {'Name': data1, 'Super Strength': data2}
df3 = pd.DataFrame(data = mcu_data)
df3

In [None]:
# Use merge() to add the above data to the correct rows
df = pd.merge(df,df3,how = 'inner', on = 'Name')
df = df.drop(['index'],axis=1)
df
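The `how` argument decides what happens to rows that have no match in the other DataFrame. With `'inner'`, such rows are left out; with `'left'`, every row of the first DataFrame is kept and the missing values become NaN. A sketch with two hypothetical stand-in DataFrames, `left` and `right`:

```python
import pandas as pd

left = pd.DataFrame({'Name': ['Loki', 'Thor', 'Captain America'],
                     'Movies': [6, 8, 11]})
right = pd.DataFrame({'Name': ['Loki', 'Thor'],
                      'Super Strength': [True, True]})

inner = pd.merge(left, right, how = 'inner', on = 'Name')      # names found in both
keep_left = pd.merge(left, right, how = 'left', on = 'Name')   # every row of left

print(len(inner))
print(len(keep_left))
print(keep_left['Super Strength'].isna().sum())   # rows with no match in right
```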

We can also **add a column** to the DataFrame by giving it a **name**!

In [None]:
df['Cool Character'] = True

df

#### Math with DataFrame Data

We can also do **math** with data stored in columns! Let's find the number of years since each character's first movie:

In [None]:
# The number of years since each character's first movie will be the current year minus the movie year
df['Years Since First Movie'] = 2021 - df['Year']
df

In [None]:
# We can also do math between two or more columns. Let's try finding the average number of movies per year for each character since their first movie until now:
df['Average Per Year'] = df['Movies'] / df['Years Since First Movie']
df

#### Iterating Over DataFrames

Lastly, let's go over how to **iterate** (or loop) through the rows in your DataFrame using iterrows():

In [None]:
for index, row in df.iterrows():
  print(row['Name'], '-->', row['Origin'])

#### Plotting with DataFrames

To **plot** the values in our DataFrame:

In [None]:
df.plot(kind = 'scatter', x = 'Year', y = 'Movies', title = 'First Appearance vs. Total Number of Appearances', color = 'b')

If we want to **color-code** the plot by planet of origin, we can split our DataFrame into **groups**. However, this will mean plotting with **Matplotlib**'s PyPlot instead of Pandas, so we must import this package:

In [None]:
import matplotlib.pyplot as plt

# Pandas plotting is built on top of Matplotlib, so using Matplotlib directly gives us some more options

In [None]:
# Group the DataFrame by planet of origin
groups = df.groupby("Origin")

# Plot each group on the same figure
for name, group in groups:
    plt.plot(group['Year'], group['Movies'], marker="o", linestyle="", label = name, alpha = 0.5)

# Show the legend and axis titles
plt.legend()
plt.ylabel('Movies')
plt.xlabel('Year')
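Groups are useful beyond plotting. Looping over a `groupby` result gives one (name, sub-DataFrame) pair per group, and we can also compute a statistic for each group directly. A sketch with a hypothetical stand-in DataFrame called `demo`:

```python
import pandas as pd

demo = pd.DataFrame({'Name': ['Loki', 'Thor', 'Hulk', 'Iron Man'],
                     'Movies': [6, 8, 8, 11],
                     'Origin': ['Jotunheim', 'Asgard', 'Earth', 'Earth']})

# Each pass through the loop gives the group name and its rows
for name, group in demo.groupby('Origin'):
    print(name, len(group))

# Average number of movies for each planet of origin
mean_movies = demo.groupby('Origin')['Movies'].mean()
print(mean_movies)
```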

#### Saving and Loading DataFrames

We can also use Pandas to **save our DataFrame** to a CSV file and **load it** back in later!

In [None]:
# Write data to a CSV file
df.to_csv('mcu_data.csv', index = False)

In [None]:
# Check what's in current directory
! ls

In [None]:
# Read data from a CSV file
mcu_df = pd.read_csv('mcu_data.csv')

mcu_df
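To peek at what a CSV file actually contains, we can write to an in-memory text buffer instead of a file. This is only a sketch; the stand-in DataFrame `demo` and the use of `StringIO` are choices made for this example.

```python
from io import StringIO

import pandas as pd

demo = pd.DataFrame({'Name': ['Loki', 'Thor'], 'Movies': [6, 8]})

buffer = StringIO()                  # behaves like a text file kept in memory
demo.to_csv(buffer, index = False)   # index = False leaves out the row numbers
csv_text = buffer.getvalue()
print(csv_text)

buffer.seek(0)                       # rewind before reading the text back in
loaded = pd.read_csv(buffer)
print(loaded)
```

The first line of the CSV text holds the column names, and each following line holds one row with its values separated by commas.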

#### Pandas Closing Remarks

Lastly, know that there are a ton of Pandas functions you can use! Very few people know all of them or know how to use them off the top of their heads, and most people will need to look them up. I use Pandas frequently, and I am constantly looking up which function to use or certain ways to use it! To do this, I reference the documentation, found [here](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html).

The list on the left has every function you might need. If you are not clear on how to use a function to fit your needs, don't be afraid to look it up! Sometimes, you will have to get creative depending on the complexity of what you are working on. Chances are someone else has posted the solution you need!