8 sections and 150 minutes means ~19 minutes per section, but some will take less time. 

Plus, allow some time at the start to get people logged on and allocated a job.

# Use variables to store values

**Variables are names for values**

Variable names: 
* can **only** contain letters, digits, and underscore _ (typically used to separate words in long variable names)
* cannot start with a digit
* are **case sensitive** (age, Age and AGE are three different variables)
* should be meaningful, so that you or another programmer can tell what the variable holds

In Python, variable names that start with underscores, like `__alistairs_real_age`, have a special meaning, so we won’t use them until we understand the convention.

In Python, the `=` symbol assigns the value on the right to the name on the left.

The variable is created when a value is assigned to it.

Here, Python assigns an age to a variable `age` and a name in quotes to a variable `first_name`.

In [None]:
age = 42
first_name = 'Ahmed'

# Use `print` to display values

Python has a built-in function called `print` that prints things as text. Call the function (i.e., tell Python to run it) by using its name. Provide values to the function (i.e., the things to print) in parentheses. To add a string to the printout, wrap the string in single or double quotes. The values passed to the function are called **arguments**.

In [None]:
print(first_name, 'is', age, 'years old.')

`print` automatically puts a single space between items to separate them, and moves to a new line at the end.

# Python is case-sensitive

* Python thinks that upper- and lower-case letters are different, so `Name` and `name` are different variables.
* There are conventions for using upper-case letters at the start of variable names, so we will use lower-case letters for now.

# Use meaningful variable names

* Python doesn’t care what you call variables as long as they obey the rules (alphanumeric characters and the underscore).
* Use meaningful variable names to help other people understand what the program does.
* The most important “other person” is your future self.

In [None]:
flabadab = 42
ewr_422_yY = 'Ahmed'
print(ewr_422_yY, 'is', flabadab, 'years old')

# Variables must be created before they are used

If a variable doesn’t exist yet, or if the name has been mis-spelled, Python reports an error. (Unlike some languages, which “guess” a default value.)

In [None]:
print(last_name)

The last line of an error message is usually the most informative.

# Variables persist between cells

Be aware that in a Jupyter notebook it is the order in which cells are executed that matters, not the order in which they appear. Python remembers everything that was run previously, including any variables you have defined, regardless of where those cells sit in the notebook. So if you define variables lower down the notebook and then (re)run cells further up, the variables defined further down will still exist. As an example, create two cells with the following content, in this order:

In [None]:
print(myval)

In [None]:
myval = 1

If you execute this in order, the first cell will give an error. However, if you run the first cell *after* the second cell it will print out `1`. To prevent confusion, it can be helpful to use the **Kernel -> Restart & Run All** option which clears the interpreter and runs everything from a clean slate going top to bottom.

# Variables can be used in calculations

We can use variables in calculations just as if they were values. Remember, we assigned the value 42 to `age` a few lines ago.

In [None]:
print(age)

In [None]:
age = age + 3
print('Age in three years:', age)

# Use an index to get a single character from a string

In Python, a string is a sequence of characters enclosed within single quotes ('...') or double quotes ("..."). Strings are used to represent text data and are immutable, meaning that once a string is created, it cannot be changed.

* The characters (individual letters, numbers, and so on) in a string are ordered. For example, the string 'AB' is not the same as 'BA'. Because of this ordering, we can treat the string as a list of characters. 
* Each position in the string (first, second, etc.) is given a number. This number is called an index or sometimes a subscript.
* Indices are numbered from 0. Use the position’s index in square brackets to get the character at that position.

![Diagram showing string indexing for word helium.](screenshots/helium.png 'index')

In [None]:
atom_name = 'helium'
print(atom_name[0])
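A short sketch that reads a few more positions from the same string (the variable names `first`, `third` and `last` are illustrative). Counting starts at 0, so a six-character word has indices 0 to 5:

```python
atom_name = 'helium'
first = atom_name[0]   # 'h': index 0 is the first character
third = atom_name[2]   # 'l': index 2 is the third character
last = atom_name[5]    # 'm': a six-character string has indices 0 to 5
print(first, third, last)  # h l m
```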

# Use a slice to get a substring

* A part of a string is called a **substring**. A substring can be as short as a single character.
* An item in a list is called an element. Whenever we treat a string as if it were a list, the string’s elements are its individual characters.
* A slice is a part of a string (or, more generally, a part of any list-like thing).
* We take a slice with the notation [start:stop], where start is the integer index of the first element we want and stop is the integer index of the element *just after* the last element we want.
* The difference between stop and start is the slice’s length.
* Taking a slice does not change the contents of the original string. Instead, taking a slice returns a copy of part of the original string.

In [None]:
atom_name = 'sodium'
print(atom_name[0:3])
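A small sketch of the slice rules above, using a start index other than 0 (the name `piece` is illustrative). The slice length is `stop - start`, and the original string is left unchanged:

```python
atom_name = 'sodium'
piece = atom_name[1:4]   # indices 1, 2 and 3: stop - start = 3 characters
print(piece)             # odi
print(atom_name)         # sodium: slicing returns a copy, the original is unchanged
```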

# Use the built-in function len to find the length of a string

When one function call is nested inside another, as with `len` inside `print` in the cell below, the calls are evaluated from the inside out, like in mathematics.

In [None]:
print(len('helium'))
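A sketch combining `len` with indexing and slicing (the names `n` and `last` are illustrative). Because indices start at 0, the last valid index is always one less than the length:

```python
atom_name = 'helium'
n = len(atom_name)           # 6
last = atom_name[n - 1]      # indices run from 0 to n - 1, so this is 'm'
print(n, last)               # 6 m
print(len(atom_name[0:3]))   # the inner slice is evaluated first, then len: 3
```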

## Exercise 2:

If you assign `a = 123`, what happens if you try to get the second digit of `a` via `a[1]`?

In [None]:
a = 123
print(a[1])

*Solution:*

Numbers are not strings or sequences and Python will raise an error if you try to perform an index operation on a number. In the next lesson on types and type conversion we will learn more about types and how to convert between different types. If you want the Nth digit of a number you can convert it into a string using the `str` built-in function and then perform an index operation on that string.

In [None]:
a = str(123)
print(a[1])

## Exercise 3:

Which is a better variable name, `m`, `min`, or `minutes`? Why? Hint: think about which code you would rather inherit from someone who is leaving the lab:

```python
ts = m * 60 + s
tot_sec = min * 60 + sec
total_seconds = minutes * 60 + seconds
```

*Solution:*

`minutes` is better because `min` might mean something like “minimum” (and actually is an existing built-in function in Python that we will cover later).

## Exercise 5:

Given the following string:

`species_name = "Acer macrophyllum"`

What would these expressions return (try to predict the result before running the code yourself)?

```python
species_name[2:8]
species_name[11:] #(without a value after the colon)
species_name[:4] #(without a value before the colon)
species_name[:] #(just a colon)
species_name[11:-3]
species_name[-5:-3]
# What happens when you choose a stop value which is out of range? (i.e., try `species_name[0:20]` or `species_name[:103]`)
```

In [None]:
species_name = "Acer macrophyllum"

Use the following cells to try it out:

In [None]:
species_name[2:8]

In [None]:
species_name[11:]

In [None]:
species_name[:4]

In [None]:
species_name[:]

In [None]:
species_name[11:-3]

In [None]:
species_name[-5:-3]

In [None]:
species_name[0:20]

In [None]:
species_name[:103]

## Exercise 8: 

Using what you learned about slicing in Exercise 5 (especially negative indices), explain what `element[1:-1]` does when `element = 'oxygen'`.

*Solution*

It creates a substring from index 1 up to (but not including) the final index, which removes the first and last characters: for `'oxygen'` the result is `'xyge'`.

In [None]:
element = 'oxygen'
print(element[1:-1])

## Exercise 9: 

How can we rewrite the slice for getting the last three characters of a string, so that it works even if we assign a different string? Test your solution with the following strings: carpentry, clone, hi.

*Solution*

In [None]:
element = 'carpentry'
print(element[-3:])

element = 'clone'
print(element[-3:])

element = 'hi'
print(element[-3:])

# TIME CHECK!

# Every value has a type

* Every value in a program has a specific type.
* Integer (`int`): represents positive or negative whole numbers like 3 or -512.
* Floating point number (`float`): represents real numbers like 3.14159 or -2.5.
* Character string (usually called “string”, `str`): text.
    * Written in either single quotes or double quotes (as long as they match).
    * The quote marks aren’t printed when the string is displayed.
    
# Use the built-in function `type` to find the type of a value

* Use the built-in function `type` to find out what type a value has.
* Works on variables as well.
    * But remember: the value has the type — the variable is just a label.

In [None]:
print(type(52))

In [None]:
fitness = 'average'
print(type('average'))
print(type(fitness))
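A small sketch of “the value has the type, the variable is just a label”: the same (illustrative) name `label` can refer to values of different types, and `type` always reports the type of the value it currently refers to:

```python
label = 42
print(type(label))    # <class 'int'>
label = 'forty-two'   # the same name now refers to a different value...
print(type(label))    # <class 'str'>  ...so type reports the new value's type
```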

# Types control what operations (or methods) can be performed on a given value

A value’s type determines what the program can do to it.

In [None]:
print(5 - 3)

In [None]:
print('hello' - 'h')

# You can use the “+” and “*” operators on strings

“Adding” character strings concatenates them.

In [None]:
full_name = 'Ahmed' + ' ' + 'Walsh'
print(full_name)

Multiplying a character string by an integer *N* creates a new string that consists of that character string repeated *N* times.
* This is because multiplication is repeated addition: `'=' * 3` is the same as `'=' + '=' + '='`.

In [None]:
separator = '=' * 10
print(separator)

In [None]:
print(fitness * 10)

# Strings have a length (but numbers don’t)

The built-in function `len` counts the number of characters in a string.

In [None]:
print(len(full_name))

But numbers don’t have a length (not even zero).

In [None]:
print(len(52))

# Must convert numbers to strings or vice versa when operating on them

Cannot add numbers and strings.

In [None]:
print(1 + '2')

* Not allowed because it’s ambiguous: should 1 + '2' be 3 or '12'?
* Some types can be converted to other types by using the type name as a function.

In [None]:
print(1 + int('2'))
print(str(1) + '2')
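A few more conversions, as a sketch (the variable names are illustrative). Note that `int` applied to a float drops the fractional part instead of rounding:

```python
text_to_float = float('3.5') + 1   # 4.5
float_to_int = int(3.9)            # 3: int() drops the fractional part rather than rounding
number_to_text = str(2.5) + '!'    # '2.5!'
print(text_to_float, float_to_int, number_to_text)  # 4.5 3 2.5!
```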

# Can mix integers and floats freely in operations

Integers and floating-point numbers can be mixed in arithmetic. Python 3 automatically converts integers to floats as needed.

In [None]:
print('half is', 1 / 2.0)
print('three squared is', 3.0 ** 2)
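A sketch of how the result type is chosen (variable names are illustrative). In Python 3, `/` always produces a float, mixing an `int` with a `float` gives a `float`, and combining two `int`s with `*` stays an `int`:

```python
half = 1 / 2           # 0.5: / always produces a float in Python 3, even for two ints
mixed = 1 + 2.0        # 3.0: the int 1 is converted to a float before adding
product = 3 * 2        # 6: int combined with int stays an int
print(half, type(mixed), type(product))  # 0.5 <class 'float'> <class 'int'>
```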

# Variables only change value when something is assigned to them

If we make one cell in a spreadsheet depend on another, and update the latter, the former updates automatically.
This does **not** happen in programming languages.

In [None]:
variable_one = 1
variable_two = 5 * variable_one
variable_one = 2
print('first is', variable_one, 'and second is', variable_two)

* The computer reads the value of `variable_one` when doing the multiplication, creates a new value, and assigns it to `variable_two`.
* Afterwards, `variable_two` holds that new value and does not depend on `variable_one`, so its value does not change automatically when `variable_one` changes.
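If you do want `variable_two` to reflect the new value, you have to recompute it yourself. A minimal sketch:

```python
variable_one = 1
variable_two = 5 * variable_one
variable_one = 2
variable_two = 5 * variable_one   # recompute explicitly after the change
print('first is', variable_one, 'and second is', variable_two)  # first is 2 and second is 10
```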

## Exercise 1:

What type of value is 3.4? How can you find out?

*Solution: It is a floating-point number (often abbreviated “float”). It is possible to find out by using the built-in function `type()`.*

In [None]:
print(type(3.4))

# TIME CHECK!

# Built-in Functions and Help

Python is a versatile programming language that comes with a rich set of built-in features. Its comprehensive standard library supports a wide range of tasks, such as file I/O, system calls, and Internet protocols. Python is dynamically typed and manages memory automatically, so you do not have to declare variable types or free memory yourself. It supports several programming paradigms, including procedural, object-oriented, and functional programming. It also includes data structures such as lists, dictionaries, and sets, and provides modules for regular expressions, mathematics, and date/time operations. Its interactive interpreter, built-in debugger, and extensive documentation make it accessible to beginners and experienced developers alike.


# A function may take zero or more arguments

* We have seen some functions already — now let’s take a closer look.
* An argument is a value passed into a function.
* `len` takes exactly one.
* `int`, `str`, and `float` create a new value from an existing one.
* `print` takes zero or more.
* `print` with no arguments prints a blank line.
    * Must always use parentheses, even if they’re empty, so that Python knows a function is being called.

In [None]:
print('before')
print()
print('after')

# Every function returns something

* Every function call produces some result.
* If the function doesn’t have a useful result to return, it usually returns the special value `None`. `None` is a Python object that stands in anytime there is no value.

In [None]:
result = print('example')
print('result of print is', result)

# Commonly-used built-in functions include max, min, and round

* Use `max` to find the largest value of one or more values.
* Use `min` to find the smallest.
* Both work on character strings as well as numbers.
    * Strings are compared character by character: digits (0-9) count as smaller than uppercase letters (A-Z), which count as smaller than lowercase letters (a-z).

In [None]:
print(max(1, 2, 3))
print(min('a', 'A', '0'))
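A few more comparisons, as a sketch (the variable names are illustrative). Words are compared letter by letter, and `max` also accepts a single list of values:

```python
fruit = max('apple', 'banana', 'cherry')  # 'cherry' comes last alphabetically
symbol = min('a', 'A', '0')               # '0': digits sort before upper- and lowercase letters
largest = max([4, 9, 1])                  # a single list of values works too
print(fruit, symbol, largest)             # cherry 0 9
```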

# Functions may only work for certain (combinations of) arguments

* `max` and `min` must be given at least one argument.
    * “Largest of the empty set” is a meaningless question.
* And they must be given things that can meaningfully be compared.

In [None]:
print(max(1, 'a'))

# Functions may have default values for some arguments

`round` rounds off a floating-point number. By default, it rounds to zero decimal places and returns an integer.

In [None]:
round(3.712)

We can specify the number of decimal places we want.

In [None]:
round(3.712, 1)
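One behaviour of `round` worth knowing: when a value is exactly halfway between two candidates, Python rounds to the even one (“round half to even”). A sketch with illustrative variable names:

```python
default_places = round(3.712)   # 4: with no second argument, round returns an int
one_place = round(3.712, 1)     # 3.7
half_down = round(2.5)          # 2: exact halves round to the nearest even number
half_up = round(3.5)            # 4
print(default_places, one_place, half_down, half_up)  # 4 3.7 2 4
```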

# Functions attached to objects are called methods

* Functions take another form that will be common in our section focused on `pandas`.
* Methods have parentheses like functions, but are called with a dot after the object they belong to (e.g., `my_string.swapcase()`).
* Some methods are used for internal Python operations, and their names are surrounded by double underscores (e.g., `__len__`).

In [None]:
my_string = 'Hello world!'  # creation of a string object 

print(len(my_string))       # the len function takes a string as an argument and returns the length of the string

print(my_string.swapcase()) # calling the swapcase method on the my_string object

print(my_string.__len__())  # calling the internal __len__ method on the my_string object, used by len(my_string)

You might even see them chained together. They operate left to right.

In [None]:
print(my_string.isupper())          # Not all the letters are uppercase
print(my_string.upper())            # This capitalizes all the letters

print(my_string.upper().isupper())  # Now all the letters are uppercase

# Use the built-in function help to get help for a function

Every built-in function has online documentation.

In [None]:
help(round)

# The Jupyter Notebook has two ways to get help

**Option 1:** Place the cursor near where the function is invoked in a cell (i.e., on the function name or its parameters), then:
* Hold down `Shift`, and press `Tab`.
* Do this several times to expand the information returned.

**Option 2:** Type the function name in a cell with a question mark after it. Then run the cell.

In [None]:
# place cursor after the 'd' below and use the shift + tab option
round

In [None]:
# question mark option:
round?

# Python reports a syntax error when it can’t understand the source of a program

Won’t even try to run the program if it can’t be parsed.

In [None]:
# Forgot to close the quote marks around the string.
name = 'Feng

In [None]:
# An extra '=' in the assignment.
age = = 52

Look more closely at the error message:

In [None]:
print("hello world"

* The message indicates a problem on the first line of the input (“line 1”).
* The last line indicates incomplete input.

# Python reports a runtime error when something goes wrong while a program is executing

In [None]:
age = 53
remaining = 100 - aege # mis-spelled 'age'

Fix syntax errors by reading the source and runtime errors by tracing execution.

# TIME CHECK!

# A list stores many values in a single structure

* Doing calculations with a hundred variables called `pressure_001`, `pressure_002`, etc., would be at least as slow as doing them by hand.
* Use a *list* to store many values together.
    * Contained within square brackets `[...]`.
    * Values separated by commas `,`.
* Use `len` to find out how many values are in a list.

In [None]:
pressures = [0.273, 0.275, 0.277, 0.275, 0.276]
print('pressures:', pressures)
print('length:', len(pressures))

# Use an item’s index to fetch it from a list
Just like strings.

In [None]:
print('zeroth item of pressures:', pressures[0])
print('fourth item of pressures:', pressures[4])

# Lists’ values can be replaced by assigning to them

Use an index expression on the left of assignment to replace a value.

In [None]:
pressures[0] = 0.265
print('pressures is now:', pressures)

# Appending items to a list lengthens it
Use `list_name.append` to add items to the end of a list.

In [None]:
primes = [2, 3, 5]
print('primes is initially:', primes)
primes.append(7)
print('primes has become:', primes)

* `append` is a ***method*** of lists.
    * Like a function, but tied to a particular object.
* Use `object_name.method_name` to call methods.
    * Deliberately resembles the way we refer to things in a library.
* We will meet other methods of lists as we go along.
* Use `help(list)` for a preview.
* `extend` is similar to `append`, but it allows you to combine two lists. For example:

In [None]:
teen_primes = [11, 13, 17, 19]
middle_aged_primes = [37, 41, 43, 47]
print('primes is currently:', primes)
primes.extend(teen_primes)
print('primes has now become:', primes)
primes.append(middle_aged_primes)
print('primes has finally become:', primes)

Note that while `extend` maintains the “flat” structure of the list, appending a list to a list means the last element in `primes` will itself be a list, not an integer. Lists can contain values of any type; therefore, lists of lists are possible.
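A side-by-side sketch of the difference (the names `nested` and `flat` are illustrative). With `append`, the whole list becomes a single element; with `extend`, its items are added one by one:

```python
nested = [2, 3]
nested.append([5, 7])       # adds ONE element, which is itself a list
flat = [2, 3]
flat.extend([5, 7])         # adds each item of [5, 7] separately
print(nested, len(nested))  # [2, 3, [5, 7]] 3
print(flat, len(flat))      # [2, 3, 5, 7] 4
print(nested[2][0])         # 5: index into the inner list
```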

# Use `del` to remove items from a list entirely

* We use `del list_name[index]` to remove an element from a list (in the example, 9 is not a prime number) and thus shorten it.
* `del` is not a function or a method, but a statement in the language.

In [None]:
primes = [2, 3, 5, 7, 9]
print('primes before removing last item:', primes)
del primes[4]
print('primes after removing last item:', primes)

# The empty list contains no values
Use `[]` on its own to represent a list that doesn’t contain any values.
* “The zero of lists.”
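A common use of the empty list is as a starting point that you fill with `append`. A sketch (the name `observations` and its values are illustrative):

```python
observations = []                       # the empty list
print(len(observations))                # 0
observations.append(0.273)
observations.append(0.275)
print(observations, len(observations))  # [0.273, 0.275] 2
```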

# Lists may contain values of different types

A single list may contain numbers, strings, and anything else.

In [None]:
goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']

# Character strings can be indexed like lists
Get single characters from a character string using indexes in square brackets.

In [None]:
element = 'carbon'
print('zeroth character:', element[0])
print('third character:', element[3])

# Character strings are immutable

* Cannot change the characters in a string after it has been created.
    * ***Immutable:*** can’t be changed after creation.
    * In contrast, lists are ***mutable:*** they can be modified in place.
* Python treats a string as a single value made up of parts, rather than as a container whose parts can be replaced individually.
* Lists and character strings are both ***collections***.

In [None]:
element[0] = 'C'
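Because strings are immutable, the usual approach is to build a *new* string from pieces of the old one. A minimal sketch (the name `capitalized` is illustrative):

```python
element = 'carbon'
capitalized = 'C' + element[1:]   # a new string built from 'C' and a slice of the old one
print(capitalized)                # Carbon
print(element)                    # carbon: the original string is unchanged
```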

# Indexing beyond the end of the collection is an error

* Python reports an IndexError if we attempt to access a value that doesn’t exist.
    * This is a kind of [runtime error](https://swcarpentry.github.io/python-novice-gapminder/04-built-in.html).
    * Cannot be detected as the code is parsed because the index might be calculated based on data.

In [None]:
print('99th element of element is:', element[99])

## Exercise 1: 

Fill in the blanks so that the program below produces the output shown.

```python
values = ____
values.____(1)
values.____(3)
values.____(5)
print('first time:', values)
values = values[____]
print('second time:', values)
```

Output:
```python
first time: [1, 3, 5]
second time: [3, 5]
```

*Solution:*

In [None]:
values = []
values.append(1)
values.append(3)
values.append(5)
print('first time:', values)
values = values[1:]
print('second time:', values)

## Exercise 2:

Given this:

```python
print('string to list:', list('tin'))
print('list to string:', ''.join(['g', 'o', 'l', 'd']))
```

Output: 
```python
string to list: ['t', 'i', 'n']
list to string: gold
```

1. What does `list('some string')` do?
2. What does `'-'.join(['x', 'y', 'z'])` generate?

*Solution:*

1. `list('some string')` converts a string into a list containing all of its characters.
2. `join` returns a string that is the *concatenation* of the string elements of the list, with the separator placed between each pair of elements. The separator is the string on which the method is called (here `'-'`), so the result is `x-y-z`.

## Exercise 4: 

What does the following program print?

```python
element = 'fluorine'
print(element[::2])
print(element[::-1])
```

1. If we write a slice as low:high:stride, what does stride do?
2. What expression would select all of the even-numbered items from a collection?

*Solution:*

The program prints
```python
furn
eniroulf
```

1. `stride` is the step size of the slice.
2. The slice `1::2` selects all even-numbered items from a collection: it starts with element 1 (which is the second element, since indexing starts at 0), goes on until the end (since no end is given), and uses a step size of 2 (i.e., selects every second element).
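A sketch that makes the stride visible, using an illustrative list in which each item records its own position counted from 1:

```python
ordinal = [1, 2, 3, 4, 5, 6, 7, 8]   # each item is its own 1-based position
print(ordinal[::2])    # [1, 3, 5, 7]: start at index 0, take every second item
print(ordinal[1::2])   # [2, 4, 6, 8]: the even-numbered items
print(ordinal[::-3])   # [8, 5, 2]: a negative stride walks backwards from the end
```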

# TIME CHECK!

# Most of the power of a programming language is in its libraries

***A library is a collection of modules, but the terms are often used interchangeably, especially since many libraries only consist of a single module, so don’t worry if you mix them.***

* A library is a collection of files (called modules) that contains functions for use by other programs.
    * May also contain data values (e.g., numerical constants) and other things.
    * A library’s contents are supposed to be related, but there’s no way to enforce that.
* The Python [standard library](https://docs.python.org/3/library/) is an extensive suite of modules that comes with Python itself.
* Many additional libraries are available from [PyPI](https://pypi.python.org/pypi/) (the Python Package Index).

# A program must import a library module before using it

* Use `import` to load a library module into a program’s memory.
* Then refer to things from the module as `module_name.thing_name`.
    * Python uses `.` to mean “part of”.
* Using math, one of the modules in the standard library:

In [None]:
import math

print('pi is', math.pi)
print('cos(pi) is', math.cos(math.pi))

* Have to refer to each item with the module’s name.
* `math.cos(pi)` won’t work: the reference to `pi` doesn’t somehow “inherit” the function’s reference to `math`.

# Use help to learn about the contents of a library module

Works just like help for a function.

In [None]:
help(math) # remember to import math above ^^
# help for a library is not available until the library is imported.

# Import specific items from a library module to shorten programs

* Use `from ... import ...` to load only specific items from a library module.
* Then refer to them directly without library name as prefix.

In [None]:
from math import cos, pi

print('cos(pi) is', cos(pi))

# Create an alias for a library module when importing it to shorten programs

* Use `import ... as ...` to give a library a short alias while importing it.
* Then refer to items in the library using that shortened name.

In [None]:
import math as m

print('cos(pi) is', m.cos(m.pi))

* Commonly used for libraries that are frequently used or have long names.
    * E.g., the `matplotlib` plotting library is often aliased as `mpl`.
* But can make programs harder to understand, since readers must learn your program’s aliases.

## Exercise 1:

1. Import the `random` module from Python's standard library and explore the help.
2. Which function would you select from that module? Are there alternatives?
3. Try to write a program that uses the function.
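*Solution (one possible answer):* the sketch below picks `random.randint`, which returns a random integer between its two arguments, including both ends. The dice and coin scenario and the variable names are illustrative; other choices from the module would work too.

```python
import random

roll = random.randint(1, 6)      # inclusive at both ends: 1, 2, 3, 4, 5 or 6
print('you rolled', roll)

# Alternatives: random.randrange(1, 7) excludes the stop value, like range();
# random.choice picks one item from a sequence.
side = random.choice(['heads', 'tails'])
print('coin says', side)
```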

# TIME CHECK!

# A `for` loop executes commands once for each value in a collection

* Doing calculations on the values in a list one by one is as painful as working with `pressure_001`, `pressure_002`, etc.
* A **for loop** tells Python to execute some statements once for each value in a list, a character string, or some other collection.
* “for each thing in this group, do these operations”

In [None]:
for number in [2, 3, 5]:
    print(number)

The for loop above is equivalent to:

In [None]:
print(2)
print(3)
print(5)

# A `for` loop is made up of a collection, a loop variable, and a body.

```python
for number in [2, 3, 5]:
    print(number)
```
* The collection, `[2, 3, 5]`, is what the loop is being run on.
* The body, `print(number)`, specifies what to do for each value in the collection.
* The loop variable, `number`, is what changes for each *iteration* of the loop.
    * The “current thing”.


# The first line of the `for` loop must end with a colon, and the body must be indented.

* The colon at the end of the first line signals the start of a block of statements.
* Python uses indentation rather than `{}` or begin/end to show *nesting*.
    * Any consistent indentation is legal, but almost everyone uses four spaces.

In [None]:
for number in [2, 3, 5]:
print(number)

Indentation is always meaningful in Python.

In [None]:
firstName = "Jon"
  lastName = "Smith"

This error can be fixed by removing the extra spaces at the beginning of the second line.

# Loop variables can be called anything.

As with all variables, loop variables are:
* Created on demand.
* Meaningless: their names can be anything at all.

In [None]:
for kitten in [2, 3, 5]:
    print(kitten)

# The body of a loop can contain many statements.

* But no loop should be more than a few lines long.
* Hard for human beings to keep larger chunks of code in mind.

In [None]:
primes = [2, 3, 5]
for number in primes:
    squared = number ** 2
    cubed = number ** 3
    print(number, squared, cubed)

# Use range to iterate over a sequence of numbers.

The built-in function [`range`](https://docs.python.org/3/library/stdtypes.html#range) produces a sequence of numbers.
* Not a list: the numbers are produced on demand to make looping over large ranges more efficient.
* `range(N)` is the numbers 0..N-1.
    * Exactly the legal indices of a list or character string of length N.

In [None]:
for number in range(0, 3):
    print(number)
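A sketch of the other forms of `range` (wrapping it in `list` just to display the numbers), and of using `range(len(...))` to loop over exactly the legal indices of a string. The name `word` is illustrative:

```python
print(list(range(3)))          # [0, 1, 2]
print(list(range(2, 5)))       # [2, 3, 4]: start is included, stop is excluded
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]: the optional third argument is the step

word = 'tin'
for i in range(len(word)):     # 0, 1, 2: exactly the legal indices of word
    print(i, word[i])
```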

# The Accumulator pattern turns many values into one.

A common pattern in programs is to:
1. Initialize an *accumulator* variable to zero, the empty string, or the empty list.
2. Update the variable with values from a collection.

In [None]:
# Sum the first 10 integers.
total = 0
for number in range(10):
    total = total + (number + 1)
print(total)

Read `total = total + (number + 1)` as:
* Add 1 to the current value of the loop variable `number`.
* Add that to the current value of the accumulator variable `total`.
* Assign that to `total`, replacing the current value.
* We have to add `number + 1` because range produces 0..9, not 1..10.
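An alternative sketch: giving `range` a start value of 1 produces 1..10 directly, so the `+ 1` is no longer needed:

```python
total = 0
for number in range(1, 11):   # 1, 2, ..., 10
    total = total + number    # no "+ 1" needed
print(total)                  # 55
```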

## Exercise 2: 

Fill in the blanks in each of the programs below to produce the indicated result.

```python
# Total length of the strings in the list: ["red", "green", "blue"] => 12
total = 0
for word in ["red", "green", "blue"]:
    ____ = ____ + len(word)
print(total)
```

*Solution* 

In [None]:
total = 0
for word in ["red", "green", "blue"]:
    total = total + len(word)
print(total)

## Exercise 3: 

Fill in the blanks in each of the programs below to produce the indicated result.

```python
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
lengths = ____
for word in ["red", "green", "blue"]:
    lengths.____(____)
print(lengths)
```

*Solution*

In [None]:
lengths = []
for word in ["red", "green", "blue"]:
    lengths.append(len(word))
print(lengths)

## Exercise 4: 

Fill in the blanks in each of the programs below to produce the indicated result.

```python
# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
words = ["red", "green", "blue"]
result = ____
for ____ in ____:
    ____
print(result)
```

*Solution*

In [None]:
words = ["red", "green", "blue"]
result = ""
for word in words:
    result = result + word
print(result)

## Exercise 5: 

Create an acronym: Starting from the list ["red", "green", "blue"], create the acronym "RGB" using a for loop.

Hint: You may need to use a string method to properly format the acronym.

*Solution*

In [None]:
words = ["red", "green", "blue"]
acronym = ""
for word in words:
    acronym = acronym + word[0].upper()
print(acronym)

## Exercise 6:

Reorder and properly indent the lines of code below so that they print a list with the cumulative sum of data. The result should be [1, 3, 5, 10].

```python
cumulative.append(total)
for number in data:
cumulative = []
total = total + number
total = 0
print(cumulative)
data = [1,2,2,5]
```

*Solution*

In [None]:
data = [1, 2, 2, 5]
cumulative = []
total = 0

for number in data:
    total = total + number
    cumulative.append(total)
print(cumulative)

## Exercise 7:

1. Read the code below and try to identify what the errors are without running it.
2. Run the code and read the error message. What type of NameError do you think this is? Is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not?
3. Fix the error.
4. Repeat steps 2 and 3, until you have fixed all the errors.

```python
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) == 0:
        message = message + a
    else:
        message = message + "b"
print(message)
```

*Solution*

* Python variable names are case sensitive: `number` and `Number` refer to different variables.
* The variable `message` needs to be initialized as an empty string.
* We want to add the string `"a"` to `message`, not the undefined variable `a`.

In [None]:
# corrected code:
message = ""
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (number % 3) == 0:
        message = message + "a"
    else:
        message = message + "b"
print(message)

## Exercise 8:

1. Read the code below and try to identify what the errors are without running it.
2. Run the code, and read the error message. What type of error is it?
3. Fix the error.

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[4])
```

*Solution*

This list has 4 elements and the index to access the last element in the list is 3.

In [None]:
# corrected code: 
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[3])

# TIME CHECK!

# Use `if` statements to control whether or not a block of code is executed.

An `if` statement (more properly called a conditional statement) controls whether some block of code is executed or not.
Structure is similar to a `for` statement:
* First line opens with `if` and ends with a colon
* Body containing one or more statements is indented (usually by 4 spaces)

In [None]:
mass = 3.54
if mass > 3.0:
    print(mass, 'is large')

mass = 2.07
if mass > 3.0:
    print(mass, 'is large')

# Conditionals are often used inside loops.

* Not much point using a conditional when we know the value (as above).
* But useful when we have a collection to process.

In [None]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')

# Use `else` to execute a block of code when an `if` condition is not true.

* `else` can be used following an `if`.
* Allows us to specify an alternative to execute when the if *branch* isn’t taken.

In [None]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')

# Use `elif` to specify additional tests.

* May want to provide several alternative choices, each with its own test.
* Use `elif` (short for “else if”) and a condition to specify these.
* Always associated with an `if`.
* Must come before the `else` (which is the “catch all”).

In [None]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 9.0:
        print(m, 'is HUGE')
    elif m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')

# Conditions are tested once, in order.

Python steps through the branches of the conditional in order, testing each in turn, and runs only the first branch whose condition is true. So ordering matters.

In [None]:
grade = 85
if grade >= 90:
    print('grade is A')
elif grade >= 80:
    print('grade is B')
elif grade >= 70:
    print('grade is C')
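To see why ordering matters, here is a sketch that reverses the order of the tests above. Python takes the first branch whose condition is true and skips the rest, so a grade of 85 is labelled C. The variable `letter` is introduced only for this illustration.

```python
grade = 85

# Wrong order: the broadest test comes first and captures every grade >= 70
if grade >= 70:
    letter = 'C'
elif grade >= 80:
    letter = 'B'   # unreachable: any grade >= 80 already passed the first test
elif grade >= 90:
    letter = 'A'   # unreachable for the same reason
print('grade is', letter)
```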

Python does not automatically go back and re-evaluate the conditions if values change: below, `velocity` ends up as 50.0, but the `if` test is not run again.

In [None]:
velocity = 10.0
if velocity > 20.0:
    print('moving too fast')
else:
    print('adjusting velocity')
    velocity = 50.0

We often use conditionals in a loop to “evolve” the values of variables.

In [None]:
velocity = 10.0
for i in range(5): # execute the loop 5 times
    print(i, ':', velocity)
    if velocity > 20.0:
        print('moving too fast')
        velocity = velocity - 5.0
    else:
        print('moving too slow')
        velocity = velocity + 10.0
print('final velocity:', velocity)

# Compound Relations Using `and`, `or`, and Parentheses

Often, you want some combination of things to be true. You can combine relations within a conditional using `and` and `or`. Continuing the example above, suppose you have the following masses and velocities:

In [None]:
mass     = [ 3.54,  2.07,  9.22,  1.86,  1.71]
velocity = [10.00, 20.00, 30.00, 25.00, 20.00]

for i in range(len(mass)):  # one index per object; no need to set i beforehand
    if mass[i] > 5 and velocity[i] > 20:
        print("Fast heavy object.  Duck!")
    else:
        print("Smooth sailing")
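The heading also mentions parentheses. In Python, `and` is evaluated before `or`, so parentheses can change what a compound condition means. The sketch below reuses the masses and velocities above with a made-up test ("heavier than 3 or faster than 20, and lighter than 9"); the thresholds and list names are illustrative only.

```python
mass     = [ 3.54,  2.07,  9.22,  1.86,  1.71]
velocity = [10.00, 20.00, 30.00, 25.00, 20.00]

with_parens = []
without_parens = []
for i in range(len(mass)):
    # Parentheses force the `or` to be evaluated first: (heavy or fast) and below 9
    with_parens.append((mass[i] > 3 or velocity[i] > 20) and mass[i] < 9)
    # Without them, `and` binds first: heavy or (fast and below 9)
    without_parens.append(mass[i] > 3 or velocity[i] > 20 and mass[i] < 9)

print(with_parens)     # [True, False, False, True, False]
print(without_parens)  # [True, False, True, True, False]
```

The two lists differ only for the third object (mass 9.22): it passes `mass[i] > 3`, which is enough on its own once the `and` has been grouped with the velocity test.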

# TIME CHECK!

# Loading data into Python

For this section, we will use fictional data collected from fictional patients taking a miracle drug that cures arthritis inflammation flare-ups in only 3 weeks of treatment.

The data sets are stored in comma-separated values (CSV) format:

* each row holds information for a single patient
* columns represent successive days

The first three rows of our first file look like this:
```
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
```
The CSV file contains the number of inflammation flare-ups per day for the 60 patients in the initial clinical trial, with the trial lasting 40 days. Each row corresponds to a patient, and each column corresponds to a day in the trial. Once a patient has their first inflammation flare-up, they take the medication and wait a few weeks for it to take effect and reduce flare-ups.

Each number represents the number of inflammation bouts that a particular patient experienced on a given day. For example, value “6” at row 3 column 7 of the data set above means that the third patient was experiencing inflammation six times on the seventh day of the clinical study.

To begin processing the clinical trial inflammation data, we need to load it into Python. We can do that using a library called [<ins>**NumPy**</ins>](https://numpy.org/doc/stable), which stands for *Numerical Python*. In general, you should use this library when you want to do fancy things with lots of numbers, especially if you have matrices or arrays. To tell Python that we’d like to start using NumPy, we need to import it:

In [None]:
import numpy

Importing a library is like getting a piece of lab equipment out of a storage locker and setting it up on the bench. Libraries provide additional functionality to the basic Python package, much like a new piece of equipment adds functionality to a lab space. Just like in the lab, importing too many libraries can sometimes complicate and slow down your programs, so we only import what we need for each program.

Once we’ve imported the library, we can ask the library to read our data file for us:

In [None]:
numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

The expression `numpy.loadtxt(...)` is a function call that asks Python to run the function `loadtxt`, which belongs to the `numpy` library. In Python, dot notation is used mostly to access an object’s attributes (properties) or to call its methods: `object_name.attribute` gives you the value of that attribute, and `object_name.method()` calls that method on `object_name`.

As an example, John Smith is the John that belongs to the Smith family. We could use the dot notation to write his name `smith.john`, just as `loadtxt` is a function that belongs to the `numpy` library.

In this call we pass `numpy.loadtxt` two arguments: the name of the file we want to read (`fname`) and the `delimiter` that separates values on a line. Here both are character strings, so we put them in quotes. (The function also accepts other, optional parameters that we don’t need here.)
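If `data/inflammation-01.csv` isn’t available, a minimal sketch can still show what the two arguments do. Here `io.StringIO` (a standard-library helper, not part of the lesson) wraps the three rows shown earlier so that `numpy.loadtxt` can read them as if they were a file; `csv_text` and `sample` are hypothetical names.

```python
import io

import numpy

# The first three rows of inflammation-01.csv, as shown above
csv_text = """\
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
"""

# StringIO makes the string behave like an open file; the commas separate values
sample = numpy.loadtxt(fname=io.StringIO(csv_text), delimiter=',')
print(sample.shape)   # 3 patients, 40 days: (3, 40)
print(sample[2, 6])   # third patient, seventh day: 6.0
```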

Since we haven’t told it to do anything else with the function’s output, the notebook displays it. In this case, that output is the data we just loaded. By default, only a few rows and columns are shown (with `...` to omit elements when displaying big arrays). Note that, to save space when displaying NumPy arrays, Python does not show us trailing zeros, so 1.0 becomes `1.`.

Our call to `numpy.loadtxt` read our file but didn’t save the data in memory. To do that, we need to assign the array to a variable. In a similar manner to how we assign a single value to a variable, we can also assign an array of values to a variable using the same syntax. Let’s re-run `numpy.loadtxt` and save the returned data:

In [None]:
data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

This statement doesn’t produce any output because we’ve assigned the output to the variable `data`. If we want to check that the data have been loaded, we can print the variable’s value:

In [None]:
print(data)

Now that the data are in memory, we can manipulate them. First, let’s ask what type of thing `data` refers to:

In [None]:
print(type(data))

The output tells us that `data` currently refers to an N-dimensional array (`numpy.ndarray`), the functionality for which is provided by the NumPy library. These data correspond to arthritis patients’ inflammation. The rows are the individual patients, and the columns are their daily inflammation measurements.

A NumPy array contains one or more elements of the same type. The `type` function only tells you that a variable is a NumPy array; it won’t tell you the type of the elements inside the array. To find that out, we can check the array’s `dtype` (data type) attribute:

In [None]:
print(data.dtype)
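A small sketch of the difference between the container type and the element type, using two made-up rows read through `io.StringIO`. By default `numpy.loadtxt` converts every value to a float; its `dtype` parameter asks for a different element type. The exact integer width depends on the platform, so the sketch only checks that it is an integer type.

```python
import io

import numpy

rows = "0,1,2\n3,4,5\n"   # hypothetical data, not from the trial files

# Default: every value becomes a 64-bit float
as_float = numpy.loadtxt(io.StringIO(rows), delimiter=',')
print(type(as_float))   # the container: numpy.ndarray
print(as_float.dtype)   # the elements: float64

# Asking for integers instead
as_int = numpy.loadtxt(io.StringIO(rows), delimiter=',', dtype=int)
print(as_int.dtype)     # an integer type (width depends on the platform)
```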

With the following command, we can see the array’s `shape`:

In [None]:
print(data.shape)

The output tells us that the `data` array variable contains 60 rows and 40 columns. When we created the variable `data` to store our arthritis data, we did not only create the array; we also created information about the array, called members or attributes. This extra information describes `data` in the same way an adjective describes a noun. `data.shape` is an attribute of `data` which describes the dimensions of `data`. We use the same dotted notation for the attributes of variables that we use for the functions in libraries because they have the same part-and-whole relationship.
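Here is a minimal sketch on a made-up 2×3 array (the name `tiny` is hypothetical). Besides `shape`, NumPy arrays also carry `ndim` and `size` attributes; like `shape`, they are read with a dot and no parentheses.

```python
import numpy

# A small made-up array: 2 "patients" (rows) by 3 "days" (columns)
tiny = numpy.array([[1.0, 5.0, 2.0],
                    [4.0, 0.0, 3.0]])

print(tiny.shape)  # (rows, columns): (2, 3)
print(tiny.ndim)   # number of dimensions: 2
print(tiny.size)   # total number of elements: 6
```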

If we want to get a single number from the array, we must provide an index in square brackets after the variable name, just as we do in math when referring to an element of a matrix. Our inflammation data has two dimensions, so we will need to use two indices to refer to one specific value:

In [None]:
print('first value in data:', data[0, 0])

In [None]:
print('middle value in data:', data[29, 19])

The expression `data[29, 19]` accesses the element at row 30, column 20 (the inflammation reported by the 30th patient on the 20th day of the trial). While this expression may not surprise you, `data[0, 0]` might. Programming languages like Fortran, MATLAB and R start counting at 1 because that’s what human beings have done for thousands of years. Languages in the C family (including C++, Java, Perl, and Python) count from 0 because an index represents an offset from the first value in the array (the second value is offset by one index from the first value). This is closer to the way that computers represent arrays (if you are interested in the historical reasons behind counting indices from zero, you can read [<ins>**Mike Hoye’s blog post**</ins>](https://exple.tive.org/blarg/2013/10/22/citation-needed/)). As a result, if we have an M×N array in Python, its indices go from 0 to M-1 on the first axis and 0 to N-1 on the second. It takes a bit of getting used to, but one way to remember the rule is that the index is how many steps we have to take from the start to get the item we want.
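A small sketch of the M×N rule on a made-up 3×4 array (`grid` is a hypothetical name): the valid row indices are 0 to 2 and the valid column indices are 0 to 3, and one step past the end raises an `IndexError`.

```python
import numpy

# A made-up 3x4 array holding the numbers 0..11, filled row by row
grid = numpy.arange(12).reshape(3, 4)
print(grid)

print(grid[0, 0])   # first row, first column: 0
print(grid[2, 3])   # last row (M-1 = 2), last column (N-1 = 3): 11

# Row index 3 is one step past the end of a 3-row array
out_of_bounds = False
try:
    grid[3, 0]
except IndexError:
    out_of_bounds = True
print('grid[3, 0] out of bounds:', out_of_bounds)
```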

![array indexing diagram](screenshots/array_index.png 'array_indexing')

What may also surprise you is that when Python displays an array, it shows the element with index `[0, 0]` in the upper left corner rather than the lower left. This is consistent with the way mathematicians draw matrices but different from Cartesian coordinates. The indices are (row, column) instead of (column, row) for the same reason, which can be confusing when plotting data.

# Slicing data

An index like `[30, 20]` selects a single element of an array, but we can select whole sections as well. For example, we can select the first ten days (columns) of values for the first four patients (rows) like this:


In [None]:
print(data[0:4, 0:10])

The slice `0:4` means, “Start at index 0 and go up to, but not including, index 4”. Again, the up-to-but-not-including takes a bit of getting used to, but the rule is that the difference between the upper and lower bounds is the number of values in the slice.
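The counting rule can be checked on a made-up 3×4 array (hypothetical names `grid` and `part`): each slice keeps `stop - start` positions along its axis.

```python
import numpy

grid = numpy.arange(12).reshape(3, 4)   # rows [0..3], [4..7], [8..11]

# Rows 0 and 1 (not 2), columns 1, 2 and 3 (not 4)
part = grid[0:2, 1:4]
print(part)
print(part.shape)   # (2 - 0, 4 - 1) = (2, 3)
```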

We don’t have to start slices at 0:

In [None]:
print(data[5:10, 0:10])

We also don’t have to include the upper and lower bound on the slice. If we don’t include the lower bound, Python uses 0 by default; if we don’t include the upper, the slice runs to the end of the axis, and if we don’t include either (i.e., if we use `:` on its own), the slice includes everything. The following example selects rows 0 through 2 and columns 36 through to the end of the array.

In [None]:
small = data[:3, 36:]
print('small is:')
print(small)
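Each default can be seen on its own in the following sketch, again on a made-up 3×4 array (`grid` and the other names are hypothetical).

```python
import numpy

grid = numpy.arange(12).reshape(3, 4)   # rows [0..3], [4..7], [8..11]

first_two = grid[:2, 0]   # lower bound omitted: rows 0 and 1 of column 0
from_one = grid[1:, 3]    # upper bound omitted: row 1 to the end, column 3
whole_col = grid[:, 1]    # both omitted: every row of column 1

print(first_two)
print(from_one)
print(whole_col)
```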

# Analyzing data

NumPy has several useful functions that take an array as input to perform operations on its values. If we want to find the average inflammation for all patients on all days, for example, we can ask NumPy to compute data’s mean value:

In [None]:
print(numpy.mean(data))
# mean is a function that takes an array as an argument.

Let’s use three other NumPy functions to get some descriptive values about the dataset. We’ll also use multiple assignment, a convenient Python feature that will enable us to do this all in one line.

In [None]:
maxval, minval, stdval = numpy.amax(data), numpy.amin(data), numpy.std(data)

print('maximum inflammation:', maxval)
print('minimum inflammation:', minval)
print('standard deviation:', stdval)

Above we’ve assigned the return value from `numpy.amax(data)` to the variable `maxval`, the value from `numpy.amin(data)` to `minval`, and so on.
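Multiple assignment pairs the names on the left with the values on the right by position. Because Python evaluates the whole right-hand side before assigning any name, the same feature also swaps two values in one line. The variable names here are only for illustration.

```python
# Names and values are matched by position
first, second, third = 'a', 'b', 'c'

# The right-hand side is evaluated completely before any assignment,
# so this swaps first and second
first, second = second, first
print(first, second, third)   # b a c
```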

**PRO TIP**: How did we know what functions NumPy has and how to use them? If you are working in IPython or in a Jupyter Notebook, there is an easy way to find out. If you type the name of something followed by a dot, then you can use tab completion (e.g. type `numpy.` and then press `Tab`) to see a list of all functions and attributes that you can use. After selecting one, you can also add a question mark (e.g. `numpy.cumprod?`), and IPython will return an explanation of the method! This is the same as doing `help(numpy.cumprod)`. 

**FYI:** One might wonder why the functions are called `amax` and `amin` and not `max` and `min`, or why the function for the average is called `mean` and not `amean`. The package `numpy` does provide functions `max` and `min` that are fully equivalent to `amax` and `amin`, but they share a name with Python’s built-in functions `max` and `min`. Referring to the functions as we did above (`numpy.max`, for example) does not cause problems, but there are other ways to refer to them that could. In addition, text editors might highlight (color) these functions like built-in functions, even though they belong to NumPy, which can be confusing and lead to errors. Since Python has no built-in function called `mean`, there is no need for a function called `amean`.

When analyzing data, though, we often want to look at variations in statistical values, such as the maximum inflammation per patient or the average inflammation per day. One way to do this is to create a new temporary array of the data we want, then ask it to do the calculation:

In [None]:
patient_0 = data[0, :] # 0 on the first axis (rows), everything on the second (columns)
print('maximum inflammation for patient 0:', numpy.amax(patient_0))

We don’t actually need to store the row in a variable of its own. Instead, we can combine the selection and the function call:

In [None]:
print('maximum inflammation for patient 2:', numpy.amax(data[2, :]))

What if we need the maximum inflammation for each patient over all days (the left side of the diagram below) or the average for each day (the right side)? As the diagram shows, we want to perform the operation across an axis:

![numpy diagram demonstrating the operation to find patient max inflammation and daily average inflammation](screenshots/numpy.png 'diagram')

To find the **maximum inflammation reported for each patient**, you would apply the `max` function moving across the columns (axis 1). To find the **daily average inflammation reported across patients**, you would apply the `mean` function moving down the rows (axis 0). 

To support this functionality, most array functions allow us to specify the axis we want to work on. If we ask for the max across axis 1 (columns in our 2D example), we get:


In [None]:
print(numpy.max(data, axis=1))

As a quick check, we can ask this array what its shape is. We expect 60 patient maximums:

In [None]:
print(numpy.max(data, axis=1).shape)

The shape `(60,)` tells us we have a one-dimensional array of 60 values: the maximum inflammation for each patient, taken over all days.

If we ask for the average down axis 0 (moving down the rows in our 2D example), we get:

In [None]:
print(numpy.mean(data, axis=0))

Check the array shape. We expect 40 averages, one for each day of the study:

In [None]:
print(numpy.mean(data, axis=0).shape)

Similarly, we can apply the `mean` function along axis 1 to get each patient’s average inflammation over the duration of the study (60 values):

In [None]:
print(numpy.mean(data, axis=1))

In [None]:
print(numpy.mean(data, axis=1).shape)
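The axis rule is easier to check on an array small enough to add up by hand. This sketch uses a made-up 2×3 array (2 "patients" by 3 "days"); the variable names are hypothetical.

```python
import numpy

tiny = numpy.array([[1.0, 5.0, 2.0],
                    [4.0, 0.0, 3.0]])

per_patient_max = numpy.max(tiny, axis=1)   # across the columns: one value per row
per_day_mean = numpy.mean(tiny, axis=0)     # down the rows: one value per column

print(per_patient_max, per_patient_max.shape)  # [5. 4.] (2,)
print(per_day_mean, per_day_mean.shape)        # [2.5 2.5 2.5] (3,)
```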

## Exercise 3: 

The patient data is *longitudinal* in the sense that each row represents a series of observations relating to one individual. This means that the change in inflammation over time is a meaningful concept. Let’s find out how to calculate changes in the data contained in an array with NumPy.

The `numpy.diff()` function takes an array and returns the differences between successive values. Let’s use it to examine the changes each day across the first week of patient 3 from our inflammation dataset.

```python
patient3_week1 = data[3, :7]
print(patient3_week1)
```

Output:
```python
[0. 0. 2. 0. 4. 2. 2.]
```
Calling `numpy.diff(patient3_week1)` would do the following calculations
```python
[ 0 - 0, 2 - 0, 0 - 2, 4 - 0, 2 - 4, 2 - 2 ]
```
and return the 6 difference values in a new array.
```python
numpy.diff(patient3_week1)
```

Output:
```python
array([ 0.,  2., -2.,  4., -2.,  0.])
```
Note that the array of differences is shorter by one element (length 6).

When calling `numpy.diff` with a multi-dimensional array, an axis argument may be passed to the function to specify which axis to process. When applying `numpy.diff` to our 2D inflammation array data, which axis would we specify?

*Solution*

Since the row axis (0) is patients, it does not make sense to get the difference between two arbitrary patients. The column axis (1) is in days, so the difference is the change in inflammation, a meaningful concept.

In [None]:
numpy.diff(data, axis=1)
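A small sketch of what `axis=1` does, on made-up readings for 2 patients over 4 days: each row loses one element, so applied to `data` the same call should return an array of shape `(60, 39)`.

```python
import numpy

# Made-up readings: 2 patients (rows) over 4 days (columns)
readings = numpy.array([[0.0, 2.0, 5.0, 4.0],
                        [1.0, 1.0, 0.0, 3.0]])

daily_change = numpy.diff(readings, axis=1)   # day-to-day change within each row
print(daily_change)
print(readings.shape, '->', daily_change.shape)   # (2, 4) -> (2, 3)
```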

## Exercise 5: 

How would you find the largest change in inflammation for each patient? Does it matter if the change in inflammation is an increase or a decrease?

*Solution*

By applying the `numpy.amax()` function along axis 1 to the output of `numpy.diff()`, you get the largest day-to-day difference for each patient.

In [None]:
numpy.amax(numpy.diff(data, axis=1), axis=1)


If inflammation values decrease along an axis, then the difference from one element to the next will be negative. If you are interested in the magnitude of the change and not the direction, the `numpy.absolute()` function will provide that.

Notice the difference if you get the largest absolute difference between readings.

In [None]:
numpy.amax(numpy.absolute(numpy.diff(data, axis=1)), axis=1)
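The two answers differ whenever a patient’s biggest change is a drop. A minimal sketch with one made-up patient whose readings fall from 3 to 0 and then rise to 1:

```python
import numpy

readings = numpy.array([[3.0, 0.0, 1.0]])   # hypothetical single patient

changes = numpy.diff(readings, axis=1)                         # [[-3.  1.]]
largest_rise = numpy.amax(changes, axis=1)                     # [1.]
largest_change = numpy.amax(numpy.absolute(changes), axis=1)   # [3.]
print(largest_rise, largest_change)
```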

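## Time Exercise

This exercise compares how long Python’s built-in `sum` and NumPy’s `np.sum` take to add up one million random numbers stored either in a Python list (`L`) or in a NumPy array (`N`).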

In [None]:
import random

import numpy as np

In [None]:
L = [random.random() for _ in range(1000000)]
N = np.random.random(1000000)

In [None]:
print(type(L))
print(type(N))

In [None]:
%timeit sum(L)
%timeit np.sum(L)
%timeit sum(N)
%timeit np.sum(N)
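`%timeit` is an IPython/Jupyter magic command, so the cell above only runs inside a notebook or an IPython shell. As a minimal sketch, assuming you want to repeat the comparison in a plain Python script, the standard library’s `timeit` module can do a rougher version of the same measurement: it runs each call a fixed number of times (`repeats`, a value chosen here for illustration) instead of choosing the count automatically as `%timeit` does. The timings will depend on your machine.

```python
import random
import timeit

import numpy as np

L = [random.random() for _ in range(1000000)]  # Python list of floats
N = np.random.random(1000000)                  # NumPy array of floats

candidates = {
    'sum(L)': lambda: sum(L),
    'np.sum(L)': lambda: np.sum(L),
    'sum(N)': lambda: sum(N),
    'np.sum(N)': lambda: np.sum(N),
}

repeats = 5
timings = {}
for label, func in candidates.items():
    # timeit.timeit calls func `repeats` times and returns the total time in seconds
    total = timeit.timeit(func, number=repeats)
    timings[label] = total / repeats
    print(f'{label:<10} {timings[label] * 1000:8.2f} ms per call')
```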