# Running and Quitting

**Teaching:** 10 minutes  
**Exercises:** 5 minutes 

## Questions:
- How can I run Python programs? 

## Objectives:  
- Launch the Azure server.
- Create a Jupyter notebook.
- Understand the difference between a Python script and a Jupyter notebook.
- Create Markdown cells in a notebook.
- Create and run Python cells in a notebook.

## Key points:
- Python scripts are plain text files.
- Use the Jupyter Notebook for editing and running Python.
- The Notebook has Command and Edit modes.
- Use the keyboard and mouse to select and edit cells.
- The Notebook will turn Markdown into pretty-printed documentation.
- Markdown does most of what HTML does.

## Getting Started with Azure Notebooks
While many software developers use an integrated development environment (IDE) or a text editor to create and edit their Python programs, we will be using Azure Notebooks during this lesson.

Jupyter notebooks can also be run using the Anaconda Python distribution. If you have not already installed it, follow [the setup instructions](https://swcarpentry.github.io/python-novice-gapminder/setup/).

# Need to provide some details about running Azure

## Creating a Python script

*   To start writing a new Python program click the Text File icon under the *Other* header in the Launcher tab of the Main Work Area.
    *   You can also create a new plain text file by selecting the *New -> Text File* from the *File* menu in the Menu Bar.
*   To convert this plain text file to a Python program, select the *Save File As* action from the *File* menu in the Menu Bar and give your new text file a name that ends with the `.py` extension.
    *   The `.py` extension lets everyone (including the operating system) know that this text file is a Python program.
    *   This is a convention, not a requirement.

## Creating a Jupyter Notebook
To open a new notebook click the Python 3 icon under the *Notebook* header in the Launcher tab in the main work area. You can also create a new notebook by selecting *New -> Notebook* from the *File* menu in the Menu Bar. Additional notes on Jupyter notebooks:
  *   Notebook files have the extension `.ipynb` to distinguish them from plain-text Python programs.
  *   Notebooks can be exported as Python scripts that can be run from the command line.

## Use Jupyter Notebooks for editing and running Python.
*   While it's common to write Python scripts using a text editor, we are going to use the [Jupyter Notebook](https://jupyter.org/) for the remainder of this workshop.
*   This has several advantages:
    *   You can easily type, edit, and copy and paste blocks of code.
    *   Tab completion allows you to easily access the names of things you are using
        and learn more about them.
    *   It allows you to annotate your code with links, different sized text, bullets, etc.
        to make it more accessible to you and your collaborators.
    *   It allows you to display figures next to the code that produces them
        to tell a complete story of the analysis.
*   Each notebook contains one or more cells that contain code, text, or images.

## Code vs. Text
Jupyter mixes code and text in different types of blocks, called cells. We often use the term "code" to mean "the source code of software written in a language such as Python". A "code cell" in a Notebook is a cell that contains software; a "text cell" is one that contains ordinary prose written for human beings. 

## Example Python code cell: "Hello World."
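
A minimal sketch of what such a code cell might contain. The exact message is only an illustration; any text in quotes would work the same way.

```python
# print displays whatever is placed inside the parentheses.
print('Hello World')
```

Running the cell should display the text directly beneath it.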

## Use the keyboard and mouse to select and edit cells.
*   Pressing the <kbd>Return</kbd> key turns the border blue and engages Edit mode, which allows 
    you to type within the cell.
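*   In Edit mode, pressing <kbd>Return</kbd> again simply moves the cursor to a new line within the cell, so we need some other way to tell the Notebook we want to run what's in the cell.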
*   Pressing <kbd>Shift</kbd>+<kbd>Return</kbd> together will execute the contents of the cell.

## Notebooks will turn Markdown into pretty-printed documentation.
*   Notebooks can also render [Markdown](https://en.wikipedia.org/wiki/Markdown).
    *   A simple plain-text format for writing lists, links, and other things that might go into a web page.
    *   Equivalently, a subset of HTML that looks like what you'd send in an old-fashioned email.
*   Turn the current cell into a Markdown cell by entering Command mode (<kbd>Esc</kbd>/gray) 
    and pressing the <kbd>M</kbd> key.
*   Turn the current cell into a Code cell by entering Command mode (<kbd>Esc</kbd>/gray) and 
    pressing the <kbd>Y</kbd> key.

## Markdown can be used to provide documentation for Python code.
Asterisks can be used for creating bulleted lists.  
Numbers can be used for numbered lists.  
Adding spaces in a list allows sublists.
Pound signs can be used to make headings.

## Exercise: Create a bulleted and numbered list in Markdown in the cell below.

## Equation editing 

Notebooks can also include complex mathematical equations using [LaTeX](https://www.overleaf.com/learn/latex/mathematical_expressions) notation (click the link for some common applications). 
The equation is enclosed using dollar signs.
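
As a small illustration (the equation is chosen here only as an example and is not part of the lesson), typing the following into a Markdown cell should render the Pythagorean relation as a formatted equation:

```latex
$c = \sqrt{a^2 + b^2}$
```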

## Exercise: Use LaTeX to write out Euler's identity.

# Variables and Assignment
**Teaching:** 10 minutes  
**Exercises:** 10 minutes  
## Questions:
- How can I store data in programs?

## Objectives:
- Write programs that assign scalar values to variables and perform calculations with those values.
- Correctly trace value changes in programs that use scalar assignment.

## Key Points:
- Use variables to store values.
- Use `print` to display values.
- Variables persist between cells.
- Variables must be created before they are used.
- Variables can be used in calculations.
- Use an index to get a single character from a string.
- Use a slice to get a substring.
- Use the built-in function `len` to find the length of a string.
- Python is case-sensitive.
- Use meaningful variable names.  

## Use variables to store values.
*   **Variables** are names for values.
*   In Python the `=` symbol assigns the value on the right to the name on the left.
*   The variable is created when a value is assigned to it.
*   Here, Python assigns an age to a variable `age`
    and a name in quotes to a variable `first_name`.

In [2]:
age = 30
first_name = 'Steve'

*   Variable names
    * can **only** contain letters, digits, and underscore `_` (typically used to separate words in long variable names)
    * cannot start with a digit
    * are **case sensitive** (age, Age and AGE are three different variables)
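
The naming rules above can be sketched as follows. The names `max_temp_2`, `mass`, `Mass`, and `MASS` are hypothetical and chosen only for illustration.

```python
# Letters, digits, and underscores are allowed; the name does not start with a digit.
max_temp_2 = 38.5

# Names are case sensitive, so these are three separate variables.
mass = 1.0
Mass = 2.0
MASS = 3.0
print(mass, Mass, MASS)
```

A name such as `2nd_value` would be rejected with a `SyntaxError` because it starts with a digit.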

## Use `print` to display values.

*   Python has a built-in function called `print` that prints things as text.
*   Call the function (i.e., tell Python to run it) by using its name.
*   Provide values to the function (i.e., the things to print) in parentheses.
*   To add a string to the printout, wrap the string in single or double quotes.
*   The values passed to the function are called **arguments**.

In [1]:
print(first_name, 'is', age, 'years old')

*   `print` automatically puts a single space between items to separate them.
*   And wraps around to a new line at the end.

## Variables must be created before they are used.

*   If a variable doesn't exist yet, or if the name has been misspelled,
    Python reports an error. (Unlike some languages, which "guess" a default value.)

In [5]:
print(last_name)

*   The last line of an error message is usually the most informative.
*   We will look at error messages in detail later.

## Variables Persist Between Cells.

Be aware that it is the *order* of execution of cells that is important in a Jupyter notebook, not the order in which they appear. Python will remember *all* the code that was run previously, including any variables you have defined, irrespective of the order in the notebook. Therefore if you define variables lower down the notebook and then (re)run cells further up, those defined further down will still be present. 

As an example, create two cells with the following content, in this order:

> print(myval)  
> myval = 1

If you execute this in order, the first cell will give an error. However, if you run the first cell *after* the second cell it will print out `1`. To prevent confusion, it can be helpful to use the `Kernel` -> `Restart & Run All` option which clears the interpreter and runs everything from a clean slate going top to bottom.

## Variables can be used in calculations.

*   We can use variables in calculations just as if they were values.
    *   Remember, we assigned the value `30` to `age` a few lines ago.

In [None]:
age = age + 3
print('Age in three years:', age)

## Use indexing to grab part of a collection.

*   The characters (individual letters, numbers, and so on) in a string are
    ordered. 
*   For example, the string `'AB'` is not the same as `'BA'`. 
*   Because of this ordering, we can treat the string as a list of characters.
*   Each position in the string (first, second, etc.) is given a number. 
*   The number is called an *index*, and the string itself is a kind of *collection*.
*   Indices are numbered from 0.
*   Use the position's index in square brackets to get the character at that
    position.
    
![an illustration of indexing](https://swcarpentry.github.io/python-novice-gapminder/fig/2_indexing.svg)

In [None]:
atom_name = 'helium'
print(atom_name[0])

## Use a slice to get a substring.

*   A part of a string is called a *substring*.
*   A *slice* is a part of a string (or, more generally, any list-like thing).
*   We take a slice by using `[start:stop]`, where `start` is replaced with the
    index of the first element we want and `stop` is replaced with the index of
    the element just after the last element we want.
*   Mathematically, you might say that a slice selects the half-open interval `[start, stop)`: `start` is included and `stop` is not.
*   The difference between `stop` and `start` is the slice's length.
*   Taking a slice does not change the contents of the original string. Instead,
    the slice is a copy of part of the original string.  

>  atom_name = 'sodium'  
>  print(atom_name[0:3])
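
The properties listed above can be checked with a short sketch. The variables `start`, `stop`, and `piece` are introduced here only for illustration.

```python
atom_name = 'sodium'
start = 1
stop = 4
piece = atom_name[start:stop]        # characters at indices 1, 2, and 3

print(piece)                         # odi
print(len(piece) == stop - start)    # True
print(atom_name)                     # sodium (the original is unchanged)
```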

## Use the built-in function `len` to find the length of a string.

Define a variable "atom_name" as "helium," then print its length.

*   Nested functions are evaluated from the inside out,
     like in mathematics.
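
One way to carry this out, shown as a minimal sketch (the variable `name_length` is introduced here only for illustration). The last line nests `len` inside `print`, so `len` is evaluated first and its result is passed to `print`.

```python
atom_name = 'helium'
name_length = len(atom_name)   # store the length in a variable
print(name_length)             # 6

# Nested call: len(atom_name) is evaluated before print runs.
print(len(atom_name))          # 6
```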

## Python is case-sensitive.

*   Python thinks that upper- and lower-case letters are different,
    so `Name` and `name` are different variables.
*   There are conventions for using upper-case letters at the start of variable names so we will use lower-case letters for now.

## Use meaningful variable names.

*   Python doesn't care what you call variables as long as they obey the rules
    (alphanumeric characters and the underscore).

>  flabadab = 30   
>  ewr_422_yY = 'Steve'   
>  print(ewr_422_yY, 'is', flabadab, 'years old')  

*   Use meaningful variable names to help other people understand what the program does.
*   The most important "other person" is your future self.

## Swapping Values Challenge

Type the following code, then figure out the values of the variables in this program
after each statement is executed.

>  x = 1.0     
>  y = 3.0     
>  swap = x    
>  x = y       
>  y = swap   
>  print(x,y,swap)

## Index

If you assign `a = 123`,
what happens if you try to get the second digit of `a` via `a[1]`?

## Choosing Variable Names

Which is a better variable name, `m`, `min`, or `minutes`?
Why?

## Slicing practice

What does the following program print?  

>  atom_name = 'carbon'  
>  print('atom_name[1:3] is:', atom_name[1:3])  

## Getting the last character in a string

How do you get the last character in the string below?  
name = 'Dave'

# Data Types and Type Conversion
**Teaching:** 10 minutes  
**Exercises:** 10 minutes

## Questions:
- What kinds of data do programs store?
- How can I convert one type to another?

## Objectives:
- Explain key differences between integers and floating point numbers.
- Explain key differences between numbers and character strings.
- Use built-in functions to convert between integers, floating point numbers, and strings.

## Key points:
- Every value has a type.
- Use the built-in function `type` to find the type of a value.
- Types control what operations can be done on values.
- Strings can be added and multiplied.
- Strings have a length (but numbers don't).
- Must convert numbers to strings or vice versa when operating on them.
- Can mix integers and floats freely in operations.
- Variables only change value when something is assigned to them.

## Every value has a type.
*   Every value in a program has a specific type.
*   Integer (`int`): represents positive or negative whole numbers like 3 or -512.
*   Floating point number (`float`): represents real numbers like 3.14159 or -2.5.
*   Character string (usually called "string", `str`): text.
    *   Written in either single quotes or double quotes (as long as they match).
    *   The quote marks aren't printed when the string is displayed.

## Use the built-in function `type` to find the type of a value.

*   Use the built-in function `type` to find out what type a value has.
*   Works on variables as well.
    *   But remember: the *value* has the type --- the *variable* is just a label.  

The following lines of code print the types of the integer 52, the string 'average', and the float 1.5 (the result of `3/2`).  
>  print(type(52))  
>  print(type('average'))  
>  print(type(3/2))  

## Types control what operations (or methods) can be performed on a given value.

* A value's type determines what the program can do to it.
* Every type has its own rules.
* Writing effective Python code requires the user to understand the rules.

Before running the next few lines, guess what each result will be.

>  print(5 - 3)  
>  print('hello' - 'h')  
>  print(5 * 3)  
>  print('hello' * 3)  
>  print('hello' + ' ' + 'world')  

## You can use the "+" and "*" operators on strings.

In [None]:
full_name = 'Dave' + ' ' + 'Lampert'
print(full_name)

*   Multiplying a character string by an integer _N_ creates a new string that consists of that character string repeated  _N_ times, since multiplication is repeated addition.

In [None]:
separator = '=' * 10
print(separator)

## Strings have a length (but numbers don't).

*   The built-in function `len` counts the number of characters in a string.

In [None]:
print(len(full_name))

*   But numbers don't have a length (not even zero).

In [None]:
print(len(52))

## <a name='convert-numbers-and-strings'></a> Must convert numbers to strings or vice versa when operating on them.

*   Cannot add numbers and strings.

In [None]:
print(1 + '2')

*   This operation is not allowed because it is ambiguous: should `1 + '2'` be `3` or `'12'`? Addition is not defined between a number and a string.
*   Some types can be converted to other types by using the type name as a function.

In [None]:
print(1 + int('2'))
print(str(1) + '2')

## Integers and floats can be mixed freely in operations.

*   Integers and floating-point numbers can be mixed in arithmetic.
    *   Python 3 automatically converts integers to floats as needed. (Integer division in Python 2 will return an integer, the *floor* of the division.)

In [None]:
print('half is', 1 / 2.0)
print('three squared is', 3.0 ** 2)

## Variables only change value when something is assigned to them.

*   If we make one cell in a spreadsheet depend on another,
    and update the latter,
    the former updates automatically.
*   This does **not** happen in programming languages.

Guess what the output of the following will be, then run the code.  

>  first = 1  
>  second = 5 * first  
>  first = 2  
>  print('first is', first, 'and second is', second)  

## Fractions

What type of value is 3.4?  
How can you find out?  
Write code below to determine the answer.  

## Automatic Type Conversion

What type of value is 3.25 + 4?
Write code below to determine the answer. 

## Division Types

In Python 3, there are different types of division.
* The `//` operator performs integer (whole-number) floor division.
* The `/` operator performs floating-point division.
* The `%` (or *modulo*) operator calculates and returns the remainder from integer division.

>  print('5 // 3:', 5//3)  
>  print('5 / 3:', 5/3)  
>  print('5 % 3:', 5%3)  
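
The three operators are related: the floor quotient times the divisor, plus the remainder, gives back the original number. A minimal sketch, with variable names chosen only for illustration:

```python
dividend = 17
divisor = 5

quotient = dividend // divisor     # 3, an int
remainder = dividend % divisor     # 2, an int
ratio = dividend / divisor         # 3.4, a float

# Floor division and modulo together recover the original value.
print(quotient * divisor + remainder == dividend)   # True
print(type(quotient), type(ratio))
```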

## Strings to Numbers

Where reasonable, `float()` will convert a string to a floating point number, and `int()` will convert a floating point number to an integer. If the conversion doesn't make sense, however, Python will report an error.  

>  print("string to float:", float("3.4"))  
>  print("float to int:", int(3.4))  
>  print("string to float:", float("Hello world!"))  

## Arithmetic with Different Types

Which of the following will return the floating point number `2.0`?  

>  first = 1.0  
>  second = "1"  
>  third = "1.1"  

1. `first + float(second)`
2. `float(second) + float(third)`
3. `first + int(third)`
4. `first + int(float(third))`
5. `int(first) + int(float(third))`
6. `2.0 * second`


# Built-in Functions and Help
**Teaching:** 15 minutes  
**Exercises:** 10 minutes  
## Questions:  
- How can I use built-in functions?
- How can I find out what they do?
- What kind of errors can occur in programs?

## Objectives:
- Explain the purpose of functions.
- Correctly call built-in Python functions.
- Correctly nest calls to built-in functions.
- Use help to display documentation for built-in functions.
- Correctly describe situations in which SyntaxError and NameError occur.

## Key points:
- Use comments to add documentation to programs.
- A function may take zero or more arguments.
- Commonly-used built-in functions include `max`, `min`, and `round`.
- Functions may only work for certain (combinations of) arguments.
- Functions may have default values for some arguments.
- Use the built-in function `help` to get help for a function.
- The Jupyter Notebook has two ways to get help.
- Every function returns something.
- Python reports a syntax error when it can't understand the source of a program.
- Python reports a runtime error when something goes wrong while a program is executing.
- Fix syntax errors by reading the source code, and runtime errors by tracing the program's execution.

## Use comments to add documentation to programs.

In [None]:
# This sentence isn't executed by Python.
adjustment = 0.5   # Neither is this - anything after '#' is ignored.

## A function may take zero or more arguments.

*   We have seen some functions already --- now let's take a closer look.
*   An *argument* is a value passed into a function.
*   `len` takes exactly one.
*   `int`, `str`, and `float` create a new value from an existing one.
*   `print` takes zero or more.
*   `print` with no arguments prints a blank line.
    *   Must always use parentheses, even if they're empty,
        so that Python knows a function is being called.

Try out the following print statements:  
>  print('before')  
>  print()  
>  print('after')  
>  print(print)

## Commonly-used built-in functions include `max`, `min`, and `round`.

*   Use `max` to find the largest value of one or more values.
*   Use `min` to find the smallest.
*   Both work on character strings as well as numbers.
    *   "Larger" and "smaller" use (0-9, A-Z, a-z) to compare letters.
    
Try out the following lines, but guess the result before executing the code.  

>  print(max(1, 2, 3))  
>  print(min('a', 'A', '0'))
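
Once you have made your guess, the ordering rule (digits, then upper-case letters, then lower-case letters) can be checked with another small sketch. The variable names `largest` and `smallest` are introduced here only for illustration.

```python
# Lower-case letters compare as larger than upper-case letters and digits.
largest = max('z', 'Z', '9')
# Digits compare as smaller than any letter.
smallest = min('z', 'Z', '9')

print(largest, smallest)   # z 9
```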

## Functions may only work for certain (combinations of) arguments.

*   `max` and `min` must be given at least one argument.
    *   "Largest of the empty set" is a meaningless question.
*   And they must be given things that can meaningfully be compared.

In [11]:
max(1, 'a')

## Functions may have default values for some arguments.

*   `round` will round off a floating-point number.
*   By default, rounds to zero decimal places.
*   We can specify the number of decimal places we want.

Try out the following code, but guess the result first:  
>  round(3.712)   
>  round(3.712, 1)  
>  round(3.712, -1)  

## Use the built-in function `help` to get help for a function.

*   Every built-in function has online documentation.

In [None]:
help(round)

## Python reports a syntax error when it can't understand the source of a program.

Try to run the following:  
>  name = 'Feng  
>  print("hello world"

*   The message indicates a problem on the first line of the input ("line 1").
    *   In this case the "ipython-input" section of the file name tells us that
        we are working with input into IPython,
        the Python interpreter used by the Jupyter Notebook.
*   The number that follows in the file name (for example, the `-6-` in `ipython-input-6-...`) indicates the cell in which the error occurred, here cell 6 of our Notebook.
*   Next is the problematic line of code, indicating the problem with a `^` pointer.

## <a name='runtime-error'></a> Python reports a runtime error when something goes wrong while a program is executing.

In [None]:
age = 53
remaining = 100 - aege # mis-spelled 'age'

*   Fix syntax errors by reading the source and runtime errors by tracing execution.

## The Jupyter Notebook has two ways to get help.

*   Place the cursor anywhere in the function invocation 
    (i.e., the function name or its parameters),
    hold down `shift`,
    and press `tab`.
*   Or type a function name with a question mark after it.

## Every function returns something.

*   Every function call produces some result.
*   If the function doesn't have a useful result to return,
    it usually returns the special value `None`.

In [None]:
result = print('example')
print('result of print is', result)

## What Happens When
What is the final value of `radiance`?  

>  radiance = 1.0  
>  radiance = max(2.1, 2.0 + min(radiance, 1.1 * radiance - 0.5))  

## Spot the Difference
Predict what each of the `print` statements in the program below will print.  

>  easy_string = "abc"  
>  print(max(easy_string))  
>  rich = "gold"  
>  poor = "tin"  
>  print(max(rich, poor))  
>  print(max(len(rich), len(poor)))  

# Reflection exercise

Over coffee, reflect on and discuss the following:
* What are the different kinds of errors Python will report?
* Did the code always produce the results you expected? If not, why?
* Is there something we can do to prevent errors when we write code?

# Libraries
**Teaching:** 10 minutes  
**Exercises:** 10 minutes

## Questions:
- How can I use software that other people have written?
- How can I find out what that software does?

## Objectives:
- Explain what software libraries are and why programmers create and use them.
- Write programs that import and use libraries from Python's standard library.
- Find and read documentation for standard libraries interactively (in the interpreter) and online.

## Key points:
- Most of the power of a programming language is in its libraries.
- A program must import a library module in order to use it.
- Use `help` to learn about the contents of a library module.
- Import specific items from a library to shorten programs.
- Create an alias for a library when importing it to shorten programs.

## Most of the power of a programming language is in its libraries.

*   A *library* is a collection of files that contain functions for use by other programs.
*   Libraries may also contain data values and other things.
*   The Python [standard library](https://docs.python.org/3/library/) is an extensive suite of modules that comes with Python itself.
*   Many additional libraries are available from [PyPI](https://pypi.python.org/pypi/) (the Python Package Index).

## A program must import a library module before using it.

*   Use `import` to load a library into a program's memory.
*   Then refer to things from the library as `library_name.thing_name`.
    *   Python uses `.` to mean "part of".
*   The `math` library is a commonly-used part of the standard Python library.  
*   The `math` library contains functions typically found on a calculator.

Try out the following code below:  

>  import math  
>  
>  print('pi is', math.pi)  
>  print('cos(pi) is', math.cos(math.pi))

*   We have to refer to each item with the module's name.
*   `math.cos(pi)` won't work, since `pi` is not defined yet.

## Use `help` to learn about the contents of a library module.

*   Works just like help for a function.  

>  help(math)

## Import specific items from a library module to shorten programs.
*   Use `from ... import ...` to load only specific items from a library module.
*   Then refer to them directly without library name as prefix.

>  from math import cos, pi  
>  
>  print('cos(pi) is', cos(pi))  

## Create an alias to shorten programs.

*   Use `import ... as ...` to give a library a short *alias* while importing it.
*   Then refer to items in the library using that shortened name.

>  import math as m  
>  
>  print('cos(pi) is', m.cos(m.pi))  

*   Aliases are commonly used for libraries that are frequently used or have long names.
    *   E.g., the `matplotlib.pyplot` plotting module is often aliased as `plt`.
*   But can make programs harder to understand, since readers must learn the aliases.

## Importing using the wildcard (\*)
*   All the parts of a library can be imported using the asterisk (\*).
*   This approach is not recommended, since the origin of variables is less clear.

>  from math import *  
>  print(pi)

## Explore the Math Module.

What function from the `math` module can you use to calculate a square root?  

## Exponents in Python
What is another way to compute the square root of a number?  
The double asterisk `**` is used rather than the caret `^` for exponents in Python.

## Locating the right libraries

Let's say that you want to select a random character from a string:  

bases = 'ACTTGCTTGAC'  

Which Python library could help you? Try to write a program to print a random letter from this string.

## When Is Help Available?

When a colleague of yours types `help(math)`, Python reports an error:

> ~~~
> NameError: name 'math' is not defined
> ~~~

What has your colleague forgotten to do?

## There Are Many Ways To Import Libraries

Match the following print statements with the appropriate library calls.

> Print commands:
>
> 1. `print("sin(pi/2) =", sin(pi/2))`
> 2. `print("sin(pi/2) =", m.sin(m.pi/2))`
> 3. `print("sin(pi/2) =", math.sin(math.pi/2))`

> Library calls:
>
> 1. `from math import sin, pi`
> 2. `import math`
> 3. `import math as m`
> 4. `from math import *`

# Lists
**Teaching:** 10 minutes  
**Exercises:** 10 minutes

## Questions:
- How can I store multiple values?

## Objectives:
- Explain why programs need collections of values.
- Write programs that create, index, slice, and modify lists through assignment and method calls.

## Key points:
- A list stores many values in a single structure.
- Use an item's index to fetch it from a list.
- Lists' values can be replaced by assigning to them.
- Appending items to a list lengthens it.
- Use `del` to remove items from a list entirely.
- The empty list contains no values.
- Lists may contain values of different types.
- Character strings can be indexed like lists.
- Character strings are immutable.
- Indexing beyond the end of the collection is an error.

## A list stores many values in a single structure.
*   Doing calculations with a hundred variables called `pressure_001`, `pressure_002`, etc., is cumbersome.
*   Use a *list* to store many values together.
    *   Contained within square brackets `[...]`.
    *   Values are separated by commas `,`.
    *   Lists can be used to create collections of any object type.
*   Use `len` to find out how many values are in a list.

Try out the following code, which creates a list of pressure values.  

> pressures = [0.273, 0.275, 0.277, 0.275, 0.276]  
> print('pressures:', pressures)  
> print('length:', len(pressures))  

## Use an item's index to fetch it from a list.

*   Just like we did with strings.

> print('zeroth item of pressures:', pressures[0])  
> print('fourth item of pressures:', pressures[4])  

## Lists' values can be replaced by assigning to them.

*   Use an index expression on the left of assignment to replace a value.

> pressures[0] = 0.265  
> print('pressures is now:', pressures)

## Appending items to a list lengthens it.

*   Use `list_name.append` to add items to the end of a list.

Try out the following code:  

> primes = [2, 3, 5]  
> print('primes is initially:', primes)  
> primes.append(7)  
> print('primes has become:', primes)  

*   `append` is a *method* that belongs to *list* objects.
    *   Like a function, but tied to a particular object.
*   Use `object_name.method_name` to call methods.
    *   Deliberately resembles the way we refer to things in a library.
*   We will meet other methods of lists as we go along.
    *   Use `help(list)` for a preview.

## Lists can be combined using extend
The `extend` method is similar to `append`, but it allows you to combine two lists.  

For example, try out the following code:  

> teen_primes = [11, 13, 17, 19]  
> middle_aged_primes = [37, 41, 43, 47]  
> print('primes is currently:', primes)  
> primes.extend(teen_primes)  
> print('primes has now become:', primes)  
> primes.append(middle_aged_primes)  
> print('primes has finally become:', primes)  
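
The difference between the two methods is easier to see side by side. In this minimal sketch (the list names are chosen only for illustration), `extend` adds each item of the second list, while `append` adds the second list itself as a single nested item.

```python
second = [5, 7]

extended = [2, 3]
extended.extend(second)    # adds 5 and 7 as separate items

appended = [2, 3]
appended.append(second)    # adds the whole list [5, 7] as one item

print(extended, len(extended))   # [2, 3, 5, 7] 4
print(appended, len(appended))   # [2, 3, [5, 7]] 3
```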

## Use `del` to remove items from a list entirely.
*   `del list_name[index]` removes an item from a list and shortens the list.
*   Not a function or a method, but a statement in the language.

Try out the following code:  

> primes = [2, 3, 5, 7]  
> print('primes before removing last item:', primes)  
> del primes[3]  
> print('primes after removing last item:', primes)  

## The empty list contains no values.
*   Use `[]` on its own to represent a list that doesn't contain any values.
    *   "The zero of lists."
*   Helpful as a starting point for collecting values.

> l = [ ]   
> l.append('first value')  
> l.append('second')

## Lists may contain values of different types.
*   A single list may contain numbers, strings, and anything else.

In [None]:
goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']

## Character strings can be indexed like lists.
*   Get single characters from a character string using indexes in square brackets.

> element = 'carbon'  
> print('zeroth character:', element[0])  
> print('third character:', element[3])  

## Character strings are immutable.
*   Cannot change the characters in a string after it has been created.
    *   *Immutable*: can't be changed after creation.
    *   In contrast, lists are *mutable*: they can be modified in place.
*   Python considers the string to be a single value with parts, not a collection of values.

In [None]:
element[0] = 'C'

*   Lists and character strings are both *collections*.
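
Because a string cannot be modified in place, one common workaround is to build a new string from pieces of the old one. This is a sketch of that idea, not the only approach; the name `capitalized` is introduced here for illustration.

```python
element = 'carbon'

# Combine a new first character with a slice of the remaining characters.
capitalized = 'C' + element[1:]

print(capitalized)   # Carbon
print(element)       # carbon (the original string is unchanged)
```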

## Indexing beyond the end of the collection is an error.

*   Python reports an `IndexError` if we attempt to access a value that doesn't exist.
    *   This is a kind of [runtime error](#runtime-error).
    *   Cannot be detected as the code is parsed
        because the index might be calculated based on data.

In [None]:
print('99th element of element is:', element[99])

## Fill in the blanks.
Fill in the blanks so that the program below produces the output shown.  

> values = ______  
> values.______(1)  
> values.______(3)  
> values.______(5)  
> print('first time:', values)  
> values = values[____]  
> print('second time:', values)  
> first time: [1, 3, 5]  
> second time: [3, 5]  

## Skipping through a list
We can write a slice as `low:high:stride`.  
What does `stride` do?  
Try the following code to see what is printed.    

> element = 'fluorine'  
> print(element[::2])  
> print(element[::-1])  

# For Loops
**Teaching:** 10 minutes  
**Exercises:** 15 minutes

## Questions:
- How can I make a program repeat a task?

## Objectives:
- Explain what for loops are normally used for.
- Trace the execution of a simple (unnested) loop and correctly state the values of variables in each iteration.
- Write for loops that use the Accumulator pattern to aggregate values.

## Key points:
- A *for loop* executes commands once for each value in a collection.
- A `for` loop is made up of a collection, a loop variable, and a body.
- The first line of the `for` loop must end with a colon, and the body must be indented.
- Indentation is always meaningful in Python.
- Loop variables can be called anything, but it is best to use a meaningful name.
- The body of a loop can contain many statements.
- Use `range` to iterate over a sequence of numbers.
- The Accumulator pattern turns many values into one.

## A *for loop* executes commands once for each value in a collection.

*   Doing calculations on the values in a list one by one
    is as painful as working with `pressure_001`, `pressure_002`, etc.
*   A *for loop* tells Python to execute some statements once for each value in a list,
    a character string,
    or some other collection.
*   "for each thing in this group, do these operations"

In [None]:
for number in [2, 3, 5]:
    print(number)

*   This `for` loop is equivalent to:

In [None]:
print(2)
print(3)
print(5)

## A `for` loop is made up of a collection, a loop variable, and a body.

In [None]:
for number in [2, 3, 5]:
    print(number)

*   The collection, `[2, 3, 5]`, is what the loop is being run on.
*   The body, `print(number)`, specifies what to do for each value in the collection.
*   The loop variable, `number`, is what changes for each *iteration* of the loop.
    *   The "current thing".

## The first line of the `for` loop must end with a colon, and the body must be indented.
*   The colon at the end of the first line signals the start of a *block* of statements.
*   Python uses indentation to show *nesting*.
    *   Any consistent indentation is legal, but almost everyone uses four spaces.

In [None]:
for number in [2, 3, 5]:
print(number)

*   Indentation is always meaningful in Python.

In [None]:
firstName = "Jon"
  lastName = "Smith"

*   This error can be fixed by removing the extra spaces
    at the beginning of the second line.

## Loop variables can be called anything.

*   As with all variables, loop variables are:
    *   Created on demand.
    *   Meaningless: their names can be anything at all.

In [None]:
for kitten in [2, 3, 5]:
    print(kitten)

## The body of a loop can contain many statements.
*   Loops should normally only be a few lines long.
*   It is hard for human beings to keep larger chunks of code in mind.

Try out the following code, which prints the square and cube of the first few primes:

> primes = [2, 3, 5]  
> for p in primes:  
>     squared = p \** 2  
>     cubed = p \** 3  
>     print(p, squared, cubed)  

## Use `range` to iterate over a sequence of numbers.

*   The built-in function [`range`](https://docs.python.org/3/library/stdtypes.html#range) produces a sequence of numbers.
    *   *Not* a list: the numbers are produced to make looping more efficient.
*   `range(N)` is the numbers 0..N-1
    *   Exactly the legal indices of a list or character string of length N
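To see the numbers a `range` stands for, one option is to convert it to a list with the built-in `list` function. A small sketch, with illustrative variable names:

```python
# range(3) stands for the numbers 0, 1, 2; list() makes them visible.
first_three = list(range(3))
print(first_three)

# The values of range(len(word)) are exactly the legal indices of word.
word = 'tin'
for index in range(len(word)):
    print(index, word[index])
```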

> for number in range(0, 3):  
>    print(number)

## The Accumulator pattern turns many values into one.
*   A common pattern in programs is to:
    1.  Initialize an *accumulator* variable to zero, the empty string, or the empty list.
    2.  Update the variable with values from a collection.
    
The following code will sum the first 10 integers.  
> total = 0  
> for number in range(10):  
>   total = total + (number + 1)  
> print(total)  

*   Read `total = total + (number + 1)` as:
    *   Add 1 to the current value of the loop variable `number`.
    *   Add that to the current value of the accumulator variable `total`.
    *   Assign that to `total`, replacing the current value.
*   We have to add `number + 1` because `range` produces 0..9, not 1..10.
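The same pattern works when the accumulator starts as an empty string rather than zero. A minimal sketch, where the word list and the name `acronym` are made up for illustration:

```python
# Build an acronym by accumulating the first letter of each word.
acronym = ''                      # 1. initialize the accumulator
for word in ['portable', 'network', 'graphics']:
    acronym = acronym + word[0]   # 2. update it with each value
print(acronym)
```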

## Practice Accumulating
Fill in the blanks in each of the programs below to produce the indicated result.

The total length of the strings in the list: ["red", "green", "blue"] should be 12.
> total = 0  
> for word in ["red", "green", "blue"]:  
>     ______ = ______ + len(word)  
> print(total)  

## Cumulative Sum
Reorder and properly indent the lines of code below so that they print a list with the cumulative sum of data. The result should be `[1, 3, 5, 10]`.

> cumulative.append(sum)  
> for number in data:  
> cumulative = [ ]  
> sum += number  
> sum = 0  
> print(cumulative)  
> data = [1,2,2,5]  

## Final Day 1 Challenge
In calculus classes, students learn about infinite sequences and series. One famous convergent series, the alternating harmonic series, sums to the natural logarithm of 2.

$\ln(2)=\sum\limits_{n=1}^{\infty} {\frac{(-1)^{n+1}}{n}} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots \approx 0.69315$

Write Python code to compute the sum of the first 10,000 terms of this series.  

$\sum\limits_{n=1}^{10,000} {\frac{(-1)^{n+1}}{n}} $

Compare the result with the actual value. Think about how long it would take to evaluate this sum with a calculator.

# Day 02

Today we will work through the following notebook. Anywhere you find `___`, replace the three underscores with the correct content.

# ===========================================================
# 01 Lists

## Learning Objectives

**Question:** How can I store multiple values?

**Objectives:** 

## A list stores many values in a single structure.

*   Doing calculations with a hundred variables called `pressure_001`, `pressure_002`, etc.,
    would be at least as slow as doing them by hand.
*   Use a *list* to store many values together.
    *   Contained within square brackets `[...]`.
    *   Values separated by commas `,`.
*   Use `len` to find out how many values are in a list.

In [None]:
# 1. Create a list called pressures with the following
#    values: 0.273, 0.275, 0.277, 0.275, 0.276
pressures = [___]
print('pressures:', pressures)

# 2. Use the `len()` function to print the length of pressures
print('length:', ___)

## Use an item's index to fetch it from a list.

- You can select an item from a list by putting its index in square brackets after the list name.
  - Don't forget zero indexing.

In [None]:
# 1. Print the first (zeroth) item in `pressures`
print('zeroth item of pressures:', pressures[____])

# 2. Print the last item in `pressures`


## Replacing Lists' values

- You can replace a value in a list by using this same indexing convention.

In [None]:
pressures[0] = 0.265
print('pressures is now:', pressures)

## Adding items to a list

- Use `list_name.append` to add items to the end of a list.

In [None]:
primes = [2, 3, 5]
print('primes is initially:', primes)

# 1. Append 7 and 9 to the end of `primes`
primes.append(___)
___
print('primes has become:', primes)

- `append` is a *method* of lists.
  - A *method* is a function that is attached only to the object that owns it.
  - Access an object's methods by using `.`.
- `extend` is similar to `append`, but it allows you to combine two lists.

### Exercise: Run the following code and discuss with a neighbor the difference.

In [None]:
teen_primes = [11, 13, 17, 19]
middle_aged_primes = [37, 41, 43, 47]
print('primes is currently:', primes)
primes.extend(teen_primes)
print('primes has now become:', primes)
primes.append(middle_aged_primes)
print('primes has finally become:', primes)

### Answer
While `extend` maintains the “flat” structure of the list, appending a list to a list makes the result two-dimensional - the last element in `primes` is a list, not an integer.
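One way to see the difference is to compare lengths after each operation. A small sketch with made-up lists:

```python
flat = [2, 3, 5]
flat.extend([7, 11])      # adds each item of the second list
print(flat, len(flat))

nested = [2, 3, 5]
nested.append([7, 11])    # adds the second list as one single item
print(nested, len(nested))
print(nested[-1])         # the last item is itself a list
```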

## Removing items from a list
- `del list_name[index]` removes an item from a list and shortens the list.
- `del` is not a function or a method; it's a Python keyword.

In [None]:
primes = [2, 3, 5, 7, 9]
print('primes before removing last item:', primes)

# 1. Delete the last item in `primes`
___
print('primes after removing last item:', primes)

## Empty Lists
- Use `[]` on its own to represent a list that doesn't contain any values.
- Empty lists are helpful as a starting point for collecting values.

## Lists may contain values of different types.
- A single list may contain numbers, strings, and anything else.

In [None]:
goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']

## Character strings can be indexed
- Get single characters from a character string using indexes in square brackets.

In [None]:
element = 'carbon'
print('zeroth character:', element[0])

# 1. Find the third character in `element`
print('third character:', element[___])

## Character strings are immutable...lists are mutable
- You cannot change the characters in a string after you create it.
- *Immutable* = cannot be changed after creation.
- *Mutable* = can be changed after creation.

In [None]:
element[0] = 'C'

## Indexing beyond the end of the collection
- Lists and character strings are both *collections*.
- Python reports an `IndexError` if we attempt to access a value that doesn't exist.

In [None]:
print('99th element of element is:', element[99])

## Exercise:

What does the following program print?

```python
element = 'helium'
print(element[-1])
```

1. How does Python interpret a negative index?
1. How can you display all characters but the last one without changing `element`?
> Hint: You will need to use *slicing*. *Slicing* is done by using `:`. Try the following two code examples to see *slicing* in action: `element[2:4]` and `element[1:]`.

# ===========================================================
# 02 Loops

**Objectives:**
- Explain what for loops are used for.
- Trace the execution of a simple (unnested) loop and correctly state the values of variables in each iteration.
- Analyze a for loop that uses the Accumulator pattern to aggregate values.

## A *for loop* executes commands once for each value in a collection.

- Doing calculations on the values in a list one by one
  is painful and does not help us with productivity
- A *for loop* tells Python to execute some statements over and over so
  we can do the same task on each item in a list or other collection.

In [None]:
for number in [2, 3, 5]:
    print(number)

- This `for` loop is equivalent to:

In [None]:
print(2)
print(3)
print(5)

## Anatomy of a `for` loop

- The collection: `[2, 3, 5]` (what the loop is run on)
- The body: `print(number)` (what the loop should do for each item in the collection)
- The loop variable: `number` (holds that *iteration's* item and changes each time through the loop - the "current thing")

In [None]:
# 1. Put the collection back in this loop
for number in ___:
    print(number)

## `for` loop "punctuation"

- The first line of a `for` loop must end with a colon
- The colon at the end of the first line signals the start of a *block* of statements.
- Blocks must be indented
  - Any consistent indentation is legal, but common convention is four spaces

In [None]:
for number in [2, 3, 5]:
print(number)

- Python always takes note of white space and indentation.

In [None]:
# 1. Run this cell to see the error
# 2. Fix the error by removing the extra
#    space and run it again
firstName = "Jon"
  lastName = "Smith"

## Use Descriptive Loop variables
- As with all variables, loop variables are:
  - Created on demand.
  - Meaningless: their names can be anything at all.

In [None]:
# 1. Change the loop variable to a more
#    meaningful name
for kitten in [2, 3, 5]:
    print(kitten)

## Keep the Body of a Loop Short

- Loops can have as many lines as you want
- Try to keep loops short for readable code
  - A good guideline is no more than twelve lines

In [None]:
primes = [2, 3, 5]
# 1. Run this loop and look at the error.
# 2. Fix the loop so it runs correctly
for p in primes
    squared = p ** 2
    cubed = p ** 3
    print(p, squared, cubed)

## Use `range` to Loop over a Sequence of Numbers

- The built-in function [`range()`](https://docs.python.org/3/library/stdtypes.html#range) produces a sequence of numbers.
  - `range()` *does not* produce a list
    - Numbers are generated on demand
- `range(N)` is the numbers `0...N-1`
  - This matches Python indexing for collections

In [None]:
print('a range is not a list: range(0, 3)')
for number in range(0, 3):
    print(number)

In [None]:
# 1. Run this loop
#    Why is the output different than above?
print(range(0, 3))

## A Common Loop Pattern: The Accumulator Pattern
- A common pattern in many programs is as follows:
  1. Initialize an *accumulator* variable to zero, the empty string, or the empty list.
  2. Update or add values from a collection to the variable.

In [None]:
# Sum the first 10 integers.
total = 0
for number in range(10):
   total = total + (number + 1)
print(total)

## Exercise
1. Break down the steps of the loop. What is happening at each step?
2. Why is `total` computed as `total + (number + 1)` and not `total + number`?
3. Change the accumulator value `total` to be an empty list. Append each new value
   to the list and print the final list.

**Objectives:**
- Explain what for loops are used for.
- Trace the execution of a simple (unnested) loop and correctly state the values of variables in each iteration.
- Analyze a for loop that uses the Accumulator pattern to aggregate values.

# ===========================================================
# 03 Conditionals
*Making decisions with Python*

## Objectives
- Trace the execution of unnested conditionals and conditionals inside loops.

## Use `if` Statements to Control Whether or not the Program Executes Code
- Structure is similar to a `for` statement:
  - First line opens with `if` and ends with a colon
  - The body contains one or more statements and is indented

In [None]:
mass = 3.54
if mass > 3.0:
    print(mass, 'is large')

In [None]:
mass = 2.07
if mass > 3.0:
    print (mass, 'is large')

## You Can Nest Structures Within Each Other
- Follow Python's indentation rules to make more complicated code
- This is useful when we do not know all the items within a collection

In [None]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    # 1. Write an if statement within this
    #    loop that prints "m is large" for
    #    each number, m, that is larger than 3.0.
    ___

## Use `else` to Execute Code when an `if` condition is *not* True

In [None]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')

## Use `elif` to Create Multiple Tests
- `elif` allows us to provide many choices
- `elif` is short for "else if"
- `elif` is always associated with an `if`.
- `elif` must come before the `else`.

In [None]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    # 1. Print "m is HUGE" for each m
    #    in the list above
    ___
    elif m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')

## Conditions are Tested Once and in Order

In [None]:
grade = 85
if grade >= 70:
    print('grade is C')
elif grade >= 80:
    print('grade is B')
elif grade >= 90:
    print('grade is A')
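
With `grade = 85`, the cell above prints `grade is C`: the first test that is true wins, and the later `elif` branches are never reached. One possible fix, sketched here with an illustrative variable `letter`, is to order the tests from most to least restrictive:

```python
grade = 85
# Test the highest threshold first so each branch can only match its own band.
if grade >= 90:
    letter = 'A'
elif grade >= 80:
    letter = 'B'
elif grade >= 70:
    letter = 'C'
else:
    letter = 'below C'
print('grade is', letter)
```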

- Conditionals *do not* automatically go back and re-evaluate if values change.

In [None]:
velocity = 10.0
if velocity > 20.0:
    print('moving too fast')
else:
    print('adjusting velocity')
    velocity = 50.0

- You can use conditionals in a loop to update the values of variables.

In [None]:
velocity = 10.0
for i in range(5): # execute the loop 5 times
    print(i, ':', velocity)
    if velocity > 20.0:
        print('moving too fast')
        # 1. Decrease the velocity by subtracting
        #    5 from `velocity`'s value.
        ___
    else:
        print('moving too slow')
        # 2. Update `velocity` by adding
        # 10 to its current value
        ___
print('final velocity:', velocity)

## Exercise

1. What does this program print?
1. Copy the code into the next box
   and change the program so it prints `0.0`

```python
pressure = 71.9
if pressure > 50.0:
    pressure = 25.0
elif pressure <= 50.0:
    pressure = 0.0
print(pressure)
```

## Objectives
- Trace the execution of unnested conditionals and conditionals inside loops.

# ===========================================================
# 04 Functions

## Objectives
- Explain and identify the difference between function definition and function call.
- Write a function that takes a fixed number of arguments and produces a single result.

## Break Programs down into Functions to Make them Easier to Understand
- Functions help the programmer encapsulate their code
- Easier to read and understand the code
  - Others
  - You six months later
- Functions make the code reusable

## Define a Function using `def` with a Name, Parameters, and a Block of Code
- Begin the definition of a new function with `def`.
- Follow `def` with the name of the function.
- After the name, put the *parameters* (inputs) in parentheses.
  - Empty parentheses if the function doesn't take any inputs
- End the first line with a colon.
- The content of the function is an indented block of code.

In [None]:
def print_greeting():
    print('Hello!')

## Defining a Function Does not run it
- This simply creates a new function within the environment
  - Like assigning a value to a variable
- You must call the function to execute the code it contains.

In [None]:
print_greeting()

## Arguments and Parameters
- Functions are most useful when they can operate on different data.
- You specify *parameters* within parentheses when defining a function.
- You pass *arguments* to a function when you call it.
- Each argument is assigned to the matching parameter, which acts as a variable inside the function body.
- By default, Python matches arguments to parameters in the order the parameters were defined.
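
A short sketch of matching by position, using a hypothetical function `describe_box` that is not part of the lesson (it uses `return`, which is covered in the next section):

```python
def describe_box(width, height):
    # width and height are parameters: names that receive the values passed in.
    return 'width=' + str(width) + ', height=' + str(height)

# 2 and 5 are the values passed in the call; they are matched by position.
print(describe_box(2, 5))
# Swapping them swaps which parameter receives which value.
print(describe_box(5, 2))
```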

In [None]:
# 1. Create a string from the numbers in
#    YYYY/MM/DD format and put it in joined
# Hint: str(year) turns the year (as a number)
#       into year (as a string)
def print_date(year, month, day):
    joined = ___
    print(joined)

print_date(1871, 3, 19)

In [None]:
1871/3/19

- If you want to change the ordering of parameters,
  you can explicitly name them when you call the function.

In [None]:
# 1. Call the function using month, day, year ordering
print_date(month=3, ___)

## Returning Results
- Functions return a result using `return`.

In [None]:
def average(values):
    if len(values) == 0:
        return None
    return sum(values) / len(values)

In [None]:
# 1. Find the average of 1, 3, and 4
a = average(___)
print('average of actual values:', a)

In [None]:
print('average of empty list:', average([]))

- Every function returns something.
- If there is no value to return, the function returns `None`.

In [None]:
result = print_date(1871, 3, 19)
print('result of call is:', result)

## Exercise
1. Fill in the blanks to create a function that takes a list of numbers as an argument
   and returns the first negative value in the list.
2. What does your function do if the list is empty?
3. Test your function with several different lists to see how well it works.

```python
def first_negative(values):
    for v in ____:
        if ____:
            return ____
```

## Objectives
- Explain and identify the difference between function definition and function call.
- Write a function that takes a fixed number of arguments and produces a single result.

# ===========================================================
# 05 Reading Tabular Data Using Pandas

# Objectives
- Import the Pandas library.
- Use Pandas to load a CSV data set.
- Get summary information from a Pandas DataFrame.
- Download online data using Pandas.

In [None]:
# Before beginning this lesson, run this cell to download
# the figures for this notebook
! mkdir img
!wget -P img/ https://pandas.pydata.org/pandas-docs/stable/_images/01_table_dataframe.svg
!wget -P ./img/ https://pandas.pydata.org/pandas-docs/stable/_images/02_io_readwrite.svg

## Pandas
- Pandas is a widely-used Python library for statistics and plotting
- Its focus is tabular data
- It is similar to R in that it uses a structure called a dataframe.
- Dataframes can contain multiple data types

![Data Frame ](img/01_table_dataframe.svg)

Source: <https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html#getting-started>

- Pandas can read all kinds of tabular data

![Data Processed by Pandas](img/02_io_readwrite.svg)

Source: <https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html#getting-started>

In [None]:
# 1. Run this cell to download the data
# 2. Open the downloaded files to get a sense of the data

# Downloads a zip file from Carpentries webpage with Gapminder data
! wget http://swcarpentry.github.io/python-novice-gapminder/files/python-novice-gapminder-data.zip
# Unzips the file
! unzip python-novice-gapminder-data.zip

- Load Pandas with `import pandas as pd`

In [None]:
# 1. Import the pandas library
___

# 2. Use `read_csv` to read the gapminder data
data = pd.read_('data/gapminder_gdp_oceania.csv')
print(data)

- The columns in a dataframe are the observed variables, and the rows are the observations.
- Pandas uses backslash `\` to show wrapped lines when output is too wide to fit the screen.

## `index_col`
- Use `index_col` to specify that a column's values should be used as row identifiers.

In [None]:
data = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
print(data)

- Use `DataFrame.info` to find out more about a dataframe.

In [None]:
# `info()` is a method of data
data.info()

*   This is a `DataFrame`
*   Two rows named `'Australia'` and `'New Zealand'`
*   Twelve columns, each of which has two actual 64-bit floating point values.
*   Uses 208 bytes of memory.

## Attributes
- The `DataFrame.columns` attribute stores information about the dataframe's columns.
- Note that this is a variable, *not* a method.
  - It doesn't have `()`
*   Called a *member variable*, just a *member*, or an *attribute*.

In [None]:
print(data.columns)

- Use `DataFrame.T` to transpose a dataframe.

*   Sometimes want to treat columns as rows and vice versa.
*   Transpose doesn't copy the data, just changes the program's view of it.

In [None]:
print(data.T)

## Summary Statistics
- Use `DataFrame.describe` to get summary statistics about data.

- DataFrame.describe() gets the summary statistics of only the columns that have numerical data. 
  All other columns are ignored.

In [None]:
# 1. Print the summary statistics for our dataframe
print(___)

# Exercise
1. `read_csv()` can download data directly from a webpage.
   Download a dataset called the Titanic Data Set by passing
   the following URL to `read_csv()` instead of a file path.
   Put the new dataframe in a variable called `titanic`.
2. Use `titanic.head()` to have a look at the new dataframe.

**Data URL:**
<https://github.com/pandas-dev/pandas/raw/master/doc/data/titanic.csv>

# Objectives
- Import the Pandas library.
- Use Pandas to load a CSV data set.
- Get summary information from a Pandas DataFrame.
- Download online data using Pandas.

# ===========================================================
# 06 Working with Data

## Objectives
- Select individual values from a Pandas dataframe.
- Select entire rows or entire columns from a dataframe.
- Select a subset of both rows and columns from a dataframe in a single operation.
- Filter data based on values in a dataframe.
- Apply a split-apply-combine workflow to a dataframe.
- Convert a dataframe between wide and long data formats.

In [None]:
# Before reading this lesson, run this code to download
# the notebook figures
!mkdir img
!wget -P img/ https://pandas.pydata.org/pandas-docs/stable/_images/06_groupby1.svg

## Some Notes on DataFrames

- A DataFrame is a collection of Series
  - The DataFrame is the way Pandas represents a table, and Series is the data structure
    Pandas uses to represent a column.
- Pandas is built on top of the NumPy library
    - We get to use those functions too
- Benefits of using Pandas:
  - Interface to access individual records
  - Proper handling of missing values
  - Relational-database operations between DataFrames
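
A small sketch of the DataFrame/Series relationship, using a tiny made-up table rather than the Gapminder data (the column names and values are invented for illustration):

```python
import pandas as pd

# A tiny made-up table: two rows, two columns.
table = pd.DataFrame(
    {'gdp_1952': [100.0, 200.0], 'gdp_1957': [110.0, 220.0]},
    index=['A', 'B'],
)

column = table['gdp_1952']          # selecting one column gives a Series
print(type(table).__name__)
print(type(column).__name__)
print(column.mean())                # numerical operations work on the Series
```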

## Selecting values
- Use `DataFrame.iloc[ROW, COLUMN]` to select values by their numerical position

In [None]:
import pandas as pd
data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
print(data.iloc[0, 0])

- Use `DataFrame.loc[..., ...]` to select values by their row/column labels.

In [None]:
data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
print(data.loc["Albania", "gdpPercap_1952"])

- Use `:` on its own to mean all columns or all rows.
  - This follows Python's slicing notation.

In [None]:
print(data.loc["Albania", :])

- We get the same result printing `data.loc["Albania"]` (without a second index).

In [None]:
print(data.loc[:, "gdpPercap_1952"])

- Would get the same result printing `data["gdpPercap_1952"]`
- Slicing works with labels as well as numerical positions

In [None]:
# 1. Use slicing to select GDP data from Italy to Poland
#    and from 1962 to 1972.
print(data.loc[___])

- Slicing using `loc` is inclusive at both ends
  - This differs from slicing using `iloc`  
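
The difference between the two kinds of slice is easiest to see on a toy dataframe. A minimal sketch with made-up labels and values:

```python
import pandas as pd

table = pd.DataFrame({'value': [10, 20, 30, 40]}, index=['a', 'b', 'c', 'd'])

by_label = table.loc['b':'c']    # label slice: includes both 'b' and 'c'
by_position = table.iloc[1:2]    # position slice: includes 1, stops before 2

print(list(by_label.index))
print(list(by_position.index))
```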

## Putting it Together
- The result from slicing can be used in further operations.

In [None]:
# 1. Find the maximum GDP for the above countries and above years
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].max())

In [None]:
# 1. Find the minimum GDP for the above countries and above years
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']___)

## Filtering Data
- Use comparisons to select data based on values.
- Comparisons are applied element by element.
- They return a similarly-shaped dataframe of `True` and `False` values.

In [None]:
# Use a subset of data to keep output readable.
subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
print('Subset of data:\n', subset)

# Which values were greater than 10000 ?
print('\nWhere are values large?\n', subset > 10000)

- A dataframe full of Booleans is sometimes called a *mask* because of how it can be used.

In [None]:
mask = subset > 10000
print(subset[mask])

- NaNs (Not a Number) are ignored by operations like max, min, average, etc.

In [None]:
print(subset[subset > 10000].describe())

## Exercise: Split-Apply-Combine

### Group By

- A common data-wrangling technique is the split-apply-combine technique
![Split-Apply-Combine](img/06_groupby1.svg)

Source: <https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/06_calculate_statistics.html#min-tut-06-stats>

## GDP of European Countries
We will practice this technique with the Gapminder data. Suppose we want to have a clearer view on how the European countries split themselves according to their GDP.

1.  We will split the countries in two groups during the years surveyed,
    those who presented a GDP *higher* than the European average and those with a *lower* GDP.
2.  We then estimate a *wealth score* based on the historical (from 1962 to 2007) values,
    where we count how many times a country participated in the groups of *lower* or *higher* GDP.

**Part I**

Type the following code and run it.

```python
wealth_score = data[data > data.mean()].count(axis=1) / len(data.columns)
wealth_score
```

- We filtered all the countries that were above the mean GDP.
- We then counted all the non-NaN values.
  - Our filter used NaN's for years a country was equal to or less than the mean.
- `axis` tells `count()` whether to count down each column or across each row.
  - `0` means count non-NaN's in each column
  - `1` means count non-NaN's in each row (what we did)
  - I like to think about what is being collapsed
- Finally, we divided the counts by the number of years (columns) to get a score between 0 and 1.
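
The same steps can be traced on a table small enough to check by hand. This is only a sketch: the dataframe `scores` and its values are made up, not taken from the Gapminder data.

```python
import pandas as pd

scores = pd.DataFrame(
    {'y1952': [1.0, 5.0, 9.0], 'y1957': [2.0, 6.0, 4.0]},
    index=['A', 'B', 'C'],
)

# Keep values above each column's mean; everything else becomes NaN.
above = scores[scores > scores.mean()]
print(above)

print(above.count(axis=0))   # one count per column (rows are collapsed)
print(above.count(axis=1))   # one count per row (columns are collapsed)
print(above.count(axis=1) / len(scores.columns))
```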

**Part II**

Use `groupby()` to sum the financial contribution of wealthy countries in different wealth-score categories across the years surveyed. Type the following code and run it.

```python
data.groupby(wealth_score).sum()
```
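
Passing a Series to `groupby()` groups the rows by the Series' values, matched to the dataframe by index label. A toy sketch with made-up numbers (the names `gdp` and `score` are illustrative):

```python
import pandas as pd

gdp = pd.DataFrame(
    {'y1952': [1.0, 2.0, 10.0], 'y1957': [2.0, 3.0, 20.0]},
    index=['A', 'B', 'C'],
)
# A made-up score per country; rows sharing a score fall into the same group.
score = pd.Series([0.0, 0.0, 1.0], index=['A', 'B', 'C'])

grouped = gdp.groupby(score).sum()
print(grouped)
```

Here rows `A` and `B` share a score, so their values are summed into one row of the result, while `C` forms a group of its own.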

## Tidy Data
- [Tidy Data](http://vita.had.co.nz/papers/tidy-data.pdf) principles are guidelines for organizing data that make analysis on a computer more efficient and effective.
- Keep the following principles in mind when organizing your data:
  - Each variable has its own column
  - Each observation has its own row
  - Each value must have its own cell (atomic values)
  - Each type of observational unit forms a table

## Long and Wide Formats
- Pandas has several functions that help us rearrange our data when we need to change its structure.
  - This frequently occurs when we need to plot our data.
- When data is in *wide format*, each variable has its own column.
  - Our data is currently in *wide format*

In [None]:
data.head()

- When data is in *long format*, multiple columns
  are melted into a single column and entries are repeated.
- To change data to *long format*, `melt()` the columns

In [None]:
# Country is not currently a variable,
# but an index
long = data.reset_index().melt(id_vars='country', var_name='GDP_Year').sort_values(by='country')
long

- `pivot()` the data to go back to *wide format*

In [None]:
long.pivot(index='country', columns='GDP_Year')

## Objectives
- Select individual values from a Pandas dataframe.
- Select entire rows or entire columns from a dataframe.
- Select a subset of both rows and columns from a dataframe in a single operation.
- Filter data based on values in a dataframe.
- Apply a split-apply-combine workflow to a dataframe.
- Convert a dataframe between wide and long data formats.

# ===========================================================
# 09 Plotting

# Objectives
- Create a time series plot showing a single data set.
- Create a scatter plot showing relationship between two data sets.

## Matplotlib
- [`matplotlib`](https://matplotlib.org/) is a widely used scientific plotting library in Python.
- Pandas' plotting functions are built on top of Matplotlib
- A commonly used sub-library is called [`matplotlib.pyplot`](https://matplotlib.org/api/pyplot_api.html).
- The Jupyter Notebook will render plots inline if we ask it to using a "magic" command.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

## Matplotlib usage
- Here is the general outline for creating a plot
  using Matplotlib

In [None]:
time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

plt.plot(time, position)
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)');  # A trailing `;` makes the notebook suppress the extra text output from Matplotlib

## Plotting with Pandas
- Since Pandas plotting is built on Matplotlib, we can plot data directly from a dataframe.
- Before plotting, we convert the column headings from the `string` to the `integer` data type, since they represent numerical values.
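
When removing a prefix such as `gdpPercap_` from column names, it helps to know that `strip` treats its argument as a set of characters to remove from both ends, not as a literal prefix. A plain-Python sketch follows; the second column name is made up, and `removeprefix` assumes Python 3.9 or later.

```python
name = 'gdpPercap_1952'

# strip() removes any of the listed characters from both ends of the string.
print(name.strip('gdpPercap_'))

# With other text after the year, strip() also eats matching characters there.
other = 'gdpPercap_1952_per_cap'          # made-up column name
print(other.strip('gdpPercap_'))
print(other.removeprefix('gdpPercap_'))   # Python 3.9+: removes only the prefix
```

For the Gapminder column names this makes no difference, because each name ends in digits that are not in the stripped set.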

In [None]:
import pandas as pd

data = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')

# Extract year from last 4 characters of each column name
# The current column names are structured as 'gdpPercap_(year)', 
# so we want to keep the (year) part only for clarity when plotting GDP vs. years
# To do this we use strip(), which removes from the string the characters stated in the argument
# This method works on strings, so we call str before strip()

years = data.columns.str.strip('gdpPercap_')

# Convert year values to integers, saving results back to dataframe

data.columns = years.astype(int)

# Look at it now

data

In [None]:
# Plot the data for Australia
data.loc['Australia'].plot();

In [None]:
# 1. Plot the data for New Zealand
___

# Transposing for a Plot

- By default, dataframes plot with the rows as the X axis.
- We can transpose the data in order to plot multiple series.

In [None]:
data.T.plot()
plt.ylabel('GDP per capita')

## Plot Types
- Many styles of plot are available.

In [None]:
plt.style.use('ggplot')
data.T.plot(kind='bar')
plt.ylabel('GDP per capita');

- `.plot` has many attributes, including all the plot types it can produce

In [None]:
# List available plots
[method_name for method_name in dir(data.plot) if not method_name.startswith("_")]

- Let's make a scatter plot of Australia's GDP against New Zealand's GDP.

In [None]:
data.T.plot.scatter(x = ___, y = ___);

# Objectives
- Create a time series plot showing a single data set.
- Create a scatter plot showing relationship between two data sets.