# An Introduction to Python using Jupyter Notebooks
<a id='toc'></a>
## Table of Contents:
### Introduction
* [Python programs are plain text files](#python-programs)
* [Use the Jupyter Notebook for editing and running Python](#jn-editing-python) 
* [How are Jupyter Notebooks stored](#how-its-stored)
* [What you need to know](#need-to-know)
* [The Notebook has Command and Edit modes](#notebook-modes)
* [Use the keyboard and mouse to select and edit cells](#keyboard-mouse)
    * [Practice: Run your first Jupyter Notebook cells](#prac-jupyter)

### Using Markdown
* [The Notebook will turn Markdown into pretty-printed documentation](#markdown)
* [How to use Markdown](#how-to-markdown)
* [Markdown Exercises](#md-exercises)
* [Markdown Exercise Solutions](#md-solutions)

### Introduction to Python 1: Data
* [Intro to Python 1: Prerequisites](#python-1)
* [Programming with Python](#python-introduction)  
    * [What is Python and why would I use it?](#python-introduction)
    * [Special Characters](#python-sp-char)
    * [Variables](#variables)
        * [Practice](#prac-variable)
        * [Variables can be used in calculations](#variable-calc)
    * [Data Types](#data-types)
        * [Practice with Strings](#prac-strings)
        * [Practice with Numerics](#numbers)
        * [Practice with Booleans](#booleans)
    * [Python "Type" function](#py-type)
    * [Lists](#py-lists)
    * [Tuples](#py-tuples)
    * [Differences between lists and tuples](#lists-vs-tuples)
    * [Sets](#py-sets)
    * [Dictionaries](#py-dictionaries)
* [Python Statements](#py-statements)
    * [Conditionals](#py-conditionals)
    * [Loops](#py-loops)
        * [For Loops](#for-loops)
        * [While Loops](#while-loops)
* [Pandas: Working with Existing Data](#pandas)
    * [Pandas: Importing Data](#read-data)
    * [Pandas: Manipulating Data](#manipulate-data)
    * [Pandas: Writing Data](#write-data)
    * [Pandas: Working with more than one file](#all-countries)
    * [Pandas: Slicing and selecting values](#slicing)

* Python I Exercises
    * [Problem 5: Assigning variables and printing values](#prob-variable)
    * [Problem 6: Print your first and last name](#py-concatenate)
    * [Problem 7: What variable type do I have?](#py-data-type)
    * [Problem 8: Creating and Working with Lists](#prob-lists)
    * [Problem 9: Creating and Accessing Dictionaries](#prob-dictionaries)
    * [Problem 10: Writing Conditional If/Else Statements](#prob-if-else)
    * [Problem 11: Reverse the string using a for loop](#prob-str-reverse-loop)
    * [Problem 12: Looping through Dictionaries](#prob-dict-loop)
    * [Problem 13: Checking assumptions about your data](#prob-unique)
    * [Problem 14: Slice and save summary statistics](#summary-stats)
* [Python I Exercise Solutions](#py1-solutions)

### Introduction to Python 2: A Tool for Programming
* [Intro to Python 2: Prerequisites](#python-2)
* [Setup if you are joining in for Python II](#python-2-setup)
* [Functions:](#functions)
    * [Why Use Functions?](#why-functions)
    * [Let's revisit the reverse string and turn it into a function](#str-reverse-func)
    * [Let's look at a real world example of where constants could be used in functions](#temp-func)
* [Scripting](#scripting)
* Python II Exercises
* [Python II Exercise Solutions](#py2-solutions)

### Common Errors
* [Common Errors](#errors)



<a id='python-programs'></a>
### Python programs are plain text files
[Table of Contents](#toc)
*   They have the `.py` extension to let everyone (including the operating system) 
    know it is a Python program.
    *   This is convention, not a requirement.
*   It's common to write them using a text editor, but we are going to use a [Jupyter Notebook](http://jupyter.org/).
*   There is a bit of extra setup, but it is well worth it because Jupyter Notebooks provide code completion
    and other helpful features such as markdown integration. This means you can take notes in this notebook while we are working throughout the session.
*   There are some pitfalls that can cause confusion if we are unaware of them. While code generally runs from top to bottom, a Jupyter Notebook allows you to run cells out of sequence. The order in which the code cells were run appears as a number to the left of each code cell.
*   Notebook files have the extension `.ipynb` to distinguish them from plain-text Python programs.

<a id='jn-editing-python'></a>
### Use the Jupyter Notebook for editing and running Python
[Table of Contents](#toc)
*   The [Anaconda package manager](http://www.anaconda.com) is an automated way to install the Jupyter notebook.
    *   See [the setup instructions]({{ site.github.url }}/setup/) for Anaconda installation 
        instructions.
*   It also installs all the extra libraries it needs to run.
*   Once you have installed Python and the Jupyter Notebook requirements, open a shell and type:

> `jupyter notebook`

*   This will start a Jupyter Notebook server and open your default web browser. 
*   The server runs locally on your machine only and does not use an internet connection.
*   The server sends messages to your browser.
*   The server does the work and the web browser renders the notebook.
*   You can type code into the browser and see the result when the web page talks to the server.
*   This has several advantages:
    - You can easily type, edit, and copy and paste blocks of code.
    - Tab completion allows you to easily access the names of things you are
      using and learn more about them.
    - It allows you to annotate your code with links, different sized text, bullets,
      etc. to make it more accessible to you and your collaborators.
    - It allows you to display figures next to the code that produces them to
      tell a complete story of the analysis.
*   **Note: Because the server runs on your own machine, code you run in a notebook can modify and delete files on your local machine.**
*   The notebook is stored as JSON but can be saved as a .py file if you would
    like to run it from the bash shell or a python interpreter.
    
<a id='how-its-stored'></a>
### How are Jupyter Notebooks Stored
[Table of Contents](#toc)
*   The notebook file is stored in a format called JSON.
*   Just like a webpage, what's saved looks different from what you see in your browser.
*   But this format allows Jupyter to mix software (in several languages) with documentation and graphics, all in one file.

<a id='need-to-know'></a>
### What you need to know for today's lesson
[Table of Contents](#toc)
**Jupyter Notebook options when running locally:**
![jn_options.png](jn_options.png)

**Jupyter Notebook options when running in Binder:**
![jn_binder_options.png](jn_binder_options.png)

*   Commands are only run when you tell them to run.  Some lessons require you to run their code in order.
*   The File menu has an option called "Revert to Checkpoint". Use it to reset your file in case you delete something by accident.
*   The Kernel menu has options to restart the interpreter and clear the output.
*   The Run button will send the code in the selected cell to the interpreter.
*   The command palette will show you the hotkeys and let you set them.
*   Saving to browser storage is the button with a cloud and downward facing arrow. Click on this button frequently
to save progress as we go.
*   Restoring from browser storage is the button with a cloud and upward facing arrow. Click on this button if you
are disconnected or Binder stops working after you have refreshed the page. This will load your previously saved work.

<a id='notebook-modes'></a>
### The Notebook has Command and Edit modes.
[Table of Contents](#toc)
*   Open a new notebook from the dropdown menu in the top right corner of the file browser page.
*   Each notebook contains one or more cells of various types.

> ## Code vs. Markdown
>
> We often use the term "code" to mean "the source code of software written in a language such as Python".
> A "code cell" in a Jupyter Notebook contains code, that is, instructions for the computer to read.
> A "markdown cell" is one that contains ordinary prose written for human beings to read.

*   If you press `esc` and `return` keys alternately, the outer border of your code cell will change from blue to green.
    *   The difference in color can be subtle, but it indicates which mode your notebook is in.
    *   <span style='color:blue'>Blue</span> is the command mode while <span style='color:green'>Green</span> is the
    edit mode.
*   If you use the "esc" key to make the surrounding box blue (enter into command mode) and then press the "H" key, a
 list of all the shortcut keys will appear.
*   When in command mode (esc/blue),
    *   The `B key` will make a new cell below the currently selected cell.
    *   The `A key` will make one above.
    *   The `X key` will delete the current cell.
*   There are lots of shortcuts you can try out and most actions can be done with the menus at the top of the page if you forget the shortcuts.
*   If you remember the `esc` and `H` shortcuts, you will be able to find all the tools you need to work in a notebook.

<a id='keyboard-mouse'></a>
### Use the keyboard and mouse to select and edit cells.
[Table of Contents](#toc)
*   Pressing the `return` key turns the surrounding box green to signal edit mode and allows you to type in the cell.
*   Because we want to be able to write many lines of code in a single cell, pressing the `return` key when the
border is green moves the cursor to the next line in the cell just like in a text editor.
*   We need some other way to tell the Notebook we want to run what's in the cell.
*   Pressing the `shift` and the `return` keys together will execute the contents of the cell.
*   Notice that the `return` and `shift` keys on the right of the keyboard are right next to each other.

<a id='prac-jupyter'></a>
### Practice: Running Jupyter Notebook Cells
[Table of Contents](#toc)

In [None]:
# Find the shortcut in the command palette and run this cell.
message = "run me first"

If you ran the above cell correctly, there should be a number **1** inside the square brackets to the left of the cell. **Note:** the number increases every time you run a cell.

In [None]:
# Run this cell and see what the output is. 
print(message)

**If the output beneath the cell looks like this:**
```
run me first
```
Then you have run the cells in the correct order and received the expected output. Why did we get this output?

**If the output beneath the cell looks like this:**
```
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-a4525a899574> in <module>
      1 # Run this cell and see what the output is.
----> 2 print(message)

NameError: name 'message' is not defined
```

Then you have received an error. Read the error message to see what went wrong. Here we have a `NameError` because the computer does not know what the variable `message` is. We need to go back to the first code cell and run it first to define the variable `message`. Then we should be able to run the second code cell and receive the first output (the string we assigned to the variable `message`).

**Getting Error Messages**:
Error messages are commonplace for anyone writing code. You should expect to get them frequently and learn how to
interpret them as best you can. Some languages give more descriptive error messages than others, but either way
you are likely to find the answer with a quick Google search.

## Using Markdown
<a id='markdown'></a>
### The Notebook will turn Markdown into pretty-printed documentation.
[Table of Contents](#toc)
*   Notebooks can also render [Markdown][markdown].
    *   A simple plain-text format for writing lists, links and other things that might go into a web page.
    *   Equivalently, a subset of HTML that looks like what you would send in an old-fashioned email.
*   Turn the current cell into a Markdown cell by entering command mode (esc/blue) and pressing the `M key`.
*   `In [ ]:` will disappear to show it is no longer a code cell, and you will be able to write in Markdown.
*   Turn the current cell into a Code cell by entering command mode (esc/blue) and pressing the `Y key`.

<a id='how-to-markdown'></a>
### How to use Markdown
[Table of Contents](#toc)
<div class="row">
  <div class="col-md-6" markdown="1">

**The asterisk is a special character in markdown. It will create a bulleted list.**
```markdown
Markdown syntax to produce output below.

*   Use asterisks
*   to create
*   bullet lists.
```

*   Use asterisks   
*   to create  
*   bullet lists.  

**But what happens when I want to use an asterisk in my text? We can use another special character, the backslash `\`, also known as an escape character. Place the backslash directly before any Markdown special character (no space in between) to display that character in your text.**
```markdown
Markdown syntax to produce output below.

\*   Use asterisks
\*   to create
\*   bullet lists.
```

\*   Use asterisks  
\*   to create  
\*   bullet lists.

Note: Escape characters can change depending on the language you are writing in.

**You can use numbers to create a numbered list:**
```markdown
Markdown syntax to produce numbered lists.
1.  Use numbers
1.  to create
1.  numbered lists.
```

1.  Use numbers
1.  to create
1.  numbered lists.

Note that we did not have to type the numbers in order, but Markdown still numbered the output correctly. This saves
time when we modify or edit lists later, because we do not have to renumber the entire list.

**Use different heading levels to keep the document's structure consistent:**
```markdown
Markdown syntax to produce headings.

# A Level-1 Heading
## A Level-2 Heading
### A Level-3 Heading
```

Rendered version of the three lines of Markdown code above:
# A Level-1 Heading
## A Level-2 Heading
### A Level-3 Heading

**Line breaks don't matter. But blank lines create new paragraphs.**
```markdown
**Markdown syntax:**

Line breaks
do not matter. _(accomplished by pressing the return key once)_

Sometimes, though, we want to include a line break without starting a new paragraph. We can accomplish this by including two spaces at the end of the line.

Here is the first line.  
This sentence starts on a new line but stays in the same paragraph (no blank line).
```

**Rendered version of the Markdown code above:**

Line breaks
do not matter. _(accomplished by pressing the return key once)_

Sometimes, though, we want to include a line break without starting a new paragraph. We can accomplish this by including two spaces at the end of the line.

Here is the first line.  
This sentence starts on a new line but stays in the same paragraph (no blank line).

**Creating links in markdown:**

The information inside the `[...]` is what the user will see and the information inside the `(...)` is the pointer or url that the link will take the user to.
```markdown
**Markdown Syntax:**

[Create links](http://software-carpentry.org) with the following syntax `[...](...)`.
Or use [named links][data_carpentry].

_Notice the line below only defines the link and is not in printed output. Double click on the cell below this one if you don't believe me._
[data_carpentry]: http://datacarpentry.org 
```

**Output of markdown syntax:**

[Create links](http://software-carpentry.org) with `[...](...)`.  
Or use [named links][data_carpentry].

[data_carpentry]: http://datacarpentry.org

<a id='md-exercises'></a>
## Markdown Exercises
[Table of Contents](#toc)

### Creating Lists in Markdown
<a id='md-exercises-p01'></a>
**Problem 1: Creating Lists** Create a nested list in a Markdown cell in a notebook that looks like this:

1. Get funding.
1. Do work.
    *   Design experiment.
    *   Collect data.
    *   Analyze.
1. Write up.
1. Publish.

**Hint:** _Double click this cell to see the answer._

[Solution](#md-solutions-p01)

<a id='md-exercises-p02'></a>
### Math anyone?
**Problem 2: Math in Python** What is displayed when a Python cell in a notebook that contains a single calculation is executed? For example, what happens when this cell is executed?

In [1]:
7 * 3

21

What is displayed when a Python cell in a notebook that contains several calculations is executed? For example, what happens when this cell is executed?

In [None]:
7 * 3

2 + 1

6 * 7 + 12

[Solution](#md-solutions-p02)

<a id='md-exercises-p03'></a>
**Problem 3: Math in markdown** Change an Existing Cell from Code to Markdown

What happens if you write some Python in a code cell and then you switch it to a Markdown cell? For example, put the following in a code cell.

1. Run the cell below with `shift + return` to be sure that it works as a code cell. _Hint: it should give you the
same result as **Problem 2**_.
1. Select the cell below and use `escape + M` to switch the cell to Markdown and run it again with `shift + return`. What happened and how might this be useful?

In [None]:
7 * 3  
2 + 1

x = 6 * 7 + 12
print(x)

Print statements can help us find errors or unexpected results in our code. They allow us to check our assumptions:
has the computer stored what we think it has?

This could also be useful if you wanted to show what the code generating your document looks like. Think code reviews,
colleagues, advisors, etc.

[Solution](#md-solutions-p03)

<a id='md-exercises-p04'></a>
**Problem 4:** Equations
    
Standard Markdown (such as we’re using for these notes) won’t render equations, but the Notebook will. 

`$\Sigma_{i=1}^{N} 2^{-i} \approx 1$`


Think about the following questions:
1. What will it display?
1. What do you think the underscore `_` does?
1. What do you think the circumflex `^` does?
1. What do you think the dollar sign `$` does?

Change the Code cell below containing the equation to a Markdown cell and run it.

In [None]:
$\Sigma_{i=1}^{N} 2^{-i} \approx 1$

**Note:** If you received a <span style='color:red'> SyntaxError</span>, then you need to change the cell to a Markdown
cell and rerun.

[Solution](#md-solutions-p04)

<a id='md-solutions'></a>
## Markdown Exercise Solutions
[Table of Contents](#toc)

<a id='md-solutions-p01'></a>
### Problem 1: Creating Lists
This challenge combines a numbered list and a bullet list. Note that the bullet list is indented by four spaces to create the nesting under "Do work."

**Type the following in your Markdown cell:**
```markdown
1. Get funding.
1. Do work.
    * Design experiment.
    * Collect data.
    * Analyze.
1. Write up.
1. Publish.
```
[Back to Problem](#md-exercises-p01)

<a id='md-solutions-p02'></a>
### Problem 2: Math in python
Only `54` is displayed. When a code cell finishes, Jupyter automatically displays the value of the **last** expression in the cell, and `6 * 7 + 12` evaluates to 54 (6 multiplied by 7 is 42, and 42 plus 12 is 54). The earlier calculations, `7 * 3` and `2 + 1`, are still computed, but their results are not shown because they are not the last line and we did not ask the computer to print them.
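If you want to see the result of every calculation in the cell, one option is to print each one explicitly. A minimal sketch using the same three calculations:

```python
# Jupyter only displays the value of the last expression in a cell,
# so wrap each calculation in print() to see every result
print(7 * 3)        # 21
print(2 + 1)        # 3
print(6 * 7 + 12)   # 54
```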

[Back to Problem](#md-exercises-p02)

<a id='md-solutions-p03'></a>
### Problem 3: Math in markdown
In step 1, the output of running the code cell is 54 because 6 multiplied by 7 is 42 and 42 plus 12 equals 54. This
result was stored in a variable called `x`, and the last line executed was `print(x)`, which prints the current
value of `x`. The other calculations, `7 * 3` and `2 + 1`, were still evaluated, but their results were not displayed
because they were neither the last expression in the cell nor printed.

In step 2, the Python code is treated as Markdown text, so it is displayed rather than executed. Lines that are not separated by a blank line run together into a paragraph.
This can be useful for temporarily turning cells on and off in notebooks that are used for multiple purposes. It is also useful when you want to show the code you have written rather than the output of running it.
```markdown
7*3
2+1
x = 6 * 7 + 12 
print(x)
```

[Back to Problem](#md-exercises-p03)

<a id='md-solutions-p04'></a>
### Problem 4: Equations
`$\Sigma_{i=1}^{N} 2^{-i} \approx 1$`

$\Sigma_{i=1}^{N} 2^{-i} \approx 1$

The notebook renders the equation from LaTeX syntax. The dollar signs, `$`, tell Markdown that the text between them is a LaTeX equation. If you are not familiar with LaTeX, the underscore, `_`, is used for subscripts and the circumflex, `^`, is used for superscripts. A pair of curly braces, `{` and `}`, groups text together so that `i=1` becomes the subscript and `N` becomes the superscript. Similarly, `-i` is in curly braces to make the whole expression the superscript of `2`. `\Sigma` is the LaTeX command for the capital Greek letter sigma, used here as a “sum over” symbol (`\sum` is LaTeX's dedicated summation operator), and `\approx` produces the “approximately equal to” symbol.

[anaconda]: https://docs.continuum.io/anaconda/install

[markdown]: https://en.wikipedia.org/wiki/Markdown

**A common error is to forget to run the cell as Markdown.** The Python interpreter does not know what to do with the `$`. Syntax errors generally mean that something was typed incorrectly (check for typos before assuming the line of code is wrong altogether).

```
  File "<ipython-input-1-a80a20b3c603>", line 1
    $\Sigma_{i=1}^{N} 2^{-i} \approx 1$
    ^
SyntaxError: invalid syntax
```

[Back to Problem](#md-exercises-p04)

<a id='python-1'></a>
# Intro to Python I: Data
[Table of Contents](#toc)


**Prerequisites:** None

This workshop will help researchers with no prior programming experience learn how to use Python to analyze research data. You will learn how to open data files in Python, complete basic data manipulation tasks, and save your work without compromising the original data. Researchers often need to do the same task with different data, and you will gain basic experience in how Python can help you make more efficient use of your time.

**Learning Objectives:**
1. Clean/manipulate data
1. Automate repetitive tasks

**Learning Outcomes:** you will be able to…
1. read data into a Pandas DataFrame
1. use Pandas to manipulate data
1. save work to a data file usable in other programs the researcher needs
1. write if/else statements
1. build for and while loops

<a id='python-introduction'></a>
## Programming with Python
[Table of Contents](#toc)
### What is Python and why would I use it?

A programming language is a way of writing commands so that an interpreter or compiler can turn them into machine instructions. Python is just one of many different programming languages.

Even if you do not use Python in your work, you can use Python to learn programming fundamentals that apply across languages.

**We like using Python in workshops for lots of reasons:**
* Widely used in science
* It's easy to read and write
* Huge supporting community - lots of ways to learn and get help  
* Jupyter Notebooks. Not many languages have this kind of tool (the name comes from Julia, Python, and R).

<a id='python-sp-char'></a>
### Special Characters
[Table of Contents](#toc)

We have already worked with special characters in Markdown. Similarly, Python uses certain special characters as part of its syntax. **Note:** special characters are not consistent across languages, so make sure you familiarize yourself with the special characters in the languages you write code in.

**Python Special Characters:**

* `[` : left `square bracket`
* `]` : right `square bracket`
* `(` : left `paren` (parentheses)
* `)` : right `paren`
* `{` : left `curly brace`
* `}` : right `curly brace`
* `<` : left `angle bracket`
* `>` : right `angle bracket`
* `-` : `dash` (not a hyphen; it is called a minus sign when used in an equation or formula)
* `"` : `double quote`
* `'` : `single quote` (apostrophe)

<a id='variables'></a>
### Variables
[Table of Contents](#toc)

Variables are used to store information in the computer that can later be referenced, manipulated and/or used by our programs. Important things to remember about variables include:
* We store values inside variables.  
* We can refer to variables in other parts of our programs.
* In Python, the variable is created when a value is assigned to it.
    * Values are assigned to variable names using the equals sign `=`.  
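* A variable can hold:  
    * Basic data types. For descriptions and details [(See Data Types)](#data-types) 
    * More complex objects that structure data and code, such as lists and dictionaries. (Strictly speaking, every value in Python, including the basic data types, is an object.)
* Variable naming rules:
    * Names can contain letters, digits, and underscores (`_`)
    * Cannot start with a digit
    * Cannot contain spaces, quotation marks, or other punctuation
    * Names are case-sensitive: `age` and `Age` are different variables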
    * Using a descriptive name can make the code easier to read **(You will thank yourself later)**
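The naming rules above can be checked directly in Python. A small sketch, assuming the built-in string method `str.isidentifier()` and the standard-library `keyword` module (neither is used elsewhere in this lesson):

```python
import keyword

# str.isidentifier() reports whether a string follows Python's naming rules
print("first_name".isidentifier())   # True: letters and an underscore
print("name2".isidentifier())        # True: digits are allowed after the first character
print("2nd_place".isidentifier())    # False: starts with a digit
print("first name".isidentifier())   # False: contains a space

# Reserved words such as "for" follow the rules but cannot be used as variable names
print(keyword.iskeyword("for"))      # True
```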
    
<a id='prac-variable'></a>
### Practice
[Table of Contents](#toc)

In [None]:
# What is happening in this Python code cell?
age = 34
first_name = 'Drake'

In the cell above, Python assigns an age (in this example 34) to a variable `age` and a name (Drake) in quotation marks to a variable `first_name`.

If you want to see the stored value of a variable in Python, you can display it with the print command
`print()`, placing the variable name inside the parentheses.

In [None]:
# what is the current value stored in the variable `age`
print(age)

**Write a print statement to show the value of variable `first_name` in the code cell below.**

In [None]:
# Print out the current value stored in the variable `first_name`


<a id='prob-variable'></a>
### Problem 5: Assigning variables and printing values
[Table of Contents](#toc)

1. Create two new variables called `age` and `first_name` with your own age and name
2. Print each variable to display its value


**Extra Credit:** Combine values in a single print command by separating them with commas

In [None]:
# Replace each <insert variable here> placeholder with one of your variables, then run the cell
print(<insert variable here>, 'is', <insert variable here>, 'years old')

The `print` command automatically puts a single space between items to separate them and starts a new line after the last item.
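Both of those defaults can be changed. A short sketch using `print()`'s optional `sep` (separator) and `end` keyword arguments; the values passed here are only illustrations:

```python
# By default, print() separates items with one space and ends with a newline
print('Drake', 'is', 34, 'years old')   # Drake is 34 years old

# sep replaces the space that goes between items
print('a', 'b', 'c', sep='-')           # a-b-c

# end replaces the newline, so the next print continues on the same line;
# these two calls together print: No newline here. Still on the same line.
print('No newline here.', end=' ')
print('Still on the same line.')
```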

<a id='variable-calc'></a>
### Variables can be used in calculations.
[Table of Contents](#toc)


*   We can use variables in calculations just as if they were values.
    *   Remember, we assigned **our own age** to `age` a few lines ago.

In [None]:
age = age + 3
print('My age in three years:', age)

* `age` now holds **our current age + 3 years**. (Each time you rerun the cell above, another 3 is added.)
* We can also add strings together, but it works a bit differently. When you add strings together it is called **concatenating**.

In [None]:
name = "Sonoran"
full_name = name + " Desert"
print(full_name)

* Notice how I included a space in the quotes before "Desert". If we hadn't, we would have had "SonoranDesert"
* Can we subtract, multiply, or divide strings?
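One way to answer that question is to try it and see what Python does. A short sketch (the example strings are arbitrary):

```python
# Multiplying a string by an integer repeats it
laugh = "ha" * 3
print(laugh)                       # hahaha

# Subtraction (and division) are not defined for strings, so Python raises a TypeError
try:
    print("Sonoran Desert" - " Desert")
except TypeError as error:
    print("TypeError:", error)
```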

<a id='py-concatenate'></a>
### Problem 6: Printing your first and last name
[Table of Contents](#toc)

In the code cell below, create a new variable called `last_name` with your own last name.
Create a second new variable called `full_name` that combines your first and last names, with a space between them.



In [None]:
# Print full name

<a id='data-types'></a>
### Data Types
[Table of Contents](#toc)

**Some data types you will find in almost every language include:**

| Data Type| Abbreviation | Type of Information | Examples |
| :-| :-| :-| :-|
| Strings  | str | characters, words, sentences or paragraphs| 'a' 'b' 'c' 'abc' '0' '3' ';' '?'|
| Integers | int | whole numbers | 1 2 3 100 10000 -100 |
| Floating point or Float | float | decimals | 10.0 56.9 -3.765 |
| Booleans | bool | logical (true/false) values | True, False |

<a id='strings'></a>
### Strings
[Table of Contents](#toc)

One or more characters strung together and enclosed in quotes (single or double): "Hello World!"

In [None]:
greeting = "Hello World!"
print("The greeting is:", greeting)

In [None]:
greeting = 'Hello World!'
print('The greeting is:', greeting)

#### Need to use single quotes in your string?
Use double quotes to make your string.

In [None]:
greeting = "Hello 'World'!"
print("The greeting is:", greeting)

#### Need to use both?

In [None]:
greeting1 = "'Hello'"
greeting2 = '"World"!'
print("The greeting is:", greeting1, greeting2)
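If you need both kinds of quote inside a single string, Python offers other options. A sketch of two of them: escaping with a backslash (similar to the Markdown escape character) and triple quotes:

```python
# A backslash tells Python that the next quote is part of the text,
# not the end of the string
greeting = "\"Hello\" 'World'!"
print("The greeting is:", greeting)   # The greeting is: "Hello" 'World'!

# Triple quotes can also hold both kinds of quote without escaping
greeting_triple = """"Hello" 'World'!"""
print(greeting == greeting_triple)    # True
```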

#### Concatenation

In [None]:
bear = "wild"
down = "cats"
print(bear + down)

Why aren't `greeting`, `greeting1`, `greeting2`, `bear`, or `down` enclosed in quotes in the statements above?

<a id='prac-strings'></a>
### Practice: Strings
[Table of Contents](#toc)

#### Use an index to get a single character from a string.
 * The characters (individual letters, numbers, and so on) in a string are ordered.
 * For example, the string ‘AB’ is not the same as ‘BA’. Because of this ordering, we can treat the string as a list of characters.
 * Each position in the string (first, second, etc.) is given a number. This number is called an index or sometimes a subscript.
 * Indices are numbered from 0.  
 * Use the position’s index in square brackets to get the character at that position.

In [None]:
#  String :        h e l i u m  
#  Index Location: 0 1 2 3 4 5

atom_name = 'helium'
print(atom_name[0], atom_name[3])
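Python also accepts negative indices, which count backwards from the end of the string, and it raises an error if you ask for a position that does not exist. A small sketch with the same string:

```python
atom_name = 'helium'

# Negative indices count backwards from the end: -1 is the last character
print(atom_name[-1], atom_name[-2])   # m u

# 'helium' has positions 0 to 5, so index 6 is past the end and raises an IndexError
try:
    print(atom_name[6])
except IndexError as error:
    print("IndexError:", error)
```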

<a id='numbers'></a>
### Numbers
[Table of Contents](#toc)

* Numbers are stored as numbers (no quotes) and are either integers (whole) or real numbers (decimal).  
* In programming, numbers with decimal precision are called floating-point, or float.
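* Floats are stored with limited precision, so some decimal values are only approximations; use integers when you only need whole numbers.
* Python integers can be as large as your computer's memory allows, and a calculation that mixes an integer with a float produces a float.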

In [None]:
my_integer = 10
my_float = 10.99998
my_value = my_integer

print("My numeric value:", my_value)
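A few quick checks illustrate how integers and floats behave in Python. This is a sketch; `mixed_result` and `big_number` are illustrative names, and `**` is Python's "to the power of" operator:

```python
my_integer = 10

# Mixing an integer with a float in a calculation produces a float
mixed_result = my_integer + 0.5
print(mixed_result)        # 10.5

# Python integers have no fixed size limit (** means "to the power of")
big_number = 2 ** 100
print(big_number)          # 1267650600228229401496703205376

# Floats have limited precision, so some decimal results are only approximate
print(0.1 + 0.2)           # 0.30000000000000004
print(0.1 + 0.2 == 0.3)    # False
```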

<a id='py-type'></a>
### Using Python built-in type() function
[Table of Contents](#toc)


If you are not sure what type a variable has, you can call the built-in Python function `type()` in the same way you used the `print()` function.
Python is an object-oriented language, so every value has a type. Common built-in types include **str, int, float, bool, list, and tuple.** We will cover [lists](#py-lists) and [tuples](#py-tuples) later.

In [None]:
print("Type:", type(age))
print("Type:", type(first_name))

In [None]:
# Print out datatype of variables
print("my_value Type:", type(my_value))
print("my_float Type:", type(my_float))
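`type()` also works on values written directly in the code, which is a quick way to check the data types table above. A minimal sketch:

```python
# type() works on values directly, not just on variables
print(type('abc'))     # <class 'str'>
print(type(100))       # <class 'int'>
print(type(10.0))      # <class 'float'> (a decimal point makes it a float, even for a whole number)
print(type(True))      # <class 'bool'>
```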

<a id='booleans'></a>
### Boolean
[Table of Contents](#toc)

* Boolean values are binary, meaning they can only be either true or false.
* In Python, `True` and `False` (capitalized, no quotes) are the two boolean values.

In [None]:
is_true = True
is_false = False

print("My true boolean variable:", is_true)
print("Type:", type(is_false))
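Booleans usually come from comparisons rather than being typed in directly. A short sketch (`is_bigger` is an illustrative name):

```python
# A comparison produces a boolean value
is_bigger = 5 > 3
print(is_bigger)         # True
print(type(is_bigger))   # <class 'bool'>

# == tests whether two values are equal (a single = assigns a value)
print(2 + 2 == 5)        # False
```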

<a id='py-data-type'></a>
### Problem 7: What variable type do I have? 
[Table of Contents](#toc)

`size = '1024'`  
What data type is `size`? Use some of the python you have learned to provide proof of your answer.
<ol style="list-style-type:lower-alpha">
  <li>float</li>
  <li>string</li>
  <li>integer</li>
  <li>boolean</li>
</ol>

In [None]:
# Write your explanation as a comment and write the python code that outputs support for your answer.


<a id='py-data-structures'></a>
## Data Structures
[Table of Contents](#toc)

Python has many objects that can be used to structure data including:

| Object | Description | Mutability |
| :- | :- | :- |
| List | collections of values held together in square brackets | Mutable |
| Tuple | collection of grouped values held together in parentheses | Immutable |
| Set | collections of unique values held together in curly braces | Mutable |
| Dictionary | collections of keys & values held together in curly braces | Mutable |
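The bracket styles in the table above can be seen side by side. A sketch with illustrative variable names; note the common surprise that empty curly braces create a dictionary, not a set:

```python
my_list = [1, 2, 2, 3]         # list: square brackets, keeps duplicates and order
my_tuple = (1, 2, 2, 3)        # tuple: parentheses
my_set = {1, 2, 2, 3}          # set: curly braces, duplicate values are dropped
my_dict = {'a': 1, 'b': 2}     # dictionary: curly braces holding key: value pairs

print(my_list)
print(my_tuple)
print(my_set)
print(my_dict)

# Empty curly braces create a dictionary; use set() for an empty set
print(type({}))                # <class 'dict'>
print(type(set()))             # <class 'set'>
```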

<a id='py-lists'></a>
### Lists
[Table of Contents](#toc)

Lists are collections of values held together in square brackets `[]`:

In [None]:
list_of_characters  = ['a', 'b', 'c'] 
print(list_of_characters)

<a id='prob-lists'></a>
### Problem 8: Creating and Working with Lists
[Table of Contents](#toc)

1. Create a new list called `list_of_numbers` with four numbers in it.

In [None]:
# Print out the list of numbers you created


* Just like strings, we can access any value in the list by its position in the list.
* **IMPORTANT:** Indexes start at 0
    ~~~
    list:          ['a', 'b', 'c', 'd']
    index location:  0    1    2    3
    ~~~
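A short sketch of the diagram above in code, using the same four letters. Negative indexes, which count back from the end of the list, are an extra shown here for illustration:

```python
letters = ['a', 'b', 'c', 'd']

print(letters[0])   # a: the first item is at index 0
print(letters[3])   # d: the last of four items is at index 3
print(letters[-1])  # d: negative indexes count back from the end

# Asking for an index past the end raises an error
try:
    letters[4]
except IndexError as error:
    print("IndexError:", error)  # IndexError: list index out of range
```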

In [None]:
# Print out the second value in the list list_of_numbers


2. Once you have created a list, you can add more items to it with the `append()` method.

In [None]:
# Append a number to your list_of_numbers


#### Aside: Sizes of data structures

To find out how many values (entries, elements, etc.) a Python data structure holds, use the `len()` function:

In [None]:
len(list_of_numbers)

Note that you cannot compute the length of a numeric variable:

In [None]:
len(age)

This will give an error: `TypeError: object of type 'int' has no len()`

However, `len()` can compute the lengths of strings

In [None]:
# Get the length of the string
print(len('this is a sentence'))

In [None]:
# You can also get the lengths of strings in a list
list_of_strings = ["Python is Awesome!", "Look! I'm programming.", "E = mc^2"]

# This will get the length of "Look! I'm programming."
print(len(list_of_strings[1]))
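`len()` works the same way on the other data structures from the table above (tuples, sets and dictionaries, introduced in the next sections). A small sketch with made-up values:

```python
# len() counts the items in any collection
coordinates = (3, 4)
continents = {'Africa', 'Europe', 'Africa'}  # duplicates are dropped in a set
prices = {"apple": 0.10, "banana": 0.50}     # a dictionary counts its keys

print(len(coordinates))  # 2
print(len(continents))   # 2
print(len(prices))       # 2
print(len(""))           # 0: an empty string has no characters
```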

<a id='py-tuples'></a>
### Tuples
[Table of Contents](#toc)

Tuples are like lists, but they **cannot be changed once created (immutable).**

Tuples can be used to represent any collection of data. They work well for things like coordinates.

In [None]:
tuple_of_x_y_coordinates = (3, 4)
print (tuple_of_x_y_coordinates)

Tuples can have any number of values

In [None]:
coordinates = (1, 7, 38, 9, 0)
print (coordinates)

icecream_flavors = ("strawberry", "vanilla", "chocolate")
print (icecream_flavors)

... and any types of values.
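For example, a single tuple can hold a mix of types. The values below are made up for illustration:

```python
# A tuple can mix strings, integers, floats, Booleans and even lists
mixed = ("Brazil", 2007, 3.5, True, [1, 2])

for item in mixed:
    # type(item).__name__ gives the short name of each item's type
    print(item, type(item).__name__)
```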

Once a tuple is created, you **cannot add more items to it** (but you can add items to a list). If we try to append, like we did with lists, we get an error: `AttributeError: 'tuple' object has no attribute 'append'`

In [None]:
icecream_flavors.append('bubblegum')
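To get a tuple with an extra item, you can build a new tuple from the old one. A minimal sketch; the `try`/`except` only catches the error so the rest of the cell still runs:

```python
icecream_flavors = ("strawberry", "vanilla", "chocolate")

try:
    icecream_flavors.append('bubblegum')
except AttributeError as error:
    # Tuples have no append() method because they cannot change
    print("Error:", error)

# Instead, build a NEW tuple that includes the extra item.
# Note the trailing comma: ('bubblegum',) is a one-item tuple.
more_flavors = icecream_flavors + ('bubblegum',)
print(more_flavors)
print(icecream_flavors)  # the original tuple is unchanged
```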

<a id='lists-vs-tuples'></a>
### The Difference Between Lists and Tuples
[Table of Contents](#toc)

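Lists are good for manipulating data sets: it's easy to add, remove and sort their items. Tuples cannot change once they are created, which makes them a safe choice for values that belong together and should stay fixed, like coordinates. To see how this relates to memory, the figure below compares two general ways a computer can store a sequence: an **array**, which keeps every item in one continuous block of memory, and a **linked list**, where each item stores the address of the next one.
![array_vs_list.png](array_vs_list.png)
Let's say you want to get to the last item. An array can calculate the item's location directly:

$$\text{address} = \text{original address} + \text{index of the item} \times \text{size of each item}$$

This is why zero indexing is natural: the first item (index 0) sits exactly at the original address. The computer does the calculation and jumps straight to the item, while a linked list has to step through every item before it. In Python, **lists and tuples are both stored as arrays** (of references to their items), so looking up an item by its index is equally fast for both.

Now let's say you want to remove the third item. In a linked list, you only need to make the second item point to the fourth; in an array, every item after the removed one has to shift over by one position. A Python tuple cannot be changed at all, so "removing" an item means building a new tuple without it, and you have to do that yourself. A Python list handles the shifting for you: removing an item is as easy as calling a method such as `.pop()` or `.remove()` on the list object.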
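A small sketch of this difference in practice, using made-up numbers: the list removes an item in place, while the tuple has to be rebuilt from slices that skip the item.

```python
numbers_list = [10, 20, 30, 40]
numbers_tuple = (10, 20, 30, 40)

# A list can remove an item in place; the items after it shift down one position
removed = numbers_list.pop(2)  # removes the third item (index 2)
print(removed, numbers_list)   # 30 [10, 20, 40]

# A tuple cannot be changed, so build a new tuple from slices that skip index 2
new_tuple = numbers_tuple[:2] + numbers_tuple[3:]
print(new_tuple)               # (10, 20, 40)

# Looking up an item by index works the same way for both
print(numbers_list[-1], numbers_tuple[-1])  # 40 40
```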

<a id='py-sets'></a>
### Sets
[Table of Contents](#toc)

Sets are similar to lists and tuples, but they can only contain unique values, they do not keep their items in any particular order, and they are held in curly braces `{}`.


For example, a list can contain the same value more than once:

In [None]:
# In the gapminder data that we will use, we will have data entries for the continents
# of each country in the dataset
my_list = ['Africa', 'Europe', 'North America', 'Africa', 'Europe', 'North America']
print("my_list is", my_list)

# A set would only allow for unique values to be held
my_set = {'Africa', 'Europe', 'North America', 'Africa', 'Europe', 'North America'}
print("my_set is", my_set)

Similar to appending to a list, you can add an item to a set using the `add()` method:

In [None]:
# Add a continent that is not in the set yet
my_set.add('Asia')

# Now let's try to add one that is already in the set:
my_set.add('Europe')

What will the print statements show now in the code cell below?

In [None]:
print("my_list is", my_list)
print("my_set is", my_set)
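Because sets keep only unique values, converting a list to a set is a quick way to find the distinct values in it. A sketch reusing the continents list above; since sets have no fixed order, the result is sorted before printing:

```python
my_list = ['Africa', 'Europe', 'North America', 'Africa', 'Europe', 'North America']

# Converting a list to a set keeps one copy of each value
unique_continents = set(my_list)
print(len(my_list), len(unique_continents))  # 6 3

# Sort the set if you need a predictable order
print(sorted(unique_continents))  # ['Africa', 'Europe', 'North America']
```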

<a id='py-dictionaries'></a>
### Dictionaries
[Table of Contents](#toc)

* Dictionaries are collections of things that you can look up, like in a real dictionary.
* Dictionaries are organized into key and value pairs, separated by commas (like lists) and surrounded by curly braces.
   * E.g. {key1: value1, key2: value2}
   * We call each association a "key-value pair".  
   


In [None]:
dictionary_of_definitions = {"aardvark" : "The aardvark is a medium-sized, burrowing, nocturnal mammal native to Africa.",
                             "boat" : "A boat is a thing that floats on water"}

We can find the definition of aardvark by giving the dictionary the "key" to the definition we want, in square brackets.

In this case, the key is the word we want to look up.

In [None]:
print("The definition of aardvark is:", dictionary_of_definitions["aardvark"]) 

In [None]:
# Print out the definition of a boat


Just like lists and sets, you can add to a dictionary. To do so, assign a value to a new key in square brackets:

In [None]:
dictionary_of_definitions['ocean'] = "An ocean is a very large expanse of sea, in particular each of the main areas into which the sea is divided geographically."
print(dictionary_of_definitions)
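Keys in a dictionary are unique, so assigning to a key that already exists replaces its value rather than adding a second entry. A short sketch with a made-up `prices` dictionary, which also shows the `KeyError` you get when you look up a key that is not there:

```python
prices = {"apple": 0.10, "banana": 0.50}

# Keys are unique: assigning to an existing key replaces its value
prices["apple"] = 0.25
print(prices)             # {'apple': 0.25, 'banana': 0.5}

# Use `in` to check whether a key exists before looking it up
print("mango" in prices)  # False

try:
    prices["mango"]
except KeyError as error:
    print("KeyError:", error)  # KeyError: 'mango'
```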

<a id='prob-dictionaries'></a>
### Problem 9: Creating and Accessing Dictionaries
[Table of Contents](#toc)

1. Create a dictionary called `zoo` with at least three animal types with a different count for each animal.
1. `print` out the count of the second animal in your dictionary 


In [None]:
# Zoo Dictionary


<a id='py-statements'></a>
## Statements
[Table of Contents](#toc)


OK great.  Now what can we do with all of this?  

We can plug everything together with a bit of logic in the Python language and make a program that can do things like:

* process data

* parse files

* analyze data

What kind of logic are we talking about?

We are talking about something called a "logical structure", which starts at the top (first line) and reads down the page in order.

In Python, a logical structure is often composed of statements. Control-flow statements are powerful tools that decide which lines of your script run, and how many times. There are two main types:

* conditionals (if, elif, else)
* loops (for, while)


<a id='py-conditionals'></a>
### Conditionals
[Table of Contents](#toc)

Conditionals are how we make a decision in the program.
In Python, conditional statements are called if/else statements.

* If statements use Boolean values to define the flow of the program.
    * If something is True, do this. Else, do that.
    * (`while` loops, covered below, use the same kind of True/False test to decide whether to keep repeating.)
    
**Building if/else statements in Python:**
1. Start first line with `if`  
1. Then `some-condition` must be a logical test that can be evaluated as True or False
1. End the first line with `:`  
1. Indent the next line(s) with `tab` or `4 spaces` (Jupyter does the indent automatically!)
    1.  `do-things`: give python commands to execute
1. Start the next clause with `else` and `:` (notice that `if` and `else` are at the same indent level)  
1. Indent the next line(s) with `tab` or `4 spaces` (Jupyter does the indent automatically!)
    1. `do-different-things`: give python commands to execute

### Comparison operators:

`==` equality  
`!=` not equal  
`>` greater than   
`>=` greater than or equal to  
`<` less than  
`<=` less than or equal to  
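A quick sketch of each comparison operator applied to two arbitrary numbers; every comparison evaluates to `True` or `False`:

```python
a = 2
b = 3

print(a == b)  # False
print(a != b)  # True
print(a > b)   # False
print(a >= 2)  # True: >= also counts equal values
print(a < b)   # True
print(a <= 1)  # False
```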

In [None]:
weight = 3.56

if weight >= 2:
    print(weight,'is greater than or equal to 2')
else:
    print(weight,'is less than 2')

### Membership operators:

`in` check to see if data is **present** in some collection  
`not in` check to see if data is **absent** from some collection

In [None]:
groceries=['bread', 'tomato', 'hot sauce', 'cheese']

if 'basil' in groceries: 
    print('Will buy basil')
else:
    print("Don't need basil")
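The same test can be written the other way around with `not in`. A sketch reusing the grocery list; it also shows that `in` on a dictionary checks its keys, using a made-up `zoo_counts` dictionary (named so it does not overwrite your own `zoo`):

```python
groceries = ['bread', 'tomato', 'hot sauce', 'cheese']

# `not in` is True when the item is missing from the collection
if 'basil' not in groceries:
    groceries.append('basil')
print(groceries)

# With a dictionary, `in` checks the keys, not the values
zoo_counts = {'bears': 25, 'lions': 19}
print('bears' in zoo_counts)  # True
print(25 in zoo_counts)       # False: 25 is a value, not a key
```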

In [None]:
# this is the variable that holds the current condition of it_is_daytime 
# which is True or False 
it_is_daytime = True 

# if/else statement that evaluates current value of it_is_daytime variable
if it_is_daytime:
    print ("Have a nice day.")
else:
    print ("Have a nice night.")
    
# before running this cell, predict:
# what will happen while it_is_daytime is True?
# what will happen if we change it_is_daytime to False?

* Often if/else statements use a comparison between two values to determine True or False.
* These comparisons use "comparison operators" such as ==, >, and <.
* \>= and <= can be used if you need the comparison to be inclusive.
* **NOTE**: Two equal signs `==` are used to compare values, while one equals sign `=` is used to assign a value
    * For example:
        * `1 > 2` is `False`
        * `2 > 2` is `False`
        * `2 >= 2` is `True`
        * `'abc' == 'abc'` is `True`

In [None]:
user_name = "Ben"

if user_name == "Marnee":
    print ("Marnee likes to program in Python.")
else:
    print ("We do not know who you are.")

* What if a decision has more than two possible outcomes?
* Python if statements let you test extra conditions with `elif`. Each test still evaluates to True or False, and Python runs the block under the first test that is True.
* `elif` stands for "else if"
 

In [None]:
if user_name == "Marnee":
    print ("Marnee likes to program in Python.")
elif user_name == "Ben":
    print ("Ben likes maps.")
elif user_name == "Brian":
    print ("Brian likes plant genomes")
else:
    print ("We do not know who you are")
    
# for each possible user_name, an if or elif statement checks the 
# value of the name and prints a message accordingly.

What does the following code print?

    my_num = 42
    my_num = 8 + my_num
    new_num = my_num / 2
    if new_num >= 30:
        print("Greater than thirty")
    elif my_num == 25:
        print("Equals 25")
    elif new_num <= 30:
        print("Less than thirty")
    else:
        print("Unknown")

<a id='prob-if-else'></a>
### Problem 10: Writing Conditional If/Else Statements
[Table of Contents](#toc)

Check to see if you have more than three entries in the `zoo` dictionary you created earlier. If you do, print "more than three animals".  If you don't, print "three or less animals"

In [None]:
# write an if/else statement


Can you modify your code above to tell the user that they have exactly three animals in the dictionary?

In [None]:
# Modify conditional to include exactly three as potential output

<a id='py-loops'></a>
### Loops
[Table of Contents](#toc)

Loops tell a program to do the same thing over and over again, either once for each item in a collection or until a certain condition is met.  
* Python has two main loop types: 
    * For loops
    * While loops

<a id='for-loops'></a>
### For Loops
[Table of Contents](#toc)



A for loop runs the same commands once for each value in a collection.

Building blocks of a for loop:

> `for` each-item `in` variable `:`  
>> `do-something`

**Building for loops in python:**
1. Start the first line with `for`  
1. `each-item` is an arbitrary name for each item in the variable/list.  
1. Use `in` to indicate the variable that holds the collection of information  
1. End the first line with `:`  
1. indent the following line(s) with `tab` or `4 spaces` (Jupyter does the indent automatically!)
    1. `do-something` give python commands to execute

In the example below, `number` is our `each-item` and the `print()` command is our `do-something`.

In [None]:
# Run this cell and see if you figure out what this for loop does
for number in range(10): # does not include 10! 
    print(number)

#### LOOPING a set number of times
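We can do this with the function `range()`, which produces a sequence of numbers in a specified range. (The numbers are not stored as a list; wrap the range in `list()`, as in `list(range(10))`, if you want to see all of them at once.)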

In the example above, `range(10)` gives us 10 numbers, starting at 0 and increasing by one (0 through 9). In the first example below, we get the same result by giving two numbers to `range()`: the start and stop points of the range. The stop value itself is never included. **Note: Do not forget about Python's zero-indexing.**

In [None]:
# What will be printed
for number in range(0,10):
    print(number)

In [None]:
# What will be printed
for number in range(1,11):
    print(number)
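A quick way to see exactly which numbers `range()` will produce is to wrap it in `list()`. This sketch also shows the optional third number, the step between values, which the next example uses with a negative step:

```python
# Wrapping range() in list() shows every number it produces
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 7)))      # [2, 3, 4, 5, 6]: starts at 2, stops before 7

# An optional third number sets the step between values
print(list(range(0, 20, 5)))  # [0, 5, 10, 15]
```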

In [None]:
# What will be printed
for number in range(10,0, -1):
    print(number)

In [None]:
# Change the code from the cell above so that python prints 9 to 0 in descending order


In [None]:
total = 0 # running total, created before the loop starts

for i in range(10):
    total = total + i
    print(total)  # indented: runs on every pass through the loop

In [None]:
total = 0

for i in range(10):
    total = total + i

print(total)  # not indented: runs once, after the loop has finished
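Adding to a running total is common enough that Python has a shorthand, `+=`, and a built-in `sum()` function. A sketch that reproduces the total from the cell above:

```python
total = 0
for i in range(10):
    total += i  # shorthand for total = total + i

print(total)           # 45
print(sum(range(10)))  # 45: the built-in sum() adds up a collection for you
```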

#### Saving Time
Looping can save you lots of time. We will look at a simple example to see how it works with lists, but imagine if your list was 100 items long. You do not want to write 100 individual print commands, do you?

In [None]:
# LOOPING over a collection
# LIST

# If I want to print a list of fruits, I could write out each print statement like this:
print("apple")
print("banana")
print("mango")

# or I could create a list of fruit
# loop over the list
# and print each item in the list
list_of_fruit = ["apple", "banana", "mango"]

# this is how we write the loop
# "fruit" here is a variable that will hold each item in the list, the fruit, as we loop
# over the items in the list
print (">>looping>>")
for fruit in list_of_fruit:
    print (fruit)

#### Creating New Data
You can also use loops to create new datasets. In the cell below, we use a mathematical operator to create a new list `data_2` where each value is double the corresponding value in the original list `data`.

In [None]:
data = [35,45,60,1.5,40,50]
data_2 = []

for i in data:
    data_2.append(i*2)
    
print(data_2)
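The same new list can also be written as a *list comprehension*, a compact one-line form of this kind of loop. It is shown here only as an optional alternative; the for loop above is equally valid:

```python
data = [35, 45, 60, 1.5, 40, 50]

# Same result as the for loop above, written as a list comprehension
data_2 = [i * 2 for i in data]
print(data_2)  # [70, 90, 120, 3.0, 80, 100]
```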

<a id='prob-str-reverse-loop'></a>
### Problem 11: Reverse the string using a for loop
[Table of Contents](#toc)

There are many ways to reverse a string. I want to challenge you to use a for loop. The goal is to practice how to build a for loop (use multiple print statements) to help you understand what is happening in each step.

In [None]:
string = "waterfall"
reversed_string = ""

# For loop reverses the string given as input


# Print out both the original and reversed strings

**Extra Credit: Accomplish the same task (reverse a string) without using a for loop.** _Hint: the reversing range example above gives you a clue AND Google always has an answer!_

In [None]:
# Reversing the string can be done by writing only one more line
string = "waterfall"


We can loop over collections of things, like lists or dictionaries, or we can create a sequence to loop over with `range()`.

In [None]:
# LOOPING over a collection
# DICTIONARY

# We can do the same thing with a dictionary and each association in the dictionary

fruit_price = {"apple" : 0.10, "banana" : 0.50, "mango" : 0.75}
for key, value in fruit_price.items():
    print ("%s price is %s" % (key, value))

<a id='prob-dict-loop'></a>
### Problem 12: Looping through Dictionaries
[Table of Contents](#toc)

1. For each entry in your `zoo` dictionary, print that key

In [None]:
# print only dictionary keys using a for loop


2. For each entry in your zoo dictionary, print that value

In [None]:
# print only dictionary values using a for loop


3. Can you print both the key and its associated value using a for loop?

In [None]:
# print dictionary keys and values using a single for loop


<a id='while-loops'></a>
### While Loops
[Table of Contents](#toc)

Similar to if statements, while loops use a boolean test to either continue looping or break out of the loop.

In [None]:
# While Loops
my_num = 10

while my_num > 0:
    print("My number", my_num)
    my_num = my_num - 1

NOTE: While loops can be dangerous: if you forget to include an operation that modifies the variable being tested (above, we subtract 1 at the end of each loop), the loop will run forever and your script will never finish.
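A sketch of a while loop whose number of repetitions is not known in advance (the numbers are arbitrary): it keeps doubling a value until it passes 100, and the doubling line is what eventually makes the test `False`. The second part shows `break`, which leaves a loop immediately:

```python
value = 1
steps = 0

# Keep doubling until the value passes 100
while value <= 100:
    value = value * 2  # this line changes the tested variable, so the loop ends
    steps = steps + 1

print(value, steps)  # 128 7

# `break` leaves a loop immediately, even if the test is still True
count = 0
while True:
    count = count + 1
    if count == 3:
        break
print(count)  # 3
```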

That is it. With just these data types, structures and logic, you can build a program. We will write programs with functions in [Python II: A tool for programming](#python-2)

<a id='pandas'></a>
## Pandas: Working with Existing Data
[Table of Contents](#toc)

Thus far, we have been creating our own data as we go along and you are probably thinking "How in the world can this save me time?" This next section is going to help you learn how to import data that you already have. [Pandas](https://pandas.pydata.org/docs/) is a python package that is great for doing data manipulation.

<a id='read-data'></a>
### Pandas: Importing Data
[Table of Contents](#toc)

**Importing packages:** Pandas is a package written for Python, but it is not part of the base Python install. In order to use add-on packages like this one, we must first import them. This is conventionally the first thing you do in a script. When I build a script in a Jupyter Notebook, I generally import all of the packages I need for the entire notebook in the first cell.    

In [None]:
# Import packages
import pandas

**Note:** pandas is a long name and you will generally find a shortened version of the name in online help resources. As such, we will use the same convention in this workshop. It only requires a small modification to the import statement.

In [None]:
# Import packages
import pandas as pd

Now that we have pandas at our disposal, we are ready to import some data. We will be working with a freely available dataset called [gapminder](https://www.gapminder.org/). The first data set we are going to look at is called `Afghanistan_Raw`. 

In [None]:
# import from excel spreadsheet (reading .xlsx files requires the openpyxl package)
afghanistan_xlsx = pd.read_excel('gapminder_data/Afghanistan_Raw.xlsx')


In [None]:
# import from csv file
afghanistan_csv = pd.read_csv('gapminder_data/Afghanistan_Raw.csv')

Each of the cells above assigns a pandas DataFrame to a `variable`. 

To create a pandas dataframe: 
1. We use `pd` to tell python that we want to use the pandas package that we imported. 
1. We use `.read_excel()` or `.read_csv()` to tell pandas what type of file format we are giving it. 
1. We have given the `relative path` to the file in parentheses.

**Relative paths** are your best friend when you want your code to be easily moved or shared with collaborators. They use your current position in the computer's file structure as the starting point. 
* If you work on a script with relative paths on your work computer, email it (together with its data folder) to yourself and try to continue working on your personal home computer, it should still work: the usernames on the two computers may differ, but relative paths never include them, and the file structure inside the folder we are working in is the same on both.

* The `current working directory` is where the Jupyter Notebook is stored unless you manually change it.

#### Project Directory

    ./  (the folder that contains the notebook)
    ├── Intro_Python_Resbaz_2020.ipynb
    ├── americas_basic_stats.png
    ├── array_vs_list.png
    ├── gapminder_data
    │   ├── Afghanistan_Raw.csv
    │   ├── Afghanistan_Raw.xlsx
    │   └── gapminder_by_country
    ├── gapminder_final.csv
    ├── gapminder_summ_stats.csv
    ├── jn_binder_options.png
    ├── jn_options.png
    ├── oceania_barh.png
    └── two_countries_summ_stats.csv



**Absolute paths** can be useful if the thing you are trying to access is never going to move. They start at the root of the computer's file structure and work out to the file's location. **Note: this includes the computer's username.**
* If you work on a script with absolute paths on your work computer, email it to yourself and try to continue working on your personal home computer, it will fail because the usernames and computer's file structure are different.

    * My absolute path (work): /Users/**drakeasberry**/Desktop/2020_Resbaz_Python/intro_python
    * My absolute path (home): /Users/**drake**/Desktop/2020_Resbaz_Python/intro_python
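Python's standard-library `pathlib` module (not otherwise used in this workshop) can show the difference: `Path.cwd()` returns your current working directory, and joining a relative path onto it gives the absolute path it points to on the current computer. A minimal sketch; the file does not need to exist for this to work:

```python
from pathlib import Path

# A relative path, written from the folder the notebook runs in
relative = Path('gapminder_data') / 'Afghanistan_Raw.csv'

# The current working directory: this part differs between computers
cwd = Path.cwd()
print(cwd)

# Joining the two gives the absolute path for THIS computer only
absolute = cwd / relative
print(absolute)
print(relative.is_absolute(), absolute.is_absolute())  # False True
```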

In [None]:
print('This is the excel file:\n\n', afghanistan_xlsx)
print('\nThis is the csv file:\n\n', afghanistan_csv)

This prints out each file separately, and I have added a few line breaks (`\n`) to make the output a little easier to read. However, the printed tables may still feel unfamiliar and hard to read for you or your colleagues. If we put the data variable on the last line of a cell instead of inside `print()`, Jupyter displays it as a formatted table that is more visually pleasing. Let's look at the difference.

In [None]:
# Use print to label the output, but let pandas render the table
print('This is the excel file:')
afghanistan_xlsx

In [None]:
# Use print to label the output, but let pandas render the table
print('This is the csv file:')
afghanistan_csv

<a id='manipulate-data'></a>
### Pandas: Manipulating Data
[Table of Contents](#toc)

As you can see above, both ways of importing data have produced the same results. The type of data file you use is a personal choice, but not one that should be taken for granted. Microsoft Excel is a licensed product, and not everyone has software that can open `.xlsx` files, whereas a `.csv` file is a comma-separated values document that can be read by many free text editors. `.csv` files are also generally smaller than the same information stored in a `.xlsx` file. My preferred choice is `.csv` files, due to their smaller size and easier accessibility. 

In [None]:
# List the distinct values found in the country column
afghanistan_csv.country.unique()

In [None]:
# Drop all rows with no values
afghanistan_csv.dropna(how='all')

In [None]:
# What prints now and why?
afghanistan_csv

In [None]:
# If we want to save the operations we need to store it in a variable (we will overwrite the existing one here)
afghanistan_csv = afghanistan_csv.dropna(how='all')
afghanistan_csv

In [None]:
# we will store a new dataframe called df to save some typing
# we will subset the data to only rows that have a country name
df = afghanistan_csv.dropna(subset=['country'])
df
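A tiny made-up table makes the difference between the two `dropna()` calls easier to see. A sketch, assuming only pandas and NumPy (`np.nan` marks a missing value):

```python
import numpy as np
import pandas as pd

# Made-up data: one complete row, one fully empty row, one row missing only the country
toy = pd.DataFrame({
    'country': ['Afghanistan', np.nan, np.nan],
    'year': [1952, np.nan, 1957],
})

print(toy.dropna(how='all').shape)           # (2, 2): only the all-empty row is dropped
print(toy.dropna(subset=['country']).shape)  # (1, 2): every row without a country is dropped
print(toy.shape)                             # (3, 2): dropna() returns new tables; toy is unchanged
```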

In [None]:
# Rename the 'pop' column to the more descriptive 'population'
df = df.rename(columns={'pop':'population'})

In [None]:
# We expect Afghanistan to be the only country in this file
# Let's check our assumptions
df.country.unique()

<a id='prob-unique'></a>
### Problem 13: Checking assumptions about your data
[Table of Contents](#toc)

Investigate the remaining columns to see if the data is as you expect.

In [None]:
# this will give a quick overview of the data frame to give you an idea of where to start looking


In [None]:
# Hint: Check your assumptions about the values in the dataframe


Our investigation has shown us that some of the data has errors, but it is probably still useful if we correct them.
* The year column is being read as a float instead of whole numbers (we will not be doing mathematics on years, so decimals make no sense here)
* The year column still has a missing value
* The population column is being read as an object (text) instead of an integer (we may want to do mathematics on population)
* The continent column has a typo (`Asiaa`) and a placeholder value (`tbd`)

Let's see if we can fix these issues together.

In [None]:
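# Let's fix the typos in the continent column
# (only this column is changed, so values in other columns are left alone)
df['continent'] = df['continent'].replace(to_replace=["Asiaa", "tbd"], value="Asia")
df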

In [None]:
# Let's take a closer look at year column by sorting
df.sort_values(by='year')

By sorting the dataframe based on year, we can see that the years are incrementing by 5 years. We can also deduce that the year 1982 is missing.  
Depending on the data, you will have to make a decision as the researcher:
* Are you confident that you can say that you have replaced the value correctly and the rest of the data is good?
* Do you delete the data based on the fact that it had missing data?

In this case, we are going to replace the missing value with 1982 because we believe it is the right thing to do in this particular case.

**Note:** In general, you should be very selective about replacing missing values. 

In [None]:
df['year'] = df['year'].fillna(1982)
df

In [None]:
# Finally, let's fix the datatypes of columns
df = df.astype({"year": int, "population": int})
df
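Why was `year` a float in the first place? A NumPy integer column cannot hold a missing value (`NaN` is itself a float), so pandas stores the whole column as floats. A small sketch with made-up years; the exact error message varies between pandas versions:

```python
import numpy as np
import pandas as pd

years = pd.Series([1977, np.nan, 1987])
print(years.dtype)  # float64

# Converting to int fails while the missing value is still there
try:
    years.astype(int)
except ValueError as error:
    print("Error:", error)

# Filling the gap first makes the conversion possible
fixed = years.fillna(1982).astype(int)
print(fixed.tolist())  # [1977, 1982, 1987]
```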

In [None]:
# Let's check to see if it is working the way we think it is
df.info()

<a id='write-data'></a>
### Pandas: Writing Data
[Table of Contents](#toc)

Now that we have made all the changes necessary, we should save our corrected DataFrame as a new file.

In [None]:
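# Save file with changes we made
# index=False leaves out the row numbers, which would otherwise be saved as an extra unnamed column
df.to_csv('gapminder_data/Afghanistan_Fixed.csv', index=False)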
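By default, `to_csv()` also writes the DataFrame's index (its row labels) as the first column. A small sketch with a made-up table, writing to a string instead of a file so nothing is saved to disk:

```python
import io

import pandas as pd

# Made-up table whose index has gaps, like the one left behind by dropna()
toy = pd.DataFrame({'year': [1952, 1957]}, index=[3, 7])

with_index = toy.to_csv()                # no path given, so a string is returned
without_index = toy.to_csv(index=False)
print(with_index)
print(without_index)

# Reading the first version back produces an extra unnamed column holding the old index
print(pd.read_csv(io.StringIO(with_index)).columns.tolist())  # ['Unnamed: 0', 'year']
```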

<a id='all-countries'></a>
### Pandas: Working with more than one file
[Table of Contents](#toc)

In [None]:
# Import pandas library using an alias
import pandas as pd
# Import glob library, which lets us use wildcard patterns (like *) to select multiple files
import glob

In [None]:
# Let's see where we are within the computer's directory structure
# The exclamation point allows us to utilize a bash command in the notebook
!pwd

In [None]:
# Let's see what files and folders are in our current location
!ls

In [None]:
# Let's see what files and folders are in the gapminder_data directory
!ls gapminder_data/

In [None]:
# Let's see what files and folders are in the gapminder_data/gapminder_by_country directory
!ls gapminder_data/gapminder_by_country/

We worked with one file (Afghanistan) in the previous section; now we will combine everything we have seen to work with the data for all of the countries.

1. Find files in `gapminder_data/gapminder_by_country/`  
1. Get all filenames into a list  
1. Remove `country.cc.txt` (the file that holds the column names)  
1. Use a for loop to read each file and combine them into one pandas DataFrame  
1. Add column names from `country.cc.txt`  
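Steps 1 to 3 can be tried out on throwaway files first. A sketch that creates a few empty, made-up files in a temporary folder (deleted automatically at the end), matches them with a `*` wildcard, and removes the header file from the list:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a few empty, made-up files to search through
    for name in ['Brazil.cc.txt', 'Chile.cc.txt', 'country.cc.txt', 'notes.md']:
        open(os.path.join(folder, name), 'w').close()

    # `*` matches any run of characters, so this finds every file ending in .cc.txt
    pattern = os.path.join(glob.escape(folder), '*.cc.txt')
    matches = sorted(os.path.basename(path) for path in glob.glob(pattern))
    print(matches)  # ['Brazil.cc.txt', 'Chile.cc.txt', 'country.cc.txt']

    # The header file also matches the pattern, so remove it from the list
    matches.remove('country.cc.txt')
    print(matches)  # ['Brazil.cc.txt', 'Chile.cc.txt']
```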

In [None]:
# glob.glob will match files in the current directory based on a pattern
countries = sorted(glob.glob('gapminder_data/gapminder_by_country/*.cc.txt'))
len(countries)

In [None]:
# Remove the header file from the list of files
# If you try to run this cell more than once, you will get an error 
# because the item does not exist once it has been removed after the first execution of this cell
countries.remove('gapminder_data/gapminder_by_country/country.cc.txt')

In [None]:
# Check the length of the list to ensure the item was correctly removed
len(countries)

In [None]:
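# creating a dataframe from a for loop:
# read each country file into its own DataFrame and collect them in a list
frames = []

for country in countries:
    c = pd.read_csv(country, sep='\t', header=None)
    frames.append(c)

# combine all of the country DataFrames into one
# (DataFrame.append() was removed in pandas 2.0, so we use pd.concat() instead)
df = pd.concat(frames, ignore_index=True)

# Import header and store as list
header = pd.read_csv('gapminder_data/gapminder_by_country/country.cc.txt', sep='\t')
column_names = list(header.columns)

# Add header to dataframe created with the loop
df.columns = column_names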
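The same read-and-combine pattern on a small scale. This sketch uses made-up tab-separated text wrapped in `io.StringIO`, which `pd.read_csv()` accepts in place of a file, so it runs without the gapminder files:

```python
import io

import pandas as pd

# Two made-up, tab-separated "files" with no header row, plus a header-only "file"
files = ["Brazil\t2002\nBrazil\t2007\n", "Chile\t2007\n"]
header_file = "country\tyear\n"

# Read each piece, collect the DataFrames in a list, then combine them once
frames = [pd.read_csv(io.StringIO(text), sep='\t', header=None) for text in files]
combined = pd.concat(frames, ignore_index=True)

# Use the header file's column names for the combined table
header = pd.read_csv(io.StringIO(header_file), sep='\t')
combined.columns = list(header.columns)
print(combined)
```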

In [None]:
# Gives us number of rows and columns
df.shape

In [None]:
# Get summary statistics
df.describe()

In [None]:
# Do you remember how to change column types


In [None]:
# Solution
# Do you remember how to change column types
# Converting year to text (str) keeps it out of the numeric summary below
df = df.astype({"year": str, "pop": int})
df.describe()

To save the summary of the DataFrame with `to_csv()`, use a NEW file name; otherwise you will overwrite the files we downloaded!

In [None]:
df.describe().to_csv('gapminder_summ_stats.csv')

In [None]:
!ls

<a id='slicing'></a>
### Pandas: Slicing and selecting values
[Table of Contents](#toc)

<div class="alert alert-block alert-success">
<b>Pandas Dataframe:</b>   
- 2-dimensional representation of a table  
- A Series is the data structure pandas uses to represent a single column.
</div>


Because a DataFrame is 2-dimensional, we have to tell pandas which rows and which columns we want to select.  

In [None]:
df.head()

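**`.loc[]` to select values by label (name)**


**`.loc[a:b,i:j]`**, where  
    a and b are the row labels (here, the countries)   
    i and j are the column names  
    (a label slice includes both of its end points)

To select rows by country name, we first need to set `country` as the index: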

In [None]:
df=df.set_index('country')
df

In [None]:
df.loc['Brazil']

In [None]:
df.loc['Brazil':'Ecuador']

In [None]:
df.loc['Brazil':'Ecuador','year':'lifeExp']

In [None]:
df.loc[['Brazil','Ecuador'],'year':'lifeExp']

**`.iloc[]` to select values by integer position**


**`.iloc[a:b,i:j]`**, where  
    a and b are the positions of rows    
    i and j are the positions of columns

In [None]:
df.iloc[9:16,:-1]

**Observation:** a position slice such as `-3:-1` omits its final position (here, the last column, `gdpPercap`), while a named slice with `.loc[]` includes its final element.

In [None]:
df.iloc[[9,16],-3:-1]
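A small made-up table shows the inclusive/exclusive difference side by side: `.loc['Brazil':'Chile']` includes Chile, while `.iloc[0:2]` stops before position 2, so both select the same block. The numbers are invented for illustration:

```python
import pandas as pd

toy = pd.DataFrame(
    {'year': [2007, 2007, 2007], 'lifeExp': [70.0, 75.0, 80.0], 'gdpPercap': [100, 200, 300]},
    index=pd.Index(['Brazil', 'Chile', 'Ecuador'], name='country'),
)

# .loc uses labels and INCLUDES the end label
by_label = toy.loc['Brazil':'Chile', 'year':'lifeExp']
# .iloc uses positions and EXCLUDES the end position
by_position = toy.iloc[0:2, 0:2]

print(by_label)
print(by_label.equals(by_position))         # True: both select the same 2 x 2 block
print(toy.iloc[:, -3:-1].columns.tolist())  # ['year', 'lifeExp']: stops before gdpPercap
```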

<a id='summary-stats'></a>
### Problem 14: Slice and save summary statistics
[Table of Contents](#toc)

Select two countries of your interest. Slice the `df` to select only these countries. Then, obtain summary statistics by country, and save to a file.

In [None]:
# pick two countries to subset and save file with a descriptive name


<a id='py1-solutions'></a>
## Python I: Problem Solutions
[Table of Contents](#toc)

### Problem 5: Assigning variables and printing values
1. Create two new variables called `age` and `first_name` with your own age and name
2. Print each variable out to display its value


In [None]:
age = '<your age>'
first_name = '<your first name>'
print(age)
print(first_name)

**Extra Credit:** You can also combine values in a single print command by separating them with commas

In [None]:
# Insert your variable values into the print statement below
print(first_name, 'is', age, 'years old')

Correct Output:
If you received this output, then you correctly assigned new variables and combined them correctly in the print statement. The information represented between `<>` should reflect your personal information at this point.
```markdown
<your age>
<your first name>
<your first name> is <your age> years old
```

If you received this output, then you forgot to assign new variables.
```markdown
34
Drake
Drake is 34 years old
```

If you received this output, then you correctly assigned new variables but mixed up the order in the combined print statement.
```markdown
<your age>
<your first name>
<your age> is <your first name> years old
```

### Problem 6: Printing your first and last name

In the code cell below, create a new variable called last_name with your own last name.
Create a second new variable called full_name that is a combination of your first and last name.



In [None]:
# Print full name
first_name = 'Drake'
last_name = 'Asberry'

# Combine the two names with a space in between
full_name = first_name + ' ' + last_name
print(full_name)

### Problem 7: What variable type do I have? 

size = '1024'  
What data type is `size`? Use some of the python you have learned to provide proof of your answer.
<ol style="list-style-type:lower-alpha">
  <li>float</li>
  <li>string</li>
  <li>integer</li>
  <li>boolean</li>
</ol>

In [None]:
# Answer: b. string. The quotes around 1024 make it text (str), not a number.
size = '1024'
print(type(size))

### Problem 8: Creating and Working with Lists

1. Create a new list called list_of_numbers with four numbers in it.

In [None]:
# Print out the list of numbers you created
list_of_numbers = [0, 1, 2, 3]
print(list_of_numbers)

In [None]:
# Print out the second value in the list list_of_numbers
print(list_of_numbers[1])

2. Once you have created a list, you can add more items to it with the `append()` method.

In [None]:
# Append a number to your list
list_of_numbers.append(5)
print(list_of_numbers)

### Problem 9: Creating and Accessing Dictionaries

1. Create a dictionary called `zoo` with at least three animal types with a different count for each animal.
1. `print` out the count of the second animal in your dictionary 


In [None]:
# Zoo Dictionary
zoo = {'bears':25, 'lions':19, 'monkeys':67}
print(zoo['lions'])

### Problem 10: Writing Conditional If/Else Statements

Check to see if you have more than three entries in the `zoo` dictionary you created earlier. If you do, print "more than three animals".  If you don't, print "three or less animals"

In [None]:
# write an if/else statement
if len(zoo) > 3:
    print("more than three animals")
else:
    print("three or less animals")

Can you modify your code above to tell the user that they have exactly three animals in the dictionary?

In [None]:
# Modify conditional to include exactly three as potential output
if len(zoo) > 3:
    print("more than three animals")
elif len(zoo) < 3:
    print("less than three animals")
else:
    print("exactly three animals")

### Problem 11: Reversing Strings

There are many ways to reverse a string. I want to challenge you to use a for loop. The goal is to practice how to build a for loop (use multiple print statements) to help you understand what is happening in each step.

In [None]:
string = "waterfall"
reversed_string = ""

for char in string:
    #print(reversed_string)
    reversed_string = char + reversed_string
    #print(char)
    #print(reversed_string)

print('The original string was:', string)
print('The reversed string is:', reversed_string)

**Extra Credit: Accomplish the same task (reverse a string) without using a for loop.** _Hint: the reversing range example above gives you a clue AND Google always has an answer!_

In [None]:
string = "waterfall"
print(string[::-1])
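Another approach that avoids a `for` loop uses the built-in `reversed()` together with the string method `join()`. This is an alternative to slicing, not necessarily the one the hint has in mind:

```python
string = "waterfall"

# reversed() gives the characters from last to first;
# "".join() glues those characters back into a single string
reversed_string = "".join(reversed(string))
print(reversed_string)  # llafretaw
```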

<a id='prob-dict-loop'></a>
### Problem 12: Looping through Dictionaries
[Table of Contents](#toc)

1. For each entry in your `zoo` dictionary, print its key.

In [None]:
# print only dictionary keys using a for loop
for key in zoo.keys():
    print(key)

2. For each entry in your `zoo` dictionary, print its value.

In [None]:
# print only dictionary values using a for loop
for value in zoo.values():
    print(value)

3. Can you print both the key and its associated value using a for loop?

In [None]:
# print dictionary keys and values using a single for loop
for key, value in zoo.items():
    print(key,value)

<a id='prob-unique'></a>
### Problem 13: Checking assumptions about your data
[Table of Contents](#toc)

Investigate the remaining columns to see whether the data is what you expect.

In [None]:
# this gives a quick overview of the data frame and an idea of where to start looking
print('total rows in dataframe:', len(df))
df.info()

In [None]:
# Hint: check your assumptions about the values in the dataframe
print(df.year.unique())

# loop over every column and print its unique values
for column in df.columns:
    unique_val = df[column].unique()
    print(column, ':\nunique values:\n', unique_val, '\n\n')

### Problem 14: Slice and save summary statistics

Select two countries of your interest. Slice the `df` to select only these countries. Then, obtain summary statistics by country, and save to a file.

In [None]:
# My Solution
subset=df.loc[['China','Germany'],'pop':]
subset.describe().to_csv('china_germany_summ_stats.csv')
subset

<a id='python-2'></a>
## Intro to Python II: A Tool for Programming 
[Table of Contents](#toc)
    
**Prerequisites:** Intro to Python 1: Data OR knowledge of another programming language 

This workshop builds on previous knowledge of Python or another programming language so that you can harness the power of Python to make your computer work for you. You will learn how to write your own Python functions, save your code as scripts that can be called from future projects, and build a workflow that chains multiple scripts together.

**Learning Objectives:**
1. Understand the syntax of Python functions
1. Understand the basics of scripting in Python
1. Understand data analysis cycles

**Learning Outcomes:** you will be able to…
1. Write your own functions
1. Save code as a script
1. Build a workflow

<a id='python-2-setup'></a>
## Setup if you are joining in for Python II
[Table of Contents](#toc)

**Run the next three code cells to have the data you need to work with in this section.**

In [None]:
# import libraries 
import pandas as pd

In [None]:
# Create a dictionary with rainfall, temperature and pressure
data={'rainfall_inches':[1.34,1.56,4.33],
      'temperature_F':[75,80,96],
      'pressure_psi':[10,2,35]}
data

In [None]:
string = "waterfall"
print(string[::-1])

<a id='functions'></a>
## Functions:
[Table of Contents](#toc)

Create your own functions, especially when you need to perform the same operation many times. This will make your code cleaner.

* Functions go by other names in other languages, most commonly methods or subroutines.
* A function has a contract that guarantees certain output based on certain input(s).
* Values (arguments) are passed into the function.
* The function then performs actions based on the values that are passed.
* A new value is returned from the function.

In Python we define a function with `def`. First you define the function, and later you call it.

Here we define a function that we will call `add_two_numbers`, starting with the line `def add_two_numbers():`

In [None]:
# this defines our function
def add_two_numbers():
    answer = 50 + 15
    return answer

In [None]:
# this calls the function and stores the result in the variable `x`
x = add_two_numbers()
x

That function seems a little silly, because we could just add 50 and 15 ourselves more easily than defining a function to do it for us. However, imagine 50 was some constant that we need to add observations to. We could then rewrite the function to accept an observation and add it to our constant of 50.

In [None]:
# this defines our function
# the "num1" inside the parentheses means it is expecting us to pass a value to the function when we call it
def add_to_constant(num1):
    answer = 50 + num1
    return answer

In [None]:
# this calls the function and stores the result in the variable `y`
# the value we want to pass goes inside the parentheses in the call
y = add_to_constant(10)
y

Change the value that you pass to the function to see how it works.
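Building on this, a parameter can also have a default value, which is handy when the constant is usually 50 but occasionally needs to change. In this sketch the function name `add_to_constant2` and the parameter name `constant` are made up for illustration:

```python
# constant defaults to 50, so it can be left out when calling the function
def add_to_constant2(num1, constant=50):
    answer = constant + num1
    return answer

print(add_to_constant2(10))                # 60, uses the default of 50
print(add_to_constant2(10, constant=100))  # 110, overrides the default
```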
<a id='why-functions'></a>
### Why Use Functions?
[Table of Contents](#toc)

Functions let us break down our programs into smaller bits that can be reused and tested.

* Human beings can only keep a few items in working memory at a time.
* We understand larger, more complicated ideas by understanding and combining smaller pieces.
* Functions serve the same purpose in programs: they encapsulate complexity so that we can treat it as a single “thing”.
* Functions enable reusability: write once, use many times.

1. Testability

* Imagine a really big program with lots of lines of code. There is a problem somewhere in the code because you are not getting the results you expect.

* How do you find the problem in your code?
    * If your program is composed of lots of small functions that only do one thing then you can test each function individually.

2. Reusability

* Imagine a really big program with lots of lines of code. There is a section of code you want to use in a different part of the program.

* How do you reuse that part of the code?
    * If you just have one big program then you have to copy and paste that bit of code where you want it to go, but if that bit was a function, you could just use that function.

3. Writing cleaner code

* Always keep both of these concepts in mind when writing programs.
* Write small functions that do one thing
* Never have one giant function that does a million things.
* A well written script is composed of lots of functions that do one thing
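To make the testability point concrete, here is a small sketch that checks `add_to_constant` from earlier with `assert` statements; the particular test values are arbitrary. If any check fails, Python raises an `AssertionError` that points at the failing line.

```python
def add_to_constant(num1):
    answer = 50 + num1
    return answer

# Because the function does one small thing, it is easy to check on its own
assert add_to_constant(10) == 60
assert add_to_constant(0) == 50
assert add_to_constant(-50) == 0
print("all checks passed")
```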

<a id='str-reverse-func'></a>
### Let's revisit the reverse string exercise and turn it into a function
[Table of Contents](#toc)

In [None]:
def reverse_text(string):
    """Function to reverse text in strings.
    
    """
    result=string[::-1]
    return result

In [None]:
reverse_text("waterfall")

In [None]:
# you can also pass a variable to a function
original='pool'
reverse_text(original)

<a id='temp-func'></a>
### Let's look at a real-world example of where constants could be used in functions
[Table of Contents](#toc)

In [None]:
def convert_temp(temperature, unit):
    """Function to convert temperature from F to C, and vice-versa.
    Need temperature (integer or float) and unit (string, uppercase F or C)
    
    """
    t = float(temperature)  # float() keeps decimals; int() would silently drop them
    u = str(unit)
    
    if u == 'C':
        fahr = (9/5*t) + 32
        print('{}C is {}F'.format(temperature, round(fahr)))
    
    elif u == 'F':
        celsius = (t-32)*5/9
        print('{}F is {}C'.format(temperature, round(celsius)))
    
    else:
        print("Don't know how to convert unit", u)

In [None]:
convert_temp(85,'C')

In [None]:
convert_temp?
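The `?` shortcut only works in IPython and Jupyter. In plain Python, the docstring is stored on the function's `__doc__` attribute, and `help()` displays it too. The function `greet` below is a hypothetical example:

```python
def greet(name):
    """Return a friendly greeting for name."""
    return "Hello, " + name

# The docstring is kept on the function object itself
print(greet.__doc__)  # Return a friendly greeting for name.

# help() shows the signature together with the docstring
help(greet)
```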

In [None]:
# will demonstrate this depending on time

def convert_temp2():
    """Function to convert temperature from F to C, and vice-versa.
    User input.
    
    """
    t=int(input('Enter temperature:'))
    u=str(input('Enter unit (F or C):'))
    
    if u == 'C':
        fahr=9/5*t+32
        return '{}C is {}F'.format(t,int(fahr))
    
    elif u == 'F':
        celsius=(t-32)*5/9
        return '{}F is {}C'.format(t,int(celsius))
    
    else:
        return "Don't know how to convert..."

In [None]:
convert_temp2()

In [None]:
convert_temp2()

In [None]:
convert_temp2()
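Because `convert_temp2` mixes asking for input with doing the math, it is hard to test without typing answers by hand. One possible restructuring, sketched below with the hypothetical names `convert_temp_value` and `convert_temp_interactive`, keeps the conversion in a function that takes arguments and returns a number:

```python
def convert_temp_value(temperature, unit):
    """Convert a temperature between F and C and return the converted number.

    temperature: a number; unit: 'F' or 'C', the unit the temperature is given in.
    """
    u = unit.strip().upper()  # tolerate extra spaces and lowercase input
    if u == 'C':
        return 9 / 5 * temperature + 32
    elif u == 'F':
        return (temperature - 32) * 5 / 9
    else:
        raise ValueError("unit must be 'F' or 'C'")


def convert_temp_interactive():
    """Ask the user for a temperature and unit, then return the converted value."""
    t = float(input('Enter temperature:'))
    u = input('Enter unit (F or C):')
    return convert_temp_value(t, u)


print(convert_temp_value(85, 'C'))   # 185.0
print(convert_temp_value(212, 'F'))  # 100.0
```

The interactive wrapper still behaves like `convert_temp2` in the notebook, but the conversion itself can now be checked with plain function calls.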

<a id='scripting'></a>
## Scripting
[Table of Contents](#toc)


For this section we are going to create a new Jupyter Notebook to ensure we are starting with a clean slate.

1. Save your progress in the current notebook.
1. Go to `File > New Notebook > Python 3` to start a new notebook.
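As a preview of where this section is heading, a Python script is simply a plain text file ending in `.py`. A minimal sketch of one common layout, assuming a hypothetical file named `temperature_tools.py` (the Fahrenheit values are borrowed from the `temperature_F` data above), might look like this:

```python
# temperature_tools.py  (hypothetical file name)
"""Convert a few Fahrenheit temperatures to Celsius."""


def fahrenheit_to_celsius(fahr):
    """Return the Celsius equivalent of a Fahrenheit temperature."""
    return (fahr - 32) * 5 / 9


def main():
    # sample values taken from the temperature_F data used earlier
    for fahr in [75, 80, 96]:
        print(fahr, 'F is', round(fahrenheit_to_celsius(fahr), 1), 'C')


# Runs only when the file is executed directly (e.g. `python temperature_tools.py`),
# not when another script imports it
if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__":` check lets the same file be run directly as a script or imported from another script without running `main()` automatically.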

<a id='errors'></a>
## Common Errors
[Table of Contents](#toc)



### Help yourself

In [None]:
help(print)

In [None]:
help(len)

In [None]:
?len

In [None]:
?data

In [None]:
dir(data)

```
help(your_data_object)
dir(your_data_object) 
```
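The output of `dir()` includes many names that begin with underscores, which are mostly for Python's internal use. A small sketch using a made-up dictionary called `example` shows one way to list only the public methods:

```python
example = {'rainfall_inches': [1.34, 1.56, 4.33]}

# Keep only the names that do not start with an underscore
public_names = [name for name in dir(example) if not name.startswith('_')]
print(public_names)
```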

### Variable errors

In [None]:
# need to create/define a variable before using it
chocolate_cake

In [None]:
# this also includes misspellings...
first_name='Nathalia'

In [None]:
firt_name

### Syntax errors

In [None]:
# Syntax errors: when you forget to close a )
## EOF = end of file
## means that the end of your source code was reached before all code blocks were completed
## (newer Python versions may report "'(' was never closed" instead)
print(len(first_name)

In [None]:
print(len(first_name))

In [None]:
# Syntax errors: when you forget a comma ,
tires=4
print('My car has'tires,' tires')

In [None]:
# Syntax errors: when you forget to close a quote ' in a string
## EOL = end of line (newer Python versions may report "unterminated string literal" instead)
print('My car has',tires,' tires)

In [None]:
tires=4
print('My car has',tires,' tires')

In [None]:
# Syntax errors: when you forget the colon at the end of a line
data=[1,2,3,4]

for i in data
    print(i**2)

In [None]:
# Indentation errors: forgot to indent
for i in data:
print(i**2)

In [None]:
for i in data:
    print(i**2)

### Index errors

In [None]:
groceries=['banana','cheese','bread']

In [None]:
groceries[3]
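The list above has three items, but counting starts at 0, so the valid indexes are 0, 1, and 2. A short sketch of safe ways to reach the last item:

```python
groceries = ['banana', 'cheese', 'bread']

# Indexes start at 0, so the last valid index is len(groceries) - 1
last_index = len(groceries) - 1
print(last_index)             # 2
print(groceries[last_index])  # bread

# Negative indexes count backwards from the end of the list
print(groceries[-1])          # bread
```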

### Characters in strings are IMMUTABLE

In [None]:
fruit='mango'

In [None]:
fruit[3]

In [None]:
fruit[3]='G'
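Although you cannot change a character inside a string, you can build a new string and assign it back to the same variable name. A minimal sketch using slicing:

```python
fruit = 'mango'

# Keep everything before index 3, add the new character, keep everything after index 3
fruit = fruit[:3] + 'G' + fruit[4:]
print(fruit)  # manGo
```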

### Items in lists are MUTABLE

In [None]:
fruits=['mango','cherry']

In [None]:
fruits[1]

In [None]:
fruits[1]='apple'

In [None]:
fruits

### Characters in a string stored in a list are IMMUTABLE

In [None]:
fruits[1]

In [None]:
fruits[1][2]

In [None]:
fruits[1][2]='P'
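The fix here is the same idea: the string inside the list cannot be changed, but because lists are mutable you can replace the whole item with a newly built string. A minimal sketch:

```python
fruits = ['mango', 'apple']

# Build a new string from slices of the old one, then put it back into the list
fruits[1] = fruits[1][:2] + 'P' + fruits[1][3:]
print(fruits)  # ['mango', 'apPle']
```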