# Plotting and Programming in Python: Day 2 

Welcome to day 2 of Plotting and Programming in Python! 
In this notebook, we will cover:
- Lists
- For Loops
- Conditionals
- Looping Over Data Sets
- Writing Functions
- Variable Scope
- Programming Style

## Episode 11: Lists & Indexing
Lists are a **container** object in Python. They allow us to store multiple objects in one variable.
Lists:
- are contained in square brackets: [...]
- have values separated by commas: [ val1, val2, val3, ...]
- can contain numbers, strings, and other objects (including other lists!)
- are *indexable*, just like strings.



![image.png](attachment:257cd4ca-7f09-4b7d-b6c9-97964c37604a.png)
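A few example lists follow. The variable names and values are made up for illustration:

```python
# A list of floats, a list of strings, and a list that mixes types
pressures = [0.273, 0.275, 0.277, 0.275, 0.276]
colors = ['red', 'green', 'blue']
mixed = [1, 'two', 3.0, [4, 5]]  # a list can even hold another list

print(pressures)
print(colors)
print(mixed)
```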

In [None]:
# make a list!



### Indexing
Lists are indexed, which means that each value has a numbered position (its *index*), starting with zero. You can use the **len()** function to see how long your list is.
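Here is a small example of `len()` and indexing, using a made-up list of colors:

```python
colors = ['red', 'green', 'blue']
print(len(colors))   # 3
print(colors[0])     # red: indexing starts at zero
print(colors[-1])    # blue: negative indices count back from the end
```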

In [None]:
#use the len() and print() functions to check the length of the list you just made.
# what do you expect to see?



We can index lists to fetch individual values. 

In [None]:
#print the second value in your list.

#print the last value in your list.


Lists’ values can be replaced by assigning to them. We can call on one index of the list and assign a new value to it. 
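For example, assigning to one index replaces just that value (the list here is illustrative):

```python
colors = ['red', 'green', 'blue']
colors[1] = 'yellow'   # replace the value at index 1
print(colors)          # ['red', 'yellow', 'blue']
```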


In [None]:
# use an index expression to replace one value of your list, then print the result



**Slicing** allows us to index multiple values. Use `list[start:stop]` to slice a list.
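A short slicing sketch with made-up values. Notice that the slice includes the `start` index but stops *before* the `stop` index:

```python
pressures = [0.273, 0.275, 0.277, 0.275, 0.276]
print(pressures[0:2])   # [0.273, 0.275]: indices 0 and 1, not 2
print(pressures[2:])    # [0.277, 0.275, 0.276]: omitting stop slices to the end
```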

**Character strings** are also indexable.

In [None]:
# make a string, and print the third character in the string


However, character strings are **immutable**, so we cannot assign a new character to an index of a string.
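Since a string can't be changed in place, the usual workaround is to build a new string. A small sketch with an illustrative value:

```python
element = 'carbon'
print(element[0])    # c
# element[0] = 'C'   # uncommenting this line raises a TypeError: strings can't be changed in place
capitalized = 'C' + element[1:]   # build a new string instead
print(capitalized)   # Carbon
```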

In [None]:
#What happens when we index beyond the end of a list?

### Adding to lists
**Appending** items to a list adds an item to the end of the list. *(Note: this will lengthen your list by one.)*

We use the `append()` method to append items to lists.
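A minimal example of `append()` on an illustrative list, printing the length before and after:

```python
primes = [2, 3, 5]
print('before:', primes, len(primes))
primes.append(7)     # adds 7 to the end; the list is modified in place
print('after:', primes, len(primes))
```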


In [None]:
# make a new list and print it

#append a new value to your list and print it

`append()` is an **object method**. Methods work similarly to functions, but they are tied to a certain object type.

Use `object_name.method_name()` to call a method.

In [None]:
# can we append to a string?


We can use the **extend()** list method to combine lists. This is useful when you have multiple values to add to a list.
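The sketch below (with made-up values) contrasts `extend()` with `append()`: `extend()` adds each item separately, while `append()` adds its argument as a single item, even when that argument is a list:

```python
primes = [2, 3, 5]
more_primes = [7, 11, 13]
primes.extend(more_primes)   # adds every item of more_primes to the end of primes
print(primes)                # [2, 3, 5, 7, 11, 13]

nested = [2, 3, 5]
nested.append([7, 11])       # append adds the whole list as ONE item
print(nested)                # [2, 3, 5, [7, 11]]
```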

In [None]:
#combine two lists using extend()



We can use `del list_name[index]` to completely remove an item from a list.

*Note: this is not a method or a function. It is a **statement**, similar to variable assignment.*
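A small `del` example with an illustrative list:

```python
primes = [2, 3, 5, 7, 9]
del primes[4]    # 9 is not prime, so remove the item at index 4
print(primes)    # [2, 3, 5, 7]
```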

We've had values yes, but what about...**no** values?

You can create an empty list using only `[]`. This can be a useful starting point when you want to collect values into a list (tune in to the next section!)
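Here is a minimal sketch of starting from an empty list and filling it in. The variable name and values are illustrative:

```python
collected = []          # start with an empty list
print(len(collected))   # 0
collected.append('first')
collected.append('second')
print(collected)        # ['first', 'second']
```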

### Exercises!
Fill in the blanks so that the program below produces the output shown:
#### Input:
```
values = ____
values.____(1)
values.____(3)
values.____(5)
print('first time:', values)
values = values[____]
print('second time:', values)
```
#### Output:
```
first time: [1, 3, 5]
second time: [3, 5]
```

In [None]:
#Fill in the blanks!
values = ____
values.____(1)
values.____(3)
values.____(5)
print('first time:', values)
values = values[____]
print('second time:', values)

#### How big is a slice?
If start and stop are both non-negative integers, how long is the list `values[start:stop]`?

### Sort using methods
Python offers two ways to sort a list: the list method `sort()` and the built-in function `sorted()`. Run these lines of code and see if you can spot the difference:

In [None]:
# Program A
letters = list('gold')
result = sorted(letters)
print('letters is', letters, 'and result is', result)

In [None]:
# Program B
letters = list('gold')
result = letters.sort()
print('letters is', letters, 'and result is', result)

## Episode 12: For Loops

<div>
<img src="https://i.kym-cdn.com/photos/images/newsfeed/001/393/656/da7.jpg" alt="a cat stands on her back legs staring at a bowl of fruit loops, with the caption 'brother may I have the loops'" width="300"/>
</div>

A for loop executes commands once for each value in a collection (that is, in an *iterable* object).

- Doing calculations on the values in a list one by one is as painful as working with pressure_001, pressure_002, etc.
- A for loop tells Python to execute some statements once for each value in a list, a character string, or some other collection.
- “for each thing in this group, do these operations”

```
for number in [2, 3, 5]:
    print(number)
```
is equivalent to:
```
print(2)
print(3)
print(5)
```

### Principal parts of a for loop
1) the **collection**, over which the loop runs
    
    --> may be a list, string, range, or other collection that can be iterated over
2) the **body**, the operations done on each iteration of the loop

    --> the indented block
3) the **loop variable**, which changes on each iteration of the loop
    
    --> The “current thing”

### Structures
```
for number in [2, 3, 5]:
    number = number + 1
    print(number)
```
The first line of the for loop defines the loop variable and the collection. It must begin with `for` and end with a colon.

#### Indentation
Python uses **indentation** to show nesting. Indentation is always meaningful in Python. Any consistent indentation is legal, but standard practice is to use four spaces.

### A few style notes:
Loop variables can be given any name: the name itself means nothing to Python. (Unlike in some other languages, the loop variable is not deleted when the loop ends; it keeps the last value it was assigned.)

-->*But,* it's best to name it something that you will understand, like `i`, `elem`, or `c`.

The body of a for loop can contain as many statements as you want.

-->*But,* it's usually best to limit the number of statements so that the loop stays easy to read and understand.

### Ranges
Use `range` to iterate over a sequence of numbers.

- The built-in function `range` produces a sequence of numbers.

A `range` **is not** a `list`: the numbers are produced on demand to make looping over large ranges more efficient.

`range(N)` produces the numbers 0 through N-1.

What is `range(4)`?
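A short sketch using `range(3)` shows both looping over a range and the fact that a range is not a list until you convert it with `list()`:

```python
for i in range(3):
    print(i)          # prints 0, then 1, then 2

numbers = range(3)
print(numbers)        # range(0, 3): a range object, not a list
print(list(numbers))  # [0, 1, 2]: list() produces all the values at once
```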


### The Accumulator Pattern
The Accumulator pattern turns many values into one.

A common pattern in programs is to:
- Initialize an accumulator variable to zero, the empty string, or the empty list.
- Update the variable with values from a collection.
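The two steps above can be sketched with integers. This example adds up the numbers 1 through 10:

```python
# Sum the first 10 positive integers
total = 0                          # 1) initialize the accumulator
for number in range(10):
    total = total + (number + 1)   # 2) update it with each value (range starts at 0)
print(total)                       # 55
```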


In [None]:
# make an accumulator using integers


### Exercises
#### Fill in the blank
Fill in the blanks in each of the programs below to produce the indicated result.

In [None]:
# Total length of the strings in the list: ["red", "green", "blue"] => 12

total = 0
for word in ["red", "green", "blue"]:
    ____ = ____ + len(word)
print(total)

In [None]:
# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
words = ["red", "green", "blue"]
result = ____
for ____ in ____:
    ____
print(result)

#### Create an acronym 
Starting from the list `["red", "green", "blue"]`, create the acronym "RGB" using a for loop.

Hint: You may need to use a string method to properly format the acronym

#### Identifying Item Errors 
1) Read the code below and try to identify what the errors are without running it.
2) Run the code, and read the error message. What type of error is it?
3) Fix the error.


In [None]:
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[4])

## Episode 13: Conditionals

<div>
<img src="https://imgs.xkcd.com/comics/90s_flowchart.png" alt="a flow chart" width="300"/>
</div>

`if` statements control whether or not a block of code is executed.
    
`if` statements have a structure similar to `for` loops:
- First line opens with `if` and ends with a colon
- Body containing one or more statements is indented (usually by 4 spaces)
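A minimal sketch of an `if` statement; the numbers and the threshold of 10 are arbitrary choices for illustration:

```python
number = 12
if number > 10:
    print(number, 'is large')   # runs because 12 > 10 is True

number = 3
if number > 10:
    print(number, 'is large')   # skipped because 3 > 10 is False
```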


In [None]:
#write some if statements to test if an integer is large


Conditionals are often used inside for loops. They allow us to handle items in a collection in different ways, or show different outputs depending on the item in question. 

In [None]:
# if statement nested in a for loop

Use `else` to execute a block of code when an `if` condition is *not* true.

`else` can be used following an `if`. This allows us to specify an alternative to execute when the `if` branch isn’t taken.


Use `elif` to specify additional tests.
- May want to provide several alternative choices, each with its own test.
- Use `elif` (short for “else if”) and a condition to specify these.
- Always associated with an `if`.
- Must come before the `else` (which is the “catch all”).

Python steps through the branches of the conditional in order, testing each in turn, so order matters.
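Here is an `if`/`elif`/`else` chain inside a loop, with made-up values and thresholds. Because the branches are tested in order, the `m > 9.0` test must come first; if the `m > 5.0` test came first, 9.22 and 42.0 would both be labeled medium:

```python
labels = []
for m in [3.54, 9.22, 42.0]:
    if m > 9.0:
        label = 'large'
    elif m > 5.0:
        label = 'medium'
    else:
        label = 'small'      # the "catch all"
    labels.append(label)
    print(m, 'is', label)
```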

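Conditionals are tested **once**, in order: Python runs the first branch whose condition is true and skips the rest. If a value is changed inside that branch, the remaining conditions are not re-tested, so the change does not affect which branch was taken.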
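A small sketch of this behavior (the variable name and values are illustrative): after the first branch sets `velocity` to 2.0, the `elif` condition would be true, but it is never checked:

```python
velocity = 10.0
if velocity > 5.0:
    velocity = 2.0          # changes the value inside the first branch...
    print('slowing down')
elif velocity < 3.0:
    print('too slow')       # ...but this test is never checked, so nothing prints here
print('final velocity:', velocity)
```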

In [None]:
# here's what I mean

### Exercises

#### Tracing Values
What does this program print?
```
pressure = 71.9
if pressure > 50.0:
    pressure = 25.0
elif pressure <= 50.0:
    pressure = 0.0
print(pressure)
```

#### Initializing
Modify this program so that it finds the largest and smallest values in the list no matter what the range of values originally is.

In [None]:
values = [...some test data...]
smallest, largest = None, None
for v in values:
    if ____:
        smallest, largest = v, v
    ____:
        smallest = min(____, v)
        largest = max(____, v)
print(smallest, largest)

What are some advantages and disadvantages to this approach?

#### Create the xkcd flowchart using a for loop and conditionals:

<div>
<img src="https://imgs.xkcd.com/comics/90s_flowchart.png" alt="a flow chart" width="200"/>
</div>

In [None]:
decades = [70, 80, 90, 0, 10]
hammertime = True

for decade in decades:
    if _____ == ______:
        print(_______)
    elif _____ == _____:
        print(_______)
    else:
        print(_______)

## Episode 14: Looping over datasets
How can I process many data sets with a single command?

Given that:
- A filename is a character string.
- Lists can contain character strings,

we can use a for loop to process files given a list of their names.

In [None]:
import pandas as pd
for filename in ['data/gapminder_gdp_africa.csv', 'data/gapminder_gdp_asia.csv']:
    data = pd.read_csv(filename, index_col='country')
    print(filename, data.min())

### Use glob.glob to find sets of files whose names match a pattern.
In Unix, the term “globbing” means “matching a set of files with a pattern”.
The most common patterns are:
- `*` meaning “match zero or more characters”
- `?` meaning “match exactly one character”
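To try these wildcards without creating any files, you can use the standard-library `fnmatch` module. Its `fnmatch.filter()` function applies the same shell-style patterns to a plain list of names. This is a swapped-in illustration, not `glob` itself, and the filenames are made up:

```python
import fnmatch

names = ['report.txt', 'notes.txt', 'data1.csv', 'data10.csv']

# '*' matches zero or more characters
print(fnmatch.filter(names, '*.txt'))       # ['report.txt', 'notes.txt']
# '?' matches exactly one character, so 'data10.csv' is left out
print(fnmatch.filter(names, 'data?.csv'))   # ['data1.csv']
# '*' can match several characters, so 'data*.csv' catches both
print(fnmatch.filter(names, 'data*.csv'))   # ['data1.csv', 'data10.csv']
```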

Python’s standard library contains the `glob` module to provide pattern matching functionality. 

The `glob` module contains a function, also called `glob`, to match file patterns.
E.g., `glob.glob('*.txt')` matches all files in the current directory whose names end with `.txt`.

The result of `glob.glob('pattern')` is a (possibly empty) list of character strings.


In [None]:
import glob
print('all csv files in data directory:', glob.glob('data/*.csv'))

In [None]:
print('all PDB files:', glob.glob('*.pdb'))

### Use `glob` and `for` to process batches of files.
This helps a lot if the files are named and stored systematically and consistently so that simple patterns will find the right data.

In [None]:
for filename in glob.glob('data/gapminder_*.csv'):
    data = pd.read_csv(filename)
    print(filename, data['gdpPercap_1952'].min())

### Exercises
#### Determining Matches


Which of these files is not matched by the expression `glob.glob('data/*as*.csv')`?
```
    data/gapminder_gdp_africa.csv
    data/gapminder_gdp_americas.csv
    data/gapminder_gdp_asia.csv
```


#### Minimum File size
Modify this program so that it prints the number of records in the file that has the fewest records.

*Note that `DataFrame.shape` is an attribute, not a method (so no parentheses): it is a tuple with the number of rows and columns of the data frame.*

In [None]:
import glob
import pandas as pd
fewest = ____
for filename in glob.glob('data/*.csv'):
    dataframe = pd.____(filename)
    fewest = min(____, dataframe.shape[0])
print('smallest file has', fewest, 'records')

#### Challenge: Comparing Data
Write a program that reads in the regional data sets and plots the average GDP per capita for each region over time in a single chart.

## Coffee Break!
<div>
<img src="attachment:3a22b318-d484-4a47-86e5-e267ad3ec893.gif" alt="a cat sips coffee" width="200"/>
</div>

*gif by Santiago Taberna*

## Episode 16: Writing Functions
We've worked with methods and functions that are built into Python or into libraries and modules, but you have the power to create your own!

By defining your own functions and using them, you can **break down your program** and make it easier for others (and for yourself) to understand. 

For example, let's say you want to write a program that makes breakfast. You might define a function for making the coffee that includes loading the machine with grounds and water. To make pancakes, you can define one function to create the batter, and another one that takes the batter and cooks it on the griddle.

Functions can also be **re-used** multiple times in one program. So if, for example, you want eggs for breakfast, you might call a `break_egg()` function multiple times to make the number of eggs you want. This reduces the overall number of lines of code you write, helping reduce complexity.


In [None]:
def print_greeting():
    print('Hello!')
    print('The weather is nice today.')
    print('Right?')

**Defining a function does not run it.** In order to execute that code, we need to **call** the function by writing its name followed by parentheses: `print_greeting()`.
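Putting definition and call together: the `def` block alone prints nothing, and the three lines appear only when the last line calls the function:

```python
def print_greeting():
    print('Hello!')
    print('The weather is nice today.')
    print('Right?')

print_greeting()   # calling the function runs its body
```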

### Arguments
**Arguments** in a function call are matched to its defined parameters.

Functions are most useful when they can operate on different data. We list **parameters** in the function definition; parameters:
- Become variables when the function is executed.
- Are assigned the arguments in the call (i.e., the values passed to the function).
        
If you don’t name the arguments when using them in the call, the arguments will be matched to parameters in the order the parameters are defined in the function.

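For example, a function to make pancake batter might take arguments for a starch, a sweetener, a filling, and a leavener (which defaults to baking soda):
```
def whisk(ingredients):
    print('whisking', ingredients)   # stand-in for a real kitchen step

def stir(ingredients):
    print('stirring', ingredients)   # stand-in for a real kitchen step

def make_pancakes(starch, sweet, filling, leavener='baking soda'):
    batter = []
    batter.extend([starch, sweet, leavener])
    whisk(batter)
    batter.append(filling)
    stir(batter)
    return batter

chocolate = make_pancakes('flour', 'honey', 'chocolate chips', 'baking soda')
blueberry = make_pancakes(filling='blueberries', starch='flour', sweet='sugar')
```
*Note: `whisk()` and `stir()` only print a message; they stand in for real cooking steps. Parameters with a default value, like `leavener`, must come after the parameters without defaults.*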

In [None]:
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

print_date(1871, 3, 19)

Or, we can name the arguments when we call the function. This lets us give them in any order and makes the call easier to read: without names, a reader might forget whether the second argument is the month or the day.

In [None]:
print_date(month=3, day=19, year=1871)

Functions may return a result to their caller using `return`.
- Use `return ...` to give a value back to the caller.
- You can `return` from anywhere in the function, but functions are easier to understand if `return` occurs:
    - At the start, to handle special cases.
    - At the very end, with a final result.


In [None]:
def average(values):
    if len(values) == 0:
        return None
    return sum(values) / len(values)

In [None]:
a = average([1, 3, 4])
print('average of actual values:', a)

In [None]:
print('average of empty list:', average([]))

**Every function returns something.** A function that doesn’t explicitly `return` a value automatically returns `None`.

In [None]:
result = print_date(1871, 3, 19)
print('result of call is:', result)

### Exercises
#### Definition and use
What does the following program print?
```
def report(pressure):
    print('pressure is', pressure)

print('calling', report, 22.5)
```

#### Find the First
Fill in the blanks to create a function that takes a list of numbers as an argument and returns the first negative value in the list. What does your function do if the list is empty? What if the list has no negative numbers?

In [None]:
def first_negative(values):
    for v in ____:
        if ____:
            return ____

#### Encapsulation
Fill in the blanks to create a function that takes a single filename as an argument, loads the data in the file named by the argument, and returns the minimum value in that data.

In [None]:
import pandas as pd

def min_in_data(____):
    data = ____
    return ____

#### Calling by name
Earlier we saw this function: 
```
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)
```
We saw that we can call the function using named arguments, like this:
```
print_date(day=1, month=2, year=2003)
```
1) What does `print_date(day=1, month=2, year=2003)` print?
2) When have you seen a function call like this before?
3) When and why is it useful to call functions this way?


#### ADVANCED: Encapsulation of an If/Print Block
The code below will run on a label-printer for chicken eggs. A digital scale will report a chicken egg mass (in grams) to the computer and then the computer will print a label.
```
import random
for i in range(10):

    # simulating the mass of a chicken egg
    # the (random) mass will be 70 +/- 20 grams
    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)

    print(mass)

    # egg sizing machinery prints a label
    if mass >= 85:
        print("jumbo")
    elif mass >= 70:
        print("large")
    elif mass < 70 and mass >= 55:
        print("medium")
    else:
        print("small")
```
The if-block that classifies the eggs might be useful in other situations, so to avoid repeating it, we could fold it into a function, `get_egg_label()`. Revising the program to use the function would give us this:
```
# revised version
import random
for i in range(10):

    # simulating the mass of a chicken egg
    # the (random) mass will be 70 +/- 20 grams
    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)

    print(mass, get_egg_label(mass))
```
Create a function definition for `get_egg_label()` that will work with the revised program above. Note that the `get_egg_label()` function’s return value will be important. Sample output from the above program would be `71.23 large`.

A dirty egg might have a mass of more than 90 grams, and a spoiled or broken egg will probably have a mass that’s less than 50 grams. Modify your `get_egg_label()` function to account for these error conditions. Sample output could be `25 too light, probably spoiled`.


#### ADVANCED: Encapsulating Data Analysis
Assume that the following code has been executed:
```
import pandas as pd

data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col=0)
japan = data_asia.loc['Japan']
```
1) Complete the statements below to obtain the average GDP for Japan across the years reported for the 1980s.

In [None]:
year = 1983
gdp_decade = 'gdpPercap_' + str(year // ____)
avg = (japan.loc[gdp_decade + ___] + japan.loc[gdp_decade + ___]) / 2

2) Abstract the code above into a single function.

In [None]:
def avg_gdp_in_decade(country, continent, year):
    data_countries = pd.read_csv('data/gapminder_gdp_'+___+'.csv',delimiter=',',index_col=0)
    ____
    ____
    ____
    return avg

3) How would you generalize this function if you did not know beforehand which specific years occurred as columns in the data? For instance, what if we also had data from years ending in 1 and 9 for each decade? (Hint: use the columns to filter out the ones that correspond to the decade, instead of enumerating them in the code.)

## Episode 17: Variable Scope
- How do function calls actually work?
- How can I determine where errors occurred?

The **scope** of a variable is the part of a program in which that variable is *visible*, that is, the part of the program that can ‘see’ it.

- There are only so many sensible names for variables.
- People using functions shouldn’t have to worry about what variable names the author of the function used.
- People writing functions shouldn’t have to worry about what variable names the function’s caller uses.


In [None]:
pressure = 103.9

def adjust(t):
    temperature = t * 1.43 / pressure
    return temperature

Here, `pressure` is a **`global`** variable:
- it is defined outside of any particular function
- it is visible everywhere

`t` and `temperature` are **local** variables in `adjust`.
- Defined in the function.
- Not visible in the main program.

*Remember: a function parameter is a variable that is automatically assigned a value when the function is called.*
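One consequence worth seeing: assigning to a name inside a function creates a *local* variable, even when a global variable has the same name. The names here are made up for illustration:

```python
count = 10          # global variable

def reset():
    count = 0       # assignment inside a function creates a new LOCAL variable
    return count

print('returned:', reset())    # returned: 0
print('global count:', count)  # global count: 10 (unchanged)
```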

In [None]:
print('adjusted:', adjust(0.9))
print('temperature after call:', temperature)

### Exercises
#### Local vs Global
Trace the values of all variables in this program as it is executed. (Use ‘—’ as the value of variables before and after they exist.)
```
limit = 100

def clip(value):
    return min(max(0.0, value), limit)

value = -22.5
print(clip(value))
```

## Episode 18: Programming Style
Programmers use certain conventions to make their code more easily read and understood by others.  Using a consistent coding style is a gift for future you as well as a gift for others who may want to re-purpose your code for their own ends.  

The Python community proposed a standard style in one of its first Python Enhancement Proposals (PEPs), [PEP8](https://peps.python.org/pep-0008/).

Some points worth highlighting:

- document your code and ensure that assumptions, internal algorithms, expected inputs, expected outputs, etc., are clear
- use clear, semantically meaningful variable names
- use spaces, not tabs, to indent lines (tabs can cause problems across different text editors, operating systems, and version control systems)

More information about programming style, including standardization tools, may be found on the [Software Carpentry page for this episode.](https://swcarpentry.github.io/python-novice-gapminder/18-style.html#follow-standard-python-style-in-your-code.) 

### Use assertions to check for internal errors.
Assertions are a simple but powerful method for making sure that the context in which your code is executing is as you expect.

In [None]:
def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    assert volume > 0
    return mass / volume

If the assertion is `False`, the Python interpreter raises an `AssertionError` runtime exception. The source code for the expression that failed will be displayed as part of the error message. To ignore assertions in your code, run the interpreter with the ‘-O’ (optimize) switch. Assertions should contain only simple checks and never change the state of the program. For example, an assertion should never contain an assignment.
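The sketch below reuses `calc_bulk_density` and adds an optional message after the comma in the `assert` statement. It also uses `try`/`except`, which this notebook hasn't covered, only to show the failing case without stopping the program. The mass and volume values are made up:

```python
def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    assert volume > 0, 'volume must be positive'   # optional message after the comma
    return mass / volume

print(calc_bulk_density(60.0, 20.0))   # 3.0

try:
    calc_bulk_density(60.0, -5.0)      # the assertion fails here
except AssertionError as error:
    print('AssertionError:', error)    # AssertionError: volume must be positive
```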

### Use docstrings to provide built-in help
If the first thing in a function is a character string that is not assigned directly to a variable, Python attaches it to the function, where the built-in `help` function can display it. A string that provides documentation in this way is known as a **docstring**.


In [None]:
def average(values):
    "Return average of values, or None if no values are supplied."

    if len(values) == 0:
        return None
    return sum(values) / len(values)

help(average)

### Exercises
#### Document this!
Use comments to describe and help others understand potentially unintuitive sections or individual lines of code. They are especially useful to whoever may need to understand and edit your code in the future, including yourself.

Use docstrings to document the acceptable inputs and expected outputs of a function, method, or class, along with its purpose, assumptions, and intended behavior. Docstrings are displayed when a user calls the built-in `help` function on your function, method, or class.

Turn the comment in the following function into a docstring and check that help displays it properly.

In [None]:
def middle(a, b, c):
    # Return the middle value of three.
    # Assumes the values can actually be compared.
    values = [a, b, c]
    values.sort()
    return values[1]

help(middle)

#### Clean up this code!
1) Read this short program and try to predict what it does.
2) Run it: how accurate was your prediction?
3) Refactor the program to make it more readable. Remember to run it after each change to ensure its behavior hasn’t changed.
4) Compare your rewrite with your neighbor’s. What did you do the same? What did you do differently, and why?


In [None]:
n = 10
s = 'et cetera'
print(s)
i = 0
while i < n:
    # print('at', j)
    new = ''
    for j in range(len(s)):
        left = j-1
        right = (j+1)%len(s)
        if s[left]==s[right]: new = new + '-'
        else: new = new + '*'
    s=''.join(new)
    print(s)
    i += 1