# Introduction

Twenty years ago, you could be a psychologist and never write a single line of code. Today, that's no longer true. All research is, in part, computational now. The cogs program is nice because it recognizes this fact very explicitly. Most students at this point have either already taken an intro coding class or are taking one right now. One difficulty with the cogs program is the disparity in coding skills between students. If you are a computer science student, you might find this lesson pretty boring. Unfortunately, this lecture isn't really meant for you. Things will get a lot more interesting in the next couple of weeks. For other students, I hope this one is at just the pace you need.

We'll be learning Python. Python is great because it is one of the most widely used languages in data science, and it's easier to learn than many other languages. We'll use Python and Google Colab throughout the entire semester.

If it has been a while since you've done any coding, there are a few things to remember:

- It won't make sense all at once. Coding is really demanding on your working memory. There are a ton of rules to keep track of, functions whose behavior you need to remember, and variable names that hide information. Taking two or more passes at the same material can do wonders if you feel your brain getting tired during class today.

- Write the code down as we go. Merely retyping the code someone else wrote can do wonders for your understanding. When you make a typo and your cell doesn't run, you immediately learn why the code needs to be structured the way it is. I know you can download all the code from Canvas after class, but try to keep yourself active during the lecture.

- Error messages are your friends, not your enemies. Sometimes people think that receiving an error message means they did something bad or they are bad at programming. That is simply not true. Errors are just as important to learning as the satisfaction of doing it right the first time.

- Tinker. If at any point, you find yourself wondering 'what would happen if I did *this*' please try that out. You can detach from the group lecture for a bit and try things out. You have plenty of tools to learn whatever you missed if you tune out part of the lecture.

# Reviewing Downey

The Downey reading introduced two concepts we'll need a lot: variables and libraries. Let's make sure we all extracted the important bits from the reading. He has an equation for compounding interest.

$$V = P(1 + \frac{r}{n})^{nt}$$

Here $P$ is the starting amount in your bank account, $r$ is the yearly interest rate, $t$ is the number of years the money stays in the bank, and $n$ is the compounding frequency (how many times per year interest is paid). He also gives us these four variables.

In [1]:
P = 2100
r = 0.034
n = 4
t = 7

Finally, he translates the formula into a Python expression:

In [2]:
P * (1 + r/n)**(n*t)

2661.6108980682593
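
To see where that number comes from, the expression can be broken into steps. This is only an illustrative sketch: the names `rate_per_period`, `periods`, and `growth` are labels chosen for this example and do not come from Downey's text.

```python
# Starting values from the reading
P = 2100   # starting amount
r = 0.034  # yearly interest rate
n = 4      # compounding frequency (times per year)
t = 7      # number of years

# Illustrative intermediate names (not from the reading)
rate_per_period = r / n                    # interest applied at each compounding step
periods = n * t                            # total number of compounding steps
growth = (1 + rate_per_period) ** periods  # how much each dollar grows overall

V = P * growth
print(periods)
print(V)
```

Each piece of the formula gets its own line, so if the final number looks wrong you can print the pieces one at a time.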

I have two questions for you: 

- First, what if you get paid your interest only twice a year? In other words, modify this example so n = 2 and recalculate.

In [None]:
## Some space for you to work

In [None]:
## and a second space if you want it

- The second question has a bit of a preface. There is another interest formula for continuously compounding interest.

$$ V = Pe^{rt} $$

Here $e$ is Euler's number, a special irrational number like $\pi$. Basic Python doesn't know about special irrational numbers, so we need libraries (also called packages) to access them. Here is how we do that.

In [3]:
import numpy as np

np.e

2.718281828459045

Implement the continuously compounding interest rate formula and evaluate it.

In [None]:
## implement it here

## Handling error messages

***

Wait. Attempt the problem above before looking at this.

***

The first time I wrote my solution, I wrote `p * np.e ** (r * t)`. If you try to run this, you get an error message that says `NameError: name 'p' is not defined`.

In [6]:
p * np.e ** (r * t)

NameError: name 'p' is not defined

This means that, at no prior place in our code, did we define what `p` was. You must define your variables before Python will allow operations on them. In our case, it's a capitalization problem. We used uppercase `P` as our variable above, instead of lowercase `p`.

The lesson here: when you see a NameError message, first check if you defined that variable above. If you think you did, pay super close attention to spelling and capitalization.
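
A small sketch of why capitalization matters: Python treats `P` and `p` as two unrelated names. The value given to lowercase `p` here is invented purely for illustration.

```python
P = 2100   # the variable from the reading
p = 5      # a different variable (value invented for this illustration)

print(P)       # 2100
print(p)       # 5
print(P == p)  # False
```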

# Lists

Now it is time for new material. Here is what a list looks like.

In [None]:
a = [5,6,7,8]
a

What's special about lists is that Python keeps track of the order in which items go into the list, and we can use that ordering to pull entries out. The position of each item is called an index. The index of the first item is 0. Annoying, I know, but that's how it goes. Respectable computer scientists count up from zero. To pull out an entry, we put square brackets containing an index number right after the list.

In [None]:
a[0]

In [None]:
a[3]

In [None]:
a[2]

Python turns the indexed entries into regular numbers before it runs arithmetic on them. So you can write commands where it all happens together.

In [None]:
# 8 - 5 = 3

a[3] - a[0]

Be careful using arithmetic symbols near lists. They don't always do what you expect. The plus sign combines two lists.

In [None]:
a = [6,5,4]
b = [5,6,7]

a + b

In [None]:
[1,2] + [3]

Multiplying a list by a number, meanwhile, repeats it that many times.

In [None]:
a = [5,5]

a * 4

In [None]:
a = [6,7,8,9]

a * 3

## Two primitives

Two primitive commands are very helpful with lists. `len()` will find out how long the list is. Here are a few examples:

In [None]:
len([1,2,3,4,5])

In [None]:
len([])

In [None]:
len([3,3,3,3])

Meanwhile, `sum()` will add up all the entries in the list.

In [None]:
sum([1,2,3,4,5])

In [None]:
sum([])

In [None]:
sum([3,3,3,3])

Notice that `len()` and `sum()` both have parentheses. They are both *functions*. You tell a function what to work on by wrapping it in parentheses.

Together, len() and sum() can find the average value of a list. See if you can figure out how. Find the average of this list.

`[20,40,60,40]`

### Hide answer

Don't peek until you've tried it.

*** 

I formatted this cell so it is hidden. If you peek at the answer before trying it yourself, you're doing a bad thing. 

In [None]:
sum([20,40,60,40]) / len([20,40,60,40])

## Handling errors with lists

Suppose we wrote this:

In [None]:
len(1,2,3,4,5)

TypeError: len() takes exactly one argument (5 given)

"Argument" is a technical term in Python. An argument is a distinct object that you give to a function. Arguments are separated by commas. So this function is given 5 distinct arguments, all separated by commas. `len()` isn't prepared for 5 arguments. It wants only one thing: one list. Once we wrap our numbers in square brackets, we have a single object.

One helpful lesson here: square brackets for lists, parentheses for functions. 
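
As a quick check of that rule, here is the same call with the numbers wrapped in square brackets, so that `len()` receives a single argument. The variable names are illustrative.

```python
numbers = [1, 2, 3, 4, 5]   # square brackets make one list out of five numbers

length = len(numbers)       # parentheses hand that single list to the function
total = sum(numbers)

print(length)  # 5
print(total)   # 15
```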

# Defining your own functions

Reading Python code can be mentally draining. It's a strange, foreign language where you have to slowly parse through its sentences and symbols. The mentally draining part doesn't go away even when you become experienced, either. I still find myself constantly overwhelmed by long opaque blocks of code.

One strategy for managing the mental strain is to chunk. If you have five operations that always go together to accomplish a task, group them together and give them a simple name. This way, you can just remember the simple name and what it does, rather than remembering all five operations every time you need to do that task. We can chunk operations by defining our own functions.

The averaging procedure had two things going on, `len()` and `sum()`. But we want to chunk our averaging operation so there is only one thing to remember.

In [None]:
def average(a):
    
    output = sum(a) / len(a)
    
    return output

Notice a few things. First, the indentation. Indenting indicates that these lines define the function. When you are done writing the function, simply stop indenting. 

In [None]:
def average(a):
    
    # the function's definition
    
    output = sum(a) / len(a)
    
    return output

# everything else

average([20,40,60,40])

Second, notice that we do not use a specific list in the function definition. Instead, we use a variable `a`. So any list we hand over to `average()` will be temporarily renamed `a`.

Third, notice the `return` line. This indicates what the output of the function will be. `return` usually goes last in a function. 
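
One way to see what `return` does is to leave it out. The sketch below defines a hypothetical function, `average_no_return`, that does the same arithmetic but never returns the result. In that case Python hands back the special value `None`.

```python
def average(a):
    output = sum(a) / len(a)
    return output

# Hypothetical variant for illustration: same arithmetic, but no return line
def average_no_return(a):
    output = sum(a) / len(a)

with_return = average([20, 40, 60, 40])
without_return = average_no_return([20, 40, 60, 40])

print(with_return)     # 40.0
print(without_return)  # None
```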

To practice these skills, write a function called `bank` that applies one of the interest rate formulas from Downey. It should consume either three or four arguments and return the amount of money in your bank account after a period of interest. You can use either formula (continuously compounding or the regular compounding interest rate).

In [None]:
## write it here

## Sample answers

In [27]:
def bank(P,r,n,t):
    
    output = P * (1 + r/n)**(n*t)
    
    return output

bank(P,r,n,t)

2661.6108980682593

In [29]:
import numpy as np

def bank(P,r,t):
    
    output = P * np.e ** (r*t)
    
    return output

bank(P,r,t)

2664.2893049323125

# For loops

Most of the problems you'll solve will involve going through each item in a list and using it in some way. Python has a couple of techniques to do this. To keep things simple, we'll only use `for` loops.



In [None]:
# here's what for statements look like. 
# Notice the colon.
# Notice the indentation.
# `for` comes first, then the name of a local variable, then 'in', then the name of our list

a = [1,2,3,4]

for item in a:
    print(item)

In [None]:
# the word 'item' is arbitrary. It's a local variable that we'll refer 
# to in the process of running the loop. So we can use silly names
# and get the same effect.

for big_bird in a:
    print(big_bird)

In [None]:
# we also don't have to use a variable name for the list. 
# We can just stick the list there.
# I've also replaced big_bird with i, the conventional
# name for the item in for loops.

for i in [1,2,3,4]:
    print(i)

In [None]:
# usually we want to perform an operation on each item in a list.

for i in [3,6,9]:
    output = i / 3
    print(output)
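
A loop can also carry a value from one pass to the next. As a rough sketch, the loop below keeps a running total, which ends up matching what `sum()` reports. The names `numbers` and `total` are illustrative.

```python
numbers = [3, 6, 9]
total = 0  # start the running total at zero

for i in numbers:
    total = total + i  # add the current item to the running total

print(total)         # 18
print(sum(numbers))  # 18
```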

## Simulating functions

Suppose we have a list of x values and we want to find the corresponding y values, given a function:

$$ y = 5 + 2x $$

If these are the x values 

`[1,3,3,5,7,10]`

then print out the y values.

In [None]:
# a cell for your answer.



### Hide answer

In [None]:
x = [1,3,3,5,7,10]

for i in x:
    y = 5 + 2*i
    print(y)

## Building new lists out of old lists

When working with lists, we usually want to transform the same list many times. The above approach can't help us here because it just prints out the answers. We need to save our answers to a new list. We'll learn two techniques to accomplish this. Each technique is more convenient in different situations.

### Appending

The usual technique has us create a blank list first. As we transform the values in the first list, we'll move them into the blank list. `.append()` will get the job done.

In [None]:
xs = [1,3,3,5,7,10]
ys = []

for i in xs:
    y = 5 + 2*i
    ys.append(y)
    
ys

Notice the odd syntax on `ys.append()`. The dot notation indicates we are using a special part of Python called a method. Methods and functions are similar but have slightly different syntax. Functions spit out values. We need to store those values somewhere, so we usually use `=` to save them. Methods are attached to objects like lists. A method like `.append()` saves its change directly to the underlying list object, so there is no need for `=`.

If the distinction isn't super clear, do not worry about it. You'll develop an intuition for methods and functions as you work.
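
If a concrete comparison helps, this sketch puts a function and a method side by side. `len()` hands back a value that we store with `=`, while `.append()` changes the list itself and hands back `None`. The variable names are illustrative.

```python
numbers = [1, 2, 3]

length = len(numbers)        # a function: returns a value, which we save with =
result = numbers.append(4)   # a method: changes numbers in place

print(length)   # 3
print(numbers)  # [1, 2, 3, 4]
print(result)   # None
```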

### List comprehension

List comprehension is a really compact way of getting the same job done.

In [None]:
xs = [1,3,3,6,7,9,10]
ys = [5 + 2*i for i in xs]

ys

The trick here is that we wrap a 'for each' statement inside square brackets. This means Python automatically builds a new list as it works through the entries in the old list. Or, more generally:

`NEWLIST = [some_function(i) for i in OLDLIST]`
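
Here is a minimal sketch of that general pattern with a function of our own. The function `halve` and the list names are made up for this example.

```python
# A made-up function for illustration
def halve(i):
    return i / 2

old_list = [2, 4, 6]
new_list = [halve(i) for i in old_list]  # apply halve to every entry

print(new_list)  # [1.0, 2.0, 3.0]
```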

## Defining functions

Loop operations are a great place to exercise chunking. They can grow quite complicated and it would be nice if we could organize them into tidy little packages. So we could call the above operation `linear`.

In [None]:
def linear(xs):
    
    ys = [5 + 2*i for i in xs]
    
    return ys

Try writing another function called `linear2`. This one should implement the same linear equation $ y = 5 + 2x $, except use the append method instead. Verify that they lead you to the same answer.

# Plotting

What we want most in life is beauty. A package called `matplotlib` provides that beauty.

In [None]:
import matplotlib.pyplot

Now we can plot some points.

In [None]:
matplotlib.pyplot.plot([0,1,2,3,4,6,8,10])

Notice the weird dot syntax again. Here the dots walk down a chain of names: `matplotlib` is the package, `pyplot` is a module inside it, and `plot()` is a function inside that module.

If we stick a list of numbers into `plot()`, it will treat the numbers as the y-axis values and their indexes as the x-axis values. Alternatively, if we feed `plot()` two lists, separated by a comma, we can control the x and y axes respectively.

In [None]:
xs = [2,2.5,6,9,20]
ys = [1,2,3,4,5]

matplotlib.pyplot.plot(xs,ys)

Let's combine a couple of techniques at this point to check our understanding. Let's use one of the techniques for building new lists to transform this list of numbers:

`[4,5,4,6,4,7]`

with this function:

$$ y = -1 + \frac{1}{2}x $$

and then plot the results.

In [None]:
# give it a try



### Hide answer

In [None]:
xs = [4,5,4,6,4,7]
ys = [-1 + 0.5*i for i in xs]

matplotlib.pyplot.plot(xs,ys)

## Abbreviating packages

Typing out `matplotlib.pyplot.plot()` all the time is a pain. You can shorten the process by using a special import command.

In [None]:
import matplotlib.pyplot as plt

plt.plot(xs,ys)

Every time you need the phrase `matplotlib.pyplot`, you merely type `plt`.

In [None]:
plt.plot(xs,ys)

# Flipping coins

We will finish today's notes by building a mechanism that simulates flipping coins. We'll use this simulation to transition into talking about binomial distributions next class. Let's suppose I wanted to flip 10 coins and see how many are heads. We'll build up to this mechanism slowly.

## Numpy

In working with probabilities, we often need a way to generate random numbers. We can use `numpy`, the package we imported earlier as `np`, to accomplish that. It has a sub-module called `random` which gives us a collection of functions for making random numbers. The simplest one is also called `random()`. So, confusingly, we write our code like this:

In [None]:
np.random.random()

Try repeatedly running the above cell. Notice that all the numbers fall between 0 and 1. We have a trick for simulating a random event with a specific probability. We'll set a threshold value that will classify these random numbers as successes or failures. Then we'll compare whether the randomly generated value is less than the threshold value.

If we set it to 0.5, half the time we'll get success and the other half we'll get failure. So this is just like flipping a coin where the probability of heads is $\frac{1}{2}$.

In [None]:
np.random.random() < 0.5

But if we want the probability to be bigger, we can just set a bigger threshold. Now 80% of the time we get success and the other 20% of the time we get failure.

In [None]:
np.random.random() < 0.8

## Ranges

We often want to repeat a process multiple times. We can use a `for` loop to accomplish this in conjunction with a new kind of object called a range.

In [None]:
range(10)

A range is like a list of increasing integers. It's easier to see if you convert it into a list.

In [None]:
list(range(10))

If you want to repeat a process 10 times, you can use a `for` statement that loops over a range object. Let's print the number 5 ten times.

In [None]:
for i in range(10):
    print(5)
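
The loop above never uses `i`, but `i` still takes on each value in the range. A short sketch with a smaller range shows the values it steps through. The name `values` is illustrative.

```python
for i in range(3):
    print(i)  # prints 0, then 1, then 2

# collecting the same values in a list
values = [i for i in range(3)]
print(values)  # [0, 1, 2]
```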

Now we are ready to flip a coin 10 times.

In [None]:
for i in range(10):
    print(np.random.random())

The above code does not classify flips into heads or tails. Rewrite the code so it classifies them. In this case, we'll call heads True and tails False. Assume the coin has a 50% chance of landing on heads.

In [None]:
# try it

### Hide answer

In [None]:
for i in range(10):
    print(np.random.random() < 0.5)
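
To get back to the original goal, counting how many of the 10 flips came up heads, one possible sketch stores each flip in a list and then adds them up. It relies on the fact that Python counts `True` as 1 and `False` as 0 when summing. The names `flips` and `heads` are illustrative, and the output will differ from run to run.

```python
import numpy as np

flips = []  # a blank list to hold the results

for i in range(10):
    flip = np.random.random() < 0.5  # True means heads, False means tails
    flips.append(flip)

heads = sum(flips)  # each True adds 1 to the count

print(flips)
print(heads)
```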

# Wrap up

I hope you found today's material manageable and well-paced. Between the Downey readings and these notes, you have all of the basic skills you need to work on stats problems in Python. So you've accomplished a lot in a couple of sittings. Coding takes practice and the cognitive load can quickly build up. So if you feel uncertain about this material, it would really benefit you to go through some or all of this material again. Reading extra chapters out of the Downey textbook is another good strategy. 

Next class, we'll look at our main case study for the first unit and start testing hypotheses.