# Midterm review

Let's review what we have covered so far in the course.

## Python variables

In Python, we can create simple variables with numbers or text data. All Python variables can be displayed using the `print` function.

In [1]:
# Avoid naming a variable `float`, which would hide Python's built-in float type
integer = 4
decimal = 5.3
string = "Hello world."
print(integer)
print(decimal)
print(string)

4
5.3
Hello world.


Each type of variable supports its own set of operations. For numbers, we can do all the usual math operations.

In [2]:
a = 2
b = 3.5
c = 7
print(a * b)
print(a ** c)
print(a + b + c)

7.0
128
12.5


We can import the `math` module to get access to functions for more advanced math operations and to mathematical constants like $\pi$ and $e$.

In [3]:
import math
print(math.cos(2))
print(math.exp(1))
print(math.e ** 1)

-0.4161468365471424
2.718281828459045
2.718281828459045


There are also ways that we can work with text data. Some common operations are concatenating strings and putting various variables together to make a string using f-strings.

In [4]:
s1 = "Hello"
s2 = "world."
print(s1 + " " + s2)

Hello world.


In [5]:
user = "Mark S."
print(f"Greetings, {user}")

Greetings, Mark S.


There are many functions for working with strings that we haven't talked about much. We can see the functions that apply to a type of Python object using the `help` function.

In [6]:
# help(str)

For example, we can use `startswith` to check if a string starts with specific text, or `endswith` to check if it ends with some text.

In [7]:
s = "Hello world"
print(s.startswith("Hello"))
print(s.endswith("world"))

True
True
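
One detail worth checking is that these methods compare text exactly, including capitalization. The sketch below illustrates this; the variable names `exact_match` and `normalized_match` are just for illustration, and `lower` is another standard string method.

```python
s = "Hello world"

# startswith compares characters exactly, so a lowercase "h" does not match
exact_match = s.startswith("hello")

# converting to lowercase first makes the check ignore capitalization
normalized_match = s.lower().startswith("hello")

print(exact_match)
print(normalized_match)
```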


## Python lists

We can store sequences of any kind of variable in lists. Lists always have square braces (`[]`) around them. Items are separated by commas. Lists are most helpful if there is some order to the data; for example, if we want to store the responses that a participant made in a series of trials in a psychology study. The trials occurred in a specific order, so a list is a natural way to store information about them.

In [8]:
numbers = [1, 2, 3]
letters = ["a", "b", "c"]
mix = [1, 2.3, "Hello world."]

Once we've created a list, we can add items onto the end using `append`. Sometimes, we may want to start with an empty list and add items to the end as needed; this can be useful together with `for` loops.

In [9]:
growing = []
growing.append(1)
growing.append(2)
growing.append(3)
print(growing)

[1, 2, 3]


We can access data in the list based on its position in the list. To get one item, we can use indexing. Similar to creating a list, we access parts of the list using square braces (`[]`).

Remember that, in Python, indexing starts at zero. This means that the first item in the list can be accessed using index 0.

In [10]:
trials = [1, 2, 3, 4, 5]
print(trials[0])
print(trials[4])

1
5


We can also use negative indices to get items relative to the end of the list.

In [11]:
print(trials[-1])

5
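
As a small sketch of how negative indices line up with regular ones, index `-1` refers to the same item as index `len(trials) - 1`, and `-2` is the item just before it. The variable names here are only for illustration.

```python
trials = [1, 2, 3, 4, 5]

last = trials[-1]                    # last item
second_to_last = trials[-2]          # one step back from the end
also_last = trials[len(trials) - 1]  # same item as trials[-1]

print(last, second_to_last, also_last)
```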


We can also access part of a list to get a new list, using slicing. Slices are defined using `start:stop` syntax.

Slicing selects the items in the list that fall between the slice start and stop positions, so the item at the stop index is not included (see below).

```
list:  [  1,  2,  3,  4,  5  ]
index: [  0   1   2   3   4  ]
slice: [0   1   2   3   4   5]
```

If the start is not specified, then the slice will start at the beginning of the list. If the stop is not specified, then the slice will go until the end of the list.

In [12]:
print(trials[3:4])
print(trials[:2])
print(trials[3:])

[4]
[1, 2]
[4, 5]
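
To connect this to the diagram, here is a short sketch. A slice from 1 to 4 picks up the items between those two slice positions, so its length is `stop - start`. Splitting a list at the same position and joining the pieces gives back the original list.

```python
trials = [1, 2, 3, 4, 5]

# items between slice positions 1 and 4
middle = trials[1:4]
print(middle)
print(len(middle))  # stop - start = 3

# the two halves share a boundary, so no item is lost or repeated
rejoined = trials[:2] + trials[2:]
print(rejoined == trials)
```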


We can use `for` loops to process data in lists and to create new lists. Here, we initialize an empty list, then loop over a list of numbers and calculate the square of each number, adding that onto the list of `squares`.

In [13]:
numbers = [1, 2, 3, 4]
squares = []
for n in numbers:
    squares.append(n ** 2)
print(squares)

[1, 4, 9, 16]


List comprehensions are a way to do this in one line of code, looping over items in a list and creating a new list from the old items.

In [14]:
squares2 = [n ** 2 for n in numbers]
print(squares2)

[1, 4, 9, 16]
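
List comprehensions can also include an `if` test to keep only some of the items. This is a minimal sketch; the name `even_squares` is just for illustration.

```python
numbers = [1, 2, 3, 4]

# square only the even numbers; odd numbers are skipped
even_squares = [n ** 2 for n in numbers if n % 2 == 0]
print(even_squares)
```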


## Python dictionaries

Python dictionaries are useful for storing key, value pairs. For example, the data for each participant in a study might be stored under a key that gives that participant's identifier.

We can create a dictionary using curly braces (`{}`). Within each pair, the key and value are separated by a colon, and pairs are separated by commas. Keys can be most kinds of objects, such as strings or numbers, but they can't be mutable objects like lists, which can be changed after they are created.

In [15]:
d = {"001": 9, "002": 11, "003": 7}
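
As a sketch of the rule about keys, a tuple (which cannot be changed after it is created) works as a key, while a list does not. The `sessions` dictionary and its contents are made up for illustration.

```python
# each key is a tuple of (participant ID, session number)
sessions = {("001", 1): 9, ("001", 2): 11}
print(sessions[("001", 2)])

# sessions[["001", 3]] = 7  # raises a TypeError, because a list cannot be a key
```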

The values can be any Python object. For example, this means we can have dictionaries of lists.

In [16]:
d2 = {"condition1": [1, 2, 3], "condition2": [4, 5, 6]}

We can even have dictionaries of dictionaries.

In [17]:
data = {
    "001": {"age": 23, "score": 9},
    "002": {"age": 29, "score": 11},
    "003": {"age": 32, "score": 10},
}

After creating a dictionary, we can continue adding key, value pairs using assignment. On the left side, we indicate a key that we want to define, then an equal sign and the value we want to assign to it. Note that we use square braces (`[]`) to access a dictionary, similarly to indexing a list, but we put in a key instead of a list index.

In [18]:
d3 = {"a": 1, "b": 2, "c": 3}
d3["d"] = 4
print(d3)

{'a': 1, 'b': 2, 'c': 3, 'd': 4}


We can access data in the dictionary using square braces also. We can use this to programmatically access data that has previously been stored in the dictionary.

In [19]:
print(data["001"])

{'age': 23, 'score': 9}


For example, we can use this with a `for` loop to access data from a dictionary. Here, we'll use the `keys` method to get all keys in a dictionary. We'll then run some code for all the participants stored in our `data` dictionary.

In [20]:
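# Use participant_id rather than id, which is the name of a built-in function
for participant_id in data.keys():
    age = data[participant_id]["age"]
    score = data[participant_id]["score"]
    print(f"{participant_id}: {age} years; score: {score}")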

001: 23 years; score: 9
002: 29 years; score: 11
003: 32 years; score: 10
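
An alternative that we could use here, shown as a sketch, is the dictionary `items` method, which gives both the key and the value on each step of the loop. The names `participant`, `info`, and `lines` are just for illustration.

```python
data = {
    "001": {"age": 23, "score": 9},
    "002": {"age": 29, "score": 11},
    "003": {"age": 32, "score": 10},
}

lines = []
# items() yields (key, value) pairs, so we unpack them into two variables
for participant, info in data.items():
    lines.append(f"{participant}: {info['age']} years; score: {info['score']}")

for line in lines:
    print(line)
```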


## Python functions

In addition to the built-in Python functions, we can easily create our own functions to complete repetitive tasks.

We define functions using the `def` keyword. Before writing a function, it's a good idea to decide on what the inputs and outputs from the function should be. For example, say we want to make a function that squares numbers that are passed into that function. We should usually also include a one-line description at the start of the function as a docstring.

In [21]:
def square(x):
    """Calculate the square of a number."""
    return x ** 2

The list of inputs comes after the name of the function (`square`), within the parentheses (here, there is one input called `x`). There is one output (`x ** 2`), which is returned using the `return` statement.
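
To use the function, we call it with a value for `x` and can store the returned output in a variable. This is a small usage sketch; the variable names are just for illustration.

```python
def square(x):
    """Calculate the square of a number."""
    return x ** 2

# the returned value can be assigned to a variable
nine = square(3)
print(nine)

# the same function works for floats
decimal_result = square(2.5)
print(decimal_result)
```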

Function inputs can also have default values. This can be very useful for more complex functions that can be configured in various ways. For example, the `print` function has a `sep` keyword argument, which defaults to a single space, that we can modify.

In [22]:
a = 1
b = 2
print(a, b)
print(a, b, sep=" ; ")

1 2
1 ; 2


We can define keyword arguments by adding an equal sign and some default value after a function input. This lets us set some default behavior that can be modified when calling the function.

In [23]:
def power(x, pow=2):
    """Calculate some power of a number."""
    return x ** pow
print(power(2))
print(power(2, pow=8))

4
256
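
As a sketch of the different ways to call this function, we can leave out `pow` to get the default, pass it by position, or pass it by name. The result variable names are just for illustration.

```python
def power(x, pow=2):
    """Calculate some power of a number."""
    return x ** pow

default_result = power(3)            # uses the default, pow=2
positional_result = power(2, 8)      # second input is matched to pow by position
keyword_result = power(2, pow=8)     # same call, with pow given by name

print(default_result, positional_result, keyword_result)
```

Passing the input by name takes a little more typing, but it makes the call easier to read.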


## If/elif/else statements

We can use tests of conditions to determine what code to run. They are often useful in combination with functions, allowing a function to change what it runs depending on the inputs that are passed to it.

The simplest way to use an `if` statement is by checking one condition.

In [24]:
def greeting(user):
    if user == "world":
        print("Hello world.")
greeting("world")
greeting("someone else")  # doesn't print anything

Hello world.


We can also use more complex `if` statements to check for multiple conditions. For example, say we want to make a function that can calculate various statistics. We can use multiple `if`/`elif` statements to check for a series of different conditions. The conditions are checked in order, and only the block under the first condition that is True is run. If the `stat` is `"mean"`, then the `return np.mean(x)` line will run. If the `stat` is `"std"`, then the first two checks will be False, and then the third block will run `return np.std(x)`.

When we are checking for multiple conditions, it's a good idea to include an `else` statement that runs if none of the conditions applied. Here, we can't really run the function in that situation, so we'll raise an exception. We use a `ValueError`, which indicates that we encountered an unexpected value in the function inputs.

In [25]:
import numpy as np
y = np.array([3, 9, 3, 4, 1, 7, 8, 5, 4])
def calculate_stat(x, stat):
    if stat == "mean":
        return np.mean(x)
    elif stat == "median":
        return np.median(x)
    elif stat == "std":
        return np.std(x)
    else:
        raise ValueError(f"Unknown statistic: {stat}")
print(calculate_stat(y, "mean"))
print(calculate_stat(y, "std"))
# calculate_stat(y, "min")  # raises a ValueError because calculate_stat doesn't recognize "min"

4.888888888888889
2.469567863432541
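
To see what happens in the `else` branch, here is a hedged sketch. It makes two assumptions that are not part of the review above: it swaps NumPy for the standard-library `statistics` module so that it runs without extra packages, and it uses a `try`/`except` block to catch the exception and look at its message.

```python
import statistics

def calculate_stat(x, stat):
    """Calculate a named statistic (standard-library stand-in for the NumPy version)."""
    if stat == "mean":
        return statistics.mean(x)
    elif stat == "median":
        return statistics.median(x)
    elif stat == "std":
        # population standard deviation, matching the default behavior of np.std
        return statistics.pstdev(x)
    else:
        raise ValueError(f"Unknown statistic: {stat}")

y = [3, 9, 3, 4, 1, 7, 8, 5, 4]

# none of the conditions match "min", so the else branch raises a ValueError
try:
    calculate_stat(y, "min")
except ValueError as err:
    message = str(err)

print(message)
```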


## For loops

Sometimes, we may want to run some code multiple times, such as running the same operation on each item in a list. We can do this using `for` loops.

Say we have a list indicating the condition code for a set of trials in a psychology study, and we want to count up the number of trials for each condition. Here, each trial is either a "target" trial or a "lure" trial.

In [26]:
trial_type = ["target", "target", "lure", "target", "lure", "target", "target", "lure"]

We can use a `for` loop with a counter to get the number of target trials. The `+=` operator takes a variable (in this case, `count`) and adds a number to it. For example, `count += 1` adds 1 to the current value of `count`.

In [27]:
count = 0
for tt in trial_type:
    if tt == "target":
        count += 1
print(count)

5


This code is a lot easier to use if we put it in a function. Let's define a function that can work on any list of trial type codes. We'll take in two inputs: the list of trial type labels, and the type that we want to count, which defaults to `"target"`. The function will return the number of times that the `type_to_count` appears in the list of trials.

In [28]:
def count_trials(trial_types, type_to_count="target"):
    count = 0
    for tt in trial_types:
        if tt == type_to_count:
            count += 1
    return count

Now we have a function that can count any trial types we want.

In [29]:
trial_type = ["target", "target", "lure", "target", "lure", "target", "target", "lure"]
lure_count = count_trials(trial_type, type_to_count="lure")
print(lure_count)

3
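
One way to check a function like this, sketched below, is to compare it against the built-in list method `count`, which counts how many times a value appears in a list. We did not use `count` above, so treat it as an optional shortcut rather than a replacement for practicing loops.

```python
def count_trials(trial_types, type_to_count="target"):
    count = 0
    for tt in trial_types:
        if tt == type_to_count:
            count += 1
    return count

trial_type = ["target", "target", "lure", "target", "lure", "target", "target", "lure"]

# our loop-based function and the built-in method should agree
n_target = trial_type.count("target")
n_lure = trial_type.count("lure")
print(n_target == count_trials(trial_type))
print(n_lure == count_trials(trial_type, type_to_count="lure"))
```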


There are other, slightly different ways we can run `for` loops. If we don't have a specific list to iterate over, we might instead want to run a for loop over a range of numbers. We can do this using the `range` function. For example, say we want to calculate the sum of the numbers 1 through 10. For this, we can use `range(1, 11)`, which will generate the numbers 1 through 10.

In [30]:
total = 0
for i in range(1, 11):
    total += i
print(total)

55


Like we did for the trial counting loop, we can turn this into a reusable function to make our code more flexible.

In [31]:
def count_range(start, stop):
    total = 0
    for i in range(start, stop):
        total += i
    return total

Now we can use our code to calculate the total for any range of numbers. As with `range`, the `stop` value itself is not included in the total.

In [32]:
print(count_range(1, 101))  # sum of numbers 1–100

5050
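
As a sanity check, sketched below, we can compare the result against the built-in `sum` function and against the formula for the sum of the numbers 1 through $n$, which is $n(n + 1)/2$. The variable names are just for illustration.

```python
def count_range(start, stop):
    total = 0
    for i in range(start, stop):
        total += i
    return total

n = 100
loop_total = count_range(1, n + 1)      # stop is n + 1 so that n is included
builtin_total = sum(range(1, n + 1))    # built-in sum over the same range
formula_total = n * (n + 1) // 2        # closed-form formula

print(loop_total, builtin_total, formula_total)
```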


Sometimes, we might want to get the index of each item in a list. We can use `range(len(mylist))` to generate these indices.

In [33]:
trial_type = ["target", "target", "lure", "target", "lure", "target", "target", "lure"]
for i in range(len(trial_type)):
    print(trial_type[i])

target
target
lure
target
lure
target
target
lure


Here, we generated all the indices of the list, and then just used those indices to access each item in the list and print them out.

Sometimes, it's useful to get both each item in a list and the corresponding index. We can get that using `enumerate`.

In [34]:
trial_type = ["target", "target", "lure", "target", "lure", "target", "target", "lure"]
for i, tt in enumerate(trial_type):
    print(i, tt)

0 target
1 target
2 lure
3 target
4 lure
5 target
6 target
7 lure


Notice that we now have two variables (`i, tt`) on the left side of the `in` keyword. This unpacks the two things that `enumerate` produces on each step of the loop, and assigns them to two variables (`i` gets the index, and `tt` gets the current trial type).
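
For example, having the index available lets us record where certain trials occurred. This sketch finds the positions of the lure trials; the name `lure_positions` is just for illustration.

```python
trial_type = ["target", "target", "lure", "target", "lure", "target", "target", "lure"]

lure_positions = []
for i, tt in enumerate(trial_type):
    # keep the index whenever the trial is a lure
    if tt == "lure":
        lure_positions.append(i)

print(lure_positions)
```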

Finally, we can loop over two lists at the same time using the `zip` function. For example, say we have trial type and response on each of a set of trials.

In [35]:
trial_type = ["target", "lure", "lure", "target", "lure", "target", "target", "lure"]
response = ["old", "old", "new", "old", "new", "new", "old", "new"]

This code will count the number of responses that are hits and false alarms.

In [36]:
hits = 0
false_alarms = 0
for tt, r in zip(trial_type, response):
    if tt == "target" and r == "old":
        hits += 1
    elif tt == "lure" and r == "old":
        false_alarms += 1
print(hits, false_alarms)

3 1


For problems like this, it's convenient to loop over both lists at the same time. Like in the `enumerate` example, note that we need to unpack the two outputs of `zip` on each step of the loop. The `tt` variable gets the current trial type, while the `r` variable gets the current response.
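
To see what is being unpacked, we can convert the output of `zip` to a list, as in this small sketch using the first three trials. Each entry is a pair with one item from each list.

```python
trial_type = ["target", "lure", "lure"]
response = ["old", "old", "new"]

# each pair holds the trial type and response from the same position
pairs = list(zip(trial_type, response))
print(pairs)

# unpacking one pair by hand, the same way the for loop does on each step
tt, r = pairs[0]
print(tt, r)
```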

## While loops

We can use `for` loops when we know in advance how many times we want to run some set of commands. If we instead want to run something until some condition is met, we can use a `while` loop instead. The `while` type of loop is less common in data science, but it is sometimes useful.

To illustrate the idea with a real-world example, say that we are at a donut shop. We can use a `for` loop to pick out a dozen donuts. For the numbers 1 through 12, we will pick a donut and ask the person behind the counter to place it in our box.

After we get the donuts, we could use a `while` loop to eat the donuts until we are full. We might not know in advance how many donuts we want to eat; instead, we might decide to eat a donut, check if we are full, eat another donut if we are not full, etc., until we are full.

For example, say we want to count successive numbers until we get a sum over 10.

In [37]:
total = 0
current = 0
while total <= 10:
    current += 1
    total += current
print(current)

5


The current number starts at zero. At the start of each step of the loop, we check if the total is less than or equal to 10. If it is, we increment the current number and add it to the total; if not, we exit the loop.

This program will add successive numbers until the total gets over 10. This takes 5 steps, because $1 + 2 + 3 + 4 + 5 = 15$, which is greater than 10.
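
To trace how the loop gets there, this sketch records the current number and the running total on each step. The `steps` list is added only for illustration.

```python
total = 0
current = 0
steps = []
while total <= 10:
    current += 1
    total += current
    # record (current number, running total) after each step
    steps.append((current, total))

for step in steps:
    print(step)
```

After the fourth step the total is exactly 10, which still passes the `total <= 10` check, so the loop runs one more time before stopping.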

## NumPy arrays

While we can organize data in lists in Python, for a lot of scientific applications it's easier to work with NumPy arrays instead. Arrays are designed for working with sets of data points, such as response accuracy or response times on individual trials.

## Descriptive statistics