## Variables and basic types

It's common to introduce programming by first talking about variables, and we won't
break with this convention. A variable, as you probably already know, is a store
of data that can take on different values (hence the name *variable*), and is
typically associated with a fixed name.

### Declaring variables

In Python, we declare a variable by writing its name and then assigning it a
value with the equal (`=`) sign:

In [1]:
my_favorite_variable = 3

Notice that when we initialize a variable, we don't declare its *type* anywhere.
If you're familiar with *statically typed* languages like C++ or Java, you're
probably used to having to specify what type of data a variable holds when you
create it. For example, you might write `int my_favorite_variable = 3;` to indicate
that the variable is an integer. In Python, we don't need to do this. Python is
*dynamically typed*, meaning that the type of each variable is determined
on the fly, while our program is running. It also means we can change the
type of a variable on the fly, without anything bad happening; for example, by
overwriting the integer value that was stored in this variable with a character
string:

In [2]:
my_favorite_variable = "zzzzzzz"
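To see dynamic typing at work, we can ask Python for the type of the value a name refers to, before and after re-assigning it. A minimal sketch using the built-in `type()` function (which comes up again later in this chapter):

```python
my_favorite_variable = 3
type_before = type(my_favorite_variable)   # <class 'int'>

# re-binding the same name to a string is perfectly legal
my_favorite_variable = "zzzzzzz"
type_after = type(my_favorite_variable)    # <class 'str'>

print(type_before, type_after)
```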

### Printing variables

We can examine the contents of a variable at any time using the built-in
`print()` function:

In [3]:
print(my_favorite_variable)

zzzzzzz


If we're working in an interactive environment like a Jupyter notebook, we may
not even need to call `print()`, as we'll automatically get the output of the
last line evaluated by the Python interpreter:

In [4]:
# this line won't be printed, because it isn't the last line in the notebook cell to be evaluated
"this line won't be printed"

# but this one will
my_favorite_variable

'zzzzzzz'

As you can see in this example, the hash sign (`#`) is used for comments. Any
text that follows a `#` on a line is ignored by the Python interpreter and
not evaluated.

### Built-in types

All general-purpose programming languages provide the programmer with different *types* of variables—things like strings, integers, Booleans, and so on. These are the most basic building blocks a program is made up of. Python is no different and provides us with several [built-in types](https://docs.python.org/3/library/stdtypes.html). Let's take a quick look at some of these.

#### Integers

An integer is a numerical data type that can only take on finite whole numbers
as its value. For example:

In [5]:
number_of_subjects = 20
number_of_timepoints = 1000
number_of_scans = 10

Any time we see a number written somewhere in Python code, and it's composed
only of digits (no decimals, quotes, etc.), we know we're dealing with an
integer.

In Python, integers support all of the standard arithmetic operators you're
familiar with: addition, subtraction, multiplication, and so on. For example, we
can multiply two of the variables we just defined:

In [6]:
number_of_subjects * number_of_timepoints

20000

Or divide one integer by another:

In [7]:
number_of_timepoints / number_of_scans

100.0

Notice that the result of the above division is *not* itself an integer! The
decimal point in the result gives away that the result is of a different type -- a
*float*.

#### Floats

A float (short for *floating point number*) is a numerical data type used to represent
real numbers. As we just saw, floats are identified in Python by the presence of
a decimal point.

In [8]:
roughly_pi = 3.14
mean_participant_age = 24.201843727

All of the standard arithmetic operators work on floats just like they do on
ints:

In [9]:
print(roughly_pi * 2)

6.28


We can also freely combine ints and floats in most operations:

In [10]:
print(0.001 * 10000 + 1)

11.0


Observe that the output is of type `float`, even though the value is a whole
number, and hence could in principle have been stored as an `int` without any
loss of information. This is a general rule in Python: arithmetic operations
involving a mix of `int` and `float` operands will almost always return a
`float`. Some operations will return a `float` even if all operands are `int`s,
as we saw above in the case of division.

#### Exercise

The Python built-in `type()` function reports to you the type of a variable that is passed to it. Use the `type` function to verify that `number_of_subjects * number_of_timepoints` is a Python integer, while `number_of_timepoints / number_of_scans` is not. Why do you think that Python changes the result of a division into a variable of type float?

#### Strings

A string is a sequence of characters. In Python, we define strings by enclosing
zero or more characters inside a pair of quotes (either single or double quotes
work equally well, so you can use whichever you prefer; just make sure the
opening and closing quotes match!).

In [11]:
country = "Madagascar"
ex_planet = 'Pluto'

Python has very rich built-in functionality for working with strings. Let's look
at some of the things we can do.

We can calculate the length of a string:

In [12]:
len(country)

10

Or convert it to uppercase (try also `.lower()` and `.capitalize()`):

In [13]:
country.upper()

'MADAGASCAR'

We can count the number of occurrences of a substring (in this case, a single letter `a`):

In [14]:
country.count("a")

4

Or replace a matching substring with another substring:

In [15]:
country.replace("car", "truck")

'Madagastruck'

One thing that you might notice in the above examples is that they seem to use
two different syntaxes. In the first example, it looks like `len()` is a
*function* that takes a string as its parameter (or *argument*). By contrast,
the last 3 examples use a different "dot" notation, where the function comes
after the string (as in `country.upper()`). If you find this puzzling, don't
worry! We'll talk about the distinction in much more detail below.
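Note that none of the string methods shown above change `country` itself: Python strings are immutable, so `.upper()`, `.replace()`, and the rest each return a *new* string. A short sketch, including the `.lower()` and `.capitalize()` methods suggested earlier:

```python
country = "Madagascar"

shouted = country.upper()                   # 'MADAGASCAR'
quiet = country.lower()                     # 'madagascar'
capitalized = "mADAGASCAR".capitalize()     # 'Madagascar': first letter up, rest down
vehicle = country.replace("car", "truck")   # 'Madagastruck'

# the original string is untouched
print(country)   # Madagascar
```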

#### Exercise

Write code to count how many times the combination "li" appears in the string "supercalifragilisticexpialidocious". Assign this value into a new variable named `number_of_li` and print its value.

#### Booleans

Booleans operate pretty much the same in Python as in other languages; the main
thing to recognize is that they can only take on the values `True` or `False`.
Not `true` or `false`, not `"true"` or `"false"`. The only values a boolean can
take on in Python are `True` and `False`, written exactly that way. For example:

In [16]:
enjoying_book = True

One of the ways that boolean values are typically generated in Python programs
is through logical or comparison operations. For example, we can ask whether the
length of a given string is greater than a particular integer:

In [17]:
is_longer_than_2 = len("apple") > 2
print(is_longer_than_2)

True


Or whether the product of the first two numbers below equals the third...

In [18]:
is_the_product = 719 * 1.0002 == 2000
print(is_the_product)

False


Or, we might want to know whether the conjunction of several sub-expressions is `True` or `False`:

In [19]:
("car" in country) and (len("apple") > 2) and (15 / 2 > 7)

True

This last example, simple as it is, illustrates a nice feature of Python: its
syntax often reads more like plain English than that of many other programming
languages. In the above example, we ask if the substring `"car"` is contained in
the string `country` using the English language word `in`. Similarly, Python's
logical conjunction operator is the English word `and`. This means that we can
often quickly figure out -- or at least, vaguely intuit -- what a piece of Python
code does.
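A few more checks in the same spirit. Membership tests with `in` are case-sensitive, and every comparison or membership test produces a value of type `bool`:

```python
country = "Madagascar"

has_gas = "gas" in country                 # True: "Mada-gas-car"
has_capital_car = "Car" in country         # False: matching is case-sensitive
both = has_gas and len(country) == 10      # True: both sub-expressions are True

print(has_gas, has_capital_car, both)
print(type(has_gas))   # <class 'bool'>
```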

#### Exercise

Some integer values are equivalent to the Python Boolean values. Use the
equality (`==`) operator to find integers that are equivalent to `True` and that
are equivalent to `False`.


#### None

In addition to these usual suspects, Python also has a special value called `None`
(the only value of the type `NoneType`). `None` indicates the absence of a value:
a variable set to `None` holds nothing in particular. It's roughly equivalent to
the `null` value found in many other languages.

In [20]:
name = None

Note: `None` and `False` are *not* the same thing!

In [21]:
name == False

False

Also, assigning the value `None` to a variable is not the same as not defining
the variable in the first place. Instead, a variable that is set to `None` is
something that we can point to in our program without raising an error but
doesn't carry any particular value. These are subtle but important points, and
in later chapters, we'll use code where the difference becomes important.
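A minimal sketch of the difference: a name bound to `None` can be used freely (and is conventionally tested with `is None`), while a name that was never assigned raises a `NameError` as soon as we refer to it. The name `undefined_name` below is hypothetical and deliberately never assigned.

```python
name = None

# a variable holding None exists, so we can inspect it without error
print(name is None)    # True
print(name == False)   # False: None is not the same as False

# a name that was never assigned is a different story
try:
    print(undefined_name)   # hypothetical name, never defined
except NameError as err:
    error_message = str(err)
    print("NameError:", error_message)
```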

## Collections

Most code we're going to want to write in Python will require more than just
integers, floats, strings, and booleans. We're going to need more complex data
structures, or *collections*, that can hold other objects (like strings,
integers, etc.) and enable us to easily manipulate them in various ways. Python
provides built-in support for many common data structures, and others can be
found in modules that come installed together with the language itself -- the
so-called "standard library" (e.g., in the
[collections](https://docs.python.org/3/library/collections.html) module).

### Lists

Lists are the most common collection we'll work with in Python. A list is a
*heterogeneous* collection of objects. By heterogeneous, we mean that a list can
contain elements of different types. It doesn't have to contain only strings or
only integers; it can contain a mix of the two, as well as all kinds of other
types.

#### List initialization

To create a new list, we enclose zero or more values between square brackets (`[`
and `]`), separating the elements with commas. Here is how we initialize a list
containing 4 elements of different types (an integer, a float, and two strings):

In [22]:
random_stuff = [11, "apple", 7.14, "banana"]

#### List indexing

Lists are *ordered* collections, by which we mean that a list keeps track of the
position of each of its elements. The order of elements won't change unless we
explicitly change it. This allows us to access individual elements in the list
directly, by specifying their position in the collection, or *index*.

To access the $i^{th}$ element in a list, we enclose the index $i$ in square
brackets. Note that Python uses 0-based indexing (i.e., the first element in the
sequence has index 0), and not 1-based indexing as in some other data-centric
languages (Julia, R, etc.). This means, for example, that the following operation
returns the second item in the list, not the first.

In [23]:
random_stuff[1]

'apple'

Many bitter wars have been fought on the internet over whether 0-based or
1-based indexing is better. We're not here to take a philosophical stand on this
issue; the fact of the matter is that Python indexing is 0-based, and that's not
going to change. So whether you like it or not, you'll need to make your peace
with the idea that indexing starts from 0 while you're reading this book.
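One practical consequence of 0-based indexing is that the last valid index of a list is one less than its length; asking for the index equal to the length raises an `IndexError`. A small sketch with a made-up list:

```python
fruits = ["apple", "banana", "cherry", "durian"]

first = fruits[0]                 # 'apple'
last = fruits[len(fruits) - 1]    # 'durian' (index 3 in a 4-element list)

try:
    fruits[len(fruits)]           # index 4 does not exist
except IndexError as err:
    error_message = str(err)
    print("IndexError:", error_message)   # list index out of range
```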

#### Exercise

Indexing with negative numbers: in addition to indexing from the beginning of the list, we can index from the end of the list using negative numbers (e.g., `random_stuff[-1]`). Experiment indexing into the list `random_stuff` with negative numbers. What is the negative number index for the last item in the list? What is the negative number index for the first item in the list? Can you write code that would use a negative number to index the first item in the list, without advance knowledge of its length?

#### List slicing

Indexing is nice, but what if we want to pull more than one element at a time
out of our list? Can we easily retrieve only part of a list? The answer is yes!
We can *slice* a list, and get back another list that contains multiple
contiguous elements of the original list, using the colon (`:`) operator.

In [24]:
random_stuff[1:3]

['apple', 7.14]

In the list-slicing syntax, the number before the colon indicates the start
position, and the number after the colon indicates the end position. Note that
the start is inclusive and the end is exclusive. That is, in the above example,
we get back the 2nd and 3rd elements in the list, but *not* the 4th. If it
helps, you can read the `1:3` syntax as saying *I want all the elements in the
list starting at index `1` and stopping just before index `3`*.
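Either side of the colon can also be left out: an omitted start means "from the beginning", and an omitted end means "through the end". A slice is a new list, so the original is left unchanged. A quick sketch:

```python
letters = ["a", "b", "c", "d", "e"]

middle = letters[1:3]   # ['b', 'c']: from index 1 up to, but not including, 3
head = letters[:2]      # ['a', 'b']: from the beginning up to index 2
tail = letters[2:]      # ['c', 'd', 'e']: from index 2 through the end

print(middle, head, tail)
print(letters)          # ['a', 'b', 'c', 'd', 'e'] (unchanged)
```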

#### Assigning values to list elements

Lists are *mutable* objects, meaning that they can be modified after they've
been created. In particular, we very often want to replace a particular list
value with a different value. To overwrite an element at a given index, we
assign a value to it, using the same indexing syntax we saw above:

In [25]:
print("Value of first element before re-assignment:", random_stuff[0])

random_stuff[0] = "eleventy"

print("Value of first element after re-assignment:", random_stuff[0])

Value of first element before re-assignment: 11
Value of first element after re-assignment: eleventy


#### Appending to a list

It's also very common to keep appending values to an ever-growing list. We
can add a single element to the end of a list via the `.append()` method (notice
again that we are calling a function using the 'dot' notation; we promise that
we'll come back to that later!).

In [26]:
random_stuff.append(88)
print(random_stuff)

['eleventy', 'apple', 7.14, 'banana', 88]
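One detail that trips up many newcomers: `.append()` modifies the list *in place* and returns `None`, so its result should not be assigned back to the list variable. A minimal sketch:

```python
numbers = [1, 2, 3]

result = numbers.append(4)   # changes numbers itself
print(numbers)               # [1, 2, 3, 4]
print(result)                # None

# a common mistake: this would replace the list with None
# numbers = numbers.append(5)
```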


#### Exercise

There are several ways to combine lists, including the `append` function you saw above, as well as the `extend` method. You can also add lists together using the addition (`+`) operator.

Given the following two lists:

`list1 = [1, 2, 3]`

`list2 = [4, 5, 6]`

How would you create a new list called `list3` that has the items `[6, 5, 1, 2, 3]`, with as few operations as possible and only using indexing operations and functions associated with the list? (Hint: you can look up these functions in the Python [online documentation for lists](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists).)

### Dictionaries (dict)

Dictionaries are another extremely common data structure in Python. A dictionary
(or `dict`) is a mapping from keys to values; we can think of it as a set of
key/value pairs, where the keys have to be unique (but the values don't). Many
other languages have structures analogous to Python's dictionaries, though
they're usually called something like *associative arrays*, *hash tables*, or
*maps*.

#### Dictionary initialization

We initialize a dictionary by specifying comma-delimited key/value pairs inside
curly braces. Keys and values are separated by a colon. It looks like this:

In [27]:
fruit_prices = {
    "apple": 0.65,
    "mango": 1.5,
    "strawberry": "$3/lb",
    "durian": "unavailable",
    5: "just to make a point"
}

Notice that both the keys and the values can be heterogeneously typed (observe
the last pair, where the key is an integer).

#### Accessing values in a dictionary

In contrast to lists, you can't access values stored in a dictionary directly by
their serial position. Instead, values in a dictionary are accessed by their
key. The syntax is identical to that used for list indexing. We specify the key
whose corresponding value we'd like to retrieve in between square brackets:

In [28]:
fruit_prices['mango']

1.5

By contrast, the following lookup fails, raising a `KeyError` telling us
there's no such key in the dictionary:

In [29]:
fruit_prices[0]

KeyError: 0

However, the reason the above lookup failed is *not* that integers are invalid
keys. To prove that, consider the following:

In [30]:
fruit_prices[5]

'just to make a point'

Superficially, it might look like we're requesting the 6th element in the
dictionary and getting back a valid value. But that is not what is actually
happening here. If it's not clear to you why `fruit_prices[0]` fails while
`fruit_prices[5]` succeeds, go back and look at the code we used to create the
`fruit_prices` dictionary. Carefully inspect the keys and make sure you
understand what's going on.
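When we aren't sure whether a key exists, we can check with `in` (which tests a dictionary's *keys*), or use the `.get()` method, which returns a fallback value (`None` by default) instead of raising a `KeyError`. A short sketch with a trimmed-down version of the dictionary above:

```python
fruit_prices = {"apple": 0.65, "mango": 1.5, 5: "just to make a point"}

has_zero = 0 in fruit_prices    # False: no key is equal to 0
has_five = 5 in fruit_prices    # True: 5 is a key, not a position

missing = fruit_prices.get(0)                   # None, instead of a KeyError
fallback = fruit_prices.get(0, "no such key")   # a custom default value
price = fruit_prices.get("mango", 0.0)          # 1.5: the key exists

print(has_zero, has_five, missing, fallback, price)
```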

#### Updating a dictionary

Updating a dictionary uses the same `[]`-based syntax as accessing values,
except we now make an explicit assignment. For example, we can add a new entry
for the `ananas` key:

In [31]:
fruit_prices["ananas"] = 0.5

Or over-write the value for the `mango` key:

In [32]:
fruit_prices["mango"] = 2.25

And then look at the dict again:

In [33]:
print(fruit_prices)

{'apple': 0.65, 'mango': 2.25, 'strawberry': '$3/lb', 'durian': 'unavailable', 5: 'just to make a point', 'ananas': 0.5}


#### Exercise

Add another fruit to the dictionary. This fruit should have several different
values associated with it, organized as a list. How can you access the second
item in this list in one single call?


### Tuples

The last widely-used Python collection we'll discuss here (though there are many
other more esoteric ones) is the *tuple*. Tuples are very similar to lists in
Python. The main difference between lists and tuples is that lists are
*mutable*, meaning, they can change after initialization. Tuples are
*immutable*; once a tuple has been created, it can no longer be modified.

We initialize a tuple in much the same way as a list, except we use parentheses
(round brackets) instead of square brackets:

In [34]:
my_tuple = ("a", 12, 4.4)

Just to drive home the immutability of tuples, let's try replacing a value and
see what happens:

In [35]:
my_tuple[1] = 999

TypeError: 'tuple' object does not support item assignment

Our attempt to modify the tuple raises an error. Fortunately, we can easily
convert any tuple to a list, after which we can modify it to our heart's
content.

In [36]:
converted_from_tuple = list(my_tuple)
converted_from_tuple[1] = 999
print(converted_from_tuple)

['a', 999, 4.4]


In practice, you can use a list almost anywhere you can use a tuple, though
there are some important exceptions. One that you can already appreciate is that
a tuple can be used as a key to a dictionary, but a list can't:

In [37]:
dict_with_sequence_keys = {my_tuple : "Access this value using a tuple!"}

In [38]:
dict_with_sequence_keys[converted_from_tuple] = "This will not work"

TypeError: unhashable type: 'list'

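Admittedly, the error message is a bit cryptic, but it relates directly to
mutability. Dictionary keys must be *hashable*: Python computes a hash value from
a key's contents in order to find it quickly. Because a list's elements can
change at any time, its hash could change too, so Python refuses to hash lists
at all and reports them as an "unhashable type".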
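We can trigger the same check directly with the built-in `hash()` function, which is what Python uses to hash dictionary keys. A tuple whose items are themselves hashable can be hashed; a list cannot:

```python
my_tuple = ("a", 12, 4.4)
my_list = list(my_tuple)

tuple_hash = hash(my_tuple)   # works: a tuple's contents can't change

try:
    hash(my_list)             # fails: a list's contents could change later
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)   # unhashable type: 'list'
```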

## Everything in Python is an object

Our discussion so far might give off the impression that some data types in
Python are basic or special in some way. It's natural to think, for example,
that strings, integers, and booleans are "primitive" data types -- i.e., that
they're built into the core of the language, behave in special ways, and can't
be duplicated or modified. And this is true in many other programming
languages. For example, in Java, there are exactly 8 primitive data types. If
you get bored of them, you're out of luck. You can't just create new ones --
say, a new type of integer that behaves just like the primitive `int`, but adds
some additional functionality you think would be kind of cool to have.

Python is different: it doesn't *really* have any primitive data types. Python
is a deeply *object-oriented* programming language, and in Python, *everything
is an object*. Strings are objects, integers are objects, booleans are objects.
So are lists. So are dictionaries. *Everything* is an object in Python. We'll
spend more time talking about what objects are, and the deeper implications of
everything being an object, at the end of this chapter. For now, let's focus on
some of the practical implications for the way we write code.

### The dot notation

Let's start with the dot (`.`) notation we use to indicate that we're accessing
data or functionality inside an object. You've probably already noticed that
there are two kinds of constructions we've been using in our code to do things
with variables. There's the functional syntax, where we pass an object as an
argument to a function:

In [39]:
len([2, 4, 1, 9])

4

And then there's the object-oriented syntax that uses the dot notation, which we
saw when looking at some of the functionality implemented in strings:

In [40]:
phrase = "aPpLeS ArE delICIous"

phrase.lower()

'apples are delicious'

If you have some experience in another object-oriented programming language, the dot syntax will be old hat to you. But if you've mostly worked in data-centric languages (e.g., R or Matlab), you might find it puzzling at first.

What's happening in the above example is that we're calling `lower()`, a function attached to the object (such a function is called a *method* of the object), *on* the `phrase` string itself. You can think of the dot operator `.` as expressing a relationship of belonging, roughly translating to "look inside of". So, when we write `phrase.lower()`, we're essentially saying, "call the `lower()` method that's contained inside of `phrase`". (We're being a bit sloppy here for the sake of simplicity, but that's the gist of it.)

Note that `lower()` works on strings, but, unlike functions like `len()` and `round()`, it isn't a built-in function in Python. We can't just call `lower()` directly:

In [41]:
lower("TrY to LoWer ThIs!")

NameError: name 'lower' is not defined

Instead, it needs to be called via an instance that contains this function, as we did above with `phrase`.

Neither is `lower()` a method that's available on *all* objects. For example, this won't work:

In [42]:
num = 6

num.lower()

AttributeError: 'int' object has no attribute 'lower'

Integers, as it happens, don't have a method called `lower()`. And neither do most other types. But strings do. And what the `lower()` method does, when called from a string, is return a lower-cased version of the string to which it is attached. But that functionality is a feature of the string type itself, and *not* of the Python language in general.

Later, we'll see how we go about defining new types (or classes), and specifying what methods they have. For the moment, the main point to take away is that almost all functionality in Python is going to be accessed via objects. The dot notation is ubiquitous in Python, so you'll need to get used to it quickly if you're used to a purely functional syntax.
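To make the "look inside" idea a little more concrete: a method like `lower()` is defined on the string *type*, `str`, and `phrase.lower()` behaves like calling that function with `phrase` as its first argument. A sketch comparing the two spellings:

```python
phrase = "aPpLeS ArE delICIous"

via_dot = phrase.lower()       # the usual object-oriented syntax
via_type = str.lower(phrase)   # the same method, looked up on the str type

print(via_dot)               # apples are delicious
print(via_dot == via_type)   # True
```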

#### Inspecting objects

One implication of everything being an object in Python is that we can always find out exactly what data an object contains, and what methods it implements, by inspecting it in various ways.

We won't look very far under the hood of objects in this chapter, but it's worth knowing about a couple of ways of interrogating objects that can make your life easier.

First, you can always see the type of an object with the built-in `type()` function, which you also saw before:

In [43]:
msg = "Hello World!"

type(msg)

str

Second, the built-in `dir()` function will show you all of the methods implemented on an object, as well as its other *attributes* (for example, data stored within the object). Be warned that this will often be a long list and that some of the attribute names you see (mainly those that start and end with two underscores) will look a little wonky. We'll talk about those briefly later.

In [44]:
dir(msg)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

That's a pretty long list! Any name in that list is available to you as an *attribute* of the object (e.g., `msg.upper()`, `msg.__class__`, etc.), meaning that you can access it, and call it if it is a function, using the dot notation. Notice that the list contains all of the string methods we experimented with earlier (including `lower`), as well as many others.
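Rather than scanning the whole list by eye, we can ask about a single name with the built-in `hasattr()` function, or filter out the double-underscore names with a list comprehension (a construct covered later in this chapter). A sketch:

```python
msg = "Hello World!"

has_lower = hasattr(msg, "lower")   # True: strings have a lower() method
int_has_lower = hasattr(6, "lower") # False: ints don't
print(has_lower, int_has_lower)

# keep only the names that don't start with two underscores
public_names = [name for name in dir(msg) if not name.startswith("__")]
print(public_names[:5])
```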

#### Exercise

Find the methods associated with "`int`" objects. Are they different from the methods associated with "`float`" objects?

## Control flow

Like nearly every other programming language, Python has several core language
constructs that allow us to control the flow of our code -- the order in which
functions get called and expressions are evaluated. The two most common ones are
conditionals (if-then statements) and for-loops.

### Conditionals

Conditional (or if-then) statements allow our code to branch -- meaning, we can
execute different chunks of code depending on which of two or more conditions is
met. For example:

In [45]:
mango = 0.2

if mango < 0.5:
    print("Mangoes are super cheap; get a bunch of them!")
elif mango < 1.0:
    print("Get one mango from the store.")
else:
    print("Meh. I don't really even like mangoes.")

Mangoes are super cheap; get a bunch of them!


The printed statement will vary depending on the value assigned to the `mango`
variable. Try changing that value and see what happens when you re-run the code.

Notice that there are three clauses in the above code: `if`, `elif` (which in
Python stands for "else if"), and `else`. Only the first of these (i.e., `if`)
is strictly necessary; the `elif` and `else` clauses are optional.
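Python checks the conditions from top to bottom and runs only the *first* branch whose condition is true: with a price of `0.2`, both `mango < 0.5` and `mango < 1.0` hold, yet only the first message is printed. The sketch below wraps the same logic in a function (functions are covered at the end of this chapter; the name `mango_advice` is purely illustrative) so we can try several prices at once:

```python
def mango_advice(mango):
    # conditions are tested in order; only the first matching branch runs
    if mango < 0.5:
        return "Mangoes are super cheap; get a bunch of them!"
    elif mango < 1.0:
        return "Get one mango from the store."
    else:
        return "Meh. I don't really even like mangoes."


for price in [0.2, 0.7, 3.0]:
    print(price, "->", mango_advice(price))
```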

#### Exercise

There can be arbitrarily many `elif` clauses. Try adding another one to the code above that executes only when mangoes are more expensive than 2.0 and less expensive than 5.0.

### Loops

For-loops allow us to iterate (or loop) over the elements of a collection (e.g.,
a list) and perform the same operation(s) on each one. As with most things in
Python, the syntax is quite straightforward and readable:

In [46]:
for elem in random_stuff:
    print(elem)

eleventy
apple
7.14
banana
88


Here we loop over the elements in the `random_stuff` list. In each iteration
(i.e., for each element), Python assigns the current element to the loop
variable `elem`, and we can then do whatever we like with it. In this case, we
just print its value. Note that, unlike in some other languages, `elem` is not
confined to the loop: once the for-loop is done executing, it still exists and
refers to the last element of the list.
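A for-loop doesn't create a separate scope for its loop variable; `elem` is an ordinary variable in the surrounding code, which we can verify directly (the list is re-created here so the snippet runs on its own):

```python
random_stuff = ["eleventy", "apple", 7.14, "banana", 88]

for elem in random_stuff:
    pass  # the loop body doesn't matter here

# the loop variable is still defined after the loop ends
print(elem)   # 88, the last element of the list
```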

#### Looping over a range

While we can often loop directly over the elements in a list (as in the above
example), it's also very common to loop over a range of integer indices, which
we can then use to access data stored in some sequential form in one or more
collections. To facilitate this type of thing, we can use Python's built-in
`range()` function, which, when given a single number, produces a sequence of
integers starting from `0` and stopping just before that number:

In [47]:
num_elems = len(random_stuff)

for i in range(num_elems):
    val = random_stuff[i]
    print(f"Element {i}: {val}")

Element 0: eleventy
Element 1: apple
Element 2: 7.14
Element 3: banana
Element 4: 88
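Because pairing each index with its element is so common, Python also provides the built-in `enumerate()` function, which yields `(index, element)` pairs directly and saves us the `range(len(...))` step. A sketch of the same loop written that way, collecting the lines before printing them:

```python
random_stuff = ["eleventy", "apple", 7.14, "banana", 88]

lines = []
for i, val in enumerate(random_stuff):   # i is the index, val the element
    lines.append(f"Element {i}: {val}")

print("\n".join(lines))
```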


#### Exercise

The content that was printed in each iteration of the loop in the last example is formatted using a so-called "f-string". This is a way to compose strings that change based on information from the code surrounding them. An f-string is a string that has the letter "f" before it, as in this example, and it can contain segments enclosed by curly braces (`{` and `}`) that contain Python expressions. In this case, the expressions in each pair of curly braces are variable names, and the values of the variables at that point in the code are inserted into the string, but you could also insert small calculations that produce a result that then gets inserted into the string at that location. As an exercise, rewrite the code above so that in each iteration through the loop the value of `i` and the value of `i` squared are both printed. Hint: powers of Python numbers are calculated using the `**` operator.

### Nested control flow

We can also nest conditionals and for-loops inside one another (as well as inside other compound statements). For example, we can loop over the elements of `random_stuff`, as above, but keeping only the elements that meet some condition—e.g., only those elements that are strings:

In [48]:
# create an empty list to hold the filtered values
strings_only = []

# loop over the random_stuff list
for elem in random_stuff:
    # if the current element is a string...
    if isinstance(elem, str):
        # ...then append the value to strings_only
        strings_only.append(elem)

print("Only the string values:", strings_only)

Only the string values: ['eleventy', 'apple', 'banana']


### Comprehensions

In Python, simple for-loops that build up a list can also be written in a more
compact way known as a *list comprehension* (there are also dictionary
comprehensions, but we'll leave those to you to look up as an exercise). List
comprehensions are mostly *syntactic sugar* -- meaning, they're a more compact
way of writing the same logic, with one difference: a comprehension always builds
a new list. Here's the list comprehension version of the first for-loop we wrote
above (the one that prints each element):

In [49]:
# print() returns None, so p ends up as a list of five None values
p = [print(elem) for elem in random_stuff]

eleventy
apple
7.14
banana
88


We can also embed conditional statements inside list comprehensions. Here's a
much more compact way of writing the string-filtering snippet we wrote above:

In [50]:
strings_only = [elem for elem in random_stuff if isinstance(elem, str)]

print("Only the string values:", strings_only)

Only the string values: ['eleventy', 'apple', 'banana']


List comprehensions can save you quite a bit of typing once you get used to
reading them, and you may eventually even find them clearer to read. It's also
possible to nest list comprehensions (equivalent to for-loops within for-loops),
though that power should be used sparingly, as nested list comprehensions can be
difficult to understand.
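To check that a comprehension and the equivalent loop really produce the same result, here is a small sketch with a made-up list of numbers that keeps the squares of the numbers greater than 3:

```python
numbers = [1, 2, 3, 4, 5, 6]

# explicit for-loop with a nested conditional
squares_loop = []
for n in numbers:
    if n > 3:
        squares_loop.append(n ** 2)

# the same logic as a list comprehension
squares_comp = [n ** 2 for n in numbers if n > 3]

print(squares_loop)                  # [16, 25, 36]
print(squares_loop == squares_comp)  # True
```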

#### Exercise

Using a comprehension, create a list where each element is a tuple. The first element in each tuple should be the index of an element in `random_stuff`, and the second element of the tuple should be the square of that index.

### Whitespace is syntactically significant

One thing you might have noticed when reading the conditional statements and
for-loops above is that we always seem to indent our code inside these
statements. This isn't a matter of choice; Python is a bit of an odd duck among
programming languages, in that it imposes strong rules about how whitespace can
be used (i.e., whitespace is *syntactically significant*). This can take a bit
of getting used to, but once you do, it has important benefits: there's less
variability in coding style across different Python programmers, and reading
other people's code is often much easier than it is in languages without
syntactically significant whitespace.

The main rule you need to be aware of is that whenever you enter a *compound statement* (which includes for-loops and conditionals, but also function and class definitions, as we'll see below), you have to increase the indentation of your code. When you exit the compound statement, you then decrease the indentation by the same amount.

The exact amount you indent each time is technically up to you. But it's strongly recommended that you use the same convention everyone else does (described in the Python style guide, known as [PEP8](https://peps.python.org/pep-0008/)), which is to always indent or dedent by 4 spaces. Here's what this looks like in a block with multiple nested conditionals:

In [51]:
num = 800

if num > 500:
    if num < 900:
        if num > 700:
            print("Great number.")
        else:
            print("Terrible number.")

Great number.


#### Exercise

Modify the above snippet so that you (a) consistently use a different amount of indentation (for example, 3 spaces), and (b) break Python by using invalid indentation.

## Functions

Python would be of limited use to us if we could only run our code linearly from
top to bottom. Fortunately, as in almost every other modern programming
language, Python has *functions*: blocks of code that only run when explicitly
called. Some of these are built into the language itself (or contained in the
standard library's many modules we can import from, as we saw above):

In [52]:
approx_pi = 3.141592

round(approx_pi, 2)

3.14

Here we use the `round()` function to round a float to the desired precision (2
decimal places). The `round()` function happens to be one of the few dozen
"built-ins" included in the root Python namespace out of the box, but we can
easily define our own functions, which we can then call just like the built-in
ones. Functions are defined like this:

In [53]:
def print_useless_message():
    print("This is a fairly useless message.")

Here, we're defining a new function called `print_useless_message`, which, as you might expect, prints a fairly useless message. Notice that nothing happens when we run the above block of code. That's because all we've done is *define* the function; we haven't yet *called* (or *invoked*) it. We can do that like this:

In [54]:
print_useless_message()

This is a fairly useless message.


### Function arguments and return values

Functions can accept *arguments* (or parameters) that alter their behavior. When
we called `round()` above, we passed two arguments: the float we wanted to
round, and the number of decimal places we wanted to round it to. The first
argument is mandatory in the case of `round()`; if we try calling `round()`
without any arguments (feel free to give it a shot), we'll get an error. This
should make intuitive sense to you because it would be pretty strange to try to
round no value at all.
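The second argument to `round()` (its parameter is named `ndigits`), by contrast, is optional. The sketch below walks through the three cases: only the number, the number plus a count of decimal places, and no arguments at all. The `try`/`except` block simply catches the error so the rest of the cell keeps running:

```python
approx_pi = 3.141592

# only the mandatory argument: round() returns the nearest integer
print(round(approx_pi))     # 3

# with the optional second argument: round to that many decimal places
print(round(approx_pi, 2))  # 3.14

# no arguments at all: Python raises a TypeError
error_message = None
try:
    round()
except TypeError as err:
    error_message = str(err)
print(error_message)
```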

Functions can also *return* values to the caller. To do this, we use a `return` statement followed by the value(s) we want to return; as soon as a `return` statement is executed, the function exits. If a function finishes without executing a `return` statement, it returns the special value `None` we encountered earlier.
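A short sketch of these rules follows. `sign_label()` and `min_and_max()` are hypothetical helpers written only for this example. It shows that a function with no `return` gives back `None`, that a function stops at the first `return` it reaches, and that returning several values hands them back as a single tuple:

```python
def print_useless_message():
    print("This is a fairly useless message.")

def sign_label(x):
    """Hypothetical example: the function exits at the first return it reaches."""
    if x < 0:
        return "negative"      # negative inputs stop here
    return "non-negative"

def min_and_max(values):
    """Hypothetical example: returning two values gives back a tuple."""
    return min(values), max(values)

result = print_useless_message()  # prints the message...
print(result)                     # ...but the return value is None

print(sign_label(-3))             # negative
low, high = min_and_max([3, 1, 4, 1, 5])
print(low, high)                  # 1 5
```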

Let's illustrate the use of arguments and return values by writing a small
function that takes a number as input, adds Gaussian noise to it (generated by
the standard library's `random` module), and returns the result.

In [55]:
import random

def add_noise(x, mu, sd):
    """Adds gaussian noise to the input.

    Parameters
    ----------
    x : number
        The number to add noise to.
    mu : float
        The mean of the gaussian noise distribution.
    sd : float
        The standard deviation of the noise distribution.

    Returns
    -------
    float
    """
    noise = random.normalvariate(mu, sd)
    return x + noise

The `add_noise()` function has three required parameters: the first (`x`) is the
number we want to add noise to, the second (`mu`) is the mean of the Gaussian
distribution to sample from, and the third (`sd`) is the distribution's standard deviation.

Notice that we've documented the function's behavior inside the function definition itself using what's called a *[docstring](https://www.python.org/dev/peps/pep-0257/)*. This is a good habit to get into, as good documentation is essential if you expect other people (including your future self) to be able to use the code you write. In this case, the docstring tells the user the expected type of each argument, what each argument means, and what the function returns. If you're wondering why it's organized this way, it's because we're following the docstring conventions established by the NumPy project (described in the [numpy docstring guide](https://numpydoc.readthedocs.io/en/latest/format.html)).
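Docstrings aren't just comments: Python stores them on the function object, so tools can display them. The minimal sketch below uses a hypothetical `add_one()` function written in the same NumPy style. Calling `help(add_one)` displays the same text, and in Jupyter so does typing `add_one?`:

```python
def add_one(x):
    """Return x plus one.

    Parameters
    ----------
    x : number
        The number to increment.

    Returns
    -------
    number
    """
    return x + 1

# the docstring is kept on the function object itself
print(add_one.__doc__)
```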

Now that we've defined our noise-adding function, we can start calling it. Note that because we're sampling randomly from a distribution, we'll get a different output every time we re-run the function, even if the inputs are the same.

In [56]:
add_noise(4, 1, 2)

2.550050038272467
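If you need repeatable output (for example, while debugging), you can seed the `random` module's generator with `random.seed()`. In the sketch below, the seed value `42` is arbitrary, and `add_noise()` is redefined so that the block runs on its own. Resetting the seed replays the same sequence of draws, so the two calls return identical values:

```python
import random

def add_noise(x, mu, sd):
    """Adds Gaussian noise to the input (same logic as the function above)."""
    noise = random.normalvariate(mu, sd)
    return x + noise

random.seed(42)              # fix the generator's starting state
first = add_noise(4, 1, 2)

random.seed(42)              # reset to the same state...
second = add_noise(4, 1, 2)  # ...so the same "random" draw comes out

print(first == second)       # True
```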

#### Exercise

Based on the function definition provided above, define a new function that produces a sample of `n` numbers, each of which is `x` with Gaussian noise of mean `mu` and standard deviation `sd` added to it. The return value should be a list of length `n`, where `n` is itself a parameter of the function.

### Function arguments

Python functions can have two kinds of arguments: *positional* arguments, and *keyword* (or *named*) arguments.

#### Positional arguments

Positional arguments, as their name suggests, are defined by position, and they
*must* be passed when the function is called. The values passed inside the
parentheses are mapped one-to-one onto the arguments, as we saw above for
`add_noise()`. That is, inside the `add_noise()` function, the first value is
referenced by `x`, the second by `mu`, and so on.

If the caller fails to pass the right number of arguments (either too few or too
many), an error will be generated:

In [57]:
add_noise(7)

TypeError: add_noise() missing 2 required positional arguments: 'mu' and 'sd'

In this case, the call fails because `add_noise()` has three positional
arguments, but we passed only one.
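Passing too many arguments fails in the same way. The sketch below triggers both errors and collects the messages. The `try`/`except` blocks only exist so that the first error doesn't stop the cell, and `add_noise()` is redefined so that the block runs on its own:

```python
import random

def add_noise(x, mu, sd):
    noise = random.normalvariate(mu, sd)
    return x + noise

messages = []

try:
    add_noise(7)               # too few: mu and sd are missing
except TypeError as err:
    messages.append(str(err))

try:
    add_noise(7, 1, 2, 3)      # too many: there is no fourth parameter
except TypeError as err:
    messages.append(str(err))

for message in messages:
    print("TypeError:", message)
```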


#### Keyword arguments

Keyword arguments are arguments that are assigned a default value in the
function *signature* (i.e., the top line of the function definition, which looks
like `def my_function(...)`). Unlike positional arguments, keyword arguments are
optional: if the caller doesn't pass a value for a keyword argument, the
corresponding variable is still available inside the function, but it has
whatever default value is defined in the signature. (Strictly speaking, Python's
documentation calls these *default argument values*, and reserves the term
*keyword argument* for any argument passed in a call as `name=value`.)

To see how this works, let's rewrite our `add_noise()` function so that the
parameters of the Gaussian distribution are now optional:

In [58]:
def add_noise_with_defaults(x, mu=0, sd=1):
    """Adds gaussian noise to the input.

    Parameters
    ----------
    x : number
        The number to add noise to.
    mu : float, optional
        The mean of the gaussian noise distribution.
        Default: 0
    sd : float, optional
        The standard deviation of the noise distribution.
        Default: 1

    Returns
    -------
    float
    """
    noise = random.normalvariate(mu, sd)
    return x + noise

This looks very similar, but we can now call the function without filling in
`mu` or `sd`. If we don't pass in those values explicitly, the function will
internally use the defaults (i.e., `0` in the case of `mu`, and `1` in the case
of `sd`). Now, when we call this function with only one argument:

In [59]:
add_noise_with_defaults(10)

9.9391820966154

Keyword arguments don't have to be filled in order, as long as we explicitly
name them. For example, we can specify a value for `sd` but not for `mu`:

In [60]:
# we specify x and sd, but not mu
add_noise_with_defaults(5, sd=100)

33.42359092478691

Note that if we didn't specify the name of the argument (i.e., if we called
`add_noise_with_defaults(5, 100)`), the function would still work, but the second
value we pass would be interpreted as `mu` rather than `sd`, because that's the
order in which they appear in the function definition.
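Because `add_noise_with_defaults()` returns a random number, it's hard to see from its output which parameter received which value. The hypothetical `show_args()` helper below has the same signature but simply returns what each parameter ended up holding, which makes the mapping easy to see:

```python
def show_args(x, mu=0, sd=1):
    """Hypothetical helper: return the values each parameter received."""
    return (x, mu, sd)

print(show_args(5))          # (5, 0, 1): both defaults used
print(show_args(5, sd=100))  # (5, 0, 100): only sd overridden
print(show_args(5, 100))     # (5, 100, 1): 100 is taken as mu, not sd
```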

It's also worth noting that we can always explicitly name *any* of our
arguments, including positional ones. This is extremely handy in cases where
we're calling functions whose argument names we remember, but where we don't
necessarily remember the exact order of the arguments. For example, suppose we
remember that `add_noise()` takes the three arguments `x`, `mu`, and `sd`, but
we don't remember if `x` comes before or after the distribution parameters. We
can guarantee that we get the result we expect by explicitly specifying all the
argument names:

In [61]:
add_noise(mu=1, sd=2, x=100)

99.82944695944356

To summarize, functions let us define a piece of code once and call it as
often as needed. Keyword arguments let us define a default behavior that
callers can override when necessary.

*This tutorial was based on Chapter 5 of the [Neuroimaging and Data Science](https://neuroimaging-data-science.org/root.html) textbook.*