<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">
 
# Variables and Types
 
_Authors: Kiefer Katovich (San Francisco), Dave Yerrington (San Francisco), Joseph Nelson (Washington, D.C.), Sam Stack (Washington, D.C.)_

## Why Python?

### Pros

- Great data science ecosystem
- General-purpose (unlike R, which is specialized for statistics)
- Open-source
- Readable
- Relatively low barrier to entry, yet powerful enough to run Instagram

### Cons

- Slow (compared to lower-level languages such as C)
- Error-prone (compared to more rigid languages)
- Jack of all trades, master of none
- Limited support for creating desktop or mobile apps

## Variables

Variables are names that have been assigned to specific objects.

*Create a variable to represent the number of students in class.*

In [1]:
num_students = 20

num_students

20

*Create a variable to represent the number of instructors in class.*

In [2]:
num_instructors = 2

num_instructors

2

*Use those variables to calculate the total number of people in class.*

In [3]:
num_students + num_instructors

22

### Restrictions on Variable Names

- Cannot start with a digit (e.g., `2`, `10_data_points`).
- Cannot match a Python keyword (e.g., `for`, `and`, `elif`).
- Cannot contain spaces or periods.
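
These rules can also be checked in code. The sketch below uses the standard-library `keyword` module and the built-in `str.isidentifier` method; the helper name `is_valid_variable_name` is hypothetical and exists only for this illustration.

```python
import keyword


def is_valid_variable_name(name):
    """Return True if `name` can be used as a Python variable name."""
    # isidentifier() rejects names that start with a digit or contain
    # spaces or periods; iskeyword() catches reserved words such as `for`.
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_valid_variable_name("num_students"))    # True
print(is_valid_variable_name("10_data_points"))  # False: starts with a digit
print(is_valid_variable_name("elif"))            # False: Python keyword
print(is_valid_variable_name("num students"))    # False: contains a space
```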

### Best Practices for Variable Names
- Should be *descriptive* and *unambiguous*.
- Shorter is better all else being equal, but clarity comes first.

### Python Convention for Variable Names

Should be `snake_case` (all lowercase, with underscores between words).

## Types in Programming

The *type* of an object in a programming language tells the computer what operations are defined for that object and how they are defined.

For example, adding two `int`s (integer numbers, with no decimal) works as you would expect.

*Add two `int`s.*

In [4]:
1 + 1

2

Adding two `float`s (numbers with decimals) also works as you would expect. 

*Add two `float`s.*

In [5]:
1.5 + 1.5

3.0

For many "container" objects, including strings (sequences of characters), `+` concatenates.

*Add two `str`ings (sequences of characters).*

In [6]:
"1" + "1"

'11'

*Add two lists (ordered collections of objects).*

In [7]:
[1] + [1]

[1, 1]

For some types, `+` is not defined.

*Uncomment and run the line below to try to add two dicts (key-value pairs).*

In [8]:
# {"name": "Michael Jordan"} + {"number": 23}

> **Side Note:** Learning to read error messages is important.
>
> **Start at the *bottom*** to see what *type* of error was raised and a descriptive error message. These error messages will often seem very cryptic at first, but they will become clearer as you learn more about how Python works.
>
> Just above that, **you can see the line that triggered the error** and a few lines above it. Sometimes the line you need to fix is actually just above the line that triggered the error -- for instance, if you missed a closing bracket on line 2, then you would not get an error until line 3 because Python doesn't know until line 3 that the appropriate closing bracket is not coming.

The built-in function `len` returns the number of objects in many container types.

*Apply `len` to a string.*

In [9]:
len("11")

2

`len` is not defined for non-container types such as `int`s.

*Uncomment and run the cell below to try to apply `len` to an `int`.*

In [10]:
# len(11)
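
If you would rather not trigger a full traceback, the sketch below catches the errors from the two commented-out cells above and prints only the error type and message, which is the part of a traceback worth reading first. The variable names `dict_error` and `len_error` are made up for this example.

```python
# Adding two dicts raises a TypeError because `+` is not defined for dicts.
try:
    {"name": "Michael Jordan"} + {"number": 23}
except TypeError as err:
    dict_error = f"{type(err).__name__}: {err}"

# Calling len on an int raises a TypeError because an int has no length.
try:
    len(11)
except TypeError as err:
    len_error = f"{type(err).__name__}: {err}"

print(dict_error)
print(len_error)
```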

**Summary:** The type of an object determines what operations are defined for it and how they are defined.

## Overview of Common Types in Python

### Single Elements

- **Integer:** A whole number (positive, negative, or zero), such as `1`, `0`, `-5`, etc. Python `int`s can be arbitrarily large.
- **Float:** A number with a decimal, such as `2.0` or `3.14159`.
- **Boolean:** `True` or `False`.

### Containers

- **Strings:** A sequence of characters, e.g., `"The fox is quick."`
- **Lists:** An ordered sequence of arbitrary Python objects, e.g., `[1, 'a', ['hello']]`.
- **Tuples:** Similar to a list, except that once it is created, it cannot be modified. E.g., `(1, 'a', ['hello'])`.
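- **Dictionaries**: A collection of key-value pairs that are accessed by key rather than by position, e.g., `{'name': 'Mark Twain', 'birth_year': 1835}`. (As of Python 3.7, dictionaries preserve the order in which keys were inserted.)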
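
As a quick check, the built-in `type` function reports which of these types a value belongs to. The following is a minimal sketch; the names `examples` and `type_names` are chosen just for this illustration.

```python
examples = [1, 2.0, True, "The fox is quick.", [1, "a"], (1, "a"), {"name": "Mark Twain"}]

for value in examples:
    # type(value).__name__ is the short name of the type, e.g. 'int'.
    print(f"{value!r} has type {type(value).__name__}")

type_names = [type(value).__name__ for value in examples]
```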

## Arithmetic Operators

Python has special built-in symbols called *operators* for performing common computations.

In [13]:
print(1 + 2)
print(1 - 2)
print(1 * 2)
print(1 / 2)

3
-1
2
0.5


Python uses `**` for exponents.

In [14]:
2 ** 3

8

The modulo operator `%` gives the remainder from division.

In [15]:
5 % 2

1

Modulo is useful e.g. for determining whether a number is even or odd.

In [16]:
print(1 % 2)
print(2 % 2)
print(3 % 2)
print(4 % 2)

1
0
1
0
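
The even/odd check can be wrapped in a small function. This is a sketch only: `is_even` is a hypothetical helper name, and `divmod` is a built-in function (not covered above) that returns the quotient and remainder together.

```python
def is_even(number):
    """Return True when `number` is divisible by 2."""
    return number % 2 == 0


print(is_even(10))  # True
print(is_even(7))   # False

# divmod returns the whole-number quotient and the remainder as a pair.
quotient, remainder = divmod(17, 5)
print(quotient, remainder)  # 3 2
```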


## Reassignment

Suppose we have a "counter" variable that we want to increment by one. We can simply add one to it and reassign it to itself:

In [17]:
counter = 0
print(counter)

counter = counter + 1
print(counter)

counter = counter + 1
print(counter)

0
1
2


Python also provides a shorthand for incrementing a variable in this way:

In [18]:
counter = 0
print(counter)

counter += 1
print(counter)

counter += 1
print(counter)

0
1
2


It also provides analogous operators for other basic operations:

In [19]:
counter /= 2
print(counter)

counter *= 5
print(counter)

counter -= 3
print(counter)

1.0
5.0
2.0


These reassignment operators also work with other types when the corresponding mathematical operator is defined.

*Use += to add on to a string.*

In [20]:
s = "hello "
s += "George"

s

'hello George'
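
The same pattern applies to other types. The sketch below assumes two behaviors not shown above: `+` concatenates lists (as in the earlier `[1] + [1]` example) and `*` repeats a string. The variable names are illustrative.

```python
scores = [90, 85]
scores += [77]       # list concatenation
print(scores)        # [90, 85, 77]

divider = "-"
divider *= 10        # string repetition
print(divider)       # ----------

# `-` is not defined for strings, so `-=` raises a TypeError.
try:
    divider -= "-"
except TypeError as err:
    print("TypeError:", err)
```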

**Exercise**

*Time:* 2 mins\
*Format:* Individual\
*Post answers:* Yes

- Assign any `int` value to a variable `x`. Use the reassignment operators shown above to carry out the following steps.
    - Double `x`.
    - Add 9 to `x`.
    - Subtract 3 from `x`.
    - Divide `x` by 2.
    - Subtract your initial value from `x`.

In [21]:
x = 56789
x *= 2
x += 9
x -= 3
x /= 2
x -= 56789

In [22]:
# Running this cell will raise an `AssertionError` if you made a mistake
assert x == 3

## Booleans and Logical Operators

Booleans come in two values: `True` and `False`.

Booleans are frequently used to filter data. For example, we may want all countries with populations greater than 4,000,000 or all people named Bob. Each of these conditions evaluates to `True` or `False` for a given record, which splits our data into the groups we want.

Python has three built-in logical operators for combining or negating conditions:

- `and`: Are both A and B true?
- `or`: Is at least one of A and B true?
- `not`: Is A false?

In [23]:
4 > 3

True

In [24]:
4 > 3 and 100 > 0

True

In [25]:
4 > 3 and 2 > 3

False

In [26]:
True and False

False

In [27]:
4 > 3 or 2 > 3

True

In [28]:
True or False

True

In [29]:
not 5 > 4

False

In [30]:
not False

True

**Exercise**

*Time:* 1 min\
*Format:* Individual\
*Post answers:* No, just speak up on Zoom

- What should the expression `(3 > 4 or 5 < 12) and 2 > 3` evaluate to? Figure out your answer, then confirm it by having Jupyter evaluate that expression.


False

In [31]:
(3 > 4 or 5 < 12) and 2 > 3

False

$\blacksquare$

Boolean operators do surprising things when they operate on non-Boolean values. For instance, you might expect this code to return `True`, but it doesn't:

*Apply `or` to a Boolean expression and an `int`.*

In [32]:
x = 5
x == 3 or 5

5

> **Side note:** See [this article](https://realpython.com/python-or-operator/#using-or-with-common-objects) for an explanation of what is going on here.

Do it this way instead:

In [33]:
x == 3 or x == 5

True

Or alternatively:

In [34]:
x in [3, 5]

True
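
The surprising result of `x == 3 or 5` comes from the fact that `or` and `and` return one of their operands rather than always returning a Boolean. The following sketch illustrates that rule with a few small cases.

```python
x = 5

# `or` returns its first truthy operand (or the last operand if none is truthy).
print(x == 3 or 5)      # 5, because `x == 3` is False and 5 is truthy
print(repr(0 or ""))    # '' (both operands are falsy, so the last one is returned)

# `and` returns its first falsy operand (or the last operand if all are truthy).
print(x == 5 and 7)     # 7
print(0 and 7)          # 0

# bool() shows how Python treats a value in a Boolean context.
print(bool(5), bool(0), bool(""), bool("a"))  # True False False True
```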

## Comparison Operators

- Less than: `<`
- Greater than: `>`
- Less than or equal to: `<=`
- Greater than or equal to: `>=`
- Equals: `==`
- Does not equal: `!=`

**Note:**

- The equality operator `==` checks whether two values are equal.
- The assignment operator `=` assigns a value to a variable.

**Slack poll.**

```
/poll "Which of these expressions evaluates to `True`?" "`2 > 1`" "`2 < 1`" "`2 > 2`" "`2 < 2`" "`2 >= 2`" "`2 <= 2`" "`2 != 2`" anonymous
```

#### Comparisons for Collections

In general, collections are considered equal when they have the same length and corresponding elements are equal.

In [35]:
print([1, 2] == [1, 2])
print([1.0, 2.0] == [1, 2])
print([1, 2] == [2, 1])
print([1] == [1, 2])

True
True
False
False
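
The same idea carries over to other collection types, with two wrinkles worth knowing: a list never equals a tuple, and dictionary equality ignores the order of the keys. A minimal sketch (the names are chosen for illustration):

```python
same_tuples = (1, 2) == (1, 2)                      # same type, same elements
list_vs_tuple = [1, 2] == (1, 2)                    # different types
same_dicts = {"a": 1, "b": 2} == {"b": 2, "a": 1}   # key order is ignored
nested = [[1, 2], [3]] == [[1, 2], [3]]             # nested elements are compared too

print(same_tuples)    # True
print(list_vs_tuple)  # False
print(same_dicts)     # True
print(nested)         # True
```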


## Deeper Dive on Common Python Types

### Strings

*Create a string `s`.*

In [36]:
s = "hello world"
type(s)

str

*Find the length of `s`.*

In [37]:
len(s)

11

You can alternatively use single quotes to define a string.

In [38]:
# fmt: off
s = 'hello world'
type(s)

str

To define a string that *contains* a single quote, you can either put double quotes on the outside or "escape" the internal single quote with `\` (and similarly for a string that contains a double quote).

In [39]:
s2 = "I'm hungry"
print(s2)

# fmt: off
s2 = 'I\'m hungry'
print(s2)

I'm hungry
I'm hungry


There is no general preference in the Python community between single quotes and double quotes, but there is a consensus to use double quotes when your string contains single quotes and vice versa to avoid the need to use the escape character `\`.

*Replace a substring of `s`.*

In [40]:
s.replace("world", "universe")

'hello universe'

`.replace()` is an example of a **method** -- a function that's built into all objects of a given type.

Functions (e.g. `print`, `len`, and `type`) and methods (e.g. `str.replace`) are effectively the same thing, except that a method is defined within a particular class and is called on instances of that class using `.` followed by the method name.
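
To make the relationship concrete, the sketch below calls `replace` both as a method on a string and as a function looked up on the `str` class. It also shows that `replace` returns a new string rather than changing the original. The variable names `via_instance` and `via_class` are illustrative.

```python
s = "hello world"

# Calling the method on an instance...
via_instance = s.replace("world", "universe")

# ...is equivalent to calling it through the class and passing the instance.
via_class = str.replace(s, "world", "universe")

print(via_instance == via_class)  # True

# replace() returns a new string and leaves `s` unchanged.
print(s)  # hello world
```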

#### String Indexing

In some cases, we may want only part of a string (like the first character, for alphabetizing or categorizing). Indexing lets us extract the character at a specific position in a string.

*Index the first (index 0) character in `s`.*

In [41]:
s[0]

'h'

The number you enter after the variable name in brackets (the `[0]`) is called the index (its plural is indices).

Counting in Python uses _zero-based indexing_, meaning that numbering starts with 0 instead of 1.

You can pass a range of index values to select multiple elements of a string. This process is called "slicing."

*Slice `s` to get its first three characters.*

In [42]:
s[0:3]

'hel'

Slicing in Python is *inclusive on the left* and *exclusive on the right*, so in this case we get the elements at positions 0, 1, and 2, but *not* the element at position 3.

It is helpful to think of Python indexes as being associated with the boundaries between items rather than the items themselves, so that e.g. `s[0:3]` corresponds to the items between 0 and 3 as shown below:

![](../assets/images/slicing.png)
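
Two consequences of the boundary picture can be checked directly. This is a small sketch, with `k` as an arbitrary split point chosen for illustration.

```python
s = "hello world"

# A slice s[a:b] contains b - a characters (when 0 <= a <= b <= len(s)).
print(len(s[2:7]))  # 5

# Splitting at any boundary k and rejoining gives back the original string.
k = 5
print(s[:k] + s[k:] == s)  # True

# Adjacent slices share a boundary without overlapping.
print(repr(s[0:3]), repr(s[3:6]))  # 'hel' 'lo '
```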

> **Side Note:** If you are curious, [this article](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html) presents an argument for Python's system of zero-based indexing and making ranges inclusive on the left but not on the right.

*Tip*: Right-click on the tab for this notebook and click on "New View for Notebook" to get two windows into this notebook side-by-side. You can then keep the figure above in view as we proceed. This technique is also useful during exercises so that you can keep the exercise in view on one side while scrolling back through the lesson for guidance.

<img src="../assets/images/new_view1.png" width=401>

<img src="../assets/images/new_view2.png" width=700>

*Slice from index 6 up to the end of the string.*

In [43]:
s[6:]

'world'

*Get a copy of the entire string by not specifying a start or end.*

In [44]:
s[:]

'hello world'

In addition to specifying a range, you can include a step size or character skip rate. This might be helpful if you want every other letter, for example. 

*Slice every second character starting at index 0 and stopping before index 10.*

In [45]:
s[0:10:2]

'hlowr'

*Define a step size of 2; i.e., every other character.*

In [46]:
s[::2]

'hlowrd'

*Do the same for a list of numbers.*

In [47]:
[0, 1, 2, 3, 4, 5, 6][::2]

[0, 2, 4, 6]

You can also use a *negative* step to move from right to left, for instance to reverse a string.

*Reverse a string.*

In [48]:
s[::-1]

'dlrow olleh'

You can also use negative numbers to specify positions starting from the end.

*Select the last character.*

In [49]:
s[-1]

'd'

*Select all but the last three characters.*

In [50]:
s[:-3]

'hello wo'

*Select the last two characters.*

In [51]:
s[-2:]

'ld'
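
A negative index `-k` refers to the same position as `len(s) - k`. The sketch below checks that equivalence and also shows a difference between indexing and slicing when a position is out of range.

```python
s = "hello world"
n = len(s)

print(s[-1] == s[n - 1])     # True
print(s[-3] == s[n - 3])     # True
print(s[:-3] == s[:n - 3])   # True

# Out-of-range slicing silently stops at the end of the string...
print(repr(s[5:100]))        # ' world'

# ...but out-of-range indexing raises an IndexError.
try:
    s[100]
except IndexError as err:
    print("IndexError:", err)
```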

#### Concatenating and Interpolating Strings

Adding strings with `+` returns their concatenation.

In [52]:
x = "hello "
y = "world"

x + y

'hello world'

You can insert a non-string variable into a string if you first "cast" it to a string using the `str` function.

In [53]:
dice_roll = 3

"You rolled a " + str(dice_roll) + "."

'You rolled a 3.'

You can alternatively give the `print` function multiple items separated with commas to print them separated by spaces.

In [54]:
print("You rolled a", dice_roll)

You rolled a 3


A more elegant approach to inserting variables into strings is to use an f-string.

In [55]:
dice_roll = 3

f"You rolled a {dice_roll}."

'You rolled a 3.'

An f-string allows you to insert arbitrary Python expressions into strings.

*Do math inside an f-string.*

In [56]:
f"You rolled {3 + 5}."

'You rolled 8.'
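
The expressions inside the braces can include method calls, indexing, and function calls. The sketch below uses illustrative variables `name` and `rolls`; the `:.1f` in the last line is a format specifier (not covered in this lesson) that rounds the displayed value to one decimal place.

```python
name = "george"
rolls = [3, 5]

print(f"Hello, {name.title()}!")                  # Hello, George!
print(f"You rolled {rolls[0]} and {rolls[1]}.")   # You rolled 3 and 5.
print(f"Total: {sum(rolls)}")                     # Total: 8
print(f"Average: {sum(rolls) / len(rolls):.1f}")  # Average: 4.0
```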

> **Side Note:** Python provides at least two additional ways to insert the values of variables into strings. `f-string`s are generally preferred to these other options as of Python 3.6 (which introduced `f-strings`), but you will still see these other approaches:
>
> - [C-style string formatting](https://docs.python.org/2.4/lib/typesseq-strings.html) (the standard approach in Python 2, uncommon now)
> - [.format method](https://www.digitalocean.com/community/tutorials/how-to-use-string-formatters-in-python-3)

**Exercise**

*Time:* 4 mins\
*Format:* Pairs\
*Post answers:* Yes

- Create a string called `my_string` that is at least 12 characters long and contains at least one vowel.

In [57]:
my_string = "little house"

- Use a built-in function and comparison operator to make sure that it is at least 12 characters long.

In [58]:
len(my_string) >= 12

True

- Replace all instances of one of the vowel types that occurs in your string with the string `'vowel'`.

In [59]:
my_string.replace("e", "vowel")

'littlvowel housvowel'

- Use the addition operator to concatenate another string to your string.

In [60]:
my_string + " in the big woods"

'little house in the big woods'

- Produce the same result using `my_string` in an `f-string`.

In [61]:
f"{my_string} in the big woods"

'little house in the big woods'

- **BONUS:** Replace all vowels in your string with the string `'vowel'`. (This task is tricky, and it requires skills that have been covered in the prework but not in this lesson.)

*Note*: Simply using one `replace` call for each vowel won't work! (Why not?) Try creating a new, empty string and then appending to it as appropriate while you iterate over your existing string.


A chain of replace methods (e.g. `my_string.replace('a', 'vowel').replace('e', 'vowel').replace('i', 'vowel').replace('o', 'vowel').replace('u', 'vowel')`) won't work because the later calls also replace the vowels inside each "vowel" that an earlier call inserted.

In [62]:
new_string = ""

for char in my_string:
    if char in "aeiou":
        new_string += "vowel"
    else:
        new_string += char

In [63]:
new_string

'lvowelttlvowel hvowelvowelsvowel'
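
A more compact alternative, sketched below, builds a list of pieces and joins them with `str.join`. It relies on a list comprehension and a conditional expression, neither of which is covered in this lesson, and the name `pieces` is illustrative.

```python
my_string = "little house"

# Build the pieces one character at a time, then join them into one string.
pieces = ["vowel" if char in "aeiou" else char for char in my_string]
new_string = "".join(pieces)

print(new_string)  # lvowelttlvowel hvowelvowelsvowel
```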

$\blacksquare$

### Lists

A **list** is a mutable sequence of Python objects, which can have any combination of types.

**Warning:** Do not use `list` as a variable name. `list` is the name of Python's built-in list type, which you also call like a function to create lists, so running e.g. `list = [1, 2, 3, 4]` hides that built-in. For generic examples, use a name like `alist` or `my_list` instead. For real code, use a descriptive variable name that indicates what the list represents (e.g. `model_accuracies` for a list of model accuracy scores).
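
The sketch below shows the kind of failure this causes. The bad assignment is confined to a hypothetical function, `shadowing_demo`, so that the rest of the notebook is unaffected.

```python
def shadowing_demo():
    """Show what happens when a variable named `list` hides the built-in."""
    list = [1, 2, 3, 4]  # Don't do this: `list` now refers to our data.
    try:
        return list(range(3))  # Fails: a list object cannot be called.
    except TypeError as err:
        return f"TypeError: {err}"


print(shadowing_demo())  # TypeError: 'list' object is not callable
print(list(range(3)))    # [0, 1, 2]; the built-in is untouched outside the function
```

If you accidentally assign to `list` in a notebook cell, running `del list` removes your variable and makes the built-in visible again.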

*Create an example list.*

In [64]:
my_list = [1, 2, 3, 4]

Lists are **mutable**, meaning that you can modify them after they are created.

*Replace the item at index 1 with `999`.*

In [65]:
my_list[1] = 999
my_list

[1, 999, 3, 4]
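
One consequence of mutability is that a change made through one name is visible through every other name for the same list. The sketch below contrasts a second name with a slice copy, and then shows that strings, being immutable, reject item assignment. The variable names are illustrative.

```python
original = [1, 2, 3, 4]
alias = original        # a second name for the same list object
snapshot = original[:]  # a new list with the same elements

alias[1] = 999
print(original)  # [1, 999, 3, 4]; the change is visible through both names
print(snapshot)  # [1, 2, 3, 4]; the copy is unaffected

# Strings are immutable, so item assignment raises a TypeError.
greeting = "hello"
try:
    greeting[0] = "j"
except TypeError as err:
    print("TypeError:", err)
```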

*Create a list of strings.*

In [66]:
names = ["Carol", "Anne", "Jessica"]
print(names)

['Carol', 'Anne', 'Jessica']


*Add a new item to the list.*

In [67]:
names.append("Michelle")
names

['Carol', 'Anne', 'Jessica', 'Michelle']

**Exercise**

*Time:* 1 min\
*Format:* Individual\
*Post answers:* No, answer on mic

- What kind of object is `append`? What other object of this kind have we seen in this notebook?


`append` is a **method**, like `replace` on strings.

$\blacksquare$

*Slice `names` to get the names "Anne" and "Jessica".*

In [68]:
names[1:3]

['Anne', 'Jessica']

*Slice `names` to get the names "Carol" and "Jessica".*

In [69]:
names[::2]

['Carol', 'Jessica']

Lists can have mixed types. In fact, a list can contain arbitrary combinations of Python objects.

*Create a list with mixed types.*

In [70]:
[1, "a", 1.0, ["hi"]]

[1, 'a', 1.0, ['hi']]

Python has some useful built-in functions for containers in addition to `len`, such as `sum`, `max`, and `min`.

*Get the sum, max, and min of a list of `int`s.*

In [71]:
nums = [1, 2, 3]

print(sum(nums))
print(max(nums))
print(min(nums))

6
3
1
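
These functions combine naturally, and some of them also work on non-numeric items. A short sketch, with illustrative variable names:

```python
nums = [1, 2, 3]

# sum and len together give the mean.
mean = sum(nums) / len(nums)
print(mean)  # 2.0

# max and min compare strings character by character, so for these
# capitalized names the result matches alphabetical order.
names = ["Carol", "Anne", "Jessica"]
print(max(names))  # Jessica
print(min(names))  # Anne

# sum starts from the int 0, so it fails on a list of strings.
try:
    sum(names)
except TypeError as err:
    print("TypeError:", err)
```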


### `range`

We can create a sequence of evenly spaced integers using the `range()` function. The order of arguments is `start`, `stop`, `step`, just like in slicing.

In [72]:
range(10, 30, 2)

range(10, 30, 2)

`range()` actually produces a special `range` object rather than a list. You can "cast" this object to a list in order to see its contents.

*Cast the `range` object to a list.*

In [73]:
list(range(10, 30, 2))

[10, 12, 14, 16, 18, 20, 22, 24, 26, 28]

`step` is 1 by default.

In [74]:
list(range(10, 20))

[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

If you give `range` just one argument, it interprets that argument as "stop" and uses a default value of 0 for "start".

In [75]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
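
A `range` object supports many of the same operations as a list, including `len`, indexing, and membership tests with `in`. The sketch below also shows a negative step, which counts down. The name `evens` is illustrative.

```python
evens = range(10, 30, 2)

print(len(evens))   # 10
print(evens[0])     # 10
print(evens[-1])    # 28
print(14 in evens)  # True
print(15 in evens)  # False

# A negative step counts down; `stop` is still excluded.
countdown = list(range(5, 0, -1))
print(countdown)    # [5, 4, 3, 2, 1]
```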

**Exercise**

*Time:* 6 mins\
*Format:* Pairs\
*Post answers:* Yes

- Create a list of five elements.

In [76]:
my_list = [1, 2, "a", "3", 7]

- Display the last three elements.

In [77]:
my_list[-3:]

['a', '3', 7]

- Insert two new elements at the end of the list (one after the other).

In [78]:
my_list.append("oops")
my_list.append("big oops")

- Print just the elements of your list that have odd-numbered indices.

In [79]:
my_list[1::2]

[2, '3', 'oops']

- Create a list of all of the odd positive integers less than 10.

In [80]:
list(range(1, 10, 2))

[1, 3, 5, 7, 9]

- **BONUS:** Create a range object that generates all of the numbers from 1 to 100, inclusive; then slice it to get every fifth number starting with 17 and ending with 82. You might want to cast the result to a list so that you can confirm your solution.

In [81]:
list(range(1, 101)[16:82:5])

[17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77, 82]

$\blacksquare$

### Tuples

Tuples are like lists, but they are **immutable**.

Immutability has two big advantages:

1. **Safety:** Immutability eliminates a major source of bugs in which one part of your program changes an object in a way that another part doesn't expect.
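2. **Hashability:** Python's built-in immutable types (such as `int`, `str`, and tuples whose items are all hashable) are "hashable," meaning that Python can compute a fixed value from their contents and use it as a sort of address to find them later. Built-in mutable containers such as lists and dictionaries are not hashable. Only hashable objects can be used in certain roles that require rapid lookups, such as dictionary keys or items in sets.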
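
The built-in `hash` function exposes this directly. In the sketch below, `point` and `visited` are illustrative names; the set literal `{...}` without colons is covered only briefly in this lesson.

```python
point = (10, 20)

# Equal tuples produce equal hash values.
print(hash(point) == hash((10, 20)))  # True

# Tuples can be stored in a set; lists cannot.
visited = {(0, 0), (10, 20)}
print(point in visited)  # True

try:
    hash([10, 20])
except TypeError as err:
    print("TypeError:", err)  # TypeError: unhashable type: 'list'
```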

*Create a tuple of two numbers to represent a point in the `(X, Y)` plane.*

In [82]:
point = (10, 20)
point

(10, 20)

Parentheses are not strictly necessary for creating a tuple, but they help ensure the correct order of operations.

*Create the same tuple without using parentheses.*

In [83]:
10, 20

(10, 20)

Tuples can be sliced and indexed, just like lists and strings.

In [84]:
point[0]

10

You cannot append to a tuple -- why not?

*Uncomment and run the cell below to see the error you get when you try to append to a tuple.*

In [85]:
# point.append(30)

You can "unpack" a container by assigning it to a tuple of variables with the same length as the container.

*Unpack `point` into `x` and `y`. Put parentheses around `x` and `y`.*

In [86]:
(x, y) = point

print(x)
print(y)

10
20


*Unpack `"hi"` into `a` and `b`. Do not put parentheses around `a` and `b`.*

In [87]:
a, b = "hi"

print(a)
print(b)

h
i
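
Unpacking also gives a concise way to swap two variables, and it fails loudly when the lengths do not match. A minimal sketch:

```python
x, y = 10, 20

# Swap two variables without a temporary variable.
x, y = y, x
print(x, y)  # 20 10

# The number of names must match the number of items.
try:
    a, b = "hey"
except ValueError as err:
    print("ValueError:", err)  # ValueError: too many values to unpack (expected 2)
```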


### Dictionaries

A *list* stores values in an **ordered sequence** of cubbyholes that we access by **position**.

A *dictionary* stores values in a **collection** of cubbyholes that we access by a name that we call a **key**.

#### Example

*Create a dictionary mapping English color words to the corresponding Spanish words. Misspell one so that we can correct it later.*

In [88]:
color_translations = {"red": "rojo", "green": "verde", "blue": "azule"}

*Retrieve the value for "red."*

In [89]:
color_translations["red"]

'rojo'

*Add an entry for "yellow."*

In [90]:
color_translations["yellow"] = "amarillo"

Dictionary keys are unique. Assigning a value to a key that already exists will overwrite its current value rather than creating a second entry with the same key.

*Correct our misspelling.*

In [91]:
color_translations["blue"] = "azul"
color_translations

{'red': 'rojo', 'green': 'verde', 'blue': 'azul', 'yellow': 'amarillo'}
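
Looking up a key that is not in the dictionary raises a `KeyError`. The sketch below shows two common ways to handle that case: testing membership with `in` and using the `dict.get` method, which returns `None` or a default of your choice instead of raising. It redefines a smaller `color_translations` so that it runs on its own.

```python
color_translations = {"red": "rojo", "green": "verde", "blue": "azul"}

# Check for a key before looking it up.
print("red" in color_translations)     # True
print("purple" in color_translations)  # False

# Square-bracket lookup raises KeyError for a missing key...
try:
    color_translations["purple"]
except KeyError as err:
    print("KeyError:", err)

# ...whereas .get returns None, or a default that you supply.
print(color_translations.get("purple"))             # None
print(color_translations.get("purple", "unknown"))  # unknown

# len counts key-value pairs.
print(len(color_translations))  # 3
```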

#### Hashability

Dictionary *values* can be anything, but dictionary *keys* have to be hashable. For instance, Python lists are not hashable, so they cannot be dictionary keys.

*Uncomment and run the cell below to see the error you get when you try to use a list as a dict key.*

In [92]:
# color_translations[["purple", "violet"]] = "morado"

Tuples are hashable, so they can be dictionary keys.

*Use a tuple as the key instead.*

In [93]:
color_translations[("purple", "violet")] = "morado"
color_translations

{'red': 'rojo',
 'green': 'verde',
 'blue': 'azul',
 'yellow': 'amarillo',
 ('purple', 'violet'): 'morado'}

A tuple key doesn't make much sense here, so let's delete it.

In [94]:
del color_translations[("purple", "violet")]
color_translations

{'red': 'rojo', 'green': 'verde', 'blue': 'azul', 'yellow': 'amarillo'}
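
Tuple keys are a more natural fit when each key has several parts, such as a grid position. The example below is hypothetical and not part of the color dictionary: it maps `(row, column)` positions on a tic-tac-toe board to the mark placed there.

```python
# Map (row, column) positions to the mark placed there.
board = {(0, 0): "X", (1, 1): "O"}
board[(2, 2)] = "X"

print(board[(1, 1)])    # O
print((0, 1) in board)  # False
print(len(board))       # 3
```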

## For Reference: How Python Uses Various Sorts of Brackets

- `[]`: lists, indexing/lookup/slicing, list comprehensions.
- `()`: order of operations, tuples, function and method calls and definitions, generator expressions.
- `{}`: dictionaries and sets.
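
The sketch below gathers one small instance of most of these uses in a single place. All variable names are illustrative.

```python
nums = [1, 2, 3]                  # [] creates a list
first = nums[0]                   # [] indexes
first_two = nums[0:2]             # [] slices
squares = [n ** 2 for n in nums]  # [] list comprehension

point = (10, 20)                  # () creates a tuple
total = (1 + 2) * 3               # () controls order of operations
size = len(nums)                  # () calls a function

ages = {"Ann": 31}                # {} with colons creates a dict
letters = {"a", "b"}              # {} without colons creates a set
age = ages["Ann"]                 # [] looks up a dict key
```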

<img src="../assets/images/containers.png" width=600>

## Summary

- Python has the following built-in types (among others):
    - Single-element:
        - `int` for whole numbers
        - `float` for decimal numbers
        - `bool` for `True` and `False` values
    - Container:
        - `str` for sequences of characters
        - `list` for holding arbitrary objects
        - `tuple`, which is similar to `list` but immutable
        - `dict` for storing arbitrary objects in named "cubbyholes."
- Ordered container types such as `str`, `list`, and `tuple` allow you to select items by position.
- The built-in `range` function returns a special `range` object containing all of the integers from a specified starting number (by default 0) to a specified stopping number (not included) with a specified step size (by default 1).
- Python has built-in functions such as `len` that take in objects, perform some operations on them, and return a result.
- Python types have built-in methods such as `str.replace`, which are basically functions that are attached to instances of the given type.

| Type | Create with... | Mutable? | Supports indexing? | Requires hashable elements? |
|--------|--------------------------------------|----------|---------------------------------|----------------------------------|
| string | ' ' or " " | No | Position-based | No |
| list | [] | Yes | Position-based | No |
| tuple | () or just commas | No | Position-based | No |
| dict | {} with colons between key and value | Yes | Key-based | Keys yes, values no |