<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">
 
# Introduction to Python Fundamentals
 
_Authors: Kiefer Katovich (San Francisco), Dave Yerrington (San Francisco), Joseph Nelson (Washington, D.C.), Sam Stack (Washington, D.C.)_
 
---

## Why Python?

### Pros

- Stable (released in 1991)
- Has great data science ecosystem
- General-purpose
- Open-source
- Readable
- Dynamic typing enables fast initial development
- Has a (relatively) gentle learning curve

### Cons

- Slower than lower-level languages such as C
- Dynamic typing can lead to bugs that only surface at runtime
- Difficult to integrate into JVM-based production systems
- Arguably not as nice as R for some core data analysis tasks

## JupyterLab

Before we get started, let's go over interacting with IPython in JupyterLab.

### Running Code in Jupyter

Code cells are run by pressing `shift + enter` or using the `Run` button in the toolbar.

In [1]:
# This is a cell.

In [2]:
# Assigning a variable:
v = 1

In [3]:
# If the last line in a cell returns a result, Jupyter prints it to an output field automatically.
v
v

1

In [4]:
# You can also use the standard Python `print` function within Jupyter.
print(v)
print(v)

1
1


### Running Terminal Commands in Jupyter

You can run terminal commands in Jupyter by starting a code cell with `!`.

Example:

In [5]:
!ls

solutions_control_flow.ipynb solutions_types.ipynb


### Jupyter Modes

Jupyter has an "edit mode" and a "command mode."

A white cell with a cursor inside it indicates that you are in edit mode. In this mode, you can edit the text inside the cell. Press `esc` or `control + m` to switch to command mode.

A grayed-out cell with no cursor inside it indicates that you are in command mode. In this mode, you can use keyboard shortcuts to perform various commands. Press `enter` or click with the mouse inside the cell to switch to edit mode.

Here are some of the most useful commands accessible in command mode:

- A: create new cell above
- B: create new cell below
- D, D (press d twice): delete current cell
- Z: undo cell deletion
- M: switch cell to Markdown
- Y: switch cell to Python

### Getting Out of Trouble

- Use the "Stop" button in the toolbar at the top of the notebook to interrupt code execution.
- If you get into a bad state and want to start your session over, press the circular arrow "Restart Kernel" button next to "Stop".

### Managing Notebooks

You can manage your notebooks in the "Running" tab on the far left of the JupyterLab interface. Be sure to shut down notebooks you are done with to avoid running out of memory!

`CTRL + c` in the terminal will shut down your entire Jupyter server.

### Understanding Jupyter

Jupyter uses a client-server architecture: the notebook that runs in your browser is the front-end **client**, which sends **requests** to the back-end **server**. The server receives a request, does some processing, and returns a **response**. The client notebook then displays that response to you.

In this case, the client and server processes are **logically separate** but are both running on **your local computer**. Even though you are working in a web browser, you are not connecting to remote machines through the internet. You are just talking to a different process on the same machine.

One of the advantages of the client-server design is that you **can** interact with remote Jupyter servers over the internet (e.g. on jupyter.org) in exactly the same way you interact with a Jupyter server that is running locally.

## Variables

Variables are names that have been assigned to specific objects.

Some variable names are better than others:

### Restrictions

- Cannot start with numbers (e.g., `2`, `10_data_points`).
- Cannot match names of Python keywords (e.g., `for`, `and`, `elif`).
- Cannot contain spaces or periods.

### Best Practices
- Should be *descriptive* and *unambiguous*.
- Shorter is better all else being equal, but clarity comes first.

### Python Convention

Should be `snake_case` (all lowercase, with underscores between words).
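
As a rough illustration of the rules above, Python itself can tell you whether a name is legal. This sketch uses the string method `isidentifier` and the standard-library `keyword` module; the variable names are made up for the example.

```python
import keyword

# Descriptive snake_case names (hypothetical examples)
num_data_points = 10
model_accuracy = 0.93

# `isidentifier` checks the character rules for a name
starts_with_letter = 'num_data_points'.isidentifier()
starts_with_digit = '10_data_points'.isidentifier()
contains_space = 'data points'.isidentifier()

# `keyword.iskeyword` checks whether a name is reserved by Python
is_reserved = keyword.iskeyword('elif')

print(starts_with_letter)  # True
print(starts_with_digit)   # False: starts with a digit
print(contains_space)      # False: contains a space
print(is_reserved)         # True: reserved, so it cannot be a variable name
```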

## Types in Programming

The *type* of an object in a programming language tells the computer what operations are defined for that object and how they are defined. For example:

In [6]:
# Adding two ints works as you would expect
# /scrub/
1 + 1

2

In [7]:
# Adding two floats also works as you would expect. How is the result different this time?
# /scrub/
1.0 + 1.0

2.0

In [8]:
# For many container objects, including strings, `+` concatenates.
# /scrub/
'1' + '1'

'11'

In [9]:
# `+` with lists
# /scrub/
[1] + [1]

[1, 1]

In [10]:
# `+` is not defined for dicts
# /scrub/
{'a': 1} + {'b': 5}

TypeError: unsupported operand type(s) for +: 'dict' and 'dict'

**Note:** Learning to read error messages is an important skill to develop.

Start at the *bottom* to see what *type* of error was raised and a descriptive error message. These error messages will often seem very cryptic at first, but they will become clearer as you learn more about how Python works.

Just above that, you can see the line that triggered the error and a few lines above it. Sometimes the line you need to fix is actually just above the line that triggered the error -- for instance, if you missed a closing bracket on line 2, then you would not get an error until line 3 because Python doesn't know until line 3 that the appropriate closing bracket is not coming.
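
The two pieces of information at the bottom of a traceback, the error type and its message, can also be inspected in code. The sketch below uses `try`/`except`, which this lesson does not cover, purely to show those two pieces for the dictionary example above.

```python
try:
    {'a': 1} + {'b': 5}  # `+` is not defined for dicts
except TypeError as err:
    error_type = type(err).__name__  # the kind of error, e.g. 'TypeError'
    error_message = str(err)         # the descriptive message that follows it

print(error_type)
print(error_message)
```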

In [11]:
# The built-in function `len` returns the number of objects in many container types
# /scrub/
len('11')

2

In [12]:
# `len` is not defined for ints
# /scrub/
len(11)

TypeError: object of type 'int' has no len()

## Overview of Common Types in Python

### Single Elements

- **Integers:** Whole numbers ranging from negative infinity to infinity, such as 1, 0, -5, etc.
- **Floats:** Short for "floating point number;" usually used with decimals, such as 2.8 or 3.14159.

### Collections

- **Strings:** A sequence of characters, e.g., "The fox is quick." (We often think of strings as single elements, but we can slice them and get their lengths just like we can for tuples and lists.)
- **Tuples:** An ordered sequence with a fixed number of elements, e.g., `x = (1, 2, 3)`. Note the parentheses. Once you've defined `x = ("Kirk", "Picard", "Spock")`, you can't change it.
- **Lists:** An ordered sequence without a fixed number of elements, e.g., `x = [1, 2, 3]`. Note the square brackets. `x = ["Lord", "of", "the", "Rings"]` — this can be changed as you like.
- **Dictionaries**: A collection of key-value pairs that you access by key rather than by position, e.g., `x = {'Mark': 'Twain', 'Apples': 5}`. To retrieve each value (the part after each colon), use its key (the part before each colon). For example, `x['Apples']` retrieves the value `5`.

Strings, lists, and tuples have an inherent order (the first element is at index `0`, the second element is at index `1`, etc.), and we can retrieve each element by that ordinal number (such as `x[0]` or `x[100]`). Dictionaries are not accessed by position (so `x[0]` will fail unless `0` happens to be one of the keys); instead, they use the key to return the associated value.

**Example**:
- A **sign-up list** is similar to a Python list or tuple: it supports referring to a person in terms of when he or she signed up (e.g., "the eighth person to sign up").
- An **address book** is more like a Python dictionary: it supports looking up a person by name (e.g., "the contact info for Bill Personson").
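
The analogy can be sketched in code as follows. The names and contact details are hypothetical placeholders.

```python
# A sign-up list: order matters, so we look people up by position
signups = ['Ana', 'Bill', 'Chen']
first_to_sign_up = signups[0]

# An address book: we look people up by name (the key)
address_book = {'Bill Personson': 'bill@example.com'}
bills_contact = address_book['Bill Personson']

print(first_to_sign_up)  # Ana
print(bills_contact)     # bill@example.com
```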

In [13]:
# Assigning a float:
# /scrub/
x = 1.0
type(x)

float

In [14]:
# Assigning an int:
# /scrub/
y = 1
type(y)

int

In [15]:
# Assigning a string:
# /scrub/
z = '1'
type(z)

str

**Recall:** `x = 1` does not *assert that* `x` is `1` or *check whether* `x` is `1`; instead, it *assigns* the value `1` to the variable `x`.

## Operators

### Arithmetic Operators

Python has special built-in symbols called *operators* for performing common computations.

In [16]:
# /scrub/

print(1 + 2)
print(1 - 2)
print(1 * 2)
print(1 / 2)

3
-1
2
0.5


There is also `//` (floor) division, which rounds the result down to the nearest whole number.

> **Note:** Division works differently in Python 2: dividing two `int` objects with `/` always returns an `int`, so that in Python 2 e.g. `5 / 2 == 2`.

In [17]:
# /scrub/

print(3.0 // 2)
print(-3.0 // 2)

1.0
-2.0


Python uses `**` for exponents.

In [18]:
# /scrub/

2 ** 2

4

The modulo operator `%` gives the remainder from division.

In [19]:
# /scrub/

5 % 2

1
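
Floor division and modulo fit together: multiplying the quotient by the divisor and adding the remainder gives back the original number. The sketch below checks this for a negative number, where rounding down matters most. The values are arbitrary.

```python
a = -7
b = 2

quotient = a // b   # rounds down (toward negative infinity)
remainder = a % b   # takes the sign of the divisor

print(quotient)                       # -4
print(remainder)                      # 1
print(quotient * b + remainder == a)  # True
print(divmod(a, b))                   # (-4, 1): both results in one call
```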

### Booleans and Boolean Evaluation Operators

Booleans come in two values: `True` and `False`.

Booleans are frequently used to filter data or conditions. Sometimes, we may want all countries with populations greater than 4,000,000 or all people named Bob. Both of these result in a `True` or `False` condition that splits our data into the groups we want.

In Python, there are several built-in operators for combining conditions when filtering results:

- `and`: Are both A and B true?
- `or`: Is at least one of A and B true?
- `not`: Is A false?
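
As a small sketch of the filtering idea, the conditions mentioned above can be stored in variables and combined. The record values here are hypothetical.

```python
# Hypothetical values for a single record
population = 5200000
name = 'Bob'

is_large = population > 4000000
is_bob = name == 'Bob'

both = is_large and is_bob
either = is_large or is_bob
not_large = not is_large

print(both)       # True: both conditions hold
print(either)     # True: at least one condition holds
print(not_large)  # False
```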

In [20]:
# /scrub/

4>3

True

In [21]:
# /scrub/

4>3 and 100>0

True

In [22]:
# /scrub/

4>3 and 2>3

False

In [23]:
# /scrub/

True and False

False

In [24]:
# /scrub/

4>3 or 2>3

True

In [25]:
# /scrub/

True or False

True

In [26]:
# /scrub/

not 5>4

False

In [27]:
# /scrub/

not False

True

**Exercise (1 min.)**

- What should the expression `(3>4 or 5<12) and 2>3` evaluate to? Figure out your answer, then confirm it by having Jupyter evaluate that expression.

/scrub/

False

In [28]:
# /scrub/

(3>4 or 5<12) and 2>3

False

$\blacksquare$

### Comparison Operators

- Less than: `<`
- Greater than: `>`
- Less than or equal to: `<=`
- Greater than or equal to: `>=`
- Equals: `==`
- Does not equal: `!=`

**Slack poll.**

```
/poll "Which of these expressions evaluates to `True`?" "`2 > 1`" "`2 < 1`" "`2 > 2`" "`2 < 2`" "`2 >= 2`" "`2 <= 2`" "`2 != 2`" anonymous
```

#### Comparisons for Collections

In general, collections are considered equal when they have the same length and corresponding elements are equal.

In [29]:
# /scrub/

print([1, 2] == [1, 2])
print([1., 2.] == [1, 2])
print([1, 2] == [2, 1])
print([1] == [1, 2])

True
True
False
False
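
The same idea carries over to other collections, with two caveats worth seeing: a list never equals a tuple, even with the same elements, and dictionaries compare by key-value pairs regardless of the order in which they were written. A brief sketch:

```python
same_strings = 'abc' == 'abc'
same_tuples = (1, 2) == (1, 2)
list_vs_tuple = [1, 2] == (1, 2)
same_dicts = {'a': 1, 'b': 2} == {'b': 2, 'a': 1}

print(same_strings)   # True
print(same_tuples)    # True
print(list_vs_tuple)  # False: different types
print(same_dicts)     # True: key order does not matter
```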


## Deeper Dive on Common Python Types

### Strings

**Exercise (1 min.)**

- What are strings?

/scrub/

Strings are essentially any character sequence. They are most often used as a way of storing text. Strings are used frequently, because most of the data that humans create are text-based, such as restaurant reviews or emails.

$\blacksquare$

In [30]:
# Example
# /scrub/
s = "Hello world"
type(s)

str

In [31]:
# Find the length of the string
# /scrub/
len(s)

11

In [32]:
# Use single quotes to define a string
# /scrub/
s = 'Hello world'
type(s)

str

In [33]:
# Define a string that contains a single quote
# /scrub/
s2 = 'I\'m hungry'
print(s2)
s2 = "I'm hungry"
print(s2)

I'm hungry
I'm hungry


There is no general preference in the Python community between single quotes and double quotes, but there is a consensus to use double quotes when your string contains single quotes and vice versa to avoid the need to use the escape character `\`.

In [34]:
# Replacing an element of a string
# /scrub/
s2 = s.replace("world", "test")
print(s2)

Hello test


`.replace()` is an example of a **method** -- a piece of functionality that's built into all objects of a given type.
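
Strings have many other methods. The sketch below shows two of them, `upper` and `split`, and illustrates that string methods return a new object rather than changing the original string.

```python
s = 'Hello world'

shouted = s.upper()  # a new string in upper case
words = s.split()    # a list of the whitespace-separated words

print(shouted)  # HELLO WORLD
print(words)    # ['Hello', 'world']
print(s)        # Hello world (the original is unchanged)
```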

#### String Indexing

In some cases, we may want a part of the string (like the first character for alphabetizing or categorizing). Indexing helps us do that.

We can extract characters at specific index locations in a string using indexing.

In [35]:
# Indexing the first (index 0) character in the string:
# /scrub/
s[0]

'H'

The number you enter after the variable name in brackets (the `[0]`) is called the index (its plural is indices).

Counting in Python uses _zero-based indexing_, meaning that numbering starts with 0 instead of 1.

In [36]:
# This is called "slicing." We start at the left index 
#   and go up to but not include the right index.

# Objects at indexes 0, 1, and 2:
# /scrub/
s[0:3]

'Hel'

**Note:** If you are curious, [this article](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html) describes some benefits of including the left index but not the right index and using zero-based indexing.

In [37]:
# From index 6 up to the end of the string:
# /scrub/
s[6:]

'world'

In [38]:
# No start or end specified:
# /scrub/
s[:]

'Hello world'

In [39]:
# Use negative numbers to index from the right side
# /scrub/
s[-1]

'd'

In addition to specifying a range, you can include a step size or character skip rate. This might be helpful if you want every other letter, for example. 

In [40]:
# Every second character starting at index 0 and stopping before index 10
# /scrub/
s[0:10:2]

'Hlowr'

In [41]:
# Define a step size of 2; i.e., every other character
# /scrub/
s[::2]

'Hlowrd'

In [42]:
# The same, but for a list of numbers
# /scrub/
[0, 1, 2, 3, 4, 5, 6][::2]

[0, 2, 4, 6]

#### Concatenating and Interpolating

In [43]:
# Adding strings with `+` returns their concatenation
# /scrub/
x = 'Hello '
y = 'world'

x + y

'Hello world'

In [44]:
# Conversion from int to str is required!
# /scrub/
dice_roll = 3

'You rolled a ' + str(dice_roll) + '.'

'You rolled a 3.'

In [45]:
# Multiplying strings concatenates them the specified number of times
# /scrub/
x = 'Hello '
x * 5

'Hello Hello Hello Hello Hello '

In [46]:
# You can also use `f-strings` to insert the values of variables into strings
# (Python 3.6+ only)
# /scrub/
dice_roll = 3

f'You rolled a {dice_roll}.'

'You rolled a 3.'

**Note:** Python provides at least two additional ways to insert the values of variables into strings. `f-string`s are generally preferred to these other options as of Python 3.6 (which introduced `f-strings`), but you will still see these other approaches:
- [C-style string formatting](https://docs.python.org/2.4/lib/typesseq-strings.html)
- [.format method](https://www.digitalocean.com/community/tutorials/how-to-use-string-formatters-in-python-3)
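
For comparison, here is a minimal sketch of the dice example written all three ways. Each approach produces the same string.

```python
dice_roll = 3

c_style = 'You rolled a %d.' % dice_roll            # C-style formatting
with_format = 'You rolled a {}.'.format(dice_roll)  # .format method
with_fstring = f'You rolled a {dice_roll}.'         # f-string

print(c_style)
print(c_style == with_format == with_fstring)  # True
```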

**Exercise (4 mins.)**

- Create your own string of at least 12 characters, including at least one vowel, and assign it to a variable called `my_string`.

In [47]:
# /scrub/
my_string = 'auldfhiuayiwq'

- Use a built-in function and comparison operator to make sure that it is at least 12 characters long.

In [48]:
# /scrub/
len(my_string) >= 12

True

- Replace all instances of one of the vowel types that occurs in your string with the string `'vowel'`.

In [49]:
# /scrub/
my_string.replace('a', 'vowel')

'voweluldfhiuvowelyiwq'

- Use the addition operator to concatenate another string to your string.

In [50]:
# /scrub/
my_string + 'cottage'

'auldfhiuayiwqcottage'

- **BONUS:** Replace all vowels in your string with the string `'vowel'`. (This task is tricky, and it requires skills that have been covered in the prework but not in this lesson.)

*Note*: Simply using one `replace` call for each vowel won't work! (Why not?) Try creating a new, empty string and then appending to it as appropriate as you iterate over your existing string.

/scrub/

A chain of replace methods (e.g. `my_string.replace('a', 'vowel').replace('e', 'vowel').replace('i', 'vowel').replace('o', 'vowel').replace('u', 'vowel')`) won't work because it replaces one of the vowels in "vowel" with the word "vowel."
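
To see the problem concretely, here is a sketch with a short, hypothetical word and only two chained calls. The second call rewrites the `e` inside the `'vowel'` that the first call inserted.

```python
word = 'cat'  # hypothetical short example

after_first = word.replace('a', 'vowel')
after_second = after_first.replace('e', 'vowel')

print(after_first)   # cvowelt
print(after_second)  # cvowvowellt
```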

In [51]:
# /scrub/
new_string = ''

for char in my_string:
    if char in 'aeiou':
        new_string += 'vowel'
    else:
        new_string += char

In [52]:
# /scrub/
new_string

'vowelvowelldfhvowelvowelvowelyvowelwq'

$\blacksquare$

### Lists

A **list** is a mutable sequence of Python objects, which can have any combination of types.

**Warning:** Do not use `list` as a variable name. `list` is the name of a built-in Python type, so running e.g. `list = [1, 2, 3, 4]` shadows it, and later calls such as `list(range(3))` will fail. For generic examples, use a name like `alist` or `my_list` instead. For real code, use a descriptive variable name that indicates what the list represents (e.g. `model_accuracies` for a list of model accuracy scores).

In [53]:
# Example
# /scrub/
my_list = [1, 2, 3, 4]

In [54]:
# Change list contents
# /scrub/
my_list[1] = 999
my_list

[1, 999, 3, 4]

In [55]:
# List of strings:
# /scrub/
names = ['Carol', 'Anne', 'Jessica']
print(names)

['Carol', 'Anne', 'Jessica']


In [56]:
# Add a new item to a list
# /scrub/
names.append('Michelle')
names

['Carol', 'Anne', 'Jessica', 'Michelle']

**Exercise (1 min., post to Slack right away)**

- What kind of object is `append`? What other object of this kind have we seen in this notebook?

/scrub/

`append` is a **method**, like `replace` on strings.

$\blacksquare$

In [57]:
# slice `names` to get the names "Anne" and "Jessica"
# /scrub/
names[1:3]

['Anne', 'Jessica']

In [58]:
# slice `names` to get the names "Carol" and "Jessica"
# /scrub/
names[::2]

['Carol', 'Jessica']

In [59]:
# Lists can have mixed types
# /scrub/
[1, 'a', 1.0, ['hi']]

[1, 'a', 1.0, ['hi']]

In [60]:
# We can create a list of values in a range using the range() function.
# The order of arguments is `start`, `stop`, `step`.
# /scrub/
range(10, 30, 2)

range(10, 30, 2)

In [62]:
# range() produces a "range object," a lazy sequence that computes its values
# on demand (similar in spirit to a generator, though not technically one).
# The main thing you need to know for now is that you can "cast" it to a list
# /scrub/
list(range(10, 30, 2))

[10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
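
A range object can also be used directly, without casting it to a list first. The sketch below shows that it supports `len`, indexing, and membership tests.

```python
evens = range(10, 30, 2)

count = len(evens)
first = evens[0]
last = evens[-1]

print(count)        # 10
print(first)        # 10
print(last)         # 28
print(14 in evens)  # True
print(15 in evens)  # False
```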

**Exercise (5 mins.)**

- Create a list of five elements.

In [63]:
# /scrub/
my_list = [1, 2, 'a', '3', 7]

- Print the last three elements.

In [64]:
# /scrub/
my_list[-3:]

['a', '3', 7]

- Insert two new elements at index 2 (one after the other)

In [65]:
# /scrub/
my_list.insert(2, 'oops')
my_list.insert(2, 'big oops')

- Add one element to the end.

In [66]:
# /scrub/
my_list.append('that is better')

- Take out one element of your choice.

In [67]:
# /scrub/
my_list.remove('oops')

- Print just the elements of your list that have odd-numbered indices.

In [68]:
# /scrub/
my_list[1::2]

[2, 'a', 7]

- **BONUS:** Create a range object that generates all of the numbers from 1 to 100, inclusive; then cast it to a list and slice the list to get every fifth number starting with 17 and ending with 82.

In [69]:
# /scrub/
list(range(1, 101))[16:82:5]

[17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77, 82]

$\blacksquare$

### Dictionaries

A *list* stores values in an **ordered sequence** of cubbyholes that we access by **position**.

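A *dictionary* stores values in a set of cubbyholes that we access not by position but by a name that we call a **key**. (Since Python 3.7, dictionaries remember the order in which keys were inserted, but you still cannot look up a value by its position.)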

In [70]:
# Example
# /scrub/
params = {'key1' : 1.0,
          'key2' : 2.0,
          'key3' : 3.0,}

In [71]:
# Retrieve the value for key2 in the params dictionary:
# /scrub/
params['key2']

2.0

In [72]:
# Add a new dictionary entry
# /scrub/
params['key4'] = 'D'

In [73]:
# Reassign the values of key-value pairs
# /scrub/
params['key1'] = 'A'
params['key2'] = 'B'

In [74]:
# Dictionaries also have methods.

# Convert a dictionary to a list of tuples (key-value pairs).
# /scrub/
list(params.items())

[('key1', 'A'), ('key2', 'B'), ('key3', 3.0), ('key4', 'D')]
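
A few other commonly used dictionary tools are sketched below, using the same `params` contents as above. The key `'key5'` and the fallback value `'missing'` are made up for the example.

```python
params = {'key1': 'A', 'key2': 'B', 'key3': 3.0, 'key4': 'D'}

keys = list(params.keys())        # just the keys
values = list(params.values())    # just the values
has_key2 = 'key2' in params       # membership tests check the keys
fallback = params.get('key5', 'missing')  # a default instead of an error

print(keys)      # ['key1', 'key2', 'key3', 'key4']
print(values)    # ['A', 'B', 3.0, 'D']
print(has_key2)  # True
print(fallback)  # missing
```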

### Tuples

Tuples are like lists, but they are **immutable**.

Mutability can be helpful, but immutability has two big advantages:

1. **Safety:** Immutable objects don't let you create bugs by changing them and not realizing it. 
2. **Hashability:** Only immutable objects are "hashable," which means among other things that only immutable objects can serve as dictionary keys.

> **Note:** Roughly speaking, "hashing" an object means running it through a complicated **hash function** that generates an "address." For instance, Python hashes each dictionary key to generate an "address" where it stores the associated object. When you ask for the value associated with that key, Python simply runs that key through the hashing function and checks the returned location. This approach allows dictionary lookups to be almost instantaneous regardless of the dictionary size. **However, it would not work if dictionary keys were mutable,** because in that case we would not get the same "address" when we applied the hash function to the same object before and after a mutation.
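
The hashability point can be checked directly: a tuple works as a dictionary key, while a list does not. This sketch uses `try`/`except`, which is not covered in this lesson, only to capture the error message. The `locations` dictionary is a hypothetical example.

```python
# A tuple is hashable, so it can be a dictionary key
locations = {(10, 20): 'home'}
print(locations[(10, 20)])  # home

# A list is mutable and therefore not hashable
try:
    bad = {[10, 20]: 'home'}
except TypeError as err:
    message = str(err)

print(message)  # unhashable type: 'list'
```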

In [75]:
# Example
# /scrub/
point = (10, 20)

In [76]:
# tuples can be indexed and sliced, just like lists and strings:
# /scrub/
point[0]

10

In [77]:
# you cannot append to a tuple -- why not?
# /scrub/
point.append(30)

AttributeError: 'tuple' object has no attribute 'append'

In [78]:
# It is often convenient to "unpack" a tuple or other container into multiple variables
# /scrub/
x, y = point

print(x)
print(y)

10
20
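
Unpacking enables two other common idioms, sketched here: swapping two variables without a temporary variable, and using `*` to collect the leftover elements into a list.

```python
point = (10, 20)
x, y = point

# Swap two variables in one line
x, y = y, x
print(x, y)  # 20 10

# Collect everything after the first element with `*`
first, *rest = [1, 2, 3, 4]
print(first)  # 1
print(rest)   # [2, 3, 4]
```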


## For Reference: How Python Uses Various Sorts of Brackety Things

- `[]`: lists, indexing/lookup/slicing, list comprehensions.
- `()`: order of operations, tuples, function and method calls and definitions, generator expressions.
- `{}`: dictionaries and sets.
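
A compact sketch of each use listed above. Comprehensions, generator expressions, and sets are not covered in this lesson, so treat those lines as a preview. The variable names are arbitrary.

```python
# [] : list literals, list comprehensions, indexing, and slicing
squares = [n ** 2 for n in range(4)]
second_square = squares[1]
first_two = squares[:2]

# () : order of operations, tuples, function calls, generator expressions
grouped = (1 + 2) * 3
point = (10, 20)
total = sum(n ** 2 for n in range(4))

# {} : dictionaries and sets
ages = {'Ana': 31}
unique = {1, 1, 2}

print(squares)       # [0, 1, 4, 9]
print(grouped)       # 9
print(total)         # 14
print(len(unique))   # 2: sets drop duplicates
```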