# Day One

In today's workshop, we'll learn how to combine data types into structures and how to use them for specific purposes. We will also cover looping and interacting with operating systems. Let's get started.

## Data Model

>Objects are Python’s abstraction for data. All data in a Python program is represented by objects or by relations between objects.

Every object in Python has a **type**, a **value**, and an **identity**. We've already seen several data types, such as `int`, `float`, and `str`. An object's type determines its supported operations as well as the possible values it can take.

In some cases, an object's value can change. We call these types of objects *mutable*. Objects whose values cannot be changed are known as *immutable*. The object's type determines its mutability. Numbers and strings, for example, are immutable; lists and dictionaries, which we'll cover shortly, are mutable.

To make this concrete, let's describe what an object's identity is. This can be thought of as an object's address in memory. Specifically, it's the memory address for the *value* of the object. Once an object has been created, its identity never changes.

In [1]:
x = 'hello'

In [2]:
hex(id(x))

'0x1097fc618'

The variable `x`'s identity, or memory address, is `0x1097fc618` (represented as a hexadecimal string). Note that the memory addresses will be different each time this code is run.

What happens if we create a new variable, `y`, and set it equal to `x`?

In [3]:
y = x

In [4]:
hex(id(y))

'0x1097fc618'

In [5]:
hex(id(x))

'0x1097fc618'

The address in memory is the same because both variables *point* to (or reference) the same *value*.

Now, let's make `x` take on some other value.

In [6]:
x = 'goodbye'

In [7]:
hex(id(x))

'0x1097fc6f8'

Now, the address *is* different.

Let's see what happens if we set `x` to equal `'hello'` once more.

In [8]:
x = 'hello'

In [9]:
hex(id(x))

'0x1097fc618'

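`x` is once again pointing to the memory address associated with `'hello'`. (That string object still exists because `y` references it, and CPython reuses such string literals instead of creating a new object. This reuse is an implementation detail, not a language guarantee.)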

What does this have to do with mutability? It seems as though we were actually able to change `x`'s value. To answer this, we'll show an example using a mutable object&mdash;a list in this case.

In [10]:
a = [1, 2, 3]

In [11]:
hex(id(a))

'0x1097f8d48'

In [12]:
a.append(4)
a

[1, 2, 3, 4]

In [13]:
hex(id(a))

'0x1097f8d48'

Notice what happened. We added `4` to the list, but the memory address *did not* change. This is what it means to be mutable. The value at memory address `0x1097f8d48` was originally `[1, 2, 3]`, but is now `[1, 2, 3, 4]`. The address in memory for this object's value will never change.

In [14]:
a.append('#python')
a

[1, 2, 3, 4, '#python']

In [15]:
hex(id(a))

'0x1097f8d48'
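To tie this back to the earlier string example, here is a minimal sketch contrasting the two behaviors. The names `greeting`, `alias`, `numbers`, and `numbers_alias` are made up for illustration. Giving a name a new string value binds it to a different object and leaves the original string untouched, whereas appending to a list changes the one object that both names share.

```python
greeting = 'hello'
alias = greeting                  # a second name bound to the same string object

greeting = greeting + ' world'    # builds a new string; the old one is unchanged
print(alias)                      # hello
print(greeting)                   # hello world

# Strings do not support in-place modification at all
try:
    alias[0] = 'j'
except TypeError as err:
    print('TypeError:', err)

numbers = [1, 2, 3]
numbers_alias = numbers           # a second name bound to the same list object
numbers.append(4)                 # modifies that one list in place
print(numbers_alias)              # [1, 2, 3, 4]
```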

Now let's see what happens when we assign our list `a` to a new variable `b`.

In [16]:
b = a

In [17]:
b

[1, 2, 3, 4, '#python']

In [18]:
hex(id(b))

'0x1097f8d48'

That makes sense. `a` and `b` both reference the same object&mdash;`[1, 2, 3, 4, '#python']`.

>Assignment statements in Python do not copy objects, they create bindings between a target and an object.

If we modify `b`, what will happen to `a`?

In [19]:
b[-1] = 'Python'

In [20]:
b

[1, 2, 3, 4, 'Python']

In [21]:
a

[1, 2, 3, 4, 'Python']

In [22]:
hex(id(a)) == hex(id(b))

True

The changes made to `b` have affected `a` because they both point to the same data. It's possible that this behavior is unwanted. As a solution, we can make a copy of the object so that modifying one does not affect the other. To do so, we can use the standard library's `copy` module.

In [23]:
import copy

In [24]:
c = copy.copy(a)

This is referred to as making a *shallow* copy. While the values in `a` and `c` are the same, their respective memory addresses are different.

In [25]:
hex(id(a)) == hex(id(c))

False

A shallow copy creates a new container (a list in this case)&mdash;which is why the addresses in memory are different&mdash;with *references* to the *contents* of the original object.

In [26]:
hex(id(a[-1]))

'0x10786da08'

In [27]:
hex(id(c[-1]))

'0x10786da08'

The addresses in memory for the individual elements are the same for both lists. Because we've made a copy, though, we can now modify one list without affecting the other.

In [28]:
c[-1] = 'PYTHON'

In [29]:
c

[1, 2, 3, 4, 'PYTHON']

In [30]:
a

[1, 2, 3, 4, 'Python']
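As a side note, Python's identity operator `is` asks directly whether two names refer to the same object, which saves comparing `id()` values by hand. The sketch below uses hypothetical names (`original`, `alias`, `duplicate`) to contrast an alias with a shallow copy.

```python
import copy

original = [1, 2, 3]
alias = original                   # another name for the same list
duplicate = copy.copy(original)    # a new list holding the same elements

print(alias is original)           # True: same object
print(duplicate is original)       # False: different objects
print(duplicate == original)       # True: equal contents

duplicate.append(4)                # only the copy changes
print(original)                    # [1, 2, 3]
```

Here `==` compares values while `is` compares identities, so a copy can be equal to the original without being the same object.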

What if we were dealing with nested mutable objects? For this, we'll use a dictionary.

In [31]:
d0 = {'key' : {'nested' : 'thing'}}
d1 = copy.copy(d0)

In [32]:
d1

{'key': {'nested': 'thing'}}

In [33]:
d1['key']['nested'] = 'dict'

In [34]:
d0 == d1

True

In [35]:
d0

{'key': {'nested': 'dict'}}

Our intention was to change `d1`, but `d0` was also changed. This is because a shallow copy only references the original's contents; it doesn't copy them. To copy the nested objects as well, the `copy` module provides the `deepcopy()` function. Let's try that again.

In [36]:
d0 = {'key' : {'nested' : 'thing'}}
d1 = copy.deepcopy(d0)
d1['key']['nested'] = 'dict'

In [37]:
d0 == d1

False

In [38]:
d0

{'key': {'nested': 'thing'}}

In [39]:
d1

{'key': {'nested': 'dict'}}
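The difference between the two kinds of copy can also be checked directly by comparing the identities of the nested objects. This is a small sketch with made-up names (`original`, `shallow`, `deep`), not part of the session above.

```python
import copy

original = {'key': {'nested': 'thing'}}
shallow = copy.copy(original)
deep = copy.deepcopy(original)

# The shallow copy shares the inner dictionary; the deep copy has its own
print(shallow['key'] is original['key'])   # True
print(deep['key'] is original['key'])      # False

original['key']['nested'] = 'changed'
print(shallow['key']['nested'])            # changed
print(deep['key']['nested'])               # thing
```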

Now that we've learned about mutability and copying objects, let's dive into data structures.

## Data Structures

A data structure can be thought of as a "container" for storing data that includes functions, called "methods," that are used to access and manipulate that data. Python has several built-in data structures.

### Basics

#### Lists

A list is a sequence of values. The values are called elements (or items) and can be of any type&mdash;integer, float, string, boolean, etc.

As a simple example, consider the following list.

In [40]:
[1, 2, 3]

[1, 2, 3]

Notice how the list was constructed. We used square brackets around the list elements.

Let's look at a few more examples.

In [41]:
[1.0, 8.0, 6.8]

[1.0, 8.0, 6.8]

In [42]:
['this', 'is', 'also', 'a', 'valid', 'list']

['this', 'is', 'also', 'a', 'valid', 'list']

In [43]:
[True, False, True]

[True, False, True]

It's also fine to have a list with different element types.

In [44]:
[1, 2.0, 'three']

[1, 2.0, 'three']

Lists can even be nested&mdash;which means you can have lists within lists.

In [45]:
[350, 'barrows', 'hall', ['berkeley', 'CA']]

[350, 'barrows', 'hall', ['berkeley', 'CA']]

This nesting can be arbitrarily deep, but it's not usually a good idea as it can get confusing. For example, it may be difficult to access specific items for an object like:

```python
[[[1, 2], [3, 4, [5, 6]]], [7, 8, 9]]
```

Speaking of accessing elements, let's describe how to do that. We'll first create a new list and assign it to a variable called `first_list`.

In [46]:
first_list = [9, 8, 7.0, 6, 5.4]

To access list elements, we use the square bracket notation. For example, if we're interested in the middle element&mdash;the "two-eth" element&mdash;we use the following.

In [47]:
first_list[2]

7.0

This is called indexing and the value inside of the brackets must be an integer. (Recall that indices in Python start at `0`.) A list can be thought of as a mapping (or correspondence) between indices and elements.

Let's say you're interested in the *last* element of this list. How could you do that? If you know the length of the list, you could access it using something like:

```python
first_list[len(first_list) - 1]
```

Why is the `-1` needed?

There is an easier way. Python provides negative indices that let you access elements from "back-to-front."

In [48]:
first_list[-1]

5.4

With this notation, the last element is accessed with `-1` (because `-0 == 0`). Use `-2` to access the second-to-last item, `-3` to access the third-to-last element, and so on.
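The sketch below puts the two ways of reaching the end of a list side by side and then applies the same bracket notation, one level at a time, to the nested list shown earlier. The names `letters` and `nested` are chosen only for this illustration.

```python
letters = ['a', 'b', 'c', 'd', 'e']

# Indices run from 0 to len(letters) - 1, so the last valid index is 4
last_position = len(letters) - 1
print(letters[last_position])                     # e
print(letters[-1])                                # e
print(letters[-2] == letters[len(letters) - 2])   # True

# Each pair of brackets steps one level deeper into a nested list
nested = [[[1, 2], [3, 4, [5, 6]]], [7, 8, 9]]
print(nested[1][-1])                              # 9
print(nested[0][1][2][0])                         # 5
```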

We can also use the slice operator on lists to access multiple elements. The operator takes the following form: `[n:m]`. The first value before the colon (`:`) specifies the start position and the second value specifies the end position. The former is inclusive and the latter is exclusive. Let's take a look at what we mean.

To motivate this, let's label the indices of our list.

```
list:  [9, 8, 7.0, 6, 5.4]
index: [0, 1,   2, 3,   4]
```

The code we'll submit is: `first_list[0:2]`. This tells Python to include values associated with position 0, position 1, but **not** for position 2.

In [49]:
first_list[0:2]

[9, 8]

This is how Python has decided to make this operator work. This isn't intuitive, but thinking about it in the following way might help. If we consider the indices to be to the *left* of each item, we can think of the slice operator as accessing elements *between* those indices.
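To illustrate the "between the indices" picture, here is a short sketch using a list named `values` (a stand-in with the same contents as our list). It also uses a feature not shown above: either side of the colon may be left out, in which case the slice runs from the start or to the end of the list.

```python
values = [9, 8, 7.0, 6, 5.4]

# Picture the indices as sitting to the left of each item:
#
#   | 9 | 8 | 7.0 | 6 | 5.4 |
#   0   1   2     3   4     5

print(values[0:2])    # [9, 8]
print(values[1:4])    # [8, 7.0, 6]
print(values[:2])     # [9, 8]
print(values[2:])     # [7.0, 6, 5.4]

# The length of a slice [n:m] within range is m - n
print(len(values[1:4]) == 4 - 1)            # True

# Splitting at the same index gives two pieces that rebuild the list
print(values[:2] + values[2:] == values)    # True
```

One consequence of the exclusive end position is that adjacent slices never overlap or leave a gap.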

With lists, because they are mutable, we can modify elements.

In [50]:
first_list[-1] = 5.43

In [51]:
first_list

[9, 8, 7.0, 6, 5.43]

#### Dictionaries

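A dictionary is a mapping from *keys* to *values*, where the keys, which must be unique, can be (almost) any type. A key and its associated value are referred to as a *key-value pair* or item. Dictionaries are best thought of as collections of key-value pairs that are looked up by key rather than by position. (Since Python 3.7, dictionaries preserve insertion order, but they still do not support positional indexing.)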

There are several ways to construct a dictionary. We can use braces (`{}`) or the built-in `dict()` function.

In [52]:
{}

{}

In [53]:
dict()

{}

Of course, these are empty. Let's add comma-separated key-value pairs to the first and pass keyword arguments (written with `=`) to the second.

In [54]:
{'one' : 1, 'two' : 2}

{'one': 1, 'two': 2}

In [55]:
dict(one=1, two=2)

{'one': 1, 'two': 2}

Keys and values are themselves separated by colons.

Dictionaries are typically used for accessing values associated with keys. In the example above, we started to create a mapping between number words and their integer representations. Let's expand on this.

In [56]:
nums = {'one' : 1, 'two' : 2, 'three' : 3, 'four' : 4, 'five' : 5, 'six' : 6}

In [57]:
nums

{'five': 5, 'four': 4, 'one': 1, 'six': 6, 'three': 3, 'two': 2}

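Notice that the key-value pairs are *not* displayed in the order we specified when creating the dictionary. (The output shown here reflects an older Python version; since Python 3.7, dictionaries preserve insertion order, so you may see the pairs in the order you wrote them.) Either way, this isn't a problem, because we use the keys to look up the corresponding values. We do this using bracket notation, like we did with strings and lists.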

In [58]:
nums['five']

5

If the key does not exist, you'll get an error.

In [59]:
nums['seven']

KeyError: 'seven'

We can add the value for 'seven' by doing the following:

In [60]:
nums['seven'] = 7

In [61]:
nums

{'five': 5, 'four': 4, 'one': 1, 'seven': 7, 'six': 6, 'three': 3, 'two': 2}
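When a key may or may not be present, the dictionary method `.get()` offers a way to look it up without risking a `KeyError`. The sketch below uses a smaller, hypothetical dictionary named `number_words`; `.get()` returns `None`, or a default you supply, when the key is missing.

```python
number_words = {'one': 1, 'two': 2, 'three': 3}

print(number_words.get('two'))         # 2
print(number_words.get('eight'))       # None
print(number_words.get('eight', 0))    # 0

# Checking first is another option
if 'eight' in number_words:
    print(number_words['eight'])
else:
    print('no entry for eight')
```

Note that `.get()` does not add the missing key; the dictionary is left unchanged.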

We mentioned earlier that keys can be of almost any type. Values *can* be of any type and we can also mix types.

In [62]:
mixed = {'one' : 1.0, 'UC Berkeley' : 'Cal', 350 : ['Barrows', 'Hall']}

In [63]:
mixed

{'UC Berkeley': 'Cal', 'one': 1.0, 350: ['Barrows', 'Hall']}

In this example, we used string and integer keys. We could have actually used any *immutable* objects.

Notice that we used a list as a value, which is valid. What if we tried using a list, which is mutable, as a key?

In [64]:
{['this'] : 'will not work'}

TypeError: unhashable type: 'list'

We get a `TypeError` saying that we can't use an unhashable type. What does this mean? In Python, dictionaries are implemented using hash tables. Hash tables use hash functions, which return integers given particular values (keys), to store and look up key-value pairs. For this to work, though, a key's hash must never change, so keys have to be *hashable*. For the built-in types, this means they have to be immutable.
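The built-in `hash()` function makes this visible. The following is a rough sketch with a made-up dictionary called `points`; it uses tuples, which are covered next, as an example of an immutable type that can serve as a key.

```python
# Equal immutable values always produce equal hashes
print(hash('one') == hash('one'))    # True

# Tuples of immutable items are hashable, so they can be keys
points = {(0, 0): 'origin', (1, 2): 'point'}
print(points[(1, 2)])                # point

# Lists are mutable and therefore not hashable
try:
    hash([1, 2])
except TypeError as err:
    print('TypeError:', err)

# A tuple that contains a list is not hashable either
try:
    hash((1, [2, 3]))
except TypeError as err:
    print('TypeError:', err)
```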

#### Tuples

A tuple is a sequence of values. The values, which are indexed by integers, can be of any type. This sounds a lot like lists, right?

>Though tuples may seem similar to lists, they are often used in different situations and for different purposes. Tuples are immutable, and usually contain an heterogeneous sequence of elements.... Lists are mutable, and their elements are usually homogeneous....

By convention, a tuple's comma-separated values are surrounded by parentheses.

In [65]:
(1, 2, 3)

(1, 2, 3)

Parentheses aren't necessary, though.

In [66]:
t = 1, 2, 3

In [67]:
type(t)

tuple

The commas are what define the tuple. In fact, any group of multiple comma-separated objects *without* identifying symbols, such as brackets for lists, defaults to a tuple.

We can't create a tuple with a single element using the following syntax.

In [68]:
type((1))

int

We need to include a comma following the value.

In [69]:
type((1,))

tuple

The construction of `t`, above, is an example of *tuple packing*, where the values `1, 2, 3` are "packed" into a tuple.

We can also perform the opposite operation, called *sequence unpacking*.

In [70]:
a, b, c = t

In [71]:
print(a, b, c)

1 2 3


For this, the number of variables on the left must equal the number of elements in the sequence.

This can be used with functions. In Python, functions can only return a single value. However, that value can be a tuple. In this case, you are effectively returning multiple values.
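As a small illustration, the hypothetical function `min_and_max` below packs two results into a tuple, and the caller unpacks them. The built-in `divmod()` works the same way.

```python
def min_and_max(values):
    """Return the smallest and largest items of a non-empty sequence."""
    return min(values), max(values)    # packed into a single tuple


result = min_and_max([3, 1, 4, 1, 5])
print(result)                          # (1, 5)

# Unpack the returned tuple into two variables
smallest, largest = min_and_max([3, 1, 4, 1, 5])
print(smallest, largest)               # 1 5

# divmod() returns a (quotient, remainder) tuple
quotient, remainder = divmod(17, 5)
print(quotient, remainder)             # 3 2
```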

Most list operators work on tuples. To access tuple elements, for example, we can use the bracket operator.

In [72]:
t = ('a', 'b', 'c', 'd')

In [73]:
t[0]

'a'

We can also use the slice operator.

In [74]:
t[1:3]

('b', 'c')

Because tuples are immutable, we cannot modify tuple elements.

In [75]:
t[0] = 'A'

TypeError: 'tuple' object does not support item assignment

However, we can create a new tuple using existing tuples.

In [76]:
t0 = 'A',
t1 = t[1:]

In [77]:
t0 + t1

('A', 'b', 'c', 'd')

#### Sets

A set is an unordered collection of unique elements. Because sets are unordered, they do not keep track of element position or order of insertion. As a result, sets do not support indexing or slicing.

>Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations like union, intersection, difference, and symmetric difference.

To construct a set, we can use braces (`{}`) around the elements or the built-in `set()` function. (Empty braces create a dictionary, so use `set()` to create an empty set.)

In [78]:
{3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3}

{1, 2, 3, 4, 5, 6, 7, 8, 9}

This returns the *unique* values passed in: in this case, the digits 1 through 9.

Let's say we had the following list of fruits.

In [79]:
basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']

We can find the unique fruits by using the `set()` function.

In [80]:
set(basket)

{'apple', 'banana', 'orange', 'pear'}

Sets are useful for finding unique values and for performing mathematical operations like the ones previously mentioned.
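The mathematical operations mentioned above each have an operator. Here is a brief sketch using two made-up sets, `evens` and `small`; the results are passed through `sorted()` only so that the printed order is predictable.

```python
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

print(sorted(evens | small))    # union: [1, 2, 3, 4, 6, 8]
print(sorted(evens & small))    # intersection: [2, 4]
print(sorted(evens - small))    # difference: [6, 8]
print(sorted(evens ^ small))    # symmetric difference: [1, 3, 6, 8]
```

Each operator returns a new set and leaves both operands unchanged.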

In the following section, we'll explore several operators that the data structures covered above respond to.

### Operators

Python supports several kinds of operators:

* arithmetic
* comparison (relational)
* assignment
* logical
* bitwise
* membership
* identity

We've already covered some of these either directly or in passing. We'll discuss how some of these operate on the data structures we've learned about thus far.

#### Arithmetic

The arithmetic operators are the ones you're probably most familiar with. These include `+`, `-`, `*`, `/`, and `**` to name a few. Of course, not all of these work on all Python data types.

Previously, we saw how the `+` and `*` operators, which correspond to concatenation and repetition, operate on strings. It turns out that lists and tuples respond in similar ways.

In [81]:
[1, 2, 3] + [4, 5, 6]

[1, 2, 3, 4, 5, 6]

In [82]:
(1, 2, 3) + (4, 5, 6)

(1, 2, 3, 4, 5, 6)

In [83]:
['Cal'] * 3

['Cal', 'Cal', 'Cal']

In [84]:
('D-Lab',) * 3

('D-Lab', 'D-Lab', 'D-Lab')
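One caveat connects repetition to the earlier discussion of references: repeating a list that contains a mutable object repeats the *reference*, not the object. The sketch below, with hypothetical names `row`, `grid`, and `separate`, shows the effect.

```python
row = [0, 0]
grid = [row] * 3               # three references to the same inner list

grid[0][0] = 1                 # changes the one shared inner list
print(grid)                    # [[1, 0], [1, 0], [1, 0]]
print(grid[0] is grid[1])      # True

# Writing the inner lists out creates three independent objects
separate = [[0, 0], [0, 0], [0, 0]]
separate[0][0] = 1
print(separate)                # [[1, 0], [0, 0], [0, 0]]
```

With immutable elements such as strings or numbers, as in the examples above, this sharing is harmless.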

#### Comparison

These types of operators "compare the values on either sides of them and decide the relation among them."

In [85]:
[1, 2, 3] == [1, 2, 3]

True

In [86]:
[0, 2, 3] == [1, 2, 3]

False

>The comparison uses *lexicographical* ordering: first the first two items are compared, and **if they differ this determines the outcome of the comparison**; if they are equal, the next two items are compared, and so on, until either sequence is exhausted.

In [87]:
[0, 2, 3] < [1, 2, 3]

True

In the comparison above, because the `0` is less than the `1`, the result is `True`. Once this is determined, subsequent values are *not* compared. In the example below, the return value is `True` even though `20` is greater than `2`.

In [88]:
[0, 20, 30] < [1, 2, 3]

True

The behavior is the same with tuples.

In [89]:
(0, 20, 30) < (1, 2, 3)

True

In [90]:
(0, 1, 2) == (0, 1, 3)

False

Interestingly, the behavior is slightly different with sets. Consider the list and set comparisons below.

In [91]:
[0, 3, 4] < [1, 2, 9]

True

In [92]:
set([0, 3, 4]) < set([1, 2, 9])

False

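With sets, `<` does not compare elements in order. Instead, it tests whether the set on the left is a *proper subset* of the set on the right. Because `{0, 3, 4}` is not a subset of `{1, 2, 9}`, the result is `False`.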
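For sets, the ordering operators are defined in terms of subsets: `<=` means "is a subset of" and `<` means "is a proper subset of" (a subset that is not equal). A few cases, sketched with literal sets:

```python
print({1, 2} < {1, 2, 3})             # True: every element is in the right-hand set
print({1, 2} < {1, 2})                # False: equal sets are not proper subsets
print({1, 2} <= {1, 2})               # True
print({1, 2}.issubset({1, 2, 3}))     # True: method form of <=

# Neither set contains the other, so both comparisons are False
print({0, 3, 4} < {1, 2, 9})          # False
print({0, 3, 4} > {1, 2, 9})          # False
```

Unlike lists, then, two sets can be unequal while neither is "less than" the other.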

Comparisons can be made with dictionaries, too.

In [93]:
{'one' : 1} == {'one' : 1}

True

But we can only check for equality.

In [94]:
{'one' : 1} < {'one' : 1}

TypeError: unorderable types: dict() < dict()

#### Membership

These operators test for membership (that is, whether a particular item exists) in a sequence or other collection.

In [95]:
'D-Lab' in ['D-Lab', 'UC Berkeley']

True

In [96]:
1 in (0, 1, 2)

True

In [97]:
99 in {1868, 350, 102}

False

For dictionaries, membership is tested against the keys.

In [98]:
cities = {'Berkeley' : 'California',
          'Miami' : 'Florida',
          'New York' : 'New York',
          'Seattle' : 'Washington'}

In [99]:
'Berkeley' in cities

True

The other membership operator is `not in`.

In [100]:
99 not in {1868, 350, 102}

True
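Because `in` checks a dictionary's keys, testing for a value or for a whole key-value pair requires the `.values()` or `.items()` method. A minimal sketch, using a shortened, hypothetical dictionary named `states`:

```python
states = {'Berkeley': 'California', 'Miami': 'Florida'}

print('California' in states)                    # False: it is a value, not a key
print('California' in states.values())           # True
print(('Miami', 'Florida') in states.items())    # True
print('Seattle' not in states)                   # True
```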

### High-Performance Container Datatypes

https://docs.python.org/2/library/collections.html

## Methods

Lists, unlike strings, are mutable, which means that their contents can be changed.

List methods are....

Let's say we wanted to add an element to `first_list`. There are several ways to do this. One way is to use the `.append()` method.

In [101]:
first_list.append(3)

By default, `.append()` adds an element to the *end* of a given list.

In [102]:
first_list

[9, 8, 7.0, 6, 5.43, 3]

Notice how we invoked this method. We did not use an assignment operator (e.g., `x = x.append(y)`). This is important: list methods that modify a list in place, such as `.append()`, return `None`, so assigning the result back to `x` would replace the list with `None`.
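The sketch below shows what an in-place method hands back, using made-up lists named `scores` and `unsorted`. It also contrasts the `.sort()` method with the built-in `sorted()` function, which returns a new list instead of modifying the original.

```python
scores = [3, 1, 2]

result = scores.append(0)    # modifies scores in place
print(result)                # None
print(scores)                # [3, 1, 2, 0]

print(scores.sort())         # None: .sort() also works in place
print(scores)                # [0, 1, 2, 3]

unsorted = [3, 1, 2]
ordered = sorted(unsorted)   # sorted() builds and returns a new list
print(ordered)               # [1, 2, 3]
print(unsorted)              # [3, 1, 2]
```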

Sometimes when we're adding an element to a list, we may wish to insert it at a given position. For this, we can use the `.insert()` method. It takes two arguments: the first is the *position* and the second is the *value*. Let's say we wanted to add an item to the front of the list. We could do it using:

In [103]:
first_list.insert(0, 10)

In [104]:
first_list

[10, 9, 8, 7.0, 6, 5.43, 3]

If instead we wanted `first_list` to be an entirely different list, such as `[1, 2, 3, 4, 5, 6, 7]`, we would need to assign the new list to the variable using the assignment operator, `=`.

## Control Flow

## Conditionals

## Input and Output

### `os`

### `glob`

### `subprocess`