# Introduction to Python - Lecture 04 (15 October 2018)
## Story so far...
+ primitive python objects / data-types and operations (numbers, strings)
+ logical / boolean operators
+ variables and variable naming conventions (pep8, reserved keywords)
+ expressions and simple statements (assignment)
+ misc (user input, comments, mutability, terminology; git: branching etc.; atom, jupyter lab)
+ Compound statements: if/else conditionals
<br />


# Sequential Types
+ Introduce composite object types (data structures): ways to organize data for processing
    + Lists
    + Dictionaries
    + Sets
    + Tuples
<br />
---

# Using what we have learnt so far, how many values can a single variable store?

# Sequential Data Types

It is often necessary to group information together. This is the function of sequential data types.
The primary difference between different sequential types is how data is accessed and stored.

## Strings

+ Strings are sequences of characters.
+ Each character in a string is assigned a specific index.
  + The first index is always **0**
  + This always corresponds to the leftmost character
  + Each subsequent character will have an index one greater than the previous index.
 
For example:


String: 'ABCDEF'

|String|A|B|C|D|E|F|
|------|-|-|-|-|-|-|
|Index |0|1|2|3|4|5|

Each character can be accessed by placing the relevant index in square brackets [] after the string.

```python
'ABCDEF'[index]
```

Accessing a subset of the string (a slice) works similarly.

Instead of a single index, a start index and an end index, separated by a colon, are placed within the square brackets.

```python
'ABCDEF'[start_index:end_index]
```

The character at the end index is not included.

Leaving the start blank slices from the first character, and leaving the end blank slices through to the last character (inclusive).

```python
'ABCDEF'[start:]
'ABCDEF'[:end]
```
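A minimal sketch of the same syntax on a different string, so the practice answers below stay unspoiled. The string `'PYTHON'` and the variable names are illustrative choices only.

```python
word = 'PYTHON'        # indices: P=0, Y=1, T=2, H=3, O=4, N=5

first = word[0]        # single index -> 'P'
third = word[2]        # -> 'T'
middle = word[1:4]     # indices 1, 2 and 3 -> 'YTH' (index 4 is excluded)
head = word[:2]        # start omitted: from the beginning -> 'PY'
tail = word[3:]        # end omitted: through the last character -> 'HON'

print(first, third, middle, head, tail)
```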

#### Practice

```python
# Print B

sequence = 'ABCDEF'
subseq = sequence[]   # fill in the index or slice
print(subseq)
```

```python
# Print E

sequence = 'ABCDEF'
subseq = sequence[]   # fill in the index or slice
print(subseq)
```

```python
# Print CD

sequence = 'ABCDEF'
subseq = sequence[]   # fill in the index or slice
print(subseq)
```

```python
# Print ABC

sequence = 'ABCDEF'
subseq = sequence[]   # fill in the index or slice
print(subseq)
```

```python
# Print DEF

sequence = 'ABCDEF'
subseq = sequence[]   # fill in the index or slice
print(subseq)
```

# Lists

+ Another sequence data type (like strings) that stores a sequence of objects. For example:
    ```python
    [1, 2, 3, 4, 5]
    ```
+ More generic - elements / items / components can be of **any type**, including **mixed**.  
    ```python
    [1, 2, 'a', [3, 4]]
    ```
+ Some examples from real world:
    - List of employees in a company
    - List of genes associated with a disease
    - List of book recommendations for a user
    - List of items in an order basket
    - List of all Citi Bike stations in the city [link](https://gbfs.citibikenyc.com/gbfs/en/station_information.json)
<br />
+ **Key characteristics**
    - Elements have position and order (**ordered collection**)
    - Elements can be heterogeneous (**arbitrarily typed**)  
    - Lists can expand or contract dynamically
    - can be single- or multi-dimensional
<br />

# Common list operations
+ Create
+ Access elements or chunks
+ Modify elements or chunks
+ Check membership of an element
+ Find position / index of a specific element
+ Traverse through the list and do something
+ Make it bigger / smaller (add and remove elements)
+ Sort / reverse
+ ...

```python
help(list)
help(list.index)
```

+ Some generic operations
    - len(x), sum(x), max(x), etc.  
<br />
+ User-defined operations (will be covered later)
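
The generic built-in functions mentioned above work on any list of suitable values. A minimal sketch; the list `scores` is made up for illustration.

```python
scores = [7, 3, 9, 4]

print(len(scores))   # number of elements -> 4
print(sum(scores))   # total of the numeric elements -> 23
print(max(scores))   # largest element -> 9
print(min(scores))   # smallest element -> 3
```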

```python
help(list.pop)
```

# Lists: Create


```python
x = [1, 2, 3, 4, 5]          # direct assignment
y = [1, 'a', [1, 2, 3]]      # mixed types, including a nested list
z = []                       # creates an empty list

print(type(x), x)
print(type(y), y)
print(type(z), z)

# build it incrementally (see below)

# More advanced: list comprehensions (check out for a potential lightning talk...)
```

# Lists: Access and Modify
+ All sequences (lists, strings, ...) support two basic access operations:
    - Indexing
    - Slicing
```python
vowels = ['a', 'e', 'i', 'o', 'u']
print(vowels[0]) # indexing starts with '0'
print(vowels[1])
print(vowels[-1]) # negative indices go backwards
# print(vowels[10]) # uncomment to see: an out-of-range index raises IndexError; you are responsible for respecting the list length
print(vowels[1:3]) # slicing syntax: [start_idx : stop_idx[ : step_size]]; excludes stop_idx; 
print(vowels[::2])  # step_size is optional
print(vowels[::-1])  # what does this do?
```

+ Indices act like a mapping from positions to elements: each valid index (positive or negative) refers to exactly one element
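
One way to see this mapping: for a list of length `n`, the negative index `k - n` refers to the same element as the positive index `k`. A short check reusing the `vowels` list from above.

```python
vowels = ['a', 'e', 'i', 'o', 'u']
n = len(vowels)                       # 5

print(vowels[0], vowels[-n])          # first element, two ways: a a
print(vowels[4], vowels[-1])          # last element, two ways: u u
print(vowels[1] == vowels[1 - n])     # index k and k - n name the same slot: True
```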


# Lists are mutable (unlike strings)

```python
vowels = ['a', 'e', 'i', 'o', 'u']
print(id(vowels), vowels)
vowels[0] = 'A'
vowels[1:3] = ['E', 'I']    # slice reassignment
print(id(vowels), vowels)
```
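
For contrast, a minimal sketch of the same kind of assignment on a string. Strings are immutable, so item assignment raises a `TypeError` (caught here with `try`/`except` only so the cell keeps running); to "change" a string you build a new one.

```python
letters = ['a', 'e', 'i']
letters[0] = 'A'            # fine: lists are mutable
print(letters)              # ['A', 'e', 'i']

word = 'aei'
try:
    word[0] = 'A'           # strings do not support item assignment
except TypeError as err:
    print('TypeError:', err)

new_word = 'A' + word[1:]   # build a new string instead
print(word, new_word)       # aei Aei
```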

# Lists: Membership
+ <font color='blue'>**in**</font> operator, similar to string type

```python
x = [1,2,3,4,5]
print(1 in x)      # boolean expression: evaluates to True/False
print(10 not in x) 
```

# Lists: index of a specific element
```python
x = ['a', 'b', 'c', 'd', 'e']
print(x.index('b'))
print(x.index('z'))          # 'z' is not in the list: raises ValueError
```

# Lists: Add / remove elements

```python
x = [1,2,3,4,5]
x.append(10)
print(x)
x.pop()
print(x)
x.pop(2)
print(x)
x.extend([11,12,13,14,15])     # in place; x + [11,12,13,14,15] would instead return a new list
print(x)
```
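
Besides `append`, `pop` and `extend`, lists also have `insert` (add an element at a given position) and `remove` (delete the first element equal to a value), and the `del` statement deletes by index or by slice. A short sketch:

```python
x = [1, 2, 3, 4, 5]
x.insert(0, 99)      # insert 99 before index 0 -> [99, 1, 2, 3, 4, 5]
x.remove(3)          # remove the first element equal to 3 -> [99, 1, 2, 4, 5]
del x[0]             # delete by index -> [1, 2, 4, 5]
del x[1:3]           # delete a slice (indices 1 and 2) -> [1, 5]
print(x)
```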

## In-place operations

Because lists are mutable, some functions and methods change the original list in place, while others return a new list.

```python
x = [5, 5, 2, 6, 1, 9, 8, 3]
print(id(x), x)
y = sorted(x)
x.sort()
print('---SORTED---')
print(id(x), x)
print(id(y), y)
```
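
A common pitfall follows from this: in-place methods such as `sort()` return `None`, so assigning their result does not give you the sorted list. A short sketch:

```python
x = [5, 5, 2, 6, 1, 9, 8, 3]

result = x.sort()              # sorts x in place ...
print(result)                  # ... and returns None
print(x)                       # [1, 2, 3, 5, 5, 6, 8, 9]

y = sorted(x, reverse=True)    # sorted() returns a new list; x is unchanged
print(y)                       # [9, 8, 6, 5, 5, 3, 2, 1]
```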

```python
help(x.reverse)
```

```python
help(x.sort)
```

# Tuples

Tuples and lists are very similar:
  + Both store values
  + Both can hold different data types
  + Both are accessed in the same way
  
The main difference between them is that tuples are immutable.

This means that once a tuple is created, its elements cannot be reassigned, and no elements can be added or removed.

There are performance benefits to this:
  + Lists allocate more memory than they currently need (over-allocation)
    + This allows new elements to be appended without rebuilding the list each time (once the spare capacity runs out, the list does need to be reallocated)
  + Tuples have a fixed size, so their memory requirement is known in advance
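
The size difference can be inspected with `sys.getsizeof`, which reports an object's own memory footprint in bytes (not counting the objects it refers to). The exact numbers depend on the Python version and platform, so this sketch prints them rather than claiming specific values; in CPython a list is typically larger than a tuple with the same elements, and a list grown with `append` changes size only occasionally, which reflects the over-allocation described above.

```python
import sys

as_list = [1, 2, 3]
as_tuple = (1, 2, 3)
print('list :', sys.getsizeof(as_list), 'bytes')
print('tuple:', sys.getsizeof(as_tuple), 'bytes')

# Watch a list grow: its reported size jumps only when spare capacity runs out.
grow = []
for i in range(10):
    grow.append(i)
    print(len(grow), sys.getsizeof(grow))
```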
  
Tuples can also be used as keys in dictionaries, which we will discuss next.
  + This is possible because tuples are immutable (as long as all of their elements are immutable too)
  
### Defining a tuple
Where lists are defined using \[ \], tuples are defined using ()
```python
t = (1, 2, 3)
```
Alternatively, they can be created with the tuple() constructor, which takes a single iterable such as a list
```python
t = tuple([1, 2, 3])   # tuple(1, 2, 3) would raise a TypeError
```
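
Two details that often trip people up, shown in a short sketch: a one-element tuple needs a trailing comma, because `(1)` is just the number 1 in parentheses, and `tuple()` converts any iterable (a list, a string, ...) into a tuple.

```python
not_a_tuple = (1)        # parentheses only group: this is the int 1
one_item = (1,)          # the trailing comma makes a one-element tuple
packed = 1, 2, 3         # parentheses are optional: this is (1, 2, 3)
from_str = tuple('abc')  # ('a', 'b', 'c')

print(type(not_a_tuple), type(one_item))
print(packed, from_str)
```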


### Attempting to assign a value to a tuple will result in a TypeError

```python
t = (1, 2, 3)
print(t[0])
t[0] = 2          # raises TypeError: 'tuple' object does not support item assignment
```



### Tuples allow for mixed types

```python
t = (1, 'a', (2, 3.0))
```
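
Because tuples are accessed in the same way as lists, indexing, negative indices, slicing, `len` and `in` all work on them; only assignment is refused. A sketch with the tuple above:

```python
t = (1, 'a', (2, 3.0))

print(t[0])               # 1
print(t[-1])              # (2, 3.0)
print(t[2][1])            # index into the nested tuple -> 3.0
print(t[:2])              # slicing returns a new tuple -> (1, 'a')
print(len(t), 'a' in t)   # 3 True
```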

---
# Dictionaries (a.k.a. HashMap/HashTable)

+ Consist of a **set of mappings** between _**<font color='blue'>unique</font>**_ **keys** and their **values**.

#### Basic syntax:
<font color='magenta'>**\{**</font> **key1**: value1, **key2**: value2, ...  <font color='magenta'>**}**</font>
   
```python
# Example:
genetic_code = {'uuu': 'phe', 'uua': 'leu', 'aug': 'met', 'uaa': 'stop'}
```

**Comparison with Lists**
+ Lists are ordered: the order in which elements are added is the order in which they are stored
    + Access by position/index
        + Ex. letters = ['a', 'b', 'c', 'd', 'e', 'f']
        + letters[0] is 'a' etc.
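+ Dictionaries are not accessed by position (since Python 3.7 they do remember insertion order, but you cannot index them by position)
    + Access by key
        + Ex. dict_ = {'key1': 'a', 'key2': 'b', 'key3': 'c'}
        + dict_['key1'] is 'a'
        + 'key1': 'a' is one key-value pair of dict_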

The association between a **key** and a **value** is often referred to as a **key**-**value** pair or an **item**.


#### Keys

+ must be immutable, e.g. string, integer, float, or tuple (a tuple key may only contain immutable values)
+ must be unique
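
A sketch of both key rules: an immutable tuple is accepted as a key, a mutable list is rejected with a `TypeError` (Python reports it as "unhashable"), and writing the same key twice does not create a second entry, the last value wins. The coordinate keys are a made-up example.

```python
grid = {(0, 0): 'origin', (1, 2): 'treasure'}   # tuple keys are fine
print(grid[(1, 2)])                              # treasure

try:
    bad = {[0, 0]: 'origin'}                     # list keys are not allowed
except TypeError as err:
    print('TypeError:', err)

dup = {'a': 1, 'a': 2}                           # the same key written twice
print(dup)                                       # {'a': 2}: only one entry survives
```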


#### Values
+ Can be of any type, mutable or immutable, simple or composite (arbitrarily complex, heterogeneous)
    + primitives (single character, i.e. a one-character string: 'a', integer: 0, float: 3.4)
    + sequential types (string: 'asd', list: [0, 1, 2], another dictionary: {'key': 'value'}, tuple: (0, 1, 2))
    + user-defined types (discussed later) (functions, classes, objects etc.)


### Some Real World Examples

+ {**&lt;gene_id&gt;**: **&lt;**gene sequence**&gt;**, ...}
+ {**&lt;email&gt;**: **&lt;**user data**&gt;**, ...}
+ {**&lt;soc security&gt;**: **&lt;**individual**&gt;**, ...}
+ {**&lt;emp id&gt;**: **&lt;**emp data**&gt;**, ...}

#### Lookup Table

```python
elements = {'H': 'hydrogen',   'He': 'helium', 
            'Li': 'lithium',  'C': 'carbon', 
            'O': 'oxygen',  'N': 'nitrogen'}
complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
print('H', '->', elements['H'])
print('A', '->', complement['A'])
```

Looking up a key that does not exist raises a KeyError:

```python
complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
complement['Z']
```

Output:
```
KeyError: 'Z'
```
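
To look up a key that may be missing without raising `KeyError`, dictionaries provide the `get` method, which returns `None` (or a default you pass as the second argument) when the key is absent. A short sketch using the same `complement` table:

```python
complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

print(complement.get('A'))        # T
print(complement.get('Z'))        # None, and no exception is raised
print(complement.get('Z', 'N'))   # N: the supplied default
```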

#### Database Records
```python
person = {'name': 'John', 
          'surname': 'Grisham', 
          'contact': 
              {
              'phone': {'office': '123-456-7890',
                        'cell': '456-789-0123'
                       },
              'email': ['johnny@gmail.com', 'john.grisham@writers.com']
              }
          }
print(person['name'])
print(person['contact'])
print(person['contact']['phone'])
print(person['contact']['email'])
```

Running only `print(person['contact']['phone'])` prints:

```
{'office': '123-456-7890', 'cell': '456-789-0123'}
```


## Operations

```python
help(dict)
```

+ Create
+ Access keys, values or (key, value) pairs / items
+ Modify items
+ Check membership of a key
+ Traverse through the dictionary and do something
+ Make it bigger / smaller (add and remove items)
+ …


Output of `help(dict)` (excerpt):

```
Help on class dict in module builtins:

class dict(object)
 |  dict() -> new empty dictionary
 |  dict(mapping) -> new dictionary initialized from a mapping object's
 |      (key, value) pairs
 |  dict(iterable) -> new dictionary initialized as if via:
 |      d = {}
 |      for k, v in iterable:
 |          d[k] = v
 |  dict(**kwargs) -> new dictionary initialized with the name=value pairs
 |      in the keyword argument list.  For example:  dict(one=1, two=2)
```

## Creating a dictionary
Several ways to create a dictionary

```python
dict_x = {'a': 1, 'b': 2}      # initialize by assignment
dict_y = dict(a=1, b=2)        # use dict built-in function
print(dict_x, dict_y)
print("The value for key '{}' is {}".format('a', dict_x['a']))
```
+ **keys** = 'a', 'b'
+ **values** = 1, 2
+ **items** = ('a', 1), ('b', 2)

+ Access by key:
```python
print("The value for key '{}' is {}".format('a', dict_x['a']))
```
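
The `dict` constructor also accepts any iterable of (key, value) pairs, which gives two more ways to create the same dictionary. A minimal sketch:

```python
pairs = [('a', 1), ('b', 2)]
dict_from_pairs = dict(pairs)                  # list of (key, value) tuples
dict_from_zip = dict(zip(['a', 'b'], [1, 2]))  # pair up keys and values with zip
print(dict_from_pairs, dict_from_zip)          # {'a': 1, 'b': 2} {'a': 1, 'b': 2}
print(dict_from_pairs == {'a': 1, 'b': 2})     # True
```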

#### Dictionaries can also be built incrementally - see example later

Output of the dictionary-creation example above:

```
{'a': 1, 'b': 2} {'a': 1, 'b': 2}
The value for key 'a' is 1
```


# Iterating over a dictionary

### Pattern 1: &lt;dict&gt;.keys()
<font color='blue'>**Note:**</font> &lt;dict&gt; is a placeholder for a dictionary object

```python
my_dict = {0: 'a', 1:'b', 2: 'c', 3: 'd'}
print('dict_keys lazy obj: ', my_dict.keys())                       # a view object, not a list
print('dict_keys unpacked: ', list(my_dict.keys()))                 # convert the view to a list
print('Inside for loop:')
for key in my_dict.keys():               # a for loop iterates over the view directly
    print('key: ', key)
```

Output:

```
dict_keys lazy obj:  dict_keys([0, 1, 2, 3])
dict_keys unpacked:  [0, 1, 2, 3]
Inside for loop:
key:  0
key:  1
key:  2
key:  3
```
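
The objects returned by `keys()`, `values()` and `items()` are views: they are not copies, so they reflect later changes to the dictionary, whereas a list made with `list(...)` is a snapshot taken at that moment. A short sketch:

```python
my_dict = {0: 'a', 1: 'b'}
keys_view = my_dict.keys()        # a live view of the keys
keys_list = list(my_dict.keys())  # a snapshot taken now

my_dict[2] = 'c'                  # modify the dictionary afterwards
print(keys_view)                  # dict_keys([0, 1, 2]): the view sees the new key
print(keys_list)                  # [0, 1]: the snapshot does not
```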


### Pattern 2: &lt;dict&gt;.values()


```python
my_dict = {0: 'a', 1:'b', 2: 'c', 3: 'd'}
print('dict_values lazy obj: ', my_dict.values())                    # lazy object
print('dict_values unpacked: ', list(my_dict.values()))
print('Inside for loop')
for value in my_dict.values():
    print('value: ', value)
```

Output:

```
dict_values lazy obj:  dict_values(['a', 'b', 'c', 'd'])
dict_values unpacked:  ['a', 'b', 'c', 'd']
Inside for loop
value:  a
value:  b
value:  c
value:  d
```


### Pattern 3: &lt;dict&gt;.items()

```python
my_dict = {0: 'a', 1:'b', 2: 'c', 3: 'd'}
print('dict_items lazy obj: ', my_dict.items())                     # lazy object
print('dict_items unpacked: ', list(my_dict.items()))
print('Inside for loop: ')
for item in my_dict.items():
    print(item)
    # print('item: {}, key: {}, value: {}'.format(item, item[0], item[1]))
```

Output:

```
dict_items lazy obj:  dict_items([(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')])
dict_items unpacked:  [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
Inside for loop: 
(0, 'a')
(1, 'b')
(2, 'c')
(3, 'd')
```


#### <font color='blue' size=3>Sidebar: List/Tuple unpacking</font> 
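If the number of variables matches the number of elements in a sequence, Python assigns each element to the corresponding variable, in order
```python
val1, val2 = [1, 2]
print(val1, val2)
```
If there are more or fewer elements than variables, Python raises a ValueError (run each line separately to see both errors)
```python
val1, val2 = [1]          # ValueError: not enough values to unpack (expected 2, got 1)
val1, val2 = [1, 2, 3]    # ValueError: too many values to unpack (expected 2)
```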

Output of `val1, val2 = [1, 2, 3]`:

```
ValueError: too many values to unpack (expected 2)
```
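
Unpacking also works with tuples, which gives a common idiom for swapping two variables without a temporary variable: the right-hand side builds the tuple `(b, a)` first, and it is then unpacked into `a` and `b`. The same mechanism splits a `(key, value)` item into two names. A short sketch:

```python
a, b = 1, 2             # tuple packing and unpacking
a, b = b, a             # swap: the right side is evaluated first, then unpacked
print(a, b)             # 2 1

key, value = (0, 'a')   # split a (key, value) item into two variables
print(key, value)       # 0 a
```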

### Pattern 4: Item split into key/value
```python
my_dict = {0: 'a', 1:'b', 2: 'c', 3: 'd'}
print('dict_items lazy obj:', my_dict.items())
print('Inside for loop:')
for key, value in my_dict.items():
    print("key: {}, value: {}".format(key, value))
```

Output:

```
dict_items lazy obj: dict_items([(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')])
Inside for loop:
key: 0, value: a
key: 1, value: b
key: 2, value: c
key: 3, value: d
```


### Membership - check if a key exists in a dictionary

```python
some_dict = {'a': 0, 'b': 1, 'c': 2}
print("our dict: ", some_dict)
print("a in our dict: ", 'a' in some_dict)
```

```python
some_dict = {'a': 0, 'b': 1, 'c': 2}
print("our dict: ", some_dict)
print("z in our dict: ", 'z' in some_dict)
```

Output:

```
our dict:  {'a': 0, 'b': 1, 'c': 2}
z in our dict:  False
```
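
Note that `in` on a dictionary checks keys only. To test whether a value is present, check against `values()`, and to test a complete pair, check against `items()`. A sketch with the same `some_dict`:

```python
some_dict = {'a': 0, 'b': 1, 'c': 2}

print('a' in some_dict)               # True: 'a' is a key
print(0 in some_dict)                 # False: 0 is a value, not a key
print(0 in some_dict.values())        # True
print(('b', 1) in some_dict.items())  # True: the (key, value) pair exists
```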


Iterating over a dictionary directly (without .keys()) also yields its keys:

```python
for a in some_dict:
    print(a)
```

Output:

```
a
b
c
```



### Modifications

+ Changing the value for a key

```python
some_dict['a'] = 10
print("our dict (now): ", some_dict)
```

+ Adding individual key-value pair to a dictionary

```python
some_dict['d'] = 3
print("our dict (now): ", some_dict)
```

+ Updating a dictionary with another dictionary (updates existing values; adds new key-value pairs)
```python
some_other_dict = {'a': 99, 'e': 999}
some_dict.update(some_other_dict)           # This is an in-place operation (check out help(dict.update))
print("our dict (now): ", some_dict)
```

```python
print(some_dict)
some_dict['d'] = 3
print(some_dict)
```

Output:

```
{'a': 10, 'b': 1, 'c': 2}
{'a': 10, 'b': 1, 'c': 2, 'd': 3}
```


```python
# help(some_dict.update)
print(some_dict)
some_other_dict = {'a': 99, 'e': 999}
some_dict.update(some_other_dict)
print(some_dict)
```

Output:

```
{'a': 10, 'b': 1, 'c': 2, 'd': 3}
{'a': 99, 'b': 1, 'c': 2, 'd': 3, 'e': 999}
```
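
The operations list earlier also mentions making a dictionary smaller. Removal uses `del` (which raises `KeyError` for a missing key) or the `pop` method, which returns the removed value and accepts a default for keys that are absent. A minimal sketch, starting from the dictionary as it looks after the update above:

```python
some_dict = {'a': 99, 'b': 1, 'c': 2, 'd': 3, 'e': 999}

del some_dict['e']                   # remove a key-value pair
removed = some_dict.pop('d')         # remove and return the value -> 3
missing = some_dict.pop('z', None)   # no KeyError thanks to the default
print(removed, missing)              # 3 None
print(some_dict)                     # {'a': 99, 'b': 1, 'c': 2}
```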


### Extra - pretty printing

+ Complicated dictionaries do not print nicely.
+ pprint is a module that prints dictionaries (and other data structures) in a more structured manner
    + it is part of the standard library, so it comes with every Python installation
    + it still needs to be imported before use
+ If you want to configure the output, create a PrettyPrinter object with the desired settings first; otherwise pprint.pprint uses the default configuration

```python
import pprint
dict_ = {'name': 'Joe', 'Surname': 'van Niekerk', 'email': 'jvn@c.m', 
        'friends': [{'name': 'Sally'}, {'name': 'Dave'}, {'name': 'Rick'}, {'name': 'James'}]}
print('\n' +'-'*50)
print("No pretty printing")
print('-'*50)
print(dict_)
print('\n' + '-'*50)
print("Default pretty printing")
print('-'*50)
pprint.pprint(dict_)
print('\n' + '-'*50)
print("Custom pretty printing")
print('-'*50)
pp = pprint.PrettyPrinter(indent=4)   # create a pprint object with desired attributes (more on this later)
pp.pprint(dict_)
```

Output of the custom pretty printer (indent=4):

```
{   'Surname': 'van Niekerk',
    'email': 'jvn@c.m',
    'friends': [   {'name': 'Sally'},
                   {'name': 'Dave'},
                   {'name': 'Rick'},
                   {'name': 'James'}],
    'name': 'Joe'}
```
