# Module 2: Python

## Module 2.1: Python I

### What is Programming?

#### Definitions:

A `program` is a sequence of instructions that specifies how to perform a computation.

An `algorithm` is a step-by-step list of instructions that, if followed exactly, will solve the problem under consideration.

A `programming language` is a formal language that specifies a set of instructions that can be used to produce various kinds of output - how to speak to the computer and give it directions.

A `statement` is a unit of execution, often represented by one line of code. Statements are executed one by one.

`Control flow` is the order in which statements are executed; it is directed by constructs such as conditional statements, loops, and function calls.

#### Why Python?

##### Advantages/Strengths

While Python is **easy to learn**, it is also very **powerful**.

It has **efficient, `high-level data structures`** and a **simple but effective** approach to `object-oriented programming`.

It is being used for rapid application development in many areas on most platforms:
- most areas of programming, including web development, system administration (scripting), and network programming
- particularly used for introductory programming courses, data analysis, bioinformatics, and machine learning

##### Disadvantages/Limitations

`Dynamic typing` can be powerful, but problematic.

`Whitespace` defines blocks, not brackets

**Version issues** mean Python 3 is not compatible with Python 2
- we will use Python 3 in the class

|          | Python 2                             | Python 3                            |
|----------|--------------------------------------|-------------------------------------|
| print    | `print 'abc'` prints `abc`           | `print('abc')` prints `abc`         |
| division | `3/2` gives `1`; `3/2.0` gives `1.5` | `3/2` gives `1.5`; `3//2` gives `1` |
| string   | Latin characters                     | Unicode characters                  |

### Basic I/O, Variables, Types

`Standard input` and output, also known as `standard streams`, are preconnected input and output when a program is executed.

Input: keyboard input; output: console display

#### Basic Functions

##### `print()`

`print()` writes its arguments to standard output, which appears on the console.

In [152]:
print("Hello, World!")

Hello, World!


##### `input()`

`input()` prints its argument (the prompt) to the console and waits for user input.

It returns the user input as a **string**, even when the user types a number.

In [153]:
#print(input("What do you want to say?"))
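
Because `input()` always returns a string, numeric input has to be converted before it is used in arithmetic. A minimal sketch of that step follows; the hard-coded `raw_age` string is an assumption that stands in for a real `input()` call so the cell runs without a keyboard.

```python
# A hard-coded string stands in for input("How old are you? ");
# input() would hand back a str exactly like this one.
raw_age = "42"
print(type(raw_age))   # <class 'str'>

# Convert before doing arithmetic; int() raises ValueError on bad text.
age = int(raw_age)
next_year = age + 1
print(next_year)       # 43
```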

##### `range()`

`range()` takes up to three arguments: start, stop, step. All must be integers. It returns a `range` object, which represents a sequence of numbers; loop over it or pass it to `list()` to see the values.

**start**: the first number in the sequence (default is 0)\
**stop**: the "up to but not including" number in the sequence\
**step**: the interval or "step" between the numbers (default is 1); it may be negative to count down, but it cannot be 0

In [154]:
print(range(0,5))
print([i for i in range(0,5)])
print(range(4))
print([i for i in range(4)])

print(range(-10,9,2))
print([i for i in range(-10,9,2)])
print(range(-10,10,-2))
print([i for i in range(-10,10,-2)])

range(0, 5)
[0, 1, 2, 3, 4]
range(0, 4)
[0, 1, 2, 3]
range(-10, 9, 2)
[-10, -8, -6, -4, -2, 0, 2, 4, 6, 8]
range(-10, 10, -2)
[]
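
The last call above is empty because a negative step counts downward, so `start` has to be larger than `stop`. A short sketch of a working countdown next to the empty case (the variable names are only for illustration):

```python
# With a negative step, start must be greater than stop.
countdown = list(range(10, -10, -2))
print(countdown)   # [10, 8, 6, 4, 2, 0, -2, -4, -6, -8]

# Counting down from -10 can never reach 10, so nothing is produced.
empty = list(range(-10, 10, -2))
print(empty)       # []
```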


#### Variables

A `variable` can store a `value` by assignment.

In an assignment, Python evaluates the expression on the right of `=` first, then binds the result to the name on the left.

You can store many types of values in a variable.

Variables can be reassigned, and are not bound to a data type.

##### Examples

- `text = "This is text"`
    - stores "This is text" within the variable `text`
- `x = 12`
    - stores `12` in the variable `x`
- `x = 5`
    - stores `5` in the variable `x` - even if we had previously assigned `x = 12`, we are now overwriting it, so now `x != 12`
- `x = 'cheese'`
    - stores 'cheese' in the variable `x` - even if we previously had assigned `x = 5`, we are overwriting it now

In [155]:
text = "this is text"
print(text)
x = 12
print (x)
x = 5
print(x)
x = "cheese"
print(x)

this is text
12
5
cheese


##### Variable Names:

- can be arbitrarily long
- may contain both letters and digits
    - **MUST** always start with a letter or an underscore, never a digit
- **Case-sensitive**
- Can contain underscores (_)
- **CANNOT** contain spaces or special characters
- **CANNOT** be a `keyword`
    - False, None, True, and, as, assert, async, await, break, class, continue, def, del, elif, else, except, finally, for, from, global, if, import, in, is, lambda, nonlocal, not, or, pass, raise, return, try, while, with, yield
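
The naming rules above can be checked from Python itself. The sketch below uses the standard-library `keyword` module and `str.isidentifier()`; the candidate names are made up for illustration.

```python
import keyword

# keyword.kwlist holds the reserved words of the running interpreter.
print(keyword.kwlist)

# str.isidentifier() checks the character rules; keyword.iskeyword()
# checks the reserved-word rule. A usable name must pass both.
candidates = ["snake_case", "_private", "2fast", "my var", "class"]
for name in candidates:
    usable = name.isidentifier() and not keyword.iskeyword(name)
    print(f"{name!r}: {usable}")
```

Only the first two candidates are usable: `2fast` starts with a digit, `my var` contains a space, and `class` is a keyword.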

Notes from PEP 8:
- most variables should be in snake_case, where words are lowercase and separated with underscores
- *DO NOT* use l, O, I as single-character variable names because they may be indistinguishable from the numerals 1 and 0
- mixedCase is only used where it is already the prevailing style, to retain backwards compatibility
- Constants are in ALL_CAPS with underscores between words
- Type variables are in CamelCase, preferably with short names and abbreviations, with _co or _contra suffixes to declare covariant or contravariant behavior respectively
- Classes should be in CamelCase, preferably with names as short as possible

#### Data Types

**Data types** describe the form in which we store or use data.

| Data Type                       | Example | Python  |
|---------------------------------|---------|---------|
| Integer                         | 1       | int     |
| Float*                           | 1.00    | float   |
| Complex Numbers**                | 1+2j    | complex |
| Strings                         | "Hello" | str     |
| Boolean Values                  | True    | bool    |
| User-Defined Types (or classes) |         |         |


*Real numbers can have infinitely many digits, but a computer can only store a limited (though vast) amount of precision, so real numbers are approximated by `floats`.

**Complex numbers contain both real and imaginary components. Python stores both components as floats.
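
The limited precision of floats is easy to see in practice. A small sketch, using `math.isclose()` from the standard library as one reasonable way to compare floats:

```python
import math

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False: the stored values differ slightly
print(math.isclose(total, 0.3))  # True: equal within a small tolerance

# Both parts of a complex number are stored as floats.
z = 1 + 2j
print(z.real, z.imag)            # 1.0 2.0
```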


##### Strings

Strings can be encased in:
- single quotes `'`
- double quotes `"`
- triples of single quotes `'''`
- triples of double quotes `"""`

A single-quoted string may contain double-quotes inside, and vice versa.

A triple-quoted string can **span multiple lines**

##### Converting Data Types

- `int()` converts into an **integer**
    - it **truncates toward zero** rather than rounding, so `int(3.9999)` is `3` and `int(-3.9999)` is `-3`
- `float()` converts into a **float**
- `str()` converts into a **string**

In [156]:
x = 3.9999
y = -3.9999
z = "12"

print(int(x))
print(int(y))
print(int(z))

print(float(x))
print(float(y))
print(float(z))

print(str(x))
print(str(y))
print(str(z))

3
-3
12
3.9999
-3.9999
12.0
3.9999
-3.9999
12
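
The output above shows that `int()` drops the fractional part instead of rounding. The sketch below contrasts it with `round()` and `math.floor()`, and shows one way to handle text that contains a decimal point.

```python
import math

value = -3.9999
print(int(value))          # -3: truncates toward zero
print(math.floor(value))   # -4: always rounds down
print(round(value))        # -4: rounds to the nearest integer

# int() can parse whole-number text, but not text with a decimal point.
try:
    int("12.5")
except ValueError as err:
    print("ValueError:", err)

# Going through float() first works, and then truncates as usual.
print(int(float("12.5")))  # 12
```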


### Control Flows

#### Whitespace

`Whitespace` is **very meaningful** in Python, **especially indentation and newlines**.

- A newline ends a statement; a backslash (`\`) at the end of a line continues the statement on the next line
- `Consistent indentation` is used rather than braces or any other kind of bracket, resulting in visually nested code
- A **colon** (`:`) defines the start of a new block in many constructs
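
A short sketch of the three whitespace rules in one place (the names `total` and `values` are only for illustration):

```python
# Backslash: the statement continues on the next line.
total = 1 + 2 + \
        3 + 4
print(total)         # 10

# Inside parentheses, brackets, or braces no backslash is needed.
values = [
    1, 2,
    3, 4,
]
print(sum(values))   # 10

# The colon opens a block; the indented lines belong to it.
if total == sum(values):
    print("same")    # indented: part of the if block
print("done")        # not indented: runs regardless of the condition
```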

#### Conditionals and Booleans

`Conditionals` and `booleans` act as basic **logic gates** dictating `control flow` in Python.

A `boolean` evaluates to either `True` or `False`.

A common `conditional` is an `if statement`.

##### Booleans

A `boolean` is a simple evaluation of whether a statement is true or not; as a result, it only returns `True` or `False`.

Things that are always `False`:
- a boolean value of `False`
- numbers 0 (int), 0.0 (float), and 0j (complex)
- an empty string ("")
- an empty list []
- an empty dictionary {}
- an empty tuple () or an empty set (`set()`)

Things that are always `True`:
- a boolean value of `True`
- all non-zero numbers
- any string containing at least one character
- a non-empty data structure

In [157]:
x=12
y=3
z=0
test=""

print(bool(x==12))
print(bool(x==y))
print(bool(x<y))

print(bool(x))
print(bool(z))
print(bool(test))


True
False
False
True
False
False


You can also combine boolean expressions, so long as parentheses are used to disambiguate the expression.

| True if...                 | A     | B     | Combined    |
|----------------------------|-------|-------|-------------|
| if A is True and B is True | True  | True  | (A) and (B) |
| if A is True or B is True  | True  | False | (A) or (B)  |
|                            | False | True  |             |
|                            | True  | True  |             |
| if A is False              | False | -     | not (A)     |
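
The rows of the table can be tried directly. The sketch below also shows why parentheses matter: without them, `not` is applied first, then `and`, then `or`.

```python
a = True
b = False

print(a and b)   # False: both sides must be True
print(a or b)    # True: one True side is enough
print(not a)     # False

# Same operands, different grouping, different result.
print(a or b and not a)        # True: read as a or (b and (not a))
print((a or b) and (not a))    # False
```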

A `range test` measures if a value is within a set range, and returns `True` or `False`

In [158]:
Time = 4
if (3 <= Time <= 5):
    print ("Office Hour")
else:
    print("NO")
    
Time = 2
if (3 <= Time <= 5):
    print ("Office Hour")
else:
    print ("NO")

Office Hour
NO


##### `if` Statements

Use the basic conditional **logic gates** to determine which statements are reached by the program.

The condition of each `if` statement is evaluated as a `boolean`.

In [159]:
x = 1

if x == 3:
    print ("X equals 3.")
elif x == 2:
    print("X equals 2.")
else:
    print("X equals something else")
print("This is outside the if statement")

X equals something else
This is outside the if statement


##### Operators

| Operator | Corresponds to           | Example | Evaluation |
|----------|--------------------------|---------|------------|
| +        | Addition                 | 1+1     | 2          |
| -        | Subtraction              | 4-1     | 3          |
| *        | Multiplication           | 2 * 3   | 6          |
| /        | Division                 | 7/2     | 3.5        |
| //       | Integer (Floor) Division | 7//2    | 3          |
| %        | Remainder (Modulo)       | 9%2     | 1          |
| **       | Exponent                 | 4 ** 3  | 64         |

Order of Operations:
1. Parentheses
2. Exponentiation (evaluated right to left)
3. Multiplication and Division (including `//` and `%`) have equal precedence
4. Addition and Subtraction have equal precedence
5. Operators with the *same* precedence are evaluated from left to right, with exponentiation as the one exception

Basically, PEMDAS from algebra.
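
A few one-line checks make the evaluation order concrete. The expressions are arbitrary examples:

```python
print(2 + 3 * 4)     # 14: multiplication before addition
print((2 + 3) * 4)   # 20: parentheses first
print(10 - 4 - 3)    # 3: equal precedence, left to right, (10 - 4) - 3
print(8 / 4 * 2)     # 4.0: left to right, (8 / 4) * 2
print(2 ** 3 ** 2)   # 512: exponentiation groups right to left, 2 ** (3 ** 2)
print(-2 ** 2)       # -4: exponentiation is applied before the minus sign
```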

##### Comparison Operators

| Operator | Meaning                  | Example |
|----------|--------------------------|---------|
| ==       | equal to*                | 1 == 1  |
| !=       | not equal to             | 2 != 3  |
| <        | less than                | 2 < 3   |
| >        | greater than             | 5 > 2   |
| <=       | less than or equal to    | 2 <= 5  |
| >=       | greater than or equal to | 5 >= 3  |

*"Equal to" is in contrast to `is`. When two variables X and Y have the **same value**, `X == Y` is `True`. However, for `X is Y` to evaluate to `True`, both X and Y must refer to the **same object**.
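
The difference between `==` and `is` shows up as soon as two separate objects hold equal values. A minimal sketch with lists:

```python
x = [1, 2, 3]
y = [1, 2, 3]   # a second list with equal contents
z = x           # another name for the first list

print(x == y)   # True: same value
print(x is y)   # False: two separate objects
print(x is z)   # True: one object, two names
```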

#### Loops

##### For Loop

Between `for` and `while` loops, a `for` loop is simpler.


A `for` loop:
- Repeats for each item in a given sequence
- Typically used when we know exactly how many times we need something to repeat

**Can often be coupled with the `range()` function**

**Notation:** \
`for item in collection:`\
`.    statements/code`

In [160]:
string = "testing"

for letter in string:
    print(letter)

t
e
s
t
i
n
g


`for` loops can be quite powerful when combined with `range()` and `len()`, especially when you want the index of a sequence data structure.

In [161]:
t=[5,23,8,10]
for i in range(len(t)):
    print(f"The original value at index {i} is {t[i]}")
    t[i] = t[i]**(i+1)
print(t)

The original value at index 0 is 5
The original value at index 1 is 23
The original value at index 2 is 8
The original value at index 3 is 10
[5, 529, 512, 10000]


##### While Loop

- Repeats while a given condition holds
- Typically used when we do **not** know how many times we need something to repeat
- Much more prone to accidentally making infinite loops

In [162]:
x = 3
while x < 5:
    print (x, "still in the loop")
    x=x+1

3 still in the loop
4 still in the loop


#### Loop Control Flows

`else`: used to specify an else-clause that is executed when a loop finishes normally, that is, without hitting `break`

`break`: quits the inner-most loop, skipping any else-clause

In [163]:
x=2
while True:
    #would loop forever without the break below
    if x < 0:
        print("breaking the loop!")
        break
    else:
        print(x)
        x -= 1

2
1
0
breaking the loop!
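
The loop `else`-clause is easiest to understand next to a `break`. In the sketch below, `search` is a hypothetical helper written only for this illustration: the `else` branch runs only when the loop was never broken out of.

```python
def search(numbers, target):
    """Return a message saying whether target occurs in numbers."""
    for n in numbers:
        if n == target:
            result = f"found {n}"
            break            # leaves the loop and skips the else-clause
    else:
        # Runs only when the loop finished without break.
        result = f"{target} not found"
    return result

print(search([4, 8, 15, 16], 15))   # found 15
print(search([4, 8, 15, 16], 99))   # 99 not found
```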


`continue`: skips the rest of the current cycle and moves on to the next cycle of the loop

In a `while` loop, `continue` jumps back to re-test the `while` condition...

or in a `for` loop, it assigns the next item in the sequence and runs the dependent code block again
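
A minimal sketch of `continue` inside a `for` loop; the list name `odds` is only for illustration:

```python
odds = []
for i in range(1, 8):
    if i % 2 == 0:
        continue       # skip even numbers and go straight to the next i
    odds.append(i)     # only reached for odd numbers

print(odds)            # [1, 3, 5, 7]
```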

`pass`: executes nothing, typically used as a placeholder

In [164]:
x = 246
for i in range(1,10):
    if x % i == 0:
        pass
    else:
        print(f"{x} is not evenly divisible by {i}")

246 is not evenly divisible by 4
246 is not evenly divisible by 5
246 is not evenly divisible by 7
246 is not evenly divisible by 8
246 is not evenly divisible by 9


## Module 2.2: Python II

**Data Structures** are the methods by which we organize and store data. They typically hold more than a single point of data.

### Sequence Data Structures

Sequence data structures store multiple objects in a particular order.

They include **mutable** structures like lists (`list`), and also **immutable** structures like tuples (`tuple`) and strings (`str`).

**Mutable** structures allow you to change the contents, such as a single element, of the structure. **Immutable** structures require you to create a new structure to make any changes.

They all:
- represent *finite* ordered sets
- support access by *index*
- provide the ability to take a *slice* (subsequence)
- share many *operations*, such as concatenation
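
To make the mutable/immutable distinction concrete, the sketch below changes one element of a list in place, then shows that the same operation on a tuple fails and a new tuple has to be built instead. The names are made up for illustration.

```python
colors_list = ["red", "green", "blue"]
colors_list[0] = "pink"          # lists can be changed in place
print(colors_list)               # ['pink', 'green', 'blue']

colors_tuple = ("red", "green", "blue")
try:
    colors_tuple[0] = "pink"     # tuples cannot
except TypeError as err:
    print("TypeError:", err)

# Changing an immutable structure means building a new one.
new_tuple = ("pink",) + colors_tuple[1:]
print(new_tuple)                 # ('pink', 'green', 'blue')
```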

#### Lists (`list`)

A **mutable** sequence of data, typically used to store a collection of *homogeneous* items, but items in the sequence are **not restricted** to the same data type.

**Notation**: `[]`

**Elements**:
- surrounded by square brackets and separated by commas
- do not have to exist - a list can be empty
- may be *any* Python **object or data type**, including other lists
- do not have to be the same data type
- are associated with a specific position in the sequence, such that an item can be located with its **index**

In [165]:
test_list_1 = [1,24,76]
test_list_2 = ["red", 98, ["cheese", 67], "blue"]
test_list_3 = []

print(test_list_1)
print(test_list_2)
print(test_list_3)


[1, 24, 76]
['red', 98, ['cheese', 67], 'blue']
[]


Lists have a built-in method called `.sort()`, which modifies the list in place by changing the order of its contents.

It arranges strings alphabetically (uppercase letters sort before lowercase ones), and numbers from most-negative to most-positive.

In [166]:
friends_list = ["Elizabeth", "Anabelle", "Mary-Anne", "George", "Kevin"]
number_list = [10,8,-2,7,0,8]
print(friends_list)
print(number_list)

friends_list.sort()
number_list.sort()

print(friends_list)
print(number_list)


number_list = [10,8,-2,7,0,8]
friends_list = ["Elizabeth", "Anabelle", "Mary-Anne", "George", "Kevin"]

['Elizabeth', 'Anabelle', 'Mary-Anne', 'George', 'Kevin']
[10, 8, -2, 7, 0, 8]
['Anabelle', 'Elizabeth', 'George', 'Kevin', 'Mary-Anne']
[-2, 0, 7, 8, 8, 10]


This typically only works in a **homogeneous** collection, where all elements are of the same data type. Otherwise, you are likely to get a `TypeError`, as Python cannot compare different data types with the `<` comparison operator.

In [167]:
mixed_list = ["Elizabeth", 0, -7, 2, "Annabelle", 2, "Zander"]

try:
    mixed_list.sort()
    print(mixed_list)
except Exception as e:
    print(e)

'<' not supported between instances of 'int' and 'str'


#### Tuples (`tuple`)

An **immutable** sequence of data very similar to a list. They are often used to store a fixed group of related items, and items in the tuple are **not** restricted to the same data type.

**Notation**: `()`

**Elements**:
- surrounded by round brackets and separated by commas
- do not have to exist - a tuple can be empty
- may be *any* Python **object or data type**, including lists and other tuples
- do not have to be the same data type
- are associated with a specific position in the sequence, such that an item can be located with its **index**

In [168]:
test_tuple_1 = (1,24,76)
test_tuple_2 = ("red", 98, ["cheese", 67], "blue")
test_tuple_3 = ()

print(test_tuple_1)
print(test_tuple_2)
print(test_tuple_3)

(1, 24, 76)
('red', 98, ['cheese', 67], 'blue')
()


The creation of a tuple is called **tuple packing**.

The individual elements of a tuple can be assigned to variables in a process called **tuple unpacking**. The number of variables and the number of elements in the tuple must match, otherwise Python raises a `ValueError`.

**While you can also unpack lists and strings, it is a key use of tuples.**

In [169]:
(first,second,third,fourth)=test_tuple_2

print(test_tuple_2)
print(first)
print(second)
print(third)
print(fourth)

('red', 98, ['cheese', 67], 'blue')
red
98
['cheese', 67]
blue
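
A few more unpacking patterns, sketched with made-up values: the parentheses are optional, packing and unpacking together swap two variables, and a mismatch in length raises `ValueError`.

```python
point = (3, 4)        # packing
x, y = point          # unpacking; parentheses are optional
print(x, y)           # 3 4

x, y = y, x           # swap: pack (y, x), then unpack into x, y
print(x, y)           # 4 3

try:
    a, b, c = point   # three names, but only two elements
except ValueError as err:
    print("ValueError:", err)

# A one-element tuple needs the trailing comma.
one_item = (5,)
print(type(one_item), type((5)))   # <class 'tuple'> <class 'int'>
```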


#### Strings (`str`)

An **immutable** sequence of data stored as a string of Unicode characters.

**Notation**: `''`

**Elements**:
- surrounded by single quotes (`'`) or double quotes (`"`)
- do not have to exist - a string can be empty
- each element in the string is associated with a specific position in the sequence, such that an item can be located with its **index**

In [170]:
test_string_1 = "I want to say 'Hello, World!'"
test_string_2 = 'The answer is 42'
test_string_3 = "MITOCHONDRIA IS THE POWERHOUSE OF THE CELL"

print(test_string_1)
print(test_string_2)
print(test_string_3)

I want to say 'Hello, World!'
The answer is 42
MITOCHONDRIA IS THE POWERHOUSE OF THE CELL


#### Common Features of Sequence Data Structures

In [171]:
test_list_1 = [1,24,76]
test_list_2 = ["red", 98, ["cheese", 67], "blue"]
test_list_3 = []
number_list = [10,8,-2,7,0,8]
friends_list = ["Elizabeth", "Anabelle", "Mary-Anne", "George", "Kevin"]
mixed_list = ["Elizabeth", 0, -7, 2, "Annabelle", 2, "Zander"]

test_tuple_1 = (1,24,76)
test_tuple_2 = ("red", 98, ["cheese", 67], "blue")
test_tuple_3 = ()

test_string_1 = "I want to say 'Hello, World!'"
test_string_2 = 'The answer is 42'
test_string_3 = ""

##### Indexing

Since sequence data structures organize their elements in a specific order, we can use that order to access a specific element.

The position of any element in a sequence data structure is its `index`. 

If an index is **negative** it 'counts' in reverse order, so an index of `-1` is always the last element.

The first element is always given an index of `0`. 

Notation: `sequence[index]`

Lists:

In [172]:
print(friends_list)
print(friends_list[0])
print(friends_list[2])
print(friends_list[-1])

['Elizabeth', 'Anabelle', 'Mary-Anne', 'George', 'Kevin']
Elizabeth
Mary-Anne
Kevin


Tuples:

In [173]:
print(test_tuple_2)
print(test_tuple_2[2])
print(test_tuple_2[3])
print(test_tuple_2[1])

('red', 98, ['cheese', 67], 'blue')
['cheese', 67]
blue
98


Strings:

In [174]:
print(test_string_1)
print(test_string_1[0])
print(test_string_1[-1])
print(test_string_1[4])

I want to say 'Hello, World!'
I
'
n


When a list or tuple contains another sequence data structure, you can chain indexes to access a specific element of the sub-structure.

In [175]:
print(test_list_2[2])
print(test_list_2[2][1])
print(test_list_2[2][0][2])

['cheese', 67]
67
e


In [176]:
print(test_tuple_2[2])
print(test_tuple_2[2][1])
print(test_tuple_2[2][0][2])

['cheese', 67]
67
e


##### Slices `:`

We can take a subsection of a sequence data structure by using its indexes to take a 'slice'. The first number indicates the index of the first element included in the slice, the second number indicates the index of the **"up to but not including"** element.

If either number is left blank, the slice will extend to the respective end of the sequence data structure.

Lists:

In [177]:
long_list = [0,1,2,3,4,5,6,7,8,9,10]
sub_list = long_list[2:6]
print(sub_list)
print(long_list[:3])
print(long_list[2:])

[2, 3, 4, 5]
[0, 1, 2]
[2, 3, 4, 5, 6, 7, 8, 9, 10]


Tuples:

In [178]:
long_tuple = (0,1,2,3,4,5,6,7,8,9,10)
sub_tuple = long_tuple[2:6]
print(sub_tuple)
print(long_tuple[:3])
print(long_tuple[2:])

(2, 3, 4, 5)
(0, 1, 2)
(2, 3, 4, 5, 6, 7, 8, 9, 10)


Strings:

In [179]:
print(test_string_2)
print(test_string_2[4:8])
print(test_string_2[:3])
print(test_string_2[2:])

The answer is 42
answ
The
e answer is 42
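
Slices also accept negative indexes and an optional third number, the step, which is not used in the cells above. A short sketch (`digits` is a made-up list):

```python
digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(digits[-3:])    # [8, 9, 10]: the last three elements
print(digits[1:-1])   # [1, 2, 3, 4, 5, 6, 7, 8, 9]: drop both ends
print(digits[::2])    # [0, 2, 4, 6, 8, 10]: every second element
print(digits[::-1])   # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]: reversed copy

print("The answer is 42"[::-1])   # 24 si rewsna ehT
```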


Similarly to how we can chain **indexes** to access elements of a sequence data structure nested within another, we can also follow an index with a **slice** to take a subsection of a nested sequence data structure.

Lists:

In [180]:
nested_long_list = [0,1,2,["nested1","nested2","nested3","nested4"],4,5,6]
nest_list_slice = nested_long_list[3][1:3]
print(nest_list_slice)

['nested2', 'nested3']


Tuples:

In [181]:
nested_long_tuple = (0,1,2,["nested1","nested2","nested3","nested4"],4,5,6)
nest_tuple_slice = nested_long_tuple[3][1:3]
print(nest_tuple_slice)

['nested2', 'nested3']


##### Concatenation (`+`)

We can create a new sequential data structure by appending two compatible data structures together with `+`

Lists:

In [182]:
joined_list = test_list_1 + test_list_2
print(joined_list)
print(test_list_1 + test_list_2)

[1, 24, 76, 'red', 98, ['cheese', 67], 'blue']
[1, 24, 76, 'red', 98, ['cheese', 67], 'blue']


Tuples:

In [183]:
joined_tuple = test_tuple_1 + test_tuple_2
print(joined_tuple)
print(test_tuple_1 + test_tuple_2)


(1, 24, 76, 'red', 98, ['cheese', 67], 'blue')
(1, 24, 76, 'red', 98, ['cheese', 67], 'blue')


Strings:

In [184]:
joined_string = test_string_1 + test_string_2
print(joined_string)
print(test_string_1 + test_string_2)


I want to say 'Hello, World!'The answer is 42
I want to say 'Hello, World!'The answer is 42


##### Multiplication (`*`)

Multiplying a sequence data structure by an integer `n` is equivalent to concatenating the structure with itself until it appears `n` times.

Lists:

In [185]:
multiplied_list = test_list_1 * 2
print(multiplied_list)
print(test_list_1 + test_list_1)

print(test_list_1*4)

[1, 24, 76, 1, 24, 76]
[1, 24, 76, 1, 24, 76]
[1, 24, 76, 1, 24, 76, 1, 24, 76, 1, 24, 76]


Tuples:

In [186]:
multiplied_tuple = test_tuple_1 * 2
print(multiplied_tuple)
print(test_tuple_1 + test_tuple_1)
print(test_tuple_1*4)

(1, 24, 76, 1, 24, 76)
(1, 24, 76, 1, 24, 76)
(1, 24, 76, 1, 24, 76, 1, 24, 76, 1, 24, 76)


Strings:

In [187]:
multiplied_string = test_string_1 * 2
print(multiplied_string)
print(test_string_1 + test_string_1)
print(test_string_1*4)

I want to say 'Hello, World!'I want to say 'Hello, World!'
I want to say 'Hello, World!'I want to say 'Hello, World!'
I want to say 'Hello, World!'I want to say 'Hello, World!'I want to say 'Hello, World!'I want to say 'Hello, World!'


##### Sorting

A sequence data structure may be **sorted** by its elements.

- For **mutable** data structures, such as `list`s, there are built-in methods (like `.sort()`) that change the original structure.

- For **immutable** data structures, such as `tuple`s and `str`s, the built-in function `sorted()` generates a new list in the sorted order
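
A minimal sketch of sorting immutable structures with the built-in `sorted()`. It always returns a new list, so the result is converted back when a tuple or string is wanted. The sample data is made up.

```python
scores = (10, 8, -2, 7)
ordered = sorted(scores)              # a new list; the tuple is untouched
print(ordered)                        # [-2, 7, 8, 10]
print(scores)                         # (10, 8, -2, 7)
print(tuple(ordered))                 # (-2, 7, 8, 10)

letters = sorted("python")
print(letters)                        # ['h', 'n', 'o', 'p', 't', 'y']
print("".join(letters))               # hnopty

print(sorted(scores, reverse=True))   # [10, 8, 7, -2]
```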

##### Packing and Unpacking

The creation of a sequence data structure is called **packing**.

The individual elements of a sequence data structure can be assigned to variables in a process called **unpacking**. The number of variables and the number of elements in the sequence data structure must match.

**While you can also unpack lists and strings, it is a key use of tuples.**


In [188]:
[first,second,third,fourth]=test_list_2

print(test_list_2)
print(first)
print(second)
print(third)
print(fourth)

['red', 98, ['cheese', 67], 'blue']
red
98
['cheese', 67]
blue


In [189]:
(first,second,third,fourth)=test_tuple_2

print(test_tuple_2)
print(first)
print(second)
print(third)
print(fourth)

('red', 98, ['cheese', 67], 'blue')
red
98
['cheese', 67]
blue


### Non-Sequence Data Structures

Non-sequence data structures store multiple objects that are accessed by membership or by key rather than by position; set elements and dictionary keys cannot be duplicated.

#### Sets

A **mutable,** unordered collection of data elements without duplicate elements, similar to sets in mathematics.

**Notation**: `{element, ...}` or `set()` (an empty pair of braces `{}` creates a dictionary, not a set)

**Typical Use**:
- membership test
- elimination of duplicate items
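
A quick check of the set notation, including the empty-braces pitfall. The names are only for illustration.

```python
vowels = {"a", "e", "i"}         # set literal with elements
print(type(vowels))              # <class 'set'>

print(type({}))                  # <class 'dict'>: empty braces make a dictionary
print(type(set()))               # <class 'set'>: the way to make an empty set

# Duplicates disappear when a list is turned into a set.
unique = set([3, 1, 3, 2, 1])
print(sorted(unique))            # [1, 2, 3]
```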

In [190]:
fruits = set(['apple','orange','banana','apple'])
print(fruits)
print(type(fruits))
print('apple' in fruits)
print('cheese' in fruits)

{'orange', 'banana', 'apple'}
<class 'set'>
True
False


Since sets are mutable, you can use the built-in `.add()` and `.remove()` methods to modify their contents.

In [191]:
fruits.add('grape')
fruits.remove('apple')
print(fruits)


{'grape', 'orange', 'banana'}


There are several operators that can be used to combine sets, or compare them. Each returns a new set and leaves the originals unchanged.

In [192]:
a = set('asbestosis kills')
b = set('cheesemonger wife')

**Elements of Set A, Exclude B (`-`)**

Subtracting Set B from Set A returns all the elements from Set A that are not in Set B.

In [193]:
print(a-b)

{'l', 'a', 't', 'b', 'k'}


**Elements in Union of Set A and Set B (`|`)**

Requesting elements in Set A **or** Set B returns all elements that are in Set A, Set B, or both.

In [194]:
print(a|b)

{'i', 'm', 's', 'c', 'g', 'e', 'h', 'b', 'n', 'k', 'w', 'l', 'a', 'f', 't', 'r', 'o', ' '}


**Intersection of Set A and Set B (`&`)**

Requesting elements in Set A and Set B returns all elements that are in **both** Set A and Set B.

In [195]:
print(a&b)

{'i', 's', 'e', 'o', ' '}


**Unique Elements of Set A and Set B (`^`)**

Requesting the XOR returns the elements in Set A that are not in Set B, and the elements of Set B that are not in Set A.

This is equivalent to taking the union, and subtracting the intersection.

In [196]:
print(a^b)

{'l', 'm', 'c', 'g', 'a', 'f', 'h', 't', 'b', 'n', 'r', 'k', 'w'}


#### Dictionaries

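A **mutable** collection of **mapped** data elements, wherein each `key` is mapped to a `value`. Since Python 3.7, dictionaries keep their entries in insertion order, but entries are accessed by key, not by position. Dictionaries are implemented as **hash tables**, and are also called **lookup tables**.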

**Notation**: `{}`

**Elements**:
- **keys** can be any **immutable** (hashable) data type
- **values** can be any data type
- a single dictionary can hold keys of different data types and values of different data types
- Duplicate **values** are allowed, but duplicate **keys** are not

The **key** functions similarly to an **index**. Once a key is made, it cannot be changed in place; to rename a key, add a new key with the same value and remove the old one. Keys can be added or removed.
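
A small sketch of the key rules, using made-up dictionaries `grid` and `ages`: a tuple of immutable items works as a key, a list does not, and renaming a key means adding a new one and removing the old one.

```python
grid = {}
grid[(0, 1)] = "tuple key works"      # a tuple of ints is hashable
print(grid)                           # {(0, 1): 'tuple key works'}

try:
    grid[[0, 1]] = "list key fails"   # lists are mutable, so unhashable
except TypeError as err:
    print("TypeError:", err)

# Renaming a key: add the new key, then remove the old one.
ages = {"bob": 30}
ages["robert"] = ages.pop("bob")      # pop() removes "bob" and returns 30
print(ages)                           # {'robert': 30}
```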

In [197]:
symbol_to_name = {
    "H":"hydrogen",
    "He":"helium",
    "Li":"lithium",
    "C":"carbon",
    "O":"oxygen",
    "N":"nitrogen",
}

user_data = {
    'user':'alpha1',
    'pswd': '123password',
}

print(symbol_to_name)
print(symbol_to_name["He"])
print(user_data)
print(user_data['pswd'])


{'H': 'hydrogen', 'He': 'helium', 'Li': 'lithium', 'C': 'carbon', 'O': 'oxygen', 'N': 'nitrogen'}
helium
{'user': 'alpha1', 'pswd': '123password'}
123password


Additional entries can be added to a dictionary simply by assigning a **value** to the **key**.

In [198]:
user_data['id number'] = '8675309'

print(user_data)

{'user': 'alpha1', 'pswd': '123password', 'id number': '8675309'}


Since **key** names must be unique, assigning a value to an existing key will replace its definition.

In [199]:
user_data['id number'] = '0118999'

print(user_data)

{'user': 'alpha1', 'pswd': '123password', 'id number': '0118999'}


Removing dictionary entries uses the `del` statement or built-in methods.

`del` removes a specific key:value pair.

`.clear()` deletes every entry in the dictionary, reverting it to an empty dictionary.

In [200]:
del user_data['id number']
print(user_data)

user_data.clear()
print(user_data)

{'user': 'alpha1', 'pswd': '123password'}
{}


You can access the data stored in dictionaries through built-in methods. Each returns a *view* object that can be looped over or passed to `list()`.

`.keys()` returns a view of all keys, very useful

`.values()` returns a view of all values

`.items()` returns a view of all key:value pairs as tuples

In [201]:
print(symbol_to_name.keys())
print(symbol_to_name.values())
print(symbol_to_name.items())

dict_keys(['H', 'He', 'Li', 'C', 'O', 'N'])
dict_values(['hydrogen', 'helium', 'lithium', 'carbon', 'oxygen', 'nitrogen'])
dict_items([('H', 'hydrogen'), ('He', 'helium'), ('Li', 'lithium'), ('C', 'carbon'), ('O', 'oxygen'), ('N', 'nitrogen')])


`for` loops can be used to convert a `dictionary` into a `list`, or otherwise visit the matched pairs of **keys** and **values** for operations over the entire dictionary.

In [202]:
for element in symbol_to_name.keys():
    print(element, symbol_to_name[element], sep=" is ")

H is hydrogen
He is helium
Li is lithium
C is carbon
O is oxygen
N is nitrogen
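
The same loop can be written with `.items()`, which unpacks each key:value pair directly. The sketch below uses a shortened copy of the dictionary above and also shows `.get()`, a built-in method that returns a default instead of raising `KeyError`; the key `"Xx"` is made up.

```python
symbol_to_name = {"H": "hydrogen", "He": "helium", "Li": "lithium"}

# Each item is a (key, value) tuple, unpacked into two loop variables.
for symbol, name in symbol_to_name.items():
    print(symbol, name, sep=" is ")

# Converting the dictionary into a list of tuples.
pairs = list(symbol_to_name.items())
print(pairs)   # [('H', 'hydrogen'), ('He', 'helium'), ('Li', 'lithium')]

# .get() avoids a KeyError when the key is missing.
print(symbol_to_name.get("Xx", "unknown"))   # unknown
```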


### Common Features of Data Structures

In [203]:
test_list_1 = [1,24,76]
test_list_2 = ["red", 98, ["cheese", 67], "blue"]
test_list_3 = []
number_list = [10,8,-2,7,0,8]
friends_list = ["Elizabeth", "Anabelle", "Mary-Anne", "George", "Kevin"]
mixed_list = ["Elizabeth", 0, -7, 2, "Annabelle", 2, "Zander"]

test_tuple_1 = (1,24,76)
test_tuple_2 = ("red", 98, ["cheese", 67], "blue")
test_tuple_3 = ()

test_string_1 = "I want to say 'Hello, World!'"
test_string_2 = 'The answer is 42'
test_string_3 = ""

test_set_1 = set("alpha beta parkinglot")
test_set_2 = set("42 is the answer they gave us")

test_dictionary_1 = {
    'answer':42,
    'question':'life, the universe, and everything',
}

test_dictionary_2 = {
    'alpha':1,
    'beta':2,
    'charlie':3,
    'delta':4,
}

#### Content Check (`in`)

We can use the membership operator `in` to determine if an item is in a data structure.

It is **direct, case sensitive, and literal.**

It **does not** "dig in" to an element that is itself a data structure. It simply checks the entire element as a whole. Strings are the one exception: `in` checks whether a substring occurs anywhere in the string.

- If the element is present, it will return `True`.
- If the element is **not** present, it will return `False`.

For dictionaries, it searches **keys** only.

Lists:

In [204]:
print(test_list_2)
print(98 in test_list_2)
print("cheese" in test_list_2)
print("cheese" in test_list_2[2])

['red', 98, ['cheese', 67], 'blue']
True
False
True


Tuples:

In [205]:
print(test_tuple_2)
print(98 in test_tuple_2)
print("cheese" in test_tuple_2)
print("cheese" in test_tuple_2[2])


('red', 98, ['cheese', 67], 'blue')
True
False
True


Strings:

In [206]:
print(test_string_2)
print('42' in test_string_2)
print('A' in test_string_2)
print('a' in test_string_2)
print(' ' in test_string_2)
print("z" in test_string_2)
print("T" in test_string_2)
print('t' in test_string_2)
print("answer" in test_string_2)

The answer is 42
True
False
True
True
False
True
False
True


Sets:

In [207]:
print(test_set_2)
print(42 in test_set_2)
print('42' in test_set_2)
print('4' in test_set_2)

{'i', 'u', 'y', 's', '4', 'g', 'e', 'a', 'h', 't', 'n', 'r', 'v', ' ', 'w', '2'}
False
False
True


Dictionaries:

In [208]:
print(test_dictionary_1)
print(42 in test_dictionary_1)
print('42' in test_dictionary_1)
print('answer' in test_dictionary_1)

{'answer': 42, 'question': 'life, the universe, and everything'}
False
False
True


#### Length (`len`)

We can determine the number of elements within a data structure by passing it through the built in `len()` function, which takes the data structure as its argument.

If an element is a data structure, it does not "dig in" to count the individual elements within that sub-structure. Rather, it counts that element as 1.

For dictionaries, it returns the **number of keys**.

Lists:

In [209]:
print(test_list_1)
print(len(test_list_1))
print(test_list_2)
print(len(test_list_2))
print(test_list_3)
print(len(test_list_3))

[1, 24, 76]
3
['red', 98, ['cheese', 67], 'blue']
4
[]
0


Tuples:

In [210]:
print(test_tuple_1)
print(len(test_tuple_1))
print(test_tuple_2)
print(len(test_tuple_2))
print(test_tuple_3)
print(len(test_tuple_3))

(1, 24, 76)
3
('red', 98, ['cheese', 67], 'blue')
4
()
0


Strings:

In [211]:
print(test_string_1)
print(len(test_string_1))
print(test_string_2)
print(len(test_string_2))
print(test_string_3)
print(len(test_string_3))

I want to say 'Hello, World!'
29
The answer is 42
16

0


Sets:

In [212]:
print(test_set_1)
print(len(test_set_1))
print(test_set_2)
print(len(test_set_2))

{'l', 'i', 'g', 'a', 'e', 'o', 'h', 't', 'b', 'r', 'p', 'n', 'k', ' '}
14
{'i', 'u', 'y', 's', '4', 'g', 'e', 'a', 'h', 't', 'n', 'r', 'v', ' ', 'w', '2'}
16


Dictionaries:

In [213]:
print(test_dictionary_1)
print(len(test_dictionary_1))
print(test_dictionary_2)
print(len(test_dictionary_2))

{'answer': 42, 'question': 'life, the universe, and everything'}
2
{'alpha': 1, 'beta': 2, 'charlie': 3, 'delta': 4}
4


### Reference Semantics

Assignment manipulates references.

1. A value is created and stored in memory
2. The variable name is created
3. A **reference** to the **memory location** that stores the value is then assigned to the variable
4. When we use the variable, Python **follows this reference** to find the value in memory.

So if we assign one variable to another variable...

In [214]:
a = [1,2,3]
#a references list [1,2,3]
b = a
#b references the same list object that a references
a.append(4)
#changing the list through a is thus visible through b...
print(b)

[1, 2, 3, 4]


When we reassign a variable, we are **changing which memory location it is referencing**.

So when we do something like incrementing...

1. The reference of the variable provides the memory location it references
2. The value stored at that memory location is retrieved
3. The calculation or process occurs, producing a **new data element** that is assigned to a **fresh memory location**
4. The variable is assigned a new reference, to the **new memory location**

In [215]:
x = 3
#the integer 3 is assigned a memory location
#x is assigned a "reference" to that location

x = x + 1
#The reference for x is retrieved
#The value stored at this reference is
#the integer 3

#The calculation occurs, and yields an integer of 4
#The integer 4 is stored in a new location
#x is mapped to the new memory location

**Mutability** depends on the **data type**. If data is **mutable**, then it can be changed **in place, at the same memory location**, and every variable that references it sees the change.

These data types include `list`s, `dict`s, `set`s, and some user-defined types.

In [216]:
x = 3
y = x
x = 4
print(y)

a = [1,2,3]
b=a
a.append(4)
print(b)

3
[1, 2, 3, 4]
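
When a separate list is wanted rather than a second name for the same one, make a copy. A minimal sketch, with `alias` and `clone` as made-up names; `is` and `id()` confirm which names share an object.

```python
a = [1, 2, 3]
alias = a           # second name for the same list
clone = a.copy()    # new list with the same elements (a[:] also works)

a.append(4)
print(alias)        # [1, 2, 3, 4]: follows the change
print(clone)        # [1, 2, 3]: unaffected

print(alias is a, clone is a)   # True False
print(id(alias) == id(a))       # True: same memory location
```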


### List Comprehensions

A powerful and popular feature in Python, list comprehensions allow the generation of a new list by applying an expression to every member of an original list (or any other iterable).

This helps eliminate many for-loops in Python!

**Notation:** `[expression for element in list]`

In [217]:
li = [3,6,2,7]
squared = [elem**2 for elem in li]
print(squared)

for_list = []
for elem in li:
    for_list.append(elem**2)
print(for_list)

[9, 36, 4, 49]
[9, 36, 4, 49]


If a `list` contains elements of different types, then the expression must operate correctly on all members of the list.

If the elements of the list are other data structures, **unpacking** can be used in the `element` position to match the 'shape' of the elements, and indexes or **keys** can be used in the expression.

In [218]:
tuple_list = [('a',1),('b',2),('c',3)]
print( [n*3 for (x,n) in tuple_list] )

[3, 6, 9]


This can be further expanded by adding a `filter` to the list comprehension.

Each element of the list is checked against the filter condition. If the condition evaluates to `False`, **that element** is skipped: the expression is not evaluated for it and nothing is added to the new list.

**Notation:** `[expression for element in list if filter]`

In [219]:
li = [3,6,2,7,1,9]
squared = [elem**2 for elem in li if elem > 4]
print(squared)

[36, 49, 81]
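
A filtered comprehension is shorthand for a loop with an `if` inside it. The sketch below, on made-up data, writes both forms side by side and then uses `range()` as the source instead of a list.

```python
words = ["Apple", "kiwi", "Banana", "fig"]

# Comprehension: keep words longer than 4 characters, upper-cased.
long_upper = [w.upper() for w in words if len(w) > 4]
print(long_upper)              # ['APPLE', 'BANANA']

# The equivalent for-loop.
result = []
for w in words:
    if len(w) > 4:
        result.append(w.upper())
print(result == long_upper)    # True

# Any iterable can be the source, for example a range.
evens_squared = [n**2 for n in range(10) if n % 2 == 0]
print(evens_squared)           # [0, 4, 16, 36, 64]
```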
