# Lecture 4

# Section 3: Dictionaries and Sets

Hint: All the examples and explanations from this part of today's lecture can be found in chapter 6 of the book.

## 3.1 Introduction
* **Lists** and **tuples** are _sequences_, i.e., _collections_ whose elements have a defined order and can be accessed by position (index).
* A **dictionary** is a collection that stores **key–value pairs** that map immutable keys to values, just as a conventional dictionary maps words to definitions. 
* A **set** is an unordered collection of unique immutable elements.

## 3.2 Dictionaries
* A dictionary _associates_ keys with values. 
* Each key _maps_ to a specific value. 
* Sample dictionary keys and values:

| Keys | Key type | Values | Value type
| :-------- | :-------- | :-------- | :--------
| Country names | `str` | Internet country codes | `str` 
| Decimal numbers | `int` | Roman numerals | `str` 
| States | `str` | Agricultural products | list of `str` 
| Hospital patients | `str`  | Vital signs | tuple of `int`s and `float`s 
| Baseball players | `str`  | Batting averages | `float` 
| Metric measurements | `str`  | Abbreviations | `str` 
| Inventory codes | `str`  | Quantity in stock | `int` 

### Unique Keys
* Keys must be _immutable_ and _unique_.
* Multiple keys can have the same value. 
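The two rules above can be checked directly. This is a minimal sketch with made-up names (`scores`, `grades`, `good`). It shows three things: a repeated key in a literal keeps only the last value, different keys can share a value, and a mutable list cannot serve as a key while a tuple can.

```python
# A dictionary literal with a repeated key: the later value silently wins
scores = {'Ann': 90, 'Bob': 85, 'Ann': 70}
print(scores)  # {'Ann': 70, 'Bob': 85}

# Different keys may map to the same value
grades = {'Ann': 'A', 'Bob': 'A'}
print(len(grades))  # 2

# A list is mutable, so it cannot be used as a key
try:
    bad = {[1, 2]: 'list key'}
except TypeError:
    print('lists cannot be dictionary keys')

# A tuple of immutable elements works as a key
good = {(1, 2): 'tuple key'}
print(good[(1, 2)])  # tuple key
```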

### 3.2.1 Creating a Dictionary
* Create a dictionary by enclosing a comma-separated list of key–value pairs, each of the form _key_: _value_, in curly braces (`{}`).
* Create an empty dictionary with `{}`. 

In [None]:
country_codes = {'Finland': 'fi', 'South Africa': 'za', 
                  'Nepal': 'np'}
                 

In [None]:
country_codes

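* Dictionaries are not sequences, so you cannot access a key–value pair by position. Always look values up by key.
* Since Python 3.7, however, dictionaries do preserve insertion order, so iterating over a dictionary visits its pairs in the order they were added.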
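A minimal sketch with made-up dictionaries `a` and `b`. The `==` operator compares dictionaries by their key–value pairs, so the order in which the pairs were inserted does not affect equality. That order becomes visible only when you iterate.

```python
# Same key–value pairs, inserted in different orders
a = {'Finland': 'fi', 'Nepal': 'np'}
b = {'Nepal': 'np', 'Finland': 'fi'}

print(a == b)              # True: == compares the pairs, not their order
print(list(a) == list(b))  # False: iterating yields the keys in insertion order
```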

### Determining if a Dictionary Is Empty 

In [None]:
len(country_codes)

* You can use a dictionary as a condition to determine whether it’s empty: a non-empty dictionary evaluates to `True` and an empty one to `False`.

In [None]:
if country_codes:
    print('country_codes is not empty')
else:
    print('country_codes is empty')
    

In [None]:
country_codes.clear()

In [None]:
if country_codes:
    print('country_codes is not empty')
else:
    print('country_codes is empty')
    

### 3.2.2 Iterating through a Dictionary 

In [None]:
days_per_month = {'January': 31, 'February': 28, 'March': 31}

In [None]:
days_per_month

* Dictionary method **`items`** returns each key–value pair as a tuple: 

In [None]:
for month, days in days_per_month.items():
    print(f'{month} has {days} days')
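As a complement to `items`, here is a small sketch that reuses `days_per_month` from the cell above. Looping over a dictionary directly yields only its keys. Each element produced by `items` is an ordinary `(key, value)` tuple, which is why the `for` statement above can unpack it into `month` and `days`.

```python
days_per_month = {'January': 31, 'February': 28, 'March': 31}

# Looping over the dictionary itself yields its keys
for month in days_per_month:
    print(month, days_per_month[month])

# Each element of items() is a (key, value) tuple
first_pair = next(iter(days_per_month.items()))
print(first_pair)  # ('January', 31)
```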
    

### 3.2.3 Basic Dictionary Operations

* We intentionally provide the incorrect value `100` for the key `'X'` (the correct value is `10`):

In [None]:
roman_numerals = {'I': 1, 'II': 2, 'III': 3, 'V': 5, 'X': 100}

In [None]:
roman_numerals

### Accessing the Value Associated with a Key

In [None]:
roman_numerals['V']

### Updating the Value of an Existing Key–Value Pair

In [None]:
roman_numerals['X'] = 10

In [None]:
roman_numerals

### Adding a New Key–Value Pair

In [None]:
roman_numerals['L'] = 50

In [None]:
roman_numerals

* String keys are case sensitive. 
* Assigning to a nonexistent key inserts a new key–value pair. 
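The two bullets above interact. Because string keys are case sensitive, assigning to `'x'` does not touch `'X'`. Instead it inserts a separate pair. This short sketch uses a fresh copy of `roman_numerals` for illustration.

```python
roman_numerals = {'I': 1, 'V': 5, 'X': 10}

# 'x' and 'X' are different strings, so this inserts a new pair
roman_numerals['x'] = 10
print(len(roman_numerals))  # 4

# Assigning to an existing key only replaces its value; the size is unchanged
roman_numerals['X'] = 10
print(len(roman_numerals))  # 4
```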

### Removing a Key–Value Pair

In [None]:
del roman_numerals['III']

In [None]:
roman_numerals

* Method **`pop`** removes the pair with the specified key and returns that key's value.

In [None]:
roman_numerals.pop('X')

In [None]:
roman_numerals
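`pop` also accepts an optional second argument: a default to return when the key is absent. This sketch uses a made-up starting dictionary. With a default, `pop` does not raise a `KeyError` for a missing key. Without one, it does.

```python
roman_numerals = {'I': 1, 'II': 2, 'V': 5, 'L': 50}

removed = roman_numerals.pop('II')       # removes the pair and returns 2
missing = roman_numerals.pop('Z', None)  # key absent: returns the default

print(removed, missing)  # 2 None
print(roman_numerals)    # {'I': 1, 'V': 5, 'L': 50}

# Without a default, pop raises KeyError for a missing key
try:
    roman_numerals.pop('Z')
except KeyError:
    print("KeyError: 'Z'")
```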

### Attempting to Access a Nonexistent Key

In [None]:
roman_numerals['III']

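* Accessing a nonexistent key with square brackets, as in the cell above, raises a `KeyError`.
* Method **`get`** returns its argument’s corresponding value, or `None` if the key is not found.
* IPython does not display anything for `None`.
* Called with a second argument, `get` returns that argument instead of `None` when the key is not found.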

In [None]:
roman_numerals.get('III') # returns None

In [None]:
roman_numerals.get('III', 'not in dictionary')

In [None]:
roman_numerals.get('V')
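To see what `get` saves you, compare it with the explicit alternative. In this sketch, with a made-up `roman_numerals`, the subscript version has to catch the `KeyError` itself. `get` expresses the same fallback in a single call.

```python
roman_numerals = {'I': 1, 'V': 5, 'L': 50}

# Subscripting a missing key raises KeyError, which we catch explicitly
try:
    value = roman_numerals['III']
except KeyError:
    value = 'not in dictionary'
print(value)  # not in dictionary

# get performs the same lookup-with-fallback in one call
print(roman_numerals.get('III', 'not in dictionary'))  # not in dictionary
```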

### Testing Whether a Dictionary Contains a Specified Key

In [None]:
'V' in roman_numerals

In [None]:
'III' in roman_numerals

In [None]:
'III' not in roman_numerals
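Note that `in` tests only the keys. This sketch uses a made-up dictionary. To ask whether a value is present, search the `values()` view explicitly.

```python
roman_numerals = {'I': 1, 'V': 5, 'L': 50}

print(5 in roman_numerals)           # False: 5 is a value, not a key
print(5 in roman_numerals.values())  # True: search the values explicitly
print('V' in roman_numerals.keys())  # True: same as 'V' in roman_numerals
```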

### 3.2.4 Dictionary Methods `keys` and `values` 

In [1]:
months = {'January': 1, 'February': 2, 'March': 3}

In [2]:
for month_name in months.keys():
    print(month_name, end='  ')
   

January  February  March  

In [3]:
for month_number in months.values():
    print(month_number, end='  ')

1  2  3  

### Dictionary Views
* Methods `items`, `keys` and `values` each return a **view** of a dictionary’s data. 
* When you iterate over a **`view`**, it “sees” the dictionary’s **current contents**—it does **not** have its own copy of the data.

In [4]:
months_view = months.keys()

In [5]:
for key in months_view:
    print(key, end='  ')
    

January  February  March  

In [6]:
months['December'] = 12

In [7]:
months

{'January': 1, 'February': 2, 'March': 3, 'December': 12}

In [8]:
for key in months_view:
    print(key, end='  ')
    

January  February  March  December  
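A view tracks deletions as well as insertions. This sketch rebuilds a small `months` dictionary for illustration. It also shows that changing the dictionary's size while you iterate over one of its views raises a `RuntimeError`.

```python
months = {'January': 1, 'February': 2, 'March': 3}
months_view = months.keys()

# Deleting a pair is reflected by the existing view
del months['February']
print(list(months_view))  # ['January', 'March']

# Changing the dictionary's size during iteration over its view is an error
size_changed_error = False
try:
    for key in months_view:
        months['April'] = 4  # inserts a new key while the loop is running
except RuntimeError as error:
    size_changed_error = True
    print('RuntimeError:', error)
```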

### Converting Dictionary Keys, Values and Key–Value Pairs to Lists

In [9]:
list(months.keys())

['January', 'February', 'March', 'December']

In [10]:
list(months.values())

[1, 2, 3, 12]

In [11]:
list(months.items())

[('January', 1), ('February', 2), ('March', 3), ('December', 12)]

### Processing Keys in Sorted Order 

In [12]:
for month_name in sorted(months.keys()):
    print(month_name, end='  ')

December  February  January  March  
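`sorted` can do more than the cell above shows. In this sketch with the same `months` data, `sorted(months)` sorts the keys just like `sorted(months.keys())`. Sorting `items()` with a `key` function orders the pairs by value instead. `reverse=True` gives descending order.

```python
months = {'January': 1, 'February': 2, 'March': 3, 'December': 12}

# Sorting the dictionary itself sorts its keys
print(sorted(months) == sorted(months.keys()))  # True

# Order the pairs by value (the month number) instead of by key
by_number = sorted(months.items(), key=lambda item: item[1])
print(by_number)

# Keys in descending alphabetical order
print(sorted(months, reverse=True))  # ['March', 'January', 'February', 'December']
```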

### 3.2.6 Example: Dictionary of Student Grades
* Script with a dictionary that represents an instructor’s grade book. 
* Maps each student’s name (a string) to a list of integers containing that student’s grades on three exams.  

In [None]:
grade_book = {            
    'Susan': [92, 85, 100], 
    'Eduardo': [83, 95, 79],
    'Azizi': [91, 89, 82],  
    'Pantipa': [97, 91, 92] 
}

all_grades_total = 0
all_grades_count = 0

for name, grades in grade_book.items():
    total = sum(grades)
    print(f'Average for {name} is {total/len(grades):.2f}')
    all_grades_total += total
    all_grades_count += len(grades)
    
print(f"Class's average is: {all_grades_total / all_grades_count:.2f}")
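The same computation can be packaged as a reusable function. Below is a minimal sketch; `average_grades` is a hypothetical helper name that does not appear in the book. It returns the per-student averages as a new dictionary together with the class average.

```python
def average_grades(book):
    """Return (dict of per-student averages, average over all grades).

    Hypothetical helper for illustration; book maps names to lists of grades.
    """
    averages = {}
    all_total = 0
    all_count = 0
    for name, grades in book.items():
        averages[name] = sum(grades) / len(grades)
        all_total += sum(grades)
        all_count += len(grades)
    return averages, all_total / all_count


grade_book = {
    'Susan': [92, 85, 100],
    'Eduardo': [83, 95, 79],
    'Azizi': [91, 89, 82],
    'Pantipa': [97, 91, 92],
}

student_averages, class_average = average_grades(grade_book)
for name, average in student_averages.items():
    print(f'Average for {name} is {average:.2f}')
print(f"Class's average is: {class_average:.2f}")  # 89.67
```

Like the script above, this function weights every individual grade equally. If students had taken different numbers of exams, the result would differ from the mean of the per-student averages.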

### 3.2.7 Example: Word Counts 
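* Script that splits (**tokenizes**) a string into words and builds a dictionary counting the number of occurrences of each word.
* Adjacent string literals separated only by whitespace are concatenated automatically. Enclosing them in parentheses lets the combined literal span several lines.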

In [None]:
text = ('this is sample text with several words ' 
        'this is more sample text with some different words')

word_counts = {}

# count occurrences of each unique word
for word in text.split():
    if word in word_counts: 
        word_counts[word] += 1  # update existing key-value pair
    else:
        word_counts[word] = 1  # insert new key-value pair

print(f'{"WORD":<12}COUNT')

for word, count in sorted(word_counts.items()):
    print(f'{word:<12}{count}')

print('\nNumber of unique words:', len(word_counts))
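The `if`/`else` in the counting loop can also be written with `get` and a default of `0`. This sketch is an equivalent variant, not the book's version. A word seen for the first time starts from `0` and is incremented to `1`.

```python
text = ('this is sample text with several words '
        'this is more sample text with some different words')

word_counts = {}
for word in text.split():
    # get returns 0 when the word is not yet a key
    word_counts[word] = word_counts.get(word, 0) + 1

print(word_counts['this'], word_counts['more'])   # 2 1
print('Number of unique words:', len(word_counts))  # 10
```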

### Python Standard Library Module `collections` 
* The Python Standard Library already contains the counting functionality shown above. 
* A **`Counter`** is a dictionary subclass that receives an iterable and counts how many times each element occurs in it.

In [None]:
from collections import Counter

In [None]:
text = ('this is sample text with several words '
        'this is more sample text with some different words')

In [None]:
counter = Counter(text.split())

In [None]:
for word, count in sorted(counter.items()):
    print(f'{word:<12}{count}')
    

In [None]:
print('Number of unique keys:', len(counter.keys()))
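A `Counter` offers two conveniences that a plain dictionary lacks, shown in this sketch with the same text. Looking up a missing element returns `0` instead of raising a `KeyError`. The method `most_common(n)` returns the `n` most frequent elements. Elements with equal counts appear in the order they were first encountered.

```python
from collections import Counter

text = ('this is sample text with several words '
        'this is more sample text with some different words')
counter = Counter(text.split())

print(counter['words'])        # 2
print(counter['python'])       # 0: missing elements count as zero
print(counter.most_common(3))  # [('this', 2), ('is', 2), ('sample', 2)]
```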

### 3.2.8 Dictionary Method `update` 
* Method `update` can insert new key–value pairs and update existing ones.
* Method `update` also can receive an iterable object containing key–value pairs, such as a list of two-element tuples.

In [None]:
country_codes = {}

In [None]:
country_codes.update({'South Africa': 'za'})

In [None]:
country_codes

* Purposely inserting an incorrect value:

In [None]:
country_codes.update(Australia='ar')

In [None]:
country_codes

* Correcting the incorrect value:

In [None]:
country_codes.update(Australia='au')

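* Instead of passing a `dict` (`{key: value}`) to method `update`, you can also pass a comma-separated list of **keyword arguments** (`Australia='au'`).
* This works only for string keys that are valid Python identifiers. A key such as `'South Africa'` must be passed in a `dict` or as a `(key, value)` pair.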
In [None]:
country_codes
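As mentioned at the start of this subsection, `update` also accepts an iterable of two-element tuples. This sketch shows that form. It also combines a mapping with keyword arguments in one call; the keyword arguments are applied last, so they win.

```python
country_codes = {'South Africa': 'za'}

# Any iterable of (key, value) pairs works
country_codes.update([('Finland', 'fi'), ('Nepal', 'np')])

# Mapping first, then keyword arguments, which override it
country_codes.update({'Australia': 'ar'}, Australia='au')

print(country_codes)
# {'South Africa': 'za', 'Finland': 'fi', 'Nepal': 'np', 'Australia': 'au'}
```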

## 3.3 Sets
* A set is an unordered collection of **unique values**. 
* May contain **only immutable objects**, like strings, `int`s, `float`s and tuples that contain only immutable elements. 
* Sets do not support indexing and slicing. 
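The restrictions above can be seen in a few lines. In this sketch the names are made up. Strings, numbers and tuples of immutable elements are accepted. A list, or a tuple that contains a list, raises a `TypeError`, and so does indexing a set.

```python
# Strings, numbers and tuples of immutable elements are valid set elements
ok = {'red', 42, 3.5, (1, 2)}
print(len(ok))  # 4

# A list is mutable, so it is rejected
try:
    bad = {[1, 2]}
except TypeError:
    print('lists cannot be set elements')

# A tuple containing a list is rejected too
try:
    also_bad = {(1, [2])}
except TypeError:
    print('tuples containing lists cannot be set elements')

# Sets have no positions, so indexing fails
try:
    ok[0]
except TypeError:
    print('sets do not support indexing')
```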

### Creating a Set with Curly Braces
* Duplicates are ignored, making sets great for **duplicate elimination**.

In [None]:
colors = {'red', 'orange', 'yellow', 'green', 'red', 'blue'}

* Though IPython may display the set below in sorted order, sets are **unordered**, so do not write order-dependent code.

In [None]:
colors

### Determining a Set’s Length

In [None]:
len(colors)

### Checking Whether a Value Is in a Set

In [None]:
'red' in colors

In [None]:
'purple' in colors

In [None]:
'purple' not in colors

### Iterating Through a Set
* There’s no significance to the iteration order.

In [None]:
for color in colors:
    print(color.upper(), end=' ')
    

### Creating a Set with the Built-In `set` Function

In [None]:
numbers = list(range(10)) + list(range(5))

In [None]:
numbers

In [None]:
set(numbers)

* To create an empty set, you must use **`set()`**, because **`{}` represents an empty dictionary**.
* Python displays an empty set as `set()` to avoid confusion with an empty dictionary (`{}`).

In [None]:
set()
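You can confirm the difference with `type`. In this small sketch, `{}` produces a `dict` and `set()` produces a `set`. Like an empty dictionary, an empty set is falsy.

```python
empty_dict = {}
empty_set = set()

print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>
print(len(empty_set), bool(empty_set))  # 0 False
```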

### 3.3.1 Comparing Sets

In [None]:
{1, 3, 5} == {3, 5, 1}

In [None]:
{1, 3, 5} != {3, 5, 1}

* `<` tests whether the set to its left is a **proper subset** (_echte Teilmenge_) of the one to its right—all the elements in the left operand are in the right operand, and **the sets are not equal**.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/a/a8/Set_subsetAofB.svg/300px-Set_subsetAofB.svg.png" style="margin:auto"/>

In [None]:
{1, 3, 5} < {3, 5, 1}

In [None]:
{1, 3, 5} < {7, 3, 5, 1}

* The `<=` operator tests whether the set to its left is an **improper subset** (_Teilmenge_) of the one to its right—that is, all the elements in the left operand are in the right operand, and **the sets might be equal**:

In [None]:
{1, 3, 5} <= {3, 5, 1}

In [None]:
{1, 3} <= {3, 5, 1}

* You may also check for an improper subset (_Teilmenge_) with the set method **`issubset`**:

In [None]:
{1, 3, 5}.issubset({3, 5, 1})

In [None]:
{1, 2}.issubset({3, 5, 1})

* Similarly, you may also check for a **proper superset** (_echte Obermenge_) with `>` and **improper supersets** (_Obermenge_) with `>=` or set method **`issuperset`**.

In [None]:
{1, 3, 5} > {3, 5, 1}

In [None]:
{1, 3, 5, 7} > {3, 5, 1}

In [None]:
{1, 3, 5} >= {3, 5, 1}

In [None]:
{1, 3, 5} >= {3, 1}

In [None]:
{1, 3} >= {3, 1, 7}

In [None]:
{1, 3, 5}.issuperset({3, 5, 1})

In [None]:
{1, 3, 5}.issuperset({3, 2})

* The argument to `issubset` or `issuperset` can be _any_ iterable.
* For a non-set iterable argument, the methods first convert the iterable to a set, then perform the operation.
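This sketch contrasts the methods with the operators. The methods accept a list or even a string, which is treated as its set of characters. The comparison operators, by contrast, raise a `TypeError` unless both operands are sets.

```python
small = {1, 3}

print(small.issubset([1, 2, 3]))   # True
print({'a', 'b'}.issubset('abc'))  # True: 'abc' becomes {'a', 'b', 'c'}

# The operator version requires a set on each side
operator_error = False
try:
    small <= [1, 2, 3]
except TypeError:
    operator_error = True
    print('<= needs a set on each side')
```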

### 3.3.2 Mathematical Set Operations

### Union (_Vereinigungsmenge_)
* The **union** of two sets is a set consisting of all the unique elements from both sets.
* The union operator requires two sets, but method `union` may receive any iterable as its argument (this is true for subsequent methods in this section as well).

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/30/Venn0111.svg/380px-Venn0111.svg.png" style="margin:auto"/>

In [None]:
{1, 3, 5} | {2, 3, 4}

In [None]:
{1, 3, 5}.union([20, 20, 3, 40, 40])

### Intersection (_Schnittmenge_)
The **intersection** of two sets is a set consisting of all the unique elements that the two sets have in common. 

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/99/Venn0001.svg/384px-Venn0001.svg.png" style="margin:auto"/>

In [None]:
{1, 3, 5} & {2, 3, 4}

In [None]:
{1, 3, 5}.intersection([1, 2, 2, 3, 3, 4, 4])

### Difference (_Differenzmenge / Restmenge_)
The **difference** between two sets is a set consisting of the elements in the left operand that are not in the right operand. 

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Venn0100.svg/384px-Venn0100.svg.png" style="margin:auto"/>

In [None]:
{1, 3, 5} - {2, 3, 4}

In [None]:
{1, 3, 5, 7}.difference([2, 2, 3, 3, 4, 4])

### Symmetric Difference (_Symmetrische Differenz_)
The **symmetric difference** between two sets is a set consisting of the elements that belong to exactly one of the two sets, i.e., the elements the sets do not have in common.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Venn0110.svg/384px-Venn0110.svg.png" style="margin:auto"/>

In [None]:
{1, 3, 5} ^ {2, 3, 4}

In [None]:
{1, 3, 5, 7}.symmetric_difference([2, 2, 3, 3, 4, 4])
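The definition can be checked against the other operations in this section. This sketch shows two equivalent formulations: the elements of each set that are not in the other, combined; or everything in either set minus what the sets share.

```python
a = {1, 3, 5}
b = {2, 3, 4}

sym = a ^ b
print(sorted(sym))  # [1, 2, 4, 5]

# Elements in exactly one of the sets
print(sym == ((a - b) | (b - a)))  # True
# Union minus intersection
print(sym == ((a | b) - (a & b)))  # True
```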

### Disjoint (_Disjunkte Mengen_)
Two sets are **disjoint** if they do not have any common elements. 

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/df/Disjunkte_Mengen.svg/342px-Disjunkte_Mengen.svg.png" style="margin:auto"/>

In [None]:
{1, 3, 5}.isdisjoint({2, 4, 6})

In [None]:
{1, 3, 5}.isdisjoint({4, 6, 1})
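Two sets are disjoint exactly when their intersection is empty. This sketch checks that equivalence for both examples above. It also shows that `isdisjoint`, like the other set methods, accepts any iterable.

```python
a = {1, 3, 5}

for other in ({2, 4, 6}, {4, 6, 1}):
    # Both expressions agree: True for {2, 4, 6}, False for {4, 6, 1}
    print(a.isdisjoint(other), len(a & other) == 0)

print(a.isdisjoint([2, 4, 6]))  # True: a list argument works too
```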

 ------
&copy;1992&ndash;2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 6 of the book [**Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud**](https://amzn.to/2VvdnxE).