
# Sets and Dictionaries

### Sets

- a set is a container, like a list or a tuple
- sets are written with `{}` curly braces
- `my_set = {1, 4.5, (1,2,3), 'hello'}`
- sets are iterable: you can use them in a loop (like tuple, range, list, str)
- BUT
- sets are unordered (unlike lists and tuples), so you cannot index into a set
- the order in which items come out of a set is undefined

`my_set[0] # this will fail with TypeError: 'set' object is not subscriptable`
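
Since a set has no positions, one way to reach individual items is to loop over the set or to copy it into an ordered container first. The block below is a minimal sketch of that idea; `colors` is a made-up set used only for illustration.

```python
# `colors` is a hypothetical example set, not part of the notes above
colors = {'red', 'green', 'blue'}

# indexing raises TypeError because a set has no positions
try:
    colors[0]
    indexing_worked = True
except TypeError:
    indexing_worked = False

# sorted() copies the items into a list with a well-defined order
ordered_colors = sorted(colors)
first_color = ordered_colors[0]

print("indexing worked:", indexing_worked)
print("sorted copy:", ordered_colors)
print("first item of the sorted copy:", first_color)
```

Sorting only works when the items can be compared with each other, so this approach assumes a set whose items are all of one comparable type.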

- they contain only unique items (no duplicates)
- they contain only objects that are hashable!
- an object is hashable if it defines a `__hash__()` method and its hash value never changes during its lifetime
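
A quick way to explore which objects are hashable is to call `hash()` and see whether it raises `TypeError`. The helper below is a sketch; `is_hashable` is a name invented for this example, not a built-in.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper for illustration)."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False


print(is_hashable('hello'))      # str is immutable
print(is_hashable((1, 2, 3)))    # tuple holding only hashable items
print(is_hashable([1, 2, 3]))    # list is mutable
print(is_hashable({1, 2, 3}))    # set is mutable
print(is_hashable((1, [2, 3])))  # tuple that contains a list
```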

documentation: https://python-reference.readthedocs.io/en/latest/docs/sets/
- .add()
- .update()
- .remove()
- .discard()
- .union()
- .intersection()
- .difference()
- `if elem in set`
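
The methods listed above can be tried out on two small sets. This is a minimal sketch; `a` and `b` are made-up example sets.

```python
# `a` and `b` are hypothetical example sets
a = {1, 2, 3}
b = {3, 4, 5}

a.add(3)           # already present, so nothing changes
a.add(10)          # a now holds 1, 2, 3, 10
a.update([2, 20])  # add every item of an iterable; duplicates are ignored
a.remove(10)       # remove an item; raises KeyError if it is missing
a.discard(99)      # remove an item if present; silent if it is missing
a.discard(20)      # a is back to 1, 2, 3

print(a.union(b))         # items in a or b
print(a.intersection(b))  # items in both a and b
print(a.difference(b))    # items in a but not in b
print(2 in a, 4 in a)     # membership tests
```

Note that `.union()`, `.intersection()`, and `.difference()` return new sets and leave `a` and `b` unchanged, while `.add()`, `.update()`, `.remove()`, and `.discard()` modify the set in place.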

In [None]:
# 7. with this code, what is happening?
# - set defined with strings inside
# - crash because an element of the set is accessed by an ordered index [0]
# - sets are not ordered

department = {'computer', 'and', 'information', 'science'}
# look at entire set
print("Entire set:", department)
# add an element
department.add('Hello')
# look at entire set
print("Entire set:", department)
# look at first element
print("First elem:", department[0])

In [None]:
# 8. with this code, what is happening?
# - set defined with strings inside
# - RUNS because strings are hashable
# - sets can only contain hashable items

department = {'computer', 'computer', 'computer', 'science'}
# look at entire set
print("Entire set:", department)
print(len(department))

In [None]:
# 9. with this code, what is happening?
# - word_ variables are storing lists with strings
# - the lists are put into a set
# - crash because lists are not hashable
# - sets can only contain hashable items

word_1, word_a, word_first, word_alpha = ['computer'], ['and'], ['information'], ['science']
department = {word_1, word_a, word_first, word_alpha}

In [None]:
# containment example
example_set = {'hello','darkness','my old friend', 1234}
contains_hello = 'hello' in example_set
contains_HELLO = 'HELLO' in example_set
print(f"contains_hello is {contains_hello}, and contains_HELLO is {contains_HELLO}")

### Hashability

In [None]:
# test hashability of tuples
a = (1.0, True, "hello")
print(hash(a))
# tiny numeric changes impact hash!
a = (1.01, True, "hello")
print(hash(a))

# what goes wrong here?
# a_altered cannot be hashed: it contains a list, and lists are mutable and therefore unhashable
a_altered = ([1.01], True, "hello")
print(hash(a_altered))


In [None]:
# test hashability of sets
# what goes wrong here?
# Sets are mutable, and cannot be hashed
a = {1,2,"3"}
print(hash(a))


In [None]:
# demonstrating the hashability of items inside a set
example_set = {'hello','darkness','my old friend', 1234}

for item in example_set:
  print(f"item {item} has hash: {hash(item)}")


In [None]:
# sets are not subscriptable
example_set = {'hello','darkness','my old friend', 1234}
for i in range(len(example_set)):
  item = example_set[i]
  print(f"item {item} has hash: {hash(item)}")


### Dictionary

- dictionaries have keys and values
- keys must be hashable
  - each key can be represented by an integer, its hash
  - hashes are not guaranteed to be unique, so a good hash function aims to spread keys evenly across the possible hash values
- `my_dict = {key1:value1, key2:value2}`
- ```python
my_dict = {}
my_dict[key1] = value1
my_dict[key2] = value2
```
- there are many ways to create a dictionary!
- dictionaries preserve insertion order (guaranteed since Python 3.7), unlike sets
- dictionary values are accessed by key `my_dict[key]`
- dictionaries are not indexed by regular indices
- documentation: https://python-reference.readthedocs.io/en/latest/docs/dict/
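
The points above about keys can be sketched in a few lines. The dictionary `rooms` and its keys are made-up names for illustration only.

```python
# `rooms` is a hypothetical example dictionary
rooms = {}
rooms['north'] = 101
rooms['south'] = 204
rooms[('north', 'annex')] = 7  # a tuple is hashable, so it can be a key

# a list is mutable and unhashable, so it cannot be a key
try:
    rooms[['north', 'annex']] = 7
    list_key_worked = True
except TypeError:
    list_key_worked = False

# assigning to an existing key replaces its value
rooms['south'] = 205

print("value for 'south':", rooms['south'])
print("list key worked:", list_key_worked)
print("number of entries:", len(rooms))
```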


In [9]:
month_map1 = {
 'Jan':1,
 'Feb':2,
 'Mar':3,
 'Dec':12}

month_map2 = {}
month_map2[1] = 'Jan'
month_map2[2] = 'Feb'
month_map2[3] = 'Mar'
month_map2[12] = 'Dec'


print(month_map1)
print(month_map2)

{'Jan': 1, 'Feb': 2, 'Mar': 3, 'Dec': 12}
{1: 'Jan', 2: 'Feb', 3: 'Mar', 12: 'Dec'}


In [10]:
# Iterable
for key in month_map1:
  print(key)
  print(month_map1[key])

Jan
1
Feb
2
Mar
3
Dec
12


In [1]:
# create a dictionary
mlb_team_one = {
    'Colorado' : 'Rockies',
    'Boston'   : 'Red Sox',
    'Minnesota': 'Twins',
    'Milwaukee': 'Brewers'
}

In [2]:
# create a dictionary
mlb_team_two = dict([
    ('Colorado', 'Rockies'),
    ('Boston', 'Red Sox'),
    ('Minnesota', 'Twins'),
    ('Milwaukee', 'Brewers')
])

In [3]:
# create a dictionary
mlb_team_three = dict(
    Colorado='Rockies',
    Boston='Red Sox',
    Minnesota='Twins',
    Milwaukee='Brewers',
    Seattle='Mariners'
)

In [4]:
# display and manipulate the contents of a dictionary

# display the type
print(type(mlb_team_one))

# display the contents
print('one:', mlb_team_one)
print('two:', mlb_team_two)
print('three:', mlb_team_three)

# look up specific values using a key
print(mlb_team_one['Minnesota'])
print(mlb_team_one['Colorado'])

# add a new value to the dictionary
mlb_team_one['Kansas City'] = 'Royals'

# look up the new value inside of the dictionary
print(mlb_team_one['Kansas City'])

<class 'dict'>
one: {'Colorado': 'Rockies', 'Boston': 'Red Sox', 'Minnesota': 'Twins', 'Milwaukee': 'Brewers'}
two: {'Colorado': 'Rockies', 'Boston': 'Red Sox', 'Minnesota': 'Twins', 'Milwaukee': 'Brewers'}
three: {'Colorado': 'Rockies', 'Boston': 'Red Sox', 'Minnesota': 'Twins', 'Milwaukee': 'Brewers', 'Seattle': 'Mariners'}
Twins
Rockies
Royals


In [None]:
# attempt to access a key that does not exist in a dictionary
print(mlb_team_one['Toronto'])
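
Looking up a missing key with square brackets raises `KeyError`. The sketch below shows three common ways to handle that; `teams` is a small stand-in dictionary so the block runs on its own, and `'unknown'` is an arbitrary default chosen for this example.

```python
# a small stand-in for the team dictionary so this block runs on its own
teams = {'Colorado': 'Rockies', 'Boston': 'Red Sox'}

# option 1: test membership before looking up the key
if 'Toronto' in teams:
    toronto_team = teams['Toronto']
else:
    toronto_team = 'unknown'

# option 2: .get() returns a default instead of raising KeyError
boston_team = teams.get('Boston', 'unknown')
missing_team = teams.get('Toronto', 'unknown')

# option 3: attempt the lookup and catch the KeyError
try:
    teams['Toronto']
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(toronto_team, boston_team, missing_team, raised_key_error)
```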

In [12]:
# comment every line below to explain what is happening

class my_dictionary():
  """A dictionary using integer keys."""
  def __init__(self, num_buckets):
    """Create empty dictionary."""
    self.total_num_buckets = num_buckets
    self.buckets = []
    for i in range(num_buckets):
      self.buckets.append([])

  def add_entry(self, key, value):
    selected_bucket = self.buckets[key%self.total_num_buckets]
    for i in range(len(selected_bucket)):
      if selected_bucket[i][0] == key:
        selected_bucket[i] = (key, value)
        return
    selected_bucket.append((key, value))


  def get_value(self, key):
    selected_bucket = self.buckets[key%self.total_num_buckets]
    for i in range(len(selected_bucket)):
      if selected_bucket[i][0] == key:
        return selected_bucket[i][1]
    return None

  def __str__(self):
    entries = []
    for bucket in self.buckets:
      for entry in bucket:
        entries.append(f'{entry[0]}:{entry[1]}')
    # separate the entries so adjacent keys and values do not run together
    return '{' + ', '.join(entries) + '}'



In [13]:
import random
D = my_dictionary(15)
for i in range(20):
  key = random.randint(0, 10**5 -1)
  D.add_entry(key, i)
print('Printing the dictionary, D')
print(D)

Printing the dictionary, D
{50985:8, 36450:10, 89506:18, 5898:1, 66348:6, 70595:4, 740:11, 1416:2, 63186:14, 56740:12, 86861:0, 35636:5, 48866:19, 82707:15, 36987:17, 72868:3, 85723:7, 87973:13, 47924:9, 88799:16}


In [14]:
for hash_bucket in D.buckets: # violates abstraction barrier
  print(hash_bucket)

[(50985, 8), (36450, 10)]
[(89506, 18)]
[]
[(5898, 1), (66348, 6)]
[]
[(70595, 4), (740, 11)]
[(1416, 2), (63186, 14)]
[]
[]
[]
[(56740, 12)]
[(86861, 0), (35636, 5), (48866, 19)]
[(82707, 15), (36987, 17)]
[(72868, 3), (85723, 7), (87973, 13)]
[(47924, 9), (88799, 16)]
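
Each key in the listing above sits in the bucket whose index is `key % 15`, because the dictionary was created with 15 buckets. The sketch below repeats that calculation by hand for a few (key, value) pairs copied from the output; it is a standalone check and does not use the `my_dictionary` class.

```python
num_buckets = 15

# (key, value) pairs copied from the bucket listing above
entries = [(50985, 8), (36450, 10), (89506, 18), (5898, 1), (66348, 6)]

# one empty list per bucket
buckets = [[] for _ in range(num_buckets)]

# the bucket index is the remainder of the key divided by the bucket count
for key, value in entries:
    buckets[key % num_buckets].append((key, value))

for index in range(4):
    print(index, buckets[index])
```

Keys 50985 and 36450 are both multiples of 15, so they collide in bucket 0. The inner list is what lets the bucket hold both entries.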
