# Python Dictionaries and Sets


## Dictionaries

* Dictionaries are collections of key-value pairs
  * `key: value`
* also known as associative arrays
* keep insertion order (guaranteed since Python 3.7), but are accessed by key, not by position
* keys unique in one dictionary
* useful for storing, extracting information
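As a minimal sketch of the properties listed above (the name `ages` and its contents are made up for illustration): a repeated key in a literal does not create a second entry, and in Python 3.7 or later iteration follows insertion order.

```python
# A repeated key in a literal: the last value wins, so keys stay unique.
ages = {"ann": 30, "bob": 25, "ann": 31}
print(ages)        # {'ann': 31, 'bob': 25}
print(len(ages))   # 2

# Iteration follows insertion order (Python 3.7+).
ages["cid"] = 40
print(list(ages))  # ['ann', 'bob', 'cid']
```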

https://automatetheboringstuff.com/2e/chapter5/

https://realpython.com/python-dicts/

In [1]:
# Let's create an empty dictionary

empty_dict = {}
len(empty_dict)

# Note: can also use dict() instead of {}

0

In [2]:
print(empty_dict)

{}


In [3]:
type(empty_dict)

dict

In [4]:
# A dictionary with some content:

tel = {'jack': 4098, 'sape': 4139}
print(tel)

{'jack': 4098, 'sape': 4139}


In [5]:
# add a new key-value pair (key = "guido", value = 4127)

tel['guido'] = 4127
print(tel)

{'jack': 4098, 'sape': 4139, 'guido': 4127}


In [6]:
# reading a value by key
print(tel['guido'])

4127


In [7]:
# if you assign a new value to a key, the old value gets overwritten

tel['guido'] = 1337
print(tel)

{'jack': 4098, 'sape': 4139, 'guido': 1337}


In [8]:
# len() lets us get the length of a collection (e.g. a dictionary)

len(tel)

3

---

Accessing dictionary keys and values:

In [9]:
print(tel.keys())

dict_keys(['jack', 'sape', 'guido'])


In [10]:
print(tel.values())

dict_values([4098, 4139, 1337])


In [11]:
print(tel.items())

dict_items([('jack', 4098), ('sape', 4139), ('guido', 1337)])


In [12]:
# get a value by key
# very fast even in large dictionaries: O(1) on average
tel['jack']

4098

In [13]:
# looking up a non-existent key raises KeyError (and stops your program unless the error is handled)

tel['peteris']

KeyError: 'peteris'

In [14]:
# we can use Python's error handling to do something in case of such errors

try:
    tel['peteris']
except KeyError:
    print("Element not found!")

Element not found!


---

We can also:
- check if a key is in a dictionary before retrieving it
- or use the `.get()` method for handling situations when key is not in the dictionary

In [15]:
# check for key in our dictionary
'guido' in tel

True

In [16]:
'peteris' in tel

False

In [17]:
key = 'peteris'

# we can write code that checks if a key is in a dict:
if key in tel:
    print(tel[key])
else:
    print("No such key!")

No such key!


In [18]:
help(dict.get)

Help on method_descriptor:

get(self, key, default=None, /) unbound builtins.dict method
    Return the value for key if key is in the dictionary, else default.



In [19]:
# get() method lets us return a default value when key does not exist:
key = 'peteris'
value = tel.get(key, "No such key!")
print(value)

print()

key = 'guido'
value = tel.get(key, "No such key!")
print(value)

No such key!

1337


In [20]:
# remove key value pair
del tel['sape']

In [21]:
tel['sape']

KeyError: 'sape'

In [22]:
tel.keys()

dict_keys(['jack', 'guido'])

---

Looking for values in a dictionary:

In [23]:
tel.values()

dict_values([4098, 1337])

In [24]:
# Note: this is slower, because Python has to check the values one by one (O(n))

1337 in tel.values()

True

In [25]:
112 in tel.values()

False

In [26]:
# a small program that allows adding key:value pairs
# to the dictionary only if the key is not in the dictionary

key = 'minipolice'
value = 91100

if key not in tel:
    tel[key] = value
    print(f"Added new key {key} value {value} pair")
else:
    print("You already have key", key, "value", tel[key])

Added new key minipolice value 91100 pair


---
#### Exercise:

Write a function that prints the key-value pairs of a dictionary, sorted by key.
* The `sorted()` function may be useful here

In [27]:
my_dict1 = {"jack": 4011, "zoe": 4086, "andy": 4519, "uldis": 4123, "ādams": 4529}
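One possible approach to this exercise, offered as a sketch rather than the reference solution. The helper names `sorted_pairs` and `print_sorted_by_key` are made up here. Note that `sorted()` compares strings by Unicode code point, so `"ādams"` ends up after `"zoe"`; a proper Latvian alphabetical order would need locale-aware sorting, which is outside this sketch.

```python
def sorted_pairs(d):
    """Return the (key, value) pairs of d as a list ordered by key."""
    return [(key, d[key]) for key in sorted(d)]


def print_sorted_by_key(d):
    """Print one key-value pair per line, ordered by key."""
    for key, value in sorted_pairs(d):
        print(key, value)


my_dict1 = {"jack": 4011, "zoe": 4086, "andy": 4519, "uldis": 4123, "ādams": 4529}
print_sorted_by_key(my_dict1)
```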

---

### Working with Dictionaries

* `items()` = a view of dictionary items (key-value pairs)
* `keys()` = a view of keys
* `values()` = a view of values

* `get()` = read a value corresponding to the key
* `pop()` = remove the element specified by the key and return the corresponding value
* `update()` = update a dictionary with elements from another dictionary
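In Python 3, `keys()`, `values()` and `items()` return view objects rather than independent lists. The sketch below uses a throwaway dictionary named `phone_book` (made up for this example) to show that a view follows later changes to the dictionary, while `list(...)` takes a snapshot.

```python
phone_book = {"jack": 4098, "sape": 4139}

keys_view = phone_book.keys()            # a live view of the keys
keys_snapshot = list(phone_book.keys())  # an independent copy

phone_book["guido"] = 4127

print(keys_view)      # dict_keys(['jack', 'sape', 'guido'])
print(keys_snapshot)  # ['jack', 'sape']
```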


In [28]:
help(dict)

Help on class dict in module builtins:

class dict(object)
 |  dict() -> new empty dictionary
 |  dict(mapping) -> new dictionary initialized from a mapping object's
 |      (key, value) pairs
 |  dict(iterable) -> new dictionary initialized as if via:
 |      d = {}
 |      for k, v in iterable:
 |          d[k] = v
 |  dict(**kwargs) -> new dictionary initialized with the name=value pairs
 |      in the keyword argument list.  For example:  dict(one=1, two=2)
 |
 |  Built-in subclasses:
 |      StgDict
 |
 |  Methods defined here:
 |
 |  __contains__(self, key, /)
 |      True if the dictionary has the specified key, else False.
 |
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |
 |  __eq__(self, value, /)
 |      Return self==value.
 |
 |  __ge__(self, value, /)
 |      Return self>=value.
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __getitem__(self, key, /)
 |      Return self[key].
 |
 |  __gt__(self, value, /)
 |      Return self>va

In [29]:
tel

{'jack': 4098, 'guido': 1337, 'minipolice': 91100}

In [30]:
# return the corresponding value AND delete it from dictionary
print(tel.pop("jack"))

4098


In [31]:
tel

{'guido': 1337, 'minipolice': 91100}

In [32]:
# we get a key error if we try to pop() it again
tel.pop("jack")

KeyError: 'jack'
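`pop()` also accepts an optional second argument, which is returned instead of raising `KeyError` when the key is missing. A small sketch, using a separate dictionary `contacts` (a name made up here) so that `tel` is left untouched:

```python
contacts = {"guido": 1337, "minipolice": 91100}

# The key is missing: the default (None here) is returned, no error is raised.
missing = contacts.pop("jack", None)
print(missing)   # None

# The key exists: its value is returned and the pair is removed.
found = contacts.pop("guido", None)
print(found)     # 1337
print(contacts)  # {'minipolice': 91100}
```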

In [33]:
# just getting the value does not delete it from dict
tel["guido"]

1337

In [34]:
more_tel = {"dana": 4345, "xeny": 4678, "guido": 4444}

In [35]:
# update dictionary 1 with elements from dictionary 2
# (values of keys that already exist, such as "guido", are overwritten)
tel.update(more_tel)

In [36]:
tel

{'guido': 4444, 'minipolice': 91100, 'dana': 4345, 'xeny': 4678}

---

### Dictionary keys and values can be of various types

Questions:
- what data types can be dictionary keys?
- ... what about types of values?
- can a dict contain another dict?

Experiment to find out!
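One way to run such an experiment, as a sketch (the names `mixed`, `nested` and `error_name` are made up here): keys must be hashable, so numbers, strings, tuples of hashable items and `None` work, while a list does not. Values can be of any type, including another dict.

```python
# Hashable objects work as keys.
mixed = {1: "int key", 2.5: "float key", "x": "str key", (1, 2): "tuple key", None: "None key"}
print(mixed[(1, 2)])  # tuple key

# A list is mutable and unhashable, so it cannot be a key.
error_name = None
try:
    bad = {[1, 2]: "list key"}
except TypeError as err:
    error_name = type(err).__name__
    print(error_name, "-", err)

# Values can be anything, including another dictionary.
nested = {"inner": {"a": 1}}
print(nested["inner"]["a"])  # 1
```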

In [37]:
my_dict2 = {"text": "More text", "300": 300}

In [38]:
my_dict2

{'text': 'More text', '300': 300}

In [39]:
%%time

# let's construct a large dictionary

big_dict = {}

for i in range(20_000_000):
    big_dict[i] = i * 2

CPU times: user 664 ms, sys: 191 ms, total: 855 ms
Wall time: 971 ms


In [40]:
%%time

# looking for a key in a dictionary is very *fast*

"-1" in big_dict

CPU times: user 1 μs, sys: 1e+03 ns, total: 2 μs
Wall time: 3.1 μs


False

In [41]:
%%time

# looking for a value takes longer

"-1" in big_dict.values()

CPU times: user 168 ms, sys: 58.9 ms, total: 227 ms
Wall time: 235 ms


False
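Outside a notebook, where `%%time` is not available, a similar comparison can be sketched with the standard `timeit` module. The dictionary here is deliberately smaller (`small_dict`, 100,000 entries, a size chosen only for this example), and the measured times will differ from machine to machine.

```python
import timeit

small_dict = {i: i * 2 for i in range(100_000)}

# Key lookup uses the hash table directly.
key_time = timeit.timeit(lambda: -1 in small_dict, number=20)

# Value lookup has to scan the values one by one.
value_time = timeit.timeit(lambda: -1 in small_dict.values(), number=20)

print(f"key lookup:   {key_time:.6f} s for 20 runs")
print(f"value lookup: {value_time:.6f} s for 20 runs")
```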

---

### Dictionaries may contain lists and other dicts

That allows us to store almost any information in them.

In [42]:
my_dict3 = {"list": [1, 2, 3], 45: 365, "dict": {"a": 10, "b": [4, 5, 6]}}

In [43]:
# goal: get 6 out of this dict; step 1: get the nested dictionary
my_dict3['dict']

{'a': 10, 'b': [4, 5, 6]}

In [44]:
# step 2: get the list stored under key 'b'
my_dict3['dict']['b']

[4, 5, 6]

In [45]:
# step 3: take the last element of that list, which is 6
my_dict3['dict']['b'][-1]

6

In [46]:
internal_list = my_dict3['dict']['b']

internal_list.append(8)

In [47]:
internal_list

[4, 5, 6, 8]

In [48]:
my_dict3

{'list': [1, 2, 3], 45: 365, 'dict': {'a': 10, 'b': [4, 5, 6, 8]}}
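The dictionary changed because `internal_list` is another name for the same list object, not a copy of it. The sketch below rebuilds the original dictionary under a different name (`nested_data`, made up here) and contrasts a reference with a copy made by `list()`.

```python
nested_data = {"list": [1, 2, 3], 45: 365, "dict": {"a": 10, "b": [4, 5, 6]}}

# A reference: both names point to the same list object.
same_list = nested_data["dict"]["b"]
print(same_list is nested_data["dict"]["b"])  # True

# A copy: changes to it do not affect the dictionary.
list_copy = list(nested_data["dict"]["b"])
list_copy.append(99)

print(list_copy)               # [4, 5, 6, 99]
print(nested_data["dict"]["b"])  # [4, 5, 6]
```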

---

### Exercise: counting things

Write a function that:
- takes a list of words as an argument
- calculates how frequently each word appears in a list (use a dict for that)
- returns a dictionary with word frequency information

In [49]:
def count(word_list):
    freq = {}

    # build a dictionary here with word frequency information
    
    return freq

In [50]:
word_list = ["ābols", "lapa", "liepa", "lapa", "aaa", "ābols", "varde"]

In [51]:
print(count(word_list))

{}
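One possible way to fill in the function, shown as a sketch and not as the only solution. It is named `count_words` (a name made up here) so that it does not replace the `count` template above, and it relies on `get()` returning 0 for words that have not been seen yet.

```python
def count_words(word_list):
    """Return a dict that maps each word to the number of times it occurs."""
    freq = {}
    for word in word_list:
        # get() returns 0 the first time a word is seen
        freq[word] = freq.get(word, 0) + 1
    return freq


word_list = ["ābols", "lapa", "liepa", "lapa", "aaa", "ābols", "varde"]
print(count_words(word_list))
# {'ābols': 2, 'lapa': 2, 'liepa': 1, 'aaa': 1, 'varde': 1}
```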


### Exercise: dictionary

1) Write a function that uses a dictionary defined below and takes a text word as an argument. The function should return a translation of this word if it is in the dictionary.

Example: `translate_word("dog")` should output `"suns"`

If a word is not in the dictionary return the word as-is.

2) Write a function that takes a text sentence as an argument and returns a translation of this sentence (using the dictionary defined below). It should replace words in the dictionary with their translations, leaving unknown words as they are.

Example: `elephant is very happy` should be translated as `zilonis ir very laimīgs`

3) Add some new words to the dictionary used in these exercises. For example, add some smileys 😄

In [52]:
my_dict = {
    "apple": "ābols",
    "pear": "bumbieris",
    "cat": "kaķis",
    "dog": "suns",
    "elephant": "zilonis",
    "bear": "lācis",
    "beer": "alus",
    "a": "",
    "an": "",
    "the": "",
    "is": "ir",
    "and": "un",
    "but": "bet",
    "big": "liels",
    "large": "liels",
    "small": "mazs",
    "cold": "auksts",
    "warm": "silts",
    "hot": "karsts",
    "tasty": "garšīgs",
    "sad": "noskumis",
    "happy": "laimīgs",
    "white": "balts",
    "grey": "pelēks",
    "green": "zaļš",
    "yellow": "dzeltens",
    "red": "sarkans",
    "black": "melns",
    "(smile)": "😄"
}

In [53]:
# Subtask 1

def translate_word(eng_word):  
    # write Python code that returns a translation of eng_word 
    # (if it can be found in the dictionary)

    lv_word = ""
    
    return lv_word

In [54]:
translate_word("dog")

''

In [55]:
# Subtask 2

def translate(en_sentence):

    # write Python code that translates the English sentence into Latvian
    lv_sentence = ""

    return lv_sentence

In [56]:
translate("elephant is verry big")

''
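A sketch of one way to approach subtasks 1 and 2. The names `sample_dict`, `translate_word_sketch` and `translate_sketch` are made up here, and `sample_dict` is only a small excerpt of `my_dict`. Splitting on whitespace and dropping empty translations (the articles map to `""`) are assumptions of this sketch, not requirements of the exercise.

```python
sample_dict = {
    "dog": "suns",
    "elephant": "zilonis",
    "is": "ir",
    "happy": "laimīgs",
    "the": "",
}


def translate_word_sketch(eng_word, dictionary=sample_dict):
    """Return the translation of eng_word, or the word itself if it is unknown."""
    return dictionary.get(eng_word, eng_word)


def translate_sketch(en_sentence, dictionary=sample_dict):
    """Translate a sentence word by word, leaving unknown words unchanged."""
    translated = [translate_word_sketch(word, dictionary) for word in en_sentence.split()]
    # Skip empty translations so that dropped articles leave no double spaces.
    return " ".join(word for word in translated if word)


print(translate_word_sketch("dog"))                # suns
print(translate_sketch("elephant is very happy"))  # zilonis ir very laimīgs
```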

---

## Sets

- unordered
- unique members only
- curly braces {3, 6, 7}
 - like dictionaries but with keys only
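Because sets and dictionaries share the curly-brace syntax, empty braces create a dictionary, not a set. A short sketch (the variable names are made up here):

```python
empty_braces = {}    # this is an empty dict
empty_set = set()    # this is an empty set

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# Adding the same element twice keeps only one copy.
empty_set.add(3)
empty_set.add(3)
print(empty_set)  # {3}
```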

https://realpython.com/python-sets/
 
https://www.hackerearth.com/practice/python/working-with-data/set/tutorial/

Both dictionaries and sets are useful but you will probably use dictionaries more often than sets.

In [57]:
# a set of numbers
s1 = {3, 6, 7, 3, 3, 6}

s1

{3, 6, 7}

In [58]:
s1 = set([3, 6, 7, 3, 3, 6])
s1

{3, 6, 7}

In [59]:
# a set may contain many things
s2 = {"a", "set", "of", "words", "and", "more", "words"}

s2

{'a', 'and', 'more', 'of', 'set', 'words'}

In [60]:
# words contain characters, let's make a set of them
my_str = "Glāžšķūņa rūķīši dzērumā čiepj Baha koncertflīģeļu vākus"
my_str = my_str.lower()

s3 = set(my_str)

In [100]:
# my_str is a pangram!
# it contains all characters of the Latvian alphabet
#  - https://en.wikipedia.org/wiki/Pangram

In [61]:
s3

{' ',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'r',
 's',
 't',
 'u',
 'v',
 'z',
 'ā',
 'č',
 'ē',
 'ģ',
 'ī',
 'ķ',
 'ļ',
 'ņ',
 'š',
 'ū',
 'ž'}

---

### Exercise: Counting Letters

- Count the number of times each letter appears in my_str.
- Print the result with letters sorted alphabetically.

See if you can re-use the functions defined earlier in this notebook.

In [63]:
my_str = "Glāžšķūņa rūķīši dzērumā čiepj Baha koncertflīģeļu vākus"

In [64]:
def simbolu_skaits(teikums):

    # write Python code for counting character frequency
    word_freq = {}
    
    # ...

    print(word_freq)

In [65]:
simbolu_skaits(my_str)

{}


---

### Sets (continued...)

- issubset
- issuperset

In [66]:
s1

{3, 6, 7}

In [67]:
numset = set(range(10))
print(numset)

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}


In [68]:
# check if a value is IN a set:
9 in numset

True

In [69]:
9 in s1

False

In [70]:
# let's see methods we can use on sets
dir(set)

['__and__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iand__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__isub__',
 '__iter__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__rand__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__ror__',
 '__rsub__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__xor__',
 'add',
 'clear',
 'copy',
 'difference',
 'difference_update',
 'discard',
 'intersection',
 'intersection_update',
 'isdisjoint',
 'issubset',
 'issuperset',
 'pop',
 'remove',
 'symmetric_difference',
 'symmetric_difference_update',
 'union',
 'update']

In [71]:
help(set)

Help on class set in module builtins:

class set(object)
 |  set() -> new empty set object
 |  set(iterable) -> new set object
 |
 |  Build an unordered collection of unique elements.
 |
 |  Methods defined here:
 |
 |  __and__(self, value, /)
 |      Return self&value.
 |
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x.
 |
 |  __eq__(self, value, /)
 |      Return self==value.
 |
 |  __ge__(self, value, /)
 |      Return self>=value.
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __gt__(self, value, /)
 |      Return self>value.
 |
 |  __iand__(self, value, /)
 |      Return self&=value.
 |
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  __ior__(self, value, /)
 |      Return self|=value.
 |
 |  __isub__(self, value, /)
 |      Return self-=value.
 |
 |  __iter__(self, /)
 |      Implement iter(self).
 |
 |  __ixor__(self, value, /)
 |      Return self^=value.
 |
 |  __l

In [72]:
help(set.intersection)

Help on method_descriptor:

intersection(...) unbound builtins.set method
    Return the intersection of two sets as a new set.

    (i.e. all elements that are in both sets.)



In [73]:
# check if one set is a subset of the other
s1.issubset(numset)

True

In [74]:
# numset is a superset of s1
numset.issuperset(s1)

True

---

Set operations:
- difference 
- intersection
- symmetric_difference
- union

In [75]:
s1

{3, 6, 7}

In [76]:
s4 = {1, 2, 3}

In [77]:
# elements that are in any of these sets
s1.union(s4)

{1, 2, 3, 6, 7}

In [78]:
# set difference = elements that are in s1 AND are not in s4
s1.difference(s4)

{6, 7}

In [79]:
s4.difference(s1)

{1, 2}

In [80]:
s1.symmetric_difference(s4)

{1, 2, 6, 7}

In [81]:
s1.intersection(s4)

{3}

In [82]:
s1

{3, 6, 7}

In [83]:
# are these sets disjoint (have no elements in common)?
s1.isdisjoint(numset)

False

In [84]:
s1.isdisjoint({-12, 0})

True

---

### What other things can we do with sets?

- remaining set operations
- can we use mathematical operators on sets?
 - `+, -, ...`

Let's try:

In [85]:
s1 - s4

{6, 7}

In [86]:
s4 - s1

{1, 2}

In [87]:
s1.union(s4)

{1, 2, 3, 6, 7}

In [88]:
s1

{3, 6, 7}

In [89]:
s1.update(s4)

In [90]:
# set s1 has changed:
s1

{1, 2, 3, 6, 7}
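To round off the experiment, here is a sketch of the operator forms of the set methods used earlier. It works on fresh sets `a` and `b` (names made up here, with the original contents of `s1` and `s4`), because `s1` has just been changed by `update()`. The `+` operator is not defined for sets and raises `TypeError`.

```python
a = {3, 6, 7}
b = {1, 2, 3}

print(a | b)  # union: {1, 2, 3, 6, 7}
print(a & b)  # intersection: {3}
print(a - b)  # difference: {6, 7}
print(a ^ b)  # symmetric difference: {1, 2, 6, 7}

# <= and >= test for subset and superset.
print(b <= (a | b))  # True

# There is no + for sets.
plus_supported = True
try:
    a + b
except TypeError:
    plus_supported = False
print(plus_supported)  # False
```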

---

### Example: Comparing Python method names

- We can use `dir()` to get lists of methods for the Python data types `list` and `str`.
- Next, we can use set operations to compare these lists.

In [91]:
list_list = dir(list)

list_list

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [92]:
# Here we use something called "list comprehension".

list_list2 = [item for item in list_list if "__" not in item]

list_list2

['append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [93]:
list_set = set(list_list2)

list_set

{'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort'}

In [94]:
# Let's do the same with string method list
str_list = [item for item in dir(str) if "__" not in item]

str_list

['capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

In [95]:
str_set = set(str_list)

In [96]:
# set intersection will give us the set of methods
# available for both lists and text strings

str_set.intersection(list_set)

{'count', 'index'}

In [97]:
list_set - str_set

{'append',
 'clear',
 'copy',
 'extend',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort'}

In [98]:
str_set - list_set

{'capitalize',
 'casefold',
 'center',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill'}

In [99]:
help(str.join)

Help on method_descriptor:

join(self, iterable, /) unbound builtins.str method
    Concatenate any number of strings.

    The string whose method is called is inserted in between each given string.
    The result is returned as a new string.

    Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'



---

We found out that `list` and `str` have 2 methods in common:
- index()
- count()

Next: try to find out what methods are common for `list` and `dict`.
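A starting point for that comparison, as a sketch: the helper `public_names` is made up here, and it filters on a leading underscore instead of the `"__" not in item` test used above. The exact result may depend on the Python version, so no output is shown.

```python
def public_names(data_type):
    """Return the set of attribute names of data_type that do not start with '_'."""
    return {name for name in dir(data_type) if not name.startswith("_")}


common = public_names(list) & public_names(dict)
only_dict = public_names(dict) - public_names(list)

print(sorted(common))
print(sorted(only_dict))
```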

## Additional information

### Topic 1 - dictionaries

- [Dictionaries official documentation](https://docs.python.org/3/library/stdtypes.html#dict)
- [Automate the boring stuff with Python: Dictionaries](https://automatetheboringstuff.com/2e/chapter5/)

### Topic 2 - sets

- [Sets official documentation](https://docs.python.org/3/library/stdtypes.html#set)