**What you learn:**

In this notebook you will learn about two basic data structures in Python: sets and maps (dictionaries).

Based on a [tutorial by Zhiya Zuo](https://github.com/zhiyzuo/python-tutorial) and extended where appropriate.

Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)

This notebook is available on https://github.com/BigDataAnalyticsGroup/python.

## Set

A set is an unordered, duplicate-free collection of items.

In [1]:
a_list = [42, 9, 53, 7, 9] 
a_list, type(a_list)

([42, 9, 53, 7, 9], list)

In [2]:
a_set = {42, 9, 53, 7, 9}
a_set, type(a_set)

({7, 9, 42, 53}, set)

In [3]:
dir(a_set)

['__and__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__iand__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__isub__',
 '__iter__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__rand__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__ror__',
 '__rsub__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__xor__',
 'add',
 'clear',
 'copy',
 'difference',
 'difference_update',
 'discard',
 'intersection',
 'intersection_update',
 'isdisjoint',
 'issubset',
 'issuperset',
 'pop',
 'remove',
 'symmetric_difference',
 'symmetric_difference_update',
 'union',
 'update']
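Several of the methods listed above implement the usual set algebra. The following is a minimal sketch; the names `evens` and `small` are made up for this illustration and do not appear elsewhere in the notebook.

```python
# Two small example sets (hypothetical data, chosen for illustration only)
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

both = evens.intersection(small)              # elements contained in both sets
either = evens.union(small)                   # elements contained in at least one set
only_evens = evens.difference(small)          # elements in evens but not in small
one_side = evens.symmetric_difference(small)  # elements in exactly one of the two sets

print(both, either, only_evens, one_side)
print({2, 4}.issubset(evens))                 # is every element of {2, 4} also in evens?
```

None of these four methods modifies `evens` or `small`; each returns a new set. The `*_update` variants in the listing change the set in place instead.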

In [4]:
a_set.add(43)
a_set

{7, 9, 42, 43, 53}

In [5]:
a_set.remove(43)
a_set

{7, 9, 42, 53}
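Note that `remove` and `discard` differ when the element is not in the set: `remove` raises a `KeyError`, while `discard` silently does nothing. A small sketch, using a separate set called `numbers` so that `a_set` stays untouched:

```python
numbers = {7, 9, 42, 53}

numbers.discard(43)  # 43 is not in the set: discard() does nothing

try:
    numbers.remove(43)  # 43 is not in the set: remove() raises a KeyError
except KeyError as err:
    missing = err.args[0]  # the element that could not be found
    print("remove() raised a KeyError for", missing)
```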

In [6]:
a_set

{7, 9, 42, 53}

In [7]:
a_set.pop() # remove an arbitrary element from the set

7

In [8]:
a_list

[42, 9, 53, 7, 9]

In [9]:
# you can convert a list to a set:
conv = set(a_list)
conv, type(conv)

({7, 9, 42, 53}, set)

In [10]:
a_set

{9, 42, 53}

In [11]:
# and vice versa:
# you can convert a set to a list:
conv = list(a_set)
conv, type(conv)

([9, 42, 53], list)
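A common use of these conversions is removing duplicates from a list. Since a set is unordered, the round trip through `set` does not promise to keep the original order. One possible way to keep it, sketched here, is `dict.fromkeys`, which relies on dictionaries (introduced in the next section) remembering insertion order. The variable names are chosen for this example only.

```python
items = [42, 9, 53, 7, 9]

# Round trip through a set: duplicates are gone, but the order is not guaranteed
unique_any_order = list(set(items))

# dict.fromkeys keeps the first occurrence of each item in its original position
unique_in_order = list(dict.fromkeys(items))

print(sorted(unique_any_order))
print(unique_in_order)
```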

## Map (aka Dictionaries): key-value pairs

A dictionary (aka map) is a collection of unique keys that are mapped to values; the values may contain duplicates. Since Python 3.7, a dictionary preserves the order in which its keys were inserted.

Initialize a dictionary using curly brackets `{}`:

In [12]:
d = {} # empty dictionary
d[1] = "foo" # add a key-value mapping using square brackets: d[key] = value
d[7] = "bar"
d[3] = "blubb"
d

{1: 'foo', 7: 'bar', 3: 'blubb'}

In [13]:
d['KI'] = 'AI'
d

{1: 'foo', 7: 'bar', 3: 'blubb', 'KI': 'AI'}

In [14]:
d['KI'] 

'AI'

In [15]:
# notice that {} creates an empty dictionary and NOT an empty set (this is for historic reasons);
# to create an empty set, use set():
type(set())

set
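The difference between `{}` and `set()` can be checked directly. A short sketch with illustrative variable names:

```python
empty_braces = {}    # empty curly brackets create a dictionary
empty_set = set()    # an empty set has to be created with set()
non_empty = {42}     # curly brackets with plain items (no colons) create a set

print(type(empty_braces), type(empty_set), type(non_empty))
```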

List all keys present in the dictionary:

In [16]:
list(d.keys())

[1, 7, 3, 'KI']

In [17]:
#wordcount_map = {} # create a new, empty dict
wordcount_map = {"anchor":2, "dock":3} # create a new dict and add key-values
wordcount_map

{'anchor': 2, 'dock': 3}

In [18]:
# add keys and values:
wordcount_map["the"] = 10
wordcount_map["a"] = 8
wordcount_map["boat"] = 1
wordcount_map

{'anchor': 2, 'dock': 3, 'the': 10, 'a': 8, 'boat': 1}

In [19]:
print(wordcount_map["the"]) # value of a key
print(list(wordcount_map.keys())) # List of keys
print(list(wordcount_map.values())) # List of values

10
['anchor', 'dock', 'the', 'a', 'boat']
[2, 3, 10, 8, 1]


In [20]:
print("a" in wordcount_map) # True

True
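The membership test with `in` is a typical building block for filling a word-count dictionary. The following sketch counts the words of a made-up sentence; `text` and `counts` are names introduced for this example only.

```python
text = "the boat left the dock"  # hypothetical input sentence

counts = {}
for word in text.split():        # split the sentence at whitespace
    if word in counts:           # key already present: increment its count
        counts[word] += 1
    else:                        # first occurrence: create the key
        counts[word] = 1

print(counts)
```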


In [21]:
mySet = {3,7,2,5}


In [22]:
42 in mySet

False

In [23]:
print(list(wordcount_map.items())) #prints tuples of key-value pairs

[('anchor', 2), ('dock', 3), ('the', 10), ('a', 8), ('boat', 1)]
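The key-value tuples returned by `items()` are convenient for loops and for sorting. A minimal sketch that rebuilds the dictionary from above so that it runs on its own:

```python
wordcount_map = {"anchor": 2, "dock": 3, "the": 10, "a": 8, "boat": 1}

# Unpack each (key, value) tuple directly in the loop header
for word, count in wordcount_map.items():
    print(word, count)

# Sort the pairs by their count, highest first
by_count = sorted(wordcount_map.items(), key=lambda pair: pair[1], reverse=True)

# The key with the largest value
most_common = max(wordcount_map, key=wordcount_map.get)

print(by_count)
print(most_common)
```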


In [24]:
print(wordcount_map["dock"]) # prints 3; a key that is not in the dictionary would raise a KeyError

3
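Looking up a key that is not in the dictionary with square brackets raises a `KeyError`. The sketch below shows this for the key `"harbor"`, which is chosen for illustration and is not part of the dictionary, along with `get`, which returns a default value instead.

```python
wordcount_map = {"anchor": 2, "dock": 3}

try:
    wordcount_map["harbor"]      # key is not present: raises a KeyError
except KeyError:
    print("no entry for 'harbor'")

# get() returns the given default (here 0) when the key is missing
count = wordcount_map.get("harbor", 0)
print(count)
```

Note that `get` does not add the missing key to the dictionary.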


In [25]:
dir(mySet)

['__and__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__iand__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__isub__',
 '__iter__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__rand__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__ror__',
 '__rsub__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__xor__',
 'add',
 'clear',
 'copy',
 'difference',
 'difference_update',
 'discard',
 'intersection',
 'intersection_update',
 'isdisjoint',
 'issubset',
 'issuperset',
 'pop',
 'remove',
 'symmetric_difference',
 'symmetric_difference_update',
 'union',
 'update']