## **Code playground for SDA sem 11**

# Set

In [1]:
# We do not need to import anything

s = set()

s.add(1) # O(1) on average
s.add(2) # O(1) on average
s.add(3) # O(1) on average

print(s)

s.remove(2) # O(1) on average; raises KeyError if the element is missing

print(s)

{1, 2, 3}
{1, 3}


Note: A set is unordered, so its elements are not kept sorted, and the printed order should not be relied on. The typical use of the structure is to check quickly whether an element is present.
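A minimal sketch of that usage pattern. The name `seen` and the values are made up for illustration; `sorted()` is used when an ordered view is needed, since the set itself gives no ordering guarantee.

```python
# Hypothetical example values, not taken from the cells above
seen = {30, 10, 20}

print(10 in seen)    # membership test, O(1) on average
print(99 in seen)
print(sorted(seen))  # sorted() returns a new sorted list; the set is unchanged
```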

In [2]:
import random
lst = [random.randint(1, 1_000_000_000) for _ in range(10_000_000)]
s = set(lst) # O(N = 10_000_000)
len(s) # O(1)

9950263

We built the set from 10M random numbers; after duplicates are dropped, almost 10M unique values remain. Now let's see how long it takes to check whether a number is in it.

In [3]:
%timeit 123456 in s # O(1)

18.6 ns ± 0.228 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)


Nice, looks like it is fast, but how much faster than searching in a list is it?

In [4]:
%timeit 123456 in lst # O(N)

57.5 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


1 ms = 1,000,000 ns, so 57.5 ms = 57,500,000 ns. Compared with 18.6 ns, the set lookup is roughly 3,000,000 times faster!
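The comparison above relies on the `%timeit` notebook magic. A rough sketch of the same measurement with the standard `timeit` module, so it also runs as a plain script, is shown here. The size `n`, the names `values` and `values_set`, and the sentinel `missing` are assumptions chosen to keep the run short; absolute timings will differ from the ones shown above.

```python
import random
import timeit

n = 100_000  # assumed size, much smaller than the 10M above so it runs quickly
values = [random.randint(1, 1_000_000_000) for _ in range(n)]
values_set = set(values)
missing = -1  # never generated, so the list has to be scanned to the end

repeats = 20
list_time = timeit.timeit(lambda: missing in values, number=repeats)
set_time = timeit.timeit(lambda: missing in values_set, number=repeats)

print(f"list: {list_time / repeats * 1e6:.1f} us per lookup")
print(f"set:  {set_time / repeats * 1e6:.3f} us per lookup")
```

Searching for a value that is absent is the worst case for the list, because every element has to be compared before the answer is known.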

In [5]:
lst = [random.randint(1, 1_000_000) for _ in range(1_000_000)]
print(len(lst))
lst = list(set(lst)) # O(N) - classic way of deduplicating a list (the original order is not preserved)
print(len(lst))

1000000
632558
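Going through a set does not keep the original order of the elements. If order matters, one common alternative is `dict.fromkeys`, which relies on dictionaries preserving insertion order (guaranteed since Python 3.7). A small sketch with made-up values:

```python
# Hypothetical input, chosen so the effect on order is easy to see
items = [3, 1, 3, 2, 1]

# Keys of a dict are unique and keep first-insertion order
deduped = list(dict.fromkeys(items))  # O(N) on average

print(deduped)  # [3, 1, 2]
```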


# Dict

In [6]:
d = {
    20: "Preso",
    30: "Tedo",
}

print(list(d.keys())) # Getting all the keys
print(list(d.values())) # Getting all the values
print(list(d.items())) # Getting all the keys and values as a list of tuples

[20, 30]
['Preso', 'Tedo']
[(20, 'Preso'), (30, 'Tedo')]


In [7]:
print(d[30]) # O(1)
print(10 in d) # O(1) -> Checks the keys of the dictionary (since they are hashed)
print(d.get(10)) # O(1) -> Safe way to get an element instead of []; it returns None if the key is not present
print(d.get(10, "This is a default value that gets returned if the key is not found")) # O(1) -> We can also set a default value

Tedo
False
None
This is a default value that gets returned if the key is not found
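A typical use of `get` with a default value is counting occurrences. The sketch below uses a made-up list `words`; it is one possible way to do this, and `collections.Counter` from the standard library covers the same task.

```python
# Hypothetical input data
words = ["a", "b", "a", "c", "a"]

counts = {}
for w in words:
    # get() returns 0 for a key we have not seen yet, so no KeyError is raised
    counts[w] = counts.get(w, 0) + 1

print(counts)  # {'a': 3, 'b': 1, 'c': 1}
```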


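What objects can be put inside a dict (as keys) and inside a set? Only hashable objects can go into these hash-based structures. In practice that means immutable built-in types such as numbers, strings, and tuples of hashable elements; mutable containers such as lists, dicts, and sets are not hashable.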

In [8]:
s = set()
s.add([1,2,3]) # Lists are mutable

TypeError: unhashable type: 'list'

In [8]:
s = set()
s.add((1,2,3)) # Tuples are immutable

In [9]:
s = set()

lst = [1,2,3]
lst_s = ','.join(map(str, lst)) # Convert the list to string

s.add(lst_s) # Hash it

print(s)

{'1,2,3'}
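Converting to a string works, but it can merge values that differ only in type: `[1]` and `['1']` both become `'1'`. A sketch of two alternatives that keep the elements as they are, using `tuple` for ordered data and `frozenset` for unordered data. The variable names here are illustrative.

```python
s = set()

lst = [1, 2, 3]
s.add(tuple(lst))             # a tuple of hashable elements is hashable

s.add(frozenset({"x", "y"}))  # frozenset is the immutable, hashable version of set

print((1, 2, 3) in s)              # True
print(frozenset({"y", "x"}) in s)  # True, element order does not matter
```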


Let's check how our tuple from above is hashed by its *\_\_hash\_\_* dunder method

In [10]:
obj = (1,2,3)
num = 529344067295497451

print(obj.__hash__())
print(num.__hash__()) # -> Two different objects with the same hash, but the built-in Python dict and set can deal with collisions

529344067295497451
529344067295497451
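The exact hash of a tuple can vary between Python versions and platforms, so here is a sketch that forces a collision in a reproducible way. `SameHash` is a hypothetical class written only for this illustration: every instance returns the same hash, and the dict still tells the keys apart because it falls back to `__eq__` when hashes match.

```python
class SameHash:
    """Hypothetical class whose instances all share one hash value."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # deliberately constant, so every instance collides

    def __eq__(self, other):
        return isinstance(other, SameHash) and self.name == other.name


a, b = SameHash("a"), SameHash("b")
d = {a: "first", b: "second"}

print(hash(a) == hash(b))  # True, the hashes collide
print(len(d))              # 2, both keys are stored
print(d[SameHash("b")])    # second
```

A constant hash like this keeps the structure correct but makes lookups degrade toward O(N), which is why good hash functions spread values out.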


In [15]:
lst = [1,2,3]

print(lst.__hash__) # Lists set __hash__ to None, which marks them as unhashable

None


In [12]:
obj = 3
obj.__hash__()

3

In [13]:
num = int(1e20)
print(num)
print(num.__hash__())

100000000000000000000
848750603811160107
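The two results above (a small integer hashing to itself, a large one hashing to something else) are consistent with how CPython hashes integers: a non-negative `int` is reduced modulo a fixed prime exposed as `sys.hash_info.modulus`. The sketch below checks this; it assumes CPython, and the value of the modulus depends on the platform (it is `2**61 - 1` on typical 64-bit builds).

```python
import sys

num = 10**20  # same value as int(1e20) above
small = 3

print(sys.hash_info.modulus)

# For non-negative ints on CPython, hash(x) equals x modulo the modulus
print(hash(num) == num % sys.hash_info.modulus)  # True
print(hash(small) == small)                      # True, 3 is below the modulus
```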


In [14]:
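# Equal strings always have equal hashes within one run of the interpreter.
# The value itself changes between runs, because string hashing is randomized per process.
s1 = 'http://stackoverflow.com'
print(s1.__hash__())
s2 = 'http://stackoverflow.com'
print(s2.__hash__())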

1723742387674067882
1723742387674067882
