### Introduction to Python Sets

One of the most common operations to perform on a `set` is to test for membership. <br>
Imagine we had a set of `farm_animals`: we could test whether `'cow'` is a member of that set.

**Elements of a set are unique**, so if we take the `union` of two different sets, any member that appears in both sets shows up only once in the result. <br>
One benefit of this is that converting a list to a set removes all duplicate values.

We can use set `intersection` to find the members that sets have in common. <br>
We can use set `difference` to subtract one set from another, that is, to remove from one set any values that also appear in another. <br> 
We can also find the members that exist in either set, but NOT both, by using `symmetric difference`.
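
As a quick preview, the operations described above can also be written with operators. The sketch below uses two small made-up sets, `a` and `b` (illustrative values only, not part of the later examples); each operator should give the same result as the corresponding method (`union`, `intersection`, `difference`, `symmetric_difference`).

```python
# two small illustrative sets (hypothetical values, chosen only for this sketch)
a = {1, 2, 3, 4}
b = {3, 4, 5}

is_member = 3 in a      # membership test
union_ab = a | b        # same as a.union(b)
common_ab = a & b       # same as a.intersection(b)
only_in_a = a - b       # same as a.difference(b)
in_one_only = a ^ b     # same as a.symmetric_difference(b)

# sorted() is used only to make the printed order predictable
print(is_member)
print(sorted(union_ab))
print(sorted(common_ab))
print(sorted(only_in_a))
print(sorted(in_one_only))
```

The method forms are used in the rest of these notes; the operators are shorthand for the same operations when both operands are sets.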

In [1]:
# create set 1 - farm animals 
farm_animals = {'cow', 'sheep', 'hen', 'goat', 'horse'} 
print(farm_animals) 

{'goat', 'cow', 'sheep', 'hen', 'horse'}


Notice how the printout above shows that the elements of a `set` have no explicit order; they do not appear in the order they were typed in.

The key point is that order plays no part in set equality: any two sets with the same elements are equal (==), regardless of the order in which the elements were written. That is different from lists and tuples. If two lists have the same elements, but in a different order, Python does not consider them equal.

See the example below.

In [2]:
# same elements, typed in a different order
set_1 = {'dan', 'dave', 'mark'}
set_2 = {'dave', 'mark', 'dan'} 
if set_1 == set_2:
    print("set_1 is equal to set_2") 
else:
    print("set_1 and set_2 are NOT equal") 

set_1 is equal to set_2


In [3]:
list_1 = ['dan', 'dave', 'mark']
list_2 = ['dave', 'mark', 'dan'] 
if list_1 == list_2:
    print("list_1 is equal to list_2") 
else:
    print("list_1 and list_2 are NOT equal") 

list_1 and list_2 are NOT equal


Note, we can also create a set by passing an iterable, such as a string, to the `set()` constructor. <br> 
For example, `set("12345")`, seen below:

In [6]:
x = set("12345") 
print(x) 

{'1', '4', '3', '2', '5'}


In [7]:
# creating a set of even numbers between 0 & 19 
y = set(range(0, 20, 2)) 
print(y) 

{0, 2, 4, 6, 8, 10, 12, 14, 16, 18}


When we want to test `set` membership, we can do so using the `in` operator, just as with lists.

In [8]:
if 10 in y:
    print('10 is in our set') 
else:
    print("10 is NOT in our set") 

10 is in our set


So, why is it faster to search `in` a set than in a list?

Much like dictionaries (key-value pairs), a set uses hash codes in the background. <br> 
Searching a list uses linear search: Python has to check each element of the list one by one until it finds the element it's searching for, or has searched the whole list without finding it. <br> 
A hash code, on the other hand, lets you go directly to an item's slot in the hash table. There is a small overhead while the hash code is calculated, but once that is done, access is very fast. <br>
On average, checking whether a value is in a set of 1 billion items takes about as long as checking a set of 5 items: the size of the set has almost no effect on the time taken to search it.
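
To make the difference concrete, here is a minimal sketch that counts equality comparisons instead of measuring time. `Counted` and `count_comparisons` are hypothetical names invented for this illustration, and the exact counts assume CPython's list and set implementations.

```python
class Counted:
    """Hypothetical wrapper that counts how many times == is evaluated."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        # hash on the wrapped value, so equal objects share a hash code
        return hash(self.value)

    def __eq__(self, other):
        Counted.comparisons += 1
        return isinstance(other, Counted) and self.value == other.value


def count_comparisons(container, target):
    """Return (found, number of == calls) for `target in container`."""
    Counted.comparisons = 0
    found = target in container
    return found, Counted.comparisons


items = [Counted(n) for n in range(1000)]
as_list = list(items)
as_set = set(items)

# equal to the last item, but a different object, so == really has to run
target = Counted(999)

list_result = count_comparisons(as_list, target)
set_result = count_comparisons(as_set, target)

print("list:", list_result)
print("set: ", set_result)
```

The list search has to compare against every element before it reaches the last one, while the set lookup jumps to the right slot by hash code and needs far fewer comparisons (typically just one).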

### `Adding` items to a set 

In [12]:
# create an empty set 
numbers = set() 
print(type(numbers)) # should show class as 'set' 

# add to the set 
numbers.add(1) 
numbers.add(10) 
print(numbers) 

<class 'set'>
{1, 10}


### Using `sets` to remove duplicate values 

In [2]:
# imagine a fake data set capturing the colours of cars that pass by in a 5 minute window 
data = ["blue", "red", "blue", "green", "red", "blue", "white"] 

# create a set from the list, which will remove duplicates by default 
unique_data = set(data) 
print(unique_data)   # notice the colours are NOT duplicated anymore (remember, sets have no order, so sort them if you need a predictable order) 

print(sorted(unique_data)) 

{'blue', 'white', 'red', 'green'}
['blue', 'green', 'red', 'white']


In [5]:
# we could actually use a dictionary as well to get a unique list, as a dict holds one distinct key at a time, 
# replacing any previous entry, but holding the original insert sequence

# so, in this example, it should be >>> blue, red, green, white   as the list of unique colours, in the order they first enter the dict 
unique_data_2 = list(dict.fromkeys(data)) 
unique_data_2

['blue', 'red', 'green', 'white']

### `Removing` items from a set 

In [9]:
small_ints = set(range(0,21)) 
print(small_ints) # even though it may print out in order ... remember, a set is NOT actually ordered! use sorted() if you need an ordered result

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}


In [10]:
# to clear a set's contents ... you'll see the empty "set()"
small_ints.clear() 
print(small_ints) 

set()


In [11]:
# now try discard & remove options 
small_ints = set(range(0,21)) 
small_ints.discard(10) 
print(small_ints) 


small_ints.remove(11)
print(small_ints) 

# as you can see from the output below, both can remove items from the set 

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20}


In [None]:
# note, however, `discard` and `remove` behave differently 
# discard() will not error if the number doesn't exist; nothing happens
# but remove() raises a KeyError if the number doesn't exist, like so 
small_ints.discard(99)
small_ints.remove(99) 
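
Since the cell above stops with an error, here is a small sketch of the same comparison that catches the exception instead, so it runs to the end. The `try`/`except` wrapper and the `remove_raised` flag are additions for illustration only.

```python
small_ints = set(range(0, 21))

# discard() on a missing value: nothing happens, no error
small_ints.discard(99)

# remove() on a missing value: raises KeyError, caught here so the cell completes
remove_raised = False
try:
    small_ints.remove(99)
except KeyError as error:
    remove_raised = True
    print("remove() raised KeyError for", error)

# the set itself is unchanged by either call
print(len(small_ints))
```

A reasonable rule of thumb: use `discard` when a missing value is fine, and `remove` when a missing value would indicate a bug you want to hear about.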

### The `pop` method 

- we have seen `pop` used with dictionaries 
- it can also be used with a list 
- popping from a set is slightly different: sets aren't indexable, so the set `pop` method doesn't take any arguments. It removes an arbitrary item from the set and returns that item
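
The contrast between the three `pop` methods can be sketched as follows. The variable names and values here are made up for the illustration.

```python
# hypothetical sample data for this sketch
names_list = ["Denise", "Eddie", "Frank"]
ages_dict = {"Denise": 41, "Eddie": 35}
names_set = {"Denise", "Eddie", "Frank"}

last_name = names_list.pop()      # list: no argument removes the LAST item
first_name = names_list.pop(0)    # list: an index removes the item at that position
age = ages_dict.pop("Eddie")      # dict: takes a key and returns its value
some_name = names_set.pop()       # set: no argument allowed, returns an arbitrary item

print(last_name, first_name, age, some_name)

# popping from an empty set raises KeyError
empty = set()
try:
    empty.pop()
except KeyError:
    print("cannot pop from an empty set")
```

Which name the set hands back is not something to rely on; it can differ between runs.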

A use case for the `pop` method is processing a set of tasks when you don't care which order they are performed in. <br>
You can simply pop them from the set one at a time until it is empty.

In [2]:
trial_patients = {"Denise", "Eddie", "Frank", "Georgia", "Kenny"} 


while trial_patients:      # loop while trial_patients still has something in it (an empty set is falsy)
    patient = trial_patients.pop()   # removes and returns an arbitrary patient (not truly random), looping until none are left 
    print(patient) 
    print(trial_patients) 
    print("-----------------------------------------------")   # as you can see from the output, a name is "popped" from the set each time

Frank
{'Denise', 'Georgia', 'Kenny', 'Eddie'}
-----------------------------------------------
Denise
{'Georgia', 'Kenny', 'Eddie'}
-----------------------------------------------
Georgia
{'Kenny', 'Eddie'}
-----------------------------------------------
Kenny
{'Eddie'}
-----------------------------------------------
Eddie
set()
-----------------------------------------------


### `Union` of sets

Let's practice taking the union of two sets. We will do this with `farm_animals` and `wild_animals`.

In [3]:
farm_animals = {'cow', 'sheep', 'hen', 'goat', 'horse'} 
wild_animals = {'lion', 'elephant', 'tiger', 'goat', 'panther', 'horse'} 


all_animals = farm_animals.union(wild_animals) 
print(all_animals)   # notice, no duplicates as expected, just one unique value, when there was duplication across sets 

{'lion', 'cow', 'tiger', 'goat', 'hen', 'sheep', 'panther', 'elephant', 'horse'}


### `Update` of sets 

`update` also unions sets together, but it modifies the first set in place rather than creating a new set, <br> 
like so:

In [4]:
farm_animals = {'cow', 'sheep', 'hen', 'goat', 'horse'} 
wild_animals = {'lion', 'elephant', 'tiger', 'goat', 'panther', 'horse'} 

farm_animals.update(wild_animals) 
print(farm_animals) # you can now see, this set has been updated to include the wild animals (and any dupes removed)

{'lion', 'cow', 'tiger', 'goat', 'hen', 'sheep', 'panther', 'elephant', 'horse'}
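
To see the difference between `union` and `update` side by side, here is a short sketch using the same two sets. The helper variables (`size_before`, `size_after_union`, `update_result`) are names chosen for this illustration.

```python
farm_animals = {'cow', 'sheep', 'hen', 'goat', 'horse'}
wild_animals = {'lion', 'elephant', 'tiger', 'goat', 'panther', 'horse'}

size_before = len(farm_animals)

# union() builds and returns a NEW set; farm_animals is left untouched
all_animals = farm_animals.union(wild_animals)
size_after_union = len(farm_animals)

# update() changes farm_animals in place and returns None
update_result = farm_animals.update(wild_animals)

print(size_before, size_after_union, len(farm_animals))
print(update_result)
print(farm_animals == all_animals)
```

Because `update` returns `None`, writing `farm_animals = farm_animals.update(wild_animals)` would throw the set away; call it on its own line instead.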


### set `intersection` 

In [7]:
evens = set(range(0, 50, 2)) 
odds = set(range(1, 50, 2)) 

print(evens)
print(odds) 

# prime numbers between 0 & 50 
primes = {2,3,5,7,11,13,17,19,23,29,31,37,41,43,47} 
print(primes) 

# perfect squares between 0 & 50 
squares = {1, 4, 9, 16, 25, 36, 49} 
print(squares) 

print("----------------------------------------------")

# find the odd perfect squares using intersection between the two sets 
odd_squares = squares.intersection(odds) 
print(odd_squares) 

{0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48}
{1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49}
{2, 3, 5, 37, 7, 41, 11, 43, 13, 47, 17, 19, 23, 29, 31}
{1, 4, 36, 9, 16, 49, 25}
----------------------------------------------
{1, 25, 9, 49}


### set `difference` 

In [8]:
evens = set(range(0, 50, 2)) 
odds = set(range(1, 50, 2)) 

print(evens)
print(odds) 

# prime numbers between 0 & 50 
primes = {2,3,5,7,11,13,17,19,23,29,31,37,41,43,47} 
print(primes) 

# perfect squares between 0 & 50 
squares = {1, 4, 9, 16, 25, 36, 49} 
print(squares) 

print("----------------------------------------------")

# find the perfect squares which are NOT odd, using difference 
even_only_squares = squares.difference(odds) 
print(even_only_squares) 

{0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48}
{1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49}
{2, 3, 5, 37, 7, 41, 11, 43, 13, 47, 17, 19, 23, 29, 31}
{1, 4, 36, 9, 16, 49, 25}
----------------------------------------------
{16, 4, 36}
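
Unlike union and intersection, the result of `difference` depends on which set comes first. A small sketch using the same `squares` and `odds` sets (the result variable names are chosen for this illustration):

```python
odds = set(range(1, 50, 2))
squares = {1, 4, 9, 16, 25, 36, 49}

# squares with the odd numbers taken away
squares_not_odd = squares.difference(odds)

# odd numbers with the perfect squares taken away: a different question entirely
odds_not_square = odds.difference(squares)

print(sorted(squares_not_odd))
print(sorted(odds_not_square))
```

The first result holds the even perfect squares; the second holds every odd number below 50 that is not a perfect square.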


### set `symmetric difference` 

- this is the complement of the intersection within the union: everything in the union that is not in the intersection
- it produces the items that are in one set or the other, but not in both 

In [10]:
odds = set(range(1, 50, 2)) 
print(odds) 

# prime numbers between 0 & 50 
primes = {2,3,5,7,11,13,17,19,23,29,31,37,41,43,47} 
print(primes) 

print("----------------------------------------")

# the result holds the odd non-primes AND the primes that are not odd (just 2)
odd_or_prime_not_both = odds.symmetric_difference(primes)
print(odd_or_prime_not_both) 

{1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49}
{2, 3, 5, 37, 7, 41, 11, 43, 13, 47, 17, 19, 23, 29, 31}
----------------------------------------
{1, 2, 9, 15, 21, 25, 27, 33, 35, 39, 45, 49}
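
One way to read the symmetric difference is as the union minus the intersection, or equivalently as the two one-way differences joined together. The sketch below checks this with the same `odds` and `primes` sets; the intermediate variable names are chosen for this illustration.

```python
odds = set(range(1, 50, 2))
primes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47}

sym_diff = odds.symmetric_difference(primes)

# everything in either set, minus what the two sets share
union_minus_common = odds.union(primes).difference(odds.intersection(primes))

# the two one-way differences, joined together
odd_non_primes = odds.difference(primes)   # odd numbers that are not prime
non_odd_primes = primes.difference(odds)   # primes that are not odd
joined = odd_non_primes.union(non_odd_primes)

print(sym_diff == union_minus_common)
print(sym_diff == joined)
print(sorted(non_odd_primes))
```

This also explains why 2 appears in the output above: it is prime but not odd, so it sits in exactly one of the two sets. If only the odd non-primes are wanted, `odds.difference(primes)` is the operation to use.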

