# Sets

**A set is an unordered collection of data, so there is no way to retrieve a specific item by its index position. Its defining characteristic is that a set can only contain unique values: duplicates are removed automatically. Like the keys of a dictionary, the items of a set must be hashable (e.g. numbers, strings and tuples, but not lists). Sets are most commonly used to test for membership. The union of two or more sets combines all of their unique elements.**

**There are several ways to create a set: with a literal in curly braces, by passing a list, tuple or other iterable to the `set()` function, or by looping over an iterable and adding each item to a set.**
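**A minimal sketch of the three approaches described above; the variable names and values are illustrative only:**

```python
# 1. Literal: items inside curly braces (the duplicate "red" is dropped)
colours = {"red", "green", "blue", "red"}

# 2. set() on a list or tuple (or any other iterable)
numbers = set([3, 1, 3, 2, 1])
letters = set(("a", "b", "a"))

# 3. Loop over an iterable and add each item to an empty set
words = ["spam", "eggs", "spam", "ham"]
unique_words = set()
for word in words:
    unique_words.add(word)

print(len(colours))            # 3
print(sorted(numbers))         # [1, 2, 3]
print(sorted(unique_words))    # ['eggs', 'ham', 'spam']
```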

In [1]:
farm_animals = {'cow', 'sheep', 'hen', 'goat', 'horse'}

print(farm_animals)

{'goat', 'sheep', 'hen', 'cow', 'horse'}


In [2]:
# You can iterate over a set

for animal in farm_animals:
    print(animal)

goat
sheep
hen
cow
horse


**Two sets compare equal if they contain the same items, regardless of the order in which the items were inserted. Lists (and tuples) only compare equal if they contain the same items, in the same order. This is an advantage of using sets over lists or tuples when the order doesn't matter.**

In [3]:
farm_animals_2 = {'sheep', 'goat', 'cow', 'horse', 'hen'}

if farm_animals_2 == farm_animals:
    print("The sets are equal")
else:
    print("The sets are not equal")

The sets are equal
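**To make the contrast with lists concrete, here is a short sketch comparing the same three animals stored as lists and as sets (the variable names are illustrative):**

```python
list_a = ["cow", "sheep", "hen"]
list_b = ["hen", "cow", "sheep"]

# Lists compare item by item, so the order matters
print(list_a == list_b)            # False

# Sets only compare membership, so the order is irrelevant
print(set(list_a) == set(list_b))  # True

# Duplicates don't matter either, because sets discard them
print({"cow", "cow", "hen"} == {"hen", "cow"})  # True
```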


## Advantage of using sets

**Sets are useful for checking input against a fixed group of valid options, as in the menu below.**

In [4]:
# Invalid input, e.g. pressing ENTER without typing a number, shows the menu again

activities = [
    (1, "Learn Python"),
    (2, "Clean the house"),
    (3, "Go swimming"),
    (4, "Eat lunch"),
    (5, "Go to sleep"),
    (0, "EXIT")
]

choice = 99

while choice != 0:
    # Use a set for the valid options instead of a list
    if choice in {1, 2, 3, 4, 5}:
        print(f"You chose to {activities[choice - 1][1].casefold()}")
    else:
        print("Choose from options below:")
        for i, activity in activities:
            print(f"{i}: {activity}")

    try:
        choice = int(input())
    except ValueError:
        # int() cannot convert empty or non-numeric input, so show the menu again
        choice = 99

Choose from options below:
1: Learn Python
2: Clean the house
3: Go swimming
4: Eat lunch
5: Go to sleep
0: EXIT
1
You chose to learn python
99
Choose from options below:
1: Learn Python
2: Clean the house
3: Go swimming
4: Eat lunch
5: Go to sleep
0: EXIT
3
You chose to go swimming
0


**You could have used a list for the valid options, but membership tests with `in` are much faster on a set than on a list, especially for large collections. This is because sets store their items in a hash table, like the keys of a dictionary: Python can check whether a value is present by looking up its hash, instead of comparing it against every item in turn as it must with a list.**
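**A rough way to see the difference is to time membership tests with the standard `timeit` module. This is only a sketch: the collection size and number of repetitions are arbitrary choices, and the exact timings depend on your machine.**

```python
import timeit

# A large collection makes the difference visible; with only
# five menu options it would be negligible.
options_list = list(range(100_000))
options_set = set(options_list)
target = 99_999  # worst case for the list: it is the last item

# Both membership tests give the same answer...
print(target in options_list, target in options_set)  # True True

# ...but the list is scanned item by item, while the set
# jumps straight to the value via its hash.
list_time = timeit.timeit(lambda: target in options_list, number=100)
set_time = timeit.timeit(lambda: target in options_set, number=100)
print(f"list: {list_time:.5f}s  set: {set_time:.5f}s")
```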

**Another useful tip: you can use the `set()` function to find the unique characters in a string, i.e. to remove duplicate characters:**

In [5]:
set("supercalifragilisticexpialidocious")

{'a', 'c', 'd', 'e', 'f', 'g', 'i', 'l', 'o', 'p', 'r', 's', 't', 'u', 'x'}

In [6]:
set("83442530985300123")

{'0', '1', '2', '3', '4', '5', '8', '9'}

**Another handy trick is using the unique items of a set as the keys of a new dictionary, with `dict.fromkeys()`:**

In [7]:
farm_animals = {'cow', 'sheep', 'hen', 'goat', 'horse'}

farm_animals_data = dict.fromkeys(farm_animals)

In [8]:
# Note that the values are automatically set to None

print(farm_animals_data)

{'goat': None, 'sheep': None, 'hen': None, 'cow': None, 'horse': None}
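**`dict.fromkeys()` also accepts an optional second argument, used as the value for every key. A short sketch; note the caveat in the comments: the same value object is shared by all keys, so a mutable default such as a list is usually a mistake.**

```python
farm_animals = {'cow', 'sheep', 'hen', 'goat', 'horse'}

# Every key gets the same starting value
head_count = dict.fromkeys(farm_animals, 0)
head_count['cow'] += 12
print(head_count['cow'], head_count['hen'])  # 12 0

# Caveat: a mutable default is ONE object shared by every key
shared = dict.fromkeys(farm_animals, [])
shared['cow'].append('calf')
print(shared['hen'])  # ['calf']

# A dict comprehension gives each key its own list instead
separate = {animal: [] for animal in farm_animals}
separate['cow'].append('calf')
print(separate['hen'])  # []
```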


## Add items to an existing Set

**It is easy to add items to an existing set with the `add()` method. However, you cannot create an empty set with empty braces, like `numbers = {}`: that creates an empty *dictionary*, not a set. Use the `set()` function with no arguments instead. (Also avoid naming a variable `set`, as that hides the built-in `set()` function.)**

In [9]:
numbers = {1, 2, 3, 4}

numbers.add(5)

In [10]:
print(numbers)

{1, 2, 3, 4, 5}
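**A quick check of the empty-braces pitfall, before building a set from scratch: `{}` gives a dictionary, and the mistake usually surfaces as soon as you call `add()` on it.**

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# A dict has no add() method, so this raises AttributeError
try:
    empty_braces.add(1)
except AttributeError as error:
    print("Error:", error)

empty_set.add(1)
print(empty_set)           # {1}
```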


In [11]:
# Add numbers to empty set

numbers = set()

while len(numbers) < 4:
    next_value = int(input("Insert integer: "))
    numbers.add(next_value)

Insert integer: 1
Insert integer: 2
Insert integer: 3
Insert integer: 4


In [12]:
# Note that set items are not guaranteed to be stored in sorted order

print(numbers)

{1, 2, 3, 4}


In [13]:
# The sorted() function returns a new, sorted list

print(sorted(numbers))

[1, 2, 3, 4]


## Delete items from a Set

**You can add, update and delete items in a set.**

**There are three methods for deleting items:**

1. **`clear()` deletes all the items in a set.**
2. **`remove()` deletes an individual item by its value, and raises a `KeyError` if the item does not exist.**
3. **`discard()` deletes an individual item by its value, and does nothing if the item does not exist.**

In [14]:
small_ints = set(range(21))

# Small integers often print in sorted order, but that is an implementation detail: don't rely on it!

print(small_ints)

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}


In [15]:
small_ints.clear()

print(small_ints)

set()


In [16]:
small_ints = set(range(21))

small_ints.discard(10)

# Note that 10 has been deleted

print(small_ints)

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}


In [17]:
small_ints.remove(11)

# Note that 11 has been deleted

print(small_ints)

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20}


In [18]:
# No error if value doesn't exist

small_ints.discard(30)

In [19]:
# Raises a KeyError if the value doesn't exist

small_ints.remove(30)

KeyError: 30

In [20]:
print(small_ints)

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20}


**The next example mixes collection types: a dictionary of travel modes and sets of items. Both can be looped over like any other iterable.**

In [22]:
# Dictionary
travel_mode = {
    "1": "car", 
    "2": "plane"
}

# Set items to pack
items = {
    "can opener", 
    "fuel", 
    "jumper", 
    "knife", 
    "matches", 
    "razor blades", 
    "razor", 
    "scissors", 
    "shampoo", 
    "shaving cream", 
    "shirts (3)", 
    "shorts", 
    "sleeping bag(s)", 
    "soap", 
    "socks (3 pairs)", 
    "stove", 
    "tent", 
    "mug", 
    "toothbrush", 
    "toothpaste", 
    "towel", 
    "underwear (3 pairs)", 
    "water carrier"
}

# Set items that cannot be carried on plane
restricted_items = {
    "catapult",
    "fuel",
    "gun",
    "knife",
    "razor blades",
    "scissors",
    "shampoo"
}

**If travelling by plane (option "2" in the dictionary), remove the restricted items from the packing set; you can take everything if travelling by car. You don't care whether each restricted item is actually in the packing set, only that none remain afterwards, so use the `discard()` method. The `remove()` method would raise a `KeyError` for restricted items that are not in the packing set, such as "catapult" or "gun". Note that you could also use the difference between the two sets.**

In [23]:
print("Please choose your mode of travel:")
for key, value in travel_mode.items():
    print(f"{key}: {value}")

mode = "-"

while mode not in travel_mode:
    mode = input("> ")
    

if mode == "2":
    # Travel by plane, remove restricted items
    for item in restricted_items:
        items.discard(item)

# Print packing list
print("You need to pack:")

for item in sorted(items):
    print(item)

Please choose your mode of travel:
1: car
2: plane
> 2
You need to pack:
can opener
jumper
matches
mug
razor
shaving cream
shirts (3)
shorts
sleeping bag(s)
soap
socks (3 pairs)
stove
tent
toothbrush
toothpaste
towel
underwear (3 pairs)
water carrier
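**The loop of `discard()` calls can also be replaced by a set difference. A minimal sketch, using shortened versions of the `items` and `restricted_items` sets: the `-` operator returns a new set and leaves the original untouched, while `difference_update()` (or `-=`) removes the restricted items in place.**

```python
items = {"can opener", "fuel", "knife", "matches", "tent", "towel"}
restricted_items = {"catapult", "fuel", "gun", "knife", "scissors"}

# New set: the items allowed on the plane; `items` is unchanged
plane_items = items - restricted_items
print(sorted(plane_items))  # ['can opener', 'matches', 'tent', 'towel']

# In place: modifies `items` itself, like the discard() loop.
# Restricted items that are not in `items` (e.g. "gun") are ignored.
items.difference_update(restricted_items)  # same as: items -= restricted_items
print(items == plane_items)  # True
```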


**NOTE: Only use the `remove()` method when you know the item to be removed will always be present; otherwise the program crashes with a `KeyError`.**

**Let's say you store medication details for each patient, e.g. 'corticosteroid', 'warfarin', 'aldesterone' etc. A prescription should never contain duplicate entries of the same medication, otherwise you risk overdosing the patient! So it makes sense to store each patient's prescriptions in a set. Medical practice changes over time, so if 'warfarin' is to be replaced by 'edoxaban', the affected patients' details need updating. If some patients in the list may not be taking 'warfarin', use the `discard()` method, which silently skips them. If *every* patient in the list is supposed to be taking 'warfarin', use the `remove()` method instead, so that the program crashes with a `KeyError` if warfarin is missing from any patient's set.**

In [24]:
# Tuples of drugs for medication

amlodipine = ("amlodipine", "Blood pressure")
buspirone = ("buspirone", "Anxiety disorders")
carbimazole = ("carbimazole", "Antithyroid agent")
citalopram = ("citalopram", "Antidepressant")
edoxaban = ("edoxaban", "anti-coagulant")
erythromycin = ("erythromycin", "Antibiotic")
lusinopril = ("lusinopril", "High blood pressure")
metformin = ("metformin", "Type 2 diabetes")
methotrexate = ("methotrexate", "Rheumatoid arthritis")
paracetamol = ("paracetamol", "Painkiller")
propranol = ("propranol", "Beta blocker")
simvastatin = ("simvastatin", "High cholesterol")
warfarin = ("warfarin", "anti-coagulant")


# Patient prescriptions - dictionary, where values are sets containing tuples above

patients = {
    "Anne": {methotrexate, paracetamol},
    "Bob": {carbimazole, erythromycin, methotrexate, paracetamol},
    "Charley": {buspirone, lusinopril, metformin},
    "Denise": {amlodipine, lusinopril, metformin, warfarin},
    "Eddie": {amlodipine, propranol, simvastatin, warfarin},
    "Frank": {buspirone, citalopram, propranol, warfarin},
    "Georgia": {carbimazole, edoxaban, warfarin},
    "Helmut": {erythromycin, paracetamol, propranol, simvastatin},
    "Izabella": {amlodipine, citalopram, simvastatin, warfarin},
    "John": {simvastatin},
    "Kenny": {amlodipine, citalopram, metformin},
}

In [25]:
# Select patients that take warfarin
trial_patients = ["Denise", "Eddie", "Frank", "Georgia", "Izabella"]

# For each selected patient, replace warfarin with edoxaban
for patient in trial_patients:
    prescription = patients[patient]
    prescription.remove(warfarin)
    prescription.add(edoxaban)
    print(patient, prescription)
    print()

Denise {('edoxaban', 'anti-coagulant'), ('lusinopril', 'High blood pressure'), ('amlodipine', 'Blood pressure'), ('metformin', 'Type 2 diabetes')}

Eddie {('simvastatin', 'High cholesterol'), ('edoxaban', 'anti-coagulant'), ('propranol', 'Beta blocker'), ('amlodipine', 'Blood pressure')}

Frank {('buspirone', 'Anxiety disorders'), ('edoxaban', 'anti-coagulant'), ('citalopram', 'Antidepressant'), ('propranol', 'Beta blocker')}

Georgia {('edoxaban', 'anti-coagulant'), ('carbimazole', 'Antithyroid agent')}

Izabella {('simvastatin', 'High cholesterol'), ('edoxaban', 'anti-coagulant'), ('citalopram', 'Antidepressant'), ('amlodipine', 'Blood pressure')}



In [31]:
for key, value in patients.items():
    print(key, value, sep='\n')

Anne
{('methotrexate', 'Rheumatoid arthritis'), ('paracetamol', 'Painkiller')}
Bob
{('methotrexate', 'Rheumatoid arthritis'), ('paracetamol', 'Painkiller'), ('erythromycin', 'Antibiotic'), ('carbimazole', 'Antithyroid agent')}
Charley
{('metformin', 'Type 2 diabetes'), ('lusinopril', 'High blood pressure'), ('buspirone', 'Anxiety disorders')}
Denise
{('edoxaban', 'anti-coagulant'), ('lusinopril', 'High blood pressure'), ('amlodipine', 'Blood pressure'), ('metformin', 'Type 2 diabetes')}
Eddie
{('simvastatin', 'High cholesterol'), ('edoxaban', 'anti-coagulant'), ('propranol', 'Beta blocker'), ('amlodipine', 'Blood pressure')}
Frank
{('buspirone', 'Anxiety disorders'), ('edoxaban', 'anti-coagulant'), ('citalopram', 'Antidepressant'), ('propranol', 'Beta blocker')}
Georgia
{('edoxaban', 'anti-coagulant'), ('carbimazole', 'Antithyroid agent')}
Helmut
{('propranol', 'Beta blocker'), ('paracetamol', 'Painkiller'), ('erythromycin', 'Antibiotic'), ('simvastatin', 'High cholesterol')}
Izabella


**What if you had added a patient who doesn't take warfarin? A `KeyError` would be raised, because you used the `remove()` method. That is a good thing: you don't want to give edoxaban to everyone, just to those who currently take warfarin, and the error flags the mistake immediately. You could use an `if` statement to check whether warfarin is in the prescription first (a set membership test is fast, so this costs very little), but then a patient wrongly added to the trial list would be skipped silently instead of being caught.**
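**To see this in action, here is a cut-down, self-contained version of the trial loop in which Kenny, who does not take warfarin, has been included by mistake. In the notebook an uncaught `KeyError` would simply stop the program; the `try`/`except` here is an illustrative addition that only prints a warning, not part of the original code.**

```python
warfarin = ("warfarin", "anti-coagulant")
edoxaban = ("edoxaban", "anti-coagulant")
metformin = ("metformin", "Type 2 diabetes")

patients = {
    "Denise": {metformin, warfarin},
    "Kenny": {metformin},  # does NOT take warfarin
}

# Kenny has been added to the trial by mistake
trial_patients = ["Denise", "Kenny"]

for patient in trial_patients:
    prescription = patients[patient]
    try:
        prescription.remove(warfarin)  # raises KeyError for Kenny
    except KeyError:
        print(f"{patient} is not taking warfarin: check the trial list!")
        continue
    prescription.add(edoxaban)

print(edoxaban in patients["Denise"], edoxaban in patients["Kenny"])  # True False
```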

### You REPLACE an item by removing the original item and adding a new one.

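**You can also use the `pop()` method with sets but, unlike `list.pop()`, it takes no argument: passing one raises a `TypeError`. Instead, it removes and returns an arbitrary item, so you cannot predict which item you will get. Calling `pop()` on an empty set raises a `KeyError`.**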

In [32]:
trial_patients = {"Denise", "Eddie", "Frank", "Georgia", "Kenny"}

# Loop as long as the set is not empty
while trial_patients:
    patient = trial_patients.pop()
    print(patient)

Kenny
Georgia
Denise
Eddie
Frank


In [33]:
# Remember that pop() also removes the items from the set

print(trial_patients)

set()
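**A short sketch of the two `pop()` errors mentioned above, using a small illustrative set:**

```python
numbers = {10, 20}

# pop() takes no arguments, unlike list.pop(index)
try:
    numbers.pop(10)
except TypeError as error:
    print("TypeError:", error)

# Each call removes and returns an arbitrary item
first = numbers.pop()
second = numbers.pop()
print(sorted([first, second]))  # [10, 20]

# Popping from an empty set raises KeyError
try:
    numbers.pop()
except KeyError:
    print("Cannot pop from an empty set")
```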


## Union of Sets

**The union of sets is a new set containing all the unique items from each set. Create it with the `union()` method or the `|` operator.**

In [1]:
farm_animals = {'cow', 'sheep', 'hen', 'goat', 'horse'}

wild_animals = {"lion", "elephant", "tiger", "goat", "panther", "horse"}

In [2]:
# Use union() method

all_animals = farm_animals.union(wild_animals)

print(all_animals)

{'hen', 'lion', 'panther', 'elephant', 'sheep', 'tiger', 'horse', 'goat', 'cow'}


In [17]:
# Use | operator, and unpacking * operator when printing output

all_animals_2 = farm_animals | wild_animals

print(*all_animals_2)

hen lion panther elephant sheep tiger horse goat cow


In [4]:
len(all_animals_2)

9

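**Neither approach is better or worse in general; it is largely a matter of coding preference. One practical difference: the `union()` method accepts any iterable (e.g. a list) as its argument, whereas the `|` operator requires sets on both sides. Neither returns the elements in alphabetical order (or in any guaranteed order), so use `sorted()` if you need them sorted. There are many use cases for the union of sets, e.g. collecting all the unique data to understand the scope of something.**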
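**A quick sketch of how the method and the operator differ when the second collection is a list rather than a set (`zoo_list` is an illustrative name):**

```python
farm_animals = {'cow', 'sheep', 'hen', 'goat', 'horse'}
zoo_list = ['lion', 'goat', 'penguin']  # a list, not a set

# union() accepts any iterable, including a list
print(sorted(farm_animals.union(zoo_list)))
# ['cow', 'goat', 'hen', 'horse', 'lion', 'penguin', 'sheep']

# The | operator needs a set on both sides
try:
    farm_animals | zoo_list
except TypeError as error:
    print("TypeError:", error)

# Convert first if you prefer the operator
print((farm_animals | set(zoo_list)) == farm_animals.union(zoo_list))  # True
```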

**Using the drug tuples from the section above, the list below contains sets of drugs that cause adverse reactions when taken together. Create a union of all the sets in the list.**

In [7]:
# Drugs that shouldn't be taken together (a list of two-item sets)

adverse_interactions = [
    {metformin, amlodipine},
    {simvastatin, erythromycin},
    {citalopram, buspirone},
    {warfarin, citalopram},
    {warfarin, edoxaban},
    {warfarin, erythromycin},
    {warfarin, amlodipine}
]

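**You can also use the `update()` method, which adds the items of one or more iterables to an existing set *in place*, instead of creating a new set. Below, it is called in a loop to build up an initially empty set. It is effectively the in-place version of `union()`: arguably it could have been called 'union_update', to match the `intersection_update()`, `difference_update()` and `symmetric_difference_update()` methods.**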

In [8]:
meds_to_watch = set()

for interaction in adverse_interactions:
    #meds_to_watch = meds_to_watch | interaction ---> USING MATH OPERATOR
    #meds_to_watch = meds_to_watch.union(interaction) ------> USING UNION METHOD
    meds_to_watch.update(interaction)
    
# Print unique elements in list format
print(sorted(meds_to_watch))

[('amlodipine', 'Blood pressure'), ('buspirone', 'Anxiety disorders'), ('citalopram', 'Antidepressant'), ('edoxaban', 'anti-coagulant'), ('erythromycin', 'Antibiotic'), ('metformin', 'Type 2 diabetes'), ('simvastatin', 'High cholesterol'), ('warfarin', 'anti-coagulant')]


**The code below does the same thing as the `for` loop above, passing every set in the list to a single `update()` call with the 'unpacking' operator `*`:**

In [15]:
meds_to_watch = set()
meds_to_watch.update(*adverse_interactions)

# Print unique elements 'unpacked'
print(*sorted(meds_to_watch))

('amlodipine', 'Blood pressure') ('buspirone', 'Anxiety disorders') ('citalopram', 'Antidepressant') ('edoxaban', 'anti-coagulant') ('erythromycin', 'Antibiotic') ('metformin', 'Type 2 diabetes') ('simvastatin', 'High cholesterol') ('warfarin', 'anti-coagulant')


In [12]:
print(*sorted(meds_to_watch), sep='\n')

('amlodipine', 'Blood pressure')
('buspirone', 'Anxiety disorders')
('citalopram', 'Antidepressant')
('edoxaban', 'anti-coagulant')
('erythromycin', 'Antibiotic')
('metformin', 'Type 2 diabetes')
('simvastatin', 'High cholesterol')
('warfarin', 'anti-coagulant')
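**Besides `update()`, sets have in-place versions of the other operations too, each with a matching augmented-assignment operator. A small sketch with made-up sets; it also shows `set().union(*...)`, which builds a new set from a list of sets without modifying anything:**

```python
groups = [{1, 2}, {2, 3}, {3, 4}]

# New set from all the sets in the list (nothing is modified)
everything = set().union(*groups)
print(sorted(everything))  # [1, 2, 3, 4]

s = {1, 2, 3}
s |= {4}           # same as s.update({4})
print(sorted(s))   # [1, 2, 3, 4]
s &= {2, 3, 4, 5}  # same as s.intersection_update({2, 3, 4, 5})
print(sorted(s))   # [2, 3, 4]
s -= {4}           # same as s.difference_update({4})
print(sorted(s))   # [2, 3]
s ^= {3, 9}        # same as s.symmetric_difference_update({3, 9})
print(sorted(s))   # [2, 9]
```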


**EXERCISES - Experiment with more than two sets**

In [18]:
scorpions = {"emperor", "red claw", "arizona", "forest", "fat tail"}
snakes = {"python", "cobra", "viper", "anaconda", "mamba"}
spiders = {"tarantula", "black widow", "wolf spider", "crab spider"}
vespas = {"yellowjacket", "hornet", "paper wasp"}

In [20]:
creatures = scorpions.union(snakes, spiders, vespas)

print(sorted(creatures))

['anaconda', 'arizona', 'black widow', 'cobra', 'crab spider', 'emperor', 'fat tail', 'forest', 'hornet', 'mamba', 'paper wasp', 'python', 'red claw', 'tarantula', 'viper', 'wolf spider', 'yellowjacket']


In [22]:
biting_animals = snakes.union(spiders)
sting_animals = scorpions.union(vespas)

print("CREATURES THAT BITE! \n")
print(biting_animals)
print("\n")
print("CREATURES THAT STING! \n")
print(sting_animals)

CREATURES THAT BITE! 

{'wolf spider', 'anaconda', 'mamba', 'python', 'black widow', 'crab spider', 'cobra', 'viper', 'tarantula'}


CREATURES THAT STING! 

{'yellowjacket', 'hornet', 'emperor', 'arizona', 'fat tail', 'paper wasp', 'forest', 'red claw'}


In [25]:
arachnids = scorpions.union(spiders)

print("ARACHNIDS:")
print(*sorted(arachnids), sep='\n')

ARACHNIDS:
arizona
black widow
crab spider
emperor
fat tail
forest
red claw
tarantula
wolf spider


## Intersection of Sets

**The intersection of sets is the collection of elements that occur in every one of the sets (with two sets, the elements they have in common). Use the `intersection()` method or the `&` operator.**

In [26]:
farm_and_wild = farm_animals.intersection(wild_animals)

print("Both farm and wild animals:", farm_and_wild)

Both farm and wild animals: {'horse', 'goat'}


In [16]:
# Using & operator

farm_and_wild = farm_animals & wild_animals

print(farm_and_wild)

{'horse', 'goat'}


**You can create a set from any iterable, for example with a loop or by passing a `range()` object to `set()`, e.g. to get all the even numbers from 0 up to (but not including) 50.**

In [28]:
evens = set(range(0, 50, 2))
print(evens)

odds = set(range(1, 50, 2))
print(odds)

{0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48}
{1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49}
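**The same sets can also be built by looping over an iterable inside a *set comprehension*, filtering as you go. A sketch, assuming you want the same even and odd numbers below 50 as above:**

```python
evens = {n for n in range(50) if n % 2 == 0}
odds = {n for n in range(50) if n % 2 == 1}

print(evens == set(range(0, 50, 2)))  # True
print(odds == set(range(1, 50, 2)))   # True

# Together they cover every integer from 0 to 49, with no overlap
print((evens | odds) == set(range(50)), evens & odds)  # True set()
```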


**Using the sets of even and odd numbers, extract the odd and even PRIME and SQUARE numbers into new sets. The generator functions below produce the perfect squares less than a given positive integer `n`, and the prime numbers up to and including `n`; passing a generator to `set()` collects its values into a set.**

In [29]:
from typing import Generator


def squares_generator(n: int) -> Generator[int, None, None]:
    """Generator to return perfect squares less than `n`."""
    if n > 0:
        i = next_square = 1
        while next_square < n:
            yield next_square
            i += 1
            next_square = i * i




def primes_generator(n: int) -> Generator[int, None, None]:
    """
    Very naive implementation of the Sieve of Eratosthenes to generate prime numbers.

    This is *not* suitable for production use.
    For an optimised algorithm, check out the work by Tim Peters et al @ActiveState, and Will Ness.

    :param n: The number up to which generate primes.
    :return: A generator of all positive prime numbers less than or equal to `n`.
    """
    if n >= 2:
        # Start with set of positive, odd integers from 3 to `n` (inclusive)
        integers = set(range(3, n + 1, 2))
        # No point in removing multiples of 2 from odd numbers
        yield 2
        next_prime = 3
        while integers:
            yield next_prime
            # Remove all multiples of `next_prime`
            integers.difference_update(range(next_prime, n + 1, 2 * next_prime))
            next_prime = min(integers, default=None)  # None if set is empty




**The docstring of the second function points to more efficient prime generators, including work by Tim Peters *et al*, which is discussed at the following link:**
    
https://stackoverflow.com/questions/2211990/how-to-implement-an-efficient-infinite-generator-of-prime-numbers-in-python/19391111#19391111

In [32]:
# Set of square numbers under 100

squares = set(squares_generator(100))

print(sorted(squares))

[1, 4, 9, 16, 25, 36, 49, 64, 81]


In [40]:
# Set of prime numbers under 100

primes = set(primes_generator(100))

print(primes)

{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97}


In [34]:
# Intersection of odd numbers and square numbers = odd square numbers

print(odds.intersection(squares))

{1, 25, 9, 49}


In [35]:
# Intersection of odd numbers and prime number = odd prime numbers

print(odds.intersection(primes))

{3, 5, 37, 7, 41, 11, 43, 13, 47, 17, 19, 23, 29, 31}


In [36]:
# Intersection of even numbers and square numbers = even square numbers

print(evens.intersection(squares))

{16, 4, 36}


In [37]:
# Intersection of even numbers and prime number = even prime numbers

print(evens.intersection(primes))

{2}


**There is only one even prime: the number 2. This makes sense, since every other even number is divisible by 2 and therefore not prime.**

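**NOTE: the code inside an `if __name__ == '__main__':` block runs only when the file is executed as the main program, not when it is imported as a module by other code. This lets you separate reusable components, like the generator functions, from code that should only run as a script. (In a notebook, `__name__` is `'__main__'`, so the block below runs.)**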

In [39]:
if __name__ == '__main__':
    print("Squares less than 1000:")
    squares = list(squares_generator(1000))
    print(squares)
    print("Generated {} squares".format(len(squares)))
    print("\n")

    print("Primes up to 1000:")
    primes = set(primes_generator(1000))
    print(sorted(primes))
    print("Generated {} primes".format(len(primes)))
    print("\n")

    check = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
             31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
             73, 79, 83, 89, 97, 101, 103, 107, 109, 113,
             127, 131, 137, 139, 149, 151, 157, 163, 167, 173,
             179, 181, 191, 193, 197, 199, 211, 223, 227, 229,
             233, 239, 241, 251, 257, 263, 269, 271, 277, 281,
             283, 293, 307, 311, 313, 317, 331, 337, 347, 349,
             353, 359, 367, 373, 379, 383, 389, 397, 401, 409,
             419, 421, 431, 433, 439, 443, 449, 457, 461, 463,
             467, 479, 487, 491, 499, 503, 509, 521, 523, 541,
             547, 557, 563, 569, 571, 577, 587, 593, 599, 601,
             607, 613, 617, 619, 631, 641, 643, 647, 653, 659,
             661, 673, 677, 683, 691, 701, 709, 719, 727, 733,
             739, 743, 751, 757, 761, 769, 773, 787, 797, 809,
             811, 821, 823, 827, 829, 839, 853, 857, 859, 863,
             877, 881, 883, 887, 907, 911, 919, 929, 937, 941,
             947, 953, 967, 971, 977, 983, 991, 997}
    print("Confirm list of primes is correct:", primes == check)

Squares less than 1000:
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961]
Generated 31 squares


Primes up to 1000:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]
Generated 168 primes


Confirm list of primes is correct: True

**Using the patients from the sections above, identify any patients who are included in more than one trial.**

In [41]:
trial_1 = {"Bob", "Charley", "Georgia", "John"}
trial_2 = {"Anne", "Charley", "Eddie", "Georgia"}

both_trials = trial_1.intersection(trial_2)

print(both_trials)

{'Georgia', 'Charley'}


In [46]:
trial_3 = {"Bob", "Denise", "Frank", "Georgia"}

In [49]:
all_three_trials = trial_1.intersection(trial_2, trial_3)

print(all_three_trials)

{'Georgia'}


In [50]:
all_trials = trial_1 & trial_2 & trial_3

print(all_trials)

{'Georgia'}


In [51]:
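# Patients in at least two of the three trials
multi_trials = (trial_1 & trial_2) | (trial_1 & trial_3) | (trial_2 & trial_3)

print(sorted(multi_trials))

['Bob', 'Charley', 'Georgia']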


**Using the sequences below, identify which prepositions are used in the text.**

In [52]:
text = """Education is not the learning of facts
but the training of the mind to think

– Albert Einstein"""


prepositions = {"as", "but", "by", "down", "for", "in", "of", "on", "to", "with"}


In [56]:
words = set(text.split())

preps_used = words.intersection(prepositions)

print(preps_used)

{'but', 'of', 'to'}
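**Note that `text.split()` only splits on whitespace, so the match is case-sensitive and punctuation stays attached to the words. A sketch of one way to normalise the words first, using a made-up sentence and `str.strip()` with `string.punctuation` to trim punctuation from both ends of each word (this assumes the prepositions are stored in lower case):**

```python
import string

# A made-up sentence with capitals and punctuation
text = "Down the hill, across the field: on, ON to the river!"
prepositions = {"as", "but", "by", "down", "for", "in", "of", "on", "to", "with"}

# Naive split: "Down", "on," and "ON" fail to match
print(set(text.split()) & prepositions)  # {'to'}

# Lower-case each word and trim leading/trailing punctuation
words = {word.strip(string.punctuation).lower() for word in text.split()}
print(sorted(words & prepositions))      # ['down', 'on', 'to']
```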


## Difference between Sets

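**The difference between two sets is the set of elements that belong to the first set but not to the second, i.e. excluding any elements in their intersection. Unlike union and intersection, the result depends on the order: `a - b` is generally not the same as `b - a`.**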

In [58]:
# Odd numbers that are not in prime numbers, i.e. odd numbers minus any primes

print(sorted(odds.difference(primes)))

[1, 9, 15, 21, 25, 27, 33, 35, 39, 45, 49]


In [59]:
# Use - operator 

print(odds - primes)

{1, 33, 35, 39, 9, 45, 15, 49, 21, 25, 27}


In [60]:
print(primes - odds)

{97, 2, 67, 71, 73, 79, 83, 53, 89, 59, 61}


**Set difference is useful when you want to remove a whole group of elements at once, e.g. removing the items already bought from the packing set. This avoids writing a loop to remove the elements one at a time.**

In [61]:
items_bought = {"can opener", "sleeping bag(s)", "knife", "tent", "water carrier", "fuel", "matches"}

items = {
    "can opener", 
    "fuel", 
    "jumper", 
    "knife", 
    "matches", 
    "razor blades", 
    "razor", 
    "scissors", 
    "shampoo", 
    "shaving cream", 
    "shirts (3)", 
    "shorts", 
    "sleeping bag(s)", 
    "soap", 
    "socks (3 pairs)", 
    "stove", 
    "tent", 
    "mug", 
    "toothbrush", 
    "toothpaste", 
    "towel", 
    "underwear (3 pairs)", 
    "water carrier"
}

In [63]:
items_left = items.difference(items_bought)

print("ITEMS LEFT:")
print(*items_left, sep='\n')

ITEMS LEFT:
soap
toothpaste
jumper
shorts
scissors
stove
socks (3 pairs)
shampoo
mug
shirts (3)
underwear (3 pairs)
towel
razor blades
razor
shaving cream
toothbrush


**On a retail website, a customer can save their favourite items as well as adding items to buy to a shopping basket. Identify the favourite items that are not in the basket, e.g. to ask the customer whether they are still interested in them.**

**Output the suggested results sorted alphabetically.**

In [64]:
favourites = {'door screen',
              'frying pan',
              'roller blind',
              'football',
              'coffee grinder',
              'bush hat',
              'stirling engine',
              'cachemira cd',
              'shirt',
              }


basket = {'garlic crusher',
          'stirling engine',
          'frying pan',
          'shirt',
          'bush hat',
          }

In [70]:
suggestions = sorted(favourites.difference(basket))

print(suggestions)

['cachemira cd', 'coffee grinder', 'door screen', 'football', 'roller blind']


## Symmetric Difference between Sets

**The symmetric difference is the collection of elements that occur in one set or the other, but not in both: everything in the union of the sets that is not in their intersection. Use the `symmetric_difference()` method or the `^` operator.**

In [34]:
morning_classes = {'Java', 'C', 'Ruby', 'Lisp', 'C#'}

evening_classes = {'Python', 'C#', 'Java', 'C', 'Ruby'}

In [35]:
# Extract classes that are in morning or evening, but not both

available_classes = morning_classes.symmetric_difference(evening_classes)

print(available_classes)

{'Python', 'Lisp'}


In [36]:
# Use ^ operator instead of method

available_classes = morning_classes ^ evening_classes

print(available_classes)

{'Python', 'Lisp'}
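**The relationship between symmetric difference, union and intersection can be checked directly. A short sketch reusing the class sets from above:**

```python
morning_classes = {'Java', 'C', 'Ruby', 'Lisp', 'C#'}
evening_classes = {'Python', 'C#', 'Java', 'C', 'Ruby'}

in_either = morning_classes | evening_classes
in_both = morning_classes & evening_classes

# Symmetric difference = union minus intersection
print((in_either - in_both) == (morning_classes ^ evening_classes))  # True

# Equivalently: the two one-way differences combined
one_way = (morning_classes - evening_classes) | (evening_classes - morning_classes)
print(one_way == (morning_classes ^ evening_classes))  # True
```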


## Subsets and Supersets

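**You can use the `<=` operator or the `issubset()` method to check whether one set is a subset of another, i.e. whether every element of the first set is also in the second.**

**You can use the `>=` operator or the `issuperset()` method to check whether one set is a superset of another. The `<` and `>` operators test for a *proper* subset or superset, i.e. the two sets must also not be equal.**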

In [1]:
animals = {'Turtle', 'Horse', 'Robin', 'Python', 'Swallow', 'Hedgehog', 'Wren', 'Aardvark', 'Cat'}

birds = {'Robin', 'Swallow', 'Wren'}

In [2]:
print(f"Birds is a subset of Animals - {birds.issubset(animals)}")

print(f"Animals is a superset of Birds - {animals.issuperset(birds)}")

Birds is a subset of Animals - True
Animals is a superset of Birds - True


In [3]:
print(f"Birds is a proper subset of Animals - {birds < animals}")

print(f"Animals is a proper superset of Birds - {animals > birds}")

Birds is a proper subset of Animals - True
Animals is a proper superset of Birds - True
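**The `<` operator tests for a *proper* (strict) subset, which is where it differs from `<=`: every set is a subset of itself, but never a proper subset of itself. A short sketch with two equal sets:**

```python
birds = {'Robin', 'Swallow', 'Wren'}
same_birds = {'Wren', 'Robin', 'Swallow'}

# Every set is a subset of an equal set...
print(birds <= same_birds, birds.issubset(same_birds))  # True True

# ...but not a proper subset, because nothing is left over
print(birds < same_birds)  # False

# issubset() accepts any iterable; <= needs two sets
print(birds.issubset(['Robin', 'Swallow', 'Wren', 'Owl']))  # True
```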


**EXERCISE - Using a list of required skills for a job application, search a dictionary to find the candidates who have all of the required skills.**

In [4]:
required_skills = ['python', 'github', 'linux']

candidates = {
    'anna': {'java', 'linux', 'windows', 'github', 'python', 'full stack'},
    'bob': {'github', 'linux', 'python'},
    'carol': {'linux', 'javascript', 'html', 'python', 'github'}, 
    'daniel': {'pascal', 'java', 'c++', 'github'},
    'ekani': {'html', 'css', 'github', 'python', 'linux'},
    'fenna': {'linux', 'pascal', 'java', 'c', 'lisp', 'modula-2', 'perl', 'github'}
}

In [5]:
interviewees = set()

for candidate, skills in candidates.items():
    if skills.issuperset(required_skills):
        interviewees.add(candidate)
        
        
print(interviewees)

{'bob', 'carol', 'ekani', 'anna'}


**You can apply a stricter condition by looking for *proper* supersets, i.e. candidates who have all the required skills plus at least one more. Remember that the comparison operators only work between two sets, which is why the list of required skills is converted with `set()` below.**

In [7]:
interviewees = set()

for candidate, skills in candidates.items():
    if skills > set(required_skills):
        interviewees.add(candidate)
        
        
print(interviewees)

{'carol', 'ekani', 'anna'}


**Ideally, the required skills should be stored as a set from the start, rather than a list. The comparison can then use it directly, instead of building a new set from the list on every pass through the loop.**
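**A sketch of that suggestion, using a shortened `candidates` dictionary: with `required_skills` stored as a set, no conversion is needed, and both conditions can be written as set comprehensions.**

```python
required_skills = {'python', 'github', 'linux'}  # a set from the start

candidates = {
    'anna': {'java', 'linux', 'windows', 'github', 'python', 'full stack'},
    'bob': {'github', 'linux', 'python'},
    'daniel': {'pascal', 'java', 'c++', 'github'},
}

# All the required skills (superset)
interviewees = {name for name, skills in candidates.items()
                if skills >= required_skills}

# The required skills plus at least one more (proper superset)
strong_candidates = {name for name, skills in candidates.items()
                     if skills > required_skills}

print(sorted(interviewees))       # ['anna', 'bob']
print(sorted(strong_candidates))  # ['anna']
```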