### Common Set Operations

As we saw in the lecture, there are certain mathematical concepts and operations associated with mathematical sets.

Python provides equivalent concepts and operators.

First, we can look at disjointness - whether two sets have no elements in common:

In [1]:
s1 = {'a', 'b', 'c'}
s2 = {True, False}
s3 = {'a', 100, 200}

In [2]:
s1.isdisjoint(s2)

True

In [3]:
s1.isdisjoint(s3)

False
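One way to read `isdisjoint()` is as a check that the intersection of the two sets is empty. The following is a minimal sketch of that equivalence, reusing the same sets and the `&` (intersection) operator introduced later in this section. The variable names `disjoint_12`, `empty_12`, and so on are just illustrative.

```python
s1 = {'a', 'b', 'c'}
s2 = {True, False}
s3 = {'a', 100, 200}

# Two sets are disjoint exactly when their intersection is empty.
disjoint_12 = s1.isdisjoint(s2)
empty_12 = len(s1 & s2) == 0

disjoint_13 = s1.isdisjoint(s3)
empty_13 = len(s1 & s3) == 0

# The element that prevents s1 and s3 from being disjoint.
common_13 = s1 & s3
```

Here `s1` and `s3` share the element `'a'`, which is why they are not disjoint.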

We can add an element to a set by using the `add()` method:

In [4]:
s = set()  # an empty set

In [5]:
s.add(100)

In [6]:
s.add(200)

In [7]:
s

{100, 200}

Keep in mind that set elements are unique, so if we try to add an element that is already in a set, nothing happens:

In [8]:
s

{100, 200}

In [9]:
s.add(100)

In [10]:
s

{100, 200}

We can also remove an element from a set, using the `remove()` or `discard()` methods.

The main difference between the two methods is that `remove()` raises a `KeyError` exception if the item we are attempting to remove is not present in the set. `discard()`, on the other hand, does not raise an exception (and if the element does not exist, it does not mutate the set at all).

Do note that both operations will (potentially) mutate the set - not create and return a new one.

In [11]:
s = set('abc')

In [12]:
s

{'a', 'b', 'c'}

In [13]:
s.remove('a')

In [14]:
s

{'b', 'c'}

In [15]:
s.discard('x')

In [16]:
s

{'b', 'c'}

In [17]:
s.remove('x')

KeyError: 'x'
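If we are not sure whether an element is present, we can either guard the call to `remove()` with a `try`/`except` block or simply use `discard()`. The sketch below shows both approaches; the `removed` flag is only there for illustration.

```python
s = {'b', 'c'}

# remove() raises KeyError for a missing element, so we guard it explicitly...
try:
    s.remove('x')
    removed = True
except KeyError:
    removed = False

# ...or we use discard(), which does nothing when the element is absent.
s.discard('x')
```

Which one to use depends on whether a missing element should be treated as an error in our program.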

Mathematical sets also have the concepts of subset and superset, both strict and non-strict.

In Python, the `<=` and `>=` operators test for (non-strict) subset and superset, while `<` and `>` test for strict subset and strict superset.

In [18]:
s1 = set('abc')
s2 = set('abcd')

In [19]:
s1

{'a', 'b', 'c'}

In [20]:
s2

{'a', 'b', 'c', 'd'}

In [21]:
s1 <= s2

True

In [22]:
s1 < s2

True

In [23]:
s2 >= s1

True

In [24]:
s2 > s1

True

In [25]:
s1 = set('abc')
s2 = set('abc')

These two sets are equal, so both these comparisons will be `True`:

In [26]:
s1 <= s2

True

In [27]:
s2 >= s1

True

However, the strict versions of these will not:

In [28]:
s1 < s2

False

In [29]:
s2 > s1

False
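Python sets also provide the `issubset()` and `issuperset()` methods, which correspond to `<=` and `>=`. Unlike the operators, the methods accept any iterable as their argument, not just a set. A short sketch (the variable names on the left are illustrative only):

```python
s1 = set('abc')
s2 = set('abcd')

# Method equivalents of <= and >=
subset_method = s1.issubset(s2)
superset_method = s2.issuperset(s1)

# The methods accept any iterable, such as a string or a list
subset_of_string = s1.issubset('abcd')

# A strict subset is a subset that is not equal to the other set
strict_subset = s1 <= s2 and s1 != s2
same_as_operator = strict_subset == (s1 < s2)
```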

We also have mathematical operations such as union and intersection for sets. Python has the same, using the operators:

- union: `|`
- intersection: `&`

We can think of these operators this way: `|` is often used to represent `or` (in languages such as C or Java), while `&` is used for `and`.

When we look at the union of two sets, we are creating a new set that contains elements that are in set 1 **or** set 2.

When we look at the intersection of two sets, we are creating a new set that contains elements that are in set 1 **and** set 2.

Let's look at some simple examples:

In [30]:
s1 = set('abc')
s2 = set('bcd')

In [31]:
s1

{'a', 'b', 'c'}

In [32]:
s2

{'b', 'c', 'd'}

In [33]:
s1 | s2

{'a', 'b', 'c', 'd'}

Notice how elements that were common to both sets are *not* repeated (set elements are unique).

In [34]:
s1 & s2

{'b', 'c'}

As you can see the resulting set only contains elements that were common to both sets (the intersection).
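The same results can be obtained with the `union()` and `intersection()` methods. In both forms a new set is returned and the original sets are left unchanged, as this small sketch illustrates (the result variable names are illustrative):

```python
s1 = set('abc')
s2 = set('bcd')

# Operator and method forms produce the same result
union_op = s1 | s2
union_method = s1.union(s2)

inter_op = s1 & s2
inter_method = s1.intersection(s2)

# Neither form mutates the original sets
s1_unchanged = s1 == {'a', 'b', 'c'}
s2_unchanged = s2 == {'b', 'c', 'd'}
```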

These sets and set operations can be very handy in a number of different ways that we'll see throughout the course.

Suppose we have two strings, and we want to find all the characters that are present in both strings.

In [35]:
str_1 = 'python is an awesome language!'
str_2 = 'a python is also a snake.'

In [36]:
set_1 = set(str_1)
set_2 = set(str_2)

In [37]:
set_1

{' ',
 '!',
 'a',
 'e',
 'g',
 'h',
 'i',
 'l',
 'm',
 'n',
 'o',
 'p',
 's',
 't',
 'u',
 'w',
 'y'}

In [38]:
set_2

{' ', '.', 'a', 'e', 'h', 'i', 'k', 'l', 'n', 'o', 'p', 's', 't', 'y'}

To find the elements common to both, we just need the intersection:

In [39]:
set_1 & set_2

{' ', 'a', 'e', 'h', 'i', 'l', 'n', 'o', 'p', 's', 't', 'y'}

Another example might be where we have two or more sets that contain the stock symbols different systems are tracking, and we want to compile a list of all these stock symbols.

In [40]:
s1 = {'FB', 'AMZN', 'AAPL', 'NFLX', 'GOOG', 'MSFT'}
s2 = {'BABA', 'WMT', 'COST'}
s3 = {'TSLA', 'F', 'GM'}

To get a consolidated list we could do this:

In [41]:
consolidated = s1 | s2 | s3

In [42]:
consolidated

{'AAPL',
 'AMZN',
 'BABA',
 'COST',
 'F',
 'FB',
 'GM',
 'GOOG',
 'MSFT',
 'NFLX',
 'TSLA',
 'WMT'}

To actually convert this set to a list, we can simply use the `list()` function:

In [43]:
symbols = list(s1 | s2 | s3)

In [44]:
symbols

['TSLA',
 'AAPL',
 'MSFT',
 'WMT',
 'GOOG',
 'F',
 'NFLX',
 'COST',
 'BABA',
 'FB',
 'AMZN',
 'GM']
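Notice that the order of the elements in the list is arbitrary, because sets are unordered. If we want a predictable order, one option is to use `sorted()`, which accepts any iterable and returns a new sorted list:

```python
s1 = {'FB', 'AMZN', 'AAPL', 'NFLX', 'GOOG', 'MSFT'}
s2 = {'BABA', 'WMT', 'COST'}
s3 = {'TSLA', 'F', 'GM'}

# sorted() returns a list, so no separate call to list() is needed
symbols = sorted(s1 | s2 | s3)
```

With these sets, `symbols` is the list of all twelve symbols in alphabetical order, starting with `'AAPL'` and ending with `'WMT'`.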

Another common operation might be to "subtract" one set from another.

For example, suppose we have one set that contains all the widgets that were sold on some site, and one that contains all the widgets that had returns.

In [45]:
sold = {'w1', 'w2', 'w3', 'w4'}
returned = {'w1'}

We want to know which widgets that were sold had no returns:

In [46]:
not_returned = sold - returned

In [47]:
not_returned

{'w2', 'w3', 'w4'}
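Keep in mind that, unlike union and intersection, set difference is not symmetric: the order of the operands matters. The sketch below also shows the equivalent `difference()` method (the variable names `reverse` and `via_method` are illustrative):

```python
sold = {'w1', 'w2', 'w3', 'w4'}
returned = {'w1'}

# Widgets that were sold but not returned
not_returned = sold - returned

# Swapping the operands asks a different question:
# which returned widgets were never sold? Here, none.
reverse = returned - sold

# The method form gives the same result as the operator
via_method = sold.difference(returned)
```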

Another possible use for this might be to determine which characters in one string are not present in another string:

In [48]:
alphabet = set('abcdefghijklmnopqrstuvwxyz')

Typing out the alphabet by hand is a bit tedious, so we can use the constants in Python's built-in `string` module instead:

In [49]:
import string

In [50]:
string.ascii_lowercase

'abcdefghijklmnopqrstuvwxyz'

In [51]:
string.ascii_uppercase

'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [52]:
string.ascii_letters

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [53]:
alphabet = set(string.ascii_letters)

Now suppose we have this sentence:

In [54]:
text = 'The quick brown fox jumps over the lazy dog'

And we want to know which alphabet characters were not used in that string:

In [55]:
set(string.ascii_letters) - set(text)

{'A',
 'B',
 'C',
 'D',
 'E',
 'F',
 'G',
 'H',
 'I',
 'J',
 'K',
 'L',
 'M',
 'N',
 'O',
 'P',
 'Q',
 'R',
 'S',
 'U',
 'V',
 'W',
 'X',
 'Y',
 'Z'}

One problem here is that lowercase and uppercase characters are not the same - we may want to do this in a case-insensitive fashion.

In [56]:
text

'The quick brown fox jumps over the lazy dog'

So we can use case folding to make this case insensitive:

In [57]:
set(string.ascii_letters.casefold()) - set(text.casefold())

set()

Interestingly enough, this sentence contains every letter of the English alphabet (it is a pangram) :-)

Let's try it with another string that does not contain all the letters of the alphabet:

In [58]:
text = 'aBcDeFgHiJkKlLmMnNoOpPqQrRsStTuUvVwW'

In [59]:
set(string.ascii_letters.casefold()) - set(text.casefold())

{'x', 'y', 'z'}
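If we needed this check in several places, we could wrap it in a small function. The following is just a sketch: the name `missing_letters` is hypothetical, and it uses `string.ascii_lowercase` directly, which gives the same set as case-folding `string.ascii_letters`.

```python
import string


def missing_letters(text):
    """Return the ASCII letters that do not appear in text, ignoring case."""
    return set(string.ascii_lowercase) - set(text.casefold())


pangram_missing = missing_letters('The quick brown fox jumps over the lazy dog')
partial_missing = missing_letters('aBcDeFgHiJkKlLmMnNoOpPqQrRsStTuUvVwW')
```

For the first sentence the result is an empty set, and for the second string it is `{'x', 'y', 'z'}`, matching what we saw above.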