### What is a Set?

From our course text, a *set* is an unordered collection of unique elements.

Sets are used *A LOT* in mathematics.  Much of modern mathematics is built on them.

In mathematics, sets are usually denoted with the $\{$ and $\}$ symbols.  As an example, consider the set $A$ defined as $\{1,2,3,4\}$.

Sets *do not allow for repeats*.  So, $$\{1,2,3,3,4,4,4\}=\{1,2,3,4\}$$

Also, *there is no order to a set*.  So, $$\{1,3,2,4\}=\{4,3,2,1\}=\{1,2,3,4\}$$

The numbers (or whatever else is in there) are called *elements*.

In Python, sets can be defined by calling the built-in *set* function on a list (or any other iterable) or by listing elements inside curly braces.

##### Example 1

In [1]:
set([1,2,3,4])

{1, 2, 3, 4}

In [2]:
set([1,2,3,4])==set([2,3,4,1])

True

$\Box$

##### Example 2

In [3]:
{3,4,5,6,7}

{3, 4, 5, 6, 7}

In [4]:
{3,4,5,6,7}=={7,6,5,4,3}

True

$\Box$
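Examples 1 and 2 check that order does not matter.  Here is a quick sketch of the other property from the start of this section, that sets do not allow repeats, using the same values as the math above:

```python
# Duplicate elements collapse when a set is built, whether from a literal or from a list.
with_repeats = {1, 2, 3, 3, 4, 4, 4}
from_list = set([1, 2, 3, 3, 4, 4, 4])

print(with_repeats == {1, 2, 3, 4})  # True
print(from_list == {1, 2, 3, 4})     # True
print(len(with_repeats))             # 4: each value is stored only once
```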

Often we will want to give a name (assign a variable) to a set.

##### Example 3

In [5]:
A = set(['dog', 'cat', 'gerbil'])
A

{'cat', 'dog', 'gerbil'}

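We can use the *in* keyword to test whether an element belongs to a set, just as with lists.  Because Python stores sets as hash tables, this membership test is fast (constant time on average), no matter how large the set is.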

In [6]:
'cat' in A

True

In [7]:
'hamster' in A

False

$\Box$
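To see why the hash-table lookup matters, the sketch below asks the same membership question of a list and a set holding the same 100,000 integers (the sizes are arbitrary choices for illustration).  Exact timings depend on the machine, so no numbers are shown; typically the set lookup is far faster because the list must be scanned element by element.

```python
import timeit

# A large list and a set holding the same integers.
numbers_list = list(range(100_000))
numbers_set = set(numbers_list)

# Both containers give the same answer...
print(99_999 in numbers_list, 99_999 in numbers_set)  # True True

# ...but the list is scanned from the front, while the set uses a hash lookup.
list_time = timeit.timeit(lambda: 99_999 in numbers_list, number=100)
set_time = timeit.timeit(lambda: 99_999 in numbers_set, number=100)
print(f"list: {list_time:.5f} s, set: {set_time:.5f} s")
```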

A common use of sets in Python is to avoid redundant iterations when a list contains repeated values.

##### Example 4

In [9]:
#Consider the following list
x = [0,1,2]*8
x

[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]

In [17]:
#Iterating over every element is redundant if all we want are the unique values of m+1 for m in x
y = [m+1 for m in x]
y

[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

In [18]:
y = [m+1 for m in set(x)]
y

[1, 2, 3]

$\Box$
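One caution about Example 4: Python does not guarantee the order in which a set's elements are visited, so the result above happens to come out in order but need not in general.  If order matters, a sketch that sorts the unique values explicitly:

```python
x = [0, 1, 2] * 8

# set(x) removes the duplicates; sorted() fixes the order of the result,
# since iteration order over a set is not guaranteed.
y = sorted(m + 1 for m in set(x))
print(y)  # [1, 2, 3]

# The work saved: 24 iterations over the list versus 3 over the set.
print(len(x), len(set(x)))  # 24 3
```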

##### Exercise 1

Write code so that the following function performs the way that its docstring indicates.

Example: unique_elements([1,2,3,1]) should return False.

unique_elements([1,2,3,4]) should return True.

In [25]:
def unique_elements(some_list):
    """
    Parameters
    -----------
    some_list: list
    
    Returns
    -----------
    Boolean: True, if all elements in some_list are unique and False otherwise.
    """

In [None]:
if unique_elements(['a','b','c','a', 'c']):
    print("Something is wrong with your code.")
elif not unique_elements([1,2,3,4,5]):
    print("Something is wrong with your code.")
else:
    print("All tests passed.")

##### Exercise 2

Write code so that the following function performs the way that its docstring indicates.

Example: list_of_unique_elements([1,2,3,4,5,1,4,5]) should return [2,3].

list_of_unique_elements([1,2,3,4]) should return [1,2,3,4].

In [5]:
def list_of_unique_elements(some_list):
    """
    Parameters
    -----------
    some_list: list
    
    Returns
    -----------
    list containing all elements of some_list that appear in some_list only once
    """

In [None]:
#Add test with randomly generated long list

# Sort before comparing, since the docstring does not specify the order of the result
if sorted(list_of_unique_elements(['a','b','c','a', 'c','d','d','e'])) != ['b', 'e']:
    print("Something is wrong with your code.")
elif sorted(list_of_unique_elements([1,2,3,4,5])) != [1,2,3,4,5]:
    print("Something is wrong with your code.")
else:
    print("All tests passed.")

### Subsets and Supersets

A set $A$ is a *subset* of a set $B$ if every element of $A$ is also an element of $B$.  In that case, $B$ is called a *superset* of $A$.

The set $A=\{1,2,3\}$ is a subset of the set $B=\{1,2,3,4,5\}$, since all elements in $A$ are also elements in $B$.

In math, the symbol for subset is $\subseteq$, so $A\subseteq B$.  The symbol combines $\subset$ (proper subset) with $=$, meaning the two sets are allowed to be equal.  For example, if $C=\{1,2,3\}$, then $A\subseteq C$ because $A=C$.

In Python, the *issubset* and *issuperset* methods implement these ideas.

##### Example 5

In [10]:
A = {1,2,3}
A

{1, 2, 3}

In [11]:
C = A
C

{1, 2, 3}

In [12]:
B = {1,2,3,4,5}
B

{1, 2, 3, 4, 5}

In [13]:
A.issubset(B)

True

In [14]:
A.issubset(C)

True

In [15]:
B.issuperset(A)

True

$\Box$
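Python also has comparison operators for these relations.  As a sketch: *<=* corresponds to $\subseteq$, while *<* tests for a proper subset ($\subset$), which excludes equality.  The sets from Example 5 are redefined here (with *C* built separately rather than as an alias of *A*) so the block runs on its own.

```python
A = {1, 2, 3}
B = {1, 2, 3, 4, 5}
C = {1, 2, 3}

print(A <= B)  # True: same as A.issubset(B)
print(A <= C)  # True: A equals C, and <= allows equality
print(A < B)   # True: A is a proper subset of B
print(A < C)   # False: A equals C, so it is not a proper subset
print(B >= A)  # True: same as B.issuperset(A)
```

One difference worth knowing: the methods *issubset* and *issuperset* accept any iterable (a list, for example), whereas the operators require sets on both sides.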

### Set Union

In math, the *union* of two sets $A$ and $B$ is the set of all elements that are in $A$, in $B$, or in both.

To denote the union of two sets $A$ and $B$, we use the symbol $\cup$.

As an example, consider $A=\{1,2,3\}$ and $B=\{2,3,4,5\}$.

Then, $A\cup B = \{1,2,3,4,5\}$.

In Python, the *union* method implements set union.

##### Example 6

In [17]:
A = {1,2,3}
B = {2,3,4,5}
A.union(B)

{1, 2, 3, 4, 5}

In [19]:
B.union(A)

{1, 2, 3, 4, 5}

This example illustrates that set union is commutative: $A\cup B = B\cup A$.

$\Box$
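Union can also be written with the *|* operator, and the *union* method accepts more than one argument.  The sketch below repeats Example 6 with the operator and then combines three sets; the third set *E* is made up for illustration.

```python
A = {1, 2, 3}
B = {2, 3, 4, 5}
E = {5, 6}  # an extra set, chosen only for illustration

print(A | B)               # {1, 2, 3, 4, 5}
print((A | B) == (B | A))  # True: union is commutative
print(A.union(B, E))       # {1, 2, 3, 4, 5, 6}
```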

### Set Intersection

In math, the *intersection* of two sets $A$ and $B$ is the set of all elements that are in both $A$ and $B$.

To denote the intersection of two sets $A$ and $B$, we use the symbol $\cap$.

As an example, consider $A=\{1,2,3\}$ and $B=\{2,3,4,5\}$.

Then, $A\cap B = \{2,3\}$.

In Python, the *intersection* method implements set intersection.

##### Example 7

In [20]:
#From above
A

{1, 2, 3}

In [21]:
C

{1, 2, 3}

In [22]:
B

{2, 3, 4, 5}

In [23]:
A.intersection(B)

{2, 3}

In [24]:
B.intersection(A)

{2, 3}

In [39]:
A.intersection(C)

{1, 2, 3}

In [40]:
D = {10,20,30}
D

{10, 20, 30}

$\Box$
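Intersection also has an operator form, *&*.  A sketch repeating part of Example 7, plus the intersection of *A* with the set *D* defined at the end of that example, which shares no elements with *A*:

```python
A = {1, 2, 3}
B = {2, 3, 4, 5}
D = {10, 20, 30}

print(A & B)               # {2, 3}
print((A & B) == (B & A))  # True: intersection is commutative
print(A & D)               # set(): A and D have no elements in common
```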

##### Exercise 3

Write code so that the following function performs the way that its docstring indicates.

Example: intersect([{1,2,3}, {2,3,4}, {3,4,5}]) should return {3}.

intersect([{1,2,3,4,5},{2,3,4,5},{3,4,5,6},{4,5,6,7}]) should return {4,5}.

In [13]:
def intersect(list_of_sets):
    """
    Parameters
    -----------
    list_of_sets: list whose elements are sets
    
    Returns
    -----------
    set containing the intersection of all sets in list_of_sets
    """  

In [None]:
if intersect([{1,2,3}, {2,3,4}, {3,4,5}]) != {3}:
    print("Something is wrong with your code.")
elif intersect([{1,2,3,'dude'}, {2,3,4,'dude'}, {3,4,5,'dude'}]) != {3,'dude'}:
    print("Something is wrong with your code.")
else:
    print("All tests passed.")

### The Empty Set

The *empty set* is the set that contains no elements.  In math, this is the set $\{$ $\}$.  Although it may seem strange at first, the empty set is essential in mathematics; for example, it is the intersection of any two sets that have no elements in common.

In math, the empty set is denoted by the symbol $\emptyset$. 

In Python, the empty set is constructed with *set()*, or by calling *set* on an empty list.  Be careful: empty curly braces do *not* create a set, because Python uses *{ }* for an empty dictionary.

##### Example 8

In [1]:
type({ })

dict

In [2]:
set([])

set()

In [3]:
type(set())

set

Since an empty set has no elements, its length is 0.

In [4]:
len(set())

0

In [5]:
len(set([]))

0

$\Box$
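Two facts about the empty set are easy to check in Python: it is a subset of every set, and, like every empty container, it counts as False in a boolean context.  A short sketch:

```python
empty = set()
A = {1, 2, 3}

print(empty.issubset(A))  # True: the empty set has no elements that could fail to be in A
print(empty <= empty)     # True: the empty set is a subset of itself
print(bool(empty))        # False: empty containers are falsy

# A common idiom: test for emptiness with "not" instead of comparing to set().
if not empty:
    print("the set is empty")
```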

### Disjoint Sets

Two sets are said to be *disjoint* if their intersection is the empty set.

For example, consider the sets $A=\{1,2,3,4\}$ and $B=\{100,200,300\}$.  These sets are disjoint, since $A\cap B =\emptyset$.

##### Example 9

In [6]:
A = set([1,2,3,4])
B = set([100,200,300])

In [9]:
A

{1, 2, 3, 4}

In [10]:
B

{100, 200, 300}

In [7]:
A.intersection(B)

set()

$\Box$

Python provides the *isdisjoint* method for checking whether two sets are disjoint.

##### Example 10

In [11]:
#Using A and B from Example 9
A.isdisjoint(B)

True

In [12]:
#Define two new sets
C = {1,2,3}
D = {2,3,4,5}

In [13]:
C.isdisjoint(D)

False

In [15]:
C.intersection(D) == set()  # compare with set(); {} would be an empty dict

False

$\Box$
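*isdisjoint* gives the same answer as checking whether the intersection is empty, but it can stop as soon as it finds a shared element instead of building the whole intersection.  A sketch comparing the two approaches on the sets from Examples 9 and 10:

```python
A = {1, 2, 3, 4}
B = {100, 200, 300}
C = {1, 2, 3}
D = {2, 3, 4, 5}

for first, second in [(A, B), (C, D)]:
    via_method = first.isdisjoint(second)
    # Compare with set(), not {}: the literal {} is an empty dict.
    via_intersection = first.intersection(second) == set()
    print(via_method, via_intersection)
# True True
# False False
```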

##### Exercise 4

Write code so that the following function performs the way that its docstring indicates.

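Example: is_disjoint([{1,2,3}, {2,3,4}, {3,4,5}]) should return False, since 3 is in every set.

is_disjoint([{1,2},{2,3},{1,3}]) should return True.  Note that each pair of these sets does overlap; the function only checks for an element common to *all* of the sets.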

In [14]:
def is_disjoint(list_of_sets):
    """
    Parameters
    -----------
    list_of_sets: list whose elements are sets
    
    Returns
    -----------
    boolean: True if no element is common to all sets in list_of_sets, False otherwise
    """

In [None]:
if not is_disjoint([{1,2}, {2,3}, {1,3}]):
    print("Something is wrong with your code.")
elif is_disjoint([{1,2,3,4,5},{2,3,4,5},{4,5,6,7,8},{4,5}]):
    print("Something is wrong with your code.")
else:
    print("All tests passed.")

### Set Difference

The *difference* of two sets $A$ and $B$ is the set that contains all elements of $A$ that are not in $B$.

In math, the difference of two sets is written $A-B$ (also written $A\setminus B$).

As an example, consider $A=\{1,2,3,4\}$ and $B=\{3,4,5\}$.

Then, $A-B=\{1,2\}$, since 1 and 2 are in $A$ and not in $B$.

In Python, set difference is implemented with the *difference* method.

##### Example 11

In [16]:
#Define two sets
A = {1,2,3,4}
B = {3,4,5}

In [17]:
A

{1, 2, 3, 4}

In [18]:
B

{3, 4, 5}

In [19]:
A.difference(B)

{1, 2}

$\Box$
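Set difference also has an operator form, *-*.  Unlike union and intersection, difference is not commutative, as this sketch using the sets from Example 11 shows:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

print(A - B)               # {1, 2}: same as A.difference(B)
print(B - A)               # {5}
print((A - B) == (B - A))  # False: order matters for difference
```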



##### Exercise 5
Suppose that we have a data set with the following column names.

In [23]:
all_columns = ['col'+str(x) for x in range(10000)]

Now, that is A LOT of columns.

In [24]:
len(all_columns)

10000

Now, let's suppose that the columns we want are given below.

In [25]:
import numpy as np

# randint's upper bound is exclusive, so this draws column numbers 0 through 9999
columns_wanted = ['col'+str(np.random.randint(0,10000)) for x in range(5)]

columns_wanted

['col5125', 'col7887', 'col1140', 'col9727', 'col9069']

Now, suppose that you want to pass a list of all the columns you don't want to a *drop* function.

Create a list called *columns_to_drop* that contains the names of all the columns you want to drop.  Put your answer below.

### Set Comprehensions

Sets can be constructed using a *comprehension* similar to those of lists and dicts.

The syntax for a set comprehension is $\{$ *expression* for *value* in *collection* if *condition*$\}$

##### Example 12

In [21]:
even_cubes = {x**3 for x in range(20) if x**3%2 == 0}
even_cubes

{0, 8, 64, 216, 512, 1000, 1728, 2744, 4096, 5832}

$\Box$
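Because the result is a set, a comprehension automatically discards repeated values.  For instance, a sketch collecting the remainders of 0 through 19 when divided by 3:

```python
# Twenty inputs, but only three distinct remainders survive.
# No "if" clause here: the condition part of the syntax is optional.
remainders = {x % 3 for x in range(20)}
print(remainders)       # {0, 1, 2}
print(len(remainders))  # 3
```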