---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Lecture 2.7</h1>

## _set-operations.ipynb_
#### [Click me to learn more about Python Sets](https://www.geeksforgeeks.org/sets-in-python/)

### Learning agenda of this notebook
In Python, a set is an unordered collection that is iterable, mutable, and has no duplicate elements. The order of elements in a set is undefined. Sets are created using curly brackets or the set() constructor. The major advantage of using a set, as opposed to a list, is that it has a highly optimized method for checking whether a specific element is contained in the set.

1. How to create Sets?
2. Proof of concepts: Sets are heterogeneous, unordered, mutable, nested, and do not allow duplicate elements
3. Accessing elements of sets
4. Slicing a set (can't be performed as there is no index associated with set values)
5. Set concatenation and repetition (can't be performed as on lists and tuples)
6. Adding elements to a set using add and update methods
7. Removing elements from a set using pop, remove and discard methods
8. Converting string object to set and vice-versa (using type casting, split() and join())
9. Sorting and reversing (sort() and reverse() can't be called on sets)
10. Some Built-in functions that can be used on sets (len, max, min, sum)
11. Misc Concepts
    - Union of sets 
    - Intersection of sets 
    - Difference of sets 
    - Symmetric Difference of sets 
    - Subsets 
    - Supersets 
    - Disjoint sets 
12. Looping through a set

### 1. How to create Sets?

In [14]:
# A set is created by placing comma separated values in curly brackets {} OR
# A set is created by using set(), and passing a list to it
# We will try to use the second option as the curly brackets are also used by dictionary object in Python
s1 = {1,2,3,4,5}   #set of integers
s1 = set([1, 2, 3, 4, 5])
s1, type(s1)

({1, 2, 3, 4, 5}, set)

In [15]:
s2 = {3.7, 6.5, 3.8, 7.95}   #set of floats
s2 = set([3.7, 6.5, 3.8, 7.95])
print(s2)

{3.8, 3.7, 6.5, 7.95}


In [16]:
s3 = {"hello", "this", "F", "good show"}   #set of strings
s3 = set(["hello", "this", "F", "good show"])
print(s3)

{'F', 'hello', 'good show', 'this'}


In [17]:
s4 = {True, False, True, True, False}   #set of boolean
s4 = set([True, False, True, True, False])
print(s4)


{False, True}


In [18]:
# creating an empty set
#emptyset = {}  # not the correct way: {} creates an empty dictionary, not an empty set

# to create empty set, we can use set()
s5 = set()
print(s5)
print(type(s5))

set()
<class 'set'>
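
The reason {} is avoided above can be checked directly. The following is a minimal sketch; the variable names empty_braces and empty_set are chosen here only for illustration.

```python
# {} is the literal for an empty dictionary, so it cannot be used for an empty set
empty_braces = {}
empty_set = set()

print(type(empty_braces))   # <class 'dict'>
print(type(empty_set))      # <class 'set'>
print(len(empty_set))       # 0
```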


### 2. Proof of concepts: Sets are heterogeneous, unordered, mutable, nested, and do not allow duplicate elements

#### a. Sets are heterogeneous
- Sets are heterogeneous, as their elements/items can be of any data type

In [42]:
s1 = {"Arif", 30, 5.5}
print("s1: ", s1)

s1:  {5.5, 30, 'Arif'}


#### b. Sets are unordered
- Sets are unordered, which means the elements of a set are NOT associated with any index
- When you access set elements, they may show up in a different sequence.
- Moreover, two sets having the same elements in a different order are equal by content (==), but they are still two different objects with different ids.

In [66]:
s2 = set(['learning', 'is', 'fun', 'with', 'Arif'])
print(s2)

{'fun', 'with', 'is', 'learning', 'Arif'}


In [45]:
a = {1, 2, 3}
b = {2, 3, 1}
id(a), id(b), a == b, a is b

(140425097499360, 140425096637344, True, False)

#### c. Sets are mutable
- This means that once a set object is created, you can make changes to it by adding or removing elements
- However, since sets cannot be indexed, we can't change an element using the subscript operator

In [52]:
numbers = set([10, 20, 30, 40, 50])
#numbers[2] = 15   # error: 'set' object does not support item assignment

print("numbers: ", numbers)

numbers:  {40, 10, 50, 20, 30}


#### d. Sets CANNOT have duplicate elements

In [53]:
# Sets do not allow duplicate elements
# The following line will not raise an error, however, 'Arif' will be added to the set only once
names = {'Arif', 'Rauf', 'Hadeed', 'Arif', 'Mujahid'}
print(names)

{'Mujahid', 'Rauf', 'Hadeed', 'Arif'}


In [59]:
# So when we want to remove duplicates from a list, we can typecast it to a set
mylist = [2, 4, 5, 6, 8, 7, 3, 3, 2]
print("\nList: ", mylist)
myset = set(mylist)
print("List converted to set: ", myset)


List:  [2, 4, 5, 6, 8, 7, 3, 3, 2]
List converted to set:  {2, 3, 4, 5, 6, 7, 8}
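
Converting to a set discards the original order of the list. If the order of first appearance matters, one possible alternative (an addition to the lecture material, not part of it) is dict.fromkeys(), which relies on dictionaries preserving insertion order in Python 3.7 and later. The name unique_in_order is illustrative.

```python
mylist = [2, 4, 5, 6, 8, 7, 3, 3, 2]

# Dictionary keys are unique and keep insertion order (Python 3.7+),
# so this removes duplicates while keeping the first occurrence of each value
unique_in_order = list(dict.fromkeys(mylist))
print(unique_in_order)   # [2, 4, 5, 6, 8, 7, 3]
```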


#### e. Nested Sets
- You can have a tuple inside a set
- You CANNOT have a list inside a set, because set elements must be hashable (lists are mutable and therefore unhashable)
- Similarly, you cannot have a set within a set, because set elements must be hashable (sets are mutable and therefore unhashable)
- This is one situation where you may wish to use a frozenset, which is very similar to a set except that a frozenset is immutable.

In [58]:
# Nested sets: sets can have another tuple as an item
s1 = {"Arif", 30, 5.5, (10,'rauf')}
print(s1)

{(10, 'rauf'), 5.5, 30, 'Arif'}


In [55]:
# However, you cannot have a list inside a set, because set elements must be hashable (lists are mutable and therefore unhashable)
#s1 = {"Arif", 30, 5.5, [10,'rauf']} # TypeError: unhashable type: 'list'

In [57]:
# Similarly, you cannot have a set within a set, because set elements must be hashable (sets are mutable and therefore unhashable)
#s1 = {"Arif", 30, 5.5, {10,'rauf'}} # TypeError: unhashable type: 'set'

TypeError: unhashable type: 'set'
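
The frozenset mentioned above can be sketched as follows. This is a minimal illustration; the names inner and list_rejected are assumptions made for this example.

```python
# A frozenset is immutable and hashable, so it can be an element of a set
inner = frozenset([10, 'rauf'])
s1 = {"Arif", 30, 5.5, inner}

print(inner in s1)   # True
print(len(s1))       # 4

# A list in the same position is rejected because lists are unhashable
list_rejected = False
try:
    s1 = {"Arif", 30, 5.5, [10, 'rauf']}
except TypeError as err:
    list_rejected = True
    print("TypeError:", err)
```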

#### f. Packing and Unpacking Sets

In [65]:
# you can unpack the individual elements of a set into separate variables (the order of assignment is arbitrary)
myset = set(['learning', 'is', 'fun', 'with', 'Arif'])
print(myset)
a, b, c, d, e = myset # the number of variables on the left must match the length of set
print (a, b, c, d, e)
print(type(a))

{'fun', 'with', 'is', 'learning', 'Arif'}
fun with is learning Arif
<class 'str'>


### 3. Different ways to access elements of a Set
- Set items cannot be accessed by referring to an index, since sets are unordered, i.e., the items have no associated index
- But you can loop through the set items using a for loop, or
- Ask if a specified value is present in a set, by using the in keyword.

In [72]:
# Set items cannot be accessed by referring to an index; since sets are unordered, the items have no index.
myset = set(['learning', 'is', 'fun', 'with', 'Arif'])
myset = {'learning', 'is', 'fun', 'with', 'Arif'}
print("myset: ", myset)

myset:  {'fun', 'with', 'is', 'learning', 'Arif'}


In [71]:
# But you can loop through the set items using a for loop
myset = set(['learning', 'is', 'fun', 'with', 'Arif'])
for i in myset:
    print(i, end=' ')

fun with is learning Arif 

In [73]:
# To check if a specific element is there in the set, use the in keyword
rv = 'fun' in myset
rv

True
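
The in keyword gives the same answer for a list and a set; the difference is in how the answer is found. A list is scanned element by element, while a set uses hashing, which takes roughly constant time on average. A small sketch (names_list and names_set are illustrative names):

```python
names_list = ['learning', 'is', 'fun', 'with', 'Arif']
names_set = set(names_list)

# Same result for both containers; only the lookup strategy differs
print('fun' in names_list, 'fun' in names_set)   # True True

# not in is the negated form of the membership test
print('boring' not in names_set)                 # True
```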

### 4. You cannot perform Slicing on Sets
- Slicing is the process of obtaining a portion of a sequence by using its indices.
- Since no indices are associated with set elements, sets do not support slicing or indexing with the [ ] operator
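
A short sketch of what happens when indexing is attempted, together with one possible workaround: convert the set to a sorted list first and slice that. The names error_message and first_two are illustrative.

```python
myset = {'learning', 'is', 'fun', 'with', 'Arif'}

error_message = None
try:
    myset[0]                      # sets have no indices
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# sorted() returns a list, which does support indexing and slicing
first_two = sorted(myset)[:2]
print(first_two)                  # ['Arif', 'fun']
```

Uppercase letters sort before lowercase ones, which is why 'Arif' comes first.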

### 5. You cannot perform Set Concatenation and Repetition
- The concatenation operator (+) and the replication operator (*) do not work on sets
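
The following sketch confirms that both operators raise a TypeError, and shows the union operator (|) as the usual way to combine two sets. The names errors and combined are illustrative.

```python
set1 = {1, 2, 3}
set2 = {3, 4}

errors = []
try:
    set1 + set2           # concatenation is not defined for sets
except TypeError:
    errors.append('+')
try:
    set1 * 2              # repetition is not defined for sets
except TypeError:
    errors.append('*')
print(errors)             # ['+', '*']

# To combine two sets, use the union operator instead
combined = set1 | set2
print(combined == {1, 2, 3, 4})   # True
```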

### 6. Adding elements to a Set using add() and update() methods
- Sets are dynamic: while the program is running, we can make changes to an already created set without having to create it again.
- If we add elements to an already created set, the original set grows dynamically at run time (as in the case of heap memory in C/C++)

In [86]:
# add(val) method is used to add elements to a set
# Only one element at a time can be added to the set by using add() method
# Lists and sets cannot be added to a set as elements because they are not hashable
# Tuples can be added because tuples are immutable and hence hashable (provided all their items are hashable)
set1 = set()
print("Empty Set: ", set1)
set1.add(25)
set1.add(73)
set1.add((19,77))
print("Set after adding three elements: ", set1)


Empty Set:  set()
Set after adding three elements:  {73, 25, (19, 77)}


In [87]:
# update() method adds every element of an iterable to the set; here the elements are passed as a list
set2 = set([4, 9, 12])
set2.update(['hadeed', 4, 3.5]) # Note the duplicate element 4 will not be added twice
set2

{12, 3.5, 4, 9, 'hadeed'}

In [88]:
# update() can also be used to add several strings at once, passed as a list
set3 = set([4, 9, 12])
set3.update(['arif', 'rauf'])
set3

{12, 4, 9, 'arif', 'rauf'}

In [89]:
# the update() method also accepts a list having one or more tuples; each tuple becomes one element
set4 = set([4, 9, 12])
set4.update([(99, 88), (44, 33)])
set4

{(44, 33), (99, 88), 12, 4, 9}
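
update() is not limited to lists: it accepts any iterable, and more than one at a time. One consequence worth noting is that a string passed to update() is split into its characters, whereas add() inserts it as a single element. A minimal sketch (letters, words and mixed are illustrative names):

```python
letters = set()
letters.update('abc')            # a string is iterated character by character
print(letters == {'a', 'b', 'c'})        # True

words = set()
words.add('abc')                 # add() inserts the whole string as one element
print(words == {'abc'})                  # True

mixed = {1}
mixed.update([2, 3], (4,), {5, 6})       # several iterables in one call
print(mixed == {1, 2, 3, 4, 5, 6})       # True
```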

### 7. Removing elements from a set using pop(), remove() and discard() methods
- Sets are dynamic: while the program is running, we can make changes to an already created set without having to create it again.
- If we remove elements from an already created set, the original set shrinks dynamically at run time (as in the case of heap memory in C/C++)

#### a. Removing element from a set using pop() method

In [95]:
# pop() method takes no argument; it removes an arbitrary element from the set and returns it
s1 = {'learning', 'is', 'fun', 'with', 'arif', 'butt'}
print("Original set: ", s1)

x  = s1.pop()
print("Element popped is: ", x)
print("Set now is: ", s1)

y  = s1.pop()
print("Element popped is: ", y)
print("Set now is: ", s1)

Original set:  {'fun', 'arif', 'with', 'is', 'learning', 'butt'}
Element popped is:  fun
Set now is:  {'arif', 'with', 'is', 'learning', 'butt'}
Element popped is:  arif
Set now is:  {'with', 'is', 'learning', 'butt'}
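
Because pop() keeps removing elements until none are left, it is worth knowing what happens on an empty set: a KeyError is raised. A minimal sketch (popped and pop_failed are illustrative names):

```python
s1 = {'learning', 'is', 'fun'}

popped = []
while s1:                        # an empty set is falsy, so the loop stops by itself
    popped.append(s1.pop())
print(sorted(popped))            # ['fun', 'is', 'learning']

pop_failed = False
try:
    s1.pop()                     # the set is empty now
except KeyError as err:
    pop_failed = True
    print("KeyError:", err)
```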


#### b. Removing element from a set using remove(val) method

In [97]:
# The remove(val) method is used to remove a specific element without returning it
# The remove method is passed exactly one argument, which is the value to be removed, and it returns None
s2 = set(['Welcome', 'to', 'department', 'of', 'Data', 'Science'])
print("\nOriginal set: ", s2)

x = s2.remove('department')
print("After remove('department'): ", s2)
print("Return value of remove() is: ", x)

# If the element to be removed does not exist in the set, the remove() method raises a KeyError
#y = s2.remove('arif')  # KeyError: the element doesn't exist in the set.


Original set:  {'Science', 'to', 'of', 'department', 'Welcome', 'Data'}
After remove('department'):  {'Science', 'to', 'of', 'Welcome', 'Data'}
Return value of remove() is:  None


#### c. Removing element from a set using discard(val) method

In [99]:
# The discard(val) method overcomes the limitation of the remove(val) method,
# i.e., if the element doesn't exist in the set, no error is raised and the set remains unchanged.
s2 = set(['Welcome', 'to', 'department', 'of', 'Data', 'Science'])
y = s2.discard('arif')
s2

{'Data', 'Science', 'Welcome', 'department', 'of', 'to'}
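
The difference between remove() and discard() can be seen side by side in the sketch below. The flag remove_raised and the variable size_before are illustrative names.

```python
s2 = {'Welcome', 'to', 'department', 'of', 'Data', 'Science'}
size_before = len(s2)

remove_raised = False
try:
    s2.remove('arif')            # 'arif' is not in the set
except KeyError:
    remove_raised = True
    print("remove() raised KeyError")

s2.discard('arif')               # silently does nothing
print(len(s2) == size_before)    # True

s2.discard('to')                 # removes the element when it is present
print('to' in s2)                # False
```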

#### d. Using clear() method to remove all the set elements

In [None]:
# use the clear() method to empty a set
s2.clear()
print("\nAfter clear() the set becomes empty: ", s2)



#### e. Using del Keyword to delete the set entirely from memory

In [101]:
# use del keyword to delete entire set, (you cannot delete a specific element as it is non-indexed)
s2 = set(['Welcome', 'to', 'department', 'of', 'Data', 'Science'])
print("\nOriginal set: ", s2)
del s2
print(s2)   # raises NameError, because the name s2 no longer exists


Original set:  {'Science', 'to', 'of', 'department', 'Welcome', 'Data'}


NameError: name 's2' is not defined

### 8. Converting string object to set and vice-versa (using type casting, split() and join())

In [105]:
# convert a string into set using set()
str1 = 'Learning is fun'    #this is a string
print("Original string: ", str1)

s1 = set(str1)
print("s1: ", s1, "and its type is:  ", type(s1))

Original string:  Learning is fun
s1:  {'n', 'r', 's', 'L', 'i', 'g', 'a', 'u', ' ', 'e', 'f'} and its type is:   <class 'set'>


In [110]:
# split() method is used to tokenize a string based on some delimiter; the tokens are returned as a list
# if no argument is passed, the string is split on whitespace
mystr = 'Learning is fun'    #this is a string
print("Original string: ", mystr)

myset = set(mystr.split(' '))
print("myset: ", myset)
print("Type of myset is: ", type(myset))

Original string:  Learning is fun
myset:  {'fun', 'Learning', 'is'}
Type of myset is:  <class 'set'>


In [111]:
# join is the reverse of split; since a set is unordered, the order of words in the result is arbitrary
myset = {'Learning', 'is', 'fun'}
print("Original set: ", myset)
delimiter = ' '
mystr = delimiter.join(myset)
print("mystr: ", mystr)
print("Type of mystr is:  ", type(mystr))

Original set:  {'fun', 'Learning', 'is'}
mystr:  fun Learning is
Type of mystr is:   <class 'str'>
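
Two practical points about join() on sets can be sketched as follows. First, sorting the set before joining gives a reproducible result. Second, join() works only on strings, so other element types have to be converted with str() first. The names numbers and numstr are illustrative.

```python
myset = {'Learning', 'is', 'fun'}

# sorted() fixes the order, so the joined string is the same on every run
mystr = ' '.join(sorted(myset))
print(mystr)                     # Learning fun is

# join() accepts only strings, so convert the numbers first
numbers = {3, 1, 2}
numstr = ','.join(str(n) for n in sorted(numbers))
print(numstr)                    # 1,2,3
```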


### 9. You cannot call sort() and reverse() methods on Sets, as they are unordered in nature
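
Although a set has no sort() method, the built-in sorted() function accepts any iterable and returns a new list, leaving the set untouched. A minimal sketch (ascending and descending are illustrative names):

```python
s1 = {3, 8, 1, 6, 0, 4}

print(hasattr(s1, 'sort'))       # False: sets have no sort() method

# sorted() builds a new list from the set elements
ascending = sorted(s1)
descending = sorted(s1, reverse=True)
print(ascending)                 # [0, 1, 3, 4, 6, 8]
print(descending)                # [8, 6, 4, 3, 1, 0]
```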

### 10. Some Built-in functions that can be used on sets

In [115]:
s1 = set([3, 8, 1, 6, 0, 8, 4])

print("length of set: ", len(s1))
print("max element in set: ", max(s1))
print("min element in set: ",min(s1))
print("Sum of elements in set: ",sum(s1))

# Membership (in) operator
rv1 = 9 in s1
print(rv1)


# Membership (in) operator
rv1 = 8 in s1
print(rv1)

length of set:  6
max element in set:  8
min element in set:  0
Sum of elements in set:  22
False
True


### 11. Misc Concepts specifically related to Sets

#### a. Union of sets

In [119]:
# The union() method or the | operator returns the set of all values that are in set1, or set2, or both
# You can use the union method to find out all the unique values in two sets.
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}

set3 = set1 | set2
set3 = set1.union(set2)

print("set1: ", set1)
print("set2: ", set2)
print("set1 | set2: ", set3)

set1:  {'rauf', 'arif'}
set2:  {'hadeed', 'maaz', 'arif'}
set1 | set2:  {'rauf', 'hadeed', 'maaz', 'arif'}
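
union() can take more than two sets, and also other iterables such as lists, whereas the | operator requires both operands to be sets. In the sketch below, the third set and the names all_names and operator_failed are hypothetical and used only for illustration.

```python
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}
set3 = {'mujahid'}                        # hypothetical third set

all_names = set1.union(set2, set3)        # several arguments in one call
print(all_names == set1 | set2 | set3)    # True

# The method accepts any iterable, the operator does not
print(set1.union(['maaz']) == {'arif', 'rauf', 'maaz'})   # True
operator_failed = False
try:
    set1 | ['maaz']
except TypeError:
    operator_failed = True
print(operator_failed)                    # True
```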


#### b. Intersection of Sets

In [120]:
# The intersection() method or symbol (&), returns the set of all values that are values of both set1 and set2
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}

set3 = set1 & set2
set4 = set1.intersection(set2)

print("set1: ", set1)
print("set2: ", set2)
print("set1 & set2: ", set4)

set1:  {'rauf', 'arif'}
set2:  {'hadeed', 'maaz', 'arif'}
set1 & set2:  {'arif'}


#### c. Difference of Sets

In [121]:
# The difference() method or symbol (-), returns the set of all values of set1, which are not there in set2
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}

set3 = set1 - set2
set4 = set1.difference(set2)

print("set1: ", set1)
print("set2: ", set2)
print("set1 - set2: ", set4)

set1:  {'rauf', 'arif'}
set2:  {'hadeed', 'maaz', 'arif'}
set1 - set2:  {'rauf'}


#### d. Symmetric Difference of Sets

In [122]:
# The symmetric_difference() method or the ^ operator returns the values that are in exactly one of the two sets, i.e., (set1 | set2) - (set1 & set2)
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}

set3 = set1 ^ set2
set4 = set1.symmetric_difference(set2)

print("set1: ", set1)
print("set2: ", set2)
print("set1 ^ set2: ", set4)

set1:  {'rauf', 'arif'}
set2:  {'hadeed', 'maaz', 'arif'}
set1 ^ set2:  {'rauf', 'hadeed', 'maaz'}
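
The identity stated in the comment above can be verified directly, along with an equivalent formulation as the union of the two one-sided differences. The names lhs, rhs and also are illustrative.

```python
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}

lhs = set1 ^ set2
rhs = (set1 | set2) - (set1 & set2)      # everything, minus the common part
also = (set1 - set2) | (set2 - set1)     # only in set1, or only in set2

print(lhs == rhs == also)                # True
```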


#### e. Checking Subset

In [124]:
# To check if one set is a subset of other set use issubset() method or <= operator
# issubset() method returns whether another set contains this set or not
s1 = {1,2,3,4,5,6,7}
s2 = {1,2,3,4}

print(s2.issubset(s1))     # is s2 a subset of s1
print(s2 <= s1)            # is s2 a subset of s1

True
True
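
Note that <= also returns True when the two sets are equal. To test for a proper subset (contained in the other set, but not equal to it), Python provides the < operator. A short sketch with illustrative sets:

```python
s1 = {1, 2, 3, 4}
s2 = {1, 2, 3, 4}
s3 = {1, 2}

print(s2 <= s1)      # True:  every set is a subset of an equal set
print(s2 < s1)       # False: s2 is not a proper subset, because s2 == s1
print(s3 < s1)       # True:  s3 is a proper subset of s1
print(set() <= s1)   # True:  the empty set is a subset of every set
```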


#### f. Checking Superset

In [126]:
# To check if one set is a superset of another set use issuperset() method or >= operator
s1 = {1,2,3,4,5,6,7}
s2 = {1,2,3,4}

print(s1.issuperset(s2)) # is s1 a superset of s2
print(s1 >= s2)          # is s1 a superset of s2

True
True


#### g. Checking Disjoint

In [127]:
# To check if two sets are disjoint (no intersection) use isdisjoint() method
s1 = {1,2,3,4,5,6,7}
s2 = {1,2,3,4}
print(s1.isdisjoint(s2))

# Another example
s3 = {1,2,3,4}
s4 = {5,6,7,8}
print(s3.isdisjoint(s4))



False
True
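
Two sets are disjoint exactly when their intersection is empty, so isdisjoint() can be cross-checked against the & operator. Like union(), the method also accepts other iterables. A minimal sketch:

```python
s3 = {1, 2, 3, 4}
s4 = {5, 6, 7, 8}

# Disjoint means the intersection is the empty set
print(s3.isdisjoint(s4))         # True
print((s3 & s4) == set())        # True

# isdisjoint() accepts any iterable, for example a list
print(s3.isdisjoint([4, 9]))     # False, because 4 is in s3
```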


### 12. Looping through set elements (More on loops in next lecture)

In [136]:
# a for loop visits every element of the set exactly once, in an arbitrary order
set1 = {'Learning', 'is', 'fun', 'with', 'Arif Butt'}

for x in set1:
    print(x)

fun
Learning
with
is
Arif Butt
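
If a predictable order is needed while looping, one option is to iterate over sorted(set1) instead of the set itself. The sketch below also uses enumerate() to number the elements; the names visited, position and word are illustrative.

```python
set1 = {'Learning', 'is', 'fun', 'with', 'Arif Butt'}

visited = []
# sorted() returns a list, so the loop order is the same on every run
for position, word in enumerate(sorted(set1), start=1):
    visited.append(word)
    print(position, word)
```

Uppercase letters sort before lowercase ones, so 'Arif Butt' and 'Learning' are printed before 'fun', 'is' and 'with'.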
