---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Lecture 2.4</h1>

## _set-operations.ipynb_
#### [Click me to learn more about Python Sets](https://www.geeksforgeeks.org/sets-in-python/)

### Learning agenda of this notebook
In Python, a set is an unordered collection that is iterable and mutable and contains no duplicate elements. The order of elements in a set is undefined. Sets are created using curly brackets or the set() constructor. The major advantage of using a set, as opposed to a list, is that checking whether a specific element is contained in the set is highly optimized: sets are hash-based, so a membership test takes roughly constant time on average.

1. How to create Sets?
2. Proof of concepts: Sets are heterogeneous, unordered, mutable, nested, and do NOT allow duplicate elements
3. Accessing elements of a set
4. Adding elements to a set using add and update methods
5. Removing elements from a set using pop, remove and discard methods
6. Set concatenation and repetition (can't be performed as on list and tuples)
7. Slicing a set (can't be performed as there is no index associated with set values)
8. Converting string object to set and vice-versa (using type casting, split() and join())
9. Misc set methods 
10. Some Built-in functions that can be used on sets (len, max, min, sum)
11. Misc Concepts
    - Union of sets 
    - Intersection of sets 
    - Difference of sets 
    - Symmetric Difference of sets 
    - Subsets 
    - Supersets 
    - Disjoint sets 
12. Looping through a set

### 1. How to create Sets?

In [1]:
# A set is created by placing comma-separated values in curly brackets {}, OR
# by calling set() and passing it an iterable such as a list
# We will mostly use the second option, as curly brackets are also used by dictionary objects in Python
s1 = {1,2,3,4,5}   #set of integers
s1 = set([1, 2, 3, 4, 5])
print(s1)

s2 = {3.7, 6.5, 3.8, 7.95}   #set of floats
s2 = set([3.7, 6.5, 3.8, 7.95])
print(s2)

s3 = {"hello", "this", "F", "good show"}   #set of strings
s3 = set(["hello", "this", "F", "good show"])
print(s3)

s4 = {True, False, True, True, False}   #set of boolean
s4 = set([True, False, True, True, False])
print(s4)

# creating an empty set
#emptyset = {}  # this is not the correct way: {} creates an empty dictionary

# to create an empty set, use set()
s5 = set()
print(s5)
print(type(s5))

# Nested sets: a set can have a tuple as an item
# However, you cannot add a list to a set, because lists are mutable and therefore unhashable
s7 = {"Arif", 30, 5.5, (10,'rauf')}
print(s7)
#s7 = {"Arif", 30, 5.5, [10,'rauf']} # Error: unhashable type: 'list'

{1, 2, 3, 4, 5}
{3.8, 3.7, 6.5, 7.95}
{'F', 'this', 'good show', 'hello'}
{False, True}
set()
<class 'set'>
{(10, 'rauf'), 5.5, 30, 'Arif'}
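
The two pitfalls mentioned in the comments above can be checked directly. The following is a minimal sketch; the variable names are chosen only for illustration, and `frozenset` (the immutable counterpart of `set`) is shown as one way to nest a set inside another set.

```python
# Empty curly brackets create a dictionary, not a set
empty_braces = {}
empty_set = set()
print(type(empty_braces))   # <class 'dict'>
print(type(empty_set))      # <class 'set'>

# A set cannot contain another (mutable) set, but it can contain a frozenset
inner = frozenset([1, 2])
nested = {inner, "Arif"}
print(inner in nested)      # True

# Trying to nest a plain set raises TypeError (unhashable type)
try:
    bad = {set([1, 2]), "Arif"}
    nesting_failed = False
except TypeError:
    nesting_failed = True
print(nesting_failed)       # True
```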


### 2. Proof of concepts: Sets are heterogeneous, unordered, mutable, nested, and does not allow duplicate elements

In [2]:
# Sets are heterogeneous, as their elements/items can be of any hashable data type
# Sets are unordered, which means their elements are NOT associated with an index (as in lists and tuples)
# Moreover, the order of elements in a set is undefined and cannot be controlled.
# The elements may show up in a different sequence from one run of the program to the next

s1 = set([3.2, "Arif", 30, 5.5, 100])
print("s1: ", s1)


# Sets are mutable, i.e., elements can be added to and removed from a set;
# however, since sets cannot be indexed, we can't change an element using the subscript operator
numbers = set([10, 20, 30, 40, 50])
# numbers[2] = 15   # error: 'set' object does not support item assignment


# Sets do not allow duplicate elements, so duplicates are implicitly removed when the set is created
names = set(['Arif', 'Rauf', 'Hadeed', 'Arif', 'Mujahid'])
print("\nThe duplicates in assignments are removed: ", names)

# So when we want to remove duplicates from a list, we typecast it to a set
mylist = [2, 4, 5, 6, 8, 7, 3, 3,2]
print("\nList: ", mylist)
myset = set(mylist)
print("List converted to set: ", myset)

# you can unpack the elements of a set into individual variables
myset = set(['learning', 'is', 'fun', 'with', 'Arif'])
v, w, x, y, z = myset # the number of variables on the left must match the number of elements in the set
print ("\n", v, y, z)
print(type(x))

s1:  {3.2, 100, 5.5, 30, 'Arif'}

The duplicates in assignments are removed:  {'Mujahid', 'Hadeed', 'Rauf', 'Arif'}

List:  [2, 4, 5, 6, 8, 7, 3, 3, 2]
List converted to set:  {2, 3, 4, 5, 6, 7, 8}

 learning with Arif
<class 'str'>
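
Typecasting a list to a set removes duplicates but does not keep the original order of the list. If the order matters, one common alternative (not part of the original notebook, shown here as a sketch) is `dict.fromkeys()`, which relies on dictionaries preserving insertion order in Python 3.7 and later.

```python
mylist = [2, 4, 5, 6, 8, 7, 3, 3, 2]

# Remove duplicates while keeping the first occurrence of each value in place
unique_ordered = list(dict.fromkeys(mylist))
print(unique_ordered)   # [2, 4, 5, 6, 8, 7, 3]

# Remove duplicates and return the values in sorted order
unique_sorted = sorted(set(mylist))
print(unique_sorted)    # [2, 3, 4, 5, 6, 7, 8]
```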


### 3. Different ways to access elements of a Set

In [3]:
# Set items cannot be accessed by referring to an index; since sets are unordered, the items have no index.
# But you can loop through the set items using a for loop, or 
# ask if a specified value is present in a set, by using the in keyword.
myset = set(['learning', 'is', 'fun', 'with', 'Arif'])
myset = {'learning', 'is', 'fun', 'with', 'Arif'}
print("myset: ", myset)
 
    
# If you want to perform an operation on individual elements, use a for loop
print("\nElements of set: ",end=" ")
for i in myset:
    print(i, end=" ")
 

# To check if a specific element is there in the set, use the in keyword
rv = 'fun' in myset
print("\n\nElement exist = ", rv)

myset:  {'learning', 'is', 'fun', 'with', 'Arif'}

Elements of set:  learning is fun with Arif 

Element exist =  True
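
Besides looping and the `in` keyword, a single element can be read without removing it by asking the set's iterator for its first item. This is a small sketch; which element comes back is not defined, so the code only checks that it belongs to the set.

```python
myset = {'learning', 'is', 'fun', 'with', 'Arif'}

# Peek at one (arbitrary) element without removing it from the set
some_element = next(iter(myset))
print(some_element in myset)    # True
print(len(myset))               # 5, the set is unchanged

# 'not in' is the negated membership test
print('python' not in myset)    # True
```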


### 4. Adding elements to a set using add, and update methods

In [4]:
# add(val) method is used to add elements to a set
# Only one element at a time can be added to the set by using add() method
# Lists cannot be added to a set as elements because Lists are not hashable 
# Tuples can be added because tuples are immutable and hence Hashable. 
set1 = set()
print("Empty Set: ", set1)
set1.add(25)
set1.add(73)
set1.add((19,77))
print("Set after adding three elements: ", set1)


# update() method is used to add all elements of an iterable to a set; duplicates are ignored
set2 = set([ 4, 9, 12])
print("\nset2: ", set2)
set2.update(['hadeed', 4, 3.5])
print("After update(['hadeed',4,3.5])-> set2: ", set2)

# The update() method accepts a list of strings; each string is added as one element
# (passing a bare string instead would add its individual characters)
set2.update(['arif', 'rauf'])
print("After update([('arif', 'rauf')])-> set2: ", set2)

# the update() method also accepts a list of tuples; each tuple is added as one element
set2.update([(99, 88), (44, 33)])
print("After update([(99,88)])-> set2: ", set2)



Empty Set:  set()
Set after adding three elements:  {73, 25, (19, 77)}

set2:  {9, 4, 12}
After update(['hadeed',4,3.5])-> set2:  {3.5, 4, 9, 12, 'hadeed'}
After update([('arif', 'rauf')])-> set2:  {3.5, 4, 'arif', 9, 12, 'rauf', 'hadeed'}
After update([(99,88)])-> set2:  {(44, 33), 3.5, 4, 'arif', 9, 12, 'rauf', 'hadeed', (99, 88)}
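
The difference between `add()` and `update()` is easiest to see with a string argument. The sketch below uses illustrative variable names that are not part of the notebook; `sorted()` is used only to make the printed order predictable.

```python
# add() inserts its argument as a single element
s_add = set()
s_add.add('arif')
print(s_add)                 # {'arif'}

# update() iterates over its argument, so a bare string is split into characters
s_update = set()
s_update.update('arif')
print(sorted(s_update))      # ['a', 'f', 'i', 'r']

# update() also accepts several iterables at once
s_multi = {1}
s_multi.update([2, 3], (4, 5), {5, 6})
print(sorted(s_multi))       # [1, 2, 3, 4, 5, 6]
```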


### 5. Removing elements from a set using pop, remove and discard methods

In [5]:
# pop() removes an arbitrary element from the set and returns it
# Unlike list.pop(), set.pop() takes no argument, because a set has no indexes
# Which element is popped is not defined, so the output below is only one possible run
s1 = set(['learning', 'is', 'fun', 'with', 'arif', 'butt'])
print("Original set: ", s1)

x  = s1.pop()
print("Element popped is: ", x)
print("Set now is: ", s1)

y  = s1.pop()
print("Element popped is: ", y)
print("Set now is: ", s1)

# z = s1.pop(2)      # Error: TypeError, set.pop() takes no arguments



# The remove() method is used to remove a specific element without returning it
# The remove method is passed exactly one argument, which is the value to be removed, and returns None
s2 = set(['Welcome', 'to', 'department', 'of', 'Data', 'Science'])
print("\nOriginal set: ", s2)
x = s2.remove('department')
print("After remove('department'): ", s2)
print("Return value of remove() is: ", x)
#y = s2.remove('arif')  # Error: KeyError, the element doesn't exist in the set


# The discard() method overcomes this limitation of remove(): if the element doesn't exist in the set,
# the set remains unchanged and no error is raised
y = s2.discard('arif')




# use the clear() method to empty a set
s2.clear()
print("\nAfter clear() the set becomes empty: ", s2)

# use the del keyword to delete the entire set (you cannot delete a specific element this way, as a set is non-indexed)
s3 = set(['p', 'u', 'n', 'j', 'a', 'b', 'university'])
print("\nOriginal set: ", s3)
del s3
#print(s3)  # Error: NameError, s3 no longer exists

Original set:  {'butt', 'arif', 'learning', 'is', 'fun', 'with'}
Element popped is:  butt
Set now is:  {'arif', 'learning', 'is', 'fun', 'with'}
Element popped is:  arif
Set now is:  {'learning', 'is', 'fun', 'with'}

Original set:  {'Welcome', 'department', 'of', 'Data', 'to', 'Science'}
After remove('department'):  {'Welcome', 'of', 'Data', 'to', 'Science'}
Return value of remove() is:  None

After clear() the set becomes empty:  set()

Original set:  {'j', 'university', 'n', 'a', 'p', 'u', 'b'}

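
The error cases of the removal methods can be handled with `try`/`except`. The following sketch uses a small illustrative set; it shows that `discard()` is silent for a missing element, while `remove()` on a missing element and `pop()` on an empty set both raise `KeyError`.

```python
s = {'a', 'b'}

# discard() does nothing if the element is missing
s.discard('z')
print(len(s))            # 2

# remove() raises KeyError if the element is missing
try:
    s.remove('z')
    remove_raised = False
except KeyError:
    remove_raised = True
print(remove_raised)     # True

# pop() raises KeyError when the set is empty
s.clear()
try:
    s.pop()
    pop_raised = False
except KeyError:
    pop_raised = True
print(pop_raised)        # True
```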

### 6. Set Concatenation and Repetition

In [6]:
# You CANNOT use + operator to concatenate two sets
# food_items1 = set(['fruits', 'bread', 'veggies'])
# food_items2 = set(['meat', 'spices', 'burger'])
# food = food_items1 + food_items2
# print(food)


# You CANNOT use set * n syntax to create large sets (as you can do in list and tuples)
# name = set(['Arif', 'Hadeed', 'Mujahid'])
# print ("\n", name * 3)
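
Although `+` is not defined for sets, the union operator `|` plays the role that concatenation plays for lists. The sketch below reuses the `food_items` names from the comments above and catches the `TypeError` that `+` raises.

```python
food_items1 = {'fruits', 'bread', 'veggies'}
food_items2 = {'meat', 'spices', 'burger'}

# The + operator is not supported between two sets
try:
    food = food_items1 + food_items2
    plus_failed = False
except TypeError:
    plus_failed = True
print(plus_failed)              # True

# Use | (union) to combine two sets into a new one
food = food_items1 | food_items2
print(len(food))                # 6

# Use |= to merge another set into an existing set in place
food_items1 |= {'rice'}
print('rice' in food_items1)    # True
```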



### 7. Slicing Sets

In [7]:
# There is no index attached to any element in a python set. 
# So they do not support any indexing or slicing operation.
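
If only a few elements of a set are needed, `itertools.islice` can take them from the set's iterator without an index. This is a sketch with an illustrative set; because the order is undefined, the code checks only how many elements come back and that they belong to the set.

```python
from itertools import islice

numbers = {10, 20, 30, 40, 50}

# numbers[0:2] would raise TypeError; islice takes the first two elements
# that the iterator happens to produce (an arbitrary pair)
any_two = list(islice(numbers, 2))
print(len(any_two))                         # 2
print(all(n in numbers for n in any_two))   # True
```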


### 8. Converting string object to set and vice-versa (using type casting, split() and join())

In [8]:
# convert a string into a set of its characters using set()
str1 = 'Learning is fun'    #this is a string
print(type(str1))
print("Original string: ", str1, "and its type is:  ", type(str1))
s1 = set(str1)
print("s1: ", s1, "and its type is:  ", type(s1))


# split() method is used to tokenize a string based on some delimiter; it returns a list of tokens
# if no argument is passed, the string is split on whitespace
str1 = 'Learning is fun'    #this is a string
print("\nGiven string: ", str1)
s1 = set(str1.split(' '))
print("s1=set(str1.split(' ')): ", s1)
print(type(s1))


# join() is the reverse of split(); the order of the joined words follows the set's (undefined) order
delimiter = ' '
str2 = delimiter.join(s1)
print("\n",str2)
print(type(str2))

<class 'str'>
Original string:  Learning is fun and its type is:   <class 'str'>
s1:  {'g', 's', 'n', 'L', 'a', ' ', 'u', 'f', 'i', 'e', 'r'} and its type is:   <class 'set'>

Given string:  Learning is fun
s1=set(str1.split(' ')):  {'Learning', 'is', 'fun'}
<class 'set'>

 Learning is fun
<class 'str'>
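
Because a set has no defined order, joining it directly can produce the words in any sequence. When a repeatable result is wanted, one option (a sketch, not the notebook's own approach) is to sort the set before joining.

```python
words = set("learning is fun".split())

# sorted() returns a list in alphabetical order, so the joined string is always the same
sentence = " ".join(sorted(words))
print(sentence)         # fun is learning
print(type(sentence))   # <class 'str'>
```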


### 9. Misc Set methods in Python

In [9]:
# You cannot call the sort() method on a set, as a set is unordered in nature and has no such method

# Similarly, you cannot call the reverse() method on a set

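
The built-in `sorted()` function works on any iterable, including a set, and returns a new list. A short sketch with an illustrative set:

```python
s = {3, 8, 1, 6}

# sorted() leaves the set untouched and returns a sorted list
ascending = sorted(s)
descending = sorted(s, reverse=True)
print(ascending)         # [1, 3, 6, 8]
print(descending)        # [8, 6, 3, 1]
print(type(ascending))   # <class 'list'>

# The set itself has neither sort() nor reverse()
print(hasattr(s, 'sort'), hasattr(s, 'reverse'))   # False False
```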

### 10. Some Built-in functions that can be used on sets

In [10]:
s1 = set([3, 8, 1, 6, 0, 8, 4])

print("length of set: ", len(s1))
print("max element in set: ", max(s1))
print("min element in set: ",min(s1))
print("Sum of elements in set: ",sum(s1))

# Membership (in) operator
rv1 = 9 in s1
print(rv1)


length of set:  6
max element in set:  8
min element in set:  0
Sum of elements in set:  22
False


### 11. Misc Concepts specifically related to Sets

#### a. Union of sets

In [11]:
# The union() method or the | operator returns the set of all values that are in set1, or set2, or both
# You can use the union method to find out all the unique values in two sets.
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}
set3 = set1 | set2
set4 = set1.union(set2)
print("set1: ", set1)
print("set2: ", set2)
print("set1.union(set2): ", set4)

set1:  {'rauf', 'arif'}
set2:  {'maaz', 'hadeed', 'arif'}
set1.union(set2):  {'rauf', 'arif', 'maaz', 'hadeed'}


#### b. Intersection of Sets

In [12]:
# The intersection() method or the & operator returns the set of all values that are in both set1 and set2
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}
set3 = set1 & set2
set4 = set1.intersection(set2)
print("set1: ", set1)
print("set2: ", set2)
print("set1 & set2: ", set4)

set1:  {'rauf', 'arif'}
set2:  {'maaz', 'hadeed', 'arif'}
set1 & set2:  {'arif'}


#### c. Difference of Sets

In [13]:
# The difference() method or the - operator returns the set of all values of set1 that are not in set2
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}
set3 = set1 - set2
set4 = set1.difference(set2)
print("set1: ", set1)
print("set2: ", set2)
print("set1 - set2: ", set4)

set1:  {'rauf', 'arif'}
set2:  {'maaz', 'hadeed', 'arif'}
set1 - set2:  {'rauf'}


#### d. Symmetric Difference of Sets

In [14]:
# The symmetric_difference() method or the ^ operator returns the values that are in exactly one of the two sets,
# i.e., (set1 | set2) - (set1 & set2)
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}
set3 = set1 ^ set2
set4 = set1.symmetric_difference(set2)
print("set1: ", set1)
print("set2: ", set2)
print("set1 ^ set2: ", set4)


set1:  {'rauf', 'arif'}
set2:  {'maaz', 'hadeed', 'arif'}
set1 ^ set2:  {'maaz', 'hadeed', 'rauf'}
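
The identity stated in the comment above can be verified in code. The sketch below also shows two details of the method forms: they accept more than one argument, and they accept any iterable, whereas the operators require sets. The third set, `extra`, is a hypothetical addition for illustration.

```python
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}
extra = {'arif', 'khurram'}   # hypothetical third set

# Symmetric difference expressed in three equivalent ways
sym = set1 ^ set2
print(sym == (set1 | set2) - (set1 & set2))     # True
print(sym == (set1 - set2) | (set2 - set1))     # True

# Methods can combine several sets in one call
all_names = set1.union(set2, extra)
common = set1.intersection(set2, extra)
print(len(all_names))     # 5
print(common)             # {'arif'}

# Methods accept any iterable; the | operator would raise TypeError for a list
with_list = set1.union(['zara'])
print('zara' in with_list)   # True
```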


#### e. Checking Subset

In [15]:
# To check if one set is a subset of another set, use the issubset() method or the <= operator
# s2.issubset(s1) returns True if every element of s2 is also in s1
s1 = {1,2,3,4,5,6,7}
s2 = {1,2,3,4}
print(s2.issubset(s1))
print(s1 <= s2) # is s1 a subset of s2



True
False
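
The `<=` operator also returns True when the two sets are equal. To test for a proper subset (a subset that is not equal to the other set), Python provides the `<` operator. A brief sketch with illustrative sets:

```python
a = {1, 2, 3, 4}
b = {1, 2, 3, 4}
c = {1, 2}

print(a <= b)        # True, equal sets are subsets of each other
print(a < b)         # False, a is not a proper subset of b
print(c < a)         # True, c is a proper subset of a
print(set() <= a)    # True, the empty set is a subset of every set
```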


#### f. Checking Superset

In [16]:
# To check if one set is a superset of another set, use the issuperset() method or the >= operator
s1 = {1,2,3,4,5,6,7}
s2 = {1,2,3,4}
print(s1.issuperset(s2)) # is s1 a superset of s2
print(s1 >= s2)

True
True


#### g. Checking Disjoint

In [20]:
# To check if two sets are disjoint (no intersection) use isdisjoint() method
s1 = {1,2,3,4,5,6,7}
s2 = {1,2,3,4}
print(s1.isdisjoint(s2))

# Another example
s3 = {1,2,3,4}
s4 = {5,6,7,8}
print(s3.isdisjoint(s4))



False
True


### 12. Looping through set elements (More on loops in next lecture)

In [5]:
# a for loop visits each element of the set exactly once, in an arbitrary order
s1 = set(['Learning', 'is', 'fun', 'with', 'Arif Butt'])
for x in s1:
    print(x)
 

# Adding elements to a set
# inside a for loop over range()
s2 = set()
for i in range(1, 8):
    s2.add(i)
print("s2: ", s2)


Learning
fun
is
Arif Butt
with
s2:  {1, 2, 3, 4, 5, 6, 7}
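
A loop that only adds elements to a set can often be written more compactly as a set comprehension. The sketch below builds the same set as the loop above and adds two illustrative variations; `sorted()` is used only to make the printed order predictable.

```python
# Same result as the for loop with s2.add(i)
s2 = {i for i in range(1, 8)}
print(s2 == set(range(1, 8)))   # True

# A comprehension can transform each value
squares = {x * x for x in range(1, 8)}
print(sorted(squares))          # [1, 4, 9, 16, 25, 36, 49]

# A comprehension can also filter values
evens = {x for x in range(1, 8) if x % 2 == 0}
print(sorted(evens))            # [2, 4, 6]
```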


In [2]:
s1

{'Arif Butt', 'Learning', 'fun', 'is', 'with'}

In [3]:
del s1

In [4]:
s1

NameError: name 's1' is not defined

In [10]:
sum(s2)

28

In [11]:
str1 = "learning is fun"

In [15]:
t1 = set(str1.split(' '))

In [16]:
t1

{'fun', 'is', 'learning'}

In [17]:
type(t1)

set

In [18]:
str2 = ' '.join(t1)

In [19]:
str2

'is learning fun'