---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Lecture 2.7</h1>

## _Python-Sets.ipynb_
#### [Click me to learn more about Python Sets](https://www.geeksforgeeks.org/sets-in-python/)

## Learning agenda of this notebook
1. How to create Sets?
2. Proof of concepts: Sets are heterogeneous, unordered, mutable, nested, and do not allow duplicate elements
3. Accessing elements of a set
4. Slicing a set (can't be performed as there is no index associated with set values)
5. Set concatenation and repetition (can't be performed as on lists and tuples)
6. Adding elements to a set using the add and update methods
7. Removing elements from a set using the pop, remove and discard methods
8. Converting a string object to a set and vice versa (using type casting, split() and join())
9. Sorting and reversing a set (can't be performed as sets are unordered)
10. Some Built-in functions that can be used on sets (len, max, min, sum)
11. Misc Concepts
    - Union of sets 
    - Intersection of sets 
    - Difference of sets 
    - Symmetric Difference of sets 
    - Subsets 
    - Supersets 
    - Disjoint sets 

In [1]:
help(set)

Help on class set in module builtins:

class set(object)
 |  set() -> new empty set object
 |  set(iterable) -> new set object
 |  
 |  Build an unordered collection of unique elements.
 |  
 |  Methods defined here:
 |  
 |  __and__(self, value, /)
 |      Return self&value.
 |  
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iand__(self, value, /)
 |      Return self&=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __ior__(self, value, /)
 |      Return self|=value.
 |  
 |  __isub__(self, value, /)
 |      Return self-=value.
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __ixor__(self, value, /)
 |      Re

## 1. How to create Sets?
- In Python, a set is an unordered collection that is iterable, mutable and has no duplicate elements. The order of elements in a set is undefined.
- A set is created by placing comma-separated values in curly brackets `{}`.
- A set can also be created by calling `set()` and passing an iterable, such as a list, to it. This form avoids confusion with dictionary objects, which also use curly brackets in Python.
- The major advantage of using a set, as opposed to a list, is that membership tests are highly optimized: a set is hash-based, so checking whether a specific element is contained in it takes roughly constant time on average, whereas a list must be scanned element by element.
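The membership-test advantage can be observed with a rough timing sketch. The data size, the variable names and the use of `timeit` are illustrative assumptions, and the measured times will vary from machine to machine.

```python
import timeit

# Hypothetical data: the same 10,000 integers stored as a list and as a set
items_list = list(range(10_000))
items_set = set(items_list)

target = 9_999  # worst case for the list: the last element

# Both containers give the same answer to a membership test
found_in_list = target in items_list
found_in_set = target in items_set
print(found_in_list, found_in_set)

# Time 1,000 lookups in each container (timings vary by machine)
list_time = timeit.timeit(lambda: target in items_list, number=1_000)
set_time = timeit.timeit(lambda: target in items_set, number=1_000)
print(f"list: {list_time:.6f}s, set: {set_time:.6f}s")
```

On most machines the set lookup should be noticeably faster, because the list is scanned from the start while the set uses a hash lookup.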

In [2]:
s1 = {1,2,3,4,5}   #set of integers
s1 = set([1, 2, 3, 4, 5])
s1, type(s1)

({1, 2, 3, 4, 5}, set)

In [15]:
s2 = {3.7, 6.5, 3.8, 7.95}   #set of floats
s2 = set([3.7, 6.5, 3.8, 7.95])
print(s2)

{3.8, 3.7, 6.5, 7.95}


In [16]:
s3 = {"hello", "this", "F", "good show"}   #set of strings
s3 = set(["hello", "this", "F", "good show"])
print(s3)

{'F', 'hello', 'good show', 'this'}


In [17]:
s4 = {True, False, True, True, False}   #set of boolean
s4 = set([True, False, True, True, False])
print(s4)


{False, True}


In [18]:
# creating an empty set
#emptyset = {}  # not the correct way: {} creates an empty dictionary, not a set

# to create an empty set, we use set()
s5 = set()
print(s5)
print(type(s5))

set()
<class 'set'>
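A short check of why empty curly brackets are not suitable for creating an empty set. The variable names are illustrative only.

```python
empty_braces = {}      # this creates an empty dictionary, not a set
empty_set = set()      # this creates an empty set

print(type(empty_braces))   # <class 'dict'>
print(type(empty_set))      # <class 'set'>
print(len(empty_set))       # 0
```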


## 2. Proof of concepts: Sets are heterogeneous, unordered, mutable, nested, and do not allow duplicate elements

### a. Sets are heterogeneous
- Sets are heterogeneous, as their elements/items can be of any data type

In [42]:
s1 = {"Arif", 30, 5.5}
print("s1: ", s1)

s1:  {5.5, 30, 'Arif'}


### b. Sets are unordered
- Sets are unordered, which means the elements of a set are NOT associated with any index
- When you access set elements, they may show up in a different sequence from the one in which they were written.
- Moreover, two sets having the same elements written in a different order
    - are equal in content, but are two separate objects with different memory addresses
    - the `is` operator compares the identities (memory addresses in CPython)
    - the `==` operator compares the contents

In [66]:
s2 = set(['learning', 'is', 'fun', 'with', 'Arif'])
print(s2)

{'fun', 'with', 'is', 'learning', 'Arif'}


In [45]:
a = {1, 2, 3}
b = {2, 3, 1}
id(a), id(b), a == b, a is b

(140425097499360, 140425096637344, True, False)

### c. Sets are mutable
- This means that once a set object is created, you can make changes to it by adding or removing elements
- However, since sets cannot be indexed, we can't change an element using the subscript operator

In [3]:
numbers = set([10, 20, 30, 40, 50])
#numbers[2] = 15   # TypeError: 'set' object does not support item assignment

print("numbers: ", numbers)

numbers:  {40, 10, 50, 20, 30}


### d. Sets CANNOT have duplicate elements

In [4]:
# Sets do not allow duplicate elements
# The following line will not raise an error, however, 'Arif' will be added to the set only once
names = {'Arif', 'Rauf', 'Hadeed', 'Arif', 'Mujahid'}
print(names)

{'Arif', 'Rauf', 'Mujahid', 'Hadeed'}


In [5]:
# So when we want to remove duplicates from a list, we can typecast it to a set
mylist = [2, 4, 5, 6, 8, 7, 3, 3, 2]
print("\nList: ", mylist)
myset = set(mylist)
print("List converted to set: ", myset)


List:  [2, 4, 5, 6, 8, 7, 3, 3, 2]
List converted to set:  {2, 3, 4, 5, 6, 7, 8}
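Converting a list to a set removes duplicates but does not promise to keep the original order. The sketch below contrasts this with `dict.fromkeys()`, which is one common way to drop duplicates while keeping the first occurrence of each value (it relies on dictionaries preserving insertion order, Python 3.7+). The variable names are illustrative.

```python
mylist = [2, 4, 5, 6, 8, 7, 3, 3, 2]

# set() removes duplicates but makes no promise about element order
unique_unordered = set(mylist)

# dict keys are unique and keep insertion order (Python 3.7+),
# so this keeps the first occurrence of each value in original order
unique_ordered = list(dict.fromkeys(mylist))

print(unique_unordered)
print(unique_ordered)   # [2, 4, 5, 6, 8, 7, 3]
```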


### e. Nested Sets
- You can have a tuple inside a set, provided the tuple itself contains only hashable values
- You CANNOT have a list inside a set, because set elements must be hashable and lists, being mutable, are not
- Similarly, you cannot have a set within a set, because sets are mutable and therefore not hashable
- This is one situation where you may wish to use a `frozenset`, which is very similar to a set except that it is immutable and hashable.

In [58]:
# Nested sets: a set can have a tuple as an item
s1 = {"Arif", 30, 5.5, (10,'rauf')}
print(s1)

{(10, 'rauf'), 5.5, 30, 'Arif'}


In [55]:
# However, you cannot have a list inside a set, because set elements must be hashable (lists are mutable and not hashable)
#s1 = {"Arif", 30, 5.5, [10,'rauf']} # TypeError: unhashable type: 'list'

In [6]:
# Similarly, you cannot have a set within a set, because set elements must be hashable (sets are mutable and not hashable)
#s1 = {"Arif", 30, 5.5, {10,'rauf'}} # TypeError: unhashable type: 'set'
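A minimal sketch of the `frozenset` workaround mentioned above. The names `inner`, `bad` and `error_message` are illustrative.

```python
inner = frozenset([10, 'rauf'])          # immutable, hence hashable
s1 = {"Arif", 30, 5.5, inner}            # allowed: frozenset as an element
print(inner in s1)                       # True

# A plain set as an element raises TypeError
try:
    bad = {"Arif", {10, 'rauf'}}
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```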

### f. Packing and Unpacking Sets

In [7]:
# you can unpack the individual elements of a set into separate variables (the order of assignment is arbitrary)
myset = set(['learning', 'is', 'fun', 'with', 'Arif'])
print(myset)
a, b, c, d, e = myset # the number of variables on the left must match the length of set
print (a, b, c, d, e)
print(type(a))

{'Arif', 'with', 'fun', 'learning', 'is'}
Arif with fun learning is
<class 'str'>


## 3. Different ways to access elements of a Set
- Set items cannot be accessed by referring to an index, since sets are unordered, i.e., the items have no associated index
- But you can loop through the set items using a for loop, or
- Ask if a specified value is present in a set, by using the `in` operator.

In [8]:
# Set items cannot be accessed by referring to an index: since sets are unordered, the items have no index.
myset = set(['learning', 'is', 'fun', 'with', 'Arif'])
myset = {'learning', 'is', 'fun', 'with', 'Arif'}
print("myset: ", myset)

myset:  {'Arif', 'with', 'fun', 'learning', 'is'}


In [9]:
# But you can loop through the set items using a for loop
myset = set(['learning', 'is', 'fun', 'with', 'Arif'])
for i in myset:
    print(i, end=' ')

Arif with fun learning is 

In [10]:
# To check if a specific element is there in the set, use the in keyword
rv = 'fun' in myset
rv

True

## 4. You cannot perform Slicing on Sets
- Slicing is the process of obtaining a portion of a sequence by using its indices.
- Since no indices are associated with set elements, sets support neither indexing nor slicing with the `[ ]` operator
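The sketch below shows the error raised by an attempted slice, and one possible workaround: converting the set to a sorted list first. Using `sorted()` here is an assumption made for the sake of a predictable result; the set itself has no order.

```python
myset = {'learning', 'is', 'fun', 'with', 'Arif'}

try:
    myset[0:2]                 # slicing is not supported
except TypeError as err:
    slice_error = str(err)
    print("TypeError:", slice_error)

# If a slice is really needed, convert to a list first.
# sorted() gives a predictable order; the set itself has none.
first_two = sorted(myset)[0:2]
print(first_two)               # ['Arif', 'fun']
```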

## 5. You cannot perform Set Concatenation and Repetition
- The concatenation operator `+` and the replication operator `*` do not work on sets
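A small sketch confirming that both operators raise `TypeError` on sets, and that union (`|`) is the closest set counterpart of concatenation. The names `a`, `b`, `errors` and `combined` are illustrative.

```python
a = {1, 2, 3}
b = {3, 4}

errors = []
for label, operation in [("a + b", lambda: a + b), ("a * 2", lambda: a * 2)]:
    try:
        operation()
    except TypeError as err:
        errors.append(label)
        print(label, "->", "TypeError:", err)

# The set counterpart of concatenation is union
combined = a | b
print(combined)
```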

## 6. Adding elements to a Set
- Sets are dynamic: elements can be added to an already created set while the program is running.
- When elements are added, the original set grows dynamically, without any need to recreate it (similar to dynamically allocated heap memory in C/C++)

### a. Cannot Modify/Add elements to a set using [ ] operator

### b. Adding elements to a set using `set.add(value)` method
- The `set.add(val)` method is used to add an element to a set
- Only one element at a time can be added to the set by using `set.add()` method
- Lists and sets cannot be added to a set as elements because they are not hashable
- Tuples can be added because tuples are immutable and hence hashable (as long as their own items are hashable).

In [19]:
help(set.add)

Help on method_descriptor:

add(...)
    Add an element to a set.
    
    This has no effect if the element is already present.



In [20]:
set1 = set()
print("Empty Set: ", set1)
set1.add(25)
set1.add(73)
set1.add((19,77))
print("Set after adding three elements: ", set1)


Empty Set:  set()
Set after adding three elements:  {73, 25, (19, 77)}


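### c. Adding elements to a set using `set.update(iterable)` method
- The `set.update(iterable)` method is used to add two or more elements to a set in one call
- Its argument must be an iterable (a list, tuple, set or string); each item of that iterable is added as a separate element
- Lists and sets cannot appear as items inside that iterable because they are not hashable
- Tuples can appear as items because tuples are immutable and hence hashable.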

In [21]:
set1 = set()
help(set1.update)

Help on built-in function update:

update(...) method of builtins.set instance
    Update a set with the union of itself and others.



In [22]:
set2 = set([4, 9, 12])
set2.update(['hadeed', 4, 3.5]) # Note the duplicate element 4 will not be added twice
set2

{12, 3.5, 4, 9, 'hadeed'}

In [23]:
# update() method is used to add two or more elements, passed as a list
set3 = set([4, 9, 12])
set3.update(['arif', 'rauf'])
set3

{12, 4, 9, 'arif', 'rauf'}

In [24]:
# the update() method accepts a list having one or more tuples as its argument
set4 = set([4, 9, 12])
set4.update([(99, 88), (44, 33)])
set4

{(44, 33), (99, 88), 12, 4, 9}
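The difference between `add()` and `update()` is easiest to see when the same tuple is passed to both. The sketch below also shows two behaviours of `update()` that follow from it taking iterables: a string is split into characters, and several iterables can be passed at once. The variable names are illustrative.

```python
s = {4, 9, 12}

s.add((1, 2))        # add() inserts the tuple itself as ONE element
s.update((1, 2))     # update() iterates the tuple and inserts 1 and 2
print(s)

t = set()
t.update('hi')       # a string is an iterable of characters
print(t)             # contains 'h' and 'i', not 'hi'

u = set()
u.update([1, 2], {3}, (4,))   # update() accepts several iterables at once
print(u)
```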

## 7. Removing elements from a set
- Sets are dynamic: elements can be removed from an already created set while the program is running.
- When elements are removed, the original set shrinks dynamically, without any need to recreate it (similar to dynamically allocated heap memory in C/C++)

### a. Removing element from a set using `set.pop()` method
- The `set.pop()` method takes no argument; it removes and returns an arbitrary set element
- It raises `KeyError` if the set is empty

In [26]:
s1 = set()
help(s1.pop)

Help on built-in function pop:

pop(...) method of builtins.set instance
    Remove and return an arbitrary set element.
    Raises KeyError if the set is empty.



In [30]:
s1 = {'learning', 'is', 'fun', 'with', 'arif', 'butt'}
print("Original set: ", s1)

x  = s1.pop()
print("Element popped is: ", x)
print("Set now is: ", s1)

y  = s1.pop()
print("Element popped is: ", y)
print("Set now is: ", s1)

Original set:  {'with', 'fun', 'learning', 'arif', 'is', 'butt'}
Element popped is:  with
Set now is:  {'fun', 'learning', 'arif', 'is', 'butt'}
Element popped is:  fun
Set now is:  {'learning', 'arif', 'is', 'butt'}


### b. Removing element from a set using `set.remove(val)` method
- The `set.remove(val)` method is used to remove a specific element by value from a set without returning it
- The remove method is passed exactly one argument, which is the value to be removed, and returns `None`

In [31]:
s1 = set()
help(s1.remove)

Help on built-in function remove:

remove(...) method of builtins.set instance
    Remove an element from a set; it must be a member.
    
    If the element is not a member, raise a KeyError.



In [33]:
s2 = set(['Welcome', 'to', 'department', 'of', 'Data', 'Science'])
print("\nOriginal set: ", s2)

x = s2.remove('department')
print("After remove('department'): ", s2)
print("Return value of remove() is: ", x)

# If the element to be removed does not exist in the set, the remove() method raises a KeyError
#y = s2.remove('arif')  # KeyError: 'arif'


Original set:  {'to', 'Data', 'department', 'of', 'Science', 'Welcome'}
After remove('department'):  {'to', 'Data', 'of', 'Science', 'Welcome'}
Return value of remove() is:  None


### c. Removing element from a set using `set.discard(val)` method
- The `set.discard(val)` method, like `set.remove(val)`, is used to remove a specific element by value from a set without returning it
- The advantage of using the `set.discard(val)` method is that, if the element doesn't exist in the set, no error is raised and the set remains unchanged.

In [35]:
s1 = set()
help(s1.discard)

Help on built-in function discard:

discard(...) method of builtins.set instance
    Remove an element from a set if it is a member.
    
    If the element is not a member, do nothing.



In [36]:
s2 = set(['Welcome', 'to', 'department', 'of', 'Data', 'Science'])
y = s2.discard('arif')
s2

{'Data', 'Science', 'Welcome', 'department', 'of', 'to'}
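A side-by-side sketch of `remove()` and `discard()` when the element is missing. The flag `remove_raised` and the variable `result` are illustrative names.

```python
s2 = {'Welcome', 'to', 'department', 'of', 'Data', 'Science'}

# remove() raises KeyError for a missing element
try:
    s2.remove('arif')
    remove_raised = False
except KeyError:
    remove_raised = True
    print("remove('arif') raised KeyError")

# discard() silently does nothing for a missing element
result = s2.discard('arif')
print(result)        # None
print(len(s2))       # 6, the set is unchanged
```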

### d. Using `set.clear()` method to remove all the set elements

In [39]:
#use the clear() method to empty a set
s2 = set(['Welcome', 'to', 'department', 'of', 'Data', 'Science'])
s2.clear()
print("\nAfter clear() the set becomes empty: ", s2)


After clear() the set becomes empty:  set()


### e. Using `del` Keyword to delete the set entirely from memory

In [40]:
# use the del keyword to delete the entire set (del removes the name s2, so any later use raises NameError; you cannot delete a specific element with del because sets are not indexed)
s2 = set(['Welcome', 'to', 'department', 'of', 'Data', 'Science'])
print("\nOriginal set: ", s2)
del s2
print(s2)


Original set:  {'to', 'Data', 'department', 'of', 'Science', 'Welcome'}


NameError: name 's2' is not defined

## 8. Converting string object to set and vice-versa (using type casting, split() and join())

### a. Type Casting

In [41]:
# convert a string into set using set()
str1 = 'Learning is fun'    #this is a string
print("Original string: ", str1)

s1 = set(str1)
print("s1: ", s1, "and its type is:  ", type(s1))

Original string:  Learning is fun
s1:  {' ', 'i', 'L', 's', 'g', 'u', 'f', 'n', 'a', 'r', 'e'} and its type is:   <class 'set'>


### b. Use `str.split()` to Split a String into a Set of Tokens
- Used to tokenize a string based on some delimiter; the tokens can then be stored in a set
- It returns a list of tokens; if no argument is passed, the string is split on whitespace

In [42]:
str1 = ""
help(str1.split)

Help on built-in function split:

split(sep=None, maxsplit=-1) method of builtins.str instance
    Return a list of the words in the string, using sep as the delimiter string.
    
    sep
      The delimiter according which to split the string.
      None (the default value) means split according to any whitespace,
      and discard empty strings from the result.
    maxsplit
      Maximum number of splits to do.
      -1 (the default value) means no limit.



In [44]:
str1 = 'Learning is fun'    #this is a string
set1 = set(str1.split(' '))
set1, type(set1)

({'Learning', 'fun', 'is'}, set)

In [45]:
str2 = "Data Science is GR8 Degree"    #this is a string
set2 = set(str2.split('c'))
set2

{'Data S', 'e is GR8 Degree', 'ien'}
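Passing `' '` explicitly is not the same as passing no argument: with an explicit separator, consecutive spaces produce empty strings, which then become an element of the set. A small sketch, using a made-up string with repeated spaces:

```python
text = 'Learning  is   fun'      # note the repeated spaces

by_space = set(text.split(' '))  # explicit ' ' keeps empty strings
by_default = set(text.split())   # no argument: splits on runs of whitespace

print(by_space)
print(by_default)
```

Here `by_space` contains the empty string `''` in addition to the three words, while `by_default` contains only the three words.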

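### c. Use `str.join()` to Join Set Elements into a String
- It is the reverse of the `str.split()` method, and is used to join multiple strings by inserting between them the string on which this method is called
- Because a set is unordered, the order of the joined words may differ from the order in which they were written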

In [None]:
str1 = ""
help(str1.join)

In [46]:
# a set of words (the duplicate 'more' is stored only once)
words = {'This', 'is', 'getting', 'more', 'and', 'more', 'interesting'}
words

{'This', 'and', 'getting', 'interesting', 'is', 'more'}

In [47]:
str2 = ' '.join(words)
print(str2)
print(type(str2))

This more interesting is and getting
<class 'str'>


In [48]:
delimiter = " # "
str3 = delimiter.join(words)
print(str3)
print(type(str3))

This # more # interesting # is # and # getting
<class 'str'>


## 9. You cannot call the `sort()` and `reverse()` methods on Sets, as sets are unordered in nature
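Although a set has no `sort()` or `reverse()` method, the built-in `sorted()` function accepts any iterable and returns a new list, leaving the set untouched. A minimal sketch with illustrative variable names:

```python
s1 = {3, 8, 1, 6, 0, 4}

ascending = sorted(s1)                  # a new list in ascending order
descending = sorted(s1, reverse=True)   # a new list in descending order

print(ascending)    # [0, 1, 3, 4, 6, 8]
print(descending)   # [8, 6, 4, 3, 1, 0]
print(hasattr(s1, 'sort'), hasattr(s1, 'reverse'))   # False False
```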

## 10. Some Built-in functions that can be used on sets

In [115]:
s1 = set([3, 8, 1, 6, 0, 8, 4])   # the duplicate 8 is stored only once

print("length of set: ", len(s1))
print("max element in set: ", max(s1))
print("min element in set: ", min(s1))
print("sum of elements in set: ", sum(s1))

# Membership (in) operator: 9 is not in the set
rv1 = 9 in s1
print(rv1)

# Membership (in) operator: 8 is in the set
rv2 = 8 in s1
print(rv2)

length of set:  6
max element in set:  8
min element in set:  0
sum of elements in set:  22
False
True


## 11. Misc Concepts specifically related to Sets

### a. Union of sets
- The `s1.union(s2)` method, or `s1 | s2`, returns a new set containing all values that are in s1, or s2, or both

In [50]:
s1 = set()
help(s1.union)

Help on built-in function union:

union(...) method of builtins.set instance
    Return the union of sets as a new set.
    
    (i.e. all elements that are in either set.)



In [49]:
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}

set3 = set1 | set2
set3 = set1.union(set2)

print("set1: ", set1)
print("set2: ", set2)
print("set1 | set2: ", set3)

set1:  {'arif', 'rauf'}
set2:  {'hadeed', 'arif', 'maaz'}
set1 | set2:  {'hadeed', 'arif', 'maaz', 'rauf'}


### b. Intersection of sets
- The `s1.intersection(s2)` method, or `s1 & s2`, returns a new set containing all values that are common to s1 and s2

In [52]:
s1 = set()
help(s1.intersection)

Help on built-in function intersection:

intersection(...) method of builtins.set instance
    Return the intersection of two sets as a new set.
    
    (i.e. all elements that are in both sets.)



In [51]:
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}

set3 = set1 & set2
set4 = set1.intersection(set2)

print("set1: ", set1)
print("set2: ", set2)
print("set1 & set2: ", set4)

set1:  {'arif', 'rauf'}
set2:  {'hadeed', 'arif', 'maaz'}
set1 & set2:  {'arif'}


### c. Difference of sets
- The `s1.difference(s2)` method, or `s1 - s2`, returns a new set containing all values of s1 that are not in s2

In [53]:
s1 = set()
help(s1.difference)

Help on built-in function difference:

difference(...) method of builtins.set instance
    Return the difference of two or more sets as a new set.
    
    (i.e. all elements that are in this set but not the others.)



In [54]:
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}

set3 = set1 - set2
set4 = set1.difference(set2)

print("set1: ", set1)
print("set2: ", set2)
print("set1 - set2: ", set4)

set1:  {'arif', 'rauf'}
set2:  {'hadeed', 'arif', 'maaz'}
set1 - set2:  {'rauf'}


### d. Symmetric Difference of sets
- The `s1.symmetric_difference(s2)` method, or `s1 ^ s2`, returns a new set containing all elements that are in exactly one of the sets, equivalent to `(s1 | s2) - (s1 & s2)`

In [55]:
s1 = set()
help(s1.symmetric_difference)

Help on built-in function symmetric_difference:

symmetric_difference(...) method of builtins.set instance
    Return the symmetric difference of two sets as a new set.
    
    (i.e. all elements that are in exactly one of the sets.)



In [122]:
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}

set3 = set1 ^ set2
set4 = set1.symmetric_difference(set2)

print("set1: ", set1)
print("set2: ", set2)
print("set1 ^ set2: ", set4)

set1:  {'rauf', 'arif'}
set2:  {'hadeed', 'maaz', 'arif'}
set1 ^ set2:  {'rauf', 'hadeed', 'maaz'}
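The equivalence between symmetric difference and "union minus intersection" can be checked directly. The sketch also shows the in-place operators (`|=`, `-=`), which correspond to the `__ior__` and `__isub__` methods listed by `help(set)` and modify the left operand instead of returning a new set. The names `lhs`, `rhs` and `team` are illustrative.

```python
set1 = {'arif', 'rauf'}
set2 = {'maaz', 'hadeed', 'arif'}

# Symmetric difference equals union minus intersection
lhs = set1 ^ set2
rhs = (set1 | set2) - (set1 & set2)
print(lhs == rhs)      # True

# In-place variants modify the left operand instead of creating a new set
team = set(set1)       # copy, so set1 stays unchanged
team |= set2           # in-place union
print(team == set1 | set2)   # True
team -= {'arif'}       # in-place difference
print('arif' in team)  # False
```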


### e. Checking Subset
- The `s1.issubset(s2)` method, or `s1 <= s2`, returns True if every element of s1 is also in s2, i.e., s1 is a subset of s2

In [56]:
s1 = set()
help(s1.issubset)

Help on built-in function issubset:

issubset(...) method of builtins.set instance
    Report whether another set contains this set.



In [57]:
s1 = {1,2,3,4,5,6,7}
s2 = {1,2,3,4}

print(s2.issubset(s1))     # is s2 a subset of s1
print(s2 <= s1)            # is s2 a subset of s1

True
True


### f. Checking Superset
- The `s1.issuperset(s2)` method, or `s1 >= s2`, returns True if every element of s2 is also in s1, i.e., s1 is a superset of s2

In [58]:
s1 = set()
help(s1.issuperset)

Help on built-in function issuperset:

issuperset(...) method of builtins.set instance
    Report whether this set contains another set.



In [126]:
s1 = {1,2,3,4,5,6,7}
s2 = {1,2,3,4}

print(s1.issuperset(s2)) # is s1 a superset of s2
print(s1 >= s2)          # is s1 a superset of s2

True
True
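The operators `<=` and `>=` also return True when the two sets are equal. For a proper subset or superset (contained, but not equal), Python provides the strict operators `<` and `>`. A short sketch, where `s3` is an extra set introduced only for illustration:

```python
s1 = {1, 2, 3, 4, 5, 6, 7}
s2 = {1, 2, 3, 4}
s3 = {1, 2, 3, 4}

print(s2 <= s3)   # True: every element of s2 is in s3
print(s2 < s3)    # False: s2 equals s3, so it is not a proper subset
print(s2 < s1)    # True: s2 is a subset of s1 and s2 != s1
print(s1 > s2)    # True: s1 is a proper superset of s2
```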


### g. Checking Disjoint
- The `s1.isdisjoint(s2)` method returns True if the two sets have a null intersection, i.e., no element in common

In [59]:
s1 = set()
help(s1.isdisjoint)

Help on built-in function isdisjoint:

isdisjoint(...) method of builtins.set instance
    Return True if two sets have a null intersection.



In [127]:
s1 = {1,2,3,4,5,6,7}
s2 = {1,2,3,4}
print(s1.isdisjoint(s2))

# Another example
s3 = {1,2,3,4}
s4 = {5,6,7,8}
print(s3.isdisjoint(s4))



False
True
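Being disjoint is the same as having an empty intersection, which the sketch below spells out. It also shows that `isdisjoint()` accepts any iterable as its argument, not only a set. The list `[4, 5, 6]` is made up for illustration.

```python
s3 = {1, 2, 3, 4}
s4 = {5, 6, 7, 8}

print(s3.isdisjoint(s4))          # True
print(len(s3 & s4) == 0)          # True, the same condition spelled out
print(s3.isdisjoint([4, 5, 6]))   # False: 4 is shared; a list is accepted too
```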
