## Python Set Object

Sets are another standard Python data type for storing values. Unlike lists or tuples, a set cannot contain multiple occurrences of the same element, and its elements are unordered.

The set() constructor creates a set object in Python.


In [1]:
## To initialize an empty set
emptySet = set()

In [4]:
## Using add() to add an element to set.
emptySet.add("R")
emptySet

{'R'}

In [6]:
## Creating two sets
dataScientist = set(['Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'])
dataEngineer = set(['Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'])

In [7]:
dataScientist

{'Git', 'Python', 'R', 'SAS', 'SQL', 'Tableau'}

In [8]:
dataEngineer

{'Git', 'Hadoop', 'Java', 'Python', 'SQL', 'Scala'}

A set can also be created with curly braces {}. Curly braces create dictionary objects too, but a dictionary requires key: value pairs. Note that empty braces {} create an empty dictionary, not an empty set; use set() for an empty set.

In [9]:
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
dataEngineer = {'Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'}

In [10]:
type(dataScientist)

set

In [11]:
type(dataEngineer)

set
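The distinction between the two uses of curly braces can be checked directly. The following is a minimal sketch; the names `emptyBraces`, `emptySet`, and `skills` are chosen only for illustration.

```python
# Empty braces create a dictionary, not a set.
emptyBraces = {}
print(type(emptyBraces))  # <class 'dict'>

# set() is the way to create an empty set.
emptySet = set()
print(type(emptySet))     # <class 'set'>

# Non-empty braces without key: value pairs create a set.
skills = {'Python', 'R'}
print(type(skills))       # <class 'set'>
```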

In [1]:
# Initialize set with values
graphicDesigner = {'InDesign', 'Photoshop', 'Acrobat', 'Premiere', 'Bridge'}

In [2]:
## Adding a value to an existing set
graphicDesigner.add('Illustrator')
graphicDesigner

{'Acrobat', 'Bridge', 'Illustrator', 'InDesign', 'Photoshop', 'Premiere'}

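The elements appear sorted in the output above only because the notebook sorts the elements of a set when displaying it. The set itself is unordered and does not keep its elements in any particular order.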
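A set has no defined element order. One way to see this, sketched below with the illustrative names `a`, `b`, and `indexingSupported`, is that two sets built in different orders compare equal, and that a set cannot be indexed by position.

```python
# Two sets with the same elements are equal regardless of the order written.
a = {'Acrobat', 'Bridge', 'InDesign'}
b = {'InDesign', 'Acrobat', 'Bridge'}
print(a == b)  # True

# A set has no positions, so indexing raises a TypeError.
indexingSupported = True
try:
    a[0]
except TypeError as err:
    indexingSupported = False
    print("Indexing failed:", err)
```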

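To remove an element from an existing set, use remove(), discard(), or pop(). remove() raises a KeyError if the element is missing, discard() does nothing in that case, and pop() removes and returns an arbitrary element.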

In [3]:
## Removing element using remove()
graphicDesigner.remove('Illustrator')
graphicDesigner

{'Acrobat', 'Bridge', 'InDesign', 'Photoshop', 'Premiere'}

In [4]:
## Removing element using discard()
graphicDesigner.discard('Premiere')
graphicDesigner

{'Acrobat', 'Bridge', 'InDesign', 'Photoshop'}

In [5]:
## First, print the existing elements of the set
print(graphicDesigner)

## Remove an arbitrary element using pop()
graphicDesigner.pop()
graphicDesigner

{'InDesign', 'Acrobat', 'Photoshop', 'Bridge'}


{'Acrobat', 'Bridge', 'Photoshop'}
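The three removal methods behave differently when the element is missing or the set is empty. The sketch below uses a small hypothetical set named `tools`; the flags `removeRaised` and `popRaised` exist only to record what happened.

```python
tools = {'Acrobat', 'Bridge'}

# discard() silently ignores an element that is not in the set.
tools.discard('Illustrator')
print(len(tools))  # 2

# remove() raises KeyError when the element is not in the set.
try:
    tools.remove('Illustrator')
    removeRaised = False
except KeyError:
    removeRaised = True
print(removeRaised)  # True

# pop() removes and returns an arbitrary element.
popped = tools.pop()
print(popped in {'Acrobat', 'Bridge'})  # True
print(len(tools))  # 1

# pop() on an empty set also raises KeyError.
tools.clear()
try:
    tools.pop()
    popRaised = False
except KeyError:
    popRaised = True
print(popRaised)  # True
```

When it is not known whether the element is present, discard() is usually the safer choice; remove() is useful when a missing element should be treated as an error.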

In [21]:
## To remove all elements
graphicDesigner.clear()
graphicDesigner

set()

A Python set is an iterable collection, so we can loop over its elements with a for loop. The iteration order is arbitrary.

In [1]:
# Initialize a set
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}

for skill in dataScientist:
    print(skill)

SAS
R
Git
Python
SQL
Tableau


In [6]:
## Sets do not support indexing or slicing, so convert the set to a list first
list(dataScientist)[0:2]

['SAS', 'R']

In [24]:
## You can use the sorted() function to get the values of a set as a sorted list
sorted(dataScientist, reverse=True)

['Tableau', 'SQL', 'SAS', 'R', 'Python', 'Git']

A list allows duplicates. To remove them, convert the list to a set with set(). Note that this conversion does not necessarily preserve the original order of the elements.

In [7]:
## Creating a list with duplicates
l1 = [1,2,3,5,3,2,1]
l1

[1, 2, 3, 5, 3, 2, 1]

In [8]:
## Converting l1 to a set and then back to a list.
l1=list(set(l1))
l1

[1, 2, 3, 5]
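Converting through set() does not guarantee that the first-seen order of the elements survives. If the order matters, one common alternative (a different technique from the set() conversion above) is dict.fromkeys(), which relies on dictionaries keeping insertion order in Python 3.7 and later. The names `uniqueInOrder`, `words`, and `uniqueWords` are illustrative.

```python
l1 = [1, 2, 3, 5, 3, 2, 1]

# dict.fromkeys() keeps the first occurrence of each value, in order.
uniqueInOrder = list(dict.fromkeys(l1))
print(uniqueInOrder)  # [1, 2, 3, 5]

words = ['SQL', 'Python', 'SQL', 'Git', 'Python']
uniqueWords = list(dict.fromkeys(words))
print(uniqueWords)  # ['SQL', 'Python', 'Git']
```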

### Set Operations

##### Union

Union combines all elements from both sets, without duplicates.

In [9]:
dataScientist = set(['Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'])
dataEngineer = set(['Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'])

In [7]:
combinedSkills = dataScientist.union(dataEngineer)
combinedSkills

{'Git', 'Hadoop', 'Java', 'Python', 'R', 'SAS', 'SQL', 'Scala', 'Tableau'}
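The union can also be written with the `|` operator. As a small sketch (the names `viaMethod`, `viaOperator`, and `withList` are illustrative), note that the union() method accepts any iterable, while the operator requires both operands to be sets.

```python
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
dataEngineer = {'Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'}

# The method and the operator give the same result.
viaMethod = dataScientist.union(dataEngineer)
viaOperator = dataScientist | dataEngineer
print(viaMethod == viaOperator)  # True
print(len(viaOperator))          # 9

# union() also accepts a non-set iterable such as a list.
withList = dataScientist.union(['Spark', 'Python'])
print('Spark' in withList)       # True
```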

##### Intersection

Intersection returns the elements common to both sets.

In [8]:
commonSkills = dataScientist.intersection(dataEngineer)
commonSkills


{'Git', 'Python', 'SQL'}
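The intersection has an operator form as well, `&`. A minimal sketch, with `common` as an illustrative name:

```python
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
dataEngineer = {'Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'}

# & is the operator form of intersection().
common = dataScientist & dataEngineer
print(common == dataScientist.intersection(dataEngineer))  # True
print(sorted(common))  # ['Git', 'Python', 'SQL']
```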

##### isdisjoint()

isdisjoint() checks whether two sets have no elements in common. It returns True when they share nothing and False otherwise.

In [33]:
# Initialize a set
graphicDesigner = {'Illustrator', 'InDesign', 'Photoshop'}

# These sets have elements in common so it would return False
dataScientist.isdisjoint(dataEngineer)


False

In [34]:
# These sets have no elements in common so it would return True
dataScientist.isdisjoint(graphicDesigner)

True
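Two sets are disjoint exactly when their intersection is empty. The sketch below checks that relationship for the sets used above; `overlap` and `noOverlap` are illustrative names.

```python
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
dataEngineer = {'Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'}
graphicDesigner = {'Illustrator', 'InDesign', 'Photoshop'}

# A non-empty intersection means the sets are not disjoint.
overlap = dataScientist & dataEngineer
print(len(overlap) == 0, dataScientist.isdisjoint(dataEngineer))       # False False

# An empty intersection means the sets are disjoint.
noOverlap = dataScientist & graphicDesigner
print(len(noOverlap) == 0, dataScientist.isdisjoint(graphicDesigner))  # True True
```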

##### difference

The difference of two sets dataScientist and dataEngineer, denoted dataScientist \ dataEngineer, is the set of all values of dataScientist that are not in dataEngineer.

In [35]:
# Difference Operation
dataScientist.difference(dataEngineer)


{'R', 'SAS', 'Tableau'}

In [36]:
# Equivalent Result
dataScientist - dataEngineer

{'R', 'SAS', 'Tableau'}
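Unlike union and intersection, the difference depends on the order of the operands. A short sketch, with `onlyScientist` and `onlyEngineer` as illustrative names:

```python
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
dataEngineer = {'Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'}

# Skills that only the data scientist has.
onlyScientist = dataScientist - dataEngineer
print(sorted(onlyScientist))  # ['R', 'SAS', 'Tableau']

# Swapping the operands gives the skills that only the data engineer has.
onlyEngineer = dataEngineer - dataScientist
print(sorted(onlyEngineer))   # ['Hadoop', 'Java', 'Scala']
```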

##### symmetric_difference

The symmetric difference of two sets dataScientist and dataEngineer, denoted dataScientist △ dataEngineer, is the set of all values that belong to exactly one of the two sets, but not both. In other words, symmetric_difference() combines the elements of both sets and leaves out the elements they have in common:

(a union b) - (a intersection b)

In [37]:
# Symmetric Difference Operation
dataScientist.symmetric_difference(dataEngineer)



{'Hadoop', 'Java', 'R', 'SAS', 'Scala', 'Tableau'}

In [38]:
# Equivalent Result
dataScientist ^ dataEngineer

{'Hadoop', 'Java', 'R', 'SAS', 'Scala', 'Tableau'}
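The identity (a union b) - (a intersection b) can be verified with the set operators. This is a small check rather than a proof; `a`, `b`, `viaIdentity`, and `viaDifferences` are illustrative names.

```python
a = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
b = {'Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'}

# Union minus intersection equals the symmetric difference.
viaIdentity = (a | b) - (a & b)
print(viaIdentity == a ^ b)     # True

# Equivalently, it is the union of the two one-sided differences.
viaDifferences = (a - b) | (b - a)
print(viaDifferences == a ^ b)  # True
```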

### Set Comprehension

In [39]:
{skill for skill in ['SQL', 'SQL', 'PYTHON', 'PYTHON']}

{'PYTHON', 'SQL'}

In [40]:
{skill for skill in ['GIT', 'PYTHON', 'SQL'] if skill not in {'GIT', 'PYTHON', 'JAVA'}}

{'SQL'}
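A set comprehension builds a set from an iterable, optionally transforming and filtering each element, and duplicates in the result collapse automatically. The sketch below adds a transformation step to the examples above; `rawSkills`, `normalized`, and `longNames` are hypothetical names.

```python
rawSkills = ['SQL', 'sql', 'Python', 'PYTHON', 'Git']

# Lowercasing makes 'SQL' and 'sql' the same element.
normalized = {skill.lower() for skill in rawSkills}
print(sorted(normalized))  # ['git', 'python', 'sql']

# A condition filters elements before they are added.
longNames = {skill.lower() for skill in rawSkills if len(skill) > 3}
print(sorted(longNames))   # ['python']
```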

In [41]:
# Initialize a list
possibleList = ['Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS', 'Java', 'Spark', 'Scala']

# Membership test
'Python' in possibleList

True

In [42]:
# Initialize a set
possibleSet = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS', 'Java', 'Spark', 'Scala'}

# Membership test
'Python' in possibleSet

True
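Both membership tests above return the same result, but they work differently: a list is scanned element by element, while a set uses hashing. On large collections the set lookup is typically much faster. The following is a rough timing sketch using the standard timeit module; the collection size and the names `numbers`, `numberSet`, and `target` are arbitrary choices, and the measured times will vary by machine.

```python
import timeit

numbers = list(range(100_000))
numberSet = set(numbers)
target = 99_999  # last element: the worst case for the list scan

# Time 100 membership tests against each collection.
listTime = timeit.timeit(lambda: target in numbers, number=100)
setTime = timeit.timeit(lambda: target in numberSet, number=100)

print(f"list: {listTime:.6f}s, set: {setTime:.6f}s")
```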

##### issubset

In [9]:
possibleSkills = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
mySkills = {'Python', 'R'}

In [10]:
mySkills.issubset(possibleSkills)

True

issubset() returns True if every element of the set is present in the other set. Otherwise it returns False.

In [11]:
possibleSkills = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
mySkills = {'Python', 'R','Power BI'}

In [12]:
## Returns False because mySkills contains an element ('Power BI') that is not present in possibleSkills.
mySkills.issubset(possibleSkills)

False
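The subset test also has operator forms: `<=` is equivalent to issubset(), `<` tests for a proper subset (a subset that is not equal to the other set), and issuperset() is the reverse check. A minimal sketch using the sets from the first example:

```python
possibleSkills = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
mySkills = {'Python', 'R'}

# <= is the operator form of issubset().
print(mySkills <= possibleSkills)           # True

# < is True only for a proper subset.
print(mySkills < possibleSkills)            # True
print(possibleSkills <= possibleSkills)     # True
print(possibleSkills < possibleSkills)      # False

# issuperset() asks the question from the other side.
print(possibleSkills.issuperset(mySkills))  # True
```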