# Python Set

## Objectives

- Understand the set data type and its uniqueness constraint.
- Learn set operations for mathematical set theory.
- Explore practical use cases of sets in data processing.

## Background

Sets are unordered collections of unique elements, ideal for operations involving membership testing, deduplication, and mathematical operations like unions and intersections.

## Datasets Used

This notebook does not use external datasets. All examples use small sets defined inline.

## Set Definition

A set is a collection that is unordered and unindexed. 

Items in a set do not have a defined order. The order in which they are displayed or iterated is arbitrary and should not be relied on, and items cannot be referred to by index or key.

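Set items are unchangeable: each item must be immutable (hashable) and cannot be modified in place. The set itself is mutable, however, so items can be added and removed after the set has been created.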

Duplicate values are not allowed.

In Python, sets are written with curly brackets.

In [1]:
fruit = {'guava', 'banana', 'cherry'}
print(fruit)

{'guava', 'banana', 'cherry'}


In [2]:
type(fruit)

set
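
One caveat with the curly-bracket syntax: empty braces create a dictionary, not a set, so an empty set has to be built with `set()`. A minimal sketch (the names `empty_braces` and `empty_set` are illustrative only):

```python
empty_braces = {}      # empty curly brackets create a dict
empty_set = set()      # set() creates an empty set

print(type(empty_braces))   # <class 'dict'>
print(type(empty_set))      # <class 'set'>
```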

In [3]:
'banana' in fruit

True

In [4]:
'mango' in fruit

False
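
Membership tests like the ones above are where sets are especially useful in data processing, since a set looks items up by hash instead of scanning every element. As an illustration with made-up names (`allowed` and `basket` are not part of this notebook's data), a set can be used to filter another collection:

```python
allowed = {'guava', 'banana', 'cherry'}
basket = ['banana', 'kiwi', 'cherry', 'banana']

# Keep only the items of basket that belong to the allowed set.
kept = [item for item in basket if item in allowed]
print(kept)   # ['banana', 'cherry', 'banana']
```

The list keeps its own order and duplicates; the set is used only to answer "is this item allowed?".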

## Python methods for working with sets

**add()**: add an item to a set

In [5]:
fruit.add('mango')
fruit

{'banana', 'cherry', 'guava', 'mango'}

**update()**: add more than one item to the set

In [6]:
fruit.update(['orange','grapes'])
fruit

{'banana', 'cherry', 'grapes', 'guava', 'mango', 'orange'}
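
`update()` accepts any iterable and adds each of its elements. A common pitfall is passing a bare string, which is itself an iterable of characters. The sketch below uses two separate illustrative sets, `by_chars` and `by_item`, so the `fruit` set above is left unchanged:

```python
by_chars = {'apple'}
by_chars.update('fig')        # adds the characters 'f', 'i', 'g'
print(len(by_chars))          # 4

by_item = {'apple'}
by_item.update(['fig'])       # adds the single string 'fig'
print(len(by_item))           # 2
```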

Remember, sets do not contain repeated items:

In [7]:
fruit.add('mango')
fruit

{'banana', 'cherry', 'grapes', 'guava', 'mango', 'orange'}

`mango` already belongs to the `fruit` set, so adding it again leaves the set unchanged.

**len()**: determine how many items the set has

In [8]:
print(len(fruit)) 

6


**remove()**: remove an item from a set

In [9]:
fruit.remove('banana')
fruit

{'cherry', 'grapes', 'guava', 'mango', 'orange'}

If the item to remove does not exist, **remove()** will raise an error.

In [10]:
fruit.remove('banana')       # This will raise an error

KeyError: 'banana'
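
When it is not known in advance whether an item is present, the `KeyError` can be caught explicitly. A minimal sketch, using its own illustrative set `pantry` so it runs independently of the cells above:

```python
pantry = {'cherry', 'guava'}

try:
    pantry.remove('banana')
except KeyError as err:
    # remove() raises KeyError when the item is missing
    print('Not in the set:', err)

print(len(pantry))   # 2, the set is unchanged
```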

**discard()**: remove an item from a set. If the item does not exist, `discard()` does not raise an error.

In [11]:
fruit.discard('banana')
fruit

{'cherry', 'grapes', 'guava', 'mango', 'orange'}

In [12]:
fruit.discard('mango')
fruit

{'cherry', 'grapes', 'guava', 'orange'}

**clear()**: empties the set

In [13]:
fruit.clear()
fruit

set()

**del**: the `del` keyword deletes the variable, so the set is no longer accessible through that name

In [14]:
del fruit

Now, `fruit` is not defined. If you reference `fruit`, Python will raise an error.

In [15]:
fruit           # This will raise an error

NameError: name 'fruit' is not defined
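
The difference between `clear()` and `del` matters when two names refer to the same set: `clear()` empties the object that both names share, while `del` only removes one name. A small sketch with illustrative names `a` and `b`:

```python
a = {'kiwi', 'lime'}
b = a            # b refers to the same set object as a

a.clear()        # empties the shared set in place
print(b)         # set()

del a            # removes only the name a
print(b)         # set() (the object is still reachable through b)
```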

A set can contain different data types.

In [16]:
set3 = {'John', 20, True, 30.33, 'male'}
print(type(set3))

<class 'set'>
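
One subtlety when mixing types: Python treats `True`, `1`, and `1.0` as equal values with the same hash, so a set keeps only one of them. A quick check with an illustrative set `mixed`:

```python
mixed = {1, True, 1.0, 'one'}

# 1, True and 1.0 compare equal, so they collapse into a single element.
print(len(mixed))   # 2
```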


In [17]:
# l is a list
l = [2, 3, 3, 4.0, 'Peter']
l

[2, 3, 3, 4.0, 'Peter']

In [18]:
# converting l into a set
set(l)

{2, 3, 4.0, 'Peter'}

Notice there is only one 3. Sets do not allow duplicates!

In [19]:
# t is a tuple
t = (2, 3, 3, 4.0, 'Peter')
t

(2, 3, 3, 4.0, 'Peter')

In [20]:
# converting t into a set
set(t)

{2, 3, 4.0, 'Peter'}
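
Converting to a set removes duplicates but does not promise to keep the original order. If order matters, one common alternative (not a set method) is `dict.fromkeys()`, since dictionaries preserve insertion order in Python 3.7 and later. A sketch using an illustrative list `items` with the same contents as `l`:

```python
items = [2, 3, 3, 4.0, 'Peter']

unique_any_order = set(items)                 # duplicates removed, order not guaranteed
unique_in_order = list(dict.fromkeys(items))  # duplicates removed, first-seen order kept

print(unique_in_order)   # [2, 3, 4.0, 'Peter']
```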

Set elements must be hashable, which rules out mutable built-in containers such as lists.

In [21]:
# You cannot add a list to a set (because a list is mutable).
set3.add(l)

TypeError: unhashable type: 'list'

In [22]:
# You can add a tuple to a set (a tuple is immutable and, if all its items are hashable, hashable).
set3.add(t)
set3

{(2, 3, 3, 4.0, 'Peter'), 20, 30.33, 'John', True, 'male'}
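
More precisely, the requirement is that elements be hashable. A tuple qualifies only if everything inside it is hashable too, and a set can be stored inside another set by converting it to a `frozenset` first. A minimal sketch with an illustrative set `container`:

```python
container = set()

container.add((1, 2))                 # a tuple of hashable items works
container.add(frozenset({3, 4}))      # frozenset is the immutable, hashable set type

try:
    container.add((1, [2, 3]))        # a tuple holding a list is not hashable
except TypeError as err:
    print('TypeError:', err)

print(len(container))   # 2
```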

### Common Operations on Sets

In [23]:
set1 = {1, 2, 3}
set2 = {3, 4, 5}

**union()**: join two sets. Remember, sets do not have repeated items.

In [24]:
#union
uni = set1.union(set2)
uni

{1, 2, 3, 4, 5}

Notice that element `3` appears only once. Sets do not allow duplicates!

In [25]:
# The set union is commutative.
set2.union(set1)

{1, 2, 3, 4, 5}
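
`union()` also accepts several iterables at once, and the `|` operator is an equivalent shorthand when both operands are sets. A brief sketch (the sets are redefined under the illustrative names `left`, `right`, and `extra` so the block runs by itself):

```python
left = {1, 2, 3}
right = {3, 4, 5}
extra = {5, 6}

pair = left | right                 # same result as left.union(right)
triple = left.union(right, extra)   # union of three sets in one call
with_list = left.union([7, 8])      # the method form accepts any iterable

print(pair == {1, 2, 3, 4, 5})          # True
print(triple == {1, 2, 3, 4, 5, 6})     # True
print(with_list == {1, 2, 3, 7, 8})     # True
```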

**intersection()**: returns the intersection of two sets. The intersection set contains items in both sets.

In [26]:
#intersection
inter = set1.intersection(set2)
inter

{3}

In [27]:
# The set intersection is commutative.
set2.intersection(set1)

{3}
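
`intersection()` has an operator form too, `&`, and is a convenient way to find what two collections have in common. A sketch with invented example data (`monday` and `tuesday` are hypothetical lists of visitor names):

```python
monday = ['ana', 'bo', 'cy', 'ana']
tuesday = ['bo', 'dee', 'cy']

# Convert each list to a set, then keep only the names present in both.
both_days = set(monday) & set(tuesday)
print(sorted(both_days))   # ['bo', 'cy']
```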

**difference()**: returns a set containing the difference between two or more sets.

The difference A - B is the set of all the elements of set A that are not in set B.

In [28]:
# difference
set1.difference(set2)

{1, 2}

In [29]:
# The set difference is not commutative.
set2.difference(set1)

{4, 5}
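
Because difference is directional, it suits questions like "what is still missing?". The `-` operator is equivalent to `difference()` for two sets. A sketch with hypothetical names `required` and `installed`:

```python
required = {'numpy', 'pandas', 'scipy'}
installed = {'numpy', 'scipy', 'requests'}

missing = required - installed     # needed but not present
unneeded = installed - required    # present but not needed

print(missing)    # {'pandas'}
print(unneeded)   # {'requests'}
```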

**symmetric_difference()**: returns a set containing the symmetric difference of two sets.

The symmetric difference is the set of elements that belong to exactly one of the two sets: in either set, but not in their intersection.

In [30]:
# symmetric_difference
set1.symmetric_difference(set2)

{1, 2, 4, 5}

In [31]:
# The symmetric difference is commutative.
set2.symmetric_difference(set1)

{1, 2, 4, 5}

Notice that both results are the same.

The symmetric difference between two sets is a disjunctive union. 

`symmetric_difference = union - intersection`

In [32]:
# symmetric_difference = union - intersection
set1.union(set2).difference(set1.intersection(set2)) 

{1, 2, 4, 5}

In [33]:
# symmetric_difference = union - intersection
uni - inter

{1, 2, 4, 5}
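
The `^` operator gives the symmetric difference directly, and the identity above can be checked on other inputs as well. A small sketch with illustrative sets `p` and `q`:

```python
p = {'a', 'b', 'c'}
q = {'c', 'd'}

sym = p ^ q                        # same as p.symmetric_difference(q)
rebuilt = (p | q) - (p & q)        # union minus intersection

print(sym == rebuilt)              # True
print(sorted(sym))                 # ['a', 'b', 'd']
```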

## Conclusions

Key Takeaways:
- Sets in Python enforce uniqueness among their elements, automatically removing duplicates.
- The elements of a set have no defined order, so sets are a good fit when the sequence of the data is irrelevant but uniqueness matters.
- Python sets support mathematical set operations such as union, intersection, difference, and symmetric difference.
- Sets offer efficient membership testing, since elements are looked up by hash.
- Sets are mutable: elements can be added and removed after creation, although each element must itself be hashable.
