# Sets
* Author:  Johannes Maucher
* Last update: 03.07.2017

A set is an unordered collection of distinct objects, i.e. no object can be contained more than once in a set. In contrast to sequence datatypes in Python such as lists and tuples, set elements cannot be accessed by an index, because an unordered collection has no notion of position.

## Generate Sets
An empty set can be generated by calling `set()` without arguments:

In [2]:
mySet=set()
print mySet

set([])
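
One pitfall worth noting here: empty curly braces create a dictionary, not a set, so `set()` is the only way to obtain an empty set. The following minimal sketch illustrates this; the variable names are only illustrative, and the single-argument `print(...)` form is used so that the snippet runs under both Python 2.7 and Python 3.

```python
# Empty curly braces create a dict; only set() creates an empty set.
empty_braces = {}
empty_set = set()
print(type(empty_braces) is dict)   # True
print(type(empty_set) is set)       # True

# Non-empty curly braces without key:value pairs create a set (Python 2.7+).
letters = {'a', 'b', 'a'}
print(len(letters))                 # 2, the duplicate 'a' is stored only once
```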


Often the set of distinct objects in a given list has to be determined. For this, the list object `L` is passed as an argument to the `set()` function. In text processing, for example, the text is usually first available as one long string. The `split()` method of strings segments the string into a list of words. The corresponding vocabulary (the set of used words) can then be calculated as follows:

In [13]:
S="this is a small text but imagine it is a long text" #text available as string
L=S.split() #split string into list of words
print L
print len(L)
vocabulary=set(L) # Determine set of distinct words (=vocabulary)
print vocabulary
print len(vocabulary)

['this', 'is', 'a', 'small', 'text', 'but', 'imagine', 'it', 'is', 'a', 'long', 'text']
12
set(['a', 'this', 'text', 'is', 'it', 'but', 'long', 'imagine', 'small'])
9


## Operations on Sets
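As already shown above, `len(s)` returns the number of elements in a set `s`. The method `set.add(el)` adds a single element to a set; adding an element that is already contained has no effect. A single element can be removed with `set.remove(el)` or `set.discard(el)`. The two differ only if the element is not contained in the set: `remove()` raises a `KeyError`, whereas `discard()` does nothing. All elements of a set can be removed with `set.clear()`. Shallow copies of sets can be generated by `set.copy()`.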

In [14]:
vocab2=vocabulary.copy()
print vocab2
vocab2.add('new')
print vocab2
vocab2.add('new')
print vocab2
vocab2.remove('this')
print vocab2

set(['a', 'this', 'text', 'is', 'it', 'but', 'long', 'imagine', 'small'])
set(['a', 'this', 'text', 'is', 'it', 'but', 'long', 'imagine', 'small', 'new'])
set(['a', 'this', 'text', 'is', 'it', 'but', 'long', 'imagine', 'small', 'new'])
set(['a', 'text', 'is', 'it', 'but', 'long', 'imagine', 'small', 'new'])
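
The difference between `remove()` and `discard()`, as well as the effect of `clear()`, can be seen in a small sketch. The set `words` and the flag `raised` are hypothetical names introduced only for this illustration.

```python
words = set(['a', 'text', 'new'])

# discard() silently ignores elements that are not in the set
words.discard('missing')
print(len(words))        # 3, the set is unchanged

# remove() raises a KeyError for elements that are not in the set
try:
    words.remove('missing')
    raised = False
except KeyError:
    raised = True
print(raised)            # True

# clear() removes all elements in place
words.clear()
print(len(words))        # 0
```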


**Check membership:**

In [16]:
print 'a' in vocab2
print 'A' in vocab2

True
False


**Check subset:**

In [21]:
print vocab2<vocabulary  #true if vocab2 is a proper subset of vocabulary (subset and not equal)
print vocab2<=vocabulary #true if vocab2 is a subset of vocabulary (equality allowed)
print set(['a','text']) <= vocab2

False
False
True
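
The subset operators also have method counterparts, `issubset()` and `issuperset()`. As a minimal sketch with two small hypothetical sets `small` and `large`, the following shows how `<` and `<=` differ and that the methods, unlike the operators, also accept arbitrary iterables such as lists.

```python
small = set(['a', 'text'])
large = set(['a', 'text', 'is'])

print(small <= large)   # True: every element of small is in large
print(small < large)    # True: subset and not equal (proper subset)
print(large <= large)   # True: every set is a subset of itself
print(large < large)    # False: a set is never a proper subset of itself

# Method forms; the argument may be any iterable, here a list
print(small.issubset(['a', 'text', 'is']))   # True
print(large.issuperset(small))               # True
```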


**Union of two sets:** All elements that are contained in the first **or** in the second set:

In [22]:
vocabUnion=vocab2|vocabulary
print vocabUnion

set(['a', 'this', 'text', 'is', 'it', 'but', 'long', 'imagine', 'small', 'new'])


**Intersection of two sets:** All elements that are contained in the first **and** in the second set:

In [23]:
vocabIntersect=vocab2&vocabulary
print vocabIntersect

set(['a', 'text', 'is', 'it', 'long', 'but', 'imagine', 'small'])


**Set difference:** All elements of the first set except those that are also contained in the second set:

In [24]:
vocabDiff=vocabulary-vocab2
print vocabDiff

set(['this'])


**Symmetric difference (exclusive or):** All elements that are contained in exactly one of the two sets, i.e. all elements of the union except those contained in both sets:

In [25]:
vocabEx=vocabulary^vocab2
print vocabEx

set(['this', 'new'])
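
Each of the four operators used above also exists as a method: `union()`, `intersection()`, `difference()` and `symmetric_difference()`. The sketch below uses two small hypothetical sets, `first` and `second`, to show the correspondence. It assumes nothing beyond the built-in `set` type and wraps results in `sorted()` only to make the printed order reproducible.

```python
first = set(['this', 'is', 'a', 'text'])
second = set(['a', 'text', 'new'])

# Operator and method forms yield the same result
print(first.union(second) == (first | second))                  # True
print(first.intersection(second) == (first & second))           # True
print(first.difference(second) == (first - second))             # True
print(first.symmetric_difference(second) == (first ^ second))   # True

# The methods accept any iterable, the operators require sets
print(sorted(first.intersection(['a', 'text', 'other'])))       # ['a', 'text']

# Augmented assignment modifies the set in place
merged = first.copy()
merged |= second
print(sorted(merged))   # ['a', 'is', 'new', 'text', 'this']
```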


## Convert set to list
As mentioned above, set elements cannot be accessed by an index or any other key. If access to individual elements is required, the set is usually converted to a list. Note that the order of the elements in the resulting list is arbitrary. The conversion can be implemented as follows:

In [27]:
vocab2List=list(vocab2)
print vocab2List
print vocab2List[2]

['a', 'text', 'is', 'it', 'but', 'long', 'imagine', 'small', 'new']
is
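
If a reproducible order is needed, one option is to build the list with `sorted()` instead of `list()`. The following sketch, with a hypothetical set `vocab`, also shows that indexing the set itself fails with a `TypeError`.

```python
vocab = set(['text', 'a', 'small', 'is'])

# Indexing a set directly is not supported
try:
    vocab[0]
    index_failed = False
except TypeError:
    index_failed = True
print(index_failed)   # True

# sorted() returns a new list with the elements in ascending order
ordered = sorted(vocab)
print(ordered)        # ['a', 'is', 'small', 'text']
print(ordered[0])     # a
```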


## Immutable sets
Besides the datatype `set` there exists a second set type, the `frozenset`. Objects of type `frozenset` are immutable, i.e. their contents cannot be modified after creation.

In [30]:
fset=frozenset(['a','b','c'])
print fset

frozenset(['a', 'c', 'b'])


An attempt to modify a `frozenset` raises an error:

In [31]:
fset.add('d')

AttributeError: 'frozenset' object has no attribute 'add'
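
Because a `frozenset` is immutable, it is also hashable. It can therefore serve as a dictionary key or as an element of another set, which is not possible for a mutable `set`. A minimal sketch with hypothetical names (`pair`, `counts`, `nested`):

```python
pair = frozenset(['a', 'b'])

# A frozenset can be used as a dictionary key; element order does not matter
counts = {pair: 1}
print(counts[frozenset(['b', 'a'])])   # 1

# A frozenset can be an element of another set
nested = set([pair, frozenset(['c'])])
print(len(nested))                     # 2

# A mutable set is not hashable and cannot be nested in this way
try:
    set([set(['a'])])
    unhashable = False
except TypeError:
    unhashable = True
print(unhashable)                      # True

# Non-modifying operations still work and return a new frozenset
bigger = pair | frozenset(['d'])
print(type(bigger) is frozenset)       # True
print(len(pair))                       # 2, the original is unchanged
```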