### Set
- A set is an unordered collection data type that is iterable, mutable, and has no duplicate elements. The major advantage of using a set, as opposed to a list, is that it has a highly optimized method for checking whether a specific element is contained in the set.
    - Sets are an unordered, unindexed collection of items in Python.
    - The order in which elements are stored and iterated over is not fixed and does not follow the order in which they were added.
    - A set is written with curly braces, e.g. {'a', 'b'}; note that an empty {} creates a dictionary, so use set() for an empty set.
    - Sets are mutable; however, only immutable (more precisely, hashable) objects can be stored in them.
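A short sketch of the properties listed above. The variable name `languages` is only illustrative.

```python
# Duplicate values are dropped when the set is built
languages = {'Python', 'SQL', 'Python', 'R'}
print(len(languages))   # 3

# Empty curly braces create a dict, not a set; use set() for an empty set
print(type({}))         # <class 'dict'>
print(type(set()))      # <class 'set'>

# Sets are mutable: values can be added after creation
languages.add('Git')
print(len(languages))   # 4
```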

### Initialize a Set
Sets are a mutable collection of distinct (unique) immutable values that are unordered.
You can initialize an empty set by using set().

In [1]:
emptySet = set()

To initialize a set with values, you can pass a list to set().

In [2]:
dataScientist = set(['Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'])
dataEngineer = set(['Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'])

In [3]:
# set built-in function union
dataScientist.union(dataEngineer)

{'Git', 'Hadoop', 'Java', 'Python', 'R', 'SAS', 'SQL', 'Scala', 'Tableau'}

In [4]:
# Intersection operation
dataScientist.intersection(dataEngineer)

{'Git', 'Python', 'SQL'}

In [5]:
# Initialize a set
graphicDesigner = {'Illustrator', 'InDesign', 'Photoshop'}

# These sets have elements in common so it would return False
print(dataScientist.isdisjoint(dataEngineer))

# These sets have no elements in common so it would return True
print(dataScientist.isdisjoint(graphicDesigner))

False
True


In [6]:
# Difference Operation
dataScientist.difference(dataEngineer)

{'R', 'SAS', 'Tableau'}

In [7]:
# Symmetric Difference Operation
dataScientist.symmetric_difference(dataEngineer)

{'Hadoop', 'Java', 'R', 'SAS', 'Scala', 'Tableau'}

### Add Values to a Set

In [8]:
# Initialize set with values
graphicDesigner = {'InDesign', 'Photoshop', 'Acrobat', 'Premiere', 'Bridge'}

In [9]:
graphicDesigner.add('Illustrator')

In [10]:
graphicDesigner

{'Acrobat', 'Bridge', 'Illustrator', 'InDesign', 'Photoshop', 'Premiere'}

It is important to note that you can only add a value that is immutable (more precisely, hashable), like a string or a tuple, to a set. For example, you would get a TypeError if you tried to add a list to a set.
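A minimal sketch of that rule, using an illustrative set named `tools`: a tuple of strings can be added, while a list raises a TypeError.

```python
tools = {'Photoshop'}

# A tuple of strings is immutable and hashable, so it can be added
tools.add(('Illustrator', 'InDesign'))

# A list is mutable and unhashable, so adding it raises TypeError
try:
    tools.add(['Premiere', 'Bridge'])
except TypeError as err:
    print(err)  # unhashable type: 'list'

print(len(tools))  # 2
```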

### Remove Values from a Set

In [11]:
graphicDesigner.remove('Illustrator')

In [12]:
graphicDesigner

{'Acrobat', 'Bridge', 'InDesign', 'Photoshop', 'Premiere'}

The drawback of the remove method is that if you try to remove a value that is not in your set, you will get a KeyError.

In [13]:
graphicDesigner.remove('Illustrator')

KeyError: 'Illustrator'

You can use the discard method to remove a value from a set.

In [14]:
graphicDesigner.discard('Premiere')

In [15]:
graphicDesigner

{'Acrobat', 'Bridge', 'InDesign', 'Photoshop'}

The benefit of this approach over the remove method is that if you try to remove a value that is not part of the set, you will not get a KeyError. If you are familiar with dictionaries, you might find that this works similarly to the dictionary method get, which also avoids raising a KeyError for a missing key.
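A small side-by-side sketch of discard and remove on a missing value; the set `apps` is illustrative. If you must use remove on a value that may be absent, one option is to catch the KeyError.

```python
apps = {'Acrobat', 'Bridge'}

# discard() silently ignores a missing value
apps.discard('Premiere')

# remove() raises KeyError for a missing value, so guard it if needed
try:
    apps.remove('Premiere')
except KeyError:
    print("'Premiere' was not in the set")

print(apps == {'Acrobat', 'Bridge'})  # True
```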

**Option 3**: You can also use the pop method to **remove and return** an arbitrary value from a set.

It is important to note that the method raises a KeyError if the set is empty.
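One way to use pop safely, sketched with an illustrative set named `pending`: loop only while the set is non-empty (an empty set is falsy), and expect a KeyError if you call pop on an empty set.

```python
pending = {'Acrobat', 'Bridge', 'Photoshop'}

# pop() returns an arbitrary element; loop until the set is empty
removed = []
while pending:          # an empty set is falsy
    removed.append(pending.pop())

# Popping from the now-empty set raises KeyError
try:
    pending.pop()
except KeyError:
    print('the set is empty')
```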

In [16]:
graphicDesigner.pop()

'InDesign'

In [17]:
graphicDesigner

{'Acrobat', 'Bridge', 'Photoshop'}

### Remove All Values from a Set
You can use the clear method to remove all values from a set.

In [18]:
graphicDesigner.clear()

In [19]:
graphicDesigner

set()

### Iterate through a Set
Like many standard Python data types, a set can be iterated over.

In [20]:
# Initialize a set
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}

for skill in dataScientist:
    print(skill)

Git
R
SQL
Python
SAS
Tableau


If you look at the output of printing each of the values in dataScientist, notice that the values are not printed in the order in which they were added. This is because sets are unordered; the order you see for a set of strings can even differ between runs of Python.

### Transform Set into Ordered Values
If you need the values from your set in a particular order, you can use the sorted function, which returns a new, sorted list.
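A short sketch showing that sorted returns a new list, and that by default uppercase letters sort before lowercase ones. The key=str.lower argument is one standard way to sort case-insensitively; the set `tools` is illustrative.

```python
tools = {'python', 'R', 'SQL', 'git'}

# sorted() returns a new list and leaves the set itself untouched
ordered = sorted(tools)
print(ordered)                       # ['R', 'SQL', 'git', 'python']

# Uppercase letters sort before lowercase by default; key=str.lower ignores case
print(sorted(tools, key=str.lower))  # ['git', 'python', 'R', 'SQL']
```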

In [21]:
type(sorted(dataScientist))

list

In [22]:
dataScientist

{'Git', 'Python', 'R', 'SAS', 'SQL', 'Tableau'}

The code below outputs the values in the set dataScientist in descending alphabetical order (Z-A in this case).

In [23]:
sorted(dataScientist, reverse = True)

['Tableau', 'SQL', 'SAS', 'R', 'Python', 'Git']

### Remove Duplicates from a List

In [24]:
print(list(set([1, 2, 3, 1, 7])))

[1, 2, 3, 7]


In [25]:
set([1, 2, 3, 1, 7])

{1, 2, 3, 7}

In [26]:
numberList = [1, 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 9, 0]
print(list(set(numberList)))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
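A set does not guarantee any particular order, so converting a list to a set and back can reorder the values; the sorted-looking outputs above should not be relied on. If the original order matters, one common alternative (an addition here, not part of the set API) is dict.fromkeys, since dictionaries preserve insertion order in Python 3.7 and later:

```python
names = ['SQL', 'Python', 'SQL', 'R', 'Python']

# Dictionary keys are unique and keep insertion order (Python 3.7+)
unique_in_order = list(dict.fromkeys(names))
print(unique_in_order)  # ['SQL', 'Python', 'R']
```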


### Set Operation Methods

- A common use of sets in Python is computing standard math operations such as union, intersection, difference, and symmetric difference. 
- Python sets have methods that allow you to perform these mathematical operations as well as operators that give you equivalent results.
- Before exploring these methods, let's start by initializing two sets dataScientist and dataEngineer.
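One practical difference between the methods and the operators, sketched with an illustrative set named `skills`: the methods accept any iterable, while the operators require sets (or frozensets) on both sides.

```python
skills = {'Python', 'R', 'SQL'}

# Methods accept any iterable, such as a list
print(skills.union(['Git', 'SQL']) == {'Python', 'R', 'SQL', 'Git'})  # True

# Operators require both operands to be sets (or frozensets)
try:
    skills | ['Git', 'SQL']
except TypeError:
    print('| needs a set on both sides')

print((skills | set(['Git', 'SQL'])) == {'Python', 'R', 'SQL', 'Git'})  # True
```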

In [27]:
dataScientist = set(['Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'])
dataEngineer = set(['Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'])

A union, denoted dataScientist ∪ dataEngineer, is the set of all values that are values of dataScientist, or dataEngineer, or both. You can use the union method to find out all the unique values in two sets.

In [28]:
# set built-in function union
print(dataScientist.union(dataEngineer))

# Equivalent Result
print(dataScientist | dataEngineer)

{'Java', 'Git', 'R', 'Hadoop', 'SQL', 'Python', 'SAS', 'Tableau', 'Scala'}
{'Java', 'Git', 'R', 'Hadoop', 'SQL', 'Python', 'SAS', 'Tableau', 'Scala'}


An intersection of two sets dataScientist and dataEngineer, denoted dataScientist ∩ dataEngineer, is the set of all values that are values of both dataScientist and dataEngineer.

In [29]:
# Intersection operation
print(dataScientist.intersection(dataEngineer))

# Equivalent Result
print(dataScientist & dataEngineer)

{'Python', 'Git', 'SQL'}
{'Python', 'Git', 'SQL'}


You may come across a case where you want to make sure that two sets have no value in common. In other words, you want two sets whose intersection is empty. Such sets are called disjoint sets. You can test for disjoint sets by using the isdisjoint method.

In [30]:
# Initialize a set
graphicDesigner = {'Illustrator', 'InDesign', 'Photoshop'}

# These sets have elements in common so it would return False
print(dataScientist.isdisjoint(dataEngineer))

# These sets have no elements in common so it would return True
print(dataScientist.isdisjoint(graphicDesigner))

False
True


A difference of two sets dataScientist and dataEngineer, denoted dataScientist \ dataEngineer, is the set of all values of dataScientist that are not values of dataEngineer.

In [31]:
# Difference Operation
print(dataScientist.difference(dataEngineer))

# Equivalent Result
print(dataScientist - dataEngineer)

{'SAS', 'Tableau', 'R'}
{'SAS', 'Tableau', 'R'}


A symmetric difference of two sets dataScientist and dataEngineer, denoted dataScientist △ dataEngineer, is the set of all values that are values of exactly one of the two sets, but not both.

In [32]:
# Symmetric Difference Operation
print(dataScientist.symmetric_difference(dataEngineer))

# Equivalent Result
print(dataScientist ^ dataEngineer)

{'Hadoop', 'Java', 'SAS', 'Tableau', 'Scala', 'R'}
{'Hadoop', 'Java', 'SAS', 'Tableau', 'Scala', 'R'}
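To check the definition above, here is a sketch that rebuilds the symmetric difference from the other operations: the values only in the first set combined with the values only in the second, which is also the union minus the intersection.

```python
dataScientist = set(['Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'])
dataEngineer = set(['Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'])

# Values in exactly one set = (only in the first) | (only in the second)
manual = (dataScientist - dataEngineer) | (dataEngineer - dataScientist)

# ...which is also the union minus the intersection
alt = (dataScientist | dataEngineer) - (dataScientist & dataEngineer)

print(manual == alt == (dataScientist ^ dataEngineer))  # True
```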


You may have previously learned about list comprehensions, dictionary comprehensions, and generator expressions. There are also set comprehensions, which look very similar. Set comprehensions in Python can be constructed as follows:

### Set Comprehension

In [33]:
{skill for skill in ['SQL', 'SQL', 'PYTHON', 'PYTHON']}

{'PYTHON', 'SQL'}

The output above is a set of 2 values because sets cannot have multiple occurrences of the same element.

The idea behind using set comprehensions is to let you write and reason in code the same way you would do mathematics by hand.

In [34]:
{skill for skill in ['GIT', 'PYTHON', 'SQL'] if skill not in {'GIT', 'PYTHON', 'JAVA'}}

{'SQL'}
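The comprehension above reads like the math notation {x ∈ A : x ∉ B}, which is a set difference. A short sketch confirming that both forms agree (the names `skills` and `known` are illustrative):

```python
skills = ['GIT', 'PYTHON', 'SQL']
known = {'GIT', 'PYTHON', 'JAVA'}

# {x for x in A if x not in B} mirrors the math notation {x ∈ A : x ∉ B}
by_comprehension = {skill for skill in skills if skill not in known}

# The same result with the difference operator
by_difference = set(skills) - known

print(by_comprehension == by_difference)  # True
```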

Membership tests check whether a specific element is contained in a collection, such as a string, list, tuple, or set. One of the main advantages of using sets in Python is that they are highly optimized for membership tests: sets perform them far more efficiently than lists. In case you are from a computer science background, this is because sets are implemented as hash tables, so the average-case time complexity of a membership test is O(1) for a set versus O(n) for a list.
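A rough timing sketch of that difference using the standard timeit module. The data is a hypothetical range of integers, and the exact timings depend on your machine; only the relative gap is meaningful.

```python
import timeit

numbers_list = list(range(100_000))
numbers_set = set(numbers_list)
target = 99_999  # worst case for the list: the last element

# Both containers give the same answer...
assert (target in numbers_list) == (target in numbers_set)

# ...but the list is scanned element by element, while the set uses a hash lookup
list_time = timeit.timeit(lambda: target in numbers_list, number=1_000)
set_time = timeit.timeit(lambda: target in numbers_set, number=1_000)
print(f'list: {list_time:.4f}s  set: {set_time:.6f}s')
```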

In [35]:
# Initialize a list
possibleList = ['Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS', 'Java', 'Spark', 'Scala']

# Membership test
'Python' in possibleList

True

If you stored the same values in a set, for example possibleSet = set(possibleList), the test 'Python' in possibleSet would also return True. Since 'Python' is a value of possibleSet, this can be denoted as 'Python' ∈ possibleSet.

If you had a value that wasn't part of the set, like 'Fortran', it would be denoted as 'Fortran' ∉ possibleSet.

In [36]:
possibleSkills = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
mySkills = {'Python', 'R'}

If every value of the set mySkills is also a value of the set possibleSkills, then mySkills is said to be a subset of possibleSkills, mathematically written mySkills ⊆ possibleSkills.

You can check to see if one set is a subset of another using the method issubset.

In [37]:
mySkills.issubset(possibleSkills)

True
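The same relationships can be checked with comparison operators. A short sketch: `<=` matches issubset, `<` tests for a proper subset (the two sets must also differ), and issuperset checks the reverse direction.

```python
possibleSkills = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
mySkills = {'Python', 'R'}

print(mySkills <= possibleSkills)           # True, same as issubset
print(mySkills < possibleSkills)            # True, proper subset
print(possibleSkills.issuperset(mySkills))  # True, the reverse relationship
print(possibleSkills < possibleSkills)      # False, not a proper subset of itself
```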

In [38]:
# Nested Lists and Tuples
nestedLists = [['the', 12], ['to', 11], ['of', 9], ['and', 7], ['that', 6]]
nestedTuples = (('the', 12), ('to', 11), ('of', 9), ('and', 7), ('that', 6))

Lists and tuples can be nested, as shown above. Sets normally cannot: a set can only contain hashable values, and a regular set is mutable and therefore unhashable, so one set cannot contain another.

In [39]:
nestedSets = set([set()])

TypeError: unhashable type: 'set'

This is one situation where you may wish to use a frozenset. A frozenset is very similar to a set except that a frozenset is immutable.

You make a frozenset by using frozenset().

In [40]:
# Initialize a frozenset
immutableSet = frozenset()

In [41]:
nestedSets = set([frozenset()])

In [42]:
nestedSets

{frozenset()}

It is important to keep in mind that the main disadvantage of a frozenset is that, because it is immutable, you cannot add or remove values.
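A short sketch of both sides of that trade-off, with an illustrative frozenset named `skills`: because it is hashable it can be used as a dictionary key, but it has no add method, and set operations on it return a new frozenset.

```python
skills = frozenset(['Python', 'SQL'])

# Being hashable, a frozenset can be a dictionary key or a set element
team_by_skills = {skills: 'Analytics'}
print(team_by_skills[frozenset(['SQL', 'Python'])])  # Analytics

# It has no add/remove methods, so mutation attempts fail
try:
    skills.add('R')
except AttributeError:
    print('frozenset has no add method')

# Set operations still work and return a new frozenset
print(type(skills | {'R'}))  # <class 'frozenset'>
```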

Python sets are highly useful for efficiently removing duplicate values from a collection like a list and for performing common math operations like unions and intersections. A common challenge is knowing when to use each of the various data types.

### List

- A list is an ordered sequence of values, where each value is identified by an index. The values that make up a list are called its elements. Lists are similar to strings, which are ordered sequences of characters, except that the elements of a list can have any type.
- Limitation of List: appending at the end of a list is fast, but adding an item at the start of an existing list, with list.insert(0, item), is slow for large lists because every existing element has to be shifted one position.
- Lists and tuples are standard Python data types that store values in a sequence. Sets are another standard Python data type that also store values. The major difference is that sets, unlike lists or tuples, cannot have multiple occurrences of the same element and store unordered values.
- Lists are like dynamically sized arrays declared in other languages (vector in C++ and ArrayList in Java). Lists need not be homogeneous, which makes them one of the most flexible tools in Python.
    - The list is a data type available in Python that can be written as a sequence of comma-separated values (items) between square brackets.
    - Lists are mutable, i.e., their elements can be changed, added, or removed after the list is created.
    - Lists can store elements of any type.
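A sketch of list mutability and of the front-insertion limitation mentioned above. list.insert(0, x) works, but it shifts every element; collections.deque from the standard library is one common alternative when you need fast additions at both ends. The names `skills` and `queue` are illustrative.

```python
from collections import deque

skills = ['SQL', 'Git']
skills[0] = 'Python'    # lists are mutable: replace an element in place
skills.insert(0, 'R')   # insert at the front; shifts every other element
print(skills)           # ['R', 'Python', 'Git']

# A deque supports O(1) appends and pops at both ends
queue = deque(['SQL', 'Git'])
queue.appendleft('Python')
queue.append('Tableau')
print(list(queue))      # ['Python', 'SQL', 'Git', 'Tableau']
```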

### Python Dictionary
- A dictionary in Python is a collection of items accessed by a specific key rather than by index.
- The keys in a dictionary have to be hashable.
- The values in a dictionary can have any data type.
- Keys must be unique in a dictionary; no duplicates are allowed. However, if a dictionary literal repeats a key, Python does not raise an error: the last value given for that key wins and the earlier value is discarded (the key keeps the position of its first occurrence).
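A minimal sketch of the hashable-key rule: a tuple works as a key, while a list raises a TypeError. The coordinates are illustrative.

```python
# Tuples are hashable, so they can be dictionary keys
coordinates = {(40.7, -74.0): 'New York'}
print(coordinates[(40.7, -74.0)])  # New York

# Lists are unhashable, so using one as a key raises TypeError
try:
    bad = {['a', 'b']: 1}
except TypeError as err:
    print(err)  # unhashable type: 'list'

# Values, by contrast, can be of any type, including lists
coordinates[(51.5, -0.1)] = ['London', 'UK']
```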

In [43]:
a = {'apple': 'fruit', 'beetroot': 'vegetable', 'cake': 'dessert'}
a['doughnut'] = 'snack'
print(a['apple'])

fruit


In [44]:
a

{'apple': 'fruit',
 'beetroot': 'vegetable',
 'cake': 'dessert',
 'doughnut': 'snack'}

In [45]:
a = {'one': 1, 'two': 'to', 'three': 3.0, 'four': [4,4.0]}
print(a)

{'one': 1, 'two': 'to', 'three': 3.0, 'four': [4, 4.0]}


In [46]:
# Update a dictionary
a['one'] = 1.0 
print(a)

{'one': 1.0, 'two': 'to', 'three': 3.0, 'four': [4, 4.0]}


In [47]:
# Delete a single element
del a['one'] 
print(a)

{'two': 'to', 'three': 3.0, 'four': [4, 4.0]}


In [48]:
# Delete all elements in the dictionary
a.clear()
print(a)

{}


In [49]:
# Delete the dictionary
del a 
# print(a)

In [50]:
print(a)

NameError: name 'a' is not defined

In [51]:
sweet_dict = {'a1': 'cake', 'a2':'cookie', 'a1': 'icecream'}
print(sweet_dict['a1'])

icecream


In [52]:
sweet_dict

{'a1': 'icecream', 'a2': 'cookie'}

### Python Dictionary Comprehension
- Dictionary comprehension is a method for transforming one dictionary into another dictionary. During this transformation, items within the original dictionary can be conditionally included in the new dictionary and each item can be transformed as needed.
- A good comprehension can make your code more expressive and thus easier to read. The key to creating comprehensions is not to let them get so complex that your head spins when you try to decipher what they are actually doing; keep them easy to read.
- To write a dictionary comprehension over an existing dictionary, you need to access its keys and values, which the keys(), values(), and items() methods provide.
- Dictionary comprehension is a powerful concept and can be used as a substitute for for loops and lambda functions. However, not every for loop can be written as a dictionary comprehension, whereas every dictionary comprehension can be written as a for loop.

In [53]:
dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
# Return a view of all keys in `dict1` (wrap it in list() to get a list)
dict1.keys()

dict_keys(['a', 'b', 'c', 'd'])

In [54]:
# Return a view of all values saved in `dict1`
dict1.values()

dict_values([1, 2, 3, 4])

In [55]:
dict1.items()

dict_items([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
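The objects returned by keys(), values(), and items() are views rather than lists: they reflect later changes to the dictionary. A short sketch with an illustrative dictionary named `prices`:

```python
prices = {'a': 1, 'b': 2}
keys_view = prices.keys()

prices['c'] = 3                   # the view reflects later changes to the dictionary
print(keys_view)                  # dict_keys(['a', 'b', 'c'])

keys_list = list(prices.keys())   # an explicit list is a snapshot
prices['d'] = 4
print(keys_list)                  # ['a', 'b', 'c']
```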

In [56]:
dict_variable = {key:value for (key,value) in dict1.items()}
print(dict_variable)

{'a': 1, 'b': 2, 'c': 3, 'd': 4}


In [57]:
dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
# Double each value in the dictionary
double_dict1 = {k:v*2 for (k,v) in dict1.items()}
print(double_dict1)

{'a': 2, 'b': 4, 'c': 6, 'd': 8, 'e': 10}


In [58]:
dict1_keys = {k*2:v for (k,v) in dict1.items()}
print(dict1_keys)

{'aa': 1, 'bb': 2, 'cc': 3, 'dd': 4, 'ee': 5}


In [59]:
numbers = range(10)
new_dict_for = {}

# Add values to `new_dict_for` using a for loop
for n in numbers:
    if n%2==0:
        new_dict_for[n] = n**2

print(new_dict_for)

{0: 0, 2: 4, 4: 16, 6: 36, 8: 64}


In [60]:
# Use dictionary comprehension
new_dict_comp = {n:n**2 for n in numbers if n%2 == 0}

print(new_dict_comp)

{0: 0, 2: 4, 4: 16, 6: 36, 8: 64}


In [61]:
# Initialize `fahrenheit` dictionary 
fahrenheit = {'t1':-30, 't2':-20, 't3':-10, 't4':0}

# Get the corresponding `celsius` values
celsius = list(map(lambda x: (float(5)/9)*(x-32), fahrenheit.values()))

# Create the `celsius` dictionary
celsius_dict = dict(zip(fahrenheit.keys(), celsius))

print(celsius_dict)

{'t1': -34.44444444444444, 't2': -28.88888888888889, 't3': -23.333333333333336, 't4': -17.77777777777778}


In [62]:
# Initialize the `fahrenheit` dictionary 
fahrenheit = {'t1': -30,'t2': -20,'t3': -10,'t4': 0}

# Get the corresponding `celsius` values and create the new dictionary
celsius_dict = {k: (float(5)/9)*(v-32) for (k, v) in fahrenheit.items()}

print(celsius_dict)

{'t1': -34.44444444444444, 't2': -28.88888888888889, 't3': -23.333333333333336, 't4': -17.77777777777778}


In [63]:
# Adding Conditionals to Dictionary Comprehension
dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

# Keep only items whose value is greater than 2
dict1_cond = {k:v for (k,v) in dict1.items() if v>2}

print(dict1_cond)

{'c': 3, 'd': 4, 'e': 5}


In [64]:
# Multiple If Conditions
dict1_doubleCond = {k:v for (k,v) in dict1.items() if v>2 if v%2 == 0}
print(dict1_doubleCond)

{'d': 4}


In [65]:
dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f':6}

dict1_tripleCond = {k:v for (k,v) in dict1.items() if v>2 if v%2 == 0 if v%3 == 0}

print(dict1_tripleCond)

{'f': 6}


In [66]:
dict1_tripleCond = {}

for (k,v) in dict1.items():
    if v > 2 and v % 2 == 0 and v % 3 == 0:
        dict1_tripleCond[k] = v

print(dict1_tripleCond)

{'f': 6}


In [67]:
# If-Else Conditions
dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f':6}

# Identify odd and even entries
dict1_evenOdd = {k: ('even' if v % 2 == 0 else 'odd') for (k, v) in dict1.items()}

print(dict1_evenOdd)

{'a': 'odd', 'b': 'even', 'c': 'odd', 'd': 'even', 'e': 'odd', 'f': 'even'}


In [68]:
# Nested Dictionary Comprehension
nested_dict = {'first':{'a':1}, 'second':{'b':2}}
float_dict = {outer_k: {inner_k: float(inner_v) for (inner_k, inner_v) in outer_v.items()} for (outer_k, outer_v) in nested_dict.items()}
print(float_dict)

{'first': {'a': 1.0}, 'second': {'b': 2.0}}


In [69]:
nested_dict = {'first':{'a':1}, 'second':{'b':2}}

for (outer_k, outer_v) in nested_dict.items():
    for (inner_k, inner_v) in outer_v.items():
        outer_v.update({inner_k: float(inner_v)})
    nested_dict.update({outer_k: outer_v})

print(nested_dict)

{'first': {'a': 1.0}, 'second': {'b': 2.0}}


### Tuple
- Tuples generally use slightly less memory than lists holding the same items, and they can be slightly faster to create. If you have data that is not meant to be changed in the first place, a tuple is usually a better choice than a list.
- It can therefore make sense to use a tuple instead of a list when defining a constant collection of values that you only ever iterate over. If you need a sequence of elements to be used as a dictionary key, you can use a tuple (as long as its items are hashable).
- Tuples are ordered sequences of items, just like lists. The main difference between tuples and lists is that tuples cannot be changed (immutable), whereas lists can (mutable).
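A short sketch of these points: item assignment on a tuple raises a TypeError, a tuple of hashable items can be a dictionary key, and sys.getsizeof gives a rough (CPython-specific) size comparison. The values are illustrative.

```python
import sys

point = (3, 4)

# Tuples cannot be changed in place
try:
    point[0] = 5
except TypeError as err:
    print(err)  # 'tuple' object does not support item assignment

# Because their items are hashable, tuples like this can be dictionary keys
distances = {point: 5.0}
print(distances[(3, 4)])  # 5.0

# In CPython, a tuple is typically a bit smaller than a list with the same items
print(sys.getsizeof((1, 2, 3)), sys.getsizeof([1, 2, 3]))
```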

In [70]:
# Convert Python Tuple to Dictionary
tup = ((11, "eleven"), (21, "mike"), (19, "dustin"), (46, "caleb"))
print(tup)

dct = dict((y, x) for x, y in tup)
print(dct)

((11, 'eleven'), (21, 'mike'), (19, 'dustin'), (46, 'caleb'))
{'eleven': 11, 'mike': 21, 'dustin': 19, 'caleb': 46}


In [71]:
# Using the dict(), map() and reversed() functions
tup = ((11, "eleven"), (21, "mike"), (19, "dustin"), (46, "caleb"))
print(tup)

dct = dict(map(reversed, tup))
print(dct)

((11, 'eleven'), (21, 'mike'), (19, 'dustin'), (46, 'caleb'))
{'eleven': 11, 'mike': 21, 'dustin': 19, 'caleb': 46}


In [72]:
# Convert a list of Tuples into Dictionary
def conversion(tup, dct):
    # Group the second item of each pair under the first item as key
    for x, y in tup:
        dct.setdefault(x, []).append(y)
    return dct


tups = [("Boba", 21), ("Din", 19), ("Grogu", 46), ("Ahsoka", 11)]

dictionary = {}
print(conversion(tups, dictionary))

{'Boba': [21], 'Din': [19], 'Grogu': [46], 'Ahsoka': [11]}
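The setdefault pattern above can also be written with collections.defaultdict from the standard library, which creates the empty list automatically. In this sketch the sample data adds a second 'Boba' entry purely for illustration, to show values being grouped under one key.

```python
from collections import defaultdict

tups = [("Boba", 21), ("Din", 19), ("Grogu", 46), ("Boba", 30)]

# defaultdict(list) creates an empty list the first time a key is seen
grouped = defaultdict(list)
for name, value in tups:
    grouped[name].append(value)

print(dict(grouped))  # {'Boba': [21, 30], 'Din': [19], 'Grogu': [46]}
```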
