### Dictionaries and Sets

Sequence data types like lists, tuples and strings are ordered. Ordering can be useful in some cases, such as when your data is sorted or has some other natural sense of order, but it comes at a price. When you search through a sequence like a list, your computer has to go through each element one at a time to find the object you're looking for.

Consider the following code:

In [3]:
my_list = [1,2,3,4,5,6,7,8,9,10]

0 in my_list

False

When running the code above, Python has to search through the entire list, one item at a time, before it can report that 0 is not in the list. This sequential searching isn't much of a concern with small lists like this one, but if you're working with data that contains thousands or millions of values, the cost adds up quickly.
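To make the cost of sequential searching concrete, here is a minimal sketch that counts how many elements get checked during a search. The helper `linear_search_steps` is a hypothetical function written for this illustration; it mimics what the `in` operator does conceptually on a list, not how Python implements it internally.

```python
def linear_search_steps(items, target):
    """Return (found, steps): whether target is in items and how many elements were checked."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            return True, steps   # stop as soon as the target is found
    return False, steps          # every element was checked without a match

my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(linear_search_steps(my_list, 0))   # (False, 10): all 10 elements were checked
print(linear_search_steps(my_list, 3))   # (True, 3): the search stopped at the third element
```

The worst case, a value that isn't present, always costs one check per element, so the work grows in step with the length of the list.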

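Dictionaries, which map keys to values, and sets, which store unique values, are Python data structures that solve this issue using a technique called hashing. We won't go into the details of their implementation, but dictionaries and sets let you check whether they contain an object without searching through each element one at a time, at the cost of using a bit more system memory and giving up access by position. Sets are unordered; dictionaries have preserved insertion order since Python 3.7, but you still look items up by key rather than by index.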
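The rough idea behind hashing can be sketched with a toy example. This is a simplified illustration, not how Python's dictionaries and sets are actually implemented: the names `NUM_BUCKETS`, `make_buckets` and `toy_contains` are made up for this sketch. Each value is placed in a bucket chosen by its hash, so a lookup only has to inspect one small bucket instead of the whole collection.

```python
NUM_BUCKETS = 8  # arbitrary size chosen for this illustration

def make_buckets(values):
    """Distribute values into buckets based on their hash."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for value in values:
        index = hash(value) % NUM_BUCKETS     # the hash decides which bucket to use
        if value not in buckets[index]:       # skip duplicates, like a set would
            buckets[index].append(value)
    return buckets

def toy_contains(buckets, target):
    """Check membership by looking only in the bucket the target would live in."""
    index = hash(target) % NUM_BUCKETS
    return target in buckets[index]

buckets = make_buckets(range(1, 11))

print(toy_contains(buckets, 0))   # False: only the bucket for hash(0) is inspected
print(toy_contains(buckets, 7))   # True
```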

In [7]:
my_dict = {"name": "Joe",
           "age": 10, 
           "city": "Paris"}

print(my_dict)

{'name': 'Joe', 'age': 10, 'city': 'Paris'}


In [9]:
my_dict["name"]

'Joe'

In [11]:
my_dict["new_key"] = "new_value"

print(my_dict)

{'name': 'Joe', 'age': 10, 'city': 'Paris', 'new_key': 'new_value'}


In [14]:
del my_dict["new_key"]

print(my_dict)

{'name': 'Joe', 'age': 10, 'city': 'Paris'}


In [16]:
len(my_dict)

3

In [18]:
"name" in my_dict

True

In [22]:
my_dict.keys()

dict_keys(['name', 'age', 'city'])

In [24]:
my_dict.values()

dict_values(['Joe', 10, 'Paris'])

In [26]:
my_dict.items()

dict_items([('name', 'Joe'), ('age', 10), ('city', 'Paris')])
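The methods above are most often used in loops. The sketch below iterates over the key and value pairs of the same dictionary and shows one way to look up a key that may not exist. The key `"country"` and the default `"unknown"` are made up for this example.

```python
my_dict = {"name": "Joe", "age": 10, "city": "Paris"}

# items() yields (key, value) pairs, which can be unpacked in the loop header
for key, value in my_dict.items():
    print(key, "->", value)

# Indexing with a missing key, as in my_dict["country"], raises a KeyError.
# get() returns a default value instead of raising.
country = my_dict.get("country", "unknown")
print(country)   # unknown
```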

Real-world data often comes in the form of tables of rows and columns, where each column specifies a different data feature like name or age and each row represents an individual record. We can encode this sort of tabular data in a dictionary by using each column label as a key and storing that column's values as a list.

Consider the following table:

| name  | age | city     |
|-------|-----|----------|
| Joe   | 10  | Paris    |
| Bob   | 15  | New York |
| Harry | 20  | Tokyo    |

In [32]:
my_table = {
    "name": ["Joe", "Bob", "Harry"],
    "age": [10, 15, 20],
    "city": ["Paris", "New York", "Tokyo"],
}

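With the table stored this way, a column is a single dictionary lookup, while a row has to be assembled by taking the same position from every column. A minimal sketch follows; `get_row` is a hypothetical helper written for this example, not a built-in.

```python
my_table = {
    "name": ["Joe", "Bob", "Harry"],
    "age": [10, 15, 20],
    "city": ["Paris", "New York", "Tokyo"],
}

# A column is just the list stored under that column's key
ages = my_table["age"]
print(ages)   # [10, 15, 20]

def get_row(table, index):
    """Build one record by taking the value at the same position from every column."""
    return {column: values[index] for column, values in table.items()}

print(get_row(my_table, 1))   # {'name': 'Bob', 'age': 15, 'city': 'New York'}
```

This layout assumes every column list has the same length and that position `i` in each list belongs to the same record.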

### Sets
Sets are unordered, mutable collections of immutable (more precisely, hashable) objects that cannot contain duplicates. Sets are useful for storing and performing operations on data where each value is unique. Create a set with a comma-separated sequence of values within curly braces:

In [35]:
my_set = {1,2,3,4,5,6,7}

type(my_set)

set

In [37]:
my_set.add(8)

my_set

{1, 2, 3, 4, 5, 6, 7, 8}

In [39]:
my_set.remove(7)

my_set

{1, 2, 3, 4, 5, 6, 8}

In [41]:
6 in my_set

True
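The properties described at the start of this section (no duplicates, and elements that must be immutable) can be checked directly. The sketch below also contrasts `remove()` with `discard()`, which behaves the same way except that it does nothing when the element is missing. The values used here are arbitrary.

```python
my_set = {1, 2, 2, 3}
print(len(my_set))   # 3: the duplicate 2 is stored only once

# Lists are mutable and therefore not hashable, so they cannot be set elements
try:
    my_set.add([4, 5])
except TypeError as error:
    print("Cannot add a list:", error)

# A tuple of integers is hashable, so it works
my_set.add((4, 5))

# discard() silently ignores a missing element; remove() raises a KeyError
my_set.discard(99)
try:
    my_set.remove(99)
except KeyError:
    print("remove() raises KeyError for a missing element")
```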

One of the main purposes of sets is to perform set operations that compare or combine different sets. Python sets support many common mathematical set operations like union, intersection, difference and checking whether one set is a subset of another:

In [44]:
set1 = {1,3,5,6}
set2 = {1,2,3,4}

set1.union(set2)          # Get the union of two sets

{1, 2, 3, 4, 5, 6}

In [46]:
set1.intersection(set2)   # Get the intersection of two sets

{1, 3}

In [48]:
set1.difference(set2)     # Get the elements of set1 that are not in set2

{5, 6}

In [50]:
set1.issubset(set2)       # Check whether every element of set1 is in set2

False
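The same operations are also available as operators, which some people find easier to read. The sketch below repeats the examples above with operators and adds two cases to show that difference depends on the order of the operands and what a successful subset check looks like.

```python
set1 = {1, 3, 5, 6}
set2 = {1, 2, 3, 4}

print(set1 | set2 == set1.union(set2))          # True: | is union
print(set1 & set2 == set1.intersection(set2))   # True: & is intersection
print(set1 - set2 == set1.difference(set2))     # True: - is difference

# Difference is not symmetric: swapping the operands gives a different result
print(set2 - set1 == {2, 4})                    # True

# <= is the subset test
print(set1 <= set2)      # False: 5 and 6 are not in set2
print({1, 3} <= set2)    # True: both elements are in set2
```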

You can convert a list into a set using the set() function. Converting a list to a set drops any duplicate elements in the list. This can be a useful way to strip unwanted duplicate items or count the number of unique elements in a list. It can also be useful to convert a list to a set if you plan to look up items repeatedly, since membership lookups are faster with sets than with lists.

In [53]:
my_list = [1,2,2,2,3,3,4,5,5,5,6]

set(my_list)

{1, 2, 3, 4, 5, 6}
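The uses mentioned above can be sketched as follows. Because a set does not guarantee any order, the example also shows `dict.fromkeys()` as one way to drop duplicates while keeping the original order of first appearance. The `wanted` list is made up for this example.

```python
my_list = [1, 2, 2, 2, 3, 3, 4, 5, 5, 5, 6]

# Count the unique elements
unique_count = len(set(my_list))
print(unique_count)   # 6

# Drop duplicates but keep first-appearance order (dictionary keys are unique
# and dictionaries preserve insertion order in Python 3.7+)
unique_in_order = list(dict.fromkeys(my_list))
print(unique_in_order)   # [1, 2, 3, 4, 5, 6]

# Build the set once, then reuse it for repeated membership checks
lookup = set(my_list)
wanted = [2, 7, 5]
found = [value for value in wanted if value in lookup]
print(found)   # [2, 5]
```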