#Data Structures: Lists, Tuples, and Sets

In previous chapters, we have discussed Python’s basic data types: booleans, integers, floats, and strings. If you think of those as atoms, the data structures in this chapter are like molecules. That is, we combine those basic types in more complex ways. You will use these every day.

This chapter introduces a new concept: data structures. A data structure is a collection of data elements (such as numbers or characters, or even other data structures) that is structured in some way, such as by numbering the elements. The most basic data structure in Python is the **sequence** (e.g., string, list and tuple). Each element of a sequence is assigned a number — its position, or index. The first index is zero, the second index is one, and so forth. Some programming languages number their sequence elements starting with one, but the zero-indexing convention has a natural interpretation of an offset from the beginning of the sequence, with negative indexes wrapping around to the end. 

This chapter begins with an overview of sequences and then covers some operations that are common to all sequences, including lists and tuples. These operations also work with strings, which are used in some of the examples; a full treatment of string operations follows in the next chapter. After these basics, we turn to lists and see what’s special about them. After lists come tuples, a special-purpose type of sequence similar to lists, except that you can’t change them. The chapter closes with sets, which are collections but not sequences: they are unordered and support neither indexing nor slicing.

Python has several built-in types of sequences. This chapter concentrates on two of the most common ones: lists and tuples. Strings are another important type, which we will also briefly discuss here.

The main difference between lists and tuples is that you can change a list, but you can’t change a tuple. This means a list might be useful if you need to add elements as you go along, while a tuple can be useful if, for some reason, you can’t allow the sequence to change. Reasons for the latter are usually rather technical, having to do with how things work internally in Python. That’s why you may see built-in functions returning tuples. For your own programs, chances are you can use lists instead of tuples in almost all circumstances. (One notable exception is using tuples as dictionary keys. There lists aren’t allowed, because you aren’t allowed to modify keys.)

Sequences are useful when you want to work with a collection of values. You might have a sequence representing a student in a database, with the first element being the student’s name and the second the student’s grade. Written as a list (the items of a list are separated by commas and enclosed in square brackets), that would look like this:

In [None]:
tom = ['Tom', 22]
john = ['John', 21]

But sequences can contain other sequences, too, so you could make a list of such students, which would be your student record database.

In [None]:
stuRecordDB = [tom, john]
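
As a brief sketch of how such a nested structure can be read back, the snippet below rebuilds the same two records and chains index operations (indexing itself is covered in the next section). The names `first_student` and `john_grade` are illustrative and not part of the original example.

```python
tom = ['Tom', 22]
john = ['John', 21]
stuRecordDB = [tom, john]

# The first index selects a record; a second index selects a field within it.
first_student = stuRecordDB[0]
john_grade = stuRecordDB[1][1]

print(first_student)  # ['Tom', 22]
print(john_grade)     # 21
```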

##Common Sequence Operations

There are certain things you can do with all sequence types. These operations include indexing, slicing, adding, multiplying, and checking for membership. In addition, Python has built-in functions for finding the length of a sequence and for finding its largest and smallest elements.

###Indexing

All elements in a sequence are numbered—from zero and upward. You can access them individually with a number, like this:

In [None]:
greeting = 'Hello'
greeting[0]

'H'

**Note:** A string is just a sequence of characters. The index 0 refers to the first element, in this case the letter H. Unlike some other languages, Python has no separate character type; a character is just a single-element string.

This is called indexing. You use an index to fetch an element. All sequences can be indexed in this way. When you use a negative index, Python counts from the right, that is, from the last element. The last element is at position -1.

In [None]:
greeting[-1]

'o'

String literals (and other sequence literals, for that matter) may be indexed directly, without using a variable to refer to them. The effect is exactly the same.

In [None]:
'Hello'[1]

'e'
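
Two details are implied by the examples above: a negative index `-k` refers to the same element as `len(s) - k`, and an index outside the valid range raises an IndexError. The following is a minimal sketch; the variable names are illustrative, and the try/except is only there so the snippet runs to completion.

```python
greeting = 'Hello'

# -1 and len(greeting) - 1 name the same position.
last_by_negative = greeting[-1]
last_by_length = greeting[len(greeting) - 1]
print(last_by_negative, last_by_length)  # o o

# Valid indexes run from -len(greeting) to len(greeting) - 1.
try:
    greeting[5]
except IndexError as err:
    error_message = str(err)
    print('IndexError:', error_message)
```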

###Slicing

Just as you use indexing to access individual elements, you can use slicing to access ranges of elements. You do this by using two indices, separated by a colon. The first index is inclusive, and the second index is exclusive.

In [None]:
tag = "My PC IP is: 10.0.0.1"
ipStr = tag[13:21]
print(ipStr)

10.0.0.1


In [2]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(numbers[-3:-1])
print(numbers[-3:])
print(numbers[:3])
print(numbers[:])

[8, 9]
[8, 9, 10]
[1, 2, 3]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
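
Slices are more forgiving than single indexes. The sketch below, using illustrative variable names, shows that out-of-range bounds are clipped, that a start to the right of the end gives an empty result, and that a full slice produces a new list rather than another name for the same one.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Out-of-range slice bounds are clipped rather than raising IndexError.
tail = numbers[7:100]
print(tail)  # [8, 9, 10]

# If the start is not to the left of the end, the result is empty.
empty = numbers[5:2]
print(empty)  # []

# A full slice returns a new list with the same elements (a shallow copy).
copy_of_numbers = numbers[:]
print(copy_of_numbers == numbers)  # True
print(copy_of_numbers is numbers)  # False
```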


###Longer Steps

When slicing, you specify (either explicitly or implicitly) the start and end points of the slice. Another parameter, which normally is left implicit, is the step length. In a regular slice, the step length is one, which means that the slice “moves” from one element to the next, returning all the elements between the start and end.

In [3]:
numbers[0:10:1]

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [None]:
numbers[0:10:2]

[1, 3, 5, 7, 9]

You can still use the shortcuts mentioned earlier. For example, if you want every fourth element of a sequence, you need to supply only a step size of four.

In [None]:
numbers[::4]

[1, 5, 9]

The step size cannot be zero, but it can be negative. In that case the slice is extracted from right to left, so the start index must be larger than the end index.

In [None]:
numbers[10:0:-2]

[10, 8, 6, 4, 2]
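
A few more step-related cases, sketched with illustrative variable names: reversing a sequence with a step of -1, combining explicit bounds with a negative step, and the error raised for a step of zero.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# With a negative step and no bounds, the slice covers the whole
# sequence from right to left.
reversed_numbers = numbers[::-1]
print(reversed_numbers)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

# Start at index 8 (value 9) and stop before index 2 (value 3).
every_third_backwards = numbers[8:2:-3]
print(every_third_backwards)  # [9, 6]

# A step of zero is not allowed.
try:
    numbers[::0]
except ValueError as err:
    step_error = str(err)
    print('ValueError:', step_error)
```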

###Adding Sequences (concatenation)

Sequences can be concatenated with the addition (plus) operator.

In [None]:
[1, 2, 3] + [4, 5,  6]

[1, 2, 3, 4, 5, 6]

In [None]:
'Hello,' + 'world!'

'Hello,world!'

In [None]:
[1, 2, 3] + 'world!'

TypeError: can only concatenate list (not "str") to list

In [None]:
[1, 2, 3] + ['world!']

[1, 2, 3, 'world!']
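
The TypeError above arises because `+` only joins sequences of the same type. The sketch below contrasts two ways to combine a list with a string, depending on whether the string should be added as one item or as individual characters. It uses the `list` function, which is introduced later in this chapter; the variable names are illustrative.

```python
numbers = [1, 2, 3]

# Wrap the string in a list to add it as a single element.
as_one_item = numbers + ['world!']
print(as_one_item)  # [1, 2, 3, 'world!']

# Convert the string to a list to add its characters one by one.
as_characters = numbers + list('hi')
print(as_characters)  # [1, 2, 3, 'h', 'i']

# Concatenation builds a new list; the operands are left unchanged.
print(numbers)  # [1, 2, 3]
```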

###Multiplication

Multiplying a sequence by a number x creates a new sequence where the original sequence is repeated x times:

In [None]:
'Python' * 5

'PythonPythonPythonPythonPython'

In [None]:
[170] * 10

[170, 170, 170, 170, 170, 170, 170, 170, 170, 170]

###None, Empty Lists, and Initialization

An empty list is simply written as two brackets ([])—there’s nothing in it. If you want to have a list with room for ten elements but with nothing useful in it, you could use [0]*10. You now have a list with ten zeros in it. Sometimes, however, you would like a value that somehow means “nothing,” as in “we haven’t put anything here yet.” That’s when you use None. None is a Python value and means exactly that—“nothing here.” So if you want to initialize a list of length 10, you could do the following:

In [None]:
sequence = [None] * 10
print(sequence)
print(len(sequence))

[None, None, None, None, None, None, None, None, None, None]
10
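
One caveat is worth a short sketch: repetition copies references, not objects. That is harmless for values like None or 0, but it matters when the repeated element is itself a list. The names `shared_rows` and `independent_rows` are illustrative, and the list comprehension is a construct this chapter does not cover.

```python
# The outer list holds two references to the same inner list,
# so a change made through one row shows up in the other.
shared_rows = [[0] * 3] * 2
shared_rows[0][0] = 1
print(shared_rows)  # [[1, 0, 0], [1, 0, 0]]

# Building each row separately gives independent inner lists.
independent_rows = [[0] * 3 for _ in range(2)]
independent_rows[0][0] = 1
print(independent_rows)  # [[1, 0, 0], [0, 0, 0]]
```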


###Membership

To check whether an item can be found in a sequence, you use the **in** operator. If the item is found, it returns the boolean value True; otherwise, it returns False.

In [None]:
permissions = 'rw'
'w' in permissions

True

In [None]:
'x' in permissions

False
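
The `in` operator behaves slightly differently for lists and strings, as the following sketch suggests. The sample values and variable names are made up for illustration.

```python
# For lists, `in` compares the item against each whole element.
users = ['mlh', 'foo', 'bar']
exact_match = 'foo' in users
partial_match = 'fo' in users
print(exact_match)    # True
print(partial_match)  # False: no element equals 'fo'

# For strings, `in` also accepts substrings longer than one character.
subject = '$$$ Get rich now!!! $$$'
has_dollars = '$$$' in subject
print(has_dollars)    # True

# `not in` is the negated form.
lacks_execute = 'x' not in 'rw'
print(lacks_execute)  # True
```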

###Length, Minimum, and Maximum

The built-in functions len, min, and max can be quite useful. The function len returns the number of elements a sequence contains. min and max return the smallest and largest elements of the sequence, respectively. 

In [None]:
numbers = [377, 170, 477]
len(numbers)

3

In [None]:
max(numbers)

477

In [None]:
min(numbers)

170

In [None]:
max(377, 170, 477)

477

In [None]:
min(377, 170, 477)

170
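
These functions are not limited to lists of numbers. The sketch below applies `min` and `max` to a string and shows what happens with an empty sequence; the `default` keyword argument is a standard parameter of both functions, and the variable names are illustrative.

```python
# min and max work on any sequence whose elements can be compared.
# Characters are compared by code point, so uppercase sorts before lowercase.
largest_char = max('Hello')
smallest_char = min('Hello')
print(largest_char)   # o
print(smallest_char)  # H

# On an empty sequence they raise ValueError unless a default is given.
try:
    max([])
except ValueError as err:
    print('ValueError:', err)

fallback = max([], default=None)
print(fallback)  # None
```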

##Lists

In the discussion above, we have seen how useful lists are. Next, we discuss what makes lists different from tuples and strings: lists are mutable — that is, you can change their contents — and they have many useful specialized methods.

###The list Function

Because strings can’t be modified in the same way as lists, sometimes it can be useful to create a list from a string. You can do this with the **list** function.

In [None]:
myList = list('Hello')
print(myList)

['H', 'e', 'l', 'l', 'o']


To convert a list of characters, such as the one produced by the preceding code, back to a string, you would use the following expression:

In [None]:
myStr = ''.join(myList)
print(myStr)

Hello
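
Putting the two conversions together gives a simple way to "edit" a string. This is only a sketch with illustrative names; it uses item assignment, which is explained in the next section.

```python
# Strings are immutable, so edit a list of characters and join it back.
chars = list('Hello')
chars[0] = 'J'
edited = ''.join(chars)
print(edited)  # Jello

# The string before .join is placed between the elements.
dashed = '-'.join(chars)
print(dashed)  # J-e-l-l-o

# join only accepts strings, so other values must be converted first.
numbers_as_text = ', '.join(str(n) for n in [1, 2, 3])
print(numbers_as_text)  # 1, 2, 3
```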


##Basic List Operations

You can perform all the standard sequence operations on lists, such as indexing, slicing, concatenating, and multiplying. But the interesting thing about lists is that they can be modified. Lists can be changed through item assignment, item deletion, slice assignment, and list methods; this section covers item assignment and some of the list methods. (Note that not all list methods actually change their list.)

###Changing Lists: Item Assignments

Changing a list is easy. You just use ordinary assignment as explained in Chapter 1. However, instead of writing something like x = 2, you use the indexing notation to assign to a specific, existing position, such as x[1] = 2.

In [None]:
x = [1,1,1]
x[1] = 2
print(x)

[1, 2, 1]
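
The word "existing" matters here. The sketch below shows that assigning to a position beyond the end of the list fails, and that strings reject item assignment altogether. The try/except blocks are only there so the snippet runs to completion.

```python
x = [1, 1, 1]

# Assignment only works for positions that already exist.
try:
    x[3] = 2
except IndexError as err:
    print('IndexError:', err)

# Negative indexes work for assignment too.
x[-1] = 9
print(x)  # [1, 1, 9]

# Strings do not support item assignment.
greeting = 'Hello'
try:
    greeting[0] = 'J'
except TypeError as err:
    print('TypeError:', err)
```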


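Item assignment is not limited to lists. Dictionaries, which map keys to values, use the same bracket notation with a key in place of a position. The following cell uses it to build a dictionary that maps host names to IP addresses (it needs network access, and the addresses returned will vary):

In [None]:
import socket

# Host names to resolve.
websiteList = ['www.google.com', 'www.ilstu.edu', 'www.whitehouse.gov']
webSiteDNSDict = {}

# Resolve and print only the .com sites.
for site in websiteList:
    if '.com' in site:
        ip = socket.gethostbyname(site)
        print(site + ":" + ip)

# Resolve every site and store the result with item assignment.
for site in websiteList:
    ip = socket.gethostbyname(site)
    print(site + ":" + ip)
    webSiteDNSDict[site] = ip

print(webSiteDNSDict)

# Iterate over (key, value) pairs.
for item in webSiteDNSDict.items():
    print(item)

# Iterate over the keys and look up each value.
for key in webSiteDNSDict.keys():
    print(key)
    value = webSiteDNSDict.get(key)
    print(value)

# Iterate over the values only.
for value in webSiteDNSDict.values():
    print(value)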


www.google.com:172.217.203.104
www.google.com:172.217.203.104
www.ilstu.edu:138.87.50.5
www.whitehouse.gov:23.67.87.70
{'www.google.com': '172.217.203.104', 'www.ilstu.edu': '138.87.50.5', 'www.whitehouse.gov': '23.67.87.70'}
('www.google.com', '172.217.203.104')
('www.ilstu.edu', '138.87.50.5')
('www.whitehouse.gov', '23.67.87.70')
www.google.com
172.217.203.104
www.ilstu.edu
138.87.50.5
www.whitehouse.gov
23.67.87.70
172.217.203.104
138.87.50.5
23.67.87.70


###Adding Values to Lists with the append() and insert() Methods

To add new values to a list, use the append() and insert() methods. Enter the following into the interactive shell to call the append() method on a list value stored in the variable pet:

In [22]:
pet = ['cat', 'dog']
pet.append('bird')
pet

['cat', 'dog', 'bird']

The previous append() method call adds the argument to the end of the list. The insert() method can insert a value at any index in the list. The first argument to insert() is the index for the new value, and the second argument is the new value to be inserted. Enter the following into the interactive shell:

In [23]:
pet.insert(1, 'rabbit')
pet

['cat', 'rabbit', 'dog', 'bird']
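
Two behaviors of these methods are easy to overlook, and the following sketch illustrates them: both change the list in place and return None, and insert() accepts an index past the end of the list. The extra values and the name `result` are illustrative.

```python
pet = ['cat', 'dog']

# append and insert change the list in place and return None,
# so their result should not be assigned back to the variable.
result = pet.append('bird')
print(result)  # None
print(pet)     # ['cat', 'dog', 'bird']

# An index past the end makes insert behave like append.
pet.insert(100, 'fish')
print(pet)     # ['cat', 'dog', 'bird', 'fish']

# Index 0 inserts at the front.
pet.insert(0, 'rabbit')
print(pet)     # ['rabbit', 'cat', 'dog', 'bird', 'fish']
```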

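###Removing Values from Lists with the remove() Method

The remove() method takes the value to be removed and deletes its first occurrence from the list. If the value is not in the list, Python raises a ValueError.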

In [24]:
pet.remove('cat')
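
A short sketch of how remove() behaves with duplicates and with missing values. The list contents here are made up for illustration, and the membership check is one possible way to avoid the error.

```python
pet = ['cat', 'rabbit', 'dog', 'cat']

# remove deletes only the first matching value.
pet.remove('cat')
print(pet)  # ['rabbit', 'dog', 'cat']

# Check membership first when the value may be missing.
if 'horse' in pet:
    pet.remove('horse')

# Without the check, a missing value raises ValueError.
try:
    pet.remove('horse')
except ValueError as err:
    print('ValueError:', err)
```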

##Tuple

A tuple is an ordered, indexed collection of objects. Tuples are sequences, just like lists, but they differ in two ways: tuples are written with parentheses (), whereas lists use square brackets [], and, more importantly, tuples are immutable. Once a tuple has been created, its values cannot be modified, appended to, or removed.

In [19]:
tup = (1, 2, 3, 4, 5)
tup

(1, 2, 3, 4, 5)

In [None]:
#The empty tuple is written as two parentheses containing nothing
tup = ()

In [20]:
#To write a tuple containing a single value you have to include a comma, even though there is only one value
tup = (123,)
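
The following sketch gathers a few consequences of the points above: tuples can be indexed and sliced but not assigned to, the trailing comma is what makes a one-element tuple, and tuples can be used as dictionary keys. The `grades` dictionary and its contents are hypothetical.

```python
tup = (1, 2, 3)

# Tuples support indexing and slicing like any other sequence...
print(tup[0], tup[1:])  # 1 (2, 3)

# ...but not item assignment.
try:
    tup[0] = 10
except TypeError as err:
    print('TypeError:', err)

# Without the trailing comma, the parentheses only group an expression.
not_a_tuple = (123)
one_element = (123,)
print(type(not_a_tuple).__name__)  # int
print(type(one_element).__name__)  # tuple

# Because tuples are immutable, they can serve as dictionary keys.
grades = {('Tom', 'math'): 22}
print(grades[('Tom', 'math')])  # 22
```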

We can convert between the list and tuple types with the list() and tuple() functions.
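
A minimal sketch of such a round trip, with illustrative variable names. Each conversion creates a new object, so the original is left untouched.

```python
pets_list = ['cat', 'dog', 'bird']

# tuple() builds an immutable copy of the list.
pets_tuple = tuple(pets_list)
print(pets_tuple)  # ('cat', 'dog', 'bird')

# list() builds a mutable copy that can then be changed.
editable = list(pets_tuple)
editable.append('fish')
print(editable)    # ['cat', 'dog', 'bird', 'fish']

# The original list and the tuple are unaffected.
print(pets_list)   # ['cat', 'dog', 'bird']
```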

##Set
A set is an unordered collection of items. Every set element is unique (no duplicates) and must be immutable (cannot be changed).

However, a set itself is mutable. We can add or remove items from it.

Sets can also be used to perform mathematical set operations like union, intersection, symmetric difference, etc.

##Creating Python Sets
A set is created by placing all the items (elements) inside curly braces {}, separated by commas, or by using the built-in set() function.

It can have any number of items, and they may be of different types (integer, float, tuple, string, etc.). But a set cannot have mutable elements such as lists, sets, or dictionaries.

In [4]:
# Different types of sets in Python

aSet = {3.14, "Hello", (1, 2, 3)}
print(aSet)

{'Hello', 3.14, (1, 2, 3)}
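
To make the restriction on mutable elements concrete, the sketch below tries to put a list into a set, then uses a tuple instead. It also shows duplicates being dropped when a set is built from a string. The variable names are illustrative.

```python
# Lists are mutable (unhashable), so they cannot be set elements.
try:
    bad = {[1, 2, 3]}
except TypeError as err:
    print('TypeError:', err)

# Converting the list to a tuple makes it acceptable.
ok = {tuple([1, 2, 3])}
print(ok)  # {(1, 2, 3)}

# Duplicates are dropped when a set is built: 'l' appears only once.
letters = set('hello')
print(len(letters))  # 4
```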


Creating an empty set is a bit tricky.

Empty curly braces {} will make an empty dictionary in Python. To make a set without any elements, we use the set() function without any argument.

In [None]:
# Distinguish set and dictionary while creating empty set

# initialize a with {}
a = {}

# check data type of a
print(type(a))

# initialize a with set()
a = set()

# check data type of a
print(type(a))

<class 'dict'>
<class 'set'>

###Set Operations

In [5]:
# Intersection 
# It's equivalent to use the intersection() function or the symbol operator &
{1, 2, 3, 4, 5}.intersection({3, 4, 5, 6}) # {3, 4, 5}
{1, 2, 3, 4, 5} & {3, 4, 5, 6} # {3, 4, 5}

{3, 4, 5}

In [6]:
# Union
{1, 2, 3, 4, 5}.union({3, 4, 5, 6}) # {1, 2, 3, 4, 5, 6}
{1, 2, 3, 4, 5} | {3, 4, 5, 6} # {1, 2, 3, 4, 5, 6}

{1, 2, 3, 4, 5, 6}

In [7]:
# Difference
{1, 2, 3, 4}.difference({2, 3, 5}) # {1, 4}
{1, 2, 3, 4} - {2, 3, 5} # {1, 4}

{1, 4}

In [8]:
# Symmetric difference (elements in exactly one of the two sets)
{1, 2, 3, 4}.symmetric_difference({2, 3, 5}) # {1, 4, 5}
{1, 2, 3, 4} ^ {2, 3, 5} # {1, 4, 5}

{1, 4, 5}

In [9]:
# Superset check
{1, 2}.issuperset({1, 2, 3}) # False
{1, 2} >= {1, 2, 3} # False

False

In [10]:
# Superset check
{1, 2, 3}.issuperset({1, 2}) # True
{1, 2, 3} >= {1, 2} # True

True

In [11]:
# Subset check
{1, 2}.issubset({1, 2, 3}) # True
{1, 2} <= {1, 2, 3} # True

True

In [13]:
# Disjoint check
{1, 2}.isdisjoint({3, 4}) # True

True

In [14]:
# Disjoint check
{1, 2}.isdisjoint({1, 4}) # False

False

In [None]:
# Existence or membership check
2 in {1,2,3} # True
4 in {1,2,3} # False
4 not in {1,2,3} # True

In [None]:
# Add and Remove
s = {1,2,3}
s.add(4) # s == {1,2,3,4}
s.discard(3) # s == {1,2,4}
s.discard(5) # s == {1,2,4}
s.remove(2) # s == {1,4}
s.remove(2) # KeyError!

In [16]:
s = {1, 2}
s.update({3, 4}) # s == {1, 2, 3, 4}

###Get the unique elements of a list

In [18]:
#Get unique numbers
aListOfCourses = [170, 254, 377, 170, 497]
uniqueCourses = set(aListOfCourses)
print(uniqueCourses)

{170, 254, 377, 497}
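
Because a set is unordered, converting a list to a set discards the original order. When order matters, two common options can be sketched as follows: sort the unique values, or keep the first occurrence of each value with `dict.fromkeys`. The names `sortedCourses` and `firstSeenCourses` and the second sample list are illustrative.

```python
aListOfCourses = [170, 254, 377, 170, 497]

# sorted() accepts a set and returns a list in ascending order.
sortedCourses = sorted(set(aListOfCourses))
print(sortedCourses)  # [170, 254, 377, 497]

# dict.fromkeys keeps the first occurrence of each value in its original
# position (dictionaries preserve insertion order in Python 3.7+).
firstSeenCourses = list(dict.fromkeys([377, 170, 377, 254, 170]))
print(firstSeenCourses)  # [377, 170, 254]
```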