# 1.3. Collection Data Types

Having data items in collections makes it much easier to perform operations that must be applied to all of the items, and also makes it easier to handle collections of items read in from files.

## Main Collection Data Types



### Lists

**Lists** are Python’s most flexible ordered collection object type. Unlike strings, lists can contain any sort of object: numbers, strings, and even other lists. Also, unlike strings, lists may be changed in place by assignment to offsets and slices, list method calls, deletion statements, and more—they are *mutable* objects.

Python lists do the work of many of the collection data structures you might have to implement manually in lower-level languages such as C. Here is a quick look at their main properties. Python lists are:

 - **Ordered collections of arbitrary objects**

     From a functional view, lists are just places to collect other objects so you can treat them as groups. Lists also maintain a left-to-right positional ordering among the items they contain (i.e., they are sequences).
     
 - **Accessed by offset**

     Just as with strings, you can fetch a component object out of a list by indexing the list on the object’s offset. Because items in lists are ordered by their positions, you can also do tasks such as slicing and concatenation. 

 - **Variable-length, heterogeneous, and arbitrarily nestable**

     Unlike strings, lists can grow and shrink in place (their lengths can vary), and they can contain any sort of object, not just one-character strings (they’re heterogeneous). Because lists can contain other complex objects, they also support arbitrary nesting; you can create lists of lists of lists, and so on.

 - **Of the category “mutable sequence”**

     In terms of our type category qualifiers, lists are mutable (i.e., can be changed in place) and can respond to all the sequence operations used with strings, such as indexing, slicing, and concatenation. In fact, sequence operations work the same on lists as they do on strings; the only difference is that sequence operations such as concatenation and slicing return new lists instead of new strings when applied to lists. Because lists are mutable, however, they also support other operations that strings don’t, such as deletion and index assignment operations, which change the lists in place.

 - **Arrays of object references**

     Technically, Python lists contain zero or more references to other objects. Lists might remind you of arrays of pointers (addresses) if you have a background in some other languages. Fetching an item from a Python list is about as fast as indexing a C array; in fact, lists really are arrays inside the standard Python interpreter, not linked structures. Python always follows a reference to an object whenever the reference is used, so your program deals only with objects. Whenever you assign an object to a data structure component or variable name, Python always stores a reference to that same object, not a copy of it (unless you request a copy explicitly).
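The reference behavior described in the last point can be illustrated with a minimal sketch. The names `a`, `b`, and `c` are only for illustration:

```python
# Assigning a list to another name stores a reference, not a copy.
a = [1, 2, 3]
b = a            # b refers to the same list object as a
c = list(a)      # an explicit (shallow) copy: a new list object

a.append(4)      # an in-place change is visible through both a and b

print(b)                 # [1, 2, 3, 4]
print(c)                 # [1, 2, 3]
print(a is b, a is c)    # True False
```

The `is` operator tests object identity, which is why it distinguishes the shared reference from the copy.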


#### List Creation

The list data type can be called as a function, `list()`—with no arguments it returns an empty list, with a list argument it returns a shallow copy of the argument, and with any other argument it attempts to convert the given object to a list. It does not accept more than one argument.

In [2]:
L = list()                 # An empty list
L = list('spam')           # List of an iterable’s items
L = list(range(-4, 4))     # List of successive integers
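As a small sketch of the remaining two cases, the following shows what "shallow copy" means for a list argument and what happens when more than one argument is passed. The variable names are illustrative only:

```python
original = [[1, 2], [3, 4]]
shallow = list(original)            # new outer list, same inner lists

print(shallow == original)          # True: equal contents
print(shallow is original)          # False: distinct list objects
print(shallow[0] is original[0])    # True: the nested lists are shared

try:
    list('ab', 'cd')                # more than one argument is not accepted
except TypeError:
    print('list() takes at most one argument')
```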

Lists can also be created without using the `list()` function. An empty list is created using empty brackets, `[]`, and a list of one or more items can be created by using a comma-separated sequence of items inside brackets. Another way of creating lists is to use a *list comprehension*—a topic we will cover later.

In [3]:
L = []                              # An empty list
L = [123, 'abc', 1.23, {}]          # Four items: indexes 0..3
L = ['Bob', 40.0, ['dev', 'mgr']]   # Nested sublists

Because list items are identified by their position rather than by their value, a list can hold the same value more than once:

In [19]:
thislist = ["apple", "banana", "cherry", "apple", "cherry"]
print(thislist)

['apple', 'banana', 'cherry', 'apple', 'cherry']


#### Reading List Items: Indexing and Slicing

Given the assignment `L = [-17.5, "kilo", 49, "V"]`, we get the list shown in the figure below:

![List index positions](../../imgs/list_index.jpg)

Because lists are sequences, indexing and slicing work the same way for lists as they do for strings. However, the result of indexing a list is whatever type of object lives at the offset you specify, while slicing a list always returns a new list:

In [13]:
L = [-17.5, "kilo", 49, "V"]

# Indexing a list
L[2], L[-1]                    # Offsets start at zero, negative: count from the right

(49, 'V')

In [14]:
# Slicing a list
L[0:2], L[-3:-1], L[0:], L[:-1], L[:]

([-17.5, 'kilo'],
 ['kilo', 49],
 [-17.5, 'kilo', 49, 'V'],
 [-17.5, 'kilo', 49],
 [-17.5, 'kilo', 49, 'V'])

**Note**: Slicing a list returns a new list.
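A short sketch of what "a new list" implies in practice: changing the result of a slice does not affect the original list, and a full slice is a common way to make a shallow copy. The names `part` and `whole` are illustrative:

```python
L = [-17.5, "kilo", 49, "V"]
part = L[0:2]        # a new list holding the first two items
whole = L[:]         # a full slice copies the top level of the list

part[0] = 0          # changing the slice result...
print(L)             # ...leaves the original untouched: [-17.5, 'kilo', 49, 'V']
print(whole == L, whole is L)      # True False
print(type(L[1]), type(L[1:2]))    # <class 'str'> <class 'list'>
```

The last line contrasts indexing, which returns the stored object itself, with slicing, which always returns a list, even for a single item.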

#### Nesting

One nice feature of Python’s core data types is that they support arbitrary ***nesting***. We can nest them in any combination, and as deeply as we like. For example, we can have a list that contains a dictionary, which contains another list, and so on.

Because you can nest lists and other object types within lists, you will sometimes need to string together index operations to go deeper into a data structure. One immediate application is representing matrices (“multidimensional arrays”): for basic applications, a list with nested sublists will do the job. Here’s a basic 3 × 3 two-dimensional list-based array:

In [15]:
M = [[1, 2, 3],               # A 3 × 3 matrix, as nested lists
     [4, 5, 6],               # Code can span lines if bracketed
     [7, 8, 9]]
M

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

With one index, you get an entire row (really, a nested sublist), and with two, you get an item within the row:

In [16]:
M[1], M[1][1], M[2][0]

([4, 5, 6], 5, 7)
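Chained indexes also work on the left side of an assignment and across mixed container types. The following sketch assumes the same matrix `M`; the `staff` data is hypothetical and only illustrates a list holding a dictionary that holds a list:

```python
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

M[1][1] = 50                           # index the row, then assign within it
column = [M[0][2], M[1][2], M[2][2]]   # collect the third column by hand
print(M[1])                            # [4, 50, 6]
print(column)                          # [3, 6, 9]

# A list containing a dictionary that in turn contains a list
staff = [{'name': 'Bob', 'jobs': ['dev', 'mgr']}]
print(staff[0]['jobs'][1])             # mgr
```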

#### Changing Lists

##### Change an Item Value

To change the value of a specific item, assign a new value to its index:

In [17]:
thislist = ["apple", "banana", "cherry"]
thislist[1] = "blackcurrant"
print(thislist)

['apple', 'blackcurrant', 'cherry']


##### Change a Range of Item Values

To change several consecutive items at once, assign a list of new values to a slice:

In [18]:
thislist = ["apple", "banana", "cherry", "orange", "kiwi", "mango"]
thislist[1:3] = ["blackcurrant", "watermelon"]
print(thislist)

['apple', 'blackcurrant', 'watermelon', 'orange', 'kiwi', 'mango']


If you insert more items than you replace, the new items will be inserted where you specified, and the remaining items will move accordingly:

In [20]:
thislist = ["apple", "banana", "cherry"]
thislist[1:2] = ["blackcurrant", "watermelon"]
print(thislist)

['apple', 'blackcurrant', 'watermelon', 'cherry']


If you insert fewer items than you replace, the new items will be inserted where you specified, and the list shrinks as any remaining items move accordingly:

In [21]:
thislist = ["apple", "banana", "cherry"]
thislist[1:3] = ["watermelon"]
print(thislist)

['apple', 'watermelon']
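Two edge cases follow from the same rule and may be worth seeing side by side: assigning an empty list to a slice deletes items, and assigning to an empty slice inserts items without replacing anything. This is a sketch using an illustrative `fruits` list:

```python
fruits = ["apple", "banana", "cherry", "orange"]

fruits[1:3] = []                   # replace two items with none: deletion
print(fruits)                      # ['apple', 'orange']

fruits[1:1] = ["kiwi", "mango"]    # replace an empty slice: pure insertion
print(fruits)                      # ['apple', 'kiwi', 'mango', 'orange']
```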


##### Insert Items
To insert a new item, without replacing any of the existing values, we can use the `insert()` method, which inserts an item at the specified index:

In [22]:
thislist = ["apple", "banana", "cherry"]
thislist.insert(2, "watermelon")
print(thislist)

['apple', 'banana', 'watermelon', 'cherry']


##### Append Items
To add an item to the end of the list, use the `append()` method:

In [23]:
thislist = ["apple", "banana", "cherry"]
thislist.append("orange")
print(thislist)

['apple', 'banana', 'cherry', 'orange']


##### Extend List

To append elements from another list or collection to the current list, use the `extend()` method.

In [24]:
thislist = ["apple", "banana", "cherry"]
tropical = ["mango", "pineapple", "papaya"]
thislist.extend(tropical)
print(thislist)

['apple', 'banana', 'cherry', 'mango', 'pineapple', 'papaya']
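The difference between `append()` and `extend()` is easiest to see when the argument is itself a collection. In this sketch, the lists `a` and `b` are illustrative:

```python
a = ["apple", "banana"]
a.append(["mango", "papaya"])    # the whole list becomes ONE new item
print(a)                         # ['apple', 'banana', ['mango', 'papaya']]

b = ["apple", "banana"]
b.extend(("mango", "papaya"))    # a tuple works: each item is appended
b.extend("hi")                   # a string is iterated character by character
print(b)                         # ['apple', 'banana', 'mango', 'papaya', 'h', 'i']
```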


##### Remove Specified Item

The `remove()` method removes the first item whose value equals the specified one.

In [25]:
thislist = ["apple", "banana", "cherry"]
thislist.remove("banana")
print(thislist)

['apple', 'cherry']
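When the value occurs more than once, only the first occurrence is removed, and asking to remove a value that is not present raises `ValueError`. A minimal sketch with an illustrative list:

```python
fruits = ["apple", "banana", "cherry", "banana"]
fruits.remove("banana")          # only the first "banana" is removed
print(fruits)                    # ['apple', 'cherry', 'banana']

try:
    fruits.remove("kiwi")        # "kiwi" is not in the list
except ValueError:
    print("kiwi is not in the list")
```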


##### Remove Specified Index
The `pop()` method removes the item at the specified index and returns it.

In [26]:
thislist = ["apple", "banana", "cherry"]
thislist.pop(1)
print(thislist)

['apple', 'cherry']


If you do not specify the index, the `pop()` method removes the last item.
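Because `pop()` returns the item it removes, its result can be kept. The following sketch uses the illustrative names `stack`, `last`, and `first`:

```python
stack = ["apple", "banana", "cherry"]
last = stack.pop()       # no index: removes and returns the last item
first = stack.pop(0)     # removes and returns the item at index 0
print(last, first, stack)    # cherry apple ['banana']
```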

The `del` statement also removes the item at the specified index:

In [27]:
thislist = ["apple", "banana", "cherry"]
del thislist[0]
print(thislist)

['banana', 'cherry']


The `del` statement can also delete the list variable completely, after which the name can no longer be used:

In [31]:
thislist = ["apple", "banana", "cherry"]
del thislist
print(thislist)            # Raises NameError: the name no longer exists

NameError: name 'thislist' is not defined
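Strictly speaking, `del` removes the name, not necessarily the list object: if another name still refers to the list, the object survives. `del` also accepts a slice. A sketch, where `alias` is an illustrative second name for the same list:

```python
thislist = ["apple", "banana", "cherry", "orange"]
alias = thislist         # a second reference to the same list

del thislist[1:3]        # del also accepts a slice
print(alias)             # ['apple', 'orange']

del thislist             # removes the name only
print(alias)             # still available through alias: ['apple', 'orange']
```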

##### Clear the List
The `clear()` method empties the list.

The list still remains, but it has no content.

In [30]:
thislist = ["apple", "banana", "cherry"]
thislist.clear()
print(thislist)

[]


##### Sort Lists

List objects have a `sort()` method that will sort the list alphanumerically, ascending, by default:

In [32]:
thislist = ["orange", "mango", "kiwi", "pineapple", "banana"]
thislist.sort()
print(thislist)

['banana', 'kiwi', 'mango', 'orange', 'pineapple']


To sort descending, use the keyword argument `reverse=True`:

In [33]:
thislist = ["orange", "mango", "kiwi", "pineapple", "banana"]
thislist.sort(reverse=True)
print(thislist)

['pineapple', 'orange', 'mango', 'kiwi', 'banana']
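One detail that often surprises newcomers: `sort()` changes the list in place and returns `None`. When the original order must be kept, the built-in `sorted()` function (not covered above, shown here as a supplement) returns a new sorted list instead. A sketch with illustrative names:

```python
fruits = ["orange", "mango", "kiwi"]
result = fruits.sort()           # sorts in place and returns None
print(result, fruits)            # None ['kiwi', 'mango', 'orange']

original = ["mango", "kiwi", "orange"]
ordered = sorted(original, reverse=True)   # built-in: returns a new list
print(original)                  # ['mango', 'kiwi', 'orange']
print(ordered)                   # ['orange', 'mango', 'kiwi']
```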


##### Reverse the Order

The `reverse()` method reverses the order of the items in place:

In [36]:
fruits = ['apple', 'banana', 'cherry']

fruits.reverse()
print(fruits)


['cherry', 'banana', 'apple']


#### List Concatenation and Replication

Lists support concatenation with `+`, extending with `+=` (i.e., the appending of all the items in the right-hand operand), and replication with `*` and `*=`.

In [34]:
L = [123, 'spam', 1.23]

print(L + [4, 5, 6], L*2)

[123, 'spam', 1.23, 4, 5, 6] [123, 'spam', 1.23, 123, 'spam', 1.23]
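The distinction between `+` and `+=` matters when a list is reachable through more than one name: `+=` changes the existing list, while `+` builds a new one. A sketch, with `alias` as an illustrative second reference:

```python
L = [1, 2]
alias = L

L += [3]             # extends the existing list in place
print(alias)         # [1, 2, 3]: the change is visible through alias

L = L + [4]          # builds a new list and rebinds L; alias is unchanged
print(L, alias)      # [1, 2, 3, 4] [1, 2, 3]

zeros = [0] * 3      # replication
zeros *= 2           # in-place replication
print(zeros)         # [0, 0, 0, 0, 0, 0]
```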


### Tuples

The tuple object (pronounced “toople” or “tuhple,” depending on whom you ask) is roughly like a list that cannot be changed: tuples are sequences, like lists, but they are immutable, like strings. Functionally, they’re used to represent fixed collections of items: the components of a specific calendar date, for instance.

Tuples construct simple groups of objects. They work exactly like lists, except that tuples can’t be changed in place (they’re immutable) and are usually written as a series of items in parentheses, not square brackets. Although they don’t support as many methods, tuples share most of their properties with lists. Here’s a quick look at the basics. Tuples are:

 - **Ordered collections of arbitrary objects**

     Like strings and lists, tuples are positionally ordered collections of objects (i.e., they maintain a left-to-right order among their contents); like lists, they can embed any kind of object.

 - **Accessed by offset**
     
     Like strings and lists, items in a tuple are accessed by offset (not by key); they support all the offset-based access operations, such as indexing and slicing.

 - **Of the category “immutable sequence”**
     
     Like strings and lists, tuples are sequences; they support many of the same operations. However, like strings, tuples are immutable; they don’t support any of the in-place change operations applied to lists.

 - **Fixed-length, heterogeneous, and arbitrarily nestable**

     Because tuples are immutable, you cannot change the size of a tuple without making a copy. On the other hand, tuples can hold any type of object, including other compound objects (e.g., lists, dictionaries, other tuples), and so support arbitrary nesting.

 - **Arrays of object references**

     Like lists, tuples are best thought of as object reference arrays; tuples store access points to other objects (references), and indexing a tuple is relatively quick.
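A brief sketch of these properties: offset-based access works as it does for lists, in-place assignment fails, and immutability applies only to the tuple's own references, so a mutable object nested inside a tuple can still change. The tuple `T` is illustrative:

```python
T = (1, 'spam', [10, 20])

print(T[1], T[0:2])      # spam (1, 'spam')

try:
    T[0] = 99            # tuples do not support item assignment
except TypeError:
    print("tuples are immutable")

T[2].append(30)          # the nested list itself is still mutable
print(T)                 # (1, 'spam', [10, 20, 30])
```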


#### Tuple Creation

The tuple data type can be called as a function, `tuple()`—with no arguments it returns an empty tuple, with a tuple argument it returns a shallow copy of the argument, and with any other argument it attempts to convert the given object to a tuple. It does not accept more than one argument. 

In [37]:
T = tuple('spam')             # Tuple of items in an iterable
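A few more conversions, as a sketch of how `tuple()` treats different arguments. The variable names are illustrative:

```python
empty = tuple()                  # no arguments: an empty tuple
letters = tuple('spam')          # a string is split into its characters
from_list = tuple([1, 2, 3])     # a list is converted item by item
from_range = tuple(range(3))     # any iterable works

print(empty, letters, from_list, from_range)
# () ('s', 'p', 'a', 'm') (1, 2, 3) (0, 1, 2)
```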

Tuples can also be created without using the `tuple()` function. An empty tuple is created using empty parentheses, `()`, and a tuple of one or more items can be created by using commas. 

In [39]:
T = ()                        # An empty tuple
T = (0,)                      # A one-item tuple (not an expression)
T = (0, 'Ni', 1.2, 3)         # A four-item tuple
T = 0, 'Ni', 1.2, 3           # Another four-item tuple (same as prior line)
T = ('Bob', ('dev', 'mgr'))   # Nested tuples

##### Tuple syntax peculiarities: Commas and parentheses
Because parentheses can also enclose expressions, you need to do something special to tell Python when a single object in parentheses is a tuple object and not a simple expression. 

If you really want a single-item tuple, simply add a trailing comma after the single item, before the closing parenthesis:


In [42]:
x = (40)                   # An integer!
y = (40,)                  # A tuple containing an integer
print(x, y)

40 (40,)


As a special case, Python also allows you to omit the opening and closing parentheses for a tuple in contexts where it isn’t syntactically ambiguous to do so. For instance, in the assignment statement `T = 0, 'Ni', 1.2, 3`, Python recognizes the right-hand side as a tuple, even though it doesn’t have parentheses.

Sometimes tuples must be enclosed in parentheses to avoid syntactic ambiguity. For example, to pass the tuple `1, 2, 3` to a function, we would write `function((1, 2, 3))`.
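The effect of the extra parentheses can be checked with a small helper. `count_args` is a hypothetical function written only for this illustration:

```python
def count_args(*args):
    """Return how many positional arguments were received."""
    return len(args)

print(count_args(1, 2, 3))      # 3: three separate arguments
print(count_args((1, 2, 3)))    # 1: a single tuple argument
```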

The most common places where the parentheses are required for tuple literals are those where:

 - **Parentheses matter**—within a function call, or nested in a larger expression.
 - **Commas matter**—embedded in the literal of a larger data structure like a list or dictionary, or listed in a Python 2.X print statement.

In most other contexts, the enclosing parentheses are optional. 
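The "commas matter" case can be sketched with a list literal, where the parentheses decide whether the commas separate tuple items or list items. The names `pairs`, `flat`, and `point` are illustrative:

```python
pairs = [(1, 2), (3, 4)]    # parentheses group the items into two tuples
flat = [1, 2, 3, 4]         # without them: four separate integers
print(len(pairs), len(flat))    # 2 4

point = 3, 4                # parentheses are optional in an assignment
print(point == (3, 4))      # True
```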

For beginners, the best advice is that it’s probably easier to use the parentheses than it is to remember when they are optional or required. Many programmers also find that parentheses tend to aid script readability by making the tuples more explicit and obvious.

