# Working with Python Data Structures

-----

In an earlier notebook, we introduced several of the most popular Python data structures including the list, tuple, string, and dictionary. In this notebook, we build on these concepts to demonstrate how to work effectively with these data structures, which includes using built-in functionality and iterative techniques. Finally, we briefly review several other Python data structures.

## Table of Contents
[String](#String)  

[List](#List)  

[Tuple](#Tuple)  

[Dictionary](#Dictionary)  

[Other Data Structures](#Other-Data-Structures) 



-----
[[Back to TOC]](#Table-of-Contents)

## String

In Python, it is important to remember that a string is an instance of the `str` class. While this might seem confusing, or an unneeded distraction, in practice it simply means that you get additional functionality for (almost) free. In this section, we explore this additional functionality by demonstrating some of the more important string methods. Many of these methods process string data or test whether a string contains only alphabetical, numerical, or alphanumeric characters. Other methods convert a string to all lowercase or uppercase characters, find substrings, replace data, and format text data.

A full description of the string methods is available from the online [Python Documentation](https://docs.python.org/3/library/stdtypes.html#str) or by using `help(str)` at a Python prompt or in a Jupyter notebook cell. We now demonstrate some of the more common and useful string functions.
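
The character tests mentioned above can be sketched as follows. This is only an illustration: the sample strings are hypothetical, and `isalpha`, `isdigit`, and `isalnum` are the `str` methods assumed to correspond to the alphabetical, numerical, and alphanumeric tests.

```python
# Hypothetical sample values chosen only for illustration
samples = ['Python', '2019', 'Python3', 'Python 3']

for text in samples:
    # Each method returns True only if every character passes the test
    print(f'{text!r:12} alpha={text.isalpha()} '
          f'digit={text.isdigit()} alnum={text.isalnum()}')

# Collect the three test results for each sample string
results = {text: (text.isalpha(), text.isdigit(), text.isalnum())
           for text in samples}
```

Note that a space is neither alphabetical nor numerical, so `'Python 3'` fails all three tests.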


-----

### String Formatting

__`format`__:

The `format` method creates a new formatted string from a template string and substitution values. The classic example is a form letter, where specific fields are replaced by new data for each letter. The `format` method largely supersedes the older `%` string formatting operator. In its basic form, the template string uses `{}` placeholders to mark where replacement text goes, and the arguments passed to `format` supply the replacement text. For example,

```python
'Hello {}, you are visitor #{}!'.format('Alexander', 23)
```

will return

```python
'Hello Alexander, you are visitor #23!'
```

Alternatively, the curly braces can enclose a number that is used to find the matching variable for substitution in the `format` method. For example, the previous example could also be written as `'Hello {0}, you are visitor #{1}!'.format('Alexander', 23)`, or equivalently as `'Hello {1}, you are visitor #{0}!'.format(23, 'Alexander')`.

In this course, however, we will primarily use the [_f-strings_](https://www.python.org/dev/peps/pep-0498/) to create formatted text strings. As a reminder, we get the same functionality when using an f-string:

```python
name = 'Alexander'
number = 23

f'Hello {name}, you are visitor #{number}!'
```

f-strings were added in Python 3.6. If you look at older code, or work with programmers who have used Python for a while, you will likely run across the `format` method. The two approaches use many of the same formatting concepts, so it is generally straightforward to translate between them, as demonstrated in the following Code cell.

-----

In [1]:
name = 'Alexander'
number = 23

print('Hello {0}, you are visitor #{1:04d}!'.format('Alexander', 23))
print(f'Hello {name}, you are visitor #{number:04d}!')

Hello Alexander, you are visitor #0023!
Hello Alexander, you are visitor #0023!
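
The `04d` used in the cell above is a format specification. As a further sketch of the same idea, the following example applies an alignment specification and a floating-point specification; the variable names and values are hypothetical.

```python
label = 'total'   # hypothetical value
price = 1234.5    # hypothetical value

# :>10 right-aligns the text in a field that is 10 characters wide
# :,.2f adds a thousands separator and keeps two decimal places
line_format = '{:>10}: {:,.2f}'.format(label, price)
line_fstring = f'{label:>10}: {price:,.2f}'

print(line_format)
print(line_fstring)
```

Both lines print the same text, since the `format` method and f-strings share the same format specification syntax.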


-----

### String Case Conversion

__`upper`__:

The `upper` method creates a new string that is a copy of the original string with the characters all converted to uppercase. For example, the following function call, 

```python
"The brown dog jumped over the quick fox!".upper()
```

returns 
```python
THE BROWN DOG JUMPED OVER THE QUICK FOX!
```

__`lower`__:

The `lower` method creates a new string that is a copy of the original string with the characters all converted to lowercase. For example, the following function call, 

```python
"The brown dog jumped over the quick fox!".lower()
```

returns 
```python
the brown dog jumped over the quick fox!
```

__`title`__:

The `title` method creates a new string that is a copy of the original string with the first character of every word converted to uppercase. For example, the following function call, 

```python
"The brown dog jumped over the quick fox!".title()
```

returns 
```python
The Brown Dog Jumped Over The Quick Fox!
```

These three functions are demonstrated in the following Code cell.

-----

In [2]:
data = "The brown dog jumped over the quick fox!"

print(data.upper())
print(data.lower())
print(data.title())

THE BROWN DOG JUMPED OVER THE QUICK FOX!
the brown dog jumped over the quick fox!
The Brown Dog Jumped Over The Quick Fox!
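
A common use of case conversion is comparing text without regard to capitalization. The following is a minimal sketch; the two sample strings are hypothetical. Because strings are immutable, `lower` returns a new string and leaves the original untouched.

```python
# Hypothetical inputs that differ only in capitalization
first = 'Quick Fox'
second = 'QUICK FOX'

# A direct comparison is case-sensitive
print(first == second)

# Converting both sides to lowercase gives a case-insensitive comparison
same_ignoring_case = first.lower() == second.lower()
print(same_ignoring_case)

# The original string is unchanged, since lower() returns a new string
print(first)
```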


-----

### Substring Identification

__`find`__:

The `find` method locates the first occurrence of a substring in the full string and returns the index position of this first occurrence, or -1 if the substring is not present. This method takes the substring to find as an argument, along with optional starting and ending indices (which default to the start and end of the parent string). Thus, the following function invocation:

```python
'The brown dog jumped over the quick fox!'.find('he')
```

will return 1 since the substring is contained in the original string and is located at index position one.

__`rfind`__:

This method works like `find`, but it searches from the end, or _right-hand side_, of the string and therefore returns the index of the last occurrence. Thus, the following function invocation:

```python
'The brown dog jumped over the quick fox!'.rfind('he')
```

will return 27 since the substring is contained in the original string and is located at index position twenty-seven (i.e., in the last *the*).

__`count`__:

This function returns the number of occurrences of a substring within the original string. As a result, the following function invocation:

```python
'The brown dog jumped over the quick fox!'.count('he')
```

will return 2 since there are two instances of `he` occurring in the original string.

These three functions are demonstrated in the following Code cell.

-----

In [3]:
print(' find: ','The brown dog jumped over the quick fox!'.find('he'))

print('rfind: ','The brown dog jumped over the quick fox!'.rfind('he'))

print('count: ','The brown dog jumped over the quick fox!'.count('he'))

 find:  1
rfind:  27
count:  2
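
Two details of `find` are worth a quick sketch: what it returns when the substring is missing, and how the optional starting index changes the result. The search terms below are chosen only for illustration.

```python
text = 'The brown dog jumped over the quick fox!'

# find returns -1 (it does not raise an error) when the substring is absent
missing = text.find('cat')

# Starting the search at index 2 skips the first match at index 1
second = text.find('he', 2)

# The in operator is a readable alternative when only a yes/no answer is needed
has_dog = 'dog' in text

print(missing, second, has_dog)
```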


-----

### Substring Manipulation

Python provides several powerful functions to manipulate substrings, including splitting a string into substrings, replacing substrings with alternative text, and quickly joining substrings together into a new string. 

__`split`__:

The `split` method is very powerful and will tokenize a string into substrings based on the input arguments, which are whitespace characters by default. For example, the following function invocation

```python
"The brown dog jumped over the quick fox!".split()
```
returns a list of the token substrings:

```python
['The', 'brown', 'dog', 'jumped', 'over', 'the', 'quick', 'fox!']
```

This method accepts an argument `sep`, which is the substring used to split the original string. When `sep` is omitted, the string is split on runs of whitespace characters (e.g., space, tab, and newline). You can supply another separator, such as `','` or `'|'`. For example,

```python
"one, two, three, four".split(sep=',')
```
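returns a list in which each token after the first keeps its leading space, because only the comma is treated as the separator:

```python
['one', ' two', ' three', ' four']
```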

__`replace`__:

This function replaces occurrences of a substring, specified by the parameter `old` with replacement text, specified by the parameter `new`. An optional `count` parameter controls how many substrings are replaced; by default, they are all replaced. 
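
As a brief sketch of the optional `count` parameter, the following example replaces every `o` and then only the first two. The replacement character is arbitrary.

```python
text = 'The brown dog jumped over the quick fox!'

# Replace every occurrence of 'o'
all_replaced = text.replace('o', '#')

# Replace only the first two occurrences of 'o'
first_two = text.replace('o', '#', 2)

print(all_replaced)
print(first_two)

# replace returns a new string; the original is unchanged
print(text)
```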

__`join`__:

While strings can be combined by using the `+` operator, this approach is slow for many additions since each addition requires the construction of a new string to hold the combined result. A more efficient string combination approach is to use the `join` method, which can quickly combine multiple strings that are contained in an iterable object such as a `list` or `tuple` together. The string you use to call the `join` method provides the _glue text_ between each item in the iterable. For example, the following method will create a new string from a list of strings that are each separated by a comma and a single space character:

```python

data = ['1', '2', '3', '4', '5', '6', '7', '8', '9']

", ".join(data)
```
which will return

```python
1, 2, 3, 4, 5, 6, 7, 8, 9
```
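
The `join` method expects every item in the iterable to be a string. As a minimal sketch, assuming a hypothetical list of integers, each item can be converted with `str` before joining:

```python
numbers = [1, 2, 3, 4, 5]   # hypothetical numeric data

# Passing integers directly to join raises a TypeError
try:
    ', '.join(numbers)
    join_failed = False
except TypeError:
    join_failed = True

# Convert each number to a string first, then join
joined = ', '.join(str(n) for n in numbers)

print(join_failed)
print(joined)
```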

The following Code cells demonstrate these string manipulation functions; you should test, change, and execute them to get a better feel for how to use these functions effectively.

----

In [4]:
data = "The brown dog jumped over the quick fox!"
subs = data.split()
print(subs)
print(type(subs))

['The', 'brown', 'dog', 'jumped', 'over', 'the', 'quick', 'fox!']
<class 'list'>


In [5]:
# Split on the character ','
"one, two, three, four".split(sep=',')

['one', ' two', ' three', ' four']
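
Notice that splitting on `','` leaves a leading space on every token after the first. Two possible ways to obtain clean tokens are sketched below; which one fits best depends on how consistent the input text is.

```python
raw = "one, two, three, four"

# Option 1: split on the comma, then strip whitespace from each token
tokens_stripped = [token.strip() for token in raw.split(',')]

# Option 2: split on the two-character separator ', '
tokens_sep = raw.split(', ')

print(tokens_stripped)
print(tokens_sep)
```

The first option also copes with input that has uneven spacing around the commas.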

In [6]:
# Replace characters
print(data.replace('o', '#'))

The br#wn d#g jumped #ver the quick f#x!


In [7]:
# Join substrings
print(' '.join(subs))
print(','.join(subs))

The brown dog jumped over the quick fox!
The,brown,dog,jumped,over,the,quick,fox!


-----

### String Manipulation

__`strip`__:

The `strip` method removes characters from the beginning and end of a string. By default, whitespace characters are removed. If a string argument is supplied, it is treated as a set of characters, and any leading or trailing characters in that set are removed. Two variants of this method, `lstrip` and `rstrip`, remove only leading or only trailing characters, respectively. For example,

```python
"    Some text surrounded by white space characters    ".strip()
```
returns the string

```python
Some text surrounded by white space characters
```

-----

In [8]:
# Strip characters
new_data = "    Some text surrounded by white space characters    "
print(' Strip:', new_data.strip())
print('RStrip:', new_data.rstrip())
print('LStrip:', new_data.lstrip())

# Now specify characters to strip
print()
print(data.strip('The'))

 Strip: Some text surrounded by white space characters
RStrip:     Some text surrounded by white space characters
LStrip: Some text surrounded by white space characters    

 brown dog jumped over the quick fox!
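
The argument to `strip` is a set of characters rather than a prefix or suffix, which can give surprising results. The following sketch, using a hypothetical file name, shows the pitfall and one way to remove an exact suffix by using `endswith` and slicing.

```python
url = 'www.example.com'

# Any leading or trailing 'c', 'm', 'o', 'w', or '.' is removed
stripped = url.strip('cmow.')
print(stripped)

filename = 'report.txt'   # hypothetical file name

# Pitfall: the final 't' of 'report' is also in the character set
too_much = filename.rstrip('.txt')
print(too_much)

# Remove the exact suffix only when it is actually present
suffix = '.txt'
base = filename[:-len(suffix)] if filename.endswith(suffix) else filename
print(base)
```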


-----

<font color='red' size = '5'> Student Exercise </font>

In the empty **Code** cell below, first create a new string called `mystring` that contains at least 50 characters (e.g., `mystring = 'This is a demo string, which has a lot of text so that we can manipulate it using Python.'`). Next, split this string into a list of whitespace-delimited tokens (i.e., `['This', 'is', ..., 'Python.']`). Next, join these substrings together by using the `*` character. Finally, replace the `*` character with a single space character.

-----

-----
[[Back to TOC]](#Table-of-Contents)


## List

As we discussed when they were introduced in an earlier notebook, a `list` is mutable. Thus, a list can be changed by adding elements, removing elements, or simply changing existing elements in place. To accomplish these tasks, Python provides a number of list methods, built-in functions, and the `del` statement (assume `data` is a list):

| Function | Description                              | Example             |
| -------- | ---------------------------------------- | ------------------- |
| `append`   | add an element to the end of the list    | `data.append(11)`     |
| `insert`   | insert an element at the specified index | `data.insert(4, '4')` |
| `del`      | delete the element at the specified index  | `del data[4]`         |
| `remove`   | remove the first element equal to the value | `data.remove(11)`     |
| `pop`      | remove and return the element at the specified index (the last element by default) | `data.pop(4)`     |
| `clear`    | remove all elements in the list          | `data.clear()`        |
| `sort`     | sorts list in place                      | `data.sort()`         |
| `reverse`  | reverses list in place                   | `data.reverse()`     |
| `max`      | return the maximum value in the list         | `max(data)`     |
| `min`      | return the minimum value in the list         | `min(data)`     |
| `len`      | return the number of items in the list         | `len(data)`     |

The following Code cells demonstrate these functions by splitting the string used earlier in this notebook into a list of tokens and manipulating this new list, and by applying `max`, `min`, and `len` to a list of numbers. Note that `del` is a statement, not a function, so its syntax differs from the other entries in the table; to avoid confusion, you might want to stick with the `remove` or `pop` methods whenever possible.

-----

In [9]:
data = "The brown dog jumped over the quick fox!"

# Tokenize string on whitespace
new_data = data.split()
print(new_data)

# Reverse list
new_data.reverse()
print(new_data)

# Sort list
new_data.sort()
# Uppercase letters sort before lowercase letters, which is why 'The' comes first in the sorted list
print(new_data)

['The', 'brown', 'dog', 'jumped', 'over', 'the', 'quick', 'fox!']
['fox!', 'quick', 'the', 'over', 'jumped', 'dog', 'brown', 'The']
['The', 'brown', 'dog', 'fox!', 'jumped', 'over', 'quick', 'the']


In [10]:
# Delete third item (index 2)
new_data = data.split()
print(new_data)
new_data.pop(2)
print(new_data)

# Insert a new third item (index 2)
new_data.insert(2, 'cat')
print(new_data)

# Remove specific item based on the value
new_data.remove('fox!')
print(new_data)

# Clear list, None is displayed
print(new_data.clear())

['The', 'brown', 'dog', 'jumped', 'over', 'the', 'quick', 'fox!']
['The', 'brown', 'jumped', 'over', 'the', 'quick', 'fox!']
['The', 'brown', 'cat', 'jumped', 'over', 'the', 'quick', 'fox!']
['The', 'brown', 'cat', 'jumped', 'over', 'the', 'quick']
None


In [11]:
# Maximum and minimum values in the list
number_list = [5, 3, 7, 1, 9]
print('Maximum value in the list:', max(number_list))
print('Minimum value in the list:', min(number_list))
# Length of the list
print(f'The list has {len(number_list)} items.')

Maximum value in the list: 9
Minimum value in the list: 1
The list has 5 items.
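
The cells above do not exercise `append` or `del`, and they discard the value returned by `pop`. The following sketch, using a hypothetical list of numbers, walks through these operations step by step.

```python
data = [10, 20, 30, 40]   # hypothetical values

# append adds an element to the end of the list
data.append(50)           # [10, 20, 30, 40, 50]

# pop removes the element at the given index and returns it
removed = data.pop(1)     # removed is 20, data is [10, 30, 40, 50]

# del removes by index but does not return anything
del data[0]               # [30, 40, 50]

# remove deletes the first element equal to the given value
data.remove(40)           # [30, 50]

# pop with no argument removes and returns the last element
last = data.pop()         # last is 50, data is [30]

print(removed, last, data)
```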


-----

<font color='red' size = '5'> Student Exercise </font>

In the empty **Code** cell below, tokenize the `mystring` you created earlier on whitespace characters; call the result `mylist`. Sort this list and display the result. Now remove the third item. Finally, reverse the new list.

-----

-----

### Multi-dimensional Lists

A Python list can contain lists as elements, which enables them to act as multidimensional arrays or matrices. For example, we can create a two-dimensional list by using the following notation:

```python
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
```

Elements of this two-dimensional list can be accessed by using the normal `list` index or slice notation, with the caveat that we chain multiple indices or slices, one for each dimension (e.g., `matrix[row][column]`). The following Code cell demonstrates how to select elements from a two-dimensional list.

-----

In [12]:
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

# Select first row
print(matrix[0])

# Select single element
print(matrix[0][1])
print(matrix[2][2])

[1, 2, 3]
2
9
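
Beyond selecting single rows and elements, a nested list can be sliced, iterated, and modified. The following is a minimal sketch; the list comprehension used to extract a column is one possible approach, not the only one.

```python
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

# Iterate over the rows
for row in matrix:
    print(row)

# Slice the outer list to select the first two rows
first_two_rows = matrix[:2]

# Extract the second column by taking index 1 from every row
second_column = [row[1] for row in matrix]

# Lists are mutable, so an element can be changed in place
matrix[1][1] = 50

print(first_two_rows)
print(second_column)
print(matrix[1])
```

Because slicing the outer list copies only the references to the rows, `first_two_rows` also shows the updated value 50.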


-----

While nested lists are useful, in a future lesson we will introduce the NumPy library, which provides fast, numerical, multi-dimensional arrays. For non-numerical data, however, nested lists can still prove useful if you need a multi-dimensional array.

-----

In [13]:
my_list = [['banana', 'orange'],
           ['apple', 'pear']]

print(my_list[0][0])

banana


-----
[[Back to TOC]](#Table-of-Contents)


## Tuple

As we discussed in a previous lesson, a `tuple` is very similar to a `list`, except that a `tuple` is an immutable sequence. Thus, working with a `tuple` differs from working with a `list` in two ways. First, sequence modification methods like `append`, `insert`, or `pop` are not available, since they would change the original sequence. Second, in-place methods like `sort` or `reverse` are not part of the `tuple` object itself; instead, we must use the built-in Python functions `sorted` and `reversed`. Note that `sorted` returns a new list and `reversed` returns an iterator; to make a new tuple, we must pass the result to `tuple`, as shown for `reversed` in the following Code cell.

-----

In [14]:
my_tuple = (70, 11, 20, 73, 42, 15, 64, 17, 48, 39)

print('Original tuple:', my_tuple)

# Apply sorted function
print()
print('Normal Sort:  ', sorted(my_tuple))
print('Reverse Sort: ', sorted(my_tuple, reverse=True))

# Apply reversed function
print()
print('Original tuple:', my_tuple)
print('Reversed tuple:', tuple(reversed(my_tuple)))

Original tuple: (70, 11, 20, 73, 42, 15, 64, 17, 48, 39)

Normal Sort:   [11, 15, 17, 20, 39, 42, 48, 64, 70, 73]
Reverse Sort:  [73, 70, 64, 48, 42, 39, 20, 17, 15, 11]

Original tuple: (70, 11, 20, 73, 42, 15, 64, 17, 48, 39)
Reversed tuple: (39, 48, 17, 64, 15, 42, 73, 20, 11, 70)
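
To make the immutability concrete, the following sketch attempts to change an element of a tuple and then builds a sorted tuple from the list returned by `sorted`. The short tuple is hypothetical.

```python
my_tuple = (70, 11, 20)   # hypothetical values

# Assigning to an element of a tuple raises a TypeError
try:
    my_tuple[0] = 5
except TypeError as error:
    message = str(error)
    print('Cannot modify a tuple:', message)

# sorted returns a list, so wrap the result in tuple() to get a tuple back
sorted_tuple = tuple(sorted(my_tuple))

print(sorted_tuple)
print(my_tuple)   # the original tuple is unchanged
```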


-----
[[Back to TOC]](#Table-of-Contents)


## Dictionary

Python provides several useful functions as part of the `dict` class, some of which we discussed in previous lessons. Some of the more useful functions to work with dictionaries are listed in the following table:

| Operation    | Description                              |
| ------------ | ---------------------------------------- |
| `del d[k]`   | Deletes the key value pair identified by the key `k` |
| `d.keys()`   | Returns a view containing the keys from the dictionary `d` |
| `d.values()` | Returns a view containing the values from the dictionary `d` |
| `d.items()`  | Returns a view containing the key-value pairs from the dictionary `d` |
| `d.clear()`  | Removes all entries from the dictionary `d` |
| `d.copy()`   | Returns a shallow copy of the dictionary `d` |
| `d.pop(key)` | Removes `key` from the dictionary `d` and returns its value |
| `d.popitem()` | Removes and returns a key-value pair from the dictionary `d` (the most recently inserted pair in Python 3.7 and later) |

The following code block presents a simple dictionary, along with several operations that demonstrate some of these functions. Note that starting a string with the character sequence `\n` makes the `print` function emit a newline character first (resulting in a blank line being displayed, which improves readability of the overall output).

-----

In [15]:
# Create and manipulate a dictionary
d = {'1': 1, '2': "two", '3': (1, 2, 3)}

print('Original Dictionary')
print(d)

print('\nKeys')
for k in d.keys():
    print(f'Key = {k}')

print('\nValues')
for v in d.values():
    print(f'Value = {v}')

print('\nItems')
for k, v in d.items():
    print(f'd[{k}] = {v}')

Original Dictionary
{'1': 1, '2': 'two', '3': (1, 2, 3)}

Keys
Key = 1
Key = 2
Key = 3

Values
Value = 1
Value = two
Value = (1, 2, 3)

Items
d[1] = 1
d[2] = two
d[3] = (1, 2, 3)
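
The removal and copy operations from the table can be sketched in the same way. The example below reuses the dictionary from the previous cell; the name `backup` is introduced only for illustration.

```python
d = {'1': 1, '2': "two", '3': (1, 2, 3)}

# copy makes a shallow copy that is unaffected by later removals from d
backup = d.copy()

# pop removes the key and returns the associated value
value = d.pop('2')

# del removes a key-value pair without returning anything
del d['1']

# popitem removes and returns a (key, value) pair
key, val = d.popitem()

print(value)
print(key, val)
print(d)        # now empty
print(backup)   # still holds all three entries
```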


-----

<font color='red' size = '5'> Student Exercise </font>

In the empty **Code** cell below, first create a new dictionary called 'mydict' that contains five key-value pairs `mydict = {'one' : 1, 'two' : 2, 'three' : 3, 'four' : 4, 'five' : 5}`. Next, print out the keys and the values in the dictionary.

-----


-----
[[Back to TOC]](#Table-of-Contents)


## Other Data Structures

Python also provides a number of other data structures, including the [`set`](https://docs.python.org/3/library/stdtypes.html#frozenset) and the container data types in the [`collections`](https://docs.python.org/3/library/collections.html#) module.

A `set` is an unordered collection of distinct values. Thus, a `set` does not allow indexing or slicing, but it does support other collection operations such as the `in` operator, the `len` function, and iteration. A `set` can only contain one instance of a given value, but the `set` itself can be changed. In the following example, we create and display a `set`:

```python
my_set = {'banana', 'apple', 'orange', 'banana', 'peach', 'apple'}
print(my_set)
print('apple' in my_set)
```
which generates output similar to the following (because a `set` is unordered, the order of the displayed elements can differ):

```python
{'banana', 'orange', 'peach', 'apple'}
True
```

Notice how the duplicated entries are removed automatically. We can use this feature to get the unique values from a list, as shown below (note that a `set` does not promise to preserve the original order of the list):
```python
mylist = [0, 0, 1, 1, 2, 3, 3, 3]
# Convert to a set, then back to a list
print(list(set(mylist)))
```
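
Since a `set` can be changed and compared with other sets, a few common operations are sketched below. The set contents are hypothetical, and `sorted` is used only so that the printed order is predictable.

```python
fruits = {'banana', 'apple', 'orange'}   # hypothetical values

# add inserts a new element; adding an existing element has no effect
fruits.add('peach')
fruits.add('apple')

citrus = {'orange', 'lemon'}             # hypothetical values

# & gives the elements common to both sets, | gives the elements in either set
common = fruits & citrus
combined = fruits | citrus

print(len(fruits))
print(sorted(common))
print(sorted(combined))
```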

The `collections` module, which is part of the Python standard library, provides a number of additional data structures that can prove useful, such as `Counter`, `defaultdict`, `deque`, and `namedtuple`.
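
As a brief sketch of what the `collections` module offers, the following example uses two of its container types, `Counter` and `defaultdict`, on a hypothetical sentence. It is meant as a starting point rather than a full tour of the module.

```python
from collections import Counter, defaultdict

# Hypothetical text, tokenized on whitespace
words = 'the quick fox and the lazy dog and the cat'.split()

# Counter tallies how many times each item appears
counts = Counter(words)
top = counts.most_common(1)

# defaultdict(list) creates an empty list for any key on first use
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)

print(counts['the'], counts['and'])
print(top)
print(by_length[5], by_length[4])
```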

-----

In [16]:
mylist = [0,0,1,1,2,3,3,3]
#convert to set then back to list
print (list(set(mylist)))

[0, 1, 2, 3]


## Ancillary Information

The following links are to additional documentation that you might find helpful in learning this material. Reading these web-accessible documents is completely optional.

1. The official Python documentation for [strings][1a], [lists][1b], [dictionaries][1c], and [tuples][1d]
2. A discussion on the [_native_ data types][2] mentioned in this notebook from the book, _Dive into Python_
3. The book [_Think Python_][3] includes a discussion on these data structures.
4. The official Python documentation for the [`collections`][pc] module

-----

[1a]: https://docs.python.org/3/tutorial/introduction.html#strings
[1b]: https://docs.python.org/3/tutorial/introduction.html#lists
[1c]: https://docs.python.org/3/tutorial/datastructures.html#dictionaries
[1d]: https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
[2]: http://www.diveintopython.net/native_data_types/index.html
[3]: http://greenteapress.com/thinkpython2/html/index.html
[pc]: https://docs.python.org/3/library/collections.html

**&copy; 2019: Gies College of Business at the University of Illinois.**

This notebook is released under the [Creative Commons license CC BY-NC-SA 4.0][ll]. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.

[ll]: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode