# Storing and Processing Collections of Data

References:

[1] Downey, A. B. (2012). *Think Python: How to Think Like a Computer Scientist-2e.*

[2] Gries, P., Campbell, J., & Montojo, J. (2017). *Practical programming: an introduction to computer science using Python 3.6.* Pragmatic Bookshelf.

[3] Matthes, E. (2023). *Python crash course: A hands-on, project-based introduction to programming.*

Up to this point, we have worked with strings, numeric types, and functions in Python. In this notebook, we introduce lists and dictionaries, which allow you to store collections of information in one place.

Both of these types are very powerful features readily accessible to new programmers, and they help tie together many important concepts in programming.

## 1 Lists

A *list* is a collection of items in a particular order. We can put anything inside a list and the items in the list don't have to be related in any particular way.

In Python, we define a list with square brackets `[]` and separate its elements with commas. Since lists usually contain more than one element, it's a good idea to make the name of the list plural, such as `letters`, `digits`, or `names`.

**Example**: A list that contains a few kinds of bicycles

In [1]:
bicycles = ['trek', 'cannondale', 'redline', 'specialized']
print(bicycles)

['trek', 'cannondale', 'redline', 'specialized']


### Accessing elements in a list

Lists are ordered collections, so we can access any element by telling Python the position, or *index*, of the item we want.

*Note: always remember that Python considers the first item in a list to be at position 0 not position 1*

For example, to pull out the first bicycle in our list `bicycles` we do:

In [2]:
print(bicycles[0])

trek


Python also has a special syntax for accessing the last element in the list. We can use the index `-1` to get the last item in the list.

The index `-2` returns the second item from the end of the list, the index `-3` returns the third, and so forth.

In [3]:
print(bicycles[-1])

specialized
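As a small sketch of the other negative indexes described above, the snippet below redefines the same `bicycles` list so it runs on its own. The variable names `second_to_last` and `third_to_last` are only illustrative. It also shows what happens when an index falls outside the list.

```python
# Same list as above, redefined so this snippet is self-contained.
bicycles = ['trek', 'cannondale', 'redline', 'specialized']

second_to_last = bicycles[-2]  # counts back from the end of the list
third_to_last = bicycles[-3]
print(second_to_last)
print(third_to_last)

# An index past either end of the list raises an IndexError.
try:
    bicycles[4]
except IndexError as error:
    print(f"IndexError: {error}")
```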


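You can also access a subset of a list by *slicing* it. To make a slice, specify the index of the first element you want and the index at which to stop, separated by a colon. Python stops one item before the second index you specify, so `bicycles[0:2]` returns the items at indexes 0 and 1.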

In [4]:
print(bicycles)

['trek', 'cannondale', 'redline', 'specialized']


In [5]:
print(bicycles[0:2])

['trek', 'cannondale']
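Either index in a slice can be left out, and a third number sets the step. The sketch below reuses the `bicycles` list; the result names are made up for illustration.

```python
bicycles = ['trek', 'cannondale', 'redline', 'specialized']

first_three = bicycles[:3]   # omitted start: slice begins at index 0
from_second = bicycles[1:]   # omitted end: slice runs through the last item
last_two = bicycles[-2:]     # negative indexes work in slices too
every_other = bicycles[::2]  # optional third value is the step

print(first_three)
print(from_second)
print(last_two)
print(every_other)
```

Slicing always produces a new list, so `bicycles` itself is left unchanged.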


### Changing, Adding, and Removing Elements

To modify an item in a list, we access the element by its index and assign it the new value we want it to have.

In [6]:
motorcycles = ['honda', 'yamaha', 'suzuki']
print(motorcycles)

['honda', 'yamaha', 'suzuki']


In [7]:
motorcycles[0] = 'ducati'
print(motorcycles)

['ducati', 'yamaha', 'suzuki']


Aside from modifying the elements, we can also *append* a new item to the list.

In [8]:
motorcycles.append('honda')
print(motorcycles)

['ducati', 'yamaha', 'suzuki', 'honda']


To add several items at once, we can *extend* the list with the contents of another list.

In [9]:
motorcycles.extend(['kawasaki', 'triumph'])
print(motorcycles)

['ducati', 'yamaha', 'suzuki', 'honda', 'kawasaki', 'triumph']
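The difference between `append` and `extend` is easiest to see when both are given a list. This is a minimal sketch; `with_append` and `with_extend` are illustrative names, not part of the earlier cells.

```python
with_append = ['ducati', 'yamaha']
with_append.append(['kawasaki', 'triumph'])  # adds ONE element: the list itself

with_extend = ['ducati', 'yamaha']
with_extend.extend(['kawasaki', 'triumph'])  # adds each item of the other list

print(with_append)
print(with_extend)
```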


You can also insert an element at a specific position in a list using the `insert` method. Here, we specify the index at which to place the new element, followed by the element itself. Items at or after that index shift one position to the right.

In [10]:
motorcycles.insert(3, 'bmw')
print(motorcycles)

['ducati', 'yamaha', 'suzuki', 'bmw', 'honda', 'kawasaki', 'triumph']


To remove an element, we can use the `del` statement to delete a specific element in the list.

In [11]:
del motorcycles[1]
print(motorcycles)

['ducati', 'suzuki', 'bmw', 'honda', 'kawasaki', 'triumph']


Another way to do this is the `pop` method. The difference is that it returns the value of the removed item in addition to removing it.

In [12]:
print(motorcycles)

['ducati', 'suzuki', 'bmw', 'honda', 'kawasaki', 'triumph']


In [13]:
popped_motorcycle = motorcycles.pop(-1)
print(motorcycles)
print(popped_motorcycle)

['ducati', 'suzuki', 'bmw', 'honda', 'kawasaki']
triumph
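A short sketch of a few related ways to remove items, assuming a fresh copy of the list so the snippet runs on its own. Called without an index, `pop` removes the last item, and the list method `remove` deletes the first item equal to a given value.

```python
motorcycles = ['ducati', 'suzuki', 'bmw', 'honda', 'kawasaki']

last_item = motorcycles.pop()    # no index given: removes and returns the last item
first_item = motorcycles.pop(0)  # removes and returns the item at index 0
motorcycles.remove('bmw')        # removes the first item equal to 'bmw'

print(last_item)
print(first_item)
print(motorcycles)
```

Note that `remove` raises a `ValueError` if the value is not in the list.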


### List Operations

Aside from modifying the elements of a single list, we can also use the `+` operator to concatenate lists and the `*` operator to repeat a list. Both produce a new list and leave the originals unchanged.

In [14]:
motorcycles + bicycles

['ducati',
 'suzuki',
 'bmw',
 'honda',
 'kawasaki',
 'trek',
 'cannondale',
 'redline',
 'specialized']

In [15]:
motorcycles * 3

['ducati',
 'suzuki',
 'bmw',
 'honda',
 'kawasaki',
 'ducati',
 'suzuki',
 'bmw',
 'honda',
 'kawasaki',
 'ducati',
 'suzuki',
 'bmw',
 'honda',
 'kawasaki']

To get the number of elements in a list, we can use the built-in function `len`.

In [16]:
print(len(motorcycles))

5


### Organizing a List

If we want to permanently sort a list, we use its `sort` method.

In [17]:
print(motorcycles)

['ducati', 'suzuki', 'bmw', 'honda', 'kawasaki']


In [18]:
motorcycles.sort()
print(motorcycles)

['bmw', 'ducati', 'honda', 'kawasaki', 'suzuki']


By specifying the `reverse` parameter as `True`, we can sort the items in descending order. 

In [19]:
motorcycles.sort(reverse=True)
print(motorcycles)

['suzuki', 'kawasaki', 'honda', 'ducati', 'bmw']


You can also sort a list temporarily using the `sorted` function.

In [20]:
motorcycles = ['ducati', 'suzuki', 'bmw', 'honda', 'kawasaki']

print(f"Original order: {motorcycles}")
print(f"Sorted order: {sorted(motorcycles)}")
print(f"After sorted operation: {motorcycles}")

Original order: ['ducati', 'suzuki', 'bmw', 'honda', 'kawasaki']
Sorted order: ['bmw', 'ducati', 'honda', 'kawasaki', 'suzuki']
After sorted operation: ['ducati', 'suzuki', 'bmw', 'honda', 'kawasaki']
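Like the `sort` method, `sorted` accepts the `reverse` parameter, and both also accept a `key` function that decides what to sort by. A minimal sketch, with illustrative variable names:

```python
motorcycles = ['ducati', 'suzuki', 'bmw', 'honda', 'kawasaki']

descending = sorted(motorcycles, reverse=True)  # new list, Z to A
by_length = sorted(motorcycles, key=len)        # shortest names first

print(descending)
print(by_length)
print(motorcycles)  # the original order is untouched
```

Because Python's sort is stable, names of equal length keep their original relative order in `by_length`.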


### Processing Items of a List

There are many reasons to store a collection of numbers. For example, keeping track of the population of an endangered animal is an important part of wildlife preservation. Storing data in a list also helps with later analysis, such as computing simple statistics like the mean, median, or mode.

There are a few functions that will be helpful when working with lists of numbers.

In [21]:
digits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(min(digits))
print(max(digits))
print(sum(digits))

1
10
55
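The simple statistics mentioned earlier can be sketched with these built-ins plus the standard library's `statistics` module. The `sightings` list is made-up data used only to illustrate the mode.

```python
import statistics

digits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

mean_value = sum(digits) / len(digits)    # mean from built-ins alone
median_value = statistics.median(digits)  # middle value (average of 5 and 6 here)

# Hypothetical data, only to show the most common value.
sightings = [1, 2, 2, 3, 3, 3]
mode_value = statistics.mode(sightings)

print(mean_value)
print(median_value)
print(mode_value)
```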


You can also iterate through the elements of a list using a `for` loop to process each one in turn. For example, to compute the square of each number in a list, we can write:

In [22]:
squares = []
for value in digits:
    square = value ** 2
    squares.append(square)

print(squares)

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]


You'll find yourself doing this kind of operation often: transforming the items of a list and storing the results in a new list. For those cases, you can simplify your code greatly by using *list comprehensions*. Verify that the code above is equivalent to the one below.

In [23]:
squares = [value ** 2 for value in digits]

print(squares)

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]


Sometimes you only want to transform the items that meet a condition. The loop version below uses an `if` statement inside the `for` loop; in a list comprehension, the condition goes after the `for` clause.

In [24]:
even_squares = []
for value in digits:
    if not value % 2:
        even_squares.append(value ** 2)

print(even_squares)

[4, 16, 36, 64, 100]


Verify that the above code is equivalent to:

In [25]:
even_squares = [value ** 2 for value in digits if not value % 2]

print(even_squares)

[4, 16, 36, 64, 100]


### Lists and Strings

We can also convert strings to lists and vice versa. Passing a string to `list` produces a list of its individual characters.

In [26]:
a_string = 'a string is a sequence of characters'
list(a_string)

['a',
 ' ',
 's',
 't',
 'r',
 'i',
 'n',
 'g',
 ' ',
 'i',
 's',
 ' ',
 'a',
 ' ',
 's',
 'e',
 'q',
 'u',
 'e',
 'n',
 'c',
 'e',
 ' ',
 'o',
 'f',
 ' ',
 'c',
 'h',
 'a',
 'r',
 'a',
 'c',
 't',
 'e',
 'r',
 's']

If we want more control over how the list is created from the string, we can use the string's `split` method, which breaks the string wherever a given delimiter occurs. If no delimiter is given, `split` breaks on any run of whitespace.

In [27]:
a_list = a_string.split(' ')
a_list

['a', 'string', 'is', 'a', 'sequence', 'of', 'characters']
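The choice of delimiter matters when the text has repeated spaces. The sketch below uses a made-up string, `spaced`, to compare `split(' ')` with `split()`.

```python
# Hypothetical input with repeated spaces between some words.
spaced = 'a  string   with extra spaces'

by_space = spaced.split(' ')  # splits at every single space, leaving empty strings
by_whitespace = spaced.split()  # no argument: splits on runs of whitespace

print(by_space)
print(by_whitespace)

# Any string can serve as the delimiter, for example a comma.
brands = 'trek,cannondale,redline'.split(',')
print(brands)
```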

You can reverse this process by using the `join` method. Here, we call `join` on the delimiter string and pass the list we wish to join as its argument.

In [28]:
print('\n'.join(a_list))

a
string
is
a
sequence
of
characters
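Joining with the same delimiter used for splitting gives back the original string. The sketch below checks that round trip and also shows that `join` only accepts strings, so other values need converting first. The `numbers` list is illustrative.

```python
a_string = 'a string is a sequence of characters'
a_list = a_string.split(' ')

rejoined = ' '.join(a_list)
print(rejoined == a_string)

# join() expects strings, so convert other types before joining.
numbers = [1, 2, 3]
joined_numbers = ', '.join(str(number) for number in numbers)
print(joined_numbers)
```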


## 2 Tuples

Lists are ideal for storing a collection of items that can change throughout the life of a program. However, sometimes you'll want a collection of items that cannot change. Python refers to values that cannot change as *immutable*, and a *tuple* behaves like an immutable list. We define a tuple with parentheses `()` instead of square brackets and access its elements by index, just like a list.

### Defining a tuple

In [29]:
coordinate = (200, 50)
print(coordinate[0])
print(coordinate[1])

200
50


Tuples are immutable, so uncommenting the assignment below raises a `TypeError`:

In [30]:
# coordinate[0] = 100
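To see the error without stopping the notebook, the assignment can be wrapped in `try`/`except`. This is a small sketch; the flag `assignment_failed` is only there for illustration.

```python
coordinate = (200, 50)

assignment_failed = False
try:
    coordinate[0] = 100  # tuples do not support item assignment
except TypeError as error:
    assignment_failed = True
    print(f"TypeError: {error}")

# The variable itself can still be rebound to a brand-new tuple.
coordinate = (100, 50)
print(coordinate)
```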

### Tuples as `return` values

Tuples are also what Python uses when a function returns multiple values: the comma-separated values after `return` are packed into a single tuple.

In [31]:
def min_max(sequence):
    return min(sequence), max(sequence)

In [32]:
min_max(digits)

(1, 10)

In [33]:
type(min_max(digits))

tuple
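A returned tuple can be unpacked directly into separate variables. The sketch below repeats `min_max` and `digits` so it is self-contained; `smallest` and `largest` are illustrative names.

```python
def min_max(sequence):
    return min(sequence), max(sequence)


digits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Tuple unpacking: one variable per element of the returned tuple.
smallest, largest = min_max(digits)
print(smallest)
print(largest)
```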

## 3 Dictionaries

Unlike a list, which stores items by position, a *dictionary* lets us store pieces of related information under descriptive labels. For example, we can create a dictionary representing a pokemon and store as much information as we want about it, such as its name, type, height, weight, abilities, and base experience.

To define a dictionary, we use curly brackets `{}` and use `:` to pair each label with its information. Elements in the dictionary are separated by `,` just like in lists.

In [34]:
my_pokemon = {
    'Name': 'Evee', 'Type': 'Normal', 'Height': 0.30, 'Weight': 6.5,
    'Abilities': ['Run Away', 'Adaptability'], 'Base Exp': 65
}

A dictionary is a collection of *key-value* pairs. Each *key* is connected to a *value*, and you can use a key to access the value associated with it. A value can be of any type, but keys must be *hashable*, which for built-in types means *immutable*: strings, numbers, and tuples of immutable items can be keys, while lists cannot.
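As a quick sketch of that rule, the made-up dictionary `grid` below uses tuples as keys, and the attempt to use a list as a key fails with a `TypeError`.

```python
# Tuples are immutable, so they can serve as dictionary keys.
grid = {(0, 0): 'origin', (200, 50): 'checkpoint'}
label = grid[(200, 50)]
print(label)

# Lists are mutable (and therefore unhashable), so this fails.
list_key_failed = False
try:
    bad = {[0, 0]: 'origin'}
except TypeError as error:
    list_key_failed = True
    print(f"TypeError: {error}")
```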

### Accessing values in a dictionary

To get the value associated with a key, we give the name of the dictionary and place the key inside square brackets after it.

In [35]:
my_pokemon['Name']

'Evee'

We can also use the `get()` method to access values. This is much safer than just indexing a particular key since you can specify a default output if the key does not exist in the dictionary.

In [36]:
my_pokemon.get('Name')

'Evee'

In [37]:
# my_pokemon['Favorite Food']

In [38]:
my_pokemon.get('Favorite Food', 'Pizza')

'Pizza'
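The sketch below contrasts the two ways of looking up a missing key, using a shortened copy of the dictionary so it runs on its own. Indexing raises a `KeyError`, while `get` without a second argument returns `None`, and neither adds the key.

```python
my_pokemon = {'Name': 'Evee', 'Type': 'Normal'}

# Indexing a missing key raises KeyError.
try:
    my_pokemon['Favorite Food']
except KeyError as error:
    print(f"KeyError: {error}")

# get() with no default returns None instead of raising.
missing = my_pokemon.get('Favorite Food')
print(missing)

# get() never modifies the dictionary.
has_key = 'Favorite Food' in my_pokemon
print(has_key)
```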

The `get` method is essentially equivalent to the following function:

In [39]:
def get_method(a_dict, key, default_value):
    if key in a_dict:
        return a_dict[key]
    else:
        return default_value

In [40]:
get_method(my_pokemon, 'Favorite Food', 'Pizza')

'Pizza'

We can obtain a similar effect using the `setdefault()` method. The difference is that if the key is not in the dictionary, `setdefault()` also inserts it with the default as its value before returning that value.

In [41]:
my_pokemon

{'Name': 'Evee',
 'Type': 'Normal',
 'Height': 0.3,
 'Weight': 6.5,
 'Abilities': ['Run Away', 'Adaptability'],
 'Base Exp': 65}

In [42]:
my_pokemon.setdefault('Favorite Food', 'Pizza')

'Pizza'

In [43]:
my_pokemon

{'Name': 'Evee',
 'Type': 'Normal',
 'Height': 0.3,
 'Weight': 6.5,
 'Abilities': ['Run Away', 'Adaptability'],
 'Base Exp': 65,
 'Favorite Food': 'Pizza'}
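One common use of `setdefault()` is building up a dictionary of lists without first checking whether each key exists. The sketch below groups a few names from this notebook by their first letter; `names` and `by_initial` are illustrative.

```python
names = ['trek', 'triumph', 'suzuki', 'specialized', 'honda']

by_initial = {}
for name in names:
    # Insert an empty list the first time a letter is seen,
    # then append to whichever list is stored under that letter.
    by_initial.setdefault(name[0], []).append(name)

print(by_initial)
```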

### Changing, Adding, and Removing Elements

To change the value of a specific key-value pair, we assign a new value to the existing key.

In [44]:
my_pokemon['Name'] = 'Jolteon'
my_pokemon['Type'] = 'Electric'
print(my_pokemon)

{'Name': 'Jolteon', 'Type': 'Electric', 'Height': 0.3, 'Weight': 6.5, 'Abilities': ['Run Away', 'Adaptability'], 'Base Exp': 65, 'Favorite Food': 'Pizza'}


To add an element to our dictionary, we assign a value to a key that is not yet in the dictionary.

In [45]:
my_pokemon['Species'] = 'Lightning Pokemon'
print(my_pokemon)

{'Name': 'Jolteon', 'Type': 'Electric', 'Height': 0.3, 'Weight': 6.5, 'Abilities': ['Run Away', 'Adaptability'], 'Base Exp': 65, 'Favorite Food': 'Pizza', 'Species': 'Lightning Pokemon'}


To remove a piece of information, we use the `del` statement.

In [46]:
del my_pokemon['Favorite Food']
print(my_pokemon)

{'Name': 'Jolteon', 'Type': 'Electric', 'Height': 0.3, 'Weight': 6.5, 'Abilities': ['Run Away', 'Adaptability'], 'Base Exp': 65, 'Species': 'Lightning Pokemon'}
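As with lists, dictionaries also have a `pop` method that removes a key and returns its value. A minimal sketch on a shortened copy of the dictionary; passing a second argument avoids a `KeyError` when the key is already gone.

```python
my_pokemon = {'Name': 'Jolteon', 'Type': 'Electric', 'Favorite Food': 'Pizza'}

removed = my_pokemon.pop('Favorite Food')          # removes the key, returns its value
fallback = my_pokemon.pop('Favorite Food', None)   # key is gone: returns the default

print(removed)
print(fallback)
print(my_pokemon)
```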


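We can also add the contents of another dictionary using the `update` method. Keys that already exist, such as `'Height'` and `'Weight'` here, are overwritten with the new values.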

In [47]:
my_pokemon.update({'EV yield': '2 Speed', 'Base Friendship': 50,
                   'Growth Rate': 'Medium Fast', 'Height': 0.8,
                   'Weight': 24.5})

In [48]:
my_pokemon

{'Name': 'Jolteon',
 'Type': 'Electric',
 'Height': 0.8,
 'Weight': 24.5,
 'Abilities': ['Run Away', 'Adaptability'],
 'Base Exp': 65,
 'Species': 'Lightning Pokemon',
 'EV yield': '2 Speed',
 'Base Friendship': 50,
 'Growth Rate': 'Medium Fast'}

### Processing Items in a Dictionary

We can also process the contents of a dictionary using a `for` loop together with the dictionary's `items()` method, which yields each key-value pair in turn.

In [49]:
for key, value in my_pokemon.items():
    print(f"{key}: {value}")

Name: Jolteon
Type: Electric
Height: 0.8
Weight: 24.5
Abilities: ['Run Away', 'Adaptability']
Base Exp: 65
Species: Lightning Pokemon
EV yield: 2 Speed
Base Friendship: 50
Growth Rate: Medium Fast
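When only the keys or only the values are needed, the `keys()` and `values()` methods can be used instead, and a comprehension works on dictionaries much as it does on lists. The sketch below uses a shortened copy of the dictionary; the variable names are illustrative.

```python
my_pokemon = {'Name': 'Jolteon', 'Height': 0.8, 'Weight': 24.5}

key_list = list(my_pokemon.keys())      # just the keys, in insertion order
value_list = list(my_pokemon.values())  # just the values
print(key_list)
print(value_list)

# Dictionary comprehension: keep only the numeric entries.
numeric_stats = {
    key: value
    for key, value in my_pokemon.items()
    if isinstance(value, (int, float))
}
print(numeric_stats)
```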


## 4 Sets

For cases where we want to operate on collections with unique values, we can define a **set**, which is an unordered collection of unique items. Sets behave like a collection of dictionary keys with no values. As such, the elements of a set must be hashable, which in practice means immutable objects such as numbers, strings, and tuples of immutable items.

For example, here we define a set containing the vowel letters:

In [50]:
vowels = {'a', 'e', 'i', 'o', 'u'}
vowels

{'a', 'e', 'i', 'o', 'u'}

Notice that if we include repeated items inside a set definition, duplicates are ignored:

In [51]:
vowels = {'a', 'o', 'o', 'i', 'e', 'u', 'a'}
vowels

{'a', 'e', 'i', 'o', 'u'}
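This behavior makes sets a convenient way to remove duplicates from a list. The sketch below also shows one pitfall: empty curly brackets create a dictionary, so an empty set must be written as `set()`. The variable names are illustrative.

```python
letters = list('sequence')  # contains 'e' three times
unique_letters = set(letters)

# Sets are unordered, so sort before printing for a predictable display.
print(sorted(unique_letters))
print(len(letters), len(unique_letters))

empty_dict = {}    # curly brackets alone make a dictionary
empty_set = set()  # an empty set needs the set() constructor
print(type(empty_dict))
print(type(empty_set))
```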

### Set operations

In mathematics, set operations include union, intersection, difference, and symmetric difference. Python provides these as set methods, along with methods for adding and removing members, and several are also available as operators:

In [52]:
vowels.add('y')
vowels

{'a', 'e', 'i', 'o', 'u', 'y'}

In [53]:
ten = set(range(10))
lows = set(range(5))
evens = set([i for i in range(10) if not i % 2])
print(ten)
print(lows)
print(evens)

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
{0, 1, 2, 3, 4}
{0, 2, 4, 6, 8}


In [54]:
lows.difference(evens)

{1, 3}

In [55]:
lows.intersection(evens)

{0, 2, 4}

In [56]:
lows & evens

{0, 2, 4}

In [57]:
lows.union(evens)

{0, 1, 2, 3, 4, 6, 8}

In [58]:
lows | evens

{0, 1, 2, 3, 4, 6, 8}

In [59]:
lows.issubset(ten)

True

In [60]:
lows.issuperset(evens)

False

In [61]:
lows.remove(0)
lows

{1, 2, 3, 4}

In [62]:
lows.symmetric_difference(evens)

{0, 1, 3, 6, 8}
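The remaining methods shown here have operator forms as well. The sketch below redefines the three sets (with `lows` as it stands after `0` was removed) so it runs on its own; the result names are illustrative.

```python
ten = set(range(10))
lows = {1, 2, 3, 4}
evens = {0, 2, 4, 6, 8}

difference = lows - evens     # same as lows.difference(evens)
symmetric = lows ^ evens      # same as lows.symmetric_difference(evens)
is_subset = lows <= ten       # same as lows.issubset(ten)
is_superset = lows >= evens   # same as lows.issuperset(evens)

print(difference)
print(symmetric)
print(is_subset)
print(is_superset)

# remove() raises KeyError for a missing item; discard() silently does nothing.
lows.discard(99)
print(lows)
```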

## 6 Hands-on Exercises

### Draw a card

Create a function `draw_card()` that simulates randomly drawing a card from a standard 52-card deck. The function should not accept any parameters but should return a string representing the card that was drawn, including its rank and suit.

Example output:

```
>>> draw_card()
'Jack of Spades'
```

*Hint: Use the `choice` function of the `random` module*

In [65]:
import random


def draw_card():
    # YOUR CODE HERE
    raise NotImplementedError  # placeholder so the cell is valid Python; replace it

### Anagrams

Two words are anagrams if you can rearrange the letters from one to spell the other. Write a function named `is_anagram` that takes in two strings and returns `True` if they are anagrams and `False` otherwise.

In [66]:
def is_anagram(word1, word2):
    # YOUR CODE HERE
    raise NotImplementedError  # placeholder so the cell is valid Python; replace it

### Roman Numerals

Create a function `roman_to_integer()` that takes in a Roman numeral string `roman_numeral` and returns the corresponding integer value.

In [67]:
def roman_to_integer(roman_numeral):
    # YOUR CODE HERE
    raise NotImplementedError  # placeholder so the cell is valid Python; replace it

Create a function `integer_to_roman()` that takes in an integer `n` and returns the corresponding Roman numeral string.

In [68]:
def integer_to_roman(n):
    # YOUR CODE HERE
    raise NotImplementedError  # placeholder so the cell is valid Python; replace it