# Storing and Processing Collections of Data

Up until this point, we have seen several data types in Python, such as strings, functions, and numerical data types. In this notebook, we introduce lists and dictionaries, which allow you to store collections of information in one place.

Both of these types are very powerful features readily accessible to new programmers, and they help tie together many important concepts in programming.

## 1 Lists

A *list* is a collection of items in a particular order. We can put anything inside a list and the items in the list don't have to be related in any particular way.

In Python, to define a list we use the square brackets `[]` and individual elements in the list are separated by commas. Since lists usually contain more than one element, it's a good idea to make the name of the list plural, such as `letters`, `digits`, or `names`.

**Example**: A list that contains a few kinds of bicycles

In [None]:
bicycles = ['trek', 'cannondale', 'redline', 'specialized']
print(bicycles)

### Accessing elements in a list

Lists are ordered collections, so we can access any element by telling Python the position, or *index*, of the item we want.

*Note: always remember that Python considers the first item in a list to be at position 0 not position 1*

For example, to pull out the first bicycle in our list `bicycles` we do:

In [None]:
print(bicycles[0])

Python also has a special syntax for accessing the last element in the list. We can use the index `-1` to get the last item in the list.

The index `-2` returns the second item from the end of the list, the index `-3` returns the third, and so forth.

In [None]:
print(bicycles[-1])
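As a small sketch of the negative indices described above, the snippet below redefines the same `bicycles` list so that it runs on its own. The names `last`, `second_last`, and `third_last` are introduced here only for illustration.

```python
# Same list as above, redefined so this snippet is self-contained
bicycles = ['trek', 'cannondale', 'redline', 'specialized']

last = bicycles[-1]         # last item
second_last = bicycles[-2]  # second item from the end
third_last = bicycles[-3]   # third item from the end

print(last, second_last, third_last)

# A negative index -k refers to the same element as index len(bicycles) - k
print(bicycles[-2] == bicycles[len(bicycles) - 2])
```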

You can also access a subset of a list by *slicing* it. To make a slice, specify the index of the first element you want and the index at which to stop. Python stops one item before the second index you specify.

In [None]:
print(bicycles[0:2])
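A few more slice forms, sketched on the same list (redefined so the snippet is self-contained). The variable names are illustrative only.

```python
bicycles = ['trek', 'cannondale', 'redline', 'specialized']

first_two = bicycles[:2]    # an omitted start index defaults to 0
from_second = bicycles[1:]  # an omitted end index runs to the end of the list
middle = bicycles[1:3]      # items at index 1 and 2; index 3 is excluded
last_two = bicycles[-2:]    # negative indices also work in slices

print(first_two)
print(from_second)
print(middle)
print(last_two)
```

Slicing returns a new list, so `bicycles` itself is left unchanged.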

### Changing, Adding, and Removing Elements

To modify an item in a list, we first access the element by its index, then assign it the new value we want it to have.

In [None]:
motorcycles = ['honda', 'yamaha', 'suzuki']
print(motorcycles)

In [None]:
motorcycles[0] = 'ducati'
print(motorcycles)

Aside from modifying the elements, we can also *append* a new item to the list.

In [None]:
motorcycles.append('honda')
print(motorcycles)

We can also add several items at once by extending the list with another list.

In [None]:
motorcycles.extend(['kawasaki', 'triumph'])
print(motorcycles)

You can also insert a specific element at a specific location on a list using the `insert` method. Here, we specify the element to be inserted and the specific index in which we wish to place it.

In [None]:
motorcycles.insert(3, 'bmw')
print(motorcycles)

To remove an element, we can use the `del` statement to delete a specific element in the list.

In [None]:
del motorcycles[1]
print(motorcycles)

Another way to do this is the `pop` method. The difference is that it returns the value of the removed item in addition to removing it.

In [None]:
print(motorcycles)

In [None]:
popped_motorcycle = motorcycles.pop(-1)
print(motorcycles)
print(popped_motorcycle)

### List Operations

Aside from modifying the elements of a single list, we can also apply the operators `+` (concatenation of two lists) and `*` (repetition of a list a given number of times).

In [None]:
motorcycles + bicycles

In [None]:
motorcycles * 3

To get the number of elements in a list we can also use the built-in function `len`.

In [None]:
print(len(motorcycles))

### Organizing a List

If we want to permanently sort a list, we use its `sort` method.

In [None]:
print(motorcycles)

In [None]:
motorcycles.sort()
print(motorcycles)

By specifying the `reverse` parameter as `True`, we can sort the items in descending order. 

In [None]:
motorcycles.sort(reverse=True)
print(motorcycles)

You can also sort a list temporarily using the `sorted` function.

In [None]:
motorcycles = ['ducati', 'suzuki', 'bmw', 'honda', 'kawasaki']

print(f"Original order: {motorcycles}")
print(f"Sorted order: {sorted(motorcycles)}")
print(f"After sorted operation: {motorcycles}")

### Processing Items of a List

There are many reasons to store a set of numbers. For example, keeping track of the population of an endangered animal is an important aspect of wildlife preservation. Storing data in a list also helps with subsequent data analysis, such as computing simple statistics like the mean, median, or mode.

There are a few functions that will be helpful when working with lists of numbers.

In [None]:
digits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(min(digits))
print(max(digits))
print(sum(digits))

You can also iterate through the elements of a list using a `for` loop to process each one. For example, to get the squares of a series of numbers, we can write:

In [None]:
squares = []
for value in digits:
    square = value ** 2
    squares.append(square)

print(squares)

You'll notice that you'll be doing a lot of this kind of operation: transforming the items of a list, then storing them in a new list. For those cases, you can simplify your code greatly by using *list comprehensions*. Verify that the code above is equivalent to the one below.

In [None]:
squares = [value ** 2 for value in digits]

print(squares)

Sometimes you might also want to apply a condition before executing the transformation. In a loop, this is an `if` statement inside the `for` loop; in a list comprehension, you add an `if` clause after the `for` clause.

In [None]:
even_squares = []
for value in digits:
    if not value % 2:
        even_squares.append(value ** 2)

print(even_squares)

Verify that the above code is equivalent to:

In [None]:
even_squares = [value ** 2 for value in digits if not value % 2]

print(even_squares)

### Lists and Strings

We can also convert strings to lists and vice versa.

In [None]:
a_string = 'a string is a sequence of characters'
list(a_string)

If we want more control over how the list is created from the string, we can use the string's `split` method, which accepts a delimiter that marks where the string should be split.

In [None]:
a_list = a_string.split(' ')
a_list

You can reverse this process by using the `join` method. Here, we specify the delimiter as a string then place in the argument the list that we wish to join.

In [None]:
print('\n'.join(a_list))
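As a quick check that `split` and `join` undo each other, here is a minimal sketch. It also shows how `split()` called with no argument differs from `split(' ')`; the string `messy` is made up for this illustration.

```python
a_string = 'a string is a sequence of characters'

# Splitting on a space and joining with a space gives back the original string
words = a_string.split(' ')
rejoined = ' '.join(words)
print(rejoined == a_string)

# With no argument, split() breaks on any run of whitespace
messy = '  too   many    spaces  '
print(messy.split())

# With an explicit ' ' delimiter, consecutive spaces produce empty strings
print(messy.split(' '))
```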

## 2 Tuples

Lists are ideal for storing a collection of items that can change throughout the life of a program. However, sometimes you'll want to create a sequence of items that cannot change. Python refers to values that cannot change as *immutable*, and an immutable list is called a *tuple*.

### Defining a tuple

In [None]:
coordinate = (200, 50)
print(coordinate[0])
print(coordinate[1])

Tuples are immutable!

In [None]:
# coordinate[0] = 100
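To see what happens when the commented-out assignment above is actually run, the sketch below catches the resulting error instead of letting it stop the notebook. The name `moved` is introduced only for this example.

```python
coordinate = (200, 50)

try:
    coordinate[0] = 100  # item assignment is not allowed on tuples
except TypeError as error:
    print(f"TypeError: {error}")

# To get a "changed" tuple, build a new one from the old values
moved = (100, coordinate[1])
print(coordinate)
print(moved)
```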

### Tuples as `return` values

Because of this property, tuples are a natural data type for Python to use when a function returns multiple values.

In [None]:
def min_max(sequence):
    return min(sequence), max(sequence)

In [None]:
min_max(digits)

In [None]:
type(min_max(digits))
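Since the function returns a tuple, the caller can unpack it straight into separate variables. A minimal sketch, with `min_max` and `digits` redefined so the snippet is self-contained and with `lowest` and `highest` as illustrative names:

```python
def min_max(sequence):
    # The comma builds a tuple: (min, max)
    return min(sequence), max(sequence)

digits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Unpack the returned tuple into two variables
lowest, highest = min_max(digits)
print(lowest)
print(highest)
```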

### Variable-length Argument Tuples

Tuples are also used as the container when specifying arbitrary-length positional arguments.

In [None]:
def print_all(*pargs):
    """Print an arbitrary number of positional arguments as a tuple"""
    print(pargs)

In [None]:
print_all(1, 2.0, '3')

In [None]:
def print_all(*pargs):
    """Print an arbitrary number of positional arguments in a line"""
    print(*pargs)

In [None]:
print_all(1, 2.0, '3')

In [None]:
print(1, 2.0, '3')

### Lists and Tuples

If we are given two or more sequences, we can use the `zip` function to pair up their corresponding elements. It returns an iterator of tuples, where the first tuple holds the first element of each sequence, the second tuple holds the second elements, and so on.

In [None]:
names = ['Leo', 'Pat', 'Gil']
ages = [32, 27, 30]

'''
Leo: 32
Pat: 27
Gil: 30
'''

for name, age in zip(names, ages):
    print(f'{name}: {age}')

Notice that if we inspect the `zip` object, it doesn't print its contents.

In [None]:
zipped_sequence = zip(names, ages)
zipped_sequence

This is because the `zip` object is a *lazy* iterator. Lazy objects do not compute their values until those values are needed. Once completely iterated, a lazy iterator is also exhausted, meaning it yields no further values.

In [None]:
list(zipped_sequence)

In [None]:
list(zipped_sequence)

To unzip a zipped sequence, we can scatter, or unpack, it with `*` in another call to `zip`.

In [None]:
zipped_sequence = list(zip(names, ages))
zipped_sequence

In [None]:
print(tuple(zip(*zipped_sequence)))

We can also unpack the result directly into separate variables.

In [None]:
same_names, same_ages = zip(*zipped_sequence)
print(same_names)
print(same_ages)

## 3 Dictionaries

Unlike a list, which stores a collection of items that are not necessarily related to each other, a *dictionary* lets us store pieces of information that are connected to one another. For example, we can create a dictionary representing a Pokemon, then store as much information as we want about it, such as its name, type, height, weight, abilities, and base experience.

To define a dictionary, we use curly brackets `{}` then use `:` to denote a correspondence between information. Elements in the dictionary are separated by `,`, just like in lists.

In [None]:
my_pokemon = {
    'Name': 'Eevee',
    'Type': 'Normal',
    'Height': 0.30,
    'Weight': 6.5,
    'Abilities': ['Run Away', 'Adaptability'],
    'Base Exp': 65
}

<div class="alert alert-success">

**PEP 8: Lining up closing braces/brackets/parentheses**

The closing brace/bracket/parenthesis on multiline constructs may either line up under the first non-whitespace character of the last line of the list, or it may be lined up under the first character of the line that starts the multiline construct.

Source: https://peps.python.org/pep-0008/#indentation

</div>

A dictionary is a collection of *key-value* pairs. Each *key* is connected to a value, and you can use a key to access the value associated with that key. The *value* of a particular *key* can be of any type. *Keys*, however, must be of *immutable* types (more precisely, hashable types) such as strings, numbers, or tuples; a mutable object such as a list cannot be used as a key.
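The restriction on keys can be sketched as follows. The dictionary `grid` and the flag `list_key_failed` are hypothetical names used only for this illustration.

```python
# Tuples are immutable, so they can serve as dictionary keys
grid = {(0, 0): 'origin', (200, 50): 'checkpoint'}
print(grid[(200, 50)])

# Lists are mutable, so using one as a key raises a TypeError
list_key_failed = False
try:
    bad = {[0, 0]: 'origin'}
except TypeError as error:
    list_key_failed = True
    print(f"TypeError: {error}")
```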

### Accessing values in a dictionary

To get the value associated with a key, we give the name of the dictionary, then place the key inside square brackets after it.

In [None]:
my_pokemon['Name']

We can also use the `get()` method to access values. This is much safer than just indexing a particular key since you can specify a default output if the key does not exist in the dictionary.

In [None]:
my_pokemon.get('Name')

In [None]:
# my_pokemon['Favorite Food']

In [None]:
my_pokemon.get('Favorite Food', 'Pizza')
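The difference between indexing and `get()` for a missing key can be sketched as below. A separate, smaller dictionary named `pokemon` is assumed here so that `my_pokemon` is left untouched.

```python
# A small stand-in dictionary, used only for this illustration
pokemon = {'Type': 'Normal', 'Height': 0.30}

# Indexing a missing key raises a KeyError
try:
    food = pokemon['Favorite Food']
except KeyError as error:
    food = None
    print(f"KeyError: {error}")

# get() returns None when no default is given, or the default if one is passed
print(pokemon.get('Favorite Food'))
print(pokemon.get('Favorite Food', 'Pizza'))

# get() only reads; it never adds the key to the dictionary
print('Favorite Food' in pokemon)
```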

The `get` method is essentially equivalent to the following function:

In [None]:
def get_method(a_dict, key, default_value):
    if key in a_dict:
        return a_dict[key]
    else:
        return default_value

In [None]:
get_method(my_pokemon, 'Favorite Food', 'Pizza')

We can obtain a similar effect using the `setdefault()` method. However, `setdefault()` also inserts the key with the default value if the key is not yet in the dictionary.

In [None]:
my_pokemon

In [None]:
my_pokemon.setdefault('Favorite Food', 'Pizza')

In [None]:
my_pokemon

### Changing, Adding, and Removing Elements

To change the value associated with a specific key, we can perform an assignment operation.

In [None]:
my_pokemon['Name'] = 'Jolteon'
my_pokemon['Type'] = 'Electric'
print(my_pokemon)

To add an element to our dictionary, we assign a value to a key that does not yet exist in it.

In [None]:
my_pokemon['Species'] = 'Lightning Pokemon'
print(my_pokemon)

To remove a piece of information, we use the `del` statement.

In [None]:
del my_pokemon['Favorite Food']
print(my_pokemon)

We can also add contents from another dictionary using the `update` method. Keys that already exist are overwritten with the new values.

In [None]:
my_pokemon.update({'EV yield': '2 Speed', 'Base Friendship': 50,
                   'Growth Rate': 'Medium Fast', 'Height': 0.8,
                   'Weight': 24.5})

In [None]:
my_pokemon

<div class="alert alert-success">

**PEP 8: Continuation Lines**
    
Continuation lines should align wrapped elements either vertically using Python’s implicit line joining inside parentheses, brackets and braces, or using a hanging indent.

Source: https://peps.python.org/pep-0008/#indentation

</div>

### Processing Items in a Dictionary

We can also process items in a dictionary using a `for` loop in conjunction with the `dict`'s `items()` method.

In [None]:
for key, value in my_pokemon.items():
    print(f"{key}: {value}")

### Variable-length Keyword Arguments

When we place a `**` before a parameter name, it allows the function to **gather** an arbitrary number of keyword arguments into a corresponding dictionary.

In [None]:
def print_pokemon(**kwargs):
    for key, value in kwargs.items():
        print(f"{key}: {value}")

In [None]:
print_pokemon(Name='Eevee', Height=0.30, Weight=6.5)

If we place a `**` before a dictionary during a function call, it **unpacks** or **scatters** the dictionary into the function's keyword parameters.

In [None]:
print_pokemon(**my_pokemon)
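Scattering also works with functions that declare ordinary named parameters, as long as the dictionary keys match the parameter names. A minimal sketch, where the function `describe` and the dictionary `stats` are hypothetical:

```python
def describe(name, height, weight):
    """Hypothetical helper with three named parameters."""
    return f"{name}: height {height}, weight {weight}"

# Keys must match the parameter names exactly
stats = {'name': 'Jolteon', 'height': 0.8, 'weight': 24.5}

# describe(**stats) is the same as writing out each keyword argument
description = describe(**stats)
print(description)
print(description == describe(name='Jolteon', height=0.8, weight=24.5))
```

If the dictionary contains a key that is not a parameter of the function, the call raises a `TypeError`.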

## 4 Sets

For cases where we want to operate on collections with unique values, we can define a **set** object, which is an unordered collection of unique items. Sets behave like a collection of dictionary keys with no values. As such, we can only store immutable (hashable) objects inside a set.

For example, here we define a set containing the vowel letters:

In [None]:
vowels = {'a', 'e', 'i', 'o', 'u'}
vowels

Notice that if we include repeated items inside a set definition, duplicates are ignored:

In [None]:
vowels = {'a', 'o', 'o', 'i', 'e', 'u', 'a'}
vowels
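The same behavior makes `set` a convenient way to remove duplicates from an existing list. A small sketch, using a made-up word for illustration:

```python
# Break a word into a list of letters, duplicates included
letters = list('mississippi')

# Converting to a set keeps one copy of each letter
unique_letters = set(letters)
print(len(letters), len(unique_letters))

# Sets are unordered, so sort the result when a stable order is needed
print(sorted(unique_letters))

# Membership tests work the same way as with lists
print('s' in unique_letters)
```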

### Set operations

In mathematics, set operations include union, intersection, difference, and symmetric difference. In Python, we can perform these using methods or operators. Sets also support adding and removing individual elements with `add` and `remove`:

In [None]:
vowels.add('y')
vowels

In [None]:
ten = set(range(10))
lows = set(range(5))
evens = set([i for i in range(10) if not i % 2])

'''
list comprehension is equivalent to:
for i in range(10):     # Iterate through numbers 0-9
    if not i % 2:       # Check if the number is even
        evens.add(i)    # Add the even number to the set
'''

print(ten)
print(lows)
print(evens)

In [None]:
lows.difference(evens)

In [None]:
lows.intersection(evens)

In [None]:
lows & evens

In [None]:
lows.union(evens)

In [None]:
lows | evens

In [None]:
lows.issubset(ten)

In [None]:
lows.issuperset(evens)

In [None]:
lows.remove(0)
lows

In [None]:
lows.symmetric_difference(evens)

In [None]:
lows.union(evens).difference(lows.intersection(evens))

In [None]:
(lows | evens) - (lows & evens)
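Difference, symmetric difference, and the subset and superset tests also have operator forms. The sketch below redefines `lows` and `evens` from scratch (so `lows` contains `0` again) and compares each operator with its method.

```python
lows = set(range(5))          # {0, 1, 2, 3, 4}
evens = set(range(0, 10, 2))  # {0, 2, 4, 6, 8}
ten = set(range(10))

# - is difference, ^ is symmetric difference
print(lows - evens == lows.difference(evens))
print(lows ^ evens == lows.symmetric_difference(evens))

# <= is issubset, >= is issuperset
print((lows <= ten) == lows.issubset(ten))
print((lows >= evens) == lows.issuperset(evens))
```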

## 5 Hands-on Exercises

### Draw a card

Create a function `draw_card()` which simulates randomly drawing a card from a standard 52-card deck. The function should not accept any parameters but should return a string representing the card that was drawn, including its suit and rank.

Example output:

```
>>> draw_card()
'Jack of Spades'
```

*Hint: Use the `choice` function of the `random` module*

In [None]:
from random import choice

def draw_card():
    """
    Randomly draw a card from a standard 52-card deck.
    """
    # Define the full lists of ranks and suits for a standard 52-card deck
    ranks = ['2', '3', '4', '5', '6', '7', '8', '9', '10',
             'Jack', 'Queen', 'King', 'Ace']
    suits = ['Hearts', 'Diamonds', 'Clubs', 'Spades']
    
    deck = []
    
    # Option 1: List Comprehension (concise way to build the deck)
    deck = [f"{r} of {s}" for r in ranks for s in suits]
    
    # Option 2: Nested For Loops (explicit way to build the deck)
    # for r in ranks:
    #     for s in suits:
    #         deck.append(f"{r} of {s}")
    
    # Randomly select and return one card from the full deck
    return choice(deck)

# Test the function
draw_card()

'4 of Clubs'

### Anagrams

Two words are anagrams if you can rearrange the letters from one to spell the other. Write a function named `is_anagram` that takes in two strings and returns `True` if they are anagrams, `False` otherwise.

In [None]:
def is_anagram(word1, word2):
    """
    Check if two words are anagrams of each other.
    """
    return sorted(word1) == sorted(word2)

In [None]:
# Test
print(is_anagram('listen', 'silent'))   # True
print(is_anagram('hello', 'world'))     # False
print(is_anagram('listens', 'silent'))  # False
print(is_anagram('', 'something'))      # False

### Roman Numerals

Create a function `roman_to_integer()` that takes in a Roman numeral string `roman_numeral`, then outputs its corresponding integer value.

In [62]:
def roman_to_integer(roman_numeral):
    """
    Convert a Roman numeral to an integer.
    """
    roman_dict = {
        'I': 1,
        'V': 5,
        'X': 10,
        'L': 50,
        'C': 100,
        'D': 500,
        'M': 1000
    }
    total = 0
    prev_value = 0
    for char in reversed(roman_numeral):
        curr_value = roman_dict.get(char, 0)
        if curr_value < prev_value:
            total -= curr_value
        else:
            total += curr_value
        prev_value = curr_value
    return total

In [63]:
print(roman_to_integer('XIV'))  # 14
print(roman_to_integer('XLII'))  # 42
print(roman_to_integer('MMXXV')) # 2025

14
42
2025


Create a function `integer_to_roman` that takes in an integer `n`, then outputs the corresponding Roman numeral string.

In [64]:
def integer_to_roman(n):
    """
    Convert an integer to a Roman numeral.
    """
    val = [
        1000, 900, 500, 400,
        100, 90, 50, 40,
        10, 9, 5, 4,
        1
    ]
    syms = [
        "M", "CM", "D", "CD",
        "C", "XC", "L", "XL",
        "X", "IX", "V", "IV",
        "I"
    ]
    roman_numeral = ""
    for i in range(len(val)):
        while n >= val[i]:
            roman_numeral += syms[i]
            n -= val[i]
    return roman_numeral


In [65]:
print(integer_to_roman(14))    # XIV
print(integer_to_roman(42))    # XLII
print(integer_to_roman(2025))  # MMXXV

XIV
XLII
MMXXV
