
# Lab 4.1 - Lists, Sets and Dictionaries
In this lab we will practice using three of Python's data structures - lists, sets and dictionaries. Each of these structures serves a different purpose, and today's exercises will leave you with a better understanding of their roles.

## Programming Exercises

### List Concatenation
The program below finds the largest shoe size across three groups of people measured. Although it solves the task, it's quite clear that there's a lot of repetition in the code.

The solution we learnt previously was to move repeated code into a function and call the function instead. However, a better solution would be to combine all three lists into one big list and process that, instead of processing each of the lists separately. This can be achieved using _concatenation_, which was introduced in the workbook.

Improve the code by concatenating the three lists, storing them in a single variable. Then, replace the three `for` loops with a single loop. Run the solution to ensure that it works as expected.

In [None]:
group_a_sizes = [8, 6, 11, 6, 8, 7, 6]
group_b_sizes = [5, 9, 12, 8, 5, 6]
group_c_sizes = [6, 7, 5, 9, 11, 7, 5, 8]

max_size = 0

for size in group_a_sizes:
    if size > max_size:
        max_size = size

for size in group_b_sizes:
    if size > max_size:
        max_size = size

for size in group_c_sizes:
    if size > max_size:
        max_size = size

print(f'Max shoe size: {max_size}')

In [None]:
group_a_sizes = [8, 6, 11, 6, 8, 7, 6]
group_b_sizes = [5, 9, 12, 8, 5, 6]
group_c_sizes = [6, 7, 5, 9, 11, 7, 5, 8]

group_sizes = group_a_sizes + group_b_sizes + group_c_sizes

max_size = 0

for size in group_sizes:
    if size > max_size:
        max_size = size

print(f'Max shoe size: {max_size}')

Max shoe size: 12


###### Solution

This code is much shorter, and easier to understand!

In [None]:
group_a_sizes = [8, 6, 11, 6, 8, 7, 6]
group_b_sizes = [5, 9, 12, 8, 5, 6]
group_c_sizes = [6, 7, 5, 9, 11, 7, 5, 8]

all_sizes = group_a_sizes + group_b_sizes + group_c_sizes

max_size = 0
for size in all_sizes:
    if size > max_size:
        max_size = size

print(f'Max shoe size: {max_size}')
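
One detail worth checking for yourself: `+` builds a new list and leaves its operands untouched. The short sketch below uses two throwaway lists (`left` and `right` are names chosen purely for illustration) to show this.

```python
left = [1, 2]
right = [3, 4, 5]

combined = left + right   # a brand-new list containing the items of both

print(combined)           # [1, 2, 3, 4, 5]
print(left, right)        # [1, 2] [3, 4, 5] - the originals are unchanged
print(len(combined) == len(left) + len(right))  # True
```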

### Concatenation as Aggregation
Let's now consider a variant of the problem. Although we were able to solve it using the `+` operator, what if there were 100 or a million groups of shoe sizes? We would need to update our code every time a new group is added, and our code would be impossible to manage!

A more likely scenario is that the groups are instead stored as a _list of lists_ as shown below. The challenge now is how to concatenate an arbitrary number of groups.

Combining a list of lists into a single list is merely a form of aggregation, much like we've done in the past when computing sums or averages. This task is more difficult than computing the sum or the mean, but it comes down to two key components:
 - What is the initial value of the variable that will hold the final result? _With summation, the initial value would be 0._
 - What operation is applied to aggregate the values? _With summation, the code might be `sum = sum + value`._
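
As a point of reference, both components can be seen in a plain summation. This is a minimal sketch with made-up values (`values` and `total` are illustrative names, not part of the exercise):

```python
# Illustrative data, not part of the exercise
values = [4, 10, 3]

total = 0                    # initial value: the "empty" result for a sum
for value in values:
    total = total + value    # aggregation operation: fold each value into the result

print(total)  # 17
```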

Fill the gaps in the code below to solve the problem, and run it to ensure you still get the same answer. You should **not** need to add or rename any variables.

_Hint: The objective is still to concatenate all shoe sizes._

In [None]:
sizes = [
    [8, 6, 11, 6, 8, 7, 6],
    [5, 9, 12, 8, 5, 6],
    [6, 7, 5, 9, 11, 7, 5, 8]
]

all_sizes = [] # What should the initial value be?
for group_size in sizes:
    print(group_size)
    all_sizes += group_size
    # What is the aggregation operation?

max_size = 0
for size in all_sizes:
    if size > max_size:
        max_size = size

print(f'Max shoe size: {max_size}')

[8, 6, 11, 6, 8, 7, 6]
[5, 9, 12, 8, 5, 6]
[6, 7, 5, 9, 11, 7, 5, 8]
Max shoe size: 12


###### Solution

Although this solution looks more complex than the first exercise, it's far more generalised - which is a major design consideration when writing code. \
This solution - unlike the first - will work for any number of groups; a little more effort at the start will save a lot more effort in the future.

In [None]:
sizes = [
    [8, 6, 11, 6, 8, 7, 6],
    [5, 9, 12, 8, 5, 6],
    [6, 7, 5, 9, 11, 7, 5, 8]
]

all_sizes = []
for group_size in sizes:
    all_sizes = all_sizes + group_size

max_size = 0
for size in all_sizes:
    if size > max_size:
        max_size = size

print(f'Max shoe size: {max_size}')
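
As a side note, rebuilding `all_sizes` with `+` creates a new list on every pass through the loop. A possible variation, sketched here rather than required by the exercise, is the list method `extend`, which adds the items of each group to the existing list in place:

```python
sizes = [
    [8, 6, 11, 6, 8, 7, 6],
    [5, 9, 12, 8, 5, 6],
    [6, 7, 5, 9, 11, 7, 5, 8]
]

all_sizes = []
for group_size in sizes:
    all_sizes.extend(group_size)  # adds each item of group_size to all_sizes

print(len(all_sizes))  # 21
print(all_sizes[:7])   # [8, 6, 11, 6, 8, 7, 6]
```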

### Mapping
You've been tasked with calculating the shipping price for each order from an online store. Instead of looking up the price for each order, you decide it would be easier to _map_ the distances to their corresponding shipping prices.

Using the table below, write code to map the shipping distances to their corresponding shipping prices, stored in a variable called `shipping_prices`.

| Distance | Price   |
|----------|---------|
| 0  - 30  | \$0     |
| 30 - 100 | \$9.99  |
| 100+     | \$14.99 |

Distance ranges should exclude the upper limit (e.g. distance 30 has a price of \$9.99, not \$0).

_Hint: You will need to use `if`/`elif`/`else`. Feel free to look at the "Filtering and Mapping" section of the workbook._


In [None]:
shipping_distances = [72, 8, 153, 109, 151, 23, 186, 68, 13]

# Write your shipping price solution here

shipping_prices = []
for distance in shipping_distances:
    if distance < 30:  # ranges exclude the upper limit, so 30 is not free
        shipping_prices.append(0)
    elif distance < 100:  # distances below 30 were already handled above
        shipping_prices.append(9.99)
    else:
        shipping_prices.append(14.99)
    
print(shipping_prices)

[9.99, 0, 14.99, 14.99, 14.99, 0, 14.99, 9.99, 0]


###### Solution

This solution, combined with the aggregation techniques we've already learnt, would enable us to start generating reports covering anything from shipping cost totals to the average shipping price per transaction, and much more!

In [None]:
shipping_distances = [72, 8, 153, 109, 151, 23, 186, 68, 13]

shipping_prices = []
for distance in shipping_distances:
    if distance < 30:
        shipping_prices.append(0)
    elif distance < 100:
        shipping_prices.append(9.99)
    else:
        shipping_prices.append(14.99)
    
print(shipping_prices)
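
To illustrate the kind of report mentioned above, here is a minimal sketch that aggregates the mapped prices into a total and an average. The variable names `total` and `average` are chosen for illustration, and the price list is copied from the output of the mapping code.

```python
shipping_prices = [9.99, 0, 14.99, 14.99, 14.99, 0, 14.99, 9.99, 0]

total = 0
for price in shipping_prices:
    total = total + price

average = total / len(shipping_prices)

print(f'Total shipping: ${total:.2f}')       # Total shipping: $79.94
print(f'Average per order: ${average:.2f}')  # Average per order: $8.88
```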

### Filtering
By modifying the code below, can you write a program which builds a list of only the distances that qualified for free shipping? Your solution will be quite similar to the previous question.

_Hint: You can take another look at the "Filtering and Mapping" section of the workbook._

In [None]:
shipping_distances = [72, 8, 153, 109, 151, 23, 186, 68, 13]

# Write your free shipping distances solution here

free_shipping_distances = []
for distance in shipping_distances:
    if distance < 30:
        free_shipping_distances.append(distance)
    
print(free_shipping_distances)

[8, 23, 13]


###### Solution

This solution doesn't require any `elif` or `else` blocks, as we can simply do nothing if the order didn't qualify for free shipping.

In [None]:
shipping_distances = [72, 8, 153, 109, 151, 23, 186, 68, 13]

free_shipping_distances = []
for distance in shipping_distances:
    if distance < 30:
        free_shipping_distances.append(distance)
    
print(free_shipping_distances)
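
The same filter can also be written as a list comprehension, a more compact Python form that may not have been covered in the workbook yet. A short sketch:

```python
shipping_distances = [72, 8, 153, 109, 151, 23, 186, 68, 13]

# Keep each distance only if it is under the free-shipping limit of 30
free_shipping_distances = [d for d in shipping_distances if d < 30]

print(free_shipping_distances)  # [8, 23, 13]
```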

### Mutability
Below is a simple program that finds and prints the largest value in a list. However, this particular implementation has a bug! Without running the code, can you spot the problem?

Whether you found it or not, run the program and look at the output - this should reveal the bug. Do you know a way to fix this with a single line of code? A similar problem was addressed in the "Lists and References" section of the workbook.

In [None]:
def find_largest(the_list):
    the_list.sort()  # sort() works in place, so the caller's list is modified too
    return the_list[-1]


my_list = [86, 80, 63, 48, 29, 97, 5, 2, 78, 0]
largest = find_largest(my_list)

print(f'The largest value in the below list is {largest}')
print(my_list)

The largest value in the below list is 97
[0, 2, 5, 29, 48, 63, 78, 80, 86, 97]


###### Solution

The `sort` method operates in place on lists, meaning that the object itself is modified and every reference to the list will observe the change. Although we use a different reference to `my_list` inside the function (called `the_list`), there is only one list in this entire code block. Thus, the in-place operation results in the list being modified everywhere.

By copying the list first, the sort operation occurs only on the copy and leaves the original untouched. For a better solution than this, check out the bonus tasks at the end of the lab.

In [None]:
def find_largest(the_list):
    temp_list = the_list.copy()
    temp_list.sort()
    return temp_list[-1]


my_list = [86, 80, 63, 48, 29, 97, 5, 2, 78, 0]
largest = find_largest(my_list)

print(f'The largest value in the below list is {largest}')
print(my_list)
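
The aliasing behaviour described in this solution can be observed directly with the `is` operator, which checks whether two names refer to the same object. This is a small sketch with illustrative names (`original`, `alias`, `duplicate`):

```python
original = [3, 1, 2]
alias = original             # a second name for the same list
duplicate = original.copy()  # a separate list with the same contents

print(alias is original)      # True
print(duplicate is original)  # False

alias.sort()                  # sorts the one shared list in place

print(original)   # [1, 2, 3] - changed through the alias
print(duplicate)  # [3, 1, 2] - the copy is unaffected
```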

### Set Membership
Below is a very basic user management class. We want it to be capable of adding system administrators and confirming whether a user is an administrator. Currently very little is implemented, and it's up to you to implement the two missing methods:

 1. `add_admin`: This method takes a username as an argument, and adds that user to the administrators set.
 2. `is_admin`: This method takes a username as an argument, and returns `True` if the user is an administrator.

_Hint: The required skills were introduced in the "Set Operations" section of the workbook._

In [None]:
class UserManagement:
    def __init__(self):
        self.administrators = set()

    # Implement the add_admin method here
    def add_admin(self, name1):
        self.administrators.add(name1)

    # Implement the is_admin method here
    def is_admin(self, name2):
        # Return the result of the membership test instead of printing it
        return name2 in self.administrators


userMgmt = UserManagement()
userMgmt.add_admin('l.torvalds')
userMgmt.add_admin('d.ritchie')

print(userMgmt)

# Should return False
print(userMgmt.is_admin('m.zuckerberg'))

# Should return True
print(userMgmt.is_admin('l.torvalds'))

<__main__.UserManagement object at 0x7ff95c19ea00>
False
True


In [None]:
class UserManagement:
    def __init__(self):
        self.administrators = set()
    
    def add_admin(self, username):
        self.administrators.add(username)
    
    def is_admin(self, username):
        return username in self.administrators


userMgmt = UserManagement()
userMgmt.add_admin('l.torvalds')
userMgmt.add_admin('d.ritchie')

print(userMgmt)

# Should return False
print(userMgmt.is_admin('m.zuckerberg'))

# Should return True
print(userMgmt.is_admin('l.torvalds'))

<__main__.UserManagement object at 0x7ff93ec9bdc0>
False
True


###### Solution

In [None]:
class UserManagement:
    def __init__(self):
        self.administrators = set()
    
    def add_admin(self, username):
        self.administrators.add(username)
    
    def is_admin(self, username):
        return username in self.administrators
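
Because `administrators` is a set, adding the same username twice has no further effect. The sketch below checks this with a bare set rather than the class (the variable name `admins` is illustrative):

```python
admins = set()
admins.add('l.torvalds')
admins.add('d.ritchie')
admins.add('l.torvalds')   # already present, so the set is unchanged

print(len(admins))                # 2
print('l.torvalds' in admins)     # True
print('m.zuckerberg' in admins)   # False
```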

### Sets and Uniqueness
As you know by now, lists can contain practically any combination of values - repeated or unique. A special property of the `set` data structure is that it enforces uniqueness - there can be at most one of any value.

When a set is created from a list like `set(my_list)`, duplicated entries are removed, potentially resulting in a set with fewer entries than the original list. This can be used to our advantage to check if all values in the original list are unique. Can you think how?

In the indicated section below, write some code which checks that all the provided lottery numbers are unique. If not, return False.

In [None]:
def lottery_numbers_valid(numbers):
    num_selected = len(numbers)
    if num_selected != 6:
        return False

    # Write your uniqueness check here and return False if there are non-unique values
    if len(set(numbers)) == len(numbers): # you can use set() to convert a list
        return True
    else:
        return False


# Should be invalid, as only 5 were selected
print(lottery_numbers_valid([32, 41, 17, 1, 9]))

# Should be invalid, as there are duplicated numbers
print(lottery_numbers_valid([32, 41, 17, 1, 9, 32]))

# Should be valid
print(lottery_numbers_valid([32, 41, 17, 1, 9, 25]))

False
False
True


In [None]:
# This simplification is important and useful:
# returning the condition itself gives you True/False directly

def lottery_numbers_valid(numbers):
    num_selected = len(numbers)
    if num_selected != 6:
        return False

    # set() removes duplicates, so equal lengths mean every number is unique
    return len(set(numbers)) == len(numbers)


# Should be invalid, as only 5 were selected
print(lottery_numbers_valid([32, 41, 17, 1, 9]))

# Should be invalid, as there are duplicated numbers
print(lottery_numbers_valid([32, 41, 17, 1, 9, 32]))

# Should be valid
print(lottery_numbers_valid([32, 41, 17, 1, 9, 25]))

False
False
True


###### Solution

If all entries in the original list were unique, the set would have the same length as the list. If the set has fewer entries, it means that some duplicated values were removed.

This trick of using a `set` to check the unique values is very convenient. Consider the alternative - you would need to compare every pair of values in the list and check that none are equal!

In [None]:
def lottery_numbers_valid(numbers):
    num_selected = len(numbers)
    if num_selected != 6:
        return False

    unique_numbers = set(numbers)
    num_unique = len(unique_numbers)

    if num_selected != num_unique:
        return False
    return True


# Should be invalid, as only 5 were selected
print(lottery_numbers_valid([32, 41, 17, 1, 9]))

# Should be invalid, as 7 were selected
print(lottery_numbers_valid([32, 41, 17, 1, 9, 25, 21]))

# Should be invalid, as there are duplicated numbers
print(lottery_numbers_valid([32, 41, 17, 1, 9, 32]))

# Should be valid
print(lottery_numbers_valid([32, 41, 17, 1, 9, 25]))

Our function works, but there's still one improvement to be made. Any time a boolean value is returned based upon some condition, such as:
```python
if some_condition:
    return True
return False
```
The `if` statement can be removed, as it's equivalent to returning the condition itself:
```python
return some_condition
```

Thus, we can replace:
```python
if num_selected != num_unique:
    return False
return True
```
with:
```python
return num_selected == num_unique
```

_Note that the condition had to be inverted (`!=` changed to `==`)._
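
For comparison, the pairwise alternative mentioned in the solution can be sketched as follows. The function names `all_unique_pairwise` and `all_unique_set` are made up for this illustration; both give the same answer, but the first needs a nested loop.

```python
def all_unique_pairwise(numbers):
    # Compare every pair of positions (i, j) with i < j
    for i in range(len(numbers)):
        for j in range(i + 1, len(numbers)):
            if numbers[i] == numbers[j]:
                return False
    return True


def all_unique_set(numbers):
    # Duplicates collapse in a set, so equal lengths mean no duplicates
    return len(set(numbers)) == len(numbers)


print(all_unique_pairwise([32, 41, 17, 1, 9, 32]))  # False
print(all_unique_set([32, 41, 17, 1, 9, 32]))       # False
print(all_unique_pairwise([32, 41, 17, 1, 9, 25]))  # True
print(all_unique_set([32, 41, 17, 1, 9, 25]))       # True
```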

### Dictionaries
A clothing store receives orders via their website in the form of a list of items. It is your task to count the number of each item sold, so that the correct number can be dispatched from the warehouse.

Write a program below which counts the total number of each item in the order using a dictionary, before printing it to the screen.

_Hint: We need to handle items differently depending on whether it's the first time we've encountered it in the order. There are some relevant examples in the workbook._

In [None]:
sales = ['hat', 'pants', 'pants', 'shirt', 'pants', 'shirt']

counts = {}
# Write your counting solution here
for item in sales:
    if item in counts:
        counts[item] += 1
    else:
        counts[item] = 1
    
print(counts)

{'hat': 1, 'pants': 3, 'shirt': 2}


###### Solution

This is essentially the same as the counting example seen in the workbook. An important part of programming is identifying similarities between tasks, so existing solutions can be modified to solve new problems.

In [None]:
sales = ['hat', 'pants', 'pants', 'shirt', 'pants', 'shirt']

counts = {}
for item in sales:
    if item in counts:
        counts[item] = counts[item] + 1
    else:
        counts[item] = 1
    
print(counts)
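
One possible variation, offered as a sketch: the dictionary method `get` returns a default value when the key is missing, which folds the two branches into a single line.

```python
sales = ['hat', 'pants', 'pants', 'shirt', 'pants', 'shirt']

counts = {}
for item in sales:
    # counts.get(item, 0) is the current count, or 0 the first time we see item
    counts[item] = counts.get(item, 0) + 1

print(counts)  # {'hat': 1, 'pants': 3, 'shirt': 2}
```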

### More Dictionaries
You are now asked to write some code for a POS (Point Of Sale) system which looks up item names and prices using their product code. You are provided with a large part of the required code, including a dictionary mapping product codes to names, and another mapping product codes to prices.

If you run the code now, you'll see that it fails! This is because there are product codes which aren't yet in the system. Your task is to fix this error by asking the user for the missing details when necessary. The example run below demonstrates the expected usage of the program.

```
hat: $21.70
pants: $55.00
pants: $55.00
Product code: 8496485676 not found.
   Enter item name: jacket
   Enter item price: 65
jacket: $65.00
jacket: $65.00
shirt: $22.90
```

_Hint: We have already seen how to check if a dictionary contains a key, so all that's left is to prompt the user and insert the new information where required._

In [None]:
product_codes = {
    '5467312287': 'hat',
    '1565467432': 'pants',
    '8534743578': 'shirt'
}
prices = {
    '5467312287': 21.70,
    '1565467432': 55,
    '8534743578': 22.90
}

sales = ['5467312287', '1565467432', '1565467432', '8496485676', '8496485676', '8534743578']

for code in sales:
    # Write code to ask the user for missing details when required
    if code in product_codes:  # known product: print its name and price
        print(f'{product_codes[code]}: ${prices[code]:.2f}')
    else:
        print(f'Product code: {code} not found.')
        product_codes[code] = input('    Enter item name: ')
        prices[code] = float(input('    Enter item price: '))
        print(f'{product_codes[code]}: ${prices[code]:.2f}')

hat: $21.70
pants: $55.00
pants: $55.00
Product code: 8496485676 not found.
    Enter item name: jacket
    Enter item price: 65
jacket: $65.00
jacket: $65.00
shirt: $22.90


###### Solution

It's important to always check if a key is in a dictionary before using it, rather than making any assumptions.

In [None]:
product_codes = {
    '5467312287': 'hat',
    '1565467432': 'pants',
    '8534743578': 'shirt'
}
prices = {
    '5467312287': 21.70,
    '1565467432': 55,
    '8534743578': 22.90
}

sales = ['5467312287', '1565467432', '1565467432', '8496485676', '8496485676', '8534743578']

for code in sales:
    if code not in product_codes:
        print(f'Product code: {code} not found.')
        product_codes[code] = input('   Enter item name: ')
        prices[code] = float(input('   Enter item price: '))

    print(f'{product_codes[code]}: ${prices[code]:.2f}')

hat: $21.70
pants: $55.00
pants: $55.00
Product code: 8496485676 not found.
   Enter item name: jacket
   Enter item price: 65
jacket: $65.00
jacket: $65.00
shirt: $22.90
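
To see why the check matters, the sketch below looks up a code that is missing from a cut-down `prices` dictionary. Indexing a missing key raises a `KeyError`; the `try`/`except` form used to catch it here may not have been covered in the workbook yet and is included only to show the failure without stopping the program.

```python
prices = {'5467312287': 21.70}

code = '8496485676'   # a code that is not in the dictionary

try:
    print(prices[code])
except KeyError:
    print(f'Product code: {code} not found.')

# Checking membership first avoids the exception entirely
if code in prices:
    print(prices[code])
else:
    print('No price recorded yet.')
```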


## Bonus Tasks
It might be challenging to think of a solution to this bonus task, but once you figure it out - it'll seem easy! A good programmer always considers the performance implications of the code they write, especially if they're dealing with a large amount of data.

### Better `find_largest` Function
The `find_largest` function from earlier was fixed by simply copying the contents of the list. While it's a correct solution, it's not necessarily the best - what if the list contained a million entries? Copying such a long list might end up slowing down our program or using too much memory.

Can you come up with a solution that doesn't require copying the list?

In [None]:
def find_largest(the_list):
    # Implement your better solution here
    # Track the largest value seen so far: one pass, no copy, no sorting.
    # Assumes the list is not empty.
    largest = the_list[0]
    for item in the_list:
        if item > largest:
            largest = item
    return largest


my_list = [86, 80, 63, 48, 29, 97, 5, 2, 78, 0]
largest = find_largest(my_list)

print(f'The largest value in the below list is {largest}')
print(my_list)

The largest value in the below list is 97
[86, 80, 63, 48, 29, 97, 5, 2, 78, 0]
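
For reference, Python's built-in `max` function performs the same kind of single-pass search and leaves the list untouched. A short sketch:

```python
my_list = [86, 80, 63, 48, 29, 97, 5, 2, 78, 0]

largest = max(my_list)  # scans the list once; no copy, no sorting

print(largest)   # 97
print(my_list)   # [86, 80, 63, 48, 29, 97, 5, 2, 78, 0]
```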
