
# Lists and tuples

Often we need to store a number of single items of data together so that they can be processed together. This might be because all the data refers to one person (e.g. name, age, gender, etc.) or because we have a set of related data (e.g. all the items that should be displayed in a drop-down list, such as all the years from this year back to 100 years ago so that someone can select their year of birth).

Python has a range of data structures available including:
*   lists  
*   tuples  
*   dictionaries  
*   sets

This worksheet looks at lists and tuples.

## List
A list is an ordered collection of related, individual data objects that are indexed and can be processed as a whole, as subsets or as individual items.  Lists are stored, essentially, as contiguous items in memory so that access can be as quick as possible.  However, they are mutable (they can be changed after they are created and stored), so the storage mechanism needs extra functionality to deal with changing list sizes.

## Tuple
A tuple is essentially the same as a list, but it is immutable: once it has been created it can't be changed.  It is stored in memory as contiguous items, with the size required being fixed right from the start.  Because the size never changes, a tuple needs less bookkeeping than a list.
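
A minimal sketch of the difference between the two, using hypothetical variables `colours_list` and `colours_tuple` that are not part of this worksheet's data:

```python
# Hypothetical example data, not one of the worksheet's generated lists.
colours_list = ["red", "green", "blue"]
colours_tuple = ("red", "green", "blue")

# A list can be changed in place after it has been created.
colours_list[0] = "purple"
colours_list.append("yellow")

# A tuple cannot: trying to assign to an item raises a TypeError.
try:
    colours_tuple[0] = "purple"
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(colours_list)
print(colours_tuple)
print(tuple_changed)
```

Reading from a tuple (indexing, slicing, looping, `len()`, `count()`, `index()`) works the same way as it does for a list.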

The code below will create two lists and a tuple.
*   the first list contains 1000 random numbers between 1 and 100
*   the second list is of random length (up to 5000) and each item is one of the 9 characteristics that are protected under the Equality Act in the UK.
*   the tuple contains the 9 protected characteristics

Before you start the exercises, run the code below.  It will generate the lists and tuple so that you can use them in the exercises.  If you need to recreate the lists (because you have changed them and need to work on the originals), just run this cell again.

***Note:***  *a list variable contains a reference to the start of the list in memory, rather than storing the list itself.  This means that if you assign the list to another variable (to make a copy), it will only copy across the reference.  If you change the copy, you change the original list.*

*If you need an independent copy of the list, you will need to create a new list and copy all the items across, for example with a loop.*
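
As a rough illustration of the note above, the sketch below uses a hypothetical list called `original`. It shows that plain assignment shares one list between two names, while a loop builds an independent copy. Slicing (`original[:]`) and the `list.copy()` method are standard shortcuts that also make a new list.

```python
original = [1, 2, 3]          # hypothetical example data

alias = original              # copies only the reference, not the items
alias.append(4)               # so this also changes `original`

# Build an independent copy with a loop.
loop_copy = []
for item in original:
    loop_copy.append(item)

# Two shorter ways to make the same kind of copy.
slice_copy = original[:]
method_copy = original.copy()

loop_copy.append(99)          # changes only the copy

print(alias is original)      # the two names refer to the same list
print(original)
print(loop_copy)
```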

In [None]:
from random import randint, choice

def get_num_list():
  num_list = [randint(1,100) for n in range(1000)]
  return num_list

def get_protected_characteristics():
  characteristics_tuple = ('age','disability','gender reassignment','marriage and civil partnership','pregnancy and maternity','race','religion or belief','sex','sexual orientation')
  return characteristics_tuple

def get_protected_characteristic_list(protected_characteristics):
  char_list = [choice(protected_characteristics) for ch in range(randint(1,5000))]
  return char_list

nums = get_num_list()
protected_characteristics = get_protected_characteristics()
characteristics = get_protected_characteristic_list(protected_characteristics)

## The exercises below will use the lists:  
*   **nums** (a list of 1000 random numbers, each between 1 and 100)
*   **characteristics** (a list of random length, between 1 and 5000 items, where each item is a randomly chosen protected characteristic)

and the tuple:
*  **protected_characteristics** (the 9 protected characteristics identified in the Equality Act)

## You can run the cell above any number of times to generate new lists.

---
### Exercise 1 - list head, tail and shape

Write a function, **describe_list()** which will:
*  print the length of the list `nums`
*  print the first 10 items in `nums`  
*  print the last 5 items in `nums`

In [None]:
def describe_list():
  print(len(nums))
  print(nums[:10])  # the exercise asks for the first 10 items
  print(nums[-5:])

describe_list()

1000
[32, 53, 98, 31, 83]
[77, 32, 74, 17, 37]


---
### Exercise 2 - show tuple items

Write a function which will:
*   use a loop to print the list of protected characteristics from the `protected_characteristics` tuple.


In [None]:
def print_charas():
  size = len(protected_characteristics)
  #print(size)
  for i in range(0, size): #do you not need the range?
    print(protected_characteristics[i])

print_charas()
print("....")

def print_charas_2():
  #size = len(protected_characteristics)
  #print(size)
  for i in protected_characteristics: #seems not. advantages/disadvantages?: range allows you to choose to start in the middle, etc. this just does the whole list.
    print(i)

print_charas_2()

age
disability
gender reassignment
marriage and civil partnership
pregnancy and maternity
race
religion or belief
sex
sexual orientation
....
age
disability
gender reassignment
marriage and civil partnership
pregnancy and maternity
race
religion or belief
sex
sexual orientation
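
On the question raised in the comments above about looping with or without `range()`, here is a small sketch of three common ways to loop. It uses a hypothetical tuple, `letters`, rather than the worksheet data. `enumerate()` is a built-in that yields each position together with its item.

```python
letters = ("a", "b", "c", "d")  # hypothetical example data

# 1. Index loop: range() lets you choose where to start and stop.
from_third = []
for i in range(2, len(letters)):
    from_third.append(letters[i])

# 2. Direct loop: simplest when every item is needed and its position is not.
every_item = []
for letter in letters:
    every_item.append(letter)

# 3. enumerate(): gives the position and the item together.
numbered = []
for position, letter in enumerate(letters):
    numbered.append(f"{position}: {letter}")

print(from_third)
print(every_item)
print(numbered)
```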


---
### Exercise 3 - list a random subset

Write a function which will:
*  calculate the position of the middle item in the `characteristics` list   
(*Hint: use len() to help with this*)
*  calculate the position of the item that is 5 places before the middle item
*  calculate the position of the item that is 5 places after the middle item
*  print the part of the list that includes the items from 5 places before to 5 places after.  

Expected output:  
Your list will include 11 items.

In [None]:
def chara_subset():
  size = len(characteristics)
  middle = size // 2
  five_before = middle - 5
  five_after = middle + 5
  #print(size, middle, five_before, five_after)
  for i in range(five_before, five_after + 1): # can a slice be used instead?
    print(characteristics[i])

chara_subset()
print("....")

def chara_subset_2():
  size = len(characteristics)
  middle = size // 2
  five_before = middle - 5
  five_after = middle + 5
  #print(size, middle, five_before, five_after)
  for i in characteristics[five_before:five_after + 1]: # yep.
    print(i)

chara_subset_2()

marriage and civil partnership
sexual orientation
marriage and civil partnership
gender reassignment
sex
religion or belief
gender reassignment
religion or belief
age
religion or belief
gender reassignment
....
marriage and civil partnership
sexual orientation
marriage and civil partnership
gender reassignment
sex
religion or belief
gender reassignment
religion or belief
age
religion or belief
gender reassignment
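
One thing to watch: `characteristics` can be very short, because its length is random. With fewer than 10 items, `middle - 5` becomes negative, and a negative index counts from the end of the list. The sketch below shows one possible guard using a hypothetical helper, `middle_window()`, which is an assumption for illustration and not part of the worksheet.

```python
def middle_window(items, reach=5):
    """Hypothetical helper: return up to `reach` items either side of the middle item."""
    middle = len(items) // 2
    start = max(0, middle - reach)  # avoid a negative start, which would count from the end
    stop = middle + reach + 1       # the stop index of a slice is not included
    return items[start:stop]

long_list = list(range(20))   # hypothetical stand-in for a long list
short_list = [0, 1, 2, 3]     # hypothetical stand-in for a very short list

print(middle_window(long_list))
print(middle_window(short_list))
```

A slice that runs past the end of the list is not an error; it simply stops at the last item.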


---
### Exercise 4 - create a copy

Write a function which will: use a for loop to create a copy of the `nums` list:

*   create a new, empty, list called **new_nums**  (*Hint: an empty list is [ ]*)
*   use a for loop which uses the following syntax:  `for num in nums:`
*   each time round the loop append `num` to `new_nums`  ( *`new_nums.append(num)`*)
*   print the first 10 items of `new_nums`
*   print the first 10 items of `nums`
*   print the length of both lists

In [None]:
def make_nums_copy():
  new_nums = []                 # named new_nums, as the exercise asks
  for num in nums:
    new_nums.append(num)
  print("    copy:", new_nums[:10], "length =", len(new_nums))
  print("original:", nums[:10], "length =", len(nums))

make_nums_copy()

    copy: [17, 3, 100, 21, 80, 51, 58, 4, 3, 100] length = 1000
original: [17, 3, 100, 21, 80, 51, 58, 4, 3, 100] length = 1000





---
### Exercise 5 - count the occurrence of age in characteristics

Write a function which will use the list method:

`num_items = list_name.count(item)`

to count the number of occurrences of 'age' in the `characteristics` list.  Print the result.

In [None]:
def count_ages():
  num_ages = characteristics.count("age")
  return num_ages

print(count_ages())

69


---
### Exercise 6 - sort the nums list

Write a function which will:
*   call the function `get_num_list()` and store the result in a new list called **sort_nums**
*   print the first, and last, 20 items in the `sort_nums` list
*   use the `list_name.sort()` method to sort the `sort_nums` list into ascending order
*   print the first, and last, 20 items again  
*   use the `list_name.sort()` method again to sort the `sort_nums` list into descending order
*   print the first, and last, 20 items again

In [None]:
def sort_nums_list():
  sort_nums = get_num_list()
  print(sort_nums[:20], "...", sort_nums[-20:])
  sort_nums.sort()
  print(sort_nums[:20], "...", sort_nums[-20:])
  sort_nums.sort(reverse=True)  # reverse=True sorts into descending order
  print(sort_nums[:20], "...", sort_nums[-20:])

sort_nums_list()

[59, 2, 3, 85, 59, 8, 43, 22, 13, 62, 25, 86, 49, 98, 62, 50, 35, 82, 28, 42] ... [72, 73, 16, 67, 62, 80, 84, 60, 17, 81, 69, 74, 90, 70, 50, 82, 9, 68, 94, 85]
[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3] ... [99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
[100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99] ... [3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1]
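
A short sketch of the two standard ways to sort, using a hypothetical list called `scores`. The `list.sort()` method changes the list in place and returns `None`. The built-in `sorted()` leaves the original alone and returns a new list. Both accept `reverse=True` for descending order.

```python
scores = [59, 2, 85, 8, 43]          # hypothetical example data

ascending_copy = sorted(scores)      # new sorted list; `scores` is unchanged so far
result_of_sort = scores.sort(reverse=True)  # sorts `scores` in place, descending

print(ascending_copy)
print(scores)
print(result_of_sort)                # sort() returns None, not the sorted list
```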


---
### Exercise 7 - get statistics (max(), min(), sum() )

Write a function which will:
*   print the maximum and minimum numbers in the `nums` list  
*   print the sum of the `nums` list
*   calculate and print the average of the `nums` list (using `len()` to help)

In [None]:
def get_stats():
  max_nums = max(nums)
  min_nums = min(nums)
  print("highest:", max_nums, "\n lowest:", min_nums)
  sum_nums = sum(nums)
  print("sum of list:", sum_nums)
  mean_nums = sum_nums / len(nums)
  print("mean of list:", mean_nums)

get_stats()

highest: 100 
 lowest: 1
sum of list: 50231
mean of list: 50.231


---
### Exercise 8 - percentage difference

Write a function which will:
*   generate a new list called **ex8_nums** using `get_num_list()`
*   calculate and print the percentage difference between the first number in each list (as a percentage of the number in the nums list) (Hint:  find the difference between the two numbers, divide the difference by the number in `nums` and multiply by 100)
*   calculate and print the percentage difference between the last numbers in each list in the same way
*   calculate and print the percentage difference between the middle numbers in each list in the same way.
*   calculate and print the percentage difference between the sums of each list in the same way

In [None]:
def percentage_diff():
  ex8_nums = get_num_list()
  first_diff = (abs(nums[0] - ex8_nums[0]) / nums[0]) * 100
  first_diff = round(first_diff, 2)
  print("the first of ex8 list is", first_diff, "% of the first of nums list")

  last_diff = (abs(nums[-1] - ex8_nums[-1]) / nums[-1]) * 100
  last_diff = round(last_diff, 2)
  print("the last of ex8 list is", last_diff, "% of the last of nums list")

  mid_nums = len(nums) // 2
  mid_ex8_nums = len(ex8_nums) // 2
  mid_diff = (abs(nums[mid_nums] - ex8_nums[mid_ex8_nums]) / nums[mid_nums]) * 100
  mid_diff = round(mid_diff, 2)
  print("the middle of ex8 list is", mid_diff, "% of the middle of nums list")

  sum_nums = sum(nums)
  sum_ex8_nums = sum(ex8_nums)
  sum_diff = (abs(sum_nums - sum_ex8_nums) / sum_nums) * 100
  sum_diff = round(sum_diff, 2)
  print("the sum of ex8 list is", sum_diff, "% of the sum of nums list")

percentage_diff()
# is this ok? not sure I followed the instructions

the first of ex8 list is 258.82 % of the first of nums list
the last of ex8 list is 14.75 % of the last of nums list
the middle of ex8 list is 261.54 % of the middle of nums list
the sum of ex8 list is 14.75 % of the sum of nums list
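
The same calculation is repeated four times in this exercise, so one option is to put it in a small function. The sketch below assumes a hypothetical helper called `percentage_difference()`; the name and the example numbers are illustrative only. It follows the hint in the exercise: find the difference, divide by the reference number and multiply by 100.

```python
def percentage_difference(reference, other):
    """Hypothetical helper: difference between two numbers as a percentage of `reference`."""
    # Assumes `reference` is not zero (the worksheet's numbers are between 1 and 100).
    return abs(reference - other) / reference * 100

print(round(percentage_difference(50, 60), 2))  # difference of 10, as a percentage of 50
print(round(percentage_difference(80, 60), 2))  # difference of 20, as a percentage of 80
```

Note that the result describes how far apart the two numbers are, so a message such as "differs by 20.0 %" describes it more accurately than "is 20.0 % of".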


---
### Exercise 9 - characteristic counts

Write a function which will:
*  iterate through the `protected_characteristics` tuple and for each **characteristic**:
    *   count the number of occurrences of that `characteristic` in the `characteristics` list
    *   print the `protected_characteristic` and the **count**  

Example expected output:

age 100  
disability 120  
gender reassignment 120  
marriage and civil partnership 111  
pregnancy and maternity 103  
race 106  
religion or belief 95  
sex 110  
sexual orientation 113  

Extra learning:  you can read [here](https://thispointer.com/python-how-to-pad-strings-with-zero-space-or-some-other-character/) how to justify the printed characteristic so that the output is organised into two columns as shown below:  
![tabulated output](https://drive.google.com/uc?id=1CCXfX6K5ZeDefnq7vUsqxCDmqvcfY8Mz)





In [None]:
def count_charas():
  longest_label = max(protected_characteristics, key=len)
  length_longest = len(longest_label)
  heading = "Protected Characteristic"
  print(heading.ljust(length_longest, ' '), "Frequency \n")
  for chara in protected_characteristics:
    freq_chara = characteristics.count(chara)
    chara = chara.ljust(length_longest, ' ')
    print(chara, freq_chara)

count_charas()

Protected Characteristic       Frequency 

age                            276
disability                     261
gender reassignment            272
marriage and civil partnership 295
pregnancy and maternity        287
race                           258
religion or belief             253
sex                            270
sexual orientation             281
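
As an alternative to `ljust()`, an f-string can pad each value to a fixed width. The sketch below uses a hypothetical subset of labels and made-up counts. `<` left-aligns, `>` right-aligns, and the number after it is the column width.

```python
labels = ("age", "religion or belief", "sex")  # hypothetical subset of labels
counts = (12, 7, 9)                            # made-up counts for illustration

width = len(max(labels, key=len))              # length of the longest label

rows = []
for label, count in zip(labels, counts):
    # Pad the label on the right and the count on the left.
    rows.append(f"{label:<{width}} {count:>4}")

for row in rows:
    print(row)
```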


---
### Exercise 10 - characteristics statistics

Assume that the `characteristics` list has been taken from a study of cases that have been taken to court in relation to the Equality Act.  

Write a function which will:

*   find the most common characteristic resulting in court action, from this population
*   print this in a message, e.g. The characteristic with the highest number of court cases is:  *characteristic*
*   print the list of `protected_characteristics`, on one line if possible - see [here](https://www.geeksforgeeks.org/g-fact-25-print-single-multiple-variable-python/)
*   ask the user to enter a characteristic that they would like to see statistics on and use a while loop to continue until the user has entered a valid characteristic
*   print the characteristic, its frequency and the percentage that this frequency is of the whole population.

In [None]:
def chara_stats():
  chara_count = []
  for charas in protected_characteristics:
    freq_chara = characteristics.count(charas)
    chara_count.append(freq_chara)               # builds a list of frequencies, in the same order as the tuple
  common_chara = max(chara_count)                # finds highest in list
  ref1 = chara_count.index(common_chara)         # finds index of highest value in list
  common_chara_name = protected_characteristics[ref1]  # gets the characteristic with the same index from the tuple
  print("The characteristic with the highest number of court cases is:", common_chara_name)

  size = len(protected_characteristics)
  for i in range(0, size -1):
    print(protected_characteristics[i], end = ', ')
  print(protected_characteristics[-1])           # prints list of characteristics

  chara = input("please enter a characteristic to see its statistics: ")
  chara = chara.lower()
  while protected_characteristics.count(chara) == 0:  # checks it against the tuple
    chara = input("that is not a characteristic. please check the list and enter a valid characteristic: ")
    chara = chara.lower()

  ref2 = protected_characteristics.index(chara)  # finds index of requested characteristic in tuple
  chara_freq = chara_count[ref2]                 # gets the number with the same index from the list of frequencies
  chara_stat = (chara_freq / len(characteristics)) *100
  chara_stat = round(chara_stat, 2)
  print("The requested protected characteristic,", chara + ", occurs", chara_freq, "times in the court record. This is", chara_stat, "% of the entries.")

chara_stats()

The characteristic with the highest number of court cases is: marriage and civil partnership
age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, sexual orientation
please enter a characteristic to see its statistics: egg
that is not a characteristic. please check the list and enter a valid characteristic: age
The requested protected characteristic, age, occurs 276 times in the court record. This is 11.25 % of the entries.
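
For comparison, the standard library's `collections.Counter` can do the counting in one step. This is a sketch of a different approach from the one above, using a hypothetical list called `cases` in place of `characteristics`.

```python
from collections import Counter

cases = ["age", "sex", "race", "age", "sex", "age"]  # hypothetical example data

counts = Counter(cases)                      # maps each item to how often it occurs
top_name, top_count = counts.most_common(1)[0]

requested = "sex"
share = round(counts[requested] / len(cases) * 100, 2)

print("Most common:", top_name, top_count)
print(requested, counts[requested], share, "%")
print(counts["egg"])                         # an item that never occurs counts as 0
```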


just notes below here

In [None]:
#test space / notes from experimenting

'''

chara = input("enter chara: ")
chara = chara.lower()
while protected_characteristics.count(chara) == 0:
  chara = input("not right, enter chara: ")
  chara = chara.lower()
print(chara, "good")

'''

print("Hello", "how are you?", sep="---") # sep doesn't apply in an iteration, since only one thing is in the brackets

for charas2 in protected_characteristics: # prints things in tuple
   print(charas2, end = '  ')
print("next")

   # when running this, the next line printed hops up to the line with the displayed list for some reason.
   # I think it is because the default end is \n, so when you change that, it takes away the new line that would usually follow.

Hello---how are you?
age  disability  gender reassignment  marriage and civil partnership  pregnancy and maternity  race  religion or belief  sex  sexual orientation  next
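
Two standard ways to print every item on one line without a trailing separator, sketched with a hypothetical tuple called `names`. `str.join()` builds a single string from the items. The `*` in `print(*names, ...)` passes each item as a separate argument, which is what allows `sep` to take effect.

```python
names = ("age", "disability", "race")  # hypothetical example data

one_line = ", ".join(names)            # join the items with ", " between them
print(one_line)

print(*names, sep=", ")                # unpack the tuple so sep applies between items
```

`join()` only works when every item is a string, so numbers would need converting with `str()` first.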


In [None]:
# this box has my plans for last exercise, with some notes from Karen's lesson about this worksheet.
# In a code box because I had intended to convert it into code as I went, but now it has notes mixed in.


  # chara_count = copy previous exercise?
  # highest_chara = find highest frequency in chara_count list
  # print("The characteristic with the highest number of court cases is:", chara)

  # copy exercise 2?
  # for i in protected_characteristics:
  #   print(i)
  # also, notes below about printing

  # chara = input("please enter a characteristic to see its statistics: ")
  # while chara != one from list (can i check it against the list?):
  #  chara = input("that is not a characteristic. please check the list and enter a valid characteristic: ")

  # refer to chara_count again
  # chara_freq = result from  chara_count list that matches chara
  # assuming "whole population" means size of the charas list, chara_stat = (chara_freq / len(characteristics)) *100
  # print(chara, chara_freq, chara_stat)


#  notes:

  # print(chara, ", ", end = " ") <- puts all on one line. end sets what is printed after each call, instead of the default new line
  # if you just print(list) then you get the '' around the string too

  #print(char + ", ", end = ".") <- removes space by concatenating strings
  # print(protected_characteristics[0], end='')
  # for i in range(1, len(protected_characteristics)):
  #   print(", "+protected_characteristics[i], end='')
  # could put an if inside the loop, but this would be less efficient: it would only be true the first time, yet it would be checked every time.
  # print(characteristics.count("age"))

In [None]:
  # compare first label to next label
  # keep longer

def find_longest():
  label_length1 = 0
  for i in protected_characteristics:
    label_length2 = len(i)
    if label_length2 > label_length1:   # keep whichever length is longer
      label_length1 = label_length2
    print(label_length2, label_length1)

  print(label_length1)

  # forgot about max(), but sort of knew there must be a built-in way. keeping for learning
  # max() didn't work until I added key = len, copied from an instruction page found by searching. don't know what this does
  # page: https://pythonexamples.org/python-find-longest-string-in-list/
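
On the `key = len` question in the note above: without a `key`, `max()` compares strings alphabetically, so it returns the string that comes last in alphabetical order. With `key=len`, `max()` calls `len()` on each item and compares those results instead, but still returns the item itself. A small sketch with a hypothetical tuple called `words`:

```python
words = ("sex", "race", "disability")      # hypothetical example data

last_alphabetically = max(words)           # compares the strings themselves
longest_word = max(words, key=len)         # compares len(word) for each word
longest_length = len(longest_word)

# Another way to get just the length: take the max of the lengths directly.
longest_length_again = max(len(word) for word in words)

print(last_alphabetically)
print(longest_word)
print(longest_length, longest_length_again)
```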