# 02 Python Data Structures


## Plan for the Lecture:

1. Concept of a Data Structure

2. Array and Strings

3. Lists 

4. Tuples 

5. Sets

6. Dictionaries

## 1.0 Concept of a Data Structure

* Unlike a variable which stores one value at a time, a data structure is built to store a collection of values. 

* Indexed structures allow random access: an item can be located directly by its index. Arrays and vectors allow this.

* Non-indexed (or referenced) structures, on the other hand, are navigated sequentially. Examples include a stream of data from the keyboard or from a file, or a linked list in which each node has a pointer to the next node in the sequence.
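The difference between indexed and sequential access can be sketched in a few lines. This is only an illustration: the names `letters` and `stream` are made up for the example, and a Python iterator stands in for a sequential structure such as a file stream or linked list.

```python
# Indexed structure: jump straight to any position.
letters = ["a", "b", "c", "d"]
third = letters[2]                 # direct access by index

# Sequential structure: an iterator only hands out the next item.
stream = iter(letters)
first = next(stream)
second = next(stream)
third_via_stream = next(stream)    # had to pass over "a" and "b" first

print(third, third_via_stream)
```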



## 1.1 Applications of Data Structures

* The performance of typical operations (insert, delete, search and sort) varies across the structures.

* Big O notation (complexity): constant, linear, linearithmic, quadratic, polynomial, etc.

* Path finding algorithms (search / navigation).

* Computer vision (representation of media as numbers).
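To make the point about operation cost concrete, here is a rough sketch that counts how many elements a linear search examines. The helper `linear_search_steps` is hypothetical (written for this illustration, not a library function); it shows linear growth in the worst case, whereas indexing takes a single step.

```python
def linear_search_steps(items, target):
    """Return how many elements are examined while searching for target."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            return steps
    return steps  # target absent: every element was examined


data = list(range(1000))
steps_first = linear_search_steps(data, 0)     # best case: found immediately
steps_last = linear_search_steps(data, 999)    # worst case: whole list scanned
value_by_index = data[999]                     # indexing: one step whatever the size

print(steps_first, steps_last, value_by_index)
```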


## 2.0 Array 

* The items in an array are called elements.

* We specify how many elements an array will have when we declare it (if it is ‘fixed-size’), unlike flexibly sized collections (e.g. `ArrayList` in Java).

* Elements are numbered and can be referred to by that number. The number inside the `[ ]` is called the index, and it is used both when storing and when retrieving data.

* An array can only store data that matches the type it was declared with.
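Python's standard library includes an `array` module that illustrates the single-type rule. The sketch below assumes the `"i"` (signed integer) type code. Note that `array.array` can still grow, so it demonstrates the type restriction rather than the fixed size.

```python
from array import array

numbers = array("i", [10, 20, 30])   # "i" = signed int type code
numbers.append(40)                   # same type: accepted

try:
    numbers.append(2.5)              # a float does not match the declared type
    rejected = False
except TypeError:
    rejected = True

print(numbers[1], len(numbers), rejected)
```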



## 2.1 Strings are an Array (or in Python, a `str` sequence)

* A String (str) object is an immutable array of characters. 

* Each character has a numbered position in the array (its index).

* We can use the string's methods to perform operations on it.



In [1]:
name = "Nick"

In [4]:
name[2]

'c'

In [5]:
i = 0
for x in name: 
    print("[" + str(i) + "]" + " : " + str(x))
    i += 1

[0] : N
[1] : i
[2] : c
[3] : k


In [7]:
i = 0
for x in name: 
    print("[", i, "]", " : ", x)
    i += 1

[ 0 ]  :  N
[ 1 ]  :  i
[ 2 ]  :  c
[ 3 ]  :  k


In [8]:
name[0]

'N'

In [9]:
name[3]

'k'
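As a small sketch of what "immutable" means in practice: assigning to a position in a string raises a `TypeError`, so a changed string has to be built as a new object. The example also uses the built-in `enumerate`, which could replace the manual counter `i` in the loops above. The variable names are illustrative only.

```python
name = "Nick"

try:
    name[0] = "M"                 # strings do not support item assignment
    changed_in_place = True
except TypeError:
    changed_in_place = False

new_name = "M" + name[1:]         # build a new string instead

# enumerate yields (index, character) pairs
pairs = list(enumerate(name))
for i, ch in pairs:
    print(f"[{i}] : {ch}")
```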

## 2.2 Listing methods with `dir`

In [10]:
dir(str)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


In [11]:
name.find('c')

2

In [12]:
name.find('C')

-1

In [13]:
name.lower()

'nick'

In [14]:
name.upper()

'NICK'

## 3. Lists `[ ]`

* A list in Python uses the subscript operator `[ ]` typically associated with an array, and its elements are also indexed.

* The list maintains pointers (references) to objects, rather than holding the values themselves (remember that Python types are classes).

* Lists in Python are resizable, unlike static arrays, which are fixed in size.

* Python lists can store elements of different types, whereas arrays are declared to store values of one type.


In [15]:
l = [1,2.25,"Nick","N",True]
l

[1, 2.25, 'Nick', 'N', True]

In [16]:
l = [1,2,3,4,5,6]
l

[1, 2, 3, 4, 5, 6]

In [17]:
l[0]

1

In [18]:
l[-1]

6

In [19]:
l[-2]

5

In [20]:
l

[1, 2, 3, 4, 5, 6]

In [21]:
l[2:5]

[3, 4, 5]

In [22]:
type(l)

list

In [23]:
dir(list)

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [24]:
l.append(7)
l

[1, 2, 3, 4, 5, 6, 7]

In [25]:
l.remove(7)
l

[1, 2, 3, 4, 5, 6]
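Because a list variable is a reference to a list object, assigning it to a second name does not copy it. The following sketch (with made-up names `original`, `alias` and `clone`) shows the effect, along with `insert` and `pop` from the `dir(list)` output above.

```python
original = [1, 2, 3]
alias = original            # second name for the same list object
clone = original.copy()     # new list containing the same elements

alias.append(4)             # visible through both names

original.insert(0, 0)       # insert the value 0 at index 0
last = original.pop()       # remove and return the final element

print(original, clone, last)
```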

## 4. Tuples in Python `( )`

* We’ve seen that a Python list is indexed and can store elements of different types (heterogeneity) 

* Tuples are constant (immutable): once created, their elements cannot be reassigned.

* A list is declared with `[ ]` whereas the tuple is declared with `( )`

* We can still refer to elements in a tuple via the `[ ]` 


In [26]:
t = (1,2,3,4,5,6)
t

(1, 2, 3, 4, 5, 6)

In [28]:
t[2]

3

In [30]:
t[0] = 5

TypeError: 'tuple' object does not support item assignment

In [31]:
type(t)

tuple

In [32]:
dir(tuple)

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'count',
 'index']

In [35]:
t1 = (1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3)

In [36]:
t1.count(2)

9

In [38]:
t

(1, 2, 3, 4, 5, 6)

In [37]:
t.index(5)

4

## 4.1 Tuples vs Lists 

* Tuples are immutable (constant): once created, their elements cannot be reassigned.

* A list is mutable – elements can be reassigned. 

* A list is declared with `[ ]` whereas the tuple is declared with `( )`

* We can refer to elements in both a list and tuple via the `[ ]` 
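One practical consequence of immutability, sketched below with illustrative names: a tuple of hashable values can be used as a dictionary key, whereas a list cannot. The example also shows unpacking, and that "changing" a tuple really means building a new one.

```python
point = (3, 4)
x, y = point                        # unpacking into two variables

# Tuples of hashable values can be dictionary keys.
labels = {(0, 0): "origin", point: "P"}

try:
    bad = {[0, 0]: "origin"}        # a list is mutable, hence unhashable
    list_key_ok = True
except TypeError:
    list_key_ok = False

# Concatenation creates a new tuple; (5,) is a one-element tuple.
moved = point + (5,)

print(x, y, labels[point], list_key_ok, moved)
```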


## 5.0 Sets in Python `{ }`

* A set in mathematics is a collection of distinct elements: there are no duplicates.

* Whilst one may try and assign multiple instances of the same value, the Python set only stores one instance of this value.

* Casting data to a set is a useful way to remove duplicates!

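* Sets are declared with `{ }` around the values. An empty set must be written `set()`, because `{}` on its own creates an empty dictionary.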

* Sets are mutable (can change)


In [39]:
s = {1,2,3,4,5,6}
s

{1, 2, 3, 4, 5, 6}

In [40]:
s.add(7)
s

{1, 2, 3, 4, 5, 6, 7}

In [41]:
s.remove(7)
s

{1, 2, 3, 4, 5, 6}

In [44]:
s = {1,2,3,4,5,6,1,2,3,4,5,6}
s

{1, 2, 3, 4, 5, 6}

In [42]:
l = [1,1,2,2,3,3,4,4,5,5,6,6]
s = set(l)
s

{1, 2, 3, 4, 5, 6}
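Converting to a set removes duplicates, but a set does not promise to keep the original order. If order matters, one common alternative (shown here as a sketch, not the only approach) is `dict.fromkeys`, which keeps the first occurrence of each value in order.

```python
values = [3, 1, 3, 2, 1]

unique_any_order = set(values)                  # duplicates dropped; order not guaranteed
unique_in_order = list(dict.fromkeys(values))   # keeps first-seen order

empty_set = set()      # the way to create an empty set
empty_braces = {}      # this is an empty dict, not a set

print(unique_in_order, type(empty_set), type(empty_braces))
```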

## 5.1 Set Theory 

* Intersection ` A & B `

* Union ` A | B `

* Difference ` A - B `

## 5.1.1 Set Intersection

In [43]:
s1 = {1,2,3,4,5,6}
s2 = {4,5,6,7,8,9}
s1 & s2


{4, 5, 6}

## 5.1.2 Set Union

In [47]:
s1 = {1,2,3,4,5,6}
s2 = {4,5,6,7,8,9}
s1 | s2


{1, 2, 3, 4, 5, 6, 7, 8, 9}

## 5.1.3 Set Difference

In [48]:
s1 = {1,2,3,4,5,6}
s2 = {4,5,6,7,8,9}
s1 - s2

{1, 2, 3}

In [49]:
s1 = {1,2,3,4,5,6}
s2 = {4,5,6,7,8,9}
s2 - s1

{7, 8, 9}
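The operators above also have method equivalents on the built-in `set` type. The sketch below reuses `s1` and `s2`. It additionally shows the symmetric difference operator `^` (elements in exactly one of the two sets), which is not covered above, and that the methods, unlike the operators, accept any iterable.

```python
s1 = {1, 2, 3, 4, 5, 6}
s2 = {4, 5, 6, 7, 8, 9}

common = s1.intersection(s2)         # same result as s1 & s2
combined = s1.union(s2)              # same result as s1 | s2
only_s1 = s1.difference(s2)          # same result as s1 - s2
either_not_both = s1 ^ s2            # symmetric difference

# Methods accept any iterable, here a list.
from_list = s1.intersection([5, 6, 100])

print(common, only_s1, either_not_both, from_list)
```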

## 6.0 Dictionaries `{ k : v}`

* An English Dictionary would allow us to look up the definition of a word. We search the word to locate the definition. 

* In Python, we specify a key (word) to be able to get a value (definition). 

* Similar to an associative array, or a Map in Java.

* Like sets, dictionaries use `{ }`, but each entry is a key and value pair separated by `:`, as in `{ k : v }`.


In [44]:
d = {"USA": 200, "UK": 200, "EU": 200}
d


{'USA': 200, 'UK': 200, 'EU': 200}

In [51]:
d = {"USA": 200, "UK": 200, "EU": 200}
d["UK"]


200

In [52]:
d = {"USA": 200, "UK": 200, "EU": 200}
d["uk"]


KeyError: 'uk'
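Keys are case-sensitive, which is why `"uk"` fails. When a key might be missing, a `KeyError` can be avoided with the `in` operator or the `get` method, as in this short sketch (the fallback value `0` is an arbitrary choice for the example).

```python
d = {"USA": 200, "UK": 200, "EU": 200}

has_uk_lower = "uk" in d             # membership test on the keys
missing = d.get("uk")                # returns None instead of raising
fallback = d.get("uk", 0)            # returns the supplied default
normalised = d.get("uk".upper())     # normalise the case before the lookup

print(has_uk_lower, missing, fallback, normalised)
```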

## 6.4 Append

In [45]:
d = {"USA": 200, "UK": 200, "EU": 200}
d["Asia"] = 300
d


{'USA': 200, 'UK': 200, 'EU': 200, 'Asia': 300}

## 6.5 Remove

In [46]:
d = {"USA": 200, "UK": 200, "EU": 200, "Asia": 30}
del d["Asia"]
d


{'USA': 200, 'UK': 200, 'EU': 200}

In [55]:
type(d)

dict

In [47]:
dir(dict)

['__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__ror__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

In [57]:
d = {"USA": 200, "UK": 200, "EU": 200}
print( d.keys() )
print( d.values() )


dict_keys(['USA', 'UK', 'EU'])
dict_values([200, 200, 200])
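Alongside `keys()` and `values()`, the `items()` method yields key and value pairs together, which is often the most convenient way to loop over a dictionary. The sketch below also shows `pop`, an alternative to `del` that returns the removed value and can take a default for missing keys. The key `"Africa"` is made up to demonstrate the missing case.

```python
d = {"USA": 200, "UK": 200, "EU": 200, "Asia": 300}

for key, value in d.items():         # iterate over (key, value) pairs
    print(key, "->", value)

total = sum(d.values())              # values() works with built-ins such as sum

removed = d.pop("Asia")              # removes the key and returns its value
absent = d.pop("Africa", None)       # default avoids a KeyError

remaining_keys = list(d)             # iterating a dict yields its keys
print(total, removed, absent, remaining_keys)
```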


## Summary 

* You can distinguish between the key collections by the pairs of brackets used: 

| Structure | Brackets | Characteristics |
| --------- | -------- | --------------- |
| Lists | `[ , ]` | mutable |
| Tuples | `( , )` | immutable |
| Sets | `{ , }` | unique values (no duplicates) |
| Dict | `{k : v}` | key and value pairs |


## Exercise (bringing OOP and Types together)

Either use your Student class you created previously `student.py`, or create one afresh here.

Create a dictionary of modules where module codes are keys and student objects are values. For additional practice, create a module class in `module.py` and create objects named by BNU's module code convention (e.g. `COM4008` and `COM5013`). Then store these module objects as keys, associated with student objects as values.


In [None]:
# Using the student.py and/or module.py file(s), write your solution here or in a dedicated py file.


## Exercise 

Consider which data structure ('list', 'tuple', 'set', 'dict') would be the most appropriate to store marks for each student for a particular module as per above. 

Should these marks be added to the module dictionary above, or should you create something separate? Or would it be better to store these module marks in each student object? If so, which data structure would you use?

Have a think...

Extension: would a particular design make it easier to produce statistics for each module? For example: how many students took the module, the minimum mark, the maximum mark, the average mark, etc.

In [None]:
# Using the student.py and/or module.py file(s), write your solution here or in a dedicated py file.


## Exercise 

Write a function to find the mode (most frequent) number of a given Python list. 

Generate a list of 50 random numbers between 0-9 and test your function.


In [None]:
# Write your solution here.

## Exercise
Write one function which will return the intersection of two sets passed in. Write another function which will return the union of two sets passed in.

In [None]:
# Write your solution here.

## Exercise 

Write a function that will rotate any given list by a specified given number of positions $n$. 

For example: if you had the list `[6, 7, 8, 9, 10]` and this was required to be rotated by 2 positions, the output would be `[8, 9, 10, 6, 7]`. 

In [None]:
# Write your solution here.

## Exercise 

Write a function that flattens a nested list of arbitrary depth.

If the input was a nested list, e.g. `[1, [2, 3], [[4, 5]], 6]`, the output would be a flattened list, e.g. `[1, 2, 3, 4, 5, 6]`.

In [None]:
# Write your solution here.

## Exercise 

Write a function that checks whether one set is a subset of another.

In [None]:
# Write your solution here.

## Exercise

Given a list of tuple pairs, swap the elements in each tuple.

`swap_tuples([(1, 2), (3, 4), (5, 6)])`  # Output: `[(2, 1), (4, 3), (6, 5)]`

In [None]:
# Write your solution here.

## Exercise 

Write a function that takes a list of words and groups them by their first letter into a dictionary.

For example: `["apple", "banana", "cherry", "avocado", "blueberry"]` would be organised as a dictionary: `{'a': ["apple", "avocado"], 'b': ["banana", "blueberry"], 'c': ["cherry"]}`.

In [None]:
# Write your solution here.

## Exercise 

Given a list of strings, group all the anagrams together.
Use a dictionary to store sorted words as keys and group the original words.

In [None]:
# Write your solution here.

## Exercise 

Write a function that merges two dictionaries. If a key appears in both, sum their values.

Example: Two dictionaries, e.g., `{'a': 1, 'b': 2}` and `{'b': 3, 'c': 4}` would become a merged dictionary: `{'a': 1, 'b': 5, 'c': 4}`.

In [None]:
# Write your solution here.

## Exercise 

Write a function that inverts a dictionary: the keys become values and the values become keys. If several keys share the same value, they would collide as duplicate keys in the inverted dictionary, so group those original keys in a list under that value.

Extension: is there a way to do this without any temporary storage? 

In [None]:
# Write your solution here.

## Exercise
Given two strings, write a function to decide if one is a permutation of the other. 

In [None]:
# Write your solution here.

## Exercise
Given a string $s$ and a non-empty string $p$, find all the start indices of $p$’s anagrams in $s$.

`find_anagrams("cbaebabacd", "abc")`  # Output: `[0, 6]`

`find_anagrams("abab", "ab")`  # Output: `[0, 1, 2]`

In [None]:
# Write your solution here.

## Exercise (LCS)

You are given a list of integers, and your task is to find the longest run of consecutive integers (values that each increase by 1) that can be formed from the values in the list. The values do not need to be adjacent, or in ascending order, in the original list: in the example below, 8 appears before 5.

Write a Python function to solve this problem. Your function should return the longest consecutive run found in the original list, in ascending order.

For example, given the input list: `[4, 2, 8, 5, 6, 7, 11, 12, 10]`

The longest consecutive subsequence is: `[4, 5, 6, 7, 8]`


In [185]:
def longest_consecutive_subsequence(numbers):
    #write your solution here
    ...
    #write your solution above


numbers = [4, 2, 8, 5, 6, 7, 11, 12, 10]
result = longest_consecutive_subsequence(numbers)
print(result)  

[4, 5, 6, 7, 8]
