# 04 Python Data Structures II - Sets and Dictionaries


## Plan for the Lecture:

1. Recap on Lists and Tuples

2. Sets

3. Dictionaries

## 1. Recap on Lists and Tuples

* A Python list uses the subscript operator `[ ]`, as arrays do in other languages, and its elements are indexed.

* A list stores pointers (references) to objects, rather than the values themselves (remember that Python types are classes).

* Lists in Python are resizable, unlike static arrays, which have a fixed size.

* Python lists can store elements of different types, whereas arrays are declared to store values of one type.
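
As a quick illustration of the points above, the sketch below shows a list growing after creation and holding values of different types. The variable name `items` is only illustrative.

```python
# A list can grow after creation and can hold values of different types.
items = [1, 2.25, "Nick"]
items.append(True)        # lists are resizable: append adds to the end
items[0] = 100            # elements are indexed from 0 and can be reassigned

print(items)              # [100, 2.25, 'Nick', True]
print(len(items))         # 4
print([type(x).__name__ for x in items])  # ['int', 'float', 'str', 'bool']
```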


## 1.1 Lists & Arrays 

* The items in an array are called elements.

* We specify how many elements a fixed-size array will have when we declare it, unlike flexibly sized collections (e.g. `ArrayList` in Java).

* Elements are numbered, and each can be referred to by its number inside the `[ ]`, which is called the index. The index is used both when data is input and when it is output.

* An array can only store data that matches the type the array was declared with.



In [None]:
l = [1,2.25,"Nick","N",True]
l

<img src="https://scaler.com/topics/images/character-in-character-array.webp" alt="char_array" width="650"> 

## 1.2 Tuples in Python `( )`

* We’ve seen that a Python list is indexed and can store elements of different types (heterogeneity) 

* Tuples are constant (immutable): once a tuple is created, its elements cannot be reassigned.

* A list is declared with `[ ]` whereas the tuple is declared with `( )`

* We can still refer to elements in a tuple via the `[ ]` 


In [None]:
t = (1,2,3,4,5,6)
t

## 1.3 Tuples vs Lists 

* Tuples are immutable (constant): once a tuple is created, its elements cannot be reassigned.

* A list is mutable – elements can be reassigned. 

* A list is declared with `[ ]` whereas the tuple is declared with `( )`

* We can refer to elements in both a list and tuple via the `[ ]` 
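
The difference in mutability can be seen directly. The following is a minimal sketch with illustrative variable names: assigning to a list element succeeds, while the same assignment on a tuple raises a `TypeError`.

```python
numbers_list = [1, 2, 3]
numbers_tuple = (1, 2, 3)

numbers_list[0] = 99          # fine: lists are mutable
print(numbers_list)           # [99, 2, 3]

try:
    numbers_tuple[0] = 99     # tuples do not support item assignment
except TypeError as error:
    tuple_error = str(error)
    print("TypeError:", tuple_error)

print(numbers_tuple[0])       # reading by index still works: 1
```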


## 2.0 Sets in Python `{ }`

* In mathematics, a set is a collection of distinct elements: there are no duplicates.

* Whilst one may try to assign multiple instances of the same value, a Python set only stores one instance of that value.

* Casting data to a set is a useful way to remove duplicates!

* Sets are declared with `{ }` (an empty set must be written `set()`, because `{}` creates an empty dictionary)

* Sets are mutable (can change)


In [None]:
s = {1,2,3,4,5,6}
s

In [None]:
s.add(7)
s

In [None]:
s.remove(7)
s
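
Two details are worth checking when adding and removing elements. The sketch below assumes nothing beyond the built-in `set` type: `remove()` raises a `KeyError` if the element is missing, whereas `discard()` does nothing, and an empty set has to be created with `set()`.

```python
s = {1, 2, 3}

s.discard(10)            # no error: 10 was never in the set
try:
    s.remove(10)         # remove() raises KeyError for a missing element
except KeyError:
    print("10 is not in the set")

print(type({}))          # <class 'dict'>  -> {} is an empty dictionary
print(type(set()))       # <class 'set'>   -> use set() for an empty set
```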

In [None]:
s = {1,2,3,4,5,6,1,2,3,4,5,6}
s

In [None]:
l = [1,1,2,2,3,3,4,4,5,5,6,6]
s = set(l)
s
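
Sets are unordered, so converting a list to a set removes duplicates but does not promise any particular order. If an ordered result is needed, one option (a sketch, with illustrative variable names) is to sort the set back into a list:

```python
l = [3, 1, 3, 2, 1, 2]

unique = set(l)                  # duplicates removed; sets are unordered
unique_sorted = sorted(unique)   # sorted() returns a list in ascending order

print(unique_sorted)             # [1, 2, 3]
print(len(l), len(unique))       # 6 3
```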

## 2.1 Standard Sets and Notation (from Discrete Mathematics)

* $ \mathbb{N} = \{ 0, 1, 2, 3, ...\} $ the set of all non-negative integers

* $ \mathbb{Z} = \{..., -3, -2, -1, 0, 1, 2, 3, ...\} $ the set of all integers

* $ \mathbb{Q}$, the set of all rational numbers (fractions), i.e. numbers of the form $a/b$ where $a,b$ are integers with $b \neq 0$

* $ \mathbb{R}$, the set of all real numbers, also written as the interval $(-\infty, +\infty)$

In [7]:
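# Note: `...` is Python's Ellipsis object, so this set has seven elements rather than being infinite
N = {0, 1, 2, 3, 4, 5, ...}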
N

{0, 1, 2, 3, 4, 5, Ellipsis}
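
A Python `set` is always finite, so an infinite set such as $\mathbb{N}$ cannot be stored element by element. One way to work with it, sketched below, is to store a finite slice and to describe membership with a function instead. The name `is_natural` is hypothetical and chosen only for this illustration.

```python
def is_natural(x):
    """Return True if x belongs to N = {0, 1, 2, 3, ...} as defined above."""
    return isinstance(x, int) and x >= 0

first_six = set(range(6))     # a finite slice of N: the values 0 to 5

print(is_natural(7))          # True
print(is_natural(-3))         # False
print(is_natural(2.5))        # False
print(3 in first_six)         # True
```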

## 2.2 Specifying Sets via Condition 

* $ \in $ means <u>in</u> 

    * if $ x \in \mathbb{N} $ then $x$ is <u>in</u> $\mathbb{N}$ - set of non-negative integers.

* $\mid$ means <u>such that</u>

    * if $ S = \{ x \in \mathbb{R} \mid x > 0 \} $, then S is going to hold real numbers greater than zero.

* $ S = \{ x \in \mathbb{R} \mid x > 0 \} $

In [18]:
s = set()
l = [-2, -1, 0, 1, 2, 3]

for number in l: 
    if number > 0:
        s.add(number)
        
s
    

{1, 2, 3}
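
Python also has a set comprehension, whose syntax is close to the mathematical notation $\{ x \in L \mid x > 0 \}$. The sketch below builds the same set as the loop above in a single expression:

```python
l = [-2, -1, 0, 1, 2, 3]

# Read as: "the set of x, for x in l, such that x > 0"
s = {x for x in l if x > 0}

print(s == {1, 2, 3})   # True
```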

## 2.3 Subsets and Set Equality

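* $ \subset $ means <u>subset</u>

* $ A \subset B $ means every element of $A$ is also an element of $B$; in Python this is tested with `A.issubset(B)`

* Two sets are equal, $ A = B $, when each is a subset of the other; in Python this is tested with `A == B`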

In [19]:
A = {1,2,3}
B = {1,2,3,4,5,6}

In [20]:
A.issubset(B)

True

In [21]:
A = {1,2,3}
B = {4,5,6}

In [22]:
A.issubset(B)

False
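
Set equality and subsets can also be tested with comparison operators. In the sketch below (the sets are illustrative), `==` tests equality, `<=` is equivalent to `issubset`, and `<` tests for a proper subset, meaning a subset that is not equal to the other set.

```python
A = {1, 2, 3}
B = {3, 2, 1, 1}     # same elements: order and repeats do not matter
C = {1, 2, 3, 4}

print(A == B)        # True:  equal sets
print(A <= B)        # True:  same as A.issubset(B)
print(A < B)         # False: A is not a proper subset of an equal set
print(A < C)         # True:  A is a proper subset of C
```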

## 2.4 Union, Intersection and Difference 

* Intersection $ A \cap B $ = ` A & B ` in Python

* Union $ A \cup B $ =  ` A | B ` in Python

* Difference $ A \setminus B $ =  ` A - B ` in Python

## 2.4.1 Set Intersection

In [None]:
A = {1,2,3,4,5,6}
B = {4,5,6,7,8,9}
A & B


## 2.4.2 Set Union

In [None]:
A = {1,2,3,4,5,6}
B = {4,5,6,7,8,9}
A | B


## 2.4.3 Set Difference

In [None]:
A = {1,2,3,4,5,6}
B = {4,5,6,7,8,9}
A - B

In [None]:
A = {1,2,3,4,5,6}
B = {4,5,6,7,8,9}
B - A
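
Each operator also has a method form on the built-in `set` type. The sketch below checks that the two forms agree. One difference is that the methods accept any iterable (such as a list), whereas the operators require both operands to be sets.

```python
A = {1, 2, 3, 4, 5, 6}
B = {4, 5, 6, 7, 8, 9}

print(A.intersection(B) == A & B)   # True
print(A.union(B) == A | B)          # True
print(A.difference(B) == A - B)     # True

# The method form accepts a list; A & [4, 5, 100] would raise a TypeError
print(A.intersection([4, 5, 100]) == {4, 5})   # True
```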

## 3.0 Dictionaries `{ k : v}`

* An English dictionary lets us look up the definition of a word: we search for the word to locate its definition.

* In Python, we specify a key (word) to be able to get a value (definition). 

* Similar to an associative array, or a Map in Java.

* Like sets, dictionaries use `{ }`, but each entry is a key and value pair separated by a colon: `{ k : v }`


In [None]:
d = {"USA": 200, "UK": 200, "EU": 200}
d


In [None]:
d = {"USA": 200, "UK": 200, "EU": 200}
d["UK"]
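
Looking up a key that does not exist with `d[key]` raises a `KeyError`. The sketch below shows two common ways to handle this: testing membership with `in`, and using `dict.get()`, which returns `None` or a chosen default instead of raising. The key `"Japan"` is just an example of a missing key.

```python
d = {"USA": 200, "UK": 200, "EU": 200}

print("UK" in d)                # True: membership tests check the keys
print(d.get("Japan"))           # None: missing key, no error raised
print(d.get("Japan", 0))        # 0: a default can be supplied

try:
    d["Japan"]                  # square brackets raise KeyError if missing
except KeyError as error:
    print("Missing key:", error)
```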



## 3.1 Add a Key and Value

In [None]:
d = {"USA": 200, "UK": 200, "EU": 200}
d["Asia"] = 300
d


## 3.2 Remove

In [None]:
d = {"USA": 200, "UK": 200, "EU": 200, "Asia": 300}
del d["Asia"]
d
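
As an alternative to `del`, the built-in `dict.pop()` removes a key and returns its value. A default can be passed as a second argument to avoid a `KeyError` when the key is absent. The sketch below uses `"Africa"` purely as an example of a missing key.

```python
d = {"USA": 200, "UK": 200, "EU": 200, "Asia": 300}

removed = d.pop("Asia")          # removes the key and returns its value
missing = d.pop("Africa", None)  # key absent: the default is returned instead

print(removed)   # 300
print(missing)   # None
print(d)         # {'USA': 200, 'UK': 200, 'EU': 200}
```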


In [None]:
type(d)

In [None]:
dir(dict)

In [None]:
d = {"USA": 200, "UK": 200, "EU": 200}
print( d.keys() )
print( d.values() )
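
Alongside `keys()` and `values()`, the `items()` method gives each key together with its value, which is convenient in a loop. A minimal sketch, with illustrative loop variable names:

```python
d = {"USA": 200, "UK": 200, "EU": 200}

# items() yields (key, value) pairs, unpacked here into two variables
for region, amount in d.items():
    print(region, amount)

total = sum(d.values())
print(total)   # 600
```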


## 3.3 JSON

* JavaScript Object Notation (JSON) is a common format for storing data. 

* Like Python dictionaries, it also uses Key : Value pairs.

* When JSON records are loaded into a table, the keys typically become the column names and each record's values fill a row.

## 3.4 JSON in Python

* JavaScript Object Notation (JSON) is a near-universal format that REST APIs use today to communicate data.

* We can import this format into Python. 

* As you'll see, JSON uses key and value pairs similar to those of the Python `dict` type.

* Python dict -> JSON string: `json_str = json.dumps(py_dict)`

* JSON string -> Python dict: `py_dict = json.loads(json_str)`

In [1]:
import json

In [None]:
py_dict = {
  "name": "John",
  "age": 30,
  "city": "New York"
}

json_str = json.dumps(py_dict)

print(json_str)

{"name": "John", "age": 30, "city": "New York"}


In [26]:
json_str = '{ "name":"John", "age":30, "city":"New York"}'

py_dict = json.loads(json_str)
print(type(py_dict))
print(py_dict)

<class 'dict'>
{'name': 'John', 'age': 30, 'city': 'New York'}
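
JSON has fewer types than Python, so a round trip through `json.dumps` and `json.loads` does not always return an identical object. The sketch below uses a made-up `record` to show that a tuple comes back as a list, while `True` and `None` are written as `true` and `null` and restored correctly.

```python
import json

# Illustrative data only: a tuple, a bool and None alongside a string
record = {"name": "John", "scores": (90, 85), "active": True, "nickname": None}

text = json.dumps(record, indent=2)   # indent makes the string easier to read
restored = json.loads(text)

print(text)
print(restored["scores"])    # [90, 85]: the tuple was stored as a JSON array
print(restored == record)    # False: a list is not equal to a tuple
```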


## Very simple API example

* https://randomuser.me/api/ generates a fictitious user's data in the form of JSON. 

* Navigate to the website to see the `{ k : v }` format


In [22]:
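import requests

url = "https://randomuser.me/api/"
response = requests.get(url, timeout=10) # Send a GET request to the API (give up after 10 seconds)
response.raise_for_status() # Raise an error if the request failed

data = response.json() # Convert the response to a Python dictionary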

In [23]:
print(data)

{'results': [{'gender': 'male', 'name': {'title': 'Mr', 'first': 'Arnaud', 'last': 'Ma'}, 'location': {'street': {'number': 7245, 'name': '3rd St'}, 'city': 'Stratford', 'state': 'Nova Scotia', 'country': 'Canada', 'postcode': 'D1E 2D4', 'coordinates': {'latitude': '-65.9229', 'longitude': '-122.9170'}, 'timezone': {'offset': '-8:00', 'description': 'Pacific Time (US & Canada)'}}, 'email': 'arnaud.ma@example.com', 'login': {'uuid': '292b1d8b-b845-4d03-8c8b-4b5aa3beffbf', 'username': 'lazypanda344', 'password': 'rock', 'salt': 'SFsu2apW', 'md5': 'c0f7914ee5cd2c253e14e86c9b0d83da', 'sha1': '2f548a6652734bc428a07c10ad9068309dd96ac3', 'sha256': 'a139ab32b0ef73f49dad3947ffb91b2c03fd9677ba98ca76c9f48f7cff71f376'}, 'dob': {'date': '1985-02-21T19:43:08.148Z', 'age': 40}, 'registered': {'date': '2008-09-01T04:56:57.969Z', 'age': 17}, 'phone': 'Q27 X99-7948', 'cell': 'Q67 L17-9714', 'id': {'name': 'SIN', 'value': '692367139'}, 'picture': {'large': 'https://randomuser.me/api/portraits/men/65.jpg'

In [24]:
print(data["results"][0]["name"])

{'title': 'Mr', 'first': 'Arnaud', 'last': 'Ma'}


In [25]:
type(data)

dict
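
The response is a dictionary containing a list of further dictionaries, so values are reached by chaining keys and indexes. The sketch below avoids a network call by using a small hand-written sample that mirrors part of the output shown above; it is not a full API response, and the variable names are illustrative.

```python
# Hand-written sample mirroring part of the response shown above (not a full response)
data = {
    "results": [
        {
            "gender": "male",
            "name": {"title": "Mr", "first": "Arnaud", "last": "Ma"},
            "location": {"city": "Stratford", "country": "Canada"},
        }
    ]
}

user = data["results"][0]              # first (and only) dictionary in the list
full_name = f"{user['name']['first']} {user['name']['last']}"
city = user["location"]["city"]
phone = user.get("phone", "unknown")   # .get() for keys that may be absent

print(full_name)   # Arnaud Ma
print(city)        # Stratford
print(phone)       # unknown (the sample above has no "phone" key)
```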

Python is a useful language for data science!

## Summary 

* You can distinguish between the key collections by the pairs of brackets used: 

| Structure | Brackets | Characteristics |
| ----------- | ----------- | --------- |
| Lists | `[ , ]` | mutable |
| Tuples | `( , )` | immutable |
| Sets | `{ , }` | unique values (no duplicates) |
| Dict | `{k : v}` | key and value pairs |


#### This Jupyter Notebook contains exercises for you to extend your introduction to OOP by creating lists, tuples, sets and dictionaries of objects. Attempt the following exercises, which slowly build in complexity. If you get stuck, check back to the <a href = "https://www.youtube.com/watch?v=359eGFD7hS4"> Python lecture recording on Data Structures here</a> or view the <a href = "https://www.w3schools.com/python/python_lists.asp">W3Schools page on Python Lists</a>, which includes examples, exercises and quizzes to help your understanding. 

### Exercise 1:

Create a Python dictionary (`dict`) which stores the price for three items of food. For example; milk is £1.30, pasta is £0.75, and strawberries are £1.50. Output the dictionary to check the `values` are stored, and then see if you can access the price for one of the items by using the item name as the `key`.

Extension: Now add a new `key` and `value` pair to the previously defined dictionary.

In [None]:
#Write your solution here


### Exercise 2: 

Write one function which will return the intersection of two `sets` passed in.

Write another function which will return the union of two `sets` passed in.

In [None]:
a = {1,2,3,4,5,6,7,8,9,10}
b = {7,8,9,10,11,12,13,14}

#write your solution here

### Exercise 3:
Write a function that will generate the multiples of a number passed in, and store each multiple in a `set` of values.

For example, if the value 5 is passed in, then generate the 5 times table. The values of the multiplication table should be stored as individual elements (max 12) of a `set`. The `set` should be returned at the end of the function. 


In [None]:
def generate_multiples(number):
    ... # Write your solution here.

In [None]:
generate_multiples(5)

### Exercise 4: 

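Given two `sets` (prices and food names), can you create a dictionary (`dict`) that uses the foodnames as `keys`, and the prices as `values`? Note that sets are unordered, so consider how you can be sure each price is paired with the correct food name.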


In [None]:
foodnames = {"milk", "pasta", "strawberries"}
prices = {1.30, 0.75, 1.50}

print(foodnames)
print(prices)

# Write your solution: 


### (Bonus) Exercise (in the style of an interview question)

You are given a list of integers, and your task is to find the longest sequence of consecutive integers (each value one greater than the previous) that can be formed from the values in the list. The values do not need to be adjacent, or in the same order, in the original list.

Write a Python function to solve this problem. Your function should return the longest consecutive subsequence found in the original list.

For example, given the input list: ``` [4, 2, 8, 5, 6, 7, 11, 12, 10]```

The longest consecutive subsequence is: ``` [4, 5, 6, 7, 8] ```


In [None]:
def longest_consecutive_subsequence(numbers):
    #write your solution here
    ...
    #write your solution above


numbers = [4, 2, 8, 5, 6, 7, 11, 12, 10]
result = longest_consecutive_subsequence(numbers)
print(result)  