# 04 Python Data Structures II - Sets and Dictionaries


## Plan for the Lecture:

1. Recap on Lists and Tuples

2. Sets

3. Dictionaries

4. JSON

## 1. Recap on Lists and Tuples

* A Python list uses the subscript operator `[ ]` typically associated with arrays, and its elements are indexed (starting from 0).

* A list maintains pointers (references) to objects rather than storing raw values such as integers directly (remember that Python types are classes).

* Python lists are resizable, unlike static arrays, which have a fixed size.

* Python lists can store elements of different types, whereas arrays are declared to store values of one type.
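To see the "references, not values" point in action, here is a minimal sketch: assigning a list to a second name does not copy it, so both names see the same changes. The names `a`, `b`, `c` and `mixed` are purely illustrative.

```python
# Two names bound to the same list object
a = [1, 2, 3]
b = a              # no copy is made: b refers to the same list as a
b.append(4)
print(a)           # [1, 2, 3, 4]
print(a is b)      # True

# .copy() creates a new (shallow) list object
c = a.copy()
c.append(5)
print(a)           # [1, 2, 3, 4]  (unchanged)
print(c)           # [1, 2, 3, 4, 5]

# Lists are heterogeneous and resizable
mixed = [1, 2.5, "text", True]
mixed.append(None)
print(len(mixed))  # 5
```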


In [37]:
def get_first(l):
    return l[0]

In [39]:
def get_last(l):
    return l[-1]

In [40]:
l = [1,2,3,4]
print(get_first(l))
print(get_last(l))

1
4


In [41]:
l.append(5)
l

[1, 2, 3, 4, 5]

In [42]:
l.remove(1)
l

[2, 3, 4, 5]

In [48]:
t = (1,1,1,1,2,3,4,5)
t

(1, 1, 1, 1, 2, 3, 4, 5)

In [45]:
t[0] = 0

TypeError: 'tuple' object does not support item assignment

In [49]:
t.count(1)

4

## 1.1 Lists & Arrays 

* The items in an array are called elements.

* We specify how many elements an array will have when we declare its size (if it is fixed-size), unlike flexible-sized collections (e.g. `ArrayList` in Java).

* Elements are numbered, and the number placed inside `[ ]` to refer to an element is called the index. The index is used to read (output) and write (input) individual elements.

* An array can only store data that matches the type it is declared with.
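Python's built-in list does not enforce a single element type, but the standard-library `array` module comes closer to the array described above. A minimal sketch, assuming the `'i'` (signed int) type code: the array rejects a float, although, unlike a fixed-size array, it can still grow with `append()`.

```python
from array import array

# 'i' is the type code for signed int: every element must be an int
nums = array('i', [1, 2, 3])
nums.append(4)            # accepted: matches the declared type
print(nums)               # array('i', [1, 2, 3, 4])

try:
    nums.append(2.5)      # a float does not match the declared type
except TypeError as err:
    print("TypeError:", err)

# A plain list, by contrast, accepts mixed types
items = [1, 2.5, "Nick"]
print(items)
```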



In [None]:
l = [1,2.25,"Nick","N",True]
l

<img src="https://scaler.com/topics/images/character-in-character-array.webp" alt="char_array" width="650"> 

## 1.2 Tuples in Python `( )`

* We’ve seen that a Python list is indexed and can store elements of different types (heterogeneity) 

* Tuples are constant (immutable) – once they are declared, their elements cannot be reassigned. 

* A list is declared with `[ ]` whereas the tuple is declared with `( )`

* We can still refer to elements in a tuple via the `[ ]` 


In [None]:
t = (1,2,3,4,5,6)
t

## 1.3 Tuples vs Lists 

* Tuples are immutable (constant) – once they are declared, their elements cannot be reassigned. 

* A list is mutable – elements can be reassigned. 

* A list is declared with `[ ]` whereas the tuple is declared with `( )`

* We can refer to elements in both a list and tuple via the `[ ]` 
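A short sketch contrasting the two: item assignment works on a list but fails on a tuple, and "changing" a tuple really means building a new tuple and rebinding the name. The one-element tuple at the end is a common stumbling block: it needs a trailing comma.

```python
t = (1, 2, 3)
l = [1, 2, 3]

l[0] = 10                 # lists allow item assignment
print(l)                  # [10, 2, 3]

try:
    t[0] = 10             # tuples do not
except TypeError as err:
    print("TypeError:", err)

# Concatenation builds a brand-new tuple; the name t is then rebound to it
t = t + (4,)
print(t)                  # (1, 2, 3, 4)

# (5,) is a one-element tuple; (5) is just the int 5 in parentheses
print(type((5,)), type((5)))   # <class 'tuple'> <class 'int'>
```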


## 2.0 Sets in Python `{ }`

* In mathematics, a set is a collection of distinct elements – there are no duplicates.

* If you try to store multiple instances of the same value, a Python set keeps only one instance of it.

* Converting (casting) data to a set is a useful way to remove duplicates, although sets are unordered, so the original order is not preserved!

* Sets are declared with `{ }` (but an empty `{}` creates a dictionary, so use `set()` for an empty set)

* Sets are mutable (can change)
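A quick check of the points above. The variable names are illustrative; `dict.fromkeys()` is shown as one common way to drop duplicates while keeping the original order, which a set does not guarantee.

```python
empty_braces = {}
empty_set = set()
print(type(empty_braces))      # <class 'dict'>
print(type(empty_set))         # <class 'set'>

l = [3, 1, 3, 2, 1]
unique = set(l)                # duplicates removed, order not guaranteed
print(sorted(unique))          # [1, 2, 3]

# dict keys are unique and keep insertion order, so this keeps first occurrences
print(list(dict.fromkeys(l)))  # [3, 1, 2]

# Membership testing is a typical use of sets
print(2 in unique)             # True
```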


In [52]:
s = {1,2,3,4,5,6}
s

{1, 2, 3, 4, 5, 6}

In [54]:
dir(set)

['__and__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__iand__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__isub__',
 '__iter__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__rand__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__ror__',
 '__rsub__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__xor__',
 'add',
 'clear',
 'copy',
 'difference',
 'difference_update',
 'discard',
 'intersection',
 'intersection_update',
 'isdisjoint',
 'issubset',
 'issuperset',
 'pop',
 'remove',
 'symmetric_difference',
 'symmetric_difference_update',
 'union',
 'update']

In [60]:
s.add(7)
s

{1, 2, 3, 4, 5, 6, 7}

In [62]:
s.remove(7)
s

{1, 2, 3, 4, 5, 6}

In [63]:
s = {1,2,3,4,5,6,1,2,3,4,5,6}
s

{1, 2, 3, 4, 5, 6}

In [64]:
l = [1,1,2,2,3,3,4,4,5,5,6,6]
s = set(l)
s

{1, 2, 3, 4, 5, 6}

## 2.1 Standard Sets and Notation (from Discrete Mathematics)

* $ \mathbb{N} = \{ 0, 1, 2, 3, ...\} $ the set of all non-negative integers

* $ \mathbb{Z} = \{..., -3, -2, -1, 0, 1, 2, 3, ...\} $ the set of all integers

* $ \mathbb{Q}$, the set of all rational numbers (fractions), i.e. numbers of the form $a/b$ where $a,b$ are integers with $b \neq 0$

* $ \mathbb{R}$, the set of all real numbers, also denoted as $(-\infty, +\infty)$

In [65]:
N = {0, 1, 2, 3, 4, 5, ...}  # careful: ... is Python's Ellipsis object, not "and so on"
N

{0, 1, 2, 3, 4, 5, Ellipsis}

## 2.2 Specifying Sets via Condition 

* $ \in $ means <u>in</u> 

    * if $ x \in \mathbb{N} $ then $x$ is <u>in</u> $\mathbb{N}$ - set of non-negative integers.

* $\mid$ means <u>such that</u>

    * if $ S = \{ x \in \mathbb{R} \mid x > 0 \} $, then S is going to hold real numbers greater than zero.

* For example, to build $ S = \{ x \in \mathbb{R} \mid x > 0 \} $ from a finite list of numbers, we keep only the values greater than zero:

In [66]:
s = set()
l = [-2, -1, 0, 1, 2, 3]

for x in l:
    if x > 0:  # the condition after "such that"
        s.add(x)

s

{1, 2, 3}
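The loop above can be written more compactly as a set comprehension, which mirrors the set-builder notation quite closely (the same syntax appears again in Exercise 8):

```python
l = [-2, -1, 0, 1, 2, 3]

# {expression for x in iterable if condition}  ~  { x in l | x > 0 }
s = {x for x in l if x > 0}
print(s)   # {1, 2, 3}
```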

In [68]:
l

[-2, -1, 0, 1, 2, 3]

In [69]:
l[0:4]

[-2, -1, 0, 1]

## 2.3 Subsets and Set Equality

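* $ \subset $ means subset

* $ A \subset B $ means every element of $A$ is also an element of $B$; in Python this is `A.issubset(B)` (strictly, this is $ A \subseteq B $, as some texts reserve $ \subset $ for a proper subset)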

In [70]:
A = {1,2,3}
B = {1,2,3,4,5,6}

In [71]:
A.issubset(B)

True

In [72]:
B.issubset(A)

False

In [73]:
A = {1,2,3}
B = {4,5,6}

In [74]:
A.issubset(B)

False

In [75]:
B.issubset(A)

False
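Besides `issubset()`, sets support comparison operators. A short sketch (the set `C` is illustrative) covering subset, proper subset, the set equality named in this section's title, and disjointness:

```python
A = {1, 2, 3}
B = {1, 2, 3, 4, 5, 6}
C = {3, 2, 1}

print(A <= B)                  # True: A is a subset of B, same as A.issubset(B)
print(A < B)                   # True: proper subset (subset and A != B)
print(A == C)                  # True: same elements; order is irrelevant
print(A < C)                   # False: an equal set is not a proper subset
print(B >= A)                  # True: superset, same as B.issuperset(A)
print(A.isdisjoint({4, 5, 6})) # True: no elements in common
```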

## 2.4 Union, Intersection and Difference 

* Intersect $ A \cap B $ = ` A & B ` in Python

* Union $ A \cup B $ =  ` A | B ` in Python

* Difference $ A \setminus B $ =  ` A - B ` in Python

## 2.4.1 Set Intersection

In [81]:
mark = 145

In [82]:
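# A valid mark lies in the intersection of {m | m >= 0} and {m | m <= 100};
# anything outside that intersection is rejected
if mark < 0 or mark > 100:
    print("Invalid mark!")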

Invalid mark!


In [78]:
A = {1,2,3,4,5,6}
B = {4,5,6,7,8,9}
A & B

{4, 5, 6}

## 2.4.2 Set Union

In [83]:
A = {1,2,3,4,5,6}
B = {4,5,6,7,8,9}
A | B


{1, 2, 3, 4, 5, 6, 7, 8, 9}

## 2.4.3 Set Difference

In [84]:
A = {1,2,3,4,5,6}
B = {4,5,6,7,8,9}
A - B

{1, 2, 3}

In [85]:
A = {1,2,3,4,5,6}
B = {4,5,6,7,8,9}
B - A

{7, 8, 9}
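Each operator above also has a method form. One difference worth knowing: the methods accept any iterable (such as a list), whereas the operators need a set on both sides. The `^` operator (symmetric difference, the `symmetric_difference` method in the `dir(set)` listing) is included for completeness.

```python
A = {1, 2, 3, 4, 5, 6}
B = {4, 5, 6, 7, 8, 9}

print(A.intersection(B))   # {4, 5, 6}, same as A & B
print(A.union(B))          # {1, 2, 3, 4, 5, 6, 7, 8, 9}, same as A | B
print(A.difference(B))     # {1, 2, 3}, same as A - B
print(A ^ B)               # {1, 2, 3, 7, 8, 9}: in A or B, but not in both

# Methods accept other iterables; A | [9, 10] would raise a TypeError
print(sorted(A.union([9, 10])))   # [1, 2, 3, 4, 5, 6, 9, 10]
```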

## 3.0 Dictionaries `{ k : v}`

* An English dictionary lets us look up the definition of a word: we search for the word to locate its definition. 

* In Python, we specify a key (word) to be able to get a value (definition). 

* Similar to an associative array, or a Map in Java.

* Like sets, dictionaries also use `{ }`, but each entry is a key and value pair separated by a colon: `{ k : v }`


In [None]:
s = {1,2,3,4,5}  # a set, for comparison: values only, no keys

In [86]:
d = {"USA": 200, "UK": 200, "EU": 200}
d


{'USA': 200, 'UK': 200, 'EU': 200}

In [89]:
d = {"USA": 200, "UK": 200, "EU": 200}
d["UK"]


200

In [91]:
d = {"USA": 200, "UK": 200, "EU": 200}
d["uk"]


KeyError: 'uk'
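Keys are case-sensitive, which is why `"uk"` fails above. A minimal sketch of two ways to avoid the `KeyError`: test membership with `in`, or use `.get()`, which returns `None` (or a default you supply) when the key is missing.

```python
d = {"USA": 200, "UK": 200, "EU": 200}

print("uk" in d)        # False: keys are case-sensitive
print("UK" in d)        # True
print(d.get("uk"))      # None, instead of raising KeyError
print(d.get("uk", 0))   # 0: an explicit default
print(d.get("UK", 0))   # 200
```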

## 3.1 Append (Adding a Key-Value Pair)

In [92]:
d = {"USA": 200, "UK": 200, "EU": 200}
d["Asia"] = 300
d


{'USA': 200, 'UK': 200, 'EU': 200, 'Asia': 300}

## 3.2 Remove

In [93]:
d = {"USA": 200, "UK": 200, "EU": 200, "Asia": 300}
del d["Asia"]
d


{'USA': 200, 'UK': 200, 'EU': 200}
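`del` raises a `KeyError` if the key is missing. The `pop()` method (also listed by `dir(dict)`) removes a key and returns its value, and accepts a default for when the key is absent. A minimal sketch:

```python
d = {"USA": 200, "UK": 200, "EU": 200, "Asia": 300}

removed = d.pop("Asia")       # removes the key and returns its value
print(removed)                # 300
print(d.pop("Asia", None))    # None: key already gone, so the default is returned
print(d)                      # {'USA': 200, 'UK': 200, 'EU': 200}
```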

In [94]:
type(d)

dict

In [95]:
dir(dict)

['__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__ror__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

In [96]:
d = {"USA": 200, "UK": 200, "EU": 200}
print( d.keys() )
print( d.values() )


dict_keys(['USA', 'UK', 'EU'])
dict_values([200, 200, 200])
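Alongside `keys()` and `values()`, the `items()` method yields `(key, value)` tuples, which is the usual way to loop over a dictionary. A short sketch, including a dictionary comprehension built from `items()` (the name `doubled` is illustrative):

```python
d = {"USA": 200, "UK": 200, "EU": 200}

# Unpack each (key, value) tuple directly in the for statement
for region, value in d.items():
    print(region, value)

print(list(d))           # ['USA', 'UK', 'EU']: iterating a dict yields its keys
print(sum(d.values()))   # 600

# Dictionary comprehension: {key_expr: value_expr for ... in ...}
doubled = {region: value * 2 for region, value in d.items()}
print(doubled)           # {'USA': 400, 'UK': 400, 'EU': 400}
```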


## 3.3 The `zip()` function

* The `zip()` function combines two or more iterables (like lists or tuples) element-by-element

* It returns an iterator of tuples, where each tuple contains elements from the same position in each iterable

* It is useful for pairing related data (e.g., names with scores, countries with values)

<img src="https://miro.medium.com/v2/1*tbujCKWQX-CrRIsn6MjY7A.gif" alt="zip" width="650"> 

In [97]:
countries = ["Japan", "Italy", "Brazil", "India"]
life_expectancy = [83.7, 82.7, 75.9, 68.3]

In [3]:
paired = list(zip(countries, life_expectancy))
paired

[('Japan', 83.7), ('Italy', 82.7), ('Brazil', 75.9), ('India', 68.3)]
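One behaviour to watch: `zip()` stops at the shortest iterable, silently dropping extra items. In this sketch the `life_expectancy` list is deliberately shortened to show this. (From Python 3.10, `zip(..., strict=True)` raises a `ValueError` on unequal lengths instead.) The same function can also "unzip" a list of pairs.

```python
countries = ["Japan", "Italy", "Brazil", "India"]
life_expectancy = [83.7, 82.7, 75.9]   # one value deliberately missing

# "India" has no partner, so it is silently dropped
print(list(zip(countries, life_expectancy)))
# [('Japan', 83.7), ('Italy', 82.7), ('Brazil', 75.9)]

# zip(*pairs) reverses the process
names, values = zip(*[("Japan", 83.7), ("Italy", 82.7)])
print(names)    # ('Japan', 'Italy')
print(values)   # (83.7, 82.7)
```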

In [98]:
paired_dict = dict(zip(countries, life_expectancy))
paired_dict

{'Japan': 83.7, 'Italy': 82.7, 'Brazil': 75.9, 'India': 68.3}

In [99]:
paired_keys = paired_dict.keys()
paired_keys

dict_keys(['Japan', 'Italy', 'Brazil', 'India'])

In [100]:
paired_values = paired_dict.values()
paired_values

dict_values([83.7, 82.7, 75.9, 68.3])

In [105]:
paired_dict = dict(zip(paired_keys, paired_values))
paired_dict

{'Japan': 83.7, 'Italy': 82.7, 'Brazil': 75.9, 'India': 68.3}

In [103]:
paired_dict = dict(zip(paired_values, paired_keys))
paired_dict

{83.7: 'Japan', 82.7: 'Italy', 75.9: 'Brazil', 68.3: 'India'}
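Swapping keys and values only works cleanly when the values are unique. Dictionary keys must be unique, so if several keys share a value, later entries overwrite earlier ones. A minimal sketch using the `d` dictionary from the start of Section 3:

```python
d = {"USA": 200, "UK": 200, "EU": 200}

inverted = dict(zip(d.values(), d.keys()))
print(inverted)    # {200: 'EU'}: 'USA' and 'UK' were overwritten

# The same inversion as a dictionary comprehension
inverted2 = {v: k for k, v in d.items()}
print(inverted2)   # {200: 'EU'}
```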

## 3.4 Sorting a Dictionary 

* Dictionaries in `Python 3.7+` maintain insertion order.

* You can use the `sorted()` function to sort the keys alphabetically.

* However, to keep each key paired with its value, you need to sort the dictionary's `items()`.

In [106]:
paired_dict

{'Japan': 83.7, 'Italy': 82.7, 'Brazil': 75.9, 'India': 68.3}

In [107]:
sorted_pairs = sorted(paired_dict)
sorted_pairs

['Brazil', 'India', 'Italy', 'Japan']

In [25]:
sorted_pairs = sorted(paired_dict.items())
sorted_pairs

[('Brazil', 75.9), ('India', 68.3), ('Italy', 82.7), ('Japan', 83.7)]

In [108]:
sorted_pairs = dict(sorted(paired_dict.items()))
sorted_pairs

{'Brazil': 75.9, 'India': 68.3, 'Italy': 82.7, 'Japan': 83.7}
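To sort by value rather than by key, pass a `key` function to `sorted()` telling it which part of each `(key, value)` pair to compare. A sketch using a `lambda`; `reverse=True` gives descending order.

```python
paired_dict = {'Japan': 83.7, 'Italy': 82.7, 'Brazil': 75.9, 'India': 68.3}

# item[1] is the value in each (key, value) pair
by_value = dict(sorted(paired_dict.items(), key=lambda item: item[1]))
print(by_value)
# {'India': 68.3, 'Brazil': 75.9, 'Italy': 82.7, 'Japan': 83.7}

by_value_desc = dict(sorted(paired_dict.items(), key=lambda item: item[1], reverse=True))
print(list(by_value_desc))   # ['Japan', 'Italy', 'Brazil', 'India']
```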

## 4. JSON

* JavaScript Object Notation (JSON) is a near-universal format that REST APIs use today to exchange data. 

* Like Python dictionaries, JSON also uses `Key : Value` pairs.

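* When a list of JSON records is loaded as a table (e.g. a spreadsheet or DataFrame), the keys typically become the column names and the values fill the rows. 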

* To go from a Python `dict` to a JSON string, use `json.dumps()`

* To go from a JSON string to a Python `dict`, use `json.loads()`

In [109]:
import json

In [110]:
py_dict = {
  "name": "John",
  "age": 30,
  "city": "New York"
}

json_str = json.dumps(py_dict)

print(json_str)

{"name": "John", "age": 30, "city": "New York"}


In [111]:
json_str = '{ "name":"John", "age":30, "city":"New York"}'

py_dict = json.loads(json_str)
print(type(py_dict))
print(py_dict)

<class 'dict'>
{'name': 'John', 'age': 30, 'city': 'New York'}
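JSON and Python types do not map one-to-one, so a round trip is not always exact. A sketch with an illustrative dictionary: `True` becomes `true`, `None` becomes `null`, a tuple becomes a JSON array (and comes back as a list), and non-string keys are converted to strings. The `indent` argument produces readable multi-line output.

```python
import json

py_dict = {"name": "John", "active": True, "scores": (90, 85), "manager": None, 1: "one"}

json_str = json.dumps(py_dict)
print(json_str)
# {"name": "John", "active": true, "scores": [90, 85], "manager": null, "1": "one"}

back = json.loads(json_str)
print(back)
# {'name': 'John', 'active': True, 'scores': [90, 85], 'manager': None, '1': 'one'}
print(back == py_dict)   # False: the tuple became a list and the key 1 became "1"

# indent= gives human-readable, multi-line JSON
print(json.dumps({"name": "John", "age": 30}, indent=2))
```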


## 4.1 Very simple API example

* https://randomuser.me/api/ generates a fictitious user's data in the form of JSON. 

* Navigate to the website to see the `{ k : v }` format

* We can load this JSON data into our Python environment using the third-party `requests` library, i.e. by sending an HTTP GET request to the URL. 


In [112]:
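import requests  # third-party library: pip install requests

url = "https://randomuser.me/api/"
response = requests.get(url, timeout=10)  # Send a GET request to the API (give up after 10 s)
response.raise_for_status()               # Raise an error if the request failed (e.g. 404 or 500)

data = response.json()  # Parse the JSON body into a Python dictionary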

In [113]:
print(data)

{'results': [{'gender': 'male', 'name': {'title': 'Mr', 'first': 'Abel', 'last': 'Aubert'}, 'location': {'street': {'number': 727, 'name': 'Rue du Stade'}, 'city': 'Perpignan', 'state': 'Var', 'country': 'France', 'postcode': 45640, 'coordinates': {'latitude': '-16.9622', 'longitude': '-149.9643'}, 'timezone': {'offset': '-4:00', 'description': 'Atlantic Time (Canada), Caracas, La Paz'}}, 'email': 'abel.aubert@example.com', 'login': {'uuid': '0041dbe6-3be2-4b0b-8e6b-64fa1a1f0f10', 'username': 'happypanda474', 'password': 'twins', 'salt': 'oR3hTcQb', 'md5': 'c3028a54660eeaac66532bbc168e7b6f', 'sha1': '360eedea17a60666981088b02ae258b00f121cad', 'sha256': '86072e40b008729c0e492ba4460f4315ad0b3b2168fcd130850d9b218dbb19c0'}, 'dob': {'date': '1961-11-27T18:54:17.423Z', 'age': 63}, 'registered': {'date': '2022-03-07T04:12:56.959Z', 'age': 3}, 'phone': '01-60-90-90-02', 'cell': '06-70-75-37-38', 'id': {'name': 'INSEE', 'value': '1611087690601 49'}, 'picture': {'large': 'https://randomuser.me/a

In [114]:
print(data["results"][0]["name"])

{'title': 'Mr', 'first': 'Abel', 'last': 'Aubert'}
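The response is a dictionary containing lists and further dictionaries, so we chain `[ ]` lookups to drill down. The sketch below works offline on a trimmed, hand-copied sample shaped like the output printed above (it is not a live API call), and uses `.get()` with a default for a field that may be missing; `"nickname"` is a hypothetical key that the API does not return.

```python
# Trimmed sample copied from the randomuser.me output shown above
data = {
    "results": [
        {
            "gender": "male",
            "name": {"title": "Mr", "first": "Abel", "last": "Aubert"},
            "location": {"city": "Perpignan", "country": "France"},
        }
    ]
}

user = data["results"][0]    # "results" holds a list; take its first dict
full_name = f'{user["name"]["first"]} {user["name"]["last"]}'
print(full_name)             # Abel Aubert

print(user["location"].get("city", "unknown"))   # Perpignan
print(user.get("nickname", "n/a"))               # n/a: hypothetical missing key
```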


In [115]:
type(data)

dict

Python is a useful language for data science!

## Summary 

* You can distinguish between the key collections by the pairs of brackets used: 

| Structure | Brackets | Characteristics |
| ----------- | ----------- | --------- |
| Lists | `[ , ]` | ordered, indexed, mutable |
| Tuples | `( , )` | ordered, indexed, immutable |
| Sets | `{ , }` | unordered, mutable, unique values (no duplicates) |
| Dict | `{k : v}` | key and value pairs, mutable, unique keys |


#### This Jupyter Notebook contains exercises to extend your introduction to OOP by creating lists, tuples, sets and dictionaries of objects. Attempt the following exercises, which gradually build in complexity. If you get stuck, refer back to the <a href = "https://www.youtube.com/watch?v=359eGFD7hS4"> Python lecture recording on Data Structures here</a> or view the <a href = "https://www.w3schools.com/python/python_lists.asp">W3Schools page on Python Lists</a>, which includes examples, exercises and quizzes to help your understanding. 

### Exercise 1:

Create a Python dictionary (`dict`) which stores the price of three food items. For example: milk is £1.30, pasta is £0.75 and strawberries are £1.50. Output the dictionary to check that the `values` are stored, and then see if you can access the price of one of the items by using the item name as the `key`.

Extension: Now add a new `key` and `value` pair to the previously defined dictionary.

In [None]:
food_prices = {...} # add the key:value pairs here.
food_prices

### Exercise 2: 

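Given two `sets` (prices and food names), can you create a dictionary (`dict`) that uses the food names as `keys` and the prices as `values`? (Hint: sets are unordered, so consider whether each name is guaranteed to be paired with the right price.)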


In [None]:
foodnames = {"milk", "pasta", "strawberries"}
prices = {1.30, 0.75, 1.50}

print(foodnames)
print(prices)

# Write your solution: 


### Exercise 3: 

Write one function which will return the intersection of two `sets` passed in.

Write another function which will return the union of two `sets` passed in.

In [None]:
a = {1,2,3,4,5,6,7,8,9,10}
b = {7,8,9,10,11,12,13,14}

#write your solution here

### Exercise 4: 

Write a function to count the number of elements in a list within a specified range.



In [8]:
sample_list = [10,20,30,40,40,40,70,80,99]
sample_min = 15 
sample_max = 50
# there should be five elements between 15 and 50

In [9]:
def count_between_range(sample_list, sample_min, sample_max):
    ... # Write your solution here

In [None]:
count_between_range(sample_list, sample_min, sample_max)

### Exercise 5:
Write a function that will generate the multiples of a number passed in, and store each multiple in a `set` of values.

For example, if the value 5 is passed in, then generate the 5 times table. Each value of the multiplication table (up to 12 values) should be stored as an individual element of a `set`. The `set` should be returned at the end of the function. 


In [None]:
def generate_multiples(n):
    ... # Write your solution here.

In [None]:
fives = generate_multiples(5)
print(type(fives))
print(fives)

### Exercise 6: 

Using a list comprehension, create a new list called "positives" out of the list "numbers". 

This "positives" list should contain only the positive numbers from the list, rounded to the nearest integer.

In [None]:
numbers = [34.6, -203.4, 44.9, 68.3, -12.2, 44.6, 12.7]

positives = [...] # Write your list comprehension solution here. 
positives

### Exercise 7:

Write a code block to convert a list of tuples to a dictionary

Sample input: ```[(28, 'February'), (30, 'April'), (31, 'July'), (31, 'August'), (30, 'November')]```

Sample output: ```{'February': 28, 'April': 30, 'July': 31, 'August': 31, 'November': 30}```

Hint: If you get stuck, consider the `zip()` function...

In [None]:
tuples_list = [(28, 'February'), (30, 'April'), (31, 'July'), (31, 'August'), (30, 'November')]

In [None]:
# Write your solution here. 

### Exercise 8: 
If we define $S$ as the set of prime numbers selected from the list of numbers, then:

$S = \{\, n \in \text{numbers} \mid n \text{ is prime} \,\}$

Or, written more explicitly using a predicate function $P(n)$ meaning "n is prime":

$S = \{\, n \in \text{numbers} \mid P(n) = \text{True} \,\}$

To convert this set notation to Python, write an `is_prime(n)` function which will be called on a `list` of numbers below.

In [5]:
def is_prime(n):
    ... # complete this is_prime function

In [3]:
numbers = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]

s = {n for n in numbers if is_prime(n)} # set comprehension
s  # with the stub above this is empty; once is_prime() works it should hold the primes

{2, 3, 5, 7, 11, 13, 17, 19}

### Exercise 9:

Write a function that takes a `dictionary` as input and returns the mean (i.e. average value) of the values stored in the `dictionary`. 

Extension: Can you group countries by their geographical continent? (How good is your geography?) Produce a new dictionary where the continent is the key and the value is the mean temperature across the countries located in that continent. 

In [None]:
average_temperature = {
    'Canada': -5.3,
    'Norway': 1.5,
    'United Kingdom': 9.3,
    'Germany': 8.5,
    'Spain': 14.9,
    'Italy': 13.5,
    'Egypt': 22.9,
    'India': 24.6,
    'Thailand': 27.7,
    'Singapore': 27.6,
    'Brazil': 25.0,
    'Argentina': 14.8,
    'Australia': 21.5,
    'South Africa': 17.8,
    'Japan': 11.9,
    'United States': 8.6
}

In [None]:
def calculate_mean(d):
    ...  # Write your solution here: return the mean of the values in d

### Exercise 10:

Given the three dictionaries below, can you merge them so that the keys are the countries and the values are themselves dictionaries of information (key:value pairs) for each country? 

The output should look like the following: 

```
{
    'Japan': {'temperature': 11.9, 'population': 125.7, 'gdp': 4937},
    'India': {'temperature': 24.6, 'population': 1393, 'gdp': 3173},
    'Brazil': {'temperature': 25.0, 'population': 214.3, 'gdp': 1860}
}
```

Extension: Add a new key: 'GDP per capita' to each country, and use this formula to calculate the values:
$\text{GDP per capita} = \frac{\text{GDP}}{\text{Population}}$

In [None]:
temperature = {
    'Japan': 11.9,
    'India': 24.6,
    'Brazil': 25.0
}

population = {
    'Japan': 125.7,
    'India': 1393,
    'Brazil': 214.3
}

gdp = {
    'Japan': 4937,
    'India': 3173,
    'Brazil': 1860
}

In [None]:
# Write your solution here. 
