In this notebook you will find some typical things you might want to do in Python. This list is based on student requests:

- Turning two lists into a dictionary
- Turning a dictionary into two lists
- Sorting
- Finding unique values in a list

We will play with the dataset from the testdrive assignment.



In [18]:
import json
import pprint as pp

# you do not need to understand this function. It just loads the file
def load_json_file_named(file_name):
    loaded_data = []
    file_location = f"../data/{file_name}"  # or f"data/{file_name}" depending on your files
    try:
        with open(file_location, 'r') as file:
            loaded_data = json.load(file)
    except OSError as e:
        print(f"Error. Does the file exist in this folder? {file_location}\n\n {e}")
    return loaded_data

In [20]:
gp_practices = load_json_file_named('gp_practices.json')
gp_practices[0]

{'Organisation Code': 'A81001',
 'Name': 'THE DENSHAM SURGERY',
 'Address': {'City': 'STOCKTON-ON-TEES',
  'Area': 'CLEVELAND',
  'Address line 1': 'THE HEALTH CENTRE',
  'Address line 2': 'LAWSON STREET',
  'Address line 3': 'STOCKTON-ON-TEES',
  'Address line 4': 'CLEVELAND',
  'Full Postal Address': 'THE HEALTH CENTRE, LAWSON STREET, STOCKTON-ON-TEES, CLEVELAND',
  'Postcode': 'TS18 1HU',
  'Telephone': '01642 672351'},
 'Status': {'Open Date': '19740401', 'Close Date': '', 'Status Code': 'A'},
 'Prescribing Setting': '4'}

## Keep only unique values in a list

Convert a list with repetitions into a set with `set()`. A set is a collection that does not allow duplicates. Then convert it back with `list()`.

This way you will end up with a list of unique values.

Watch out: a set does not keep the original order, so the items may come back in a different order.

In [23]:
#simple example
cities = ["Manchester", "Birmingham",  "Birmingham", "Sheffield"]
unique_cities = list(set(cities))
print(unique_cities)

['Sheffield', 'Manchester', 'Birmingham']
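
If the original order matters, one common alternative is `dict.fromkeys()`: dictionary keys cannot repeat, and dictionaries keep the order in which keys were added (Python 3.7+). A minimal sketch using the same example list; the name `unique_in_order` is only for illustration:

```python
cities = ["Manchester", "Birmingham", "Birmingham", "Sheffield"]

# dict.fromkeys() makes each item a key (duplicates collapse into one),
# and the keys stay in the order in which they were first seen
unique_in_order = list(dict.fromkeys(cities))
print(unique_in_order)  # ['Manchester', 'Birmingham', 'Sheffield']
```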


In [24]:
# practical example: get all unique status codes (A, C, etc...)

all_status_codes = [
    gp['Status']['Status Code']
    for gp in gp_practices
]

# here: use set, and then force the result to be a list again
all_unique_status_codes = list(set(all_status_codes))

print(all_unique_status_codes)

['A', 'D', 'C', 'P']


## Join two lists (e.g. keys and values) into a dictionary

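There are two techniques. The simpler one uses `zip(some_keys, some_values)`, which pairs up the items of the two lists one by one. The result is not yet a dictionary, so it needs to be converted with `dict()`. The other technique is a dictionary comprehension, covered further down. See below: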

In [17]:
# simple example
cities = ["Manchester", "Birmingham", "Sheffield"]
counts = [34,67,12]
city_counts = dict(zip(cities, counts))
print(city_counts)

{'Manchester': 34, 'Birmingham': 67, 'Sheffield': 12}


In [30]:
# practical example:
status_code_names = ['A', 'D', 'C', 'P']
# we will use the variable all_status_codes from above
# it holds the status codes of all (about 15000) GP practices
# here are the first 20:
print(all_status_codes[0:20])

['A', 'A', 'C', 'A', 'A', 'A', 'A', 'C', 'A', 'A', 'A', 'A', 'A', 'C', 'A', 'A', 'A', 'A', 'A', 'A']


In [31]:
status_code_counts = [
    all_status_codes.count(code_name)
    for code_name in status_code_names
]
print(status_code_counts)

statuses_dict = dict(zip(status_code_names, status_code_counts))
print(statuses_dict)

[11580, 383, 3095, 2]
{'A': 11580, 'D': 383, 'C': 3095, 'P': 2}
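
Calling `.count()` once per code re-reads the whole list every time. The standard library's `collections.Counter` is an alternative (not the method used in this notebook) that tallies everything in one pass. A small sketch, using the first 20 status codes printed above as stand-in data; `sample_codes` and `sample_counts` are made-up names:

```python
from collections import Counter

# stand-in data: the first 20 status codes printed earlier
sample_codes = ['A', 'A', 'C', 'A', 'A', 'A', 'A', 'C', 'A', 'A',
                'A', 'A', 'A', 'C', 'A', 'A', 'A', 'A', 'A', 'A']

# Counter behaves like a dict mapping each item to how often it appears
sample_counts = Counter(sample_codes)
print(dict(sample_counts))  # {'A': 17, 'C': 3}
print(sample_counts['P'])   # 0, because 'P' never appears in the sample
```

Unlike the `zip` version, you do not need to know the list of code names in advance.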


## How to separate a Dict into two lists

You can request just the keys or just the values of a dict with `.keys()` and `.values()`. These return list-like "view" objects rather than real lists, so it's safest to convert them with `list()`.

This is useful for graphs, where you need separate x and y lists.


In [34]:
statuses_dict = {'Active':  11580, 
                 'Dormant': 383, 
                 'Closed':  3095, 
                 'Proposed': 2}
x_data = list(statuses_dict.keys())
y_data = list(statuses_dict.values())
print("x_data",x_data)
print("y_data",y_data)

x_data ['Active', 'Dormant', 'Closed', 'Proposed']
y_data [11580, 383, 3095, 2]


In [33]:
# see it on the graph:

import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default='iframe'

fig = go.Figure(
    data=[go.Bar(y= y_data, 
                 x= x_data)],
    layout=go.Layout(
        title=go.layout.Title(text="Health practices codes")
    )
)
fig.show()

## Operations on key-value pairs of a Dict

Note that there is also `.items()`, which returns a list-like view of `(key, value)` pairs.

When you loop through them (with a for loop, or list comprehension) you would not use

`for thing in things`

but rather

`for (key, value) in key_value_pairs`

As always, give your key and value the most meaningful names you can think of. See the example below:

In [43]:
# simple example
statuses_dict = {'Active':  11580, 
                 'Dormant': 383, 
                 'Closed':  3095, 
                 'Proposed': 2}

for (status_name, status_value) in statuses_dict.items():
    print(status_name, "has a count of", status_value)

Active has a count of 11580
Dormant has a count of 383
Closed has a count of 3095
Proposed has a count of 2


In [48]:
# practical example
one_health_practice = gp_practices[0]

for (measure, value) in one_health_practice['Status'].items():
    print(f"{measure:.<30}{value:.>10}")

Open Date.......................19740401
Close Date..............................
Status Code............................A


## Dictionary Comprehension

Yes, it's like a list comprehension, but it:

- returns a dictionary
- uses curly brackets `{}`
- has a top line that produces a `key: value` pair instead of just one value

In [36]:
# simple example
words = ["banana", "apple", "kiwi", "grapefruit"]
word_lengths = {
    word : len(word)
    for word in words
}
print(word_lengths)

{'banana': 6, 'apple': 5, 'kiwi': 4, 'grapefruit': 10}


In [37]:
# Practical example (this repeats the zip example from above)
# and re-uses the variable  all_status_codes

status_code_names = ['A', 'D', 'C', 'P']

statuses_dict = {
    code_name: all_status_codes.count(code_name)
    for code_name in status_code_names
}
print(statuses_dict)

{'A': 11580, 'D': 383, 'C': 3095, 'P': 2}


In [54]:
# Advanced: you can use a dict comprehension to 'loop through' a Dict
statuses_dict = {'Active':  11580, 
                 'Dormant': 383, 
                 'Closed':  3095, 
                 'Proposed': 2}

# here we will re-interpret a key-value pair like
# 'Active':  11580
# into a key-value pair with a changed key, and the value as a %
# 'Active (A)': 76.89
statuses_dict_percent = {
    f"{name} ({name[0]})": round(100 * value / len(gp_practices), 2) 
    for (name, value) in statuses_dict.items()
} 
print(statuses_dict_percent)

{'Active (A)': 76.89, 'Dormant (D)': 2.54, 'Closed (C)': 20.55, 'Proposed (P)': 0.01}
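
Like a list comprehension, a dictionary comprehension can also take an `if` at the end to keep only some of the pairs. A sketch, assuming we only want statuses with at least 1000 practices; the threshold and the name `large_statuses` are arbitrary choices for illustration:

```python
statuses_dict = {'Active': 11580,
                 'Dormant': 383,
                 'Closed': 3095,
                 'Proposed': 2}

# keep only the pairs whose count passes the (arbitrary) threshold
large_statuses = {
    name: count
    for (name, count) in statuses_dict.items()
    if count >= 1000
}
print(large_statuses)  # {'Active': 11580, 'Closed': 3095}
```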


## Sort a simple List

In [61]:
words = ["banana", "apple", "kiwi", "grapefruit"]

print(sorted(words)) # sort alphabetically
print(sorted(words, reverse=True)) # sort alphabetically backwards
print()
print(sorted(words, key = len)) # sort using the function 'len'
print(sorted(words, key = len, reverse=True)) # sort by 'len', backwards

['apple', 'banana', 'grapefruit', 'kiwi']
['kiwi', 'grapefruit', 'banana', 'apple']

['kiwi', 'apple', 'banana', 'grapefruit']
['grapefruit', 'banana', 'apple', 'kiwi']


## Sorting Lists of Dictionaries (advanced)

When you are sorting things, the big question is: **What do you want to sort them BY** (e.g. alphabetically? by size? backwards?)

The way you would do that in Python is:

1. Create a function which **turns an object into a number/string**
2. Tell Python to use those numbers/strings as the basis for the sorting

Just as you specified `key = len` above to sort things by length, here you can specify any other function instead of `len`, e.g. `key = get_practice_name`.

In [64]:
statuses_dict = {'Active':  11580, 
                 'Dormant': 383, 
                 'Closed':  3095, 
                 'Proposed': 2}

# functions that take one item and return the 'value to sort by'
# each is given a key-value pair as a tuple (key, value)
# so that you can get the key with key_value_pair[0]
# and the value with key_value_pair[1]

def get_key(key_value_pair):
    return key_value_pair[0]

def get_value(key_value_pair):
    return key_value_pair[1]

print(sorted(statuses_dict.items(), key = get_key))
print(sorted(statuses_dict.items(), key = get_value))

[('Active', 11580), ('Closed', 3095), ('Dormant', 383), ('Proposed', 2)]
[('Proposed', 2), ('Dormant', 383), ('Closed', 3095), ('Active', 11580)]


In [None]:
# notice that above you do not really get a Dict back
# but rather a list of key-value pairs like
# [(key, value), (key, value), ...]
# so if you want it back as a Dict, you can use a 'Dict Comprehension'

sorted_dict = {
    key_and_value[0] : key_and_value[1]
    for key_and_value in sorted(statuses_dict.items(), key = get_value)
}
print(sorted_dict)

In [71]:
# you could also use another strategy (below); pick the one you prefer.
sorted_dict2 = {
    key : value
    for (key,value) in sorted(statuses_dict.items(), key = get_value)
}
print(sorted_dict2)

{'Proposed': 2, 'Dormant': 383, 'Closed': 3095, 'Active': 11580}
{'Proposed': 2, 'Dormant': 383, 'Closed': 3095, 'Active': 11580}
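
Because `dict()` accepts any sequence of (key, value) pairs (as in the `zip` example earlier), a shorter option is to pass the sorted pairs straight to `dict()`. A sketch that redefines `statuses_dict` and `get_value` so that it runs on its own; `sorted_dict_short` is an illustrative name:

```python
statuses_dict = {'Active': 11580,
                 'Dormant': 383,
                 'Closed': 3095,
                 'Proposed': 2}

def get_value(key_value_pair):
    return key_value_pair[1]

# sorted() gives a list of (key, value) tuples; dict() rebuilds a Dict
# from them and keeps that order (Python 3.7+)
sorted_dict_short = dict(sorted(statuses_dict.items(), key=get_value))
print(sorted_dict_short)
# {'Proposed': 2, 'Dormant': 383, 'Closed': 3095, 'Active': 11580}
```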


In [75]:
# for completeness:
# When you google around, you will see the more advanced
# 'lambda syntax' (a shortened way to write simple functions)
# but I would recommend first fully understanding
# what's going on when you sort. Lambda looks like this:

sorted_dict3 = {
    key : value
    for (key,value) in sorted(statuses_dict.items(),
                              key = lambda k_v_pair : k_v_pair[0])
}
print(sorted_dict3)

{'Active': 11580, 'Closed': 3095, 'Dormant': 383, 'Proposed': 2}


In [79]:
# practical example: the youngest GP practice (most recent Open Date):

def get_created_date(practice):
    return practice['Status']['Open Date']

practices_by_age = sorted(gp_practices, 
                          key = get_created_date, 
                          reverse=True)
print(practices_by_age[0])

{'Organisation Code': 'Y07248', 'Name': 'CONTINENCE PRODUCT PRESCRIPTION SERVICE', 'Address': {'City': 'NORTHAMPTON', 'Area': 'NORTHAMPTONSHIRE', 'Address line 1': 'MANFIELD COURT', 'Address line 2': 'KETTERING ROAD', 'Address line 3': 'NORTHAMPTON', 'Address line 4': 'NORTHAMPTONSHIRE', 'Full Postal Address': 'MANFIELD COURT, KETTERING ROAD, NORTHAMPTON, NORTHAMPTONSHIRE', 'Postcode': 'NN3 6NP', 'Telephone': '0300 0271381'}, 'Status': {'Open Date': '20220401', 'Close Date': '', 'Status Code': 'P'}, 'Prescribing Setting': '0'}
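
Sorting by 'Open Date' works here even though the dates are strings: the YYYYMMDD format puts the year first and always has the same width, so alphabetical order matches chronological order. If you need real dates (for example to do date arithmetic), the standard library's `datetime.strptime` can parse them. A sketch using two Open Date values that appear in this notebook; the helper name `parse_open_date` and the choice to return `None` for an empty string are assumptions made for illustration:

```python
from datetime import datetime

def parse_open_date(date_text):
    # empty strings (like the 'Close Date' of an open practice)
    # cannot be parsed, so this sketch returns None for them
    if date_text == '':
        return None
    return datetime.strptime(date_text, "%Y%m%d").date()

open_dates = ['19740401', '20220401']

# as strings, the most recent date still sorts first with reverse=True
print(sorted(open_dates, reverse=True)[0])  # 20220401
print(parse_open_date('20220401'))          # 2022-04-01
```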
