## Setting up a database

A pharmaceutical company is setting up a patient database for a clinical trial. Currently, they have a dataset on a large number of patients from different hospitals. This dataset contains the following information about each patient: name, age, height, weight. This information is contained in tuples in the following form: ('John Doe', 45, 183, 79) refers to a person named John Doe who is 45 years old, 183 cm tall and weighs 79 kg. The database is currently a list of such tuples.

The company wants to transform this database into an anonymized format which is also easier to handle. They would like to store the information in a dictionary where the keys are random numerical identifiers for each patient (not names!) and the values are dictionaries themselves which store the age, height and weight of each patient.

You will need to create two functions:

Firstly, create a function called `generate_id` which takes as input the required length of the ID and returns a random sequence of digits of that length. Note that while the ID consists of digits, it is more appropriate to return it as a string: the ID has no numerical meaning, and a string preserves leading zeros.

Secondly, create a function called `convert_database` which takes as input a database in the tuple format and returns the database in the dictionary format described above. You should generate random 6-digit numerical identifiers for each patient in the database using the `generate_id` function.

Example:

Input:
```python
database = [('John Doe', 45, 183, 79), ('Jane Doe', 32, 175, 61)]
```

Output:
<pre>
{'123456': {'age': 45, 'height': 183, 'weight': 79}, '654321': {'age': 32, 'height': 175, 'weight': 61}}
</pre>

(Made by: Zsófia Katona)

In [None]:
database = [('John Doe', 45, 183, 79), ('Jane Doe', 32, 175, 61)]

# Example solution

from typing import List, Dict, Tuple
import random

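def generate_id(length: int) -> str:
    patient_id = ''    # Named patient_id to avoid shadowing the built-in id().

    for _ in range(length):
        number = random.randrange(0, 10)    # Random digit from 0 to 9.
        patient_id += str(number)

    return patient_id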

def convert_database(database: List[Tuple[str, int, int, int]]) -> Dict[str, Dict[str, int]]:
    converted_db: Dict[str, Dict[str, int]] = {}

    for patient in database:
        patient_id = generate_id(6)

        while patient_id in converted_db:    # Ensuring that the ID is not already used for another patient.
            patient_id = generate_id(6)

        # The name (patient[0]) is left out on purpose to anonymize the data.
        patient_data: Dict[str, int] = {'age': patient[1], 'height': patient[2], 'weight': patient[3]}

        converted_db[patient_id] = patient_data

    return converted_db

print(convert_database(database))
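The same idea can be written more compactly. The following is only a sketch of one possible variant, not the reference solution: the names `Patient`, `generate_id_compact` and `convert_database_unpacked` are hypothetical and chosen so that they do not clash with the functions above. It assumes every tuple has exactly the four fields described in the exercise.

```python
import random
import string
from typing import Dict, List, Tuple

# Hypothetical alias: (name, age in years, height in cm, weight in kg).
Patient = Tuple[str, int, int, int]


def generate_id_compact(length: int) -> str:
    """Return a random string made of `length` decimal digits."""
    # random.choices draws with replacement, so digits may repeat.
    return ''.join(random.choices(string.digits, k=length))


def convert_database_unpacked(database: List[Patient]) -> Dict[str, Dict[str, int]]:
    """Convert (name, age, height, weight) tuples into an ID-keyed dictionary."""
    converted: Dict[str, Dict[str, int]] = {}

    # Unpacking names each field; the patient's name is deliberately dropped.
    for _name, age, height, weight in database:
        patient_id = generate_id_compact(6)

        while patient_id in converted:    # Retry on the rare ID collision.
            patient_id = generate_id_compact(6)

        converted[patient_id] = {'age': age, 'height': height, 'weight': weight}

    return converted


print(convert_database_unpacked([('John Doe', 45, 183, 79), ('Jane Doe', 32, 175, 61)]))
```

Because the identifiers are random, the keys in the printed dictionary differ from run to run; only the nested age, height and weight values stay the same.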


## Selecting patients

A pharmaceutical company would like to select patients for a clinical trial on a new drug for diabetes. They have a database of patients in the form of a dictionary, where keys are 6-digit numerical identifiers for each patient, and the values are dictionaries themselves, which store the age, height and weight of each patient. For instance, `'123456': {'age': 45, 'height': 183, 'weight': 79}` can be an entry in this database dictionary, referring to a patient who is 45 years old, 183 cm tall and weighs 79 kg.

The selection of patients for the trial is based on information contained in this database. You have two tasks:

1. Define a function called `select_patients`, which takes as input a database dictionary of patients, the selection parameter, the selection criterion (larger/smaller/equal) and the selection value, and returns a set of patient identifiers which fulfill this criterion.
    
    Example:
    
    Input:
    ```python
    database = {'123456': {'age': 45, 'height': 183, 'weight': 79}, '654321': {'age': 32, 'height': 175, 'weight': 61}, '024680': {'age': 71, 'height': 157, 'weight': 63}, '086420': {'age': 16, 'height': 168, 'weight': 55}}

    parameter = 'age'
    criterion = 'smaller'
    value = 20
    ```

    Output:
    <pre>
    {'086420'}
    </pre>

2. Define a new function called `intersect` which takes as input two sets of patient IDs and returns a set containing the IDs of the patients who are members of both input sets.

    Use the `select_patients` function to select two groups of patients: those who are over 40 years old and those who weigh more than 60 kg. Afterward, use the `intersect` function to select the patients who are both over 40 years old AND weigh more than 60 kg. Print these IDs in a message for the pharmaceutical company.
    
    Example output for the database shown in the example above:

    <pre>
    Patients chosen for diabetes study:
    123456
    024680
    </pre>

Note: it is alright to work in separate cells for the different parts of the exercise.

(Made by: Zsófia Katona)

In [None]:
database = {'123456': {'age': 45, 'height': 183, 'weight': 79}, '654321': {'age': 32, 'height': 175, 'weight': 61},
    '024680': {'age': 71, 'height': 157, 'weight': 63}, '086420': {'age': 16, 'height': 168, 'weight': 55}}

# Example solution for part 1:

from typing import Dict, Set

def select_patients(database: Dict[str, Dict[str, int]], parameter: str, criterion: str, value: int) -> Set[str]:
    selected: Set[str] = set()

    for patient, data in database.items():
        if criterion == 'smaller' and data[parameter] < value:
            selected.add(patient)
        
        elif criterion == 'larger' and data[parameter] > value:
            selected.add(patient)

        elif criterion == 'equal' and data[parameter] == value:
            selected.add(patient)

    return selected

print(select_patients(database, 'age', 'smaller', 20))
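The three branches above differ only in the comparison they apply, so one possible alternative is to look the comparison up in a dictionary. This is a sketch rather than the intended solution: the names `COMPARISONS` and `select_patients_by_lookup` are hypothetical, and raising a `ValueError` for an unknown criterion is an assumption made here (the solution above silently returns an empty set in that case).

```python
import operator
from typing import Callable, Dict, Set

# Maps each criterion name from the exercise to a standard comparison function.
COMPARISONS: Dict[str, Callable[[int, int], bool]] = {
    'smaller': operator.lt,
    'larger': operator.gt,
    'equal': operator.eq,
}


def select_patients_by_lookup(database: Dict[str, Dict[str, int]], parameter: str,
                              criterion: str, value: int) -> Set[str]:
    """Return the IDs of patients whose `parameter` satisfies the criterion."""
    if criterion not in COMPARISONS:
        # Assumption: an unknown criterion is treated as an error.
        raise ValueError(f"Unknown criterion: {criterion!r}")

    compare = COMPARISONS[criterion]

    # A set comprehension collects every matching patient ID.
    return {patient for patient, data in database.items() if compare(data[parameter], value)}


example_db = {'123456': {'age': 45, 'height': 183, 'weight': 79},
              '654321': {'age': 32, 'height': 175, 'weight': 61},
              '024680': {'age': 71, 'height': 157, 'weight': 63},
              '086420': {'age': 16, 'height': 168, 'weight': 55}}

print(select_patients_by_lookup(example_db, 'age', 'smaller', 20))    # {'086420'}
```

Adding a new criterion, such as "at most", would then only require one more entry in the lookup dictionary.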

In [None]:
# Example solution for part 2:

def intersect(set1: Set, set2: Set) -> Set:
    return set1.intersection(set2)

over_40_years = select_patients(database, 'age', 'larger', 40)
over_60_kg = select_patients(database, 'weight', 'larger', 60)

chosen_for_study = intersect(over_40_years, over_60_kg)

print("Patients chosen for diabetes study:")

for patient in chosen_for_study:
    print(patient)
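Sets are unordered, so the loop above may print the IDs in a different order than the example output. If a stable order is wanted, one option is to sort the IDs before printing. The sketch below is self-contained: the two input sets are written out by hand to match the example database, and the `&` operator is used as an equivalent of `set.intersection`.

```python
# IDs taken from the example database: over 40 years old, and over 60 kg.
over_40_years = {'123456', '024680'}
over_60_kg = {'123456', '654321', '024680'}

# `&` returns the elements present in both sets.
chosen_for_study = over_40_years & over_60_kg

print("Patients chosen for diabetes study:")

# sorted() returns a list, so the IDs are printed in a fixed (lexicographic) order.
for patient in sorted(chosen_for_study):
    print(patient)
```

With this input the IDs are printed as `024680` followed by `123456`, because the strings are compared character by character.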