# Assignment 1.
You have a database from a study, that needs some cleaning up. Your task is to write a Python script which would:

* load the database from a file `database.json`
* verify the age of participants. The inclusion criteria to the study assume that participants are between 18 and 70 years of age. If the user is too young or too old, he/she should be removed from the database
* verify completeness of data. Every participant should have `scores` from three measurements. In case there is not enough data the participant should be removed from the database 
* verify the code of a participant. The code should follow the pattern: 2 upper letters - 8 alphanumeric characters (lowercase), e.g. AB-ab012k3y (use RegEx!). If the code does not match the pattern, a new random code should be generated for this participant
* save the verified and cleaned database to a new JSON file
* create a dictionary of grouped participants based on their favourite color; use `groupby` form `itertools`; the values of a dictionary should include only participants' codes. Do not save it to a file. Just display its content.


## Setting up the environment

In [165]:
import json
import datetime
import re
from itertools import groupby

path = "database.json"
date = datetime.date.today()

## Loading the data

In [166]:
file = open(path, "r")

data = json.loads(file.read())

data

[{'code': 'BP-2t1e9j5b',
  'gender': 'Polygender',
  'date_birth': '1989-02-01',
  'profession': 'Environmental Tech',
  'fav_color': 'Red',
  'scores': [1.54, 3.49, 4.9]},
 {'code': 'GJ-9y9h9w8l',
  'gender': 'Female',
  'date_birth': '1959-05-02',
  'profession': 'Quality Control Specialist',
  'fav_color': 'Blue',
  'scores': [3.21, 0.28, 0.92]},
 {'code': 'HF-4y5k6a8a',
  'gender': 'Male',
  'date_birth': '1961-01-22',
  'profession': 'Quality Control Specialist',
  'fav_color': 'Red',
  'scores': [3.96, 0.67, 2.11]},
 {'code': 'AP-1d6u3j6b',
  'gender': 'Male',
  'date_birth': '1996-09-27',
  'profession': 'Environmental Tech',
  'fav_color': 'White',
  'scores': [3.41, 4.05]},
 {'code': 'WU-1e3d7w7j',
  'gender': 'Female',
  'date_birth': '2002-10-28',
  'profession': 'Marketing Manager',
  'fav_color': 'Blue',
  'scores': [3.02, 4.54, 4.77]},
 {'code': 'DM-6g3w0e4z',
  'gender': 'Male',
  'date_birth': '1974-05-11',
  'profession': 'Staff Scientist',
  'fav_color': 'Blue',
  'sc

## Veryfying the age

In [167]:
for elt in data:

    diff = (date - datetime.datetime.strptime(elt['date_birth'], "%Y-%m-%d").date()).days

    age_years = diff//365

    if age_years < 18 or age_years > 70:

        print(data.index(elt))

        print(f" User with the code: {elt['code']} has been removed from the database due to not fullfilling age restricions (participant's age: {age_years}')")

        del data[data.index(elt)]

data

11
 User with the code: HI-5o9v8s6x has been removed from the database due to not fullfilling age restricions (participant's age: 72')
25
 User with the code: KN-5x6g9p3v has been removed from the database due to not fullfilling age restricions (participant's age: 16')
27
 User with the code: RI-5w8c1m9u has been removed from the database due to not fullfilling age restricions (participant's age: 72')


[{'code': 'BP-2t1e9j5b',
  'gender': 'Polygender',
  'date_birth': '1989-02-01',
  'profession': 'Environmental Tech',
  'fav_color': 'Red',
  'scores': [1.54, 3.49, 4.9]},
 {'code': 'GJ-9y9h9w8l',
  'gender': 'Female',
  'date_birth': '1959-05-02',
  'profession': 'Quality Control Specialist',
  'fav_color': 'Blue',
  'scores': [3.21, 0.28, 0.92]},
 {'code': 'HF-4y5k6a8a',
  'gender': 'Male',
  'date_birth': '1961-01-22',
  'profession': 'Quality Control Specialist',
  'fav_color': 'Red',
  'scores': [3.96, 0.67, 2.11]},
 {'code': 'AP-1d6u3j6b',
  'gender': 'Male',
  'date_birth': '1996-09-27',
  'profession': 'Environmental Tech',
  'fav_color': 'White',
  'scores': [3.41, 4.05]},
 {'code': 'WU-1e3d7w7j',
  'gender': 'Female',
  'date_birth': '2002-10-28',
  'profession': 'Marketing Manager',
  'fav_color': 'Blue',
  'scores': [3.02, 4.54, 4.77]},
 {'code': 'DM-6g3w0e4z',
  'gender': 'Male',
  'date_birth': '1974-05-11',
  'profession': 'Staff Scientist',
  'fav_color': 'Blue',
  'sc

## Veryfying the scores

In [168]:
for elt in data:


    if len(elt['scores']) != 3:

        print(f" User with the code: {elt['code']} has been removed from the database due to inncorrect number of scores (number of participant's scores: {len(elt['scores'])}')")

        del data[data.index(elt)]

 User with the code: AP-1d6u3j6b has been removed from the database due to inncorrect number of scores (number of participant's scores: 2')
 User with the code: ZM-4y2x2u7k has been removed from the database due to inncorrect number of scores (number of participant's scores: 2')


## Veryfying the code

In [169]:
for elt in data:

    if not re.fullmatch('[A-Z]{2}-[a-z0-9]{8}', elt['code']):

        print(f" User with the code: {elt['code']} has been removed from the database due to inncorrect code (participant's code: {elt['code']}')")

        del data[data.index(elt)]

 User with the code: za2u3m2f8g has been removed from the database due to inncorrect code (participant's code: za2u3m2f8g')
 User with the code:  has been removed from the database due to inncorrect code (participant's code: ')
 User with the code: CB-22 has been removed from the database due to inncorrect code (participant's code: CB-22')
 User with the code: 99245 has been removed from the database due to inncorrect code (participant's code: 99245')


## Saving cleaned database

In [170]:
with open("database_verified.json", "w") as file:

    json.dump(data, file, indent = 2)


## Grouped by favourite colour

In [177]:
res = {}

for key, group in groupby(data, lambda x: x['fav_color']):

    if key not in res.keys():

        res[key] = [(list(group)[0]['code'])]

    else:

        res[key].append(list(group)[0]['code'])

res

{'Red': ['BP-2t1e9j5b',
  'HF-4y5k6a8a',
  'GE-1l4f5j7h',
  'HS-3q1i9o1g',
  'OY-7t4q0g0v',
  'MO-2z6z8w1e'],
 'Blue': ['GJ-9y9h9w8l',
  'WU-1e3d7w7j',
  'NF-3w7c5n3x',
  'SA-8a6q0h7e',
  'GT-3p1k9u3x',
  'HD-8t6b6n1w',
  'ZS-8i1u3e4w'],
 'White': ['DT-9i2x6z1p',
  'BU-4x8n2b1t',
  'TR-9z3v5h8a',
  'LB-7u8k3w1k',
  'LW-8k3p8a9n']}