- Script `run.py` will start the Flask API and create the DB. In order to create all tables, it is necessary to run `python run.py` (see the sketch after this list).
- Script `flask db migrate` will check for changes in the db, and the command `flask db upgrade` will apply the required changes.
- All new folders that contain Python code need to have a file `__init__.py` in the directory (most of them will be empty files).
- All table definitions need to be imported in the `__init__.py` that lives in `app/models/__init__.py`.
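For reference, a minimal sketch of what `run.py` might boil down to, assuming the Flask app and the SQLAlchemy handle live in the `app` package (the repo's actual script may differ):

```python
# Hypothetical sketch of run.py; the module layout is an assumption.
from app import app, db  # assumed Flask app and SQLAlchemy handle
from app import models   # noqa: F401  importing the package registers every table

if __name__ == "__main__":
    with app.app_context():
        db.create_all()   # create all tables
    app.run()             # start the Flask API
```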
All the db structure is based on the SQLAlchemy package. The definition for each individual table lives in the `app/models` folder. We will be using a PostgreSQL db, but for local testing all of this should work with SQLite.
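For illustration, a hypothetical table definition following the naming conventions below (the `visit` table and its columns are invented, and a shared Flask-SQLAlchemy `db` object in the `app` package is assumed):

```python
# app/models/visit.py -- hypothetical example, not the repo's actual schema.
from app import db  # assumed shared SQLAlchemy handle


class Visit(db.Model):
    __tablename__ = "visit"  # singular, snake_case, all lower case

    id = db.Column(db.Integer, primary_key=True)
    subject_id = db.Column(db.Integer, db.ForeignKey("subject.id"))
    visit_date = db.Column(db.Date)  # snake_case field names
```

Remember that the class must then be imported in `app/models/__init__.py` so the table is registered.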
- Table names are singular, and table names and fields are defined with underscore casing: everything is in lower case (even acronyms) and the words are separated by underscores (some_class, some_func, some_var, etc.).
- We only store images that can be classified as T1, DWI, rsfMRI or FieldMap. If the image type is different from those, we will ignore the image.
- Image/visit relationship: when there is a 1-1 relationship between visit and image, we will respect that relation in the db structure; when that is not the case and images and visits happened on different days, we will relate each image to the closest visit in the past, ignoring visits that happened after the image (see the sketch after this list).
- If there were more visits than images, we would still want to add all visits to the db, even if images were not acquired on those visits (check `load_db/OASIS/load_visit.py` as a reference).
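As a sketch of that matching rule (a hypothetical helper that assumes visits expose a `visit_date` attribute; the real loaders may implement it differently):

```python
def closest_past_visit(image_date, visits):
    """Return the visit closest in time to image_date, looking only backwards.

    Visits that happened after the image are ignored; None is returned
    when no earlier (or same-day) visit exists.
    """
    candidates = [v for v in visits if v.visit_date <= image_date]
    return max(candidates, key=lambda v: v.visit_date) if candidates else None
```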
We assume that each dataset will have a set of metadata files. The idea is to store those files inside a folder located in `data/<dataset_name>`. We also assume that each dataset will have specific logic in order to load the metadata into the db. We suggest creating those files in the directory `load_db/<dataset_name>` and having each file start with the word `load_` followed by the name of the table (or main structure) that is going to be filled by that file, for example `load_db/openpain/load_subject.py`.
- Run all scripts located in `load_db/seed`. These files create the necessary and common structure needed by all datasets.
- Inside each dataset, for example OASIS, run the script to load the subjects: `load_db/OASIS/load_subject.py`.
- Run all other scripts from the dataset that start with `load_`; the order doesn't matter (see the sketch below).
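For example, a throwaway runner respecting that order might look like this (the paths, the OASIS example and the use of `subprocess` are assumptions; the scripts can just as well be run by hand):

```python
import glob
import subprocess

# 1. Common seed scripts shared by all datasets.
for script in sorted(glob.glob("load_db/seed/load_*.py")):
    subprocess.run(["python", script], check=True)

# 2. Subjects first for a given dataset.
subprocess.run(["python", "load_db/OASIS/load_subject.py"], check=True)

# 3. Every other load_ script of the dataset; order is irrelevant.
for script in glob.glob("load_db/OASIS/load_*.py"):
    if not script.endswith("load_subject.py"):
        subprocess.run(["python", script], check=True)
```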
The idea is to have the master seed files in `data/seed_files` and to download the most updated version of those spreadsheets before filling a new database and running the seed scripts (located in `load_db/seed`). Currently those files only cover the dataset names and the analyzed conditions. The other existing seed script (defined in the folder `load_db/seed/`) is for filling the file_type table, but the definition of possible files is hard coded in the script (`load_db/seed/load_file_type.py`).
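A minimal sketch of what such a hard-coded seed script could look like (the `FileType` model name and the list contents are assumptions, not the repo's actual definitions):

```python
# Hypothetical sketch of load_db/seed/load_file_type.py.
from app import app, db
from app.models import FileType  # assumed model name

FILE_TYPES = ["T1", "DWI", "rsfMRI", "FieldMap"]  # placeholder list

with app.app_context():
    for name in FILE_TYPES:
        if FileType.query.filter_by(name=name).first() is None:
            db.session.add(FileType(name=name))
    db.session.commit()
```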
- Ensure that the dataset exists in the seed `datasets_list.xls` file.
- Upload the dataset images to cloud storage, modifying the script `load_file/upload_datasets.py` (see the sketch after this list).
- In the folder `data`, create a new folder for the dataset, and place the metadata files in there.
- In the folder `load_db`, create a new folder for the dataset, and create new scripts to load the metadata into the db. The number of scripts will depend on the type of metadata we obtain from the dataset, but ideally we would want a script to load subjects (`load_db/<dataset_name>/load_subject.py`) and another to load images (`load_db/<dataset_name>/load_image.py`).
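Given the IBM Cloud Object Storage credentials in the config example further down, the upload could be sketched like this (assuming the `ibm-cos-sdk` package; the config import, bucket layout and file paths are placeholders):

```python
import ibm_boto3
from ibm_botocore.client import Config

from config.development import (  # assumed config module
    AUTH_ENDPOINT, BUCKET, COS_CREDENTIALS, SERVICE_ENDPOINT,
)

cos = ibm_boto3.client(
    "s3",
    ibm_api_key_id=COS_CREDENTIALS["apikey"],
    ibm_service_instance_id=COS_CREDENTIALS["resource_instance_id"],
    ibm_auth_endpoint=AUTH_ENDPOINT,
    config=Config(signature_version="oauth"),
    endpoint_url=SERVICE_ENDPOINT,
)

# Upload one local image file under a dataset-specific key.
cos.upload_file(Filename="data/OASIS/sub-001_T1.nii.gz",
                Bucket=BUCKET,
                Key="OASIS/sub-001_T1.nii.gz")
```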
The master list of conditions is added as part of the seed files. Each dataset has its own mapping between the specific conditions described in its metadata and our own list of conditions. Conditions can be assigned to a Subject or to a Visit; when the condition is available for a Visit we prefer to store it at that level, as it is more specific about the diagnosis behind a given image. If we have conditions per Visit, we ignore conditions per Subject (see the sketch below). In HCP (not AGING at least) we have all healthy subjects, and as there is no diagnosis per visit (this is a general condition of the subject), we added the condition id to the subject, to have a more transparent structure.
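A sketch of that preference rule (a hypothetical helper; the actual loaders may encode this differently):

```python
def assign_condition(db_session, subject, condition_id, visit=None):
    """Store a condition at the most specific level available.

    Hypothetical helper: a visit-level diagnosis wins over a
    subject-level one, which is then left unset.
    """
    if visit is not None:
        visit.condition_id = condition_id    # per-visit diagnosis
    else:
        subject.condition_id = condition_id  # general condition of the subject
    db_session.commit()
```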
Most of the datasets have two scripts that upload conditions: `load_visit.py` (which creates all visits, without caring whether they contain images or not) and `load_images.py` (which loads the visits related to the images). To avoid repeated code and extra maintenance complexity, we recommend storing the function that maps a dataset's diagnosis conditions to the master list in the `__init__.py` file of each dataset, for example the method `get_oasis_dx` in `load_db/OASIS/__init__.py` (sketched below).
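As an illustration only, such a mapping function might look like this; the dictionary contents are invented placeholders, not OASIS's real diagnosis codes:

```python
# Hypothetical sketch of a mapper in load_db/OASIS/__init__.py.
# Keys and master-list condition names below are placeholders.
OASIS_DX_MAP = {
    "Cognitively normal": "healthy",
    "AD Dementia": "alzheimer",
}


def get_oasis_dx(raw_dx):
    """Map an OASIS-specific diagnosis string to our master condition name."""
    return OASIS_DX_MAP.get(raw_dx)
```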
It is necessary to have a config file inside the `config` folder and a `global.py` that references the name of that file with the key ENVIRONMENT. Example of `global.py`:

```python
ENVIRONMENT = 'development'
```

And an example of the file referenced by globals (in this case, `development.py`):

```python
SQLALCHEMY_DATABASE_URI = 'sqlite:///xls.db'
SQLALCHEMY_TRACK_MODIFICATIONS = False
BUCKET = "bucket_name"
COS_CREDENTIALS = {
    "apikey": "apikey",
    "endpoints": "https://endpoint",
    "iam_apikey_description": "Description",
    "iam_apikey_name": "apikey_name",
    "iam_role_crn": "crn:example",
    "iam_serviceid_crn": "crn:v1:example",
    "resource_instance_id": "crn:v1:example"
}
AUTH_ENDPOINT = 'https://auth/token'
SERVICE_ENDPOINT = 'https://s3.endpoint'
```
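The ENVIRONMENT key could then be resolved like this (a hypothetical sketch; the repo's actual loading code may differ). Note that `config/global.py` has to go through `importlib`, because `global` is a Python keyword:

```python
import importlib

# "import config.global" is a SyntaxError (global is a keyword),
# so the module is loaded by its dotted name as a string instead.
env = importlib.import_module("config.global").ENVIRONMENT
settings = importlib.import_module(f"config.{env}")

print(settings.SQLALCHEMY_DATABASE_URI)
```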
All the required packages are specified in the file `requirements.txt`. They can be installed using pip, with the command `pip install -r requirements.txt`, run from the root of the repository.