Roadmap for pacsanini #46

aachick · 2021-08-30T07:56:14Z

Roadmap/features

The pacsanini project now feels somewhat mature in terms of the functionalities it offers:

DICOM parsing in structured format
Emitting C-FIND requests to identify DICOM resources
Emitting C-MOVE requests to retrieve DICOM resources
Starting a storescp server to handle incoming DICOM data

The areas that are most interesting and have the most room for improvement are probably DICOM parsing, C-FIND operations, and the storescp server.

DICOM parsing

DICOM parsing currently outputs results mainly in CSV format (sqlite is also supported). CSV output serves research purposes well, but the next step would be to parse DICOM files into a database. That way, structured data can be accessed from multiple servers simultaneously, without copying data between servers or setting up an NFS share.

The tricky part of having a database is choosing the right type. I think that a SQL database (probably postgres) would be the best fit. SQL is a mature standard and, given the multiple engines that exist (postgres, mariadb, mysql, ...), users should be free to use whichever one they prefer. Furthermore, libraries such as sqlalchemy provide a great abstraction for implementing all of this.

Another tricky part will be defining a general schema for the database, because DICOM attributes can vary between modalities. A general solution would therefore be to expose mainly the mandatory attributes as columns and to store the DICOM file's metadata as a JSON object (BLOB, JSON, or JSONB, with JSONB preferred).

The database structure should also make sense for storing data received from the storescp server (more on that later).

Overall, the database would have the following tables, which would fare well with the DICOM data representation model:

patients (patient-level data - linked to studies)
studies (study-level data - linked to patients and series)
series (series-level data - linked to studies and images)
images (image-level data - linked to series)
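As a rough illustration of the four tables above, here is a minimal sketch. It uses the standard library's sqlite3 module rather than sqlalchemy and postgres, purely to stay dependency-free, so the JSON metadata is stored as TEXT instead of JSONB. Every column name and both helper functions (`create_database`, `insert_image`) are assumptions made for the example, not the final schema.

```python
import json
import sqlite3

# Hypothetical schema: table names follow the list above, column names are
# assumptions made for illustration only.
SCHEMA = """
CREATE TABLE patients (
    id INTEGER PRIMARY KEY,
    patient_id TEXT NOT NULL UNIQUE
);
CREATE TABLE studies (
    id INTEGER PRIMARY KEY,
    patient_pk INTEGER NOT NULL REFERENCES patients(id),
    study_uid TEXT NOT NULL UNIQUE
);
CREATE TABLE series (
    id INTEGER PRIMARY KEY,
    study_pk INTEGER NOT NULL REFERENCES studies(id),
    series_uid TEXT NOT NULL UNIQUE
);
CREATE TABLE images (
    id INTEGER PRIMARY KEY,
    series_pk INTEGER NOT NULL REFERENCES series(id),
    image_uid TEXT NOT NULL UNIQUE,
    meta TEXT NOT NULL  -- DICOM metadata serialized as a JSON string
);
"""


def create_database(path=":memory:"):
    """Create the four tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def _get_or_create(conn, table, key_column, key, extra=None):
    """Return the primary key of the row identified by key, inserting it if needed."""
    columns = [key_column] + list(extra or {})
    values = [key] + list((extra or {}).values())
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        f"INSERT OR IGNORE INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        values,
    )
    row = conn.execute(
        f"SELECT id FROM {table} WHERE {key_column} = ?", (key,)
    ).fetchone()
    return row[0]


def insert_image(conn, patient_id, study_uid, series_uid, image_uid, meta):
    """Insert one image and any missing parent rows in a single transaction."""
    with conn:
        patient_pk = _get_or_create(conn, "patients", "patient_id", patient_id)
        study_pk = _get_or_create(
            conn, "studies", "study_uid", study_uid, {"patient_pk": patient_pk}
        )
        series_pk = _get_or_create(
            conn, "series", "series_uid", series_uid, {"study_pk": study_pk}
        )
        _get_or_create(
            conn, "images", "image_uid", image_uid,
            {"series_pk": series_pk, "meta": json.dumps(meta)},
        )
```

Keeping the full metadata in a single JSON column means that modality-specific attributes need no schema change, at the cost of less convenient querying on those attributes.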

C-FIND operations

To give a bird's-eye view of the data pipeline status, C-FIND results should be persisted in the database as well. A table named studies_find would be put in place to store basic DICOM attributes of resources. In addition, two columns would be introduced: found_on and retrieved_on. found_on corresponds to the date on which the data was returned from a C-FIND request. retrieved_on corresponds to the date on which the data was retrieved from the PACS with a C-MOVE request.
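To make the role of the two date columns concrete, here is a small sketch of how such a table could be tracked. It again uses the standard library's sqlite3 module as a stand-in for the eventual database. Apart from studies_find, found_on, and retrieved_on, all names (`study_uid`, `patient_id`, `record_found`, `mark_retrieved`, `pending_studies`) are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def _now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def create_studies_find(conn):
    """Create the studies_find table (column set is an assumption)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS studies_find (
            id INTEGER PRIMARY KEY,
            study_uid TEXT NOT NULL UNIQUE,
            patient_id TEXT,
            found_on TEXT NOT NULL,  -- set when a C-FIND returns the study
            retrieved_on TEXT        -- set when a C-MOVE delivers the study
        )
        """
    )


def record_found(conn, study_uid, patient_id=None):
    """Store a C-FIND result; a study that is already known is left untouched."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO studies_find (study_uid, patient_id, found_on) "
            "VALUES (?, ?, ?)",
            (study_uid, patient_id, _now()),
        )


def mark_retrieved(conn, study_uid):
    """Stamp retrieved_on once; return True only if a row was updated."""
    with conn:
        cursor = conn.execute(
            "UPDATE studies_find SET retrieved_on = ? "
            "WHERE study_uid = ? AND retrieved_on IS NULL",
            (_now(), study_uid),
        )
    return cursor.rowcount == 1


def pending_studies(conn):
    """Return the UIDs of studies that were found but not yet retrieved."""
    rows = conn.execute(
        "SELECT study_uid FROM studies_find WHERE retrieved_on IS NULL ORDER BY id"
    )
    return [row[0] for row in rows]
```

With this layout, the pipeline status reduces to a single query: rows where retrieved_on is NULL are the studies that still need a C-MOVE.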

storescp server

The storescp server was originally conceived to accept callbacks/plugins. This means that in addition to persisting data users can pass callbacks to perform additional actions on the data. One such action that should be facilitated is the parsing of DICOM metadata into the database. A system that updates the studies_find table and parses the DICOM file at the same time would be good.

Furthermore, the storescp server should be able to accept callbacks that will run before and after the data is persisted on disk.

pre-persistence callbacks should be put into place so that routines such as anonymization can run
post-persistence callbacks should be put into place so that routines that notify other systems or update the database can run
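The pre/post split described above could look roughly like the sketch below. It is not the existing storescp callback interface: `PersistencePipeline`, `handle`, and both callback signatures are hypothetical, and raw bytes stand in for a DICOM dataset so that the example needs no third-party library.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List

# Hypothetical signatures: a pre-persistence callback transforms the payload,
# a post-persistence callback is handed the path of the file that was written.
PreCallback = Callable[[bytes], bytes]
PostCallback = Callable[[Path], None]


@dataclass
class PersistencePipeline:
    """Run callbacks before and after a received payload is written to disk."""

    out_dir: Path
    pre: List[PreCallback] = field(default_factory=list)
    post: List[PostCallback] = field(default_factory=list)

    def handle(self, filename: str, payload: bytes) -> Path:
        # Pre-persistence: e.g. anonymization, applied in registration order.
        for callback in self.pre:
            payload = callback(payload)

        self.out_dir.mkdir(parents=True, exist_ok=True)
        path = self.out_dir / filename
        path.write_bytes(payload)

        # Post-persistence: e.g. notify other systems or update the database.
        for callback in self.post:
            callback(path)
        return path
```

One design point worth settling early is whether a failing post-persistence callback should abort the remaining callbacks. In this sketch the exception simply propagates, and the file is already on disk at that point.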


aachick · 2021-08-30T13:26:00Z

I have attached here an ER diagram of the database that I have in mind.

Note that it is not fully normalized. The rationale for this is that it allows the database to serve as a sort of data warehouse in which the images table is the central table on which all types of queries are possible. The other tables serve as a way to structure the DICOM data in a "DICOM-friendly" way.

The studies_find and images tables are linked because, when the storescp server receives new DICOM images, they will be persisted in the images table. That will be the confirmation that the data has been obtained and that the studies_find table can be updated. I am open to changing this and linking the studies_find table to the studies table instead.

aachick · 2021-08-31T18:23:29Z

After some reflection, I think that the following changes should be made to the database schema:

The studies_find table should be linked to the studies table. This makes more sense from a relational perspective. Furthermore, the studies table would only gain a new entry when the images table gains one. Because of this, the studies_find retrieved_on column will be updated at the right time.
The studies table doesn't need the image_count column at the moment. The original idea was to denormalize the schema to facilitate queries. However, this is not needed for now.
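To illustrate the revised link, here is a sketch in which receiving an image creates the study row if needed and stamps studies_find.retrieved_on in the same transaction. It uses the standard library's sqlite3 module and skips the series level for brevity, so images point directly at studies, unlike the proposed schema. The column names and the `on_image_received` function are assumptions.

```python
import sqlite3
from datetime import datetime, timezone

# Simplified, hypothetical schema: the series table is omitted on purpose.
SCHEMA = """
CREATE TABLE studies (
    id INTEGER PRIMARY KEY,
    study_uid TEXT NOT NULL UNIQUE
);
CREATE TABLE images (
    id INTEGER PRIMARY KEY,
    study_pk INTEGER NOT NULL REFERENCES studies(id),
    image_uid TEXT NOT NULL UNIQUE
);
CREATE TABLE studies_find (
    id INTEGER PRIMARY KEY,
    study_uid TEXT NOT NULL UNIQUE,
    study_pk INTEGER REFERENCES studies(id),  -- NULL until the study arrives
    found_on TEXT NOT NULL,
    retrieved_on TEXT
);
"""


def on_image_received(conn, study_uid, image_uid):
    """Persist an image and update studies_find in one transaction."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:
        # The study row only comes into existence when its first image arrives.
        conn.execute(
            "INSERT OR IGNORE INTO studies (study_uid) VALUES (?)", (study_uid,)
        )
        study_pk = conn.execute(
            "SELECT id FROM studies WHERE study_uid = ?", (study_uid,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO images (study_pk, image_uid) VALUES (?, ?)",
            (study_pk, image_uid),
        )
        # Only the first image of a study stamps retrieved_on.
        conn.execute(
            "UPDATE studies_find SET study_pk = ?, retrieved_on = ? "
            "WHERE study_uid = ? AND retrieved_on IS NULL",
            (study_pk, now, study_uid),
        )
```

Because the study insert and the studies_find update share a transaction, a failure part-way through leaves neither in place, so retrieved_on cannot get ahead of the stored data.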

aachick · 2021-08-31T18:35:26Z

The implementation of the database should therefore be done in the following way:

1. Implement the database schema using sqlalchemy. This task should define a way to create the database and its underlying tables.
   a. This means that the CRUD operations need to be somewhat refactored.
2. Adapt the C-FIND queries so that they adequately update the database's studies_find table.
3. Adapt the C-MOVE routines (and therefore the storescp server) so that when data is requested and obtained, the studies_find table is updated and records are automatically added to the database.
4. Adapt the pacsanini configuration file structure so that it makes more sense after these changes.

aachick · 2021-09-02T07:37:12Z

The linked issue #47 is now closed thanks to the merge of PR #49.

aachick · 2021-10-02T16:54:39Z

With the release of 0.2.0, this issue can be closed.

aachick added the enhancement, help wanted, and question labels Aug 30, 2021

aachick self-assigned this Aug 30, 2021

This was referenced Sep 1, 2021

Establish database schema for pacsanini models #47

Closed

Link storescp server to application database #48

Closed

aachick mentioned this issue Oct 1, 2021

Release version 0.2.0 #61

Closed

aachick closed this as completed Oct 2, 2021
