Roadmap for pacsanini #46

Closed
aachick opened this issue Aug 30, 2021 · 5 comments

@aachick
Collaborator

aachick commented Aug 30, 2021

Roadmap/features

The pacsanini project now feels reasonably mature in terms of the functionality it offers:

  • DICOM parsing in structured format
  • Emitting C-FIND requests to identify DICOM resources
  • Emitting C-MOVE requests to retrieve DICOM resources
  • Starting a storescp server to handle incoming DICOM data

The areas that are most interesting and have the most room for improvement are probably DICOM parsing, C-FIND operations, and the storescp server.

DICOM parsing

DICOM parsing currently outputs results mainly in CSV format (sqlite is also supported). CSV output serves research purposes well, but the next step would be to parse DICOM files into a database. That way, structured data can be accessed from multiple servers simultaneously, without copying data between servers or setting up an NFS share.

The tricky part of having a database is choosing the right type. I think that a SQL database (probably postgres) would be the best choice. SQL is a mature standard and, given the multiple engines that exist (postgres, mariadb, mysql, ...), users should be free to use whichever tool they prefer. Furthermore, libraries such as sqlalchemy provide a great abstraction for implementing all of this.

Another tricky part will be defining a general schema for the database, because DICOM attributes vary between modalities. A general solution would therefore be to expose mainly the mandatory attributes as columns and to store the DICOM file's full metadata as a JSON object (BLOB, JSON, or JSONB, with JSONB preferred).

The database structure should also make sense for storing data received from the storescp server (more on that later).

Overall, the database would have the following tables, which map well onto the DICOM data representation model:

  • patients (patient-level data - linked to studies)
  • studies (study-level data - linked to patients and series)
  • series (series-level data - linked to studies and images)
  • images (image-level data - linked to series)
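As a rough illustration of the four tables above, here is a minimal sketch. It uses the standard-library sqlite3 module only to stay dependency-free; the issue itself proposes sqlalchemy on top of a SQL engine such as postgres. All column names (`patient_fk`, `study_uid`, `meta`, ...) and the helper names `create_database` and `insert_image` are assumptions made for this example, not the actual pacsanini schema.

```python
import json
import sqlite3

# Hypothetical schema: only a few identifying attributes are exposed as
# columns, and the full DICOM metadata is kept as JSON text in images.meta.
SCHEMA = """
CREATE TABLE patients (
    id INTEGER PRIMARY KEY,
    patient_id TEXT NOT NULL UNIQUE
);
CREATE TABLE studies (
    id INTEGER PRIMARY KEY,
    patient_fk INTEGER NOT NULL REFERENCES patients(id),
    study_uid TEXT NOT NULL UNIQUE
);
CREATE TABLE series (
    id INTEGER PRIMARY KEY,
    study_fk INTEGER NOT NULL REFERENCES studies(id),
    series_uid TEXT NOT NULL UNIQUE,
    modality TEXT
);
CREATE TABLE images (
    id INTEGER PRIMARY KEY,
    series_fk INTEGER NOT NULL REFERENCES series(id),
    sop_instance_uid TEXT NOT NULL UNIQUE,
    meta TEXT NOT NULL
);
"""


def create_database(path=":memory:"):
    """Create the four tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def insert_image(conn, patient_id, study_uid, series_uid, sop_uid, modality, meta):
    """Insert one image, creating its patient, study and series rows if missing."""
    cur = conn.cursor()
    cur.execute("INSERT OR IGNORE INTO patients (patient_id) VALUES (?)", (patient_id,))
    patient_fk = cur.execute(
        "SELECT id FROM patients WHERE patient_id = ?", (patient_id,)
    ).fetchone()[0]
    cur.execute(
        "INSERT OR IGNORE INTO studies (patient_fk, study_uid) VALUES (?, ?)",
        (patient_fk, study_uid),
    )
    study_fk = cur.execute(
        "SELECT id FROM studies WHERE study_uid = ?", (study_uid,)
    ).fetchone()[0]
    cur.execute(
        "INSERT OR IGNORE INTO series (study_fk, series_uid, modality) VALUES (?, ?, ?)",
        (study_fk, series_uid, modality),
    )
    series_fk = cur.execute(
        "SELECT id FROM series WHERE series_uid = ?", (series_uid,)
    ).fetchone()[0]
    cur.execute(
        "INSERT INTO images (series_fk, sop_instance_uid, meta) VALUES (?, ?, ?)",
        (series_fk, sop_uid, json.dumps(meta)),
    )
    conn.commit()
```

Keeping the metadata as JSON next to a handful of typed columns is what lets a single schema cover modalities with different attribute sets.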

C-FIND operations

To give a bird's-eye view of the data pipeline status, C-FIND results should be persisted in the database as well. A table named studies_find would be put in place to store basic DICOM attributes of resources. In addition, two columns would be introduced: found_on and retrieved_on. found_on corresponds to the date on which the data was returned from a C-FIND request. retrieved_on corresponds to the date on which the data was retrieved from the PACS with a C-MOVE request.
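The found_on/retrieved_on bookkeeping could look roughly like the sketch below. It again uses sqlite3 for brevity; the extra columns (`study_uid`, `patient_id`, `study_date`) and the helpers `record_found`, `mark_retrieved`, and `pending_studies` are hypothetical names chosen for this example.

```python
import sqlite3
from datetime import datetime, timezone


def create_studies_find(conn):
    """Create a hypothetical studies_find table."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS studies_find (
            id INTEGER PRIMARY KEY,
            study_uid TEXT NOT NULL UNIQUE,
            patient_id TEXT,
            study_date TEXT,
            found_on TEXT NOT NULL,   -- set when a C-FIND returns the study
            retrieved_on TEXT         -- set once a C-MOVE has delivered it
        )
        """
    )


def record_found(conn, study_uid, patient_id, study_date, now=None):
    """Store a C-FIND result; a study that is already known is left untouched."""
    now = now or datetime.now(timezone.utc)
    conn.execute(
        "INSERT OR IGNORE INTO studies_find "
        "(study_uid, patient_id, study_date, found_on) VALUES (?, ?, ?, ?)",
        (study_uid, patient_id, study_date, now.isoformat()),
    )
    conn.commit()


def mark_retrieved(conn, study_uid, now=None):
    """Stamp retrieved_on the first time the study's data arrives."""
    now = now or datetime.now(timezone.utc)
    conn.execute(
        "UPDATE studies_find SET retrieved_on = ? "
        "WHERE study_uid = ? AND retrieved_on IS NULL",
        (now.isoformat(), study_uid),
    )
    conn.commit()


def pending_studies(conn):
    """Return the UIDs of studies that were found but not yet retrieved."""
    rows = conn.execute(
        "SELECT study_uid FROM studies_find "
        "WHERE retrieved_on IS NULL ORDER BY found_on, id"
    ).fetchall()
    return [row[0] for row in rows]
```

A query such as `pending_studies` is the kind of pipeline overview the table is meant to enable: anything with a found_on but no retrieved_on still has to be moved.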

storescp server

The storescp server was originally conceived to accept callbacks/plugins. This means that, in addition to persisting data, users can pass callbacks to perform additional actions on the data. One such action that should be facilitated is the parsing of DICOM metadata into the database. A system that updates the studies_find table and parses the DICOM file at the same time would be ideal.

Furthermore, the storescp server should be able to accept callbacks that will run before and after the data is persisted on disk.

  • pre-persistence callbacks should be put into place so that routines such as anonymization can run
  • post-persistence callbacks should be put into place so that routines that notify other systems or update the database can run
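The pre/post-persistence flow described above can be sketched as follows. This is not the pacsanini or pynetdicom API: the incoming dataset is modelled as a plain dict and written to disk as JSON, and `handle_incoming`, `anonymize`, and the callback signatures are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical callback signatures: pre-callbacks return a (possibly modified)
# dataset, post-callbacks receive the final dataset and the path it was saved to.
PreCallback = Callable[[dict], dict]
PostCallback = Callable[[dict, Path], None]


def anonymize(dataset: dict) -> dict:
    """Example pre-persistence callback: blank out the patient name."""
    cleaned = dict(dataset)
    cleaned["PatientName"] = "ANONYMIZED"
    return cleaned


def handle_incoming(
    dataset: dict,
    dest_dir: str,
    pre: Iterable[PreCallback] = (),
    post: Iterable[PostCallback] = (),
) -> Path:
    """Run pre-callbacks, persist the dataset, then run post-callbacks."""
    for callback in pre:
        dataset = callback(dataset)

    # Stand-in for writing the DICOM file to disk.
    path = Path(dest_dir) / f"{dataset['SOPInstanceUID']}.json"
    path.write_text(json.dumps(dataset))

    for callback in post:
        callback(dataset, path)
    return path
```

Running anonymization before the write means identifying data never reaches disk, while notifications and database updates run afterwards so they only ever refer to files that actually exist.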
@aachick aachick added enhancement New feature or request help wanted Extra attention is needed question Further information is requested labels Aug 30, 2021
@aachick aachick self-assigned this Aug 30, 2021
@aachick
Collaborator Author

aachick commented Aug 30, 2021

I have attached here an ER diagram of the database that I have in mind.

Note that it is not fully normalized. The rationale for this is that it allows the database to serve as a sort of data warehouse in which the images table is the central table on which all types of queries are possible. The other tables serve as a way to structure the DICOM data in a "DICOM-friendly" way.

The studies_find and images tables are linked because, when the storescp server receives new DICOM images, they will be persisted in the images table. That will be the confirmation that the data has been obtained and that the studies_find table can be updated. I am open to changing this and linking the studies_find table to the studies table instead.

[Image: db_schema (ER diagram of the proposed database)]

@aachick
Collaborator Author

aachick commented Aug 31, 2021

After some reflection, I think that the following changes should be made to the database schema:

  • the studies_find table should be linked to the studies table. This makes more sense from a relational perspective. Furthermore, the studies table would only gain a new entry when the images table gains one. Because of this, the studies_find retrieved_on column will be updated at the right time.
  • the studies table doesn't need the image_count column at the moment. The original idea was to denormalize the schema structure to facilitate queries. However, this is not needed at the moment.
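Without an image_count column, the number of images per study can still be derived on demand by joining through the series table. The sketch below assumes simplified table and column names (`study_fk`, `series_fk`, `study_uid`) that are illustrative only; `build_demo` and `image_counts` are hypothetical helpers.

```python
import sqlite3


def build_demo():
    """Create a tiny in-memory database: one study with images, one without."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE studies (id INTEGER PRIMARY KEY, study_uid TEXT UNIQUE);
        CREATE TABLE series (
            id INTEGER PRIMARY KEY,
            study_fk INTEGER REFERENCES studies(id)
        );
        CREATE TABLE images (
            id INTEGER PRIMARY KEY,
            series_fk INTEGER REFERENCES series(id)
        );
        INSERT INTO studies (id, study_uid) VALUES (1, '1.2.3'), (2, '1.2.4');
        INSERT INTO series (id, study_fk) VALUES (10, 1), (11, 1);
        INSERT INTO images (series_fk) VALUES (10), (10), (11);
        """
    )
    return conn


def image_counts(conn):
    """Map each study UID to its number of images, computed at query time."""
    rows = conn.execute(
        """
        SELECT studies.study_uid, COUNT(images.id)
        FROM studies
        LEFT JOIN series ON series.study_fk = studies.id
        LEFT JOIN images ON images.series_fk = series.id
        GROUP BY studies.id, studies.study_uid
        ORDER BY studies.study_uid
        """
    ).fetchall()
    return dict(rows)
```

The LEFT JOINs keep studies that have no images yet, which would otherwise disappear from the result.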

@aachick
Collaborator Author

aachick commented Aug 31, 2021

The implementation of the database should therefore be done in the following way:

  1. Implement the database schema using sqlalchemy. This task should define a way to create the database and its underlying tables.
    a. This means that the CRUD operations need to be somewhat refactored.
  2. Adapt the C-FIND queries so that they adequately update the database's studies_find table.
  3. Adapt the C-MOVE routines (and therefore the storescp server) so that when data is requested and obtained, the studies_find table is updated and records are automatically added to the database.
  4. Adapt the pacsanini configuration file structure so that it makes more sense after these changes.

@aachick
Collaborator Author

aachick commented Sep 2, 2021

The linked issue #47 is now closed thanks to the merge of PR #49.

@aachick
Collaborator Author

aachick commented Oct 2, 2021

With the release of 0.2.0, this issue can be closed.

@aachick aachick closed this as completed Oct 2, 2021