Added a meta data table - one-to-one relationship approach #81

Open: wants to merge 1 commit into main


waterflow80 (Collaborator) commented Mar 22, 2024

Description

In this PR we attempt to separate the seqcol metadata from the seqcol entities by storing it in a separate table.
Fixes #13

Changes made

  • Created a separate SeqColMetadata entity
  • Made the primary key of this entity a composite of seqcol_digest and naming_convention:
    • I think this needs further discussion: should we add more columns to the primary key, or even change the relationship to one-to-many? The answer depends on the different shapes a seqcol may take in the future.
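
A composite primary key like this is usually modelled in JPA as a separate key class. The sketch below shows such a class in plain Java, assuming both columns are strings. The `@Embeddable`/`@EmbeddedId` annotations are only mentioned in comments so the code compiles with the JDK alone; the class names and sample values are illustrative, not taken from this PR.

```java
import java.io.Serializable;
import java.util.Objects;

// Composite primary key (seqcol_digest, naming_convention) for seqcol_md.
// In JPA this class would typically be annotated @Embeddable and referenced via
// @EmbeddedId on SeqColMetadata (or used with @IdClass); annotations are omitted
// so the sketch compiles without a persistence dependency.
// JPA expects key classes to be public; this one is package-private only so the
// sketch fits in a single file next to the demo class.
class SeqColMetadataId implements Serializable {
    private static final long serialVersionUID = 1L;

    private String seqColDigest;
    private String namingConvention;

    // No-arg constructor, which JPA requires on key classes.
    public SeqColMetadataId() {}

    public SeqColMetadataId(String seqColDigest, String namingConvention) {
        this.seqColDigest = seqColDigest;
        this.namingConvention = namingConvention;
    }

    public String getSeqColDigest() { return seqColDigest; }
    public String getNamingConvention() { return namingConvention; }

    // Two keys are equal only when both columns match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SeqColMetadataId)) return false;
        SeqColMetadataId other = (SeqColMetadataId) o;
        return Objects.equals(seqColDigest, other.seqColDigest)
                && Objects.equals(namingConvention, other.namingConvention);
    }

    // Consistent with equals: hashes the same two columns.
    @Override
    public int hashCode() {
        return Objects.hash(seqColDigest, namingConvention);
    }
}

class CompositeKeyDemo {
    public static void main(String[] args) {
        // "digestA", "ENA" and "UCSC" are placeholder values.
        SeqColMetadataId a = new SeqColMetadataId("digestA", "ENA");
        SeqColMetadataId b = new SeqColMetadataId("digestA", "ENA");
        SeqColMetadataId c = new SeqColMetadataId("digestA", "UCSC");
        System.out.println(a.equals(b)); // true
        System.out.println(a.equals(c)); // false
    }
}
```

Adding a column to the key would mean adding a field here and to both `equals` and `hashCode`.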

Here's the new Java model:

```mermaid
classDiagram
    class SeqColLevelOneEntity
    SeqColLevelOneEntity: - digest
    SeqColLevelOneEntity: - seqColLevel1Object
    class SeqColMetadata
    SeqColMetadata: - seqColDigest
    SeqColMetadata: - sourceIdentifier
    SeqColMetadata: - sourceUrl
    SeqColMetadata: - namingConvention
    SeqColMetadata: - timestamp
    SeqColLevelOneEntity --> SeqColMetadata
```
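
For reference, the diagram could translate into plain Java roughly as follows. This is a hypothetical sketch, not the code in this PR: the field types (`String` digests, a `Map<String, String>` of attribute name to digest for the level-1 object, `LocalDateTime` for the timestamp), the `metadata` field name and the digest check in the constructor are all assumptions.

```java
import java.time.LocalDateTime;
import java.util.Map;

// Metadata row: field names follow the class diagram, types are assumptions.
class SeqColMetadata {
    private final String seqColDigest;
    private final String sourceIdentifier;
    private final String sourceUrl;
    private final String namingConvention;
    private final LocalDateTime timestamp;

    SeqColMetadata(String seqColDigest, String sourceIdentifier, String sourceUrl,
                   String namingConvention, LocalDateTime timestamp) {
        this.seqColDigest = seqColDigest;
        this.sourceIdentifier = sourceIdentifier;
        this.sourceUrl = sourceUrl;
        this.namingConvention = namingConvention;
        this.timestamp = timestamp;
    }

    String getSeqColDigest() { return seqColDigest; }
    String getSourceIdentifier() { return sourceIdentifier; }
    String getSourceUrl() { return sourceUrl; }
    String getNamingConvention() { return namingConvention; }
    LocalDateTime getTimestamp() { return timestamp; }
}

// Level-1 entity holding exactly one metadata object (the one-to-one link).
class SeqColLevelOneEntity {
    private final String digest;
    // Assumed shape: attribute name -> digest of that attribute's array.
    private final Map<String, String> seqColLevel1Object;
    private final SeqColMetadata metadata; // hypothetical field name

    SeqColLevelOneEntity(String digest, Map<String, String> seqColLevel1Object,
                         SeqColMetadata metadata) {
        // The metadata row is keyed by the same digest, so reject a mismatch.
        if (!digest.equals(metadata.getSeqColDigest())) {
            throw new IllegalArgumentException("metadata digest does not match entity digest");
        }
        this.digest = digest;
        this.seqColLevel1Object = Map.copyOf(seqColLevel1Object);
        this.metadata = metadata;
    }

    String getDigest() { return digest; }
    Map<String, String> getSeqColLevel1Object() { return seqColLevel1Object; }
    SeqColMetadata getMetadata() { return metadata; }
}

class ModelDemo {
    public static void main(String[] args) {
        // Placeholder values for illustration only.
        SeqColMetadata md = new SeqColMetadata("d1", "ACC_1", null, "ENA",
                LocalDateTime.of(2024, 3, 22, 16, 50));
        SeqColLevelOneEntity entity =
                new SeqColLevelOneEntity("d1", Map.of("lengths", "lenDigest"), md);
        System.out.println(entity.getMetadata().getNamingConvention()); // ENA
    }
}
```

Because `metadata` is a single field, each level-1 entity can carry only one metadata record in this shape.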

And a sample of the seqcol_md table contents:
[Screenshot from 2024-03-22 16-50-20]

Changes to make

Each of the following changes could be tracked in a separate issue:

  • Change the return type of the ingestion endpoint to handle a duplicate seqcol that arrives with different metadata.
  • Update the ingestion process to pass the metadata object as a parameter to the construction methods.
  • Complete the logic for populating source_url and timestamp.
  • Add test cases covering all possible scenarios.
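
The first two items could look something like the sketch below: ingestion receives the metadata object as a parameter and returns an outcome that distinguishes a new seqcol, an exact duplicate, and a known digest arriving with different metadata. All names (`IngestionOutcome`, `IngestionMetadata`, `IngestionSketch`) are hypothetical, and an in-memory map stands in for the one-to-one seqcol_md table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical result type for the ingestion endpoint.
enum IngestionOutcome { CREATED, ALREADY_EXISTS, CONFLICTING_METADATA }

// Hypothetical metadata carrier; record equality compares every field.
record IngestionMetadata(String sourceIdentifier, String sourceUrl, String namingConvention) {}

class IngestionSketch {
    // One-to-one: each seqcol digest maps to a single metadata record.
    private final Map<String, IngestionMetadata> store = new HashMap<>();

    // The metadata is passed in together with the digest, so the outcome can be
    // decided in one place instead of patching metadata in after construction.
    IngestionOutcome ingest(String digest, IngestionMetadata metadata) {
        IngestionMetadata existing = store.putIfAbsent(digest, metadata);
        if (existing == null) {
            return IngestionOutcome.CREATED;
        }
        // putIfAbsent did not overwrite, so the stored metadata is unchanged.
        return existing.equals(metadata)
                ? IngestionOutcome.ALREADY_EXISTS
                : IngestionOutcome.CONFLICTING_METADATA;
    }

    public static void main(String[] args) {
        IngestionSketch sketch = new IngestionSketch();
        // Placeholder values for illustration only.
        IngestionMetadata m1 = new IngestionMetadata("ACC_1", null, "ENA");
        IngestionMetadata m2 = new IngestionMetadata("ACC_2", null, "ENA");
        System.out.println(sketch.ingest("d1", m1)); // CREATED
        System.out.println(sketch.ingest("d1", m1)); // ALREADY_EXISTS
        System.out.println(sketch.ingest("d1", m2)); // CONFLICTING_METADATA
    }
}
```

How each outcome maps to an HTTP response (for instance 201 for `CREATED` and 409 for `CONFLICTING_METADATA`) would be a separate decision for the endpoint.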

Discussion

I'm not sure to what extent this approach covers all possible seqcol metadata scenarios. If we ever had to make every column of the seqcol_md table part of the primary key, a one-to-many relationship would probably be more efficient. Such a key would mean that:

  • we can have seqcol entries with the same digest but different source identifiers, and
  • we can have seqcol entries with the same digest and naming convention but different source identifiers, and
  • we can have seqcol entries with the same digest, naming convention and source identifier but different source_url, and
  • ...
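
A one-to-many alternative could look roughly like this sketch: one seqcol digest owns a list of metadata records, and a record only counts as a duplicate when every field matches, which is what an all-columns key would enforce. The type names and field types are assumptions for illustration.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical metadata record; record equality compares all fields,
// standing in for "every column is part of the key".
record MetadataRecord(String sourceIdentifier, String sourceUrl,
                      String namingConvention, LocalDateTime timestamp) {}

class SeqColWithMetadataList {
    private final String digest;
    private final List<MetadataRecord> metadata = new ArrayList<>();

    SeqColWithMetadataList(String digest) { this.digest = digest; }

    // Attaches a record unless an identical one is already present.
    // Returns true if the record was added, false for an exact duplicate.
    boolean addMetadata(MetadataRecord record) {
        if (metadata.contains(record)) {
            return false;
        }
        return metadata.add(record);
    }

    String getDigest() { return digest; }

    List<MetadataRecord> getMetadata() { return Collections.unmodifiableList(metadata); }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        SeqColWithMetadataList seqCol = new SeqColWithMetadataList("d1");
        LocalDateTime t = LocalDateTime.of(2024, 3, 22, 0, 0);
        seqCol.addMetadata(new MetadataRecord("ACC_1", null, "ENA", t));
        seqCol.addMetadata(new MetadataRecord("ACC_2", null, "ENA", t)); // different source identifier: kept
        seqCol.addMetadata(new MetadataRecord("ACC_1", null, "ENA", t)); // exact duplicate: rejected
        System.out.println(seqCol.getMetadata().size()); // 2
    }
}
```

In JPA, such a list would typically be mapped with `@OneToMany`, and the metadata table could then use its own identifier (plus a unique constraint over whichever columns must not repeat) instead of a wide composite primary key.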

NOTE: The current seqcol_md table uses both seqcol_digest and naming_convention as a composite primary key, which means it cannot hold two rows with the same digest and the same naming_convention. Can such a case occur?
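
The consequence of the current key can be illustrated with an in-memory stand-in for seqcol_md (hypothetical names, placeholder values): a second row with the same digest and naming convention is refused even when its source identifier differs.

```java
import java.util.HashMap;
import java.util.Map;

// Key mirroring the current composite primary key of seqcol_md.
record MdKey(String seqColDigest, String namingConvention) {}

// Simplified row: only the columns needed to show the key clash.
record MdRow(String seqColDigest, String namingConvention, String sourceIdentifier) {
    MdKey key() { return new MdKey(seqColDigest, namingConvention); }
}

class SeqColMdTableSketch {
    private final Map<MdKey, MdRow> rows = new HashMap<>();

    // Returns false when a row with the same (digest, naming convention) exists;
    // the existing row is left untouched.
    boolean insert(MdRow row) {
        return rows.putIfAbsent(row.key(), row) == null;
    }

    int size() { return rows.size(); }

    public static void main(String[] args) {
        SeqColMdTableSketch table = new SeqColMdTableSketch();
        System.out.println(table.insert(new MdRow("d1", "ENA", "ACC_1")));  // true
        System.out.println(table.insert(new MdRow("d1", "UCSC", "ACC_1"))); // true: different naming convention
        System.out.println(table.insert(new MdRow("d1", "ENA", "ACC_2")));  // false: same key as the first row
    }
}
```

Against a real database, a plain INSERT of such a row would fail with a primary-key violation, whereas Spring Data's `save()` on an entity whose assigned key already exists may merge it and silently overwrite the earlier metadata.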

- made a one-to-one relationship between the levelOneEntity and the MetaDataEntity
- set seqcol_digest & naming_convention as the primary key of the metadata table
Successfully merging this pull request may close these issues.

Adding a 'database' for assembly accessions to map saved seqCol objects