Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a metadata table - one-to-many relationship apporach #82

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

waterflow80
Copy link
Collaborator

Description

closes #13

In this PR, we've separated the seqcol metadata from the seqcol entities, by storing them into a new entity/table, following a
one to many relationship:

classDiagram
    class SeqColLevelOneEntity
    SeqColLevelOneEntity: - digest
    SeqColLevelOneEntity: - seqColLevel1Object
    class SeqColMetadata
    SeqColMetadata: - seqColDigest
    SeqColMetadata: - sourceIdentifier
    SeqColMetadata: - sourceUrl
    SeqColMetadata: - namingConvention
    SeqColMetadata: - timestamp
    SeqColLevelOneEntity "1" --> "1..*" SeqColMetadata
Loading

Here's a sample content of the seqcol_md table:

Screenshot from 2024-04-12 22-28-50

Changes

  • Created the entity SeqColMetadataEntity with many-to-one relationship to the seqcolLevel1
  • Added the SeqColMetadataDTO class to format the metadata
  • Set the timestamp automatically using the @Temporal and the @CreationTimestamp annotations
  • Check for the existence of the asm_accession before downloading.
  • Added a new endpoint /eva/webservices/seqcol/metadata/{digest} to fetch the metadata of a specific seqcol in the following format:
[
    {
        "sourceId": "GCA_000146045.2",
        "sourceUrl": null,
        "namingConvention": "UCSC",
        "timestamp": "2024-04-12T21:19:13.686+00:00"
    }
]

Todo

  • Write some test cases to ensure the efficiency of the new model.
  • Consider passing the metadata object as argument to some construction methods, instead of creating the metadata inside of these methods. Like here for example.
  • Complete the swagger docs for the new metadata endpoint.

Discussion

Currently, the service only checks whether the given asm_accession is already ingested (exists in the metadata table), if so it doesn't proceed with the download.
I assume we should make it able to check if all the metadata information is the same in order to abort the download, however we still don't have the required mechanisms to test that. (because we only have an ingestion endpoint with asm_accesion In).

After we complete that, we should complete the logic of updating the metadata of a defined seqcol.

Note: the new seqcol_md table has only a simple integer as primary key, so all other attribute values can be duplicated in other entries, and I think this will fulfill all the required use cases, described here by @apriltuesday.

We may eventually have to check for all attributes equality in the code to avoid full entry duplication.

- made a one-to-one-relationship between the levelOneEntity and the MetaDataEntity
- set seqcol digest & naming_convention as the primary key of the meta-data table
@@ -75,7 +78,8 @@ public Optional<String> addFullSequenceCollection(
if (numSeqCols > 0) {
logger.warn("SeqCol with digest " + levelOneEntity.getDigest() + " already exists !");
throw new DuplicateSeqColException(levelOneEntity.getDigest());
} else {
}else {
levelOneEntity.getMetadata().forEach(md -> md.setSeqColLevelOne(levelOneEntity));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nitin-ebi @apriltuesday
Just wanna confirm that this is right implementation, because it didn't run otherwise, it returned an error:

org.hibernate.PropertyValueException: not-null property references a null or transient value : uk.ac.ebi.eva.evaseqcol.entities.SeqColMetadataEntity.seqColLevelOne

Originally, I thought that the initialization would be made by JPA automatically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry we missed this question when we met yesterday... Others might be more familiar with Hibernate/JPA, but I think this seems right - it's similar to this question where the answers suggest you do have to set the parent explicitly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clear,
Thanks April.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding a 'database' for assembly accessions to map saved seqCol objects
2 participants