Keep track of platforms and genome updates within Gemma #378
Comments
Both solutions can also be combined to keep track of updates to our ExternalDatabase entities. We'll need custom audit event types for the various update operations performed through the CLI.
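To make the audit-event idea concrete, here is a minimal sketch of an audit trail that records who performed an update and when, with one event type per update operation. Every name in it (UpdateEventTypeSketch, AuditEventSketch, AuditTrailSketch, the two event types) is hypothetical and chosen for illustration; it is not Gemma's actual audit API.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical event types, one per CLI update operation (illustrative names only).
enum UpdateEventTypeSketch { GENOME_UPDATE, PLATFORM_ANNOTATION_UPDATE }

// One audit entry: what kind of update, who ran it, when, and a free-text note.
final class AuditEventSketch {
    final UpdateEventTypeSketch type;
    final String performer;
    final Instant date;
    final String note;

    AuditEventSketch(UpdateEventTypeSketch type, String performer, Instant date, String note) {
        this.type = type;
        this.performer = performer;
        this.date = date;
        this.note = note;
    }
}

// Assumed stand-in for the audit trail attached to an auditable entity.
final class AuditTrailSketch {
    private final List<AuditEventSketch> events = new ArrayList<>();

    void record(AuditEventSketch event) {
        events.add(event);
    }

    // Returns the most recent event of the given type, or null if none was recorded.
    AuditEventSketch latest(UpdateEventTypeSketch type) {
        AuditEventSketch best = null;
        for (AuditEventSketch e : events) {
            if (e.type == type && (best == null || e.date.isAfter(best.date))) {
                best = e;
            }
        }
        return best;
    }
}
```

A trail like this answers "who updated this and when", but the release label itself would only live in the free-text note, so it complements rather than replaces dedicated release fields.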
The most urgent step is to prepare the data model and expose that information in the RESTful API; audit events and CLI integration can wait.
There are 44 external databases defined in the table, out of which only the genome ones are worth keeping track of. For probes, the platforms are represented with a database entry, which we will want to keep track of at this level. I'm introducing a new interface called |
We need to expose |
To expose RNA-Seq annotations, we could add external references to the corresponding generic platform: one ref for the gene source (e.g. Ensembl) and another for the version the pipeline is using.
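As a rough illustration of that idea, the sketch below attaches two references to a generic platform, one for the gene source and one for the pipeline version. The class names, the role labels and the placeholder release strings are all assumptions made for this example and do not reflect Gemma's actual model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical reference from a platform to a release of an external database.
final class ExternalReferenceSketch {
    final String role;     // assumed label, e.g. "gene source" or "pipeline version"
    final String database; // name of the external database, e.g. "Ensembl"
    final String release;  // release label; placeholder values only in this sketch

    ExternalReferenceSketch(String role, String database, String release) {
        this.role = role;
        this.database = database;
        this.release = release;
    }
}

// Hypothetical stand-in for a generic RNA-Seq platform.
final class GenericPlatformSketch {
    final String shortName;
    private final List<ExternalReferenceSketch> references = new ArrayList<>();

    GenericPlatformSketch(String shortName) {
        this.shortName = shortName;
    }

    void addReference(ExternalReferenceSketch reference) {
        references.add(reference);
    }

    // Read-only view, so callers cannot modify the platform's references directly.
    List<ExternalReferenceSketch> getReferences() {
        return Collections.unmodifiableList(references);
    }
}
```

For example, a platform created as new GenericPlatformSketch("Generic_human_ncbiIds") would receive one reference with the role "gene source" and another with the role "pipeline version".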
Feedback from @ppavlidis: Clarify what the RNA-Seq annotations are. Reword the line that introduces EDs:
|
These are currently stored in a spreadsheet.
Things to keep track of
Tracking individual probes, genes and gene products is intractable. I need to see how they might relate to the same ExternalDatabase. Having all the genes relate to the same database would allow us to keep track of updates in a single location.
There might be redundant ExternalDatabase entries that should be merged so that we can reasonably update them.
Transitory solution
Before we decide on a way to store this metadata, we can already design the outside view of it. Genomes and gene annotations are slow-moving things in Gemma, so we can take our time to think this out.
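A minimal sketch of what that outside view might look like, assuming a value object that carries the release, releaseUrl and lastUpdated attributes named in this section. The class name, the name field, the field types and the hasReleaseInfo helper are assumptions for illustration, not the actual Gemma value object.

```java
import java.net.URI;
import java.time.Instant;

// Hypothetical value object; only the attribute names release, releaseUrl and
// lastUpdated come from this issue, everything else is assumed.
final class ExternalDatabaseVoSketch {
    final String name;
    final String release;      // release label of the genome or annotation; null when untracked
    final URI releaseUrl;      // where that release is published; null when untracked
    final Instant lastUpdated; // when the data was last refreshed; null when untracked

    ExternalDatabaseVoSketch(String name, String release, URI releaseUrl, Instant lastUpdated) {
        this.name = name;
        this.release = release;
        this.releaseUrl = releaseUrl;
        this.lastUpdated = lastUpdated;
    }

    // True when a release label has been recorded for this database.
    boolean hasReleaseInfo() {
        return release != null;
    }
}
```

Keeping the three attributes nullable lets databases that are not worth tracking stay unchanged in the API output.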
release, releaseUrl, lastUpdated attributes to the ExternalDatabase VOs
DatabaseAccession VOs
Solution 1
Add columns to EXTERNAL_DATABASE to record the platform release being used or specific genome version. We're interested in:
If we take this approach, we have to associate ARRAY_DESIGN with EXTERNAL_DATABASE. This would be used for the RNA-Seq platforms to keep track of the current release being used. However, since the platforms are not versioned (i.e. there is one Generic_human_ncbiIds platform), it will not be helpful for the EE.
Solution 2
Use the audit trail to record platform and gene updates. This allows us to know who did the update and when, but it's not ideal for storing a release number, for example. The downside is that we'll likely have to implement the Auditable interface.
The ArrayDesign already implements Auditable. We just need to create a new event type to indicate when a platform update was performed. It also resolves the issue
Other considerations:
This relates to #20 in a way because we are trying to keep track of the above information at the EE-level.
Relevant spreadsheet that currently keeps track of these manually: https://docs.google.com/spreadsheets/d/1MIi_r9U6ufiROdwRFi5fESeHbF35UHs1mnzjNySmJFg/edit#gid=0