How should a spectrum status be used? #76

Open
edeutsch opened this issue Sep 12, 2020 · 2 comments
Labels
documentation (This issue is related with the documentation or repository structure), enhancement (New feature or request)

Comments

@edeutsch
Contributor

The current Spectrum class defines a required attribute:

      status:
        type: string 
        enum: [READABLE, PEAK UNAVAILABLE]
        description: Status of the Spectrum

Can we define these status entries?

What does READABLE mean? Does it mean that the spectrum exists and can be fetched and provided? I suppose this is fine, although it is a strange word, since its antonym is UNREADABLE. But what would UNREADABLE mean? And that isn't an option anyway.

What does "PEAK UNAVAILABLE" mean exactly? Is the first peak unavailable? Any one peak? All peaks? Some peaks? Or does it mean the whole spectrum is unavailable? How is this different from a 404?

How should this be used? At PeptideAtlas a spectrum is either available and provided, or it is not available, in which case it is simply absent from the returned list or the request returns a 404. PeptideAtlas doesn't use "PEAK UNAVAILABLE", since I don't know what it should mean or how it should be used.
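To make the PeptideAtlas behaviour described above concrete, here is a rough sketch of it. This is not actual PeptideAtlas code: the in-memory `store`, the function names, and the response keys (`usi`, `status`, `mzs`) are all made up for illustration.

```python
from typing import Dict, List


class SpectrumNotFound(Exception):
    """In this sketch, a server would translate this into an HTTP 404."""


def fetch_one(store: Dict[str, List[float]], usi: str) -> dict:
    # A single-spectrum request: either the spectrum is provided, or it is a 404.
    peaks = store.get(usi)
    if peaks is None:
        raise SpectrumNotFound(usi)
    # Anything that is returned is fully available, so the status is always READABLE.
    return {"usi": usi, "status": "READABLE", "mzs": peaks}


def fetch_many(store: Dict[str, List[float]], usis: List[str]) -> List[dict]:
    # A multi-spectrum request: unavailable spectra are simply left out of the list.
    results = []
    for usi in usis:
        try:
            results.append(fetch_one(store, usi))
        except SpectrumNotFound:
            continue
    return results
```

Note that "PEAK UNAVAILABLE" never appears anywhere in this flow, which is the point of the question.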

Should it be used if there is no such spectrum at the repository?
Should it be used if the spectrum is real and valid and should be available, but due to some technical glitch it cannot be fetched from the data store? So not a 404, but closer to a 500?

We should decide and document this.

@jjcarver
Collaborator

My interpretation is that any record returned in the query corresponds to at minimum a file that is present on disk. So a status of "PEAK UNAVAILABLE" does not mean the same thing as a 404. A 404 means the MS run (i.e. file) isn't there at all. Any record returned, regardless of status, is by definition not a 404.

In the case of peak list files in open format (e.g. mzML) we can easily read the file to verify that the requested spectrum is indeed present and extract its peaks. This is what I interpret a status of "READABLE" to mean. The file is there AND we can validate/extract the actual spectrum from it.

However, sometimes a spectrum query/USI matches a raw file, which we can technically "read" in a file system sense but which we cannot (at least easily) open up to extract the actual spectrum. This is what "PEAK UNAVAILABLE" means. The MS run you asked for is there, we assume the specific spectrum you asked for may be in that file, but we can't actually give you its peaks.
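The interpretation above could be sketched roughly as follows. This is only an illustration of my reading, not the specification and not anyone's actual implementation: `classify_spectrum`, `RunNotFound`, and the boolean/`peaks` inputs are hypothetical names, and treating "peaks could not be extracted" as the sole trigger for "PEAK UNAVAILABLE" is an assumption.

```python
from enum import Enum
from typing import List, Optional, Tuple


class SpectrumStatus(Enum):
    # The two values currently allowed by the `status` enum in the Spectrum class.
    READABLE = "READABLE"
    PEAK_UNAVAILABLE = "PEAK UNAVAILABLE"


class RunNotFound(Exception):
    """The MS run (file) is not there at all; a server would answer 404."""


def classify_spectrum(
    file_exists: bool,
    peaks: Optional[List[Tuple[float, float]]],
) -> SpectrumStatus:
    # No file on disk: no record is returned, so no status applies.
    if not file_exists:
        raise RunNotFound("MS run not found")
    # File present and the spectrum's peaks were extracted (e.g. from mzML).
    if peaks is not None:
        return SpectrumStatus.READABLE
    # File present (e.g. a raw file), but the peaks could not be extracted.
    return SpectrumStatus.PEAK_UNAVAILABLE
```

Under this reading, `classify_spectrum(True, None)` yields "PEAK UNAVAILABLE", while a missing file never produces a record at all.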

I am honestly not sure if this is the best interpretation. But this is how I read the current specification.

@ypriverol
Contributor

The PEAK UNAVAILABLE status was introduced by @nuno because it is needed in MassIVE when the USI is there but they can't read the mzML because of a RAW file conversion problem or other types of issues.

@ypriverol added the documentation and enhancement labels Sep 18, 2020